What Is It Like to Manage 100 Servers? Python Gets It Done in One Line of Code

This article walks through the Python Fabric library: remote execution over SSH, batch deployment, and automated operations.

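Preface

A typical day in operations:

- Log in to server A and run a command
- Log in to server B and run the same command
- Log in to server C...

This is painful. Python and Fabric can turn it into batch automation.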

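1. Introduction to Fabric

1.1 What is Fabric

Fabric = Python + SSH + batch execution

Features:
- Run commands remotely
- Upload and download files
- Operate on many servers in one batch
- Automate deployments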

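1.2 Comparison with other tools

Tool	Language	Learning curve	Flexibility
Fabric	Python	Low	High
Ansible	YAML	Medium	Medium
SaltStack	Python/YAML	High	High
Shell scripts	Bash	Low	Low

Fabric is a good fit for small to medium fleets that need flexible customization.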

2. Environment setup

2.1 Installing Fabric

    # Requires Python 3.6+
    pip install fabric
    # Verify the installation
    fab --version

2.2 Basic usage

    # fabfile.py
    from fabric import Connection

    # Connect to the remote server
    conn = Connection(
        host='192.168.1.100',
        user='root',
        connect_kwargs={
            'password': 'your_password'
            # Or use a key instead
            # 'key_filename': '/path/to/id_rsa'
        }
    )

    # Run a command
    result = conn.run('uname -a')
    print(result.stdout)

    # Close the connection
    conn.close()

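3. Core features

3.1 Running commands remotely

    from fabric import Connection

    def remote_exec():
        """Run commands on a remote host"""
        conn = Connection('root@192.168.1.100')

        # Basic execution
        result = conn.run('hostname')
        print(f"Hostname: {result.stdout.strip()}")

        # Hide the live output and use the captured text instead
        result = conn.run('cat /etc/os-release', hide=True)
        print(result.stdout)

        # Pass environment variables
        conn.run('echo $HOME', env={'HOME': '/root'})

        # Run inside a working directory
        with conn.cd('/var/log'):
            conn.run('ls -la')

        conn.close()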
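Captured output becomes more useful once it is parsed. As a minimal sketch, the helper below turns the text of /etc/os-release (for example the `stdout` of the `cat /etc/os-release` call above) into a dictionary. The name `parse_os_release` and the sample text are illustrative assumptions, not part of Fabric.

```python
import shlex


def parse_os_release(text):
    """Parse KEY=value lines from /etc/os-release into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value
        if not line or line.startswith('#') or '=' not in line:
            continue
        key, _, raw = line.partition('=')
        # Values may be shell-quoted, e.g. NAME="Ubuntu"
        parts = shlex.split(raw)
        info[key] = parts[0] if parts else ''
    return info


# Hypothetical sample; on a real host this would be result.stdout
sample = 'NAME="Ubuntu"\nVERSION_ID="22.04"\nID=ubuntu\n'
info = parse_os_release(sample)
print(info['NAME'], info['VERSION_ID'])
```

With a dictionary in hand, a script can branch on `info['ID']` to pick the right package manager for each host.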

3.2 File operations

    from fabric import Connection

    def file_operations():
        """Upload and download files"""
        conn = Connection('root@192.168.1.100')

        # Upload a file
        conn.put('local_file.txt', '/tmp/remote_file.txt')

        # Download a file
        conn.get('/var/log/syslog', 'local_syslog.txt')

        # Upload a script and make it executable
        conn.put('script.sh', '/opt/script.sh')
        conn.run('chmod +x /opt/script.sh')

        conn.close()
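After an upload it can be worth checking that the remote copy matches the local file. The sketch below compares SHA-256 digests. It assumes the remote host provides the `sha256sum` command, and the helper names (`local_sha256`, `parse_sha256sum`, `verify_upload`) are hypothetical. `conn` is expected to behave like a Fabric `Connection`.

```python
import hashlib
import shlex


def local_sha256(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a local file."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        # Read in chunks so large artifacts do not load into memory at once
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def parse_sha256sum(output):
    """Extract the digest from sha256sum output ("<digest>  <file>")."""
    return output.split()[0].lower()


def verify_upload(conn, local_path, remote_path):
    """Upload a file, then compare the local and remote digests."""
    conn.put(local_path, remote_path)
    # Assumption: sha256sum is installed on the remote host
    result = conn.run(f'sha256sum {shlex.quote(remote_path)}', hide=True)
    return parse_sha256sum(result.stdout) == local_sha256(local_path)
```

A call such as `verify_upload(conn, 'script.sh', '/opt/script.sh')` returns `True` only when both digests agree.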

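3.3 Running with sudo

    from fabric import Connection

    def sudo_exec():
        """Run commands through sudo"""
        conn = Connection(
            'user@192.168.1.100',
            connect_kwargs={'password': 'user_password'}
        )

        # Run a command with sudo
        conn.sudo('systemctl restart nginx', password='sudo_password')

        # Run as another user (the sudo password is still needed for the prompt)
        conn.sudo('whoami', user='www-data', password='sudo_password')

        conn.close()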

4. Batch operations

4.1 Parallel execution with ThreadingGroup

    from fabric import ThreadingGroup

    def batch_exec():
        """Run a command on many hosts in parallel"""
        # Server list
        hosts = [
            'root@192.168.1.101',
            'root@192.168.1.102',
            'root@192.168.1.103',
        ]

        # Create the group (parallel execution)
        group = ThreadingGroup(*hosts, connect_kwargs={'password': 'password'})

        # Run on every host
        results = group.run('hostname')
        for conn, result in results.items():
            print(f"{conn.host}: {result.stdout.strip()}")

        group.close()
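With many hosts, some will occasionally fail. When that happens, `group.run` raises `fabric.exceptions.GroupException`, whose `result` attribute still maps every connection to either a result or the exception it hit. The sketch below tolerates partial failure; `split_results` and `run_everywhere` are hypothetical helper names, and Fabric is imported lazily so the pure helper can be tried without it.

```python
def split_results(results):
    """Split a host -> outcome mapping into successes and failures.

    Each outcome is either an object with a .stdout attribute or the
    exception raised for that host.
    """
    ok, failed = {}, {}
    for host, outcome in results.items():
        if isinstance(outcome, BaseException):
            failed[host] = str(outcome)
        else:
            ok[host] = outcome.stdout.strip()
    return ok, failed


def run_everywhere(hosts, command, connect_kwargs=None):
    """Run a command on all hosts and report per-host outcomes."""
    # Imported here so split_results works without Fabric installed
    from fabric import ThreadingGroup
    from fabric.exceptions import GroupException

    group = ThreadingGroup(*hosts, connect_kwargs=connect_kwargs or {})
    try:
        results = group.run(command, hide=True)
    except GroupException as exc:
        # exc.result holds the outcome for every host, failed or not
        results = exc.result
    finally:
        group.close()
    return split_results({conn.host: out for conn, out in results.items()})
```

Calling `ok, failed = run_everywhere(hosts, 'hostname')` then lets the script print the failed hosts and carry on with the rest.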

4.2 Serial execution with SerialGroup

    from fabric import SerialGroup

    def serial_exec():
        """Run commands one host at a time (suits tasks with dependencies)"""
        hosts = ['root@192.168.1.101', 'root@192.168.1.102']
        group = SerialGroup(*hosts, connect_kwargs={'password': 'password'})

        # Each command runs on the hosts in order
        group.run('apt update')
        group.run('apt upgrade -y')

        group.close()
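For a rolling change, a later host often should not be touched once an earlier one has failed. One way to get that control is to iterate over the connections yourself. The sketch below stops at the first failure; `rolling_run` is a hypothetical helper, and each item in `conns` is assumed to behave like a Fabric `Connection`.

```python
def rolling_run(conns, command):
    """Run a command on each connection in order; stop at the first failure.

    Returns (completed_hosts, failed_host). failed_host is None when
    every host succeeded.
    """
    done = []
    for conn in conns:
        # warn=True returns a result instead of raising on a non-zero exit
        result = conn.run(command, hide=True, warn=True)
        if not result.ok:
            return done, conn.host
        done.append(conn.host)
    return done, None
```

Because a Fabric group can be iterated like a list of connections, `rolling_run(SerialGroup(*hosts), 'apt upgrade -y')` is one possible call.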

4.3 Loading servers from a config file

    # servers.yaml
    # web:
    #   - 192.168.1.101
    #   - 192.168.1.102
    # db:
    #   - 192.168.1.201

    import yaml
    from fabric import ThreadingGroup

    def load_servers():
        """Load the server list from a config file"""
        with open('servers.yaml') as f:
            servers = yaml.safe_load(f)

        # Operate on the web servers as a batch
        web_hosts = [f'root@{ip}' for ip in servers['web']]
        group = ThreadingGroup(*web_hosts, connect_kwargs={'password': 'password'})
        group.run('nginx -t')
        group.run('systemctl reload nginx')
        group.close()
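Once the inventory is a dictionary, selecting roles and splitting a long host list into batches takes only a few lines. The sketch below uses an in-memory dictionary with the same shape as the parsed servers.yaml. `hosts_for` and `chunked` are hypothetical helpers, and batching is just one way to cap the number of simultaneous SSH sessions, since a `ThreadingGroup` works on all of its members concurrently.

```python
def hosts_for(servers, roles, user='root'):
    """Build 'user@ip' host strings for the given roles, without duplicates."""
    seen, hosts = set(), []
    for role in roles:
        if role not in servers:
            raise KeyError(f'unknown role: {role}')
        for ip in servers[role]:
            if ip not in seen:
                seen.add(ip)
                hosts.append(f'{user}@{ip}')
    return hosts


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError('size must be at least 1')
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Same shape as the servers.yaml example once parsed
servers = {
    'web': ['192.168.1.101', '192.168.1.102'],
    'db': ['192.168.1.201'],
}
all_hosts = hosts_for(servers, ['web', 'db'])
for batch in chunked(all_hosts, 2):
    print(batch)
```

Each `batch` can then be passed to `ThreadingGroup(*batch, ...)` in turn.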

5. Practical examples

5.1 Example: batch application deployment

    # deploy.py
    from fabric import Connection, ThreadingGroup
    import os

    class Deployer:
        def __init__(self, hosts, user='root', password=None, key_file=None):
            self.hosts = [f'{user}@{h}' for h in hosts]
            self.connect_kwargs = {}
            if password:
                self.connect_kwargs['password'] = password
            if key_file:
                self.connect_kwargs['key_filename'] = key_file

        def deploy(self, local_path, remote_path, service_name=None):
            """Deploy the application"""
            group = ThreadingGroup(*self.hosts, connect_kwargs=self.connect_kwargs)
            print(">>> Uploading files...")
            for conn in group:
                conn.put(local_path, remote_path)
            if service_name:
                print(f">>> Restarting service {service_name}...")
                group.sudo(f'systemctl restart {service_name}')
            print(">>> Deployment finished!")
            group.close()

        def rollback(self, backup_path, remote_path, service_name=None):
            """Roll back to a backup"""
            group = ThreadingGroup(*self.hosts, connect_kwargs=self.connect_kwargs)
            print(">>> Rolling back...")
            group.run(f'cp {backup_path} {remote_path}')
            if service_name:
                group.sudo(f'systemctl restart {service_name}')
            print(">>> Rollback finished!")
            group.close()

    # Usage
    if __name__ == '__main__':
        hosts = ['192.168.1.101', '192.168.1.102']
        deployer = Deployer(hosts, password='password')
        deployer.deploy('./app.jar', '/opt/app/app.jar', 'myapp')
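The `rollback` method expects a backup to exist, so something has to create it before the new file is uploaded. As a minimal sketch, the helper below builds a timestamped backup command that could be run on each host ahead of `deploy`. The name `backup_command` and the `.bak` naming scheme are assumptions for illustration.

```python
import shlex
from datetime import datetime


def backup_command(remote_path, now=None):
    """Return (command, backup_path) for copying remote_path to a backup."""
    now = now or datetime.now()
    backup_path = f"{remote_path}.{now.strftime('%Y%m%d_%H%M%S')}.bak"
    # shlex.quote guards against spaces or shell metacharacters in paths
    command = f'cp -p {shlex.quote(remote_path)} {shlex.quote(backup_path)}'
    return command, backup_path


command, backup_path = backup_command('/opt/app/app.jar', datetime(2024, 1, 2, 3, 4, 5))
print(command)
```

Running `group.run(command)` before the upload leaves a file at `backup_path`, which can later be handed to `rollback` as its first argument.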

5.2 Example: server health check

    # health_check.py
    from fabric import ThreadingGroup
    from datetime import datetime

    def health_check(hosts):
        """Server health check"""
        group = ThreadingGroup(*hosts, connect_kwargs={'password': 'password'})

        print(f"\n{'=' * 60}")
        print(f"Server health check - {datetime.now()}")
        print(f"{'=' * 60}\n")

        # Checks to run
        checks = {
            'hostname': 'hostname',
            'uptime': 'uptime -p',
            'disk': "df -h / | tail -1 | awk '{print $5}'",
            'memory': "free -m | grep Mem | awk '{printf \"%.1f%%\", $3/$2*100}'",
            'load': "cat /proc/loadavg | awk '{print $1, $2, $3}'",
        }

        for conn in group:
            print(f"\n--- {conn.host} ---")
            for name, cmd in checks.items():
                try:
                    result = conn.run(cmd, hide=True, warn=True)
                    print(f"{name}: {result.stdout.strip()}")
                except Exception as e:
                    print(f"{name}: ERROR - {e}")

        group.close()

    if __name__ == '__main__':
        hosts = [
            'root@192.168.1.101',
            'root@192.168.1.102',
            'root@192.168.1.103',
        ]
        health_check(hosts)
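Printing the values is a start; flagging the ones that cross a limit makes the report actionable. The sketch below parses percentage strings such as those produced by the `disk` and `memory` checks and compares them with thresholds. `parse_percent`, `find_alerts`, and the limits shown are illustrative assumptions.

```python
def parse_percent(text):
    """Convert '43%' or '43.5%' to a float; return None if unparsable."""
    try:
        return float(text.strip().rstrip('%'))
    except ValueError:
        return None


def find_alerts(metrics, limits):
    """Return a message for every metric at or above its limit."""
    alerts = []
    for name, limit in limits.items():
        value = parse_percent(metrics.get(name, ''))
        if value is None:
            alerts.append(f'{name}: no data')
        elif value >= limit:
            alerts.append(f'{name}: {value:.1f}% >= {limit}%')
    return alerts


# Hypothetical readings for one host
metrics = {'disk': '91%', 'memory': '40.5%'}
print(find_alerts(metrics, {'disk': 85, 'memory': 90}))
```

Inside the health check loop, the stripped `stdout` of each check could be stored in a per-host `metrics` dictionary and passed to `find_alerts`.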

5.3 Example: log collection

    # collect_logs.py
    from fabric import ThreadingGroup
    from pathlib import Path
    from datetime import datetime

    def collect_logs(hosts, remote_log, local_dir='./logs'):
        """Collect a log file from many hosts"""
        Path(local_dir).mkdir(exist_ok=True)
        timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
        group = ThreadingGroup(*hosts, connect_kwargs={'password': 'password'})

        for conn in group:
            host_ip = conn.host
            local_file = f"{local_dir}/{host_ip}_{timestamp}.log"
            try:
                # Compress the remote log
                conn.run(f'gzip -c {remote_log} > /tmp/log.gz', hide=True)
                # Download it
                conn.get('/tmp/log.gz', f'{local_file}.gz')
                print(f"✓ {host_ip} -> {local_file}.gz")
            except Exception as e:
                print(f"✗ {host_ip}: {e}")

        group.close()
        print(f"\nLogs collected in: {local_dir}/")

    if __name__ == '__main__':
        hosts = ['root@192.168.1.101', 'root@192.168.1.102']
        collect_logs(hosts, '/var/log/nginx/access.log')
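Collected logs can be analysed locally without unpacking them first. As a rough sketch, the function below counts HTTP status codes in a gzipped access log. It assumes the common or combined log format, where the status code follows the quoted request line; `status_counts` is a hypothetical name.

```python
import gzip
from collections import Counter


def status_counts(gz_path):
    """Count HTTP status codes in a gzipped access log."""
    counts = Counter()
    with gzip.open(gz_path, 'rt', encoding='utf-8', errors='replace') as f:
        for line in f:
            # Assumed layout: ... "GET / HTTP/1.1" 200 612 ...
            parts = line.split('"')
            if len(parts) < 3:
                continue
            fields = parts[2].split()
            if fields and fields[0].isdigit():
                counts[fields[0]] += 1
    return counts
```

Looping over the `.gz` files in the local log directory and summing the counters gives a quick per-fleet error overview.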

6. Advanced techniques

6.1 Exception handling

    from fabric import Connection
    from invoke.exceptions import UnexpectedExit

    def safe_exec(host, command):
        """Run a command safely (with exception handling)"""
        conn = Connection(host)
        try:
            # warn=True: a non-zero exit status does not raise
            result = conn.run(command, warn=True)
            if result.ok:
                print(f"✓ Success: {result.stdout.strip()}")
            else:
                print(f"✗ Failed: {result.stderr.strip()}")
            return result.ok
        except Exception as e:
            print(f"✗ Connection error: {e}")
            return False
        finally:
            conn.close()
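Transient network errors often clear up on a second attempt. The sketch below wraps any callable that returns a truthy value on success, such as a call to `safe_exec`, in a retry loop with exponential backoff. The `retry` helper and its default values are assumptions for illustration.

```python
import time


def retry(func, attempts=3, delay=1.0, backoff=2.0, sleep=time.sleep):
    """Call func() until it returns a truthy value or attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            if func():
                return True
        except Exception as exc:
            print(f'attempt {attempt} raised: {exc}')
        if attempt < attempts:
            # Wait before the next try, growing the pause each time
            sleep(delay)
            delay *= backoff
    return False
```

One possible call is `retry(lambda: safe_exec('root@192.168.1.100', 'systemctl is-active nginx'))`. The `sleep` parameter exists so the wait can be replaced in tests.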

6.2 Context manager

    from fabric import Connection
    from contextlib import contextmanager

    @contextmanager
    def remote_server(host, **kwargs):
        """Context manager that closes the connection on exit"""
        conn = Connection(host, **kwargs)
        try:
            yield conn
        finally:
            conn.close()

    # Usage
    with remote_server('root@192.168.1.100', connect_kwargs={'password': 'pwd'}) as conn:
        conn.run('hostname')
    # The connection is closed automatically

6.3 Reusing the SSH config

    from fabric import Connection

    # Uses the settings in ~/.ssh/config
    # Host myserver
    #     HostName 192.168.1.100
    #     User root
    #     IdentityFile ~/.ssh/id_rsa

    conn = Connection('myserver')  # Reads the SSH config automatically
    conn.run('hostname')

7. Using a fabfile

7.1 Defining tasks

    # fabfile.py
    from fabric import task, Connection

    @task
    def deploy(ctx, host, version='latest'):
        """Deploy the application.

        Usage: fab deploy --host=192.168.1.100 --version=1.0.0
        """
        conn = Connection(host)
        print(f"Deploying version {version} to {host}")
        conn.run(f'docker pull myapp:{version}')
        conn.run('docker-compose up -d')
        conn.close()

    @task
    def status(ctx, host):
        """Check status.

        Usage: fab status --host=192.168.1.100
        """
        conn = Connection(host)
        conn.run('docker ps')
        conn.close()

    @task
    def logs(ctx, host, lines=100):
        """Show logs.

        Usage: fab logs --host=192.168.1.100 --lines=200
        """
        conn = Connection(host)
        conn.run(f'docker logs --tail {lines} myapp')
        conn.close()

7.2 Running tasks

    # List all tasks
    fab --list

    # Run tasks
    fab deploy --host=root@192.168.1.100 --version=1.0.0
    fab status --host=root@192.168.1.100
    fab logs --host=root@192.168.1.100 --lines=200

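8. Managing servers across networks

8.1 The problem

Common scenarios:
- Managing data-center servers from the office
- Managing company servers from home
- Managing servers spread over several networks

Traditional approaches:
- VPN access → complex to configure
- Jump host → one extra hop of latency
- SSH exposed to the public internet → security risk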

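8.2 The overlay-network approach

Overlay-networking software (for example 星空组网) simplifies cross-network management:

    ┌─────────────────────────────────────────────────────────┐
    │              Virtual LAN (overlay network)              │
    │                                                         │
    │  ┌──────────┐   ┌──────────┐   ┌──────────┐             │
    │  │ Server A │   │ Server B │   │ Server C │             │
    │  │10.10.0.1 │   │10.10.0.2 │   │10.10.0.3 │             │
    │  └──────────┘   └──────────┘   └──────────┘             │
    │                      ↑                                  │
    │                      │                                  │
    │                 ┌──────────┐                            │
    │                 │ Admin PC │                            │
    │                 │10.10.0.10│                            │
    │                 │  Fabric  │                            │
    │                 └──────────┘                            │
    │                                                         │
    │  All devices share one virtual LAN; direct SSH access   │
    └─────────────────────────────────────────────────────────┘

    import os
    from fabric import ThreadingGroup

    # Once the overlay network is configured, use the virtual IPs directly
    hosts = [
        'root@10.10.0.1',  # Data-center server A
        'root@10.10.0.2',  # Data-center server B
        'root@10.10.0.3',  # Cloud server C
    ]
    group = ThreadingGroup(
        *hosts,
        # Expand '~' explicitly so the key path is absolute
        connect_kwargs={'key_filename': os.path.expanduser('~/.ssh/id_rsa')}
    )
    group.run('hostname')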

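Advantages:

- Servers on different networks are managed in one place
- No jump host to configure
- Traffic is encrypted in transit
- As simple as working on a local network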

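9. Best practices

9.1 Security recommendations

    import os

    # 1. Use keys rather than passwords
    connect_kwargs = {'key_filename': os.path.expanduser('~/.ssh/id_rsa')}

    # 2. Use the SSH agent
    # ssh-add ~/.ssh/id_rsa
    connect_kwargs = {}  # The agent is used automatically

    # 3. Limit sudo privileges
    # get_password() stands for your own secret lookup; conn is an open Connection
    conn.sudo('systemctl restart nginx', password=get_password())

    # 4. Keep sensitive information out of the code
    password = os.environ.get('SSH_PASSWORD')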
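These recommendations can be folded into one small helper that decides how to authenticate from the environment. The sketch below prefers a key file, falls back to a password, and otherwise returns empty kwargs so the SSH agent is used. The variable names `SSH_KEY_FILE` and `SSH_PASSWORD` and the helper `connect_kwargs_from_env` are illustrative assumptions.

```python
import os


def connect_kwargs_from_env(env=None):
    """Build connect_kwargs, preferring a key file over a password."""
    env = os.environ if env is None else env
    key_file = env.get('SSH_KEY_FILE')
    if key_file:
        # Expand '~' so the path is absolute
        return {'key_filename': os.path.expanduser(key_file)}
    password = env.get('SSH_PASSWORD')
    if password:
        return {'password': password}
    # Empty kwargs leave authentication to the SSH agent or default keys
    return {}


print(connect_kwargs_from_env({'SSH_KEY_FILE': '/path/to/id_rsa'}))
```

The result can be passed straight to `Connection(host, connect_kwargs=...)` or to a group.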

9.2 Code organization

    automation/
    ├── fabfile.py          # Task definitions
    ├── config/
    │   ├── servers.yaml    # Server inventory
    │   └── settings.py     # Global settings
    ├── tasks/
    │   ├── deploy.py       # Deployment tasks
    │   ├── backup.py       # Backup tasks
    │   └── monitor.py      # Monitoring tasks
    └── utils/
        └── ssh.py          # SSH helper functions

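10. Summary

Key points of automating operations with Python Fabric:

Feature	Method
Single-host execution	`Connection.run()`
File transfer	`put()`/`get()`
Parallel batch	`ThreadingGroup`
Serial batch	`SerialGroup`
sudo execution	`conn.sudo()`
Task definition	`@task` decorator

Typical use cases:

- Batch application deployment
- Server health checks
- Log collection and analysis
- Configuration synchronization
- Automated operations scripts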

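References

Fabric official documentation: https://docs.fabfile.org/
Fabric on GitHub: https://github.com/fabric/fabric

💡 Fabric turns Python into a capable operations tool. Combined with overlay-networking software, it can manage servers spread across different networks with little effort.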
