Deploying Qwen2.5-7B-Instruct from Scratch and Integrating Custom Tools - 平芜编程栈


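1. Learning Goals and Technical Background

As large models spread into real business scenarios, deploying an open-source language model efficiently and giving it tool-calling ability (tool usage) has become a key step in building agent systems. This tutorial walks through a local deployment of Qwen2.5-7B-Instruct from scratch: vLLM serves and accelerates inference, Chainlit provides an interactive front end, and the Qwen-Agent framework integrates custom tools. The result is an AI assistant that can interact with the outside world.

✅ After this tutorial you will be able to:
- Deploy Qwen2.5-7B-Instruct with vLLM
- Build a visual chat front end with Chainlit
- Register and invoke custom tools with Qwen-Agent
- Follow the full engineering path: model → API → tools → application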

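2. Environment Setup and Prerequisites

2.1 System and Hardware Requirements

Item	Recommended configuration
Operating system	CentOS 7 / Ubuntu 20.04+
GPU	NVIDIA Tesla V100 / A100 / L40S (≥24 GB VRAM recommended)
CUDA version	12.2 or later
Python version	3.10
System memory	≥32 GB RAM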

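⚠️ Note: Qwen2.5-7B-Instruct has 7.61 billion parameters, so loading the weights in FP16 takes roughly 15 GB of VRAM before any KV cache is allocated. If memory is tight, use a quantized checkpoint. vLLM's PagedAttention also helps, but it does so by reducing KV-cache waste, not by shrinking the weights.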
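The 15 GB figure follows from simple arithmetic: parameter count times bytes per parameter. The sketch below reproduces it and adds a rough 4-bit estimate. The helper name `weight_memory_gb` is made up for illustration, and the 4-bit number ignores quantization scales and other overhead, so treat it as a lower bound.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Return the memory needed for the weights alone, in decimal GB."""
    return num_params * bytes_per_param / 1e9


QWEN25_7B_PARAMS = 7.61e9  # parameter count quoted in the note above

fp16_gb = weight_memory_gb(QWEN25_7B_PARAMS, 2)    # 16-bit: 2 bytes per parameter
int4_gb = weight_memory_gb(QWEN25_7B_PARAMS, 0.5)  # 4-bit: 0.5 bytes per parameter

print(f"FP16 weights:  {fp16_gb:.1f} GB")
print(f"4-bit weights: {int4_gb:.1f} GB (lower bound, no quantization overhead)")
```

Activations and the KV cache come on top of these numbers, which is why a 24 GB card is the practical minimum for unquantized serving.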

2.2 Create a Conda Virtual Environment

    # Create and activate an isolated virtual environment
    conda create --name qwen-deploy python=3.10
    conda activate qwen-deploy

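2.3 Download the Model Files

You can download the weights from either Hugging Face or ModelScope.

Option 1: Hugging Face

    git lfs install
    git clone https://huggingface.co/Qwen/Qwen2.5-7B-Instruct

Option 2: ModelScope (recommended in mainland China)

Install the package from the shell:

    pip install modelscope

Then run the download in Python:

    from modelscope.hub.snapshot_download import snapshot_download

    snapshot_download('qwen/Qwen2.5-7B-Instruct', cache_dir='./models')

Make sure the model ends up at a path such as ./models/qwen/Qwen2.5-7B-Instruct. If you cloned from Hugging Face, or the download created a differently named directory, adjust the paths used in the rest of this tutorial accordingly.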
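An interrupted download is a common source of confusing load errors later, so a quick check of the model directory can help before starting the server. The sketch below is an illustrative helper (`missing_model_files` is not part of any library) and assumes the usual Hugging Face layout: `config.json`, `tokenizer_config.json`, and one or more `*.safetensors` shards.

```python
from pathlib import Path


def missing_model_files(model_dir: str) -> list[str]:
    """Return the expected files that are absent from model_dir."""
    root = Path(model_dir)
    missing = []
    for name in ("config.json", "tokenizer_config.json"):
        if not (root / name).is_file():
            missing.append(name)
    # Weight shards are named like model-00001-of-00004.safetensors
    if not any(root.glob("*.safetensors")):
        missing.append("*.safetensors")
    return missing


if __name__ == "__main__":
    # Path assumed from the tutorial; change it if your download landed elsewhere
    print(missing_model_files("./models/qwen/Qwen2.5-7B-Instruct"))
```

An empty list means the basic files are present. It does not verify checksums or that every shard was fetched.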

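3. Serving the Model with vLLM

vLLM is one of the most efficient LLM inference engines available. It supports continuous batching, PagedAttention, and an OpenAI-compatible API.

3.1 Install vLLM

    pip install "vllm>=0.6.0"

📌 Use a release that supports both the Qwen2.5 series and the tool-calling flags used in the next step (--enable-auto-tool-choice and --tool-call-parser). To the best of our knowledge these flags arrived in the 0.6 series, so an older pin such as 0.4.3 will reject them.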

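3.2 Start the vLLM Server

    python -m vllm.entrypoints.openai.api_server \
        --model ./models/qwen/Qwen2.5-7B-Instruct \
        --served-model-name Qwen2.5-7B-Instruct \
        --host 0.0.0.0 \
        --port 9000 \
        --tensor-parallel-size 1 \
        --dtype auto \
        --max-model-len 32768 \
        --gpu-memory-utilization 0.9 \
        --enable-auto-tool-choice \
        --tool-call-parser hermes

🔍 Parameter notes:
- --served-model-name: the model name clients pass in requests. Without it, vLLM registers the model under the --model path, and requests for "Qwen2.5-7B-Instruct" are rejected
- --max-model-len: maximum context length. The shipped config.json covers 32,768 tokens; the 128K context the model advertises requires enabling YaRN rope scaling and considerably more VRAM for the KV cache
- --enable-auto-tool-choice: lets the model decide on its own when to call a tool
- --tool-call-parser hermes: parses tool calls with the Hermes parser, whose JSON format matches the one Qwen emits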
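To get a feel for how --max-model-len drives memory use, the KV cache for a single full-length sequence can be estimated from the model architecture. The sketch below assumes the values published in the Qwen2.5-7B-Instruct config.json (28 layers, 4 key/value heads, head dimension 128) and a 16-bit cache. Re-check them against your local copy before relying on the numbers.

```python
# Architecture values assumed from the published config.json; verify locally.
NUM_LAYERS = 28
NUM_KV_HEADS = 4       # grouped-query attention: fewer KV heads than query heads
HEAD_DIM = 128
BYTES_PER_VALUE = 2    # FP16/BF16 cache entries


def kv_cache_gib(num_tokens: int) -> float:
    """Estimate KV-cache size in GiB for one sequence of num_tokens tokens."""
    # Factor 2 covers keys and values, stored for every layer and KV head
    per_token_bytes = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE
    return num_tokens * per_token_bytes / 2**30


for length in (32_768, 131_072):
    print(f"{length:>7} tokens -> {kv_cache_gib(length):.2f} GiB of KV cache")
```

On a 24 GB card with about 15 GB taken by the weights, a single 128K-token sequence would consume most of what is left, which is one reason to start with a shorter limit.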

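The model is now exposed through an OpenAI-compatible endpoint:

    http://localhost:9000/v1/chat/completions

Test it with curl. The "model" field must match the name the server registered (the --served-model-name value, or the --model path if none was set):

    curl http://localhost:9000/v1/chat/completions \
        -H "Content-Type: application/json" \
        -d '{
            "model": "Qwen2.5-7B-Instruct",
            "messages": [{"role": "user", "content": "你好"}],
            "temperature": 0.7
        }'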
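The same request can be sent from Python with only the standard library, which is handy for scripted health checks. This is a minimal sketch: `build_chat_request` and `chat` are illustrative names, and the base URL and model name are the ones assumed throughout this tutorial.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9000/v1"  # assumed: the vLLM address used above


def build_chat_request(prompt: str, model: str = "Qwen2.5-7B-Instruct") -> urllib.request.Request:
    """Build the POST request for the chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=60) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("你好"))` should print the model's greeting. Without it, `urlopen` raises `urllib.error.URLError`.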

4. Building the Front End with Chainlit

Chainlit is a full-stack framework designed for LLM applications. It lets you stand up a chat UI quickly and connect it to a back-end model service.

4.1 Install Chainlit

    pip install chainlit==1.1.207

4.2 Write the Chainlit Script

Create app.py:

    # app.py
    import chainlit as cl
    from openai import AsyncOpenAI

    # OpenAI-compatible async client pointed at the local vLLM server
    client = AsyncOpenAI(
        base_url="http://localhost:9000/v1",
        api_key="EMPTY",
    )


    @cl.on_chat_start
    async def start():
        cl.user_session.set("message_history", [])
        await cl.Message(content="Welcome to the Qwen2.5-7B-Instruct assistant!").send()


    @cl.on_message
    async def main(message: cl.Message):
        message_history = cl.user_session.get("message_history")
        message_history.append({"role": "user", "content": message.content})

        # Call the model served by vLLM; the async client keeps the event loop free
        stream = await client.chat.completions.create(
            model="Qwen2.5-7B-Instruct",
            messages=message_history,
            stream=True,
            max_tokens=8192,
        )

        msg = cl.Message(content="")
        async for part in stream:
            # Some chunks carry no choices or an empty delta; skip those
            if part.choices and (token := part.choices[0].delta.content):
                await msg.stream_token(token)

        await msg.send()
        message_history.append({"role": "assistant", "content": msg.content})
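The script appends every turn to `message_history`, so a long conversation will eventually exceed the server's context limit. One simple mitigation, sketched below, is to keep only the most recent messages before each request. `trim_history` and `max_messages` are hypothetical names, and counting messages is only a crude proxy for counting tokens.

```python
def trim_history(history: list[dict], max_messages: int = 20) -> list[dict]:
    """Keep a leading system message (if any) plus the newest max_messages others."""
    system = [m for m in history[:1] if m.get("role") == "system"]
    rest = history[len(system):]
    # Guard against max_messages == 0, where rest[-0:] would return everything
    recent = rest[-max_messages:] if max_messages > 0 else []
    return system + recent


history = [{"role": "system", "content": "You are a helpful assistant."}]
history += [{"role": "user", "content": f"message {i}"} for i in range(5)]

print([m["content"] for m in trim_history(history, max_messages=2)])
```

In app.py this would be applied as `messages=trim_history(message_history)` in the `create` call. A production version would count tokens with the model's tokenizer instead.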

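4.3 Start the Chainlit Front End

    chainlit run app.py -w

Open http://localhost:8000 to reach the chat interface (the -w flag reloads the app when the file changes), then try a question such as “今天广州天气怎么样？” (“What is the weather like in Guangzhou today?”).

💡 At this stage the assistant only does plain question answering; tool calling is not enabled yet. Next we bring in Qwen-Agent to get real agent behavior.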

5. Integrating Custom Tools with Qwen-Agent

5.1 Install Qwen-Agent

    pip install -U "qwen-agent[gui,rag,code_interpreter,python_executor]"

This command installs the following optional components:
- [gui]: Gradio-based graphical interface
- [rag]: retrieval-augmented generation (RAG)
- [code_interpreter]: the built-in code interpreter
- [python_executor]: support for Tool-Integrated Reasoning (TIR)

5.2 Register a Custom Tool: Real-Time Weather

We use get_current_weather as an example of letting the model call an external function on its own initiative.

Create weather_agent.py:

    # -*- coding: utf-8 -*-
    import json5
    from qwen_agent.agents import Assistant
    from qwen_agent.tools.base import BaseTool, register_tool


    @register_tool('get_current_weather')
    class GetCurrentWeather(BaseTool):
        # Shown to the model: "Get real-time weather for the given city"
        description = '获取指定城市的实时天气信息'
        parameters = [
            {
                'name': 'location',
                'type': 'string',
                # "City name, e.g. Beijing, Shanghai, Guangzhou"
                'description': '城市名称，例如：北京、上海、广州',
                'required': True
            }
        ]

        def call(self, params: str, **kwargs) -> str:
            # The model passes its arguments as a JSON string
            location = json5.loads(params)['location']
            print(f"[DEBUG] 查询天气: {location}")

            # Hard-coded sample data stands in for a real weather API
            weather_data = {
                '广州': '目前我市多云间晴，局部有阵雨，气温29~32℃，吹轻微的东南风。',
                '北京': '晴转多云，气温-3~5℃，北风3级，空气质量良。',
                '上海': '小雨转阴，气温8~11℃，东风2级，湿度较高。'
            }
            return weather_data.get(location, f'抱歉，暂无 {location} 的天气数据。')
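The `parameters` list is a compact description of the tool's arguments, while OpenAI-style function calling describes arguments as a JSON Schema object. The sketch below shows one plausible mapping between the two so the relationship is easier to see. `to_json_schema` is a hypothetical helper written for illustration, not Qwen-Agent's internal code.

```python
import json


def to_json_schema(parameters: list[dict]) -> dict:
    """Convert a list of parameter descriptors into a JSON Schema object."""
    properties = {}
    required = []
    for param in parameters:
        properties[param["name"]] = {
            "type": param["type"],
            "description": param.get("description", ""),
        }
        if param.get("required", False):
            required.append(param["name"])
    return {"type": "object", "properties": properties, "required": required}


weather_parameters = [{
    "name": "location",
    "type": "string",
    "description": "城市名称，例如：北京、上海、广州",
    "required": True,
}]

function_spec = {
    "name": "get_current_weather",
    "description": "获取指定城市的实时天气信息",
    "parameters": to_json_schema(weather_parameters),
}

print(json.dumps(function_spec, ensure_ascii=False, indent=2))
```

Whichever form is used, the description strings are what the model reads when deciding whether and how to call the tool, so they are worth writing carefully.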

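5.3 Configure the LLM and Initialize the Agent

Add the main program logic to weather_agent.py:

    # Model service settings (must match the vLLM server)
    llm_cfg = {
        'model': 'Qwen2.5-7B-Instruct',
        'model_server': 'http://localhost:9000/v1',  # OpenAI-compatible endpoint served by vLLM
        'api_key': 'EMPTY',
        'generate_cfg': {
            'top_p': 0.8,
            'temperature': 0.7
        }
    }

    # Initialize the assistant agent
    # System prompt: "You are a helpful AI assistant that can call tools to get real-time information."
    system_instruction = '你是一个乐于助人的AI助手，能够调用工具获取实时信息。'
    tools = ['get_current_weather', 'code_interpreter']  # custom tool + built-in tool
    assistant = Assistant(
        llm=llm_cfg,
        system_message=system_instruction,
        function_list=tools
    )

    if __name__ == '__main__':
        # User input: "What is the weather like in Guangzhou today?"
        messages = [{'role': 'user', 'content': '今天广州的天气怎么样？'}]

        # Stream the reply. Each item yielded by run() is the message list
        # generated so far, so print only the text added since the last step.
        # Three messages means: tool call, tool result, final answer.
        printed = ''
        for res in assistant.run(messages=messages):
            if len(res) == 3:
                content = res[2]['content']
                print(content[len(printed):], end='', flush=True)
                printed = content
        print()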

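5.4 Analyzing the Result

Run the script:

    python weather_agent.py

Output:

    [DEBUG] 查询天气: 广州
    今天广州的天气是多云间晴，局部有阵雨，气温在29到32℃之间，吹的是轻微的东南风。记得出门携带雨具哦！

The first line is the debug print from the tool's call() method. The second is the model's reply, which says that Guangzhou is cloudy with sunny intervals and local showers, 29 to 32 ℃, with a light southeasterly breeze, and suggests carrying rain gear.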

How the data flows:

Step 1: the model decides to call a tool

    [{
        "role": "assistant",
        "content": "",
        "function_call": {
            "name": "get_current_weather",
            "arguments": "{\"location\": \"广州\"}"
        }
    }]

Step 2: Qwen-Agent runs the local method automatically

    [{
        "role": "function",
        "name": "get_current_weather",
        "content": "目前我市多云间晴，局部有阵雨，气温29~32℃，吹轻微的东南风。"
    }]

Step 3: the model folds the result into a natural-language reply

    [{
        "role": "assistant",
        "content": "今天广州的天气是多云间晴……记得出门携带雨具哦！"
    }]
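The three steps can be reproduced without a GPU by replacing the model with a stub, which makes the control flow easier to follow. The sketch below is a simplified illustration of a tool-calling loop under that assumption, not Qwen-Agent's implementation: `fake_llm`, `run_agent`, and `TOOLS` are made-up names, and the stub always requests the weather tool exactly once.

```python
import json


def get_current_weather(arguments: str) -> str:
    """Local tool: look up canned weather text for the requested city."""
    location = json.loads(arguments)["location"]
    data = {"广州": "目前我市多云间晴，局部有阵雨，气温29~32℃，吹轻微的东南风。"}
    return data.get(location, f"抱歉，暂无 {location} 的天气数据。")


TOOLS = {"get_current_weather": get_current_weather}


def fake_llm(messages: list[dict]) -> dict:
    """Stand-in for the model: request the tool once, then answer from its result."""
    last = messages[-1]
    if last["role"] == "function":
        return {"role": "assistant", "content": "Answer based on tool result: " + last["content"]}
    return {
        "role": "assistant",
        "content": "",
        "function_call": {
            "name": "get_current_weather",
            "arguments": json.dumps({"location": "广州"}, ensure_ascii=False),
        },
    }


def run_agent(messages: list[dict], max_steps: int = 5) -> list[dict]:
    """Loop until the model replies without a function_call; return the new messages."""
    produced = []
    for _ in range(max_steps):
        reply = fake_llm(messages + produced)
        produced.append(reply)
        call = reply.get("function_call")
        if call is None:
            break  # plain answer: the loop is finished
        result = TOOLS[call["name"]](call["arguments"])
        produced.append({"role": "function", "name": call["name"], "content": result})
    return produced


trace = run_agent([{"role": "user", "content": "今天广州的天气怎么样？"}])
for message in trace:
    print(message["role"], "->", message.get("function_call") or message["content"])
```

The `max_steps` cap is a design choice worth keeping in any real loop, since a model that keeps requesting tools would otherwise never terminate.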

✅ This completes the full agent loop: perceive → decide → act → respond.

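6. Common Problems and Best Practices

6.1 Troubleshooting

Symptom	Cause	Fix
`Connection refused` on localhost:9000	vLLM is not running, or the port is already taken	Check the process with `ps aux | grep api_server`
Tool is never called; the model answers in plain text	The model did not recognize the tool schema	Make sure `--enable-auto-tool-choice` is set and the tool is listed in `function_list`
Garbled Chinese text or encoding errors	The file is not saved or declared as UTF-8	Save the file as UTF-8 and add the `# -*- coding: utf-8 -*-` header
Model fails to load with an out-of-memory (OOM) error	Not enough VRAM for the weights plus KV cache	Lower `--max-model-len`, use `--dtype half`, or serve an AWQ checkpoint with `--quantization awq`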
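For the first row of the table, a quick way to tell whether anything is listening on the vLLM port is to attempt a TCP connection. This is a minimal sketch using only the standard library; `is_port_open` is an illustrative name, and port 9000 is the value assumed in this tutorial.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts
        return False


if __name__ == "__main__":
    print("vLLM port reachable:", is_port_open("localhost", 9000))
```

A `True` result only shows that some process accepted the connection. Confirming that it is vLLM still requires an actual API request.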

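6.2 Performance Tips

- Enable tensor parallelism: with multiple GPUs, set --tensor-parallel-size N
- Cap the maximum output length: overly long generations increase latency
- Cache common tool results: put a Redis cache in front of high-frequency queries such as weather
- Use an AWQ/GGUF quantized build: brings the VRAM needed for the weights under 10 GB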
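The caching tip can be prototyped before introducing Redis. The sketch below swaps Redis for a small in-process cache with a time-to-live, which is enough to show the idea but is not shared across processes. `TTLCache` and `get_or_compute` are illustrative names, and the clock is injectable so that expiry can be tested without waiting.

```python
import time
from typing import Callable


class TTLCache:
    """In-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: dict[str, tuple[float, str]] = {}

    def get_or_compute(self, key: str, compute: Callable[[], str]) -> str:
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # still fresh: skip the expensive call
        value = compute()
        self._store[key] = (now, value)
        return value


calls = []


def slow_weather_lookup() -> str:
    calls.append(1)  # count how often the "real" lookup runs
    return "多云间晴"


cache = TTLCache(ttl_seconds=600)
first = cache.get_or_compute("广州", slow_weather_lookup)
second = cache.get_or_compute("广州", slow_weather_lookup)
print(first, second, "lookups performed:", len(calls))
```

Inside the tool's `call` method, the lookup for a city would be wrapped the same way, with the city name as the key. Weather changes slowly, so a TTL of several minutes is a reasonable starting point.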

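6.3 Security Notes

- Do not expose a service that accepts api_key="EMPTY" to the public internet; bind it to a private interface or require a real key (vLLM accepts --api-key)
- Validate user input to guard against injection attacks
- Avoid dangerous calls such as os.system() inside tool functions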
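Tool arguments are generated by the model from user text, so they deserve the same suspicion as any other untrusted input. The sketch below shows one way to validate the `location` argument before using it. `parse_location` is a hypothetical helper, and the allow-list (word characters including CJK, spaces, hyphens, middle dots, at most 30 characters) is an assumption to adapt to your own data.

```python
import json
import re

# Allow-list: word characters (covers CJK), spaces, hyphens, middle dots; 1 to 30 chars
LOCATION_PATTERN = re.compile(r"[\w \-·]{1,30}")


def parse_location(params: str) -> str:
    """Parse tool arguments and return a validated location, or raise ValueError."""
    try:
        data = json.loads(params)
    except json.JSONDecodeError as exc:
        raise ValueError("arguments must be valid JSON") from exc
    location = data.get("location") if isinstance(data, dict) else None
    if not isinstance(location, str):
        raise ValueError("'location' must be a string")
    location = location.strip()
    if not LOCATION_PATTERN.fullmatch(location):
        raise ValueError("'location' contains unsupported characters")
    return location


print(parse_location('{"location": "广州"}'))
```

Rejecting unexpected characters up front matters most when the value is later interpolated into a URL, a SQL query, or a shell command.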

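7. Summary and Next Steps

7.1 Key Takeaways

This tutorial covered the whole path from model deployment → front-end interaction → tool integration:

Stage	Stack	Outcome
Model serving	vLLM	High-throughput, low-latency OpenAI-compatible API
Front end	Chainlit	Visual chat interface that is easy to debug
Agent capability	Qwen-Agent	Custom tools plus the code interpreter
Engineering loop	Python + Docker (optional)	Reusable, extensible agent architecture

7.2 Suggested Next Steps

- Add more tools: database queries, sending email, web scraping, and so on
- Add RAG: use qwen-agent[rag] for document question answering
- Package as a microservice: bundle and deploy with FastAPI + Docker
- Support multi-step planning: try more complex task-decomposition scenarios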

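Appendix: Full Dependency List

    # requirements.txt
    # vLLM 0.6 or later is assumed for the tool-calling server flags
    vllm>=0.6.0
    chainlit==1.1.207
    openai>=1.0
    qwen-agent[gui,rag,code_interpreter,python_executor]
    modelscope
    json5

🌐 Example project layout:

    qwen-deploy/
    ├── models/
    │   └── qwen/Qwen2.5-7B-Instruct/
    ├── app.py              # Chainlit front end
    ├── weather_agent.py    # Qwen-Agent tool integration
    ├── requirements.txt
    └── README.md

You now have a working prototype of an AI assistant that can actually get things done. From here it can be embedded in customer service, office automation, data analysis, and similar real-world settings.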
