SiameseUniNLU Deployment Tutorial: Monitoring API QPS, Latency, and Error Rate with Prometheus + Grafana, End to End - 平芜编程栈

1. Why you need to monitor your NLP service

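Once the SiameseUniNLU model is deployed to production, the most pressing question is: how do you know it is running well? Is request volume growing? Are responses getting slower? Are errors occurring?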

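Running without monitoring is like driving without a dashboard: you don't know how fast you are going, how much fuel is left, or whether the engine is healthy. With the Prometheus + Grafana stack you can see, in real time:

How many requests are processed per second (QPS)
How long each request takes (latency)
How many requests fail (error rate)
Server resource usage (CPU, memory)

This lets you catch performance bottlenecks early, improve the user experience, and keep your NLP service stable and reliable.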

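2. Environment preparation and quick deployment

2.1 Make sure the SiameseUniNLU service is running

Start the service as described in the official documentation: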

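```bash
# Enter the model directory
cd /root/nlp_structbert_siamese-uninlu_chinese-base

# Start the service (running it in the background is recommended)
nohup python3 app.py > server.log 2>&1 &

# Check the service status
ps aux | grep app.py
tail -f server.log
```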

Once the service has started, the log should contain lines like these:

```
 * Serving Flask app 'app'
 * Debug mode: off
 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:7860
```
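If you script your deployments, the "is it up yet" check can be automated instead of watching the log. A minimal standard-library sketch, assuming the service listens on 127.0.0.1:7860 as in the log above; `wait_for_port` is a hypothetical helper, and a successful TCP connect only shows that the port is open, not that the model has finished loading.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Hypothetical helper: True once host:port accepts a TCP connection, False after `timeout` s.

    A successful connect only proves the port is open, not that the model has loaded.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet (e.g. connection refused); wait and retry
            time.sleep(interval)
    return False
```

Typical use is `wait_for_port("127.0.0.1", 7860, timeout=60)` right after the `nohup` command and before sending any test traffic.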

2.2 Install Prometheus and Grafana

We use Docker to deploy the monitoring components quickly:

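```bash
# Create the monitoring directories
mkdir -p ~/monitoring/{prometheus,grafana}
cd ~/monitoring

# Create the Prometheus configuration file
cat > prometheus/prometheus.yml << 'EOF'
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'siamese-uninlu'
    static_configs:
      - targets: ['host.docker.internal:7860']
    metrics_path: '/metrics'
EOF

# A user-defined network lets Grafana reach Prometheus by container name
docker network create monitoring

# Start Prometheus. On Linux, host.docker.internal only resolves with --add-host (Docker 20.10+);
# the named volume keeps the collected data when the container is recreated
docker run -d \
  --name=prometheus \
  --network=monitoring \
  --add-host=host.docker.internal:host-gateway \
  -p 9090:9090 \
  -v $(pwd)/prometheus:/etc/prometheus \
  -v prometheus-data:/prometheus \
  prom/prometheus

# Start Grafana (the named volume keeps dashboards and settings)
docker run -d \
  --name=grafana \
  --network=monitoring \
  -p 3000:3000 \
  -v grafana-data:/var/lib/grafana \
  grafana/grafana-enterprise
```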

3. Add monitoring metrics to SiameseUniNLU

3.1 Install the required Python dependencies

The code below only needs the official Prometheus Python client:

pip install prometheus-client

3.2 Modify app.py to add monitoring

Open /root/nlp_structbert_siamese-uninlu_chinese-base/app.py and add the following at the top of the file:

```python
import time

from flask import Response, jsonify, request
from prometheus_client import Counter, Histogram, generate_latest, REGISTRY

# Define the monitoring metrics
REQUEST_COUNT = Counter(
    'siamese_uninlu_requests_total',
    'Total number of requests',
    ['method', 'endpoint', 'http_status'],
)
REQUEST_LATENCY = Histogram(
    'siamese_uninlu_request_latency_seconds',
    'Request latency in seconds',
    ['method', 'endpoint'],
)
ERROR_COUNT = Counter(
    'siamese_uninlu_errors_total',
    'Total number of errors',
    ['error_type'],
)
```
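A Histogram does not store individual latencies: each observation increments a cumulative counter for every upper bound (`le`) it fits under, plus a running `_sum` and `_count`. Those `_bucket` series are what the P95 queries later in this tutorial are computed from. The sketch below models that bookkeeping in plain Python; `CumulativeHistogram` is a hypothetical illustration, not prometheus_client's class, and the bounds are prometheus_client's defaults, which apply when `buckets=` is not passed.

```python
import math

# prometheus_client's default Histogram buckets (upper bounds in seconds)
DEFAULT_BUCKETS = (0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5,
                   0.75, 1.0, 2.5, 5.0, 7.5, 10.0, math.inf)


class CumulativeHistogram:
    """Hypothetical model of histogram bookkeeping, not the library's implementation."""

    def __init__(self, buckets=DEFAULT_BUCKETS):
        self.buckets = buckets
        self.counts = [0] * len(buckets)  # exported as *_bucket{le="..."}
        self.sum = 0.0                    # exported as *_sum
        self.count = 0                    # exported as *_count

    def observe(self, value):
        self.sum += value
        self.count += 1
        # Cumulative: every bucket whose upper bound is >= value is incremented
        for i, upper in enumerate(self.buckets):
            if value <= upper:
                self.counts[i] += 1

    def bucket_lines(self, name):
        for upper, n in zip(self.buckets, self.counts):
            le = "+Inf" if math.isinf(upper) else repr(upper)
            yield f'{name}_bucket{{le="{le}"}} {n}'


h = CumulativeHistogram()
for latency in (0.03, 0.2, 0.2, 1.7):
    h.observe(latency)
print("\n".join(h.bucket_lines("siamese_uninlu_request_latency_seconds")))
```

Because model inference can take seconds, bounds that bracket your real latencies give more precise percentiles; `Histogram` accepts a `buckets=` argument for that, e.g. `buckets=(0.1, 0.25, 0.5, 1, 2, 3, 5, 10)` (example values only).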

After the Flask `app` object is created, add:

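```python
from flask import g, request
from prometheus_client import CONTENT_TYPE_LATEST


# Expose the /metrics endpoint in the Prometheus text format
@app.route('/metrics')
def metrics():
    return Response(generate_latest(REGISTRY), content_type=CONTENT_TYPE_LATEST)


# Request-monitoring hooks
@app.before_request
def before_request():
    g.start_time = time.perf_counter()  # monotonic clock, suited to measuring durations


@app.after_request
def after_request(response):
    # Skip Prometheus's own scrapes so they don't inflate QPS
    if request.path == '/metrics' or 'start_time' not in g:
        return response
    latency = time.perf_counter() - g.start_time
    # Label with the route template, not the raw path, so unknown URLs can't create unbounded series
    endpoint = request.url_rule.rule if request.url_rule else 'unmatched'
    REQUEST_COUNT.labels(
        method=request.method,
        endpoint=endpoint,
        http_status=response.status_code,
    ).inc()
    REQUEST_LATENCY.labels(method=request.method, endpoint=endpoint).observe(latency)
    if response.status_code >= 400:
        ERROR_COUNT.labels(error_type='http_error').inc()
    return response
```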

Add error monitoring to the prediction function:

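```python
@app.route('/api/predict', methods=['POST'])
def predict():
    try:
        data = request.get_json()
        # ... original prediction logic ...
    except Exception as e:
        # Records the failure cause; after_request also counts this 500 response as 'http_error'
        ERROR_COUNT.labels(error_type='prediction_error').inc()
        return jsonify({'error': str(e)}), 500
```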

Save the file and restart the service:

```bash
cd /root/nlp_structbert_siamese-uninlu_chinese-base
pkill -f app.py
nohup python3 app.py > server.log 2>&1 &
```

4. Configure Prometheus scraping

4.1 Verify the metrics endpoint

Query the metrics endpoint to confirm the data is exposed:

curl http://localhost:7860/metrics

Once at least one request has gone through /api/predict, the output should look similar to this (a labeled series only appears after its first observation):

```
# HELP siamese_uninlu_requests_total Total number of requests
# TYPE siamese_uninlu_requests_total counter
siamese_uninlu_requests_total{endpoint="/api/predict",http_status="200",method="POST"} 1.0
# HELP siamese_uninlu_request_latency_seconds Request latency in seconds
# TYPE siamese_uninlu_request_latency_seconds histogram
```
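The same check can be scripted, for example as a post-deploy smoke test. A minimal sketch that reads only the `# TYPE` lines of the exposition text and reports any expected metric that is missing or has the wrong type. It assumes the service is reachable at localhost:7860; `declared_types`, `missing_metrics` and `fetch_metrics` are hypothetical helpers, not part of prometheus_client.

```python
import urllib.request

# Metric families defined in app.py and the type each should be exported as
EXPECTED = {
    "siamese_uninlu_requests_total": "counter",
    "siamese_uninlu_request_latency_seconds": "histogram",
    "siamese_uninlu_errors_total": "counter",
}


def declared_types(exposition):
    """Hypothetical helper: metric name -> type from '# TYPE <name> <type>' lines."""
    types = {}
    for line in exposition.splitlines():
        parts = line.split()
        if len(parts) == 4 and parts[0] == "#" and parts[1] == "TYPE":
            types[parts[2]] = parts[3]
    return types


def missing_metrics(exposition, expected=EXPECTED):
    """Hypothetical helper: expected names that are absent or exported with another type."""
    types = declared_types(exposition)
    return sorted(name for name, kind in expected.items() if types.get(name) != kind)


def fetch_metrics(url="http://localhost:7860/metrics"):
    """Assumes the service address used throughout this tutorial."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")
```

Against the running service, `missing_metrics(fetch_metrics())` should return an empty list.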

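4.2 Update the Prometheus configuration

If the target shows as DOWN (Status → Targets in the Prometheus web UI), scrape the server's IP address directly instead: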

```yaml
# prometheus/prometheus.yml
scrape_configs:
  - job_name: 'siamese-uninlu'
    static_configs:
      - targets: ['192.168.1.100:7860']  # replace with your server's IP
    metrics_path: '/metrics'
    scrape_interval: 10s
```

Restart Prometheus to apply the configuration:

docker restart prometheus
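To confirm that scraping works without opening the web UI, you can query Prometheus's HTTP API for the built-in `up` series, which is 1 when the last scrape of a target succeeded and 0 otherwise. A standard-library sketch, assuming Prometheus is published on localhost:9090 as in the `docker run` command; `query_url`, `instant_query` and `targets_up` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Prometheus is published on localhost:9090
PROMETHEUS = "http://localhost:9090"


def query_url(expr, base=PROMETHEUS):
    """Hypothetical helper: build an instant-query URL for /api/v1/query."""
    return f"{base}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def instant_query(expr, base=PROMETHEUS):
    """Hypothetical helper: run an instant query and return its result series."""
    with urllib.request.urlopen(query_url(expr, base), timeout=5) as resp:
        payload = json.load(resp)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload}")
    return payload["data"]["result"]


def targets_up(result):
    """Map each instance to True (last scrape succeeded) or False; values arrive as strings."""
    return {s["metric"].get("instance", "?"): s["value"][1] == "1" for s in result}
```

For example, `targets_up(instant_query('up{job="siamese-uninlu"}'))` should map your target to `True` once the first scrape has succeeded.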

5. Grafana dashboard configuration

5.1 Log in and add a data source

Open Grafana at http://YOUR_SERVER_IP:3000

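Default credentials: admin/admin (you will be asked to change the password on first login).
Add a Prometheus data source:
- Name: Prometheus
- URL: http://prometheus:9090 (works only when Grafana and Prometheus share a user-defined Docker network; otherwise use http://YOUR_SERVER_IP:9090)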
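The same data source can also be created through Grafana's HTTP API (`POST /api/datasources`), which is handy for repeatable setups. A sketch assuming basic auth with the admin account and the password set on first login (`your-new-password` below is a placeholder); `build_datasource_request` is a hypothetical helper.

```python
import base64
import json
import urllib.request


def build_datasource_request(grafana_url, user, password, prom_url="http://prometheus:9090"):
    """Hypothetical helper: prepare a POST /api/datasources call that registers Prometheus."""
    body = json.dumps({
        "name": "Prometheus",
        "type": "prometheus",
        "url": prom_url,
        # "proxy": the Grafana server sends the queries, so prom_url must be
        # reachable from the Grafana container, not from your browser
        "access": "proxy",
        "isDefault": True,
    }).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{grafana_url}/api/datasources",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
    )
```

Sending it is then `urllib.request.urlopen(build_datasource_request("http://localhost:3000", "admin", "your-new-password"))`.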

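5.2 Import a ready-made dashboard

Use Grafana's import feature with dashboard ID 13695 (an API monitoring template), or create the panels below by hand. A community dashboard is built around its own metric names, so its queries will most likely need to be adapted to the siamese_uninlu_* metrics.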

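QPS panel:

PromQL: sum(rate(siamese_uninlu_requests_total[1m])) (sum() adds up the per-label series so the panel shows one total)
Visualization: Stat, showing the current QPS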
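What the QPS number means: `rate()` turns an ever-increasing counter into a per-second value over the window, treating any drop as a counter reset (for example after `app.py` restarts), and `sum()` then adds the per-series rates together. A simplified sketch of the per-series step; unlike Prometheus it does not extrapolate to the window edges, so its numbers can differ slightly. `simple_rate` is a hypothetical name.

```python
def simple_rate(samples):
    """Hypothetical, simplified take on PromQL rate() for one counter series.

    samples: (timestamp_seconds, counter_value) pairs inside the window, oldest first.
    Unlike Prometheus, no extrapolation to the window edges is done.
    """
    if len(samples) < 2:
        return None  # rate() also needs at least two samples in the window
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter was reset and restarted from 0
        increase += cur - prev if cur >= prev else cur
    return increase / (samples[-1][0] - samples[0][0])


# A 1-minute window scraped every 15 s: 120 new requests in 60 s
scrapes = [(0, 100), (15, 130), (30, 160), (45, 190), (60, 220)]
print(simple_rate(scrapes))  # prints 2.0
```

`sum(rate(...))` is then the sum of `simple_rate` over every label combination (status code, method, endpoint) of the counter.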

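Latency panel:

PromQL: histogram_quantile(0.95, sum by (le) (rate(siamese_uninlu_request_latency_seconds_bucket[5m]))) (sum by (le) merges the buckets of all label combinations before the quantile is computed)
Visualization: Time series, showing P95 latency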
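`histogram_quantile()` never sees individual requests: it finds the bucket that contains the target rank and interpolates linearly inside it, assuming observations are spread evenly within the bucket. A sketch of that estimate, following the approach Prometheus documents; the function name and the `(upper_bound, cumulative_count)` input shape are assumptions for illustration.

```python
import bisect
import math


def histogram_quantile(q, buckets):
    """Sketch of a quantile estimate from cumulative histogram buckets.

    buckets: (upper_bound, cumulative_count) pairs sorted by bound, the last
    bound being math.inf, i.e. the `le` series after `sum by (le) (rate(...))`.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    bounds = [b for b, _ in buckets]
    counts = [c for _, c in buckets]
    total = counts[-1]
    if total == 0:
        return math.nan
    rank = q * total
    b = bisect.bisect_left(counts, rank)  # first bucket whose cumulative count reaches rank
    if b == len(buckets) - 1:
        return bounds[-2]  # rank falls in the +Inf bucket: report the highest finite bound
    if b == 0 and bounds[0] <= 0:
        return bounds[0]
    lower, below = (0.0, 0) if b == 0 else (bounds[b - 1], counts[b - 1])
    # Linear interpolation between the bucket's lower and upper bound
    return lower + (bounds[b] - lower) * (rank - below) / (counts[b] - below)


# Four requests: one <= 0.1 s, two in (0.1, 0.25], one in (1.0, 2.5]
example = [(0.1, 1), (0.25, 3), (1.0, 3), (2.5, 4), (math.inf, 4)]
print(histogram_quantile(0.95, example))  # about 2.2, interpolated between 1.0 and 2.5
```

In this example the reported P95 is about 2.2 s although no request took that long, so wide buckets around your typical latency make the P95 panel, and the 3 s alert later on, less precise.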

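Error-rate panel:

PromQL: sum(rate(siamese_uninlu_errors_total{error_type="http_error"}[5m])) / sum(rate(siamese_uninlu_requests_total[5m])) * 100
Visualization: Gauge, showing the error percentage
Both sides need sum() because the two metrics carry different labels, and only error_type="http_error" is counted because a failed prediction increments both prediction_error and http_error.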
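Why the `sum()` on both sides matters: PromQL's `/` pairs up series whose label sets are identical (the metric name is ignored). `siamese_uninlu_errors_total` carries `error_type`, while `siamese_uninlu_requests_total` carries `method`, `endpoint` and `http_status`, so without aggregation no pair matches and the panel stays empty. A toy model of that matching rule, with dictionaries keyed by label sets; all names and numbers here are illustrative.

```python
def labels(**kv):
    """A series' identity for matching: its label set (metric name excluded)."""
    return frozenset(kv.items())


def divide(lhs, rhs):
    """Illustrative one-to-one vector division: only label sets on both sides survive.

    Zero denominators are not handled in this sketch.
    """
    return {ls: lhs[ls] / rhs[ls] for ls in lhs if ls in rhs}


def sum_all(vector):
    """`sum(...)` without `by`: every series collapses into one with no labels."""
    return {frozenset(): sum(vector.values())}


# Per-second rates over 5m, as rate() might return them (illustrative numbers)
error_rates = {labels(error_type="http_error", job="siamese-uninlu"): 0.5}
request_rates = {
    labels(method="POST", endpoint="/api/predict", http_status="200", job="siamese-uninlu"): 9.5,
    labels(method="POST", endpoint="/api/predict", http_status="500", job="siamese-uninlu"): 0.5,
}

print(divide(error_rates, request_rates))  # prints {}: no label set appears on both sides
ratio = divide(sum_all(error_rates), sum_all(request_rates))
print(ratio[frozenset()] * 100)  # error rate in percent
```

With several service instances, `sum by (instance) (...)` on both sides would keep one ratio per instance instead of a single global one.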

5.3 Create a complete dashboard

A minimal dashboard definition with the QPS and latency panels:

```json
{
  "panels": [
    {
      "title": "Real-time QPS",
      "type": "stat",
      "targets": [
        {
          "expr": "sum(rate(siamese_uninlu_requests_total[1m]))",
          "legendFormat": "Requests"
        }
      ]
    },
    {
      "title": "API latency (P95)",
      "type": "timeseries",
      "targets": [
        {
          "expr": "histogram_quantile(0.95, sum by (le) (rate(siamese_uninlu_request_latency_seconds_bucket[5m])))",
          "legendFormat": "P95 latency"
        }
      ]
    }
  ]
}
```
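Instead of pasting JSON into the UI, a dashboard can be pushed through Grafana's HTTP API (`POST /api/dashboards/db`, with the dashboard wrapped in a `{"dashboard": ..., "overwrite": ...}` envelope). A sketch under these assumptions: basic auth with the admin account, Prometheus set as the default data source (the panels below name none), and a simple two-column layout on Grafana's 24-column grid. `dashboard_payload` and `post_dashboard` are hypothetical helpers.

```python
import base64
import json
import urllib.request

PANELS = [
    {"title": "Real-time QPS", "type": "stat",
     "targets": [{"expr": "sum(rate(siamese_uninlu_requests_total[1m]))"}]},
    {"title": "API latency (P95)", "type": "timeseries",
     "targets": [{"expr": "histogram_quantile(0.95, sum by (le) "
                          "(rate(siamese_uninlu_request_latency_seconds_bucket[5m])))"}]},
]


def dashboard_payload(title, panels):
    """Hypothetical helper: wrap panels in the envelope POST /api/dashboards/db expects."""
    laid_out = []
    for i, panel in enumerate(panels):
        # Two panels per row on the 24-column grid, each 12 wide and 8 high
        grid = {"x": (i % 2) * 12, "y": (i // 2) * 8, "w": 12, "h": 8}
        laid_out.append({**panel, "id": i + 1, "gridPos": grid})
    # "id": None creates a new dashboard; overwrite lets a re-run replace the existing copy
    return {"dashboard": {"id": None, "title": title, "panels": laid_out}, "overwrite": True}


def post_dashboard(grafana_url, user, password, payload):
    """Hypothetical helper: send the payload with basic auth and return Grafana's JSON reply."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req = urllib.request.Request(
        f"{grafana_url}/api/dashboards/db",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

For example: `post_dashboard("http://localhost:3000", "admin", "your-new-password", dashboard_payload("SiameseUniNLU", PANELS))`.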

6. Results and usage examples

6.1 Generate test traffic to verify monitoring

Create a test script that simulates real user requests:

```python
# test_load.py
import random
import time

import requests

url = "http://localhost:7860/api/predict"

# Sample inputs for the Chinese NLU model: entity extraction (person, location) and sentiment classification
test_cases = [
    {
        "text": "谷爱凌在北京冬奥会获得金牌",
        "schema": '{"人物": null, "地理位置": null}',
    },
    {
        "text": "这部电影真的很精彩，演员表演出色",
        "schema": '{"情感分类": null}',
    },
]


def send_requests():
    for i in range(100):
        data = random.choice(test_cases)
        try:
            start = time.time()
            response = requests.post(url, json=data, timeout=10)
            latency = time.time() - start
            print(f"Request {i}: {response.status_code}, Latency: {latency:.3f}s")
        except Exception as e:
            print(f"Request {i} failed: {str(e)}")
        time.sleep(random.uniform(0.1, 0.5))


if __name__ == "__main__":
    send_requests()
```

Run the test script, then watch the metrics change in Grafana.
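To cross-check the Grafana panels against what the client saw, test_load.py could append each completed request's latency and status code to two lists (and `None` as the status in its `except` branch), then summarize them. A sketch; `summarize` is a hypothetical helper, and client-side latencies include network time, so they typically run a little higher than the server-side histogram, whose P95 is also bucket-interpolated.

```python
import statistics


def summarize(latencies, statuses):
    """Hypothetical helper summarizing a load-test run.

    latencies: seconds for each request that completed.
    statuses: HTTP status code per request, or None if the request raised.
    """
    if len(latencies) < 2:
        raise ValueError("need at least two latency samples for a percentile")
    failed = sum(1 for s in statuses if s is None or s >= 400)
    return {
        "requests": len(statuses),
        "p50": statistics.median(latencies),
        # quantiles(n=100) returns the 99 cut points P1..P99; index 94 is P95
        "p95": statistics.quantiles(latencies, n=100)[94],
        "error_rate_pct": 100.0 * failed / len(statuses),
    }


print(summarize([0.21, 0.35, 1.9], [200, 200, None, 500]))
```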

6.2 Interpreting the metrics

Under normal conditions, the metrics should fall roughly within these ranges:

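| Metric | Normal range | Abnormal |
|---|---|---|
| QPS | Depends on your workload | Sudden drop to 0 or an unexpected spike |
| P95 latency | < 1 s | > 3 s |
| Error rate | < 1% | > 5% |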

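7. Advanced monitoring and alerting

7.1 Set up key alert rules

Add alert rules to Prometheus. Because the prometheus/ directory is mounted at /etc/prometheus, the rules file must also be listed in prometheus.yml, e.g. rule_files: ['/etc/prometheus/alert.rules.yml'], followed by a Prometheus restart. On its own, Prometheus only evaluates these rules and shows them on its Alerts page; sending notifications from them requires an Alertmanager, which this tutorial does not set up.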

```yaml
# prometheus/alert.rules.yml
groups:
  - name: siamese-uninlu-alerts
    rules:
      - alert: HighErrorRate
        expr: sum(rate(siamese_uninlu_errors_total{error_type="http_error"}[5m])) / sum(rate(siamese_uninlu_requests_total[5m])) * 100 > 5
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate"
          description: "Error rate is above 5%, current value: {{ $value }}%"
      - alert: HighLatency
        expr: histogram_quantile(0.95, sum by (le) (rate(siamese_uninlu_request_latency_seconds_bucket[5m]))) > 3
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High latency"
          description: "P95 latency is above 3 s, current value: {{ $value }} s"
```
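The `for: 5m` clause means the condition must hold at every rule evaluation for five minutes before the alert fires; until then the alert is `pending`, and a single evaluation where the condition is false resets it. Prometheus evaluates rules every `evaluation_interval` (1 minute by default). A simplified simulation of that state machine, ignoring other rule options; `alert_states` is a hypothetical name.

```python
def alert_states(evaluations, for_seconds):
    """Hypothetical simulation of a Prometheus alerting rule with a `for` clause.

    evaluations: (time_in_seconds, condition_is_true) at each rule evaluation,
    oldest first. Returns 'inactive', 'pending' or 'firing' per evaluation.
    """
    states = []
    active_since = None  # when the condition first became true in the current streak
    for t, condition in evaluations:
        if not condition:
            active_since = None  # any false evaluation resets the clock
            states.append("inactive")
            continue
        if active_since is None:
            active_since = t
        states.append("firing" if t - active_since >= for_seconds else "pending")
    return states


# Condition true for six consecutive 1-minute evaluations, then false, with for: 5m
evals = [(t, True) for t in range(0, 360, 60)] + [(360, False)]
print(alert_states(evals, for_seconds=300))  # five 'pending', then 'firing', then 'inactive'
```

So HighLatency needs P95 above 3 s across about five minutes of consecutive evaluations before it fires; a two-minute spike never leaves `pending`.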

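7.2 Configure Grafana alert notifications

Set up notifications in Grafana:

Go to Alerting → Contact points (called Notification channels in older Grafana versions with legacy alerting)
Add an email, Slack, or webhook contact point
Create alert rules with thresholds for the dashboard panels (Alerting → Alert rules)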

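8. Summary and best practices

With this tutorial you have built a complete monitoring setup for the SiameseUniNLU service. You can now:

Monitor API performance metrics (QPS, latency, error rate) in real time
Spot performance bottlenecks and anomalies quickly
Get alerted when the service's state changes
Analyze trends to tune performance and resource allocation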

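Routine maintenance tips:

Review the dashboard once a week and watch for changes in the trends
Set sensible alert thresholds to avoid false alarms
Back up Prometheus data regularly
Monitor the server's basic resources (CPU, memory, disk)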

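Troubleshooting workflow:

Check the Grafana dashboard to confirm the symptoms
Check the service log: tail -f server.log
Verify that the model service is running: ps aux | grep app.py
Check resource usage: top, free -h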
