news 2026/3/29 17:08:02

CPU或内存飙升,如何快速定位MySQL问题?实战排查指南

作者头像

张小明

前端开发工程师

1.2k 24
文章封面图
CPU或内存飙升,如何快速定位MySQL问题?实战排查指南

前言

MySQL的CPU或内存突然飙升是生产环境中最紧急的故障之一,可能导致服务雪崩。本文提供一套从症状到根因的完整排查流程,涵盖实时监控、问题定位、紧急处理、根因分析四个阶段,帮助你快速响应并解决MySQL性能危机。

一、紧急响应:30秒内确定问题类型

  1. 快速检查系统负载
    bash

检查整体负载(立即执行)

top -c

或使用更清晰的htop

htop

查看MySQL进程资源占用

pidstat -p $(pgrep mysqld) 1 5

检查内存使用情况

free -h

检查SWAP使用

swapon --show
2. 判断问题类型:CPU vs 内存
bash

如果是CPU问题(top显示CPU%持续>80%):

特征:系统响应慢,查询超时增多

如果是内存问题(free显示可用内存极低):

特征:可能出现OOM,大量磁盘IO

  1. 快速缓解措施
    bash

临时限制MySQL资源(如有多个实例)

systemctl set-property mysqld.service CPUShares=512 MemoryLimit=4G

紧急重启(最后手段,按顺序执行)

sudo systemctl stop mysql

等待30秒

sudo systemctl start mysql

二、CPU飙升排查:定位高消耗SQL

  1. 实时查看活跃连接
    sql
    – 查看当前所有连接和状态
    SHOW FULL PROCESSLIST;
    – 或更详细的查询
    SELECT
    id,
    USER,
    HOST,
    DB,
    COMMAND,
    TIME,
    STATE,
    LEFT(INFO, 100) AS query_preview,
    ROWS_SENT,
    ROWS_EXAMINED
    FROM information_schema.PROCESSLIST
    WHERE COMMAND != ‘Sleep’
    AND INFO IS NOT NULL
    ORDER BY TIME DESC
    LIMIT 20;
  2. 定位消耗CPU的线程
    bash

Linux层面查看MySQL线程

top -H -p $(pgrep mysqld)

将线程ID转换为十六进制(用于MySQL内部查询)

printf “%x\n” 12345

输出:3039

sql
– 在MySQL中查看线程对应操作
SELECT
THREAD_ID,
PROCESSLIST_ID,
NAME AS thread_name,
PROCESSLIST_INFO AS current_query,
PROCESSLIST_TIME AS query_time,
PROCESSLIST_STATE AS state
FROM performance_schema.threads
WHERE THREAD_OS_ID = 12345; – 替换为实际线程ID
3. 识别问题SQL特征
sql
– 查看当前运行时间最长的查询
SELECT
ps.THREAD_ID,
esh.EVENT_NAME,
esh.SQL_TEXT,
esh.TIMER_WAIT/1000000000 as seconds,
esh.LOCK_TIME/1000000000 as lock_seconds,
esh.ROWS_EXAMINED,
esh.ROWS_AFFECTED
FROM performance_schema.events_statements_history esh
JOIN performance_schema.threads ps ON esh.THREAD_ID = ps.THREAD_ID
WHERE ps.PROCESSLIST_ID IS NOT NULL
ORDER BY esh.TIMER_WAIT DESC
LIMIT 10;
4. 检查锁争用情况
sql
– 查看锁等待
SELECT
r.trx_id AS waiting_trx_id,
r.trx_mysql_thread_id AS waiting_thread,
r.trx_query AS waiting_query,
b.trx_id AS blocking_trx_id,
b.trx_mysql_thread_id AS blocking_thread,
b.trx_query AS blocking_query,
TIMESTAMPDIFF(SECOND, r.trx_wait_started, NOW()) AS wait_seconds
FROM information_schema.INNODB_LOCK_WAITS w
JOIN information_schema.INNODB_TRX b ON b.trx_id = w.blocking_trx_id
JOIN information_schema.INNODB_TRX r ON r.trx_id = w.requesting_trx_id;

三、内存飙升排查:识别内存泄漏点

  1. 查看MySQL内存分配
    sql
    – 查看全局内存分配
    SELECT
    SUBSTRING_INDEX(event_name,‘/’,2) AS code_area,
    SUM(current_alloc) AS current_alloc
    FROM sys.memory_global_by_current_bytes
    GROUP BY SUBSTRING_INDEX(event_name,‘/’,2)
    ORDER BY SUM(current_alloc) DESC;

– 查看具体组件内存使用
SELECT * FROM sys.memory_global_by_current_bytes
WHERE current_count > 0
ORDER BY current_alloc DESC
LIMIT 15;
2. 检查缓冲区使用情况
sql
– 查看InnoDB缓冲池状态
SHOW ENGINE INNODB STATUS\G
– 重点关注BUFFER POOL AND MEMORY部分

– 查看缓冲池命中率
SELECT
(1 - (Variable_value / (SELECT Variable_value
FROM information_schema.global_status
WHERE Variable_name = ‘Innodb_buffer_pool_read_requests’))) * 100 AS hit_rate
FROM information_schema.global_status
WHERE Variable_name = ‘Innodb_buffer_pool_reads’;

– 查看排序和临时表内存使用
SHOW GLOBAL STATUS LIKE ‘Sort%’;
SHOW GLOBAL STATUS LIKE ‘%tmp%table%’;
3. 识别内存泄漏会话
sql
– 查看每个连接的内存使用
SELECT
thread_id,
processlist_id,
processlist_user,
processlist_host,
memory_current_allocated,
memory_total_allocated
FROM sys.session
WHERE memory_current_allocated > 10010241024 – 大于100MB
ORDER BY memory_current_allocated DESC;

– 查看内存事件
SELECT
event_name,
current_alloc,
high_alloc
FROM sys.memory_by_thread_by_current_bytes
ORDER BY current_alloc DESC
LIMIT 10;

四、深度分析:使用性能Schema

  1. 开启性能监控
    sql
    – 确保性能Schema已开启
    SHOW VARIABLES LIKE ‘performance_schema’;

– 开启所有监控(如果未开启)
UPDATE performance_schema.setup_instruments
SET ENABLED = ‘YES’, TIMED = ‘YES’;

UPDATE performance_schema.setup_consumers
SET ENABLED = ‘YES’;
2. 分析SQL执行统计
sql
– 查看消耗资源最多的SQL
SELECT
DIGEST_TEXT AS query,
COUNT_STAR AS exec_count,
SUM_TIMER_WAIT/1000000000 AS total_time_sec,
AVG_TIMER_WAIT/1000000000 AS avg_time_sec,
SUM_ROWS_EXAMINED AS rows_examined,
SUM_ROWS_SENT AS rows_sent,
SUM_SORT_ROWS AS sort_rows,
SUM_CREATED_TMP_TABLES AS tmp_tables,
FIRST_SEEN,
LAST_SEEN
FROM performance_schema.events_statements_summary_by_digest
WHERE DIGEST_TEXT IS NOT NULL
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 10;
3. 检查表扫描情况
sql
– 查看全表扫描的表
SELECT
object_schema,
object_name,
count_read,
count_write,
count_fetch,
count_insert,
count_update,
count_delete
FROM performance_schema.table_io_waits_summary_by_table
ORDER BY count_read DESC
LIMIT 10;

五、紧急处理脚本集

  1. 快速诊断脚本
    bash
    #!/bin/bash

mysql_emergency_check.sh

echo “========== 系统状态 ==========”
top -bn1 | grep -E “^(%Cpu|Tasks|Mem|Swap)”

echo -e “\n========== MySQL进程状态 ==========”
ps aux | grep mysqld | grep -v grep

echo -e “\n========== MySQL连接数 ==========”
mysql -e “SHOW GLOBAL STATUS LIKE ‘Threads_connected’;”
mysql -e “SHOW GLOBAL STATUS LIKE ‘Max_used_connections’;”

echo -e “\n========== 运行时间最长的查询 ==========”
mysql -e "
SELECT
ID,
USER,
HOST,
DB,
COMMAND,
TIME,
STATE,
LEFT(INFO, 100) AS query,
ROWS_SENT,
ROWS_EXAMINED
FROM information_schema.PROCESSLIST
WHERE COMMAND != ‘Sleep’
AND TIME > 60
ORDER BY TIME DESC
LIMIT 5\G"

echo -e “\n========== 锁等待 ==========”
mysql -e “SHOW ENGINE INNODB STATUS\G” | grep -A 20 “LATEST DETECTED DEADLOCK”
2. 自动终止问题查询
bash
#!/bin/bash

kill_long_queries.sh

TIMEOUT=300 # 5分钟

终止运行超过指定时间的查询

mysql -e "
SELECT CONCAT('KILL ', ID, ‘;’) AS kill_command
FROM information_schema.PROCESSLIST
WHERE COMMAND != ‘Sleep’
AND TIME > $TIMEOUT
AND USER NOT IN (‘system user’, ‘event_scheduler’)
AND INFO NOT LIKE ‘%PROCESSLIST%’
" | grep ‘KILL’ | mysql
3. 内存泄漏监控脚本
bash
#!/bin/bash

monitor_mysql_memory.sh

LOG_FILE=“/var/log/mysql_memory.log”
THRESHOLD=80 # 内存使用率阈值%

while true; do
# 获取MySQL内存使用百分比
MEM_USAGE=$(ps -o %mem= -p $(pgrep mysqld) | awk ‘{sum+=$1} END {print sum}’)

if (( $(echo "$MEM_USAGE > $THRESHOLD" | bc -l) )); then echo "$(date): MySQL内存使用率 ${MEM_USAGE}% 超过阈值" >> $LOG_FILE # 捕获内存快照 mysql -e "SHOW ENGINE INNODB STATUS\G" >> $LOG_FILE mysql -e "SELECT * FROM sys.memory_global_by_current_bytes LIMIT 10;" >> $LOG_FILE # 发送告警 echo "MySQL内存告警: ${MEM_USAGE}%" | mail -s "MySQL内存异常" admin@example.com fi sleep 60 # 每分钟检查一次

done

六、根因分析与解决方案

  1. CPU飙升常见原因及解决
    原因 特征 解决方案
    全表扫描 ROWS_EXAMINED远大于ROWS_SENT 添加合适索引,优化查询条件
    排序操作 大量filesort,Sort_merge_passes增加 优化ORDER BY,增加sort_buffer_size
    子查询优化 使用DEPENDENT SUBQUERY 改写为JOIN或使用EXISTS
    锁争用 大量锁等待,Lock wait timeout 优化事务,减少锁持有时间
    并发连接高 Threads_connected接近max_connections 增加连接池,优化连接管理
    优化示例:

sql
– 从低效查询
SELECT * FROM orders WHERE DATE(create_time) = ‘2023-10-01’;

– 优化为
SELECT * FROM orders WHERE create_time >= ‘2023-10-01’ AND create_time < ‘2023-10-02’;
– 并在create_time上建立索引
2. 内存飙升常见原因及解决
原因 特征 解决方案
连接过多 每个连接占用大量内存 减少空闲连接,优化连接池
缓冲池过大 缓冲池占用物理内存80%以上 适当减小innodb_buffer_pool_size
临时表过多 Created_tmp_disk_tables高 优化查询,增加tmp_table_size
内存泄漏 内存持续增长不释放 检查bug,升级MySQL版本
排序缓冲区大 sort_buffer_size设置过大 适当减小,监控实际使用
配置优化示例:

ini

my.cnf 优化配置

[mysqld]

缓冲池大小(物理内存的50-70%)

innodb_buffer_pool_size = 4G

连接相关内存

max_connections = 200
thread_cache_size = 50

临时表内存

tmp_table_size = 64M
max_heap_table_size = 64M

排序缓冲区

sort_buffer_size = 2M
join_buffer_size = 2M

查询缓存(MySQL 8.0已移除,5.7可考虑禁用)

query_cache_size = 0

七、预防与监控体系建设

  1. 建立监控仪表盘
    bash

使用Prometheus + Grafana监控

部署mysqld_exporter

docker run -d
–name mysqld-exporter
-p 9104:9104
-e DATA_SOURCE_NAME=“user:password@(hostname:3306)/”
prom/mysqld-exporter

Grafana仪表盘导入ID:7362(MySQL Overview)

  1. 关键监控指标告警
    yaml

Prometheus告警规则示例

groups:

  • name: mysql_alerts
    rules:
    • alert: HighCPUUsage
      expr: rate(process_cpu_seconds_total{job=“mysql”}[5m]) * 100 > 80
      for: 5m
      labels:
      severity: warning
      annotations:
      summary: “MySQL CPU使用率过高”

    • alert: HighMemoryUsage
      expr: process_resident_memory_bytes{job=“mysql”} / node_memory_MemTotal_bytes * 100 > 85
      for: 5m
      labels:
      severity: critical
      annotations:
      summary: “MySQL内存使用率过高”

  1. 定期健康检查
    sql
    – 每日健康检查SQL
    SELECT
    – 连接状态
    (SELECT VARIABLE_VALUE FROM performance_schema.global_status
    WHERE VARIABLE_NAME = ‘Threads_connected’) AS current_connections,
    (SELECT VARIABLE_VALUE FROM performance_schema.global_variables
    WHERE VARIABLE_NAME = ‘max_connections’) AS max_connections,

    – 缓冲池命中率
    ROUND((1 - (
    SELECT VARIABLE_VALUE FROM performance_schema.global_status
    WHERE VARIABLE_NAME = ‘Innodb_buffer_pool_reads’
    ) / (
    SELECT VARIABLE_VALUE FROM performance_schema.global_status
    WHERE VARIABLE_NAME = ‘Innodb_buffer_pool_read_requests’
    )) * 100, 2) AS buffer_pool_hit_rate,

    – 临时表使用
    (SELECT VARIABLE_VALUE FROM performance_schema.global_status
    WHERE VARIABLE_NAME = ‘Created_tmp_disk_tables’) AS disk_temp_tables,

    – 慢查询数量
    (SELECT VARIABLE_VALUE FROM performance_schema.global_status
    WHERE VARIABLE_NAME = ‘Slow_queries’) AS slow_queries_today;

  2. 自动化优化建议
    sql
    – 使用MySQLTuner或pt-variable-advisor

下载并运行MySQLTuner

wget https://raw.githubusercontent.com/major/MySQLTuner-perl/master/mysqltuner.pl
perl mysqltuner.pl --host 127.0.0.1 --user root --pass ‘yourpassword’

或使用Percona工具

pt-variable-advisor localhost --user root --password ‘yourpassword’
八、排查流程总结
快速排查决策树
text
CPU/内存飙升

  1. top/htop确认MySQL是罪魁祸首
  2. SHOW PROCESSLIST查看活跃查询
    ├── 有大量慢查询 → 优化SQL/添加索引
    ├── 大量锁等待 → 分析锁争用,优化事务
    └── 连接数爆满 → 调整连接池配置
  3. 检查内存分配
    ├── 缓冲池过大 → 调整innodb_buffer_pool_size
    ├── 临时表过多 → 优化查询,调整tmp_table_size
    └── 内存泄漏 → 升级版本,检查bug
  4. 分析性能Schema数据
  5. 实施优化措施并监控效果
    必备命令速查表
    bash

系统层面

top -c # 整体资源监控
iotop # IO监控
vmstat 1 # 虚拟内存统计

MySQL层面

mysqladmin processlist # 查看进程列表
mysqladmin status # 查看状态摘要
mysql -e “SHOW ENGINE INNODB STATUS\G” # InnoDB详细信息

性能分析

pt-query-digest slow.log # 分析慢查询日志
mysqldumpslow -s t slow.log # 慢查询排序
九、高级工具推荐

  1. 实时监控工具
    bash

Percona Monitoring and Management (PMM)

docker run -d
–name pmm-server
-p 80:80
-p 443:443
-v /srv/pmm-data:/srv
percona/pmm-server:latest
2. SQL审核工具
bash

SOAR - SQL Optimizer And Rewriter

git clone https://github.com/XiaoMi/soar.git
cd soar
./soar -query “你的SQL” -report-type html
3. 压力测试工具
bash

sysbench基准测试

sysbench oltp_read_write
–mysql-host=127.0.0.1
–mysql-port=3306
–mysql-user=root
–mysql-password=xxx
–table-size=1000000
–threads=32
–time=300
run
问题解决验证清单
✅ CPU问题解决验证:

CPU使用率降至正常水平(<60%)

查询响应时间恢复正常

慢查询数量显著减少

锁等待时间降低

✅ 内存问题解决验证:

内存使用率稳定在合理范围

SWAP使用率为0或极低

缓冲池命中率>99%

临时表磁盘使用显著减少

✅ 预防措施就绪:

监控告警已配置

慢查询日志已开启并定期分析

定期健康检查脚本已部署

备份和恢复方案已验证

黄金法则:

当MySQL出现资源飙升时,保持冷静,按流程排查。先应急恢复服务,再分析根因,最后实施长效预防措施。记住:预防永远比救火更重要。建立一个完善的监控体系,是避免此类问题的最佳实践。

版权声明: 本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若内容造成侵权/违法违规/事实不符,请联系邮箱:809451989@qq.com进行投诉反馈,一经查实,立即删除!
网站建设 2026/3/20 21:17:04

C#每日面试题-ref和out的区别

C#每日面试题-ref和out的区别 大家好&#xff0c;我是专注于C#面试干货分享的博主&#xff0c;今天咱们拆解另一道高频基础面试题——ref和out关键字的区别。这两个关键字都是C#中用于“按引用传递参数”的核心语法&#xff0c;看似功能相似&#xff0c;很多新手甚至资深开发者…

作者头像 李华
网站建设 2026/3/27 14:20:58

别再瞎找了!千笔,抢手爆款的AI论文软件

你是否曾为论文选题发愁&#xff0c;绞尽脑汁却无从下手&#xff1f;是否在深夜面对空白文档&#xff0c;思绪枯竭、无从落笔&#xff1f;又或者反复修改却始终不满意表达效果&#xff1f;论文写作的每一步都充满挑战&#xff0c;而这些难题&#xff0c;正在被千笔AI一一化解。…

作者头像 李华
网站建设 2026/3/27 15:51:14

【电力系统】基于极限学习机的DC-DC转换器建模附matlab代码

✅作者简介&#xff1a;热爱科研的Matlab仿真开发者&#xff0c;擅长毕业设计辅导、数学建模、数据处理、建模仿真、程序设计、完整代码获取、论文复现及科研仿真。&#x1f34e; 往期回顾关注个人主页&#xff1a;Matlab科研工作室&#x1f447; 关注我领取海量matlab电子书和…

作者头像 李华
网站建设 2026/3/19 3:57:45

Power BI Report Server 2026 v15.0.1120.113

Power BI 报表服务器是面向当前用户的本地报表解决方案&#xff0c;并可灵活迁移到云端。它包含在 Power BI Premium 中&#xff0c;因此您可以根据自身需求随时迁移到云端。Power BI 报表服务器是一款本地部署的报表解决方案&#xff0c;可满足您当前的报表需求&#xff0c;并…

作者头像 李华
网站建设 2026/3/28 19:52:56

模型剪枝大白话讲解:结构化 vs 非结构化

模型剪枝大白话讲解&#xff1a;结构化vs非结构化 先给核心结论&#xff1a;模型剪枝就是给训练好的模型“减肥”&#xff0c;删掉里面没用的部分&#xff0c;让模型变轻、计算变少&#xff1b;而结构化和非结构化剪枝的核心区别&#xff0c;就在于 “怎么剪”&#xff08;剪的…

作者头像 李华