news 2026/3/14 16:34:41

HG_REPMGR autofailover: automatic failover


Author: 张小明, front-end development engineer


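Contents

  • Purpose
  • Details

Purpose

A reference for configuring automatic failover in HG_REPMGR.

Details

To configure automatic failover for a cluster, the repmgrd daemon must be running on every node in the cluster. When the primary node fails, a suitable standby node is automatically promoted to be the new primary and continues to serve clients. An example follows.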

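  1. Configure postgresql.replication.conf (all nodes)

Building on the postgresql.replication.conf described earlier, add the following parameter:

shared_preload_libraries='repmgr'

or, from a SQL session (this example also keeps pg_pathman and timescaledb in the preload list):

alter system set shared_preload_libraries = pg_pathman, timescaledb, repmgr;

Restart the database:

pg_ctl restart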
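
Before restarting, it can be worth confirming that repmgr really is in the preload list. The following is a minimal sketch, not part of HG_REPMGR: the helper name `preload_has` is hypothetical, and it only inspects a setting value passed in as a string (for example, the text after the `=` sign in the configuration file).

```python
def preload_has(value, lib="repmgr"):
    """Return True if `lib` appears in a shared_preload_libraries value.

    `value` is the raw setting text, with or without surrounding
    single quotes, e.g. "'pg_pathman,timescaledb,repmgr'".
    """
    entries = value.strip().strip("'").split(",")
    # Entries may carry spaces or double quotes; drop both before comparing.
    names = [entry.strip().strip('"') for entry in entries]
    return lib in names


print(preload_has("'repmgr'"))
print(preload_has("pg_pathman, timescaledb, repmgr"))
print(preload_has("'pg_pathman,timescaledb'"))
```

The first two calls report that repmgr is present; the third does not, which is the case in which repmgrd would later fail to find the library.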
  2. Configure hg_repmgr.conf (all nodes)

Add the following parameters to the existing hg_repmgr.conf file:

failover=automatic
promote_command='repmgr -f /opt/highgo/5.6.1/conf/hg_repmgr.conf standby promote'
follow_command='repmgr -f /opt/highgo/5.6.1/conf/hg_repmgr.conf standby follow --upstream-node-id=%n'

To send repmgr's log output to a fixed file, add the log_file parameter, for example:

log_file='/opt/highgo/5.6.1/conf/data/log/hg_repmgr.log'

To keep this log file from growing without bound, configure the system's logrotate for it (detailed steps omitted).
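
Since the same three parameters have to be present on every node, a small check can catch a node that was missed. The sketch below is an illustration only: `parse_conf` and `check_failover_settings` are hypothetical helpers, and the parser assumes the simple `key='value'` layout shown above rather than the full repmgr configuration syntax.

```python
REQUIRED = ("failover", "promote_command", "follow_command")


def parse_conf(text):
    """Parse simple key=value lines; lines starting with '#' are comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split only once: follow_command itself contains an '=' sign.
        key, value = line.split("=", 1)
        value = value.strip()
        # Drop one pair of surrounding single quotes, as used in the article.
        if len(value) >= 2 and value[0] == value[-1] == "'":
            value = value[1:-1]
        settings[key.strip()] = value
    return settings


def check_failover_settings(settings):
    """Return a list of problems; an empty list means the checks passed."""
    problems = [f"missing parameter: {k}" for k in REQUIRED if k not in settings]
    if settings.get("failover") not in (None, "automatic"):
        problems.append("failover is not 'automatic'")
    follow = settings.get("follow_command", "")
    if follow and "%n" not in follow:
        problems.append("follow_command lacks the %n placeholder")
    return problems


SAMPLE = """\
failover=automatic
promote_command='repmgr -f /opt/highgo/5.6.1/conf/hg_repmgr.conf standby promote'
follow_command='repmgr -f /opt/highgo/5.6.1/conf/hg_repmgr.conf standby follow --upstream-node-id=%n'
"""

print(check_failover_settings(parse_conf(SAMPLE)))
```

In practice the text would be read from each node's hg_repmgr.conf instead of the inline `SAMPLE` string.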

  3. Start the repmgrd process (all nodes)

repmgrd -f /opt/highgo/5.6.1/conf/hg_repmgr.conf -d -p /tmp/hg_repmgrd.pid

[highgo@dbrs conf]$ repmgrd -d -p /tmp/hg_repmgrd.pid
[2019-05-06 14:02:42] [NOTICE] repmgrd (repmgrd 4.2) starting up
[2019-05-06 14:02:42] [INFO] connecting to database ""
[2019-05-06 14:02:43] [ERROR] repmgr extension not found on this node
[2019-05-06 14:02:43] [DETAIL] repmgr extension is available but not installed in database "highgo"
[2019-05-06 14:02:43] [HINT] check that this node is part of a repmgr cluster
[highgo@dbrs conf]$

highgo=# \c
You are now connected to database "highgo" as user "highgo".
create extension repmgr;

[highgo@dbrs conf]$ repmgrd -f /opt/highgo/5.6.1/conf/hg_repmgr.conf -d -p /tmp/hg_repmgrd.pid
[2019-05-06 14:21:21] [NOTICE] repmgrd (repmgrd 4.2) starting up
[2019-05-06 14:21:21] [INFO] connecting to database "host=dbrs user=hgrepmgr dbname=hgrepmgr connect_timeout=2"
[highgo@dbrs conf]$ INFO: set_repmgrd_pid(): provided pidfile is /tmp/hg_repmgrd.pid
[2019-05-06 14:21:21] [NOTICE] starting monitoring of node "dbrs" (ID: 1)
[2019-05-06 14:21:21] [NOTICE] monitoring cluster primary "dbrs" (node ID: 1)

[highgo@dbrs2 conf]$ repmgrd -f /opt/highgo/5.6.1/conf/hg_repmgr.conf -d -p /tmp/hg_repmgrd.pid
[2019-05-06 14:21:50] [NOTICE] repmgrd (repmgrd 4.2) starting up
[2019-05-06 14:21:50] [INFO] connecting to database "host=dbrs2 user=hgrepmgr dbname=hgrepmgr connect_timeout=2"
[highgo@dbrs2 conf]$ INFO: set_repmgrd_pid(): provided pidfile is /tmp/hg_repmgrd.pid
[2019-05-06 14:21:50] [NOTICE] starting monitoring of node "dbrs2" (ID: 2)
[2019-05-06 14:21:50] [INFO] monitoring connection to upstream node "dbrs" (node ID: 1)

[highgo@dbrs conf]$ ls -atl /tmp/hg_repmgrd.pid
-rw-rw-r--. 1 highgo highgo 5 May 6 14:21 /tmp/hg_repmgrd.pid
[highgo@dbrs conf]$

[highgo@dbrs2 conf]$ ls -atl /tmp/hg_repmgrd.pid
-rw-rw-r--. 1 highgo highgo 5 May 6 14:21 /tmp/hg_repmgrd.pid
[highgo@dbrs2 conf]$

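Note: does this background process have to be started manually every time the server is rebooted?

Developer reply: for now, yes. A later release will start it automatically.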
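
Because repmgrd currently has to be started by hand after a reboot, it helps to be able to tell quickly whether it is running. The sketch below reads the pidfile passed with -p and probes that PID. It is an assumption-laden illustration: `repmgrd_running` is a hypothetical helper, it relies on POSIX signal semantics, and it only confirms that some process with that PID exists, not that the process is repmgrd.

```python
import os


def repmgrd_running(pidfile="/tmp/hg_repmgrd.pid"):
    """Return True if the PID stored in `pidfile` belongs to a live process."""
    try:
        with open(pidfile) as fh:
            pid = int(fh.read().strip())
    except (OSError, ValueError):
        # Missing, unreadable, or malformed pidfile: treat as not running.
        return False
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 performs the existence check only
    except ProcessLookupError:
        return False  # stale pidfile, e.g. left over from before a reboot
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


if __name__ == "__main__":
    print("repmgrd running:", repmgrd_running())
```

A stale pidfile left in /tmp after a reboot is exactly the case the `ProcessLookupError` branch is meant to cover.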

Check the cluster status

[highgo@dbrs conf]$ repmgr -f /opt/highgo/5.6.1/conf/hg_repmgr.conf cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Connection string
----+-------+---------+-----------+----------+----------+------------------------------------------------------------
 1  | dbrs  | primary | * running |          | default  | host=dbrs user=hgrepmgr dbname=hgrepmgr connect_timeout=2
 2  | dbrs2 | standby |   running | dbrs     | default  | host=dbrs2 user=hgrepmgr dbname=hgrepmgr connect_timeout=2
[highgo@dbrs conf]$
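
For scripted checks, the table printed by cluster show can be turned into records. This is a rough sketch that assumes the pipe-separated layout shown above; `parse_cluster_show` and `running_primaries` are hypothetical helpers, and the sample text is the status output from this article.

```python
def parse_cluster_show(output):
    """Turn the pipe-separated table into a list of dicts keyed by column name."""
    rows = []
    header = None
    for line in output.splitlines():
        # The prompt, the dashed separator and WARNING lines contain no '|'.
        if "|" not in line:
            continue
        cells = [cell.strip() for cell in line.split("|")]
        if header is None:
            header = cells
            continue
        rows.append(dict(zip(header, cells)))
    return rows


def running_primaries(rows):
    """Names of nodes reported as a running primary."""
    return [r["Name"] for r in rows
            if r["Role"] == "primary" and r["Status"] == "* running"]


SAMPLE = """\
 ID | Name  | Role    | Status    | Upstream | Location | Connection string
----+-------+---------+-----------+----------+----------+------------------------------------------------------------
 1  | dbrs  | primary | * running |          | default  | host=dbrs user=hgrepmgr dbname=hgrepmgr connect_timeout=2
 2  | dbrs2 | standby |   running | dbrs     | default  | host=dbrs2 user=hgrepmgr dbname=hgrepmgr connect_timeout=2
"""

rows = parse_cluster_show(SAMPLE)
print(running_primaries(rows))
```

A healthy cluster should yield exactly one name here; zero or more than one is worth investigating.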

Simulate a primary node failure

1) Shut down the database on node1

pg_ctl stop

2) Check the cluster status on node2

[highgo@dbrs2 conf]$ repmgr -f /opt/highgo/5.6.1/conf/hg_repmgr.conf cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Connection string
----+-------+---------+-----------+----------+----------+------------------------------------------------------------
 1  | dbrs  | primary | - failed  |          | default  | host=dbrs user=hgrepmgr dbname=hgrepmgr connect_timeout=2
 2  | dbrs2 | primary | * running |          | default  | host=dbrs2 user=hgrepmgr dbname=hgrepmgr connect_timeout=2

WARNING: following issues were detected
  - unable to connect to node "dbrs" (ID: 1)
[highgo@dbrs2 conf]$

At this point node2 has been promoted to primary.

Log

[highgo@dbrs2 conf]$
[2019-05-06 14:24:14] [WARNING] unable to connect to upstream node "dbrs" (node ID: 1)
[2019-05-06 14:24:14] [INFO] checking state of node 1, 1 of 6 attempts
[2019-05-06 14:24:14] [INFO] sleeping 10 seconds until next reconnection attempt
[2019-05-06 14:24:24] [INFO] checking state of node 1, 2 of 6 attempts
[2019-05-06 14:24:24] [INFO] sleeping 10 seconds until next reconnection attempt
[2019-05-06 14:24:34] [INFO] checking state of node 1, 3 of 6 attempts
[2019-05-06 14:24:34] [INFO] sleeping 10 seconds until next reconnection attempt
[2019-05-06 14:24:44] [INFO] checking state of node 1, 4 of 6 attempts
[2019-05-06 14:24:44] [INFO] sleeping 10 seconds until next reconnection attempt
[2019-05-06 14:24:54] [INFO] checking state of node 1, 5 of 6 attempts
[2019-05-06 14:24:54] [INFO] sleeping 10 seconds until next reconnection attempt
[highgo@dbrs2 conf]$
[2019-05-06 14:25:04] [INFO] checking state of node 1, 6 of 6 attempts
[2019-05-06 14:25:04] [WARNING] unable to reconnect to node 1 after 6 attempts
[2019-05-06 14:25:04] [NOTICE] this node is the only available candidate and will now promote itself
[2019-05-06 14:25:04] [INFO] promote_command is: "repmgr -f /opt/highgo/5.6.1/conf/hg_repmgr.conf standby promote"
NOTICE: promoting standby to primary
DETAIL: promoting server "dbrs2" (ID: 2) using "/opt/highgo/5.6.1/bin/pg_ctl -w -D '/opt/highgo/5.6.1/data' promote"
DETAIL: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
NOTICE: STANDBY PROMOTE successful
DETAIL: server "dbrs2" (ID: 2) was successfully promoted to primary
[2019-05-06 14:25:10] [INFO] switching to primary monitoring mode
[2019-05-06 14:25:10] [NOTICE] monitoring cluster primary "dbrs2" (node ID: 2)
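
The log shows six reconnection attempts spaced ten seconds apart before promotion begins. In upstream repmgr these values are governed by the reconnect_attempts and reconnect_interval settings; that HG_REPMGR uses the same parameter names is an assumption here. The sketch below only does the arithmetic on the timestamps taken from the log above, and `retry_window_seconds` is a hypothetical helper.

```python
from datetime import datetime


def retry_window_seconds(attempts, interval):
    """Seconds between the first and the last reconnection check."""
    return (attempts - 1) * interval


FMT = "%Y-%m-%d %H:%M:%S"
first_check = datetime.strptime("2019-05-06 14:24:14", FMT)  # attempt 1 of 6
last_check = datetime.strptime("2019-05-06 14:25:04", FMT)   # attempt 6 of 6
promoted = datetime.strptime("2019-05-06 14:25:10", FMT)     # primary monitoring starts

observed = (last_check - first_check).total_seconds()
total = (promoted - first_check).total_seconds()

print("expected retry window:", retry_window_seconds(6, 10), "s")
print("observed retry window:", observed, "s")
print("first failure to new primary:", total, "s")
```

In this run the retry window accounts for 50 of the 56 seconds between the first failed check and the standby monitoring itself as primary, so lowering the attempt count or interval shortens failover at the cost of reacting to brief network blips.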
  4. Once node1 has recovered from the failure, it can rejoin the cluster

[highgo@dbrs conf]$ repmgr -f /opt/highgo/5.6.1/conf/hg_repmgr.conf cluster show
 ID | Name  | Role    | Status               | Upstream | Location | Connection string
----+-------+---------+----------------------+----------+----------+------------------------------------------------------------
 1  | dbrs  | primary | * running            |          | default  | host=dbrs user=hgrepmgr dbname=hgrepmgr connect_timeout=2
 2  | dbrs2 | standby | ! running as primary | dbrs     | default  | host=dbrs2 user=hgrepmgr dbname=hgrepmgr connect_timeout=2

1) Rejoin the cluster (run this on the failed node; host points to the new primary, and the node rejoins as a standby. The underlying mechanism is pg_rewind.)

repmgr -f /opt/highgo/5.6.1/conf/hg_repmgr.conf node rejoin -d 'host=dbrs2 dbname=hgrepmgr user=hgrepmgr' --force-rewind --verbose

Note: shut down HGDB on node1 before running this command.

[highgo@dbrs conf]$ repmgr -f /opt/highgo/5.6.1/conf/hg_repmgr.conf node rejoin -d 'host=dbrs2 dbname=hgrepmgr user=hgrepmgr' --force-rewind --verbose
NOTICE: using provided configuration file "/opt/highgo/5.6.1/conf/hg_repmgr.conf"
INFO: prerequisites for using pg_rewind are met
INFO: 0 files copied to "/tmp/repmgr-config-archive-dbrs"
NOTICE: executing pg_rewind
NOTICE: 0 files copied to /opt/highgo/5.6.1/data
INFO: directory "/tmp/repmgr-config-archive-dbrs" deleted
INFO: deleting "recovery.done"
NOTICE: setting node 1's primary to node 2
NOTICE: starting server using "/opt/highgo/5.6.1/bin/pg_ctl -w -D '/opt/highgo/5.6.1/data' start"
INFO: demoted primary is pingable
INFO: node 1 has attached to its upstream node
NOTICE: NODE REJOIN successful
DETAIL: node 1 is now attached to node 2
[highgo@dbrs conf]$
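
When the rejoin is driven from a script, the connection string passed to -d has to reach repmgr as a single argument even though it contains spaces. The sketch below builds the argument list and prints the equivalent shell command line; `rejoin_command` is a hypothetical helper, the paths and connection string are the ones used in this article, and `shlex.join` requires Python 3.8 or later.

```python
import shlex


def rejoin_command(conf, primary_conninfo, force_rewind=True, verbose=True):
    """Build the argv list for `repmgr node rejoin` as used in this article."""
    argv = ["repmgr", "-f", conf, "node", "rejoin", "-d", primary_conninfo]
    if force_rewind:
        argv.append("--force-rewind")
    if verbose:
        argv.append("--verbose")
    return argv


argv = rejoin_command(
    "/opt/highgo/5.6.1/conf/hg_repmgr.conf",
    "host=dbrs2 dbname=hgrepmgr user=hgrepmgr",
)

# shlex.join quotes the conninfo so it can be pasted into a shell safely.
print(shlex.join(argv))
```

Passing the list itself to `subprocess.run(argv)` avoids shell quoting altogether; the printed form is mainly useful for logging what was run.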

2) Check the cluster status with repmgr cluster show

[highgo@dbrs conf]$ repmgr -f /opt/highgo/5.6.1/conf/hg_repmgr.conf cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Connection string
----+-------+---------+-----------+----------+----------+------------------------------------------------------------
 1  | dbrs  | standby |   running | dbrs2    | default  | host=dbrs user=hgrepmgr dbname=hgrepmgr connect_timeout=2
 2  | dbrs2 | primary | * running |          | default  | host=dbrs2 user=hgrepmgr dbname=hgrepmgr connect_timeout=2
[highgo@dbrs conf]$