1. switchover正常主从切换
1.1 正常切换
官网:https://repmgr.org/docs/current/repmgr-standby-switchover.html
repmgr -f /postgresql/pg13/repmgr.conf cluster show
repmgr -f /postgresql/pg13/repmgr.conf standby switchover --siblings-follow --dry-run --force-rewind
repmgr -f /postgresql/pg13/repmgr.conf standby switchover --siblings-follow --force-rewind
repmgr -f /postgresql/pg13/repmgr.conf cluster show
注意:切换操作只能在standby 库上执行
-- 可以debug打印详细的切换过程
repmgr -f /postgresql/pg13/repmgr.conf standby switchover --siblings-follow --force-rewind --log-level DEBUG --verbose
案例1:
[pgsql@pg1 ~]$ repmgr -f /postgresql/pg13/repmgr.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+-------------------------------------------------------------------------------
1 | pg1 | primary | * running | | default | 100 | 1 | host=172.72.10.2 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
2 | pg2 | standby | running | pg1 | default | 100 | 1 | host=172.72.10.3 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
3 | pg3 | standby | running | pg1 | default | 100 | 1 | host=172.72.10.5 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
6 | wintess | witness | * running | pg1 | default | 0 | n/a | host=172.72.10.7 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
[pgsql@pg2 ~]$ repmgr -f /postgresql/pg13/repmgr.conf standby switchover --siblings-follow --dry-run --force-rewind
NOTICE: checking switchover on node "pg2" (ID: 2) in --dry-run mode
INFO: prerequisites for using pg_rewind are met
INFO: SSH connection to host "172.72.10.2" succeeded
INFO: able to execute "repmgr" on remote host "172.72.10.2"
INFO: all sibling nodes are reachable via SSH
INFO: 3 walsenders required, 10 available
INFO: demotion candidate is able to make replication connection to promotion candidate
INFO: 0 pending archive files
INFO: replication lag on this standby is 0 seconds
NOTICE: local node "pg2" (ID: 2) would be promoted to primary; current primary "pg1" (ID: 1) would be demoted to standby
INFO: following shutdown command would be run on node "pg1":
"/postgresql/pg13/bin/pg_ctl -D '/postgresql/pgdata' -W -m fast stop"
INFO: parameter "shutdown_check_timeout" is set to 60 seconds
INFO: prerequisites for executing STANDBY SWITCHOVER are met
[pgsql@pg2 ~]$ repmgr -f /postgresql/pg13/repmgr.conf standby switchover --siblings-follow --force-rewind
NOTICE: executing switchover on node "pg2" (ID: 2)
NOTICE: local node "pg2" (ID: 2) will be promoted to primary; current primary "pg1" (ID: 1) will be demoted to standby
NOTICE: stopping current primary node "pg1" (ID: 1)
NOTICE: issuing CHECKPOINT on node "pg1" (ID: 1)
DETAIL: executing server command "/postgresql/pg13/bin/pg_ctl -D '/postgresql/pgdata' -W -m fast stop"
INFO: checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")
INFO: checking for primary shutdown; 2 of 60 attempts ("shutdown_check_timeout")
NOTICE: current primary has been cleanly shut down at location 0/F000028
NOTICE: promoting standby to primary
DETAIL: promoting server "pg2" (ID: 2) using pg_promote()
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
NOTICE: STANDBY PROMOTE successful
DETAIL: server "pg2" (ID: 2) was successfully promoted to primary
NOTICE: issuing CHECKPOINT on node "pg2" (ID: 2)
ERROR: unable to execute CHECKPOINT
INFO: local node 1 can attach to rejoin target node 2
DETAIL: local node's recovery point: 0/F000028; rejoin target node's fork point: 0/F0000A0
NOTICE: setting node 1's upstream to node 2
WARNING: unable to ping "host=172.72.10.2 user=repmgr password=repmgr dbname=repmgr connect_timeout=2"
DETAIL: PQping() returned "PQPING_NO_RESPONSE"
NOTICE: starting server using "/postgresql/pg13/bin/pg_ctl -w -D '/postgresql/pgdata' start"
NOTICE: NODE REJOIN successful
DETAIL: node 1 is now attached to node 2
NOTICE: node "pg2" (ID: 2) promoted to primary, node "pg1" (ID: 1) demoted to standby
NOTICE: executing STANDBY FOLLOW on 2 of 2 siblings
INFO: STANDBY FOLLOW successfully executed on all reachable sibling nodes
NOTICE: switchover was successful
DETAIL: node "pg2" is now primary and node "pg1" is attached as standby
NOTICE: STANDBY SWITCHOVER has completed successfully
[pgsql@pg1 ~]$ repmgr -f /postgresql/pg13/repmgr.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+-------------------------------------------------------------------------------
1 | pg1 | standby | running | pg2 | default | 100 | 1 | host=172.72.10.2 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
2 | pg2 | primary | * running | | default | 100 | 2 | host=172.72.10.3 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
3 | pg3 | standby | running | pg2 | default | 100 | 2 | host=172.72.10.5 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
6 | wintess | witness | * running | pg2 | default | 0 | n/a | host=172.72.10.7 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
repmgr=# \x
Expanded display is on.
repmgr=# select * from nodes;
-[ RECORD 1 ]----+------------------------------------------------------------------------------
node_id | 2
upstream_node_id |
active | t
node_name | pg2
type | primary
location | default
priority | 100
conninfo | host=172.72.10.3 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
repluser | repmgr
slot_name |
config_file | /postgresql/pg13/repmgr.conf
-[ RECORD 2 ]----+------------------------------------------------------------------------------
node_id | 1
upstream_node_id | 2
active | t
node_name | pg1
type | standby
location | default
priority | 100
conninfo | host=172.72.10.2 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
repluser | repmgr
slot_name |
config_file | /postgresql/pg13/repmgr.conf
-[ RECORD 3 ]----+------------------------------------------------------------------------------
node_id | 3
upstream_node_id | 2
active | t
node_name | pg3
type | standby
location | default
priority | 100
conninfo | host=172.72.10.5 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
repluser | repmgr
slot_name |
config_file | /postgresql/pg13/repmgr.conf
-[ RECORD 4 ]----+------------------------------------------------------------------------------
node_id | 6
upstream_node_id | 2
active | t
node_name | wintess
type | witness
location | default
priority | 0
conninfo | host=172.72.10.7 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
repluser | repmgr
slot_name |
config_file | /postgresql/pg13/repmgr.conf
1.2 切换故障
以下所有操作,均在备库执行
PS:若切换失败,则需要重新克隆从库
1) 启动主库
pg_ctl start
2) 重新克隆从库
repmgr -h 172.72.10.2 -U repmgr -d repmgr -f /postgresql/pg13/repmgr.conf standby clone --force
pg_ctl start
3) 重新注册
repmgr -f /postgresql/pg13/repmgr.conf standby register --force
repmgr -f /postgresql/pg13/repmgr.conf cluster show
4) 把新备库切换为主库
repmgr -f /postgresql/pg13/repmgr.conf cluster show
repmgr -f /postgresql/pg13/repmgr.conf standby switchover --siblings-follow --dry-run --force-rewind
-- 可以debug打印详细的切换过程
repmgr -f /pg13/pg13/repmgr.conf standby switchover --siblings-follow --force-rewind --log-level DEBUG --verbose
repmgr -f /postgresql/pg13/repmgr.conf cluster show
2 failover异常手工主从切换
2.1 主库出现故障,然后直接将从库提升为主库
1)主库宕机
pg_ctl -m fast stop
2)查询集群状态
[pgsql@pg1 ~]$ repmgr -f /postgresql/pg13/repmgr.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+---------------+----------+----------+----------+----------+-------------------------------------------------------------------------------
1 | pg1 | standby | running | ? pg2 | default | 100 | 2 | host=172.72.10.2 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
2 | pg2 | primary | ? unreachable | ? | default | 100 | | host=172.72.10.3 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
3 | pg3 | standby | running | ? pg2 | default | 100 | 2 | host=172.72.10.5 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
6 | wintess | witness | * running | ? pg2 | default | 0 | n/a | host=172.72.10.7 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
WARNING: following issues were detected
- unable to connect to node "pg1" (ID: 1)'s upstream node "pg2" (ID: 2)
- unable to determine if node "pg1" (ID: 1) is attached to its upstream node "pg2" (ID: 2)
- unable to connect to node "pg2" (ID: 2)
- node "pg2" (ID: 2) is registered as an active primary but is unreachable
- unable to connect to node "pg3" (ID: 3)'s upstream node "pg2" (ID: 2)
- unable to determine if node "pg3" (ID: 3) is attached to its upstream node "pg2" (ID: 2)
- unable to connect to node "wintess" (ID: 6)'s upstream node "pg2" (ID: 2)
HINT: execute with --verbose option to see connection error messages
3) pg3 提升为主库(pg3执行)
repmgr -f /postgresql/pg13/repmgr.conf --siblings-follow standby promote
4)查看集群状态
[pgsql@pg1 ~]$ repmgr -f /postgresql/pg13/repmgr.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+-------------------------------------------------------------------------------
1 | pg1 | standby | running | pg3 | default | 100 | 2 | host=172.72.10.2 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
2 | pg2 | primary | - failed | ? | default | 100 | | host=172.72.10.3 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
3 | pg3 | primary | * running | | default | 100 | 3 | host=172.72.10.5 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
6 | wintess | witness | * running | pg3 | default | 0 | n/a | host=172.72.10.7 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
WARNING: following issues were detected
- unable to connect to node "pg2" (ID: 2)
HINT: execute with --verbose option to see connection error messages
PS:pg2损坏,pg3升为新主
5) 因pg2损坏,现需修复,作为从库加入集群
pg_ctl stop
repmgr -h 172.72.10.5 -U repmgr -d repmgr -f /postgresql/pg13/repmgr.conf standby clone --force
repmgr -f /postgresql/pg13/repmgr.conf node rejoin -d 'host=172.72.10.5 user=repmgr dbname=repmgr connect_timeout=2' --force-rewind --dry-run --verbose
repmgr -f /postgresql/pg13/repmgr.conf node rejoin -d 'host=172.72.10.5 user=repmgr dbname=repmgr connect_timeout=2' --force-rewind --verbose
pg_ctl status
pg_ctl: server is running (PID: 4356)
/postgresql/pg13/bin/postgres "-D" "/postgresql/pgdata"
6) 查看集群状态
[pgsql@pg1 ~]$ repmgr -f /postgresql/pg13/repmgr.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+-------------------------------------------------------------------------------
1 | pg1 | primary | * running | | default | 100 | 4 | host=172.72.10.2 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
2 | pg2 | standby | running | pg1 | default | 100 | 4 | host=172.72.10.3 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
3 | pg3 | standby | running | pg1 | default | 100 | 4 | host=172.72.10.5 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
6 | wintess | witness | * running | pg1 | default | 0 | n/a | host=172.72.10.7 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
2.2 配置自动failover
1、在所有节点都配置
echo "shared_preload_libraries='repmgr' " >> /postgresql/pgdata/postgresql.conf
pg_ctl stop
pg_ctl start
2、在所有节点都配置文件` /postgresql/pg13/repmgr.conf`,增加内容包括:
cat >> /postgresql/pg13/repmgr.conf << "EOF"
monitoring_history=yes
monitor_interval_secs=5
failover=automatic
reconnect_attempts=6
reconnect_interval=5
promote_command='repmgr standby promote -f /postgresql/pg13/repmgr.conf --log-to-file'
follow_command='repmgr standby follow -f /postgresql/pg13/repmgr.conf --log-to-file --upstream-node-id=%n'
log_level=INFO
log_status_interval=10
log_file=' /postgresql/pg13/repmgr.log'
EOF
cat >> /etc/logrotate.conf <<"EOF"
/postgresql/pg13/repmgr.log {
missingok
compress
rotate 30
daily
dateext
create 0600 pgsql pgsql
}
EOF
3、在所有节点都启动repmgrd进程
-- 启动
repmgrd -f /postgresql/pg13/repmgr.conf --pid-file /tmp/repmgrd.pid --daemonize
-- 建议加到开机自动启动:/etc/rc.local
echo "repmgrd -f /postgresql/pg13/repmgr.conf --pid-file /tmp/repmgrd.pid --daemonize" >> /etc/rc.local
chmod +x /etc/rc.d/rc.local
-- 停止
kill -9 `cat /tmp/repmgrd.pid`
3 .自动切换测试
1 查看集群状态
[pgsql@pg1 ~]$ repmgrd -f /postgresql/pg13/repmgr.conf --pid-file /tmp/repmgrd.pid --daemonize
[2023-10-15 23:12:58] [NOTICE] redirecting logging output to "/postgresql/pg13/repmgr.log"
[pgsql@pg1 ~]$ repmgr -f /postgresql/pg13/repmgr.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+-------------------------------------------------------------------------------
1 | pg1 | primary | * running | | default | 100 | 4 | host=172.72.10.2 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
2 | pg2 | standby | running | pg1 | default | 100 | 4 | host=172.72.10.3 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
3 | pg3 | standby | running | pg1 | default | 100 | 4 | host=172.72.10.5 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
6 | wintess | witness | * running | pg1 | default | 0 | n/a | host=172.72.10.7 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
2 停止主库
pg_ctl stop
3 查看集群状态
[pgsql@pg3 ~]$ repmgrd -f /postgresql/pg13/repmgr.conf --pid-file /tmp/repmgrd.pid --daemonize
[2023-10-15 23:12:58] [NOTICE] redirecting logging output to "/postgresql/pg13/repmgr.log"
[pgsql@pg3 ~]$ repmgr -f /postgresql/pg13/repmgr.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+-------------------------------------------------------------------------------
1 | pg1 | primary | - failed | ? | default | 100 | | host=172.72.10.2 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
2 | pg2 | primary | * running | | default | 100 | 5 | host=172.72.10.3 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
3 | pg3 | standby | running | pg2 | default | 100 | 4 | host=172.72.10.5 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
6 | wintess | witness | * running | pg2 | default | 0 | n/a | host=172.72.10.7 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
WARNING: following issues were detected
- unable to connect to node "pg1" (ID: 1)
HINT: execute with --verbose option to see connection error messages
4 修复原主库
repmgr -f /postgresql/pg13/repmgr.conf node rejoin -d 'host=172.72.10.3 user=repmgr dbname=repmgr connect_timeout=2' --force-rewind --verbose
文章来源地址https://www.toymoban.com/news/detail-724818.html
文章来源:https://www.toymoban.com/news/detail-724818.html
到了这里,关于pg主备切换的文章就介绍完了。如果您还想了解更多内容,请在右上角搜索TOY模板网以前的文章或继续浏览下面的相关文章,希望大家以后多多支持TOY模板网!