pg主备切换

这篇具有很好参考价值的文章主要介绍了pg主备切换。希望对大家有所帮助。如果存在错误或未考虑完全的地方,请大家不吝赐教,您也可以点击"举报违法"按钮提交疑问。

1. switchover正常主从切换

1.1 正常切换

官网:https://repmgr.org/docs/current/repmgr-standby-switchover.html
repmgr -f /postgresql/pg13/repmgr.conf cluster show
repmgr -f /postgresql/pg13/repmgr.conf standby switchover --siblings-follow --dry-run --force-rewind
repmgr -f /postgresql/pg13/repmgr.conf standby switchover --siblings-follow --force-rewind
repmgr -f /postgresql/pg13/repmgr.conf cluster show
注意:切换操作只能在standby 库上执行

-- 可以debug打印详细的切换过程
repmgr -f /postgresql/pg13/repmgr.conf standby switchover --siblings-follow --force-rewind --log-level DEBUG --verbose


案例1:
[pgsql@pg1 ~]$ repmgr -f /postgresql/pg13/repmgr.conf cluster show
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                            
----+---------+---------+-----------+----------+----------+----------+----------+-------------------------------------------------------------------------------
 1  | pg1     | primary | * running |          | default  | 100      | 1        | host=172.72.10.2 user=repmgr password=repmgr dbname=repmgr connect_timeout=2 
 2  | pg2     | standby |   running | pg1      | default  | 100      | 1        | host=172.72.10.3  user=repmgr password=repmgr dbname=repmgr connect_timeout=2
 3  | pg3     | standby |   running | pg1      | default  | 100      | 1        | host=172.72.10.5  user=repmgr password=repmgr dbname=repmgr connect_timeout=2
 6  | wintess | witness | * running | pg1      | default  | 0        | n/a      | host=172.72.10.7  user=repmgr password=repmgr dbname=repmgr connect_timeout=2



[pgsql@pg2 ~]$ repmgr -f /postgresql/pg13/repmgr.conf standby switchover --siblings-follow --dry-run --force-rewind
NOTICE: checking switchover on node "pg2" (ID: 2) in --dry-run mode
INFO: prerequisites for using pg_rewind are met
INFO: SSH connection to host "172.72.10.2" succeeded
INFO: able to execute "repmgr" on remote host "172.72.10.2"
INFO: all sibling nodes are reachable via SSH
INFO: 3 walsenders required, 10 available
INFO: demotion candidate is able to make replication connection to promotion candidate
INFO: 0 pending archive files
INFO: replication lag on this standby is 0 seconds
NOTICE: local node "pg2" (ID: 2) would be promoted to primary; current primary "pg1" (ID: 1) would be demoted to standby
INFO: following shutdown command would be run on node "pg1":
  "/postgresql/pg13/bin/pg_ctl  -D '/postgresql/pgdata' -W -m fast stop"
INFO: parameter "shutdown_check_timeout" is set to 60 seconds
INFO: prerequisites for executing STANDBY SWITCHOVER are met



[pgsql@pg2 ~]$ repmgr -f /postgresql/pg13/repmgr.conf standby switchover --siblings-follow --force-rewind
NOTICE: executing switchover on node "pg2" (ID: 2)
NOTICE: local node "pg2" (ID: 2) will be promoted to primary; current primary "pg1" (ID: 1) will be demoted to standby
NOTICE: stopping current primary node "pg1" (ID: 1)
NOTICE: issuing CHECKPOINT on node "pg1" (ID: 1) 
DETAIL: executing server command "/postgresql/pg13/bin/pg_ctl  -D '/postgresql/pgdata' -W -m fast stop"
INFO: checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")
INFO: checking for primary shutdown; 2 of 60 attempts ("shutdown_check_timeout")
NOTICE: current primary has been cleanly shut down at location 0/F000028
NOTICE: promoting standby to primary
DETAIL: promoting server "pg2" (ID: 2) using pg_promote()
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
NOTICE: STANDBY PROMOTE successful
DETAIL: server "pg2" (ID: 2) was successfully promoted to primary
NOTICE: issuing CHECKPOINT on node "pg2" (ID: 2) 
ERROR: unable to execute CHECKPOINT
INFO: local node 1 can attach to rejoin target node 2
DETAIL: local node's recovery point: 0/F000028; rejoin target node's fork point: 0/F0000A0
NOTICE: setting node 1's upstream to node 2
WARNING: unable to ping "host=172.72.10.2 user=repmgr password=repmgr dbname=repmgr connect_timeout=2"
DETAIL: PQping() returned "PQPING_NO_RESPONSE"
NOTICE: starting server using "/postgresql/pg13/bin/pg_ctl  -w -D '/postgresql/pgdata' start"
NOTICE: NODE REJOIN successful
DETAIL: node 1 is now attached to node 2
NOTICE: node  "pg2" (ID: 2) promoted to primary, node "pg1" (ID: 1) demoted to standby
NOTICE: executing STANDBY FOLLOW on 2 of 2 siblings
INFO: STANDBY FOLLOW successfully executed on all reachable sibling nodes
NOTICE: switchover was successful
DETAIL: node "pg2" is now primary and node "pg1" is attached as standby
NOTICE: STANDBY SWITCHOVER has completed successfully



[pgsql@pg1 ~]$ repmgr -f /postgresql/pg13/repmgr.conf cluster show
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                            
----+---------+---------+-----------+----------+----------+----------+----------+-------------------------------------------------------------------------------
 1  | pg1     | standby |   running | pg2      | default  | 100      | 1        | host=172.72.10.2 user=repmgr password=repmgr dbname=repmgr connect_timeout=2 
 2  | pg2     | primary | * running |          | default  | 100      | 2        | host=172.72.10.3  user=repmgr password=repmgr dbname=repmgr connect_timeout=2
 3  | pg3     | standby |   running | pg2      | default  | 100      | 2        | host=172.72.10.5  user=repmgr password=repmgr dbname=repmgr connect_timeout=2
 6  | wintess | witness | * running | pg2      | default  | 0        | n/a      | host=172.72.10.7  user=repmgr password=repmgr dbname=repmgr connect_timeout=2



repmgr=# \x 
Expanded display is on.
repmgr=# select * from nodes;
-[ RECORD 1 ]----+------------------------------------------------------------------------------
node_id          | 2
upstream_node_id | 
active           | t
node_name        | pg2
type             | primary
location         | default
priority         | 100
conninfo         | host=172.72.10.3  user=repmgr password=repmgr dbname=repmgr connect_timeout=2
repluser         | repmgr
slot_name        | 
config_file      | /postgresql/pg13/repmgr.conf
-[ RECORD 2 ]----+------------------------------------------------------------------------------
node_id          | 1
upstream_node_id | 2
active           | t
node_name        | pg1
type             | standby
location         | default
priority         | 100
conninfo         | host=172.72.10.2 user=repmgr password=repmgr dbname=repmgr connect_timeout=2
repluser         | repmgr
slot_name        | 
config_file      | /postgresql/pg13/repmgr.conf
-[ RECORD 3 ]----+------------------------------------------------------------------------------
node_id          | 3
upstream_node_id | 2
active           | t
node_name        | pg3
type             | standby
location         | default
priority         | 100
conninfo         | host=172.72.10.5  user=repmgr password=repmgr dbname=repmgr connect_timeout=2
repluser         | repmgr
slot_name        | 
config_file      | /postgresql/pg13/repmgr.conf
-[ RECORD 4 ]----+------------------------------------------------------------------------------
node_id          | 6
upstream_node_id | 2
active           | t
node_name        | wintess
type             | witness
location         | default
priority         | 0
conninfo         | host=172.72.10.7  user=repmgr password=repmgr dbname=repmgr connect_timeout=2
repluser         | repmgr
slot_name        | 
config_file      | /postgresql/pg13/repmgr.conf


1.2 切换故障

以下所有操作,均在备库执行
PS:若切换失败,则需要重新克隆从库
1)  启动主库
pg_ctl start

2) 重新克隆从库
repmgr -h 172.72.10.2 -U repmgr -d repmgr -f /postgresql/pg13/repmgr.conf standby clone --force 
pg_ctl  start

3) 重新注册
repmgr -f /postgresql/pg13/repmgr.conf standby register --force
repmgr -f  /postgresql/pg13/repmgr.conf  cluster show


4)  把新备库切换为主库
repmgr -f  /postgresql/pg13/repmgr.conf cluster show
repmgr -f  /postgresql/pg13/repmgr.conf standby switchover --siblings-follow --dry-run --force-rewind
-- 可以debug打印详细的切换过程
repmgr -f /pg13/pg13/repmgr.conf standby switchover --siblings-follow --force-rewind --log-level DEBUG --verbose
repmgr -f  /postgresql/pg13/repmgr.conf cluster show


2 failover异常手工主从切换

2.1 主库出现故障,然后直接将从库提升为主库

1)主库宕机
pg_ctl -m fast stop

2)查询集群状态
[pgsql@pg1 ~]$ repmgr -f /postgresql/pg13/repmgr.conf cluster show
 ID | Name    | Role    | Status        | Upstream | Location | Priority | Timeline | Connection string                                                            
----+---------+---------+---------------+----------+----------+----------+----------+-------------------------------------------------------------------------------
 1  | pg1     | standby |   running     | ? pg2    | default  | 100      | 2        | host=172.72.10.2 user=repmgr password=repmgr dbname=repmgr connect_timeout=2 
 2  | pg2     | primary | ? unreachable | ?        | default  | 100      |          | host=172.72.10.3  user=repmgr password=repmgr dbname=repmgr connect_timeout=2
 3  | pg3     | standby |   running     | ? pg2    | default  | 100      | 2        | host=172.72.10.5  user=repmgr password=repmgr dbname=repmgr connect_timeout=2
 6  | wintess | witness | * running     | ? pg2    | default  | 0        | n/a      | host=172.72.10.7  user=repmgr password=repmgr dbname=repmgr connect_timeout=2

WARNING: following issues were detected
  - unable to connect to node "pg1" (ID: 1)'s upstream node "pg2" (ID: 2)
  - unable to determine if node "pg1" (ID: 1) is attached to its upstream node "pg2" (ID: 2)
  - unable to connect to node "pg2" (ID: 2)
  - node "pg2" (ID: 2) is registered as an active primary but is unreachable
  - unable to connect to node "pg3" (ID: 3)'s upstream node "pg2" (ID: 2)
  - unable to determine if node "pg3" (ID: 3) is attached to its upstream node "pg2" (ID: 2)
  - unable to connect to node "wintess" (ID: 6)'s upstream node "pg2" (ID: 2)

HINT: execute with --verbose option to see connection error messages


3) pg3 提升为主库(pg3执行)
repmgr  -f  /postgresql/pg13/repmgr.conf --siblings-follow standby promote


4)查看集群状态
[pgsql@pg1 ~]$ repmgr -f /postgresql/pg13/repmgr.conf cluster show
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                            
----+---------+---------+-----------+----------+----------+----------+----------+-------------------------------------------------------------------------------
 1  | pg1     | standby |   running | pg3      | default  | 100      | 2        | host=172.72.10.2 user=repmgr password=repmgr dbname=repmgr connect_timeout=2 
 2  | pg2     | primary | - failed  | ?        | default  | 100      |          | host=172.72.10.3  user=repmgr password=repmgr dbname=repmgr connect_timeout=2
 3  | pg3     | primary | * running |          | default  | 100      | 3        | host=172.72.10.5  user=repmgr password=repmgr dbname=repmgr connect_timeout=2
 6  | wintess | witness | * running | pg3      | default  | 0        | n/a      | host=172.72.10.7  user=repmgr password=repmgr dbname=repmgr connect_timeout=2

WARNING: following issues were detected
  - unable to connect to node "pg2" (ID: 2)

HINT: execute with --verbose option to see connection error messages
PS:pg2损坏,pg3升为新主


5) 因pg2损坏,现需修复,作为从库加入集群
pg_ctl stop
repmgr -h 172.72.10.5 -U repmgr -d repmgr -f /postgresql/pg13/repmgr.conf standby clone --force

repmgr -f /postgresql/pg13/repmgr.conf node rejoin -d 'host=172.72.10.5  user=repmgr dbname=repmgr connect_timeout=2' --force-rewind --dry-run --verbose
repmgr -f /postgresql/pg13/repmgr.conf node rejoin -d 'host=172.72.10.5  user=repmgr dbname=repmgr connect_timeout=2' --force-rewind --verbose

pg_ctl status
pg_ctl: server is running (PID: 4356)
/postgresql/pg13/bin/postgres "-D" "/postgresql/pgdata"

6) 查看集群状态
[pgsql@pg1 ~]$ repmgr -f /postgresql/pg13/repmgr.conf cluster show
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                            
----+---------+---------+-----------+----------+----------+----------+----------+-------------------------------------------------------------------------------
 1  | pg1     | primary | * running |          | default  | 100      | 4        | host=172.72.10.2 user=repmgr password=repmgr dbname=repmgr connect_timeout=2 
 2  | pg2     | standby |   running | pg1      | default  | 100      | 4        | host=172.72.10.3  user=repmgr password=repmgr dbname=repmgr connect_timeout=2
 3  | pg3     | standby |   running | pg1      | default  | 100      | 4        | host=172.72.10.5  user=repmgr password=repmgr dbname=repmgr connect_timeout=2
 6  | wintess | witness | * running | pg1      | default  | 0        | n/a      | host=172.72.10.7  user=repmgr password=repmgr dbname=repmgr connect_timeout=2

2.2 配置自动failover

1、在所有节点都配置
echo "shared_preload_libraries='repmgr' " >> /postgresql/pgdata/postgresql.conf
pg_ctl stop
pg_ctl start



2、在所有节点都配置文件` /postgresql/pg13/repmgr.conf`,增加内容包括:
cat >>  /postgresql/pg13/repmgr.conf << "EOF"

monitoring_history=yes
monitor_interval_secs=5
failover=automatic
reconnect_attempts=6
reconnect_interval=5
promote_command='repmgr standby promote -f  /postgresql/pg13/repmgr.conf --log-to-file'
follow_command='repmgr standby follow -f  /postgresql/pg13/repmgr.conf --log-to-file --upstream-node-id=%n'
log_level=INFO
log_status_interval=10
log_file=' /postgresql/pg13/repmgr.log'
EOF



cat >> /etc/logrotate.conf <<"EOF"
/postgresql/pg13/repmgr.log {
missingok
compress
rotate 30
daily
dateext
create 0600 pgsql pgsql
}
EOF


3、在所有节点都启动repmgrd进程

-- 启动
repmgrd -f /postgresql/pg13/repmgr.conf --pid-file /tmp/repmgrd.pid --daemonize

-- 建议加到开机自动启动:/etc/rc.local
echo "repmgrd -f /postgresql/pg13/repmgr.conf --pid-file /tmp/repmgrd.pid --daemonize" >> /etc/rc.local
chmod +x /etc/rc.d/rc.local

-- 停止
kill -9 `cat /tmp/repmgrd.pid`


3 .自动切换测试

1 查看集群状态
[pgsql@pg1 ~]$ repmgrd -f /postgresql/pg13/repmgr.conf --pid-file /tmp/repmgrd.pid --daemonize
[2023-10-15 23:12:58] [NOTICE] redirecting logging output to "/postgresql/pg13/repmgr.log"

[pgsql@pg1 ~]$ repmgr -f /postgresql/pg13/repmgr.conf cluster show
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                            
----+---------+---------+-----------+----------+----------+----------+----------+-------------------------------------------------------------------------------
 1  | pg1     | primary | * running |          | default  | 100      | 4        | host=172.72.10.2 user=repmgr password=repmgr dbname=repmgr connect_timeout=2 
 2  | pg2     | standby |   running | pg1      | default  | 100      | 4        | host=172.72.10.3  user=repmgr password=repmgr dbname=repmgr connect_timeout=2
 3  | pg3     | standby |   running | pg1      | default  | 100      | 4        | host=172.72.10.5  user=repmgr password=repmgr dbname=repmgr connect_timeout=2
 6  | wintess | witness | * running | pg1      | default  | 0        | n/a      | host=172.72.10.7  user=repmgr password=repmgr dbname=repmgr connect_timeout=2



2  停止主库
pg_ctl stop

3  查看集群状态
[pgsql@pg3 ~]$ repmgrd -f /postgresql/pg13/repmgr.conf --pid-file /tmp/repmgrd.pid --daemonize
[2023-10-15 23:12:58] [NOTICE] redirecting logging output to "/postgresql/pg13/repmgr.log"

[pgsql@pg3 ~]$ repmgr -f /postgresql/pg13/repmgr.conf cluster show
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                            
----+---------+---------+-----------+----------+----------+----------+----------+-------------------------------------------------------------------------------
 1  | pg1     | primary | - failed  | ?        | default  | 100      |          | host=172.72.10.2 user=repmgr password=repmgr dbname=repmgr connect_timeout=2 
 2  | pg2     | primary | * running |          | default  | 100      | 5        | host=172.72.10.3  user=repmgr password=repmgr dbname=repmgr connect_timeout=2
 3  | pg3     | standby |   running | pg2      | default  | 100      | 4        | host=172.72.10.5  user=repmgr password=repmgr dbname=repmgr connect_timeout=2
 6  | wintess | witness | * running | pg2      | default  | 0        | n/a      | host=172.72.10.7  user=repmgr password=repmgr dbname=repmgr connect_timeout=2

WARNING: following issues were detected
  - unable to connect to node "pg1" (ID: 1)

HINT: execute with --verbose option to see connection error messages


4 修复原主库
repmgr -f /postgresql/pg13/repmgr.conf node  rejoin -d 'host=172.72.10.3 user=repmgr dbname=repmgr connect_timeout=2' --force-rewind --verbose

文章来源地址https://www.toymoban.com/news/detail-724818.html

到了这里,关于pg主备切换的文章就介绍完了。如果您还想了解更多内容,请在右上角搜索TOY模板网以前的文章或继续浏览下面的相关文章,希望大家以后多多支持TOY模板网!

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处: 如若内容造成侵权/违法违规/事实不符,请点击违法举报进行投诉反馈,一经查实,立即删除!

领支付宝红包 赞助服务器费用

相关文章

  • openGauss学习笔记-61 openGauss 数据库管理-常见主备部署方案

    61.1 单中心 图 1 单中心部署图 组网特点: 单AZ部署,可以配置一个同步备一个异步备 优势: 三个node完全等价,故障任意一个node都可以提供服务; 成本低。 劣势: 高可用能力较低,发生AZ级故障只能依赖节点恢复; 适用性: 适用于对高可用性要求较低的业务系统。 61.2 同

    2024年02月09日
    浏览(47)
  • 大数据之LibrA数据库系统告警处理(ALM-12010 Manager主备节点间心跳中断)

    告警解释 当主Manager节点在7秒内没有收到备Manager节点的心跳信号时,产生该告警。 当主Manager节点收到备Manager节点的心跳信号后,告警恢复。 告警属性 告警参数 对系统的影响 如果主Manager进程异常,主备倒换无法进行,影响业务。 可能原因 主备Manager节点间链路异常。 处理

    2024年02月06日
    浏览(43)
  • 使用docker部署pg数据库

    使用 Docker 部署 PostgreSQL 数据库是一种常见的做法,它提供了方便、可移植和可重复的方式来运行数据库。下面是一个简单的示例,用于在 Docker 中部署 PostgreSQL 数据库: 首先,确保您已经安装了 Docker 并正确配置了 Docker 环境。 打开终端或命令提示符,并执行以下命令来下载

    2024年02月10日
    浏览(50)
  • pg数据库和mysql区别

    PostgreSQL (通常称为 PG) 和 MySQL 都是广泛使用的关系型数据库管理系统 (RDBMS)。虽然它们都是用于存储和管理数据的关系数据库,但它们在一些方面有很大的区别,如下所述: 数据类型:PostgreSQL 支持更多的数据类型,如范围类型、网络地址类型、JSON 和 XML 数据类型等,而 My

    2024年03月12日
    浏览(55)
  • Flink CDC实时同步PG数据库

    JDK:1.8 Flink:1.16.2 Scala:2.11 Hadoop:3.1.3 github地址:https://github.com/rockets0421/FlinkCDC-PG.git  1、更改配置文件postgresql.conf # 更改wal日志方式为logical wal_level = logical # minimal, replica, or logical # 更改solts最大数量(默认值为10),flink-cdc默认一张表占用一个slots max_replication_slots = 20 # m

    2024年02月13日
    浏览(69)
  • 使用pg_prewarm缓存PostgreSQL数据库表

    pg_prewarm 直接利用系统缓存的代码,对操作系统发出异步prefetch请求,在应用中,尤其在OLAP的情况下,对于大表的分析等等是非常耗费查询的时间的,而即使我们使用select table的方式,这张表也并不可能将所有的数据都装载到内存中,而pg_prewarm的功能就是完成一个张表全部进入

    2024年02月14日
    浏览(46)
  • openGauss数据库pg_xlog爆满问题解决

    最近有一个之前搭的环境登不上了,好久没用想拿来测试的时候发现启动不了。启动时报错: [Errno 28] No space left on device query也不行了,提示没有空间了。 查询磁盘使用情况 df -h ,果然100% 这个环境当时安装的是主备,看了下备库的服务器,发现不知道啥时候已经被删库了,

    2024年02月02日
    浏览(54)
  • PG数据库实现bool自动转smallint的方式

    语法: DROP FUNCTION IF EXISTS your_schema_name.function_name(arg_type1, arg_type2) CASCADE RESTRICT; 实例: DROP FUNCTION IF EXISTS platformyw.boolean_to_smallint(bool) CASCADE RESTRICT; 语法: SELECT * FROM information_schema.routines WHERE routine_catalog={you_catalog_name} and routine_schema={you_schema_name} and routine_name ={you_func_name} and

    2024年02月01日
    浏览(54)
  • 达蒙数据库:本地编码:PG_GBK, 导入文件编码:PG_UTF8错误解决

    在windows使用达梦管理工具导入.dmp文件时出现该错误 问题解决: 1、找到DM数据库的安装路径的bin 目录下 cmd 进入终端 2、输入命令行 使用dimp工具进行导入,最后需要加上FULL=Y是表示整个表格导入,但是中间可能会出现报错

    2024年02月06日
    浏览(67)
  • PG-DBA培训04:PostgreSQL数据类型与数据库设计规范

    一、风哥PG-DBA培训04:PostgreSQL数据类型与数据库设计规范 本课程由风哥发布的基于PostgreSQL数据库的系列课程,本课程属于PostgreSQL数据库SQL开发与应用实战阶段之PostgreSQL数据类型与数据库设计规范,学完本课程可以掌握PostgreSQL SQL语句基础讲解,PostgreSQL SQL语言基础知识,安

    2024年02月11日
    浏览(59)

觉得文章有用就打赏一下文章作者

支付宝扫一扫打赏

博客赞助

微信扫一扫打赏

请作者喝杯咖啡吧~博客赞助

支付宝扫一扫领取红包,优惠每天领

二维码1

领取红包

二维码2

领红包