PG14归档失败解决办法archiver failed on wal_lsn

这篇具有很好参考价值的文章主要介绍了PG14归档失败解决办法archiver failed on wal_lsn。希望对大家有所帮助。如果存在错误或未考虑完全的地方,请大家不吝赐教,您也可以点击"举报违法"按钮提交疑问。

问题描述

昨晚Repmgr+PG14主备主库因wal日志撑爆磁盘,删除主库过期wal文件重做备库后上午进行主备状态巡查,主库向备库发送wal文件正常,但是查主库状态时发现显示有1条归档失败的记录。
postgres: archiver failed on 000000010000006F00000086

  • 主库:

walsender repmgr 172.28.32.23(36122) streaming 72/1BAC3A10" walsender正常
archiver failed on 000000010000006F00000086" 归档失败

  • 备库:

walreceiver streaming 77/9EB6A198" “” “” " walreceiver正常

--查主库数据库状态
[root@pgmaster ~]# systemctl status postgres
● postgres.service - PostgreSQL database server
Loaded: loaded (/usr/lib/systemd/system/postgres.service; enabled; vendor preset: disabled)
Active: active (running) since Thu 2023-10-12 22:04:08 CST; 13h ago
Process: 3710968 ExecStart=/server/data/pgdb/pgsql/bin/pg_ctl start -D $PGDATA (code=exited, status=0/SUCCESS)
Main PID: 3710970 (postgres)
Tasks: 53 (limit: 201967)
Memory: 19.0G
CGroup: /system.slice/postgres.service
├─ 3710970 /server/data/pgdb/pgsql/bin/postgres -D /server/data/pgdb/data
├─ 3710971 "postgres: logger " "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
├─ 3710992 "postgres: checkpointer " "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
├─ 3710993 "postgres: background writer " "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
├─ 3710994 "postgres: walwriter " "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
├─ 3710995 "postgres: archiver failed on 000000010000006F00000086" "" "" "" "" "" "" "" "" ""
├─ 3710996 "postgres: logical replication launcher " "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
├─ 3711001 "postgres: top_portal top_portal 172.28.32.18(41438) idle" "" "" "" "" "" ""
├─ 3711003 "postgres: tj_sjjh dataexchange 172.28.32.28(35406) idle" "" "" "" "" "" "" ""
├─ 3711009 "postgres: repmgr repmgr 172.28.32.22(64096) idle" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
├─ 3711468 "postgres: top_portal top_portal 172.28.32.18(41720) idle" "" "" "" "" "" ""
├─ 3713807 "postgres: top_portal top_portal 172.28.32.20(44492) idle" "" "" "" "" "" ""
├─ 3723017 "postgres: walsender repmgr 172.28.32.23(36122) streaming 72/1BAC3A10"  #wal 发送正常

--查备库状态
[root@pgslave ~]# systemctl status postgres
● postgres.service - PostgreSQL database server
Loaded: loaded (/usr/lib/systemd/system/postgres.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2023-10-13 00:12:19 CST; 12h ago
Process: 1931221 ExecStart=/server/data/pgdb/pgsql/bin/pg_ctl start -D $PGDATA (code=exited, status=0/SUCCESS)
Main PID: 1931223 (postgres)
Tasks: 7 (limit: 201967)
Memory: 23.2G
CGroup: /system.slice/postgres.service
├─ 1931223 /server/data/pgdb/pgsql/bin/postgres -D /server/data/pgdb/data
├─ 1931224 "postgres: logger " "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
├─ 1931225 "postgres: startup recovering 00000001000000770000009E" "" "" "" "" "" "" "" "" ""
├─ 1931226 "postgres: checkpointer " "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
├─ 1931227 "postgres: background writer " "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
├─ 1931230 "postgres: walreceiver streaming 77/9EB6A198" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""   #wal接收
└─ 1931430 "postgres: repmgr repmgr 172.28.32.23(22956) idle" "" "" "" "" "" "" "" "" "" "" "" "" "" ""

Oct 13 00:12:17 pgslave systemd[1]: Starting PostgreSQL database server...
Oct 13 00:12:17 pgslave pg_ctl[1931221]: waiting for server to start....
Oct 13 00:12:17 pgslave pg_ctl[1931223]: 2023-10-13 00:12:17.497 CST [1931223] LOG:  redirecting log output to logging collector process
Oct 13 00:12:17 pgslave pg_ctl[1931223]: 2023-10-13 00:12:17.497 CST [1931223] HINT:  Future log output will appear in directory "log".
Oct 13 00:12:19 pgslave pg_ctl[1931221]: . done
Oct 13 00:12:19 pgslave pg_ctl[1931221]: server started
Oct 13 00:12:19 pgslave systemd[1]: Started PostgreSQL database server.

问题分析

1.查看数据库日志

PG14归档失败解决办法archiver failed on wal_lsn,Postgres,pg14归档失败

2.查看归档配置参数

参数配置正确,归档目录权限也正确

postgres=# show archive_command;
                      archive_command                      
-----------------------------------------------------------
 /usr/bin/lz4 -q -z %p /server/data/pgdb/pg_archive/%f.lz4
(1 row)

postgres=# show archive_mode;
 archive_mode 
--------------
 on
(1 row)

--查看归档目录的权限
[postgres@pgmaster ~]$ ls -ld /server/data/pgdb/pg_archive
drwxr-x--- 2 postgres postgres 4214784 Oct 13 13:14 /server/data/pgdb/pg_archive

3.手动切日志

手工归档成功,但是未解决,查看状态依然时卡住归档失败的那条wal记录那里

--手工归档
top_portal=# select pg_switch_wal();
 pg_switch_wal 
---------------
 72/51C4CFD8
(1 row)

--查主库数据库状态
[root@pgmaster ~]# systemctl status postgres
● postgres.service - PostgreSQL database server
Loaded: loaded (/usr/lib/systemd/system/postgres.service; enabled; vendor preset: disabled)
Active: active (running) since Thu 2023-10-12 22:04:08 CST; 13h ago
Process: 3710968 ExecStart=/server/data/pgdb/pgsql/bin/pg_ctl start -D $PGDATA (code=exited, status=0/SUCCESS)
Main PID: 3710970 (postgres)
Tasks: 53 (limit: 201967)
Memory: 19.0G
CGroup: /system.slice/postgres.service
├─ 3710970 /server/data/pgdb/pgsql/bin/postgres -D /server/data/pgdb/data
├─ 3710971 "postgres: logger " "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
├─ 3710992 "postgres: checkpointer " "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
├─ 3710993 "postgres: background writer " "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
├─ 3710994 "postgres: walwriter " "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
├─ 3710995 "postgres: archiver failed on 000000010000006F00000086" "" "" "" "" "" "" "" "" ""
├─ 3710996 "postgres: logical replication launcher " "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
├─ 3711001 "postgres: top_portal top_portal 172.28.32.18(41438) idle" "" "" "" "" "" ""
├─ 3711003 "postgres: tj_sjjh dataexchange 172.28.32.28(35406) idle" "" "" "" "" "" "" ""
├─ 3711009 "postgres: repmgr repmgr 172.28.32.22(64096) idle" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
├─ 3711468 "postgres: top_portal top_portal 172.28.32.18(41720) idle" "" "" "" "" "" ""
├─ 3713807 "postgres: top_portal top_portal 172.28.32.20(44492) idle" "" "" "" "" "" ""
├─ 3723017 "postgres: walsender repmgr 172.28.32.23(36122) streaming 72/1BAC3A10"  #wal 发送正常


--查当前wal_lsn
top_portal=# select pg_current_wal_lsn();
 pg_current_wal_lsn 
--------------------
 72/52638F10
(1 row)

--查当前wal_lsn对应的wal文件
top_portal=# select pg_walfile_name(pg_current_wal_lsn());
     pg_walfile_name      
--------------------------
 000000010000007200000052
(1 row)

--查当前最新检查点,最新检查点之前的wal文件均可以删除
[postgres@pgmaster ~]$ pg_controldata $PGDATA
pg_control version number:            1300
Catalog version number:               202107181
Database system identifier:           7268852449124462799
Database cluster state:               in production
pg_control last modified:             Fri 13 Oct 2023 10:07:35 AM CST  
Latest checkpoint location:           71/CDD2FF28
Latest checkpoint's REDO location:    71/CDD28F18
Latest checkpoint's REDO WAL file:    0000000100000071000000CD

--查报错中的wal文件
[postgres@pgmaster pg_wal]$ ls -l 000000010000006F00000086
-rw------- 1 postgres postgres 16777216 Oct 12 21:12 000000010000006F00000086
[postgres@pgmaster pg_wal]$ find /server/data/pgdb/pg_archive -name 000000010000006F00000086*
ls: cannot access '000000010000006F00000086': No such file or directory
[postgres@pgmaster pg_wal]$ find /server -name 000000010000006F00000086*
-rw------- 1 postgres postgres 16777216 Oct 12 21:12 000000010000006F00000086

4.检查$PGDATA/pg_wal/archive_status/目录下文件

[postgres@pgmaster ~]$ cd /server/data/pgdb/data/pg_wal/archive_status/
[postgres@pgmaster archive_status]$ ls -l *.ready
ls: cannot access '*.ready': No such file or directory

说明不存在需要归档但没归档的文件

该目录下,ready说明是需要归档但是没归档的,done是归档完成了的

解决办法

1.将归档失败的wal文件备份到/home/postgres目录下(生产环境如果磁盘空间允许切记不要rm删除,mv备份到目标位置)
2.手工归档select pg_switch_wal();
3.再次查看主备库状态

--1.将归档失败的wal文件备份到/home/postgres目录下
[postgres@pgmaster pg_wal]$ mv 000000010000006F00000086 /home/postgres/000000010000006F00000086
[postgres@pgmaster pg_wal]$ ls -l /home/postgres/000000010000006F00000086
-rw------- 1 postgres postgres 16777216 Oct 12 21:12 /home/postgres/000000010000006F00000086

--2.手工归档
postgres=# select pg_switch_wal();
 pg_switch_wal 
---------------
 73/7EF502E0
(1 row)

--3.再次查看主库状态显示正常
[root@pgmaster data]# systemctl status postgres
● postgres.service - PostgreSQL database server
     Loaded: loaded (/usr/lib/systemd/system/postgres.service; enabled; vendor preset: disabled)
     Active: active (running) since Thu 2023-10-12 22:04:08 CST; 13h ago
    Process: 3710968 ExecStart=/server/data/pgdb/pgsql/bin/pg_ctl start -D $PGDATA (code=exited, status=0/SUCCESS)
   Main PID: 3710970 (postgres)
      Tasks: 50 (limit: 201967)
     Memory: 26.6G
     CGroup: /system.slice/postgres.service
             ├─ 3710970 /server/data/pgdb/pgsql/bin/postgres -D /server/data/pgdb/data
             ├─ 3710971 "postgres: logger " "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
             ├─ 3710992 "postgres: checkpointer " "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
             ├─ 3710993 "postgres: background writer " "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
             ├─ 3710994 "postgres: walwriter " "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
             ├─ 3710995 "postgres: archiver archiving 000000010000007100000035" "" "" "" "" "" "" "" "" ""
             ├─ 3710996 "postgres: logical replication launcher " "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
             ├─ 3711001 "postgres: top_portal top_portal 172.28.32.18(41438) idle" "" "" "" "" "" ""
             ├─ 3711003 "postgres: tj_sjjh dataexchange 172.28.32.28(35406) idle" "" "" "" "" "" "" ""
             ├─ 3711009 "postgres: repmgr repmgr 172.28.32.22(64096) idle" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
             ├─ 3711468 "postgres: top_portal top_portal 172.28.32.18(41720) idle" "" "" "" "" "" ""
             ├─ 3713807 "postgres: top_portal top_portal 172.28.32.20(44492) idle" "" "" "" "" "" ""
             ├─ 3723017 "postgres: walsender repmgr 172.28.32.23(36122) streaming 73/7F000BD0"

补充:若$PGDATA/pg_wal/archive_status/目录下存在大量的*.ready文件

可能的原因分析:如果数据库是突然断电,那么可能arvchive命令没有完全完成,归档目录会存在不完整的文件名称,重启数据库后,会出现归档失败的情况,这个时候,需要去归档目录删除相关归档失败文件,那么归档就会重新归档。
还未遇到该场景的问题,暂未实验。
参考链接文章来源地址https://www.toymoban.com/news/detail-737459.html

到了这里,关于PG14归档失败解决办法archiver failed on wal_lsn的文章就介绍完了。如果您还想了解更多内容,请在右上角搜索TOY模板网以前的文章或继续浏览下面的相关文章,希望大家以后多多支持TOY模板网!

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处: 如若内容造成侵权/违法违规/事实不符,请点击违法举报进行投诉反馈,一经查实,立即删除!

领支付宝红包 赞助服务器费用

相关文章

  • 利用 pt-archiver 实现数据库归档功能

    一个完整的项目,除了开发阶段,还有运维移交阶段,运维移交往往需要考虑数据库后续的归档机制,比如:将1年前的数据归档到另外一台服务器的归档库。有些业务表数据量比较大,为了更快的查询速度更好的用户体验,可能会对某张特定的表做归档处理,只保留最近1个

    2024年02月19日
    浏览(40)
  • copy archived log from ASM 异地恢复归档

    NOTE:  In 11g, you can use RMAN to copy the files across the network. See Note 1909235.1 Copy database file directly across network using RMAN in 11g including Primary and Standby Please use the rman scripts below to backup missing archive log files from the primary site and restore to the standby site. RMAN run { allocate channel c1 type disk; backup arc

    2024年02月09日
    浏览(38)
  • MySQL数据库归档工具之【pt-archiver】

    前言 当MySQL数据库积累了大量历史数据时,有效地管理这些数据变得至关重要。为了避免影响数据库性能,并确保其顺畅运行,我们可以利用 pt-archiver 工具来对历史数据进行归档。 什么是pt-archiver? pt-archiver是Percona Toolkit的一部分,是一个强大的MySQL数据归档工具。它能够帮

    2024年04月14日
    浏览(37)
  • POSTGRESQL PG15关于归档的新模式

    开头还是介绍一下群,如果感兴趣polardb ,mongodb ,mysql ,postgresql ,redis 等有问题,有需求都可以加群群内有各大数据库行业大咖,CTO,可以解决你的问题。加群请联系 liuaustin3 ,在新加的朋友会分到3群(共810人左右 1 + 2 + 3),这里需要注意,如果想和 瑞典马工进行面对面的交

    2024年02月06日
    浏览(51)
  • git push失败, 提示! [rejected] master -> master (fetch first)error: failed to push some refs.解决办法

    我们按照如下的步骤上传了一个项目到仓库的时候,是可以成功的: 1、git init 初始化 2、git add .  将当前目录下修改的所有代码从工作区添加到暂存区 3、git commit -m  [\\\'注释\\\']  将缓存区内容添加到本地仓库 4、git remote add origin 仓库地址  将本地仓库与远程仓库连接起来 5、

    2024年02月01日
    浏览(86)
  • 遇到:PytorchStreamReader failed reading zip archive: failed finding central 错误应该如何解决

    遇到 \\\"PytorchStreamReader failed reading zip archive: failed finding central directory\\\" 错误是由于在读取PyTorch模型时出现的问题。这个错误通常发生在模型文件被损坏或不完整的情况下。要解决这个问题,你可以尝试以下方法: 检查模型文件:首先,确保你的模型文件没有被损坏或删除。你

    2024年02月08日
    浏览(45)
  • Postgresql的一个bug_涉及归档和pg_wal

    故障描述: 服务器ocmpgdbprod1,是流复制主节点,它的从节点是ocmpgdbprod2,两个节点的Postgresql数据库版本都是PostgreSQL 11.6,主节点ocmpgdbprod1配置了pg_wal归档,从节点ocmpgdbprod2没有配置pg_wal归档,ocmpgdbprod1配置的pg_wal归档路径突然连不上了导致pg_wal归档失败从而引发pg_wal日志堆

    2024年02月10日
    浏览(35)
  • IllegalStateException: Failed to get nested archive for entry 问题解决

    一、问题描述与原因 我遇到这个错误的原因是手动把jar包拖进目标jar包中,目标jar包是用压缩软件打开的: 这样拖进去的后果就是,被拖进去的jar包时被压缩过的: 下图红色方框jar是手动拖进去的,原始大小与压缩后大小不一样; 绿色方框jar是之前在里面的jar包,原始大小

    2024年02月03日
    浏览(39)
  • dedecms后台编辑文章时archives_do.php显示空白解决办法

    今天网站后台不知道哪里搞错了,后台总是出现状况,出现无法dede后台编辑文章显示空白的问题,查看编辑文章的文件是archives_do.php 这个文件。 然后找到文件了就可以查看出现的问题了,我们发现在archives_do.php里面查看有一行跳转代码(大概在第39行): header(\\\"location:{$gu

    2024年02月01日
    浏览(51)
  • 【超全解决办法】InvalidArchiveError(‘Error with archive /usr/local/anaconda3/[conda创建虚拟环境、pytorch安装报错]

    使用conda创建虚拟环境出现了以下问题,后面安装pytorch时也出现了这个问题,尝试了很多解决办法都没有用,最后成功解决,这里记录一下,希望大家可以少走弯路。 报错信息如下: InvalidArchiveError(\\\'Error with archive /usr/local/anaconda3/pkgs/libffi-3.4.4-h6a678d5_0vveg_hkq/info-libffi-3.4.4-h6

    2024年04月25日
    浏览(49)

觉得文章有用就打赏一下文章作者

支付宝扫一扫打赏

博客赞助

微信扫一扫打赏

请作者喝杯咖啡吧~博客赞助

支付宝扫一扫领取红包,优惠每天领

二维码1

领取红包

二维码2

领红包