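Storing kernel crash logs with Linux pstore
Notes:
Reference blog posts:
(1) Linux pstore 实现自动“抓捕”内核崩溃日志 (Linux pstore: automatically "capturing" kernel crash logs)
(2) 1-Linux 保存kernel panic信息到flash (Saving Linux kernel panic information to flash)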
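Background
The project runs embedded Linux on a linux-4.19 kernel. I had long wanted panic/oops logs to be written to a SPI NOR/SPI NAND device so that abnormal logs could be analyzed afterwards. The post 《Linux pstore 实现自动“抓捕”内核崩溃日志》 finally solved this long-standing problem, but pstore-blk is only available from linux-5.8 onward, so I decided to port it to linux-4.19. I backported the pstore-blk code from linux-5.15.105 to linux-4.19. The port went fairly smoothly, and my own testing shows that the basic functionality works.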
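Introduction
If the kernel panics while the system is running, developers need the kernel error log to locate the problem. Very often, though, no debug serial port is attached when the problem occurs, and the log only exists in memory, so it is lost after a reboot. What is needed is a way to save the crash information to non-volatile storage when the system crashes.
This is very useful for low-probability problems that cannot be caught live. It matters even more now that smart connected devices are becoming common: a remote device can capture its own crash log and upload it to a server over the network, maintainers can locate and fix the problem from the collected logs, and the fix can then be delivered to the device through OTA.
The kernel provides kmsg_dump_register() to register a dumper that is called on panic or oops. The kernel now has several ways to capture a panic, and pstore is the most recent.
According to material found online, there were quite a few similar implementations before the pstore file system.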
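- apanic
The earliest panic-logging scheme on Android. It can be found in Android kernels based on Linux 2.6, but it was never submitted upstream and was later abandoned. No reason for this can be found online. My own guess is that it only works on MTD NAND, whereas Android devices today almost all use eMMC. apanic is presumably short for Android Panic. When the kernel crashes, it dumps the log to MTD NAND.
- ramoops
This refers to the original ramoops implementation. In current code it has been merged into pstore as the pstore/ram backend. ramoops dumps the log into RAM, which places one requirement on the RAM: its contents must survive a reboot.
- crashlog
A kernel patch provided by OpenWrt that was never submitted upstream. It is also RAM-based and can only dump Panic/Oops logs.
- mtdoops
A feature of the MTD subsystem that is very similar to pstore. It only dumps Panic/Oops logs and cannot present them as files, so the user has to parse the whole MTD partition. (Because the two are so similar, mtdpstore was implemented as a replacement for mtdoops.)
- kdump
If pstore is a lightweight scheme for dumping kernel crash logs, kdump is a heavyweight analysis tool. On a crash, kdump boots a capture kernel that collects all of memory into a core dump file, and after the reboot the captured information is available in a specific file. netdump and diskdump are similar. kdump suits servers and other machines with plenty of resources, and it is very powerful, but it is a poor fit for embedded devices.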
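In pstore, a frontend is the type of log being saved, and a backend is the type of device it is saved to.
The following frontends are currently supported:
- dmesg: mainly dumps the kernel log held in log_buf at Panic/Oops time
- pmsg: an entry point for user space to store logs; Android uses it to store system logs
- console: console log
- ftrace: function trace information
The following backends are currently supported:
- pstore/ram: persistent RAM, i.e. memory whose contents survive a reboot
- pstore/blk: (v5.8 and later) any writable block device, such as a disk, USB drive, eMMC, or NFTL NAND
- mtd device: (v5.8 and later) MTD devices such as MTD NAND. (MTD support depends on the pstore/blk backend, so strictly speaking it is not an independent backend.)
For details, see:
- Documentation/admin-guide/ramoops.rst
- Documentation/admin-guide/pstore-blk.rst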
ramoops method
Enabling ramoops
- Kernel configuration
File systems --->
[*] Miscellaneous filesystems --->
<*> Persistent store support
(10240) Default kernel log storage space
< > DEFLATE (ZLIB) compression
< > LZO compression
< > LZ4 compression
< > LZ4HC compression
[ ] 842 compression
[ ] zstd compression
[*] Log kernel console messages
[*] Log user space messages
<*> Log panic/oops to a RAM buffer
CONFIG_PSTORE=y
CONFIG_PSTORE_CONSOLE=y
CONFIG_PSTORE_PMSG=y
CONFIG_PSTORE_RAM=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_PANIC_TIMEOUT=-1
- Reserved memory configuration
reserved-memory {
    #address-cells = <1>;
    #size-cells = <1>;
    ranges;
    ramoops@11000000 {
        compatible = "ramoops";
        reg = <0x11000000 0x100000>;
        record-size = <0x00020000>;
        console-size = <0x00020000>;
        ftrace-size = <0x00020000>;
    };
};
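As a rough sanity check of the reserved region above, the following sketch estimates how many dmesg records fit in it. It assumes, from my reading of the ramoops driver, that the console, ftrace and pmsg areas are subtracted first and the remainder is divided into record-size slots. The function name ramoops_dmesg_slots is made up for illustration.

```shell
# Estimate how many dmesg records fit in a ramoops region.
# Assumption: dmesg area = total - console - ftrace - pmsg,
# and that area is split into slots of record-size bytes.
ramoops_dmesg_slots() {
    total=$1; record=$2; console=$3; ftrace=$4; pmsg=${5:-0}
    echo $(( ($total - $console - $ftrace - $pmsg) / $record ))
}

# Values from the device tree node above (no pmsg-size is set there).
ramoops_dmesg_slots 0x100000 0x20000 0x20000 0x20000   # prints 6
```

With these values the 1 MiB region leaves room for 6 dump records under that assumption, so a larger record-size or a smaller region reduces how many crashes can be kept.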
ramoops write test
# echo c > /proc/sysrq-trigger
[ 53.539402] sysrq: Trigger a crash
[ 53.542844] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[ 53.551414] pgd = 78dc9424
[ 53.554145] [00000000] *pgd=16b33835, *pte=00000000, *ppte=00000000
[ 53.560683] Internal error: Oops - BUG: 817 [#1] PREEMPT SMP ARM
[ 53.566709] Modules linked in:
[ 53.569787] CPU: 1 PID: 144 Comm: sh Not tainted 4.19.123 #91
[ 53.575544] Hardware name: arobot r8 family
[ 53.579752] PC is at sysrq_handle_crash+0x1c/0x28
[ 53.584471] LR is at sysrq_handle_crash+0x8/0x28
[ 53.589102] pc : [<c030b818>] lr : [<c030b804>] psr: 600e0013
[ 53.595382] sp : c6b35ea0 ip : 00000000 fp : 000c8008
[ 53.600620] r10: 00000004 r9 : 00000000 r8 : 00000063
[ 53.605858] r7 : c090d300 r6 : 00000008 r5 : c090d300 r4 : c091aa28
[ 53.612399] r3 : 00000001 r2 : 00000000 r1 : c7ebb390 r0 : 00000063
.........
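Because the log data lives in DDR, the board must not lose power, and the data can only be read back after an automatic reboot. CONFIG_PANIC_TIMEOUT therefore has to be set so that the system reboots by itself after a panic.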
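The effect of the timeout value can be summarized with a small helper. This is only a sketch: describe_panic_timeout is an illustrative name, and it encodes the documented meaning of the kernel.panic setting (negative: reboot immediately, zero: do not reboot, positive: reboot after that many seconds).

```shell
# Describe what a panic timeout value means.
describe_panic_timeout() {
    t=$1
    if [ "$t" -lt 0 ]; then
        echo "reboot immediately"
    elif [ "$t" -eq 0 ]; then
        echo "wait forever"
    else
        echo "reboot after ${t}s"
    fi
}

# On a running system the current value can be read from /proc/sys/kernel/panic.
if [ -r /proc/sys/kernel/panic ]; then
    describe_panic_timeout "$(cat /proc/sys/kernel/panic)"
fi
```

With CONFIG_PANIC_TIMEOUT=-1 as configured above, the helper reports an immediate reboot, which is what ramoops needs because the DDR contents have to be read back before power is removed.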
After the reboot, view the data as follows:
# mount -t pstore pstore /sys/fs/pstore/
# cd /sys/fs/pstore/
# ls
console-ramoops-0 dmesg-ramoops-0 dmesg-ramoops-1
#
mtdoops method
Enabling mtdoops
- Kernel configuration
File systems --->
[*] Miscellaneous filesystems --->
<*> Persistent store support
(10240) Default kernel log storage space
< > DEFLATE (ZLIB) compression
< > LZO compression
< > LZ4 compression
< > LZ4HC compression
[ ] 842 compression
[ ] zstd compression
[*] Log kernel console messages
[*] Log user space messages
< > Log panic/oops to a RAM buffer
< > Log panic/oops to a block device
Device Drivers --->
<*> Memory Technology Device (MTD) support --->
<*> Log panic/oops to an MTD buffer
< > Log panic/oops to an MTD buffer based on pstore
CONFIG_PSTORE=y
CONFIG_PSTORE_CONSOLE=y
CONFIG_PSTORE_PMSG=y
CONFIG_MTD_OOPS=y
CONFIG_MAGIC_SYSRQ=y
- Partition configuration
cmdline method:
bootargs = "console=ttyS1,115200 loglevel=8 rootwait root=/dev/mtdblock5 rootfstype=squashfs mtdoops.mtddev=pstore";
blkparts = "mtdparts=spi0.0:64k(spl)ro,256k(uboot)ro,64k(dtb)ro,128k(pstore),3m(kernel)ro,4m(rootfs)ro,-(data)";
Device tree (ofpart) method:
bootargs = "console=ttyS1,115200 loglevel=8 rootwait root=/dev/mtdblock5 rootfstype=squashfs mtdoops.mtddev=pstore";
partition@60000 {
    label = "pstore";
    reg = <0x60000 0x20000>;
};
mtdoops write test
# echo c > /proc/sysrq-trigger
[55632.357502] sysrq: Trigger a crash
[55632.360984] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[55632.369504] pgd = ddcf897d
[55632.372426] [00000000] *pgd=16b36835, *pte=00000000, *ppte=00000000
[55632.378878] Internal error: Oops - BUG: 817 [#1] PREEMPT SMP ARM
[55632.384897] Modules linked in:
[55632.387972] CPU: 1 PID: 144 Comm: sh Not tainted 4.19.123 #90
[55632.393727] Hardware name: arobot r8 family
[55632.397931] PC is at sysrq_handle_crash+0x1c/0x28
[55632.402648] LR is at sysrq_handle_crash+0x8/0x28
[55632.407276] pc : [<c030a5d8>] lr : [<c030a5c4>] psr: 600e0013
[55632.413553] sp : c6b2fea0 ip : 00000000 fp : 000c8008
[55632.418789] r10: 00000004 r9 : 00000000 r8 : 00000063
[55632.424025] r7 : c090d300 r6 : 00000008 r5 : c090d300 r4 : c091a9cc
[55632.430563] r3 : 00000001 r2 : 00000000 r1 : c7ebb390 r0 : 00000063
[55632.437103] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
.............
View the MTD log after rebooting:
# cat /dev/mtd3 > 1.txt
# cat 1.txt
............
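Since mtdoops does not present the records as files, the raw dump mixes log text with binary record headers and unwritten flash. The filter below is a minimal sketch for making such a dump readable. It assumes erased flash reads back as 0xFF bytes and that the log text is plain ASCII. mtdoops_text is an illustrative name, and the octal ranges given to tr are assumed to be supported by the tr on the target (they are in GNU tr).

```shell
# Reduce a raw mtdoops partition dump to readable text.
# Assumption: erased flash reads back as 0xFF and log text is plain ASCII;
# non-printable bytes (including most of the binary headers) are dropped.
mtdoops_text() {
    LC_ALL=C tr -cd '\11\12\40-\176' < "$1"
}
```

It can be run on the saved copy (`mtdoops_text 1.txt`) or on the device node directly. A few stray printable characters from the record headers may remain between records.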
mtdpstore
Enabling mtdpstore
- Kernel configuration
File systems --->
[*] Miscellaneous filesystems --->
<*> Persistent store support
(10240) Default kernel log storage space
< > DEFLATE (ZLIB) compression
< > LZO compression
< > LZ4 compression
< > LZ4HC compression
[ ] 842 compression
[*] zstd compression
Default pstore compression algorithm (zstd) --->
[*] Log kernel console messages
[*] Log user space messages
< > Log panic/oops to a RAM buffer
<*> Log panic/oops to a block device
( ) block device identifier
(64) Size in Kbytes of kmsg dump log to store
(2) Maximum kmsg dump reason to store
(64) Size in Kbytes of pmsg to store
(64) Size in Kbytes of console log to store
Device Drivers --->
<*> Memory Technology Device (MTD) support --->
< > Log panic/oops to an MTD buffer
<*> Log panic/oops to an MTD buffer based on pstore
CONFIG_PSTORE=y
CONFIG_PSTORE_CONSOLE=y
CONFIG_PSTORE_PMSG=y
CONFIG_PSTORE_BLK=y
CONFIG_MTD_PSTORE=y
CONFIG_MAGIC_SYSRQ=y
- Partition configuration
cmdline method:
bootargs = "console=ttyS1,115200 loglevel=8 rootwait root=/dev/mtdblock5 rootfstype=squashfs pstore_blk.blkdev=pstore";
blkparts = "mtdparts=spi0.0:64k(spl)ro,256k(uboot)ro,64k(dtb)ro,128k(pstore),3m(kernel)ro,4m(rootfs)ro,-(data)";
Device tree (ofpart) method:
bootargs = "console=ttyS1,115200 loglevel=8 rootwait root=/dev/mtdblock5 rootfstype=squashfs pstore_blk.blkdev=pstore";
partition@60000 {
    label = "pstore";
    reg = <0x60000 0x20000>;
};
mtdpstore write test
# echo c > /proc/sysrq-trigger
[ 121.945495] sysrq: Trigger a crash
[ 121.948979] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[ 121.957506] pgd = eabf0695
[ 121.960430] [00000000] *pgd=16b33835, *pte=00000000, *ppte=00000000
[ 121.966887] Internal error: Oops - BUG: 817 [#1] PREEMPT SMP ARM
[ 121.972908] Modules linked in:
[ 121.975982] CPU: 1 PID: 144 Comm: sh Not tainted 4.19.123 #90
[ 121.981738] Hardware name: arobot r8 family
[ 121.985942] PC is at sysrq_handle_crash+0x1c/0x28
[ 121.990659] LR is at sysrq_handle_crash+0x8/0x28
[ 121.995287] pc : [<c030a5d8>] lr : [<c030a5c4>] psr: 600e0013
[ 122.001564] sp : c6b35ea0 ip : 00000000 fp : 000c8008
[ 122.006800] r10: 00000004 r9 : 00000000 r8 : 00000063
[ 122.012036] r7 : c090d300 r6 : 00000008 r5 : c090d300 r4 : c091a9cc
[ 122.018574] r3 : 00000001 r2 : 00000000 r1 : c7ebb390 r0 : 00000063
......
View the pstore logs after rebooting:
# mount -t pstore pstore /sys/fs/pstore/
# cd /sys/fs/pstore/
# ls
dmesg-pstore_blk-0 dmesg-pstore_blk-1
#
#
# head -n 5 dmesg-pstore_blk-0
Panic#2 Part1
<1>[ 0.000000] Booting Linux on physical CPU 0x0
<7>[ 0.000000] Linux version 4.19.123 (gcc version 7.5.0 (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04)) #90 SMP PREEMPT 2023.03.30 19:36:54 a19d55d3c
<6>[ 0.000000] OF: fdt: Machine model: Tina
<6>[ 0.000000] Memory policy: Data cache writealloc
#
#
# head -n 5 dmesg-pstore_blk-1
Oops#1 Part1
<1>[ 0.000000] Booting Linux on physical CPU 0x0
<7>[ 0.000000] Linux version 4.19.123 (gcc version 7.5.0 (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04)) #90 SMP PREEMPT 2023.03.30 19:36:54 a19d55d3c
<6>[ 0.000000] OF: fdt: Machine model: Tina
<6>[ 0.000000] Memory policy: Data cache writealloc
#
#
# tail -n 5 dmesg-pstore_blk-0
<4>[ 122.396718] [<c0101a0c>] (__irq_svc) from [<c0107dfc>] (arch_cpu_idle+0x1c/0x38)
<4>[ 122.404135] [<c0107dfc>] (arch_cpu_idle) from [<c013be70>] (do_idle+0xdc/0x100)
<4>[ 122.411462] [<c013be70>] (do_idle) from [<c013bfe0>] (cpu_startup_entry+0x18/0x1c)
<4>[ 122.419047] [<c013bfe0>] (cpu_startup_entry) from [<101023ac>] (0x101023ac)
<4>[ 123.434873] SMP: failed to stop secondary CPUs
#
#
# tail -n 5 dmesg-pstore_blk-1
<4>[ 122.206383] 5fc0: 00000000 00000001 000cb778 00000004 000c7d7c 00000020 000c82b8 000c8008
<4>[ 122.214571] 5fe0: 00000000 beee988c 0001bb20 b6ebc056
<0>[ 122.219640] Code: e59f2010 e5823000 f57ff04e e3a02000 (e5c23000)
<4>[ 122.225745] Disabling lock debugging due to kernel taint
<4>[ 122.233525] ---[ end trace 03f2787ef5d29e4a ]---
#
............
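Each dmesg record above starts with a header line such as `Panic#2 Part1` or `Oops#1 Part1`, giving the dump reason, a counter, and the part number. When collecting logs automatically it helps to extract these fields. The following is a small sketch, with pstore_record_info as an illustrative name, that assumes the header has exactly this shape.

```shell
# Print "reason count part" from the first line of a pstore dmesg record,
# e.g. "Panic#2 Part1". pstore_record_info is an illustrative name.
pstore_record_info() {
    head -n 1 "$1" |
        sed -n 's/^\([A-Za-z]*\)#\([0-9]*\) Part\([0-9]*\)$/\1 \2 \3/p'
}
```

For example, `pstore_record_info dmesg-pstore_blk-0` would print `Panic 2 1` for the record shown above. A file whose first line does not match the pattern produces no output.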