这应该是ceph version 15.2.14 (cd3bb7e87a2f62c1b862ff3fd8b1eec13391a5be) octopus (stable)的一个bug。文章来源:https://www.toymoban.com/news/detail-629467.html
osd会在recovery的时候挂掉。报错日志如下。因为是实际使用的环境,目前并不能对于ceph进行修复或者升级。所以只能用命令“ceph osd set norecover ”把ceph的recovery先关掉,然后再启动osd,才行。虽然会有部分object丢失了就无法找到了,但是总比osd完全起不来好。文章来源地址https://www.toymoban.com/news/detail-629467.html
-12> 2023-08-01T22:36:02.702+0800 7f4f6f5b7700 10 monclient: handle_auth_request added challenge on 0x55d3267ca400
ceph version 15.2.14 (cd3bb7e87a2f62c1b862ff3fd8b1eec13391a5be) octopus (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14c) [0x55d30e0ea261]
2: (()+0x4df429) [0x55d30e0ea429]
3: (()+0x8e773d) [0x55d30e4f273d]
4: (ReplicatedBackend::recover_object(hobject_t const&, eversion_t, std::shared_ptr<ObjectContext>, std::shared_ptr<ObjectContext>, PGBackend::RecoveryHandle*)+0x16f) [0x55d30e4ead1f]
5: (PrimaryLogPG::prep_object_replica_pushes(hobject_t const&, eversion_t, PGBackend::RecoveryHandle*, bool*)+0x559) [0x55d30e2ea2a9]
6: (PrimaryLogPG::recover_replicas(unsigned long, ThreadPool::TPHandle&, bool*)+0x1330) [0x55d30e32d2c0]
7: (PrimaryLogPG::start_recovery_ops(unsigned long, ThreadPool::TPHandle&, unsigned long*)+0x106) [0x55d30e334bb6]
8: (OSD::do_recovery(PG*, unsigned int, unsigned long, ThreadPool::TPHandle&)+0x232) [0x55d30e1cde92]
9: (ceph::osd::scheduler::PGRecovery::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x19) [0x55d30e40a089]
10: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x143a) [0x55d30e1e9e6a]
11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5b6) [0x55d30e7d5a56]
12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55d30e7d85a0]
13: (()+0x7ea5) [0x7f4f73f78ea5]
14: (clone()+0x6d) [0x7f4f72e3c8dd]
0> 2023-08-01T22:36:02.754+0800 7f4f51cfd700 -1 *** Caught signal (Aborted) **
in thread 7f4f51cfd700 thread_name:tp_osd_tp
ceph version 15.2.14 (cd3bb7e87a2f62c1b862ff3fd8b1eec13391a5be) octopus (stable)
1: (()+0xf630) [0x7f4f73f80630]
2: (gsignal()+0x37) [0x7f4f72d74387]
3: (abort()+0x148) [0x7f4f72d75a78]
4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x19b) [0x55d30e0ea2b0]
5: (()+0x4df429) [0x55d30e0ea429]
6: (()+0x8e773d) [0x55d30e4f273d]
7: (ReplicatedBackend::recover_object(hobject_t const&, eversion_t, std::shared_ptr<ObjectContext>, std::shared_ptr<ObjectContext>, PGBackend::RecoveryHandle*)+0x16f) [0x55d30e4ead1f]
8: (PrimaryLogPG::prep_object_replica_pushes(hobject_t const&, eversion_t, PGBackend::RecoveryHandle*, bool*)+0x559) [0x55d30e2ea2a9]
9: (PrimaryLogPG::recover_replicas(unsigned long, ThreadPool::TPHandle&, bool*)+0x1330) [0x55d30e32d2c0]
10: (PrimaryLogPG::start_recovery_ops(unsigned long, ThreadPool::TPHandle&, unsigned long*)+0x106) [0x55d30e334bb6]
11: (OSD::do_recovery(PG*, unsigned int, unsigned long, ThreadPool::TPHandle&)+0x232) [0x55d30e1cde92]
12: (ceph::osd::scheduler::PGRecovery::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x19) [0x55d30e40a089]
13: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x143a) [0x55d30e1e9e6a]
14: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5b6) [0x55d30e7d5a56]
15: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55d30e7d85a0]
16: (()+0x7ea5) [0x7f4f73f78ea5]
17: (clone()+0x6d) [0x7f4f72e3c8dd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
到了这里,关于ceph osd因为ReplicatedBackend::recover_object crash的文章就介绍完了。如果您还想了解更多内容,请在右上角搜索TOY模板网以前的文章或继续浏览下面的相关文章,希望大家以后多多支持TOY模板网!