OpenStack + Ceph Cluster: Cleaning Up a Pool to Fix the pgs: xxx% pgs unknown Problem


Yesterday I removed the OSD nodes without emptying the pools first, and today the Ceph cluster is broken...

Run

ceph -s

which shows:

2022-05-07 08:10:08.273 7f998ddeb700 -1 asok(0x7f9988000bf0) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/guests/ceph-client.admin.230947.140297388437176.asok': (2) No such file or directory
  cluster:
    id:     0efd6fbe-870b-41c4-92b1-d1a028d397f1
    health: HEALTH_WARN
            5 pool(s) have no replicas configured
            Reduced data availability: 640 pgs inactive
            1/3 mons down, quorum node1,node2
 
  services:
    mon: 3 daemons, quorum node1,node2 (age 14h), out of quorum: node1_bak
    mgr: node1(active, since 14h), standbys: node2
    osd: 6 osds: 6 up (since 14h), 6 in (since 22h)
 
  data:
    pools:   5 pools, 640 pgs
    objects: 0 objects, 0 B
    usage:   8.4 GiB used, 5.2 TiB / 5.2 TiB avail
    pgs:     100.000% pgs unknown
             640 unknown

You can see pgs: 100.000% pgs unknown, so every placement group is in an unknown state.
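
To see exactly which PGs are stuck before changing anything, the standard Ceph health commands can be used (an optional check, not part of the original fix):

# show the health details and any stuck placement groups
ceph health detail
ceph pg dump_stuck inactive
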
Then run

ceph df

which shows:

2022-05-07 08:15:56.872 7f96a3477700 -1 asok(0x7f969c000bf0) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/guests/ceph-client.admin.54160.140284839079608.asok': (2) No such file or directory
RAW STORAGE:
    CLASS     SIZE        AVAIL       USED        RAW USED     %RAW USED 
    hdd       5.2 TiB     5.2 TiB     2.4 GiB      8.4 GiB          0.16 
    TOTAL     5.2 TiB     5.2 TiB     2.4 GiB      8.4 GiB          0.16 
 
POOLS:
    POOL        ID     PGS     STORED     OBJECTS     USED     %USED     MAX AVAIL 
    images       1     128        0 B           0      0 B         0           0 B 
    volumes      2     128        0 B           0      0 B         0           0 B 
    backups      3     128        0 B           0      0 B         0           0 B 
    vms          4     128        0 B           0      0 B         0           0 B 
    cache        5     128        0 B           0      0 B         0           0 B 

All pools show 0 B of allocated space, which suggests the OSDs are not serving the pools at all.
Check the OSD layout:

ceph osd tree
2022-05-07 08:17:38.461 7f917c55c700 -1 asok(0x7f9174000bf0) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/guests/ceph-client.admin.54288.140262693154488.asok': (2) No such file or directory
ID  CLASS WEIGHT  TYPE NAME           STATUS REWEIGHT PRI-AFF 
 -9             0 root vm-disk                                
 -8             0 root cache-disk                             
 -7             0 root hdd-disk                               
 -1       3.25797 root default                                
-15       1.62898     host computer                           
  1   hdd 0.58600         osd.1           up  1.00000 1.00000 
  3   hdd 0.16399         osd.3           up  1.00000 1.00000 
  5   hdd 0.87900         osd.5           up  1.00000 1.00000 
-13       1.62898     host controller                         
  0   hdd 0.58600         osd.0           up  1.00000 1.00000 
  2   hdd 0.16399         osd.2           up  1.00000 1.00000 
  4   hdd 0.87900         osd.4           up  1.00000 1.00000 
 -3             0     host node1                              
 -5             0     host node2                              

You can see that node1 and node2 no longer hold any OSDs; the OSDs now sit under the hosts computer and controller (the external-IP hostnames).

This is most likely because the CRUSH map is wrong. Re-apply the newcrushmap that was compiled earlier and lives in /etc/ceph:

cd /etc/ceph
ceph osd setcrushmap -i newcrushmap
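
If a previously compiled newcrushmap is not at hand, one can be produced by dumping and decompiling the live CRUSH map with crushtool, moving the OSDs back under the intended hosts and roots by hand, and recompiling. A rough sketch (the edit step is manual):

# dump, decompile, hand-edit, recompile and apply the CRUSH map
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
# ... edit crushmap.txt so the OSDs sit under host node1/node2 and the right roots ...
crushtool -c crushmap.txt -o newcrushmap
ceph osd setcrushmap -i newcrushmap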

Run

ceph df

again, and the storage usage is back:

2022-05-07 08:26:48.930 7f946fa78700 -1 asok(0x7f9468000bf0) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/guests/ceph-client.admin.54898.140275376729784.asok': (2) No such file or directory
RAW STORAGE:
    CLASS     SIZE        AVAIL       USED        RAW USED     %RAW USED 
    hdd       5.2 TiB     5.2 TiB     2.4 GiB      8.4 GiB          0.16 
    TOTAL     5.2 TiB     5.2 TiB     2.4 GiB      8.4 GiB          0.16 
 
POOLS:
    POOL        ID     PGS     STORED      OBJECTS     USED        %USED     MAX AVAIL 
    images       1     128     1.7 GiB         222     1.7 GiB      0.15       1.1 TiB 
    volumes      2     128     495 MiB         146     495 MiB      0.04       1.1 TiB 
    backups      3     128        19 B           3        19 B         0       1.1 TiB 
    vms          4     128         0 B           0         0 B         0       3.5 TiB 
    cache        5     128     104 KiB         363     104 KiB         0       317 GiB 

Now run

ceph -s
2022-05-07 08:27:01.426 7f7cf23db700 -1 asok(0x7f7cec000bf0) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/guests/ceph-client.admin.54928.140174512107192.asok': (2) No such file or directory
  cluster:
    id:     0efd6fbe-870b-41c4-92b1-d1a028d397f1
    health: HEALTH_WARN
            5 pool(s) have no replicas configured
            Reduced data availability: 128 pgs inactive
            application not enabled on 3 pool(s)
            1/3 mons down, quorum node1,node2
 
  services:
    mon: 3 daemons, quorum node1,node2 (age 14h), out of quorum: node1_bak
    mgr: node1(active, since 14h), standbys: node2
    osd: 6 osds: 6 up (since 14h), 6 in (since 22h)
 
  data:
    pools:   5 pools, 640 pgs
    objects: 734 objects, 2.2 GiB
    usage:   8.4 GiB used, 5.2 TiB / 5.2 TiB avail
    pgs:     20.000% pgs unknown
             512 active+clean
             128 unknown

There is still pgs: 20.000% pgs unknown, i.e. 20% of the PGs are in an unknown state. This is the fallout from deleting the OSDs without first cleaning up the vms pool.
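
For future reference, the usual safe order for removing an OSD (which avoids ending up here again) is: mark it out, wait for the data to migrate, stop the daemon, then purge it. A sketch, using osd.5 purely as an example:

ceph osd out 5
ceph -w                      # wait until all PGs are active+clean again
systemctl stop ceph-osd@5
ceph osd purge 5 --yes-i-really-mean-it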

To avoid further surprises, first set the hostnames back.

On node 1:

hostnamectl set-hostname node1

On node 2:

hostnamectl set-hostname node2

Then, on node 1, delete the pool:

ceph osd pool delete vms vms --yes-i-really-really-mean-it
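
If this command is refused because pool deletion is disabled, the monitors need mon_allow_pool_delete turned on first. Whether that step was needed here is not shown; the usual way is:

ceph config set mon mon_allow_pool_delete true
# or, on older releases:
# ceph tell 'mon.*' injectargs '--mon-allow-pool-delete=true'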

Check the pools:

ceph osd pool ls

The vms pool is gone. Check ceph -s again:

[root@controller ceph(keystone)]# ceph -s
2022-05-07 08:40:41.131 7f86eb0dc700 -1 asok(0x7f86e4000bf0) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/guests/ceph-client.admin.55936.140217327562424.asok': (2) No such file or directory
  cluster:
    id:     0efd6fbe-870b-41c4-92b1-d1a028d397f1
    health: HEALTH_WARN
            4 pool(s) have no replicas configured
            application not enabled on 3 pool(s)
            1/3 mons down, quorum node1,node2
 
  services:
    mon: 3 daemons, quorum node1,node2 (age 14h), out of quorum: node1_bak
    mgr: node1(active, since 14h), standbys: node2
    osd: 6 osds: 6 up (since 14h), 6 in (since 22h)
 
  data:
    pools:   4 pools, 512 pgs
    objects: 734 objects, 2.2 GiB
    usage:   8.4 GiB used, 5.2 TiB / 5.2 TiB avail
    pgs:     512 active+clean

The PG anomaly is resolved; what remains is to recreate the vms pool:

ceph osd pool create vms 128 128

Set the pool's CRUSH rule:

ceph osd pool set vms crush_rule vm-disk
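
While recreating the pool it is also worth clearing the "application not enabled" warning from the health output. These pools back OpenStack block devices, so tagging them with the rbd application is the usual choice (shown for vms; the other affected pools can be tagged the same way), and listing the CRUSH rules first confirms that vm-disk actually exists:

ceph osd crush rule ls
ceph osd pool application enable vms rbd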

Check the OSD layout again:

ceph osd tree

The rules are now applied correctly:

[root@controller ceph(keystone)]# ceph osd tree
2022-05-07 08:44:36.093 7fcf9ed4c700 -1 asok(0x7fcf98000bf0) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/guests/ceph-client.admin.56195.140529585106616.asok': (2) No such file or directory
ID CLASS WEIGHT  TYPE NAME       STATUS REWEIGHT PRI-AFF 
-9       1.74597 root vm-disk                            
 4   hdd 0.87299     osd.4           up  1.00000 1.00000 
 5   hdd 0.87299     osd.5           up  1.00000 1.00000 
-8       1.74597 root cache-disk                         
 2   hdd 0.87299     osd.2           up  1.00000 1.00000 
 3   hdd 0.87299     osd.3           up  1.00000 1.00000 
-7       1.09000 root hdd-disk                           
 0   hdd 0.54500     osd.0           up  1.00000 1.00000 
 1   hdd 0.54500     osd.1           up  1.00000 1.00000 
-1       3.25800 root default                            
-3       1.62900     host node1                          
 0   hdd 0.58600         osd.0       up  1.00000 1.00000 
 2   hdd 0.16399         osd.2       up  1.00000 1.00000 
 4   hdd 0.87900         osd.4       up  1.00000 1.00000 
-5       1.62900     host node2                          
 1   hdd 0.58600         osd.1       up  1.00000 1.00000 
 3   hdd 0.16399         osd.3       up  1.00000 1.00000 
 5   hdd 0.87900         osd.5       up  1.00000 1.00000 

What remains is the admin-socket error that every ceph command keeps printing:

2022-05-07 08:44:36.093 7fcf9ed4c700 -1 asok(0x7fcf98000bf0) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/guests/ceph-client.admin.56195.140529585106616.asok': (2) No such file or directory
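
The message comes from the librados client trying to create its admin socket under /var/run/ceph/guests/, a path normally set in the [client] section of ceph.conf for the OpenStack/qemu integration. The exact contents of this cluster's ceph.conf are not shown, so the snippet below is an assumption; the immediate fix is simply to recreate the missing directories:

# ceph.conf is assumed to contain something like:
#   [client]
#   admin socket = /var/run/ceph/guests/$cluster-$type.$id.$pid.$cctid.asok
#   log file = /var/log/qemu/qemu-guest-$pid.log
# so recreate those directories and make them writable for the qemu clients:
mkdir -p /var/run/ceph/guests/ /var/log/qemu/
chmod -R 777 /var/run/ceph/guests/ /var/log/qemu/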

Write a script that restarts Ceph and recreates the missing directories. On node 1:

cd /opt/sys_sh/
vi restart_ceph.sh
#!/bin/bash
mondirfile=/var/lib/ceph/mon/ceph-node1/store.db
mondir=/var/lib/ceph/mon/*
runguest=/var/run/ceph/guests/
logkvm=/var/log/qemu/
crushmap=/etc/ceph/newcrushmap
host=node1
host2=controller
echo "修改主机名为$host"
hostnamectl set-hostname $host
cd /etc/ceph
echo "检测ceph-mon服务异常并恢复重启"
if [ "$(netstat -nltp|grep ceph-mon|grep 6789|wc -l)" -eq "0" ]; then
    sleep 1
    if [ -e "$mondirfile" ]; then
        sleep 1
    else
        sleep 1
        rm -rf $mondir
        ceph-mon  --cluster ceph  -i $host  --mkfs  --monmap  /etc/ceph/monmap  --keyring  /etc/ceph/monkeyring  -c  /etc/ceph/ceph.conf
        chown -R ceph:ceph $mondir
    fi
    systemctl reset-failed ceph-mon@node1.service
    systemctl start ceph-mon@node1.service
else
    sleep 1
fi
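
# note: the monmap and keyring passed to --mkfs above are assumed to have been
# exported earlier, for example with:
#   ceph mon getmap -o /etc/ceph/monmap
#   ceph auth get mon. -o /etc/ceph/monkeyring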

if [ "$(netstat -nltp|grep ceph-mon|grep 6781|wc -l)" -eq "0" ]; then
sleep 1
ceph-mon -i node1_bak --public-addr 10.0.0.2:6781
else
sleep 1
fi

echo "重启ceph-osd和相关所有服务"
if [ "$(ps -aux|grep ceph-mgr|wc -l)" -eq "1" ]; then
    sleep 1
    systemctl reset-failed ceph-mgr@node1.service
    systemctl start ceph-mgr@node1.service
else
    sleep 1
fi

if [ "$(ps -e|grep ceph-osd|wc -l)" -eq "$(lsblk |grep osd|wc -l)" ]; then
    sleep 1
else
    sleep 1
    systemctl reset-failed ceph-osd@0.service
    systemctl start ceph-osd@0.service
    systemctl reset-failed ceph-osd@2.service
    systemctl start ceph-osd@2.service
    systemctl reset-failed ceph-osd@4.service
    systemctl start ceph-osd@4.service
fi

if [ -d "$runguest" -a -d "$logkvm" ]; then
    sleep 1
else
    sleep 1    
    mkdir -p $runguest $logkvm
    chown 777 -R $runguest $logkvm
fi

echo "重写ceph存储规则"
ceph osd setcrushmap -i $crushmap
echo "修改主机名为$host2"
hostnamectl set-hostname $host2
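
A minimal way to run the script once it is saved (running it automatically at boot via cron is optional and only a suggestion):

chmod +x /opt/sys_sh/restart_ceph.sh
bash /opt/sys_sh/restart_ceph.sh
# optional: run it at every boot via /etc/crontab
# echo "@reboot root bash /opt/sys_sh/restart_ceph.sh" >> /etc/crontab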

Run ceph -s again and the error is gone:

[root@node1 sys_sh(keystone)]# ceph -s
  cluster:
    id:     0efd6fbe-870b-41c4-92b1-d1a028d397f1
    health: HEALTH_WARN
            5 pool(s) have no replicas configured
            application not enabled on 3 pool(s)
 
  services:
    mon: 3 daemons, quorum node1,node2,node1_bak (age 70s)
    mgr: node1(active, since 15h), standbys: node2
    osd: 6 osds: 6 up (since 15h), 6 in (since 23h)
 
  data:
    pools:   5 pools, 640 pgs
    objects: 734 objects, 2.2 GiB
    usage:   8.4 GiB used, 5.2 TiB / 5.2 TiB avail
    pgs:     640 active+clean

Now go to node 2 and run ceph -s:

2022-05-07 09:51:13.625 7f1bb62a0700 -1 asok(0x7f1bb0000bf0) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/guests/ceph-client.admin.259739.139756893646520.asok': (2) No such file or directory
  cluster:
    id:     0efd6fbe-870b-41c4-92b1-d1a028d397f1
    health: HEALTH_WARN
            5 pool(s) have no replicas configured
            application not enabled on 3 pool(s)
 
  services:
    mon: 3 daemons, quorum node1,node2,node1_bak (age 48m)
    mgr: node1(active, since 15h), standbys: node2
    osd: 6 osds: 6 up (since 15h), 6 in (since 23h)
 
  data:
    pools:   5 pools, 640 pgs
    objects: 734 objects, 2.2 GiB
    usage:   8.4 GiB used, 5.2 TiB / 5.2 TiB avail
    pgs:     640 active+clean

It prints the same admin-socket error:

2022-05-07 09:51:13.625 7f1bb62a0700 -1 asok(0x7f1bb0000bf0) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/guests/ceph-client.admin.259739.139756893646520.asok': (2) No such file or directory

Write the same kind of restart script on node 2:

cd /opt/sys_sh/
vi restart_ceph.sh
#!/bin/bash
mondirfile=/var/lib/ceph/mon/ceph-node2/store.db
mondir=/var/lib/ceph/mon/*
runguest=/var/run/ceph/guests/
logkvm=/var/log/qemu/
crushmap=/etc/ceph/newcrushmap
host=node2
host2=computer
echo "修改主机名为$host"
hostnamectl set-hostname $host
cd /etc/ceph
echo "检测ceph-mon服务异常并恢复重启"
if [ "$(netstat -nltp|grep ceph-mon|grep 6789|wc -l)" -eq "0" ]; then
    sleep 1
    if [ -e "$mondirfile" ]; then
        sleep 1
    else
        sleep 1
        rm -rf $mondir
        ceph-mon  --cluster ceph  -i $host  --mkfs  --monmap  /etc/ceph/monmap  --keyring  /etc/ceph/monkeyring  -c  /etc/ceph/ceph.conf
        chown -R ceph:ceph $mondir
    fi
    systemctl reset-failed ceph-mon@node2.service
    systemctl start ceph-mon@node2.service
else
    sleep 1
fi

echo "重启ceph-osd和相关所有服务"
if [ "$(ps -aux|grep ceph-mgr|wc -l)" -eq "1" ]; then
    sleep 1
    systemctl reset-failed ceph-mgr@node2.service
    systemctl start ceph-mgr@node2.service
else
    sleep 1
fi

if [ "$(ps -e|grep ceph-osd|wc -l)" -eq "$(lsblk |grep osd|wc -l)" ]; then
    sleep 1
else
    sleep 1
    systemctl reset-failed ceph-osd@1.service
    systemctl start ceph-osd@1.service
    systemctl reset-failed ceph-osd@3.service
    systemctl start ceph-osd@3.service
    systemctl reset-failed ceph-osd@5.service
    systemctl start ceph-osd@5.service
fi

if [ -d "$runguest" -a -d "$logkvm" ]; then
    sleep 1
else
    sleep 1    
    mkdir -p $runguest $logkvm
    chown 777 -R $runguest $logkvm
fi
echo "修改主机名为$host2"
hostnamectl set-hostname $host2

Run ceph -s and the problem is gone:

[root@node2 sys_sh]# ceph -s
  cluster:
    id:     0efd6fbe-870b-41c4-92b1-d1a028d397f1
    health: HEALTH_WARN
            5 pool(s) have no replicas configured
            application not enabled on 3 pool(s)
 
  services:
    mon: 3 daemons, quorum node1,node2,node1_bak (age 59m)
    mgr: node1(active, since 16h), standbys: node2
    osd: 6 osds: 6 up (since 16h), 6 in (since 24h)
 
  data:
    pools:   5 pools, 640 pgs
    objects: 734 objects, 2.2 GiB
    usage:   8.4 GiB used, 5.2 TiB / 5.2 TiB avail
    pgs:     640 active+clean

Back on the primary node, check the compute services; the compute node is connected and reporting up:

openstack compute service list
[root@controller ~(keystone)]# openstack compute service list
+----+----------------+------------+----------+---------+-------+----------------------------+
| ID | Binary         | Host       | Zone     | Status  | State | Updated At                 |
+----+----------------+------------+----------+---------+-------+----------------------------+
|  3 | nova-console   | controller | internal | enabled | up    | 2022-05-07T02:04:04.000000 |
|  5 | nova-conductor | controller | internal | enabled | up    | 2022-05-07T02:04:05.000000 |
|  7 | nova-scheduler | controller | internal | enabled | up    | 2022-05-07T02:04:00.000000 |
| 12 | nova-compute   | controller | nova     | enabled | up    | 2022-05-07T02:04:07.000000 |
| 13 | nova-compute   | computer   | nova     | enabled | up    | 2022-05-07T02:04:03.000000 |
+----+----------------+------------+----------+---------+-------+----------------------------+

That wraps up cleaning up the pool and resolving the pgs: xxx% pgs unknown problem in the OpenStack + Ceph cluster.
