After restarting the cluster, node1 was found in the NotReady state
Troubleshooting:
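1. Check the server's physical resources:
free -mh
df -h
2. Check whether memory is exhausted and whether there is enough disk space. Both were within the normal range.
3. Check CPU usage with top. It was also within the acceptable range.
4. Then check the master components (scheduler, controller-manager, apiserver, etc.). All were running normally.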
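Steps 1 to 4 can be scripted roughly as follows. This is a minimal sketch with several assumptions:

- a Linux host that has `/proc/meminfo` and a POSIX `df`;
- for the last helper, a kubeadm-style cluster whose control-plane components run as pods in `kube-system`.

All function names and the 85% disk threshold are hypothetical choices. They are not part of the original procedure.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for steps 1-4 (names and threshold are illustrative).

# Steps 1/2: available memory in MiB (/proc/meminfo reports KiB).
mem_available_mib() {
  awk '/^MemAvailable:/ {print int($2 / 1024)}' /proc/meminfo
}

# Steps 1/2: usage percentage of the filesystem holding a path (default /).
disk_used_pct() {
  df -P "${1:-/}" | awk 'NR == 2 {sub("%", "", $5); print $5}'
}

# Step 2: print OK or WARN against an assumed 85% threshold.
check_disk() {
  local pct
  pct=$(disk_used_pct "${1:-/}")
  if [ "$pct" -gt 85 ]; then
    echo "WARN: ${1:-/} is ${pct}% full"
  else
    echo "OK: ${1:-/} is ${pct}% full"
  fi
}

# Step 4: read `kubectl get pods -n kube-system` output on stdin and print
# every pod whose STATUS column (3rd field) is not "Running".
not_running_pods() {
  awk 'NR > 1 && $3 != "Running" {print $1}'
}

# Usage:
#   mem_available_mib; check_disk /
#   top -b -n 1 | head -n 5                               # step 3: CPU
#   kubectl get pods -n kube-system | not_running_pods    # step 4, on the master
```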
5. View the node's details:
[root@master ~]# kubectl describe nodes node1
Name:               node1
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    disk=ssd
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=node1
                    kubernetes.io/os=linux
Annotations:        flannel.alpha.coreos.com/backend-data: {"VNI":1,"VtepMAC":"76:06:85:be:2e:f1"}
                    flannel.alpha.coreos.com/backend-type: vxlan
                    flannel.alpha.coreos.com/kube-subnet-manager: true
                    flannel.alpha.coreos.com/public-ip: 192.168.213.183
                    kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Sun, 27 Nov 2022 10:18:28 +0800
Taints:             node.kubernetes.io/unreachable:NoExecute
                    node.kubernetes.io/unreachable:NoSchedule
Unschedulable:      false
Lease:
  HolderIdentity:  node1
  AcquireTime:     <unset>
  RenewTime:       Thu, 09 Feb 2023 14:30:01 +0800
Conditions:
  Type                 Status   LastHeartbeatTime                 LastTransitionTime                Reason              Message
  ----                 ------   -----------------                 ------------------                ------              -------
  NetworkUnavailable   False    Wed, 08 Feb 2023 14:38:51 +0800   Wed, 08 Feb 2023 14:38:51 +0800   FlannelIsUp         Flannel is running on this node
  MemoryPressure       Unknown  Thu, 09 Feb 2023 14:26:23 +0800   Mon, 13 Feb 2023 09:15:51 +0800   NodeStatusUnknown   Kubelet stopped posting node status.
  DiskPressure         Unknown  Thu, 09 Feb 2023 14:26:23 +0800   Mon, 13 Feb 2023 09:15:51 +0800   NodeStatusUnknown   Kubelet stopped posting node status.
  PIDPressure          Unknown  Thu, 09 Feb 2023 14:26:23 +0800   Mon, 13 Feb 2023 09:15:51 +0800   NodeStatusUnknown   Kubelet stopped posting node status.
  Ready                Unknown  Thu, 09 Feb 2023 14:26:23 +0800   Mon, 13 Feb 2023 09:15:51 +0800   NodeStatusUnknown   Kubelet stopped posting node status.
Addresses:
  InternalIP:  192.168.213.139
  Hostname:    node1
Capacity:
  cpu:                2
  ephemeral-storage:  17394Mi
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             4002416Ki
  pods:               110
Allocatable:
  cpu:                2
  ephemeral-storage:  16415037823
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             3900016Ki
  pods:               110
System Info:
  Machine ID:                 7f16913a43d84397bd33fc081680947a
  System UUID:                41fc4d56-275d-b583-b585-db862b9a5cc8
  Boot ID:                    e3856941-2c2c-4afc-ae83-572b98bb1c82
  Kernel Version:             5.4.221-1.el7.elrepo.x86_64
  OS Image:                   CentOS Linux 7 (Core)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://19.3.15
  Kubelet Version:            v1.21.0
  Kube-Proxy Version:         v1.21.0
PodCIDR:                      10.244.1.0/24
PodCIDRs:                     10.244.1.0/24
Non-terminated Pods:          (4 in total)
  Namespace     Name                      CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------     ----                      ------------  ----------  ---------------  -------------  ---
  default       nginx-6799fc88d8-j2f5v    0 (0%)        0 (0%)      0 (0%)           0 (0%)         5d20h
  default       nginx-6799fc88d8-xstkz    0 (0%)        0 (0%)      0 (0%)           0 (0%)         5d20h
  kube-system   kube-flannel-ds-kvx26     100m (5%)     100m (5%)   50Mi (1%)        50Mi (1%)      79d
  kube-system   kube-proxy-gj29x          0 (0%)        0 (0%)      0 (0%)           0 (0%)         79d
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests   Limits
  --------           --------   ------
  cpu                100m (5%)  100m (5%)
  memory             50Mi (1%)  50Mi (1%)
  ephemeral-storage  0 (0%)     0 (0%)
  hugepages-1Gi      0 (0%)     0 (0%)
  hugepages-2Mi      0 (0%)     0 (0%)
Events:              <none>
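The Ready, MemoryPressure, DiskPressure and PIDPressure conditions are all Unknown. Their reason is NodeStatusUnknown ("Kubelet stopped posting node status."). The node lease was last renewed at Thu, 09 Feb 2023 14:30:01 +0800. Together these show that the kubelet on node1 has stopped working and is no longer reporting the node's status to the master.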
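The same conclusion can be reached without reading the full describe output. This is a minimal sketch that assumes kubectl is configured on the master. `unknown_conditions` is a hypothetical helper name, and the jsonpath query only selects the type and status of each node condition.

```shell
#!/usr/bin/env bash
# Hypothetical filter: reads "TYPE STATUS" pairs (one condition per line) on
# stdin and prints each condition type whose status is Unknown, which is what
# the node controller sets once the kubelet stops posting node status.
unknown_conditions() {
  awk '$2 == "Unknown" {print $1}'
}

# Usage on the master (assumes kubectl is configured for this cluster):
#   kubectl get node node1 \
#     -o jsonpath='{range .status.conditions[*]}{.type}{" "}{.status}{"\n"}{end}' \
#     | unknown_conditions
```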
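6. Log in to the machine hosting node1
Check the kubelet status (systemctl status kubelet).
The service appears to be running, but the messages below the status line show that it actually failed to start.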
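A kubelet that exits right after starting is restarted by systemd over and over. It can therefore look "started" at first glance. On kubeadm-installed nodes the kubelet unit typically has `Restart=always`, so such a unit usually shows `activating (auto-restart)`.

Below is a minimal sketch for detecting that state, assuming systemd. `kubelet_restart_loop` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical check: reads `systemctl show -p ActiveState -p SubState kubelet`
# output on stdin and succeeds if systemd is waiting to restart the unit,
# which is how a repeatedly crashing kubelet usually looks.
kubelet_restart_loop() {
  grep -qx 'SubState=auto-restart'
}

# Usage on node1:
#   if systemctl show -p ActiveState -p SubState kubelet | kubelet_restart_loop; then
#     journalctl -u kubelet --no-pager | grep 'Failed to run kubelet' | tail -n 1
#   fi
```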
Check the logs:
[root@node1 ~]# journalctl -u kubelet
The log contains the following error:
"Failed to run kubelet" err="failed to run Kubelet: misconfiguration: kubelet cgroup driver: \"systemd\" is different from docker cgroup driver: \"cgroupfs\""
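This shows that the kubelet and Docker use different cgroup drivers (systemd vs. cgroupfs), which is why the kubelet fails to start.
The fix here is to change Docker's cgroup driver so that it matches the kubelet's.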
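To confirm the mismatch directly, compare the driver each side is configured with:

- `docker info --format '{{.CgroupDriver}}'` prints Docker's driver.
- The kubelet's driver is read from the `cgroupDriver` field of its config file. This sketch assumes the file is `/var/lib/kubelet/config.yaml`, the usual location on kubeadm-built nodes. If the driver is set by a command-line flag instead, the file may not contain it.

The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for comparing the two cgroup drivers.

# Print the cgroupDriver value from a KubeletConfiguration YAML file.
# The default path is an assumption (typical for kubeadm-built nodes).
kubelet_cgroup_driver() {
  awk '$1 == "cgroupDriver:" {print $2}' "${1:-/var/lib/kubelet/config.yaml}"
}

# Succeed only if both drivers are non-empty and identical.
drivers_match() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}

# Usage on node1:
#   d=$(docker info --format '{{.CgroupDriver}}')
#   k=$(kubelet_cgroup_driver)
#   drivers_match "$d" "$k" || echo "mismatch: docker=$d kubelet=$k"
```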
Edit the Docker configuration file:
[root@node1 ~]# vim /etc/docker/daemon.json
Add a setting there that sets Docker's cgroup driver to systemd.
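The text does not show the exact configuration. The standard way to switch Docker to the systemd cgroup driver is the `exec-opts` key in `/etc/docker/daemon.json`.

The sketch below writes a file that contains only that key. `write_daemon_json` is a hypothetical helper. It overwrites the target file, so if `daemon.json` already holds other settings, merge the key in by hand instead.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a daemon.json that sets Docker's cgroup driver to
# systemd. WARNING: this overwrites the target file; if daemon.json already has
# other keys (registry mirrors, log options, ...), merge "exec-opts" in by hand.
write_daemon_json() {
  local target="${1:-/etc/docker/daemon.json}"
  mkdir -p "$(dirname "$target")"
  cat > "$target" <<'EOF'
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
}

# Usage on node1 (as root):
#   write_daemon_json
```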
Finally, reload systemd and restart Docker and the kubelet:
[root@node1 ~]# systemctl daemon-reload
[root@node1 ~]# systemctl restart docker
[root@node1 ~]# systemctl restart kubelet
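Right after the restarts it can take a few seconds for the kubelet to settle. Below is a minimal sketch for re-checking. `retry` is a hypothetical helper, and the 60-attempt limit is arbitrary.

A crash-looping kubelet can be briefly `active` between restarts. The node status seen from the master therefore remains the authoritative check.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command up to N times, one second apart,
# and succeed as soon as it succeeds once.
retry() {
  local attempts="$1" i=0
  shift
  while [ "$i" -lt "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Usage on node1 after the restarts:
#   docker info --format '{{.CgroupDriver}}'   # expected to print: systemd
#   retry 60 systemctl is-active --quiet kubelet && echo "kubelet is active"
```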
Back on the master node, verify the result (for example with kubectl get nodes):
node1 is now in the Ready state.
Article source: https://www.toymoban.com/news/detail-500621.html
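The final check can also be scripted. This is a minimal sketch that assumes kubectl is configured on the master. `node_ready` is a hypothetical helper that accepts only an exact `Ready` status, so a cordoned node (`Ready,SchedulingDisabled`) would not pass. `kubectl wait` can instead block until the condition holds.

```shell
#!/usr/bin/env bash
# Hypothetical filter: reads `kubectl get nodes --no-headers` output on stdin
# and succeeds if the named node's STATUS column is exactly "Ready".
node_ready() {
  awk -v n="$1" '$1 == n && $2 == "Ready" {ok = 1} END {exit !ok}'
}

# Usage on the master:
#   kubectl get nodes --no-headers | node_ready node1 && echo "node1 is Ready"
# Or block until the Ready condition is true (gives up after 120 seconds):
#   kubectl wait --for=condition=Ready node/node1 --timeout=120s
```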