Troubleshooting a calico-kube-controllers startup failure

Fault description

calico-kube-controllers is unhealthy and keeps restarting.

The pod logs are as follows:

2023-02-21 01:26:47.085 [INFO][1] main.go 92: Loaded configuration from environment config=&config.Config{LogLevel:"info", WorkloadEndpointWorkers:1, ProfileWorkers:1, PolicyWorkers:1, NodeWorkers:1, Kubeconfig:"", DatastoreType:"kubernetes"}
W0221 01:26:47.086980       1 client_config.go:615] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
2023-02-21 01:26:47.087 [INFO][1] main.go 113: Ensuring Calico datastore is initialized
2023-02-21 01:26:47.106 [INFO][1] main.go 153: Getting initial config snapshot from datastore
2023-02-21 01:26:47.120 [INFO][1] main.go 156: Got initial config snapshot
2023-02-21 01:26:47.120 [INFO][1] watchersyncer.go 89: Start called
2023-02-21 01:26:47.120 [INFO][1] main.go 173: Starting status report routine
2023-02-21 01:26:47.120 [INFO][1] main.go 182: Starting Prometheus metrics server on port 9094
2023-02-21 01:26:47.120 [INFO][1] main.go 418: Starting controller ControllerType="Node"
2023-02-21 01:26:47.120 [INFO][1] watchersyncer.go 127: Sending status update Status=wait-for-ready
2023-02-21 01:26:47.120 [INFO][1] node_syncer.go 65: Node controller syncer status updated: wait-for-ready
2023-02-21 01:26:47.120 [INFO][1] watchersyncer.go 147: Starting main event processing loop
2023-02-21 01:26:47.120 [INFO][1] watchercache.go 174: Full resync is required ListRoot="/calico/ipam/v2/assignment/"
2023-02-21 01:26:47.120 [INFO][1] node_controller.go 143: Starting Node controller
2023-02-21 01:26:47.121 [INFO][1] watchercache.go 174: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/nodes"
2023-02-21 01:26:47.121 [INFO][1] resources.go 349: Main client watcher loop
2023-02-21 01:26:47.121 [ERROR][1] status.go 138: Failed to write readiness file: open /status/status.json: permission denied
2023-02-21 01:26:47.121 [WARNING][1] status.go 66: Failed to write status error=open /status/status.json: permission denied
2023-02-21 01:26:47.121 [ERROR][1] status.go 138: Failed to write readiness file: open /status/status.json: permission denied
2023-02-21 01:26:47.121 [WARNING][1] status.go 66: Failed to write status error=open /status/status.json: permission denied
2023-02-21 01:26:47.124 [INFO][1] watchercache.go 271: Sending synced update ListRoot="/calico/ipam/v2/assignment/"
2023-02-21 01:26:47.125 [INFO][1] watchersyncer.go 127: Sending status update Status=resync
2023-02-21 01:26:47.125 [INFO][1] node_syncer.go 65: Node controller syncer status updated: resync
2023-02-21 01:26:47.125 [INFO][1] watchersyncer.go 209: Received InSync event from one of the watcher caches
2023-02-21 01:26:47.125 [ERROR][1] status.go 138: Failed to write readiness file: open /status/status.json: permission denied
2023-02-21 01:26:47.125 [WARNING][1] status.go 66: Failed to write status error=open /status/status.json: permission denied
2023-02-21 01:26:47.129 [INFO][1] watchercache.go 271: Sending synced update ListRoot="/calico/resources/v3/projectcalico.org/nodes"
2023-02-21 01:26:47.129 [ERROR][1] status.go 138: Failed to write readiness file: open /status/status.json: permission denied
2023-02-21 01:26:47.129 [WARNING][1] status.go 66: Failed to write status error=open /status/status.json: permission denied
2023-02-21 01:26:47.129 [INFO][1] watchersyncer.go 209: Received InSync event from one of the watcher caches
2023-02-21 01:26:47.129 [INFO][1] watchersyncer.go 221: All watchers have sync'd data - sending data and final sync
2023-02-21 01:26:47.129 [INFO][1] watchersyncer.go 127: Sending status update Status=in-sync
2023-02-21 01:26:47.129 [INFO][1] node_syncer.go 65: Node controller syncer status updated: in-sync
2023-02-21 01:26:47.137 [INFO][1] hostendpoints.go 90: successfully synced all hostendpoints
2023-02-21 01:26:47.221 [INFO][1] node_controller.go 159: Node controller is now running
2023-02-21 01:26:47.226 [INFO][1] ipam.go 69: Synchronizing IPAM data
2023-02-21 01:26:47.236 [INFO][1] ipam.go 78: Node and IPAM data is in sync

The problem is pinpointed here:

Failed to write status error=open /status/status.json: permission denied

Checking the directory inside the container

Tried to exec into the container, but the image ships without common commands such as cat and ls, so the problem could not be inspected from inside the container.
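
A possible workaround for such a minimal image (not used in this troubleshooting, and assuming the cluster and runtime support ephemeral containers) is to attach a throwaway busybox debug container:

kubectl -n kube-system debug -it calico-kube-controllers-9f49b98f6-njs2f \
  --image=busybox:1.36 --target=calico-kube-controllers -- sh
# If the runtime supports namespace targeting, the controller process is PID 1
# in the debug shell and its filesystem (including /status) is reachable via /proc/1/root:
ls -ld /proc/1/root/status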

Checking the configuration

Comparing the pod's configuration against other clusters shows no difference at all; it is identical:

[grg@i-A8259010 ~]$ kubectl describe pod calico-kube-controllers-9f49b98f6-njs2f -n kube-system
Name:                 calico-kube-controllers-9f49b98f6-njs2f
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 10.254.39.2/10.254.39.2
Start Time:           Thu, 16 Feb 2023 11:14:35 +0800
Labels:               k8s-app=calico-kube-controllers
                      pod-template-hash=9f49b98f6
Annotations:          cni.projectcalico.org/podIP: 10.244.29.73/32
                      cni.projectcalico.org/podIPs: 10.244.29.73/32
Status:               Running
IP:                   10.244.29.73
IPs:
  IP:           10.244.29.73
Controlled By:  ReplicaSet/calico-kube-controllers-9f49b98f6
Containers:
  calico-kube-controllers:
    Container ID:   docker://21594e3517a3fc8ffc5224496cec373117138acf5417d9a335a1c5e80e0c3802
    Image:          registry.custom.local:12480/kubeadm-ha/calico_kube-controllers:v3.19.1
    Image ID:       docker-pullable://registry.cn-beijing.aliyuncs.com/dotbalo/kube-controllers@sha256:2ff71ba65cd7fe10e183ad80725ad3eafb59899d6f1b2610446b90c84bf2425a
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Tue, 21 Feb 2023 09:34:06 +0800
      Finished:     Tue, 21 Feb 2023 09:35:15 +0800
    Ready:          False
    Restart Count:  1940
    Liveness:       exec [/usr/bin/check-status -l] delay=10s timeout=1s period=10s #success=1 #failure=6
    Readiness:      exec [/usr/bin/check-status -r] delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      ENABLED_CONTROLLERS:  node
      DATASTORE_TYPE:       kubernetes
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-55jbn (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  kube-api-access-55jbn:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 CriticalAddonsOnly op=Exists
                             node-role.kubernetes.io/master:NoSchedule
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                        From     Message
  ----     ------     ----                       ----     -------
  Warning  Unhealthy  31m (x15164 over 4d22h)    kubelet  Readiness probe failed: Failed to read status file /status/status.json: unexpected end of JSON input
  Warning  BackOff    6m23s (x23547 over 4d22h)  kubelet  Back-off restarting failed container
  Warning  Unhealthy  79s (x11571 over 4d22h)    kubelet  Liveness probe failed: Failed to read status file /status/status.json: unexpected end of JSON input

Comparing the image

The image version matches the other clusters, so the image is not the problem:

Image:          registry.custom.local:12480/kubeadm-ha/calico_kube-controllers:v3.19.1    
Image ID:       docker-pullable://registry.cn-beijing.aliyuncs.com/dotbalo/kube-controllers@sha256:2ff71ba65cd7fe10e183ad80725ad3eafb59899d6f1b2610446b90c84bf2425a

Checking for configuration differences with the other clusters

Comparing this machine against the other clusters, the notable difference is Docker: this node uses a pre-existing Docker installation at version 19, while the other machines run a freshly installed version 20.
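
For reference, the Docker engine version on each node can be compared with, for example:

docker version --format '{{.Server.Version}}'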

Solution

Since reinstalling Docker was not an option:

Restarting the pod: no effect.

Searching online (Baidu): no relevant information found.

Adjusting the calico-kube-controllers configuration

The manifest is located at /etc/kubernetes/plugins/network-plugin/calico-typha.yaml.

To work around the unwritable /status directory, add a volume mapping for it; a sketch of the change is shown below.

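The actual edit was captured only as a screenshot in the original post. As a minimal sketch, assuming the /status mount is backed by a hostPath directory at /var/run/calico/status (the directory created in the next step), the relevant part of the calico-kube-controllers Deployment in calico-typha.yaml might look like this:

# spec.template.spec of the calico-kube-controllers Deployment (sketch)
      containers:
        - name: calico-kube-controllers
          # ... image, env and probes unchanged ...
          volumeMounts:
            - name: status
              mountPath: /status
      volumes:
        - name: status
          hostPath:
            path: /var/run/calico/status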

Applying the configuration

# On the node, create the host directory backing the new /status mount and make it writable
mkdir -p /var/run/calico/status
chmod 777 /var/run/calico/status
# Re-apply the modified manifest
kubectl apply -f /etc/kubernetes/plugins/network-plugin/calico-typha.yaml

At this point the system recovered.
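
Recovery can be confirmed by checking that the pod becomes Ready and, if the hostPath sketch above was used, that the readiness file is now being written on the node, for example:

kubectl -n kube-system get pod -l k8s-app=calico-kube-controllers
# On the node running the pod, the readiness file should now exist:
cat /var/run/calico/status/status.json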
