Rancher Users #general

astonishing-hair-28835

08/11/2023, 8:16 AM

Hi all, I have a failed downstream cluster: all three control plane nodes crashed, while the worker nodes work fine. I am trying to restore the control plane nodes using this guide (https://www.suse.com/support/kb/doc/?id=000020695). At step 7, the new node does not reach the Active state, and in the kube-apiserver logs on the new control plane node we see many errors like:

W0811 07:48:41.061638       1 dispatcher.go:195] Failed calling webhook, failing closed rancher.cattle.io.clusters.management.cattle.io: failed calling webhook "rancher.cattle.io.clusters.management.cattle.io": failed to call webhook: Post "https://rancher-webhook.cattle-system.svc:443/v1/webhook/mutation/clusters.management.cattle.io?timeout=10s": dial tcp 10.43.195.200:443: connect: connection refused

The rancher-webhook pod is up and running and has no errors in its logs. In addition, I see a different IP for the rancher-webhook service:

kubectl -n cattle-system get svc rancher-webhook
rancher-webhook   ClusterIP   10.43.231.42   <none>        443/TCP
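
The two addresses can be compared mechanically. Below is a minimal sketch in Python that uses the log line and the service row quoted in this message as hard-coded strings. The helper names `dialed_ip` and `service_cluster_ip` are hypothetical, chosen only for illustration, and the parsing assumes the default column layout of `kubectl get svc`.

```python
import re

# Tail of the kube-apiserver log line quoted above.
log_line = (
    'failed to call webhook: Post "https://rancher-webhook.cattle-system.svc:443/'
    'v1/webhook/mutation/clusters.management.cattle.io?timeout=10s": '
    'dial tcp 10.43.195.200:443: connect: connection refused'
)

# Row copied from `kubectl -n cattle-system get svc rancher-webhook`.
svc_line = "rancher-webhook   ClusterIP   10.43.231.42   <none>        443/TCP"


def dialed_ip(line):
    """Return the IP from a 'dial tcp IP:PORT' fragment, or None if absent."""
    match = re.search(r"dial tcp (\d+\.\d+\.\d+\.\d+):\d+", line)
    return match.group(1) if match else None


def service_cluster_ip(line):
    """Return the CLUSTER-IP column (third field) of a `kubectl get svc` row."""
    fields = line.split()
    return fields[2] if len(fields) >= 3 else None


dialed = dialed_ip(log_line)
current = service_cluster_ip(svc_line)
print(f"apiserver dialed:  {dialed}")
print(f"service ClusterIP: {current}")
print("match" if dialed == current else "mismatch")
```

With the values from this thread it reports a mismatch: the apiserver dials 10.43.195.200, while the service currently has 10.43.231.42.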

Also, in the Rancher UI I see a red banner:

Internal error occurred: failed calling webhook "rancher.cattle.io.namespaces.create-non-kubesystem": failed to call webhook: Post "https://rancher-webhook.cattle-system.svc:443/v1/webhook/validation/namespaces?timeout=10s": context deadline exceeded

I also checked the Rancher logs and see that the snapshot was restored successfully:

2023/08/11 07:42:13 [INFO] Finished restoring snapshot [c-whxdl-rl-67qr6_2023-08-11T03:35:43Z] on all etcd hosts

I also checked the etcd state, and it looks OK:

root@master-1:/# etcdctl member list
b8c35b4cbb37e936, started, etcd-master-1, https://10.14.1.17:2380, https://10.14.1.17:2379, false
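
For reference, the row above can be read field by field. This is a rough sketch in Python, assuming the default (simple) `etcdctl member list` output of etcd v3.4 or later, where fields are separated by a comma and a space in the order ID, status, name, peer URLs, client URLs, is-learner. The function name `parse_members` is hypothetical.

```python
# Row copied from the `etcdctl member list` output above.
member_list = """\
b8c35b4cbb37e936, started, etcd-master-1, https://10.14.1.17:2380, https://10.14.1.17:2379, false
"""


def parse_members(text):
    """Parse simple-format `etcdctl member list` output into a list of dicts."""
    members = []
    for line in text.splitlines():
        # Fields are joined with ", " (assumption about the simple output format).
        fields = line.strip().split(", ")
        if len(fields) != 6:
            continue  # skip blank or unexpected lines
        member_id, status, name, peer_urls, client_urls, learner = fields
        members.append({
            "id": member_id,
            "status": status,
            "name": name,
            "peer_urls": peer_urls,
            "client_urls": client_urls,
            "is_learner": learner == "true",
        })
    return members


members = parse_members(member_list)
for m in members:
    print(m["name"], m["status"], "learner" if m["is_learner"] else "voting")
print(f"members listed: {len(members)}")
```

For this output it reports one member, `etcd-master-1`, in the `started` state and not a learner.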

Can anyone help me restore the cluster from an etcd snapshot and point out what I am doing wrong? Thanks in advance! 🙌
