# general
Hi all, I have a failed downstream cluster: all three control plane nodes crashed, while the worker nodes work fine. I am trying to restore the control plane nodes using this guide: https://www.suse.com/support/kb/doc/?id=000020695 At step 7, the new node does not reach the Active state, and in the kube-apiserver logs on the new control plane node we see many errors like:
W0811 07:48:41.061638       1 dispatcher.go:195] Failed calling webhook, failing closed rancher.cattle.io.clusters.management.cattle.io: failed calling webhook "rancher.cattle.io.clusters.management.cattle.io": failed to call webhook: Post "https://rancher-webhook.cattle-system.svc:443/v1/webhook/mutation/clusters.management.cattle.io?timeout=10s": dial tcp 10.43.195.200:443: connect: connection refused
The rancher-webhook pod is up and running and has no errors in its logs. In addition, the rancher-webhook service has a different IP (10.43.231.42) than the one kube-apiserver is trying to reach (10.43.195.200):
kubectl -n cattle-system get svc rancher-webhook
rancher-webhook   ClusterIP   10.43.231.42   <none>        443/TCP
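For anyone who wants to repeat this comparison, here is a minimal sketch. It assumes the kube-apiserver log lines look like the one above; `dialed_ip` and `check_webhook_ip` are hypothetical helper names, not part of kubectl or Rancher.

```shell
#!/bin/sh
# Sketch: compare the address kube-apiserver dialed with the Service's ClusterIP.
# dialed_ip and check_webhook_ip are hypothetical helper names.

# Reads log lines on stdin and prints the IP from the first
# "dial tcp IP:PORT" occurrence it finds.
dialed_ip() {
  sed -n 's/.*dial tcp \([0-9.][0-9.]*\):[0-9][0-9]*.*/\1/p' | head -n 1
}

# $1 = IP the apiserver dialed, $2 = current ClusterIP of the Service.
check_webhook_ip() {
  if [ "$1" = "$2" ]; then
    echo "match"
  else
    echo "mismatch: apiserver dialed $1, service is $2"
  fi
}

# Possible usage (needs cluster access, so it is left commented out):
#   current=$(kubectl -n cattle-system get svc rancher-webhook -o jsonpath='{.spec.clusterIP}')
#   dialed=$(dialed_ip < apiserver.log)   # apiserver.log: a saved copy of the kube-apiserver log
#   check_webhook_ip "$dialed" "$current"
```

With the values from this thread, the helper would report a mismatch between 10.43.195.200 and 10.43.231.42.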
Also, in the Rancher UI I see a red banner:
Internal error occurred: failed calling webhook "rancher.cattle.io.namespaces.create-non-kubesystem": failed to call webhook: Post "https://rancher-webhook.cattle-system.svc:443/v1/webhook/validation/namespaces?timeout=10s": context deadline exceeded
I also checked the Rancher logs and see that the snapshot was restored successfully:
2023/08/11 07:42:13 [INFO] Finished restoring snapshot [c-whxdl-rl-67qr6_2023-08-11T03:35:43Z] on all etcd hosts
I also checked the etcd state, and it looks OK:
root@master-1:/# etcdctl member list
b8c35b4cbb37e936, started, etcd-master-1, https://10.14.1.17:2380, https://10.14.1.17:2379, false
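As a rough sketch, the same check can be scripted by counting how many members report `started`. It assumes the default comma-separated output of `etcdctl member list` shown above; `started_members` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: summarize `etcdctl member list` output as "<started>/<total>".
# started_members is a hypothetical helper name.

# Reads the comma-separated member list on stdin; the second field is the status.
started_members() {
  awk -F', ' 'NF >= 2 { total++; if ($2 == "started") ok++ }
              END { printf "%d/%d\n", ok, total }'
}

# Possible usage on an etcd node (left commented out, needs a running etcd):
#   etcdctl member list | started_members
```

For the single member listed above this would print `1/1`.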
Can anyone help me figure out how to restore the cluster from an etcd snapshot, and what I am doing wrong? Thanks in advance! 🙌