# vsphere
b
Well, I also tried rotating the certs as advised in https://github.com/rancher/rancher/issues/41125 since one of the nodes was stuck in
Waiting for probes: kube-controller-manager, kube-scheduler
and then I tried to remove one of the nodes that was stuck in provisioning because the certificate rotation froze, and now the cluster status is:
waiting for all etcd machines to be deleted
... Yet, nothing's happening :/
After removing all control plane nodes except one, doing a cluster reset on it, and renewing the certs, I'm seeing
waiting for at least one control plane, etcd, and worker node to be registered
as the cluster status. What can satisfy this condition? The rancher agent pod is running in the cluster...
An entire day of troubleshooting later, I just removed all nodes and am trying to spin up the CP from an etcd backup, but I get
FATA[0000] starting kubernetes: preparing server: start managed database: snapshot missing hash but --skip-hash-check=false
and no, there is no zip extension in the snapshot name. It's a command invoked by Rancher when I triggered the restore from the UI.
Deleted all nodes and reprovisioned the CP from scratch, but it's stuck on
non-ready bootstrap machine(s) storclus-ashford-cp-7857d6899x564xz-45jpx and join url to be available on bootstrap node
🆘 Node is up, pods are running, cloud provider initialized... I am not sure what it's waiting for.
Back on
rkecontrolplane was already initialized but no etcd machines exist that have plans, indicating the etcd plane has been entirely replaced. Restoration from etcd snapshot is required.
so, how do I restore from S3 without the
--skip-hash-check=false
error?
Any help? 🙏
Well, I've managed to get the snapshot to restore and things seem to be working now 🙂