brash-waitress-85312
06/26/2025, 5:04 PM
The kube-scheduler and kube-controller-manager certificates were not rotated automatically by Rancher.
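To confirm that expiry was the trigger, you can compare each certificate's `notAfter` date with the current time. The following is a minimal Python sketch. It assumes you have already run `openssl x509 -enddate -noout -in <cert>` on a control plane node. The helper name `days_until_expiry` and the RKE2 certificate paths in its comments are assumptions and are not confirmed in this thread.

```python
import ssl
import time
from typing import Optional


def days_until_expiry(enddate_line: str, now: Optional[float] = None) -> float:
    """Return days until a certificate expires (negative if already expired).

    Hypothetical helper. `enddate_line` is one line of output from
    `openssl x509 -enddate -noout -in <cert>`, for example
    "notAfter=Jun 26 12:00:00 2025 GMT".
    """
    prefix = "notAfter="
    line = enddate_line.strip()
    if not line.startswith(prefix):
        raise ValueError(f"unexpected openssl output: {enddate_line!r}")
    # ssl.cert_time_to_seconds parses the "Mon DD HH:MM:SS YYYY GMT" format
    # and returns seconds since the epoch (UTC).
    expires_at = ssl.cert_time_to_seconds(line[len(prefix):])
    if now is None:
        now = time.time()
    return (expires_at - now) / 86400


if __name__ == "__main__":
    # Assumed RKE2 locations on a server node (not confirmed in this thread):
    #   /var/lib/rancher/rke2/server/tls/kube-scheduler/kube-scheduler.crt
    #   /var/lib/rancher/rke2/server/tls/kube-controller-manager/kube-controller-manager.crt
    sample = "notAfter=Jun 26 12:00:00 2025 GMT"  # illustrative value only
    print(f"{days_until_expiry(sample):.1f} days remaining")
```

A negative result means the certificate had already expired when the command ran.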
• The etcd nodes were failing to start because they couldn't communicate with the control plane nodes.
• So I assumed the cluster had failed and everything had gone haywire. Since I had etcd snapshots from the day before, I decided to perform disaster recovery (DR).
• I first followed this guide, which asks you to remove the control plane nodes from Rancher:
◦ https://support.tools/post/rke2-with-rancher-disaster-recovery/
• That did not work, so I then followed this guide:
◦ https://ranchermanager.docs.rancher.com/how-to-guides/new-user-guides/backup-restore-and-dis[…]tore-rancher-launched-kubernetes-clusters-from-backup
Current situation of the cluster:
1. In Rancher, the cluster is left with only the 3 worker nodes; all of the control plane (CP) and etcd nodes have been removed from Rancher.
2. When I try to add a node to the cluster by running the registration command copied from Rancher, it fails with the following error:
curl -fL https://rancher.internal/system-agent-install.sh | sudo sh -s - --server https://rancher.internal --label 'cattle.io/os=linux' --token <token> --ca-checksum <checksum> --etcd --controlplane --worker
[INFO] Label: cattle.io/os=linux
[INFO] Role requested: etcd
[INFO] Role requested: controlplane
[INFO] Role requested: worker
[INFO] CA strict verification is set to false
[INFO] Using default agent configuration directory /etc/rancher/agent
[INFO] Using default agent var directory /var/lib/rancher/agent
[INFO] Determined CA is not necessary to connect to Rancher
[INFO] Successfully tested Rancher connection
[INFO] Downloading rancher-system-agent binary from https://rancher.internal/assets/rancher-system-agent-amd64
[INFO] Successfully downloaded the rancher-system-agent binary.
[INFO] Downloading rancher-system-agent-uninstall.sh script from https://rancher.internal/assets/system-agent-uninstall.sh
[INFO] Successfully downloaded the rancher-system-agent-uninstall.sh script.
[INFO] Generating Cattle ID
curl: (28) Operation timed out after 60002 milliseconds with 0 bytes received
[ERROR] 000 received while downloading Rancher connection information. Sleeping for 5 seconds and trying again
3. After a little troubleshooting, I found that the system-agent-install.sh script, which is run to register a node with the cluster, gets stuck at this command:
curl --connect-timeout 60 --max-time 60 --write-out '%{http_code}\n' -sS -H 'Authorization: Bearer <token>' -H 'X-Cattle-Id: f8bcebdca8c1dcce980ee7d67b583b5b3db64419bc3a0e130f8a1369a8a395a' -H 'X-Cattle-Role-Etcd: true' -H 'X-Cattle-Role-Control-Plane: true' -H 'X-Cattle-Role-Worker: true' -H 'X-Cattle-Node-Name: <eradicated>' -H 'X-Cattle-Address: ' -H 'X-Cattle-Internal-Address: <eradicated>' -H 'X-Cattle-Labels: cattle.io/os=linux' -H 'X-Cattle-Taints: ' https://rancher.internal/v3/connect/agent -o /var/lib/rancher/agent/rancher2_connection_info.json
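To narrow down whether the hang is a network problem or a case of Rancher accepting the request and never answering, the same call can be replayed outside the install script with an explicit timeout. The following is a minimal Python sketch that uses only the standard library. `probe_connect` is a hypothetical helper. It returns `0` in the cases where curl's `%{http_code}` prints `000`, which is what the script logged.

```python
import http.client
import ssl
import urllib.error
import urllib.request
from typing import Dict, Optional


def probe_connect(url: str, headers: Dict[str, str], timeout: float = 60.0,
                  context: Optional[ssl.SSLContext] = None) -> int:
    """Hypothetical helper: GET `url` with `headers` and return the HTTP status.

    Returns 0 when no HTTP response is received (timeout, refused connection,
    DNS or TLS failure), the situation curl reports as http_code 000.
    """
    request = urllib.request.Request(url, headers=headers, method="GET")
    try:
        with urllib.request.urlopen(request, timeout=timeout, context=context) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 401 or 500.
        return err.code
    except (OSError, http.client.HTTPException):
        # URLError, socket timeouts and connection resets are OSError subclasses;
        # malformed responses raise http.client.HTTPException.
        return 0
```

For example, `probe_connect("https://rancher.internal/v3/connect/agent", {"Authorization": "Bearer <token>", "X-Cattle-Role-Etcd": "true"}, timeout=60)` repeats the request with a subset of the headers above. If `rancher.internal` uses a private CA, pass `context=ssl.create_default_context(cafile="<path-to-ca>")`, where the path is a placeholder. A result of `0` after the full timeout means no answer ever arrived. Any real status, such as `401`, would instead show that Rancher is reachable and rejecting the request.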
Is there a bug in Rancher? Is it not possible to register any node after you remove all the control plane and etcd nodes?
mysterious-animal-29850
06/26/2025, 9:51 PM
brash-waitress-85312
06/27/2025, 10:07 AM
bumpy-portugal-40754
06/27/2025, 3:44 PM
brash-waitress-85312
06/27/2025, 4:38 PM
bumpy-portugal-40754
06/27/2025, 4:59 PM
brash-waitress-85312
06/27/2025, 7:59 PM
bumpy-portugal-40754
06/27/2025, 8:04 PM
brash-waitress-85312
06/28/2025, 4:58 PM
bumpy-portugal-40754
06/29/2025, 12:15 PM