eager-refrigerator-66976  05/15/2025, 5:49 PM
rancher-system-agent.service is not able to bootstrap the node.
here is what I see in the rancher-system-agent logs:
May 15 17:39:55 ip-172-23-107-94 rancher-system-agent[1291]: W0515 17:39:55.025335 1291 reflector.go:492] pkg/mod/k8s.io/client-go@v0.32.2/tools/cache/reflector.go:251: watch of *v1.Secret ended with: an error on the server ("unable to decode an event from the watch stream: stream error: stream ID 13; INTERNAL_ERROR; received from peer") has prevented the request from succeeding
May 15 17:41:56 ip-172-23-107-94 rancher-system-agent[1291]: W0515 17:41:56.588541 1291 reflector.go:492] pkg/mod/k8s.io/client-go@v0.32.2/tools/cache/reflector.go:251: watch of *v1.Secret ended with: an error on the server ("unable to decode an event from the watch stream: stream error: stream ID 17; INTERNAL_ERROR; received from peer") has prevented the request from succeeding
May 15 17:43:57 ip-172-23-107-94 rancher-system-agent[1291]: W0515 17:43:57.409457 1291 reflector.go:492] pkg/mod/k8s.io/client-go@v0.32.2/tools/cache/reflector.go:251: watch of *v1.Secret ended with: an error on the server ("unable to decode an event from the watch stream: stream error: stream ID 21; INTERNAL_ERROR; received from peer") has prevented the request from succeeding
So rke2 never gets provisioned there and I can't complete the recovery process.
Any ideas are much appreciated 🙏
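
A bit more signal from the node side can help with errors like this. A minimal sketch, assuming a standard rancher-system-agent install (the env file path and the CATTLE_LOGLEVEL variable are the ones the install script normally writes, so verify them on your node before relying on this):

# enable debug logging for rancher-system-agent, then follow its journal
echo 'CATTLE_LOGLEVEL=debug' | sudo tee -a /etc/systemd/system/rancher-system-agent.env
sudo systemctl restart rancher-system-agent
sudo journalctl -u rancher-system-agent -f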

creamy-pencil-82913  05/15/2025, 5:57 PM

eager-refrigerator-66976  05/15/2025, 6:00 PM
May 15 17:58:03 ip-172-23-107-94 rancher-system-agent[1470]: time="2025-05-15T17:58:03Z" level=debug msg="[K8s] Processing secret custom-6a69b19510e2-machine-plan in namespace fleet-default at generation 0 with resource version 173187"
May 15 17:58:08 ip-172-23-107-94 rancher-system-agent[1470]: time="2025-05-15T17:58:08Z" level=debug msg="[K8s] Processing secret custom-6a69b19510e2-machine-plan in namespace fleet-default at generation 0 with resource version 173187"
May 15 17:58:13 ip-172-23-107-94 rancher-system-agent[1470]: time="2025-05-15T17:58:13Z" level=debug msg="[K8s] Processing secret custom-6a69b19510e2-machine-plan in namespace fleet-default at generation 0 with resource version 173187"
May 15 17:58:18 ip-172-23-107-94 rancher-system-agent[1470]: W0515 17:58:18.273269 1470 reflector.go:492] pkg/mod/k8s.io/client-go@v0.32.2/tools/cache/reflector.go:251: watch of *v1.Secret ended with: an error on the server ("unable to decode an event from the watch stream: stream error: stream ID 5; INTERNAL_ERROR; received from peer") has prevented the request from succeeding
At least I can see it is able to find the machine plan secret custom-6a69b19510e2-machine-plan.
checking rancher logs...
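
Since the agent keeps processing the same machine-plan secret at resource version 173187, it can also help to inspect that secret from the Rancher local (management) cluster. A sketch, assuming the plan content is stored under a data key named "plan" (check the actual keys first):

# from the Rancher local cluster: list the secret's data keys
kubectl -n fleet-default get secret custom-6a69b19510e2-machine-plan -o jsonpath='{.data}'
# if a "plan" key exists (assumption), decode it to see what the node is being told to run
kubectl -n fleet-default get secret custom-6a69b19510e2-machine-plan -o jsonpath='{.data.plan}' | base64 -d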

eager-refrigerator-66976  05/15/2025, 6:03 PM
2025/05/15 17:59:59 [DEBUG] Searching for providerID for selector rke.cattle.io/machine=75984c19-6ba4-4d23-8bb9-c44141054d9a in cluster fleet-default/dev-euc1-te-test06, machine custom-6a69b19510e2: an error on the server ("error trying to reach service: cluster agent disconnected") has prevented the request from succeeding (get nodes)
2025/05/15 17:59:59 [DEBUG] DesiredSet - No change(2) /v1, Kind=ServiceAccount fleet-default/custom-6a69b19510e2-machine-bootstrap for rke-bootstrap fleet-default/custom-6a69b19510e2
this happens at the same time as the error on the system agent
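
The "cluster agent disconnected" part is expected while the downstream cluster has no working control plane; the more useful question is what the provisioning objects think should happen next. A quick check from the Rancher local cluster, reusing the names that already appear in the logs:

# cluster conditions and the machines Rancher currently tracks for it
kubectl -n fleet-default get clusters.provisioning.cattle.io dev-euc1-te-test06 -o yaml
kubectl -n fleet-default get machines.cluster.x-k8s.io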

eager-refrigerator-66976  05/15/2025, 6:04 PM

eager-refrigerator-66976  05/15/2025, 6:07 PM
2025/05/15 18:06:23 [INFO] [planner] rkecluster fleet-default/dev-euc1-te-test06: rkecontrolplane was already initialized but no etcd machines exist that have plans, indicating the etcd plane has been entirely replaced. Restoration from etcd snapshot is required.
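
Before starting the restore it is worth confirming that Rancher still has snapshot records for this cluster. A sketch, assuming the usual ETCDSnapshot objects in the cluster's namespace (the resource name may differ by Rancher version):

kubectl -n fleet-default get etcdsnapshots.rke.cattle.io | grep dev-euc1-te-test06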

eager-refrigerator-66976  05/15/2025, 6:08 PM
creamy-pencil-82913  05/15/2025, 6:09 PM
eager-refrigerator-66976  05/15/2025, 6:09 PM
creamy-pencil-82913  05/15/2025, 6:09 PM
eager-refrigerator-66976  05/15/2025, 6:10 PM
eager-refrigerator-66976  05/15/2025, 6:10 PM
eager-refrigerator-66976  05/15/2025, 6:11 PM
creamy-pencil-82913  05/15/2025, 6:23 PM
creamy-pencil-82913  05/15/2025, 6:24 PM
eager-refrigerator-66976  05/15/2025, 6:26 PM
creamy-pencil-82913  05/15/2025, 6:29 PM
creamy-pencil-82913  05/15/2025, 6:30 PM
eager-refrigerator-66976  05/15/2025, 6:30 PM
creamy-pencil-82913  05/15/2025, 6:30 PM
creamy-pencil-82913  05/15/2025, 6:31 PM
creamy-pencil-82913  05/15/2025, 6:31 PM
creamy-pencil-82913  05/15/2025, 6:33 PM
eager-refrigerator-66976  05/15/2025, 6:33 PM
creamy-pencil-82913  05/15/2025, 6:33 PM
creamy-pencil-82913  05/15/2025, 6:33 PM

creamy-pencil-82913  05/15/2025, 6:34 PM
1. Remove all etcd nodes from your cluster.
a. In the upper left corner, click ☰ > Cluster Management.
b. In the Clusters page, go to the cluster where you want to remove nodes.
c. In the Machines tab, click ⋮ > Delete on each node you want to delete. Initially, you will see the nodes hang in a deleting state, but once all etcd nodes are deleting, they will be removed together. This is due to the fact that Rancher sees all etcd nodes deleting and proceeds to “short circuit” the etcd safe-removal logic.
2. After all etcd nodes are removed, add the new etcd node that you are planning to restore from. Assign the new node the role of all (etcd, controlplane, and worker).
◦ If the node was previously in a cluster, clean the node first.
◦ For custom clusters, go to the Registration tab and check the box for etcd, controlplane, and worker. Then copy and run the registration command on your node.
◦ For node driver clusters, a new node is provisioned automatically.
3. At this point, Rancher will indicate that restoration from etcd snapshot is required.
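
For step 2 on a custom cluster, the command copied from the Registration tab generally has the shape sketched below; the URL, token, and checksum are placeholders, so use the exact command Rancher generates rather than this sketch:

# hypothetical shape of the custom-cluster registration command with all three roles
curl -fL https://<rancher-url>/system-agent-install.sh | sudo sh -s - \
  --server https://<rancher-url> --label 'cattle.io/os=linux' \
  --token <token> --ca-checksum <ca-checksum> \
  --etcd --controlplane --worker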

eager-refrigerator-66976  05/15/2025, 6:34 PM
> you saw a message in the logs that says restore is required, are you not seeing that same thing in the UI?
both
> Did you assign the new node the correct roles?
yes, all 4 as per documentation
> you are reading the rke2/k3s steps there right? Not the rke steps?
yes, this is exactly what I am doing

eager-refrigerator-66976  05/15/2025, 6:35 PM
creamy-pencil-82913  05/15/2025, 6:35 PM
creamy-pencil-82913  05/15/2025, 6:35 PM

creamy-pencil-82913  05/15/2025, 6:35 PM
> At this point, Rancher will indicate that restoration from etcd snapshot is required

eager-refrigerator-66976  05/15/2025, 6:35 PM
eager-refrigerator-66976  05/15/2025, 6:36 PM
creamy-pencil-82913  05/15/2025, 6:37 PM
creamy-pencil-82913  05/15/2025, 6:37 PM
creamy-pencil-82913  05/15/2025, 6:38 PM
eager-refrigerator-66976  05/15/2025, 6:38 PM
creamy-pencil-82913  05/15/2025, 6:39 PM
eager-refrigerator-66976  05/15/2025, 6:39 PM
eager-refrigerator-66976  05/15/2025, 6:39 PM

eager-refrigerator-66976  05/15/2025, 6:39 PM
May 15 18:39:41 ip-172-23-107-94 rancher-system-agent[1470]: W0515 18:39:41.992571 1470 reflector.go:492] pkg/mod/k8s.io/client-go@v0.32.2/tools/cache/reflector.go:251: watch of *v1.Secret ended with: an error on the server ("unable to decode an event from the watch stream: stream error: stream ID 97; INTERNAL_ERROR; received from peer") has prevented the request from succeeding
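
When the watch errors keep repeating like this, it is worth checking on the node whether a plan was ever actually delivered and whether the rke2 install ran at all. A sketch; the applied-plan directory is an assumption about the default agent layout, so adjust the path if yours differs:

# did rancher-system-agent ever apply a plan, and did rke2 get installed?
sudo ls -l /var/lib/rancher/agent/applied/ 2>/dev/null
systemctl status rancher-system-agent rke2-server --no-pager
sudo ls /var/lib/rancher/rke2/ 2>/dev/null || echo "rke2 not installed yet"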

creamy-pencil-82913  05/15/2025, 6:40 PM
eager-refrigerator-66976  05/15/2025, 6:41 PM
eager-refrigerator-66976  05/15/2025, 6:41 PM
creamy-pencil-82913  05/15/2025, 6:42 PM
eager-refrigerator-66976  05/15/2025, 6:43 PM
eager-refrigerator-66976  05/15/2025, 6:43 PM
creamy-pencil-82913  05/15/2025, 6:43 PM
eager-refrigerator-66976  05/15/2025, 6:43 PM
eager-refrigerator-66976  05/15/2025, 6:47 PM
eager-refrigerator-66976  05/15/2025, 6:48 PM
straight-actor-37028  05/23/2025, 7:21 PM
creamy-pencil-82913  05/23/2025, 7:27 PM
eager-refrigerator-66976  05/27/2025, 8:59 AM
eager-refrigerator-66976  05/27/2025, 9:01 AM