freezing-ability-583
07/28/2025, 11:55 AM
In the rancher-system-agent.service log I see the following:
Jul 28 10:04:17 kimura rancher-system-agent[4505]: W0728 10:04:17.749836 4505 reflector.go:462] pkg/mod/github.com/rancher/client-go@v1.29.3-rancher1/tools/cache/reflector.go:229: watch of *v1.Secret ended with: an error on the server ("unable to decode an event from the watch stream: stream error: stream ID 5; INTERNAL_ERROR; received from peer") has prevented the request from succeeding
then later
Jul 28 11:11:09 kimura rancher-system-agent[14540]: time="2025-07-28T11:11:09Z" level=debug msg="[K8s] Processing secret custom-59c31a50be51-machine-plan in namespace fleet-local at generation 0 with resource version 640629564"
freezing-ability-583
07/28/2025, 11:59 AM
freezing-ability-583
07/28/2025, 12:02 PM
Jul 28 10:01:18 kimura rancherd[4304]: time="2025-07-28T10:01:18Z" level=info msg="[stderr]: time=\"2025-07-28T10:01:18Z\" level=info msg=\"Running probes defined in /var/lib/rancher/rancherd/plan/plan.json\""
Jul 28 10:01:19 kimura rancherd[4304]: time="2025-07-28T10:01:19Z" level=info msg="[stderr]: time=\"2025-07-28T10:01:19Z\" level=info msg=\"Probe [kubelet] is unhealthy\""
ambitious-knife-27333
07/29/2025, 4:53 PM
ambitious-knife-27333
07/29/2025, 5:13 PM
machine-plan does not appear to have any data in it (kubectl get secret -n fleet-local custom-f836bd7aad6e-machine-plan). It looks similar to this issue, but I'm not sure how to fix it for Harvester.
plain-ambulance-12322
08/10/2025, 11:08 AM
/opt/rke2/bin/rke2 certificate rotate
followed by kicking the rke2-server service on the control plane nodes seems to have done the trick. The failed node automatically joined the cluster as soon as the certs were regenerated.
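For anyone landing here later, the fix described above could be sketched roughly as the script below. It is a minimal sketch, not a verified procedure: it assumes "kicking" the service means a systemctl restart of rke2-server, and the run helper, the rotate_and_restart function, and the DRY_RUN and RKE2_BIN variables are all hypothetical names added for illustration. It defaults to a dry run that only prints the commands. The upstream RKE2 documentation describes stopping rke2-server before rotating, so check the order against the docs for your version.

```shell
#!/bin/sh
# Sketch only: rotate RKE2 certificates, then restart rke2-server,
# in the order reported in the thread.
# DRY_RUN and RKE2_BIN are hypothetical knobs added for this example.
# DRY_RUN defaults to 1, so commands are printed rather than executed.
RKE2_BIN="${RKE2_BIN:-/opt/rke2/bin/rke2}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Rotate the certificates; restart the service only if rotation succeeded.
# Assumption: "kicking" the service is a plain systemctl restart.
rotate_and_restart() {
  run "$RKE2_BIN" certificate rotate || return 1
  run systemctl restart rke2-server
}
```

Calling rotate_and_restart with DRY_RUN=0, as root on each control plane node, would run the commands for real. Leaving DRY_RUN unset just prints what would be run.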