# harvester
Hi, I am trying to expand my Harvester (1.4.2) cluster. While installing a new node into the cluster, I provide my join token and the install process finishes, but in my `rancher-system-agent.service` log I see the following:
```
Jul 28 10:04:17 kimura rancher-system-agent[4505]: W0728 10:04:17.749836    4505 reflector.go:462] pkg/mod/github.com/rancher/client-go@v1.29.3-rancher1/tools/cache/reflector.go:229: watch of *v1.Secret ended with: an error on the server ("unable to decode an event from the watch stream: stream error: stream ID 5; INTERNAL_ERROR; received from peer") has prevented the request from succeeding
```
Then, later:
```
Jul 28 11:11:09 kimura rancher-system-agent[14540]: time="2025-07-28T11:11:09Z" level=debug msg="[K8s] Processing secret custom-59c31a50be51-machine-plan in namespace fleet-local at generation 0 with resource version 640629564"
```
How can I debug this further, or make sure the new node can connect to the running Harvester cluster? My `rancherd` log looks like:
```
Jul 28 10:01:18 kimura rancherd[4304]: time="2025-07-28T10:01:18Z" level=info msg="[stderr]: time=\"2025-07-28T10:01:18Z\" level=info msg=\"Running probes defined in /var/lib/rancher/rancherd/plan/plan.json\""
Jul 28 10:01:19 kimura rancherd[4304]: time="2025-07-28T10:01:19Z" level=info msg="[stderr]: time=\"2025-07-28T10:01:19Z\" level=info msg=\"Probe [kubelet] is unhealthy\""
```
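In case it helps anyone hitting the same wall, here is a rough sketch of where I would look first, assuming default Harvester/RKE2 paths and the systemd units already mentioned in this thread; the secret-listing step refers to the follow-up messages below.

```bash
# Watch the join process live on the node that fails to join
journalctl -u rancher-system-agent -f
journalctl -u rancherd -f

# See which probes rancherd is running (the log above references this plan file)
cat /var/lib/rancher/rancherd/plan/plan.json

# "Probe [kubelet] is unhealthy" -> check the kubelet log that RKE2 writes locally
tail -f /var/lib/rancher/rke2/agent/logs/kubelet.log

# From an existing healthy node: check whether a machine-plan secret exists for the
# new machine and whether it contains any data (see the next messages)
kubectl get secret -n fleet-local | grep machine-plan
```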
@freezing-ability-583 we're getting the same thing on a 1.3.0 cluster.
One thing I have found is that the `machine-plan` secret does not appear to have any data in it (`kubectl get secret -n fleet-local custom-f836bd7aad6e-machine-plan`). It looks similar to this issue, but I'm not sure how to fix it for Harvester.
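A minimal way to confirm whether that machine-plan secret is actually empty; the secret name here is the one from the message above, so substitute your own node's secret:

```bash
# Dump the whole secret; a missing or empty .data field means no plan was written
kubectl get secret -n fleet-local custom-f836bd7aad6e-machine-plan -o yaml

# Or print only the data keys (empty output = no plan data)
kubectl get secret -n fleet-local custom-f836bd7aad6e-machine-plan \
  -o jsonpath='{.data}' ; echo
```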
We've just resolved the problem @ambitious-knife-27333 mentioned above. Similar issue to here. In our case, the certs did not auto-renew when restarting services, but running `/opt/rke2/bin/rke2 certificate rotate`, followed by kicking the `rke2-server` service on the control-plane nodes, seems to have done the trick. The failed node automatically joined the cluster as soon as the certs were regenerated.
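For anyone following along, a sketch of what that sequence looks like, assuming `rke2-server` is managed by systemd (as it is on Harvester nodes) and that the rke2 binary lives at the path mentioned above:

```bash
# Run on each control-plane node, one at a time.
# Rotate the RKE2-managed certificates...
/opt/rke2/bin/rke2 certificate rotate

# ...then restart the server so the new certs are picked up
systemctl restart rke2-server

# Afterwards, watch the joining node's agent log to see it connect
journalctl -u rancher-system-agent -f
```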