# fleet
t
Folks, I'm unsure if this is the right place for the question, but I am seeing a couple of nodes with a failed plan in the rancher-system-agent logs, repeatedly complaining with:
[K8s] Maximum failure threshold exceeded for plan with checksum value of absdef1234
This initially appears to have broken due to a network issue that left the node unable to retrieve the index.docker.io/rancher/system-agent-installer-rke2:v1.32.5-rke2r1 image. However, it now appears to be stuck. Is it feasible to update the referenced secret and increment the max-failures field to have it at least attempt the plan, or perhaps decrement the failure-count?
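For illustration only, here is a minimal sketch of what such an edit to the secret might look like as a JSON merge-patch body. It assumes the counters live under the data keys failure-count and max-failures (the names used in the question) and are stored as decimal strings; the helper name build_secret_patch is hypothetical. Kubernetes stores Secret data values base64-encoded, so the sketch encodes each value. Whether the agent picks up a manually edited counter is not confirmed anywhere in this thread.

```python
import base64
import json


def build_secret_patch(updates):
    """Return a JSON merge-patch body that sets the given Secret data keys.

    Kubernetes stores Secret `data` values base64-encoded, so each value
    is converted to a string and encoded before it goes into the patch.
    """
    data = {
        key: base64.b64encode(str(value).encode("utf-8")).decode("ascii")
        for key, value in updates.items()
    }
    return json.dumps({"data": data}, sort_keys=True)


# Assumption: the counter is stored as a decimal string under this key.
patch = build_secret_patch({"failure-count": 0})
print(patch)  # {"data": {"failure-count": "MA=="}}
```

The printed body could then be applied to the plan secret with whatever patch tooling is already in use; taking a copy of the secret first would make the change easy to revert.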
c
have you tried restarting the service?
t
Yep, I restarted the service and rebooted the machine; the errors just start up again immediately.
This is the behaviour as per logs:
Oct 24 15:05:41 oss-master001 systemd[1]: Stopping Rancher System Agent...
Oct 24 15:05:41 oss-master001 systemd[1]: rancher-system-agent.service: Deactivated successfully.
Oct 24 15:05:41 oss-master001 systemd[1]: Stopped Rancher System Agent.
Oct 24 15:05:41 oss-master001 systemd[1]: rancher-system-agent.service: Consumed 4.229s CPU time.
Oct 24 15:06:04 oss-master001 systemd[1]: Started Rancher System Agent.
Oct 24 15:06:04 oss-master001 rancher-system-agent[17887]: time="2025-10-24T15:06:04+11:00" level=info msg="Rancher System Agent version v0.3.12 (e4876a6) is starting"
Oct 24 15:06:04 oss-master001 rancher-system-agent[17887]: time="2025-10-24T15:06:04+11:00" level=info msg="Using directory /var/lib/rancher/agent/work for work"
Oct 24 15:06:04 oss-master001 rancher-system-agent[17887]: time="2025-10-24T15:06:04+11:00" level=info msg="Starting remote watch of plans"
Oct 24 15:06:04 oss-master001 rancher-system-agent[17887]: time="2025-10-24T15:06:04+11:00" level=info msg="Starting /v1, Kind=Secret controller"
Oct 24 15:06:04 oss-master001 rancher-system-agent[17887]: time="2025-10-24T15:06:04+11:00" level=info msg="Detected first start, force-applying one-time instruction set"
Oct 24 15:06:04 oss-master001 rancher-system-agent[17887]: time="2025-10-24T15:06:04+11:00" level=error msg="[K8s] Maximum failure threshold exceeded for plan with checksum value of 6e8deb12d53c4fab7ab353ef70551d337613aed2a48b156e3c8d330dd7293aec, (failures: 1, threshold: 1)"
Oct 24 15:06:09 oss-master001 rancher-system-agent[17887]: time="2025-10-24T15:06:09+11:00" level=error msg="[K8s] Maximum failure threshold exceeded for plan with checksum value of 6e8deb12d53c4fab7ab353ef70551d337613aed2a48b156e3c8d330dd7293aec, (failures: 1, threshold: 1)"
Oct 24 15:06:14 oss-master001 rancher-system-agent[17887]: time="2025-10-24T15:06:14+11:00" level=error msg="[K8s] Maximum failure threshold exceeded for plan with checksum value of 6e8deb12d53c4fab7ab353ef70551d337613aed2a48b156e3c8d330dd7293aec, (failures: 1, threshold: 1)"
Oct 24 15:06:19 oss-master001 rancher-system-agent[17887]: time="2025-10-24T15:06:19+11:00" level=error msg="[K8s] Maximum failure threshold exceeded for plan with checksum value of 6e8deb12d53c4fab7ab353ef70551d337613aed2a48b156e3c8d330dd7293aec, (failures: 1, threshold: 1)"
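To compare nodes quickly, the checksum and counters can be pulled out of these error lines with a small script. This is a rough sketch that assumes the message format stays exactly as shown in the journal output above; the function name parse_threshold_error is hypothetical.

```python
import re

# Matches the error message as it appears in the journal output above.
THRESHOLD_ERROR = re.compile(
    r"Maximum failure threshold exceeded for plan with checksum value of "
    r"(?P<checksum>[0-9a-f]+), "
    r"\(failures: (?P<failures>\d+), threshold: (?P<threshold>\d+)\)"
)


def parse_threshold_error(line):
    """Return checksum, failures and threshold from a log line, or None."""
    match = THRESHOLD_ERROR.search(line)
    if match is None:
        return None
    return {
        "checksum": match["checksum"],
        "failures": int(match["failures"]),
        "threshold": int(match["threshold"]),
    }


sample = (
    'level=error msg="[K8s] Maximum failure threshold exceeded for plan '
    "with checksum value of "
    "6e8deb12d53c4fab7ab353ef70551d337613aed2a48b156e3c8d330dd7293aec, "
    '(failures: 1, threshold: 1)"'
)
info = parse_threshold_error(sample)
print(info["failures"], info["threshold"])  # 1 1
```

In the lines above, failures and threshold are both 1, and the same checksum repeats every five seconds, which is consistent with the agent refusing to retry the same plan rather than attempting it and failing again.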