shy-gold-40913
11/12/2025, 6:27 PM
Server nodes:
# firewall-cmd --zone=public --list-all
public (default, active)
  target: default
  ingress-priority: 0
  egress-priority: 0
  icmp-block-inversion: no
  interfaces: ens192
  sources:
  services: etcd-client etcd-server kube-apiserver kubelet wireguard
  ports: 9345/tcp 9099/tcp 30000-32767/tcp 2381/tcp 51821/udp 8472/udp
  protocols:
  forward: yes
  masquerade: no
  forward-ports:
  source-ports:
  icmp-blocks:
  rich rules:
Agent nodes:
# firewall-cmd --zone=public --list-all
public (default, active)
  target: default
  ingress-priority: 0
  egress-priority: 0
  icmp-block-inversion: no
  interfaces: ens192 ens224
  sources:
  services: kubelet wireguard
  ports: 9099/tcp 30000-32767/tcp 8472/udp 51821/udp
  protocols:
  forward: yes
  masquerade: no
  forward-ports:
  source-ports:
  icmp-blocks:
  rich rules:
Output from overlaytest:
# ./overlaytest.sh
=> Start network overlay test
k8sagent02 can reach k8sagent02
command terminated with exit code 1
FAIL: overlaytest-4dtr4 on k8sagent02 cannot reach pod IP 10.252.2.2 on k8ssvr02
command terminated with exit code 1
FAIL: overlaytest-4dtr4 on k8sagent02 cannot reach pod IP 10.252.0.4 on k8ssvr01
command terminated with exit code 1
FAIL: overlaytest-4dtr4 on k8sagent02 cannot reach pod IP 10.252.3.19 on k8sagent01
command terminated with exit code 1
FAIL: overlaytest-4dtr4 on k8sagent02 cannot reach pod IP 10.252.1.2 on k8ssvr03
command terminated with exit code 1
FAIL: overlaytest-8vxld on k8ssvr02 cannot reach pod IP 10.252.4.3 on k8sagent02
k8ssvr02 can reach k8ssvr02
command terminated with exit code 1
FAIL: overlaytest-8vxld on k8ssvr02 cannot reach pod IP 10.252.0.4 on k8ssvr01
command terminated with exit code 1
FAIL: overlaytest-8vxld on k8ssvr02 cannot reach pod IP 10.252.3.19 on k8sagent01
command terminated with exit code 1
FAIL: overlaytest-8vxld on k8ssvr02 cannot reach pod IP 10.252.1.2 on k8ssvr03
command terminated with exit code 1
FAIL: overlaytest-ds7sh on k8ssvr01 cannot reach pod IP 10.252.4.3 on k8sagent02
command terminated with exit code 1
FAIL: overlaytest-ds7sh on k8ssvr01 cannot reach pod IP 10.252.2.2 on k8ssvr02
k8ssvr01 can reach k8ssvr01
command terminated with exit code 1
FAIL: overlaytest-ds7sh on k8ssvr01 cannot reach pod IP 10.252.3.19 on k8sagent01
command terminated with exit code 1
FAIL: overlaytest-ds7sh on k8ssvr01 cannot reach pod IP 10.252.1.2 on k8ssvr03
command terminated with exit code 1
FAIL: overlaytest-jw99g on k8sagent01 cannot reach pod IP 10.252.4.3 on k8sagent02
command terminated with exit code 1
FAIL: overlaytest-jw99g on k8sagent01 cannot reach pod IP 10.252.2.2 on k8ssvr02
command terminated with exit code 1
FAIL: overlaytest-jw99g on k8sagent01 cannot reach pod IP 10.252.0.4 on k8ssvr01
k8sagent01 can reach k8sagent01
command terminated with exit code 1
FAIL: overlaytest-jw99g on k8sagent01 cannot reach pod IP 10.252.1.2 on k8ssvr03
command terminated with exit code 1
FAIL: overlaytest-mmsv9 on k8ssvr03 cannot reach pod IP 10.252.4.3 on k8sagent02
command terminated with exit code 1
FAIL: overlaytest-mmsv9 on k8ssvr03 cannot reach pod IP 10.252.2.2 on k8ssvr02
command terminated with exit code 1
FAIL: overlaytest-mmsv9 on k8ssvr03 cannot reach pod IP 10.252.0.4 on k8ssvr01
command terminated with exit code 1
FAIL: overlaytest-mmsv9 on k8ssvr03 cannot reach pod IP 10.252.3.19 on k8sagent01
k8ssvr03 can reach k8ssvr03
=> End network overlay test
I've also excluded the various tunnel interfaces from NetworkManager, per https://docs.rke2.io/known_issues#networkmanager:
# cat /etc/NetworkManager/conf.d/rke2-canal.conf
[keyfile]
unmanaged-devices=interface-name:flannel*;interface-name:cali*;interface-name:tunl*;interface-name:vxlan.calico;interface-name:vxlan-v6.calico;interface-name:wireguard.cali;interface-name:wg-v6.cali
How do I begin troubleshooting this? I'm running rke2 stable on AlmaLinux 10.
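A minimal first-pass sketch for narrowing this down, assuming Canal's default VXLAN backend (flannel.1 over UDP 8472) and the interface names shown above; if you enabled the WireGuard backend instead, substitute UDP 51820/51821 and the corresponding WireGuard interface:
# Confirm the CNI pods are healthy on every node
kubectl -n kube-system get pods -o wide | grep -E 'canal|flannel|calico'
# Confirm each node actually has the VXLAN device and forwarding entries for its peers
ip -d link show flannel.1
bridge fdb show dev flannel.1
# While overlaytest.sh runs, watch whether encapsulated traffic leaves and arrives on the node NIC
tcpdump -ni ens192 udp port 8472
If tcpdump shows packets leaving one node but never arriving on the other, the problem sits between the hosts (firewall, underlay routing); if they arrive but the pings still fail, look at the receiving node's CNI pods and rp_filter/checksum-offload settings.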
kind-air-74358
11/13/2025, 10:49 AM
The fleet-agent is constantly being restarted. It could well be caused by either the Rancher update from 2.11 to 2.12 or by switching the root certificate from self-signed to a provided root certificate (following these docs).
In the fleet-controller/fleet-agentmanagement logs we constantly see the following:
time="2025-11-13T10:44:39Z" level=info msg="Deleted old agent for cluster (fleet-local/local) in namespace cattle-fleet-local-system"
time="2025-11-13T10:44:39Z" level=info msg="Cluster import for 'fleet-local/local'. Deployed new agent"
time="2025-11-13T10:45:00Z" level=info msg="Waiting for service account token key to be populated for secret cluster-fleet-local-local-1a3d67d0a899/request-cs9x7-8645b8de-5e30-4eb0-a9fe-dc96f1081856-token"
time="2025-11-13T10:45:02Z" level=info msg="Cluster registration request 'fleet-local/request-cs9x7' granted, creating cluster, request service account, registration secret"
The fleet-agent in the cattle-fleet-local-system namespace isn't reporting any errors but just keeps repeating:
I1113 10:44:40.589439 1 leaderelection.go:257] attempting to acquire leader lease cattle-fleet-local-system/fleet-agent...
{"level":"info","ts":"2025-11-13T10:44:40Z","logger":"setup","msg":"new leader","identity":"fleet-agent-6d5f55c7d7-4pncc-1"}
I1113 10:45:00.267179 1 leaderelection.go:271] successfully acquired lease cattle-fleet-local-system/fleet-agent
{"level":"info","ts":"2025-11-13T10:45:00Z","logger":"setup","msg":"renewed leader","identity":"fleet-agent-5cf8799b4c-xn274-1"}
time="2025-11-13T10:45:00Z" level=warning msg="Cannot find fleet-agent secret, running registration"
time="2025-11-13T10:45:00Z" level=info msg="Creating clusterregistration with id 'pwvp47nf7r8pg8zfmd4tx7vxb6rhr5dwv2gcnn2m6zlrtmt54ss9kl' for new token"
time="2025-11-13T10:45:02Z" level=info msg="Waiting for secret 'cattle-fleet-clusters-system/c-9072b2e8eac3a21368e0428adc1a0244a61acd4ee571c7f88f574d905cd52' on management cluster for request 'fleet-local/request-cs9x7': secrets \"c-9072b2e8eac3a21368e0428adc1a0244a61acd4ee571c7f88f574d905cd52\" not found"
{"level":"info","ts":"2025-11-13T10:45:04Z","logger":"setup","msg":"successfully registered with upstream cluster","namespace":"cluster-fleet-local-local-1a3d67d0a899"}
{"level":"info","ts":"2025-11-13T10:45:04Z","logger":"setup","msg":"listening for changes on upstream cluster","cluster":"local","namespace":"cluster-fleet-local-local-1a3d67d0a899"}
{"level":"info","ts":"2025-11-13T10:45:04Z","logger":"setup","msg":"Starting controller","metricsAddr":":8080","probeAddr":":8081","systemNamespace":"cattle-fleet-local-system"}
{"level":"info","ts":"2025-11-13T10:45:04Z","logger":"setup","msg":"starting manager"}
{"level":"info","ts":"2025-11-13T10:45:04Z","logger":"controller-runtime.metrics","msg":"Starting metrics server"}
{"level":"info","ts":"2025-11-13T10:45:04Z","msg":"starting server","name":"health probe","addr":"0.0.0.0:8081"}
{"level":"info","ts":"2025-11-13T10:45:04Z","logger":"controller-runtime.metrics","msg":"Serving metrics server","bindAddress":":8080","secure":false}
{"level":"info","ts":"2025-11-13T10:45:04Z","msg":"Starting EventSource","controller":"bundledeployment","controllerGroup":"<http://fleet.cattle.io|fleet.cattle.io>","controllerKind":"BundleDeployment","source":"kind source: *v1alpha1.BundleDeployment"}
{"level":"info","ts":"2025-11-13T10:45:04Z","logger":"setup","msg":"Starting cluster status ticker","checkin interval":"15m0s","cluster namespace":"fleet-local","cluster name":"local"}
{"level":"info","ts":"2025-11-13T10:45:04Z","msg":"Starting EventSource","controller":"drift-reconciler","source":"channel source: 0xc00078f3b0"}
{"level":"info","ts":"2025-11-13T10:45:04Z","msg":"Starting Controller","controller":"drift-reconciler"}
{"level":"info","ts":"2025-11-13T10:45:04Z","msg":"Starting workers","controller":"drift-reconciler","worker count":50}
{"level":"info","ts":"2025-11-13T10:45:04Z","msg":"Starting Controller","controller":"bundledeployment","controllerGroup":"<http://fleet.cattle.io|fleet.cattle.io>","controllerKind":"BundleDeployment"}
{"level":"info","ts":"2025-11-13T10:45:04Z","msg":"Starting workers","controller":"bundledeployment","controllerGroup":"<http://fleet.cattle.io|fleet.cattle.io>","controllerKind":"BundleDeployment","worker count":50}
{"level":"info","ts":"2025-11-13T10:45:04Z","logger":"bundledeployment.helm-deployer.install","msg":"Upgrading helm release","controller":"bundledeployment","controllerGroup":"<http://fleet.cattle.io|fleet.cattle.io>","controllerKind":"BundleDeployment","BundleDeployment":{"name":"fleet-agent-local","namespace":"cluster-fleet-local-local-1a3d67d0a899"},"namespace":"cluster-fleet-local-local-1a3d67d0a899","name":"fleet-agent-local","reconcileID":"1e4df644-069b-4d05-84ed-2c447bc54d15","commit":"","dryRun":false}
{"level":"info","ts":"2025-11-13T10:45:05Z","logger":"bundledeployment.deploy-bundle","msg":"Deployed bundle","controller":"bundledeployment","controllerGroup":"<http://fleet.cattle.io|fleet.cattle.io>","controllerKind":"BundleDeployment","BundleDeployment":{"name":"fleet-agent-local","namespace":"cluster-fleet-local-local-1a3d67d0a899"},"namespace":"cluster-fleet-local-local-1a3d67d0a899","name":"fleet-agent-local","reconcileID":"1e4df644-069b-4d05-84ed-2c447bc54d15","deploymentID":"s-2f332c47bb36e1bc8d70932ee0158e1b3289ae7ef2ea995e2bd77828ef2e9:8a42b4463e55a59ce2ccdf3c53c32455ce5fd0f601587bf57b5624b3cf8bb623","appliedDeploymentID":"s-c1fc5eeb18677acb8c4a8fd2054c2c40c4022f002ea06437f9b108731be8f:8a42b4463e55a59ce2ccdf3c53c32455ce5fd0f601587bf57b5624b3cf8bb623","release":"cattle-fleet-local-system/fleet-agent-local:20","DeploymentID":"s-2f332c47bb36e1bc8d70932ee0158e1b3289ae7ef2ea995e2bd77828ef2e9:8a42b4463e55a59ce2ccdf3c53c32455ce5fd0f601587bf57b5624b3cf8bb623"
And afterwards the fleet-agent is restarted again... Any ideas on what could be wrong and how to fix it?
We've already redeployed the fleet-controller and fleet-agent deployments and reinstalled the fleet-agent and fleet-controller Helm charts, with no luck.
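For reference, a rough sketch of what could be inspected around the registration loop those logs describe; the resource and secret names below are assumptions based on the log lines and on Fleet's usual naming, not something confirmed in this thread:
# Does the agent credential secret the warning complains about exist, and when was it last updated?
kubectl -n cattle-fleet-local-system get secret fleet-agent -o yaml
# State of the local Cluster object and any pending registration requests
kubectl -n fleet-local get clusters.fleet.cattle.io,clusterregistrations.fleet.cattle.io
# Since the root certificate was switched, check which CA Rancher currently advertises to agents
kubectl get settings.management.cattle.io cacerts -o jsonpath='{.value}'
If the CA baked into the agent's credentials no longer matches what Rancher serves, the controller repeatedly deleting and redeploying the agent would fit the log pattern above.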
hundreds-sugar-37524
11/20/2025, 8:25 AM
agentToken and serverToken: either a mismatch between the rke2 file and the etcd data, or between rke2 and the secret in Rancher (similar to "/var/lib/rancher/rke2/server/cred/passwd newer than datastore and could cause a cluster outage. Remove the file(s) from disk and restart to be recreated from datastore." · rancher/rke2 · Discussion #4035, and "Agent token rotation leads to datastore being out-of-sync" · Issue #5785 · rancher/rke2).
Any ideas?
Our steps are:
• backup downstream-1 on rancher-az-1
• create downstream-1-backup on rancher-az-2 with one node
• restore the cluster using the rke2 CLI (see the sketch after this list)
• cluster is up with data from downstream-1; clean up stuff like old nodes
• if I restart rke2-server, it fails with /var/lib/rancher/rke2/server/cred/passwd newer than datastore. If I add a new node, same thing. Even if I update the Rancher secret that holds agentToken and serverToken.
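A sketch of the reset/restore invocation for the "restore the cluster using the rke2 CLI" step, assuming an on-disk etcd snapshot (the snapshot filename is a placeholder):
# On the single node of downstream-1-backup, with the service stopped
systemctl stop rke2-server
rke2 server \
  --cluster-reset \
  --cluster-reset-restore-path=/var/lib/rancher/rke2/server/db/snapshots/<snapshot-file>
# Once the reset completes, start the service normally again
systemctl start rke2-server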
We don't use Velero to copy-paste our resources between the clusters because we want to preserve ownerRefs.
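And a sketch of the workaround the linked rke2 threads describe for the "passwd newer than datastore" failure, on the assumption that the datastore copy of the credentials is the one to keep; back the file up first, this isn't verified against your setup:
systemctl stop rke2-server
# Move the stale on-disk credential file aside so rke2 recreates it from the datastore
mv /var/lib/rancher/rke2/server/cred/passwd /var/lib/rancher/rke2/server/cred/passwd.bak
systemctl start rke2-server
# Afterwards, confirm the tokens handed to new nodes match what the Rancher secret holds
cat /var/lib/rancher/rke2/server/token /var/lib/rancher/rke2/server/agent-token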
hallowed-window-565
11/20/2025, 12:25 PM
$ helm search repo rancher-stable/rancher --versions
NAME                    CHART VERSION   APP VERSION   DESCRIPTION
rancher-stable/rancher  2.12.3          v2.12.3       Install Rancher Server to manage Kubernetes clu...
rancher-stable/rancher  2.12.2          v2.12.2       Install Rancher Server to manage Kubernetes clu...
rancher-stable/rancher  2.12.1          v2.12.1       Install Rancher Server to manage Kubernetes clu...
rancher-stable/rancher  2.11.3          v2.11.3       Install Rancher Server to manage Kubernetes clu...
rancher-stable/rancher  2.11.2          v2.11.2       Install Rancher Server to manage Kubernetes clu...
rancher-stable/rancher  2.11.1          v2.11.1       Install Rancher Server to manage Kubernetes clu...
What is up with that? Am I using the wrong repo? https://releases.rancher.com/server-charts/stable
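One thing worth ruling out is a stale local chart index; a quick check, assuming the standard Rancher chart repos (the rancher-latest URL below is the usual one, not taken from this thread):
# Refresh the cached index for the stable repo and search again
helm repo update rancher-stable
helm search repo rancher-stable/rancher --versions | head -n 5
# Compare against the latest channel to see whether the release you expect is published there yet
helm repo add rancher-latest https://releases.rancher.com/server-charts/latest
helm repo update rancher-latest
helm search repo rancher-latest/rancher --versions | head -n 5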