# rke2
Hi, I have a problem in our Rancher cluster: when I change the cluster config, the new (worker) nodes cannot join the cluster. The Rancher UI shows WaitingForNodeRef after the nodes have been provisioned on the infra provider. On the nodes, rke2-agent does not yet exist, and rancher-system-agent looks "passive": no plan is applied. In the Rancher logs I was able to gather this:
rancher-797cdb6b8f-m25ks 2025/03/25 15:59:42 [INFO] [planner] rkecluster fleet-default/cluster01 - machine fleet-default/cluster01-metallb-5454c47f9fxx8lnb-kd9xp - previous join server (<https://192.168.1.181:9345>) was not valid, using new join server (<https://192.168.1.173:9345>)
rancher-797cdb6b8f-m25ks 2025/03/25 15:59:42 [INFO] [planner] rkecluster fleet-default/cluster01 - machine fleet-default/cluster01-metallb-5454c47f9fxx8lnb-wj4rs - previous join server (<https://192.168.1.179:9345>) was not valid, using new join server (<https://192.168.1.164:9345>)
rancher-797cdb6b8f-m25ks 2025/03/25 15:59:42 [INFO] [planner] rkecluster fleet-default/cluster01 - machine fleet-default/cluster01-metallb-sgchp-f7qnx - previous join server () was not valid, using new join server (<https://192.168.1.167:9345>)
rancher-797cdb6b8f-m25ks 2025/03/25 15:59:42 [INFO] [planner] rkecluster fleet-default/cluster01 - machine fleet-default/cluster01-tlbx-2qp4m-v2qmc - previous join server () was not valid, using new join server (<https://192.168.1.167:9345>)
rancher-797cdb6b8f-m25ks 2025/03/25 15:59:42 [INFO] [planner] rkecluster fleet-default/cluster01 - machine fleet-default/cluster01-tlbx-76ff6b5547xpxgpb-5lkn4 - previous join server (<https://192.168.1.181:9345>) was not valid, using new join server (<https://192.168.1.167:9345>)
rancher-797cdb6b8f-m25ks 2025/03/25 15:59:47 [INFO] [planner] rkecluster fleet-default/cluster01 - machine fleet-default/cluster01-metallb-5454c47f9fxx8lnb-kd9xp - previous join server (<https://192.168.1.181:9345>) was not valid, using new join server (<https://192.168.1.173:9345>)
rancher-797cdb6b8f-m25ks 2025/03/25 15:59:47 [INFO] [planner] rkecluster fleet-default/cluster01 - machine fleet-default/cluster01-metallb-5454c47f9fxx8lnb-wj4rs - previous join server (<https://192.168.1.179:9345>) was not valid, using new join server (<https://192.168.1.164:9345>)
rancher-797cdb6b8f-m25ks 2025/03/25 15:59:47 [INFO] [planner] rkecluster fleet-default/cluster01 - machine fleet-default/cluster01-metallb-sgchp-f7qnx - previous join server () was not valid, using new join server (<https://192.168.1.167:9345>)
rancher-797cdb6b8f-m25ks 2025/03/25 15:59:47 [INFO] [planner] rkecluster fleet-default/cluster01 - machine fleet-default/cluster01-tlbx-2qp4m-v2qmc - previous join server () was not valid, using new join server (<https://192.168.1.167:9345>)
rancher-797cdb6b8f-m25ks 2025/03/25 15:59:47 [INFO] [planner] rkecluster fleet-default/cluster01 - machine fleet-default/cluster01-tlbx-76ff6b5547xpxgpb-5lkn4 - previous join server (<https://192.168.1.181:9345>) was not valid, using new join server (<https://192.168.1.167:9345>)
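To make the pattern in the planner messages above easier to see, here is a minimal Python sketch that groups them by machine and records which join server was rejected and which one replaced it. The regex is only inferred from the log lines above, and the name `summarize_join_servers` is hypothetical; none of this is a Rancher tool or an official log format.

```python
import re
from collections import defaultdict

# Pattern inferred from the log excerpt above (an assumption, not an official format).
# The optional < > handle the angle brackets around the URLs in the pasted logs.
LINE_RE = re.compile(
    r"machine (?P<machine>\S+) - previous join server \(<?(?P<old>[^<>()]*)>?\) "
    r"was not valid, using new join server \(<?(?P<new>[^<>()]*)>?\)"
)


def summarize_join_servers(lines):
    """Map each machine to the set of (previous, new) join-server pairs seen.

    An empty previous join server, logged as "()", is stored as None.
    """
    summary = defaultdict(set)
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            summary[match["machine"]].add((match["old"] or None, match["new"]))
    return dict(summary)


# Three lines shortened from the excerpt above (pod name and timestamp dropped).
sample = [
    "[planner] rkecluster fleet-default/cluster01 - machine fleet-default/cluster01-metallb-5454c47f9fxx8lnb-kd9xp - previous join server (<https://192.168.1.181:9345>) was not valid, using new join server (<https://192.168.1.173:9345>)",
    "[planner] rkecluster fleet-default/cluster01 - machine fleet-default/cluster01-metallb-sgchp-f7qnx - previous join server () was not valid, using new join server (<https://192.168.1.167:9345>)",
    "[planner] rkecluster fleet-default/cluster01 - machine fleet-default/cluster01-tlbx-2qp4m-v2qmc - previous join server () was not valid, using new join server (<https://192.168.1.167:9345>)",
]

result = summarize_join_servers(sample)
for machine, pairs in sorted(result.items()):
    for old, new in sorted(pairs, key=str):
        print(f"{machine}: {old} -> {new}")
```

Because the same messages repeat every five seconds with identical values, collecting the pairs in a set shows at a glance that each machine is reassigned to the same new join server over and over, and that the new machines start with an empty previous join server.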
I have already restarted the rke2-server service on all control plane nodes, and the rest of the cluster seems fine. Is there a specific reason why the nodes cannot join the cluster? I appreciate any input on this.
Rancher: v2.10.3
Edit: I have found this issue, which roughly describes the problem I am having: https://github.com/rancher/rancher/issues/36573
However, I can't find a solution apart from recreating the cluster or restoring an etcd snapshot. (This has been going on for a while, so snapshots from before the issue are no longer available.)