
brainy-whale-94197

03/17/2023, 3:25 PM
A cluster broke during maintenance and is now stuck at "Updating, waiting for etcd, controlplane and worker nodes to be registered". The cluster itself works; it is just in a buggy state in Rancher. How can I get it healthy again? I tried editing the custom resources in the local cluster so they represent the current state exactly: the nodes were not correct, and it said that the control plane had been replaced for a while. Now there are two nodes (all roles), the cluster has a namespace in the local cluster, and the global "clusters.management..." object looks fine, as do the node objects in that namespace. The cluster agent is connected, but the node agent claims "Waiting for node to register. Either cluster is not ready for registering, cluster is currently provisioning, or etcd, control-plane and worker node have to be registered". The nodes are "green" and "active". So things work, but Rancher fails to acknowledge this. I looked at all the resources I could find in the local cluster and in the downstream cluster, updated some node configs that were wrong, and now I cannot find anything else wrong. I even updated the super shady secret for the cluster in the local cluster, which in fact seems to be a set of configs, obfuscated and then put into one mega huge base64 encoded value. Where is this "clusterIsBrokenDoNotLetNodesConnect" variable which is false but should be true? Thanks!

polite-piano-74233

03/17/2023, 5:31 PM
I've run into something similar with the whole 'nodes not ready' thing when they clearly are ready. It was a communication issue between the Rancher agent and the Rancher server, if I remember correctly: the Rancher agent running on the VM couldn't properly communicate with whatever your Rancher URL was. For us the fix was to use an IP address instead of a URL for the Rancher server name.
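A quick way to test that theory from the affected node is to check whether the Rancher server hostname resolves and accepts a TCP connection, and to compare that against the server's IP address. The snippet below is only a minimal sketch using the Python standard library; `check_endpoint` is a hypothetical helper name, and port 443 is an assumption about how the Rancher server is exposed. It does not talk to any Rancher API, so a successful connect only rules out basic DNS and network problems.

```python
import socket


def check_endpoint(host: str, port: int = 443, timeout: float = 3.0) -> dict:
    """Report whether `host` resolves and accepts a TCP connection on `port`.

    Hypothetical helper for illustration; port 443 is an assumed default.
    """
    result = {"host": host, "port": port, "resolved": [], "connected": False, "error": None}

    # Step 1: name resolution. A failure here points at DNS, not at Rancher.
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        result["error"] = f"DNS lookup failed: {exc}"
        return result
    result["resolved"] = sorted({info[4][0] for info in infos})

    # Step 2: plain TCP connect. A failure here points at routing or firewalls.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            result["connected"] = True
    except OSError as exc:
        result["error"] = f"TCP connect failed: {exc}"
    return result
```

Running `check_endpoint("rancher.example.com")` and then `check_endpoint("192.0.2.10")` (both placeholder values, to be replaced with your own server name and IP) on the node shows whether the name and the IP behave differently. If only the hostname fails, that matches the situation described above.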

brainy-whale-94197

03/20/2023, 7:24 AM
Thanks for your input! I think this error must be different; it seems like the only option left is to remove and replace.

polite-piano-74233

03/20/2023, 5:23 PM
@brainy-whale-94197 What base OS are you running?
If it's Ubuntu 22.04, there's a known flannel bug that causes all kinds of traffic issues.

brainy-whale-94197

05/08/2023, 1:18 PM
Sorry for the late reply. I could not save this cluster and had to rebuild it.
Thanks