05/09/2023, 4:28 PM
👋 Hey, I have a new cluster running in Azure on a VMSS with autoscaling (cluster version v1.25.9+rke2r1, registered from a Rancher server v2.7.3). Nodes are removed from both Rancher and the VMSS (thanks to a small operator), but "new" nodes can no longer be added. The error on the new nodes is:
Waiting to retrieve agent configuration; server is not ready: Node password rejected, duplicate hostname or contents of '/etc/rancher/node/password' may not match server node-passwd entry, try enabling a unique node name with the --with-node-id flag
I have searched the GitHub issues but couldn't find a reference that worked. The secret for the nodes is also gone, so I'm looking for any clue to make the cluster operational again: it is stuck and the agent cannot complete the process. Any help is appreciated 🙏, thanks
I was able to get the cluster working again by changing a node's hostname, running the uninstall script, and then making it join the cluster again. I also deleted the rancher folders in
However, requiring unique hostnames plus that workaround doesn't seem to scale.
Well, maybe "scale" is not the right word, but in my use case hostname reuse would be common: a node goes away, a node comes back, that sort of thing.
For the time being, I will append a UUID to the hostnames and see if that makes this pain go away.
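For illustration, here is a minimal Python sketch of the "append a UUID" idea. `unique_hostname` is a hypothetical helper (not part of RKE2 or Rancher), and the 8-character suffix length is an arbitrary choice; the function only assumes that the result must be a single lowercase RFC 1123 label of at most 63 characters.

```python
import re
import uuid

# RFC 1123: a single DNS label is at most 63 characters.
MAX_LABEL = 63
LABEL_RE = re.compile(r"[a-z0-9]([a-z0-9-]*[a-z0-9])?")


def unique_hostname(base: str, suffix_len: int = 8) -> str:
    """Hypothetical helper: return '<base>-<random hex suffix>'.

    A reused base name (e.g. a recycled VMSS computer name) still yields a
    new hostname each time, so it should not collide with the node-password
    entry left behind by an earlier node of the same name.
    """
    suffix = uuid.uuid4().hex[:suffix_len]
    # Trim the base so base + '-' + suffix fits in one label; drop trailing hyphens.
    trimmed = base.lower()[: MAX_LABEL - suffix_len - 1].rstrip("-")
    name = f"{trimmed}-{suffix}"
    if not LABEL_RE.fullmatch(name):
        raise ValueError(f"not a valid hostname label: {name!r}")
    return name
```

For example, `unique_hostname("vmss-agent000001")` returns something like `vmss-agent000001-` followed by eight hex characters, different on every call. The error message above also mentions the `--with-node-id` flag, which addresses the same collision from the RKE2 side instead of renaming the host.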


05/10/2023, 5:13 AM


05/10/2023, 2:16 PM
Yes, it looks like the same issue. I was able to "overcome" it by adding a few lines to the cloud-init script to generate a somewhat unique hostname, but it is just a hack to make it work.
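The actual cloud-init lines weren't shared, so this is only a rough sketch of what they might look like. `render_cloud_config` is a hypothetical Python helper that emits `#cloud-config` user-data. Assumptions: the rename runs in `runcmd` on each instance at first boot, so every VMSS instance computes its own suffix even though they all receive identical user-data; `preserve_hostname: true` keeps cloud-init from resetting the name on later boots; `registration_cmd` is a placeholder for whatever node registration command Rancher gives you, and it must run after the rename.

```python
import json


def render_cloud_config(registration_cmd: str) -> str:
    """Hypothetical helper: build #cloud-config user-data that renames the
    host with a random suffix before registering it with the cluster."""
    # Expanded by the shell on each instance, not here, so each VM gets its own
    # suffix: the first 8 characters of a kernel-generated random UUID.
    rename = (
        'hostnamectl set-hostname '
        '"$(hostname)-$(head -c 8 /proc/sys/kernel/random/uuid)"'
    )
    lines = [
        "#cloud-config",
        "# Stop cloud-init from changing the hostname back on later boots.",
        "preserve_hostname: true",
        "runcmd:",
        # json.dumps yields a double-quoted string, which YAML also accepts.
        f"  - {json.dumps(rename)}",
        f"  - {json.dumps(registration_cmd)}",
    ]
    return "\n".join(lines) + "\n"
```

Usage: `print(render_cloud_config("<registration command from Rancher>"))`, then paste the output into the VMSS custom data. Depending on the image, `/etc/hosts` may also need an entry for the new name.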