
bumpy-portugal-40754

10/08/2022, 9:45 PM
To test disaster recovery, we tried to re-create the first Harvester node of a 3-node cluster (delete and reinstall). It didn't work; it created a new single-node cluster instead. The same procedure worked for the other nodes... I could imagine this doesn't work because of the "first node" of the underlying RKE2, but Harvester might have a workaround. Is this supposed to work? How? If not, why isn't it documented?

sticky-summer-13450

10/09/2022, 5:09 PM
I believe that once you have an HA cluster with 3 nodes, all of which are control-plane nodes, there is no longer a "first node". You should be able to remove any node from the cluster, rebuild it, and join it back to the cluster. I believe you just use the procedure you used when you added the second and subsequent nodes. Just imagine you are adding the second node again.

bumpy-portugal-40754

10/09/2022, 6:41 PM
I know. I tried, and it didn't work for the first node (but it worked for the other nodes). If you check the underlying RKE2 config of any "non-first node" in /etc/rancher/rke2/config.yaml, there is a reference to the first node, "server: https://firstnode:9345", which needs to be present for other nodes to join. The RKE2 recovery procedure in this case is to "promote" another node to "first node" by changing that node's config.yaml.
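A minimal sketch of how one could check whether the endpoint named in that "server:" line is still reachable from a joining node. The function names are hypothetical, the fallback to port 9345 is an assumption taken from the example URL above, and a successful TCP connect only shows that something is listening, not that the join would succeed.

```python
import socket
from urllib.parse import urlparse


def server_endpoint(url):
    """Split a URL such as https://firstnode:9345 into (host, port).

    Falls back to 9345 when the URL carries no explicit port
    (assumption based on the example in this thread).
    """
    parsed = urlparse(url)
    return parsed.hostname, parsed.port or 9345


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False
```

For example, is_reachable(*server_endpoint("https://firstnode:9345")) should return False once the first node has been deleted, which would match the failing join described above.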

sticky-summer-13450

10/10/2022, 7:48 AM
I don't have that file on any of the nodes in my 3-node Harvester cluster.
Did you set a cluster IP address when you created the cluster?

bumpy-portugal-40754

10/10/2022, 9:03 AM
My bad. Check /etc/rancher/rke2/config.yaml.d/*. The server statement is in 50-rancher.yaml. It points to the first node's IP... which is expected from the RKE2 point of view.
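To see which drop-in file carries the server statement on a given node, something like the following sketch could be used. It is a plain line-based scan rather than a real YAML parser, the function name is hypothetical, and it assumes the drop-in files end in .yaml, as 50-rancher.yaml does.

```python
import glob
import os
import re

# Matches a line such as "server: https://10.0.0.1:9345" (simple line scan,
# not full YAML parsing).
SERVER_LINE = re.compile(r"^\s*server:\s*(\S+)")


def find_server_entries(config_dir="/etc/rancher/rke2/config.yaml.d"):
    """Return (filename, server_url) pairs for every `server:` line found."""
    entries = []
    for path in sorted(glob.glob(os.path.join(config_dir, "*.yaml"))):
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                match = SERVER_LINE.match(line)
                if match:
                    # Strip optional quotes around the value.
                    url = match.group(1).strip("\"'")
                    entries.append((os.path.basename(path), url))
    return entries


if __name__ == "__main__":
    for name, url in find_server_entries():
        print(f"{name}: {url}")
```

Reading these files usually requires root. On a node that joined through the first node, the output would be expected to list 50-rancher.yaml together with the first node's address.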