chilly-caravan-31853

08/26/2022, 10:22 AM
Dear Colleagues, I'm trying to create an HA cluster with the embedded DB as described in https://rancher.com/docs/k3s/latest/en/installation/ha-embedded/ . I launched the first node, A, with the `--cluster-init` argument, then added two more nodes, B and C, with the `--server A` argument. All seemed fine, but after I shut down all three nodes and then started only B and C, the cluster was not operational (`k3s kubectl get nodes` on node B gives the "apiserver not ready" error). What am I missing? Is the first node, A, still special and a single point of failure? It is also strange that I cannot locate an etcd pod on any of the nodes when they are running normally.

hundreds-evening-84071

08/26/2022, 1:18 PM
I have never done what you describe here, i.e. shut down all 3 nodes at the same time. From what I understand, 2 of the 3 nodes need to remain online for HA. However, to troubleshoot: what happens in your setup if you shut down only node A? Does `kubectl get nodes` report API errors?

wonderful-baker-81666

08/26/2022, 2:24 PM
How about creating a Vagrantfile for this scenario?

chilly-caravan-31853

08/29/2022, 3:37 AM
> 2 of the 3 nodes need to remain online for HA.
That's interesting. Is it documented? I had experimented with an HA setup of 2 nodes with a PostgreSQL datastore, and it survived the loss of 1 node.
> what happens in your setup if you only shutdown node A?
The cluster remains operational.
https://rancher.com/docs/k3s/latest/en/installation/ha/ mentions two or more server nodes, and https://rancher.com/docs/k3s/latest/en/installation/ha-embedded/ gives no information on the required number of nodes.

hundreds-evening-84071

08/29/2022, 1:35 PM
Honestly, I do not know where it is documented. It is something I learned a couple of years ago at one of the free Rancher trainings (those have since moved to community.suse.com). So over the years, in HA clusters, I have always done maintenance (OS patches etc.) on one node at a time (where there are 3 nodes in an HA cluster).

chilly-caravan-31853

08/30/2022, 4:42 AM
@hundreds-evening-84071 OK, thank you, I'll just take it for granted.

hundreds-evening-84071

08/30/2022, 5:09 PM
In a way, you did the test, I believe... I mean: when you shut down all 3 nodes and power on 2 of the 3, you notice issues; when you shut down just 1, you do not notice any issues... For more testing, you can shut down 2 of the 3 and see what happens.

chilly-caravan-31853

08/31/2022, 7:47 AM
@hundreds-evening-84071 when I shut down 2 nodes of the 3, I start getting the `apiserver not ready` error from the remaining node's port 6443. Switching 1 node back on (to make a total of 2) fixes the cluster.
A cluster of 4 nodes survives the loss of 1 node but not the loss of 2 nodes. Doesn't that look strange?
OTOH, a 4-node cluster with a PostgreSQL datastore survives (more or less) the loss of 3 nodes; at least the API remains operational if you connect to the surviving node.
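
The shutdown results reported in this last message (3 servers failing with 2 down, 4 servers failing with 2 down) are what majority quorum would predict, which is how etcd (Raft) decides whether it can serve requests. The following is only a minimal sketch of that arithmetic, assuming each k3s server counts as one voting etcd member; the function names are illustrative and not part of k3s or etcd. It does not by itself account for the first report in this thread, where B and C alone did not come back after a full shutdown.

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with `members` voting members."""
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """How many members can be offline while a majority is still online."""
    return members - quorum(members)


def has_quorum(members: int, online: int) -> bool:
    """True if `online` members out of `members` still form a majority."""
    return online >= quorum(members)


if __name__ == "__main__":
    # Print the majority size and tolerated failures for small clusters.
    for n in range(1, 6):
        print(f"{n} servers: quorum={quorum(n)}, "
              f"tolerates {fault_tolerance(n)} failure(s)")
```

Under this assumption a 4-server cluster needs 3 members online, so it tolerates the loss of only 1 server, the same as a 3-server cluster. With an external datastore such as PostgreSQL the servers do not form a quorum among themselves, which would fit the observation that a single surviving server can still answer API requests.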