Under what conditions will Harvester promote anoth...
# harvester
s
Under what conditions will Harvester promote another Default Role node to become a Management node (Etcd)? I'm trying to understand what would happen in a scenario where we spread Management Nodes across two very closely located data closets with low-latency, high-bandwidth communication and we lose one data closet (site). For example, if Site A went offline, would a Default Role node in Site B get promoted automatically to maintain minimum quorum for Harvester Etcd?
b
I'm pretty sure it'd only get promoted upon node removal.
I also don't think there's any way to "demote" a node.
So if Site A came back online, you don't want split brain and an even number of nodes in the control plane.
s
So promotion is ONLY a tool for manual maintenance?
b
No, it's still for HA.
If your power supply dies or you have a switch go out or something, etcd needs to keep running.
Let's say something bad happens to an etcd node and it's gonna be down for more than a day or two, you probably want to remove that node from the cluster and have a new node get promoted to keep up the HA for etcd (and Longhorn replicas etc).
But Harvester manages that promotion, it's not something you can trigger (unless you are manually removing nodes intentionally)
It'll promote the nodes in the order they were added to the cluster, iirc.
Unless you have them tagged as a management node, but even then I think it only does 3
s
Right. But Etcd can only tolerate the loss of (n-1)/2 members and still keep quorum. In a scenario where we could potentially lose a whole site, how can we mitigate that? Technically, losing 3 is more than is allowed for 5 Etcd nodes as well.
Thanks for your help and answers by the way. I don't have a lot of time with production/HA configuration of Harvester yet, but I do with RKE2. I know that losing 2 of 3 control plane nodes has caused us issues in the past.
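To make the (n-1)/2 rule concrete, here is a minimal Python sketch of the majority arithmetic. The function names `quorum` and `failure_tolerance` are made up for illustration and are not part of Harvester or etcd; the division is integer (rounded down).

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with `members` voting etcd members."""
    return members // 2 + 1


def failure_tolerance(members: int) -> int:
    """How many members can be lost while a majority still remains."""
    return (members - 1) // 2


if __name__ == "__main__":
    # Print the usual cluster sizes side by side.
    for n in (1, 3, 5, 7):
        print(f"{n} members: quorum={quorum(n)}, tolerates {failure_tolerance(n)} lost")
```

With 3 members the cluster tolerates 1 loss, and with 5 it tolerates 2, which is why losing 2 of 3 or 3 of 5 both break quorum.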
b
It depends on how it goes down.
If site A loses power and you have only one etcd node in B (say 2 in A and 1 in B), you won't have quorum. The etcd process on B will keep running, but it can't commit writes until quorum is back.
Worse would be if you lose connectivity but both sites stay up, and you end up with split brain.
This is worth raising with SUSE support and just asking.
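The two-site cases in this thread can be checked against etcd's majority rule with a small Python sketch. It only counts members per site; `sides_with_quorum` is a hypothetical helper, not a Harvester or etcd API, and it assumes every etcd member lives in exactly one of the two sites.

```python
def sides_with_quorum(site_a: int, site_b: int) -> list[str]:
    """Return the sites whose members alone form a majority of the cluster.

    This covers both cases discussed above: the other site losing power,
    and the link between the sites going down while both stay up.
    """
    total = site_a + site_b
    majority = total // 2 + 1
    sides = []
    if site_a >= majority:
        sides.append("A")
    if site_b >= majority:
        sides.append("B")
    return sides


if __name__ == "__main__":
    for layout in [(2, 1), (3, 2), (2, 2)]:
        print(layout, "->", sides_with_quorum(*layout))
```

For any two-site split, at most one site can hold a majority by itself, so as far as etcd itself goes, only that site can keep committing writes. Losing the site that holds the majority (for example A in a 2+1 or 3+2 layout) leaves the cluster without quorum, and an even split such as 2+2 leaves neither site able to continue alone.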
s
Yep - support is in the works. We've had the same discussions with RKE2 installs. The interesting difference here was Harvester's ability to promote nodes to Managers. I was curious how that changed anything for its underlying Etcd/HA.
Thanks