Under what conditions will Harvester promote another Default Role node to become a Management node (Etcd)? | Rancher Users #harvester

strong-shoe-72392

05/14/2025, 3:17 PM

Under what conditions will Harvester promote another Default Role node to become a Management node (Etcd)? I'm trying to understand what would happen in a scenario where we spread Management nodes across two very closely located data closets with low-latency, high-bandwidth communication and we lose one data closet (site). For example, if Site A went offline, would a Default Role node in Site B get promoted automatically to maintain minimum quorum for Harvester's etcd?

bland-article-62755

05/14/2025, 5:56 PM

I'm pretty sure it'd only get promoted upon node removal.

bland-article-62755

05/14/2025, 5:56 PM

I also don't think there's any way to "demote" a node.

bland-article-62755

05/14/2025, 5:57 PM

So if Site A came back online, you don't want split brain and an ~un~even number for the control plane.

bland-article-62755

05/14/2025, 5:58 PM

err even number

strong-shoe-72392

05/14/2025, 5:58 PM

So promotion is ONLY a tool for manual maintenance?

bland-article-62755

05/14/2025, 5:59 PM

No, it's still for HA.

bland-article-62755

05/14/2025, 5:59 PM

If your power supply dies or you have a switch go out or something, etcd needs to keep running.

bland-article-62755

05/14/2025, 6:01 PM

Let's say something bad happens to an etcd node and it's gonna be down for more than a day or two. You probably want to remove that node from the cluster and have a new node get promoted to keep up the HA for etcd (and Longhorn replicas etc.).

bland-article-62755

05/14/2025, 6:02 PM

But Harvester manages that promotion; it's not something you can trigger (unless you are manually removing nodes intentionally).

bland-article-62755

05/14/2025, 6:03 PM

It'll promote the nodes in the order they were added to the cluster, iirc.

bland-article-62755

05/14/2025, 6:03 PM

Unless you have them tagged as a management node, but even then I think it only does 3

strong-shoe-72392

05/14/2025, 6:24 PM

Right. But etcd can only tolerate the loss of (n-1)/2 members and still keep quorum. In a scenario where we could potentially lose a whole site, how can we mitigate that? Technically, losing 3 is more than is allowed for 5 etcd nodes as well.
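
To make the (n-1)/2 arithmetic concrete, here is a minimal sketch of the majority-quorum math that etcd (via Raft) relies on. The function names are made up for illustration and are not part of any Harvester or etcd API.

```python
# Majority-quorum arithmetic for an etcd cluster.
# quorum() and fault_tolerance() are hypothetical helper names.

def quorum(members: int) -> int:
    """Smallest number of members that forms a majority."""
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """How many members can be lost while a majority remains."""
    return members - quorum(members)


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5, 7):
        print(f"members={n} quorum={quorum(n)} tolerates={fault_tolerance(n)}")
```

For odd cluster sizes this matches (n-1)/2: 3 members tolerate 1 loss and 5 tolerate 2. An even size tolerates no more failures than the odd size below it, which is why odd counts are preferred.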

strong-shoe-72392

05/14/2025, 6:25 PM

Thanks for your help and answers, by the way. I don't have a lot of time with production/HA configuration of Harvester yet, but I do with RKE2. I know that losing 2 of 3 control plane nodes has caused us issues in the past.

bland-article-62755

05/14/2025, 7:08 PM

It depends on how it goes down.

bland-article-62755

05/14/2025, 7:09 PM

If Site A loses power and you have one etcd node on B, you won't have quorum, but etcd will keep running.

bland-article-62755

05/14/2025, 7:10 PM

If you have 2 in A and 1 in B
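
A rough sketch of that two-site case, assuming plain majority quorum and no automatic promotion kicking in. The layout dictionaries and the function name are hypothetical, just to show which site loss leaves a majority.

```python
from typing import Dict

# Does the cluster still have an etcd majority after losing one whole site?
# Assumes simple majority quorum and that no replacement node is promoted.

def survives_site_loss(layout: Dict[str, int], lost_site: str) -> bool:
    """layout maps site name -> number of etcd members at that site."""
    total = sum(layout.values())
    remaining = total - layout[lost_site]
    return remaining >= total // 2 + 1


if __name__ == "__main__":
    for layout in ({"A": 2, "B": 1}, {"A": 3, "B": 2}):
        for site in layout:
            ok = survives_site_loss(layout, site)
            print(f"layout={layout} lose {site}: quorum kept = {ok}")
```

With only two sites, one of them always holds the majority of members, so losing that site loses quorum whether the cluster has 3 or 5 etcd nodes.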

bland-article-62755

05/14/2025, 7:10 PM

Worse would be if you lose connectivity but both sites stay up, and you end up with split brain.

bland-article-62755

05/14/2025, 7:11 PM

This is worth opening a support case with SUSE over and just asking.

strong-shoe-72392

05/14/2025, 7:29 PM

Yep, support is in the works. We've had the same discussions with RKE2 installs. The interesting difference here was Harvester's ability to promote nodes to Managers. I was curious how that changed anything for its underlying etcd/HA.

strong-shoe-72392

05/14/2025, 7:29 PM

Thanks

brainy-kilobyte-33711

07/04/2025, 8:26 AM

Did you get an answer about how it promotes the underlying RKE2 node and installs etcd etc.?
