# harvester
b
I thought that it was just a tag/label you apply
There should be either 1 or 3 I think
what does
kubectl get nodes -o wide
show?
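Or, to see just the control-plane nodes (standard Kubernetes role label, nothing Harvester-specific):
```
# List only the nodes carrying the control-plane role, with their addresses
kubectl get nodes -l node-role.kubernetes.io/control-plane -o wide
```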
Read the warning in the thread for sure
m
Yeah, I know that I can label them like this, but is it supported?
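Something like this, just as an illustration (node name made up):
```
# Hand-applying the control-plane role label to a node -- purely illustrative;
# whether Harvester/RKE2 actually treats a node labeled this way as a real
# control-plane member is exactly what I'm unsure about
kubectl label node harvester-node-4 node-role.kubernetes.io/control-plane=true
```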
b
The bigger issue is do you have 3 control plane nodes?
Because if you do, this isn't an issue
m
And do I always have to make sure that only 3 are running as control-planes, or can I just increase the number when my cluster grows?
b
You need 3 to have HA
for times like when you need to update or take down two of them
As long as one is up you should be good
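The math behind that is just etcd/Raft quorum (general behavior, not Harvester-specific):
```
# quorum = floor(n/2) + 1, failures tolerated = n - quorum
# 1 node  -> quorum 1 -> tolerates 0 failures
# 3 nodes -> quorum 2 -> tolerates 1 failure
# 4 nodes -> quorum 3 -> still tolerates only 1 failure (no gain over 3)
# 5 nodes -> quorum 3 -> tolerates 2 failures
```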
m
I have 3. Like, harvester-node 1-3 are control-planes and we needed to put 2 and 3 in maintenance... we just extended the downtime windows and did one host after another, and didn't have to put both in maintenance simultaneously
b
If it were me, I'd just do them.
m
I cannot put the second node into maintenance mode when one already is. Harvester doesn't allow it
b
Then you're gonna have to use a second window to drain and change the labels to make a new CP node
Which kinda defeats the purpose.
The easier/safer thing to do would just be to do them one at a time
Because you shouldn't have 4 nodes on the CP.
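For reference, roughly what "one at a time" looks like at the kubectl level (Harvester's maintenance mode does roughly the equivalent for you, including moving the VMs off; node name is a placeholder):
```
# Cordon and drain one control-plane node, do the host maintenance, bring it back
kubectl drain harvester-node-2 --ignore-daemonsets --delete-emptydir-data
# ... perform the maintenance on that host ...
kubectl uncordon harvester-node-2
```
Then repeat for the next node once the first one is healthy again.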
m
I once broke my previous harvester cluster by deleting more than 1 control-plane... but I was even able to restore it by cluster-resetting the rke2 cluster... but that was before the current production setup and I didn't want to do it again xD
That was the reason we did it like this
b
As I see it you can:
• Extend your window and just do one after the other
• Add another window and do another.
• Add two more nodes to the CP and pray that nothing screws up and etcd doesn't get corrupted, then demote the two you need to workers to avoid a split brain. (something you should probably do in a maint window, so this doesn't actually save you time unless you want to just YOLO it)
You can have more than 3 nodes for the CP but I haven't seen that in practice very much.
m
That was what I wasn't sure enough about to try without first asking somewhere
b
It's a good use case for a staging or dev/test env if yall don't have one already
m
I think I am going to create an "official" question on GitHub to sort this out
b
I'm not sure what they're going to say. If they have rules for not taking out more than 1 of the CP nodes at a time, adding a number higher than 3 will just make future upgrades more difficult.
m
No, I WANT to have more than 3 CP nodes
Shouldn't the control-plane scale with the size of the cluster anyways?
b
It's just handling the Kube API requests.
and keeping track of etcd
More is not always better
m
and the documentation is silent on the issue with the exception of this Note
And even this implies there could be more than 3 CP-Nodes
b
Yes
But just because you can do something, doesn't mean that you should.
m
No, but I think a fault tolerance of just one node out of 5 is a little sad
b
So let me put it this way: If it's pretty standard for a 3-node CP to manage 100 nodes, why bump your CP up to 9 at 21?
You can still lose 2 of the CP and be functioning, but things are going to be angry.
Fault tolerance isn't the same as availability.
It's just what harvester is going to let you willingly do.
But I don't know all the details of what you're running, where it's running, and what your business drivers and SLAs are.
So there might be legit reasons to need X or Y.
m
Hm, you may be right, but I am wondering if the scaling recommendations from rke2 are not applicable to Harvester clusters, at least in principle
b
I would say it's not
The load for API stuff in a harvester cluster (just for running VMs) would look very different from a standard k8s cluster at capacity.
Keeping track of a 16-core VM with 128GB is a lot easier than the 3,000 pods of microservices that could consume the same resources.
m
Hmmm, you've got a point... but at some point I would think the 3 CP nodes would become a bottleneck
b
at some point, but I suspect that we're talking about hundreds or thousands of nodes, and you should have a support contract with SUSE at that point for sanity and business continuity
You'd also start to see warnings about error budget because the KubeAPI can't keep up with requests
But there's lots of things that could affect that, not just the number of CP nodes
And if it's something like your network throughput isn't big enough, adding more CP will just make the issue worse.
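If you want to eyeball that yourself, one rough way is the standard kube-apiserver Prometheus metrics, queried through the raw API (needs RBAC that allows reading /metrics):
```
# Spot-check API server request latency counters
kubectl get --raw /metrics | grep '^apiserver_request_duration_seconds' | head -n 20
```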
m
Good points. We are discussing internally whether we should get support anyway. But I still think I will ask, because that information should be part of the docs, don't you think?
b
Yes and no? ¯\_(ツ)_/¯
The docs aren't meant to give you cut-and-dried how-tos for every situation. They're supposed to explain how to get it started/running and explain enough of how things work so you can navigate to your own answers.
This to me is more along the lines of consulting/support, which they charge for.
100% transparency - I don't work for SUSE, but I do have a support contract with them through my day job and I've been pleased so far.
m
But it should at least tell you whether this is supported in principle or discouraged for some reason
b
I think the docs are clear that it's supported in principle, and why you wouldn't do it is completely environmental and dependent on your own circumstances.
m
That's great to hear... I will recommend that we ask for a quote for support. But where do you think it is clear that it is supported? The only reference in the whole docs was the note I screenshotted, and the wording is so indirect that I am not sure if it is supported. That's why I am here asking stupid questions xD
b
It's built on Kubernetes. You can look to upstream docs for that too: https://kubernetes.io/docs/setup/best-practices/cluster-large/
Even that is very loose because it's largely dependent on how good your control-plane nodes are at keeping up with the requests.
anyways, I'll stop my yammering
Good luck! 🙂
m
Thank you I appreciate it very much.