Rancher Users #general

# general

adamant-kite-43734

05/07/2024, 3:34 PM

This message was deleted.

powerful-librarian-10572

05/07/2024, 3:35 PM

I've faced this issue when removing the last working node in a cluster. I've yet to find a better solution than deleting the cluster and restarting from scratch.

powerful-librarian-10572

05/07/2024, 3:35 PM

@few-coat-36487

🙏 1

powerful-librarian-10572

05/07/2024, 3:36 PM

Does your cluster still have other nodes available while you're trying to re-register this particular one?

few-coat-36487

05/07/2024, 3:45 PM

So, I don't have any other nodes, because I'm in the process of creating my cluster and I'm encountering quite a few problems, since I'm quite new to Rancher. So, for the moment, I only have one node. Could this be a reason for the problem? What's the minimum number of nodes?

powerful-librarian-10572

05/07/2024, 3:48 PM

(please write in english for everyone to understand)

> So, I don't have any other nodes, because I'm in the process of creating my cluster and I'm encountering quite a few problems, since I'm quite new to Rancher. So, for the moment, I only have one node. Could this be a reason for the problem? What's the minimum number of nodes?

Basically, one node is enough for testing. I started seriously fighting with Rancher two weeks ago, so I totally understand how you feel.

powerful-librarian-10572

05/07/2024, 3:49 PM

But yeah, if you have to reset your node for whatever reason, redo your cluster. I believe it has something to do with (as stated in the message) the etcd control plane already being initialized in the cluster, while your node has no other controller to learn from.

few-coat-36487

05/07/2024, 3:50 PM

Alright, I'll do what you said, destroy and recreate the cluster hoping that it works.

powerful-librarian-10572

05/07/2024, 3:50 PM

When you add your first node to a brand-new cluster, etcd will start from scratch. Also, running `rancher-system-agent-uninstall` followed by `rke2-uninstall` is enough before re-registering your node.
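A minimal Python sketch of that reset order (agent first, then RKE2). It assumes the two helpers are the `rancher-system-agent-uninstall.sh` and `rke2-uninstall.sh` scripts that the installers usually place on the node's `PATH` (typically `/usr/local/bin`); those exact names and locations are assumptions, so check your own node. It defaults to a dry run, and a real run needs root:

```python
import shutil
import subprocess

# Assumed script names; verify them on your node before running for real.
UNINSTALL_SEQUENCE = [
    "rancher-system-agent-uninstall.sh",  # remove the Rancher system agent first
    "rke2-uninstall.sh",                  # then remove RKE2 itself
]


def uninstall_node(dry_run: bool = True) -> list[str]:
    """Run the uninstall scripts in order and return the commands (to be) run."""
    planned = []
    for script in UNINSTALL_SEQUENCE:
        # Resolve the full path if the script is on PATH, else keep the bare name.
        path = shutil.which(script) or script
        planned.append(path)
        if not dry_run:
            # check=True raises CalledProcessError and stops the sequence on failure.
            subprocess.run([path], check=True)
    return planned


if __name__ == "__main__":
    for cmd in uninstall_node(dry_run=True):
        print("would run:", cmd)
```

Keeping `dry_run=True` as the default means nothing is removed unless you explicitly ask for it.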

powerful-librarian-10572

05/07/2024, 3:50 PM

Also, please be aware that the minimum number of nodes for a "production-ready" cluster is 3, by design.

👍 1
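The "3 by design" figure comes from etcd needing a majority (quorum) of its members to agree. A small Python sketch of the arithmetic, assuming every node runs etcd (as an all-roles node does):

```python
def etcd_quorum(members: int) -> int:
    """Majority of members etcd needs before it can accept writes."""
    if members < 1:
        raise ValueError("an etcd cluster needs at least one member")
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can be lost while a majority is still reachable."""
    return members - etcd_quorum(members)


if __name__ == "__main__":
    for n in range(1, 6):
        print(f"{n} etcd node(s): quorum={etcd_quorum(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

Two nodes tolerate no more failures than one (quorum is 2 of 2), so 3 is the smallest cluster that survives losing a node.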

few-coat-36487

05/07/2024, 3:55 PM

Okay, I tried with two nodes (and a fresh cluster), and the first machine that registers locks up with an error. But when I delete it, the second node goes into error, as if there's an error queue. I'm going to check the logs.

powerful-librarian-10572

05/07/2024, 3:56 PM

Huh, before registering your second node, the first node should already be available.
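A sketch of that one-node-at-a-time rule in Python. `register_sequentially`, `register` and `is_ready` are hypothetical names: the two callables stand in for however you run the registration command on a node and check its status (for example in the Rancher UI or with `kubectl get nodes`); nothing here is a Rancher API:

```python
import time
from typing import Callable, Iterable


def register_sequentially(
    nodes: Iterable[str],
    register: Callable[[str], None],
    is_ready: Callable[[str], bool],
    timeout_s: float = 600.0,
    poll_s: float = 5.0,
) -> list[str]:
    """Register nodes one at a time, waiting for each to be ready before the next."""
    done = []
    for node in nodes:
        register(node)
        deadline = time.monotonic() + timeout_s
        # Poll until this node reports ready; never start the next one early.
        while not is_ready(node):
            if time.monotonic() >= deadline:
                raise TimeoutError(
                    f"{node} did not become ready; check its logs before adding more nodes"
                )
            time.sleep(poll_s)
        done.append(node)
    return done
```

Stopping with an error instead of moving on keeps a broken first node from dragging the next one into the same failure.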

few-coat-36487

05/07/2024, 3:57 PM

Okay, good thing to know.

powerful-librarian-10572

05/07/2024, 3:58 PM

Also, just to be sure, I recommend building the cluster on freshly installed nodes. I lost about 8 hours fighting (and losing) to make Calico work on old nodes, while everything went flawlessly on fresh installs.

powerful-librarian-10572

05/07/2024, 3:59 PM

By fresh install, I don't mean that uninstalling and reinstalling would break things, just that those nodes should be set up to become RKE2 nodes and nothing else.

few-coat-36487

05/07/2024, 4:02 PM

Reset everything 😕 Fresh AlmaLinux... okay, I will do that on Friday. I'm taking my day off. Thanks for the help!

powerful-librarian-10572

05/07/2024, 4:03 PM

I won't be here on Friday, and I won't be back until Monday. Take care and good luck!

👍 1

few-coat-36487

05/07/2024, 4:04 PM

One last question: what happens if there's a problem on the node and there's data on it? Kubernetes isn't really viable for running MinIO or Longhorn to store data if you have to reset the node every time there's a problem.

powerful-librarian-10572

05/07/2024, 4:07 PM

If your cluster is properly set up and has reached quorum (which requires the 3 nodes I talked about earlier), Longhorn is a far superior solution. When a node is marked as failed because the rest of the quorum can no longer communicate with it, Longhorn will (supposedly) terminate its pods so they can be brought back up on another node.

powerful-librarian-10572

05/07/2024, 4:07 PM

Longhorn just manages all the data "magically". It has some quirks, but it is the sole reason I am migrating my whole production cluster to Rancher 2.

few-coat-36487

05/13/2024, 7:42 AM

Okay, so last Friday I reinstalled the nodes, but it still didn't work, and I got stuck with this on a fresh install: https://rancher-users.slack.com/archives/C01PHNP149L/p1715333103453829. I will retry a manual RKE2 uninstall, although last time I tried, it didn't seem to change anything.
