# general
s
Hi! I have this custom cluster with 4 machines, each with all roles. But if I lose machine 1 (the first one that joined the cluster), I can't click on the cluster or access it (the button is blocked). Is this a bug or default Rancher behavior? What should I do?
b
In some versions it seems like I can't click on badges/links until I refresh the page after logging in.
No idea if it's bad cache or a browser issue, just throwing it out there.
That being said, I've also had weirdness where if the IPs change from what's registered in Rancher, the cluster is unavailable.
So if the VMs have DHCP you have to go back in and reserve those IPs.
s
@bland-article-62755 The VMs do not use DHCP. My problem is that when the first machine that joined the cluster goes down, the button becomes unavailable.
b
What do the nodes look like from the cluster view?
s
[screenshot attached: image.png]
All 4 of my machines are control plane, etcd and worker.
The problem only occurs when the first machine that joined the cluster goes down.
b
Right, but that doesn't show the status
More like this:
you can see that they're all running, and this is under the cluster view
use flameshot or something if you need to blur out sensitive info
that highlighted area is just to double check that the nodes are available at those registered IPs.
s
hmm... "Not in a Pool"
b
That's ok
It does pools because mine is a Harvester cluster
Does the Recent Events tab give any clues?
s
hm.. take a look at my infrastructure and see if I did it right. I have 5 machines:
1 - Rancher (local)
2, 3, 4, 5 - custom cluster (worker, etcd, control plane)
Would I have to have Rancher on those machines too?
b
So, Rancher is typically on a "meta" cluster.
It's listed as the local cluster in the Rancher UI.
It should manage/provision other downstream clusters.
If you're running downstream boxes on bare metal, you might consider Elemental, as it'll help manage the host OS.
If they're VMs, you can use a provisioner for those as well.
If they're REALLY big boxes, you can do a Harvester cluster, which is a hypervisor based on Kubernetes.
At $DayJob we use Elemental and Harvester, and we're using older boxes for our Rancher cluster so it's in HA.
Typically, I'd recommend not giving etcd / control plane an even number of nodes.
s
To configure HA correctly, do I need Harvester?
b
Nope, not saying that at all.
It's general best practice to use odd numbers for the control plane/etcd to avoid split brain; an even count doesn't add any extra failure tolerance.
So 3 nodes with "all" the roles and 1 additional worker node.
If you want Rancher to be HA, then you should have 3 nodes running that "local" cluster.
but that local cluster isn't where you'd run your production workloads.
Those would be in a downstream cluster and should also be HA somehow.
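For the odd-number point, here's a rough sketch of the majority math etcd relies on. The function names are made up for illustration; nothing here is a Rancher or etcd API, it's just the arithmetic.

```python
def quorum(members: int) -> int:
    """Smallest majority of an etcd cluster with `members` voting members."""
    return members // 2 + 1


def failure_tolerance(members: int) -> int:
    """How many members can be lost while a majority is still reachable."""
    return members - quorum(members)


if __name__ == "__main__":
    # Compare cluster sizes 1 through 5.
    for n in range(1, 6):
        print(
            f"{n} members: quorum={quorum(n)}, "
            f"tolerates {failure_tolerance(n)} failure(s)"
        )
```

By this math, 4 all-role nodes tolerate the same single failure as 3, which is why 3 all-role nodes plus 1 worker is the usual layout.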
s
Is it possible for me to change the roles after I have already added the machine to the cluster?
b
Honestly, I don't know. Maybe?
You can always delete one node, reprovision it, then rejoin it.
s
I think I found the problem, this cattle-system-agent is only running on the first node, and not on all 4 nodes, even though I added more nodes. So if I lose this node, everything goes down.