# rancher-setup
s
We were also troubleshooting the following files:
• /var/lib/rancher/rke2/agent/etc/rke2-agent-load-balancer.json
• /var/lib/rancher/rke2/agent/etc/rke2-api-server-agent-load-balancer.json
What we found was that when a master node goes down, its IP is removed from the ServerAddresses list. However, for some worker nodes, the unhealthy master IP is not being removed. Additionally, we observed that after a couple of minutes, the unhealthy master IP is being re-added to the list, which is causing the cluster to go down.
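In case it helps anyone reproduce this: we were just watching the ServerAddresses field in those files while taking a master down. Rough sketch of how (assumes jq is installed on the node; adjust paths and interval to taste):

```sh
# print the address list the agent load balancer is currently using
jq '.ServerAddresses' /var/lib/rancher/rke2/agent/etc/rke2-agent-load-balancer.json
jq '.ServerAddresses' /var/lib/rancher/rke2/agent/etc/rke2-api-server-agent-load-balancer.json

# keep watching while a master is shut down, to catch the unhealthy IP being re-added
watch -n 10 "jq '.ServerAddresses' /var/lib/rancher/rke2/agent/etc/rke2-agent-load-balancer.json"
```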
Seems like it was a bug in rke2 1.28.0 (https://github.com/rancher/rke2/issues/5949); after upgrading, it is working as expected.
c
hey, I am also in a similar boat. My RKE2 cluster was deployed to Harvester via Rancher. The mentioned json files on the workers appear good, i.e. they list all three rke2 server nodes. What I'm trying to get my head around is why the second and third rke2-server nodes (master, cp, etcd) both point directly at the first node in /etc/rancher/rke2/config.yaml.d/50-rancher.yaml. There is no VIP for port 9345 that I can see; there is kube-vip in ARP mode for 6443. If I was building RKE2 myself I would craft the config files to all point at a centralised, load-balanced VIP for 6443 and 9345. If I shut down a master node I get similar results to you @stocky-smartphone-18501
FYI I'm running v1.32.5+rke2r1, built by the latest Rancher, hosted on the latest Harvester.
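For reference, this is the sort of thing I mean by crafting the config myself - every server/agent pointing at one shared, load-balanced VIP for 9345 rather than at node 1. Just a sketch: the address, token placeholder and DNS name are made up, and this is not what Rancher generates:

```yaml
# /etc/rancher/rke2/config.yaml on the 2nd/3rd servers and on agents (hand-rolled sketch)
server: https://10.0.0.100:9345    # shared VIP / load balancer in front of all three servers (placeholder IP)
token: <cluster-join-token>
tls-san:
  - 10.0.0.100                     # make sure the serving certs cover the VIP
  - rke2-api.example.internal      # hypothetical DNS name for the same VIP
```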
d
We are experiencing a similar issue in a Harvester setup. Don't know the rke2 version, but we just upgraded Harvester to 1.5.0. We tracked it down to the first node we installed, the one the others joined the cluster through. When the first node goes down, the VIP is also down. Even though we can see that the kube-vip pod did move to another node, the VIP stays unavailable.
c
is your kube-vip just for kubeapi on 6443, what about rke2-server on 9345?
i only have a vip for kubeapi
this is all i have with kube-vip:
- name: cp_enable value: "false"
- name: enable_service_security value: "true"
- name: lb_enable value: "true"
- name: lb_port value: "6443"
- name: svc_election value: "false"
- name: svc_enable value: "true"
- name: vip_arp value: "true"
- name: vip_cidr value: "32"
- name: vip_interface
- name: vip_leaderelection value: "false"
for rke2-server it points directly to nodes... which can't be right
If a master that the other two are pointing to dies, and then auto-updates kick in and reboot the remaining two... then it's broken, as the rke2-server service will fail. Seems a strange way to build them. Am I going mad, or is this just going to fail easily if you don't sit on it and respond to failed nodes within 24hrs?
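If anyone wants to compare notes, a quick way to see which registration address each node is actually using (paths are the ones mentioned above; run on each node):

```sh
# which server: address did Rancher write for this node?
grep -H "server:" /etc/rancher/rke2/config.yaml /etc/rancher/rke2/config.yaml.d/*.yaml 2>/dev/null

# is the rke2 service still happy?
systemctl is-active rke2-server 2>/dev/null || systemctl is-active rke2-agent
```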
default auto-update is daily on SUSE SLE Micro I think
For the kubeapi, I advertise the service to the underlay using calico, which works around API access issues. I'm more concerned with how the rke2 nodes register on TCP 9345; those aren't behind a VIP on mine when built with Rancher using the Harvester provider. As described above, it's not great for resiliency when things reboot and you have nodes pointing directly at other nodes rather than going via a load balancer. Dunno if it's always like that or if it's just my setup? Would be good to hear others' experience.
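In case the calico bit is useful to anyone, it's roughly this shape - a BGPConfiguration that advertises the service CIDR to the underlay peers you already have set up. The CIDR below is rke2's default so check your own cluster; BGPPeer setup is not shown, and I apply it with calicoctl:

```yaml
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  serviceClusterIPs:
    - cidr: 10.43.0.0/16   # rke2's default service CIDR; replace with yours
```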