# harvester
h
Help me troubleshoot. The Management URL stays unavailable while setting up Harvester: single-node cluster on a Dell R630, DHCP worked, and the node hostname and IP address show on the Harvester ASCII screen. I've poked through numerous logs but cannot pin down the reason yet. Which logs will help me while I'm running this test for Harvester 1.6.0-rc5?
h
I’ve seen that it says Unavailable, but is it really?
t
DHCP is a bad idea for the management IP. You should be able to SSH in as the rancher user with the password you set up during the install.
b
• Is the Management URL an FQDN?
• Is the IP on the ASCII console a VIP or the DHCP address of the box you set up?
• What's the output of `ip -br a` when you log into the node? (Using SSH like Andy said, or via F12 from the console?)
• Can you ping the IP from the same subnet/VLAN? What does `nmap` say is open for your host node? (Rough versions of these checks are sketched below.)
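A rough sketch of those checks; the 192.168.1.100 address and the `mgmt-br` name are taken from later in this thread, and the ping/nmap steps assume another machine on the same VLAN, so adjust for your setup:

```bash
# On the node itself (SSH as rancher, or F12 console):
ip -br a                    # which interfaces/bridges actually hold an address
ip -br a show mgmt-br       # the management bridge specifically

# From another host on the same subnet/VLAN:
ping -c 3 192.168.1.100                     # basic reachability
nmap -Pn -p 22,80,443,6443 192.168.1.100    # anything listening for the GUI (443) and the K8s API (6443)?
```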
h
`mgmt-br` is 192.168.1.100. `mgmt-bo` has no IP but is assigned, it seems, the same MAC as `mgmt-br`. I can SSH into rancher@192.168.1.100 and also `sudo -i` just fine. I cannot seem to get or find any responding URL for managing via the web GUI as the docs suggest.
`eno1` and `eno2` are disabled (only 2 ports on the Intel NIC for this server), leaving both `eno3` and `eno4` available, and `eno3` is mapped with the same MAC as `mgmt-br`.
I really do need help here. I've looked at the `harvester` and `rancher-system-agent` logs and checked them. Seems like `kubectl get pods` is complaining about not reaching the API; it reports that `https://127.0.0.1:6443` is not responding. The first time I installed via ISO and left it static rather than DHCP, I did get the Management URL. But with DHCP mode it doesn't appear, and I'm trying to learn how to trace this down, maybe to help with better error logging, etc. via a PR.
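A minimal sketch for chasing that 127.0.0.1:6443 refusal on the node itself; the kubeconfig and binary paths below are the standard RKE2 defaults (the kubeconfig path also comes up later in this thread), so verify them on your install:

```bash
# Is the rke2 server actually up?
systemctl status rke2-server.service
journalctl -u rke2-server.service -b --no-pager | tail -n 50

# Is anything listening on the API port?
ss -tlnp | grep 6443

# Point kubectl at the admin kubeconfig that RKE2 writes out
export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
/var/lib/rancher/rke2/bin/kubectl get nodes -o wide
```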
I can change the IP via a DHCP mapping on the router just fine, and a reboot of the Harvester server does indeed change it for `mgmt-br` and `mgmt-bo`. `vip_hw_addr` is set to the same MAC as `mgmt-br` in `/oem/90_custom.yaml` ??? weird? I also see Cattle reporting an unexpected IP, 192.168.1.91, in `90_custom.yaml`, which has an external URL of `https://192.168.1.91/api/v1/namespaces/cattle-monitoring-system/services`? And `tls-san` has that same 192.168.1.91 under the path for `90-harvester-server.yaml`?
Maybe there's some weird auto-detection of the IP happening with this Intel NIC slave or bonding? I just don't know how to filter down through the Harvester logs ecosystem yet, but I'm learning more about it as I read the code base.
but 192.168.1.91 is not shown as mapped to any interface I see when I do `ip addr`
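To locate where that stray 192.168.1.91 actually lives on disk, a rough sweep over the config trees mentioned in this thread (any path beyond /oem and /etc/rancher is just a guess, so adjust as needed):

```bash
# Which files mention the unexpected address?
grep -rn '192.168.1.91' /oem/ /etc/rancher/ 2>/dev/null

# And the tls-san entries specifically
grep -rn -A3 'tls-san' /etc/rancher/rke2/ 2>/dev/null
```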
p
> Seems like `kubectl get pods` is complaining about not reaching the API; it reports that `https://127.0.0.1:6443` is not responding.
sounds borked. reinstall w/ static IP config and try again?
t
Yup
h
```
Aug 14 14:20:25 junglebox rke2[2952]: time="2025-08-14T14:20:25Z" level=info msg="Failed to test etcd connection: this server is a not a member of the etcd cluster. Found [junglebox-63a6a0d9=https://192.168.1.83:2380], expect: junglebox-63a6a0d9=https://192.168.1.100:2380"
```
Looks like the `etcd` membership just needs a tweak? How can I update the server membership if the `etcdctl` command doesn't exist as part of Harvester directly?
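For reference, a sketch of getting at `etcdctl` on an RKE2-based node even when the kube API is down; the crictl config path and the etcd TLS cert paths below are the usual RKE2 defaults, not something confirmed in this thread, so verify them on your node:

```bash
# Talk to containerd directly (works even when kubectl can't reach 6443)
export CRI_CONFIG_FILE=/var/lib/rancher/rke2/agent/etc/crictl.yaml
/var/lib/rancher/rke2/bin/crictl ps --name etcd      # note the container ID

# Shell into the etcd container and ask it for its member list
/var/lib/rancher/rke2/bin/crictl exec -it <etcd-container-id> sh
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/var/lib/rancher/rke2/server/tls/etcd/server-ca.crt \
  --cert=/var/lib/rancher/rke2/server/tls/etcd/server-client.crt \
  --key=/var/lib/rancher/rke2/server/tls/etcd/server-client.key \
  member list
```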
h
It’s in the etcd pod - you’d shell into that pod on the host. But it’s likely an issue with the wrong IP. Did you check that’s really the correct original IP?
And then reboot
h
the IP is 192.168.1.100 indeed... but not .83
After a reboot, where would the etcd pod be picking up its configuration? Somehow that .83 IP is stuck in some file somewhere? My VIP and `mgmt-br` are both static now and the problem still exists with the wrong IP being found/expected.
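One rough way to hunt for where that .83 is persisted; the etcd paths under /var/lib/rancher/rke2 are an assumption based on RKE2 defaults, and the member list itself lives in etcd's binary database, so a grep won't necessarily find it:

```bash
# Any text config still carrying the old address?
grep -rn '192.168.1.83' /oem/ /etc/rancher/ /var/lib/rancher/rke2/server/db/etcd/config 2>/dev/null

# The member list itself sits in the etcd data dir, not in a YAML file
ls /var/lib/rancher/rke2/server/db/etcd/
```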
b
The suggestion to start with a fresh install with static values is a good one.
You don't want this to cause you problems a year from now.
h
this is not about "a year from now" but about testing 1.6 rc5 and contributing back
b
Still probably worth the 20 minutes to reinstall
Plus any bug you find might be related to this
h
I can see now that the rke2 config gets deleted by a shell script upon bootup. That's actually where the IP is stored correctly... but that config gets `rm`'d by the shell script, and I don't know why someone would have done that... I guess because the config gets recreated? But in actual use after a reboot, I'm not seeing that `rancher-vip-config.yaml` getting recreated.
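To pin down which script does the `rm` and whether anything recreates the file afterwards, a rough sweep; the search roots beyond the /oem path already quoted in this thread are guesses:

```bash
# Who references a VIP config at all?
grep -rli 'vip' /oem/ /etc/rancher/ 2>/dev/null

# Boot-time log lines around the removal
journalctl -b --no-pager | grep -i 'vip'
```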
t
upon every boot harvester rebuilds the host OS. the configs are based on the /oem/ yamls. It's easy to bork up those files and cause the node not to boot.
b
Even then, there are certain values that k8s expects to be immutable. So even if you change it, there's some value in an object that can't be changed and will always pull the wrong one.
h
The issue happened after I changed the VIP in `/oem/90_custom.yaml` from mode: static to mode: dhcp
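If anyone wants to reproduce that safely, a small sketch for editing the file with a way back; only /oem/90_custom.yaml and the mode/vip keys come from this thread, and the backup location is arbitrary:

```bash
# Keep a copy outside /oem before touching anything
cp /oem/90_custom.yaml /root/90_custom.yaml.bak

# Show the VIP-related lines before and after the edit
grep -n -i -B2 -A4 'vip' /oem/90_custom.yaml

# After editing mode: static -> mode: dhcp, confirm exactly what changed
diff -u /root/90_custom.yaml.bak /oem/90_custom.yaml
```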
b
I think node IP is one of those values, which is probably why etcd is wrong. `kubectl get nodes -owide`
I had the same problem with VMs on DHCP failing to check in with rancher properly.
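Once the API answers again, a small sketch for comparing what Kubernetes recorded for the node against what it should be; the node name junglebox is taken from the log line earlier in this thread:

```bash
# What addresses does Kubernetes think the node has?
kubectl get nodes -o wide
kubectl get node junglebox -o jsonpath='{.status.addresses}{"\n"}'
```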
h
```
E0814 15:16:06.726863   19984 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://127.0.0.1:6443/api?timeout=32s\": dial tcp 127.0.0.1:6443: connect: connection refused"
```
Likely because the rke2 config defaults to 127.0.0.1 when that rancher-vip- file gets `rm`'d by the shell script, I think?
`/etc/rancher/rke2/rke2.yaml`
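A small check of that theory, assuming the standard RKE2 file layout (the drop-in file name is the one that shows up in the boot log further down this thread):

```bash
# Which API endpoint does kubectl use from this kubeconfig?
grep 'server:' /etc/rancher/rke2/rke2.yaml

# Is the VIP drop-in actually present after boot?
ls -l /etc/rancher/rke2/config.yaml.d/
cat /etc/rancher/rke2/config.yaml.d/90-harvester-vip.yaml 2>/dev/null
```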
p
`journalctl -u rke2-server.service`?
h
no errors, and it ends with successfully generating the self-signed certificate.
At the beginning it says:
```
r-url[3133]: + HARVESTER_CONFIG_FILE=/oem/harvester.config
r-url[3133]: + RKE2_VIP_CONFIG_FILE=/etc/rancher/rke2/config.yaml.d/90-harvester-vip.yaml
r-url[3133]: + case $1 in
r-url[3133]: + rm -f /etc/rancher/rke2/config.yaml.d/90-harvester-vip.yaml
```
1 sec... more....
```
r":"v3rpc/health.go:61","msg":"grpc service status changed","service":"","status":"SERVING"}
r":"etcdserver/server.go:759","msg":"started as single-node; fast-forwarding election ticks","l>
r":"embed/etcd.go:633","msg":"serving peer traffic","address":"127.0.0.1:2400"}
r":"embed/etcd.go:292","msg":"now serving peer/client/metrics","local-member-id":"c5a37df222778>
r":"embed/etcd.go:603","msg":"cmux::serve","address":"127.0.0.1:2400"}
eived: \"terminated\", canceling context..."
t temporary data store connection: failed to get etcd status: context canceled"
t temporary data store connection: etcd datastore is not started"
t temporary data store connection: etcd datastore is not started"
t temporary data store connection: etcd datastore is not started"
t temporary data store connection: etcd datastore is not started"
```
I will reinstall Harvester again via ISO and start with a static VIP and `mgmt-br` again... then look at the files and containers in more detail... but the bug was simple: change the VIP from mode: static to mode: dhcp in `/oem/90_custom.yaml` and then reboot.
b
I remember getting the exact same error message (about the IP not being what is expected) after I accidentally switched two NIC cables (both DHCP) on the node after the RKE2 install.
h
@brash-petabyte-67855 thanks but I've confirmed the NIC ports already.