Hey team,
I deployed a 3-node Harvester cluster on our local servers. Since yesterday, the Harvester UI has been inaccessible, and now the entire cluster seems to be down: all server nodes are showing as NotReady. I also attempted to reset the cluster by running rke2 server with --cluster-reset. The output of kubectl get nodes and of the reset attempt is below.
Looking for help to troubleshoot and recover the cluster.
kubectl get nodes -A
E0903 12:06:31.023219 25163 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://127.0.0.1:6443/api?timeout=32s\": dial tcp 127.0.0.1:6443: connect: connection refused"
E0903 12:06:31.024828 25163 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://127.0.0.1:6443/api?timeout=32s\": dial tcp 127.0.0.1:6443: connect: connection refused"
E0903 12:06:31.027031 25163 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://127.0.0.1:6443/api?timeout=32s\": dial tcp 127.0.0.1:6443: connect: connection refused"
E0903 12:06:31.028734 25163 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://127.0.0.1:6443/api?timeout=32s\": dial tcp 127.0.0.1:6443: connect: connection refused"
E0903 12:06:31.030133 25163 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://127.0.0.1:6443/api?timeout=32s\": dial tcp 127.0.0.1:6443: connect: connection refused"
The connection to the server 127.0.0.1:6443 was refused - did you specify the right host or port?
sudo /opt/rke2/bin/rke2 server --cluster-reset --config /etc/rancher/rke2/config.yaml.d/90-harvester-server.yaml
WARN[0000] not running in CIS mode
INFO[0000] Applying Pod Security Admission Configuration
INFO[0000] Static pod cleanup in progress
INFO[0000] Logging temporary containerd to /var/lib/rancher/rke2/agent/containerd/containerd.log
INFO[0000] Running temporary containerd /var/lib/rancher/rke2/bin/containerd -c /var/lib/rancher/rke2/agent/etc/containerd/config.toml -a /run/k3s/containerd/containerd.sock --state /run/k3s/containerd --root /var/lib/rancher/rke2/agent/containerd
INFO[0010] Static pod cleanup completed successfully
INFO[0010] Starting rke2 v1.32.4+rke2r1 (4e465c0f03edba9a2af3b3c77d09840d3f7681ef)
INFO[0010] Managed etcd cluster initializing
INFO[0010] Updated load balancer rke2-agent-load-balancer default server: 127.0.0.1:9345
INFO[0010] Running load balancer rke2-agent-load-balancer 127.0.0.1:6444 -> [] [default: 127.0.0.1:9345]
WARN[0010] Failed to get apiserver address from etcd: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused"
INFO[0010] Running load balancer rke2-api-server-agent-load-balancer 127.0.0.1:6443 -> [] [default: ]
INFO[0011] Password verified locally for node orion1
INFO[0011] certificate CN=orion1 signed by CN=rke2-server-ca@1754633700: notBefore=2025-08-08 06:15:00 +0000 UTC notAfter=2026-09-03 10:53:20 +0000 UTC
INFO[0011] certificate CN=system:node:orion1,O=system:nodes signed by CN=rke2-client-ca@1754633700: notBefore=2025-08-08 06:15:00 +0000 UTC notAfter=2026-09-03 10:53:20 +0000 UTC
INFO[0011] certificate CN=system:kube-proxy signed by CN=rke2-client-ca@1754633700: notBefore=2025-08-08 06:15:00 +0000 UTC notAfter=2026-09-03 10:53:21 +0000 UTC
INFO[0011] certificate CN=system:rke2-controller signed by CN=rke2-client-ca@1754633700: notBefore=2025-08-08 06:15:00 +0000 UTC notAfter=2026-09-03 10:53:21 +0000 UTC
INFO[0011] Using private registry config file at /etc/rancher/rke2/registries.yaml
INFO[0011] Module overlay was already loaded
INFO[0011] Module nf_conntrack was already loaded
INFO[0011] Module br_netfilter was already loaded
INFO[0011] Module iptable_nat was already loaded
INFO[0011] Module iptable_filter was already loaded
INFO[0011] Runtime image index.docker.io/rancher/rke2-runtime:v1.32.4-rke2r1 bin and charts directories already exist; skipping extract
INFO[0011] Updated manifest /var/lib/rancher/rke2/server/manifests/rancher-vsphere-csi.yaml to set cluster configuration values
INFO[0011] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-cilium.yaml to set cluster configuration values
INFO[0011] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-flannel.yaml to set cluster configuration values
INFO[0011] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-metrics-server.yaml to set cluster configuration values
INFO[0011] No cluster configuration value changes necessary for manifest /var/lib/rancher/rke2/server/manifests/rke2-snapshot-validation-webhook.yaml
INFO[0012] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-traefik.yaml to set cluster configuration values
INFO[0012] Updated manifest /var/lib/rancher/rke2/server/manifests/harvester-csi-driver.yaml to set cluster configuration values
INFO[0012] No cluster configuration value changes necessary for manifest /var/lib/rancher/rke2/server/manifests/rancher/rke2-etcd-snapshot-extra-metadata.yaml
INFO[0012] No cluster configuration value changes necessary for manifest /var/lib/rancher/rke2/server/manifests/rancher/cluster-agent.yaml
INFO[0012] No cluster configuration value changes necessary for manifest /var/lib/rancher/rke2/server/manifests/rancher/managed-chart-config.yaml
INFO[0012] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-runtimeclasses.yaml to set cluster configuration values
INFO[0012] Updated manifest /var/lib/rancher/rke2/server/manifests/harvester-cloud-provider.yaml to set cluster configuration values
INFO[0012] No cluster configuration value changes necessary for manifest /var/lib/rancher/rke2/server/manifests/rancher/addons.yaml
INFO[0012] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-canal.yaml to set cluster configuration values
INFO[0012] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-coredns.yaml to set cluster configuration values
INFO[0012] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-ingress-nginx.yaml to set cluster configuration values
INFO[0012] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-multus.yaml to set cluster configuration values
INFO[0012] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-snapshot-controller.yaml to set cluster configuration values
INFO[0012] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-calico-crd.yaml to set cluster configuration values
INFO[0012] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-calico.yaml to set cluster configuration values
INFO[0012] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-traefik-crd.yaml to set cluster configuration values
INFO[0012] Updated manifest /var/lib/rancher/rke2/server/manifests/rancher-vsphere-cpi.yaml to set cluster configuration values
INFO[0012] Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-snapshot-controller-crd.yaml to set cluster configuration values
INFO[0012] Logging containerd to /var/lib/rancher/rke2/agent/containerd/containerd.log
INFO[0012] Running containerd -c /var/lib/rancher/rke2/agent/etc/containerd/config.toml
INFO[0013] containerd is now running
INFO[0013] Pulling images from /var/lib/rancher/rke2/agent/images/cloud-controller-manager-image.txt
INFO[0013] Pulling image index.docker.io/rancher/rke2-cloud-provider:v1.32.0-rc3.0.20241220224140-68fbd1a6b543-build20250101
WARN[0015] Failed to get apiserver address from etcd: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused"
WARN[0020] Failed to get apiserver address from etcd: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused"
INFO[0021] Polling for API server readiness: GET /readyz failed: Get "https://127.0.0.1:6443/readyz?timeout=15s&verbose=": EOF
WARN[0025] Failed to get apiserver address from etcd: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused"
ERRO[0029] Error encountered while importing /var/lib/rancher/rke2/agent/images/cloud-controller-manager-image.txt: failed to pull images from /var/lib/rancher/rke2/agent/images/cloud-controller-manager-image.txt: image "index.docker.io/rancher/rke2-cloud-provider:v1.32.0-rc3.0.20241220224140-68fbd1a6b543-build20250101": not found
INFO[0029] Pulling images from /var/lib/rancher/rke2/agent/images/etcd-image.txt
INFO[0029] Pulling image index.docker.io/rancher/hardened-etcd:v3.5.21-k3s1-build20250411
WARN[0030] Failed to get apiserver address from etcd: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused"
WARN[0035] Failed to get apiserver address from etcd: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused"
ERRO[0036] Error encountered while importing /var/lib/rancher/rke2/agent/images/etcd-image.txt: failed to pull images from /var/lib/rancher/rke2/agent/images/etcd-image.txt: image "index.docker.io/rancher/hardened-etcd:v3.5.21-k3s1-build20250411": not found