# harvester

flat-finland-50817

03/02/2023, 10:06 AM
Hello, I'm trying to provision a single-node RKE2 cluster with Rancher on a bare-metal Harvester, but I've tried several times and I always get stuck at
waiting for cluster agent to connect
I was able to log into the VM; rke2-server and rancher-system-agent are both running fine, and I can curl the Rancher HTTPS endpoint (I'm using self-signed certificates). What am I missing, and any idea how I can debug this?
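(For reference, a quick way to double-check the node-side services in this situation; this assumes a default RKE2 install with the standard systemd units:)
```
# On the guest VM (default RKE2 install paths assumed):
systemctl status rke2-server rancher-system-agent

# Follow the system agent's logs to see whether it is actually reaching Rancher:
journalctl -u rancher-system-agent -f
```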

agreeable-oil-87482

03/02/2023, 4:28 PM
SSH to the node and run:
```
export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
# replace <cluster-agent-pod-name> with the actual pod name; the agent runs in cattle-system
/var/lib/rancher/rke2/bin/kubectl -n cattle-system logs <cluster-agent-pod-name>
```
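(If the pod name isn't handy, it can be looked up by label; `app=cattle-cluster-agent` is the label the Rancher agent deployment normally carries:)
```
/var/lib/rancher/rke2/bin/kubectl -n cattle-system get pods -l app=cattle-cluster-agent
```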

flat-finland-50817

03/03/2023, 8:56 AM
On a single-node cluster, the cluster just doesn't seem to start at all, with either RKE2 or k3s:
```
The connection to the server 127.0.0.1:6443 was refused - did you specify the right host or port?
```
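(When the apiserver refuses connections like this, one option, assuming a default RKE2 layout on the node, is to check the server journal, the kubelet log, and the control-plane containers via RKE2's bundled containerd:)
```
journalctl -u rke2-server --no-pager | tail -n 50
tail -n 50 /var/lib/rancher/rke2/agent/logs/kubelet.log

# List all containers through RKE2's embedded containerd socket:
/var/lib/rancher/rke2/bin/crictl --runtime-endpoint unix:///run/k3s/containerd/containerd.sock ps -a
```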
However, when I try a cluster that separates the control plane from the worker (on RKE2), I get this:
```
root@test-typhlos-pool1-3b5c2018-4s9f9:~# kubectl get pods -A
NAMESPACE         NAME                                                        READY   STATUS      RESTARTS       AGE
calico-system     calico-kube-controllers-f75c97ff6-42pp9                     0/1     Pending     0              18h
calico-system     calico-node-8zb8q                                           0/1     Running     6 (41m ago)    18h
calico-system     calico-typha-5b7ff966b-6m5mc                                0/1     Pending     0              18h
cattle-system     cattle-cluster-agent-68cd6b9759-zc477                       0/1     Pending     0              18h
kube-system       etcd-test-pool1-3b5c2018-4s9f9                      1/1     Running     11 (59m ago)   18h
kube-system       harvester-cloud-provider-568dd85c97-w67hj                   1/1     Running     51 (26m ago)   18h
kube-system       harvester-csi-driver-controllers-779c557d47-fvxsg           0/3     Pending     0              18h
kube-system       harvester-csi-driver-controllers-779c557d47-p6prq           0/3     Pending     0              18h
kube-system       harvester-csi-driver-controllers-779c557d47-pd9l4           0/3     Pending     0              18h
kube-system       helm-install-harvester-cloud-provider-swqbf                 0/1     Completed   0              18h
kube-system       helm-install-harvester-csi-driver-cz2jg                     0/1     Completed   0              18h
kube-system       helm-install-rke2-calico-8rr9m                              0/1     Completed   4              18h
kube-system       helm-install-rke2-calico-crd-xs28w                          0/1     Completed   0              18h
kube-system       helm-install-rke2-coredns-rnpxk                             0/1     Completed   0              18h
kube-system       helm-install-rke2-ingress-nginx-st8x6                       0/1     Pending     0              18h
kube-system       helm-install-rke2-metrics-server-wtzd2                      0/1     Pending     0              18h
kube-system       kube-apiserver-test-pool1-3b5c2018-4s9f9            1/1     Running     14 (59m ago)   18h
kube-system       kube-controller-manager-test-pool1-3b5c2018-4s9f9   1/1     Running     81 (26m ago)   18h
kube-system       kube-proxy-test-pool1-3b5c2018-4s9f9                1/1     Running     11 (59m ago)   18h
kube-system       kube-scheduler-test-pool1-3b5c2018-4s9f9            1/1     Running     69 (26m ago)   18h
kube-system       rke2-coredns-rke2-coredns-58fd75f64b-r89f6                  0/1     Pending     0              18h
kube-system       rke2-coredns-rke2-coredns-autoscaler-768bfc5985-tzs7v       0/1     Pending     0              18h
tigera-operator   tigera-operator-586758ccf7-c7m88                            1/1     Running     50 (26m ago)   18h
```
The cluster agent just doesn't seem to start. I don't know where the problem is coming from; I'm checking the logs of the other running pods to see if I can find any errors.
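(Since cattle-cluster-agent is Pending rather than crashing, the scheduler's events usually say why; a quick check, using the pod name from the listing above:)
```
# The Events section at the end usually names the scheduling blocker
kubectl -n cattle-system describe pod cattle-cluster-agent-68cd6b9759-zc477 | tail -n 20
```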

agreeable-oil-87482

03/03/2023, 8:59 AM
Grab the logs from
harvester-cloud-provider-568dd85c97-w67hj
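(On the node, that would look something like this, again using the bundled kubectl; the pod is in kube-system per the listing above:)
```
/var/lib/rancher/rke2/bin/kubectl -n kube-system logs harvester-cloud-provider-568dd85c97-w67hj
```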

flat-finland-50817

03/03/2023, 9:06 AM
```
I0303 09:26:16.522816       1 node_controller.go:390] Initializing node test-pool1-3b5c2018-4s9f9 with cloud provider
E0303 09:26:46.523909       1 node_controller.go:212] error syncing 'test-pool1-3b5c2018-4s9f9': failed to get instance metadata for node test-pool1-3b5c2018-4s9f9: Get "https://192.168.10.10:6443/apis/kubevirt.io/v1/namespaces/default/virtualmachines/test-pool1-3b5c2018-4s9f9": dial tcp 192.168.10.10:6443: i/o timeout, requeuing
I0303 09:28:59.648924       1 node_controller.go:390] Initializing node test-pool1-3b5c2018-4s9f9 with cloud provider
E0303 09:29:29.650546       1 node_controller.go:212] error syncing 'test-pool1-3b5c2018-4s9f9': failed to get instance metadata for node test-pool1-3b5c2018-4s9f9: Get "https://192.168.10.10:6443/apis/kubevirt.io/v1/namespaces/default/virtualmachines/test-pool1-3b5c2018-4s9f9": dial tcp 192.168.10.10:6443: i/o timeout, requeuing
I0303 09:29:30.364131       1 node_controller.go:390] Initializing node test-pool1-3b5c2018-4s9f9 with cloud provider
E0303 09:30:00.364647       1 node_controller.go:212] error syncing 'test-pool1-3b5c2018-4s9f9': failed to get instance metadata for node test-pool1-3b5c2018-4s9f9: Get "https://192.168.10.10:6443/apis/kubevirt.io/v1/namespaces/default/virtualmachines/test-pool1-3b5c2018-4s9f9": dial tcp 192.168.10.10:6443: i/o timeout, requeuing
E0303 09:30:05.560254       1 node_controller.go:241] Error getting instance metadata for node addresses: Get "https://192.168.10.10:6443/apis/kubevirt.io/v1/namespaces/default/virtualmachines/test-pool1-3b5c2018-4s9f9": dial tcp 192.168.10.10:6443: i/o timeout
E0303 09:30:53.948634       1 leaderelection.go:361] Failed to update lock: Put "https://10.43.0.1:443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/cloud-controller-manager?timeout=5s": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
I0303 09:34:06.505646       1 node_controller.go:390] Initializing node test-pool1-3b5c2018-4s9f9 with cloud provider
E0303 09:34:36.507156       1 node_controller.go:212] error syncing 'test-pool1-3b5c2018-4s9f9': failed to get instance metadata for node test-pool1-3b5c2018-4s9f9: Get "https://192.168.10.10:6443/apis/kubevirt.io/v1/namespaces/default/virtualmachines/test-pool1-3b5c2018-4s9f9": dial tcp 192.168.10.10:6443: i/o timeout, requeuing
E0303 09:35:35.565342       1 node_controller.go:241] Error getting instance metadata for node addresses: Get "https://192.168.10.10:6443/apis/kubevirt.io/v1/namespaces/default/virtualmachines/test-pool1-3b5c2018-4s9f9": dial tcp 192.168.10.10:6443: i/o timeout
I0303 09:39:13.330073       1 node_controller.go:390] Initializing node test-pool1-3b5c2018-4s9f9 with cloud provider
E0303 09:39:43.331896       1 node_controller.go:212] error syncing 'test-pool1-3b5c2018-4s9f9': failed to get instance metadata for node test-pool1-3b5c2018-4s9f9: Get "https://192.168.10.10:6443/apis/kubevirt.io/v1/namespaces/default/virtualmachines/test-pool1-3b5c2018-4s9f9": dial tcp 192.168.10.10:6443: i/o timeout, requeuing
I0303 09:40:55.725850       1 node_controller.go:390] Initializing node test-pool1-3b5c2018-4s9f9 with cloud provider
E0303 09:41:05.569251       1 node_controller.go:241] Error getting instance metadata for node addresses: Get "https://192.168.10.10:6443/apis/kubevirt.io/v1/namespaces/default/virtualmachines/test-pool1-3b5c2018-4s9f9": dial tcp 192.168.10.10:6443: i/o timeout
E0303 09:41:25.727107       1 node_controller.go:212] error syncing 'test-pool1-3b5c2018-4s9f9': failed to get instance metadata for node test-pool1-3b5c2018-4s9f9: Get "https://192.168.10.10:6443/apis/kubevirt.io/v1/namespaces/default/virtualmachines/test-pool1-3b5c2018-4s9f9": dial tcp 192.168.10.10:6443: i/o timeout, requeuing
I0303 09:44:20.067785       1 node_controller.go:390] Initializing node test-pool1-3b5c2018-4s9f9 with cloud provider
E0303 09:44:50.068591       1 node_controller.go:212] error syncing 'test-pool1-3b5c2018-4s9f9': failed to get instance metadata for node test-pool1-3b5c2018-4s9f9: Get "https://192.168.10.10:6443/apis/kubevirt.io/v1/namespaces/default/virtualmachines/test-pool1-3b5c2018-4s9f9": dial tcp 192.168.10.10:6443: i/o timeout, requeuing
E0303 09:46:35.577583       1 node_controller.go:241] Error getting instance metadata for node addresses: Get "https://192.168.10.10:6443/apis/kubevirt.io/v1/namespaces/default/virtualmachines/test-pool1-3b5c2018-4s9f9": dial tcp 192.168.10.10:6443: i/o timeout
```
I guess the node is not able to reach Harvester correctly? I thought it only needed to contact the managing Rancher.
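(A quick way to confirm that theory from inside the guest node, assuming curl is available, is to probe the Harvester endpoint the cloud provider is trying to reach:)
```
# 192.168.10.10:6443 is the endpoint from the logs above; even a TLS or
# HTTP error would prove reachability, whereas the logs show i/o timeouts
curl -vk --max-time 5 https://192.168.10.10:6443/version
```
If that also times out, the guest VM's network can't reach the Harvester management endpoint at all, which the harvester-cloud-provider needs in addition to Rancher, judging by the kubevirt.io API calls in the logs above.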