# general
s
Here are the provisioning logs for one of the rke2 clusters. If everything else succeeds, what would stop the cluster agent from connecting?
[INFO ] waiting for infrastructure ready
[INFO ] waiting for at least one control plane, etcd, and worker node to be registered
[INFO ] waiting for viable init node
[INFO ] configuring bootstrap node(s) test-rke-pool1-7f995b8df9-xvm7d: waiting for agent to check in and apply initial plan
[INFO ] configuring bootstrap node(s) test-rke-pool1-7f995b8df9-xvm7d: waiting for probes: calico, etcd, kube-apiserver, kube-controller-manager, kube-scheduler, kubelet
[INFO ] configuring bootstrap node(s) test-rke-pool1-7f995b8df9-xvm7d: waiting for probes: calico, etcd, kube-apiserver, kube-controller-manager, kube-scheduler
[INFO ] configuring bootstrap node(s) test-rke-pool1-7f995b8df9-xvm7d: waiting for probes: calico, kube-apiserver, kube-controller-manager, kube-scheduler
[INFO ] configuring bootstrap node(s) test-rke-pool1-7f995b8df9-xvm7d: waiting for probes: calico, kube-controller-manager, kube-scheduler
[INFO ] configuring bootstrap node(s) test-rke-pool1-7f995b8df9-xvm7d: waiting for probes: calico
[INFO ] configuring bootstrap node(s) test-rke-pool1-7f995b8df9-xvm7d: waiting for cluster agent to connect
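One way to narrow that down is to look at the cattle-cluster-agent pod from the downstream cluster itself, since Rancher's own kubeconfig won't work until the agent connects. A minimal sketch, assuming RKE2's default kubeconfig and binary paths and the usual app=cattle-cluster-agent label:
# Run on an rke2 server node; use the local admin kubeconfig written by rke2
export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
export PATH=$PATH:/var/lib/rancher/rke2/bin

# Is the cluster agent scheduled and running?
kubectl -n cattle-system get pods

# If not, its events and logs usually say why
kubectl -n cattle-system describe pod -l app=cattle-cluster-agent
kubectl -n cattle-system logs -l app=cattle-cluster-agent --tail=50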
I seem to be having issues with the tunnel and/or websocket connections between downstream nodes and the Rancher Server.
Running journalctl -u rke2-server on one of the nodes shows the following logs. Seems the tunnel between the node and Rancher isn’t holding?
level=info msg="Stopped tunnel to 127.0.0.1:9345"
level=info msg="Proxy done" err="context canceled" url="<wss://127.0.0.1:9345/v1-rke2/connect>"
level=info msg="Connecting to proxy" url="<wss://10.3.3.126:9345/v1-rke2/connect>"
level=info msg="error in remotedialer server [400]: websocket: close 1006 (abnormal closure): unexpected EOF"
I see a similar error in the Rancher Server docker container logs.
[INFO] error in remotedialer server [400]: websocket: close 1006 (abnormal closure): unexpected EOF
along with lots of errors that end with
"unable to decode an event from the watch stream: tunnel disconnect"
c
those messages on the rke2 side are normal.
it has nothing to do with a tunnel to rancher, those are about internal RKE2 tunnels between cluster nodes. The rancher stuff all runs in pods, you won’t find anything about rancher in the rke2 service logs.
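If you want to sanity-check those internal tunnels anyway, a simple reachability test of the rke2 supervisor port (9345) between nodes is usually enough; a quick sketch using the server IP from the log above:
# From another node in the downstream cluster: is the supervisor port reachable?
nc -zv 10.3.3.126 9345

# Watch the server side for repeated tunnel reconnects
journalctl -u rke2-server -f | grep -i remotedialer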
s
Gotcha. Thank you, I’ll keep looking.
I’m seeing the cattle-cluster-agent pod is in a Pending state on the node.
Warning  FailedScheduling  18m (x10 over 68m)  default-scheduler  0/1 nodes are available: 1 node(s) had untolerated taint {node.cloudprovider.kubernetes.io/uninitialized: true}. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling..
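That node.cloudprovider.kubernetes.io/uninitialized taint is placed on nodes that start with an external cloud provider and is only removed once a cloud controller manager initializes the node, which is why nothing can schedule. A short sketch to confirm (node name taken from the provisioning log above, label selector is an assumption):
# The taint should still be listed here until the cloud provider initializes the node
kubectl describe node test-rke-pool1-7f995b8df9-xvm7d | grep -A3 Taints

# The cluster agent does not tolerate that taint, hence Pending
kubectl -n cattle-system describe pod -l app=cattle-cluster-agent | grep -A5 Tolerations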
Could this be an issue with Harvester? It is my cloud provider in this case.
c
well it’s certainly an issue with the cloud provider. Have you looked at the cloud provider pod logs to see why it’s not working?
the pod on the downstream cluster, not on the harvester side
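For the Harvester cloud provider that usually means something like the following on the downstream cluster (the namespace and deployment name here are assumptions; adjust to whatever is actually deployed):
# Find the cloud provider pod running in the downstream cluster
kubectl -n kube-system get pods | grep -i cloud

# Its logs should show why node initialization fails, e.g. errors reaching the Harvester API
kubectl -n kube-system logs deploy/harvester-cloud-provider --tail=100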
s
Ahhh, ok. The issue is clear. I didn’t realize the downstream nodes needed to be able to reach the Harvester network. I have the Harvester cluster and downstream nodes deployed in separate subnets with no routes between, so I see I need to rethink how I have things set up. Thank you for pointing me in the right direction!
c
yes, the cloud controller manager needs to actually be able to talk to the cloud provider
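A quick way to confirm that from a downstream node is to check whether the endpoint in the cloud provider's cloud-config/kubeconfig is reachable at all; a sketch with a placeholder address:
# <harvester-vip> is a placeholder for the API endpoint in the cloud provider's
# kubeconfig (typically the Harvester VIP); 6443 is the usual Kubernetes API port
nc -zv <harvester-vip> 6443

# Any HTTP(S) response at all means routing between the subnets works
curl -kv https://<harvester-vip>:6443/version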