
stocky-article-82001

02/09/2023, 12:43 PM
Hi all, I’m adding a node to an existing RKE2 cluster on Rancher but I’m getting the following error in the rke2-server logs and I’m at a loss.
"Waiting to retrieve kube-proxy configuration; server is not ready: <https://127.0.0.1:9345/v1-rke2/readyz>: 500 Internal Server Error"
{"level":"warn","ts":"2023-02-09T07:42:48.269-0500","logger":"etcd-client","caller":"v3@v3.5.4-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"<etcd-endpoints://0xc0005e3a40/127.0.0.1:2379>","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}

eager-london-83975

02/09/2023, 12:54 PM
Are the logs coming from the node you are trying to add, or from the master?

stocky-article-82001

02/09/2023, 12:54 PM
The node I’m trying to add.
{"level":"warn","ts":"2023-02-09T12:33:59.607Z","logger":"etcd-client","caller":"v3@v3.5.4-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"<etcd-endpoints://0xc001504000/127.0.0.1:2379>","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing dial tcp 172.30.12.3:2379: connect: no route to host\""}
Feb 09 12:33:59 <MASTER HOSTNAME> rke2[13805]: time="2023-02-09T12:33:59Z" level=warning msg="Learner <HOSTNAME OF NEW NODE>-9ad37932 stalled at RaftAppliedIndex=0 for 5m0.607218804s"
Feb 09 12:33:59 <MASTER HOSTNAME> rke2[13805]: time="2023-02-09T12:33:59Z" level=warning msg="Removed learner <HOSTNAME OF NEW NODE>-9ad37932 from etcd cluster"
These are some logs from the master, which is weird.

eager-london-83975

02/09/2023, 12:55 PM
Your network is not routed correctly.
Your master nodes cannot communicate with each other.

stocky-article-82001

02/09/2023, 12:57 PM
That’s bizarre, this new one has the exact same config as the other 2 masters.

eager-london-83975

02/09/2023, 12:57 PM
transport: Error while dialing dial tcp 172.30.12.3:2379: connect: no route to host
This happens when the IP is not routed properly.
It's easy to miss, but check your security groups/firewall.
Also check that your subnet is routed properly, and that the nodes are actually in the same network/VPC.

stocky-article-82001

02/09/2023, 12:58 PM
Yeah ok, I’ll have a dig around. Thanks!

eager-london-83975

02/09/2023, 1:00 PM
Please don't forget to post back if you manage to solve it, it might help others!

stocky-article-82001

02/09/2023, 3:16 PM
The nodes can communicate (I’ve confirmed) but it is still not working.
I can curl :6443 on the master from the new node fine.

eager-london-83975

02/09/2023, 3:17 PM
But is 2379 open, i.e. is etcd running and reachable?
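A successful curl to :6443 only proves that one port is reachable. A rough sketch of probing the other ports between server nodes from the new node could look like the following. The port list reflects the usual RKE2 server-to-server ports (6443, 9345, 2379, 2380) and is an assumption about this cluster; the helper names `probe` and `probe_all` are made up for illustration. The "no route to host" case is singled out because, besides actual routing problems, a firewall that rejects traffic rather than dropping it can also produce that error.

```python
import errno
import socket

# Ports RKE2 server nodes commonly need open between each other.
# This list is an assumption; adjust it to match your own setup.
SERVER_PORTS = {
    6443: "kube-apiserver",
    9345: "rke2 supervisor",
    2379: "etcd client",
    2380: "etcd peer",
}


def probe(host, port, timeout=3.0):
    """Attempt a TCP connect and return a short status string."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No answer at all: packets are probably being dropped on the way.
        return "timeout (packets dropped?)"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port.
        return "refused (host reachable, nothing listening)"
    except OSError as exc:
        if exc.errno in (errno.EHOSTUNREACH, errno.ENETUNREACH):
            return "no route to host (routing or a rejecting firewall)"
        return f"error: {exc}"


def probe_all(host, ports=SERVER_PORTS, timeout=3.0):
    """Probe every port in `ports` and return {port: status}."""
    return {port: probe(host, port, timeout) for port in ports}
```

Running something like `probe_all("172.30.12.3")` from the new node (and the reverse from an existing master) would show whether 2379 and 2380 behave differently from 6443.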

stocky-article-82001

02/09/2023, 3:18 PM
Hmm, seems I’ve spoken too soon. It is working now, although there were some 500 errors at the start.
It has joined the cluster successfully now; let me wait for it to fully reconcile.
Yeah it still seems to be fucking up with networking.
~~~scratch that~~~

eager-london-83975

02/09/2023, 5:31 PM
So it's working well now?

stocky-article-82001

02/09/2023, 5:36 PM
No, we’ve found some networking issues that we’re currently working through.

eager-london-83975

02/09/2023, 5:36 PM
Is it on-premises, or which cloud are you using for your network and nodes?