# rke2
b
do you have statically allocated IP addresses for each node?
usually, when the IP changes, etcd still has the old IP and keeps trying to connect to it.. and when etcd is down, kubectl is down as well
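(A quick sanity check for that, sketched on the assumption that node-ip is set in /etc/rancher/rke2/config.yaml; if it isn't, RKE2 auto-detects the address and the grep will simply return nothing:)
ip -4 -brief addr show                                                # addresses the node actually has
grep -E 'node-ip|advertise-address' /etc/rancher/rke2/config.yaml     # address RKE2 was told to use, if set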
s
All IPs are static and have not changed.
b
it would be useful to have more logs, e.g.:
journalctl -u rke2-server.service --no-pager -n 1000
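(To capture the output to files for attaching, a minimal sketch, assuming server nodes run rke2-server and agent nodes run rke2-agent:)
journalctl -u rke2-server.service --no-pager -n 1000 > rke2-server.log
journalctl -u rke2-agent.service --no-pager -n 1000 > rke2-agent.log    # on agent-only nodes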
s
attached the journalctl logs
b
i would suggest restarting the node and then collecting the journalctl logs again
PS. I’m not part of the Rancher team or an expert in RKE2; i’m just trying to help with the experience I have from running our own RKE2 clusters
s
logs after restarting node
I have three nodes: 172.16.1.11, 172.16.1.12, and 172.16.1.13. I did try restarting all three yesterday, but it did not resolve the issue
b
could you please dump the journalctl again? i’m not able to find any error ("level":"error") logs
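(One way to pull out just those entries, assuming the embedded components log JSON lines to the journal:)
journalctl -u rke2-server.service --no-pager -n 1000 | grep '"level":"error"'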
Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:9345/v1-rke2/readyz: 500 Internal Server Error
is expected while rke2 is starting..
I would debug why etcd is not up and running
etcd-endpoints://0xc002dbe700/127.0.0.1:2379 transport: authentication handshake failed: context deadline exceeded
using these commands: https://gist.github.com/superseb/3b78f47989e0dbc1295486c186e944bf#etcd
also, i would look at the other log files described in https://gist.github.com/superseb/3b78f47989e0dbc1295486c186e944bf#logging
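(The usual RKE2 locations for those extra logs, sketched here as tail commands; exact paths may vary slightly by version:)
tail -n 100 /var/lib/rancher/rke2/agent/containerd/containerd.log    # containerd runtime log
tail -n 100 /var/lib/rancher/rke2/agent/logs/kubelet.log             # kubelet log
ls /var/log/pods/                                                    # per-pod logs, including the etcd static pod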
s
Since I am unable to run kubectl commands, this may not be easy.
/var/lib/rancher/rke2/bin/kubectl --kubeconfig /etc/rancher/rke2/rke2.yaml get nodes
The connection to the server 127.0.0.1:6443 was refused - did you specify the right host or port?
Thanks for the suggestions though.
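(Since kubectl only talks to the apiserver on 127.0.0.1:6443, a quick check that anything is listening on the control-plane ports at all, assuming iproute2's ss is available:)
ss -tlnp | grep -E ':(6443|9345|2379)'    # kube-apiserver, rke2 supervisor, etcd client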
b
you might also try these commands (https://gist.github.com/superseb/3b78f47989e0dbc1295486c186e944bf#on-the-etcd-host-itself)
export CRI_CONFIG_FILE=/var/lib/rancher/rke2/agent/etc/crictl.yaml
etcdcontainer=$(/var/lib/rancher/rke2/bin/crictl ps --label io.kubernetes.container.name=etcd --quiet)
/var/lib/rancher/rke2/bin/crictl exec $etcdcontainer sh -c "ETCDCTL_ENDPOINTS='https://127.0.0.1:2379' ETCDCTL_CACERT='/var/lib/rancher/rke2/server/tls/etcd/server-ca.crt' ETCDCTL_CERT='/var/lib/rancher/rke2/server/tls/etcd/server-client.crt' ETCDCTL_KEY='/var/lib/rancher/rke2/server/tls/etcd/server-client.key' ETCDCTL_API=3 etcdctl endpoint status --cluster --write-out=table"
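(If the etcd container is actually up, the same wrapper can be reused for a quick health and membership check, a sketch with the same cert paths as above:)
/var/lib/rancher/rke2/bin/crictl exec $etcdcontainer sh -c "ETCDCTL_ENDPOINTS='https://127.0.0.1:2379' ETCDCTL_CACERT='/var/lib/rancher/rke2/server/tls/etcd/server-ca.crt' ETCDCTL_CERT='/var/lib/rancher/rke2/server/tls/etcd/server-client.crt' ETCDCTL_KEY='/var/lib/rancher/rke2/server/tls/etcd/server-client.key' ETCDCTL_API=3 etcdctl endpoint health"
/var/lib/rancher/rke2/bin/crictl exec $etcdcontainer sh -c "ETCDCTL_ENDPOINTS='https://127.0.0.1:2379' ETCDCTL_CACERT='/var/lib/rancher/rke2/server/tls/etcd/server-ca.crt' ETCDCTL_CERT='/var/lib/rancher/rke2/server/tls/etcd/server-client.crt' ETCDCTL_KEY='/var/lib/rancher/rke2/server/tls/etcd/server-client.key' ETCDCTL_API=3 etcdctl member list --write-out=table"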
s
Figured out what caused the failure and resolved it for now. It seems installing Docker on the host machine breaks RKE2 networking somehow. Filed a bug: https://github.com/rancher/rke2/issues/4472
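(Whether that is the exact mechanism behind the linked issue isn't confirmed here, but a common conflict is Docker's own iptables setup; a quick way to spot it on an affected host:)
systemctl is-active docker
iptables -S FORWARD | head -n 1     # Docker sets the FORWARD chain policy to DROP by default
iptables-save | grep -c DOCKER      # count of Docker-managed chains/rules present
ip link show docker0                # Docker's default bridge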