
faint-airport-83518

07/08/2022, 8:23 PM
@creamy-pencil-82913 I see you handling issues on GitHub a lot. Is there any written guidance on debugging an unhealthy RKE2 installation, such as how to use ctr and crictl, or which directories logs get stored in? I was able to pull some decent info from this issue: https://github.com/rancher/rke2/issues/2080, but was wondering if Rancher provides anything else aside from the quickstart documentation (or if you had any notes handy 🙂)

creamy-pencil-82913

07/08/2022, 8:30 PM
unhealthy how? I usually just look at the service logs in journald, and the static pod logs in /var/log/pods. Everything you need will be in one of those two places.
oh, and the containerd log file too, where that exists
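A rough sketch of checking those three places from a shell on the node. The containerd log path and the `rke2-agent` unit name are assumptions based on a default RKE2 install and are not stated in this thread, and the helper function names are made up for illustration, so adjust them to your setup.

```shell
# Assumed default locations on a standard RKE2 node; override via environment if yours differ.
POD_LOG_DIR="${POD_LOG_DIR:-/var/log/pods}"
CONTAINERD_LOG="${CONTAINERD_LOG:-/var/lib/rancher/rke2/agent/containerd/containerd.log}"

# Print the last N journald lines for a unit (defaults to rke2-server;
# agent-only nodes would presumably use rke2-agent instead).
service_logs() {
    journalctl -u "${1:-rke2-server}" --no-pager -n "${2:-200}"
}

# List static pod log files whose path contains a fragment such as "etcd".
pod_log_files() {
    find "$POD_LOG_DIR" -type f -name '*.log' -path "*$1*" 2>/dev/null | sort
}

# Tail the containerd log, or report that it does not exist on this node.
containerd_log() {
    if [ -f "$CONTAINERD_LOG" ]; then
        tail -n "${1:-200}" "$CONTAINERD_LOG"
    else
        echo "no containerd log at $CONTAINERD_LOG" >&2
        return 1
    fi
}
```

After sourcing this, something like `service_logs rke2-server 100`, `pod_log_files etcd`, and `containerd_log 100` would cover the three locations mentioned above.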

faint-airport-83518

07/08/2022, 9:02 PM
I’m currently deploying in an environment with spotty egress, so sometimes when I spin up a new control plane node, the etcd connection times out (according to the journalctl output), for example. I’m just not sure where to start debugging that.

creamy-pencil-82913

07/08/2022, 9:11 PM
by egress, do you mean your connection to the internet?
the rke2-server logs will show timeouts connecting to etcd during initial startup until the etcd static pod starts up. If you have a spotty internet connection, it could be waiting for the etcd image to pull?
Have you tried dropping the airgap image tarballs on the nodes to make sure that all the images are available locally? or using a local registry mirror?
the containerd logs and crictl ps output will show you what’s going on with that, if it is indeed waiting for the etcd image.
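A minimal sketch of what that check could look like. The binary and config paths below are assumptions about where a default RKE2 install places its bundled crictl, the function names are hypothetical, and the grep pattern is only a guess at which containerd log lines relate to image pulls.

```shell
# Assumed default paths for the crictl binary and config that RKE2 ships; verify on your node.
RKE2_BIN="${RKE2_BIN:-/var/lib/rancher/rke2/bin}"
export CRI_CONFIG_FILE="${CRI_CONFIG_FILE:-/var/lib/rancher/rke2/agent/etc/crictl.yaml}"

# Show all containers whose name matches etcd, including exited ones.
etcd_containers() {
    "$RKE2_BIN/crictl" ps -a --name etcd
}

# Check whether an etcd image is already present in the local image store.
etcd_images() {
    "$RKE2_BIN/crictl" images | grep -i etcd
}

# Filter log text on stdin down to lines that look related to image pulls.
pull_lines() {
    grep -iE 'pull|image' | tail -n "${1:-50}"
}
```

If `etcd_containers` prints no containers and `etcd_images` finds nothing, piping the containerd log through `pull_lines` may show whether a pull is still in progress or failing.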

faint-airport-83518

07/09/2022, 4:35 PM
Thanks for the info, I'll try checking out the containerd and crictl stuff next time.