
broad-farmer-70498

10/10/2022, 9:56 PM
I'm trying to restore a node that is a control plane node (I rebuilt the node and copied all the data/etc back into place), but it's not really starting up. The logs keep saying, effectively, that etcd hasn't started, but I'm not really sure where to look for further debugging. Any tips?

creamy-pencil-82913

10/10/2022, 10:48 PM
have you tried the etcd pod logs?
Also, if you’re restoring from a backup and the node name or IP has changed, you would need to do a --cluster-reset, possibly with --cluster-reset-restore to restore from a datastore snapshot.
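
For reference, a rough sketch of both suggestions on an RKE2 control-plane node; the binary/socket/config paths and the exact restore flag are assumptions about a default install rather than anything confirmed in the thread, so verify them against your version:
# Check the etcd static pod's container logs with RKE2's bundled crictl
export CRI_CONFIG_FILE=/var/lib/rancher/rke2/agent/etc/crictl.yaml
/var/lib/rancher/rke2/bin/crictl ps -a | grep etcd
/var/lib/rancher/rke2/bin/crictl logs <etcd-container-id>
# The same container logs also land on disk via the kubelet
ls /var/log/pods/kube-system_etcd-*/
# If the node name or IP changed, reset cluster membership, optionally restoring a snapshot
systemctl stop rke2-server
rke2 server --cluster-reset --cluster-reset-restore-path=<path-to-snapshot>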

broad-farmer-70498

10/10/2022, 10:53 PM
I didn’t have a pod, but the issue has been discovered
The new node is also a new OS and I had to change the cgroup driver to systemd
I’m not sure how that could be logged better, but in short it wasn’t able to start pods at a fundamental level, it seems

creamy-pencil-82913

10/11/2022, 12:08 AM
That should be autodetected based on the running OS configuration. Had you customized the containerd config template or something?
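
For anyone following along: RKE2 regenerates its containerd config at startup and only honours a user-supplied template if one exists. A quick way to check, with paths that are assumptions about a default data-dir:
# Generated containerd config (rewritten on each start)
cat /var/lib/rancher/rke2/agent/etc/containerd/config.toml
# A custom template, if present, would live alongside it
ls /var/lib/rancher/rke2/agent/etc/containerd/config.toml.tmpl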

broad-farmer-70498

10/11/2022, 12:09 AM
Nope
I’m running 22.04 though, not sure if that’s supported

creamy-pencil-82913

10/11/2022, 12:09 AM
where did you need to change the cgroup driver then?
That’s not directly configurable anywhere in RKE2, unless you’re overriding kubelet args or something

broad-farmer-70498

10/11/2022, 12:11 AM
Well, the previous nodes were Frankenstein CentOS 7 nodes with current kernels
So I think I had to set that as a custom kubelet arg, yes
Which is what I had to change

creamy-pencil-82913

10/11/2022, 12:39 AM
ah yeah. If you get out in the weeds with component args then you’re on the hook for maintaining that. We try to autodetect as much as we can.

broad-farmer-70498

10/11/2022, 12:40 AM
Yeah, I wasn’t in the weeds previously, but admittedly CentOS 7 with a fresh kernel isn’t exactly normal either
But that didn’t work without intervention, as I recall
The tough thing is, if containers are flat-out failing to start, it would be nice to have some feedback if possible

creamy-pencil-82913

10/11/2022, 1:14 AM
It was probably in the containerd log somewhere?

broad-farmer-70498

10/11/2022, 1:17 AM
Does that not go to the journald log, or just to a flat file?

creamy-pencil-82913

10/11/2022, 2:45 AM
no, containerd has its own log file
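
On an RKE2 node the containerd log normally sits under the agent directory; the path below is an assumption about a default data-dir:
tail -f /var/lib/rancher/rke2/agent/containerd/containerd.log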

broad-farmer-70498

10/11/2022, 3:00 AM
ok, it was probably in there then
I see some stuff in the kubelet logs actually as well
E1010 16:08:08.562192   22449 remote_runtime.go:201] "RunPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: expected cgroupsPath to be of format \"slice:prefix:name\" for systemd cgroups, got \"/kubepods/burstable/pod2bfb17ee41d495f34ce04b7101a882c6/77912b0c5a67ab7443d50a97cf1d84e00b862559989ca10bbef398f9e017fc3f\" instead: unknown"
E1010 16:08:08.562232   22449 kuberuntime_sandbox.go:70] "Failed to create sandbox for pod" err="rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: expected cgroupsPath to be of format \"slice:prefix:name\" for systemd cgroups, got \"/kubepods/burstable/pod2bfb17ee41d495f34ce04b7101a882c6/77912b0c5a67ab7443d50a97cf1d84e00b862559989ca10bbef398f9e017fc3f\" instead: unknown" pod="kube-system/etcd-172.26.64.22"
E1010 16:08:08.562256   22449 kuberuntime_manager.go:815] "CreatePodSandbox for pod failed" err="rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: expected cgroupsPath to be of format \"slice:prefix:name\" for systemd cgroups, got \"/kubepods/burstable/pod2bfb17ee41d495f34ce04b7101a882c6/77912b0c5a67ab7443d50a97cf1d84e00b862559989ca10bbef398f9e017fc3f\" instead: unknown" pod="kube-system/etcd-172.26.64.22"
E1010 16:08:08.562310   22449 pod_workers.go:951] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"etcd-172.26.64.22_kube-system(2bfb17ee41d495f34ce04b7101a882c6)\" with CreatePodSandboxError: \"Failed to create sandbox for pod \\\"etcd-172.26.64.22_kube-system(2bfb17ee41d495f34ce04b7101a882c6)\\\": rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: expected cgroupsPath to be of format \\\"slice:prefix:name\\\" for systemd cgroups, got \\\"/kubepods/burstable/pod2bfb17ee41d495f34ce04b7101a882c6/77912b0c5a67ab7443d50a97cf1d84e00b862559989ca10bbef398f9e017fc3f\\\" instead: unknown\"" pod="kube-system/etcd-172.26.64.22" podUID=2bfb17ee41d495f34ce04b7101a882c6
kubelet-arg:
- --make-iptables-util-chains=false
- --log-file-max-size=20
#- --cgroup-driver=cgroupfs
- --cgroup-driver=systemd
- --max-pods=220
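
The runc error above is the classic driver mismatch: containerd's runc is in systemd-cgroup mode (it expects a "slice:prefix:name" path), while a kubelet still forced to cgroupfs by the old arg hands it a plain /kubepods/... path. A few quick checks, with paths that are assumptions about a default RKE2 install:
stat -fc %T /sys/fs/cgroup   # "cgroup2fs" means the unified hierarchy, where the systemd driver is expected
grep SystemdCgroup /var/lib/rancher/rke2/agent/etc/containerd/config.toml
pgrep -af kubelet | tr ' ' '\n' | grep cgroup-driver   # what the kubelet was actually started with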

creamy-pencil-82913

10/11/2022, 4:01 AM
K3s already sets that based on the detected cgroup driver. You’re not intended to set it yourself.
Just delete both lines
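
That is, the kubelet-arg block trimmed to something like this (a sketch, assuming the snippet lives in /etc/rancher/rke2/config.yaml), leaving the cgroup driver to be detected:
kubelet-arg:
- --make-iptables-util-chains=false
- --log-file-max-size=20
- --max-pods=220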

broad-farmer-70498

10/11/2022, 1:36 PM
As I say, it didn’t work for me and I had to set it explicitly.
But admittedly the nodes I was using were an oddity. I’ll try removing it altogether for the 22.04 nodes and see how it goes.