
broad-farmer-70498

03/16/2023, 9:50 PM
I'm trying to rebuild a node that was in an RKE2 cluster. I've got it rebuilt and rejoined, but it's in a NotReady state because
container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
but I have disabled deploying a CNI via RKE2 (manual install of Cilium). This is a chicken-and-egg situation, and I'm not sure how I haven't run into it before now. How can I get the kubelet to come up ready enough to get the CNI from a DaemonSet in the cluster?

creamy-pencil-82913

03/16/2023, 9:54 PM
How did you get it working the first time?
As you noted, the CNI installation needs to tolerate any NotReady taints the node has, since it will remain NotReady until after the CNI is up
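A DaemonSet toleration broad enough to cover the NotReady taint might look like this (an illustrative pod-spec fragment, not taken from any particular chart):

```yaml
# Illustrative DaemonSet pod spec fragment: tolerating every taint,
# including node.kubernetes.io/not-ready, lets the CNI pod be
# scheduled before the node reports Ready.
spec:
  template:
    spec:
      tolerations:
        - operator: Exists   # matches all taints, any key/effect
```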

broad-farmer-70498

03/16/2023, 9:55 PM
it just worked honestly, not sure how I didn't hit this tbh
lemme check the daemonset and see what it has on there

creamy-pencil-82913

03/16/2023, 9:56 PM
just out of curiosity why are you deploying your own Cilium instead of using
cni: cilium
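That option goes in the RKE2 config file; conversely, disabling the bundled CNI (the setup in this thread) is `cni: none`. A sketch of the relevant fragment:

```yaml
# /etc/rancher/rke2/config.yaml
# Option 1: let RKE2 deploy Cilium itself
cni: cilium
# Option 2: deploy no CNI and manage it manually (as in this thread)
# cni: none
```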

broad-farmer-70498

03/16/2023, 9:56 PM
we have always managed it manually going back to rke1 days (which is how this particular cluster started life)

creamy-pencil-82913

03/16/2023, 9:57 PM
You did an in-place conversion from RKE1 to RKE2?

broad-farmer-70498

03/16/2023, 9:57 PM
yes, not recently
this cluster was converted probably 6 or 7 months ago...just now needed to reinstall a specific node because reasons

creamy-pencil-82913

03/16/2023, 9:58 PM
ahh yikes. that’s still got a lot of rough edges to it, I wouldn’t personally have recommended that

broad-farmer-70498

03/16/2023, 9:59 PM
I'm well aware of the rough edges yes (I helped document/smooth a bunch of them) 😄
This appears to me to tolerate everything? (this is what the cilium chart tolerates)
tolerations:
  - operator: Exists

creamy-pencil-82913

03/16/2023, 10:19 PM
That should do it. Is there some other reason why the CNI pod isn’t coming up on that node?

broad-farmer-70498

03/16/2023, 10:20 PM
doh, thanks for the tip...tunnel vision at the moment
lemme take a look
Warning  FailedCreatePodSandBox  8m12s                   kubelet  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: expected cgroupsPath to be of format "slice:prefix:name" for systemd cgroups, got "/kubepods/burstable/pod7389c130-8e97-470f-81ca-aed24cccceab/8ab52a02eb8c112a60158ba4c5fd5c639798f57541d5abb2669ff48cce4451db" instead: unknown
  Warning  FailedCreatePodSandBox  4m50s (x16 over 7m57s)  kubelet  (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: expected cgroupsPath to be of format "slice:prefix:name" for systemd cgroups, got "/kubepods/burstable/pod7389c130-8e97-470f-81ca-aed24cccceab/e1f032e99b21cdef3f64d4cd304150ac78836900f9f3be84bb139c54c94bcdab" instead: unknown
hmm

creamy-pencil-82913

03/16/2023, 10:26 PM
That usually indicates that the kubelet and container runtime aren’t using the same cgroup driver
are you using the embedded containerd, or docker?

broad-farmer-70498

03/16/2023, 10:26 PM
embedded containerd

creamy-pencil-82913

03/16/2023, 10:28 PM
are you overriding the kubelet’s --cgroup-driver option for any reason, or providing your own containerd configuration with a config.toml.tmpl?
the kubelet and containerd config need to agree about what cgroup driver to use. If you leave it to the defaults then RKE2 keeps them in sync, but if you are overriding kubelet config or containerd config it’s possible that you’ve got one set to something different
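For reference, with the embedded containerd the runc cgroup driver lives in the generated containerd config (path per the standard RKE2 layout; a config.toml.tmpl replaces the generated file):

```toml
# /var/lib/rancher/rke2/agent/etc/containerd/config.toml
# When SystemdCgroup is true here, the kubelet must run with
# --cgroup-driver=systemd; RKE2's defaults keep the two in sync.
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = true
```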

broad-farmer-70498

03/16/2023, 10:30 PM
perhaps, I do have some config in the node ansible scripts to set some kernel args (but the other nodes match)
--cgroup-driver=cgroupfs
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = true
I'm assuming that's a mismatch?
I must be missing some config the other nodes have (or added something the others don't) 😞

creamy-pencil-82913

03/16/2023, 10:35 PM
yep that’s a mismatch
I would remove that kubelet arg
you want to be using the systemd cgroup driver, not cgroupfs
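The fix is to make both sides agree on systemd, e.g. (a sketch, assuming the override sits in the RKE2 config file that the ansible scripts manage):

```yaml
# /etc/rancher/rke2/config.yaml
# Either drop the override entirely and let RKE2 keep kubelet and
# containerd in sync, or set it to match SystemdCgroup = true:
kubelet-arg:
  - "cgroup-driver=systemd"   # was: cgroup-driver=cgroupfs
```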

broad-farmer-70498

03/16/2023, 10:35 PM
yup, I see where I f'd that up
@creamy-pencil-82913 thanks for the tips! I forgot I had mucked with that from the originals because of reasons 😞