
broad-farmer-70498

03/16/2023, 9:50 PM
I'm trying to rebuild a node that was in an RKE2 cluster. I've got it rebuilt and rejoined, but it's in a NotReady state because
container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
but I have disabled deploying a CNI via RKE2 (manual install of Cilium). This is a chicken-and-egg situation, and I'm not sure how I haven't run into it before now. How can I get the kubelet to come up ready enough to get the CNI from a DaemonSet in the cluster?

creamy-pencil-82913

03/16/2023, 9:54 PM
How did you get it working the first time?
As you noted, the CNI installation needs to tolerate any NotReady taints the node has, since it will remain NotReady until after the CNI is up
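A DaemonSet toleration broad enough to cover the NotReady taint might look like this (an illustrative pod-spec fragment, not taken from any particular chart):

```yaml
# Illustrative DaemonSet pod spec fragment: tolerating every taint,
# including node.kubernetes.io/not-ready, lets the CNI pod be
# scheduled before the node reports Ready.
spec:
  template:
    spec:
      tolerations:
        - operator: Exists   # matches all taints, any key/effect
```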

broad-farmer-70498

03/16/2023, 9:55 PM
it just worked honestly, not sure how I didn't hit this tbh
lemme check the daemonset and see what it has on there

creamy-pencil-82913

03/16/2023, 9:56 PM
just out of curiosity why are you deploying your own Cilium instead of using
cni: cilium
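That option goes in the RKE2 config file; conversely, disabling the bundled CNI (the setup in this thread) is `cni: none`. A sketch of the relevant fragment:

```yaml
# /etc/rancher/rke2/config.yaml
# Option 1: let RKE2 deploy Cilium itself
cni: cilium
# Option 2: deploy no CNI and manage it manually (as in this thread)
# cni: none
```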

broad-farmer-70498

03/16/2023, 9:56 PM
we have always managed it manually going back to rke1 days (which is how this particular cluster started life)

creamy-pencil-82913

03/16/2023, 9:57 PM
You did an in-place conversion from RKE1 to RKE2?

broad-farmer-70498

03/16/2023, 9:57 PM
yes, not recently
this cluster was converted probably 6 or 7 months ago...just now needed to reinstall a specific node because reasons

creamy-pencil-82913

03/16/2023, 9:58 PM
ahh yikes. that’s still got a lot of rough edges to it, I wouldn’t personally have recommended that

broad-farmer-70498

03/16/2023, 9:59 PM
I'm well aware of the rough edges yes (I helped document/smooth a bunch of them) 😄
This appears to me to tolerate everything? (this is what the cilium chart tolerates)
tolerations:
  - operator: Exists

creamy-pencil-82913

03/16/2023, 10:19 PM
That should do it. Is there some other reason why the CNI pod isn’t coming up on that node?

broad-farmer-70498

03/16/2023, 10:20 PM
doh, thanks for the tip...tunnel vision at the moment
lemme take a look
Warning  FailedCreatePodSandBox  8m12s                   kubelet  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: expected cgroupsPath to be of format "slice:prefix:name" for systemd cgroups, got "/kubepods/burstable/pod7389c130-8e97-470f-81ca-aed24cccceab/8ab52a02eb8c112a60158ba4c5fd5c639798f57541d5abb2669ff48cce4451db" instead: unknown
  Warning  FailedCreatePodSandBox  4m50s (x16 over 7m57s)  kubelet  (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: expected cgroupsPath to be of format "slice:prefix:name" for systemd cgroups, got "/kubepods/burstable/pod7389c130-8e97-470f-81ca-aed24cccceab/e1f032e99b21cdef3f64d4cd304150ac78836900f9f3be84bb139c54c94bcdab" instead: unknown
hmm

creamy-pencil-82913

03/16/2023, 10:26 PM
That usually indicates that the kubelet and container runtime aren’t using the same cgroup driver
are you using the embedded containerd, or docker?

broad-farmer-70498

03/16/2023, 10:26 PM
embedded containerd

creamy-pencil-82913

03/16/2023, 10:28 PM
are you overriding the kubelet’s --cgroup-driver option for any reason, or providing your own containerd configuration with a config.toml.tmpl?
the kubelet and containerd config need to agree about what cgroup driver to use. If you leave it to the defaults then RKE2 keeps them in sync, but if you are overriding kubelet config or containerd config it’s possible that you’ve got one set to something different
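For reference, with the embedded containerd the runc cgroup driver lives in the generated containerd config (path per the standard RKE2 layout; a config.toml.tmpl replaces the generated file):

```toml
# /var/lib/rancher/rke2/agent/etc/containerd/config.toml
# When SystemdCgroup is true here, the kubelet must run with
# --cgroup-driver=systemd; RKE2's defaults keep the two in sync.
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = true
```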

broad-farmer-70498

03/16/2023, 10:30 PM
perhaps, I do have some config in the node ansible scripts to set some kernel args (but the other nodes match)
--cgroup-driver=cgroupfs
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = true
I'm assuming that's a mismatch?
I must be missing some config the other nodes have (or added something the others don't) 😞

creamy-pencil-82913

03/16/2023, 10:35 PM
yep that’s a mismatch
I would remove that kubelet arg
you want to be using the systemd cgroup driver, not cgroupfs
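The fix is to make both sides agree on systemd, e.g. (a sketch, assuming the override sits in the RKE2 config file that the ansible scripts manage):

```yaml
# /etc/rancher/rke2/config.yaml
# Either drop the override entirely and let RKE2 keep kubelet and
# containerd in sync, or set it to match SystemdCgroup = true:
kubelet-arg:
  - "cgroup-driver=systemd"   # was: cgroup-driver=cgroupfs
```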

broad-farmer-70498

03/16/2023, 10:35 PM
yup, I see where I f'd that up
@creamy-pencil-82913 thanks for the tips! I forgot I had mucked with that from the originals because of reasons 😞