# rke2
Hello. I'm trying to enable the graceful node shutdown feature on an rke2 cluster. I created a config file for the kubelet and added a flag in `kubelet-arg` to use it:
```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
shutdownGracePeriod: 2m
shutdownGracePeriodCriticalPods: 10s
```

and in the RKE2 `config.yaml`:

```yaml
kubelet-arg:
  - config=/etc/rancher/rke2/kubelet.yaml
```
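As a side note (just a sketch, not rke2-specific), one way to double-check that the kubelet actually picked the file up is to read the running kubelet config back through the apiserver proxy, roughly like this (`<node-name>` is a placeholder, and `jq` is assumed to be available):

```bash
# Read the live kubelet config via the apiserver proxy and pull out
# the two shutdown-related settings.
kubectl get --raw "/api/v1/nodes/<node-name>/proxy/configz" \
  | jq '.kubeletconfig | {shutdownGracePeriod, shutdownGracePeriodCriticalPods}'
```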
I see the systemd inhibit lock taken by `kubelet` in `systemd-inhibit --list`. When I then execute `systemctl reboot`, I see the node change status to `NotReady` instantly. But I don't see the pods on that node terminating when running `kubectl get pods`. I also verified whether the pods are actually terminated on that node by executing `/var/lib/rancher/rke2/bin/crictl ps`, and they are! In the kubelet logs I see some connection issues to the apiserver:
```
E0115 11:39:15.784024    2080 kubelet.go:1899] failed to "KillPodSandbox" for "af6511b8-46f2-43bf-a590-0b9a130a4e98" with KillPodSandboxError: "rpc error: code = Unknown desc = failed to destroy network for sandbox \"b5fca2db2fd9a302ac72de1ea781e1da38ef0a488e862c441fe156489133930e\": plugin type=\"calico\" failed (delete): error getting ClusterInformation: Get \"https://10.43.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default\": dial tcp 10.43.0.1:443: connect: connection refused"
E0115 11:39:19.185411    2080 kubelet_node_status.go:540] "Error updating node status, will retry" err="error getting node \"xxx\": Get \"https://127.0.0.1:6443/api/v1/nodes/xxx?timeout=10s\": dial tcp 127.0.0.1:6443: connect: connection refused"
```
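As another side check (again just a sketch, not specific to rke2), it may be worth comparing logind's maximum inhibit delay with the configured `shutdownGracePeriod`, since the delay lock can only hold the shutdown back for as long as logind allows:

```bash
# List inhibitor locks; the kubelet should hold a "delay" lock for shutdown.
systemd-inhibit --list

# Show logind's maximum allowed delay. If this is shorter than
# shutdownGracePeriod, the reboot may not wait for the full grace period.
busctl get-property org.freedesktop.login1 /org/freedesktop/login1 \
  org.freedesktop.login1.Manager InhibitDelayMaxUSec
```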
Has anybody used the graceful node shutdown feature with rke2 and had similar problems?