09/20/2022, 2:34 PM
Hello, I'm seeing strange behavior on a 4-node RKE1 cluster (kubernetesVersion: v1.23.10-rancher1-1). All nodes are identical in hardware and software, and all were running Ubuntu 20.04. I have upgraded 3 of them to 22.04, and now the kubelet behaves differently across them. On one of them everything looks fine, although the kubelet does not seem very verbose (maybe that is normal for v=2). All the logs are of this type:
I0920 14:08:37.253539    3147 container_manager_linux.go:511] "Discovered runtime cgroup name" cgroupName="/system.slice/docker.service"
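To compare the kubelet logs across nodes more systematically than by eye, here is a minimal Python sketch that parses klog-formatted lines like the one above (severity letter, MMDD date, time, thread ID, source file:line) and tallies them by severity and source location. The script name `klog_tally.py` and the file name `kubelet.log` are hypothetical. It assumes the kubelet runs as the Docker container named `kubelet`, as on RKE1, so `docker logs kubelet > kubelet.log 2>&1` captures its output.

```python
import re
import sys
from collections import Counter

# klog header: severity (I/W/E/F), MMDD, hh:mm:ss.uuuuuu, thread id, file:line, "]"
KLOG_RE = re.compile(
    r"^(?P<sev>[IWEF])(?P<mmdd>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<tid>\d+) (?P<src>[^\]]+)\] (?P<msg>.*)$"
)


def parse_klog(line):
    """Return the header fields of a klog line as a dict, or None if it doesn't match."""
    m = KLOG_RE.match(line.strip())
    return m.groupdict() if m else None


def tally(lines):
    """Count parsed lines per (severity, source file:line); non-klog lines are skipped."""
    counts = Counter()
    for line in lines:
        rec = parse_klog(line)
        if rec:
            counts[(rec["sev"], rec["src"])] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical): python3 klog_tally.py kubelet.log
    with open(sys.argv[1]) as f:
        for (sev, src), n in tally(f).most_common(15):
            print(f"{sev} {src}: {n}")
```

Running it against the log from each node should make it obvious whether the E-level `summary_sys_containers.go` lines dominate only on the two problem nodes.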
The other 2 nodes, however, had problems after the upgrade: Rancher started spawning agent containers without limit, which bricked the servers with "too many open files" errors. I had to remove them from the cluster and add them back, which worked. However, both nodes now log lines like this:
E0920 13:42:03.850145    2845 summary_sys_containers.go:83] "Failed to get system container stats" err="failed to get cgroup stats for \"/../docker.service\": failed to get container info for \"/../docker.service\": unknown container \"/../docker.service\"" containerName="/../docker.service"
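One thing worth checking is which cgroup hierarchy each node booted with, since Ubuntu 22.04 defaults to the unified cgroup v2 hierarchy while 20.04 uses the hybrid v1/v2 layout. As described in cgroup_namespaces(7), a path in /proc/<pid>/cgroup that starts with `/..` means the target process sits outside the reader's cgroup namespace root, which is what `/../docker.service` in the error above looks like. The following is a diagnostic sketch under those assumptions, not a fix: it guesses the host's cgroup mode from files under /sys/fs/cgroup and flags out-of-namespace entries in captured /proc/<pid>/cgroup dumps. The function and script names are hypothetical.

```python
import os
import sys


def cgroup_mode(root="/sys/fs/cgroup"):
    """Guess the cgroup layout from what is mounted under the cgroup root."""
    if os.path.exists(os.path.join(root, "cgroup.controllers")):
        return "v2 (unified)"  # cgroup2 is mounted directly at the root
    if os.path.isdir(os.path.join(root, "unified")):
        return "hybrid (v1 + v2)"  # systemd mounts cgroup2 at <root>/unified
    return "v1 (legacy)"


def parse_proc_cgroup(text):
    """Parse /proc/<pid>/cgroup lines of the form 'hierarchy-ID:controllers:path'."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        hid, controllers, path = line.split(":", 2)  # the path itself may contain ':'
        entries.append((int(hid), controllers, path))
    return entries


def outside_namespace(entries):
    """Entries whose path starts with '/..' lie outside the reader's cgroup namespace root."""
    return [e for e in entries if e[2].startswith("/..")]


if __name__ == "__main__":
    print("host cgroup mode:", cgroup_mode())
    # Each argument is a file holding a captured /proc/<pid>/cgroup dump.
    for path in sys.argv[1:]:
        with open(path) as f:
            for hid, ctrl, cg in outside_namespace(parse_proc_cgroup(f.read())):
                print(f"{path}: hierarchy {hid} ({ctrl or 'unified'}): {cg} is outside the namespace root")
```

For example, save `cat /proc/<dockerd-pid>/cgroup` as run on the host and `docker exec kubelet cat /proc/<dockerd-pid>/cgroup` (assuming the kubelet container shares the host PID namespace), then pass both files to the script on a healthy and a failing node.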
I also see many more orphaned-pod cleanup failures, so I am hesitant to upgrade the last node. Can someone give me a hint as to what to look for to fix the cgroup issue? The systemd options are the same on all 3.
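Since "the same systemd options" is easy to get wrong by eye, a sketch that diffs the Key=Value output of `systemctl show docker.service` captured on two nodes may help. The default property list (ControlGroup, Slice, Delegate, LimitNOFILE, ExecStart) is an assumption about what is most likely to matter for cgroup placement and the file-limit errors mentioned above. Volatile properties such as MainPID or timestamps always differ between nodes, which is why the script filters to that list. Function names and file names are hypothetical.

```python
import sys

# Assumed-relevant properties; extend as needed.
INTERESTING = ("ControlGroup", "Slice", "Delegate", "LimitNOFILE", "ExecStart")


def parse_show(text):
    """Parse `systemctl show` output (one Key=Value per line) into a dict."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")  # split on the first '=' only
        if sep:
            props[key] = value
    return props


def diff_props(a, b, keys=None):
    """Return {key: (value_in_a, value_in_b)} for keys whose values differ."""
    keys = sorted(set(a) | set(b)) if keys is None else keys
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical): python3 show_diff.py node1.txt node2.txt
    with open(sys.argv[1]) as f1, open(sys.argv[2]) as f2:
        a, b = parse_show(f1.read()), parse_show(f2.read())
    for key, (va, vb) in diff_props(a, b, INTERESTING).items():
        print(f"{key}:\n  {sys.argv[1]}: {va}\n  {sys.argv[2]}: {vb}")
```

Each input file would come from something like `systemctl show docker.service > node1.txt` run on the healthy node and on one of the failing nodes.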