05/15/2023, 3:05 PM
Hi all. We run Rancher 2.6.11 with 4 downstream clusters. All of these clusters run Ubuntu 22.04.x with Docker 20.10.24. While upgrading Kubernetes from 1.23.x to 1.24.x we've seen several issues whose root cause we can't pinpoint yet.
1. When upgrading the Kubernetes version from 1.23.x to 1.24.x, all containers were restarted. We assumed this was related to the docker-cri change. However, when we upgraded from 1.24.10 to 1.24.13 we saw the same behaviour.
2. During the upgrade process it seems to lose the tunnel frequently, with the following error:
[ERROR] Failed to set up SSH tunneling for host []: Can't retrieve Docker Info: error during connect: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.24/info": can not build dialer to [c-p9cqp:m-2bd0a9380d15]
[ERROR] Removing host [] from node lists
A restart of the affected node is required to recover it. The restart of all containers is very worrying, and it's something I have never seen before. I've searched for this behaviour but have had no luck so far. Has anyone seen it before?
Could this be related to the cgroup v2 change in Ubuntu 22.04? I can't find any clear details on whether RKE1 supports it, although the support pages do indicate that Ubuntu 22.04 is supported.
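For anyone who wants to compare nodes, here is a minimal sketch for checking which cgroup hierarchy a host is actually running. It relies on the fact that a cgroup v2 (unified) mount exposes a `cgroup.controllers` file at its root. The function name `cgroup_version` and the default path `/sys/fs/cgroup` are my own choices for illustration, and a hybrid setup (v2 mounted under a subdirectory) is reported as 1 here.

```python
import os


def cgroup_version(root="/sys/fs/cgroup"):
    """Guess the cgroup version mounted at `root`.

    Returns 2 if `root` looks like a unified (v2) hierarchy,
    1 if the directory exists but has no v2 marker file,
    and None if the directory does not exist at all.
    """
    if not os.path.isdir(root):
        return None
    # cgroup v2 exposes cgroup.controllers at the root of the mount;
    # a v1 (or hybrid) layout does not have this file at that level.
    if os.path.isfile(os.path.join(root, "cgroup.controllers")):
        return 2
    return 1


if __name__ == "__main__":
    print("cgroup version:", cgroup_version())
```

Running this on each node should show whether the clusters differ. Docker 20.10 also reports what it detected via `docker info --format '{{.CgroupVersion}}'`, which may be the quicker check.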