Hello, we have quite a critical issue in our cluster...
# general
Hello, we have quite a critical issue in our cluster setup. We are running Rancher 2.8.0 using a Hetzner node driver. On this Rancher server we operate 3 clusters. One cluster recently had severe problems, losing etcd and the kube API because of some hardware server issues etc., so we are now trying hard to recover it. We are almost there, but the Rancher server seems to be unable to connect to etcd. Debugging further, I can see in the Rancher server logs that the problem seems to lie in the SSH tunnel, or better: the Rancher server is not able to pull Docker info from the CP:
2024/04/10 20:19:25 [INFO] kontainerdriver rancherkubernetesengine listening on address 127.0.0.1:37283
2024/04/10 20:19:25 [INFO] Not checking if state file is included in snapshot file for [c-f8jgb-rl-zkgg8_2024-04-08T12:43:43Z], using local state file [management-state/rke/rke-3298610319/cluster.rkestate]
2024/04/10 20:19:25 [INFO] Restoring etcd snapshot c-f8jgb-rl-zkgg8_2024-04-08T12:43:43Z
2024/04/10 20:19:25 [INFO] Successfully Deployed state file at [management-state/rke/rke-3298610319/cluster.rkestate]
2024/04/10 20:19:25 [INFO] [dialer] Setup tunnel for host [10.2.0.3]
2024/04/10 20:19:25 [WARNING] Failed to set up SSH tunneling for host [10.2.0.3]: Can't retrieve Docker Info: error during connect: Get "http://%!F(MISSING)var%!F(MISSING)run%!F(MISSING)docker.sock/v1.24/info": can not build dialer to [c-f8jgb:m-8rj9w]
2024/04/10 20:19:25 [WARNING] Removing host [10.2.0.3] from node lists
2024/04/10 20:19:25 [INFO] kontainerdriver rancherkubernetesengine stopped
Obviously this part is already wrong: http://%!F(MISSING)var%!F(MISSING)run%!F(MISSING)docker.sock/v1.24/info so it seems the Rancher server is missing some variables needed to build the correct URL. I can confirm that the SSH connection between the Rancher server and the CP/etcd node is working. Docker is working on the CP, and docker info returns the required information. So can anyone point me in a direction: which vars are missing here, and how can we heal this? It seems this is the only thing between us and a fully working cluster again...
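A side note on the `%!F(MISSING)` fragments: this is the marker Go's `fmt` package emits when a format string contains a verb with no matching argument. A percent-encoded socket path contains `%2F`, which `fmt` reads as the verb `F` with width 2. So the URL itself may be fine and only its rendering in the log is mangled, which would leave "can not build dialer to [c-f8jgb:m-8rj9w]" as the actual error. Below is a minimal sketch of that effect. It assumes the log line passes through a Printf-style call, which is not confirmed from the Rancher/RKE source, and the helper names `dockerInfoURL` and `logAsFormat` are hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
)

// dockerInfoURL (hypothetical helper) builds a URL with the unix socket path
// percent-encoded into the host position, so every "/" becomes "%2F".
func dockerInfoURL(socketPath string) string {
	return "http://" + url.PathEscape(socketPath) + "/v1.24/info"
}

// logAsFormat (hypothetical helper) mimics a logger that uses the message
// itself as a Printf format string and supplies no arguments.
func logAsFormat(msg string) string {
	var noArgs []interface{}
	// "%2F" is parsed as verb 'F' with width 2; with no argument available,
	// fmt substitutes "%!F(MISSING)".
	return fmt.Sprintf(msg, noArgs...)
}

func main() {
	u := dockerInfoURL("/var/run/docker.sock")
	fmt.Println("actual URL:  ", u)
	fmt.Println("as formatted:", logAsFormat(u))
}
```

If this is what is happening here, no variables are missing from the URL, and the place to look would be why the dialer for node `m-8rj9w` in cluster `c-f8jgb` cannot be built.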