# general
r
Note that not knowing whether you're using k3d, RKE1, or something else makes this a bit of a shot in the dark (if you're using k3s or RKE2 directly, then the docker daemon doesn't matter). I'm not 100% certain, but my gut reaction is to tell you to restart your kube-apiserver. By default, Docker will keep any containers it has running while the service restarts (there's an option you can set in /etc/docker/daemon.json to change the behavior, I think it's live-reload but the Docker docs will tell you for certain). While the container keeps running, open file handles or network sockets may end up stale, and a lot of containerized software just assumes a restart rather than coding in recovery. I think one time I saw the container lose the network interface altogether. So that may be what's happening to you.
g
Hello @rough-farmer-49135 thank you for your hints. I'm using RKE1. I have already tried restarting the docker service a few times, which also restarts the RKE1 kube-api container, but no luck. Not sure what else I can do. Happy to hear some more suggestions. Thank you in advance!
r
I'd explicitly set live-restore (just looked it up - https://docs.docker.com/config/containers/live-restore/ ) so it does what you expect just in general. For RKE1, while I've never used it, if you're setting live-restore to false, I think I'd use the RKE1 scripts/binaries to shut down the Kubernetes cluster, restart docker daemon, then use RKE1 scripts/binaries to bring it back up. Distributed decision-making takes a while to settle, and sometimes if things are started up out of expected order you can get into deadlocked loops and never have it start.
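For reference, setting the option explicitly could look roughly like the sketch below. It only merges the live-restore key into an existing daemon.json and leaves other keys alone. The function name set_live_restore is made up for illustration, and it assumes the file is plain JSON; the docker daemon still has to be restarted or reloaded afterwards to pick up the change.

```python
import json
from pathlib import Path


def set_live_restore(config_path, enabled):
    """Merge "live-restore" into a daemon.json file, keeping other keys.

    Hypothetical helper for illustration; not part of Docker or RKE1.
    """
    path = Path(config_path)
    # Start from the existing config if the file exists and is not empty.
    text = path.read_text() if path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    config["live-restore"] = bool(enabled)
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Trying it on a scratch copy first (for example set_live_restore("/tmp/daemon.json", False)) is safer than pointing it straight at /etc/docker/daemon.json.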
g
Ah, now I see. I will check my /etc/docker/daemon.json! Thank you, will report back!
I have now added live-restore and then did a service docker restart, but unfortunately I still have the same issue. I need to check how to use the RKE1 scripts to shut down the cluster. Maybe that helps. @rough-farmer-49135 one question: what are you using? RKE2?
r
A year ago I was using RKE2 at a different job (security was being emphasized, so seemed a definitely better choice). Current job is using k3d (which runs k3s in a docker container, and it's just a handy development/demonstration environment for eventual resource constrained systems, so seems like a good choice too).
Before that I was using something built on Mesos, which was a competitor to Kubernetes that basically lost. That was where live-restore tripped me up, because it changed from default false to default true. After that, whenever we restarted the docker daemon to fix something, we'd get zombie processes, since Mesos thought the containers needed to be restarted but they were still running.
g
Thank you for the information.