
magnificent-glass-73162

08/19/2022, 9:52 AM
I am running Rancher on a single-node Linux host (Fedora 36) that I recently upgraded from Fedora 32. Rancher is having trouble starting properly. I am trying to follow the troubleshooting guide (https://rancher.com/docs/rancher/v2.5/en/troubleshooting/dns/), and I’m noticing that the coredns pod is not starting…
$ kubectl -n kube-system get pods -l k8s-app=kube-dns
NAME                       READY   STATUS             RESTARTS   AGE
coredns-685d6d555d-q9pn4   0/1     CrashLoopBackOff   4          2m40s
If I look at the pod description, I see the events as follows:
  Type     Reason          Age                From     Message
  ----     ------          ----               ----     -------
  Normal   SandboxChanged  62m (x2 over 62m)  kubelet  Pod sandbox changed, it will be killed and re-created.
  Normal   Pulled          62m                kubelet  Container image "rancher/mirrored-cluster-proportional-autoscaler:1.8.3" already present on machine
  Normal   Created         62m                kubelet  Created container autoscaler
  Normal   Started         62m                kubelet  Started container autoscaler
  Normal   SandboxChanged  52m (x2 over 52m)  kubelet  Pod sandbox changed, it will be killed and re-created.
  Normal   Pulled          52m                kubelet  Container image "rancher/mirrored-cluster-proportional-autoscaler:1.8.3" already present on machine
  Normal   Created         52m                kubelet  Created container autoscaler
  Normal   Started         52m                kubelet  Started container autoscaler
  Warning  Unhealthy       51m                kubelet  Readiness probe failed: Get "http://10.42.0.17:8080/healthz": dial tcp 10.42.0.17:8080: i/o timeout (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy       51m (x3 over 52m)  kubelet  Readiness probe failed: Get "http://10.42.0.17:8080/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  Normal   SandboxChanged  37m (x2 over 37m)  kubelet  Pod sandbox changed, it will be killed and re-created.
  Normal   Pulled          37m                kubelet  Container image "rancher/mirrored-cluster-proportional-autoscaler:1.8.3" already present on machine
  Normal   Created         37m                kubelet  Created container autoscaler
  Normal   Started         37m                kubelet  Started container autoscaler
  Warning  Unhealthy       36m (x2 over 37m)  kubelet  Readiness probe failed: Get "http://10.42.0.93:8080/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  Normal   SandboxChanged  22m (x2 over 22m)  kubelet  Pod sandbox changed, it will be killed and re-created.
  Normal   Pulled          22m                kubelet  Container image "rancher/mirrored-cluster-proportional-autoscaler:1.8.3" already present on machine
  Normal   Created         22m                kubelet  Created container autoscaler
  Normal   Started         22m                kubelet  Started container autoscaler
  Warning  Unhealthy       21m (x4 over 22m)  kubelet  Readiness probe failed: Get "http://10.42.0.161:8080/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
It seems to be failing the health check, but I’m not sure why.
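Next I’ll probably try pulling the logs from the crashing container and hitting the readiness endpoint from the host myself. Something like this, I think (pod name and IP are taken from the output above; the pod IP changes on each restart, so worth re-checking with kubectl get pod -o wide first):
$ # logs from the previous (crashed) instance of the container
$ kubectl -n kube-system logs coredns-685d6d555d-q9pn4 --previous
$ # hit the readiness endpoint directly from the host; if this also times out,
$ # that points at overlay-network reachability rather than the container itself
$ curl -m 5 http://10.42.0.161:8080/healthz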
BTW, the docker version output is as follows:
$ docker version
Client: Docker Engine - Community
 Version:           19.03.13
 API version:       1.40
 Go version:        go1.13.15
 Git commit:        4484c46d9d
 Built:             Wed Sep 16 17:03:54 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.13
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       4484c46d9d
  Built:            Wed Sep 16 17:01:49 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.3
  GitCommit:        269548fa27e0089a8b8278fc4fc781d7f65a939b
 runc:
  Version:          1.0.0-rc92
  GitCommit:        ff819c7e9184c13b7c2607fe6c30ae19403a7aff
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683
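Since this all started after the OS upgrade, it’s probably worth double-checking the daemon itself too, the cgroup driver in particular (Fedora changed its cgroup defaults between releases):
$ # confirm the daemon is healthy and see which cgroup driver it reports
$ sudo systemctl status docker
$ docker info --format '{{.CgroupDriver}}'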
I’m also seeing the following in the rancher/server logs:
Could not resolve host: git.rancher.io
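To rule out the host’s own resolver, the standard tools should show whether that name resolves outside of docker/kubernetes:
$ # check name resolution on the host itself
$ getent hosts git.rancher.io
$ nslookup git.rancher.io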
$ kubectl run -it --rm --restart=Never busybox --image=busybox:1.28 -- nslookup www.google.com
If you don't see a command prompt, try pressing enter.
nslookup: can't resolve 'www.google.com'
pod "busybox" deleted
pod default/busybox terminated (Error)
I’ve addressed some firewall issues, but I’m still having problems with network connectivity. If I do an nslookup from a busybox container in docker it works, but from a Rancher pod, not so much:
$ docker run -it busybox:1.28 nslookup www.google.com
Server:    192.168.17.178
Address 1: 192.168.17.178 legolas.inhouse-broker.org

Name:      www.google.com
Address 1: 2607:f8b0:4009:807::2004 ord38s19-in-x04.1e100.net
Address 2: 172.217.5.4 lga15s49-in-f4.1e100.net
With kubectl, no good:
$ kubectl run -it --rm --restart=Never busybox --image=busybox:1.28 -- nslookup www.google.com
If you don't see a command prompt, try pressing enter.
nslookup: can't resolve 'www.google.com'
pod "busybox" deleted
pod default/busybox terminated (Error)
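For comparison I want to see what resolver config a pod actually gets, and re-check the firewall, since flannel’s VXLAN traffic is a common thing for firewalld to eat (8472/udp is the default flannel port; adjust if your setup differs):
$ # what /etc/resolv.conf does a pod end up with?
$ kubectl run -it --rm --restart=Never busybox --image=busybox:1.28 -- cat /etc/resolv.conf
$ # check firewalld, and open the default flannel VXLAN port if it's missing
$ sudo firewall-cmd --list-all
$ sudo firewall-cmd --permanent --add-port=8472/udp && sudo firewall-cmd --reload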

hundreds-evening-84071

08/19/2022, 8:11 PM
The only thing I can think of is to double-check your entries in the hosts file and resolv.conf (on the host), then restart the network and docker services.
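Something along these lines (adjust for however your network is managed; I’m assuming NetworkManager since it’s Fedora):
$ # eyeball the host's name resolution config
$ cat /etc/hosts /etc/resolv.conf
$ # restart networking, then docker
$ sudo systemctl restart NetworkManager
$ sudo systemctl restart docker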

magnificent-glass-73162

08/19/2022, 8:44 PM
There really isn’t much in the hosts file, but I am seeing this…
$ kubectl -n kube-system get pods -l k8s-app=kube-dns
NAME                       READY   STATUS             RESTARTS   AGE
coredns-685d6d555d-l4v4d   0/1     CrashLoopBackOff   205        10h
$ cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
When I do a kubectl -n kube-system describe pod coredns-685d6d555d-l4v4d, I see there is a potential cgroup problem:
Warning  Failed          74m (x4 over 75m)      kubelet  Error: failed to start container "coredns": Error response from daemon: OCI runtime create failed: container_linux.go:370: starting container process caused: process_linux.go:326: applying cgroup configuration for process caused: failed to write 1 to memory.kmem.limit_in_bytes: write /sys/fs/cgroup/memory/kubepods/burstable/podc57e2d18-9421-4c84-8d9b-f97dcd3ee3f2/coredns/memory.kmem.limit_in_bytes: operation not supported: unknown
but that seems like it’s just a warning and not fatal.
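Although, from what I’ve read, that kmem write failure can come from running an older runc (mine is 1.0.0-rc92, per the docker version output above) on a newer kernel that dropped kmem accounting. Easy enough to check what the host is actually running:
$ # cgroup2fs here would mean cgroup v2, which docker 19.03 doesn't support at all;
$ # tmpfs means cgroup v1 (the error path above suggests v1 is in use)
$ stat -fc %T /sys/fs/cgroup/
$ # kernel version matters too: newer kernels removed kmem limit support entirely
$ uname -r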