#general

proud-salesmen-12221

09/13/2022, 1:53 AM
Hi everyone, I'm new to Kubernetes and RKE2. I just set up a 3-node cluster using Vagrant (one server and two agents), and I see that a couple of my ingress pods are stuck in 'ContainerCreating' and a couple of Canal pods are in 'CrashLoopBackOff'. Is this expected/normal?
Copy code
[vagrant@rke2-server1 ~]$ kubectl get pods -A -o wide
NAMESPACE     NAME                                                    READY   STATUS              RESTARTS         AGE     IP               NODE           NOMINATED NODE   READINESS GATES
kube-system   cloud-controller-manager-rke2-server1                   1/1     Running             2 (174m ago)     3h14m   10.0.2.15        rke2-server1   <none>           <none>
kube-system   etcd-rke2-server1                                       1/1     Running             1 (175m ago)     3h14m   192.168.33.101   rke2-server1   <none>           <none>
kube-system   helm-install-rke2-canal-c48pq                           0/1     Completed           0                3h14m   10.0.2.15        rke2-server1   <none>           <none>
kube-system   helm-install-rke2-coredns-znhbd                         0/1     Completed           0                3h14m   10.0.2.15        rke2-server1   <none>           <none>
kube-system   helm-install-rke2-ingress-nginx-fbp8s                   0/1     Completed           0                3h14m   10.42.0.4        rke2-server1   <none>           <none>
kube-system   helm-install-rke2-metrics-server-4kcj7                  0/1     Completed           0                3h14m   10.42.0.2        rke2-server1   <none>           <none>
kube-system   kube-apiserver-rke2-server1                             1/1     Running             1 (175m ago)     3h14m   192.168.33.101   rke2-server1   <none>           <none>
kube-system   kube-controller-manager-rke2-server1                    1/1     Running             2 (174m ago)     3h14m   10.0.2.15        rke2-server1   <none>           <none>
kube-system   kube-proxy-rke2-agent1                                  1/1     Running             0                173m    10.0.2.15        rke2-agent1    <none>           <none>
kube-system   kube-proxy-rke2-agent2                                  1/1     Running             0                173m    10.0.2.15        rke2-agent2    <none>           <none>
kube-system   kube-proxy-rke2-server1                                 1/1     Running             1 (175m ago)     3h14m   192.168.33.101   rke2-server1   <none>           <none>
kube-system   kube-scheduler-rke2-server1                             1/1     Running             1 (175m ago)     3h14m   192.168.33.101   rke2-server1   <none>           <none>
kube-system   rke2-canal-m8vb6                                        2/2     Running             2 (175m ago)     3h14m   10.0.2.15        rke2-server1   <none>           <none>
kube-system   rke2-canal-ptnk8                                        0/2     CrashLoopBackOff    110 (87s ago)    3h10m   10.0.2.15        rke2-agent2    <none>           <none>
kube-system   rke2-canal-rmnv6                                        0/2     CrashLoopBackOff    110 (2m2s ago)   3h13m   10.0.2.15        rke2-agent1    <none>           <none>
kube-system   rke2-coredns-rke2-coredns-76cb76d66-mkv2c               1/1     Running             1 (175m ago)     3h14m   10.42.0.12       rke2-server1   <none>           <none>
kube-system   rke2-coredns-rke2-coredns-76cb76d66-t569h               0/1     ContainerCreating   0                3h13m   <none>           rke2-agent1    <none>           <none>
kube-system   rke2-coredns-rke2-coredns-autoscaler-58867f8fc5-8n589   1/1     Running             1 (175m ago)     3h14m   10.42.0.11       rke2-server1   <none>           <none>
kube-system   rke2-ingress-nginx-controller-c8vmd                     0/1     ContainerCreating   0                3h9m    <none>           rke2-agent2    <none>           <none>
kube-system   rke2-ingress-nginx-controller-n9st4                     1/1     Running             1 (175m ago)     3h14m   10.42.0.13       rke2-server1   <none>           <none>
kube-system   rke2-ingress-nginx-controller-rdml6                     0/1     ContainerCreating   0                3h12m   <none>           rke2-agent1    <none>           <none>
kube-system   rke2-metrics-server-6979d95f95-4z57b                    1/1     Running             1 (175m ago)     3h14m   10.42.0.10       rke2-server1   <none>           <none>
[vagrant@rke2-server1 ~]$ kubectl -n kube-system get events
LAST SEEN   TYPE      REASON                   OBJECT                                          MESSAGE
52m         Normal    Pulled                   pod/rke2-canal-ptnk8                            Container image "rancher/hardened-calico:v3.22.2-build20220509" already present on machine
37m         Warning   BackOff                  pod/rke2-canal-ptnk8                            Back-off restarting failed container
7m23s       Warning   Unhealthy                pod/rke2-canal-ptnk8                            Liveness probe failed: calico/node is not ready: Felix is not live: Get "http://localhost:9099/liveness": dial tcp 127.0.0.1:9099: connect: connection refused
2m12s       Warning   BackOff                  pod/rke2-canal-ptnk8                            Back-off restarting failed container
7m23s       Warning   Unhealthy                pod/rke2-canal-rmnv6                            Readiness probe failed: Get "http://localhost:9099/readiness": dial tcp 127.0.0.1:9099: connect: connection refused
2m13s       Warning   BackOff                  pod/rke2-canal-rmnv6                            Back-off restarting failed container
2m16s       Warning   FailedCreatePodSandBox   pod/rke2-coredns-rke2-coredns-76cb76d66-t569h   (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "73f957aa6c9bfa7ab3930dc7ca03c0bf3ffeb8e59cda674e73f25207b9c0c5a8": plugin type="calico" failed (add): stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/
2m14s       Warning   FailedCreatePodSandBox   pod/rke2-ingress-nginx-controller-c8vmd         (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "4bd0c31f8656f12d8e23bcb3eac70f7f917ea1083f57890d641a0f3f6b8b85bd": plugin type="calico" failed (add): stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/
2m10s       Warning   FailedCreatePodSandBox   pod/rke2-ingress-nginx-controller-rdml6         (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "9013c62529b9776bc1332121fa101ae82b91d83fcc161a43523ca8bef605bbb6": plugin type="calico" failed (add): stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico
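For reference, the usual next step from here would be to read the calico-node container's logs on one of the crashing canal pods; a minimal sketch, assuming the standard rke2-canal container names (calico-node and kube-flannel):
Copy code
# Logs from the last failed run of the calico-node container in the crashing pod
kubectl -n kube-system logs rke2-canal-ptnk8 -c calico-node --previous
# Full event history and container states for the same pod
kubectl -n kube-system describe pod rke2-canal-ptnk8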
The node hosts are Ubuntu 20.04 with firewalld off/disabled and NetworkManager configured to ignore the Calico/Flannel interfaces, per https://docs.rke2.io/known_issues/#networkmanager
^ Correction: the OS is AlmaLinux 8 with SELinux in permissive mode
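For reference, the NetworkManager workaround from the linked known-issues page boils down to marking the CNI-managed interfaces as unmanaged; a minimal sketch of it (verify the exact file contents against the linked docs):
Copy code
# Tell NetworkManager to leave calico/flannel interfaces alone
cat <<'EOF' | sudo tee /etc/NetworkManager/conf.d/rke2-canal.conf
[keyfile]
unmanaged-devices=interface-name:cali*;interface-name:flannel*
EOF
sudo systemctl reload NetworkManager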

plain-byte-79620

09/13/2022, 6:58 AM
Which version of RKE2 are you using?
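(A minimal sketch of how to check this on the server node, assuming the rke2 binary is on the PATH:)
Copy code
rke2 --version
kubectl version --short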

proud-salesmen-12221

09/13/2022, 3:52 PM
Server Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.4+rke2r1"
I made a few changes since yesterday, mainly changing the CNI to Cilium in the config. Still the same/similar results
^* and dropped one agent, so now just 1 server and 1 agent
Copy code
[vagrant@rke2-server1 ~]$ kubectl get pods -A -o wide
NAMESPACE     NAME                                                    READY   STATUS              RESTARTS        AGE   IP               NODE           NOMINATED NODE   READINESS GATES
kube-system   cilium-69b5p                                            1/1     Running             0               25m   10.0.2.15        rke2-server1   <none>           <none>
kube-system   cilium-mrxhv                                            0/1     CrashLoopBackOff    8 (80s ago)     24m   10.0.2.15        rke2-agent1    <none>           <none>
kube-system   cilium-node-init-rg4t5                                  1/1     Running             0               24m   10.0.2.15        rke2-agent1    <none>           <none>
kube-system   cilium-node-init-tgf8l                                  1/1     Running             0               25m   10.0.2.15        rke2-server1   <none>           <none>
kube-system   cilium-operator-6d77cf7469-pbxch                        1/1     Running             0               25m   10.0.2.15        rke2-server1   <none>           <none>
kube-system   cilium-operator-6d77cf7469-s9cp6                        0/1     CrashLoopBackOff    7 (4m29s ago)   25m   10.0.2.15        rke2-agent1    <none>           <none>
kube-system   cloud-controller-manager-rke2-server1                   1/1     Running             0               25m   192.168.33.101   rke2-server1   <none>           <none>
kube-system   etcd-rke2-server1                                       1/1     Running             0               25m   192.168.33.101   rke2-server1   <none>           <none>
kube-system   helm-install-rke2-cilium-cfcps                          0/1     Completed           0               25m   10.0.2.15        rke2-server1   <none>           <none>
kube-system   helm-install-rke2-coredns-w2wb4                         0/1     Completed           0               25m   10.0.2.15        rke2-server1   <none>           <none>
kube-system   helm-install-rke2-ingress-nginx-b9b5f                   0/1     Completed           0               25m   10.42.0.196      rke2-server1   <none>           <none>
kube-system   helm-install-rke2-metrics-server-zndxv                  0/1     Completed           0               25m   10.42.0.127      rke2-server1   <none>           <none>
kube-system   kube-apiserver-rke2-server1                             1/1     Running             0               25m   192.168.33.101   rke2-server1   <none>           <none>
kube-system   kube-controller-manager-rke2-server1                    1/1     Running             0               25m   192.168.33.101   rke2-server1   <none>           <none>
kube-system   kube-proxy-rke2-server1                                 1/1     Running             0               25m   192.168.33.101   rke2-server1   <none>           <none>
kube-system   kube-scheduler-rke2-server1                             1/1     Running             0               25m   192.168.33.101   rke2-server1   <none>           <none>
kube-system   rke2-coredns-rke2-coredns-76cb76d66-l2vs9               1/1     Running             0               25m   10.42.0.93       rke2-server1   <none>           <none>
kube-system   rke2-coredns-rke2-coredns-76cb76d66-xvt89               0/1     ContainerCreating   0               23m   <none>           rke2-agent1    <none>           <none>
kube-system   rke2-coredns-rke2-coredns-autoscaler-58867f8fc5-jpgb6   1/1     Running             0               25m   10.42.0.27       rke2-server1   <none>           <none>
kube-system   rke2-ingress-nginx-controller-clh78                     1/1     Running             0               22m   10.42.0.249      rke2-server1   <none>           <none>
kube-system   rke2-ingress-nginx-controller-p2lkd                     0/1     ContainerCreating   0               20m   <none>           rke2-agent1    <none>           <none>
kube-system   rke2-metrics-server-6979d95f95-88pw9                    1/1     Running             0               23m   10.42.0.47       rke2-server1   <none>           <none>
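For reference, switching the bundled CNI is done through the cni key in the server's /etc/rancher/rke2/config.yaml; a minimal sketch of that change (the restart step is an assumption about how the change was applied here):
Copy code
# On the server node: select the bundled Cilium chart instead of Canal, then restart
echo 'cni: cilium' | sudo tee -a /etc/rancher/rke2/config.yaml
sudo systemctl restart rke2-server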

plain-byte-79620

09/13/2022, 3:54 PM
kubectl -n kube-system logs cilium-mrxhv

proud-salesmen-12221

09/13/2022, 3:56 PM
Copy code
[vagrant@rke2-server1 ~]$ kubectl -n kube-system logs cilium-mrxhv
Defaulted container "cilium-agent" out of: cilium-agent, mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), wait-for-node-init (init), clean-cilium-state (init)
Error from server (NotFound): the server could not find the requested resource ( pods/log cilium-mrxhv)
[vagrant@rke2-server1 ~]$
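One plausible reading of that NotFound error, given that both nodes registered the same 10.0.2.15 address: kubectl logs is proxied by the API server to the kubelet at the node's registered internal IP, and from the server's point of view 10.0.2.15 is the server itself, whose kubelet has no cilium-mrxhv pod. A quick check of what each node registered with:
Copy code
# INTERNAL-IP should be unique per node; here both nodes show the shared VirtualBox NAT address
kubectl get nodes -o wide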

plain-byte-79620

09/13/2022, 3:59 PM
kubectl -n kube-system describe pod cilium-mrxhv

proud-salesmen-12221

09/13/2022, 4:05 PM
Copy code
[vagrant@rke2-server1 ~]$ kubectl -n kube-system describe pod cilium-mrxhv
Name:                 cilium-mrxhv
Namespace:            kube-system
Priority:             2000001000
Priority Class Name:  system-node-critical
Node:                 rke2-agent1/10.0.2.15
Start Time:           Tue, 13 Sep 2022 15:27:27 +0000
Labels:               controller-revision-hash=78cf4df44d
                      k8s-app=cilium
                      pod-template-generation=1
Annotations:          container.apparmor.security.beta.kubernetes.io/apply-sysctl-overwrites: unconfined
                      container.apparmor.security.beta.kubernetes.io/cilium-agent: unconfined
                      container.apparmor.security.beta.kubernetes.io/clean-cilium-state: unconfined
                      container.apparmor.security.beta.kubernetes.io/mount-cgroup: unconfined
                      kubernetes.io/psp: global-unrestricted-psp
                      prometheus.io/port: 9962
                      prometheus.io/scrape: true
Status:               Running
IP:                   10.0.2.15
IPs:
  IP:           10.0.2.15
Controlled By:  DaemonSet/cilium
Init Containers:
  mount-cgroup:
    Container ID:  containerd://312b54397879591f51f5c3ee2d895a9e30290c055bc1c5e8ded1a67df68dbba6
    Image:         rancher/mirrored-cilium-cilium:v1.12.0
    Image ID:      docker.io/rancher/mirrored-cilium-cilium@sha256:079baa4fa1b9fe638f96084f4e0297c84dd4fb215d29d2321dcbe54273f63ade
    Port:          <none>
    Host Port:     <none>
    Command:
      sh
      -ec
      cp /usr/bin/cilium-mount /hostbin/cilium-mount;
      nsenter --cgroup=/hostproc/1/ns/cgroup --mount=/hostproc/1/ns/mnt "${BIN_PATH}/cilium-mount" $CGROUP_ROOT;
      rm /hostbin/cilium-mount
      
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Tue, 13 Sep 2022 15:31:36 +0000
      Finished:     Tue, 13 Sep 2022 15:31:36 +0000
    Ready:          True
    Restart Count:  0
    Environment:
      CGROUP_ROOT:  /run/cilium/cgroupv2
      BIN_PATH:     /opt/cni/bin
    Mounts:
      /hostbin from cni-path (rw)
      /hostproc from hostproc (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-tcxz8 (ro)
  apply-sysctl-overwrites:
    Container ID:  containerd://d5bb1b9af9de598c9497091365775420920539f2dde0682b2b8d5be9b9f4ecbd
    Image:         rancher/mirrored-cilium-cilium:v1.12.0
    Image ID:      docker.io/rancher/mirrored-cilium-cilium@sha256:079baa4fa1b9fe638f96084f4e0297c84dd4fb215d29d2321dcbe54273f63ade
    Port:          <none>
    Host Port:     <none>
    Command:
      sh
      -ec
      cp /usr/bin/cilium-sysctlfix /hostbin/cilium-sysctlfix;
      nsenter --mount=/hostproc/1/ns/mnt "${BIN_PATH}/cilium-sysctlfix";
      rm /hostbin/cilium-sysctlfix
      
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Tue, 13 Sep 2022 15:31:38 +0000
      Finished:     Tue, 13 Sep 2022 15:31:38 +0000
    Ready:          True
    Restart Count:  0
    Environment:
      BIN_PATH:  /opt/cni/bin
    Mounts:
      /hostbin from cni-path (rw)
      /hostproc from hostproc (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-tcxz8 (ro)
  mount-bpf-fs:
    Container ID:  containerd://e7d4bbdce1c57626819d1927e4779c4d1548a9d380df1b20a88f618e298abf33
    Image:         rancher/mirrored-cilium-cilium:v1.12.0
    Image ID:      docker.io/rancher/mirrored-cilium-cilium@sha256:079baa4fa1b9fe638f96084f4e0297c84dd4fb215d29d2321dcbe54273f63ade
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/bash
      -c
      --
    Args:
      mount | grep "/sys/fs/bpf type bpf" || mount -t bpf bpf /sys/fs/bpf
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Tue, 13 Sep 2022 15:31:39 +0000
      Finished:     Tue, 13 Sep 2022 15:31:39 +0000
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /sys/fs/bpf from bpf-maps (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-tcxz8 (ro)
  wait-for-node-init:
    Container ID:  containerd://96ca17d9bd8ebab56ac6d579530b8cd9210da4e88e8157cd26b46049d23f96f6
    Image:         rancher/mirrored-cilium-cilium:v1.12.0
    Image ID:      docker.io/rancher/mirrored-cilium-cilium@sha256:079baa4fa1b9fe638f96084f4e0297c84dd4fb215d29d2321dcbe54273f63ade
    Port:          <none>
    Host Port:     <none>
    Command:
      sh
      -c
      until test -s "/tmp/cilium-bootstrap.d/cilium-bootstrap-time"; do
        echo "Waiting on node-init to run...";
        sleep 1;
      done
      
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Tue, 13 Sep 2022 15:31:40 +0000
      Finished:     Tue, 13 Sep 2022 15:31:40 +0000
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /tmp/cilium-bootstrap.d from cilium-bootstrap-file-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-tcxz8 (ro)
  clean-cilium-state:
    Container ID:  containerd://23517402f6814bdf97ef245ab095f89a31d0b203cbe31ffd8c24a521d8680c5a
    Image:         rancher/mirrored-cilium-cilium:v1.12.0
    Image ID:      docker.io/rancher/mirrored-cilium-cilium@sha256:079baa4fa1b9fe638f96084f4e0297c84dd4fb215d29d2321dcbe54273f63ade
    Port:          <none>
    Host Port:     <none>
    Command:
      /init-container.sh
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Tue, 13 Sep 2022 15:31:41 +0000
      Finished:     Tue, 13 Sep 2022 15:31:41 +0000
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:     100m
      memory:  100Mi
    Environment:
      CILIUM_ALL_STATE:  <set to the key 'clean-cilium-state' of config map 'cilium-config'>      Optional: true
      CILIUM_BPF_STATE:  <set to the key 'clean-cilium-bpf-state' of config map 'cilium-config'>  Optional: true
    Mounts:
      /run/cilium/cgroupv2 from cilium-cgroup (rw)
      /sys/fs/bpf from bpf-maps (rw)
      /var/run/cilium from cilium-run (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-tcxz8 (ro)
Containers:
  cilium-agent:
    Container ID:  containerd://c03f1fcc1c6375c3b3f072de54a9e550f6d25f439444221e5f81a45468f89a4f
    Image:         rancher/mirrored-cilium-cilium:v1.12.0
    Image ID:      docker.io/rancher/mirrored-cilium-cilium@sha256:079baa4fa1b9fe638f96084f4e0297c84dd4fb215d29d2321dcbe54273f63ade
    Ports:         4244/TCP, 9962/TCP, 9964/TCP
    Host Ports:    4244/TCP, 9962/TCP, 9964/TCP
    Command:
      cilium-agent
    Args:
      --config-dir=/tmp/cilium/config-map
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 13 Sep 2022 16:02:01 +0000
      Finished:     Tue, 13 Sep 2022 16:03:09 +0000
    Ready:          False
    Restart Count:  10
    Liveness:       http-get http://127.0.0.1:9879/healthz delay=0s timeout=5s period=30s #success=1 #failure=10
    Readiness:      http-get http://127.0.0.1:9879/healthz delay=0s timeout=5s period=30s #success=1 #failure=3
    Startup:        http-get http://127.0.0.1:9879/healthz delay=0s timeout=1s period=2s #success=1 #failure=105
    Environment:
      K8S_NODE_NAME:               (v1:spec.nodeName)
      CILIUM_K8S_NAMESPACE:       kube-system (v1:metadata.namespace)
      CILIUM_CLUSTERMESH_CONFIG:  /var/lib/cilium/clustermesh/
      CILIUM_CNI_CHAINING_MODE:   <set to the key 'cni-chaining-mode' of config map 'cilium-config'>  Optional: true
      CILIUM_CUSTOM_CNI_CONF:     <set to the key 'custom-cni-conf' of config map 'cilium-config'>    Optional: true
    Mounts:
      /host/etc/cni/net.d from etc-cni-netd (rw)
      /host/opt/cni/bin from cni-path (rw)
      /host/proc/sys/kernel from host-proc-sys-kernel (rw)
      /host/proc/sys/net from host-proc-sys-net (rw)
      /lib/modules from lib-modules (ro)
      /run/xtables.lock from xtables-lock (rw)
      /sys/fs/bpf from bpf-maps (rw)
      /tmp/cilium/config-map from cilium-config-path (ro)
      /var/lib/cilium/clustermesh from clustermesh-secrets (ro)
      /var/run/cilium from cilium-run (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-tcxz8 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  cilium-run:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/cilium
    HostPathType:  DirectoryOrCreate
  bpf-maps:
    Type:          HostPath (bare host directory volume)
    Path:          /sys/fs/bpf
    HostPathType:  DirectoryOrCreate
  hostproc:
    Type:          HostPath (bare host directory volume)
    Path:          /proc
    HostPathType:  Directory
  cilium-cgroup:
    Type:          HostPath (bare host directory volume)
    Path:          /run/cilium/cgroupv2
    HostPathType:  DirectoryOrCreate
  cni-path:
    Type:          HostPath (bare host directory volume)
    Path:          /opt/cni/bin
    HostPathType:  DirectoryOrCreate
  etc-cni-netd:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/cni/net.d
    HostPathType:  DirectoryOrCreate
  lib-modules:
    Type:          HostPath (bare host directory volume)
    Path:          /lib/modules
    HostPathType:  
  xtables-lock:
    Type:          HostPath (bare host directory volume)
    Path:          /run/xtables.lock
    HostPathType:  FileOrCreate
  cilium-bootstrap-file-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /tmp/cilium-bootstrap.d
    HostPathType:  DirectoryOrCreate
  clustermesh-secrets:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  cilium-clustermesh
    Optional:    true
  cilium-config-path:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      cilium-config
    Optional:  false
  host-proc-sys-net:
    Type:          HostPath (bare host directory volume)
    Path:          /proc/sys/net
    HostPathType:  Directory
  host-proc-sys-kernel:
    Type:          HostPath (bare host directory volume)
    Path:          /proc/sys/kernel
    HostPathType:  Directory
  kube-api-access-tcxz8:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 op=Exists
                             node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/network-unavailable:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  36m                   default-scheduler  Successfully assigned kube-system/cilium-mrxhv to rke2-agent1
  Warning  Failed     35m                   kubelet            Failed to pull image "rancher/mirrored-cilium-cilium:v1.12.0": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/rancher/mirrored-cilium-cilium:v1.12.0": failed to copy: httpReadSeeker: failed open: failed to do request: Get "https://registry-1.docker.io/v2/rancher/mirrored-cilium-cilium/blobs/sha256:534f4b5bc2dac3237b7fcc4f7b32bb5e87ab1caa0e9ebac82a56b1b1ef255a41": dial tcp 127.0.0.1:443: connect: connection refused
  Warning  Failed     34m                   kubelet            Failed to pull image "rancher/mirrored-cilium-cilium:v1.12.0": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/rancher/mirrored-cilium-cilium:v1.12.0": failed to copy: httpReadSeeker: failed open: failed to do request: Get "https://registry-1.docker.io/v2/rancher/mirrored-cilium-cilium/blobs/sha256:75d8c87dfd66cb5f2f6c5958aefda6765cd97b953eeb237de55065ac343c33b9": dial tcp: i/o timeout
  Warning  Failed     33m (x3 over 35m)     kubelet            Error: ErrImagePull
  Warning  Failed     33m                   kubelet            Failed to pull image "rancher/mirrored-cilium-cilium:v1.12.0": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/rancher/mirrored-cilium-cilium:v1.12.0": failed to copy: httpReadSeeker: failed open: failed to do request: Get "https://registry-1.docker.io/v2/rancher/mirrored-cilium-cilium/manifests/sha256:079baa4fa1b9fe638f96084f4e0297c84dd4fb215d29d2321dcbe54273f63ade": dial tcp: lookup registry-1.docker.io on 10.0.2.3:53: read udp 10.0.2.15:45774->10.0.2.3:53: i/o timeout
  Warning  Failed     33m (x5 over 35m)     kubelet            Error: ImagePullBackOff
  Normal   BackOff    33m (x5 over 35m)     kubelet            Back-off pulling image "rancher/mirrored-cilium-cilium:v1.12.0"
  Normal   Pulling    33m (x4 over 36m)     kubelet            Pulling image "rancher/mirrored-cilium-cilium:v1.12.0"
  Normal   Pulled     32m                   kubelet            Successfully pulled image "rancher/mirrored-cilium-cilium:v1.12.0" in 47.723868398s
  Normal   Created    32m                   kubelet            Created container mount-cgroup
  Normal   Started    32m                   kubelet            Started container mount-cgroup
  Normal   Pulled     32m                   kubelet            Container image "rancher/mirrored-cilium-cilium:v1.12.0" already present on machine
  Normal   Created    32m                   kubelet            Created container apply-sysctl-overwrites
  Warning  BackOff    6m12s (x90 over 29m)  kubelet            Back-off restarting failed container
  Warning  Unhealthy  80s (x292 over 32m)   kubelet            Startup probe failed: Get "http://127.0.0.1:9879/healthz": dial tcp 127.0.0.1:9879: connect: connection refused
[vagrant@rke2-server1 ~]$
I'm confused by the events log. At age 32m, it says 'Successfully pulled image'. Then by age 33m and up, it backs off then has trouble pulling the image again?
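Two notes on reading those events: the Age column counts back from now, so the pull failures 33-36m ago happened before the successful pull 32m ago (the pull eventually recovered); and the failure messages themselves point at DNS/egress problems from the agent VM (lookups of registry-1.docker.io against 10.0.2.3:53 timing out). A quick host-level check on the agent, as a sketch:
Copy code
# Run on rke2-agent1: confirm the VM itself can resolve and reach the registry
getent hosts registry-1.docker.io
curl -sI https://registry-1.docker.io/v2/ | head -n 1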

plain-byte-79620

09/13/2022, 4:08 PM
Do you have any firewall installed that blocks port 9879?
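Port 9879 is the cilium-agent health endpoint bound on localhost of the node, so a firewall is unlikely to block the probe itself; the probe failing usually means the agent never got healthy. A quick check on the agent node, assuming the agent got far enough to bind the port:
Copy code
# Is anything listening on the health port, and does it answer?
sudo ss -lntp | grep 9879
curl -sf http://127.0.0.1:9879/healthz && echo healthy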

proud-salesmen-12221

09/13/2022, 4:14 PM
The firewalls on the server and agent VMs were off. I did have a firewall enabled on my host machine; I just disabled it and restarted the agent to see if that was it.
Turning off the host firewall seems to have fixed the image pull error, but the cilium container still isn't starting up.
Copy code
[vagrant@rke2-server1 ~]$ kubectl get pods -A -o wide
NAMESPACE     NAME                                                    READY   STATUS              RESTARTS      AGE     IP               NODE           NOMINATED NODE   READINESS GATES
kube-system   cilium-dcwvt                                            0/1     CrashLoopBackOff    4 (78s ago)   8m22s   10.0.2.15        rke2-agent1    <none>           <none>
kube-system   cilium-mpv6m                                            1/1     Running             0             9m38s   10.0.2.15        rke2-server1   <none>           <none>
kube-system   cilium-node-init-24n8v                                  1/1     Running             0             9m38s   10.0.2.15        rke2-server1   <none>           <none>
kube-system   cilium-node-init-bh9zs                                  1/1     Running             0             8m22s   10.0.2.15        rke2-agent1    <none>           <none>
kube-system   cilium-operator-6d77cf7469-rjvdb                        1/1     Running             0             9m38s   10.0.2.15        rke2-server1   <none>           <none>
kube-system   cilium-operator-6d77cf7469-wlnmb                        0/1     CrashLoopBackOff    4 (60s ago)   9m38s   10.0.2.15        rke2-agent1    <none>           <none>
kube-system   cloud-controller-manager-rke2-server1                   1/1     Running             0             9m1s    192.168.33.101   rke2-server1   <none>           <none>
kube-system   etcd-rke2-server1                                       1/1     Running             0             9m31s   192.168.33.101   rke2-server1   <none>           <none>
kube-system   helm-install-rke2-cilium-4df5f                          0/1     Completed           0             9m44s   10.0.2.15        rke2-server1   <none>           <none>
kube-system   helm-install-rke2-coredns-6f8ld                         0/1     Completed           0             9m44s   10.0.2.15        rke2-server1   <none>           <none>
kube-system   helm-install-rke2-ingress-nginx-hw99x                   0/1     Completed           0             9m44s   10.42.0.219      rke2-server1   <none>           <none>
kube-system   helm-install-rke2-metrics-server-nhml9                  0/1     Completed           0             9m44s   10.42.0.177      rke2-server1   <none>           <none>
kube-system   kube-apiserver-rke2-server1                             1/1     Running             0             9m12s   192.168.33.101   rke2-server1   <none>           <none>
kube-system   kube-controller-manager-rke2-server1                    1/1     Running             0             9m7s    192.168.33.101   rke2-server1   <none>           <none>
kube-system   kube-proxy-rke2-server1                                 1/1     Running             0             8m59s   192.168.33.101   rke2-server1   <none>           <none>
kube-system   kube-scheduler-rke2-server1                             1/1     Running             0             9m7s    192.168.33.101   rke2-server1   <none>           <none>
kube-system   rke2-coredns-rke2-coredns-76cb76d66-nnvgw               0/1     ContainerCreating   0             8m17s   <none>           rke2-agent1    <none>           <none>
kube-system   rke2-coredns-rke2-coredns-76cb76d66-qvmpz               1/1     Running             0             9m38s   10.42.0.206      rke2-server1   <none>           <none>
kube-system   rke2-coredns-rke2-coredns-autoscaler-58867f8fc5-7sd2r   1/1     Running             0             9m38s   10.42.0.50       rke2-server1   <none>           <none>
kube-system   rke2-ingress-nginx-controller-92796                     1/1     Running             0             8m20s   10.42.0.1        rke2-server1   <none>           <none>
kube-system   rke2-ingress-nginx-controller-9hxrg                     0/1     ContainerCreating   0             7m11s   <none>           rke2-agent1    <none>           <none>
kube-system   rke2-metrics-server-6979d95f95-hk56w                    1/1     Running             0             8m26s   10.42.0.149      rke2-server1   <none>           <none>
Events for the new Pod:
Copy code
Events:
  Type     Reason     Age                      From               Message
  ----     ------     ----                     ----               -------
  Normal   Scheduled  8m35s                    default-scheduler  Successfully assigned kube-system/cilium-dcwvt to rke2-agent1
  Normal   Pulling    8m33s                    kubelet            Pulling image "rancher/mirrored-cilium-cilium:v1.12.0"
  Normal   Pulled     7m39s                    kubelet            Successfully pulled image "rancher/mirrored-cilium-cilium:v1.12.0" in 54.219246539s
  Normal   Created    7m39s                    kubelet            Created container mount-cgroup
  Normal   Started    7m39s                    kubelet            Started container mount-cgroup
  Normal   Pulled     7m37s                    kubelet            Container image "rancher/mirrored-cilium-cilium:v1.12.0" already present on machine
  Normal   Created    7m37s                    kubelet            Created container apply-sysctl-overwrites
  Normal   Started    7m37s                    kubelet            Started container apply-sysctl-overwrites
  Normal   Pulled     7m36s                    kubelet            Container image "rancher/mirrored-cilium-cilium:v1.12.0" already present on machine
  Normal   Created    7m36s                    kubelet            Created container mount-bpf-fs
  Normal   Started    7m36s                    kubelet            Started container mount-bpf-fs
  Normal   Pulled     7m35s                    kubelet            Container image "rancher/mirrored-cilium-cilium:v1.12.0" already present on machine
  Normal   Created    7m35s                    kubelet            Created container wait-for-node-init
  Normal   Started    7m35s                    kubelet            Started container wait-for-node-init
  Normal   Pulled     7m34s                    kubelet            Container image "rancher/mirrored-cilium-cilium:v1.12.0" already present on machine
  Normal   Created    7m34s                    kubelet            Created container clean-cilium-state
  Normal   Started    7m34s                    kubelet            Started container clean-cilium-state
  Normal   Pulled     7m33s                    kubelet            Container image "rancher/mirrored-cilium-cilium:v1.12.0" already present on machine
  Normal   Created    7m33s                    kubelet            Created container cilium-agent
  Normal   Started    7m33s                    kubelet            Started container cilium-agent
  Warning  Unhealthy  3m31s (x100 over 7m31s)  kubelet            Startup probe failed: Get "http://127.0.0.1:9879/healthz": dial tcp 127.0.0.1:9879: connect: connection refused
The rke2-agent service status shows:
Copy code
[vagrant@rke2-agent1 ~]$ sudo systemctl status rke2-agent
● rke2-agent.service - Rancher Kubernetes Engine v2 (agent)
   Loaded: loaded (/usr/local/lib/systemd/system/rke2-agent.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2022-09-13 16:23:59 UTC; 10min ago
     Docs: https://github.com/rancher/rke2#readme
  Process: 8050 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
  Process: 8045 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
  Process: 8042 ExecStartPre=/bin/sh -xc ! /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service (code=exited, status=0/SUCCESS)
 Main PID: 8055 (rke2)
    Tasks: 82
   Memory: 2.1G
   CGroup: /system.slice/rke2-agent.service
           ├─ 8055 /usr/local/bin/rke2 agent
           ├─ 8067 containerd -c /var/lib/rancher/rke2/agent/etc/containerd/config.toml -a /run/k3s/containerd/containerd.sock --state /run/k3s/containerd --root /var/lib/rancher/rke2/agent/containerd
           ├─ 8082 kubelet --volume-plugin-dir=/var/lib/kubelet/volumeplugins --file-check-frequency=5s --sync-frequency=30s --address=0.0.0.0 --alsologtostderr=false --anonymous-auth=false --authentication-token-webhook=true --authorization-mode=Webhook --cgroup-driver=systemd --client-ca-file=/var/lib/r>
           ├─ 8186 /var/lib/rancher/rke2/data/v1.24.4-rke2r1-85bc7d7bec85/bin/containerd-shim-runc-v2 -namespace k8s.io -id 6083c2c7a2ca38ec38549ae0113e1814888e67afbcca1e1f0f9a01bbeeefe2b7 -address /run/k3s/containerd/containerd.sock
           ├─ 8196 /var/lib/rancher/rke2/data/v1.24.4-rke2r1-85bc7d7bec85/bin/containerd-shim-runc-v2 -namespace k8s.io -id a95ec6ce3ed2290395a52213432d21110b565725769ce7b1f4a29994f666155f -address /run/k3s/containerd/containerd.sock
           ├─ 8261 /var/lib/rancher/rke2/data/v1.24.4-rke2r1-85bc7d7bec85/bin/containerd-shim-runc-v2 -namespace k8s.io -id 6ce42cf5411ba2ed3cb8c94bab1b15095e74a32f51ea92a6355f8f4080be3b82 -address /run/k3s/containerd/containerd.sock
           ├─10296 /opt/cni/bin/cilium-cni
           └─10310 /opt/cni/bin/cilium-cni

Sep 13 16:34:19 rke2-agent1 rke2[8055]: time="2022-09-13T16:34:19Z" level=error msg="Failed to connect to proxy. Empty dialer response" error="dial tcp 10.0.2.15:9345: connect: connection refused"
Sep 13 16:34:19 rke2-agent1 rke2[8055]: time="2022-09-13T16:34:19Z" level=error msg="Remotedialer proxy error" error="dial tcp 10.0.2.15:9345: connect: connection refused"
Sep 13 16:34:23 rke2-agent1 rke2[8055]: time="2022-09-13T16:34:23Z" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: CA cert validation failed: Get \"https://127.0.0.1:6444/cacerts\": read tcp 127.0.0.1:44974->127.0.0.1:6444: read: connection reset by peer"
Sep 13 16:34:24 rke2-agent1 rke2[8055]: time="2022-09-13T16:34:24Z" level=info msg="Connecting to proxy" url="wss://10.0.2.15:9345/v1-rke2/connect"
Sep 13 16:34:24 rke2-agent1 rke2[8055]: time="2022-09-13T16:34:24Z" level=error msg="Failed to connect to proxy. Empty dialer response" error="dial tcp 10.0.2.15:9345: connect: connection refused"
Sep 13 16:34:24 rke2-agent1 rke2[8055]: time="2022-09-13T16:34:24Z" level=error msg="Remotedialer proxy error" error="dial tcp 10.0.2.15:9345: connect: connection refused"
Sep 13 16:34:28 rke2-agent1 rke2[8055]: time="2022-09-13T16:34:28Z" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: CA cert validation failed: Get \"https://127.0.0.1:6444/cacerts\": read tcp 127.0.0.1:45004->127.0.0.1:6444: read: connection reset by peer"
Sep 13 16:34:29 rke2-agent1 rke2[8055]: time="2022-09-13T16:34:29Z" level=info msg="Connecting to proxy" url="wss://10.0.2.15:9345/v1-rke2/connect"
Sep 13 16:34:29 rke2-agent1 rke2[8055]: time="2022-09-13T16:34:29Z" level=error msg="Failed to connect to proxy. Empty dialer response" error="dial tcp 10.0.2.15:9345: connect: connection refused"
Sep 13 16:34:29 rke2-agent1 rke2[8055]: time="2022-09-13T16:34:29Z" level=error msg="Remotedialer proxy error" error="dial tcp 10.0.2.15:9345: connect: connection refused"
[vagrant@rke2-agent1 ~]$ date
Tue Sep 13 16:34:37 UTC 2022
[vagrant@rke2-agent1 ~]$
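Worth noting about those log lines: 10.0.2.15 is the default VirtualBox NAT address and every VM gets the same one, so "dial tcp 10.0.2.15:9345" from the agent is effectively the agent dialing itself rather than the server. The server URL and node IPs need to point at the private-network interface instead. A quick way to see which interface carries which address on each VM:
Copy code
# Run on each VM: eth0 is the shared NAT address, eth1 the Vagrant private network
ip -4 -brief addr show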

plain-byte-79620

09/14/2022, 8:22 AM
How do you configure the server? I see two IPs, 10.0.2.15 and 192.168.33.101.

proud-salesmen-12221

09/14/2022, 10:02 PM
The 192.168s were the private network configured in the Vagrantfile. I've since updated the Vagrantfile to assign 10.11.0.* IPs, which is what you'll see below. The 10.0.2.* IP was automatically assigned by VirtualBox for the NAT interface.
Copy code
[vagrant@rke2-server1 ~]$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 08:00:27:fc:e9:96 brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic noprefixroute eth0
       valid_lft 86280sec preferred_lft 86280sec
    inet6 fe80::a00:27ff:fefc:e996/64 scope link 
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 08:00:27:56:23:b3 brd ff:ff:ff:ff:ff:ff
    inet 10.11.0.101/24 brd 10.11.0.255 scope global noprefixroute eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fe56:23b3/64 scope link 
       valid_lft forever preferred_lft forever
Copy code
### Vagrantfile

# rke2 SERVER NODES
SERVER_N = 3
(1..SERVER_N).each do |i|
  config.vm.define "server#{i}", primary: true do |server|
    server.vm.hostname = "rke2-server#{i}#{DOMAIN}"
    server.vm.box = VBOX
    server.vm.network "private_network", ip: "10.11.0.#{100+i}"
    server.vm.provider "virtualbox" do |vb|
      vb.memory = "8192"
      vb.cpus = "4"
    end

plain-byte-79620

09/15/2022, 8:01 AM
OK, and how do you specify the RKE2 configuration? Are you using the node-ip config? 192.168.33.101 still shows up on the pods; when you removed it, did you uninstall RKE2?

proud-salesmen-12221

09/15/2022, 6:36 PM
192.168.33.* is no longer used on any of the nodes or config files. After I changed the node IPs in the Vagrantfile, I just destroyed the cluster and brought it back up with the new IPs.
When you ask about the RKE2 configuration, do you mean the /etc/rancher/rke2/config.yaml file? I'm not familiar with the node-ip config. Can you point me to where to look that up?
Thanks for pointing me at the IPs. I figured it out and fixed it: the nodes were registering with Vagrant's NAT IPs. I just had to edit the config to set the node IPs correctly.
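For anyone hitting the same thing: the fix amounts to pinning node-ip (and the agent's server URL) to the Vagrant private-network interface instead of the shared VirtualBox NAT address. A minimal sketch, where the agent's 10.11.0.111 address and the token placeholder are assumptions, not values from this thread:
Copy code
# /etc/rancher/rke2/config.yaml on the server (rke2-server1, eth1 = 10.11.0.101)
cat <<'EOF' | sudo tee /etc/rancher/rke2/config.yaml
node-ip: 10.11.0.101
EOF

# /etc/rancher/rke2/config.yaml on the agent (assumed eth1 = 10.11.0.111)
cat <<'EOF' | sudo tee /etc/rancher/rke2/config.yaml
server: https://10.11.0.101:9345
token: <token from /var/lib/rancher/rke2/server/node-token on the server>
node-ip: 10.11.0.111
EOF

# Restart so the new addresses are registered
sudo systemctl restart rke2-server   # on the server
sudo systemctl restart rke2-agent    # on the agent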
👍 1