# random
m
The latest logs before the pod restart:
2022-12-21 15:20:10  2022-12-21 08:20:10.389 [INFO][56] monitor-addresses/startup.go 759: Using autodetected IPv4 address on interface eth0: 10.10.0.44/24
2022-12-21 15:20:53  bird: KIF: Received address message for unknown interface 101074
2022-12-21 15:21:10  2022-12-21 08:21:10.395 [INFO][56] monitor-addresses/startup.go 759: Using autodetected IPv4 address on interface eth0: 10.10.0.44/24
2022-12-21 15:22:07  bird: KIF: Received address message for unknown interface 101077
2022-12-21 15:22:10  2022-12-21 08:22:10.409 [INFO][56] monitor-addresses/startup.go 759: Using autodetected IPv4 address on interface eth0: 10.10.0.44/24
2022-12-21 15:22:39  bird: KIF: Received address message for unknown interface 101078
2022-12-21 15:22:43  bird: KIF: Received address message for unknown interface 101083
2022-12-21 15:23:10  2022-12-21 08:23:10.414 [INFO][56] monitor-addresses/startup.go 759: Using autodetected IPv4 address on interface eth0: 10.10.0.44/24
2022-12-21 15:23:23  bird: KIF: Received address message for unknown interface 101097
2022-12-21 15:24:10  2022-12-21 08:24:10.429 [INFO][56] monitor-addresses/startup.go 759: Using autodetected IPv4 address on interface eth0: 10.10.0.44/24
2022-12-21 15:24:41  bird: KIF: Received address message for unknown interface 101068
2022-12-21 15:25:10  2022-12-21 08:25:10.435 [INFO][56] monitor-addresses/startup.go 759: Using autodetected IPv4 address on interface eth0: 10.10.0.44/24
2022-12-21 15:25:45  bird: KIF: Received address message for unknown interface 101087
2022-12-21 15:26:10  2022-12-21 08:26:10.449 [INFO][56] monitor-addresses/startup.go 759: Using autodetected IPv4 address on interface eth0: 10.10.0.44/24
2022-12-21 15:26:19  bird: KIF: Received address message for unknown interface 101102
2022-12-21 15:26:49  bird: KIF: Received address message for unknown interface 101101
2022-12-21 15:27:10  2022-12-21 08:27:10.457 [INFO][56] monitor-addresses/startup.go 759: Using autodetected IPv4 address on interface eth0: 10.10.0.44/24
2022-12-21 15:27:17  bird: KIF: Received address message for unknown interface 101110
2022-12-21 15:28:10  2022-12-21 08:28:10.469 [INFO][56] monitor-addresses/startup.go 759: Using autodetected IPv4 address on interface eth0: 10.10.0.44/24
2022-12-21 15:28:49  bird: KIF: Received address message for unknown interface 101115
2022-12-21 15:28:51  bird: KIF: Received address message for unknown interface 101114
2022-12-21 15:29:10  2022-12-21 08:29:10.475 [INFO][56] monitor-addresses/startup.go 759: Using autodetected IPv4 address on interface eth0: 10.10.0.44/24
2022-12-21 15:30:05  bird: KRT: Received route 172.22.159.127/32 with unknown ifindex 101122
2022-12-21 15:30:10  2022-12-21 08:30:10.494 [INFO][56] monitor-addresses/startup.go 759: Using autodetected IPv4 address on interface eth0: 10.10.0.44/24
2022-12-21 15:30:27  bird: KIF: Received address message for unknown interface 101108
2022-12-21 15:30:31  bird: KIF: Received address message for unknown interface 101107
2022-12-21 15:30:39  bird: KIF: Received address message for unknown interface 101117
2022-12-21 15:31:00  2022-12-21 08:31:00.322 [INFO][55] tunnel-ip-allocator/watchercache.go 96: Watch channel closed by remote - recreate watcher ListRoot="/calico/resources/v3/projectcalico.org/ippools"
2022-12-21 15:31:10  2022-12-21 08:31:10.501 [INFO][56] monitor-addresses/startup.go 759: Using autodetected IPv4 address on interface eth0: 10.10.0.44/24
2022-12-21 15:32:07  2022-12-21 08:32:07.049 [INFO][58] confd/watchercache.go 96: Watch channel closed by remote - recreate watcher ListRoot="/calico/resources/v3/projectcalico.org/bgppeers"
2022-12-21 15:32:10  2022-12-21 08:32:10.512 [INFO][56] monitor-addresses/startup.go 759: Using autodetected IPv4 address on interface eth0: 10.10.0.44/24
2022-12-21 15:32:29  bird: KIF: Received address message for unknown interface 101131
2022-12-21 15:33:10  2022-12-21 08:33:10.518 [INFO][56] monitor-addresses/startup.go 759: Using autodetected IPv4 address on interface eth0: 10.10.0.44/24
2022-12-21 15:34:08  2022-12-21 08:34:08.003 [INFO][58] confd/watchercache.go 96: Watch channel closed by remote - recreate watcher ListRoot="/calico/resources/v3/projectcalico.org/ippools"
2022-12-21 15:34:10  2022-12-21 08:34:10.541 [INFO][56] monitor-addresses/startup.go 759: Using autodetected IPv4 address on interface eth0: 10.10.0.44/24
2022-12-21 15:34:35  bird: KIF: Received address message for unknown interface 101141
2022-12-21 15:35:10  2022-12-21 08:35:10.546 [INFO][56] monitor-addresses/startup.go 759: Using autodetected IPv4 address on interface eth0: 10.10.0.44/24
2022-12-21 15:36:10  2022-12-21 08:36:10.557 [INFO][56] monitor-addresses/startup.go 759: Using autodetected IPv4 address on interface eth0: 10.10.0.44/24
2022-12-21 15:36:23  bird: KIF: Received address message for unknown interface 101142
2022-12-21 15:36:45  bird: KIF: Received address message for unknown interface 101148
2022-12-21 15:36:49  bird: KIF: Received address message for unknown interface 101155
2022-12-21 15:37:10  2022-12-21 08:37:10.564 [INFO][56] monitor-addresses/startup.go 759: Using autodetected IPv4 address on interface eth0: 10.10.0.44/24
2022-12-21 15:37:12  2022-12-21 08:37:12.524 [INFO][58] confd/watchercache.go 96: Watch channel closed by remote - recreate watcher ListRoot="/calico/resources/v3/projectcalico.org/nodes"
2022-12-21 15:38:10  2022-12-21 08:38:10.571 [INFO][56] monitor-addresses/startup.go 759: Using autodetected IPv4 address on interface eth0: 10.10.0.44/24
2022-12-21 15:39:10  2022-12-21 08:39:10.581 [INFO][56] monitor-addresses/startup.go 759: Using autodetected IPv4 address on interface eth0: 10.10.0.44/24
2022-12-21 15:40:10  2022-12-21 08:40:10.591 [INFO][56] monitor-addresses/startup.go 759: Using autodetected IPv4 address on interface eth0: 10.10.0.44/24
2022-12-21 15:40:43  bird: KIF: Received address message for unknown interface 101159
g
Nothing obvious to me in there. What does "describe" for that pod say?
m
Here is the describe output for my restarted pod:
Init Containers:
  upgrade-ipam:
    Container ID:  containerd://7d8572faf69a3a684fb99046471d632b8717f5c0bbe3ee5b48ba829f68ffc0f9
    Image:         rancher/calico-cni:v3.17.2
    Image ID:      docker.io/rancher/calico-cni@sha256:903ab84bf707dda646cbe76b58e76953fddd6eac11ce33d06841e0781dc5a2bb
    Port:          <none>
    Host Port:     <none>
    Command:
      /opt/cni/bin/calico-ipam
      -upgrade
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 21 Dec 2022 09:35:41 +0700
      Finished:     Wed, 21 Dec 2022 09:35:44 +0700
    Ready:          True
    Restart Count:  0
    Environment Variables from:
      kubernetes-services-endpoint  ConfigMap  Optional: true
    Environment:
      KUBERNETES_NODE_NAME:        (v1:spec.nodeName)
      CALICO_NETWORKING_BACKEND:  <set to the key 'calico_backend' of config map 'calico-config'>  Optional: false
    Mounts:
      /host/opt/cni/bin from cni-bin-dir (rw)
      /var/lib/cni/networks from host-local-net-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-fd22m (ro)
  install-cni:
    Container ID:  containerd://3906b7c0354e2051b8d181b6d27f5d657b11227c10282c88fb31f36200762c34
    Image:         rancher/calico-cni:v3.17.2
    Image ID:      docker.io/rancher/calico-cni@sha256:903ab84bf707dda646cbe76b58e76953fddd6eac11ce33d06841e0781dc5a2bb
    Port:          <none>
    Host Port:     <none>
    Command:
      /opt/cni/bin/install
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 21 Dec 2022 09:36:15 +0700
      Finished:     Wed, 21 Dec 2022 09:36:36 +0700
    Ready:          True
    Restart Count:  0
    Environment Variables from:
      kubernetes-services-endpoint  ConfigMap  Optional: true
    Environment:
      CNI_CONF_NAME:         10-calico.conflist
      CNI_NETWORK_CONFIG:    <set to the key 'cni_network_config' of config map 'calico-config'>  Optional: false
      KUBERNETES_NODE_NAME:   (v1:spec.nodeName)
      CNI_MTU:               <set to the key 'veth_mtu' of config map 'calico-config'>  Optional: false
      SLEEP:                 false
    Mounts:
      /host/etc/cni/net.d from cni-net-dir (rw)
      /host/opt/cni/bin from cni-bin-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-fd22m (ro)
  flexvol-driver:
    Container ID:   containerd://247a099ee96e806b6472f5a6e48dfba04d7683ffdfecdbc5994cf44dc4733544
    Image:          rancher/calico-pod2daemon-flexvol:v3.17.2
    Image ID:       docker.io/rancher/calico-pod2daemon-flexvol@sha256:adbe9ea3e36587828cf0fd2c008029052ad893abf923e2200ac0746b25a77248
    Port:           <none>
    Host Port:      <none>
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 21 Dec 2022 09:37:25 +0700
      Finished:     Wed, 21 Dec 2022 09:37:26 +0700
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /host/driver from flexvol-driver-host (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-fd22m (ro)
Containers:
  calico-node:
    Container ID:   containerd://908b7ca9fb179cba7a5e20475e8ca88e47323e175430d7c1b96eec7a20d0dcc1
    Image:          rancher/calico-node:v3.17.2
    Image ID:       docker.io/rancher/calico-node@sha256:6ba192911e28d052da5d830ff80521dfcb7444c886028795d94133914e187c6a
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Wed, 21 Dec 2022 17:19:59 +0700
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Wed, 21 Dec 2022 17:18:09 +0700
      Finished:     Wed, 21 Dec 2022 17:18:12 +0700
    Ready:          True
    Restart Count:  34
    Requests:
      cpu:      250m
    Liveness:   exec [/bin/calico-node -felix-live -bird-live] delay=10s timeout=1s period=10s #success=1 #failure=6
    Readiness:  exec [/bin/calico-node -felix-ready -bird-ready] delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment Variables from:
      kubernetes-services-endpoint  ConfigMap  Optional: true
    Environment:
      DATASTORE_TYPE:                     kubernetes
      WAIT_FOR_DATASTORE:                 true
      NODENAME:                            (v1:spec.nodeName)
      CALICO_NETWORKING_BACKEND:          <set to the key 'calico_backend' of config map 'calico-config'>  Optional: false
      CLUSTER_TYPE:                       k8s,bgp
      IP:                                 autodetect
      CALICO_IPV4POOL_IPIP:               Always
      CALICO_IPV4POOL_VXLAN:              Never
      FELIX_IPINIPMTU:                    <set to the key 'veth_mtu' of config map 'calico-config'>  Optional: false
      FELIX_VXLANMTU:                     <set to the key 'veth_mtu' of config map 'calico-config'>  Optional: false
      FELIX_WIREGUARDMTU:                 <set to the key 'veth_mtu' of config map 'calico-config'>  Optional: false
      CALICO_IPV4POOL_CIDR:               172.22.0.0/16
      CALICO_DISABLE_FILE_LOGGING:        true
      FELIX_DEFAULTENDPOINTTOHOSTACTION:  ACCEPT
      FELIX_IPV6SUPPORT:                  false
      FELIX_LOGFILEPATH:                  none
      FELIX_LOGSEVERITYSYS:
      FELIX_LOGSEVERITYSCREEN:            Warning
      FELIX_HEALTHENABLED:                true
      FELIX_IPTABLESBACKEND:              auto
    Mounts:
      /lib/modules from lib-modules (ro)
      /run/xtables.lock from xtables-lock (rw)
      /sys/fs/ from sysfs (rw)
      /var/lib/calico from var-lib-calico (rw)
      /var/log/calico/cni from cni-log-dir (ro)
      /var/run/calico from var-run-calico (rw)
      /var/run/nodeagent from policysync (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-fd22m (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  lib-modules:
    Type:          HostPath (bare host directory volume)
    Path:          /lib/modules
    HostPathType:
  var-run-calico:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/calico
    HostPathType:
  var-lib-calico:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/calico
    HostPathType:
  xtables-lock:
    Type:          HostPath (bare host directory volume)
    Path:          /run/xtables.lock
    HostPathType:  FileOrCreate
  sysfs:
    Type:          HostPath (bare host directory volume)
    Path:          /sys/fs/
    HostPathType:  DirectoryOrCreate
  cni-bin-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /opt/cni/bin
    HostPathType:
  cni-net-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/cni/net.d
    HostPathType:
  cni-log-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/log/calico/cni
    HostPathType:
  host-local-net-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/cni/networks
    HostPathType:
  policysync:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/nodeagent
    HostPathType:  DirectoryOrCreate
  flexvol-driver-host:
    Type:          HostPath (bare host directory volume)
    Path:          /usr/libexec/kubernetes/kubelet-plugins/volume/exec/nodeagent~uds
    HostPathType:  DirectoryOrCreate
  calico-node-token-fd22m:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  calico-node-token-fd22m
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  kubernetes.io/os=linux
Tolerations:     :NoSchedule op=Exists
                 :NoExecute op=Exists
                 CriticalAddonsOnly op=Exists
                 node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                 node.kubernetes.io/network-unavailable:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists
                 node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                 node.kubernetes.io/unreachable:NoExecute op=Exists
                 node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason     Age                   From     Message
  ----     ------     ----                  ----     -------
  Warning  Unhealthy  36m (x312 over 8h)    kubelet  (combined from similar events): Readiness probe failed:
  Warning  Unhealthy  6m11s (x382 over 8h)  kubelet  Readiness probe failed:
  Warning  Unhealthy  32s (x540 over 8h)    kubelet  Liveness probe failed:
g
Liveness probe failed:
So it looks like the pod failed a liveness check and was killed by kubelet. Are there any logs about "liveness"?
(sorry, I got liveness and readiness confused)
m
This is all the information from describing the pod. But why can't kubelet run the readiness and liveness probes?
g
But why can't kubelet run the readiness and liveness probes?
Yes, that's what I'm wondering too. Are there any logs from the pod with "liveness" or "readiness" in them?
Are you running calico-node with a CPU limit?
Do kubelet logs show the error message from the failures?
m
Nope, I don't have any configuration related to resources; only a CPU request was set up (0.25), but I think that is the default
πŸ‘ 1
hmm, I haven't checked the kubelet logs yet, I will try that
πŸ‘ 1
@great-jewelry-76121 it's hard to debug with the kubelet logs, can you give some tips?
g
I guess you're looking for "Liveness probe" and probably the name of the pod
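A minimal sketch of that search, assuming a systemd-managed kubelet. The sample log line below is fabricated just to show the pattern; on the node you would feed grep from `journalctl -u kubelet` instead of printf:

```shell
# Search kubelet logs for liveness-probe failures. In practice:
#   journalctl -u kubelet | grep -i 'liveness probe'
# Here a fabricated sample line stands in for the real log stream so the
# pipeline is self-contained.
printf '%s\n' \
  'I1221 08:18:12 prober.go:117] Liveness probe for "calico-node-xxxxx" failed' \
  'I1221 08:18:13 kubelet.go:201] unrelated line' \
| grep -i 'liveness probe'
```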
m
This is the log filtered for "calico" from the last 1000 lines of the kubelet log. Sorry, I cannot send the logs as text because some lines are too long.
g
"Deadline Exceeded" is the error, i.e. kubelet tried to run the liveness command in the calico-node pod, but it failed to return within the deadline. Is your system heavily loaded (any free CPU)? How large is it? How many pods per node? How many nodes?
m
@great-jewelry-76121 thank you, I will check it and reply to you later
πŸ‘ 1
I ran kubectl top node; the number of pods is 137, and I have 12 nodes in total. Could disk pressure have an impact?
g
137 pods on this node? Or in the cluster?
Could disk pressure have an impact?
I don't think disk pressure should affect calico-node
πŸ‘ 1
m
just on this node
g
And how much CPU is in use?
(i.e. can you run "top" on the node itself?)
FWIW 137 pods per node is higher than the default k8s limit (110 pods per node). But I wouldn't expect calico-node to have trouble with 137.
πŸ‘ 1
The current load average is 18.99. Does this box have more than 19 CPUs?
If not, then I'd say that this node is overloaded and calico-node is unable to get the CPU it needs to respond to kubelet fast enough to not get killed.
m
I think I only have 8 CPUs, as shown in the screenshot
but how can it peak at nearly 20?
g
https://www.brendangregg.com/blog/2017-08-08/linux-load-averages.html
Linux load averages are "system load averages" that show the running thread (task) demand on the system as an average number of running plus waiting threads. This measures demand, which can be greater than what the system is currently processing. Most tools show three averages, for 1, 5, and 15 minutes:
Some interpretations:
• If the averages are 0.0, then your system is idle.
• If the 1 minute average is higher than the 5 or 15 minute averages, then load is increasing.
• If the 1 minute average is lower than the 5 or 15 minute averages, then load is decreasing.
• If they are higher than your CPU count, then you might have a performance problem (it depends).
Load average is how many threads are asking for CPU time (averaged over the period). If it's 20 and you have 8 CPUs, then it's saying that 12 threads were waiting (on average)
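To make that arithmetic concrete, a tiny sketch using the figures from this thread (18.99 load average, 8 CPUs); this is only the rough "demand above capacity" interpretation, not an exact scheduler model:

```shell
# Load average counts running + waiting threads, so anything above the
# CPU count is (on average) threads waiting for a CPU.
awk -v load=18.99 -v cpus=8 'BEGIN {
  waiting = load - cpus
  if (waiting < 0) waiting = 0        # an idle box has no one waiting
  printf "%.2f threads waiting on average\n", waiting
}'
```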
m
oh, thanks for the useful knowledge
do you think this load average (about 30, as I observe) is too high in this situation?
g
Yes. If calico-node doesn't get CPU in time to respond to liveness checks, it will get restarted. Options:
• put CPU limits on pods on this node
• put fewer pods on this node, which may mean adding more nodes to the cluster
Be a little careful about putting CPU requests and limits on calico-node. calico-node CPU usage is very bursty: it does a lot of work when it is starting up (loading the state of the node) or when pods are created or deleted, and very little at other times. If you set the limit too low, it may not start up fast enough, and kubelet will then kill it, causing it to start up again. Basically a self-induced crashloop!
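As a sketch of that advice, a resources stanza for the calico-node container might look like this. The numbers are illustrative assumptions, not recommendations (the 250m request matches the default visible in the describe output above):

```yaml
# Illustrative values only: keep the limit generous, because calico-node's
# CPU usage is bursty (heavy at startup and on pod churn, light otherwise).
resources:
  requests:
    cpu: 250m
  limits:
    cpu: "1"
```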
(in case you're wondering, I work for Tigera on projectcalico as a Tester, and one of the things I do is scale test Calico 🙂)
If you're going to set resource requests and limits, have a read of the k8s docs on the subject: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/ As above, it's easy to cause problems with poor request/limit choices
m
oh, nice, I think I've met the right person 🙂 I will try to find a solution following your suggestion. Once again, thank you for all the insight you gave me. I will try to reduce the load average and report the result to you.
g
No worries, glad to be able to help
m
hi @great-jewelry-76121, I cannot reduce the load average for now, so I decided to increase timeoutSeconds in the calico-node daemonset (from 1 second to 10 seconds). The restarting has stopped, but the liveness probe and readiness probe failure events still happen. Does that mean a 10-second timeout wasn't enough for the probe?
g
Looking at https://docs.projectcalico.org/manifests/calico.yaml, I see the liveness/readiness sections say:
livenessProbe:
  exec:
    command:
    - /bin/calico-node
    - -felix-live
    - -bird-live
  periodSeconds: 10
  initialDelaySeconds: 10
  failureThreshold: 6
  timeoutSeconds: 10
readinessProbe:
  exec:
    command:
    - /bin/calico-node
    - -felix-ready
    - -bird-ready
  periodSeconds: 10
  timeoutSeconds: 10
i.e. it shouldn't have been set to 1 second anyway.
Does that mean a 10-second timeout wasn't enough for the probe?
Yes, assuming that the restart was due to Liveness.
Are the restarts actually causing you a problem? calico-node isn't on the datapath anyway (it just sets up networking by programming the kernel).
m
I haven't hit any critical problems, but my API gateway daemonset was also restarted, and its pod is on the same node as the restarted calico-node
As far as I know, calico-node uses iptables to route, so how do Calico's components touch the datapath?
I think rke configures timeoutSeconds: 1 by default 😞
g
As far as I know, calico-node uses iptables to route, so how do Calico's components touch the datapath?
The calico-node pod:
• installs the CNI plugin, which networks pods when they are created, and cleans up when they are deleted
• updates iptables rules on the node to implement network policy
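For example, those policy rules show up as iptables chains with the `cali-` prefix on the node. A sketch, using a canned sample so it is self-contained; on a real node you would feed the grep from `iptables-save` (which needs root), and the actual rules vary per node:

```shell
# Count Calico-managed chains in an iptables dump. On a node, replace the
# sample with:  iptables-save | grep -c '^:cali-'
sample=':cali-INPUT - [0:0]
:cali-FORWARD - [0:0]
:KUBE-SERVICES - [0:0]'
printf '%s\n' "$sample" | grep -c '^:cali-'
```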
πŸ‘ 1
m
@great-jewelry-76121 thank you. I want to learn more about Calico, where should I start? And does the Calico project have a Q&A forum like this?
g
https://slack.projectcalico.org/ for the Calico slack https://academy.tigera.io/course/certified-calico-operator-level-1/ for the free Calico Certified Operator L1 course.
Other free courses here too: https://academy.tigera.io/courses/
m
I have some critical problems with my Calico deployment; I hope this will help me find a solution 😄