sticky-summer-1345011/29/2022, 3:08 PM
state and I want to know whether it's k3s, k8s, or me. Example: I have a cluster with 1 node and several worker nodes, and I have workloads spread across the workers. Let's say a worker node dies, and maybe it's never going to return.
Some of the pods get stuck in the Terminating state and don't get replaced on other worker nodes. This means the cluster is no longer respecting the declarative state. Is this a problem specific to me, a problem specific to k3s, a problem with k8s, or something else?
$ kubectl get pods --context kube001 --all-namespaces -o=wide | grep Terminating
kube-system   traefik-9c6dc6686-jdt9f      1/1   Terminating   0   24d   10.42.1.4    kube002   <none>   <none>
active-mq     active-mq-6665f5d8b9-ztwnq   1/1   Terminating   0   15d   10.42.1.82   kube002   <none>   <none>
In this example the pod has been in that state for more than 2 days.
$ kubectl describe pod --context kube001 --namespace active-mq active-mq-6665f5d8b9-ztwnq
Name:                      active-mq-6665f5d8b9-ztwnq
Namespace:                 active-mq
Priority:                  0
Service Account:           default
Node:                      kube002/10.64.8.117
Start Time:                Sun, 13 Nov 2022 15:07:50 +0000
Labels:                    app=active-mq
                           pod-template-hash=6665f5d8b9
Annotations:               kubernetic.com/restartedAt: 2021-04-30T16:34:24+01:00
Status:                    Terminating (lasts 2d5h)
Termination Grace Period:  30s
...
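(For context, a sketch of how to confirm the node itself is the problem; the context and node names here are the ones from the output above, and these are standard kubectl commands run against that cluster:)

```shell
# Show node readiness; a worker that has dropped off the network
# is reported as NotReady by the node controller
kubectl get nodes --context kube001

# Inspect the taints applied to the unreachable node, e.g.
# node.kubernetes.io/unreachable:NoExecute, which triggers eviction
# of pods that do not tolerate it
kubectl describe node kube002 --context kube001 | grep -A 3 Taints
```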
creamy-pencil-8291311/29/2022, 6:40 PM
It can't show as "terminated" because Kubernetes doesn't know anything about that node.
sticky-summer-1345011/29/2022, 6:44 PM
So why have some pods terminated correctly while others show that they are terminating? It's inconsistent.
creamy-pencil-8291311/29/2022, 7:35 PM
The node controller does not force delete pods until it is confirmed that they have stopped running in the cluster. You can see the pods that might be running on an unreachable node as being in the Terminating state. In cases where Kubernetes cannot deduce from the underlying infrastructure if a node has permanently left a cluster, the cluster administrator may need to delete the node object by hand. Deleting the node object from Kubernetes causes all the Pod objects running on the node to be deleted from the API server and frees up their names.
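(A minimal sketch of the manual cleanup described above, assuming you have confirmed kube002 is permanently gone; names and context are taken from the earlier output, and these are standard kubectl commands:)

```shell
# If the node will never come back, delete the node object;
# the API server then removes the Pod objects that were bound to it
kubectl delete node kube002 --context kube001

# Alternatively, force-delete a single stuck pod without waiting
# for a kubelet confirmation that will never arrive
kubectl delete pod active-mq-6665f5d8b9-ztwnq \
  --namespace active-mq --context kube001 \
  --grace-period=0 --force
```

Note that force-deleting only removes the API object; if the node ever rejoins, any containers still running there are cleaned up by the kubelet when it reconnects.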
sticky-summer-1345011/30/2022, 9:00 AM