# k3s
h
@mammoth-winter-72426 AFK: If you want to ensure that replicas are rescheduled to different nodes when the nodes they were running on fail, you can use the pod anti-affinity feature. Pod anti-affinity lets you specify rules that prevent pods from being scheduled onto nodes that already have pods meeting certain criteria.
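Something like this in the Deployment spec — the `my-app` name/label and the image are just placeholders for illustration:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app            # illustrative name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      affinity:
        podAntiAffinity:
          # Hard rule: never schedule two replicas carrying this label
          # onto the same node (topologyKey = node hostname).
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: my-app
              topologyKey: kubernetes.io/hostname
      containers:
        - name: my-app
          image: nginx:1.25
```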
m
@hundreds-battery-84841 thanks. I’ll look into it.
h
Sure
w
I think the issue is that the nodes were stopped but left in the cluster. They stopped reporting their state (the kubelets didn't update their leases) and became NotReady. In that case the node gets a NoExecute taint applied, and by default that taint is tolerated for 300 seconds. @mammoth-winter-72426 how long did you wait before starting the nodes again? If this was planned downtime, you should consider draining the node first and then shutting it down, so that all the workloads are moved to other nodes before the original node goes down.
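For reference on the 300 seconds: it comes from the default `not-ready` / `unreachable` tolerations that get injected into every pod. A sketch of overriding them in the pod template so eviction kicks in sooner (60s here is just an example value):
```yaml
# Goes under spec.template.spec of a Deployment (or directly in a Pod spec).
# By default these two tolerations are added automatically with tolerationSeconds: 300.
tolerations:
  - key: node.kubernetes.io/not-ready
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 60   # evict 60s after the node goes NotReady
  - key: node.kubernetes.io/unreachable
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 60   # evict 60s after the node becomes unreachable
```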
m
@worried-jackal-89144 I wanted to test failover of multiple nodes in a 5-node k3s HA cluster and see how the application deployments and their pod replicas reacted. I shut down the 2 nodes as I explained earlier and waited for around 25 minutes. The pods on those nodes just remained in Terminating state… I thought they'd reschedule to the 2 other server nodes that had nothing running on them, but that was not the case. After 25 minutes I rebooted both nodes I had turned off, and only then did all the pod replicas stuck in Terminating on those failed nodes reschedule successfully to different nodes.
w
I set up a cluster with 3 worker nodes on k3s v1.26.5+k3s1 and stopped the k3s agent on one of them. After some time I noticed this event:
```
0s          Normal    TaintManagerEviction   pod/test-55cb9ff8d5-hmjgq    Marking for deletion Pod test/test-55cb9ff8d5-hmjgq
```
A new pod for this deployment was then spawned, but the old one is still in "Terminating" state. The kube-controller-manager simply marked the pod for deletion, and now it waits for the kubelet on the worker node to confirm the deletion.
h
Yeah, that behavior is expected when a node becomes unresponsive or is marked for deletion.
w
If you have a cloud-deployed cluster with nodes running in an autoscaling group, you can install the cloud-controller-manager for that cloud. It can detect that the instance is gone and remove the Node object from the cluster, at which point all of its pods are deleted as well.
c
You would need to either drain the pods first, or delete the down nodes from the cluster. Just shutting a node off doesn't tell Kubernetes anything about its pods; for all it knows, they could still be running and there's a network outage, or the kubelet has crashed, and so on.
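Roughly, for the planned-downtime case (the node name `node-2` is a placeholder):
```bash
# Planned downtime: move workloads off the node before shutting it down.
kubectl drain node-2 --ignore-daemonsets --delete-emptydir-data

# ...power the node off, do maintenance...

# If a node is gone for good, remove it so its pods are deleted
# and rescheduled instead of hanging in Terminating.
kubectl delete node node-2

# If the node comes back after maintenance instead, allow scheduling again.
kubectl uncordon node-2
```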