# k3s
h
@mammoth-winter-72426 AFK: If you want to ensure that replicas are rescheduled to different nodes when the nodes they were running on fail, you can use the pod anti-affinity feature. Pod anti-affinity lets you specify rules that prevent pods from being scheduled onto nodes that already have pods meeting certain criteria.
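Something like this in the Deployment spec — the `my-app` name/label and the image are just placeholders for illustration:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app            # illustrative name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      affinity:
        podAntiAffinity:
          # Hard rule: never schedule two replicas carrying this label
          # onto the same node (topologyKey = node hostname).
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: my-app
              topologyKey: kubernetes.io/hostname
      containers:
        - name: my-app
          image: nginx:1.25
```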
m
@hundreds-battery-84841 thanks. I’ll look into it.
h
Sure
w
I think the issue is that the nodes were stopped but left in the cluster. They stopped reporting their state (the kubelets didn't update their leases) and became NotReady. In that case the node gets a NoExecute taint applied, and by default that taint is tolerated for 300 seconds. @mammoth-winter-72426 how long did you wait before starting the nodes again? If this was planned downtime, you should consider draining the node first and then shutting it down, so that all the workloads are moved to other nodes before the original node goes down.
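For reference on the 300 seconds: it comes from the default `not-ready` / `unreachable` tolerations that get injected into every pod. A sketch of overriding them in the pod template so eviction kicks in sooner (60s here is just an example value):
```yaml
# Goes under spec.template.spec of a Deployment (or directly in a Pod spec).
# By default these two tolerations are added automatically with tolerationSeconds: 300.
tolerations:
  - key: node.kubernetes.io/not-ready
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 60   # evict 60s after the node goes NotReady
  - key: node.kubernetes.io/unreachable
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 60   # evict 60s after the node becomes unreachable
```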
m
@worried-jackal-89144 I wanted to test failover of multiple nodes in a 5-node k3s HA cluster and see how the application deployments and their pod replicas reacted. I shut down the 2 nodes as I explained earlier and waited for around 25 minutes. The pods on those nodes just remained in Terminating state… I thought they'd reschedule to the 2 other server nodes that had nothing running on them, but that was not the case. After 25 minutes I rebooted both nodes I had turned off, and only then did all the pod replicas stuck in Terminating on those failed nodes reschedule successfully to different nodes.
w
I set up a cluster with 3 worker nodes on k3s v1.26.5+k3s1 and stopped the k3s agent on one of them. After some time I noticed this event:
```
0s          Normal    TaintManagerEviction   pod/test-55cb9ff8d5-hmjgq    Marking for deletion Pod test/test-55cb9ff8d5-hmjgq
```
A new pod for this deployment was then spawned, but the old one is still in "Terminating" state. The kube-controller-manager simply marked the pod for deletion, and now it waits for the kubelet on the worker node to confirm the deletion.
h
Yeah, that behavior is expected when a node becomes unresponsive or is marked for deletion.
w
If you have a cloud-deployed cluster with nodes running in an autoscaling group, you can install the cloud-controller-manager for that cloud. It can detect that the instance is gone and remove the Node object from the cluster, at which point all of its pods are deleted as well.
c
You would need to either drain the pods first, or delete the down nodes from the cluster. Just shutting a node off doesn't tell Kubernetes anything about its pods; for all it knows, they could still be running and there's a network outage, or the kubelet has crashed, and so on.
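Roughly, for the planned-downtime case (the node name `node-2` is a placeholder):
```bash
# Planned downtime: move workloads off the node before shutting it down.
kubectl drain node-2 --ignore-daemonsets --delete-emptydir-data

# ...power the node off, do maintenance...

# If a node is gone for good, remove it so its pods are deleted
# and rescheduled instead of hanging in Terminating.
kubectl delete node node-2

# If the node comes back after maintenance instead, allow scheduling again.
kubectl uncordon node-2
```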