# rke2
b
No, that’s because what you’re looking at are different things. Capacity is a node's overall capacity in memory and CPU. Allocatable is what you can generally allocate to pod resources after system overhead is taken into account. In other words, if some pods have reserved memory and CPU, that is not reflected in these fields. The reason they are exactly the same in your situation is that you made all of the node's resources allocatable. This is generally not recommended, as the underlying system should have some overhead reserved; it could potentially lead to a system crash, since all of the node's resources could be used up. To fix this, you can look into the system-reserved and kube-reserved flags in your config.
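For example, a minimal sketch in /etc/rancher/rke2/config.yaml (the values are placeholders, size them for your nodes):

# /etc/rancher/rke2/config.yaml
kubelet-arg:
- "kube-reserved=cpu=500m,memory=1Gi"    # held back for kubelet/runtime overhead (placeholder values)
- "system-reserved=cpu=500m,memory=1Gi"  # held back for OS daemons (placeholder values)

The kubelet subtracts these from the node's capacity when it reports allocatable.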
g
@bland-appointment-12982 you are great, thank you for that. How about this configuration?
# /etc/rancher/rke2/config.yaml
kubelet-arg:
- "cpu-manager-policy=static"
- "cpu-manager-reconcile-period=5s"
- "topology-manager-policy=best-effort"
- "topology-manager-scope=container"
- "kube-reserved=cpu=500m,memory=1Gi,ephemeral-storage=1Gi"
- "system-reserved=cpu=500m,memory=1Gi,ephemeral-storage=1Gi"
- "reserved-system-cpus=0,1"
- "eviction-hard=memory.available<500Mi,nodefs.available<1Gi,imagefs.available<15Gi"
- "eviction-soft=memory.available<1Gi,nodefs.available<5Gi,imagefs.available<20Gi"
- "eviction-soft-grace-period=memory.available=1m,nodefs.available=1m,imagefs.available=1m"
b
Looks about right. I don't know your specs or anything, but I would recommend reserving about 10% of CPU, memory and storage for your system, and leaving the rest up for grabs.
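If you want to sanity-check what actually got reserved, compare capacity and allocatable on the node; allocatable should roughly be capacity minus kube-reserved, system-reserved and the hard eviction threshold (the numbers below are made up):

kubectl describe node <node-name> | grep -A 6 -E '^Capacity:|^Allocatable:'
# Capacity:     cpu: 8   memory: 16Gi
# Allocatable:  cpu: 7   memory: ~14Gi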
b
Does altering eviction-hard work? We are looking into tweaking these settings at the moment and saw it gets dropped in the pebinary executor: https://github.com/rancher/rke2/blob/61cebdaf4b649351142221f85302f9292e1aa275/pkg/pebinaryexecutor/pebinary.go#L162
b
Eviction-hard essentially exists to keep the node healthy and not overcommitted in terms of resources.
b
Sorry, I may have hijacked the thread. We are looking at eviction-hard because a node just went down due to OOM, despite all pods having memory limits set and the cluster's total memory limit commitment being <100%.
b
I'm not sure what caused it, as I can't see it. Do you guys have some sort of monitoring on it? Off the top of my head, it sounds like pod eviction policies might be misconfigured. Or have you looked into node affinity and anti-affinity rules? Those can also cause resource imbalances.
Could also be that Deployments, DaemonSets or StatefulSets are misconfigured, without some sort of backoff mechanism, no PodDisruptionBudget etc.
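As an example of the last one, a PodDisruptionBudget is only a few lines; the name and selector below are made up, match them to your workload:

# pdb.yaml (illustration only)
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: my-app

Keep in mind a PDB only protects against voluntary evictions (drains, the eviction API), not against the kubelet's node-pressure evictions or the kernel OOM killer.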
g
@brainy-kilobyte-33711 Can you confirm your configuration? curl -X GET http://127.0.0.1:8001/api/v1/nodes/<node-name>/proxy/configz | jq .
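For that to work you need kubectl proxy running on 8001, and you can trim the output to the relevant bits; something like this (fields as returned under .kubeletconfig by the configz endpoint):

kubectl proxy --port=8001 &
curl -sS http://127.0.0.1:8001/api/v1/nodes/<node-name>/proxy/configz \
  | jq '.kubeletconfig | {evictionHard, evictionSoft, systemReserved, kubeReserved}'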