# k3s
f
So far, I've really only thought to look at the kernel log message:
kernel: oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=cri-containerd-2ffcdeac7a102eb3f6f49b43ae4afa36589bb9765272cf953fe7ee15a3a3cc67.scope,mems_allowed=0,oom_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podf27c27f9_1a8b_40bf_80a0_339b93de7be2.slice/cri-containerd-2ffcdeac7a102eb3f6f49b43ae4afa36589bb9765272cf953fe7ee15a3a3cc67.scope,task_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podf27c27f9_1a8b_40bf_80a0_339b93de7be2.slice/cri-containerd-2ffcdeac7a102eb3f6f49b43ae4afa36589bb9765272cf953fe7ee15a3a3cc67.scope,task=PROCESS,pid=3624942,uid=0
And then look at status.containerStatuses[].containerID to see if I find the 2ffcd... in there somewhere, but it's a somewhat laborious process
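A rough sketch of automating that lookup, assuming kubectl is on PATH and is allowed to list pods in all namespaces; the truncated ID from the kernel log is passed in as an argument rather than hard-coded:

```python
#!/usr/bin/env python3
"""Sketch: map a containerd ID from an oom-kill log line back to its pod.

Assumes kubectl is on PATH and can list pods cluster-wide.
"""
import json
import subprocess
import sys


def find_pod_by_container_id(container_id: str):
    out = subprocess.run(
        ["kubectl", "get", "pods", "--all-namespaces", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    for pod in json.loads(out)["items"]:
        for cs in pod.get("status", {}).get("containerStatuses", []):
            # status.containerStatuses[].containerID looks like
            # "containerd://2ffcdeac7a10..."
            if container_id in cs.get("containerID", ""):
                return (pod["metadata"]["namespace"],
                        pod["metadata"]["name"],
                        cs["name"])
    return None


if __name__ == "__main__":
    target = sys.argv[1]  # e.g. the 2ffcdeac... ID from the oom-kill line
    match = find_pod_by_container_id(target)
    print(match or "no pod found for that container ID")
```

The oom_memcg path in the log also appears to embed the pod UID (with dashes replaced by underscores in the ...-pod<uid>.slice segment), which could be matched against metadata.uid as an alternative.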
c
don’t you get a kubernetes event for the pod when a container is restarted?
f
The container isn't restarted. The root process is sitting there happy-as-can-be, but one of its subprocesses borked
So, the container/pod never died
c
ah I see. in that case it's the responsibility of pid 1 in the container to do something about it. The container runtime doesn't get the signal indicating that a child process has exited for anything other than pid 1.
are you running a process supervisor or something in that container? It’s considered somewhat of an anti-pattern to run multiple processes in a single container, for this very reason. Makes it hard for the container runtime and kubelet to handle process termination events.
f
Yup, I get the idea, but at the same time, when workloads wind up on our cluster that don't behave that way, I'd like to track down where those errors came from. I'm not necessarily the workload author
In this case, it's a python process that's using multiprocessing.Process()s to get some parallelism
I've actually fixed this specific workload, but I'm looking for a more general solution
IE: What happens when someone rolls out a workload that doesn't abide by these best practices, how can I figure out which of the 500 pods it is?
This time, I lucked out
c
there are a couple moving pieces here
1. the kubelet and container runtime try to handle memory limits themselves so that the kernel OOM killer doesn't get involved. If you have memory limits on the pods, kubernetes tries to kill the container when it hits the limit so that it can handle restarting it properly.
2. if the kernel does get involved, the kubelet watches the kernel message log (dmesg) so that it can see what process got OOM killed, and tries to handle restarting it
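For the cases the kubelet does handle (point 1 above), the outcome is visible in pod status as a container termination with reason OOMKilled. A minimal sketch for listing those, again assuming kubectl is on PATH with cluster-wide list permission; note it will not catch the child-process case being discussed here:

```python
#!/usr/bin/env python3
"""Sketch: list containers whose last recorded termination was OOMKilled.

Assumes kubectl is on PATH and can list pods in all namespaces.
"""
import json
import subprocess

out = subprocess.run(
    ["kubectl", "get", "pods", "--all-namespaces", "-o", "json"],
    check=True, capture_output=True, text=True,
).stdout

for pod in json.loads(out)["items"]:
    for cs in pod.get("status", {}).get("containerStatuses", []):
        last = cs.get("lastState", {}).get("terminated", {})
        if last.get("reason") == "OOMKilled":
            print(pod["metadata"]["namespace"],
                  pod["metadata"]["name"],
                  cs["name"],
                  last.get("finishedAt"))
```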
f
Oh interesting, I figured the kernel OOM killer did the bulk of the work
c
Does the pod in question have memory limits set? Does the node have swap enabled?
f
The pod has memory limits and the nodes do not have swap enabled
c
is the pod as a whole exceeding its memory limit when that process gets OOM killed?
f
You know, I'm not entirely sure, but I believe it's probably the pod total going over, and then one unlucky process gets taken out. The node itself has gobs of memory free
The workload itself is limited to like 2GB of memory, and the node has 128GB of ram and sits at like 20-30% utilization
c
I think that’s kinda how it works. If the OOM killer gets to the pod first, and picks a process that isn’t pid 1, it’s not visible to the container runtime.
The app itself needs to be responsible for handling that, whether it’s restarting the child process, or exiting out so the container can be restarted.
f
Makes sense; that particular pod has been adjusted to do just that -- if a worker disappears, it'll shut down.
c
If your main process is forking, it should either fail when a critical child fails or restart the child itself, or you need to use a liveness probe.
It's expected that child processes might exit for various reasons, and it would be a breaking change to restart pods when a child process exits.
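A minimal sketch of that pattern with Python multiprocessing, where do_work and the worker count are placeholders for the real workload, not the actual app:

```python
#!/usr/bin/env python3
"""Sketch: a pid-1 style parent that exits if any worker child dies,
so the kubelet sees the container fail and can restart it.
do_work stands in for the real workload."""
import multiprocessing as mp
import sys
import time


def do_work(i: int) -> None:
    while True:
        time.sleep(60)  # placeholder for real work


def main() -> None:
    # daemon=True so remaining children are torn down when the parent exits
    workers = [mp.Process(target=do_work, args=(i,), daemon=True)
               for i in range(4)]
    for w in workers:
        w.start()

    # Watchdog loop: if any child is gone (e.g. OOM-killed, exitcode -9),
    # exit non-zero so the container terminates and Kubernetes restarts it.
    while True:
        for w in workers:
            if not w.is_alive():
                print(f"worker {w.pid} exited with {w.exitcode}; shutting down",
                      file=sys.stderr)
                sys.exit(1)
        time.sleep(5)


if __name__ == "__main__":
    main()
```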
f
Makes sense now. I still wish there was a half-decent way to track this down. I'm combing through logs to see if I can track the containerd-id back to pod information somehow
c
yeah, tracking the cgroup ID back to a container/pod isn’t easy without poking at the internals
f
I see the container-id in the pod status information. I'm wondering if that's getting stored in a log somewhere that gets bubbled up to our logging system
So, a question on best practices: we've been led to believe it's best to set memory requests and limits to the same value, set CPU requests, and generally leave CPU limits empty. Is this still a good general practice?
I'm trying to think of scenarios where letting the memory request and limit differ might make sense, and/or where setting a CPU limit might make sense
c
people have feelings about that. I’m not sure there’s a single “right way” to do it, what works best depends on how you want things scheduled and limits enforced.
f
Sure thing, curious if there are any resources on when someone would choose one approach over another
I'm mostly curious about competing opinions/approaches to it. At any rate, I think I have the best answer I can for now and I learned some things about kubelet
a
Maybe metrics can help? node_exporter
node_vmstat_oom_kill
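A sketch of querying that metric through the Prometheus HTTP API, assuming node_exporter is scraped and Prometheus is reachable at the (hypothetical) address below; it narrows things down to a node, not a pod:

```python
#!/usr/bin/env python3
"""Sketch: ask Prometheus which nodes saw OOM kills in the last hour.

PROM_URL is an assumption; point it at wherever Prometheus is reachable.
"""
import requests

PROM_URL = "http://prometheus.monitoring.svc:9090"  # hypothetical address
QUERY = "increase(node_vmstat_oom_kill[1h]) > 0"

resp = requests.get(f"{PROM_URL}/api/v1/query",
                    params={"query": QUERY}, timeout=10)
resp.raise_for_status()
for result in resp.json()["data"]["result"]:
    instance = result["metric"].get("instance", "unknown")
    count = float(result["value"][1])
    print(f"{instance}: ~{count:.0f} OOM kills in the last hour")
```

Pairing a node-level alert like this with the kernel-log/containerID lookup above is one way to get from "some node had an OOM kill" to the specific pod.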
c
@future-fountain-82544 possibly helpful for your use case: https://github.com/kubernetes/kubernetes/pull/117793
f
Ooh ty