# rke2
s
If you're looking to set up alerting based on OOM events, you can use a combination of Kubernetes monitoring tools and log aggregation systems. For example, you can use Prometheus to monitor resource usage metrics, including memory usage, and set up alerts based on thresholds. You can also use tools like the ELK Stack (Elasticsearch, Logstash, Kibana) or Fluentd to aggregate and analyze logs, including pod logs, for OOM events and create alerts based on specific log patterns or exit codes. Additionally, Kubernetes itself provides event logging that can be monitored for OOM-related events. You can use `kubectl get events` or tools like `kubectl describe pod` to check for events related to pod terminations.
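For example, something along these lines would surface the recorded events and the last termination state (the pod and namespace names are placeholders):

```sh
# List events recorded for a specific pod (terminations, kills, scheduling, etc.)
kubectl get events -n my-namespace \
  --field-selector involvedObject.name=my-pod \
  --sort-by=.lastTimestamp

# Show the last terminated state of each container in the pod;
# an OOM kill shows up here as reason "OOMKilled" with exit code 137
kubectl get pod my-pod -n my-namespace \
  -o jsonpath='{range .status.containerStatuses[*]}{.name}{": "}{.lastState.terminated}{"\n"}{end}'
```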
h
Thanks for the reply; I have both monitoring and logging (to Loki) set up... the pod in question was restarted, and when I ran `kubectl describe pod` it shows an exit code of 137. However, I do not see in Grafana that the pod consumed anywhere close to the defined memory limit, and when I look at `kubectl logs pod_name`, I do not see any entry for OOM. So I am trying to figure out why this happened and where I can see it, so alerting can be set up.
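One thing worth noting, as a hedged sketch: exit code 137 just means the container received SIGKILL (128 + 9), which can come from the cgroup/kernel OOM killer or from anything else sending SIGKILL, and the kernel's OOM messages land in the node's system log rather than the pod's stdout, which is why `kubectl logs` shows nothing. Since logs already go to Loki, those node logs can be searched there, assuming Promtail scrapes the journal; the address and label selector below are placeholders for whatever the real setup uses:

```sh
# Search node logs shipped to Loki for OOM killer activity around the restart.
# The address and the {job="systemd-journal"} selector are assumptions about
# how Promtail is configured; adjust to match the actual labels.
logcli --addr=http://loki.monitoring:3100 query --since=24h \
  '{job="systemd-journal"} |~ "(?i)(oom-killer|out of memory)"'
```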
s
If the pod consumed all of its memory, or tried to, in a matter of a few seconds, you won't necessarily see it in Prometheus, because Prometheus doesn't scrape the metrics every second.
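A way around that sampling gap, assuming kube-state-metrics is running (metric names can differ between versions), is to alert on the recorded termination reason instead of trying to catch the memory spike itself. A minimal sketch of such a rule:

```sh
# Sketch of a Prometheus alerting rule that fires when a container's last
# termination reason is OOMKilled, regardless of whether the memory spike
# was ever scraped. In practice this is often combined with
# increase(kube_pod_container_status_restarts_total[5m]) > 0 so it only
# fires on fresh restarts.
cat <<'EOF' > oom-alert-rules.yml
groups:
  - name: oom
    rules:
      - alert: ContainerOOMKilled
        expr: kube_pod_container_status_last_terminated_reason{reason="OOMKilled"} == 1
        labels:
          severity: warning
        annotations:
          summary: 'Container {{ $labels.container }} in pod {{ $labels.pod }} was OOMKilled'
EOF

# Validate the rule file before loading it into Prometheus
promtool check rules oom-alert-rules.yml
```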
h
Thank you! I suspected that was the case... I will have to check how frequently Prometheus scrapes the metrics.
Looks like the default is 60 seconds, per this doc
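If it's a kube-prometheus-stack / prometheus-operator setup (an assumption; adjust for however Prometheus is actually deployed), the configured interval is visible on the Prometheus custom resource, and ServiceMonitors can override it per endpoint:

```sh
# Global scrape interval set on the Prometheus custom resource (empty = operator default)
kubectl get prometheus -A \
  -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.spec.scrapeInterval}{"\n"}{end}'

# Per-endpoint overrides defined in ServiceMonitors
kubectl get servicemonitors -A -o yaml | grep -n "interval:"
```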
c
Also worth checking your system logs to confirm it's not the kernel doing the OOM killing (this was the case for us recently when we had transparent huge pages enabled).
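For reference, checking on the node itself might look like this (assumes a systemd-based node with journald and sudo access):

```sh
# Kernel messages about the OOM killer since the last boot
sudo journalctl -k | grep -iE "out of memory|oom-killer|killed process"

# Same information from the kernel ring buffer, with human-readable timestamps
sudo dmesg -T | grep -i "oom"

# Check whether transparent huge pages are enabled on this node
cat /sys/kernel/mm/transparent_hugepage/enabled
```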