# rke2
l
Hi all, I found that my rke2-agent node gets lost under high load, which causes some processes to run out of memory and get killed. The OOM error message can be seen in the picture. Unfortunately, it seems rke2-agent is also affected in this condition, and when I try to get logs from my pod it says:
$ kubectl logs -f my-app-deployment-58dcff5b4c-l45s8
Error from server: Get "https://xxx.xxx.xxx.xxx:10250/containerLogs/default/my-app-deployment-58dcff5b4c-l45s8/my-container?follow=true": proxy error from 127.0.0.1:9345 while dialing xxx.xxx.xxx.xxx:10250, code 502: 502 Bad Gateway
Then I need to log in to the worker node and run
sudo service rke2-agent restart
After that, the logs are output normally again. I would like to know how to protect the rke2 process in this situation so that I don't need to log in to the worker node to restart it. Many thanks.
c
The service will get restarted periodically by systemd. It seems like you need some limits on your pods so that the kernel isn't killing other things on the node when you run out of memory.
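As a rough sketch of what such limits could look like, here is a fragment of a Deployment spec with a memory request and limit on the container. The container name `my-container` is taken from the log URL above. The `256Mi` and `512Mi` values are placeholders only, not recommendations, and would need to be sized from the app's real memory usage.

```yaml
# Fragment of a Deployment manifest (not a complete manifest).
# Memory values are illustrative assumptions; tune them for your workload.
spec:
  template:
    spec:
      containers:
        - name: my-container
          resources:
            requests:
              memory: "256Mi"   # amount the scheduler reserves on the node for this container
            limits:
              memory: "512Mi"   # above this, the container itself is OOM-killed
```

With a memory limit set, a container that exceeds it is killed inside its own cgroup, which makes it less likely that the node as a whole runs out of memory and the kernel kills unrelated processes such as rke2-agent.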