07/15/2022, 6:18 PM
Just ran a complete restart of all 8 workers and figured I'd have to spin up a new cluster if it didn't work. About 15 minutes later the whole system came up, sweet jesus!


07/15/2022, 6:28 PM
If I recall correctly, the certs Kubernetes uses to talk to itself in K3s and RKE2 expire after one year and won't regenerate unless the services are restarted. It sounds like that's what you ran into. There's a grace period before expiry: if you restart within it, the certs regenerate early, but I'm not sure how long it is. My guess is that a monthly restart for patching is assumed?


07/15/2022, 6:31 PM
Yeah, I would say so. I just took over this deployment, but this is good input. I'll look it up, thanks.


07/15/2022, 6:45 PM
That is correct: all the leaf certs are valid for 1 year and are automatically renewed at k3s startup if they will expire within 90 days. The expectation is that nodes get patched somewhat regularly and k3s gets updated, and each of those restarts is an opportunity for the certs to renew themselves.
If you're not patching and restarting k3s, you should probably at least add a cronjob to restart k3s every month or two.
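For anyone who wants to reason about the timing, here is a rough sketch of the rule as described in this thread: a 1-year leaf cert, renewed at startup once it is within 90 days of expiry. The `cert_status` helper, the status labels, and the example issue date are all made up for illustration and are not part of k3s. Whether k3s treats exactly 90 days as inside the window is also an assumption here.

```python
from datetime import datetime, timedelta, timezone

# Values taken from the discussion: 1-year leaf certs, 90-day renewal window.
CERT_LIFETIME = timedelta(days=365)
RENEW_WINDOW = timedelta(days=90)


def cert_status(not_after: datetime, now: datetime) -> str:
    """Classify a cert by its expiry time (hypothetical helper, not a k3s API)."""
    if now >= not_after:
        # Already expired: the cluster is likely broken until k3s is restarted.
        return "expired"
    if not_after - now <= RENEW_WINDOW:
        # Inside the window: a k3s restart now should renew the cert.
        # Treating exactly 90 days as inside the window is an assumption.
        return "renew-on-restart"
    # Too early: a restart now would leave the existing cert in place.
    return "ok"


# Illustrative issue date only, not taken from any real cluster.
issued = datetime(2021, 7, 15, tzinfo=timezone.utc)
not_after = issued + CERT_LIFETIME

for uptime_days in (30, 300, 366):
    now = issued + timedelta(days=uptime_days)
    print(uptime_days, cert_status(not_after, now))
```

The practical takeaway is that a restart only helps during the last 90 days of the cert's life, so restarting every month or two guarantees at least one restart lands inside that window.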


07/16/2022, 9:15 AM
Thank you for the input. It turns out our outage hit a year to the minute, and from what logs I could gather it does indeed look like we experienced a cert failure due to uptime. I just took over this cluster from the previous ops guy, and it seems he never patched it. So I know what to do now.