# general
c
I've made no recent changes. It's been stable on v2.8.0 for a month or so
oh, it seems I have 25k longhorn "backup" entries in my etcd
only 5k of my entries aren't longhorn backups πŸ˜„
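A quick way to confirm that count without touching etcd directly is to list the Backup CRs through the apiserver. A minimal sketch with the official Python kubernetes client, assuming a default Longhorn install (longhorn-system namespace, longhorn.io/v1beta2 API version):

```python
from kubernetes import client, config

# Load credentials from ~/.kube/config (use config.load_incluster_config() in a pod).
config.load_kube_config()

api = client.CustomObjectsApi()

# Longhorn Backup objects are CRDs in the longhorn.io group; the API version
# (v1beta2) and namespace (longhorn-system) are assumptions for a default install.
backups = api.list_namespaced_custom_object(
    group="longhorn.io",
    version="v1beta2",
    namespace="longhorn-system",
    plural="backups",
)
print(f"Longhorn Backup objects: {len(backups['items'])}")
```

With the apiserver already struggling, paging with `limit=` (as in the deletion sketch further down) is gentler than one giant LIST of 25k objects.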
getting "context cancelled", which causes a race condition in the kube-apiserver
having to constantly stop all kube-apiservers so etcd has enough RAM/CPU to breathe, then I have to restart and try to delete some backup entries in like 30 seconds
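One way to make the most of that 30-second window is to script the cleanup: page through the Backup CRs in small chunks and delete them via the apiserver. A rough sketch, again assuming the default longhorn-system namespace and v1beta2 API, and a reasonably recent Python client for the paging kwargs. Note that deleting a Backup CR may also remove the backup from the backup target, not just the record in etcd, so filter to the ones you actually want gone:

```python
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

GROUP, VERSION, NS, PLURAL = "longhorn.io", "v1beta2", "longhorn-system", "backups"

cont = None
while True:
    # Small pages keep each LIST cheap for the overloaded apiserver/etcd.
    kwargs = {"limit": 200}
    if cont:
        kwargs["_continue"] = cont
    page = api.list_namespaced_custom_object(GROUP, VERSION, NS, PLURAL, **kwargs)

    for item in page["items"]:
        name = item["metadata"]["name"]
        # Add your own filter here (e.g. on metadata.creationTimestamp).
        # Caution: deleting a Backup CR may also delete the backup data
        # from the backup target, not only the etcd entry.
        api.delete_namespaced_custom_object(
            GROUP, VERSION, NS, PLURAL, name, body=client.V1DeleteOptions()
        )
        print("deleted", name)

    cont = page["metadata"].get("continue")
    if not cont:
        break
```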
m
Why do you have swap enabled? Doom sets in once swapping starts; we're supposed to rely on the OOM killer, as bad as that sounds.
πŸ‘€ 1
c
I found out it's because 85% of my etcd entries were all of one resource type, and apparently kube-apiserver just kills itself after one "took too long".
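Checking which resource type dominates doesn't require walking etcd keys: kube-apiserver reports per-resource object counts on its /metrics endpoint (apiserver_storage_objects on recent releases, etcd_object_counts on older ones). A sketch that pulls those lines through the Python client's raw call; whether this exact call works depends on the client version, and `kubectl get --raw /metrics` returns the same data:

```python
from kubernetes import client, config

config.load_kube_config()
api_client = client.ApiClient()

# Fetch the apiserver's Prometheus metrics page as plain text.
metrics, _, _ = api_client.call_api(
    "/metrics", "GET",
    auth_settings=["BearerToken"],
    response_type="str",
    _preload_content=True,
)

# Print the per-resource object-count lines to see what is filling etcd.
for line in metrics.splitlines():
    if line.startswith(("apiserver_storage_objects", "etcd_object_counts")):
        print(line)
```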
m
Ah yes, the Timeout of Doom
c
is that a known thing? is there a fix? haha
m
Food for thought and grounds for further research as the radio hosts are known to say during the call-in segment
πŸ˜„ 1
c
would even adding more etcd nodes prevent this?
m
maybe increase the toxicity of this same scenario so it happens faster, and then run it on a different version of k8s, or several different versions
to learn more about it
then open a support ticket with the boiled-down problem
πŸ‘ 1
c
guess I gotta make sure every resource retention policy exists/works at all πŸ˜„
😁 1
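On the retention side, Longhorn's recurring backup jobs carry a retain count that prunes old backups automatically. A sketch that creates one through the RecurringJob CRD; the schedule, retain count, group, namespace, and API version are all assumptions to adjust for the real install:

```python
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

# A daily backup job that keeps only the newest 7 backups per volume, so
# Backup objects can't pile up into the tens of thousands again. Field names
# follow the Longhorn RecurringJob CRD; values here are placeholders.
recurring_job = {
    "apiVersion": "longhorn.io/v1beta2",
    "kind": "RecurringJob",
    "metadata": {"name": "backup-daily", "namespace": "longhorn-system"},
    "spec": {
        "name": "backup-daily",
        "task": "backup",
        "cron": "0 3 * * *",   # every day at 03:00
        "retain": 7,           # prune everything beyond the newest 7
        "concurrency": 2,
        "groups": ["default"],
        "labels": {},
    },
}

api.create_namespaced_custom_object(
    group="longhorn.io",
    version="v1beta2",
    namespace="longhorn-system",
    plural="recurringjobs",
    body=recurring_job,
)
```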