10/31/2022, 12:57 PM
Trying to track down what looks like a memory leak. We have a large cluster: 4 control-plane (CP) nodes, 5 etcd nodes and 360 worker nodes. Workloads are constantly being redeployed, so the number of workloads and ConfigMaps stays in the 15,000-25,000 range. etcd auto-compacts every 30 minutes. No matter how large we make the etcd instances, one random member out of the 5 (never the leader) keeps increasing its memory usage until it runs out, while the other 4 stay at a reasonably low resource utilization. Thoughts?
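To confirm which member is the outlier, you could compare each member's metrics side by side. Below is a minimal sketch, assuming you have already scraped the Prometheus text output of each member's `/metrics` endpoint (for example with `curl`) into a string. The metric names `process_resident_memory_bytes`, `etcd_mvcc_db_total_size_in_bytes` and `etcd_server_is_leader` are what recent etcd 3.x releases expose, but check them against your version. The helper names (`parse_metrics`, `summarize`, `memory_outliers`) and the "2x the median" threshold are hypothetical choices for illustration, and the sample data in the demo is made up.

```python
import statistics


def parse_metrics(text: str) -> dict[str, float]:
    """Parse unlabeled samples from Prometheus text format into {name: value}."""
    values: dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        # Labeled series (name{...}) are skipped to keep the sketch short.
        if len(parts) < 2 or "{" in parts[0]:
            continue
        values[parts[0]] = float(parts[1])
    return values


def summarize(members: dict[str, str]) -> list[tuple[str, float, float, bool]]:
    """Return (member, RSS in GiB, backend DB size in GiB, is_leader) per member."""
    gib = 1024 ** 3
    rows = []
    for name, text in sorted(members.items()):
        m = parse_metrics(text)
        rows.append((
            name,
            m.get("process_resident_memory_bytes", 0.0) / gib,
            m.get("etcd_mvcc_db_total_size_in_bytes", 0.0) / gib,
            m.get("etcd_server_is_leader", 0.0) == 1.0,
        ))
    return rows


def memory_outliers(members: dict[str, str], ratio: float = 2.0) -> list[str]:
    """Hypothetical rule: flag members whose RSS exceeds `ratio` x the median RSS."""
    rss = {
        name: parse_metrics(text).get("process_resident_memory_bytes", 0.0)
        for name, text in members.items()
    }
    median = statistics.median(rss.values())
    return sorted(name for name, value in rss.items() if value > ratio * median)


if __name__ == "__main__":
    # Made-up sample scrapes for illustration only (not data from this cluster).
    def sample(rss_gib: float, leader: int) -> str:
        return (
            "# TYPE process_resident_memory_bytes gauge\n"
            f"process_resident_memory_bytes {rss_gib * 1024**3:.0f}\n"
            f"etcd_mvcc_db_total_size_in_bytes {2 * 1024**3}\n"
            f"etcd_server_is_leader {leader}\n"
        )

    scrapes = {f"etcd-{i}": sample(3.0, int(i == 1)) for i in range(1, 6)}
    scrapes["etcd-3"] = sample(28.0, 0)
    for row in summarize(scrapes):
        print(row)
    print(memory_outliers(scrapes))
```

If the growing member's DB size matches the others while its RSS does not, that points away from a compaction or backend-size problem and toward something specific to that process (for example, watchers or clients concentrated on it).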