08/29/2022, 5:57 PM
I'm having an issue where k3s v1.20.11+k3s2 with Rancher 2.5.x, running in a cluster of 2 VM nodes with MySQL as its external database, is using all available CPU on the host. It is also showing anywhere from 1,500 to 12,000 sockets in TIME_WAIT against the MySQL server port, with the count sometimes changing by 200+ connections per second. The Rancher pods are showing 100%+ CPU, Traefik is using 50%, MySQL is using 50%, and k3s-server is at times using > 600% CPU. Rancher only manages one 3-node bare-metal RKE cluster. I'm unable to determine what is causing the massive number of TCP connections and/or the high CPU usage within the k3s cluster. Both clusters were running fine until Saturday, when the Rancher k3s cluster pretty much blew up, and it hasn't been able to recover since. Both k3s nodes are complaining about a possible SYN flood due to the number of connections, but the traffic is all internal.
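For anyone trying to reproduce the socket numbers, here is a minimal sketch of how the TIME_WAIT sockets could be tallied by remote port on a node. It assumes a Linux host and parses text in the `/proc/net/tcp` layout, where the fourth column is the socket state and `06` means TIME_WAIT. The function name is hypothetical, and the sketch only covers IPv4 sockets in whichever network namespace the file is read from, so connections made inside pod namespaces may not appear.

```python
from collections import Counter

# Hex state code used for TIME_WAIT in /proc/net/tcp.
TIME_WAIT = "06"


def count_time_wait_by_remote_port(proc_net_tcp_text):
    """Count TIME_WAIT sockets per remote port.

    Hypothetical helper: takes the full text of /proc/net/tcp
    (header line included) and returns a Counter keyed by the
    remote port as a decimal integer.
    """
    counts = Counter()
    # Skip the header row; each remaining row describes one socket.
    for line in proc_net_tcp_text.splitlines()[1:]:
        fields = line.split()
        # fields: [slot, local_addr:port, remote_addr:port, state, ...]
        if len(fields) < 4 or fields[3] != TIME_WAIT:
            continue
        # Addresses look like "0100007F:0CEA"; the port is hex after the colon.
        remote_port = int(fields[2].rsplit(":", 1)[1], 16)
        counts[remote_port] += 1
    return counts
```

Passing it the contents of `/proc/net/tcp` and looking at the largest entries would show whether the MySQL port (3306 by default, if that is what is in use here) really accounts for most of the TIME_WAIT sockets.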