When I noticed the problem, one of the first tests I did was to isolate the workloads in the agent node and taint the master. With that, just a few daemonsets were running there. Even though, I could still see ~21% of CPU utilization in the PI.
Searching for that, I ended up in the profiling
documentation, which describes a scenario like mine as 25% of a core, not ~21% of total, like mine.
I also read about that related
issue, it is near my case. Seems like this issue never got a proper answer.
Next, I also enabled pprof in the master. I’m running a continue profiling tool there (Parca), I can see that with the window sample that I have (15 minutes), around 18% of the time was spent in sqlite.
With all that being said, could you guys help me identify the root cause of this problem?