05/10/2023, 1:34 PM
G'day, we have been rolling out Rancher 2.6.11 to get fleet-agent-0.5.3 that includes the fix for ( The rollout has been successful, and I can see a drop in traffic from the databases for all installations except one cluster. That cluster has 1390 bundledeployments. There I do see a drop in database traffic at first, but it quickly increases again, and after a couple of hours the traffic from the database is back to exactly the behavior we had before (see attached image). I have tried changing "FLEET_CLUSTER_ENQUEUE_DELAY" and "CHECKIN_INTERVAL", but that did not help at all, and I cannot find much more information on how to tune the fleet-agent. So any advice here (other than limiting the number of bundles) is very welcome.
On the other clusters I can see a spike in network traffic every 30 minutes; I guess it's some kind of reconcile. My idea is that on the big cluster we have so many bundles that one reconcile period runs into the next and we end up in a loop. Note that this is only a guess.
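
To put a rough number on that guess, here is a back-of-the-envelope sketch. It assumes (and this is only an assumption, not something I have confirmed about how fleet-agent works) that one reconcile pass handles the bundledeployments one after another and that a new pass starts every 30 minutes. The function names are made up for illustration.

```python
# Rough estimate only: assumes a reconcile pass processes bundledeployments
# sequentially and a new pass starts at a fixed interval (both are guesses).

def per_item_budget_seconds(period_seconds: float, item_count: int) -> float:
    """Average time each item may take before one pass exceeds the period."""
    return period_seconds / item_count


def passes_overlap(period_seconds: float, item_count: int,
                   seconds_per_item: float) -> bool:
    """True if one full pass takes longer than the interval between passes."""
    return item_count * seconds_per_item > period_seconds


PERIOD_SECONDS = 30 * 60      # the 30-minute spike seen on the other clusters
BUNDLEDEPLOYMENTS = 1390      # count on the problem cluster

budget = per_item_budget_seconds(PERIOD_SECONDS, BUNDLEDEPLOYMENTS)
print(f"budget per bundledeployment: {budget:.2f} s")
```

Under those assumptions the budget comes out at about 1.3 seconds per bundledeployment, so if each one takes longer than that on average, a pass would not finish before the next one starts.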