When scaling up the management plane on a large cluster, what other back-end settings are typically changed to improve performance? Should we increase the number of cattle-cluster-agent replicas? In a large cluster (15,000+ workloads), the API gets overwhelmed even with 4 high-end control plane instances, and etcd keeps growing until it hits its size cap (so far, no matter what limit we set), which repeatedly results in a NOSPACE error.
10/11/2022, 6:23 PM
Yeah, this really boils down to a limitation of etcd. It might be best to use k3s with kine, which lets you use an external datastore backend such as MySQL.
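For reference, pointing k3s at an external MySQL datastore comes down to passing `--datastore-endpoint` with a MySQL DSN of the form `mysql://username:password@tcp(hostname:3306)/database-name`. The sketch below just assembles that endpoint and the resulting `k3s server` command line. The helper names, the user, password, host, and database values are all hypothetical placeholders, and a real HA setup needs more flags (tokens, TLS, load balancer) than shown here.

```python
import shlex


def kine_mysql_endpoint(user: str, password: str, host: str, port: int, database: str) -> str:
    # Hypothetical helper: builds the MySQL DSN form k3s documents for an
    # external datastore: mysql://username:password@tcp(hostname:3306)/database-name
    return f"mysql://{user}:{password}@tcp({host}:{port})/{database}"


def k3s_server_command(endpoint: str) -> list[str]:
    # Hypothetical helper: --datastore-endpoint makes k3s use kine against the
    # given SQL backend instead of its default embedded datastore.
    return ["k3s", "server", f"--datastore-endpoint={endpoint}"]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    ep = kine_mysql_endpoint("k3s", "changeme", "mysql.example.internal", 3306, "k3s")
    # shlex.join quotes the argument so it can be pasted into a shell.
    print(shlex.join(k3s_server_command(ep)))
```

The same value can also go into the k3s config file instead of the command line; either way, the etcd size cap no longer applies because the state lives in MySQL.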
10/11/2022, 8:14 PM
Hi, just curious: can I ask what your control plane looks like? How many nodes are in the cluster (control plane/worker), and what sizes? Are the control plane nodes also running as workers, or are they in separate node pools?
Is this running in the cloud or on-prem?