# neuvector-security
q
Not sure we’d want the Controller scaling like that. There are 3 not for scale, but for HA
a
Is there a reason the controller can only be 3? I would think it's still valuable for HA if the controller is getting slammed/hitting its CPU limits?
(maybe we'd just want to adjust to min 3, max 5 as the default for the HPA?)
q
It can be more; an odd number is preferred. 🙂
👍 1
THERE you go
🙌 1
a
I can update the PR accordingly - that way we'd have HA by default but the ability to scale if needed? And worth noting this is all off by default.
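For reference, a minimal sketch of what that HPA could look like with the min-3/max-5 defaults discussed above. The deployment name and the CPU target are assumptions for illustration, not the chart's actual values:

```yaml
# Sketch only: keep the Controller at 3 replicas for HA, allow scale-out
# to 5 under CPU pressure. Deployment name and the 80% CPU target are
# assumptions, not the chart's shipped defaults.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: neuvector-controller
  namespace: neuvector
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: neuvector-controller-pod   # assumed deployment name
  minReplicas: 3    # keeps the HA quorum
  maxReplicas: 5    # odd count preferred, per the discussion above
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
```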
q
I mean, you could run with one, but if that one hiccuped, you'd lose access to the UI at best, and the whole config at worst
👍 1
FWIW I don’t think too many folks have a load issue w/the Controller save for massive deployments. 😄
👍 1
a
Yeah totally - just figured HPA doesn't hurt here. Better than having to hardcode a replica count of 5 and hog more pod-space than needed all the time.
q
not to in any way poo poo on your awesome work here! 🙂
a
Nah, it's good context - figured I'd share the PR here for this type of feedback. I'm not too in touch with the architecture to know if there would be nuances to scaling.
q
Related: NV 5.x introduced auto-scaling of the Scanners based upon load, which I think is a fantastic idea. However, it is implemented in the NeuVector UI:
👀 1
This may be AWESOME for some/most shops
but I wonder if there may be other orgs who would say “Hey! Why is an app reaching down into K8S deployments/scaling outside the normal practices?!” Maybe.
a
Interesting, I'll have to look at that. Heh, I was wondering...how does the scaling there work?
q
It looks at the queue of images to be scanned, and then horizontally scales out/in.
Use-case: The Updater updates the Scanners at night, as normal. A scan of your registries fires off, but you have a bajillion images and it’s taking HOURS to complete.
I mean, it’s pretty damned cool
But I also worry about a place where an admin on NV sets it to scale waaaaaay up and suddenly there are OOM alerts. But I lay awake worrying about stuff like that. 😆
🤣 1
a
Trying to work out in my head the architecture of this scaling... is it the controller that triggers the scale up/down? And is it effectively "editing" the deployment with elevated RBAC?
q
Pretty much, yes. It’s just issuing a new deployment scale to K8S.
and when the storm cools off, it scales back down to the minimum you set in the UI
which may not be the number you set in the original deployment
this probably doesn’t matter in most any env, cuz the Scanners are super low impact when idle
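A rough sketch of the kind of RBAC that in-product scaling implies: the component doing the scaling needs permission to touch the scanner Deployment's scale subresource. The names and verb set below are illustrative assumptions, not NeuVector's actual shipped manifests:

```yaml
# Illustrative only: the sort of Role that lets a controller issue a new
# scale for the scanner Deployment. Names and verbs are assumptions.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: scanner-scaler
  namespace: neuvector
rules:
  - apiGroups: ["apps"]
    resources: ["deployments", "deployments/scale"]
    verbs: ["get", "update", "patch"]
```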
a
Interesting...so we deploy neuvector via helm, via flux for gitops. It's not necessarily something we are doing today but drift-detection is a thing with Flux and would potentially stomp on the changes the controller makes because it no longer aligns with the "source of truth" in git. I think that's a benefit of using HPA instead here - it's still all declarative and matches what is in git. Although I'm sure the scaling via UI is "smarter" about when to scale...tradeoffs 😅
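One way to reconcile the two, assuming a recent helm-controller with drift detection: tell Flux to ignore the replica count so an HPA (or the NV controller) can move it without the reconciler reverting it. This follows the HelmRelease v2beta2 driftDetection API; treat it as a sketch rather than a verified config:

```yaml
# Sketch: Flux HelmRelease with drift detection enabled, but ignoring
# spec.replicas on Deployments so out-of-band scaling isn't stomped on.
# Assumes helm-controller's v2beta2 driftDetection fields.
apiVersion: helm.toolkit.fluxcd.io/v2beta2
kind: HelmRelease
metadata:
  name: neuvector
  namespace: neuvector
spec:
  interval: 10m
  chart:
    spec:
      chart: core
      sourceRef:
        kind: HelmRepository
        name: neuvector
  driftDetection:
    mode: enabled
    ignore:
      - paths: ["/spec/replicas"]
        target:
          kind: Deployment
```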
q
Yes!
Let’s say you are the admin of all things K8S and I’m a security geek with access to the Neuvector console _for reasons_…
You deploy NV with, say, 4 scanners because you and your team have decided it’s The Right Thing to Do™
and I come along some time later and flip the bits on the NV Auto-scaling of Scanner Pods
and you’re like “Wait, why are there 8 running? Now it’s 2? Huh?”
👍 1
and maybe you’re just confused and annoyed, and then I explain it to you, and we laugh and laugh and go on with our lives
😁 1
but what if I go “wheeee, 99 scanners!” and it borks the cluster
and you, not knowing WTF is going on, think NeuVector is going insane and react accordingly
see?
and then the app team grabs their torches and pitchforks and says “Security kills agility! no more security tools!”
🫠 1
again, this is all in my head and I have no evidence that this has ever happened
but, well, [gestures widely to the humans]
a
Yeah 💯 - we're trying to do everything as declaratively in our config as possible, so even if we used the UI autoscaling it would be set up via the configmap/secret
sysinitcfg.yaml
and down the road drift detection is something we're considering too. That way changes like this in the UI that don't align with the init config (theoretically) wouldn't persist, and we'd see everything since it would be through changes in the code 🤷 . A bit of an optimistic path, but I'm a big fan of config-as-code and gitops things 😄
🏆 1
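To make that concrete, the shape of it is roughly a ConfigMap carrying sysinitcfg.yaml, kept in git, so even UI-exposed knobs stay declarative. The object name and the inner keys are placeholders here; the exact schema comes from the NeuVector docs:

```yaml
# Sketch: shipping the initial NeuVector config declaratively so UI changes
# that drift from it don't become the de facto source of truth.
# Name and file contents are placeholders; see the NeuVector docs for the
# supported keys (e.g. scanner auto-scale bounds).
apiVersion: v1
kind: ConfigMap
metadata:
  name: neuvector-init    # assumed name
  namespace: neuvector
data:
  sysinitcfg.yaml: |
    # ... cluster-level settings go here, managed in git ...
```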
q
and, kubernetes is supposed to keep us from cutting ourselves on the sharp edges like this
but, still, humans