# neuvector-security
a
Anyone ever seen this happen with the neuvector enforcer?
```
Failed to put key - error=Put "http://127.0.0.1:8500/v1/kv/object/host/agent/...": dial tcp 127.0.0.1:8500: connect: connection refused
```
We see this on enforcer startup, and when it happens the enforcer fails to cluster up and doesn't show as connected in the manager/web console. For a while we just patched a liveness + readiness probe onto the pods to check port 8500, but with the latest version (5.4.3) it's happening a lot more often.
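A probe patch along those lines might look roughly like the sketch below; the DaemonSet and container names match the default Helm chart and may differ in your install, and it assumes `nc` exists in the enforcer image. An exec probe is used so the check hits 127.0.0.1:8500 the same way the agent does in the error above:
```
# Sketch only: add an exec readiness probe that checks the embedded consul
# port on loopback.
kubectl -n neuvector patch daemonset neuvector-enforcer-pod --type=strategic -p '{
  "spec": {"template": {"spec": {"containers": [{
    "name": "neuvector-enforcer-pod",
    "readinessProbe": {
      "exec": {"command": ["sh", "-c", "nc -z 127.0.0.1 8500"]},
      "initialDelaySeconds": 30,
      "periodSeconds": 10
    }
  }]}}}}'
```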
c
Do you have Istio running in the NeuVector namespace?
a
Yep - istio injection + mTLS strict.
We have a few "workarounds" that were required to get that working in the past:
• permissive mTLS on controller ports 18300 and 30443
• headless services for the controller, enforcer, and scanner, with a number of ports declared so that Istio sets up listeners and the proper protocols for them (gossip ports, healthz ports, etc.)
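For anyone with the same setup, the port-level permissive mTLS piece would look roughly like the sketch below; the namespace and the controller selector label are assumptions based on the default chart, so adjust to match your install:
```
# Sketch only: keep STRICT mTLS for the controller workload but make ports
# 18300 and 30443 permissive, per the workaround above.
kubectl apply -f - <<'EOF'
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: neuvector-controller-port-mtls
  namespace: neuvector
spec:
  selector:
    matchLabels:
      app: neuvector-controller-pod
  mtls:
    mode: STRICT
  portLevelMtls:
    18300:
      mode: PERMISSIVE
    30443:
      mode: PERMISSIVE
EOF
```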
c
Would you be able to open an issue on GitHub and provide an example enforcer log showing the issue?
As a test, can you disable Istio on the NeuVector namespace and see if it improves?
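One way to run that test, assuming injection is enabled via the namespace label rather than per-pod annotations or an istio.io/rev label:
```
# Drop sidecar injection for the namespace, then restart the workloads so the
# pods come back without proxies.
kubectl label namespace neuvector istio-injection-
kubectl -n neuvector rollout restart deployment
kubectl -n neuvector rollout restart daemonset
```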
a
Yeah, let me give that a shot - are you aware of any issues running with Istio? It definitely didn't seem to work out of the box for us in the past. Also, if relevant, this is on our dev clusters at the moment (k3d), although I believe we've seen some of the same issues on RKE2 as well.
Not really seeing any better results without Istio injection/mTLS enforcement. Going to try to get an issue written up with full reproduction steps.
c
Thank you.
a
Discovered our issue. We're using images from Ironbank (the DoD repository of hardened images). The change to the affinity check in 5.4.3 (here) resulted in the `neuvector.role` label being required on the container image. The Ironbank image is not built with that label, so the enforcer was getting stuck on the affinity check and never starting up consul.
Working with that team to hopefully resolve though...
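A quick way to see whether a given image actually carries the label is to inspect its config; the image references below are placeholders for the Ironbank builds:
```
# If the image is present locally:
docker inspect --format '{{ index .Config.Labels "neuvector.role" }}' <registry>/neuvector/enforcer:<tag>
# Or straight from the registry, without pulling:
skopeo inspect docker://<registry>/neuvector/controller:<tag> | jq '.Labels'
```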
@clean-magazine-25026 do you happen to know if there would be other places in the code looking for `neuvector.role` on the other images? We got it added to the enforcer image (and that resolved the enforcer issue), but we're experiencing some weirdness on controllers in some clusters and trying to figure out if it's related. I skimmed through the code and didn't see anything obvious where the label would be needed on the controller - working to get logs to debug further...
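For what it's worth, grepping the source tree is a quick way to double-check for other consumers of the label (checking out the tag that matches the deployed version would be more precise than the default branch):
```
# Shallow clone and search for the label string across the codebase.
git clone --depth 1 https://github.com/neuvector/neuvector.git
grep -rn 'neuvector.role' neuvector/
```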
c
Not that I'm aware of, so it is likely the controller issue is something else. Please describe the symptom(s).
a
Controllers just aren't becoming ready. Due to where this is happening I don't have logs yet; going to try to get those tomorrow. I noticed this and am wondering if the self-ID might be getting messed up - https://github.com/neuvector/neuvector/blob/main/share/container/cri_client.go#L290-L295 (because I don't have the role label on the controllers).
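If the self-identification really does key off what the runtime reports, one way to see exactly what it returns is to query the runtime directly on the node (for k3d, that means inside the node container); the name filter and image reference below are placeholders:
```
# Labels the runtime reports for the running controller container:
crictl ps --name neuvector-controller -o json | jq '.containers[].labels'
# Image-level metadata for the image it was created from:
crictl inspecti -o json <controller-image-ref> | jq .
```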
c
The logs should give a hint on why it is not becoming ready. You can also enable debug logging on startup if needed to see more output.
a
Yeah we'll have to reproduce this in an environment where we have more access to debug. Some of these errors were hit in ephemeral environments - mostly just wanted to check if you were aware of anything before we start digging further.
So all we're really seeing is this on the pod events:
```
Warning  Unhealthy  3m20s (x341 over 32m)  kubelet            Readiness probe failed:
```
Controller logs appear normal; we even see the readiness log (which should be when the `/tmp/ready` file gets created/updated?):
```
❯ kl -n neuvector neuvector-controller-pod-xxx | grep "ctrl init done"
2025-04-03T14:10:15.407|INFO|CTL|utils.SetReady: - value=ctrl init done
```
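One way to narrow that down is to run the readiness check by hand, outside the kubelet probe (pod name is a placeholder):
```
# If the file exists and the exec succeeds, the probe failure is coming from
# something other than a missing /tmp/ready.
kubectl -n neuvector exec neuvector-controller-pod-xxx -- cat /tmp/ready
```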
Is it possible that the probe is getting killed by the enforcers? I was seeing this in enforcer logs:
```
2025-04-03T14:53:15.843|DEBU|AGT|main.reportIncident: - eLog={LogUID: ID:11 HostID:xxx HostName:xxx AgentID:xxx AgentName:xxx WorkloadID:xxx WorkloadName: ReportedAt:2025-04-03 14:53:15.843069795 +0000 UTC ProcName:cat ProcPath:/usr/bin/busybox ProcCmds:[cat /tmp/ready ] ProcRealUID:0 ProcEffUID:0 ProcRealUser: ProcEffUser:root FilePath: Files:[] LocalIP:<nil> RemoteIP:<nil> EtherType:0 LocalPort:0 RemotePort:0 IPProto:0 ConnIngress:false LocalPeer:false ProcPName:runc ProcPPath:/usr/bin/runc Count:15 StartAt:2025-04-03 14:52:10.811128089 +0000 UTC m=+2506.654157597 Action:deny RuleID:00000000-0000-0000-0000-000000000006 Group:NV.Protect Msg:Process profile violation, not from its root process: execution denied}
```
Seems like it's probably our custom image registry again and `cat` being in a different place 😞
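A quick check for that theory, assuming `readlink` is available in the hardened image (pod name is a placeholder):
```
# Show what `cat` actually resolves to inside the controller container.
kubectl -n neuvector exec neuvector-controller-pod-xxx -- sh -c 'readlink -f "$(command -v cat)"'
```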