# rke2
h
Have you considered swapping the ingress controller? Perhaps a different one in your dev cluster for testing could be a good idea. Yeah, it would likely mean some downtime, but a planned hour or so is likely better than customers seeing a 503.
s
Hi Josh, thanks for responding - So the 503s are not consistent. We don't believe it is the ingress controller or any of the virtual services (at this stage), as we are able to access some pods / services but not others, both externally and internally. Our current (not) working theory is that there has been a change in what is and isn't allowed for pod / service specs in the underlying Kubernetes or RKE2 between 1.22 and 1.24, and we're falling foul of that. We've ruled out DNS by using the svc FQDN and have been testing curl from both the host and within the CNI (inside a pod).
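For anyone following along later, the checks were roughly along these lines; the namespace, service, pod and port names are placeholders rather than our real ones:

```sh
# Inside the CNI: exec into a running pod and curl the service by its FQDN,
# which sidesteps any search-domain / short-name resolution issues.
kubectl -n my-ns exec -it debug-pod -- \
  curl -sv http://my-svc.my-ns.svc.cluster.local:8080/

# From the host: cluster DNS names usually don't resolve on the node itself,
# so look up the ClusterIP and curl that directly.
kubectl -n my-ns get svc my-svc -o wide   # note the CLUSTER-IP column
curl -sv http://<cluster-ip>:8080/
```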
h
Right. So what does your 1.22 ingress look like vs a 1.24.x ingress?
s
Ingress is exactly the same - Istio ingress gateway using sidecar proxy injection and virtual services. Same version for both (1.16.5) to minimise the moving parts (as part of our leapfrog approach).
h
Right. I'm afraid I am not going to be much help then. I tend to stick with the non-sidecar approach and simpler controllers like HAProxy and Nginx.
s
No problem / thanks for having a stab anyway
I'm adding this here in case some poor lost soul ends up in the same situation and needs a solution (https://xkcd.com/979/). It seems that from RKE2 / Kubernetes 1.23 onwards, NetworkPolicy enforcement is stricter. Problems that our network policies should already have been causing weren't triggering before then, so they flew under the radar. Once we identified the network policies as the root cause it was fairly easy to find and fix - in our case it was a combination of issues around ingress selectors.
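Purely as an illustration (this is not our actual manifest - every name, label and port below is a placeholder), the kind of ingress selector that bit us looks something like this. Once enforcement is strict, a `from` clause whose namespace or pod labels don't match the real ingress gateway pods silently drops the gateway's traffic, which surfaces as 503s at the edge:

```sh
# Illustrative sketch only: the selectors in the ingress rule have to match
# the actual labels on the gateway namespace/pods and on the target pods,
# otherwise traffic that used to flow is now dropped.
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-ingress-gateway
  namespace: my-app
spec:
  podSelector:
    matchLabels:
      app: my-backend                     # must match the target pods' labels
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: istio-system
          podSelector:
            matchLabels:
              app: istio-ingressgateway   # must match the gateway pods' labels
      ports:
        - protocol: TCP
          port: 8080
EOF
```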