# rke2
h
Have you considered swapping the ingress controller? Perhaps a different one in your dev cluster for testing could be a good idea. Yeah, it would likely mean some downtime, but a planned hour or so is likely better than customers seeing a 503.
s
Hi Josh, thanks for responding - So the 503s are not consistent. We don't believe it is the ingress controller or any of the virtual services (at this stage), as we are able to access some pods / services but not others, both externally and internally. Our current (not) working theory is that there has been a change in what is and isn't allowed for pod / service specs in the underlying Kubernetes or RKE2 between 1.22 and 1.24, and we're falling foul of that. We've ruled out DNS by using the svc FQDN and have been testing curl from both the host and within the CNI (inside a pod).
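For anyone following along later, the checks were roughly along these lines; the namespace, service, pod and port names are placeholders rather than our real ones:

```sh
# Inside the CNI: exec into a running pod and curl the service by its FQDN,
# which sidesteps any search-domain / short-name resolution issues.
kubectl -n my-ns exec -it debug-pod -- \
  curl -sv http://my-svc.my-ns.svc.cluster.local:8080/

# From the host: cluster DNS names usually don't resolve on the node itself,
# so look up the ClusterIP and curl that directly.
kubectl -n my-ns get svc my-svc -o wide   # note the CLUSTER-IP column
curl -sv http://<cluster-ip>:8080/
```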
h
Right. So what does your 1.22 ingress look like vs a 1.24.x ingress?
s
Ingress is exactly the same - Istio ingress gateway using sidecar proxy injection and virtual services. Same version for both (1.16.5) to minimise the moving parts (as part of our leapfrog approach).
h
Right. I'm afraid I am not going to be much help then. I tend to stick with the non-sidecar approach and simpler controllers like HAProxy and Nginx.
s
No problem / thanks for having a stab anyway
I'm adding this here in case some poor lost soul ends up in the same situation and needs a solution (https://xkcd.com/979/). It seems that from RKE2 / Kubernetes 1.23 onwards, NetworkPolicy enforcement is stricter. Problems that our network policies should already have been causing weren't triggering before then, so they flew under the radar. Once we identified the network policies as the root cause it was fairly easy to find and fix - in our case it was a combination of issues around ingress selectors.
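Purely as an illustration (this is not our actual manifest - every name, label and port below is a placeholder), the kind of ingress selector that bit us looks something like this. Once enforcement is strict, a `from` clause whose namespace or pod labels don't match the real ingress gateway pods silently drops the gateway's traffic, which surfaces as 503s at the edge:

```sh
# Illustrative sketch only: the selectors in the ingress rule have to match
# the actual labels on the gateway namespace/pods and on the target pods,
# otherwise traffic that used to flow is now dropped.
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-ingress-gateway
  namespace: my-app
spec:
  podSelector:
    matchLabels:
      app: my-backend                     # must match the target pods' labels
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: istio-system
          podSelector:
            matchLabels:
              app: istio-ingressgateway   # must match the gateway pods' labels
      ports:
        - protocol: TCP
          port: 8080
EOF
```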