# k3s
c
have you tried the ethtool command listed here: https://github.com/flannel-io/flannel/blob/master/Documentation/troubleshooting.md#nat
/usr/sbin/ethtool -K flannel.1 tx-checksum-ip-generic off
n
Just tried it; no change (more details added, above).
c
wait are you going through haproxy or not? this is a weird setup. Why are you running a loadbalancer outside the cluster? Why not just deploy metallb or kube-vip instead of managing a totally independent loadbalancer external to the cluster?
n
I normally use haproxy on a node on the public IP I have in front of the cluster. I wanted to eliminate my haproxy configuration as a possible cause of the problem; hence the `curl` command that uses `--haproxy-protocol` to connect. Did that make any sense?
c
what is the `externalTrafficPolicy` on your service? Have you tried setting that to local so that it’s not bouncing connections between nodes?
this is all just managed by kube-proxy, I’m not convinced that flannel is even involved here
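A quick way to check, assuming the standard ingress-nginx service name and namespace:
kubectl -n ingress-nginx get svc ingress-nginx-controller -o jsonpath='{.spec.externalTrafficPolicy}{"\n"}'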
n
I may not have adequately described my situation. I have a single public IP address that is attached to a non-cluster node running haproxy. It has a backend that includes all (3) nodes in my cluster, mapping port 443 on the public IP to port 32443 on each of the nodes in the k3s cluster. It uses haproxy protocol to preserve the source IP address. On k3s I have a daemonset running ingress-nginx with a NodePort listening on port 32443 that maps to port 443 on each nginx pod. This NodePort service is what’s failing sometimes. If there’s a better way to achieve this, I’m all for it. But this setup worked great for a year until just recently, when the issue outlined above started happening.
Is there a way to debug what’s happening with the nodeport so I can try to learn why it’s failing sometimes?
c
not really. like I said it’s all just iptables rules managed by kube-proxy. There’s not really anything changing in there minute to minute, so your best bet is trying to capture the packets at various places and see what’s going on.
have you tried changing the externaltrafficpolicy to rule out issues with traffic going to pods on different nodes than the one you’re hitting the nodeport of?
n
The cluster policy is the only workable option for me, since most deployments can only run a single pod, which may/may not be on the node where ingress-nginx is running.
c
you didn’t say you were going through the ingress. you said you were going through the service nodeport to access the pod.
If you are going through the ingress then it doesn’t matter where the pod is for the workload, only matters where the ingress pod is running - since that’s what you’re hitting with the service nodeport.
So in reality your path is client -> haproxy -> ingress-nginx nodeport -> ingress-nginx pod -> workload pod
because the “service” you are hitting is the ingress-nginx service’s nodeport, NOT your workload’s service nodeport
and in that case changing the externaltrafficpolicy would only be a problem if ingress-nginx is not running on all the nodes
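if it helps, something like this lists every NodePort service so you can confirm which ports belong to ingress-nginx and which belong to your workloads:
kubectl get svc --all-namespaces | grep NodePort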
n
Your description of the path is correct; that’s what I was trying to convey.
c
ok so your logic about why you can’t change the policy is flawed then
by default the ingress-nginx nodeport -> ingress-nginx pod hop may be going between nodes. troubleshooting that will be much simpler if you change the policy so that traffic to the nodeport always hits the local ingress-nginx pod on that node. assuming you can run ingress-nginx on all nodes.
n
Ahh, now I understand. How do I do that?
c
change the policy in the ingress-nginx service spec
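for example, something like this (service name and namespace assumed; adjust to match your install):
kubectl -n ingress-nginx patch svc ingress-nginx-controller -p '{"spec":{"externalTrafficPolicy":"Local"}}'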
n
OK, let me try that.
c
you said you’re running ingress nginx as a daemonset, are you sure you’re accessing it at the service nodeport and not at a nodeport on the pod itself?
n
Here’s what I’ve tried:
• From the node, `curl` to the node’s IP address (fails as above)
• From an ingress-nginx pod, `curl` to the cluster IP address (fails)
• From an ingress-nginx pod, `curl` to the pod’s IP address (works)
Changing the externalTrafficPolicy of the nodeport's service didn’t make any difference.
Well, strike that. It does seem to be working now when I `curl` to the node’s IP address from that node.
c
but are you curling the pod nodeport or the service nodeport on the node
you may have both, and which you’re doing makes a difference
I don’t understand why you’re doing things in the ingress-nginx pod. Are you trying to troubleshoot the connection between haproxy and nginx, or between nginx and your workload?
they are completely separate TCP connections. If you are seeing something timing out at the TLS handshake level, you need to figure out where that SPECIFIC handshake is, and troubleshoot that.
n
I understand that there is a connection from the client to haproxy, another connection to ingress-nginx, and another connection to the pod. When I started, I didn’t know what was failing. By running curl from the node, I eliminated haproxy and the physical network. By running curl from inside the ingress-nginx pod to its own IP, I eliminated nginx itself and the backend pods that do the work. So what’s left is whatever networking magic is making the TCP connection from the node’s IP to ingress-nginx. Hopefully that wasn’t a misguided process.
c
right but the nginx pod nodeport or the service nodeport
n
What I would expect to happen is that iptables DNATs the TCP connection to the NodePort, routing it to the IP/port of the ingress-nginx pod running on that node. Which seems pretty simple, but afaict that’s what’s failing.
Whichever one is connected to the IP address of the node.
c
no
the question matters
if its the pod nodeport, then it will always hit the local pod
if its the service nodeport, then it may go to ANY pod in the cluster, even ones on other nodes. depending on the traffic policy.
and the path through the NAT layers ends up being VERY different
Are you hitting the pod nodeport on the node, or the service nodeport on the node?
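If you want to see what the NAT path actually looks like on a node, something like this dumps the kube-proxy rules for that port (a sketch; assumes kube-proxy is in its default iptables mode):
sudo iptables-save -t nat | grep 32443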
n
That’s a great question. When I curl to the node’s IP address, which one is it going to? Because that’s the only one I care about.
(Re-reading your question, I’m pretty sure I’m hitting the service nodeport, since that’s the only one I’ve configured to use port 32488).
c
I’m not sure why you can’t answer that question. When you’re curling the node’s IP, what port are you using? and where did you get that port from?
Is it the port of the service, or the port of the pod
n
It’s the nodeport of the service. The node has IP address 10.0.1.235, and the service configures a nodeport 32488 (see above link). I’m curling to 10.0.1.235:32488.
(there are more nodes, and the service configures a nodeport on each one).
c
no, the nodeports of that service are 32080 and 32443
Look at the pods. what ports are THEY using?
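e.g. something like this (namespace assumed):
kubectl -n ingress-nginx get pods -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].ports}{"\n"}{end}'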
n
Sorry, dyslexia.
c
Can you show the yaml for one of the ingress-nginx daemonset pods?
n
(BTW thanks for all your time)
c
ok. I see node host ports in the pod port spec. So all you have is the service nodeport
With the external traffic policy set to local, traffic to the service nodeport will only ever go to the nginx pod on the node you’re accessing. it will not go to a pod on a different node.
So then you can focus on identifying if it is just one node that does this, or all nodes, or what.
n
I’ve tried the curl command on each of the nodes, to each of the node’s own IP addresses. They all fail.
c
You can’t `curl -vks https://NODE-IP:32443` from the node itself?
n
I need to use `--connect-to $WEBSITE:443:NODE-IP:32443 --haproxy-protocol` so it actually works.
Without --haproxy-protocol there’s no hope, and if I use the --connect-to I get the right certificate.
c
ok but you said you’re just trying to diagnose a TLS handshake failure. or a connection timeout even.
n
Which is caused by a dropped connection that only happens when I connect through the nodeport. When I connect through the pod’s own IP address it works great.
c
yes but that bypasses lots and lots of things so it doesn’t really help any
n
I know; it just confirms that nginx isn’t the problem.
Can you help me understand what the flow is when I connect to the nodeport?
c
If you’re troubleshooting basic connectivity I would probably disable all the haproxy stuff for now and try to just troubleshoot the TCP connection itself
See if you can get the connection to hang with
echo QUIT | openssl s_client -connect NODE-IP:32443 -servername WEBSITE
that way you are not doing ANYTHING other than testing the TLS connection
n
One time usually works. But here’s what happens when I try multiple times: https://bin.koehn.com/?d89850be142363f8#4gLrP81gWigd7QMQBi1iEvaiZAEaJT17JLw83Vj1FKpx
Again, `openssl` doesn’t speak proxy protocol, so that’s not going to work very well. In any case, I know proxy protocol isn’t the problem because I can use `curl` to successfully complete the connection from the pod itself.
c
ok so start doing some packet captures and see where you stop seeing traffic. is it not making it into the pod? are the responses not making it out?
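something along these lines as a starting point (interface names assumed; flannel’s bridge is usually cni0 on k3s):
# traffic arriving at the nodeport on the node's main interface
sudo tcpdump -ni eth0 -w nodeport.pcap 'tcp port 32443'
# after the DNAT the destination is the pod IP on port 443, so watch the pod side too
sudo tcpdump -ni cni0 -w podside.pcap 'tcp port 443'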
n
OK, I captured this tcpdump when I ran this command from the node to itself (the node is 10.0.1.236):
WEBSITE=diaspora.koehn.com ; for i in $(seq 1 20) ; do curl -v --connect-to $WEBSITE:443:10.0.1.236:32443 --haproxy-protocol https://$WEBSITE/ > /dev/null ; done
I stopped it when the `for` loop hung.
Ah, poop. I need to remove it from haproxy. Stand by and I’ll run it again.
c
10.0.1.236.32443 > 10.0.1.1.12811: Flags [P.], cksum 0x21b3 (incorrect -> 0x784c)
you’re sure you disabled checksum offload on all the nodes?
n
Yes, I ran the ethtool command you sent earlier on all the nodes. I just checked my command history:
/usr/sbin/ethtool -K flannel.1 tx-checksum-ip-generic off
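And to double-check it actually took (the setting doesn’t survive the flannel.1 interface being recreated, e.g. after a reboot):
/usr/sbin/ethtool -k flannel.1 | grep tx-checksum-ip-generic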
c
I’d probably continue on down this path, trying to figure out where the packets are getting dropped. I can’t really tell from that text output what’s missing, I usually grab a capture file off all the interfaces I’m interested in and then pull it up in wireshark or something
but this seems pretty odd, I’ve not seen anything like this before so I suspect whatever is going on with your node is not going to be particularly easy to diagnose
I take it you’ve already upgraded everything to the greatest extent possible?
n
Yeah. Latest everything I can find.
alright, I’ll keep at it. Thanks again for all the help.
I eventually figured it out, BTW. The NodePort issue was masking the real problem because of the `externalTrafficPolicy`, as you correctly pointed out. The actual problem was that one of the nodes had a too-low `kernel.threads-max` setting, which sent nginx into a tailspin. So many thanks again; I think it would have taken much, much longer without all your help!
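For anyone who finds this later, the check and fix were along these lines (the value shown is illustrative, not necessarily what I used):
# see the current limit on the affected node
sysctl kernel.threads-max
# raise it at runtime
sudo sysctl -w kernel.threads-max=4194304
# persist it across reboots
echo 'kernel.threads-max = 4194304' | sudo tee /etc/sysctl.d/99-threads-max.conf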
c
oh so after all that it was just nginx hanging on one of the nodes?
n
Yeah. The NodePort turned out to be a wild goose chase. As soon as I tweaked the threads of the one node and restarted nginx, it worked perfectly.
Still, I’m glad to have fixed the externalTrafficPolicy; I assumed that a NodePort in front of a DaemonSet would use Local by default.
One last aside… the incorrect checksums were not because of the ethtool setting. They’re because checksumming is offloaded to the NIC, and tcpdump cannot see it.
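You can see that from the NIC itself (interface name assumed): if tx checksumming is on, outgoing packets captured locally will show “incorrect” checksums because the hardware fills them in after tcpdump copies the packet.
ethtool -k eth0 | grep -i checksumming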
c
yeah that’s fair. there are bugs with vxlan packets getting dropped due to invalid checksums, that’s why I asked