#k3s

billowy-needle-49036

07/05/2022, 8:03 PM
curl https://get.k3s.io | INSTALL_K3S_VERSION="v1.23.6+k3s1" INSTALL_K3S_EXEC="--disable=traefik --node-ip 10.2.0.1" sh -
curl http://10.42.0.4:9153
works (gives a 404, ok). But on a neighbor host (10.2.0.210),
curl http://10.2.0.1:9153
is connection-refused. Shouldn't that work? where do i debug next?

creamy-pencil-82913

07/05/2022, 8:12 PM
I have no idea what 9153 is, that’s not a port that k3s uses by default. What have you deployed that’s using that port?

billowy-needle-49036

07/05/2022, 8:12 PM
9153 is coredns

creamy-pencil-82913

07/05/2022, 8:13 PM
oh ok sure. but coredns is normally accessed on port 53 within the cluster, not external to it.
You’re trying to access the coredns metrics port, by hitting that same port on the node running the pod?

billowy-needle-49036

07/05/2022, 8:15 PM
bang% ps ax | grep coredns
1437753 ?  Ssl  0:03 /coredns -conf /etc/coredns/Corefile
bang% curl http://10.42.0.4:9153/
404 page not found
yeah i see how that's not a good test

creamy-pencil-82913

07/05/2022, 8:16 PM
yes, that 10.42.0.4 address is the pod IP. coredns doesn’t run with host network, so there’s no reason that you’d be able to hit that same port on the node IP.
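If the goal is just to confirm coredns is up and serving metrics, a port-forward sidesteps the pod-vs-node networking question entirely. A minimal sketch, assuming the default coredns deployment in kube-system (the metrics endpoint is /metrics, which is also why / gave a 404):
kubectl -n kube-system port-forward deploy/coredns 9153:9153
# then, from another shell on the same machine:
curl http://127.0.0.1:9153/metrics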
I’m confused and not sure what you’re trying to accomplish

billowy-needle-49036

07/05/2022, 8:17 PM
i had ping failures from pods on agent nodes to server node
so i'm rebuilding carefully and trying to find where that goes wrong
bang% kubectl get svc
NAME         TYPE           CLUSTER-IP      EXTERNAL-IP         PORT(S)          AGE
kubernetes   ClusterIP      10.43.0.1       <none>              443/TCP          21m
registry     LoadBalancer   10.43.224.187   10.2.0.1,10.2.0.1   5000:32488/TCP   11s
dash(pts/13):/home/drewp# curl http://10.2.0.1:5000/v2
<a href="/v2/">Moved Permanently</a>.
^ this is encouraging
ok here's a second node:
dash(pts/13):/home/drewp# curl https://get.k3s.io | INSTALL_K3S_VERSION="v1.23.6+k3s1" K3S_TOKEN=... K3S_URL=https://10.2.0.1:6443 sh -
then
bang% kubectl run tmp-shell --rm -i --tty --image nicolaka/netshoot -- ping -c 2 10.43.0.10
which happens to choose the agent node
dash
From 10.42.2.3 icmp_seq=1 Destination Host Unreachable
From 10.42.2.3 icmp_seq=2 Destination Host Unreachable

creamy-pencil-82913

07/05/2022, 8:35 PM
you’re pinging service IPs. That’s not guaranteed to work. ClusterIP services exist only as port-specific iptables rules managed by kube-proxy. The only thing that you can count on with ClusterIP services is that traffic to the published port will be routed to a backend pod on the target port.
That 10.43.0.10 IP address doesn’t exist anywhere except as the target of some iptables rules, so there’s nothing to respond to an ICMP ping request.
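A more meaningful test than ping, sticking with the 10.43.0.10 example: look at the rules kube-proxy wrote for that ClusterIP, and hit the published port (53 for DNS) with a real client. A sketch, assuming the default iptables proxy mode:
# on a node: the DNAT rules are the only place this IP "exists"
sudo iptables-save | grep 10.43.0.10
# from a pod: query the published port instead of pinging
kubectl run tmp-dig --rm -it --image nicolaka/netshoot -- dig @10.43.0.10 kubernetes.default.svc.cluster.local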

billowy-needle-49036

07/05/2022, 8:54 PM
i understand! now i wonder how many past tests were bogus
here's an example failure:
🛁 bang(pts/3):/my/serv/href% telnet 10.43.35.177 27017
Trying 10.43.35.177...
Connected to 10.43.35.177.
Escape character is '^]'.
^]
(this is a successful connection to mongodb)
Here, netshoot is running on a different agent and fails to connect:
tmp-shell  ~  telnet 10.43.35.177 27017
telnet: can't connect to remote host (10.43.35.177): Host is unreachable
tmp-shell  ~  telnet 10.42.0.34 27017
telnet: can't connect to remote host (10.42.0.34): Host is unreachable
So i pin the service to run on server node and it works 😞
tmp-shell  ~  ip a 1: lo: <LOOPBACK,UP,LOWER_UP> ... 2: eth0@if215: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default link/ether e275a9💿05:6a brd ffffffffff:ff link-netnsid 0 inet 10.42.2.26/24 brd 10.42.2.255 scope global eth0 valid_lft forever preferred_lft forever inet6 fe80:e075a9fffecd56a/64 scope link valid_lft forever preferred_lft forever tmp-shell  ~  ip route default via 10.42.2.1 dev eth0 10.42.0.0/16 via 10.42.2.1 dev eth0 10.42.2.0/24 dev eth0 proto kernel scope link src 10.42.2.26

creamy-pencil-82913

07/06/2022, 5:29 PM
it sounds like there is some issue with the overlay network between nodes. Some networks don’t properly handle vxlan packets between nodes, and I have heard that some distros are still shipping kernels that corrupt vxlan packets when ip tx checksum offload is enabled.
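If it is the checksum-offload bug, a common quick test is to disable tx checksum offload on the vxlan device on every node and re-run the failing connection. A sketch, assuming flannel's default device name flannel.1 (this setting does not persist across reboots):
# run on each node, then retry the cross-node telnet/ping tests
sudo ethtool -K flannel.1 tx-checksum-ip-generic off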

billowy-needle-49036

07/06/2022, 5:46 PM
This breakage may have coincided with the Ubuntu 21.10 -> 22.04 upgrade
We all need better debug tools 🙁
I tried host-gw a bit but couldn't get better results
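For reference, the flannel backend is a k3s server flag, so it has to be set when installing (or restarting) the server; a sketch of the install line, reusing the pins from above (note host-gw only works when all nodes share an L2 segment):
curl https://get.k3s.io | INSTALL_K3S_VERSION="v1.23.6+k3s1" INSTALL_K3S_EXEC="--disable=traefik --flannel-backend=host-gw --node-ip 10.2.0.1" sh -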