#k3s

billowy-needle-49036

07/05/2022, 8:03 PM
curl https://get.k3s.io | INSTALL_K3S_VERSION="v1.23.6+k3s1" INSTALL_K3S_EXEC="--disable=traefik --node-ip 10.2.0.1" sh -
curl http://10.42.0.4:9153
works (gives a 404, ok). But on a neighbor host (10.2.0.210),
curl http://10.2.0.1:9153
is connection-refused. Shouldn't that work? where do i debug next?

creamy-pencil-82913

07/05/2022, 8:12 PM
I have no idea what 9153 is, that’s not a port that k3s uses by default. What have you deployed that’s using that port?

billowy-needle-49036

07/05/2022, 8:12 PM
9153 is coredns

creamy-pencil-82913

07/05/2022, 8:13 PM
oh ok sure. but coredns is normally accessed on port 53 within the cluster, not external to it.
You’re trying to access the coredns metrics port, by hitting that same port on the node running the pod?

billowy-needle-49036

07/05/2022, 8:15 PM
bang% ps ax | grep coredns
1437753 ?  Ssl  0:03 /coredns -conf /etc/coredns/Corefile
bang% curl http://10.42.0.4:9153/
404 page not found
yeah i see how that's not a good test

creamy-pencil-82913

07/05/2022, 8:16 PM
yes, that 10.42.0.4 address is the pod IP. coredns doesn’t run with host network, so there’s no reason that you’d be able to hit that same port on the node IP.
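If the goal is just to confirm coredns is up and serving metrics, a port-forward sidesteps the pod-vs-node networking question entirely. A minimal sketch, assuming the default coredns deployment in kube-system (the metrics endpoint is /metrics, which is also why / gave a 404):
kubectl -n kube-system port-forward deploy/coredns 9153:9153
# then, from another shell on the same machine:
curl http://127.0.0.1:9153/metrics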
I’m confused and not sure what you’re trying to accomplish

billowy-needle-49036

07/05/2022, 8:17 PM
i had ping failures from pods on agent nodes to server node
so i'm rebuilding carefully and trying to find where that goes wrong
bang% kubectl get svc
NAME         TYPE           CLUSTER-IP      EXTERNAL-IP         PORT(S)          AGE
kubernetes   ClusterIP      10.43.0.1       <none>              443/TCP          21m
registry     LoadBalancer   10.43.224.187   10.2.0.1,10.2.0.1   5000:32488/TCP   11s
dash(pts/13):/home/drewp# curl http://10.2.0.1:5000/v2
<a href="/v2/">Moved Permanently</a>.
^ this is encouraging
ok here's a second node:
dash(pts/13):/home/drewp# curl https://get.k3s.io | INSTALL_K3S_VERSION="v1.23.6+k3s1" K3S_TOKEN=... K3S_URL=https://10.2.0.1:6443 sh -
then
bang% kubectl run tmp-shell --rm -i --tty --image nicolaka/netshoot -- ping -c 2 10.43.0.10
which happens to choose the agent node
dash
From 10.42.2.3 icmp_seq=1 Destination Host Unreachable
From 10.42.2.3 icmp_seq=2 Destination Host Unreachable

creamy-pencil-82913

07/05/2022, 8:35 PM
you’re pinging service IPs. That’s not guaranteed to work. ClusterIP services exist only as port-specific iptables rules managed by kube-proxy. The only thing that you can count on with ClusterIP services is that traffic to the published port will be routed to a backend pod on the target port.
That 10.43.0.10 IP address doesn’t exist anywhere except as the target of some iptables rules, so there’s nothing to respond to an ICMP ping request.
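A more meaningful test than ping, sticking with the 10.43.0.10 example: look at the rules kube-proxy wrote for that ClusterIP, and hit the published port (53 for DNS) with a real client. A sketch, assuming the default iptables proxy mode:
# on a node: the DNAT rules are the only place this IP "exists"
sudo iptables-save | grep 10.43.0.10
# from a pod: query the published port instead of pinging
kubectl run tmp-dig --rm -it --image nicolaka/netshoot -- dig @10.43.0.10 kubernetes.default.svc.cluster.local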

billowy-needle-49036

07/05/2022, 8:54 PM
i understand! now i wonder how many past tests were bogus
here's an example failure:
🛁 bang(pts/3):/my/serv/href% telnet 10.43.35.177 27017
Trying 10.43.35.177...
Connected to 10.43.35.177.
Escape character is '^]'.
^]
(this is a successful connection to mongodb)
Here, netshoot is running on a different agent and fails to connect:
tmp-shell  ~  telnet 10.43.35.177 27017
telnet: can't connect to remote host (10.43.35.177): Host is unreachable
tmp-shell  ~  telnet 10.42.0.34 27017
telnet: can't connect to remote host (10.42.0.34): Host is unreachable
So i pin the service to run on server node and it works 😞
tmp-shell  ~  ip a 1: lo: <LOOPBACK,UP,LOWER_UP> ... 2: eth0@if215: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default link/ether e275a9💿05:6a brd ffffffffff:ff link-netnsid 0 inet 10.42.2.26/24 brd 10.42.2.255 scope global eth0 valid_lft forever preferred_lft forever inet6 fe80:e075a9fffecd56a/64 scope link valid_lft forever preferred_lft forever tmp-shell  ~  ip route default via 10.42.2.1 dev eth0 10.42.0.0/16 via 10.42.2.1 dev eth0 10.42.2.0/24 dev eth0 proto kernel scope link src 10.42.2.26

creamy-pencil-82913

07/06/2022, 5:29 PM
it sounds like there is some issue with the overlay network between nodes. Some networks don’t properly handle vxlan packets between nodes, and I have heard that some distros are still shipping kernels that corrupt vxlan packets when ip tx checksum offload is enabled.
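If it is the checksum-offload bug, a common quick test is to disable tx checksum offload on the vxlan device on every node and re-run the failing connection. A sketch, assuming flannel's default device name flannel.1 (this setting does not persist across reboots):
# run on each node, then retry the cross-node telnet/ping tests
sudo ethtool -K flannel.1 tx-checksum-ip-generic off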

billowy-needle-49036

07/06/2022, 5:46 PM
This breakage may have coincided with the Ubuntu 21.10 -> 22.04 upgrade
We all need better debug tools 🙁
I tried host-gw a bit but couldn't get better results
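For reference, the flannel backend is a k3s server flag, so it has to be set when installing (or restarting) the server; a sketch of the install line, reusing the pins from above (note host-gw only works when all nodes share an L2 segment):
curl https://get.k3s.io | INSTALL_K3S_VERSION="v1.23.6+k3s1" INSTALL_K3S_EXEC="--disable=traefik --flannel-backend=host-gw --node-ip 10.2.0.1" sh -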