This message was deleted.
# k3s
a
This message was deleted.
c
If you're talking about k3s, you should probably read the k3s docs: https://docs.k3s.io/installation/requirements#networking
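For reference, the rules on that requirements page can be sketched as a small shell function. This is only a sketch under assumptions: it assumes firewalld stays enabled, the default k3s pod and service CIDRs (10.42.0.0/16 and 10.43.0.0/16), and the default flannel VXLAN backend. The function name `open_k3s_firewalld` is made up for illustration, and the function has to run as root on every node.

```shell
# Sketch: open the ports and CIDRs that the k3s requirements page lists.
# Assumes default CIDRs and the default flannel VXLAN backend.
open_k3s_firewalld() {
    # Kubernetes API server
    firewall-cmd --permanent --add-port=6443/tcp &&
    # flannel VXLAN traffic between nodes
    firewall-cmd --permanent --add-port=8472/udp &&
    # kubelet port, used by the API server for exec/logs and by metrics
    firewall-cmd --permanent --add-port=10250/tcp &&
    # default pod network
    firewall-cmd --permanent --zone=trusted --add-source=10.42.0.0/16 &&
    # default service network
    firewall-cmd --permanent --zone=trusted --add-source=10.43.0.0/16 &&
    # apply the permanent rules to the running firewall
    firewall-cmd --reload
}
```

Calling `open_k3s_firewalld` stops at the first rule that fails, so a non-zero exit status means not every rule was applied.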
n
Thanks, I went back to the k3s docs and resolved a couple of issues related to `firewalld` and `iptables`. Besides the DNS test mentioned above I also tried the [Overlay test](https://ranchermanager.docs.rancher.com/troubleshooting/other-troubleshooting-tips/networking) - both still fail. Is there a k3s specific troubleshooting guide that I can follow?
c
have you tried just disabling firewalld and any custom iptables drop/reject rules you have in place just to confirm that it’s not something in your FW config?
n
yep. I even did `apt purge iptables` to make sure. ufw is inactive. and it seems there is also an issue with the metrics-server which fails with
panic: failed to create listener: failed to listen on 0.0.0.0:10250: listen tcp 0.0.0.0:10250: bind: address already in use
something is wrong with my networking, but I am not sure where to start troubleshooting
c
Hmm, that is odd. Metrics-server doesn't run with host network so there shouldn't be anything conflicting with that port
n
I resolved the issue with the metrics server. for some reason it was set to `hostNetwork: true`. the DNS and overlay network issue persists
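With host networking the pod shares the node's network namespace, where the kubelet already listens on port 10250, which would explain the bind error. A minimal sketch for checking the setting on any deployment follows; the helper name `host_network_of` is hypothetical, and it assumes `kubectl` is configured for the cluster.

```shell
# Print the hostNetwork field of a deployment's pod template.
# Empty output means the field is unset, which Kubernetes treats as false.
host_network_of() {
    # $1 = namespace, $2 = deployment name
    kubectl -n "$1" get deployment "$2" \
        -o jsonpath='{.spec.template.spec.hostNetwork}'
}
```

For the case above the call would be `host_network_of kube-system metrics-server`, assuming the default k3s deployment name and namespace.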
I tried to go through the `overlaytest` step by step and if I do
kubectl --request-timeout='10s' exec overlaytest-dbn8m -c overlaytest -- /bin/sh -c "ping -c2 gi-rm1 > /dev/null 2>&1"
I will get
Error from server: error dialing backend: x509: certificate is valid for 127.0.0.1, not 217.160.45.186
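This error means the API server reached the node's kubelet at that address, but the kubelet's serving certificate does not list it as a subject alternative name. A rough way to confirm is to dump the certificate and look for the IP. The helper below is a sketch with a made-up name (`san_lists_ip`); it only inspects text in the format printed by `openssl x509 -noout -text`.

```shell
# Succeed if the given IP is listed as an "IP Address:" SAN in the
# certificate text read from stdin (output of `openssl x509 -noout -text`).
san_lists_ip() {
    # $1 = IP address to look for.
    # Split the SAN list on commas, trim whitespace, then require an exact
    # line match so that 10.0.0.1 does not match 10.0.0.10.
    tr ',' '\n' |
        sed 's/^[[:space:]]*//; s/[[:space:]]*$//' |
        grep -Fxq "IP Address:$1"
}
```

Usage against the kubelet port of the affected node might look like `openssl s_client -connect 217.160.45.186:10250 </dev/null 2>/dev/null | openssl x509 -noout -text | san_lists_ip 217.160.45.186`. If the IP is missing, one thing worth checking is how the node's addresses are passed to k3s, for example through the `--node-ip` and `--node-external-ip` flags.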
Independent of the overlay test I have created 4 pods (one on each node) based on the `jessie-dnsutils:1.3` image to run the ping command from there
my pods are called box0 etc.
kubectl --request-timeout='10s' exec -n whatwhatwhy box0-5675664f8d-ljwvf -- /bin/sh -c "ping -c2 gi-rm1"
ping by hostname `gi-rm1` fails with
PING gi-rm1 (XXX.XXX.XXX.XXX) 56(84) bytes of data.

--- gi-rm1 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1009ms

command terminated with exit code
ping by IP works for some nodes and fails for others. my guess is the name resolution does not work
correction. the three manager nodes are `gi-rm0`, `gi-rm1`, and `gi-rm2`. the worker is `geo-node1`.
ping by IP among the 3 manager nodes works, ping by name does not. ping from manager to worker node by IP fails, but works by name:
kubectl --request-timeout='10s' exec -n whatwhatwhy box0-5675664f8d-ljwvf -- /bin/sh -c "ping -c2 geo-node1"
ping from worker to manager by name returns e.g. `unknown host gi-rm0` and fails with `100% packet loss` when using IP.
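Running the same ping from every pod to every node makes the pattern easier to see than one-off commands. The loop below is a sketch: `ping_matrix` is a made-up helper name, it assumes `kubectl` is configured and that `ping` exists in the pod image, and it reports a failed `kubectl exec` (such as the x509 error) as FAIL as well, so a FAIL does not always mean packet loss.

```shell
# Ping every target from every pod and print one result line per pair.
ping_matrix() {
    # $1 = namespace, $2 = space-separated pod names,
    # $3 = space-separated targets (hostnames or IPs)
    for pod in $2; do
        for target in $3; do
            if kubectl --request-timeout=10s exec -n "$1" "$pod" -- \
                ping -c2 "$target" >/dev/null 2>&1; then
                echo "$pod -> $target: ok"
            else
                echo "$pod -> $target: FAIL"
            fi
        done
    done
}
```

With the names from this thread, a call could be `ping_matrix whatwhatwhy "box0-5675664f8d-ljwvf geo-box1-74ccd9cfbd-wj5d5" "gi-rm0 gi-rm1 gi-rm2 geo-node1"`. Passing node IPs as targets in a second run separates name-resolution failures from routing failures.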
`nslookup` output looks like this:
$ kubectl --request-timeout='10s' exec -n whatwhatwhy box0-5675664f8d-ljwvf -- /bin/sh -c "nslookup www.google.com"
Server:		10.43.0.10
Address:	10.43.0.10#53

Non-authoritative answer:
Name:	www.google.com
Address: 142.250.185.164
$ kubectl --request-timeout='10s' exec -n whatwhatwhy box1-78c59894bc-5cztq -- /bin/sh -c "nslookup www.google.com"
Server:		10.43.0.10
Address:	10.43.0.10#53

Non-authoritative answer:
Name:	www.google.com
Address: 142.250.185.164
$ kubectl --request-timeout='10s' exec -n whatwhatwhy box2-5d49b9d674-jv2pg -- /bin/sh -c "nslookup www.google.com"
Error from server: error dialing backend: x509: certificate is valid for 127.0.0.1, not XXX.XXX.XXX.XXX
$ kubectl --request-timeout='10s' exec -n whatwhatwhy geo-box1-74ccd9cfbd-wj5d5 -- /bin/sh -c "nslookup www.google.com"
;; connection timed out; no servers could be reached

command terminated with exit code 1