adamant-kite-43734
04/04/2023, 6:47 AM
creamy-pencil-82913
04/04/2023, 7:18 AM
nutritious-oxygen-89191
04/05/2023, 7:24 AM
firewalld and iptables. Besides the DNS test mentioned above I also tried the [Overlay test](https://ranchermanager.docs.rancher.com/troubleshooting/other-troubleshooting-tips/networking) - both still fail. Is there a k3s-specific troubleshooting guide that I can follow?
creamy-pencil-82913
04/05/2023, 9:58 AM
nutritious-oxygen-89191
04/05/2023, 11:26 AM
apt purge iptables to make sure. ufw is inactive. It also seems there is an issue with the metrics-server, which fails with
panic: failed to create listener: failed to listen on 0.0.0.0:10250: listen tcp 0.0.0.0:10250: bind: address already in use
Something is wrong with my networking, but I am not sure where to start troubleshooting.
creamy-pencil-82913
04/05/2023, 3:13 PM
nutritious-oxygen-89191
04/06/2023, 7:09 AM
hostnetwork: true. The DNS and overlay network issues persist.
nutritious-oxygen-89191
04/06/2023, 7:46 AM
overlaytest step by step, and if I do
kubectl --request-timeout='10s' exec overlaytest-dbn8m -c overlaytest -- /bin/sh -c "ping -c2 gi-rm1 > /dev/null 2>&1"
I get
Error from server: error dialing backend: x509: certificate is valid for 127.0.0.1, not 217.160.45.186
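The x509 error above is raised when the API server dials the kubelet on that node and the kubelet's serving certificate does not list the address being dialed. A minimal sketch for checking this, assuming the kubelet listens on its default port 10250 and that openssl is available on a machine that can reach the node. san_has_ip is a hypothetical helper name, not part of kubectl or k3s:

```shell
# san_has_ip (hypothetical helper): succeeds if the given IP appears in the
# Subject Alternative Name list of a certificate.
#   $1    = IP address to look for
#   stdin = text produced by `openssl x509 -noout -text`
san_has_ip() {
  grep -A1 'Subject Alternative Name' \
    | tr ',' '\n' \
    | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' \
    | grep -Fxq "IP Address:$1"
}

# Example use against a node (assumes port 10250 is reachable from here):
#   openssl s_client -connect 217.160.45.186:10250 </dev/null 2>/dev/null \
#     | openssl x509 -noout -text \
#     | san_has_ip 217.160.45.186 && echo "IP is in SAN" || echo "IP missing from SAN"
```

If the IP is missing from the list, the address the API server uses for the node and the addresses the kubelet certificate was issued for disagree, which would explain why exec fails on that node only.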
nutritious-oxygen-89191
04/06/2023, 7:56 AM
jessie-dnsutils:1.3 image to run the ping command from there
nutritious-oxygen-89191
04/06/2023, 7:57 AM
kubectl --request-timeout='10s' exec -n whatwhatwhy box0-5675664f8d-ljwvf -- /bin/sh -c "ping -c2 gi-rm1"
nutritious-oxygen-89191
04/06/2023, 8:00 AM
gi-rm1 fails with
PING gi-rm1 (XXX.XXX.XXX.XXX) 56(84) bytes of data.
--- gi-rm1 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1009ms
command terminated with exit code
Ping by IP works for some nodes and fails for others. My guess is that name resolution does not work.
nutritious-oxygen-89191
04/06/2023, 8:12 AM
gi-rm0, gi-rm1, and gi-rm2. The worker is geo-node1.
Ping by IP among the 3 manager nodes works, ping by name does not. Ping from manager to worker node by IP fails, but works by name:
kubectl --request-timeout='10s' exec -n whatwhatwhy box0-5675664f8d-ljwvf -- /bin/sh -c "ping -c2 geo-node1"
Ping from worker to manager by name returns e.g. unknown host gi-rm0 and fails with 100% packet loss when using IP.
nutritious-oxygen-89191
04/06/2023, 8:45 AM
nslookup is like that:
$ kubectl --request-timeout='10s' exec -n whatwhatwhy box0-5675664f8d-ljwvf -- /bin/sh -c "nslookup www.google.com"
Server: 10.43.0.10
Address: 10.43.0.10#53
Non-authoritative answer:
Name: www.google.com
Address: 142.250.185.164
$ kubectl --request-timeout='10s' exec -n whatwhatwhy box1-78c59894bc-5cztq -- /bin/sh -c "nslookup www.google.com"
Server: 10.43.0.10
Address: 10.43.0.10#53
Non-authoritative answer:
Name: www.google.com
Address: 142.250.185.164
$ kubectl --request-timeout='10s' exec -n whatwhatwhy box2-5d49b9d674-jv2pg -- /bin/sh -c "nslookup www.google.com"
Error from server: error dialing backend: x509: certificate is valid for 127.0.0.1, not XXX.XXX.XXX.XXX
$ kubectl --request-timeout='10s' exec -n whatwhatwhy geo-box1-74ccd9cfbd-wj5d5 -- /bin/sh -c "nslookup www.google.com"
;; connection timed out; no servers could be reached
command terminated with exit code 1
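The four runs above show three distinct outcomes: the name resolves, the exec itself fails before nslookup runs, or the pod cannot reach the cluster DNS service at all. A rough sketch for sorting them apart when repeating the test across pods. classify_lookup is a hypothetical helper that only matches the message strings seen in this thread, so other failure modes are reported as unknown:

```shell
# classify_lookup (hypothetical helper): reads the combined output of a
# `kubectl exec ... nslookup` call on stdin and prints a one-word verdict.
classify_lookup() {
  out=$(cat)
  case "$out" in
    *"error dialing backend"*)       echo "exec-failed" ;;      # API server could not reach the kubelet
    *"no servers could be reached"*) echo "dns-unreachable" ;;  # pod could not reach the DNS server
    *"Name:"*)                       echo "resolved" ;;         # nslookup returned an answer
    *)                               echo "unknown" ;;
  esac
}

# Example use with the pods from this thread:
#   for pod in box0-5675664f8d-ljwvf box1-78c59894bc-5cztq \
#              box2-5d49b9d674-jv2pg geo-box1-74ccd9cfbd-wj5d5; do
#     result=$(kubectl --request-timeout='10s' exec -n whatwhatwhy "$pod" -- \
#                nslookup www.google.com 2>&1 | classify_lookup)
#     echo "$pod: $result"
#   done
```

Separating exec-failed from dns-unreachable matters here because the first points at the kubelet certificate on that node, while the second points at pod-to-pod traffic toward 10.43.0.10 not getting through from the worker.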