adamant-kite-43734
04/04/2023, 6:47 AM
creamy-pencil-82913
04/04/2023, 7:18 AM
nutritious-oxygen-89191
04/05/2023, 7:24 AM
firewalld and iptables. Besides the DNS test mentioned above I also tried the [Overlay test](https://ranchermanager.docs.rancher.com/troubleshooting/other-troubleshooting-tips/networking) - both still fail. Is there a k3s-specific troubleshooting guide that I can follow?
creamy-pencil-82913
04/05/2023, 9:58 AM
nutritious-oxygen-89191
04/05/2023, 11:26 AM
apt purge iptables to make sure. ufw is inactive. and it seems there is also an issue with the metrics-server, which fails with
panic: failed to create listener: failed to listen on 0.0.0.0:10250: listen tcp 0.0.0.0:10250: bind: address already in use
something is wrong with my networking, but I am not sure where to start troubleshooting
creamy-pencil-82913
04/05/2023, 3:13 PM
nutritious-oxygen-89191
04/06/2023, 7:09 AM
hostnetwork: true. the DNS and overlay network issue persists
nutritious-oxygen-89191
04/06/2023, 7:46 AM
overlaytest step by step, and if I do
kubectl --request-timeout='10s' exec overlaytest-dbn8m -c overlaytest -- /bin/sh -c "ping -c2 gi-rm1 > /dev/null 2>&1"
I get
Error from server: error dialing backend: x509: certificate is valid for 127.0.0.1, not 217.160.45.186
nutritious-oxygen-89191
04/06/2023, 7:56 AM
jessie-dnsutils:1.3 image to run the ping command from there
nutritious-oxygen-89191
04/06/2023, 7:57 AM
kubectl --request-timeout='10s' exec -n whatwhatwhy box0-5675664f8d-ljwvf -- /bin/sh -c "ping -c2 gi-rm1"
nutritious-oxygen-89191
04/06/2023, 8:00 AM
gi-rm1 fails with
PING gi-rm1 (XXX.XXX.XXX.XXX) 56(84) bytes of data.
--- gi-rm1 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1009ms
command terminated with exit code
ping by IP works for some nodes and fails for others. my guess is that name resolution does not work
nutritious-oxygen-89191
04/06/2023, 8:12 AM
gi-rm0, gi-rm1, and gi-rm2. the worker is geo-node1. ping by IP among the 3 manager nodes works, ping by name does not. ping from manager to worker node by IP fails, but works by name:
kubectl --request-timeout='10s' exec -n whatwhatwhy box0-5675664f8d-ljwvf -- /bin/sh -c "ping -c2 geo-node1"
ping from worker to manager by name returns e.g. unknown host gi-rm0, and fails with 100% packet loss when using IP.
nutritious-oxygen-89191
04/06/2023, 8:45 AM
nslookup is like that:
$ kubectl --request-timeout='10s' exec -n whatwhatwhy box0-5675664f8d-ljwvf -- /bin/sh -c "nslookup www.google.com"
Server: 10.43.0.10
Address: 10.43.0.10#53
Non-authoritative answer:
Name: www.google.com
Address: 142.250.185.164
$ kubectl --request-timeout='10s' exec -n whatwhatwhy box1-78c59894bc-5cztq -- /bin/sh -c "nslookup www.google.com"
Server: 10.43.0.10
Address: 10.43.0.10#53
Non-authoritative answer:
Name: www.google.com
Address: 142.250.185.164
$ kubectl --request-timeout='10s' exec -n whatwhatwhy box2-5d49b9d674-jv2pg -- /bin/sh -c "nslookup www.google.com"
Error from server: error dialing backend: x509: certificate is valid for 127.0.0.1, not XXX.XXX.XXX.XXX
$ kubectl --request-timeout='10s' exec -n whatwhatwhy geo-box1-74ccd9cfbd-wj5d5 -- /bin/sh -c "nslookup www.google.com"
;; connection timed out; no servers could be reached
command terminated with exit code 1
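The per-pod nslookup checks above could be repeated in one pass with a small loop. This is only a rough sketch, not a k3s or Rancher tool: the function names `classify` and `check_pods` and the labels `exec-x509`, `dns-unreachable`, `ok`, and `unknown` are made up for illustration, and the patterns are simply the three kinds of output quoted above.

```shell
#!/bin/sh
# Map the combined stdout/stderr of one `kubectl exec ... nslookup` run
# to a short label. Labels are hypothetical; patterns come from the
# outputs quoted in this thread.
classify() {
  case "$1" in
    *"x509: certificate is valid for"*) echo "exec-x509" ;;        # exec itself failed
    *"no servers could be reached"*)    echo "dns-unreachable" ;;  # pod cannot reach 10.43.0.10
    *"Name:"*)                          echo "ok" ;;               # nslookup returned an answer
    *)                                  echo "unknown" ;;
  esac
}

# Run the same nslookup in every given pod of one namespace and print
# "<pod><TAB><label>" per pod. Usage: check_pods <namespace> <pod>...
check_pods() {
  ns=$1
  shift
  for pod in "$@"; do
    out=$(kubectl --request-timeout='10s' exec -n "$ns" "$pod" -- \
      /bin/sh -c "nslookup www.google.com" 2>&1) || true
    printf '%s\t%s\n' "$pod" "$(classify "$out")"
  done
}
```

For example, `check_pods whatwhatwhy box0-5675664f8d-ljwvf box1-78c59894bc-5cztq box2-5d49b9d674-jv2pg geo-box1-74ccd9cfbd-wj5d5`. Given the outputs quoted above, box0 and box1 would be labelled `ok`, box2 `exec-x509`, and geo-box1 `dns-unreachable`, which separates pods where the exec call fails from pods that cannot reach the cluster DNS.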