This message was deleted Rancher Users #k3s


# k3s

adamant-kite-43734

04/04/2023, 6:47 AM

This message was deleted.

creamy-pencil-82913

04/04/2023, 7:18 AM

If you're talking about k3s, you should probably read the k3s docs: https://docs.k3s.io/installation/requirements#networking

nutritious-oxygen-89191

04/05/2023, 7:24 AM

Thanks, I went back to the k3s docs and resolved a couple of issues related to `firewalld` and `iptables`. Besides the DNS test mentioned above I also tried the [Overlay test](https://ranchermanager.docs.rancher.com/troubleshooting/other-troubleshooting-tips/networking) - both still fail. Is there a k3s specific troubleshooting guide that I can follow?

creamy-pencil-82913

04/05/2023, 9:58 AM

have you tried just disabling firewalld and any custom iptables drop/reject rules you have in place just to confirm that it’s not something in your FW config?
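
To confirm that nothing on a node is still filtering traffic, the firewall services can be checked directly. This is only a minimal sketch, assuming a systemd host where the units are named `firewalld` and `ufw`; the helper name `firewall_status` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the systemd state of the two common firewall services.
# Assumption: the host uses systemd and the unit names are firewalld and ufw.
firewall_status() {
  for svc in firewalld ufw; do
    # `systemctl is-active` prints the state and exits non-zero unless the
    # unit is active, so the exit status is ignored and only the text is kept.
    state=$(systemctl is-active "$svc" 2>/dev/null || true)
    echo "$svc: ${state:-unknown}"
  done
}

# Usage (run on every node):
#   firewall_status
```

Anything other than `inactive` for either service on any node would be worth ruling out before looking further.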

nutritious-oxygen-89191

04/05/2023, 11:26 AM

yep. I even did `apt purge iptables` to make sure. ufw is inactive. and it seems there is also an issue with the metrics-server which fails with

panic: failed to create listener: failed to listen on 0.0.0.0:10250: listen tcp 0.0.0.0:10250: bind: address already in use

something is wrong with my networking, but I am not sure where to start troubleshooting

creamy-pencil-82913

04/05/2023, 3:13 PM

Hmm, that is odd. Metrics-server doesn't run with host network so there shouldn't be anything conflicting with that port

nutritious-oxygen-89191

04/06/2023, 7:09 AM

I resolved the issue with the metrics server. for some reason it was set to `hostNetwork: true`. the DNS and overlay network issue persists
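
For context, the kubelet listens on port 10250 on every node by default, so a pod that runs with host networking and tries to bind the same port would hit the `address already in use` panic quoted earlier. A minimal sketch for checking whether a port is already taken on a host follows; it reads `ss -ltn` output from stdin, and the function name `port_is_listening` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given TCP port shows up as a local
# listening port in `ss -ltn` output supplied on stdin.
port_is_listening() {
  port="$1"
  # Field 4 of `ss -ltn` is "Local Address:Port"; compare the part after
  # the last colon so that 0.0.0.0:PORT, *:PORT and [::]:PORT all match.
  awk -v p="$port" '
    { n = split($4, a, ":"); if (a[n] == p) found = 1 }
    END { exit (found ? 0 : 1) }
  '
}

# Usage (on the node):
#   ss -ltn | port_is_listening 10250 && echo "10250 is already in use"
```

Whether a deployment requests host networking can be read with `kubectl -n kube-system get deployment metrics-server -o jsonpath='{.spec.template.spec.hostNetwork}'`, assuming the deployment is named `metrics-server` in `kube-system`.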

nutritious-oxygen-89191

04/06/2023, 7:46 AM

I tried to go through the `overlaytest` step by step and if I do

kubectl --request-timeout='10s' exec overlaytest-dbn8m -c overlaytest -- /bin/sh -c "ping -c2 gi-rm1 > /dev/null 2>&1"

I will get

Error from server: error dialing backend: x509: certificate is valid for 127.0.0.1, not 217.160.45.186
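
This error says that the certificate presented by the backend (for `kubectl exec` that is typically the kubelet on the target node) only lists 127.0.0.1, while the connection was made to 217.160.45.186. As a rough sketch, the two addresses can be pulled out of such a message for comparison across nodes; the function name `x509_mismatch` and the output format are invented here.

```shell
#!/bin/sh
# Hypothetical helper: read an x509 mismatch error on stdin and print the
# address(es) the certificate covers and the address that was dialed.
x509_mismatch() {
  sed -n 's/.*certificate is valid for \(.*\), not \([^ ]*\).*/valid_for=\1 dialed=\2/p'
}

# Usage:
#   kubectl exec some-pod -- true 2>&1 | x509_mismatch
```

To see which names a node's certificate actually carries, something like `openssl s_client -connect <node-ip>:10250 </dev/null 2>/dev/null | openssl x509 -noout -text` prints the certificate including its Subject Alternative Name section; `<node-ip>` is a placeholder and port 10250 assumes the default kubelet port.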

nutritious-oxygen-89191

04/06/2023, 7:56 AM

Independent of the overlay test I have created 4 pods (one on each node) based on the `jessie-dnsutils:1.3` image to run the ping command from there

nutritious-oxygen-89191

04/06/2023, 7:57 AM

my pods are called box0 etc.

kubectl --request-timeout='10s' exec -n whatwhatwhy box0-5675664f8d-ljwvf -- /bin/sh -c "ping -c2 gi-rm1"

nutritious-oxygen-89191

04/06/2023, 8:00 AM

ping by hostname `gi-rm1` fails with

PING gi-rm1 (XXX.XXX.XXX.XXX) 56(84) bytes of data.

--- gi-rm1 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1009ms

command terminated with exit code

ping by IP works for some nodes and fails for others. my guess is the name resolution does not work

nutritious-oxygen-89191

04/06/2023, 8:12 AM

correction. the three manager nodes are `gi-rm0`, `gi-rm1`, and `gi-rm2`. the worker is `geo-node1`.

ping by IP among the 3 manager nodes works, ping by name does not. ping from manager to worker node by IP fails, but works by name

kubectl --request-timeout='10s' exec -n whatwhatwhy box0-5675664f8d-ljwvf -- /bin/sh -c "ping -c2 geo-node1"

ping from worker to manager by name returns e.g. `unknown host gi-rm0` and fails with `100% packet loss` when using IP.
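
With four nodes and two ways of addressing each, the results are easier to compare as a matrix. The following is a minimal sketch, not a tested diagnostic: the function names `packet_loss` and `ping_matrix` are hypothetical, and it assumes the test pods have `ping` installed, as the `jessie-dnsutils` pods above do.

```shell
#!/bin/sh
# Hypothetical helper: print the packet-loss percentage found in ping output on stdin.
packet_loss() {
  sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Hypothetical helper: ping every target from every pod and print one line per pair.
# Arguments: namespace, space-separated pod names, space-separated targets.
ping_matrix() {
  ns="$1"; pods="$2"; targets="$3"
  for pod in $pods; do
    for target in $targets; do
      # Capture stderr too, so failures such as "unknown host" or an exec
      # error simply yield no statistics line instead of aborting the loop.
      loss=$(kubectl --request-timeout='10s' exec -n "$ns" "$pod" -- \
        ping -c2 "$target" 2>&1 | packet_loss)
      echo "$pod -> $target: loss=${loss:-unknown}"
    done
  done
}

# Usage:
#   ping_matrix whatwhatwhy "box0-5675664f8d-ljwvf" "gi-rm0 gi-rm1 gi-rm2 geo-node1"
```

A pair that reports `loss=unknown` never produced ping statistics, which covers both the name resolution failure and the case where `kubectl exec` itself fails.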

nutritious-oxygen-89191

04/06/2023, 8:45 AM

`nslookup` is like that:

$ kubectl --request-timeout='10s' exec -n whatwhatwhy box0-5675664f8d-ljwvf -- /bin/sh -c "nslookup www.google.com"
Server:		10.43.0.10
Address:	10.43.0.10#53

Non-authoritative answer:
Name:	www.google.com
Address: 142.250.185.164

$ kubectl --request-timeout='10s' exec -n whatwhatwhy box1-78c59894bc-5cztq -- /bin/sh -c "nslookup www.google.com"
Server:		10.43.0.10
Address:	10.43.0.10#53

Non-authoritative answer:
Name:	www.google.com
Address: 142.250.185.164

$ kubectl --request-timeout='10s' exec -n whatwhatwhy box2-5d49b9d674-jv2pg -- /bin/sh -c "nslookup www.google.com"
Error from server: error dialing backend: x509: certificate is valid for 127.0.0.1, not XXX.XXX.XXX.XXX

$ kubectl --request-timeout='10s' exec -n whatwhatwhy geo-box1-74ccd9cfbd-wj5d5 -- /bin/sh -c "nslookup www.google.com"
;; connection timed out; no servers could be reached

command terminated with exit code 1
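
The four runs above show three different outcomes: the lookup resolves, `kubectl exec` fails before reaching the pod, or the pod cannot reach the cluster DNS service at 10.43.0.10. A small sketch for sorting saved output into those buckets is below; the function name `classify_lookup` and the labels it prints are invented for illustration.

```shell
#!/bin/sh
# Hypothetical helper: classify the output of `kubectl exec ... nslookup` read from stdin.
classify_lookup() {
  out=$(cat)
  case "$out" in
    # The API server could not reach the node, so nslookup never ran.
    *"error dialing backend"*) echo "exec-failed" ;;
    # nslookup ran in the pod but got no reply from any DNS server.
    *"no servers could be reached"*) echo "dns-unreachable" ;;
    # An answer section with a Name: line means the lookup resolved.
    *"Name:"*) echo "resolved" ;;
    *) echo "other" ;;
  esac
}

# Usage:
#   kubectl exec -n whatwhatwhy <pod> -- nslookup www.google.com 2>&1 | classify_lookup
```

Separating `exec-failed` from `dns-unreachable` matters because the first says nothing about DNS inside the pod; it only shows that the exec connection to that node failed.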
