# general
q
the rancher ui is at rancher.ourdomain.net
there are like a LOT of those logs
the error also shows up for different versions of the name, with a few of our internal domains appended. But it also shows an error for the domain that should work (rancher.ourdomain.net). Yet resolution works fine from any pod in the same cluster:
kubectl exec -i -t dnsutils -- nslookup rancher.ourdomain.net
Server:		10.152.183.10
Address:	10.152.183.10#53

Name:	rancher.ourdomain.net
Address: 172.20.16.21
coredns doesn't have any fancy configuration AFAIK. Mainly forwarding to our internal DNS server running outside the k8s clusters.
forward .  172.20.16.1
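(For reference, a CoreDNS Corefile of that shape, the kubernetes plugin plus a forward to the internal server, would look roughly like the sketch below. The plugin list is assumed; only the forward line is taken from the cluster above.)

.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    # anything that isn't a cluster name goes to the on-prem resolver
    forward . 172.20.16.1
    cache 30
    loop
    reload
}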
So multiple questions:
• why/what is happening here?
• how does it know about all those other zones like the GCP names? it's not in any config of the cluster.
• why is it even complaining about the domain that should work?
We also don't use IPv6, so the AAAA lookups are useless.
c
what do you mean “how does it know about them”? It doesn’t know anything. it just answers requests that clients make.
q
I mean they're related to rancher, so what I mean is: how does whatever rancher-related thing know about those?
c
also, you can’t stop clients from doing AAAA lookups, they are going to do whatever they want
q
those additional domains are in the /etc/resolv.conf of the hosts on GCP
Yeah I get that about the requests coming in. I guess my main question is what is doing those requests about rancher. It's only about rancher
c
so if you have ndots:5 it will check search domains for anything with less than 5 dots. rancher.ourdomain.net has only 2 dots, so it will check all your search domains before trying just rancher.ourdomain.net
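(To make the ndots point concrete, a pod's /etc/resolv.conf on a GCP node looks something like the sketch below; the search list is illustrative and the project domain is made up. With ndots:5, rancher.ourdomain.net is expanded against every search domain first, which is where the GCP-looking names in the CoreDNS logs come from.)

# hypothetical pod resolv.conf, search domains merged from the GCE host
search default.svc.cluster.local svc.cluster.local cluster.local c.my-project.internal google.internal
nameserver 10.152.183.10
options ndots:5

# the resolver therefore tries, in order, something like:
#   rancher.ourdomain.net.default.svc.cluster.local
#   rancher.ourdomain.net.svc.cluster.local
#   rancher.ourdomain.net.cluster.local
#   rancher.ourdomain.net.c.my-project.internal
#   rancher.ourdomain.net.google.internal
#   rancher.ourdomain.net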
q
right
c
172.20.16.1:53 is the google-provided upstream DNS server for your VPC. It is complaining that it is timing out when resolving that name via the google upstream.
q
that server is local on our on-premise network
it's accessible from all nodes on-prem or on GCP
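(A quick way to check that, assuming dig is available on the nodes: query the on-prem resolver directly from a GCP node and see whether it answers within a short timeout. The flags only shorten the wait; nothing here is specific to this cluster.)

# from a GCP node: ask 172.20.16.1 directly, one try, 2 second timeout
dig @172.20.16.1 rancher.ourdomain.net +tries=1 +time=2

# same query over TCP, to rule out UDP-specific problems on the link
dig @172.20.16.1 rancher.ourdomain.net +tcp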
c
OK, that's a weird hybrid setup, but ok. The requests from coredns to that upstream, whatever it is, are timing out. It's UDP, so maybe you're seeing UDP traffic get dropped?
q
yeah I agree it's unusual 🙂
I'll check if that's some issues with UDP.
though it happens only with the rancher name in the logs...
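(If the drops are intermittent, a single query won't show them; a repeated probe from wherever the coredns pods run might. A rough sketch; dig exits non-zero when it gets no reply, which is what the || branch catches.)

# run 100 quick lookups against the upstream and log any that time out
for i in $(seq 1 100); do
  dig @172.20.16.1 rancher.ourdomain.net +tries=1 +time=2 +short > /dev/null \
    || echo "attempt $i timed out"
  sleep 1
done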
c
why are you digging into this? is there an actual problem you’re trying to solve?
q
and I can make DNS queries from the nodes in GCP no problem
I have other possible transient DNS issues so I looked into the logs
c
welp
Are you only seeing errors if the coredns pods run on-prem, or ?
q
haven't tried to move the coredns pods on-prem
I'll add that to the list of tests. good idea
coredns runs on the GCP nodes, all kube-system stuff does unless it's a daemonset.
but I would expect to see a lot of errors, or even no DNS resolution working at all if that were a networking issue.
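(For that test, one option, assuming the on-prem nodes carry some distinguishing label, is to temporarily add a nodeSelector to the coredns deployment. The label name below is made up; substitute whatever your nodes actually have.)

# pin coredns to on-prem nodes (hypothetical label)
kubectl -n kube-system patch deployment coredns --type merge \
  -p '{"spec":{"template":{"spec":{"nodeSelector":{"example.com/location":"on-prem"}}}}}'

# undo afterwards
kubectl -n kube-system patch deployment coredns --type json \
  -p '[{"op":"remove","path":"/spec/template/spec/nodeSelector"}]'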
c
If you’re sending UDP DNS traffic from on-prem to cloud, or vice versa, it could be getting dropped when the link is busy
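(If UDP on the link does turn out to be the problem, the forward plugin can be told to talk TCP to the upstream. A sketch of that Corefile change, using the same upstream as above:)

# forward over TCP instead of UDP
forward . 172.20.16.1 {
    force_tcp
}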
q
I don't think it's a drop of UDP packets, specifically for rancher-related requests. But who knows 🙂 Thanks for helping bounce ideas. I'll dig more and report.
Now that I see the timestamps, it correlates with other network downtime events .... moving on 🙂