# general
q
the rancher ui is at rancher.ourdomain.net
there are like a LOT of those logs
the error also shows up for different versions of the name, with a few of our internal domains appended. But it also shows an error for the domain that should work (rancher.ourdomain.net). Yet resolution works fine from any pod in the same cluster:
kubectl exec -i -t dnsutils -- nslookup rancher.ourdomain.net
Server:		10.152.183.10
Address:	10.152.183.10#53

Name:	rancher.ourdomain.net
Address: 172.20.16.21
coredns doesn't have any fancy configuration AFAIK. Mainly forwarding to our internal DNS server running outside the k8s clusters.
forward .  172.20.16.1
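(For reference, a CoreDNS Corefile of that shape, the kubernetes plugin plus a forward to the internal server, would look roughly like the sketch below. The plugin list is assumed; only the forward line is taken from the cluster above.)

.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    # anything that isn't a cluster name goes to the on-prem resolver
    forward . 172.20.16.1
    cache 30
    loop
    reload
}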
So multiple questions:
• why/what is happening here?
• how does it know about all those other zones like the GCP names? it's not in any config of the cluster.
• why is it even complaining about the domain that should work?
We also don't use IPv6, so the AAAA lookups are useless.
c
what do you mean “how does it know about them”? It doesn’t know anything. it just answers requests that clients make.
q
I mean they're related to rancher, so what I mean is: how does whatever rancher-related thing know about those?
c
also, you can’t stop clients from doing AAAA lookups, they are going to do whatever they want
q
those additional domains are in the /etc/resolv.conf of the hosts on GCP
Yeah I get that about the requests coming in. I guess my main question is what is doing those requests about rancher. It's only about rancher
c
so if you have ndots:5 it will check search domains for anything with less than 5 dots. rancher.ourdomain.net has only 2 dots, so it will check all your search domains before trying just rancher.ourdomain.net
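(To make the ndots point concrete, a pod's /etc/resolv.conf on a GCP node looks something like the sketch below; the search list is illustrative and the project domain is made up. With ndots:5, rancher.ourdomain.net is expanded against every search domain first, which is where the GCP-looking names in the CoreDNS logs come from.)

# hypothetical pod resolv.conf, search domains merged from the GCE host
search default.svc.cluster.local svc.cluster.local cluster.local c.my-project.internal google.internal
nameserver 10.152.183.10
options ndots:5

# the resolver therefore tries, in order, something like:
#   rancher.ourdomain.net.default.svc.cluster.local
#   rancher.ourdomain.net.svc.cluster.local
#   rancher.ourdomain.net.cluster.local
#   rancher.ourdomain.net.c.my-project.internal
#   rancher.ourdomain.net.google.internal
#   rancher.ourdomain.net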
q
right
c
172.20.16.1:53 is the google-provided upstream DNS server for your VPC. It is complaining that it is timing out when resolving that name via the google upstream.
q
that server is local on our on-premise network
it's accessible from all nodes on-prem or on GCP
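(A quick way to check that, assuming dig is available on the nodes: query the on-prem resolver directly from a GCP node and see whether it answers within a short timeout. The flags only shorten the wait; nothing here is specific to this cluster.)

# from a GCP node: ask 172.20.16.1 directly, one try, 2 second timeout
dig @172.20.16.1 rancher.ourdomain.net +tries=1 +time=2

# same query over TCP, to rule out UDP-specific problems on the link
dig @172.20.16.1 rancher.ourdomain.net +tcp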
c
OK, that's a weird hybrid setup, but ok. The requests from coredns to that upstream, whatever it is, are timing out. It's UDP, so maybe you're seeing UDP traffic get dropped?
q
yeah I agree it's unusual 🙂
I'll check if that's some issues with UDP.
though it happens only with the rancher name in the logs...
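(If the drops are intermittent, a single query won't show them; a repeated probe from wherever the coredns pods run might. A rough sketch; dig exits non-zero when it gets no reply, which is what the || branch catches.)

# run 100 quick lookups against the upstream and log any that time out
for i in $(seq 1 100); do
  dig @172.20.16.1 rancher.ourdomain.net +tries=1 +time=2 +short > /dev/null \
    || echo "attempt $i timed out"
  sleep 1
done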
c
why are you digging into this? is there an actual problem you’re trying to solve?
q
and I can make DNS queries from the nodes in GCP no problem
I have other possible transient DNS issues so I looked into the logs
c
welp
Are you only seeing errors if the coredns pods run on-prem, or ?
q
haven't tried to move the coredns pods on-prem
I'll add that to the list of tests. good idea
coredns runs on the GCP nodes, all kube-system stuff does unless it's a daemonset.
but I would expect to see a lot of errors, or even no DNS resolution working at all if that were a networking issue.
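(For that test, one option, assuming the on-prem nodes carry some distinguishing label, is to temporarily add a nodeSelector to the coredns deployment. The label name below is made up; substitute whatever your nodes actually have.)

# pin coredns to on-prem nodes (hypothetical label)
kubectl -n kube-system patch deployment coredns --type merge \
  -p '{"spec":{"template":{"spec":{"nodeSelector":{"example.com/location":"on-prem"}}}}}'

# undo afterwards
kubectl -n kube-system patch deployment coredns --type json \
  -p '[{"op":"remove","path":"/spec/template/spec/nodeSelector"}]'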
c
If you’re sending UDP DNS traffic from on-prem to cloud, or vice versa, it could be getting dropped when the link is busy
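(If UDP on the link does turn out to be the problem, the forward plugin can be told to talk TCP to the upstream. A sketch of that Corefile change, using the same upstream as above:)

# forward over TCP instead of UDP
forward . 172.20.16.1 {
    force_tcp
}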
q
I don't think it's a drop of UDP packets, specifically for rancher-related requests. But who knows 🙂 Thanks for helping bounce ideas. I'll dig more and report.
Now that I see the timestamps, it correlates with other network downtime events .... moving on 🙂