m

miniature-advantage-78722

10/24/2022, 9:50 PM
Every time I provision an RKE1 cluster via Rancher 2.6 on Harvester, the DNS is all screwy. I hate to ask here, but I can't find anything online to help. When I run anything in kubectl or helm, everything resolves to 52.128.23.153 and I get this error:
dial tcp 52.128.23.153:443: connect: connection refused
I have configured CoreDNS to use the upstream DNS servers 8.8.8.8 and 1.1.1.1, but names still won't resolve correctly. The underlying nodes hosting the cluster resolve fine, and running nslookup in a busybox pod on the cluster shows proper DNS lookups. I'm at a loss.
s

square-orange-60123

10/24/2022, 9:57 PM
is harvester running on baremetal?
m

miniature-advantage-78722

10/24/2022, 10:01 PM
Yup
Thanks for the time/help!
c

creamy-pencil-82913

10/24/2022, 11:26 PM
What is 52.128.23.153 in your environment? Do you have a wildcard DNS entry in one of your upstream DNS servers that causes everything to resolve to that address when your domain is in the search list and you're using your default DNS servers?
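One way to check for a wildcard entry is to resolve a random label that should not exist under the suspect domain. The snippet below is a minimal sketch, not a definitive test: `wildcard_target` is a hypothetical helper name, the domain passed in is whatever appears in your search list, and the `resolve` parameter exists only so the lookup can be swapped out.

```python
import socket
import uuid


def wildcard_target(domain, resolve=socket.gethostbyname):
    """Resolve a random label under `domain`.

    Returns the IP address if the made-up name resolves (which suggests a
    wildcard record or a resolver that rewrites answers), otherwise None.
    """
    probe = f"{uuid.uuid4().hex}.{domain}"  # random label, should not exist
    try:
        return resolve(probe)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


if __name__ == "__main__":
    # Assumption: replace with the domain from your own search list.
    domain = "sam.local.net"
    ip = wildcard_target(domain)
    if ip is None:
        print(f"no wildcard answer under {domain}")
    else:
        print(f"random name under {domain} resolved to {ip}")
```

If a random name resolves, any short name that gets the domain appended from the search list will resolve to the same address.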
m

miniature-advantage-78722

10/24/2022, 11:31 PM
I have absolutely no idea where that IP came from. I had never seen it before today. As for the wildcard, I'm pretty confident I don't have one, but I'm not opposed to checking a second time! Thanks again!
I modified the CoreDNS config to log the requests and got some info. Every logged DNS query has a domain appended to the requested name:
[INFO] 127.0.0.1:36983 - 26588 "HINFO IN 9043960556748274845.2860016264471597811. udp 57 false 512" NXDOMAIN qr,rd,ra 132 0.05304183s
[INFO] 10.42.1.2:40354 - 61034 "A IN git.rancher.io.cattle-system.svc.cluster.local. udp 75 false 1232" NXDOMAIN qr,aa,rd 157 0.000456681s
[INFO] 10.42.1.2:59781 - 3982 "AAAA IN git.rancher.io.cattle-system.svc.cluster.local. udp 75 false 1232" NXDOMAIN qr,aa,rd 157 0.000542648s
[INFO] 10.42.1.2:34838 - 24788 "AAAA IN git.rancher.io.svc.cluster.local. udp 61 false 1232" NXDOMAIN qr,aa,rd 143 0.000203592s
[INFO] 10.42.1.2:37789 - 41381 "A IN git.rancher.io.svc.cluster.local. udp 61 false 1232" NXDOMAIN qr,aa,rd 143 0.000313163s
[INFO] 10.42.1.2:40261 - 27939 "A IN git.rancher.io.sam.local.net. udp 58 false 1232" NOERROR qr,rd,ra 92 0.179959346s
[INFO] 10.42.1.2:50455 - 36080 "A IN git.rancher.io.sam.local.net. udp 47 false 512" NOERROR qr,aa,rd,ra 92 0.000212161s
[INFO] 10.42.1.2:50455 - 15346 "AAAA IN git.rancher.io.sam.local.net. udp 47 false 512" NOERROR qr,rd,ra 163 0.021115637s
[INFO] 10.42.1.2:57743 - 20876 "A IN git.rancher.io.svc.cluster.local. udp 61 false 1232" NXDOMAIN qr,aa,rd 143 0.000253306s
[INFO] 10.42.1.2:41940 - 24480 "AAAA IN git.rancher.io.cattle-system.svc.cluster.local. udp 64 false 512" NXDOMAIN qr,aa,rd 157 0.000228955s
[INFO] 10.42.1.2:41940 - 29786 "A IN git.rancher.io.cattle-system.svc.cluster.local. udp 64 false 512" NXDOMAIN qr,aa,rd 157 0.000342861s
[INFO] 10.42.5.3:54668 - 42473 "A IN releases.rancher.com.cattle-system.svc.cluster.local. udp 70 false 512" NXDOMAIN qr,aa,rd 163 0.000413685s
[INFO] 10.42.5.3:54383 - 61391 "A IN releases.rancher.com.sam.local.net. udp 53 false 512" NOERROR qr,rd,ra 104 0.122573234s
[INFO] 10.42.5.3:36324 - 42486 "AAAA IN releases.rancher.com.sam.local.net. udp 53 false 512" NOERROR qr,rd,ra 175 0.198445833s
sam.local.net is the upstream firewall's local domain. So maybe an upstream DNS issue, but weird nonetheless.
The local domain only gets appended when the command runs from the kubectl shell. If the DNS request comes from a pod, the name is correct:
[INFO] 10.42.7.5:49280 - 4 "A IN releases.rancher.com. udp 38 false 512" NOERROR qr,rd,ra 276 0.079005532s
[INFO] 10.42.7.5:37887 - 6 "PTR IN 107.224.156.108.in-addr.arpa. udp 46 false 512" NOERROR qr,rd,ra 133 0.015905996s
Command to check pod dns:
kubectl run -it --rm --restart=Never busybox --image=busybox:1.28 -- nslookup releases.rancher.com
This turned out to be the result of a bad domain name on the upstream OPNsense firewall. The domain name sam.local.net caused local.net to register as the base name for DNS queries. Just a dumb mistake that was old enough and obscure enough to cause issues.
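For anyone reading later, the query order in the CoreDNS logs above matches how a resolver walks the search list. The snippet below is a rough sketch of that expansion, not the actual resolver code. The search list and `ndots` value are assumptions based on the usual Kubernetes pod defaults (`ndots:5`, namespace domains first, then the node's own search domain), and `search_candidates` is a made-up helper name.

```python
def search_candidates(name, search, ndots=5):
    """Return the fully qualified names a stub resolver would try, in order.

    A name ending in "." is already absolute and is tried as-is. Otherwise,
    names with fewer than `ndots` dots go through the search list first and
    are tried as an absolute name only at the end.
    """
    if name.endswith("."):
        return [name]
    expanded = [f"{name}.{domain}." for domain in search]
    absolute = name + "."
    if name.count(".") >= ndots:
        return [absolute] + expanded
    return expanded + [absolute]


# Assumed search list for a pod in the cattle-system namespace on a node
# whose DHCP-supplied search domain is sam.local.net.
search = [
    "cattle-system.svc.cluster.local",
    "svc.cluster.local",
    "cluster.local",
    "sam.local.net",
]

for candidate in search_candidates("git.rancher.io", search):
    print(candidate)
```

The resolver stops at the first candidate that returns an answer. In the logs, the cluster-internal candidates return NXDOMAIN and the sam.local.net candidate returns NOERROR, so the real name git.rancher.io is never queried.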
c

creamy-pencil-82913

10/25/2022, 5:27 PM
there ya go