
bright-jordan-61721

10/03/2022, 3:28 PM
Recently built a k3s cluster with version v1.24.6+k3s1, and I have some pods configured with dnsPolicy: ClusterFirst (which is the default) and am noticing weird DNS resolution problems. When I shell into a pod with this DNS policy and cat /etc/resolv.conf, this is what I see:
bash-5.1# cat /etc/resolv.conf
search default.svc.cluster.local svc.cluster.local cluster.local [home search domains redacted]
nameserver 10.43.0.10
options ndots:5
I believe ndots:5 is causing the problem, as ping github.com fails due to DNS resolution, but ping github.com. (with a trailing dot) works. Why is k3s setting the ndots:5 option by default? I'm not setting this with the pod's dnsConfig at all. If this option were removed or reduced to ndots:1 it would likely solve my issue.
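For reference, ndots can be overridden per pod through the dnsConfig field of the Pod spec. The following is a minimal sketch in Python that builds such a manifest and prints it as JSON. The pod name dns-test, the image alpine:3.16, and the helper pod_with_ndots are hypothetical choices for illustration, not anything from this cluster.

```python
import json


def pod_with_ndots(name: str, image: str, ndots: int) -> dict:
    """Build a minimal Pod manifest that overrides ndots via spec.dnsConfig."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "dnsPolicy": "ClusterFirst",
            "dnsConfig": {
                # Option values are strings in the Pod API, hence str().
                "options": [{"name": "ndots", "value": str(ndots)}],
            },
            "containers": [
                # Hypothetical container that just sleeps so we can exec into it.
                {"name": name, "image": image, "command": ["sleep", "3600"]},
            ],
        },
    }


# Hypothetical pod name and image, chosen only for the example.
manifest = pod_with_ndots("dns-test", "alpine:3.16", 1)
print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and applied with kubectl apply -f, since kubectl accepts JSON manifests as well as YAML.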

bland-account-99790

10/03/2022, 4:27 PM
ndots:5 is a Kubernetes default, i.e. it is not set by k3s. You can change it, though: https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pod-dns-config
However, it should work with ndots:5. Can you check that github.com + your search domains does not exist? Check:
• github.com.default.svc.cluster.local
• github.com.svc.cluster.local
• etc.
When you try github.com. (with the trailing dot) you are telling your OS that the string is already an FQDN, so it does not append any search domain.
I suspect github.com.$OneOfYourSearchDomains [home search domains redacted] returns an IP from a server which does not reply to pings.
Something similar happened to this user: https://github.com/k3s-io/k3s/issues/5045
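To make the search-domain expansion concrete, here is a rough Python sketch of the order in which a stub resolver might try names, following the ndots description in resolv.conf(5). Real resolver libraries differ in the details, so treat this as an approximation. The function candidate_names is hypothetical, and home.example stands in for the redacted home search domains.

```python
from __future__ import annotations


def candidate_names(name: str, search: list[str], ndots: int) -> list[str]:
    """Return the fully qualified names to try, in order."""
    if name.endswith("."):
        # A trailing dot marks the name as already fully qualified.
        return [name]
    searched = [f"{name}.{domain}." for domain in search]
    absolute = name + "."
    if name.count(".") >= ndots:
        # Enough dots: try the name as-is first, then the search list.
        return [absolute] + searched
    # Too few dots: walk the search list first, then try the name as-is.
    return searched + [absolute]


search = [
    "default.svc.cluster.local",
    "svc.cluster.local",
    "cluster.local",
    "home.example",  # placeholder for the redacted home search domains
]

for fqdn in candidate_names("github.com", search, ndots=5):
    print(fqdn)
```

With ndots=5, github.com has only one dot, so every search domain is tried before the bare name. With a trailing dot, only github.com. itself is queried.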

kind-nightfall-56861

10/03/2022, 7:44 PM
I cannot explain that behaviour, but I have had a somewhat similar experience, where pings would simply fail most of the time and sometimes succeed, albeit with massive latency. It turned out that my DNS pod wasn't running properly, alongside many other system pods that were in a malfunctioning state, so I deleted them all and let k3s recreate them, which resolved the issue.

bright-jordan-61721

10/04/2022, 3:15 AM
ok so I get this:
bash-5.1# nslookup github.com
Server:         10.43.0.10
Address:        10.43.0.10#53

*** Can't find github.com.private.home.jtcressy.net: No answer
but this makes no sense, except that there are exactly 5 dots in the FQDN it stopped at, and this is the first search domain after cluster.local.
Hmm, I think the problem is that some DNS server somewhere up my chain is returning NOERROR for github.com.private.home.jtcressy.net instead of NXDOMAIN, since when I dig github.com.svc.cluster.local it gives me NXDOMAIN, and this must be the response the resolver needs in order to move on to the next search domain.
LOL, so the problem is that I had CAA records with wildcards, e.g. *.home.jtcressy.net in CAA.
As soon as I deleted them I got NXDOMAIN instead of NOERROR. I hate DNS!
My base domain, jtcressy.net, is hosted in clouddns, and instead of CNAME records shadowing things, it was CAA records.
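The failure mode can be illustrated with a toy model in Python. It assumes a resolver that walks the search list and stops at the first response that is not NXDOMAIN, which matches what was observed in this pod but is not guaranteed for every resolver library. The functions classify and resolve are hypothetical, and no real DNS queries are made.

```python
from __future__ import annotations


def classify(fqdn: str, a_records: set[str], wildcard_parents: list[str]) -> str:
    """Classify an A-record query against a toy zone model."""
    if fqdn in a_records:
        return "ANSWER"
    # A wildcard record of another type (here CAA) makes every name under
    # its parent exist, so an A query returns NOERROR with no answer.
    if any(fqdn.endswith("." + parent) for parent in wildcard_parents):
        return "NODATA"
    return "NXDOMAIN"


def resolve(candidates: list[str], a_records: set[str],
            wildcard_parents: list[str]) -> tuple[str | None, str]:
    """Try candidates in order; only NXDOMAIN moves on to the next one."""
    for fqdn in candidates:
        status = classify(fqdn, a_records, wildcard_parents)
        if status != "NXDOMAIN":
            return fqdn, status
    return None, "NXDOMAIN"


# Names tried for "github.com" with ndots:5 and the search list from the thread.
candidates = [
    "github.com.default.svc.cluster.local.",
    "github.com.svc.cluster.local.",
    "github.com.cluster.local.",
    "github.com.private.home.jtcressy.net.",
    "github.com.",
]
a_records = {"github.com."}

# With the wildcard CAA record in place, the search stops early with no answer.
print(resolve(candidates, a_records, ["home.jtcressy.net."]))
# After deleting the wildcard, the search falls through to github.com.
print(resolve(candidates, a_records, []))
```

In this model the wildcard under home.jtcressy.net turns what would be an NXDOMAIN into an empty NOERROR response, so the search never reaches the bare github.com. name.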

kind-nightfall-56861

10/04/2022, 7:25 AM
Nice, good to hear you managed to resolve it 😄

bland-account-99790

10/04/2022, 12:27 PM
Yes! Good work @bright-jordan-61721, DNS can be complex