# k3s
a
c
that looks to me like you’ve got bad content in your resolv.conf, but I can’t tell what it is
n
This is in the resolv.conf of the k3s node. Or would it be in the pod?
```
nameserver 127.0.0.53
options edns0 trust-ad
search gmn.local
```
This is the one inside the pod:
```
search gmn.local
nameserver 10.10.1.10
nameserver 1.1.1.1
nameserver 9.9.9.9
```
When I exec into the pod (added a sleep command before it crashed) and try to curl the rancher instance, I get an SSL error. Is it possible the self signed cert is causing the issue?
```
cattle-cluster-agent-859976c8f-qxzrp:/var/lib/rancher # curl https://rancher.gmn.local/ping
curl: (60) SSL certificate problem: self signed certificate
More details here: https://curl.haxx.se/docs/sslcerts.html

curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.
```
It's odd that it worked fine before but doesn't now.
r
I've noticed that with newer Ubuntu; we fixed it by editing the CoreDNS config map to pass in valid upstream DNS servers. The 127.0.0.53 that the host OS uses is invalid inside a pod, since all localhost traffic within a pod goes to the pod itself. The weird thing is that on the same distro with the same DNS config, it sometimes works without tweaks, other times it needs the CoreDNS change, and occasionally it loses the setting and needs it set again (though we traced the losing it to something overly broad we were doing to the Prometheus config through Argo CD).
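For reference, the CoreDNS tweak described above usually comes down to the `forward` line in the Corefile held in the coredns config map. A minimal sketch, not a complete k3s Corefile, using the upstream IPs seen in the pod's resolv.conf earlier in the thread:

```
.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa
    # Forward everything non-cluster to real upstream resolvers, not the
    # host's 127.0.0.53 systemd-resolved stub, which is unreachable from
    # inside a pod (127.0.0.0/8 there is the pod's own loopback).
    forward . 10.10.1.10 1.1.1.1
    cache 30
}
```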
n
I added the entry to the CoreDNS config map under the `customdomains.db` key and then mounted that volume, which made resolution work 100% of the time, but for some reason after it resolves, it shows a `parse error`.
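For what it's worth, a `parse error` on a mounted custom zone file is often zone-file syntax (the `file` plugin requires a valid SOA record, for example). A sketch of a well-formed `customdomains.db` ConfigMap entry; the ConfigMap name and record values are illustrative, and the Corefile would also need a matching `file /path/customdomains.db gmn.local` line:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns-custom        # illustrative name
  namespace: kube-system
data:
  customdomains.db: |
    $TTL 60
    gmn.local.          IN SOA ns.gmn.local. admin.gmn.local. (2024010101 7200 3600 1209600 60)
    gmn.local.          IN NS  ns.gmn.local.
    ns.gmn.local.       IN A   10.10.1.10
    rancher.gmn.local.  IN A   10.10.1.10
```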
r
I don't have a setup handy to check, but in ours we just added another DNS server IP next to the existing ones in that part of the config.
n
I got this fixed. For some reason, after a reboot, the TrueNAS box on my LAN assumed the IP of the Rancher server that the cluster agent was trying to connect to, so connections sometimes got routed to the TrueNAS server instead of the Rancher server. I got here by manually mounting the SSL CA cert from Rancher into the cattle-cluster-agent and removing `CATTLE_CA_CHECKSUM` from the deployment config. Watching the logs, the agent reported that the SSL cert from the "rancher" server didn't match the one at the mounted path, which I knew was right. The CA cert it was getting from "rancher" was issued by XiSystems, which I recognized as coming from the TrueNAS server. Looking at TrueNAS's network interface config, it was also broadcasting on that IP, causing the overlap, even though I never configured it to have that IP. Once I removed it, the whole thing started working again.
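For anyone hitting something similar: you can check which CA actually signed the cert a server is presenting with `openssl`. A sketch; it generates a throwaway self-signed cert locally to stand in for the one on the wire, and the commented-out `s_client` line is what you'd run against the real endpoint:

```shell
# Create a throwaway self-signed cert with a recognizable issuer org
# (stand-in for whatever the server at that IP is actually presenting).
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout /tmp/key.pem -out /tmp/cert.pem \
  -subj "/CN=rancher.gmn.local/O=XiSystems"

# Against a live endpoint you would instead do:
#   openssl s_client -connect rancher.gmn.local:443 -servername rancher.gmn.local </dev/null 2>/dev/null \
#     | openssl x509 -noout -issuer

# Print the issuer of the cert; an unexpected issuer org here is a strong
# hint that a different box is answering on that IP.
openssl x509 -in /tmp/cert.pem -noout -issuer
```

If the issuer doesn't match the CA you expect, the next stop is checking which host actually owns the IP, as happened here.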
r
Congrats on the find. IP collisions are something we assume doesn't happen and usually don't check for, but they can certainly hose network operations.