# k3s
a
c
that looks to me like you’ve got bad content in your resolv.conf, but I can’t tell what it is
n
This is in the resolv.conf of the k3s node. Or would it be in the pod?
```
nameserver 127.0.0.53
options edns0 trust-ad
search gmn.local
```
This is the one inside the pod:
```
search gmn.local
nameserver 10.10.1.10
nameserver 1.1.1.1
nameserver 9.9.9.9
```
When I exec into the pod (added a sleep command before it crashed) and try to curl the rancher instance, I get an SSL error. Is it possible the self signed cert is causing the issue?
```
cattle-cluster-agent-859976c8f-qxzrp:/var/lib/rancher # curl https://rancher.gmn.local/ping
curl: (60) SSL certificate problem: self signed certificate
More details here: https://curl.haxx.se/docs/sslcerts.html

curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.
```
It's odd that it worked fine before but doesn't now.
r
I've noticed that with newer Ubuntu; we fixed it by editing the CoreDNS config map to pass in valid upstream DNS servers. The 127.0.0.53 that the host OS uses is invalid inside a pod, since all localhost traffic within a pod goes to the pod itself. The weird thing is that on the same distro with the same DNS config, it sometimes works without tweaks, other times it needs the CoreDNS change, and occasionally it loses the setting and needs it set again (though we traced the losing it to something overly broad we were doing to the Prometheus config through Argo CD).
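For reference, the CoreDNS tweak described above usually comes down to the `forward` line in the Corefile held in the coredns config map. A minimal sketch, not a complete k3s Corefile, using the upstream IPs seen in the pod's resolv.conf earlier in the thread:

```
.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa
    # Forward everything non-cluster to real upstream resolvers, not the
    # host's 127.0.0.53 systemd-resolved stub, which is unreachable from
    # inside a pod (127.0.0.0/8 there is the pod's own loopback).
    forward . 10.10.1.10 1.1.1.1
    cache 30
}
```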
n
I added the entry to the CoreDNS config map under the `customdomains.db` key and then mounted that volume, which made resolution work 100% of the time, but for some reason after it resolves, it shows a `parse error`.
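For what it's worth, a `parse error` on a mounted custom zone file is often zone-file syntax (the `file` plugin requires a valid SOA record, for example). A sketch of a well-formed `customdomains.db` ConfigMap entry; the ConfigMap name and record values are illustrative, and the Corefile would also need a matching `file /path/customdomains.db gmn.local` line:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns-custom        # illustrative name
  namespace: kube-system
data:
  customdomains.db: |
    $TTL 60
    gmn.local.          IN SOA ns.gmn.local. admin.gmn.local. (2024010101 7200 3600 1209600 60)
    gmn.local.          IN NS  ns.gmn.local.
    ns.gmn.local.       IN A   10.10.1.10
    rancher.gmn.local.  IN A   10.10.1.10
```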
r
I don't have a setup handy to check, but in ours we just added another DNS server IP next to the existing ones in that part of the config.
n
I got this fixed. For some reason, after a reboot, the TrueNAS box on my LAN assumed the IP of the Rancher server that the cluster agent was trying to connect to, so connections sometimes got routed to the TrueNAS server instead of the Rancher server. I got here by manually mounting the SSL CA cert from Rancher into the cattle-cluster-agent and removing `CATTLE_CA_CHECKSUM` from the deployment config. Watching the logs, the agent reported that the SSL cert from the "rancher" server didn't match the one at the mounted path, which I knew was right. The CA cert it was getting from "rancher" was issued by XiSystems, which I recognized as coming from the TrueNAS server. Looking at TrueNAS's network interface config, it was also broadcasting on that IP, causing the overlap, even though I never configured it to have that IP. Once I removed it, the whole thing started working again.
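For anyone hitting something similar: you can check which CA actually signed the cert a server is presenting with `openssl`. A sketch; it generates a throwaway self-signed cert locally to stand in for the one on the wire, and the commented-out `s_client` line is what you'd run against the real endpoint:

```shell
# Create a throwaway self-signed cert with a recognizable issuer org
# (stand-in for whatever the server at that IP is actually presenting).
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout /tmp/key.pem -out /tmp/cert.pem \
  -subj "/CN=rancher.gmn.local/O=XiSystems"

# Against a live endpoint you would instead do:
#   openssl s_client -connect rancher.gmn.local:443 -servername rancher.gmn.local </dev/null 2>/dev/null \
#     | openssl x509 -noout -issuer

# Print the issuer of the cert; an unexpected issuer org here is a strong
# hint that a different box is answering on that IP.
openssl x509 -in /tmp/cert.pem -noout -issuer
```

If the issuer doesn't match the CA you expect, the next stop is checking which host actually owns the IP, as happened here.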
r
Congrats on the find. IP collisions are something we assume doesn't happen and usually don't check for, but they can certainly hose network operations.