# rke2
c
This sounds like an issue with Rancher, not with RKE2?
That said…
ERROR: <https://rancher75182.senode.dev/ping> is not accessible (Could not resolve host: rancher75182.senode.dev)
This hostname suggests that you are using a private DNS zone for your Rancher server. Can you confirm that the resolv.conf file on your nodes is properly configured to point at it? Check the RKE2 logs for a message about using 8.8.8.8 instead of your private DNS server.
Host resolv.conf includes loopback or multicast nameservers - kubelet will use autogenerated resolv.conf with nameserver 8.8.8.8
s
yes. I can ping the rancher server from all nodes on the downstream cluster
c
^^ it'll look like that
make sure that your resolv.conf doesn’t point at multicast or loopback resolvers
If you can’t resolve private hostnames from within pods, RKE2 falling back to 8.8.8.8 is the most likely reason why
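A rough way to check a node's resolv.conf for this, as a sketch only: the function name is made up for illustration, and it tests just the two conditions named in the log message (loopback or multicast), which may not match RKE2's actual check exactly.

```python
import ipaddress


def suspicious_nameservers(resolv_conf_text):
    """Return nameserver entries that are loopback or multicast addresses.

    Hypothetical helper: it mirrors the wording of the RKE2 log message,
    not necessarily the exact rule RKE2 applies.
    """
    flagged = []
    for raw_line in resolv_conf_text.splitlines():
        # Drop comments (resolv.conf allows '#' or ';') and whitespace.
        line = raw_line.split("#", 1)[0].split(";", 1)[0].strip()
        fields = line.split()
        if len(fields) < 2 or fields[0] != "nameserver":
            continue
        try:
            addr = ipaddress.ip_address(fields[1])
        except ValueError:
            continue  # not a plain IP address; ignored in this sketch
        if addr.is_loopback or addr.is_multicast:
            flagged.append(fields[1])
    return flagged
```

Passing it the contents of /etc/resolv.conf from a node should return an empty list if nothing there would trigger the fallback; an entry such as 127.0.0.53 would be flagged.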
s
Mine only has a nameserver entry pointing to an IP address.
Everything else is commented out.
Even rebooting the cluster VMs on a working cluster causes the same issue and the cluster never comes back up again. Same pods go into a CrashLoop
c
Is the name server not reachable from within pods? What do the coredns pod logs say?
s
The coredns pods have the right nameserver and can reach it from there. I am unable to exec into the pods which are in a crashloop.
It seems bizarre that a working cluster is unable to establish a connection with Rancher after the cluster VMs were rebooted.
c
So just to confirm, the coredns pods can resolve the rancher server address but the rancher agent pods cannot? How did you test that it can be resolved by coredns?
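For what it's worth, a minimal sketch of a resolution-only test, so that a DNS failure is not confused with a connectivity or certificate failure. It assumes Python is available wherever it is run (the coredns and agent images are not assumed to ship it), and the function name is hypothetical.

```python
import socket


def resolve(hostname):
    """Return the sorted unique IPs the local resolver gives for hostname.

    Returns an empty list when resolution fails, which corresponds to
    curl's "Could not resolve host" case.
    """
    try:
        infos = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

Calling `resolve("rancher75182.senode.dev")` from the same network namespace as the failing pod and getting an empty list would point at DNS rather than at Rancher itself.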
s
Recreated a fresh cluster and collected more data. It seems that on cluster creation the cluster-agent pods come up successfully even though they are unable to ping Rancher. The cluster also connects to Rancher. One has an error on the certificate and the other cannot reach it. After rebooting the cluster, the cluster-agent pods go into a crashloop.
also from my local machine or VMs, I don't see any issues with rancher ping:
curl <https://rancher75182.senode.dev/ping>
pong
Also, to answer your question regarding coredns: yes. I exec into coredns and curl google.com or my Rancher instance and it works, but it does not work from within the cluster-agents.
coredns IPs are 10.42.247.129 and 10.42.213.131, but the cluster-agent resolv.conf has a nameserver 10.43.0.10. Is that correct?
c
yes, that is the dns service address. Can you show me how specifically you are testing resolution from within the coredns pods?
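To illustrate why the two kinds of address differ, here is a small sketch that assumes the RKE2 default ranges (10.42.0.0/16 for pods, 10.43.0.0/16 for services). Those CIDRs are an assumption about this cluster, not something confirmed in the thread.

```python
import ipaddress

# Assumed RKE2 defaults; this cluster may be configured differently.
POD_CIDR = ipaddress.ip_network("10.42.0.0/16")
SERVICE_CIDR = ipaddress.ip_network("10.43.0.0/16")


def classify(ip):
    """Label an address as a pod IP, a service (ClusterIP) address, or other."""
    addr = ipaddress.ip_address(ip)
    if addr in POD_CIDR:
        return "pod"
    if addr in SERVICE_CIDR:
        return "service"
    return "other"
```

Under those assumptions, 10.42.247.129 and 10.42.213.131 classify as pod IPs while 10.43.0.10 classifies as a service address, which is what a pod's resolv.conf is expected to point at.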
s
Just doing a curl as shown above.
c
You said you were doing that from the local machine or VM, not inside the coredns pod
s
Both. I also created a new cluster on the same server as Rancher and it seems to work fine. I am able to restart the cluster and it comes back up. I suspect it’s some networking issue on the Harvester server on which the cluster fails to come up
After a restart
c
hmm