
future-fountain-82544

05/24/2023, 1:57 PM
I'm having problems getting a K3s cluster (1.23.6+k3s1) connected to a Rancher (2.7.1) instance. It was previously attached to an older Rancher instance, was removed from that instance using the Rancher UI, and was then added to the new Rancher instance (via the UI). When adding it to the new instance, everything seems to go smoothly except the ClusterAgent never fully connects. When I look at the logs for the cluster-agent pod, this is what I see during the initial few starts:
INFO: https://REDACTED/ping is accessible
INFO: REDACTED resolves to REDACTED
time="2023-05-24T13:45:38Z" level=info msg="Listening on /tmp/log.sock"
time="2023-05-24T13:45:38Z" level=info msg="Rancher agent version v2.7.1 is starting"
time="2023-05-24T13:45:38Z" level=fatal msg="looking up cattle-system/cattle ca/token: failed to find service account cattle-system/cattle: serviceaccounts \"cattle\" is forbidden: User \"system:serviceaccount:cattle-system:cattle\" cannot get resource \"serviceaccounts\" in API group \"\" in the namespace \"cattle-system\""
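The fatal line above is an RBAC denial: the `cattle` service account is not allowed to read service accounts in `cattle-system`. A minimal sketch for reproducing that check from outside the pod, assuming admin credentials for the downstream K3s cluster; the function name `check_sa_access` is made up for illustration:

```shell
# Assumption: kubectl is configured with admin access to the downstream K3s cluster.
SA="system:serviceaccount:cattle-system:cattle"

check_sa_access() {
  # "kubectl auth can-i" prints yes/no and exits non-zero on "no";
  # --as impersonates the service account named in the log line.
  kubectl auth can-i get serviceaccounts --namespace cattle-system --as "$SA"
}

# Only run the check when kubectl is on PATH.
if command -v kubectl >/dev/null 2>&1; then
  check_sa_access || echo "cattle service account cannot read serviceaccounts in cattle-system"
fi
```

If this keeps answering "no" after the agent manifests have been applied, the role bindings for the `cattle` service account are a reasonable place to look next.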
After a few pod crash/restarts, it calms down and I see this:
time="2023-05-24T13:46:50Z" level=info msg="Listening on /tmp/log.sock"
time="2023-05-24T13:46:50Z" level=info msg="Rancher agent version v2.7.1 is starting"
time="2023-05-24T13:46:50Z" level=info msg="Connecting to wss://REDACTED/v3/connect/register with token starting with [REDACTED]"
time="2023-05-24T13:46:50Z" level=info msg="Connecting to proxy" url="wss://REDACTED/v3/connect/register"
When I enable debug logging in the pod, I see an occasional "Wrote ping" message, but not much else. Any ideas on where to start looking?
I was able to successfully shell into the cattle-agent pod and
curl -ki https://REDACTED/ping
to the Rancher instance.
I noted that I had to add the -k to accept the certificate (even though it is a valid certificate) and I think it's because the container image may not have CA certs installed. I have the same problem using curl against https://google.com, so I think this is a red herring, but I figured I'd mention it anyway. We have no TLS decryption proxying thing going on. This is direct internet access.
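The guess that the image ships without CA certs can be checked directly from a shell in the pod. A rough sketch, assuming the usual Linux bundle locations (the list below is an assumption about where a bundle would live, not something confirmed for this image); `find_ca_bundle` is a hypothetical helper name:

```shell
# Common CA bundle locations on Linux distributions (assumed, not image-specific).
CA_BUNDLES="/etc/ssl/certs/ca-certificates.crt /etc/pki/tls/certs/ca-bundle.crt /etc/ssl/ca-bundle.pem /etc/pki/tls/cacert.pem /etc/ssl/cert.pem"

find_ca_bundle() {
  # Print the first non-empty bundle found; return 1 if none exists.
  for f in $CA_BUNDLES; do
    if [ -s "$f" ]; then
      echo "$f"
      return 0
    fi
  done
  return 1
}

find_ca_bundle || echo "no CA bundle found; curl will need -k or --cacert"
```

If no bundle turns up, needing `-k` for every HTTPS site (including google.com) would be expected and unrelated to the Rancher certificate itself.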

eager-nightfall-87875

05/24/2023, 2:57 PM
Since this was moved from one Rancher cluster to another, could it be possible that the k3s cluster somehow did not get an updated cert from the cert-manager of the new Rancher cluster? Some legacy configs may still be there, preventing the k3s cluster from joining appropriately. I will look deeper into it if I have time, but this might be a good jumping-off point for you to look into.
So looking further into this, unfortunately, this is not supported (yet). I use "yet" loosely here because there has been an open feature request on this for quite some time: https://github.com/rancher/rancher/issues/16471 The Rancher documentation also hints that this would be a no-go: https://ranchermanager.docs.rancher.com/faq/rancher-is-no-longer-needed

future-fountain-82544

05/24/2023, 3:16 PM
I actually somehow made it work, but I'm not sure

eager-nightfall-87875

05/24/2023, 3:16 PM
lol, good for you!
I am sure there are a bunch of people who would like to know how you made it work, given it's still an open issue on the Rancher GitHub.
Glad to have been no help at all 😀

future-fountain-82544

05/24/2023, 4:11 PM
I tried a few things in various combinations and I'm not sure which combination worked:
• Deleted the cluster in the new Rancher
• Added the cluster in the new Rancher
• Manually deleted the cattle-system namespace
• Ran the cluster cleanup, which didn't seem to work in any case. Several times I ran it with an invalid version (i.e. 2.7.1 instead of v2.7.1), but at any rate it just spit out a lot of errors. I'd clean up with
curl $yaml_url | kubectl delete -f -
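The 2.7.1 versus v2.7.1 mix-up mentioned above is easy to guard against before running the cleanup. A small sketch; `normalize_version` is a made-up helper, and how the cleanup consumes the version tag is not stated in this thread:

```shell
normalize_version() {
  # Prefix a bare version such as 2.7.1 with "v"; leave v2.7.1 untouched.
  case "$1" in
    v*) echo "$1" ;;
    *)  echo "v$1" ;;
  esac
}

# Hypothetical variable name; pass the result to whatever step expects the tag.
RANCHER_VERSION=$(normalize_version "2.7.1")
echo "$RANCHER_VERSION"
```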