# general
c
Hi all, I'm trying to start up a Custom RKE2 cluster from Rancher 2.11.3. After running the registration script, `rancher-system-agent.service` starts and runs without any error. However, RKE2 never gets installed. On the Rancher UI the node shows as `Waiting for Node Ref`. The debug log on the Rancher server shows something related to the node:
```
[DEBUG] Searching for providerID for selector rke.cattle.io/machine=acef2fd1-52a8-4f86-a56e-168d6ab5896e in cluster fleet-default/proxmox, machine custom-390116fc9c53: {"Code":{"Code":"Forbidden","Status":403},"Message":"clusters.management.cattle.io \"c-m-bbm7s6lm\" is forbidden: User \"u-nxbq6gtuep\" cannot get resource \"clusters\" in API group \"management.cattle.io\" at the cluster scope","Cause":null,"FieldName":""} (get nodes)
```
I tried deleting the node, cleaning it, and rerunning the registration script. After repeating that sequence multiple times it eventually worked once (RKE2 got installed and the node changed state and started working). It appears to be random. Could anyone point me to how I can debug this further? I tried turning on DEBUG mode on `rancher-system-agent` as well, but nothing in its output seems related. Thanks
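In case it helps, this is roughly how I enabled debug logging and followed the agent on the node (assuming the standard systemd install created by the registration script; the env file path is the one documented for rancher-system-agent and may differ on other setups):

```bash
# Enable debug logging for rancher-system-agent
# (documented env file location; adjust if your install uses a different path)
echo 'CATTLE_LOGLEVEL=debug' | sudo tee -a /etc/systemd/system/rancher-system-agent.env

# Restart the agent and follow its logs while the registration runs
sudo systemctl restart rancher-system-agent.service
sudo journalctl -u rancher-system-agent.service -f
```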
s
Getting a "forbidden" on a Kubernetes API call means your user has no permission to execute the operation. As from the log, User "u-nxbq6gtuep" (probably a technical aut-generated user) has no rights to execute the equivalent of
kubectl get clusters
. My first bet is that maybe between the cleans something leftover remained in Rancher (old credentials, certificates belong to the cleaned cluster etc) and it tries to use that on the new instance. I would try to wipe the node (drop the vm disks, save vm, re-open settings and add new disk, start install). Dropping and re-adding the disk ensures they get a new UUID and a clean MBR / EFI, so zero chance of matching anything before. I usually create a snapshot of such vms before the first boot to avoid this procedure and have a guaranteed clean rollback state. After the node wipe and reinstall, i would go through Rancher with a microscope scanning for any leftovers, and after the cleanup i'd re-try the RKE2 cluster creation.
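Something like this, run with a kubeconfig for the Rancher local cluster, is what I mean by scanning for leftovers (the object names below are taken from your log and the usual Rancher CRDs; adjust namespaces and names to your setup):

```bash
# Provisioning cluster and CAPI machine objects for the custom cluster
kubectl get clusters.provisioning.cattle.io -n fleet-default
kubectl get machines.cluster.x-k8s.io -n fleet-default

# The auto-generated user from the error message, and any bindings referencing it
kubectl get users.management.cattle.io u-nxbq6gtuep -o yaml
kubectl get clusterrolebindings -o wide | grep u-nxbq6gtuep

# Check whether that user can read the management cluster object
# (requires impersonation rights for the kubeconfig you are using)
kubectl auth can-i get clusters.management.cattle.io --as=u-nxbq6gtuep
```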
c
Thank you for the pointer. I had the same thought and tried reinstalling on a clean VM (from a clean snapshot). It also fails with the same error. What puzzles me is that when I apply the same steps on the same node:
• Delete the node from Rancher
• Run the cleanup according to https://ranchermanager.docs.rancher.com/how-to-guides/new-user-guides/manage-clusters/clean-cluster-nodes?k8s-distro=RKE2#cleaning-up-nodes (roughly the sequence sketched at the end of this message)
• Onboard it again
it works after a few tries. If I then repeat it one more time, it fails... Do you know where I can look for the user `u-nxbq6gtuep` in the Rancher local K8s cluster (I guess)?
Also, I tried deleting the cluster and retrying; the user shown in the log changes.
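For reference, this is roughly the cleanup I run between attempts, following the doc linked above (script names as installed by RKE2 and rancher-system-agent on my node; paths and the exact directory list may differ from the doc):

```bash
# Uninstall the Rancher system agent and RKE2 using the scripts their installers drop in place
sudo /usr/local/bin/rancher-system-agent-uninstall.sh
sudo /usr/local/bin/rke2-uninstall.sh

# Remove leftover state directories from the cleanup doc
# (rke2-uninstall.sh may already have removed some of these)
sudo rm -rf /etc/rancher /var/lib/rancher /var/lib/kubelet /etc/cni /var/lib/cni /opt/cni
```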