09/08/2022, 7:44 AM
Hello. Someone here deleted the cattle-system namespace (with the cluster agents), and now Rancher shows the cluster as unavailable. I found a site with some tips on how to recreate it, but it talks about executing commands on the control-plane nodes. In my case I use AKS clusters, so I only have the workload nodes. Any tips on what I can do to restore the Rancher agent in this scenario? (I don't have an etcd backup.)
OK, small update and progress: by using
curl -s -H "Authorization: Bearer xxxx" "https://zzz/v3/clusterregistrationtokens?clusterId=c-xxx" | jq -r '.data[] | .command'
I was able to see the YAML definition that should be deployed to the cluster, which I did. When I deployed it to the broken cluster I noticed that the
pods are created, but on the Rancher side I still see errors like:
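To make the lookup step above reproducible, here is a sketch of the same `jq` extraction run against a mocked API response, so it works without a live Rancher server. The token value, Rancher URL, and the manifest URL in the mock (`https://zzz/v3/import/abcdef.yaml`) are placeholders, not real endpoints; the response shape assumes the `/v3/clusterregistrationtokens` collection puts the import command in each item's `.command` field, as the real call did.

```shell
# Mocked /v3/clusterregistrationtokens response; the real one comes from
# curl with the Bearer token, as shown above.
response='{
  "data": [
    {
      "clusterId": "c-xxxxx",
      "command": "kubectl apply -f https://zzz/v3/import/abcdef.yaml"
    }
  ]
}'

# Same jq pipeline as above: pull the kubectl import command out of each item.
cmd=$(echo "$response" | jq -r '.data[] | .command')
echo "$cmd"
# → kubectl apply -f https://zzz/v3/import/abcdef.yaml
```

Running the printed `kubectl apply` command (or piping it into `sh`) is what deploys the agent manifest to the broken cluster.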
[ERROR] failed to start cluster controllers c-xxxxx: context canceled
So I compared the deployment from this import command against another (healthy) cluster and noticed a missing env entry:
  value: embedded-cluster-api=false,fleet=false,monitoringv1=false,multi-cluster-management=false,multi-cluster-management-agent=true,provisioningv2=false,rke2=false
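For reference, that value belongs to the `CATTLE_FEATURES` environment variable on the cattle-cluster-agent deployment in the cattle-system namespace. A sketch of the relevant fragment, with the feature list copied from above; the container name `cluster-register` and surrounding fields are my assumption of the usual agent layout, other fields elided:

```yaml
# Fragment of the cattle-cluster-agent Deployment (cattle-system namespace);
# only the container env section is shown.
spec:
  template:
    spec:
      containers:
        - name: cluster-register
          env:
            - name: CATTLE_FEATURES
              value: embedded-cluster-api=false,fleet=false,monitoringv1=false,multi-cluster-management=false,multi-cluster-management-agent=true,provisioningv2=false,rke2=false
```

Editing the live deployment (e.g. `kubectl -n cattle-system edit deploy cattle-cluster-agent`) and adding this entry triggers a rollout of the agent pods.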
So I added it to my broken cluster as well, and now the odd thing: my cluster is no longer grayed out on the list, so I can click on it, but the label is still Unavailable. In the Rancher logs I also saw:
[ERROR] error syncing 'c-xxxxx': handler cluster-deploy: Unauthorized, requeuing
It's odd, because I can see the pods and resource metrics in the Rancher UI, but something is still missing. Any clue what else could be misconfigured from the Rancher side?
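A hedged troubleshooting sketch for the `Unauthorized` above: that error usually points at the agent's registration credentials rather than the workloads, so it's worth checking the redeployed agent itself (commands are generic kubectl, to be run against the broken AKS cluster):

```shell
# Are the agent pods actually running after the re-import?
kubectl -n cattle-system get pods

# Does the agent itself also log 401/Unauthorized against the Rancher server?
kubectl -n cattle-system logs deploy/cattle-cluster-agent --tail=50
```

If the agent logs show the same 401s, re-applying the import manifest from the clusterregistrationtokens command should recreate the agent's token secret.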
Huh, I see that also a namespace called … is missing,
so more stuff to restore 😕