# rke2
b
Hello Clark, you can run the following commands to find out what's going on:
systemctl status rancher-system-agent.service
journalctl -f -u rancher-system-agent.service
Once you see the logs and the status, you'll have a better idea of what's going on with your installation.
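As a minimal sketch, those checks might look like this on the node (the `--since` window and the `grep` filter are illustrative additions, not part of the advice above):

```
# Check whether the agent service is running and how it last exited
systemctl status rancher-system-agent.service

# Follow the agent's journal live while reproducing the problem
journalctl -f -u rancher-system-agent.service

# Assumption: narrow the journal to recent error/fatal entries to spot the failing step
journalctl -u rancher-system-agent.service --since "30 min ago" | grep -iE 'error|fatal'
```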
f
I already reviewed this log. Is the problem at the line I marked in italics?
error while connecting to Kubernetes cluster: the server has asked for the client to provide credenti
```
Sep 14 16:43:23 zhjw-master-01 rancher-system-agent[1330]: time="2023-09-14T16:43:23+08:00" level=info msg="Starting remote watch of plans"
Sep 14 16:43:23 zhjw-master-01 rancher-system-agent[1330]: time="2023-09-14T16:43:23+08:00" level=fatal msg="error while connecting to Kubernetes cluster: the server has asked for the client to provide credenti>
Sep 14 16:43:28 zhjw-master-01 rancher-system-agent[1337]: time="2023-09-14T16:43:28+08:00" level=info msg="Rancher System Agent version v0.3.3 (9e827a5) is starting"
Sep 14 16:43:28 zhjw-master-01 rancher-system-agent[1337]: time="2023-09-14T16:43:28+08:00" level=info msg="Using directory /var/lib/rancher/agent/work for work"
Sep 14 16:43:28 zhjw-master-01 rancher-system-agent[1337]: time="2023-09-14T16:43:28+08:00" level=debug msg="Instantiated new image utility with imagesDir: /var/lib/rancher/agent/images, imageCredentialProvider>
Sep 14 16:43:28 zhjw-master-01 rancher-system-agent[1337]: time="2023-09-14T16:43:28+08:00" level=info msg="Starting remote watch of plans"
Sep 14 16:43:28 zhjw-master-01 rancher-system-agent[1337]: time="2023-09-14T16:43:28+08:00" level=fatal msg="error while connecting to Kubernetes cluster: the server has asked for the client to provide credenti>
Sep 14 16:43:32 zhjw-master-01 rancher-system-agent[1483]: time="2023-09-14T16:43:32+08:00" level=info msg="Rancher System Agent version v0.3.3 (9e827a5) is starting"
Sep 14 16:43:32 zhjw-master-01 rancher-system-agent[1483]: time="2023-09-14T16:43:32+08:00" level=info msg="Using directory /var/lib/rancher/agent/work for work"
Sep 14 16:43:32 zhjw-master-01 rancher-system-agent[1483]: time="2023-09-14T16:43:32+08:00" level=info msg="Starting remote watch of plans"
Sep 14 16:43:32 zhjw-master-01 rancher-system-agent[1483]: E0914 16:43:32.913442    1483 memcache.go:206] couldn't get resource list for management.cattle.io/v3:
Sep 14 16:43:32 zhjw-master-01 rancher-system-agent[1483]: time="2023-09-14T16:43:32+08:00" level=info msg="Starting /v1, Kind=Secret controller"
```
If this is the problem, what can I do to fix it?
b
Hello Clark, can you try installing kubectl on the node and check whether the cluster components are coming up: kubectl get all --all-namespaces
If the pods are coming up, you can then run kubectl logs pod-name -n namespace to get the logs from the pods and debug further.
I was at a similar point earlier: on running the Rancher server registration command, my RKE2 nodes would not connect to the Rancher server, and I needed to make the following changes to the cluster settings on creation to connect the nodes successfully:
1. I changed the container network from Calico to Canal
2. For the cloud provider, I needed to choose Default - RKE2 Embedded
3. CIS Profile: None
Once I created the cluster with the above three settings, my node connected to the cluster.
Hope this helps.
Thanks and best regards,
Santosh
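A minimal sketch of the kubectl checks Santosh describes (the pod name and namespace below are placeholders, not values from this cluster):

```
# List every resource in every namespace to see which components came up
kubectl get all --all-namespaces

# Assumption: placeholder pod/namespace; substitute a pod that is not Running
kubectl logs <pod-name> -n <namespace>

# Describing the pod can also surface scheduling or image-pull events
kubectl describe pod <pod-name> -n <namespace>
```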
f
I have re-created the cluster using the options you suggested, but the new cluster still does not start up. Below is the log from the Rancher server cluster:
```
2023/09/15 01:47:16 [DEBUG] [etcd-backup] checking backups for cluster [local]
2023/09/15 01:47:16 [DEBUG] [etcd-backup] [local] is not an rke cluster, skipping..
2023/09/15 01:47:16 [DEBUG] [etcd-backup] [local] is not an rke cluster, skipping..
2023/09/15 01:47:16 [DEBUG] [etcd-backup] checking backups for cluster [c-m-wc7wcln6]
2023/09/15 01:47:16 [DEBUG] [etcd-backup] [c-m-wc7wcln6] is not an rke cluster, skipping..
2023/09/15 01:47:16 [DEBUG] [etcd-backup] [c-m-wc7wcln6] is not an rke cluster, skipping..
2023/09/15 01:47:19 [DEBUG] Wrote ping
2023/09/15 01:47:19 [DEBUG] Wrote ping
2023/09/15 01:47:23 [DEBUG] DesiredSet - No change(2) /v1, Kind=ServiceAccount fleet-default/custom-64d51b4a4a8a-machine-bootstrap for rke-bootstrap fleet-default/custom-64d51b4a4a8a
2023/09/15 01:47:23 [DEBUG] DesiredSet - No change(2) /v1, Kind=ServiceAccount fleet-default/custom-64d51b4a4a8a-machine-plan for rke-bootstrap fleet-default/custom-64d51b4a4a8a
2023/09/15 01:47:23 [DEBUG] DesiredSet - No change(2) /v1, Kind=Secret fleet-default/custom-64d51b4a4a8a-machine-bootstrap for rke-bootstrap fleet-default/custom-64d51b4a4a8a
2023/09/15 01:47:23 [DEBUG] DesiredSet - No change(2) /v1, Kind=Secret fleet-default/custom-64d51b4a4a8a-machine-plan for rke-bootstrap fleet-default/custom-64d51b4a4a8a
2023/09/15 01:47:23 [DEBUG] DesiredSet - No change(2) rbac.authorization.k8s.io/v1, Kind=Role fleet-default/custom-64d51b4a4a8a-machine-plan for rke-bootstrap fleet-default/custom-64d51b4a4a8a
2023/09/15 01:47:23 [DEBUG] DesiredSet - No change(2) rbac.authorization.k8s.io/v1, Kind=RoleBinding fleet-default/custom-64d51b4a4a8a-machine-plan for rke-bootstrap fleet-default/custom-64d51b4a4a8a
2023/09/15 01:47:23 [DEBUG] Extras returned map[principalid:[system://provisioning/fleet-default/zhjw local://u-vrtps3d7wq] username:[]]
2023/09/15 01:47:23 [DEBUG] Triggering auth refresh on u-vrtps3d7wq
2023/09/15 01:47:23 [DEBUG] Searching for providerID for selector rke.cattle.io/machine=15ad0257-4d1a-4f52-a25f-17c9855fd4cf in cluster fleet-default/zhjw, machine custom-64d51b4a4a8a: {"Code":{"Code":"Forbidden","Status":403},"Message":"clusters.management.cattle.io \"c-m-wc7wcln6\" is forbidden: User \"u-vrtps3d7wq\" cannot get resource \"clusters\" in API group \"management.cattle.io\" at the cluster scope","Cause":null,"FieldName":""} (get nodes)
2023/09/15 01:47:23 [DEBUG] Skipping refresh for system-user u-vrtps3d7wq
2023/09/15 01:47:24 [DEBUG] Wrote ping
2023/09/15 01:47:24 [DEBUG] Wrote ping
2023/09/15 01:47:25 [DEBUG] [CAPI] Cannot retrieve CRD with metadata only client, falling back to slower listing
2023/09/15 01:47:25 [DEBUG] [CAPI] Cannot retrieve CRD with metadata only client, falling back to slower listing
2023/09/15 01:47:25 [DEBUG] [CAPI] Infrastructure provider is not ready, requeuing
2023/09/15 01:47:25 [DEBUG] [CAPI] Cannot reconcile Machine's Node, no valid ProviderID yet
```
b
Hello Clark,
Apart from the server settings, this could be an error related to connectivity from the node to the Rancher server on port 6443. Can you check the logs on the node we are trying to connect, to see why it is not able to reach the cluster?
To do that, try installing kubectl on the node and check whether the cluster components are coming up. If the agent and RKE2 came up on the node, the following command will list all the running components in the cluster with their statuses: kubectl get all --all-namespaces
If the pods are coming up, you can run kubectl logs pod-name -n namespace to get the logs from the pods and debug further. You can also check whether the networking pods came up on the node.
Finally, check basic connectivity from the node to the server (ping, telnet, firewall); that may also provide a few pointers (see the sketch after this message).
Also, on the node, are you still getting this error?
```
Sep 14 16:43:23 zhjw-master-01 rancher-system-agent[1330]: time="2023-09-14T16:43:23+08:00" level=fatal msg="error while connecting to Kubernetes cluster: the server has asked for the client to provide credenti
```
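A minimal sketch of the connectivity checks suggested above (the server hostname is a placeholder, and port 6443 is taken from Santosh's message; your Rancher server may listen on a different port):

```
# Basic reachability from the node to the Rancher server
ping -c 4 rancher.example.com

# Test the TCP port (nc is a common stand-in for telnet)
nc -zv rancher.example.com 6443

# Check whether a local firewall is filtering traffic (firewalld example)
sudo firewall-cmd --list-all
```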
f
Thank you, Santosh! I have solved this problem; it was my own mistake, I misunderstood the official tutorial. 😓 I only registered one node with the Control Plane, Etcd roles, expecting that node to become active, and then went to register a new node with Worker. The correct steps should be:
1. register a node with Control Plane, Etcd
2. register a node with Worker
3. wait for these two nodes to become active
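For illustration, the two registrations Clark describes correspond to running Rancher's generated registration command on each node with the matching role flags. This is a sketch only: the URL and token are placeholders, and the exact command (including any checksum or label flags) should be copied from the Rancher UI for your cluster.

```
# Node 1: etcd + control plane roles (URL and token are placeholders)
curl -fL https://rancher.example.com/system-agent-install.sh | sudo sh -s - \
  --server https://rancher.example.com --token <registration-token> \
  --etcd --controlplane

# Node 2: worker role only
curl -fL https://rancher.example.com/system-agent-install.sh | sudo sh -s - \
  --server https://rancher.example.com --token <registration-token> \
  --worker
```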
b
Great, glad to hear it 🙂