# rke2
b
Hello Clark, you can run the following commands to find out what's going on:
systemctl status rancher-system-agent.service
journalctl -f -u rancher-system-agent.service
Once you see the logs and the status, you'll have a better idea of what's going on with your installation.
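As a minimal sketch, those checks might look like this on the node (the `--since` window and the `grep` filter are illustrative additions, not part of the advice above):

```
# Check whether the agent service is running and how it last exited
systemctl status rancher-system-agent.service

# Follow the agent's journal live while reproducing the problem
journalctl -f -u rancher-system-agent.service

# Assumption: narrow the journal to recent error/fatal entries to spot the failing step
journalctl -u rancher-system-agent.service --since "30 min ago" | grep -iE 'error|fatal'
```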
f
I already reviewed this log. Is the problem at the line I marked in italics?
error while connecting to Kubernetes cluster: the server has asked for the client to provide credenti
```
Sep 14 16:43:23 zhjw-master-01 rancher-system-agent[1330]: time="2023-09-14T16:43:23+08:00" level=info msg="Starting remote watch of plans"
Sep 14 16:43:23 zhjw-master-01 rancher-system-agent[1330]: time="2023-09-14T16:43:23+08:00" level=fatal msg="error while connecting to Kubernetes cluster: the server has asked for the client to provide credenti>
Sep 14 16:43:28 zhjw-master-01 rancher-system-agent[1337]: time="2023-09-14T16:43:28+08:00" level=info msg="Rancher System Agent version v0.3.3 (9e827a5) is starting"
Sep 14 16:43:28 zhjw-master-01 rancher-system-agent[1337]: time="2023-09-14T16:43:28+08:00" level=info msg="Using directory /var/lib/rancher/agent/work for work"
Sep 14 16:43:28 zhjw-master-01 rancher-system-agent[1337]: time="2023-09-14T16:43:28+08:00" level=debug msg="Instantiated new image utility with imagesDir: /var/lib/rancher/agent/images, imageCredentialProvider>
Sep 14 16:43:28 zhjw-master-01 rancher-system-agent[1337]: time="2023-09-14T16:43:28+08:00" level=info msg="Starting remote watch of plans"
Sep 14 16:43:28 zhjw-master-01 rancher-system-agent[1337]: time="2023-09-14T16:43:28+08:00" level=fatal msg="error while connecting to Kubernetes cluster: the server has asked for the client to provide credenti>
Sep 14 16:43:32 zhjw-master-01 rancher-system-agent[1483]: time="2023-09-14T16:43:32+08:00" level=info msg="Rancher System Agent version v0.3.3 (9e827a5) is starting"
Sep 14 16:43:32 zhjw-master-01 rancher-system-agent[1483]: time="2023-09-14T16:43:32+08:00" level=info msg="Using directory /var/lib/rancher/agent/work for work"
Sep 14 16:43:32 zhjw-master-01 rancher-system-agent[1483]: time="2023-09-14T16:43:32+08:00" level=info msg="Starting remote watch of plans"
Sep 14 16:43:32 zhjw-master-01 rancher-system-agent[1483]: E0914 16:43:32.913442    1483 memcache.go:206] couldn't get resource list for management.cattle.io/v3:
Sep 14 16:43:32 zhjw-master-01 rancher-system-agent[1483]: time="2023-09-14T16:43:32+08:00" level=info msg="Starting /v1, Kind=Secret controller"
```
If this is the problem, what can I do to fix it?
b
Hello Clark, can you try installing kubectl on the node and check whether the cluster components are coming up: kubectl get all --all-namespaces
If the pods are coming up, you can then run kubectl logs pod-name -n namespace to get the logs from the pods and debug further.
I was at a similar point earlier: on running the Rancher server registration command, my RKE2 nodes would not connect to the Rancher server, and I needed to make the following changes to the cluster settings on creation to connect the nodes successfully:
1. I changed the container network from Calico to Canal
2. For the cloud provider, I needed to choose Default - RKE2 Embedded
3. CIS Profile: None
Once I created the cluster with the above three settings, my node connected to the cluster.
Hope this helps.
Thanks and best regards,
Santosh
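A minimal sketch of the kubectl checks Santosh describes (the pod name and namespace below are placeholders, not values from this cluster):

```
# List every resource in every namespace to see which components came up
kubectl get all --all-namespaces

# Assumption: placeholder pod/namespace; substitute a pod that is not Running
kubectl logs <pod-name> -n <namespace>

# Describing the pod can also surface scheduling or image-pull events
kubectl describe pod <pod-name> -n <namespace>
```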
f
I have re-created the cluster using the options you suggested, but the new cluster still does not start up. Below is the log from the Rancher server cluster:
```
2023/09/15 01:47:16 [DEBUG] [etcd-backup] checking backups for cluster [local]
2023/09/15 01:47:16 [DEBUG] [etcd-backup] [local] is not an rke cluster, skipping..
2023/09/15 01:47:16 [DEBUG] [etcd-backup] [local] is not an rke cluster, skipping..
2023/09/15 01:47:16 [DEBUG] [etcd-backup] checking backups for cluster [c-m-wc7wcln6]
2023/09/15 01:47:16 [DEBUG] [etcd-backup] [c-m-wc7wcln6] is not an rke cluster, skipping..
2023/09/15 01:47:16 [DEBUG] [etcd-backup] [c-m-wc7wcln6] is not an rke cluster, skipping..
2023/09/15 01:47:19 [DEBUG] Wrote ping
2023/09/15 01:47:19 [DEBUG] Wrote ping
2023/09/15 01:47:23 [DEBUG] DesiredSet - No change(2) /v1, Kind=ServiceAccount fleet-default/custom-64d51b4a4a8a-machine-bootstrap for rke-bootstrap fleet-default/custom-64d51b4a4a8a
2023/09/15 01:47:23 [DEBUG] DesiredSet - No change(2) /v1, Kind=ServiceAccount fleet-default/custom-64d51b4a4a8a-machine-plan for rke-bootstrap fleet-default/custom-64d51b4a4a8a
2023/09/15 01:47:23 [DEBUG] DesiredSet - No change(2) /v1, Kind=Secret fleet-default/custom-64d51b4a4a8a-machine-bootstrap for rke-bootstrap fleet-default/custom-64d51b4a4a8a
2023/09/15 01:47:23 [DEBUG] DesiredSet - No change(2) /v1, Kind=Secret fleet-default/custom-64d51b4a4a8a-machine-plan for rke-bootstrap fleet-default/custom-64d51b4a4a8a
2023/09/15 01:47:23 [DEBUG] DesiredSet - No change(2) rbac.authorization.k8s.io/v1, Kind=Role fleet-default/custom-64d51b4a4a8a-machine-plan for rke-bootstrap fleet-default/custom-64d51b4a4a8a
2023/09/15 01:47:23 [DEBUG] DesiredSet - No change(2) rbac.authorization.k8s.io/v1, Kind=RoleBinding fleet-default/custom-64d51b4a4a8a-machine-plan for rke-bootstrap fleet-default/custom-64d51b4a4a8a
2023/09/15 01:47:23 [DEBUG] Extras returned map[principalid:[system://provisioning/fleet-default/zhjw local://u-vrtps3d7wq] username:[]]
2023/09/15 01:47:23 [DEBUG] Triggering auth refresh on u-vrtps3d7wq
2023/09/15 01:47:23 [DEBUG] Searching for providerID for selector rke.cattle.io/machine=15ad0257-4d1a-4f52-a25f-17c9855fd4cf in cluster fleet-default/zhjw, machine custom-64d51b4a4a8a: {"Code":{"Code":"Forbidden","Status":403},"Message":"clusters.management.cattle.io \"c-m-wc7wcln6\" is forbidden: User \"u-vrtps3d7wq\" cannot get resource \"clusters\" in API group \"management.cattle.io\" at the cluster scope","Cause":null,"FieldName":""} (get nodes)
2023/09/15 01:47:23 [DEBUG] Skipping refresh for system-user u-vrtps3d7wq
2023/09/15 01:47:24 [DEBUG] Wrote ping
2023/09/15 01:47:24 [DEBUG] Wrote ping
2023/09/15 01:47:25 [DEBUG] [CAPI] Cannot retrieve CRD with metadata only client, falling back to slower listing
2023/09/15 01:47:25 [DEBUG] [CAPI] Cannot retrieve CRD with metadata only client, falling back to slower listing
2023/09/15 01:47:25 [DEBUG] [CAPI] Infrastructure provider is not ready, requeuing
2023/09/15 01:47:25 [DEBUG] [CAPI] Cannot reconcile Machine's Node, no valid ProviderID yet
```
b
Hello Clark,
Apart from the server settings, this could be an error related to connectivity from the node to the Rancher server on port 6443. Can you check the logs on the node we are trying to connect, to see why it is not able to reach the cluster?
To do that, try installing kubectl on the node and check whether the cluster components are coming up. If the agent and RKE2 came up on the node, the following command will list all the running components in the cluster with their statuses: kubectl get all --all-namespaces
If the pods are coming up, you can run kubectl logs pod-name -n namespace to get the logs from the pods and debug further. You can also check whether the networking pods came up on the node.
Finally, check basic connectivity from the node to the server (ping, telnet, firewall); that may also provide a few pointers (see the sketch after this message).
Also, on the node, are you still getting this error?
```
Sep 14 16:43:23 zhjw-master-01 rancher-system-agent[1330]: time="2023-09-14T16:43:23+08:00" level=fatal msg="error while connecting to Kubernetes cluster: the server has asked for the client to provide credenti
```
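A minimal sketch of the connectivity checks suggested above (the server hostname is a placeholder, and port 6443 is taken from Santosh's message; your Rancher server may listen on a different port):

```
# Basic reachability from the node to the Rancher server
ping -c 4 rancher.example.com

# Test the TCP port (nc is a common stand-in for telnet)
nc -zv rancher.example.com 6443

# Check whether a local firewall is filtering traffic (firewalld example)
sudo firewall-cmd --list-all
```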
f
Thank you, Santosh! I have solved this problem; it was my own mistake, I misunderstood the official tutorial. 😓 I only registered one node with the Control Plane, Etcd roles, expecting that node to become active, and then went to register a new node with Worker. The correct steps should be:
1. register a node with Control Plane, Etcd
2. register a node with Worker
3. wait for these two nodes to become active
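For illustration, the two registrations Clark describes correspond to running Rancher's generated registration command on each node with the matching role flags. This is a sketch only: the URL and token are placeholders, and the exact command (including any checksum or label flags) should be copied from the Rancher UI for your cluster.

```
# Node 1: etcd + control plane roles (URL and token are placeholders)
curl -fL https://rancher.example.com/system-agent-install.sh | sudo sh -s - \
  --server https://rancher.example.com --token <registration-token> \
  --etcd --controlplane

# Node 2: worker role only
curl -fL https://rancher.example.com/system-agent-install.sh | sudo sh -s - \
  --server https://rancher.example.com --token <registration-token> \
  --worker
```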
b
Great, glad to hear it 🙂