# vsphere
a
First I'd check the `rancher-system-agent` logs from the provisioned node.
b
Hi @agreeable-oil-87482, thanks for the prompt reply. Should the logs be located under `/var/log/rancher/` or somewhere else? Under the `/etc/rancher/agent/` directory I do not have any relevant files. Sorry for the silly question and thanks for your help.
a
Depends on the OS, but if it's using systemd: `journalctl -u rancher-system-agent`
b
Okay, the rancher-system-agent does not seem to exist, thus no logs. Is there something else I could check? This is the current state of my cluster: it is basically stuck there with no further changes. I am definitely missing something here.
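A quick check that the agent was ever installed at all, assuming the default install locations:
```
# If provisioning got far enough to install the agent, both of these exist
# (paths are the defaults used by the Rancher install script):
ls -l /usr/local/bin/rancher-system-agent
systemctl list-unit-files | grep rancher-system-agent
```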
a
Has the hostname changed for the node?
b
Indeed, the hostname was updated and I have an IP address assigned from the DHCP server. I am also able to execute cloud_config code blocks (defined in the Rancher UI under cloud_config).
I also tested the websocket connection to the Rancher instance. It works fine.
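For the connectivity check, Rancher's public /ping endpoint is handy; x.x.x.x stands in for the Rancher host:
```
# Rancher answers "pong" on /ping; -k skips verification for a self-signed cert
curl -sk https://x.x.x.x/ping
```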
a
If you look in the user-data file from cloud-init, it will reference writing and running a script. Can you try running that manually?
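A sketch of where to find it, assuming standard cloud-init paths (the exact script name may differ on your image):
```
# The user-data cloud-init applied on this boot:
sudo cat /var/lib/cloud/instance/user-data.txt
# Scripts cloud-init wrote from it land here; run the bootstrap one by hand:
ls /var/lib/cloud/instance/scripts/
sudo /var/lib/cloud/instance/scripts/runcmd
```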
b
Okay, so I believe the execution of the script does not work due to a certificate issue:
```
2023-08-03 11:50:19,894 - subp.py[DEBUG]: Running command ['/var/lib/cloud/instance/scripts/runcmd'] with allowed return codes [0] (shell=False, capture=False)
[INFO] --no-roles flag passed, unsetting all other requested roles
[INFO] Using default agent configuration directory /etc/rancher/agent
[INFO] Using default agent var directory /var/lib/rancher/agent
[INFO] Determined CA is necessary to connect to Rancher
[INFO] Successfully downloaded CA certificate
[INFO] Value from <https://x.x.x.x/cacerts> is an x509 certificate
[ERROR] Configured cacerts checksum (xxx) does not match given --ca-checksum (xxx)
[ERROR] Please check if the correct certificate is configured at <https://x.x.x.x/cacerts>
2023-08-03 11:50:20,010 - subp.py[DEBUG]: Unexpected error while running command.
```
Okay, at least I have a pointer for further troubleshooting steps. The Rancher instance uses a self-signed cert at the moment.
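For context, the --ca-checksum the script compares against should simply be the SHA-256 of the PEM served at /cacerts, so it can be recomputed by hand (x.x.x.x is the placeholder from the log above):
```
# Hash the CA bundle Rancher currently serves; the result should match the
# --ca-checksum baked into the registration command:
curl --insecure -sfL https://x.x.x.x/cacerts | sha256sum | awk '{print $1}'
```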
a
Run something like `openssl s_client -showcerts -connect YourRancherURL.com:443` from a node other than the one Rancher is running on and see which cert is being presented.
b
Hey @agreeable-oil-87482, thanks a lot for the hint. For anyone having similar issues: on Ubuntu 22.04 the openssl command is `openssl s_client -showcerts -connect IP_ADDR:PORT`. I was able to re-run the user-data file from cloud-init manually, and the `rancher-system-agent` is now there. Below is the output from the journalctl command:
```
Aug 03 15:32:16 rke2-self-sign-pool1-3aca6d04-4vx4m systemd[1]: Started Rancher System Agent.
░░ Subject: A start job for unit rancher-system-agent.service has finished successfully
░░ Defined-By: systemd
░░ Support: <http://www.ubuntu.com/support>
░░ 
░░ A start job for unit rancher-system-agent.service has finished successfully.
░░ 
░░ The job identifier is 1465.
Aug 03 15:32:16 rke2-self-sign-pool1-3aca6d04-4vx4m rancher-system-agent[2611]: time="2023-08-03T15:32:16Z" level=info msg="Rancher System Agent version v0.3.3 (9e827a5) is starting"
Aug 03 15:32:16 rke2-self-sign-pool1-3aca6d04-4vx4m rancher-system-agent[2611]: time="2023-08-03T15:32:16Z" level=info msg="Using directory /var/lib/rancher/agent/work for work"
Aug 03 15:32:16 rke2-self-sign-pool1-3aca6d04-4vx4m rancher-system-agent[2611]: time="2023-08-03T15:32:16Z" level=info msg="Starting remote watch of plans"
Aug 03 15:32:16 rke2-self-sign-pool1-3aca6d04-4vx4m rancher-system-agent[2611]: E0803 15:32:16.783848    2611 memcache.go:206] couldn't get resource list for management.cattle.io/v3:
Aug 03 15:32:16 rke2-self-sign-pool1-3aca6d04-4vx4m rancher-system-agent[2611]: time="2023-08-03T15:32:16Z" level=info msg="Starting /v1, Kind=Secret controller"
```
Okay, so the rancher-system-agent is running, and I do see the below log messages.
```
# systemctl status rancher-system-agent
● rancher-system-agent.service - Rancher System Agent
     Loaded: loaded (/etc/systemd/system/rancher-system-agent.service; enabled; vendor preset: enabled)
     Active: active (running) since Fri 2023-08-04 08:46:00 UTC; 14min ago
       Docs: <https://www.rancher.com>
   Main PID: 2632 (rancher-system-)
      Tasks: 11 (limit: 4556)
     Memory: 101.7M
        CPU: 3.425s
     CGroup: /system.slice/rancher-system-agent.service
             └─2632 /usr/local/bin/rancher-system-agent sentinel

Aug 04 09:00:11 rke2-canal-self-signed-pool1-8cc19983-dqmbb rancher-system-agent[2632]: time="2023-08-04T09:00:11Z" level=error msg="error loading CA cert for probe (kube-controller-manager) /var/lib/rancher/rke2/server/tls/kube-controller-manager/kube-controller-manag>
Aug 04 09:00:11 rke2-canal-self-signed-pool1-8cc19983-dqmbb rancher-system-agent[2632]: time="2023-08-04T09:00:11Z" level=error msg="error while appending ca cert to pool for probe kube-controller-manager"
Aug 04 09:00:16 rke2-canal-self-signed-pool1-8cc19983-dqmbb rancher-system-agent[2632]: time="2023-08-04T09:00:16Z" level=error msg="error loading CA cert for probe (kube-controller-manager) /var/lib/rancher/rke2/server/tls/kube-controller-manager/kube-controller-manag>
Aug 04 09:00:16 rke2-canal-self-signed-pool1-8cc19983-dqmbb rancher-system-agent[2632]: time="2023-08-04T09:00:16Z" level=error msg="error while appending ca cert to pool for probe kube-controller-manager"
Aug 04 09:00:16 rke2-canal-self-signed-pool1-8cc19983-dqmbb rancher-system-agent[2632]: time="2023-08-04T09:00:16Z" level=error msg="error loading CA cert for probe (kube-scheduler) /var/lib/rancher/rke2/server/tls/kube-scheduler/kube-scheduler.crt: open /var/lib/ranch>
Aug 04 09:00:16 rke2-canal-self-signed-pool1-8cc19983-dqmbb rancher-system-agent[2632]: time="2023-08-04T09:00:16Z" level=error msg="error while appending ca cert to pool for probe kube-scheduler"
Aug 04 09:00:21 rke2-canal-self-signed-pool1-8cc19983-dqmbb rancher-system-agent[2632]: time="2023-08-04T09:00:21Z" level=error msg="error loading CA cert for probe (kube-scheduler) /var/lib/rancher/rke2/server/tls/kube-scheduler/kube-scheduler.crt: open /var/lib/ranch>
Aug 04 09:00:21 rke2-canal-self-signed-pool1-8cc19983-dqmbb rancher-system-agent[2632]: time="2023-08-04T09:00:21Z" level=error msg="error while appending ca cert to pool for probe kube-scheduler"
Aug 04 09:00:21 rke2-canal-self-signed-pool1-8cc19983-dqmbb rancher-system-agent[2632]: time="2023-08-04T09:00:21Z" level=error msg="error loading CA cert for probe (kube-controller-manager) /var/lib/rancher/rke2/server/tls/kube-controller-manager/kube-controller-manag>
Aug 04 09:00:21 rke2-canal-self-signed-pool1-8cc19983-dqmbb rancher-system-agent[2632]: time="2023-08-04T09:00:21Z" level=error msg="error while appending ca cert to pool for probe kube-controller-manager"
```
Indeed, the kube-scheduler and kube-controller-manager directories do not exist.
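Those probe certs are only written once rke2-server generates the control-plane TLS material, so a quick check (default RKE2 paths assumed):
```
# Created by rke2-server when it generates control-plane certificates;
# absent until the server has started successfully:
ls -l /var/lib/rancher/rke2/server/tls/kube-controller-manager \
      /var/lib/rancher/rke2/server/tls/kube-scheduler
```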
a
Has the `rke2-server` service started?
b
```
systemctl status rke2-server
● rke2-server.service - Rancher Kubernetes Engine v2 (server)
     Loaded: loaded (/usr/local/lib/systemd/system/rke2-server.service; enabled; vendor preset: enabled)
     Active: activating (start) since Fri 2023-08-04 09:01:10 UTC; 2min 44s ago
       Docs: <https://github.com/rancher/rke2#readme>
    Process: 7520 ExecStartPre=/bin/sh -xc ! /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service (code=exited, status=0/SUCCESS)
    Process: 7522 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
    Process: 7523 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
   Main PID: 7524 (rke2)
      Tasks: 59
     Memory: 1.7G
        CPU: 2min 46.683s
     CGroup: /system.slice/rke2-server.service
             ├─3450 /var/lib/rancher/rke2/data/v1.26.7-rke2r1-9873cf6e613f/bin/containerd-shim-runc-v2 -namespace k8s.io -id e05d0df1d14bddb17fec8c3eef22187a6302fbfb752f4316ec6f9b4c0d302856 -address /run/k3s/containerd/containerd.sock
             ├─3539 /var/lib/rancher/rke2/data/v1.26.7-rke2r1-9873cf6e613f/bin/containerd-shim-runc-v2 -namespace k8s.io -id 6851153a309b7049795e2300a17d9a88ecfb08c10df85c5dcb3033bd82fd53d1 -address /run/k3s/containerd/containerd.sock
             ├─7524 "/usr/local/bin/rke2 server"
             ├─7536 containerd -c /var/lib/rancher/rke2/agent/etc/containerd/config.toml -a /run/k3s/containerd/containerd.sock --state /run/k3s/containerd --root /var/lib/rancher/rke2/agent/containerd
             └─8246 kubelet --volume-plugin-dir=/var/lib/kubelet/volumeplugins --file-check-frequency=5s --sync-frequency=30s --cloud-provider=external --cloud-config= --address=0.0.0.0 --anonymous-auth=false --authentication-token-webhook=true --authorization-mode=Web>

Aug 04 09:03:19 rke2-canal-self-signed-pool1-8cc19983-dqmbb rke2[7524]: time="2023-08-04T09:03:19Z" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: <https://127.0.0.1:9345/v1-rke2/readyz>: 500 Internal Server Error"
Aug 04 09:03:24 rke2-canal-self-signed-pool1-8cc19983-dqmbb rke2[7524]: time="2023-08-04T09:03:24Z" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: <https://127.0.0.1:9345/v1-rke2/readyz>: 500 Internal Server Error"
Aug 04 09:03:29 rke2-canal-self-signed-pool1-8cc19983-dqmbb rke2[7524]: time="2023-08-04T09:03:29Z" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: <https://127.0.0.1:9345/v1-rke2/readyz>: 500 Internal Server Error"
Aug 04 09:03:34 rke2-canal-self-signed-pool1-8cc19983-dqmbb rke2[7524]: time="2023-08-04T09:03:34Z" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: <https://127.0.0.1:9345/v1-rke2/readyz>: 500 Internal Server Error"
Aug 04 09:03:39 rke2-canal-self-signed-pool1-8cc19983-dqmbb rke2[7524]: time="2023-08-04T09:03:39Z" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: <https://127.0.0.1:9345/v1-rke2/readyz>: 500 Internal Server Error"
Aug 04 09:03:44 rke2-canal-self-signed-pool1-8cc19983-dqmbb rke2[7524]: time="2023-08-04T09:03:44Z" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: <https://127.0.0.1:9345/v1-rke2/readyz>: 500 Internal Server Error"
Aug 04 09:03:44 rke2-canal-self-signed-pool1-8cc19983-dqmbb rke2[7524]: time="2023-08-04T09:03:44Z" level=error msg="Kubelet exited: exit status 255"
Aug 04 09:03:46 rke2-canal-self-signed-pool1-8cc19983-dqmbb rke2[7524]: time="2023-08-04T09:03:46Z" level=info msg="Waiting for API server to become available"
Aug 04 09:03:49 rke2-canal-self-signed-pool1-8cc19983-dqmbb rke2[7524]: time="2023-08-04T09:03:49Z" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: <https://127.0.0.1:9345/v1-rke2/readyz>: 500 Internal Server Error"
Aug 04 09:03:54 rke2-canal-self-signed-pool1-8cc19983-dqmbb rke2[7524]: time="2023-08-04T09:03:54Z" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: <https://127.0.0.1:9345/v1-rke2/readyz>: 500 Internal Server Error"
```
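Given the "Kubelet exited: exit status 255" line above, the kubelet's own log looks like the next thing to check; it lives outside journald on a default RKE2 install:
```
# Default kubelet log location on an RKE2 node:
tail -n 100 /var/lib/rancher/rke2/agent/logs/kubelet.log
```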
a
Can you post the output of `journalctl -u rke2-server`, please?
b
I cannot paste the output as code because it is quite long. I uploaded the file instead.
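For reference, a long unit log can be dumped to a file for sharing with standard journalctl options:
```
# --no-pager writes the whole log straight to stdout:
journalctl -u rke2-server --no-pager > rke2-server.log
```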
a
Are you using a private registry?
b
You mean a private container registry?
a
Yes
b
No, we do not use a private registry.
a
Does `/etc/rancher/rke2/registries.yaml` exist on the node?
b
Yes, and its contents are the following: `{"configs":{},"mirrors":null}`. I am checking the directory `/var/lib/rancher/rke2/agent/images` and looking at the images we try to pull from Docker Hub.
Btw, do I need to set up a private registry on the node? Additionally, I thought I did not need Docker on the nodes for RKE2 to work. Is that correct?
a
No, it's optional. And correct, you don't need Docker: RKE2 runs its own bundled containerd.
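A way to confirm the bundled runtime works without Docker, assuming the default RKE2 paths for its crictl:
```
# RKE2 ships containerd plus a crictl binary of its own:
export CRI_CONFIG_FILE=/var/lib/rancher/rke2/agent/etc/crictl.yaml
/var/lib/rancher/rke2/bin/crictl images   # images containerd has pulled
```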
b
Any other ideas on how to continue troubleshooting?
@agreeable-oil-87482 Thanks a lot for your help and the troubleshooting hints. My issues are resolved: the loopback DNS resolution was broken on my VM. Have a great day! :)
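For anyone hitting the same symptom on Ubuntu 22.04, a few commands that help spot loopback-resolver problems (the hostname is a placeholder):
```
# systemd-resolved runs a stub listener on 127.0.0.53; check what it
# forwards to and whether names resolve from the node:
resolvectl status
cat /etc/resolv.conf                  # usually points at 127.0.0.53
nslookup rancher.example.com          # placeholder for your Rancher hostname
```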