# rke2
a
This message was deleted.
c
No. Why doesn’t it rejoin? Have you looked at the logs or anything else to see what the problem is?
s
What does systemctl status rke2-server (or rke2-agent) show? Is the service enabled?
a
If I install RKE2 on my servers from the Rancher server using the defaults, the service unit gets installed under /usr/local. Systemd starts the enabled services before /usr/local is mounted, so rke2-server or rke2-agent never starts. I tried to find an installation parameter that tells the installer to put the service definition under /etc/systemd/system instead, but could not identify one. The solution for me was to move the service definition file from /usr/local/lib/systemd/system to /etc/systemd/system after RKE2 was installed. This does not occur when I install directly from get.rke2.io.
s
I will try and collect logs and also check rke2-server and rke2-agent.
c
It should go to /opt/rke2 if /usr/local is a btrfs subvolume, for that very reason. What is the result of
mountpoint -q /usr/local
on your node?
a
It does not return anything.
And yes, /usr/local is a btrfs subvolume.
c
Sorry, drop the -q (for quiet).
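For anyone following along, the check above can be wrapped up roughly as follows. This is only a sketch: describe_mount and fs_type are hypothetical helper names, and it assumes the util-linux mountpoint and findmnt tools are available on the node.

```shell
#!/bin/sh
# describe_mount PATH: say whether PATH is a mount point.
# With -q, mountpoint prints nothing; only its exit status
# (0 = mount point) carries the answer, so we print it ourselves.
describe_mount() {
    path="$1"
    if mountpoint -q "$path"; then
        echo "$path is a mount point"
    else
        echo "$path is not a mount point"
    fi
}

# fs_type PATH: print the filesystem type backing PATH (e.g. btrfs).
fs_type() {
    findmnt -n -o FSTYPE --target "$1"
}
```

On a node you would call describe_mount /usr/local and fs_type /usr/local. Running mountpoint /usr/local without -q prints a similar message directly.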
s
I finally got back to looking into the RKE2 cluster not working after the ctrl and etcd nodes get rebooted. I checked systemctl status for rke2-server and rke2-agent and saw that both were dead:
systemctl status rke2-server
○ rke2-server.service - Rancher Kubernetes Engine v2 (server)
     Loaded: loaded (/usr/local/lib/systemd/system/rke2-server.service; enabled; vendor preset: disabled)
     Active: inactive (dead)
       Docs: <https://github.com/rancher/rke2#readme>

systemctl status rke2-agent
○ rke2-agent.service - Rancher Kubernetes Engine v2 (agent)
     Loaded: loaded (/usr/local/lib/systemd/system/rke2-agent.service; disabled; vendor preset: disabled)
     Active: inactive (dead)
       Docs: <https://github.com/rancher/rke2#readme>

Tried to start the server:

systemctl status rke2-server
○ rke2-server.service - Rancher Kubernetes Engine v2 (server)
     Loaded: loaded (/usr/local/lib/systemd/system/rke2-server.service; enabled; vendor preset: disabled)
     Active: inactive (dead) (Result: exit-code) since Wed 2022-10-26 16:02:22 PDT; 3min 13s ago
       Docs: <https://github.com/rancher/rke2#readme>
    Process: 4003 ExecStartPre=/bin/sh -xc ! /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service (code=exited, status=0/SUCCESS)
    Process: 4005 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
    Process: 4006 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
    Process: 4007 ExecStart=/usr/local/bin/rke2 server (code=exited, status=1/FAILURE)
    Process: 4017 ExecStopPost=/bin/sh -c systemd-cgls /system.slice/rke2-server.service | grep -Eo '[0-9]+ (containerd|kubelet)' | awk '{print $1}' | xargs -r kil>
   Main PID: 4007 (code=exited, status=1/FAILURE)
Cluster was created using:
curl -fL https://my-rancher-url/system-agent-install.sh | sudo sh -s - --server https://my-rancher-url --label 'cattle.io/os=linux' --token rvqf4kngw4pk9gpnbr2d5dfcvgdb7lnn8j84jrk4h88rz555k8gl75 --etcd --controlplane --worker
The issue seems to be that rke2-server and rke2-agent do not start automatically after a reboot. However, if I manually start rke2-server on the ctrl-etcd node and rke2-agent on the worker nodes, the cluster is functional again.
I finally did what was suggested by @able-engineer-22050: I just moved the server and agent unit files from /usr/local/lib/systemd/system to /etc/systemd/system. Thanks.
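A rough sketch of that workaround as a script, in case it helps someone else. relocate_units is a hypothetical helper name; the two directories are the ones mentioned in this thread, and the follow-up systemctl commands are my assumption of what a careful run would include, not something confirmed here.

```shell
#!/bin/sh
# relocate_units SRC_DIR DST_DIR
# Move the rke2-server and rke2-agent unit files from SRC_DIR to DST_DIR,
# printing one line per file that was actually moved.
relocate_units() {
    src="$1"
    dst="$2"
    for unit in "$src"/rke2-server.service "$src"/rke2-agent.service; do
        # Skip units that are not installed on this node.
        [ -f "$unit" ] || continue
        mv "$unit" "$dst"/ && echo "moved $(basename "$unit")"
    done
}

# Assumed usage on a real node, as root:
#   relocate_units /usr/local/lib/systemd/system /etc/systemd/system
#   systemctl daemon-reload                    # make systemd re-read unit files
#   systemctl is-enabled rke2-server.service   # confirm it is still enabled
```

Taking the directories as arguments keeps the function easy to try against a scratch directory before touching /etc/systemd/system.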
a
I believe there should be an environment variable that controls where the service files get installed, but I haven't had time to figure out which one it is. If someone has this info ready, please share.
c
If you set $INSTALL_RKE2_TAR_PREFIX to something other than the default of /usr/local, the installer will move the systemd units to /etc/systemd/system/. This should be done automatically on any system where /usr/local is read-only or a mount point (such as systems where it is a btrfs subvolume that is mounted too late for systemd to find things).
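To make the rule concrete, here is a small sketch of the behavior as described. It is not the installer's actual code: unit_dir_for_prefix is a hypothetical name, and the non-default branch simply mirrors the statement that the units end up in /etc/systemd/system.

```shell
#!/bin/sh
# unit_dir_for_prefix [PREFIX]
# Print where the systemd units would end up for a given tarball prefix,
# following the rule described in this thread. Defaults to /usr/local.
unit_dir_for_prefix() {
    prefix="${1:-/usr/local}"
    if [ "$prefix" = "/usr/local" ]; then
        # Default prefix: units stay under the prefix.
        echo "/usr/local/lib/systemd/system"
    else
        # Any other prefix: units are moved to /etc/systemd/system.
        echo "/etc/systemd/system"
    fi
}

# Assumed usage with the upstream install script (the variable must be set
# for the sh process, not for curl):
#   curl -sfL https://get.rke2.io | INSTALL_RKE2_TAR_PREFIX=/opt/rke2 sh -
```

Whether the install script served by Rancher honors the same variable is not established in this thread, so treat the usage line as applying to get.rke2.io only.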
a
OK. In my case it does not (or did not) work as planned. /usr/local is a mount point, so L120-128 should have set the RKE2 install prefix to /opt/rke2. In fact, RKE2 gets installed under /var/lib/rancher/rke2. Also, L369-L375 should have copied the service files from /usr/local/lib/systemd to /etc/systemd, but that does not happen. It is possible that the install.sh shipped inside Rancher management does not yet have these options (I have Rancher 2.6.5 installed). The mgmt cluster (also RKE2) was installed from rke2.io directly and did not suffer from the systemd location problem.
c
RKE2 will always put its files under /var/lib/rancher; the question is just where the rke2 binary and the systemd units go.