adamant-kite-43734
09/24/2024, 8:49 AM
sticky-summer-13450
09/24/2024, 9:20 AM
witty-jelly-95845
09/24/2024, 9:56 AM
witty-jelly-95845
09/24/2024, 9:57 AM
sticky-summer-13450
09/24/2024, 10:07 AM
systemctl list-units
the first node (the one that's been "updated") does not have an rke2-server.service
...
lvm2-monitor.service loaded active exited Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling
rancher-system-agent.service loaded active running Rancher System Agent
smartd.service loaded active running Self Monitoring and Reporting Technology (SMART) Daemon
sshd.service loaded active running OpenSSH Daemon
witty-jelly-95845
09/24/2024, 11:41 AM
sticky-summer-13450
09/24/2024, 11:46 AM
rancher@harvester003:~> ls -la /etc/systemd/system/rke2-server.service
-rw-r--r-- 1 root root 0 Sep 23 19:12 /etc/systemd/system/rke2-server.service
witty-jelly-95845
09/24/2024, 11:49 AM
[Unit]
Description=Rancher Kubernetes Engine v2 (server)
Documentation=https://github.com/rancher/rke2#readme
Wants=network-online.target
After=network-online.target
Conflicts=rke2-agent.service
[Install]
WantedBy=multi-user.target
[Service]
Type=notify
EnvironmentFile=-/etc/default/%N
EnvironmentFile=-/etc/sysconfig/%N
EnvironmentFile=-/opt/rke2/lib/systemd/system/%N.env
KillMode=process
Delegate=yes
LimitNOFILE=1048576
LimitNPROC=infinity
LimitCORE=infinity
TasksMax=infinity
TimeoutStartSec=0
Restart=always
RestartSec=5s
ExecStartPre=/bin/sh -xc '! /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service'
ExecStartPre=-/sbin/modprobe br_netfilter
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/opt/rke2/bin/rke2 server
ExecStopPost=-/bin/sh -c "systemd-cgls /system.slice/%n | grep -Eo '[0-9]+ (containerd|kubelet)' | awk '{print $1}' | xargs -r kill"
EnvironmentFile=-/var/lib/rancher/rke2/system-agent-installer/rke2-sa.env
witty-jelly-95845
09/24/2024, 11:49 AM
sticky-summer-13450
09/24/2024, 11:52 AM
witty-jelly-95845
09/24/2024, 11:53 AM
sticky-summer-13450
09/24/2024, 11:54 AM
-rw-r--r-- 1 root root 554 Jul 6 13:27 /etc/systemd/system/rancher-system-agent.service
-rw-r--r-- 1 root root 868 Sep 23 12:33 /etc/systemd/system/rke2-agent.service
-rw-r--r-- 1 root root 943 Sep 23 12:33 /etc/systemd/system/rke2-server.service
-rw-r--r--. 1 root root 317 Jun 14 02:00 /etc/systemd/system/rke2-shutdown.service
On the updated node I have:
-rw-r--r-- 1 root root 554 Jul 6 13:27 /etc/systemd/system/rancher-system-agent.service
-rw-r--r-- 1 root root 0 Sep 23 19:12 /etc/systemd/system/rke2-agent.service
-rw-r--r-- 1 root root 0 Sep 23 19:12 /etc/systemd/system/rke2-server.service
-rw-r--r--. 1 root root 317 Sep 6 07:01 /etc/systemd/system/rke2-shutdown.service
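The difference between the two listings (943 and 868 bytes on one node, 0 bytes on the updated one) can be checked in one pass instead of by eye. A minimal sketch, assuming a `find` that supports `-maxdepth` (GNU and BSD do); `list_empty_units` is a hypothetical helper name, not something shipped with Harvester or RKE2:

```shell
#!/bin/sh
# Hypothetical helper: print every zero-length *.service file that sits
# directly inside the given directory (default: /etc/systemd/system).
list_empty_units() {
    dir="${1:-/etc/systemd/system}"
    # -size 0 matches only files that contain no data.
    # -maxdepth 1 keeps the search out of the *.service.d drop-in directories.
    find "$dir" -maxdepth 1 -type f -name '*.service' -size 0 | sort
}
```

Calling `list_empty_units` with no argument on the updated node would be expected to print the two rke2 unit files shown above; an empty result means no unit file in that directory is zero length.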
sticky-summer-13450
09/24/2024, 11:56 AM
rke2-[agent|server] service files have been updated (yesterday's date)
witty-jelly-95845
09/24/2024, 11:57 AM
-rw-r--r-- 1 root root 554 Jun 14 10:40 rancher-system-agent.service
-rw-r--r-- 1 root root 554 Nov 22 2022 rancher-system-agent.service.ORIG
drwxr-xr-x. 2 root root 4096 Sep 6 07:01 rancher-system-agent.service.d
drwxr-xr-x. 2 root root 4096 Sep 6 07:01 rancherd.service.d
drwxr-xr-x 2 root root 4096 Sep 12 2023 reboot.target.requires
drwxr-xr-x. 2 root root 4096 Sep 3 17:53 remote-fs.target.wants
-rw-r--r-- 1 root root 868 Sep 23 17:15 rke2-agent.service
drwxr-xr-x. 2 root root 4096 Sep 6 07:01 rke2-agent.service.d
-rw-r--r-- 1 root root 943 Sep 23 17:15 rke2-server.service
drwxr-xr-x. 2 root root 4096 Sep 6 07:01 rke2-server.service.d
-rw-r--r--. 1 root root 317 Sep 6 07:01 rke2-shutdown.service
sticky-summer-13450
09/24/2024, 11:57 AM
/etc seems to be on / and it has plenty of space:
/dev/loop0 ext2 3.0G 1.3G 1.6G 46% /
witty-jelly-95845
09/24/2024, 11:58 AM
sticky-summer-13450
09/24/2024, 12:47 PM
/etc/systemd is a bind mount from the /usr/local filesystem:
rancher@harvester003:~> cat /etc/fstab | grep '/etc/systemd'
/usr/local/.state/etc-systemd.bind /etc/systemd none defaults,bind 0 0
Seems to be plenty of space there:
rancher@harvester003:~> df -h /usr/local
Filesystem Size Used Avail Use% Mounted on
/dev/nvme0n1p5 98G 75G 19G 81% /usr/local
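Since the free space that matters is on whatever filesystem backs the bind mount, it can help to list every bind mount declared in fstab rather than grepping for one path. A minimal sketch, assuming the standard six-field fstab layout; `list_bind_mounts` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: list bind mounts declared in an fstab-style file
# as "source -> target" (default file: /etc/fstab).
list_bind_mounts() {
    fstab="${1:-/etc/fstab}"
    # Field 4 holds the comma-separated mount options; comment lines are skipped.
    awk '$1 !~ /^#/ && $4 ~ /(^|,)r?bind(,|$)/ { print $1 " -> " $2 }' "$fstab"
}
```

For the fstab line quoted above this would print `/usr/local/.state/etc-systemd.bind -> /etc/systemd`. On a live node, `findmnt /etc/systemd` should report the same mapping from the kernel's point of view.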
sticky-summer-13450
09/24/2024, 4:45 PM
rke2-server.service
and when I start that service I find that /opt/rke2/bin/rke2 is also zero length. So, it looks like a lot of useful files are zero length and this node is screwed.
rancher@harvester003:~> ls -la /opt/rke2/bin/rke2
-rwxr-xr-x 1 root root 0 Aug 1 22:36 /opt/rke2/bin/rke2
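To see how far the damage goes beyond this one binary, the same idea can be applied to executables: a file with the execute bit set and no content is never valid. A minimal sketch, assuming `/opt/rke2` as the default search root only because that is where the empty binary above lives; `list_empty_executables` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print zero-length regular files that have the
# owner-execute bit set, anywhere under the given directory.
list_empty_executables() {
    root="${1:-/opt/rke2}"
    # -perm -100 requires the owner-execute bit; -size 0 requires no data.
    find "$root" -type f -perm -100 -size 0 | sort
}
```

Running it against other directories (passed as the first argument) gives a rough picture of which installed binaries were truncated.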
sticky-summer-13450
09/24/2024, 4:45 PM
bland-article-62755
09/24/2024, 8:12 PM
bland-article-62755
09/24/2024, 8:13 PM
bland-article-62755
09/24/2024, 8:15 PM
sticky-summer-13450
09/24/2024, 8:18 PM
bland-article-62755
09/24/2024, 8:25 PM
prehistoric-balloon-31801
09/25/2024, 7:29 AM
harvester003? Can you run blkid on it and share with me? Thanks
prehistoric-balloon-31801
09/25/2024, 7:31 AM
sticky-summer-13450
09/25/2024, 10:35 AM
rancher@harvester003:~> sudo blkid
/dev/nvme0n1p5: LABEL="COS_PERSISTENT" UUID="81aae9fc-59e2-4c73-ad92-8a024aeb3357" BLOCK_SIZE="4096" TYPE="ext4" PARTLABEL="persistent" PARTUUID="0b109d3c-3948-4b1c-9aca-5e40f3979cab"
/dev/nvme0n1p3: LABEL="COS_STATE" UUID="a0a8656e-9c51-484f-91b0-e85ca3971428" BLOCK_SIZE="4096" TYPE="ext4" PARTLABEL="state" PARTUUID="83bc5609-e7ed-4188-b0d1-63756fadc928"
/dev/nvme0n1p1: LABEL_FATBOOT="COS_GRUB" LABEL="COS_GRUB" UUID="0916-11CA" BLOCK_SIZE="512" TYPE="vfat" PARTLABEL="primary" PARTUUID="3f34ec9a-1356-4119-92a7-41e675e8a4ec"
/dev/nvme0n1p6: LABEL="HARV_LH_DEFAULT" UUID="77c46199-ed8c-40df-b96f-3b5966b47b9d" BLOCK_SIZE="4096" TYPE="ext4" PARTLABEL="longhorn" PARTUUID="3ab004c5-ae89-4750-a873-c46e037228f8"
/dev/nvme0n1p4: LABEL="COS_RECOVERY" UUID="8a96ed32-696f-42e5-ad09-eb8ccee8603b" BLOCK_SIZE="4096" TYPE="ext4" PARTLABEL="recovery" PARTUUID="16e7de20-73c4-496c-bd3b-7cd8e503a6e8"
/dev/nvme0n1p2: LABEL="COS_OEM" UUID="77340ad5-2d73-4830-98b1-952cc0b73fcc" BLOCK_SIZE="1024" TYPE="ext4" PARTLABEL="oem" PARTUUID="21aca11f-20f2-4cf9-b6fe-6c4c4b8dcb2c"
/dev/loop0: LABEL="COS_ACTIVE" UUID="c0627e2c-e94a-4cd5-b20b-b5acbed52f0b" BLOCK_SIZE="4096" TYPE="ext2"