# harvester
g
I have a node with a drive that periodically seems to have an issue. The BIOS sees the disk just fine, but Harvester doesn't, so I'm trying to figure out which disk in the node is problematic. Both data disks are 2TB NVMe, so I guess what I'm asking is... how do I determine the specific disk, so I can pull and replace it?
h
Harvester doesn't see it. As in the UI, or via the CLI?
g
shows as an error in UI. I removed the disk from the node in the UI, rebooted, but there's no disk available to add in. Not sure exactly how I'd check this in the CLI.
rancher@harvester1:~> lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
loop0         7:0    0     3G  1 loop /
sr0          11:0    1  1024M  0 rom
nvme0n1     259:0    0   1.8T  0 disk
nvme2n1     259:1    0   1.8T  0 disk /var/lib/harvester/defaultdisk
nvme1n1     259:2    0 465.8G  0 disk
├─nvme1n1p1 259:3    0    64M  0 part
├─nvme1n1p2 259:4    0    50M  0 part /oem
├─nvme1n1p3 259:5    0     8G  0 part
├─nvme1n1p4 259:6    0    15G  0 part /run/initramfs/cos-state
└─nvme1n1p5 259:7    0 442.6G  0 part /var/lib/longhorn
                                      /var/crash
                                      /var/lib/third-party
                                      /var/lib/cni
                                      /var/lib/wicked
                                      /var/lib/kubelet
                                      /var/lib/rancher
                                      /var/log
                                      /usr/libexec
                                      /root
                                      /opt
                                      /home
                                      /etc/pki/trust/anchors
                                      /etc/cni
                                      /etc/nvme
                                      /etc/iscsi
                                      /etc/ssh
                                      /etc/rancher
                                      /etc/systemd
                                      /usr/local
The device with issues is nvme0n1.
When I try to add the disk back in the UI, this is what I see:
[screenshot of the disk error in the Harvester UI]
h
Yeah, wipe it. Assuming that's it and it's empty.
wipefs -a /dev/nvme0n1
Then wait. Can't remember how often Harvester rescans for new disks.
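(A hedged aside: wipefs run with no options is read-only and just lists the signatures it finds, which is a safe way to confirm you're pointing at the right device before wiping:)
# list filesystem/partition-table signatures without erasing anything (read-only)
sudo wipefs /dev/nvme0n1
# then erase all detected signatures for real
sudo wipefs -a /dev/nvme0n1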
g
thanks - I actually reinstalled this node a couple of months back after seeing this same thing, so I suspect there's an actual hardware issue
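(If a hardware fault is suspected, the kernel log is a quick cross-check; a minimal sketch using stock Linux tools, nothing Harvester-specific:)
# look for controller resets, timeouts, or I/O errors on the suspect device
sudo dmesg | grep -i nvme0
# same search against the persistent kernel journal, if available
sudo journalctl -k | grep -i nvme0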
Hmm, so I did
sudo wipefs -a /dev/nvme0n1
yesterday, and I still don't see the option to add the disk back in the UI
any way to get the serial number for the device from the Harvester CLI?
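(A few standard options from the node shell; these are stock Linux tools rather than anything Harvester-specific:)
# model and serial straight from sysfs
lsblk -dno NAME,MODEL,SERIAL /dev/nvme0n1
# the by-id symlinks encode model and serial in the link name
ls -l /dev/disk/by-id/ | grep nvme0n1
# if nvme-cli is on the node, query the controller directly
sudo nvme id-ctrl /dev/nvme0n1 | grep -i '^sn'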
of course, I could just do
sudo smartctl -a /dev/nvme0n1
interestingly, not seeing any errors in SMART for the device
already on the latest firmware, so maybe there's something else amiss
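(If smartctl looks clean, the NVMe-native logs are worth checking too; a sketch assuming nvme-cli is present on the node:)
# NVMe health summary: critical warnings, media errors, wear level
sudo nvme smart-log /dev/nvme0n1
# controller error-log entries, which can catch transient faults a clean SMART summary misses
sudo nvme error-log /dev/nvme0n1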