# harvester
f
Since it mounts via the label, I am not sure that this is the issue here. But Longhorn sure doesn't like the disk.
e
The device nodes `/dev/sdX` exist mostly for backwards compatibility. They are convenient if you only have one disk in your system, but they have problems otherwise. You can use the `/dev/disk/by-id/$ID` or `/dev/disk/by-path/$PATH` naming schemes during installation to configure the disks. The ID and the path are fixed attributes that don't change on reboot; they only change if you change the disk hardware: https://docs.harvesterhci.io/v1.4/install/harvester-configuration#installdata_disk

The device nodes `/dev/sda`, `/dev/sdb`, `/dev/sdc` can change which disk they refer to because they are populated during boot in the order the devices show up. When booting, the Linux kernel queries the device buses of your motherboard to find out what hardware is plugged in. This happens in parallel to speed things up, but it is not guaranteed to yield repeatable results, so the order in which device nodes are populated can switch around. To get around this you have to use device naming based on physical properties. For disks, the device nodes with stable names are the aforementioned `/dev/disk/by-*`. This is the same reason why network devices aren't named `eth0`, `eth1`, `eth2`... anymore, but `enp3s0f7` or `wlp3s0`: those names are generated from physical properties, and it's guaranteed that the same interface gets the same name regardless of when it shows up.
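For example, a quick way to find the stable names to put into the install config (the actual IDs are hardware-specific, so treat these as illustrative commands, not exact values):

```
# List the stable, hardware-derived names and see which /dev/sdX node
# each one currently points to.
ls -l /dev/disk/by-id/
ls -l /dev/disk/by-path/

# Cross-check serial numbers and current mount points per device.
lsblk -o NAME,SERIAL,WWN,MOUNTPOINT
```

One of the by-id or by-path entries can then be used for the data disk setting described in the Harvester configuration docs linked above.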
f
I agree, the disk is correct and is mounted via the label. But for some reason it says:

Message: Disk default-disk-e863a01910a9739(/var/lib/harvester/defaultdisk) on node harvester-cnf2j is not ready: record diskUUID doesn't match the one on the disk

But the disk is mounted and the UUID matches the longhorn-disk.cfg file.
Is there any other place I need to verify that the UUID matches?
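For reference, this is how I read the value recorded on the disk itself (path taken from the error above; the exact JSON layout is illustrative):

```
# On the node, read the UUID that Longhorn wrote onto the mounted disk.
cat /var/lib/harvester/defaultdisk/longhorn-disk.cfg
# Typically a small JSON blob along the lines of {"diskUUID":"<uuid>"}.
```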
I am looking into the Longhorn logs:

```
kubectl logs -n longhorn-system -l app=longhorn-manager --max-log-requests 6 --follow
time="2025-03-05T20:29:37Z" level=error msg="Dropping Longhorn replica out of the queue" func=controller.handleReconcileErrorLogging file="utils.go:79" Replica=longhorn-system/pvc-623160fe-c5d5-412f-9683-f07f221d4c24-r-87452d70 controller=longhorn-replica error="failed to sync replica for longhorn-system/pvc-623160fe-c5d5-412f-9683-f07f221d4c24-r-87452d70: Failed to get instance process pvc-623160fe-c5d5-412f-9683-f07f221d4c24-r-87452d70: invalid Instance Manager instance-manager-e90d2bfef72ec289d8bcea1eb1e3262c, state: error, IP: 10.52.3.97" node=harvester-cnf2j
time="2025-03-05T20:29:37Z" level=error msg="Dropping Longhorn replica out of the queue" func=controller.handleReconcileErrorLogging file="utils.go:79" Replica=longhorn-system/pvc-b0bc2fae-00b8-4f80-a666-cc9c84520969-r-17eb71b2 controller=longhorn-replica error="failed to sync replica for longhorn-system/pvc-b0bc2fae-00b8-4f80-a666-cc9c84520969-r-17eb71b2: Failed to get instance process pvc-b0bc2fae-00b8-4f80-a666-cc9c84520969-r-17eb71b2: invalid Instance Manager instance-manager-e90d2bfef72ec289d8bcea1eb1e3262c, state: error, IP: 10.52.3.97" node=harvester-cnf2j
time="2025-03-05T20:29:37Z" level=error msg="Dropping Longhorn replica out of the queue" func=controller.handleReconcileErrorLogging file="utils.go:79" Replica=longhorn-system/pvc-14a02af3-dbc1-4df1-b37c-2431b474fe87-r-129c1a86 controller=longhorn-replica error="failed to sync replica for longhorn-system/pvc-14a02af3-dbc1-4df1-b37c-2431b474fe87-r-129c1a86: Failed to get instance process pvc-14a02af3-dbc1-4df1-b37c-2431b474fe87-r-129c1a86: invalid Instance Manager instance-manager-e90d2bfef72ec289d8bcea1eb1e3262c, state: error, IP: 10.52.3.97" node=harvester-cnf2j
```
I am not really sure where to go from here.
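Since the errors point at an instance manager in an error state, one way to dig further might be something like this (pod name copied from the log above; adjust to your cluster):

```
# Check whether the instance manager pods on the node are healthy.
kubectl -n longhorn-system get pods -o wide | grep instance-manager

# Inspect the specific instance manager named in the errors.
kubectl -n longhorn-system describe pod instance-manager-e90d2bfef72ec289d8bcea1eb1e3262c
kubectl -n longhorn-system logs instance-manager-e90d2bfef72ec289d8bcea1eb1e3262c --tail=100
```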
e
Hi John, is this the same issue as Christian's or a different one? It would probably be better to ask in the Longhorn channel #CC2UQM49Y, since there are more knowledgeable people there with regard to Longhorn. In any case, first I'd make sure that you won't have the problem of device nodes switching around by using one of the stable references. Longhorn also keeps information about which disks it expects on which node in the corresponding node.longhorn.io/v1beta2 objects. Check that these match what's actually there. That being said, without a more detailed description of your environment it's hard to tell what's going on.
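A minimal sketch of that comparison, assuming the node name and disk path from the messages above (field layout may differ slightly between Longhorn versions):

```
# What Longhorn has recorded for the disks on this node; look under
# status.diskStatus for the diskUUID it expects to find on each disk.
kubectl -n longhorn-system get nodes.longhorn.io harvester-cnf2j -o yaml

# The UUID actually stored on the mounted disk; the two should match.
cat /var/lib/harvester/defaultdisk/longhorn-disk.cfg
```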
w
@flat-librarian-14243 and I are referring to the same cluster - so yes, it is the same issue 🙂 Thanks for pointing us to the correct channel. I’ll repost this thread there.
r
Same cluster, more information: we are using the stable references for everything except the extra disk for storage. When adding that disk, it is listed with "legacy" names only, and then when the server boots things move around. But I will search in the storage channel.
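If it helps, a way to map a "legacy" name shown in the UI back to its stable aliases (the device node /dev/sdb is hypothetical; substitute the one the UI shows):

```
# Print every symlink (by-id, by-path, ...) that currently points at this device.
udevadm info --query=symlink --name=/dev/sdb
```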