# harvester
g
Hello everyone, I have a question I hope someone can help me with. Before updating Harvester, all nodes had the same IQN
InitiatorName
, but after the update, the IQN has changed. Is it possible that the Harvester update caused this change, or is it something managed by Kubernetes? This change is causing issues with adding disks from our QSAN storage server, as the updated IQN no longer matches the expected configuration. Thank you in advance for your time and support! Best regards, Damyan
f
Did you use an iSCSI CSI? Or did you just keep going with manually setting up the target?
g
Hi, I'm setting it up manually via
iscsiadm
, not using a CSI driver.
f
This will not work because nothing you do in the operating system manually is saved between reboots. My understanding is that you must use the CSI.
☝️ 1
b
I think technically you might be able to hack the
/oem/90_custom.yaml
and write a script to get iscsiadm to perform the actions after every reboot, but best practice would be to use the CSI.
a
It was an error that all harvester nodes had the same initiatorname - these should be unique per host. This was fixed in v1.5.0, see: https://github.com/harvester/harvester/issues/6911
After that fix, if you wish to change the initiatorname, it should be enough to simply edit
/etc/iscsi/initiatorname.iscsi
because those changes should now persist. For other versions as @bland-article-62755 said you can add yaml to
/oem
to override it (see suggested workaround in the description of that issue).
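Something along these lines should be in the right ballpark (untested; the stage and the example IQN are just placeholders, the exact workaround is in the issue description):
Copy code
name: "Set unique iSCSI initiator name"
stages:
  initramfs:
    - name: "Write initiatorname.iscsi"
      files:
        - path: /etc/iscsi/initiatorname.iscsi
          content: |
            InitiatorName=iqn.2024-01.com.example:harvester-node1
          permissions: 0644
          owner: 0
          group: 0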
That said, you're likely better off using CSI if possible 🙂
g
First of all, thank you all very much for your replies — I really appreciate the help. Secondly, I assume you’re suggesting that I use this: https://github.com/kubernetes-csi/csi-driver-iscsi to automatically attach the disk to the iSCSI node, and then re-add the disk through the Harvester UI. Thanks again for taking the time to support me!
🐿️ 1
a
I haven't tried that particular iSCSI CSI driver myself, but the idea with CSI is that you'd install a CSI driver, then create a storage class that uses it, then create volumes via that storage class to attach to your VMs, i.e. the VMs are talking (more or less directly) to iSCSI volumes. That you said "re-add the disk through the Harvester UI" makes me think you might actually be using your external iSCSI storage as backing storage for Longhorn, which is a different approach, and CSI won't help you with that.
g
Hi, Yes, that's correct — I'm using iSCSI for external storage. I have two QSAN units that serve as external storage backends for Longhorn.
a
OK, in that case, if you're using Harvester v1.5 you should be able to set the initiatorname by editing
/etc/iscsi/initiatorname.iscsi
on each host, so if you really need to you could set them all back to the same name, BUT each host really is meant to have a different name, so it would be best to have unique names for each host, and update the config on the storage server to allow connections from all hosts (not just the one name). I haven't tried backing Longhorn with iSCSI myself, so I'm interested to hear how it works for you
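(For reference, /etc/iscsi/initiatorname.iscsi normally contains just a single line like the one below; the IQN is only an example:)
Copy code
InitiatorName=iqn.2024-01.com.example:harvester-node1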
f
You need to understand that Longhorn doesn't support iSCSI backing in the way that vmware/vmfs works. You can't have Harvester place Virtual Disks "On the SAN" in the same way it works in vmware. If this is your goal then you should stop here and reassess, because I can tell you with absolute certainty that this won't work. Harvester is not designed to work this way. If you mount the same iSCSI target on each host and point Longhorn at it then your data will be corrupted. If you mount separate LUNs on each host then you'll end up with multiple copies of the same disk on the SAN. It might be worth asking what you are expecting here?
a
Yeah, for it to work you'd need a separate iSCSI target per host, so that from Longhorn's perspective it just looked like a separate local disk per host, then longhorn would replicate volumes across those just as it would with local disks (in theory). To the best of my knowledge Longhorn engineering folks don't/haven't done any testing of this scenario.
f
I think the question is really why would you want to do this? Performance would likely be poor and storage efficiency would be terrible. Harvester really isn't built for shared block storage SANs, unless those SANs ship a CSI driver. Which those QSAN units allegedly have: https://www.qsan.com/en/os/xevo/cloudnative
a
Yeah, if you can switch to using CSI, then it's just Harvester VM -> CSI Volume (on ISCSI) -> SAN vs: Harvester VM -> Longhorn Volume -> iSCSI -> SAN
and if your SAN is already doing replicated storage or RAID or something for volumes, then Longhorn replication on top of that is presumably redundant ... unless you set up a LH storage class to use a single replica, but even then you're still adding another layer in the storage stack for data to flow through (and potentially slow down)
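e.g. a single-replica longhorn storage class would look roughly like this (untested sketch; the name and staleReplicaTimeout are just example values):
Copy code
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-single-replica
provisioner: driver.longhorn.io
allowVolumeExpansion: true
parameters:
  numberOfReplicas: "1"
  staleReplicaTimeout: "30"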
g
@full-night-53442 Thanks for the advice! Yes, I read that it's not possible to use the same target for multiple nodes, and that each node requires a separate target — that's exactly how I'm doing it now. I'm currently creating a separate target per node on the SAN servers, attaching the appropriate target to the corresponding node, and then adding it through the Harvester UI into Longhorn. I also followed the recommendation to change both the InitiatorName and the hostnqn. Thanks @ambitious-daybreak-95996 for the help as well!
Here's the YAML file I use to connect and mount the target automatically in case of a reboot:
Copy code
name: "ISCSI configuration"
stages:
  network:
    - name: "Add local IP address"
      commands:
        - ip addr add 192.168.70.20/24 dev enp216s0f1
    - name: "Set ISCSI automatic connection"
      commands:
        - iscsiadm -m discovery -t st -p 192.168.70.10
        - iscsiadm -m node -T TARGET_IQN -p ISCSI_IP --login
        - mkdir -p /var/lib/harvester/extra-disks/unique_mount_id
        - mount "$(readlink -f /dev/disk/by-path/ip-192.168.70.10:3260-iscsi-iqn.2024-01.com.example:target1-lun-0)" /var/lib/harvester/extra-disks/unique_mount_id
I also explored using volumes, but for a larger number of VMs it felt a bit inconvenient to create and attach a volume every time you set up a new VM. Personally, I think using a StorageClass and pre-created volumes makes sense when managing a small number of VMs. But since these two QSANs will be used solely as Longhorn storage, I find it more convenient to let Longhorn manage the disks directly. The SAN servers have four pools in total — two from each server: one 20TB RAID 5 pool and one 20TB RAID 6 pool.
Thanks again to everyone for your time and support — really appreciate the help!
The only issue I encountered was that I couldn't get it to work like this:
Copy code
DISK_PATH=$(readlink -f /dev/disk/by-path/ip-192.168.70.10:3260-iscsi-iqn.2004-08.com.qsan:xs3216-000d47038:dev0.ctr1-lun-0)
mount "${DISK_PATH}" /var/lib/harvester/extra-disks/79de259774dce4983b72c9ddb950793c
It just doesn’t seem to work when I use a variable for the path, and I’m not sure why. Do you have any idea what might be causing this? Or am I possibly doing something wrong in the way I'm using variables?
a
So something like this?
Copy code
commands:
  - DISK_PATH=$(readlink ...)
  - mount "${DISK_PATH}" ...
I suspect each of those commands is run separately, i.e. in separate instances of a shell. If that's the case, there's no way for a variable to propagate from one to the other. Try instead something like this:
Copy code
commands:
  - |
      DISK_PATH=$(readlink ...)
      mount "${DISK_PATH}" ...
...or maybe:
Copy code
commands:
    - DISK_PATH=$(readlink ...) ;
      mount "${DISK_PATH}" ...
...to combine the variable declaration and use into one single command. Note: I haven't tested the above so it's possible there are errors in the yaml formatting, but you get the idea 🙂
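Applied to the snippet you posted earlier, the single-command form might be something like (again, untested):
Copy code
commands:
  - |
    DISK_PATH=$(readlink -f /dev/disk/by-path/ip-192.168.70.10:3260-iscsi-iqn.2004-08.com.qsan:xs3216-000d47038:dev0.ctr1-lun-0)
    mount "${DISK_PATH}" /var/lib/harvester/extra-disks/79de259774dce4983b72c9ddb950793c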
Out of curiosity, how's the performance?
g
Hi @ambitious-daybreak-95996, I haven't run any tests yet. Over the next two weeks, I plan to run performance tests and observe disk behavior during virtual machine startup. Do you have any specific metrics you’d like me to collect that might be useful for you? Once I’m done, I’ll share the results, but I'm not sure exactly when that will be, as I have a lot of work at the moment. Also, regarding the variable issue — I tried various approaches, including using
EXPORT
, but I couldn’t get it to work. Either I’m missing something, or it’s not possible to use variables in the Elemental syntax. Still, I suspect I’m doing something wrong. Lastly, do you know if there’s a way to force a VM to boot from a specific disk? It would really help with testing iSCSI disks more efficiently.
a
Hi @gentle-actor-15883, please accept my apologies for taking so long to reply to this. I'm sure any performance information you're able to share would be really interesting. For reference, https://github.com/harvester/harvester/wiki/Harvester-Performance-Result is what the Harvester team has run before, AIUI just using
fio
inside running VMs. At a lower level, the Longhorn folks have https://github.com/longhorn/longhorn/wiki/Performance-Benchmark
As for forcing a VM to boot from specific disks, if you edit the VM in the harvester GUI, then you should be able to use the Boot Order buttons on the Volumes tab to set this.
g
Hi @ambitious-daybreak-95996, First of all, thank you very much for your response – I really appreciate it. Secondly, no worries at all, we all have a lot on our plates. I also want to apologize for not running any tests yet. The reason is that my colleagues and I are currently working on setting up a multipath disk configuration. I've enabled multipathing so that if one of the QSAN controllers goes down, the system can immediately failover to the other controller without losing disk connectivity. However, I’ve encountered an issue: Longhorn doesn't seem to recognize any devices other than the standard
/dev/sdX
paths. Since multipath uses
/dev/mapper/mpathX
, Longhorn isn’t detecting those multipath devices. We need to resolve this issue first. If my managers haven’t decided to move away from Harvester, I’ll be more than happy to provide you with full test results on the iSCSI disks. I’ll review the information you’ve shared and will try to get back to you with some test results as soon as I can. Thanks again!
a
Ah! So, when you're adding disks using the Harvester GUI, one of the components of harvester (node-disk-manager, or NDM) is responsible for providing the list of disks that are available. NDM does not understand how to deal with multipath disks
There's an open feature for this https://github.com/harvester/harvester/issues/6975 but I can't say when support is likely to land
What you might be able to do though, is set it up manually --
mkfs /dev/mapper/whatever
then mount that somewhere, then use the Longhorn GUI (not the Harvester GUI) to add that disk to Longhorn (see https://longhorn.io/docs/1.8.2/nodes-and-volumes/nodes/multidisk/). The trick is going to be getting those re-mounted automatically after boot. For that you'd need to add some custom yaml under
/oem
to mount the multipath devices on boot.
(LMK if you want to try this and I can write up a bit more detail - I haven't tried it myself, but I think it should be possible)
(oh, and thanks for offering to do perf testing, really appreciate it, but of course if it doesn't end up going ahead I understand completely)
g
I'm really grateful for your advice and support – I truly appreciate it. Yes, I created a YAML file that automatically attaches the disks via iSCSI. The only thing left is to add the mounting logic, but that should be fairly straightforward. What I'm mostly concerned about is whether using the Longhorn UI to add the disk (as you suggested) would actually create a Volume. Most of the solutions I was offered involved installing the CSI driver and then provisioning volumes manually. However, I mentioned that I have over 60 virtual machines, and doing this manually for each one would be a nightmare. That's why I’m looking for a solution where Longhorn itself manages the disks and handles the volume lifecycle automatically. I’d be happy to try what you suggested – it sounds promising. One more thing: I haven’t figured out how to access the Longhorn UI yet. Do you know where or how I can access it, or where I can read more about how it’s configured?
a
The mounting logic would be something like this (but I might have the indentation wrong, so be careful 🙂 ):
Copy code
stages:
  fs:
    - commands:
      - mkdir -p /path-you-want-to-mount
      - mount -o [..options here..] /dev/mapper/your-multipath-disk /path-you-want-to-mount
      - mkdir -p /another-path-you-want-to-mount
      - mount [...you get the idea...]
adding disks to nodes via the longhorn UI is essentially the same thing that happens when you add a disk using the harvester UI, i.e. the disk is then owned by longhorn, and when you create VMs or upload images or whatever in harvester, those just use the longhorn storage by default
To access the longhorn UI, go to the harvester GUI, then click the user icon in the top right hand corner, and on that menu select "preferences". On the preferences screen, check the "Enable extension developer features" box. Then go back to the main harvester gui page, and down on the bottom left, hit the "Support" link. The support page will now include a link to "Access Embedded Longhorn UI"
As for the CSI option, since Harvester v1.5.0, if you install a CSI driver, then harvester should be able to take full advantage of that when creating VMs, volumes, images etc. approximately just like you would when using longhorn storage. Provided the CSI driver supports the functionality harvester needs (see https://docs.harvesterhci.io/v1.5/advanced/csidriver) the only thing you should need to do differently than when using longhorn storage, is to choose a different storage class when e.g. uploading images or creating VMs, and the various bits of volume manipulation that need to happen should be more or less automatic.
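For illustration only, the storage class side of that would be something like the below, but the provisioner name and parameters are pure placeholders that depend entirely on the driver you install:
Copy code
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: qsan-iscsi              # hypothetical name
provisioner: csi.example.com    # placeholder: whatever name the vendor's CSI driver registers
allowVolumeExpansion: true
parameters: {}                  # driver-specific options go here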
gotta go now, but hope that helps, and I look forward to seeing how things work out for you 🙂
g
I’m thinking of trying the approach via the Longhorn UI. Okay, I’ve just accessed the Longhorn UI and now I understand your idea better — basically, I need to mount the multipath device myself and then pass the mounted path to Longhorn, instead of letting it handle the mounting automatically. Makes sense, I’ll give it a try now. However, when I tried to add a new disk through the UI, I got the following error:
Copy code
Last Transition Time: 8 days ago  
Message: multipathd is running with a known issue that affects Longhorn. See description and solution at 
<https://longhorn.io/kb/troubleshooting-volume-with-multipath>  
Reason: MultipathdIsRunning  
Status: False
From what I understand, this might be resolved by adding the following to the multipath config:
Copy code
blacklist {
    devnode "^sd[a-z0-9]+"
}
But I’m concerned this could break the iSCSI disk connections. Here's the issue: if I apply this config, multipath might stop working properly, because upon reboot, each iSCSI target reconnects as a new
sdX
device — and those names change every time. I’m thinking of working around this by dynamically resolving the correct device with something like:
Copy code
readlink -f /dev/disk/by-path/ip-<iscsi_session>-lun-0
That way I can identify the current
sdX
device and make sure it's not blacklisted, but that’s still a bit messy. I’m open to better ideas if you have any. Also, one thing I’m not sure about: does this warning from Longhorn actually impact production usage? I’m not entirely sure what role multipath plays in Longhorn internally, and whether this warning is just informational or could cause actual issues. For reference, here's what my current multipath setup looks like:
Copy code
mpathd (32017001378101100) dm-14 Qsan,XS3216  
size=500G features='0' hwhandler='1 alua' wp=rw  
|-+- policy='service-time 0' prio=50 status=active  
| `- 16:0:0:0 sde 8:64 active ready running  
`-+- policy='service-time 0' prio=50 status=enabled  
  `- 17:0:0:0 sdf 8:80 active ready running
I’ll start testing now to see if everything works as expected.
Thanks a lot for your help! Once I make some progress, I'll share what I've done.
a
Which version of harvester are you using? If it's v1.5, then (unless I'm forgetting something) each host should already have a config file
/etc/multipath/conf.d/99-longhorn.conf
which explicitly blacklists longhorn devices by vendor (IET) and product (VIRTUAL-DISK)
I don't believe LH uses multipath internally, so I hope that error/warning is just an alert that some extra blacklisting configuration may be required to ensure that multipathd doesn't grab longhorn block devices, but I can't say with 100% certainty right now (sorry!)
g
Hi, I'm currently using Harvester version 1.4.3. I found the reason for the issue: Longhorn mounts different disks across the nodes, and the
multipath
service detects all
sdX
disks, which can lead to complications. At one point, I even encountered a critical error where the OEM layer and filesystem failed to mount, but I believe I will be able to adjust the configuration so it will eventually work. In general,
multipath
can easily break things and it is tricky to configure it in a way that doesn't cause node issues, but I am hopeful I will manage it. Other than that, adding disks through the Longhorn UI works without problems, and I just need to ensure they are mounted automatically on startup, which I have some ideas on how to handle. One thing I couldn't fully understand is why disks added directly through the Longhorn UI do not have the
Provisioned
tag under the node's storage in the Harvester UI, while disks added through the Harvester UI do have this tag. Does this tag play an important role? The disk itself doesn't show errors, and I was able to create and run a VM on it without issues. Regarding multipath, as far as I have seen, Longhorn itself does not use multipath, since it is disabled by default in the GRUB configuration. However, Longhorn creates additional paths to the disks, which might confuse it if it relies on unique identifiers to find disks. Since
multipath
can use the same names and WWIDs, this could potentially be the root cause of these issues, but I am not 100% certain yet as I am still investigating.
a
OK, so the blacklist configuration that harvester v1.5.0 includes looks like this:
Copy code
blacklist { 
  device { 
    vendor "IET" 
    product "VIRTUAL-DISK"
  }
}
If you apply that to v1.4.3 it should blacklist the LH devices, without messing up any other /dev/sd[x] devices
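If you want that to survive reboots on v1.4.3, you could write it via /oem using the same pattern as the earlier snippets, roughly (untested):
Copy code
name: "Blacklist Longhorn devices in multipath"
stages:
  fs:
    - name: "Write 99-longhorn.conf"
      files:
        - path: /etc/multipath/conf.d/99-longhorn.conf
          content: |
            blacklist {
              device {
                vendor "IET"
                product "VIRTUAL-DISK"
              }
            }
          permissions: 0644
          owner: 0
          group: 0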
Disks in harvester's UI are managed by a harvester component called node-disk-manager, and if you use the harvester UI to provision disks, that's what sets the provisioned flag to "true"
if you manually add disks to longhorn, that won't be set to true, because harvester didn't provision the disk
it doesn't (or shouldn't) hurt, it's just something you need to be aware/careful of -- if you manually configure disks for longhorn, you wouldn't want to later try provisioning them via harvester or you might trash what you set up manually
g
Hi @ambitious-daybreak-95996, Here is an update on the progress I've made, step-by-step:
1️⃣ Enabled multipath via GRUB env: I enabled multipath by adding the necessary GRUB variables and rebooted the node.
2️⃣ Manually attached the iSCSI disk: Initially, I attached the iSCSI disk manually to retrieve its wwid.
3️⃣ Created a temporary multipath.conf:
Copy code
multipaths {
    multipath {
        wwid 32017001378101100
        alias qsan
        path_grouping_policy multibus
        path_selector "round-robin 0"
        failback manual
        rr_weight priorities
        no_path_retry 5
    }
}
4️⃣ Added the disk via the Longhorn UI and mounted it at:
Copy code
/var/lib/harvester/extra-disks/qsan500g
5️⃣ Created a YAML for automated mounting. The YAML:
• creates a persistent multipath.conf,
• ensures the multipath device is properly mapped,
• mounts it automatically after confirming device readiness.
⚠️ Important note: I disabled multipathd autostart because it causes race conditions when mounting COS_GRUB, COS_OEM, COS_RECOVERY, COS_STATE, COS_PERSISTENT, and HARV_LH_DEFAULT. Instead, I manually start multipathd after everything else is ready.
Current problem needing input: I cannot blacklist the OS disk in multipath.conf. It keeps being handled by multipath, not sdX, and since it is actively in use, I cannot easily blacklist it. Do you know a clean way to prevent multipath from claiming the OS disk while keeping the system stable?
Here is the clean YAML I am currently using:
Copy code
name: "Configure multipath for QSAN"
stages:
  fs:
    - name: "Create multipath.conf"
      files:
        - path: /etc/multipath.conf
          content: |
            defaults {
                user_friendly_names yes
                find_multipaths yes
                path_checker tur
                fast_io_fail_tmo 2
                dev_loss_tmo 5
            }

            blacklist {
                wwid 3600605b011e738702b752a444f9c5b1a
                wwid SPCC_Solid_State_Disk_A20231228S301KG01384
                wwid ST9500325AS_6VE7J1WW
            }

            multipaths {
                multipath {
                    wwid 32017001378101100
                    alias qsan
                    path_grouping_policy group_by_prio
                    path_selector "round-robin 0"
                    failback immediate
                    rr_weight priorities
                    no_path_retry 5
                }
            }
          permissions: 0644
          owner: 0
          group: 0

    - name: "Create QSAN multipath mount script"
      files:
        - path: /usr/local/bin/qsan_mount.sh
          content: |
            #!/bin/bash

            echo "Waiting for iSCSI LUN with WWID 32017001378101100..."
            for i in {1..12}; do
                if ls /dev/disk/by-id/ | grep -q "32017001378101100"; then
                    echo "iSCSI LUN detected."
                    break
                else
                    echo "Retrying in 5s..."
                    sleep 5
                fi
            done

            echo "Starting multipathd..."
            systemctl unmask multipathd
            systemctl start multipathd

            echo "Reloading multipath maps..."
            multipath -r

            echo "Waiting for /dev/mapper/qsan..."
            for i in {1..12}; do
                if [ -e /dev/mapper/qsan ]; then
                    echo "QSAN multipath device ready."
                    fsck -y /dev/mapper/qsan || true
                    if ! mountpoint -q /var/lib/harvester/extra-disks/qsan500g; then
                        mkdir -p /var/lib/harvester/extra-disks/qsan500g
                        mount /dev/mapper/qsan /var/lib/harvester/extra-disks/qsan500g
                    else
                        echo "QSAN disk already mounted."
                    fi
                    break
                else
                    echo "QSAN device not ready, retrying in 5s..."
                    sleep 5
                fi
            done

            echo "Multipath setup complete."
          permissions: 0755
          owner: 0
          group: 0

  network:
    - name: "Run QSAN multipath mount script"
      commands:
        - bash /usr/local/bin/qsan_mount.sh
Next steps: I am now starting tests to observe the system’s behavior when a controller path failure occurs on the QSAN side to see how gracefully multipath handles reconnections and failover. Let me know if you have any suggestions on the OS disk blacklisting issue or improvements for stability during path failures. Thanks!
👀 1
Hi, following up on my previous tests, I wanted to share additional insights regarding the initial failover testing on our QSAN setup. During my first test, I simply disabled one of the server ports connected to a QSAN controller to trigger a failover scenario. Initially, the test was unsuccessful. However, after repeated testing, I identified that the default values for the following iSCSI parameters were too high, causing delays in detecting link failures:
• node.conn[0].timeo.noop_out_timeout (default 15)
• node.conn[0].timeo.noop_out_interval (default 10)
• node.session.timeo.replacement_timeout (default 120)
To speed up failure detection and failover, I modified them as follows:
• node.conn[0].timeo.noop_out_timeout from 15 to 2
• node.conn[0].timeo.noop_out_interval from 10 to 1
• node.session.timeo.replacement_timeout from 120 to 2
I applied these changes using this YAML snippet:
Copy code
name: "ISCSI configuration"
stages:
  boot:
    - name: "Set replacement_timeout in iscsid.conf"
      commands:
        - iscsiadm -m node --op update -n node.conn[0].timeo.noop_out_timeout -v 2
        - iscsiadm -m node --op update -n node.conn[0].timeo.noop_out_interval -v 1
        - iscsiadm -m node --op update -n node.session.timeo.replacement_timeout -v 2
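I'm also considering updating the defaults in /etc/iscsi/iscsid.conf so that newly discovered sessions pick up the same values, with something like this (untested, and it assumes those keys are present uncommented in the stock iscsid.conf):
Copy code
commands:
  - sed -i 's/^node\.session\.timeo\.replacement_timeout.*/node.session.timeo.replacement_timeout = 2/' /etc/iscsi/iscsid.conf
  - sed -i 's/^node\.conn\[0\]\.timeo\.noop_out_interval.*/node.conn[0].timeo.noop_out_interval = 1/' /etc/iscsi/iscsid.conf
  - sed -i 's/^node\.conn\[0\]\.timeo\.noop_out_timeout.*/node.conn[0].timeo.noop_out_timeout = 2/' /etc/iscsi/iscsid.conf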
Additionally, I adjusted the multipath configuration for faster path failover:
• fast_io_fail_tmo set to 2 seconds
• dev_loss_tmo set to 5 seconds
This allows the system to more quickly mark a path as faulty and switch to the operational path without interrupting workloads. After applying these changes, I re-ran the tests, and the VM continued operating without interruption. The tests did not cause the VM to pause, disconnect, or crash. For testing, I used Rocky Linux 9 with the fio package:
First test (lighter workload):
Copy code
[global]
ioengine=libaio
direct=1
runtime=600
time_based
group_reporting
numjobs=1
iodepth=8
size=500M

[randrw]
rw=randrw
rwmixread=70
bs=4k
Second test (heavier workload):
Copy code
[global]
ioengine=libaio
direct=1
runtime=600
time_based
group_reporting
numjobs=1
iodepth=32
size=1G

[randread]
rw=randread
bs=4k

[randwrite]
rw=randwrite
bs=4k
During the tests, I manually toggled the server ports connected to one of the QSAN controllers multiple times, and failover was smooth after reducing the timeouts.
Observations:
• Kubernetes generates a very high number of I/O operations, and if the disk connection is not restored within 5–7 seconds, the node may mark the volume as lost.
• Since the volumes are iSCSI devices, they disconnect quickly, requiring faster reaction times than the default Kubernetes behavior.
• I have not yet tested this in a real cluster environment with multiple virtual machines performing different workloads on the disks simultaneously.
If issues occur under such conditions, one possible approach could be reducing the timeouts further to 1 second to ensure faster failover. However, this carries the risk of treating transient micro-outages as failures and will increase server load due to more frequent connection checks.