# harvester
w
I'm having some trouble installing NeuVector due to RWX volume requirements; however, I see RWX should be available now, according to https://github.com/harvester/harvester/issues/1992 I'm running Harvester 1.4.1, Rancher 2.11.2, and the cluster has harvester-csi-driver 0.1.2300. Loosely following the issue above, in Harvester I have a storage class called "ssd-rwx" as follows -
```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: ssd-rwx
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "3"
  diskSelector: ssd
  staleReplicaTimeout: "2880"
  fromBackup: ""
  fsType: "ext4"
  nfsOptions: "vers=4.2,noresvport,softerr,timeo=600,retrans=5"
```
The "ssd" diskSelector just selects the right disks, so I have a disk with the tag "ssd" and it is available in the UI. In the target cluster I added a matching storage class using the UI; YAML as follows -
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ssd-rwx
  fields:
    - ssd-rwx
    - driver.harvesterhci.io
    - Delete
    - Immediate
    - false
    - 54m
allowVolumeExpansion: false
parameters:
  hostStorageClass: ssd-rwx
provisioner: driver.harvesterhci.io
reclaimPolicy: Delete
volumeBindingMode: Immediate
__clone: true
```
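For reference, the PVCs I'm testing with look roughly like this (name and size are just illustrative):
```yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: test-rwx            # illustrative name
spec:
  accessModes:
    - ReadWriteMany         # RWX via the Longhorn share manager (NFS)
  storageClassName: ssd-rwx
  resources:
    requests:
      storage: 5Gi
```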
I can create PVCs using this storage class as RWX; however, when I try to mount them in a pod, I get the following error:
```plaintext
MountVolume.MountDevice failed for volume "pvc-33157c69-b802-4bb5-9d13-ae00b0b4be76" : rpc error: code = DeadlineExceeded desc = context deadline exceeded
```
The cluster nodes appear to have the NFS client installed - from one of the nodes:
```bash
cluster-3-worker-bwjtp-7qxmw:/ # nfsstat -c
Client rpc stats:
calls      retrans    authrefrsh
1978       0          1978    

Client nfs v4:
null             read             write            commit           open             
9         0%     6         0%     0         0%     0         0%     4         0%     
open_conf        open_noat        open_dgrd        close            setattr          
0         0%     5         0%     0         0%     9         0%     4         0%     
fsinfo           renew            setclntid        confirm          lock             
28        1%     0         0%     0         0%     0         0%     0         0%     
lockt            locku            access           getattr          lookup           
0         0%     0         0%     28        1%     116       5%     30        1%     
lookup_root      remove           rename           link             symlink          
7         0%     2         0%     0         0%     0         0%     0         0%     
create           pathconf         statfs           readlink         readdir          
0         0%     21        1%     1561     78%     0         0%     20        1%     
server_caps      delegreturn      getacl           setacl           fs_locations     
49        2%     5         0%     0         0%     0         0%     0         0%     
rel_lkowner      secinfo          fsid_present     exchange_id      create_session   
0         0%     0         0%     0         0%     2         0%     1         0%     
destroy_session  sequence         get_lease_time   reclaim_comp     layoutget        
0         0%     63        3%     0         0%     1         0%     0         0%     
getdevinfo       layoutcommit     layoutreturn     secinfo_no       test_stateid     
0         0%     0         0%     0         0%     7         0%     0         0%     
free_stateid     getdevicelist    bind_conn_to_ses destroy_clientid seek             
0         0%     0         0%     0         0%     0         0%     0         0%     
allocate         deallocate       layoutstats      clone            
0         0%     0         0%     0         0%     0         0%
```
I see that on GitHub you mention updating the CSI driver to the version in the repo - "Upgrade csi-driver from 0.1.18 to 0.1.19 (with 0.2.0 image)" - however the version provisioned by Harvester seems to be newer than this, so should version 0.1.2300 work? The issue is closed so I'm just posting here, hoping to get some clarity, as I can't install NeuVector without RWX support. The issue said the milestone was 1.4.0, so have I missed something? @salmon-city-57654's verify steps are the ones I was following - but I feel like I'm hitting a dead end... Are the versions I'm using supported, or have I missed something?
b
Have you seen the latest docs? https://docs.harvesterhci.io/v1.4/rancher/csi-driver/#rwx-volumes-support You will need a storage network and to attach a NIC on the storage network to the VMs.
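On the VM side that boils down to an extra interface on the storage network in the guest-cluster VM spec, roughly like this (the network names are placeholders for whatever your VM networks are actually called):
```yaml
# excerpt from a guest-cluster VM (KubeVirt VirtualMachine), placeholder names
spec:
  template:
    spec:
      domain:
        devices:
          interfaces:
            - name: default
              bridge: {}
            - name: storage            # second NIC, lives on the storage network
              bridge: {}
      networks:
        - name: default
          multus:
            networkName: default/vm-network      # existing VM network
        - name: storage
          multus:
            networkName: default/storage-vlan    # VM network on the storage VLAN
```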
w
I did see that, however the host cluster is running Harvester 1.4.1 - will take a quick look again though. Also, upgrading to 1.5 is our plan, but we'll go through the minor versions over the weekend.
Also, we have a dedicated storage network already, but will double-check that too.
b
We have it "working" with 1.4.1 but there is a big issue you should be aware of https://github.com/harvester/harvester/issues/7796
oh sorry, I misread. This isn't guest RWX, ignore me
Or are you trying to do this in a guest cluster pod?
w
We have a cluster installed on the Harvester, in which we were trying to install NeuVector; this requires RWX volumes. I hit this problem before and gave up in an earlier Harvester version, and ended up running a dedicated NFS VM to serve shared volumes, but now I need to sort it.
So for clarity: we have a separate Rancher pool (bare-metal low-power machines) which manages a Harvester pool (high-power machines), on which we have used Rancher to deploy an RKE2 cluster with openSUSE Leap Micro as the base image (it seems to have the NFS client pre-installed and it provisions nicely).
b
Thanks, so you might hit the issue I linked. We also had issues with the NFS share that backs the RWX volume "locking up" from time to time and all writes failed until we restarted the guest cluster pods. Sadly overall not a reliable experience and we moved away from it.
w
Bit of a blocker for using NeuVector then
"Storage Network for RWX Volume Enabled" wasn't toggled in Longhorn
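for reference, this is the Longhorn setting I mean - roughly this as an object, assuming the Longhorn version shipped with Harvester 1.4 exposes it under this name:
```yaml
apiVersion: longhorn.io/v1beta2
kind: Setting
metadata:
  name: storage-network-for-rwx-volume-enabled   # setting name may differ by Longhorn version
  namespace: longhorn-system
value: "true"
```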
damn it... says all volumes must be detached before we can apply that!!!!
so that's pull the handbrake!
b
ah yes, that was a pain
full shutdown
w
one for the weekend... will see if I can use the existing NFS for the NeuVector install in the meantime
b
if you are shutting down make sure you have the storage network exclude IPs in the storage network settings in harvester, as that will also require a shutdown of all VMs
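the setting I mean looks roughly like this (VLAN, range and exclude values are placeholders - the excluded addresses are the ones you can then hand out to the guest VMs):
```yaml
apiVersion: harvesterhci.io/v1beta1
kind: Setting
metadata:
  name: storage-network
# value is a JSON string; these numbers are illustrative only
value: '{"vlan":100,"clusterNetwork":"storage","range":"192.168.100.0/24","exclude":["192.168.100.64/26"]}'
```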
w
do I need to do that though - as the VMs use a completely different subnet
b
You need to attach a NIC from the storage network to the VMs hosting rke2 and then give that NIC an IP from the excluded range (so it doesn't conflict with anything harvester is doing)
e.g. one of our rke2 VMs looks like this
with NICs
```plaintext
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 0e:45:ec:4d:d6:bd brd ff:ff:ff:ff:ff:ff
    altname enp1s0
    inet 10.150.2.27/23 brd 10.150.3.255 scope global noprefixroute eth0
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 0e:24:21:55:26:36 brd ff:ff:ff:ff:ff:ff
    altname enp2s0
    inet 10.150.11.1/22 brd 10.150.11.255 scope global noprefixroute eth1
       valid_lft forever preferred_lft forever
```
w
ahh ic
b
where the eth1 IP is set from the excluded range in the storage network
w
I get you, makes sense - on the other end we dedicated separate 10G ports for storage on each Harvester node, so this needs to be routable to the pods then... think that's what you're saying... 🙂 almost fudged this with our dedicated NFS by manually creating the PV...
That's also painful though, given the nodes in the RKE2 cluster are many...
also another consideration for when you scale the cluster up... won't that be better in the cloud-init for the worker templates?
Just seen that on the nodes you can add networks there - so maybe a case of defining the storage network in Rancher so you can allocate it to the extra NIC from there
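something along these lines in the template's network data, I guess (totally untested, and the address would have to be per-node since there's presumably no DHCP on that VLAN):
```yaml
# cloud-init network config (version 2), illustrative only --
# eth1 is the storage-network NIC, address taken from the excluded range
version: 2
ethernets:
  eth1:
    addresses:
      - 10.150.11.2/22
```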
b
We set up the guest rke2 cluster ourselves with ansible and our workload size is fairly static; all the IPs are set via ansible. It isn't very dynamic, but it works for our use case. I don't think there is DHCP in the storage network to hook into, but I could be wrong.
and the VMs with terraform where we add a separate NIC config.
w
ahh - I've been driving this from Rancher itself, will give this a play when I get a chance to shut down fully
fixed it for now - manually editing the PVC/PV that NeuVector uses made it possible for me to use our dedicated NFS server. It's not perfect and only there for development - ultimately not a million miles away from what Longhorn would provision, only technically not as private or as tidy! It'll do for now while we find our way with this product. I'll take note of your points @brainy-kilobyte-33711, that's super helpful, and will let you know how we get on. The soonest I can likely try this is the weekend - but I've quite a lot on and could do without shutting everything down just yet...
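for anyone finding this later, the stop-gap is roughly a hand-made NFS PV pre-bound to the claim NeuVector wants (server, path, names and sizes are all placeholders for our setup):
```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: neuvector-nfs-pv              # placeholder name
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""                # keep it out of dynamic provisioning
  nfs:
    server: 10.0.0.50                 # our dedicated NFS VM (placeholder)
    path: /exports/neuvector
  claimRef:                           # pre-bind to the claim the chart expects
    namespace: cattle-neuvector-system
    name: neuvector-data              # placeholder claim name
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: neuvector-data                # placeholder claim name
  namespace: cattle-neuvector-system
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  resources:
    requests:
      storage: 10Gi
```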
b
good luck!
w
Now I need to write all this up... thanks again, NeuVector is now running - albeit on poor man's NFS for now. I'll let you know how I get on with the RWX "proper" option later... though from the sounds of it I might be better off sticking with what we have!
b
Yeah, it might be more reliable with yours until that issue I linked is fixed
w
Might lay the groundwork in the meantime - got to do 1.4.1 -> 1.5.0 yet
Just got the Rancher/k8s side up to date and ready, so next steps should be good
🤞