# harvester
w
I'm having some trouble installing NeuVector due to RWX volume requirements; however, I see RWX should be available now, according to https://github.com/harvester/harvester/issues/1992 I'm running Harvester 1.4.1, Rancher 2.11.2, and the cluster has harvester-csi-driver 0.1.2300. Loosely following the issue above, in Harvester I have a storage class called "ssd-rwx" as follows -
```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: ssd-rwx
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "3"
  diskSelector: ssd
  staleReplicaTimeout: "2880"
  fromBackup: ""
  fsType: "ext4"
  nfsOptions: "vers=4.2,noresvport,softerr,timeo=600,retrans=5"
```
The "ssd" diskSelector just selects the right disks, so I have a disk with the tag "ssd" and it is available in the UI. In the target cluster I added a matching storage class using the UI; YAML as follows -
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ssd-rwx
  fields:
    - ssd-rwx
    - driver.harvesterhci.io
    - Delete
    - Immediate
    - false
    - 54m
allowVolumeExpansion: false
parameters:
  hostStorageClass: ssd-rwx
provisioner: driver.harvesterhci.io
reclaimPolicy: Delete
volumeBindingMode: Immediate
__clone: true
```
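For reference, the PVCs I'm testing with look roughly like this (name and size are just illustrative):
```yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: test-rwx            # illustrative name
spec:
  accessModes:
    - ReadWriteMany         # RWX via the Longhorn share manager (NFS)
  storageClassName: ssd-rwx
  resources:
    requests:
      storage: 5Gi
```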
I can create PVCs using this storage class as RWX; however, when I try to mount them in a pod, I get the following error:
```plaintext
MountVolume.MountDevice failed for volume "pvc-33157c69-b802-4bb5-9d13-ae00b0b4be76" : rpc error: code = DeadlineExceeded desc = context deadline exceeded
```
The cluster nodes appear to have the NFS client installed - from one of the nodes:
```bash
cluster-3-worker-bwjtp-7qxmw:/ # nfsstat -c
Client rpc stats:
calls      retrans    authrefrsh
1978       0          1978    

Client nfs v4:
null             read             write            commit           open             
9         0%     6         0%     0         0%     0         0%     4         0%     
open_conf        open_noat        open_dgrd        close            setattr          
0         0%     5         0%     0         0%     9         0%     4         0%     
fsinfo           renew            setclntid        confirm          lock             
28        1%     0         0%     0         0%     0         0%     0         0%     
lockt            locku            access           getattr          lookup           
0         0%     0         0%     28        1%     116       5%     30        1%     
lookup_root      remove           rename           link             symlink          
7         0%     2         0%     0         0%     0         0%     0         0%     
create           pathconf         statfs           readlink         readdir          
0         0%     21        1%     1561     78%     0         0%     20        1%     
server_caps      delegreturn      getacl           setacl           fs_locations     
49        2%     5         0%     0         0%     0         0%     0         0%     
rel_lkowner      secinfo          fsid_present     exchange_id      create_session   
0         0%     0         0%     0         0%     2         0%     1         0%     
destroy_session  sequence         get_lease_time   reclaim_comp     layoutget        
0         0%     63        3%     0         0%     1         0%     0         0%     
getdevinfo       layoutcommit     layoutreturn     secinfo_no       test_stateid     
0         0%     0         0%     0         0%     7         0%     0         0%     
free_stateid     getdevicelist    bind_conn_to_ses destroy_clientid seek             
0         0%     0         0%     0         0%     0         0%     0         0%     
allocate         deallocate       layoutstats      clone            
0         0%     0         0%     0         0%     0         0%
```
I see that on GitHub you mention updating the CSI driver to the version in the repo - "Upgrade csi-driver from 0.1.18 to 0.1.19 (with 0.2.0 image)" - however the version provisioned by Harvester seems to be newer than this, so should version 0.1.2300 work? The issue is closed so I'm just posting here, hoping to get some clarity, as I can't install NeuVector without RWX support. The issue said the milestone was 1.4.0, so have I missed something? @salmon-city-57654's verify steps are the ones I was following - but I feel like I'm hitting a dead end... Are the versions I'm using supported, or have I missed something?
b
Have you seen the latest docs? https://docs.harvesterhci.io/v1.4/rancher/csi-driver/#rwx-volumes-support You will need a storage network and to attach a NIC on the storage network to the VMs.
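On the VM side that boils down to an extra interface on the storage network in the guest-cluster VM spec, roughly like this (the network names are placeholders for whatever your VM networks are actually called):
```yaml
# excerpt from a guest-cluster VM (KubeVirt VirtualMachine), placeholder names
spec:
  template:
    spec:
      domain:
        devices:
          interfaces:
            - name: default
              bridge: {}
            - name: storage            # second NIC, lives on the storage network
              bridge: {}
      networks:
        - name: default
          multus:
            networkName: default/vm-network      # existing VM network
        - name: storage
          multus:
            networkName: default/storage-vlan    # VM network on the storage VLAN
```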
w
I did see that, however the host cluster is running Harvester 1.4.1 - will take a quick look again though. Also, upgrading to 1.5 is our plan, but we'll go through the minor versions over the weekend.
Also, we have a dedicated storage network already, but will double-check that too.
b
We have it "working" with 1.4.1 but there is a big issue you should be aware of https://github.com/harvester/harvester/issues/7796
oh sorry, I misread. This isn't guest RWX, ignore me
Or are you trying to do this in a guest cluster pod?
w
We have a cluster installed on the Harvester, in which we were trying to install NeuVector; this requires RWX volumes. I hit this problem before and gave up in an earlier Harvester version, and ended up running a dedicated NFS VM to serve shared volumes, but now I need to sort it.
So for clarity: we have a separate Rancher pool (bare-metal low-power machines) which manages a Harvester pool (high-power machines), on which we have used Rancher to deploy an RKE2 cluster with openSUSE Leap Micro as the base image (it seems to have the NFS client pre-installed and it provisions nicely).
b
Thanks, so you might hit the issue I linked. We also had issues with the NFS share that backs the RWX volume "locking up" from time to time and all writes failed until we restarted the guest cluster pods. Sadly overall not a reliable experience and we moved away from it.
w
Bit of a blocker for using NeuVector then
"Storage Network for RWX Volume Enabled" wasn't toggled in Longhorn
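for reference, this is the Longhorn setting I mean - roughly this as an object, assuming the Longhorn version shipped with Harvester 1.4 exposes it under this name:
```yaml
apiVersion: longhorn.io/v1beta2
kind: Setting
metadata:
  name: storage-network-for-rwx-volume-enabled   # setting name may differ by Longhorn version
  namespace: longhorn-system
value: "true"
```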
damn it... says all volumes must be detached before we can apply that!!!!
so that's pull the handbrake!
b
ah yes, that was a pain
full shutdown
w
one for the weekend... will see if I can use the existing NFS for the NeuVector install in the meantime
b
if you are shutting down make sure you have the storage network exclude IPs in the storage network settings in harvester, as that will also require a shutdown of all VMs
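the setting I mean looks roughly like this (VLAN, range and exclude values are placeholders - the excluded addresses are the ones you can then hand out to the guest VMs):
```yaml
apiVersion: harvesterhci.io/v1beta1
kind: Setting
metadata:
  name: storage-network
# value is a JSON string; these numbers are illustrative only
value: '{"vlan":100,"clusterNetwork":"storage","range":"192.168.100.0/24","exclude":["192.168.100.64/26"]}'
```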
w
do I need to do that though - as the VMs use a completely different subnet
b
You need to attach a NIC from the storage network to the VMs hosting rke2 and then give that NIC an IP from the excluded range (so it doesn't conflict with anything harvester is doing)
e.g. one of our rke2 VMs looks like this
with NICs
```plaintext
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 0e:45:ec:4d:d6:bd brd ff:ff:ff:ff:ff:ff
    altname enp1s0
    inet 10.150.2.27/23 brd 10.150.3.255 scope global noprefixroute eth0
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 0e:24:21:55:26:36 brd ff:ff:ff:ff:ff:ff
    altname enp2s0
    inet 10.150.11.1/22 brd 10.150.11.255 scope global noprefixroute eth1
       valid_lft forever preferred_lft forever
```
w
ahh ic
b
where the eth1 IP is set from the excluded range in the storage network
w
I get you, makes sense - on the other end we dedicated separate 10G ports for storage on each Harvester node, so this needs to be routable to the pods then... think that's what you're saying... 🙂 almost fudged this with our dedicated NFS by manually creating the PV...
That's also painful though, given the nodes in the RKE2 cluster are many...
also another consideration for when you scale the cluster up... won't that be better in the cloud-init for the worker templates?
Just seen that on the nodes you can add networks there - so maybe a case of defining the storage network in Rancher so you can allocate it to the extra NIC from there
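something along these lines in the template's network data, I guess (totally untested, and the address would have to be per-node since there's presumably no DHCP on that VLAN):
```yaml
# cloud-init network config (version 2), illustrative only --
# eth1 is the storage-network NIC, address taken from the excluded range
version: 2
ethernets:
  eth1:
    addresses:
      - 10.150.11.2/22
```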
b
We set up the guest rke2 cluster ourselves with ansible and our workload size is fairly static; all the IPs are set via ansible. It isn't very dynamic, but it works for our use case. I don't think there is DHCP in the storage network to hook into, but I could be wrong.
and the VMs with terraform where we add a separate NIC config.
w
ahh - I've been driving this from Rancher itself, will give this a play when I get a chance to shut down fully
fixed it for now - manually editing the PVC/PV that NeuVector uses made it possible for me to use our dedicated NFS server. It's not perfect and only there for development - ultimately not a million miles away from what Longhorn would provision, only technically not as private or as tidy! It'll do for now while we find our way with this product. I'll take note of your points @brainy-kilobyte-33711, that's super helpful, and will let you know how we get on. The soonest I can likely try this is the weekend - but I've quite a lot on and could do without shutting everything down just yet...
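for anyone finding this later, the stop-gap is roughly a hand-made NFS PV pre-bound to the claim NeuVector wants (server, path, names and sizes are all placeholders for our setup):
```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: neuvector-nfs-pv              # placeholder name
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""                # keep it out of dynamic provisioning
  nfs:
    server: 10.0.0.50                 # our dedicated NFS VM (placeholder)
    path: /exports/neuvector
  claimRef:                           # pre-bind to the claim the chart expects
    namespace: cattle-neuvector-system
    name: neuvector-data              # placeholder claim name
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: neuvector-data                # placeholder claim name
  namespace: cattle-neuvector-system
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  resources:
    requests:
      storage: 10Gi
```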
b
good luck!
w
Now I need to write all this up... thanks again, NeuVector is now running - albeit on poor man's NFS for now. I'll let you know how I get on with the RWX "proper" option later... though from the sounds of it I might be better off sticking with what we have!
b
Yeah, it might be more reliable with yours until that issue I linked is fixed
w
Might lay the groundwork in the meantime - got to do 1.4.1 -> 1.5.0 yet
Just got the Rancher/k8s side up to date and ready, so next steps should be good
🤞