# harvester
s
yeah, I only found the document https://docs.harvesterhci.io/v1.3/advanced/vgpusupport, which covers vGPU for VMs rather than for guest cluster management. cc @great-bear-19718, do you have any suggestions for using vGPUs with cluster management (for guest clusters)?
g
rancher 2.8.4 lets you define a vGPU for a downstream cluster when creating a machine pool, under the advanced settings
i think it is rancher 2.8.4
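(For reference: before picking a vGPU profile in that machine-pool screen, it can help to see which vGPU devices Harvester has actually enabled. A minimal sketch; the CRD name is an assumption based on Harvester 1.3's pcidevices controller, so verify it with `kubectl api-resources | grep devices.harvesterhci.io` first.)

```
# List the vGPU devices the Harvester cluster exposes; the profile names
# shown here are what the Rancher machine-pool UI lets you pick from.
kubectl get vgpudevices.devices.harvesterhci.io -A
```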
n
OK, I got my Rancher updated and I now see that option, thanks. Now I am having some issues with the SR-IOV GPU Devices. I have my A5000 Adas installed and the PCI Devices page can see the GPUs, but the SR-IOV GPU Devices are not listed. I have an ASRock Rack X399D8A-2T mobo, and I have both IOMMU and SR-IOV enabled. What was your experience like getting the GPUs to show? Is there any prep still needed on the Harvester nodes? I installed Harvester using PXE; do I need to install drivers on those nodes?
g
you need to have the nvidia-driver-toolkit addon enabled, and the image present
the image is not packaged in the OS, so it needs to be pulled from Docker Hub or your private repo
when enabling the addon you need to provide the HTTP endpoint where the driver is located
once this is done, the vGPUs will show up once the SR-IOV GPU device is enabled
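(For reference, the same flow from the CLI might look roughly like this. The addon name, namespace, and CRD names are assumptions based on Harvester 1.3 and this thread; check your cluster with `kubectl get addons.harvesterhci.io -A` before relying on them.)

```
# Enable the nvidia-driver-toolkit addon and point it at the HTTP endpoint
# hosting the vGPU .run driver (names here are assumptions):
kubectl -n harvester-system edit addons.harvesterhci.io nvidia-driver-toolkit
# in the editor: set spec.enabled to true and set the driver location to e.g.
#   http://<your-web-server>/vgpu/NVIDIA-Linux-x86_64-<version>-vgpu-kvm.run

# After the driver installs, list the SR-IOV GPU devices so they can be enabled:
kubectl get sriovgpudevices.devices.harvesterhci.io
```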
t
Glad you like it. Keep in mind that vGPU and SR-IOV are different technologies; it really comes down to which one your GPU supports.
n
is the proper license from NVIDIA the NVIDIA RTX Virtual Workstation (vWS)?
t
not sure. I am using an A2000, which doesn’t need a license.
n
Thx @thousands-advantage-10804. I am having issues with the harvester-nvidia-driver-toolkit. I see the Helm chart was deployed to the cluster, but how do I ensure that the vgpu/NVIDIA-Linux-x86_64-550.90.05-vgpu-kvm.run file gets installed on the 2 nodes with the GPUs, out of the 8 I have for Harvester?
t
do all 8 nodes have the same gpus?
n
No, out of the 8 nodes only 2 have a GPU, and those 2 both have A5000 Adas
t
OH. cool.
you should be good.
n
It's those 2 nodes, though: I can't seem to get the nvidia-driver-toolkit to trigger the driver install on them. I even tried uninstalling Harvester on those nodes and reinstalling, and afterwards there is still no driver installed, so SR-IOV is not being turned on.
I do have the driverLocation endpoint pointing at the file correctly
t
do the cards show up in the pci devices page?
n
Yes
t
you should be good then.
n
that will only be good for VMs, correct? I need to pull these into my RKE2 deployments as pods, which needs the vGPU devices populated, and those will only be populated once SR-IOV is working and the bare-metal Harvester nodes have the drivers installed? Am I correct in thinking this?
t
Yes, you need to pass the GPU to the VM, and then pass it to the pod.
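(To make the second hop concrete: once the vGPU-backed VM has joined the guest RKE2 cluster and the NVIDIA device plugin or GPU Operator is running there, a pod requests the GPU as a standard extended resource. A minimal sketch; the pod name is made up and the image is just a stock CUDA image.)

```
# Run this inside the guest RKE2 cluster, not against Harvester itself.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvidia/cuda:12.2.0-base-ubuntu22.04
    command: ["nvidia-smi"]       # prints the GPU the pod was given
    resources:
      limits:
        nvidia.com/gpu: 1         # extended resource exposed by the device plugin
EOF
```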
n
So I found the label that needs to be added to the nodes to get the nvidia-driver-toolkit to attempt the install: sriovgpu.harvesterhci.io/driver-needed:true. I added the label to the nodes and got a log from the DaemonSet, but I do not know if the build was successful:
```
Installing nvidia driver from http://192.168.210.5:8080/vgpu/NVIDIA-Linux-x86_64-550.90.05-vgpu-kvm.run
2024-07-22T16:09:21.191550768Z   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
2024-07-22T16:09:21.191638804Z                                  Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 50.7M  100 50.7M    0     0   264M      0 --:--:-- --:--:-- --:--:--  264M
Verifying archive integrity... OK
Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 550.90.05..........

2024-07-22T16:09:23.246257441Z Welcome to the NVIDIA Software Installer for Unix/Linux

2024-07-22T16:09:23.246268261Z Detected 64 CPUs online; setting concurrency level to 32.
Scanning the initramfs with lsinitrd...
2024-07-22T16:09:23.246280174Z /usr/bin/lsinitrd requires a file path argument, but none was given.
/usr/bin/lsinitrd requires a file path argument, but none was given.
2024-07-22T16:09:23.246290533Z /usr/bin/lsinitrd requires a file path argument, but none was given.
/usr/bin/lsinitrd requires a file path argument, but none was given.
2024-07-22T16:09:23.246301334Z Initramfs scan failed.
This system requires use of the NVIDIA open kernel modules; these will be selected by default.
2024-07-22T16:09:23.246311673Z Installing NVIDIA driver version 550.90.05.
Performing CC sanity check with CC="/usr/bin/cc".
2024-07-22T16:09:23.246322063Z Performing CC check.
Kernel source path: '/usr/src/linux'
2024-07-22T16:09:23.246331591Z 
Kernel output path: '/usr/src/linux'
2024-07-22T16:09:23.246341119Z 
2024-07-22T16:09:23.246346830Z Performing Compiler check.
Performing Dom0 check.
2024-07-22T16:09:23.246356828Z Performing Xen check.
2024-07-22T16:09:23.246361998Z Performing PREEMPT_RT check.
2024-07-22T16:09:23.246366857Z Performing vgpu_kvm check.
Cleaning kernel module build directory.
2024-07-22T16:09:23.246377227Z Building kernel modules: 

  [                              ]   0%
  [#                             ]   0%
```
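(For anyone retracing this, the label-and-watch step might look like the following. The node names are placeholders, and the DaemonSet name/namespace are assumptions based on the thread; note that kubectl label syntax uses `=` rather than `:`.)

```
# Mark the GPU nodes so the toolkit DaemonSet attempts the driver install:
kubectl label node <gpu-node-1> <gpu-node-2> sriovgpu.harvesterhci.io/driver-needed=true

# Follow the installer log to see whether the kernel-module build finishes:
kubectl -n harvester-system logs -f ds/nvidia-driver-toolkit

# One way to confirm on the node itself: the vGPU KVM module should be loaded.
lsmod | grep nvidia_vgpu_vfio
```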
t
interesting. I don’t have a card that supports SR-IOV.