full-crayon-745

01/17/2023, 11:06 AM
Hi guys, we have 1 host with 4 Nvidia 3090 GPUs with passthrough enabled and 1 VM with these GPUs attached that was working well. Recently, I switched off the VM and restarted the host. Once the host returned from maintenance, I started the VM, but the VM did not start and Harvester returned an error message complaining about “Insufficient nvidia.com/GA102_GEFORCE_RTX_3090”. I created a new VM and attached the GPUs to it, but that VM will not start either and I'm getting this error (combined from similar events):
server error. command SyncVMI failed: "LibvirtError(Code=1, Domain=10, Message='internal error: qemu unexpectedly closed the monitor: 2023-01-17T10:18:36.433107Z qemu-system-x86_64: -device vfio-pci,host=0000:81:00.0,id=ua-hostdevice-GA102_GEFORCE_RTX_30901,bus=pci.8,addr=0x0: vfio 0000:81:00.0: group 30 is not viable\nPlease ensure all devices within the iommu_group are bound to their vfio bus driver.')"
Not sure what I'm doing wrong here, since the GPUs worked once already. Has anyone here worked with Nvidia GPUs and successfully managed to attach them to a VM? Thanks
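For anyone who hits the same “group is not viable” message: a quick way to check which devices share that IOMMU group and which driver each one is bound to is something like the following (using the 0000:81:00.0 address from the error above; adjust for your own setup):
# list every device in the same IOMMU group as the GPU
ls /sys/bus/pci/devices/0000:81:00.0/iommu_group/devices
# show the driver each device in that group is currently bound to
for dev in /sys/bus/pci/devices/0000:81:00.0/iommu_group/devices/*; do
  if [ -e "$dev/driver" ]; then
    echo "$(basename "$dev"): $(basename "$(readlink "$dev/driver")")"
  else
    echo "$(basename "$dev"): no driver bound"
  fi
done
For passthrough to work, every device in that group needs to show vfio-pci.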

lively-zebra-61132

01/17/2023, 12:53 PM
Hi, sorry about that, I'm new to Slack... What you are describing sounds like a problem with Kubernetes scheduling. The last time I worked with NVIDIA GPUs and Kubernetes, there was an "NVIDIA device plugin" which takes care of letting Kubernetes know how many GPUs are available in your cluster. It could be that this plugin is not running correctly, so Kubernetes thinks there are no GPUs available and won't schedule your VM because of it.
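If it is the device plugin, the node's advertised resources should reflect it. Something along these lines (the node name is just a placeholder) prints what the scheduler currently thinks is allocatable, and whether any device-plugin pod is running at all:
kubectl get node <node-name> -o jsonpath='{.status.allocatable}'
kubectl get pods -A | grep -i device-plugin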

full-crayon-745

01/17/2023, 1:41 PM
I've run
kubectl describe node inog02
and this was returned:
Capacity:
  ...
  nvidia.com/GA102_GEFORCE_RTX_3090:                  1
  nvidia.com/GA102_HIGH_DEFINITION_AUDIO_CONTROLLER:  4
  ...
Allocatable:
  ...
  nvidia.com/GA102_GEFORCE_RTX_3090:                  1
  nvidia.com/GA102_HIGH_DEFINITION_AUDIO_CONTROLLER:  4
  ...
It seems the node is missing the other 3 GPUs. We restarted the host with the GPUs, and now running the same command returns:
Capacity:
  ...
  nvidia.com/GA102_GEFORCE_RTX_3090:                  0
  nvidia.com/GA102_HIGH_DEFINITION_AUDIO_CONTROLLER:  0
  ...
Allocatable:
  ...
  nvidia.com/GA102_GEFORCE_RTX_3090:                  0
  nvidia.com/GA102_HIGH_DEFINITION_AUDIO_CONTROLLER:  0
  ...
It looks like it is completely missing the GPUs
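If I understand the pcidevices addon correctly, it also exposes its discovery results as CRDs, so something like this should show what the controller itself currently sees (assuming those resources exist on your install):
kubectl get pcidevices | grep -i nvidia
kubectl get pcideviceclaims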
Thanks @lively-zebra-61132 for the suggestion about the device plugin. Since this plugin is not mentioned in the documentation regarding PCI passthrough, I am wondering whether Harvester has its own device plugin and whether I might mess something up by running the plugin from Nvidia. For the past few hours I have tried disabling passthrough and enabling it again for the GPUs, but I still see 0 devices under Capacity and under Allocatable. I assume that Harvester is not fully aware of these devices after the reboot. Is there any way to force the discovery of these devices? @limited-breakfast-50094 Any ideas what I might be doing wrong? Thanks
We managed to solve the problem by disabling PCI passthrough on the GPUs and then deleting the pcidevices-controller pod on the host with the GPUs:
kubectl delete pod -n harvester-system harvester-pcidevices-controller-f29sr --grace-period=0 --force
A new pod was created by Harvester, and it now finds all 4 GPUs. Thanks all for the help
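In case anyone else runs into this: the pod name suffix will differ, so something along these lines should find the pcidevices-controller pod on the affected node and confirm the GPUs are advertised again afterwards:
kubectl get pods -n harvester-system -o wide | grep pcidevices-controller
kubectl describe node inog02 | grep -i nvidia.com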

limited-breakfast-50094

01/18/2023, 5:28 PM
Hi @full-crayon-745, my PR will fix this; it introduces our custom DevicePlugin that solves this issue. The rebooting issue and the out-of-date Allocatable counts happen because 1.1.0+ (pre-1.1.2) doesn't use DevicePlugins; we directly modify the KubeVirt config, which is awkward and unreliable in practice. The NVIDIA DevicePlugin can work, but it's not supported. This DevicePlugin PR will be supported and customized for Harvester's use cases; it allows more than just NVIDIA devices, for example.
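For the curious: the part that gets edited directly is the permittedHostDevices section of the KubeVirt CR. Something like this should show what is currently configured on an existing install:
kubectl get kubevirt -A -o yaml | grep -A 10 permittedHostDevices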

full-crayon-745

01/19/2023, 8:05 AM
Great to hear that. Thanks Tobi