l

limited-breakfast-50094

09/12/2022, 5:54 PM
Thread for Kubevirt hack (to avoid filling up the channel)
Posting here
f

future-address-23425

09/12/2022, 5:54 PM
Sure
l

limited-breakfast-50094

09/12/2022, 6:27 PM
Still getting my VM set up so it might be a little while
f

future-address-23425

09/12/2022, 6:32 PM
No problem! Let me know if this works for you, too. I need to check if the issue persists on KubeVirt 0.56.0 (current latest) and figure out why it doesn't affect minikube deployments. I will probably put together a PR on KubeVirt.
👍 1
l

limited-breakfast-50094

09/14/2022, 5:46 PM
Okay, I got my KubeVirt set up. The VM is getting assigned to the node with the GPU, but I've still got some config issues. I'm documenting everything and I'll let you know as soon as I try your workaround.
👌🏼 1
I booted up a VM, then shut it down. Then I added:
hostDevices:
  - deviceName: nvidia.com/GeForce1060
    name: gpu1
to the spec.domain.devices part of the VM's YAML config.
When I booted up I got this error: failed to setup container for group 14: No available IOMMU models
ohh, I wonder if it's the GPU/Audio thing
I passed through my VGA device and audio card, so I need to tell KubeVirt to allow the audio device to be passed through as well
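Editor's sketch: VFIO hands devices to a guest per IOMMU group, so it can help to see which PCI functions share the group named in the error (group 14 here). The snippet below is a minimal sketch that lists a group's members from the standard Linux sysfs tree. The function name and the `root` parameter are illustrative and not part of KubeVirt.

```python
from pathlib import Path

# Standard Linux sysfs location for IOMMU groups. The root is a parameter so
# the function can also be pointed at a test directory.
SYSFS_IOMMU_GROUPS = Path("/sys/kernel/iommu_groups")


def devices_in_group(group: int, root: Path = SYSFS_IOMMU_GROUPS) -> list:
    """Return the PCI addresses that share the given IOMMU group, sorted."""
    devices_dir = root / str(group) / "devices"
    if not devices_dir.is_dir():
        return []
    return sorted(entry.name for entry in devices_dir.iterdir())


if __name__ == "__main__":
    # Group 14 is the group named in the error message above.
    for address in devices_in_group(14):
        print(address)
```

If the GPU and its audio function show up in the same group, both generally have to be handed to the guest together, which matches the "GPU/Audio thing" suspected above.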
f

future-address-23425

09/14/2022, 7:10 PM
Do you use the NVIDIA KubeVirt device plugin?
l

limited-breakfast-50094

09/14/2022, 7:11 PM
no
I have externalResourceProvider set to false
f

future-address-23425

09/14/2022, 7:14 PM
Ok, that makes sense if your devices have a single bus address. That's not always the case: some FPGA boards, for example, expose their management and user PFs separately.
l

limited-breakfast-50094

09/14/2022, 7:18 PM
Yeah, I'll have to make some tutorials for certain hardware and add troubleshooting docs
Still getting the "No available IOMMU models" error. I don't have the vfio_iommu_type1 module loaded
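Editor's sketch: a quick way to confirm which of the VFIO modules mentioned in this thread are loaded is to scan `/proc/modules`. The helper below is a minimal sketch with illustrative names. It assumes the modules are built as loadable modules, because drivers compiled into the kernel do not appear in `/proc/modules`.

```python
from pathlib import Path

# Modules discussed in the thread. /proc/modules lists names with underscores,
# so vfio-pci appears as vfio_pci.
REQUIRED_MODULES = ("vfio_pci", "vfio_iommu_type1")


def missing_modules(proc_modules_text: str, required=REQUIRED_MODULES) -> list:
    """Return the required modules that do not appear in /proc/modules text."""
    loaded = {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }
    return [name for name in required if name not in loaded]


if __name__ == "__main__":
    path = Path("/proc/modules")
    text = path.read_text() if path.exists() else ""
    print("missing:", missing_modules(text))
```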
f

future-address-23425

09/14/2022, 7:27 PM
So, the hack worked right?
I'm not sure about that new issue, but I guess you can try just passing through something simple, e.g. the USB controller.
The vfio_iommu_type1 module should be loaded together with vfio-pci.
l

limited-breakfast-50094

09/14/2022, 7:30 PM
Yeah, although GPU passthrough is the thing all our customers want, so I want to work through a real GPU example before I go forward. After loading the vfio_iommu_type1 module, I have a different error (progress?):
failed to setup container for group 14: memory listener initialization failed: Region pc.ram: vfio_dma_map(0x55ead073e260, 0x100000000, 0x79c00000, 0x7f64e2200000) = -12 (Cannot allocate memory)')"
is that a capability problem?
another manifestation of the no capability for SYS_RESOURCE?
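Editor's sketch: one plausible reading of the `-12` (ENOMEM) result is that VFIO pins guest RAM for DMA and counts it against the process's locked-memory limit (`RLIMIT_MEMLOCK`), which an unprivileged process cannot raise past its hard limit without `CAP_SYS_RESOURCE`. The snippet below is a minimal sketch that pulls the mapping size out of the error line and compares it with the current limit. Treating the third argument as the size is an assumption based on the order QEMU prints, and the helper names are illustrative.

```python
import re
import resource
from typing import Optional

# Matches the four hex arguments of the vfio_dma_map(...) call in the error.
DMA_MAP_RE = re.compile(
    r"vfio_dma_map\((0x[0-9a-f]+), (0x[0-9a-f]+), (0x[0-9a-f]+), (0x[0-9a-f]+)\)"
)


def dma_map_size(error_line: str) -> Optional[int]:
    """Return the third argument, assumed to be the mapping size in bytes."""
    match = DMA_MAP_RE.search(error_line)
    return int(match.group(3), 16) if match else None


def fits_memlock(size: int, soft_limit: int) -> bool:
    """True if a mapping of `size` bytes fits under the locked-memory limit."""
    return soft_limit == resource.RLIM_INFINITY or size <= soft_limit


if __name__ == "__main__":
    line = "vfio_dma_map(0x55ead073e260, 0x100000000, 0x79c00000, 0x7f64e2200000) = -12"
    size = dma_map_size(line)
    soft, _hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    print(f"mapping: {size // 2**20} MiB, fits: {fits_memlock(size, soft)}")
```

Run inside the container that logs the error, this would show whether the limit there is smaller than the region QEMU is trying to map.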
f

future-address-23425

09/14/2022, 7:34 PM
Please check the virt-launcher pod: how many containers are in there? Is the "compute" container the one that logs this? Does it have the SYS_RESOURCE capability?
l

limited-breakfast-50094

09/14/2022, 7:36 PM
securityContext:
    capabilities:
      add:
        - NET_BIND_SERVICE
        - SYS_PTRACE
        - SYS_NICE
nope, can I just edit it live?
also it's not a privileged container
I tried editing the launch container and adding it but that didn't validate
I'll try your KubeVirt workaround
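Editor's sketch: the check asked for above can be scripted against the pod's JSON (for example the output of `kubectl get pod <name> -o json`, parsed into a dict). This is a minimal sketch. The container name `compute` comes from the thread, and the function names are illustrative.

```python
def added_capabilities(pod: dict, container_name: str = "compute") -> list:
    """Return securityContext.capabilities.add for the named container."""
    for container in pod.get("spec", {}).get("containers", []):
        if container.get("name") == container_name:
            ctx = container.get("securityContext") or {}
            caps = ctx.get("capabilities") or {}
            return list(caps.get("add") or [])
    return []


def has_capability(pod: dict, capability: str, container_name: str = "compute") -> bool:
    """True if the named container explicitly adds the given capability."""
    return capability in added_capabilities(pod, container_name)


if __name__ == "__main__":
    # Capabilities copied from the securityContext pasted above.
    pod = {
        "spec": {
            "containers": [
                {
                    "name": "compute",
                    "securityContext": {
                        "capabilities": {
                            "add": ["NET_BIND_SERVICE", "SYS_PTRACE", "SYS_NICE"]
                        }
                    },
                }
            ]
        }
    }
    print(has_capability(pod, "SYS_RESOURCE"))
```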
f

future-address-23425

09/14/2022, 7:40 PM
Did you deploy my yaml?
It will automatically add this capability.
l

limited-breakfast-50094

09/14/2022, 7:40 PM
Doing that now
f

future-address-23425

09/14/2022, 7:41 PM
To every virt-launcher pod.
Sure, after that simply restart the vm.
l

limited-breakfast-50094

09/14/2022, 7:44 PM
Hey that worked!
capabilities:
      add:
        - NET_BIND_SERVICE
        - SYS_PTRACE
        - SYS_NICE
        - SYS_RESOURCE
Pod is running
VM is running, I mean
f

future-address-23425

09/14/2022, 7:45 PM
The "correct" way to do it is to add a condition in the KubeVirt renderer that checks whether the VMI contains host devices and, if so, appends that capability to the pod.
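Editor's sketch: the condition described here can be illustrated in a few lines. KubeVirt's renderer is written in Go, so the Python below is not KubeVirt code. It only shows the logic under stated assumptions: the VMI is a plain dict, host devices live under `spec.domain.devices.hostDevices` as in the YAML earlier in the thread, and the base capability list is the one pasted above.

```python
# Capabilities the compute container already had in this thread.
BASE_CAPABILITIES = ["NET_BIND_SERVICE", "SYS_PTRACE", "SYS_NICE"]


def vmi_has_host_devices(vmi: dict) -> bool:
    """True if the VMI spec declares at least one host device."""
    domain = vmi.get("spec", {}).get("domain", {}) or {}
    devices = domain.get("devices") or {}
    return bool(devices.get("hostDevices"))


def compute_capabilities(vmi: dict) -> list:
    """Capabilities for the compute container, with SYS_RESOURCE added
    only when the VMI requests host devices."""
    caps = list(BASE_CAPABILITIES)
    if vmi_has_host_devices(vmi):
        caps.append("SYS_RESOURCE")
    return caps
```

Gating on the VMI spec keeps the extra capability away from VMs that do not use passthrough, which is the main advantage over patching every virt-launcher pod.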
l

limited-breakfast-50094

09/14/2022, 7:45 PM
I can do that, I'll go in and see if I can see the PCI device
f

future-address-23425

09/14/2022, 7:46 PM
It's gonna be there 😎 !
l

limited-breakfast-50094

09/14/2022, 7:46 PM
Good news:
tobi@pcitest4:~$ diff before.txt after.txt 
9a10,11
> 00:02.7 PCI bridge [0604]: Red Hat, Inc. QEMU PCIe Root port [1b36:000c]
> 00:03.0 PCI bridge [0604]: Red Hat, Inc. QEMU PCIe Root port [1b36:000c]
18c20,22
< 06:00.0 Unclassified device [00ff]: Red Hat, Inc. Virtio memory balloon [1af4:1045] (rev 01)
---
> 06:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP106 [GeForce GTX 1060 3GB] [10de:1c02] (rev a1)
> 07:00.0 Audio device [0403]: NVIDIA Corporation GP106 High Definition Audio Controller [10de:10f1] (rev a1)
> 08:00.0 Unclassified device [00ff]: Red Hat, Inc. Virtio memory balloon [1af4:1045] (rev 01)
I can see my NVIDIA device in the VM!
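Editor's sketch: the same check can be automated by filtering `lspci -nn` output on the PCI vendor ID (`10de` is the vendor ID shown on the NVIDIA lines above). This is a minimal sketch. The regular expression is written against the line format in the paste and may need adjusting for other `lspci` options.

```python
import re

# Matches `lspci -nn` lines such as:
# 06:00.0 VGA compatible controller [0300]: NVIDIA ... [10de:1c02] (rev a1)
LSPCI_RE = re.compile(
    r"^(?P<address>\S+) (?P<cls>.+?) \[(?P<class_id>[0-9a-f]{4})\]: "
    r"(?P<name>.+) \[(?P<vendor>[0-9a-f]{4}):(?P<device>[0-9a-f]{4})\]"
)


def devices_by_vendor(text: str, vendor: str = "10de") -> list:
    """Return (address, name) pairs for devices with the given vendor ID.

    Leading diff markers ("> " or "< ") are stripped, so the function also
    accepts the output of `diff before.txt after.txt`.
    """
    found = []
    for line in text.splitlines():
        match = LSPCI_RE.match(line.lstrip("<> "))
        if match and match.group("vendor") == vendor:
            found.append((match.group("address"), match.group("name")))
    return found
```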
f

future-address-23425

09/14/2022, 7:46 PM
🎉
l

limited-breakfast-50094

09/14/2022, 7:46 PM
Okay, driver time
You rock elias!
:inaccel: 1
Hey, we could add the capability using the UI too
f

future-address-23425

09/14/2022, 9:21 PM
Unfortunately there is no way to add it through the VMI, which is the smallest KubeVirt component that we can manipulate. That's why I had to build a mutator to do this in the rendered pod, under the hood.
I think the best way to do this is to resolve it upstream on KubeVirt.
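Editor's sketch: the mutator itself is not shown in this thread, so the following is only a guess at the general shape of such a workaround, not the actual implementation. It builds the JSONPatch a Kubernetes mutating admission webhook could return to add `SYS_RESOURCE` to the `compute` container. The function names are illustrative, and scoping the webhook to virt-launcher pods is left out.

```python
import base64
import json

CAP = "SYS_RESOURCE"


def sys_resource_patch(pod: dict, container_name: str = "compute") -> list:
    """Build JSONPatch operations that add SYS_RESOURCE to the container."""
    containers = pod.get("spec", {}).get("containers", [])
    for index, container in enumerate(containers):
        if container.get("name") != container_name:
            continue
        base = f"/spec/containers/{index}/securityContext"
        ctx = container.get("securityContext")
        if ctx is None:
            value = {"capabilities": {"add": [CAP]}}
            return [{"op": "add", "path": base, "value": value}]
        caps = ctx.get("capabilities")
        if caps is None:
            return [{"op": "add", "path": f"{base}/capabilities", "value": {"add": [CAP]}}]
        added = caps.get("add")
        if added is None:
            return [{"op": "add", "path": f"{base}/capabilities/add", "value": [CAP]}]
        if CAP in added:
            return []  # Nothing to do: the capability is already present.
        # "-" appends to the end of an existing JSON array.
        return [{"op": "add", "path": f"{base}/capabilities/add/-", "value": CAP}]
    return []


def admission_response(uid: str, pod: dict) -> dict:
    """Wrap the patch in an admission.k8s.io/v1 AdmissionReview response."""
    response = {"uid": uid, "allowed": True}
    patch = sys_resource_patch(pod)
    if patch:
        response["patchType"] = "JSONPatch"
        response["patch"] = base64.b64encode(json.dumps(patch).encode()).decode()
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }
```

Because the patch targets the rendered pod rather than the VMI, it has to be applied at admission time, which fits the point above that the VMI offers no field for this.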
l

limited-breakfast-50094

09/15/2022, 2:28 AM
that screenshot was there from earlier, my slack crashed
I was thinking if we could edit the VMI then we might be able to mutate it from the UI
it's just changing the status though, right?
f

future-address-23425

09/15/2022, 7:40 AM
We can't do it on the VMI. I think we should first try bumping KubeVirt (v0.57.0). It may use a newer libvirt version that doesn't need that CAP. This has happened before.
Otherwise, if you could help me raise an issue at KubeVirt first, I have almost prepared the PR that fixes it!
l

limited-breakfast-50094

09/15/2022, 8:57 PM
Oh nice! How can I help?
KubeVirt 0.57 changelog doesn't mention libvirt
f

future-address-23425

09/15/2022, 9:48 PM
I think both v0.54.x and v0.57.x use libvirt v8.0.0, so they should probably behave the same, but let me check this first. I'll keep you posted.
l

limited-breakfast-50094

09/15/2022, 9:50 PM
I just looked through a bunch of 0.57 PRs and found no hint of a version bump in libvirt
f

future-address-23425

09/15/2022, 9:53 PM
What I don't get is why PCI passthrough works without the hack when I deploy KubeVirt on minikube.
l

limited-breakfast-50094

09/15/2022, 9:58 PM
what does the minikube pods' securityContext look like?
f

future-address-23425

09/15/2022, 10:00 PM
It's identical (without the CAP_SYS_RESOURCE)!
⁉️ 1
l

limited-breakfast-50094

09/15/2022, 10:31 PM
does minikube use Docker for its container engine?
f

future-address-23425

09/15/2022, 10:32 PM
yes, that's a core difference
l

limited-breakfast-50094

09/15/2022, 10:32 PM
Maybe there's some convenience setting that is giving the process that capability?
f

future-address-23425

09/15/2022, 10:33 PM
if it does, it does it under the hood, it escalates somehow
FYI: I tried KubeVirt v0.49.0 both on k3s and rke2, and PCI passthrough works normally (without the SYS_RESOURCE capability hack). I'm now looking for OS differences (apparmor, etc).
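Editor's sketch: since the pod specs look identical, one way to compare what the process actually ends up with on each platform is to decode the effective capability mask from `/proc/<pid>/status`. The snippet below is a minimal sketch. Bit 24 is `CAP_SYS_RESOURCE` in the Linux capability numbering, and the function names are illustrative.

```python
from pathlib import Path
from typing import Optional

CAP_SYS_RESOURCE = 24  # Bit number from <linux/capability.h>.


def cap_eff_from_status(status_text: str) -> Optional[int]:
    """Parse the CapEff hex mask from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    return None


def has_cap(mask: int, bit: int) -> bool:
    """True if the given capability bit is set in the mask."""
    return bool((mask >> bit) & 1)


if __name__ == "__main__":
    # Inspects the current process. Point this at the QEMU process's
    # /proc/<pid>/status on each platform to compare them.
    path = Path("/proc/self/status")
    mask = cap_eff_from_status(path.read_text()) if path.exists() else None
    if mask is not None:
        print("CAP_SYS_RESOURCE effective:", has_cap(mask, CAP_SYS_RESOURCE))
```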
l

limited-breakfast-50094

09/16/2022, 4:21 PM
Could it be something in SLES?
it's the base OS image in Harvester