# harvester
s
Hello! I’ve run into an issue with PCIPassthrough. I’m running Harvester 1.4.1, and I’m trying to enable PCIPassthrough on my Nvidia 3070. I’ve enabled VT-d, IOMMU, and SR-IOV in my BIOS. I enabled the PCIPassthrough add-on, and I can see the video and audio devices for my 3070 in the PCI devices UI; however, when I enable both devices, they hang on “In Progress”. I’ve read a number of posts online and in this channel suggesting I disable the PCIPassthrough add-on and restart the host. I’ve done that, to no avail. I’ve also tried deleting the PCIPassthrough CRDs to get a fresh start, but still nothing. Are there any other troubleshooting steps I should take?
I’m getting the following log in the harvester-pcidevices-controller:
```
level=error msg="error syncing 'node-name-000001000': handler PCIDeviceClaimReconcile: Cannot find PCIDevice that owns node-name-000001000, requeuing
```
Verified IOMMU:
```
IOMMU Group 13:
	01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA104 [GeForce RTX 3070] [10de:2484] (rev a1)
	01:00.1 Audio device [0403]: NVIDIA Corporation GA104 High Definition Audio Controller [10de:228b] (rev a1)
```
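A listing in this format can be produced by walking `/sys/kernel/iommu_groups` on the host. The sketch below is one way to do it, assuming a standard Linux sysfs layout and `lspci` from pciutils; `IOMMU_ROOT` and the function name are hypothetical, added only so the function can be pointed at a test directory.

```shell
#!/usr/bin/env bash
# Print each IOMMU group and the PCI devices in it, in the
# "IOMMU Group N:" + tab-indented `lspci -nns` format shown above.
# IOMMU_ROOT is a hypothetical override (defaults to the real sysfs path)
# so the function can be pointed at a fake tree for testing.
list_iommu_groups() {
  local root="${IOMMU_ROOT:-/sys/kernel/iommu_groups}"
  local group dev
  # Group directories are named by number; sort -n keeps 2 before 10.
  for group in $(ls "$root" 2>/dev/null | sort -n); do
    echo "IOMMU Group ${group}:"
    for dev in "$root/$group"/devices/*; do
      [ -e "$dev" ] || continue   # empty group: the glob did not match
      if command -v lspci >/dev/null 2>&1; then
        # -nn adds [class] and [vendor:device] IDs; -s selects one device.
        printf '\t%s\n' "$(lspci -nns "${dev##*/}")"
      else
        # Fallback when pciutils is missing: print the bare PCI address.
        printf '\t%s\n' "${dev##*/}"
      fi
    done
  done
}

list_iommu_groups
```

Both functions of the card (01:00.0 and 01:00.1) should appear alone in their group, as they do in group 13 here; any extra device in the same group would also have to be passed through.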
I’m also seeing the IOMMU group in the PCIDevice:
```yaml
apiVersion: devices.harvesterhci.io/v1beta1
kind: PCIDevice
metadata:
  annotations:
    harvesterhci.io/pcideviceDriver: ""
  creationTimestamp: "2025-03-03T22:48:26Z"
  generation: 1
  labels:
    nodename: node-name
  name: node-name-000001000
  resourceVersion: "105645074"
  uid: 566e234a-3a31-41e7-b159-56ea3327d194
spec: {}
status:
  address: "0000:01:00.0"
  classId: "0300"
  description: 'VGA compatible controller: NVIDIA Corporation GA104 [GeForce RTX 3070]'
  deviceId: "2484"
  iommuGroup: "13"
  nodeName: node-name
  resourceName: nvidia.com/GA104_GEFORCE_RTX_3070
  vendorId: 10de
```
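One extra data point that is not in the thread, but may help when a device hangs on “In Progress”, is which kernel driver the host currently has bound to each function of the card. As far as we understand, Harvester rebinds a device to `vfio-pci` once passthrough is enabled, so a function still on another driver (or on none) would be a clue. A minimal sketch reading sysfs, assuming a standard Linux layout; `PCI_ROOT` and `show_bound_driver` are hypothetical names used only for illustration and testing.

```shell
#!/usr/bin/env bash
# Print which kernel driver is currently bound to each PCI function.
# Addresses: 0000:01:00.0 is the VGA function from the PCIDevice status,
# 0000:01:00.1 is the audio function from IOMMU group 13.
# PCI_ROOT is a hypothetical override for testing; it defaults to sysfs.
show_bound_driver() {
  local root="${PCI_ROOT:-/sys/bus/pci/devices}"
  local addr link
  for addr in "$@"; do
    link="$root/$addr/driver"
    if [ -L "$link" ]; then
      # The driver entry is a symlink into /sys/bus/pci/drivers/<name>.
      echo "$addr: $(basename "$(readlink "$link")")"
    else
      echo "$addr: (no driver bound)"
    fi
  done
}

show_bound_driver 0000:01:00.0 0000:01:00.1
```

`lspci -nnk -s 01:00.0` reports the same information in its “Kernel driver in use” line.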
```
dmesg | grep -i IOMMU
[    0.000000] Command line: BOOT_IMAGE=(loop0)/boot/vmlinuz console=tty1 root=LABEL=COS_STATE cos-img/filename=/cOS/active.img panic=0 net.ifnames=1 rd.cos.oemlabel=COS_OEM rd.cos.mount=LABEL=COS_OEM:/oem rd.cos.mount=LABEL=COS_PERSISTENT:/usr/local rd.cos.oemtimeout=120 audit=1 audit_backlog_limit=8192 intel_iommu=on amd_iommu=on iommu=pt multipath=off rd.emergency=reboot rd.shell=0 panic=5 systemd.crash_reboot systemd.crash_shell=0
[    0.018081] Kernel command line: BOOT_IMAGE=(loop0)/boot/vmlinuz console=tty1 root=LABEL=COS_STATE cos-img/filename=/cOS/active.img panic=0 net.ifnames=1 rd.cos.oemlabel=COS_OEM rd.cos.mount=LABEL=COS_OEM:/oem rd.cos.mount=LABEL=COS_PERSISTENT:/usr/local rd.cos.oemtimeout=120 audit=1 audit_backlog_limit=8192 intel_iommu=on amd_iommu=on iommu=pt multipath=off rd.emergency=reboot rd.shell=0 panic=5 systemd.crash_reboot systemd.crash_shell=0
[    0.018209] DMAR: IOMMU enabled
[    0.075568] DMAR-IR: IOAPIC id 2 under DRHD base  0xfed91000 IOMMU 1
[    0.394344] pci 0000:00:02.0: DMAR: Skip IOMMU disabling for graphics
[    0.412698] iommu: Default domain type: Passthrough (set via kernel command line)
[    0.454269] DMAR: IOMMU feature fl1gp_support inconsistent
[    0.454269] DMAR: IOMMU feature pgsel_inv inconsistent
[    0.454271] DMAR: IOMMU feature nwfs inconsistent
[    0.454272] DMAR: IOMMU feature dit inconsistent
[    0.454273] DMAR: IOMMU feature sc_support inconsistent
[    0.454274] DMAR: IOMMU feature dev_iotlb_support inconsistent
[    0.454526] pci 0000:00:00.0: Adding to iommu group 0
[    0.454534] pci 0000:00:01.0: Adding to iommu group 1
[    0.454539] pci 0000:00:02.0: Adding to iommu group 2
[    0.454546] pci 0000:00:06.0: Adding to iommu group 3
[    0.454554] pci 0000:00:14.0: Adding to iommu group 4
[    0.454559] pci 0000:00:14.2: Adding to iommu group 4
[    0.454564] pci 0000:00:14.3: Adding to iommu group 5
[    0.454571] pci 0000:00:15.0: Adding to iommu group 6
[    0.454577] pci 0000:00:16.0: Adding to iommu group 7
[    0.454582] pci 0000:00:17.0: Adding to iommu group 8
[    0.454597] pci 0000:00:1c.0: Adding to iommu group 9
[    0.454604] pci 0000:00:1c.7: Adding to iommu group 10
[    0.454611] pci 0000:00:1d.0: Adding to iommu group 11
[    0.454622] pci 0000:00:1f.0: Adding to iommu group 12
[    0.454628] pci 0000:00:1f.3: Adding to iommu group 12
[    0.454633] pci 0000:00:1f.4: Adding to iommu group 12
[    0.454638] pci 0000:00:1f.5: Adding to iommu group 12
[    0.454647] pci 0000:01:00.0: Adding to iommu group 13
[    0.454653] pci 0000:01:00.1: Adding to iommu group 13
[    0.454660] pci 0000:02:00.0: Adding to iommu group 14
[    0.454673] pci 0000:04:00.0: Adding to iommu group 15
[    0.454683] pci 0000:05:00.0: Adding to iommu group 16
[    0.454693] pci 0000:05:00.1: Adding to iommu group 17
```
I saw `[    0.394344] pci 0000:00:02.0: DMAR: Skip IOMMU disabling for graphics`, but found out it applies to the Intel integrated graphics (11th gen), not the Nvidia 3070.
Hello! I’ve opened a bug for the PCI Passthrough issue I’m having. https://github.com/harvester/harvester/issues/7761
a
Hi @shy-restaurant-94821, I got PCI passthrough working with a different GPU model, and also ran into some issues. For sure: SR-IOV needs to be enabled in the BIOS, the PCI Passthrough add-on must be turned on in Harvester, and you need to "Enable" passthrough for the GPU device in Harvester. The Nvidia driver must not be installed in Harvester, though I'm not even sure you could install it, so that's most likely not the issue.
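To double-check the "Nvidia driver must not be installed" point on a Harvester host, one option (a suggestion, not something from the thread) is to look under `/sys/module`, where the kernel exposes a directory for each loaded module. A minimal sketch, assuming the usual module names shipped by NVIDIA's proprietary driver; `MODULE_ROOT` and `check_nvidia_modules` are hypothetical names added for illustration and testing.

```shell
#!/usr/bin/env bash
# Report whether any NVIDIA kernel modules are loaded on this host.
# Returns 0 if none are loaded, 1 if at least one is.
# MODULE_ROOT is a hypothetical override for testing; defaults to /sys/module.
check_nvidia_modules() {
  local root="${MODULE_ROOT:-/sys/module}"
  local mod found=0
  for mod in nvidia nvidia_drm nvidia_modeset nvidia_uvm; do
    # A loaded module shows up as a directory named after it.
    if [ -d "$root/$mod" ]; then
      echo "loaded: $mod"
      found=1
    fi
  done
  if [ "$found" -eq 0 ]; then
    echo "no nvidia modules loaded"
  fi
  return "$found"
}

check_nvidia_modules
```

`lsmod | grep -i nvidia` gives a similar answer; the sysfs check just avoids parsing `lsmod` output.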
s
Sorry for the delayed update. I figured out what was going on, and it’s definitely a bug in Rancher. I found another bug report that outlined the same behavior I was seeing when trying to set up PCI passthrough. One critical detail: I was trying to enable passthrough via Rancher connected to Harvester. I tried again directly from Harvester, and it worked (although I had to restart the node). Now I’m running Ollama in Kubernetes with full GPU support!
I’m going to update my bug report and highlight this.