Rancher Users #harvester

# harvester

adamant-kite-43734

02/08/2025, 9:22 AM

This message was deleted.

wooden-area-49191

02/08/2025, 9:25 AM

Or do I install these in the VM?

thousands-advantage-10804

02/08/2025, 11:45 AM

What are you trying to do? Harvester itself doesn't really need graphics.

wooden-area-49191

02/08/2025, 1:33 PM

I’m trying to run LLM workloads in Harvester (vLLM, etc.).

thousands-advantage-10804

02/08/2025, 4:08 PM

I would use PCI passthrough and a VM:

https://youtu.be/RgW_uB6dOJ0

wooden-area-49191

02/08/2025, 5:47 PM

The problem is that the VM doesn’t start at all when I enable PCI passthrough for the AMD PCI devices.

thousands-advantage-10804

02/08/2025, 5:59 PM

Reboot the node. I have seen this before, where I needed to disable PCI passthrough, reboot, and re-enable it.

wooden-area-49191

02/08/2025, 6:06 PM

Thanks, I will try that.

thousands-advantage-10804

02/08/2025, 6:07 PM

You can also run kubectl describe on the VM’s pod (the virt-launcher pod) to see why it won’t schedule.
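A minimal Python sketch of that check, assuming kubectl is configured against the Harvester cluster and that KubeVirt labels the VM’s virt-launcher pod with vm.kubevirt.io/name=<vm>. The helper names are hypothetical; warning_events only pulls the Warning rows out of the Events section, which is usually where a scheduling failure (for example an "Insufficient" device resource) shows up.

```python
import subprocess


def describe_vm_pod(namespace: str, vm_name: str) -> str:
    """Run `kubectl describe pod` for the virt-launcher pod(s) of a VM.

    Assumption: virt-launcher pods carry the label vm.kubevirt.io/name=<vm>.
    """
    cmd = ["kubectl", "describe", "pod", "-n", namespace,
           "-l", f"vm.kubevirt.io/name={vm_name}"]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def warning_events(describe_output: str) -> list[str]:
    """Return "Reason: Message" for every Warning row in the Events section."""
    rows = []
    in_events = False
    for line in describe_output.splitlines():
        if line.startswith("Events:"):
            in_events = True
            continue
        if not in_events:
            continue
        stripped = line.strip()
        if stripped.startswith("Warning"):
            # Columns: Type, Reason, Age, From, Message (only Message may contain spaces)
            parts = stripped.split(None, 4)
            if len(parts) == 5:
                rows.append(f"{parts[1]}: {parts[4]}")
    return rows
```

For the VM in this thread that would be warning_events(describe_vm_pod("berget", "vllm")), using the namespace and name that appear in the virt-launcher log.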

happy-cat-90847

02/08/2025, 7:31 PM

@wooden-area-49191 make sure all devices in the IOMMU group are passed through. Check the PCI device IDs: for example, some graphics cards also have an audio device, and that also needs to be passed through along with the GPU.
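To see which devices share a group with the GPU, the IOMMU group membership can be read from sysfs on the node. A minimal sketch, assuming a Linux host with the IOMMU enabled; iommu_group_members is a hypothetical helper name, and the sysfs_root parameter exists only so it can be pointed at a fake tree for testing.

```python
import os


def iommu_group_members(pci_addr: str, sysfs_root: str = "/sys") -> list[str]:
    """List every PCI device that shares an IOMMU group with pci_addr.

    /sys/bus/pci/devices/<addr>/iommu_group is a symlink to
    /sys/kernel/iommu_groups/<N>, whose devices/ directory holds one
    entry per member device.
    """
    group_link = os.path.join(sysfs_root, "bus", "pci", "devices", pci_addr, "iommu_group")
    group_dir = os.path.realpath(group_link)  # resolve to .../kernel/iommu_groups/<N>
    return sorted(os.listdir(os.path.join(group_dir, "devices")))
```

Every address it returns (for example a GPU’s companion audio function) would need to be passed through together.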

wooden-area-49191

02/09/2025, 9:13 AM

I have an AMD MI300X, and the GPUs show up as processing accelerators. When I create a VM and enable passthrough for them, I get an error saying the condition isn’t met. After some investigation I added ‘modprobe amdgpu’ to 90_custom.yaml and restarted the node. After that the VM started, but it stopped before reaching the shell with an error about virtqemud.

wooden-area-49191

02/09/2025, 9:14 AM

I’m running Harvester 1.4.1.

happy-cat-90847

02/09/2025, 1:25 PM

Perhaps @great-bear-19718 can comment. I don’t think you need the driver on the host at all, since you are doing passthrough.
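For VFIO-based passthrough the host device is normally bound to vfio-pci rather than to the vendor driver, so one quick check is which driver currently owns each GPU, for example the two addresses that appear in the virt-launcher log further down (0000:65:00.0 and 0000:85:00.0). A hypothetical sketch that reads the driver symlink in sysfs; bound_driver and sysfs_root are illustrative names, not part of any Harvester tool.

```python
import os
from typing import Optional


def bound_driver(pci_addr: str, sysfs_root: str = "/sys") -> Optional[str]:
    """Return the kernel driver a PCI device is bound to, or None if unbound.

    When a driver has claimed the device, /sys/bus/pci/devices/<addr>/driver
    is a symlink into /sys/bus/pci/drivers/<name>.
    """
    link = os.path.join(sysfs_root, "bus", "pci", "devices", pci_addr, "driver")
    if not os.path.islink(link):
        return None
    return os.path.basename(os.readlink(link))
```

If this reports amdgpu rather than vfio-pci after the modprobe change, the host driver has claimed the device, which is worth comparing against what the passthrough setup expects.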

wooden-area-49191

02/10/2025, 9:58 AM

Hm, this is the error I get in the log now when I try to start it again:


{"component":"virt-launcher","level":"warning","msg":"MDEV_PCI_RESOURCE_AMD_COM_AQUA_VANJARAM_INSTINCT_MI300X not set for resource amd.com/AQUA_VANJARAM_INSTINCT_MI300X","pos":"addresspool.go:51","timestamp":"2025-02-10T09:56:18.435280Z"}
{"component":"virt-launcher","level":"warning","msg":"USB_RESOURCE_AMD_COM_AQUA_VANJARAM_INSTINCT_MI300X not set for resource amd.com/AQUA_VANJARAM_INSTINCT_MI300X","pos":"addresspool.go:51","timestamp":"2025-02-10T09:56:18.435369Z"}
{"component":"virt-launcher","level":"info","msg":"host-devices created: [0000:65:00.0, 0000:85:00.0]","pos":"hostdev.go:98","timestamp":"2025-02-10T09:56:18.435468Z"}
{"component":"virt-launcher","kind":"","level":"info","msg":"Synced vmi","name":"vllm","namespace":"berget","pos":"server.go:208","timestamp":"2025-02-10T09:56:18.437905Z","uid":"bb30a93c-5b72-4ca4-9abd-de74f2e693f2"}
{"component":"virt-launcher","level":"info","msg":"Process berget_vllm and pid 77 is a zombie, sending SIGCHLD to pid 1 to reap process","pos":"monitor.go:186","timestamp":"2025-02-10T09:56:19.387458Z"}
{"component":"virt-launcher","level":"info","msg":"Waiting on final notifications to be sent to virt-handler.","pos":"virt-launcher.go:281","timestamp":"2025-02-10T09:56:19.387537Z"}
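One reading of this log (an interpretation, not a confirmed diagnosis): the two warnings only say that the mediated-device (MDEV_PCI_RESOURCE_) and USB variants of the resource variable are unset, while the next line shows both PCI addresses were resolved into host devices; the VM process then exits as a zombie. The sketch below, with hypothetical helper names, pulls warnings and host-device addresses out of such JSON lines. resource_env_suffix reproduces the name mangling visible in the log itself (amd.com/... becomes AMD_COM_...); it is inferred from these lines, not taken from KubeVirt source.

```python
import json
import re


def resource_env_suffix(resource_name: str) -> str:
    """Mangle a device-plugin resource name the way the log lines show.

    amd.com/AQUA_VANJARAM_INSTINCT_MI300X -> AMD_COM_AQUA_VANJARAM_INSTINCT_MI300X
    """
    return re.sub(r"[./]", "_", resource_name).upper()


def summarize(log_text: str) -> dict:
    """Collect warning messages and created host devices from virt-launcher JSON lines."""
    summary = {"warnings": [], "host_devices": []}
    for raw in log_text.splitlines():
        line = raw.strip()
        if not line.startswith("{"):
            continue  # skip anything that is not a JSON log record
        entry = json.loads(line)
        msg = entry.get("msg", "")
        if entry.get("level") == "warning":
            summary["warnings"].append(msg)
        match = re.match(r"host-devices created: \[(.*)\]", msg)
        if match:
            summary["host_devices"] = [a.strip() for a in match.group(1).split(",") if a.strip()]
    return summary
```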

wooden-area-49191

02/18/2025, 9:57 AM

According to the AMD ROCm documentation, a kernel driver is needed on the physical host for it to work properly in VMs or pods. Is there a recommended way to modify the filesystem and install drivers even though it is read-only?

thousands-advantage-10804

02/18/2025, 12:14 PM

Is there any documentation for k8s? In the NVIDIA video I posted, I didn’t have to install anything on the Harvester node.

wooden-area-49191

02/28/2025, 2:19 PM

Here is the AMD operator I’m trying to install. Unfortunately, it needs a kernel driver installed in the operating system: https://github.com/ROCm/gpu-operator

great-bear-19718

03/03/2025, 9:51 PM

I do not have an AMD GPU, but ultimately you just need the AMD drivers installed. The current addon is written to handle the NVIDIA driver only.

great-bear-19718

03/03/2025, 9:52 PM

You will need to build an image from https://github.com/harvester/os2/blob/sle-micro/nvidia-driver-toolkit/entrypoint.sh and change the entrypoint.

great-bear-19718

03/03/2025, 9:52 PM

When you run the nvidia-driver addon with your custom image, it should load the driver.
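The suggestion amounts to building an image whose entrypoint loads the AMD driver instead of the NVIDIA one. The real entrypoint.sh in harvester/os2 is a shell script whose contents are not reproduced here; the Python below is only a hypothetical sketch of an idempotent "load the module unless it is already loaded" step, assuming the module is amdgpu (as in the earlier ‘modprobe amdgpu’ attempt) and that the image ships the module files and modprobe.

```python
import subprocess
from typing import Callable, Sequence


def loaded_modules(proc_modules_text: str) -> set[str]:
    """Module names from /proc/modules (first whitespace-separated field of each line)."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def _run_checked(cmd: Sequence[str]) -> None:
    """Default runner: execute the command and raise if it fails."""
    subprocess.run(list(cmd), check=True)


def ensure_module(name: str,
                  proc_modules_path: str = "/proc/modules",
                  run: Callable[[Sequence[str]], None] = _run_checked) -> bool:
    """Load a kernel module unless it is already present (hypothetical helper).

    Returns True if modprobe was invoked, False if the module was already loaded.
    """
    with open(proc_modules_path) as f:
        if name in loaded_modules(f.read()):
            return False
    run(["modprobe", name])
    return True
```

Checking /proc/modules first keeps the step safe to repeat on every container restart; the run parameter only exists so the modprobe call can be swapped out in tests.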
