# harvester
w
Or do I install these in the VM?
t
what are you trying to do? harvester itself doesn’t really need graphics
w
I’m trying to run LLM workloads in Harvester (vllm etc)
t
I would use PCI passthrough and a VM:

https://youtu.be/RgW_uB6dOJ0

w
The problem is that the vm doesn’t start at all when I enable pci passthrough for the AMD pci devices.
t
reboot the node. I have seen this before where I needed to disable PCI passthrough, reboot, and re-enable.
w
Thanks, I will try that.
t
you can also do a kubectl describe of the VM (pod) to see why it won’t schedule
h
@wooden-area-49191 make sure all devices in the IOMMU group are passed through. See the PCI device ID. For example, some graphics cards also have a sound device, and that needs to be passed along with the GPU.
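A minimal sketch of how the group check could be done on a Linux host, assuming Python is available wherever you run it and that the kernel exposes IOMMU groups under `/sys/kernel/iommu_groups`. The helper names `read_iommu_groups` and `group_members` are made up for illustration, and the PCI addresses at the bottom are placeholders to swap for your own devices.

```python
from pathlib import Path


def read_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group number to the sorted PCI addresses it contains."""
    groups = {}
    root_path = Path(root)
    if not root_path.is_dir():
        # IOMMU disabled or path not present: nothing to report.
        return groups
    for group_dir in root_path.iterdir():
        devices_dir = group_dir / "devices"
        if devices_dir.is_dir():
            groups[group_dir.name] = sorted(p.name for p in devices_dir.iterdir())
    return groups


def group_members(pci_address, groups):
    """Return all devices sharing a group with pci_address (empty list if none)."""
    for members in groups.values():
        if pci_address in members:
            return members
    return []


if __name__ == "__main__":
    groups = read_iommu_groups()
    # Placeholder addresses: replace with the devices you plan to pass through.
    for address in ("0000:65:00.0", "0000:85:00.0"):
        print(address, "->", group_members(address, groups))
```

If a device shows more than one member in its group, every member listed would need passthrough enabled as well.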
w
I have an AMD MI300X and the GPUs are visible as processing accelerators. When I create a VM and enable passthrough to these, I get an error saying the condition isn’t met. After some investigation I managed to add ‘modprobe amdgpu’ to the 90_custom.yaml and restarted the node. After that the VM started but stopped before reaching the shell with an error about virtqemud.
I’m running Harvester 1.4.1
h
Perhaps @great-bear-19718 can comment. I don’t think you need the driver locally at all anyway since you are doing a pass through.
w
Hm - this is the error I receive now in the log when I’m trying again to start it up
{"component":"virt-launcher","level":"warning","msg":"MDEV_PCI_RESOURCE_AMD_COM_AQUA_VANJARAM_INSTINCT_MI300X not set for resource amd.com/AQUA_VANJARAM_INSTINCT_MI300X","pos":"addresspool.go:51","timestamp":"2025-02-10T09:56:18.435280Z"}
{"component":"virt-launcher","level":"warning","msg":"USB_RESOURCE_AMD_COM_AQUA_VANJARAM_INSTINCT_MI300X not set for resource amd.com/AQUA_VANJARAM_INSTINCT_MI300X","pos":"addresspool.go:51","timestamp":"2025-02-10T09:56:18.435369Z"}
{"component":"virt-launcher","level":"info","msg":"host-devices created: [0000:65:00.0, 0000:85:00.0]","pos":"hostdev.go:98","timestamp":"2025-02-10T09:56:18.435468Z"}
{"component":"virt-launcher","kind":"","level":"info","msg":"Synced vmi","name":"vllm","namespace":"berget","pos":"server.go:208","timestamp":"2025-02-10T09:56:18.437905Z","uid":"bb30a93c-5b72-4ca4-9abd-de74f2e693f2"}
{"component":"virt-launcher","level":"info","msg":"Process berget_vllm and pid 77 is a zombie, sending SIGCHLD to pid 1 to reap process","pos":"monitor.go:186","timestamp":"2025-02-10T09:56:19.387458Z"}
{"component":"virt-launcher","level":"info","msg":"Waiting on final notifications to be sent to virt-handler.","pos":"virt-launcher.go:281","timestamp":"2025-02-10T09:56:19.387537Z"}
According to the AMD ROCm documentation, a kernel driver is needed on the physical host machine in order for it to work properly in a VM or in pods. Is there a recommended way to change the filesystem and install drivers even though it is read-only?
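For anyone reading along, here is a rough sketch of how virt-launcher output like the above could be grouped by level to spot the last messages before the process exits. It assumes one JSON object per line with `level` and `msg` fields, as in the pasted log. The function name `summarize_launcher_log` is hypothetical, and the sample lines are copied from the log above.

```python
import json


def summarize_launcher_log(lines):
    """Group log messages by their "level" field, skipping non-JSON lines."""
    by_level = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # plain-text line mixed into the log
        if not isinstance(entry, dict):
            continue
        level = entry.get("level", "unknown")
        by_level.setdefault(level, []).append(entry.get("msg", ""))
    return by_level


sample = [
    '{"component":"virt-launcher","level":"info","msg":"host-devices created: [0000:65:00.0, 0000:85:00.0]","pos":"hostdev.go:98","timestamp":"2025-02-10T09:56:18.435468Z"}',
    '{"component":"virt-launcher","level":"info","msg":"Process berget_vllm and pid 77 is a zombie, sending SIGCHLD to pid 1 to reap process","pos":"monitor.go:186","timestamp":"2025-02-10T09:56:19.387458Z"}',
]

if __name__ == "__main__":
    for level, messages in summarize_launcher_log(sample).items():
        print(level)
        for message in messages:
            print("  ", message)
```

The two warnings in the pasted log are about MDEV and USB resources not being set, while the info line shows both host devices being created, so the interesting part is what happens between that line and the zombie-process message.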
t
is there any documentation for k8s? In the video I posted about nvidia I didn’t have to install anything on the harvester node.
w
Here is the AMD operator that I’m trying to install. Unfortunately, it needs a kernel driver installed in the operating system: https://github.com/ROCm/gpu-operator
g
I do not have an AMD GPU, but in the end you just need the AMD drivers installed. The current addon is written to handle the NVIDIA driver only.
You will need to build an image from here https://github.com/harvester/os2/blob/sle-micro/nvidia-driver-toolkit/entrypoint.sh and change the entrypoint.
When you run the nvidia-driver addon with your custom image, it should load the driver.