adamant-kite-43734
11/10/2024, 10:40 PM
brave-rose-20023
11/13/2024, 10:30 PM
0/1 nodes are available: 1 Insufficient nvidia.com/GP104_HIGH_DEFINITION_AUDIO_CONTROLLER. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.
Solution:
The audio controllers for the P4000 GPUs were still enabled and couldn’t be disabled via the Harvester UI. Here are the steps and commands that worked for me, starting from scratch:
1. List Custom Resource Definitions (CRDs) for PCI/GPU devices
To identify relevant CRDs for PCI and GPU devices in Harvester:
bash
kubectl get crd | grep -i -e pci -e gpu -e nvidia
2. Locate Specific Audio Controllers
List and filter the pcidevices to find the exact resource names for the audio controllers (replace 0000:5e:00.1 and 0000:d9:00.1 with your PCI addresses):
bash
kubectl get pcidevices.devices.harvesterhci.io | grep -F -e "0000:5e:00.1" -e "0000:d9:00.1"
3. Inspect Each Audio Controller Resource
Describe each audio controller to confirm its details:
bash
kubectl describe pcidevices.devices.harvesterhci.io harvester01-00005e001
kubectl describe pcidevices.devices.harvesterhci.io harvester01-0000d9001
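The resource names in these commands look like the node name followed by the PCI address with the colons and dots removed. That pattern is inferred only from the two names in this thread, so treat the helper below as a rough sketch. `pci_resource_name` is a made-up function name, and `harvester01` is the node name from the commands above. Always check the result against the output of step 2.

```bash
# Hypothetical helper: build a pcidevices resource name from a node name and
# a PCI address, assuming the "<node>-<address without ':' and '.'>" pattern
# seen in this thread.
pci_resource_name() {
  # Strip the separators from the address, e.g. 0000:5e:00.1 -> 00005e001
  printf '%s-%s\n' "$1" "$(printf '%s' "$2" | tr -d ':.')"
}

pci_resource_name harvester01 0000:5e:00.1
pci_resource_name harvester01 0000:d9:00.1
```

With the addresses from this thread, the two calls print harvester01-00005e001 and harvester01-0000d9001, matching the names used in the describe commands.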
4. Unbind the Audio Controllers from the Driver
To release the devices, unbind them directly from the host’s driver:
bash
echo "0000:5e:00.1" | sudo tee /sys/bus/pci/devices/0000:5e:00.1/driver/unbind
echo "0000:d9:00.1" | sudo tee /sys/bus/pci/devices/0000:d9:00.1/driver/unbind
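To check whether a device has a driver bound before and after the unbind, you can look at the `driver` symlink in its sysfs directory, which exists only while a driver is attached. A minimal sketch, assuming the standard sysfs layout. `pci_driver_of` and the `SYSFS_PCI` override are names made up for this example and are not part of Harvester.

```bash
# Hypothetical helper: print the name of the driver bound to a PCI device,
# or "none" if no driver is bound. SYSFS_PCI can be overridden for a dry run;
# it defaults to the real sysfs path.
pci_driver_of() {
  link="${SYSFS_PCI:-/sys/bus/pci/devices}/$1/driver"
  if [ -L "$link" ]; then
    # The symlink target ends in the driver's name.
    basename "$(readlink "$link")"
  else
    echo "none"
  fi
}

pci_driver_of 0000:5e:00.1
pci_driver_of 0000:d9:00.1
```

If the unbind worked, both calls should report none afterwards.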
5. Force-Delete the Audio Controller Resources
After unbinding, forcefully delete the audio controller resources to ensure they don’t interfere:
bash
kubectl delete pcidevices.devices.harvesterhci.io harvester01-00005e001 --force --grace-period=0
kubectl delete pcidevices.devices.harvesterhci.io harvester01-0000d9001 --force --grace-period=0