Is there a Harvester approved way to add the drive...
# harvester
v
Is there a Harvester-approved way to add the drivers and configuration needed to run containers that require GPUs? I see the GPU in the Kubernetes configuration:
kubectl describe node fed-mp-aj-hv-c1-n1 | egrep -iE "nvidia.com|Resource|Allocatable|Capacity"
Capacity:
  nvidia.com/AD104GL_L4:                    0
Allocatable:
  nvidia.com/AD104GL_L4:                    0
Allocated resources:
  Resource                                 Requests     Limits
  nvidia.com/AD104GL_L4                    0            0
But the GPU is not available. We figured out how to enable the vGPU-KVM driver, but that doesn't seem to work for kubernetes containers. I have been told to follow these directions, but want to make sure there isn't a preferred way in Harvester to accomplish this. https://github.com/NVIDIA/k8s-device-plugin
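A minimal sketch for condensing that check, assuming the usual `kubectl describe node` layout in which each extended resource appears as an indented `name: count` line under Capacity and Allocatable. The function name `gpu_resources` is hypothetical and is not part of Harvester or kubectl.

```shell
# Hypothetical helper: read `kubectl describe node` output on stdin and
# print each nvidia.com/* extended resource with its advertised count.
# Only "name: count" lines match, so the "Allocated resources" row
# (which has no colon after the name) is skipped.
gpu_resources() {
  grep -E '^[[:space:]]+nvidia\.com/[^[:space:]]+:[[:space:]]+[0-9]+' \
    | awk '{ sub(/:$/, "", $1); print $1, $2 }' \
    | sort -u
}
```

Used as `kubectl describe node fed-mp-aj-hv-c1-n1 | gpu_resources`. A count of 0 suggests the resource name is registered on the node but no device is currently being offered to pods.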
t
Are you trying to use GPUs with pods on Harvester, or pass them through to a VM? Take a look at https://youtu.be/RgW_uB6dOJ0.
v
No GPU pass through. I figured Rancher was already there with a full Kubernetes stack, so I wasn't looking to load a VM with Kubernetes on top of the existing Harvester elements. Under the Rancher / Apps area I found a Helm chart for gpu-operator that looks like it should install the drivers and components needed. That doesn't work, though: it appears to read the os-version file and then go looking for this driver image, which I don't think exists:
Events:
  Type     Reason   Age                     From     Message
  ----     ------   ----                    ----     -------
  Normal   Pulling  55m (x183 over 16h)     kubelet  Pulling image "nvcr.io/nvidia/driver:570.124.06-sle-micro-rancher5.5"
  Warning  Failed   49m (x184 over 16h)     kubelet  Failed to pull image "nvcr.io/nvidia/driver:570.124.06-sle-micro-rancher5.5": rpc error: code = NotFound desc = failed to pull and unpack image "nvcr.io/nvidia/driver:570.124.06-sle-micro-rancher5.5": failed to resolve reference "nvcr.io/nvidia/driver:570.124.06-sle-micro-rancher5.5": nvcr.io/nvidia/driver:570.124.06-sle-micro-rancher5.5: not found
Haven't watched the video yet, but I will give it a look shortly to see if it gets me past this.
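A rough sketch of how a tag like the one in the events could come about, assuming the operator joins the driver version with the `ID` and `VERSION_ID` fields of the node's os-release data. That composition is inferred from the tag itself, not confirmed in this thread, and `driver_tag` is a hypothetical helper name.

```shell
# Hypothetical helper: build "<driver version>-<ID><VERSION_ID>" from an
# os-release style file. ASSUMPTION: this mirrors how the tag in the
# events was composed; it is not taken from the gpu-operator source.
driver_tag() {
  driver_version=$1
  os_release_file=${2:-/etc/os-release}
  # Source the file in a subshell so ID and VERSION_ID do not leak
  # into the calling shell.
  (
    . "$os_release_file"
    printf '%s-%s%s\n' "$driver_version" "$ID" "$VERSION_ID"
  )
}
```

Run on a node as `driver_tag 570.124.06`. If the resulting tag has not been published under nvcr.io/nvidia/driver, the pull fails with NotFound as in the events above.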
t
Loading things outside of Kubernetes is tricky on Harvester. It has to do with how it boots a new OS every time, not to mention that it doesn't have a full-featured Rancher installed on it. The operator will fail since it needs to write to the OS itself.
r
Hi @victorious-printer-90746 I assume you tried the nvidia-driver-toolkit add-on, but that didn’t suit your needs. Is that correct? https://docs.harvesterhci.io/v1.5/advanced/addons/nvidiadrivertoolkit