abundant-gpu-72225

11/08/2022, 11:29 PM
Does anyone have examples of creating resource requests for GPU memory? I am able to schedule on GPU nodes and run computation on the GPU, but with my current resource limit definitions I can still hit CUDA OOM, since pods can be scheduled onto nodes no matter how much GPU memory is actually available:
resources:
  limits:
    nvidia.com/gpu: 1
I would like to request/limit to a certain amount of gb, instead of a whole device.

creamy-pencil-82913

11/08/2022, 11:39 PM
this is more of an nvidia question than a kubernetes question… but no, it doesn't work that way
MIG stands for Multi-Instance-GPU. It is a mode of operation for future Nvidia GPUs that allows one to partition a GPU into a set of MIG devices, each of which appears to the software consuming them as a mini-GPU with a fixed partition of memory and a fixed partition of compute resources.
Assuming you’re using MIG-capable devices
if it’s a single-instance device, then you can’t partition it at all, and whatever’s using it gets to use it.
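For reference, a minimal sketch of what a request for a MIG partition might look like, assuming the NVIDIA device plugin is deployed with the mixed MIG strategy and the GPU has been partitioned into 1g.5gb instances (the resource name and profile here are illustrative and depend on your GPU model and plugin configuration):
resources:
  limits:
    # one 1g.5gb MIG slice instead of a whole GPU (profile name depends on your partitioning)
    nvidia.com/mig-1g.5gb: 1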

abundant-gpu-72225

11/08/2022, 11:42 PM
I see, thank you very much for the response

creamy-pencil-82913

11/08/2022, 11:44 PM
also, are you sure that you’re OOMing on GPU resources, and not OOMing on host memory? Are you setting traditional memory requests/limits in addition to requesting GPU?
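As a rough sketch, setting ordinary host memory/CPU requests and limits alongside the GPU request might look like this (values are placeholders, not recommendations):
resources:
  requests:
    memory: "8Gi"   # host RAM, not VRAM
    cpu: "2"
  limits:
    memory: "8Gi"
    nvidia.com/gpu: 1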

abundant-gpu-72225

11/08/2022, 11:46 PM
Yes, definitely running out of video memory. I set too large a batch size for inference on a model and got a CUDA error. I was just wondering if it was possible to request a certain amount of VRAM to avoid this.