# kubernetes
c
this is more of an nvidia question than a kubernetes question… but no, it doesn't work that way
MIG stands for Multi-Instance GPU. It is a mode of operation on newer NVIDIA GPUs (Ampere architecture and later, e.g. the A100) that allows one to partition a GPU into a set of MIG devices, each of which appears to the software consuming it as a mini-GPU with a fixed partition of memory and a fixed partition of compute resources.
Assuming you're using MIG-capable devices, you can partition them into MIG instances and schedule workloads against those. If it's a device without MIG support (a single-instance device), then you can't partition it at all, and whatever's using it gets the whole thing.
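for illustration (not from your cluster): if the NVIDIA k8s device plugin is running with the `mixed` MIG strategy, each MIG profile shows up as its own extended resource, and a pod can request a slice like this. the pod name, image, and profile here are placeholders for whatever your setup actually exposes:

```yaml
# sketch, assuming the NVIDIA k8s device plugin with migStrategy=mixed,
# which exposes each MIG profile as its own extended resource
apiVersion: v1
kind: Pod
metadata:
  name: mig-example                    # placeholder name
spec:
  containers:
  - name: inference
    image: my-inference-image:latest   # placeholder image
    resources:
      limits:
        nvidia.com/mig-1g.5gb: 1       # one 1g.5gb MIG instance (~5GB VRAM)
```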
👍 1
a
I see, thank you very much for the response
c
also, are you sure that you’re OOMing on GPU resources, and not OOMing on host memory? Are you setting traditional memory requests/limits in addition to requesting GPU?
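for reference, a minimal sketch of what that looks like (image and sizes are placeholders), with host-memory requests/limits set alongside the GPU request:

```yaml
# sketch: host RAM requests/limits next to a whole-GPU request;
# "memory" here is system memory, not VRAM
apiVersion: v1
kind: Pod
metadata:
  name: gpu-memory-example             # placeholder name
spec:
  containers:
  - name: inference
    image: my-inference-image:latest   # placeholder image
    resources:
      requests:
        memory: "8Gi"                  # host RAM request
      limits:
        memory: "8Gi"                  # host RAM limit (OOMKilled above this)
        nvidia.com/gpu: 1              # whole GPU via the NVIDIA device plugin
```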
a
Yes, definitely running out of video memory. I set too large a batch size for inference on some model and got a CUDA out-of-memory error. I was just wondering if it was possible to request a certain amount of VRAM to avoid this.