
full-crayon-745

02/27/2023, 10:49 AM
Hi guys, we have some customers who run mostly GPU-heavy tasks and would like to monitor the usage of each NVIDIA GPU. Any software/tool recommendations for monitoring VMs (Ubuntu Server) that use GPUs through PCI passthrough? I would like to avoid installing active agents (e.g. Zabbix active agent) in the VMs, if possible. Thanks

agreeable-oil-87482

02/27/2023, 11:40 AM
If you're using the NVIDIA GPU Operator in the downstream clusters, you can enable its Prometheus exporter to extract stats.

full-crayon-745

02/27/2023, 11:41 AM
Thanks. We are using the built-in PCI passthrough feature of Harvester. As far as I understand, that does not use the NVIDIA GPU Operator.

agreeable-oil-87482

02/27/2023, 11:42 AM
These are just bog-standard VMs and not k8s nodes?

full-crayon-745

02/27/2023, 5:03 PM
Hi David. These are VMs created in Harvester using the Create button on the Virtual Machines page. These VMs have one or more GPUs attached using the built-in PCI passthrough function of Harvester. Usually these VMs are running Ubuntu Server (20.04, 22.04). Hope this clarifies it.

great-bear-19718

02/28/2023, 9:57 PM
I assume you are not running k8s on these VMs?

full-crayon-745

03/01/2023, 7:14 AM
No k8s running on those VMs. Usually, it's just Ubuntu Server with some machine learning (ML) libraries (e.g. PyTorch, TensorFlow) running ML tasks on the GPUs.
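
For plain VMs like these, one possible agentless approach (a sketch under assumptions, not something confirmed in this thread) is to poll `nvidia-smi` over SSH from a monitoring host and parse its CSV output. The `--query-gpu` and `--format=csv,noheader,nounits` options are standard `nvidia-smi` flags; the host name, key-based SSH access, and the helper names `parse_gpu_csv` and `query_vm` are hypothetical and only for illustration. It also assumes the NVIDIA driver is installed inside the VM, which PCI passthrough requires anyway.

```python
import subprocess

# Standard nvidia-smi --query-gpu properties requested for each GPU.
FIELDS = ["index", "name", "utilization.gpu", "memory.used", "memory.total"]


def _to_int(value):
    """Convert a numeric field to int; return None for values like '[N/A]'."""
    try:
        return int(value)
    except ValueError:
        return None


def parse_gpu_csv(text):
    """Parse `nvidia-smi --format=csv,noheader,nounits` output into dicts."""
    gpus = []
    for line in text.strip().splitlines():
        values = [v.strip() for v in line.split(",")]
        if len(values) != len(FIELDS):
            continue  # skip lines that do not match the requested fields
        row = dict(zip(FIELDS, values))
        gpus.append({
            "index": _to_int(row["index"]),
            "name": row["name"],
            "utilization_pct": _to_int(row["utilization.gpu"]),
            "memory_used_mib": _to_int(row["memory.used"]),
            "memory_total_mib": _to_int(row["memory.total"]),
        })
    return gpus


def query_vm(host):
    """Run nvidia-smi on a VM over SSH (assumes key-based auth is set up)."""
    cmd = [
        "ssh", host, "nvidia-smi",
        "--query-gpu=" + ",".join(FIELDS),
        "--format=csv,noheader,nounits",
    ]
    result = subprocess.run(
        cmd, capture_output=True, text=True, check=True, timeout=30
    )
    return parse_gpu_csv(result.stdout)
```

Calling `query_vm("ubuntu@gpu-vm-01")` (a made-up host) from a cron job or an existing monitoring system's external check would return one dict per GPU, without a persistent agent in the VM. The trade-off is that each poll opens an SSH session, so this suits coarse intervals rather than per-second sampling.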