# harvester
r
Is the target pcidevice enabled and added to the VM? Do you have any screenshots or support bundle files we can investigate further?
r
Yes, it isn't a question of whether the pcidevice is enabled; both machines have it enabled, and we can run nvidia-smi in the VM and see the graphics card info. It's just that CUDA programs are unable to run on the machine without native graphics support. I have written a detailed description in the bundle.
r
I saw the issue description. Does the “native graph support from CPU” mean the CPU has an integrated graphics processing unit? Sorry I’m not familiar with the technical term.
r
yes
👌 1
I forgot I could just use the term "integrated graphics"
r
Could you show us the nvidia-smi output? And may I re-post your issue description here (it describes more details)?
p
Hi @rhythmic-painter-76998 So nvidia-smi in the VM returns the card, but the CUDA program can't use it? I have little experience with CUDA programming; do you see any errors when calling the functions that get or detect the cards?
r
Copy code
Fri Jul 28 08:34:50 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 530.30.02    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3090         On | 00000000:08:00.0 Off |                  N/A |
|  0%   50C    P8               19W / 420W|      1MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
I am not seeing any errors from the program. I was running the gpu-burn process (https://github.com/wilicc/gpu-burn). This code only runs on machine B (no integrated graphics + external graphics card) when I install Ubuntu natively on it, but it can NOT work when I create a VM from Harvester (with PCI devices enabled).
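(For reference, a quick way to sanity-check the passthrough from inside the VM is to confirm which kernel driver owns the card and whether the NVIDIA driver logged any errors at init. The 08:00.0 address comes from the nvidia-smi output above; these are generic commands, not something from this thread.)
Copy code
# Inside the guest VM: confirm the kernel driver bound to the passed-through GPU
lspci -nnk -s 08:00.0
# Look for NVIDIA driver (NVRM) initialization errors in the guest kernel log
sudo dmesg | grep -iE 'nvrm|nvidia'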
p
So does running this command return the card?
Copy code
gpu_burn -l
r
Copy code
docker run --rm --gpus all gpu_burn -l

==========
== CUDA ==
==========

CUDA Version 11.8.0

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
<https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license>

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
This command is fine, but when running the burn test, no process info is shown in nvidia-smi.
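(One way to narrow this down, assuming the gpu_burn image is built from the repo above and takes a run length in seconds as its argument: start a short, bounded burn and watch the card from a second shell to see whether any process or utilization ever appears.)
Copy code
# Terminal 1: run a short, bounded burn (30 seconds)
docker run --rm --gpus all gpu_burn 30
# Terminal 2: refresh nvidia-smi every second and watch for the process / utilization
watch -n 1 nvidia-smi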
p
It seems gpu_burn -l doesn't return any card either
r
Let me do a cross check. I have one that works and one that doesn't. 😞
Interesting, both show the same result with the -l option
and running the burn test, the process hangs
Copy code
docker run --rm --gpus all gpu_burn

==========
== CUDA ==
==========

CUDA Version 11.8.0

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
<https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license>

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

GPU 0: NVIDIA GeForce RTX 3090 (UUID: GPU-c3865714-8aa0-f435-8ef3-9cde617bcb7b)



^C^C^C
👀 1
I wonder whether the integrated graphics plays any role here
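(Since the burn hangs without printing anything, it might help to surface the actual CUDA error code. A minimal sketch, assuming the CUDA toolkit / nvcc is available in the VM or in a CUDA devel container; not something from this thread.)
Copy code
# Minimal CUDA init check: prints the runtime error instead of hanging
cat > /tmp/check_cuda.cu <<'EOF'
#include <cstdio>
#include <cuda_runtime.h>
int main() {
    int n = 0;
    cudaError_t err = cudaGetDeviceCount(&n);
    printf("cudaGetDeviceCount: %s, devices: %d\n", cudaGetErrorString(err), n);
    return err == cudaSuccess ? 0 : 1;
}
EOF
nvcc /tmp/check_cuda.cu -o /tmp/check_cuda && /tmp/check_cuda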
r
Was the problematic VM running when you generated the support bundle file? I only saw one VM running with PCIDevice attached on harvester01 🤔
r
Oh, let me upload another bundle. harvester01 and harvester02-5900x are two nodes joined in the same cluster, and harvester02-5900x is the one that produces the problematic VM.
g
I don't have this GPU to reproduce the issue, but I am wondering if it could be related to this? https://www.reddit.com/r/VFIO/comments/pbgsg4/solved_rtx_3090_gpu_passthrough_just_displays_a/
r
I also found this article; the device is not enabled.
g
Any chance we can get the nvidia-smi -q output?
And when you run it in Ubuntu, as you mentioned, do you pass it through to a VM or run it natively on the host?
r
No, for this particular machine I did two tests: 1. running Ubuntu natively, where CUDA works; 2. running it as a Harvester node, creating a VM and passing the GPU into it, where CUDA did not work.
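(One more data point that might help: on the Harvester node itself, check whether vfio-pci actually owns the card and whether the kernel logged any VFIO or IOMMU errors around VM start. These are generic commands, not Harvester-specific ones.)
Copy code
# On the Harvester host (harvester02-5900x): which driver owns the GPU?
lspci -nnk | grep -A3 -i nvidia
# Any VFIO / IOMMU complaints when the device was handed to the VM?
dmesg | grep -iE 'vfio|iommu|dmar'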
g
any chance of running it via kvm in ubuntu?
r
Both scenarios are running Ubuntu, same version
g
One is running native Ubuntu, the other is running in a VM
r
ya
g
it may not be the same test then
r
What do you mean? Scenario 1 is for testing whether the GPU is working properly.
g
I am trying to check if it's QEMU doing something.
So we need to check on host Ubuntu, where it works fine,
then on this host run KVM and pass the GPU through to an Ubuntu VM,
and compare the difference.
also what version of ubuntu did you use?
r
20.04
My hunch is that two graphics devices are required if you want to pass one into a VM.
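(A rough sketch of the comparison test suggested above: on the native Ubuntu install, bind the card to vfio-pci and boot a throwaway QEMU/KVM guest with it passed through, then run the same gpu_burn there. The PCI address, memory size and image path below are placeholders, not values from this thread.)
Copy code
# Boot a plain QEMU/KVM guest with the GPU passed through via vfio-pci
# (the card must be bound to vfio-pci on the host first)
sudo qemu-system-x86_64 \
  -enable-kvm -machine q35 -cpu host \
  -m 8G -smp 4 \
  -device vfio-pci,host=0000:08:00.0 \
  -drive file=ubuntu-20.04.qcow2,format=qcow2 \
  -nographic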
g
I can't say.
I am trying to arrange a 30-series GPU in our lab to see for myself.
Any chance you could please create an issue for us to track?
r
sure
g
I ran the same gpu_burn on a 3070 GPU, and I can see the CPU usage pinned on one core.
What happens when you run it natively on an Ubuntu host? What would be the load expectation?
r
The load would be on the GPU
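(For comparison, nvidia-smi has a lightweight monitoring mode; on the working native install the GPU utilization column would be expected to sit near 100% within a few seconds of starting the burn.)
Copy code
# Sample GPU power, utilization, clocks and memory once per second during the burn
nvidia-smi dmon -s pucm -d 1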
g
sure
r
Is it possible that machine B has a monitor plugged in, so the graphics card is unable to be “released”?
g
I tried checking that too; I was noticing CUDA was not picking up the GPU in the VM.
But this is a great find
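(On the monitor question above: a passed-through GPU that also drives the host's console/boot framebuffer is a commonly reported VFIO trouble spot, so it may be worth checking on harvester02-5900x whether efifb/simplefb grabbed the card or a BAR reservation failed. Generic commands, not from this thread.)
Copy code
# On harvester02-5900x: did the boot framebuffer claim the GPU, or did a BAR
# reservation fail when vfio / the guest tried to take the device?
dmesg | grep -iE 'efifb|simplefb|BAR .* reserve'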