full-crayon-745
01/17/2023, 11:06 AM
lively-zebra-61132
01/17/2023, 12:53 PM
full-crayon-745
01/17/2023, 1:41 PM
kubectl describe node inog02
Capacity:
...
nvidia.com/GA102_GEFORCE_RTX_3090: 1
nvidia.com/GA102_HIGH_DEFINITION_AUDIO_CONTROLLER: 4
...
Allocatable:
...
nvidia.com/GA102_GEFORCE_RTX_3090: 1
nvidia.com/GA102_HIGH_DEFINITION_AUDIO_CONTROLLER: 4
...
It seems the node is missing the other 3 GPUs.
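To make the counts easier to compare between runs, the describe output can be filtered down to just the nvidia.com/* lines. This is only a rough sketch: `gpu_resources` is a made-up helper name, not something kubectl or Harvester ships, and it assumes the resource lines look like the ones pasted above.

```shell
# Hypothetical helper (not part of kubectl or Harvester): reads
# `kubectl describe node` output on stdin and prints the first two
# fields of every line whose first field starts with nvidia.com/.
gpu_resources() {
  awk '$1 ~ /^nvidia\.com\// { print $1, $2 }'
}

# Usage (needs cluster access; inog02 is the node from this thread):
#   kubectl describe node inog02 | gpu_resources
```

The matches from Capacity and Allocatable come out in the order they appear, so each resource normally shows up at least twice.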
We restarted the GPU host, and the same command now returns:
Capacity:
...
nvidia.com/GA102_GEFORCE_RTX_3090: 0
nvidia.com/GA102_HIGH_DEFINITION_AUDIO_CONTROLLER: 0
...
Allocatable:
...
nvidia.com/GA102_GEFORCE_RTX_3090: 0
nvidia.com/GA102_HIGH_DEFINITION_AUDIO_CONTROLLER: 0
...
It looks like it is completely missing the GPUs.
kubectl delete pod -n harvester-system harvester-pcidevices-controller-f29sr --grace-period=0 --force
Harvester created a new pod, and it now finds all 4 GPUs.
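For anyone repeating this, the pod suffix (f29sr here) will differ on each cluster, so the name has to be looked up first. A minimal sketch, assuming the controller pods keep the harvester-pcidevices-controller- name prefix seen above; `pcidevices_pods` is a hypothetical helper name.

```shell
# Hypothetical helper: reads `kubectl get pods -o name` output on stdin
# (lines like "pod/<name>") and prints the names of pods that start
# with the harvester-pcidevices-controller- prefix.
pcidevices_pods() {
  awk -F/ '$2 ~ /^harvester-pcidevices-controller-/ { print $2 }'
}

# Usage (needs cluster access):
#   kubectl get pods -n harvester-system -o name | pcidevices_pods |
#     while read -r pod; do
#       kubectl delete pod -n harvester-system "$pod" --grace-period=0 --force
#     done
```

Note that the loop deletes every pod matching the prefix, so if more than one controller pod is running it restarts all of them, not only the one on the affected node.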
Thanks all for the help.
limited-breakfast-50094
01/18/2023, 5:28 PM
full-crayon-745
01/19/2023, 8:05 AM