adamant-kite-43734
05/31/2023, 4:45 PMcreamy-pencil-82913
05/31/2023, 5:06 PMcreamy-pencil-82913
05/31/2023, 5:06 PMcreamy-pencil-82913
05/31/2023, 5:07 PMimportant-tomato-46085
05/31/2023, 5:08 PMimportant-tomato-46085
05/31/2023, 10:51 PMimportant-tomato-46085
05/31/2023, 10:51 PMimportant-tomato-46085
05/31/2023, 10:52 PMimportant-tomato-46085
05/31/2023, 10:52 PM
/var/lib/kubelet/device-plugins
important-tomato-46085
05/31/2023, 10:52 PMimportant-tomato-46085
05/31/2023, 10:54 PMimportant-tomato-46085
05/31/2023, 10:55 PMcreamy-pencil-82913
05/31/2023, 11:37 PMcreamy-pencil-82913
05/31/2023, 11:38 PMimportant-tomato-46085
06/02/2023, 4:38 PMimportant-tomato-46085
06/02/2023, 4:38 PM
❯ k -n kube-system logs nvidia-device-plugin-sw9j5
I0602 16:37:22.243956 1 main.go:154] Starting FS watcher.
I0602 16:37:22.244038 1 main.go:161] Starting OS watcher.
I0602 16:37:22.244732 1 main.go:176] Starting Plugins.
I0602 16:37:22.244747 1 main.go:234] Loading configuration.
I0602 16:37:22.244857 1 main.go:242] Updating config with default resource matching patterns.
I0602 16:37:22.245032 1 main.go:253]
Running with config:
{
  "version": "v1",
  "flags": {
    "migStrategy": "none",
    "failOnInitError": true,
    "nvidiaDriverRoot": "/",
    "gdsEnabled": false,
    "mofedEnabled": false,
    "plugin": {
      "passDeviceSpecs": false,
      "deviceListStrategy": [
        "envvar"
      ],
      "deviceIDStrategy": "uuid",
      "cdiAnnotationPrefix": "cdi.k8s.io/",
      "nvidiaCTKPath": "/usr/bin/nvidia-ctk",
      "containerDriverRoot": "/driver-root"
    }
  },
  "resources": {
    "gpus": [
      {
        "pattern": "*",
        "name": "nvidia.com/gpu"
      }
    ]
  },
  "sharing": {
    "timeSlicing": {}
  }
}
I0602 16:37:22.245041 1 main.go:256] Retreiving plugins.
W0602 16:37:22.245351 1 factory.go:31] No valid resources detected, creating a null CDI handler
I0602 16:37:22.245407 1 factory.go:107] Detected non-NVML platform: could not load NVML library: libnvidia-ml.so.1: cannot open shared object file: No such file or directory
I0602 16:37:22.245436 1 factory.go:107] Detected non-Tegra platform: /sys/devices/soc0/family file not found
E0602 16:37:22.245444 1 factory.go:115] Incompatible platform detected
E0602 16:37:22.245449 1 factory.go:116] If this is a GPU node, did you configure the NVIDIA Container Toolkit?
E0602 16:37:22.245454 1 factory.go:117] You can check the prerequisites at: https://github.com/NVIDIA/k8s-device-plugin#prerequisites
E0602 16:37:22.245460 1 factory.go:118] You can learn how to set the runtime at: https://github.com/NVIDIA/k8s-device-plugin#quick-start
E0602 16:37:22.245466 1 factory.go:119] If this is not a GPU node, you should set up a toleration or nodeSelector to only deploy this plugin on GPU nodes
E0602 16:37:22.256648 1 main.go:123] error starting plugins: error creating plugin manager: unable to create plugin manager: platform detection failed
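The log above names two checks that both failed: loading `libnvidia-ml.so.1` and finding `/sys/devices/soc0/family`. As a rough way to reproduce those two checks by hand, here is a minimal Python sketch. It is not the plugin's own code (the plugin is written in Go), the function names `can_load_library` and `detect_platform` are made up for illustration, and the Tegra check is simplified to a file-existence test because that is all the log message mentions.

```python
import ctypes
import os


def can_load_library(name):
    """Try to load a shared library by name; return (ok, error_message)."""
    try:
        ctypes.CDLL(name)
        return True, ""
    except OSError as exc:
        # The loader's message, e.g. "cannot open shared object file"
        return False, str(exc)


def detect_platform(nvml_name="libnvidia-ml.so.1",
                    tegra_marker="/sys/devices/soc0/family"):
    """Report which of the two checks named in the log pass in this environment."""
    nvml_ok, nvml_err = can_load_library(nvml_name)
    return {
        "nvml": nvml_ok,
        "nvml_error": nvml_err,
        # Simplification: only checks that the file exists
        "tegra": os.path.exists(tegra_marker),
    }


if __name__ == "__main__":
    result = detect_platform()
    print(result)
    if not (result["nvml"] or result["tegra"]):
        print("neither check passed in this environment")
```

Running it once on the host and once inside a container started with the same runtime settings as the plugin pod (assuming Python is available there) shows whether the library is visible in each place. If it loads on the host but not in the container, that would be consistent with the log's hint about the NVIDIA Container Toolkit and runtime configuration.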
important-tomato-46085
06/02/2023, 4:38 PM
libnvidia-ml.so.1 is interesting, but I think that error refers to the container, not the host. The host has this library.
important-tomato-46085
06/02/2023, 4:39 PMimportant-tomato-46085
06/02/2023, 4:43 PMimportant-tomato-46085
06/02/2023, 4:56 PM