# k3s
a
b
yes, it's possible...
the node tags are included with https://github.com/NVIDIA/k8s-device-plugin
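once the plugin is running, a quick way to check is something like this (rough sketch; `<node-name>` is just a placeholder):
```bash
# The node should show nvidia.com labels and an allocatable nvidia.com/gpu resource
kubectl describe node <node-name> | grep -i nvidia
```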
make sure the GPU drivers are installed in the OS and the GPU is working.
check with nvidia-smi
make sure to install the nvidia RuntimeClass: https://docs.k3s.io/advanced
NVIDIA Container Runtime Support: K3s will automatically detect and configure the NVIDIA container runtime if it is present when K3s starts.
1. Install the nvidia-container package repository on the node by following the instructions at: https://nvidia.github.io/libnvidia-container/
2. Install the nvidia container runtime packages. For example:
apt install -y nvidia-container-runtime cuda-drivers-fabricmanager-515 nvidia-headless-515-server
3. Install K3s, or restart it if already installed:
curl -ksL get.k3s.io | sh -
4. Confirm that the nvidia container runtime has been found by k3s:
grep nvidia /var/lib/rancher/k3s/agent/etc/containerd/config.toml
check with sudo grep nvidia /var/lib/rancher/k3s/agent/etc/containerd/config.toml; you should see something like:
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes."nvidia"]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes."nvidia".options]
  BinaryName = "/usr/bin/nvidia-container-runtime"
with this you should already be able to run pods by specifying the runtime class:
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: nvidia
handler: nvidia
---
apiVersion: v1
kind: Pod
metadata:
  name: nbody-gpu-benchmark
  namespace: default
spec:
  restartPolicy: OnFailure
  runtimeClassName: nvidia
  containers:
  - name: cuda-container
    image: nvidiak8s/cuda-sample:nbody
    args: ["nbody", "-gpu", "-benchmark"]
    # resources:
    #   limits:
    #     nvidia.com/gpu: 1
    env:
    - name: NVIDIA_VISIBLE_DEVICES
      value: all
    - name: NVIDIA_DRIVER_CAPABILITIES
      value: all
if you want the nodes tagged, install the device plugin or the GPU operator
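a rough sketch of the device plugin install with Helm (repo and chart names as in the k8s-device-plugin README; the kube-system namespace is just my choice, check the current chart version):
```bash
# Sketch: deploy the NVIDIA device plugin via its Helm chart
helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
helm repo update
helm upgrade -i nvdp nvdp/nvidia-device-plugin -n kube-system
```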
for the gpu operator, edit the values for:
toolkit:
  env:
  - name: CONTAINERD_CONFIG
    value: /var/lib/rancher/k3s/agent/etc/containerd/config.toml
  - name: CONTAINERD_SOCKET
    value: /run/k3s/containerd/containerd.sock
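for example with Helm, something like this (chart and repo from NVIDIA; the release name and namespace are just examples, and the toolkit.env values are the ones above):
```bash
# Sketch: install the GPU operator on k3s, pointing the toolkit at k3s' containerd
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
helm install gpu-operator nvidia/gpu-operator -n gpu-operator --create-namespace \
  --set 'toolkit.env[0].name=CONTAINERD_CONFIG' \
  --set 'toolkit.env[0].value=/var/lib/rancher/k3s/agent/etc/containerd/config.toml' \
  --set 'toolkit.env[1].name=CONTAINERD_SOCKET' \
  --set 'toolkit.env[1].value=/run/k3s/containerd/containerd.sock'
```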
I hope this helps you
a
Yes, the nvidia container runtime is working. With docker run --rm --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi I got my GPU detected.
I did follow the k3s "advanced options" documentation.
But gpu-feature-discovery doesn't set the labels.
this is the only nvidia label
and this is the log of the gpu-feature-discovery pod. Looks like it did not find the GPU.
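for reference, this is roughly how I pulled them (pod name and namespace are placeholders):
```bash
# List node labels and find the discovery/device-plugin pods, then grab the logs
kubectl get nodes --show-labels | tr ',' '\n' | grep -i nvidia
kubectl get pods -A | grep -i -e gpu-feature-discovery -e device-plugin
kubectl logs -n <namespace> <gpu-feature-discovery-pod>
```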
b
this is the container that tags the node, not the feature discovery...
Let me see the logs
a
Some details about my procedure: I am following the "NVIDIA Container Runtime Support" procedure on k3s. I did steps 1 and 2 (install the container runtime packages); nvidia-smi is detecting my GPU. After this I did step 3, an install of k3s.
This is the config.toml of containerd:
This is the log of the nvidia-device-plugin-daemonset POD.
Looks like it found the GPU; in a previous image it did not find it. In the message it asks about the nvidia-container-toolkit, but it is installed. If it were not installed, containerd would not have been updated, right? And as I said, the command:
docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
runs fine.
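to double-check the toolkit binaries I ran roughly this (just a sanity check; paths may differ):
```bash
# Verify the NVIDIA container toolkit binaries are installed on the node
which nvidia-container-runtime nvidia-container-toolkit nvidia-ctk
nvidia-ctk --version 2>/dev/null || echo "nvidia-ctk not found"
```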
b
apiVersion: v1
kind: Pod
metadata:
  name: nbody-gpu-benchmark
  namespace: default
spec:
  restartPolicy: OnFailure
  runtimeClassName: nvidia
  containers:
  - name: cuda-container
    image: nvidiak8s/cuda-sample:nbody
    args: ["nbody", "-gpu", "-benchmark"]
    env:
    - name: NVIDIA_VISIBLE_DEVICES
      value: all
    - name: NVIDIA_DRIVER_CAPABILITIES
      value: all
Could you apply this manifest and check the logs of the pod? Did it work?
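something like this (the filename is just an example, assuming you saved the manifest above as nbody-gpu-benchmark.yaml):
```bash
kubectl apply -f nbody-gpu-benchmark.yaml
kubectl get pod nbody-gpu-benchmark -w   # wait for it to reach Completed
kubectl logs nbody-gpu-benchmark         # should show the nbody benchmark output
```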
if it works, GPU on K3s is working in your cluster, and we have to check why nvidia-device-plugin is not tagging your nodes (meanwhile you can work by specifying runtimeClassName as in the example pod, assigning the complete GPU to the pods requiring it...)... I think the reason is that your path is not correct (based on the logs you attached).
a
"meanwhile you can work by specifying runtimeClassName as in the example pod, assigning the complete GPU to the pods requiring it." In this case, can I have multiple pods using the GPU?
b
yes, you could, but if one consumes the full GPU, none of them will work
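once the device plugin is advertising nvidia.com/gpu, the cleaner option is to request the GPU explicitly so the scheduler enforces exclusive assignment; a rough sketch (the pod name is just an example, same sample image as above):
```bash
# Sketch: same benchmark pod, but requesting the GPU through the device plugin
# (only works once nvidia-device-plugin is running and advertising nvidia.com/gpu)
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: nbody-gpu-limit
spec:
  restartPolicy: OnFailure
  runtimeClassName: nvidia
  containers:
  - name: cuda-container
    image: nvidiak8s/cuda-sample:nbody
    args: ["nbody", "-gpu", "-benchmark"]
    resources:
      limits:
        nvidia.com/gpu: 1
EOF
```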
it worked, I mean the pod you just ran detected the GPU and used it on K3s...
please make a full recursive copy, with 777 permissions (if it's a non-prod environment), from /usr/bin/nvidia-container-runtime to /usr/bin/nvidia-ctk (the proper way would be to change the path in nvidia-device-plugin instead)...
relaunch nvidia-device-plugin... then if you check the tags of the node, it should be ok and everything should already be working
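roughly like this (non-prod only; the daemonset name and the kube-system namespace are assumptions, adjust to wherever the plugin runs):
```bash
# Workaround sketch (NOT for production): make /usr/bin/nvidia-ctk exist by
# copying nvidia-container-runtime there, then restart the device plugin
sudo cp -a /usr/bin/nvidia-container-runtime /usr/bin/nvidia-ctk
sudo chmod -R 777 /usr/bin/nvidia-ctk
kubectl -n kube-system rollout restart daemonset nvidia-device-plugin-daemonset
```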
Did it help you @ancient-tomato-94095? Btw, if it's not too much curiosity... could you explain what your use case is with k3s and GPU?
Direct message me if preferred.
a
"I think the reason is that your path is not correct (based on the logs you attached)." I will investigate the issue further.