# rke2
Hi everyone, I have a problem. I have created a cluster from Rancher on RKE2 version v1.25.9+rke2r1. When I run the following command directly on the machine,
/var/lib/rancher/rke2/data/v1.25.9-rke2r1-177f016694ea/bin/ctr --address=/run/k3s/containerd/containerd.sock run --rm -t --runc-binary=/usr/local/nvidia/toolkit/nvidia-container-runtime --env NVIDIA_VISIBLE_DEVICES=all docker.io/nvidia/samples:vectoradd-cuda10.2 nvidia-smi
everything works fine. On Kubernetes I installed the gpu-operator from NVIDIA, which validated correctly on the machine. But when I run the same image from Kubernetes, I get the following log (a sketch of the pod spec is below the log):
Failed to allocate device vector A (error code CUDA driver version is insufficient for CUDA runtime version)!
2023-06-09T12:29:00.444604100Z [Vector addition of 50000 elements]
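Roughly, the pod looks like this (a minimal sketch; the pod name is arbitrary, and I am assuming the nvidia RuntimeClass and device plugin that the gpu-operator installs):

kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vectoradd            # arbitrary test name
spec:
  restartPolicy: OnFailure
  runtimeClassName: nvidia        # RuntimeClass created by the gpu-operator (assumption)
  containers:
  - name: cuda-vectoradd
    image: docker.io/nvidia/samples:vectoradd-cuda10.2
    resources:
      limits:
        nvidia.com/gpu: 1         # request one GPU from the NVIDIA device plugin
EOF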
I have set two variables in the operator (roughly as sketched below), specifically
CONTAINERD_CONFIG: /var/lib/rancher/rke2/agent/etc/containerd/config.toml
and
CONTAINERD_SOCKET: /run/k3s/containerd/containerd.sock
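For reference, this is roughly how I pass those variables to the operator (a minimal sketch assuming the standard gpu-operator Helm chart and its toolkit.env values; the release name and namespace are illustrative):

helm upgrade --install gpu-operator nvidia/gpu-operator \
  --namespace gpu-operator --create-namespace \
  --set toolkit.env[0].name=CONTAINERD_CONFIG \
  --set toolkit.env[0].value=/var/lib/rancher/rke2/agent/etc/containerd/config.toml \
  --set toolkit.env[1].name=CONTAINERD_SOCKET \
  --set toolkit.env[1].value=/run/k3s/containerd/containerd.sock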
Can anyone tell me what is wrong and why it is not working?