# rke2
Hi everyone, I have a problem. I have created a cluster from Rancher on RKE2 version v1.25.9+rke2r1. When I run the following command directly on the machine,
/var/lib/rancher/rke2/data/v1.25.9-rke2r1-177f016694ea/bin/ctr --address=/run/k3s/containerd/containerd.sock run --rm -t --runc-binary=/usr/local/nvidia/toolkit/nvidia-container-runtime --env NVIDIA_VISIBLE_DEVICES=all docker.io/nvidia/samples:vectoradd-cuda10.2 nvidia-smi
everything works fine. On Kubernetes I installed the gpu-operator from NVIDIA, which validated correctly on the machine. But when I run the same image from Kubernetes, I get the following log (a sketch of the pod spec is below the log):
Failed to allocate device vector A (error code CUDA driver version is insufficient for CUDA runtime version)!
2023-06-09T12:29:00.444604100Z [Vector addition of 50000 elements]
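Roughly, the pod looks like this (a minimal sketch; the pod name is arbitrary, and I am assuming the nvidia RuntimeClass and device plugin that the gpu-operator installs):

kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vectoradd            # arbitrary test name
spec:
  restartPolicy: OnFailure
  runtimeClassName: nvidia        # RuntimeClass created by the gpu-operator (assumption)
  containers:
  - name: cuda-vectoradd
    image: docker.io/nvidia/samples:vectoradd-cuda10.2
    resources:
      limits:
        nvidia.com/gpu: 1         # request one GPU from the NVIDIA device plugin
EOF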
I have set two variables in the operator (roughly as sketched below), specifically
CONTAINERD_CONFIG: /var/lib/rancher/rke2/agent/etc/containerd/config.toml
and
CONTAINERD_SOCKET: /run/k3s/containerd/containerd.sock
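For reference, this is roughly how I pass those variables to the operator (a minimal sketch assuming the standard gpu-operator Helm chart and its toolkit.env values; the release name and namespace are illustrative):

helm upgrade --install gpu-operator nvidia/gpu-operator \
  --namespace gpu-operator --create-namespace \
  --set toolkit.env[0].name=CONTAINERD_CONFIG \
  --set toolkit.env[0].value=/var/lib/rancher/rke2/agent/etc/containerd/config.toml \
  --set toolkit.env[1].name=CONTAINERD_SOCKET \
  --set toolkit.env[1].value=/run/k3s/containerd/containerd.sock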
Can anyone tell me what is wrong and why it is not working?