Currently running a v1.24.10 RKE2 provisioned via Rancher UI Rancher Users #rke2


jolly-eye-77963

08/16/2024, 2:40 PM

Currently running a v1.24.10 RKE2 cluster provisioned via the Rancher UI (an upgrade is not possible right now; it is planned for a later stage of development). The underlying VM nodes are being upgraded to Ubuntu 24 and NVIDIA 560 drivers with the NVIDIA Container Toolkit. When registering a node, if no config.toml.tmpl is defined, the node provisions but the NVIDIA devices are not assigned. Putting in a config.toml.tmpl that identifies the nvidia runtime works until the node is rebooted; then the kube-proxy and calico pods go into CrashLoopBackOff until the tmpl is removed and the agent restarted, and the cycle begins again. Does anyone have any idea why setting the default runtime to include the nvidia runtime is causing the crash loops?

