little-shampoo-18495
05/23/2023, 10:51 AM
I set up RKE2 on Ubuntu 22.04.2 LTS by going through the Quick Start, and I have an issue where my CNI pod doesn't come up because it cannot talk to the API server (I have tried different CNIs and I get the same issue).
Right now I have an instance with RKE2 set up with these commands:
ufw disable
curl -sfL https://get.rke2.io | sh -
systemctl enable rke2-server.service
systemctl start rke2-server.service
I don't see any errors in the journalctl logs, but the pod rke2-canal-___ is stuck in an Init:CrashLoopBackOff, and from the logs of the install-cni container I see that it cannot connect to the kubernetes service:
2023-05-23 10:36:11.795 [FATAL][1] cni-installer/<nil> <nil>: Unable to create token for CNI kubeconfig error=Post "https://10.43.0.1:443/api/v1/namespaces/kube-system/serviceaccounts/canal/token": dial tcp 10.43.0.1:443: i/o timeout
This is my service and endpoints:
root@k8s-master-1:~# kubectl get svc -owide
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE   SELECTOR
kubernetes   ClusterIP   10.43.0.1    <none>        443/TCP   14m   <none>
root@k8s-master-1:~# kubectl get endpoints -owide
NAME         ENDPOINTS            AGE
kubernetes   45.76.137.187:6443   15m
I can reach the endpoint:
$ kubectl exec -it etcd-k8s-master-1 -nkube-system -- curl -vk https://45.76.137.187:6443
....
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "Unauthorized",
  "reason": "Unauthorized",
  "code": 401
}
* Connection #0 to host 45.76.137.187 left intact
but I cannot reach the kubernetes service:
$ kubectl exec -it etcd-k8s-master-1 -nkube-system -- curl -vk https://10.43.0.1
* Uses proxy env variable NO_PROXY == '.svc,.cluster.local,10.42.0.0/16,10.43.0.0/16'
*   Trying 10.43.0.1:443...
* TCP_NODELAY set
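An endpoint that answers directly while the ClusterIP times out usually points at the service-proxy layer (kube-proxy/iptables) or a host firewall mangling traffic, rather than at the CNI pod itself. A few checks that may help narrow it down, assuming a default RKE2 install with kube-proxy in iptables mode:
# is the kube-proxy static pod up on this node?
kubectl -n kube-system get pods -o wide | grep kube-proxy
# have DNAT rules for the ClusterIP been programmed?
iptables-save | grep 10.43.0.1
# forwarding must be enabled for pod-to-ClusterIP traffic
sysctl net.ipv4.ip_forward
# leftover firewall rules can survive `ufw disable`
iptables -S | grep -iE 'drop|reject'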
limited-motherboard-41807
05/23/2023, 11:56 AM
Do you have /opt/cni/bin/ on agent nodes? I've had a case where an agent node doesn't have it, but I'm not sure why.
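For the /opt/cni/bin question: on an RKE2 node it is the CNI chart's install-cni init container that drops the binaries and config onto the host, so comparing a working and a non-working agent could look like this (the paths are the usual defaults and the pod name is a placeholder):
# CNI binaries and generated network config on the host
ls -l /opt/cni/bin/
ls -l /etc/cni/net.d/
# if the directory is empty, the init container's log usually says why
kubectl -n kube-system logs rke2-canal-<pod-suffix> -c install-cni --tail=50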
hundreds-evening-84071
05/23/2023, 1:38 PM
hallowed-window-565
05/24/2023, 7:16 AM
polite-translator-35958
05/25/2023, 3:09 PM
systemctl start rke2-server eventually times out and fails. It seems like it's waiting for etcd startup, which never happens. I'm guessing something changed in the RHEL release, but I'm failing to grok what the issue is. Anyone seen this?
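When rke2-server hangs waiting on etcd, the systemd journal alone often isn't enough; the etcd pod's own log tends to carry the real error (certificates, peers unreachable, a bad data dir, and so on). Rough places to look on the affected server, using the standard RKE2 paths:
journalctl -u rke2-server --no-pager | tail -n 100
tail -n 100 /var/lib/rancher/rke2/agent/logs/kubelet.log
# static pod logs, including etcd, land under /var/log/pods
tail -n 100 /var/log/pods/kube-system_etcd-*/etcd/*.log
# the etcd data directory RKE2 uses
ls -l /var/lib/rancher/rke2/server/db/etcd/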
broad-farmer-70498
05/25/2023, 9:44 PM
hundreds-airport-66196
05/25/2023, 11:52 PM
rapid-church-43569
05/26/2023, 7:00 AM
polite-translator-35958
05/26/2023, 3:25 PM
echoing-tomato-53055
05/29/2023, 6:16 AM
echoing-tomato-53055
05/29/2023, 6:16 AM
prehistoric-advantage-39331
05/29/2023, 10:57 AM
polite-translator-35958
05/30/2023, 7:45 PM
glamorous-lighter-5580
05/31/2023, 9:17 AM
level=warning msg="Running modprobe ip_vs failed with message: ``, error: exec: \"modprobe\": executable file not found in $PATH"
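That warning usually just means the process (kube-proxy or the CNI) cannot find modprobe inside its own container, so it cannot load the IPVS modules itself; if the modules are already present on the host it is harmless. A possible way to check and load them on the host, assuming the usual IPVS module set:
lsmod | grep -E 'ip_vs|nf_conntrack'
modprobe -a ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh nf_conntrack
# make it persistent across reboots
printf '%s\n' ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh nf_conntrack > /etc/modules-load.d/ipvs.conf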
ambitious-plastic-3551
05/31/2023, 1:42 PM
green-rain-9522
06/01/2023, 11:40 AM
Taints: node.kubernetes.io/not-ready:NoExecute
        node.kubernetes.io/not-ready:NoSchedule
Therefore it's showing up as NotReady:
NAME                            STATUS     ROLES                       AGE     VERSION
k8s-2-master-01.herren5.local   NotReady   control-plane,etcd,master   6d23h   v1.25.9+rke2r1
k8s-2-worker-01.herren5.local   Ready      <none>                      6d23h   v1.25.9+rke2r1
k8s-2-worker-02.herren5.local   Ready      <none>                      6d23h   v1.25.9+rke2r1
k8s-2-worker-03.herren5.local   Ready      <none>                      6d23h   v1.25.9+rke2r1
Shouldn't the taints be more like:
node-role.kubernetes.io/etcd=true:NoExecute
node-role.kubernetes.io/controlplane=true:NoSchedule
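The not-ready taints are added automatically by the node lifecycle controller while the kubelet reports NotReady (often because the CNI on that node never became ready), so they are a symptom rather than a cause; role taints only appear on an RKE2 server if you configure them yourself. A rough way to dig into why the control-plane node is NotReady, plus a sketch of how explicit server taints would be configured (the taint value is just the commonly documented example, not something from this thread):
kubectl describe node k8s-2-master-01.herren5.local | sed -n '/Conditions:/,/Addresses:/p'
kubectl -n kube-system get pods -o wide | grep k8s-2-master-01
# optional server taints go in /etc/rancher/rke2/config.yaml, e.g.
#   node-taint:
#     - "CriticalAddonsOnly=true:NoExecute"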
green-art-51038
06/01/2023, 8:36 PM
user nginx;
worker_processes 4;
worker_rlimit_nofile 40000;

error_log /var/log/nginx/error.log;
pid /run/nginx.pid;

# Load dynamic modules. See /usr/share/doc/nginx/README.dynamic.
include /usr/share/nginx/modules/*.conf;

events {
    worker_connections 8192;
}

stream {
    upstream backend {
        least_conn;
        server <IP_NODE_1>:9345 max_fails=3 fail_timeout=5s;
        server <IP_NODE_2>:9345 max_fails=3 fail_timeout=5s;
    }
    server {
        listen 9345;
        proxy_pass backend;
    }

    upstream rancher_api {
        least_conn;
        server <IP_NODE_1>:6443 max_fails=3 fail_timeout=5s;
        server <IP_NODE_2>:6443 max_fails=3 fail_timeout=5s;
    }
    server {
        listen 6443;
        proxy_pass rancher_api;
    }

    upstream rancher_http {
        least_conn;
        server <IP_SERVER1>:80 max_fails=3 fail_timeout=5s;
        server <IP_SERVER2>:80 max_fails=3 fail_timeout=5s;
    }
    server {
        listen 80;
        proxy_pass rancher_http;
    }

    upstream rancher_https {
        least_conn;
        server <IP_SERVER1>:443 max_fails=3 fail_timeout=5s;
        server <IP_SERVER2>:443 max_fails=3 fail_timeout=5s;
    }
    server {
        listen 443;
        proxy_pass rancher_https;
    }
}
Please, can someone help?
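Without knowing what is actually failing, the stream config itself looks structurally reasonable for RKE2 (9345 for the supervisor/join port, 6443 for the apiserver, 80/443 passed through to the ingress). Some generic checks to pinpoint where it breaks, run from the load-balancer host; <LB_IP> is a placeholder:
nginx -t
ss -tlnp | grep nginx
# /healthz returns "ok" or 401 depending on anonymous auth; either way it proves the TCP/TLS path through the LB
curl -vk https://<LB_IP>:6443/healthz
# the supervisor port should at least complete a TLS handshake
curl -vk https://<LB_IP>:9345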
ambitious-plastic-3551
06/02/2023, 2:12 PM
limited-motherboard-41807
06/04/2023, 10:12 AM
jolly-lock-68045
06/05/2023, 2:19 PM
polite-ocean-96458
06/07/2023, 2:36 AM
stale-painting-80203
06/07/2023, 3:13 AM
best-jordan-89798
06/07/2023, 4:34 PM
c:/etc/rancher/rke2/config.yaml
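For reference, c:/etc/rancher/rke2/config.yaml is where an RKE2 Windows agent reads its configuration; a minimal agent config usually looks something like this (server URL and token are placeholders):
server: https://<server-host>:9345
token: <cluster-token>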
rhythmic-intern-33969
06/09/2023, 12:32 PM
/var/lib/rancher/rke2/data/v1.25.9-rke2r1-177f016694ea/bin/ctr --address=/run/k3s/containerd/containerd.sock run --rm -t --runc-binary=/usr/local/nvidia/toolkit/nvidia-container-runtime --env NVIDIA_VISIBLE_DEVICES=all docker.io/nvidia/samples:vectoradd-cuda10.2 nvidia-smi
everything works fine. On Kubernetes I installed the gpu-operator from NVIDIA, which validated correctly on the machine. But when I run the same image from Kubernetes I get this log:
Failed to allocate device vector A (error code CUDA driver version is insufficient for CUDA runtime version)!
2023-06-09T12:29:00.444604100Z [Vector addition of 50000 elements]
I have set 2 variables in the operator, specifically CONTAINERD_CONFIG: /var/lib/rancher/rke2/agent/etc/containerd/config.toml and CONTAINERD_SOCKET: /run/k3s/containerd/containerd.sock. Is anyone able to tell me what is wrong and why it is not working?
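On the gpu-operator symptom: the same image working via ctr with --runc-binary pointed at nvidia-container-runtime but failing under Kubernetes usually means the pod is not actually being started with the NVIDIA runtime, so the container never gets the host driver libraries injected. Things worth checking (the runtime-class name "nvidia" is the operator's usual default, so treat it as an assumption for your setup):
# did the operator register a RuntimeClass, and did the toolkit patch RKE2's containerd config?
kubectl get runtimeclass
grep -A3 nvidia /var/lib/rancher/rke2/agent/etc/containerd/config.toml
# a test pod that explicitly requests the NVIDIA runtime
kubectl run cuda-vectoradd --restart=Never \
  --image=docker.io/nvidia/samples:vectoradd-cuda10.2 \
  --overrides='{"apiVersion":"v1","spec":{"runtimeClassName":"nvidia"}}'
kubectl logs cuda-vectoradd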
numerous-sunset-21016
06/09/2023, 11:04 PM
stocky-baker-96123
06/10/2023, 2:38 AM
echoing-city-69785
06/11/2023, 8:19 PM
quick-waitress-21632
06/13/2023, 3:22 PM
env inside a pod?
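If the question is how to see the environment inside a running pod, a quick way (pod and namespace names are placeholders):
kubectl exec -it <pod-name> -n <namespace> -- env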
ambitious-plastic-3551
06/13/2023, 4:57 PM
ambitious-plastic-3551
06/13/2023, 5:04 PM