
crooked-cat-21365

02/01/2023, 7:52 AM
While trying to deploy a new RKE2 cluster using Rancher 2.7.1, I get a lot of "Internal server error" and "retrying of unary invoker failed" messages, and the cluster doesn't show up. Not a single container has been started. If I try to run a sample container for verification, I get:
# crictl runp /tmp/pod-config.json 
E0201 08:48:23.160651    4075 remote_runtime.go:201] "RunPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to setup network for sandbox \"00bea6d4a719789624b9eb4b5467cab36de70c241687f267377035958ca406cb\": cni plugin not initialized"
FATA[0000] run pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "00bea6d4a719789624b9eb4b5467cab36de70c241687f267377035958ca406cb": cni plugin not initialized
How come? Any insightful comment is highly appreciated.
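For anyone hitting the same symptom: "cni plugin not initialized" usually means the CNI plugin never wrote its config. A minimal check, sketched here as a guarded script (the path is the standard CNI config location; the guard clauses are added so it runs harmlessly on any machine):

```shell
#!/bin/sh
# Diagnostic sketch for "cni plugin not initialized" on an RKE2 node.
# Canal, RKE2's default CNI, writes its config into /etc/cni/net.d once
# its pod is running; an empty directory means the CNI never came up.
CNI_DIR=/etc/cni/net.d
if [ -d "$CNI_DIR" ] && [ -n "$(ls -A "$CNI_DIR" 2>/dev/null)" ]; then
    echo "CNI config present in $CNI_DIR:"
    ls -l "$CNI_DIR"
else
    echo "No CNI config in $CNI_DIR - the CNI pod likely never started."
fi
```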

creamy-pencil-82913

02/01/2023, 9:30 AM
You haven't provided any actual errors from the logs of rke2 itself, so it's hard to make any suggestions other than... have you looked at the logs?

crooked-cat-21365

02/01/2023, 9:51 AM
The logs are too large to fit here. Here are the last few lines:
rke2[2629]: time="2023-02-01T10:28:24+01:00" level=info msg="Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-metrics-server.yaml to set cluster configuration values"
rke2[2629]: time="2023-02-01T10:28:24+01:00" level=info msg="Updated manifest /var/lib/rancher/rke2/server/manifests/harvester-cloud-provider.yaml to set cluster configuration values"
rke2[2629]: time="2023-02-01T10:28:24+01:00" level=info msg="Updated manifest /var/lib/rancher/rke2/server/manifests/harvester-csi-driver.yaml to set cluster configuration values"
rke2[2629]: time="2023-02-01T10:28:24+01:00" level=info msg="Updated manifest /var/lib/rancher/rke2/server/manifests/rancher-vsphere-csi.yaml to set cluster configuration values"
rke2[2629]: time="2023-02-01T10:28:24+01:00" level=info msg="Updated manifest /var/lib/rancher/rke2/server/manifests/rke2-canal.yaml to set cluster configuration values"
rke2[2629]: time="2023-02-01T10:28:24+01:00" level=info msg="Logging containerd to /var/lib/rancher/rke2/agent/containerd/containerd.log"
rke2[2629]: time="2023-02-01T10:28:24+01:00" level=info msg="Running containerd -c /var/lib/rancher/rke2/agent/etc/containerd/config.toml -a /run/k3s/containerd/containerd.sock --state /run/k3s/containerd --root /var/lib/rancher/rke2/agent/containerd"
rke2[2629]: time="2023-02-01T10:28:25+01:00" level=info msg="Containerd is now running"
rke2[2629]: time="2023-02-01T10:28:25+01:00" level=info msg="Pulling images from /var/lib/rancher/rke2/agent/images/cloud-controller-manager-image.txt"
rke2[2629]: time="2023-02-01T10:28:25+01:00" level=info msg="Imported images from /var/lib/rancher/rke2/agent/images/cloud-controller-manager-image.txt in 1.488373ms"
rke2[2629]: time="2023-02-01T10:28:25+01:00" level=info msg="Pulling images from /var/lib/rancher/rke2/agent/images/etcd-image.txt"
rke2[2629]: time="2023-02-01T10:28:25+01:00" level=info msg="Imported images from /var/lib/rancher/rke2/agent/images/etcd-image.txt in 958.382µs"
rke2[2629]: time="2023-02-01T10:28:25+01:00" level=info msg="Pulling images from /var/lib/rancher/rke2/agent/images/kube-apiserver-image.txt"
rke2[2629]: time="2023-02-01T10:28:25+01:00" level=info msg="Pulling image index.docker.io/rancher/hardened-kubernetes:v1.24.10-rke2r1-build20230119..."
rke2[2629]: time="2023-02-01T10:28:28+01:00" level=info msg="Tunnel server egress proxy waiting for runtime core to become available"
rke2[2629]: time="2023-02-01T10:28:33+01:00" level=info msg="Tunnel server egress proxy waiting for runtime core to become available"
rke2[2629]: time="2023-02-01T10:28:36+01:00" level=info msg="Imported images from /var/lib/rancher/rke2/agent/images/kube-apiserver-image.txt in 11.592291954s"
rke2[2629]: time="2023-02-01T10:28:36+01:00" level=info msg="Pulling images from /var/lib/rancher/rke2/agent/images/kube-controller-manager-image.txt"
rke2[2629]: time="2023-02-01T10:28:36+01:00" level=info msg="Imported images from /var/lib/rancher/rke2/agent/images/kube-controller-manager-image.txt in 778.012µs"
rke2[2629]: time="2023-02-01T10:28:36+01:00" level=info msg="Pulling images from /var/lib/rancher/rke2/agent/images/kube-scheduler-image.txt"
rke2[2629]: time="2023-02-01T10:28:36+01:00" level=info msg="Imported images from /var/lib/rancher/rke2/agent/images/kube-scheduler-image.txt in 530.893µs"
rke2[2629]: time="2023-02-01T10:28:36+01:00" level=info msg="Connecting to proxy" url="wss://127.0.0.1:9345/v1-rke2/connect"
rke2[2629]: time="2023-02-01T10:28:36+01:00" level=info msg="Running kubelet --address=0.0.0.0 --alsologtostderr=false --anonymous-auth=false --authentication-token-webhook=true --authorization-mode=Webhook --cgroup-driver=systemd --client-ca-file=/var/lib/rancher/rke2/agent/client-ca.crt --cloud-provider=external --cluster-dns=10.43.0.10 --cluster-domain=cluster.local --container-runtime-endpoint=unix:///run/k3s/containerd/containerd.sock --containerd=/run/k3s/containerd/containerd.sock --eviction-hard=imagefs.available<5%,nodefs.available<5% --eviction-minimum-reclaim=imagefs.available=10%,nodefs.available=10% --fail-swap-on=false --healthz-bind-address=127.0.0.1 --hostname-override=srvl034a.ac.aixigo.de --kubeconfig=/var/lib/rancher/rke2/agent/kubelet.kubeconfig --log-file=/var/lib/rancher/rke2/agent/logs/kubelet.log --log-file-max-size=50 --logtostderr=false --node-labels= --pod-infra-container-image=index.docker.io/rancher/pause:3.6 --pod-manifest-path=/var/lib/rancher/rke2/agent/pod-manifests --read-only-port=0 --resolv-conf=/etc/resolv.conf --serialize-image-pulls=false --stderrthreshold=FATAL --tls-cert-file=/var/lib/rancher/rke2/agent/serving-kubelet.crt --tls-private-key-file=/var/lib/rancher/rke2/agent/serving-kubelet.key"
rke2[2629]: time="2023-02-01T10:28:36+01:00" level=info msg="Handling backend connection request [srvl034a.ac.aixigo.de]"
rke2[2629]: time="2023-02-01T10:28:36+01:00" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:9345/v1-rke2/readyz: 500 Internal Server Error"
rke2[2738]: Flag --volume-plugin-dir has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
rke2[2738]: Flag --file-check-frequency has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
          :
rke2[2738]: Flag --tls-cert-file has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
rke2[2738]: Flag --tls-private-key-file has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
rke2[2629]: time="2023-02-01T10:28:38+01:00" level=info msg="Tunnel server egress proxy waiting for runtime core to become available"
rke2[2629]: time="2023-02-01T10:28:38+01:00" level=info msg="Container for etcd not found (no matching container found), retrying"
rke2[2629]: time="2023-02-01T10:28:41+01:00" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:9345/v1-rke2/readyz: 500 Internal Server Error"
rke2[2629]: time="2023-02-01T10:28:43+01:00" level=info msg="Tunnel server egress proxy waiting for runtime core to become available"
rke2[2629]: time="2023-02-01T10:28:46+01:00" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:9345/v1-rke2/readyz: 500 Internal Server Error"
rke2[2629]: {"level":"warn","ts":"2023-02-01T10:28:48.543+0100","logger":"etcd-client","caller":"v3@v3.5.4-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc001611500/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused\""}
rke2[2629]: time="2023-02-01T10:28:48+01:00" level=info msg="Failed to test data store connection: context deadline exceeded"
rke2[2629]: time="2023-02-01T10:28:48+01:00" level=info msg="Waiting for etcd server to become available"
rke2[2629]: time="2023-02-01T10:28:48+01:00" level=info msg="Tunnel server egress proxy waiting for runtime core to become available"
rke2[2629]: time="2023-02-01T10:28:48+01:00" level=info msg="Waiting for API server to become available"
rke2[2629]: time="2023-02-01T10:28:51+01:00" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:9345/v1-rke2/readyz: 500 Internal Server Error"
rke2[2629]: time="2023-02-01T10:28:53+01:00" level=info msg="Tunnel server egress proxy waiting for runtime core to become available"
rke2[2629]: time="2023-02-01T10:28:56+01:00" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:9345/v1-rke2/readyz: 500 Internal Server Error"
rke2[2629]: time="2023-02-01T10:28:58+01:00" level=info msg="Tunnel server egress proxy waiting for runtime core to become available"
rke2[2629]: time="2023-02-01T10:28:58+01:00" level=info msg="Container for etcd not found (no matching container found), retrying"
rke2[2629]: time="2023-02-01T10:29:01+01:00" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:9345/v1-rke2/readyz: 500 Internal Server Error"
rke2[2629]: time="2023-02-01T10:29:03+01:00" level=info msg="Tunnel server egress proxy waiting for runtime core to become available"
These logs were created by running a native installation of rke2, without the help of Rancher.
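The "Container for etcd not found" lines can be checked directly against RKE2's bundled containerd. A sketch, assuming a default install (the socket path is taken verbatim from the "Running containerd" log line above; rke2 ships crictl under /var/lib/rancher/rke2/bin):

```shell
#!/bin/sh
# Sketch: ask RKE2's bundled containerd whether an etcd container was ever
# created. Guarded so it degrades gracefully where crictl or the socket
# is unavailable.
SOCK=unix:///run/k3s/containerd/containerd.sock
if command -v crictl >/dev/null 2>&1; then
    crictl --runtime-endpoint "$SOCK" pods  2>/dev/null || echo "could not reach $SOCK"
    crictl --runtime-endpoint "$SOCK" ps -a 2>/dev/null || echo "could not reach $SOCK"
else
    echo "crictl not in PATH - try /var/lib/rancher/rke2/bin/crictl"
fi
```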

creamy-pencil-82913

02/01/2023, 5:08 PM
Attaching a log file or using pastebin is preferred to just dumping them here. Do they continue on like that, waiting for etcd to start?
You might also check the kubelet and containerd logs; I don't see anything in there that indicates why etcd isn't starting.
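The kubelet and containerd log paths for a default RKE2 install both appear verbatim in the rke2 output earlier in the thread. A sketch of that follow-up check (missing files are reported rather than treated as errors):

```shell
#!/bin/sh
# Tail the containerd and kubelet logs mentioned in the rke2 output above.
# These are the default RKE2 agent log locations.
for f in /var/lib/rancher/rke2/agent/containerd/containerd.log \
         /var/lib/rancher/rke2/agent/logs/kubelet.log; do
    if [ -f "$f" ]; then
        echo "== last 50 lines of $f =="
        tail -n 50 "$f"
    else
        echo "not found: $f"
    fi
done
```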