# general
c
These files are created by rke2 when it starts. Check the rke2-server log to see why it's not starting
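For example, on a default systemd-based install, something like this should follow the rke2-server service log (unit name assumed from a standard install):
Copy code
journalctl -u rke2-server -f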
b
Copy code
Nov 14 14:46:09 vm53245 rke2[248590]: time="2023-11-14T14:46:09-03:00" level=info msg="Waiting for cri connection: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial unix /run/k3s/containerd/containerd.sock: connect: connection refused\""
Nov 14 14:46:17 vm53245 rke2[248590]: time="2023-11-14T14:46:17-03:00" level=info msg="Waiting for containerd startup: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial unix /run/k3s/containerd/containerd.sock: connect: connection refused\""
That's the best I could find that might be related. Any suggestions?
c
I’ve never seen that before. Can you check the logs at /var/lib/rancher/rke2/agent/containerd/containerd.log ?
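Something along these lines should show the most recent entries:
Copy code
tail -n 200 /var/lib/rancher/rke2/agent/containerd/containerd.log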
b
Copy code
time="2023-11-14T16:29:44.014214414-03:00" level=info msg="CreateContainer within sandbox \"0d374130828f1dea445ce4e25d2a3d44998d11d78eb7840fd634749c91b05631\" for &ContainerMetadata{Name:cluster-register,Attempt:3,} returns container id \"acde4ee74e80cea4659b99ec5dc2fd1515b0c7b11cfde0695202dee266cef8ca\""
time="2023-11-14T16:29:44.014741928-03:00" level=info msg="StartContainer for \"acde4ee74e80cea4659b99ec5dc2fd1515b0c7b11cfde0695202dee266cef8ca\""
time="2023-11-14T16:29:44.164595936-03:00" level=info msg="StartContainer for \"acde4ee74e80cea4659b99ec5dc2fd1515b0c7b11cfde0695202dee266cef8ca\" returns successfully"
time="2023-11-14T16:29:44.596939155-03:00" level=info msg="shim disconnected" id=acde4ee74e80cea4659b99ec5dc2fd1515b0c7b11cfde0695202dee266cef8ca namespace=<http://k8s.io|k8s.io>
time="2023-11-14T16:29:44.597007488-03:00" level=warning msg="cleaning up after shim disconnected" id=acde4ee74e80cea4659b99ec5dc2fd1515b0c7b11cfde0695202dee266cef8ca namespace=<http://k8s.io|k8s.io>
time="2023-11-14T16:29:44.597021048-03:00" level=info msg="cleaning up dead shim" namespace=<http://k8s.io|k8s.io>
time="2023-11-14T16:29:44.767526307-03:00" level=info msg="RemoveContainer for \"26058e60fdf29aad194d6d0a576fb22fb42fda854aa3b436ba8139060b6fc43e\""
time="2023-11-14T16:29:44.775798151-03:00" level=info msg="RemoveContainer for \"26058e60fdf29aad194d6d0a576fb22fb42fda854aa3b436ba8139060b6fc43e\" returns successfully"
time="2023-11-14T16:30:26.990747612-03:00" level=info msg="CreateContainer within sandbox \"0d374130828f1dea445ce4e25d2a3d44998d11d78eb7840fd634749c91b05631\" for container &ContainerMetadata{Name:cluster-register,Attempt:4,}"
time="2023-11-14T16:30:27.011744713-03:00" level=info msg="CreateContainer within sandbox \"0d374130828f1dea445ce4e25d2a3d44998d11d78eb7840fd634749c91b05631\" for &ContainerMetadata{Name:cluster-register,Attempt:4,} returns container id \"7c1e1b71be8156c1880fb23458f23322037e5c18326f608d77392e4a9ec5bb7d\""
time="2023-11-14T16:30:27.012243941-03:00" level=info msg="StartContainer for \"7c1e1b71be8156c1880fb23458f23322037e5c18326f608d77392e4a9ec5bb7d\""
time="2023-11-14T16:30:27.149825986-03:00" level=info msg="StartContainer for \"7c1e1b71be8156c1880fb23458f23322037e5c18326f608d77392e4a9ec5bb7d\" returns successfully"
time="2023-11-14T16:30:27.596337436-03:00" level=info msg="shim disconnected" id=7c1e1b71be8156c1880fb23458f23322037e5c18326f608d77392e4a9ec5bb7d namespace=<http://k8s.io|k8s.io>
time="2023-11-14T16:30:27.596401694-03:00" level=warning msg="cleaning up after shim disconnected" id=7c1e1b71be8156c1880fb23458f23322037e5c18326f608d77392e4a9ec5bb7d namespace=<http://k8s.io|k8s.io>
time="2023-11-14T16:30:27.596415016-03:00" level=info msg="cleaning up dead shim" namespace=<http://k8s.io|k8s.io>
time="2023-11-14T16:30:27.865157117-03:00" level=info msg="RemoveContainer for \"acde4ee74e80cea4659b99ec5dc2fd1515b0c7b11cfde0695202dee266cef8ca\""
time="2023-11-14T16:30:27.869728322-03:00" level=info msg="RemoveContainer for \"acde4ee74e80cea4659b99ec5dc2fd1515b0c7b11cfde0695202dee266cef8ca\" returns successfully"
time="2023-11-14T16:31:57.991588900-03:00" level=info msg="CreateContainer within sandbox \"0d374130828f1dea445ce4e25d2a3d44998d11d78eb7840fd634749c91b05631\" for container &ContainerMetadata{Name:cluster-register,Attempt:5,}"
time="2023-11-14T16:31:58.026234863-03:00" level=info msg="CreateContainer within sandbox \"0d374130828f1dea445ce4e25d2a3d44998d11d78eb7840fd634749c91b05631\" for &ContainerMetadata{Name:cluster-register,Attempt:5,} returns container id \"ce7e3dd2c129d3931c13f5eca2ac98f85924808e85cb89c92bddc5989f08f365\""
time="2023-11-14T16:31:58.026791642-03:00" level=info msg="StartContainer for \"ce7e3dd2c129d3931c13f5eca2ac98f85924808e85cb89c92bddc5989f08f365\""
time="2023-11-14T16:31:58.161432269-03:00" level=info msg="StartContainer for \"ce7e3dd2c129d3931c13f5eca2ac98f85924808e85cb89c92bddc5989f08f365\" returns successfully"
time="2023-11-14T16:31:58.606578970-03:00" level=info msg="shim disconnected" id=ce7e3dd2c129d3931c13f5eca2ac98f85924808e85cb89c92bddc5989f08f365 namespace=<http://k8s.io|k8s.io>
time="2023-11-14T16:31:58.606698148-03:00" level=warning msg="cleaning up after shim disconnected" id=ce7e3dd2c129d3931c13f5eca2ac98f85924808e85cb89c92bddc5989f08f365 namespace=<http://k8s.io|k8s.io>
time="2023-11-14T16:31:58.606712835-03:00" level=info msg="cleaning up dead shim" namespace=<http://k8s.io|k8s.io>
time="2023-11-14T16:31:59.068068950-03:00" level=info msg="RemoveContainer for \"7c1e1b71be8156c1880fb23458f23322037e5c18326f608d77392e4a9ec5bb7d\""
time="2023-11-14T16:31:59.074883023-03:00" level=info msg="RemoveContainer for \"7c1e1b71be8156c1880fb23458f23322037e5c18326f608d77392e4a9ec5bb7d\" returns successfully"
It keeps creating and removing the container... I'll try to find any errors logged before this.
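For example, a filter along these lines should surface any earlier error or warning entries in that log:
Copy code
grep -iE 'level=(error|warning)' /var/lib/rancher/rke2/agent/containerd/containerd.log | tail -n 50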
c
Well, it's running, so it clearly got past that error about not being able to connect.
Can you list pods using kubectl?
b
Copy code
[root@vm53245 default]# /var/lib/rancher/rke2/data/v1.26.8-rke2r1-fbb8da8a4df9/bin/kubectl get pods --all-namespaces --kubeconfig /etc/rancher/rke2/rke2.yaml
NAMESPACE         NAME                                                    READY   STATUS             RESTARTS      AGE
calico-system     calico-kube-controllers-86485fc7d6-sxzv2                1/1     Running            0             8m25s
calico-system     calico-node-hpkv5                                       1/1     Running            0             8m25s
calico-system     calico-typha-db99f986c-qq7ll                            1/1     Running            0             8m25s
cattle-system     cattle-cluster-agent-59bc4bd585-rhhq2                   0/1     CrashLoopBackOff   6 (45s ago)   9m
kube-system       cloud-controller-manager-vm53245                        1/1     Running            0             9m2s
kube-system       etcd-vm53245                                            1/1     Running            0             8m53s
kube-system       helm-install-rke2-calico-5k7pt                          0/1     Completed          2             9m1s
kube-system       helm-install-rke2-calico-crd-qzwzm                      0/1     Completed          0             9m1s
kube-system       helm-install-rke2-coredns-9zxz5                         0/1     Completed          0             9m1s
kube-system       helm-install-rke2-ingress-nginx-6pmqw                   0/1     Completed          0             9m1s
kube-system       helm-install-rke2-metrics-server-rhf76                  0/1     Completed          0             9m1s
kube-system       helm-install-rke2-snapshot-controller-crd-r9zwx         0/1     Completed          0             9m1s
kube-system       helm-install-rke2-snapshot-controller-t9cqs             0/1     Completed          2             9m1s
kube-system       helm-install-rke2-snapshot-validation-webhook-lbwfn     0/1     Completed          0             9m1s
kube-system       kube-apiserver-vm53245                                  1/1     Running            0             8m58s
kube-system       kube-controller-manager-vm53245                         1/1     Running            0             8m24s
kube-system       kube-proxy-vm53245                                      1/1     Running            0             8m58s
kube-system       kube-scheduler-vm53245                                  1/1     Running            0             8m21s
kube-system       rke2-coredns-rke2-coredns-7c98b7488c-kmbhq              1/1     Running            0             8m49s
kube-system       rke2-coredns-rke2-coredns-autoscaler-65b5bfc754-vq6gl   1/1     Running            0             8m49s
kube-system       rke2-ingress-nginx-controller-6mqpz                     1/1     Running            0             7m10s
kube-system       rke2-metrics-server-5bf59cdccb-sbvks                    1/1     Running            0             7m25s
kube-system       rke2-snapshot-controller-6f7bbb497d-2vrb9               1/1     Running            0             7m18s
kube-system       rke2-snapshot-validation-webhook-65b5675d5c-kg9d5       1/1     Running            0             7m39s
tigera-operator   tigera-operator-6869bc46c4-s88b6                        1/1     Running            0             8m33s
It seems your hunch is leading me to something.
c
Yeah, the cluster seems fine. See why the cattle-cluster-agent is crashing.
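For example, something like this should show the crashing container's logs and recent events (pod name taken from the listing above; add the same --kubeconfig flag if needed):
Copy code
kubectl -n cattle-system logs cattle-cluster-agent-59bc4bd585-rhhq2 --previous
kubectl -n cattle-system describe pod cattle-cluster-agent-59bc4bd585-rhhq2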
b
Copy code
INFO: Environment: CATTLE_ADDRESS=10.42.57.203 CATTLE_CA_CHECKSUM=212e14d5b8aab876cae488599f3e32e510ccb92895f7c1b5c7dc4f67ddc68a0a CATTLE_CLUSTER=true CATTLE_CLUSTER_AGENT_PORT=tcp://10.43.73.6:80 CATTLE_CLUSTER_AGENT_PORT_443_TCP=tcp://10.43.73.6:443 CATTLE_CLUSTER_AGENT_PORT_443_TCP_ADDR=10.43.73.6 CATTLE_CLUSTER_AGENT_PORT_443_TCP_PORT=443 CATTLE_CLUSTER_AGENT_PORT_443_TCP_PROTO=tcp CATTLE_CLUSTER_AGENT_PORT_80_TCP=tcp://10.43.73.6:80 CATTLE_CLUSTER_AGENT_PORT_80_TCP_ADDR=10.43.73.6 CATTLE_CLUSTER_AGENT_PORT_80_TCP_PORT=80 CATTLE_CLUSTER_AGENT_PORT_80_TCP_PROTO=tcp CATTLE_CLUSTER_AGENT_SERVICE_HOST=10.43.73.6 CATTLE_CLUSTER_AGENT_SERVICE_PORT=80 CATTLE_CLUSTER_AGENT_SERVICE_PORT_HTTP=80 CATTLE_CLUSTER_AGENT_SERVICE_PORT_HTTPS_INTERNAL=443 CATTLE_CLUSTER_REGISTRY= CATTLE_FEATURES=embedded-cluster-api=false,fleet=false,monitoringv1=false,multi-cluster-management=false,multi-cluster-management-agent=true,provisioningv2=false,rke2=false CATTLE_INGRESS_IP_DOMAIN=sslip.io CATTLE_INSTALL_UUID=147d31a6-ebb6-4686-876b-43d7b42d2a6b CATTLE_INTERNAL_ADDRESS= CATTLE_IS_RKE=false CATTLE_K8S_MANAGED=true CATTLE_NODE_NAME=cattle-cluster-agent-59bc4bd585-rhhq2 CATTLE_RANCHER_WEBHOOK_MIN_VERSION= CATTLE_RANCHER_WEBHOOK_VERSION=2.0.6+up0.3.6 CATTLE_SERVER=https://mysite CATTLE_SERVER_VERSION=v2.7.9
INFO: Using resolv.conf: search cattle-system.svc.cluster.local svc.cluster.local cluster.local prodemge.gov.br d02.prodemge.gov.br nameserver 10.43.0.10 options ndots:5
INFO: https://mysite/ping is accessible
INFO: labrancher.prodemge.gov.br resolves to 1.2.3.4
INFO: Value from https://mysite/v3/settings/cacerts is an x509 certificate
time="2023-11-14T21:02:27Z" level=info msg="Listening on /tmp/log.sock"
time="2023-11-14T21:02:27Z" level=info msg="Rancher agent version v2.7.9 is starting"
time="2023-11-14T21:02:27Z" level=fatal msg="looking up cattle-system/cattle ca/token: failed to find service account cattle-system/cattle: Get \"<https://10.43.0.1:443/api/v1/namespaces/cattle-system/serviceaccounts/cattle>\": Service Unavailable"
Seems like a certificate problem. I'm using a self-signed certificate... the issue is probably there, right?
c
Copy code
Get \"<https://10.43.0.1:443/api/v1/namespaces/cattle-system/serviceaccounts/cattle>\": Service Unavailable"
That’s unusual
can you
kubectl get serviceaccount -n cattle-system
?
b
Copy code
NAME      SECRETS   AGE
cattle    0         40h
default   0         40h
there we go...
This time I have no idea what I'm looking for.
I'm really out of my depth here. Do you have any general advice I should follow for the issue I'm facing?