ancient-energy-15842
05/16/2023, 8:38 PMAgentDeployed False
and it seems that my control plane is broken, the thing is, that I cant create more control plane nodes because they get stuck with Waiting for registering with kubernetes
, Rancher version is 2.7.3 and Kubernetes version is 1.24.13
SSHing into those machines I just see 2 containers with docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
ed435976827f rancher/rancher-agent:v2.7.3 "run.sh --no-registe…" 10 minutes ago Exited (0) 10 minutes ago share-mnt
44860d74081d rancher/rancher-agent:v2.7.3 "run.sh --server htt…" 10 minutes ago Up 10 minutes admiring_yalow
Taking a look into the logs of the rancher-agent
that it's still running I get
INFO: Arguments: --server <https://REDACTED> --token REDACTED -r -n m-644rr
INFO: Environment: CATTLE_ADDRESS=172.31.37.252 CATTLE_AGENT_CONNECT=true CATTLE_INTERNAL_ADDRESS= CATTLE_NODE_NAME=m-644rr CATTLE_SERVER=<https://REDACTED> CATTLE_TOKEN=REDACTED
INFO: Using resolv.conf: nameserver 127.0.0.53 options edns0 trust-ad search us-east-2.compute.internal
WARN: Loopback address found in /etc/resolv.conf, please refer to the documentation how to configure your cluster to resolve DNS properly
INFO: <https://REDACTED/ping> is accessible
INFO: REDACTED resolves to REDACTED
time="2023-05-16T20:33:41Z" level=info msg="Listening on /tmp/log.sock"
time="2023-05-16T20:33:41Z" level=info msg="Rancher agent version v2.7.3 is starting"
time="2023-05-16T20:33:41Z" level=info msg="Option worker=false"
time="2023-05-16T20:33:41Z" level=info msg="Option requestedHostname=m-644rr"
time="2023-05-16T20:33:41Z" level=info msg="Option dockerInfo={PFIV:LBRE:AMMF:JNWU:4XQZ:GPJN:4FSD:4O6A:U336:VFBT:7WVD:AKOS 2 1 0 1 1 overlay2 [[Backing Filesystem extfs] [Supports d_type true] [Native Overlay Diff true] [userxattr false]] [] {[local] [bridge host ipvlan macvlan null overlay] [] [awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog]} true true false false true true true true true true true true false 32 false 39 2023-05-16T20:33:41.064129161Z json-file systemd 2 0 5.19.0-1025-aws Ubuntu 22.04.2 LTS 22.04 linux x86_64 <https://index.docker.io/v1/> 0xc0011f0a10 4 16629444608 [] /var/lib/docker control-plane-5 [provider=amazonec2] false 20.10.23 map[io.containerd.runc.v2:{runc [] <nil>} io.containerd.runtime.v1.linux:{runc [] <nil>} runc:{runc [] <nil>}] runc { inactive false [] 0 0 <nil> []} false docker-init {3dce8eb055cbb6872793272b4f20ed16117344f8 3dce8eb055cbb6872793272b4f20ed16117344f8} {v1.1.7-0-g860f061 v1.1.7-0-g860f061} {de40ad0 de40ad0} [name=apparmor name=seccomp,profile=default name=cgroupns] [] []}"
time="2023-05-16T20:33:41Z" level=info msg="Option customConfig=map[address:172.31.37.252 internalAddress: label:map[] roles:[] taints:[]]"
time="2023-05-16T20:33:41Z" level=info msg="Option etcd=false"
time="2023-05-16T20:33:41Z" level=info msg="Option controlPlane=false"
time="2023-05-16T20:33:41Z" level=info msg="Connecting to <wss://REDACTED/v3/connect> with token starting with c5brthtz9nwjwnmrqr5spckpw45"
time="2023-05-16T20:33:41Z" level=info msg="Connecting to proxy" url="<wss://REDACTED/v3/connect>"
time="2023-05-16T20:33:41Z" level=info msg="Requesting kubelet certificate regeneration"
time="2023-05-16T20:33:41Z" level=info msg="Starting plan monitor, checking every 120 seconds"
time="2023-05-16T20:35:41Z" level=info msg="Requesting kubelet certificate regeneration"
Any ideas why they get stuck at Requesting kubelet certificate regeneration
?2023/05/16 21:32:07 [ERROR] Failed to handle tunnel request from remote address :40636: response 400: <http://nodes.management.cattle.io|nodes.management.cattle.io> "c-x8lmc/m-308799fb7fde" not found
2023/05/16 21:32:07 [ERROR] Failed to handle tunnel request from remote address :39390: response 400: <http://nodes.management.cattle.io|nodes.management.cattle.io> "c-x8lmc/m-8e45b2978781" not found
powerful-branch-76072
05/17/2023, 1:49 AMancient-energy-15842
05/17/2023, 1:50 AMpowerful-branch-76072
05/17/2023, 1:59 AMancient-energy-15842
05/17/2023, 2:00 AMpowerful-branch-76072
05/17/2023, 2:18 AMancient-energy-15842
05/17/2023, 2:23 AMpowerful-branch-76072
05/17/2023, 2:24 AMancient-energy-15842
05/17/2023, 2:25 AMpowerful-branch-76072
05/17/2023, 2:27 AMRequesting kubelet certificate regeneration"
msg="Connecting to proxy" url="<wss://REDACTED/v3/connect>"
you may have a look if REDCATED is dns resolved and reachable? if not, it mean websocket to it will fail to connectancient-energy-15842
05/17/2023, 2:37 AM2023/05/16 21:32:07 [ERROR] Failed to handle tunnel request from remote address :40636: response 400: <http://nodes.management.cattle.io|nodes.management.cattle.io> "c-x8lmc/m-308799fb7fde" not found
that IP is from one of those control plane nodes