wonderful-terabyte-28236
08/15/2023, 7:35 PM
kube-controller-manager is set to that node and is currently deleted. Is there any way to change the kube-controller-manager, or fix the etcd issue, so we can redeploy this node? Basically our cluster is stuck waiting to provision a node that doesn't exist, and when we try to create it, it fails.
$ kubectl describe endpoints kube-controller-manager -n kube-system
Name: kube-controller-manager
Namespace: kube-system
Labels: <none>
Annotations: control-plane.alpha.kubernetes.io/leader:
{"holderIdentity":"control-1","leaseDurationSeconds":15,"acquireTime":"2022-01-11T16:00:56Z...
Subsets:
Events: <none>
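A minimal sketch of how the leader annotation above could be checked against the nodes that still exist. The annotation in the original output is truncated, so the sample string keeps only the fields that are visible; the `live_nodes` set and the `stale_leader` helper are hypothetical and only illustrate the comparison.

```python
import json

# Annotation value as printed by kubectl. The original output is truncated,
# so only the visible fields are reproduced here.
annotation = (
    '{"holderIdentity":"control-1",'
    '"leaseDurationSeconds":15,'
    '"acquireTime":"2022-01-11T16:00:56Z"}'
)


def stale_leader(annotation_json, live_nodes):
    """Return the lease holder if it is not one of the live nodes, else None."""
    holder = json.loads(annotation_json).get("holderIdentity", "")
    return holder if holder not in live_nodes else None


# Hypothetical node names, e.g. copied from `kubectl get nodes`.
live_nodes = {"control-2", "control-3"}
print(stale_leader(annotation, live_nodes))
```

If the holder is reported as stale, that only confirms the symptom described in the message; it does not by itself say how the lease should be cleared.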
wonderful-terabyte-28236
08/16/2023, 8:33 PM
hallowed-carpet-38238
08/17/2023, 1:25 PM
hallowed-carpet-38238
08/17/2023, 1:25 PM
gorgeous-pizza-36569
08/17/2023, 4:02 PM
wonderful-terabyte-28236
08/17/2023, 4:04 PM
INFO: Arguments: --server https://rancher.nameofplatform.com --token REDACTED -r -n m-ccnpb
INFO: Environment: CATTLE_ADDRESS=10.217.43.18 CATTLE_AGENT_CONNECT=true CATTLE_INTERNAL_ADDRESS= CATTLE_NODE_NAME=m-ccnpb CATTLE_SERVER=https://rancher.nameofplatform.com CATTLE_TOKEN=REDACTED
INFO: Using resolv.conf: nameserver 127.0.0.53 options edns0 trust-ad search 3xjoa1i2lshejpt0ninj35opia.bx.internal.cloudapp.net
WARN: Loopback address found in /etc/resolv.conf, please refer to the documentation how to configure your cluster to resolve DNS properly
INFO: https://rancher.nameofplatform.com/ping is accessible
INFO: rancher.nameofplatform.com resolves to 10.217.42.5
time="2023-08-17T15:57:24Z" level=info msg="Listening on /tmp/log.sock"
time="2023-08-17T15:57:24Z" level=info msg="Rancher agent version v2.6.13 is starting"
time="2023-08-17T15:57:24Z" level=info msg="Option etcd=false"
time="2023-08-17T15:57:24Z" level=info msg="Option controlPlane=false"
time="2023-08-17T15:57:24Z" level=info msg="Option worker=false"
time="2023-08-17T15:57:24Z" level=info msg="Option requestedHostname=m-ccnpb"
time="2023-08-17T15:57:24Z" level=info msg="Option dockerInfo={FUN4:2SZJ:AXCV:IW6F:JUDO:MHCD:WIJT:QGAB:PAE4:2VWN:GZCI:SGDX 1 1 0 0 1 overlay2 [[Backing Filesystem extfs] [Supports d_type true] [Native Overlay Diff false] [userxattr false]] [] {[local] [bridge host ipvlan macvlan null overlay] [] [awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog]} true true false false true true true true true true true true false 32 false 40 2023-08-17T15:57:24.266091701Z json-file systemd 2 0 5.15.0-1041-azure Ubuntu 22.04.3 LTS 22.04 linux x86_64 https://index.docker.io/v1/ 0xc001296150 2 8324927488 [] /var/lib/docker rke-infra-control-4 [provider=azure] false 20.10.24 map[io.containerd.runc.v2:{runc [] <nil>} io.containerd.runtime.v1.linux:{runc [] <nil>} runc:{runc [] <nil>}] runc { inactive false [] 0 0 <nil> []} false docker-init {8165feabfdfe38c65b599c4993d227328c231fca 8165feabfdfe38c65b599c4993d227328c231fca} {v1.1.8-0-g82f18fe v1.1.8-0-g82f18fe} {de40ad0 de40ad0} [name=apparmor name=seccomp,profile=default name=cgroupns] [] []}"
time="2023-08-17T15:57:24Z" level=info msg="Option customConfig=map[address:10.217.43.18 internalAddress: label:map[] roles:[] taints:[]]"
time="2023-08-17T15:57:24Z" level=info msg="Connecting to wss://rancher.nameofplatform.com/v3/connect with token starting with 945w6gmwldmfpsadasda42knk6r"
time="2023-08-17T15:57:24Z" level=info msg="Connecting to proxy" url="wss://rancher.nameofplatform.com/v3/connect"
time="2023-08-17T15:57:24Z" level=info msg="Requesting kubelet certificate regeneration"
time="2023-08-17T15:57:24Z" level=info msg="Starting plan monitor, checking every 15 seconds"
time="2023-08-17T15:57:39Z" level=info msg="Requesting kubelet certificate regeneration"
time="2023-08-17T15:57:39Z" level=info msg="Plan monitor checking 120 seconds"
billowy-egg-51769
08/21/2023, 5:40 PM
ssh_key_path: /home/...
cluster_name: kubernetes
#kubernetes_version: 'v1.21.14-rancher1-1'
kubernetes_version: 'v1.24.16-rancher1-1'
enable_cri_dockerd: true
authorization:
  mode: rbac
#network:
#  plugin: calico
ingress:
  provider: none
services:
  kube-api:
    service_cluster_ip_range: 172.16.128.0/17
  kube_controller:
    cluster_cidr: 172.16.0.0/17
    service_cluster_ip_range: 172.16.128.0/17
  kubelet:
    cluster_dns_server: 172.16.128.10
nodes:
  - address: 10.75.144.169
    user: rke
    role:
      - etcd
      - controlplane
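As a quick sanity check on the address ranges in the config above, here is a small sketch using Python's standard `ipaddress` module. It only verifies that the pod and service ranges do not overlap and that the DNS address falls inside the service range; the `check_ranges` helper is hypothetical and not part of RKE.

```python
import ipaddress

# Values copied from the cluster config above.
cluster_cidr = ipaddress.ip_network("172.16.0.0/17")
service_range = ipaddress.ip_network("172.16.128.0/17")
dns_server = ipaddress.ip_address("172.16.128.10")


def check_ranges(pods, services, dns):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if pods.overlaps(services):
        problems.append(f"pod range {pods} overlaps service range {services}")
    if dns not in services:
        problems.append(f"DNS address {dns} is outside service range {services}")
    return problems


print(check_ranges(cluster_cidr, service_range, dns_server))
```

With these values both checks pass, so the ranges themselves look consistent; this says nothing about whether each key name is one RKE recognizes.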
wide-easter-7639
08/22/2023, 9:44 AM
wide-easter-7639
08/22/2023, 9:46 AM
rich-shoe-36510
08/24/2023, 9:49 AM
abundant-flag-93304
08/24/2023, 1:11 PM
wonderful-terabyte-28236
08/25/2023, 3:16 PM
wonderful-terabyte-28236
08/28/2023, 7:57 PM
gorgeous-pizza-36569
08/29/2023, 6:32 PM
many-zebra-31494
08/31/2023, 6:36 PM
/etc/rancher/rke2 but the registries.yaml file always ends up back to the default (oops wrong channel!)
bumpy-queen-38421
09/01/2023, 5:58 AM
echoing-tomato-53055
09/01/2023, 11:21 AM
late-beard-23404
09/01/2023, 12:50 PM
acoustic-addition-45641
09/01/2023, 1:00 PM
square-park-59193
09/04/2023, 9:23 AM
square-park-59193
09/04/2023, 9:24 AM
bumpy-queen-38421
09/04/2023, 5:09 PM
bored-nest-98612
09/07/2023, 12:32 PM
gifted-army-13924
09/11/2023, 11:42 AM
adventurous-iron-36662
09/15/2023, 7:38 AM
quiet-fountain-61995
09/15/2023, 3:19 PM
Requesting kubelet certificate regeneration
I have tried rebuilding the RKE cluster and re-installing the agent, but I still keep getting the same error. Any ideas on what I could be doing wrong? Note I'm creating the RKE cluster using the rke CLI and not through Rancher.
acoustic-country-78083
09/21/2023, 11:37 AM
prehistoric-gpu-28362
09/27/2023, 9:40 AM
many-zebra-31494
09/27/2023, 6:47 PM
--root-dir=/opt/rke/var/lib/kubelet whereas all the previous nodes are using --root-dir=/var/lib/kubelet. This is causing the AWS aws-ebs-csi-driver to not be able to run on the new nodes. Anyone have any thoughts on why/how this changed?
quaint-microphone-1347
09/27/2023, 11:07 PM