# general
o
the original etcd + control plane + worker node started spewing errors
msg="Imported images from /var/lib/rancher/rke2/agent/images/runtime-image.txt in 3.694719ms"
Oct 30 04:46:43 node02-shadow rke2[15755]: time="2024-10-30T04:46:43-04:00" level=info msg="Connecting to proxy" url="<wss://127.0.0.1:9345/v1-rke2/connect>"
Oct 30 04:46:43 node02-shadow rke2[15755]: time="2024-10-30T04:46:43-04:00" level=info msg="Creating rke2-cert-monitor event broadcaster"
Oct 30 04:46:43 node02-shadow rke2[15755]: time="2024-10-30T04:46:43-04:00" level=info msg="Running kubelet --address=0.0.0.0 --alsologtostderr=false --anonymous-auth=false --authentication-token-webhook=true --auth>
Oct 30 04:46:43 node02-shadow rke2[15755]: time="2024-10-30T04:46:43-04:00" level=info msg="Handling backend connection request [node02-shadow]"
Oct 30 04:46:43 node02-shadow rke2[15755]: time="2024-10-30T04:46:43-04:00" level=info msg="Remotedialer connected to proxy" url="<wss://127.0.0.1:9345/v1-rke2/connect>"
Oct 30 04:46:43 node02-shadow rke2[15755]: time="2024-10-30T04:46:43-04:00" level=error msg="Sending HTTP 503 response to 127.0.0.1:39188: runtime core not ready"
Oct 30 04:46:44 node02-shadow rke2[15755]: time="2024-10-30T04:46:44-04:00" level=info msg="Running kube-proxy --cluster-cidr=10.42.0.0/16 --conntrack-max-per-core=0 --conntrack-tcp-timeout-close-wait=0s --conntrack>
Oct 30 04:46:56 node02-shadow rke2[15755]: time="2024-10-30T04:46:56-04:00" level=info msg="Pod for etcd is synced"
Oct 30 04:47:06 node02-shadow rke2[15755]: {"level":"warn","ts":"2024-10-30T04:47:06.160593-0400","logger":"etcd-client","caller":"v3@v3.5.13-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","t>
Oct 30 04:47:06 node02-shadow rke2[15755]: time="2024-10-30T04:47:06-04:00" level=info msg="Failed to test data store connection: context deadline exceeded"
Oct 30 04:47:06 node02-shadow rke2[15755]: time="2024-10-30T04:47:06-04:00" level=info msg="Waiting for etcd server to become available"
Oct 30 04:47:06 node02-shadow rke2[15755]: time="2024-10-30T04:47:06-04:00" level=info msg="Waiting for API server to become available"
Oct 30 04:47:13 node02-shadow rke2[15755]: {"level":"warn","ts":"2024-10-30T04:47:13.490669-0400","logger":"etcd-client","caller":"v3@v3.5.13-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","t>
Oct 30 04:47:13 node02-shadow rke2[15755]: time="2024-10-30T04:47:13-04:00" level=error msg="Failed to check local etcd status for learner management: context deadline exceeded"
Oct 30 04:47:28 node02-shadow rke2[15755]: {"level":"warn","ts":"2024-10-30T04:47:28.49116-0400","logger":"etcd-client","caller":"v3@v3.5.13-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","ta>
Oct 30 04:47:28 node02-shadow rke2[15755]: time="2024-10-30T04:47:28-04:00" level=error msg="Failed to check local etcd status for learner management: context deadline exceeded"
Oct 30 04:47:36 node02-shadow rke2[15755]: time="2024-10-30T04:47:36-04:00" level=info msg="Waiting for etcd server to become available"
Oct 30 04:47:36 node02-shadow rke2[15755]: time="2024-10-30T04:47:36-04:00" level=info msg="Waiting for API server to become available"
Oct 30 04:47:41 node02-shadow rke2[15755]: {"level":"warn","ts":"2024-10-30T04:47:41.161304-0400","logger":"etcd-client","caller":"v3@v3.5.13-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","t>
Oct 30 04:47:41 node02-shadow rke2[15755]: time="2024-10-30T04:47:41-04:00" level=info msg="Failed to test data store connection: context deadline exceeded"
Oct 30 04:47:43 node02-shadow rke2[15755]: {"level":"warn","ts":"2024-10-30T04:47:43.491726-0400","logger":"etcd-client","caller":"v3@v3.5.13-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","t>
Oct 30 04:47:43 node02-shadow rke2[15755]: time="2024-10-30T04:47:43-04:00" level=error msg="Failed to check local etcd status for learner management: context deadline exceeded"
Oct 30 04:47:58 node02-shadow rke2[15755]: {"level":"warn","ts":"2024-10-30T04:47:58.492275-0400","logger":"etcd-client","caller":"v3@v3.5.13-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","t>
Oct 30 04:47:58 node02-shadow rke2[15755]: time="2024-10-30T04:47:58-04:00" level=error msg="Failed to check local etcd status for learner management: context deadline exceeded"
Oct 30 04:48:06 node02-shadow rke2[15755]: time="2024-10-30T04:48:06-04:00" level=info m
c
check the etcd pod logs. Are you sure the two etcd nodes are able to talk to each other?
also note that 2 etcd nodes are worse than 1. You should always have an odd number, since etcd needs a majority for quorum and losing either of 2 nodes loses it.
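(For reference, a minimal sketch of how you could check those etcd pod logs and member health on an RKE2 server node. The kubectl/crictl binaries, kubeconfig and cert paths below are the usual RKE2 defaults and may differ on your install; the container ID is a placeholder.)

# view the etcd static pod logs through the API server, if it's still reachable
export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
/var/lib/rancher/rke2/bin/kubectl -n kube-system logs "etcd-$(hostname)" --tail=100

# or go straight to containerd if the API server is down
export CRI_CONFIG_FILE=/var/lib/rancher/rke2/agent/etc/crictl.yaml
/var/lib/rancher/rke2/bin/crictl ps --name etcd
/var/lib/rancher/rke2/bin/crictl logs --tail=100 <etcd-container-id>

# check member health with the cluster's own client certs, from inside the etcd container
/var/lib/rancher/rke2/bin/crictl exec <etcd-container-id> etcdctl \
  --cacert=/var/lib/rancher/rke2/server/tls/etcd/server-ca.crt \
  --cert=/var/lib/rancher/rke2/server/tls/etcd/server-client.crt \
  --key=/var/lib/rancher/rke2/server/tls/etcd/server-client.key \
  endpoint health --cluster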
o
ah yeah, they are, but the new node logs just keep printing this
"msg":"rejected connection","remote-addr":"192.168.1.69:51838","server-name":"","error":"remote error: tls: bad certificate"}
{"level":"warn","ts":"2024-10-30T09:12:24.034721Z","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"192.168.1.69:51850","server-name":"","error":"remote error: tls: bad certificate"}
{"level":"warn","ts":"2024-10-30T09:12:24.035878Z","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"192.168.1.69:51848","server-name":"","error":"remote error: tls: bad certificate"}
{"level":"warn","ts":"2024-10-30T09:12:24.13299Z","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"192.168.1.69:51862","server-name":"","error":"remote error: tls: bad certificate"}
{"level":"warn","ts":"2024-10-30T09:12:24.135306Z","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"192.168.1.69:51870","server-name":"","error":"remote error: tls: bad certificate"}
{"level":"warn","ts":"2024-10-30T09:12:24.23633Z","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"192.168.1.69:51880","server-name":"","error":"remote error: tls: bad certificate"}
{"level":"warn","ts":"2024-10-30T09:12:24.238307Z","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"192.168.1.69:51884","server-name":"","error":"remote error: tls: bad certificate"}
{"level":"warn","ts":"2024-10-30T09:12:24.33889Z","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"192.168.1.69:51900","server-name":"","error":"remote error: tls: bad certificate"}
{"level":"warn","ts":"2024-10-30T09:12:24.339094Z","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"192.168.1.69:51896","server-name":"","error":"remote error: tls: bad certificate"}
{"level":"warn","ts":"2024-10-30T09:12:24.43692Z","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"192.168.1.69:51912","server-name":"","error":"remote error: tls: bad certificate"}
{"level":"warn","ts":"2024-10-30T09:12:24.439189Z","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"192.168.1.69:51920","server-name":"","error":"remote error: tls: bad certificate"}
{"level":"warn","ts":"2024-10-30T09:12:24.533255Z","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"192.168.1.69:51934","server-name":"","error":"remote error: tls: bad certificate"}
{"level":"warn","ts":"2024-10-30T09:12:24.534601Z","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"192.168.1.69:51936","server-name":"","error":"remote error: tls: bad certificate"}
{"level":"warn","ts":"2024-10-30T09:12:24.633556Z","caller":"embed/config_logging.go:169","msg":"re
c
What process exactly did you follow to add the new node? Looks like you messed up the certs somehow.
o
i installed rke2 on a new node and then ran the registration command
afaict, that was all i did
don't believe i touched anything else
ideally, don't want to rebuild the cluster to get everything back up
c
Is this a Rancher-provisioned cluster?
You’re not supposed to install and start RKE2 first, THEN run the registration command. Registering a node with Rancher will install and start RKE2 for you, using the correct configuration.
The errors are because you probably just brought that node up as a standalone cluster with a totally different configuration, and then when the registration command ran it tried to join it to the other cluster - which won’t work.
Wipe that node, and start over from scratch. Don’t install and start RKE2 before running the registration command from Rancher.
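(A minimal sketch of what wiping the new node could look like, assuming the standard RKE2 install scripts are present; script locations differ between tarball and RPM installs.)

# on the NEW node only, never on the existing server
rke2-killall.sh       # stops rke2 and all of its child processes/containers
rke2-uninstall.sh     # removes the binaries, services and /var/lib/rancher/rke2 state
# then paste and run the registration command from the Rancher UI on the clean node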
o
which node? the existing one or the new one
the existing node was the one trying to talk to the new one (the cert errors above)
yep rancher provisioned
c
yes. don’t do that. let Rancher do everything for you.
o
do i run it on the existing broken node?
since the cluster is now broken
i wiped the new node and only ran the registration command and that didn't seem to do anything
c
You may have got it stuck now. See if you can delete the node from Rancher. You might need to restore to an etcd snapshot taken before you added that node, so that the etcd cluster doesn’t think it’s lost quorum due to only having 1 of 2 nodes.
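(For the snapshot route: Rancher-provisioned clusters can usually be restored from the cluster's snapshot list in the Rancher UI, which is the preferred path. If you have to do it by hand on the surviving server, the documented cluster-reset flow looks roughly like this; the snapshot name is a placeholder.)

systemctl stop rke2-server
rke2 server \
  --cluster-reset \
  --cluster-reset-restore-path=/var/lib/rancher/rke2/server/db/snapshots/<snapshot-name>
# when it logs that the cluster reset is complete, start the service again
systemctl start rke2-server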
o
woot, that worked! thanks so much!