# general
o
the original etcd + control plane + worker node started spewing errors
msg="Imported images from /var/lib/rancher/rke2/agent/images/runtime-image.txt in 3.694719ms"
Oct 30 04:46:43 node02-shadow rke2[15755]: time="2024-10-30T04:46:43-04:00" level=info msg="Connecting to proxy" url="<wss://127.0.0.1:9345/v1-rke2/connect>"
Oct 30 04:46:43 node02-shadow rke2[15755]: time="2024-10-30T04:46:43-04:00" level=info msg="Creating rke2-cert-monitor event broadcaster"
Oct 30 04:46:43 node02-shadow rke2[15755]: time="2024-10-30T04:46:43-04:00" level=info msg="Running kubelet --address=0.0.0.0 --alsologtostderr=false --anonymous-auth=false --authentication-token-webhook=true --auth>
Oct 30 04:46:43 node02-shadow rke2[15755]: time="2024-10-30T04:46:43-04:00" level=info msg="Handling backend connection request [node02-shadow]"
Oct 30 04:46:43 node02-shadow rke2[15755]: time="2024-10-30T04:46:43-04:00" level=info msg="Remotedialer connected to proxy" url="<wss://127.0.0.1:9345/v1-rke2/connect>"
Oct 30 04:46:43 node02-shadow rke2[15755]: time="2024-10-30T04:46:43-04:00" level=error msg="Sending HTTP 503 response to 127.0.0.1:39188: runtime core not ready"
Oct 30 04:46:44 node02-shadow rke2[15755]: time="2024-10-30T04:46:44-04:00" level=info msg="Running kube-proxy --cluster-cidr=10.42.0.0/16 --conntrack-max-per-core=0 --conntrack-tcp-timeout-close-wait=0s --conntrack>
Oct 30 04:46:56 node02-shadow rke2[15755]: time="2024-10-30T04:46:56-04:00" level=info msg="Pod for etcd is synced"
Oct 30 04:47:06 node02-shadow rke2[15755]: {"level":"warn","ts":"2024-10-30T04:47:06.160593-0400","logger":"etcd-client","caller":"v3@v3.5.13-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","t>
Oct 30 04:47:06 node02-shadow rke2[15755]: time="2024-10-30T04:47:06-04:00" level=info msg="Failed to test data store connection: context deadline exceeded"
Oct 30 04:47:06 node02-shadow rke2[15755]: time="2024-10-30T04:47:06-04:00" level=info msg="Waiting for etcd server to become available"
Oct 30 04:47:06 node02-shadow rke2[15755]: time="2024-10-30T04:47:06-04:00" level=info msg="Waiting for API server to become available"
Oct 30 04:47:13 node02-shadow rke2[15755]: {"level":"warn","ts":"2024-10-30T04:47:13.490669-0400","logger":"etcd-client","caller":"v3@v3.5.13-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","t>
Oct 30 04:47:13 node02-shadow rke2[15755]: time="2024-10-30T04:47:13-04:00" level=error msg="Failed to check local etcd status for learner management: context deadline exceeded"
Oct 30 04:47:28 node02-shadow rke2[15755]: {"level":"warn","ts":"2024-10-30T04:47:28.49116-0400","logger":"etcd-client","caller":"v3@v3.5.13-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","ta>
Oct 30 04:47:28 node02-shadow rke2[15755]: time="2024-10-30T04:47:28-04:00" level=error msg="Failed to check local etcd status for learner management: context deadline exceeded"
Oct 30 04:47:36 node02-shadow rke2[15755]: time="2024-10-30T04:47:36-04:00" level=info msg="Waiting for etcd server to become available"
Oct 30 04:47:36 node02-shadow rke2[15755]: time="2024-10-30T04:47:36-04:00" level=info msg="Waiting for API server to become available"
Oct 30 04:47:41 node02-shadow rke2[15755]: {"level":"warn","ts":"2024-10-30T04:47:41.161304-0400","logger":"etcd-client","caller":"v3@v3.5.13-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","t>
Oct 30 04:47:41 node02-shadow rke2[15755]: time="2024-10-30T04:47:41-04:00" level=info msg="Failed to test data store connection: context deadline exceeded"
Oct 30 04:47:43 node02-shadow rke2[15755]: {"level":"warn","ts":"2024-10-30T04:47:43.491726-0400","logger":"etcd-client","caller":"v3@v3.5.13-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","t>
Oct 30 04:47:43 node02-shadow rke2[15755]: time="2024-10-30T04:47:43-04:00" level=error msg="Failed to check local etcd status for learner management: context deadline exceeded"
Oct 30 04:47:58 node02-shadow rke2[15755]: {"level":"warn","ts":"2024-10-30T04:47:58.492275-0400","logger":"etcd-client","caller":"v3@v3.5.13-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","t>
Oct 30 04:47:58 node02-shadow rke2[15755]: time="2024-10-30T04:47:58-04:00" level=error msg="Failed to check local etcd status for learner management: context deadline exceeded"
Oct 30 04:48:06 node02-shadow rke2[15755]: time="2024-10-30T04:48:06-04:00" level=info m
c
check the etcd pod logs. Are you sure the two etcd nodes are able to talk to each other?
also note that 2 etcd nodes are worse than 1. You should always have an odd number, since etcd needs a majority for quorum and losing either of 2 nodes loses it.
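(For reference, a minimal sketch of how you could check those etcd pod logs and member health on an RKE2 server node. The kubectl/crictl binaries, kubeconfig and cert paths below are the usual RKE2 defaults and may differ on your install; the container ID is a placeholder.)

# view the etcd static pod logs through the API server, if it's still reachable
export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
/var/lib/rancher/rke2/bin/kubectl -n kube-system logs "etcd-$(hostname)" --tail=100

# or go straight to containerd if the API server is down
export CRI_CONFIG_FILE=/var/lib/rancher/rke2/agent/etc/crictl.yaml
/var/lib/rancher/rke2/bin/crictl ps --name etcd
/var/lib/rancher/rke2/bin/crictl logs --tail=100 <etcd-container-id>

# check member health with the cluster's own client certs, from inside the etcd container
/var/lib/rancher/rke2/bin/crictl exec <etcd-container-id> etcdctl \
  --cacert=/var/lib/rancher/rke2/server/tls/etcd/server-ca.crt \
  --cert=/var/lib/rancher/rke2/server/tls/etcd/server-client.crt \
  --key=/var/lib/rancher/rke2/server/tls/etcd/server-client.key \
  endpoint health --cluster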
o
ah yeah, they are, but the new node logs just keep printing this
"msg":"rejected connection","remote-addr":"192.168.1.69:51838","server-name":"","error":"remote error: tls: bad certificate"}
{"level":"warn","ts":"2024-10-30T09:12:24.034721Z","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"192.168.1.69:51850","server-name":"","error":"remote error: tls: bad certificate"}
{"level":"warn","ts":"2024-10-30T09:12:24.035878Z","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"192.168.1.69:51848","server-name":"","error":"remote error: tls: bad certificate"}
{"level":"warn","ts":"2024-10-30T09:12:24.13299Z","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"192.168.1.69:51862","server-name":"","error":"remote error: tls: bad certificate"}
{"level":"warn","ts":"2024-10-30T09:12:24.135306Z","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"192.168.1.69:51870","server-name":"","error":"remote error: tls: bad certificate"}
{"level":"warn","ts":"2024-10-30T09:12:24.23633Z","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"192.168.1.69:51880","server-name":"","error":"remote error: tls: bad certificate"}
{"level":"warn","ts":"2024-10-30T09:12:24.238307Z","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"192.168.1.69:51884","server-name":"","error":"remote error: tls: bad certificate"}
{"level":"warn","ts":"2024-10-30T09:12:24.33889Z","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"192.168.1.69:51900","server-name":"","error":"remote error: tls: bad certificate"}
{"level":"warn","ts":"2024-10-30T09:12:24.339094Z","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"192.168.1.69:51896","server-name":"","error":"remote error: tls: bad certificate"}
{"level":"warn","ts":"2024-10-30T09:12:24.43692Z","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"192.168.1.69:51912","server-name":"","error":"remote error: tls: bad certificate"}
{"level":"warn","ts":"2024-10-30T09:12:24.439189Z","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"192.168.1.69:51920","server-name":"","error":"remote error: tls: bad certificate"}
{"level":"warn","ts":"2024-10-30T09:12:24.533255Z","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"192.168.1.69:51934","server-name":"","error":"remote error: tls: bad certificate"}
{"level":"warn","ts":"2024-10-30T09:12:24.534601Z","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"192.168.1.69:51936","server-name":"","error":"remote error: tls: bad certificate"}
{"level":"warn","ts":"2024-10-30T09:12:24.633556Z","caller":"embed/config_logging.go:169","msg":"re
c
What process exactly did you follow to add the new node? Looks like you messed up the certs somehow.
o
i installed rke2 on a new node and then ran the registration command
afaict, that was all i did
don't believe i touched anything else
ideally, don't want to rebuild the cluster to get everything back up
c
Is this a Rancher-provisioned cluster?
You’re not supposed to install and start RKE2 first, THEN run the registration command. Registering a node with Rancher will install and start RKE2 for you, using the correct configuration.
The errors are because you probably just brought that node up as a standalone cluster with a totally different configuration, and then when the registration command ran it tried to join it to the other cluster - which won’t work.
Wipe that node, and start over from scratch. Don’t install and start RKE2 before running the registration command from Rancher.
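(A minimal sketch of what wiping the new node could look like, assuming the standard RKE2 install scripts are present; script locations differ between tarball and RPM installs.)

# on the NEW node only, never on the existing server
rke2-killall.sh       # stops rke2 and all of its child processes/containers
rke2-uninstall.sh     # removes the binaries, services and /var/lib/rancher/rke2 state
# then paste and run the registration command from the Rancher UI on the clean node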
o
which node? the existing one or the new one
the existing node was the one trying to talk to the new one (the cert errors above)
yep rancher provisioned
c
yes. don’t do that. let Rancher do everything for you.
o
do i run it on the existing broken node?
since the cluster is now broken
i wiped the new node and only ran the registration command and that didn't seem to do anything
c
You may have got it stuck now. See if you can delete the node from Rancher. You might need to restore to an etcd snapshot taken before you added that node, so that the etcd cluster doesn’t think it’s lost quorum due to only having 1 of 2 nodes.
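(For the snapshot route: Rancher-provisioned clusters can usually be restored from the cluster's snapshot list in the Rancher UI, which is the preferred path. If you have to do it by hand on the surviving server, the documented cluster-reset flow looks roughly like this; the snapshot name is a placeholder.)

systemctl stop rke2-server
rke2 server \
  --cluster-reset \
  --cluster-reset-restore-path=/var/lib/rancher/rke2/server/db/snapshots/<snapshot-name>
# when it logs that the cluster reset is complete, start the service again
systemctl start rke2-server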
o
woot, that worked! thanks so much!