07/10/2022, 10:06 PM
Hey guys, I have a cluster with 3 master nodes. One of my offsite nodes, which is a master node, was offline for 2 weeks due to a hardware failure. I brought it back up, but K3s still hadn't started after an hour or so, so I deleted the etcd db files and manually removed the node from the etcd cluster. When I started it again, I could see the database size growing nicely as it synced the content. At around 600 MB out of the total 780 MB, etcd complains that the member was removed from the cluster and dies. I'm not sure what removes the member: could it be K3s timing out during startup and aborting the process?
I've since increased the heartbeat and election timeouts just in case, and ran etcd manually from the standalone binary using the k3s config. Whilst it does seem to complete the snapshot download, I still get "etcdserver: request timed out" non-stop when it attempts to publish the local member to the cluster ("failed to publish local member to cluster through raft"). I can't use etcdctl on this node either, so I'm not sure how to proceed from here, to be honest.
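For picking the heartbeat and election timeouts mentioned above, here is a minimal sketch, assuming the heuristics from the etcd tuning guide (heartbeat roughly equal to the worst round-trip time between members, election timeout at least 10x that RTT) plus etcd's own validation rules (election timeout at least 5x the heartbeat, at most 50000 ms). The helper names (`suggest_timing`, `to_config_yaml`) are hypothetical, and the output assumes the values are passed through k3s' repeatable `--etcd-arg` flag (or `etcd-arg:` in `/etc/rancher/k3s/config.yaml`).

```python
"""Rough etcd timing helper for a member on a high-latency (offsite) link.

Hypothetical sketch; heuristics assumed from the etcd tuning guide.
"""
from __future__ import annotations

from dataclasses import dataclass

MAX_ELECTION_MS = 50_000  # etcd rejects election timeouts above this


@dataclass
class EtcdTiming:
    heartbeat_ms: int
    election_ms: int

    def as_k3s_etcd_args(self) -> list[str]:
        # Values meant for k3s' --etcd-arg, which forwards them to etcd.
        return [
            f"heartbeat-interval={self.heartbeat_ms}",
            f"election-timeout={self.election_ms}",
        ]


def suggest_timing(rtt_ms: float) -> EtcdTiming:
    """Suggest timings from the worst observed RTT between members, in ms."""
    if rtt_ms <= 0:
        raise ValueError("rtt_ms must be positive")
    # Heartbeat around the RTT, never below etcd's 100 ms default.
    heartbeat = max(100, round(rtt_ms))
    # At least 10x RTT, at least 5x heartbeat, never below the 1000 ms default.
    election = max(1000, round(10 * rtt_ms), 5 * heartbeat)
    if election > MAX_ELECTION_MS:
        raise ValueError(f"election timeout {election} ms exceeds {MAX_ELECTION_MS} ms")
    return EtcdTiming(heartbeat, election)


def to_config_yaml(timing: EtcdTiming) -> str:
    # Assumed shape of the k3s config.yaml list for etcd-arg.
    lines = ["etcd-arg:"]
    lines += [f'  - "{arg}"' for arg in timing.as_k3s_etcd_args()]
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_config_yaml(suggest_timing(250)))
```

With a worst-case RTT of 250 ms this suggests a 250 ms heartbeat and a 2500 ms election timeout. The etcd docs recommend using the same values on every member, so the same `etcd-arg` entries would go on all three masters, not just the offsite one.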
Scrapped the idea and moved the node to the same site as the others.