rich-crowd-36987

10/17/2022, 4:42 PM
Follow-on question to my attempts to move nodes to different AWS AZs... Tried doing this with another stack and am having less luck! The original cluster had 3 master nodes; currently 1 of those 3 is functional. On the first master node, if I remove the /var/lib/rancher/rke2 directory and relaunch rke2 server, it appears to create an entirely new cluster (the process starts successfully, but kubectl get nodes only returns itself). On the second master node, after removing the dir and relaunching the service, it just shows this loop in the logs:
Oct 17 16:37:09 k8mst02.espc-nostromo.nos-amc.io rke2[32216]: time="2022-10-17T16:37:09Z" level=info msg="Failed to test data store connection: this server has not yet been promoted from learner to voting member"
Oct 17 16:37:10 k8mst02.espc-nostromo.nos-amc.io rke2[32216]: time="2022-10-17T16:37:10Z" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:9345/v1-rke2/readyz: 500 Internal Server Error"
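(For reference, that "learner" state can be inspected from etcd on the working master. The sketch below assumes a stock RKE2 install: the cert paths and the 2379 endpoint are the defaults, and etcdctl itself is not shipped with RKE2, so it has to come from a downloaded binary or an exec into the etcd static pod.)
# Sketch: list etcd members and their learner status from the healthy master.
# Paths assume the default RKE2 data dir /var/lib/rancher/rke2.
ETCDCTL_API=3 etcdctl \
  --endpoints https://127.0.0.1:2379 \
  --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt \
  --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt \
  --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key \
  member list -w table
# The IS LEARNER column shows which member is still stuck as a learner.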
On Master 1, if I keep the original /var/lib/rancher/rke2 directory, I get the following panic:
Oct 17 18:05:01 k8mst01.espc-nostromo.nos-amc.io rke2[27972]: {"level":"info","ts":"2022-10-17T18:05:01.589Z","caller":"rafthttp/pipeline.go:86","msg":"stopped HTTP pipelining with remote peer","local-member-id":"fd35c5c2c37ef2ee","remote-peer-id":"42626d1a170fc6f5"}
Oct 17 18:05:01 k8mst01.espc-nostromo.nos-amc.io rke2[27972]: {"level":"info","ts":"2022-10-17T18:05:01.589Z","caller":"rafthttp/stream.go:459","msg":"stopped stream reader with remote peer","stream-reader-type":"stream MsgApp v2","local-member-id":"fd35c5c2c37ef2ee","remote-peer-id":"42626d1a170fc6f5"}
Oct 17 18:05:01 k8mst01.espc-nostromo.nos-amc.io rke2[27972]: {"level":"info","ts":"2022-10-17T18:05:01.589Z","caller":"rafthttp/stream.go:459","msg":"stopped stream reader with remote peer","stream-reader-type":"stream Message","local-member-id":"fd35c5c2c37ef2ee","remote-peer-id":"42626d1a170fc6f5"}
Oct 17 18:05:01 k8mst01.espc-nostromo.nos-amc.io rke2[27972]: {"level":"info","ts":"2022-10-17T18:05:01.589Z","caller":"rafthttp/peer.go:340","msg":"stopped remote peer","remote-peer-id":"42626d1a170fc6f5"}
Oct 17 18:05:01 k8mst01.espc-nostromo.nos-amc.io rke2[27972]: {"level":"info","ts":"2022-10-17T18:05:01.589Z","caller":"rafthttp/transport.go:369","msg":"removed remote peer","local-member-id":"fd35c5c2c37ef2ee","removed-remote-peer-id":"42626d1a170fc6f5"}
Oct 17 18:05:01 k8mst01.espc-nostromo.nos-amc.io rke2[27972]: panic: removed all voters
Oct 17 18:05:01 k8mst01.espc-nostromo.nos-amc.io rke2[27972]: goroutine 206 [running]:
Oct 17 18:05:01 k8mst01.espc-nostromo.nos-amc.io rke2[27972]: go.etcd.io/etcd/raft.(*raft).applyConfChange(0xc0001c0500, 0x0, 0xc00091dbf0, 0x1, 0x1, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
Oct 17 18:05:01 k8mst01.espc-nostromo.nos-amc.io rke2[27972]: /go/pkg/mod/github.com/k3s-io/etcd@v0.5.0-alpha.5.0.20220113195313-6c2233a709e8/raft/raft.go:1514 +0x225
Oct 17 18:05:01 k8mst01.espc-nostromo.nos-amc.io rke2[27972]: go.etcd.io/etcd/raft.(*node).run(0xc0014440c0)
Oct 17 18:05:01 k8mst01.espc-nostromo.nos-amc.io rke2[27972]: /go/pkg/mod/github.com/k3s-io/etcd@v0.5.0-alpha.5.0.20220113195313-6c2233a709e8/raft/node.go:356 +0x845
Oct 17 18:05:01 k8mst01.espc-nostromo.nos-amc.io rke2[27972]: created by go.etcd.io/etcd/raft.RestartNode
Oct 17 18:05:01 k8mst01.espc-nostromo.nos-amc.io rke2[27972]: /go/pkg/mod/github.com/k3s-io/etcd@v0.5.0-alpha.5.0.20220113195313-6c2233a709e8/raft/node.go:240 +0x330
Oct 17 18:05:01 k8mst01.espc-nostromo.nos-amc.io systemd[1]: rke2-server.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Ultimately got this working. Not sure what the extra difficulty was this time, but I had to use etcdctl to manually promote the learner node, and on my first master node I also had to include the server: config option (and eventually promote it too when it tried to join the existing cluster). But here I am now with three master nodes again, so hooray.
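Roughly, the fix was in two parts; the sketch below assumes a stock RKE2 install (default data dir, supervisor port 9345, node-token in the usual place) and uses placeholders for anything cluster-specific. Again, etcdctl is not bundled with RKE2, and the member ID comes from member list.
# 1) On the surviving master: promote the stuck learner member.
#    <LEARNER_MEMBER_ID> is a placeholder; take the real ID from "member list".
ETCDCTL_API=3 etcdctl \
  --endpoints https://127.0.0.1:2379 \
  --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt \
  --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt \
  --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key \
  member promote <LEARNER_MEMBER_ID>

# 2) On master 1: point it at the existing cluster instead of letting it
#    bootstrap a new one, by adding server: (and the join token) to
#    /etc/rancher/rke2/config.yaml. <working-master> is a placeholder for the
#    healthy server's hostname.
cat >> /etc/rancher/rke2/config.yaml <<'EOF'
server: https://<working-master>:9345
token: <contents of /var/lib/rancher/rke2/server/node-token on the working master>
EOF
systemctl restart rke2-server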