rich-crowd-36987
10/17/2022, 4:42 PM
If I remove the /var/lib/rancher/rke2 directory and relaunch rke2 server, it appears to create an entirely new cluster (the process starts successfully, but kubectl get nodes only returns itself).
On the second master node, after removing the dir and relaunching the service, it is just showing this loop in the logs:
Oct 17 16:37:09 k8mst02.espc-nostromo.nos-amc.io rke2[32216]: time="2022-10-17T16:37:09Z" level=info msg="Failed to test data store connection: this server has not yet been promoted from learner to voting member"
Oct 17 16:37:10 k8mst02.espc-nostromo.nos-amc.io rke2[32216]: time="2022-10-17T16:37:10Z" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:9345/v1-rke2/readyz: 500 Internal Server Error"
/var/lib/rancher/rke2 directory, I get the following panic:
Oct 17 18:05:01 k8mst01.espc-nostromo.nos-amc.io rke2[27972]: {"level":"info","ts":"2022-10-17T18:05:01.589Z","caller":"rafthttp/pipeline.go:86","msg":"stopped HTTP pipelining with remote peer","local-member-id":"fd35c5c2c37ef2ee","remote-peer-id":"42626d1a170fc6f5"}
Oct 17 18:05:01 k8mst01.espc-nostromo.nos-amc.io rke2[27972]: {"level":"info","ts":"2022-10-17T18:05:01.589Z","caller":"rafthttp/stream.go:459","msg":"stopped stream reader with remote peer","stream-reader-type":"stream MsgApp v2","local-member-id":"fd35c5c2c37ef2ee","remote-peer-id":"42626d1a170fc6f5"}
Oct 17 18:05:01 k8mst01.espc-nostromo.nos-amc.io rke2[27972]: {"level":"info","ts":"2022-10-17T18:05:01.589Z","caller":"rafthttp/stream.go:459","msg":"stopped stream reader with remote peer","stream-reader-type":"stream Message","local-member-id":"fd35c5c2c37ef2ee","remote-peer-id":"42626d1a170fc6f5"}
Oct 17 18:05:01 k8mst01.espc-nostromo.nos-amc.io rke2[27972]: {"level":"info","ts":"2022-10-17T18:05:01.589Z","caller":"rafthttp/peer.go:340","msg":"stopped remote peer","remote-peer-id":"42626d1a170fc6f5"}
Oct 17 18:05:01 k8mst01.espc-nostromo.nos-amc.io rke2[27972]: {"level":"info","ts":"2022-10-17T18:05:01.589Z","caller":"rafthttp/transport.go:369","msg":"removed remote peer","local-member-id":"fd35c5c2c37ef2ee","removed-remote-peer-id":"42626d1a170fc6f5"}
Oct 17 18:05:01 k8mst01.espc-nostromo.nos-amc.io rke2[27972]: panic: removed all voters
Oct 17 18:05:01 k8mst01.espc-nostromo.nos-amc.io rke2[27972]: goroutine 206 [running]:
Oct 17 18:05:01 k8mst01.espc-nostromo.nos-amc.io rke2[27972]: go.etcd.io/etcd/raft.(*raft).applyConfChange(0xc0001c0500, 0x0, 0xc00091dbf0, 0x1, 0x1, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
Oct 17 18:05:01 k8mst01.espc-nostromo.nos-amc.io rke2[27972]: /go/pkg/mod/github.com/k3s-io/etcd@v0.5.0-alpha.5.0.20220113195313-6c2233a709e8/raft/raft.go:1514 +0x225
Oct 17 18:05:01 k8mst01.espc-nostromo.nos-amc.io rke2[27972]: go.etcd.io/etcd/raft.(*node).run(0xc0014440c0)
Oct 17 18:05:01 k8mst01.espc-nostromo.nos-amc.io rke2[27972]: /go/pkg/mod/github.com/k3s-io/etcd@v0.5.0-alpha.5.0.20220113195313-6c2233a709e8/raft/node.go:356 +0x845
Oct 17 18:05:01 k8mst01.espc-nostromo.nos-amc.io rke2[27972]: created by go.etcd.io/etcd/raft.RestartNode
Oct 17 18:05:01 k8mst01.espc-nostromo.nos-amc.io rke2[27972]: /go/pkg/mod/github.com/k3s-io/etcd@v0.5.0-alpha.5.0.20220113195313-6c2233a709e8/raft/node.go:240 +0x330
Oct 17 18:05:01 k8mst01.espc-nostromo.nos-amc.io systemd[1]: rke2-server.service: main process exited, code=exited, status=2/INVALIDARGUMENT
I had to use etcdctl to manually promote the learner node, and on my first master node I also had to include the server: cfg option (and eventually promote it too when it tried to join the existing cluster). But here I am now with three master nodes again, so hooray.
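
For anyone landing here later, the manual learner promotion could look roughly like the sketch below. The exact commands used in this thread were not posted, so treat everything here as an assumption: the wrapper name rke2_etcdctl is made up for illustration, the endpoint and certificate paths assume a default RKE2 server install, and <MEMBER_ID> is a placeholder for the ID reported by member list.

```shell
#!/bin/sh
# Hypothetical helper, not from the thread: wraps etcdctl with the TLS
# flags an RKE2-managed etcd normally requires.
# Assumptions: etcdctl is on PATH, etcd listens on 127.0.0.1:2379, and
# the certs live under the default RKE2 data dir.
ETCDCTL="${ETCDCTL:-etcdctl}"
TLS_DIR="${TLS_DIR:-/var/lib/rancher/rke2/server/tls/etcd}"

rke2_etcdctl() {
  "$ETCDCTL" \
    --endpoints="https://127.0.0.1:2379" \
    --cacert="$TLS_DIR/server-ca.crt" \
    --cert="$TLS_DIR/server-client.crt" \
    --key="$TLS_DIR/server-client.key" \
    "$@"
}

# Step 1: list members and find the one still marked as a learner.
#   rke2_etcdctl member list -w table
#
# Step 2: promote it to a voting member, using the ID from step 1.
#   rke2_etcdctl member promote <MEMBER_ID>
```

Run this on a server whose etcd is healthy and already a voting member. Promotion only succeeds once the learner has caught up with the leader's log, so it may need a retry.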