This message was deleted.
# general
a
This message was deleted.
c
The only reason it would delete the other nodes from the etcd cluster is if you deleted their Kubernetes node objects. Is there anything else you're doing as part of your process that you didn't mention? Can you attach logs from the servers at the time the etcd cluster members are deleted?
n
Thanks for the answer. I've been trying to debug the issue, and one possibly important detail I omitted is that while the other two nodes have the "server" property set in the config file and pointing to the leader, the leader node doesn't have that property, as it was the first one we installed and didn't need it to join an existing cluster. It seems that because of this, after the restart it initiates a new etcd cluster, evicting the other two. Does that seem right to you?
c
No, the first one shouldn't have the server option set, since it's not joining an existing server.
The only reason the other members would be removed is if you do a cluster-reset or delete the nodes.
n
We don't delete the nodes, and they manage to connect successfully to the leader after updating their versions and resetting. The removal is triggered after updating the leader node and resetting it, so maybe something in our leader config is causing it. This is the config on the leader node (maybe it's the cluster-init?):
token: <token>
node-label:
- node.kubernetes.io/master=true
- <custom label>
flannel-backend: wireguard
default-local-storage-path: <path>
system-default-registry: <our registry>
write-kubeconfig: /etc/rancher/k3s/k3s.yaml
write-kubeconfig-mode: "0600"
disable: traefik,local-storage,servicelb
disable-cloud-controller: true
service-cidr: <ip>
cluster-cidr: <ip>
cluster-init: true
kubelet-arg: config=/etc/rancher/k3s/kubelet.config
The kubelet config file:
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
shutdownGracePeriod: 600s
shutdownGracePeriodCriticalPods: 300s
Some logs from node3:
May 24 15:35:58 node3 k3s[35081]: I0524 15:35:58.152234   35081 reconciler.go:157] "Reconciler: start to sync state"
May 24 15:35:59 node3 k3s[35081]: I0524 15:35:59.144366   35081 pod_container_deletor.go:79] "Container not found in pod's containers" containerID="dc3c0faa70c2770b2d49684e044fc6e5e2de398ef5c37ad3044aa6
May 24 15:35:59 node3 k3s[35081]: I0524 15:35:59.240854   35081 request.go:677] Waited for 1.191574292s due to client-side throttling, not priority and fairness, request: GET:https://127.0.0.1:6443/api/
May 24 15:36:02 node3 k3s[35081]: I0524 15:36:02.155272   35081 kubelet_node_status.go:71] "Attempting to register node" node="node3"
May 24 15:36:03 node3 k3s[35081]: I0524 15:36:03.945764   35081 reconciler.go:194] "operationExecutor.UnmountVolume started for volume \"kube-api-access-zw7hl\" (UniqueName: \"kubernetes.io/projected/5a
May 24 15:36:03 node3 k3s[35081]: I0524 15:36:03.954704   35081 operation_generator.go:868] UnmountVolume.TearDown succeeded for volume "kubernetes.io/projected/5aa02472-2d37-4bb8-bcfc-4a5133c0dc97-kube
May 24 15:36:04 node3 k3s[35081]: I0524 15:36:04.046755   35081 reconciler.go:312] "Volume detached for volume \"kube-api-access-zw7hl\" (UniqueName: \"kubernetes.io/projected/5aa02472-2d37-4bb8-bcfc-4a
May 24 15:36:06 node3 k3s[35081]: I0524 15:36:06.448498   35081 kubelet_node_status.go:109] "Node was previously registered" node="node3"
May 24 15:36:06 node3 k3s[35081]: I0524 15:36:06.448636   35081 kubelet_node_status.go:74] "Successfully registered node" node="node3"
May 24 15:36:07 node3 k3s[35081]: I0524 15:36:07.159589   35081 prober_manager.go:274] "Failed to trigger a manual run" probe="Readiness"
May 24 15:36:07 node3 k3s[35081]: I0524 15:36:07.159614   35081 prober_manager.go:274] "Failed to trigger a manual run" probe="Readiness"
May 24 15:36:11 node3 k3s[35081]: {"level":"info","ts":"2023-05-24T15:36:11.101+0300","caller":"fileutil/purge.go:77","msg":"purged","path":"/var/lib/rancher/k3s/server/db/etcd/member/snap/0000000000000
May 24 15:36:12 node3 k3s[35081]: I0524 15:36:12.243410   35081 prober_manager.go:274] "Failed to trigger a manual run" probe="Readiness"
May 24 15:36:12 node3 k3s[35081]: I0524 15:36:12.243442   35081 prober_manager.go:274] "Failed to trigger a manual run" probe="Readiness"
May 24 15:36:13 node3 k3s[35081]: I0524 15:36:13.170285   35081 prober_manager.go:274] "Failed to trigger a manual run" probe="Readiness"
May 24 15:36:14 node3 k3s[35081]: I0524 15:36:14.840382   35081 prober_manager.go:274] "Failed to trigger a manual run" probe="Readiness"
May 24 15:36:20 node3 k3s[35081]: time="2023-05-24T15:36:20+03:00" level=info msg="error in remotedialer server [400]: websocket: close 1006 (abnormal closure): unexpected EOF"
May 24 15:36:20 node3 k3s[35081]: time="2023-05-24T15:36:20+03:00" level=error msg="Remotedialer proxy error" error="websocket: close 1006 (abnormal closure): unexpected EOF"
May 24 15:36:20 node3 k3s[35081]: {"level":"warn","ts":"2023-05-24T15:36:20.378+0300","caller":"rafthttp/stream.go:421","msg":"lost TCP streaming connection with remote peer","stream-reader-type":"strea
May 24 15:36:20 node3 k3s[35081]: {"level":"warn","ts":"2023-05-24T15:36:20.378+0300","caller":"rafthttp/peer_status.go:66","msg":"peer became inactive (message send to peer failed)","peer-id":"6a6a0fd5
May 24 15:36:20 node3 k3s[35081]: {"level":"warn","ts":"2023-05-24T15:36:20.379+0300","caller":"rafthttp/stream.go:421","msg":"lost TCP streaming connection with remote peer","stream-reader-type":"strea
May 24 15:36:25 node3 k3s[35081]: time="2023-05-24T15:36:25+03:00" level=info msg="Connecting to proxy" url="wss://192.168.32.154:6443/v1-k3s/connect"
May 24 15:36:25 node3 k3s[35081]: time="2023-05-24T15:36:25+03:00" level=error msg="Failed to connect to proxy. Empty dialer response" error="dial tcp 192.168.32.154:6443: connect: connection refused"
May 24 15:36:25 node3 k3s[35081]: time="2023-05-24T15:36:25+03:00" level=error msg="Remotedialer proxy error" error="dial tcp 192.168.32.154:6443: connect: connection refused"
May 24 15:36:26 node3 k3s[35081]: {"level":"warn","ts":"2023-05-24T15:36:26.102+0300","caller":"rafthttp/stream.go:194","msg":"lost TCP streaming connection with remote peer","stream-writer-type":"strea
May 24 15:36:26 node3 k3s[35081]: {"level":"warn","ts":"2023-05-24T15:36:26.102+0300","caller":"rafthttp/stream.go:194","msg":"lost TCP streaming connection with remote peer","stream-writer-type":"strea
May 24 15:36:27 node3 k3s[35081]: {"level":"warn","ts":"2023-05-24T15:36:27.909+0300","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"192.168.32.154:45772","server-name
May 24 15:36:27 node3 k3s[35081]: {"level":"warn","ts":"2023-05-24T15:36:27.909+0300","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"192.168.32.154:45770","server-name
May 24 15:36:27 node3 k3s[35081]: {"level":"warn","ts":"2023-05-24T15:36:27.910+0300","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"192.168.32.154:45780","server-name
May 24 15:36:27 node3 k3s[35081]: {"level":"warn","ts":"2023-05-24T15:36:27.912+0300","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"192.168.32.154:45782","server-name
May 24 15:36:29 node3 k3s[35081]: time="2023-05-24T15:36:29+03:00" level=info msg="Stopped tunnel to 192.168.32.154:6443"
May 24 15:36:34 node3 k3s[35081]: {"level":"info","ts":"2023-05-24T15:36:34.341+0300","caller":"rafthttp/stream.go:249","msg":"set message encoder","from":"feba6c925b18c7ea","to":"6a6a0fd5dfec0dc8","str
May 24 15:36:34 node3 k3s[35081]: {"level":"info","ts":"2023-05-24T15:36:34.341+0300","caller":"rafthttp/peer_status.go:53","msg":"peer became active","peer-id":"6a6a0fd5dfec0dc8"}
May 24 15:36:34 node3 k3s[35081]: {"level":"info","ts":"2023-05-24T15:36:34.341+0300","caller":"rafthttp/stream.go:274","msg":"established TCP streaming connection with remote peer","stream-writer-type"
May 24 15:36:34 node3 k3s[35081]: {"level":"info","ts":"2023-05-24T15:36:34.341+0300","caller":"rafthttp/stream.go:249","msg":"set message encoder","from":"feba6c925b18c7ea","to":"6a6a0fd5dfec0dc8","str
May 24 15:36:34 node3 k3s[35081]: {"level":"info","ts":"2023-05-24T15:36:34.341+0300","caller":"rafthttp/stream.go:274","msg":"established TCP streaming connection with remote peer","stream-writer-type"
May 24 15:36:34 node3 k3s[35081]: {"level":"info","ts":"2023-05-24T15:36:34.363+0300","caller":"rafthttp/stream.go:412","msg":"established TCP streaming connection with remote peer","stream-reader-type"
May 24 15:36:34 node3 k3s[35081]: {"level":"info","ts":"2023-05-24T15:36:34.363+0300","caller":"rafthttp/stream.go:412","msg":"established TCP streaming connection with remote peer","stream-reader-type"
May 24 15:36:34 node3 k3s[35081]: {"level":"info","ts":"2023-05-24T15:36:34.365+0300","caller":"membership/cluster.go:576","msg":"updated cluster version","cluster-id":"5558a28e57dc07eb","local-member-i
May 24 15:36:34 node3 k3s[35081]: {"level":"info","ts":"2023-05-24T15:36:34.365+0300","caller":"api/capability.go:75","msg":"enabled capabilities for version","cluster-version":"3.5"}
May 24 15:36:34 node3 k3s[35081]: {"level":"warn","ts":"2023-05-24T15:36:34.368+0300","caller":"rafthttp/stream.go:421","msg":"lost TCP streaming connection with remote peer","stream-reader-type":"strea
May 24 15:36:34 node3 k3s[35081]: {"level":"warn","ts":"2023-05-24T15:36:34.368+0300","caller":"rafthttp/stream.go:421","msg":"lost TCP streaming connection with remote peer","stream-reader-type":"strea
May 24 15:36:34 node3 k3s[35081]: {"level":"warn","ts":"2023-05-24T15:36:34.373+0300","caller":"rafthttp/peer_status.go:66","msg":"peer became inactive (message send to peer failed)","peer-id":"6a6a0fd5
May 24 15:36:34 node3 k3s[35081]: time="2023-05-24T15:36:34+03:00" level=info msg="this node has been removed from the cluster please restart k3s to rejoin the cluster"
Logs from the leader node:
}
May 24 15:36:33 node1 k3s[130922]: {"level":"info","ts":"2023-05-24T15:36:33.725+0300","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"6a6a0fd5dfec0dc8 became pre-candidate at term 6"}
May 24 15:36:33 node1 k3s[130922]: {"level":"info","ts":"2023-05-24T15:36:33.725+0300","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"6a6a0fd5dfec0dc8 received MsgPreVoteResp from 6a6a0fd5d
May 24 15:36:33 node1 k3s[130922]: {"level":"info","ts":"2023-05-24T15:36:33.725+0300","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"6a6a0fd5dfec0dc8 became candidate at term 7"}
May 24 15:36:33 node1 k3s[130922]: {"level":"info","ts":"2023-05-24T15:36:33.725+0300","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"6a6a0fd5dfec0dc8 received MsgVoteResp from 6a6a0fd5dfec
May 24 15:36:33 node1 k3s[130922]: {"level":"info","ts":"2023-05-24T15:36:33.725+0300","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"6a6a0fd5dfec0dc8 became leader at term 7"}
May 24 15:36:33 node1 k3s[130922]: {"level":"info","ts":"2023-05-24T15:36:33.725+0300","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"raft.node: 6a6a0fd5dfec0dc8 elected leader 6a6a0fd5dfec
May 24 15:36:33 node1 k3s[130922]: {"level":"info","ts":"2023-05-24T15:36:33.725+0300","caller":"etcdserver/server.go:2044","msg":"published local member to cluster through raft","local-member-id":"6a6a
May 24 15:36:33 node1 k3s[130922]: {"level":"info","ts":"2023-05-24T15:36:33.725+0300","caller":"embed/serve.go:98","msg":"ready to serve client requests"}
May 24 15:36:33 node1 k3s[130922]: {"level":"info","ts":"2023-05-24T15:36:33.725+0300","caller":"etcdserver/server.go:2514","msg":"updating cluster version using v2 API","from":"3.4","to":"3.5"}
May 24 15:36:33 node1 k3s[130922]: {"level":"info","ts":"2023-05-24T15:36:33.726+0300","caller":"embed/serve.go:140","msg":"serving client traffic insecurely; this is strongly discouraged!","address":"1
May 24 15:36:33 node1 k3s[130922]: {"level":"info","ts":"2023-05-24T15:36:33.726+0300","caller":"membership/cluster.go:576","msg":"updated cluster version","cluster-id":"5558a28e57dc07eb","local-member-
May 24 15:36:33 node1 k3s[130922]: {"level":"info","ts":"2023-05-24T15:36:33.727+0300","caller":"api/capability.go:75","msg":"enabled capabilities for version","cluster-version":"3.5"}
May 24 15:36:33 node1 k3s[130922]: {"level":"info","ts":"2023-05-24T15:36:33.727+0300","caller":"etcdserver/server.go:2533","msg":"cluster version is updated","cluster-version":"3.5"}
May 24 15:36:33 node1 k3s[130922]: time="2023-05-24T15:36:33+03:00" level=info msg="Defragmenting etcd database"
May 24 15:36:33 node1 k3s[130922]: {"level":"info","ts":"2023-05-24T15:36:33.729+0300","caller":"v3rpc/maintenance.go:89","msg":"starting defragment"}
May 24 15:36:33 node1 k3s[130922]: {"level":"info","ts":"2023-05-24T15:36:33.733+0300","caller":"backend/backend.go:497","msg":"defragmenting","path":"/var/lib/rancher/k3s/server/db/etcd-tmp/member/snap
May 24 15:36:33 node1 k3s[130922]: {"level":"info","ts":"2023-05-24T15:36:33.951+0300","caller":"backend/backend.go:549","msg":"finished defragmenting directory","path":"/var/lib/rancher/k3s/server/db/e
May 24 15:36:33 node1 k3s[130922]: {"level":"info","ts":"2023-05-24T15:36:33.952+0300","caller":"v3rpc/maintenance.go:95","msg":"finished defragment"}
May 24 15:36:33 node1 k3s[130922]: time="2023-05-24T15:36:33+03:00" level=info msg="etcd temporary data store connection OK"
May 24 15:36:33 node1 k3s[130922]: time="2023-05-24T15:36:33+03:00" level=info msg="Reconciling bootstrap data between datastore and disk"
May 24 15:36:33 node1 k3s[130922]: time="2023-05-24T15:36:33+03:00" level=info msg="Migrating bootstrap data to new format"
May 24 15:36:33 node1 k3s[130922]: time="2023-05-24T15:36:33+03:00" level=info msg="stopping etcd"
May 24 15:36:20 node1 systemd[1]: k3s.service: main process exited, code=killed, status=9/KILL
May 24 15:36:20 node1 systemd[1]: Unit k3s.service entered failed state.
May 24 15:36:20 node1 systemd[1]: k3s.service failed.
May 24 15:36:25 node1 systemd[1]: k3s.service holdoff time over, scheduling restart.
May 24 15:36:25 node1 systemd[1]: Stopped Aurora Kubernetes.
May 24 15:36:25 node1 systemd[1]: Starting Aurora Kubernetes...
May 24 15:36:25 node1 k3s[130922]: time="2023-05-24T15:36:25+03:00" level=info msg="Acquiring lock file /var/lib/rancher/k3s/data/.lock"
May 24 15:36:25 node1 k3s[130922]: time="2023-05-24T15:36:25+03:00" level=info msg="Preparing data dir /var/lib/rancher/k3s/data/65fcb4e2f7a809518e9306f45d1fbf976ec8b2bf667e86b35c1290f73b78677c"
May 24 15:36:27 node1 k3s[130922]: time="2023-05-24T15:36:27+03:00" level=info msg="Starting k3s v1.22.17+k3s1 (3ed243df)"
May 24 15:36:27 node1 k3s[130922]: time="2023-05-24T15:36:27+03:00" level=info msg="Managed etcd cluster bootstrap already complete and initialized"
May 24 15:36:27 node1 k3s[130922]: time="2023-05-24T15:36:27+03:00" level=info msg="Starting temporary etcd to reconcile with datastore"
May 24 15:36:27 node1 k3s[130922]: {"level":"info","ts":"2023-05-24T15:36:27.106+0300","caller":"embed/etcd.go:131","msg":"configuring peer listeners","listen-peer-urls":["http://127.0.0.1:2400"]}
May 24 15:36:27 node1 k3s[130922]: {"level":"info","ts":"2023-05-24T15:36:27.107+0300","caller":"embed/etcd.go:139","msg":"configuring client listeners","listen-client-urls":["http://127.0.0.1:2399"]}
May 24 15:36:27 node1 k3s[130922]: {"level":"info","ts":"2023-05-24T15:36:27.107+0300","caller":"embed/etcd.go:308","msg":"starting an etcd server","etcd-version":"3.5.4","git-sha":"Not provided (use ./
May 24 15:36:27 node1 k3s[130922]: {"level":"info","ts":"2023-05-24T15:36:27.124+0300","caller":"etcdserver/backend.go:81","msg":"opened backend db","path":"/var/lib/rancher/k3s/server/db/etcd-tmp/membe
May 24 15:36:27 node1 k3s[130922]: {"level":"info","ts":"2023-05-24T15:36:27.401+0300","caller":"etcdserver/server.go:529","msg":"No snapshot found. Recovering WAL from scratch!"}
May 24 15:36:27 node1 k3s[130922]: {"level":"info","ts":"2023-05-24T15:36:27.717+0300","caller":"etcdserver/raft.go:526","msg":"discarding uncommitted WAL entries","entry-index":62699,"commit-index-from
May 24 15:36:27 node1 k3s[130922]: {"level":"info","ts":"2023-05-24T15:36:27.722+0300","caller":"etcdserver/raft.go:556","msg":"forcing restart member","cluster-id":"5558a28e57dc07eb","local-member-id":
May 24 15:36:27 node1 k3s[130922]: {"level":"info","ts":"2023-05-24T15:36:27.723+0300","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"6a6a0fd5dfec0dc8 switched to configuration voters=()"}
May 24 15:36:27 node1 k3s[130922]: {"level":"info","ts":"2023-05-24T15:36:27.723+0300","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"6a6a0fd5dfec0dc8 became follower at term 6"}
May 24 15:36:27 node1 k3s[130922]: {"level":"info","ts":"2023-05-24T15:36:27.723+0300","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"newRaft 6a6a0fd5dfec0dc8 [peers: [], term: 6, commit: 6
May 24 15:36:27 node1 k3s[130922]: {"level":"warn","ts":"2023-05-24T15:36:27.725+0300","caller":"auth/store.go:1220","msg":"simple token is not cryptographically signed"}
May 24 15:36:27 node1 k3s[130922]: {"level":"info","ts":"2023-05-24T15:36:27.726+0300","caller":"mvcc/kvstore.go:345","msg":"restored last compact revision","meta-bucket-name":"meta","meta-bucket-name-k
May 24 15:36:27 node1 k3s[130922]: {"level":"info","ts":"2023-05-24T15:36:27.740+0300","caller":"mvcc/kvstore.go:415","msg":"kvstore restored","current-rev":55983}
May 24 15:36:27 node1 k3s[130922]: {"level":"info","ts":"2023-05-24T15:36:27.742+0300","caller":"etcdserver/quota.go:94","msg":"enabled backend quota with default value","quota-name":"v3-applier","quota
May 24 15:36:27 node1 k3s[130922]: {"level":"info","ts":"2023-05-24T15:36:27.747+0300","caller":"etcdserver/corrupt.go:46","msg":"starting initial corruption check","local-member-id":"6a6a0fd5dfec0dc8",
May 24 15:36:27 node1 k3s[130922]: {"level":"info","ts":"2023-05-24T15:36:27.754+0300","caller":"etcdserver/corrupt.go:116","msg":"initial corruption checking passed; no corruption","local-member-id":"6
May 24 15:36:27 node1 k3s[130922]: {"level":"info","ts":"2023-05-24T15:36:27.754+0300","caller":"etcdserver/server.go:851","msg":"starting etcd server","local-member-id":"6a6a0fd5dfec0dc8","local-server
May 24 15:36:27 node1 k3s[130922]: {"level":"info","ts":"2023-05-24T15:36:27.755+0300","caller":"etcdserver/server.go:752","msg":"starting initial election tick advance","election-ticks":10}
May 24 15:36:27 node1 k3s[130922]: {"level":"info","ts":"2023-05-24T15:36:27.755+0300","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"6a6a0fd5dfec0dc8 switched to configuration voters=(7667
May 24 15:36:27 node1 k3s[130922]: {"level":"info","ts":"2023-05-24T15:36:27.755+0300","caller":"membership/cluster.go:421","msg":"added member","cluster-id":"5558a28e57dc07eb","local-member-id":"6a6a0f
May 24 15:36:27 node1 k3s[130922]: {"level":"info","ts":"2023-05-24T15:36:27.755+0300","caller":"membership/cluster.go:584","msg":"set initial cluster version","cluster-id":"5558a28e57dc07eb","local-mem
May 24 15:36:27 node1 k3s[130922]: {"level":"info","ts":"2023-05-24T15:36:27.755+0300","caller":"api/capability.go:75","msg":"enabled capabilities for version","cluster-version":"3.4"}
May 24 15:36:27 node1 k3s[130922]: {"level":"info","ts":"2023-05-24T15:36:27.757+0300","caller":"embed/etcd.go:277","msg":"now serving peer/client/metrics","local-member-id":"6a6a0fd5dfec0dc8","initial-
May 24 15:36:27 node1 k3s[130922]: {"level":"info","ts":"2023-05-24T15:36:27.757+0300","caller":"embed/etcd.go:581","msg":"serving peer traffic","address":"127.0.0.1:2400"}
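To line up what each server saw around the restart, the excerpts above can be reduced to a short timeline. This is only a rough sketch: the marker strings are copied from the log lines in this thread, while the labels and the `find_events` helper are hypothetical names made up for illustration, and the sort assumes every line comes from the same day.

```python
import re

# Substrings copied from the log excerpts in this thread; the short labels
# on the right are hypothetical names chosen for this sketch.
MARKERS = {
    "main process exited": "process-exited",
    "Starting temporary etcd to reconcile with datastore": "temp-etcd-start",
    "forcing restart member": "forced-restart",
    "switched to configuration voters=": "voters-changed",
    "this node has been removed from the cluster": "member-removed",
}

# Syslog-style prefix, e.g. "May 24 15:36:34 node3 k3s[35081]: ..."
PREFIX = re.compile(r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(\S+)\s")


def find_events(lines):
    """Return (timestamp, host, label) for every line containing a marker."""
    events = []
    for line in lines:
        match = PREFIX.match(line)
        if match is None:
            continue  # skip lines without the expected prefix
        for needle, label in MARKERS.items():
            if needle in line:
                events.append((match.group(1), match.group(2), label))
                break  # at most one label per line
    # Plain string sort: only meaningful when all lines share the same date.
    return sorted(events, key=lambda event: event[0])
```

Passing it the combined lines of the journald output saved from each server (for example from `journalctl -u k3s`) gives one merged list, which makes it easier to see whether the removal message on a follower comes before or after the leader's temporary etcd starts.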
c
What is Aurora Kubernetes? Are you using a modified distribution of k3s?
n
It's just a description we added in the systemd service file; the distribution itself isn't modified:
[Unit]
Description=Aurora Kubernetes
c
Hmm. Would you mind opening a GH issue? Fill out the issue template as best you can, and attach the journald logs from all three servers.
n
I created a GH issue; hopefully I added sufficient info: https://github.com/k3s-io/k3s/issues/7613