# k3s
c: can you post the full logs from both the joining node and the node that it is attempting to join against?
e: I definitely can
c: either attached here, or pastebin or something
e: Just dumping journalctl to gist is too much for it. Rough idea of how many lines back you want?
that's the last 10k lines from both
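(For reference, capturing that would look roughly like the following, assuming the systemd unit is named k3s on both machines; the output filename is just an example.)
```
# Run on each node: dump the last 10,000 lines of the k3s unit's journal.
journalctl -u k3s -n 10000 --no-pager > k3s-$(hostname).log
```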
c: Yeah, so there’s something odd with etcd on the node it’s joining. It appears to be trying to connect to the failed node to get the member list, FOR the joining node…
```
Mar 29 17:35:48 lisa k3s[964]: {"level":"warn","ts":"2024-03-29T17:35:48.351836Z","logger":"etcd-client","caller":"v3@v3.5.9-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000159a40/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing: dial tcp 10.45.0.1:2379: connect: connection refused\""}
Mar 29 17:35:48 lisa k3s[964]: time="2024-03-29T17:35:48Z" level=warning msg="Failed to get etcd MemberList for 95.217.198.219:46144: context deadline exceeded"
Mar 29 17:35:48 lisa k3s[964]: {"level":"info","ts":"2024-03-29T17:35:48.359475Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"d1ee2b97af9a4f8d switched to configuration voters=(15127076128870256525 18184093900799801018) learners=(7101457614970752874)"}
Mar 29 17:35:48 lisa k3s[964]: {"level":"info","ts":"2024-03-29T17:35:48.359521Z","caller":"etcdserver/server.go:1942","msg":"applied a configuration change through raft","local-member-id":"d1ee2b97af9a4f8d","raft-conf-change":"ConfChangeAddLearnerNode","raft-conf-change-node-id":"2be7a92208601983"}
Mar 29 17:35:48 lisa k3s[964]: {"level":"warn","ts":"2024-03-29T17:35:48.773418Z","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"628d72135fd30f6a","rtt":"0s","error":"dial tcp 10.45.0.1:2380: connect: connection refused"}
Mar 29 17:35:48 lisa k3s[964]: {"level":"warn","ts":"2024-03-29T17:35:48.773413Z","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"628d72135fd30f6a","rtt":"0s","error":"dial tcp 10.45.0.1:2380: connect: connection refused"}
```
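(Aside: one way to inspect the embedded etcd's own view of the membership is to point etcdctl at k3s's etcd client certs. etcdctl is not bundled with k3s, so this sketch assumes you've installed it separately; the cert paths below are the standard k3s locations, but verify them on your install.)
```
# Query the embedded etcd member list using k3s's etcd client certificates.
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt \
  --cert=/var/lib/rancher/k3s/server/tls/etcd/server-client.crt \
  --key=/var/lib/rancher/k3s/server/tls/etcd/server-client.key \
  member list -w table
```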
You might try just restarting k3s on both of the existing nodes, and then try joining it again.
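(A minimal sketch of that sequence, assuming k3s runs as the systemd service k3s on the servers, and as k3s or k3s-agent on the joining node:)
```
# On each existing server node:
sudo systemctl restart k3s

# Then retry the join by restarting the service on the new node:
sudo systemctl restart k3s   # or k3s-agent if it joins as an agent
```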
There are also a ton of errors in here about the cert on your stackgres-operator webhook; you should probably fix that to cut down on the log spew
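(One hedged way to locate the offending webhook config; "stackgres" comes from the errors above, and the exact resource name on your cluster is an assumption:)
```
# Find the stackgres webhook configuration(s):
kubectl get validatingwebhookconfigurations,mutatingwebhookconfigurations | grep -i stackgres

# Then inspect the CA bundle and target service on the matching entry:
kubectl get validatingwebhookconfiguration <name> -o yaml
```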
fwiw this should be addressed by next month’s release: https://github.com/k3s-io/k3s/pull/9722
e: Yeah, those stackgres errors are it trying to contact the dead node for some reason. I haven't gotten an answer as to why yet. I'll try restarting the nodes and report back.
Hahahaha, "I don't know why this was supposed to be a good idea." 😄
Hmm, rebooted both nodes and now neither of them is coming up. The last thing in the logs before systemd restarts the k3s service is this:
```
Mar 30 13:54:59 marge k3s[18182]: time="2024-03-30T13:54:59Z" level=info msg="etcd temporary data store connection OK"
Mar 30 13:54:59 marge k3s[18182]: time="2024-03-30T13:54:59Z" level=info msg="Reconciling bootstrap data between datastore and disk"
Mar 30 13:54:59 marge k3s[18182]: time="2024-03-30T13:54:59Z" level=fatal msg="/var/lib/rancher/k3s/server/cred/ipsec.psk newer than datas>
```
OK, deleting that file on both nodes and restarting k3s fixed that.
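(For anyone hitting the same fatal, the fix described above amounts to the following; the assumption is that k3s recreates the file from the datastore copy during bootstrap reconciliation on the next start.)
```
# On each server node that fails bootstrap reconciliation:
sudo rm /var/lib/rancher/k3s/server/cred/ipsec.psk
sudo systemctl restart k3s
```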
But it's the same errors as before when joining.