# k3s
c: can you post the full logs from both the joining node and the node that it is attempting to join against?
e: I definitely can
c: either attached here, or pastebin or something
e: Just dumping journalctl to gist is too much for it. Rough idea of how many lines back you want?
that's the last 10k lines from both
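(For reference, capturing that would look roughly like the following, assuming the systemd unit is named k3s on both machines; the output filename is just an example.)
```
# Run on each node: dump the last 10,000 lines of the k3s unit's journal.
journalctl -u k3s -n 10000 --no-pager > k3s-$(hostname).log
```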
c: Yeah, so there’s something odd with etcd on the node it’s joining. It appears to be trying to connect to the failed node to get the member list, FOR the joining node…
```
Mar 29 17:35:48 lisa k3s[964]: {"level":"warn","ts":"2024-03-29T17:35:48.351836Z","logger":"etcd-client","caller":"v3@v3.5.9-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000159a40/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing: dial tcp 10.45.0.1:2379: connect: connection refused\""}
Mar 29 17:35:48 lisa k3s[964]: time="2024-03-29T17:35:48Z" level=warning msg="Failed to get etcd MemberList for 95.217.198.219:46144: context deadline exceeded"
Mar 29 17:35:48 lisa k3s[964]: {"level":"info","ts":"2024-03-29T17:35:48.359475Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"d1ee2b97af9a4f8d switched to configuration voters=(15127076128870256525 18184093900799801018) learners=(7101457614970752874)"}
Mar 29 17:35:48 lisa k3s[964]: {"level":"info","ts":"2024-03-29T17:35:48.359521Z","caller":"etcdserver/server.go:1942","msg":"applied a configuration change through raft","local-member-id":"d1ee2b97af9a4f8d","raft-conf-change":"ConfChangeAddLearnerNode","raft-conf-change-node-id":"2be7a92208601983"}
Mar 29 17:35:48 lisa k3s[964]: {"level":"warn","ts":"2024-03-29T17:35:48.773418Z","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"628d72135fd30f6a","rtt":"0s","error":"dial tcp 10.45.0.1:2380: connect: connection refused"}
Mar 29 17:35:48 lisa k3s[964]: {"level":"warn","ts":"2024-03-29T17:35:48.773413Z","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"628d72135fd30f6a","rtt":"0s","error":"dial tcp 10.45.0.1:2380: connect: connection refused"}
```
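(Aside: one way to inspect the embedded etcd's own view of the membership is to point etcdctl at k3s's etcd client certs. etcdctl is not bundled with k3s, so this sketch assumes you've installed it separately; the cert paths below are the standard k3s locations, but verify them on your install.)
```
# Query the embedded etcd member list using k3s's etcd client certificates.
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt \
  --cert=/var/lib/rancher/k3s/server/tls/etcd/server-client.crt \
  --key=/var/lib/rancher/k3s/server/tls/etcd/server-client.key \
  member list -w table
```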
You might try just restarting k3s on both of the existing nodes, and then try joining it again.
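(A minimal sketch of that sequence, assuming k3s runs as the systemd service k3s on the servers, and as k3s or k3s-agent on the joining node:)
```
# On each existing server node:
sudo systemctl restart k3s

# Then retry the join by restarting the service on the new node:
sudo systemctl restart k3s   # or k3s-agent if it joins as an agent
```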
There are also a ton of errors in here about the cert on your stackgres-operator webhook; you should probably fix that to cut down on the log spew
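(One hedged way to locate the offending webhook config; "stackgres" comes from the errors above, and the exact resource name on your cluster is an assumption:)
```
# Find the stackgres webhook configuration(s):
kubectl get validatingwebhookconfigurations,mutatingwebhookconfigurations | grep -i stackgres

# Then inspect the CA bundle and target service on the matching entry:
kubectl get validatingwebhookconfiguration <name> -o yaml
```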
fwiw this should be addressed by next month’s release: https://github.com/k3s-io/k3s/pull/9722
e: Yeah, those stackgres errors are it trying to contact the dead node for some reason. I haven't gotten an answer as to why yet. I'll try restarting the nodes and report back.
Hahahaha, "I don't know why this was supposed to be a good idea." 😄
Hmm, rebooted both nodes and now neither of them is coming up. The last thing in the logs before systemd restarts the k3s service is this:
```
Mar 30 13:54:59 marge k3s[18182]: time="2024-03-30T13:54:59Z" level=info msg="etcd temporary data store connection OK"
Mar 30 13:54:59 marge k3s[18182]: time="2024-03-30T13:54:59Z" level=info msg="Reconciling bootstrap data between datastore and disk"
Mar 30 13:54:59 marge k3s[18182]: time="2024-03-30T13:54:59Z" level=fatal msg="/var/lib/rancher/k3s/server/cred/ipsec.psk newer than datas>
```
OK, deleting that file on both nodes and restarting k3s fixed that.
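(For anyone hitting the same fatal, the fix described above amounts to the following; the assumption is that k3s recreates the file from the datastore copy during bootstrap reconciliation on the next start.)
```
# On each server node that fails bootstrap reconciliation:
sudo rm /var/lib/rancher/k3s/server/cred/ipsec.psk
sudo systemctl restart k3s
```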
But it's the same errors as before when joining.