# k3s
l
I’m trying to figure out why 2 out of 3 control planes are seeing approx. 50% CPU usage across 5+ k3s server processes. Looking into it, I see over 12,000 entries in the audit logs for list nodes from the k3s-supervisor, on a requestURI of: api/v1/nodes/?labelSelector=p2p.k3s.cattle.io%2Fenabled%3Dtrue That’s a lot of requests for that label … the label is on the nodes. We see this on v1.32.3+k3s1. Could this be the cause of the high CPU usage? It seems interesting to me that a list on this label is executed that often.
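For reference, that requestURI URL-decodes to the label selector p2p.k3s.cattle.io/enabled=true. A minimal way to see which nodes currently carry the label (plain kubectl, nothing k3s-specific assumed):

```
# List only the nodes carrying the label that the supervisor keeps querying for
kubectl get nodes -l p2p.k3s.cattle.io/enabled=true --show-labels
```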
On version 1.32.1+k3s1 and below … there’s no p2p.k3s.cattle.io label on the control-plane nodes.
I found this commit: https://github.com/k3s-io/k3s/commit/95700aa6b327f77e8a8a992377aeabc89bc20ee5 Are those changes relevant to what I’m seeing?
c
this will only be enabled if you are running with --embedded-registry=true to enable spegel. That particular query will only be made when libp2p (spegel) is trying to bootstrap the p2p mesh because it has no peers. It sounds like your environment is somehow broken or misconfigured?
for the record, the query you’re seeing is here: https://github.com/k3s-io/k3s/blob/master/pkg/spegel/bootstrap.go#L179-L181 and has been around since spegel support was added in 1.29.1 https://github.com/k3s-io/k3s/pull/8977
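A rough way to check whether spegel/libp2p is actually stuck bootstrapping is to grep the server journal for it (a sketch, assuming k3s runs as the systemd unit `k3s` on the server nodes):

```
# Look for spegel / libp2p bootstrap chatter or errors in the k3s server journal
journalctl -u k3s --since "1 hour ago" --no-pager | grep -iE 'spegel|libp2p|p2p'
```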
l
Hmm. I’ll look into whether or not it could be something we’re doing on our end. But downgrading the control plane to 1.32.1 and we’re good. So hmmm. In earlier versions (<1.32.1) the p2p label is not there on the control planes.
We downgraded to 1.32.1
And yes, we set embedded-registry: true both on <1.32.1 and above it
Of course there’s also the possibility that the newer spegel version is the culprit somehow
c
do you see any errors from spegel in the logs?
you can run with --debug or debug: true to get more output from it
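In config-file form this would look roughly like the following sketch of /etc/rancher/k3s/config.yaml (config keys map to the CLI flags with the leading dashes dropped):

```
# /etc/rancher/k3s/config.yaml
embedded-registry: true   # enables spegel and the p2p bootstrap query discussed above
debug: true               # same effect as passing --debug on the command line
```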
l
--debug on K3s itself I assume?
I’ll give it a try when I can upgrade again
c
were you able to get any more info on this?
l
Thank you very much for reaching out. Still researching. We bumped our internal test cluster to 1.32.3 just today, to debug while the issue is there. Right now I’m at KubeCon, so the time I can dedicate is limited.
I enabled --debug on all 3 k3s server nodes in the cluster. We see a huge bunch of bad TLS certificate errors. See: https://gist.github.com/larssb/23f7549427b3d31ae51cf5e7cea621c9
No parameters were changed across the upgrade. But from k3s v1.32.1 to 1.32.3 there’s an upgrade to containerd 2, runc is bumped … I looked at issues applicable to TLS bad cert, however nothing seemed applicable to our situation. Any idea @creamy-pencil-82913 ?
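One generic way to narrow down bad TLS certificate errors is to look at which certificate the endpoints in question are actually serving; a sketch (the IP and port are placeholders, substitute the ones the errors in the gist point at, e.g. 10250 for kubelet or 6443 for the apiserver/supervisor):

```
# Show subject, issuer and validity of the certificate presented by an endpoint
openssl s_client -connect <node-ip>:10250 </dev/null 2>/dev/null \
  | openssl x509 -noout -subject -issuer -dates
```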
c
no, there’s not really much to work with here. What are all those IPs reporting the bad certificates? Are those nodes on your network, or pod IPs, or what? The etcd errors that you shared in a comment look like they’re from a normal startup of a server node?
I don’t see anything here at all from spegel or libp2p, which is the component that would be trying to find nodes with that label
l
The IPs are pods, yes. Also nodes in the cluster and the cloud’s network backplane.
Yeah, k3s is trying to start up, but it never succeeds.
c
that sounds like a different problem entirely
the little bit of etcd logs you shared show that it can’t connect to two of the peers. Are you trying to start up only one server of a 3-node cluster?
If you have 3 servers, you need at least 2 of them online. If this node can’t connect to at least one of the other servers it won’t ever start up. These logs say that there are at least 2 nodes that it can’t connect to:
```
Apr 01 16:59:41 test-test-ctlplane-0 k3s[36770]: {"level":"warn","ts":"2025-04-01T16:59:41.795283Z","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"5b55765432c297","rtt":"0s","error":"dial tcp 192.168.114.86:2380: connect: connection refused"}
Apr 01 16:59:41 test-test-ctlplane-0 k3s[36770]: {"level":"warn","ts":"2025-04-01T16:59:41.795310Z","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"5b55765432c297","rtt":"0s","error":"dial tcp 192.168.114.86:2380: connect: connection refused"}
Apr 01 16:59:41 test-test-ctlplane-0 k3s[36770]: {"level":"warn","ts":"2025-04-01T16:59:41.797500Z","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"7026efc5b923151d","rtt":"0s","error":"dial tcp 192.168.114.85:2380: connect: connection refused"}
Apr 01 16:59:41 test-test-ctlplane-0 k3s[36770]: {"level":"warn","ts":"2025-04-01T16:59:41.797513Z","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"7026efc5b923151d","rtt":"0s","error":"dial tcp 192.168.114.85:2380: connect: connection refused"}
```
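A quick sanity check from this node, using the peer IPs and the etcd peer port from the lines above (assumes nc/netcat is available on the node):

```
# etcd peer traffic uses port 2380; "connection refused" means nothing is listening there
nc -zv 192.168.114.85 2380
nc -zv 192.168.114.86 2380
```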
l
Yup. After enabling --debug --v=9 … no nodes are starting. Not saying that it’s caused by this, just how it happened to go. So I need to get at least one more node up.
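A minimal sketch of bringing another server back online, assuming the standard systemd install on the other two control-plane nodes:

```
# On each of the other server nodes: check the service, look at recent logs, then start it
systemctl status k3s --no-pager
journalctl -u k3s --no-pager -n 50
systemctl start k3s
```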