# k3s
c
I highly doubt it; what makes you suspect it is at all related?
Do you have journal logs to share from the nodes that are creating/deleting the snapshots?
I'm also confused when you say the apiserver container is crashing; k3s doesn't run the apiserver in a container...
b
I have all the logs in Loki so I can get you whatever you need. I'm running the k3s control plane as an agentless container in a pod on AKS - we briefly talked about it a long time ago when I had issues with the reverse connection to the nodes.
The etcd snapshots would be on the PVC where I store the database
Found this in the log
time="2024-03-23T03:05:40Z" level=error msg="Failed to record snapshots for cluster: nodes \"k3st-control-plane-0\" not found"
this is correct as this is an agentless instance - no node is registered for it
Other than that, no other errors
c
oh yeah. we don’t technically support embedded etcd with --disable-agent. It will get really confused when there is no Node resource for the host where etcd is running.
if you want to open an issue on GH I can take a look, but I am not at all surprised that it freaks out
b
I can imagine; that's why I only run a single control plane pod. Either way, something must have changed, as it worked fine before and I've been doing this for 2+ years. I can roll back to 1.27.7 to double check. And sure, I can open an issue.
c
Yes, we completely redid how snapshots are recorded in the cluster. There is now a CRD type that records what node took the snapshot, alongside other metadata
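From memory, the records show up as ETCDSnapshotFile objects in the k3s.cattle.io API group, so something like this should list them (treat the exact resource name as an approximation on my part):

```
# List the snapshot records the server now writes into the cluster
kubectl get etcdsnapshotfiles.k3s.cattle.io

# Inspect one record to see the node name and other metadata it carries
kubectl get etcdsnapshotfiles.k3s.cattle.io <snapshot-name> -o yaml
```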
When started with the `--disable-agent` flag, servers do not run the kubelet, container runtime, or CNI. They do not register a Node resource in the cluster, and will not appear in `kubectl get nodes` output. Because they do not host a kubelet, they cannot run pods or be managed by operators that rely on enumerating cluster nodes, including the embedded etcd controller and the system upgrade controller.
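So a setup like yours presumably boils down to a server started roughly like this - my guess at the shape of the command, not your actual manifest:

```
# Agentless control plane with embedded etcd: no kubelet, no CNI, and no
# Node object registered for the server itself.
# --cluster-init starts embedded etcd; --disable-agent skips the agent.
k3s server \
  --cluster-init \
  --disable-agent \
  --data-dir /var/lib/rancher/k3s
```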
why are you running a single node with embedded etcd instead of kine?
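kine just means pointing the server at an external SQL datastore instead of embedded etcd, something like this (the endpoint string is only an example):

```
# Single server backed by kine (SQL datastore) instead of embedded etcd
k3s server \
  --disable-agent \
  --datastore-endpoint="postgres://k3s:password@db.example.com:5432/k3s"
```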
b
Well, at first I had 3 control plane nodes at home, so I started off with etcd and just didn't want to start over after I moved this cluster's control plane to AKS. Replicating etcd through a VPN between two locations was too unreliable, and having a single control plane instance in AKS is far more reliable.
So at the time I just moved the etcd database and performed a restore there. Worked fine since.
With a little init container workaround that fixes up the etcd node IP, since it changes every time the pod starts.
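Roughly the idea is something like this - a sketch, not the exact script, and the config path is an assumption about where k3s keeps its etcd config:

```
#!/bin/sh
# Sketch of the init-container idea: rewrite the IP etcd advertises so it
# matches the pod's current IP before k3s starts. The config path below is
# assumed; adjust to wherever k3s writes its etcd config in your data dir.
set -eu

ETCD_CONF=/var/lib/rancher/k3s/server/db/etcd/config
CURRENT_IP="$(hostname -i | awk '{print $1}')"

# Swap whatever IP the previous run recorded for the pod's new IP
# (2380 = etcd peer port, 2379 = etcd client port).
sed -i -E "s#https://[0-9.]+:2380#https://${CURRENT_IP}:2380#g" "$ETCD_CONF"
sed -i -E "s#https://[0-9.]+:2379#https://${CURRENT_IP}:2379#g" "$ETCD_CONF"
```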
Would disabling snapshots resolve it?
c
Most likely. You could also try creating a dummy node object that matches the container hostname, and setting the various labels and annotations that the controller expects.
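For reference, --etcd-disable-snapshots is the server flag that turns snapshots off entirely. If you go the dummy-node route instead, the rough shape would be something like this; the exact labels and annotations the controller looks for are an assumption on my part, so compare against a Node from a normal etcd server first:

```
# Rough sketch of a dummy Node for the agentless server. Label and annotation
# names/values are assumptions; verify against a Node from a regular etcd node.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Node
metadata:
  name: k3st-control-plane-0              # must match the container hostname
  labels:
    node-role.kubernetes.io/etcd: "true"
    node-role.kubernetes.io/control-plane: "true"
  annotations:
    etcd.k3s.cattle.io/node-name: k3st-control-plane-0
    etcd.k3s.cattle.io/node-address: 10.0.0.10   # the etcd/pod IP
EOF
```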
Please do open an issue though; I think we could probably improve the handling of this for the April releases, so that you can keep using snapshots without the extra bits of functionality that expect a Node resource to exist for the host that’s running etcd.
this is actually super easy to repro, I made one for you: https://github.com/k3s-io/k3s/issues/9774
b
Thank you, Brandon. I went to bed as it was 4 am and woke up to you having done all the repro work. Luckily it was an easy repro - in the meantime I think I'll step back to a release that doesn't have this issue. Could you let me know which release of 1.27 introduced the new behavior? Looks like 1.27.8+k3s2?
c
first 1.27 release with the new snapshot management in it was v1.27.7+k3s1
b
Thanks for doing the fix so quickly and yeah, that's strange - I rolled back to 1.27.7+k3s2 and I don't have this issue anymore. Snapshots work fine. Let me know if there's a dev build you want me to try.
c
What you’re running into might have been specific to some of the enhancements we added in one of the subsequent patch releases. There was some tinkering with it after the fact.