#general

happy-branch-33441

02/28/2023, 6:33 PM
What might cause a node to be NotReady after coming up again? For example, if we take a snapshot of a k3s node and restore it, we see the node come up in a healthy running state, then immediately go NotReady for quite a while, and only after many minutes come back to Ready. Is there any way to debug why that might be, aside from inspecting describe output?

creamy-pencil-82913

02/28/2023, 6:35 PM
I am assuming you’re talking about a single-node cluster, and not snapshotting random workers in a larger cluster?

happy-branch-33441

02/28/2023, 6:37 PM
yeah that’s correct

creamy-pencil-82913

02/28/2023, 6:43 PM
well the node is Ready when you snapshot it, right?
When you bring it back up, the node is still in the datastore as Ready

happy-branch-33441

02/28/2023, 6:44 PM
right

creamy-pencil-82913

02/28/2023, 6:44 PM
It won’t get moved to NotReady until the controller manager comes up and notices that the kubelet isn’t reporting in
so it’ll go to NotReady as the kubelet initializes, and then back to Ready
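(A quick way to watch that transition, as a sketch; the node name is a placeholder, and the exact timing depends on the controller manager's node-monitor-grace-period:)

    # Watch the node object flip Ready -> NotReady -> Ready after the restore
    kubectl get node <node-name> -w

    # Print just the Ready condition's status and reason
    kubectl get node <node-name> \
      -o jsonpath='{.status.conditions[?(@.type=="Ready")].status} {.status.conditions[?(@.type=="Ready")].reason}{"\n"}'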

happy-branch-33441

02/28/2023, 6:44 PM
I’m not surprised that it goes into a NotReady state intermittently, but once it does it takes like 5-15 minutes to go back into a Ready state

creamy-pencil-82913

02/28/2023, 6:46 PM
You could probably check the kubelet logs or just describe it to see what’s blocking readiness
I would probably recommend against snapshotting the whole node though, in favor of using etcd and taking etcd snapshots
unless for some reason you have other stuff on there that you’re trying to capture in the snapshot, and you’re OK with a potentially inconsistent datastore state
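(A minimal sketch of both suggestions, assuming a systemd-based k3s install; the node name and snapshot filename are placeholders, and the etcd-snapshot commands only apply if the server was started with the embedded etcd datastore, e.g. --cluster-init, rather than the default SQLite:)

    # See which condition is blocking readiness
    kubectl describe node <node-name>

    # On k3s the kubelet runs inside the k3s process, so its logs are in the service journal
    journalctl -u k3s -f

    # Take an on-demand snapshot of the embedded etcd datastore
    k3s etcd-snapshot save

    # Restore from a snapshot (resets the cluster to the snapshot's state)
    k3s server --cluster-reset \
      --cluster-reset-restore-path=/var/lib/rancher/k3s/server/db/snapshots/<snapshot-file>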

happy-branch-33441

02/28/2023, 6:58 PM
makes sense; and yeah, I totally understand etcd snapshots are preferable, they're just not 100% feasible in our slightly odd use case (effectively using k3s as a replacement for a single-node docker-compose type thing)