# rke2
c
restore from an etcd snapshot?
q
@creamy-pencil-82913 i've never done it, so i don't know the pitfalls of doing so 😞
will i lose the longhorn vol that's on that node, because i'm restoring from another node?
c
etcd snapshots are just cluster datastore state. It's not like restoring from a VM snapshot.
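For context, the snapshots are just files on each server node, and you can take and list them with the rke2 CLI. Roughly like this, assuming the default data dir (check the docs for your version):

```bash
# default location of etcd snapshots on an rke2 server node
ls -lh /var/lib/rancher/rke2/server/db/snapshots/

# take an on-demand snapshot (scheduled snapshots are also taken automatically)
rke2 etcd-snapshot save --name pre-maintenance

# list the snapshots rke2 knows about
rke2 etcd-snapshot list
```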
q
so in theory, i can fix my problem the hard way (restore vms from backups i have) and leave the one node dead. then in off-hours i can take a full backup of all vms, then try the snapshot restore for etcd to bring rke on the bad harvester node back online? does that sound right?
c
I would think that restoring the whole VM from backup is probably a lot more work than just restoring one of the servers from an etcd snapshot, and then rejoining the others to the cluster. Assuming you took the etcd snapshot when the LH volume was present in the cluster, and the LH data itself is not also corrupted.
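Roughly, the documented restore flow looks like this. Treat it as a sketch (the snapshot filename is a placeholder; double-check against the RKE2 backup/restore docs for your version):

```bash
# 1. stop rke2-server on ALL server nodes
systemctl stop rke2-server

# 2. on the node you are restoring, reset the cluster from the snapshot
rke2 server \
  --cluster-reset \
  --cluster-reset-restore-path=/var/lib/rancher/rke2/server/db/snapshots/<snapshot-file>

# 3. when the reset finishes, start the service again on that node
systemctl start rke2-server

# 4. on each OTHER server node, wipe the old etcd data so it rejoins cleanly
rm -rf /var/lib/rancher/rke2/server/db
systemctl start rke2-server
```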
q
i would think that too. it's just more that i've never done it, and if things go sideways because i did something wrong, losing the whole cluster when i'm already about 60% done restoring the one vm is a huge issue. whereas as of now i'm almost through it, so my thought is to get through it, get the vm up and running, then try the snapshot method later when it can be a bit more controlled.
this is more like figuring out how to deal with it if it happens again, w/o risking losing the whole cluster in the middle of the day. if that makes sense.
@creamy-pencil-82913 so this is odd. the node eventually (after several hours) came back online. is that an expected behavior?
c
with only a single log message to work off of, I can’t say whether anything is expected or not.
q
so for next time, is there a better spot to pull that log? considering i can't really pull a support bundle because k8s isn't starting
c
just grab all the rke2 logs from journald, and the kube-system logs from /var/log/pods. Sometimes the kubelet and containerd logs are helpful too, depending on the problem.
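Something like this should grab all of it (default RKE2 data dir assumed; use rke2-agent instead of rke2-server on agent-only nodes):

```bash
# rke2 service logs from journald
journalctl -u rke2-server --no-pager > rke2-server.log

# kubelet and containerd logs live under the rke2 data dir
cp /var/lib/rancher/rke2/agent/logs/kubelet.log .
cp /var/lib/rancher/rke2/agent/containerd/containerd.log .

# container logs for kube-system pods are written to disk even when the apiserver is down
tar czf kube-system-pod-logs.tar.gz /var/log/pods/kube-system_*
```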