# rke2
c
restore from an etcd snapshot?
q
@creamy-pencil-82913 i've never done it, so i don't know the pitfalls of doing so 😞
will i lose the longhorn vol that's on that node, because i'm restoring from another node?
c
etcd snapshots are just cluster datastore state. It's not like restoring from a VM snapshot.
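For context, the snapshots are just files on each server node, and you can take and list them with the rke2 CLI. Roughly like this, assuming the default data dir (check the docs for your version):

```bash
# default location of etcd snapshots on an rke2 server node
ls -lh /var/lib/rancher/rke2/server/db/snapshots/

# take an on-demand snapshot (scheduled snapshots are also taken automatically)
rke2 etcd-snapshot save --name pre-maintenance

# list the snapshots rke2 knows about
rke2 etcd-snapshot list
```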
q
so in theory, i can fix my problem the hard way (restore vms from backups i have) and leave the one node dead. then in off-hours i can take a full backup of all vms, then try the snapshot restore for etcd to bring rke on the bad harvester node back online? does that sound right?
c
I would think that restoring the whole VM from backup is probably a lot more work than just restoring one of the servers from an etcd snapshot, and then rejoining the others to the cluster. Assuming you took the etcd snapshot when the LH volume was present in the cluster, and the LH data itself is not also corrupted.
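Roughly, the documented restore flow looks like this. Treat it as a sketch (the snapshot filename is a placeholder; double-check against the RKE2 backup/restore docs for your version):

```bash
# 1. stop rke2-server on ALL server nodes
systemctl stop rke2-server

# 2. on the node you are restoring, reset the cluster from the snapshot
rke2 server \
  --cluster-reset \
  --cluster-reset-restore-path=/var/lib/rancher/rke2/server/db/snapshots/<snapshot-file>

# 3. when the reset finishes, start the service again on that node
systemctl start rke2-server

# 4. on each OTHER server node, wipe the old etcd data so it rejoins cleanly
rm -rf /var/lib/rancher/rke2/server/db
systemctl start rke2-server
```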
q
i would think that too. it's just more that i've never done it, and if things go sideways because i did something wrong, losing the whole cluster when i'm already about 60% done restoring the one vm is a huge issue. whereas as of now i'm almost through it, so my thought is to get through it, get the vm up and running, then try the snapshot method later when it can be a bit more controlled.
this is more like figuring out how to deal with it if it happens again, w/o risking losing the whole cluster in the middle of the day. if that makes sense.
@creamy-pencil-82913 so this is odd. the node eventually (after several hours) came back online. is that an expected behavior?
c
with only a single log message to work off of, I can’t say whether anything is expected or not.
q
so for next time, is there a better spot to pull that log? considering i can't really pull a support bundle because k8s isn't starting
c
just grab all the rke2 logs from journald, and the kube-system logs from /var/log/pods. Sometimes the kubelet and containerd logs are helpful too, depending on the problem.
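Something like this should grab all of it (default RKE2 data dir assumed; use rke2-agent instead of rke2-server on agent-only nodes):

```bash
# rke2 service logs from journald
journalctl -u rke2-server --no-pager > rke2-server.log

# kubelet and containerd logs live under the rke2 data dir
cp /var/lib/rancher/rke2/agent/logs/kubelet.log .
cp /var/lib/rancher/rke2/agent/containerd/containerd.log .

# container logs for kube-system pods are written to disk even when the apiserver is down
tar czf kube-system-pod-logs.tar.gz /var/log/pods/kube-system_*
```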