
numerous-barista-78936

06/14/2022, 6:34 AM
Here's my desperate last-resort call for help, haha. We had a power outage (no UPS yet), and the cluster no longer starts up. kubectl is unreachable because rke2 keeps stopping. To nobody's surprise, it looks like the usual culprit: the etcd database is corrupted. This is a 2-node cluster; what are my options for recovery? Really hoping I can still pull the VMs off the machines.

great-bear-19718

06/14/2022, 11:43 PM
This may be worth a shot: rke2 should be creating snapshots. https://docs.rke2.io/backup_restore/
You could use one of the snapshots to trigger a restore and then add the other node back.
You can also back up the volume data before this if needed: https://longhorn.io/docs/1.2.4/advanced-resources/data-recovery/export-from-replica/
I hope this helps.
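For reference, a rough sketch of the snapshot restore flow, based on the RKE2 backup/restore docs linked above. It only prints the commands so they can be reviewed before running anything. Assumptions: snapshots are in the default directory `/var/lib/rancher/rke2/server/db/snapshots`, the service is the systemd unit `rke2-server`, and the helper names `latest_snapshot` and `print_restore_plan` are made up for illustration.

```shell
#!/bin/sh
# Sketch only: prints the restore commands instead of executing them.
# Assumed default snapshot location; change it if the snapshot dir was customized.
SNAP_DIR="${SNAP_DIR:-/var/lib/rancher/rke2/server/db/snapshots}"

# Hypothetical helper: print the name of the most recently modified file
# in the given directory (prints nothing if the directory is empty or missing).
latest_snapshot() {
    ls -1t "$1" 2>/dev/null | head -n 1
}

# Hypothetical helper: print the restore steps for the first server node,
# using the newest snapshot found in the given directory.
print_restore_plan() {
    snap="$(latest_snapshot "$1")"
    if [ -z "$snap" ]; then
        echo "no snapshots found in $1" >&2
        return 1
    fi
    echo "systemctl stop rke2-server"
    echo "rke2 server --cluster-reset --cluster-reset-restore-path=$1/$snap"
    echo "systemctl start rke2-server"
}
```

Calling `print_restore_plan "$SNAP_DIR"` on the server node lists the three steps with the newest snapshot filled in. If the second node is also a server, the docs describe stopping `rke2-server` there, removing `/var/lib/rancher/rke2/server/db`, and starting the service again so it rejoins the restored cluster. Copying the volume data off first, as mentioned above, is the safer order.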

numerous-barista-78936

06/15/2022, 12:05 AM
I'll try this when I'm back on campus tomorrow, appreciate the advice 👍