# harvester
w
I think this cluster is toast. I can't get RKE2 to come up properly either.
Restore also failed: `no-device: failed to run wicked ifreload all: exit status 157`
s
Hi @witty-honey-18052, I just found your message here. Is this the root reason you want to recreate the cluster through recovery mode? If you want to change the VIP, maybe you could refer to this issue? (https://github.com/harvester/harvester/issues/4584)
w
I didn't want to change the VIP. The VIP was lost and it never recovered. If anything, I was hoping to restore it.
We experienced some sort of catastrophic failure when the cluster nodes were errantly moved to a different DHCP server. We moved the nodes back to the proper server and updated the IP reservations, but the VIP never came back. Then, when we tried to debug the nodes, we couldn't even use kubectl; the RKE2 cluster was inaccessible locally. The troubleshooting steps were not helpful, because I couldn't use kubectl and `ip addr` did not show the previous VIP on any of the nodes.
What I was hoping was that I could run recovery to fix the RKE2 issue. Maybe the better option would have been "binaries only"? But at that point I wasn't certain I was using recovery as intended anyway. I figured it couldn't hurt to try if I was going to have to rebuild everything anyway.
I think my issue with the lost VIP is different and needs more exploration. But I at least figured I'd report the issue with the recovery installer in the meantime (whether or not it would have fixed my VIP issue).
s
Thanks @witty-honey-18052, I've added the situation above to the GitHub issue. Feel free to correct me if I've misunderstood something. We can track it there. Thanks!
w
Hi @salmon-city-57654, just a heads up, I'm at SUSEcon in person with my lab setup in tow, if anyone on the team would like me to demonstrate/debug the issue
s
Hi @witty-honey-18052, sorry for the late reply. AFAIK we have a teammate who will attend SUSECon, but I am unsure about his schedule. cc @prehistoric-balloon-31801, could we help with this issue through a live demo or debugging session? BTW, I saw you mentioned on the GH issue that you could reproduce it. Could you also provide the steps to reproduce there? I've summarized the steps to reproduce; could you help verify them? Thanks!
w
@prehistoric-balloon-31801 I'll have some time today and tomorrow. Will you be at the solution showcase at any particular time?
p
Hi Colin, I won't attend the conference.
w
oh shoot
p
What's the current state? I can bring our network guy into this thread.
w
It still occurred in the 1.3.1 RC, but I haven't tested with 1.3.1 yet.
My action item is to open a GitHub issue specific to it. I was hoping to show someone or talk through it first to validate it.
p
From the thread, it looks like RKE2 is not coming up correctly. Can you gather `journalctl -u rke2-server`?
Yes, please open an issue.
w
that was my thought as well
Could this also be a symptom? None of the nodes displayed the VIP in `ip addr show mgmt-br`.
Not urgent; if y'all aren't at SUSECon, I can address it later when I get back to the US.
I definitely feel like it's worthwhile to investigate.