# longhorn-storage
m
This doc might be of help: https://docs.rke2.io/upgrades/automated_upgrade#configure-plans
I'd set a label on the nodes you want to upgrade. Once they're done, remove the labels and label the next nodes.
I'm assuming you're doing the upgrade from the UI, which uses the system-upgrade-controller.
n
Just for context, the way we've been doing our upgrades is by selecting the cluster -> edit cluster config -> change the version in the dropdown -> save
is the automated upgrade something completely different?
(I should mention this is RKE2 managed by Rancher)
@mysterious-animal-29850 I would appreciate if you could clarify 🙂 I can provide some more information if needed.
m
It is the same mechanism, but with a plan you can define more options. The UI won't give you the match expression options that you can set in the Plan CRD.
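A minimal sketch of the kind of Plan the linked RKE2 docs describe, built in Python and printed as JSON (which `kubectl apply -f` also accepts). It assumes the system-upgrade-controller is installed with a `system-upgrade` namespace and service account, as in the docs' examples; the `rke2-upgrade` label key, the plan name and the version string are placeholders. The agent plans in the docs also include a `prepare` step that waits for the server plan, which is left out here.

```python
import json

# Hypothetical label key used to opt nodes into the upgrade; pick your own.
UPGRADE_LABEL = "rke2-upgrade"


def make_agent_plan(version: str, label: str = UPGRADE_LABEL) -> dict:
    """Build a Plan that only targets labelled, non-control-plane nodes."""
    return {
        "apiVersion": "upgrade.cattle.io/v1",
        "kind": "Plan",
        "metadata": {"name": "agent-plan", "namespace": "system-upgrade"},
        "spec": {
            # Upgrade at most one matching node at a time.
            "concurrency": 1,
            # Select nodes that carry the label (not set to "disabled"/"false")
            # and are not control-plane nodes.
            "nodeSelector": {
                "matchExpressions": [
                    {"key": label, "operator": "Exists"},
                    {"key": label, "operator": "NotIn", "values": ["disabled", "false"]},
                    {
                        "key": "node-role.kubernetes.io/control-plane",
                        "operator": "NotIn",
                        "values": ["true"],
                    },
                ]
            },
            "serviceAccountName": "system-upgrade",
            # Cordon the node before upgrading it.
            "cordon": True,
            "upgrade": {"image": "rancher/rke2-upgrade"},
            # Placeholder target version; replace with the release you want.
            "version": version,
        },
    }


if __name__ == "__main__":
    print(json.dumps(make_agent_plan("v1.23.1-rke2r2"), indent=2))
```

With this selector, only nodes you have labelled are touched, so moving the label is what controls which node is upgraded next.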
n
@mysterious-animal-29850 To be clear, does this mean that, when using the method I described (no plan configuration), there are absolutely no guardrails in place by default to prevent Rancher upgrades from reprovisioning all Longhorn nodes and causing data loss?
m
That does not mean there aren't any guardrails. If you look at your cluster config under Edit, under Upgrade Strategy, Rancher defaults to upgrading only 1 control plane node and 1 worker node at a time. You can modify that to your needs. I was just sharing the plan option in case you need more control.
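A simplified model of what the concurrency setting under Upgrade Strategy limits: nodes are taken down in batches no larger than the concurrency value. This is an illustration under that assumption, not Rancher's actual scheduler, and `upgrade_batches` is a hypothetical helper. As far as I know, these settings appear in the cluster YAML as `rkeConfig.upgradeStrategy.controlPlaneConcurrency` and `rkeConfig.upgradeStrategy.workerConcurrency`.

```python
def upgrade_batches(nodes: list[str], concurrency: int) -> list[list[str]]:
    """Split nodes into consecutive batches of at most `concurrency` nodes (simplified model)."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    return [nodes[i:i + concurrency] for i in range(0, len(nodes), concurrency)]


if __name__ == "__main__":
    workers = ["worker-a", "worker-b", "worker-c"]
    # With the default of 1, each batch holds a single worker.
    for batch in upgrade_batches(workers, 1):
        print(batch)
```

Nothing in this model waits for Longhorn replicas to finish rebuilding between batches; a concurrency of 1 only limits how many nodes are disrupted at once.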
n
What I fear is this (might be wrong, so please correct me): Rancher reprovisions all the old longhorn nodes before the new nodes have had time to rebuild/sync fully. So, in effect most (if not all) of the data would be lost.
m
Keep in mind, I'm not a rancher dev, I'm just sharing info from what I learned over the years using Rancher products in production.
n
all good, I appreciate your input nonetheless, and I'll keep that in mind.
What we've been observing is that Rancher often reprovisions (i.e. cordons the node, spins up a new node, then deletes the old one) instead of upgrading in place.
this is making me nervous with regards to data protection
m
I understand that concern too; I've run Longhorn in prod. If you want to upgrade your Longhorn storage nodes and wait for the replicas to rebuild in between, the Plan CRD is the best option. You can label a node and set the match expressions so only one node is upgraded at a time, then just move the label to the next node as each one completes.
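A minimal sketch of that label rotation, assuming a Plan whose nodeSelector matches a hypothetical `rke2-upgrade` label. For each node it adds the label, waits until the node's kubelet reports the target version, waits until every attached Longhorn volume reports `healthy` robustness, then removes the label (a trailing `-` in `kubectl label` deletes a label). The reliance on the Longhorn Volume fields `status.state` and `status.robustness` is my assumption about the CRD, and the function names are hypothetical. `kubectl` is injected as `run` so the loop can be exercised without a cluster.

```python
import json
import subprocess
import time
from typing import Callable, Sequence

LABEL = "rke2-upgrade"  # hypothetical label key; must match the Plan's nodeSelector


def run_kubectl(args: Sequence[str]) -> str:
    """Run kubectl and return its stdout (raises CalledProcessError on failure)."""
    return subprocess.run(["kubectl", *args], check=True, capture_output=True, text=True).stdout


def attached_volumes_healthy(volumes: dict) -> bool:
    """Check `kubectl get volumes.longhorn.io -o json` output; only attached volumes are considered."""
    attached = [v for v in volumes.get("items", []) if v.get("status", {}).get("state") == "attached"]
    return all(v["status"].get("robustness") == "healthy" for v in attached)


def rotate_upgrade_label(
    nodes: Sequence[str],
    target_kubelet: str,
    run: Callable[[Sequence[str]], str] = run_kubectl,
    poll_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Upgrade one node at a time, gating on node version and Longhorn volume health."""
    for node in nodes:
        # Opt this node into the Plan.
        run(["label", "node", node, f"{LABEL}=true", "--overwrite"])
        # Wait until the node reports the target kubelet version.
        version_query = ["get", "node", node, "-o", "jsonpath={.status.nodeInfo.kubeletVersion}"]
        while run(version_query).strip() != target_kubelet:
            sleep(poll_seconds)
        # Wait until all attached Longhorn volumes are healthy again.
        volume_query = ["-n", "longhorn-system", "get", "volumes.longhorn.io", "-o", "json"]
        while not attached_volumes_healthy(json.loads(run(volume_query))):
            sleep(poll_seconds)
        # Remove the label so the Plan no longer selects this node.
        run(["label", "node", node, f"{LABEL}-"])
```

A real script would also want a timeout on both waits and a short settle delay after the version check, since Longhorn may take a moment to mark volumes as degraded.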
Wait, what? Please clarify the part about cordoning and spinning up a new node. Like, it creates a new node? Are you using vSphere or some other provider?
n
@mysterious-animal-29850 yes, we've been experiencing reprovisioning (as opposed to in-place updating) when doing kubernetes upgrades. It also happens during other config modifications, like adding cluster owners, etc.
This is the root problem, in fact
We use Rancher to manage RKE2 clusters running on bare EC2
m
Running on EC2 means you're using the AWS cloud provider, which I know nothing about, sorry.
Also, you don't need to @ me for every response. I get notified about new replies in threads I'm part of. Thanks
n
Sorry, I'm not familiar with Slack. Understood.
m
No worries. Hope someone else can help when they have a moment.