# harvester
b
Hey all - sharing a workaround we have for problematic upgrades. Previously we had issues with Harvester randomly shutting off VMs during an upgrade, which forced us to schedule outages for our hosted applications. VMs seem to be susceptible to this if they fail migration during the node-drain phase of the upgrade process. We now have an approach that pauses the upgrade just before the node drain until a human operator marks the node as ready for upgrade. This gives us full control to move VMs and verify everything is healthy before we proceed. It is slightly involved / a Rube Goldberg machine, but the steps are:
1. Create a ConfigMap in the `harvester-system` namespace with a bash script that waits for `/tmp/node_ready` to exist and then calls `do_upgrade_node.sh pre-drain`.
2. Create a MutatingWebhookConfiguration to intercept just the pre-drain jobs, which carry a specific pair of labels: `harvesterhci.io/upgradeComponent: node` and `harvesterhci.io/upgradeJobType: pre-drain`.
3. Create a webhook controller that processes this and mutates the job request so that it mounts your ConfigMap with the custom script and replaces the container command with one that runs the custom script.
Now the upgrade pre-drain job, which triggers the drain and the shutdown of non-migratable VMs, won't run until someone creates the `/tmp/node_ready` file on the upgrade pre-drain pod. Hopefully this helps someone else.
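For illustration, steps 1 and 2 above might look roughly like the following manifests. This is a hedged sketch, not the poster's actual files: the resource names (`pre-drain-wait`, `pre-drain-webhook`), the script filename, and the webhook service details are all assumptions I've made up for the example.

```yaml
# Step 1 (sketch): ConfigMap holding the wait script.
apiVersion: v1
kind: ConfigMap
metadata:
  name: pre-drain-wait            # illustrative name
  namespace: harvester-system
data:
  wait_for_operator.sh: |
    #!/bin/bash
    # Block until an operator creates the marker file, then run the
    # original pre-drain step.
    while [ ! -f /tmp/node_ready ]; do sleep 10; done
    do_upgrade_node.sh pre-drain
---
# Step 2 (sketch): intercept only the pre-drain jobs via their labels.
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: pre-drain-pause           # illustrative name
webhooks:
  - name: pre-drain-pause.example.com
    admissionReviewVersions: ["v1"]
    sideEffects: None
    clientConfig:
      service:
        name: pre-drain-webhook   # your controller's Service
        namespace: harvester-system
        path: /mutate
    rules:
      - apiGroups: ["batch"]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["jobs"]
    objectSelector:
      matchLabels:
        harvesterhci.io/upgradeComponent: node
        harvesterhci.io/upgradeJobType: pre-drain
```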
We used Python for the controller so we could use the pre-existing `docker.io/longhornio/longhorn-engine` images, which ship with Python, and not have to worry about loading custom images onto the nodes.
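The core of such a controller is building the JSONPatch for step 3. Here is a minimal sketch of that logic, assuming the ConfigMap/script names from my earlier sketch (`pre-drain-wait-script`, `wait_for_operator.sh`); these names and the job's exact pod-template shape are assumptions, not details from the original post.

```python
import base64
import json

def make_patch(job: dict) -> list:
    """Build a JSONPatch for the pre-drain job's pod template.

    A real controller would inspect `job` (e.g. to pick the right
    container index); this sketch assumes container 0.
    """
    return [
        # Mount the ConfigMap containing the wait script.
        {"op": "add",
         "path": "/spec/template/spec/volumes/-",
         "value": {"name": "wait-script",
                   "configMap": {"name": "pre-drain-wait-script",
                                 "defaultMode": 0o755}}},
        {"op": "add",
         "path": "/spec/template/spec/containers/0/volumeMounts/-",
         "value": {"name": "wait-script", "mountPath": "/scripts"}},
        # Swap the command so the job blocks until /tmp/node_ready exists.
        {"op": "replace",
         "path": "/spec/template/spec/containers/0/command",
         "value": ["/bin/bash", "/scripts/wait_for_operator.sh"]},
    ]

def admission_response(review: dict) -> dict:
    """Wrap the patch in an AdmissionReview response, as the API server expects."""
    patch = make_patch(review["request"]["object"])
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": {
            "uid": review["request"]["uid"],
            "allowed": True,
            "patchType": "JSONPatch",
            "patch": base64.b64encode(json.dumps(patch).encode()).decode(),
        },
    }
```

The controller then just serves this over HTTPS at the path the MutatingWebhookConfiguration points to.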
It would be very preferable if the official upgrade process let us do something similar.
b
This sounds cool, especially for hosts that have non-migratable VMs because of a video card or something.
I don't work for SUSE, but I'd recommend opening two tickets against the GitHub project: • a feature request that describes the overall process; • a pull request with the ConfigMap, MutatingWebhookConfiguration, etc. YAML files you already have.
Then you can cross-link the two in comments. I'm sure they'll have some extra steps/requirements, but it's easier to get those in (or hand them off) with some existing patches/code.
b
Unfortunately I don't own the code I have written, so I won't be able to contribute it in full, but I will continue the discussion on https://github.com/harvester/harvester/issues/6145