# harvester
b
Hey all - sharing a workaround we have for problematic upgrades. Previously we had issues with Harvester randomly shutting off VMs during an upgrade, which forced us to schedule outages for our hosted applications. VMs seem to be susceptible to this if they fail migration during the node-drain phase of the upgrade process. We now have an approach that pauses the upgrade just before the node drain until a human operator marks the node as ready for upgrade. This gives us full control to move VMs and verify everything is healthy before we proceed. It is slightly involved / a Rube Goldberg machine, but the steps are:
1. Create a ConfigMap in the `harvester-system` namespace with a bash script that waits for `/tmp/node_ready` to exist and then calls `do_upgrade_node.sh pre-drain`.
2. Create a MutatingWebhookConfiguration to intercept just the pre-drain jobs, which carry a specific pair of labels: `harvesterhci.io/upgradeComponent: node` and `harvesterhci.io/upgradeJobType: pre-drain`.
3. Create a webhook controller that processes this and mutates the job request so that it mounts your ConfigMap with the custom script and replaces the container command with one that runs the custom script.
Now the upgrade pre-drain job, which triggers the drain and the shutdown of non-migratable VMs, won't run until someone creates the `/tmp/node_ready` file on the upgrade pre-drain pod. Hopefully this helps someone else.
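For illustration, steps 1 and 2 above might look roughly like the following manifests. This is a hedged sketch, not the poster's actual files: the resource names (`pre-drain-wait`, `pre-drain-webhook`), the script filename, and the webhook service details are all assumptions I've made up for the example.

```yaml
# Step 1 (sketch): ConfigMap holding the wait script.
apiVersion: v1
kind: ConfigMap
metadata:
  name: pre-drain-wait            # illustrative name
  namespace: harvester-system
data:
  wait_for_operator.sh: |
    #!/bin/bash
    # Block until an operator creates the marker file, then run the
    # original pre-drain step.
    while [ ! -f /tmp/node_ready ]; do sleep 10; done
    do_upgrade_node.sh pre-drain
---
# Step 2 (sketch): intercept only the pre-drain jobs via their labels.
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: pre-drain-pause           # illustrative name
webhooks:
  - name: pre-drain-pause.example.com
    admissionReviewVersions: ["v1"]
    sideEffects: None
    clientConfig:
      service:
        name: pre-drain-webhook   # your controller's Service
        namespace: harvester-system
        path: /mutate
    rules:
      - apiGroups: ["batch"]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["jobs"]
    objectSelector:
      matchLabels:
        harvesterhci.io/upgradeComponent: node
        harvesterhci.io/upgradeJobType: pre-drain
```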
We used Python for the controller so we could use the pre-existing `docker.io/longhornio/longhorn-engine` images, which ship with Python, and not have to worry about loading custom images onto the nodes.
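The core of such a controller is building the JSONPatch for step 3. Here is a minimal sketch of that logic, assuming the ConfigMap/script names from my earlier sketch (`pre-drain-wait-script`, `wait_for_operator.sh`); these names and the job's exact pod-template shape are assumptions, not details from the original post.

```python
import base64
import json

def make_patch(job: dict) -> list:
    """Build a JSONPatch for the pre-drain job's pod template.

    A real controller would inspect `job` (e.g. to pick the right
    container index); this sketch assumes container 0.
    """
    return [
        # Mount the ConfigMap containing the wait script.
        {"op": "add",
         "path": "/spec/template/spec/volumes/-",
         "value": {"name": "wait-script",
                   "configMap": {"name": "pre-drain-wait-script",
                                 "defaultMode": 0o755}}},
        {"op": "add",
         "path": "/spec/template/spec/containers/0/volumeMounts/-",
         "value": {"name": "wait-script", "mountPath": "/scripts"}},
        # Swap the command so the job blocks until /tmp/node_ready exists.
        {"op": "replace",
         "path": "/spec/template/spec/containers/0/command",
         "value": ["/bin/bash", "/scripts/wait_for_operator.sh"]},
    ]

def admission_response(review: dict) -> dict:
    """Wrap the patch in an AdmissionReview response, as the API server expects."""
    patch = make_patch(review["request"]["object"])
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": {
            "uid": review["request"]["uid"],
            "allowed": True,
            "patchType": "JSONPatch",
            "patch": base64.b64encode(json.dumps(patch).encode()).decode(),
        },
    }
```

The controller then just serves this over HTTPS at the path the MutatingWebhookConfiguration points to.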
It would be very preferable if the official upgrade process let us do something similar.
b
This sounds cool, especially for hosts that have non-migratable VMs because of a video card or something.
I don't work for SUSE, but I'd recommend opening two tickets against the GitHub project: • a feature request that describes the overall process; • a pull request with the ConfigMap, MutatingWebhookConfiguration, etc. YAML files you already have.
Then you can cross-link the two in comments. I'm sure they'll have some extra steps/requirements, but it's easier to get those in (or hand them off) with some existing patches/code.
b
Unfortunately I don't own the code I have written, so I won't be able to contribute it in full, but I will continue the discussion on https://github.com/harvester/harvester/issues/6145