
creamy-autumn-87994

05/16/2023, 1:19 AM
Hi, I have a test cluster on vSphere with a separate server/agent worker pool. I was looking to update the worker nodes to a newer VM template (with a larger Longhorn disk), so I selected the template, saved, and Rancher proceeded to “update” the nodes one by one. Well, after all was said and done, the volumes were never migrated and I ended up with no replicas available. Is it not possible to upgrade worker nodes this way?
Looks like the best way is to create a new worker pool, migrate the volumes, cordon/drain the old pool, and then delete the old pool.
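A minimal sketch of that manual migration, assuming Longhorn is installed in the default longhorn-system namespace and using a placeholder node name from the old pool:

```bash
# Placeholder node name from the old worker pool; adjust to your cluster.
OLD_NODE=worker-old-1

# Ask Longhorn to stop scheduling replicas on the old node and to evict the
# replicas it already holds onto other healthy nodes (Longhorn Node CR fields).
kubectl -n longhorn-system patch nodes.longhorn.io "$OLD_NODE" --type merge \
  -p '{"spec":{"allowScheduling":false,"evictionRequested":true}}'

# Once the node no longer holds replicas, cordon and drain it; after that the
# old pool can be removed from the Rancher cluster config.
kubectl cordon "$OLD_NODE"
kubectl drain "$OLD_NODE" --ignore-daemonsets --delete-emptydir-data
```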

aloof-branch-69545

05/16/2023, 2:58 AM
For node updates, you can refer to this doc: https://longhorn.io/docs/1.4.2/volumes-and-nodes/maintenance/ Thanks!

creamy-autumn-87994

05/16/2023, 5:35 PM
I think this is a little different. The article assumes you have control over the upgrade process. When you update the cluster config for an RKE2 cluster, Rancher automatically replaces the nodes one by one without user intervention. The workaround would be to keep the same template name when replacing it, but once you adjust CPU or memory, Rancher starts auto-replacing nodes. This gives you no time to do manual cordoning and draining.
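One way to watch this happen during such a rollout, as a sketch (assuming the default longhorn-system namespace): check each volume’s robustness while Rancher replaces nodes; anything stuck at degraded or faulted means replicas were not rebuilt before the next node was removed.

```bash
# Report each Longhorn volume's health: "degraded" means replicas are missing,
# "faulted" means no usable replica is left.
kubectl -n longhorn-system get volumes.longhorn.io \
  -o custom-columns=NAME:.metadata.name,STATE:.status.state,ROBUSTNESS:.status.robustness
```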

aloof-branch-69545

05/17/2023, 1:39 AM
Got it! So I would like to know more about the vSphere provisioner’s behavior when updating nodes with a new template. Will the provisioner upgrade the nodes one by one and wait until the workload on each node has been migrated? Could you provide some docs for RKE/vSphere? We can also take a look to see if we can add a QA test for this.
The issue here might be that the node upgrade is too fast and didn’t wait for the workload to be migrated (replicas should be copied to other healthy nodes), so in the end there was no healthy replica left in the cluster.
A better way is to migrate the volumes/workloads to a temporary node pool, update the current node pool, and then migrate the volumes back to the current node pool by cordoning/draining the temporary pool.