# harvester
k
Hi All, we have 2 clusters with 3-node Harvester HCI. The first one upgraded successfully from v1.4.3 to v1.5.1 almost 3 weeks ago; the second one started the upgrade last week but is now stuck. Here is the screenshot --- the first node is already upgraded (its RKE2 version is new), and we have already tried rebooting that server (the 1st node), but it's still the same.
# kubectl get pods -n harvester-system | grep Pending
harvester-cb7778455-mr8vn                                   0/1     Pending     0                17h
harvester-webhook-76ddd86c6f-mvnm9                          0/1     Pending     0                4d12h
# kubectl get pods -n harvester-system | grep upgrade
hvst-upgrade-8fkgc-apply-manifests-k2td5                    0/1     Completed   0                4d15h
hvst-upgrade-8fkgc-upgradelog-agent-fluentbit-h445p         1/1     Running     0                4d14h
hvst-upgrade-8fkgc-upgradelog-agent-fluentbit-lwqxf         1/1     Running     0                4d14h
hvst-upgrade-8fkgc-upgradelog-agent-fluentbit-sls67         1/1     Running     1 (4d11h ago)    4d14h
hvst-upgrade-8fkgc-upgradelog-downloader-5887c4545d-t8lbf   1/1     Running     0                4d15h
hvst-upgrade-8fkgc-upgradelog-infra-fluentd-0               2/2     Running     0                4d14h
hvst-upgrade-8fkgc-upgradelog-packager-rbhr8-tcqc8          0/1     Completed   0                17h
# kubectl get nodes
NAME           STATUS                     ROLES                       AGE    VERSION
cnfra-node10   Ready                      control-plane,etcd,master   432d   v1.31.7+rke2r1
cnfra-node8    Ready,SchedulingDisabled   control-plane,etcd,master   432d   v1.32.4+rke2r1
cnfra-node9    Ready                      control-plane,etcd,master   432d   v1.31.7+rke2r1
--- As per the upgrade flow, we are in phase 4, and we don't want to restart the upgrade. I'm also attaching a support bundle; maybe someone can help with how to proceed?
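In case it helps, a rough sketch of the commands that should show where the upgrade is stuck (the upgrade name hvst-upgrade-8fkgc is taken from the pod names above):
# kubectl -n harvester-system get upgrades.harvesterhci.io
# kubectl -n harvester-system describe upgrades.harvesterhci.io hvst-upgrade-8fkgc
# kubectl -n harvester-system describe pod harvester-cb7778455-mr8vn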
e
Hi Maksat, I believe the upgrade isn't progressing because the Harvester deployment (kubectl -n harvester-system get deployment harvester) specifies that during upgrades at most one replica may be unavailable. However, as long as cnfra-node8 is cordoned, only two of the three replicas of that deployment can be scheduled. Therefore it's not possible to upgrade another node without going below the minimum availability of that deployment. I don't know why that node is cordoned, but you need to solve that before the upgrade can continue, because the upgrade waits until it is able to schedule a replica of the Harvester deployment on that node.
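You can verify this yourself with something like the following (the pod name is taken from your output above; the last command shows the scheduling events that explain why the pod stays Pending):
# kubectl -n harvester-system get deployment harvester -o jsonpath='{.spec.strategy}'
# kubectl get nodes
# kubectl -n harvester-system describe pod harvester-cb7778455-mr8vn | grep -A 10 Events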
k
Hi Moritz, the cnfra-node8 node was cordoned by the upgrade process; I didn't touch anything. All I did was reboot the server. One thing that might be relevant: during the migration of one of the VMs, a big one (16 vCPU, 128 GB RAM, 2 TB disk), I connected to the VM and shut it down (to accelerate the upgrade process). If I uncordon the cnfra-node8 node, what happens with the Longhorn part? As far as I know, Longhorn will start syncing: since the replica count is 3 and the node is back online, all the volumes will start replicating.
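For the Longhorn side, I assume something like the following would show the volume and replica state once the node is uncordoned (assuming the default longhorn-system namespace):
# kubectl -n longhorn-system get nodes.longhorn.io
# kubectl -n longhorn-system get volumes.longhorn.io
# kubectl -n longhorn-system get replicas.longhorn.io | grep cnfra-node8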