# harvester
s
I'm thinking, kill the pod and see what happens...
I did that, and the logs seem to be showing the same looping behaviour.
Copy code
...
Upgrading Harvester
managedchart.management.cattle.io/harvester-crd patched (no change)
managedchart.management.cattle.io/harvester unchanged
managedchart.management.cattle.io/harvester patched
managedchart.management.cattle.io/harvester-crd patched
Waiting for ManagedChart fleet-local/harvester from generation 11
Target version: 1.0.3, Target state: modified
Current version: 1.0.3, Current state: ErrApplied, Current generation: 12
Sleep for 5 seconds to retry
Current version: 1.0.3, Current state: ErrApplied, Current generation: 12
Sleep for 5 seconds to retry
...
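(For reference, this is roughly how I killed the pod; the name prefix is an assumption based on the upgrade job names, so double-check it on your cluster.)
Copy code
# find the upgrade pod (hvst-upgrade-* name prefix assumed)
kubectl get pods -n harvester-system | grep hvst-upgrade
# delete it; the owning job should recreate it
kubectl delete pod <upgrade-pod-name> -n harvester-system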
Any ideas what I do now?
From what I can tell, the bash script that is running is looping around
kubectl get managedcharts.management.cattle.io harvester -n fleet-local -o yaml
and extracting some values. The summary at the end of the yaml says:
Copy code
  summary:
    desiredReady: 1
    errApplied: 1
    nonReadyResources:
    - bundleState: ErrApplied
      message: another operation (install/upgrade/rollback) is in progress
      name: fleet-local/local
    ready: 0
  unavailable: 1
  unavailablePartitions: 0
Is there anything else I can kill or kick in order to get past this issue?
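For reference, the values the script is scraping can be pulled straight out of the ManagedChart instead of grepping the full yaml; I'm assuming the summary block above sits under .status, so treat the paths as approximate:
Copy code
# bundle summary (field path assumed from the yaml above)
kubectl get managedcharts.management.cattle.io harvester -n fleet-local \
  -o jsonpath='{.status.summary}{"\n"}'
# just the error message from the non-ready resource
kubectl get managedcharts.management.cattle.io harvester -n fleet-local \
  -o jsonpath='{.status.summary.nonReadyResources[*].message}{"\n"}'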
p
Hi Mark, could you use helm command to check what’s the previous revision and if the workaround works: https://github.com/harvester/harvester/issues/2280#issuecomment-1132446373
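Roughly something like this; I'm assuming the release is named harvester and lives in the harvester-system namespace, which may differ on your install:
Copy code
# show the release history and find the last successfully deployed revision
helm history harvester -n harvester-system
# roll back to that revision (omit the revision number to go back one release)
helm rollback harvester <REVISION> -n harvester-system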
s
Thanks for the hint. Inspired by the information in that ticket, I discovered that the `helm` command I needed was `helm rollback harvester`. After running that and waiting a few minutes, the pod logs of the upgrade job I was watching started changing as the deployment happened; the `Upgrading System Service` phase completed successfully and one of the Harvester nodes is now in the `pre-drained` state. Hopefully the node won't get stuck in this state... It's only been 30 minutes...
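(In case it helps anyone hitting the same thing: the per-node state should also be visible on the Upgrade resource itself. The resource and field names here are from memory, so verify them first.)
Copy code
# the hvst-upgrade-* Upgrade object lists each node's state (e.g. pre-drained)
kubectl get upgrades.harvesterhci.io -n harvester-system -o yaml | grep -A 5 nodeStatuses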
Hmm... It's been sitting at `Pre-drained` for over 4 hours; the upgrade job has completed, but no reboot is happening.
Copy code
$ kubectl get jobs -n harvester-system -l harvesterhci.io/upgradeComponent=node --context=harvester003
NAME                                        COMPLETIONS   DURATION   AGE
hvst-upgrade-bz6gs-pre-drain-harvester003   1/1           5m         4h7m
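The cordoned node still shows up as SchedulingDisabled in a plain node listing:
Copy code
kubectl get nodes --context=harvester003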
Maybe I should reboot the node manually and see what happens...
Yep - rebooting the cordoned node got the process moving again: the node updated itself and the upgrade has moved on to cordon the next node.
It ended up that I needed to manually reboot two of the three nodes to get the upgrade to complete.