09/03/2022, 9:46 AM
09/03/2022, 10:50 AM
I'm thinking, kill the pod and see what happens...
I did that, and the logs seem to be showing the same looping behaviour.
Upgrading Harvester
<|> patched (no change)
<|> unchanged
<|> patched
<|> patched
Waiting for ManagedChart fleet-local/harvester from generation 11
Target version: 1.0.3, Target state: modified
Current version: 1.0.3, Current state: ErrApplied, Current generation: 12
Sleep for 5 seconds to retry
Current version: 1.0.3, Current state: ErrApplied, Current generation: 12
Sleep for 5 seconds to retry
Any ideas what I do now?
From what I can tell from the bash script that is running, the script is looping around
kubectl get <|> harvester -n fleet-local -o yaml
and extracting some values. The summary at the end of the yaml says:
    desiredReady: 1
    errApplied: 1
    - bundleState: ErrApplied
      message: another operation (install/upgrade/rollback) is in progress
      name: fleet-local/local
    ready: 0
  unavailable: 1
  unavailablePartitions: 0
Is there anything else I can kill or kick in order to get past this issue?
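For anyone following along, here is a minimal sketch of the kind of extraction the upgrade script appears to be doing. It is not the actual Harvester script: `summarize_state` is a hypothetical helper name, and it simply picks the `bundleState` and `message` fields out of whatever YAML is piped in.

```shell
#!/bin/sh
# Hypothetical helper (not part of Harvester): read YAML on stdin and print
# the last bundleState and message values seen in it.
summarize_state() {
  awk '
    /bundleState:/  { sub(/^.*bundleState:[ ]*/, ""); state = $0 }
    /^[ ]*message:/ { sub(/^[ ]*message:[ ]*/, ""); msg = $0 }
    END { printf "state=%s message=%s\n", state, msg }
  '
}
```

To use it, pipe the output of the `kubectl get ... -o yaml` command quoted above into `summarize_state`. A single line of output makes it easier to spot when the state finally moves away from ErrApplied.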


09/05/2022, 1:26 AM
Hi Mark, could you use the helm command to check what the previous revision is and whether the workaround works:


09/05/2022, 10:21 AM
Thanks for the hint. Inspired by the information in that ticket I discovered that the
command I needed was
helm rollback harvester
. After running that and waiting a few minutes, the pod logs of the upgrade job I was watching started changing as the deployment happened. The
Upgrading System Service
has successfully completed and one of the Harvester nodes is in the
state. Hopefully the node won't get stuck in this state... It's only been 30 minutes...
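For reference, a rough sketch of the helm steps mentioned above. The release name `harvester` comes from this thread. The `harvester-system` namespace is an assumption, as are the function names and the `HELM` override variable, which are only there for illustration.

```shell
#!/bin/sh
# Sketch only. Set HELM=echo to preview the commands instead of running them.
HELM="${HELM:-helm}"
NAMESPACE="${NAMESPACE:-harvester-system}"  # assumption: namespace of the release

# List the revisions of the harvester release and their status.
show_history() {
  "$HELM" history harvester -n "$NAMESPACE"
}

# Roll back the harvester release. With no argument helm rolls back to the
# previous revision; pass a revision number to pick a specific one.
rollback_release() {
  "$HELM" rollback harvester ${1:+"$1"} -n "$NAMESPACE"
}
```

Running `show_history` first shows which revision is stuck before `rollback_release` is called. The rollback can take a few minutes to show up in the upgrade job logs.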
Hmm... It's been sitting at
for over 4 hours, the upgrade job has completed, but there is no reboot happening.
$ kubectl get jobs -n harvester-system -l <|> --context=harvester003
NAME                                        COMPLETIONS   DURATION   AGE
hvst-upgrade-bz6gs-pre-drain-harvester003   1/1           5m         4h7m
Maybe I should reboot the node manually and see what happens...
Yep - a reboot of the cordoned node caused the process to continue: the node updated itself and the process has moved on to cordon the next node.
In the end, I needed to manually reboot 2 out of the 3 nodes to make the upgrade complete.