This message was deleted Rancher Users #harvester

Join Slack

This message was deleted.

# harvester

adamant-kite-43734

09/03/2022, 9:46 AM

This message was deleted.

sticky-summer-13450

09/03/2022, 10:50 AM

I'm thinking, kill the pod and see what happens...

I did that, and the logs seem to be showing the same looping behaviour.

Copy code

...
Upgrading Harvester
<http://managedchart.management.cattle.io/harvester-crd|managedchart.management.cattle.io/harvester-crd> patched (no change)
<http://managedchart.management.cattle.io/harvester|managedchart.management.cattle.io/harvester> unchanged
<http://managedchart.management.cattle.io/harvester|managedchart.management.cattle.io/harvester> patched
<http://managedchart.management.cattle.io/harvester-crd|managedchart.management.cattle.io/harvester-crd> patched
Waiting for ManagedChart fleet-local/harvester from generation 11
Target version: 1.0.3, Target state: modified
Current version: 1.0.3, Current state: ErrApplied, Current generation: 12
Sleep for 5 seconds to retry
Current version: 1.0.3, Current state: ErrApplied, Current generation: 12
Sleep for 5 seconds to retry
...

Any ideas what I do now?

sticky-summer-13450

09/04/2022, 8:49 AM

From what I can tell from the bash script that is running, the script is looping around

kubectl get <http://managedcharts.management.cattle.io|managedcharts.management.cattle.io> harvester -n fleet-local -o yaml

and extracting some values. The summary at the end of the yaml says:

Copy code

summary:
    desiredReady: 1
    errApplied: 1
    nonReadyResources:
    - bundleState: ErrApplied
      message: another operation (install/upgrade/rollback) is in progress
      name: fleet-local/local
    ready: 0
  unavailable: 1
  unavailablePartitions: 0

Is there anything else I can kill or kick in order to get past this issue?

prehistoric-balloon-31801

09/05/2022, 1:26 AM

Hi Mark, could you use helm command to check what’s the previous revision and if the workaround works: https://github.com/harvester/harvester/issues/2280#issuecomment-1132446373

sticky-summer-13450

09/05/2022, 10:21 AM

Thanks for the hint. Inspired by the information in that ticket I discovered that the

helm

command I needed was

helm rollback harvester

. After running that and waiting a few minutes, the pod logs of the upgrade job I was watching started changing, as the deployment happened, the

Upgrading System Service

has successfully completed and one of the Harvester nods is in the

pre-drained

state. Hopefully the node won't get stuck in this state... It's only been 30 minutes...

sticky-summer-13450

09/05/2022, 1:55 PM

Humm... It's been sitting at

Pre-drained

for over 4 hours, the upgrade job has completed, but there is no reboot happening.

Copy code

$ kubectl get jobs -n harvester-system -l <http://harvesterhci.io/upgradeComponent=node|harvesterhci.io/upgradeComponent=node> --context=harvester003
NAME                                        COMPLETIONS   DURATION   AGE
hvst-upgrade-bz6gs-pre-drain-harvester003   1/1           5m         4h7m

Maybe I should reboot the node manually and see what happens...

sticky-summer-13450

09/06/2022, 11:14 AM

Yep - a reboot of the cordoned node caused the process to continue, the node updated itself and the process has moved on to cordon the next node.

sticky-summer-13450

09/06/2022, 11:14 AM

It ended up that I needed to manually 2 out of the three nodes to make the upgrade complete.

16 Views

Open in Slack

Previous Next