
sticky-summer-13450

09/03/2022, 9:46 AM
Harvester v1.0.2 to v1.0.3 upgrade. It's been hung at the Phase 3: Upgrade system services stage for 14 hours, logging every 5 seconds that it's still waiting in the ErrApplied state - as seen in the logs of the pod running the upgrade job, which I found using the command on that troubleshooting page. Unfortunately that troubleshooting section doesn't say how to remediate issues at this stage. I'm thinking, kill the pod and see what happens...?
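For reference, this is roughly how I'm finding and tailing that pod (the harvester-system namespace and the hvst-upgrade- name prefix are what I see on my cluster; the exact command on the docs page may differ):
$ kubectl get pods -n harvester-system | grep hvst-upgrade
$ kubectl logs -n harvester-system -f <upgrade-job-pod>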
harvester: v1.0.3
harvesterChart: 1.0.3
os: Harvester v1.0.3
kubernetes: v1.22.12+rke2r1
rancher: v2.6.4-harvester3
monitoringChart: 100.1.0+up19.0.3
kubevirt: 0.49.0-2
rancherDependencies:
  fleet:
    chart: 100.0.3+up0.3.9
    app: 0.3.9
  fleet-crd:
    chart: 100.0.3+up0.3.9
    app: 0.3.9
  rancher-webhook:
    chart: 1.0.4+up0.2.5
    app: 0.2.5
managedchart.management.cattle.io/harvester patched
managedchart.management.cattle.io/harvester-crd patched
managedchart.management.cattle.io/rancher-monitoring patched
managedchart.management.cattle.io/rancher-monitoring-crd patched
Upgrading Rancher
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   129  100   129    0     0  43000      0 --:--:-- --:--:-- --:--:-- 43000
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 55.6M  100 55.6M    0     0  1326M      0 --:--:-- --:--:-- --:--:-- 1326M
time="2022-09-02T18:34:48Z" level=info msg="Extract mapping / => /tmp/upgrade/rancher"
time="2022-09-02T18:34:48Z" level=info msg="Checking local image archives in /tmp/upgrade/images for <http://index.docker.io/rancher/system-agent-installer-rancher:v2.6.4-harvester3|index.docker.io/rancher/system-agent-installer-rancher:v2.6.4-harvester3>"
time="2022-09-02T18:34:49Z" level=info msg="Extracting file run.sh to /tmp/upgrade/rancher/run.sh"
time="2022-09-02T18:34:49Z" level=info msg="Extracting file rancher-2.6.4-harvester3.tgz to /tmp/upgrade/rancher/rancher-2.6.4-harvester3.tgz"
time="2022-09-02T18:34:49Z" level=info msg="Extracting file helm to /tmp/upgrade/rancher/helm"
Rancher values:
bootstrapPassword: admin
features: multi-cluster-management=false,multi-cluster-management-agent=false
hostPort: 8443
ingress:
  enabled: false
noDefaultAdmin: false
rancherImage: rancher/rancher
rancherImagePullPolicy: IfNotPresent
rancherImageTag: v2.6.4-harvester3
replicas: -2
systemDefaultRegistry: ""
tls: external
useBundledSystemChart: true
Skip update Rancher. The version is already v2.6.4-harvester3
Upgrading Harvester Cluster Repo
deployment.apps/harvester-cluster-repo patched
Waiting for deployment "harvester-cluster-repo" rollout to finish: 0 out of 1 new replicas have been updated...
Waiting for deployment "harvester-cluster-repo" rollout to finish: 1 old replicas are pending termination...
Waiting for deployment "harvester-cluster-repo" rollout to finish: 1 old replicas are pending termination...
deployment "harvester-cluster-repo" successfully rolled out
clusterrepo.catalog.cattle.io/harvester-charts patched
Upgrading Harvester
managedchart.management.cattle.io/harvester-crd patched
managedchart.management.cattle.io/harvester configured
managedchart.management.cattle.io/harvester patched
managedchart.management.cattle.io/harvester-crd patched
Waiting for ManagedChart fleet-local/harvester from generation 8
Target version: 1.0.3, Target state: modified
Current version: 1.0.3, Current state: null, Current generation: 8
Sleep for 5 seconds to retry
Current version: 1.0.3, Current state: ErrApplied, Current generation: 10
Sleep for 5 seconds to retry
Current version: 1.0.3, Current state: ErrApplied, Current generation: 10
Sleep for 5 seconds to retry
Current version: 1.0.3, Current state: ErrApplied, Current generation: 10
Sleep for 5 seconds to retry
I'm thinking, kill the pod and see what happens...
I did that, and the logs seem to be showing the same looping behaviour.
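What I ran was just this (the pod name is a placeholder for the upgrade pod found earlier):
$ kubectl delete pod -n harvester-system <upgrade-job-pod>
The Job controller recreated the pod, and it went straight back into the same ErrApplied wait loop: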
...
Upgrading Harvester
managedchart.management.cattle.io/harvester-crd patched (no change)
managedchart.management.cattle.io/harvester unchanged
managedchart.management.cattle.io/harvester patched
managedchart.management.cattle.io/harvester-crd patched
Waiting for ManagedChart fleet-local/harvester from generation 11
Target version: 1.0.3, Target state: modified
Current version: 1.0.3, Current state: ErrApplied, Current generation: 12
Sleep for 5 seconds to retry
Current version: 1.0.3, Current state: ErrApplied, Current generation: 12
Sleep for 5 seconds to retry
...
Any ideas on what I should do now?
From what I can tell, the bash script that is running is looping around
kubectl get managedcharts.management.cattle.io harvester -n fleet-local -o yaml
and extracting some values (a shorter jsonpath equivalent is sketched after my question below). The summary at the end of the YAML says:
summary:
  desiredReady: 1
  errApplied: 1
  nonReadyResources:
  - bundleState: ErrApplied
    message: another operation (install/upgrade/rollback) is in progress
    name: fleet-local/local
  ready: 0
  unavailable: 1
  unavailablePartitions: 0
Is there anything else I can kill or kick in order to get past this issue?
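In case it's useful, this is the shorter check I've been running instead of dumping the whole YAML (the .status.summary path is my guess based on the output above):
$ kubectl get managedcharts.management.cattle.io harvester -n fleet-local -o jsonpath='{.status.summary}'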

prehistoric-balloon-31801

09/05/2022, 1:26 AM
Hi Mark, could you use the helm command to check what the previous revision is and whether the workaround works: https://github.com/harvester/harvester/issues/2280#issuecomment-1132446373
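Something along these lines on one of the management nodes - the release name and namespace (harvester in harvester-system) and the RKE2 kubeconfig path are assumptions, so please check helm ls first:
$ export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
$ helm ls -A --all                                   # look for a release stuck in pending-upgrade
$ helm history harvester -n harvester-system         # note the last revision in "deployed" status
$ helm rollback harvester <last-deployed-revision> -n harvester-system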

sticky-summer-13450

09/05/2022, 1:55 PM
Humm... It's been sitting at Pre-drained for over 4 hours; the upgrade job has completed, but no reboot is happening.
$ kubectl get jobs -n harvester-system -l harvesterhci.io/upgradeComponent=node --context=harvester003
NAME                                        COMPLETIONS   DURATION   AGE
hvst-upgrade-bz6gs-pre-drain-harvester003   1/1           5m         4h7m
Maybe I should reboot the node manually and see what happens...
Yep - a reboot of the cordoned node caused the process to continue; the node updated itself and the upgrade moved on to cordon the next node.
In the end I needed to manually reboot two of the three nodes to make the upgrade complete.
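For the record, the manual intervention each time was roughly this (the node to reboot is whichever shows SchedulingDisabled; SSH as the rancher user is an assumption about the default Harvester login):
$ kubectl get nodes --context=harvester003          # the cordoned node shows Ready,SchedulingDisabled
$ ssh rancher@<cordoned-node> 'sudo reboot'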