# harvester
s
Support bundle after trying that upgrade process a couple of times:
e
Checking kubectl get bundles -A will, I think, tell you what the bundle is complaining about. Whether it's an easy fix or not, no idea 🙂 I do think it means the bundle was already unhappy before your upgrade began, so it's something left over, or something that has occurred since your last upgrade, I think.
👍 1
s
Copy code
$ kubectl get bundles -A --context=harvester
NAMESPACE     NAME                                          BUNDLEDEPLOYMENTS-READY   STATUS
fleet-local   fleet-agent-local                             1/1                       
fleet-local   local-managed-system-agent                    1/1                       
fleet-local   mcc-harvester                                 1/1                       
fleet-local   mcc-harvester-crd                             1/1                       
fleet-local   mcc-local-managed-system-upgrade-controller   1/1                       
fleet-local   mcc-rancher-logging                           0/1                       OutOfSync(1) [Cluster fleet-local/local]
fleet-local   mcc-rancher-logging-crd                       1/1                       
fleet-local   mcc-rancher-monitoring-crd                    1/1
e
https://docs.harvesterhci.io/v1.3/upgrade/v1-1-2-to-v1-2-0#known-issues Maybe you can use this to debug. I don't see an exact match to yours, but I do see out-of-sync being a symptom of others. If you don't want to wait for the experts, I'd start using those debug tools to try and find more clues.
👍 1
s
I wonder if it'll turn out to be something to do with SSL certificates... those caused a lot of problems in the v1.1.3 to v1.2.0 upgrade.
e
Yeah, I saw that section had the out-of-sync symptom. I've also had issues with certs before and don't really remember the magical configuration I ended up with. It gets a bit confusing, especially when running addon-rancher too and wanting to use Let's Encrypt certs for anything external-facing.
⬆️ 1
s
Hi @ancient-pizza-13099, could you help to check this issue?
a
@sticky-summer-13450 There is an object of type ManagedChart named rancher-logging which is paused. Please use kubectl edit managedchart -n fleet-local rancher-logging to set paused: false, then wait a while and check it via kubectl get bundle -A.
Copy code
- apiVersion: management.cattle.io/v3
  kind: ManagedChart
    namespace: fleet-local

    chart: rancher-logging
    defaultNamespace: cattle-logging-system
    paused: true
    releaseName: rancher-logging
    repoName: harvester-charts
👍 1
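If interactive editing is awkward, the same change can be made with a one-line patch (a sketch; the resource name and namespace are taken from the message above):

```shell
# Un-pause the ManagedChart in place (equivalent to setting paused: false in the editor)
kubectl patch managedchart rancher-logging -n fleet-local \
  --type merge -p '{"spec":{"paused":false}}'

# Then re-check the bundles after a short wait
kubectl get bundle -A
```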
s
Hi @ancient-pizza-13099. Having changed the paused value to false...
Copy code
$ kubectl get managedchart -n fleet-local rancher-logging -o yaml --context=harvester|grep paused
  paused: false
How long should I wait for the bundle to come back into sync?
Copy code
$ kubectl get bundles -A --context=harvester --watch
NAMESPACE     NAME                                          BUNDLEDEPLOYMENTS-READY   STATUS
fleet-local   fleet-agent-local                             1/1                       
fleet-local   local-managed-system-agent                    1/1                       
fleet-local   mcc-harvester                                 1/1                       
fleet-local   mcc-harvester-crd                             1/1                       
fleet-local   mcc-local-managed-system-upgrade-controller   1/1                       
fleet-local   mcc-rancher-logging                           0/1                       OutOfSync(1) [Cluster fleet-local/local]
fleet-local   mcc-rancher-logging-crd                       1/1                       
fleet-local   mcc-rancher-monitoring-crd                    1/1
It's been 10 minutes so far...
a
Please kill the fleet-agent-* pod and wait for the new pod to replace it. If it's still OutOfSync, please check the log of the new fleet-agent-* pod and see if anything related to rancher-logging is there.
👀 1
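A sketch of those steps, assuming the agent runs in cattle-fleet-local-system (as the leader-election lease in the logs suggests) and carries an app=fleet-agent label:

```shell
# Delete the current fleet-agent pod; its controller recreates it
kubectl delete pod -n cattle-fleet-local-system -l app=fleet-agent

# Follow the replacement pod's log, filtering for rancher-logging entries
kubectl logs -n cattle-fleet-local-system -l app=fleet-agent -f | grep -i rancher-logging
```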
s
Nothing about rancher-logging in the first 5 minutes of watching the logs on the new fleet-agent-* pod:
Copy code
I0523 13:18:28.090204       1 leaderelection.go:248] attempting to acquire leader lease cattle-fleet-local-system/fleet-agent-lock...
I0523 13:18:28.154819       1 leaderelection.go:258] successfully acquired lease cattle-fleet-local-system/fleet-agent-lock
E0523 13:18:30.380053       1 memcache.go:206] couldn't get resource list for custom.metrics.k8s.io/v1beta1: Got empty response for: custom.metrics.k8s.io/v1beta1
time="2024-05-23T13:18:30Z" level=info msg="Starting /v1, Kind=Secret controller"
time="2024-05-23T13:18:30Z" level=info msg="Starting /v1, Kind=ServiceAccount controller"
time="2024-05-23T13:18:30Z" level=info msg="Starting /v1, Kind=Node controller"
time="2024-05-23T13:18:30Z" level=info msg="Starting /v1, Kind=ConfigMap controller"
E0523 13:18:30.404382       1 memcache.go:206] couldn't get resource list for management.cattle.io/v3: 
E0523 13:18:30.412858       1 memcache.go:206] couldn't get resource list for custom.metrics.k8s.io/v1beta1: Got empty response for: custom.metrics.k8s.io/v1beta1
time="2024-05-23T13:18:30Z" level=info msg="Starting fleet.cattle.io/v1alpha1, Kind=BundleDeployment controller"
time="2024-05-23T13:18:30Z" level=info msg="getting history for release harvester-crd"
time="2024-05-23T13:18:30Z" level=info msg="getting history for release local-managed-system-agent"
time="2024-05-23T13:18:30Z" level=info msg="getting history for release mcc-local-managed-system-upgrade-controller"
time="2024-05-23T13:18:30Z" level=info msg="getting history for release harvester"
time="2024-05-23T13:18:30Z" level=info msg="getting history for release fleet-agent-local"
time="2024-05-23T13:18:30Z" level=info msg="getting history for release rancher-logging-crd"
time="2024-05-23T13:18:31Z" level=info msg="getting history for release rancher-monitoring-crd"
I0523 13:18:34.690814       1 request.go:601] Waited for 1.00462687s due to client-side throttling, not priority and fairness, request: GET:https://10.53.0.1:443/api/v1/namespaces/kube-system/serviceaccounts?labelSelector=objectset.rio.cattle.io%2Fhash%3De852fa897f5eae59a44b4bfe186aad80b10b94b3
time="2024-05-23T13:18:35Z" level=info msg="Deleting orphan bundle ID rke2, release kube-system/rke2-canal"
E0523 13:19:04.968560       1 memcache.go:206] couldn't get resource list for custom.metrics.k8s.io/v1beta1: Got empty response for: custom.metrics.k8s.io/v1beta1
E0523 13:19:43.286588       1 memcache.go:206] couldn't get resource list for custom.metrics.k8s.io/v1beta1: Got empty response for: custom.metrics.k8s.io/v1beta1
E0523 13:20:22.799857       1 memcache.go:206] couldn't get resource list for custom.metrics.k8s.io/v1beta1: Got empty response for: custom.metrics.k8s.io/v1beta1
E0523 13:20:43.405029       1 memcache.go:206] couldn't get resource list for custom.metrics.k8s.io/v1beta1: Got empty response for: custom.metrics.k8s.io/v1beta1
E0523 13:20:59.167852       1 memcache.go:206] couldn't get resource list for custom.metrics.k8s.io/v1beta1: Got empty response for: custom.metrics.k8s.io/v1beta1
E0523 13:21:39.514631       1 memcache.go:206] couldn't get resource list for custom.metrics.k8s.io/v1beta1: Got empty response for: custom.metrics.k8s.io/v1beta1
E0523 13:22:25.925259       1 memcache.go:206] couldn't get resource list for custom.metrics.k8s.io/v1beta1: Got empty response for: custom.metrics.k8s.io/v1beta1
E0523 13:22:39.169064       1 memcache.go:206] couldn't get resource list for custom.metrics.k8s.io/v1beta1: Got empty response for: custom.metrics.k8s.io/v1beta1
a
Copy code
status:
    conditions:
    - lastUpdateTime: "2023-09-11T20:44:09Z"
      message: OutOfSync(1) [Cluster fleet-local/local]
      status: "False"
      type: Ready
    - lastUpdateTime: "2023-09-11T20:44:09Z"
      status: "True"
      type: Processed
    - lastUpdateTime: "2023-09-11T20:46:31Z"
      message: no chart version found for rancher-logging-100.1.3+up3.17.7
      reason: Error
      status: "False"
      type: Defined
    display:
      readyClusters: 0/1
      state: OutOfSync
Please cat /oem/harvester.config. This managedchart is running with a non-existent chart version, 100.1.3+up3.17.7:
Copy code
runtimeversion: v1.24.7+rke2r1
rancherversion: v2.6.9
harvesterchartversion: 1.1.0
monitoringchartversion: 100.1.0+up19.0.3
systemsettings: {}
clusternetworks: {}
loggingchartversion: 100.1.3+up3.17.7
Is your cluster running Harvester v1.1.0, or was it upgraded from v1.1.0?
If your Harvester is running v1.2.0, then the rancher-logging version should be:
Copy code
loggingchartversion: 102.0.0+up3.17.10
Please try kubectl edit managedchart -n fleet-local rancher-logging, set the version to 102.0.0+up3.17.10, and then watch the status of the managedchart rancher-logging to see if the error messages above are gone.
The rancher-logging-crd managedchart is using the correct version; rancher-logging should also use this version:
Copy code
spec:
    chart: rancher-logging-crd
    defaultNamespace: cattle-logging-system
    paused: false
    releaseName: rancher-logging-crd
    repoName: harvester-charts
    targets:
    - clusterName: local
      clusterSelector:
        matchExpressions:
        - key: provisioning.cattle.io/unmanaged-system-agent
          operator: DoesNotExist
    version: 102.0.0+up3.17.10
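For reference, that version bump can also be applied non-interactively (a sketch using the values above):

```shell
# Point the rancher-logging ManagedChart at the chart version that exists for v1.2.0
kubectl patch managedchart rancher-logging -n fleet-local \
  --type merge -p '{"spec":{"version":"102.0.0+up3.17.10"}}'

# Watch the chart's status conditions for the "no chart version found" error to clear
kubectl get managedchart -n fleet-local rancher-logging -o yaml | grep -A 4 'conditions:'
```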
s
This cluster, that's been in existence for over 2 years, has been through loads of versions - yes, it was upgraded from v1.1.x, with all of the various fun those old upgrades had.
Copy code
> sudo cat /oem/harvester.config
...
runtimeversion: v1.21.7+rke2r1
harvesterchartversion: 1.0.0
monitoringchartversion: 100.0.0+up16.6.0
systemsettings: {}
I stuck with v1.2.0 because the last upgrade from v1.1.x was a bit painful, and v1.2.0 has worked so nicely 🙂
give me a moment and I'll patch that CRD.
a
Don't patch the CRD; just patch the rancher-logging managedchart.
s
Sorry - that's what I meant!
👍 1
Well, that's made the fleet-agent-* pod log more exciting:
Copy code
E0523 14:19:31.989373       1 memcache.go:206] couldn't get resource list for custom.metrics.k8s.io/v1beta1: Got empty response for: custom.metrics.k8s.io/v1beta1
E0523 14:20:22.698582       1 memcache.go:206] couldn't get resource list for custom.metrics.k8s.io/v1beta1: Got empty response for: custom.metrics.k8s.io/v1beta1
time="2024-05-23T14:20:22Z" level=info msg="preparing upgrade for rancher-logging"
time="2024-05-23T14:20:22Z" level=info msg="performing update for rancher-logging"
time="2024-05-23T14:20:22Z" level=info msg="getting history for release rancher-logging"
time="2024-05-23T14:20:23Z" level=error msg="error syncing 'cluster-fleet-local-local-1a3d67d0a899/mcc-rancher-logging': handler bundle-deploy: unable to build kubernetes objects from current release manifest: [resource mapping not found for name: \"rancher-logging-rke2-journald-aggregator\" namespace: \"cattle-logging-system\" from \"\": no matches for kind \"PodSecurityPolicy\" in version \"policy/v1beta1\"\nensure CRDs are installed first, resource mapping not found for name: \"psp.logging-operator\" namespace: \"cattle-logging-system\" from \"\": no matches for kind \"PodSecurityPolicy\" in version \"policy/v1beta1\"\nensure CRDs are installed first], handler bundle-trigger: the server could not find the requested resource, requeuing"
E0523 14:20:23.346324       1 memcache.go:206] couldn't get resource list for custom.metrics.k8s.io/v1beta1: Got empty response for: custom.metrics.k8s.io/v1beta1
time="2024-05-23T14:20:23Z" level=info msg="preparing upgrade for rancher-logging"
time="2024-05-23T14:20:23Z" level=info msg="performing update for rancher-logging"
time="2024-05-23T14:20:23Z" level=error msg="error syncing 'cluster-fleet-local-local-1a3d67d0a899/mcc-rancher-logging': handler bundle-deploy: unable to build kubernetes objects from current release manifest: [resource mapping not found for name: \"rancher-logging-rke2-journald-aggregator\" namespace: \"cattle-logging-system\" from \"\": no matches for kind \"PodSecurityPolicy\" in version \"policy/v1beta1\"\nensure CRDs are installed first, resource mapping not found for name: \"psp.logging-operator\" namespace: \"cattle-logging-system\" from \"\": no matches for kind \"PodSecurityPolicy\" in version \"policy/v1beta1\"\nensure CRDs are installed first], handler bundle-trigger: the server could not find the requested resource, requeuing"
E0523 14:20:23.741982       1 memcache.go:206] couldn't get resource list for custom.metrics.k8s.io/v1beta1: Got empty response for: custom.metrics.k8s.io/v1beta1
time="2024-05-23T14:20:23Z" level=info msg="preparing upgrade for rancher-logging"
Copy code
$ kubectl get bundles -A --context=harvester
NAMESPACE     NAME                                          BUNDLEDEPLOYMENTS-READY   STATUS
fleet-local   fleet-agent-local                             1/1                       
fleet-local   local-managed-system-agent                    1/1                       
fleet-local   mcc-harvester                                 1/1                       
fleet-local   mcc-harvester-crd                             1/1                       
fleet-local   mcc-local-managed-system-upgrade-controller   1/1                       
fleet-local   mcc-rancher-logging                           0/1                       ErrApplied(1) [Cluster fleet-local/local: unable to build kubernetes objects from current release manifest: [resource mapping not found for name: "rancher-logging-rke2-journald-aggregator" namespace: "cattle-logging-system" from "": no matches for kind "PodSecurityPolicy" in version "policy/v1beta1"...
fleet-local   mcc-rancher-logging-crd                       1/1                       
fleet-local   mcc-rancher-monitoring-crd                    1/1
a
Please double-check the managedchart rancher-logging-crd in your cluster:
Copy code
spec:
    chart: rancher-logging-crd
    defaultNamespace: cattle-logging-system
    paused: false
    releaseName: rancher-logging-crd
    repoName: harvester-charts
    targets:
    - clusterName: local
      clusterSelector:
        matchExpressions:
        - key: provisioning.cattle.io/unmanaged-system-agent
          operator: DoesNotExist
    version: 102.0.0+up3.17.10
the spec.version
👀 1
kubectl get addons.harvesterhci.io -A
s
The rancher-logging-crd and rancher-logging versions are identical, after I edited rancher-logging as you suggested above:
Copy code
apiVersion: management.cattle.io/v3
kind: ManagedChart
spec:
  chart: rancher-logging-crd
  defaultNamespace: cattle-logging-system
  paused: false
  releaseName: rancher-logging-crd
  repoName: harvester-charts
  targets:
  - clusterName: local
    clusterSelector:
      matchExpressions:
      - key: provisioning.cattle.io/unmanaged-system-agent
        operator: DoesNotExist
  version: 102.0.0+up3.17.10
Copy code
apiVersion: management.cattle.io/v3
kind: ManagedChart
spec:
  chart: rancher-logging
  defaultNamespace: cattle-logging-system
  paused: false
  releaseName: rancher-logging
  repoName: harvester-charts
  targets:
  - clusterName: local
    clusterSelector:
      matchExpressions:
      - key: provisioning.cattle.io/unmanaged-system-agent
        operator: DoesNotExist
  version: 102.0.0+up3.17.10
Copy code
$ kubectl get addons.harvesterhci.io -A --context=harvester003
NAMESPACE          NAME                    HELMREPO                                                 CHARTNAME                         ENABLED
harvester-system   harvester-seeder        http://harvester-cluster-repo.cattle-system.svc/charts   harvester-seeder                  false
harvester-system   pcidevices-controller   http://harvester-cluster-repo.cattle-system.svc/charts   harvester-pcidevices-controller   false
harvester-system   vm-import-controller    http://harvester-cluster-repo.cattle-system.svc/charts   harvester-vm-import-controller    false
a
It is quite tricky; your cluster also does not have the rancher-monitoring managedchart. I guess that in the Harvester dashboard GUI you don't have the Grafana views which show the cluster & VM statistics.
👀 1
s
I guess not - I don't think I've ever seen any Grafana style monitoring. There's nothing obvious missing in the Harvester dashboard...
I don't think there's anything I've actively done to not have the rancher-monitoring managedchart. It's probably a victim of upgrades not going 100% to plan.
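A quick way to see which ManagedCharts exist (a sketch; it assumes they all live in fleet-local, like the ones shown above):

```shell
# rancher-monitoring would normally appear alongside rancher-logging here
kubectl get managedchart -n fleet-local
```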
a
From the support bundle, there is this upgrade history:
Copy code
kind: Upgrade
  metadata:
    creationTimestamp: "2022-05-29T10:39:08Z"

    previousVersion: v1.0.0
    repoInfo: |
      release:
          harvester: v1.0.1
          harvesterChart: 1.0.1
          os: Harvester v1.0.1
          kubernetes: v1.21.11+rke2r1
          rancher: v2.6.4-harvester3
          monitoringChart: 100.1.0+up19.0.3
          


    creationTimestamp: "2022-05-29T18:00:57Z"

    previousVersion: v1.0.1
    repoInfo: |
      release:
          harvester: v1.0.2
          harvesterChart: 1.0.2
          os: Harvester v1.0.2                  



    creationTimestamp: "2022-09-02T18:04:51Z"
    previousVersion: v1.0.2
    repoInfo: |
      release:
          harvester: v1.0.3
          harvesterChart: 1.0.3
          os: Harvester v1.0.3
          kubernetes: v1.22.12+rke2r1


    creationTimestamp: "2022-12-04T10:49:21Z"
    previousVersion: v1.1.0
    repoInfo: |
      release:
          harvester: v1.1.1
          harvesterChart: 1.1.1
          os: Harvester v1.1.1



    creationTimestamp: "2023-04-27T17:56:01Z"
    previousVersion: v1.1.1
    repoInfo: |
      release:
          harvester: v1.1.2
          harvesterChart: 1.1.2
          os: Harvester v1.1.2
          kubernetes: v1.24.11+rke2r1


    creationTimestamp: "2023-10-02T07:31:21Z"
    previousVersion: v1.2.0
    repoInfo: |
      release:
          harvester: v1.2.0
          harvesterChart: 1.2.0
          os: Harvester v1.2.0
What is missing is the v1.0.3 -> v1.1.0 upgrade; let me figure out how to recover things.
s
Thank you. I'll be offline for the next 15 hours now.
I'd still really like to upgrade this Harvester cluster, so if you do have any clue how to have the right components running to allow that I would be very grateful 🙂
a
@sticky-summer-13450 Sorry for the delay, we had some holidays. I will continue this.
@sticky-summer-13450 As the rancher-logging managedchart is not working as expected, we can delete it first; after the upgrade, we can create it back. (In v1.3.0 this is an addon: https://docs.harvesterhci.io/v1.3/advanced/addons )
kubectl delete managedchart -n fleet-local rancher-logging
👀 1
s
Oh - great. Thanks, I'll give that a go.
👍 1
a
Ping me with your new update; if everything goes well and your cluster upgrades to v1.3.0, then let's try to create the new addon.
s
Thanks @ancient-pizza-13099 - sorry for my delay in responding. I ran that command:
Copy code
$ kubectl delete managedchart -n fleet-local rancher-logging --context harvester003
managedchart.management.cattle.io "rancher-logging" deleted
And something happened in the fleet-agent-* log:
Copy code
E0607 16:24:49.225869       1 memcache.go:206] couldn't get resource list for custom.metrics.k8s.io/v1beta1: Got empty response for: custom.metrics.k8s.io/v1beta1
time="2024-06-07T16:24:49Z" level=info msg="uninstall: Deleting rancher-logging"
time="2024-06-07T16:24:49Z" level=info msg="uninstall: Failed to delete release: [unable to build kubernetes objects for delete: [resource mapping not found for name: \"rancher-logging-rke2-journald-aggregator\" namespace: \"cattle-logging-system\" from \"\": no matches for kind \"PodSecurityPolicy\" in version \"policy/v1beta1\"\nensure CRDs are installed first, resource mapping not found for name: \"psp.logging-operator\" namespace: \"cattle-logging-system\" from \"\": no matches for kind \"PodSecurityPolicy\" in version \"policy/v1beta1\"\nensure CRDs are installed first]]"
time="2024-06-07T16:24:49Z" level=error msg="error syncing 'cluster-fleet-local-local-1a3d67d0a899/mcc-rancher-logging': handler bundle-cleanup: failed to delete release: rancher-logging, requeuing"
time="2024-06-07T16:25:26Z" level=info msg="Deleting orphan bundle ID rke2, release kube-system/rke2-canal"
E0607 16:25:33.707626       1 memcache.go:206] couldn't get resource list for custom.metrics.k8s.io/v1beta1: Got empty response for: custom.metrics.k8s.io/v1beta1
But the upgrade dialogue complained about not enough disk space in /usr/local:
Copy code
admission webhook "validator.harvesterhci.io" denied the request: Node "harvester001" has insufficient free system partition space 23.51GiB (df -h '/usr/local/'). The upgrade requires at least 30GiB of free system partition space on each node.
It seems that 70GB is in use in the folder /usr/local/.state.
Copy code
harvester001:/usr/local # df -h /usr/local
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme0n1p5   98G   70G   24G  75% /usr/local
harvester001:/usr/local # find . -maxdepth 1 -type d -exec du -sh \{} \;
70G	.
164M	./bin
16K	./upgrade_tmp
8.0K	./etc
16K	./lost+found
4.0K	./cloud-config
28K	./share
28K	./lib
70G	./.state
This is the same on all three nodes. Each node has one 2TB NVME and it's completely partitioned/managed by Harvester.
a
@sticky-summer-13450 You can refer to https://docs.harvesterhci.io/v1.3/upgrade/index/#free-system-partition-space-requirement and add the annotation harvesterhci.io/minFreeDiskSpaceGB to lower the required disk space and bypass the check. After running for a long time there are old container images / temp files which may occupy system partition space; newer Harvester versions will try to clean old container images automatically.
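If I read that doc correctly, the annotation goes on each node object; a sketch (the 20 is just an example value, and the node names are from the df output above):

```shell
# Lower the free-space requirement for the upgrade check to 20 GiB per node
for node in harvester001 harvester002 harvester003; do
  kubectl annotate node "$node" harvesterhci.io/minFreeDiskSpaceGB=20
done
```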
s
Thanks - I'd missed that. I'll look now
🤝 1
I did that, but nothing has happened in 24 hours. The "Download Logs" option doesn't exist. I could try starting over but that's caused problems in the past.
a
kubectl get upgrade.harvesterhci.io -A, then take the YAML output of the latest one.
s
Copy code
$ kubectl get upgrade.harvesterhci.io hvst-upgrade-9smf2 -o yaml -n harvester-system
apiVersion: harvesterhci.io/v1beta1
kind: Upgrade
metadata:
  creationTimestamp: "2024-06-13T08:19:36Z"
  finalizers:
  - wrangler.cattle.io/harvester-upgrade-controller
  generateName: hvst-upgrade-
  generation: 2
  labels:
    harvesterhci.io/latestUpgrade: "true"
    harvesterhci.io/upgradeState: PreparingLoggingInfra
  name: hvst-upgrade-9smf2
  namespace: harvester-system
  resourceVersion: "1315869202"
  uid: 54bea1a3-5344-48d0-a8bb-328279c98aa9
spec:
  image: ""
  logEnabled: true
  version: v1.2.1
status:
  conditions:
  - status: Unknown
    type: Completed
  - status: Unknown
    type: LogReady
  previousVersion: v1.2.0
  upgradeLog: hvst-upgrade-9smf2-upgradelog
a
The upgrade log depends on the rancher-logging CRD; you may delete this object and upgrade again, with the enableLog option unticked.
s
Ah - that makes sense 🙂
a
If the enableLog option is not there, then manually create the `Upgrade` object like:
Copy code
$ kubectl get upgrade.harvesterhci.io hvst-upgrade-9smf2 -o yaml -n harvester-system
apiVersion: harvesterhci.io/v1beta1
kind: Upgrade
metadata:
  name: hvst-upgrade-9smf2
  namespace: harvester-system
spec:
  image: ""
  logEnabled: false
  version: v1.2.1
👍 1
s
I get this using the GUI, so I'll try the non-GUI method
no - unfortunately that doesn't work:
Copy code
$ echo "apiVersion: harvesterhci.io/v1beta1
kind: Upgrade
metadata:
  name: hvst-upgrade-9smf2
  namespace: harvester-system
spec:
  image: ""
  logEnabled: false
  version: v1.2.1" | kubectl apply -f - -n harvester-system

Error from server (BadRequest): error when creating "STDIN": admission webhook "validator.harvesterhci.io" denied the request: managed chart hvst-upgrade-9smf2-upgradelog-operator is not ready, please wait for it to be ready
a
Manually delete this managedchart, hvst-upgrade-9smf2-upgradelog-operator; the previous upgrade left some resources behind.
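In command form (a sketch; it assumes the leftover ManagedChart sits in fleet-local like the others):

```shell
# Remove the upgrade-log ManagedChart left over from the failed upgrade
kubectl delete managedchart -n fleet-local hvst-upgrade-9smf2-upgradelog-operator
```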
s
Understood
You are awesome, Jian - the upgrade image is downloading.
🤞
🤝 1
It looks like the Harvester services have been updated but the nodes don't seem to be draining and being upgraded.
Is the upgrade of the nodes also managed by the fleet-agent-* pod? After all the activity of updating the service components, the Fleet agent is only doing its usual logging:
Copy code
E0614 15:40:22.691390       1 memcache.go:206] couldn't get resource list for custom.metrics.k8s.io/v1beta1: Got empty response for: custom.metrics.k8s.io/v1beta1
a
cc @red-king-19196 any insights?
kubectl get upgrade, and output the YAML of the latest object; let's check its status now.
👍 1
upgrade.harvesterhci.io
s
Copy code
$ kubectl get upgrade.harvesterhci.io hvst-upgrade-9smf2 -o yaml -n harvester-system --context harvester003
apiVersion: harvesterhci.io/v1beta1
kind: Upgrade
metadata:
  annotations:
    harvesterhci.io/replica-replenishment-wait-interval: "600"
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"harvesterhci.io/v1beta1","kind":"Upgrade","metadata":{"annotations":{},"name":"hvst-upgrade-9smf2","namespace":"harvester-system"},"spec":{"image":"","logEnabled":false,"version":"v1.2.1"}}
  creationTimestamp: "2024-06-14T14:27:09Z"
  finalizers:
  - wrangler.cattle.io/harvester-upgrade-controller
  generation: 17
  labels:
    harvesterhci.io/latestUpgrade: "true"
    harvesterhci.io/upgradeState: UpgradingNodes
  name: hvst-upgrade-9smf2
  namespace: harvester-system
  resourceVersion: "1317770693"
  uid: c2d96a42-49c2-4601-b5a1-3fb05173d8a8
spec:
  image: ""
  logEnabled: false
  version: v1.2.1
status:
  conditions:
  - status: Unknown
    type: Completed
  - lastUpdateTime: "2024-06-14T14:27:09Z"
    message: Upgrade observability is administratively disabled
    reason: Disabled
    status: "False"
    type: LogReady
  - lastUpdateTime: "2024-06-14T14:48:35Z"
    status: "True"
    type: ImageReady
  - lastUpdateTime: "2024-06-14T14:52:08Z"
    status: "True"
    type: RepoReady
  - lastUpdateTime: "2024-06-14T14:59:27Z"
    status: "True"
    type: NodesPrepared
  - lastUpdateTime: "2024-06-14T15:00:56Z"
    status: "True"
    type: SystemServicesUpgraded
  - status: Unknown
    type: NodesUpgraded
  imageID: harvester-system/hvst-upgrade-9smf2
  nodeStatuses:
    harvester001:
      state: Images preloaded
    harvester002:
      state: Images preloaded
    harvester003:
      state: Images preloaded
  previousVersion: v1.2.0
  repoInfo: |
    release:
        harvester: v1.2.1
        harvesterChart: 1.2.1
        os: Harvester v1.2.1
        kubernetes: v1.25.9+rke2r1
        rancher: v2.7.5
        monitoringChart: 102.0.0+up40.1.2
        minUpgradableVersion: v1.1.2
r
Need to check the cluster state:
Copy code
kubectl -n fleet-local get clusters.provisioning local -o yaml
Also the log from the rancher pod (find the leader pod).
👀 1
s
Copy code
$ kubectl -n fleet-local get clusters.provisioning local -o yaml --context harvester003
apiVersion: provisioning.cattle.io/v1
kind: Cluster
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"provisioning.cattle.io/v1","kind":"Cluster","metadata":{"annotations":{},"labels":{"rke.cattle.io/init-node-machine-id":"fr5bvk6lt5mrfbnls4swkkmgqgnfnfj2wqwcwf4lzxflhr8ldv9dml"},"name":"local","namespace":"fleet-local"},"spec":{"kubernetesVersion":"v1.21.7+rke2r1","rkeConfig":{"controlPlaneConfig":null}}}
    objectset.rio.cattle.io/applied: H4sIAAAAAAAA/4yQzU7DMBCEXwXt2Slt079Y4oAQ4sCVF9jYS2Ow15G9CYfK746SVqJC4udo78xovjlBIEGLgqBPgMxRUFzkPD1j+0ZGMskiubgwKOJp4eKts6ChT3F02UV2fKyMH7JQqkwiFAL1ozV+MKXqOL6DhoCMRwrEciUYa3Xz7NjePZwj/8xiDAQafDTo/yXOPZrJAUXB3NdFfnGBsmDoQfPgvQKPLflfR+gwd6Bhu9ztt3XdUGNwc7Crdr9u6jW1y/pg91vb2LXdbHarA6jzYpbSVwho6DCNNIMWBd9Yrtu+eiKpzpeiIPdkpnbzx2Wq+0G6R7Z9dCygT2WSCcpwwciURrJPxJRmZtDLUj4DAAD//5CVWGcAAgAA
    objectset.rio.cattle.io/id: provisioning-cluster-create
    objectset.rio.cattle.io/owner-gvk: management.cattle.io/v3, Kind=Cluster
    objectset.rio.cattle.io/owner-name: local
    objectset.rio.cattle.io/owner-namespace: ""
  creationTimestamp: "2022-01-22T15:25:52Z"
  finalizers:
  - wrangler.cattle.io/provisioning-cluster-remove
  - wrangler.cattle.io/rke-cluster-remove
  generation: 17
  labels:
    objectset.rio.cattle.io/hash: 50675339e9ca48d1b72932eb038d75d9d2d44618
    provider.cattle.io: harvester
  name: local
  namespace: fleet-local
  resourceVersion: "1317770694"
  uid: 9a5208ed-bb43-4b24-8b5a-c63df8b761ce
spec:
  kubernetesVersion: v1.25.9+rke2r1
  localClusterAuthEndpoint: {}
  rkeConfig:
    chartValues: null
    machineGlobalConfig: null
    provisionGeneration: 4
    upgradeStrategy:
      controlPlaneConcurrency: "1"
      controlPlaneDrainOptions:
        deleteEmptyDirData: true
        disableEviction: false
        enabled: true
        force: true
        gracePeriod: 0
        ignoreDaemonSets: true
        postDrainHooks:
        - annotation: harvesterhci.io/post-hook
        preDrainHooks:
        - annotation: harvesterhci.io/pre-hook
        skipWaitForDeleteTimeoutSeconds: 0
        timeout: 0
      workerConcurrency: "1"
      workerDrainOptions:
        deleteEmptyDirData: true
        disableEviction: false
        enabled: true
        force: true
        gracePeriod: 0
        ignoreDaemonSets: true
        postDrainHooks:
        - annotation: harvesterhci.io/post-hook
        preDrainHooks:
        - annotation: harvesterhci.io/pre-hook
        skipWaitForDeleteTimeoutSeconds: 0
        timeout: 0
status:
  clientSecretName: local-kubeconfig
  clusterName: local
  conditions:
  - lastUpdateTime: "2023-09-11T20:46:55Z"
    message: custom-13b01cc43f01,custom-2666fac666a5
    reason: Waiting
    status: Unknown
    type: Ready
  - lastUpdateTime: "2022-01-22T15:25:52Z"
    status: "False"
    type: Reconciling
  - lastUpdateTime: "2022-01-22T15:25:52Z"
    status: "False"
    type: Stalled
  - lastUpdateTime: "2024-06-14T15:00:56Z"
    status: "True"
    type: Created
  - lastUpdateTime: "2024-05-08T06:47:03Z"
    status: "True"
    type: RKECluster
  - status: Unknown
    type: DefaultProjectCreated
  - status: Unknown
    type: SystemProjectCreated
  - lastUpdateTime: "2024-05-08T06:47:03Z"
    message: 'configuring bootstrap node(s) custom-6214769cf9b9: waiting for probes:
      kube-controller-manager, kube-scheduler'
    reason: Waiting
    status: Unknown
    type: Provisioned
  - lastUpdateTime: "2024-05-08T06:47:03Z"
    message: 'configuring bootstrap node(s) custom-6214769cf9b9: waiting for probes:
      kube-controller-manager, kube-scheduler'
    reason: Waiting
    status: Unknown
    type: Updated
  - lastUpdateTime: "2022-10-28T18:24:35Z"
    status: "True"
    type: Connected
  observedGeneration: 17
  ready: true
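The conditions list in the status above is long, and the actual blockers are easy to miss among the settled entries. A small helper can filter the conditions from `kubectl get clusters.provisioning.cattle.io local -n fleet-local -o json` down to the unsettled ones (this is a sketch, not a Harvester tool; the sample data below is copied from the status above):

```python
import json

# Sample conditions, abridged from the cluster status shown above.
conditions_json = """
[
  {"type": "Ready", "status": "Unknown", "reason": "Waiting",
   "message": "custom-13b01cc43f01,custom-2666fac666a5"},
  {"type": "Created", "status": "True"},
  {"type": "Provisioned", "status": "Unknown", "reason": "Waiting",
   "message": "configuring bootstrap node(s) custom-6214769cf9b9: waiting for probes: kube-controller-manager, kube-scheduler"}
]
"""

def unsettled(conditions):
    # Reconciling/Stalled normally report "False"; every other condition
    # should report "True" once the cluster is healthy, so anything else
    # is worth a closer look.
    skip = {"Reconciling", "Stalled"}
    return [c for c in conditions
            if c["type"] not in skip and c.get("status") != "True"]

for c in unsettled(json.loads(conditions_json)):
    print(f'{c["type"]}: {c.get("reason", "")} - {c.get("message", "")}')
```

Here it would surface the `Ready` and `Provisioned` conditions, both stuck in `Waiting` on the bootstrap node's probes.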
How much of the rancher-* pod logs do you want - this looks like the active pod...
2024/06/14 15:41:41 [INFO] Downloading repo index from http://harvester-cluster-repo.cattle-system/charts/index.yaml
W0614 15:45:08.409704      33 transport.go:313] Unable to cancel request for *client.addQuery
W0614 15:45:08.981953      33 transport.go:313] Unable to cancel request for *client.addQuery
W0614 15:45:08.982009      33 transport.go:313] Unable to cancel request for *client.addQuery
W0614 15:45:08.983346      33 transport.go:313] Unable to cancel request for *client.addQuery
W0614 15:45:08.984509      33 transport.go:313] Unable to cancel request for *client.addQuery
W0614 15:45:10.413736      33 transport.go:313] Unable to cancel request for *client.addQuery
2024/06/14 15:46:02 [ERROR] rkecluster fleet-local/local: error while retrieving management cluster from cache: management cluster cache was nil
2024/06/14 15:46:02 [INFO] [planner] rkecluster fleet-local/local: waiting: configuring bootstrap node(s) custom-6214769cf9b9: waiting for probes: kube-controller-manager, kube-scheduler
2024/06/14 15:46:41 [INFO] Downloading repo index from http://harvester-cluster-repo.cattle-system/charts/index.yaml
2024/06/14 15:47:23 [ERROR] rkecluster fleet-local/local: error while retrieving management cluster from cache: management cluster cache was nil
2024/06/14 15:47:23 [INFO] [planner] rkecluster fleet-local/local: waiting: configuring bootstrap node(s) custom-6214769cf9b9: waiting for probes: kube-controller-manager, kube-scheduler
2024/06/14 15:48:49 [INFO] [planner] rkecluster fleet-local/local: waiting: configuring bootstrap node(s) custom-6214769cf9b9: waiting for probes: kube-controller-manager, kube-scheduler
2024/06/14 15:48:49 [ERROR] rkecluster fleet-local/local: error while retrieving management cluster from cache: management cluster cache was nil
2024/06/14 15:51:41 [INFO] Downloading repo index from http://harvester-cluster-repo.cattle-system/charts/index.yaml
2024/06/14 15:55:17 [ERROR] Error during subscribe websocket: close sent
2024/06/14 15:56:03 [ERROR] rkecluster fleet-local/local: error while retrieving management cluster from cache: management cluster cache was nil
2024/06/14 15:56:03 [INFO] [planner] rkecluster fleet-local/local: waiting: configuring bootstrap node(s) custom-6214769cf9b9: waiting for probes: kube-controller-manager, kube-scheduler
2024/06/14 15:56:41 [INFO] Downloading repo index from http://harvester-cluster-repo.cattle-system/charts/index.yaml
2024/06/14 15:57:25 [INFO] [planner] rkecluster fleet-local/local: waiting: configuring bootstrap node(s) custom-6214769cf9b9: waiting for probes: kube-controller-manager, kube-scheduler
2024/06/14 15:57:25 [ERROR] rkecluster fleet-local/local: error while retrieving management cluster from cache: management cluster cache was nil
2024/06/14 15:58:53 [ERROR] rkecluster fleet-local/local: error while retrieving management cluster from cache: management cluster cache was nil
2024/06/14 15:58:53 [INFO] [planner] rkecluster fleet-local/local: waiting: configuring bootstrap node(s) custom-6214769cf9b9: waiting for probes: kube-controller-manager, kube-scheduler
2024/06/14 16:01:41 [INFO] Downloading repo index from http://harvester-cluster-repo.cattle-system/charts/index.yaml
2024/06/14 16:06:05 [INFO] [planner] rkecluster fleet-local/local: waiting: configuring bootstrap node(s) custom-6214769cf9b9: waiting for probes: kube-controller-manager, kube-scheduler
2024/06/14 16:06:05 [ERROR] rkecluster fleet-local/local: error while retrieving management cluster from cache: management cluster cache was nil
2024/06/14 16:06:41 [INFO] Downloading repo index from http://harvester-cluster-repo.cattle-system/charts/index.yaml
2024/06/14 16:07:27 [INFO] [planner] rkecluster fleet-local/local: waiting: configuring bootstrap node(s) custom-6214769cf9b9: waiting for probes: kube-controller-manager, kube-scheduler
2024/06/14 16:07:27 [ERROR] rkecluster fleet-local/local: error while retrieving management cluster from cache: management cluster cache was nil
2024/06/14 16:08:54 [ERROR] rkecluster fleet-local/local: error while retrieving management cluster from cache: management cluster cache was nil
2024/06/14 16:08:54 [INFO] [planner] rkecluster fleet-local/local: waiting: configuring bootstrap node(s) custom-6214769cf9b9: waiting for probes: kube-controller-manager, kube-scheduler
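On a cluster this old, probe failures for kube-controller-manager and kube-scheduler frequently trace back to expired client certificates (the issue linked further down). A quick way to check is `openssl x509 -enddate` / `-checkend` against the component certs on each node; the on-node paths here are an assumption based on RKE2's default TLS layout (`/var/lib/rancher/rke2/server/tls/<component>/<component>.crt`), and the snippet below demonstrates the check on a throwaway self-signed cert so it can run anywhere:

```shell
# Generate a throwaway cert just to demonstrate the check.
openssl req -x509 -newkey rsa:2048 -nodes -keyout /tmp/demo.key \
  -out /tmp/demo.crt -days 365 -subj "/CN=demo" 2>/dev/null

# The actual check: print the notAfter date...
openssl x509 -enddate -noout -in /tmp/demo.crt

# ...and flag certs that have already expired (-checkend 0 exits
# non-zero if the cert is past its notAfter date).
openssl x509 -checkend 0 -noout -in /tmp/demo.crt \
  && echo "cert still valid" || echo "cert expired"
```

On a real node you would point the same two `openssl x509` commands at the kube-controller-manager and kube-scheduler cert files instead of `/tmp/demo.crt`.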
r
Is this cluster running more than one year?
s
The cluster was created 874 days ago, so yes
r
I just found out that you hit this before. Would you like to rerun the check? https://github.com/harvester/harvester/issues/3863#issuecomment-1539681311
👀 1
s
Haha - yep, I get FAIL FAIL again. I'll follow my instructions...
r
It’s kind of a milestone… Congrats, your cluster survived for another year 😆
🎉 1
BTW, this should not happen again post v1.3…
👍 1
a
874 days is living evidence that Harvester is fairly good 🙂
s
Hell yes! Although I have lost quite a lot of hair at upgrade times. Other than that, it's absolutely fabulous 🙂
👍 1
I've followed those instructions and I get [OK] [OK] now. I don't recall if I need to wait for a while, bounce a pod, or bounce a node to get things moving again.
👍 1
r
If you followed the instructions, you’ve already restarted those pods. Just wait a bit for rancher to do its job. If it doesn’t proceed, we can check again what’s the new blocker 🙈
s
Ah - I needed to do the same update of the certs on all three nodes of the cluster. Now things are moving :-)
👍 2
Thanks so much :-) The 1.2.0 -> 1.2.1 upgrade has completed successfully and I’m now starting the 1.2.1 -> 1.2.2.
And the 1.2.1 -> 1.2.2 completed without a hitch 🙂
🎉 2