# harvester
s
Support bundle after trying that upgrade process a couple of times:
e
Checking kubectl get bundles -A will, I think, tell you what the bundle is complaining about. Whether it's an easy fix or not, no idea 🙂 I do think it means the bundle was already unhappy before your upgrade began, so it's something left over, or something that has occurred since your last upgrade, I think.
👍 1
s
Copy code
$ kubectl get bundles -A --context=harvester
NAMESPACE     NAME                                          BUNDLEDEPLOYMENTS-READY   STATUS
fleet-local   fleet-agent-local                             1/1                       
fleet-local   local-managed-system-agent                    1/1                       
fleet-local   mcc-harvester                                 1/1                       
fleet-local   mcc-harvester-crd                             1/1                       
fleet-local   mcc-local-managed-system-upgrade-controller   1/1                       
fleet-local   mcc-rancher-logging                           0/1                       OutOfSync(1) [Cluster fleet-local/local]
fleet-local   mcc-rancher-logging-crd                       1/1                       
fleet-local   mcc-rancher-monitoring-crd                    1/1
e
https://docs.harvesterhci.io/v1.3/upgrade/v1-1-2-to-v1-2-0#known-issues Maybe you can use this to debug. I don't see an exact match to yours, but I do see out-of-sync being a symptom of others. If you don't want to wait for the experts, I'd start using those debug tools to try and find more clues.
👍 1
s
I wonder if it'll turn out to be something to do with SSL certificates... those caused a lot of problems in the v1.1.3 to v1.2.0 upgrade.
e
Yeah, I saw that section had the out-of-sync symptom. I've also had issues with certs before and don't really remember the magical configuration I ended up with. It gets a bit confusing, especially when running addon-rancher too and wanting to use Let's Encrypt certs for anything external-facing.
⬆️ 1
s
Hi @ancient-pizza-13099, could you help to check this issue?
a
@sticky-summer-13450 There is an object of type ManagedChart named rancher-logging which is paused. Please use kubectl edit managedchart -n fleet-local rancher-logging to set paused: false, then wait a while and check it via kubectl get bundle -A.
Copy code
- apiVersion: management.cattle.io/v3
  kind: ManagedChart
    namespace: fleet-local

    chart: rancher-logging
    defaultNamespace: cattle-logging-system
    paused: true
    releaseName: rancher-logging
    repoName: harvester-charts
👍 1
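If interactive editing is awkward, the same change can be made with a one-line patch (a sketch; the resource name and namespace are taken from the message above):

```shell
# Un-pause the ManagedChart in place (equivalent to setting paused: false in the editor)
kubectl patch managedchart rancher-logging -n fleet-local \
  --type merge -p '{"spec":{"paused":false}}'

# Then re-check the bundles after a short wait
kubectl get bundle -A
```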
s
Hi @ancient-pizza-13099. Having changed the paused value to false...
Copy code
$ kubectl get managedchart -n fleet-local rancher-logging -o yaml --context=harvester|grep paused
  paused: false
How long should I wait for the bundle to come back into sync?
Copy code
$ kubectl get bundles -A --context=harvester --watch
NAMESPACE     NAME                                          BUNDLEDEPLOYMENTS-READY   STATUS
fleet-local   fleet-agent-local                             1/1                       
fleet-local   local-managed-system-agent                    1/1                       
fleet-local   mcc-harvester                                 1/1                       
fleet-local   mcc-harvester-crd                             1/1                       
fleet-local   mcc-local-managed-system-upgrade-controller   1/1                       
fleet-local   mcc-rancher-logging                           0/1                       OutOfSync(1) [Cluster fleet-local/local]
fleet-local   mcc-rancher-logging-crd                       1/1                       
fleet-local   mcc-rancher-monitoring-crd                    1/1
It's been 10 minutes so far...
a
Please kill the fleet-agent-* pod and wait for the new pod to replace it. If it's still OutOfSync, please check the log of the new fleet-agent-* pod and see if anything related to rancher-logging is there.
👀 1
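A sketch of those steps, assuming the agent runs in cattle-fleet-local-system (as the leader-election lease in the logs suggests) and carries an app=fleet-agent label:

```shell
# Delete the current fleet-agent pod; its controller recreates it
kubectl delete pod -n cattle-fleet-local-system -l app=fleet-agent

# Follow the replacement pod's log, filtering for rancher-logging entries
kubectl logs -n cattle-fleet-local-system -l app=fleet-agent -f | grep -i rancher-logging
```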
s
Nothing about rancher-logging in the first 5 minutes of watching the logs on the new fleet-agent-* pod:
Copy code
I0523 13:18:28.090204       1 leaderelection.go:248] attempting to acquire leader lease cattle-fleet-local-system/fleet-agent-lock...
I0523 13:18:28.154819       1 leaderelection.go:258] successfully acquired lease cattle-fleet-local-system/fleet-agent-lock
E0523 13:18:30.380053       1 memcache.go:206] couldn't get resource list for custom.metrics.k8s.io/v1beta1: Got empty response for: custom.metrics.k8s.io/v1beta1
time="2024-05-23T13:18:30Z" level=info msg="Starting /v1, Kind=Secret controller"
time="2024-05-23T13:18:30Z" level=info msg="Starting /v1, Kind=ServiceAccount controller"
time="2024-05-23T13:18:30Z" level=info msg="Starting /v1, Kind=Node controller"
time="2024-05-23T13:18:30Z" level=info msg="Starting /v1, Kind=ConfigMap controller"
E0523 13:18:30.404382       1 memcache.go:206] couldn't get resource list for management.cattle.io/v3: 
E0523 13:18:30.412858       1 memcache.go:206] couldn't get resource list for custom.metrics.k8s.io/v1beta1: Got empty response for: custom.metrics.k8s.io/v1beta1
time="2024-05-23T13:18:30Z" level=info msg="Starting fleet.cattle.io/v1alpha1, Kind=BundleDeployment controller"
time="2024-05-23T13:18:30Z" level=info msg="getting history for release harvester-crd"
time="2024-05-23T13:18:30Z" level=info msg="getting history for release local-managed-system-agent"
time="2024-05-23T13:18:30Z" level=info msg="getting history for release mcc-local-managed-system-upgrade-controller"
time="2024-05-23T13:18:30Z" level=info msg="getting history for release harvester"
time="2024-05-23T13:18:30Z" level=info msg="getting history for release fleet-agent-local"
time="2024-05-23T13:18:30Z" level=info msg="getting history for release rancher-logging-crd"
time="2024-05-23T13:18:31Z" level=info msg="getting history for release rancher-monitoring-crd"
I0523 13:18:34.690814       1 request.go:601] Waited for 1.00462687s due to client-side throttling, not priority and fairness, request: GET:https://10.53.0.1:443/api/v1/namespaces/kube-system/serviceaccounts?labelSelector=objectset.rio.cattle.io%2Fhash%3De852fa897f5eae59a44b4bfe186aad80b10b94b3
time="2024-05-23T13:18:35Z" level=info msg="Deleting orphan bundle ID rke2, release kube-system/rke2-canal"
E0523 13:19:04.968560       1 memcache.go:206] couldn't get resource list for custom.metrics.k8s.io/v1beta1: Got empty response for: custom.metrics.k8s.io/v1beta1
E0523 13:19:43.286588       1 memcache.go:206] couldn't get resource list for custom.metrics.k8s.io/v1beta1: Got empty response for: custom.metrics.k8s.io/v1beta1
E0523 13:20:22.799857       1 memcache.go:206] couldn't get resource list for custom.metrics.k8s.io/v1beta1: Got empty response for: custom.metrics.k8s.io/v1beta1
E0523 13:20:43.405029       1 memcache.go:206] couldn't get resource list for custom.metrics.k8s.io/v1beta1: Got empty response for: custom.metrics.k8s.io/v1beta1
E0523 13:20:59.167852       1 memcache.go:206] couldn't get resource list for custom.metrics.k8s.io/v1beta1: Got empty response for: custom.metrics.k8s.io/v1beta1
E0523 13:21:39.514631       1 memcache.go:206] couldn't get resource list for custom.metrics.k8s.io/v1beta1: Got empty response for: custom.metrics.k8s.io/v1beta1
E0523 13:22:25.925259       1 memcache.go:206] couldn't get resource list for custom.metrics.k8s.io/v1beta1: Got empty response for: custom.metrics.k8s.io/v1beta1
E0523 13:22:39.169064       1 memcache.go:206] couldn't get resource list for custom.metrics.k8s.io/v1beta1: Got empty response for: custom.metrics.k8s.io/v1beta1
a
Copy code
status:
    conditions:
    - lastUpdateTime: "2023-09-11T20:44:09Z"
      message: OutOfSync(1) [Cluster fleet-local/local]
      status: "False"
      type: Ready
    - lastUpdateTime: "2023-09-11T20:44:09Z"
      status: "True"
      type: Processed
    - lastUpdateTime: "2023-09-11T20:46:31Z"
      message: no chart version found for rancher-logging-100.1.3+up3.17.7
      reason: Error
      status: "False"
      type: Defined
    display:
      readyClusters: 0/1
      state: OutOfSync
Please cat /oem/harvester.config. This managedchart is running with a non-existent chart version, 100.1.3+up3.17.7:
Copy code
runtimeversion: v1.24.7+rke2r1
rancherversion: v2.6.9
harvesterchartversion: 1.1.0
monitoringchartversion: 100.1.0+up19.0.3
systemsettings: {}
clusternetworks: {}
loggingchartversion: 100.1.3+up3.17.7
Is your cluster running Harvester v1.1.0, or was it upgraded from v1.1.0?
If your Harvester is running v1.2.0, then the rancher-logging version should be:
Copy code
loggingchartversion: 102.0.0+up3.17.10
Please try kubectl edit managedchart -n fleet-local rancher-logging, set the version to 102.0.0+up3.17.10, and then watch the status of the managedchart rancher-logging to see if the error messages above are gone.
The rancher-logging-crd managedchart is using the correct version; rancher-logging should also use this version:
Copy code
spec:
    chart: rancher-logging-crd
    defaultNamespace: cattle-logging-system
    paused: false
    releaseName: rancher-logging-crd
    repoName: harvester-charts
    targets:
    - clusterName: local
      clusterSelector:
        matchExpressions:
        - key: provisioning.cattle.io/unmanaged-system-agent
          operator: DoesNotExist
    version: 102.0.0+up3.17.10
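For reference, that version bump can also be applied non-interactively (a sketch using the values above):

```shell
# Point the rancher-logging ManagedChart at the chart version that exists for v1.2.0
kubectl patch managedchart rancher-logging -n fleet-local \
  --type merge -p '{"spec":{"version":"102.0.0+up3.17.10"}}'

# Watch the chart's status conditions for the "no chart version found" error to clear
kubectl get managedchart -n fleet-local rancher-logging -o yaml | grep -A 4 'conditions:'
```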
s
This cluster, that's been in existence for over 2 years, has been through loads of versions - yes, it was upgraded from v1.1.x, with all of the various fun those old upgrades had.
Copy code
> sudo cat /oem/harvester.config
...
runtimeversion: v1.21.7+rke2r1
harvesterchartversion: 1.0.0
monitoringchartversion: 100.0.0+up16.6.0
systemsettings: {}
I stuck with v1.2.0 because the last upgrade from v1.1.x was a bit painful, and v1.2.0 has worked so nicely 🙂
give me a moment and I'll patch that CRD.
a
Don't patch the CRD; just patch the rancher-logging managedchart.
s
Sorry - that's what I meant!
👍 1
Well, that's made the fleet-agent-* pod log more exciting:
Copy code
E0523 14:19:31.989373       1 memcache.go:206] couldn't get resource list for custom.metrics.k8s.io/v1beta1: Got empty response for: custom.metrics.k8s.io/v1beta1
E0523 14:20:22.698582       1 memcache.go:206] couldn't get resource list for custom.metrics.k8s.io/v1beta1: Got empty response for: custom.metrics.k8s.io/v1beta1
time="2024-05-23T14:20:22Z" level=info msg="preparing upgrade for rancher-logging"
time="2024-05-23T14:20:22Z" level=info msg="performing update for rancher-logging"
time="2024-05-23T14:20:22Z" level=info msg="getting history for release rancher-logging"
time="2024-05-23T14:20:23Z" level=error msg="error syncing 'cluster-fleet-local-local-1a3d67d0a899/mcc-rancher-logging': handler bundle-deploy: unable to build kubernetes objects from current release manifest: [resource mapping not found for name: \"rancher-logging-rke2-journald-aggregator\" namespace: \"cattle-logging-system\" from \"\": no matches for kind \"PodSecurityPolicy\" in version \"policy/v1beta1\"\nensure CRDs are installed first, resource mapping not found for name: \"psp.logging-operator\" namespace: \"cattle-logging-system\" from \"\": no matches for kind \"PodSecurityPolicy\" in version \"policy/v1beta1\"\nensure CRDs are installed first], handler bundle-trigger: the server could not find the requested resource, requeuing"
E0523 14:20:23.346324       1 memcache.go:206] couldn't get resource list for custom.metrics.k8s.io/v1beta1: Got empty response for: custom.metrics.k8s.io/v1beta1
time="2024-05-23T14:20:23Z" level=info msg="preparing upgrade for rancher-logging"
time="2024-05-23T14:20:23Z" level=info msg="performing update for rancher-logging"
time="2024-05-23T14:20:23Z" level=error msg="error syncing 'cluster-fleet-local-local-1a3d67d0a899/mcc-rancher-logging': handler bundle-deploy: unable to build kubernetes objects from current release manifest: [resource mapping not found for name: \"rancher-logging-rke2-journald-aggregator\" namespace: \"cattle-logging-system\" from \"\": no matches for kind \"PodSecurityPolicy\" in version \"policy/v1beta1\"\nensure CRDs are installed first, resource mapping not found for name: \"psp.logging-operator\" namespace: \"cattle-logging-system\" from \"\": no matches for kind \"PodSecurityPolicy\" in version \"policy/v1beta1\"\nensure CRDs are installed first], handler bundle-trigger: the server could not find the requested resource, requeuing"
E0523 14:20:23.741982       1 memcache.go:206] couldn't get resource list for custom.metrics.k8s.io/v1beta1: Got empty response for: custom.metrics.k8s.io/v1beta1
time="2024-05-23T14:20:23Z" level=info msg="preparing upgrade for rancher-logging"
Copy code
$ kubectl get bundles -A --context=harvester
NAMESPACE     NAME                                          BUNDLEDEPLOYMENTS-READY   STATUS
fleet-local   fleet-agent-local                             1/1                       
fleet-local   local-managed-system-agent                    1/1                       
fleet-local   mcc-harvester                                 1/1                       
fleet-local   mcc-harvester-crd                             1/1                       
fleet-local   mcc-local-managed-system-upgrade-controller   1/1                       
fleet-local   mcc-rancher-logging                           0/1                       ErrApplied(1) [Cluster fleet-local/local: unable to build kubernetes objects from current release manifest: [resource mapping not found for name: "rancher-logging-rke2-journald-aggregator" namespace: "cattle-logging-system" from "": no matches for kind "PodSecurityPolicy" in version "policy/v1beta1"...
fleet-local   mcc-rancher-logging-crd                       1/1                       
fleet-local   mcc-rancher-monitoring-crd                    1/1
a
Please double-check the managedchart rancher-logging-crd in your cluster:
Copy code
spec:
    chart: rancher-logging-crd
    defaultNamespace: cattle-logging-system
    paused: false
    releaseName: rancher-logging-crd
    repoName: harvester-charts
    targets:
    - clusterName: local
      clusterSelector:
        matchExpressions:
        - key: provisioning.cattle.io/unmanaged-system-agent
          operator: DoesNotExist
    version: 102.0.0+up3.17.10
the spec.version
👀 1
kubectl get addons.harvesterhci.io -A
s
The rancher-logging-crd and rancher-logging versions are identical, after I edited rancher-logging as you suggested above:
Copy code
apiVersion: management.cattle.io/v3
kind: ManagedChart
spec:
  chart: rancher-logging-crd
  defaultNamespace: cattle-logging-system
  paused: false
  releaseName: rancher-logging-crd
  repoName: harvester-charts
  targets:
  - clusterName: local
    clusterSelector:
      matchExpressions:
      - key: provisioning.cattle.io/unmanaged-system-agent
        operator: DoesNotExist
  version: 102.0.0+up3.17.10
Copy code
apiVersion: management.cattle.io/v3
kind: ManagedChart
spec:
  chart: rancher-logging
  defaultNamespace: cattle-logging-system
  paused: false
  releaseName: rancher-logging
  repoName: harvester-charts
  targets:
  - clusterName: local
    clusterSelector:
      matchExpressions:
      - key: provisioning.cattle.io/unmanaged-system-agent
        operator: DoesNotExist
  version: 102.0.0+up3.17.10
Copy code
$ kubectl get addons.harvesterhci.io -A --context=harvester003
NAMESPACE          NAME                    HELMREPO                                                 CHARTNAME                         ENABLED
harvester-system   harvester-seeder        http://harvester-cluster-repo.cattle-system.svc/charts   harvester-seeder                  false
harvester-system   pcidevices-controller   http://harvester-cluster-repo.cattle-system.svc/charts   harvester-pcidevices-controller   false
harvester-system   vm-import-controller    http://harvester-cluster-repo.cattle-system.svc/charts   harvester-vm-import-controller    false
a
It is quite tricky; your cluster also does not have the rancher-monitoring managedchart. I guess that in the Harvester dashboard GUI you don't have the Grafana views which show the cluster & VM statistics.
👀 1
s
I guess not - I don't think I've ever seen any Grafana style monitoring. There's nothing obvious missing in the Harvester dashboard...
I don't think there's anything I've actively done to not have the rancher-monitoring managedchart. It's probably a victim of upgrades not going 100% to plan.
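A quick way to see which ManagedCharts exist (a sketch; it assumes they all live in fleet-local, like the ones shown above):

```shell
# rancher-monitoring would normally appear alongside rancher-logging here
kubectl get managedchart -n fleet-local
```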
a
From the support bundle, there is this upgrade history:
Copy code
kind: Upgrade
  metadata:
    creationTimestamp: "2022-05-29T10:39:08Z"

    previousVersion: v1.0.0
    repoInfo: |
      release:
          harvester: v1.0.1
          harvesterChart: 1.0.1
          os: Harvester v1.0.1
          kubernetes: v1.21.11+rke2r1
          rancher: v2.6.4-harvester3
          monitoringChart: 100.1.0+up19.0.3
          


    creationTimestamp: "2022-05-29T18:00:57Z"

    previousVersion: v1.0.1
    repoInfo: |
      release:
          harvester: v1.0.2
          harvesterChart: 1.0.2
          os: Harvester v1.0.2                  



    creationTimestamp: "2022-09-02T18:04:51Z"
    previousVersion: v1.0.2
    repoInfo: |
      release:
          harvester: v1.0.3
          harvesterChart: 1.0.3
          os: Harvester v1.0.3
          kubernetes: v1.22.12+rke2r1


    creationTimestamp: "2022-12-04T10:49:21Z"
    previousVersion: v1.1.0
    repoInfo: |
      release:
          harvester: v1.1.1
          harvesterChart: 1.1.1
          os: Harvester v1.1.1



    creationTimestamp: "2023-04-27T17:56:01Z"
    previousVersion: v1.1.1
    repoInfo: |
      release:
          harvester: v1.1.2
          harvesterChart: 1.1.2
          os: Harvester v1.1.2
          kubernetes: v1.24.11+rke2r1


    creationTimestamp: "2023-10-02T07:31:21Z"
    previousVersion: v1.2.0
    repoInfo: |
      release:
          harvester: v1.2.0
          harvesterChart: 1.2.0
          os: Harvester v1.2.0
What is missing is the v1.0.3 -> v1.1.0 upgrade; let me figure out how to recover things.
s
Thank you. I'll be offline for the next 15 hours now.
I'd still really like to upgrade this Harvester cluster, so if you do have any clue how to have the right components running to allow that I would be very grateful 🙂
a
@sticky-summer-13450 Sorry for the delay, we had some holidays. I will continue this.
@sticky-summer-13450 As the rancher-logging managedchart is not working as expected, we can delete it first; after the upgrade, we can create it back. (In v1.3.0 this is an addon: https://docs.harvesterhci.io/v1.3/advanced/addons )
kubectl delete managedchart -n fleet-local rancher-logging
👀 1
s
Oh - great. Thanks, I'll give that a go.
👍 1
a
Ping me with your new update; if everything goes well and your cluster upgrades to v1.3.0, then let's try to create the new addon.
s
Thanks @ancient-pizza-13099 - sorry for my delay in responding. I ran that command:
Copy code
$ kubectl delete managedchart -n fleet-local rancher-logging --context harvester003
managedchart.management.cattle.io "rancher-logging" deleted
And something happened in the fleet-agent-* log:
Copy code
E0607 16:24:49.225869       1 memcache.go:206] couldn't get resource list for custom.metrics.k8s.io/v1beta1: Got empty response for: custom.metrics.k8s.io/v1beta1
time="2024-06-07T16:24:49Z" level=info msg="uninstall: Deleting rancher-logging"
time="2024-06-07T16:24:49Z" level=info msg="uninstall: Failed to delete release: [unable to build kubernetes objects for delete: [resource mapping not found for name: \"rancher-logging-rke2-journald-aggregator\" namespace: \"cattle-logging-system\" from \"\": no matches for kind \"PodSecurityPolicy\" in version \"policy/v1beta1\"\nensure CRDs are installed first, resource mapping not found for name: \"psp.logging-operator\" namespace: \"cattle-logging-system\" from \"\": no matches for kind \"PodSecurityPolicy\" in version \"policy/v1beta1\"\nensure CRDs are installed first]]"
time="2024-06-07T16:24:49Z" level=error msg="error syncing 'cluster-fleet-local-local-1a3d67d0a899/mcc-rancher-logging': handler bundle-cleanup: failed to delete release: rancher-logging, requeuing"
time="2024-06-07T16:25:26Z" level=info msg="Deleting orphan bundle ID rke2, release kube-system/rke2-canal"
E0607 16:25:33.707626       1 memcache.go:206] couldn't get resource list for custom.metrics.k8s.io/v1beta1: Got empty response for: custom.metrics.k8s.io/v1beta1
But the upgrade dialogue complained about not enough disk space in /usr/local:
Copy code
admission webhook "validator.harvesterhci.io" denied the request: Node "harvester001" has insufficient free system partition space 23.51GiB (df -h '/usr/local/'). The upgrade requires at least 30GiB of free system partition space on each node.
It seems that 70GB is in use in the folder /usr/local/.state.
Copy code
harvester001:/usr/local # df -h /usr/local
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme0n1p5   98G   70G   24G  75% /usr/local
harvester001:/usr/local # find . -maxdepth 1 -type d -exec du -sh \{} \;
70G	.
164M	./bin
16K	./upgrade_tmp
8.0K	./etc
16K	./lost+found
4.0K	./cloud-config
28K	./share
28K	./lib
70G	./.state
This is the same on all three nodes. Each node has one 2TB NVME and it's completely partitioned/managed by Harvester.
a
@sticky-summer-13450 You can refer to https://docs.harvesterhci.io/v1.3/upgrade/index/#free-system-partition-space-requirement and add the annotation harvesterhci.io/minFreeDiskSpaceGB to lower the required disk space and bypass the check. After running for a long time there are old container images / temp files which may occupy system partition space; newer Harvester versions will try to clean old container images automatically.
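If I read that doc correctly, the annotation goes on each node object; a sketch (the 20 is just an example value, and the node names are from the df output above):

```shell
# Lower the free-space requirement for the upgrade check to 20 GiB per node
for node in harvester001 harvester002 harvester003; do
  kubectl annotate node "$node" harvesterhci.io/minFreeDiskSpaceGB=20
done
```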
s
Thanks - I'd missed that. I'll look now
🤝 1
I did that, but nothing has happened in 24 hours. The "Download Logs" option doesn't exist. I could try starting over but that's caused problems in the past.
a
kubectl get upgrade.harvesterhci.io -A, then take the YAML output of the latest one.
s
Copy code
$ kubectl get upgrade.harvesterhci.io hvst-upgrade-9smf2 -o yaml -n harvester-system
apiVersion: harvesterhci.io/v1beta1
kind: Upgrade
metadata:
  creationTimestamp: "2024-06-13T08:19:36Z"
  finalizers:
  - wrangler.cattle.io/harvester-upgrade-controller
  generateName: hvst-upgrade-
  generation: 2
  labels:
    harvesterhci.io/latestUpgrade: "true"
    harvesterhci.io/upgradeState: PreparingLoggingInfra
  name: hvst-upgrade-9smf2
  namespace: harvester-system
  resourceVersion: "1315869202"
  uid: 54bea1a3-5344-48d0-a8bb-328279c98aa9
spec:
  image: ""
  logEnabled: true
  version: v1.2.1
status:
  conditions:
  - status: Unknown
    type: Completed
  - status: Unknown
    type: LogReady
  previousVersion: v1.2.0
  upgradeLog: hvst-upgrade-9smf2-upgradelog
a
The upgrade log depends on the rancher-logging CRD; you may delete this object and upgrade again, with the enableLog option unticked.
s
Ah - that makes sense 🙂
a
If the enableLog option is not there, then manually create the `Upgrade` object like:
Copy code
$ kubectl get upgrade.harvesterhci.io hvst-upgrade-9smf2 -o yaml -n harvester-system
apiVersion: harvesterhci.io/v1beta1
kind: Upgrade
metadata:
  name: hvst-upgrade-9smf2
  namespace: harvester-system
spec:
  image: ""
  logEnabled: false
  version: v1.2.1
👍 1
s
I get this using the GUI, so I'll try the non-GUI method
no - unfortunately that doesn't work:
Copy code
$ echo "apiVersion: harvesterhci.io/v1beta1
kind: Upgrade
metadata:
  name: hvst-upgrade-9smf2
  namespace: harvester-system
spec:
  image: ""
  logEnabled: false
  version: v1.2.1" | kubectl apply -f - -n harvester-system

Error from server (BadRequest): error when creating "STDIN": admission webhook "validator.harvesterhci.io" denied the request: managed chart hvst-upgrade-9smf2-upgradelog-operator is not ready, please wait for it to be ready
a
Manually delete this managedchart, hvst-upgrade-9smf2-upgradelog-operator; the previous upgrade left some resources behind.
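In command form (a sketch; it assumes the leftover ManagedChart sits in fleet-local like the others):

```shell
# Remove the upgrade-log ManagedChart left over from the failed upgrade
kubectl delete managedchart -n fleet-local hvst-upgrade-9smf2-upgradelog-operator
```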
s
Understood
You are awesome, Jian - the upgrade image is downloading.
🤞
🤝 1
It looks like the Harvester services have been updated but the nodes don't seem to be draining and being upgraded.
Is the upgrade of the nodes also managed by the fleet-agent-* pod? After all the activity of updating the service components, the Fleet agent is only doing its usual logging:
Copy code
E0614 15:40:22.691390       1 memcache.go:206] couldn't get resource list for custom.metrics.k8s.io/v1beta1: Got empty response for: custom.metrics.k8s.io/v1beta1
a
cc @red-king-19196 any insights?
kubectl get upgrade, and output the YAML of the latest object; let's check its status now.
👍 1
upgrade.harvesterhci.io
s
Copy code
$ kubectl get upgrade.harvesterhci.io hvst-upgrade-9smf2 -o yaml -n harvester-system --context harvester003
apiVersion: harvesterhci.io/v1beta1
kind: Upgrade
metadata:
  annotations:
    harvesterhci.io/replica-replenishment-wait-interval: "600"
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"harvesterhci.io/v1beta1","kind":"Upgrade","metadata":{"annotations":{},"name":"hvst-upgrade-9smf2","namespace":"harvester-system"},"spec":{"image":"","logEnabled":false,"version":"v1.2.1"}}
  creationTimestamp: "2024-06-14T14:27:09Z"
  finalizers:
  - wrangler.cattle.io/harvester-upgrade-controller
  generation: 17
  labels:
    harvesterhci.io/latestUpgrade: "true"
    harvesterhci.io/upgradeState: UpgradingNodes
  name: hvst-upgrade-9smf2
  namespace: harvester-system
  resourceVersion: "1317770693"
  uid: c2d96a42-49c2-4601-b5a1-3fb05173d8a8
spec:
  image: ""
  logEnabled: false
  version: v1.2.1
status:
  conditions:
  - status: Unknown
    type: Completed
  - lastUpdateTime: "2024-06-14T14:27:09Z"
    message: Upgrade observability is administratively disabled
    reason: Disabled
    status: "False"
    type: LogReady
  - lastUpdateTime: "2024-06-14T14:48:35Z"
    status: "True"
    type: ImageReady
  - lastUpdateTime: "2024-06-14T14:52:08Z"
    status: "True"
    type: RepoReady
  - lastUpdateTime: "2024-06-14T14:59:27Z"
    status: "True"
    type: NodesPrepared
  - lastUpdateTime: "2024-06-14T15:00:56Z"
    status: "True"
    type: SystemServicesUpgraded
  - status: Unknown
    type: NodesUpgraded
  imageID: harvester-system/hvst-upgrade-9smf2
  nodeStatuses:
    harvester001:
      state: Images preloaded
    harvester002:
      state: Images preloaded
    harvester003:
      state: Images preloaded
  previousVersion: v1.2.0
  repoInfo: |
    release:
        harvester: v1.2.1
        harvesterChart: 1.2.1
        os: Harvester v1.2.1
        kubernetes: v1.25.9+rke2r1
        rancher: v2.7.5
        monitoringChart: 102.0.0+up40.1.2
        minUpgradableVersion: v1.1.2
r
Need to check the cluster state:
Copy code
kubectl -n fleet-local get clusters.provisioning local -o yaml
Also the log from the rancher pod (find the leader pod).
👀 1
s
Copy code
$ kubectl -n fleet-local get clusters.provisioning local -o yaml --context harvester003
apiVersion: provisioning.cattle.io/v1
kind: Cluster
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"provisioning.cattle.io/v1","kind":"Cluster","metadata":{"annotations":{},"labels":{"rke.cattle.io/init-node-machine-id":"fr5bvk6lt5mrfbnls4swkkmgqgnfnfj2wqwcwf4lzxflhr8ldv9dml"},"name":"local","namespace":"fleet-local"},"spec":{"kubernetesVersion":"v1.21.7+rke2r1","rkeConfig":{"controlPlaneConfig":null}}}
    objectset.rio.cattle.io/applied: H4sIAAAAAAAA/4yQzU7DMBCEXwXt2Slt079Y4oAQ4sCVF9jYS2Ow15G9CYfK746SVqJC4udo78xovjlBIEGLgqBPgMxRUFzkPD1j+0ZGMskiubgwKOJp4eKts6ChT3F02UV2fKyMH7JQqkwiFAL1ozV+MKXqOL6DhoCMRwrEciUYa3Xz7NjePZwj/8xiDAQafDTo/yXOPZrJAUXB3NdFfnGBsmDoQfPgvQKPLflfR+gwd6Bhu9ztt3XdUGNwc7Crdr9u6jW1y/pg91vb2LXdbHarA6jzYpbSVwho6DCNNIMWBd9Yrtu+eiKpzpeiIPdkpnbzx2Wq+0G6R7Z9dCygT2WSCcpwwciURrJPxJRmZtDLUj4DAAD//5CVWGcAAgAA
    objectset.rio.cattle.io/id: provisioning-cluster-create
    objectset.rio.cattle.io/owner-gvk: management.cattle.io/v3, Kind=Cluster
    objectset.rio.cattle.io/owner-name: local
    objectset.rio.cattle.io/owner-namespace: ""
  creationTimestamp: "2022-01-22T15:25:52Z"
  finalizers:
  - wrangler.cattle.io/provisioning-cluster-remove
  - wrangler.cattle.io/rke-cluster-remove
  generation: 17
  labels:
    objectset.rio.cattle.io/hash: 50675339e9ca48d1b72932eb038d75d9d2d44618
    provider.cattle.io: harvester
  name: local
  namespace: fleet-local
  resourceVersion: "1317770694"
  uid: 9a5208ed-bb43-4b24-8b5a-c63df8b761ce
spec:
  kubernetesVersion: v1.25.9+rke2r1
  localClusterAuthEndpoint: {}
  rkeConfig:
    chartValues: null
    machineGlobalConfig: null
    provisionGeneration: 4
    upgradeStrategy:
      controlPlaneConcurrency: "1"
      controlPlaneDrainOptions:
        deleteEmptyDirData: true
        disableEviction: false
        enabled: true
        force: true
        gracePeriod: 0
        ignoreDaemonSets: true
        postDrainHooks:
        - annotation: harvesterhci.io/post-hook
        preDrainHooks:
        - annotation: harvesterhci.io/pre-hook
        skipWaitForDeleteTimeoutSeconds: 0
        timeout: 0
      workerConcurrency: "1"
      workerDrainOptions:
        deleteEmptyDirData: true
        disableEviction: false
        enabled: true
        force: true
        gracePeriod: 0
        ignoreDaemonSets: true
        postDrainHooks:
        - annotation: harvesterhci.io/post-hook
        preDrainHooks:
        - annotation: harvesterhci.io/pre-hook
        skipWaitForDeleteTimeoutSeconds: 0
        timeout: 0
status:
  clientSecretName: local-kubeconfig
  clusterName: local
  conditions:
  - lastUpdateTime: "2023-09-11T20:46:55Z"
    message: custom-13b01cc43f01,custom-2666fac666a5
    reason: Waiting
    status: Unknown
    type: Ready
  - lastUpdateTime: "2022-01-22T15:25:52Z"
    status: "False"
    type: Reconciling
  - lastUpdateTime: "2022-01-22T15:25:52Z"
    status: "False"
    type: Stalled
  - lastUpdateTime: "2024-06-14T15:00:56Z"
    status: "True"
    type: Created
  - lastUpdateTime: "2024-05-08T06:47:03Z"
    status: "True"
    type: RKECluster
  - status: Unknown
    type: DefaultProjectCreated
  - status: Unknown
    type: SystemProjectCreated
  - lastUpdateTime: "2024-05-08T06:47:03Z"
    message: 'configuring bootstrap node(s) custom-6214769cf9b9: waiting for probes:
      kube-controller-manager, kube-scheduler'
    reason: Waiting
    status: Unknown
    type: Provisioned
  - lastUpdateTime: "2024-05-08T06:47:03Z"
    message: 'configuring bootstrap node(s) custom-6214769cf9b9: waiting for probes:
      kube-controller-manager, kube-scheduler'
    reason: Waiting
    status: Unknown
    type: Updated
  - lastUpdateTime: "2022-10-28T18:24:35Z"
    status: "True"
    type: Connected
  observedGeneration: 17
  ready: true
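The conditions list in the status above is long, and the actual blockers are easy to miss among the settled entries. A small helper can filter the conditions from `kubectl get clusters.provisioning.cattle.io local -n fleet-local -o json` down to the unsettled ones (this is a sketch, not a Harvester tool; the sample data below is copied from the status above):

```python
import json

# Sample conditions, abridged from the cluster status shown above.
conditions_json = """
[
  {"type": "Ready", "status": "Unknown", "reason": "Waiting",
   "message": "custom-13b01cc43f01,custom-2666fac666a5"},
  {"type": "Created", "status": "True"},
  {"type": "Provisioned", "status": "Unknown", "reason": "Waiting",
   "message": "configuring bootstrap node(s) custom-6214769cf9b9: waiting for probes: kube-controller-manager, kube-scheduler"}
]
"""

def unsettled(conditions):
    # Reconciling/Stalled normally report "False"; every other condition
    # should report "True" once the cluster is healthy, so anything else
    # is worth a closer look.
    skip = {"Reconciling", "Stalled"}
    return [c for c in conditions
            if c["type"] not in skip and c.get("status") != "True"]

for c in unsettled(json.loads(conditions_json)):
    print(f'{c["type"]}: {c.get("reason", "")} - {c.get("message", "")}')
```

Here it would surface the `Ready` and `Provisioned` conditions, both stuck in `Waiting` on the bootstrap node's probes.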
How much of the rancher-* pod logs do you want - this looks like the active pod...
2024/06/14 15:41:41 [INFO] Downloading repo index from http://harvester-cluster-repo.cattle-system/charts/index.yaml
W0614 15:45:08.409704      33 transport.go:313] Unable to cancel request for *client.addQuery
W0614 15:45:08.981953      33 transport.go:313] Unable to cancel request for *client.addQuery
W0614 15:45:08.982009      33 transport.go:313] Unable to cancel request for *client.addQuery
W0614 15:45:08.983346      33 transport.go:313] Unable to cancel request for *client.addQuery
W0614 15:45:08.984509      33 transport.go:313] Unable to cancel request for *client.addQuery
W0614 15:45:10.413736      33 transport.go:313] Unable to cancel request for *client.addQuery
2024/06/14 15:46:02 [ERROR] rkecluster fleet-local/local: error while retrieving management cluster from cache: management cluster cache was nil
2024/06/14 15:46:02 [INFO] [planner] rkecluster fleet-local/local: waiting: configuring bootstrap node(s) custom-6214769cf9b9: waiting for probes: kube-controller-manager, kube-scheduler
2024/06/14 15:46:41 [INFO] Downloading repo index from http://harvester-cluster-repo.cattle-system/charts/index.yaml
2024/06/14 15:47:23 [ERROR] rkecluster fleet-local/local: error while retrieving management cluster from cache: management cluster cache was nil
2024/06/14 15:47:23 [INFO] [planner] rkecluster fleet-local/local: waiting: configuring bootstrap node(s) custom-6214769cf9b9: waiting for probes: kube-controller-manager, kube-scheduler
2024/06/14 15:48:49 [INFO] [planner] rkecluster fleet-local/local: waiting: configuring bootstrap node(s) custom-6214769cf9b9: waiting for probes: kube-controller-manager, kube-scheduler
2024/06/14 15:48:49 [ERROR] rkecluster fleet-local/local: error while retrieving management cluster from cache: management cluster cache was nil
2024/06/14 15:51:41 [INFO] Downloading repo index from http://harvester-cluster-repo.cattle-system/charts/index.yaml
2024/06/14 15:55:17 [ERROR] Error during subscribe websocket: close sent
2024/06/14 15:56:03 [ERROR] rkecluster fleet-local/local: error while retrieving management cluster from cache: management cluster cache was nil
2024/06/14 15:56:03 [INFO] [planner] rkecluster fleet-local/local: waiting: configuring bootstrap node(s) custom-6214769cf9b9: waiting for probes: kube-controller-manager, kube-scheduler
2024/06/14 15:56:41 [INFO] Downloading repo index from http://harvester-cluster-repo.cattle-system/charts/index.yaml
2024/06/14 15:57:25 [INFO] [planner] rkecluster fleet-local/local: waiting: configuring bootstrap node(s) custom-6214769cf9b9: waiting for probes: kube-controller-manager, kube-scheduler
2024/06/14 15:57:25 [ERROR] rkecluster fleet-local/local: error while retrieving management cluster from cache: management cluster cache was nil
2024/06/14 15:58:53 [ERROR] rkecluster fleet-local/local: error while retrieving management cluster from cache: management cluster cache was nil
2024/06/14 15:58:53 [INFO] [planner] rkecluster fleet-local/local: waiting: configuring bootstrap node(s) custom-6214769cf9b9: waiting for probes: kube-controller-manager, kube-scheduler
2024/06/14 16:01:41 [INFO] Downloading repo index from http://harvester-cluster-repo.cattle-system/charts/index.yaml
2024/06/14 16:06:05 [INFO] [planner] rkecluster fleet-local/local: waiting: configuring bootstrap node(s) custom-6214769cf9b9: waiting for probes: kube-controller-manager, kube-scheduler
2024/06/14 16:06:05 [ERROR] rkecluster fleet-local/local: error while retrieving management cluster from cache: management cluster cache was nil
2024/06/14 16:06:41 [INFO] Downloading repo index from http://harvester-cluster-repo.cattle-system/charts/index.yaml
2024/06/14 16:07:27 [INFO] [planner] rkecluster fleet-local/local: waiting: configuring bootstrap node(s) custom-6214769cf9b9: waiting for probes: kube-controller-manager, kube-scheduler
2024/06/14 16:07:27 [ERROR] rkecluster fleet-local/local: error while retrieving management cluster from cache: management cluster cache was nil
2024/06/14 16:08:54 [ERROR] rkecluster fleet-local/local: error while retrieving management cluster from cache: management cluster cache was nil
2024/06/14 16:08:54 [INFO] [planner] rkecluster fleet-local/local: waiting: configuring bootstrap node(s) custom-6214769cf9b9: waiting for probes: kube-controller-manager, kube-scheduler
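On a cluster this old, probe failures for kube-controller-manager and kube-scheduler frequently trace back to expired client certificates (the issue linked further down). A quick way to check is `openssl x509 -enddate` / `-checkend` against the component certs on each node; the on-node paths here are an assumption based on RKE2's default TLS layout (`/var/lib/rancher/rke2/server/tls/<component>/<component>.crt`), and the snippet below demonstrates the check on a throwaway self-signed cert so it can run anywhere:

```shell
# Generate a throwaway cert just to demonstrate the check.
openssl req -x509 -newkey rsa:2048 -nodes -keyout /tmp/demo.key \
  -out /tmp/demo.crt -days 365 -subj "/CN=demo" 2>/dev/null

# The actual check: print the notAfter date...
openssl x509 -enddate -noout -in /tmp/demo.crt

# ...and flag certs that have already expired (-checkend 0 exits
# non-zero if the cert is past its notAfter date).
openssl x509 -checkend 0 -noout -in /tmp/demo.crt \
  && echo "cert still valid" || echo "cert expired"
```

On a real node you would point the same two `openssl x509` commands at the kube-controller-manager and kube-scheduler cert files instead of `/tmp/demo.crt`.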
r
Is this cluster running more than one year?
s
The cluster was created 874 days ago, so yes
r
I just found out that you hit this before. Would you like to rerun the check? https://github.com/harvester/harvester/issues/3863#issuecomment-1539681311
👀 1
s
Haha - yep, I get FAIL FAIL again. I'll follow my instructions...
r
It’s kind of a milestone… Congrats, your cluster survived for another year 😆
🎉 1
BTW, this should not happen again post v1.3…
👍 1
a
874 days is living evidence that Harvester is fairly good 🙂
s
Hell yes! Although I have lost quite a lot of hair at upgrade times. Other than that, it's absolutely fabulous 🙂
👍 1
I've followed those instructions and I get [OK] [OK] now. I don't recall if I need to wait for a while, bounce a pod, or bounce a node to get things moving again.
👍 1
r
If you followed the instructions, you’ve already restarted those pods. Just wait a bit for rancher to do its job. If it doesn’t proceed, we can check again what’s the new blocker 🙈
s
Ah - I needed to do the same update of the certs on all three nodes of the cluster. Now things are moving :-)
👍 2
Thanks so much :-) The 1.2.0 -> 1.2.1 upgrade has completed successfully and I’m now starting the 1.2.1 -> 1.2.2.
And the 1.2.1 -> 1.2.2 completed without a hitch 🙂
🎉 2