This message was deleted Rancher Users #harvester

# harvester

adamant-kite-43734

06/26/2024, 10:10 PM

This message was deleted.

prehistoric-balloon-31801

06/27/2024, 2:06 PM

A support bundle will help!

alert-london-41705

06/27/2024, 3:29 PM

support bundle attached

supportbundle_f7dfeabf-0bd1-4a80-aff5-9efbe61cbee7_2024-06-27T15-22-42Z.zip

prehistoric-balloon-31801

06/28/2024, 2:09 AM

Did you delete the upgrade? There is no upgrade resource in the bundle. What was its last state?

alert-london-41705

06/28/2024, 2:25 AM

@prehistoric-balloon-31801 yes, I initially hit the bug where it stalled out at 50%. I found a few similar issues. It was suggested elsewhere (I forget where now) to delete the bundle and try it again. But when I deleted it, the UI showed it was at 1.2.2, so I figured it was solved, until I tried to activate the monitoring plugins. It's up and running right now, with guest VMs running, though I can't enable addons. I attempted to roll back a couple of days ago, hoping I could restart the upgrade, but that didn't seem to do anything.

prehistoric-balloon-31801

06/28/2024, 2:39 AM

It might be off-topic, but is this a low-spec machine cluster? I saw a lot of complaints in the etcd log. The upgrade was deleted, so it's hard to tell where it got stuck. To me, it looks like the system software has been upgraded but the upgrade might not be complete, and the nodes/RKE2 are still waiting to be upgraded.

prehistoric-balloon-31801

06/28/2024, 2:39 AM

I'd say there's probably no harm in triggering the upgrade again to see how it goes.

alert-london-41705

06/28/2024, 2:39 AM

It’s very low spec: three 4-core mini PCs. It’s just for my own education.

alert-london-41705

06/28/2024, 2:40 AM

The problem is the UI says it’s 1.2.2 and the upgrade button is gone.

prehistoric-balloon-31801

06/28/2024, 2:41 AM

can you try

kubectl create -f https://releases.rancher.com/harvester/v1.2.2/version.yaml

and see if the button comes up

alert-london-41705

06/28/2024, 2:55 AM

it did, I'll try that and see what happens

alert-london-41705

06/28/2024, 4:59 AM

so it shows 1.3.1 as the available version, but when I select it I get

admission webhook "validator.harvesterhci.io" denied the request: managed chart harvester is not ready, please wait for it to be ready

alert-london-41705

06/28/2024, 5:02 AM

kubectl get bundledeployment -A
NAMESPACE                                NAME                         DEPLOYED   MONITORED   STATUS
cluster-fleet-local-local-1a3d67d0a899   fleet-agent-local            True       True        
cluster-fleet-local-local-1a3d67d0a899   local-managed-system-agent   True       True        
cluster-fleet-local-local-1a3d67d0a899   mcc-harvester                True       True        deployment.apps harvester-system/harvester-webhook [progressing] Pending termination: 1
cluster-fleet-local-local-1a3d67d0a899   mcc-harvester-crd            True       True        
cluster-fleet-local-local-1a3d67d0a899   mcc-rancher-logging-crd      True       True        
cluster-fleet-local-local-1a3d67d0a899   mcc-rancher-monitoring-crd   True       True

prehistoric-balloon-31801

06/28/2024, 6:30 AM

there are paused managed charts from the previous upgrade, you can unpause them first to let fleet reconcile, e.g.

kubectl edit managedchart harvester -n fleet-local

And edit the path

spec.paused

to false

paused: false
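
For anyone who would rather not open an editor, the same change can probably be made with a merge patch. This is only a minimal sketch and was not posted in the thread: the `unpause_chart` helper and the `APPLY` switch are hypothetical names made up for illustration. By default it only prints the command so it can be reviewed before anything touches the cluster.

```shell
#!/bin/sh
# Sketch: set spec.paused=false on a ManagedChart without an interactive edit.
# unpause_chart and APPLY are hypothetical names, not part of Harvester or kubectl.

NAMESPACE="fleet-local"   # namespace used in the thread

unpause_chart() {
  chart="$1"
  patch='{"spec":{"paused":false}}'
  if [ "${APPLY:-0}" = "1" ]; then
    # Really patch the cluster (needs kubectl and cluster access).
    kubectl patch managedchart "$chart" -n "$NAMESPACE" --type=merge -p "$patch"
  else
    # Dry run: only show the command that would be executed.
    printf "kubectl patch managedchart %s -n %s --type=merge -p '%s'\n" \
      "$chart" "$NAMESPACE" "$patch"
  fi
}

unpause_chart harvester
```

Running the script with `APPLY=1` set in the environment would execute the patch instead of printing it.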

alert-london-41705

06/28/2024, 1:37 PM

that fixed that error, and it's changed - lol

admission webhook "validator.harvesterhci.io" denied the request: managed chart harvester-crd is not ready, please wait for it to be ready

alert-london-41705

06/28/2024, 1:38 PM

kubectl get bundledeployment -A
NAMESPACE                                NAME                         DEPLOYED   MONITORED   STATUS
cluster-fleet-local-local-1a3d67d0a899   fleet-agent-local            True       True        
cluster-fleet-local-local-1a3d67d0a899   local-managed-system-agent   True       True        
cluster-fleet-local-local-1a3d67d0a899   mcc-harvester                True       True        
cluster-fleet-local-local-1a3d67d0a899   mcc-harvester-crd            True       True        
cluster-fleet-local-local-1a3d67d0a899   mcc-rancher-logging-crd      True       True        
cluster-fleet-local-local-1a3d67d0a899   mcc-rancher-monitoring-crd   True       True

prehistoric-balloon-31801

07/01/2024, 2:37 AM

Hi Bryan, sorry for the late reply. Can you attach a new support bundle? You can also check the bundles rather than the bundle deployments:

kubectl get bundles -A

alert-london-41705

07/01/2024, 6:59 PM

No worries, support bundle is attached.

supportbundle_f7dfeabf-0bd1-4a80-aff5-9efbe61cbee7_2024-07-01T18-52-57Z.zip

alert-london-41705

07/01/2024, 6:59 PM

kubectl get bundles -A
NAMESPACE     NAME                         BUNDLEDEPLOYMENTS-READY   STATUS
fleet-local   fleet-agent-local            1/1                       
fleet-local   local-managed-system-agent   1/1                       
fleet-local   mcc-harvester                1/1                       
fleet-local   mcc-harvester-crd            0/1                       OutOfSync(1) [Cluster fleet-local/local]
fleet-local   mcc-rancher-logging-crd      0/1                       OutOfSync(1) [Cluster fleet-local/local]
fleet-local   mcc-rancher-monitoring-crd   0/1                       OutOfSync(1) [Cluster fleet-local/local]
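
To see at a glance which charts still need attention, the table above can be filtered for bundles whose ready count falls short. This is a rough sketch that assumes the column layout shown in this thread and that stripping the `mcc-` prefix yields the managed chart name. That mapping matches the names used in this thread but is not confirmed here as a general rule. `not_ready_charts` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: list managed charts whose bundle is not fully ready.
# Reads `kubectl get bundles -A` style output on stdin.
# not_ready_charts is a hypothetical helper name.

not_ready_charts() {
  awk 'NR > 1 {
    split($3, ready, "/")        # column 3 is BUNDLEDEPLOYMENTS-READY, e.g. 0/1
    if (ready[1] != ready[2]) {
      name = $2
      sub(/^mcc-/, "", name)     # assumed mapping: bundle mcc-X to managedchart X
      print name
    }
  }'
}

# Sample input copied from the output above. On a live cluster, use:
#   kubectl get bundles -A | not_ready_charts
not_ready_charts <<'EOF'
NAMESPACE     NAME                         BUNDLEDEPLOYMENTS-READY   STATUS
fleet-local   fleet-agent-local            1/1
fleet-local   local-managed-system-agent   1/1
fleet-local   mcc-harvester                1/1
fleet-local   mcc-harvester-crd            0/1                       OutOfSync(1) [Cluster fleet-local/local]
fleet-local   mcc-rancher-logging-crd      0/1                       OutOfSync(1) [Cluster fleet-local/local]
fleet-local   mcc-rancher-monitoring-crd   0/1                       OutOfSync(1) [Cluster fleet-local/local]
EOF
```

With the sample input, this lists the three CRD charts that are out of sync.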

prehistoric-balloon-31801

07/02/2024, 3:51 AM

Hi Bryan, please repeat the unpause operation on the remaining charts:

harvester-crd
rancher-logging-crd
rancher-monitoring-crd

alert-london-41705

07/03/2024, 3:42 PM

ok current status:

kubectl get bundles -A
NAMESPACE     NAME                                         BUNDLEDEPLOYMENTS-READY   STATUS
fleet-local   fleet-agent-local                            1/1                       
fleet-local   local-managed-system-agent                   1/1                       
fleet-local   mcc-harvester                                1/1                       
fleet-local   mcc-harvester-crd                            1/1                       
fleet-local   mcc-hvst-upgrade-dqqhz-upgradelog-operator   1/1                       
fleet-local   mcc-rancher-logging-crd                      0/1                       OutOfSync(1) [Cluster fleet-local/local]
fleet-local   mcc-rancher-monitoring-crd                   0/1                       OutOfSync(1) [Cluster fleet-local/local]

alert-london-41705

07/03/2024, 3:43 PM

when I look at

kubectl edit managedchart rancher-logging-crd -n fleet-local

I see this

lastUpdateTime: "2024-06-05T16:45:44Z"
message: no chart version found for rancher-logging-crd-102.0.0+up3.17.10

prehistoric-balloon-31801

07/05/2024, 7:55 AM

@ancient-pizza-13099 can you help Bryan with this? I think the previous upgrade somehow failed in the middle of apply_manifest. cluster-repo, rancher, and harvester are already upgraded. We need to check where the manifest application stopped and move the logging/monitoring charts to the correct versions. Then he can re-initiate the upgrade.

alert-london-41705

07/07/2024, 2:04 PM

Ya I was trying to figure out if there was a way to manually pull the right image versions down and try again

prehistoric-balloon-31801

07/08/2024, 8:40 AM

@alert-london-41705 can you edit the rancher-monitoring-crd managedchart and set its spec.version to 103.0.3+up45.31.1? Then do the same for the rancher-logging-crd managedchart, setting its spec.version to 103.0.0+up3.17.10.

prehistoric-balloon-31801

07/08/2024, 8:42 AM

Similar commands:

kubectl edit managedchart rancher-monitoring-crd -n fleet-local
kubectl edit managedchart rancher-logging-crd -n fleet-local
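
As an alternative to the interactive edits, the two version changes could likely be applied as merge patches. The sketch below is illustrative only: `set_chart_version` and the `APPLY` switch are hypothetical names, the versions are the ones given in this thread, and by default the commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch: pin spec.version on the two CRD managed charts named in the thread.
# set_chart_version and APPLY are hypothetical names, not Harvester tooling.

set_chart_version() {
  chart="$1"
  version="$2"
  patch=$(printf '{"spec":{"version":"%s"}}' "$version")
  if [ "${APPLY:-0}" = "1" ]; then
    # Really patch the cluster (needs kubectl and cluster access).
    kubectl patch managedchart "$chart" -n fleet-local --type=merge -p "$patch"
  else
    # Dry run: only show the command that would be executed.
    printf "kubectl patch managedchart %s -n fleet-local --type=merge -p '%s'\n" \
      "$chart" "$patch"
  fi
}

# Versions as given in the thread above.
set_chart_version rancher-monitoring-crd "103.0.3+up45.31.1"
set_chart_version rancher-logging-crd    "103.0.0+up3.17.10"
```

The right versions depend on the Harvester release being targeted, so the values here should not be reused for other upgrades without checking.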

alert-london-41705

07/08/2024, 4:07 PM

kubectl get bundles -A
NAMESPACE     NAME                                         BUNDLEDEPLOYMENTS-READY   STATUS
fleet-local   fleet-agent-local                            1/1                       
fleet-local   local-managed-system-agent                   1/1                       
fleet-local   mcc-harvester                                1/1                       
fleet-local   mcc-harvester-crd                            1/1                       
fleet-local   mcc-hvst-upgrade-dqqhz-upgradelog-operator   1/1                       
fleet-local   mcc-rancher-logging-crd                      1/1                       
fleet-local   mcc-rancher-monitoring-crd                   1/1

So that looks good now. The upgrade to 1.3.1 is still stuck on pending.

alert-london-41705

07/08/2024, 4:09 PM

upgrade log

hvst-upgrade-dqqhz-upgradelog-archive-2024-07-08T16-00-48Z.zip

alert-london-41705

07/08/2024, 4:20 PM

And the latest support bundle as well

supportbundle_f7dfeabf-0bd1-4a80-aff5-9efbe61cbee7_2024-07-08T16-12-36Z.zip

prehistoric-balloon-31801

07/09/2024, 3:15 AM

Hi Bryan, wasn't the previous failed upgrade from v1.2.1 to v1.2.2? The upgrade currently running in the system targets v1.3.1. Jumping to v1.3.1 while the previous upgrade is unfinished is not recommended, because Kubernetes would jump two minor versions. Can you try these steps:

• Delete the current upgrade:

hvst-upgrade-dqqhz

• Ensure plan

hvst-upgrade-dqqhz-prepare

is gone:

kubectl get plans hvst-upgrade-dqqhz-prepare -n cattle-system
# nothing should display

• Patch some fields:

kubectl label -n fleet-local cluster.provisioning local "provisioning.cattle.io/management-cluster-name=local" --overwrite=true
kubectl patch -n fleet-local cluster.provisioning local --subresource=status --type=merge --patch '{"status":{"fleetWorkspaceName": "fleet-local"}}'

• Ensure SUC (system-upgrade-controller) is running; the deployment should be created after a while:

$ kubectl get deployment system-upgrade-controller -n cattle-system
NAME                        READY   UP-TO-DATE   AVAILABLE   AGE
system-upgrade-controller   1/1     1            1           14d

• Start a v1.2.2 upgrade again
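
The two "ensure" steps above involve waiting, which can be scripted with a small polling helper. This is a sketch under stated assumptions, not a procedure from the thread: `wait_until`, `plan_gone`, and the `RUN_CHECKS` switch are hypothetical names, and the retry counts are arbitrary. The cluster checks only run when `RUN_CHECKS=1` is set.

```shell
#!/bin/sh
# Sketch: poll for the two conditions described in the steps above.
# wait_until, plan_gone and RUN_CHECKS are hypothetical names.

# wait_until TRIES DELAY CMD...: retry CMD until it succeeds or TRIES run out.
wait_until() {
  tries="$1"; delay="$2"; shift 2
  while [ "$tries" -gt 0 ]; do
    "$@" >/dev/null 2>&1 && return 0
    tries=$((tries - 1))
    [ "$tries" -gt 0 ] && sleep "$delay"
  done
  return 1
}

if [ "${RUN_CHECKS:-0}" = "1" ]; then
  # The plan counts as gone when `kubectl get` fails for it.
  # Caveat: an unreachable cluster also makes `kubectl get` fail.
  plan_gone() { ! kubectl get plans hvst-upgrade-dqqhz-prepare -n cattle-system; }

  wait_until 30 10 plan_gone \
    || echo "plan hvst-upgrade-dqqhz-prepare is still present"

  wait_until 30 10 kubectl get deployment system-upgrade-controller -n cattle-system \
    || echo "system-upgrade-controller deployment not found yet"
fi
```

Finding the deployment only shows that it was created. Its READY column still needs a look, as in the output quoted in the thread.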

alert-london-41705

07/09/2024, 3:26 AM

yeah, I actually wasn't trying to get to 1.3.1, that showed up after we did this step: https://rancher-users.slack.com/archives/C01GKHKAG0K/p1719542486858259?thread_ts=1719439845.734179&cid=C01GKHKAG0K I'm totally fine to just get to a stable 1.2.2. Trying the above right now.

alert-london-41705

07/09/2024, 3:27 AM

seems like I might be back on my way...

kubectl get deployment system-upgrade-controller -n cattle-system
NAME                        READY   UP-TO-DATE   AVAILABLE   AGE
system-upgrade-controller   1/1     1            1           9s

alert-london-41705

07/09/2024, 3:42 AM

The UI still shows it's at 1.2.2, and 1.3.1 is the only upgrade that's available... hrmm.

prehistoric-balloon-31801

07/09/2024, 4:55 AM

https://rancher-users.slack.com/archives/C01GKHKAG0K/p1719542486858259?thread_ts=1719439845.734179&channel=C01GKHKAG0K&message_ts=1719542486.858259 Does this work?

alert-london-41705

07/09/2024, 4:58 AM

seems to have

alert-london-41705

07/09/2024, 4:58 AM

says it's downloading the image now

alert-london-41705

07/10/2024, 1:45 AM

Was able to successfully get through the 1.2.2 and the 1.3.1 upgrades today! The addons are all working again as well! Thanks so much @prehistoric-balloon-31801 🙏 🎆

🙌 1

prehistoric-balloon-31801

07/10/2024, 2:08 AM

Good to hear. Do you mind sharing an SB so I can check that everything went well? Unfortunately, since the first upgrade attempt was deleted, I can't figure out from the SBs so far why it failed.

alert-london-41705

07/10/2024, 2:59 PM

Yeah, attached.

supportbundle_f7dfeabf-0bd1-4a80-aff5-9efbe61cbee7_2024-07-10T14-51-52Z.zip
