This message was deleted Rancher Users #longhorn-storage

# longhorn-storage

adamant-kite-43734

08/05/2024, 6:28 PM

This message was deleted.

abundant-hair-58573

08/05/2024, 6:31 PM

Was there something in the release notes that I may have missed telling us to check that v1 box as part of the upgrade from 1.5.x to 1.6.x?

late-needle-80860

08/05/2024, 6:59 PM

Really weird that the v1 data engine would be disabled. It’s enabled by default.

abundant-hair-58573

08/05/2024, 7:01 PM

Yea I thought it was odd too. This was just our test deployment, so I checked the box and my volumes started attaching and I could create new volumes again. I'll see if the same thing happens when we upgrade prod in a couple weeks. I definitely did not uncheck that box or any boxes in the Danger Zone when performing the upgrade.

late-needle-80860

08/05/2024, 7:10 PM

Is the checkbox unchecked on one or more of your prod clusters?

late-needle-80860

08/05/2024, 7:10 PM

Did NOT experience this

abundant-hair-58573

08/05/2024, 10:32 PM

My production clusters are still running 1.5.3, and there isn't even a checkbox for the v1 data engine, just one for the v2 engine, which says it's in feature preview and is not checked.

abundant-hair-58573

08/06/2024, 7:21 PM

A couple things. I noticed the box was unchecked again. I went back into Apps to look at the default settings and it shows the longhorn app stuck in "Pending Upgrade" status. I went through the "edit" option and it's not checked by default in the helm options.
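One way to check the live value outside the UI is to read the Longhorn setting object directly. This is a minimal sketch, assuming the setting is named `v1-data-engine` (as in Longhorn 1.6.x) and lives in the `longhorn-system` namespace; the function name is made up for illustration.

```shell
# Print the live value of Longhorn's V1 data engine setting.
# Assumption: the setting object is named "v1-data-engine" (Longhorn 1.6.x)
# and Longhorn runs in the longhorn-system namespace.
v1_data_engine_value() {
  kubectl -n longhorn-system get settings.longhorn.io v1-data-engine \
    -o jsonpath='{.value}'
}

# Usage (requires kubectl and cluster access):
#   v1_data_engine_value
```

Comparing this value right after checking the box and again later would show whether something is resetting it.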

abundant-hair-58573

08/06/2024, 7:24 PM

I tried to check it and continue on, but it failed with

beginning wait for 22 resources with timeout of 10m0s
Release "longhorn-crd" has been upgraded. Happy Helming!
NAME: longhorn-crd
2024-08-06T19:23:07.350172945Z LAST DEPLOYED: Tue Aug  6 19:23:01 2024
2024-08-06T19:23:07.350179945Z NAMESPACE: longhorn-system
2024-08-06T19:23:07.350186016Z STATUS: deployed
2024-08-06T19:23:07.350191916Z REVISION: 4
2024-08-06T19:23:07.350197706Z TEST SUITE: None

2024-08-06T19:23:07.368579705Z ---------------------------------------------------------------------
SUCCESS: helm upgrade --cleanup-on-fail=true --history-max=5 --install=true --namespace=longhorn-system --timeout=10m0s --values=/home/shell/helm/values-longhorn-crd-103.3.1-up1.6.2.yaml --version=103.3.1+up1.6.2 --wait=true longhorn-crd /home/shell/helm/longhorn-crd-103.3.1-up1.6.2.tgz
---------------------------------------------------------------------
helm upgrade --cleanup-on-fail=true --history-max=5 --install=true --namespace=longhorn-system --timeout=10m0s --values=/home/shell/helm/values-longhorn-103.3.1-up1.6.2.yaml --version=103.3.1+up1.6.2 --wait=true longhorn /home/shell/helm/longhorn-103.3.1-up1.6.2.tgz
Error: UPGRADE FAILED: another operation (install/upgrade/rollback) is in progress

But I don't see any failed pods or anything, so I have no idea how to get out of this

late-needle-80860

08/06/2024, 8:00 PM

What are you controlling the longhorn upgrade with?

abundant-hair-58573

08/06/2024, 8:01 PM

I just did it through the Rancher Apps

late-needle-80860

08/06/2024, 8:32 PM

How does it go if you do it directly via helm?

abundant-hair-58573

08/06/2024, 10:08 PM

We've always used Rancher Apps to manage Longhorn, which I think just uses helm anyways. This is the first time we've upgraded Longhorn in these new clusters though. I thought it went fine since everything worked, I didn't realize it was still pending until I tried to go back and change the default values

abundant-hair-58573

08/06/2024, 10:26 PM

Weird, on our prod system I see both longhorn and longhorn-crd when running `helm list -n longhorn-system`, but in the test cluster where the upgrade is pending I see

helm list -n longhorn-system
NAME        	NAMESPACE      	REVISION	UPDATED                                	STATUS  	CHART                       	APP VERSION
longhorn-crd	longhorn-system	4       	2024-08-06 19:23:01.644953587 +0000 UTC	deployed	longhorn-crd-103.3.1+up1.6.2	v1.6.2

abundant-hair-58573

08/06/2024, 10:27 PM

well I see it with `-a`

helm list -n longhorn-system -a
NAME        	NAMESPACE      	REVISION	UPDATED                                	STATUS         	CHART                       	APP VERSION
longhorn    	longhorn-system	2       	2024-07-26 14:51:25.425570157 +0000 UTC	pending-upgrade	longhorn-103.3.1+up1.6.2    	v1.6.2     
longhorn-crd	longhorn-system	4       	2024-08-06 19:23:01.644953587 +0000 UTC	deployed       	longhorn-crd-103.3.1+up1.6.2	v1.6.2
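Plain `helm list` hides releases that are stuck in a pending state, which is why `-a` was needed here (`helm list --pending` is the built-in filter). As a rough sketch, the same check can be scripted against the `-a` output; the function name is hypothetical, and it assumes the tab-separated table layout shown above.

```shell
# Read `helm list -a` output on stdin and print the name of every release
# whose STATUS column starts with "pending-" (pending-install,
# pending-upgrade, pending-rollback). Columns are tab-separated and padded
# with spaces, so trailing spaces are trimmed before matching.
pending_releases() {
  awk -F'\t' 'NR > 1 {
    name = $1; status = $5
    sub(/ +$/, "", name); sub(/ +$/, "", status)
    if (status ~ /^pending-/) print name
  }'
}

# Usage (requires helm and cluster access):
#   helm list -n longhorn-system -a | pending_releases
```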

abundant-hair-58573

08/06/2024, 10:31 PM

Seems like a common problem, if this is what I'm hitting: https://github.com/helm/helm/issues/7476. I'm definitely not going to do a rollback and then upgrade again though; I'm not even sure how Longhorn would handle that.

abundant-hair-58573

08/06/2024, 10:34 PM

I see suggestions to delete the secret, but I'm not sure that's a good idea.

kubectl get secret -A -l status=pending-upgrade
NAMESPACE         NAME                             TYPE                 DATA   AGE
longhorn-system   sh.helm.release.v1.longhorn.v2   helm.sh/release.v1   1      11d
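Before deleting a release-record secret like this one, it may be worth saving a copy first. The sketch below is only an illustration: the function name and the backup file name are made up, and it relies on Helm 3 naming these secrets `sh.helm.release.v1.<release>.v<revision>`.

```shell
# Save a Helm release-record secret to a local YAML file before touching it.
# Helm 3 names these secrets sh.helm.release.v1.<release>.v<revision>.
backup_release_secret() {
  release=$1; revision=$2; namespace=$3
  secret="sh.helm.release.v1.${release}.v${revision}"
  outfile="${secret}.backup.yaml"   # hypothetical file name
  kubectl get secret "$secret" -n "$namespace" -o yaml > "$outfile" || return 1
  echo "$outfile"
}

# Usage (requires kubectl and cluster access):
#   backup_release_secret longhorn 2 longhorn-system
```

The saved file gives you something to inspect or re-create the secret from later, though server-set metadata such as `resourceVersion` may need to be stripped first.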

abundant-hair-58573

08/06/2024, 10:46 PM

well, I deleted that secret (just a test system so not a big deal) and now I don't even see longhorn under "Installed Apps", I just see the longhorn-crd app in there. Is there a way to recover from this without reinstalling longhorn and losing my volumes? Losing data is fine in test but I'd like to work this out in case it happens in prod

late-needle-80860

08/07/2024, 9:40 AM

Yeah longhorn generally is not happy about being downgraded.

late-needle-80860

08/07/2024, 9:41 AM

I would try a regular helm upgrade outside Rancher Apps … of course on test first
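A cautious first step for that could be a dry run with plain helm, which renders the upgrade without applying it. This is a sketch only: the function name is hypothetical, the chart and values arguments are placeholders you would supply yourself, and it assumes the release is still named `longhorn` in `longhorn-system`.

```shell
# Render a Longhorn upgrade with plain helm without applying it.
# Both arguments are placeholders supplied by the caller:
#   $1 = chart reference or path to a chart archive
#   $2 = path to a values file
longhorn_upgrade_dry_run() {
  chart=$1; values=$2
  helm upgrade longhorn "$chart" \
    --namespace longhorn-system \
    --values "$values" \
    --dry-run
}

# Usage (requires helm and cluster access):
#   longhorn_upgrade_dry_run <chart> <values-file>
```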
