# longhorn-storage
a
Was there something in the release notes that I may have missed telling us to check that v1 box as part of the upgrade from 1.5.x to 1.6.x?
l
Really weird that the v1 data engine would be disabled. It's enabled by default.
a
Yea I thought it was odd too. This was just our test deployment, so I checked the box and my volumes started attaching and I could create new volumes again. I'll see if the same thing happens when we upgrade prod in a couple weeks. I definitely did not uncheck that box or any boxes in the Danger Zone when performing the upgrade.
l
Is the checkbox unchecked on one or more of your prod clusters?
Did NOT experience this
a
My production clusters are still running 1.5.3, and there isn't even a checkbox for the v1 data engine there, just one for the v2 engine, which is marked as a feature preview and is not checked.
A couple of things. I noticed the box was unchecked again. I went back into Apps to look at the default settings, and it shows the longhorn app stuck in "Pending Upgrade" status. I went through the "edit" option, and the v1 data engine box is not checked by default in the Helm options.
I tried to check it and continue on, but it failed with
beginning wait for 22 resources with timeout of 10m0s
Release "longhorn-crd" has been upgraded. Happy Helming!
NAME: longhorn-crd
2024-08-06T19:23:07.350172945Z LAST DEPLOYED: Tue Aug  6 19:23:01 2024
2024-08-06T19:23:07.350179945Z NAMESPACE: longhorn-system
2024-08-06T19:23:07.350186016Z STATUS: deployed
2024-08-06T19:23:07.350191916Z REVISION: 4
2024-08-06T19:23:07.350197706Z TEST SUITE: None

2024-08-06T19:23:07.368579705Z ---------------------------------------------------------------------
SUCCESS: helm upgrade --cleanup-on-fail=true --history-max=5 --install=true --namespace=longhorn-system --timeout=10m0s --values=/home/shell/helm/values-longhorn-crd-103.3.1-up1.6.2.yaml --version=103.3.1+up1.6.2 --wait=true longhorn-crd /home/shell/helm/longhorn-crd-103.3.1-up1.6.2.tgz
---------------------------------------------------------------------
helm upgrade --cleanup-on-fail=true --history-max=5 --install=true --namespace=longhorn-system --timeout=10m0s --values=/home/shell/helm/values-longhorn-103.3.1-up1.6.2.yaml --version=103.3.1+up1.6.2 --wait=true longhorn /home/shell/helm/longhorn-103.3.1-up1.6.2.tgz
Error: UPGRADE FAILED: another operation (install/upgrade/rollback) is in progress
But I don't see any failed pods or anything, so I have no idea how to get out of this
l
What are you controlling the longhorn upgrade with?
a
I just did it through the Rancher Apps
l
How does it go if you do it directly via helm?
a
We've always used Rancher Apps to manage Longhorn, which I think just uses Helm anyway. This is the first time we've upgraded Longhorn in these new clusters though. I thought it went fine since everything worked; I didn't realize it was still pending until I tried to go back and change the default values.
Weird, on our prod system I see both longhorn and longhorn-crd when running helm list -n longhorn-system, but in the test cluster where the upgrade is pending I see
helm list -n longhorn-system
NAME        	NAMESPACE      	REVISION	UPDATED                                	STATUS  	CHART                       	APP VERSION
longhorn-crd	longhorn-system	4       	2024-08-06 19:23:01.644953587 +0000 UTC	deployed	longhorn-crd-103.3.1+up1.6.2	v1.6.2
Well, I see it with -a
helm list -n longhorn-system -a
NAME        	NAMESPACE      	REVISION	UPDATED                                	STATUS         	CHART                       	APP VERSION
longhorn    	longhorn-system	2       	2024-07-26 14:51:25.425570157 +0000 UTC	pending-upgrade	longhorn-103.3.1+up1.6.2    	v1.6.2     
longhorn-crd	longhorn-system	4       	2024-08-06 19:23:01.644953587 +0000 UTC	deployed       	longhorn-crd-103.3.1+up1.6.2	v1.6.2
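For reference, the pending release in that table can be picked out mechanically. This is only a minimal sketch, assuming the tab-separated layout that helm list prints; stuck_releases is a hypothetical helper name, not part of Helm.

```shell
# Hypothetical helper: reads `helm list -a` table output on stdin and prints
# the names of releases whose STATUS column starts with "pending-".
# Assumes the columns are tab-separated, as in the output above.
stuck_releases() {
  awk -F'\t' 'NR > 1 {
    name = $1; status = $5
    gsub(/^ +| +$/, "", name)      # trim the column padding
    gsub(/^ +| +$/, "", status)
    if (status ~ /^pending-/) print name
  }'
}

# Usage against a live cluster:
#   helm list -n longhorn-system -a | stuck_releases
```

helm list also accepts a --pending flag, which filters to pending releases directly without any parsing.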
If this is what I'm hitting (https://github.com/helm/helm/issues/7476), it seems like a common problem. I'm definitely not going to do a rollback and then upgrade again though; I'm not even sure how Longhorn would handle that.
I see suggestions to delete the secret, but I'm not sure that's a good idea.
kubectl get secret -A -l status=pending-upgrade
NAMESPACE         NAME                             TYPE                 DATA   AGE
longhorn-system   sh.helm.release.v1.longhorn.v2   helm.sh/release.v1   1      11d
Well, I deleted that secret (just a test system, so not a big deal) and now I don't even see longhorn under "Installed Apps"; I just see the longhorn-crd app in there. Is there a way to recover from this without reinstalling Longhorn and losing my volumes? Losing data is fine in test, but I'd like to work this out in case it happens in prod.
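For context, Helm keeps each release revision in a secret named sh.helm.release.v1.<release>.v<revision>, which is the naming visible in the kubectl output above. A minimal sketch of splitting such a name and keeping a copy of the record before deleting it; parse_release_secret and the backup file name are hypothetical, not anything Helm or Rancher provides.

```shell
# Hypothetical helper: splits a Helm release secret name into
# "<release> <revision>". Returns non-zero for names that do not follow
# the sh.helm.release.v1.<release>.v<revision> pattern.
parse_release_secret() {
  name=$1
  case "$name" in
    sh.helm.release.v1.*.v[0-9]*) ;;
    *) echo "not a Helm release secret: $name" >&2; return 1 ;;
  esac
  rest=${name#sh.helm.release.v1.}   # e.g. "longhorn.v2"
  revision=${rest##*.v}              # text after the last ".v"
  release=${rest%.v*}                # text before the last ".v"
  printf '%s %s\n' "$release" "$revision"
}

# Keeping a copy before deleting the pending record (file name is an assumption):
#   kubectl get secret sh.helm.release.v1.longhorn.v2 -n longhorn-system -o yaml > longhorn.v2.backup.yaml
# Listing every stored revision of the release via the labels Helm sets:
#   kubectl get secret -n longhorn-system -l owner=helm,name=longhorn
```

Recreating a secret from such a copy would likely need server-set metadata (resourceVersion, uid) stripped first; that step is not covered here.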
l
Yeah, Longhorn generally is not happy about being downgraded.
I would try a regular helm upgrade … outside of Rancher Apps, on test first of course.
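A rough sketch of what that direct upgrade could look like, reusing flags from the Rancher-generated command in the log earlier. longhorn_upgrade_cmd is a hypothetical helper that only prints the command for review; the chart and values paths are placeholders the operator would supply, and nothing in this thread confirms that it recovers a release whose latest record was deleted.

```shell
# Hypothetical helper: prints (does not run) a helm upgrade command for the
# longhorn release. Assumes the paths contain no spaces.
longhorn_upgrade_cmd() {
  chart=$1    # chart archive or chart reference (placeholder)
  values=$2   # values file to apply (placeholder)
  printf 'helm upgrade --namespace=longhorn-system --timeout=10m0s --wait=true --values=%s longhorn %s\n' \
    "$values" "$chart"
}

# Review the printed command, then run it by hand on the test cluster:
#   longhorn_upgrade_cmd ./longhorn-103.3.1-up1.6.2.tgz ./values.yaml
```

Printing instead of executing keeps the step reviewable, which seems prudent given how the earlier attempt ended.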