
sticky-summer-13450

10/30/2022, 2:24 PM
What does it mean when a volume has two instance managers listed, errors saying there are two instance managers, and the volume is healthy and not ready, and attached even though the Harvester VM is not running? It seems to be a Schrödinger's volume... I think it needs to be detached, with no instance managers, so that it can be attached again when Harvester commands it to be.

famous-journalist-11332

10/31/2022, 10:12 PM
Can you send us the support bundle?

famous-journalist-11332

11/02/2022, 2:01 AM
How long has it been stuck in this state? @sticky-summer-13450
Todo: check whether there was a volume crash

sticky-summer-13450

11/02/2022, 7:57 AM
This one has been in this state since Saturday or Sunday - so around 4 days.

famous-journalist-11332

11/03/2022, 11:41 PM
Thank you. Looks like this is a buggy state in which 2 engines are active at the same time. cc @kind-alarm-73406 @famous-shampoo-18483 Ref: https://github.com/longhorn/longhorn-manager/blob/259cf16fb34ae554c2703df2f2a9e7d08fbd2e33/controller/volume_controller.go#L518
2022-11-01T07:45:24.117994945Z time="2022-11-01T07:45:24Z" level=warning msg="Error syncing Longhorn volume longhorn-system/pvc-8e5a70d5-a853-4082-b6f6-0ef2cd3cd8c8" controller=longhorn-volume error="failed to sync longhorn-system/pvc-8e5a70d5-a853-4082-b6f6-0ef2cd3cd8c8: failed to reconcile engine/replica state for pvc-8e5a70d5-a853-4082-b6f6-0ef2cd3cd8c8: BUG: found the second active engine pvc-8e5a70d5-a853-4082-b6f6-0ef2cd3cd8c8-e-3c989560 besides pvc-8e5a70d5-a853-4082-b6f6-0ef2cd3cd8c8-e-2a27c92a" node=harvester003
@sticky-summer-13450 You can get out of this buggy state by running
kubectl edit engines.longhorn.io pvc-8e5a70d5-a853-4082-b6f6-0ef2cd3cd8c8-e-3c989560 -n longhorn-system
and setting
spec.active
to
false
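For the record, a non-interactive version of the same fix could be a merge patch instead of opening an editor. This is a sketch, not from the thread: it assumes `kubectl patch` with `--type=merge` against the Longhorn engine CRD, using the engine name and namespace mentioned above.

```shell
# Sketch of a non-interactive equivalent of the `kubectl edit` above:
# mark the stale engine inactive with a merge patch.
# Engine name and namespace are the ones from this thread; adjust for your cluster.
ENGINE="pvc-8e5a70d5-a853-4082-b6f6-0ef2cd3cd8c8-e-3c989560"
PATCH='{"spec":{"active":false}}'
# Requires cluster access, so the command is echoed here rather than executed.
echo kubectl patch engines.longhorn.io "$ENGINE" -n longhorn-system --type=merge -p "$PATCH"
```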

kind-alarm-73406

11/03/2022, 11:42 PM
is there a v.spec.migrationID set?

famous-journalist-11332

11/03/2022, 11:43 PM
No, it is not set
spec:
    size: 10737418240
    frontend: blockdev
    frombackup: ""
    datasource: ""
    datalocality: disabled
    stalereplicatimeout: 30
    nodeid: ""
    migrationnodeid: ""
    engineimage: longhornio/longhorn-engine:v1.3.2
    backingimage: ""
    standby: false
    diskselector: []
    nodeselector: []
    disablefrontend: false
    revisioncounterdisabled: false
    lastattachedby: ""
    accessmode: rwx
    migratable: true
    encrypted: false
    numberofreplicas: 3
    replicaautobalance: ignored
    baseimage: ""
    recurringjobs: []

kind-alarm-73406

11/03/2022, 11:43 PM
I am guessing there was a live migration of a VM and the node might have crashed in the middle of it?
Thanks for the above; looks like the volume should detach and all engines should turn off 🙂

famous-journalist-11332

11/03/2022, 11:45 PM
Somehow there are 2 engines active at the same time, and the volume controller is not happy about that. Not sure how we can guarantee an atomic flip of the engines' activeness in the first place.
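To spot this duplicate-active state directly, a sketch along these lines could list the volume's engine objects with their spec.active flags. This assumes Longhorn's `longhornvolume` label is set on engine objects (an assumption here, not confirmed in the thread), and again echoes the command since it needs cluster access.

```shell
# Sketch: list the engine CRs for the affected volume and show which
# claim spec.active=true. Assumes the `longhornvolume` label selector.
VOLUME="pvc-8e5a70d5-a853-4082-b6f6-0ef2cd3cd8c8"
echo kubectl get engines.longhorn.io -n longhorn-system \
  -l "longhornvolume=$VOLUME" \
  -o custom-columns=NAME:.metadata.name,ACTIVE:.spec.active
```

In the buggy state described above, two rows would show ACTIVE as true.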

famous-shampoo-18483

11/04/2022, 2:04 AM
I just found a similar issue ticket: https://github.com/longhorn/longhorn/issues/1755
👍 1

sticky-summer-13450

11/04/2022, 9:17 AM
You can get out of this buggy state by running
kubectl edit engines.longhorn.io pvc-8e5a70d5-a853-4082-b6f6-0ef2cd3cd8c8-e-3c989560 -n longhorn-system
and setting
spec.active
to
false
Well, applying that edit left that volume with zero replicas - so I guess I should have just blown away this broken Harvester VM days ago and started again. Never mind - it was only a k3s node, I can rebuild it. I hope the issue sees some traction and gets fixed 🙂