
sticky-summer-13450

11/28/2022, 12:43 PM
Longhorn in my Harvester has been working perfectly for a while. This morning I noticed that one of my VMs has stopped. I looked in the Longhorn UI and I see that the root disk volume is not happy. Whenever that volume is mounted it goes into an "Attaching" -> "Detaching" loop. How can I work out what happened? And how can I get that partition back into a healthy state?

icy-agency-38675

11/29/2022, 3:12 PM
Can you check the longhorn-manager log to see if there are any related errors?
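Something like this should pull them, assuming the default longhorn-system namespace (the label selector is from memory, so adjust it if your longhorn-manager pods carry a different one, and the grep pattern is just a placeholder for your volume name):
$ kubectl -n longhorn-system get pods -l app=longhorn-manager -o wide
$ kubectl -n longhorn-system logs -l app=longhorn-manager --since=24h --tail=-1 --prefix | grep <your-volume-name>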

sticky-summer-13450

11/29/2022, 3:58 PM
The logs don't go back far enough to see what caused the issue to start. But there's loads of this going on - hundreds of lines per second.
[longhorn-instance-manager] time="2022-11-29T15:55:51Z" level=debug msg="Listing snapshots" serviceURL="10.52.3.28:10006"
[longhorn-instance-manager] time="2022-11-29T15:55:51Z" level=debug msg="Getting volume" serviceURL="10.52.3.28:10006"
[longhorn-instance-manager] time="2022-11-29T15:55:51Z" level=debug msg="Getting replica rebuilding status" serviceURL="10.52.3.28:10006"
[longhorn-instance-manager] time="2022-11-29T15:55:51Z" level=debug msg="Get snapshot purge status" serviceURL="10.52.3.28:10006"
[longhorn-instance-manager] time="2022-11-29T15:55:51Z" level=debug msg="Getting backup restore status" serviceURL="10.52.3.28:10006"
[longhorn-instance-manager] time="2022-11-29T15:55:51Z" level=debug msg="Getting snapshot clone status" serviceURL="10.52.3.28:10006"
[longhorn-instance-manager] time="2022-11-29T15:55:51Z" level=info msg="Process Manager: prepare to create process pvc-3b137621-8434-456b-8f08-410ade838d8b-e-68ae6e4c"
[longhorn-instance-manager] time="2022-11-29T15:55:51Z" level=debug msg="Process Manager: validate process path: /engine-binaries/longhornio-longhorn-engine-v1.3.2/longhorn dir: /engine-binaries/ image: longhornio-longhorn-engine-v1.3.2 binary: longhorn"
[longhorn-instance-manager] time="2022-11-29T15:55:51Z" level=info msg="Process Manager: created process pvc-3b137621-8434-456b-8f08-410ade838d8b-e-68ae6e4c"
[pvc-3b137621-8434-456b-8f08-410ade838d8b-e-68ae6e4c] time="2022-11-29T15:55:51Z" level=info msg="Starting with replicas [\"<tcp://10.52.3.27:10150>\"]"
[pvc-3b137621-8434-456b-8f08-410ade838d8b-e-68ae6e4c] time="2022-11-29T15:55:51Z" level=info msg="Connecting to remote: 10.52.3.27:10150"
[pvc-3b137621-8434-456b-8f08-410ade838d8b-e-68ae6e4c] time="2022-11-29T15:55:51Z" level=info msg="Opening: 10.52.3.27:10150"
[pvc-3b137621-8434-456b-8f08-410ade838d8b-e-68ae6e4c] time="2022-11-29T15:55:51Z" level=warning msg="failed to create backend with address <tcp://10.52.3.27:10150>: failed to open replica 10.52.3.27:10150 from remote: rpc error: code = Unknown desc = Failed to find metadata for volume-snap-0b1e6133-4c31-4b2d-a15d-3742e757fa76.img"
time="2022-11-29T15:55:51Z" level=info msg="Adding backend: <tcp://10.52.3.27:10150>"
2022/11/29 15:55:51 cannot create an available backend for the engine from the addresses [<tcp://10.52.3.27:10150>]
[longhorn-instance-manager] time="2022-11-29T15:55:51Z" level=info msg="Process Manager: process pvc-3b137621-8434-456b-8f08-410ade838d8b-e-68ae6e4c error out, error msg: exit status 1"
[longhorn-instance-manager] time="2022-11-29T15:55:51Z" level=debug msg="Process update: pvc-3b137621-8434-456b-8f08-410ade838d8b-e-68ae6e4c: state error: Error: exit status 1"
[longhorn-instance-manager] time="2022-11-29T15:55:51Z" level=info msg="stop waiting for gRPC service of process pvc-3b137621-8434-456b-8f08-410ade838d8b-e-68ae6e4c to start at localhost:10010"
[longhorn-instance-manager] time="2022-11-29T15:55:52Z" level=debug msg="Process update: pvc-3b137621-8434-456b-8f08-410ade838d8b-e-68ae6e4c: state error: Error: exit status 1"
[longhorn-instance-manager] time="2022-11-29T15:55:52Z" level=debug msg="Listing replicas" serviceURL="10.52.3.28:10000"
[longhorn-instance-manager] time="2022-11-29T15:55:52Z" level=debug msg="Listing replicas" serviceURL="10.52.3.28:10001"
[longhorn-instance-manager] time="2022-11-29T15:55:52Z" level=debug msg="Listing snapshots" serviceURL="10.52.3.28:10000"
[longhorn-instance-manager] time="2022-11-29T15:55:52Z" level=debug msg="Listing snapshots" serviceURL="10.52.3.28:10001"
[longhorn-instance-manager] time="2022-11-29T15:55:52Z" level=debug msg="Process update: pvc-3b137621-8434-456b-8f08-410ade838d8b-e-68ae6e4c: state error: Error: exit status 1"
[longhorn-instance-manager] time="2022-11-29T15:55:52Z" level=debug msg="Process Manager: start getting logs for process pvc-3b137621-8434-456b-8f08-410ade838d8b-e-68ae6e4c"
[longhorn-instance-manager] time="2022-11-29T15:55:52Z" level=debug msg="Getting volume" serviceURL="10.52.3.28:10000"
[longhorn-instance-manager] time="2022-11-29T15:55:52Z" level=debug msg="Getting volume" serviceURL="10.52.3.28:10001"

icy-agency-38675

11/30/2022, 6:18 AM
Is the volume still in the attach-detach loop? If so, can you provide us with a support bundle? Then we can try to figure out the root cause.
👍 1

sticky-summer-13450

11/30/2022, 8:55 AM
Yes, still in an attach-detach loop. To recap: the volume was the root disk of a Harvester VM. The VM was running. Overnight, I'm assuming, something happened to the volume and the VM went into a stopping-starting loop. Two days prior I had rebuilt two of the nodes in the 3-node cluster - one after the other. Each time I waited for all of the volumes to be completely replicated to the newly rebuilt node before moving on to the next. I was happy that all volumes were working as expected - and the VM this volume was attached to was functioning correctly. Attached...

icy-agency-38675

12/01/2022, 2:36 PM
Thank you. Checking.
From the support bundle, the volume pvc-3b137621-8434-456b-8f08-410ade838d8b should have 3 replicas, but somehow there is only one replica, pvc-3b137621-8434-456b-8f08-410ade838d8b-r-3e66e8d4, for this volume. However, the snapshot disk file's metadata is missing, so the volume can't be started and gets trapped in the attach-detach loop.
2022-11-30T08:49:46.857428044Z [pvc-3b137621-8434-456b-8f08-410ade838d8b-e-68ae6e4c] time="2022-11-30T08:49:46Z" level=warning msg="failed to create backend with address <tcp://10.52.3.27:10150>: failed to open replica 10.52.3.27:10150 from remote: rpc error: code = Unknown desc = Failed to find metadata for volume-snap-0b1e6133-4c31-4b2d-a15d-3742e757fa76.img"
The volume-snap-0b1e6133-4c31-4b2d-a15d-3742e757fa76.img itself was created correctly:
2022-11-28T05:12:35.827027788Z time="2022-11-28T05:12:35Z" level=info msg="Prune overlapping chunks from volume-snap-0b1e6133-4c31-4b2d-a15d-3742e757fa76.img based on volume-head-001.img"
Not sure why the disk's metadata file was deleted. Can you list the files in the volume's replica folder?

sticky-summer-13450

12/01/2022, 3:59 PM
yep - 2 minutes to log on to the node which should still have the volume
👍 1
/var/lib/harvester/defaultdisk/replicas/pvc-3b137621-8434-456b-8f08-410ade838d8b-0ade9458 # ls -la
total 487332
drwx------  2 root root        4096 Nov 28 05:12 .
drwxr-xr-x 30 root root        4096 Nov 28 05:30 ..
-rw-------  1 root root        4096 Nov 28 05:12 revision.counter
-rw-r--r--  1 root root 64424509440 Nov 28 05:12 volume-head-006.img
-rw-r--r--  1 root root         178 Nov 28 05:12 volume-head-006.img.meta
-rw-r--r--  1 root root 64424509440 Nov 28 05:12 volume-snap-0b1e6133-4c31-4b2d-a15d-3742e757fa76.img
-rw-r--r--  1 root root 64424509440 Nov 28 05:12 volume-snap-b5d617df-075c-4850-a495-7818238ae609.img
-rw-r--r--  1 root root         178 Nov 28 05:12 volume-snap-b5d617df-075c-4850-a495-7818238ae609.img.meta
-rw-r--r--  1 root root         195 Nov 28 05:12 volume.meta

icy-agency-38675

12/01/2022, 4:06 PM
Yeah, the snapshot disk `volume-snap-0b1e6133-4c31-4b2d-a15d-3742e757fa76.img`'s metadata is missing. I'm thinking about how to fix it manually and need your help with a few questions: 1. Did you take a snapshot of the volume manually? 2. Can you show the contents of volume-head-006.img.meta, volume-snap-b5d617df-075c-4850-a495-7818238ae609.img.meta and volume.meta?

sticky-summer-13450

12/01/2022, 4:09 PM
1. No - I did not make the snapshot manually.
👍 1

icy-agency-38675

12/01/2022, 4:09 PM
I will continue to work on the issue tomorrow morning. Thank you for the information in advance.

sticky-summer-13450

12/01/2022, 4:10 PM
harvester002:/var/lib/harvester/defaultdisk/replicas/pvc-3b137621-8434-456b-8f08-410ade838d8b-0ade9458 # cat volume-head-006.img.meta
{"Name":"volume-head-002.img","Parent":"volume-snap-b5d617df-075c-4850-a495-7818238ae609.img","Removed":false,"UserCreated":false,"Created":"2022-11-28T05:12:48Z","Labels":null}

harvester002:/var/lib/harvester/defaultdisk/replicas/pvc-3b137621-8434-456b-8f08-410ade838d8b-0ade9458 # cat volume-snap-b5d617df-075c-4850-a495-7818238ae609.img.meta
{"Name":"volume-head-001.img","Parent":"volume-snap-0b1e6133-4c31-4b2d-a15d-3742e757fa76.img","Removed":false,"UserCreated":false,"Created":"2022-11-28T05:12:48Z","Labels":null}

harvester002:/var/lib/harvester/defaultdisk/replicas/pvc-3b137621-8434-456b-8f08-410ade838d8b-0ade9458 # cat volume.meta
{"Size":64424509440,"Head":"volume-head-006.img","Dirty":false,"Rebuilding":true,"Error":"","Parent":"volume-snap-b5d617df-075c-4850-a495-7818238ae609.img","SectorSize":512,"BackingFilePath":""}
Thank you!

icy-agency-38675

12/02/2022, 5:49 AM
The volume.meta shows this replica is in the rebuilding state, so I'm not quite sure whether the replica is corrupted or not. However, we can try the following steps and see if the volume can be rescued. Before doing the steps, I highly suggest backing up the replica directory (including all files) - a sketch of the copy is at the end of this message.
Steps:
1. Scale down the workload using the volume, then the volume should be detached.
2. Update volume.meta manually
   ◦ Set Rebuilding to false
3. Update volume-head-006.img.meta manually
   ◦ Set Name to volume-head-006.img
4. Scale up the workload.
---
As for the root cause of the issue, we need to investigate further, because the log messages are missing and not enough for analysis.
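For the backup, something like this on the node should be enough (the destination path and name are just an example - if you can, keep the copy outside the replicas directory so Longhorn doesn't flag it as an orphan):
# cp -a /var/lib/harvester/defaultdisk/replicas/pvc-3b137621-8434-456b-8f08-410ade838d8b-0ade9458 \
     /var/lib/harvester/defaultdisk/pvc-3b137621-8434-456b-8f08-410ade838d8b-0ade9458.bak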

sticky-summer-13450

12/02/2022, 8:39 AM
1. Scale down the workload using the volume, then the volume should be detached.
I believe the workload is scaled down. The Harvester VM is stopped. I did attempt to use the Longhorn UI's "Attach" option to attach the volume to the node - but I see no way to detach the volume in the Longhorn UI. Is there a Longhorn Kubernetes object I can manipulate to detach it?
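(Something like this is what I had in mind - assuming it's the volumes.longhorn.io CR's spec.nodeID that drives attachment, which I haven't confirmed:)
$ kubectl -n longhorn-system get volumes.longhorn.io pvc-3b137621-8434-456b-8f08-410ade838d8b -o jsonpath='{.spec.nodeID}{"\n"}'
$ kubectl -n longhorn-system patch volumes.longhorn.io pvc-3b137621-8434-456b-8f08-410ade838d8b --type=merge -p '{"spec":{"nodeID":""}}'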
As for the root cause of the issue, we need to further investigate, because the log messages are missing and not enough for analysis.
To be honest, I'm quite worried that a Harvester VM could lose at least two replicas (from the nodes harvester001 and harvester003) and leave this replica in this "inconsistent" state. My fastest remediation would have been to toss this VM (it's "only" a Kubernetes node, so it wouldn't take long to rebuild it). I hope that investigating this will help fix the underlying issue.

icy-agency-38675

12/02/2022, 9:12 AM
Oh, wait. Let me update the steps. The previous steps forgot to update volume-snap-b5d617df-075c-4850-a495-7818238ae609.img.meta. Just to make sure the workaround is feasible in your case:
1. Scale down the workload using the volume, then the volume should be detached. (Can skip if the workload is down.)
2. Update volume.meta manually
   ◦ Set Rebuilding to false
3. Update volume-head-006.img.meta manually
   ◦ Set Name to volume-head-006.img
   ◦ Set Parent to volume-snap-b5d617df-075c-4850-a495-7818238ae609.img
4. Update volume-snap-b5d617df-075c-4850-a495-7818238ae609.img.meta manually
   ◦ Set Name to volume-snap-b5d617df-075c-4850-a495-7818238ae609.img
   ◦ Set Parent to ""
5. Scale up the workload. (Can skip if the workload is down.)
Steps 2-4 fix the disk chain volume-head-006.img -> volume-snap-b5d617df-075c-4850-a495-7818238ae609.img.
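On the node, the edits in steps 2-4 would look roughly like this - a sketch only, written against the file contents you pasted earlier, so please cat each file afterwards and double-check before reattaching:
# cd /var/lib/harvester/defaultdisk/replicas/pvc-3b137621-8434-456b-8f08-410ade838d8b-0ade9458
# sed -i 's/"Rebuilding":true/"Rebuilding":false/' volume.meta   # step 2
# sed -i 's/"Name":"volume-head-002.img"/"Name":"volume-head-006.img"/' volume-head-006.img.meta   # step 3 (Parent already matches what you pasted)
# sed -i 's/"Name":"volume-head-001.img"/"Name":"volume-snap-b5d617df-075c-4850-a495-7818238ae609.img"/' volume-snap-b5d617df-075c-4850-a495-7818238ae609.img.meta   # step 4, Name
# sed -i 's/"Parent":"volume-snap-0b1e6133-4c31-4b2d-a15d-3742e757fa76.img"/"Parent":""/' volume-snap-b5d617df-075c-4850-a495-7818238ae609.img.meta   # step 4, Parent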
To be honest, I’m quite worried that a Harvester VM could lose at least two replicas (from the nodes harvester001 and harvester003) and leave this replica in this “inconsistent” state.
You are right. Typically, once all three replicas encounter errors, LH should auto-salvage one of the replicas and rebuild the remaining two. But somehow the disk files/metadata files of the replica in your volume are messy. I guess there was a race condition when LH was doing replica rebuilding and snapshotting simultaneously. Do you mind opening a ticket in longhorn/longhorn with the support bundle? Then we can continue investigating the issue and see if others have any ideas. Many thanks.

sticky-summer-13450

12/02/2022, 10:01 AM
Brief description on https://github.com/longhorn/longhorn/issues/4985. I'll add comments with the results of the above.
👍 1
I also noticed that there are a lot of orphan volumes on two of the nodes in the cluster - I think I need to see if there is any correlation between the orphans' timestamps and the approximate time of the failure of this specific volume.
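(I'll probably start with something like this to line the times up - assuming the orphan CR's creationTimestamp is a reasonable proxy for when the stale replica directory was detected:)
$ kubectl get orphans --context harvester003 --namespace longhorn-system \
    -o=custom-columns='NAME:.metadata.name,NODE:.spec.nodeID,CREATED:.metadata.creationTimestamp,DataName:.spec.parameters.DataName'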

icy-agency-38675

12/02/2022, 10:39 AM
Can you provide the orphan volumes' names?

sticky-summer-13450

12/02/2022, 10:58 AM
Yep.
$ kubectl get orphans --context harvester003 --namespace longhorn-system -o=custom-columns='NAME:.metadata.name, NODE:spec.nodeID, DataName:.spec.parameters.DataName'
NAME                                                                       NODE           DataName
orphan-040a73be61fafa35ba18891f312e37005436061fe7fae46bcdb33fe5b0679b0d   harvester001   pvc-59136de4-f641-44e8-a372-1fa9bb6f7f7c-7031f59c
orphan-06152dc0fce540aa1e421585503e8623a91ffc00272d4bb72d6acfb705b6108e   harvester001   pvc-67e7a314-7384-469f-9268-bdcd8728e526-0c56bc40
orphan-0c7fbc8bcb83a9daf5ff52f6a7032b395e2e57c363971b194b58bbbe56ea4266   harvester001   pvc-ab70b8de-d6d9-4e4d-baee-48b967dfbbba-ee0b5c5f
orphan-0da3ce4ffb9a2f92a1643720d959462d463a282ab39a52aa5f2e9f8840ad019d   harvester001   pvc-67e7a314-7384-469f-9268-bdcd8728e526-38f87f1a
orphan-0dc0de542de3711f1b30a452b4c7bf1d09eefec424b20eb7e425bf26790303a9   harvester001   pvc-ab70b8de-d6d9-4e4d-baee-48b967dfbbba-bc94b640
orphan-113596b19a803a5ed30690d7291ce66fb3dc56e30caa1605e9e977b3c6d309ae   harvester001   pvc-0ca5a4f3-d641-4b31-b33d-96b925d9af04-715ead93
orphan-1fbf4b8f4a3c808c40ac49a66f75c2b81ff2027328876362b20a99a65be77524   harvester001   pvc-67e7a314-7384-469f-9268-bdcd8728e526-5d0cfede
orphan-24d22f9513691dc056e4b63eadd6794b609f1e27c6279b3de34476bda27dad7b   harvester001   pvc-ab70b8de-d6d9-4e4d-baee-48b967dfbbba-08aca8ad
orphan-31f1dfe066af63523a48e6c9c1dbbc408f523b156e9b6fc4b40b9517b3cc53ca   harvester001   pvc-67e7a314-7384-469f-9268-bdcd8728e526-a5ef9f89
orphan-358a6a81220e4a0fca00f7aef97f4f9d6459a97e9df2a3a2a1eed7bb8759c23e   harvester001   pvc-3b46b003-57e6-4338-a750-1f5308635672-936977a8
orphan-3cb1841ee61908c259f172afe71e64682e644236a60edbc7081cf73547826593   harvester002   pvc-bba3d4da-5ac0-4cb0-b2c3-2357ffaaf5fc-1a915877
orphan-3cf9e09f5f917ba316457ec474a1cf8069b17dcca9e99915949d91e7c8b80d48   harvester001   pvc-1a758e3f-8718-4e95-9a85-38bfacc99476-c09d13bc
orphan-41395cc6ca7b1076ae76eec716be0cfd0da8f4955d005d7743ef1d91d6becb28   harvester001   pvc-ab70b8de-d6d9-4e4d-baee-48b967dfbbba-d555936d
orphan-42abcba1d4893e64294f603837d7024ae50a02c7a43ab676cf923b2f7c1c6686   harvester001   pvc-ab70b8de-d6d9-4e4d-baee-48b967dfbbba-98936904
orphan-465bbce6c0e090be8751adc6aa73d06ad2c02fc9ac8577b66aa6ee246191a2e5   harvester001   pvc-ab70b8de-d6d9-4e4d-baee-48b967dfbbba-ff11e61b
orphan-478bbeb4819c0a71e05dc279210a5f928cc738556ac420478b5f0ed918c631a5   harvester001   pvc-67e7a314-7384-469f-9268-bdcd8728e526-2f1e4410
orphan-59276e20dd35ac28cedcc5adb1fc333dd88800c11f74807d53fd6288343e9576   harvester001   pvc-59136de4-f641-44e8-a372-1fa9bb6f7f7c-5135e57d
orphan-5e03081f2dd65a7c1b401a684cd254263c64486008315b84dc7115419fefba2b   harvester002   pvc-3f9a22e4-df30-45fc-b4c7-baed0c4ff217-df0142e7
orphan-5e3da08eb150690cfeb6ede4cbec697fff2a943d0ad30cdc726e3c6420ff1765   harvester001   pvc-3b46b003-57e6-4338-a750-1f5308635672-76a8e73c
orphan-62cf53a289f5fc1720163f68f908fde6380677f63cca5ff0d18ebe9da3835738   harvester001   pvc-a51fcf2e-5a85-41e5-9135-a886605de9f5-1ab519fd
orphan-68a2f1dd2a5944e6fbc54660c850fe73d9a07a6066407a9e4b1ef8cd4afe02d2   harvester001   pvc-c4dfa684-2e3a-496f-9396-0e137a8f85e7-d0f9fbc3
orphan-6d3e59be241345e4067482c85465468ee8a44d882d482206f3fc77d3b588cc76   harvester001   pvc-1a758e3f-8718-4e95-9a85-38bfacc99476-a8cde24d
orphan-6d977f419b7b3e422af824525faaf3c2d6beafa689aba167386b4ee3c50970de   harvester002   pvc-a5b5fe4c-eca4-4c97-a3db-f9490980c044-d521cbce
orphan-7c105b6297c858f5f023cdc469cba4183dec0d3c52d4927e609d68f230aaa57a   harvester001   pvc-cad1bc49-c1dc-44e7-8e03-e8587697d011-bb520089
orphan-7c705686fdb1f0c8b6374e7d4759709cc9d3698aff2d541789f1fa38410a2d80   harvester001   pvc-ab70b8de-d6d9-4e4d-baee-48b967dfbbba-ea742212
orphan-7f7a3e383e3a3b42ea79df11cc035603960608fe7276fe3f0400fa378aa26fef   harvester002   pvc-3b137621-8434-456b-8f08-410ade838d8b-ef1b7a55
orphan-816a5dfe6b796d2cbeb607c4e3af9e1e8d650eaef65ceb772b1737c42398c405   harvester002   pvc-ab70b8de-d6d9-4e4d-baee-48b967dfbbba-95aa9131
orphan-81e0a71f781dbd66c0643c0a6ce8f3e0fe6982b4ed7a9201121f3ed69be089d7   harvester001   pvc-3b46b003-57e6-4338-a750-1f5308635672-827f6168
orphan-830b1c5a34f77676343953159febc584f339d66138f1f8c1c1eeb615931fe9da   harvester001   pvc-8bbbf49d-a556-4ca4-a10e-c3492a1856bb-01fd0001
orphan-8de4af35f4d6c7128fc12a1c9bc47215cca161a4850bbf6ed8486fe88db66a58   harvester001   pvc-1a758e3f-8718-4e95-9a85-38bfacc99476-2e70c401
orphan-8f8f3ba518876ced70a93f383901aeda9075bbe8f57bc2e1ec2472ca75a26fa1   harvester001   pvc-8bbbf49d-a556-4ca4-a10e-c3492a1856bb-33066224
orphan-93b7fb07b3c254f5358320072039b602b4ef5a1445673baa245c68767b68aad5   harvester001   pvc-0ca5a4f3-d641-4b31-b33d-96b925d9af04-c879d9b6
orphan-97c06af7df5966c1ae6f85591a5d88097361e6b2dd95b9891d0ee112c8b5683f   harvester001   pvc-cad1bc49-c1dc-44e7-8e03-e8587697d011-d364be3e
orphan-98070018f5a8be03c8646585d18978c5ca46d0dc2440080424e63fa835217c4b   harvester001   pvc-c4dfa684-2e3a-496f-9396-0e137a8f85e7-746a3758
orphan-9a2c73da5f459d286861b29ec3ea374512c019528253e354b4c3e3177b5ab859   harvester001   pvc-3b46b003-57e6-4338-a750-1f5308635672-ada59f74
orphan-9a7c69e8ab1b4195992afcdc8c35b5d651816bcc74278d7f2c84135b4efb7f26   harvester001   pvc-ab70b8de-d6d9-4e4d-baee-48b967dfbbba-f3a02449
orphan-9bba5b69b168818771fe5e0959f6b8c10f4df6faab9cf39f2ae8c06347d2c42f   harvester001   pvc-c4dfa684-2e3a-496f-9396-0e137a8f85e7-7b4a8eb0
orphan-a5f3a9fdf1e1ba5fe4b81d57d5baa82118ae57c93ee19ee821bc969d0f41f0ef   harvester001   pvc-ab70b8de-d6d9-4e4d-baee-48b967dfbbba-27b76632
orphan-ac8bbeae78f52fbb9aa3e981cd067b4b9b1e731454d2a98e57fb04c7177f0d51   harvester001   pvc-0ca5a4f3-d641-4b31-b33d-96b925d9af04-06ea7081
orphan-b115798b3f388b84c1e1042570ae539bab261f5c3952f86030bb23086c85c88b   harvester001   pvc-0ca5a4f3-d641-4b31-b33d-96b925d9af04-fb01aaaa
orphan-b1ec301556793b04bca78f1d0a2040ee77e5aa659ce2a394bc38dee437e558ea   harvester001   pvc-c4dfa684-2e3a-496f-9396-0e137a8f85e7-2b9bf9df
orphan-b61a90e669abb9eaafede28274996574d36d0008220d7cad79d9c745ed5c069a   harvester001   pvc-0ca5a4f3-d641-4b31-b33d-96b925d9af04-fc617acc
orphan-bc5c41766365668b58078ffc53ba0bb0e96a9ee2a64da7bde41b939ffba6e0a4   harvester001   pvc-2909b66d-9e7e-4b12-98f8-c94ccfc08357-781c4429
orphan-bd8496cd312441b7935e371e6591daa32d5e90fcc408b2b66995065fc9322829   harvester001   pvc-ab70b8de-d6d9-4e4d-baee-48b967dfbbba-3d80a64c
orphan-bd8ae569df54976f5acb33364fab3528c8aa233ad760c5277544b4949f841685   harvester002   mdsh-copy_2022-12-02_pvc-3b137621-8434-456b-8f08-410ade838d8b-0ade9458
orphan-c9a59540a23df4d096123a83ba1eef8608aa63fdc98474ce38a47f1f9e8b497a   harvester001   pvc-59136de4-f641-44e8-a372-1fa9bb6f7f7c-c2e3a3ec
orphan-ca607c475d277444d405166a19458a5edc2e388c5fed1272df9b797780cb8761   harvester001   pvc-8bbbf49d-a556-4ca4-a10e-c3492a1856bb-bc03f06a
orphan-d2cdf54ce07abe6f670c7313e608997e9a603d5a594d5c71e5731409fc0d174c   harvester001   pvc-59136de4-f641-44e8-a372-1fa9bb6f7f7c-0bc3ee4e
orphan-d45d68e5f74a72f5834a869eb63583a02eca4d01849e261aebf3f1cfa31eb097   harvester001   pvc-8bbbf49d-a556-4ca4-a10e-c3492a1856bb-6f65707c
orphan-d9f7effab64d928d6b8e0a8019d371cc13c2e1b868971ad0fe67d7a0fc92d464   harvester002   pvc-cad1bc49-c1dc-44e7-8e03-e8587697d011-a121628a
orphan-e4e7e5761b9c6a80535517e03b8789c2dce6becbefab36763b7380891d4fb2f2   harvester001   pvc-c4dfa684-2e3a-496f-9396-0e137a8f85e7-50e8bbc4
orphan-e73f9438a0b9087c81a668306ae233209635f94e9571e5e544018bd34faad0aa   harvester001   pvc-59136de4-f641-44e8-a372-1fa9bb6f7f7c-2d3e2959
orphan-f295f4810dd74f08e9e5382f9bf1996fbc1d64b0eba2408ef94a292afbc62286   harvester001   pvc-3b46b003-57e6-4338-a750-1f5308635672-d3e710aa
orphan-fd057e02991c63d295e41ff8aa3219d0b7755894e85304d2ac7df00bd41c3f66   harvester001   pvc-3b46b003-57e6-4338-a750-1f5308635672-a3735417
orphan-fd240823c245d84d7dc0d0feb2929744940721e81ccb27ce0b6e7e8187a88e8d   harvester001   pvc-59136de4-f641-44e8-a372-1fa9bb6f7f7c-cb569d9a
The copy I made as a backup before thinking about following your procedure is:
mdsh-copy_2022-12-02_pvc-3b137621-8434-456b-8f08-410ade838d8b-0ade9458
I believe I made all of those alterations and the volume is still in the attach-detach loop.
# cat volume.meta 
{"Size":64424509440,"Head":"volume-head-006.img","Dirty":true,"Rebuilding":false,"Error":"","Parent":"volume-snap-b5d617df-075c-4850-a495-7818238ae609.img","SectorSize":512,"BackingFilePath":""}

# cat volume-head-006.img.meta 
{"Name":"volume-head-006.img","Parent":"volume-snap-b5d617df-075c-4850-a495-7818238ae609.img","Removed":false,"UserCreated":false,"Created":"2022-11-28T05:12:48Z","Labels":null}

# cat volume-snap-b5d617df-075c-4850-a495-7818238ae609.img.meta
{"Name":"volume-snap-b5d617df-075c-4850-a495-7818238ae609.img","Parent":"","Removed":false,"UserCreated":false,"Created":"2022-11-28T05:12:48Z","Labels":null}
I have tried to start the VM in Harvester which uses this volume, and the VM just hangs in the "starting" phase. All three nodes in this cluster have been restarted, before I made the changes above, because of the Harvester update to v1.1.1 - which left Longhorn at v1.3.2.
Curiously, overnight the attach-detach loop has stopped - although the contents of the files have not changed.
# ls -la
total 487332
drwx------  2 root root        4096 Dec  4 18:18 .
drwxr-xr-x 35 root root        4096 Dec  4 21:48 ..
-rw-------  1 root root        4096 Nov 28 05:12 revision.counter
-rw-r--r--  1 root root 64424509440 Nov 28 05:12 volume-head-006.img
-rw-r--r--  1 root root         178 Dec  4 18:01 volume-head-006.img.meta
-rw-r--r--  1 root root 64424509440 Nov 28 05:12 volume-snap-0b1e6133-4c31-4b2d-a15d-3742e757fa76.img
-rw-r--r--  1 root root 64424509440 Nov 28 05:12 volume-snap-b5d617df-075c-4850-a495-7818238ae609.img
-rw-r--r--  1 root root         159 Dec  4 18:01 volume-snap-b5d617df-075c-4850-a495-7818238ae609.img.meta
-rw-r--r--  1 root root         195 Dec  4 18:18 volume.meta

# cat volume.meta 
{"Size":64424509440,"Head":"volume-head-006.img","Dirty":true,"Rebuilding":false,"Error":"","Parent":"volume-snap-b5d617df-075c-4850-a495-7818238ae609.img","SectorSize":512,"BackingFilePath":""}

# cat volume-head-006.img.meta 
{"Name":"volume-head-006.img","Parent":"volume-snap-b5d617df-075c-4850-a495-7818238ae609.img","Removed":false,"UserCreated":false,"Created":"2022-11-28T05:12:48Z","Labels":null}

# cat volume-snap-b5d617df-075c-4850-a495-7818238ae609.img.meta
{"Name":"volume-snap-b5d617df-075c-4850-a495-7818238ae609.img","Parent":"","Removed":false,"UserCreated":false,"Created":"2022-11-28T05:12:48Z","Labels":null}
There is still only one replica.
I tried starting the VM in Harvester and the volume went into the attach-detach loop until I stopped the VM. These are the first few seconds of logs from the instance manager - Slack won't allow me to paste more...
[longhorn-instance-manager] time="2022-12-05T08:40:18Z" level=debug msg="Process Manager: validate process path: /engine-binaries/longhornio-longhorn-engine-v1.3.2/longhorn dir: /engine-binaries/ image: longhornio-longhorn-engine-v1.3.2 binary: longhorn"
[longhorn-instance-manager] time="2022-12-05T08:40:18Z" level=info msg="Process Manager: created process pvc-3b137621-8434-456b-8f08-410ade838d8b-e-68ae6e4c"
[pvc-3b137621-8434-456b-8f08-410ade838d8b-e-68ae6e4c] time="2022-12-05T08:40:18Z" level=info msg="Starting with replicas [\"<tcp://10.52.3.102:10345>\"]"
time="2022-12-05T08:40:18Z" level=info msg="Connecting to remote: 10.52.3.102:10345"
[pvc-3b137621-8434-456b-8f08-410ade838d8b-e-68ae6e4c] time="2022-12-05T08:40:18Z" level=info msg="Opening: 10.52.3.102:10345"
[pvc-3b137621-8434-456b-8f08-410ade838d8b-e-68ae6e4c] time="2022-12-05T08:40:18Z" level=warning msg="backend <tcp://10.52.3.102:10345> size does not match 10737418240 != 64424509440 in the engine initiation phase"
time="2022-12-05T08:40:18Z" level=info msg="Adding backend: <tcp://10.52.3.102:10345>"
[pvc-3b137621-8434-456b-8f08-410ade838d8b-e-68ae6e4c] 2022/12/05 08:40:18 cannot create an available backend for the engine from the addresses [<tcp://10.52.3.102:10345>]
[longhorn-instance-manager] time="2022-12-05T08:40:18Z" level=info msg="Process Manager: process pvc-3b137621-8434-456b-8f08-410ade838d8b-e-68ae6e4c error out, error msg: exit status 1"
[longhorn-instance-manager] time="2022-12-05T08:40:18Z" level=debug msg="Process update: pvc-3b137621-8434-456b-8f08-410ade838d8b-e-68ae6e4c: state error: Error: exit status 1"
[longhorn-instance-manager] time="2022-12-05T08:40:18Z" level=info msg="stop waiting for gRPC service of process pvc-3b137621-8434-456b-8f08-410ade838d8b-e-68ae6e4c to start at localhost:10003"
[longhorn-instance-manager] time="2022-12-05T08:40:18Z" level=info msg="Process Manager: prepare to create process pvc-cad1bc49-c1dc-44e7-8e03-e8587697d011-e-5fbaea74"
[longhorn-instance-manager] time="2022-12-05T08:40:18Z" level=debug msg="Process Manager: validate process path: /engine-binaries/longhornio-longhorn-engine-v1.3.2/longhorn dir: /engine-binaries/ image: longhornio-longhorn-engine-v1.3.2 binary: longhorn"
[longhorn-instance-manager] time="2022-12-05T08:40:18Z" level=info msg="Process Manager: created process pvc-cad1bc49-c1dc-44e7-8e03-e8587697d011-e-5fbaea74"
[pvc-cad1bc49-c1dc-44e7-8e03-e8587697d011-e-5fbaea74] time="2022-12-05T08:40:18Z" level=info msg="Starting with replicas [\"<tcp://10.52.3.102:10075>\" \"<tcp://10.52.0.205:10135>\" \"<tcp://10.52.2.13:10120>\"]"
time="2022-12-05T08:40:18Z" level=info msg="Connecting to remote: 10.52.3.102:10075"
[pvc-cad1bc49-c1dc-44e7-8e03-e8587697d011-e-5fbaea74] time="2022-12-05T08:40:18Z" level=info msg="Opening: 10.52.3.102:10075"
[pvc-cad1bc49-c1dc-44e7-8e03-e8587697d011-e-5fbaea74] time="2022-12-05T08:40:18Z" level=info msg="Connecting to remote: 10.52.0.205:10135"
[pvc-cad1bc49-c1dc-44e7-8e03-e8587697d011-e-5fbaea74] time="2022-12-05T08:40:18Z" level=info msg="Opening: 10.52.0.205:10135"
[pvc-cad1bc49-c1dc-44e7-8e03-e8587697d011-e-5fbaea74] time="2022-12-05T08:40:18Z" level=info msg="Connecting to remote: 10.52.2.13:10120"
[pvc-cad1bc49-c1dc-44e7-8e03-e8587697d011-e-5fbaea74] time="2022-12-05T08:40:18Z" level=info msg="Opening: 10.52.2.13:10120"
[pvc-cad1bc49-c1dc-44e7-8e03-e8587697d011-e-5fbaea74] time="2022-12-05T08:40:18Z" level=info msg="Adding backend: <tcp://10.52.3.102:10075>"
[pvc-cad1bc49-c1dc-44e7-8e03-e8587697d011-e-5fbaea74] time="2022-12-05T08:40:18Z" level=info msg="Adding backend: <tcp://10.52.0.205:10135>"
time="2022-12-05T08:40:18Z" level=info msg="Adding backend: <tcp://10.52.2.13:10120>"
time="2022-12-05T08:40:18Z" level=info msg="Start monitoring <tcp://10.52.3.102:10075>"
time="2022-12-05T08:40:18Z" level=info msg="Start monitoring <tcp://10.52.0.205:10135>"
time="2022-12-05T08:40:18Z" level=info msg="Start monitoring <tcp://10.52.2.13:10120>"
[pvc-cad1bc49-c1dc-44e7-8e03-e8587697d011-e-5fbaea74] time="2022-12-05T08:40:18Z" level=info msg="Get backend <tcp://10.52.3.102:10075> revision counter 3598841"
[pvc-cad1bc49-c1dc-44e7-8e03-e8587697d011-e-5fbaea74] time="2022-12-05T08:40:18Z" level=info msg="Get backend <tcp://10.52.0.205:10135> revision counter 3598841"
[pvc-cad1bc49-c1dc-44e7-8e03-e8587697d011-e-5fbaea74] time="2022-12-05T08:40:18Z" level=info msg="Get backend <tcp://10.52.2.13:10120> revision counter 3598841"
[pvc-cad1bc49-c1dc-44e7-8e03-e8587697d011-e-5fbaea74] time="2022-12-05T08:40:18Z" level=info msg="device pvc-cad1bc49-c1dc-44e7-8e03-e8587697d011: SCSI device /dev/longhorn/pvc-cad1bc49-c1dc-44e7-8e03-e8587697d011 shutdown"
[longhorn-instance-manager] time="2022-12-05T08:40:18Z" level=debug msg="Process update: pvc-3b137621-8434-456b-8f08-410ade838d8b-e-68ae6e4c: state error: Error: exit status 1"
[longhorn-instance-manager] time="2022-12-05T08:40:18Z" level=debug msg="Process update: pvc-3b137621-8434-456b-8f08-410ade838d8b-e-68ae6e4c: state error: Error: exit status 1"
[longhorn-instance-manager] time="2022-12-05T08:40:18Z" level=debug msg="Process Manager: start getting logs for process pvc-3b137621-8434-456b-8f08-410ade838d8b-e-68ae6e4c"
[longhorn-instance-manager] time="2022-12-05T08:40:18Z" level=debug msg="Listing replicas" serviceURL="10.52.3.101:10002"
[longhorn-instance-manager] time="2022-12-05T08:40:18Z" level=debug msg="Listing snapshots" serviceURL="10.52.3.101:10002"
[longhorn-instance-manager] time="2022-12-05T08:40:19Z" level=debug msg="Getting volume" serviceURL="10.52.3.101:10002"
[longhorn-instance-manager] time="2022-12-05T08:40:19Z" level=debug msg="Getting replica rebuilding status" serviceURL="10.52.3.101:10002"
[longhorn-instance-manager] time="2022-12-05T08:40:19Z" level=debug msg="Get snapshot purge status" serviceURL="10.52.3.101:10002"
[longhorn-instance-manager] time="2022-12-05T08:40:19Z" level=debug msg="Getting backup restore status" serviceURL="10.52.3.101:10002"
[longhorn-instance-manager] time="2022-12-05T08:40:19Z" level=debug msg="Getting snapshot clone status" serviceURL="10.52.3.101:10002"
[longhorn-instance-manager] time="2022-12-05T08:40:19Z" level=info msg="Process Manager: prepare to create process pvc-d4829d1a-07c8-488e-9eb4-98f17e5a639e-e-d2b24f89"
[longhorn-instance-manager] time="2022-12-05T08:40:19Z" level=debug msg="Process Manager: validate process path: /engine-binaries/longhornio-longhorn-engine-v1.3.2/longhorn dir: /engine-binaries/ image: longhornio-longhorn-engine-v1.3.2 binary: longhorn"
[longhorn-instance-manager] time="2022-12-05T08:40:19Z" level=info msg="Process Manager: created process pvc-d4829d1a-07c8-488e-9eb4-98f17e5a639e-e-d2b24f89"
[pvc-d4829d1a-07c8-488e-9eb4-98f17e5a639e-e-d2b24f89] time="2022-12-05T08:40:19Z" level=info msg="Starting with replicas [\"<tcp://10.52.0.205:10120>\" \"<tcp://10.52.3.102:10360>\" \"<tcp://10.52.2.13:10105>\"]"
[pvc-d4829d1a-07c8-488e-9eb4-98f17e5a639e-e-d2b24f89] time="2022-12-05T08:40:19Z" level=info msg="Connecting to remote: 10.52.0.205:10120"
[pvc-d4829d1a-07c8-488e-9eb4-98f17e5a639e-e-d2b24f89] time="2022-12-05T08:40:19Z" level=info msg="Opening: 10.52.0.205:10120"
[pvc-d4829d1a-07c8-488e-9eb4-98f17e5a639e-e-d2b24f89] time="2022-12-05T08:40:19Z" level=info msg="Connecting to remote: 10.52.3.102:10360"
[pvc-d4829d1a-07c8-488e-9eb4-98f17e5a639e-e-d2b24f89] time="2022-12-05T08:40:19Z" level=info msg="Opening: 10.52.3.102:10360"
[longhorn-instance-manager] time="2022-12-05T08:40:19Z" level=debug msg="Process Manager: got logs for process pvc-3b137621-8434-456b-8f08-410ade838d8b-e-68ae6e4c"
[longhorn-instance-manager] time="2022-12-05T08:40:19Z" level=debug msg="Process Manager: start getting logs for process pvc-3b137621-8434-456b-8f08-410ade838d8b-e-68ae6e4c"
[pvc-d4829d1a-07c8-488e-9eb4-98f17e5a639e-e-d2b24f89] time="2022-12-05T08:40:19Z" level=info msg="Connecting to remote: 10.52.2.13:10105"
[pvc-d4829d1a-07c8-488e-9eb4-98f17e5a639e-e-d2b24f89] time="2022-12-05T08:40:19Z" level=info msg="Opening: 10.52.2.13:10105"
[pvc-d4829d1a-07c8-488e-9eb4-98f17e5a639e-e-d2b24f89] time="2022-12-05T08:40:19Z" level=info msg="Adding backend: <tcp://10.52.0.205:10120>"
time="2022-12-05T08:40:19Z" level=info msg="Adding backend: <tcp://10.52.3.102:10360>"
time="2022-12-05T08:40:19Z" level=info msg="Adding backend: <tcp://10.52.2.13:10105>"
time="2022-12-05T08:40:19Z" level=info msg="Start monitoring <tcp://10.52.2.13:10105>"
time="2022-12-05T08:40:19Z" level=info msg="Start monitoring <tcp://10.52.0.205:10120>"
time="2022-12-05T08:40:19Z" level=info msg="Start monitoring <tcp://10.52.3.102:10360>"
[pvc-d4829d1a-07c8-488e-9eb4-98f17e5a639e-e-d2b24f89] time="2022-12-05T08:40:19Z" level=info msg="Get backend <tcp://10.52.0.205:10120> revision counter 0"
[pvc-d4829d1a-07c8-488e-9eb4-98f17e5a639e-e-d2b24f89] time="2022-12-05T08:40:19Z" level=info msg="Get backend <tcp://10.52.3.102:10360> revision counter 0"
[pvc-d4829d1a-07c8-488e-9eb4-98f17e5a639e-e-d2b24f89] time="2022-12-05T08:40:19Z" level=info msg="Get backend <tcp://10.52.2.13:10105> revision counter 0"
[pvc-d4829d1a-07c8-488e-9eb4-98f17e5a639e-e-d2b24f89] time="2022-12-05T08:40:19Z" level=info msg="device pvc-d4829d1a-07c8-488e-9eb4-98f17e5a639e: SCSI device /dev/longhorn/pvc-d4829d1a-07c8-488e-9eb4-98f17e5a639e shutdown"
[longhorn-instance-manager] time="2022-12-05T08:40:19Z" level=info msg="wait for gRPC service of process pvc-cad1bc49-c1dc-44e7-8e03-e8587697d011-e-5fbaea74 to start at localhost:10004"
[pvc-cad1bc49-c1dc-44e7-8e03-e8587697d011-e-5fbaea74] go-iscsi-helper: tgtd is already running
[pvc-cad1bc49-c1dc-44e7-8e03-e8587697d011-e-5fbaea74] time="2022-12-05T08:40:19Z" level=info msg="go-iscsi-helper: found available target id 4"
[pvc-cad1bc49-c1dc-44e7-8e03-e8587697d011-e-5fbaea74] time="2022-12-05T08:40:19Z" level=info msg="New data socket connection established"
tgtd: device_mgmt(246) sz:109 params:path=/var/run/longhorn-pvc-cad1bc49-c1dc-44e7-8e03-e8587697d011.sock,bstype=longhorn,bsopts=size=21474836480
tgtd: bs_thread_open(409) 16
[pvc-cad1bc49-c1dc-44e7-8e03-e8587697d011-e-5fbaea74] time="2022-12-05T08:40:19Z" level=info msg="default: automatically rescan all LUNs of all iscsi sessions"
[longhorn-instance-manager] time="2022-12-05T08:40:19Z" level=debug msg="Process Manager: got logs for process pvc-3b137621-8434-456b-8f08-410ade838d8b-e-68ae6e4c"
[longhorn-instance-manager] time="2022-12-05T08:40:19Z" level=debug msg="Process update: pvc-3b137621-8434-456b-8f08-410ade838d8b-e-68ae6e4c: state error: Error: exit status 1"
[longhorn-instance-manager] time="2022-12-05T08:40:19Z" level=debug msg="Process Manager: start getting logs for process pvc-3b137621-8434-456b-8f08-410ade838d8b-e-68ae6e4c"
[longhorn-instance-manager] time="2022-12-05T08:40:20Z" level=debug msg="Process Manager: got logs for process pvc-3b137621-8434-456b-8f08-410ade838d8b-e-68ae6e4c"
[longhorn-instance-manager] time="2022-12-05T08:40:20Z" level=debug msg="Process Manager: prepare to delete process pvc-3b137621-8434-456b-8f08-410ade838d8b-e-68ae6e4c"
[longhorn-instance-manager] time="2022-12-05T08:40:20Z" level=debug msg="Process update: pvc-3b137621-8434-456b-8f08-410ade838d8b-e-68ae6e4c: state error: Error: exit status 1"
[longhorn-instance-manager] time="2022-12-05T08:40:20Z" level=debug msg="Process Manager: deleted process pvc-3b137621-8434-456b-8f08-410ade838d8b-e-68ae6e4c"
[longhorn-instance-manager] time="2022-12-05T08:40:20Z" level=info msg="Process Manager: successfully unregistered process pvc-3b137621-8434-456b-8f08-410ade838d8b-e-68ae6e4c"
[longhorn-instance-manager] time="2022-12-05T08:40:20Z" level=debug msg="Process update: pvc-3b137621-8434-456b-8f08-410ade838d8b-e-68ae6e4c: state error: Error: exit status 1"

icy-agency-38675

12/05/2022, 8:58 AM
Sorry for the late reply. The same problematic volume?

sticky-summer-13450

12/05/2022, 9:01 AM
Yes.

icy-agency-38675

12/05/2022, 9:02 AM
You followed the steps I provided, correct? Then, can you provide me with the support bundle? I will check it later. Thank you.

sticky-summer-13450

12/05/2022, 9:05 AM
Yes - I followed the steps you provided. You should see the results 5 messages up.

icy-agency-38675

12/05/2022, 9:08 AM
Yes, I've checked the messages. However, the log messages did not provide enough information about the error. So, I'd like to check the longhorn-manager and other instance-manager logs.

icy-agency-38675

12/05/2022, 9:13 AM
Thank you. Will check it. The worst case might be that the volume's data is corrupted. Anyway, I will check it. Many thanks.

sticky-summer-13450

12/05/2022, 9:14 AM
Thanks 🙂
Here are the manager logs from the time I started the workload (started the VM).
time="2022-12-05T08:40:18Z" level=debug msg="Instance handler updated instance pvc-cad1bc49-c1dc-44e7-8e03-e8587697d011-r-0d53709e state, old state stopped, new state running"
time="2022-12-05T08:40:18Z" level=debug msg="Instance process pvc-d4829d1a-07c8-488e-9eb4-98f17e5a639e-r-6b52dc9c had been created, need to wait for instance manager update"
time="2022-12-05T08:40:18Z" level=debug msg="Instance handler updated instance pvc-d4829d1a-07c8-488e-9eb4-98f17e5a639e-r-6b52dc9c state, old state stopped, new state starting"
time="2022-12-05T08:40:19Z" level=debug msg="Instance pvc-d4829d1a-07c8-488e-9eb4-98f17e5a639e-r-6b52dc9c starts running, Storage IP 10.52.2.13"
time="2022-12-05T08:40:19Z" level=debug msg="Instance pvc-d4829d1a-07c8-488e-9eb4-98f17e5a639e-r-6b52dc9c starts running, IP 10.52.2.13"
time="2022-12-05T08:40:19Z" level=debug msg="Instance pvc-d4829d1a-07c8-488e-9eb4-98f17e5a639e-r-6b52dc9c starts running, Port 10105"
time="2022-12-05T08:40:19Z" level=debug msg="Instance handler updated instance pvc-d4829d1a-07c8-488e-9eb4-98f17e5a639e-r-6b52dc9c state, old state starting, new state running"
time="2022-12-05T08:40:22Z" level=error msg="invalid customized default setting taint-toleration with value <http://kubevirt.io/drain:NoSchedule|kubevirt.io/drain:NoSchedule>, will continue applying other customized settings" error="failed to set the setting taint-toleration with invalid value <http://kubevirt.io/drain:NoSchedule|kubevirt.io/drain:NoSchedule>: cannot modify toleration setting before all volumes are detached"
10.52.3.105 - - [05/Dec/2022:08:40:27 +0000] "GET /v1/volumes/pvc-cad1bc49-c1dc-44e7-8e03-e8587697d011 HTTP/1.1" 200 7791 "" "Go-http-client/1.1"
10.52.3.105 - - [05/Dec/2022:08:40:28 +0000] "GET /v1/volumes/pvc-cad1bc49-c1dc-44e7-8e03-e8587697d011 HTTP/1.1" 200 7791 "" "Go-http-client/1.1"
10.52.3.105 - - [05/Dec/2022:08:40:30 +0000] "GET /v1/volumes/pvc-d4829d1a-07c8-488e-9eb4-98f17e5a639e HTTP/1.1" 200 7795 "" "Go-http-client/1.1"
10.52.3.105 - - [05/Dec/2022:08:40:30 +0000] "GET /v1/volumes/pvc-d4829d1a-07c8-488e-9eb4-98f17e5a639e HTTP/1.1" 200 7795 "" "Go-http-client/1.1"
W1205 08:40:33.130755       1 warnings.go:70] batch/v1beta1 CronJob is deprecated in v1.21+, unavailable in v1.25+; use batch/v1 CronJob
time="2022-12-05T08:40:52Z" level=error msg="invalid customized default setting taint-toleration with value <http://kubevirt.io/drain:NoSchedule|kubevirt.io/drain:NoSchedule>, will continue applying other customized settings" error="failed to set the setting taint-toleration with invalid value <http://kubevirt.io/drain:NoSchedule|kubevirt.io/drain:NoSchedule>: cannot modify toleration setting before all volumes are detached"
10.52.3.105 - - [05/Dec/2022:08:41:06 +0000] "GET /v1/volumes/pvc-687e2039-89ad-4235-b21f-708ca5dd8f21 HTTP/1.1" 200 7451 "" "Go-http-client/1.1"
time="2022-12-05T08:41:15Z" level=warning msg="error get size" collector=backup error="strconv.ParseFloat: parsing \"\": invalid syntax" node=harvester001
10.52.0.216 - - [05/Dec/2022:08:41:15 +0000] "GET /metrics HTTP/1.1" 200 23396 "" "Prometheus/2.28.1"
10.52.3.105 - - [05/Dec/2022:08:41:15 +0000] "GET /v1/volumes/pvc-ab70b8de-d6d9-4e4d-baee-48b967dfbbba HTTP/1.1" 200 8256 "" "Go-http-client/1.1"
10.52.3.105 - - [05/Dec/2022:08:41:15 +0000] "GET /v1/volumes/pvc-3b46b003-57e6-4338-a750-1f5308635672 HTTP/1.1" 200 7923 "" "Go-http-client/1.1"
time="2022-12-05T08:41:22Z" level=error msg="invalid customized default setting taint-toleration with value <http://kubevirt.io/drain:NoSchedule|kubevirt.io/drain:NoSchedule>, will continue applying other customized settings" error="failed to set the setting taint-toleration with invalid value <http://kubevirt.io/drain:NoSchedule|kubevirt.io/drain:NoSchedule>: cannot modify toleration setting before all volumes are detached"
10.52.3.105 - - [05/Dec/2022:08:41:34 +0000] "GET /v1/volumes/pvc-d4829d1a-07c8-488e-9eb4-98f17e5a639e HTTP/1.1" 200 7795 "" "Go-http-client/1.1"
10.52.3.105 - - [05/Dec/2022:08:41:34 +0000] "GET /v1/volumes/pvc-cad1bc49-c1dc-44e7-8e03-e8587697d011 HTTP/1.1" 200 7791 "" "Go-http-client/1.1"
time="2022-12-05T08:41:52Z" level=error msg="invalid customized default setting taint-toleration with value <http://kubevirt.io/drain:NoSchedule|kubevirt.io/drain:NoSchedule>, will continue applying other customized settings" error="failed to set the setting taint-toleration with invalid value <http://kubevirt.io/drain:NoSchedule|kubevirt.io/drain:NoSchedule>: cannot modify toleration setting before all volumes are detached"
time="2022-12-05T08:42:15Z" level=warning msg="error get size" collector=backup error="strconv.ParseFloat: parsing \"\": invalid syntax" node=harvester001
10.52.0.216 - - [05/Dec/2022:08:42:15 +0000] "GET /metrics HTTP/1.1" 200 23413 "" "Prometheus/2.28.1"

icy-agency-38675

12/05/2022, 9:26 AM
Looks like there are no errors here. The errors in the log messages are false alarms (improved in the upcoming v1.4.0).

sticky-summer-13450

12/05/2022, 9:26 AM
🙂

icy-agency-38675

12/06/2022, 9:12 AM
From the latest support bundle, the volume was attached and then detached repeatedly. I'm not sure if the volume is corrupted and the application (I mean Harvester) found the error and detached it. You may need to rebuild the node.
For the root cause of the messy disk metadata files, I will continue the investigation.

sticky-summer-13450

12/06/2022, 3:14 PM
Thank you. I only kept the volume (and its VM) around in case you needed to study it further.
🙏 1

icy-agency-38675

12/07/2022, 3:11 AM
Thank you.