# longhorn-storage
l
Something is consuming disk storage space
Could be snapshots
Can’t see what disk it is … as you haven’t expanded any of the unschedulable nodes
Something is requesting PVCs of a size that makes the allocated space look like that
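If you want to see what's actually requesting that space, listing every PVC with its requested size should narrow it down. Something like this (plain kubectl, nothing Longhorn-specific):

```bash
# List every PVC in the cluster with its requested size and bound volume
kubectl get pvc --all-namespaces \
  -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,REQUESTED:.spec.resources.requests.storage,STATUS:.status.phase,VOLUME:.spec.volumeName'
```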
i
I got a reply from my colleagues. Here is what they said:
Yeah, something quite wrong is going on with our Longhorn. I reset our cluster completely, which included restarting Longhorn, and that got us into a clean state before any PVCs were even being created (first screenshot).
Then, as soon as the first PVC started getting created, I saw weird behaviour: this PVC is only requesting 10 Gi of space, but it wasn't getting attached straight away and stayed in Detached for some reason (second and third screenshots). That seems odd, because with only one PVC I would have thought it would get bound, attached, and healthy right away, since at that point Longhorn was managing nothing else.
Then, checking the nodes page again, I can see that each node is already allocating double what the PVC is requesting, and Longhorn is already trying to rebalance the replicas (screenshots 4, 5, and 6).
From what I can tell we have no instruction telling Longhorn to create any snapshots, so I don't think snapshots should be taking up any of the allocated space. But one thing I noticed today while looking at this is that all of the disks seem to have the same name (screenshot 7). I'm now suspicious that this is confusing Longhorn about which disk is which, though I'm not sure where that name is even set from...
Sorry for screenshot spam, it just makes it much easier to explain what I'm seeing.
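For what it's worth, the snapshot theory is easy to double-check, since Longhorn keeps its state in CRDs. Something like this should show what it thinks it's managing (assuming the default longhorn-system namespace; exact CRD names can vary between Longhorn versions):

```bash
# Volumes Longhorn knows about, with their spec sizes and state
kubectl -n longhorn-system get volumes.longhorn.io
# Where each replica landed; with the default replica count of 3,
# a single 10 Gi volume allocates 10 Gi on each of three nodes
kubectl -n longhorn-system get replicas.longhorn.io
# Any snapshots that could be consuming allocated space
kubectl -n longhorn-system get snapshots.longhorn.io
```

The default replica count of 3 would explain allocation showing up on every node for a single PVC, though not the doubling.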
I have another follow-up. Our company had a power outage a while ago, and it caused problems with the cluster disks. We ended up just restoring the disks from a working copy. When we run `lvdisplay`, the LVs on nodes 1 and 3 say they were created by node 2, and all of the nodes report the same disk name and LV UUID. I don't really know what kind of issues this could cause, but all 3 nodes have an LV with the same UUID and creation date. Do you think it's possible that's tripping up Longhorn and causing issues, if all of the LVs have the same UUID?
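For anyone wanting to check for the same thing, the LV UUID and creation host are quick to compare across nodes. Roughly this (hostnames are placeholders; the -o field names are from the lvs man page):

```bash
# Identical LV UUIDs across nodes confirm the disks are clones
for node in node1 node2 node3; do
  echo "== $node =="
  ssh "$node" sudo lvs -o lv_name,lv_uuid,lv_host,lv_time
done
```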
Looks like the duplicate UUIDs were what was causing the problem! At least for now, everything is working.
l
Interesting! And good find.
What did you do to update the UUIDs of the LVs?
i
I believe we ended up just deleting them and re-creating them. They were the same because they were copied from a VM template we took. The data we had there wasn't important so we were ok with losing it.
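For anyone who needs to do the same, it was roughly this shape; the VG/LV names, size, and mount point here are placeholders for whatever your template uses, and it obviously wipes the LV:

```bash
# DESTRUCTIVE: destroys the LV's data. Run on each affected node.
umount /var/lib/longhorn               # wherever the cloned LV was mounted
lvremove -y /dev/vg0/longhorn          # drop the cloned LV along with its UUID
lvcreate -y -L 100G -n longhorn vg0    # recreate it; LVM generates a fresh UUID
mkfs.ext4 /dev/vg0/longhorn
mount /dev/vg0/longhorn /var/lib/longhorn
```

If the data had mattered, vgimportclone is the LVM tool intended for cloned disks, though as far as I know it regenerates PV and VG UUIDs rather than LV UUIDs.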
l
Aaah okay - makes sense.
Don’t you have bootstrap automation for setting up the clusters, disk setup and so on?
i
We do, but they thought it would be faster to copy a template VM. We're using RKE2, and a full restore takes us maybe 1-2 hours. I think this team figured they could skip the cluster bootstrapping process if they restarted from a template.
l
Does bootstrapping take more than two hours per node? Wow!
i
Ah sorry, not per node. For the whole cluster. Per node it's maybe 15-30 minutes, just configuring RKE2 and waiting for it to start.
l
hmm ok