# longhorn-storage
c
could it be related to replica auto-balance being enabled?
p
Yes, that setting is enabled, but how could I reduce the replica rebuild time?
Supposedly I shouldn't need to rebuild the whole replica after a drain, as the data change is minimal overall (esp on huge volumes which contain only fat, unchanged data)
c
i believe you need to enable the fast replica rebuild option
to shorten the rebuild time
p
Already done
c
but i'm reading about it as we speak
p
It's on by default iirc
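(For what it's worth, rather than relying on "iirc", the option is exposed as a Longhorn Setting custom resource, so you can verify it directly. A sketch, assuming Longhorn is installed in the usual `longhorn-system` namespace:)

```shell
# Inspect the current value of the fast replica rebuild setting
kubectl -n longhorn-system get settings.longhorn.io fast-replica-rebuild-enabled

# Enable it explicitly if it isn't already on
kubectl -n longhorn-system patch settings.longhorn.io fast-replica-rebuild-enabled \
  --type merge -p '{"value":"true"}'
```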
c
A replica consists of a chain of snapshots, showing a history of the changes in the data within a volume.
p
So i just need to do some snapshots
c
i believe so
that's basically the definition of a replica
so if you're not taking any, the rebuild will take a long time because it has no starting point
the thing i'm wondering now is: if we run snapshot cleanups, meaning we delete snapshots,
and a disaster happens and we lose a replica, the rebuild will take ages
p
It's fine if the rebuild is long after a disaster
I just don't want a long rebuild after a simple node drain
c
i guess taking a snapshot is probably the best way, even though this gets me thinking: shouldn't the replica be left in place even if we move a pod from node 1 to node 2, let's say
meaning the replica should still be available on that node
p
When you drain a node, the node becomes unschedulable and all the replicasets on it get stopped
Thus all the replicas there go offline
c
ah well yes, you're draining the longhorn daemonsets too, sorry, yes
my bad
p
I'm draining the whole node
c
therefore it tries to rebuild the replica, i agree
p
If I want to perform maintenance on it, yknow
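(For context, a full node drain for maintenance is typically done with plain kubectl; the flags below are the usual ones, with the node name as a placeholder:)

```shell
# Cordon the node and evict everything on it; DaemonSet pods (like Longhorn's)
# can't be evicted, so they must be explicitly ignored
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data

# After maintenance, make the node schedulable again
kubectl uncordon <node-name>
```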
c
maybe though, a workaround could be to increase the replica timeout
before rebuild
there is a set time before the rebuild is triggered
p
I don't think this will work, as an offline replica that's out of sync becomes failed as soon as the volume changes
Which happens because I have a database on it that always changes, yknow
I should just separate dbs from storage lmao
c
understood, but when the node comes back up, shouldn't the replica rebuild be faster rather than running from scratch?
p
That's exactly the question I'm asking lol
It didn't
c
i thought your replica rebuild got triggered on a new node
p
Ah no, i only have 3 nodes in my cluster
c
got it, so i think the fast rebuild is failing due to missing checksums
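(If missing checksums are the culprit: the fast rebuild compares snapshot checksums, and if they were never computed it falls back to a full rebuild. Checksum calculation is controlled by the `snapshot-data-integrity` setting; a sketch, again assuming the `longhorn-system` namespace:)

```shell
# Check whether snapshot checksums are being computed at all
kubectl -n longhorn-system get settings.longhorn.io snapshot-data-integrity

# "fast-check" computes a checksum only for snapshots that don't have one yet,
# which is what the fast replica rebuild relies on
kubectl -n longhorn-system patch settings.longhorn.io snapshot-data-integrity \
  --type merge -p '{"value":"fast-check"}'
```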
p
Granted, your remark makes more sense than my current context and just made me understand why data locality may be annoying
c
usually though when i run databases, i always set replicas to 1 and strict-local
when i perform maintenance operations i let the database perform the failover
p
I don't see the point of using longhorn if you use strict-local though
I don't understand the point of this option
isn't it similar to a host bind mount?
c
because of all the other features like backup, restore etc
p
I don't use the longhorn backup system in my current env, as what's mission critical are the databases, and they back themselves up with dumps (which is safer and more efficient than a volume backup, too)
I could use the backup for my fat, not-db storage tho.
c
yes i agree with you, i also back up via native db backups, and i also like to have a backup of the backup
i use cloudnative postgres for example and it allows me to restore from a volumesnapshot as well, and longhorn lets me do this
"I should just separate dbs from storage lmao" what did you mean with this?
like i don't see how you can avoid evicting the longhorn daemonsets
unless they are into dedicated database nodes with let's say a label
i might not be of help
i suggest though opening a discussion on github, the devs are usually very active and helpful 🙂
p
By separating db from storage, I mean I'm using a single volume per service I host, and many of them have a database, meaning my storage service has 500gb of files AND a 300mb (ever-changing) db in the same volume
I just set up snapshots, I'll see tomorrow if the node rebuild is faster before annoying the devs
In any case, thanks for helping me out!! Hope we figure out that fs trim issue that's bugging you.
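(For reference, periodic snapshots can be expressed declaratively as a Longhorn RecurringJob. A minimal sketch, where the name, schedule, and retention are illustrative, not from the conversation:)

```shell
# Create a recurring snapshot job for volumes in the "default" group
kubectl apply -f - <<EOF
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: nightly-snapshot        # illustrative name
  namespace: longhorn-system
spec:
  cron: "0 2 * * *"             # every night at 02:00
  task: snapshot
  retain: 3                     # keep the last 3 snapshots
  concurrency: 1
  groups:
  - default                     # applies to volumes in the default group
EOF
```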
f
Sorry if this was already resolved. If not, what is the Longhorn version? Depending on this setting, Longhorn might behave differently: https://longhorn.io/docs/1.6.1/references/settings/#node-drain-policy Also, yes, there are some settings to speed up rebuilding by checking snapshot checksums: https://longhorn.io/docs/1.6.1/references/settings/#fast-replica-rebuild-enabled
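(The node drain policy from the first link is another Setting resource; checking and changing it might look like this. The value shown is one of the documented options, not a specific recommendation:)

```shell
# See which drain policy is currently active (assumes longhorn-system namespace)
kubectl -n longhorn-system get settings.longhorn.io node-drain-policy

# "allow-if-replica-is-stopped" lets the drain proceed when the last healthy
# replica on the node is already stopped, instead of blocking the drain
kubectl -n longhorn-system patch settings.longhorn.io node-drain-policy \
  --type merge -p '{"value":"allow-if-replica-is-stopped"}'
```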
p
1.6.1 mate yeh