# longhorn-storage
c
could it be related to replica auto-balance being enabled?
p
Yes, that setting is enabled, but how could I reduce the replica rebuild time?
Supposedly I shouldn't need to rebuild the whole replica after a drain, as the data change is minimal overall (esp on huge volumes which contain only fat, unchanged data)
c
i believe you need to enable the fast replica rebuild option
to shorten the rebuild time
p
Already done
c
but i'm reading about it as we speak
p
It's on by default iirc
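(For what it's worth, rather than relying on "iirc", the option is exposed as a Longhorn Setting custom resource, so you can verify it directly. A sketch, assuming Longhorn is installed in the usual `longhorn-system` namespace:)

```shell
# Inspect the current value of the fast replica rebuild setting
kubectl -n longhorn-system get settings.longhorn.io fast-replica-rebuild-enabled

# Enable it explicitly if it isn't already on
kubectl -n longhorn-system patch settings.longhorn.io fast-replica-rebuild-enabled \
  --type merge -p '{"value":"true"}'
```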
c
A replica consists of a chain of snapshots, showing a history of the changes in the data within a volume.
p
So i just need to do some snapshots
c
i believe so
that's basically the definition of a replica
so if you're not taking any, the rebuild will take a long time because it has no starting point
the thing i'm wondering now is: if we run snapshot cleanups, meaning we delete snapshots,
and a disaster happens and we lose a replica, the rebuild will take ages
p
It's fine if the rebuild is long after a disaster
I just don't want a long rebuild after a simple node drain
c
i guess taking a snapshot is probably the best way, even though this gets me thinking: shouldn't the replica be left in place even if we move a pod from node 1 to node 2, let's say
meaning the replica should still be available on that node
p
When you drain a node, the node becomes unschedulable and all the replicasets on it get stopped
Thus all the replicas there go offline
c
ah well yes, you're draining the longhorn daemonsets too, sorry, yes
my bad
p
I'm draining the whole node
c
therefore it tries to rebuild the replica, i agree
p
If I want to perform maintenance on it, yknow
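(For context, a full node drain for maintenance is typically done with plain kubectl; the flags below are the usual ones, with the node name as a placeholder:)

```shell
# Cordon the node and evict everything on it; DaemonSet pods (like Longhorn's)
# can't be evicted, so they must be explicitly ignored
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data

# After maintenance, make the node schedulable again
kubectl uncordon <node-name>
```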
c
maybe though, a workaround could be to increase the replica timeout
before rebuild
there is a set time before the rebuild is triggered
p
I don't think this will work, as an offline replica that's out of sync becomes failed as soon as the volume changes
Which happens because I have a database on it that always changes, yknow
I should just separate dbs from storage lmao
c
understood, but when the node comes back up, shouldn't the replica rebuild be faster rather than running from scratch?
p
That's exactly the question I'm asking lol
It didn't
c
i thought your replica rebuild got triggered on a new node
p
Ah no, i only have 3 nodes in my cluster
c
got it, so i think the fast rebuild is failing due to missing checksums
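(If missing checksums are the culprit: the fast rebuild compares snapshot checksums, and if they were never computed it falls back to a full rebuild. Checksum calculation is controlled by the `snapshot-data-integrity` setting; a sketch, again assuming the `longhorn-system` namespace:)

```shell
# Check whether snapshot checksums are being computed at all
kubectl -n longhorn-system get settings.longhorn.io snapshot-data-integrity

# "fast-check" computes a checksum only for snapshots that don't have one yet,
# which is what the fast replica rebuild relies on
kubectl -n longhorn-system patch settings.longhorn.io snapshot-data-integrity \
  --type merge -p '{"value":"fast-check"}'
```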
p
Granted, your remark makes more sense than my current context and just made me understand why data locality may be annoying
c
usually though when i run databases, i always set replicas to 1 and strict-local
when i perform maintenance operations i let the database perform the failover
p
I don't see the point of using longhorn if you use strict-local though
I don't understand the point of this option
isn't it similar to a host bind mount?
c
because of all the other features like backup, restore etc
p
I don't use the longhorn backup system in my current env, as what's mission critical are the databases, and they back themselves up with dumps (which is safer and more efficient than a volume backup, too)
I could use the backup for my fat, not-db storage tho.
c
yes i agree with you, i also back up via native db backups, and i also like to have a backup of the backup
i use cloudnative postgres for example and it allows me to restore from a volumesnapshot as well, and longhorn lets me do this
"I should just separate dbs from storage lmao" what did you mean with this?
like i don't see how you can avoid evicting the longhorn daemonsets
unless they are into dedicated database nodes with let's say a label
i might not be of help
i suggest though opening a discussion on github, the devs are usually very active and helpful 🙂
p
By separating db from storage, I mean I'm using a single volume per service I host, and many of them have a database, meaning my storage service has 500gb of files AND a 300mb (ever-changing) db in the same volume
I just set up snapshots, I'll see tomorrow if the node rebuild is faster before annoying the devs
In any case, thanks for helping me out!! Hope we figure out that fs trim issue that's bugging you.
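(For reference, periodic snapshots can be expressed declaratively as a Longhorn RecurringJob. A minimal sketch, where the name, schedule, and retention are illustrative, not from the conversation:)

```shell
# Create a recurring snapshot job for volumes in the "default" group
kubectl apply -f - <<EOF
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: nightly-snapshot        # illustrative name
  namespace: longhorn-system
spec:
  cron: "0 2 * * *"             # every night at 02:00
  task: snapshot
  retain: 3                     # keep the last 3 snapshots
  concurrency: 1
  groups:
  - default                     # applies to volumes in the default group
EOF
```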
f
Sorry if this was already resolved. If not, what is the Longhorn version? Depending on this setting, Longhorn might behave differently: https://longhorn.io/docs/1.6.1/references/settings/#node-drain-policy Also, yes, there are some settings to speed up rebuilding by checking snapshot checksums: https://longhorn.io/docs/1.6.1/references/settings/#fast-replica-rebuild-enabled
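(The node drain policy from the first link is another Setting resource; checking and changing it might look like this. The value shown is one of the documented options, not a specific recommendation:)

```shell
# See which drain policy is currently active (assumes longhorn-system namespace)
kubectl -n longhorn-system get settings.longhorn.io node-drain-policy

# "allow-if-replica-is-stopped" lets the drain proceed when the last healthy
# replica on the node is already stopped, instead of blocking the drain
kubectl -n longhorn-system patch settings.longhorn.io node-drain-policy \
  --type merge -p '{"value":"allow-if-replica-is-stopped"}'
```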
p
1.6.1 mate yeh