Big storage - how to go about this in a Harvester/Longhorn/Rancher stack?
# harvester
w
Big storage - how to go about this in a Harvester/Longhorn/Rancher stack? I have a large-capacity storage problem: a > 5 TiB volume in use that has failed a few times, and I've been reading that volumes over 1 TiB (even half that size) are not ideal with Longhorn. This is run from a VM provisioned in Harvester with a large replicated filesystem attached. Probably 3-4 times a year it fails, usually due to power cuts or reboots - but when it happens it takes an age to rebuild and an XFS repair is usually needed.

I've also worked out we can use the MinIO operator and a MinIO tenant in a cluster to combine a large number of smaller volumes into one large pool. We are running k8s clusters on our Harvester stack. The question, however, is what to do about replication. MinIO handles this itself, so I'm thinking of either customising the stack and using DirectPV (though I'd rather not, and I'm concerned it adds something extra that has to be reinstalled with every Harvester upgrade), or provisioning a cluster with nodes 1:1 per host and attaching storage that way.

Alternatively I could use single-replica Longhorn volumes, but in that case I need to ensure Rancher/Harvester know what to do with those volumes. With single-replica volumes live migration won't be possible, and upgrades to the stack will also fail unless all the pods are shut down, since the workloads can't be safely migrated. Is there a way to tell Rancher and Harvester that this particular group of VMs can be disrupted - i.e. allow a given number of them to be shut down while maintenance is happening? As long as the replicas are numerous enough and not on the same node, in theory MinIO will deal with the fallout.

Ultimately we have some 2 TiB servers to back up and we need volumes to contain that data - we've about 100 TiB capacity. I'd appreciate any thoughts on this and would love to hear how others have implemented this on a Harvester/Rancher/Longhorn stack.
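For the single-replica option, a minimal sketch of what the StorageClass could look like, assuming standard Longhorn parameters - the class name and values here are illustrative, not the poster's config:

```yaml
# Minimal sketch, not the poster's config: a single-replica Longhorn
# StorageClass so MinIO's own erasure coding provides the redundancy.
# The class name and parameter values are illustrative.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-single-replica   # hypothetical name
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
parameters:
  numberOfReplicas: "1"           # no Longhorn replication; MinIO handles redundancy
  staleReplicaTimeout: "30"
  dataLocality: "best-effort"     # try to keep the single replica on the workload's node
```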
t
Since you had issues, I would offload the storage to an external iscsi/nfs array. Let the array manage the redundancy. You will need v1.5.0 or higher.
FYI Pure Storage arrays are designed to handle power outages. There isn’t even a power button. 😄
p
yeah also hacky but so far ZFS+harvester has been working fine for us. I wouldn't use longhorn for like, "storage storage"
t
over the years I have learned to stay away from the hacky stuff.
how many servers are you running?
w
Well we can’t do that - we do have an old iSCSI array, but our storage is on the nodes, and 1.5.0 isn’t stable yet, I believe. This question is really about how to leverage MinIO, which can use many volumes to coalesce into one huge-capacity pool; if it can be configured to work with what’s available in a Rancher-style stack it may be a winner for me. Also, since the machines have storage in-node, it’s fast. We’re running 5 high-performance Harvester nodes, plus a separate HA Rancher cluster to manage them.
p
ehh not many and we're just testing out ZFS on like ~4. ymmv / do your own due diligence obvz 😉
t
Longhorn is great for taking advantage of the storage that is on the servers. But for replication it needs a TON of bandwidth. And as you have seen it does not handle power outages well. I have never seen minio used for volumes. I know it is an s3 service.
p
t
oh wow. that is cool.
w
bandwidth - we've got an extra dedicated 10G link used by longhorn - the HA it provides is desirable - but obviously if minio is handling that itself that bandwidth won't be used (using single replicas)
t
How are you connecting harvester to the minio multi-drive? which CSI?
w
I played with the minio operator before and created a tenant with several TiB of storage, and it worked nicely, but it was wasteful in resources since there was far too much replication
You just choose a storage class and it's dynamically provisioned
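A rough sketch of how a MinIO Tenant pool points at a storage class so the operator provisions many smaller PVCs - names, counts and sizes are hypothetical, and a real Tenant also needs credentials/configuration omitted here:

```yaml
# Rough sketch only, not the tenant described above: a MinIO Tenant pool that
# asks the operator to provision many smaller PVCs from a chosen StorageClass.
apiVersion: minio.min.io/v2
kind: Tenant
metadata:
  name: backup-tenant            # hypothetical
  namespace: minio-tenant        # hypothetical
spec:
  pools:
    - name: pool-0
      servers: 4                 # MinIO server pods in this pool
      volumesPerServer: 4        # 4 x 4 = 16 PVCs in total
      volumeClaimTemplate:
        metadata:
          name: data
        spec:
          accessModes:
            - ReadWriteOnce
          storageClassName: longhorn-single-replica   # e.g. the single-replica class sketched earlier
          resources:
            requests:
              storage: 1Ti       # many ~1 TiB volumes instead of one huge one
```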
t
so you are installing the minio operator on harvester itself.
w
no - in a cluster hosted in vms provisioned by harvester using rancher (rke2)
I do have a single-node minio VM running as a stopgap for S3 services, but when I last ran this I did it in-cluster
t
OH… I thought you were talking about the Harvester level. AVOID using longhorn on the vms. longhorn on longhorn is SLOW.
w
no, I'm not using longhorn-in-longhorn
I exposed the storage class to the cluster so it's dynamically provisioned
t
good. slicing and dicing storage 3x3 is not performant. Ah. I still like iscsi/nfs arrays for both Harvester AND the vms as an additional service. We have an array that can handle iscsi/nfs and S3. 😄
p
multi-node+drive minio basically is longhorn, so you still wind up with longhorn-in-longhorn 🦜
w
my use case is a bit different - the storage is distributed to the VMs - each machine has 4 TiB NVMe, 4 TiB SSD and 20 TiB HDD - 5 nodes, all identical, 2x 10Gb NICs (1 for management, 1 for storage) and a separate 1Gb NIC for ingress
it's not - in the cluster you'd tell it which storage class to use and it's provisioned by harvester using that class - however, since minio handles replication as well, you get a similar double-replication problem - though longhorn could be run without replication, assuming minio handles it itself
t
personally, I like bonding the 2x 10Gb for mgmt and storage into a single 20Gb bond.
p
minio says
MinIO does not distinguish drive types and does not benefit from mixed storage types. Each pool must use the same type (NVMe, SSD)
For example, deploy a pool consisting of only NVMe drives. If you deploy some drives as SSD or HDD, MinIO treats those drives identically to the NVMe drives. This can result in performance issues, as some drives have differing or worse read/write characteristics and cannot respond at the same rate as the NVMe drives.
solution is dump longhorn, use bcachefs, then minio 😉
w
that's fine - we can set up separate tenants with separate storage profiles if we want fast or slow pools
bcachefs? do you mean DirectPV?
I guess the operator could be added to the harvester cluster and DirectPV on the hosts - but then that makes upgrading harvester tricky, doesn't it
p
no, bcachefs is a sorta new/controversial fs, but one feature is > A feature request we've had is configurationless tiering, smart tiering of member devices in a filesystem based on performance. This feature will allow easy and simple tiering of devices within a filesystem based on the performance of the device. The effect of this is that it will allow data that is commonly accessed, hot data, stored on the faster drives while data that is not used as often, cold data, will be stored on slower drives. This will increase filesystem performance greatly.
good for mixed storage
w
I'm not fussed about that to be honest - you can also do something similar in LVM with an NVMe cache to speed up HDDs; we've found that can work well for web servers.
What I need here is to get large-capacity volumes working reliably, in-cluster or virtually, to be used as backup targets
p
yeah there's csi-driver-lvm too
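For reference, a hedged sketch of what a csi-driver-lvm StorageClass might look like, assuming metal-stack's csi-driver-lvm; the provisioner string and the `type` parameter are taken from that project's examples and should be verified against the installed chart version:

```yaml
# Hedged sketch, assuming metal-stack's csi-driver-lvm: a StorageClass that
# carves node-local logical volumes out of the host's configured devices.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-driver-lvm-linear    # hypothetical name
provisioner: lvm.csi.metal-stack.io
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer   # bind on the node where the pod is scheduled
allowVolumeExpansion: true
parameters:
  type: linear                   # the project also documents "striped" and "mirror"
```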
w
trouble is I think this would need to be installed on the harvester end, wouldn't it - and you'd have to reinstall it after every harvester upgrade since the complete image is replaced.
t
as of 1.5.0 harvester supports 3rd party CSIs for booting. installing as a CSI means it is not node dependent and will survive upgrades. 😄
w
nice - this might change things around a bit then! Do we know when 1.5.0 is due to become stable?
t
1.5.1 just came out - lots of fixes and improved CSI handling. I talked to my team at Pure Storage that is certifying it, and SUSE said to wait for 1.5.1.
p
btw lol "Upgrading Harvester causes the changes to the OS in the
after-install-chroot
stage to be lost. You must also configure the
after-upgrade-chroot
to make your changes persistent across an upgrade. Refer to Runtime persistent changes before upgrading Harvester." https://docs.harvesterhci.io/v1.5/advanced/csidriver/
w
Reminds me I need to get my powerchute in there!
p
that's the never-ending issue I keep being sad about: https://github.com/harvester/harvester/issues/4556
t
p
hah I never heard about this
apiVersion: node.harvesterhci.io/v1beta1
kind: CloudInit
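A fuller hedged sketch of that CloudInit CRD, with field names following the Harvester docs example; the resource name, filename and commands are hypothetical - verify against your Harvester version:

```yaml
# Hedged sketch based on the Harvester CloudInit CRD mentioned above.
apiVersion: node.harvesterhci.io/v1beta1
kind: CloudInit
metadata:
  name: powerchute-prereqs       # hypothetical
spec:
  matchSelector: {}              # empty selector = apply to every node
  filename: 99-powerchute.yaml   # written out on each node so it persists across upgrades
  contents: |
    stages:
      boot:
        - name: "example persistent change"
          commands:
            - echo "runs on every boot, survives the image swap during upgrades"
```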
yeah it's still a problem if you need drivers installed on the host OS
I think the "correct" solution is to fork harvester-installer?
t
couldn’t you use the cloudinit for installing the drivers?
p
maybe? e.g. zfs needs kernel modules ... to me I'd rather have that stuff baked into an iso than have a cloud-init try to do it on boot on top of the immutable OS 😅
w
Cloud-init would make sense; if you want to bake it in you'll need to run your own image build/registry - otherwise it could get quite manual and protracted with each update
p
yeah you need to be rebuilding them either way ... maybe build/package per-harvester then cloud-init vs. bake into harvester ¯\_(ツ)_/¯
w
@thousands-advantage-10804 said - "I have never seen minio used for volumes. I know it is an s3 service." I was referring to the volumes minio uses under the hood to store the data - it is an object store and we can use it for our backups, but we need quite a bit of space. Those volumes don't need replication - it sounds like I need to take the volumes out of longhorn and use a direct-mount driver, then hand them to minio so it can manage that storage itself. That's going to be a PITA given we currently run the data off 20 TiB drives and they are in use...
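One generic way to do the "direct mount" route inside whichever cluster ends up running MinIO is plain Kubernetes local PVs, one per physical disk, assuming the disks are already formatted and mounted on those nodes (or VMs) - paths, names and sizes below are hypothetical:

```yaml
# Hedged sketch: a no-provisioner StorageClass plus one static local PV per
# disk, so MinIO gets the drives directly with no replication layer underneath.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-minio
provisioner: kubernetes.io/no-provisioner   # static, node-local volumes only
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: minio-node1-hdd0                    # hypothetical
spec:
  capacity:
    storage: 20Ti
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-minio
  local:
    path: /mnt/disks/hdd0                   # pre-mounted filesystem on the node
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - node1                     # hypothetical node name
```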
t
ah yes. also think about the layers. vms / longhorn / disk….
w
@thousands-advantage-10804 if that's the primary focus, bare metal with rancher management starts to make sense in the context of cloud - we only use harvester because we need some traditional VMs. It does sometimes beg the question: does the extra work and overhead justify the means?