# harvester
w
Storage question for you... in Harvester the storage is backed by Longhorn, as we know. We've been using a VM running MinIO for object storage, however with the recent changes to MinIO and the loss of most of the administrative functions in the UI, we're looking at a solution to our object store needs. We have a second site running k3s/docker (TrueNAS SCALE) where we had used replication and MinIO to get the data backed up offsite, and this was working well, however the worrying developments at MinIO mean that this might not be an option for us, unless we run with a fork of the project.

The question is: what's the ideal way to get an S3-compatible object store running on Harvester/Rancher? It feels like a feature that should be present or implementable at the Longhorn side - Ceph, for instance, has an S3 object store available - but running Ceph on top of Longhorn doesn't make sense to me (while it is possible to run with 1 replica and hand off to Longhorn to replicate, that feels like asking for trouble). I've found this article - https://freshbrewed.science/2022/09/06/cloudcsistorage.html - in which it looks like the author adds S3, though I'm not sure what they are doing - it could be using Longhorn as block storage for AWS, which means external dependencies. Anyway, before I dive into this I wanted to ask here: what's the optimal route to do this? Will keep digging, but interested in thoughts here, as Harvester/Rancher/Longhorn gets you 90% there, but there is no built-in object storage out of the box - you'd have thought these products were designed to work well together to facilitate this?
I think the link I've provided is concerned only with backing up your filesystem to your S3 bucket rather than running the buckets from your Longhorn volumes.
S3Proxy only has basic bucket functions from what I read too - and it's Java-based, so I wouldn't expect great performance.
From what I read, the mc client is still fully functional - if that's the case, then MinIO may still be viable; I was just a bit shocked when checking via the UI to find half of it missing! Currently running as a dedicated VM for backups and replicating offsite. Going to have a play and learn a bit more about the mc client to assert things are OK - but if anyone here has found a good methodology or elegant solution for layering an S3 storage model on top of a Harvester/Rancher/Longhorn stack, I'd love to hear it! We're running k8s clusters and a handful of VMs in this architecture, with storage in Longhorn.
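For anyone else checking: the kind of smoke test I mean for confirming the S3 API itself is intact, regardless of the UI state, is a few lines of boto3 - rough sketch only, the endpoint, credentials and bucket name below are placeholders, not our setup:

```python
# Sanity check that a MinIO endpoint still speaks full S3 even with the
# UI stripped back. Endpoint, credentials and bucket are placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://minio.example.internal:9000",  # placeholder endpoint
    aws_access_key_id="CHANGEME",
    aws_secret_access_key="CHANGEME",
)

# List existing buckets - equivalent to `mc ls <alias>`.
for bucket in s3.list_buckets()["Buckets"]:
    print(bucket["Name"])

# Create a bucket and round-trip an object to prove reads/writes work.
s3.create_bucket(Bucket="smoke-test")
s3.put_object(Bucket="smoke-test", Key="hello.txt", Body=b"still works")
print(s3.get_object(Bucket="smoke-test", Key="hello.txt")["Body"].read())
```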
b
We were in a similar position, but moved away from the Harvester/Longhorn stack.
Basically we had older hardware and decided to create two Ceph clusters on bare-metal nodes to use against our Harvester/downstream K8s/K3s storage. The two clusters were SSDs vs HDDs, but with the ceph-csi plugin it made getting that storage into a cluster super lightweight/easy. Plus we felt we had the additional flexibility to provision/share CephFS/NFS to VMs at the same time, and not just the RBD storage or S3 buckets.
w
@bland-article-62755 I wonder if something similar can be achieved by using VMs in Harvester for the Ceph nodes, backing the storage either as a direct passthrough from the hardware, as a single-replica Longhorn volume, or even an iSCSI mount - each VM locked to a node. These would not be migratable, however, and upgrading Harvester would be a problem, since it won't restart the nodes if the workloads can't be moved.

We have set up our nodes using really power-efficient hardware, and the entire stack of 11 machines uses less wattage than one of our old machines (of which we have 4, plus a SAN, decommissioned). The old SAN ought to run as iSCSI volumes - that would also solve it - however that's power hungry and limited to 2x4x1Gb interfaces, rather than 2x10Gb per node in the new cluster, so both less bandwidth and a lot more power to boot.

Running VMs in the cluster using the underlying hardware feels like an interesting option, however - if we can solve the upgrade issue to ensure the Ceph cluster can be kept running during the update process, then it could be a winner. Need to read up on the update process and whether it can be done with the right config - I think presently updates would fail if the workload can't be moved, or it might be a case of manual intervention node by node to ensure the update passes through and that the Ceph cluster has enough healthy VMs. I'm going to experiment over the next few weeks - if anyone has thoughts on this, greatly appreciated, and happy to share our progress.
p
have you looked at SeaweedFS? it's been a long time since I've used Ceph, but I think Seaweed is much more lightweight
something like LVM CSI or DirectPV probably makes sense instead of putting it on Longhorn though, if you're using it for this purpose
b
In my experience, Ceph likes direct access to the disks and does better at 7+ nodes.
w
@prehistoric-morning-49258 will check out SeaweedFS; the issue is all the disks are currently assigned to Longhorn - there is a lot there. Longhorn does support iSCSI and single-replica volumes, so in theory it could be told to leave replication to the machine it's providing the PV for...
However - more layers of abstraction, not ideal I guess..
b
I'd stay away from trying to run it with VMs; it would be better to run rook-ceph from the Kubernetes side than to do some VM abstraction, but that's just my gut feeling.
themgt is right though, Ceph is anything but lightweight.
p
yeah, maybe not what you want, but we're testing ZFS for data, and then replicating data off is efficient and decoupled from Harvester/Longhorn
w
We have a small 5-node cluster running Harvester atm, with another 5-node low-power Rancher management cluster (11W per machine). This is an on-site development cluster we use to try out different cluster setups etc. pre-production, but we also run our DevOps in there... The reason for using MinIO before is that we can run replication to a second site, and we have the S3 services to provide object storage when we need it - which works well for us.
Now we've realised all the MinIO functions are still there (post the 115k lines removed from the almost-useful UI), we can continue as we are... There is an operator for MinIO we played with before, but I wasn't super happy about how it worked - as Longhorn is running replicas already, I wasn't sure how many copies of the data we'd ultimately end up with. Ultimately, if MinIO can work with the system handling the integrity of the files, then the operator is nice, as we can have many, many smaller volumes making up large ones - which is fast for rebuilds. Or even the other way around - with MinIO handling replication and Longhorn leaving volumes at 1 replica... however, that then means that upgrades to Harvester will get stuck with volumes that can't be moved.
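To make the "how many copies" question concrete, here's the back-of-envelope arithmetic - the replica and erasure-coding numbers below are made up for illustration, not our actual config:

```python
# Back-of-envelope for "how many copies do we end up with" when stacking
# MinIO erasure coding on top of Longhorn replicas. Numbers are assumptions
# chosen to illustrate the math.

def raw_bytes_per_logical_byte(longhorn_replicas: int,
                               ec_drives: int,
                               ec_parity: int) -> float:
    # MinIO splits each object into (drives - parity) data shards plus
    # parity shards, so raw usage is drives / (drives - parity).
    minio_overhead = ec_drives / (ec_drives - ec_parity)
    # Longhorn then multiplies every PV block by its replica count.
    return longhorn_replicas * minio_overhead

# Longhorn at 3 replicas under a MinIO pool of 4 volumes with 2 parity shards:
print(raw_bytes_per_logical_byte(3, 4, 2))   # 6.0 raw bytes per byte stored
# The inverse: single-replica Longhorn, let MinIO's erasure coding do the work:
print(raw_bytes_per_logical_byte(1, 4, 2))   # 2.0
```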
p
yeah, I'm a ZFS noob, but it does seem slick for replicating heterogeneous data offsite fast. you can run it directly on a disk and then run something S3-compatible on top, but often we don't actually need object storage, just the ability to replicate block devices
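fwiw the replication loop is basically just snapshot + incremental send/recv; a minimal sketch driven from Python, with placeholder dataset and host names:

```python
# Minimal sketch of the snapshot + incremental zfs send/recv loop.
# Dataset and host names are placeholders.
import subprocess

DATASET = "tank/data"                 # placeholder local dataset
REMOTE = "backup@offsite.example"     # placeholder ssh target
REMOTE_DATASET = "backup/data"

def replicate(prev_snap: str, new_snap: str) -> None:
    # Take the new snapshot locally.
    subprocess.run(["zfs", "snapshot", f"{DATASET}@{new_snap}"], check=True)
    # Stream only the delta since the previous snapshot to the offsite box.
    send = subprocess.Popen(
        ["zfs", "send", "-i", f"{DATASET}@{prev_snap}", f"{DATASET}@{new_snap}"],
        stdout=subprocess.PIPE,
    )
    subprocess.run(
        ["ssh", REMOTE, "zfs", "recv", "-F", REMOTE_DATASET],
        stdin=send.stdout,
        check=True,
    )
    send.wait()

replicate("daily-2024-01-01", "daily-2024-01-02")
```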
b
Potentially. But you should be able to update the eviction/migration strategy and tell Harvester to shut them down to let the upgrades progress.
🙌 1
But you're right that it won't work out of box if you do that.
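Something along these lines, assuming Harvester's VMs behave like stock KubeVirt here - the VM name and namespace are placeholders, and I haven't verified this end to end against a Harvester upgrade:

```python
# Sketch of flipping a VM's eviction strategy so upgrades shut it down
# instead of blocking on live migration. Assumes the VM is a standard
# KubeVirt VirtualMachine and kubectl can reach the Harvester cluster.
import json
import subprocess

patch = {"spec": {"template": {"spec": {"evictionStrategy": "None"}}}}

subprocess.run(
    [
        "kubectl", "patch", "virtualmachine", "ceph-osd-1",  # placeholder VM
        "-n", "default",
        "--type", "merge",
        "-p", json.dumps(patch),
    ],
    check=True,
)
```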
w
@prehistoric-morning-49258 that's interesting - our offsite is ZFS-based (TrueNAS SCALE); it runs MinIO containerised on top, and ZFS is handling the replication - but this is a single very large node, which says to me: why shouldn't Longhorn handle the replication and use MinIO without replication... however, Harvester is multi-node... so dealing with consistency across nodes may be why it's desirable to let MinIO do its thing.. (or Ceph or Seaweed...) We've got about 100TiB of free space atm, and quite a bit of capacity, so might give some of these ideas a go!
Without 3 replicas in Longhorn there will be service disruption during updates - so the option, it seems, may be to do that with no replication in MinIO - or the reverse, and let MinIO deal with things. The issue there, though, is that during upgrades Harvester won't know if it's safe to stop them - at least if Longhorn manages replication, then the whole thing can in theory survive an update, as Longhorn knows when it's ready. Given that, unless the pool is going to shut down ahead of the upgrade, the question really is whether you can cope with having to shut the service down until upgrades are complete.
p
yeah. we have a weird setup - basically just single-node Harvesters, and we migrate VMs off, then upgrade or generally reinstall. but that's because I borked some clusters early on and figured we couldn't count on smooth "let Harvester handle it" upgrades / recovery from failure anyhow. on the plus side, no time spent troubleshooting etcd as our SPOF 😅
w
We are starting to find our way with Harvester - usually some minor snags, but the last update went relatively smoothly and all the VMs migrated nicely. Recently we updated the nodes and doubled the RAM (to 128GB per node), and using maintenance mode the workloads shifted around with little or no drama.
p
yeah, over time we'd like to move to 3-node clusters, but the question is always what's the plan when the upgrade goes off the happy path. need everything replicated/restorable quickly to another 3-node cluster standing by
b
How are you cross-cluster migrating @prehistoric-morning-49258?
That's currently a pain point for us.
w
Doesn't replace a backup plan... hence our offsite storage... usually if the upgrade doesn't happen, it'll just stop or hang until conditions allow it to move on. Most of the time, what we've observed has been due to insufficient replicas on volumes. We have each of our VMs documented in terms of setup, and mirror the key data using a variety of techniques, from MinIO to good old LVM snapshots with rsync (great way to do it in my book - rough sketch below). All the snapshotting in Harvester we've found really fussy to work with, and it's just quicker to set up fresh. For the K8s clusters we're using Velero, which seems to do a good job.
✅ 1
We're moving the bulk of our workloads to k8s, so actually the VM use is minimal now.
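For reference, the LVM-snapshot-then-rsync pattern mentioned above is roughly this - volume group, mount point and destination are placeholders, and the snapshot size just needs to cover writes made during the copy:

```python
# Sketch of the LVM-snapshot-then-rsync pattern. Volume group, LV,
# mount point and rsync target are placeholders; run as root.
import os
import subprocess

VG, LV = "vg0", "data"            # placeholder volume group / logical volume
SNAP = f"{LV}-snap"
MOUNT = "/mnt/snap"
DEST = "backup@offsite.example:/srv/backups/data/"  # placeholder rsync target

os.makedirs(MOUNT, exist_ok=True)
try:
    # Freeze a point-in-time view of the volume.
    subprocess.run(
        ["lvcreate", "-s", "-L", "5G", "-n", SNAP, f"/dev/{VG}/{LV}"],
        check=True,
    )
    subprocess.run(["mount", "-o", "ro", f"/dev/{VG}/{SNAP}", MOUNT], check=True)
    # Ship the consistent snapshot, not the live filesystem.
    subprocess.run(["rsync", "-a", "--delete", f"{MOUNT}/", DEST], check=True)
finally:
    subprocess.run(["umount", MOUNT], check=False)
    subprocess.run(["lvremove", "-f", f"/dev/{VG}/{SNAP}"], check=False)
```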
p
we basically try to DevOps as much as possible, so the VMs can be booted fresh from a config file. most of our data is Postgres, so cross-cluster is either pg logical replication / restore from S3 backups / or (experimentally) ZFS, depending on the situation / how broken things are
yeah, our data isn't even that big, but e.g. restoring a 150GB Postgres DB from S3 takes too long to be viable except in a real emergency. ZFS seems like it can maybe fill that gap decently
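for completeness, the logical replication side is pretty small - something like this sketch, with placeholder hosts and names, and wal_level=logical needed on both ends:

```python
# Rough sketch of the pg logical replication path via psycopg2.
# Hosts, credentials and names are placeholders.
import psycopg2

def run(dsn: str, sql: str) -> None:
    conn = psycopg2.connect(dsn)
    conn.autocommit = True  # CREATE SUBSCRIPTION refuses to run in a transaction
    conn.cursor().execute(sql)
    conn.close()

# On the primary: publish everything.
run("host=pg-primary.example dbname=app user=admin",
    "CREATE PUBLICATION offsite_pub FOR ALL TABLES;")

# On the offsite replica: subscribing performs the initial table copy,
# then streams changes - no full dump/restore needed to catch up.
run("host=pg-offsite.example dbname=app user=admin",
    "CREATE SUBSCRIPTION offsite_sub "
    "CONNECTION 'host=pg-primary.example dbname=app user=replicator' "
    "PUBLICATION offsite_pub;")
```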
w
Don't want to tempt fate, but things are pretty solid with what we have set up - both sites on a 1Gb link; anything massive, we can visit the site and hook up direct if needed and clone. We've been looking at Postgres in HA mode on our cluster lately - long story... anyway, I need to get back on it...
p
yeah. we have enough non-postgres data to need a solution, otherwise I think going all in on postgres replication would be viable