# longhorn-storage
f
We're currently using Longhorn for replicated storage and TopoLVM for storage when replication isn't required (i.e., the underlying database does its own replication: Kafka, Elasticsearch, Postgres cluster). A couple of pain points with this setup are that we're using multiple CSIs, the backup strategies are different, and occasionally it's convenient to re-home a workload to a new node (i.e., someone didn't set up anti-affinity rules when creating a workload). One of our engineers brought up the idea of using a storage class in Longhorn that uses 1 replica and best-effort or strict-local data-locality. Is this a pattern that others are using? Beyond the standard "you only have 1 copy of data", are there any safety concerns with this pattern?
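For concreteness, the storage class we have in mind would look something like this (a sketch; the class name is made up, and the parameter keys are the ones Longhorn documents):

```yaml
# Sketch of a single-replica, data-local Longhorn StorageClass.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-local-1r          # illustrative name
provisioner: driver.longhorn.io
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete
parameters:
  numberOfReplicas: "1"
  dataLocality: "best-effort"      # or "strict-local", which requires exactly 1 replica
  staleReplicaTimeout: "30"
```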
b
Strict-local can cause issues with migrations timing out. I wouldn't call it a safety concern, but it might make migrations take longer. If you're not doing live migrations, it might not be a concern.
๐Ÿ‘ 1
f
TY; have you run Longhorn volumes with replicas=1 before, or are you talking about strict-local in general, regardless of replica count?
b
Only ran replicas=1 as a test, but strict-local in general and fewer replicas could make things more painful.
We keep our LH volumes pretty small as we saw that anything over ~100 Gi tended to really slow down migrations/upgrades on our hardware
So we typically do boot volumes only. With it being that small the 3x replicas weren't a big deal. We use ceph rbd for data volumes.
f
Interesting; are you using Ceph-RBD via Rook, or something else?
b
I don't think I'd change that, because the replicas in Longhorn aren't just insurance against data loss; they also give all sorts of network/hardware resiliency to keep the VMs running vs. having to wait and spin up a new one. Yeah, the Percona cluster might keep the DB in HA, but we'd rather not deal with losing a node.
We're using an external ceph cluster
๐Ÿ‘ 1
We have different clusters for different classes of hardware depending on the need.
๐Ÿ‘ 1
f
For background, we're largely talking about container workloads and not VM workloads.
b
We use it for container workloads too, but that's even more of a reason to stay away from strict-local imho.
but I guess it depends on your scheduling.
Our stuff is on-demand JupyterHub/DataScience workloads for students so speed to launch is big for us.
(outside of using longhorn for Harvester)
f
re: even more of a reason to stay away from strict-local... yeah; I suppose we'll probably need to experiment a bit
b
Doing some sort of network filesystem instead of longhorn might be an easier route, but I guess it depends on your iops requirements.
f
A lot of the underlying workloads may not work well with network filesystems. Plus we're running an HCI stack, so we're not going to lean on an external storage system.
We're running k3s+longhorn on top of Proxmox-VE, so I've thought about trying the proxmox-csi.
It's nice to lean on our local NVMe drives for IOPS-heavy workloads, but we're just starting to put more IOPS-heavy workloads on Kubernetes and are still grappling with it
Thinking aloud a bit, I'm slightly hesitant to lean on the proxmox-csi because I expect in the future we may invert the "k8s on VM" stack to be "VM on k8s"
f
> One of our engineers brought up the idea of using a storage class in Longhorn that uses 1 replica and best-effort or strict-local data-locality.

Strict-local will make it so that the workload pod has to live on the same node as the replica; the pod cannot move.
With best-effort, Longhorn will move the replica to the pod's node when the pod moves, so there will be more rebuilding activity when workloads move.
Both have the benefit that the replica is on the same node as the workload, thus better performance.
IMO, you're probably good to stick with TopoLVM when replication is not needed, then use a backup solution like Velero or Kasten to handle multiple CSI providers.
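A minimal sketch of what that could look like with Velero (the names, namespace, and cron are just examples; assumes Velero's CSI snapshot support is enabled):

```yaml
# Sketch: one nightly Velero backup covering a namespace's PVCs,
# independent of which CSI driver (Longhorn, TopoLVM, ...) provisioned them.
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: nightly-apps        # illustrative name
  namespace: velero
spec:
  schedule: "0 2 * * *"     # 02:00 daily
  template:
    includedNamespaces:
      - apps                # illustrative namespace
    snapshotVolumes: true   # snapshot PVs as part of the backup
```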
f
> IMO, you're probably good to stick with TopoLVM when replication is not needed, then use a backup solution like Velero or Kasten to handle all CSI providers
Any reason in particular that you're thinking of? Disk throughput? Not a situation Longhorn is designed around?
If data migration from TopoLVM is a concern, we're only running it on a dev cluster to sort out how we do local storage. We can swap it out pretty easily at this point
f
> Any reason in particular that you're thinking of? Disk throughput? Not a situation Longhorn is designed around?

If you don't need replication and your workload is already distributed, TopoLVM would be simpler and gives you native disk performance. Longhorn v2 will address the performance issue, but v2 is not GA yet.
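For comparison, a TopoLVM class is about as minimal as a StorageClass gets (a sketch; the name is made up, and it assumes the topolvm.io provisioner name from recent releases):

```yaml
# Sketch of a TopoLVM StorageClass for node-local LVM volumes.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: topolvm-local                      # illustrative name
provisioner: topolvm.io                    # older releases used topolvm.cybozu.com
volumeBindingMode: WaitForFirstConsumer    # schedule the pod first, then carve the LV on that node
allowVolumeExpansion: true
```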
f
We're in a position where we might be able to hold out on performance for a bit while the v2 data engine goes GA, in order to run two fewer operators (TopoLVM + Velero), but I'll take both options under consideration.
๐Ÿ‘ 1