# harvester
a
This message was deleted.
b
I'm not sure, but I'll point this out: I would imagine this depends on the ingress/egress traffic of your VMs, whether they're on separate host interfaces for their VM networks, and whether your management network is only carrying k8s/Longhorn traffic. The next thing I'd expect to matter is your data replication/tolerance/data locality settings. After that, disk IOPS is going to make the biggest difference. Are the workloads read/write or read-only?
Longhorn is only required for the boot disks, but that doesn't mean you can't have other storage types for secondary data volumes.
With multiple network interfaces/switches I would think you could scale pretty large, but you might hit network saturation with only 1 interface/switch. It would probably really depend on the switch and network configs as well.
I can't tell you how many times I asked for a switch to keep traffic local, only to find out traffic was going all the way back to the router and back to the switch causing terrible bottlenecks.
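A back-of-the-envelope way to think about the replication point above (a sketch, not Longhorn internals; the function and parameter names are just illustrative):

```python
# Rough sketch: how replication multiplies VM write traffic on the storage
# network. Assumes every write is sent to each remote replica; with data
# locality one replica can live on the node running the VM, so that copy
# doesn't cross the network. Illustrative, not measured.

def network_write_multiplier(replicas: int, local_replica: bool) -> int:
    """How many copies of each written byte cross the storage network."""
    remote_replicas = replicas - 1 if local_replica else replicas
    return max(remote_replicas, 0)

# Example: 3 replicas with data locality -> every VM write hits the wire twice.
print(network_write_multiplier(replicas=3, local_replica=True))   # 2
print(network_write_multiplier(replicas=3, local_replica=False))  # 3
```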
b
100%. This is great feedback and I have certainly taken these factors into account. However, I was hoping there might be some anecdotal reports of people's cluster sizes and maybe even experiences with this.
b
We have 9 nodes and aren't near network saturation, but have a second interface for the VM networks.
What scale are you considering?
b
Yeah, so we’re experimenting and would dedicate the Longhorn (and admin/mgmt internal net) traffic to 10GbE switches w/ dual redundant NICs (trunked @ 2x10GbE each node). I mean, we’re looking to deploy a 5-node cluster and want to do some calculations on how much utilization (and clients of specific workloads) a cluster like this could handle. I know I’m leaving a lot of information out here that would factor in, which is why I’m looking to the world to provide some anchoring. For instance, could you give me an idea of your rough aggregate public facing workload and the resultant metrics on your Longhorn storage network? Or anything else that may help me do some calculations?
any data is helpful for me to start trying to make some rough calculations 😃
For example, in testing, on a 1GbE network with 3 chunky nodes, we easily saturated the storage network just doing basic cluster creations and operations with 0 active traffic served.
but, that’s expected
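For what it's worth, a very rough ceiling for the setup described above (2×10GbE trunked per node) can be sketched like this; the 3-replica assumption and the idea that the bond delivers its full 20 Gbit/s are assumptions, not measurements:

```python
# Very rough per-node ceiling for VM write throughput before the storage
# links saturate. Assumes 2 x 10 GbE usable as ~20 Gbit/s, 3 replicas with
# data locality (so ~2 remote copies per write), and ignores protocol
# overhead, replica rebuilds, snapshots and read traffic.

BOND_GBIT = 2 * 10          # trunked 2 x 10 GbE per node
REMOTE_COPIES = 2           # 3 replicas, one kept local

ceiling_gbit = BOND_GBIT / REMOTE_COPIES
print(f"~{ceiling_gbit:.0f} Gbit/s (~{ceiling_gbit * 1000 / 8:.0f} MB/s) of VM writes per node")
```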
b
I don't have access to those metrics, but with two nics, I think you're gonna be ok. But then again, how many Longhorn disks are you expecting? (how many VMs?)
because active read/writes are what's going to use up the bandwidth.
b
Yes, for sure. A lot of that ends up being dictated by client application workloads that we deploy to the K8s clusters whose nodes are ultimately provisioned on the harvester cluster.
b
Ah!
b
and how many disks are provisioned, and in what way, as the storage pool on the Longhorn cluster, etc.
b
And the rke clusters are all using the mgmt network?
b
yes
b
Well the RKE nodes will use more of the traffic too for things like etcd
I think that might be a better question for capacity
(also I have no idea what it is)
I'd provision a new cluster and compare traffic before and after to get an idea
b
For sure. I wonder if anyone else in this channel has any info or opinions to share?
m
Heyyy 👋 Harvester PM here at SUSE: We need to do some sort of benchmark and profiling, but it isn't something we actually have right now I'm afraid (at least from our side). It'd be really interesting to see who has done something similar, or, if you end up doing something yourself - if this is something we could potentially collaborate on.
b
Hey! That is great to hear! I would love to help, but I’m hoping to get some data first before committing to a spend which has a lot of unknowns. Collaboration sounds very interesting! I’d love to help.
The majority of the overhead on scaling within a Harvester cluster would be related to Longhorn storage, wouldn’t it? I imagine the overhead of Harvester itself isn’t too large. Is there data on that published anywhere?
m
Morning! From the Longhorn side?
b
> I imagine the overhead of Harvester itself isn’t too large.
I'm pretty sure it's just whatever the traffic would be for a k3s cluster with x nodes + Longhorn + (other downstream clusters or VMs, if they're using the mgmt network for their backend or network). Otherwise you're looking at how much network traffic is generated per x amount of disk IOPS, right?
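A sketch of that IOPS-to-traffic conversion; the 4 KiB block size, 3 replicas and data locality are assumptions, and real Longhorn traffic adds protocol and rebuild overhead on top:

```python
# Convert "x disk write IOPS" into rough storage-network traffic.

def iops_to_mbit(iops: int, block_kib: int = 4, remote_copies: int = 2) -> float:
    """Network Mbit/s generated by write IOPS at a given block size."""
    bytes_per_sec = iops * block_kib * 1024 * remote_copies
    return bytes_per_sec * 8 / 1_000_000

print(f"{iops_to_mbit(10_000):.0f} Mbit/s for 10k write IOPS @ 4 KiB")  # ~655 Mbit/s
```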
b
Yep
m
Well, you could also use a dedicated storage network to further separate the Longhorn traffic from the mgmt/k8s traffic and the inter-VM/RKE2-guest-k8s traffic by using 3 separate interfaces, and also use a balance-tlb or other bond type to get even more throughput, right?
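A toy model of why the bond type matters here (illustrative numbers only): with balance-tlb or LACP-style hashing, a single flow such as one replica stream generally stays on one member NIC, so only many parallel flows see the aggregate bandwidth.

```python
# Rough usable throughput of a bonded link: any single flow is capped by one
# member NIC; the aggregate is capped by the whole bond.

def bond_throughput_gbit(nics: int, nic_gbit: float, flows: int) -> float:
    return min(flows * nic_gbit, nics * nic_gbit)

print(bond_throughput_gbit(nics=2, nic_gbit=10, flows=1))  # 10 -> one flow can't exceed one NIC
print(bond_throughput_gbit(nics=2, nic_gbit=10, flows=4))  # 20 -> parallel flows can fill the bond
```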