# harvester
a
This message was deleted.
b
I'm not sure, but I'll point this out: I would imagine this depends on the ingress/egress traffic of your VMs, whether they're on separate host interfaces for their VM networks, and whether your management network is only carrying k8s/Longhorn traffic. The next thing I'd expect to matter is your data replication/tolerance/data locality settings. After that, disk IOPS is going to make the biggest difference. Are the workloads read/write or read-only?
Longhorn is only required for the boot disks, but that doesn't mean you can't have other storage types for secondary data volumes.
With multiple network interfaces/switches I would think you could scale pretty large, but you might hit network saturation with only 1 interface/switch. It would probably really depend on the switch and network configs as well.
I can't tell you how many times I asked for a switch to keep traffic local, only to find out traffic was going all the way back to the router and back to the switch causing terrible bottlenecks.
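A back-of-the-envelope way to think about the replication point above (a sketch, not Longhorn internals; the function and parameter names are just illustrative):

```python
# Rough sketch: how replication multiplies VM write traffic on the storage
# network. Assumes every write is sent to each remote replica; with data
# locality one replica can live on the node running the VM, so that copy
# doesn't cross the network. Illustrative, not measured.

def network_write_multiplier(replicas: int, local_replica: bool) -> int:
    """How many copies of each written byte cross the storage network."""
    remote_replicas = replicas - 1 if local_replica else replicas
    return max(remote_replicas, 0)

# Example: 3 replicas with data locality -> every VM write hits the wire twice.
print(network_write_multiplier(replicas=3, local_replica=True))   # 2
print(network_write_multiplier(replicas=3, local_replica=False))  # 3
```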
b
100%. This is great feedback and I have certainly taken these factors into account. However, I was hoping there might be some anecdotal reports of people's cluster sizes and maybe even experiences with this.
b
We have 9 nodes and aren't near network saturation, but have a second interface for the VM networks.
What scale are you considering?
b
Yeah, so we’re experimenting and would dedicate the Longhorn (and admin/mgmt internal net) traffic to 10GbE switches w/ dual redundant NICs (trunked @ 2x10GbE each node). I mean, we’re looking to deploy a 5-node cluster and want to do some calculations on how much utilization (and clients of specific workloads) a cluster like this could handle. I know I’m leaving a lot of information out here that would factor in, which is why I’m looking to the world to provide some anchoring. For instance, could you give me an idea of your rough aggregate public facing workload and the resultant metrics on your Longhorn storage network? Or anything else that may help me do some calculations?
any data is helpful for me to start trying to make some rough calculations 😃
For example, in testing, on a 1GbE network with 3 chunky nodes, we easily saturated the storage network just doing basic cluster creations and operations with 0 active traffic served.
but, that’s expected
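For what it's worth, a very rough ceiling for the setup described above (2×10GbE trunked per node) can be sketched like this; the 3-replica assumption and the idea that the bond delivers its full 20 Gbit/s are assumptions, not measurements:

```python
# Very rough per-node ceiling for VM write throughput before the storage
# links saturate. Assumes 2 x 10 GbE usable as ~20 Gbit/s, 3 replicas with
# data locality (so ~2 remote copies per write), and ignores protocol
# overhead, replica rebuilds, snapshots and read traffic.

BOND_GBIT = 2 * 10          # trunked 2 x 10 GbE per node
REMOTE_COPIES = 2           # 3 replicas, one kept local

ceiling_gbit = BOND_GBIT / REMOTE_COPIES
print(f"~{ceiling_gbit:.0f} Gbit/s (~{ceiling_gbit * 1000 / 8:.0f} MB/s) of VM writes per node")
```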
b
I don't have access to those metrics, but with two nics, I think you're gonna be ok. But then again, how many Longhorn disks are you expecting? (how many VMs?)
because active read/writes are what's going to use up the bandwidth.
b
Yes, for sure. A lot of that ends up being dictated by client application workloads that we deploy to the K8s clusters whose nodes are ultimately provisioned on the harvester cluster.
b
Ah!
b
and how many disks are provisioned, and in what way, as the storage pool on the Longhorn cluster, etc.
b
And the rke clusters are all using the mgmt network?
b
yes
b
Well the RKE nodes will use more of the traffic too for things like etcd
I think that might be a better question for capacity
(also I have no idea what it is)
I'd provision a new cluster and compare traffic before and after to get an idea
b
For sure. I wonder if anyone else in this channel has any info or opinions to share?
m
Heyyy 👋 Harvester PM here at SUSE: We need to do some sort of benchmark and profiling, but it isn't something we actually have right now I'm afraid (at least from our side). It'd be really interesting to see who has done something similar, or, if you end up doing something yourself - if this is something we could potentially collaborate on.
b
Hey! That is great to hear! I would love to help, but I’m hoping to get some data first before committing to a spend which has a lot of unknowns. Collaboration sounds very interesting! I’d love to help.
The majority of the overhead on scaling within a Harvester cluster would be related to Longhorn storage, wouldn’t it? I imagine the overhead of Harvester itself isn’t too large. Is there data on that published anywhere?
m
Morning! From the Longhorn side?
b
> I imagine the overhead of Harvester itself isn’t too large.
I'm pretty sure it's just whatever the traffic would be for a k3s cluster with x nodes + Longhorn + (other downstream clusters or VMs, if they're using the mgmt network for their backend or network). Otherwise you're looking at how much network traffic is generated per x amount of disk IOPS, right?
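A sketch of that IOPS-to-traffic conversion; the 4 KiB block size, 3 replicas and data locality are assumptions, and real Longhorn traffic adds protocol and rebuild overhead on top:

```python
# Convert "x disk write IOPS" into rough storage-network traffic.

def iops_to_mbit(iops: int, block_kib: int = 4, remote_copies: int = 2) -> float:
    """Network Mbit/s generated by write IOPS at a given block size."""
    bytes_per_sec = iops * block_kib * 1024 * remote_copies
    return bytes_per_sec * 8 / 1_000_000

print(f"{iops_to_mbit(10_000):.0f} Mbit/s for 10k write IOPS @ 4 KiB")  # ~655 Mbit/s
```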
b
Yep
m
Well, you could also use a dedicated storage network to further separate the Longhorn traffic from the mgmt/k8s traffic and the inter-VM/RKE2-guest-k8s traffic by using 3 separate interfaces, and also use a balance-tlb or other bond type to get even more throughput, right?
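A toy model of why the bond type matters here (illustrative numbers only): with balance-tlb or LACP-style hashing, a single flow such as one replica stream generally stays on one member NIC, so only many parallel flows see the aggregate bandwidth.

```python
# Rough usable throughput of a bonded link: any single flow is capped by one
# member NIC; the aggregate is capped by the whole bond.

def bond_throughput_gbit(nics: int, nic_gbit: float, flows: int) -> float:
    return min(flows * nic_gbit, nics * nic_gbit)

print(bond_throughput_gbit(nics=2, nic_gbit=10, flows=1))  # 10 -> one flow can't exceed one NIC
print(bond_throughput_gbit(nics=2, nic_gbit=10, flows=4))  # 20 -> parallel flows can fill the bond
```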