# longhorn-storage
i
It doesn’t matter. I think creating a RAID device using these disks is a good idea. For the I/O path algorithm, you can refer to https://longhorn.io/.
f
Thanks for the reply. Is that explained anywhere in the documentation? RAID particularly feels wrong. Longhorn conceptually to me looks a bit like a mix between Ceph and GlusterFS. You don't mix hardware RAID and Ceph. I'd really like to understand how that would all interact so I can optimise performance.
i
> RAID particularly feels wrong.
Why do you think it is wrong?
> Longhorn conceptually to me looks a bit like a mix between Ceph and GlusterFS.
Can you elaborate more on why you think LH is a mix of Ceph and GlusterFS? I would recommend reading https://longhorn.io/docs/1.5.0/concepts/ first to see how the data path works.
> is it more performant to have many smaller disks,
Using RAID can definitely increase the throughput. Or do you mean latency or IOPS?
h
ChatGPT says
When it comes to the performance of Longhorn storage algorithms, the number and size of disks can have an impact, but it ultimately depends on several factors. Longhorn is a distributed block storage solution, and its performance is influenced by various elements, such as the workload characteristics, I/O patterns, disk types, network infrastructure, and hardware capabilities.

In Longhorn, data is striped across multiple disks to improve performance and redundancy. When determining whether it is more performant to have many smaller disks or fewer larger disks, you should consider the following points:

- **Disk I/O Parallelism:** With many smaller disks, Longhorn can distribute I/O operations across a larger number of disks, potentially allowing for better parallelism and higher overall throughput. This can be advantageous for workloads that benefit from parallelism, such as scenarios with high concurrency or random I/O patterns.
- **Disk Seek Time:** Smaller disks typically have shorter seek times compared to larger disks. In scenarios where latency is critical, such as workloads with frequent small random reads, many smaller disks might offer better performance due to reduced seek times.
- **Aggregated Bandwidth:** Fewer larger disks can potentially provide higher aggregated bandwidth compared to many smaller disks. This can be beneficial for workloads that are more sequential in nature or require sustained high throughput. However, keep in mind that other factors like network bandwidth can also affect the overall performance.
- **Resource Utilization:** Having many smaller disks might lead to improved resource utilization since you can distribute the data and I/O load across multiple drives. It can also offer better fault tolerance, as the loss of a single small disk would impact a smaller portion of the data compared to a larger disk.

It's important to note that the performance difference between many smaller disks and fewer larger disks might not be substantial in all cases. It is recommended to benchmark and test your specific workload using different disk configurations to determine the optimal setup for your needs.

Additionally, Longhorn provides various storage optimizations, such as caching, snapshotting, and replication, which can impact performance as well. Considering the overall system architecture and workload characteristics will help in making an informed decision regarding the number and size of disks to achieve the desired performance goals.
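Acting on the benchmarking advice above, one minimal sketch is to run fio inside the cluster against a Longhorn-backed volume. Everything here is illustrative: the PVC and Job names are made up, the StorageClass is assumed to be named `longhorn`, and fio is installed into a stock Alpine image at run time:

```yaml
# Hypothetical in-cluster benchmark: a PVC on the (assumed) "longhorn"
# StorageClass plus a Job that runs fio against it.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fio-test-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: longhorn
  resources:
    requests:
      storage: 10Gi
---
apiVersion: batch/v1
kind: Job
metadata:
  name: fio-benchmark
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: fio
          image: alpine:3.19          # any image that can provide fio works
          command: ["sh", "-c"]
          args:
            - |
              apk add --no-cache fio &&
              fio --name=randread --filename=/data/fio-test --size=1g \
                  --rw=randread --bs=4k --numjobs=4 --ioengine=psync \
                  --direct=1 --runtime=60 --time_based --group_reporting
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: fio-test-pvc
```

Varying `--rw` (randread, randwrite, read, write) and `--bs` separates the IOPS-bound and throughput-bound cases discussed above.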
i
> In Longhorn, data is striped across multiple disks to improve performance and redundancy.
The statement from ChatGPT is not correct.
👍 1
h
Good to know. I told it it was incorrect and now it says
Longhorn uses a distributed block storage approach, but it does not employ data striping techniques. Instead, it creates replicas of the data across multiple disks to ensure redundancy and data availability.

Each replica of a Longhorn volume is stored on a separate disk, and the data is synchronously replicated to ensure data durability and fault tolerance. This means that the number of disks in Longhorn primarily affects the level of redundancy and fault tolerance rather than performance.

To improve performance in Longhorn, you can consider factors such as the disk type (SSD vs. HDD), network infrastructure, and workload characteristics. Longhorn provides options for configuring disk-specific parameters, such as disk scheduler and caching mechanisms, to optimize performance based on your specific requirements.
f
Thanks all (including ChatGPT 😁). So in my head, where I have no space for new stuff and just map new stuff to existing stuff I know, the Longhorn engine per disk reminded me of Ceph OSDs, but then the replication is of the whole volume across multiple disks, which is what GlusterFS does. That's where my analogy came from. But it sounds like the bottom line is "it depends"? We're trying out Longhorn with Harvester, so I was trying to get my head around the best disk setup for a VM-based workload.
i
I see.
> We're trying out Longhorn with Harvester, so I was trying to get my head around the best disk setup for a VM-based workload.
@salmon-city-57654 Do you have any best practices for Harvester with LH?
s
Hi @full-train-34126, I checked the above discussion. I thought the main problem was whether to use individual or RAID disks, right? There are some points you could refer to:

- Using RAID disks would get better throughput
  > Longhorn does not stripe data across multiple disks, so using a RAID disk will benefit throughput.
- Rebuild impact if using a RAID disk
  > A RAID rebuild will cause both performance degradation and a potential risk to data safety.
- The replica number could be set to 2
  > We can use only 2 replicas because the RAID already provides data protection.

There is a related GH issue here: [Doc] On harvester nodes, do we add muitlple disks by “Add Disk” or do we add a Raided single disk?

I saw your scenario would be a VM-based workload. For most cases, throughput benefits the user experience. But if your VM is running a DB or some workload that needs high IOPS or low latency, that is another story. Would you mind sharing which workload you will run on the VMs? Thanks @icy-agency-38675 for reminding me of this topic.
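As a hedged illustration of the "replica number could be set to 2" point above, a Longhorn StorageClass might look like the following sketch (the class name and timeout are example values; `numberOfReplicas`, `staleReplicaTimeout`, and `fsType` are standard Longhorn StorageClass parameters):

```yaml
# Hypothetical StorageClass: 2 replicas, on the assumption that the
# underlying RAID already protects against single-disk failure.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-2-replicas
provisioner: driver.longhorn.io
allowVolumeExpansion: true
parameters:
  numberOfReplicas: "2"
  staleReplicaTimeout: "2880"   # minutes before a failed replica is cleaned up
  fsType: "ext4"
```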
f
Thanks @salmon-city-57654. OK, so we're evaluating Harvester vs vSphere to find a potential replacement for edge/small hypervisor use cases. The actual applications will be a mixed bag: some web servers, some databases, just general-purpose VM servers. I've read the 'Architecture and Concepts' page so many times, and I'm still trying to get my head around the consequences of RAID vs individual disks... So if I use RAID, it handles the striping of data, and from Longhorn's perspective it just puts the replicas for that node on "one" big disk. But if a disk in the RAID dies, Longhorn will suddenly be missing blocks of data for certain replicas/volumes, which will become degraded? Doesn't that mean that with striped data on a RAID disk, potentially all volumes on that node would be degraded and require a rebuild? Then k8s would migrate all my VMs to another node where the volumes were healthy? If I have multiple disks, does Longhorn spread volumes across the disks in any way? As in, hypothetically I have 3 disks and 3 VMs on a node, with a replica/volume on each disk. Then when I lose a disk I lose one replica/volume, and only one VM gets migrated off rather than all three?
Just rereading my question, I think I confused myself and got that wrong. What I mean is: with RAID in a degraded state, Longhorn isn't aware of the underlying disks, and will keep using the volume on the degraded RAID, potentially affecting performance. Whereas with individual disks, when one fails, Longhorn detects the volumes that are now degraded and automatically moves the VMs to a node with a healthy replica. Is that correct? So in a way RAID might give me better throughput, but if a disk fails the VMs will perform worse until I rebuild the RAID. Whereas with single disks, the mechanisms built into Longhorn/Harvester will just relocate VMs and volumes to a healthy node, with less interruption of service and less performance penalty?
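For reference on the multiple-individual-disks question above, this is a rough sketch of how several disks can be registered on a Longhorn Node object (the disk names and mount paths are hypothetical, and in practice this is usually done through the Longhorn or Harvester UI); Longhorn's scheduler then places replicas across the registered disks:

```yaml
# Hypothetical Longhorn Node object with three individually registered disks.
apiVersion: longhorn.io/v1beta2
kind: Node
metadata:
  name: node-1                  # must match the Kubernetes node name
  namespace: longhorn-system
spec:
  disks:
    disk-1:
      path: /mnt/disk1          # hypothetical mount points
      allowScheduling: true
    disk-2:
      path: /mnt/disk2
      allowScheduling: true
    disk-3:
      path: /mnt/disk3
      allowScheduling: true
```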
s
Hi @full-train-34126, some of your descriptions are correct; I will try to give some opinions. Hope you can get something from it.

Yep, the rebuild impact I mentioned is more like your second description. When the RAID is in a degraded state, it is still healthy from Longhorn's point of view, because the system can still perform normal operations on the exposed device, but there is a potential performance risk. Also, in my opinion, I would recommend multiple small RAID disks instead of one huge RAID disk (depending on your devices), because we then take less rebuild risk and still get some benefit from the RAID (meaning striping and throughput).

For the individual-disk configuration, IIRC, Longhorn (actually, the VM is controlled by KubeVirt) would not automatically move the VMs to a node with a healthy replica. The rebuild (Longhorn's) will occur, but the volume the VM uses is the exposed iSCSI volume, so a single replica failure does not make this volume stale. You do need to account for the performance degradation during the rebuild (we do not have a detailed test).

TL;DR: think about your hardware devices and find the balance between performance and recovery (meaning rebuild). There is no absolute answer for every case, only a suitable configuration for your application. If you split your applications by I/O pattern, using StorageClasses with various disks would be more flexible. Thanks!
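A rough sketch of that last "StorageClasses with various disks" idea, using Longhorn's `diskSelector` parameter to steer replicas to differently tagged disks (the class names and the `ssd`/`hdd` tags are hypothetical; the tags would first have to be assigned to disks on each node):

```yaml
# Hypothetical pair of classes for different I/O patterns.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-fast
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "3"
  diskSelector: "ssd"           # replicas only on disks tagged "ssd"
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-bulk
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "2"
  diskSelector: "hdd"           # throughput-oriented volumes on "hdd" disks
```

VM volumes that need low latency would then request `longhorn-fast`, while bulk or sequential volumes request `longhorn-bulk`.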
f
Apologies @salmon-city-57654, I didn't get a notification for your response. Thank you very much for taking the time to explain in such detail. That makes a lot of sense. Much appreciated.
🙌 1
s
@full-train-34126 NP! If you have other problems, feel free to ask in the Longhorn/Harvester channel.