# longhorn-storage
i
Do you want to use v1 volumes or try v2 volumes?
c
I installed Longhorn through the Helm chart and enabled v2. However, the Longhorn UI displays the v1 version (I think it is a UI bug), which makes me unsure whether it's actually v1 or v2. Yesterday, I ran performance tests and found that Longhorn's performance was hardly affected, so I assume it's using v2. @icy-agency-38675
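For reference, a minimal sketch of enabling the v2 data engine, assuming the chart exposes the corresponding default setting (key and setting names should be verified against the installed version):
# Helm values sketch (assumed key name)
defaultSettings:
  v2DataEngine: true
# or toggle the setting on a running install (assumed setting name)
kubectl -n longhorn-system patch settings.longhorn.io v2-data-engine --type merge -p '{"value": "true"}'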
i
Can you provide a support bundle?
c
Sure. BTW, I would like to ask: our business often involves disk expansion. Assuming the physical disks of machine A are already full, what should I do? Can you quickly clone a copy of a PVC from A onto machine B, assuming machine B has a larger disk? We plan to use v2 in the production system, deploying a large number of blockchain nodes and designing for storage growth. Because we are worried about PVC fragmentation, we plan to use RAID 0 and set replicas to 1. Do you have any good suggestions for this?
i
Can you quickly clone a copy of a PVC from A onto machine B, assuming machine B has a larger disk?
Currently, no.
BTW, I would like to ask: our business often involves disk expansion. Assuming the physical disks of machine A are already full, what should I do?
What's your disk? An LVM disk?
We plan to use V2 in the production system, deploying a large number of blockchain nodes and designing for storage growth.
Currently, the v2 volume is an alpha version for testing, so please do not use it in a production system.
Because we are worried about PVC fragmentation, we plan to use RAID 0 and set replicas to 1.
Can you elaborate more on the PVC fragmentation?
c
I conducted performance testing yesterday. The first run was on the operating system disk, the second on a Longhorn v1.5.4 volume (v1 engine), and the third on a Longhorn v1.6.0 volume (v2 engine).
# bare disk on physical machine
 fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=50G --readwrite=randrw --rwmixread=75
test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.28
Starting 1 process
test: Laying out IO file (1 file / 51200MiB)
Jobs: 1 (f=1): [m(1)][100.0%][r=480MiB/s,w=159MiB/s][r=123k,w=40.7k IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=634423: Wed Feb 28 05:40:15 2024
  read: IOPS=120k, BW=469MiB/s (492MB/s)(37.5GiB/81838msec)
   bw (  KiB/s): min=452280, max=500440, per=100.00%, avg=480917.99, stdev=10057.50, samples=163
   iops        : min=113070, max=125110, avg=120229.50, stdev=2514.35, samples=163
  write: IOPS=40.0k, BW=156MiB/s (164MB/s)(12.5GiB/81838msec); 0 zone resets
   bw (  KiB/s): min=150824, max=166472, per=100.00%, avg=160266.85, stdev=3351.87, samples=163
   iops        : min=37706, max=41618, avg=40066.72, stdev=837.96, samples=163
  cpu          : usr=10.71%, sys=44.15%, ctx=4542828, majf=0, minf=8
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=9830837,3276363,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=469MiB/s (492MB/s), 469MiB/s-469MiB/s (492MB/s-492MB/s), io=37.5GiB (40.3GB), run=81838-81838msec
  WRITE: bw=156MiB/s (164MB/s), 156MiB/s-156MiB/s (164MB/s-164MB/s), io=12.5GiB (13.4GB), run=81838-81838msec

Disk stats (read/write):
  nvme0n1: ios=9808669/3269628, merge=0/257, ticks=2715615/1742618, in_queue=4458234, util=99.96%
# fio in pod (Longhorn v1.5.4, v1 engine)
fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=50G --readwrite=randrw --rwmixread=75
test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.28
Starting 1 process
test: Laying out IO file (1 file / 51200MiB)
Jobs: 1 (f=1): [m(1)][100.0%][r=123MiB/s,w=41.5MiB/s][r=31.6k,w=10.6k IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=68: Wed Feb 28 05:42:58 2024
  read: IOPS=31.3k, BW=122MiB/s (128MB/s)(37.5GiB/314319msec)
   bw (  KiB/s): min=104320, max=151968, per=100.00%, avg=125155.71, stdev=9316.09, samples=628
   iops        : min=26080, max=37992, avg=31288.89, stdev=2329.01, samples=628
  write: IOPS=10.4k, BW=40.7MiB/s (42.7MB/s)(12.5GiB/314319msec); 0 zone resets
   bw (  KiB/s): min=35104, max=50208, per=100.00%, avg=41711.13, stdev=3125.74, samples=628
   iops        : min= 8776, max=12552, avg=10427.74, stdev=781.43, samples=628
  cpu          : usr=5.96%, sys=26.34%, ctx=12104184, majf=0, minf=8
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=9830837,3276363,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=122MiB/s (128MB/s), 122MiB/s-122MiB/s (128MB/s-128MB/s), io=37.5GiB (40.3GB), run=314319-314319msec
  WRITE: bw=40.7MiB/s (42.7MB/s), 40.7MiB/s-40.7MiB/s (42.7MB/s-42.7MB/s), io=12.5GiB (13.4GB), run=314319-314319msec

Disk stats (read/write):
  sda: ios=9819399/3273370, merge=3273/463, ticks=15022532/4955509, in_queue=19978041, util=100.00%
# fio in pod (Longhorn v1.6.0, v2 engine)
root@fio-test-pod:/# fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=50G --readwrite=randrw --rwmixread=75
test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.28
Starting 1 process
test: Laying out IO file (1 file / 51200MiB)

Jobs: 1 (f=1): [m(1)][100.0%][r=490MiB/s,w=164MiB/s][r=125k,w=42.0k IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=78: Wed Feb 28 08:32:26 2024
  read: IOPS=119k, BW=465MiB/s (488MB/s)(37.5GiB/82504msec)
   bw (  KiB/s): min=436960, max=507144, per=100.00%, avg=476690.24, stdev=13328.70, samples=164
   iops        : min=109240, max=126786, avg=119172.60, stdev=3332.15, samples=164
  write: IOPS=39.7k, BW=155MiB/s (163MB/s)(12.5GiB/82504msec); 0 zone resets
   bw (  KiB/s): min=146000, max=169992, per=100.00%, avg=158864.34, stdev=4360.70, samples=164
   iops        : min=36500, max=42498, avg=39716.10, stdev=1090.17, samples=164
  cpu          : usr=9.80%, sys=43.71%, ctx=4394736, majf=0, minf=6
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=9830837,3276363,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=465MiB/s (488MB/s), 465MiB/s-465MiB/s (488MB/s-488MB/s), io=37.5GiB (40.3GB), run=82504-82504msec
  WRITE: bw=155MiB/s (163MB/s), 155MiB/s-155MiB/s (163MB/s-163MB/s), io=12.5GiB (13.4GB), run=82504-82504msec
i
Looks good!
c
I haven't decided which approach to use yet. Perhaps LVM is a good option, or maybe you have a better suggestion? The issue with PVC fragmentation is as follows: suppose I add two disks as block devices, each 3.5 TB. I cannot directly request a volume larger than 3.5 TB, right? Because it cannot be split across the two disks. Yesterday, I tested the filesystem approach, and indeed it behaves like this. This means that as my pod's data keeps growing, even though I can see plenty of disk space in the UI, I cannot expand the volume. This is also why I want to set up RAID 0, to treat the disk capacity as a whole. Thank you for your patient explanation.
i
The data engine of the volume pvc-d870fe3c-e330-4ce6-a1cc-056e95d1bc10 is v1:
    name: pvc-d870fe3c-e330-4ce6-a1cc-056e95d1bc10
    namespace: longhorn-system
    resourceVersion: "495815"
    uid: 53f07169-334f-43de-852d-5b5d5523701c
  spec:
    Standby: false
    accessMode: rwo
    backendStoreDriver: "null"
    backingImage: "null"
    backupCompressionMethod: lz4
    dataEngine: v1
You need to create a StorageClass for v2 volumes; see longhorn.io/docs/1.6.0/v2-data-engine/quick-start/#create-a-storageclass (a minimal example is sketched below).
👍 1
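For reference, a minimal StorageClass sketch for v2 volumes based on the quick-start doc above (the name, replica count, and fsType here are assumptions; adjust to your setup):
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: longhorn-v2          # assumed name
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "1"      # assumption: matches the replicas-1 plan discussed above
  staleReplicaTimeout: "2880"
  fsType: "ext4"
  dataEngine: "v2"
PVCs that set storageClassName: longhorn-v2 should then be provisioned with dataEngine: v2.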
I haven't decided which approach to use yet. Perhaps LVM is a good option, or maybe you have a better suggestion?
LVM might be a good option for adjusting the size of the underlying disk. If the disk is full, you can extend it easily. (This really depends on your plan for the underlying disks; see the LVM sketch after this message.) If you really need to expand the disks, you can put the node into maintenance mode: longhorn.io/docs/1.6.0/maintenance/maintenance/
The issue with PVC fragmentation is as follows: suppose I add two disks as block devices, each 3.5 TB. I cannot directly request a volume larger than 3.5 TB, right?
Yes
This is also why I want to set up RAID 0, to treat disk capacity as a whole for utilization.
Thank you for your patient explanation.
Yes, you're right. Using RAID 0 is good for your use case.
👍 1
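A rough sketch of the LVM-based expansion idea above, assuming the Longhorn disk sits on an LVM logical volume (volume group, LV, and device names are hypothetical):
pvcreate /dev/sdb                                       # new physical disk
vgextend vg-longhorn /dev/sdb                           # add it to the volume group
lvextend -l +100%FREE /dev/vg-longhorn/longhorn-disk    # grow the logical volume
resize2fs /dev/vg-longhorn/longhorn-disk                # only for a v1 filesystem disk: grow the filesystem on top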
c
Thank you for your answer; it's been very helpful. I'll go ahead and run the tests now. I noticed that Longhorn says not to use v2 in production systems, but the performance improvement is quite significant. Could you tell me what I need to be aware of if I do use the v2 engine in production? @icy-agency-38675
i
You can check the roadmap, but it might be updated over time according to our development progress.
🙏 1
c
Hi @icy-agency-38675, I just tried adding LVM block devices to Longhorn, but it failed, reporting that the block device could not be added. Then I created a RAID 0 array and added /dev/md0 in the Longhorn UI, and it was recognized. Does this mean Longhorn v2 doesn't support LVM? However, a new issue emerged: kubelet is reporting mount timeouts, and there are no error events on the PV or PVC.
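For context, a minimal sketch of creating such a RAID 0 array with mdadm (device names are hypothetical; adjust to your disks):
mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/nvme1n1 /dev/nvme2n1   # stripe two disks into one array
mdadm --detail --scan >> /etc/mdadm/mdadm.conf    # persist so it reassembles on boot (config path varies by distro)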
i
Can you check the nvme_tcp and uio drivers on the worker nodes?
lsmod | grep nvme
lsmod | grep uio
c
They all existed
i
nvme_tcp and uio_pci_generic are missing. Please run these commands on all worker nodes:
modprobe nvme-tcp
modprobe uio_pci_generic
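To keep these modules loaded across reboots, one option on systemd-based distros is a modules-load.d entry (the file name is arbitrary):
cat <<EOF > /etc/modules-load.d/longhorn-v2.conf
nvme-tcp
uio_pci_generic
EOF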
c
Sorry @icy-agency-38675 :( My DevOps script had an issue. When I try an LVM block device, I get this error in the Longhorn UI: admission webhook "validator.longhorn.io" denied the request: disk disk-3 type block is not supported to reserve storage. Figure 3 shows the benchmark I ran. May I ask whether it's possible to use LVM with the Longhorn v2 engine?
i
Yeah, a disk from LVM is OK.
c
I tried again, but it still doesn't work. Can you help me see where the problem is? @icy-agency-38675
i
Can you generate a support bundle? I will check it later.
c
Oh, I've found the issue. By default, /dev/mapper/xxx is a symbolic link that points to /dev/dm-0. When I entered that device path in Longhorn, it was recognized correctly. I've updated the benchmark, and it seems there's some decline in performance. Are there any optimization parameters that can be adjusted? All of my machines have 10 Gbps NICs. Longhorn is cool. Anyway, thank you very much for taking the time to help me.
🙌 1
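For reference, one way to check which device node a device-mapper symlink resolves to (the LV name below is hypothetical):
ls -l /dev/mapper/
readlink -f /dev/mapper/vg--longhorn-lv--disk    # resolves to e.g. /dev/dm-0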
i
You're welcome. Feel free to let us know if there's anything we can help with.
👍 1
c
Hi @icy-agency-38675, we are a company specializing in blockchain solutions, and we are currently discussing whether to incorporate Longhorn into our production services. Currently, we are using RKE2, Rancher, and Longhorn to manage over a dozen machines running more than 30 different public blockchains, with a total data size of nearly 100 TB. I would like to inquire whether Longhorn has enterprise support available, even if it is on a paid basis. We are particularly concerned about stability since we are not a company specialized in storage technologies, especially considering that we are directly using the v2 engine. Thank you for your assistance.
i
@cool-architect-86201 Thanks for asking. I think @salmon-doctor-9726 will help answer this question later 🙂
c
Hi @icy-agency-38675, because v2 doesn't support expansion, I have switched to the v1 engine. When I try to add /dev/md0, it tells me that SPDK is not installed. Does the v1 engine not support block devices?
i
Hello, the v1 engine doesn't support adding block devices.
👌 1
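Since the v1 engine only takes filesystem disks, one possible workaround is to format and mount the array and add the mount point as a node disk (paths are hypothetical):
mkfs.ext4 /dev/md0
mkdir -p /var/lib/longhorn-disk1
mount /dev/md0 /var/lib/longhorn-disk1            # add to /etc/fstab to persist across reboots
# then add /var/lib/longhorn-disk1 as a filesystem disk on the node in the Longhorn UI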