Disk write latency with and without RAID

I have run 2 tests with different setup as explained below. I'm using Azure VM Standard D64s v3.

For the first setup I have mounted 4 Premium SSD disks of 4 TB each, and am performing IO operations with 1 thread writing to each disk. the FIO job is

[global]
size=30g
direct=1
iodepth=256
ioengine=libaio
bs=8k
numjobs=1

[writer1]
rw=randwrite
directory=/datadrive

[writer2]
rw=randwrite
directory=/datadrive2

[writer4]
rw=randwrite
directory=/datadrive4

[writer5]
rw=randwrite
directory=/datadrive5

The output is

writer1: (g=0): rw=randwrite, bs=(R) 8192B-8192B, (W) 8192B-8192B, (T) 8192B-8192B, ioengine=libaio, iodepth=256
writer2: (g=0): rw=randwrite, bs=(R) 8192B-8192B, (W) 8192B-8192B, (T) 8192B-8192B, ioengine=libaio, iodepth=256
writer4: (g=0): rw=randwrite, bs=(R) 8192B-8192B, (W) 8192B-8192B, (T) 8192B-8192B, ioengine=libaio, iodepth=256
writer5: (g=0): rw=randwrite, bs=(R) 8192B-8192B, (W) 8192B-8192B, (T) 8192B-8192B, ioengine=libaio, iodepth=256
fio-3.1
Starting 4 processes
writer1: Laying out IO file (1 file / 30720MiB)
writer2: Laying out IO file (1 file / 30720MiB)
writer4: Laying out IO file (1 file / 30720MiB)
writer5: Laying out IO file (1 file / 30720MiB)
Jobs: 1 (f=1): [w(1),_(3)][26.1%][r=0KiB/s,w=0KiB/s][r=0,w=0 IOPS][eta 02m:58s]
writer1: (groupid=0, jobs=1): err= 0: pid=58194: Tue Feb  2 13:12:50 2021
  write: IOPS=6041, BW=47.2MiB/s (49.5MB/s)(2978MiB/63096msec)
    slat (usec): min=3, max=118643, avg=26.05, stdev=350.12
    clat (usec): min=3, max=32369k, avg=42333.21, stdev=659122.27
     lat (usec): min=46, max=32369k, avg=42359.46, stdev=659122.00
    clat percentiles (usec):
     |  1.00th=[      76],  5.00th=[     108], 10.00th=[     174],
     | 20.00th=[    1254], 30.00th=[    3392], 40.00th=[    5014],
     | 50.00th=[    6521], 60.00th=[    8848], 70.00th=[   13042],
     | 80.00th=[   20055], 90.00th=[   30540], 95.00th=[   39584],
     | 99.00th=[  120062], 99.50th=[  851444], 99.90th=[ 7012877],
     | 99.95th=[16575890], 99.99th=[17112761]
   bw (  KiB/s): min=   10, max=153154, per=25.59%, avg=50460.17, stdev=63267.92, samples=121
   iops        : min=    1, max=19144, avg=6307.31, stdev=7908.42, samples=121
  lat (usec)   : 4=0.01%, 20=0.01%, 50=0.02%, 100=4.10%, 250=8.44%
  lat (usec)   : 500=3.32%, 750=1.71%, 1000=1.33%
  lat (msec)   : 2=4.04%, 4=10.16%, 10=30.03%, 20=16.68%, 50=17.47%
  lat (msec)   : 100=1.60%, 250=0.44%, 500=0.07%, 750=0.07%, 1000=0.06%
  lat (msec)   : 2000=0.11%, >=2000=0.35%
  cpu          : usr=2.82%, sys=11.92%, ctx=32608, majf=0, minf=15
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwt: total=0,381220,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=256
writer2: (groupid=0, jobs=1): err= 0: pid=58195: Tue Feb  2 13:12:50 2021
  write: IOPS=6476, BW=50.6MiB/s (53.1MB/s)(3080MiB/60882msec)
    slat (usec): min=3, max=160087, avg=29.46, stdev=435.82
    clat (usec): min=15, max=36835k, avg=39494.66, stdev=876736.78
     lat (usec): min=57, max=36835k, avg=39524.33, stdev=876736.58
    clat percentiles (usec):
     |  1.00th=[     176],  5.00th=[    1647], 10.00th=[    3097],
     | 20.00th=[    4686], 30.00th=[    5669], 40.00th=[    6849],
     | 50.00th=[    8586], 60.00th=[   11338], 70.00th=[   16057],
     | 80.00th=[   23987], 90.00th=[   34341], 95.00th=[   43779],
     | 99.00th=[   88605], 99.50th=[  130548], 99.90th=[ 1400898],
     | 99.95th=[17112761], 99.99th=[17112761]
   bw (  KiB/s): min=   10, max=155414, per=30.24%, avg=59631.85, stdev=67499.68, samples=106
   iops        : min=    1, max=19426, avg=7453.75, stdev=8437.28, samples=106
  lat (usec)   : 20=0.01%, 50=0.01%, 100=0.39%, 250=1.15%, 500=1.21%
  lat (usec)   : 750=0.55%, 1000=0.54%
  lat (msec)   : 2=1.70%, 4=9.61%, 10=40.86%, 20=19.14%, 50=21.81%
  lat (msec)   : 100=2.09%, 250=0.78%, 500=0.01%, 750=0.01%, 1000=0.03%
  lat (msec)   : 2000=0.04%, >=2000=0.09%
  cpu          : usr=2.84%, sys=12.67%, ctx=8384, majf=0, minf=15
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwt: total=0,394287,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=256
writer4: (groupid=0, jobs=1): err= 0: pid=58196: Tue Feb  2 13:12:50 2021
  write: IOPS=6522, BW=50.0MiB/s (53.4MB/s)(3110MiB/61022msec)
    slat (usec): min=3, max=291305, avg=20.22, stdev=486.94
    clat (nsec): min=1500, max=11376M, avg=39221241.22, stdev=371308305.44
     lat (usec): min=39, max=11376k, avg=39241.67, stdev=371308.58
    clat percentiles (usec):
     |  1.00th=[      53],  5.00th=[      67], 10.00th=[      77],
     | 20.00th=[      97], 30.00th=[     135], 40.00th=[     343],
     | 50.00th=[    1172], 60.00th=[    2442], 70.00th=[    4424],
     | 80.00th=[    7439], 90.00th=[   24249], 95.00th=[   36963],
     | 99.00th=[  943719], 99.50th=[ 2466251], 99.90th=[ 5939135],
     | 99.95th=[ 7079986], 99.99th=[11207181]
   bw (  KiB/s): min=   10, max=153250, per=26.70%, avg=52653.34, stdev=55397.61, samples=121
   iops        : min=    1, max=19156, avg=6581.43, stdev=6924.70, samples=121
  lat (usec)   : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.58%
  lat (usec)   : 100=20.60%, 250=16.69%, 500=4.34%, 750=3.05%, 1000=2.67%
  lat (msec)   : 2=9.20%, 4=10.93%, 10=15.50%, 20=4.87%, 50=9.05%
  lat (msec)   : 100=0.87%, 250=0.21%, 500=0.22%, 750=0.13%, 1000=0.11%
  lat (msec)   : 2000=0.36%, >=2000=0.61%
  cpu          : usr=3.07%, sys=12.42%, ctx=88453, majf=0, minf=1392
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwt: total=0,398034,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=256
writer5: (groupid=0, jobs=1): err= 0: pid=58197: Tue Feb  2 13:12:50 2021
  write: IOPS=6347, BW=49.6MiB/s (51.0MB/s)(2984MiB/60181msec)
    slat (usec): min=3, max=623424, avg=25.05, stdev=1061.43
    clat (usec): min=38, max=36942k, avg=40305.49, stdev=896283.77
     lat (usec): min=54, max=36942k, avg=40330.71, stdev=896284.15
    clat percentiles (usec):
     |  1.00th=[      81],  5.00th=[     293], 10.00th=[    2057],
     | 20.00th=[    3556], 30.00th=[    4686], 40.00th=[    5473],
     | 50.00th=[    6652], 60.00th=[    9110], 70.00th=[   12911],
     | 80.00th=[   25822], 90.00th=[   35390], 95.00th=[   45351],
     | 99.00th=[  102237], 99.50th=[  164627], 99.90th=[ 2004878],
     | 99.95th=[17112761], 99.99th=[17112761]
   bw (  KiB/s): min=   16, max=153843, per=31.67%, avg=62458.55, stdev=67494.99, samples=98
   iops        : min=    2, max=19230, avg=7807.10, stdev=8436.72, samples=98
  lat (usec)   : 50=0.01%, 100=2.27%, 250=2.52%, 500=1.31%, 750=0.92%
  lat (usec)   : 1000=0.57%
  lat (msec)   : 2=2.22%, 4=13.76%, 10=39.16%, 20=13.02%, 50=20.89%
  lat (msec)   : 100=2.29%, 250=0.76%, 500=0.04%, 750=0.09%, 1000=0.02%
  lat (msec)   : 2000=0.05%, >=2000=0.10%
  cpu          : usr=2.42%, sys=10.44%, ctx=13976, majf=0, minf=704
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwt: total=0,381975,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=256

In the second run, I created a RAID 0 array of 4 4 TB disks and mounted it. Now I run the following job

[global]
size=30g
direct=1
iodepth=256
ioengine=libaio
bs=8k
numjobs=1

[writer1]
rw=randwrite
directory=/datadrive9

[writer2]
rw=randwrite
directory=/datadrive9

[writer4]
rw=randwrite
directory=/datadrive9

[writer5]
rw=randwrite
directory=/datadrive9

Output is

writer1: (g=0): rw=randwrite, bs=(R) 8192B-8192B, (W) 8192B-8192B, (T) 8192B-8192B, ioengine=libaio, iodepth=256
writer2: (g=0): rw=randwrite, bs=(R) 8192B-8192B, (W) 8192B-8192B, (T) 8192B-8192B, ioengine=libaio, iodepth=256
writer4: (g=0): rw=randwrite, bs=(R) 8192B-8192B, (W) 8192B-8192B, (T) 8192B-8192B, ioengine=libaio, iodepth=256
writer5: (g=0): rw=randwrite, bs=(R) 8192B-8192B, (W) 8192B-8192B, (T) 8192B-8192B, ioengine=libaio, iodepth=256
fio-3.1
Starting 4 processes
writer2: Laying out IO file (1 file / 30720MiB)
writer4: Laying out IO file (1 file / 30720MiB)
writer5: Laying out IO file (1 file / 30720MiB)
Jobs: 4 (f=4): [w(4)][10.4%][r=0KiB/s,w=0KiB/s][r=0,w=0 IOPS][eta 09m:43s]
writer1: (groupid=0, jobs=1): err= 0: pid=22209: Tue Feb  2 13:53:40 2021
  write: IOPS=6011, BW=46.0MiB/s (49.2MB/s)(3210MiB/68338msec)
    slat (usec): min=4, max=19116, avg=10.13, stdev=52.29
    clat (usec): min=21, max=13267k, avg=42549.89, stdev=517341.32
     lat (usec): min=42, max=13267k, avg=42560.22, stdev=517341.29
    clat percentiles (usec):
     |  1.00th=[      69],  5.00th=[     103], 10.00th=[     223],
     | 20.00th=[     578], 30.00th=[     873], 40.00th=[    1303],
     | 50.00th=[    1680], 60.00th=[    2147], 70.00th=[    2737],
     | 80.00th=[    4490], 90.00th=[   15270], 95.00th=[   27919],
     | 99.00th=[  333448], 99.50th=[ 2038432], 99.90th=[ 9328133],
     | 99.95th=[11744052], 99.99th=[12549358]
   bw (  KiB/s): min=   16, max=347020, per=35.88%, avg=85557.71, stdev=105200.55, samples=77
   iops        : min=    2, max=43377, avg=10694.30, stdev=13150.00, samples=77
  lat (usec)   : 50=0.07%, 100=4.68%, 250=5.81%, 500=6.68%, 750=8.72%
  lat (usec)   : 1000=7.22%
  lat (msec)   : 2=23.63%, 4=21.15%, 10=9.07%, 20=4.43%, 50=6.50%
  lat (msec)   : 100=0.52%, 250=0.37%, 500=0.33%, 750=0.12%, 1000=0.10%
  lat (msec)   : 2000=0.10%, >=2000=0.50%
  cpu          : usr=2.32%, sys=7.07%, ctx=47306, majf=0, minf=699
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwt: total=0,410837,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=256
writer2: (groupid=0, jobs=1): err= 0: pid=22210: Tue Feb  2 13:53:40 2021
  write: IOPS=6393, BW=49.9MiB/s (52.4MB/s)(3412MiB/68310msec)
    slat (usec): min=3, max=325702, avg=31.82, stdev=630.62
    clat (usec): min=32, max=13313k, avg=40001.34, stdev=469309.55
     lat (usec): min=54, max=13313k, avg=40033.41, stdev=469313.55
    clat percentiles (usec):
     |  1.00th=[     100],  5.00th=[     196], 10.00th=[     453],
     | 20.00th=[    1045], 30.00th=[    1549], 40.00th=[    2180],
     | 50.00th=[    3195], 60.00th=[    4686], 70.00th=[    7242],
     | 80.00th=[   11863], 90.00th=[   24249], 95.00th=[   35390],
     | 99.00th=[  156238], 99.50th=[ 2105541], 99.90th=[ 9462350],
     | 99.95th=[ 9865004], 99.99th=[13086229]
   bw (  KiB/s): min=   16, max=243815, per=31.19%, avg=74378.34, stdev=68597.51, samples=94
   iops        : min=    2, max=30476, avg=9297.04, stdev=8574.66, samples=94
  lat (usec)   : 50=0.01%, 100=1.02%, 250=5.44%, 500=4.23%, 750=3.57%
  lat (usec)   : 1000=4.77%
  lat (msec)   : 2=18.39%, 4=18.58%, 10=20.83%, 20=10.44%, 50=10.46%
  lat (msec)   : 100=1.01%, 250=0.39%, 500=0.14%, 750=0.05%, 1000=0.05%
  lat (msec)   : 2000=0.12%, >=2000=0.51%
  cpu          : usr=3.29%, sys=16.53%, ctx=25760, majf=0, minf=705
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwt: total=0,436710,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=256
writer4: (groupid=0, jobs=1): err= 0: pid=22211: Tue Feb  2 13:53:40 2021
  write: IOPS=8253, BW=64.5MiB/s (67.6MB/s)(4405MiB/68309msec)
    slat (usec): min=3, max=984211, avg=29.89, stdev=1823.60
    clat (usec): min=39, max=13365k, avg=30969.61, stdev=406333.66
     lat (usec): min=60, max=13365k, avg=30999.67, stdev=406337.80
    clat percentiles (usec):
     |  1.00th=[      94],  5.00th=[     196], 10.00th=[     457],
     | 20.00th=[    1254], 30.00th=[    1860], 40.00th=[    2376],
     | 50.00th=[    2868], 60.00th=[    3425], 70.00th=[    4359],
     | 80.00th=[    6718], 90.00th=[   17957], 95.00th=[   28967],
     | 99.00th=[  154141], 99.50th=[  994051], 99.90th=[ 8355054],
     | 99.95th=[ 9328133], 99.99th=[13086229]
   bw (  KiB/s): min=   16, max=325328, per=39.05%, avg=93104.42, stdev=94687.86, samples=97
   iops        : min=    2, max=40666, avg=11637.75, stdev=11835.95, samples=97
  lat (usec)   : 50=0.01%, 100=1.36%, 250=5.02%, 500=4.31%, 750=3.65%
  lat (usec)   : 1000=2.97%
  lat (msec)   : 2=15.06%, 4=34.67%, 10=17.73%, 20=6.39%, 50=7.02%
  lat (msec)   : 100=0.64%, 250=0.33%, 500=0.17%, 750=0.07%, 1000=0.11%
  lat (msec)   : 2000=0.14%, >=2000=0.34%
  cpu          : usr=3.06%, sys=17.32%, ctx=27740, majf=0, minf=700
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwt: total=0,563804,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=256
writer5: (groupid=0, jobs=1): err= 0: pid=22212: Tue Feb  2 13:53:40 2021
  write: IOPS=9155, BW=71.5MiB/s (75.0MB/s)(4886MiB/68308msec)
    slat (usec): min=3, max=1253.7k, avg=33.07, stdev=2124.81
    clat (usec): min=41, max=13312k, avg=27912.10, stdev=371043.54
     lat (usec): min=56, max=13312k, avg=27945.40, stdev=371050.01
    clat percentiles (usec):
     |  1.00th=[     116],  5.00th=[     302], 10.00th=[     619],
     | 20.00th=[    1680], 30.00th=[    2769], 40.00th=[    3851],
     | 50.00th=[    4752], 60.00th=[    5997], 70.00th=[    7963],
     | 80.00th=[   11076], 90.00th=[   21627], 95.00th=[   33817],
     | 99.00th=[  164627], 99.50th=[  497026], 99.90th=[ 7616857],
     | 99.95th=[ 9596568], 99.99th=[13086229]
   bw (  KiB/s): min=   16, max=363341, per=36.27%, avg=86484.03, stdev=82997.59, samples=116
   iops        : min=    2, max=45417, avg=10810.06, stdev=10374.71, samples=116
  lat (usec)   : 50=0.01%, 100=0.62%, 250=3.40%, 500=4.25%, 750=3.26%
  lat (usec)   : 1000=2.37%
  lat (msec)   : 2=8.40%, 4=19.23%, 10=35.58%, 20=11.98%, 50=8.25%
  lat (msec)   : 100=1.27%, 250=0.64%, 500=0.24%, 750=0.06%, 1000=0.07%
  lat (msec)   : 2000=0.16%, >=2000=0.21%
  cpu          : usr=3.77%, sys=21.57%, ctx=26577, majf=0, minf=17
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwt: total=0,625402,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=256

Run status group 0 (all jobs):
  WRITE: bw=233MiB/s (244MB/s), 46.0MiB/s-71.5MiB/s (49.2MB/s-75.0MB/s), io=15.5GiB (16.7GB), run=68308-68338msec

Disk stats (read/write):
    md0: ios=0/2036771, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=0/509201, aggrmerge=0/0, aggrticks=0/14988260, aggrin_queue=14611762, aggrutil=76.96%
  sdm: ios=0/508620, merge=0/0, ticks=0/16830518, in_queue=16437572, util=75.02%
  sdl: ios=0/508786, merge=0/0, ticks=0/13055450, in_queue=12693024, util=76.96%
  sdk: ios=0/508787, merge=0/0, ticks=0/14998518, in_queue=14628168, util=75.91%
  sdc: ios=0/510612, merge=0/0, ticks=0/15068554, in_queue=14688284, util=74.71%

I am noticing difference in latency distribution in these two runs. In non-RAID setup( first run), the latencies 4=10.16%, 10=30.03%, 20=16.68%, 50=17.47%

and in second run - RAID setup

2=15.06%, 4=34.67%, 10=17.73%, 20=6.39%, 50=7.02%

Noticing that distribution is higher with lower latencies with the RAID setup.

Why does this difference exist?

Sorry for the long question. I'm hoping somebody with knowledge how disk writes work can help with this question.


Solution 1:

You inject extra layer into storage stack and it brings extra processing latency. TL;DR: Nothing unexpected. If you’ll do queue_depth=1 you’ll get a better understanding of how much you lose.