HP DL380 G7 + Smart Array P410i + sysbench -> poor RAID 10 performance
I have a running system with low IO utilization:
- HP DL380 G7 (24 GB RAM)
- Smart Array P410i with 512 MB battery-backed write cache
- 6x 146 GB 10k RPM SAS drives in RAID 10
- Debian Squeeze, ext4 + LVM, hpacucli installed
iostat (cciss/c0d1 = RAID 10 array, dm-7 = 60 GB LVM partition used for the test):
Device:         rrqm/s  wrqm/s    r/s     w/s    rMB/s   wMB/s avgrq-sz avgqu-sz   await  svctm  %util
cciss/c0d0        0,00  101,20   0,00    6,20     0,00    0,42   138,58     0,00    0,00   0,00   0,00
cciss/c0d1        0,00  395,20   3,20  130,20     0,18    2,05    34,29     0,04    0,26   0,16   2,08
dm-0              0,00    0,00   0,00    0,00     0,00    0,00     0,00     0,00    0,00   0,00   0,00
dm-2              0,00    0,00   3,20  391,00     0,18    1,53     8,87     0,04    0,11   0,05   1,84
dm-3              0,00    0,00   0,00    0,00     0,00    0,00     0,00     0,00    0,00   0,00   0,00
dm-4              0,00    0,00   0,00  106,80     0,00    0,42     8,00     0,00    0,00   0,00   0,00
dm-5              0,00    0,00   0,00    0,60     0,00    0,00     8,00     0,00    0,00   0,00   0,00
dm-6              0,00    0,00   0,00    2,80     0,00    0,01     8,00     0,00    0,00   0,00   0,00
dm-1              0,00    0,00   0,00  132,00     0,00    0,52     8,00     0,00    0,02   0,01   0,16
dm-7              0,00    0,00   0,00    0,00     0,00    0,00     0,00     0,00    0,00   0,00   0,00
dm-8              0,00    0,00   0,00    0,00     0,00    0,00     0,00     0,00    0,00   0,00   0,00
hpacucli "ctrl all show config"
Smart Array P410i in Slot 0 (Embedded)    (sn: 5001438011FF14E0)

   array A (SAS, Unused Space: 0 MB)
      logicaldrive 1 (136.7 GB, RAID 1, OK)
      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 146 GB, OK)
      physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 146 GB, OK)

   array B (SAS, Unused Space: 0 MB)
      logicaldrive 2 (410.1 GB, RAID 1+0, OK)
      physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 146 GB, OK)
      physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SAS, 146 GB, OK)
      physicaldrive 2I:1:5 (port 2I:box 1:bay 5, SAS, 146 GB, OK)
      physicaldrive 2I:1:6 (port 2I:box 1:bay 6, SAS, 146 GB, OK)
      physicaldrive 2I:1:7 (port 2I:box 1:bay 7, SAS, 146 GB, OK)
      physicaldrive 2I:1:8 (port 2I:box 1:bay 8, SAS, 146 GB, OK)

   SEP (Vendor ID PMCSIERA, Model SRC 8x6G) 250 (WWID: 5001438011FF14EF)
hpacucli "ctrl all show status"
Smart Array P410i in Slot 0 (Embedded)
   Controller Status: OK
   Cache Status: OK
   Battery/Capacitor Status: OK
Sysbench command
sysbench --init-rng=on --test=fileio --num-threads=16 --file-num=128 --file-block-size=4K --file-total-size=54G --file-test-mode=rndrd --file-fsync-freq=0 --file-fsync-end=off run --max-requests=30000
Sysbench results
sysbench 0.4.12: multi-threaded system evaluation benchmark
Running the test with following options:
Number of threads: 16
Initializing random number generator from timer.
Extra file open flags: 0
128 files, 432Mb each
54Gb total file size
Block size 4Kb
Number of random requests for random IO: 30000
Read/Write ratio for combined random IO test: 1.50
Using synchronous I/O mode
Doing random read test
Threads started!
Done.
Operations performed: 30000 Read, 0 Write, 0 Other = 30000 Total
Read 117.19Mb Written 0b Total transferred 117.19Mb (935.71Kb/sec)
233.93 Requests/sec executed
Test execution summary:
total time: 128.2455s
total number of events: 30000
total time taken by event execution: 2051.5525
per-request statistics:
min: 0.00ms
avg: 68.39ms
max: 2010.15ms
approx. 95 percentile: 660.40ms
Threads fairness:
events (avg/stddev): 1875.0000/111.75
execution time (avg/stddev): 128.2220/0.02
iostat during test
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0,00    0,01    0,10   31,03    0,00   68,86

Device:         rrqm/s  wrqm/s     r/s     w/s    rMB/s   wMB/s avgrq-sz avgqu-sz   await  svctm  %util
cciss/c0d0        0,00    0,10    0,00    0,60     0,00    0,00     9,33     0,00    0,00   0,00   0,00
cciss/c0d1        0,00   46,30  208,50    1,30     0,82    0,10     8,99    29,03  119,75   4,77 100,00
dm-0              0,00    0,00    0,00    0,00     0,00    0,00     0,00     0,00    0,00   0,00   0,00
dm-2              0,00    0,00    0,00   51,60     0,00    0,20     8,00    49,72  877,26  19,38 100,00
dm-3              0,00    0,00    0,00    0,00     0,00    0,00     0,00     0,00    0,00   0,00   0,00
dm-4              0,00    0,00    0,00    0,70     0,00    0,00     8,00     0,00    0,00   0,00   0,00
dm-5              0,00    0,00    0,00    0,00     0,00    0,00     0,00     0,00    0,00   0,00   0,00
dm-6              0,00    0,00    0,00    0,00     0,00    0,00     0,00     7,00    0,00   0,00 100,00
dm-1              0,00    0,00    0,00    0,00     0,00    0,00     0,00     7,00    0,00   0,00 100,00
dm-7              0,00    0,00  208,50    0,00     0,82    0,00     8,04    25,00   75,29   4,80 100,00
dm-8              0,00    0,00    0,00    0,00     0,00    0,00     0,00     0,00    0,00   0,00   0,00
Bonnie++ v1.96
cmd: /usr/sbin/bonnie++ -c 16 -n 0

Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency  16     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
seo-db       48304M   819  99 188274  17 98395   8  2652  78 201280   8 265.2   1
Latency             14899us     726ms   15194ms     100ms     122ms     665ms

1.96,1.96,seo-db,16,1337541936,48304M,,819,99,188274,17,98395,8,2652,78,201280,8,265.2,1,,,,,,,,,,,,,,,,,,14899us,726ms,15194ms,100ms,122ms,665ms,,,,,,
Questions
So, sysbench showed 234 random reads per second.
I expect it to be at least 400.
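(Rough arithmetic behind that expectation: a 10k RPM SAS drive is usually good for somewhere around 130-150 random read IOPS, and RAID 1+0 can serve reads from all 6 spindles, so roughly 6 x 140 ≈ 800 IOPS should be the ceiling. 400 already leaves a lot of margin.)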
What could the bottleneck be? LVM?
Another system with an mdadm RAID 1 of 2x 7200 RPM drives manages over 200 random reads per second...
Thanks for any help!
Your system is definitely underperforming for its hardware specification. I loaded the sysbench utility on a couple of idle HP ProLiant DL380 G6/G7 servers running CentOS 5/6 to check their performance. Those servers use normal fixed partitions rather than LVM (I don't typically use LVM, since the HP Smart Array controllers already provide the flexibility I need).
The DL380 G6 has a 6-disk RAID 1+0 array on a Smart Array P410 controller with 512MB of battery-backed cache. The DL380 G7 has a 2-disk enterprise SLC SSD array. The filesystems are XFS. I used the same sysbench command line as you did:
sysbench --init-rng=on --test=fileio --num-threads=16 --file-num=128 --file-block-size=4K --file-total-size=54G --file-test-mode=rndrd --file-fsync-freq=0 --file-fsync-end=off --max-requests=30000 run
My result was 1595 random reads per second across 6 disks.
On the SSD array, the result was 39047 random reads per second. Full results are at the end of this post...
As for your setup, the first thing that jumps out at me is the size of your test partition: you're nearly filling the 60 GB partition with 54 GB of test files. I'm not sure whether ext4 has trouble performing at 90%+ utilization, but that's the quickest thing for you to modify and retest (either grow the partition or use a smaller set of test data).
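For reference, a smaller retest could look something like this (16G is just an illustrative size, and the mount point is a placeholder for wherever your dm-7 test volume lives; sysbench 0.4 needs a prepare step to create the test files and a cleanup step to remove them):

cd /path/to/test/lv
sysbench --test=fileio --file-num=128 --file-total-size=16G prepare
sysbench --init-rng=on --test=fileio --num-threads=16 --file-num=128 --file-block-size=4K --file-total-size=16G --file-test-mode=rndrd --file-fsync-freq=0 --file-fsync-end=off --max-requests=30000 run
sysbench --test=fileio --file-num=128 --file-total-size=16G cleanup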
Even with LVM, there are some tuning options available for this controller/disk setup. Checking the read-ahead value and changing the I/O scheduler from the default cfq to deadline or noop are both helpful; see the sketch below. Please see the question and answers at: Linux - real-world hardware RAID controller tuning (scsi and cciss)
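A minimal sketch (the device name is taken from your iostat output; note that cciss devices appear under /sys/block with the slash replaced by '!', and 4096 sectors is only an example read-ahead value):

blockdev --getra /dev/cciss/c0d1          # current read-ahead, in 512-byte sectors
blockdev --setra 4096 /dev/cciss/c0d1     # raise the read-ahead
cat /sys/block/'cciss!c0d1'/queue/scheduler           # show available and current scheduler
echo deadline > /sys/block/'cciss!c0d1'/queue/scheduler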
What is your RAID controller cache ratio? I typically use a 75%/25% write/read balance; you can check and change it with hpacucli, as sketched below. Rerunning the benchmark afterwards should be quick: the 6-disk array here completed it in 18 seconds, while yours took over 2 minutes.
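A sketch, assuming the controller is in slot 0 as in your output (I believe the ratio argument is expressed as read/write, so 25/75 corresponds to the 75% write / 25% read balance above):

hpacucli ctrl slot=0 show | grep -i 'cache ratio'
hpacucli ctrl slot=0 modify cacheratio=25/75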
Can you run a bonnie++ or iozone test on the partition/array in question? It would be helpful to see if there are any other bottlenecks on the system. I wasn't familiar with sysbench, but I think these other tools will give you a better overview of the system's capabilities.
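If it helps, the invocations could be as simple as the following (the mount point is a placeholder for your test volume; -I asks iozone to use O_DIRECT so the page cache doesn't mask the disks):

/usr/sbin/bonnie++ -d /path/to/test/lv -c 16 -n 0
iozone -I -i 0 -i 1 -i 2 -r 4k -s 1g -f /path/to/test/lv/iozone.tmp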
Filesystem mount options may make a small difference, but I think the problem could be deeper than that...
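If you do experiment with mount options, noatime is the usual low-risk starting point, and barrier=0 is sometimes used on ext4 when a battery-backed write cache is present (again, the path is just a placeholder):

mount -o remount,noatime /path/to/test/lv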
hpacucli output...
Smart Array P410i in Slot 0 (Embedded) (sn: 50123456789ABCDE)
array A (SAS, Unused Space: 0 MB)
logicaldrive 1 (838.1 GB, RAID 1+0, OK)
physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 300 GB, OK)
physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 300 GB, OK)
physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 300 GB, OK)
physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SAS, 300 GB, OK)
physicaldrive 2I:1:5 (port 2I:box 1:bay 5, SAS, 300 GB, OK)
physicaldrive 2I:1:6 (port 2I:box 1:bay 6, SAS, 300 GB, OK)
SEP (Vendor ID PMCSIERA, Model SRC 8x6G) 250 (WWID: 50123456789ABCED)
sysbench DL380 G6 6-disk results...
sysbench 0.4.12: multi-threaded system evaluation benchmark
Running the test with following options:
Number of threads: 16
Initializing random number generator from timer.
Extra file open flags: 0
128 files, 432Mb each
54Gb total file size
Block size 4Kb
Number of random requests for random IO: 30000
Read/Write ratio for combined random IO test: 1.50
Using synchronous I/O mode
Doing random read test
Threads started!
Done.
Operations performed: 30001 Read, 0 Write, 0 Other = 30001 Total
Read 117.19Mb Written 0b Total transferred 117.19Mb (6.2292Mb/sec)
1594.67 Requests/sec executed
Test execution summary:
total time: 18.8133s
total number of events: 30001
total time taken by event execution: 300.7545
per-request statistics:
min: 0.00ms
avg: 10.02ms
max: 277.41ms
approx. 95 percentile: 25.58ms
Threads fairness:
events (avg/stddev): 1875.0625/41.46
execution time (avg/stddev): 18.7972/0.01
sysbench DL380 G7 SSD results...
sysbench 0.4.12: multi-threaded system evaluation benchmark
Running the test with following options:
Number of threads: 16
Initializing random number generator from timer.
Extra file open flags: 0
128 files, 432Mb each
54Gb total file size
Block size 4Kb
Number of random requests for random IO: 30000
Read/Write ratio for combined random IO test: 1.50
Using synchronous I/O mode
Doing random read test
Threads started!
Done.
Operations performed: 30038 Read, 0 Write, 0 Other = 30038 Total
Read 117.34Mb Written 0b Total transferred 117.34Mb (152.53Mb/sec)
39046.89 Requests/sec executed
Test execution summary:
total time: 0.7693s
total number of events: 30038
total time taken by event execution: 12.2631
per-request statistics:
min: 0.00ms
avg: 0.41ms
max: 1.89ms
approx. 95 percentile: 0.57ms
Threads fairness:
events (avg/stddev): 1877.3750/15.59
execution time (avg/stddev): 0.7664/0.00