Linux md vs. LVM performance
I am trying to tune my NAS, running Openfiler, and wondering why I'm getting relatively poor read performance from 4 WD RE3 drives in RAID 5.
EDIT: Please note I am talking about the buffered disk read speed, not cached speeds.
EDIT: Changed formatting to make clear there are two sets of output.
When I run hdparm on the md device I get the level of performance I'd expect, but when I drop down to the LVM volume it's a third of the speed!
Anyone have any idea why? Is LVM really that bad?
Dean
md device /dev/md0 results
[root@nas2 etc]# hdparm -tT /dev/md0
/dev/md0:
Timing cached reads: 4636 MB in 2.00 seconds = 2318.96 MB/sec
Timing buffered disk reads: 524 MB in 3.01 seconds = 174.04 MB/sec
LVM volume /dev/mapper/vg1-vol1 results
[root@nas2 etc]# hdparm -tT /dev/mapper/vg1-vol1
/dev/mapper/vg1-vol1:
Timing cached reads: 4640 MB in 2.00 seconds = 2320.28 MB/sec
Timing buffered disk reads: 200 MB in 3.01 seconds = 66.43 MB/sec
EDIT: See the section from the hdparm man page below, which suggests this is a perfectly valid test for sequential read performance, which is the issue I am trying to resolve.
-t Perform timings of device reads for benchmark and comparison purposes. For meaningful results, this operation should be repeated 2-3 times on an otherwise inactive system (no other active processes) with at least a couple of megabytes of free memory. This displays the speed of reading through the buffer cache to the disk without any prior caching of data. This measurement is an indication of how fast the drive can sustain sequential data reads under Linux, without any filesystem overhead. To ensure accurate measurements, the buffer cache is flushed during the processing of -t using the BLKFLSBUF ioctl. If the -T flag is also specified, then a correction factor based on the outcome of -T will be incorporated into the result reported for the -t operation.
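(For what it's worth, a quick loop like the following, using the device paths above, is an easy way to repeat the test two or three times as the man page recommends:)

# Repeat the comparison a few times on an otherwise idle system
for i in 1 2 3; do
    hdparm -tT /dev/md0
    hdparm -tT /dev/mapper/vg1-vol1
done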
Solution 1:
The default readahead settings for LVM are really pessimistic. Try

blockdev --setra 8192 /dev/vg1/vol1

and see what that bumps your LVM performance up to. You will always take a performance hit using LVM; we measure it on properly configured systems at about 10% of the underlying block device's performance.
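If that helps, something along these lines (device paths taken from the question; the lvchange step assumes a reasonably recent LVM2, so check your version's man page) will show the current values and make the new readahead stick across reboots:

# Check current readahead (in 512-byte sectors) on the md device and the LV
blockdev --getra /dev/md0
blockdev --getra /dev/vg1/vol1

# Bump the LV's readahead to 8192 sectors (4 MB) on the running system
blockdev --setra 8192 /dev/vg1/vol1

# Optionally record the value in the LVM metadata so it persists across reboots
lvchange --readahead 8192 /dev/vg1/vol1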
Solution 2:
I don't have a good explanation, but I can confirm the results.
Testing of RAID (raid5, 4x1.5TB drives)
root@enterprise:# hdparm -tT /dev/md2
/dev/md2:
Timing cached reads: 2130 MB in 2.00 seconds = 1065.81 MB/sec
Timing buffered disk reads: 358 MB in 3.00 seconds = 119.15 MB/sec
root@enterprise:# hdparm -tT /dev/md2
/dev/md2:
Timing cached reads: 2168 MB in 2.00 seconds = 1084.54 MB/sec
Timing buffered disk reads: 358 MB in 3.01 seconds = 119.10 MB/sec
Test of the volume which uses md2 as its physical device.
root@enterprise:# hdparm -tT /dev/mapper/vg2-data
/dev/mapper/vg2-data:
Timing cached reads: 2078 MB in 2.00 seconds = 1039.29 MB/sec
Timing buffered disk reads: 176 MB in 3.03 seconds = 58.04 MB/sec
root@enterprise:# hdparm -tT /dev/mapper/vg2-data
/dev/mapper/vg2-data:
Timing cached reads: 2056 MB in 2.00 seconds = 1028.06 MB/sec
Timing buffered disk reads: 154 MB in 3.03 seconds = 50.81 MB/sec
I made the change proposed by womble and saw results like this:
root@enterprise:# blockdev --setra 8192 /dev/mapper/vg2-data
root@enterprise:# hdparm -tT /dev/mapper/vg2-data
/dev/mapper/vg2-data:
Timing cached reads: 2106 MB in 2.00 seconds = 1053.82 MB/sec
Timing buffered disk reads: 298 MB in 3.00 seconds = 99.26 MB/sec
root@enterprise:# hdparm -tT /dev/mapper/vg2-data
/dev/mapper/vg2-data:
Timing cached reads: 2044 MB in 2.00 seconds = 1022.25 MB/sec
Timing buffered disk reads: 280 MB in 3.03 seconds = 92.45 MB/sec
Solution 3:
Make sure that you compare apples to apples.
hdparm -t reads from the beginning of the device, which is also the fastest part of your disk if you're giving it a whole disk (and it's spinning platters). Make sure you compare it with an LV from the beginning of the disk.
To see the mapping, use pvdisplay -m.
(okay, granted, the difference in numbers may be negligible. But at least think about it :)
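If you do want to check, something like this should do it (device and VG names taken from the question; the dd offset is purely illustrative and should be adjusted to where the LV actually starts):

# Show where each LV segment sits on the physical volume (physical extent ranges)
pvdisplay -m /dev/md0

# Alternative view: per-segment logical-to-physical extent mapping for the whole VG
lvs -o +seg_pe_ranges,devices vg1

# Rough sequential read from a comparable offset on the raw md device,
# bypassing the page cache; skip= is in units of bs (here, MB)
dd if=/dev/md0 of=/dev/null bs=1M count=1024 skip=100000 iflag=direct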