Linux RAID-0 performance doesn't scale past 1 GB/s
I have trouble getting the maximum throughput out of my setup. The hardware is as follows:
- dual Quad-Core AMD Opteron(tm) Processor 2376
- 16 GB DDR2 ECC RAM
- dual Adaptec 52245 RAID controllers
- 48 × 1 TB SATA drives set up as two RAID-6 arrays (256 KB stripe) plus spares.
Software:
- Plain vanilla 2.6.32.25 kernel, compiled for AMD-64, optimized for NUMA; Debian Lenny userland.
- benchmarks run: disktest, bonnie++, dd, etc. They all give the same results; no discrepancy there.
- I/O scheduler used: noop. Yeah, no trick here; it was set roughly as shown below.
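For completeness, this is roughly how the scheduler was set (the device names are just placeholders for the two array block devices; adjust to whatever the controllers expose):

    # assuming the two Adaptec arrays show up as /dev/sda and /dev/sdb
    echo noop > /sys/block/sda/queue/scheduler
    echo noop > /sys/block/sdb/queue/scheduler
    # verify: the active scheduler is shown in brackets
    cat /sys/block/sda/queue/scheduler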
Up until now I had basically assumed that striping (RAID 0) several physical devices should increase performance roughly linearly. However, that is not the case here:
- each RAID array achieves about 780 MB/s write, sustained, and 1 GB/s read, sustained.
- writing to both RAID arrays simultaneously with two different processes gives 750 + 750 MB/s, and reading from both gives 1 + 1 GB/s.
- however, when I stripe both arrays together, using either mdadm or LVM (roughly as set up below), the performance drops to about 850 MB/s writing and 1.4 GB/s reading: at least 30% less than expected!
- running two parallel writer or reader processes against the striped array doesn't improve the figures; in fact, it degrades performance even further.
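For reference, the striped configurations were created more or less like this (device names and exact chunk/stripe sizes are illustrative, not a verbatim copy of my setup):

    # md striping across the two arrays, 256 KB chunks
    mdadm --create /dev/md0 --level=0 --raid-devices=2 --chunk=256 /dev/sda /dev/sdb

    # LVM equivalent: one volume group over both arrays, then a 2-way striped LV
    pvcreate /dev/sda /dev/sdb
    vgcreate vg_bench /dev/sda /dev/sdb
    lvcreate -i 2 -I 256 -l 100%FREE -n lv_stripe vg_bench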
So what's happening here? Basically I ruled out bus or memory contention, because when I run dd against both arrays simultaneously, aggregate write speed actually reaches 1.5 GB/s and read speed tops 2 GB/s.
So it's not the PCIe bus, and I suppose it's not the RAM. It's not the filesystem either, because I get exactly the same numbers benchmarking against the raw devices as with XFS. And I also get exactly the same performance using either LVM striping or md striping.
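Concretely, the parallel raw-device runs looked roughly like this (destructive on the arrays; block size and count are only illustrative):

    # one dd per array, writing in parallel (placeholders for the two array devices)
    dd if=/dev/zero of=/dev/sda bs=1M count=100000 oflag=direct &
    dd if=/dev/zero of=/dev/sdb bs=1M count=100000 oflag=direct &
    wait

    # and the read side
    dd if=/dev/sda of=/dev/null bs=1M count=100000 iflag=direct &
    dd if=/dev/sdb of=/dev/null bs=1M count=100000 iflag=direct &
    wait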
What's wrong? What's preventing a single process from reaching the maximum possible throughput? Is Linux striping defective? What other tests could I run?
Have you tried running latencytop while doing the benchmarks? Might be helpful to see which Linux syscall is the culprit (if any).
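Something along these lines, assuming the kernel was built with CONFIG_LATENCYTOP=y (device name is just an example):

    # kick off a benchmark in the background, then watch latencies as root
    dd if=/dev/md0 of=/dev/null bs=1M count=100000 iflag=direct &
    latencytop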
That's an x8 PCIe Gen 1 card as far as I can tell - the absolute maximum data rate it could support is 2 GB/s, assuming zero overhead. Adaptec themselves only claim that the cards can sustain 1.2 GB/s at best, and you are exceeding that:
"Equipped with industry-leading dual-core RAID on Chip (RoC), x8 PCI Express connectivity and 512MB of DDR cache, they provide over 250,000 IO per second and 1.2GB/s."
My guess is that, since you are able to significantly exceed their claimed performance with the two RAID-6 sets acting independently, the additional load that striping adds, small though it may be, is overstressing the RAID CPUs, or possibly the RAM subsystem on the controllers, at GB/s loads.
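Rough numbers for the slot, before any protocol overhead (standard PCIe Gen 1 figures, not measured values):

    PCIe Gen 1: 2.5 GT/s per lane, 8b/10b encoding -> ~250 MB/s usable per lane, per direction
    x8 slot:    8 lanes x 250 MB/s                 -> ~2 GB/s per direction, theoretical ceiling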