mkfs Operation Takes Very Long on Linux Software RAID 5
I've set up a Linux software RAID level 5 consisting of 4 * 2 TB disks. The array was created with a 64k stripe size and no other configuration parameters. After the initial rebuild I tried to create a filesystem, and this step takes very long (about half an hour or more). I tried to create both an XFS and an ext3 filesystem; both took a long time. With mkfs.ext3 I observed the following behaviour, which might be helpful:
- writing the inode tables runs fast until it reaches 1053 (~1 second), then it writes about 50, waits for two seconds, and then the next 50 are written (according to the console display)
- when I try to cancel the operation with Control+C it hangs for half a minute before it is really canceled
The performance of the individual disks is very good: I've run bonnie++ on each one separately and got write/read values of around 95/110 MB/s. Even when I run bonnie++ on every drive in parallel, the values only drop by about 10 MB/s. So I'm ruling out the hardware and I/O scheduling in general as the problem source.
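For reference, the per-disk runs looked roughly like this (the mount points and test size here are illustrative, not my exact invocation):

bonnie++ -d /mnt/sdb1 -s 16384 -u root
bonnie++ -d /mnt/sdc1 -s 16384 -u root   # and so on for each disk; the parallel test was simply all four at once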
I tried different configuration parameters for stripe_cache_size and the readahead size without success, but I don't think they are that relevant for the filesystem creation operation.
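For what it's worth, the tuning I tried was along these lines (assuming the array is /dev/md0; the values are just examples):

echo 8192 > /sys/block/md0/md/stripe_cache_size   # default is 256 (pages per device)
blockdev --setra 4096 /dev/md0                    # readahead, in 512-byte sectors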
The server details:
- Linux server 2.6.35-27-generic #48-Ubuntu SMP x86_64 GNU/Linux
- mdadm - v2.6.7.1
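For completeness, the array was created along these lines (the device names here are illustrative):

mdadm --create /dev/md0 --level=5 --raid-devices=4 --chunk=64 /dev/sd[bcde]   # --chunk is in KiB, so a full data stripe is 3 * 64k = 192k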
Does anyone have a suggestion on how to debug this further?
Solution 1:
I agree that it may be related to stripe alignment. In my experience, creating an unaligned XFS filesystem on a 3 * 2 TB RAID-0 takes ~5 minutes, but if it is aligned to the stripe size it takes ~10-15 seconds. Here is a command for aligning XFS to a 256KB stripe size:
mkfs.xfs -l internal,lazy-count=1,sunit=512 -d agsize=64g,sunit=512,swidth=1536 -b size=4096 /dev/vg10/lv00
BTW, the stripe width in my case is 3 units, which will be the same for you with 4 drives in RAID-5: one drive's worth of each stripe holds the parity, leaving 3 data units.
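Translated to your geometry, that would be something like the following (a sketch, assuming a 64k chunk, 4k blocks, and that your array is /dev/md0; with 4 drives in RAID-5 there are 3 data units per stripe):

mkfs.xfs -d su=64k,sw=3 /dev/md0
mkfs.ext3 -b 4096 -E stride=16,stripe-width=48 /dev/md0   # 64k chunk / 4k block = 16; 16 * 3 data disks = 48

mkfs.xfs will often pick these values up from the md device automatically, so it is worth checking the sunit/swidth it prints against your actual chunk size.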
Obviously, this also improves FS performance, so you'd better keep it aligned.
Solution 2:
I suspect you're running into the typical RAID 5 small-write problem. For writes smaller than a full stripe, the array has to do a read-modify-write cycle for both the data and the parity. If the write covers a full stripe, it can compute the parity from the new data alone and simply overwrite both data and parity, with no reads required.
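You can see this effect with a rough comparison like the following (a sketch; /mnt/md0 is an assumed mount point, and with a 64k chunk on 4 disks a full data stripe is 3 * 64k = 192k):

dd if=/dev/zero of=/mnt/md0/ddtest bs=64k count=1600 oflag=direct   # sub-stripe writes: parity needs a read-modify-write
dd if=/dev/zero of=/mnt/md0/ddtest bs=192k count=512 oflag=direct   # full-stripe writes: parity computed from the new data alone

If the full-stripe run is dramatically faster, stripe alignment (Solution 1) and a larger stripe_cache_size are the first things to look at.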