How do I align my partition table properly?
I am in the process of building my first RAID5 array. I've used mdadm to create the following setup:
root@bondigas:~# mdadm --detail /dev/md1
/dev/md1:
Version : 00.90
Creation Time : Wed Oct 20 20:00:41 2010
Raid Level : raid5
Array Size : 5860543488 (5589.05 GiB 6001.20 GB)
Used Dev Size : 1953514496 (1863.02 GiB 2000.40 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 1
Persistence : Superblock is persistent
Update Time : Wed Oct 20 20:13:48 2010
State : clean, degraded, recovering
Active Devices : 3
Working Devices : 4
Failed Devices : 0
Spare Devices : 1
Layout : left-symmetric
Chunk Size : 64K
Rebuild Status : 1% complete
UUID : f6dc829e:aa29b476:edd1ef19:85032322 (local to host bondigas)
Events : 0.12
Number Major Minor RaidDevice State
0 8 16 0 active sync /dev/sdb
1 8 32 1 active sync /dev/sdc
2 8 48 2 active sync /dev/sdd
4 8 64 3 spare rebuilding /dev/sde
While that's going, I decided to format the beast with the following command:
root@bondigas:~# mkfs.ext4 /dev/md1p1
mke2fs 1.41.11 (14-Mar-2010)
/dev/md1p1 alignment is offset by 63488 bytes.
This may result in very poor performance, (re)-partitioning suggested.
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=16 blocks, Stripe width=48 blocks
97853440 inodes, 391394047 blocks
19569702 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=0
11945 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
102400000, 214990848
Writing inode tables: ^C 27/11945
root@bondigas:~# ^C
I am unsure what to do about "/dev/md1p1 alignment is offset by 63488 bytes." How should I partition the disks to match the array so that I can format it properly?
Solution 1:
Since alignment pops up in a lot of places, I'll expand a bit on the question. It matters for:
- "Advanced Format" hard drives with 4k blocks
- SSDs
- RAID
- LVM
Aligning partitions
"Linux on 4kB-sector disks" (IBM developerWorks) walks through the steps with fdisk, parted and GPT fdisk.
With fdisk:
sudo fdisk /dev/XXX
c # turn off DOS compatibility
u # switch to sector units
p # print current partitions, check that start sectors are multiples of 8
# for a new partition:
n # new partition
<select primary/extended and partition #>
first sector: 2048
# 2048 is default in recent fdisk,
# and is compatible with Vista and Win 7,
# 4k-sector disks and all common RAID stripe sizes
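If you want to double-check alignment on an existing disk afterwards, one option (a sketch; substitute your actual device for sdX) is to read each partition's start sector from sysfs:
cat /sys/block/sdX/sdX*/start
# start sectors are reported in 512-byte units;
# a multiple of 8 means 4 KiB alignment, a multiple of 2048 means 1 MiB alignment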
Aligning the file system
This is primarily relevant for RAID (levels 0, 5 and 6; not level 1); the file system performs better if it is created with knowledge of the stripe sizes.
It can also be used for SSDs if you wish to align the file system to the SSD erase block size (Theodore Tso, Linux kernel developer).
In the OP's post, mkfs apparently auto-detected the optimal settings, so no further action was required.
If you wish to verify, for RAID the relevant parameters are:
- block size (file system block size, ex. 4096)
- stripe size (same as the mdadm chunk size, ex. 64k)
- stride: stripe size / block size (ex. 64k / 4k = 16)
- stripe-width: stride * number of data disks (ex. a 4-disk RAID 5 has 3 data disks; 16 * 3 = 48)
From the Linux Raid Wiki. See also this simple calculator for different RAID levels and number of disks; the sketch below applies these numbers to the array in the question.
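For the array in the question (4k blocks, 64k chunks, 4-disk RAID 5) these work out to stride 16 and stripe-width 48. Passing them explicitly, as a sketch (mkfs.ext4 had already detected them on its own here), would look like:
mkfs.ext4 -b 4096 -E stride=16,stripe-width=48 /dev/md1
# -b 4096: file system block size
# stride = 64k / 4k = 16
# stripe-width = 16 * 3 = 48 (3 data disks in a 4-disk RAID 5)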
For SSD erase block alignment the parameters are:
- fs block size (ex. 4096)
- SSD erase block size (ex. 128k)
- stripe-width: erase-block-size / fs-block-size (ex. 128k / 4k = 32)
From Theodore's SSD post.
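A corresponding sketch, assuming a drive with a 128k erase block (check your SSD's documentation for the real value):
mkfs.ext4 -b 4096 -E stripe-width=32 /dev/sdX1
# stripe-width = 128k erase block / 4k block = 32; stride is only needed for RAID chunking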
Aligning LVM extents
The potential issue is that LVM creates a 192k header. This is a multiple of 4k (so no issue with 4k-block disks) but may not be a multiple of RAID stripe size (if LVM runs on a RAID) or SSD erase block size (if LVM runs on SSD).
See Theodore's post for the workaround.
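As an illustration of the kind of fix involved (the exact boundary depends on your stripe or erase-block size; this sketch uses the array from the question), newer LVM versions let you place the start of the PV data area explicitly:
pvcreate --dataalignment 192k /dev/md1
# 192k = 64k chunk * 3 data disks, i.e. one full stripe of the array above;
# for an SSD, align to the erase block size (e.g. 128k) instead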
Solution 2:
A friend of mine pointed out that I can just run mkfs.ext4 directly on /dev/md1 without partitioning anything, so I deleted the partition and did that, and it appears to be formatting now.
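That is, as a rough sketch:
mkfs.ext4 /dev/md1
# with no partition table the file system starts at offset 0 of the md device,
# which is always aligned, and mke2fs detects the stride/stripe-width from the array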
Solution 3:
I find this way to be the easiest:
parted -a opt /dev/md0
(parted) u MiB
(parted) rm 1
(parted) mkpart primary 1 100%
Or an alternate, quick-and-dirty method would simply go like this:
(parted) mkpart primary ext4 1 -1
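To have parted confirm the result afterwards, a quick check (here 1 is the partition number) might look like this:
(parted) align-check opt 1
# reports whether partition 1 satisfies the device's optimal alignment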
Solution 4:
It seems like mkfs.ext4 wants the file system on your RAID to start on a 64 KiB boundary. If you use the whole disk, it starts at offset 0, which is of course also a multiple of 64 KiB...
Most partitioning tools nowadays will use a 1 MiB boundary by default anyway (fdisk probably doesn't).
The reason for this is that most hard disks & SSDs use physical sectors on the device that are much bigger than the logical sectors. The result is that if you read a 512-byte logical sector from disk, the hardware actually has to read a much larger amount of data.
In the case of your software RAID device, something similar happens: data on it is stored in "chunks" of 64 KiB with the default mdadm settings.
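If you want to confirm the chunk size of an existing array, for example:
mdadm --detail /dev/md1 | grep "Chunk Size"
# or read it straight from sysfs (value reported in bytes):
cat /sys/block/md1/md/chunk_size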