How to speed up rsync/tar of large Maildir?

I ended up writing a little Python script to calculate the correlation between directory names and inodes, inodes and data blocks, and directory names and data blocks. It turns out that ext4 tends to have rather poor correlation between the order in which file names appear in the directory and where the files are stored on disk. After discussing it on the ext4 mailing list, I learned that this is a result of the hashed directory indexes used to speed up lookups in large directories: the names are stored in hash order, which effectively scrambles their order relative to anything else.
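The original script isn't shown, so here is a minimal, hypothetical sketch of the name-to-inode part only. It assumes "correlation" means Spearman rank correlation between the position at which readdir() returns each name (os.scandir yields entries in that order) and the entry's inode number. Block positions would additionally need the Linux FIBMAP/FIEMAP ioctls and are left out. The helper names are made up for illustration.

```python
import os
import sys


def rank(values):
    """Return the 0-based rank of each value; ties are broken by position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for r, i in enumerate(order):
        ranks[i] = r
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation of two equal-length lists (ranks are distinct)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values")
    rx, ry = rank(xs), rank(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


def dir_order_vs_inode(path):
    """Correlate readdir() order of the entries in path with their inode numbers."""
    with os.scandir(path) as it:
        inodes = [entry.inode() for entry in it]
    return spearman(list(range(len(inodes))), inodes)


if __name__ == "__main__":
    # Usage: python3 dircorr.py /path/to/maildir/cur [more dirs...]
    for d in sys.argv[1:]:
        print(f"{d}: {dir_order_vs_inode(d):+.3f}")
```

A value near +1 means names come back roughly in inode order; a value near 0 is what you would expect from hash-ordered (htree) directories.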

It seems to me, and to at least one other commenter, that this is a deficiency in the filesystem that should be fixed. Ted Ts'o (the ext4 maintainer) feels that it would be too difficult to do in the filesystem, and that good tools (like rsync and tar) should have an option to sort the directory by inode number before reading the files.
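Until the tools do this themselves, one workaround is to build the file list yourself in inode order and hand it to a tool that reads names in the order given. Below is a hypothetical sketch that walks a Maildir and prints every non-directory entry, relative to the root, sorted by inode number. It assumes no file name contains a newline, which holds for normal Maildir names.

```python
import os
import sys


def files_in_inode_order(root):
    """Return paths (relative to root) of all non-directory entries, sorted by inode."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                ino = os.lstat(full).st_ino
            except FileNotFoundError:
                # Messages can be moved or deleted while we walk a live Maildir.
                continue
            found.append((ino, os.path.relpath(full, root)))
    found.sort()
    return [rel for _ino, rel in found]


if __name__ == "__main__":
    # Usage: python3 inode_order.py /path/to/maildir > list.txt
    for rel in files_in_inode_order(sys.argv[1]):
        print(rel)
```

The list can then be fed to tar, which archives names in the order they appear in the list: `tar -C /path/to/maildir -cf /tmp/maildir.tar -T list.txt`. Directory entries are not listed, so tar recreates the directories on extraction without their original permissions.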

So it looks like feature enhancement requests need to be filed for rsync and tar.


A few points to consider:

  • How many files are we talking about? find /path/to/your/maildir/ | wc -l should give you a rough indication. Hundreds of thousands should be okay. Hundreds of millions might suggest you need to prune, archive and generally clean up.
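If the total looks large, it helps to know which folders hold most of it. A small sketch that counts non-directory entries per directory and prints the ten biggest; note that `find | wc -l` also counts the directories themselves, so its number will be slightly higher.

```python
import os
import sys
from collections import Counter


def count_files(root):
    """Return (total, Counter mapping directory -> number of non-directory entries)."""
    per_dir = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        per_dir[dirpath] = len(filenames)
    return sum(per_dir.values()), per_dir


if __name__ == "__main__":
    # Usage: python3 count_files.py /path/to/your/maildir/
    total, per_dir = count_files(sys.argv[1])
    print(f"total files: {total}")
    for d, n in per_dir.most_common(10):
        print(f"{n:>10}  {d}")
```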

  • Is the disk slow? There are many benchmarks available, from the comprehensive bonnie++ to the quick and simple Disk Utility benchmark. Run one and see whether you're suffering.

    • That may point to hardware issues: replace the disk with something faster.
    • Filesystem issues: are you using something known to be very slow at high random-read IOPS?
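General benchmarks don't exercise the exact pattern a Maildir copy produces (many small files read in directory order). A rough, hypothetical sketch that times reading every file in one Maildir folder, either in readdir order or sorted by inode. Results are only meaningful with a cold page cache, e.g. after running `sync; echo 3 > /proc/sys/vm/drop_caches` as root on Linux before each run.

```python
import os
import sys
import time


def listing(directory, by_inode=False):
    """Regular files in directory, in readdir order or sorted by inode number."""
    with os.scandir(directory) as it:
        entries = [e for e in it if e.is_file(follow_symlinks=False)]
    if by_inode:
        entries.sort(key=lambda e: e.inode())
    return [e.path for e in entries]


def read_all(paths, bufsize=1 << 16):
    """Read every file in paths sequentially; return (bytes_read, seconds)."""
    total = 0
    start = time.perf_counter()
    for p in paths:
        with open(p, "rb") as f:
            while True:
                chunk = f.read(bufsize)
                if not chunk:
                    break
                total += len(chunk)
    return total, time.perf_counter() - start


if __name__ == "__main__":
    # Usage: python3 readbench.py /path/to/maildir/cur [inode]
    directory = sys.argv[1]
    by_inode = len(sys.argv) > 2 and sys.argv[2] == "inode"
    nbytes, secs = read_all(listing(directory, by_inode))
    print(f"{nbytes} bytes in {secs:.2f}s ({nbytes / max(secs, 1e-9) / 1e6:.1f} MB/s)")
```

A large gap between the two orders points at seek-bound random reads rather than raw disk bandwidth.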

But ultimately, tarring and then transferring should give you the best overall throughput, at the cost of needing to be present to start the transfer once the tar has been generated.
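If you'd rather script the archive step, Python's standard `tarfile` module can do it. A minimal sketch: it writes an uncompressed archive, either of the whole tree or of an explicit list of relative paths added in exactly the given order (for example, a list you have already sorted by inode). The function name is hypothetical.

```python
import os
import sys
import tarfile


def make_tar(root, out_path, relpaths=None):
    """Write an uncompressed tar of root to out_path.

    With relpaths=None the tree is added recursively (Python 3.7+ adds the
    entries of each directory in sorted name order). Otherwise only the listed
    paths are added, in exactly the order given.
    """
    # Mode "w" means no compression, keeping CPU out of the way; use "w:gz"
    # instead if the network link rather than the disk is the bottleneck.
    with tarfile.open(out_path, "w") as tar:
        if relpaths is None:
            tar.add(root, arcname=".")
        else:
            for rel in relpaths:
                tar.add(os.path.join(root, rel), arcname=rel, recursive=False)


if __name__ == "__main__":
    # Usage: python3 make_tar.py /path/to/maildir /tmp/maildir.tar
    make_tar(sys.argv[1], sys.argv[2])
```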


Try disabling atime tracking (the noatime mount option) or using relative atime (relatime) on the new disk partition. This will limit overhead. Also note that changing from a non-journaling file system like ext2 to a journaling file system like ext3 or ext4 will cost some performance.
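To check what a partition is currently using, you can look at its options in /proc/mounts on Linux. A small sketch, assuming (as on current Linux kernels) that /proc/mounts lists `noatime` or `relatime` explicitly and shows neither when strict atime is in effect. Mount points containing spaces appear as octal escapes there, which this sketch does not decode.

```python
import sys


def atime_mode(mount_point, mounts_text=None):
    """Return 'noatime', 'relatime' or 'strictatime' for mount_point, or None if absent."""
    if mounts_text is None:
        with open("/proc/mounts") as f:
            mounts_text = f.read()
    mode = None
    for line in mounts_text.splitlines():
        fields = line.split()  # device, mount point, fstype, options, dump, pass
        if len(fields) < 4 or fields[1] != mount_point:
            continue
        opts = set(fields[3].split(","))
        # A later line for the same mount point (an overmount) wins.
        if "noatime" in opts:
            mode = "noatime"
        elif "relatime" in opts:
            mode = "relatime"
        else:
            mode = "strictatime"
    return mode


if __name__ == "__main__":
    # Usage: python3 atime.py /srv/mail
    print(atime_mode(sys.argv[1]))
```

To change it, add `noatime` (or `relatime`) to the options column for that partition in /etc/fstab, or apply it immediately with `mount -o remount,noatime /mount/point`.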

When I moved Maildirs, I did a preparatory rsync to get all the directories in place ahead of time. Then, there were only updates to do.

When you are ready to do the real move you may want to ensure the directories are stable.

  • place the SMTP daemon in queue only mode,
  • disable queue runs by the SMTP daemon, and
  • disable access by the user.

Reactivate after the file move is done.
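The two-pass move described above might be scripted roughly like this. It is a hypothetical sketch wrapping the rsync CLI: the first pass copies everything while mail is still flowing, then it pauses so you can stop delivery and user access, and a final pass with `--delete` picks up only what changed. `--delete` matters for Maildirs because messages are renamed when their flags change and moved from new/ to cur/, so stale names would otherwise linger on the destination.

```python
import subprocess
import sys


def rsync_cmd(src, dst, delete=False):
    """Build an rsync argv that copies the contents of src into dst."""
    # -a: recursive, preserving permissions, times, symlinks and (as root) ownership.
    cmd = ["rsync", "-a"]
    if delete:
        # Remove destination files whose names no longer exist in src.
        cmd.append("--delete")
    # Trailing slash on src: copy the directory's contents, not the directory itself.
    cmd += [src.rstrip("/") + "/", dst]
    return cmd


def migrate(src, dst):
    """Hypothetical two-pass move: bulk copy live, then a short final pass while quiesced."""
    subprocess.run(rsync_cmd(src, dst), check=True)
    input("Stop delivery and user access, then press Enter for the final pass... ")
    subprocess.run(rsync_cmd(src, dst, delete=True), check=True)


if __name__ == "__main__":
    # Usage: python3 migrate.py /old/maildirs /new/maildirs
    migrate(sys.argv[1], sys.argv[2])
```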

EDIT: I think you have identified the problem. Tar and rsync both walk the directories, and due to normal file changes in the Maildir, the files in each directory end up scattered around the disk. A tool like dump would read the partition in block order, but would replicate the problem on the new partition. A second rsync should run much faster than the first.