What are the practical differences between Maildir and Mbox?
Although I understand the basics of the two storage formats (1 file per email under Maildir vs. 1 single file per mailbox under mbox), I am wondering what the practical implications are here:
- Is one storage format more scalable than the other?
- Are there data integrity concerns / differences?
- Are there clearly defined situations where you should use one format over the other?
Solution 1:
Don't manage mailboxes from Postfix. Never. Hand messages off for delivery to a POP/IMAP server that has the appropriate functionality. In the case of Dovecot there is dovecot-lda (aka deliver), which does all of that and much more: user-controlled message filtering, quota management, autoreplies and so on.
Anyway, Maildir is the newer and preferable format because of its many improvements over mbox. With a server such as Dovecot, each Maildir folder gets an index that makes it possible to track duplicates and expiration times and even to run full-text search. Maildir is also significantly faster on a huge pile of messages: Dovecot can easily handle a Maildir with 300k messages in it without any visible slowdown, whereas an mbox that big is a problem in itself. Also, most modern POP/IMAP servers come with a lot of utilities for common tasks in a large infrastructure.
Solution 2:
If you're using NFS for mail storage, do not use mbox under any circumstances whatsoever. And if you want a scalable solution, Maildir is the way to go.
The main problem with the mbox format is that of file locking - if you have more than one mail server, or more than one process trying to access the mailbox at the same time, you run a large risk of getting a corrupted mailbox. It's also difficult to go through a mailbox and delete a large number of messages, for instance when you've suffered a bounce storm.
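To make the locking and bulk-delete points concrete, here is a minimal sketch using Python's standard mailbox module. The function name purge_bounces and the MAILER-DAEMON marker are assumptions made up for illustration, not part of any mail server. Note that the whole file has to be locked, and that flush() rewrites the mbox file to drop the removed messages.

```python
import mailbox


def purge_bounces(mbox_path: str, marker: str = "MAILER-DAEMON") -> int:
    """Delete every message whose From header contains `marker`.

    Hypothetical helper for illustration; returns the number removed.
    """
    box = mailbox.mbox(mbox_path)
    # One lock covers the entire mailbox. If another process already
    # holds it, this raises mailbox.ExternalClashError.
    box.lock()
    try:
        doomed = [
            key
            for key, msg in box.iteritems()
            if marker in str(msg["From"] or "")
        ]
        for key in doomed:
            box.remove(key)
        # flush() writes a new copy of the file without the removed messages.
        box.flush()
        return len(doomed)
    finally:
        box.unlock()
        box.close()
```

Every other reader or writer has to wait (or fail) while this runs, and a process that ignores the lock can still corrupt the file.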
Maildir is designed to work without failure when you have several mail servers, or several processes on one server, delivering email to the same account at the same time as the IMAP or POP server is accessing the account.
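The reason this works without locks is the delivery convention: a message is written in full under tmp/ and then renamed into new/. A rough sketch of that idea in Python follows. The function name deliver_to_maildir and the exact unique-name scheme are assumptions for illustration; real delivery agents use their own naming details.

```python
import os
import socket
import time


def deliver_to_maildir(maildir: str, message: bytes) -> str:
    """Deliver one message via tmp/ -> new/ and return its final path.

    Hypothetical helper for illustration only.
    """
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(maildir, sub), exist_ok=True)

    # Build a name that should not collide across processes or hosts:
    # timestamp, process id, random part, hostname.
    host = socket.gethostname().replace("/", "\\057").replace(":", "\\072")
    unique = "{}.P{}R{}.{}".format(
        int(time.time()), os.getpid(), os.urandom(4).hex(), host
    )
    tmp_path = os.path.join(maildir, "tmp", unique)
    new_path = os.path.join(maildir, "new", unique)

    # Write the complete message under tmp/, where readers never look.
    # Mode "xb" fails if the file already exists instead of overwriting it.
    with open(tmp_path, "xb") as handle:
        handle.write(message)
        handle.flush()
        os.fsync(handle.fileno())

    # A rename within one file system is atomic, so a reader sees either
    # no file in new/ or the complete message, never a partial one.
    os.rename(tmp_path, new_path)
    return new_path
```

Because each delivery touches only its own file, several deliverers and an IMAP or POP server can work on the same account at once without a shared lock.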
The main problems with Maildir show up if you are using a file system that slows down when handling too many inodes, or if your backup system is bad at handling large numbers of files. As for file systems, back when I did email sysadmin work at an ISP, VxFS was the best for this. For backups, I don't have any recommendation; unfortunately most of them seem to be designed to handle database servers rather than a gazillion small files.