Running Postfix on Ubuntu, sending a lot of mail (~1 million messages per day). Load averages are extremely high, but CPU and memory usage are low. Is anyone in a similar situation, and do you know how to remove the bottleneck?

All mail on this server is outbound.

I would have to assume the bottleneck is disk.

Just an update, here is what iostat looks like:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.12   99.88    0.00    0.00

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00    12.38    0.00    2.48     0.00   118.81    48.00     0.00    0.00   0.00   0.00
sdb               1.49    22.28   72.28   42.57   629.70  1041.58    14.55   135.56  834.31   8.71 100.00

Are these numbers in line with the performance you would expect from a single disk?

sdb is dedicated to postfix.

I think it is queue shuffling, from incoming->active->deferred

More details from questions:

Server: Quad-core Xeon(R) CPU E5405 @ 2.00GHz with 4 GB RAM

Load average: 464.88, 489.11, 483.91 on 4 cores, but memory and CPU utilization are minimal.

Postfix instances: between 16 and 32


This may sound a bit crazy, but you should:

  1. Turn down logging to the bare minimum you need. Make syslog only log mail.err or higher.
  2. Add more RAM. Yes, Postfix doesn't need it, but extra RAM means extra page cache for the kernel.
  3. You didn't mention what filesystem is on /dev/sdb (which matters some too), but definitely switch it over to noatime, which should reduce the load at least a little bit.
  4. See how big your /var/spool/postfix is. If it's under a couple gig, consider moving it to a ramdisk.
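For point 4, a rough way to check the spool size is to add up the regular files under it. This is only a sketch: the function name `dir_size_bytes` is made up for illustration, the path is the one from the question, and you will probably need root to read everything under the spool.

```python
import os
import stat


def dir_size_bytes(root):
    """Sum the sizes of regular files under root.

    Queue files come and go while we walk, so files that vanish
    between listing and lstat() are silently skipped.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # file was delivered/moved in the meantime
            if stat.S_ISREG(st.st_mode):
                total += st.st_size
    return total


# Path taken from the question; adjust if your queue_directory differs.
spool = "/var/spool/postfix"
print("%s: %.1f MiB" % (spool, dir_size_bytes(spool) / (1024.0 * 1024.0)))
```

On a busy server the number moves around a lot, so take a few samples at peak time before deciding anything.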

I have to disagree with those that have suggested using a RAM disk for "/var/spool/postfix". This means that your entire mail queue will be stored in RAM. If your server crashes, or loses power, messages in the queue are gone forever. This is really bad from the client/user perspective because the message has already been successfully accepted for delivery. Worse, your server will not send a notice stating that an email bounced or couldn't be delivered because the queue will be empty when the server comes back up.

Instead, I'd add as many fast disks as you can afford; I can't really estimate how many you'll need with the information given. From the "iostat" output above, it looks like you're doing ~115 IOPS to 'sdb' (sum of r/s and w/s). You can reasonably estimate that a single 15k RPM SCSI or FC disk will handle 150 IOPS. I would start with five 15k RPM SCSI disks and a decent RAID controller. Set it up as RAID-10 across four drives with one hot spare. I'm not sure that this will completely solve your problem, but it definitely won't make it worse.
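To make the arithmetic explicit, here is a small sketch using the sdb figures from the iostat output in the question. The function names are just for illustration, and the RAID-10 part assumes the usual rule of thumb that each logical write costs two physical writes (one per mirror half). Keep in mind that the measured IOPS is what a saturated disk manages to deliver, not what Postfix is actually asking for, so treat the result as a lower bound.

```python
def disk_load(reads_per_s, writes_per_s, svctm_ms):
    """Return (IOPS, busy fraction) from iostat-style figures.

    busy fraction = requests per second * service time per request,
    which is how r/s, w/s, svctm and %util relate in this output.
    """
    iops = reads_per_s + writes_per_s
    busy = iops * svctm_ms / 1000.0
    return iops, busy


def raid10_backend_iops(reads_per_s, writes_per_s):
    """Physical IOPS a RAID-10 set must absorb (assumed write penalty of 2)."""
    return reads_per_s + 2 * writes_per_s


# sdb line from the question: r/s=72.28, w/s=42.57, svctm=8.71 ms
iops, busy = disk_load(72.28, 42.57, 8.71)
print("IOPS: %.2f, busy: %.0f%%" % (iops, busy * 100))

backend = raid10_backend_iops(72.28, 42.57)
capacity = 4 * 150  # four data disks at the ~150 IOPS estimated above
print("RAID-10 backend load: %.2f of roughly %d IOPS" % (backend, capacity))
```

The busy fraction comes out at about 100%, which matches the %util column: the single disk has nothing left to give.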


Run postfix under some profiler (gprof?), or look in the logs. Postfix logs a lot of timing information that might tell you where the hold up is. Common places to look are:

  1. Disk performance. Might be time for RAID-10 for your queue.
  2. Any kind of network IO on messages. DNS blacklists? SAV?
  3. Milters and other filters you've installed.
  4. Authentication and UID lookups being done over the network or to a process (ldap, sql).
  5. Not using proxymap (the proxy: prefix) for slow maps like the ones above.
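On the logging side, delivery lines from Postfix 2.3 and later carry a `delays=a/b/c/d` field: time before the queue manager, time in the queue manager, connection setup (including DNS), and message transmission. A minimal sketch for totalling those per stage follows. The function name and the sample log lines are made up for illustration; feed it your own mail log instead.

```python
import re

# delays=a/b/c/d as logged on Postfix delivery lines
DELAYS_RE = re.compile(r"delays=([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+)")
STAGES = ("before_qmgr", "in_qmgr", "conn_setup", "transmission")


def sum_delays(lines):
    """Add up each delay stage over all lines that have a delays= field."""
    totals = dict.fromkeys(STAGES, 0.0)
    count = 0
    for line in lines:
        match = DELAYS_RE.search(line)
        if not match:
            continue
        count += 1
        for stage, value in zip(STAGES, match.groups()):
            totals[stage] += float(value)
    return count, totals


# Illustrative lines only, not taken from the server in the question.
sample = [
    "Jan 10 12:00:01 mail postfix/smtp[1234]: 4F9D1A3C5E: to=<a@example.com>, "
    "relay=mx.example.com[192.0.2.1]:25, delay=12, delays=0.05/11/0.6/0.35, "
    "dsn=2.0.0, status=sent (250 OK)",
    "Jan 10 12:00:02 mail postfix/smtp[1235]: 5A1B2C3D4E: to=<b@example.com>, "
    "relay=mx.example.com[192.0.2.1]:25, delay=31, delays=0.1/30/0.4/0.5, "
    "dsn=2.0.0, status=sent (250 OK)",
]
count, totals = sum_delays(sample)
for stage in STAGES:
    print("%-13s %.2f s over %d deliveries" % (stage, totals[stage], count))
```

If the time piles up in the first two fields, the message is waiting on the queue (disk, or the queue manager being overloaded). If it piles up in the last two, look at DNS and the remote side instead.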

A million messages a day is about 11 per second, assuming throughput is constant. Postfix by itself should be able to handle at least an order of magnitude greater than that on entry-level server hardware. So I suspect you have more than just postfix running, or very unevenly distributed throughput peaks.
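The rate estimate is easy to redo for other assumptions. In this sketch the 4-hour busy window is a purely hypothetical figure to show how much an uneven distribution changes the picture; nothing in the question says what the real peak looks like.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate(messages_per_day):
    """Messages per second if traffic were spread evenly over the day."""
    return messages_per_day / float(SECONDS_PER_DAY)


def windowed_rate(messages_per_day, busy_hours):
    """Messages per second if all traffic arrived within busy_hours."""
    return messages_per_day / float(busy_hours * 3600)


print("even spread:   %.1f msg/s" % average_rate(1000000))
# Hypothetical: the whole day's mail squeezed into a 4-hour window.
print("4-hour window: %.1f msg/s" % windowed_rate(1000000, 4))
```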

Your situation certainly looks like a heavily I/O-bound server. This is to be expected with an MTA, which needs to make lots of small writes to guarantee that it won't lose mail.

Take time to tune I/O on both /var/spool/postfix and /var/log. Best practice for busy Postfix servers is to put the two on different spindles and to make sure that asynchronous logging is enabled. On Linux, you enable it by prefixing the log file name for your mail log with a dash in the syslog configuration:

mail.info                              -/var/log/mail.log

or similar.

If you're using amavisd-new, make sure its work area is on a tmpfs filesystem. We usually put it on /tmp/vscan/. This is safe, since amavisd-new doesn't return an end-of-data response until the downstream (post-filter) hop has accepted the message.

Some people recommend noatime mount options for the postfix spool. This is potentially unwise, due to the way postfix depends on file system semantics. See for example http://archives.neohapsis.com/archives/postfix/2006-01/1916.html.