Sendmail delays on some email addresses

On our webserver, the PHP mail() command consistently hangs on some email addresses but is fine with the majority. It hangs for over 2 minutes, by which time the PHP script has lost its connection to the DB, so it returns an error to the browser.

We use sendmail and I can see a delay of 2:36 in the mail log (/var/log/maillog) for the email address that causes the problem:

Dec  9 11:24:00 liveserver sendmail[12666]: nB9BLOHa012666: to=***blanked_out***, delay=00:02:36, mailer=esmtp, pri=31326, dsn=4.4.3, stat=queued

It is easy to reproduce the problem. I can put the email I want to test in the following command:

echo "Test message from sendmail." | sendmail [email protected] [email protected]

Most email addresses cause the command to return within 1s (including invalid email addresses), but the problematic email address hangs for 2:36.

  • Why doesn't sendmail queue the message and return immediately so PHP can continue running?
  • Does anyone have any tips for debugging the issue?
  • Does anyone have any tips on how to probe the problematic email address to see why it is causing a delay?

Note: We currently have 550 messages queued - but this number is not above normal (find /var/spool/mqueue -type f -name qf\* -print|wc -l|tr -d ' ').


Solution 1:

Typically email delays are DNS problems.

Try running:

host -t mx problemdomain.com

If that doesn't seem to be the problem, use sendmail -v to get verbose delivery output.

Solution 2:

Perhaps it's a DNS issue; try running dig problemmail.com.

You could also try strace and see what the process does:

Attach to a running process:
strace -ff -s 512 -v -p pid

Start the process under strace:
strace -ff -s 512 -v sendmail -ffromtest...........

Add -o ~/sendmail.strace to write the output to a file.

-ff follows forks and, combined with -o, writes each process's trace to its own file (filename.pid).
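
If the trace gets long, one way to see where the time goes is to record per-call durations with strace's -T flag and then filter for slow calls. This is only a rough sketch: slow_syscalls is a hypothetical helper name, and PID and the output path are placeholders.

```shell
# Print lines from "strace -T" output whose syscall took at least
# the given number of seconds (default: 1).
# strace -T appends the time spent in each call as <seconds> at end of line.
slow_syscalls() {
    awk -v min="${1:-1}" '
        match($0, /<[0-9]+\.[0-9]+>$/) {
            # Strip the surrounding < > to get the duration in seconds.
            t = substr($0, RSTART + 1, RLENGTH - 2)
            if (t + 0 >= min + 0) print
        }'
}

# Hypothetical usage (PID and path are placeholders):
#   strace -f -T -o /tmp/sendmail.strace -p PID
#   slow_syscalls 1 < /tmp/sendmail.strace
```

Long waits in poll() or recvfrom() on a socket connected to port 53 would point at DNS.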

Solution 3:

To answer my own questions (in case it's useful to others):

Why doesn't sendmail queue the message and return immediately so PHP can continue running?

It should. If it doesn't, the DNS server setup is probably broken. Normally DNS lookups are quick: a query for a non-existent MX record or domain should receive an NXDOMAIN response within milliseconds. It should not wait this long, and the same issue is probably causing problems for other programs that rely on DNS, e.g. sshd and NFS.

Does anyone have any tips for debugging the issue?

Try running:

host -t mx problemdomain.com

Then run it again using Google Public DNS (IP address 8.8.8.8) instead of the current DNS service:

host -t mx problemdomain.com 8.8.8.8

If there is a difference, the current DNS server setup is broken. Check the nameservers in /etc/resolv.conf and perhaps raise a ticket with the hosting company or whoever supplies the nameservers you are using.
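
To put numbers on the comparison, the two lookups can be timed. A minimal sketch, assuming GNU date (for the %N nanoseconds format, as found on most Linux servers); elapsed_ms is a hypothetical helper name:

```shell
# Run a command, discard its output, and print the elapsed time in ms.
# Assumes GNU date, which supports %N (nanoseconds).
elapsed_ms() {
    em_start=$(date +%s%N)
    "$@" >/dev/null 2>&1
    em_end=$(date +%s%N)
    echo $(( (em_end - em_start) / 1000000 ))
}

# Hypothetical usage: compare the default resolver with 8.8.8.8.
#   elapsed_ms host -t mx problemdomain.com
#   elapsed_ms host -t mx problemdomain.com 8.8.8.8
```

If the first number is in the thousands and the second in the tens, the configured nameservers are the likely culprit.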

As a temporary workaround, try lowering the DNS timeout with an options line in /etc/resolv.conf:

options timeout:n
    sets the amount of time (in seconds) the resolver will wait for a response from a remote name server before retrying the query via a different name server.
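
Before changing anything, it helps to see which nameservers and options are currently in effect. A small sketch; resolver_settings is a hypothetical helper name, and the "options timeout:2 attempts:2" line in the comment is only an illustrative value, not a recommendation:

```shell
# Print the nameserver and options lines from a resolv.conf-style file
# (defaults to /etc/resolv.conf).
resolver_settings() {
    awk '$1 == "nameserver" { print "nameserver " $2 }
         $1 == "options"    { print }' "${1:-/etc/resolv.conf}"
}

# Hypothetical usage:
#   resolver_settings
# An options line such as the following (illustrative values) would make the
# resolver wait 2 seconds per nameserver and try each one twice:
#   options timeout:2 attempts:2
```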

Does anyone have any tips on how to probe the problematic email address to see why it is causing a delay?

Try dig, a DNS lookup utility, and check the status code in the response header, e.g. NOERROR for success, NXDOMAIN for not found:

dig problemdomain.com
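
dig prints a lot of output, so it can be handy to pull out just the status and the query time. A rough sketch that parses dig's usual header and "Query time" lines; dig_summary is a hypothetical helper name:

```shell
# Read dig output on stdin and print the response status and query time.
dig_summary() {
    sed -n -e 's/^;; ->>HEADER<<-.*status: \([A-Z]*\),.*/status=\1/p' \
           -e 's/^;; Query time: \([0-9]*\) msec.*/query_ms=\1/p'
}

# Hypothetical usage:
#   dig mx problemdomain.com | dig_summary
```

If dig gets no reply from any server, there is no header to parse and the summary comes back empty, which is itself a sign that the resolver is timing out.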

Try nslookup, a program to query Internet name servers:

nslookup problemdomain.com

Try timing the sendmail command and use -v to get more info:

time echo "This is a test message" | /usr/lib/sendmail -v [email protected] [email protected]