Gratuitous CRLF in Subject: line - why is it there, and is it legal?
I'm running into a problem with a Nagios system sending emails to a popular email-to-SMS service. The email-to-SMS service takes emails with text in the Subject: line, and sends them on to the mobile number encoded in the To: field. So far so good. Sadly, sendmail (and postfix before it) seems to be inserting a gratuitous CRLF into the (necessarily long) Subject: line, and that's causing my SMS messages to be truncated at the CRLF if and only if the Subject: line contains one or more colons past the gratuitous CRLF.
I am confident that the messages are being created correctly, but just to be sure, here's me creating a completely noddy test message to myself, with a long Subject: line:
echo "foo" | mail -s "1234567 101234567 201234567 301234567 401234567 501234567 601234567 701234567 801234567 90123456789" reaper@teaparty.net
Note there's no extra colon in this Subject: line; all I'm doing here is showing that an extra CRLF is inserted on the wire. Here's the result of sudo ngrep -x port 25:
44 61 74 65 3a 20 46 72 69 2c 20 33 31 20 4d 61 Date: Fri, 31 Ma
79 20 32 30 31 33 20 31 30 3a 34 33 3a 35 35 20 y 2013 10:43:55
2b 30 31 30 30 0d 0a 54 6f 3a 20 72 65 61 70 65 +0100..To: reape
72 40 74 65 61 70 61 72 74 79 2e 6e 65 74 0d 0a r@teaparty.net..
53 75 62 6a 65 63 74 3a 20 31 32 33 34 35 36 37 Subject: 1234567
20 31 30 31 32 33 34 35 36 37 20 32 30 31 32 33 101234567 20123
34 35 36 37 20 33 30 31 32 33 34 35 36 37 20 34 4567 301234567 4
30 31 32 33 34 35 36 37 20 35 30 31 32 33 34 35 01234567 5012345
36 37 0d 0a 20 36 30 31 32 33 34 35 36 37 20 37 67.. 601234567 7
30 31 32 33 34 35 36 37 20 38 30 31 32 33 34 35 01234567 8012345
36 37 20 39 30 31 32 33 34 35 36 37 38 39 0d 0a 67 90123456789..
55 73 65 72 2d 41 67 65 6e 74 3a 20 48 65 69 72 User-Agent: Heir
6c 6f 6f 6d 20 6d 61 69 6c 78 20 31 32 2e 34 20 loom mailx 12.4
37 2f 32 39 2f 30 38 0d 0a 4d 49 4d 45 2d 56 65 7/29/08..MIME-Ve
72 73 69 6f 6e 3a 20 31 2e 30 0d 0a 43 6f 6e 74 rsion: 1.0..Cont
65 6e 74 2d 54 79 70 65 3a 20 74 65 78 74 2f 70 ent-Type: text/p
6c 61 69 6e 3b 20 63 68 61 72 73 65 74 3d 75 73 lain; charset=us
About halfway down, between the 501234567 and the 601234567 in the original Subject: header, you can see a CRLF being inserted (0x0d 0x0a in the hex dump on the left-hand side, .. in the plain text on the right-hand side).
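As a cross-check on the dump, here is a minimal Python sketch (not part of the original capture; the variable names are purely illustrative) that decodes the Subject: bytes copied from the ngrep output and reports where the CRLF-plus-space sits.

```python
# Hex bytes of the Subject: header, copied from the ngrep dump in the question.
SUBJECT_HEX = """
53 75 62 6a 65 63 74 3a 20 31 32 33 34 35 36 37
20 31 30 31 32 33 34 35 36 37 20 32 30 31 32 33
34 35 36 37 20 33 30 31 32 33 34 35 36 37 20 34
30 31 32 33 34 35 36 37 20 35 30 31 32 33 34 35
36 37 0d 0a 20 36 30 31 32 33 34 35 36 37 20 37
30 31 32 33 34 35 36 37 20 38 30 31 32 33 34 35
36 37 20 39 30 31 32 33 34 35 36 37 38 39 0d 0a
"""

# Drop all whitespace first so bytes.fromhex() accepts the multi-line text.
raw = bytes.fromhex("".join(SUBJECT_HEX.split()))

# A CRLF followed by a space is a header fold, not the end of the field.
fold_at = raw.find(b"\r\n ")
before = raw[:fold_at].decode("ascii")
after = raw[fold_at + 2:].decode("ascii")  # still starts with the space

print(fold_at)
print(repr(before))
print(repr(after))
```

The CRLF turns up at byte offset 66 of the header, and the very next byte is a space (0x20), which is what distinguishes a fold from the CRLF that ends the field.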
The receiving MTA seems happy to post-process this, and when I look at the on-disc stored mail at the receiving end, I see only a LF (0x0a) in the Subject: line, and the line is parsed correctly and in its entirety by, e.g., alpine. Nevertheless, the CRLF is there on the wire, and between me and the (excellent) email-to-SMS support people, we've established that it is the cause of the problem.
So my question is: is it lawful for an MTA to insert a gratuitous CRLF on the wire?
If it is, and I can prove it, then it's the email-to-SMS house's problem, because they are being intolerant. If it isn't, or it is but I can't prove it, then it becomes my problem, so an answer with references would be most useful.
Edit: I can now come clean that the email-to-SMS service in question is kapow. Once this problem was explained to them, they got it, worked with me to develop and test a fix, and have deployed the fix. My long subject lines with colons in now get relayed correctly into SMSes. I don't normally trumpet individual companies, especially not on SF, but I thought it worthy of note that kapow Did The Right Thing. (Disclaimer: I have no connection with kapow except as a paying customer who's happy about the way they dealt with his problem.)
Solution 1:
Well, if I understand RFC 822 correctly, they are legal in certain cases. I think it's an artifact from the days of small 80x24 screens.
These sections seem fairly clear: Subject lines can be folded, and a fold is a CRLF followed by at least one LWSP (linear white space) character. It's possible they've been superseded; Wietse (on the postfix lists) knows his RFCs inside out if you want a definitive answer.
3.1.1. LONG HEADER FIELDS
Each header field can be viewed as a single, logical line of
ASCII characters, comprising a field-name and a field-body.
For convenience, the field-body portion of this conceptual
entity can be split into a multiple-line representation; this
is called "folding". The general rule is that wherever there
may be linear-white-space (NOT simply LWSP-chars), a CRLF
immediately followed by AT LEAST one LWSP-char may instead be
inserted. Thus, the single line
To: "Joe & J. Harvey" <ddd @Org>, JJV @ BBN
can be represented as:
To: "Joe & J. Harvey" <ddd @ Org>,
        JJV@BBN
and
To: "Joe & J. Harvey"
                <ddd@ Org>, JJV
 @BBN
and
To: "Joe &
 J. Harvey" <ddd @ Org>, JJV @ BBN
The process of moving from this folded multiple-line
representation of a header field to its single line represen-
tation is called "unfolding". Unfolding is accomplished by
regarding CRLF immediately followed by a LWSP-char as
equivalent to the LWSP-char.
Note: While the standard permits folding wherever linear-
white-space is permitted, it is recommended that struc-
tured fields, such as those containing addresses, limit
folding to higher-level syntactic breaks. For address
fields, it is recommended that such folding occur
between addresses, after the separating comma.
3.1.2. STRUCTURE OF HEADER FIELDS
Once a field has been unfolded, it may be viewed as being com-
posed of a field-name followed by a colon (":"), followed by a
field-body, and terminated by a carriage-return/line-feed.
The field-name must be composed of printable ASCII characters
(i.e., characters that have values between 33. and 126.,
decimal, except colon). The field-body may be composed of any
ASCII characters, except CR or LF. (While CR and/or LF may be
present in the actual text, they are removed by the action of
unfolding the field.)
Certain field-bodies of headers may be interpreted according
to an internal syntax that some systems may wish to parse.
These fields are called "structured fields". Examples
include fields containing dates and addresses. Other fields,
such as "Subject" and "Comments", are regarded simply as
strings of text.
Note: Any field which has a field-body that is defined as
other than simply <text> is to be treated as a struc-
tured field.
Field-names, unstructured field bodies and structured
field bodies each are scanned by their own, independent
"lexical" analyzers.
3.1.3. UNSTRUCTURED FIELD BODIES
For some fields, such as "Subject" and "Comments", no struc-
turing is assumed, and they are treated simply as <text>s, as
in the message body. Rules of folding apply to these fields,
so that such field bodies which occupy several lines must
therefore have the second and successive lines indented by at
least one LWSP-char.
Edit by the questioner: I hope NickW will forgive me for adding a note to the effect that RFC822 has been obsoleted by RFC2822, but the new RFC says pretty much the same thing in its section 2.2.3, and explicitly confirms that such folding should be removed before any further processing is done:
Each header field is logically a single line of characters comprising the field name, the colon, and the field body. For convenience however, and to deal with the 998/78 character limitations per line, the field body portion of a header field can be split into a multiple line representation; this is called "folding". The general rule is that wherever this standard allows for folding white space (not simply WSP characters), a CRLF may be inserted before any WSP. For example, the header field:
Subject: This is a test
can be represented as:
Subject: This
 is a test
Note: Though structured field bodies are defined in such a way that folding can take place between many of the lexical tokens (and even within some of the lexical tokens), folding SHOULD be limited to placing the CRLF at higher-level syntactic breaks. For instance, if a field body is defined as comma-separated values, it is recommended that folding occur after the comma separating the structured items in preference to other places where the field could be folded, even if it is allowed elsewhere.
The process of moving from this folded multiple-line representation of a header field to its single line representation is called "unfolding". Unfolding is accomplished by simply removing any CRLF that is immediately followed by WSP. Each header field should be treated in its unfolded form for further syntactic and semantic evaluation.
This is not to detract from the fact that NickW unerringly pointed me at pretty much exactly what I needed to know, only to help this answer stay relevant for anyone who might stumble across it in the future.
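The unfolding rule quoted above is small enough to sketch in a few lines of Python. This is only an illustration under stated assumptions, not any real MTA's or gateway's code: the names FOLD, unfold and parse_headers are made up, and the sample header block is invented to include a colon after the fold, which is the case that was being truncated.

```python
import re

# A fold is a CRLF immediately followed by WSP (space or horizontal tab).
# The lookahead leaves the WSP in place, so only the CRLF is removed.
FOLD = re.compile(r"\r\n(?=[ \t])")


def unfold(header_block: str) -> str:
    """Return the header block with every fold removed."""
    return FOLD.sub("", header_block)


def parse_headers(header_block: str) -> dict:
    """Unfold first, then split each logical line at its first colon.

    A dict is enough for this sketch; repeated fields would need a list.
    """
    fields = {}
    for line in unfold(header_block).split("\r\n"):
        if not line:
            continue
        name, _, body = line.partition(":")
        fields[name] = body.strip()
    return fields


# Invented example: the second Subject: line starts with a space (a fold)
# and contains a colon.
wire = (
    "To: someone@example.invalid\r\n"
    "Subject: 1234567 101234567 201234567 301234567 401234567 501234567\r\n"
    " 601234567 host: CRITICAL\r\n"
)
headers = parse_headers(wire)
print(headers["Subject"])
```

Because unfolding happens before any splitting on colons, the colon after the fold stays inside the Subject: body instead of being mistaken for the start of a new field.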
Solution 2:
The Sendmail server (SendMail) does impose SMTP line-length limits, but they are much higher (990 bytes or more for SMTP mailers).
SendMail != SendEmail
As I understand it, Nagios by default uses the SendEmail client to send emails. It seems that the email client you have Nagios use is what imposes such "harsh" limits on the length of the email header/subject line.
Check and report the email client configured in the commands.cfg configuration file (the notify-host-by-email and notify-service-by-email settings).
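To illustrate that folding a long Subject: is ordinary behaviour for software that composes mail, here is a sketch using Python's standard email package. It is only a stand-in for whatever client Nagios is configured to call, not a reproduction of mailx or SendEmail; it assumes the library's SMTP policy, which wraps header lines at 78 characters.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

SUBJECT = (
    "1234567 101234567 201234567 301234567 401234567 "
    "501234567 601234567 701234567 801234567 90123456789"
)

# Compose a message the way a mail-sending client library might.
msg = EmailMessage(policy=policy.SMTP)
msg["Subject"] = SUBJECT

# Serialise to wire format: CRLF line endings, long headers folded.
raw = msg.as_bytes()
print(raw)

# A parser that unfolds headers gets the original subject back.
parsed = message_from_bytes(raw, policy=policy.default)
print(parsed["Subject"])
```

The serialised bytes contain a CRLF followed by a space partway through the Subject: field, and parsing them back yields the original one-line subject, so whichever component adds the fold, a receiver that unfolds headers is unaffected.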