How can I parse an email to get its original recipient? [closed]

I have the source of an email and want to parse out the original recipient of that email.

Let's say "[email protected]" receives an email, but the "To" list mentions [email protected], [email protected] and [email protected]. I want to get only user1 from the email source.

In my initial analysis, email from an MDaemon server contains an "X-MDaemon-Deliver-To:" header. Similarly, email from a Dovecot mail server contains "Delivered-To:". But I have not found generic parsing logic to get the original email recipient.

How can I parse an email to get its original recipient?


Solution 1:

In the general case, it is not possible to do what you are asking for. It is also explicitly discouraged in the standard governing Internet email.

It might be possible in some scenarios, but those will be highly specific (likely depending on the particular software in use, its configuration, etc.).
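As a rough illustration of such a software-specific approach, here is a minimal Python sketch that looks for delivery headers some servers add. The function name `guess_delivery_recipients` and the header list are my own assumptions: "Delivered-To" and "X-MDaemon-Deliver-To" come from the question, while "X-Original-To" (Postfix) and "Envelope-To" (Exim) are additions. None of these headers is guaranteed to be present, so an empty result is a normal outcome.

```python
from email import message_from_string
from email.utils import getaddresses

# Non-standard headers that some servers add at delivery time.
# This list is an assumption, not a standard; any of them may be missing.
CANDIDATE_HEADERS = (
    "Delivered-To",
    "X-MDaemon-Deliver-To",
    "X-Original-To",
    "Envelope-To",
)


def guess_delivery_recipients(raw_source):
    """Best-effort guess at the delivery address(es) of a raw message.

    Returns the addresses from the first candidate header that is present,
    or an empty list if none of them appears in the message.
    """
    msg = message_from_string(raw_source)
    for name in CANDIDATE_HEADERS:
        values = msg.get_all(name)  # None when the header is absent
        if values:
            return [addr for _, addr in getaddresses(values) if addr]
    return []


raw = (
    "Delivered-To: user1@example.com\n"
    "To: user2@example.com, user3@example.com\n"
    "Subject: hello\n"
    "\n"
    "body\n"
)
print(guess_delivery_recipients(raw))
```

A message that passed through several hops may carry more than one such header, in which case all of the values are returned and you still have to decide which one you mean.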

The reason for this is that the email message (RFC 5322) does not contain all the transport-layer information (with SMTP being RFC 5321). Additionally, including all that information can very easily lead to information disclosure; see also RFC 7258.

The trivial case for illustrating this is if you are sending an email to multiple recipients on the same domain using the Bcc: field; in that case, the message (payload data including headers) as transmitted does not contain the envelope recipient information, and the trace headers do not normally contain the recipient addresses in that case. This means that parsing the recipient address out of the email becomes not just difficult, but outright impossible, since the information isn't even there. Other, perfectly valid, examples can be constructed as an extension to this example.
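The Bcc case can be sketched in Python without sending anything. All addresses below are made up, and `envelope_recipients` simply stands in for what would be passed in the SMTP RCPT commands (for example as the `to_addrs` argument of `smtplib.SMTP.sendmail`). The point is only that the serialized message and the envelope are separate things.

```python
from email.message import EmailMessage

# The message itself (headers + body); only the visible recipient is listed.
msg = EmailMessage()
msg["From"] = "sender@example.org"
msg["To"] = "visible@example.com"
msg["Subject"] = "Quarterly report"
msg.set_content("Numbers are in.")

# The envelope: every address that would appear in an SMTP RCPT command.
# The second address is a blind-copied recipient (hypothetical).
envelope_recipients = ["visible@example.com", "hidden@example.com"]

# What actually travels as the message payload.
wire_bytes = msg.as_bytes()

# Which envelope recipients can be found anywhere in the payload?
recoverable = [r for r in envelope_recipients if r.encode() in wire_bytes]
print(recoverable)  # ['visible@example.com']
```

The blind-copied address never appears in the payload, so no amount of parsing on the receiving side can recover it unless the receiving server chose to record it in a header of its own.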

Quoting RFC 5321 section 7.2:

There is no inherent relationship between either "reverse" (from MAIL, SAML, etc., commands) or "forward" (RCPT) addresses in the SMTP transaction ("envelope") and the addresses in the header section. Receiving systems SHOULD NOT attempt to deduce such relationships and use them to alter the header section of the message for delivery. The popular "Apparently-to" header field is a violation of this principle as well as a common source of unintended information disclosure and SHOULD NOT be used.

Note the definition of SHOULD NOT from RFC 2119:

  4. SHOULD NOT This phrase, or the phrase "NOT RECOMMENDED" mean that there may exist valid reasons in particular circumstances when the particular behavior is acceptable or even useful, but the full implications should be understood and the case carefully weighed before implementing any behavior described with this label.

Quoting RFC 7258 section 2:

To summarise: current capabilities permit some actors to monitor content and metadata across the Internet at a scale never before seen. This pervasive monitoring is an attack on Internet privacy. The IETF will strive to produce specifications that mitigate pervasive monitoring attacks.