Rails - Mail, getting the body as Plain Text
Given: message = Mail.new(params[:message])
as seen here: http://docs.heroku.com/cloudmailin
It shows how to get the message.body as HTML. How do you get the plain-text version?
Thanks
The code above:
message = Mail.new(params[:message])
creates a new Mail::Message instance (from the mail gem) from the full message. You can then use any of the methods on that message to get the content. You can therefore get the plain content using:
message.text_part
or the HTML with
message.html_part
These methods simply find the first part in a multipart message whose content type is text/plain or text/html, respectively. CloudMailin also provides these as convenience parameters, params[:plain] and params[:html]. It's worth remembering that the message is never guaranteed to have a plain or HTML part. It may be worth using something like the following to be sure:
plain_part = message.multipart? ? (message.text_part ? message.text_part.body.decoded : nil) : message.body.decoded
html_part = message.html_part ? message.html_part.body.decoded : nil
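The same guard can be wrapped in a small helper. This is only a sketch: plain_text_of is a hypothetical name, and it assumes the object passed in behaves like a Mail::Message (it responds to multipart?, text_part, mime_type and body). Unlike the one-liner above, it also returns nil for a single-part message whose type is something other than text/plain, for example an HTML-only mail.

```ruby
# Hypothetical helper, not part of the mail gem.
# Returns the decoded plain-text body, or nil if the message has none.
def plain_text_of(message)
  if message.multipart?
    part = message.text_part
    part ? part.body.decoded : nil
  elsif message.mime_type.nil? || message.mime_type == 'text/plain'
    # A message without a Content-Type header is treated as plain text.
    message.body.decoded
  end
end

# Assumed usage inside the controller from the question:
#   plain = plain_text_of(Mail.new(params[:message]))
```

Whether a single-part HTML mail should yield nil or its raw markup is a design choice; the version here prefers nil so that callers never store HTML as plain text by accident.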
As a side note it's also important to extract the content encoding from the message when you use these methods and make sure that the output is encoded into the encoding method you desire (such as UTF-8).
What is Mail?
The message defined in the question appears to be an instance of the same Mail or Mail::Message class, which is also used in ActionMailer::Base, or in the mailman gem.
I'm not sure where this is integrated into rails, but Steve Smith has pointed out that this is defined in the mail gem.
- Usage section of the gem's readme on GitHub.
- Documentation of Mail::Message on rubydoc.info.
Extracting a Part From a Multipart Email
In the gem's readme, there is an example section on reading multipart emails.
Besides the methods html_part and text_part, which simply find the first part of the corresponding MIME type, one can access and loop through the parts manually and filter by whatever criteria are needed.
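message.parts.each do |part|
  # mime_type returns the bare type; content_type would also include
  # parameters such as "; charset=UTF-8" and fail an exact comparison
  if part.mime_type == 'text/plain'
    # ...
  elsif part.mime_type == 'text/html'
    # ...
  end
end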
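The loop above only visits the top-level parts, while real mails often nest them (for example a multipart/alternative container inside a multipart/mixed one). A recursive walk is one way to reach every leaf. The helper names leaf_parts and bodies_by_mime_type are hypothetical, and the sketch assumes each object responds to multipart?, parts, mime_type and body, as Mail::Message and Mail::Part do. It compares on mime_type rather than content_type because the latter can carry parameters such as the charset.

```ruby
# Hypothetical helpers, not part of the mail gem.
# Collects every non-multipart part, descending into nested containers.
def leaf_parts(message)
  return [message] unless message.multipart?

  message.parts.flat_map { |part| leaf_parts(part) }
end

# Groups the decoded bodies by bare MIME type, e.g. "text/plain".
def bodies_by_mime_type(message)
  leaf_parts(message)
    .group_by(&:mime_type)
    .transform_values { |parts| parts.map { |part| part.body.decoded } }
end

# Assumed usage:
#   bodies = bodies_by_mime_type(message)
#   plain  = bodies.fetch('text/plain', []).first
```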
The Mail::Part class is documented here.
Encoding Issues
Depending on the source of the received mail, there might be encoding issues. For example, Rails could identify the wrong encoding type. If one then tries to convert the body to UTF-8 in order to store it in the database (body_string.encode('UTF-8')), there might be encoding errors like
Encoding::UndefinedConversionError - "\xFC" from ASCII-8BIT to UTF-8
(like in this SO question).
To work around this, one can read out the charset from the message part and tell Ruby which charset the bytes are in before encoding to UTF-8:
encoding = part_to_use.content_type_parameters['charset']
body = part_to_use.body.decoded.force_encoding(encoding).encode('UTF-8')
Here, body.decoded returns the body content without the header lines and with any transfer encoding (such as quoted-printable or base64) undone, as shown in the encoding section of the mail gem's readme.
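The error and the fix can be reproduced without any mail at all, using only core Ruby. In this sketch the byte 0xFC stands in for a decoded body, ISO-8859-1 is assumed as the charset the sender declared, and the method names naive_utf8 and utf8_from are made up for illustration.

```ruby
# Returns the converted string, or the exception raised by the conversion.
def naive_utf8(raw)
  raw.encode('UTF-8')
rescue Encoding::UndefinedConversionError => e
  e
end

# Labels the bytes with their real charset first, then transcodes.
def utf8_from(raw, charset)
  raw.dup.force_encoding(charset).encode('UTF-8')
end

raw = "\xFC".b # one byte, tagged ASCII-8BIT; it is "ü" in ISO-8859-1

puts naive_utf8(raw).class        # the conversion error from above
puts utf8_from(raw, 'ISO-8859-1') # the same byte, now valid UTF-8
```

force_encoding does not change any bytes; it only tells Ruby how to interpret them, which is why it has to come before encode.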
EDIT: Hard Encoding Issues
If there are really hard encoding issues that the former approach does not solve, have a look at the excellent charlock_holmes gem.
After adding this gem to the Gemfile, there is a more reliable way to convert email encodings, using the detect_encoding method, which is added to Strings by this gem.
I found it helpful to define a body_in_utf8 method for mail messages (Mail::Part also inherits from Mail::Message):
module Mail
  class Message
    def body_in_utf8
      require 'charlock_holmes/string'
      body = self.body.decoded
      if body.present?
        encoding = body.detect_encoding[:encoding]
        body = body.force_encoding(encoding).encode('UTF-8')
      end
      return body
    end
  end
end
Summary
# select the part to use, either as shown above, or as a one-liner
part_to_use = message.html_part || message.text_part || message
# read out the encoding (charset) of the part
encoding = part_to_use.content_type_parameters['charset'] if part_to_use.content_type_parameters
# get the message body without the header information
body = part_to_use.body.decoded
# and convert it to UTF-8
body = body.force_encoding(encoding).encode('UTF-8') if encoding
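The snippet above leaves the body unconverted when the part declares no charset, and force_encoding raises an ArgumentError when the declared charset name is unknown to Ruby. One possible way to cover both cases is a helper that falls back to UTF-8 and replaces bytes it cannot map. The name safe_utf8 and the fallback policy are assumptions for illustration, not something the mail gem provides, and the sketch does not try to cover every exotic charset.

```ruby
# Hypothetical helper: convert raw body bytes to UTF-8 without raising
# on a missing or unknown charset label.
def safe_utf8(raw, charset = nil)
  source = begin
    charset ? Encoding.find(charset) : Encoding::UTF_8
  rescue ArgumentError
    Encoding::UTF_8 # unknown charset label: assume UTF-8
  end

  raw.dup
     .force_encoding(source)
     .encode('UTF-8', invalid: :replace, undef: :replace)
     .scrub('?') # catches invalid bytes when source is already UTF-8
end

# Assumed usage with the variables from the summary:
#   body = safe_utf8(part_to_use.body.decoded, encoding)
```

Replacing unmappable bytes loses information, so this suits display or search indexing better than archival storage, where keeping the original bytes alongside may be preferable.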
EDIT: Or, after defining a body_in_utf8 method as shown above, the same as a one-liner:
(message.html_part || message.text_part || message).body_in_utf8