Rails - Mail, getting the body as Plain Text

Given: message = Mail.new(params[:message])

as seen here: http://docs.heroku.com/cloudmailin

It shows how to get the message.body as HTML. How do you get the text/plain version?

Thanks


The code above:

message = Mail.new(params[:message])

creates a new Mail::Message instance (from the mail gem) from the full raw message. You can then use any of the methods on that object to get at the content. For example, you can get the plain-text part using:

message.text_part

or the HTML with

message.html_part

These methods simply find the first part in a multipart message with a text/plain or text/html content type. Each returns a part object (or nil if there is no such part), so call body.decoded on the result to get a string. CloudMailin also provides the same content as convenience parameters, params[:plain] and params[:html]. Remember that a message is never guaranteed to have a plain or an HTML part, so it may be worth using something like the following to be sure:

plain_part = message.multipart? ? (message.text_part ? message.text_part.body.decoded : nil) : message.body.decoded
html_part = message.html_part ? message.html_part.body.decoded : nil

As a side note, it's also important to read the character set declared in the message when you use these methods, and to make sure that the output is converted to the encoding you want (such as UTF-8).


What is Mail?

The message defined in the question appears to be an instance of the same Mail or Mail::Message class, which is also used in ActionMailer::Base, or in the mailman gem.

I'm not sure where this is integrated into Rails, but Steve Smith has pointed out that it is defined in the mail gem.

  • Usage Section of the gem's readme on github.
  • Documentation of Mail::Message on rubydoc.info.

Extracting a Part From a Multipart Email

In the gem's readme, there is an example section on reading multipart emails.

Besides the methods html_part and text_part, which simply find the first part of the corresponding mime type, one can access and loop through the parts manually and filter by the criteria as needed.

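message.parts.each do |part|
  # mime_type is the bare MIME type; content_type may also carry
  # parameters such as "; charset=UTF-8", so == would not match.
  if part.mime_type == 'text/plain'
    # ...
  elsif part.mime_type == 'text/html'
    # ...
  end
end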

The Mail::Part is documented here.

Encoding Issues

Depending on the source of the received mail, there might be encoding issues. For example, the body string could be tagged with the wrong encoding. If one then tries to convert the body to UTF-8 in order to store it in the database (body_string.encode('UTF-8')), errors like the following can occur:

Encoding::UndefinedConversionError - "\xFC" from ASCII-8BIT to UTF-8

(like in this SO question).
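The error can be reproduced with plain Ruby, without the mail gem. The following is a minimal sketch that assumes the bytes are really ISO-8859-1 (an assumption made for this illustration, the original mail may use any charset):

```ruby
# Raw bytes as they might arrive from a Latin-1 mail: "Gr" + 0xFC + "n".
raw = "Gr\xFCn".b # .b tags the string as ASCII-8BIT (binary)

# Converting binary data directly fails: Ruby cannot map byte 0xFC.
error = begin
  raw.encode('UTF-8')
  nil
rescue Encoding::UndefinedConversionError => e
  e
end

# Relabel the bytes with their real charset first, then convert.
fixed = raw.dup.force_encoding('ISO-8859-1').encode('UTF-8')
# fixed => "Grün"
```

Note that force_encoding only changes the label on the string, while encode actually rewrites the bytes.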

To work around this, one can read the charset from the message part and tell Ruby which encoding the bytes are actually in before converting to UTF-8:

encoding = part_to_use.content_type_parameters['charset']
body = part_to_use.body.decoded.force_encoding(encoding).encode('UTF-8')

Here, body.decoded returns only the body content (no header lines), with the transfer encoding (for example base64 or quoted-printable) undone, as shown in the encoding section of the mail gem's readme.
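The declared charset can be missing or be a name Ruby does not know, in which case force_encoding raises. A defensive variant might look like the sketch below. The helper name to_utf8 and the UTF-8 fallback are assumptions made for this example, not part of the mail gem:

```ruby
# Hypothetical helper: convert raw body bytes to UTF-8 using the charset
# declared in the mail, falling back to a default when it is unusable.
def to_utf8(raw, charset, fallback: Encoding::UTF_8)
  source = begin
    charset ? Encoding.find(charset) : fallback
  rescue ArgumentError
    fallback # charset name that Ruby does not know
  end

  converted = raw.dup.force_encoding(source).encode(
    Encoding::UTF_8, invalid: :replace, undef: :replace, replace: '?'
  )
  # scrub also covers the case where source and target are both UTF-8
  converted.scrub('?')
end
```

With a mail part this could be called as to_utf8(part.body.decoded, charset), where charset is looked up as shown above. Bytes that cannot be converted become "?" instead of raising, which trades accuracy for robustness.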

EDIT: Hard Encoding Issues

If there are really hard encoding issues that the approach above does not solve, have a look at the excellent charlock_holmes gem.

After adding this gem to the Gemfile, there is a more reliable way to convert email encodings: the detect_encoding method, which the gem adds to String once charlock_holmes/string has been required.

I found it helpful to define a body_in_utf8 method for mail messages. (Mail::Part inherits from Mail::Message, so parts get the method too.)

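require 'charlock_holmes/string' # adds String#detect_encoding

module Mail
  class Message
    # Returns the decoded body converted to UTF-8.
    def body_in_utf8
      body = self.body.decoded
      if body.present? # present? comes from ActiveSupport (available in Rails)
        detected = body.detect_encoding
        encoding = detected && detected[:encoding]
        # skip the conversion if no encoding was detected (e.g. binary data)
        body = body.force_encoding(encoding).encode('UTF-8') if encoding
      end
      body
    end
  end
end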

Summary

# select the part to use, either as shown above or as a one-liner
part_to_use = message.html_part || message.text_part || message

# read out the encoding (charset) of the part
encoding = part_to_use.content_type_parameters['charset'] if part_to_use.content_type_parameters

# get the message body without the header information
body = part_to_use.body.decoded

# and convert it to UTF-8
body = body.force_encoding(encoding).encode('UTF-8') if encoding

EDIT: Or, after defining a body_in_utf8 method as shown above, the same as a one-liner:

(message.html_part || message.text_part || message).body_in_utf8