How do I encode/decode HTML entities in Ruby?
To encode the characters, you can use CGI.escapeHTML:
string = CGI.escapeHTML('test "escaping" <characters>')
To decode them, there is CGI.unescapeHTML:
CGI.unescapeHTML("test &quot;unescaping&quot; &lt;characters&gt;")
Of course, before that you need to include the CGI library:
require 'cgi'
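Putting those pieces together, here is a minimal round-trip sketch using only the standard library. The variable names are just for illustration. One thing worth knowing: CGI.unescapeHTML only understands the basic named entities (such as &amp;, &lt;, &gt;, &quot;) plus numeric references, so something like &auml; comes back untouched.

```ruby
require 'cgi'

original = 'test "escaping" <characters>'

# Encode: &, <, >, " and ' are replaced with entities.
escaped = CGI.escapeHTML(original)
puts escaped    # test &quot;escaping&quot; &lt;characters&gt;

# Decode: turns the entities back into literal characters.
restored = CGI.unescapeHTML(escaped)
puts restored == original   # true

# Named entities outside the basic set are not decoded by CGI.
untouched = CGI.unescapeHTML('b&auml;r')
puts untouched  # b&auml;r
```

If you need the full set of named entities decoded, that is where the gem-based answers in this thread come in.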
And if you're in Rails, you don't need to use CGI to encode the string. There's the h method.
<%= h 'escaping <html>' %>
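For comparison outside a Rails view: as far as I know, h is an alias for html_escape in ERB::Util, which ships with Ruby's standard library, so a rough sketch of what the helper does looks like this. Rails' own version additionally returns an HTML-safe string, and Rails 3 and later escape <%= %> output automatically, so treat this as an illustration rather than Rails' exact behaviour.

```ruby
require 'erb'

# ERB::Util.html_escape (aliased as ERB::Util.h) escapes &, <, >, " and '.
escaped = ERB::Util.html_escape('escaping <html>')
puts escaped   # escaping &lt;html&gt;

# The short alias gives the same result.
same = ERB::Util.h('escaping <html>')
puts same == escaped   # true
```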
HTMLEntities can do it:
: jmglov@laurana; sudo gem install htmlentities
Successfully installed htmlentities-4.2.4
: jmglov@laurana; irb
irb(main):001:0> require 'htmlentities'
=> []
irb(main):002:0> HTMLEntities.new.decode "&iexcl;I&#39;m highly annoyed with character references!"
=> "¡I'm highly annoyed with character references!"
I think the Nokogiri gem is also a good choice. It is very stable and has a huge contributing community.
Samples:
require 'nokogiri'
a = Nokogiri::HTML.parse "foo&nbsp;b&auml;r"
a.text
=> "foo bär"
or
a = Nokogiri::HTML.parse "&iexcl;I&#39;m highly annoyed with character references!"
a.text
=> "¡I'm highly annoyed with character references!"
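To output a string in a Rails view without the automatic escaping, use raw (it does not decode entities itself, it only skips the escaping step):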
<%= raw '<html>' %>
So,
<%= raw '<br>' %>
would output
<br>
If you don't want to add a new dependency just to do this (like HTMLEntities) and you're already using Hpricot, it can both escape and unescape for you. It handles much more than CGI:
Hpricot.uxs "foo&nbsp;b&auml;r"
=> "foo bär"