htmlentities in PHP but preserving html tags
You can get the list of character => entity correspondences used by htmlentities with the function get_html_translation_table; consider this code:
$list = get_html_translation_table(HTML_ENTITIES);
var_dump($list);
(You might want to check the second parameter of that function in the manual; you may need to set it to something other than the default value.)
It will get you something like this :
array
' ' => string '&nbsp;' (length=6)
'¡' => string '&iexcl;' (length=7)
'¢' => string '&cent;' (length=6)
'£' => string '&pound;' (length=7)
'¤' => string '&curren;' (length=8)
....
....
....
'ÿ' => string '&yuml;' (length=6)
'"' => string '&quot;' (length=6)
'<' => string '&lt;' (length=4)
'>' => string '&gt;' (length=4)
'&' => string '&amp;' (length=5)
Now, remove the correspondences you don't want:
unset($list['"']);
unset($list['<']);
unset($list['>']);
unset($list['&']);
Your list now has all the character => entity correspondences used by htmlentities, except for the few characters you don't want to encode.
And now, you just have to extract the list of keys and values :
$search = array_keys($list);
$values = array_values($list);
And, finally, you can use str_replace to do the replacement :
$str_in = '<p><font style="color:#FF0000">Camión español</font></p>';
$str_out = str_replace($search, $values, $str_in);
var_dump($str_out);
And you get :
string '<p><font style="color:#FF0000">Cami&Atilde;&sup3;n espa&Atilde;&plusmn;ol</font></p>' (length=84)
Which looks like what you wanted ;-)
Edit : well, except for the encoding problem (damn UTF-8, I suppose -- I'm trying to find a solution for that, and will edit again)
Second edit, a couple of minutes later: it seems you'll have to use utf8_encode on the $search list before calling str_replace :-(
Which means using something like this :
$search = array_map('utf8_encode', $search);
between the call to array_keys and the call to str_replace.
And, this time, you should really get what you wanted :
string '<p><font style="color:#FF0000">Cami&oacute;n espa&ntilde;ol</font></p>' (length=70)
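On more recent PHP versions the utf8_encode step can probably be avoided: get_html_translation_table accepts an encoding as its third parameter (since PHP 5.3.4, and it defaults to UTF-8 since PHP 5.4), and utf8_encode itself is deprecated as of PHP 8.2. A minimal sketch, assuming the source string and the script file are both UTF-8:

```php
<?php
// Assumption: PHP 5.3.4+ and UTF-8 input. Ask for the table directly in UTF-8,
// so the keys already match the bytes found in the string.
$list = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES | ENT_HTML401, 'UTF-8');

// Drop the characters that make up the markup itself.
unset($list['"'], $list["'"], $list['<'], $list['>'], $list['&']);

$str_in  = '<p><font style="color:#FF0000">Camión español</font></p>';
$str_out = str_replace(array_keys($list), array_values($list), $str_in);

echo $str_out, "\n";
```

With that table no utf8_encode call is needed, and running utf8_encode on keys that are already UTF-8 would corrupt them.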
And here is the full portion of code :
$list = get_html_translation_table(HTML_ENTITIES);
unset($list['"']);
unset($list['<']);
unset($list['>']);
unset($list['&']);
$search = array_keys($list);
$values = array_values($list);
$search = array_map('utf8_encode', $search);
$str_in = '<p><font style="color:#FF0000">Camión español</font></p>';
$str_out = str_replace($search, $values, $str_in);
var_dump($str_in, $str_out);
And the full output :
string '<p><font style="color:#FF0000">Camión español</font></p>' (length=58)
string '<p><font style="color:#FF0000">Cami&oacute;n espa&ntilde;ol</font></p>' (length=70)
This time, it should be ok ^^
It doesn't really fit in one line, and it might not be the most optimized solution; but it should work fine, and it has the advantage of letting you add or remove any character => entity correspondence as needed.
Have fun !
Might not be terribly efficient, but it works
$sample = '<p><font style="color:#FF0000">Camión español</font></p>';
echo htmlspecialchars_decode(
htmlentities($sample, ENT_NOQUOTES, 'UTF-8', false)
, ENT_NOQUOTES
);
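To see why this round trip works, it may help to split it into its two steps. The sketch below uses the same calls as above; the intermediate variable names are only for illustration:

```php
<?php
$sample = '<p><font style="color:#FF0000">Camión español</font></p>';

// Step 1: encode everything, markup included. ENT_NOQUOTES leaves quotes alone,
// and the final `false` avoids double-encoding entities already in the input.
$encoded = htmlentities($sample, ENT_NOQUOTES, 'UTF-8', false);

// Step 2: turn only &lt;, &gt; and &amp; back into <, > and &.
// Named entities such as &oacute; are not touched by htmlspecialchars_decode.
$restored = htmlspecialchars_decode($encoded, ENT_NOQUOTES);

echo $encoded, "\n", $restored, "\n";
```

One caveat: an entity such as &lt; that was already present in the input survives step 1 unchanged and is then decoded to < in step 2, so text that was pre-encoded on purpose does not round-trip.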
This is an optimized version of the accepted answer.
$list = get_html_translation_table(HTML_ENTITIES);
unset($list['"']);
unset($list['<']);
unset($list['>']);
unset($list['&']);
$string = strtr($string, $list);
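The same idea can be wrapped in a small function. This is only a sketch: encode_except and its $keep parameter are hypothetical names, and it assumes PHP 7+ (for the type declarations) and UTF-8 input.

```php
<?php
/**
 * Hypothetical helper: encode every character htmlentities would,
 * except the ones listed in $keep.
 */
function encode_except(string $string, array $keep = ['"', "'", '<', '>', '&']): string
{
    // Assumption: the input is UTF-8, so request the table in UTF-8 as well.
    $list = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES | ENT_HTML401, 'UTF-8');

    foreach ($keep as $char) {
        unset($list[$char]);
    }

    // strtr scans the input once and never re-replaces its own output.
    return strtr($string, $list);
}

echo encode_except('<p>Camión & español</p>'), "\n";
```

strtr takes the table as is, which is why the array_keys / array_values split is no longer needed.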
No solution short of a parser is going to be correct for all cases. Yours is a good case:
<p><font style="color:#FF0000">Camión español</font></p>
but do you also want to support:
<p><font>true if 5 < a && name == "joe"</font></p>
where you want it to come out as:
<p><font>true if 5 &lt; a &amp;&amp; name == &quot;joe&quot;</font></p>
Question: can you do the encoding BEFORE you build the HTML? In other words, can you do something like:
'<p><font>' . htmlentities($inner) . '</font></p>'
You'll save yourself lots of grief if you can do that. If you can't, you'll need some way to skip encoding <, >, and " (as described above), or simply encode it all and then undo it (e.g. replace('&lt;', '<')).
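A minimal sketch of the encode-first approach, assuming the text is available separately from the markup ($inner is a hypothetical variable holding it) and is UTF-8:

```php
<?php
// $inner stands for the plain-text content, before any markup is added.
$inner = 'true if 5 < a && name == "joe"';

// Encode the text first, then wrap it in markup that is never encoded.
$html = '<p><font>' . htmlentities($inner, ENT_QUOTES, 'UTF-8') . '</font></p>';

echo $html, "\n";
```

Because only the text passes through htmlentities, the tags never need to be told apart from the content, which is the part that cannot be done reliably after the fact.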