Generating XML document in PHP (escape characters)
I'm generating an XML document from a PHP script and I need to escape the XML special characters. I know the list of characters that should be escaped; but what is the correct way to do it?
Should the characters be escaped just with backslash (\') or what is the proper way? Is there any built-in PHP function that can handle this for me?
Solution 1:
I created a simple function that escapes the five "predefined entities" defined in XML:
function xml_entities($string) {
    return strtr(
        $string,
        array(
            "<" => "&lt;",
            ">" => "&gt;",
            '"' => "&quot;",
            "'" => "&apos;",
            "&" => "&amp;",
        )
    );
}
Usage example:
$text = "Test & <b> and encode </b> :)";
echo xml_entities($text);
Output:
Test &amp; &lt;b&gt; and encode &lt;/b&gt; :)
A similar effect can be achieved with str_replace, but it is fragile: the replacements are applied one after another, so a wrong order escapes already-escaped text a second time (untested, not recommended):
function xml_entities($string) {
    return str_replace(
        array("&", "<", ">", '"', "'"),
        array("&amp;", "&lt;", "&gt;", "&quot;", "&apos;"),
        $string
    );
}
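To make the double-replacement problem concrete, here is a minimal sketch comparing two replacement orders. The helper names escape_amp_first and escape_amp_last are made up for this illustration and only cover three of the five characters:

```php
<?php
// Hypothetical helpers, for illustration only.

// "&" is handled first, so entities inserted later are left alone.
function escape_amp_first(string $s): string {
    return str_replace(
        array("&", "<", ">"),
        array("&amp;", "&lt;", "&gt;"),
        $s
    );
}

// "&" is handled last, so the "&" inside the freshly inserted
// "&lt;" and "&gt;" gets escaped a second time.
function escape_amp_last(string $s): string {
    return str_replace(
        array("<", ">", "&"),
        array("&lt;", "&gt;", "&amp;"),
        $s
    );
}

$input = "a < b & c";
echo escape_amp_first($input), "\n"; // a &lt; b &amp; c
echo escape_amp_last($input), "\n";  // a &amp;lt; b &amp; c
```

strtr() with an array does not have this problem, because it never rescans text it has already replaced.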
Solution 2:
Use the DOM classes to generate your whole XML document. They take care of the escaping and unescaping, so you never have to deal with it by hand.
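As a rough sketch of what that looks like (the element name message is invented for this example, and the DOM extension is assumed to be enabled), text added through createTextNode() is escaped when the node is serialized and comes back unchanged when the XML is parsed again:

```php
<?php
$original = 'Test & <b> and encode </b> :)';

$doc = new DOMDocument('1.0', 'UTF-8');
$root = $doc->createElement('message'); // made-up element name
$doc->appendChild($root);

// The text node holds the raw string; escaping happens on output.
$root->appendChild($doc->createTextNode($original));

$xml = $doc->saveXML($root);
echo $xml, "\n";

// Round trip: parsing the serialized XML restores the raw string.
$parsed = new DOMDocument();
$parsed->loadXML($xml);
$roundTripped = $parsed->documentElement->textContent;
```

The point of the round trip is that the special characters are never escaped manually anywhere in the script.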
Edit: This was criticized by @Tchalvak:
The DOM object creates a full XML document, it doesn't easily lend itself to just encoding a string on its own.
That is not correct: DOMDocument can output just a fragment rather than the whole document:
$doc->saveXML($fragment);
which gives:
Test &amp; <b> and encode </b> :)
Test &amp;amp; &lt;b&gt; and encode &lt;/b&gt; :)
as in:
$doc = new DOMDocument();
$fragment = $doc->createDocumentFragment();
// adding XML verbatim:
$xml = "Test &amp; <b> and encode </b> :)\n";
$fragment->appendXML($xml);
// adding text:
$text = $xml;
$fragment->appendChild($doc->createTextNode($text));
// output the result
echo $doc->saveXML($fragment);
Solution 3:
What about the htmlspecialchars() function?
htmlspecialchars($input, ENT_QUOTES | ENT_XML1, $encoding);
Note: the ENT_XML1 flag is only available if you have PHP 5.4.0 or higher.
htmlspecialchars() with these parameters replaces the following characters:
- & (ampersand) becomes &amp;
- " (double quote) becomes &quot;
- ' (single quote) becomes &apos; (with ENT_XML1; the default ENT_HTML401 produces &#039; instead)
- < (less than) becomes &lt;
- > (greater than) becomes &gt;
You can get the translation table by using the get_html_translation_table() function.
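A minimal sketch of how this might be wrapped for reuse. The function name xml_escape and the UTF-8 default are assumptions made for this example; only the flags come from the call shown above:

```php
<?php
// Hypothetical wrapper; the name and the default encoding are assumptions.
function xml_escape(string $input, string $encoding = 'UTF-8'): string {
    return htmlspecialchars($input, ENT_QUOTES | ENT_XML1, $encoding);
}

echo xml_escape('Test & <b> "quoted"'), "\n";
// Test &amp; &lt;b&gt; &quot;quoted&quot;

// Inspect the exact mapping PHP uses for these flags.
$table = get_html_translation_table(HTML_SPECIALCHARS, ENT_QUOTES | ENT_XML1);
print_r($table);
```

Printing the table is a quick way to check which replacement your PHP version uses for each character before relying on it.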