Generating XML document in PHP (escape characters)

I'm generating an XML document from a PHP script and I need to escape the XML special characters. I know the list of characters that should be escaped, but what is the correct way to do it?

Should the characters just be escaped with a backslash (\'), or is there a proper way? Is there a built-in PHP function that can handle this for me?


Solution 1:

I created a simple function that escapes the five "predefined entities" of XML:

function xml_entities($string) {
    return strtr(
        $string, 
        array(
            "<" => "&lt;",
            ">" => "&gt;",
            '"' => "&quot;",
            "'" => "&apos;",
            "&" => "&amp;",
        )
    );
}

Usage example:

$text = "Test &amp; <b> and encode </b> :)";
echo xml_entities($text);

Output:

Test &amp;amp; &lt;b&gt; and encode &lt;/b&gt; :)

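A similar effect can be achieved with str_replace, but it is fragile: str_replace applies each replacement in turn to the result of the previous one, so "&" has to come first in the list, otherwise the ampersands of entities that were already inserted get escaped a second time. strtr does not rescan text it has already replaced, so it has no such problem (untested, not recommended):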

function xml_entities($string) {
    return str_replace(
        array("&",     "<",    ">",    '"',      "'"),
        array("&amp;", "&lt;", "&gt;", "&quot;", "&apos;"), 
        $string
    );
}
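
To make the double-replacement problem concrete, here is a minimal sketch. The function names xml_entities_wrong_order and xml_entities_strtr are hypothetical and exist only for this illustration; the first one deliberately puts "&" last in the search list.

```php
<?php
// Hypothetical helper: "&" is deliberately placed LAST in the search
// list to show how str_replace() re-escapes its own output.
function xml_entities_wrong_order($string) {
    return str_replace(
        array("<",    ">",    "&"),
        array("&lt;", "&gt;", "&amp;"),
        $string
    );
}

// Hypothetical helper: strtr() with the same pairs in the same order.
// strtr() never rescans text it has already replaced.
function xml_entities_strtr($string) {
    return strtr($string, array("<" => "&lt;", ">" => "&gt;", "&" => "&amp;"));
}

$wrong = xml_entities_wrong_order("a < b");
$right = xml_entities_strtr("a < b");

echo $wrong, "\n"; // a &amp;lt; b
echo $right, "\n"; // a &lt; b
```

With str_replace the result depends on the order of the search array; with strtr it does not, which is why the strtr version is the safer of the two.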

Solution 2:

Use the DOM classes to generate the whole XML document. They handle the encoding and decoding for you, so you don't have to care about escaping at all.
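
As a rough sketch of what that can look like, the following builds a tiny document with DOMDocument. The element and attribute names (messages, message, author) are made up for the example, and UTF-8 is an assumed encoding.

```php
<?php
// Element and attribute names are invented for this illustration.
$doc = new DOMDocument('1.0', 'UTF-8');

$root = $doc->createElement('messages');
$doc->appendChild($root);

$message = $doc->createElement('message');
// Attribute values are passed raw; they are escaped when serialized.
$message->setAttribute('author', 'Tom & Jerry');
// createTextNode() also takes raw text and escapes it on output.
$message->appendChild($doc->createTextNode('Test & <b> and encode </b> :)'));
$root->appendChild($message);

$xml = $doc->saveXML();
echo $xml;
```

The text goes through createTextNode() rather than the second argument of createElement() on purpose: as far as the PHP manual describes it, the value passed to createElement() does not have its ampersands escaped for you.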


Edit: This was criticized by @Tchalvak:

The DOM object creates a full XML document, it doesn't easily lend itself to just encoding a string on its own.

That is wrong: DOMDocument can properly output just a fragment rather than the whole document:

$doc->saveXML($fragment);

which gives:

Test &amp; <b> and encode </b> :)
Test &amp;amp; &lt;b&gt; and encode &lt;/b&gt; :)

as in:

$doc = new DOMDocument();
$fragment = $doc->createDocumentFragment();

// adding XML verbatim:
$xml = "Test &amp; <b> and encode </b> :)\n";
$fragment->appendXML($xml);

// adding text:
$text = $xml;
$fragment->appendChild($doc->createTextNode($text));

// output the result
echo $doc->saveXML($fragment);

Solution 3:

What about the htmlspecialchars() function?

htmlspecialchars($input, ENT_QUOTES | ENT_XML1, $encoding);

Note: the ENT_XML1 flag is only available if you have PHP 5.4.0 or higher.

htmlspecialchars() with these parameters replaces the following characters:

  • & (ampersand) becomes &amp;
  • " (double quote) becomes &quot;
  • ' (single quote) becomes &apos;
  • < (less than) becomes &lt;
  • > (greater than) becomes &gt;
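
A short usage sketch of the call above. The sample string and the choice of 'UTF-8' for $encoding are assumptions for the example; use whatever encoding your document declares.

```php
<?php
$input    = "Test & <b> and 'encode' \"this\" </b> :)";
$encoding = 'UTF-8'; // assumed encoding for this example

// ENT_QUOTES escapes both quote styles; ENT_XML1 selects XML 1.0
// entities, so the single quote becomes &apos;.
$escaped = htmlspecialchars($input, ENT_QUOTES | ENT_XML1, $encoding);

echo $escaped, "\n";
// Test &amp; &lt;b&gt; and &apos;encode&apos; &quot;this&quot; &lt;/b&gt; :)
```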

You can inspect the translation table with the get_html_translation_table() function, passing HTML_SPECIALCHARS and the same flags.
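
For instance, a minimal sketch that dumps the table for these flags and checks that applying it with strtr() matches htmlspecialchars(). The variable names and the sample string are only illustrative.

```php
<?php
$flags = ENT_QUOTES | ENT_XML1;

// HTML_SPECIALCHARS selects the table used by htmlspecialchars().
$table = get_html_translation_table(HTML_SPECIALCHARS, $flags);
print_r($table);

// Applying the table by hand should agree with the built-in function.
$input       = "Tom & 'Jerry' <b>";
$viaTable    = strtr($input, $table);
$viaFunction = htmlspecialchars($input, $flags, 'UTF-8');

var_dump($viaTable === $viaFunction);
```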