How do you make strings "XML safe"?

I am responding to an AJAX call by sending back an XML document that I build with PHP echo statements. To form this XML document, I loop through the records of a database. The problem is that some records contain '<' characters, so the browser reports a parse error at that particular spot. How can this be fixed?


Solution 1:

Since PHP 5.4 you can use:

htmlspecialchars($string, ENT_XML1);

You should also specify the encoding explicitly, for example:

htmlspecialchars($string, ENT_XML1, 'UTF-8');

Update

Note that the above will only convert:

  • & to &amp;
  • < to &lt;
  • > to &gt;

If you want to escape text for use in an attribute enclosed in double quotes:

htmlspecialchars($string, ENT_XML1 | ENT_COMPAT, 'UTF-8');

will convert " to &quot; in addition to &, < and >.


And if your attributes are enclosed in single quotes:

htmlspecialchars($string, ENT_XML1 | ENT_QUOTES, 'UTF-8');

will convert ' to &apos; in addition to &, <, > and ".

(Of course, you can use these flags outside of attributes as well.)
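To tie the variants above together, here is a minimal sketch of how the AJAX response might be built. The helper names xml_text and xml_attr, the element names, and the sample rows are all made up for illustration; the type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helpers; the names xml_text and xml_attr are illustrative only.

// Escape a value for use as element text: converts &, < and >.
function xml_text(string $value): string
{
    return htmlspecialchars($value, ENT_XML1, 'UTF-8');
}

// Escape a value for use inside a quoted attribute: also converts " and '.
function xml_attr(string $value): string
{
    return htmlspecialchars($value, ENT_XML1 | ENT_QUOTES, 'UTF-8');
}

// Sample rows standing in for the database records (made-up data).
$rows = [
    ['id' => '1', 'label' => 'a < b & c'],
    ['id' => '2', 'label' => 'say "hi"'],
];

$xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n<records>\n";
foreach ($rows as $row) {
    $xml .= '  <record id="' . xml_attr($row['id']) . '">'
          . xml_text($row['label']) . "</record>\n";
}
$xml .= "</records>\n";

echo $xml;
```

Using ENT_QUOTES for every attribute is the cautious choice here, since it works whichever quote character encloses the attribute.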


See the manual entry for htmlspecialchars.

Solution 2:

Either escape those characters with htmlspecialchars or, perhaps more appropriately, use a library for building XML documents, such as DOMDocument or XMLWriter.
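As a rough illustration of the library route, the sketch below uses XMLWriter, which escapes text and attribute values itself. The element names and sample rows are assumptions, and the xmlwriter extension must be available.

```php
<?php
// Sample rows standing in for the database records (made-up data).
$rows = [
    ['id' => '1', 'label' => 'a < b & c'],
    ['id' => '2', 'label' => 'x > y'],
];

$writer = new XMLWriter();
$writer->openMemory();                          // build the document in memory
$writer->startDocument('1.0', 'UTF-8');
$writer->startElement('records');
foreach ($rows as $row) {
    $writer->startElement('record');
    $writer->writeAttribute('id', $row['id']);  // attribute value is escaped by the writer
    $writer->text($row['label']);               // text content is escaped by the writer
    $writer->endElement();                      // closes <record>
}
$writer->endElement();                          // closes <records>
$writer->endDocument();

$xml = $writer->outputMemory();
echo $xml;
```

With this approach there is no manual call to htmlspecialchars at all, so a forgotten escape cannot produce a malformed document.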

Another alternative would be to use CDATA sections, but then you'd have to look out for occurrences of ]]> in the data, since that sequence ends the section.
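One common way to handle ]]> is to split it across two adjacent CDATA sections. The following is a minimal sketch; the helper name xml_cdata is hypothetical.

```php
<?php
// Hypothetical helper: wrap a value in CDATA, splitting any "]]>" so the
// terminator never appears inside a single section.
function xml_cdata(string $value): string
{
    // "]]>" becomes "]]" + end of section + start of new section + ">".
    return '<![CDATA[' . str_replace(']]>', ']]]]><![CDATA[>', $value) . ']]>';
}

echo xml_cdata('a < b'), "\n";
echo xml_cdata('x]]>y'), "\n";
```

A parser concatenates the adjacent sections, so the original value comes back unchanged.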

Also keep in mind that you must respect the encoding you declare for the XML document (UTF-8 by default).
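To show why the encoding matters, here is a small sketch assuming the document is declared as UTF-8. When the input is not valid UTF-8, htmlspecialchars returns an empty string unless a flag such as ENT_SUBSTITUTE is given. The helper name xml_text_utf8 is hypothetical.

```php
<?php
// Hypothetical helper, assuming the XML document is declared as UTF-8.
// ENT_SUBSTITUTE replaces invalid byte sequences with U+FFFD instead of
// letting htmlspecialchars return an empty string.
function xml_text_utf8(string $value): string
{
    return htmlspecialchars($value, ENT_XML1 | ENT_SUBSTITUTE, 'UTF-8');
}

// "é" stored as a single ISO-8859-1 byte, which is not valid UTF-8.
$latin1 = "caf\xE9 < bar";

$plain = htmlspecialchars($latin1, ENT_XML1, 'UTF-8'); // empty string: input rejected
$safe  = xml_text_utf8($latin1);                       // invalid byte substituted

var_dump($plain === '');
var_dump(preg_match('//u', $safe) === 1);              // checks that $safe is valid UTF-8
```

Substitution only keeps the output well-formed; if the database really stores another encoding, converting the data to UTF-8 before escaping is probably the better fix.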