Do I really need to encode '&' as '&'?

Yes. Just as the error said, in HTML, attributes are #PCDATA meaning they're parsed. This means you can use character entities in the attributes. Using & by itself is wrong and if not for lenient browsers and the fact that this is HTML not XHTML, would break the parsing. Just escape it as & and everything would be fine.

HTML5 allows you to leave it unescaped, but only when the data that follows does not look like a valid character reference. However, it's better just to escape all instances of this symbol than worry about which ones should be and which ones don't need to be.

Keep this point in mind; if you're not escaping & to &, it's bad enough for data that you create (where the code could very well be invalid), you might also not be escaping tag delimiters, which is a huge problem for user-submitted data, which could very well lead to HTML and script injection, cookie stealing and other exploits.

Please just escape your code. It will save you a lot of trouble in the future.


Validation aside, the fact remains that encoding certain characters is important to an HTML document so that it can render properly and safely as a web page.

Encoding & as & under all circumstances, for me, is an easier rule to live by, reducing the likelihood of errors and failures.

Compare the following: which is easier? Which is easier to bugger up?

Methodology 1

  1. Write some content which includes ampersand characters.
  2. Encode them all.

Methodology 2

(with a grain of salt, please ;) )

  1. Write some content which includes ampersand characters.
  2. On a case-by-case basis, look at each ampersand. Determine if:
  • It is isolated, and as such unambiguously an ampersand. eg. volt & amp
     > In that case don't bother encoding it.
  • It is not isolated, but you feel it is nonetheless unambiguous, as the resulting entity does not exist and will never exist since the entity list could never evolve. E.g., amp&volt
     >. In that case, don't bother encoding it.
  • It is not isolated, and ambiguous. E.g., volt&amp
     > Encode it.

??