HTML: Include, or exclude, optional closing tags?

Some HTML1 closing tags are optional, i.e.:

</HTML>
</HEAD>
</BODY>
</P>
</DT>
</DD>
</LI>
</OPTION>
</THEAD>
</TH>
</TBODY>
</TR>
</TD>
</TFOOT>
</COLGROUP>

Note: Not to be confused with closing tags that are forbidden to be included, i.e.:

</IMG>
</INPUT>
</BR>
</HR>
</FRAME>
</AREA>
</BASE>
</BASEFONT>
</COL>
</ISINDEX>
</LINK>
</META>
</PARAM>

Note: xhtml is different from HTML. xhtml is a form of xml, which requires every element have a closing tag. A closing tag can be forbidden in html, yet mandatory in xhtml.

Are the optional closing tags

  • ideally included, but we'll accept them if you forgot them, or
  • ideally not included, but we'll accept them if you put them in

In other words, should I include them, or should I not include them?

The HTML 4.01 spec talks about closing element tags being optional, but doesn't say if it's preferable to include them, or preferable to not include them.

On the other hand, a random article on DevGuru says:

The ending tag is optional. However, it is recommended that it be included.

The reason I ask is because you just know it's optional for compatibility reasons; and they would have made them (mandatory | forbidden) if they could have.

Put it another way: What did HTML 1, 2, 3 do with regards to these, now optional, closing tags. What does HTML 5 do? And what should I do?

Note

Some elements in HTML are forbidden from having closing tags. You may disagree with that, but that is the specification, and it's not up for debate. I'm asking about optional closing tags, and what the intention was.

Footnotes

1HTML 4.01


Solution 1:

There are cases where explicit tags help, but sometimes it's needless pedantry.

Note that the HTML spec clearly specifies when it's valid to omit tags, so it's not always an error.

For example you never need </body></html>. Nobody ever remembers to put <tbody> explicitly (to the point that XHTML made exceptions for it).

You don't need </head><body> unless you have DOM-manipulating scripts that actually search <head> (then it's better to close it explicitly, because rules for implied end of <head> could surprise you).

Nested lists are actually better off without </li>, because then it's harder to create erroneous ul > ul tree.

Valid:

<ul>
  <li>item
  <ul>
    <li>item
  </ul>
</ul>

Invalid:

<ul>
  <li>item</li>
  <ul>
    <li>item</li>
  </ul>
</ul>

And keep in mind that end tags are implied whether you try to close all elements or not. Putting end tags won't automatically make parsing more robust:

<p>foo <p>bar</p> baz</p>

will parse as:

<p>foo</p><p>bar</p> baz

It can only help when you validate documents.

Solution 2:

The optional ones are all ones that should be semantically clear where they end, without needing the end tag. E.G. each <li> implies a </li> if there isn't one right before it.

The forbidden end tags all would be immediately followed by their end tag so it would be kind of redundant to have to type <img src="blah" alt="blah"></img> every time.

I almost always use the optional tags (unless I have a very good reason not to) because it lends to more readable and updateable code.

Solution 3:

I am adding some links here to help you with the history of HTML, for you to understand the various contradictions. This is not the answer to your question, but you will know more after reading these various digests.

  • How Did We Get Here? – Dive Into HTML5
  • The History of the Web
  • Brief History of HTML
  • HTML’s History – HTML WG Wiki

Some excerpts from Dive Into HTML5:

[T]he fact that “broken” HTML markup still worked in web browsers led authors to create broken HTML pages. A lot of broken pages. By some estimates, over 99% of HTML pages on the web today have at least one error in them. But because these errors don’t cause browsers to display visible error messages, nobody ever fixes them.

The W3C saw this as a fundamental problem with the web, and they set out to correct it. XML, published in 1997, broke from the tradition of forgiving clients and mandated that all programs that consumed XML must treat so-called “well-formedness” errors as fatal. This concept of failing on the first error became known as “draconian error handling,” after the Greek leader Draco who instituted the death penalty for relatively minor infractions of his laws. When the W3C reformulated HTML as an XML vocabulary, they mandated that all documents served with the new application/xhtml+xml MIME type would be subject to draconian error handling. If there was even a single well-formedness error in your XHTML page […] web browsers would have no choice but to stop processing and display an error message to the end user.

This idea was not universally popular. With an estimated error rate of 99% on existing pages, the ever-present possibility of displaying errors to the end user, and the dearth of new features in XHTML 1.0 and 1.1 to justify the cost, web authors basically ignored application/xhtml+xml. But that doesn’t mean they ignored XHTML altogether. Oh, most definitely not. Appendix C of the XHTML 1.0 specification gave the web authors of the world a loophole: “Use something that looks kind of like XHTML syntax, but keep serving it with the text/html MIME type.” And that’s exactly what thousands of web developers did: they “upgraded” to XHTML syntax but kept serving it with a text/html MIME type.

Even today, millions of web pages claim to be XHTML. They start with the XHTML doctype on the first line, use lowercase tag names, use quotes around attribute values, and add a trailing slash after empty elements like <br /> and <hr />. But only a tiny fraction of these pages are served with the application/xhtml+xml MIME type that would trigger XML’s draconian error handling. Any page served with a MIME type of text/html — regardless of doctype, syntax, or coding style — will be parsed using a “forgiving” HTML parser, silently ignoring any markup errors, and never alerting end users (or anyone else) even if the pages are technically broken.

XHTML 1.0 included this loophole, but XHTML 1.1 closed it, and the never-finalized XHTML 2.0 continued the tradition of requiring draconian error handling. And that’s why there are billions of pages that claim to be XHTML 1.0, and only a handful that claim to be XHTML 1.1 (or XHTML 2.0). So are you really using XHTML? Check your MIME type. (Actually, if you don’t know what MIME type you’re using, I can pretty much guarantee that you’re still using text/html.) Unless you’re serving your pages with a MIME type of application/xhtml+xml, your so-called “XHTML” is XML in name only.

[T]he people who had proposed evolving HTML and HTML forms were faced with two choices: give up, or continue their work outside of the W3C. They chose the latter, registered the whatwg.org domain, and in June 2004, the WHAT Working Group was born.

[T]he WHAT working group was quietly working on a few other things, too. One of them was a specification, initially dubbed Web Forms 2.0, which added new types of controls to HTML forms. (You’ll learn more about web forms in A Form of Madness.) Another was a draft specification called “Web Applications 1.0,” which included major new features like a direct-mode drawing canvas and native support for audio and video without plugins.

In October 2009, the W3C shut down the XHTML 2 Working Group and issued this statement to explain their decision:

When W3C announced the HTML and XHTML 2 Working Groups in March 2007, we indicated that we would continue to monitor the market for XHTML 2. W3C recognizes the importance of a clear signal to the community about the future of HTML.

While we recognize the value of the XHTML 2 Working Group’s contributions over the years, after discussion with the participants, W3C management has decided to allow the Working Group’s charter to expire at the end of 2009 and not to renew it.

The ones that win are the ones that ship.