HTML5: W3C vs WHATWG. Which gives the most authoritative spec?
I'm halfway through writing an HTML parser and found that HTML5 explicitly defines the rules for parsing ill-formed HTML. (I used to infer them from DTDs, sigh.)
I love that, but I know that HTML5 isn't finalized yet (I wonder if it ever will be), and that it isn't developed by the W3C but by the WHATWG.
Searching for the spec I need, I'm presented with:
- Section 8.2 of the W3C TR
http://www.w3.org/TR/html5/syntax.html#parsing
or
- Section 11.2 of the WHATWG web-apps/current-work
http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html
If it weren't for the section numbers, I would assume the two are simply the same. But the different numbering makes me wonder: which version is supposed to be the most authoritative?
The WHATWG version seems to have more sections, and seems to have been updated since the W3C published its Candidate Recommendation.
Will the W3C update to the WHATWG version?
Or will they stick with their current candidate until it reaches official Recommendation status?
Which HTML5 spec are we poor devils supposed to follow when in doubt?
Always choose WHATWG over W3C, no exceptions.
Anne van Kesteren (a WHATWG member who was a major contributor to the HTML specification before the WHATWG and W3C versions diverged, and who remains a major contributor to the WHATWG specification) describes the current situation between the WHATWG and the W3C as follows on his blog:
The W3C has forked the [WHATWG] HTML Standard for the nth time. As always, it is pretty disastrous:
- Erased all Git history of the document.
- Did not document how they transformed the document. Issues of mismatches have already been reported and it will likely be a long time, if ever, before all bugs due to this process are uncovered, since it was not open.
- Did not discuss plans with the wider community.
- Did not discuss plans with the folks they were forking from.
- Did not even discuss plans with the members of the W3C Web Platform Working Group.
- Erased the acknowledgments section.
- Erased the copyright and licensing information and replaced it with their own.
2019: The war is finally over
On May 28th, 2019, the W3C and the WHATWG signed an agreement to collaborate on a single, authoritative version of the HTML and DOM specifications.
According to the W3C's statement, the two parties agreed to the following terms:
- W3C and WHATWG work together on HTML and DOM, in the WHATWG repositories, to produce a Living Standard and Recommendation/Review Draft-snapshots
- WHATWG maintains the HTML and DOM Living Standards
- W3C facilitates community work directly in the WHATWG repositories (bridging communities, developing use cases, filing issues, writing tests, mediating issue resolution)
- W3C stops independent publishing of a designated list of specifications related to HTML and DOM and instead will work to take WHATWG Review Drafts to W3C Recommendations
Biased answer from an editor of WHATWG HTML here. Hopefully the facts can speak for themselves though.
The WHATWG Living Standard should be considered authoritative. It is constantly worked on by a large community of contributors, including all browser vendors. No browser vendors implement according to W3C HTML; for some such as Firefox and Chrome this is a matter of publicly stated policy.
The WHATWG Living Standard constantly receives bug fixes and new features. For more information on this model of spec development, which more closely matches modern software development practices, see the WHATWG FAQ entry "What does 'Living Standard' mean?".
Unfortunately, the W3C sometimes copies and pastes our work onto their own website, puts their own logo on it, changes the names of the editors, and so on. They do this for a variety of reasons, one of the largest of which is face-saving for the sake of their paying member companies (example of them stating this). What's worse, they like to release "versions" (like HTML "5.0", "5.1", etc.) that are just outdated snapshots, missing modern bug fixes and features. These snapshots clog up search results and cause confusion like this very question. We are currently tracking the confusion caused by these forks, of which HTML is only one.
You can track their progress on the copy-and-paste job in their issue tracker or in commits such as this one. It's a fun game to spot the bugs they introduce while doing this copy-and-paste job, as they generally do not read or understand the content they are copying, leading to widespread errors and inconsistencies.