Simon Willison points to David “liorean” Andersson’s article on HTML5 vs XHTML2. This debate about the evolution of HTML has gotten confusing. In a nutshell, the W3C wanted to fix HTML by making it proper grown-up XML, hence XHTML, which was meant to succeed HTML 4.0. Unfortunately XHTML never really caught on. One of its inherent problems is nicely put by Andersson:
Among the reasons for this is the draconian error handling of XML. XML parsing will stop at the first error in the document, and that means that any errors will render a page totally unreachable. A document with an XML well formedness error will only display details of the error, but no content. On pages where some of the content is out of the control of XML tools with well-designed handling of different character encodings—where users may comment or post, or where content may come from the outside in the form of trackbacks, ad services, or widgets, for example—there’s always a risk of a well-formedness error. Tag-soup parsing browsers will do their best to display a page, in spite of any errors, but when XML parsing any error, no matter how small, may render your page completely useless.
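To make that contrast concrete, here is a minimal Python sketch of my own, not anything taken from Andersson's article. It feeds the same slightly broken fragment to a strict XML parser and to a lenient HTML parser. The fragment and the names PAGE, parse_as_xml, parse_as_tag_soup and TextCollector are made up for illustration, and Python's html.parser only stands in for a browser's tag-soup parser; real browsers do far more error recovery than this.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical page fragment: a user comment left a <b> tag unclosed.
PAGE = "<div><p>Hello <b>world</p></div>"


def parse_as_xml(markup):
    """Strict XML parsing: the first well-formedness error aborts the parse."""
    try:
        ET.fromstring(markup)
        return True, "well-formed"
    except ET.ParseError as err:
        # Nothing of the document is available, only the error details.
        return False, str(err)


class TextCollector(HTMLParser):
    """Lenient parser that collects text content and ignores tag mismatches."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def parse_as_tag_soup(markup):
    """Lenient parsing: recover whatever text the markup contains."""
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.chunks)


xml_ok, xml_detail = parse_as_xml(PAGE)
soup_text = parse_as_tag_soup(PAGE)

print("XML parser accepted the page:", xml_ok)
print("XML parser detail:", xml_detail)
print("Lenient parser recovered:", soup_text)
```

The strict parser rejects the whole fragment because of one unclosed tag and hands back only an error message, while the lenient parser still recovers the text. That is the trade-off in miniature: one stray tag in a comment or an ad widget is enough to take down a page served as XML.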
So nobody took much notice of XHTML; the W3C’s influence declined; and a rival anything-but-Microsoft group called WHATWG commenced work on its own evolution of HTML, which it called HTML 5.
In the meantime the W3C eventually realised that XHTML was never going to catch on and announced that it would revive work on HTML. Actually it is still working on XHTML2 in parallel. I suppose the idea, to the extent it has been thought through, is that XHTML will be the correct format for the well-formed Web, and HTML for the ill-formed or tag-soup Web. The new W3C group has its charter here. In contrast to WHATWG, this group includes Microsoft; in fact, Chris Wilson from the IE team is co-chair with Dan Connolly. However, convergence with WHATWG is part of the charter:
The HTML Working Group will actively pursue convergence with WHATWG, encouraging open participation within the bounds of the W3C patent policy and available resources.
In theory, then, WHATWG HTML 5 and W3C HTML 5 will be the same thing. Don’t hold your breath, though, since according to the FAQ:
When will HTML 5 be finished? Around 15 years or more to reach a W3C recommendation (include estimated schedule).
I suppose the thing will move along and we will see bits of HTML 5 being implemented by the main browsers. But will it make much difference? Although HTML is a broken specification, it is proving sufficient to support AJAX and to host other interesting stuff like Flash/Apollo, WPF and WPF/E, and so on. Do we need HTML 5? It remains an open question. Maybe the existence of a working group where all the browser vendors are talking is reward in itself: it may help to fix the most pressing real-world problem, which is browser inconsistency.