Microsoft’s Jean Paoli on the XML document debate

I spoke to Microsoft’s Jean Paoli, general manager for interoperability and Office Open XML architecture, about the hot topic of the Microsoft Office XML formats and their standardisation. Paoli was an editor on the W3C committee which specified XML 1.0 in 1998.

Why a hot topic? Briefly, in 1999 Sun purchased an office suite called Star Office and released the code as open source, creating a free office suite called Open Office. One of its likely goals was to undermine the dominance of Microsoft Office by providing a free alternative. In 2002 work began on standardizing an XML specification, evolved from the Open Office document formats, as Open Document Format (ODF); in 2006 it became an ISO standard. The pairing of Open Office with an ISO standard document format is proving attractive, especially to government departments, and threatens Microsoft’s near-monopoly in office productivity software.

Perhaps in response to ODF, Microsoft set to work standardizing its own XML document format, this time based on the files used by Microsoft Office, and called Office Open XML (OOXML). It has achieved ECMA standardization, and is now proceeding towards ISO, a move ardently opposed by IBM and other ODF supporters.

Paoli talked about how Microsoft has been working with XML in Office for a long time, since Office 2000 in fact. It’s true, and I remember the launch of Office 2000 and the demonstration of round-tripping Office documents between HTML and the native Office formats, thanks to embedded XML tags. It turned out to be a feature more cursed than admired, because of the bloat it added to Word documents exported to HTML, but nevertheless he is right: Microsoft has been working seriously on XML in Office for many years, and it must frustrate the company to see the later ODF specifications sneaking ahead in the standards game. Let’s also note that the decision to have Office 2007 save by default in XML is a bold move, ensuring immediate wide usage of OOXML, but also risking the annoyance of existing customers, most of whom know and care little about document formats and just want seamless interchange. Sending a .docx file to a Mac user, for example, may cause real difficulty for the recipient.

Different goals

“Open Document Format and Office Open XML have very different goals”, says Paoli, responding to the claim that the world needs only one standard XML format for office documents. “Both of them are formats for documents … both are good.”

What’s distinctive about the goals of OOXML? Primarily, to have full fidelity with pre-existing binary documents created in Microsoft Office. “What people want is to make sure that their billions of important documents can be saved in a format where they don’t lose any information. As a design goal, we said that those formats have to represent all the information that enables high-fidelity migration from the binary formats”, says Paoli. He mentions work with institutions including the British Library and the US Library of Congress, concerned to preserve the information in their electronic archive.

I asked Paoli if such users could get equally good fidelity by converting their documents to ODF. “Absolutely not,” he says. “I am very clear on that. Those two formats are done for different reasons.”

What can go wrong? Paoli gives as an example the myriad ways borders can be drawn round tables in Microsoft Office and all its legacy versions. “There are 100 ways to draw the lines around a table,” he says. “The Open XML format has them all, but ODF, which has not been designed for backward compatibility, does not have them. It’s really the tip of the iceberg. So if someone translates a binary document with a table to ODF, you will lose the framing details. That is just a very small example.”

Another benefit Paoli claims for OOXML is performance. “A lot of things are designed differently because we believe it will work faster. The spreadsheet format has been designed for very big spreadsheets because we know our users, especially in the finance industry, use very large spreadsheets. They use spreadsheets like databases. It’s not that one is better than the other, it’s that they have been designed for different things.”

I asked Paoli what the consequences would be if OOXML does not in fact become an ISO standard. He would not answer the question directly, and was defensive. “This is a long process. We will continue discussing what we should do better. It’s not like a yes or no. But what’s important is that it is already an ECMA standard. Some governments told us they would prefer it were an ISO standard. So we know that, and respect that.”
