I spoke to Microsoft’s Jean Paoli, general manager for interoperability and Office Open XML architecture, on the hot topic of the Microsoft Office XML formats and their standardisation. Paoli was an editor on the W3C committee which specified XML 1.0 in 1998.
Why a hot topic? Briefly, in 1999 Sun purchased an office suite called Star Office and released the code as open source, creating a free office suite called Open Office. One of its likely goals was to undermine the dominance of Microsoft Office by providing a free alternative. In 2002 work began to standardise an XML specification, evolved from the Open Office document formats, as Open Document Format (ODF); in 2006 it became an ISO standard. The pairing of Open Office with an ISO standard document format is proving attractive, especially to government departments, and threatens Microsoft’s near-monopoly in office productivity software.
Perhaps in response to ODF, Microsoft set to work standardising its own XML document format, this time based on the files used by Microsoft Office, and called Office Open XML (OOXML). It has achieved ECMA standardisation, and is now proceeding towards ISO, a move ardently opposed by IBM and other ODF supporters.
Paoli talked about how Microsoft has been working with XML in Office for a long time, since Office 2000 in fact. It’s true, and I remember the launch of Office 2000 and the demonstration of round-tripping Office documents between HTML and native Office formats, thanks to embedded XML tags. It turned out to be a feature more cursed than admired, because of the bloat it added to Word documents exported to HTML, but nevertheless he is right – Microsoft has been working seriously on XML in Office for many years, and it must frustrate the company to see the later ODF specifications sneaking ahead in the standards game. Let’s also note that the decision to have Office 2007 save by default in XML is a bold move, ensuring immediate wide usage of OOXML, but also risking the annoyance of existing customers, most of whom know and care little about document formats but just want seamless interchange. Sending a .docx file to a Mac user, for example, may cause real difficulty for the recipient.
“Open Document Format and Office Open XML have very different goals”, says Paoli, responding to the claim that the world needs only one standard XML format for office documents. “Both of them are formats for documents … both are good.”
What’s distinctive about the goals of OOXML? Primarily, to have full fidelity with pre-existing binary documents created in Microsoft Office. “What people want is to make sure that their billions of important documents can be saved in a format where they don’t lose any information. As a design goal, we said that those formats have to represent all the information that enables high-fidelity migration from the binary formats”, says Paoli. He mentions work with institutions including the British Library and the US Library of Congress, concerned to preserve the information in their electronic archive.
I asked Paoli if such users could get equally good fidelity by converting their documents to ODF. “Absolutely not,” he says. “I am very clear on that. Those two formats are done for different reasons.”
What can go wrong? Paoli gives as an example the myriad ways borders can be drawn round tables in Microsoft Office and all its legacy versions. “There are 100 ways to draw the lines around a table,” he says. “The Open XML format has them all, but ODF, which has not been designed for backward compatibility, does not have them. It’s really the tip of the iceberg. So if someone translates a binary document with a table to ODF, you will lose the framing details. That is just a very small example.”
Another benefit Paoli claims for OOXML is performance. “A lot of things are designed differently because we believe it will work faster. The spreadsheet format has been designed for very big spreadsheets because we know our users, especially in the finance industry, use very large spreadsheets. They use spreadsheets like databases. It’s not that one is better than the other, it’s that they have been designed for different things.”
I asked Paoli what would be the consequences if in fact OOXML does not become an ISO standard. He will not answer the question directly, but is defensive. “This is a long process. We will continue discussing what we should do better. It’s not like a yes or no. But what’s important is that it is already an ECMA standard. Some governments told us they would prefer it were an ISO standard. So we know that, and respect that.
“We have been in discussion with the IDABC (Interoperable Delivery of European eGovernment Services to Public Administrations, Businesses and Citizens). In 2004, the IDABC said ‘Microsoft should consider the merits of submitting the XML format to an international standards body of their choice.’ We responded to the IDABC specific ask. It’s surprising to see some people now saying this is a bad thing, when the EU asked us to standardise the format.”
See my further comments about IDABC below. It is clear though that Paoli is upset by what he sees as an international campaign against OOXML orchestrated by IBM, the sole naysayer in the ECMA voting. “There are IBM employees going to ISO, and saying a lot of technically incorrect things. When ODF went to ISO Microsoft did not interfere. IBM is betting on ODF, to have governments preferentially buying IBM software. It is OK to compete, but using this kind of argument around is it an open format or not … it’s widely known now, Office Open XML is an open format, even the EU says it is.”
I put it to Paoli that OOXML is hard to implement because of all its legacy support, some of which is currently not well documented. “I don’t believe that at all. It’s actually the opposite,” he says. He makes the point that third parties like Corel, which have previously implemented support for binary formats like .doc and .xls, should find it easy to transition to OOXML. “We believe Open XML adoption by vendors like Corel will be very easy because they have already been doing 90% of the work, doing the binary formats. The features are already there.”
I have been critical of the Microsoft-sponsored open source converter between OOXML and ODF, and its integration with Microsoft Office. Wouldn’t it have been better if Microsoft’s own Office team had worked on this, and come up with something of higher quality?
“It is a version 1, honestly,” he says, adding “I am sure it is not perfect. On performance, we were surprised by the delay that you got. In terms of the fidelity of the translation, that’s why we put it in the open. I am sure this is going to evolve. There are going to be things that will not be able to be translated because the formats are different.”
I mistrust Microsoft’s motives here. Paoli points to the conversion errors as evidence of how poorly ODF can represent legacy Office documents. My hunch is that this has more to do with the poor quality of the converter. Nor is its open source status any excuse. This component, or an alternative converter, is critical to the future of Microsoft Office if, as expected, significant numbers of institutions standardise on ODF. Without a good converter, mandating ODF is in effect mandating non-use of Microsoft Office.
Finally, I asked Paoli whether there will ever be a reference implementation of OOXML other than Microsoft Office. “Absolutely,” he says. “It was announced by both Corel and Sun. They are going to fully implement Office Open XML. Novell also integrated the translator into Open Office. Sun developers also posted on their blog that they are implementing Office Open XML.”
This is a stretch, to say the least. It’s true that Sun says here that there will be support for OOXML in Star Suite (the commercial version of Open Office):
Q: Will StarSuite be compatible with the new ‘Microsoft Office Open XML Formats’ – the new file format for the next release of Microsoft Office?
A: Yes, StarSuite will be compatible with the new file format. Microsoft had not published a specification at the launch time of StarSuite 8. The next release of StarSuite will be able to load and save those files.
Having said that, even if it delivers some sort of import filter, the idea that Sun is preparing a reference implementation of OOXML is laughable. It’s also true that Corel has announced its support for both OOXML and ODF:
Corel’s pragmatic approach to emergent XML file formats provides customers with maximum flexibility, lowers costs and reduces risk by insulating customers against committing to a standard that may not become adopted.
But once again this is not a reference implementation, merely a promise of compatibility, with who knows how long a list of errors and omissions.
I am bewildered by Paoli’s response to my question. Surely he understands the difference between a reference implementation and an import/export filter? Here’s Wikipedia, quoting from NIST (National Institute of Standards and Technology):
A reference implementation is, in general, an implementation of a specification to be used as a definitive interpretation for that specification. During the development of the … conformance test suite, at least one relatively trusted implementation of each interface is necessary to (1) discover errors or ambiguities in the specification, and (2) validate the correct functioning of the test suite.
Some closing thoughts
On the face of it standardising Office Open XML is a benefit. Even if there is no full implementation other than Microsoft Office, it helps developers working with the formats, by making them less of a moving target. So what reason is there to oppose standardisation?
Well, if a customer is offered two office suites, and one can tick the ISO box whereas the other cannot, that could well swing the deal, especially in government and academic markets. It follows that opposing standardisation is a good way to damage Microsoft in one of its core markets. However, such motivations are not meant to drive standards bodies.
Does it matter if OOXML is not standardised with ISO? For those with a commercial interest, of course it does. For users, it could matter if they are forced to switch from Microsoft Office to Open Office solely because ODF is the ISO standard, and suffer loss of productivity or failures in working with existing documents.
In saying this, I am presuming that Microsoft Office is a poor choice if you want to work with ODF – correct, I think. Switching from one office suite to another can be costly, not only because of training, but also because of the large number of templates, macros and applications which rely on Microsoft Office.
Of course there may also be good reasons for migrating from Microsoft Office to Open Office. One is that Open Office is free, open source and cross-platform, while Microsoft Office is not. It is easy to build a case for Open Office without needing to play the ISO card.
In practice, I doubt that ISO standardisation of OOXML would much hold back ODF adoption. Even Microsoft’s own arguments may count against OOXML. Microsoft seems to be saying that Office Open XML is designed primarily to be able to translate to and from Microsoft Office binary formats without loss of fidelity. That is its foremost argument for wanting OOXML standardised alongside ODF, since ODF does not have this goal. This legacy support is costly, because it bloats the specification. Should we then conclude that ODF is the best specification for new documents, while OOXML is mainly suitable for archiving legacy documents? That seems logical; yet I am sure Microsoft would resist such a conclusion. If ISO standardisation is achieved, Microsoft is not going to go to its customers and say, “Use OOXML for legacy documents, and ODF for new documents”. No, it is going to say, “Use ISO Standard OOXML for all your documents.” I suggest there is doublethink here.
Finally, those on both sides of this debate could do better in presenting their case. On the IBM/ODF side there is open hostility, while Microsoft does too little to engage with the community and to have the technical debate it claims to welcome (I give honourable exception to Brian Jones, who is a model advocate on his blog). Silly comments about reference implementations, and the poor quality of the Microsoft-sponsored OOXML/ODF converter, do not help.
Postscript on IDABC recommendations
The documents from the IDABC referenced by Paoli are here. The recent PEGSCO (Pan-European eGovernment Services) Committee report is an interesting read. As Paoli notes, it welcomes the standardisation of both OOXML and ODF, and adds:
Both the ODF and the OpenXML document format specifications are XML based, promising great opportunities to explore the information contained in documents via tools other than traditional office suites. Examples of such exploration include indexing of document collections, automatic extraction of metadata from documents, search-engines, extraction of specific information for re-use, etc.
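To make the committee’s point concrete, here is a minimal sketch in Python of pulling text out of an OOXML word-processing file without any office suite. It relies on two facts about the format: a .docx file is a ZIP package, and the main text lives in a part named word/document.xml. The sample package is built in memory and is deliberately incomplete (a real file also contains content-type and relationship parts), and the function names build_sample_package and extract_paragraphs are my own, purely for illustration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Main WordprocessingML namespace used by OOXML word-processing documents.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def build_sample_package() -> bytes:
    """Build a stripped-down, in-memory stand-in for a .docx package.

    Illustration only: a real .docx also carries [Content_Types].xml and
    relationship parts, which are omitted here.
    """
    document_xml = (
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        "<w:p><w:r><w:t>Quarterly report</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>Revenue rose </w:t></w:r><w:r><w:t>4%.</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("word/document.xml", document_xml)
    return buffer.getvalue()


def extract_paragraphs(package_bytes: bytes) -> list[str]:
    """Return the plain text of each paragraph in the main document part."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across runs; join the w:t pieces.
        text = "".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t"))
        paragraphs.append(text)
    return paragraphs


if __name__ == "__main__":
    for line in extract_paragraphs(build_sample_package()):
        print(line)
```

An indexer or search tool could feed the returned paragraphs straight into its own pipeline. An ODF file could be handled in much the same way, since it too is a ZIP package, with the document text in a part named content.xml.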
That said, more is said about adopting ODF than OOXML, and the report is worried about the existence of two standards:
Member State experts have identified the perceived compatibility problems between ISO 26300 (ODF) based products and the commercial applications that dominate the offices of today’s administrations as the main barrier for the use of open document exchange and storage formats. The potential arrival of a second international standard for revisable documents may mean that administrations will be required to support multiple formats leading to more complexity and increased costs. Although filters, translators and plug-ins may theoretically enable interoperability, experience shows that multiple transformations of formats may lead to problems, especially as there is no complete mapping between all features of each of the different standards. Technical experts that are familiar with both standards also indicate that there remain, for each of the two standards, a number of technical problems to be solved.
On to the key section, recommendations:
Industry, industry consortia and international standardisation bodies are invited:
6.6. To work together towards one international open document standard, acceptable to all, for revisable and non-revisable documents respectively.
So standards are good; one standard is better. Comfort for both sides here.