Tim Anderson's ITWriting

Tech writing blog


February 8, 2005

Is Microsoft Office XML an open standard?

Posted on February 8, 2005

Last week I attended a press briefing to mark 7 years of XML. One of the speakers was Jean Paoli, Microsoft's XML Architect and someone who has been involved with XML since its beginning. I asked Jean why Microsoft does not submit its Office XML schemas, including WordprocessingML and SpreadsheetML, to an independent standards body.

Paoli's answer was in two parts. First, he explained that Microsoft's primary focus is on custom schemas within Office documents. "When you want to interchange data you need to think about your document as data, in order to be integrated in the database. Just exchanging presentation doesn't work." It's a fair point. Using a custom schema within a document means that the elements have a meaning that is understood within your organization or by your partners. Merely knowing that a certain piece of text is Heading Level 3, or bold, or indented, tells you nothing useful. Knowing that an element represents a customer number, or an order quantity, or a product description, makes the document meaningful. In these cases, Microsoft cannot provide the schema; it is something you have to agree with your partners.
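To make the "document as data" point concrete, here is a minimal sketch in Python. The order schema, its element names and the urn:example:orders namespace are all invented for illustration; they stand in for whatever you and your partners might agree on, and have nothing to do with Microsoft's own schemas.

```python
import xml.etree.ElementTree as ET

# Hypothetical custom schema: the element names and the namespace are
# assumptions made up for this example, not part of any Office schema.
ORDER_XML = """\
<order xmlns="urn:example:orders">
  <customerNumber>C-1042</customerNumber>
  <item>
    <productDescription>Widget, blue</productDescription>
    <quantity>12</quantity>
  </item>
</order>
"""

NS = {"o": "urn:example:orders"}


def read_order(xml_text):
    """Pull business data out of a document that uses the custom schema."""
    root = ET.fromstring(xml_text)
    return {
        "customer": root.findtext("o:customerNumber", namespaces=NS),
        "items": [
            (
                item.findtext("o:productDescription", namespaces=NS),
                int(item.findtext("o:quantity", namespaces=NS)),
            )
            for item in root.findall("o:item", NS)
        ],
    }


order = read_order(ORDER_XML)
print(order)
```

Because the elements say what the content is rather than how it looks, the receiving code can load the customer number and quantities straight into a database. The same text marked up only as headings and bold runs would need guesswork to interpret.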

Still, that's not the whole story. If Word, Excel and the other Office applications save documents in a fully documented XML format, it opens up numerous possibilities. For developers, it means they can create applications that consume and generate Office documents without needing to wrestle with OLE structured storage, or COM automation, or the tangle of twisty passages that make up Rich Text Format. For users, it should make it easy to convert documents from Word or Excel to other formats, such as those used by Open Office, without loss of fidelity. There is even the possibility of multiple applications using the same formats. We see this with HTML, where the same document can be edited in different applications while remaining as HTML. Having the Office XML schemas standardised by an independent body would give them a significant boost.
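As a rough sketch of what this could look like for a developer, the Python below builds and reads back a bare-bones document using only a standard XML library, with no COM automation involved. It assumes the Word 2003 WordprocessingML namespace and the wordDocument, body, p, r and t elements; the helper names are hypothetical. A file that Word itself will open may need more than this (for example an mso-application processing instruction), so treat it as an illustration rather than a complete writer.

```python
import xml.etree.ElementTree as ET

# Assumed namespace for Word 2003 WordprocessingML.
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"
ET.register_namespace("w", W_NS)


def w(tag):
    """Return the namespace-qualified name for a WordprocessingML tag."""
    return f"{{{W_NS}}}{tag}"


def build_document(paragraphs):
    """Generate a minimal document: one paragraph, run and text node per string."""
    root = ET.Element(w("wordDocument"))
    body = ET.SubElement(root, w("body"))
    for text in paragraphs:
        p = ET.SubElement(body, w("p"))
        r = ET.SubElement(p, w("r"))
        t = ET.SubElement(r, w("t"))
        t.text = text
    return ET.tostring(root, encoding="unicode")


def extract_text(xml_text):
    """Consume a document: return the plain text of each paragraph."""
    root = ET.fromstring(xml_text)
    return [
        "".join(t.text or "" for t in p.iter(w("t")))
        for p in root.iter(w("p"))
    ]


doc = build_document(["Quarterly report", "Sales rose in the second quarter."])
print(extract_text(doc))
```

The same few lines would work from any language with an XML parser, which is the attraction of a documented XML format over a binary one.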

Unfortunately, Paoli says it won't happen. The reason he gives is as follows:

"Backward compatibility. We have today 400 million users of Office, which means billions of documents. So we went and did a huge job of documenting electronically all these features and we put that into this WordML format. Well we need to maintain this damn thing, and we need to maintain this big format, we have like 1500 tags. Who is going to maintain that? A standard body? It doesn't know what is inside of Word. That's the problem. So we said we are going to give you a license, open and free, you can write filters, we know Open Office is writing filters. It's legal. Also we provide all the technical support, all the documentation, everything is open. Except that we are going to have to maintain it because after that we've got 400 million customers who are going to call us for a bug."

As far as I can see, the implication is that Microsoft is not interested in providing a universal XML schema for describing word processor or spreadsheet documents. Rather, it is interested merely in adding XML persistence to Microsoft Office documents. That means it doesn't matter if the Office schemas are a bit ramshackle and full of bits that only make sense in Microsoft Office. Still useful, and still a good way of enabling XML-based Office solutions, but I think it is a shame that Microsoft didn't go further and try to establish an industry standard. Too difficult? Or was it concerned about accelerating the commoditization of Office, thus reducing its commercial value? By contrast, the Open Office folk clearly are trying to establish a standard. In the absence of any real competition, I guess they are likely to succeed.

If this topic interests you, see this IDABC site for a bunch of interesting reports supplemented by comments from Microsoft, Sun and IBM on the subject of open document formats.
