Miguel de Icaza on ODF vs OOXML

Novell’s Miguel de Icaza has an important and unusual perspective on Microsoft technology. Unlike many open source advocates, he is deeply familiar with the Microsoft platform because of his work on Mono, the open source implementation of the .NET Framework. I therefore read with interest his comments on the war against Microsoft’s Office Open XML (OOXML) now being waged by the sponsors of the rival OpenDocument Format (ODF).

As de Icaza observes, it is “hard to articulate” the difference between OOXML and ODF. They are XML schemas, impenetrable to non-technical folk. Both appear to do the same thing, yet in detail they have little in common. Here’s a key comment:

The high-level comparisons so far have focused on tiny details (encoding, model used for the XML). There is nothing fundamentally better or worse in those standards like there is between XML Schema and Relax NG. ODF grew out of OpenOffice.org and is influenced by its internal design. OOXML grew out of Microsoft Office and it is influenced by its internal design. No real surprises there.

I agree. But isn’t the OOXML specification too bulky and verbose, as its opposition claims?

If Microsoft had produced 760 pages (the size of ODF) as the documentation for the “.doc”, “.xls” and “.ppt” that lacked for example the formula specification, wouldn’t people justly complain that the specification was incomplete and was useless?

Quite possibly. And I am unimpressed by the efforts of Rob Weir and others at IBM in taking pot shots at flaws in OOXML rather than being constructive in helping Microsoft transition from proprietary binary document formats to XML formats with a standardised specification.

That said, OOXML and ODF do have different aims, something which Weir does not recognize. He writes in his response to de Icaza:

OOXML, on the other hand, matches to an inane degree the internals of a single vendor’s legacy application, with no concessions to platform-neutrality.

The point Weir misses is that (as I understand it) the rationale behind OOXML is to be able to represent all the world’s immense archive of Microsoft Office documents in an XML format with a published specification and without loss of information. In that sense, its goals are less lofty than those of ODF, which wants to be the one true office document specification for the world.

That means OOXML has a huge legacy burden to carry. It also implies that much of the cruft in OOXML is not there to be used by new applications, but rather to document what has to be done to support old stuff in Office.

My background is in software development, and I’ve explored the intricacies of RTF (Rich Text Format), the non-XML format Microsoft published for Word documents, and pretty much what you had to use prior to OOXML. I found the documentation inadequate, too closely tied to particular versions of Word, and difficult to work with. OOXML is delightful in comparison. The ability to generate and consume Office documents as XML substantially benefits developer productivity.
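To make the productivity point concrete: a .docx file is a ZIP package whose main body part conventionally lives at word/document.xml, so its text can be pulled out with nothing but Python’s standard library. This is a minimal sketch, not a complete reader. The function name docx_paragraphs is hypothetical, and the code assumes the conventional part path rather than resolving it through the package relationships.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the WordprocessingML vocabulary used in word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + W_NS + "}"


def docx_paragraphs(docx_bytes: bytes) -> list[str]:
    """Return the plain text of every w:p paragraph in a .docx package.

    Only w:t text nodes are read; run formatting, fields, drawings and any
    other markup this sketch does not understand are silently skipped.
    """
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        # Assumption: the main document part sits at its conventional path.
        # A complete reader would locate it via _rels/.rels instead.
        root = ET.fromstring(package.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W + "p"):
        # Join the text of all runs in the paragraph, in document order.
        paragraphs.append("".join(t.text or "" for t in para.iter(W + "t")))
    return paragraphs
```

Passing it the bytes of an existing .docx file returns one string per paragraph, with no copy of Word involved.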

Another benefit is in working with Office documents on the server. Ugly solutions like automating Office applications on a server in order to create or process documents are no longer necessary.
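On the server, the reverse direction is equally simple in principle. This sketch assumes that three parts are enough for a valid WordprocessingML package: [Content_Types].xml, a root relationships part (_rels/.rels) pointing at word/document.xml, and the document part itself. The function name make_docx is hypothetical, and a production generator would normally also add styles and document properties.

```python
import io
import zipfile
from xml.sax.saxutils import escape

# Declares the content type of each part in the package.
CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" '
    'ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    "</Types>"
)

# Root relationship telling consumers where the main document part is.
ROOT_RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" '
    'Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" '
    'Target="word/document.xml"/>'
    "</Relationships>"
)


def make_docx(paragraphs: list[str]) -> bytes:
    """Build a minimal .docx package in memory, one w:p per input string."""
    body = "".join(
        # escape() turns &, < and > into entities so arbitrary text stays valid XML.
        f'<w:p><w:r><w:t xml:space="preserve">{escape(text)}</w:t></w:r></w:p>'
        for text in paragraphs
    )
    document = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
        f"<w:body>{body}</w:body></w:document>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", CONTENT_TYPES)
        package.writestr("_rels/.rels", ROOT_RELS)
        package.writestr("word/document.xml", document)
    return buf.getvalue()
```

Saving the returned bytes under a .docx name is all that remains; no Office process needs to run on the server.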

I therefore disagree that OOXML has no value.

A single Office XML format for the world would have been nice. If the ODF folk had got Microsoft on board in the early days of the specification, that might have been possible, though the scenario was politically implausible. What we have instead is two formats; but at least they are both XML and therefore amenable to programmatic manipulation and conversion. I think that’s progress, though it falls short of the ideal. Furthermore, it likely would not have happened without the existence of OpenOffice.org and ODF. They have won the argument for open document formats; there is no need to spoil it by obstructing the standardisation process for which they fought.
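Because both formats are XML, even a crude conversion is a short tree transformation. This sketch maps each WordprocessingML w:p paragraph to an ODF text:p inside an office:document-content root, keeping only plain text. Styles, tables, images and the rest of an .odt package (the mimetype file, META-INF/manifest.xml and so on) are deliberately out of scope, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Source vocabulary (WordprocessingML) and target vocabularies (ODF).
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Serialize with the customary ODF prefixes instead of ns0/ns1.
ET.register_namespace("office", OFFICE_NS)
ET.register_namespace("text", TEXT_NS)


def ooxml_to_odf_content(document_xml: str) -> str:
    """Map each w:p paragraph to an ODF text:p, keeping plain text only."""
    source = ET.fromstring(document_xml)
    root = ET.Element(f"{{{OFFICE_NS}}}document-content",
                      {f"{{{OFFICE_NS}}}version": "1.0"})
    body = ET.SubElement(root, f"{{{OFFICE_NS}}}body")
    office_text = ET.SubElement(body, f"{{{OFFICE_NS}}}text")
    for para in source.iter(W + "p"):
        target = ET.SubElement(office_text, f"{{{TEXT_NS}}}p")
        # Formatting is dropped; only the concatenated run text survives.
        target.text = "".join(t.text or "" for t in para.iter(W + "t"))
    return ET.tostring(root, encoding="unicode")
```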

 


19 thoughts on “Miguel de Icaza on ODF vs OOXML”

  1. As a software vendor who is actively developing applications to use both ODF and OOXML, I think you are mixing two different issues. One issue is whether having OOXML is a “good thing”, and another is whether OOXML should become an “open standard”.

    There is no doubt that Microsoft releasing OOXML is a huge step forward. While it was possible to find older documentation for the binary formats, or to hack your way through them, or sign agreements with Microsoft to use the specs, it was not easy, and it was not accessible to the world of XML functionality. It has opened up software possibilities for my company that should be both profitable and useful to the world. I love having OOXML defined and complete.

    So, take it as a given that very few, Rob Weir included, are saying anything like “OOXML has no value”. But that is not the question at hand. The question is, should OOXML be considered an “open standard” by ISO and the world? I think this would be a bad idea all around, even for Microsoft. It would aid in Microsoft’s battle for the hearts and souls of some governments, but it would also tie their hands to an external standard that they would control less and less internally. It is a bit like the difference between saying “The United Nations is a good and positive thing” and saying “The United Nations should rule the world”. You can think the U.N. is a positive force and not want it making all the rules for your country. Microsoft wants the benefits of a standard without the constraints, and I think they are likely to greatly regret this move in a few years, because if OOXML is made a standard, the battle then shifts to modifying the standard in ways that are useful to other companies, and Microsoft may wind up not complying with its own standard, or wanting to change parts and being unable to get them changed.

    But even aside from Microsoft’s woes, an open standard is supposed to be not only useful, which OOXML is, but also prescriptive. This just isn’t a standard that others can write to. There are many specifications and published APIs put out by many companies and software vendors. Almost every software product I have ever used has come with some sort of internal specifications and API, but they are not a standard I have to meet. They are instead a set of directions about how to work with that software. OOXML falls neatly into the category of how I can work with Microsoft Office 2007, not that of a portable format. I can depend on a JPEG file or an XML file working in lots of different applications. HTML is a format that may not render the same in every application, but renders reasonably in all of them. I just don’t see how OOXML is going to meet that standard. It is too specific to the functioning of a single application.

    So, OOXML is good, useful and a positive enhancement for those working with Microsoft Office. It isn’t, thank goodness, an official standard. That allows Microsoft to add to it or delete from it at will, which is actually better for me as a vendor. ODF is less specific, but better suited to being an open, transportable standard, although it too needs a lot of work.

  2. I agree that OOXML is of huge value just by being specified, regardless of how it “looks” technically. Looked at from a technical standpoint, however, the specification sucks. Yay for open standards and everything, but as Rob Weir puts it, “it’s not a specification. It’s a DNA sequence”. He’s spot on with that comment, and that’s OOXML’s problem.

    I understand OOXML’s role in defining and specifying how the existing documents “out there” work, but what Microsoft has done is basically document all of the extremely weird stuff that goes on in their binary formats and then directly, without any sanitizing, without any canonicalization and, most importantly, without making any changes to Microsoft Office, define a serialization of all of this hubajuba in XML. It’s great that it’s XML, but it’s bad that it’s just a serialization of an extremely broken data and processing model.

    What they could have done, instead of just documenting how the binary stuff works (or how it’s supposed to work in Office’s new serialization of the binary stuff), is build it all on top of ODF, or at least try to specify an elegant format that caters to more use cases and applications than Microsoft Office.

  3. Microsoft were on board early in the process, having membership of the OASIS technical committee. It is, however, not in their interest to compete head-to-head with their office suite competition. Supporting a genuinely open, purpose-designed, international standard for office file formats would have meant exactly that: it would allow organisations to deploy their own choice of productivity tools based on their own unbiased assessment of cost versus value.

  4. > what Microsoft has done is basically
    > document all of the extremely weird stuff
    > that goes on in their binary formats and
    > then directly, without any sanitizing,
    > without any canonicalization and, most
    > importantly, without making any changes to
    > Microsoft Office, define a serialization of
    > all of this hubajuba in XML.

    You are overstating the case, I think. There is something in what you say, but it is not that bad.

    Tim

  5. Ben,

    > I think you are mixing two different issues.
    > One issue is whether having OOXML is a “good
    > thing”, and another is whether OOXML should
    > become an “open standard”.

    I agree these are separate issues; however, I do feel there are benefits in standardisation specifically because it somewhat constrains the primary vendor. Part of the reason RTF is such hard work is that it changes with every new release of Word. Further, both WordPad and Word are meant to support RTF, but making something compatible with both is tricky. Standardisation helps to prevent this kind of mess.

    There is also a benefit for governments and other institutions, which gain more assurance that the documents they create will remain readable in future.

    Tim

  6. > Microsoft were on board early in the process

    Maybe in a nominal, notional sense; but not in a real sense. Otherwise we would not have OOXML…

    Tim

  7. I don’t think Weir is necessarily against OOXML as a format; he’d accept that it has a niche. His point has been that it’s silly to call OOXML a “standard”, because it isn’t standard anything and never will be. As you say, it’s designed to express the contents of archived Office documents in a particular proprietary binary format. For that, it’s not bad, though it’s also incomplete and not always good XML (back to those “pot shots”).

    But why on Earth would anyone push it as a competitor to ODF? I’d rather have legacy documents stored as documented OOXML than undocumented binary blobs, but I’d rather still have them in a real standard, which is to say, one not full of a particular vendor’s cruft. From that point of view, I like OOXML as a half-way house on the way to real interoperability, but nothing more.

    The problem is that it’s going to be used for new documents as well as old, and people who don’t know the technical issues involved will believe that both specifications are equal. It doesn’t matter that a format is “open” if it’s crufty and needlessly difficult to implement.

    What’s especially vexing is that the legacy-compatibility cruft could probably have been specified in extensions to ODF. That is, after all, why XML starts with an X.

  8. > I don’t think Weir is necessarily against OOXML as a format

    Really? Read his blog.

    > His point has been that it’s silly to call OOXML a “standard”,
    > because it isn’t standard anything and never will be.

    Why not? Why is it different, for example, from Adobe’s forthcoming standardisation of PDF?

    > It doesn’t matter that a format is “open” if it’s crufty
    > and needlessly difficult to implement.

    True, but again I think you overstate the case. You don’t need to implement or use much of OOXML for it to be useful.

    Tim

  9. One of the problems is:
    you can’t use OOXML to create a fully Microsoft-compatible product. For example, you don’t have VBA. Also, I believe the legacy formats (including their bugs) are not all there.
    Of course, I would be happy to be proved wrong: just write one (Word- or Excel-compatible)! And I will publish a full page on how good OOXML is :))

  10. > you can’t use OOXML to create a fully Microsoft-compatible product.

    I’m sure that’s true if “fully” means implementing every last bit of legacy cruft.

    However it should be feasible to write an app that generates valid OOXML and imports OOXML with graceful degradation.

    Tim

  11. As someone who has both a background in software development and works every day with Office documents, all I care about is my ownership of the information. If I were to find out that by adopting a particular format I had been locked in unnecessarily to a particular vendor’s products, then I would have to look very hard for scenarios under which this lock-in became an issue.

    The only one I can currently think of that might affect me is whether the huge legacy store of information my vast employer generates would become inaccessible if we decided to change our office platform (which is highly unlikely). Where I work, there are several text-mining initiatives underway to extract and index the document store. Adoption of *any* XML standard would be a boon to such an initiative, and this matters far more to us than whether we can adopt OpenOffice at some nebulously defined date in the future.

  12. Tim said “The point Weir misses is that (as I understand it) the rationale behind OOXML is to be able to represent all the world’s immense archive of Microsoft Office documents in an XML format with a published specification and without loss of information.”

    Really? Heard of VBA macros I guess. What’s a document’s value if you can’t instantiate it because you are using an application whose implementation is based on a paper which does not define VBA macros?

    It so happens that many business Word and Excel documents use VBA macros.

    I’m pretty sure you’d hate to take responsibility for putting a product out there that did just that, as it would most certainly blow up in your face quite rapidly. Deservedly so.

    (And before you ask, VBA macros are just a simple example. You would need to bring along everything that “legacy” documents are made of, and a number of those elements are simply not defined, specified or mentioned in the Ecma 376 paper.)

  13. > Really? Heard of VBA macros I guess. What’s
    > a document’s value if you can’t instantiate it
    > because you are using an application whose
    > implementation is based on a paper which
    > does not define VBA macros?

    Have you thought this through? First, a macro-supplemented document is an application, not just a document. Second, even if you fully documented VBA (which actually is pretty well documented, though not part of OOXML) it would not guarantee that you could implement a reader successfully. Reason: that macro could call the Windows API, or COM components: it is in effect open-ended. I think it is unreasonable to condemn the spec simply on the grounds that it does not cover VBA.

    How is VBA “a simple example” by the way? I can’t think of anything like it, unless you mean the variants like Word Basic, non-VBA Excel macros etc. Similar reasoning would apply.

    Tim

  14. Tim said “I think it is unreasonable to condemn the spec simply on the grounds that it does not cover VBA.”

    It’s because you are a technical person, not a user.

    Tell users why they should care about what you are talking about. If VBA macros are not supported, then the experience is bad. It’s like visiting a Web 2.0 site with JavaScript disabled.

    So either it’s in, and we have a competitive landscape, or it’s not. In the latter case, MS Office is the one and only platform, end of story. If you admit that, what kind of value do you see in Ecma 376?

    As for VBA being documented, it seems to me you are confusing the VBA for Excel, VBA for Word and VBA for PowerPoint documentation, which is part of each application’s install, with the actual developer plumbing documentation. That documentation is currently privately owned by Microsoft alone (in fact, by a spin-off called Summit Software).

    If you had read the specs, in particular where VBA is concerned, you would have noticed that Microsoft introduced (without much fanfare) an additional set of constraints in order to bind the VBA macros to the actual document. This effectively makes the VBA macros a dependency of any form of Load, Save and Run in any application that intends to preserve the file format.
    This new attachment (see attribute CodeName in Ecma 376) is not described. In fact, in many places of the specs, the attribute is just described as a string with no explanation of the entire component life cycle that is serialized in it.

    Those are facts.

    PS: I am an independent vendor, a neutral party, and sell the most advanced third-party Excel 2007 generator out there at the moment. Believe it or not, I have been through EXACTLY what it takes to implement a part of the specs.

  15. > If you admit that, what kind of value do you see in Ecma 376?

    I think it is a mistake to lump all kinds of documents together. If we are concerned with document interchange and analysis (importing, exporting, parsing and generating Office documents), a standardised format is useful. Frankly, I hardly ever need to receive a document containing macros, and in most circumstances would regard it as impolite to send one out. If I were writing an application, that would be different, but then I would want to know what platform the application would run on.

    > This new attachment (see attribute CodeName in Ecma 376) is not described.
    > In fact, in many places of the specs, the attribute is just described as
    > a string with no explanation of the entire component life cycle
    > that is serialized in it.

    That sounds like a flaw in the specs. I am sure there are plenty of these; however I would like to see the spec improved and not destroyed, as to my mind it is a step forward.

    See my recent post on the ODF converter:

    http://www.itwriting.com/blog/?p=116

    – it’s sloppy work in my view and needs fixing. But I do want this stuff to work.

    > PS: I am an independent vendor, a neutral party, and sell
    > the most advanced third-party Excel 2007 generator out there at the moment.
    > Believe it or not, I have been through EXACTLY what it takes
    > to implement a part of the specs.

    I believe you, and thanks for your comments.

    Tim

  16. Tim said “How is VBA “a simple example” by the way? I can’t think of anything like it, unless you mean the variants like Word Basic, non-VBA Excel macros etc. Similar reasoning would apply.”

    To answer this question: by “a simple example” I meant that VBA macros are not all that is missing from Ecma 376.

    I’ll give you another example, since you seem to be eager to hear about them: password protection. If you password-protect an Office 2007 document, it becomes an OLE document, not a ZIP file. Good luck finding any reference to the entire encryption/decryption mechanism in the specs.
    If you do a keyword search for “password-protection” in Ecma 376, all you will find is an algorithm and a couple of explanations about the hash key computed from the password string. The hash key is stored in the parts in order to perform the password matching later on. But that is not the password protection itself.

    By not describing password protection in Ecma 376, Microsoft has very explicitly excluded every such document out there from any kind of interoperability.

    And it’s not like password-protected documents are not used out there…

  17. Tim said “I would like to see the spec improved and not destroyed, as to my mind it is a step forward.”

    I would like the TC45 committee to go back to the drawing board until the specs stabilize. Ideally, an independent implementation should be evidence that the specs have merit.

    If you are OK with “progressive work”, I think that 1) you are taking the Microsoft party line (just like Miguel did last week, except that he’s still a Novell employee, so he has no credibility left whatsoever now), and 2) your reasoning allows for Internet Explorer-like strategies to prevent any form of competition in the market place. Do I have to remind you of the progressive implementations that were plugged into it version after version, and how the agenda was to raise the barrier to entry ever higher with each new release?

  18. Tim said “See my recent post on the ODF converter:

    http://www.itwriting.com/blog/?p=116

    – it’s sloppy work in my view and needs fixing. But I do want this stuff to work.”

    I agree with the conclusion. It so happens I did not even have to actually test this thing to draw that conclusion.

    1) The source code is very poor.

    2) The requirements are unheard of (it requires Office 2007, which in turn requires XP SP2, and it requires .NET 2.0).

    3) The development reports that are part of the distribution show that half of the action items have either no status or partial status. That has not stopped Microsoft from running a PR campaign around the world, boasting of an interoperability component that had just reached, quote, “complete” status, end quote. Of course, whoever has been following the JTC1 INCITS committee knows that they are holding a vote this Monday, 5 Feb 2007. Funny coincidence…

    4) Development of the translator in the opposite direction (OOXML to ODF) only started in December 2006.

    5) The CleverAge guy already said on his blog back in October that the component would not support round-tripping. He said there were fundamental problems. That has not stopped Microsoft from telling the world that this component makes documents interoperable, with no limitation whatsoever. Isn’t this a lie by omission?

  19. > I would like the TC45 committee to go back to the drawing board
    > until the specs stabilize. Ideally, an independent implementation
    > should be evidence that the specs have merit.

    I respect that view, more than that of ODF advocates playing politics. My question is: how much work does the spec need? That is difficult to answer without a lot of research; many of those best equipped to comment seem to have entrenched positions or loyalties which undermine their remarks. It also depends, I would suggest, on how you want to use the spec. If you want to write an alternative to MS Office, of course you will have a different view than if you merely want a stable, standardised document format with which to work.

    > your reasoning allows for Internet Explorer-like strategies
    > to prevent any form of competition in the market place.

    I think you are proposing that standardising the MS Office formats in a way that only Microsoft can implement would impede competition. It would, if there were no alternative, and if you had to implement all or none. But neither of these is the case. I think a standardised MS Office document spec would assist rather than impede competition.

    It’s also obvious that having governments or other institutions mandating that MS Office documents must NOT be used would/will assist the competition – but that strikes me as inflexible and potentially costly to productivity. I’d rather see products compete on merit rather than by regulation.

    Tim
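Comment 16’s observation that a password-protected Office 2007 document is an OLE file rather than a ZIP package is easy to check without any Office tooling. This minimal Python sketch assumes only the well-known leading signatures of the two container types: ZIP local file headers start with the bytes PK\x03\x04, and OLE compound files start with D0 CF 11 E0 A1 B1 1A E1. The function names are hypothetical, and recognising the container says nothing about how to decrypt it, which is exactly the gap the comment describes.

```python
# Leading signatures of the two container types.
ZIP_SIGNATURE = b"PK\x03\x04"                      # ZIP local file header
OLE_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE compound file header


def office_container(header: bytes) -> str:
    """Classify an Office file by its first bytes: 'zip', 'ole' or 'unknown'."""
    if header.startswith(ZIP_SIGNATURE):
        return "zip"  # an ordinary OOXML package
    if header.startswith(OLE_SIGNATURE):
        return "ole"  # a legacy binary file or, per the comment, an encrypted one
    return "unknown"


def sniff_file(path: str) -> str:
    """Read only the first eight bytes of a file and classify them."""
    with open(path, "rb") as f:
        return office_container(f.read(8))
```

Calling sniff_file on a path reads just the first eight bytes, so it is cheap enough to run over an entire document store.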
