Open Document to Office Open XML converter: not good

The first full release of the Open XML to Open Document Format translator is available for download. Great news for interoperability – or is it?

I like to try things out before writing about them, so here’s what I did. I downloaded the Word 2007 add-in and ran the setup. Then I opened Word, and opened the document I was working on, which happens to be called Using DigiKam.docx. This is just under 800 words long and contains no graphics. I went to Home – Save As, and looked for Open Document in the list of document types. No deal. Puzzled, I looked again at the Home menu in Word 2007. Ah, there it is. A separate top-level entry for ODF with Open and Save As menu items. Not ideal in terms of integration, but never mind.

Note: there is an important issue here. Imagine you are an organization that has decided to mandate ODF for your documents, but to continue using Microsoft Office. What you want to do is to fiddle with Group Policy and have Word default to opening and saving ODT (Open Document Text). As far as I can tell, this is not possible with this version 1.0 release. In fact it is worse than that. If you have a new document, and choose ODF – Save As, you get the following error:

“Please save your document before exporting to ODF.”

So instead of just clicking Save, users have to save twice, first as .docx, next as ODT. Ugly. It gets worse, read on.

OK, so I decided to save my current document as ODF. A wait message appeared: it took the converter about 30 seconds to save the document. I don’t like to think what would happen to a 10,000 word report full of charts and tables.

Next, I closed the document, went to ODF – Open, and chose the document I just saved. Another 30 seconds later I get this message about lost elements:

If I go into details, it tells me that the header dimensions and document creation and modification dates might have been lost. Fair enough, nothing drastic – unless perhaps I am laying out a booklet for publication. Of course you would be mad to use a document converter like this in such circumstances – but let’s not forget the implications of potentially inflexible government legislation that might mandate such a thing.

I notice a curious thing. My opened document has been renamed to Using DigiKam_tmp.docx. Let me get my head round this. Let’s say I want always to save in ODF. I have to save as .docx, then export to ODF. Then I open the ODF document, which now has _tmp appended. I make some changes, and want to export it as ODF. I get, you guessed it, the “Please save before exporting” message. So I click save, and get a view of all my temporary documents, because the converter puts the imported document in my temp folder. If I try to save it directly, I get a “this file is read-only” error. So I save it to My Documents, then I go to ODF - Save As. Next session, I go to ODF – Open and guess what. My file is now called Using DigiKam_tmp_tmp.docx.
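The naming spiral is easier to see when written down. The following is only a model of the behaviour I observed, not the converter’s actual code; the function names `imported_name` and `exported_name` are made up for illustration, and I am assuming the exported file simply keeps the document’s stem.

```python
from pathlib import PurePath


def imported_name(odt_filename: str) -> str:
    """Model of ODF - Open: the converter opens a temporary .docx copy
    whose name is the original stem with "_tmp" appended (as observed)."""
    return f"{PurePath(odt_filename).stem}_tmp.docx"


def exported_name(docx_filename: str) -> str:
    """Model of ODF - Save As: assumed to keep the stem and switch to .odt."""
    return f"{PurePath(docx_filename).stem}.odt"


def round_trip(odt_filename: str, sessions: int) -> str:
    """Open and re-export the document once per session and return the
    name of the .docx the user ends up editing in the last session."""
    name = odt_filename
    docx = name
    for _ in range(sessions):
        docx = imported_name(name)
        name = exported_name(docx)
    return docx


print(round_trip("Using DigiKam.odt", 1))  # Using DigiKam_tmp.docx
print(round_trip("Using DigiKam.odt", 2))  # Using DigiKam_tmp_tmp.docx
```

Every session adds another suffix unless the user renames the file by hand.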

So the message is: don’t even think about using this converter as a means of standardising on Open Document while still using Word. It will cause immense and unnecessary hassle. However, it could still be useful for importing and exporting documents interchanged with others using, say, Open Office.

Not the same

That said, I noticed something else about my round-tripped document. It was different. In Word, I have my Normal style set with no space before or after. After round-tripping, these paragraphs had 10pt space after applied.

It gets worse. The converter lost all my paragraph styles – not the formatting, but the style tagging. This is a deal-breaker for me, as I depend on paragraph styles; but I am probably in a minority. Still, it prompted me to look at the list of unsupported features. Casting my eye down the page I came across this item:

In Open XML in real spacing between two consecutive paragraphs is the biger [stet]. For example first paragraph style has spacing after 10pt and second has spacing before 20pt the real spacing is 20pt. In Open Document Format real spacing is sum. In our example the real spacing is 30pt.

Is that my spacing problem? It could be related; but this is not what I would call a model of clarity. Let’s just say that the ODF converter will mess up your paragraph spacing.
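To make the quoted rule concrete, here is a minimal sketch of the two spacing models as that note describes them. I have not checked either behaviour against the specifications, so treat the functions (`open_xml_gap` and `odf_gap`, both names of my own invention) as a restatement of the note and nothing more.

```python
def open_xml_gap(space_after_pt: float, space_before_pt: float) -> float:
    """Gap between two paragraphs per the note's description of Open XML:
    the larger of the two values wins."""
    return max(space_after_pt, space_before_pt)


def odf_gap(space_after_pt: float, space_before_pt: float) -> float:
    """Gap between two paragraphs per the note's description of ODF:
    the two values are added together."""
    return space_after_pt + space_before_pt


# The note's own example: 10pt after the first paragraph, 20pt before the second.
print(open_xml_gap(10, 20))  # 20
print(odf_gap(10, 20))       # 30
```

If the note is right, a converter that copies the spacing values across unchanged will produce a different layout whenever both values are non-zero.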

Question: why was I warned that I might lose “header dimensions”, but these more significant issues – no paragraph styles, messed up spacing - went unmentioned?

Not professional quality

I realise that despite the flaws this converter could be a life-saver if you get a document that would otherwise be unreadable, or if you are forced by regulation to send a document in ODF format. However it does not merit Microsoft’s effusive press release, nor Brian Jones’s enthusiastic blog entry. It falls far short of the standards set by Microsoft Office. Perhaps I am judging too swiftly; but you will understand my scepticism considering the design flaws noted above, the extreme performance problems, and the fact that it somewhat messed up my short document without any graphics.

Practical considerations

In closing, some practical notes. If you really want to work with Open Document, don’t use Microsoft Office. If you want to use Microsoft Office, don’t use the converter except in an emergency, not in this release at least. For Word documents, RTF is the least bad option and macro-free; or failing that, the Office binary formats are actually well understood by third-party applications.

What if you use an application that supports Open Document and want to distribute richly formatted documents to others? Well, in the real world Microsoft Office is everywhere, so the same applies: RTF or Microsoft Office binary formats will help the recipients to get their work done.

Update: I spoke to Microsoft’s Jean Paoli about a number of Office Open XML issues – see here for the interview. He acknowledged there are some issues but said that performance is usually better than I found it to be. I’m sceptical but will try to do some more testing.

8 comments to Open Document to Office Open XML converter: not good

  • I came to the same conclusion on my Great Software blog last week. This is a predicted embarrassment for this Microsoft-funded effort. And you’re right: either get with ODF now or just use .doc format under Word within Compatibility Mode, skipping OXML altogether. While the future of ODF looks bright and strong, OXML looks dangerously unstable.

  • Has anyone got hold of the Sun version? This should integrate (i.e. exist in Save-As and option to default to odt) and provide a fairly solid quality of export import. They also claim a version for odp and ods will be available in the spring.

  • Tony McNamara

    I would have thought that, given that Open XML has yet to be ratified at ISO/IEC, Microsoft would have been out to impress. This certainly does not impress. Or is it that this effort has proven the criticism levelled at the ECMA-approved documentation, and that a document format that requires 6,000 pages to describe is, like, 5,000 pages too many?

  • After skimming through the code base, I can say pretty confidently that the problems this converter has in terms of speed (and perhaps fidelity) are due to the extensive use of XSLT. XSLT is a remarkable language, and I love it, but for extremely performant converters handling large documents, it’s simply not fast enough. The converter even seems to be using XslCompiledTransform, which should be the fastest way to transform via XSLT in .NET, but it’s still not very performant compared to direct “hands-on” conversion in a SAX-like manner.

    It would be interesting to see how Saxon fared in an equal converter with the same XSLT files, though. Since it exists for .NET, the converter could just as well have used that instead of Microsoft’s System.Xml.Xsl library.

  • Tim

    After skimming through the code base, I can say pretty confidently that the problems this converter has in terms of speed (and perhaps fidelity) are due to the extensive use of XSLT.

    I agree. A curious design decision. I made this point to Jean Paoli, who just muttered something about it being a first attempt. Yet it seems almost as if it is deliberately poor, checking a box without really delivering. Microsoft could do much better.

    Tim

  • “They also claim a version for odp and ods will be available in the spring.”
    Hmmm… I don’t think so…

  • “They also claim a version for odp and ods will be available in the spring.”
    Hmmm… I don’t think so…

    Soft Maniac, Tim wrote this post last winter ;)
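On the XSLT point raised in the comments above: for readers unfamiliar with the term, “SAX-like” means handling the XML as a stream of events rather than loading and transforming a whole tree. The sketch below uses Python’s standard `xml.sax` module on a tiny made-up fragment; the `text:p` element name echoes ODF, but the namespace URIs and the `ParagraphCounter` class are placeholders of mine, and nothing here is taken from the converter or measures its speed.

```python
import xml.sax


class ParagraphCounter(xml.sax.ContentHandler):
    """Counts <text:p> elements and the characters inside them as
    parser events arrive, without building a document tree."""

    def __init__(self):
        super().__init__()
        self.paragraphs = 0
        self.chars = 0
        self._depth = 0  # > 0 while inside a <text:p> element

    def startElement(self, name, attrs):
        if name == "text:p":
            self.paragraphs += 1
            self._depth += 1

    def endElement(self, name):
        if name == "text:p":
            self._depth -= 1

    def characters(self, content):
        if self._depth > 0:
            self.chars += len(content)


def summarize(xml_bytes: bytes) -> tuple:
    """Stream the XML through the handler and return (paragraphs, chars)."""
    handler = ParagraphCounter()
    xml.sax.parseString(xml_bytes, handler)
    return handler.paragraphs, handler.chars


# Placeholder namespace URIs; a real ODF file declares its own.
SAMPLE = (
    b'<office:text xmlns:office="urn:example:office" '
    b'xmlns:text="urn:example:text">'
    b"<text:p>Hello</text:p><text:p>World!</text:p>"
    b"</office:text>"
)

print(summarize(SAMPLE))  # (2, 11)
```

Memory use in this style stays roughly constant however long the document is, which is the attraction for large files. Whether it would actually beat the XSLT approach in this converter is something only measurement could settle.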