Open Office chokes on Open XML spec

I’ve been looking at the notoriously lengthy specification for ECMA-376, also known as Office Open XML. You can download it here in both PDF and Open XML (.docx) format. I grabbed both.

At 5219 pages the Markup Language Reference is a seriously large document. The PDF is about 34MB, and the docx about 14MB. I started with the PDF, which opens easily enough, but is desperately slow to search, prompting me to try the alternate docx version. This dialog amused me:

There are too many spelling or grammatical errors in Office Open XML Part 4 - Markup Language Reference_final.docx to continue displaying them.

The docx took longer to load than the PDF, but searching is indeed quicker. Frankly it’s impressive that it is usable at all. I was trained to avoid long documents, on the grounds that they are prone to corruption, and that if they do corrupt you lose more work. That was when we thought 100 pages was long. I cannot think of any good reason why this document is not broken into smaller pieces.

Still, it makes an interesting test case. I wondered how Open Office would cope with a document this size. I saved it as a .doc – 68MB – and loaded it into Open Office 2.1. This took several hours (I left it overnight). Once loaded, Open Office repaginated the document to 7453 pages. Searching it was pretty quick though, if anything faster than Word. Finally I saved it in Open Document format. 15MB. Note that both Open Document and Open XML are zipped formats, which explains their smaller size.
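If you want to check the zip point for yourself, here is a minimal Python sketch that totals the compressed and uncompressed sizes of the parts inside a .docx or .odt package. It relies only on the standard zipfile module; the function name and the file name in the usage line are my own placeholders, not anything taken from either specification.

```python
import zipfile


def zip_size_summary(path_or_file):
    """Return (compressed_bytes, uncompressed_bytes, ratio) for a ZIP-based document.

    Works on any ZIP container, which includes .docx and .odt packages.
    Accepts a file path or a binary file-like object.
    """
    with zipfile.ZipFile(path_or_file) as archive:
        infos = archive.infolist()
        # compress_size is the size stored in the archive,
        # file_size is the size of the part once extracted.
        compressed = sum(info.compress_size for info in infos)
        uncompressed = sum(info.file_size for info in infos)
    # Guard against an empty archive to avoid dividing by zero.
    ratio = compressed / uncompressed if uncompressed else 1.0
    return compressed, uncompressed, ratio
```

Calling something like `zip_size_summary("spec.docx")` (a hypothetical file name) should show how much of the difference from the 68MB .doc comes down to compression of the XML parts, though the exact figures will depend on the document.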

Open Office takes 31 minutes to load the Open Document version – quicker than loading the .doc, but not tolerable for normal work. By contrast, Word 2007 can load the .docx to a usable state in around 5 minutes. It all suggests that Word 2007 and/or Open XML is superior for very long documents. I’m using Vista by the way, with 3GB RAM.

What about the content? So far I’m impressed. The entries I’ve looked at have been clear, to the point, and include example code. No doubt there are dark corners, but this strikes me as a good effort.


  1. No surprise there. Office Open XML is a decoy, chaff to obscure the ISO standard document format used by everybody but Msft. Ignore the phony format and use the real one.

    More anticompetitive behavior from the poster child for why we need a 21st century antitrust law.

