Tag Archives: word 2010

Microsoft Open XML embarrassment: spaces go missing between words

Microsoft’s controversial Office Open XML format, now officially called just Open XML*, has an embarrassing bug in its Office 2010 and/or Office 2007 implementation, as reported by  Dennis O’Reilly on Cnet.

In a nutshell: if you save a document from Word 2010 using the default .docx format, and send it to a user with Word 2007 but who has a different default printer driver, then a few seemingly random spaces may get dropped from between words or sentences when it is opened on the other machine. When saved in Word 2007, the spaces remain missing if the document is re-opened in Word 2010.

The consequences for one user were severe:

I had this same problem the other day, when I finished writing an in-class essay on my laptop (Win7 64-bit, Office 2010 32-bit), transferred it to a classroom computer (WinXP, Office 2007), and printed the document. I was out of time, so I had to turn in the paper without reading over the printed copy. I had triple-checked the essay on my laptop, so it had no spelling or formatting errors, right?

I got my essay back, and I had 20% of my grade taken away due to frequent spacing errors between words. Shocked, I double-checked my original copy of the document, and there were no spacing errors. Even more perplexing, I opened the file on a classroom computer, and, sure enough, I found many spacing errors between words and sentences.

Now, as I understand it a large part of the point of Open XML is to preserve fidelity in archived documents so I consider this a significant bug.

I’ll speculate a bit on why this problem occurs. It is a bug; but it also reflects the fact that Word is a word processor, not a professional text layout tool. Word processor documents may change formatting slightly according to the printer driver installed; and I’d guess that the missing spaces occur when the line breaks are altered by a different printer driver.

This is why a workaround is for both users to set Adobe PDF as the default printer driver, making them consistent. Another workaround is to revert to the old binary .doc format.

It is still quite wrong for spaces to disappear in this manner, though the bug could be in Word 2007 rather than in Word 2010.

I also notice that nobody from Microsoft has officially commented on the problem. Disclosure is important.

Update: Microsoft has now commented and says:

This is an issue related to how Word 2007 opened files. In other words, the issue is not with Word 2010, it was a defect in the file / open code of Word 2007 that caused the problem. Reports that Open XML caused this issue are not accurate. We discovered and fixed the issue in Word 2007 as part of a release that first appeared on September 25, 2008, well before shipping Office 2010.

The suggested remedy is to apply Office 2007 Service Pack 2.

If you have already applied this and still get the problem, please inform Microsoft – and I would be interested too.

*Note: Although Microsoft sites like this one say Open XML I’m told that the official name is still Office Open XML or possibly something like ISO/IEC 29500:2008 Office Open XML File Formats.