Microsoft Open XML embarrassment: spaces go missing between words

Microsoft’s controversial Office Open XML format, now officially called just Open XML*, has an embarrassing bug in its Office 2010 and/or Office 2007 implementation, as reported by Dennis O’Reilly on CNET.

In a nutshell: if you save a document from Word 2010 using the default .docx format and send it to a user who has Word 2007 and a different default printer driver, a few seemingly random spaces may be dropped from between words or sentences when it is opened on the other machine. If the document is then saved in Word 2007, the spaces remain missing when it is re-opened in Word 2010.

The consequences for one user were severe:

I had this same problem the other day, when I finished writing an in-class essay on my laptop (Win7 64-bit, Office 2010 32-bit), transferred it to a classroom computer (WinXP, Office 2007), and printed the document. I was out of time, so I had to turn in the paper without reading over the printed copy. I had triple-checked the essay on my laptop, so it had no spelling or formatting errors, right?

I got my essay back, and I had 20% of my grade taken away due to frequent spacing errors between words. Shocked, I double-checked my original copy of the document, and there were no spacing errors. Even more perplexing, I opened the file on a classroom computer, and, sure enough, I found many spacing errors between words and sentences.

Now, as I understand it, a large part of the point of Open XML is to preserve fidelity in archived documents, so I consider this a significant bug.

I’ll speculate a bit on why this problem occurs. It is a bug; but it also reflects the fact that Word is a word processor, not a professional text layout tool. Word processor documents may change formatting slightly according to the printer driver installed; and I’d guess that the missing spaces occur when the line breaks are altered by a different printer driver.

This is why one workaround is for both users to set Adobe PDF as the default printer driver, which makes the two machines consistent. Another workaround is to revert to the old binary .doc format.

It is still quite wrong for spaces to disappear in this manner, though the bug could be in Word 2007 rather than in Word 2010.

I also notice that nobody from Microsoft has officially commented on the problem. Disclosure is important.

Update: Microsoft has now commented and says:

This is an issue related to how Word 2007 opened files. In other words, the issue is not with Word 2010, it was a defect in the file / open code of Word 2007 that caused the problem. Reports that Open XML caused this issue are not accurate. We discovered and fixed the issue in Word 2007 as part of a release that first appeared on September 25, 2008, well before shipping Office 2010.

The suggested remedy is to apply Office 2007 Service Pack 2.

If you have already applied this and still get the problem, please inform Microsoft – and I would be interested too.
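If you want to check whether a particular document has been affected, one rough approach is to compare the text of the copy you sent with the copy that came back. The following is only a sketch under my own assumptions, not anything from Microsoft: the function names are made up for illustration, it reads only the main document part (word/document.xml) of a .docx file, and it treats two or more adjacent words fusing into one as a dropped space.

```python
import difflib
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml in a .docx package.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text(path):
    """Return the body text of a .docx file, one line per paragraph.

    Simplification: nested paragraphs (for example in text boxes) may be
    counted more than once; good enough for a quick comparison.
    """
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(W_NS + "p"):
        # Text lives in w:t elements inside the runs of each paragraph.
        paragraphs.append("".join(t.text or "" for t in p.iter(W_NS + "t")))
    return "\n".join(paragraphs)


def find_dropped_spaces(original, received):
    """List words in `received` that are several separate words in `original`."""
    a = original.split()
    b = received.split()
    fused = []
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "replace":
            continue
        # Same characters but fewer word boundaries: spaces were dropped.
        same_chars = "".join(a[i1:i2]) == "".join(b[j1:j2])
        if same_chars and (i2 - i1) > (j2 - j1):
            fused.extend(b[j1:j2])
    return fused
```

With two hypothetical files, `find_dropped_spaces(docx_text("sent.docx"), docx_text("received.docx"))` would list the fused words, if any. Genuine edits to the text are ignored because the characters no longer match.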

*Note: Although some Microsoft sites say Open XML, I’m told that the official name is still Office Open XML or possibly something like ISO/IEC 29500:2008 Office Open XML File Formats.

11 comments to Microsoft Open XML embarrassment: spaces go missing between words

  • Caligula

    “Severe” is 20% of a grade on a single paper?! Yet more watering down of adjectives. Stunning!

  • tim

    @Caligula I don’t know the full context but 20% could be the difference between a pass and fail; for a word processor bug to inflict that on a user is severe IMO.

    All these things are relative. Nobody died.

    Tim

  • A WYSIWYG word processor will try to mimic the printed page, so the document the user is editing is a close match to how it will print out. This requires that the word processor has access to “font metrics” for the fonts used in the document, as well as page dimensions. The spacing between the words is based on font metrics as well as style settings. It would be odd, but it is entirely possible that a failure in font matching, or in retrieving font metrics for the font in use, could confuse a word processor into displaying incorrect letter spacing. But this would typically look bad everywhere in the document. It would not be subtle.

    What is strange in this case is the bug appears to be format-dependent, i.e., it occurs with OOXML but not DOC files. Of course, this is not happening everywhere or we would have heard about this a long time ago, when Office 2010 was in beta or first released.

    In any case, this is one reason why the WYSIWYG model needs to change. The MVP advice to simply install a printer driver misses the mark. We’re quickly moving to netbooks, smartphones and tablets. Should we require printer drivers on every device just to render a document correctly? I don’t think so.

  • Philip

    This issue is NOT only related to Word 2007.

    It happens on documents created in Word in Office for Mac 2011 which are later opened on PCs running Word 2002 (updated as far as possible; yes some organisations still use that).

    Is there a solution to this?

  • Andrew

    Yes, it’s a bug. Yes, I have seen it, and it is a major problem in a small percentage of files.

    However, it is not fair to say that Word is not a “professional page layout tool” if you are alluding to other packages. The problem is caused by files created in current versions of software being opened in software released at least three years earlier. Forwards compatibility is not something software developers generally even attempt. For instance, can you open Adobe InDesign CS5.5 files in Adobe InDesign CS5? Not unless you convert the file to interchange format first. Can you open Adobe InDesign 5.0 files in Adobe InDesign CS4? Not unless you convert to interchange first.

    What other “professional page layout tool” were you thinking of – do older versions routinely offer the ability to open native file formats created by subsequent versions?

  • tim

    @Andrew your point is weakened insofar as AIUI Open XML is intended to be an interchange format as well as a native format.

    When I say “professional page layout tool” I am thinking of DTP software like InDesign or Quark.

    Tim

  • MichaelK

    I realize this is an old article, but it came up in my search for a solution to a similar problem in Word 2003. In my case, the spaces between words go missing when simply copying text in Word and pasting into something like a blog. The problem isn’t reproduced when pasting into a simple text editor (but then you lose a significant amount of formatting and all hyperlinks).

    In troubleshooting, I’ve found that Open Office Writer does not produce this problem and is a pseudo work-around.

    [Yes, I realize MS Word 2003 is ancient and now not even supported by Microsoft, but many businesses use software this old. I'm hoping that posting here helps save a few people time in researching the problem.]

  • Jason

    I still get this problem moving between Word:Mac 2011 and MS Word 2007. To be honest it’s really annoying.

  • David

    I have this problem now opening documents in Word:Mac 2011 from colleagues with older versions of Word on PC. Agree – really annoying and makes documents look unprofessional. Embarrassing for me to submit a report with Word-generated typos.

  • Bernie

    I have a similar problem using Open Office Writer. When I write a letter and print it (on a Kodak ESP5200+), it prints with whole lines of text missing, and some text that prints is streaked. The first time, I just thought the ink cartridge needed replacing. I replaced it and attempted to print: same result. I printed a random web page and it printed perfectly. I don’t know if it’s a driver, software or a virus.