Tag Archives: open xml

Microsoft Open XML embarrassment: spaces go missing between words

Microsoft’s controversial Office Open XML format, now officially called just Open XML*, has an embarrassing bug in its Office 2010 and/or Office 2007 implementation, as reported by  Dennis O’Reilly on Cnet.

In a nutshell: if you save a document from Word 2010 using the default .docx format, and send it to a user with Word 2007 but who has a different default printer driver, then a few seemingly random spaces may get dropped from between words or sentences when it is opened on the other machine. When saved in Word 2007, the spaces remain missing if the document is re-opened in Word 2010.

The consequences for one user were severe:

I had this same problem the other day, when I finished writing an in-class essay on my laptop (Win7 64-bit, Office 2010 32-bit), transferred it to a classroom computer (WinXP, Office 2007), and printed the document. I was out of time, so I had to turn in the paper without reading over the printed copy. I had triple-checked the essay on my laptop, so it had no spelling or formatting errors, right?

I got my essay back, and I had 20% of my grade taken away due to frequent spacing errors between words. Shocked, I double-checked my original copy of the document, and there were no spacing errors. Even more perplexing, I opened the file on a classroom computer, and, sure enough, I found many spacing errors between words and sentences.

Now, as I understand it a large part of the point of Open XML is to preserve fidelity in archived documents so I consider this a significant bug.

I’ll speculate a bit on why this problem occurs. It is a bug; but it also reflects the fact that Word is a word processor, not a professional text layout tool. Word processor documents may change formatting slightly according to the printer driver installed; and I’d guess that the missing spaces occur when the line breaks are altered by a different printer driver.

This is why a workaround is for both users to set Adobe PDF as the default printer driver, making them consistent. Another workaround is to revert to the old binary .doc format.

It is still quite wrong for spaces to disappear in this manner, though the bug could be in Word 2007 rather than in Word 2010.

I also notice that nobody from Microsoft has officially commented on the problem. Disclosure is important.

Update: Microsoft has now commented and says:

This is an issue related to how Word 2007 opened files. In other words, the issue is not with Word 2010, it was a defect in the file / open code of Word 2007 that caused the problem. Reports that Open XML caused this issue are not accurate. We discovered and fixed the issue in Word 2007 as part of a release that first appeared on September 25, 2008, well before shipping Office 2010.

The suggested remedy is to apply Office 2007 Service Pack 2.

If you have already applied this and still get the problem, please inform Microsoft – and I would be interested too.

*Note: Although Microsoft sites like this one say Open XML I’m told that the official name is still Office Open XML or possibly something like ISO/IEC 29500:2008 Office Open XML File Formats.

Word 2010 ugly font in .doc format

I’ve come across what looks like a bug in Word 2010. I generally do not send documents in the new Word .docx format, because it can cause problems for the recipient; I prefer to use the old .doc format. I’m also averse to the multi-colour default style set in Word, and generally change it to the Word 2003 style set.

So I typed a document in that style set, and prepared it for sending by using Save As to convert it from .docx to .doc.

Here’s the before:

image

and after:

image

The font spacing has gone awry in the heading. I don’t know to what extent this is specific to a particular font or style; but I have verified the behaviour on a second machine and confirmed that the error exists in the printed output as well as on screen.

It’s unfortunate because .doc support remains a critical feature of Office – if this is a common problem, it would be enough to send me back to Word 2007.

I would love to know what is causing it. I realise there are cases where a .docx cannot be quite the same when saved as .doc, because of different features, but I have never before come across this kind of corruption. Excellent compatibility between .doc and .docx is meant to be a key reason to use Microsoft’s Open XML.

Incidentally, it is not unique to documents which start life as .docx. I get the same problem if I set the default format to .doc and type the same content.

Update

I got this one wrong. It is not a bug in Word 2010; it is the same in Word 2007, and I’m surprised I have not noticed it before. The likely reason is that it only occurs at 16pt and higher, which is when kerning is enabled by default. The fix is to disable kerning in that style (Heading 1):

image

Curiously, the ugly font does look better in Word 2003 on Windows XP; I don’t have Office 2003 installed on Windows 7 so cannot test that combination.

Of course this does still illustrate that saving a .docx as .doc can spoil the formatting.

Office Web Apps better then Open Office for .docx on Linux

I’ve been reviewing Office and SharePoint 2010, and trying out Ubuntu Lucid Lynx, so I thought I would put the two together with a small experiment.

I borrowed a document from Microsoft’s press materials for Office 2010. Perhaps surprisingly, they are in .doc format, not the Open XML .docx that was introduced in Office 2007. That didn’t suit my purposes, so I converted it to .docx using Save As in Office 2010.

image

Then I stuck it on SharePoint 2010.

Next, I downloaded it to Ubuntu and opened it in Open Office. It was not a complete disaster, but the formatting was badly messed up.

Finally, still in Ubuntu, I navigated to SharePoint and viewed the same document there. It looked fine.

Even better, I was able to click Edit in Browser, make changes, and save. The appearance is not quite WYSIWYG in edit mode, but is the same as in IE on Windows.

The exercise illustrates two points. One is that Open Office is not a good choice for working with Open XML – incidentally, the document looked fine when opened in the old binary .doc format. The other is that SharePoint 2010 and Office Web Apps will have real value on mixed networks suffering from document compatibility issues with Office and its newer formats.

Dancing on a pin: Microsoft belatedly answers Open XML critics

Microsoft’s Doug Mahugh has replied to accusations from ISO expert Alex Brown that the company is doing little to implement its own Open XML standard. The issue is that the XML document formats in Office 2007 are, from the ISO perspective, meant to be “Transitional” – a compromised format designed to interoperate with existing binary documents – and that the standard Microsoft is meant to be implementing is “Strict”, an improved standard that can more easily be implemented by others.

Mahugh says:

I’d like to state clearly and unequivocally at this time that we will support reading and writing of ISO/IEC 29500 Strict no later than the next major release of Office, code-named Office “15.”

He doesn’t say whether or not it “Strict” will be the default in Office 15, which we can expect to see in around 2013. This is the real pain-point for users: if the default changes, the result is the frustration of sending or receiving unreadable documents.

Microsoft is dancing on a pin. On the one hand, it wants to convince governments, academics and other standards-sensitive organisations that Microsoft Office does the right thing. On the other hand, the benefit to users of breaking document compatibility for the sake of ISO compliance is rather invisible.

Document compatibility is the thinking behind having read-only support for Strict in Office 2010 (and coming to Office 2007). If Microsoft can get read-only support widely deployed, then in 2013 the Strict documents that start to circulate will not be so problematic.

The approach is not completely unreasonable; these things take time. That said, Microsoft’s communication of its intentions has been poor. Further, Mahugh does not answer the parts of Alex Brown’s post that address quality:

It is also a worrying commentary on the standards-savvyness of the Office developers that the first amateur attempts of part-time outsiders find problems with documents which Redmond’s internal QA processes have missed. I confidently predict that fuller validation of Office document is likely to reveal many problems both with those documents, and with the Standard itself, over the coming years.

My perspective on this as a journalist is that Microsoft did not consider Open XML or standards compliance even worth a mention in its publicity so far and its detailed reviewers’ guide for Office 2010. That suggests it is not much of a priority.

So full support in 2013 or thereabouts. My expectation is that by then saving and editing documents online will be more common than it is today, and that the assumptions the Office team seems to make about the steady progress of its huge desktop suite are likely to prove faulty.

Microsoft accused of failure to observe Open XML standards process

XML specialist Alex Brown, who was involved in the ISO standardisation of Microsoft’s Open XML – still perhaps best known as OOXML – says Microsoft has failed to honour the commitments it made when the standard was approved. In particular, it seems little progress has been made between Office 2007 and Office 2010. The key problem is that Microsoft implemented Open XML before it was standardised. There were numerous changes made during the standardisation process, but what to do about the existing implementation? Loosely, the existing unacceptable format was given a “Transitional” status, while the more satisfactory, corrected format was called “Strict”. Microsoft promised to implement the “Strict” variant as soon as it could. Brown adds:

I was convinced at the time, and remain convinced today, that the division of OOXML into Strict and Transitional variants was the innovation which allowed the Standard to pass. Enough National Bodies could then vote in good conscience for OOXML knowing that their preferred, Strict, variant would be under their control into the future while the Transitional variant (which – remember – they had effectively rejected in 2007) would remain purely for the purpose of accurately specifying old documents: a useful aim in itself.

It is now two years since Open XML was approved, and Microsoft is on the brink of releasing a new version of Office. So does Office 2010 implement Open XML Strict? Apparently not – it’s the Transitional version. That is bad enough; worse still, according to Brown, it does not even conform correctly to that:

It is also a worrying commentary on the standards-savvyness of the Office developers that the first amateur attempts of part-time outsiders find problems with documents which Redmond’s internal QA processes have missed. I confidently predict that fuller validation of Office document is likely to reveal many problems both with those documents, and with the Standard itself, over the coming years.

Note that Brown is basing his remarks on the preview of Office 2010; we have not seen the final release yet. I can believe that Microsoft may fix some issues, but it looks vanishingly unlikely that Office 2010 will implement the “Strict” standard which ISO approved.

Brown’s remarks shed light on something I noticed when reviewing the preview:

As for Open XML, it’s notable that Microsoft neglects to mention it at all in its Reviewer’s Guide, even though this is supposedly the release that will fully implement ISO/IEC 29500. It is odd how this has gone from a cause to campaign for, to not-worth-mentioning in just over a year. To be fair, few users ever cared about XML formats themselves: it is only when documents get scrambled or fail to open that such things become important.

No wonder Microsoft said nothing about it, if in reality it has lost interest in conformance.

I think it is a good thing for Microsoft to standardise its Office formats. Selfish manipulation of standards committees on the other hand is not acceptable. One thing is for sure: if Brown is right and

without a change of direction, the entire OOXML project is now surely heading for failure.

then the company will only have itself to blame. Its nightmare will re-emerge: entire governments mandating OpenOffice for the sake of  standards conformance.

That said, and despite the hype, I regard Office 2010 as a minor release. 64-bit Excel, a few tweaks, and a first foray into browser-hosted versions. Microsoft often displays this pattern, following up a release with major changes – Office 2007, for example – with one that is really just a refinement of what went before. It is not impossible that somewhere in the corridors of Redmond a team is working on a new Office that does a much better job with the Open XML standard.

Over to Microsoft – serious about Open XML? Or just doing the minimum necessary to protect a lucrative market dominance – maybe a bit less than the minimum?

Update: Microsoft’s Doug Mahugh has replied to Brown’s comments here. I am writing separately about this.