Tag Archives: ooxml

The UK government is adopting Open Document: some observations

The UK government is adopting the Open Document Format for Office Applications, for documents that are editable (read-only documents will be PDF or HTML). You can read Mike Bracken’s (Government Digital Service) blog on the subject here, and the details of the new requirements here. If you want to see the actual standards, they are on the OASIS site here.

I followed the XML document standards wars in some detail back in 2006-2008. The origins of ODF go back to Sun Microsystems (a staunch opponent of Microsoft) which acquired an Office suite called Star Office, made it open source, and supported OpenOffice.org. My impression was that Sun’s intentions were in part to disrupt the market for Microsoft Office, and in part to promote a useful open standard out of conviction. OpenOffice eventually found its way to the Apache Foundation after Oracle’s acquisition of Sun. You can find it here.

During that time, Microsoft responded by shifting Office to use XML formats by default – these are the formats we know as .docx, .xlsx etc. It also made the formats an open standard via ECMA and ISO, to the indignation of ODF advocates who found every possible fault in the standards and the process. There were and are faults; but it has always seemed to me that an open XML standard for Microsoft Office documents was a real step forward from the wholly proprietary (but reverse engineered) binary formats.
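To make "XML formats" concrete: a .docx file is a ZIP package whose main text lives in an XML part called word/document.xml. The sketch below is only an illustration of that idea. The function names are my own, and the package it builds holds a single part, so Word itself would not open it (a real package also needs [Content_Types].xml and relationship parts).

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# WordprocessingML namespace used by the Transitional variant of the format.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def build_minimal_docx_like(text: str) -> bytes:
    """Build an in-memory ZIP holding only word/document.xml.

    Hypothetical helper for illustration; not a complete, openable .docx.
    """
    body = (
        f'<w:document xmlns:w="{W_NS}"><w:body><w:p><w:r>'
        f'<w:t xml:space="preserve">{escape(text)}</w:t>'
        f"</w:r></w:p></w:body></w:document>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("word/document.xml", body)
    return buf.getvalue()


def read_paragraph_text(package: bytes) -> list[str]:
    """Return the text of each paragraph by joining its w:t runs."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(f"{{{W_NS}}}p"):
        paragraphs.append("".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t")))
    return paragraphs
```

The point is that nothing beyond a ZIP reader and an XML parser is needed to get at the text, which was not true of the old binary formats.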

The standards wars are to some extent a proxy for the effort to shift Microsoft from its dominance of business document authoring. Microsoft charges a lot for Office, particularly for businesses, and arguably this is an unnecessary burden. On the other hand, it is a good product which I personally prefer to the alternatives on Windows (on the Mac I am not so sure), and considering the amount of use Office gets during the working day even a small improvement in productivity is worth paying for.

As a further precaution, Microsoft added ODF support into its own Office suite. This was poor at first, though it has no doubt improved since 2007. However, I would not advise anyone to set Microsoft Office to use ODF by default, unless mandated by some requirement such as government regulation. It is not the native format and I would expect a greater likelihood that something could go slightly wrong in formatting or metadata.

Bracken does not mention Microsoft Office in his blog; but as ever, the interesting part of this decision is how it will impact Office users in government, or working with government. If it is a matter of switching defaults in Office, that is no big deal, but if it means replacing Microsoft Office with Open Office or its fork, Libre Office, that will have more impact.

The problem with abandoning Microsoft Office is not only that the alternatives may fall short, but also that the ecosystem around Microsoft Office and its document formats is richer – in other words, tools that consume or generate Office documents, add-ins for Office, and so on.

This also means that Microsoft Office documents are, in my experience, more interoperable (not less) than ODF documents.

That does not in itself make the UK government’s decision a bad one, because in making the decision it is helping to promote an alternative ecosystem. On the other hand, it does mean that the decision could be costly in constraining the choice of tools while the ODF ecosystem catches up (if it does).

How does the move towards cloud services like Office 365 and Google Docs impact on all this? Microsoft says it supports ODF in SharePoint; but for sure it is better to use Microsoft’s own formats there. For example, check the specifications for Office Online. You can edit docx in the browser, but not odt (Open Document Text); it is the same story with spreadsheets and presentations.
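Services that accept one format family but not the other first have to tell them apart. Both are ZIP packages: an ODF package carries a part named mimetype holding its media type, while a Microsoft Open XML package carries a [Content_Types].xml part. Here is a minimal sketch of that check. The function name and return strings are my own invention, and this is not how any particular service does it.

```python
import io
import zipfile


def sniff_office_package(data: bytes) -> str:
    """Return 'odf:<media type>', 'ooxml', or 'unknown' for a document's bytes.

    Hypothetical helper: it looks only at marker parts, not at the content.
    """
    if not zipfile.is_zipfile(io.BytesIO(data)):
        return "unknown"
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = set(zf.namelist())
        if "mimetype" in names:
            mime = zf.read("mimetype").decode("ascii", "replace").strip()
            if mime.startswith("application/vnd.oasis.opendocument."):
                return "odf:" + mime
        if "[Content_Types].xml" in names:
            return "ooxml"
    return "unknown"
```

For an .odt file the media type is application/vnd.oasis.opendocument.text, and for an .ods file it is application/vnd.oasis.opendocument.spreadsheet, so the same check also separates text documents from spreadsheets.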

Google has recently added native support for the Microsoft formats to Google Docs.

Amazon’s Zocalo service, which I have just reviewed for the Register, can preview Microsoft’s formats in the browser, but while it also supports odt for preview, it does not support ods (Open Document Spreadsheet).

A good decision then by the UK government? Your answer may be partly ideological, but as a UK taxpayer, my feelings are mixed.

For more information on this and other government IT matters, I recommend Bryan Glick’s pieces over on Computer Weekly, like this one.

Microsoft Open XML embarrassment: spaces go missing between words

Microsoft’s controversial Office Open XML format, now officially called just Open XML*, has an embarrassing bug in its Office 2010 and/or Office 2007 implementation, as reported by Dennis O’Reilly on Cnet.

In a nutshell: if you save a document from Word 2010 using the default .docx format, and send it to a user with Word 2007 but who has a different default printer driver, then a few seemingly random spaces may get dropped from between words or sentences when it is opened on the other machine. When saved in Word 2007, the spaces remain missing if the document is re-opened in Word 2010.

The consequences for one user were severe:

I had this same problem the other day, when I finished writing an in-class essay on my laptop (Win7 64-bit, Office 2010 32-bit), transferred it to a classroom computer (WinXP, Office 2007), and printed the document. I was out of time, so I had to turn in the paper without reading over the printed copy. I had triple-checked the essay on my laptop, so it had no spelling or formatting errors, right?

I got my essay back, and I had 20% of my grade taken away due to frequent spacing errors between words. Shocked, I double-checked my original copy of the document, and there were no spacing errors. Even more perplexing, I opened the file on a classroom computer, and, sure enough, I found many spacing errors between words and sentences.

Now, as I understand it, a large part of the point of Open XML is to preserve fidelity in archived documents, so I consider this a significant bug.

I’ll speculate a bit on why this problem occurs. It is a bug; but it also reflects the fact that Word is a word processor, not a professional text layout tool. Word processor documents may change formatting slightly according to the printer driver installed; and I’d guess that the missing spaces occur when the line breaks are altered by a different printer driver.

This is why a workaround is for both users to set Adobe PDF as the default printer driver, making them consistent. Another workaround is to revert to the old binary .doc format.

It is still quite wrong for spaces to disappear in this manner, though the bug could be in Word 2007 rather than in Word 2010.
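If you suspect a document has been affected, one rough check is to compare the text of the original copy with the text of the copy re-saved on the other machine, and look for places where two or more words have fused into one. The sketch below assumes you have already extracted both texts as plain strings. The function name is my own, and it only detects the symptom; it says nothing about the cause.

```python
import difflib


def dropped_spaces(original: str, reopened: str) -> list[tuple[str, str]]:
    """Return (original words, fused word) pairs where spaces seem to have vanished."""
    orig_words = original.split()
    new_words = reopened.split()
    matcher = difflib.SequenceMatcher(a=orig_words, b=new_words, autojunk=False)
    fused = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "replace":
            continue
        before = orig_words[i1:i2]
        after = new_words[j1:j2]
        # Same characters but fewer words means only whitespace was lost.
        if "".join(before) == "".join(after) and len(after) < len(before):
            fused.append((" ".join(before), " ".join(after)))
    return fused
```

For example, comparing "the quick brown fox jumps" with "the quickbrown fox jumps" reports the pair ("quick brown", "quickbrown").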

I also notice that nobody from Microsoft has officially commented on the problem. Disclosure is important.

Update: Microsoft has now commented and says:

This is an issue related to how Word 2007 opened files. In other words, the issue is not with Word 2010, it was a defect in the file / open code of Word 2007 that caused the problem. Reports that Open XML caused this issue are not accurate. We discovered and fixed the issue in Word 2007 as part of a release that first appeared on September 25, 2008, well before shipping Office 2010.

The suggested remedy is to apply Office 2007 Service Pack 2.

If you have already applied this and still get the problem, please inform Microsoft – and I would be interested too.

*Note: Although Microsoft sites like this one say Open XML, I’m told that the official name is still Office Open XML or possibly something like ISO/IEC 29500:2008 Office Open XML File Formats.

Office 2010 offers choice of Open Document or Microsoft XML formats

I was surprised to see the following dialog after an in-place upgrade of Office 2007 to Office 2010:

[Image: Office 2010 dialog offering a choice of default file formats]

Admittedly there is a strong steer towards the Microsoft formats which, we are told, are “designed to support all the features of Microsoft Office”.

On the other hand, this was an in-place upgrade and default save options were already present in Office 2007. Given that most in-place upgrades preserve settings – which is part of the point of an in-place upgrade – you would expect it just to keep the old defaults.

I’m guessing therefore that this is aimed at appeasing/convincing regulators and governments that Microsoft Office plays nice with standards.

That said, there is little reason to choose the ODF format unless it is required. It will cause problems with formatting and content, and is especially risky with Excel spreadsheets.

If you want to use ODF, save money and get more complete support by using OpenOffice.

Update: Neowin has some background here.

Microsoft accused of failure to observe Open XML standards process

XML specialist Alex Brown, who was involved in the ISO standardisation of Microsoft’s Open XML – still perhaps best known as OOXML – says Microsoft has failed to honour the commitments it made when the standard was approved. In particular, it seems little progress has been made between Office 2007 and Office 2010. The key problem is that Microsoft implemented Open XML before it was standardised. There were numerous changes made during the standardisation process, but what to do about the existing implementation? Loosely, the existing unacceptable format was given a “Transitional” status, while the more satisfactory, corrected format was called “Strict”. Microsoft promised to implement the “Strict” variant as soon as it could. Brown adds:

I was convinced at the time, and remain convinced today, that the division of OOXML into Strict and Transitional variants was the innovation which allowed the Standard to pass. Enough National Bodies could then vote in good conscience for OOXML knowing that their preferred, Strict, variant would be under their control into the future while the Transitional variant (which – remember – they had effectively rejected in 2007) would remain purely for the purpose of accurately specifying old documents: a useful aim in itself.

It is now two years since Open XML was approved, and Microsoft is on the brink of releasing a new version of Office. So does Office 2010 implement Open XML Strict? Apparently not – it’s the Transitional version. That is bad enough; worse still, according to Brown, it does not even conform correctly to that:

It is also a worrying commentary on the standards-savvyness of the Office developers that the first amateur attempts of part-time outsiders find problems with documents which Redmond’s internal QA processes have missed. I confidently predict that fuller validation of Office document is likely to reveal many problems both with those documents, and with the Standard itself, over the coming years.

Note that Brown is basing his remarks on the preview of Office 2010; we have not seen the final release yet. I can believe that Microsoft may fix some issues, but it looks vanishingly unlikely that Office 2010 will implement the “Strict” standard which ISO approved.
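For the curious, the two variants can be told apart in a saved file because they use different XML namespaces: Transitional word-processing documents use a schemas.openxmlformats.org namespace, while Strict documents use a purl.oclc.org one. Here is a minimal sketch that inspects the root element of word/document.xml. The function name is my own, and a namespace check is only a hint about which variant a document claims to be, not a validation that it conforms.

```python
import xml.etree.ElementTree as ET

# Main WordprocessingML namespaces for the two variants of ISO/IEC 29500.
TRANSITIONAL_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
STRICT_NS = "http://purl.oclc.org/ooxml/wordprocessingml/main"


def ooxml_variant(document_xml: bytes) -> str:
    """Return 'strict', 'transitional', or 'unknown' from the root namespace.

    Hypothetical helper; it performs no schema validation.
    """
    root = ET.fromstring(document_xml)
    # ElementTree renders a namespaced tag as "{namespace}localname".
    namespace = root.tag[1:].split("}", 1)[0] if root.tag.startswith("{") else ""
    if namespace == STRICT_NS:
        return "strict"
    if namespace == TRANSITIONAL_NS:
        return "transitional"
    return "unknown"
```

Running this over the document.xml part of files saved by a given version of Office is a quick way to see which variant it writes by default.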

Brown’s remarks shed light on something I noticed when reviewing the preview:

As for Open XML, it’s notable that Microsoft neglects to mention it at all in its Reviewer’s Guide, even though this is supposedly the release that will fully implement ISO/IEC 29500. It is odd how this has gone from a cause to campaign for, to not-worth-mentioning in just over a year. To be fair, few users ever cared about XML formats themselves: it is only when documents get scrambled or fail to open that such things become important.

No wonder Microsoft said nothing about it, if in reality it has lost interest in conformance.

I think it is a good thing for Microsoft to standardise its Office formats. Selfish manipulation of standards committees on the other hand is not acceptable. One thing is for sure: if Brown is right and

without a change of direction, the entire OOXML project is now surely heading for failure.

then the company will only have itself to blame. Its nightmare will re-emerge: entire governments mandating OpenOffice for the sake of standards conformance.

That said, and despite the hype, I regard Office 2010 as a minor release. 64-bit Excel, a few tweaks, and a first foray into browser-hosted versions. Microsoft often displays this pattern, following up a release with major changes – Office 2007, for example – with one that is really just a refinement of what went before. It is not impossible that somewhere in the corridors of Redmond a team is working on a new Office that does a much better job with the Open XML standard.

Over to Microsoft – serious about Open XML? Or just doing the minimum necessary to protect a lucrative market dominance – maybe a bit less than the minimum?

Update: Microsoft’s Doug Mahugh has replied to Brown’s comments here. I am writing separately about this.