All posts by onlyconnect

Microsoft’s Jean Paoli on Office Open XML

I spoke to Jean Paoli about Office Open XML and its standardisation. I respect Paoli, one of the originators of the XML specification. His major point, apart from complaining about what he calls IBM’s orchestrated campaign against the ISO standardisation of OOXML, is that only Microsoft’s XML format can maintain fidelity with legacy Office documents. Unfortunately the example he gives – borders around a table – is not often a critical feature; but in general I take the point. He seemed not to understand my question about whether there will be a non-MS Office reference implementation.

Leaving aside OOXML vs ODF for a moment, Paoli observes that “The responsibility of migrating 450 million users is huge.” He is talking about the decision to make XML the default format in Office 2007. Undoubtedly a brave move, and painful for users in some cases, but for developers the ability to work with XML (whether it is OOXML or ODF) is a joy compared to the old binary formats, or Word’s Rich Text Format.
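That joy is easy to demonstrate: a .docx file is simply a ZIP archive of XML parts, so you can get at the content with nothing more than a ZIP library and an XML parser. Here is a minimal sketch in Python (my illustration, not Microsoft's; "report.docx" is a placeholder file name) that prints the text of a document's main part:

```python
# A .docx (Office Open XML) package is a ZIP archive of XML parts;
# the main body text lives in word/document.xml.
import zipfile
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

with zipfile.ZipFile("report.docx") as package:   # placeholder file name
    body = ET.fromstring(package.read("word/document.xml"))

# Each <w:t> element carries a run of text
print("".join(t.text or "" for t in body.iter(W + "t")))
```

Try doing the equivalent against the old .doc binary format and the attraction of XML becomes obvious.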


Will CDs become worthless?

I often see classified ads for CDs with a stated reason for sale like this one, plucked from eBay today (emphasis mine):

CD & INLAY EXCELLENT. CASE SCRATCHED BUT NO CRACKS Having a clearout as gone digital. All CDs genuine and in very good condition unless stated.

Now, one can only speculate about the meaning of “Having a clearout as gone digital”. Perhaps it means the person has purchased all the songs they want from iTunes or similar, or has a Rhapsody subscription, and is now selling off CDs they no longer use. Alternatively it might mean these CDs have been ripped to PC or Mac and are now equally redundant, though one can question the legality and/or ethics of ripping a CD, selling it, and continuing to enjoy the music. In one sense it matters little; the question that intrigues me is what effect this activity has on the secondhand market. If enough people follow suit there is going to be a huge excess of supply over demand.

I can think of several reasons why CDs might remain desirable in a music server household:

  • As proof of ownership of some kind of licence to rip
  • So you can admire the sleeve and read the booklet
  • To obtain that special mix or mastering that isn’t easily found online
  • Because you can’t bear to admit that your expensive CD player no longer has a useful function

Now ask yourself how much any of the above matter to the average person.

Collectors will still collect, of course. But my advice to anyone contemplating the sale of their CD collection is: do it soon.

 


Official performance patch for Outlook 2007

Computerworld has drawn my attention to a new performance patch for Outlook 2007, issued on Friday. Here’s what Microsoft says:

This update fixes a problem in which a calendar item that is marked as private is opened if it is found by using the Search Desktop feature. The update also fixes performance issues that occur when you work with items in a large .pst file or .ost file.

The patch is welcome; there’s no doubting that Outlook 2007 has proved horribly slow for many users. But does it fix the problems? If you read through the comments to earlier postings on this subject you’ll notice that there are actually several performance issues. The main ones I’m aware of:

  1. Slow receive from POP3 mail servers. Sometimes caused by conflicts between Vista’s TCP optimization and certain routers – see comment 27 here for a fix.
  2. Add-ins, for example Dell Media Direct, Acrobat PDFMaker, Microsoft’s Business Contact Manager. See Tools – Trust Center – Add-ins and click Go by the “Com Add-ins” dropdown to manage these.
  3. Desktop search indexing. You can disable this (it’s an add-in) but it is a shame to do so, since it is one of the best new features.
  4. Large local mailbox – could be a standalone .PST (Personal Store), or an .OST (Offline Store) that is kept in synch with Exchange.

The published fix appears to address only the problem with large local mailboxes.

Does it work? I’ve applied it, and it seems to help a bit, though I reckon performance remains worse than Outlook 2003. My hunch is that the issues are too deep-rooted for a quick fix, especially if you keep desktop search enabled. I’ll be interested to see whether the patch fixes another Outlook 2007 annoyance: if you close down Windows while Search is still indexing Outlook, you almost always get a message saying “The data file ‘Mailbox …’ was not closed properly. The file is being checked for problems.” Then, of course, you wait and wait.

Is it our fault for having large mailboxes? Here’s a comment from Microsoft’s Jessica Arnold, quoted in the Computerworld article referenced above:

“Outlook wasn’t designed to be a file dump, it was meant to be a communications tool,” she said. “There is that fine line, but we don’t necessarily want to optimize the software for people that store their e-mail in the same .PST file for ten years.”

A fair point; yet quick, indexed access to email archives is important to many of us. Archiving to a PST is hazardous, especially since by default Outlook archives to the local machine, not to the server; and in many organizations local documents are not backed up. Running a large mailbox may not be a good solution, but what is better?

Perhaps the answer is Gmail, if you are always online and can cope with the privacy issues. Note the first selling point which Google claims for its service:

Fast search
Use Google search to find the exact message you want, no matter when it was sent or received.

Apparently Google understands that users want to be able to find old messages. Surely a desktop application should be at least as good at finding them as an internet mailbox that might be thousands of miles away?

Update: I still get “The data file ‘Mailbox …’ was not closed properly.” Not fixed.

See also http://blogs.msdn.com/willkennedy/archive/2007/04/17/outlook-performance-update.aspx where a member of the Outlook team further describes the patch.

 

HTML5 vs XHTML2 vs DoNothing

Simon Willison points to David “liorean” Andersson’s article on HTML5 vs XHTML2. This debate about the evolution of HTML has gotten confusing. In a nutshell, the W3C wanted to fix HTML by making it proper grown-up XML, hence XHTML, which was meant to succeed HTML 4.0. Unfortunately XHTML never really caught on. One of its inherent problems is nicely put by Andersson:

Among the reasons for this is the draconian error handling of XML. XML parsing will stop at the first error in the document, and that means that any errors will render a page totally unreachable. A document with an XML well formedness error will only display details of the error, but no content. On pages where some of the content is out of the control of XML tools with well-designed handling of different character encodings—where users may comment or post, or where content may come from the outside in the form of trackbacks, ad services, or widgets, for example—there’s always a risk of a well-formedness error. Tag-soup parsing browsers will do their best to display a page, in spite of any errors, but when XML parsing any error, no matter how small, may render your page completely useless.
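The contrast is easy to see in code. Here is a rough sketch of my own, using Python’s standard library, that feeds the same slightly broken markup to a strict XML parser, which aborts at the first error, and to a forgiving tag-soup parser, which recovers the text anyway:

```python
# Strict XML parsing vs forgiving "tag soup" parsing of the same broken markup.
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

broken = "<p>First comment<p>Second comment & an unescaped ampersand</p>"

# Draconian handling: the first well-formedness error kills the whole parse.
try:
    ET.fromstring(broken)
except ET.ParseError as err:
    print("XML parser gave up:", err)

# Tag-soup handling: the parser keeps going and salvages what it can.
class TextCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

collector = TextCollector()
collector.feed(broken)
print("HTML parser recovered:", "".join(collector.chunks))
```

One stray ampersand from a comment or an ad widget and the XML version of the page is gone; the tag-soup version merely looks slightly odd.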

So nobody took much notice of XHTML; the W3C’s influence declined; and a rival anything-but-Microsoft group called WHATWG commenced work on its own evolution of HTML which it called HTML 5.

In the meantime the W3C eventually realised that XHTML was never going to catch on and announced that it would revive work on HTML. Actually it is still working on XHTML2 in parallel. I suppose the idea, to the extent it has been thought through, is that XHTML will be the correct format for the well-formed Web, and HTML for the ill-formed or tag-soup Web. The new W3C group has its charter here. In contrast to WHATWG, this group includes Microsoft; in fact, Chris Wilson from the IE team is co-chair with Dan Connolly. However, convergence with WHATWG is part of the charter:

The HTML Working Group will actively pursue convergence with WHATWG, encouraging open participation within the bounds of the W3C patent policy and available resources.

In theory then, WHATWG HTML 5 and W3C HTML 5 will be the same thing. Don’t hold your breath though, since according to the FAQ:

When will HTML 5 be finished? Around 15 years or more to reach a W3C recommendation (include estimated schedule).

I suppose the thing will move along and we will see bits of HTML 5 being implemented by the main browsers. But will it make much difference? Although HTML is a broken specification, it is proving sufficient to support AJAX and to host other interesting stuff like Flash/Apollo, WPF and WPF/E, and so on. Do we need HTML 5? It remains an open question. Maybe the existence of a working group where all the browser vendors are talking is reward in itself: it may help to fix the most pressing real-world problem, which is browser inconsistency.

 


Microsoft will move your server to the cloud

The excellent Mary Jo Foley has a key quote from Microsoft’s Steve Berkowitz, VP of online services, speaking at the Search Engine Strategies conference in New York yesterday:

“Basically, we’re moving the server from your office to cloud,” Berkowitz said.

This is the right strategy; but I have not heard it before from Microsoft. At one briefing a year or so ago I asked how Microsoft was positioning its Live products versus its Small Business Server (SBS) offerings, and got no kind of answer worth reporting. The problem is that those SBS customers are exactly the ones who will be moving first to cloud-based services, yet they also form an important and highly successful market for old-style Windows servers. Microsoft cannot create a new market without cannibalising its old one. Another factor is that when a business adopts SBS, it is hooked into Microsoft Office as well; SBS includes SharePoint and Exchange, both of which link directly to Office applications on the clients. Disrupting this cosy cash-cow is dangerous; yet it is being disrupted anyway, by the likes of Google and Salesforce.com, so in reality Microsoft has no choice.

The opportunity for Microsoft is to offer its LAN-based customers a smooth transition to on-demand services, maintaining the features that work best with Microsoft Office without losing the benefits of zero maintenance and anywhere access to data.

Has it got the vision and courage to pursue such a strategy? Is its Live technology even up to the job? Or will it continue to focus on servers for your LAN and watch its business slowly but surely erode?

 

Orange is undecided about Flash on mobile devices

I spoke to Steve Glagow, Director of the Orange Partner Programme, in advance of the Orange Partner Camp at Cape Canaveral next week. I asked him what trends he is seeing in development for mobile devices. He was guarded, saying that Orange is seeing growth in all three of the core platforms it supports: Symbian Series 60, Microsoft Windows Mobile, and Linux. He says that “Linux is dramatically increasing”, but of course it is doing so from a small base in this context; Symbian is the largest platform for Orange in absolute terms, and Java the most prominent language. Palm’s adoption of Windows Mobile has given Microsoft a boost, especially in the US. What about Flash, which is less widely deployed on mobile devices than on the desktop? Will Orange be pre-installing the Flash runtime? “The reason I won’t answer that is that we’ve been looking at Flash for some time now, and we’ve not made a formal decision,” he told me.

It’s an intriguing answer. Many of us think that Flash/Flex/Apollo (all of which use the Flash runtime) is set to grow substantially as a rich client platform, supported by XML web services or Flex Data Services on the server. Extending this to mobile devices makes sense, but only if the runtime is deployed. Adobe needs to break into this Java-dominated space. The Apple iPhone could also be an influence here: as far as I’m aware, it is not initially going to include either runtime, but I have the impression that Steve Jobs is warmer towards Flash than towards Java, which he called “this big heavyweight ball and chain.”

My prediction: Flash will get out there eventually. As fast data connections become more common, the Flash runtime will be increasingly desirable.

 

Making search better: smarter algorithms, or richer metadata?

Ephraim Schwartz’s article on search fatigue starts with a poke at Microsoft (I did the same a couple of months ago), but goes on to look at the more interesting question of how search results can be improved. Schwartz quotes a librarian called Jeffrey Beall who gives a typical librarian’s answer:

The root cause of search fatigue is a lack of rich metadata and a system that can exploit the metadata.

It’s true up to a point, but I’ll back algorithms over metadata any day. A problem with metadata is that it is never complete and never up-to-date. Another problem is that it has a subjective element: someone somewhere (perhaps the author, perhaps someone else) decided what metadata to apply to a particular piece of content. In consequence, if you rely on the metadata you end up missing important results.

In the early days of the internet, web directories were more important than they are today. Yahoo started out as a directory: sites were listed hierarchically and you drilled down to find what you wanted. Yahoo still has a directory; so does Google; another notable example is dmoz. Directories apply metadata to the web; in fact, they are metadata (data about data).

I used to use directories, until I discovered AltaVista, which, as Wikipedia says, was “the first searchable, full-text database of a large part of the World Wide Web.” AltaVista gave me many more results; many of them were irrelevant, but I could narrow the search by adding or excluding words. I found it quicker and more useful than trawling through directories. I would rather make my own decisions about what is relevant.

The world agreed with me, though it was Google and not AltaVista which reaped the reward. Google searches everything, more or less, but ranks the results using algorithms based on who knows what – incoming links, the past search habits of the user, and a zillion other factors. This has changed the world.
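To make the contrast with directory-style metadata concrete, here is a toy sketch of my own (nothing like what AltaVista or Google actually do, which is far more sophisticated) of full-text indexing plus a crude relevance score based only on how often the query words occur:

```python
# Toy full-text search: index every word, rank by raw term frequency.
# Purely illustrative; real engines add link analysis, user history and much more.
from collections import Counter, defaultdict

documents = {
    "plumbing": "fitting a copper pipe to the water pipe under the sink",
    "smoking":  "a briar pipe and some pipe tobacco on the table",
    "unix":     "connect two programs with a pipe so one pipe feeds the other",
}

# Inverted index: word -> {document id: number of occurrences}
index = defaultdict(Counter)
for doc_id, text in documents.items():
    for word in text.lower().split():
        index[word][doc_id] += 1

def search(query):
    scores = Counter()
    for word in query.lower().split():
        for doc_id, count in index[word].items():
            scores[doc_id] += count
    return scores.most_common()

print(search("pipe programs"))   # the unix document wins; add words to narrow further
```

Nobody had to classify those documents in advance; the ranking falls out of the text itself, which is exactly why this approach scaled and directories did not.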

Even so, we can’t shake off the idea that better metadata could further improve search, and therefore improve our whole web experience. Wouldn’t it be nice if we could distinguish homonyms like pipe (plumbing), pipe (smoking) and pipe (programming)? What about microformats, which identify rich data types like contact details? What about tagging – even this post is tagged? Or all the semantic web stuff which has suddenly excited Robert Scoble:

Basically Web pages will no longer be just pages, or posts. They’ll all be split up into little objects, stored in a database (a massive, scalable one at that) and then your words can be displayed in different ways. Imagine a really awesome search engine that could bring back much much more granular stuff than Google can today.

Maybe, but I’m a sceptic. I don’t believe we can ever be sufficiently organized, as a global community, to follow the rules that would make it work. Sure, there is and will be partial success. Metadata has its place, it will always be there. But in the end I don’t think the clock will turn back; I think plain old full-text search combined with smart ranking algorithms will always be more important, to the frustration of librarians everywhere.

 

Infinitely scalable web services

Amazon’s Jeff Barr links to several posts about building scalable web services on S3 (web storage) and EC2 (on-demand server instances).

I have not had time to look into the detail of these new initiatives, but the concept is compelling. This is where Amazon’s programmatic approach pays off in a big way. Let me summarise:

1. You have some web application or service. Anything you like. Football results; online store; share dealing; news service; video streaming; you name it.

2. Demand of course fluctuates. When your server gets busy, the application automatically fires up new server instances and performance does not suffer. When demand tails off, the application automatically shuts down server instances, saving you money and making those resources available to other EC2 users.

3. Storage is not an issue; S3 has unlimited expandability.

This approach makes huge sense. Smart programming replaces brute force hardware investment. I like it a lot.
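Here is a rough sketch of the scale-up, scale-down loop described in point 2 above. The helper functions are stubs: in a real system they would call the EC2 RunInstances and TerminateInstances operations and read whatever load metric you actually monitor.

```python
# Sketch of an elastic scaling loop. The stubs stand in for real EC2 API calls
# (RunInstances / TerminateInstances) and for a real load measurement.
import time

MAX_LOAD = 100   # requests/sec per server before adding capacity (arbitrary threshold)
MIN_LOAD = 20    # requests/sec per server below which capacity is shed

def launch_instance():                  # placeholder for EC2 RunInstances
    return "i-%d" % int(time.time())

def terminate_instance(instance_id):    # placeholder for EC2 TerminateInstances
    print("terminating", instance_id)

def requests_per_server(servers):       # placeholder for your real load metric
    return 57

servers = [launch_instance()]
while True:
    load = requests_per_server(servers)
    if load > MAX_LOAD:
        servers.append(launch_instance())       # busy: fire up another instance
    elif load < MIN_LOAD and len(servers) > 1:
        terminate_instance(servers.pop())       # quiet: shut one down, save money
    time.sleep(60)                              # re-check every minute
```

The point is not the threshold numbers, which are invented, but that capacity becomes a programming decision rather than a purchasing one.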

 


120 days with Vista

Is there any more to say about Vista? Probably not; yet after reading 30 days with Vista I can’t resist a few comments.

The author, Brian Boyko, says:

On two separate computers I had major stability problems which resulted in loss of data. This is an unforgivable sin …. Additionally, Vista claims backwards compatibility, but I’ve had major and minor problems alike with many of my games, more than a few third-party applications, my peripherals, and, in short, I encountered problems that actively prevented me from getting my work done. Based on my personal experiences with Vista over a 30 day period, I found it to be a dangerously unstable operating system, which has caused me to lose data.

As for me, I installed Vista RTM on four computers shortly after it was released to manufacturing in November last year. Two plain desktops, one media center, one laptop. Just for the record, my experience is dull by comparison with Boyko’s. No lost data; all my important apps run fine; I am not plagued by UAC prompts; the OS is stable.

Have there been hassles? Yes. Tortoise SVN crashes Explorer from time to time; a perfectly good Umax scanner has no driver; Vista on the laptop had severe resume problems which only recently seem to have been fixed by a BIOS update. And Creative’s X-Fi drivers for Vista are terrible. There are also annoyances, like Vista’s habit of thinking your documents are music.

At the same time, I’ve seen nothing to change my opinion that the majority of Vista’s problems are driver-related. Overall I like it better than XP; it doesn’t get in the way of my work and I would hate to go back.

When I do use XP, some of the things I miss are the search box in the Start menu (the Vista Start menu is miles better in other ways as well); the thumbnail previews in the task bar and in alt-tab switching; and copy and paste which doesn’t give up at the first hurdle. I also miss Vista’s more Unix-like Home directories, sensibly organized under Users rather than buried in Documents and Settings.

Security-wise, I consider both User Account Control and IE’s protected mode to be important improvements.

Forget the “Wow”. This is just the latest version of Windows; and it’s not as good as it should be, five years on from XP.

Nevertheless, it is a real improvement, and I’ve been happy with it over the last four months.

 


MP3 device runs .NET – but in Mono guise

I’ve long been interested in Mono, the open-source implementation of Microsoft .NET. It seems to be maturing; the latest sign is the appearance of an MP3 player using Linux and Mono. Engadget has an extensive review. Miguel de Icaza says on his blog:

The Sansa Connect is running Linux as its operating system, and the whole application stack is built on Mono, running on an ARM processor.

I had not previously considered Mono for embedded systems; yet here it is, and why not?

The device is interesting too. As Engadget says:

… you can get literally any music in Yahoo’s catalog whenever you have a data connection handy

This has to be the future of portable music. It’s nonsense loading up a device with thousands of songs when you can have near-instant access to whatever you like. That said, wi-fi hotspots are not yet sufficiently widespread or cheap for this to work for me; but this model is the one that makes sense, long-term.

I wonder if iPhone/iTunes will end up doing something like this?
