Category Archives: search


Who’s got the best search engine?

Please try the test here and vote because this is fascinating. It’s simple: perform a search and pick which is the best result, as in, which result best corresponds with what you are looking for. The script gives you the top result from Google, Yahoo and Microsoft (not in that order), but – crucially – does not show which is which. Currently, after 1400 votes, 34% have voted for the first, 53% for the second, and 29% for the third.

Of course this is an inexact science. Two different people could perform the same search and prefer different results. Further, it is not quite fair, in that the search engines could have personalization algorithms that will not operate when you go via a third-party script. I also hope nobody is cheating here, since unfortunately the test is insecure, in that you can work out which search engine is which and vote accordingly.
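For what it’s worth, the mechanics of such a blind test are simple to script. Here is a minimal Python sketch of the shuffle-and-hide idea; the engine names and URLs are placeholders, not the real plumbing behind the test:

```python
import random

# Hypothetical top results for one query, keyed by engine.
# Engine names and URLs are made up for illustration.
results = {
    "google": "https://example.com/a",
    "yahoo": "https://example.com/b",
    "live": "https://example.com/c",
}

def blind_ballot(results, rng=random):
    """Return the results in shuffled order with the engine names hidden,
    plus the secret key needed to decode a vote afterwards."""
    engines = list(results)
    rng.shuffle(engines)
    anonymous = [results[e] for e in engines]    # what the voter sees
    key = {i: e for i, e in enumerate(engines)}  # kept secret until tallying
    return anonymous, key

anonymous, key = blind_ballot(results)
vote = 0  # the voter picks the first of the shuffled results
print(f"Vote went to {key[vote]}")
```

The weakness mentioned above is visible here too: anyone who can inspect the key (or fetch the engines themselves and compare) can vote tactically.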

It is still interesting because it removes branding from the search results. This counts against Google, which has the best brand for search. After all, the brand has become a verb, “to Google”. Some people probably think Google invented web search.

Although number two is significantly ahead, the figures are already closer than actual market share would suggest. That implies that factors other than pure results are of critical importance in the search wars – though I suppose you could argue that if one search engine gives you the best result 53% of the time, you will end up using it 100% of the time.

Has anyone done a more secure test, maybe showing the first page of results rather than just the top hit?


Tafiti: search as a rich internet application

Tafiti is fascinating. Imagine what Google search would look like, if re-designed as a Flash application. This is it, except that it’s Live Search, not Google, and Silverlight, not Flash.

Let’s start with the good stuff. I ran this on a machine without Silverlight installed, and the installation of the plug-in was fast and smooth (though it restarts IE without remembering the open tabs, which is mildly annoying). The Tafiti app looks good and scales nicely. I searched for Silverlight, and results came back fast. You can easily filter the results, or drag an item onto a “shelf” for future reference, and the shelf persists between sessions.

The best feature is a carousel at bottom left. This modifies the search by different types: books, news, feeds, web or images. The layout of the search results changes to match the type of search, so you get book covers and a print-like font for the book search, big headlines for a news search, and so on.

What’s bad about Tafiti? The biggest irritation comes when you actually want to navigate to a site you’ve found. The generic problem here is that you typically want to keep the list of results as well. I normally solve this by right-clicking and opening the site in a new tab. But this is an application, not HTML, so when you right-click you get a single menu option, “Silverlight configuration.” If you left-click it is even worse:

Tafiti is trying to show the site you chose. Please disable your popup blocker to see your selection.

It wants to open the site in a new window, see, and that triggers the popup blocker. Easily fixed with “Always allow popups from this site”, but still a jarring experience.

These are actually minor quibbles. The more fundamental issue is, do you want search as an RIA? The problem is that search is a basic utility. What I want is quick results and easy navigation, never mind the frills, so I will take some persuading. Still, it could work if the application adds real value. Maybe a way of displaying more results on a page, without clutter, or categorising the results in some sensible way. It’s difficult, because attempts to be helpful often end up being counter-productive – and Microsoft is a specialist in over-helpful UIs, sadly.

Despite these reservations, I think Tafiti is a great Silverlight demo, because the technology is nearly invisible. On my system at least, it just works, and at this stage that is what counts for most.

PS: I am not sure what Tafiti is meant to mean, but according to Wikipedia it is a dialect of a Polynesian language and means the strangers, or people from a distance. Perhaps Microsoft is talking about its search market share vs Google?

Update: in the comments here and in the official FAQ it is said that Tafiti means “do research” in Swahili, and that the app is specifically aimed at “research projects that span multiple search queries and sessions”.


How to buy market share in search … or not

Microsoft gained remarkable market share in search last month, up from 8.4% to 13.2%. At last, competition for Google and Yahoo. Or is it? It turns out that most (not quite all) of the search gain was thanks to the Live Search Club, an online word game which links to Live Search. Remove its 3 million hits, and the gain is just 0.3%.

It gets worse. The Live Search Club lets you win points by completing games, and then exchange your points for prizes such as a Zune or Windows Vista. Very nice. But some dastardly individuals devised bots that complete the games for you. Result: product to sell on eBay. A low trick.

Personally I’m not chuffed with Live Search Club. I completed a game of Chicktionary without using a bot, won 20 points, but when I tried to register the site had gone offline. Drat. Still, perhaps Microsoft is coming up with some anti-bot measures.

It strikes me that Microsoft is being a little naive here. On the other hand, here I am writing about Live Search. So as a PR effort, I guess it’s working.

Google bans essay ads

The BBC reports that Google will ban essay adverts. I knew this was a problem but hadn’t appreciated how severe it is.

Banning the ads strikes me as sensible, but won’t students simply perform a search instead? Google could also block the searches, but that’s censorship and has difficulties of its own.

The internet has made both paid and unpaid plagiarism too easy; but there has always been a fine line between plagiarism and research (a song by Tom Lehrer comes to mind). Perhaps it is time to change the way students are assessed.



Official performance patch for Outlook 2007

Computerworld has drawn my attention to a new performance patch for Outlook 2007, issued on Friday. Here’s what Microsoft says:

This update fixes a problem in which a calendar item that is marked as private is opened if it is found by using the Search Desktop feature. The update also fixes performance issues that occur when you work with items in a large .pst file or .ost file.

The patch is welcome; there’s no doubting that Outlook 2007 has proved horribly slow for many users. But does it fix the problems? If you read through the comments to earlier postings on this subject you’ll notice that there are actually several performance issues. The main ones I’m aware of:

  1. Slow receive from POP3 mail servers. Sometimes caused by conflicts between Vista’s TCP optimization and certain routers – see comment 27 here for a fix.
  2. Add-ins, for example Dell Media Direct, Acrobat PDFMaker, Microsoft’s Business Contact Manager. See Tools – Trust Center – Add-ins and click Go by the “Com Add-ins” dropdown to manage these.
  3. Desktop search indexing. You can disable this (it’s an add-in) but it is a shame to do so, since it is one of the best new features.
  4. Large local mailbox – could be a standalone .PST (Personal Store), or an .OST (Offline Store) that is kept in synch with Exchange.

The published fix appears to address only the problem with large local mailboxes.

Does it work? I’ve applied it, and it seems to help a bit, though I reckon performance remains worse than Outlook 2003. My hunch is that the issues are too deep-rooted for a quick fix, especially if you keep desktop search enabled. I’ll be interested to see whether the patch fixes another Outlook 2007 annoyance: if you close down Windows while Search is still indexing Outlook, you almost always get a message saying “The data file ‘Mailbox …’ was not closed properly. The file is being checked for problems.” Then, of course, you wait and wait.

Is it our fault for having large mailboxes? Here’s a comment from Microsoft’s Jessica Arnold, quoted in the Computerworld article referenced above:

“Outlook wasn’t designed to be a file dump, it was meant to be a communications tool,” she said. “There is that fine line, but we don’t necessarily want to optimize the software for people that store their e-mail in the same .PST file for ten years.”

A fair point; yet quick, indexed access to email archives is important to many of us. Archiving to a PST is hazardous, especially since by default Outlook archives to the local machine, not to the server; and in many organizations local documents are not backed up. Running a large mailbox may not be a good solution, but what is better?

Perhaps the answer is Gmail, if you are always online and can cope with the privacy issues. Note the first selling point which Google claims for its service:

Fast search
Use Google search to find the exact message you want, no matter when it was sent or received.

Apparently Google understands that users want to be able to find old messages. Surely a desktop application should be at least as good at finding these as an internet mailbox that might be thousands of miles away?

Update: I still get “The data file ‘Mailbox …’ was not closed properly.” Not fixed.

See also http://blogs.msdn.com/willkennedy/archive/2007/04/17/outlook-performance-update.aspx where a member of the Outlook team further describes the patch.


Making search better: smarter algorithms, or richer metadata?

Ephraim Schwartz’s article on search fatigue starts with a poke at Microsoft (I did the same a couple of months ago), but goes on to look at the more interesting question of how search results can be improved. Schwartz quotes a librarian called Jeffrey Beall who gives a typical librarian’s answer:

The root cause of search fatigue is a lack of rich metadata and a system that can exploit the metadata.

It’s true up to a point, but I’ll back algorithms over metadata any day. A problem with metadata is that it is never complete and never up-to-date. Another problem is that it has a subjective element: someone somewhere (perhaps the author, perhaps someone else) decided what metadata to apply to a particular piece of content. In consequence, if you rely on the metadata you end up missing important results.

In the early days of the internet, web directories were more important than they are today. Yahoo started out as a directory: sites were listed hierarchically and you drilled down to find what you wanted. Yahoo still has a directory; so does Google; another notable example is dmoz. Directories apply metadata to the web; in fact, they are metadata (data about data).

I used to use directories, until I discovered AltaVista, which as Wikipedia says was “the first searchable, full-text database of a large part of the World Wide Web.” AltaVista gave me many more results; many of them were irrelevant, but I could narrow the search by adding or excluding words. I found it quicker and more useful than trawling through directories. I would rather make my own decisions about what is relevant.
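The core of what AltaVista offered can be sketched in a few lines: index every word, then intersect and subtract posting lists as the user adds or excludes terms. A toy Python version, with invented documents:

```python
# Toy full-text index in the AltaVista style: every word is indexed, and
# the query is narrowed by requiring or excluding words. Documents invented.
docs = {
    1: "copper pipe fittings for plumbing",
    2: "briar pipe tobacco reviews",
    3: "unix pipe and filter programming",
}

# Build the inverted index: word -> set of document ids containing it.
index = {}
for doc_id, text in docs.items():
    for word in text.split():
        index.setdefault(word, set()).add(doc_id)

def search(include, exclude=()):
    """Docs containing every 'include' word and no 'exclude' word."""
    hits = set(docs)
    for word in include:
        hits &= index.get(word, set())
    for word in exclude:
        hits -= index.get(word, set())
    return sorted(hits)

print(search(["pipe"]))                       # all three senses of "pipe"
print(search(["pipe"], exclude=["tobacco"]))  # narrowed, AltaVista-style
```

No metadata anywhere: the narrowing comes entirely from the searcher’s own choice of words, which is exactly the appeal.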

The world agreed with me, though it was Google and not AltaVista which reaped the reward. Google searches everything, more or less, but ranks the results using algorithms based on who knows what – incoming links, the past search habits of the user, and a zillion other factors. This has changed the world.
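The best-known of those factors, ranking by incoming links, can be illustrated with a minimal power-iteration sketch in the spirit of PageRank. The link graph here is invented, and real ranking blends in many more signals:

```python
# Minimal link-based ranking sketch (PageRank-style power iteration).
# The graph is invented; real engines combine many more signals.
links = {  # page -> pages it links to
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}

def pagerank(links, damping=0.85, iterations=50):
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Start each page with the "random surfer" baseline...
        new = {p: (1 - damping) / n for p in pages}
        # ...then let every page share its rank across its outgoing links.
        for page, outgoing in links.items():
            share = rank[page] / len(outgoing)
            for target in outgoing:
                new[target] += damping * share
        rank = new
    return rank

rank = pagerank(links)
best = max(rank, key=rank.get)
print(best)  # "c", the page with the most incoming links
```

The point of the sketch is that nobody had to annotate anything: the ranking falls out of the structure of the web itself.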

Even so, we can’t shake off the idea that better metadata could further improve search, and therefore improve our whole web experience. Wouldn’t it be nice if we could distinguish homonyms like pipe (plumbing), pipe (smoking) and pipe (programming)? What about microformats, which identify rich data types like contact details? What about tagging – even this post is tagged? Or all the semantic web stuff which has suddenly excited Robert Scoble:

Basically Web pages will no longer be just pages, or posts. They’ll all be split up into little objects, stored in a database (a massive, scalable one at that) and then your words can be displayed in different ways. Imagine a really awesome search engine that could bring back much much more granular stuff than Google can today.

Maybe, but I’m a sceptic. I don’t believe we can ever be sufficiently organized, as a global community, to follow the rules that would make it work. Sure, there is and will be partial success. Metadata has its place, it will always be there. But in the end I don’t think the clock will turn back; I think plain old full-text search combined with smart ranking algorithms will always be more important, to the frustration of librarians everywhere.


Microsoft attempts to buy search share

Microsoft is giving enterprises incentives to use Live Search instead of Google or Yahoo, according to a ComputerWorld report; John Battelle has more details.

Buying search share is nothing new; the Mozilla Foundation apparently gets a ton of money from Google for making it the default in Firefox. This is just another skirmish in the search/toolbar/gadget wars; the stakes are high, because search is the user interface of the web.

I doubt the strategy will have much impact, unless Microsoft fixes what really matters: the quality of its search engine.

It’s hard to overstate the importance of search today. I was reminded of this during a recent presentation on software usability, in which speaker Larry Constantine used a feature in Word as an example: how to insert a caption for an image.

Problems like this are easier than they were in the pre-Google era, for the simple reason that users are now able to search for the answer. Try it: Google for “word insert caption” (without the quotes) and up come dozens of postings on the subject. Quicker and better than online help.

Since the ability to search efficiently is now a key productivity factor, it follows that businesses should think twice before allowing themselves to be bribed into enforcing search preferences. Better to evaluate the search engines, and maybe give some training in how to use them.


Google can’t count

CodeGear’s Anders Ohlsson is excited because Google shows over half a million hits for “Delphi for PHP”. Even with the quotes.

I get the same results. More, in fact. Google says 654,000 hits.

Now try reading them. I get to page 35, then the hits come to a halt. There are 10 hits per page so that makes, hmmm, 350 hits. A bit less exciting. Let’s be honest, a lot less exciting. The real figure is probably a little higher, but not by half a million.

I do get this line (we’ve all seen it before):

In order to show you the most relevant results, we have omitted some entries very similar to the 341 already displayed. If you like, you can repeat the search with the omitted results included.

Trying the “complete” search does get more results, but they are just as repetitive as Google warns. Google appears to limit results to 1000 hits, so there is no obvious way to find out where the other alleged 653,000 hits can be found.

Microsoft’s Live Search says 24,473 results, but the trail runs out on page 80. That’s 800. So Microsoft Live Search can’t count either.

Yahoo says 322,000, but like Google can only show 1000 of them. I remain sceptical about the missing 321,000.

I’ve noticed this before. Certain phrases trigger huge numbers of alleged hits, but they vanish if you try to view them. Others seem to work fine. Perhaps someone more knowledgeable about the inner workings of search engines can explain why. Either way, reported hit counts appear to be an unreliable measure.
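My guess (and it is only a guess, not anything the engines document) is that the headline figure is a statistical estimate derived from how often the individual words occur, while the exact phrase is far rarer, and the engine only ever materialises the top thousand or so matches. A toy Python simulation of that idea, on a randomly generated corpus:

```python
import random

# Toy simulation of one possible explanation (my assumption, not anything
# the engines document): a count based on documents containing all the
# words will dwarf the number of exact phrase matches you can actually
# page through. The corpus is randomly generated.
rng = random.Random(0)
words = ["delphi", "for", "php", "code", "tips"]
docs = [" ".join(rng.choice(words) for _ in range(20)) for _ in range(10_000)]

phrase = "delphi for php"
terms = phrase.split()

# Documents containing all three words somewhere (what a loose count sees):
and_count = sum(all(t in d.split() for t in terms) for d in docs)
# Documents containing the exact phrase (what you can actually view):
phrase_count = sum(phrase in d for d in docs)

print(f"all words: {and_count}, exact phrase: {phrase_count}")
```

On this corpus the loose count is several times the phrase count, which at least shows how a headline figure and a viewable result set can diverge so wildly.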


Why Microsoft’s search share is declining

Internet Explorer is the dominant browser, Windows the dominant desktop, yet Microsoft’s share of internet search is apparently declining. Here’s why. I’m researching Yahoo Pipes; I forget the exact url for the Pipes home page so I type the search into the IE7 search box, where Microsoft’s “Live search” is the default.

The page I want is not on the first page of results. The ads are irrelevant. Some of the search results are at least relevant, but they are not what I would call top tier results. Even the O’Reilly link is a page for all articles tagged Yahoo, not one of the actual Pipes articles.

So I switch to Google search. The page I want is top of the search results. The other entries are more relevant. Even the ad is moderately relevant (at least it is about software Pipes not metal tubes).

This is of course anecdotal. It was also a tough test, considering Yahoo Pipes is new. Perhaps there are hundreds of other searches where Live Search gets better results. All I can say is that I rarely discover them, whereas I frequently find Google’s results much better. This just struck me as a good example.

Microsoft will never improve its share of search unless it can deliver at least equally good results.

See also my IT Week comment.


Google Maps puzzler

Ran into this puzzler today:

Google Maps showing Mansfield on the map, but unable to find it for directions.

I could not persuade Google Maps that it knew where Mansfield is. I mean Mansfield in Nottinghamshire; but even when I typed in “Mansfield, Nottinghamshire, UK”, Google Maps professed ignorance – although I could easily scroll to the actual location on the map itself by the strategy of searching for another place first. No doubt there are other Mansfields in the world – but why not a disambiguation menu?
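The disambiguation menu I have in mind would be trivial to provide given a gazetteer. A sketch in Python, with made-up entries and only approximate coordinates:

```python
# Sketch of the disambiguation menu I'd expect from a mapping service:
# when a place name is ambiguous, offer the candidates instead of giving
# up. Gazetteer entries and coordinates are illustrative, not authoritative.
gazetteer = {
    "Mansfield": [
        ("Mansfield, Nottinghamshire, UK", 53.15, -1.20),
        ("Mansfield, Ohio, USA", 40.76, -82.52),
        ("Mansfield, Victoria, Australia", -37.05, 146.09),
    ],
}

def locate(name):
    """Return the single match, or the list of candidates to disambiguate."""
    candidates = gazetteer.get(name, [])
    if len(candidates) == 1:
        return candidates[0]
    return candidates  # caller shows these to the user as a menu

for label, lat, lon in locate("Mansfield"):
    print(label)
```

Nothing clever needed: if the lookup is ambiguous, ask the user, rather than professing ignorance.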

Eventually I tried “Mansfield, UK” and it worked. Except that the position is miles out.

I guess Google Maps just doesn’t like Mansfield. Microsoft’s local.live.com had no problems.

I’m sure a postcode would have worked; but these are not always conveniently at hand.
