Category Archives: search

Yahoo’s mindshare problem

Last weekend I attended Yahoo’s Open Hack Day in London.

It was excellent. I wasn’t hacking myself, but I enjoyed the tech talks. I also had an opportunity to interview execs including co-founder David Filo; Cody Simms, who does Product Management for Yahoo Open Strategy; and Sophie Major, the head of the International Developer Network.

Highlights for me were Rasmus Lerdorf talking about smart PHP tricks, and a session on the amazing Yahoo Query Language which really does make the Internet look like one giant database which you can query.

I wrote up some of my interview for The Register, concluding:

Open Hack Day certainly showcased some impressive technology. The question is whether Yahoo! still has the marketing muscle to reverse its declining influence and truly to unsettle the likes of Google and Facebook and disrupt the market.

Events this week proved this exact point. During Open Hack Day there were talks on Microformats, RDFa and Yahoo Search Monkey. Search Monkey reads data on your site that includes semantic mark-up in order to present more meaningful search results.
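To give a flavour of what "reading semantic mark-up" involves, here is a minimal sketch in Python that pulls a few hReview microformat properties out of a page using only the standard library. It is not how Search Monkey (or any search engine) is actually implemented; the sample HTML, the choice of properties, and the names `MicroformatExtractor` and `extract_review` are all made up for illustration.

```python
from html.parser import HTMLParser

# A handful of property names from the hReview microformat.
# Which ones a real crawler cares about is an assumption here.
WANTED = {"fn", "rating", "reviewer", "summary"}


class MicroformatExtractor(HTMLParser):
    """Collects the text inside elements whose class names are in WANTED."""

    def __init__(self):
        super().__init__()
        self.stack = []   # one (tag, matched properties) entry per open element
        self.found = {}   # property name -> collected text

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        self.stack.append((tag, [c for c in classes if c in WANTED]))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; this tolerates unclosed elements.
        while self.stack:
            open_tag, _ = self.stack.pop()
            if open_tag == tag:
                break

    def handle_data(self, data):
        # Text belongs to every enclosing element that carries a wanted class.
        for _, props in self.stack:
            for prop in props:
                self.found[prop] = self.found.get(prop, "") + data


def extract_review(html):
    """Return a dict of property name -> whitespace-normalised text."""
    parser = MicroformatExtractor()
    parser.feed(html)
    parser.close()
    return {name: " ".join(text.split()) for name, text in parser.found.items()}


# Hypothetical page fragment marked up as an hReview.
SAMPLE = """
<div class="hreview">
  <span class="item"><span class="fn">Blue Cafe</span></span>
  <span class="summary">Good coffee, slow service</span>
  Rated <span class="rating">4</span> by <span class="reviewer">Alice</span>
</div>
"""

if __name__ == "__main__":
    print(extract_review(SAMPLE))
```

The point is that the data is already in the page as ordinary class names; a search engine only has to agree on the vocabulary to turn it into a richer result.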

On Tuesday Google announced Rich Snippets:

To display Rich Snippets, Google looks for markup formats (microformats and RDFa) that you can easily add to your own web pages.

So were the headlines “Google catches up with Yahoo”? Not at all; most of the world apparently thought Google had invented something new and amazing. Timothy O’Brien reported on it for O’Reilly and apparently was not aware of Yahoo’s earlier initiative. He added a postscript:

We’ve had some response about failing to mention Yahoo’s SearchMonkey which also supports RDFa and Microformats. Google is certainly not the first search engine to support RDFa and Microformats, but it certainly has the most influence on the search market. With 72% of the search market, Google has the influence to make people pay attention to RDFa and Microformats.

Correct; though I also suspect Yahoo could do a better job of marketing its technology. Talk of disrupting Google seems fanciful at this point. Having said that, Twitter is doing it just a little bit: somehow it is easier for a tiny organization with a bright idea than for a giant from the past.

In the meantime, take a look at YQL. It’s brilliant.

Bytemark failure illustrates value of Twitter

This site is hosted at Bytemark, which has a good track record for performance and service. On Sunday afternoon Bytemark and all its virtual servers became inaccessible. Seeking reassurance that this was a temporary problem and being worked on, I tried to get more information. This is a relatively small ISP and there is no 24-hr telephone support; there is an urgent email support address, but since this would be sent via Bytemark’s servers, which were down, I knew there was no point in using it.

I turned to Twitter search, where I found others tweeting about the problem, including MD Matthew Bloch:

is busy working out WTF is wrong with Bytemark’s core network, update on the forum as soon as it’s accessible again

re: Bytemark, both our Manchester core routers seem down, engineer is 20 mins away from data centre to help us with diagnosis.

tracking down enormous source of traffic on Bytemark network

wondering why the network is back up – still poring over switch configurations but things looking a little more useful.

and so on, with the directors demonstrating a degree of personal involvement that larger ISPs rarely display. The outage is still annoying of course; but knowing that it is being worked on with urgency along with a bit of information about the nature of the issue makes a huge difference.

When you need to search for the latest information, Twitter works well because it is rigorously sorted by time and date, which Google never is.


Experts Exchange: a great way to make money on the Web

For Experts Exchange that is, not for you. Experts Exchange is a question and answer site which most people who use Google have come across, because it often features high in the rankings when you search for troubleshooting information about some strange Windows error or the like. This can be frustrating, because the solutions are behind a paywall. The paywall is partial, since sometimes if you scroll down … and down … and down, you find the solution at the bottom of the page, after a ton of useless category listings. However, this isn’t always the case; either some solutions are protected, or the site detects frequent visits and turns off the solutions after a while. I think it is the latter. That is a pity, since there is good information in many of the solutions. You also have to pay if you want to ask questions beyond a very limited allowance each month.

The great thing from the point of view of the site owners is that they don’t pay a penny for the expertise they sell, other than for moderation and hosting. If you sign up as an Expert, you can post solutions, though you still can’t see all the other solutions until you acquire a certain number of points. Points are awarded for accepted solutions, and solutions are accepted if the questioner marks them so. If the questioner doesn’t bother (not uncommon) then eventually a moderator turns up and decides which answers merit points. If an expert gets lots of points, the reward is an Experts Exchange certification for the subject area in which the points were won.

I tried being an Expert recently and it is quite fun if you are interested in the kinds of technical problems people want to solve and/or get any satisfaction from helping them. It is also quite annoying. Questions vary from trivial to impossible; with the trivial ones, it is a race against time as numerous Experts try to post their solution first. Some are impossible because they are hopelessly vague (so common with support issues), have no clear answer – eg “should I pay for help with SEO” – or because what the questioner wants simply cannot be done.

It is also interesting to see what questions are being asked. There is a heavy bias towards Windows. I guess this is another reminder of Microsoft’s continuing dominance, though it also reflects the culture of the community that has formed around the site. Many of the programming questions seem to be from beginners, though often wrestling with real business applications, raising questions about the level of IT expertise out there.

It might be worth answering a few simple questions to get 10,000 points, or 3,000 per month thereafter, as this qualifies Experts to get free use of the site. What about spending hours trying to fix a tricky and intricate problem with Active Directory, without access to the system you are trying to troubleshoot? That doesn’t make sense for most of us, since if you can do that you can probably do it for real money elsewhere. These are questions that might not get answered at all, though sometimes they are, leaving valuable information for others in the process.

That said, it strikes me that the Experts here could get a better deal. Why not set up a cooperative where they share the subscription fees? The problem is how to acquire the necessary momentum and build up a strong repository of solutions that show up in Google and bring users to the site.

As for developers, I’d prefer StackOverflow, which is unequivocally free; the organizers presumably get by on advertising income.


Latest steps in the Google dance: brands, or not?

There’s a buzz in the SEO community about an update which Google has made to its search algorithms. Google’s Matt Cutts calls it a change rather than an update, if you can figure out the difference, albeit one important enough to have a name within the company: “Vince’s change”, after the employee who contributed it.

According to SEO guru Aaron Wall it is related to CEO Eric Schmidt’s comments last year that the Internet is a “cesspool” of false information. Big idea: promote trusted brands in the search results to ensure quality in the top hits.

As usual with Google, it’s hard to discern whether this is a big deal as Wall claims, or a minor evolution as Cutts presents it. Still, it is worth a few observations.

First, it seems obvious that Google’s original big idea, pagerank based on incoming links, is becoming less and less useful. It has been killed first by the SEO industry itself and its unceasing link farms and exchanges, and second by Google’s promotion of the “nofollow” attribute, which ironically means that many of the best incoming links are now supposedly ignored, while the SEO folk ensure that low-quality links which are not tagged nofollow abound.
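A toy example makes the nofollow irony concrete. The sketch below is a simplified, textbook-style PageRank computed by power iteration, not Google's actual ranking code. The page names, the damping factor of 0.85, and the decision to drop nofollow links entirely are all assumptions made for illustration.

```python
def pagerank(links, honour_nofollow=True, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links is a list of (source, target, nofollow) tuples. When
    honour_nofollow is True, links flagged nofollow are ignored.
    """
    pages = sorted({page for src, dst, _ in links for page in (src, dst)})
    outgoing = {page: [] for page in pages}
    for src, dst, nofollow in links:
        if nofollow and honour_nofollow:
            continue
        outgoing[src].append(dst)

    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            # A page with no followed links spreads its rank over every page.
            targets = outgoing[page] or pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical web: a respected blog links to a good site, but the link is
# tagged nofollow; two link farms link to a spam site without the tag.
LINKS = [
    ("blog", "good", True),
    ("farm1", "spam", False),
    ("farm2", "spam", False),
]

if __name__ == "__main__":
    for honour in (True, False):
        ranks = pagerank(LINKS, honour_nofollow=honour)
        print("honour nofollow:", honour)
        for page in sorted(ranks, key=ranks.get, reverse=True):
            print(f"  {page:6s} {ranks[page]:.3f}")
```

With nofollow honoured, the spam site outranks the good one because the only genuine editorial link is discarded. Counting that link raises the good site's score.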

That being the case, Google has to look for other ways to rank sites. According to Cutts, there are three things (in addition to pagerank) that it tries to identify: trust, authority, and reputation.

The brands idea is an easy solution. Prefer the well-known names; that way you may not get the best content or the best price; but at least users generally won’t be scammed.

The potential consequences of this kind of thinking are far-reaching. It is undermining one of the Web’s key attractions, which is low barriers to entry. If SEO becomes a matter of building a big brand, it is no different than the old world of big-budget marketing campaigns (and perhaps that should not come as a surprise).

The other twist on this is that users searching don’t necessarily want the big brands. Rather, they want the best information. Further, users who want to find a big brand on the Web do not need Google to do so. If Google goes too far in promoting familiar names above the best content, it leaves an opportunity for other search engines.

I think Google is smarter than that. Nevertheless, the problem which Schmidt refers to is real, and I reckon that barriers to entry on the Internet are rising and will continue to do so.

The power Google exerts to make or break Internet enterprises and to influence the flow of information is downright spooky, mitigated by the fact that it does an excellent job as far as I can tell (and there lies the rub).

Finally, one tip for Google. Scrap nofollow. It was a bad idea, for reasons which only now are becoming obvious. If I were building a search engine today, I would take little or no account of it.

PS great comment from @monkchips on Twitter just as I posted this entry:

for my purposes google search has actually become less useful over time. Now its kind of like a mall of corporations

Google says top two results get most of the hits – but what about ads?

A post on the Official Google Blog says that the first two search results get most of the clicks:

This pattern suggests that the order in which Google returned the results was successful; most users found what they were looking for among the first two results and they never needed to go further down the page.

I knew you had to be on the first page – but the “top two” result is even harder to achieve.

It is significant though that Google’s post makes no mention of ads. I am quite sure that the study included research into their effectiveness. Google has chosen not to reveal this aspect of the research.

In particular, most Google search results do not look like the examples. Rather, they have ads at the top which look just like the other results, except with a different background colour and a faint “Sponsored Links” at the right:

My question: in a result list like this, which “top two” gets the eyeballs and the clicks? The search results? Or the paid links?


Another useless Microsoft search experience

I do not make posts like this just to needle Microsoft. I’d like to see it compete better with Google, not because I have anything against Google, but because competition is good. So I’m explaining why I can’t use Microsoft’s search product as currently implemented, in the hope that it will help to improve it.

I am aware that the Windows 7 beta SDK is now available, and wanted to download it. I tried the search on Microsoft’s site:

Top hit is from November 2007 and is a forum discussion of dxtrans.h. Correct result nowhere to be seen.

So I tried Live Search:

I get ads for double-glazing and a top hit for the QuickTime 7.3 SDK from Apple. Correct result nowhere to be seen.

So I tried Google:

Top hit is relevant but wrong. Second hit is a blog post which has the URL I’m looking for, not bad. Fourth hit is the exact URL. By the way, the Windows 7 SDK beta is here.

It is no use Microsoft doing search bundling deals with Dell. If it cannot fix the product, users will run back to Google.

Incidentally, I believe US searchers get better results from Live Search than I do, because of faulty regionalisation. Live Search seems to have a heavier bias than Google towards what it thinks are (in my case) UK results. That doesn’t explain why it gives an Apple QuickTime site as its top hit for “Windows 7 SDK”.

SharePoint – the good, the bad and the ugly

I’ve been messing around with SharePoint. When it works, it is a beautiful product. It is a smart file system with versioning, check-in and check-out, point-and-click workflow (eg document approval), offline support via Outlook, direct open and save from Office 2007, and more. It is an instant intranet with blogs, wikis, discussion forums, surveys, presence information, easy page authoring, and more. It is an application platform with all the features of ASP.NET combined with those of SharePoint. It is a content management system capable of supporting a public web site as well as an intranet. It is a search server capable of crawling the network, with a good-looking and sophisticated web UI. And in the high-end Enterprise version you get a server-side Excel engine and all sorts of Business Intelligence features. Fantastic.

Even better, the base product – Windows SharePoint Services 3.0 – comes free with Windows server. Search Server Express is also free and delivers all the search capability a small organization is likely to need.

What’s wrong with this picture? Here are a few things:

  • Gets very expensive once you move to MOSS (Microsoft Office SharePoint Server) rather than the free WSS.
  • Deeply confusing. Working out the difference between WSS and MOSS is just the start. If you want to deploy it, you had better learn about site collections, applications, operations, farm topologies, web parts, workspaces, and the rest.
  • Complex to deploy. Make sure you read Planning and Architecture for Office SharePoint 2007 Part 1 (616pp); the good news is that part 2 is only 52pp. SharePoint is all that is bad about Microsoft deployments: a massive product with many dependencies, including IIS, ASP.NET and the .NET Framework, SQL Server in particular configurations, and of course hooks with Office 2007, Exchange and Active Directory.
  • Generates horrible source code. Try opening a page in SharePoint designer and viewing the source. Ugh.
  • Challenging to back up and restore, thanks to being spread across IIS and SQL Server.

I am out of sorts with SharePoint right now, after a difficult time with Search Server Express (SSX). I have a working WSS 3.0 installation, and I tried to install SSX on the same server. My setup is just slightly unusual, since I have both SharePoint and a default web site on port 80, using the host headers feature in IIS to direct traffic. The SSX install seemed to proceed reasonably well, except for two things.

First, I puzzled for some time over what account to use as the default account for services. Setup asks you to specify this; and the documentation is a classic case of unhelpful help:

In the Default Account For Services section, type the user name and password for the default services account.

In the Search Center Account section, type the user name and password for the account for the application pool identity of the default Search Center site

Well, thanks, but I could have figured out that I have to type a user name where it says “User name”. But I would like help on how to create or select a suitable account. What permissions does it need? What are the security implications? The temptation is to use an administrator account just because it will most likely work.

Then there was the problem of creating the search site application manually. I had a go at this, helped by these notes from Ian Morrish. I set up a crawl rule and successfully indexed some content. Then I made a search, to be greeted by this error:

Your license for Microsoft Search Server has expired.

Well hang on, this is Search Server Express and meant to be free! A quick Google turns up this depressing recommendation from Microsoft:

To solve your immediate problem, however, it is suggested you uninstall WSS, MSS Express, repave your machine with a clean OS, and reinstall only MSS Express (WSS is installed with it).

Thanks but no thanks. See this thread for a more informative analysis. The user yanniemx reckons, after 10 reinstalls, that he has worked it out:

I realized it was due to using the Express version of Search and then not using the SQL install that is included in the install.  From what I can tell if you use another SQL instance it thinks you are using multiple servers and that is not allowed for the Express version.

I think I’ll just uninstall. I did another install of the full MOSS on its own server, and that one works fine. Running on a virtual machine is another good idea.

I hate the way certain Microsoft server products like to be installed on their own dedicated server. That makes sense in an Enterprise, but what about small organizations? I don’t see any inherent reason why something like SSX shouldn’t install neatly and in a reasonably isolated manner alongside other products and web applications. Equally, I am sure it can be done, just as I used the host headers trick to get WSS installed alongside another web site on port 80; but working out how to do it can be a considerable effort.
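For anyone unfamiliar with the host headers trick, the idea is that several sites share one IP address and port, and the web server picks a site by looking at the Host header the browser sends. The Python sketch below only illustrates that lookup. It is not IIS configuration, and the hostnames, site labels and the `route` function are hypothetical.

```python
# Hypothetical bindings: (port, hostname) -> site that should answer.
BINDINGS = {
    (80, "sharepoint.example.local"): "WSS 3.0 site",
    (80, "www.example.local"): "Default Web Site",
}


def route(host_header, port=80, fallback="Default Web Site"):
    """Pick a site from the HTTP Host header.

    The header may carry a port ("name:80") and is case-insensitive,
    so it is normalised before the lookup. IPv6 literals are not handled.
    """
    hostname = host_header.split(":", 1)[0].strip().lower()
    return BINDINGS.get((port, hostname), fallback)


if __name__ == "__main__":
    for header in ("SharePoint.example.local",
                   "www.example.local:80",
                   "unknown.example.local"):
        print(header, "->", route(header))
```

The catch, and the reason installers get confused, is that every product sharing the port has to agree on this mapping. A setup program that assumes it owns port 80 outright will not know which binding to claim.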

Google search wiki: user reviews for web sites

Google has announced its search wiki.

Do I want to customize my search results? No; or at least, only by refining the search, not by forcing sites to the top or inserting my own urls.

Do I want to comment my search results, just for myself? No. I can’t see myself using this, particularly as I deliberately avoid being permanently logged into Google.

What about public comments and ratings? This is the big deal. I wonder how Google will handle this – will the comments apply to web sites? To web pages? Or only to web pages when shown as results for specific searches? In other words, if I get the same site showing up for a different search, will I see the same comments?

Think Amazon, and how the ratings and reviews influence buying decisions (they certainly influence mine). The impact if people see such feedback every time they search on Google could be remarkable. I would love to see the SEO (Search Engine Optimization) folk advising customers, “Look, you actually have to make your site worth visiting, in order to get good reviews on Google.” Though I guess some of them will just offer to write the reviews.

If this sticks, I will be interested to see how it will affect Google’s relationship with its advertisers. Let’s say you do a product search, and Google displays ads inviting you to buy the product, alongside reader comments saying it is garbage. This tension has always existed in independent press that carries advertisements, but it is new to search. On the other hand, as currently described the SearchWiki comments are not displayed by default, but only if you click a SearchWiki link.


Google Chrome usage one month on

Om Malik asks about Chrome usage, one month after its release.

On this site this month (only a few days in) Chrome has a 2.5% share, below Opera at 3.2%. Malik reports 5.59%; commenters on his post report figures ranging from as low as 0.36% up to something approaching Malik’s own, which seems to be about the maximum.
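Figures like these come from counting visits by browser, usually by inspecting the User-Agent header. As a rough sketch of the arithmetic (not the stats package used on this site), the Python below classifies user agent strings with simple substring checks and reports percentage shares. The sample strings are illustrative and the matching rules are assumptions; real log analysers are far more thorough.

```python
from collections import Counter


def classify(user_agent):
    """Very rough browser detection by substring."""
    ua = user_agent.lower()
    if "opera" in ua:
        return "Opera"
    # Check Chrome before Safari: Chrome's string also contains "Safari".
    if "chrome" in ua:
        return "Chrome"
    if "firefox" in ua:
        return "Firefox"
    if "msie" in ua:
        return "Internet Explorer"
    if "safari" in ua:
        return "Safari"
    return "Other"


def browser_share(user_agents):
    """Return browser -> percentage of visits, rounded to one decimal place."""
    counts = Counter(classify(ua) for ua in user_agents)
    total = sum(counts.values())
    return {name: round(100.0 * n / total, 1) for name, n in counts.items()}


# Illustrative user agent strings, one per visit.
VISITS = [
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 "
    "(KHTML, like Gecko) Chrome/0.2.149.27 Safari/525.13",
    "Opera/9.52 (Windows NT 5.1; U; en)",
    "Opera/9.52 (Windows NT 5.1; U; en)",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.0.3) "
    "Gecko/2008092417 Firefox/3.0.3",
]

if __name__ == "__main__":
    print(browser_share(VISITS))
```

The wide spread in reported figures is not surprising: the share depends heavily on who visits a given site, so a tech blog will see far more early adopters than a general news site.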

Small, but even say 2.5% is not that bad for a new, beta web browser. I use it myself some of the time; I like the speed and clean UI.

That said, Chrome usage has declined, after the initial surge of people trying it out. The share now is more meaningful; it will be fascinating to watch its progress. The challenge for Google is to get a buzz going; surely a web browser is a perfect candidate for Web 2.0 marketing.


Google’s shoddy EULA

I am sensitized to design issues right now so I’m calling out this shoddy piece of work by Google on new Toshiba laptops (and most likely some other new PCs, in the UK at least).

Yesterday I set up a new laptop for a friend – a scenario which does not seem to have occurred to the legal folk. It comes with the Google Desktop and Google Toolbar pre-installed. Someone has decided that the most important thing in the world is that you should therefore agree to the Google EULA, which almost fills the screen with an ugly dialog that nevertheless displays the actual text of the agreement in a relatively small scrolling box.

There are a few notable features:

1. The agreement comes up automatically on startup, until you accept or decline.

2. The window has no close, cancel or even minimize buttons. Just accept or decline.

3. The agreement has some advice for you:

It says that before getting “bebound” you “should print and/or save a local copy”. I would like to know how the designers of this screen intend you to do so. Your printer, if you have one, is probably not set up yet. I guess you should copy the text into another application (that’s what I did), which is fine provided you know about Ctrl-C, but made awkward because the EULA window is set to be always on top. The first image above shows what happens when you run Word after the EULA appears.

4. Still, you can drag the EULA to the right, select the text, copy and paste into Word. If you do this, as I did, you will find even stranger terms below the fold. Like this one:

2.3 In addition to the standard information that your web browser will typically send to most web pages you the Google Toolbar will send to Google a computer visit, generated unique identifier that is stored in your computer’s registry upon install.

I think I get it. Google will record every page you visit. I call this obscure language though.

5. I am not a lawyer, but some stuff confuses me. Clause 3 is headed “Additional terms” and says that use of the Toolbar is also subject to Google’s general terms of service on the web. Clause 9.1 says that “The Terms and Conditions constitute the entire agreement between you and Google”. “Terms and Conditions” is specifically defined in clause 1.2 as the current document. So did you agree to what is on the web, or not?

I realise I am possibly the only user ever to read this agreement. I still think it is disappointing: the horrible UI, the broken English, the obscure terms. I did not click Accept; my friend can do so if he wants. Ctrl-Alt-Del; Task Manager; terminate the two processes beginning EULA.
