Tim Anderson’s ITWriting

Tech writing blog

June 24th, 2009

Search for virus help highlights lack of authority in Google, Wikipedia

A contact suffered a trojan infection on his Windows XP machine the other day. He was alerted to the infection by Windows Defender, but the Remove or Quarantine actions offered by Defender did not work. If he removed the trojan, it reappeared on the next reboot. The installed AVG security suite sat there unconcerned.

I am not sure exactly what path he took, but he did some clicking of links and ended up at a site which offered software that promised to fix the issue. The software was called SpyHunter, from Enigma Software. He purchased and installed SpyHunter, which proved no more effective than Defender. At this point he asked me to look at his machine.

A person who has discovered a virus on their PC will be anxious about the attack and its unknown consequences, and will want to fix it urgently. That makes them vulnerable to ill-considered downloads and purchases; and searching the web for assistance with a virus can be like trying to cure alcoholism with drinking. That said, there is good advice to be had; but assessing the authority and reliability of the assistance offered is critical.

My advice in general is only to visit sites that you know to be trusted, such as official Microsoft support, major security software vendors, and only those community sites with which you are already familiar. It is difficult advice to follow though, particularly for non-technical users.

The best course of action after a confirmed infection is to flatten and rebuild the operating system. Larger organizations do this efficiently by restoring a pre-configured image to standardised hardware, but this too is difficult for individuals and SMEs who want to get on with their work.

I digress. My first question: was SpyHunter bona fide, or could it have made the problem worse? The only quick way to find out: back to the search engines, source of all good and all evil. The top entries for SpyHunter on both Google and Bing are the official company site and a Wikipedia entry. Bing has Wikipedia first, while Google puts the company site top.

Note the large role Google (or your favourite search engine) is playing here, both in leading users to possible solutions, and in assessing their value. Although the high placement of the company site is somewhat reassuring, in that Google would probably try not to give a high ranking to known malware, it would be a mistake to rely entirely on a detail like this. Google makes no guarantees concerning the content of the sites it indexes.

Naturally I was more interested in the Wikipedia entry. The entry is annotated with warnings that the article is near-orphaned (though the search engines find it readily enough) and that it reads like an advertisement. There is little detail and it is out-of-date. Further, the language seems strange:

In early 2004, SpyHunter was blamed for producing false positives and using aggressive advertising techniques. This resulted in a lot of bad SpyHunter reviews published. Some of them were harsh, but fair, while others were simply ridiculous. We confirm that SpyHunter was promoted aggressively by some affiliates, but all of them were eventually banned by program makers in late 2004. Early SpyHunter versions had some obvious drawbacks. The product’s version 2.0 resolved all these issues.

This is a quote from a supposedly independent review on a site called 2-software.com. I don’t like the site, which seems (as so many are) dominated by its affiliate links.

SpyHunter is probably harmless, though ineffective. I used the Sophos command-line tool to remove the trojan, and deleted some rogue registry entries; the machine seems OK now, though that might just mean that the other trojans are doing a better job of hiding. I also removed SpyHunter of course.

The state of security on the Internet remains lamentable, and security software is a partial solution at best. What interests me here though is the combination of two things:

1. The inadequacy of Wikipedia as an authoritative source, particularly in its less trafficked topics.

2. The high ranking accorded to seemingly any Wikipedia article by the leading search engines.

It is a dangerous combination – not only for virus victims, but for kids doing homework, or anyone researching anything.

June 10th, 2009

Bing’s disappearing search share gain in the US

Web stats site StatCounter caused some excitement last week when it announced that Bing had overtaken Yahoo in search market share, as tracked by its site analysis tools.

I took a look at the figures today, and they make depressing reading for Microsoft:

I’ve annotated the image to show Live Search share on 29 May, compared to Bing share now. They are nearly the same; within the normal daily variation. Yahoo is actually slightly ahead of where it was. Note that all Live Search hits automatically became Bing hits on the day of transition (1st June). As for Google, it is back a little above where it was before.

One odd thing about the StatCounter figures is that at the beginning of this period there was around 5% share for “other”, which has now almost disappeared. Gone to Google? Who knows; and I don’t particularly trust these figures.

There are two organizations with more reliable numbers: Google, because of the number of sites signed up for its analytics service, and Microsoft, which can count actual hits. Neither publishes these numbers.

Well, Ballmer said it was a long haul. I’m actually impressed with Bing; the results seem decent, there are some good UI features, and the re-branding is sensible. If StatCounter accurately reflects the market though, the immediate effect of the launch is vanishingly small.

Update: Things look a little better today – Bing is up to 8.52% (note that the figure changes dynamically during each day). A long haul; I’ll be tracking the figures with interest.

June 8th, 2009

Bing, Blind Search and electoral fraud

It’s election fever in the UK: in dramatic results, the incumbent party is being pummelled at the polls. So too for search engines? Microsoft employee Michael Kordahi set up a blind search test. Perform a search, select your favourite from three columns of results. It started well for Bing, but market leader Google soon asserted a lead:

Blind search engine test at http://blindsearch.fejus.com Right now: "Google: 45%, Bing: 33%, Yahoo: 21% | 8,518 votes"

said Mr Google Matt Cutts.

Still, that’s not bad for Bing, considering that its market share is tiny in comparison to Google. 5.5% vs 81.5% according to stats I dug up for this Register piece. The real loser is Yahoo, whose second place in search is now under threat from the Microsoft juggernaut.

But can you trust the results? At some point last night Yahoo started an unlikely surge:

Internet search blind test: Google: 34%, Bing: 26%, Yahoo: 40% Try it out! http://blindsearch.fejus.com/

tweeted Bill Hamilton a few hours after Cutts. Someone was gaming the system:

not surprisingly, #blindsearch has been compromised you can still play, but i’m not currently showing results

said Kordahi, as Yahoo hit 57%.

Will Kordahi be able to insulate his test from fraudsters? Who knows; but it is still an interesting experiment.

I tried the test and found the results generally close, with a small edge to Google in my searches. Still, it would be interesting to measure not only which results are best, but also the margin of difference. In the past I’ve found Live Search almost useless, so Bing has made a substantial improvement from my perspective. The UI changes are important too. I’m a minimalist at heart, which again favours Google, but I like some of Bing’s features, especially the site and video previews.

Google’s Wave is of course more interesting from a technical perspective; but it would be a mistake to downplay the business significance of Microsoft improving its search market share. Search drives advertising income.

It’s also worth noting that in search, quantity drives quality. Program Manager Nathan Buggia explained to me how Bing’s categorisation feature works:

For the categorised results those are driven more off the search behaviour we see on our web site, not actually the semantic information that we infer from their web site. What we’ve done is to take all the queries that come into live search and analysed them to see what user intent those queries have. We take a look at the other search terms that they use to figure out where they go, we aggregate that information and use that to define categories, and we are able to draw on that.

Currently Bing only displays category tabs for around the top 10-20% of searches. The reason it is limited to that, according to Buggia, is insufficient volume of data. Using the Xbox as an example, he told me:

If we have a high enough volume of XBox data and we’ve seen that there are a specific set of intents that people are looking for, then we feel confident enough to show the quick tabs.

In other words, Bing could improve its results simply by more people using it.
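
Buggia’s description can be sketched in a few lines of Python. This is an illustration only, not Microsoft’s implementation: the query log, the intent labels and both thresholds are invented for the example, and the hard part (inferring intent from real queries) is not attempted. The sketch counts which intents show up for each topic and offers tabs only where there is enough data to be confident.

```python
from collections import Counter, defaultdict

# Hypothetical query log: (topic, inferred intent of the search).
# Real intent inference is the difficult part and is not modelled here.
QUERY_LOG = [
    ("xbox", "games"), ("xbox", "games"), ("xbox", "accessories"),
    ("xbox", "games"), ("xbox", "cheats"), ("xbox", "accessories"),
    ("dxtrans.h", "download"),
]

MIN_QUERIES = 5         # assumed: minimum volume before any tabs are shown
MIN_INTENT_SHARE = 0.2  # assumed: share of queries an intent needs to earn a tab


def category_tabs(log, min_queries=MIN_QUERIES, min_share=MIN_INTENT_SHARE):
    """Return {topic: [intents]} for topics with enough query volume."""
    intents_by_topic = defaultdict(Counter)
    for topic, intent in log:
        intents_by_topic[topic][intent] += 1

    tabs = {}
    for topic, counts in intents_by_topic.items():
        total = sum(counts.values())
        if total < min_queries:
            continue  # too little data: fall back to a plain result list
        tabs[topic] = [
            intent for intent, n in counts.most_common()
            if n / total >= min_share
        ]
    return tabs


print(category_tabs(QUERY_LOG))
```

In this made-up log only the Xbox topic has enough volume to earn tabs; the obscure header-file search gets none until more people search for it, which is the point Buggia was making.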

What happens next? The easy prediction is that Bing will make at least small gains in market share, and that Yahoo will likely decline, perhaps to third place. For Microsoft, that would be no small achievement, but would do little to dislodge the big G. Further, if it sees significant traffic moving to Bing, Google will be quick to counter it with its own improvements. Personally I would like to see more competition in search, which for many users forms a portal that controls which sites they see and which they do not see, but a good launch for Bing is not enough to effect real change.

It could be the beginning of a change though, and that possibility makes Bing worth watching.

May 28th, 2009

A few good things about Bing – but where is the webmaster’s guide?

So Bing (Bing Is Not Google?) is Microsoft’s new search brand. A few good things about it:

1. Short memorable name, short memorable url

2. Judging by the official video at http://www.decisionengine.com/ Microsoft realises that it has to do something different than Google; doing the same thing almost as well or even just a little better is not enough.

3. Some of the ideas are interesting – morphing the results and the way they are displayed according to the type of search, for example. In the video we see a search for a digital camera that aggregates user reviews from all over the Internet (supposedly); whereas searching for a flight gets you a list of flight offers with fares highlighted.

This kind of thing should work well with microformats, about which Google and Yahoo have also been talking – see my recent post here. But does Bing use them? That’s unknown at the moment, because the Bing Reviewer’s Guide says little about how Bing derives its results. I don’t expect Microsoft to give away its commercial secrets, but it does have a responsibility to explain how web authors can optimise their sites for Bing – presuming that it has sufficient success to be interesting. Where is the webmaster’s guide?

Some things are troubling. The Bing press material I’ve seen so far is relentlessly commercial, tending to treat users as fodder for ecommerce. While I am sure this is how many businesses perceive them – why else do you want users to come to your site? – it is not a user-centric view. Most searches do not result in a purchase.

There’s a snippet in the reviewer’s guide about why Bing will deliver trustworthy results for medical searches:

Bing Health provides you with access to medical information from nine trusted medical resources, including the Mayo Clinic, the American Cancer Society and MedlinePlus.

No doubt these are trusted names in the USA. Still, reliance on a few trusted brands, while it is good for safety in a sensitive area such as health, is also a route to a dull and sanitized version of the Internet. I am sure there are far more than nine reliable sources of medical information on the Web; and if Bing takes off those others will want to know why they have been excluded.

Back to the introduction in the Reviewer’s Guide:

In a world of excessive choice and too much information, it’s often difficult to make the right decision. What you need is more than just a search engine; you need a decision engine that provides useful tools to help you get what you want fast, rather than simply presenting a list of Web links. Bing is such a decision engine. It provides an easy way to make more informed choices. It organizes popular results by category to help you get the answers you’re looking for without having to guess at the right way to formulate your query. And built right into Bing is a set of intelligent tools to help you accomplish important tasks such as buying a product, planning a trip or finding a local business.

Like many of us, I’ve been searching the web since its earliest days. I found portals and indexes like early Yahoo and dmoz unhelpful: always out of date, never sufficiently thorough. I used DEC’s AltaVista instead, because it seemed to search everywhere. Google came along and did the same thing, but better. Too much categorization and supposed intelligence can work against you, if it hides the result that you really want to see.

Live Search, I’ve come to realise (or theorise), frequently delivers terrible results for me because of faulty localization. It detects that I am in the UK and prioritises what it thinks are UK results, even though for most of my searches I only care about the quality of the information, not where the web sites are located. It’s another example of the search engine trying to be smart, and ending up with worse results than if it had not bothered.

Still, I’ll undoubtedly try Bing extensively as soon as I can; I do like some of its ideas and will judge it with an open mind.

May 14th, 2009

Yahoo’s mindshare problem

Last weekend I attended Yahoo’s Open Hack Day in London.

It was excellent. I wasn’t hacking myself; but enjoyed the tech talks. I also had an opportunity to interview execs including co-founder David Filo, Cody Simms who does Product Management for Yahoo Open Strategy, and Sophie Major the head of the International Developer Network.

Highlights for me were Rasmus Lerdorf talking about smart PHP tricks, and a session on the amazing Yahoo Query Language which really does make the Internet look like one giant database which you can query.

I wrote up some of my interview for The Register, concluding:

Open Hack Day certainly showcased some impressive technology. The question is whether Yahoo! still has the marketing muscle to reverse its declining influence and truly to unsettle the likes of Google and Facebook and disrupt the market.

Events this week proved this exact point. During Open Hack Day there were talks on Microformats, RDFa and Yahoo Search Monkey. Search Monkey reads data on your site that includes semantic mark-up in order to present more meaningful search results.
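
To give a rough idea of what “reading semantic mark-up” involves, here is a minimal Python sketch. It is not how Search Monkey or any search engine actually works; it simply pulls a few fields out of a page fragment marked up with hReview-style class names. The sample HTML, the product name and the choice of fields are all invented for the example.

```python
from html.parser import HTMLParser

# Sample page fragment using hReview-style class names (illustrative only).
SAMPLE = """
<div class="hreview">
  <span class="item"><span class="fn">Acme Camera X100</span></span>
  <span class="rating">4.5</span>
  <span class="reviewer">A. Reader</span>
</div>
"""

# Assumed subset of fields a crawler might care about.
WANTED = {"fn", "rating", "reviewer"}


class ReviewParser(HTMLParser):
    """Collect the text of elements whose class is one of WANTED."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current = None  # class name whose text we are waiting for

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in WANTED:
                self._current = name

    def handle_data(self, data):
        # Take the first non-blank text after a wanted start tag.
        if self._current and data.strip():
            self.fields[self._current] = data.strip()
            self._current = None


def extract_review(html):
    """Return a dict of the wanted fields found in the HTML."""
    parser = ReviewParser()
    parser.feed(html)
    return parser.fields


print(extract_review(SAMPLE))
```

The appeal for a search engine is that the author has already labelled the rating and the reviewer, so there is no need to guess them from the surrounding prose.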

On Tuesday Google announced Rich Snippets:

To display Rich Snippets, Google looks for markup formats (microformats and RDFa) that you can easily add to your own web pages.

So were the headlines “Google catches up with Yahoo”? Not at all; most of the world apparently thought Google had invented something new and amazing. Timothy O’Brien reported on it for O’Reilly and apparently was not aware of Yahoo’s earlier initiative. He added a postscript:

We’ve had some response about failing to mention Yahoo’s SearchMonkey which also supports RDFa and Microformats. Google is certainly not the first search engine to support RDFa and Microformats, but it certainly has the most influence on the search market. With 72% of the search market, Google has the influence to make people pay attention to RDFa and Microformats.

Correct; though I also suspect Yahoo could do a better job of marketing its technology. Talk of disrupting Google seems fanciful at this point. Having said that, Twitter is doing it just a little bit: somehow it is easier for a tiny organization with a bright idea than for a giant from the past.

In the meantime, take a look at YQL. It’s brilliant.

May 10th, 2009

Bytemark failure illustrates value of Twitter

This site is hosted at Bytemark, which has a good track record for performance and service. On Sunday afternoon Bytemark and all its virtual servers became inaccessible. Seeking reassurance that this was a temporary problem and being worked on, I tried to get more information. This is a relatively small ISP and there is no 24-hr telephone support; there is an urgent email support address, but since this would be sent via Bytemark’s servers, which were down, I knew there was no point in using it.

I turned to Twitter search, where I found others tweeting about the problem, including MD Matthew Bloch:

is busy working out WTF is wrong with Bytemark’s core network, update on the forum as soon as it’s accessible again

re: Bytemark, both our Manchester core routers seem down, engineer is 20 mins away from data centre to help us with diagnosis.

tracking down enormous source of traffic on Bytemark network

wondering why the network is back up – still poring over switch configurations but things looking a little more useful.

and so on, with the directors demonstrating a degree of personal involvement that larger ISPs rarely display. The outage is still annoying of course; but knowing that it is being worked on with urgency along with a bit of information about the nature of the issue makes a huge difference.

When you need to search for the latest information, Twitter works well because it is rigorously sorted by time and date, which Google never is.

March 27th, 2009

Experts Exchange: a great way to make money on the Web

For Experts Exchange that is, not for you. Experts Exchange is a question and answer site which most people who use Google have come across, because it often features high in the rankings when you search for troubleshooting information about some strange Windows error or the like. This can be frustrating, because the solutions are behind a paywall. The paywall is partial, since sometimes if you scroll down … and down … and down, you find the solution at the bottom of the page, after a ton of useless category listings. However, this isn’t always the case; either some solutions are protected, or the site detects frequent visits and turns off the solutions after a while. I think it is the latter. That is a pity, since there is good information in many of the solutions. You also have to pay if you want to ask questions beyond a very limited allowance each month.

The great thing from the point of view of the site owners is that they don’t pay a penny for the expertise they sell, other than for moderation and hosting. If you sign up as an Expert, you can post solutions, though you still can’t see all the other solutions until you acquire a certain number of points. Points are awarded for accepted solutions, and solutions are accepted if the questioner marks them so. If the questioner doesn’t bother (not uncommon) then eventually a moderator turns up and decides which answers merit points. If an expert gets lots of points, the reward is an Experts Exchange certification for the subject area in which the points were won.

I tried being an Expert recently and it is quite fun if you are interested in the kinds of technical problems people want to solve and/or get any satisfaction from helping them. It is also quite annoying. Questions vary from trivial to impossible; with the trivial ones, it is a race against time as numerous Experts try to post their solution first. Some are impossible because they are hopelessly vague (so common with support issues), have no clear answer – eg “should I pay for help with SEO” – or because what the questioner wants simply cannot be done.

It is also interesting to see what questions are being asked. There is a heavy bias towards Windows. I guess this is another reminder of Microsoft’s continuing dominance, though it also reflects the culture of the community that has formed around the site. Many of the programming questions seem to be from beginners, though often wrestling with real business applications, raising questions about the level of IT expertise out there.

It might be worth answering a few simple questions to get 10,000 points, or 3,000 per month thereafter, as this qualifies Experts to get free use of the site. What about spending hours trying to fix a tricky and intricate problem with Active Directory, without access to the system you are trying to troubleshoot? That doesn’t make sense for most of us, since if you can do that you can probably do it for real money elsewhere. These are questions that might not get answered at all, though sometimes they are, leaving valuable information for others in the process.

That said, it strikes me that the Experts here could get a better deal. Why not set up a cooperative where they share the subscription fees? The problem is how to acquire the necessary momentum and build up a strong repository of solutions that show up in Google and bring users to the site.

As for developers, I’d prefer StackOverflow, which is unequivocally free; the organizers presumably get by on advertising income.

March 6th, 2009

Latest steps in the Google dance: brands, or not?

There’s a buzz in the SEO community about an update which the search company has made to its algorithms. Google’s Matt Cutts calls it a change rather than an update, if you can figure out the difference, though it is important enough to have a name within the company: “Vince’s change”, after the employee who contributed it.

According to SEO guru Aaron Wall it is related to CEO Eric Schmidt’s comments last year that the Internet is a “cesspool” of false information. Big idea: promote trusted brands in the search results to ensure quality in the top hits.

As usual with Google, it’s hard to discern whether this is a big deal as Wall claims, or a minor evolution as Cutts presents it. Still, it is worth a few observations.

First, it seems obvious that Google’s original big idea, pagerank based on incoming links, is becoming less and less useful. It has been killed first by the SEO industry itself and its unceasing link farms and exchanges, and second by Google’s promotion of the “nofollow” attribute, which ironically means that many of the best incoming links are now supposedly ignored, while the SEO folk ensure that low-quality links which are not tagged nofollow abound.
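
The nofollow point is easier to see with a toy example. What follows is the simplified textbook form of PageRank, computed by power iteration in Python; it is not Google’s production algorithm, and the link graph, page names and parameter values are all invented. The `honour_nofollow` switch is my own addition for the comparison: when it is on, links tagged nofollow pass no rank at all.

```python
# Hypothetical link graph: (source, target, nofollow)
LINKS = [
    ("blog", "shop", True),    # genuine editorial link, but tagged nofollow
    ("farm1", "spam", False),  # link-farm links, not tagged
    ("farm2", "spam", False),
    ("blog", "news", False),
    ("news", "blog", False),
]


def pagerank(links, honour_nofollow=True, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration on a small graph."""
    pages = sorted({p for s, t, _ in links for p in (s, t)})
    out = {p: [] for p in pages}
    for source, target, nofollow in links:
        if nofollow and honour_nofollow:
            continue  # the link passes no rank
        out[source].append(target)

    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / len(pages) for p in pages}
        for page in pages:
            targets = out[page] or pages  # dangling pages share rank evenly
            share = damping * rank[page] / len(targets)
            for target in targets:
                new[target] += share
        rank = new
    return rank


with_nofollow = pagerank(LINKS, honour_nofollow=True)
without_nofollow = pagerank(LINKS, honour_nofollow=False)
for page in sorted(with_nofollow):
    print(page, round(with_nofollow[page], 3), round(without_nofollow[page], 3))
```

In this toy graph the shop, which has the one honest recommendation, scores lower when nofollow is honoured than when it is ignored, while the link-farm links to the spam page are counted either way.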

That being the case, Google has to look for other ways to rank sites. According to Cutts, there are three things (in addition to pagerank) that it tries to identify: trust, authority, and reputation.

The brands idea is an easy solution. Prefer the well-known names; that way you may not get the best content or the best price; but at least users generally won’t be scammed.

The potential consequences of this kind of thinking are far-reaching. It is undermining one of the Web’s key attractions, which is low barriers to entry. If SEO becomes a matter of building a big brand, it is no different than the old world of big-budget marketing campaigns (and perhaps that should not come as a surprise).

The other twist on this is that users searching don’t necessarily want the big brands. Rather, they want the best information. Further, a user who wants to find a big brand on the Web does not need Google to do so. If Google goes too far in promoting familiar names above the best content, it leaves an opportunity for other search engines.

I think Google is smarter than that. Nevertheless, the problem which Schmidt refers to is real, and I reckon that barriers to entry on the Internet are rising and will continue to do so.

The power Google exerts to make or break Internet enterprises and to influence the flow of information is downright spooky, mitigated by the fact that it does an excellent job as far as I can tell (and therein lies the rub).

Finally, one tip for Google. Scrap nofollow. It was a bad idea, for reasons which only now are becoming obvious. If I were building a search engine today, I would take little or no account of it.

PS great comment from @monkchips on Twitter just as I posted this entry:

for my purposes google search has actually become less useful over time. Now its kind of like a mall of corporations

February 9th, 2009

Google says top two results get most of the hits – but what about ads?

A post on the Official Google Blog says that the first two search results get most of the clicks:

This pattern suggests that the order in which Google returned the results was successful; most users found what they were looking for among the first two results and they never needed to go further down the page.

I knew you had to be on the first page – but the “top two” result is even harder to achieve.

It is significant though that Google’s post makes no mention of ads. I am quite sure that the study included research into their effectiveness. Google has chosen not to reveal this aspect of the research.

In particular, most Google search results do not look like the examples. Rather, they have ads at the top which look just like the other results, except with a different background colour and a faint “Sponsored Links” at the right:

My question: in a result list like this, which “top two” gets the eyeballs and the clicks? The search results? Or the paid links?

January 11th, 2009

Another useless Microsoft search experience

I do not make posts like this just to needle Microsoft. I’d like to see it compete better with Google, not because I have anything against Google, but because competition is good. So I’m explaining why I can’t use Microsoft’s search product as currently implemented, in the hope that it will help to improve it.

I am aware that the Windows 7 beta SDK is now available, and wanted to download it. I tried the search on Microsoft’s site:

Top hit is from November 2007 and is a forum discussion of dxtrans.h. Correct result nowhere to be seen.

So I tried Live Search:

I get ads for double-glazing and a top hit for the QuickTime 7.3 SDK from Apple. Correct result nowhere to be seen.

So I tried Google:

Top hit is relevant but wrong. Second hit is a blog post which has the URL I’m looking for, not bad. Fourth hit is the exact URL. By the way, the Windows 7 SDK beta is here.

It is no use Microsoft doing search bundling deals with Dell. If it cannot fix the product, users will run back to Google.

Incidentally, I believe US searchers get better results from Live Search than I do, because of faulty regionalisation. Live Search seems to have a heavier bias than Google towards what it thinks are (in my case) UK results. That doesn’t explain why it gives an Apple QuickTime site as its top hit for “Windows 7 SDK”.