Google can’t count

CodeGear’s Anders Ohlsson is excited because Google shows over half a million hits for “Delphi for PHP”. Even with the quotes.

I get the same results. More, in fact. Google says 654,000 hits.

Now try reading them. I get to page 35, then the hits come to a halt. There are 10 hits per page so that makes, hmmm, 350 hits. A bit less exciting. Let’s be honest, a lot less exciting. The real figure is probably a little higher, but not by half a million.

I do get this line (we’ve all seen it before):

In order to show you the most relevant results, we have omitted some entries very similar to the 341 already displayed. If you like, you can repeat the search with the omitted results included.

Trying the “complete” search does get more results, but they are just as repetitive as Google warns. Google appears to limit results to 1000 hits, so there is no obvious way to find out where the other alleged 653,000 hits can be found.

Microsoft’s Live Search says 24,473 results, but the trail runs out on page 80. That’s 800. So Microsoft Live Search can’t count either.

Yahoo says 322,000, but like Google can only show 1000 of them. I remain sceptical about the missing 321,000.

I’ve noticed this before. Certain phrases trigger huge numbers of alleged hits, but they vanish if you try to view them. Others seem to work fine. Perhaps someone more knowledgeable about the inner workings of search engines can explain why. Either way, the reported hit count appears to be an unreliable measure.



6 thoughts on “Google can’t count”

  1. Google is a wonderful and interesting thing. The numbers are interesting on a relative basis regardless. We love the discussion happening about us bringing a VCL to PHP. It won’t change the world, but it is a good step we are proud of. It might just change your world if you love Delphi and work with PHP though.

  2. Thanks Ben, I’m looking forward to taking a closer look at D4PHP when it is available.


  3. It could simply be some kind of estimate. Database query optimizers are usually able to estimate how many rows a given query should fetch, but to know how many rows will actually be fetched, they have to fetch them.
    I guess Google does not count the real results; it would be too expensive.

  4. It could simply be some kind of estimate

    Agreed, though it does seem to get it wrong by an extraordinary margin for some searches.


  5. Because Google does a kind of full-text indexing, I guess it could store some statistics about how many hits a single word has. When combining two or more words, it has to estimate how large the resulting set could be. But two large sets may have a fairly small intersection. It could try to infer the resulting set size, but with words that are “almost unrelated” I guess it could give a really wrong result.

  6. It’s even funnier to search for a not so common term. If I search for ‘expisoft’ (my small business) I get:
    Results 1 – 10 of about 5 for expisoft. (0.16 seconds)
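
The estimation idea raised in comments 3 and 5 can be illustrated with a toy sketch. It is only a guess at the general mechanism, not a description of how Google or any other search engine actually computes its counts. The corpus, the numbers, and the function name `estimate_and_count` are all hypothetical. The sketch keeps per-word statistics, estimates the size of a two-word result by assuming the words occur independently, and compares that with the real intersection.

```python
def estimate_and_count(docs, term_a, term_b):
    """Return (estimated, actual) number of docs containing both terms."""
    n = len(docs)
    # Per-term statistics an index could cheaply keep: which docs contain each word.
    with_a = {i for i, words in enumerate(docs) if term_a in words}
    with_b = {i for i, words in enumerate(docs) if term_b in words}
    # Estimate under the (assumed) independence of the two terms.
    estimated = len(with_a) * len(with_b) / n
    # The true figure requires actually intersecting the two sets.
    actual = len(with_a & with_b)
    return estimated, actual


# Hypothetical corpus of 1000 documents: 'delphi' appears in 400,
# 'php' in 500, but only 5 documents contain both words.
docs = (
    [{"delphi"}] * 395
    + [{"php"}] * 495
    + [{"delphi", "php"}] * 5
    + [{"other"}] * 105
)

estimated, actual = estimate_and_count(docs, "delphi", "php")
print(f"estimated {estimated:.0f}, actual {actual}")  # estimated 200, actual 5
```

With these made-up numbers the estimate is 40 times too high, which is the kind of gap that two large but nearly disjoint sets would produce.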
