Tim Anderson’s ITWriting

Tech writing blog

August 11th, 2008

Backup: a gap in Google’s online services

Slashdot has a discussion on Google and backup. The question:

I am doing almost all of my computing in the cloud. Google Reader, Calender, Email, Docs and Notes have become my tools of choice … is there a one-touch solution that will take all my data from the various online apps and archive it on my home server?

The answer appears to be “No” – that is, there are ways of doing this with scripts, offline email clients and so on, but there is no one-touch solution.

It strikes me as a valid concern. It is not just a matter of trusting Google not to zap your data accidentally – though Google accounts have been known to disappear. Another scenario is that someone guesses your password, or grabs it via keystroke capture, and deletes stuff on your behalf. It just isn’t sensible to have only one copy of data that matters to you.

Arguably this is one advantage of synchronization services like Live Mesh, but these have a flaw too. Synchronization will happily copy a corrupt document over all your good copies. That’s why I like version control systems – they keep a history.

Even in the admin features of the premier edition, aimed at businesses, I don’t see backup covered.

It can’t be that hard to fix this with something like, say, differential backups from Google to Amazon S3. It would be unlucky to fall out with both companies simultaneously.
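
To make the idea concrete, here is a minimal sketch of the differential part only. It assumes the documents have already been exported from the online service into a dictionary of name to bytes; the function names, and the step of uploading the result to S3 or anywhere else, are hypothetical and not part of any Google or Amazon API.

```python
import hashlib


def _digest(content):
    """SHA-256 hex digest of a document's bytes."""
    return hashlib.sha256(content).hexdigest()


def snapshot_hashes(docs):
    """Record a digest per document at the time of a full backup."""
    return {name: _digest(content) for name, content in docs.items()}


def differential_backup(current_docs, baseline_hashes):
    """Return only the documents that are new or changed since the full backup.

    current_docs: dict of name -> bytes (a hypothetical export from the service).
    baseline_hashes: dict of name -> digest, as produced by snapshot_hashes().
    """
    changed = {}
    for name, content in current_docs.items():
        if baseline_hashes.get(name) != _digest(content):
            changed[name] = content
    return changed
```

Note that a document deleted online simply drops out of `current_docs`; nothing here removes it from the earlier full backup, which is the property you want if someone deletes stuff on your behalf.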

August 7th, 2008

Will anyone untangle the nofollow mess?

I get frequent emails asking to place advertising on itwriting.com. My reply is that I’m happy to take appropriate ads, provided that they are clearly marked as such, non-intrusive, and that links are either inserted by script or tagged “nofollow” in accordance with Google’s guidelines. Usually I don’t hear back from the advertiser.

The problem is that these advertisers are not primarily interested in having readers of this site click their ad or see their brand. Their real concern is search engine optimization – improving the rank of a web site in Google searches by adding to the number of incoming links. Counting incoming links to determine PageRank was Google’s original secret sauce, by which it determines which pages are of the highest quality, though it is probably less important now precisely because the principle is so well known. Advertising like this is often called paid links.

Although it upsets Google, there is nothing illicit about accepting paid links. Advertising is the commercial engine that drives much of the web. It is risky for the host site, since Google might downgrade its ranking because of the paid links. In other words, the advertiser is proposing a trade – buying more PageRank for its own sites, at the expense of some of yours.

The other risk, which is the reality that Google’s algorithms are attempting to simulate, is that a site filled with spammy links is less attractive to readers and therefore likely to become less popular. After a while the advertising will dry up too.

Balancing the demands of advertisers and editorial is nothing new. My main complaint about these advertising offers is that they are not transparent about their intentions.

Even so, the whole SEO/Paid links/Google dance is a horrible mess. One of the problems is that the Web has de-facto authority. People search the web for information, and take the top-ranking results as the answer to their question. Another real-world issue is that countless businesses now depend on Google for their success and survival. I often hear from organizations which are desperate to improve their search ranking and vulnerable to SEO consultants whose services are often not good value.

All this makes the nofollow idea good, in that it is tackling the right problem; but unfortunately it is not working. We are seeing the nofollow tag frequently applied to links that are “good”, and frequently not applied to links that are “bad”. For example, links in comments to a WordPress blog are automatically tagged nofollow; yet some of the best-informed advice on the web is found in comments to WordPress blogs. Links in Wikipedia are tagged nofollow; yet an external link that survives in an important Wikipedia article is a high recommendation of quality.

At the same time, both the spammers and the SEO folk are having few problems placing paid links sans nofollow all over the web.

It would not surprise me if the secret search ranking algorithms used by Google and others actually accord more significance to links tagged nofollow than is acknowledged. It would actually make sense for the search engines to try and distinguish between good and bad links regardless of nofollow. In other words, the nofollow tag has become almost pointless. That said, I have no idea whether, if Google declared the tag obsolete, the net spamminess of the web would improve or deteriorate.

I don’t know how to fix the problem; though I suspect that Google as the Web’s current gatekeeper is the entity with most power to improve it. Another thing that would help: more focus on content and less focus on SEO.

July 30th, 2008

The trouble with Knol

Is that it’s going to be full of rubbish. Wikipedia, which arguably has less authority because contributions can be anonymous, will likely have more authority, since it is more-or-less restricted by its community to one entry per topic.

Another way of looking at this is that on Wikipedia, if you want to contribute to a topic that already has an article, you have no choice but to (try and) improve the existing one. On Google Knol, there’s every incentive to start a new one, never mind if it duplicates existing content, or is worse than an existing one.

Take Search Engine Optimization, for example. Wikipedia has an article that looks decent. Knol has thirty articles; or if you search for SEO, more like seventy. And that’s after just a few days.

Google is good at ranking, and users can rate pages, but Knol is still a mess. You can be sure that many articles will be written primarily to drive traffic to the author’s web site, or to attract AdSense clicks. Wikipedia is not immune to spam; but at least contributors can delete it. All you can do on Knol is to give a spammy article a low rating.

Thinking aloud, what might work is some kind of Slashdot-style filtering. For example, you could have it so that by default Knol searches only show articles which have:

  • More than 10 ratings
  • An average rating of at least 4
  • or are the only article on the subject

or some such; vary the constants according to taste.
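
As a rough sketch of that default filter: the code below reads the list as "(more than 10 ratings and an average of at least 4) or the only article on the subject", which is one interpretation. The `Article` fields and the function name are made up for illustration; this is not how Knol stores or ranks anything.

```python
from collections import Counter
from dataclasses import dataclass

# The constants from the list above; vary according to taste.
MIN_RATINGS = 10
MIN_AVERAGE = 4.0


@dataclass
class Article:
    title: str
    subject: str
    rating_count: int
    average_rating: float


def visible_by_default(articles):
    """Keep well-rated articles, plus any article that is alone on its subject."""
    per_subject = Counter(a.subject for a in articles)
    return [
        a for a in articles
        if per_subject[a.subject] == 1
        or (a.rating_count > MIN_RATINGS and a.average_rating >= MIN_AVERAGE)
    ]
```

With this rule a highly rated article with only three ratings stays hidden until more people have rated it, which is the point: a handful of friendly votes is not enough.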

Then again, you could have a team of editors (and become Britannica); or enforce one article per topic and become more like Wikipedia.

July 24th, 2008

Open collaboration on Knol

I’ve opened my Visual Basic Knol to “open collaboration” – just like Wikipedia. If it gets spammed to hell, I’ll close it. An experiment.

It’s now showing in the Knol index – great.

July 24th, 2008

My first Google Knol

On reading this morning that Google has opened Knol to everyone, I thought I should have a go. There seems to be a predominance of medical Knols right now, so my Knol redresses the balance by covering a programming topic. Here it is:

Visual Basic

I deliberately did not look at whatever Wikipedia already has on the subject; knowing how good Wikipedia is on technical topics I am sure it is much longer and better than mine, and probably less opinionated.

Now I’m going to sit back and let the world improve my Knol.

But will anyone find my Knol? Oddly, if I try a search for Visual Basic on the Knol home page, my article doesn’t come up, although I’ve published it:

Oh well, maybe it is still being indexed.

It seems to me that the rating system is key here, and one to which I gave too little attention last time I thought about Knol. The thing is, there’s nothing to stop someone else writing an article about VB, and if it gets rated higher (sniff), my contribution will be lost at the bottom of the Knol dustbin – because I suspect Google will use the ratings heavily when ranking Knols in searches.

Other points of interest: I started creating my Knol in IE7, but gave up because of script errors and continued in Firefox. Second, I tried to verify my identity by telephone, but this only works for USA telephone numbers. It’s a beta.

Update: Danny Sullivan has a good commentary. I agree with him about credit cards. I declined.

July 22nd, 2008

Web stats: do you believe Google, or the web site owner?

Escherman’s Andrew Smith, in technology PR, asks whose site traffic figures you trust: Google’s (via Ad Planner), or the site owner’s?

I don’t have Ad Planner, but because I run AdSense I can see Google’s stats on AdSense views on this site. I also have web logs, analyzed via awstats.

I took a look at my own figures for June. My stats show about 6.5 times more page views than AdSense reports.

This isn’t hits vs pages, incidentally. “Hits” record every request, so a page with several images requires several hits. Hits is therefore always the biggest number, but pages is in theory more meaningful.

It is a huge discrepancy. What’s the reason? I can think of several:

  • Google only counts page views that run its AdSense script. Bots like web crawlers are not likely to run these.
  • Not all my pages have AdSense on them, though most do.
  • Every time a request is made for my RSS feed, awstats will count that as a page view, but Google (rightly) will not.
  • Google will try to eliminate the rubbish, like spam bots posting comments that end up in the Akismet junk box.
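
The reasons above can be sketched as a simple filter over log entries. This is only an illustration: the entries are simplified to (path, user agent) pairs rather than real log lines, and the marker lists are crude guesses, not the rules awstats or Google actually apply.

```python
# Hypothetical heuristics, chosen only for illustration.
ASSET_EXTENSIONS = (".png", ".gif", ".jpg", ".css", ".js", ".ico")
BOT_MARKERS = ("bot", "crawler", "spider")
FEED_PATHS = ("/feed", "/rss")


def count_traffic(requests):
    """Count hits, pages, and pages that look like human visits.

    requests: iterable of (path, user_agent) tuples.
    """
    hits = pages = human_pages = 0
    for path, user_agent in requests:
        hits += 1  # every request is a hit
        if path.lower().endswith(ASSET_EXTENSIONS):
            continue  # images, scripts and styles are hits but not pages
        pages += 1
        if any(marker in user_agent.lower() for marker in BOT_MARKERS):
            continue  # crawlers do not run the ad script
        if path.lower().startswith(FEED_PATHS):
            continue  # feed fetches count as pages in the logs, not as ad views
        human_pages += 1
    return {"hits": hits, "pages": pages, "human_pages": human_pages}
```

Even on a toy sample the three numbers diverge quickly, which is the shape of the discrepancy described here, if not its size.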

Still, 6.5 times is a huge difference, more than I would expect. The page view discrepancy on the site Smith chose to look at is a mere 4.2 times – though we don’t know how that particular web site calculates its figures.

I don’t have any firm conclusions, though my own figures suggest that any web site which simply quotes figures from its logs will come up with something much larger than Google’s filtered stats.

I’d have thought the answer for advertisers would be to use tracking images and the like in ads so they can get their own statistics.

Finally, this prompts another question. Just how much Web traffic is bot-driven? We know that somewhere between 65% and, by some estimates, more than 90% of email is spam. Web crawlers and RSS feeds are not bad things, but they are not human visitors either. Add that to the spam bots, and what proportion does it form?

July 16th, 2008

Why I can’t use Microsoft Live Search for real work

I’ve complained before about the quality of Microsoft’s Live Search vs Google; but today’s example seemed too good an illustration not to mention.

I needed to update Windows XP to SP3. In particular, I wanted what Microsoft calls the “network” download; that is, the entire service pack, not the launcher app that initiates a piecemeal download tailored to the specific machine.

I pulled up Live Search and entered “windows xp sp3 download”.

To my surprise, Live Search offered me only third party download sites in its first page of results. Actually, that’s not strictly true. At number 8 is the download for “Windows XP SP3 RC2 Refresh” (obsolete); and at number 10 the general home page for XP downloads:

Find popular Windows XP downloads, including PowerToys, trial software, tools and utilities

I tried Google. Same search. The official network download is in first place. The official piecemeal download is second.

I know: you can argue that this is just an isolated search, and that some other search might show Google in an equally bad light. However, I find this constantly: Google gets me much better results. Further, this case is particularly telling, since a third-party download site is not what you want when patching Windows. Quite likely those other sites do point you to the correct official download eventually; but getting Microsoft downloads from Microsoft’s site is safer.

I am not surprised Microsoft has a tiny share of the search market; and I don’t believe this is simply because of Google’s clever marketing.

Update PS: The above screen grab still matches what I get today. However, users in different countries may get different results; from the comments below I suspect that US users get better results in this instance. Maybe Live Search is worse in the UK than in the US; I’d be interested to know.

July 4th, 2008

Why you can’t trust a Google ad

An interesting facet of the recent problems with UK non-supplier Zavvi Direct is that all the purchasers I spoke to found the fake web site via a Google ad. Put another way, without the ease of advertising through Google and eBay, it is likely that far fewer people would have found the site and potentially lost their money.

That raises the question: does Google do anything to verify that its advertisers are genuine? Here’s the answer, from a Google spokesperson:

Google, along with other online and offline advertising platforms are not able to proactively check the legitimacy of each and every advertiser. Consumers should always check the validity of what is being sold to them and how they are asked to pay for items. If Google is alerted to a potential fraud then we will work with the relevant legal authorities to help them resolve such matters.

This was clarified to me as follows. Google will assume ads are OK unless it receives complaints. If it receives a few complaints it might pass them on to the merchant. If it receives numerous complaints it might warn the advertiser and eventually disable the account.

I guess it is unreasonable to expect Google to conduct checks on every advertiser. Still, there is a related point: does Google do enough to highlight the difference between advertisements, and links identified by its famous search ranking algorithms? Here is a snapshot of a search I just made:

I’ve sized the browser small to get everything in; there are more search results than I’ve shown. However, it shows three panels of results. The top left is tinted and marked in unobtrusive gray type “Sponsored links”. The top right is narrow, not tinted, and also marked in gray type “Sponsored links”. The bottom left is what most tech-savvy folk think of as the main results area.

Judging by my interviews, some people are not really aware of the distinction between a “sponsored link” and a search result. In some cases, the buyer could not tell me what kind of link they clicked. To them it was just “Google”.

It would be easy to make the ads more distinct. Google could use the plain English “Advertisements” rather than the “sponsored links” circumlocution. It could use something bolder than gray text to identify them. It could use a different font and colour for the links in the right-hand column. It is good that the top left links are in a tinted panel; yet some may perceive this simply as best-match links, rather than links in an entirely different category than those that follow.

Overall, it seems to me that Google deliberately makes its ads look the same as search results. Which is good for advertisers, but can be bad news for buyers.

June 20th, 2008

What’s coming in Buzzword - and Live Writer as Word for the cloud

Interesting post from Lisa Underkoffler on what’s coming in Buzzword, Adobe’s internet word processor. She mentions named styles, which I would enjoy since I use these all the time in Word. I was surprised that the feature is frequently requested, though; most people seem happy to apply specific formatting and don’t worry about the structure provided it looks right. Maybe this is Adobe’s strong presence in the print and publishing world showing through.

It prompted me to make a quick tour of the competition to see who already has named style support. Nothing I could see in Google Docs.

Zoho Writer doesn’t seem to have them either.* Zoho’s site also seems a bit temperamental this morning. The connection kept failing which meant a long wait while, perhaps, some AJAX operation was not completing. Zoho froze IE completely; I switched to Firefox but it remained slow. I wish the Zoho folk would stop adding features (even named styles) and focus on performance and reliability for a while; perhaps it is better in the USA.

ThinkFree has them, and they seemed to work (more or less) once I had downloaded its gargantuan Java applet. The company seems to be shifting the emphasis to a downloadable application with online storage, perhaps because the applet is too big for casual use on any old computer. I tried the downloaded application as well. Curiously, after I saved and re-opened the document, my named style disappeared from the list of styles. I think something is not quite right here; I also had a few performance issues.

If you are happy to run a desktop application, Word plus Live Mesh makes a decent and familiar alternative. Just save your document to the Mesh, and open it from anywhere. Main snags: no Mac or Linux support yet, no online editing.

I’ve actually fallen into the habit of using Live Writer plus WordPress as a kind of cloud word processor. Writer has a feature called Post Draft to Weblog. Your document is saved to your blog, but not actually published. Usually I do this for posts that will be published later; but sometimes I use it for notes that will never be published. I can open the draft later from another PC using Writer; or use the online editor in WordPress if Writer is not installed. Another option is to save the draft locally, handy if you are offline; Live Writer will synch it with the online version later. Not recommended for confidential documents, but for casual use it is a powerful combination.

No named styles though. Never mind.

*Update: See comment below: Zoho does support CSS. So if you have a CSS stylesheet set up, you could use these styles in your document. Good idea, though I’m not sure how you go about using this if you are not a skilled web developer.

June 13th, 2008

Now it’s YahGoog

Yahoo has signed up for AdSense:

By offering Google’s industry-leading technology to Yahoo!, the whole system becomes more efficient, and everyone benefits.

This is efficient in the same way that having everyone run Windows is efficient. Hmmm.

Google observes that the deal is non-exclusive; Yahoo can still sell its own ads, etc etc. I tend to agree with Om Malik, who says:

In my opinion, with this deal, Yahoo has publicly acknowledged that Google is superior to them when it comes to search & contextual advertising.

Yes. But how much does that matter? Outsourcing what you are less good at, in order to concentrate on core competencies, can be a smart business move.

The snag here: advertising is Yahoo’s primary business activity. Here are its revenue figures for the first quarter 2008:

  • Marketing services: $1,818 million
  • Fees: $245 million

Outsourcing the core of your business is bad PR.
