Monday, March 15, 2004

Croeso: Google within companies? 

In Croeso: Google within companies? Andy Boyd comments on the fact that Google does not work particularly well on the intranet, because most of the information there is in databases, mailing lists and ERP systems. I am a little surprised about the mailing lists (I guess there are simply no external links to the posts in the archive, and the mails do not link to each other well enough), but that the databases are overlooked is completely expected. The thing is, Google does not work very well for the web either, because most of the information on the web is in databases too. You usually do not realise this because Google does not find it for you, and on the web you know less well what ought to be found. This is called the deep web phenomenon. Estimates are that most of the information on the web is actually hidden in the deep web. Some say it is 99% of the information, but it all depends on how you measure things. The human genome accounts for lots of bytes on the web, but I do not think every gene should count as a web page.
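To make the mechanism concrete, here is a toy sketch (all page names are made up) of why a crawler that only follows hyperlinks misses database content: pages that exist only as answers to a query form have no incoming links, so a link-following spider never reaches them. Real crawlers are far more elaborate; this only shows the basic reachability problem.

```python
from collections import deque

# A toy "web": each static page maps to the pages it links to.
# All page names are hypothetical.
STATIC_LINKS = {
    "home": ["about", "database-form"],
    "about": ["home"],
    "database-form": [],  # a query form: its result pages are linked from nowhere
}

# Pages that exist only as answers to form queries (the "deep web").
QUERY_RESULTS = {
    ("leather", "Norway"): "result:leather-Norway",
    ("leather", "Sweden"): "result:leather-Sweden",
}


def crawl(start):
    """Breadth-first crawl that, like a plain link-following spider,
    only discovers pages reachable through hyperlinks."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in STATIC_LINKS.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


indexed = crawl("home")
hidden = set(QUERY_RESULTS.values()) - indexed
print(sorted(indexed))  # ['about', 'database-form', 'home']
print(sorted(hidden))   # ['result:leather-Norway', 'result:leather-Sweden']
```

The form page itself gets indexed, but nothing behind it does, which matches the experience of finding a page that merely mentions "database".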

But Google is smart, and they want to go IPO, right?

The people at Google are very smart indeed, and Google does an amazing job at what it does, but it is not clairvoyant. Google does not even understand what it reads; it just does statistics. Now suppose you are 13 3/4 years old and want to know the production of the Norwegian leather industry. If you poke around with Google, you eventually find this:


It is a web page, on the internet, and it wants to be found. I could find it only by asking Google to look for leather production and database. Google found a page containing "leather" and a link called "database". The database has a pull-down menu containing Norway. In other words, I found the page above by outguessing the system, and with a little luck. What should have happened instead is that

the database describes itself to the world, and in particular to Google, as follows:

1. it contains information which you can query on a geo:country in geo:Europe and optionally on leather:tannery and the following list of goods,
2. it returns the amount of leather:leather in the unit unit:tonsPerYear, and
3. it has a well-defined interface.
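A minimal sketch of what such a self-description might look like as a data structure, assuming a Python representation. The post does not prescribe a format (in practice one would expect an RDF vocabulary rather than Python classes); the endpoint URL and the term leather:goods are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Parameter:
    name: str
    concept: str                  # ontology class the value must belong to
    within: Optional[str] = None  # optional region constraint, e.g. geo:Europe
    required: bool = True


@dataclass(frozen=True)
class SourceDescription:
    endpoint: str                      # the well-defined interface (point 3)
    parameters: Tuple[Parameter, ...]  # what you can query on (point 1)
    returns: str                       # what comes back (point 2)
    unit: str

    def required_names(self) -> list[str]:
        return [p.name for p in self.parameters if p.required]


LEATHER_DB = SourceDescription(
    endpoint="http://example.org/leather-db/query",  # hypothetical URL
    parameters=(
        Parameter("country", "geo:Country", within="geo:Europe"),
        Parameter("tannery", "leather:tannery", required=False),
        Parameter("goods", "leather:goods", required=False),  # hypothetical term
    ),
    returns="leather:leather",
    unit="unit:tonsPerYear",
)

print(LEATHER_DB.required_names())  # ['country']
```

The point of the structure is that it describes the *shape* of what the database can answer, not its individual rows.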

Google should "know", having indexed the geo ontology, that Norway is a country in Europe.
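The kind of inference meant here can be sketched with a handful of triples. Assume, for illustration, that the indexed geo ontology states that Norway is a country, part of Scandinavia, which is part of Europe. Checking the constraint "a geo:country in geo:Europe" then means following the partOf links transitively rather than matching the word "Europe" in the text. The predicate names below are illustrative, not those of any real ontology.

```python
# Toy ontology as (subject, predicate, object) triples; names are illustrative.
TRIPLES = {
    ("geo:Norway", "rdf:type", "geo:Country"),
    ("geo:Norway", "geo:partOf", "geo:Scandinavia"),
    ("geo:Scandinavia", "geo:partOf", "geo:Europe"),
    ("geo:Brazil", "rdf:type", "geo:Country"),
    ("geo:Brazil", "geo:partOf", "geo:SouthAmerica"),
}


def is_a(entity, concept):
    """True if the ontology directly states that entity is of type concept."""
    return (entity, "rdf:type", concept) in TRIPLES


def part_of(entity, region):
    """Follow geo:partOf links transitively from entity, looking for region."""
    frontier = [entity]
    seen = set()
    while frontier:
        current = frontier.pop()
        for s, p, o in TRIPLES:
            if s == current and p == "geo:partOf" and o not in seen:
                if o == region:
                    return True
                seen.add(o)
                frontier.append(o)
    return False


def matches(entity):
    """The constraint from the self-description: a geo:Country in geo:Europe."""
    return is_a(entity, "geo:Country") and part_of(entity, "geo:Europe")


print(matches("geo:Norway"))  # True
print(matches("geo:Brazil"))  # False
```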

Then Google could query the database for you, and you would get the answer. The ability of an information source to describe itself (in terms more general than simply listing all its entries) is a key challenge for the semantic web. Apparently Amazon and Google already do something like this. With such a delegation model you get a much cleaner division of responsibilities.
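The delegation model might be sketched as follows, assuming the search engine keeps a registry of source descriptions and a little geographic knowledge from an indexed ontology. The engine only decides where a question should go and builds the query; computing the answer stays the database's job. The registry, URL and region table are all hypothetical, and no production figures are invented here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Toy knowledge the engine got from indexing a geo ontology (illustrative).
REGION_OF: Dict[str, str] = {
    "geo:Norway": "geo:Europe",
    "geo:Brazil": "geo:SouthAmerica",
}


@dataclass(frozen=True)
class Source:
    endpoint: str  # hypothetical query URL of the database
    returns: str   # ontology term for what the source answers
    region: str    # the source accepts any country in this region


@dataclass
class DelegatedQuery:
    endpoint: str
    params: Dict[str, str]


REGISTRY: List[Source] = [
    Source("http://example.org/leather-db", "leather:leather", "geo:Europe"),
]


def delegate(wanted: str, country: str) -> Optional[DelegatedQuery]:
    """Route a question to a source whose self-description covers it.

    The engine only decides *where* to ask; answering is left to the
    database behind the endpoint.
    """
    for source in REGISTRY:
        if source.returns == wanted and REGION_OF.get(country) == source.region:
            return DelegatedQuery(source.endpoint, {"country": country})
    return None


print(delegate("leather:leather", "geo:Norway"))
print(delegate("leather:leather", "geo:Brazil"))  # None: not in geo:Europe
```

Note that the source is matched by a region constraint, not by an enumerated list of countries, which is the "more general than listing all its entries" part.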



© Copyright 2004-2006 Rogier Brussee.
These are my personal views and do not necessarily reflect those of my employer.