Archive for August, 2009

Wikipedia 3.0 (Three Years Later)

In Uncategorized on August 11, 2009 at 4:51 pm

Author: Marc Fawzi

Twitter: http://twitter.com/#!/marcfawzi

License: Attribution-NonCommercial-ShareAlike 3.0

I’ve just received a couple of questions from a contributor to an IT publication who is writing about the state of the semantic web.

I’m taking the liberty of posting one of the questions I received along with my response.


[The Semantic Web has seen] several years of development. For instance, there have been steps to create syntax (e.g., OWL and OWL 2) and other standards. Various companies and organizations have cropped up to begin the epic work of getting ‘the semantic web’ underway.

How would you characterize where we are now with the web? Is Web 3.0 mostly hype?

RDF, which emerged from the work being done on the semantic web, is now being used to structure data for better presentation in the browser, and both Google and Yahoo use it. So you can say that the semantic web is starting to bear fruit. But unlike OWL-DL, RDF does not have the structure to implement a logic model for a given domain of knowledge, and such a model is what machines require in order to reason about the information published in that domain. However, RDF and RDFa (and other variations) are perfect for structuring the information itself, as opposed to the logic model for the domain. So the next step will be to use RDF to structure information for machine processing, not just for browser presentation, and to combine it with domain-specific inference engines (which, in this case, would combine logic programs and description logic for the various knowledge domains) to build a pan-domain, basic-AI-enabled “answer machine.” Such a machine is fundamental to any attempt at making machines ‘comprehend’ the information on the Web, per the full-blown semantic web vision.
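To make the gap between structured information and a logic model a bit more concrete, here is a minimal sketch in Python (the post itself contains no code). It assumes the information is already structured as RDF-style (subject, predicate, object) triples, and it stands in for an inference engine with a toy forward-chaining loop over two standard RDFS entailment rules (rdfs9 and rdfs11). That is far weaker than the OWL-DL and logic-program reasoning described above, and the `ex:` vocabulary and the function name `rdfs_closure` are hypothetical, chosen only for illustration.

```python
# A toy forward-chaining reasoner over RDF-style triples (not a real RDF library).
# The "ex:" names below are hypothetical and used only for illustration.
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def rdfs_closure(triples):
    """Apply two RDFS entailment rules until no new triples can be derived.

    rdfs11: (A subClassOf B) and (B subClassOf C)  =>  (A subClassOf C)
    rdfs9:  (x type A) and (A subClassOf B)         =>  (x type B)
    """
    inferred = set(triples)
    while True:
        subclass_pairs = [(s, o) for (s, p, o) in inferred if p == SUBCLASS_OF]
        new = set()
        # rdfs11: subclass relationships are transitive.
        for a, b in subclass_pairs:
            for b2, c in subclass_pairs:
                if b == b2:
                    new.add((a, SUBCLASS_OF, c))
        # rdfs9: an instance of a class is also an instance of its superclasses.
        for x, p, a in inferred:
            if p == RDF_TYPE:
                for a2, b in subclass_pairs:
                    if a == a2:
                        new.add((x, RDF_TYPE, b))
        if new <= inferred:  # fixed point reached: nothing new to add
            return inferred
        inferred |= new


# Information structured as triples; no triple states that Aspirin is a Drug.
facts = {
    ("ex:Aspirin", RDF_TYPE, "ex:NSAID"),
    ("ex:NSAID", SUBCLASS_OF, "ex:AntiInflammatory"),
    ("ex:AntiInflammatory", SUBCLASS_OF, "ex:Drug"),
}

closure = rdfs_closure(facts)
print(("ex:Aspirin", RDF_TYPE, "ex:Drug") in closure)  # True: derived, not stated
```

The answer to “is Aspirin a drug?” is never stated in the data; it only appears once rules are applied, which is the difference between structuring information and reasoning about it. A real system would presumably use an RDF store and a description logic reasoner rather than this naive loop.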

The “hard” problem with the semantic web is not natural language processing, since we don’t really need it at the start. We can always structure the information so that machines can process it and then comprehend it using the aforementioned pan-domain AI. Or, in the case of search queries, we can come up with a query language with proper, consistent rules that the average educated person can easily use, so that the information or query is machine-processable and can be comprehended using domain-specific AI.

The “hard” problem is how all the random people putting information out on the Web can agree on the same ontology per domain and the same information-structuring format, when they do not have the training or knowledge to even understand the ontology or the format.

So both ontology creation/selection and information structuring have to be automated to remove incompatibilities, variances and human errors. But that’s not an easy task as far as the computer science is concerned.

However, instead of hoping to turn the whole Web into a massive pan-domain knowledgebase, which would require that we conquer the aforementioned automation problem, we can base our semantic web model on expert-constructed, domain-specific knowledgebases. These by definition include domain-specific AI, and they have existed for some time now, providing a lot of value in specific domains.

The suggestion I put forward three years ago in Wikipedia 3.0 (which remains the most widely read article on the semantic web, with over 250,000 hits) was to take Wikipedia and its community of experts, estimated at 30,000 peers as of 2006, and get those 30,000 experts to help build the ontologies for all the domains currently covered by Wikipedia, as well as properly format the information already on Wikipedia. A pan-domain knowledgebase could then be built on top of Wikipedia that would be able to reason about information in every domain of knowledge Wikipedia covers, resulting in the ultimate answer machine.

The Wikipedia 3.0 article and some of the links there describe, at a high level, how that can be done, along with some implementation ideas. There is nothing intractable, IMO, except for the leadership problem.

2010 Update:

It’s obviously possible to start small and augment the capabilities as we go. Maybe something like Twitter, where knowledge is shared in literally small packets, would be a good way to go: making tweets machine-comprehensible and letting some kind of intelligence emerge from them, rather than building a pan-domain answer machine, which is a much bigger task, IMO.

But that all depends on when Twitter decides to support Annotations. I hear it’s coming soon, and I can’t wait to see what can happen when tweets become ‘contextualizable’ and intelligence can emerge in a truly evolutionary process, through trial and error, collaboration and competition, and (my favorite) unexpected and novel consequences of random events that end up enriching the process.

Maybe that would turn Twitter into the next Google?

2011 Update:

There is news now that Wikipedia is pursuing the Wikipedia 3.0 vision I outlined in 2006. However, the corruption among the power-tripping, teenage-like administrators at Wikipedia (documented on Slashdot and in this article) has meant that the Wikipedia 3.0 article I wrote in 2006 is not welcome on Wikipedia itself, not even under Semantic Web, Web 3.0 or Wikipedia 3.0 (try adding it yourself and see!), even though it was the article that launched public interest in Wikipedia as a semantic knowledge base and a basis for Web 3.0.

Is that sad? It may be, but it only proves the need for a P2P version of Wikipedia!