Author: Marc Fawzi
This post discusses the significant drawbacks of current quasi-semantic search engines (e.g. hakia.com and ask.com) and examines the potential future intersection of Wikipedia, Wikia Search (the recently announced search engine in development by Wikipedia’s founder), a future semantic version of Wikipedia (aka Wikipedia 3.0), and Google’s PageRank algorithm, to shed some light on how to design a better semantic search engine (aka a Web 3.0 search engine).
Query Side Improvements
Semantic “understanding” of search queries (or questions) determines the quality and relevance of search results (or answers).
However, current quasi-semantic search engines like hakia and ask.com can barely understand users’ queries, because they have chosen free-form natural language as the query format. Reasoning about natural language search queries can be accomplished by: a) Artificial General Intelligence or b) statistical semantic models (which introduce some inaccuracy when constructing internal semantic queries). But a better approach at this early stage may be to guide the user through selecting a domain of knowledge and staying consistent within the semantics of that domain.
The proposed approach implies an interactive search process rather than a one-shot search query. Once the search engine confirms the user’s “search direction,” it can formulate an ontology (on the fly) that specifies the range of concepts the user can supply when formulating the semantic search query. Only a minimal amount of input would be needed to arrive at the desired result (or answer), and the user decides when that point has been reached by declaring “I’ve found it!”
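To make the interactive process more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the toy DOMAINS ontology, the SearchSession class and its method names are hypothetical, and a real engine would build the ontology on the fly rather than read it from a hard-coded table.

```python
from dataclasses import dataclass, field

# Hypothetical toy ontology: domain -> concept -> allowed values.
# A real engine would derive this on the fly once the domain is confirmed.
DOMAINS = {
    "travel": {"destination": ["paris", "tokyo"], "transport": ["train", "flight"]},
    "medicine": {"symptom": ["fever", "cough"], "treatment": ["rest", "antibiotics"]},
}


@dataclass
class SearchSession:
    """One interactive search, confined to a single domain of knowledge."""
    domain: str
    constraints: dict = field(default_factory=dict)

    def open_concepts(self):
        """Concepts in the chosen domain that the user has not constrained yet."""
        return [c for c in DOMAINS[self.domain] if c not in self.constraints]

    def refine(self, concept, value):
        """Accept only terms that the domain ontology defines."""
        allowed = DOMAINS[self.domain].get(concept)
        if allowed is None:
            raise ValueError(f"{concept!r} is not a concept in domain {self.domain!r}")
        if value not in allowed:
            raise ValueError(f"{value!r} is not a known value for {concept!r}")
        self.constraints[concept] = value

    def semantic_query(self):
        """The structured query so far: a sorted list of (concept, value) pairs."""
        return sorted(self.constraints.items())


session = SearchSession("travel")        # the user confirms the search direction
session.refine("destination", "paris")   # one small step of guided input
print(session.open_concepts())           # what the engine could ask about next
print(session.semantic_query())
```

The point of the sketch is that the engine never has to guess at free-form language: every term the user supplies is checked against the semantics of the chosen domain, and the user keeps refining until they find what they want.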
Information Side Improvements
We are beginning to see search engines that claim they can semantic-ize arbitrary unstructured “Wild Wild Web” information. Wikipedia pages, constrained as they are to the Wikipedia knowledge management format, may be easier to semantic-ize on the fly. However, at this early stage, a better approach may be to use human-directed crawling that associates information sources with clearly defined domains/ontologies. An explicit, publicized preference for information sources that have embedded semantic annotations (including a future semantic version of Wikipedia, a la Wikipedia 3.0), using, e.g., RDFa (http://www.w3.org/TR/xhtml-rdfa-primer/) or microformats (http://microformats.org), will lead to improved semantic search.
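As a rough illustration of how a crawler might express such a preference, the Python sketch below measures what share of a page’s elements carry semantic annotations. It uses only the standard library’s html.parser. The function name annotation_ratio, the idea of using a simple ratio as the signal, and the short attribute and class lists are my assumptions; the lists cover only a few well-known RDFa attributes and classic microformat root classes and are far from exhaustive.

```python
from html.parser import HTMLParser

# A few RDFa attributes and classic microformat root class names (not exhaustive).
RDFA_ATTRS = {"about", "property", "typeof", "resource"}
MICROFORMAT_CLASSES = {"vcard", "vevent", "hentry"}


class AnnotationCounter(HTMLParser):
    """Counts elements, and elements carrying RDFa or microformat markup."""

    def __init__(self):
        super().__init__()
        self.elements = 0
        self.annotated = 0

    def handle_starttag(self, tag, attrs):
        self.elements += 1
        names = {name for name, _ in attrs}
        classes = set((dict(attrs).get("class") or "").split())
        if names & RDFA_ATTRS or classes & MICROFORMAT_CLASSES:
            self.annotated += 1


def annotation_ratio(html):
    """Share of elements with semantic annotations (0.0 for an empty page)."""
    parser = AnnotationCounter()
    parser.feed(html)
    parser.close()
    return parser.annotated / parser.elements if parser.elements else 0.0


page = '<div typeof="foaf:Person"><span property="foaf:name">Ann</span><p>hi</p></div>'
print(annotation_ratio(page))
```

A crawler could use a score like this to decide which sources to visit first or more often, which is one way of making the preference for annotated sources explicit.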
How can we adapt the currently successful Google PageRank algorithm (for ranking information sources) to semantic search?
One answer is that we would need to design a ‘ResourceRank’ algorithm (referring to RDF resources) to manage the semantic search engine’s “attention bandwidth.” A less radical option may be to design a ‘FragmentRank’ algorithm, which would rank at the page-component level (e.g. paragraph, image, Wikipedia page section, etc.).
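One possible reading of ‘FragmentRank’, and it is only my assumption, is the textbook PageRank iteration run over a graph whose nodes are page fragments instead of whole pages. The Python sketch below does exactly that. The function name fragment_rank, the "page#fragment" identifiers and the sample link graph are hypothetical, and this is not Google’s actual implementation.

```python
def fragment_rank(links, damping=0.85, iterations=50):
    """Textbook PageRank power iteration over a fragment-level link graph.

    links maps a fragment id to the list of fragment ids it links to.
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    if n == 0:
        return {}
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by fragments with no outgoing links is spread evenly.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical fragments: two sections and a figure caption linking to a definition.
links = {
    "page-a#intro": ["page-b#definition"],
    "page-c#figure-1": ["page-b#definition"],
    "page-b#definition": [],
}
for fragment, score in sorted(fragment_rank(links).items(), key=lambda kv: -kv[1]):
    print(f"{fragment}: {score:.3f}")
```

In this toy graph the definition fragment ends up with the highest score because both other fragments point to it. The same loop would work for a ‘ResourceRank’ if the nodes were RDF resources and the edges were the statements connecting them, though that mapping is again an assumption on my part.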
- See relevant links under comments