2.0

Archive for the ‘Uncategorized’ Category

A Better Way To Price iPhone Apps (and mp3s)

In Uncategorized on December 18, 2009 at 6:30 pm

Author: Marc Fawzi

License: Attribution-NonCommercial-ShareAlike 3.0

If the price of an app were demand-indexed, starting at some arbitrary price near $0 when the app is launched and then going up and down with demand, that would have interesting consequences.

Obviously, how that is tuned varies from app to app. For example, for a given class of apps, the demand-indexed price might be $0.50 at 1,000 downloads a day and $0.99 at 10,000 downloads a day. Many factors would go into deciding that curve (1), but the point is that the price should change with the rate of demand, just like the price of scarce goods. However, unlike scarce goods, where there is the concept of an optimal [market] price at which the most profit is generated, the demand-indexed price would be optimal within the entire range of ‘FREE to CHEAP.’

This way, if a developer has a great app with great potential, they get the most adoption upfront, helped by the near-free price, and as the market for that app heats up they get to enjoy higher profits from a higher price.

I think Apple got the idea of $0.99 for music singles from the publishing business, where the price of, e.g., a music CD or a book is fixed and does not go up and down with demand.

The assumption there is that a book or music CD can be replicated infinitely at a fixed cost per unit, so why slow down sales with a higher price if demand is shooting up?

However, when we’re talking about an mp3 or an .app, the cost of replication is so negligible that pricing it at $0.01 still produces a profit (after the initial sunk cost of development/creation, and assuming no recurring costs like cloud usage fees or bandwidth exist other than those paid for by Apple and factored into their model). So increasing the price from $0.10 to $0.25 will NOT slow down sales as demand rises, because people are willing to pay ANYTHING between FREE and CHEAP for something they think is good, and the perception of how good the app is increases with demand for that app (as that leads to more chatter among connected consumers and more hype in the press).

So all one has to do is figure out what is “CHEAP” for the given class of app (via a user survey), then introduce the app at, e.g., $0.01 and change the price daily (or even in real time) while keeping it less than or equal to CHEAP.
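To make the idea concrete, here is a minimal sketch of a demand-indexed price function. It uses the two example points from above ($0.50 at 1,000 downloads a day, $0.99 at 10,000) and adds a hypothetical launch point of $0.01 at 1 download a day. Interpolating in the logarithm of demand is purely an assumption for illustration; the real curve would be tuned per class of app.

```python
import math

# (downloads per day, price in dollars). The last two points are the example
# from the post; the first is a hypothetical launch point added for illustration.
ANCHORS = [(1, 0.01), (1_000, 0.50), (10_000, 0.99)]
FLOOR = 0.01   # assumed near-free launch price
CHEAP = 0.99   # assumed ceiling for this class of app (e.g. from a user survey)


def demand_indexed_price(downloads_per_day: float) -> float:
    """Return a price that moves with daily demand and never exceeds CHEAP."""
    if downloads_per_day <= ANCHORS[0][0]:
        return FLOOR
    if downloads_per_day >= ANCHORS[-1][0]:
        return CHEAP
    x = math.log10(downloads_per_day)
    # Find the segment containing the current demand and interpolate linearly
    # in log10(demand). The log scale is an assumption, not part of the post.
    for (d0, p0), (d1, p1) in zip(ANCHORS, ANCHORS[1:]):
        if d0 <= downloads_per_day <= d1:
            x0, x1 = math.log10(d0), math.log10(d1)
            price = p0 + (p1 - p0) * (x - x0) / (x1 - x0)
            return round(min(max(price, FLOOR), CHEAP), 2)
    return CHEAP


for demand in (0, 100, 1_000, 10_000, 1_000_000):
    print(demand, demand_indexed_price(demand))
```

Re-running the function once a day (or on every purchase) with the latest download rate gives the daily or real-time repricing described above, and the clamp keeps the result within FREE to CHEAP.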

There are a couple of key considerations to take into account when attempting this model. They have to do with the nature of demand in the long tail market for content.

Wikipedia 3.0 (3.14 Years Later)

In Uncategorized on August 11, 2009 at 4:51 pm

Author: Marc Fawzi

License: Attribution-NonCommercial-ShareAlike 3.0

I’ve just received a couple of questions from a contributor to an IT publication who is writing about the state of the semantic web.

I’m taking the liberty of posting one of the questions I received along with my response.

<<

[The Semantic Web has seen] several years of development. For instance, there have been steps to create Syntax (e.g., OWL & OWL 2, etc.) and other standards. Various companies and organizations have cropped up to begin the epic work of getting ‘the semantic web’ underway.

How would you characterize where we are now with the web? Is Web 3.0 mostly hype?
>>

RDF, which emerged out of the work being done on the semantic web, is now being used to structure data for better presentation in the browser. It’s being used by Google and Yahoo. So you can say that the semantic web is starting to bear some fruit. But unlike OWL-DL, RDF does not have the structure to implement a logic model for a given domain of knowledge, which machines require in order to reason about the information published under that domain.

However, RDF and RDFa (and other variations) are perfect for structuring the information itself (as opposed to the logic model for the given domain of knowledge). So the next step will be to use RDF to structure information for machine processing, not just for browser presentation. That would be combined with domain-specific inference engines (which, in this case, would combine logic programs and description logic for the various knowledge domains) to build a pan-domain, basic-AI-enabled “answer machine,” which is fundamental to any attempt to make machines ‘comprehend’ the information on the Web, per the full-blown semantic web vision.

The “hard” problem with the semantic web is not natural language processing, since we don’t really need it right at the start. We can always structure the information in such a way that it can be processed by machines and then comprehended using the aforementioned pan-domain AI. Or, in the case of search queries, we can come up with a query language with proper and consistent rules that is easy for the average educated person to use, such that the information/query is machine-processable and may be comprehended using domain-specific AI.

The “hard” problem is how can all the random people putting out the information on the Web agree to the same ontology per domain and same information structuring format when they do not have the training or knowledge to even understand the ontology and the information structuring format?

So both ontology creation/selection and information structuring have to be automated to remove incompatibilities/variances and human errors. But that’s not an easy task as far as the computer science involved is concerned.

However, instead of hoping to turn the whole web into a massive pan-domain knowledgebase, which would require that we conquer the aforementioned automation problem, we can base our semantic web model on expert-constructed domain-specific knowledgebases, which by definition include domain specific AI, and which have been in existence for some time now, providing a lot of value in specific domains.

The suggestion I had put forward three years ago in Wikipedia 3.0 (which remains the most widely read article on the semantic web, with over 250,000 hits) was to take Wikipedia and its set of experts, who were estimated at 30,000 peers as of 2006, and get those 30,000 experts to help build the ontologies for all domains currently covered by Wikipedia, as well as properly format the information that is already on Wikipedia, so that a pan-domain knowledgebase can be built on top of Wikipedia. That knowledgebase would be able to reason about information in all domains of knowledge covered by Wikipedia, resulting in the ultimate answer machine.

The Wikipedia 3.0 article and some of the links there describe how that can be done at a high level, as well as some implementation ideas. There is nothing “hard” there except the leadership problem.

It’s clear that the leadership problem has not been tackled, especially having seen how Jimmy Wales started and quit a VC funded search startup after I had penned the article. Taking VC money (and being part of that model) is not the right way to do it. The right way is to carry it out with community support as a non-profit venture, not supported by VC funds. That’s partly because any such startup would require the free labor of those 30,000 Wikipedia contributors in their roles as domain-specific experts. So it just wasn’t going to work out to expect those contributors to work for free so that Jimmy Wales can make a fortune.

P2P Energy Economy (R3.00.00)

In Uncategorized on October 21, 2008 at 7:18 pm

The P2P Energy Economy fuses the latest advances in SmartGrid technology, P2P trading and lending, and P2P energy production (from renewables) into an abundance-sustaining economy, including a new kind of currency designed to work with a small but growing category of goods and services that can be produced on an abundant basis.

For full model: http://p2pfoundation.net/P2P_Energy_Economy

P2P Energy Production (Smart Grid) and P2P Web

In Uncategorized on September 9, 2008 at 9:49 pm

Author: Marc Fawzi
License: Attribution-NonCommercial-ShareAlike 3.0

~~

In the future, everyone will be an energy producer and consumer. Everyone will produce their own energy and either sell the surplus to others or buy extra wattage from others.

That’s part of the premise and promise of the “smart grid” aka “intelligent utility network” aka “Intergrid.”

See this: http://www.odemagazine.com/doc/56/talkin-bout-my-generation/2

So if everyone can be a producer and consumer of energy then everyone can also be a producer (not just a consumer) of Web infrastructure, starting with people owning Mesh/802.11s-enabled wireless routers and all the way to people owning and renting out P2P-enabled storage, processing power and connectivity.

Where do today’s dominant Web players fit in such a scenario (e.g. Google)?

Answer: nowhere, as far as I can see.

Google is the biggest private consumer of energy. They may also be the biggest producer of energy one day. But I’m betting that such a day won’t come; i.e., that we will move to a P2P (or edge-driven) consumer-producer model, or P2P Economy, and away from the network-centric (or cloud-centric) model.

Related

  1. Towards a World Wide Mesh (WWM)
  2. P2P Energy Economy

People-Hosted “P2P” Version of Wikipedia

In Uncategorized on July 23, 2008 at 1:29 pm

Author: Marc Fawzi
License: Attribution-NonCommercial-ShareAlike 3.0

Wikipedia and Web 3.0

Problem Statement:

The New York Times’ article on Web 3.0 from last year, which is basically a re-wording of the popular Evolving Trends Web 3.0 article that came out five (5) months before it, was accepted into the Wikipedia entry on Web 3.0, while a Wikipedia admin (or zealot) rejected the inclusion of the Evolving Trends article on the basis that it is a blog entry, i.e. insignificant. That is despite the fact that the Evolving Trends article was read by 211,000 people (to date) and quoted by hundreds of people, which probably makes it the most read blog article about the Semantic Web to date.

So it is disturbing that arbitrary rules, which are often applied arbitrarily, can be exploited to put the “privilege-to-dictate-what-qualifies-as-knowledge” ahead of the right of the public to complete, well-rounded and uncensored knowledge.

Why was a copy-cat article about Web 3.0 authored by the New York Times more significant than the original blog article, which preceded the New York Times article, and which was read and quoted by a very significant number of people…?

In other words, what makes the blog a lesser medium than a newspaper, especially one that has had several ethics breaches including plagiarism?

Let’s say that I had agreed to publish the Web 3.0 article in question in a well-respected academic journal that had contacted me (through a contributing editor at Stanford University). Would that have made the ideas any more legitimate?

The proof of quality comes from the relevance of the subject to the people. Today, two years after I blogged “Wikipedia 3.0: The End of Google?” (the first article to coin the term Web 3.0 in connection with the Semantic Web, AI agents and Wikipedia), we can find many Semantic Web startups that are applying semantic technology to Wikipedia. Before the publication of the article, there was not one such startup and no mention of Wikipedia in the context of the Semantic Web, although the Semantic Mediawiki guys were already working in that direction.

Is it by pure coincidence that, after the huge popularity of the Evolving Trends Web 3.0 article, we now have not one or two but several startups and groups working on applying semantic technology to Wikipedia, including PowerSet, a startup that was recently acquired by Microsoft, and, not surprisingly, Wikia, a commercial venture led by Jimmy Wales, Wikipedia’s founder? In addition, when the Evolving Trends Web 3.0 article was published, it became the top blog post on the Google Finance front page (and stayed there for a few months) when people searched for GOOG (Google’s stock symbol). So I’m sure Google’s investors and management did notice it, and it’s not hard at all to think that it also had _some_ influence on Google’s decision to build a Wikipedia competitor, although there is no way to prove it did.

All of this leaves me wondering why the Evolving Trends Web 3.0 article was removed from the Web 3.0 entry in Wikipedia. After all, it was the first article to coin, in a highly publicized manner, the term “Web 3.0” in conjunction with the Semantic Web and Wikipedia.

When the rules are arbitrary, and when they are applied arbitrarily, it’s impossible to tell the reason or the motive behind the reason.

The whole affair is not a single isolated case. It has happened and is happening regularly to many well-known bloggers and authors, yet it has its unique circumstances and unique flavor (or personal experience) in each case.

Thus, based on my experience and the experiences of many others, I feel that there is a real flaw in the governance model of Wikipedia, which needs to be fixed. Otherwise Wikipedia risks being exposed more broadly to the tyranny that comes with the arbitrary dictation of the truth and the rewriting of history in a way that fits the agenda of those with power and influence, who can rewrite history at will by dictating what gets written about any event. In my particular case that event happens to be the highly publicized first coining of the term “Web 3.0” in conjunction with the Semantic Web and Wikipedia itself, which is nowhere to be seen on Wikipedia!

The Solution is P2P:

The best fix, IMO, is to replace Wikipedia with a distributed “P2P”-hosted encyclopedia that allows multiple versions of any given topic, from different authors, which would be rated by the users.

Eventually, or as the second logical step, we would need to apply a democratic model rather than rely on the unwisdom of the crowds. In other words, are the Wikipedia admins elected by the people? No, they’re not! So what must be done in the proposed “people hosted” Wikipedia is to let the people (us, the users) elect representatives who would rate up or down the various versions of a given topic entry submitted by different authors (but who would not be able to delete or bury any of the versions).

See the following for more on building a governance model:

  1. The Unwisdom of Crowds
  2. The Future of Governance

-*-

Wikia and Web 3.0

The hosting of the Semantic Mediawiki, i.e. the Web 3.0 version of Wikipedia’s platform, has been taken over by Wikia, a commercial venture founded by Wikipedia’s own founder Jimmy Wales. This opens up a huge conflict of interest: Wikipedia’s founder is running a commercial venture that takes creative improvements to Wikipedia’s platform, e.g. Semantic Mediawiki, and hosts (with potential to transfer) those improvements on Wikia, his own commercial for-profit venture. This shows poor judgment at best and an explicit conflict of interest at worst. And we’re talking about a key figure in Wikipedia’s governing body.

-*-

New York Times and Web 3.0

Here is the Evolving Trends article that was the first article to coin, in a very publicized manner, the term “Web 3.0” in the context of the Semantic Web, Wikipedia and AI agents:

http://evolvingtrends.wordpress.com/2006/06/26/wikipedia-30-the-end-of-google/

And here is the Web 3.0 article by the New York Times that came five (5) months after the above-mentioned article:

http://www.nytimes.com/2006/11/12/business/12web.html

Related

  1. Wikipedia 3.0: The End of Google?
  2. The Unwisdom of Crowds
  3. The Future of Governance


P2P version of Twitter using Flash 10

In Uncategorized on May 22, 2008 at 3:54 am

A massively scalable, highly redundant version of Twitter can be built using the P2P feature of Flash 10.

For pennies, too.

The devil is always in the details but this is something that can be conquered now, thanks to Adobe.

Put another way, building a massively scalable, highly redundant Twitter clone is not exactly trivial, but it can NOW be done 800 times more easily than by writing your own P2P layer, not to mention having to ask people to download a PC client.

Adobe has just changed the game by making P2P technology accessible to all Flex/Flash developers.

Towards a World Wide Mesh (WWM)

In Uncategorized on March 8, 2008 at 6:34 am

Author: Marc Fawzi
License: Attribution-NonCommercial-ShareAlike 3.0

The One Laptop Per Child (OLPC) project has brought some interest to mesh networking.

In theory, the XO laptop has the ability to form a wireless mesh together with other XO laptops in its vicinity. Each laptop extends the mesh further, like a link in a long chain.

Such mesh technology, when supplemented by signal repeaters, can theoretically cover entire villages. Since villages, towns and cities are connected to each other via the Internet, these meshes come together in one world-wide mesh.

The Web, and the Internet along with it, is constrained (by many economic, political and technical factors) to working within the client-server model. The inability to establish direct communication between applications on different PCs without having to go through difficult and sometimes unreliable paths (think: UDP hole punching, NAT traversal, UPnP, etc.), combined with ISPs’ tendency to throttle and even block P2P traffic, has resulted in an unhealthy environment for P2P applications.

The XO laptop may be the first sign of a global shift from the client-server model of the Web to the peer-to-peer model of wireless mesh technology.

The current Web architecture is bound to evolve over the next few decades as the architecture of global communication moves from the network-centric model to the peer-to-peer model, enabled by wireless mesh technology.

Related

  1. World Wide Mesh

Google Warming Up to the Wikipedia 3.0 vision?

In Uncategorized on December 14, 2007 at 8:09 pm

Google’s “Knol” Reinvents Wikipedia

Posted by CmdrTaco on Friday December 14, @08:31AM
from the only-a-matter-of-time dept.

 

teslatug writes “Google appears to be reinventing Wikipedia with their new product that they call knol (not yet publicly available). In an attempt to gather human knowledge, Google will accept articles from users who will be credited with the article by name. If they want, they can allow ads to appear alongside the content and they will be getting a share of the profits if that’s the case. Other users will be allowed to rate, edit or comment on the articles. The content does not have to be exclusive to Google but no mention is made on any license for it. Is this a better model for free information gathering?”

The article Wikipedia 3.0: The End of Google?, which gives you an idea of why Google would want its own Wikipedia, was on the Google Finance page for at least 3 months whenever anyone looked up the Google stock symbol, so Google employees, investors and executives must have seen it.

Is it a coincidence that Google is building its own Wikipedia now?

The only problem is a flaw in Google’s thinking. People who author those articles on Wikipedia actually have brains. People with brains tend to have principles. Getting paid pennies to build the Google empire is rarely one of those principles.


The World Wide Mesh (WWM)

In Uncategorized on December 7, 2007 at 6:01 am

I’m not sure why Wifi hardware vendors don’t update their firmware so that each Wifi router/bridge sold can communicate with nearby ones.

Who needs Level 3 and Worldcom if we have the ability to connect to each other over the air?

Update

The article below talks about P2P set-top boxes, i.e. the wired version of the World Wide Mesh.

Using set top boxes and Peer-to-Peer technology.

Thought Seeders and Thought Leechers – Updated

In Uncategorized on April 7, 2007 at 2:52 pm

My intention with Evolving Trends was not to see the ideas that are articulated here get adopted by others with no contribution to the debate whatsoever.

But I was reminded today about the true purpose of this blog:

“Don’t worry about people stealing your ideas. If your ideas are any good, you’ll have to ram them down people’s throats.”

– Howard Aiken quoted by Ken Iverson quoted by Jim Horning, 1979

So far, it looks like we did our part by ramming it down Google’s throat: link (and now Google “Knol” )

And we believe that we got Wikipedia’s founder Jimmy Wales to jump on the crowd-enhanced semantic search bandwagon with Wikia, his VC-backed startup: link (also see the Update section of this post re: Jimmy Wales’ conflict of interest)

Here’s the popular Evolving Trends article, Wikipedia 3.0: The End of Google, that preceded both Google Knol and Wikia.

Now, it seems that Hakia may have joined the club, too.

Description of a new service that Hakia just put on their site http://labs.hakia.com/hakia-lab-dial.html

Dialogue Algorithm

The long-term objective of the Dialogue algorithm is to establish a human-like dialogue with the user. The vision is to convert the search engine’s role into a computerized assistant with advanced communication skills while utilizing the largest amount of information resources in the world. The challenge is to analyze search results ranked by the SemanticRank algorithm one step further to determine whether the information can be used to communicate with the user at an elevated level of confidence about its accuracy and credibility. hakia’s conversational system is currently under development. A simplified version of this utility is at the BETA site that allows hakia developers (and the users) to monitor the incremental improvements at every step of the way. An interactive test system is available by invitations which is further advanced than that on the BETA site. You can request access to test this system, or Login using your access code.

Compare this to the Evolving Trends article that preceded their description:

Designing a Better Web 3.0 Search Engine

It’s intriguing to see how ideas spread… but a society where most people are thought leechers (as opposed to thought seeders) is a society that is headed for failure.

It would be great if the broken patent system were replaced with a simple P2P co-creative system that promotes balanced ‘contribution ratios,’ so that everyone contributes to the debate rather than simply copying and co-opting other people’s contributions…

Google vs Web 3.0

In Uncategorized on March 24, 2007 at 11:35 pm

In Web 3.0, he who owns the metadata owns the Web.

User Enhanced Search

With Google Co-Op, Google tried to leverage user-supplied metadata to enhance the accuracy and relevance of Google searches.

Now they’re trying it again with Image Labeler.

But this time they want users to actually use it so they’re making it into a Squirrel Wheel kind of game where you get to play the squirrel.

From their description:

“You’ll be randomly paired with a partner who’s online and using the feature. Over a 90-second period, you and your partner will be shown the same set of images and asked to provide as many labels as possible to describe each image you see. When your label matches your partner’s label, you’ll earn some points and move on to the next image until time runs out. After time expires, you can explore the images you’ve seen and the websites where those images were found. And we’ll show you the points you’ve earned throughout the session.”

You’re better off annotating Wikipedia (using Semantic MediaWiki) and applying your knowledge of a given subject (or domain) to build intelligence into Wikipedia, which is owned by the people (as a non-profit, people funded, people powered encyclopedia.) Why be a squirrel in Google’s squirrel wheel only to have Google abuse your good will?

Update

Google won’t give up. They really do wanna be the Microsoft of the Web.

Here they are trying to copy/co-opt the “Wikipedia 3.0” vision by developing their own Wikipedia:

  1. http://evolvingtrends.wordpress.com/2007/12/14/google-tries-again-to-co-opt-the-wikipedia-30-vision/

Again, the only problem is a flaw in Google’s thinking. People who author those articles on Wikipedia actually have brains. People with brains tend to have principles. Getting paid pennies to build the Google empire is rarely one of those principles.

Related

  1. Wikipedia 3.0: The End of Google?
  2. Google Co-Op: The End of Wikipedia?
  3. Web 3.0 Update
  4. Is Google a monopoly?
  5. Designing a Better Semantic Search Engine*
  6. Web 3.0 (Definition)

The Missing Link

In Uncategorized on February 5, 2007 at 5:28 am

At the time the Wikipedia 3.0: The End of Google? article was written, I didn’t think it necessary to supply external references, since it was just another idea of mine (it came out of the blue one evening) and I had not expected the massive interest it would generate. Lately, however, I’ve been looking at what others have done and I came across this old but relevant paper from 2003, which should provide a more detailed technical context to developers as far as the use of rule-based inference engines and ontologies in the context of Semantic Web + AI (or Web 3.0) is concerned.

  1. Description Logic Programs: Combining Logic Programs with Description Logic

“A key requirement for the Semantic Web’s architecture overall is to be able to layer rules on top of ontologies–in particular to create and reason with rule-bases that mention vocabulary specified by ontology-based knowledge bases–and to do so in a semantically coherent and powerful manner.”
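As a rough illustration of what “layering rules on top of ontologies” can look like, here is a toy sketch. It is not the Description Logic Programs formalism from the paper, just naive forward chaining over RDF-style triples. The facts, the predicate names (`is_a`, `subclass_of`) and the two rules are all made up for the example.

```python
# Facts are (subject, predicate, object) triples, in the spirit of RDF.
# The vocabulary here is hypothetical and stands in for an ontology.
FACTS = {
    ("Felix", "is_a", "Cat"),
    ("Cat", "subclass_of", "Mammal"),
    ("Mammal", "subclass_of", "Animal"),
}


def infer(facts):
    """Apply two rules repeatedly until no new triples can be derived."""
    known = set(facts)
    while True:
        new = set()
        for (a, p1, b) in known:
            for (c, p2, d) in known:
                if b != c:
                    continue
                # Rule 1: subclass_of is transitive.
                if p1 == "subclass_of" and p2 == "subclass_of":
                    new.add((a, "subclass_of", d))
                # Rule 2: an instance of a class is also an instance
                # of every superclass of that class.
                if p1 == "is_a" and p2 == "subclass_of":
                    new.add((a, "is_a", d))
        if new <= known:
            return known
        known |= new


closure = infer(FACTS)
print(("Felix", "is_a", "Animal") in closure)
```

The triples play the role of the structured information, and the rules play the role of the domain logic that lets a machine answer a question (“is Felix an animal?”) that no single published fact states directly.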

Self-Aware Text

In Uncategorized on January 13, 2007 at 4:57 am

Author: Marc Fawzi

License: Attribution-NonCommercial-ShareAlike 3.0

Enabling self-organizing text

Below is a summary of an interesting model that I believe can be used to realize self-organizing text (excerpted from this rather weird but technically sound source):

Spin glasses are materials with chaotically oriented atomic spins which can reach neither a ferromagnetic equilibrium (spins aligned) nor a paramagnetic one (spins canceling in pairs), because of long-range spin interactions between magnetic trace atoms (Fe) and the conduction electrons of the host material (Cu). Because these effects reverse repeatedly with distance, no simple state fully resolves the dynamics, and spin glasses thus adopt a large variety of [globally] disordered states [with short range order.] Modeling the transition to a spin glass [i.e. simulated annealing] has close parallels in neural nets, particularly the Hopfield nets consisting of symmetrically unstable circuits. Optimization of a task is then modeled in terms of constrained minimization of a potential energy function. However the problem of determining the global minimum among all the local minima in a system with a large number of degrees of freedom is intrinsically difficult. Spin glasses are also chaotic and display sensitive dependence. Similar dynamics occurs in simulations of continuous fields of neurons.

Annealing is a thermodynamic simulation of a spin glass in which the temperature of random fluctuations is slowly lowered, allowing individual dynamic trajectories to have a good probability of finding quasi-optimal states. Suppose we start out at an arbitrary initial state of a system and follow the topography into the nearest valley, reaching a local minimum. If a random fluctuation now provides sufficient energy to carry the state past an adjacent saddle, the trajectory can explore further potential minima. Modeling such processes requires the inclusion of a controlled level of randomness in local dynamical states, something which in classical computing would be regarded as a leaky, entropic process. The open environment is notorious as a source of such [controlled level of randomness], which may have encouraged the use of chaotic systems in the evolutionary development of the vertebrate brain.

—End of Excerpted Model Summary—

Imagine the interaction between random words in the English language having two properties: aligned and non-aligned. If you throw the whole set of words into a heated spin-glass alloy (e.g. Cu-Fe), where the words replace the atoms and where word-word interactions replace spin-spin interactions, and then let it cool slowly (i.e. anneal it), then the system (of word-word interactions) should theoretically self-organize into the lowest potential energy state it could find.

The spin glass model (from the above quoted summary) implements an optimization process that is also a self-organizing process. It finds the local energy minima associated with a quasi-optimal state for the system, which in turn organizes the local interactions between atomic spins (or words) to minimize discordant interactions (or disorder) in the short range. In the case of word-word interactions, this generates text that is garbage in the long range (as a result of globally disordered interactions) but well-formed in the short range (as a result of mostly aligned/ordered interactions).
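Here is a toy sketch of that annealing process applied to words. It is a plain Metropolis-style simulated annealing over word orderings, not the SK spin-glass model itself. The word list, the `coupling` table of aligned pairs, the cooling schedule and all parameter values are assumptions chosen only to make the example small.

```python
import math
import random


def energy(seq, coupling):
    """Short-range energy: aligned neighbours lower it, non-aligned ones raise it."""
    return -sum(coupling.get((a, b), -1) for a, b in zip(seq, seq[1:]))


def anneal(words, coupling, steps=5000, t_start=2.0, t_end=0.05, seed=0):
    """Slowly cool a random ordering of words and return the best ordering seen."""
    rng = random.Random(seed)
    state = list(words)
    rng.shuffle(state)
    current = energy(state, coupling)
    best, best_energy = list(state), current
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        i, j = rng.sample(range(len(state)), 2)
        state[i], state[j] = state[j], state[i]
        candidate = energy(state, coupling)
        delta = candidate - current
        # Always accept improvements; accept worse states with a probability
        # that shrinks as the temperature drops (the "controlled randomness").
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current = candidate
            if current < best_energy:
                best, best_energy = list(state), current
        else:
            state[i], state[j] = state[j], state[i]  # rejected: undo the swap
    return best, best_energy


WORDS = ["the", "cat", "sat", "on", "a", "mat"]
# Hypothetical "aligned" word pairs; every other pair counts as non-aligned.
COUPLING = {pair: 1 for pair in [("the", "cat"), ("cat", "sat"),
                                 ("sat", "on"), ("on", "a"), ("a", "mat")]}

best, best_energy = anneal(WORDS, COUPLING)
print(" ".join(best), best_energy)
```

With a real vocabulary the coupling table would have to come from somewhere (for instance, statistics over existing text), and the energy would only reward short-range order, which matches the expectation of locally well-formed but globally disordered output.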

This idea is pretty raw, incomplete, and may not be the most proper use (or abuse) of the spin glass model (see References.)

However, in line with evolution’s preference for such a model for the brain, I find it useful to inject a controlled level of noise (randomness) into the thinking process.

Well, after having some apple crumble, I realize now (randomness works) that the reason this model will work well is that it will generate many well-formed sentences in each region of the state space (see image below), so there is bound to be a percentage of sentences that will actually make sense!

Having said that, this interpretation of the [SK] spin-glass model is pretty rough and needs more thinking to nail down, but the basic idea is good!

From Self-Organizing to Self Aware

What if, instead of simply setting the rules and letting order emerge out of chaos (at least in the short range), as implied above, each word were an intelligent entity? What if each word knew how to fit itself with other words and within a sentence, such that the words work collaboratively and competitively with each other to generate well-formed sentences and even whole articles?

The words would have to learn to read. :)

[insert your Web X.0 fantasy]

Reference

  1. Spin Glass Theory and Beyond

Images

Short range ordered regions in 2D state space of a spin glass.

Tags:

web 3.0, semantic web, artificial intelligence, AI, statistical mechanics, stochastic, optimization, simulated-annealing, self-organization, spin glass

Designing a better Web 3.0 search engine

In Uncategorized on January 7, 2007 at 7:09 pm

Author: Marc Fawzi

License: Attribution-NonCommercial-ShareAlike 3.0

This post discusses the significant drawbacks of current quasi-semantic search engines (e.g. hakia.com, ask.com et al) and examines the potential future intersection of Wikipedia, Wikia Search (the recently announced search-engine-in-development, by Wikipedia’s founder), a future semantic version of Wikipedia (aka Wikipedia 3.0), and Google’s Pagerank algorithm to shed some light on how to design a better semantic search engine (aka Web 3.0 search engine).

Query Side Improvements

Semantic “understanding” of search queries (or questions) determines the quality of relevant search results (or answers.)

However, current quasi-semantic search engines like hakia and ask.com can barely understand the user’s queries and that is because they’ve chosen free-form natural language as the query format. Reasoning about natural language search queries can be accomplished by: a) Artificial General Intelligence or b) statistical semantic models (which introduce an amount of inaccuracy in constructing internal semantic queries). But a better approach at this early stage may be to guide the user through selecting a domain of knowledge and staying consistent within the semantics of that domain.

The proposed approach implies an interactive search process rather than a one-shot search query. Once the search engine confirms the user’s “search direction,” it can formulate an ontology (on the fly) that specifies a range of concepts that the user could supply in formulating the semantic search query. There would be a minimal amount of input needed to arrive at the desired result (or answer), determined by the user when they declare “I’ve found it!”

Information Side Improvements

We are beginning to see search engines that claim they can semantic-ize arbitrary unstructured “Wild Wild Web” information. Wikipedia pages, constrained to the Wikipedia knowledge management format, may be easier to semantic-ize on the fly. However, at this early stage, a better approach may be to use human-directed crawling that associates the information sources with clearly defined domains/ontologies. An explicit publicized preference for those information sources (including a future semantic version of Wikipedia, a la Wikipedia 3.0) that have embedded semantic annotations (using, e.g., RDFa http://www.w3.org/TR/xhtml-rdfa-primer/ or microformats http://microformats.org) will lead to improved semantic search.

How can we adapt the currently successful Google PageRank algorithm (for ranking information sources) to semantic search?

One answer is that we would need to design a ‘ResourceRank’ algorithm (referring to RDF resources) to manage the semantic search engines’ “attention bandwidth.” A less radical option may be to design a ‘FragmentRank’ algorithm, which would rank at the page-component level (e.g. paragraph, image, Wikipedia page section, etc).
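As a rough illustration of what ‘FragmentRank’ could look like, the sketch below runs the standard PageRank power iteration over a graph whose nodes are page fragments instead of whole pages. This is only one possible reading of the idea: the function name `fragment_rank`, the fragment identifiers and the link data are all made up for the example, and each fragment is assumed to list a given target at most once.

```python
def fragment_rank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank over a graph whose nodes are page fragments.

    links maps each fragment id to the list of fragment ids it links to.
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by fragments with no outgoing links is spread evenly.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        new_rank = {}
        for node in nodes:
            incoming = sum(
                rank[src] / len(targets)
                for src, targets in links.items()
                if node in targets
            )
            new_rank[node] = (1 - damping) / n + damping * (incoming + dangling / n)
        rank = new_rank
    return rank


# Hypothetical fragment-level link graph (page#section identifiers).
links = {
    "wiki/RDF#intro": ["wiki/OWL#intro"],
    "wiki/OWL#intro": ["wiki/RDF#intro", "wiki/RDF#syntax"],
    "wiki/RDF#syntax": ["wiki/RDF#intro"],
}
ranks = fragment_rank(links)
for fragment, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{fragment}: {score:.3f}")
```

The only change from page-level ranking is the granularity of the nodes; a ‘ResourceRank’ would apply the same iteration to RDF resources and the statements that connect them.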

Related

  1. Wikipedia 3.0: The End of Google?
  2. Search By Meaning

Update

  1. See relevant links under comments

Tags:

web 3.0, semantic web, ontology, reasoning, artificial intelligence, AI, hakia, ask.com, pagerank, google, semantic search, RDFa, ResourceRank, RDF, Semantic Mediawiki, Microformats

Who 2.0: Update

In Uncategorized on December 20, 2006 at 7:40 am

First see:

Then read this (coming to a cell phone/search engine near you):

P2P to Disrupt eBay, Google, Yahoo

In Uncategorized on December 14, 2006 at 1:35 am

Forget Web 3.0 and P2P 3.0 for a second…

Non-Semantic, P2P-enabled versions of Google, eBay, Yahoo et al. can shift power to the user, especially since P2P eliminates the need for massive corporations to spend massive amounts of money on server farms to process the users’ transactions. Each user provides a piece of the server farm. The Web’s reported 2 billion users mean that a P2P version of Google can run on a server farm of at least 200 million servers. That is significantly more servers than Google can ever have.

So you wanna take on Google, eBay, Yahoo et al…? Then build a P2P version of any of their core services.

Web 3.0: Update

In Uncategorized on November 19, 2006 at 2:21 pm

As a key step in enabling the Web 3.0 vision (which is not to be exclusively associated with Wikipedia), startups and researchers are developing tools and processes to let domain experts with no knowledge of ontology construction build formal ontologies in a manner that is transparent to them, i.e. without them realizing that they’re building one. Such tools and processes are emerging from research organizations and Web 3.0 ventures.

This cripples the argument that domain specific ontologies can only be created by Semantic Web experts. Expert knowledge of your particular domain (or profession) is the only thing you’ll need to be part of the revolution.

Related

  1. Wikipedia 3.0: The End of Google?

Tags:

Web 3.0, Semantic Web, AI

P2P 3.0: Shaking the Web to its Roots

In Uncategorized on October 8, 2006 at 5:02 pm

The application of P2P Search (semantic and non-semantic) to the Web will involve transforming the Web from the existing browser-server model to a browser-as-both-client-and-server model. This is different from P2P file sharing in the sense that it is about producing and consuming information (or meaning, in the case of the P2P semantic Web model) rather than sharing digital data. Thus, P2P Search and P2P Semantic Search in particular may be viewed as being part of the Web 3.0 set of technologies.

[The abstract above hints at a "decentralized knowledge base" version of the previous model implied in the People's Google post]

Related

  1. Get Your DBin

Tags:

p2p search, p2p Web, Web 3.0

Google Co-Op: The End of Wikipedia?

In Uncategorized on September 24, 2006 at 11:11 am

Did Google get the idea of using [subject matter] experts to structure [and give meaning to] information for improved findability from the Wikipedia 3.0 article? Or as a pure and natural evolution of their thinking? Or both?

In case you’re taking it literally, both the question and the title of the post are purely rhetorical at this point, now that Google seems to be on its way to adopting (or co-opting) the Web 3.0 vision.

Here is an excerpt from the newly announced Google Co-Op experiment:

Google Co-op gives you a way to improve search in the topics you know best. If you’re a doctor, for instance, with specific expertise in a particular disease, you can contribute by using the labels in the health topic to annotate all the webpages that you know provide useful, reliable information about that disease. Your patients and other Google users could then subscribe to you and benefit from your expertise.

You can participate in a number of topics that are already being worked on, such as health, destination guides, autos, computer & video games, photo & video equipment, and stereo & home theater. Or, if you’re passionate about something entirely different, go ahead and start a topic of your own. In this guide, we will walk you through an example of how to label webpages for an existing topic as well as how to create a topic of your own.

Google Co-Op has the potential in the future to follow the vision articulated in the Wikipedia 3.0 article as Google adds Web 3.0 capabilities to its search engine.

Update

  1. Google Tries Again to Co-opt the Wikipedia 3.0 Vision

Related

  1. Wikipedia 3.0: The End of Google?

Sources

  1. Web 2.0 or Web 3.0?

Tags:

Semantic Web, Web standards, Trends, OWL, innovation, Startup, Evolution, Google, GData, inference, inference engine, AI, ontology, Semanticweb, Web 2.0, Web 3.0, Google Base, artificial intelligence, Wikipedia, Wikipedia 3.0, collective consciousness, Ontoworld, Wikipedia AI, Info Agent, Semantic MediaWiki, DBin, P2P 3.0, P2P AI, P2P Semantic Web inference Engine, Global Brain, semantic blog, intelligent findability, Google Co-Op

Wisdom Addition Machine – Updated

In Uncategorized on September 3, 2006 at 11:18 am

Updated on November 12, 2008

This is a reblogging of my email response on September 3, 2006 to Michel Bauwens, founder of The Foundation for P2P Alternatives, and other participants in the dialog.

—————————————————————————————————-

The statement from Michel’s interview that captures the difference in our thinking is:


Michel stated:

“The difference today, of course, is that we are no longer waiting for great leaders, since it is the collective intelligence of humankind that needs to rise to the level of its global challenges.”

While I agree with the ideal, I don’t see how we can ignore the fact that the crowd will always be less intelligent and less capable than its most intelligent and capable member.

I don’t yet see the conceptual scheme that would allow the individual wisdom of individuals in a crowd to be added up or multiplied rather than replaced with averaged judgment (not wisdom) or the lowest common denominator opinion.

Unless we can come up with a wisdom addition machine, we still need to rely on individuals whose intellect, wisdom, taste, and ability exceed that of the crowd (i.e. exceed the average) to lead the crowd.

The question is: can wisdom be calculated? My answer is no, and the same goes for beauty, trust, friendship, love, goodness and all such concepts.

Related

  1. The Unwisdom of Crowds

Tags:

Trends, wisdom of crowds, mass psychology, cult psychology, Web 2.0, digg, censorship, democracy, P2P, P2P 2.0, social bookmarking

P2P DNS for Firefox

In Uncategorized on September 2, 2006 at 5:57 am

In general, you can take any database-centric system and produce its disruptive counter-part using P2P technology.

If browsers like Firefox were to implement a P2P DNS plugin, then who needs Verisign (or GoDaddy, for that matter)?!

If someone were to write a Firefox extension that sets up a P2P DNS system wherein each browser would map the topologically close set of IPv4/IPv6 addresses to a set of domain names, and where each Firefox DNS extension can query all other Firefox DNS extensions running in the network with millisecond latency, then, in theory, we could have a P2P domain name system (P2P DNS) that removes our dependency on the government-controlled 13-server central DNS tree.

Obviously, the trick here is in having a scalable P2P database technology where each node in the network can resolve a domain name to an IP address by virtually (not physically) querying all other nodes in the network (until the query resolves) within seconds on the first attempt and milliseconds on subsequent attempts. I believe that this technology either already exists or that the latest, greatest innovations in P2P database technology may be used to implement it. If not, then it’s just a matter of time before such disruptive technology emerges.
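One existing family of techniques that fits this description is the distributed hash table, where “virtually querying all nodes” becomes “hash the name and ask the one peer responsible for it.” The sketch below is a toy, single-process model of that idea using consistent hashing plus a per-peer cache. The `Peer` and `Ring` classes and the sample address are assumptions for illustration only; a real system would route between peers over the network and would have to deal with trust, churn and spoofed records, none of which is modeled here.

```python
import hashlib
from bisect import bisect_left


def key_of(text):
    """Map a string to a position on the hash ring."""
    return int(hashlib.sha1(text.encode("utf-8")).hexdigest(), 16)


class Peer:
    def __init__(self, name):
        self.name = name
        self.records = {}  # domain names this peer is responsible for
        self.cache = {}    # answers remembered from earlier lookups


class Ring:
    """All peers, ordered by hash position (consistent hashing)."""

    def __init__(self, peers):
        self.peers = sorted(peers, key=lambda p: key_of(p.name))
        self.positions = [key_of(p.name) for p in self.peers]

    def owner(self, domain):
        # The first peer at or after the domain's hash, wrapping around the ring.
        index = bisect_left(self.positions, key_of(domain)) % len(self.peers)
        return self.peers[index]

    def register(self, domain, address):
        self.owner(domain).records[domain] = address

    def resolve(self, asking_peer, domain):
        if domain in asking_peer.cache:
            return asking_peer.cache[domain]  # fast path on repeat lookups
        address = self.owner(domain).records.get(domain)
        if address is not None:
            asking_peer.cache[domain] = address
        return address


peers = [Peer(f"peer-{i}") for i in range(8)]
ring = Ring(peers)
ring.register("example.org", "192.0.2.10")
print(ring.resolve(peers[0], "example.org"))  # first lookup goes to the owning peer
print(ring.resolve(peers[0], "example.org"))  # second lookup is served from the cache
```

The cache is what would give the “seconds on the first attempt, milliseconds on subsequent attempts” behavior described above.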

Such P2P DNS would apply to email, too, as long as it’s done from the browser (or by having such P2P DNS service loaded by the OS.)

Tags:

DNS, P2P, Verisign, Firefox, Firefox Extension, P2P DNS

Catch up, Lead, Conquer

In Uncategorized on August 24, 2006 at 3:00 am

I’m noticing that most visionary folks don’t have an overall perspective of their mission.

Many try to lead before they catch up to the competition.

Many try to conquer before they display successful leadership.

It’s one thing to talk of your vision and another thing to conquer.

Catch up before you lead, and lead before you conquer.

One of the greatest points behind the idea of a free market is that it allows for this step-wise rise to the top.

Destroy to Rule vs Rule to Destroy

In Uncategorized on August 16, 2006 at 7:58 pm

Disruptors prefer to destroy a competitor’s model in order to rule.

Non-Disruptors prefer to support the status quo and the idea of incremental change and in doing so they actually destroy the future.

That’s why it’s necessary to be disruptive.

Disrupt the competition, hit a high note and get out while you’re at the top of your game.

Obviously, that requires good timing, great execution and the ability to play smart, not just hard.

Use the FOAF, Luke.

In Uncategorized on August 11, 2006 at 11:20 pm

You’re probably thinking what a thmart title!

For those of you who don’t know, FOAF stands for Friend of a Friend. It is a social network description format based on RDF.

If people were to have their FOAFs encoded on RFID tags (along with a User ID) and placed inside their watch (think Casio G-Shock), then whenever FOAF-watch-wearing users come within the coverage area of an RFID scanner network (in the real world), the network would detect whether they have a 1st-, 2nd- or 3rd-degree friend in common and transmit a signal to their watches, which would in turn beep and display the name of the party in common as well as the name of the second party and their location relative to a given node within the RFID scanner network.
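The detection step is a small graph search. Here is a minimal sketch, assuming the scanner network can look up each wearer’s friend list by User ID; the `foaf` dictionary and the function names are hypothetical, and a real deployment would parse actual FOAF/RDF data instead of a Python dict.

```python
from collections import deque


def within_degrees(graph, start, max_degree=3):
    """Breadth-first search: map each person reachable from start to their degree."""
    degree = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if degree[person] == max_degree:
            continue
        for friend in graph.get(person, ()):
            if friend not in degree:
                degree[friend] = degree[person] + 1
                queue.append(friend)
    del degree[start]
    return degree


def common_contacts(graph, a, b, max_degree=3):
    """People (other than a and b) within max_degree of both wearers, closest first."""
    near_a = within_degrees(graph, a, max_degree)
    near_b = within_degrees(graph, b, max_degree)
    shared = (set(near_a) & set(near_b)) - {a, b}
    return sorted(shared, key=lambda p: (near_a[p] + near_b[p], p))


# Hypothetical friend lists keyed by User ID.
foaf = {
    "alice": ["carol"],
    "bob": ["dave"],
    "carol": ["alice", "dave"],
    "dave": ["bob", "carol"],
}
print(common_contacts(foaf, "alice", "bob"))  # ['carol', 'dave']
```

The first name in the result would be the “party in common” to show on both watches.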

This would be cool to have at geek/IT conferences.

I’m very busy nowadays but I promise to write every now and then.

Tags:

FOAF, RDF, social network, tagging

Search By Meaning

In Uncategorized on July 29, 2006 at 8:28 am

I’ve been working on a detailed technical scheme for a “search by meaning” search engine (as opposed to [dumb] Google-like search by keyword) and I have to say that in conquering the workability challenge in my limited scope I can see the huge problem facing Google and other Web search engines in transitioning to a “search by meaning” model.

Related

  1. Wikipedia 3.0: The End of Google?
  2. P2P 3.0: The People’s Google
  3. Intelligence (Not Content) is King in Web 3.0
  4. Web 3.0 Blog Application
  5. Towards Intelligent Findability
  6. All About Web 3.0

Tags:

Semantic Web, Web standards, Trends, OWL, innovation, Startup, Evolution, Google, inference engine, Web 2.0, Web 3.0, AI, Wikipedia, Wikipedia 3.0, Info Agent, Semantic MediaWiki, DBin, P2P 3.0, P2P AI, P2P Semantic Web inference Engine, intelligent findability, search by meaning

The Geek VC Fund Project: 7/26 Update

In Uncategorized on July 25, 2006 at 10:55 pm

This post is an update to the original post about the Geek-Run, Geek-Funded Venture Capital Fund.

  1. The idea has evolved by leaps and bounds.
  2. First project to be funded within 4-5 months.
  3. The framework will be socialized with the public at large when we have the first fruits of our labor.

Tags:

Web 2.0, venture capital, VC, entrepreneur, funding, private equity, geek, seed funding, early stage, Startup

WordPress Logo Competition

In Uncategorized on July 20, 2006 at 10:49 pm

I’d like to call your attention to the WordPress logo competition going on in the Ideas forums on WordPress.com.

Here is my entry for all the w^P fans:

Stolen art. That’s for sure.

Beats

  1. Red Star Over Qubah

Google dont like Web 3.0 [sic]

In Uncategorized on July 20, 2006 at 11:40 am

Why am I not surprised?

Google exec challenges Berners-Lee

The idea is that the Semantic Web will allow people to run AI-enabled P2P Search Engines that will collectively be more powerful than Google can ever be, which will relegate Google to just another source of information, especially as Wikipedia [not Google] is positioned to lead the creation of domain-specific ontologies, which are the foundation for machine-reasoning [about information] in the Semantic Web.

Additionally, we could see content producers (including bloggers) creating informal ontologies on top of the information they produce using a standard language like RDF. This would have the same effect as far as P2P AI Search Engines and Google’s anticipated slide into the commodity layer are concerned (unless of course they develop something like GWorld).

In summary, any attempt to arrive at widely adopted Semantic Web standards would significantly lower the value of Google’s investment in the current non-semantic Web by commoditizing “findability” and allowing for intelligent info agents to be built that could collaborate with each other to find answers more effectively than the current version of Google, using “search by meaning” as opposed to “search by keyword”, as well as more cost-efficiently than any future AI-enabled version of Google, using disruptive P2P AI technology.

For more information, see the articles below.

Related

  1. Wikipedia 3.0: The End of Google?
  2. All About Web 3.0
  3. P2P 3.0: The People’s Google
  4. Intelligence (Not Content) is King in Web 3.0
  5. Web 3.0 Blog Application
  6. Towards Intelligent Findability
  7. Why Net Neutrality is Good for Web 3.0

Somewhat Related

  1. Is Google a Monopoly?

Tags:

Semantic Web, Web standards, Trends, OWL, Google, inference engine, AI, Web 2.0, Web 3.0, Wikipedia, Wikipedia 3.0, Info Agent, Semantic MediaWiki, DBin, P2P 3.0, P2P AI, P2P Semantic Web inference Engine, semantic blog, intelligent findability, RDF

Towards Intelligent Findability

In Uncategorized on July 19, 2006 at 9:09 am

A lot of buzz about Web 3.0 and Wikipedia 3.0 has been generated lately from this blog, so I’ve decided that for my guest post here I’d like to dive into this idea and take a look at how we’d build a Semantic Content Management System (CMS).

Objective

We want a CMS capable of building a knowledge base (that is a set of domain-specific ontologies) with formal deductive reasoning capabilities.

Requirements

  1. A semantic CMS framework.
  2. An ontology API.
  3. An inference engine.
  4. A framework for building info-agents.

HOW-TO

The general idea would be something like this:

  1. Users use a semantic CMS like Semantic MediaWiki to enter information as well as semantic annotations (to establish semantic links between concepts in the given domain on top of the content). This typically produces an informal ontology on top of the information, which, when combined with domain inference rules and the query structures (for the particular schema) that are implemented in an independent info agent or built into the CMS, would give us a Domain Knowledge Database. (Alternatively, we can have users enter information into a non-semantic CMS to create content based on a given doctype or content schema and then front-end it with an info agent that works with a formal ontology of the given domain. We would then need to perform natural language processing, including using statistical semantic models, since we would lose the certainty that would normally be provided by the semantic annotations that, in a Semantic CMS, would break down the natural language in the information to a definite semantic structure.)
  2. Another set of info agents adds to our knowledge base inferencing-based querying services for information on the Web or other domain-specific databases. User-entered information plus information obtained from the Web makes up our Global Knowledge Database.
  3. We provide a Web-based interface for querying the inference engine.

Each doctype or schema (depending on the CMS of your choice) will have a more or less direct correspondence with our ontologies (i.e. one schema or doctype maps to one ontology). The sum of all the content of a particular schema makes up a knowledge-domain which, when transformed into a semantic language like RDF (or more specifically OWL) and combined with the domain inference rules and the query structures (for the particular schema), constitutes our knowledge database. The choice of CMS is not relevant as long as you can query its contents while being able to define schemas. What is important is the need for an API to access the ontology. Luckily, projects like JENA fill this void perfectly, providing both an RDF and an OWL API for Java.

In addition, we may want an agent to add to or complete our knowledge base using available Web Services (WS). I’ll assume you’re familiar with WS so I won’t go into details.

Now, the inference engine would seem like a very hard part. It is. But not for lack of existing technology: the W3C already has a specification for a language for querying RDF (viz. a semantic language) known as SPARQL (http://www.w3.org/TR/rdf-sparql-query/) and JENA already has a SPARQL query engine.
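To show how the pieces fit together (annotations as triples, a domain inference rule, and a query structure), here is a deliberately tiny sketch in plain Python. It is not JENA and it is not SPARQL: the functions `infer_transitive` and `match` and the sample triples are assumptions made for illustration, standing in for what an RDF store, a reasoner and a SPARQL engine would do at scale.

```python
def infer_transitive(triples, predicate):
    """Forward-chain one rule: if (a p b) and (b p c) then (a p c)."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        links = [(s, o) for s, p, o in facts if p == predicate]
        for a, b in links:
            for b2, c in links:
                if b == b2 and (a, predicate, c) not in facts:
                    facts.add((a, predicate, c))
                    changed = True
    return facts


def match(facts, subject=None, predicate=None, obj=None):
    """Return facts matching a pattern; None acts as a wildcard, like a query variable."""
    return sorted(
        (s, p, o)
        for s, p, o in facts
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    )


# Hypothetical annotations entered through a semantic CMS, as (subject, predicate, object).
triples = {
    ("Aspirin", "is_a", "NSAID"),
    ("NSAID", "is_a", "Analgesic"),
    ("Analgesic", "is_a", "Drug"),
}
knowledge = infer_transitive(triples, "is_a")
print(match(knowledge, subject="Aspirin", predicate="is_a"))
# [('Aspirin', 'is_a', 'Analgesic'), ('Aspirin', 'is_a', 'Drug'), ('Aspirin', 'is_a', 'NSAID')]
```

Only the first of those three answers was entered by a user; the other two were deduced, which is the kind of result a keyword index cannot produce.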

The difficulty lies in the construction of ontologies which would have to be formal (i.e. consistent, complete, and thoroughly studied by experts in each knowledge-domain) in order to obtain powerful deductive capabilities (i.e. reasoning).

Conclusion

We already have technology powerful enough to build projects such as this: solid CMSs, standards such as RDF, OWL, and SPARQL, as well as a stable framework for using them such as JENA. There are also many frameworks for building info-agents, but you don’t necessarily need a specialized framework; a general software framework like J2EE is good enough for the tasks described in this post.

All we need to move forward with delivering on the Web 3.0 vision (see 1, 2, 3) is the will of the people and your imagination.

Addendum

In the diagram below, the domain-specific ontologies (OWL 1 … N) could all be built by Wikipedia (see Wikipedia 3.0) since they already have the largest online database of human knowledge and the domain experts among their volunteers to build the ontologies for each domain of human knowledge. One possible way is for Wikipedia to build informal ontologies using Semantic MediaWiki (as Ontoworld is doing for the Semantic Web domain of knowledge), but Wikipedia may wish to wait until they have the ability to build formal ontologies, which would enable more powerful machine-reasoning capabilities.

[Note: The ontologies simply allow machines to reason about information. They are not information but meta-information. They have to be formally consistent and complete for best results as far as machine-based reasoning is concerned.]

However, individuals, teams, organizations and corporations do not have to wait for Wikipedia to build the ontologies. They can start building their own domain-specific ontologies (for their own domains of knowledge) and use Google, Wikipedia, MySpace, etc as sources of information. But as stated in my latest edit to Eric’s post, we would have to use natural language processing in that case, including statistical semantic models, as the information won’t be pre-semanticized (or semantically annotated), which makes the task more difficult (for us and for the machine …)

What was envisioned in the Wikipedia 3.0: The End of Google? article was that since Wikipedia has the volunteer resources and the world’s largest database of human knowledge then it will be in the powerful position of being the developer and maintainer of the ontologies (including the semantic annotations/statements embedded in each page) which will become the foundation for intelligence (and “Intelligent Findability”) in Web 3.0.

This vision is also compatible with the vision for P2P AI (or P2P 3.0), where users run P2P inference engines on their PCs that communicate and collaborate with each other and that tap into information from Google, Wikipedia, etc, which will ultimately push Google and central search engines down to the commodity layer (eventually making them a utility business just like ISPs.)

Diagram

Related

  1. Wikipedia 3.0: The End of Google? June 26, 2006
  2. Wikipedia 3.0: El fin de Google (traducción) July 12, 2006
  3. Web 3.0: Basic Concepts June 30, 2006
  4. P2P 3.0: The People’s Google July 11, 2006
  5. Why Net Neutrality is Good for Web 3.0 July 15, 2006
  6. Intelligence (Not Content) is King in Web 3.0 July 17, 2006
  7. Web 3.0 Blog Application July 18, 2006
  8. Semantic MediaWiki July 12, 2006
  9. Get Your DBin July 12, 2006

Tags:

Semantic Web, Web standards, Trends, OWL, innovation, Startup, Google, GData, inference engine, AI, ontology, Web 2.0, Web 3.0, Google Base, artificial intelligence, Wikipedia, Wikipedia 3.0, Ontoworld, Wikipedia AI, Info Agent, Semantic MediaWiki, DBin, P2P 3.0, P2P AI, P2P Semantic Web inference Engine, semantic blog, intelligent findability, JENA, SPARQL, RDF

All About Web 3.0

In Uncategorized on July 18, 2006 at 3:29 pm

Semantic Blog

In Uncategorized on July 17, 2006 at 9:02 pm

Author: Marc Fawzi

License: Attribution-NonCommercial-ShareAlike 3.0

Background

As concluded in my previous post, there’s an exponential growth in the amount of user-generated content (videos, blogs, photos, P2P content, etc).

The enormous amount of free content available today is just too much for the current “dumb search” technology that is used to access it.

I believe that content is now a commodity and the next layer of value is all about “Intelligent Findability.”

Take my blog, for example. It’s less than 60 days old, and I’ve never blogged before, but as of today it already has ~500 daily RSS subscribers (and growing), with a noticeable increase after the iPod post I made 3 days ago, 6,281 incoming links (according to MSN) and ~70,000 page views in total so far (mostly due to the Wikipedia 3.0 post, which according to Alexa.com reached an estimated ~2M people.) That demonstrates the potential of blogs to generate and spread lots of content.

So there is a lot of blog-generated content (if you consider how many bloggers are out there) and that doesn’t even include the hundreds of thousands (or millions?) of videos and photos uploaded daily to YouTube, Google Video, Flickr and all those other video and photo sharing sites. It also doesn’t include the 30% of total Internet bandwidth being sucked up by BitTorrent clients.

There’s just too much content and no seriously effective way to find what you need. Google is our only hope for now but Google is rudimentary compared to the vision of Semantic-Web Info Agents expressed in the Wikipedia 3.0 and Web 3.0 articles.

Idea

We’d like to embed “Intelligent Findability” into a blogging application so that others will be able to get the most out of the information, ideas and analyses we generate.

If you do a search right now for “cool consumer idea” you will not get the iPod post. Instead you will get this post, but that is because I’m specifically making the association between “cool consumer idea” and “iPod” in this post.

Google tries to get around the debilitating limitation of keyword-based search engine technology in the same way by letting people associate phrases or words with a given link. If enough people linked to the iPod post and put the words “cool consumer idea” in the link then when searching Google for “cool consumer idea” you will see the iPod post. However, unless people band together and decide to call it a “cool consumer idea” it won’t show up in the search results. You would have to enter something like “portable music application” (which is actually one of the search results that showed up on my WordPress dashboard today.)

Using Semantic MediaWiki (which allows domain experts to embed semantic annotations into the information) I could insert semantic annotations to semantically link concepts in the information on this blog. That would build an ontology that defines semantic relationships between terms in the information (i.e. meaning), where “iPod” would be semantically related to “product”, which would be semantically related to “consumer electronics”, and where the phrase Portable Music Studio would be semantically related (through use of annotations) to “vision”, “idea”, “concept”, “entertainment”, “music”, “consumer electronics”, “mp3 player” and so on, while the “iPod” would also be semantically related to “cool” (as in what is “cool”?) Thus, using rules of inference for my domain of knowledge, I should be able to deliver an intelligent search capability that deductively reasons the best match to a search query, based on matching the deduced meanings (represented as semantic graphs) from the user’s query and the information.
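A very rough sketch of that matching step is shown below, assuming the annotations have already been reduced to “is semantically related to” links. The `RELATED` and `POSTS` tables, the two-hop limit and the scoring are all assumptions chosen to keep the example small; real deductive reasoning over an ontology would be much richer than this graph walk.

```python
from collections import deque

# Hypothetical annotations: undirected "is semantically related to" links.
RELATED = {
    "ipod": ["product", "cool", "mp3 player"],
    "product": ["consumer electronics"],
    "consumer electronics": ["consumer"],
    "portable music studio": ["idea", "music", "consumer electronics"],
}

# Hypothetical posts, each tagged with the concepts it was annotated with.
POSTS = {
    "iPod post": {"ipod", "portable music studio"},
    "Net neutrality post": {"telco", "voip"},
}


def neighbours(term):
    """Terms directly related to term, following links in both directions."""
    out = set(RELATED.get(term, ()))
    out.update(t for t, linked in RELATED.items() if term in linked)
    return out


def expand(term, max_hops=2):
    """All terms reachable from term within max_hops relationship links."""
    seen = {term}
    queue = deque([(term, 0)])
    while queue:
        current, hops = queue.popleft()
        if hops == max_hops:
            continue
        for nxt in neighbours(current):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return seen


def search(query_terms, max_hops=2):
    """Rank posts by how many query terms are semantically near their concepts."""
    scores = {}
    for title, concepts in POSTS.items():
        score = sum(1 for term in query_terms if expand(term, max_hops) & concepts)
        if score:
            scores[title] = score
    return sorted(scores, key=lambda t: (-scores[t], t))


print(search(["cool", "consumer", "idea"]))  # ['iPod post']
```

None of the three query words is one of the iPod post’s own concepts, yet the post is found because each word is related to one of them within two links.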

The quality of the deductive capability would depend on the consistency and completeness of the semantic annotations and the pan-domain or EvolvingTrends-domain ontology that I would build, among other factors. But generally speaking, since the ontology and the semantic annotations would be built by me, if we think alike (or have a fairly similar semantic model of the world) then you will not only be able to read my blog but you will be able to read my mind. The idea is that, with my help in supplying the semantic annotations, such a system will be able to deduce possible meaning (as a graph of semantic relationships) out of each sentence in the post and respond to search queries by reasoning about meaning rather than matching keywords.

This is possible with Semantic MediaWiki (which is under development). However, in this particular instance, I don’t want a Semantic Wiki. I want a Semantic Blog. But that should be just a simple step away.

Related

  1. Wikipedia 3.0: The End of Google?
  2. Towards Intelligent Findability
  3. Web 3.0: Basic Concepts
  4. Intelligence (Not Content) is King in Web 3.0
  5. Semantic MediaWiki

Tags:

semantic web, Web 3.0, Semantic MediaWiki, semantic blog, intelligent findability, inference engine

Intelligence (Not Content) is King in Web 3.0

In Uncategorized on July 17, 2006 at 2:35 pm

Author: Marc Fawzi

License: Attribution-NonCommercial-ShareAlike 3.0

Observation

  1. There’s an enormous amount of free content on the Web.
  2. Pirates will always find ways to share copyrighted content, i.e. get content for free.
  3. There’s an exponential growth in the amount of free, user-generated content.
  4. Net Neutrality (or the lack of a two-tier Internet) will only help ensure the continuance of this trend.
  5. Content is becoming so commoditized that it only costs us the monthly ISP fee to access.

Conclusions (or Hypotheses)

The next value paradigm in the content business is going to be about embedding “intelligent findability” into the content layer, by using a semantic CMS (like Semantic MediaWiki, which enables domain experts to build informal ontologies [or semantic annotations] on top of the information) and by adding inferencing capabilities to existing search engines. I know this represents less than the full vision for Web 3.0 as I’ve outlined in the Wikipedia 3.0 and Web 3.0 articles but it’s a quantum leap above and beyond the level of intelligence that exists today within the content layer. Also, a semantic CMS can be part of P2P Semantic Web Inference Engine applications that would push central search models like Google’s a step closer to being a “utility” like transport, unless Google builds their own AI, which would then have to compete with the people’s P2P version (see: P2P 3.0: The People’s Google and Get Your DBin.)

In other words, “intelligent findability” NOT content in itself will be King in Web 3.0.

Related

  1. Towards Intelligent Findability
  2. Wikipedia 3.0: The End of Google?
  3. Web 3.0: Basic Concepts
  4. P2P 3.0: The People’s Google
  5. Why Net Neutrality is Good for Web 3.0
  6. Semantic MediaWiki
  7. Get Your DBin

Tags:

net neutrality, two-tier internet, content, Web 3.0, inference engine, semantic-web, artificial intelligence, ai

Why Net Neutrality is Good for Web 3.0

In Uncategorized on July 15, 2006 at 1:30 pm

(this post was last updated at 10:00am EST, July 22, ‘06)

Facts

1. Telcos and Cable companies in the US are legally disallowed from blocking other carriers’ VoIP traffic. Last year, the FCC fined a North Carolina CLEC for doing that to Vonage.

2. Telcos and Cable companies have been in a turf war ever since cable companies started offering Internet access. This turf war escalated after cable companies started offering VoIP phone service, thus cutting deeply into the telcos’ main revenue stream.

3. The telcos’ response to the Cable companies’ entry into the phone market is to roll out their own TV services, based on IPTV (TV over IP), which are being rolled out at the speed of local and state government bureaucracies. IPTV would be carried on DSL lines, FTTC or FTTH.

4. The telcos’ response to Skype, Vonage, Yahoo IM (with VoIP) as well as their response to YouTube (and Google Video), which together threaten the telcos’ business model in the phone service and video delivery areas, was their push for a two-tiered Internet, where the telcos, who happen to own the Internet backbones, would de-prioritize VoIP and video traffic from Skype, Vonage, YouTube, Google, Yahoo and others.

Net Neutrality

The telcos already charge the end user (in case they serve the end user directly) and the cable companies (for use of their backbone when traffic has to travel outside of the cable company’s own network.)

So I just don’t see why the telcos would have to charge the cable companies, Google, YouTube, Yahoo, Vonage, Skype, MSN, etc one more time.

The telcos’ backbones are not being used for free. They are either paid for by the telco’s users (if the telco is the ISP) or by the cable companies and CLECs using those backbones, who pass the cost to their users. So it’s us, the end users, who are paying for those backbones, not the telcos, despite how the telcos make it sound.

But it seems that the telcos are saying that they’re not charging enough for those backbones to ensure continued investment on their part in growing their backbone capacities and instead of increasing how much they charge for traffic, which would increase our monthly access fees, they’re suggesting to charge the heavy content providers (e.g. YouTube, Google, others) for high-priority traffic (e.g. VoIP, video streams) and do the same to the VoIP transport providers (e.g. Skype, Vonage, etc.)

Google, Skype, Yahoo, MSN and others, seeing how that would hurt their business interests and the interest of their users by forcing them to charge users for content and VoIP transport, have sponsored a Net Neutrality bill, which to the best of my knowledge has had a hard time going through Congress.

Two Tier Internet

The telcos are struggling against the inevitable: that they will be a commodity industry like the railroad or trucking industries. The telcos, who understand all of the above, do not want to be confined to the transport of traffic because the transport business has become a commodity.

The same argument applies to VoIP transport providers. VoIP transport has become (or is becoming) a commodity business.

And if you ask me, “content” is also becoming a commodity business since the huge and ever-growing number of news, analysis and entertainment blogs, the millions of people who contribute their home videos, the pirates who can always figure out ways to share copyrighted content, and the tons of yet-to-be-explored opportunities for user-generated content all mean that content is now officially commoditized. In fact, content is so commoditized all it costs now is the small monthly fee users pay their ISP to access the net.

The Two-Tier Internet is an attempt by the telcos to attach artificially enhanced value to content once again by making content producers pay them for delivering their content without jitters and delays. It is also an attempt to attach artificially enhanced value to transport by forcing VoIP transport providers like Skype, Vonage, Yahoo etc to pay them to have their VoIP traffic transported without jitters and delays.

The Two-Tier Internet, aka the attempt by the telcos to attach artificially enhanced value to content and transport, seems anti-progress and is simply going nowhere.

However, the question is who will pay to invest in new backbone capacity? The answer (or part of the answer) is that content providers like Google are investing in building their own networks (between their data centers), and such efforts could conceivably grow into new backbone investments, with Google, Yahoo, AOL et al. funding new network capacity.

If Content has Become a Commodity Then How Will Content and Transport Providers Deliver Genuine Enhanced Value?

The answer that I propose is by embedding intelligent findability (forget keyword –and tag– indexed information, think Web 3.0!) into their Ad-supported content layer.

So instead of “dumb search” (which gives us “dumb content”) we would embrace the Web 3.0 model of intelligent findability (i.e. allowing the machines to use information in an intelligent manner to find what we’re looking for.)

No wonder Tim Berners-Lee (the father of the Web and the originator of the “Semantic Web,” which I had popularized as Web 3.0 in the Wikipedia 3.0 article) has come out strongly in favor of net neutrality. Having said that, I’m not sure whether he would agree that the natural commoditization of “dumb content,” which would be assured of continuing under Net Neutrality, would get us to the Web 3.0 model of intelligent findability sooner than a two-tier Internet would. The latter, in my opinion, would slow down the commoditization of “dumb content,” thus giving value-driven innovators less reason to explore the next layer of value in the content business, which I’m proposing is the Web 3.0 model of intelligent findability.

Related

  1. Towards Intelligent Findability
  2. Wikipedia 3.0: The End of Google?
  3. Intelligence (Not Content) is King in Web 3.0

Tags:

net neutrality, two-tier internet, content, Web 3.0, VoIP transport, VoIP, IPTV, Semantic Web

iPod As A Portable Music Studio

In Uncategorized on July 14, 2006 at 11:16 am

Idea

Build a smaller-sized version of Apple’s GarageBand music making software right into the iPod.

Why?

So we can make our own tunes man!

And remix “A Cappellas” with our own beats!

And sell our own productions on iTunes!

It’s all about user-generated content …

When will the iPod jump on the Web 2.0 bandwagon?

But Why?

Because that would totally rock!

Can it be Done?

In ‘02/’03 I invested in a project where we made a version of GarageBand for the mobile platforms. The prototype worked fine (with 8 tracks, real-time BPM matching and anti-clipping) but in 2003 the VCs had left town and no one was investing :D

The iPod (and especially the Video iPod) uses a much more powerful processor than the Game Boy Advance. So it should be able to go up to 16 tracks (or more) and have complex synthesizers, drum machines and sound effect generators (e.g. see FruityLoops) so users can make killer loops (aka “samples”)! Users would hunt for and gather samples (i.e. trade them on forums, blogs, etc.) as well as open source their amateur productions.

Now that’s what I call impulsive consumption and production!

And You Don’t Have to Wait for Apple to Do it!

Check out Rockbox. They don’t have it yet but I don’t see why they couldn’t build it for the iPod.

P.S. This post was not written to generate traffic :P

Tags:

ipod, apple, music, itunes, mp3, mp3 player

H y p e r l o g i c

In Uncategorized on July 13, 2006 at 5:01 am

(this post was last updated at 7:10pm EST, July 13, ‘06)

I received the following analysis from Sam Rose (in response to a comment I made to him about why I thought “forcing a split in opinion is important”):

“In other words, if you grab their attention with [a] split [i.e. to make the crowd or individuals swing to either one of two 'distinct' positions rather than stay in a chaotic or undecidable middle], but then show them the full picture, then you are tuning your transmitter to their receiver, in effect.”

I realize from my discussions with Sam that this applies to how you may have to manipulate people’s psychology, in a corrective not manipulative way, in order for them to “get it.”

It’s what I refer to as “hyper logic”…

Related

  1. For Great Justice, Take Off Every Digg

Tags:

truth, belief, argument, consistency, psychology

Wikipedia 3.0: The End of Google (translation)

In Uncategorized on July 12, 2006 at 4:08 pm

Wikipedia 3.0: The End of Google (translation)

by Evolving Trends

Spanish version (by Eric Rodriguez of Toxicafunk)

The Semantic Web (or Web 3.0) promises to “organize the world’s information” in a dramatically more logical way than Google could ever achieve with its current engine design. This is true from the point of view of machine comprehension as opposed to human comprehension. The Semantic Web requires the use of a declarative ontology language, such as OWL, to produce domain-specific ontologies that machines can use to reason about information and thereby reach new conclusions, rather than simply matching keywords.

However, the Semantic Web, which is still at a stage of development where researchers are trying to define which model is best and most usable, would require the participation of thousands of experts in different fields over an indefinite period of time to produce the domain-specific ontologies it needs in order to function.

Machines (or rather machine-based reasoning, also known as AI software or ‘info agents’) could then use those laboriously (though not entirely manually) constructed ontologies to build a view (or formal model) of how the individual terms within a given set of information relate to each other. Those relationships can be thought of as axioms (basic assumptions), which together with the rules governing the inference process both enable and constrain the interpretation (and well-formed use) of those terms by the info agents, so that they can reason new conclusions based on existing information, i.e. think. In other words, software could be used to generate theorems (formal propositions that are provable based on the axioms and the rules of inference), thus allowing formal deductive reasoning at the machine level. And given that an ontology, as described here, is a statement of Logic Theory, two or more info agents processing the same domain-specific ontology will be able to collaborate and deduce the answer to a query without being driven by the same software.

Thus, and as stated, in the Semantic Web machine-based agents (or a collaborating group of agents) will be able to understand and use information by translating concepts and deducing new information rather than simply matching keywords.

Once machines can understand and use information, using a standard ontology language, the world will never be the same. It will be possible to have an info agent (or several) among your AI-augmented virtual ‘workforce,’ each with access to a different domain-specific comprehension space and all communicating with each other to form a collective consciousness.

You will be able to ask your info agent or agents to find you the nearest restaurant serving Italian cuisine, even if the restaurant nearest to you advertises itself as a pizza place and not as an Italian restaurant. But that is just a very simple example of the deductive reasoning machines will be able to perform on existing information.
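The restaurant example can be sketched in a few lines of Python. This is only a toy illustration: the `IS_A` table stands in for a real ontology (which would be written in a language like OWL), and the listings, names and distances are all made up for the example.

```python
# Toy ontology: "is a kind of" links between cuisine terms (hypothetical data).
IS_A = {
    "pizza": "italian cuisine",
    "lasagna": "italian cuisine",
    "sushi": "japanese cuisine",
}

# Hypothetical listings: (name, what the place advertises, distance in km).
PLACES = [
    ("Tokyo Roll", "sushi", 0.3),
    ("Luigi's Pizza Joint", "pizza", 0.5),
    ("Trattoria Roma", "italian cuisine", 2.0),
]


def is_kind_of(term, category):
    """Follow IS_A links upward from term until category is found or the chain ends."""
    while term is not None:
        if term == category:
            return True
        term = IS_A.get(term)
    return False


def nearest(category):
    """Return the closest place whose advertised food falls under category."""
    matches = [(dist, name) for name, food, dist in PLACES if is_kind_of(food, category)]
    return min(matches)[1] if matches else None


def nearest_by_keyword(keyword):
    """Keyword matching, for comparison: only exact label matches count."""
    matches = [(dist, name) for name, food, dist in PLACES if food == keyword]
    return min(matches)[1] if matches else None


print(nearest("italian cuisine"))             # finds the pizza joint
print(nearest_by_keyword("italian cuisine"))  # misses it
```

The keyword version only sees the place that literally labels itself "italian cuisine"; the ontology-aware version deduces that a pizza place qualifies and so returns the closer one.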

Far more astonishing implications appear when you consider that every area of human knowledge will automatically be within the comprehension space of your info agents. That is because each agent can communicate with other info agents specialized in different domains of knowledge to produce a collective consciousness (using the Borg metaphor) that encompasses all human knowledge. The collective “mind” of those Borg-like agents will form the Ultimate Answer Machine, easily displacing Google from that position, which it does not fully occupy.

The problem with the Semantic Web, besides the fact that researchers are still debating which design and implementation of the ontology language model (and associated technologies) is best and most usable, is that it would take thousands or even thousands upon thousands of knowledgeable people many years to translate human knowledge into domain-specific ontologies.

However, if at some point we were to take the Wikipedia community and give them the right tools and standards to work with (whether these already exist or are yet to be developed), so that reasonably capable individuals could reduce human knowledge to domain-specific ontologies, then the time needed to do so would be shortened to a few years, or possibly two.

The emergence of a Wikipedia 3.0 (as in Web 3.0, the name given to the Semantic Web) based on the Semantic Web model would herald the end of Google as the Ultimate Answer Machine. It would be replaced by “WikiMind,” which would not be a mere search engine like Google but a true Global Brain: a powerful pan-domain inference engine, with a vast set of ontologies (a la Wikipedia 3.0) covering all domains of human knowledge, capable of reasoning and deducing answers instead of simply throwing raw information at you through the outdated concept of a search engine.

Notes
After writing the original post I discovered that the Wikipedia application, also known as MediaWiki (not to be confused with Wikipedia.org), has already been used to implement ontologies. The name they have chosen is Ontoworld. I think WikiMind or WikiBorg would have been a more attractive name, but I like Ontoworld too, as in “and it descended onto the world,” (1) since it can be taken as a reference to the global mind that a Semantic Web-enabled Ontoworld would give rise to.

In just a few years the search engine technology that provides Google with almost all of its revenue/capital would be obsolete… unless they had a deal with Ontoworld that allowed them to plug into its ontology database, thus adding inference engine capability to Google search.

But the same is true for Ask.com, MSN and Yahoo.

I would love to see more competition in this field, and not see Google or any other company establish itself as the leader over the others.

The question, to put it in Churchillian terms, is whether the combination of Wikipedia and the Semantic Web means the beginning of the end for Google or the end of the beginning. Obviously, with billions of dollars of investors’ money at stake, I would say it is the latter. However, I would like to see someone outdo them (which is possible, in my opinion).

(1) The author is referring to the wordplay on the prefix Onto from ontology, which sounds the same as the English word “unto.” The original phrase is “and it descended onto the world.”

Clarification
Please note that Ontoworld, which currently implements the ontologies, is based on the “Wikipedia” application (also known as MediaWiki), which is not the same as Wikipedia.org.

Likewise, I hope that Wikipedia.org will use its volunteer workforce to reduce the sum of human knowledge that has been entered into its database to domain-specific ontologies for the Semantic Web (Web 3.0), hence “Wikipedia 3.0.”

Response to Readers’ Comments
My argument is that Wikipedia already has the volunteer resources to produce the ontologies for each of the knowledge domains it currently covers, which the Semantic Web so badly needs, while Google has no such resources and would therefore depend on Wikipedia.

The ontologies, together with all the information on the Web, can be accessed by Google and the others, but it is Wikipedia that will be in charge of those ontologies, because Wikipedia already covers an enormous number of knowledge domains, and that is where I see the shift in power.

Neither Google nor the other companies have the human resources (the thousands of volunteers Wikipedia has) needed to create the ontologies for all the knowledge domains that Wikipedia already covers. Wikipedia does have those resources, and it is positioned to do a better and more effective job than anyone else. It is hard to see how Google would manage to create those ontologies (which keep growing in both number and size) given the amount of work required. Wikipedia, by contrast, can move much faster thanks to its massive and dedicated force of expert volunteers.

I believe the competitive advantage will go to whoever controls the creation of ontologies for the largest number of knowledge domains (i.e. Wikipedia) and not to whoever merely accesses them (i.e. Google).

There are many knowledge domains that Wikipedia does not yet cover. Here Google would have an opportunity, but only if the people and organizations that produce the information also built their own ontologies, so that Google could access them through its future Semantic Web engine. I am of the opinion that this will happen in the future, but that it will happen little by little, and that Wikipedia can have the ontologies ready for all the knowledge domains it already covers much sooner, with the enormous added advantage that it would be in charge of those ontologies (the basic layer for enabling AI).

It is still not clear, of course, whether the combination of Wikipedia and the Semantic Web heralds the end of Google or the end of the beginning. As I mentioned in the original article, I think it is the latter, and that the question in the title of this post, in the present context, is merely rhetorical. However, I could be wrong in my judgment, and Google may give way to Wikipedia as the world’s ultimate answer machine.

After all, Wikipedia has “us.” Google does not. Wikipedia derives its power from “us.” Google derives its power from its technology and its inflated market price. Who would you count on to change the world?

Response to Basic Questions from Readers
Reader divotdave asked a few questions that I think are basic in nature (i.e. important). I believe more people will be wondering about the same issues, so I am including them here with their respective answers.

Question:
How do you distinguish between good information and bad? How do you determine which parts of human knowledge to accept and which to reject?

Answer:
It is not necessary to distinguish between good and bad information (not to be confused with well-formed vs. ill-formed) if you use a reliable source of information (with reliable associated ontologies). That is, if the information or knowledge being sought can be derived from Wikipedia 3.0, then the information is assumed to be reliable.

However, when it comes to connecting the dots while returning information or deducing answers from the vast sea of information that lies beyond Wikipedia, the question becomes very relevant. How would one distinguish good information from bad so as to produce good knowledge (i.e. comprehended information, or new information produced through deductive reasoning based on existing information)?

Question:
Who, or what as the case may be, determines what information is irrelevant to me as the end user?

Answer:
That is a good question, and one that must be answered by the researchers working on AI engines for Web 3.0.

Certain assumptions will have to be made about what is being asked. Just as I had to assume certain things about what you were really asking when I read your question, so will the AI engines, based on a cognitive process very similar to ours, which is a topic for another post but one that has been studied by many AI researchers.

Question:
Does this ultimately mean that an all-powerful* standard will emerge to which all of humanity will have to adhere (for lack of alternative information)?

Answer:
There is no need for a standard, except for the language in which the ontologies are written (i.e. OWL, OWL-DL, OWL Full, etc.). Semantic Web researchers are trying to determine the best and most usable option, taking into account human and machine performance in constructing and, exclusively in the latter case, interpreting those ontologies.

Two or more info agents working with the same domain-specific ontology but with different software (a different AI engine) can collaborate with each other. The only standard needed is the ontology language and the associated production tools.

Addendum

On AI and Natural Language Processing

I believe that the first generation of AI to be used by Web 3.0 (aka the Semantic Web) will be based on relatively simple inference engines (using both algorithmic and heuristic approaches) that will not attempt any kind of natural language processing. However, they will retain the formal deductive reasoning capabilities described in this article.

On the Debate about the Nature and Definition of AI

The introduction of AI into cyberspace will happen first through inference engines (using algorithms and heuristics) that collaborate in a P2P-like fashion and use standard ontologies. The parallel interaction between hundreds of millions of AI agents running inside P2P AI engines on users’ PCs will give rise to the complex behavior of the future global brain.

ViRAL Text

In Uncategorized on July 12, 2006 at 11:11 am

Get Your DBin

In Uncategorized on July 12, 2006 at 9:06 am

Upon a very quick glance, DBin seems to be about people (or rather ‘domain experts’) building the semantic annotations (informal ontologies), inference rules and query structures. I thought those three pieces would be specified by the inference engine vendors, but I believe that DBin lets any person who qualifies as a domain expert add value!

Related

  1. P2P 3.0: The People’s Google

Tags:

Semantic Web, Web standards, Trends, OWL, innovation, Startup, Google, ontology, Semanticweb, Web 3.0, Wikipedia, Wikipedia 3.0, Ontoworld, OWL-DL, DBin, Semantic MediaWiki, P2P 3.0

Semantic MediaWiki

In Uncategorized on July 12, 2006 at 6:01 am

What is it?

Semantic MediaWiki is an ongoing open source project to develop a Semantic Wiki Engine.

In other words, it is one of the important early innovations leading up to the Wikipedia 3.0 (Web 3.0) vision.

  • The project and software is called "Semantic MediaWiki"
  • ontoworld.org is just one site using the technology
  • Wikipedia might become another site using the technology 

Update

The hosting of Semantic MediaWiki, i.e. the Web 3.0 version of Wikipedia’s platform, has been taken over by Wikia, a commercial venture founded by Wikipedia’s own founder Jimmy Wales. This opens up a huge conflict of interest: Wikipedia’s founder is running a commercial, for-profit venture that takes creative improvements to Wikipedia’s platform, e.g. Semantic MediaWiki, and transfers those improvements to Wikia.

Related

  1. Wikipedia 3.0: The End of Google?
  2. Web 3.0: Basic Concepts
  3. P2P 3.0: The People’s Google
  4. Semantic MediaWiki project website (as noted in the Update, Semantic Media Wiki hosting has been taken over by Wikipedia’s founder Jimmy Wales’ commercial venture Wikia…)

Tags:

Semantic Web, Web standards, Trends, OWL, innovation, Startup, Evolution, Google, ontology, Semanticweb, Web 2.0, Web 3.0, Wikipedia, Wikipedia 3.0, Ontoworld, OWL-DL, Semantic MediaWiki, P2P 3.0

The People’s Google

In Uncategorized on July 11, 2006 at 10:16 am

Author: Marc Fawzi

License: Attribution-NonCommercial-ShareAlike 3.0

/*

This is a follow-up to the Wikipedia 3.0 article.

See this article for a more disruptive ‘decentralized knowledgebase’ version of the model discussed here.

Also see this non-Web3.0 version: P2P to Destroy Google, Yahoo, eBay et al

Web 3.0 Developers:

Feb 5, ‘07: The following reference should provide some context regarding the use of rule-based inference engines and ontologies in implementing the Semantic Web + AI vision (aka Web 3.0) but there are better, simpler ways of doing it.

  1. Description Logic Programs: Combining Logic Programs with Description Logic

*/

In Web 3.0 (aka Semantic Web), P2P Inference Engines running on millions of users’ PCs and working with standardized domain-specific ontologies (that may be created by entities like Wikipedia and other organizations) using Semantic Web tools will produce an information infrastructure far more powerful than the current infrastructure that Google uses (or any Web 1.0/2.0 search engine for that matter.)

Having the standardized ontologies and the P2P Semantic Web Inference Engines that work with those ontologies will lead to a more intelligent, “Massively P2P” version of Google.

Therefore, the emergence in Web 3.0 of said P2P Inference Engines combined with standardized domain-specific ontologies will present a major threat to the central “search” engine model.

Basic Web 3.0 Concepts

Knowledge domains

A knowledge domain is something like Physics, Chemistry, Biology, Politics, the Web, Sociology, Psychology, History, etc. There can be many sub-domains under each domain each having their own sub-domains and so on.

Information vs Knowledge

To a machine, knowledge is comprehended information (aka new information that is produced via the application of deductive reasoning to existing information). To a machine, information is only data, until it is reasoned about.

Ontologies

For each domain of human knowledge, an ontology must be constructed, partly by hand and partly with the aid of dialog-driven ontology construction tools.

Ontologies are not knowledge nor are they information. They are meta-information. In other words, ontologies are information about information. In the context of the Semantic Web, they encode, using an ontology language, the relationships between the various terms within the information. Those relationships, which may be thought of as the axioms (basic assumptions), together with the rules governing the inference process, both enable as well as constrain the interpretation (and well-formed use) of those terms by the Info Agents to reason new conclusions based on existing information, i.e. to think. In other words, theorems (formal deductive propositions that are provable based on the axioms and the rules of inference) may be generated by the software, thus allowing formal deductive reasoning at the machine level. And given that an ontology, as described here, is a statement of Logic Theory, two or more independent Info Agents processing the same domain-specific ontology will be able to collaborate and deduce an answer to a query, without being driven by the same software.
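To make the axioms-plus-rules idea concrete, here is a minimal forward-chaining sketch in Python. It is not how any particular Semantic Web engine works: the triples, the relation names (`is_a`, `has_property`) and the two rules are assumptions chosen for illustration, and a real ontology would be expressed in a language such as OWL.

```python
# Axioms: relationships between terms, written as (subject, relation, object) triples.
AXIOMS = {
    ("electron", "is_a", "lepton"),
    ("lepton", "is_a", "fermion"),
    ("fermion", "has_property", "half-integer spin"),
}


def apply_rules(facts):
    """Apply two inference rules once and return only the newly derived triples."""
    derived = set()
    for (a, r1, b) in facts:
        for (c, r2, d) in facts:
            if b != c:
                continue
            # Rule 1: is_a is transitive.
            if r1 == "is_a" and r2 == "is_a":
                derived.add((a, "is_a", d))
            # Rule 2: properties are inherited down is_a links.
            if r1 == "is_a" and r2 == "has_property":
                derived.add((a, "has_property", d))
    return derived - facts


def deduce(axioms):
    """Forward-chain until no rule yields anything new; return the derived triples only."""
    facts = set(axioms)
    while True:
        new = apply_rules(facts)
        if not new:
            return facts - set(axioms)
        facts |= new


for theorem in sorted(deduce(AXIOMS)):
    print(theorem)
```

None of the printed triples appear in `AXIOMS`; each one is a "theorem" in the sense used above, provable from the axioms and the two rules, such as the fact that an electron has half-integer spin.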

Inference Engines

In the context of Web 3.0, Inference engines will be combining the latest innovations from the artificial intelligence (AI) field together with domain-specific ontologies (created as formal or informal ontologies by, say, Wikipedia, as well as others), domain inference rules, and query structures to enable deductive reasoning on the machine level.

Info Agents

Info Agents are instances of an Inference Engine, each working with a domain-specific ontology. Two or more agents working with a shared ontology may collaborate to deduce answers to questions. Such collaborating agents may be based on differently designed Inference Engines and they would still be able to collaborate.
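As a rough illustration of differently designed engines working from one shared ontology, the Python sketch below defines two hypothetical agents. One precomputes everything it can derive (forward style), the other searches on demand from the query (backward style). The ontology and class names are invented for the example, and the sketch shows only the precondition for collaboration, namely that both designs reach the same conclusions from the same ontology.

```python
# Shared ontology (hypothetical): direct is_a links between terms.
ONTOLOGY = {
    "electron": {"lepton"},
    "lepton": {"fermion"},
    "fermion": {"particle"},
    "photon": {"boson"},
    "boson": {"particle"},
}


class ForwardAgent:
    """Precomputes every ancestor of every term, then answers by lookup."""

    def __init__(self, ontology):
        self.ancestors = {}
        for term in ontology:
            seen, frontier = set(), set(ontology[term])
            while frontier:
                parent = frontier.pop()
                if parent not in seen:
                    seen.add(parent)
                    frontier |= ontology.get(parent, set())
            self.ancestors[term] = seen

    def ask(self, term, category):
        return category in self.ancestors.get(term, set())


class BackwardAgent:
    """Works on demand: starts from the query and searches upward recursively."""

    def __init__(self, ontology):
        self.ontology = ontology

    def ask(self, term, category, _seen=None):
        seen = _seen if _seen is not None else set()
        if term in seen:
            return False
        seen.add(term)
        parents = self.ontology.get(term, set())
        return category in parents or any(self.ask(p, category, seen) for p in parents)


agents = [ForwardAgent(ONTOLOGY), BackwardAgent(ONTOLOGY)]
for term, category in [("electron", "particle"), ("photon", "fermion")]:
    print(term, category, [agent.ask(term, category) for agent in agents])
```

Because both agents interpret the same `is_a` links, either one could hand a sub-question to the other and trust the answer, even though their internals have nothing in common.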

Proofs and Answers

The interesting thing about Info Agents that I did not clarify in the original post is that they will be capable of not only deducing answers from existing information (i.e. generating new information [and gaining knowledge in the process, for those agents with a learning function]) but they will also be able to formally test propositions (represented in some query logic) that are made directly -or implied- by the user.
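A small Python sketch of what formally testing a proposition might look like. Everything here is assumed for illustration: the `IS_A` links, the single disjointness axiom, and the three possible verdicts. It simply classifies a proposition of the form "term is a category" as proved, refuted, or undetermined by the available axioms.

```python
# Hypothetical axioms: is_a links plus one declaration that two classes never overlap.
IS_A = {"penguin": "bird", "bird": "animal", "trout": "fish", "fish": "animal"}
DISJOINT = {frozenset({"bird", "fish"})}


def ancestors(term):
    """Return the term itself plus everything reachable by following IS_A links."""
    chain = [term]
    while chain[-1] in IS_A:
        chain.append(IS_A[chain[-1]])
    return set(chain)


def check_proposition(term, category):
    """Classify the proposition 'term is a category' as proved, refuted or undetermined."""
    known = ancestors(term)
    if category in known:
        return "proved"
    # Refuted if being in the category would place the term in a class that is
    # declared disjoint from a class it is already known to belong to.
    for k in known:
        for c in ancestors(category):
            if frozenset({k, c}) in DISJOINT:
                return "refuted"
    return "undetermined"


for category in ["animal", "fish", "mammal"]:
    print("penguin is a", category, "->", check_proposition("penguin", category))
```

The third verdict matters: when the axioms neither prove nor contradict a proposition, the agent reports that it cannot decide rather than guessing.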

P2P 3.0 vs Google

If you think of how many processes currently run on all the computers and devices connected to the Internet then that should give you an idea of how many Info Agents can be running at once (as of today), all reasoning collaboratively across the different domains of human knowledge, processing and reasoning about heaps of information, deducing answers and deciding truthfulness or falsehood of user-stated or system-generated propositions.

Web 3.0 will bring with it a shift from centralized search engines to P2P Semantic Web Inference Engines, which will collectively have vastly more deductive power, in both quality and quantity, than Google can ever have (included in this assumption is any future AI-enabled version of Google, as it will not be able to keep up with the power of P2P AI matrix that will be enabled by millions of users running free P2P Semantic Web Inference Engine software on their home PCs.)

Thus, P2P Semantic Web Inference Engines will pose a huge and escalating threat to Google and other search engines, and can be expected to do to them what P2P file sharing and BitTorrent did to FTP (central-server file transfer) and centralized file hosting in general (e.g. Amazon’s S3 use of BitTorrent).

In other words, the coming of P2P Semantic Web Inference Engines, as an integral part of the still-emerging Web 3.0, will threaten to wipe out Google and other existing search engines. It’s hard to imagine how any one company could compete with 2 billion Web users (and counting), all of whom are potential users of the disruptive P2P model described here.

The Future

Currently, Semantic Web (aka Web 3.0) researchers are working out the technology and human resource issues, and folks like Tim Berners-Lee, the father of the Web, are battling critics and enlightening minds about the coming Semantic Web revolution.

In fact, the Semantic Web (aka Web 3.0) has already arrived, and Inference Engines are working with prototypical ontologies, but this effort is a massive one, which is why I was suggesting that its most likely enabler will be a social, collaborative movement such as Wikipedia, which has the human resources (in the form of the thousands of knowledgeable volunteers) to help create the ontologies (most likely as informal ontologies based on semantic annotations) that, when combined with inference rules for each domain of knowledge and the query structures for the particular schema, enable deductive reasoning at the machine level.

Addendum

On AI and Natural Language Processing

I believe that the first generation of AI that will be used by Web 3.0 (aka Semantic Web) will be based on relatively simple inference engines that will NOT attempt to perform natural language processing, where current approaches still face too many serious challenges. However, they will still have the formal deductive reasoning capabilities described earlier in this article, and users would interact with these systems through some query language.

Related

  1. Wikipedia 3.0: The End of Google?
  2. Intelligence (Not Content) is King in Web 3.0
  3. Get Your DBin
  4. All About Web 3.0

Tags:

Semantic Web, Web standards, Trends, OWL, Google, inference engine, AI, ontology, Web 2.0, Web 3.0, Wikipedia, Wikipedia 3.0, collective consciousness, Ontoworld, AI Engine, OWL-DL, Semantic MediaWiki, P2P 3.0

The Future of Governance

In Uncategorized on July 10, 2006 at 8:06 pm

Please see Unwisdom of Crowds for an intro into this piece.

Future of Governance

My basic assumption is that the process of governing human societies in cyberspace will ultimately go back to the classical model we have today in the Western world. It may take 10, 20 or 50 years of experimenting, but I believe we will come full circle to what we have today.

I believe that the core governance process that is our democratic process (which is in essence the same basic idea as that invented by the Greeks, with several important innovations built on top of it) is immune to innovation in the short range. This belief applies to our core governance process now and at any given time, i.e. it will always be immune to innovation in the short range. Change in any process that is fundamental to our existence tends to happen every so many thousand years. Our system today is not that different from the system the Greeks invented thousands of years ago.

Many would disagree, but I personally don’t believe that we will be successful in changing the core process we have today. If we could, we would have had a new governance system every 50 years.

I do believe that we can innovate on top of what we have today and ultimately change the system.

Is it possible that we’re ready for a revolution?

Or should we try to evolve what we have?

It’s really hard to answer that.

  1. Unwisdom of Crowds
  2. Open Source Your Mind
  3. Self-Aware e-Society

Tags:

Trends, wisdom of crowds, tagging, Startup, mass psychology, governance, cult psychology, Web 2.0, digg, censorship, democracy, P2P, P2P 2.0, social bookmarking, social networking, Web 2.5, hierarchy

Is Google a Monopoly? (Updated September 16, 2009)

In Uncategorized on July 10, 2006 at 6:13 am

Author: Marc Fawzi
License: Attribution-NonCommercial-ShareAlike 3.0

Article

(first published in July 2006 and updated in September 2009)

Given the growing feeling that Google holds too much power over the future of the Web, without any proof that they can use that power wisely, and with sufficient proof to the contrary (1), it’s easy to see why some of us are growing increasingly worried about Google’s continued drive to embed itself in all aspects of our lives.

In the software industry, economies of scale do not derive as much from production capacity as from the size of the installed user base, and that’s because software is made of electrical pulses (or bits) that can be replicated and downloaded by the users, at a relatively very small cost to the producer. This means that the size of the installed user base replaces production capacity in classical economic terms.

So far Google has managed to build a dominant market share in search based mostly on the strength of its technology, not by leveraging an installed user base as Microsoft had done with desktop applications.

However, this is changing as Google extends its huge presence on the Web to the desktop (search: Google Chrome) and mobile phones (search: “Google Android”), a move that should allow it to dominate almost every application category on the Web, desktop and mobile phones.

While Google’s leveraging of its humongous user base on the Web to create this advantage is lawful, it is unfair, with the consequence being Google’s domination of the mobile phone, desktop and Web applications and search markets, which is sure to stifle innovation across the board and make it even harder for smaller companies to compete against Google.

Theoretically speaking, the patent system is designed to enable companies of all sizes to carve out new niches for themselves. However, obtaining patents can be a very costly and prolonged process, and small companies often get their inventions copied and co-opted by bigger players like Google, Microsoft, etc. In fact, in the Microsoft-dominated era, very few companies succeeded in suing them for patent infringement. I happen to know of one small software company and their CEO who succeeded in suing and then settling with Microsoft for millions. But that’s a rare exception to a common rule: the one with the deeper pockets always has the advantage in court (they can drag the lawsuit out for years and make it too costly for others to sue them.)

So for small companies competing against Google, it’s not any better or worse than it used to be under the Microsoft monopoly. But for us the people it’s much worse because what is at stake now is much bigger. It’s no longer about our PCs and LANs, it’s about our online economy.

Unchecked monopolies, even when “lawful,” create too much dependency on single sources, which reduces the number of choices we have and exposes our economy to the risk of failure in the long run. After all, strength and resiliency come from ‘inter-dependent peers’ (think: billions of us trading goods and services with each other without any middlemen) not from the few giant corporations that hold power over billions of us and control our economy.

If the Internet proved anything, it is that we, the people, can have everything we need without the profit-driven –and often morally suspect– giant corporations.

Cash-strapped governments of the world should try and extract billions of dollars in anti-trust fines from these so-called “lawful” monopolies and then feed all those billions of dollars downstream in the form of better public services and zero-interest loans to entrepreneurs.

Otherwise, the governments are pretty much useless, as the giant corporations continue to grow in power and shape our world, for the worse, with their profit-driven focus and lack of moral principles. (1)

It’s time to abandon the old thinking about capitalism as we have it today being great and take a fresh look at the flawed version of capitalism that we’ve created, or else we’re bound to end up at the mercy of a few giant corporations that control all or most aspects of our lives, including our freedom (or whether we have it or not), which is the case when companies like Google enforce a policy on people that the people had not agreed to. An example is the policy of “site blocking” that Google is forcing on site owners, without site owners having agreed to it. In other words, Google is coming up with its own law (in the form of their policies) and law enforcement for the Web (in the form of enforcing those policies without agreement by the party the policy is forced upon.) (1)

Time to wake up to the real game of monopoly.

1. What leaps to mind as far as Google’s lack of wisdom is their cooperation with the Chinese government in oppressing the already-oppressed (see: Google Chinese censorship.) More recently, Google’s shareholders, on advice from Google’s Board of Directors, have voted against two proposals that would have compelled Google to change its human rights policies (for the better.) Even more recently, Google (and Firefox, which is largely funded by Google), Apple, and others have implemented a feature in their respective browsers that detects and filters out malicious sites based on what Google crawlers detect and what is reported on StopBadWare.org. The first part of the problem is that in both cases, whether malicious code was detected by Google crawlers or reported by some 3rd party to StopBadWare.org, Google is the main authority in deciding which site is malicious, for all browsers from Google, Firefox and Apple (and possibly others.) This means that web site owners whose sites had been injected with malicious code by hackers are at the mercy of Google’s review process which may not resolve (with the removal of the site from the list of malicious sites) for many hours or even days after the site owner has removed the malicious code. This holds the site owners hostage to Google. The second part of the problem is that the site owners do not have a choice as far as what browser their users use, and, therefore, Google’s site blocking policy is being forced on them, without their agreement. The problem in its two parts is that Google is establishing the law and enforcing it.

Related

  1. Beyond Google: The P2P Economy
  2. Still No. 1 Blog for “Google Monopoly”
  3. Wikipedia 3.0: The End of Google?

Also Related

  1. Towards a World-Wide Mesh
  2. People-Hosted “P2P” version of Wikipedia
  3. The People’s Google

Open Source Your Mind

In Uncategorized on July 9, 2006 at 3:03 pm

Any idea that you come up with that can bring a lot of power to someone and is realistic enough to attempt will inevitably get built by someone.

It doesn’t matter that you thought of it first. So it’s better to put your ideas out there in the open, be they good ideas like Wikipedia 3.0, P2P 3.0 (The People’s Google) and Google GoodSense or “potentially” concern-causing ones like the Tagging People in the Real World and the e-Society ideas.

In today’s world, if anyone can think of a powerful idea that is realistic enough to attempt then chances are someone is already working on it or someone will be working on it within months.

Therefore, it is wise to get both good and potentially concern-causing ideas out there and let people be aware of them so that the good ones like the vision for Wikipedia 3.0 and the debate about the ‘Unwisdom of Crowds‘ can be of benefit to all and so that potentially concern-causing ones like the Tagging People in the Real World and the e-Society ideas can be debated in the open.

It is in a way similar to one aspect of the patent system. If someone comes up with the cure to cancer or with an important new technology then we, as a society, would want them to describe how it’s made or how it works so we can be sure we have access to it. However, given the availability of blogs and the connectivity we have today, wise innovators, including those in the open source movement, are putting their ideas out there in the open so that society as a whole may learn about them, debate them, and decide whether to embrace them, fight them or do something in between (moderate their effect.)

For some, it can be a lot of fun, especially the unpredictability element.

So open source your blue sky vision and let the world hear about it.

And for the potentially concern-causing ideas, it’s better to bring them out in the open than to work on them (or risk others working on them) in the dark.

In other words, open source your mind.

Tags:
Trends, wisdom of crowds, tagging, Web 2.0, Web 2.0, digg, censorship, democracy, P2P, P2P 2.0, e-society, unwisdom of crowds, Web 3.0, Web 3.0, ai, P2P AI, Wikipedia 3.0, Wikipedia, Semantic Web, semantic web, world hunger, Google AdSense, Open Source, open source your mind

Self-Aware e-Society

In Uncategorized on July 9, 2006 at 9:20 am

(this post was refreshed on Jul 16, ‘08.)

A Self-Aware Society

In this post we discuss the idea of a pattern-recognizing neural network that sits on top of a P2P network and learns to recognize and predict communication, social, cultural, political and transactional patterns [generated by the users] across the system. This idea is to enable the detection of the emergence of negative patterns (such as speculative market bubbles or the emergence of cult-like behavior) and thus enable us to better manage society.

The idea is to use the P2P clients as a way to pull into the neural network the communication, social, cultural, game-playing and transactional behavioral data generated by (or from) the users across the said e-society (where the neural network itself would be separate from its P2P input layer.) The neural network would then be able to recognize patterns so that it can send alerts when they emerge again (or outside of simulation.)

Obviously, there are limits on the types of patterns (and trends) that can be learned as well as limits on the accuracy of pattern recognition and trend prediction.

However, the potential is immense.

Self-Aware e-Society vs Prediction Markets

Prediction markets are mostly based on wisdom of crowds. They are simulations in which people make individual judgments and their judgments are averaged to produce the prediction (or the crowd judgment). There are also types of prediction markets where people buy and sell and the system makes the prediction (or crowd judgment) based on the buy-sell decisions, which represent judgments.

However, I am not aware of any prediction market that can recognize and predict emergence of patterns in people’s implied or explicit judgments as they relate to a given company stock, product, idea or person. These patterns are extracted from the users’ communication, social, cultural, game-playing and transactional data (including inferred data) which are captured from virtual stock markets, virtual auctions, chat rooms (where a hierarchy can exist: e.g. founder of room, operators, favored participants, participants, and unwanted participants), social applications and entertainment applications (including multi-player online games.)

By having people buy and sell stocks and products, vote about certain ideas or individuals, communicate in both flat and hierarchy-enabled chat rooms, and play multi-player games, we can extract the distribution curves of the statistical aggregates of complex individual judgments about a stock, product, idea or person and then transform those curves into a pattern that can be fed into the neural network and teach the network to recognize the pattern as well as associate it with a learned behavior (e.g. market bubble, hype, fame, cult behavior, racism, rebellion, etc.)
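The pipeline described above (raw judgments, then a distribution curve, then a learned label) can be sketched in a few lines. This is only a minimal illustration under stated assumptions: judgments are taken to be numbers between 0 and 1, the "curve" is a plain normalized histogram, and a nearest-centroid classifier stands in for the neural network. The labels and sample values are hypothetical.

```python
def judgment_curve(judgments, bins=5, low=0.0, high=1.0):
    """Turn raw judgments in [low, high] into a normalized histogram (the 'curve')."""
    counts = [0] * bins
    width = (high - low) / bins
    for j in judgments:
        idx = min(int((j - low) / width), bins - 1)  # clamp the top edge into the last bin
        counts[idx] += 1
    return [c / len(judgments) for c in counts]


def centroid(curves):
    """Average several curves bin by bin."""
    return [sum(col) / len(curves) for col in zip(*curves)]


def classify(curve, centroids):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: dist(curve, centroids[label]))


# Hypothetical training data: in a "bubble" almost everyone is bullish,
# in a "normal" market the judgments are spread out.
training = {
    "bubble": [judgment_curve([0.9, 0.95, 0.85, 0.99, 0.83]),
               judgment_curve([0.92, 0.97, 0.88, 0.71, 0.91])],
    "normal": [judgment_curve([0.1, 0.3, 0.5, 0.7, 0.9]),
               judgment_curve([0.05, 0.15, 0.5, 0.55, 0.85])],
}
centroids = {label: centroid(curves) for label, curves in training.items()}

new_label = classify(judgment_curve([0.9, 0.93, 0.96, 0.82, 0.75]), centroids)
calm_label = classify(judgment_curve([0.1, 0.3, 0.5, 0.7, 0.9]), centroids)
print(new_label, calm_label)
```

A real system would need far richer features and an actual trained network; the point here is only the shape of the data flow from individual judgments to a recognized pattern.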

People supply complex individual implicit or explicit judgments in true-to-life simulations that generate patterns of judgments across society or across groups within the society which can then be taught and recognized (for those patterns that relate to a phenomenon like speculative market bubbles, emergence of cults, etc) by the neural network monitoring this live e-society.

Governments and politicians will be able to use such live (made of people), self-aware e-society to simulate the outcome of critical political decisions on society before they make those decisions in their own, real society.

This relates to governance in another way: the e-society, by being aware of negative patterns emerging within it, can flag and alert the leaders of the e-society so that they may try to steer society away from trouble.

I believe it is the next level in prediction markets. The key difference with respect to prediction markets, is that a self-aware e-society will be able to capture, recognize and predict the emergence of behavioral patterns that happen within it as opposed to simply predicting single-valued outcomes and ranges (without the ability to recognize and predict the patterns that could lead to those outcomes.) In other words, a self-aware e-society can predict the outcome of prediction markets running within it before prediction markets can make that prediction. That means that (given the ability to pre-predict and thus potentially avoid bad outcomes) the prediction markets can be real markets and not just simulations. So it would seem that the e-society application described here could run on top of society itself (i.e. no need for simulation.)

In other words, a self-aware e-society would act as a predictive governance tool for society itself.

Conclusion

I realize that it does sound very futuristic, but the idea is ‘technically’ compatible with the democratic governance ideals I had proposed for Web 2.0. In other words, in the ideal usage scenario, it should not supplant them. It should help society by monitoring it for dangerous trends so that the problems that would normally happen could be defused.

Think it’s sci-fi? It can be put together with existing technologies and expertise.

The implications of this idea extend to areas such as national security, economic security, cultural phenomena, political science, mass psychology and sociology.

But is it good or bad?

Any idea that can deliver a lot of power to someone and is realistic enough to be attempted will inevitably be developed by someone somewhere. So it’s better to put these ideas (be they good like the Wikipedia 3.0/Web 3.0 idea or potentially concern-causing like the Tagging idea or this idea) out there in the open and let people be aware of them and debate them.

Response to Readers’ Comments

Question:
Ian Delaney wrote: I wonder if machines are up to the job of identifying negative cults. After all, human judges seem to make a lot of very bad mistakes.

Response:
The leaders of society will still be the ones who make the judgment. The machine is a predictive tool to help society avoid the emergence of negative patterns (e.g. emergence of speculative market bubbles or emergence of cults, hype, etc.) It is the people who make the judgments, through their democratically elected leaders. The machine provides a cognitive layer below that.

Related

  1. Open Source Your Mind
  2. Tagging People in the Real World

Beats

  1. Soulenoid (Scream at the right time)

Tags:

Trends, wisdom of crowds, Startup, mass psychology, cult psychology, Web 2.0, Web 2.0, democracy, P2P, P2P 2.0, social networking, Web 2.5, governance, Internet governance, pattern recognition, non-linear feedback loop, neural network, prediction markets, e-society, national security, economy, political science, cultural phenomenon, AOL, NSA, wiretapping, civil liberties

Unwisdom of Crowds

In Uncategorized on July 7, 2006 at 8:15 am

Author: Marc Fawzi

License: Attribution-NonCommercial-ShareAlike 3.0

A Crowd Has No Wisdom

Before we make this argument, let’s define the types of crowds.

{The designations of ‘condensed’ and ‘dispersed’ given below for crowds are relative to the ability of the members of the crowd to communicate with each other and affect each other’s judgment.

The word “crowd” is used here to mean a large group of people, not 5 or 10 people but thousands or millions of people.}

A dispersed crowd (without a formal hierarchy) will produce averaged judgment. For example, asking each of 200 people (not at the same time or place) how many jelly beans are in a jar would result in an averaged judgment, which would cancel out values that are too high or too low, resulting in an estimate of the number of jelly beans in the jar (which is a measurable value) that is close to the actual value. In this case the crowd is nothing more than a decent statistical calculator. It has not exhibited any more wisdom than the tool it is being used as.
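To make the ‘statistical calculator’ point concrete, here is a minimal simulation sketch. It assumes (and this is my assumption, not something measured) that each person’s guess is independent and scattered evenly around the true count; the jar size and the spread of the guesses are made-up numbers.

```python
import random
import statistics


def crowd_estimate(true_count, n_people, spread, seed=0):
    """Simulate independent guesses around true_count and return (mean, guesses)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    guesses = [rng.gauss(true_count, spread) for _ in range(n_people)]
    return statistics.mean(guesses), guesses


# Hypothetical jar: 1,000 jelly beans, 200 people, individual guesses off by ~300.
mean_guess, guesses = crowd_estimate(true_count=1000, n_people=200, spread=300)
mean_error = abs(mean_guess - 1000)
worst_individual_error = max(abs(g - 1000) for g in guesses)
print(round(mean_error), round(worst_individual_error))
```

The averaged error comes out far smaller than the worst individual error, but nothing in the code is wiser than `statistics.mean`. If the guesses were not independent (the condensed crowd discussed next), the errors would stop canceling and the same calculation would no longer help.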

A condensed crowd (without a formal hierarchy) may produce averaged or lowest-common-denominator judgment, depending on whether or not its judgment is rationally or psychologically driven. In case the judgment is about a measurable value it would most likely be rationally driven, and, thus, be an averaged judgment. In case the judgment is about a quality it would most likely be psychologically driven, and thus, be a lowest-common-denominator judgment. In the rational case, the assumption is that, even though the crowd’s members can communicate with and affect each other’s judgment, if each member is rational enough and the judgment to be made concerns a measurable value then the crowd will likely produce an averaged judgment (i.e. the average of independent judgments.) If, however, the crowd members affect each other’s judgment (which would happen mostly in the case of judgments about quality rather than judgments about a measurable value, i.e. when reason is suspended and psychology takes over) then the crowd’s judgment will tend towards the lowest common denominator.

A typical crowd is a mix of both the dispersed and condensed crowds. Thus, its range of judgment with respect to both measurable value and quality includes both averaged and lowest-common-denominator judgments.

The problem with averaged judgment when it’s applied to quality (rather than measurable value), which can happen in a typical crowd, is that you end up with a judgment of average quality, not the best judgment.

The problem with lowest-common-denominator judgment when it’s applied to quality is that it uses the primitive part of our psychology. In other words, expect exactly the opposite of wisdom.

So when it comes to quality, a typical crowd is going to be either a judge of average quality or an unwise judge. And nothing else.

Where does that leave the ‘Wisdom of Crowds’ movement? (in the garbage bin of history in my candid opinion.)

Toward a Democratic Society

A hierarchy that doesn’t listen to the crowd (or that forces and manipulates the crowd to listen to it) is a dictatorship (e.g. North Korea, Iran, the 3rd Reich, etc.)

However, a mixed ‘hierarchical + crowd’ system, which ideally allows the crowd to adjust the judgment (of the system), is a democracy.

Therefore, Web 2.0’s [un]wisdom-of-crowds model needs to be fixed by adding the concept of a non-arbitrary hierarchy that is by the crowd (or people) and for the crowd (or people.)

Below is one example, using ‘digg’ as the Web 2.0 application, that shows a prototypical transformation from Web 2.0 to Web 2.5 (or from “hunter gatherer” to “democratic society.”)

Electing Leaders in a Democracy: Building the System

In an application like digg (or the “digg killer” to be exact) writers, content producers, social figures, business figures, and others, who are higher in the food chain than the consumer, and who are collectively referred to herein as ‘taste makers’, should be allowed to start their own channel (or page) where they list links they think are cool. If enough people ‘bookmark’ a given page then that means that the taste-maker in question is worthy of being positioned into the system’s hierarchy at a higher level than that of the consumer. The taste-makers can then rally their followers (those who use them as taste-makers) to digg the links the taste maker has chosen to put on his/her page.

This is similar to parliamentary democracy where members of the parliament have to get enough votes on a given issue from their district in order to pass it into law.

The key here is that the ‘trusted’ taste-makers get to decide which links to promote for votes from their followers.

At the same time, people in the crowd should be able to vote the taste-makers in or out of the system’s hierarchical structure by bookmarking or un-bookmarking their page.

Anyone who has followers can become a taste-maker, but they would have to replace an existing taste-maker as the system has a finite hierarchy with finite number of taste-maker positions (e.g. in the thousands.) And once someone is elected as a taste-maker they would stay in the role for a certain period before they can be voted in or out of the position by their followers (assuming another contender has nominated himself/herself for the position.)

This is a very simple ‘hierarchical + crowd’ system that implements a very simple form of leader-follower democratic process.
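The bookmark-based election described above can be sketched as a small data structure. This is a minimal sketch under my own assumptions: the class and method names are hypothetical, ties are broken alphabetically, and the fixed term of office mentioned above is left out to keep it short.

```python
from dataclasses import dataclass, field


@dataclass
class TasteMakerBoard:
    """A finite number of taste-maker seats, filled by bookmark counts."""
    seats: int
    bookmarks: dict = field(default_factory=dict)  # candidate -> set of followers

    def bookmark(self, follower, candidate):
        """A follower votes a candidate in by bookmarking their page."""
        self.bookmarks.setdefault(candidate, set()).add(follower)

    def unbookmark(self, follower, candidate):
        """A follower withdraws the vote by un-bookmarking the page."""
        self.bookmarks.get(candidate, set()).discard(follower)

    def elected(self):
        """Candidates holding seats: most followers first, ties broken by name."""
        ranked = sorted(self.bookmarks, key=lambda c: (-len(self.bookmarks[c]), c))
        return ranked[: self.seats]


board = TasteMakerBoard(seats=2)
for follower, candidate in [("u1", "alice"), ("u2", "alice"), ("u3", "bob"),
                            ("u1", "carol"), ("u2", "carol"), ("u3", "carol")]:
    board.bookmark(follower, candidate)
first = board.elected()   # carol (3 followers) and alice (2) hold the seats

board.unbookmark("u1", "carol")
board.unbookmark("u2", "carol")
second = board.elected()  # carol drops to 1 follower and loses her seat
print(first, second)
```

Because the number of seats is fixed, a newcomer can only get in by displacing someone, which is the ‘replace an existing taste-maker’ rule from the text.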

The peril of letting the crowd decide without giving it a democratic structure and process is that lowest-common-denominator and averaged judgments become the norm.

Leaders and Crowds need to work together within a democratic structure and process to assure the best judgment possible.

BTW, this is not much different from the process whereby the crowd selects its taste-makers (e.g. radio DJs, wise men, etc.) except this provides a structure to formalize the process, which would be too costly and time-consuming in the real world. So maybe this would also apply to how society elects its taste-makers (outside of social bookmarking sites.)

The reason this system would kill digg is that it would have an aggregate quality of judgment so much better than digg’s.

Related

  1. Web 2.0: Back to The Hunter Gatherer Society
  2. The Future of Governance

Tags:

Trends, wisdom of crowds, tagging, Startup, mass psychology, Google, cult psychology, Web 2.0, Web 2.0, digg, censorship, democracy, P2P, P2P 2.0, social bookmarking, social networking, Web 2.5

P2P AI Engines To Challenge Google in Web 3.0

In Uncategorized on July 6, 2006 at 9:57 am

This is a note (in case you missed it) about how in Web 3.0 (aka Semantic Web) P2P AI Engines running on users’ machines and working with standardized domain-specific ontologies will challenge Google’s dominance.

P2P AI Engines will challenge Google as well as any future AI-enabled version of Google.

Read more

Related

  1. The People’s Google
  2. Wikipedia 3.0: The End of Google?

Tags:

Semantic Web, Web strandards, Trends, OWL, innovation, Startup, Evolution, Google, inference engine, AI, ontology, Semanticweb, Web 2.0, Web 2.0, Web 3.0, Web 3.0, Google Base, artificial intelligence, AI, Wikipedia, Wikipedia 3.0, AI Engine, Danny Hillis, William Gibson, Thinking Machines, cellular automata, OWL-DL, AI Engine, The Matrix, AI Matrix, Global Brain

Digg Killer

In Uncategorized on July 6, 2006 at 4:29 am

The Google Crowd (has hierarchy)

In Uncategorized on July 5, 2006 at 1:06 am

For the context of this article, please see:

Unwisdom of Crowds

Last updated: 12/07/2008


Please see Ian Delaney’s well-written set of counter arguments at TwoPointTouch and the discussion that emerged under his comments section.

My reply to Ian’s argument re: Google’s PageRank being an implementation of the ‘wisdom of crowds’ model is that Google does not let the crowd judge the worthiness of a given link. It lets the writers, bloggers like Ian, myself, e-zines, news publishers, organizations, etc, i.e. the tastemakers in society (or the producers), who are linked to by many others, judge what is good and what is not. This is distinctly different from letting those who simply consume make the judgment. In the food chain, the producer or tastemaker comes before the consumer. That represents a non-arbitrary hierarchy on the level of the society that does not exist within a crowd. Thus, on the level of the society, the Google model does not rely on the wisdom of the ‘crowd’ but the wisdom of tastemakers and producers.

One important thing to note about the preceding argument is that it’s not any arbitrary producers that make up the ‘tastemakers’ layer (or crowd) within the hierarchy of society. The producers whose links to sites representing a given field (e.g. arts, music, science, etc) get valued higher by Google are those producers who have many people linking to them (i.e. other producers), which, if you follow the chain of links, leads us eventually to the first producers that appeared on the Web to write about that field, who had the time and leverage to build credibility among other tastemakers. So it’s the early adopters (for each given field), who tend to be the real tastemakers and leaders, who are the highest value producers, that determine who the high-value producers are. Having said that, high-value producers could appear out of nowhere. Such newcomers would get recognized as being high-value producers by receiving many incoming links from their peers.

Obviously, Google’s algorithm is more complex and robust than described above, but the purpose here is to show how Google’s PageRank is based on the averaged or lowest-common-denominator judgment of the tastemakers layer of society (which itself is a crowd) rather than the averaged or lowest-common-denominator judgment of an arbitrary crowd.
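The ‘follow the chain of links’ idea can be illustrated with the simplified, textbook form of the PageRank iteration. To be clear, this is a sketch of the published basic idea, not Google’s production algorithm; the page names and the link graph are hypothetical, and the damping value of 0.85 is the commonly quoted default rather than anything taken from this post.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank. links maps each page to the list of pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its rank over everyone.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical graph: only pages that publish links (producers) cast votes.
links = {
    "early_producer": ["newcomer"],
    "producer_a": ["early_producer"],
    "producer_b": ["early_producer"],
    "newcomer": ["early_producer"],
    "consumer_page": ["producer_a"],
}
ranks = pagerank(links)
top = max(ranks, key=ranks.get)
print(top, {page: round(score, 3) for page, score in ranks.items()})
```

In this toy graph the early producer ends up on top because other producers link to it, and the newcomer is lifted by a single link from that high-ranked producer. Someone who only reads and never links has no vote at all, which is the hierarchy the argument above is pointing at.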

The wisdom of a crowd (or lack thereof), in the case of the tastemakers layer of society, is going to result in lowest-common-denominator judgment only if their individual judgments are lumped together (as digg does with the judgment of its users.)

In a mixed ‘hierarchical + crowd’ system the individual judgments of the taste-makers can be seen by members of the crowd. The lumping together of individual judgments is what creates a crowd.

Thus, in a mixed ‘hierarchical + crowd’ system the taste makers are bound to exist as both unwise crowds as well as wise individuals.

A crowd can never be as wise as its wisest member or as foolish as its most foolish member.

Related

  1. Unwisdom of Crowds
  2. For Great Justice, Take Off Every Digg
  3. Digg This! 55,500 hits in ~4 Days
  4. Web 2.0: Back to the Hunter Gatherer Society

Tags:
Trends, wisdom of crowds, tagging, Startup, mass psychology, Google, cult psychology, Web 2.0, Web 2.0, digg, censorship

The Geek VC Fund Project: 7/02 Update

In Uncategorized on July 2, 2006 at 9:06 am

This post is an update to the original post about the Geek-Run, Geek-Funded Venture Capital Fund.

  1. The idea has gotten a fantastic reception.
  2. We’ve built a core team of experienced individuals that is working on the concept.
  3. We plan on gathering input from potential investors and entrepreneurs in the near future.
  4. We plan on announcing the location of our virtual collaboration space in the near future.
  5. If you’ve just joined us you may wish to add your feedback (see Comments)

More to come …

As always, feel free to contact me via email.

Tags:

Web 2.0, Web 2.0, venture capital, venture capital, VC, entrepreneur, funding, private equity, geek, seed funding, early stage, Startup

Digg This! 55,500 hits in ~4 Days

In Uncategorized on July 2, 2006 at 5:22 am

/*

(this post was last updated at 10:30am EST, July 3, ‘06, GMT +5)

This post is a follow up to the previous post For Great Justice, Take Off Every Digg

According to Alexa.com, the total penetration of the Wikipedia 3.0 article was ~2 million readers (who must have read it on other websites that copied the article)

*/

EDIT: I looked at the graph and did the math again, and as far as I can tell it’s “55,500 in ~4 days” not “55,000 in 5 days.” So that’s 13,875 page views per day.

Stats (approx.) for the “Wikipedia 3.0: The End of Google?” and “For Great Justice, Take Off Every Digg” articles:

These are to the best of my memory from each of the first ~4 days as verified by the graph.

33,000 page views in day 1 (the first wave)

* day 1 is almost one and a half columns on the graph not one because I posted it at ~5:00am and the day (in WordPress time zone) ends at 8pm, so the first column is only ~ 15 hours.

9,500 page views in day 2

5,000 page views in day 3

8,000 page views in day 4 (the second wave)

Total: 55,500 in ~4 days which is 13,875 page views per day (not server hits) for ~4 days. Now on the 7th day the traffic is expected to be ~1000 page views, unless I get another small spike. That’s a pretty good double-dipping long tail. If you’ve done better with digg let me know how you did it! :)
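For anyone who wants to check the math, the total and the daily average follow directly from the four daily figures listed above (treating the ~4 days as exactly 4):

```python
# Approximate page views per day, as listed above.
daily_page_views = {1: 33_000, 2: 9_500, 3: 5_000, 4: 8_000}

total = sum(daily_page_views.values())
average_per_day = total / len(daily_page_views)
print(total, average_per_day)  # 55500 13875.0
```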

Experiment

This post is a follow-up to my previous article on digg, where I explained how I had experimented and succeeded in generating 45,000 visits to an article I wrote in the first 3 days of its release (40,000 of which came directly from digg.)

I had posted an article on digg about a bold but well-thought out vision of the future, involving Google and Wikipedia, with the sensational title of “Wikipedia 3.0: The End of Google?” (which may turn out after all to be a realistic proposition.)

Since my previous article on digg I’ve found out that digg did not ban my IP address. They had deleted my account due to multiple submissions. So I was able to get back with a new user account and try another experiment: I submitted “AI Matrix vs Google” and “Web 3.0 vs Google” as two separate links for one article (which has since been given the final title of “Web 3.0: Basic Concepts.”)

Results

Neither ‘sensational’ title worked.

Analysis

I tried to rationalize what happened …

I figured that the crowd wanted a showdown between two major cults (e.g. the Google fans and the Wikipedia fans) and not between Google and some hypothetical entity (e.g. AI Matrix or Web 3.0).

But then I thought about how Valleywag was able to cleverly piggyback on my “Wikipedia 3.0: The End of Google?” article (which had generated all the hype) with an article having the dual title of “Five Reasons Google Will Invent Real AI” on digg and “Five Reasons No One Will Replace Google” on Valleywag.

They used AI in the title and I did the same in the new experiment, so we should both get lots of diggs. They got about 1300 diggs. I got about 3. Why didn’t it work in my case?

The answer is that the crowd is not a logical animal. It’s a psychological animal. It does not make mental connections as we do as individuals (because a crowd is a randomized population that is made up of different people at different times) so it can’t react logically.

Analyzing it from the psychological frame, I concluded that it must have been the Wikipedia fans who “dugg” my original article. The Google fans did “digg” it but not in the same large percentage as the Wikipedia fans.

Valleywag gave the Google fans the relief they needed after my article with its own article in defense of Google. However, when I went at it again with “AI Matrix vs Google” and “Web 3.0 vs Google” the error I made was in not knowing that the part of the crowd that “dugg” my original article were the Wikipedia fans not the Google haters. In fact, Google haters are not very well represented on digg. In other words, I found out that “XYZ vs Google” will not work on digg unless XYZ has a large base of fans on digg.

Escape Velocity

The critical threshold in the digg traffic generation process is to get enough diggs quickly enough, after submitting the post, to get the post on digg’s popular page. Once the post is on digg’s popular page both sides (those who like what your post is about and those who will hate you and want to kill you for writing it) will be affected by the psychological manipulation you design (aka the ‘wave.’) However, the majority of those who will “digg” it will be from the group that likes it. A lesser number of people will “digg” it from the group that hates it.

Double Dipping

I did have a strong second wave when I went out and explained how ridiculous the whole digg process is.

This is how the second wave was created:

I got lots of “diggs” from Wikipedia fans and traffic from both Google and Wikipedia fans for the original article.

Then I wrote a follow up on why “digg sucks” but only got 100 “diggs” for it (because all the digg fans on digg kept ‘burying’ it!) so I did not get much traffic to it from digg fans or digg haters (not that many of the latter on digg.)

The biggest traffic to it came from the bloggers and others who came to see what all the fuss was about as far as the original article. I had linked to the follow up article (on why I thought digg sucked) from the original article (i.e. like chaining magnets) so when people came to see what the fuss was all about with respect to the original article they were also told to check out the “digg sucks” article for context.

That worked! The original and second waves, which both had a long tail (see below) generated a total of 55,500 hits in ~4 days. That’s 13,875 page views a day for the first ~4 days.

Long Tail vs Sting

I know that some very observant bloggers have said that digg can only produce a sharp, short-lived pulse of traffic (or a sting), as opposed to a long tail or a double-dipping long tail, as in my case, but those observations are for posts that are not themselves memes. When you have a meme you get the long tail (or an exponential decay), and when you chain memes as I did (which I guess I could have done faster, in which case the second wave would have been much bigger) you get a double-dipping long tail as I’m having now.
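To make the “sting” versus “long tail” distinction a bit more concrete, here is a rough sketch that models each wave as an exponential decay and a double-dipping tail as the sum of two such waves. The start days, peak sizes and half-lives below are made-up illustration values, not measurements from this blog, and the function names are hypothetical.

```python
def wave(day, start, peak, half_life):
    """Hits per day contributed by one wave; zero before the wave starts."""
    if day < start:
        return 0.0
    # Traffic halves every `half_life` days after the wave begins.
    return peak * 0.5 ** ((day - start) / half_life)


def daily_hits(day, waves):
    """Total hits per day when several waves overlap (chained memes)."""
    return sum(wave(day, *w) for w in waves)


# Hypothetical parameters: (start day, peak hits/day, half-life in days).
STING = [(0, 30000, 0.25)]                      # sharp, short-lived pulse
DOUBLE_DIP = [(0, 30000, 1.0), (2, 8000, 1.5)]  # original wave + follow-up wave

for day in range(8):
    print(day, round(daily_hits(day, STING)), round(daily_hits(day, DOUBLE_DIP)))
```

With a short half-life the traffic is essentially gone after a day or two (the sting), while the longer half-life plus a second chained wave keeps the daily total noticeably above zero a week later.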

Today (which is 7 days after the original experiment) the traffic is over 800 hits so far, still on the strength of the original wave and the second wave (note that the flat line I had before the spike represents levels of traffic between ~100 and ~800, so don’t be fooled by the flatness, it’s relative to the scale of the graph.)

In other words, traffic is still going strong from the strength of the long-tail waves generated from the original post and the follow up one.


Links

  1. Wikipedia 3.0: The End of Google?
  2. For Great Justice, Take Off Every Digg
  3. Unwisdom of Crowds
  4. Self-Aware e-Society

Tags:
Semantic Web, Web standards, Trends, wisdom of crowds, tagging, Startup, mass psychology, Google, cult psychology, inference, inference engine, AI, ontology, Semanticweb, Web 2.0, Web 3.0, Google Base, artificial intelligence, Wikipedia, Wikipedia 3.0, collective consciousness, digg, censorship

Web 3.0: Basic Concepts

In Uncategorized on June 30, 2006 at 7:53 am


Notes

You may also wish to see Wikipedia 3.0: The End of Google?, the original ‘Web 3.0/Semantic Web’ article, and P2P 3.0: The People’s Google, a more extensive version of this article that discusses the implication of P2P Semantic Web Engines to Google.

Semantic Web Developers:

Feb 5, ‘07: The following reference should provide some context regarding the use of rule-based inference engines and ontologies in implementing the Semantic Web + AI vision (aka Web 3.0) but there are better, simpler ways of doing it.

  1. Description Logic Programs: Combining Logic Programs with Description Logic


Article

Semantic Web (aka Web 3.0): Basic Concepts

Basic Web 3.0 Concepts

Knowledge domains

A knowledge domain is something like Physics, Chemistry, Biology, Politics, the Web, Sociology, Psychology, History, etc. There can be many sub-domains under each domain each having their own sub-domains and so on.

Information vs Knowledge

To a machine, knowledge is comprehended information (aka new information that is produced via the application of deductive reasoning to existing information). To a machine, information is only data, until it is reasoned about.

Ontologies

For each domain of human knowledge, an ontology must be constructed, partly by hand and partly with the aid of dialog-driven ontology construction tools.

Ontologies are not knowledge nor are they information. They are meta-information. In other words, ontologies are information about information. In the context of the Semantic Web, they encode, using an ontology language, the relationships between the various terms within the information. Those relationships, which may be thought of as the axioms (basic assumptions), together with the rules governing the inference process, both enable and constrain the interpretation (and well-formed use) of those terms by the Info Agents, letting them reason their way to new conclusions from existing information, i.e. to think. In other words, theorems (formal deductive propositions that are provable based on the axioms and the rules of inference) may be generated by the software, thus allowing formal deductive reasoning at the machine level. And given that an ontology, as described here, is a statement of Logic Theory, two or more independent Info Agents processing the same domain-specific ontology will be able to collaborate and deduce an answer to a query, without being driven by the same software.

Inference Engines

In the context of Web 3.0, Inference engines will be combining the latest innovations from the artificial intelligence (AI) field together with domain-specific ontologies (created as formal or informal ontologies by, say, Wikipedia, as well as others), domain inference rules, and query structures to enable deductive reasoning on the machine level.

Info Agents

Info Agents are instances of an Inference Engine, each working with a domain-specific ontology. Two or more agents working with a shared ontology may collaborate to deduce answers to questions. Such collaborating agents may be based on differently designed Inference Engines and they would still be able to collaborate.
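As a toy illustration of two differently designed engines agreeing because they share one ontology, the sketch below gives two hypothetical agents the same tiny sub-domain hierarchy. One expands the set of ancestors iteratively, the other searches recursively. The ontology contents and all function names are assumptions made up for this example; real Semantic Web ontologies would be written in an ontology language such as OWL rather than a Python dict.

```python
# Hypothetical shared ontology: each term maps to its direct parent domains.
ONTOLOGY = {
    "Quantum Mechanics": {"Physics"},
    "Physics": {"Natural Science"},
    "Chemistry": {"Natural Science"},
    "Natural Science": {"Science"},
}


def iterative_agent(ontology, term, ancestor):
    """Engine A: grow the set of known ancestors until it stops growing."""
    known = set(ontology.get(term, ()))
    frontier = set(known)
    while frontier:
        parents = set()
        for t in frontier:
            parents |= ontology.get(t, set())
        frontier = parents - known  # only terms not seen before
        known |= frontier
    return ancestor in known


def recursive_agent(ontology, term, ancestor, _seen=None):
    """Engine B: depth-first search through parents, guarding against cycles."""
    seen = _seen if _seen is not None else set()
    if term in seen:
        return False
    seen.add(term)
    parents = ontology.get(term, set())
    if ancestor in parents:
        return True
    return any(recursive_agent(ontology, p, ancestor, seen) for p in parents)


query = ("Quantum Mechanics", "Science")
print(iterative_agent(ONTOLOGY, *query), recursive_agent(ONTOLOGY, *query))
```

The two functions share no code, only the ontology, yet they give the same answer to every “is X a sub-domain of Y?” query, which is the sense in which differently built agents can collaborate.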

Proofs and Answers

The interesting thing about Info Agents that I did not clarify in the original post is that they will be capable of not only deducing answers from existing information (i.e. generating new information [and gaining knowledge in the process, for those agents with a learning function]) but they will also be able to formally test propositions (represented in some query logic) that are made directly -or implied- by the user.
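The difference between deducing answers and testing a proposition can be sketched with a naive forward-chaining loop. This is only a minimal illustration under assumed names: the facts, the two rules, and the `deduce` and `holds` functions are all hypothetical, and a real engine would use a proper ontology language and query logic.

```python
# Hypothetical facts, stored as (subject, predicate, object) triples.
FACTS = {
    ("Cat", "subclass_of", "Mammal"),
    ("Mammal", "subclass_of", "Animal"),
    ("Tom", "instance_of", "Cat"),
}


def subclass_is_transitive(known):
    """If A is a subclass of B and B of C, then A is a subclass of C."""
    return {
        (a, "subclass_of", c)
        for (a, p1, b) in known if p1 == "subclass_of"
        for (b2, p2, c) in known if p2 == "subclass_of" and b2 == b
    }


def instances_inherit_classes(known):
    """If X is an instance of B and B is a subclass of C, X is an instance of C."""
    return {
        (x, "instance_of", c)
        for (x, p1, b) in known if p1 == "instance_of"
        for (b2, p2, c) in known if p2 == "subclass_of" and b2 == b
    }


RULES = [subclass_is_transitive, instances_inherit_classes]


def deduce(facts, rules):
    """Apply every rule repeatedly until no new fact can be derived."""
    known = set(facts)
    while True:
        new = set()
        for rule in rules:
            new |= rule(known) - known
        if not new:
            return known
        known |= new


def holds(proposition, facts, rules):
    """Test a proposition: is it derivable from the facts and rules?"""
    return proposition in deduce(facts, rules)


print(sorted(deduce(FACTS, RULES) - FACTS))            # newly deduced facts
print(holds(("Tom", "instance_of", "Animal"), FACTS, RULES))
```

Note that a `False` result here only means “not provable from what the agent knows”, which is weaker than “proven false”.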

“The Future Has Arrived But It’s Not Evenly Distributed”

Currently, Semantic Web (aka Web 3.0) researchers are working out the technology and human resource issues and people like Tim Berners-Lee, the father of the Web, are battling critics and enlightening minds about the coming human-machine revolution.

The Semantic Web (aka Web 3.0) has already arrived, and Inference Engines are working with prototypical ontologies, but this effort is a massive one, which is why I was suggesting that its most likely enabler will be a social, collaborative movement such as Wikipedia, which has the human resources (in the form of the thousands of knowledgeable volunteers) to help create the ontologies (most likely as informal ontologies based on semantic annotations) that, when combined with inference rules for each domain of knowledge and the query structures for the particular schema, enable deductive reasoning at the machine level.

Addendum

On AI and Natural Language Processing

I believe that the first generation of AI that will be used by Web 3.0 (aka Semantic Web) will be based on relatively simple inference engines that will NOT attempt to perform natural language processing, where current approaches still face too many serious challenges. However, they will still have the formal deductive reasoning capabilities described earlier in this article, and users would interact with these systems through some query language.

Related

  1. Wikipedia 3.0: The End of Google?
  2. P2P 3.0: The People’s Google
  3. All About Web 3.0
  4. Semantic MediaWiki

Tags:

Semantic Web, Web standards, Trends, OWL, innovation, Google, inference engine, AI, ontology, Web 2.0, Web 3.0, Google Base, artificial intelligence, Wikipedia, Wikipedia 3.0, collective consciousness, Ontoworld, AI Engine, OWL-DL, AI Matrix, Semantic MediaWiki, P2P

For Great Justice, Take Off Every Digg

In Uncategorized on June 28, 2006 at 8:30 pm

Author: Marc Fawzi

License: Attribution-NonCommercial-ShareAlike 3.0

/* This article presents the case against the ‘wisdom of crowds’ and explains the background for how the Wikipedia 3.0: The End of Google? article reached over 200,000 hits */

This article explains and demonstrates a conceptual flaw in digg’s service model that causes biased (or rigged) as well as lowest-common-denominator hype to be generated, causing a dumbing down of society (as the crowd). The experimental evidence and logic supplied here apply equally to other Web 2.0 social bookmarking services such as del.icio.us, and netscape beta.

Since digg is an open system where anyone can submit anything, user behavior has to be carefully monitored to make sure that people do not abuse the system. But given that the number of stories submitted each second is much larger than what Digg’s own staff can monitor, digg has given the power to the users to decide what is good content and what is bad (e.g. spam, miscategorized content, lame stuff, etc.)

This “wisdom of crowds” model, which forms the basis for digg, has a basic and major flaw at its foundation, not to mention at least one process and technology related issue in digg’s implementation of the model.

Let’s look at the simple process and technology issue first before we explore the much bigger problem at the heart of the “wisdom of crowds” model. If enough users report a post from a given site as spam then that site’s URL will be banned from digg, even if the site’s owner had no idea someone was submitting links from his site to digg. The fact is that digg cannot tell for sure whether the person submitting the post is the site’s owner or someone else, so their URL banning policy (or algorithm if it’s automated) must make the assumption that the site’s owner is the one submitting the post. But what if someone starts submitting posts from another person’s blog and placing them under the wrong digg categories just to get that person’s blog banned by digg?

This issue can be eliminated by improvements to the process and technology. [You may skip the rest of this paragraph if you can take my word for it.] For example, instead of banning a given site’s URL right away upon receiving X number of spam reports for posts from that site, the digg admins would put the site’s URL under a temporary ban and attempt to contact the site’s owner and possibly have the site owner click on a link in an email they’d send him/her to capture his/her IP address and compare it to that used by the spammer. If the IP addresses don’t match then they would ban the IP address of the spam submitter, and not the site’s URL. This obviously assumes that digg is able to automatically ban all known public proxy addresses (including known Tor addresses etc) at any given time, to force the users to use their actual IP addresses.
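The proposed process can be sketched as a small state machine. Everything here is hypothetical: the threshold value, the class and method names, and in particular the choice to make the site ban permanent when the owner’s IP matches the submitter’s, which the paragraph above implies but does not spell out. It is not a description of how digg actually works.

```python
from dataclasses import dataclass, field

SPAM_REPORT_THRESHOLD = 5  # hypothetical stand-in for "X number of spam reports"


@dataclass
class Moderation:
    temp_banned_sites: set = field(default_factory=set)
    banned_sites: set = field(default_factory=set)
    banned_ips: set = field(default_factory=set)

    def on_spam_reports(self, site, report_count):
        """Enough reports trigger only a temporary ban while the owner is contacted."""
        if report_count >= SPAM_REPORT_THRESHOLD:
            self.temp_banned_sites.add(site)

    def on_owner_verified(self, site, owner_ip, submitter_ip):
        """Called once the owner clicks the emailed link and their IP is captured."""
        self.temp_banned_sites.discard(site)
        if owner_ip == submitter_ip:
            # Assumption: the owner is the spammer, so the site ban becomes permanent.
            self.banned_sites.add(site)
        else:
            # Someone else submitted the posts: ban that IP, not the site.
            self.banned_ips.add(submitter_ip)


mod = Moderation()
mod.on_spam_reports("victim-blog.example", 7)
mod.on_owner_verified("victim-blog.example", "203.0.113.10", "198.51.100.99")
print(mod.banned_sites, mod.banned_ips)
```

As the paragraph notes, the IP comparison is only meaningful if submitters cannot hide behind public proxies.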

The bigger problem, however, and what I believe to be the deadliest flaw in the digg model is the concept of the wisdom of crowds. Crowds are not wise. Crowds are great as part of a statistical process to determine the perceived numerical value of something that can be quantified. A crowd, in other words, is a decent calculator of subjective quantity, but still just a calculator. You can show a crowd of 200 people a jar filled with jelly beans and ask each how many jelly beans are in the jar. Then you can take the average, and it will typically land closer to the actual number of jelly beans than most individual guesses do. However, if you were to ask a crowd of 200 million to evaluate taste or beauty or any other subjective quality, e.g. coolness, the averaging process that helps in the case of counting jelly beans (where members of the crowd use reasoning and don’t let others affect their judgment) doesn’t happen. What happens instead is that the crowd members (assuming they communicate with each other such that they affect each other’s qualitative judgment, or assuming they already share something in common) converge toward the lowest-common-denominator opinion. The logic for this is that reasoning is used in the case of estimating measurable values, while psychology is used in the case of judging quality. Thus, in the case of evaluating the subjective quality of a post submitted to digg, the crowd has no wisdom: it will always choose the lowest common denominator, whatever that happens to be.
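The jelly bean case can be simulated in a few lines. This sketch only illustrates the independence condition (what happens to the average when members stop judging on their own); it does not model taste or quality. The noise level, the anchor value and the `pull` factor are arbitrary assumptions chosen for illustration.

```python
import random
import statistics


def guesses(true_count, crowd_size, rng, anchor=None, pull=0.0):
    """Simulate one guess per crowd member.

    Each member's own guess is the true count plus 30% Gaussian noise.
    If `anchor` is given, every guess is pulled toward it by `pull`
    (0.0 = fully independent, 1.0 = everyone repeats the anchor).
    """
    result = []
    for _ in range(crowd_size):
        own = rng.gauss(true_count, 0.3 * true_count)
        if anchor is not None:
            own = (1.0 - pull) * own + pull * anchor
        result.append(own)
    return result


rng = random.Random(2006)  # fixed seed so the run is repeatable
TRUE_COUNT = 1000

independent = guesses(TRUE_COUNT, 200, rng)
herded = guesses(TRUE_COUNT, 200, rng, anchor=2000, pull=0.8)

crowd_error = abs(statistics.mean(independent) - TRUE_COUNT)
typical_individual_error = statistics.mean([abs(g - TRUE_COUNT) for g in independent])
herded_error = abs(statistics.mean(herded) - TRUE_COUNT)

print(f"independent crowd average is off by {crowd_error:.0f}")
print(f"a typical individual is off by {typical_individual_error:.0f}")
print(f"herded crowd average is off by {herded_error:.0f}")
```

When the guesses are independent the individual errors largely cancel in the average; once everyone is pulled toward one loud opinion, the average simply inherits that opinion’s error.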

To understand a crowd’s lack of rationality and wisdom, as a phenomenon, consider the following. I had written a post (see link at the end of this article) about the Semantic Web, domain-specific knowledge ontologies and Google as seen from a Google-centric view. I went on about how Google, using the Semantic Web and an AI-driven inference engine, would eventually develop into an omnipresent intelligence (a global mind) and how that would have far-reaching implications etc. The post was titled “Reality as a Service (RaaS): The Case for GWorld.” I submitted it to digg and I believe I got a few diggs and one good comment on it. That’s all. I probably got 500 hits in total on that post, and mostly because I used the word “GWorld” in the title. More than a week after that, I took the same post, the same idea of combining the Semantic Web, domain-specific knowledge ontologies and an AI-driven inference engine, but this time I pitted Wikipedia (as the most likely developer of knowledge ontologies) against Google, and posted it with the sensational but quite plausible title “Wikipedia 3.0: The End of Google.” The crowd went wild. I got over 33,000 hits in the first 24 hours, and as of the latest count about 1600 diggs. In fact, my blog on that day (yesterday) beat the #1 blog on WordPress, which is that of ex-Microsoft guy Scobleizer. And now I have an idea of how many hits he gets a day! He gets more than 10,000 and fewer than 25,000. I know because in the first 16 hours I was getting hit by massive traffic I managed to get ahead of him with a total of 25,000 hits, but in the last 8 hours of the first 24-hour cycle (for which I’m reporting the stats here) he beat me back to the #1 spot, as I only had 9,000 hits. I stayed at #2 though.

Figure 1: June 25 Traffic, the first 16 hours of a 24 hour graph cycle. Traffic from digg ~ 25,000 hits.

Figure 2: June 26 Traffic, the last 8 hours of a 24 hour graph cycle. Traffic from digg ~ 8,000 hits.

A crowd, not to be confused with individuals (like myself, yourself), aside from being a decent calculator of subjective quantities (like counting jelly beans in a jar) is no smarter than a bull when it comes to judging the intellectual, artistic or philosophical appeal of something. Wave something red in front of it or make a lot of noise and it may notice you. Talk to it or make subtle gestures and you’ll fail to get its attention. Obviously you can have a tame bull or an angry one. An angry one is easier to upset. A crowd is no more than a decent calculator of subjective quantities. It is a tool in that sense and only in that sense. In the context of judging quality, like musical taste or coolness of something, a crowd is neither rational nor wise. It will only respond to the most basic and crude methods of attention grabbing. You can’t grab its attention with subtlety or rationality. You have to use psychology, like you would with a bull. As you can see from the graphs of my blog traffic, I’ve proved it. I didn’t just understand it. Social bookmarking systems, and tagging in general, amplify the intensity of the crowd-as-a-bull behavior by attaching the highest numerical values to the most crude, the most raw and the lowest common denominator.

Now all of a sudden, when a post gets 100 diggs it reaches escape velocity and goes into orbit. The numerical value attached to posts (or the “diggs”), when it grows fast, acts like bait. People rush to see such posts just as they rushed in tens of thousands to see the “Wikipedia 3.0 vs Google” post. Yet it’s basically the same post as the one I did on GWorld over a week ago that only got a few diggs. There is no comparison between the wisdom and rationality of an individual and that of a crowd. The individual is infinitely wiser and more rational than the crowd.

So these social bookmarking systems need to be based on a more evolved model where individuals have as much say as the crowd. Remember that many failed social ideologies were based on the idea of favoring the so-called “wisdom of crowds” over individualism. They failed because collectivist behavior is dumb behavior and individual judgment is the only way forward. We need more individuality in society, not less.

Censored by digg

This post was censored by digg’s rating system. However, in a software-enabled rating system, such as digg, reddit, del.icio.us, netscape, etc, there is no way to guarantee that manipulation of the system by its owner does not happen. Please see the Update section below for the explanation and the evidence (in the form of a telling list of censored posts) behind why digg itself, and not just some of its fanatic users, may have been behind the censoring of this post.

Note: a fellow wordpress blogger published a post called Digg’s Ultimate Flow which links to this post. It has not been buried/censored yet (June 29, ‘06, 5:45pm EST). It’s not to be confused with this post. The reason it hasn’t been buried is because it presents no threat to digg. They can sense danger like an animal and I guess I’ve scared them enough to bury/censor my post. The other me-too post that I’ve just mentioned does not smell as scary. It’s really sad that digg and sites like it are feeding the crude animal-like, instinctive, zero-clarity behavior that is the ‘unwisdom’ of crowds.

The truth is that digg and other so-called “social” bookmarking sites do not give us power, they take it away from us. Always. Think. Innovate. Do not follow. But you may want to follow this link to share your view with other digg users for what it’s worth.

Correction: I’ve just noticed that this blog is ahead of Scobleizer again at #1. I’ve had 7,796 hits since 8:00pm EST, June 28, ‘06 (yesterday.) It’s 8:00pm EST now, on June 29, ‘06.

Related

  1. Wikipedia 3.0: The End of Google?
  2. Unwisdom of Crowds
  3. Reality as a Service (RaaS): The Case for GWorld
  4. Digg This! 55,500 Hits in ~4 Days

Update

The following is a snapshot of digg’s BURIED/CENSORED post section as of 4:00am EST, June 29th, ‘06. This post was originally titled “Digg’s Biggest Flaw Discovered.” Note that anything that is perceived as anti-digg, be it a bug report or a serious analysis of digg’s weaknesses, is being censored.

Digg’s Biggest Flaw Discovered buried story submitted by evolvingtrends 21 hours 35 minutes ago (via http://evolvingtrends.wordpres…) An actual proof of a major flaw at the foundation of digg’s quality-of-service model category: Programming

Now even CNET wants its stories endorsed by Digg community submitted by aj9702 1 day 17 hours ago (via http://news.com.com/Attack+cod…) Check it out.. CNET which is number 72 on Alexa rankings wants its stories endorsed by the Digg community. They have a digg this link now to their more popular stories. This story links to the news that exploit code is out there for the RRAS exploit announced earlier this month category: Tech Industry News

Dvorak: Understanding Digg and Its Utopian Idealism buried story submitted by kevinmtu 1 day 18 hours ago (via http://www.pcmag.com/article2/…) Dvorak’s PC magazine article on the new version of Digg and its flaws, posing many interesting points. For example, “What would happen to the Digg site if the Bush-supporting minions in the red states, flocked to Digg and actively promoted stories, slammed things they didn’t like, and in the process drove away the libertarian users?” category: Tech Industry News

Pros and Cons of Digg v3 submitted by jobobshishkabob 2 days ago (via http://thenerdnetworks.com/blo…) Well, Digg version 3 got released today. It is really nice and has many great features. But everything has its flaws…. heres a list of pros and cons of the new Digg.com category: Tech Industry News

Easy Digg comment moderation fraud buried story submitted by Pooley 2 days ago (via http://www.davidmcmanus.com/st…) I’ve found a bug in digg.com. A flaw in the way I ‘digg’ a comment, by clicking the thumbs up icon, allows me to mark up a comment multiple times. category: Tech Industry News

Wikipedia 3.0: The End of Google?

In Uncategorized on June 26, 2006 at 5:18 am

Author: Marc Fawzi

License: Attribution-NonCommercial-ShareAlike 3.0

Announcements:

Semantic Web Developers:

Feb 5, ‘07: The following external reference concerns the use of rule-based inference engines and ontologies in implementing the Semantic Web + AI vision (aka Web 3.0):

  1. Description Logic Programs: Combining Logic Programs with Description Logic (note: there are better, simpler ways of achieving the same purpose.)

Click here for more info and a list of related articles…

Foreword

Two years after I published this article it has received over 230,000 hits and we now have several startups attempting to apply Semantic Web technology to Wikipedia and knowledge wikis in general, including the Wikipedia founder’s own commercial startup as well as a startup that was recently purchased by Microsoft.

Recently, after seeing how Wikipedia’s governance is so flawed, I decided to write about a way to decentralize and democratize Wikipedia.

In August 2009, a little over 3 years after the writing of this landmark article, I wrote an update in response to a query by a journalist, titled Wikipedia 3.0: Three Years Later.

Spanish version

Article

(Article was last updated at 10:15am EST, July 3, 2006)

Wikipedia 3.0: The End of Google?


From Mediocre to Visionary

In Uncategorized on June 24, 2006 at 4:02 am

This post started out as a conscious attempt to start formalizing and clarifying Superhype: the phenomenon where skillful-but-otherwise-perfectly-mediocre individuals and companies go up in fame and popularity with such tantalizing hype as to leave those who are both much more skillful as well as much less mediocre wondering how they did it.

From the time I was exposed to Superhype I have known that it involves subtlety and design that go beyond what those at the heart of the phenomenon can intellectualize while they’re experiencing it. I’m not talking about Bill Gates or Michael Dell or anyone who consciously, deliberately and thoughtfully plans and executes successful strategies. I’m talking about those who get carried on top of a massive wave of hype because they happen to be in the right place at the right time doing something that can only be described as perfectly mediocre.

You can always seek an area of the market that has awesome hype waves and be ready to surf the biggest hype wave that comes your way. It’s a statistical process, which is not the same as leaving it up to luck. You try to maximize the chances of a major wave coming your way by knowing the market and knowing where to stand at any given time for the biggest wave possible. Or you can be a statistical oddity and get “lucky” without understanding why and how all of a sudden you seem to be riding the biggest wave in history, without even having the knowledge to surf it but getting carried on top of it anyway as if on a flying carpet. Speaking of riding major waves without knowing a thing about surfing, it actually happened to me once in Puerto Rico. I was trying to do what the cool kids were doing, so like them I stood behind a giant rock waiting for the next wave to hit. Well, as the wave approached they all ran off and left me standing there. The wave carried me for at least 100ft on top of sharp-edged volcanic rock but, fortunately, on a bed of water. When I finally landed people rushed to the scene and started yelling “loco! loco!” … That’s how I learned my Spanish. That is a perfect example of how one could surf the really big waves by chance without having any formal knowledge of surfing. Just be at the right place at the right time and have the courage to explore. The rest is up to the odds.

There are many examples of “Web 2.0” celebrities (both companies and individuals) who are currently surfing some big waves (pretty much on their behind as I did in Puerto Rico) without any insight into how to properly surf the hype wave they’re riding, yet they seem to be magically levitating above it on a carpet of thin air (again, like I did in Puerto Rico.) When they finally land, and land they will, we’ll all rush to the scene of their landing and yell “loco! loco!”

But for those of us who cannot delegate our success to a statistically odd event (as in being at the right place, the right time and being carried miraculously by a massive wave of hype simply due to curiosity and good luck) we must strive to understand how to find the big hype waves across time and hype space and how to properly surf them.

This is where the discussion changes from simple metaphors to a rigorous analysis of the temporal, social, psychological and power mechanics of hype.

And this is where I have to stop, as I’m in the middle of a learning process in each of those four frames of hype mechanics.

In a future post I’ll try to tackle the “power” frame in the context of the emerging New Web Order.

Tags:

Web 2.0, venture capital, VC, entrepreneur, geek, early stage, Startup, hype, Puerto Rico, surfing, waves, market, luck

The “Geek VC Fund” Project: 6/23 Update

In Uncategorized on June 23, 2006 at 5:21 pm

This post is an update to the original post about the Geek-Run, Geek-Funded Venture Capital Fund.

  1. Upon gathering further input from early supporters of the idea the emergent judgment is to host the Fund’s collaboration space on an independent Wiki as the project concerns other audiences (e.g. VCs, lawyers with interest in the subject, partners at boutique banks, academics, and others) besides the grassroots startup audience that is the core constituency.
  2. I will be announcing the location of the collaboration space for this project once it’s ready for general use by the community.
  3. There have been some more comments under the original post. If you’ve just joined us you may want to add your input (see Comments)

More to come …

Tags:

Web 2.0, venture capital, VC, entrepreneur, funding, private equity, geek, seed funding, early stage, Startup

The “Geek VC Fund” Project: 6/22 Update

In Uncategorized on June 22, 2006 at 8:33 am

This post is an update to the original post about the Geek-Run, Geek-Funded Venture Capital Fund.

  1. I’m discussing the creation of the fund community’s “idea development” space on top of an existing and growing online “startup” community that is attracting a good mix of business geeks (potential LPs, educators, business leaders) and tech geeks (potential entrepreneurs, developers, technology leaders.) I’ll keep you posted on this.
  2. I’m socializing the vision with several folks from the PE/VC business as well as a couple of folks from major universities (that have student run VC funds) so I may gather further input.
  3. I’ve further defined the vision for the fund based on readers’ early comments, some of which were very helpful. If you’ve just joined us you may want to add your input to the original post’s Comments.

Tags:

Web 2.0, venture capital, VC, entrepreneur, funding, private equity, geek, seed funding, early stage, Startup

Who 2.0: Tagging People in the Real World

In Uncategorized on June 21, 2006 at 11:15 pm

/*

Related Topic: Self-Aware e-Society

This post has been updated to include the concept of Standardized Tags.

*/

The Idea

This post presents a ubiquitous, passive method for tagging people in the real world with attributes that describe them (e.g. good, fun, smart, interesting, psycho, unreliable, untrustworthy, etc.; you get the idea) such that people can see how someone they’ve just met has been defined (or categorized) by others.

People will be able to make better choices about whom to associate with and whom to trust, in a real world setting. It allows the Web 2.0 “social networking” paradigm, which is currently confined to the Web, to be experienced in the real world.

From Conception to Production

The “Who 2.0” idea was suggested by a fellow WordPress blogger who goes by “farlane” in his comment to the “Hunter Gatherer” post, which he made about 3 hours ago [note that this post has been updated since.] The idea is also sort of related to what I had previously written in a post on “GWorld (beta)” regarding how objects (or people) within a virtual world may be tracked and identified with the virtual equivalent of the RFID tag. Links to those posts, farlane’s comment and farlane’s “about” page are provided at the end of this post.

The Design

It’s really very simple.

Today, we have camera phones that can take photos with 8-megapixel resolution. Such camera phones can produce a pretty detailed image of people’s faces (8 megapixels actually produce a large-sized image with more than enough detail.) Simply add facial recognition software, which exists today, and you have your tagging mechanism. Take a shot of someone’s face, add their name (optional) and tag them with words that describe them. Click send and the image and tags will be sent to a central database. When you meet someone and you want to find out how others think of them, simply take a shot of their face and send that as a query to the central database. The answer you get back would show each word/tag that has been used to define that person and how many people used each given tag. For example, 400 people think I’m funny and fun to be around while 3 think I’m a mutant ninja turtle. Who do you trust? Obviously, you can safely conclude based on the statistics that I’m not a turtle. The tag statistics in this context will help you make a good bet about the character and personality of someone you’ve just met, but it’s you (not the system) that makes the bet.

For example, if you look up my name (or look up my face) with your phone and find out that 10 people thought I was so very boring then you probably wouldn’t want to hang out with me. However, if at the same time 1000 people thought I was a fun lovin’ guy then you may want to take your chances and hang out with me. But what you can do to people, people can do to you, so I can look up your face or name and hedge my bet based on the tags I get back and the associated tag statistics.
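The central database boils down to a per-person tally of tags. In the rough sketch below, the hard part (facial recognition) is assumed away: a hypothetical `face_id` string stands in for whatever identifier a recognition step would return, and the `TagRegistry` class and its methods are made up for illustration.

```python
from collections import Counter, defaultdict


class TagRegistry:
    """Toy stand-in for the central database of people tags."""

    def __init__(self):
        # face_id -> Counter mapping each tag to the number of people who used it
        self._tags = defaultdict(Counter)

    def tag(self, face_id, *words):
        """Record one person's tags for the face they photographed."""
        self._tags[face_id].update(w.strip().lower() for w in words)

    def lookup(self, face_id):
        """Return (tag, count) pairs, most frequently used tag first."""
        return self._tags.get(face_id, Counter()).most_common()


registry = TagRegistry()
for _ in range(400):
    registry.tag("face-of-marc", "funny")
for _ in range(3):
    registry.tag("face-of-marc", "mutant ninja turtle")

print(registry.lookup("face-of-marc"))
```

The registry only reports the statistics; deciding what to make of 400 “funny” votes against 3 “mutant ninja turtle” votes is left to the person doing the lookup.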

This reminds me of that famous quote attributed to Abraham Lincoln: “You can fool all of the people some of the time, and you can even fool some of the people all the time, but you cannot fool all of the people all of the time.” So while I may not be fun all of the time (say I’m dull 1% of the time) I’m still fun most of the time (the other 99%), so you can make a safe bet that I’m fun to hang out with. But that’s just a simple example.

It gets more complicated as people describe the 20 or more possible personality and character attributes, some using different words than others. However, it cannot be any worse than following a link on del.icio.us or digg that has been tagged as funny by 1000 other people.

Standardized Tags

Several readers complained that tags are relative, not absolute, so they should not be used to judge people in an absolute way. Well, I never said the system should be used as an absolute measure of people’s personality and character, but since it could be potentially misused in that way, I figured that the tags, which form the basis of the system, should reflect the difference between people’s judgment and what one could statistically define as the “standard” judgment.

Instead of using regular tags the system would allow users to use weighted tags. For example, what I think is funny may be boring to 70% of the people. So my “funny” tag should count less than another person’s “funny” tag if that person has a much more mainstream sense of humor. To teach that to the system, users would go through an online feedback test where random scenes are shown and each user has to tell the system what he/she thinks of each scene by picking one of the available tags that describe the scene. Once that test has been done with a large enough population of users, those who come out in the middle of the curve for a given tag, e.g. funny, would have the most weight assigned to their use of that tag to describe others (i.e. 1.0) and those at either edge of the curve (at the limits of the reference range) would have the least weight assigned to their use of that tag to describe others (i.e. 0.0), with each user in between (on both sides of the curve) having his/her use of that tag assigned a weight between the minimum and the maximum, depending on how far they are from the median. This way when 1000 people think that someone is funny the system will add up the weights of their “funny” tags, so that if their senses of humor are tightly distributed around the median then the total score for their combined judgment will be close to 1000, but if they mostly have odd humor the total score will be much less, e.g. 100. This way if you discover that someone you’ve just met has been described with a “funny” score of 1000 then you can think of it as if 1000 people with “standard” judgment thought that person was funny. In other words, the total score would be given in units of standard judgment.
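A minimal sketch of the weighting, under a few assumptions the description leaves open: each user’s calibration result for a tag is reduced to a single number (here, how many of 10 test scenes they labelled “funny”), the weight falls off linearly with distance from the median, and `half_range` is a hypothetical name for the distance from the median to the edge of the reference range.

```python
import statistics


def tag_weights(calibration_scores, half_range):
    """Map each user to a weight in [0.0, 1.0] for one tag.

    Users at the median get 1.0; users `half_range` or more away get 0.0;
    users in between get a linearly interpolated weight.
    """
    median = statistics.median(calibration_scores.values())
    return {
        user: max(0.0, 1.0 - abs(score - median) / half_range)
        for user, score in calibration_scores.items()
    }


def standardized_score(taggers, weights):
    """Total score, in units of standard judgment, for everyone who used the tag."""
    return sum(weights.get(user, 0.0) for user in taggers)


# Hypothetical calibration: scenes out of 10 that each user labelled "funny".
FUNNY_CALIBRATION = {"ann": 5, "bob": 5, "cat": 5, "dan": 1, "eve": 9, "fay": 3}
weights = tag_weights(FUNNY_CALIBRATION, half_range=4)

print(weights)
print(standardized_score(["ann", "bob", "cat"], weights))  # mainstream taggers
print(standardized_score(["dan", "eve", "fay"], weights))  # odd-humor taggers
```

Three “funny” tags from mainstream users add up to a score of 3, while three from users far from the median add up to much less, which is the effect described above on a small scale.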

The statistical technique described here is very simple and it can be more elaborate, e.g. having additional user-behavior-based weighing factors.

A system like the one suggested here will allow us to make good choices (about whom we pick as our friends or business partners) quickly and reliably.

This could lead to a safer, happier and more productive society (well, at least in theory).

Links

  1. About farlane (the reader who suggested the idea.)
  2. Farlane’s comment where this idea was suggested.
  3. The “Hunter Gatherer” post that lured farlane to this blog.
  4. The post where tagging of objects (and people) in the [virtual] world was mentioned.

Tags:

Web 2.0, Web 2.0, Where 2.0, Where 2.0, social networking, Trends, Who 2.0, facial recognition, tagging, Startup

Standardized Tagging: Defining “Cool” in Standard Units of Judgment

In Uncategorized on June 21, 2006 at 11:11 pm

Problem

Current tagging systems suffer from one critical drawback: the lack of a judgment standard. In other words, if a photo on Flickr is tagged as “cool” by 1000 people then how do you know that you, personally, would find it cool? After all, as the saying goes, beauty is in the eye of the beholder.

Solution

One idea to address the intra- or inter-cultural variance in judgment among people is to establish a statistical standard. In a photo sharing site dedicated to art/photography students that standard would be different than in a site dedicated to a random population of people. However, you would have a rough/vague view of what to expect in terms of a judgment standard based on the description of the site (or the description of its average user.) That assumes (and only “assumes”) that the majority of art/photography students share roughly the same view when it comes to art/photography.

But in a photo sharing site (or any other type of tagging application) that is intended for the general population but where the actual user population is not exactly the general population, you would have no idea what to expect in terms of a judgment standard. Is it mainstream? Is it semi-mainstream? Is it quasi-mainstream? Is it a mix of highly varying random views? You couldn’t really tell until you’ve done a statistical analysis of the users’ views.

To do that, a large random sample of users (of a given system) would go through an online feedback test where random scenes are shown and each user tells the system what he/she thinks of each scene by picking one of the available tags that describe the scene. Once that test has been done with a large enough population of users, those who come out in the middle of the curve for a given tag, e.g. “cool,” would have the most weight assigned to their use of that tag (i.e. 1.0) and those on either side of the curve (within the reference range) would have the least weight assigned to their use of that tag (i.e. 0.0), with each user in between (on both sides of the curve) having his/her use of that tag assigned a weight between the maximum and the minimum, depending on how far they are from the median. This way, when 1000 people think that a photo is cool, the system will add up the weights of their “cool” tags, so that if their senses of what is cool are tightly distributed around the median then the total score for their combined judgment will be close to 1000, but if they mostly have odd taste the total score will be much less, e.g. 100. So if you come across a photo that has been described with a “cool” score of 1000 then you can think of it as if 1000 people with “standard” judgment thought that photo was cool. In other words, the total score would be given in units of standard judgment.
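
The weighting scheme can be sketched in a few lines of Python. This is only one possible reading of it, under assumptions that are mine: each user's test result for a tag is boiled down to a single number (here, the fraction of test scenes he/she tagged as “cool”), the weight falls off linearly with distance from the median, and the “reference range” defaults to the largest deviation seen in the test population. The names `tag_weights`, `standard_score` and `half_range` are made up for illustration.

```python
from statistics import median


def tag_weights(test_rates, half_range=None):
    """Map each user's test result for one tag to a weight in [0, 1].

    test_rates: {user: fraction of test scenes the user gave this tag}
                (a hypothetical way to summarize the feedback test).
    half_range: distance from the median at which the weight reaches 0.0;
                defaults to the largest deviation in the population.
    """
    mid = median(test_rates.values())
    deviations = {user: abs(rate - mid) for user, rate in test_rates.items()}
    if half_range is None:
        half_range = max(deviations.values())
    if half_range == 0:
        # Everyone agrees, so everyone has "standard" judgment.
        return {user: 1.0 for user in test_rates}
    # Linear fall-off: 1.0 at the median, 0.0 at the edge of the range.
    return {
        user: max(0.0, 1.0 - dev / half_range)
        for user, dev in deviations.items()
    }


def standard_score(taggers, weights):
    """Total score, in units of standard judgment, for one tagged item."""
    return sum(weights.get(user, 0.0) for user in taggers)


rates = {"ann": 0.5, "bob": 0.5, "cat": 0.25,
         "dan": 0.75, "eve": 0.0, "fay": 1.0}
weights = tag_weights(rates)
print(standard_score(["ann", "bob", "cat"], weights))  # 2.5
print(standard_score(["eve", "fay", "dan"], weights))  # 0.5
```

Three mainstream taggers add up to 2.5 units of standard judgment while three mostly odd ones add up to only 0.5, even though both photos got three raw “cool” tags. A real system would probably want a smoother curve than a straight line, but the bookkeeping would be the same.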

I believe that applications that use tagging should put each user through a taste test so that the system may continuously adjust its “standard” judgment.

The statistical technique described here is very simple and it can be made more elaborate, e.g. by adding user-behavior-based weighting factors.

Startup

It’s not only possible but essential that a new company emerge to provide a more reliable “tagging engine,” similar to how Google emerged to provide a more reliable search engine.

The issue that needs to be worked out is the interaction of the “tagging” concept (including this variation) with the coming Ontology-driven Semantic Web. Maybe the interaction would be in the form of an OWL inference engine that understands the current context and usage of tagging? I’m sure that more thoughts on this will emerge down the line.

Tags:

Web 2.0, Web 2.0, tags, tagging, Flickr, photo sharing, startup, Tagging Engine

Geek-Run, Geek-Funded Venture Capital Fund

In Uncategorized on June 21, 2006 at 10:00 pm

Author: Marc Fawzi

License: Attribution-NonCommercial-ShareAlike 3.0

(This post was last updated on June 22, ‘06, taking into consideration early input from the community, which you can find under Comments.)

For a while now, I’ve been toying with the idea of starting a cooperative venture capital fund where smart, sophisticated people (aka business geeks, e.g., ex Technical Directors, ex Chief Architects, ex CTOs, ex CEOs et al) come together to launch ideas into market.

For example, for Web 2.0 ideas, if we could have a large crowd of Web 2.0 business geeks then it would be quite possible to conduct private placements under an SEC securities law safe harbor for non-accredited –but sophisticated– investors (i.e. business geeks who could judge the risk associated with a given venture/idea that’s within their domain of expertise) such that we could seed the fund as a community. Those who participate would then collectively have as much power as any VC.

Obviously, this requires the involvement of legal counsel to structure such a cooperative venture capital fund so that it complies with securities law and state regulations. Luckily, I have access to lawyers who work for private equity (PE) investors, as well as enlightened, accredited investors who may see the value in supporting it. But this is not about starting yet another VC fund. This is about giving power to the entrepreneurs, just as the Web has given power to the producer and caused the middleman to adapt and innovate.

The VC industry is another socio-economic structure that will have to undergo a radical rethinking in the years to come (as the newspaper and the music industry are doing now) or risk losing in the long run. In this context, the fund would be a grassroots remaking of the early-stage funding process. In certain cases, the fund would partner with traditional VCs during the later stages of the ideas it launches.

The driving motivation is that a truly cooperative fund could be launched, supported and managed by a community of business geeks and angel investors that is much more grassroots in its makeup, scale and orientation than the current attempts, but still led by folks with experience and a track record. The thesis is that such a fund could serve the needs of the community on a much wider basis.

It’s definitely a good idea to look at what others, such as YCombinator, have done and improve on that, as I see many ways in which such funds can help a lot more entrepreneurs and launch more ideas while retaining quality of ideas and execution.

But we’ll get to work on that as soon as I set up a collaboration space for us to use in developing this “Geek VC Fund” idea.

In the meantime, please feel free to add your feedback under Comments.

Tags:

Web 2.0, Web 2.0, venture capital, venture capital, VC, entrepreneur, funding, private equity, YCombinator, geek, seed funding, early stage, Startup

Web 2.0: Back to the “Hunter Gatherer” Society

In Uncategorized on June 20, 2006 at 3:31 am

Author: Marc Fawzi

License: Attribution-NonCommercial-ShareAlike 3.0

~~

Fact: trusted individuals are once again the source of news in a society (bloggers)

Fact: word of mouth is once again how news spreads (viral marketing)

Fact: people once again hunt and gather in a group (del.icio.us)

Fact: people once again group things using words like small, big, happy, sad, funny, food rather than detailed hierarchical structures (tags)

Fact: impulsive consumption (i.e. “hunt and eat” or “click and enjoy”) and impulsive production (i.e. “less initial planning”, e.g. Google’s “betas”) are back in style.

Fact: once again, sharing between people cannot be explained with the strict concept of economic reciprocity and is being explained by the egalitarian and optimistic notion that what is good for all is good for one (YouTube, del.icio.us, etc.)

These are all traits of a hunter-gatherer society, i.e. a pre-agricultural society.

Tens of thousands of years of behavioral evolution wiped out in just a few years.

Human behavior and society have evolved for a reason. It may be that the Internet is simply freeing the hunter gatherer inside us, but I wonder if bringing out an ancient ingrained behavior will upset the equilibrium that was achieved through tens of thousands of years of behavioral evolution. I realize that the last statement sounds like the plot for Jurassic Park (the “hunter gatherer” in us as the suddenly reborn dinosaur ready to wreak havoc on modern-day socio-economic structures), but it’s a plausible suggestion given that the Web has already had a great disruptive effect on some industries, e.g. newspapers and soon the media hierarchy at large. Speaking of the media hierarchy, a hunter gatherer society is by definition incapable of supporting the concept of a formal, non-arbitrary social, economic or political hierarchy.

Is this where we’re headed? Should we expect the Web 2.0 hunter-gatherer behaviors identified above to make their way into society at large? And what effect will that have on the stability of our socio-economic system?

More relevantly, for the Enterprise 2.0 crowd, should we bring the hunter-gatherer behavior to the highly evolved socio-economic structure of the enterprise? (I can’t believe I just said “highly evolved” and “enterprise” in one sentence, but I’m speaking in relative terms here) Wouldn’t that be like bringing matter and anti-matter together? Won’t the two annihilate each other? Shouldn’t we try to adopt only those parts of the Web 2.0 paradigm that are compatible with the structures of the enterprise? Or how much change would be considered good change? And does the “hunter gatherer” based Web 2.0 paradigm represent progress or regression compared to what exists today in the enterprise?

These are good questions to chew on.

Related

  1. The Unwisdom of Crowds
  2. Wikipedia 3.0: The End of Google?
  3. Self-Aware e-Society
  4. Open Source Your Mind

Tags:

Web 2.0, Web 2.0, Anthropology, Trends, cultural anthropology, sharing, hunter gatherer, evolution, del.icio.us, YouTube, society, Web Evolution, hunter gatherer society, AJAX, file sharing, video sharing, behavioral economics, Enterprise 2.0

GoodSense: End World Hunger and Increase Blog Revenue with Google AdSense.

In Uncategorized on June 19, 2006 at 5:37 am

AdSense is a service by Google that delivers the ads you see on blogs, forums, information websites and, most regrettably, on spam sites. Tens or hundreds of millions of people view such ads each day. Some people click on the ads (or so I have been told) but most people, including myself, simply ignore them, and therein lies the opportunity!

Let’s say that Google sets up a system whereby AdSense users (i.e. the bloggers, forum administrators, and, yes, even spammers) may choose to allow ads to be displayed that, when clicked on, would deduct 10% from the payment due from Google to the AdSense user and send that amount to the advertiser’s favorite charity. The advertisers may choose to participate in this program and specify their favorite charity. The money comes out of the AdSense user’s payment, but the result would be a net gain rather than a loss. That’s because more people would click on the ads if they knew that by doing so they would be contributing to a worthy cause. I would. Many people would. In fact, since it makes so much economic sense to the AdSense user, you may find spammers (those who set up massively interlinked farms of pages and plaster Google AdSense ads all over them) opting to allow the 10% deduction, thus effectively doing good for a change instead of just evil. Advertisers who participate in such a program do so with the understanding that people clicking on their ads simply to help the advertiser’s chosen charity are potential consumers who may be interested in the advertiser’s products or services. People in general have a higher incentive to click on an ad for a product they may be interested in buying or finding out about (but not at that exact moment) if clicking on that ad would generate an immediate positive contribution to society (or the environment).
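
How many extra clicks does it take for the 10% deduction to pay for itself? Here is a back-of-the-envelope sketch in Python. The click counts, the per-click payment and the size of the uplift are made-up numbers, and the function names are mine; nothing here reflects how AdSense actually computes payments.

```python
def break_even_uplift(donation_share):
    """Fractional increase in clicks needed to offset the donated share.

    Keeping (1 - share) of each click's payment breaks even when
    (1 - share) * (1 + uplift) == 1.
    """
    return donation_share / (1.0 - donation_share)


def net_revenue(clicks, payment_per_click, donation_share=0.10, click_uplift=0.0):
    """Return (amount kept by the site owner, amount sent to charity).

    clicks:            clicks per period without the charity program
    payment_per_click: hypothetical payment due from Google per click
    click_uplift:      assumed fractional increase in clicks (a guess)
    """
    gross = clicks * (1.0 + click_uplift) * payment_per_click
    donated = gross * donation_share
    return gross - donated, donated


# Clicks must rise by roughly 11.1% before the site owner comes out ahead.
print(f"break-even uplift: {break_even_uplift(0.10):.1%}")

baseline, _ = net_revenue(1000, 0.25, donation_share=0.0)
kept, donated = net_revenue(1000, 0.25, donation_share=0.10, click_uplift=0.20)
print(f"baseline {baseline:.2f}, with program {kept:.2f}, to charity {donated:.2f}")
```

With these invented numbers, a 20% rise in clicks leaves the site owner about 8% ahead of the baseline and still sends money to charity. Whether a charity seal would really lift clicks by that much is exactly the premise of this post, not something the arithmetic can prove.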

So if the idea of helping end world hunger while increasing your blog’s revenue sounds good to you then feel free to bug Google about it …

Update:

There would have to be some kind of electronic seal or some other validation mechanism that tells users that a given AdSense link is participating in the charity program, so people who wouldn’t normally click on ads would click on those.

Tags:

Google, Google AdSense, Google AdWords, Trends, poverty, charity, world hunger, social innovation, hunger, Make Poverty History, society, philanthropy, spam, clickfraud

Reality as a Service (RaaS): The Case for GWorld

In Uncategorized on June 15, 2006 at 5:33 am


People keep asking what Web 3.0 is. I think maybe when you’ve got an overlay of scalable vector graphics – everything rippling and folding and looking misty – on Web 2.0 and access to a semantic Web integrated across a huge space of data, you’ll have access to an unbelievable data resource.


Tim Berners-Lee

Ready for GWorld?

Have you ever come up with a great domain name for a Web 2.0 application, personal blog, or online store only to find out that it and 2000 other variations (including dyslexic spellings) were already off the market?

Well, there is good news then! Virtual worlds, which include GWorld, a hypothetical future version of Google Earth where you can have an avatar and build stores, supermarkets or your own personal publishing house (the virtual world’s version of the humble blog), will not require you to register a domain. However, you will have to claim the land, or in the case of GWorld, pay Google a renewable license fee for the right to occupy the land for X number of years (a.k.a. a land lease.) You may also have to hire virtual world developers to build your house, hotel, store, etc. for you (using Google Sketchup, which already lets you build houses and other structures and place them on Google Earth) and most likely have Google ads integrated into the walls as doorways into other stores, publishing houses or bordellos.

Some of the scenarios in Google’s hypothetical future version of Google Earth, a.k.a. “GWorld (beta)”, may include:

  1. The ability to identify and track the location of all objects in the virtual world (as if each had a virtual RFID tag.)
  2. The ability to barter with real and/or virtual objects (interchangeably.) You can buy a real t-shirt on GWorld with virtual stuff you made or purchased from someone else (e.g. a nice roof for a house, a side wall, a portable mountain, etc.)
  3. The ability to break the law and get away with it a la Grand Theft Auto, except all radio stations in your stolen car will air Google-sponsored commercials.
  4. The ability to create your own wicked (or more civilized) version of the world, i.e. the ability to create your own world: not just your own forum or popular blog but your own world with your own genuine-looking castle, full of real people (incarnated as avatars) who become your loyal followers and click (or rather “knock”) very often on your Google ads (which you can already have in the 2D Web, but I bet it will be more satisfying when you can make them do that on demand or risk being left without shelter.)

With such a world of possibilities who needs the 2D Web anymore?

The Case for GWorld

But to “organize the [virtual] world’s information” more intelligently than is possible in the real world, we will first have to enter the Semantic Web phase. This is where all information on the Web would be put into a standard format (a declarative ontological language like OWL) which machines can use to build a view (or formal ontological model) of how the individual terms in the information relate to each other. Those relationships can be thought of as axioms (basic assumptions), which together with the rules of inference constrain the interpretation and well-formed use of these terms. Based on that, formal deductive propositions that are provable from the axioms and the rules of inference (i.e. theorems) may be generated by the software, thus allowing formal deductive reasoning at the machine level. So given that an ontology, as described here, is a statement of Logic Theory, two or more independent, machine-based information agents processing the same domain-specific ontology will be able to collaborate and deduce an answer to a query, without being driven by the same software.

In other words, in the Semantic Web individual machine-based agents (or a collaborating group of agents) will be able to understand and use information by translating concepts and deducing new information rather than just matching keywords.

Once machines can understand and use information in a standard way, the virtual world will never be the same. It will be possible to have a Google Buddy or many Google Buddies among your virtual AI-enhanced workforce, each having access to a different domain-specific comprehension space and all having access to the collective consciousness (read: Google as the virtual world’s omnipresent AI.) You’ll be able to ask your Google Buddy (or Buddies) to find you the nearest restaurant in your virtual neighborhood (which may be a loose replica of your real-world neighborhood) that serves Italian cuisine, even if the local restaurant nearest to you advertises itself as a pizza joint as opposed to an Italian restaurant. But that is just a simple example of the deductive reasoning machines will be able to perform on the information they have.
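
To make the pizza example concrete, here is a toy sketch in Python. It is not OWL and not a real inference engine: the facts are plain triples, the two rules are hand-written, and every name in it (`FACTS`, `infer`, `restaurants_serving`, the restaurants themselves) is made up for illustration. It only shows the kind of deduction that keyword matching cannot do.

```python
# Axioms: what the ontology says, plus what each restaurant advertises.
FACTS = {
    ("Pizza", "subClassOf", "ItalianCuisine"),
    ("Sushi", "subClassOf", "JapaneseCuisine"),
    ("LuigisPizzaJoint", "serves", "Pizza"),
    ("TokyoCorner", "serves", "Sushi"),
}


def infer(facts):
    """Apply two rules repeatedly until no new facts (theorems) appear.

    Rule 1: A subClassOf B and B subClassOf C  =>  A subClassOf C
    Rule 2: X serves A     and A subClassOf B  =>  X serves B
    """
    known = set(facts)
    while True:
        derived = set()
        for (s, p, o) in known:
            for (s2, p2, o2) in known:
                if s2 != o or p2 != "subClassOf":
                    continue
                if p in ("subClassOf", "serves"):
                    derived.add((s, p, o2))
        if derived <= known:
            return known
        known |= derived


def restaurants_serving(cuisine, facts):
    """Answer the query using deduced facts, not just advertised ones."""
    return sorted(s for (s, p, o) in infer(facts)
                  if p == "serves" and o == cuisine)


print(restaurants_serving("ItalianCuisine", FACTS))  # ['LuigisPizzaJoint']
```

The pizza joint never mentions the word “Italian” anywhere, yet it answers the query because the machine deduces it from the axiom that pizza is a kind of Italian cuisine.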

I believe that the advent (on the Web scale) of this already existing machine reasoning capability is going to make the case for doing business in the aforementioned “Google Earth + Sketchup + Semantic Web” enabled virtual world far more compelling than in the real world or the current non-semantic 2D Web, and that is because every object that exists in such a virtual world will automatically be within the comprehension space of the machine-as-your-Google-Buddy! That sort of awesome power (i.e. the ability to access/query the collective consciousness of the universe at will and with precision) combined with user-generated “design it and they will come” 3D environments will make a powerful case for the move out of the current 2D Web and into the virtual world.

So if you thought Web 2.0 was exciting and Web 3.0 (Semantic Web) was going to be powerful then wait till you see what Web 4.0 has in store …

Tags:

Google Earth, Virtual World, Semantic Web, Web standards, Trends, Sketchup, OWL, 3D Web, innovation, RFID, Startup, Evolution, Google, GData, inference, inference engine, AI, ontology, Game Design, Semanticweb, Web 2.0, Web 2.0, gworld

Comparison of Peer-Assisted Digital Content Distribution Solutions

In Uncategorized on May 21, 2006 at 9:21 am

Two products, namely BitTorrent and SwarmStream, and one research project, namely, Microsoft’s “Avalanche” project, have been included in this up-to-date analysis.

The following is a summary of my up-to-date conclusion regarding each of the above mentioned products, provided here in pros vs cons format, for brevity.

BitTorrent

Pros:

1. Open Protocol (can be replicated freely by commercial applications)

2. Allows users to download as well as publish

3. Uses tried and true algorithms; no experimental or easy-to-abuse features

4. Open source; independently maintainable & evolvable (GPL-like license)

Cons:

1. No streaming support

2. Allows users to publish content (may be undesirable for copyright owners)

3. Allows users to adjust bandwidth limits (slows down the network)

SwarmStream

Pros:

1. Supports streaming

2. Works with any browser over HTTP port; works with direct links, no .torrent files required

3. Does not allow the user to publish (may be desirable for copyright owners)

4. Does not allow users to adjust bandwidth usage (maintains maximum network speed)

5. Available as a Java pluggable protocol handler (known as SwarmStream Public Edition); offers functionality transparently to Java developers. No license fee for open source and commercial use. This is limited in some ways, especially scalability, compared to the full commercial version. Less appealing than using a BitTorrent Java library, but no such publicly available library exists yet.

Cons:

1. Proprietary protocol (always loses to open, commodity protocols in the long run)

2. Commercial version license ($25K)

3. Implements experimental features (potentially insecure or unproven); tries to do too much (which leaves a lot of room for performance and security problems in unexpected network topologies and under unexpected network conditions)

Microsoft Avalanche (research project)

Pros:

As vaporware, it has no pros.

Cons:

“Randomized Network Coding” is an idea that has popped up fairly recently within the Information Theory field. Its purpose is to outperform network routing in terms of how much of a network’s total potential bandwidth can be utilized. It attempts to do so by essentially transforming the problem from a routing problem into an encoding problem. As it turned out, the encoding required at each node costs a great deal in CPU and memory usage when dealing with very large files (e.g. movie files) and/or a large number of blocks (reference), which slows down the network dramatically.
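
To give a feel for where the CPU and memory cost comes from, here is a toy sketch of random linear coding in Python. It is my own illustration, not Avalanche’s implementation: it works over GF(2), so “combining” blocks is just XOR, and the choice of field, the packet layout and all function names are assumptions made for the example.

```python
import random


def xor_bytes(a, b):
    """XOR two equal-length byte strings (addition over GF(2))."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(blocks, rng):
    """Build one coded packet: (coefficient bitmask, XOR of chosen blocks)."""
    n = len(blocks)
    mask = rng.randrange(1, 1 << n)      # random nonzero coefficient vector
    payload = bytes(len(blocks[0]))      # all-zero block to start from
    for i in range(n):
        if (mask >> i) & 1:
            payload = xor_bytes(payload, blocks[i])
    return mask, payload


def decode(packets, n):
    """Gaussian elimination over GF(2).

    Returns the n original blocks, or None while the received packets
    do not yet span all n blocks.
    """
    pivots = {}                          # pivot bit -> (mask, payload)
    for mask, payload in packets:
        # Remove every already-known pivot from the incoming packet.
        for bit, (pmask, ppayload) in pivots.items():
            if (mask >> bit) & 1:
                mask ^= pmask
                payload = xor_bytes(payload, ppayload)
        if mask == 0:
            continue                     # packet carried no new information
        bit = mask.bit_length() - 1      # new pivot: highest remaining bit
        # Clear the new pivot bit from the rows we already hold.
        for b, (pmask, ppayload) in list(pivots.items()):
            if (pmask >> bit) & 1:
                pivots[b] = (pmask ^ mask, xor_bytes(ppayload, payload))
        pivots[bit] = (mask, payload)
    if len(pivots) < n:
        return None
    return [pivots[i][1] for i in range(n)]


rng = random.Random(0)
blocks = [b"abcd", b"efgh", b"ijkl", b"mnop"]
packets, decoded = [], None
while decoded is None:                   # keep receiving until decodable
    packets.append(encode(blocks, rng))
    decoded = decode(packets, len(blocks))
print(decoded == blocks, "after", len(packets), "packets")
```

Every coded packet touches up to n whole blocks, and decoding needs on the order of n squared block-sized XORs plus enough memory to hold the partially solved system. That is tolerable for four tiny blocks and a very different story for a movie split into thousands of them.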

Due to the gap between theory and real life and the inherent complexity of peer-assisted content distribution, it is generally wiser to rely on tried and true algorithms than on the latest and greatest theoretical research and simulations.
—-

Common concerns and issues:

- Complexity

- Security

- Manageability (decentralized network)

- Lack of locality (which frustrates caching schemes) increases transit costs for ISPs. This is being tackled by companies like Cachelogic, which cache P2P content as it travels through the network and localize it. More on this can be found in this press release on BitTorrent.com.

- Asymmetric-traffic (traffic engineering)

- Variable bandwidth, peers come and go

- Need for more sophisticated peer-assisted content distribution algorithms

- Need for dynamic elimination of malicious peers from client’s peer list (easy to implement but so far missing)

Tags:

BitTorrent, SwarmStream, digital content distribution, Digital Content Download, P2P, Video On Demand, Trends, VoD, Internet Video On Demand, P2P, P2P research, P2P applications, P2P IPTV