Archive for June, 2006

Web 3.0: Basic Concepts

In Uncategorized on June 30, 2006 at 7:53 am


You may also wish to see Wikipedia 3.0: The End of Google?, the original ‘Web 3.0/Semantic Web’ article, and P2P 3.0: The People’s Google, a more extensive version of this article that discusses the implications of P2P Semantic Web engines for Google.

Semantic Web Developers:

Feb 5, ’07: The following reference should provide some context regarding the use of rule-based inference engines and ontologies in implementing the Semantic Web + AI vision (aka Web 3.0), but there are better, simpler ways of doing it.

  1. Description Logic Programs: Combining Logic Programs with Description Logic


Semantic Web (aka Web 3.0): Basic Concepts

Basic Web 3.0 Concepts

Knowledge domains

A knowledge domain is a field such as Physics, Chemistry, Biology, Politics, the Web, Sociology, Psychology or History. Each domain can have many sub-domains, each with its own sub-domains, and so on.

Information vs Knowledge

To a machine, knowledge is comprehended information (i.e. new information produced by applying deductive reasoning to existing information). To a machine, information is only data until it is reasoned about.


For each domain of human knowledge, an ontology must be constructed, partly by hand and partly with the aid of dialog-driven ontology construction tools.

Ontologies are neither knowledge nor information. They are meta-information; in other words, ontologies are information about information. In the context of the Semantic Web, they encode, using an ontology language, the relationships between the various terms within the information. Those relationships, which may be thought of as the axioms (basic assumptions), together with the rules governing the inference process, both enable and constrain the interpretation (and well-formed use) of those terms by the Info Agents, allowing the agents to reason new conclusions from existing information, i.e. to think. In other words, the software may generate theorems (formal deductive propositions that are provable from the axioms and the rules of inference), thus allowing formal deductive reasoning at the machine level. And given that an ontology, as described here, is a statement of a logical theory, two or more independent Info Agents processing the same domain-specific ontology will be able to collaborate and deduce an answer to a query without being driven by the same software.
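To make the axioms-plus-rules idea concrete, here is a minimal Python sketch, assuming a toy ontology stored as (subject, predicate, object) triples and just two hypothetical inference rules: subClassOf is transitive, and an instance of a class is also an instance of its superclasses. Forward chaining applies the rules until nothing new can be derived; whatever is derived beyond the original facts plays the role of the "theorems" described above. The class names and the function name are illustrative only; a real engine would read an ontology language such as OWL rather than bare tuples.

```python
# Minimal forward-chaining sketch over a toy ontology (illustrative only).
Triple = tuple[str, str, str]

def forward_chain(facts: set[Triple]) -> set[Triple]:
    """Apply two hypothetical rules until a fixpoint is reached:
    1. (A subClassOf B) and (B subClassOf C)  =>  (A subClassOf C)
    2. (x type A)       and (A subClassOf B)  =>  (x type B)
    """
    known = set(facts)
    while True:
        new: set[Triple] = set()
        for (a, p1, b) in known:
            for (c, p2, d) in known:
                if b != c or p2 != "subClassOf":
                    continue
                if p1 == "subClassOf":
                    new.add((a, "subClassOf", d))  # rule 1
                elif p1 == "type":
                    new.add((a, "type", d))        # rule 2
        if new <= known:  # nothing new was derived: fixpoint reached
            return known
        known |= new

# Axioms of a tiny physics ontology plus one piece of instance data.
facts = {
    ("Electron", "subClassOf", "Lepton"),
    ("Lepton", "subClassOf", "Fermion"),
    ("e1", "type", "Electron"),
}
result = forward_chain(facts)
theorems = result - facts  # conclusions not stated in the input
for t in sorted(theorems):
    print(t)
```

Any engine that implements the same two rules, however it is built internally, derives the same three conclusions from these facts; that is the sense in which agents sharing an ontology can collaborate without sharing software.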

Inference Engines

In the context of Web 3.0, inference engines will combine the latest innovations from the field of artificial intelligence (AI) with domain-specific ontologies (created as formal or informal ontologies by, say, Wikipedia, as well as by others), domain inference rules, and query structures to enable deductive reasoning at the machine level.

Info Agents

Info Agents are instances of an Inference Engine, each working with a domain-specific ontology. Two or more agents working with a shared ontology may collaborate to deduce answers to questions, even if they are based on differently designed Inference Engines.

Proofs and Answers

The interesting thing about Info Agents that I did not clarify in the original post is that they will be capable not only of deducing answers from existing information (i.e. generating new information [and gaining knowledge in the process, for those agents with a learning function]) but also of formally testing propositions (represented in some query logic) that the user makes directly or implies.
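Testing a proposition, rather than deriving everything, can be done goal-first. The following is a hypothetical backward-chaining prover in Python: given a proposition such as "e1 is a Fermion", it searches the subclass axioms for a chain that supports it and returns that chain as a readable proof, or None if the proposition cannot be proved from the axioms. The toy facts and the names `subclass_chain` and `prove_type` are assumptions for illustration, not part of any particular engine.

```python
from typing import Optional

# Toy ontology (illustrative): subclass axioms and instance assertions.
SUBCLASS_OF = {("Electron", "Lepton"), ("Lepton", "Fermion"), ("Photon", "Boson")}
INSTANCE_OF = {("e1", "Electron"), ("g1", "Photon")}

def subclass_chain(cls: str, goal: str, visited: frozenset = frozenset()) -> Optional[list[str]]:
    """Return the list of classes leading from `cls` up to `goal`, or None."""
    if cls == goal:
        return [cls]
    if cls in visited:  # guard against cyclic axioms
        return None
    for sub, sup in SUBCLASS_OF:
        if sub == cls:
            rest = subclass_chain(sup, goal, visited | {cls})
            if rest is not None:
                return [cls] + rest
    return None

def prove_type(individual: str, goal: str) -> Optional[list[str]]:
    """Test the proposition 'individual is a goal'; return its proof steps or None."""
    for inst, cls in INSTANCE_OF:
        if inst != individual:
            continue
        chain = subclass_chain(cls, goal)
        if chain is not None:
            steps = [f"{individual} type {cls}"]
            steps += [f"{a} subClassOf {b}" for a, b in zip(chain, chain[1:])]
            return steps
    return None

print(prove_type("e1", "Fermion"))  # a three-step proof
print(prove_type("g1", "Fermion"))  # None: not provable from these axioms
```

Because the answer depends only on the shared axioms, an engine that works bottom-up over the same axioms would accept and reject the same propositions; only the search strategy differs.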

“The Future Has Arrived But It’s Not Evenly Distributed”

Currently, Semantic Web (aka Web 3.0) researchers are working out the technology and human resource issues, and people like Tim Berners-Lee, the father of the Web, are battling critics and enlightening minds about the coming human-machine revolution.

The Semantic Web (aka Web 3.0) has already arrived, and Inference Engines are working with prototype ontologies, but the effort is a massive one. That is why I suggested that its most likely enabler will be a social, collaborative movement such as Wikipedia, which has the human resources (in the form of thousands of knowledgeable volunteers) to help create the ontologies (most likely as informal ontologies based on semantic annotations). Combined with inference rules for each domain of knowledge and the query structures for the particular schema, those ontologies enable deductive reasoning at the machine level.


On AI and Natural Language Processing

I believe that the first generation of AI used by Web 3.0 (aka Semantic Web) will be based on relatively simple inference engines that will NOT attempt to perform natural language processing, where current approaches still face too many serious challenges. However, these engines will still have the formal deductive reasoning capabilities described earlier in this article, and users will interact with them through some query language.


  1. Wikipedia 3.0: The End of Google?
  2. P2P 3.0: The People’s Google
  3. All About Web 3.0
  4. Semantic MediaWiki


Semantic Web, Web standards, Trends, OWL, innovation, Google, inference engine, AI, ontology, Web 2.0, Web 3.0, Google Base, artificial intelligence, Wikipedia, Wikipedia 3.0, collective consciousness, Ontoworld, AI Engine, OWL-DL, AI Matrix, Semantic MediaWiki, P2P


For Great Justice, Take Off Every Digg

In Uncategorized on June 28, 2006 at 8:30 pm

Author: Marc Fawzi

Twitter: http://twitter.com/#!/marcfawzi

License: Attribution-NonCommercial-ShareAlike 3.0


/* This article presents the case against the ‘wisdom of crowds’ and explains the background for how the Wikipedia 3.0: The End of Google? article reached over 200,000 hits */

This article explains and demonstrates a conceptual flaw in digg’s service model that generates biased (or rigged) as well as lowest-common-denominator hype, dumbing down society (as the crowd). The experimental evidence and logic presented here apply equally to other Web 2.0 social bookmarking services such as del.icio.us and Netscape beta.

Since digg is an open system where anyone can submit anything, user behavior has to be carefully monitored to make sure that people do not abuse the system. But given that the number of stories submitted each second is much larger than what digg’s own staff can monitor, digg has given users the power to decide what is good content and what is bad (e.g. spam, miscategorized content, lame stuff, etc.).

This “wisdom of crowds” model, which forms the basis for digg, has a major flaw at its foundation, not to mention at least one process- and technology-related issue in digg’s implementation of the model.

Let’s look at the simple process and technology issue first, before we explore the much bigger problem at the heart of the “wisdom of crowds” model. If enough users report a post from a given site as spam, then that site’s URL will be banned from digg, even if the site’s owner had no idea someone was submitting links from his site to digg. The fact is that digg cannot tell for sure whether the person submitting the post is the site’s owner or someone else, so its URL-banning policy (or algorithm, if it’s automated) must assume that the site’s owner is the one submitting the post. But what if someone starts submitting posts from another person’s blog and placing them under the wrong digg categories just to get that person’s blog banned by digg?

This issue can be eliminated by improvements to the process and technology. [You may skip the rest of this paragraph if you can take my word for it.] For example, instead of banning a given site’s URL right away upon receiving X number of spam reports for posts from that site, the digg admins could put the site’s URL under a temporary ban and contact the site’s owner, for example by emailing him/her a link that, when clicked, captures the owner’s IP address so it can be compared with the one used by the spam submitter. If the IP addresses don’t match, they would ban the spam submitter’s IP address, not the site’s URL. This obviously assumes that digg is able to automatically ban all known public proxy addresses (including known Tor addresses, etc.) at any given time, forcing users to submit from their actual IP addresses.
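The review process proposed above could be sketched as a small decision function. This is a hypothetical Python illustration, not digg’s actual logic: `SpamCase` and `review` are invented names, `threshold` stands in for the unspecified "X number of spam reports", `owner_ip` represents the address captured when the site owner clicks the emailed link (None until the owner replies), and the proxy list is assumed to be maintained elsewhere.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SpamCase:
    site_url: str
    spam_reports: int
    submitter_ip: str
    # IP captured when the site owner clicks the emailed confirmation link;
    # None while the owner has not responded (hypothetical mechanism).
    owner_ip: Optional[str] = None

def review(case: SpamCase, threshold: int, known_proxies: set[str]) -> str:
    """Decide what to ban, following the temporary-ban-then-compare-IPs process."""
    if case.submitter_ip in known_proxies:
        # Assumes public proxies (e.g. Tor) are blocked so users submit from real IPs.
        return "reject submission from known proxy"
    if case.spam_reports < threshold:
        return "no action"
    if case.owner_ip is None:
        return f"temporarily ban {case.site_url}; awaiting owner"
    if case.owner_ip == case.submitter_ip:
        # The owner really is the one submitting the spam: ban the site's URL.
        return f"ban site {case.site_url}"
    # Someone else submitted the posts: ban the submitter, not the site.
    return f"ban IP {case.submitter_ip}; lift ban on {case.site_url}"

PROXIES = {"198.51.100.7"}  # documentation-range address standing in for a proxy list
print(review(SpamCase("blog.example", 12, "203.0.113.5", "192.0.2.9"), 10, PROXIES))
```

Returning plain strings keeps the sketch short; a real moderation system would more likely return structured actions and queue the temporary bans for the admins to follow up on.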

The bigger problem, however, and what I believe to be the deadliest flaw in the digg model, is the concept of the wisdom of crowds. Crowds are not wise. Crowds are great as part of a statistical process that determines the perceived numerical value of something that can be quantified. A crowd, in other words, is a decent calculator of subjective quantity, but still just a calculator. You can show a crowd of 200 people a jar filled with jelly beans and ask each of them how many jelly beans are in the jar. If you then take the average, it will usually come closer to the actual number of jelly beans than most of the individual guesses do. However, if you were to ask a crowd of 200 million to evaluate taste, beauty or any other subjective quality, e.g. coolness, the averaging process that helps when counting jelly beans (where members of the crowd use reasoning and don’t let others affect their judgment) doesn’t take place. Instead, the crowd members (assuming they communicate with each other such that they affect each other’s qualitative judgment, or assuming they already share something in common) converge toward the lowest-common-denominator opinion. The reason is that reasoning is used when estimating measurable values, while psychology is used when judging quality. Thus, when it comes to evaluating the subjective quality of a post submitted to digg, the crowd has no wisdom: it will always choose the lowest common denominator, whatever that happens to be.
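The contrast between independent and mutually influenced judgments can be illustrated with a few made-up jelly-bean guesses. In this Python sketch the numbers are invented for illustration: averaging six independent guesses lands closer to the true count than even the best single guess, because their errors partly cancel. The hypothetical `herd` function models social influence by pulling every guess halfway toward one loud member's estimate, and that shared bias survives the averaging.

```python
from statistics import mean

TRUE_COUNT = 500  # actual number of jelly beans (made-up example)

# Independent guesses: each person estimates without hearing the others.
independent = [350, 620, 480, 700, 410, 540]

def herd(guesses: list[float], anchor: float, pull: float) -> list[float]:
    """Model social influence: move each guess a fraction `pull` toward `anchor`."""
    return [(1 - pull) * g + pull * anchor for g in guesses]

def error(estimate: float) -> float:
    """Absolute distance from the true count."""
    return abs(estimate - TRUE_COUNT)

best_individual = min(error(g) for g in independent)
crowd_independent = error(mean(independent))
crowd_herded = error(mean(herd(independent, anchor=800, pull=0.5)))

print(f"best single guess off by {best_individual}")
print(f"independent crowd off by {crowd_independent:.1f}")
print(f"herded crowd off by {crowd_herded:.1f}")
```

Averaging only cancels errors that are independent; once members move toward each other, the average simply reproduces the common bias.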

To understand a crowd’s lack of rationality and wisdom as a phenomenon, consider the following. I had written a post (see link at the end of this article) about the Semantic Web, domain-specific knowledge ontologies and Google, from a Google-centric point of view. I went on about how Google, using the Semantic Web and an AI-driven inference engine, would eventually develop into an omnipresent intelligence (a global mind) and how that would have far-reaching implications. The post was titled “Reality as a Service (RaaS): The Case for GWorld.” I submitted it to digg and I believe I got a few diggs and one good comment on it. That’s all. I probably got 500 hits in total on that post, mostly because I used the word “GWorld” in the title. More than a week later, I took the same post, with the same idea of combining the Semantic Web, domain-specific knowledge ontologies and an AI-driven inference engine, but this time I pitted Wikipedia (as the most likely developer of knowledge ontologies) against Google and posted it with the sensational but quite plausible title “Wikipedia 3.0: The End of Google.” The crowd went wild. I got over 33,000 hits in the first 24 hours and, as of the latest count, about 1,600 diggs. In fact, on that day (yesterday) my blog beat the #1 blog on WordPress, that of ex-Microsoft guy Scobleizer. And now I have an idea of how many hits he gets a day! He gets more than 10,000 and fewer than 25,000. I know because during the first 16 hours of massive traffic I managed to get ahead of him with a total of 25,000 hits, but in the last 8 hours of the first 24-hour cycle (for which I’m reporting the stats here) he beat me back to the #1 spot, as I only had 9,000 hits. I stayed at #2, though.

Figure 1: June 25 traffic, the first 16 hours of a 24-hour graph cycle. Traffic from digg ~ 25,000 hits.

Figure 2: June 26 traffic, the last 8 hours of a 24-hour graph cycle. Traffic from digg ~ 8,000 hits.

A crowd, not to be confused with individuals (like myself or yourself), aside from being a decent calculator of subjective quantities (like counting jelly beans in a jar), is no smarter than a bull when it comes to judging the intellectual, artistic or philosophical appeal of something. Wave something red in front of it or make a lot of noise and it may notice you. Talk to it or make subtle gestures and you’ll fail to get its attention. Obviously you can have a tame bull or an angry one; an angry one is easier to upset. A crowd is no more than a decent calculator of subjective quantities; it is a tool in that sense and only in that sense. In the context of judging quality, like musical taste or the coolness of something, a crowd is neither rational nor wise. It will only respond to the most basic and crude methods of attention grabbing. You can’t grab its attention with subtlety or rationality. You have to use psychology, as you would with a bull. As you can see from the graphs of my blog traffic, I didn’t just understand it; I’ve proved it. Social bookmarking systems, and tagging in general, amplify the intensity of this crowd-as-a-bull behavior by attaching the highest numerical values to the crudest, rawest, lowest-common-denominator content.

Now, all of a sudden, when a post gets 100 diggs it reaches escape velocity and goes into orbit. When the numerical value attached to a post (its “diggs”) grows fast, it acts like bait. People rush to see such posts, just as they rushed in tens of thousands to see the “Wikipedia 3.0 vs Google” post. Yet it’s basically the same post as the one I did on GWorld over a week ago, which only got a few diggs. There is no comparison between the wisdom and rationality of an individual and that of a crowd. The individual is infinitely wiser and more rational than the crowd.

So these social bookmarking systems need to be based on a more evolved model where individuals have as much say as the crowd. Remember that many failed social ideologies were based on the idea of favoring the so-called “wisdom of crowds” over individualism. They failed because collectivist behavior is dumb behavior and individual judgment is the only way forward. We need more individuality in society, not less.

Censored by digg

This post was censored by digg’s rating system. But in a software-driven rating system such as digg, reddit, del.icio.us or netscape, there is no way to guarantee that the system’s owner does not manipulate it. Please see the Update section below for the explanation and the evidence (in the form of a telling list of censored posts) behind why digg itself, and not just some of its fanatic users, may have been behind the censoring of this post.

Note: a fellow WordPress blogger published a post called Digg’s Ultimate Flow, which links to this post and is not to be confused with it. It has not been buried/censored yet (June 29, ’06, 5:45pm EST). It hasn’t been buried because it presents no threat to digg. They can sense danger like an animal, and I guess I’ve scared them enough to bury/censor my post; the me-too post I’ve just mentioned does not smell as scary. It’s really sad that digg and sites like it are feeding the crude, animal-like, instinctive, zero-clarity behavior that is the ‘unwisdom’ of crowds.

The truth is that digg and other so-called “social” bookmarking sites do not give us power; they take it away from us. Always. Think. Innovate. Do not follow. But you may want to follow this link to share your view with other digg users, for what it’s worth.

Correction: I’ve just noticed that this blog is ahead of Scobleizer again, at #1. I’ve had 7,796 hits since 8:00pm EST, June 28, ’06 (yesterday). It’s 8:00pm EST now, on June 29, ’06.


  1. Wikipedia 3.0: The End of Google?
  2. Unwisdom of Crowds
  3. Reality as a Service (RaaS): The Case for GWorld
  4. Digg This! 55,500 Hits in ~4 Days

Update: The following is a snapshot of digg’s BURIED/CENSORED posts section as of 4:00am EST, June 29th, ’06. This post was originally titled “Digg’s Biggest Flaw Discovered.” Note that anything perceived as anti-digg, be it a bug report or a serious analysis of digg’s weaknesses, is being censored.

Digg’s Biggest Flaw Discovered buried story submitted by evolvingtrends 21 hours 35 minutes ago (via http://evolvingtrends.wordpres…) An actual proof of a major flaw at the foundation of digg’s quality-of-service model category: Programming

Now even CNET wants its stories endorsed by Digg community submitted by aj9702 1 day 17 hours ago (via http://news.com.com/Attack+cod…) Check it out.. CNET which is number 72 on Alexa rankings wants its stories endorsed by the Digg community. They have a digg this link now to their more popular stories. This story links to the news that exploit code is out there for the RRAS exploit announced earlier this month category: Tech Industry News

Dvorak: Understanding Digg and Its Utopian Idealism buried story submitted by kevinmtu 1 day 18 hours ago (via http://www.pcmag.com/article2/…) Dvorak’s PC magazine article on the new version of Digg and its flaws, posing many interesting points.For example, “What would happen to the Digg site if the Bush-supporting minions in the red states, flocked to Digg and actively promoted stories, slammed things they didn’t like, and in the process drove away the libertarian users?” category: Tech Industry News

Pros and Cons of Digg v3 submitted by jobobshishkabob 2 days ago (via http://thenerdnetworks.com/blo…) Well, Digg version 3 got released today. It is really nice and has many great features. But everything has its flaws…. heres a list of pros and cons of the new Digg.com category: Tech Industry News

Easy Digg comment moderation fraud buried story submitted by Pooley 2 days ago (via http://www.davidmcmanus.com/st…) I’ve found a bug in digg.com. A flaw in the way I ‘digg’ a comment, by clicking the thumbs up icon, allows me to mark up a comment multiple times. category: Tech Industry News

Wikipedia 3.0: The End of Google?

In Artificial Intelligence, crowdsourcing, description logic, Inference Engine, Ontology, OWL, RDF, Search For Meaning, Semantic, Semantic Search, Semantic Web, Web 3.0, Wikipedia, Wikipedia 3.0 on June 26, 2006 at 5:18 am

Author: Marc Fawzi

Twitter: http://twitter.com/marcfawzi

License: Attribution-NonCommercial-ShareAlike 3.0


Semantic Web Developers:

Feb 5, ’07: The following external reference concerns the use of rule-based inference engines and ontologies in implementing the Semantic Web + AI vision (aka Web 3.0):

  1. Description Logic Programs: Combining Logic Programs with Description Logic (note: there are better, simpler ways of achieving the same purpose.)

Click here for more info and a list of related articles…

Foreword (2008, 2009)

Two years after I published it, this article has received over 230,000 hits, and there are now several startups attempting to apply Semantic Web technology to Wikipedia and to knowledge wikis in general, including the Wikipedia founder’s own commercial startup as well as a startup that was recently purchased by Microsoft.

Recently, after seeing how flawed Wikipedia’s governance is, I decided to write about a way to decentralize and democratize Wikipedia.

In August 2009, a little over three years after writing this unexpectedly and wildly popular article, I responded to a journalist’s query with an update titled Wikipedia 3.0: Three Years Later.

Spanish version


(Article was last updated at 10:15am EST, July 3, 2006)


From Mediocre to Visionary

In Uncategorized on June 24, 2006 at 4:02 am

This post started out as a conscious attempt to formalize and clarify Superhype: the phenomenon where skillful-but-otherwise-perfectly-mediocre individuals and companies rise in fame and popularity on such tantalizing hype that those who are both much more skillful and much less mediocre are left wondering how they did it.

From the time I was exposed to Superhype I have known that it involves subtlety and design that go beyond what those at the heart of the phenomenon can intellectualize while they’re experiencing it. I’m not talking about Bill Gates or Michael Dell or anyone who consciously, deliberately and thoughtfully plans and executes successful strategies. I’m talking about those who get carried on top of a massive wave of hype because they happen to be in the right place at the right time doing something that can only be described as perfectly mediocre.

You can always seek out an area of the market that has awesome hype waves and be ready to surf the biggest hype wave that comes your way. It’s a statistical process, which is not the same as leaving it up to luck. You try to maximize the chances of a major wave coming your way by knowing the market and knowing where to stand at any given time for the biggest wave possible. Or you can be a statistical oddity and get “lucky” without understanding why and how, all of a sudden, you seem to be riding the biggest wave in history, without even having the knowledge to surf it but getting carried on top of it anyway as if on a flying carpet. Speaking of riding major waves without knowing a thing about surfing, it actually happened to me once in Puerto Rico. I was trying to do what the cool kids were doing, so, like them, I stood behind a giant rock waiting for the next wave to hit. Well, as the wave approached they all ran off and left me standing there. The wave carried me for at least 100 ft over sharp-edged volcanic rock but, fortunately, on a bed of water. When I finally landed, people rushed to the scene and started yelling “loco! loco!” (“crazy! crazy!”) … That’s how I learned my Spanish. That is a perfect example of how one could surf the really big waves by chance without any formal knowledge of surfing. Just be in the right place at the right time and have the courage to explore. The rest is up to the odds.

There are many examples of “Web 2.0” celebrities (both companies and individuals) who are currently surfing some big waves (pretty much on their behind as I did in Puerto Rico) without any insight on how to properly surf the hype wave they’re riding, yet they seem to be magically levitating above it on a carpet of thin air (again, like I did in Puerto Rico.)

Those of us who cannot delegate our success to a statistically odd event (such as being in the right place at the right time and being carried miraculously by a massive wave of hype simply due to curiosity and good luck) must strive to understand how to find the big hype waves across time and space and how to properly surf them.

This is where the discussion must move from simple metaphors to a rigorous analysis of the temporal, social, psychological and power dimensions of “hype.”

And this is where I have to stop, as I’m in the middle of this learning process.


Web 2.0, venture capital, VC, entrepreneur, geek, early stage, Startup, hype, Puerto Rico, surfing, waves, market, luck

The “Geek VC Fund” Project: 6/23 Update

In Uncategorized on June 23, 2006 at 5:21 pm

This post is an update to the original post about the Geek-Run, Geek-Funded Venture Capital Fund.

  1. After gathering further input from early supporters of the idea, the emerging consensus is to host the Fund’s collaboration space on an independent wiki, since the project concerns other audiences (e.g. VCs, lawyers with an interest in the subject, partners at boutique banks, academics, and others) besides the grassroots startup audience that is its core constituency.
  2. I will be announcing the location of the collaboration space for this project once it’s ready for general use by the community.
  3. There have been some more comments under the original post. If you’ve just joined us, you may want to add your input (see Comments).

More to come …


Web 2.0, venture capital, VC, entrepreneur, funding, private equity, geek, seed funding, early stage, Startup

The “Geek VC Fund” Project: 6/22 Update

In Uncategorized on June 22, 2006 at 8:33 am

This post is an update to the original post about the Geek-Run, Geek-Funded Venture Capital Fund.

  1. I’m discussing the creation of the fund community’s “idea development” space on top of an existing and growing online “startup” community that is attracting a good mix of business geeks (potential LPs, educators, business leaders) and tech geeks (potential entrepreneurs, developers, technology leaders.) I’ll keep you posted on this.
  2. I’m socializing the vision with several folks from the PE/VC business as well as a couple of folks from major universities (that have student-run VC funds) to gather further input.
  3. I’ve further defined the vision for the fund based on readers’ early comments, some of which were very helpful. If you’ve just joined us you may want to add your input to the original post’s Comments.


Web 2.0, venture capital, VC, entrepreneur, funding, private equity, geek, seed funding, early stage, Startup

Who 2.0: Tagging People in the Real World

In Uncategorized on June 21, 2006 at 11:15 pm


Related Topic: Self-Aware e-Society

This post has been updated to include the concept of Standardized Tags.


The Idea

This post presents a ubiquitous, passive method for tagging people in the real world with attributes that describe them (e.g. good, fun, smart, interesting, psycho, unreliable, untrustworthy, etc.; you get the idea) such that people can see how someone they’ve just met has been defined (or categorized) by others.

People will be able to make better choices about whom to associate with and whom to trust, in a real world setting. It allows the Web 2.0 “social networking” paradigm, which is currently confined to the Web, to be experienced in the real world.

From Conception to Production

The “Who 2.0” idea was suggested by a fellow WordPress blogger who goes by “farlane” in his comment on the “Hunter Gatherer” post, which he made about 3 hours ago [note that this post has been updated since.] The idea is also related to what I had previously written in a post on “GWorld (beta)” regarding how objects (or people) within a virtual world may be tracked and identified with the virtual equivalent of the RFID tag. Links to those posts, farlane’s comment and farlane’s “about” page are provided at the end of this post.

The Design

It’s really very simple.

Today, we have camera phones that can take photos at 8-megapixel resolution. Such camera phones can produce a pretty detailed image of people’s faces (8 megapixels is more than enough detail). Add facial recognition software, which already exists today, and you have your tagging mechanism. Take a shot of someone’s face, add their name (optional) and tag them with words that describe them. Click send, and the image and tags are sent to a central database. When you meet someone and want to find out what others think of them, simply take a shot of their face and send it as a query to the central database. The answer you get back would show each word/tag that has been used to describe that person and how many people used each tag. For example, 400 people think I’m funny and fun to be around while 3 think I’m a mutant ninja turtle. Who do you trust? Obviously, you can safely conclude from the statistics that I’m not a turtle. The tag statistics in this context will help you make a good bet about the character and personality of someone you’ve just met, but it’s you (not the system) that makes the bet.
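The query side of this design might look like the following Python sketch. It assumes a hypothetical `face_id` string already produced by some facial recognition step (not shown, and not tied to any real library), and `TagDatabase`, `submit` and `query` are invented names. It counts distinct people per tag, so one person submitting the same tag repeatedly is counted once, which keeps the statistic aligned with "how many people used each tag". Lowercasing tags is also an assumption.

```python
from collections import defaultdict

class TagDatabase:
    """Hypothetical central store: face_id -> tag -> set of people who used that tag."""

    def __init__(self) -> None:
        self._tags: defaultdict = defaultdict(lambda: defaultdict(set))

    def submit(self, submitter: str, face_id: str, tags: list[str]) -> None:
        """Record that `submitter` described the person `face_id` with `tags`."""
        for tag in tags:
            self._tags[face_id][tag.strip().lower()].add(submitter)

    def query(self, face_id: str) -> list[tuple[str, int]]:
        """Return (tag, number of distinct people) pairs, most-used tag first."""
        counts = {tag: len(people) for tag, people in self._tags.get(face_id, {}).items()}
        return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

db = TagDatabase()
for i in range(400):
    db.submit(f"user{i}", "face-001", ["funny", "fun to be around"])
for i in range(3):
    db.submit(f"prankster{i}", "face-001", ["mutant ninja turtle"])
db.submit("user0", "face-001", ["Funny"])  # a repeat by the same person is not double-counted

print(db.query("face-001"))
```

Ties are broken alphabetically so the output is stable; an unknown face simply returns an empty list.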

For example, if you look up my name (or my face) with your phone and find out that 10 people thought I was very boring, then you probably wouldn’t want to hang out with me. However, if at the same time 1000 thought I was a fun-lovin’ guy, then you may want to take your chances and hang out with me. But what you can do to others, others can do to you, so I can look up your face or name and hedge my bet based on the tags I get back and the associated tag statistics.

This reminds me of the famous quote attributed to Abraham Lincoln: “You can fool all of the people some of the time, and you can even fool some of the people all the time, but you cannot fool all of the people all of the time.” So while I may not be fun all of the time (say, I’m not fun 1% of the time), I’m still fun most of the time (99%), so you can make a safe bet that I’m fun to hang out with. But that’s just a simple example.

It gets more complicated as people describe 20 or more possible personality and character attributes, some using different words than others. However, it cannot be any worse than following a link on del.icio.us or digg that has been tagged as funny by 1000 other people.

Standardized Tags

Several readers complained that tags are relative, not absolute, so they should not be used to judge people in an absolute way. Well, I never said the system should be used as an absolute measure of people’s personality and character, but since it could be potentially misused in that way, I figured that the tags, which form the basis of the system, should reflect the difference between people’s judgment and what one could statistically define as the “standard” judgment.

Instead of using regular tags, the system would allow users to use weighted tags. For example, what I think is funny may be boring to 70% of people, so my “funny” tag should count for less than that of another person whose humor is much more mainstream. To teach that to the system, users would go through an online feedback test in which random scenes are shown and each user tells the system what he/she thinks of each scene by picking one of the available tags that describe it. Once that test has been done with a large enough population of users, those who come out in the middle of the curve for a given tag, e.g. funny, would have the most weight assigned to their use of that tag to describe others (i.e. 1.0), and those at either edge of the curve (the limits of the reference range) would have the least weight (i.e. 0.0), with each user in between (on both sides of the curve) having his/her use of that tag assigned a weight between the maximum and the minimum, depending on how far he/she is from the median. This way, when 1000 people think that someone is funny, the system adds up the weights of their “funny” tags: if their senses of humor are tightly distributed around the median, the total score for their combined judgment will be close to 1000, but if most of them have an odd sense of humor, the total score will be much lower, e.g. 100. So if you discover that someone you’ve just met has a “funny” score of 1000, you can think of it as if 1000 people with “standard” judgment thought that person was funny. In other words, the total score is given in units of standard judgment.
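The weighting scheme can be sketched in a few lines of Python. Several details are assumptions not fixed by the text: each user's feedback-test result is reduced to a single numeric calibration score, the reference range is given as explicit `low`/`high` bounds, the weight falls off linearly from 1.0 at the population median to 0.0 at either bound, and scores outside the range get weight 0.0. The function names are hypothetical. With those choices, the example reproduces the 1000-versus-100 scenario above.

```python
import math
from statistics import median

def tag_weight(score: float, low: float, mid: float, high: float) -> float:
    """Weight 1.0 at the median `mid`, falling linearly to 0.0 at `low` or `high`."""
    if score <= low or score >= high:
        return 0.0  # at or beyond the edge of the reference range
    if score <= mid:
        return (score - low) / (mid - low)
    return (high - score) / (high - mid)

def standardized_score(tagger_scores: list[float], population_scores: list[float],
                       low: float, high: float) -> float:
    """Sum the weights of everyone who applied the tag, in units of standard judgment."""
    mid = median(population_scores)
    return math.fsum(tag_weight(s, low, mid, high) for s in tagger_scores)

# Hypothetical calibration scores for the "funny" tag on a 0-100 scale.
population = [40, 45, 50, 50, 55, 60]  # median = 50
mainstream_taggers = [50] * 1000       # right at the median
odd_taggers = [95] * 1000              # far out on one side of the curve

print(standardized_score(mainstream_taggers, population, 0, 100))
print(standardized_score(odd_taggers, population, 0, 100))
```

Using a separate slope on each side of the median lets the reference range be asymmetric; a Gaussian fall-off would be an equally reasonable, and equally hypothetical, choice.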

The statistical technique described here is very simple; it could be made more elaborate, e.g. by adding weighting factors based on user behavior.

A system like the one suggested here will allow us to make good choices (about whom we pick as our friends or business partners) quickly and reliably.

This could lead to a safer, happier and more productive society (Well, at least in theory.)


  1. About farlane (the reader who suggested the idea.)
  2. Farlane’s comment where this idea was suggested.
  3. The “Hunter Gatherer” post that lured farlane to this blog.
  4. The post where tagging of objects (and people) in the [virtual] world was mentioned.


Web 2.0, Where 2.0, social networking, Trends, Who 2.0, facial recognition, tagging, Startup

Standardized Tagging: Defining “Cool” in Standard Units of Judgment

In Uncategorized on June 21, 2006 at 11:11 pm


Current tagging systems suffer from one critical drawback: the lack of a judgment standard. In other words, if a photo on flickr is tagged as “cool” by 1000 people then how do you know that you, personally, would find it to be cool? After all, as the saying goes, beauty is in the eye of the beholder.


One way to address the intra- or inter-cultural variance in judgment among people is to establish a statistical standard. On a photo sharing site dedicated to art/photography students, that standard would be different from that of a site serving a random population of people. However, you would have a rough idea of what to expect in terms of a judgment standard based on the description of the site (or the description of its average user). That assumes (and only “assumes”) that the majority of art/photography students share roughly the same view when it comes to art/photography.

But on a photo sharing site (or any other type of tagging application) that is intended for the general population but whose actual users are not exactly the general population, you would have no idea what to expect in terms of a judgment standard. Is it mainstream? Is it semi-mainstream? Is it quasi-mainstream? Is it a mix of highly varying random views? You couldn’t really tell until you had done a statistical analysis of the users’ views.

To do that, a large random sample of users (of a given system) would go through an online feedback test in which random scenes are shown and each user tells the system what he/she thinks of each scene by picking one of the available tags that describe it. Once that test has been done with a large enough population of users, those who come out in the middle of the curve for a given tag, e.g. “cool,” would have the most weight assigned to their use of that tag to describe others (i.e. 1.0), and those at either edge of the curve (the limits of the reference range) would have the least weight (i.e. 0.0). Each user in between (on either side of the median) would have his/her use of that tag assigned a weight between the maximum and the minimum, depending on how far he/she is from the median. This way, when 1000 people think that a photo is cool, the system will add up the weights of their “cool” tags: if their senses of what is cool are tightly distributed around the median, the total score for their combined judgment will be close to 1000, but if they mostly have odd taste, the total score will be much lower, e.g. 100. So if you come across a photo that has a “cool” score of 1000, you can think of it as if 1000 people with “standard” judgment thought that photo was cool. In other words, the total score would be given in units of standard judgment.
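The taste test is what produces the curve in the first place. As a minimal sketch, assuming (hypothetically) that a user's position for a tag is simply the fraction of test scenes he/she labelled with that tag, and that the reference range is the central 95% of users (2.5th to 97.5th percentile), the median and range could be computed with Python's standard `statistics` module. The function names and the sample data are illustrative only.

```python
from statistics import median, quantiles


def tag_rate(responses: list[str], tag: str) -> float:
    """Hypothetical position measure: share of test scenes labelled with `tag`."""
    return responses.count(tag) / len(responses)


def calibrate(test_results: dict[str, list[str]],
              tag: str) -> tuple[float, float, float]:
    """Return (median, low, high) for `tag` across all tested users.

    low/high are the 2.5th and 97.5th percentiles, i.e. an assumed
    central-95% reference range.
    """
    rates = [tag_rate(responses, tag) for responses in test_results.values()]
    # n=40 yields 39 cut points at 2.5%, 5%, ..., 97.5%
    cuts = quantiles(rates, n=40, method="inclusive")
    return median(rates), cuts[0], cuts[-1]


# Example: 41 users, where user i tagged i of 40 test scenes as "cool"
test_results = {
    f"user{i}": ["cool"] * i + ["boring"] * (40 - i) for i in range(41)
}
print(calibrate(test_results, "cool"))
```

Each user's rate, together with the returned median and range, is then enough to assign that user a weight by distance from the median. Re-running the calibration as more users take the test would let the “standard” track the actual user population.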

I believe that applications that use tagging should put each user through a taste test so that the system may continuously adjust its “standard” judgment.

The statistical technique described here is very simple, and it could be made more elaborate, e.g. by adding weighting factors based on user behavior.


It’s not only possible but essential that a new company emerge to provide a more reliable “tagging engine,” just as Google emerged to provide a more reliable search engine.

The issue that remains to be worked out is how the “tagging” concept (including this variation) will interact with the coming ontology-driven Semantic Web. Maybe the interaction would take the form of an OWL inference engine that understands the current context and usage of tagging? I’m sure that more thoughts on this will emerge down the line.


Web 2.0, tags, tagging, flickr, photo sharing, startup, Tagging Engine

Geek-Run, Geek-Funded Venture Capital Fund

In Uncategorized on June 21, 2006 at 10:00 pm

Author: Marc Fawzi

Twitter: http://twitter.com/#!/marcfawzi

License: Attribution-NonCommercial-ShareAlike 3.0


(This post was last updated on June 22, ’06, taking into consideration early input from the community, which you can find under Comments.)

For a while now, I’ve been toying with the idea of starting a cooperative venture capital fund where smart, sophisticated people (aka business geeks, e.g. former Technical Directors, Chief Architects, CTOs, CEOs et al.) come together to bring ideas to market.

For example, for Web 2.0 ideas, if we could gather a large crowd of Web 2.0 business geeks, then it would be quite possible to conduct private placements under an SEC securities law safe harbor for non-accredited but sophisticated investors (i.e. business geeks who can judge the risk associated with a given venture/idea within their domain of expertise), so that we could seed the fund as a community. Those who participate would then collectively have as much power as any VC.

Obviously, this requires the involvement of legal counsel who would structure such a cooperative venture capital fund so that it complies with securities law and state regulations. Luckily, I have access to lawyers who work for private equity (PE) investors, as well as enlightened, accredited investors who may see the value in supporting it. But this is not about starting yet another VC fund. This is about giving power to the entrepreneurs, just as the Web has given power to the producer and caused the middleman to adapt and innovate.

The VC industry is another socio-economic structure that will have to undergo a radical rethinking in the years to come (as the newspaper and music industries are doing now) or risk losing out in the long run. In this context, the fund would be a grassroots remaking of the early-stage funding process. In certain cases, the fund would partner with traditional VCs during the later stages of the ideas it launches.

The driving motivation is that a truly cooperative fund could be launched, supported and managed by a community of business geeks and angel investors that is much more grassroots in its makeup, scale and orientation than current attempts, while still being led by people with experience and a track record. The thesis is that such a fund could serve the needs of the community on a much wider basis.

It’s definitely a good idea to look at what others, such as YCombinator, have done and improve on that, as I see many ways in which such funds could help many more entrepreneurs and launch more ideas while maintaining the quality of ideas and execution.

But we’ll get to work on that as soon as I set up a collaboration space for us to use in developing this “Geek VC Fund” idea.

In the meantime, please feel free to add your feedback under Comments.


Web 2.0, venture capital, VC, entrepreneur, funding, private equity, YCombinator, geek, seed funding, early stage, Startup

Web 2.0: Back to the “Hunter Gatherer” Society

In Uncategorized on June 20, 2006 at 3:31 am

Author: Marc Fawzi

Twitter: http://twitter.com/#!/marcfawzi

License: Attribution-NonCommercial-ShareAlike 3.0


Fact: trusted individuals are once again the source of news in a society (bloggers)

Fact: word of mouth is once again how news spreads (viral marketing)

Fact: people once again hunt and gather in a group (del.icio.us)

Fact: people once again group things using words like small, big, happy, sad, funny, food rather than detailed hierarchical structures (tags)

Fact: impulsive consumption (i.e. “hunt and eat” or “click and enjoy”) and impulsive production (i.e. “less initial planning”, e.g. Google’s “betas”) are back in style.

Fact: once again, sharing between people cannot be explained by the strict concept of economic reciprocity and is instead explained by the egalitarian and optimistic notion that what is good for all is good for one (YouTube, del.icio.us, etc.)

These are all traits of a hunter-gatherer society, i.e. a pre-agricultural society.

Tens of thousands of years of behavioral evolution wiped out in just a few years.

Human behavior and society have evolved for a reason. It may be that the Internet is simply freeing the hunter-gatherer inside us, but I wonder if bringing out an ancient, ingrained behavior will upset the equilibrium that was achieved through tens of thousands of years of behavioral evolution. I realize that the last statement sounds like the plot of Jurassic Park (the “hunter gatherer” in us as the suddenly reborn dinosaur ready to wreak havoc on modern-day socio-economic structures), but it’s a plausible suggestion given that the Web has already had a great disruptive effect on some industries, e.g. newspapers and soon the media hierarchy at large. Speaking of the media hierarchy, a hunter-gatherer society is by definition incapable of supporting the concept of a formal, non-arbitrary social, economic or political hierarchy.

Is this where we’re headed? Should we expect the Web 2.0 hunter-gatherer behaviors identified above to make their way into society at large? And what effect will that have on the stability of our socio-economic system?

More relevantly, for the Enterprise 2.0 crowd, should we bring the hunter-gatherer behavior to the highly evolved socio-economic structure of the enterprise? (I can’t believe I just said “highly evolved” and “enterprise” in one sentence, but I’m speaking in relative terms here) Wouldn’t that be like bringing matter and anti-matter together? Won’t the two annihilate each other? Shouldn’t we try to adopt only those parts of the Web 2.0 paradigm that are compatible with the structures of the enterprise? Or how much change would be considered good change? And does the “hunter gatherer” based Web 2.0 paradigm represent progress or regression compared to what exists today in the enterprise?

These are good questions to chew on.


  1. The Unwisdom of Crowds
  2. Wikipedia 3.0: The End of Google?
  3. Self-Aware e-Society
  4. Open Source Your Mind


Web 2.0, Anthropology, Trends, cultural anthropology, sharing, hunter gatherer, evolution, del.icio.us, YouTube, society, Web Evolution, hunter gatherer society, AJAX, file sharing, video sharing, behavioral economics, Enterprise 2.0

GoodSense: End World Hunger and Increase Blog Revenue with Google AdSense.

In Uncategorized on June 19, 2006 at 5:37 am

AdSense is a service by Google that delivers the ads you see on blogs, forums, information websites and, most regrettably, spam sites. Tens or hundreds of millions of people view such ads each day. Some people click on the ads (or so I have been told), but most people, including myself, simply ignore them, and therein lies the opportunity!

Let’s say that Google sets up a system whereby AdSense users (i.e. bloggers, forum administrators and, yes, even spammers) may choose to allow ads to be displayed that, when clicked, would deduct 10% from the payment due from Google to the AdSense user and send that amount to the advertiser’s favorite charity. Advertisers may choose to participate in this program and specify their favorite charity. The money comes out of the AdSense user’s payment, but the result would be a net gain rather than a loss, as long as the extra clicks more than make up for the 10% deduction. That’s because more people would click on the ads if they knew that by doing so they would be contributing to a worthy cause. I would. Many people would. In fact, since it makes so much economic sense to the AdSense user, you may find spammers (those who set up massively interlinked farms of pages and plaster Google AdSense ads all over them) opting to allow the 10% deduction, thus effectively doing good for a change instead of just evil. Advertisers who participate in such a program would do so with the understanding that people clicking on their ads simply to help the advertiser’s chosen charity are potential consumers who may be interested in the advertiser’s products or services. People in general have a higher incentive to click on an ad for a product they may be interested in buying or finding out about (but not at that exact moment) if clicking on that ad would generate an immediate positive contribution to society (or the environment).
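Whether the program leaves the AdSense user better off is simple arithmetic. The sketch below assumes, purely for illustration, a flat payout per click (real AdSense payouts vary from ad to ad) and hypothetical figures of 1000 clicks at $0.50. Under that assumption, keeping a fraction (1 − s) of each click's payout means the AdSense user breaks even once clicks rise by 1/(1 − s) − 1, which for s = 10% is about 11.1%. The function names are hypothetical.

```python
def adsense_user_revenue(clicks: int, payout_per_click: float,
                         charity_share: float = 0.10) -> float:
    """What the AdSense user keeps after the charity deduction."""
    return clicks * payout_per_click * (1 - charity_share)


def charity_amount(clicks: int, payout_per_click: float,
                   charity_share: float = 0.10) -> float:
    """What is forwarded to the advertiser's chosen charity."""
    return clicks * payout_per_click * charity_share


def break_even_click_increase(charity_share: float = 0.10) -> float:
    """Fractional rise in clicks at which the deduction is exactly offset.

    Solves new_clicks * (1 - s) = old_clicks for new_clicks / old_clicks - 1.
    """
    return 1 / (1 - charity_share) - 1


# Hypothetical numbers: 1000 clicks at a flat $0.50 per click without the program.
baseline = adsense_user_revenue(1000, 0.50, charity_share=0.0)
print(f"break-even click increase: {break_even_click_increase():.1%}")  # 11.1%
print(baseline, adsense_user_revenue(1112, 0.50), charity_amount(1112, 0.50))
```

With 1112 clicks (about 11.2% more) the AdSense user slightly exceeds the baseline while roughly $55 goes to charity; with 1111 clicks the user falls just short.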

So if the idea of helping end world hunger while increasing your blog’s revenue sounds good to you then feel free to bug Google about it …


There would have to be some kind of electronic seal or some other validation mechanism that tells users that a given AdSense link is participating in the charity program, so people who wouldn’t normally click on ads would click on those.


Google, Google AdSense, Google AdWords, Trends, poverty, charity, world hunger, social innovation, hunger, Make Poverty History, society, philanthropy, spam, clickfraud

Reality as a Service (RaaS): The Case for GWorld

In Uncategorized on June 15, 2006 at 5:33 am

People keep asking what Web 3.0 is. I think maybe when you’ve got an overlay of scalable vector graphics – everything rippling and folding and looking misty – on Web 2.0 and access to a semantic Web integrated across a huge space of data, you’ll have access to an unbelievable data resource.

Tim Berners-Lee

Ready for GWorld?

Have you ever come up with a great domain name for a Web 2.0 application, personal blog, or online store only to find out that it and 2000 other variations (including dyslexic spellings) were already off the market?

Well, there is good news then! Virtual worlds, including GWorld, a hypothetical future version of Google Earth where you can have an avatar and build stores, supermarkets or your own personal publishing house (the virtual world’s version of the humble blog), will not require you to register a domain. However, you will have to claim the land or, in the case of GWorld, pay Google a renewable license fee for the right to occupy the land for X number of years (a.k.a. a land lease). You may also have to hire virtual world developers to build your house, hotel, store, etc. for you (using Google Sketchup, which already lets you build houses and other structures and place them on Google Earth), and you would most likely have Google ads integrated into the walls as doorways into other stores, publishing houses or bordellos.

Some of the scenarios in Google’s hypothetical future version of Google Earth, a.k.a. “GWorld (beta)”, may include:

  1. The ability to identify and track the location of all objects in the virtual world (as if each had a virtual RFID tag).
  2. The ability to barter with real and/or virtual objects interchangeably. You could buy a real t-shirt on GWorld with virtual stuff you made or purchased from someone else (e.g. a nice roof for a house, a side wall, a portable mountain, etc.)
  3. The ability to break the law and get away with it a la Grand Theft Auto except all radio stations in your stolen car will air Google sponsored commercials.
  4. The ability to create your own wicked (or more civilized) version of the world, i.e. not just your own forum or popular blog but your own world with your own genuine-looking castle, full of real people (incarnated as avatars) who become your loyal followers and click (or rather “knock”) very often on your Google ads (which you can already have in the 2D Web, but I bet it will be more satisfying when you can make them do it on demand or risk being left without shelter).

With such a world of possibilities who needs the 2D Web anymore?

The Case for GWorld

But to “organize the [virtual] world’s information” more intelligently than possible in the real world, we will first have to enter the Semantic Web phase. This is where all information on the Web would be put into a standard format (a declarative ontology language like OWL) that machines can use to build a view (or formal ontological model) of how the individual terms in the information relate to each other. These relationships can be thought of as axioms (basic assumptions), which, together with the rules of inference, constrain the interpretation and well-formed use of those terms. Based on that, formal deductive propositions that are provable from the axioms and the rules of inference (i.e. theorems) may be generated by the software, thus allowing formal deductive reasoning at the machine level. So given that an ontology, as described here, is a statement of Logic Theory, two or more independent, machine-based information agents processing the same domain-specific ontology will be able to collaborate and deduce an answer to a query, without being driven by the same software.

In other words, in the Semantic Web individual machine-based agents (or a collaborating group of agents) will be able to understand and use information by translating concepts and deducing new information rather than just matching keywords.

Once machines can understand and use information in a standard way, the virtual world will never be the same. It will be possible to have one Google Buddy, or many Google Buddies, among your virtual AI-enhanced workforce, each with access to a different domain-specific comprehension space and all with access to the collective consciousness (read: Google as the virtual world’s omnipresent AI). You’ll be able to ask your Google Buddy (or Buddies) to find you the nearest restaurant in your virtual neighborhood (which may be a loose replica of your real-world neighborhood) that serves Italian cuisine, even if the restaurant nearest to you advertises itself as a pizza joint rather than an Italian restaurant. But that is just a simple example of the deductive reasoning machines will be able to perform on the information they have.
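The pizza example can be made concrete with a toy reasoner. This is a minimal sketch, not an OWL engine: it stores knowledge as (subject, predicate, object) triples and forward-chains two RDFS-style entailment rules (subclass transitivity and type inheritance) until nothing new can be deduced. The class and restaurant names, the coordinates, and the axiom that a pizza joint is a kind of Italian restaurant are all hypothetical.

```python
from math import dist

SUBCLASS, TYPE = "subClassOf", "type"
Triple = tuple[str, str, str]


def infer(triples: set[Triple]) -> set[Triple]:
    """Forward-chain two RDFS-style rules until no new triples can be deduced."""
    closed = set(triples)
    while True:
        new: set[Triple] = set()
        for a, p1, b in closed:
            for c, p2, d in closed:
                if b != c or p2 != SUBCLASS:
                    continue
                if p1 == SUBCLASS:
                    # A subClassOf B, B subClassOf D  =>  A subClassOf D
                    new.add((a, SUBCLASS, d))
                elif p1 == TYPE:
                    # x type B, B subClassOf D  =>  x type D
                    new.add((a, TYPE, d))
        if new <= closed:  # fixed point reached: nothing new to deduce
            return closed
        closed |= new


def nearest(kind: str, here: tuple[float, float], triples: set[Triple],
            locations: dict[str, tuple[float, float]]) -> str:
    """Nearest individual that is (explicitly or by deduction) of type `kind`."""
    facts = infer(triples)
    candidates = [x for x, p, c in facts if p == TYPE and c == kind]
    return min(candidates, key=lambda x: dist(here, locations[x]))


# Hypothetical ontology: two axioms plus a few facts about the neighborhood.
ontology: set[Triple] = {
    ("PizzaJoint", SUBCLASS, "ItalianRestaurant"),
    ("ItalianRestaurant", SUBCLASS, "Restaurant"),
    ("LuigisPizza", TYPE, "PizzaJoint"),
    ("TrattoriaRoma", TYPE, "ItalianRestaurant"),
    ("BurgerBarn", TYPE, "Restaurant"),
}
locations = {
    "LuigisPizza": (1.0, 1.0),
    "TrattoriaRoma": (4.0, 5.0),
    "BurgerBarn": (0.5, 0.5),
}

print(nearest("ItalianRestaurant", (0.0, 0.0), ontology, locations))  # LuigisPizza
```

Matching the keyword “Italian restaurant” alone would only find TrattoriaRoma; it is the deduced triple (LuigisPizza, type, ItalianRestaurant) that lets the nearer pizza joint be returned, while BurgerBarn is correctly excluded.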

I believe that the advent (on the Web scale) of this already existing machine reasoning capability is going to make the case for doing business in the aforementioned “Google Earth + Sketchup + Semantic Web” enabled virtual world far more compelling than in the real world or the current non-semantic 2D Web, because every object that exists in such a virtual world will automatically be within the comprehension space of the machine-as-your-Google-Buddy! That sort of awesome power (i.e. the ability to access/query the collective consciousness of the universe at will and with precision), combined with user-generated “design it and they will come” 3D environments, will make a powerful case for the move out of the current 2D Web and into the virtual world.

So if you thought Web 2.0 was exciting and Web 3.0 (Semantic Web) was going to be powerful, then wait till you see what Web 4.0 has in store …


Google Earth, Virtual World, Semantic Web, Web standards, Trends, Sketchup, OWL, 3D Web, innovation, RFID, Startup, Evolution, Google, GData, inference, inference engine, AI, ontology, Game Design, Semanticweb, Web 2.0, gworld