Standardized Tagging: Defining “Cool” in Standard Units of Judgment

In Uncategorized on June 21, 2006 at 11:11 pm


Current tagging systems suffer from one critical drawback: the lack of a judgment standard. In other words, if a photo on Flickr is tagged as “cool” by 1000 people, how do you know that you, personally, would find it cool? After all, as the saying goes, beauty is in the eye of the beholder.


One way to address the intra- or inter-cultural variance in judgment among people is to establish a statistical standard. In a photo sharing site dedicated to art/photography students that standard would be different than in a site dedicated to a random population of people. However, you would have a rough view of what to expect in terms of a judgment standard based on the description of the site (or the description of its average user). That assumes (and only “assumes”) that the majority of art/photography students share roughly the same view when it comes to art/photography.

But in a photo sharing site (or any other type of tagging application) that is intended for the general population, but where the actual user population is not exactly the general population, you would have no idea what to expect in terms of a judgment standard. Is it mainstream? Is it semi-mainstream? Is it quasi-mainstream? Is it a mix of highly varying random views? You couldn’t really tell until you’ve done a statistical analysis of the users’ views.

To do that, a large random sample of users (of a given system) would go through an online feedback test in which random scenes are shown and each user tells the system what he/she thinks of each scene by picking one of the available tags that describe it. Once that test has been done with a large enough population of users, those who come out in the middle of the curve for a given tag, e.g. “cool,” would have the most weight assigned to their use of that tag (i.e. 1.0), and those at either edge of the reference range would have the least weight assigned to their use of that tag (i.e. 0.0). Each user in between (on either side of the median) would have his/her use of that tag assigned a weight between the minimum and the maximum, depending on how far he/she is from the median.

This way, when 1000 people tag a photo as “cool,” the system adds up the weights of their “cool” tags. If their sense of what is cool is tightly distributed around the median then the total score for their combined judgment will be close to 1000, but if they mostly have odd taste the total score will be much less, e.g. 100. So if you come across a photo that has a “cool” score of 1000 then you can think of it as if 1000 people with “standard” judgment thought that photo was cool. In other words, the total score is given in units of standard judgment.
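Here is a rough sketch of that weighting in Python, just to make the arithmetic concrete. Several things in it are my own assumptions and not part of the idea as stated: a user's position on the curve is taken to be the fraction of test scenes he/she tagged “cool,” the reference range is a fixed half-width around the median (the hypothetical `half_range` parameter), the weight falls off linearly with distance from the median, and users who never took the test count for nothing. The names `tag_weights` and `standard_score` are made up for the example.

```python
from statistics import median


def tag_weights(positions, half_range):
    """Turn each user's position on the curve into a weight in [0.0, 1.0].

    positions:  user -> position on the curve for one tag (assumed here to be
                the fraction of test scenes the user tagged "cool").
    half_range: assumed half-width of the reference range around the median.
    """
    if half_range <= 0:
        raise ValueError("half_range must be positive")
    center = median(positions.values())
    weights = {}
    for user, position in positions.items():
        distance = abs(position - center)
        # 1.0 at the median, falling linearly to 0.0 at the edge of the
        # reference range; anyone outside the range is clamped to 0.0.
        weights[user] = max(0.0, 1.0 - distance / half_range)
    return weights


def standard_score(taggers, weights):
    """Add up the weights of everyone who applied the tag to an item."""
    # Assumption: users who never took the test contribute 0.0.
    return sum(weights.get(user, 0.0) for user in taggers)


# Toy test results for the "cool" tag (invented numbers).
positions = {"ann": 0.5, "bob": 0.5, "cat": 0.25, "dan": 0.75, "eve": 1.0}
weights = tag_weights(positions, half_range=0.5)

mainstream_photo = standard_score(["ann", "bob", "cat"], weights)
oddball_photo = standard_score(["cat", "dan", "eve"], weights)
```

Both photos were tagged “cool” by three people, but the first one was tagged by users near the median and so gets the higher score in units of standard judgment. A real system could swap the linear falloff for some other curve without changing the overall scheme.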

I believe that applications that use tagging should put each user through a taste test so that the system may continuously adjust its “standard” judgment.

The statistical technique described here is very simple and could be made more elaborate, e.g. by adding weighting factors based on user behavior.


It’s not only possible but it is essential that a new company emerges to provide a more reliable “tagging engine,” similar to how Google had emerged to provide a more reliable search engine.

The issue that still needs to be worked out is the interaction of the “tagging” concept (including this variation) with the coming Ontology-driven Semantic Web. Maybe the interaction would be in the form of an OWL inference engine that understands the current context and usage of tagging? I’m sure that more thoughts on this will emerge down the line.


Web 2.0, tags, tagging, Flickr, photo sharing, startup, Tagging Engine

  1. Anything that requires me to take a test makes me not want to do it.

    What if you used some sort of tag clustering algorithm, where you looked to see:
    a. how many folks viewed item
    b. how many of those tagged it “x”

    A higher “b” score means a more reliable tag. I would love it if a high rating by someone who I highly regard would make the item “brighter”. It could also extend to those highly regarded by folks I esteem.

  2. In the original post on Tagging People in the Real World, which you're intimately familiar with, I mentioned that the relative score of each tag (which is what I believe you mean, e.g. how many people tagged this post as "informative" out of all the people who tagged the post, but it can also include the ratio with respect to those who viewed it but did not tag it) would be displayed numerically and I was imagining (but did not state) graphically.

    Then people looking at the bar chart could tell the relative value of each tag score.

    However, I believe you need a multi-scheme approach that includes the above (i.e. the relative tag scheme you stated, preferably shown graphically, leaving it up to the user to decide) combined with weighting factors based on an initial test of judgment as well as on user behavior parameters.


