Current tagging systems suffer from one critical drawback: the lack of a judgment standard. In other words, if a photo on Flickr is tagged as “cool” by 1000 people, how do you know that you, personally, would find it cool? After all, as the saying goes, beauty is in the eye of the beholder.
One idea to address the intra- or inter-cultural variance in judgment among people is to establish a statistical standard. On a photo sharing site dedicated to art/photography students, that standard would differ from one on a site serving a random population. Still, you would have a rough idea of what judgment standard to expect based on the description of the site (or of its average user). That assumes (and only “assumes”) that the majority of art/photography students share roughly the same view when it comes to art/photography.
But on a photo sharing site (or any other type of tagging application) that is intended for the general population, where the actual user population is not exactly the general population, you would have no idea what judgment standard to expect. Is it mainstream? Semi-mainstream? Quasi-mainstream? A mix of highly varying random views? You couldn’t really tell until you’ve done a statistical analysis of the users’ views.
To do that, a large random sample of users (of a given system) would go through an online taste test in which random scenes are shown and each user tells the system what he/she thinks of each scene by picking one of the available tags that describe it. Once that test has been done with a large enough population of users, those who come out in the middle of the curve for a given tag, e.g. “cool,” would have the maximum weight (i.e. 1.0) assigned to their use of that tag, and those at either edge of the curve (within the reference range) would have the minimum weight (i.e. 0.0), with each user in between assigned a weight between the maximum and the minimum depending on how far they are from the median.

This way, when 1000 people tag something as cool, the system adds up the total weight of their “cool” tags: if their senses of what is cool are tightly distributed around the median, the total score for their combined judgment will be close to 1000, but if they mostly have odd taste, the total score would be much less, e.g. 100. So if you come across a photo with a “cool” score of 1000, you can think of it as if 1000 people with “standard” judgment thought that photo was cool. In other words, the total score would be given in units of standard judgment.
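A minimal sketch of the scheme above, assuming each user’s taste-test result for a tag can be reduced to a single numeric score (the function and variable names here are illustrative, not from any real system): weight is 1.0 at the median of the taste-test distribution and falls off linearly to 0.0 at the edge of the reference range, and a photo’s score for a tag is the sum of the weights of the users who applied that tag.

```python
from statistics import median

def judgment_weight(user_score: float, all_scores: list[float]) -> float:
    """Weight for one user's use of a tag: 1.0 at the median of the
    taste-test distribution, declining linearly to 0.0 at the edge of
    the reference range (here taken as the min/max of the sample)."""
    m = median(all_scores)
    half_range = max(max(all_scores) - m, m - min(all_scores))
    if half_range == 0:
        return 1.0  # everyone agrees: full weight
    return max(0.0, 1.0 - abs(user_score - m) / half_range)

def weighted_tag_score(tagger_scores: list[float],
                       all_scores: list[float]) -> float:
    """Total score for a tag in 'units of standard judgment': the sum
    of the weights of the users who applied the tag."""
    return sum(judgment_weight(s, all_scores) for s in tagger_scores)

# Taste-test results for the "cool" tag across the sampled population:
population = [4, 5, 5, 5, 6]

# Three users tagged a photo "cool"; two sit at the median, one at the edge,
# so the photo scores 2.0 standard-judgment units rather than a raw count of 3.
print(weighted_tag_score([5, 5, 4], population))
```

With this scoring, a raw count of 1000 tags only yields a score near 1000 when the taggers cluster tightly around the median; outlier taste contributes little. A nonlinear falloff or a different reference range could be substituted without changing the overall idea.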
I believe that applications that use tagging should put each user through a taste test so that the system may continuously adjust its “standard” judgment.
The statistical technique described here is deliberately simple; it could be made more elaborate, e.g. by adding further user-behavior-based weighting factors.
It’s not only possible but essential that a new company emerge to provide a more reliable “tagging engine,” much as Google emerged to provide a more reliable search engine.
The issue that still needs to be worked out is how the “tagging” concept (including this variation) will interact with the coming ontology-driven Semantic Web. Maybe the interaction would take the form of an OWL inference engine that understands the current context and usage of tagging? I’m sure that more thoughts on this will emerge down the line.