Archive for January, 2007|Monthly archive page

Self-Aware Text

In Uncategorized on January 13, 2007 at 4:57 am

Author: Marc Fawzi

Twitter: http://twitter.com/#!/marcfawzi

License: Attribution-NonCommercial-ShareAlike 3.0

Enabling self-organizing text

Below is a summary of an interesting model that I believe can be used to realize self-organizing text (excerpted from this rather weird but technically sound source):

Spin glasses are materials with chaotically oriented atomic spins which can reach neither a ferromagnetic equilibrium (spins aligned) nor a paramagnetic one (spins canceling in pairs), because of long-range spin interactions between magnetic trace atoms (Fe) and the conduction electrons of the host material (Cu). Because these effects reverse repeatedly with distance, no simple state fully resolves the dynamics, and spin glasses thus adopt a large variety of [globally] disordered states [with short range order.] Modeling the transition to a spin glass [i.e. simulated annealing] has close parallels in neural nets, particularly the Hopfield nets consisting of symmetrically unstable circuits. Optimization of a task is then modeled in terms of constrained minimization of a potential energy function. However the problem of determining the global minimum among all the local minima in a system with a large number of degrees of freedom is intrinsically difficult. Spin glasses are also chaotic and display sensitive dependence. Similar dynamics occurs in simulations of continuous fields of neurons.

Annealing is a thermodynamic simulation of a spin glass in which the temperature of random fluctuations is slowly lowered, allowing individual dynamic trajectories to have a good probability of finding quasi-optimal states. Suppose we start out at an arbitrary initial state of a system and follow the topography into the nearest valley, reaching a local minimum. If a random fluctuation now provides sufficient energy to carry the state past an adjacent saddle, the trajectory can explore further potential minima. Modeling such processes requires the inclusion of a controlled level of randomness in local dynamical states, something which in classical computing would be regarded as a leaky, entropic process. The open environment is notorious as a source of such [controlled level of randomness], which may have encouraged the use of chaotic systems in the evolutionary development of the vertebrate brain.

—End of Excerpted Model Summary—

Imagine the interaction between random words in the English language having two properties: aligned and non-aligned. If you throw the whole set of words into a heated spin-glass alloy, where the words replace the atoms and where word-word interactions replace spin-spin interactions, and then let it cool slowly (i.e. anneal it) then the system (of word-word interactions) should theoretically self-organize into the lowest potential energy state it could find.

The spin glass model (from the above quoted summary) implements an optimization process that is also a self organizational process that finds the local energy minima associated with a quasi-optimal state for the system which in turn organizes the local interactions between atomic spins (or words) to minimize discordant interactions (or disorder) in the short range, thus (in the case of word-word interactions) generating text that goes from garbage in the long range (as a result of globally disordered interactions in the long range) to well-formed in the short range (as a result of mostly aligned/ordered interactions in the short range.)

This idea is pretty raw, incomplete, and may not be the most proper use (or abuse) of the spin glass model (see References.)

However, in line with evolution’s preference for such a model for the brain, I find it useful to inject a controlled level of noise (randomness) into the thinking process.

Well, after having some apple crumble, I realize now (randomness works) that the reason this model will work well is because it will generate many well-formed sentences in each region in the state space (see image below) so there is bound to be a percentage of sentences that will actually make sense!

Having said that, this interpretation of the [SK] spin-glass model is pretty rough and needs more thinking to nail down, but the basic idea is good!

From Self-Organizing to Self Aware

What if instead of simply setting the rules and letting order emerge out of chaos (at least in the short range), as implied above, what if each word was an intelligent entity? What if each word knew how to fit itself with other words and within a sentence such that the words work collaboratively and competitively with each other to generate well-formed sentences and even whole articles?

The words would have to learn to read. :)

[insert your Web X.0 fantasy]


  1. Spin Glass Theory and Beyond


Short range ordered regions in 2D state space of a spin glass.


web 3.0, web 3.0, web 3.0, semantic web, semantic web, artificial intelligence, AI, statistical mechanics, stochastic, optimization, simulated-annealing, self-organization, spin glass


Designing a better Web 3.0 search engine

In Uncategorized on January 7, 2007 at 7:09 pm

Author: Marc Fawzi

Twitter: http://twitter.com/#!/marcfawzi

License: Attribution-NonCommercial-ShareAlike 3.0

This post discusses the significant drawbacks of current quasi-semantic search engines (e.g. hakia.com, ask.com et al) and examines the potential future intersection of Wikipedia, Wikia Search (the recently announced search-engine-in-development, by Wikipedia’s founder), future semantic version of Wikipedia (aka Wikipedia 3.0), and Google’s Pagerank algorithm to shed some light on how to design a better semantic search engine (aka Web 3.0 search engine)

Query Side Improvements

Semantic “understanding” of search queries (or questions) determines the quality of relevant search results (or answers.)

However, current quasi-semantic search engines like hakia and ask.com can barely understand the user’s queries and that is because they’ve chosen free-form natural language as the query format. Reasoning about natural language search queries can be accomplished by: a) Artificial General Intelligence or b) statistical semantic models (which introduce an amount of inaccuracy in constructing internal semantic queries). But a better approach at this early stage may be to guide the user through selecting a domain of knowledge and staying consistent within the semantics of that domain.

The proposed approach implies an interactive search process rather than a one-shot search query. Once the search engine confirms the user’s “search direction,” it can formulate an ontology (on the fly) that specifies a range of concepts that the user could supply in formulating the semantic search query. There would be a minimal amount of input needed to arrive at the desired result (or answer), determined by the user when they declare “I’ve found it!.”

Information Side Improvements

We are beginning to see search engines that claim they can semantic-ize arbitrary unstructured “Wild Wild Web” information. Wikipedia pages, constrained to the Wikipedia knowledge management format, may be easier to semantic-ize on the fly. However, at this early stage, a better approach may be to use human-directed crawling that associates the information sources with clearly defined domains/ontologies. An explicit publicized preference for those information sources (including a future semantic version of Wikipedia, a la Wikipedia 3.0) that have embedded semantic annotations (using, e.g., RDFa http://www.w3.org/TR/xhtml-rdfa-primer/ or microformats http://microformats.org) will lead to improved semantic search.

How can we adapt the currently successful Google PageRank algorithm (for ranking information sources) to semantic search?

One answer is that we would need to design a ‘ResourceRank’ algorithm (referring to RDF resources) to manage the semantic search engines’ “attention bandwidth.” Less radical, may be to design a ‘FragmentRank’ algorithm which would rank at the page-component level (ex: paragraph, image, wikipedia page section, etc).


  1. Wikipedia 3.0: The End of Google?
  2. Search By meaning


  1. See relevant links under comments


web 3.0, web 3.0, web 3.0, semantic web, semantic web, ontology, reasoning, artificial intelligence, AI, hakia, ask.com, pagerank, google, semantic search, RDFa, ResourceRank, RDF, Semantic Mediawiki, Microformats