I’ve heard it mentioned several times recently that “Taxonomies are hot again,” which raises a couple of questions: What is causing this renewed popularity? And why did they ever stop being “hot”?

In the early days of bibliographic databases, content could be found and retrieved only by understanding how it was tagged and by structuring queries using the terms of the taxonomy in use by the taggers. One needed the help of skilled information professionals, using systems like Dialog or Data Star, to execute even the simplest queries. However, with the advent of full-text news databases and services like Nexis, which could retrieve documents by matching virtually any word in the text, there was less need to tag content, so the use of taxonomies in searching was increasingly limited to bibliographic databases in science, technology, and medicine.

The rise of the Web and its powerful economic engine, online advertising, led to a new need for controlled vocabularies that enabled the monetization of content. Yahoo! and Google created new products that could capture the meaning and context of a user’s search or a page’s content, and publishers consequently began to structure their content so that it could be more easily found and more effectively monetized. So taxonomies, albeit often greatly simplified, were employed once again, this time to create a logical structure for users navigating a website. This effort continues, but it has become much less important with the dominance of Google. More than seventy percent of web visits now start at Google, and not at the home page of a website, so the structure of a site today is largely invisible to most users.

The rise of the “Semantic Web” is bringing on a new era of interest in taxonomies, thesauri, and ontologies. As good as Google has gotten at interpreting the most popular searches and anticipating what most people are looking for, more specialized publishers and searchers with highly specific queries are once again looking for better ways to connect with the content they need. At the recent STM Innovations Seminar in London, nearly all the presentations talked in some way about the enhancement of digital content through semantic technology. (See the links to conference presentations on this page.) Among the highlights, Professor Philip Bourne of UC San Diego introduced the idea of “Nano integration,” in which article metadata includes links to the underlying data supporting the ideas within an article. Another presentation covered the Semantic Biochemical Journal, which has turned the traditional journal into an interactive platform for further inquiry. We also talked about applications like Mendeley, which leverage descriptive metadata provided by users themselves. In every case, these innovations enhance the experience of the user by imbuing the content with greater meaning and context, and in so doing create links to other logically related content and data. So taxonomies and conceptual structure are back again, this time powered by innovations and information standards that are pushing far beyond where they have gone before.

Bert Carelli
Vice President, Business Development
Access Innovations/Data Harmony