February 14, 2011 – This question was recently asked and addressed in a community online forum that my colleagues and I participate in quite frequently. To those of us who live, breathe and eat this stuff every day, it seems like a simple question with an even simpler answer, but it occurred to me that it certainly bears revisiting.

With content management system vendors now providing taxonomy functionality, users are unfortunately tempted to consolidate their tools and eliminate a very worthy part of their document management process: the dedicated taxonomy management tool. There are many reasons why a taxonomy management tool that coordinates with the content management system is not only valuable, but necessary.

When building a taxonomy or ontology, you want that model to be available across the enterprise and not tied to a single program. For this reason, we provide open APIs and web services calls so that the taxonomy can be used with all the software options across an enterprise. All of our software is written in Java and uses native XML, TCP/IP and Unicode, which makes it platform independent and able to support all languages. All of this is done to make it a dynamic and comprehensive system with outstanding search results.

We provide Search Harmony so that the same taxonomy/ontology can be used on the user search side, leveraging the tagging of the documents to further enhance search results. A user can easily change the configuration of the ontology/thesaurus/taxonomy through our administrative module, so the data model retains its integrity by following the guidelines of the standards even as modifications are made to suit user needs. This easily integrated feature is critical to quality, progressive search results.

Likewise, rules-based systems are proven to be consistent and accurate, providing persistent clustering rather than the unreliable, impossible-to-duplicate results that come from a statistical system. Presenting suggested valid indexing terms to the user at the time of document submission, or applying them automatically for uploads, ensures that documents are tagged and ready for retrieval as soon as they are added to the data corpus. We are cheerleaders for the rules-based system and have been working with and improving the rules options for over 20 years.
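To make the point about repeatability concrete, here is a minimal sketch of rules-based term suggestion in Python. It is an illustration only and not how our software is implemented: the `Rule` structure, the `suggest_terms` function and the sample rules are all hypothetical. What it shows is that a rule either fires or it does not, so the same document always receives the same suggested terms, and anyone can read the rule to see why.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A hypothetical indexing rule: suggest `term` when `pattern` occurs,
    unless one of the `unless` words also occurs in the text."""
    pattern: str
    term: str
    unless: tuple = ()


# Illustrative rules only; a real rule base would be far larger.
RULES = [
    Rule("mercury", "Mercury (planet)", unless=("thermometer", "toxic")),
    Rule("mercury", "Mercury (element)", unless=("orbit", "planet")),
    Rule("taxonom", "Taxonomies"),
]


def suggest_terms(text, rules=RULES):
    """Return suggested taxonomy terms for `text`, in rule order."""
    lowered = text.lower()
    suggestions = []
    for rule in rules:
        if rule.pattern not in lowered:
            continue  # the trigger text is absent
        if any(word in lowered for word in rule.unless):
            continue  # a blocking word disambiguates against this term
        if rule.term not in suggestions:
            suggestions.append(rule.term)
    return suggestions


if __name__ == "__main__":
    print(suggest_terms("The orbit of Mercury around the sun"))
```

Because the rules are plain data, an editor can inspect, correct or extend them, and rerunning the same text will reproduce the same tags.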

A good taxonomy, based on full term records, will easily work with all the flavors of taxonomy implementations embedded in content, document, digital asset, database and other management systems. Managing the taxonomy inside one of those systems severely limits the options, the applications and thus the usefulness of the taxonomy to the organization.

When considering a content management system’s taxonomy feature, you should ask critical questions, like “Does it just manage the taxonomy, or does it also provide automated classification of any content, in any format and any language?” If tagging is left as a manual process, users will either not do it at all or do the bare minimum, tagging top-level nodes only. That can only lead to horrible search results. If the tool provides statistical classification, the tagging becomes even more uncontrollable. Many of these tools say they handle taxonomies, but they allow only a single broader term in the hierarchy, replicating the library classification systems of old, and often offer no cross-hierarchy association (related term) options. Finally, ask whether the tool supports synonyms, and be certain that more than a single synonym is allowed per term.
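The structural questions above (more than one broader term, related terms, more than one synonym) can be pictured with a small Python sketch of a term record. This is a hypothetical model for illustration, not the record format of any particular product; `TermRecord`, `resolve`, `ancestors` and the sample terms are all invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class TermRecord:
    """A hypothetical term record with the fields discussed above."""
    preferred: str
    broader: list = field(default_factory=list)   # may hold several parents
    related: list = field(default_factory=list)   # cross-hierarchy links
    synonyms: list = field(default_factory=list)  # any number of synonyms


# Illustrative sample data only.
TAXONOMY = {
    record.preferred: record
    for record in [
        TermRecord("Science"),
        TermRecord("Medicine"),
        TermRecord(
            "Biochemistry",
            broader=["Science", "Medicine"],
            related=["Pharmacology"],
            synonyms=["biological chemistry", "physiological chemistry"],
        ),
        TermRecord("Pharmacology", broader=["Medicine"], related=["Biochemistry"]),
    ]
}


def resolve(query, taxonomy=TAXONOMY):
    """Map a searcher's wording to a preferred term, or None if unknown."""
    wanted = query.strip().lower()
    for record in taxonomy.values():
        labels = [record.preferred] + record.synonyms
        if wanted in (label.lower() for label in labels):
            return record.preferred
    return None


def ancestors(term, taxonomy=TAXONOMY):
    """Collect every broader term reachable from `term`, via all parents."""
    seen = []
    stack = list(taxonomy[term].broader)
    while stack:
        current = stack.pop()
        if current in seen or current not in taxonomy:
            continue
        seen.append(current)
        stack.extend(taxonomy[current].broader)
    return sorted(seen)


if __name__ == "__main__":
    print(resolve("Physiological Chemistry"))
    print(ancestors("Biochemistry"))
```

A tool limited to one broader term per node could not place "Biochemistry" under both "Science" and "Medicine", and a tool limited to one synonym would miss one of the two alternative wordings a searcher might type.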

Obviously, I will always argue for a dedicated system for managing taxonomies, along with business rules that interpret content and apply taxonomy terms. That system should integrate easily with other systems using APIs and web services calls. It should be transparent and human-understandable. It should support users as they upload and tag their own content, and support searchers by extending to them the full semantic richness of the taxonomy. It should not be short-circuited by the limitations of a content management system that, while trying to serve an array of purposes, doesn’t quite match all the functions served by the original parts.

Margie Hlava
President, Access Innovations

Originally posted July 12, 2010.