I recently had the opportunity to see webinars featuring a couple of software systems for taxonomy construction/management and content categorization. The systems were both impressive and, if I didn’t have 20 years in the business, I would have been totally awed … and snowed. It’s easy to be overwhelmed by a slick appearance and professional presentation.

Early in my career, I was overwhelmed and confused by the terminology: its abundance, multiplicity, and ambiguity. Each software company used different words, all very catchy, developed by a creative marketing department. I couldn’t tell whether they were talking about different concepts or the same concepts in different verbal wrappers. Cutting through the terminology to identify key software features and functions can be tough. Yet that’s just what must be done for an informed buying decision.

One of the buzzwords I came across in these recent webinars was “content-driven” (or “data-driven”) to describe a taxonomy. To my amazement, this was described as a “trend” in taxonomy construction by the presenter for the company “with over 15 years of experience.” Apparently it was intended as a strike at the “top-down” approach, which pulls together terms for a taxonomy based on an abstract, authoritative view of a domain. The top-down approach was described as more complex than necessary and as including nodes that do not reflect your content.

However, the discussion ignored the equally familiar and long-established counterpart to top-down: the “bottom-up” approach, which draws terms directly from the documents to be categorized, i.e., content-driven. Jessica Milstead wrote a brief description of the strategies in 1996.

In most cases, building a taxonomy or thesaurus requires a hybrid approach, with the overall organization based on a top-down approach for navigation and the bulk of terms reflecting the preferred terms for concepts in the domain and drawn from the actual documents. The strategies are most often used in balance, with the taxonomist providing a logical “top” structure into which the content-linked terms can fit.

The software on display generated a list of candidate terms, offering words and phrases from the content as terms. But this was just a starting point in taxonomy construction. From there, it is time for the taxonomist to add the value of organization through hierarchical, associative, and equivalence relationships.
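To make the bottom-up step concrete, here is a minimal sketch of candidate term generation. It is not the method used by any product mentioned here; it assumes a simple frequency count of words and two-word phrases, and the function name `candidate_terms`, the tiny stopword list, and the sample documents are all invented for illustration.

```python
import re
from collections import Counter

# Hypothetical stopword list, kept tiny for illustration only.
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with"}


def candidate_terms(documents, max_words=2, min_count=2):
    """Return (phrase, count) pairs for words and short phrases that recur in the content."""
    counts = Counter()
    for text in documents:
        tokens = re.findall(r"[a-z]+", text.lower())
        for n in range(1, max_words + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                # Skip phrases that begin or end with a stopword.
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                counts[" ".join(gram)] += 1
    # Keep only phrases seen at least min_count times, most frequent first.
    return [(term, count) for term, count in counts.most_common() if count >= min_count]


# Invented sample content.
docs = [
    "Solar energy and wind energy are renewable energy sources.",
    "Solar energy storage is improving.",
]
candidates = dict(candidate_terms(docs))
```

A list like this is raw material only. Deciding which candidates are real concepts, which label is preferred, and where each belongs in the structure remains the taxonomist's work.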

Ah, “relationships” takes me to semantics, another buzzword that sounds very impressive and truly represents the power of taxonomies. The key thing to remember is that semantics in a taxonomy starts with the hierarchical, associative, and equivalence relationships. (Actually, a taxonomy with all those features is more accurately called a thesaurus.) Organizing terms in a hierarchy of broader and narrower concepts (from general to specific) and recognizing synonymous alternative expressions and internal conceptual links all add semantic richness to terms by providing context based on the meanings of words. These are features built into a well-developed taxonomy, providing pivot points from one term to another through logical semantic associations. Applied as metadata to content items, the taxonomy terms provide semantic enrichment.
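The three relationship types can be sketched as a small data structure. This is an illustrative model under my own assumptions, not the schema of any particular product: the `Term` and `Thesaurus` classes, their method names, and the sample terms are hypothetical. The field comments use the customary thesaurus abbreviations BT, NT, RT, and UF.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    label: str
    broader: set = field(default_factory=set)   # BT: hierarchical, toward the general
    narrower: set = field(default_factory=set)  # NT: hierarchical, toward the specific
    related: set = field(default_factory=set)   # RT: associative
    use_for: set = field(default_factory=set)   # UF: equivalence (nonpreferred variants)


class Thesaurus:
    def __init__(self):
        self.terms = {}        # preferred label -> Term
        self.entry_terms = {}  # lowercased variant -> preferred label

    def term(self, label):
        return self.terms.setdefault(label, Term(label))

    def add_hierarchy(self, broader, narrower):
        # Hierarchical links are reciprocal: BT on one side, NT on the other.
        self.term(broader).narrower.add(narrower)
        self.term(narrower).broader.add(broader)

    def add_related(self, a, b):
        # Associative links are symmetric.
        self.term(a).related.add(b)
        self.term(b).related.add(a)

    def add_synonym(self, preferred, variant):
        self.term(preferred).use_for.add(variant)
        self.entry_terms[variant.lower()] = preferred

    def resolve(self, label):
        """Map a preferred label or a variant to the preferred term, else None."""
        if label in self.terms:
            return label
        return self.entry_terms.get(label.lower())

    def ancestors(self, label):
        """List broader terms from nearest to most general."""
        found = []
        queue = sorted(self.terms[label].broader)
        while queue:
            current = queue.pop(0)
            if current not in found:
                found.append(current)
                queue.extend(sorted(self.terms[current].broader))
        return found


# Invented sample terms.
thesaurus = Thesaurus()
thesaurus.add_hierarchy("Energy", "Renewable energy")
thesaurus.add_hierarchy("Renewable energy", "Solar energy")
thesaurus.add_related("Solar energy", "Photovoltaic cells")
thesaurus.add_synonym("Solar energy", "Solar power")
```

With this in place, `thesaurus.resolve("solar power")` leads to the preferred term, and `thesaurus.ancestors("Solar energy")` walks up the hierarchy. Those are the "pivot points" from one term to another described above.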

Another slick webinar focused on semantic enrichment with an artfully designed but effective presentation. As jaded as I have become, I was duly impressed by the appealing motifs, the jazzy colors, the graphics in motion, and the requisite buzzwords in the opening. This is the part you show to the CIO, CTO, etc., the one with final budget authority. We are still talking about semantically enriching content with metadata from a domain-specific taxonomy. You say, “This is just what I need!”

Several modules were described. One extracts words and phrases as key topics for a taxonomy-ish product, called by a name not found in the ANSI/NISO Z39.19 standard for taxonomy construction. Another is for human taxonomy building from scratch, if the ready-built domain taxonomies are not a good fit. Others serve categorizing/indexing/tagging/annotating content (choose your favorite expression), also known as applying taxonomy terms as metadata or … semantically enriching the content.
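Stripped of the buzzwords, applying taxonomy terms as metadata can be as simple as the following sketch. It assumes plain phrase matching against preferred terms and their variants, which is far cruder than what commercial categorization software does; the `TAXONOMY` dictionary and the `tag` function are hypothetical names made up for this example.

```python
import re

# Hypothetical mini-taxonomy: preferred term -> nonpreferred variants.
TAXONOMY = {
    "Solar energy": ["solar power", "photovoltaics"],
    "Wind energy": ["wind power"],
}


def tag(text, taxonomy=TAXONOMY):
    """Return the sorted preferred terms whose label or variants occur in the text."""
    lowered = text.lower()
    tags = set()
    for preferred, variants in taxonomy.items():
        for label in [preferred, *variants]:
            # Match whole words or phrases only, case-insensitively.
            if re.search(r"\b" + re.escape(label.lower()) + r"\b", lowered):
                tags.add(preferred)
                break
    return sorted(tags)


# Invented sample content item.
metadata = tag("New photovoltaics plant opens near a wind power farm.")
```

Whatever wording appears in the content, the metadata always carries the preferred term. That consistency is the enrichment.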

I must admit I was impressed, but not snowed. I’m an editor, not in marketing or in an art department. I knew what this was all about because this is basic taxonomy and indexing work that I do daily, using software that delivers these functions. I know that slick is cool to look at, but it comes at a price.


“Gee, thanks for the spin in your Ferrari, but I was hoping for a Chevy price tag and Honda/Volvo/Subaru reliability.”

I also know that essential functions are available in products much more accessible to organizations on a budget.

If you are interested in software for taxonomy creation, management, and application, don't get snowed by the buzzwords and bling. Know the basics of taxonomy construction and implementation, and use that knowledge as a starting point when comparing software. Know the functions you need to perform and avoid slick but unnecessary frills, as alluring as they may be. Know if the product will work with other systems and whether you'll need a high-priced mechanic or an editor to do the work. When you hear about trends, consider established history and experience.

Alice Redmond-Neal, Senior Taxonomist
Access Innovations