When an organization asks us to create a taxonomy or thesaurus, it’s usually because they want to be able to manage a collection.

Taxonomies and thesauri are both types of controlled vocabularies. In controlled vocabularies intended for indexing, each concept covered by the vocabulary is represented by one and only one term that is valid for indexing, with some exceptions such as multilingual thesauri. (True, a thesaurus contains synonyms, but in most computer-based thesauri, these take the form of “non-preferred” synonyms that point to the preferred terminology for the concept.) If the taxonomy or thesaurus is managed consistently, searchers will get the same results every time they run the same search, rather than a different answer each time.
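The "one preferred term per concept" rule can be pictured with a small lookup. This is only a minimal sketch: the terms, the `USE_REFERENCES` table, and the `preferred_term` function are made up for illustration and do not come from any particular thesaurus or software package.

```python
# Hypothetical mini-thesaurus: each non-preferred synonym points to
# exactly one preferred term (a "USE" reference).
USE_REFERENCES = {
    "automobiles": "cars",
    "motorcars": "cars",
    "physicians": "doctors",
}

# The only terms that are valid for indexing in this toy vocabulary.
PREFERRED_TERMS = {"cars", "doctors"}


def preferred_term(term):
    """Return the single valid indexing term for `term`, or None if unknown."""
    key = term.strip().lower()
    if key in PREFERRED_TERMS:
        return key
    return USE_REFERENCES.get(key)
```

Whichever synonym an indexer or searcher starts from, the lookup lands on the same term, which is what makes the results repeatable.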

All taxonomies, and most computer-based thesauri, are hierarchical in structure. In a well-constructed vocabulary, this allows different subject areas to be developed to a consistent depth; we want roughly the same depth throughout the vocabulary. That depth can vary from vocabulary to vocabulary, because the available thesauri come in many different sizes. Some go down only three levels, and some go down 20 levels or more. Consistency in depth leads to balance in coverage, as well as ease and predictability in browsing and navigating the vocabulary.
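One rough way to check whether depth is consistent is to measure how many levels each top-level branch goes down. The sketch below assumes the hierarchy is stored as a simple mapping from each term to its narrower terms; the sample terms and the names `NARROWER`, `TOP_TERMS`, and `depth` are invented for illustration.

```python
# Hypothetical hierarchy: each term maps to its narrower terms.
NARROWER = {
    "science": ["physics", "biology"],
    "physics": ["optics"],
    "optics": ["lasers"],
    "biology": [],
    "lasers": [],
    "arts": ["music"],
    "music": [],
}
TOP_TERMS = ["science", "arts"]


def depth(term):
    """Count the levels from `term` down to its deepest narrower term."""
    children = NARROWER.get(term, [])
    return 1 + max((depth(child) for child in children), default=0)


# Depth of each top-level branch; large gaps suggest uneven coverage.
branch_depths = {top: depth(top) for top in TOP_TERMS}
```

In this toy vocabulary the "science" branch goes four levels down and "arts" only two, the kind of imbalance an editor would want to look at.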

We also use taxonomies and thesauri as a way to translate. As we all know, people often call the same thing by different names in different places in the United States and in different languages around the world. We want to get those names translated to a single way of stating each concept so that we can be consistent. So, to some extent, a thesaurus is a translation system.

We use a taxonomy or thesaurus for navigation and search, and it is used in search in a lot of different ways. Some people are getting increasingly clever in the ways they use it in searching, and they are also using it now for mash-ups: getting data from lots of different places and putting it together in different ways. You can put information on a map; you can track accidents across the country or across a district within a city, if you’ve coded them consistently. We want to continue to code content consistently so that people can search for it in a variety of new and innovative ways.
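To see why consistent coding matters when data from different places is combined, here is a small sketch. The two record sets, their field names, and the `count_by_district` helper are all hypothetical; the only assumption that matters is that both sources were tagged with the same controlled terms.

```python
from collections import Counter

# Hypothetical records from two sources, tagged with the same controlled terms.
police_reports = [
    {"district": "Downtown", "type": "bicycle accidents"},
    {"district": "Uptown", "type": "car accidents"},
]
hospital_records = [
    {"district": "Downtown", "type": "bicycle accidents"},
    {"district": "Downtown", "type": "car accidents"},
]


def count_by_district(records, accident_type):
    """Count records of one accident type per district."""
    return Counter(r["district"] for r in records if r["type"] == accident_type)


# Because both sources use the same term, their records can simply be pooled.
combined = count_by_district(police_reports + hospital_records, "bicycle accidents")
```

If one source had said "bike crashes" and the other "bicycle accidents", the pooled count would silently miss records; a shared vocabulary is what lets the two sources line up.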

We use a taxonomy or thesaurus to index, that is, to tag the documents, which is also known as keywording. It is important in our business to keep the word ‘index’ straight, because when editors talk about indexing, they are talking about tagging or keywording or adding terms from a thesaurus to an individual record. When programmers talk about indexing, they really mean building an inverted file: taking data and building it into a big list that is quickly searchable by a computer. They are radically different things. So, we need to be careful. I’m trying to train myself to say tag or keyword a bit more, but it’s not easy, because I’ve spent about 40 years calling it indexing. But I’m learning.
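The two senses of "indexing" can be put side by side in a few lines. In this sketch, `tagged_records` stands for the editor's work (terms assigned to each record) and `build_inverted_file` stands for the programmer's work (a term-to-records list built for fast lookup). The record IDs, terms, and function name are illustrative assumptions, not taken from any real system.

```python
from collections import defaultdict

# Editor's sense of "indexing": thesaurus terms assigned to each record.
tagged_records = {
    "doc1": ["cars", "safety"],
    "doc2": ["cars"],
    "doc3": ["safety", "bicycles"],
}


def build_inverted_file(records):
    """Programmer's sense of "indexing": map each term to the records tagged with it."""
    inverted = defaultdict(list)
    for record_id, terms in sorted(records.items()):
        for term in terms:
            inverted[term].append(record_id)
    return dict(inverted)


inverted_file = build_inverted_file(tagged_records)
```

The first structure answers "what is this record about?"; the second answers "which records are about this term?". The tagging has to happen before the inverted file can be built from it.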

We also use a taxonomy or thesaurus to browse, to drill down, to look at a hierarchical list and navigate down a tree to find information. There are lots of other ways it can help to add value to a collection.

By Marjorie Hlava

This posting is one of a series based on a workshop, “Thesaurus Creation and Management,” that Marjorie Hlava presented in December of 2012.