Choosing Terms for a Taxonomy or Other Controlled Vocabulary

Image source: https://commons.wikimedia.org/wiki/File:Murray_OED_vocabulary_types_diagram.svg. Original source: First Edition of the Oxford English Dictionary. Image redrawn by User:DavidPKendal after the diagram by James Murray, first editor of the OED.

Recently, we’ve looked at choosing controlled vocabulary terms. More specifically, we’ve considered choosing related terms, choosing non-preferred terms, and choosing broader and narrower terms. In this final installment of our “choosing terms” series, let’s broaden our scope to the task of choosing terms for inclusion in a vocabulary. And once again, let’s consult our usual trusty guide. That would, of course, be the Z39.19 standard (ANSI/NISO Z39.19-2005 (R2010)), “Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies”.

Yes, there it is. Section 6.1 of Z39.19 covers “Choice of Terms”. But wait. It’s only two inches long. There must be more to choosing terms than that. In fact, the first paragraph states, “Many issues need to be considered in selecting terms for a controlled vocabulary.” While that’s absolutely true, could this be a cop-out by our trusty guide?

And then, there they are — the cross-references:

The information space or domain to which the vocabulary will be applied (section 11.1.1)

Literary, user, and organizational warrant (section 5.3.5)

Specificity or granularity of the terms (section 11.1.7)

Relationship with other, related controlled vocabularies (section 10.9)

Let’s identify and examine the relevant passages from each of these sections.

The first cross-reference, regarding “The information space or domain to which the vocabulary will be applied”, is to section 11.1.1. Strangely enough, that section is headed “Avoid Duplicating Existing Vocabularies”. There doesn’t seem to be anything in that section that’s directly relevant to our topic. Anyway, most of us know (or can guess) that a controlled vocabulary for a particular subject domain should include the terminology of that domain, and perhaps some of the terminology of peripheral subject areas, without straying too far from the core subject areas.

Looking elsewhere, we find that section 6.6.1 (Usage) has some advice relevant to vocabulary domains vis-à-vis term selection: “Terms should reflect the usage of people familiar with the domain of the controlled vocabulary as reflected in literary, organizational, and user warrant (see section 5.3.5).”

Coincidentally and conveniently, the next cross-reference listed above also tells us to see section 5.3.5 regarding “literary, user, and organizational warrant”. There, we find some important advice that’s directly relevant to our topic:

“The process of selecting terms for inclusion in controlled vocabularies involves consulting various sources of words and phrases as well as criteria based on:

the natural language used to describe content objects (literary warrant),

the natural language of users (user warrant), and

the needs and priorities of the organization (organizational warrant).”

The subsequent subsections go into a bit more detail. Additionally, going back to section 6.6.1, we find that much of the discussion on usage has to do with literary, user, and organizational warrant.
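To make the three kinds of warrant a little more concrete, here is a minimal sketch of how one might record the evidence behind each candidate term. Everything in it is an assumption for illustration: the `Candidate` fields, the `warrants` helper, the `min_hits` threshold, and the sample terms are all hypothetical, and Z39.19 does not prescribe any counting or scoring method like this.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    label: str
    literary_hits: int    # occurrences in the content to be indexed (hypothetical measure)
    user_hits: int        # occurrences in user queries or requests (hypothetical measure)
    organizational: bool  # flagged as a priority by the organization


def warrants(candidate, min_hits=1):
    """Return the kinds of warrant that support a candidate term."""
    found = []
    if candidate.literary_hits >= min_hits:
        found.append("literary")
    if candidate.user_hits >= min_hits:
        found.append("user")
    if candidate.organizational:
        found.append("organizational")
    return found


# Made-up sample data, for illustration only.
candidates = [
    Candidate("solar panels", literary_hits=42, user_hits=17, organizational=False),
    Candidate("photovoltaic modules", literary_hits=9, user_hits=0, organizational=True),
    Candidate("sun catchers", literary_hits=0, user_hits=0, organizational=False),
]

for c in candidates:
    print(c.label, warrants(c))
```

A term that comes back with an empty list has no recorded warrant of any kind, which is a reasonable prompt to ask whether it belongs in the vocabulary at all. The final call is still an editorial one.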

The next cross-reference, to section 11.1.7 (Levels of Specificity), has to do with “specificity or granularity of the terms”. The main piece of advice there is as follows: “The addition of highly specific terms is usually restricted to the core area of the subject field covered by a controlled vocabulary because the proliferation of such terms in fringe areas is likely to lead to a controlled vocabulary that is difficult to manage.” There are other considerations that are not mentioned there, such as the degree of specificity needed to properly index and search for content that is associated with the vocabulary.

The final cross-reference, to section 10.9, is regarding “Relationship with other, related controlled vocabularies”. The title for section 10.9 is “Storage and Maintenance of Relationships among Terms in Multiple Vocabularies”. Much of this section has to do with mapping between vocabularies. Of one approach, it says: “This option requires designating one controlled vocabulary as the master with others as subsidiaries. The goal is to map the terminologies of the various controlled vocabularies to be included against a common classification scheme.” I think term choice comes into play here in making sure that the “common classification scheme” is complete enough to encompass the concepts represented by all the terms in all the vocabularies.

Mapping does not necessarily involve a subsidiary vocabulary per se. It could involve selected portions of a vocabulary. And the completeness aspect could involve considerations for making linked data effective. Here’s a tiny case study in which a term had to be added to support both mapping and linked data:

“An example of the editorial work needed to create truly linked data is the process of mapping the implied conceptual links to actual links. For instance, the nationality/culture controlled list within ULAN should now map to terms in the AAT. While much mapping could be done through algorithms, comparing the ULAN nationality term to AAT terms, it had to be vetted by the editorial staff. Where “East German” was a historical nationality in the ULAN list, it did not exist in the AAT; the term was added to the AAT so that the link could be made.” (Patricia Harpring, “Linked Open Data in the Cultural Heritage World: Issues for Information Creators and Users.” Post on the Council on Library and Information Resources website, March 20, 2014. Permalink: http://connect.clir.org/blogs/patricia-harpring/2014/03/20/linked-open-data-in-the-cultural-heritage-world-issues-for-information-creators-and-users.)
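The algorithmic first pass described in that passage can be sketched roughly as follows. This is only an illustration under stated assumptions: the `normalize` and `map_terms` helpers are hypothetical, the matching is a simple comparison of normalized labels, and the sample lists are made up rather than taken from ULAN or the AAT. It does not represent the process the Getty editors actually used.

```python
def normalize(label):
    """Lowercase and collapse whitespace so trivially different labels compare equal."""
    return " ".join(label.lower().split())


def map_terms(source_terms, target_terms):
    """Match each source term to a target term by normalized label.

    Returns (mapped, unmapped): `mapped` pairs each matched source label with
    its target label; `unmapped` lists source labels with no match, which are
    the ones that need editorial review.
    """
    target_index = {normalize(t): t for t in target_terms}
    mapped, unmapped = {}, []
    for term in source_terms:
        match = target_index.get(normalize(term))
        if match is None:
            unmapped.append(term)
        else:
            mapped[term] = match
    return mapped, unmapped


# Hypothetical sample data; not the actual ULAN or AAT contents.
source_nationalities = ["French", "East German", "Japanese"]
target_vocabulary = ["french", "Japanese", "German"]

mapped, unmapped = map_terms(source_nationalities, target_vocabulary)
print(mapped)    # {'French': 'french', 'Japanese': 'Japanese'}
print(unmapped)  # ['East German']
```

The unmapped list is where term choice comes back in. Each leftover term is a candidate either for a manual mapping to an existing term or, as with “East German”, for addition to the target vocabulary.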

There are many other factors to be considered, such as clarity and lack of ambiguity. Those factors, plus the factors mentioned above, along with a goodly amount of common sense, should provide a good foundation for choosing vocabulary terms well.

Barbara Gilles, Communicator
