Photo: Squirrels Making Wishes. © Kellyplz | Dreamstime.com

Synonyms and other non-preferred terms are largely what make a taxonomy (or other controlled vocabulary) a thesaurus. They can enrich a vocabulary in a variety of ways. Searches on a vocabulary can take advantage of non-preferred synonyms to direct the search from those words or phrases to the preferred term. More significantly, each non-preferred term can be used as a basis for one or more indexing rules for retrieving information from a database.

For the most part, non-preferred terms are conceptually equivalent (more or less) to their preferred term pairings. The difference, of course, is that the “preferred term” is the one that represents the concept in a thesaurus hierarchy and therefore in the accepted set of words and phrases for use in indexing using that particular thesaurus. (I must confess: I’ve always been a bit bothered by the widespread practice of referring to all the regular terms in a thesaurus as “preferred terms”, whether or not they all have non-preferred pairings of some sort. What exactly is it that the non-paired terms are preferred to? Ah, well.)

The relationship between a regular thesaurus term and any of its non-preferred pairings, or vice versa, is known as an equivalence relationship. The discussion of equivalence relationships in section 8.2 of ANSI/NISO Z39.19-2005 (R2010) (“Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies”) explains: “The relationship between preferred and non-preferred terms is an equivalence relationship in which each term is regarded as referring to the same concept. The preferred term in effect substitutes for other terms expressing equivalent or nearly equivalent concepts.”
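To make that substitution concrete, here is a minimal sketch in Python of how a search might be redirected from a non-preferred term to its preferred term. It is an illustration under stated assumptions, not how any particular thesaurus software works: the function names (`normalize`, `build_lookup`, `resolve`) and the sample entries other than Dogs are hypothetical.

```python
def normalize(text):
    """Lowercase and collapse whitespace so lookups ignore trivial variation."""
    return " ".join(text.lower().split())


def build_lookup(equivalences):
    """Map every preferred and non-preferred entry to its preferred term."""
    lookup = {}
    for preferred, non_preferred in equivalences.items():
        lookup[normalize(preferred)] = preferred
        for entry in non_preferred:
            lookup[normalize(entry)] = preferred
    return lookup


def resolve(query, lookup):
    """Return the preferred term for a query, or None if nothing matches."""
    return lookup.get(normalize(query))


# Hypothetical sample data: preferred term -> its non-preferred terms.
EQUIVALENCES = {
    "Dogs": ["Domestic dogs", "Canis familiaris"],
    "Automobiles": ["Cars", "Motor cars"],
}
LOOKUP = build_lookup(EQUIVALENCES)

print(resolve("motor  cars", LOOKUP))       # Automobiles
print(resolve("Canis familiaris", LOOKUP))  # Dogs
print(resolve("wolves", LOOKUP))            # None
```

Each non-preferred term points to exactly one preferred term in this sketch, which mirrors the idea that the preferred term stands in for the others.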

So how do we choose those non-preferred terms? Let’s back up slightly. Z39.19’s section 8.2 starts out with this statement: “When the same concept can be expressed by two or more terms, one of these is selected as the preferred term.” Actually, that’s something of an oversimplification, although it’s a plausible scenario. It suggests that we’re starting out by choosing a preferred term from a little collection of candidates; the non-preferred terms must be the ones that are left over after you choose a winner. (This seems more like a matter of rejecting the less fortunate terms than choosing them, doesn’t it?) This could very well be the case if you’re using a bottom-up approach to thesaurus construction, starting with a large assortment of possible terms and then piecing them together into sub-hierarchies.

However, most thesauri are probably constructed with a combination of bottom-up and top-down approaches. On the top-down side of things, you might be crafting the hierarchical structure in a more conceptually oriented way, adding the terms that first come to mind or that are available from your collection of possible terms for the vocabulary’s overall subject area(s). Often, those terms are the ones that best represent the concept for the users of the thesaurus. In that case, choosing non-preferred terms becomes a matter of thinking of or discovering possible synonyms.

Those synonyms should be limited to ones that might actually be used in searching the thesaurus or any associated databases; otherwise, you’re cluttering the vocabulary with deadwood. On the other hand, you should cast a fairly wide net and try to discover or think of as many ways of searching for a concept as is practicable.

There is a danger, though, in casting too wide a net. The main one, perhaps, is choosing non-preferred terms that cover a broader concept than the preferred term, or that go outside the boundaries of the concept in some other way. This might not be much of a problem in searches within the thesaurus. However, it can lead to inappropriate indexing. For instance, if your main term is Dogs, and you use Canines as one of its non-preferred terms, content that discusses wolves, foxes, jackals, or coyotes is likely to be indexed with the term Dogs, even when there isn’t a hint of a single dog hair in the entire article or whatever.
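The Dogs and Canines problem can be sketched in a few lines of Python. This assumes the simplest possible kind of indexing rule, where each non-preferred term acts as a whole-word trigger for its preferred term; real indexing rules are usually more elaborate. The function name `index_text`, the rule sets, and the sample sentence are all made up for illustration.

```python
import re


def index_text(text, rules):
    """Return the preferred terms whose trigger phrases occur in the text."""
    assigned = set()
    for trigger, preferred in rules.items():
        # Whole-word, case-insensitive match on the trigger phrase.
        pattern = r"\b" + re.escape(trigger) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            assigned.add(preferred)
    return sorted(assigned)


# Hypothetical rule sets: trigger phrase -> preferred term.
RULES_TOO_BROAD = {"dogs": "Dogs", "canines": "Dogs"}
RULES_TIGHTER = {"dogs": "Dogs", "domestic dogs": "Dogs"}

ARTICLE = "Wolves and coyotes are wild canines that hunt across open country."

print(index_text(ARTICLE, RULES_TOO_BROAD))  # ['Dogs']
print(index_text(ARTICLE, RULES_TIGHTER))    # []
```

With the over-broad rule set, an article with no dogs in it still gets indexed as Dogs; with the tighter set, it does not.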

Choosing a non-preferred term with a narrower concept than the regular term is perfectly fine, though, as long as that non-preferred term doesn’t match better with a different regular term in the thesaurus. You might want to check the regular term’s narrower terms to see if there’s a better fit somewhere else. Another possibility is to add the narrower concept to the hierarchy as a regular term. One factor in deciding on adding the term is the degree of granularity that you want the thesaurus to have. How detailed should it be for the vocabulary’s eventual users? How many levels deep should it be?
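Part of that check can be roughed out in code. The sketch below, again in Python and with a hypothetical hierarchy and hypothetical names (`descendants`, `possible_better_fits`), walks the narrower terms beneath a regular term and reports any that already carry the candidate as a label or non-preferred term. It catches only exact matches after normalization; judging whether a candidate is conceptually a better fit elsewhere is still the editor’s job.

```python
def normalize(text):
    """Lowercase and collapse whitespace for comparison."""
    return " ".join(text.lower().split())


def descendants(term, narrower):
    """Yield every term below `term` in the hierarchy, depth-first."""
    for child in narrower.get(term, []):
        yield child
        yield from descendants(child, narrower)


def possible_better_fits(candidate, term, narrower, non_preferred):
    """List narrower terms that already use `candidate` as a label or entry."""
    wanted = normalize(candidate)
    hits = []
    for other in descendants(term, narrower):
        labels = [other] + non_preferred.get(other, [])
        if any(normalize(label) == wanted for label in labels):
            hits.append(other)
    return hits


# Hypothetical hierarchy: regular term -> its narrower terms.
NARROWER = {
    "Dogs": ["Working dogs", "Toy dogs"],
    "Working dogs": ["Sled dogs", "Herding dogs"],
}
# Hypothetical non-preferred terms already in the thesaurus.
NON_PREFERRED = {"Herding dogs": ["Sheepdogs"]}

print(possible_better_fits("sheepdogs", "Dogs", NARROWER, NON_PREFERRED))
print(possible_better_fits("Lapdogs", "Dogs", NARROWER, NON_PREFERRED))
```

Here the first call reports Herding dogs, suggesting that Sheepdogs belongs there rather than under Dogs; the second comes back empty, which leaves the choice between adding Lapdogs as a non-preferred term or as a new regular term.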

It’s worth the thought and care that it takes to have well-chosen non-preferred terms in your thesaurus. These terms help to make the thesaurus a powerful tool in indexing, information retrieval, and knowledge domain representation.

Barbara Gilles, Communicator
