In a controlled vocabulary, we strive for disambiguation: the restriction and clarification of meaning. We want to determine exactly what is meant by each term. ‘Reading’ could mean a town in England or a communication process. The word ‘cells’ might mean biological microsystems, electrical equipment, prison housing, or other things; you can even have a terrorist cell. ‘Cell’ is a broadly used term, and without some kind of modifier, we can’t be sure exactly what it means.

In a multidisciplinary thesaurus, one very common cause of ambiguity is identical terminology in different domains. One example that taxonomists at Access Innovations have seen several times is “binary systems” in thesauri that cover astrophysics, chemistry, and computer engineering. In a very general sense, yes, it means the same thing in all three disciplines: a system with two things in it. But that general sense isn’t specific to any of those disciplines, nor is it a useful concept for indexing content or organizing knowledge.

I have seen and heard the argument that a term’s place in a hierarchy is enough to clarify its meaning. If you are using a taxonomy or thesaurus only as a navigation device on a website, perhaps so. For indexing, keywording, or search, though, the term needs to stand on its own, because it is applied to content and matched against queries without its position in the hierarchy.

A simple way to perform disambiguation is to include parenthetical qualifiers:

Astronomy

. . Binary systems (astronomy)

Chemistry

. . Binary systems (chemistry)

Computer engineering

. . Binary systems (computer engineering)

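However, this approach should be used only as a last resort. While acceptable, it is discouraged in the standards and presents problems for search functions; for example, a searcher who types “binary systems” will not exactly match any of the qualified terms.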
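To make the search problem concrete, here is a minimal Python sketch. It assumes a hypothetical flat list of qualified terms and two hypothetical lookup functions: `exact_lookup`, which compares the whole term string, and `qualifier_aware_lookup`, which ignores a trailing parenthetical. It is an illustration, not a description of how any particular search engine works.

```python
from typing import List

# Hypothetical preferred terms disambiguated with parenthetical qualifiers.
QUALIFIED_TERMS = [
    "Binary systems (astronomy)",
    "Binary systems (chemistry)",
    "Binary systems (computer engineering)",
]


def exact_lookup(query: str, terms: List[str]) -> List[str]:
    """Case-insensitive match against the whole term string, qualifier included."""
    q = query.strip().lower()
    return [t for t in terms if t.lower() == q]


def strip_qualifier(term: str) -> str:
    """Drop a trailing parenthetical qualifier such as ' (chemistry)'."""
    head, sep, _ = term.partition(" (")
    return head if sep and term.endswith(")") else term


def qualifier_aware_lookup(query: str, terms: List[str]) -> List[str]:
    """Case-insensitive match that ignores the qualifier."""
    q = query.strip().lower()
    return [t for t in terms if strip_qualifier(t).lower() == q]


if __name__ == "__main__":
    print(exact_lookup("binary systems", QUALIFIED_TERMS))
    # []
    print(qualifier_aware_lookup("binary systems", QUALIFIED_TERMS))
    # all three qualified terms
```

An exact match on the bare phrase finds nothing, while stripping the qualifiers brings the original ambiguity straight back: either way, the qualifier is doing work that the term itself should do.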

Frequently (but not always), an expansion of the term is possible for disambiguation:

Astronomy

. . Astronomical binary systems

. . Binary star systems

. . Binary planet systems [with non-preferred synonyms “Double planets” and “Binary planets”]

Chemistry

. . Binary chemical systems

Computer engineering

. . Binary numeral system
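The expanded terms can also be modeled as a small data structure. This is a minimal sketch, assuming a hypothetical `Term` record with a broader term (BT) and a list of non-preferred synonyms (UF, “used for”); it is not the schema of any particular thesaurus tool. It builds an entry index so that a non-preferred synonym leads to its preferred term.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Term:
    """A preferred term, its broader term (BT), and non-preferred synonyms (UF)."""
    label: str
    broader: Optional[str] = None
    used_for: List[str] = field(default_factory=list)


# Hypothetical encoding of the expanded terms shown above.
THESAURUS = [
    Term("Astronomy"),
    Term("Astronomical binary systems", broader="Astronomy"),
    Term("Binary star systems", broader="Astronomy"),
    Term("Binary planet systems", broader="Astronomy",
         used_for=["Double planets", "Binary planets"]),
    Term("Chemistry"),
    Term("Binary chemical systems", broader="Chemistry"),
    Term("Computer engineering"),
    Term("Binary numeral system", broader="Computer engineering"),
]


def build_entry_index(terms: List[Term]) -> Dict[str, str]:
    """Map each preferred and non-preferred label (lowercased) to its preferred term."""
    index: Dict[str, str] = {}
    for term in terms:
        index[term.label.lower()] = term.label
        for synonym in term.used_for:
            index[synonym.lower()] = term.label
    return index


ENTRY_INDEX = build_entry_index(THESAURUS)

if __name__ == "__main__":
    print(ENTRY_INDEX["double planets"])    # Binary planet systems
    print("binary systems" in ENTRY_INDEX)  # False
```

Every entry resolves to exactly one preferred term that is meaningful without its hierarchy, and the bare phrase “binary systems” is not an entry at all.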

For the terms in the example above, if you are using an automated or computer-assisted indexing system, you will still want to include the ambiguous phrase “binary systems” as a target for the system to identify. This is because the content you are indexing may use that exact phrase in the context of star systems, planet pairs, chemical mixtures, or computers and mathematics. In those contexts, the otherwise ambiguous phrase is clear. If your software allows you to edit the indexing rules, you can stipulate the context in which the phrase must appear for each preferred term, as in this rule:

Text to match: binary system*

IF (AROUND "binary star" OR AROUND "binary stars" OR WITH "star" OR WITH "stars")
    USE Binary star systems
ENDIF

IF (MENTIONS "azeotrop*" OR AROUND "mixture*" OR AROUND "alloy*" OR AROUND "suspension*" OR AROUND "colloid*" OR AROUND "heterogen*" OR AROUND "homogen*")
    USE Binary chemical systems
ENDIF

IF (AROUND "numer*" OR MENTIONS "base 2" OR MENTIONS "base-2" OR MENTIONS "binary num*")
    USE Binary numeral system
ENDIF
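The rule logic can be approximated in a few lines of Python. This is a hedged sketch, not the rule engine of any particular product: it assumes that MENTIONS means the pattern appears anywhere in the document, AROUND means it appears within a hypothetical window of 60 characters of the matched phrase, WITH means it appears in the same sentence (crudely bounded by periods), and a trailing * matches any word ending. The names `to_regex`, `mentions`, `around`, `with_`, and `index_terms` are illustrative. As in the rule, each IF block is evaluated independently, so a document can receive more than one term, or none.

```python
import re
from typing import Callable, List, Tuple

AROUND_CHARS = 60  # hypothetical proximity window for AROUND, in characters

Span = Tuple[int, int]
Condition = Callable[[str, List[Span]], bool]


def to_regex(pattern: str):
    """Compile a rule pattern; a '*' matches zero or more word characters."""
    body = re.escape(pattern).replace(r"\*", r"\w*")
    return re.compile(rf"\b{body}\b", re.IGNORECASE)


def mentions(pattern: str) -> Condition:
    """True if the pattern occurs anywhere in the document."""
    rx = to_regex(pattern)
    return lambda text, hits: rx.search(text) is not None


def around(pattern: str) -> Condition:
    """True if the pattern occurs within AROUND_CHARS of any target match."""
    rx = to_regex(pattern)

    def check(text: str, hits: List[Span]) -> bool:
        for start, end in hits:
            window = text[max(0, start - AROUND_CHARS): end + AROUND_CHARS]
            if rx.search(window):
                return True
        return False

    return check


def with_(pattern: str) -> Condition:
    """True if the pattern occurs in the same sentence as any target match."""
    rx = to_regex(pattern)

    def check(text: str, hits: List[Span]) -> bool:
        for start, end in hits:
            # Crude sentence bounds: the nearest periods on either side.
            s = text.rfind(".", 0, start) + 1
            e = text.find(".", end)
            if rx.search(text[s: e if e != -1 else len(text)]):
                return True
        return False

    return check


def any_of(*conds: Condition) -> Condition:
    """OR together several conditions."""
    return lambda text, hits: any(c(text, hits) for c in conds)


TARGET = to_regex("binary system*")

RULES: List[Tuple[Condition, str]] = [
    (any_of(around("binary star"), around("binary stars"),
            with_("star"), with_("stars")),
     "Binary star systems"),
    (any_of(mentions("azeotrop*"), around("mixture*"), around("alloy*"),
            around("suspension*"), around("colloid*"),
            around("heterogen*"), around("homogen*")),
     "Binary chemical systems"),
    (any_of(around("numer*"), mentions("base 2"), mentions("base-2"),
            mentions("binary num*")),
     "Binary numeral system"),
]


def index_terms(text: str) -> List[str]:
    """Return every preferred term whose context condition fires."""
    hits = [m.span() for m in TARGET.finditer(text)]
    if not hits:
        return []
    return [term for cond, term in RULES if cond(text, hits)]


if __name__ == "__main__":
    print(index_terms("An azeotrope can form in some binary systems of liquids."))
    # ['Binary chemical systems']
```

Character windows and period-based sentences are the simplest choices that make the sketch runnable; a production indexer would likely use proper tokenization and sentence detection.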


Marjorie M.K. Hlava, President, Access Innovations

This posting is one of a series based on a workshop, “Thesaurus Creation and Management,” that Marjorie Hlava presented in December 2012.