
Photo by Fanghong, https://commons.wikimedia.org/wiki/File:Russian-Matroshka.jpg / CC BY-SA 3.0

As most readers of this blog know, taxonomies are controlled vocabularies in which the terms are arranged hierarchically. Terms representing the broadest concepts are at the top level, and terms representing more specific concepts within those concepts are placed at a deeper level. The result is a vocabulary structure containing increasingly narrow terms, with “narrower” referring to more specific concepts. And where you have narrower terms, you inevitably have broader terms. It’s partly a matter of perspective: Going deeper into a taxonomy, the terms get ever narrower. Turn around and go back up toward the top, and they get ever broader.

These “hierarchical relationships” are discussed in section 8.3 of ANSI/NISO Z39.19-2005 (R2010) (“Guidelines for the Construction, Format, and Management of Controlled Vocabularies”), which starts with this statement: “The use of hierarchical relationships is the primary feature that distinguishes a taxonomy or thesaurus from other, simple forms of controlled vocabularies such as lists and synonym rings.” (Regarding the implication that all thesauri are hierarchical, remember that practically all modern thesauri are hierarchical. However, there are other features, such as equivalence relationships and associative relationships, that distinguish them from other taxonomies and from other controlled vocabularies in general.)

Logical hierarchical structure is essential to enable taxonomy users to browse and navigate the taxonomy. In addition to the eventual users of the “finished” taxonomy, these users include taxonomists creating, developing, and maintaining the taxonomy, as well as the indexers who hunt for and apply the most appropriate taxonomy terms for the content being indexed. There are also technical considerations, such as “rolling up” of the narrower terms to their respective broader terms for such purposes as customized RSS feeds. Creating a usable hierarchical structure is partly a matter of choosing good terms for the top levels, to serve (ironically) as a foundation for the structure. After that, it’s largely a matter of choosing good narrower terms for the broader terms, and of choosing good broader terms for the narrower terms.
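The “rolling up” mentioned above can be pictured with a small Python sketch. This is only an illustration under stated assumptions: the toy taxonomy (including the top term “Animals”), the document IDs, and the function names are all made up for the example, and real taxonomy software will do this differently.

```python
from collections import defaultdict

# Hypothetical toy taxonomy: each term maps to its broader term(s).
BROADER = {
    "Parakeets": ["Birds"],
    "Birds": ["Animals"],
    "Animals": [],
}

# Hypothetical index: document ID -> terms applied by an indexer.
INDEXED = {
    "doc1": ["Parakeets"],
    "doc2": ["Birds"],
    "doc3": ["Animals"],
}

def ancestors(term, broader):
    """Return every term reachable by following broader-term links upward."""
    seen = set()
    stack = list(broader.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(broader.get(current, []))
    return seen

def roll_up(indexed, broader):
    """Map each term to the documents indexed with it or with any narrower term."""
    feed = defaultdict(set)
    for doc, terms in indexed.items():
        for term in terms:
            feed[term].add(doc)
            for ancestor in ancestors(term, broader):
                feed[ancestor].add(doc)
    return feed

feeds = roll_up(INDEXED, BROADER)
print(sorted(feeds["Birds"]))    # ['doc1', 'doc2']
print(sorted(feeds["Animals"]))  # ['doc1', 'doc2', 'doc3']
```

A feed for “Birds” picks up the parakeet document only because “Parakeets” sits under “Birds”. Put the term in the wrong place and the roll-up quietly delivers the wrong content.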

ANSI/NISO Z39.19 discusses three types of hierarchical relationships: the generic relationship; the instance relationship; and the whole-part relationship. These are simply different aspects of the more general – more specific pairings that are the one overriding principle of hierarchical taxonomy structure.

The instance relationship is straightforward: the narrower term is a specific instance (“instance” being a one-of-a-kind kind of thing) of the broader term. Borrowing from the Z39.19 example:

Mountain regions

. . Alps

. . Himalayas

With the whole-part relationship, the name is perhaps self-explanatory. If not, some examples adapted from Z39.19 should be.

Nervous system

. . Central nervous system

. . . . Brain

. . . . Spinal cord

Canada

. . Ontario

. . . . Ottawa

. . . . Toronto

In the generic relationship, the narrower term “is a type of” whatever the broader term represents. For instance, a parakeet is a type of bird. Using the customary visual principle (which, confusingly, not all taxonomies follow) of going from left to right as the concepts get more specific:

Birds

. . Parakeets

The classic test for the appropriateness of a generic relationship is the “all-and-some” test. In the example above, in going from broader to narrower, are some birds parakeets? The answer is yes. Now, going in the other direction, from narrower to broader, are all parakeets birds? Again, the answer is yes. Our example passes the all-and-some test.

Now, let’s mess things up (for illustrative purposes only, of course).

Pets

. . Parakeets

Again, going from broader to narrower, are some pets parakeets? The answer is yes. So far, so good. Heading back up, are all parakeets pets? Not yet. There are still flocks of parakeets out in the wild. (No, “a lot of them are pets” doesn’t count.) The example above fails the all-and-some test.
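For readers who think in code, the two questions of the test can be written as two set checks. This is a rough sketch, not anything prescribed by Z39.19: the function names and the handful of individual animals are invented for illustration, and in practice the test is applied by a person reasoning about concepts rather than by enumerating members.

```python
def all_and_some(broader_members, narrower_members):
    """Return (some, all) for a proposed broader/narrower pairing.

    some: at least one member of the broader concept is in the narrower one.
    all:  every member of the narrower concept is in the broader one.
    """
    some = bool(broader_members & narrower_members)
    all_ = bool(narrower_members) and narrower_members <= broader_members
    return some, all_

def passes(broader_members, narrower_members):
    """A pairing passes only when both questions are answered yes."""
    return all(all_and_some(broader_members, narrower_members))

# Made-up individuals, purely for illustration.
birds = {"pet parakeet", "wild parakeet", "robin"}
pets = {"pet parakeet", "goldfish"}
parakeets = {"pet parakeet", "wild parakeet"}

print(passes(birds, parakeets))  # True: some birds are parakeets, all parakeets are birds
print(passes(pets, parakeets))   # False: the wild parakeet is not a pet
```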

Most of the mistakes I’ve seen in taxonomies and thesauri have to do with hierarchical relationships that would fail the all-and-some test. It might seem like an academic thing, but what it comes down to is checking the logic and predictability of the hierarchical relationship pathways. A term in the wrong place is likely to be overlooked, and it may cause inappropriate or missed search suggestions, as well as problems with RSS feed content.

With many scholarly and scientific thesauri, the terms aren’t well described by the types mentioned above. With disciplines of study, it’s more a matter of nesting sub-disciplines within disciplines. The all-and-some test may still be useful, but you might need to mentally preface each term under consideration with “the discipline of” or “the study of”.

One more thing: Polyhierarchy is good. If you don’t take advantage of an opportunity to pair up a term with a logical broader term, only because it already has a broader term, you’re limiting the pathways by which the term can be discovered. And you’re limiting the subject area overview and insight that a more complete set of narrower terms would provide.
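To see what those extra pathways look like, here is a small Python sketch that lists every route from a top term down to a given term. The term “Pet parakeets” and its two broader terms are hypothetical, chosen only because the pairing would pass the all-and-some test in both directions; the sketch also assumes the hierarchy contains no loops.

```python
# Hypothetical polyhierarchy: one term with two broader terms.
BROADER = {
    "Pet parakeets": ["Parakeets", "Pets"],
    "Parakeets": ["Birds"],
    "Birds": [],
    "Pets": [],
}

def paths_to_top(term, broader):
    """Return every path from a top term down to `term`, one list per path."""
    parents = broader.get(term, [])
    if not parents:
        return [[term]]
    paths = []
    for parent in parents:
        for path in paths_to_top(parent, broader):
            paths.append(path + [term])
    return paths

for path in paths_to_top("Pet parakeets", BROADER):
    print(" > ".join(path))
# Birds > Parakeets > Pet parakeets
# Pets > Pet parakeets
```

Drop either broader term and one of those two routes disappears, along with the users who would have browsed in that way.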

And now a quiz for you: What’s the opposite of logical hierarchy? The answer: Taxonomic anarchy!

Barbara Gilles, Communicator
