Every once in a while, the issue of capitalization in taxonomies and thesauri pops up. Some of us in taxonomy land believe that the capitalization style you use (capitals versus lowercase) does make a difference. We just don’t necessarily agree on what that style should be.

The National Information Standards Organization (NISO) standard for controlled vocabularies (ANSI/NISO Z39.19, Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies), which was last revised in 2005 and reaffirmed in 2010, has this to say on the subject, on page 34:

It is recommended that predominantly lowercase characters be used for terms in controlled vocabularies… Capitals should be used only for the initial letter(s) of proper names, trade names, and for those components of taxonomic names, such as genus, which are conventionally capitalized. Capitals should be used for all the letters of initialisms or where featured in unusual positions in product or corporate names. Because lowercase letters can occur in unusual positions in proper names, using a combination of capitals and lowercase letters in controlled vocabularies indicates to the user the correct orthography of a term in natural language and serves to distinguish common nouns from similar proper names. 

Example 57: Capitalization of proper and trade names

dBASE IV

DNA

information systems

Information Systems Corp.

NewsBank
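
For anyone cleaning up a term list in bulk, the recommendation quoted above can be roughed out in a few lines of Python. This is only a sketch, not anything Z39.19 prescribes: the function name `z3919_style` and the `PROTECTED_TERMS` list are hypothetical, and the list has to be maintained by hand, since a script can't know on its own that NewsBank is a trade name and "information systems" is not.

```python
# Hypothetical, hand-maintained list of terms whose capitalization carries
# meaning (proper names, trade names, initialisms). Not part of Z39.19 itself.
PROTECTED_TERMS = ["dBASE IV", "DNA", "Information Systems Corp.", "NewsBank"]

# Index the protected spellings by lowercased form for case-insensitive lookup.
_PROTECTED = {term.lower(): term for term in PROTECTED_TERMS}


def z3919_style(term: str) -> str:
    """Lowercase a term unless it matches a protected spelling."""
    cleaned = " ".join(term.split())  # collapse stray whitespace
    return _PROTECTED.get(cleaned.lower(), cleaned.lower())


for raw in ["INFORMATION SYSTEMS", "Dna", "newsbank", "Information Systems Corp."]:
    print(z3919_style(raw))
```

Note that the lookup matches whole terms only. A common noun phrase such as "information systems" is lowercased, while the full corporate name "Information Systems Corp." keeps its capitals because it appears in the protected list.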

[A note about “should”: Per ANSI/NISO Z39.19, page 2, “The conventions used in this Standard to indicate the force of recommendations are: must (required for meeting the Standard), should (recommended), and may (optional). The Standard also uses the conventions must not (not allowed in order to be in compliance with the Standard) and should not (not recommended).” So the NISO standard recommends the practice above, but does not insist on it.]

[Another note: A reconsideration/revision of Z39.19 is due soon.]

Most of this makes sense to me. It’s certainly much better than the solid caps default of early machine-readable taxonomies and thesauri. From what I understand, they were completely capitalized because of technological limitations and space-saving considerations. Have you ever tried to browse the printed records of those vocabularies? They’re horribly difficult to read.

At the same time, the readability issue is what makes me object to Z39.19’s recommendation. Specifically, I have problems with the terms that begin with lowercase letters. They sort of merge with the line above, rather than clearly being separate terms. They don’t have the visual boundaries that capitalization can provide.

I do appreciate the rationale of Z39.19 that “using a combination of capitals and lowercase letters in controlled vocabularies indicates to the user the correct orthography of a term in natural language.” And I know fellow taxonomists who strongly agree with the lowercase-unless-it’s-a-proper-noun approach. For a general controlled vocabulary that serves as a reference for how terms appear in natural language, that dictionary-ish approach kind of makes sense.

My take on that, though, is that taxonomies and hierarchical thesauri are more like outlines than like dictionaries, and outline items are capitalized for clarity, to indicate where new items start. Moreover, most traditional dictionaries have the visual benefit (for our purposes) of tiny text filling up the distance between terms, whereas in hierarchical taxonomy displays (which are generally the most useful views), the terms appear on consecutive lines. And indents can confuse things even more if terms are lowercase; the narrower terms look like runover lines.

Taxonomist Heather Hedden has written a blog post on the subject of capitalization in taxonomies. She views initial capitalization of terms as analogous to capitalization style in headings:

A “taxonomy” implies a hierarchical classification or categorization of concepts. When we think of categories we think of labels or headings with subcategories. Headings in general tend to have initial capitalization or title capitalization. Thus, if it’s a strictly hierarchical taxonomy, where all terms are interconnected into a single hierarchy or a limited number of hierarchies, then it will more likely have initial capitalization or title capitalization. Such capitalization is particularly common on the relatively smaller/less detailed taxonomies that are proliferating on websites, intranets, and content management systems. It fits in with the web design style of capitalization on headings and categories.

As Heather points out, initial capitalization is a fairly common practice, despite Z39.19. I think she’s referring mostly to initial (letter) capitalization of the first word; I haven’t seen that Title Style is common at all. (In fact, I don’t remember seeing it at all in a taxonomy.)

I have seen modern taxonomies and thesauri that have solid (every letter) capitalization on just the top level, to indicate major categories. These are usually just category designations, rather than indexing terms. Heather comments: “A good application of the mixed capitalization style is if the top level terms were not actually to be used in indexing/tagging but are really just categories/groupings of the actual index terms, which in-turn are arranged hierarchically underneath.”
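
To see what that mixed style looks like in an indented display, here is a minimal Python sketch. It assumes the taxonomy is held as nested dictionaries; the names `initial_cap` and `render` and the sample terms are made up for illustration. Top-level categories get solid caps, and everything below gets an initial capital on the first word only.

```python
def initial_cap(term: str) -> str:
    """Capitalize only the first character; leave the rest untouched."""
    return term[:1].upper() + term[1:]


def render(tree: dict, depth: int = 0) -> list:
    """Return display lines: solid caps at the top level, initial caps below."""
    lines = []
    for term, children in tree.items():
        label = term.upper() if depth == 0 else initial_cap(term)
        lines.append("  " * depth + label)
        lines.extend(render(children, depth + 1))
    return lines


# Hypothetical sample hierarchy, stored in Z39.19-style lowercase.
sample = {
    "computing": {
        "information systems": {"database management": {}},
        "DNA computing": {},
    }
}

print("\n".join(render(sample)))
```

One caveat: `initial_cap` would turn a term like dBASE IV into "DBASE IV", so terms whose first letter is deliberately lowercase would need to be exempted by hand.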

Ultimately, it’s up to the taxonomy owners to determine what style to use. (And might I remind you, a reconsideration of Z39.19 is due soon.) The main factors to consider are readability, clarity, and usability.

By Barbara Gilles, Taxonomist
Access Innovations, Inc.