Once you have a tentative list of terms for your taxonomy-to-be, it can be a bit overwhelming, especially if you have thousands of terms. (In scientific and technical fields, a typical taxonomy might have 5,000-10,000 terms.) How to start dealing with them all? A good first step is to organize the information into main categories. Choose some logical main areas, and don’t be too concerned up front about whether or not the areas or the initial wordings are exactly what you’ll ultimately want them to be. You’re just roughing things out at this stage. Then use those main areas as buckets, and dump the more specific terms into those buckets.
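The bucket idea above can be roughed out in a few lines of Python. This is only a sketch under stated assumptions: the main areas, the keyword hints, and the function name rough_sort are all hypothetical, and a real first pass would lean on human judgment far more than on keyword matching.

```python
from collections import defaultdict

# Hypothetical main areas with keyword hints (illustrative assumptions only).
# Areas are checked in order, so the more specific hint ("cure") comes first.
BUCKET_HINTS = {
    "Treatments": ("cure", "therapy"),
    "Medical conditions": ("hiccup", "fever"),
}


def rough_sort(terms, hints=BUCKET_HINTS):
    """Dump each term into the first main area whose hint it contains."""
    buckets = defaultdict(list)
    for term in terms:
        lowered = term.lower()
        for area, keywords in hints.items():
            if any(keyword in lowered for keyword in keywords):
                buckets[area].append(term)
                break
        else:
            # No hint matched: park the term for a human to place later.
            buckets["Unsorted"].append(term)
    return dict(buckets)


if __name__ == "__main__":
    for area, members in rough_sort(["Hiccups", "Hiccup cures", "Squirrels"]).items():
        print(area, members)
```

The "Unsorted" bucket matters as much as the others: it is the pile that tells you which main areas you have not thought of yet.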

In the process of roughing things out, you begin to notice the structural relationships between the terms. You notice other possible groupings. That gives you a way to develop the structure and organize it into sortable categories.

Go ahead and bravely take advantage of those discoveries. Where something is a sub-set of another term, you can make it into a narrower term. Remember that you can always change things later. (Don’t make term relationships too complicated at the beginning, though, or you might find yourself having to do some major untangling later on.)

In looking at terms that fit in the same general area, you’ll see terms that are closely related conceptually, but not hierarchically. (More about that later.) You may want users to be aware of a particular term when they’re looking at a conceptually similar term that’s in a different area of the taxonomy (such as Hiccups under Medical conditions, and Hiccup cures under Treatments). If you’re creating a thesaurus, and so can indicate associative relationships, this is where to create them.

Then there are synonyms and quasi-synonyms, or near-synonyms: terms that you know are not identical in meaning, but that fall within the same general sphere. If you don’t want to index to that depth, you can roll synonyms and near-synonyms together. Try to find a term that covers the general concept, and use the other terms as narrower terms (if appropriate) or as synonyms. If non-preferred synonyms are not exact synonyms, they should be more specific in meaning than the main term. Otherwise, your indexing will be inaccurate.
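Rolling synonyms together amounts to pointing each non-preferred term at one preferred term. Here is a minimal sketch of that lookup, assuming a simple dictionary is enough; the names USE, PREFERRED, resolve, and used_for are hypothetical, and the synonyms "hiccough" and "singultus" are illustrative choices rather than terms from the presentation.

```python
# Illustrative entry vocabulary: non-preferred term -> preferred term.
USE = {
    "hiccough": "Hiccups",
    "singultus": "Hiccups",
}

# Preferred terms, the only ones actually used for indexing.
PREFERRED = {"Hiccups", "Hiccup cures"}


def resolve(term):
    """Return the preferred term to index with, or None if the term is unknown."""
    key = term.strip().lower()
    for preferred in PREFERRED:
        if preferred.lower() == key:
            return preferred
    return USE.get(key)


def used_for(preferred):
    """List the non-preferred terms that point at a preferred term."""
    return sorted(entry for entry, target in USE.items() if target == preferred)


if __name__ == "__main__":
    print(resolve("Hiccough"))
    print(used_for("Hiccups"))
```

Note that both illustrative synonyms are no broader in meaning than the preferred term, which is the condition that keeps the indexing accurate.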

The standards say that it’s okay to use non-indexing terms for term grouping purposes, but there is nothing more frustrating to a searcher than to see a term that is exactly what they want and then learn that nothing is indexed with it. So, I don’t recommend including non-indexing terms in your taxonomy, even though the standards allow it.

Look for areas that are missing. Look for gaps, where the taxonomy makes big conceptual leaps. That’s the kind of situation for which you may need to get up and walk around the block before you go back to the taxonomy and think about it again.

As you go, flesh out the records. When you have a thought, jot it down or type it in, because recovering that thought later is really tough. You just forget. Your mind is juggling 16 different relationships and you have one more thought. Just put it in while you are thinking of it. Don’t try to remember it. You’ll be richer for it.

You can do that fleshing out in the scope notes, in the equivalence relationships (synonyms), or in other parts of the record, as you think of them. As for multiple broader terms, or polyhierarchical relationships, add those as you think of them, too. You might take them out later; that’s okay. It’s easier to clean them up later than it is to reconstruct them after you have been away from that part of the taxonomy for a while. The same goes for associative relationships: if you can add them, then do it.
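To make the idea of a term record concrete, here is a minimal sketch of one in Python. It is an assumption about how such a record could be modeled, not a description of any particular thesaurus software; the class and method names are hypothetical, as are the second parent "Home remedies" and the scope note wording.

```python
from dataclasses import dataclass, field


@dataclass
class TermRecord:
    term: str
    scope_note: str = ""
    broader: set = field(default_factory=set)    # more than one = polyhierarchy
    narrower: set = field(default_factory=set)
    related: set = field(default_factory=set)    # associative relationships
    used_for: set = field(default_factory=set)   # non-preferred synonyms


class Thesaurus:
    def __init__(self):
        self.records = {}

    def record(self, term):
        """Fetch a term record, creating an empty one on first mention."""
        if term not in self.records:
            self.records[term] = TermRecord(term)
        return self.records[term]

    def add_broader(self, term, broader):
        # Hierarchical links are stored on both records so they stay reciprocal.
        self.record(term).broader.add(broader)
        self.record(broader).narrower.add(term)

    def remove_broader(self, term, broader):
        # Cleaning up later is cheap: drop both halves of the link.
        self.record(term).broader.discard(broader)
        self.record(broader).narrower.discard(term)

    def add_related(self, first, second):
        # Associative links are symmetric.
        self.record(first).related.add(second)
        self.record(second).related.add(first)

    def alphabetical(self):
        return sorted(self.records, key=str.lower)


if __name__ == "__main__":
    thesaurus = Thesaurus()
    thesaurus.add_broader("Hiccups", "Medical conditions")
    thesaurus.add_broader("Hiccup cures", "Treatments")
    thesaurus.add_broader("Hiccup cures", "Home remedies")  # hypothetical second parent
    thesaurus.add_related("Hiccups", "Hiccup cures")
    thesaurus.record("Hiccups").scope_note = "Involuntary spasms of the diaphragm."
    print(thesaurus.alphabetical())
```

Because every link is written to both records at once, adding a relationship the moment you think of it costs one call, and taking it out later costs one more.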

Once you have done those things, you have at the very least an alphabetical list of terms.  You have term records that you can begin to play with, and then you can begin to analyze them.

One way we figure out whether something should be a narrower term or a related term is what we call the All and Some Test. In the illustration above, all squirrels are rodents; some rodents are squirrels. Squirrels fit entirely within the Rodent category, so Squirrels is a narrower term of Rodents. By contrast, some squirrels are pests, but not all squirrels are pests. So, Squirrels and Pests are what’s commonly called “related terms”, with an associative relationship. If one of the terms does not fit 100% neatly underneath the other term, then it is an associative relationship.
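The All and Some Test can be pictured with sets. The sketch below is only an analogy: in practice the test is a judgment about concepts, not a computation, and the function name all_and_some and the member lists are hypothetical (which squirrels count as pests is an illustrative assumption).

```python
def all_and_some(candidate, other):
    """Suggest how the candidate term relates to the other term,
    given example members of each concept."""
    candidate, other = set(candidate), set(other)
    if not candidate & other:
        return "no relationship"
    if candidate == other:
        return "equivalent"
    if candidate < other:
        # All of candidate is in other; only some of other is in candidate.
        return "narrower term"
    if other < candidate:
        return "broader term"
    # Some, but not all, in each direction: associative relationship.
    return "related term"


squirrels = {"gray squirrel", "red squirrel", "flying squirrel"}
rodents = squirrels | {"beaver", "house mouse"}
pests = {"gray squirrel", "house mouse", "cockroach"}

if __name__ == "__main__":
    print("Squirrels vs. Rodents:", all_and_some(squirrels, rodents))
    print("Squirrels vs. Pests:", all_and_some(squirrels, pests))
```

With these sample members, Squirrels comes out as a narrower term of Rodents and as a related term of Pests, matching the reasoning above.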

At some point in the development of your taxonomy or thesaurus, you’ll want to gain some perspective on it. Next time, I’ll discuss producing and obtaining different views.

Marjorie M.K. Hlava
President, Access Innovations

Note: The above posting is one of a series based on a presentation, The Theory of Knowledge, given at the Data Harmony Users Group meeting in February of 2011. The presentation covered the theory of knowledge as it relates to search and taxonomies.