There’s a lot you can say about Access Innovations, but if one thing’s for certain, it’s that we’ve built a lot of controlled vocabularies over the last forty years. We’ve seen even more, in all shapes and sizes, some good and some not so good. We’d like to think that every organization would be mindful of controlling their vocabulary and using standards to dictate how concepts should be organized, but we know better. We know that a large portion of organizations don’t think taxonomically and, instead, push their concepts together in a way that may suit them, but is difficult if not impossible for others to use. That’s specifically why Access Innovations is around; we can take care of this problem.

To this end, we’ve (not me) put together a fun thesaurus, the Horrible Thesaurus, to illustrate some of the many pitfalls that can befall an improperly controlled vocabulary.

Note: For entertainment and educational purposes only…not for use in production.

Starting at the very top level, we can already see problems:










You can probably see that the problem begins at the very start (shown, coincidentally, in our own Data Harmony software). No question that the top level of your thesaurus will house your most general concepts, but “Other,” “Stuff,” and “Things” are so general that they don’t really count as concepts at all. They could literally be anything. Plus, “Stuff” and “Things” are basically synonyms, and none of the three describes anything about your subject matter. All it tells me is that you don’t know how to build a thesaurus.













As we dig a little deeper, the problems are revealed to be even worse. Starting from the top (and I’m only hitting the highlights here), each concept under “Other” already exists under the other two branches. This raises the question of why the branch exists at all, aside from the fact that its terms have “other” in them, which isn’t ever a real reason to do something.
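The redundancy described above is easy to check for mechanically. What follows is a minimal sketch, not a Data Harmony feature: the branch contents are made-up stand-ins for the pictured thesaurus, and `find_redundant_branches` is a hypothetical helper name. It flags any top-level branch whose terms all live somewhere else as well.

```python
from collections import defaultdict

# Hypothetical toy data: top-level branch -> narrower terms.
# These are illustrative stand-ins, not the real Horrible Thesaurus contents.
thesaurus = {
    "Other": ["Clothing", "Edible things"],
    "Stuff": ["Clothing", "Shoes"],
    "Things": ["Edible things", "Big things"],
}


def find_redundant_branches(branches):
    """Return branches whose every term also appears under another branch."""
    # Record which branches each term (compared case-insensitively) lives in.
    homes = defaultdict(set)
    for branch, terms in branches.items():
        for term in terms:
            homes[term.casefold()].add(branch)

    redundant = []
    for branch, terms in branches.items():
        # A non-empty branch is redundant if none of its terms is unique to it.
        if terms and all(len(homes[t.casefold()]) > 1 for t in terms):
            redundant.append(branch)
    return redundant


print(find_redundant_branches(thesaurus))
```

A term sitting in two branches is not wrong by itself, since polyhierarchy is legitimate. The warning sign is a branch that contributes nothing of its own.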

One other important thing to note in this branch: always keep term consistency as one of your first principles. At Access Innovations, we’re pretty picky about standards (ANSI/NISO Z39.19 for us), so things like the all-caps construction of “OTHER THINGS” and the parenthetical in “Stuff (other)” are pretty gross to us; we would rarely, if ever, form terms in either of these ways. And while we believe in the standards, consistency is what matters most. If your organization wants to form terms differently (read: wrong), that’s fine; just pick a construction and stick with it.
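If you want to hold yourself to “pick a construction and stick with it,” a small script can do the nagging for you. This is a sketch under stated assumptions: the two rules below are a hypothetical house style based on the examples above, not a rendering of ANSI/NISO Z39.19, and `term_form_issues` is an invented name.

```python
import re


def term_form_issues(term):
    """List the ways a term breaks a (hypothetical) house style."""
    issues = []

    # Rule 1 (assumed): no terms written entirely in capitals.
    letters = [c for c in term if c.isalpha()]
    if len(letters) > 1 and all(c.isupper() for c in letters):
        issues.append("all caps")

    # Rule 2 (assumed): no trailing parenthetical such as "Stuff (other)".
    if re.search(r"\([^)]*\)\s*$", term):
        issues.append("parenthetical qualifier")

    return issues


for term in ["OTHER THINGS", "Stuff (other)", "Clothing"]:
    print(term, "->", term_form_issues(term) or "ok")
```

Swap in whatever rules your organization actually follows. The point is that the rules are written down once and applied to every term, rather than decided term by term.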













As we dig deeper, the issues intensify. The term formation issue persists, but more important are the inconsistently thought-out concepts. Some of this is okay; I can easily see Shirts and Suits and Skirts as Narrower terms under Clothing. What I can’t imagine is a scenario in which someone searching for clothing would think about the item’s color before its type. So why Blue and Red clothing? Caps does feature both Blue and Red hats, so at least you have both options, but then you’d have to add all the colors to complete the thesaurus. Good luck.

Worse is the pictured example, which definitely has its problems. Shoes could easily just be removed from the term, as Clogs and Sneakers are both clearly types of shoes, but the real issue is the highlighted term. Term formation aside, Uggs is absolutely a shoe brand, but it’s not exactly a “type” of shoe (it is “Stuff” though, I’ll give it that). On top of all that is the problem of listing brands under the thing the brand makes. Not only is Uggs not a type of shoe, and not only can you also have a purse branded with Uggs, but if you wanted to make a complete thesaurus at this point, you’d be forced to add all the brands of shoes out there, which would make this thesaurus unusable…to say the least.

A better method would be to make a relatively flat list of brands that you could map to the concept thesaurus…but that’s a more involved blog post for another time.
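Until that post exists, here is a rough sketch of the idea, with everything in it an assumption: the concept terms, the single brand, and the helper names `validate_brand_map` and `brands_for` are all made up for illustration. Brands live in their own flat list, and a separate mapping points each brand at the concepts it applies to.

```python
# Hypothetical concept thesaurus terms (flattened for the sketch).
concept_terms = {"Clothing", "Shoes", "Clogs", "Sneakers", "Purses"}

# Flat brand list, kept outside the concept hierarchy.
brands = ["Uggs"]

# Mapping from brand to the concepts it can apply to.
brand_map = {"Uggs": ["Shoes", "Purses"]}


def validate_brand_map(brand_map, brands, concept_terms):
    """Report mapped brands or concepts that are missing from their lists."""
    problems = []
    for brand, targets in brand_map.items():
        if brand not in brands:
            problems.append(f"unknown brand: {brand}")
        for target in targets:
            if target not in concept_terms:
                problems.append(f"{brand} maps to unknown concept: {target}")
    return problems


def brands_for(concept, brand_map):
    """Return, in sorted order, the brands mapped to a given concept."""
    return sorted(b for b, targets in brand_map.items() if concept in targets)


print(validate_brand_map(brand_map, brands, concept_terms))
print(brands_for("Purses", brand_map))
```

Adding a new brand touches only the brand list and the mapping, so the concept thesaurus stays the same size no matter how many brands turn up.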

(Also, the copyright symbol in the term record is terrible…just saying.)












The final important example is here in the Things branch. Sushi is a very good example of why subjectivity should be eliminated entirely from the thesaurus building process. Now, while the end result of the Big things branch is pretty hard to argue with, the process for getting there is as ridiculous as a thesaurus branch can get, not to mention that the same type of thing can come in different sizes, so there are some issues with this branch.

A more interesting example of this is the pictured bottom-level term. Sushi definitely belongs in a thesaurus, I have no argument there, but this is an interesting case of why subjectivity is a terrible idea. Notice in the term record that Sushi has three broader terms. Access Innovations fully supports polyhierarchy, no doubt about that, but only one of these broader terms is usable. Sushi is, in fact, an edible thing. Even though it’s a dumb term, I can’t argue with it specifically. The other two terms are opposites of each other, which is ridiculous, of course. I know people who hate sushi and think it’s disgusting, and that’s fine with me. I happen to think that it’s fabulous. Every food could be both, so the uselessness of these terms should be apparent.


Stick with nouns. I’ve built adjective thesauri myself…I do not recommend it.

One actual final thing:












No matter what you think “Things, Nerdy” might mean, don’t ever put this into a thesaurus…Jack.

Daryl Loomis

Sponsored by Data Harmony, a unit of Access Innovations, the world leader in indexing and making content findable.