Once you have developed a taxonomy, of course you want to use it. And before using it, you want (or should want) to test it. The use goes two ways: you can apply the taxonomy to the records for indexing, and you can use it for search. Both of those applications serve as use tests at first, and can serve as use tests again during maintenance of the taxonomy. Once you get into the maintenance phase, you have to be able to edit many parts of a term record. You might change the status of the term itself or what it is called; for example, you could change the preferred (primary) term that you use and enter some other word as the non-preferred term. You might want to add or delete a relationship. Of course, you want to add new terms. You might want to move branches around. All of these things are part of routine taxonomy maintenance.

There are several possible output formats for the data. You might output it as print. Normally, a print version includes an alphabetical full-term record view and a hierarchical view. Other views exist as well; you will find them if you look at printed thesauri. Electronically, you need to be able to produce both of those views, and then perhaps create a flat file for implementation and maybe a structured database for application to the intended data.
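As a rough illustration of the alphabetical and hierarchical views, here is a minimal Python sketch. The `broader` dictionary, the sample terms, and the function names are all made up for the example; it assumes every term appears as a key and that the hierarchy contains no cycles.

```python
# Hypothetical data: term -> list of broader terms (empty list = top term).
broader = {
    "Animals": [],
    "Mammals": ["Animals"],
    "Dogs": ["Mammals", "Pets"],
    "Pets": [],
}


def narrower_index(broader):
    """Invert the broader-term links so each term lists its narrower terms."""
    index = {term: [] for term in broader}
    for term, parents in broader.items():
        for parent in parents:
            index[parent].append(term)
    return index


def hierarchical_view(broader):
    """Return indented lines, starting from the top terms."""
    children = narrower_index(broader)
    lines = []

    def walk(term, depth):
        lines.append("  " * depth + term)
        for child in sorted(children[term]):
            walk(child, depth + 1)

    for top in sorted(t for t, parents in broader.items() if not parents):
        walk(top, 0)
    return lines


def alphabetical_view(broader):
    """Return each term in A-Z order, followed by its BT and NT lines."""
    children = narrower_index(broader)
    lines = []
    for term in sorted(broader):
        lines.append(term)
        lines.extend(f"  BT {bt}" for bt in sorted(broader[term]))
        lines.extend(f"  NT {nt}" for nt in sorted(children[term]))
    return lines


print("\n".join(hierarchical_view(broader)))
print("\n".join(alphabetical_view(broader)))
```

Note that "Dogs" has two broader terms in this sample, so it shows up under both "Mammals" and "Pets" in the hierarchical view.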

You want to be able to check for reciprocal postings and make sure that if the broader term is stated, the narrower term relationship is also stated. If it is a related term, you want to be sure that both related terms refer to each other, and so on. All of those kinds of things need to be posted automatically.
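The reciprocal check described above can be sketched in a few lines of Python. This is only an illustration under assumed conventions: the record layout (a dictionary of BT, NT, and RT sets per term), the sample terms, and the function names are hypothetical, not taken from any particular thesaurus software.

```python
# Hypothetical term records: each term maps a relationship code to a set of terms.
# BT = broader term, NT = narrower term, RT = related term.
thesaurus = {
    "Dogs": {"BT": {"Mammals"}, "NT": set(), "RT": {"Veterinary medicine"}},
    "Mammals": {"BT": set(), "NT": set(), "RT": set()},
    "Veterinary medicine": {"BT": set(), "NT": set(), "RT": set()},
}

# Each relationship and the reciprocal it requires on the other record.
RECIPROCAL = {"BT": "NT", "NT": "BT", "RT": "RT"}


def missing_reciprocals(terms):
    """Return sorted (target, relation, source) triples that should exist but do not."""
    missing = []
    for source, record in terms.items():
        for relation, targets in record.items():
            inverse = RECIPROCAL[relation]
            for target in targets:
                target_record = terms.get(target, {})
                if source not in target_record.get(inverse, set()):
                    missing.append((target, inverse, source))
    return sorted(missing)


def post_reciprocals(terms):
    """Add every missing reciprocal, creating a target record if needed."""
    for target, inverse, source in missing_reciprocals(terms):
        record = terms.setdefault(target, {"BT": set(), "NT": set(), "RT": set()})
        record[inverse].add(source)


print(missing_reciprocals(thesaurus))
post_reciprocals(thesaurus)
print(missing_reciprocals(thesaurus))
```

In this sample, "Dogs" names "Mammals" as a broader term and "Veterinary medicine" as a related term, but neither record points back. The first report lists those two gaps, and after automatic posting the second report is empty.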

You need to be able to output different views and also to up-post to multiple broader terms. Many content management system (CMS) software packages do not allow related terms, or they allow only a single synonym, or they do not allow multiple broader terms. Content management systems are increasingly popular, so if you are going to implement your taxonomy in one, you need to find out whether it allows those things. One very popular CMS allows only one synonym per term, no multiple broader terms, and no related terms. After you have gone to all the work of building those relationships, it is very frustrating to find that the CMS does not allow them in the web presence. On the other hand, if you go with WordPress, for example, there are several plug-ins that will allow those kinds of things for your taxonomy. Your choice of CMS depends on where you are going with it, but it helps to know what is possible and what is not.

You also want to validate the entries and make sure that they have all the necessary pieces of the term record.
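A validation pass of that kind might look something like the following Python sketch. The required fields and the allowed status values are assumptions chosen for illustration; the actual list of necessary pieces depends on the standard and house rules you follow.

```python
# Assumed required fields; your standard or house rules may call for others.
REQUIRED_FIELDS = ("term", "status", "broader_terms")
# Hypothetical set of status values.
ALLOWED_STATUSES = {"candidate", "approved", "deprecated"}


def validate_record(record):
    """Return a list of problems found in one term record (empty if none)."""
    problems = []
    for field in REQUIRED_FIELDS:
        if field not in record:
            problems.append(f"missing field: {field}")
    term = record.get("term")
    if term is not None and not str(term).strip():
        problems.append("term label is blank")
    status = record.get("status")
    if status is not None and status not in ALLOWED_STATUSES:
        problems.append(f"unknown status: {status}")
    return problems


good = {"term": "Dogs", "status": "approved", "broader_terms": ["Mammals"]}
bad = {"term": "  ", "status": "draft"}

print(validate_record(good))
print(validate_record(bad))
```

Returning a list of problems, rather than stopping at the first one, lets an editor fix everything wrong with a record in one pass.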

Technologically, there are many places where you can apply this taxonomy. If you build it in a very consistent format and it contains at least the pieces outlined in the standards, then I think you’re in pretty good shape.

Online systems allow many different points of access, and they do not always deal with structured data (also known as field-formatted data). Most people now access systems through the Cloud, or at least remotely; they are no longer tied to mainframe computers. That means you have many different users who might want access to a single record at the same time, and we cannot simply lock them out. People are always going to have different points of view when they look at these systems.

One consideration with multiple users on a single term: if you have several people working on the same thesaurus and making changes to the same term record, you might want to be able to lock that record so that only one person makes changes at a time. Otherwise, changes might cancel each other out, and someone goes back to look at the record and finds that it isn't fixed. You want to avoid that kind of thing.
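To make the idea of a record lock concrete, here is a minimal in-memory sketch in Python. The `RecordLocks` class and the editor names are hypothetical, and the sketch only covers a single process; a real multi-user system would more likely rely on the locking provided by its database.

```python
class RecordLocks:
    """Track which editor, if any, currently holds each term record."""

    def __init__(self):
        self._owners = {}  # term -> editor holding the lock

    def acquire(self, term, editor):
        """Give the lock to editor if the record is free or already theirs."""
        owner = self._owners.setdefault(term, editor)
        return owner == editor

    def release(self, term, editor):
        """Release the lock, but only if editor is the one holding it."""
        if self._owners.get(term) == editor:
            del self._owners[term]
            return True
        return False


locks = RecordLocks()
print(locks.acquire("Dogs", "alice"))  # alice gets the record
print(locks.acquire("Dogs", "bob"))    # bob is refused while alice holds it
print(locks.release("Dogs", "alice"))  # alice finishes her edit
print(locks.acquire("Dogs", "bob"))    # now bob can edit
```

The lock is per term record, so two editors can still work on different terms in the same thesaurus at the same time.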

Some search systems limit what you can do with your taxonomy. Most of the online systems, but not all search systems, allow multiple broader terms. Some of the search systems do not even use taxonomies at all. Their developers don’t believe in them; they don’t think that they are effective.

A few years ago, Access Innovations did a project for a very prominent publisher to build a taxonomy. The publisher halted the project about halfway through and implemented a new digital management system. The system is really beautiful, but it is not designed to use a taxonomy. The attitude was: don’t need one, thanks; we can take care of things through Bayesian techniques, vector analysis, etc. So, after two years, the publisher came back to me and said, “We found that search is a whole lot better on the half of our data that has a taxonomy than on the other half.” It is a really nice case study involving what actually is a beautiful content management system. They did not think they needed a taxonomy at all, but they found that having vocabulary control really does give significantly better retrieval accuracy. Instead of 40% accuracy, they get 85-90% retrieval accuracy. That is a really nice kind of statistic.

Next time, we’ll look at how taxonomies interact with databases and with the Internet.

Marjorie M.K. Hlava, President
Access Innovations

Note: The above posting is one of a series based on a presentation, The Theory of Knowledge, given at the Data Harmony Users Group meeting in February of 2011. The presentation covered the theory of knowledge as it relates to search and taxonomies.