We index, or “tag”, using the taxonomy to improve precision. In part, we do that by defining the scope of each term that we use. We have a term that represents a concept, but that same term might be used for several concepts. ‘Mercury’, for example, could be the element, the planet, or the Roman god, so we define in which way we are using the term ‘mercury’ within this corpus of information. We define the scope of the term, and that improves the precision.
We can also improve recall, that is, getting everything in the entire corpus that has to do with a concept, by allowing several different terms to point to the same concept. This is where the synonyms in a thesaurus come into play. If we apply all of the synonyms, we can capture the concept whether it is called a couch, a sofa, a davenport, a settee, a multiple seating unit, modular seating, or whatever else people want to call it. We can gather everything that has to do with that concept, even though it has been called different things over time.
That’s very important, particularly if you are looking at e-discovery and similar technologies, because we need to be able to gather that information together and assure the users that they are not missing anything. It does not matter so much in a regular search, but if you are searching for information for a dissertation, you sure don’t want to find out two years down the road that someone else has published on that topic and you did not find it in your search. If you are looking to file a patent, you don’t want to find out that the technology is already patented after you have gotten investment capital and built a whole firm. If you are in a litigation situation and you have done your e-discovery and then someone else finds the smoking gun that your system missed because it did not have enough synonyms to uncover that particular damning email, then your system has really failed you.
So, having a lot of different terms for the same concept is extremely important. My rule of thumb is that you have at least 1.5 synonyms for every term in the thesaurus. So, if you have 500 terms, you need at least 750 synonyms. This is a rule of thumb for how rich you want that taxonomy to be.
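To make the synonym idea concrete, here is a minimal Python sketch. It is an illustration only, not how any particular indexing product works. The thesaurus contents, the sample documents, and the function names (`build_lookup`, `tag`, `search`, `synonym_ratio`) are all hypothetical, and matching is plain whole-word matching with no stemming. Every synonym is mapped to one preferred term, so a search on any of the terms retrieves documents that used any of the others.

```python
import re

# Hypothetical mini-thesaurus: preferred term -> its synonyms.
THESAURUS = {
    "sofa": ["couch", "davenport", "settee",
             "multiple seating unit", "modular seating"],
}

# Hypothetical corpus for illustration.
DOCS = {
    "memo-1": "The couch was delivered on Monday.",
    "memo-2": "Grandma's davenport needs new fabric.",
    "memo-3": "Quarterly sales figures are attached.",
}


def build_lookup(thesaurus):
    """Map every preferred term and synonym (lowercased) to its preferred term."""
    lookup = {}
    for preferred, synonyms in thesaurus.items():
        for term in [preferred, *synonyms]:
            lookup[term.lower()] = preferred
    return lookup


def tag(text, lookup):
    """Return the preferred terms whose term or synonym occurs in the text."""
    lowered = text.lower()
    found = set()
    for term, preferred in lookup.items():
        # Whole-word match only; plurals and inflections are not handled.
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            found.add(preferred)
    return found


def search(query, documents, lookup):
    """Return ids of documents tagged with the concept the query term maps to."""
    preferred = lookup.get(query.lower())
    if preferred is None:
        return []
    return [doc_id for doc_id, text in documents.items()
            if preferred in tag(text, lookup)]


def synonym_ratio(thesaurus):
    """Average number of synonyms per preferred term (rule of thumb: >= 1.5)."""
    if not thesaurus:
        return 0.0
    return sum(len(s) for s in thesaurus.values()) / len(thesaurus)


LOOKUP = build_lookup(THESAURUS)
print(search("settee", DOCS, LOOKUP))   # finds the couch and davenport memos
print(synonym_ratio(THESAURUS))
```

A search for “settee” finds both the couch memo and the davenport memo, even though neither contains the word “settee”. The `synonym_ratio` helper is one simple way to check a thesaurus against the 1.5 rule of thumb.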
A taxonomy with effective indexing terms is a guide to a field of expertise. If you do a lot of taxonomies, you find that you have a lot of knowledge about that field; you can sling the lingo very effectively across the entire field. I have frequently, as I bid a taxonomy, gone to someone’s annual meeting to present it. For instance, I went to the American Society of Radiologic Technologists; they are the people who do x-rays. If you have ever had an x-ray of an arm or anything of that kind, somebody who is a member of and certified by the ASRT probably ran the test. Anyway, I went to the ASRT meeting and I really had a great time. I could talk with all those guys and nod and understand all those terms. I did not really know the science behind it, but I certainly knew the terms and how they fit together. Building a taxonomy is a fun way to learn about a field and to be able to talk about it.
What we bring as taxonomists, as our area of expertise, is a way of sorting the subject experts’ language into a structure. It does not matter that we are not the subject experts, because we are going to reach out to those people to help us with the taxonomy. But we are going to be able to understand what they are talking about. That leads to richer expression within the database that is indexed with the taxonomy.
Marjorie M.K. Hlava, President, Access Innovations
Note: The above posting is one of a series based on a presentation, The Theory of Knowledge, given at the Data Harmony Users Group meeting in February of 2011. The presentation covered the theory of knowledge as it relates to search and taxonomies.