When will it be finished?

November 7, 2011  
Posted in Access Insights, Featured, Taxonomy

Is it final?  Finally!  I recall Rex Harrison playing Pope Julius II, shouting up to Charlton Heston as Michelangelo, something like, “When will it be finished?”  To which Michelangelo-Heston replied enigmatically, “When I’m done!”  Michelangelo spent about four years (1508 to 1512) on his back painting the Sistine Chapel ceiling. The Pope’s impatience was not without justification, as he was financing the project while trying to recapture former Papal territories filched by the Borgias.

The same question is often asked of taxonomy and thesaurus development: “When will it be done?” The answer is, “Never!” which is less enigmatic than Heston’s reply, but not very palatable to those Popes trying to pay the bills while the Borgias (read: Barbarians) are at the gates. “How do you know when your taxonomy-building efforts are done?” is a legitimate question and concern. The tart reply “Never!” really means your taxonomy needs to be kept up to date. But you still need to know the criteria for releasing it for widespread use. At some point you will want to actually use it to classify and label content and to guide navigation, as is demonstrated at our MediaSleuth website.

Taxonomists would answer as Michelangelo did: a thesaurus is done only when they say it is. Still, you can push them to provide a go-to-market date. Access Innovations, Inc., must do this all the time, because the taxonomies and thesauri we build are done as a service for our customers. How do we determine if it is ready for prime time? To a large extent it is a chicken-and-egg scenario. The best way to know if a thesaurus is ready is to use it. We test them by indexing (labeling or classifying) sample content. The content objects used for testing should be taken from the corpus the taxonomy is designed to conceptualize. A taxonomy/thesaurus is an organization of the concepts of a subject area or domain. It is a knowledge organization system (KOS), which can be used to represent a field. By using it to index content from its own domain, we can readily determine how representative it is.

This is an iterative process whereby we index content, review the indexing, and augment and adjust the thesaurus. The sample content needs to be representative of a wide spectrum of the domain being mapped by the thesaurus. The sample size needs to be large enough to truly test the breadth and depth of the thesaurus. Breadth and depth are important criteria. We look for content objects that did not receive any indexing. Was the item outside the scope of the domain or do we need to add terms to the taxonomy? Were the terms assigned too broad? Do we need to add more precise terminology? More precise terminology usually translates into narrower and narrower terms in a hierarchical structure. Have we captured enough synonyms that point to the preferred term? We also mine the sample content objects for additional terms for the thesaurus.
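One pass of this review loop can be sketched in a few lines of Python. This is only a toy illustration, not how Data Harmony works: the thesaurus structure, the simple phrase matching, and the names `THESAURUS`, `index_document`, and `review_sample` are all assumptions made up for the example. It indexes a small sample, then flags items that received no terms and items that received only top-level (broad) terms.

```python
import re

# Hypothetical toy thesaurus: preferred term -> synonyms and broader term.
THESAURUS = {
    "analgesics": {"synonyms": ["painkillers"], "broader": None},
    "aspirin": {"synonyms": ["acetylsalicylic acid"], "broader": "analgesics"},
}

def index_document(text, thesaurus):
    """Return the preferred terms whose label or synonym appears in the text."""
    lowered = text.lower()
    assigned = set()
    for preferred, entry in thesaurus.items():
        for label in [preferred, *entry["synonyms"]]:
            # Whole-phrase, case-insensitive match (a deliberate simplification).
            if re.search(r"\b" + re.escape(label.lower()) + r"\b", lowered):
                assigned.add(preferred)
                break
    return assigned

def review_sample(documents, thesaurus):
    """Flag items with no indexing and items indexed only with top-level terms."""
    unindexed, only_broad = [], []
    for doc_id, text in documents.items():
        terms = index_document(text, thesaurus)
        if not terms:
            unindexed.append(doc_id)    # out of scope, or a missing term?
        elif all(thesaurus[t]["broader"] is None for t in terms):
            only_broad.append(doc_id)   # may need narrower terms
    return unindexed, only_broad

# Made-up sample content for illustration only.
SAMPLE = {
    "doc1": "Acetylsalicylic acid reduces fever.",
    "doc2": "Painkillers are widely prescribed.",
    "doc3": "Ibuprofen dosing in children.",
}

unindexed, only_broad = review_sample(SAMPLE, THESAURUS)
print("No terms assigned:", unindexed)
print("Only broad terms:", only_broad)
```

Here "doc3" comes back with no terms, which prompts the question above: is ibuprofen out of scope, or is it a term to add? "doc2" receives only the broad term, a hint that narrower terminology may be needed.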

Are the thesaurus terms appropriate for the intended audience? Again, a fast and cost-effective approach to this question is to compare the chosen taxonomy terms to the content. If time and money are available, surveying the intended audience is ideal, but it is costly. The language of the audience is usually found in the content they read and can therefore be used as a proxy for user surveys.

Exceptions must be taken into account. A good example is health information. Using Internet resources, laypeople are doing as much research as scientists. When they start their research, scientific names are often foreign to them. By the time they’re done, they know the terms well. To help them get started, a good thesaurus will capture both scientific terms and more generic, common names (e.g., aspirin vs. acetylsalicylic acid). The choice of which will be preferred and which will be a synonym, or non-preferred term, depends on the target audience. A website of health information for the lay public would use the common name as the preferred term. A researcher’s portal would lead with the scientific term. Both should have synonyms, or non-preferred terms, that provide lead-ins to the content.
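The aspirin example can be made concrete with a small sketch. Everything here is hypothetical (the `CONCEPT` record, the audience labels "lay" and "researcher", and the functions `build_entry` and `resolve`); it simply shows the same concept yielding a different preferred term per audience, while either label still leads in to the content.

```python
# Hypothetical concept record: the labels for one concept, tagged by register.
CONCEPT = {"common": "aspirin", "scientific": "acetylsalicylic acid"}

def build_entry(concept, audience):
    """Choose the preferred term for an audience; other labels become non-preferred."""
    key = "common" if audience == "lay" else "scientific"
    non_preferred = [label for k, label in concept.items() if k != key]
    return {"preferred": concept[key], "non_preferred": non_preferred}

def resolve(query, entry):
    """Map a preferred or non-preferred label to the preferred term, else None."""
    labels = [entry["preferred"], *entry["non_preferred"]]
    return entry["preferred"] if query.lower() in labels else None

lay_entry = build_entry(CONCEPT, "lay")
research_entry = build_entry(CONCEPT, "researcher")

print(resolve("Acetylsalicylic acid", lay_entry))
print(resolve("aspirin", research_entry))
```

On the lay site a search for the scientific name resolves to "aspirin"; on the researcher's portal a search for "aspirin" resolves to "acetylsalicylic acid".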

Subject matter experts (SMEs) can be used to evaluate the efficacy and completeness of your taxonomy. Our experience suggests SMEs be brought in when you think you’re done. They are great at pointing out that you’re not done! Bringing them in too early is a waste of their valuable time. SMEs are expensive, so you need to use their time sparingly. Your time is expensive, too, so you don’t want them slowing you down. SMEs in these circumstances are best used in a reactive mode. They will spot missing concepts and help with overall organization. They are quick to suggest the newest terminology. Bring them in too early and they will debate terms and placements until it makes Michelangelo’s Sistine Chapel fresco seem swift.

Reviews by SMEs, target audience members, and sample indexing efforts all contribute to the go/no-go decision. At some point, you must go live. The initial feedback will be heavy, but as adjustments are made, things smooth out and calm down. A good practice is to set up a feedback loop to make it easy for users to comment, complain, and compliment. Set up an email address for the taxonomy team rather than for an individual member of the team. Place a reply, or comment, button at prominent places around the taxonomy to make it easy to send messages to the taxonomy team.

We have learned these lessons from 33 years of hard-earned experience. It would not be possible to test a taxonomy by indexing sample content if we did not have our Data Harmony® indexing solution to automate the process. Automation makes testing the thesaurus through actual use economically feasible. A thesaurus should be continually renewed, and feedback is an important part of that process. Applying that lesson as well, we have added to Data Harmony Thesaurus Master®, our tool for managing complex controlled vocabularies such as taxonomies and thesauri, the ability to suggest candidate terms and to comment at the term level. Through the use of various levels of permission, these collaboration features can be made available to interested users without compromising the integrity of the thesaurus. Members of the target audience can provide direct feedback without having the ability to actually make changes to the taxonomy.
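The idea of permission levels can be illustrated generically. The sketch below is not Data Harmony's API or its permission model; the roles ("reader", "editor"), the `PERMISSIONS` table, and the `Vocabulary` class are invented for the example. It shows readers suggesting candidate terms and commenting, while only editors can change the vocabulary itself.

```python
from dataclasses import dataclass, field

# Hypothetical roles and actions; not Data Harmony's actual permission model.
PERMISSIONS = {
    "reader": {"comment", "suggest"},
    "editor": {"comment", "suggest", "edit"},
}

@dataclass
class Vocabulary:
    terms: set = field(default_factory=set)
    candidates: list = field(default_factory=list)
    comments: dict = field(default_factory=dict)

    def _check(self, role, action):
        """Raise PermissionError unless the role is allowed to take the action."""
        if action not in PERMISSIONS.get(role, set()):
            raise PermissionError(f"{role} may not {action}")

    def suggest(self, role, term):
        """Queue a candidate term; the vocabulary itself is unchanged."""
        self._check(role, "suggest")
        self.candidates.append(term)

    def comment(self, role, term, text):
        """Attach a comment at the term level."""
        self._check(role, "comment")
        self.comments.setdefault(term, []).append(text)

    def approve(self, role, term):
        """Promote a candidate into the vocabulary; editors only."""
        self._check(role, "edit")
        self.candidates.remove(term)
        self.terms.add(term)

vocab = Vocabulary(terms={"aspirin"})
vocab.suggest("reader", "ibuprofen")
vocab.comment("reader", "aspirin", "Consider adding the brand names as synonyms.")

try:
    vocab.approve("reader", "ibuprofen")   # readers cannot change the taxonomy
    reader_blocked = False
except PermissionError:
    reader_blocked = True

vocab.approve("editor", "ibuprofen")
print(sorted(vocab.terms), vocab.candidates, reader_blocked)
```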

While your thesaurus is never done, and never should be, there comes a time when you can, and indeed must, start using it. Let your audience see it and then embrace the feedback. It will only get better. Michelangelo finally had the scaffolding removed, and people marveled then and still do today. (A tourist tip:  When visiting the Sistine Chapel you can save a lot of time by going in the back door!)

Jay Ven Eman, CEO
Access Innovations