The workings of a taxonomy or thesaurus in a database or website can seem mysterious. Let’s take a look behind the scenes.

First of all, we need the taxonomy or thesaurus in digital form, either as a separate file or as it exists in a specialized software application. The screenshot below is from the editorial user interface of a thesaurus software application.

The left panel shows a taxonomy view, the hierarchical view. The right panel shows a term record. We have the broader term, the narrower terms, status, related terms, and then other details: synonyms, history, scope notes, and so forth. That is a lot of information stored as a single object. The object here is “Heating, cooling, and ventilation”. In this case the object is the term plus everything that centers on it.
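To make the idea of a term record as an object a little more concrete, here is a minimal sketch in Python. The field names follow the parts of the term record listed above; the class name, the default status value, and the sample broader, narrower, and related terms are hypothetical and are not taken from the software in the screenshot.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TermRecord:
    """One thesaurus term and everything attached to it (illustrative only)."""
    label: str
    broader: Optional[str] = None                       # broader term (BT)
    narrower: List[str] = field(default_factory=list)   # narrower terms (NT)
    related: List[str] = field(default_factory=list)    # related terms (RT)
    synonyms: List[str] = field(default_factory=list)   # non-preferred terms
    status: str = "accepted"                            # assumed default value
    scope_note: str = ""
    history: List[str] = field(default_factory=list)


# Hypothetical sample values; only the label comes from the example above.
hvac = TermRecord(
    label="Heating, cooling, and ventilation",
    broader="Building systems",
    narrower=["Air conditioning", "Furnaces"],
    related=["Insulation"],
    synonyms=["HVAC"],
)
```

The point of the sketch is only that the term, its relationships, and its notes travel together as one unit.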

Now let’s look at how the thesaurus terms get connected with a website. (We’ve seen some of this before, in connection with the discussion of metadata.) Go to a site, open the browser’s View menu, choose the page source option, and you’ll see the source of the page. It will look something like this.

Here you can see the meta name fields for the page. Not all sites have them, but many do.
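As a rough illustration of how those meta name fields can be read by a program, the sketch below uses Python's standard `html.parser` module. The sample page, the `keywords` field holding the taxonomy terms, and the semicolon separator are all assumptions made for the example (a semicolon is used because the term itself contains commas); real sites vary.

```python
from html.parser import HTMLParser


class MetaNameParser(HTMLParser):
    """Collects <meta name="..." content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = attr_map.get("name")
        content = attr_map.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content


# Hypothetical page source, standing in for what "view source" would show.
SAMPLE_PAGE = """<html><head>
<title>Sample page</title>
<meta name="keywords" content="Heating, cooling, and ventilation; Insulation">
<meta name="description" content="A hypothetical page about building systems.">
</head><body></body></html>"""


def read_meta_names(html_text):
    """Return a dict of meta name -> content for the given HTML text."""
    parser = MetaNameParser()
    parser.feed(html_text)
    return parser.meta


page_meta = read_meta_names(SAMPLE_PAGE)
# Split the keywords field into individual taxonomy terms.
keywords = [t.strip() for t in page_meta.get("keywords", "").split(";") if t.strip()]
```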

If you were to do this in a relational database, you would put your taxonomy terms in a table somewhere. You would need to be sure that they are related to the primary key or main records, so that you have them linked to the records.
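One common way to set this up is a table of records, a table of terms, and a linking table between them. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the table names, column names, and sample rows are hypothetical, and a production schema would likely carry more fields.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE records (
    record_id INTEGER PRIMARY KEY,
    title     TEXT NOT NULL
);
CREATE TABLE terms (
    term_id INTEGER PRIMARY KEY,
    label   TEXT NOT NULL UNIQUE
);
-- Linking table: ties each taxonomy term to the primary key of a record.
CREATE TABLE record_terms (
    record_id INTEGER NOT NULL REFERENCES records(record_id),
    term_id   INTEGER NOT NULL REFERENCES terms(term_id),
    PRIMARY KEY (record_id, term_id)
);
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO records VALUES (?, ?)",
                 [(1, "Choosing a furnace"), (2, "Attic insulation basics")])
conn.executemany("INSERT INTO terms VALUES (?, ?)",
                 [(10, "Heating, cooling, and ventilation"), (11, "Insulation")])
conn.executemany("INSERT INTO record_terms VALUES (?, ?)",
                 [(1, 10), (2, 10), (2, 11)])


def titles_for_term(label):
    """Return the titles of all records tagged with the given term label."""
    rows = conn.execute(
        """
        SELECT r.title
        FROM records r
        JOIN record_terms rt ON rt.record_id = r.record_id
        JOIN terms t ON t.term_id = rt.term_id
        WHERE t.label = ?
        ORDER BY r.record_id
        """,
        (label,),
    ).fetchall()
    return [title for (title,) in rows]
```

Keeping the terms in their own table means a term label is stored once and can be attached to any number of records.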

It doesn’t matter if you are in an object system or if you’re in a relational database management system (RDBMS). You want to have a place to put those terms. Be sure that the IT people give you a place to put the terms.

In object-oriented code, it would be a very similar kind of model. You want to be sure that the data transfers over.

You want to define the terms and their connections in the relational database. In the various relational database models, you have a lot of options as to how to do those things.

You might have an XML-based database system, in which case you can put in new text and have a way to suggest the terms automatically and add them to the system.

If you look at this site, you see that the hierarchical list on the page comes directly from the hierarchy of the taxonomy.

You might see that the narrower terms in the term record become the narrower terms in the search interface, and that the related terms from the term record are also posted in the search interface. You can make a fairly direct connection between the two.

You want to integrate that taxonomy so that you can enhance the findability of the terms. You want to use them as labels in search and also use them in tagging the records behind the scenes.

If you attach the taxonomy terms to the record, load them into the search system, and then use a variation of that same taxonomy on top of the search system, you are using the taxonomy to search and you are using the taxonomy to tag. Then when you do the search, you get better results.

It could be in a relational database management system such as MySQL, or you could be using search software such as Lucene or Autonomy, or you could use Google. Either way, you attach the taxonomy terms to the record. Then you put the taxonomy terms in the inverted file for search. Then when you choose a taxonomy term on the user interface, it goes to that inverted index and pulls back the appropriate records.
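The inverted file idea can be sketched in a few lines of Python. This is a toy, in-memory illustration and not how MySQL, Lucene, Autonomy, or Google store their indexes internally; the record IDs, titles, and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical records, each already tagged with taxonomy terms.
records = {
    1: {"title": "Choosing a furnace",
        "terms": ["Heating, cooling, and ventilation"]},
    2: {"title": "Attic insulation basics",
        "terms": ["Insulation", "Heating, cooling, and ventilation"]},
    3: {"title": "Window glazing options",
        "terms": ["Insulation"]},
}


def build_inverted_index(tagged_records):
    """Map each taxonomy term to the set of record IDs tagged with it."""
    index = defaultdict(set)
    for record_id, record in tagged_records.items():
        for term in record["terms"]:
            index[term].add(record_id)
    return index


def search_by_term(index, term):
    """Return the sorted record IDs posted under a taxonomy term."""
    return sorted(index.get(term, set()))


index = build_inverted_index(records)
hits = search_by_term(index, "Insulation")
titles = [records[record_id]["title"] for record_id in hits]
```

Choosing a term in the user interface then amounts to one lookup in the index, with no need to scan every record.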

Here’s a workflow diagram that might help to clarify things.

You might have a lot of raw data that you put into a data repository. You’re adding the taxonomy terms to the records in that repository. Then that repository could be stored as an SQL file for e-commerce. It could be stored in a repository. It could be stored in a search system, and you might or might not use a presentation layer on top of that search. So, from the repository where you have added the terms to the records, you can send the records out to all of these different places. You don’t have to, but you could.

So, as you can see, you could use the same set of taxonomy terms in many places on your website. There are lots of things you can do with taxonomies.

Marjorie M.K. Hlava, President, Access Innovations


Note: The above posting is one of a series based on a presentation, The Theory of Knowledge, given at the Data Harmony Users Group meeting in February of 2011. The presentation covered the theory of knowledge as it relates to search and taxonomies.