As we continue the series on search, we are close to wrapping up with a more in-depth look behind the scenes of database management systems.

We want to connect the database management system to the thesaurus tool so that we can validate the terms and make sure that they are in good shape. As people add records to the database, we also want to capture any suggested or candidate terms they come up with.

The thesaurus tool will tell you which terms are actually correct and allow you to add, change, delete, and otherwise manage the term base. The indexing tool then suggests indexing terms for records as they are loaded into the database management system. That system can be SharePoint, a content management system, Documentum, FileNet, or any other repository you want to use to manage your data. All of this is driven by the taxonomy.
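The validation step described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `validate_terms`, the sample terms, and the idea of returning unrecognized terms as candidates for editorial review are all hypothetical, not the behavior of any particular thesaurus tool.

```python
def validate_terms(submitted, approved):
    """Split submitted terms into approved terms and candidate terms.

    Matching ignores case and surrounding whitespace (an assumption;
    a real thesaurus tool may apply stricter or looser rules).
    """
    approved_by_lower = {term.lower(): term for term in approved}
    valid, candidates = [], []
    for term in submitted:
        cleaned = term.strip()
        preferred = approved_by_lower.get(cleaned.lower())
        if preferred is not None:
            valid.append(preferred)      # use the taxonomy's own spelling
        else:
            candidates.append(cleaned)   # hold for editorial review
    return valid, candidates


# Hypothetical term base and a record being loaded
approved_terms = {"Heart diseases", "Nutrition"}
valid, candidates = validate_terms(["heart diseases", "Telehealth "], approved_terms)
print(valid)
print(candidates)
```

Candidate terms are kept separate rather than discarded, so that a taxonomy editor can decide later whether to add them to the term base.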

Here’s a taxonomy view. We have the hierarchical view on one side and the individual term records on the other. We maintain them as term records because that way the related terms, the narrower terms, and the broader terms are all attached to the term itself, so you get the record as an object that you can keep track of.
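To make the idea of a term record as an object concrete, here is a minimal Python sketch. The class name `TermRecord`, its field names, and the sample terms are assumptions chosen for illustration; they do not reflect the data model of any specific thesaurus software.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TermRecord:
    """One taxonomy term with its relationships (hypothetical fields)."""
    term: str
    broader: List[str] = field(default_factory=list)
    narrower: List[str] = field(default_factory=list)
    related: List[str] = field(default_factory=list)


def hierarchy_lines(records: Dict[str, TermRecord], root: str, depth: int = 0) -> List[str]:
    """Build the indented hierarchical view by following narrower terms."""
    lines = ["  " * depth + root]
    for child in records[root].narrower:
        lines.extend(hierarchy_lines(records, child, depth + 1))
    return lines


# Hypothetical sample data
records = {
    "Diseases": TermRecord("Diseases", narrower=["Heart diseases"]),
    "Heart diseases": TermRecord(
        "Heart diseases",
        broader=["Diseases"],
        narrower=["Arrhythmia"],
        related=["Cardiology"],
    ),
    "Arrhythmia": TermRecord("Arrhythmia", broader=["Heart diseases"]),
}

print("\n".join(hierarchy_lines(records, "Diseases")))
```

The hierarchical view is derived from the records rather than stored separately, which is one way to keep the two views in step.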

Where does this subject metadata, or the set of taxonomy terms, go? We are going to apply those terms to the content itself. We can do that in the meta name keywords field in the head of the HTML page, or, if it is a relational database system, we can connect the terms to a keywords field in the SQL database or other database tables.

If you go to View, Source on a webpage, it will typically start with a declaration of which DTD for HTML the page is using. This one is HTML 4 Frameset in English, and if your browser doesn’t have it, the declaration says where it can go get it. What is important here, besides the DTD we’re using, is the meta name keywords field. This is how metadata as a term became popular in our business. What you are hoping is that, internally or externally, people are filling in the content of the meta name keywords field. Ideally, on your website these keywords should come from your taxonomy, so that the field is populated and tied to the taxonomy; this gives you much more precise search.
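As an illustration of populating that field from the taxonomy, the following Python sketch builds a meta keywords tag from a list of terms. The function name `keywords_meta_tag` and the sample terms are hypothetical; the only library call used is `html.escape` from the Python standard library.

```python
from html import escape


def keywords_meta_tag(terms):
    """Return an HTML meta keywords tag for the given taxonomy terms.

    Terms are joined with commas and escaped so that characters such as
    ampersands and quotation marks cannot break the attribute value.
    """
    content = ", ".join(terms)
    return '<meta name="keywords" content="{}">'.format(escape(content, quote=True))


# Hypothetical taxonomy terms assigned to a page
print(keywords_meta_tag(["Heart diseases", "Research & development"]))
```

Generating the tag from the term base, rather than typing keywords by hand, is one way to keep the page metadata tied to the taxonomy.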

If you are working in a relational database management system, you have a lot of tables holding a lot of different kinds of information. Here it is a health database. The taxonomy terms are linked to a couple of places, and you can see all of the fields that they can feed. The key is a primary key, as opposed to a secondary key, and it is called ‘Category’. Here we have Category ID. That is the taxonomy field in this particular database, and that is where we are going to put the taxonomy terms. In this case, because it is an ID, I have to go get the terms themselves. Here are all the fields of that taxonomy record.
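Because the content record stores only an ID, retrieving the term means a lookup against the taxonomy table. The sketch below shows that lookup with Python's built-in `sqlite3` module. The table and column names (`category`, `record`, `category_id`, `term`) and the sample rows are assumptions for illustration, not the schema of the health database described here.

```python
import sqlite3

# In-memory database with a hypothetical two-table layout
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (
    category_id INTEGER PRIMARY KEY,   -- the taxonomy term's ID
    term        TEXT NOT NULL
);
CREATE TABLE record (
    record_id   INTEGER PRIMARY KEY,
    title       TEXT NOT NULL,
    category_id INTEGER REFERENCES category(category_id)
);
INSERT INTO category VALUES (1, 'Heart diseases');
INSERT INTO category VALUES (2, 'Nutrition');
INSERT INTO record VALUES (10, 'Managing arrhythmia', 1);
INSERT INTO record VALUES (11, 'Dietary fiber', 2);
""")

# The record holds only the ID, so join to fetch the term itself
rows = conn.execute("""
    SELECT r.title, c.term
    FROM record AS r
    JOIN category AS c ON c.category_id = r.category_id
    ORDER BY r.record_id
""").fetchall()

for title, term in rows:
    print(title, "->", term)
```

Storing the ID instead of the term text means a term can be renamed in the taxonomy table without touching the content records.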

Another way to do it is with an XML database. The record shown below is from the National Information Center for Educational Media. I might have an abstract and a title; based on those, I am going to get some suggested terms automatically. The system tells me how many times each term is suggested and puts the terms into the record. Then I go back to my view of the taxonomy, and here is the hierarchical view, here is the narrower term view, and here is the related term view for the record. That is how the term came up: it came directly from the taxonomy itself.
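A very simplified version of that suggestion step can be sketched in Python. This sketch just counts whole-word, case-insensitive occurrences of each taxonomy term in the title and abstract; that matching rule is an assumption made for illustration, and real indexing tools generally use richer rules. The function name `suggest_terms` and the sample record are hypothetical.

```python
import re
from collections import Counter


def suggest_terms(text, taxonomy_terms):
    """Return (term, hit count) pairs for taxonomy terms found in the text.

    Results are ordered from most to least frequently suggested.
    """
    lowered = text.lower()
    counts = Counter()
    for term in taxonomy_terms:
        pattern = r"\b" + re.escape(term.lower()) + r"\b"
        hits = len(re.findall(pattern, lowered))
        if hits:
            counts[term] = hits
    return counts.most_common()


# Hypothetical record and taxonomy terms
title = "Volcanoes and earthquakes"
abstract = (
    "Explains how volcanoes form and why earthquakes occur near plate "
    "boundaries. Volcanoes are shown erupting."
)
suggestions = suggest_terms(title + " " + abstract, ["Volcanoes", "Earthquakes", "Glaciers"])
print(suggestions)
```

Only terms that already exist in the taxonomy can be suggested, which is what keeps the indexing consistent with the term base.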

If I am going to integrate this into enhanced findability, I am going to try to create some faceted navigation or browsable views. Maybe I’ll have smart search to search for term equivalents or synonyms. I might have taxonomy terms, original or modified, that I’ll use as labels in the search. And I’ll have navigational aids to help the user with those relationships. I might also want to give searchers spelling alternatives and correct the spelling when it is wrong, with messages such as “Did you mean…” or “I’m searching for this instead …”. I might want to give some related concepts, or some statistical information about that metadata, such as how many records are indexed with a particular term. I might provide some navigation or hierarchical trees and drill-down. I might want to offer some recursive steps so that you can search within that search. I might want to give you some concept linking or even a dictionary look-up with a glossary of the terms, so that you can see online what those terms mean.
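Three of these features (synonym search, a “Did you mean…” prompt, and record counts per term) can be sketched together in Python. Everything named here is hypothetical: the sample terms, the synonym table, the indexed records, and the 0.8 similarity cutoff are assumptions for illustration. The spelling suggestion uses `difflib.get_close_matches` from the standard library, which is a general string-similarity helper and not a search engine feature.

```python
import difflib
from collections import Counter

# Hypothetical taxonomy data
PREFERRED = ["Myocardial infarction", "Arrhythmia", "Nutrition"]
SYNONYMS = {"heart attack": "Myocardial infarction"}
INDEXED = {
    10: ["Arrhythmia"],
    11: ["Nutrition"],
    12: ["Arrhythmia", "Myocardial infarction"],
}


def resolve(query):
    """Map a query to a preferred term, plus an optional message."""
    q = query.strip().lower()
    by_lower = {term.lower(): term for term in PREFERRED}
    if q in by_lower:                      # exact preferred term
        return by_lower[q], None
    if q in SYNONYMS:                      # known synonym
        return SYNONYMS[q], None
    # Otherwise look for a close spelling among terms and synonyms
    known = list(by_lower) + list(SYNONYMS)
    close = difflib.get_close_matches(q, known, n=1, cutoff=0.8)
    if close:
        term = by_lower.get(close[0]) or SYNONYMS[close[0]]
        return term, "Did you mean '{}'?".format(term)
    return None, None


def term_counts():
    """Count how many records are indexed with each term."""
    return Counter(term for terms in INDEXED.values() for term in terms)


print(resolve("heart attack"))
print(resolve("arrythmia"))
print(term_counts()["Arrhythmia"])
```

Resolving every query to a preferred term first means the counts and any later navigation all work from the same controlled vocabulary.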

Next week we will finish up this series by looking at taxonomies in SharePoint.

Marjorie M.K. Hlava
President, Access Innovations