Taxonomies, Databases, and the Internet

The Internet has given us a great many website builders and contributors, and they all have essentially equal opportunity to participate. Largely because of this openness, the Internet is growing rapidly. It has grown too big for people to find what they want reliably, so it needs us. It needs knowledge management.

When you implement a taxonomy or database on a website, one key consideration is the design of the user interface. An interface can present a view of the underlying structure, and it can change that view for different user groups. It is also limited by the size of the screen, whether CRT, LED, or some other type, so for large audiences it needs to be fairly simple. Unless an interface is tailored to a fairly specialized group, it is generally watered down quite a lot. You want to provide as sophisticated a back end as you can, so that you can keep the front end very simple.

There are many different kinds of databases, as shown above, and they operate differently. Numeric files are generally not transactional databases, and they are generally not something we would apply a taxonomy to. Textual databases, whether full text or bibliographic, are a different matter. They are field-formatted, or structured, databases, and that structure is what enables us to apply a taxonomy to them.

Object-oriented databases are perfect for taxonomies, because a taxonomy is itself essentially an object-oriented system. The term is the central object, and the sort key is the primary, or preferred, term. So object-oriented databases and taxonomies have a great deal in common.
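To make the object view concrete, here is a minimal sketch, assuming a simple in-memory model. The `Term` and `Taxonomy` classes and their field names (`non_preferred`, `broader`, `narrower`, `related`) are hypothetical illustrations, not the schema of any particular product. Each term is one object, and its preferred term serves as both the lookup key and the sort key.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One taxonomy concept (hypothetical model); the preferred term is its key."""
    preferred: str
    non_preferred: set[str] = field(default_factory=set)  # synonyms and variants
    broader: set[str] = field(default_factory=set)        # preferred terms of parent concepts
    narrower: set[str] = field(default_factory=set)       # preferred terms of child concepts
    related: set[str] = field(default_factory=set)        # associative (related-term) links


class Taxonomy:
    """Stores Term objects, looked up and sorted by preferred term."""

    def __init__(self) -> None:
        self._terms: dict[str, Term] = {}

    def add(self, term: Term) -> None:
        # The preferred term is the key, so each concept is stored exactly once.
        self._terms[term.preferred] = term

    def get(self, preferred: str) -> Term:
        return self._terms[preferred]

    def sorted_terms(self) -> list[str]:
        # The preferred term doubles as the sort key.
        return sorted(self._terms)


if __name__ == "__main__":
    tax = Taxonomy()
    tax.add(Term("Knowledge management", narrower={"Information retrieval"}))
    tax.add(Term("Information retrieval", non_preferred={"Search", "IR"},
                 broader={"Knowledge management"}))
    print(tax.sorted_terms())  # ['Information retrieval', 'Knowledge management']
```

Keeping the relationships as sets of preferred terms means every link points back to a keyed object, which mirrors how an object-oriented database would reference one object from another.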

Nowadays we have lots of multimedia, and people want to index all of their videos and images, which often have no captions. That work is not easily done by a text-oriented taxonomist; it requires some additional techniques. For that, you can talk with multimedia database specialists.

There are millions of websites with rich content covering all kinds of subjects. We want to bring as much of that under control for our users as we can. We do that through vocabulary management, along with indexing individual documents. (We also use other methods suited to particular situations.)

The purpose of a taxonomy is to translate natural-language terms into a common set of terms that can be used across an entire knowledge domain. We want this done consistently, so that the same indexing terms are applied under the same conditions every time. If we can eliminate what we call editorial drift (the natural variation in how indexers apply terms), people will get much better information retrieval. In addition, if we can indicate the relationships between pieces of information, including semantic or word relationships, we can infer conceptual similarities and help people get where they want to go. The centerpiece of this process is retrieving information using the taxonomy as a search aid.
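The translation step can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical synonym table (`SYNONYMS`) that maps each natural-language variant to exactly one preferred term, and a hypothetical related-term table (`RELATED`). It uses plain whole-word string matching, which is far simpler than a production indexing system and is not presented as any vendor's method. The point it shows is consistency: different wording yields the same indexing terms.

```python
import re

# Hypothetical synonym ring: every variant maps to exactly one preferred term.
SYNONYMS = {
    "ir": "Information retrieval",
    "search": "Information retrieval",
    "information retrieval": "Information retrieval",
    "km": "Knowledge management",
    "knowledge management": "Knowledge management",
    "taxonomy": "Taxonomies",
    "taxonomies": "Taxonomies",
    "controlled vocabulary": "Taxonomies",
}

# Hypothetical related-term links between preferred terms.
RELATED = {
    "Taxonomies": {"Information retrieval"},
    "Information retrieval": {"Taxonomies"},
    "Knowledge management": set(),
}


def index_terms(text: str) -> list[str]:
    """Return the sorted preferred terms whose variants appear as whole words in text."""
    # Lowercase and collapse punctuation to spaces, padded so whole-word checks work.
    normalized = " " + re.sub(r"[^a-z0-9]+", " ", text.lower()) + " "
    found = {pref for variant, pref in SYNONYMS.items() if f" {variant} " in normalized}
    return sorted(found)


def expand(terms: list[str]) -> list[str]:
    """Add related preferred terms so a search can reach conceptually similar material."""
    expanded = set(terms)
    for term in terms:
        expanded |= RELATED.get(term, set())
    return sorted(expanded)


if __name__ == "__main__":
    print(index_terms("Using a controlled vocabulary to improve search"))
    print(index_terms("IR and taxonomies"))
    print(expand(["Taxonomies"]))
```

Both sample sentences are indexed with `['Information retrieval', 'Taxonomies']` even though they share no wording, which is the kind of consistency that counters editorial drift; `expand` then uses the related-term links to suggest conceptually similar material at search time.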

Next time, we’ll consider several answers to the question, “Why do we index?”

Marjorie M.K. Hlava, President, Access Innovations

Note: The above posting is one of a series based on a presentation, The Theory of Knowledge, given at the Data Harmony Users Group meeting in February of 2011. The presentation covered the theory of knowledge as it relates to search and taxonomies.
