A taxonomy provides a way to describe content. It is a source of conceptual descriptions for filling the subject metadata field in database records. When you look at a metadata schema, regardless of who built it, there is some field or element specified for the description of the concept represented in the item. It might be called the subject, descriptor, keyword, or something else. Dublin Core is one of the well-known metadata schemas. (To my mind it is not fully a standard; even though you will find my name on the standard for Z39.84, I still think it is just a guideline.)

Metadata is data about data. When you apply the taxonomy, you are using metadata to describe an object, usually content, but it could be any object, digital or otherwise. It is not the information itself; it is only a description. Metadata could be, for example, all those bibliographic citation pieces: the pieces that we record when we are making a catalog card, a bibliography, or a reference list. It is the information about something. It might not be just about an article. It might be about a memo or an email, or it may tie into the content retention schedule for records management. It might describe something in a museum collection or something on the shelf in inventory at a lumberyard. In the grocery store, you might look at the signs over the aisles that say ‘Dairy’. That’s metadata for the grocery store, and it helps you locate what you need.
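To make the idea concrete, here is a minimal sketch of a record whose subject field is filled from a taxonomy. The field names loosely echo Dublin Core elements (title, creator, subject), but the Record class, the TAXONOMY set, and the sample terms are all hypothetical, invented only for illustration.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical controlled vocabulary; a real taxonomy would be far larger.
TAXONOMY = {"Firefighting", "Fire hoses", "Fire prevention"}


@dataclass
class Record:
    """Metadata describing an item; it is not the item itself."""
    title: str
    creator: str
    subject: List[str] = field(default_factory=list)

    def add_subject(self, term: str) -> None:
        # Only terms that exist in the taxonomy may fill the subject field.
        if term not in TAXONOMY:
            raise ValueError(f"{term!r} is not a taxonomy term")
        if term not in self.subject:
            self.subject.append(term)


# Hypothetical item: a memo described by its metadata.
memo = Record(title="Hose inspection schedule", creator="Station 4")
memo.add_subject("Fire hoses")
print(memo.subject)
```

The record holds only the description. The memo itself lives elsewhere, just as the ‘Dairy’ sign is not the milk.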

If you are going to build a taxonomy, you need to define the subject of the field or discipline or domain. At the beginning, you may find that you get rather general concepts as subject terms or topics, some of which may be peripheral to the database or collection you are describing. You want to decide on the core. What is the knowledge domain that you want to describe within this particular taxonomy? You need to determine what is core and what is marginal. So, you define the core and then see how much of those peripheral fields you need to include. Sometimes it is pretty difficult to make sure that you keep to your core. You need to keep asking, “What is it exactly that I need to cover?” You go into those peripheral subjects only as deeply as you need to in order to support the core.

If you are talking about fire, you might talk about the techniques of firefighting, but you won’t need to talk about all the techniques of fighting crime. You only need to talk about the techniques that are involved in firefighting. You talk about the hoses for carrying water. You don’t need to talk about all of the substances that have ever been used to make a channel for the movement of water; you only need to talk about moving water for this specific purpose. You may need to bring your taxonomy reviewers and subject matter experts back to focusing on the core, especially when they are experts in one particular area.

Along with a focus on the core, you need to consider the level of specificity needed. You might have geography as an area that affects your field. However, that doesn’t mean that you need to go down to every street in every municipality in the world to cover geography. Maybe you just need to get to the continent level or the country level; maybe you only need the countries in a particular continent.
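As a rough illustration of choosing a level of specificity, the sketch below trims a geographic term to the depth you have decided to support. The level names, the generalize function, and the sample place are assumptions made up for this example, not part of any particular taxonomy tool.

```python
# Hypothetical hierarchy levels, ordered from broadest to narrowest.
LEVELS = ["continent", "country", "city", "street"]


def generalize(path, level):
    """Trim a broad-to-narrow geographic path so it stops at the chosen level."""
    depth = LEVELS.index(level) + 1
    return path[:depth]


# Hypothetical place, written as a path from continent down to street.
place = ["North America", "United States", "Albuquerque", "Central Avenue"]

print(generalize(place, "country"))
print(generalize(place, "continent"))
```

If the taxonomy only needs countries, every narrower detail is simply dropped at indexing time rather than modeled as a term.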

Similarly, for document types, you may need only to say that an item is a memo, or an email, or a white paper, or a journal article. Or you might need to go to the level that retention management people do, to identify who wrote it, what the subject is, what other subjects it affects, and how long it is going to be kept. It’s a matter of detail: how much detail do you really need or want? If you are building a journal article database, then you probably don’t need document type metadata, because every item is a journal article.
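The two levels of detail for document types could be sketched as one record shape in which the retention-level fields are optional. The DocumentMetadata class, its field names, and the sample values are hypothetical and chosen only to mirror the items listed above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DocumentMetadata:
    # Minimal level: just the document type, e.g. "memo" or "white paper".
    doc_type: str
    # Retention-management level: filled in only when that detail is needed.
    author: Optional[str] = None
    subject: Optional[str] = None
    related_subjects: Optional[List[str]] = None
    retention_years: Optional[int] = None


# Minimal description of an item.
simple = DocumentMetadata(doc_type="memo")

# The same kind of item described at the retention-management level.
detailed = DocumentMetadata(
    doc_type="memo",
    author="J. Smith",
    subject="Hose inspection",
    related_subjects=["Equipment budgets"],
    retention_years=7,
)

print(simple)
print(detailed)
```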

Next time, we’ll go into more detail on how to create a taxonomy.

Marjorie M.K. Hlava
President, Access Innovations


Note: The above posting is one of a series based on a presentation, The Theory of Knowledge, given at the Data Harmony Users Group meeting in February of 2011. The presentation covered the theory of knowledge as it relates to search and taxonomies.