Using a ‘Collabulary’ to Create a Taxonomy

September 13, 2010 – Libraries and librarians have been the gatekeepers to knowledge stores for more than 200 years. As their collections grew, they devised ways to find the information and knowledge they stored, first creating classification systems and then subject headings to identify the concepts or topics represented in the items in their collections. Every major language now has at least one classification system, and most countries have created or adopted classification and subject access systems, such as the Universal Decimal Classification (UDC) or Lenin’s outline of knowledge for Russia. In the United States, the Dewey Decimal Classification system, Sears Subject Headings, and the Library of Congress Classification system are widely used.

With the ubiquity of computers and the easy access to information afforded by the Internet, gatekeeping focused on individual items has diminished, but the need to find information that satisfies a person’s search query has become more important than ever. The ocean of information is deep and rich, but very difficult to navigate. Google has created a false sense of security in the typical Internet user’s mind. “I can just Google it,” people think, and they do, only to get millions of possible results ranked by topic relevance or by the search engine’s confidence that a given result will satisfy the request.

More and more staff time is spent sifting through results like these. On average, office workers now spend 35 percent of their time (nearly three hours of an eight-hour workday) using their computers to look for information. This is incredibly expensive.

To compound the information management problem, many organizations continually lose bits and pieces of their knowledge base because workers take valuable information with them (in their heads) when they retire or leave for another job. Baby boomers worldwide are beginning to retire in droves, and organizations of all types are downsizing in response to economic pressures; together, these factors are leading to a near crisis in information availability and access. Knowledge management (capturing information before workers leave and documenting what was learned about a process or project before starting the next one) is a partial response to this problem. But these tasks are very hard to do, not to mention boring. When you complete a project, you want to move on to something new, not document what you’ve already done.

Serving as a Backbone

What if people could document what they learn as they go? What if the information and perhaps even the knowledge they are acquiring could be captured as it is being learned and applied? What if a group could use a social networking application to share and capture information and knowledge created throughout the life of a project?

I believe that a well-formed taxonomy, serving as the backbone of, say, a knowledge management, information management, or SharePoint repository, can enable this capture of knowledge. If such a taxonomy were implemented (and that is a big if), it could help the remaining staff members in downsized organizations be more productive during their working hours.

A taxonomy is an outline of knowledge. It can serve as the site map for a Web site or as a view of the contents of a body of work. It is a set of nested terms with parent-child (broader-narrower) relationships; in other words, it is the hierarchical view of a thesaurus.
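To make “a set of nested terms” concrete, here is a minimal Python sketch of a taxonomy as a tree in which each term holds its narrower terms. The class and method names are hypothetical and the terms are purely illustrative; the point is that the same structure yields both an outline view (like a site map) and the chain of broader terms above any given concept.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Term:
    """One concept in the taxonomy; `narrower` holds its child terms."""
    label: str
    narrower: list[Term] = field(default_factory=list)

    def add(self, label: str) -> Term:
        """Create a narrower (child) term under this one and return it."""
        child = Term(label)
        self.narrower.append(child)
        return child

    def path_to(self, label: str) -> list[str] | None:
        """Return the chain of broader terms from this node down to `label`."""
        if self.label == label:
            return [self.label]
        for child in self.narrower:
            sub = child.path_to(label)
            if sub is not None:
                return [self.label] + sub
        return None

    def outline(self, depth: int = 0) -> list[str]:
        """Indented outline view, two spaces per level, like a site map."""
        lines = ["  " * depth + self.label]
        for child in self.narrower:
            lines.extend(child.outline(depth + 1))
        return lines


# Illustrative terms only.
root = Term("Information science")
org = root.add("Knowledge organization")
org.add("Taxonomies")
org.add("Thesauri")
root.add("Information retrieval")

print("\n".join(root.outline()))
print(root.path_to("Thesauri"))
```

A tree like this is deliberately simpler than a full thesaurus: it captures only the broader-narrower view, with no synonyms or related-term links.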

Until recently, taxonomies have not been used much in relation to “tagging” social media, but this is changing rapidly in response to developments in metadata and search. Historically, taxonomies have been used to tag articles and other textual content as part of a controlled vocabulary. Taxonomies indicate the concepts in a document and, as part of a thesaurus or controlled vocabulary, form the basis for database keywords and controlled descriptors.
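As an illustration of how a controlled vocabulary disciplines tagging, here is a minimal Python sketch, assuming a tiny hypothetical vocabulary in which each preferred descriptor lists its non-preferred (“use for”) variants. Free-form tags are mapped to the preferred descriptor; tags that match nothing are set aside as candidates for review rather than silently added.

```python
# Hypothetical controlled vocabulary: each preferred descriptor lists the
# non-preferred ("use for") variants that should be mapped to it.
VOCABULARY = {
    "Taxonomies": {"taxonomy", "classification schemes", "category trees"},
    "Knowledge management": {"km", "knowledge capture"},
    "Search engines": {"search engine", "web search"},
}

# Invert into a lookup table: any lowercased variant -> preferred descriptor.
LOOKUP = {}
for preferred, variants in VOCABULARY.items():
    LOOKUP[preferred.lower()] = preferred
    for variant in variants:
        LOOKUP[variant.lower()] = preferred


def normalize_tags(raw_tags):
    """Map free-form tags to controlled descriptors.

    Returns (descriptors, unmatched): descriptors are deduplicated and sorted;
    unmatched tags are kept in their original form for editorial review.
    """
    descriptors, unmatched = set(), []
    for tag in raw_tags:
        key = tag.strip().lower()
        if key in LOOKUP:
            descriptors.add(LOOKUP[key])
        else:
            unmatched.append(tag)
    return sorted(descriptors), unmatched


tags, leftovers = normalize_tags(["KM", "taxonomy", "Category trees", "baby boomers"])
print(tags)
print(leftovers)
```

Here “KM,” “taxonomy,” and “Category trees” collapse onto two descriptors, while “baby boomers” is left for a person to decide on.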

Taxonomies can be created by groups or by harvesting information nuggets as they are contributed. A few organizations are already leading the way in these types of implementations. They use taxonomies in publications, SharePoint repositories, author submission systems, Web sites, search queries, records, inputs to the search system, browsable search trees, displays for browsing, and other processes and applications.

Taxonomies can be used to discover trends by analyzing the terms in an information corpus over time. Instead of relying on (often irrelevant) word co-occurrences in huge batches of text, organizations can use a taxonomy to home in on the data and the concepts they represent. This provides a much deeper understanding of a data set or a streaming data feed. A large set of records can be automatically indexed using various taxonomies and then compared for specific knowledge domains. The data can be displayed graphically in many ways to show where information is concentrated and how it moves or converges over time.
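A minimal sketch of this kind of trend analysis follows. It assumes a hypothetical three-term taxonomy and uses simple whole-word phrase matching as a stand-in for a real automatic indexing system: each record is indexed with taxonomy terms, and counting those terms per year shows how attention to each concept shifts over time.

```python
import re
from collections import Counter, defaultdict

# Hypothetical taxonomy terms, each with the phrases that signal it in text.
TAXONOMY = {
    "Taxonomies": ["taxonomy", "taxonomies"],
    "Social tagging": ["tagging", "folksonomy"],
    "Search": ["search", "retrieval"],
}


def index_record(text):
    """Return the set of taxonomy terms whose phrases occur as whole words."""
    lowered = text.lower()
    found = set()
    for term, phrases in TAXONOMY.items():
        if any(re.search(r"\b" + re.escape(p) + r"\b", lowered) for p in phrases):
            found.add(term)
    return found


def term_trends(records):
    """Count, per year, how many records were indexed with each term."""
    trends = defaultdict(Counter)
    for year, text in records:
        trends[year].update(index_record(text))
    return dict(trends)


# Illustrative records: (year, title).
records = [
    (2008, "Improving search and retrieval in the intranet"),
    (2009, "A taxonomy for project records"),
    (2010, "Social tagging meets taxonomy design"),
    (2010, "Folksonomy versus controlled tagging"),
]

for year, counts in sorted(term_trends(records).items()):
    print(year, dict(sorted(counts.items())))
```

Because each record contributes at most one count per term, the figures measure how many records touch a concept, not how often a word is repeated, which is closer to the concept-level view described above.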

“Okay,” you say, “I’m convinced I should have a taxonomy, but what kind of work is involved in this task? It seems like a major mountain must be climbed to create and implement a taxonomy!”

There are two ways to get started. One is to hire a firm to do the work. For some organizations, this approach works well: it gets you off to a fast start and gives you a guide to the implementation and to the points where the taxonomy hooks into your systems. The second is to do it yourself. In either case, you’ll eventually have to take over the work for your own organization. Because some organizations cannot hire outsiders, let’s concentrate on some ways you can easily begin creating and maintaining a taxonomy in-house.

First, you’ll need to find at least one wordsmith (and maybe several). Some people love words — their meanings and the variations in the ways they are used. For others, words are simply a communication tool. To build a taxonomy, you must capture the kinds of things people do and talk about in your organization — the concepts that describe their work. The wordsmiths can work with the concepts and can perceive and create the relationships needed for a successful taxonomy. Others can add valuable contributions around the concepts.

Enticing Participation

People are increasingly using Web tools such as wikis or purpose-built software to build a collection of terms to use in tagging. In these efforts, the wordsmiths take the lead; others join in when they see the worth of the tagged items or have items to contribute. The method I’ve seen most often for contributing to vocabularies is a blog or wiki monitored by an editorial team that evaluates each item to ensure strict control.

I’m going to use an increasingly common term, “collabulary,” to describe this process and envision it as something much looser and more fun. The lead editor (the taxonomist) visits the wiki or blog occasionally to make sure things aren’t getting out of hand. The development process is something like that of Wikipedia articles, where the participants (if there are enough of them) tend to keep things in line by correcting and improving on each other’s contributions.

To entice participation, the main topic could be somewhat whimsical or interesting — for example, people could contribute vocabulary items pertaining to their home towns. This would tie in with user contributions on other topics to form a community-developed thesaurus and provide various possibilities for associating (or not) with the registry.

The purpose of the taxonomy and the intended end product will help determine the technology used to develop it. Collecting suggestions without any related discussion (not recommended) would require nothing more than an input form that produces a table of information, which could then be displayed as a hierarchy (either automatically or by importing the table). If we want to encourage discussion, however, a discussion forum that organizes discussions in threads (perhaps one per term) would work. Vocabulary items would probably be entered into the thesaurus manually, but the hierarchical display generated from the resulting thesaurus could be dynamic. See the forum for suggestions to the ASIS&T thesaurus, which has a little discussion in the “Terms in the ASIS&T thesaurus” section.
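For the form-only approach, the table of information can be as simple as rows of (term, broader term). The following Python sketch, with illustrative data and hypothetical function names, turns such rows into an indented hierarchy. It assumes the table contains no cycles and that every broader term was itself submitted; a production tool would need to check both, since a term whose broader term is missing simply would not appear.

```python
from collections import defaultdict

# Rows as they might come out of a simple suggestion form:
# (suggested term, broader term); None marks a top-level term.
suggestions = [
    ("Libraries", None),
    ("Academic libraries", "Libraries"),
    ("Special libraries", "Libraries"),
    ("Law libraries", "Special libraries"),
    ("Archives", None),
]


def build_hierarchy(rows):
    """Group terms under their broader term; return (children map, top terms)."""
    children = defaultdict(list)
    for term, broader in rows:
        children[broader].append(term)
    return children, children[None]


def render(children, terms, depth=0):
    """Yield an indented outline, narrower terms beneath their broader term.

    Assumes the rows contain no cycles; a cycle reachable from a top term
    would recurse forever.
    """
    for term in sorted(terms):
        yield "  " * depth + term
        yield from render(children, children.get(term, []), depth + 1)


children, top = build_hierarchy(suggestions)
print("\n".join(render(children, top)))
```

Because the display is regenerated from the table each time, new suggestions show up in the hierarchy as soon as they are added.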

A wiki could serve the same purpose, particularly if each page is dedicated to a term. Links to broader, narrower, and related terms might be easier to display. Users will need to know a little about wiki formatting, but wikis may now be common enough to make that a minor problem (or a moderator could clean up pages submitted by novices). Page templates could include sections for scope notes as well as for general discussion and term relationships. A form could be created that would produce an initial page as well as structured term data for import into the hierarchy. Discussion items could be added, manually reviewed, and acted upon by the taxonomist in charge of the collabulary project.
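Here is one way the form-to-page step might look, sketched in Python under stated assumptions: the form fields, function names, and page layout are hypothetical; the markup follows MediaWiki conventions (`== Heading ==` for sections, `[[Term]]` for internal links); and relationships are recorded with the standard thesaurus codes BT (broader term), NT (narrower term), and RT (related term).

```python
from dataclasses import dataclass, field


@dataclass
class TermForm:
    """Fields a hypothetical term-submission form might collect."""
    term: str
    scope_note: str = ""
    broader: list = field(default_factory=list)
    narrower: list = field(default_factory=list)
    related: list = field(default_factory=list)


def wiki_page(form):
    """Draft an initial wiki page (MediaWiki-style markup) for one term."""
    def links(terms):
        # One bulleted internal link per term, or a placeholder if empty.
        return "\n".join(f"* [[{t}]]" for t in terms) or "* (none yet)"

    return "\n".join([
        f"= {form.term} =",
        "== Scope note ==",
        form.scope_note or "(to be written)",
        "== Broader terms ==", links(form.broader),
        "== Narrower terms ==", links(form.narrower),
        "== Related terms ==", links(form.related),
        "== Discussion ==",
        "Add comments below; the taxonomist reviews them.",
    ])


def term_record(form):
    """Structured rows (term, relation, other term) for import into the hierarchy."""
    rows = [(form.term, "BT", t) for t in form.broader]
    rows += [(form.term, "NT", t) for t in form.narrower]
    rows += [(form.term, "RT", t) for t in form.related]
    return rows


form = TermForm(
    "Collabulary",
    "A vocabulary built collaboratively by its users.",
    broader=["Controlled vocabularies"],
    related=["Folksonomies"],
)
print(wiki_page(form))
print(term_record(form))
```

Generating both outputs from the same form keeps the human-readable page and the importable term data from drifting apart at the start; later edits to the page would still need to be reviewed and reflected in the structured data by hand.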

For either a forum or wiki, the hierarchical product could link to the thread or page on which the discussion about the term appears. The initial discussion would probably just take place among your own team.
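A sketch of that linking follows, assuming one wiki page per term with spaces in the page title replaced by underscores (as MediaWiki does) and a hypothetical base URL. The hierarchy is rendered as nested HTML lists in which each term links to its discussion page.

```python
from html import escape
from urllib.parse import quote

# Hypothetical location of the collabulary wiki; replace with your own.
WIKI_BASE = "https://wiki.example.org/terms/"

# Broader term -> narrower terms (illustrative data).
HIERARCHY = {
    "Libraries": ["Academic libraries", "Special libraries"],
    "Special libraries": ["Law libraries"],
}


def page_url(term):
    """Discussion page for a term, assuming one wiki page per term."""
    return WIKI_BASE + quote(term.replace(" ", "_"))


def to_html(term):
    """Render a term and its narrower terms as nested <ul> lists with links."""
    item = f'<a href="{escape(page_url(term))}">{escape(term)}</a>'
    narrower = HIERARCHY.get(term, [])
    if narrower:
        item += "<ul>" + "".join(to_html(t) for t in narrower) + "</ul>"
    return f"<li>{item}</li>"


print("<ul>" + to_html("Libraries") + "</ul>")
```

Escaping both the URL and the label keeps user-contributed term names from breaking the generated page.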

You should consider starting with an existing thesaurus, even if it’s just a skeleton of what you expect the end product to look like. Many of them are publicly available. The U.S. government has produced some high-quality thesauri that can be adopted in toto or piecemeal, depending on the subject area in which you’re interested.
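Adopting a thesaurus piecemeal usually means taking one branch (a top term plus everything narrower than it) as the seed for your own hierarchy. The Python sketch below assumes the source thesaurus has been exported as (term, broader term) pairs; the function name is hypothetical, and the sample terms are illustrative rather than drawn from any particular published thesaurus.

```python
from collections import defaultdict

# Illustrative extract of an existing thesaurus as (term, broader term) pairs;
# None marks a top term. A real source would be loaded from an export file.
EXISTING = [
    ("Natural resources", None),
    ("Water resources", "Natural resources"),
    ("Groundwater", "Water resources"),
    ("Surface water", "Water resources"),
    ("Mineral resources", "Natural resources"),
    ("Energy", None),
]


def branch(pairs, top):
    """Return (term, broader) pairs for `top` and every term beneath it.

    `top` is given a broader term of None so it becomes a top term in the
    new, smaller taxonomy.
    """
    narrower = defaultdict(list)
    for term, broader in pairs:
        narrower[broader].append(term)
    selected, stack, seen = [(top, None)], [top], {top}
    while stack:
        parent = stack.pop()
        for child in narrower.get(parent, []):
            if child not in seen:  # guards against cycles in messy source data
                seen.add(child)
                selected.append((child, parent))
                stack.append(child)
    return selected


starter = branch(EXISTING, "Water resources")
print(starter)
```

The resulting pairs can feed the same (term, broader term) table that collabulary contributors then extend, which matches the advice to start small in one area and build out.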

Start small, organize the information in one area, and build out from there. Librarians are often wordsmiths and are well suited to this kind of work. Many resources, including standards, books, Web sites and wikis, are available to provide guidance. The new Taxonomy Division of SLA has information and members available to discuss the project.

Margie Hlava
President, Access Innovations

Previously published in Info Outlook, July-Aug 2010.
