October 11, 2010 – You’ve just read the title and already there is trouble. “Classified” has multiple uses. What is meant here? When discussing homeland security, one might conclude that the security level of documents is under discussion. Assigning security levels to documents provides security classifications. Organizing a plant or an animal into a unique class gets it classified.
What is meant here is the process of bringing order to content chaos by organizing documents into meaningful, useful arrangements – classifying content. To be more precise, classification places one object in exactly one class. This works when organizing plants and animals, and the result of that effort is a taxonomy.
Most articles and reports cover many subjects and are best categorized. Categorization places a document into one or more categories. When the categories are organized, related, and structured with your content logically linked to the categories, you have a taxonomy. Call it profiling, a process that is still “PC” for documents.
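To make the distinction concrete, here is a minimal Python sketch, assuming simple keyword matching: `classify` returns exactly one class per object, while `categorize` may return several categories for a single document. The category names and keyword lists are hypothetical illustrations.

```python
import re

# Hypothetical categories and trigger words, for illustration only.
CATEGORY_KEYWORDS = {
    "Terrorism": {"bomb", "attack", "extremist"},
    "Finance": {"security", "bond", "investment"},
    "Law": {"court", "statute", "litigation"},
}


def classify(object_to_class, obj):
    """Classification: each object belongs to exactly one class."""
    return object_to_class[obj]


def categorize(text):
    """Categorization: a document may fall into zero, one, or many categories."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return {cat for cat, keywords in CATEGORY_KEYWORDS.items() if words & keywords}


print(classify({"wolf": "Mammalia", "eagle": "Aves"}, "wolf"))     # Mammalia
print(sorted(categorize("Court rules on bond fraud after bomb attack.")))
# ['Finance', 'Law', 'Terrorism']
```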
Velocity, Volume, and Variety
Why plan and deploy a taxonomic strategy? Well, aside from the e-Government Act, Section 207(f) mandating agency taxonomies, you can’t cope with your content without it. Increasing information volume is a much-discussed crisis. Less obvious, but more important, is velocity – the speed at which information hits your desk or screen. Absolute volume would be a less serious problem if it weren’t coming at you at the speed of light.
The accelerating variety of content bewilders – subject matter, scope and depth, document types, sources, formats, quality, and more. Subject matter is confounded by language ambiguity. A bomb in the entertainment business is not a bomb in the terrorist arena. Both have to be defused, but by very different means. Security in your legal department refers to a financial instrument.
You must also gain control over a variety of content types. Most obvious, in the war on terrorism, is content for analysts to study. Categorizing documents for analysis improves efficiency and allows for better document distribution among specialists. But these same analysts spend a great deal of time on non-critical content such as office correspondence, studies, and reports that are part of everyone’s routine. All content must be part of the content management solution.
Taxonomies corral content variety – document types, formats, sources, subject matter, and context parameters. Taxonomies can span departments and organizations, allowing for variations in language meaning and usage. When a taxonomy is well designed, deployed, and integrated into an enterprise content management system (CMS), the twin enemies, volume and velocity, vanish. Well, not completely, but the effect is tantamount to reducing volume and slowing velocity.
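One way a taxonomy can absorb differences in meaning across departments is to record which concept a word denotes in each context, as with “bomb” and “security” above. The sketch below assumes a hypothetical lookup table keyed by (term, context); the concept labels and department names are illustrative only.

```python
from typing import Optional

# Hypothetical (term, context) -> concept table.
SENSES = {
    ("bomb", "entertainment"): "Box-office failure",
    ("bomb", "counterterrorism"): "Explosive device",
    ("security", "legal"): "Financial instrument",
    ("security", "counterterrorism"): "Protective measures",
}


def resolve(term: str, context: str) -> Optional[str]:
    """Return the concept a word denotes in the given context, or None if unknown."""
    return SENSES.get((term.lower(), context.lower()))


print(resolve("Bomb", "entertainment"))     # Box-office failure
print(resolve("bomb", "counterterrorism"))  # Explosive device
print(resolve("security", "legal"))         # Financial instrument
```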
A taxonomic strategy encompasses not only the development of an organizational taxonomy but also plans for deploying the taxonomy, indexing content, maintaining the taxonomy, and directly supporting search.
Buy-In – Get It
To work, a taxonomy needs to span the entire organization from the legal department, to personnel, to finance, to analysts, etc. You need commitment from the top to get cooperation organization-wide. You need a budget, too.
Taxonomy – Build It
Building a taxonomy requires lexical expertise plus an understanding of your overall organization – its parts, its mission, its field or industry. Look within your library or information center for taxonomy expertise. Outside, look to the American Society for Information Science and Technology (ASIS&T), the Special Libraries Association (SLA), and similar groups. Look to my organization, Access Innovations, Inc. Establish a taxonomy team of lexicographers, subject experts, and department representatives.
Consider acquiring existing thesauri and taxonomies for a fast start. Then build out to reflect your unique requirements. Look across the entire organization – gather words, gather sample documents. Get ideas from colleagues. Sort your sample documents against your fledgling taxonomy. Add concepts. Move things around. Group concepts. Look for hierarchical relationships – “asymmetric warfare” is a narrower concept than “warfare.” Look for related terms – “counterterrorism” is related to “countermeasures” and “deterrence.”
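The hierarchical and related-term relationships described above can be modeled with a small data structure. This is a minimal sketch, assuming each term has at most one broader term (a monohierarchy; thesauri built to Z39.19 may allow more than one); the `Taxonomy` class and its method names are hypothetical.

```python
from collections import defaultdict


class Taxonomy:
    """Hypothetical sketch: hierarchical and associative term relationships."""

    def __init__(self):
        self.broader = {}                 # term -> its one broader term
        self.narrower = defaultdict(set)  # term -> its narrower terms
        self.related = defaultdict(set)   # term -> associated (related) terms

    def add_narrower(self, parent, child):
        """Place `child` one level below `parent` in the hierarchy."""
        self.broader[child] = parent
        self.narrower[parent].add(child)

    def add_related(self, a, b):
        """Link two terms that are associated but not hierarchical."""
        self.related[a].add(b)
        self.related[b].add(a)

    def path_to_top(self, term):
        """Return the term followed by each broader term up to the top level."""
        path = [term]
        while path[-1] in self.broader:
            path.append(self.broader[path[-1]])
        return path


t = Taxonomy()
t.add_narrower("Warfare", "Asymmetric warfare")
t.add_related("Counterterrorism", "Countermeasures")
t.add_related("Counterterrorism", "Deterrence")
print(t.path_to_top("Asymmetric warfare"))    # ['Asymmetric warfare', 'Warfare']
print(sorted(t.related["Counterterrorism"]))  # ['Countermeasures', 'Deterrence']
```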
Taxonomies can be changed and should be. Don’t be afraid to move an entire hierarchical branch just to see if it works better for you. To do this, get yourself a tool like our Data Harmony Thesaurus Master™ (www.dataharmony.com) or another brand. Just be sure the tool of choice can be integrated with existing applications and is compliant with the ANSI/NISO Thesaurus Standard Z39.19 (www.niso.org) and ISO 2788 and ISO 5964 (www.iso.org). A good taxonomy tool will handle term relationships, hierarchy issues, reciprocal posting, and much more. Make use of the standards, too, when designing and building your taxonomy.
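Reciprocal posting and branch moves are the kind of bookkeeping a taxonomy tool automates. The sketch below is a hypothetical illustration, not the data model of Thesaurus Master or any other product: recording a narrower term (NT) also records its broader term (BT), and moving a branch first checks that the move would not make a term its own ancestor. It assumes a monohierarchy.

```python
class Hierarchy:
    """Hypothetical sketch of reciprocal broader/narrower bookkeeping.

    Assumes each term has at most one broader term (a monohierarchy)."""

    def __init__(self):
        self.bt = {}  # term -> broader term (BT)
        self.nt = {}  # term -> set of narrower terms (NT)

    def link(self, broader, narrower):
        """Reciprocal posting: recording an NT link also records the matching BT."""
        self.bt[narrower] = broader
        self.nt.setdefault(broader, set()).add(narrower)

    def is_under(self, term, ancestor):
        """True if `ancestor` appears anywhere above `term`."""
        while term in self.bt:
            term = self.bt[term]
            if term == ancestor:
                return True
        return False

    def move_branch(self, term, new_broader):
        """Re-attach `term`, and everything below it, under `new_broader`."""
        if term == new_broader or self.is_under(new_broader, term):
            raise ValueError(f"moving {term!r} under {new_broader!r} would create a cycle")
        old = self.bt.get(term)
        if old is not None:
            self.nt[old].discard(term)  # remove the stale NT posting
        self.link(new_broader, term)    # add fresh BT and NT postings


h = Hierarchy()
h.link("Warfare", "Asymmetric warfare")
h.link("Asymmetric warfare", "Insurgency")
h.move_branch("Asymmetric warfare", "Armed conflict")
print(h.bt["Insurgency"], "->", h.bt["Asymmetric warfare"])
# Asymmetric warfare -> Armed conflict
```

Because narrower terms hang off their own broader term, moving one node carries the whole branch with it; only the two postings at the cut point change.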
Keep testing against your sample of documents. Periodically change the sample. Run a draft taxonomy by people from each department and get their feedback. Incorporate it, but only if it enhances clarity and reduces ambiguity. Avoid slang and trendy words. Use words with “legs,” words that will be around for a while.
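Testing against sample documents can start with two questions: which documents match no term (possible gaps), and which terms match no document (possible dead wood)? This sketch assumes plain whole-word string matching, which is far cruder than real indexing software; `coverage_report` is a hypothetical helper.

```python
import re


def coverage_report(taxonomy_terms, documents):
    """Hypothetical helper: test a draft taxonomy against sample documents.

    taxonomy_terms: iterable of terms, matched as whole words, ignoring case.
    documents: dict mapping a document id to its text.
    Returns (unmatched_docs, unused_terms).
    """
    terms = list(taxonomy_terms)
    patterns = {t: re.compile(r"\b" + re.escape(t) + r"\b", re.IGNORECASE) for t in terms}
    used, unmatched = set(), []
    for doc_id, text in documents.items():
        hits = {t for t, pattern in patterns.items() if pattern.search(text)}
        if hits:
            used |= hits
        else:
            unmatched.append(doc_id)  # no term fits: a possible gap in the taxonomy
    unused = sorted(set(terms) - used)  # no document fits: possible dead wood
    return unmatched, unused


sample = {
    "memo-1": "New counterterrorism funding was announced today.",
    "memo-2": "Quarterly parking assignments are attached.",
    "memo-3": "Deterrence theory revisited.",
}
print(coverage_report(["Counterterrorism", "Deterrence", "Asymmetric warfare"], sample))
# (['memo-2'], ['Asymmetric warfare'])
```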
Map slang and trendy words to preferred terms. Minimize the use of acronyms. From among synonyms, select the preferred term, the one that, for your users, best represents a given concept. For example, “bones” is slang for a medical doctor, and “physician” is an alternative term for the same concept. Choose one. If you select “physician,” then map “medical doctor,” “doctor,” and “bones” to it. Within a tool like Thesaurus Master™, this functionality is built in. For your preferred term, “Physician,” you can add as many non-preferred terms – synonyms – as you need.
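In thesaurus terminology, this is the USE / USED FOR relationship between a preferred term and its non-preferred entry terms. A minimal sketch with a hypothetical `Vocabulary` class: every entry term, in any letter case, resolves to the single preferred term, and an entry term is not allowed to point to two different preferred terms.

```python
class Vocabulary:
    """Hypothetical sketch of USE / USED FOR (preferred / non-preferred) terms."""

    def __init__(self):
        self.use = {}  # lowercased entry term -> preferred term

    def add_preferred(self, preferred, *non_preferred):
        """Register a preferred term and any number of synonyms that point to it."""
        for entry in (preferred, *non_preferred):
            key = entry.lower()
            current = self.use.get(key)
            if current is not None and current != preferred:
                raise ValueError(f"{entry!r} already points to {current!r}")
            self.use[key] = preferred

    def resolve(self, term):
        """Map any entry term to its preferred term, or None if unknown."""
        return self.use.get(term.lower())

    def used_for(self, preferred):
        """List the non-preferred terms that point to `preferred`."""
        return sorted(k for k, v in self.use.items()
                      if v == preferred and k != preferred.lower())


vocab = Vocabulary()
vocab.add_preferred("Physician", "medical doctor", "doctor", "bones")
print(vocab.resolve("Bones"))       # Physician
print(vocab.used_for("Physician"))  # ['bones', 'doctor', 'medical doctor']
```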
Taxonomy – Deploy It
Put the taxonomy to work indexing, categorizing, and filtering your information. Automated indexing is essential because manual indexing is too slow and costly. Automation packages fall into two broad categories: black box and transparent (white) box.
Black-box solutions embed proprietary approaches for categorization using linguistic, lexical, AI, neural-net, and statistical processing. Training sets of sample documents must be collected and carefully screened to teach the system about your domain. Training sets easily skew the results. If the system performs poorly, then devising different rationales for building new training sets becomes nearly mystical. Expensive and very difficult to use, black boxes rarely produce accurate, relevant results. They don’t scale for expanding knowledge domains, though they do scale for volume. Their best uses are rudimentary clustering, categorization, and filtering of high-volume information flows.
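To illustrate why training sets matter so much, here is a toy statistical categorizer. It is a multinomial naive Bayes classifier with add-one smoothing, swapped in as a stand-in for the proprietary methods black-box products actually use; the training examples are hypothetical. Every decision it makes comes from word counts in the training set, so changing the examples changes the results.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def train(training_set):
    """Count words per category from (text, category) training examples."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    vocabulary = set()
    for text, category in training_set:
        tokens = tokenize(text)
        doc_counts[category] += 1
        word_counts[category].update(tokens)
        vocabulary.update(tokens)
    return word_counts, doc_counts, vocabulary


def categorize(model, text):
    """Return the category with the highest naive Bayes log-probability."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best, best_score = None, -math.inf
    for category, n_docs in doc_counts.items():
        total_words = sum(word_counts[category].values())
        score = math.log(n_docs / total_docs)  # prior taken from the training set
        for token in tokenize(text):
            # Add-one smoothing keeps unseen words from zeroing out a category.
            count = word_counts[category][token]
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best, best_score = category, score
    return best


training = [
    ("bomb defused at station", "Terrorism"),
    ("explosive device found", "Terrorism"),
    ("film was a box office bomb", "Entertainment"),
    ("studio releases new film", "Entertainment"),
]
model = train(training)
print(categorize(model, "police defused the explosive"))  # Terrorism
print(categorize(model, "new film bomb"))                  # Entertainment
```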
Most white boxes use human-readable rule bases. Initial rules, generated automatically, grow ever more sophisticated – iteratively – through highly refined statistical feedback within GUI editorial interfaces. Quick to deploy and inexpensive to maintain, white-box machine-automated indexing produces the most accurate, relevant results and is best used for high-level conceptual indexing, categorization, and filtering. In response to industry and government demands for more open, comprehensible, and easy-to-integrate taxonomic solutions, Access Innovations developed a unique white-box solution, Data Harmony’s MAIstro™, which integrates taxonomy maintenance and automated indexing into one seamless whole. Developed in Java with built-in XML functionality, MAIstro is an example of the keystone technology needed for a successful taxonomic strategy.
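The white-box idea can be sketched in a few lines: draft rules are generated mechanically from the vocabulary, and an editor then refines a misfiring rule in plain view. The `Rule` format below, including its `near` condition, is a hypothetical illustration and not MAIstro’s rule syntax; the statistical feedback the text mentions is not modeled.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Rule:
    """One human-readable rule (hypothetical format): if `text` appears as a
    whole word or phrase, and, when `near` is set, any `near` string also
    appears, assign `descriptor`."""
    text: str
    descriptor: str
    near: list = field(default_factory=list)

    def matches(self, doc):
        if not re.search(r"\b" + re.escape(self.text) + r"\b", doc):
            return False
        # `near` entries are plain substrings, so stems like "defus" catch "defused".
        return not self.near or any(word in doc for word in self.near)


def generate_rules(vocabulary):
    """Draft one literal-match rule per entry term, preferred or not."""
    return [Rule(term.lower(), preferred)
            for preferred, synonyms in vocabulary.items()
            for term in (preferred, *synonyms)]


def index(doc, rules):
    """Return the sorted descriptors whose rules fire on `doc`."""
    doc = doc.lower()
    return sorted({rule.descriptor for rule in rules if rule.matches(doc)})


rules = generate_rules({"Explosive devices": ["bomb"], "Physician": ["doctor"]})
print(index("The film was a box-office bomb", rules))   # ['Explosive devices']

# An editor reads the misfire and tightens the "bomb" rule in plain view.
for rule in rules:
    if rule.text == "bomb":
        rule.near = ["detonat", "defus", "explosive"]

print(index("The film was a box-office bomb", rules))   # []
print(index("Police defused a bomb downtown", rules))  # ['Explosive devices']
```

The refinement is a visible, one-line change to a single rule, which is the contrast the text draws with retraining a black box on a new document set.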
Maintenance and Training – Keep It Up
Taxonomies require continual review: adding new terms and retiring outdated ones. This need not be onerous if done on a regular basis by one or two taxonomists. Your taxonomy strategy will succeed when all constituents are trained and motivated. Demonstrating the huge advantage the taxonomy gives them in their daily hunt for information is easy and will ensure its use.
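Retiring a term need not orphan the documents already indexed with it. A minimal sketch, assuming the common practice of turning the retired term into a non-preferred entry that points to its replacement; the `Vocabulary` class and the example terms are hypothetical.

```python
class Vocabulary:
    """Hypothetical sketch: retire a term by redirecting it to a replacement."""

    def __init__(self, preferred_terms):
        self.preferred = set(preferred_terms)
        self.use = {}  # retired or non-preferred term -> current preferred term

    def retire(self, old, replacement):
        """Retire `old`; documents indexed with it will resolve to `replacement`."""
        if old not in self.preferred:
            raise KeyError(f"{old!r} is not a current preferred term")
        self.preferred.discard(old)
        self.preferred.add(replacement)
        self.use[old] = replacement
        # Re-point older redirects so no chain ends at a retired term.
        for entry, target in list(self.use.items()):
            if target == old:
                self.use[entry] = replacement

    def resolve(self, term):
        """Map a term, current or retired, to the term to use today."""
        return term if term in self.preferred else self.use.get(term)


v = Vocabulary({"Civil defense", "Warfare"})
v.retire("Civil defense", "Emergency management")
v.retire("Emergency management", "Emergency preparedness")
print(v.resolve("Civil defense"))  # Emergency preparedness
```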
Search and Find
Search is nice, but find is better. In the end, getting the information you need, efficiently and reliably, is the goal of a taxonomic strategy. As the hearings on 9/11 proceed, the need for an effective taxonomic strategy becomes obvious and imperative. Your taxonomy provides the controlling principle for organizing vast quantities of raw data. Organizing data using a taxonomy transforms data into information. By using taxonomies for information display, faceted navigation, and search query formulation, your analysts quickly turn information liabilities into intelligence assets.
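Search query formulation with a taxonomy can be sketched as query expansion: map the user’s word to the preferred term, then add every narrower term beneath it, so a search for “war” also finds documents indexed under more specific concepts. The data layout (a `use` dictionary, a `narrower` dictionary, and an index of assigned descriptors per document) is assumed for illustration.

```python
def expand_query(term, use, narrower):
    """Map `term` to its preferred term, then add every narrower term beneath it.

    use: lowercased entry term -> preferred term (include preferred terms too)
    narrower: preferred term -> list of its narrower terms
    """
    start = use.get(term.lower(), term)
    expanded, stack, seen = [], [start], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        expanded.append(current)
        stack.extend(narrower.get(current, []))
    return expanded


def search(term, index, use, narrower):
    """Return ids of documents indexed with the term or anything narrower."""
    wanted = set(expand_query(term, use, narrower))
    return sorted(doc_id for doc_id, descriptors in index.items() if descriptors & wanted)


use = {"war": "Warfare", "warfare": "Warfare"}
narrower = {"Warfare": ["Asymmetric warfare"], "Asymmetric warfare": ["Insurgency"]}
index = {"d1": {"Insurgency"}, "d2": {"Physician"}, "d3": {"Warfare"}}
print(expand_query("War", use, narrower))  # ['Warfare', 'Asymmetric warfare', 'Insurgency']
print(search("war", index, use, narrower))  # ['d1', 'd3']
```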
by Jay Ven Eman, Ph.D. CEO, Access Innovations, Inc.
Originally published in: Intelligence & Warning America