Bridging the Great Indexing Divide

December 8, 2010  
Posted in Access Insights, indexing, Taxonomy

by Alice Redmond-Neal

Reprinted from Key Words, the bulletin of the American Society for Indexing

I’ve been indexing for over a decade, but when I attended the ASI Annual Meeting 2008 in Denver, I was a stranger in a strange land. Being surrounded by indexers was intriguing yet confounding. You see, I come from the indexing otherworld, electronic territory where indexing is coupled with a taxonomy providing terms as category labels to identify and retrieve content on a given topic.

In my world, taxonomy is fundamental, the basis for the process of indexing. The taxonomy or thesaurus comes first. Both are forms of a controlled vocabulary, established to describe a collection rather than a single content item. Indexing follows, sorting content by the vocabulary terms. A taxonomy is the starting point for database indexing and its primary reason for being.

OK, I knew about you guys. Some years ago, I attended a workshop by Do Mi Stauber, sponsored by the New Mexico A to Zia Chapter, and was introduced to the “other” indexing, the back-of-the-book (BOTB) kind. I had a brief introduction to popular software, how it supports the work, and the profession.

My visit to ASI’s annual meeting was prompted by conversations with Heather Hedden leading up to her Taxonomy and Thesaurus Creation workshop. We discussed ASI’s expansion into taxonomy territory through the new SIG and we explored Data Harmony as an example of taxonomy construction and subject indexing software. In support of ASI’s broadening focus, Data Harmony signed on as a vendor, and I had the privilege of representing the company while wandering into the mostly unfamiliar territory of BOTB indexing.

I attended all the sessions I could, with special interest in cross-over topics, e.g., controlled vocabularies and taxonomies, web and database indexing. I started with Heather’s workshop. Having attended and presented many workshops on taxonomy construction, I found this an excellent introduction which effectively pointed out commonalities with BOTB work. Other sessions I especially enjoyed included BNA’s Chuck Knapp giving research evidence of the value of categorical indexing for information retrieval and Julie McClung on herding cats, i.e., indexing the record of the B.C. legislature.

Between the meeting sessions, I grabbed every old issue of Key Words available. How would they describe my daily work? My interest heightened with references to database and web indexing, controlled vocabularies, taxonomies, information technology, informatics–this is my indexing world, though work on journals, databases, and websites is fuzzy shared territory. Between the meeting sessions and past bulletins, I observed and learned a lot about how your indexing and my indexing come together.

Our similarities

I learned how similar our work is. We both do work that is not noticed when done well—that’s expected—but, oh, so obvious when done poorly. We use much the same cognitive processes to “capture what’s important.” I learned that the skills of “The Good Indexer,” identified by Henry Benjamin Wheatley early last century, substantially overlap with the other kind of indexer’s required skills[1]. Wheatley bemoaned “undifferentiated page references,” a situation I see in taxonomy terms for concepts that are indistinct and call for disambiguation. His and your cross-references are my related terms, and double posting translates to polyhierarchies or multiple broader terms. Many of the basic mental skills and attitudes Wheatley identified are restated in the draft document “Indexing Body of Knowledge”[2], and later Seth Earley would observe transferable skills for developing an enterprise taxonomy[3].

In “Lesson in Language,” William P. Meyers wrote “A good index is a structure optimized to help two human minds meet.”[4] Substitute the word “taxonomy” for “index” and the statement is also true. Meyers stressed the need to understand an author’s intent, the user’s needs, and the subject matter, all critical for a good index, as Wheatley noted, and also for a good taxonomy. As Meyers observed, the work itself teaches a lot of domain knowledge, and that holds for book indexers, database indexers, and taxonomists alike.

The parallels between BOTB indexing and database/web indexing with taxonomies continue. We recognize that a concept—not a word—is the key to capturing what’s important, and concepts are expressed with infinite variety. We spot alternate ways of expressing a concept and capture them as access points or synonyms (nonpreferred terms or Use/Used for pairs). A taxonomy redirects the user to the correct term, as a BOTB index does, and may provide various displays or views for a taxonomist or database indexer to simplify access from these alternative expressions to the authorized terms. For the end user, a good search engine completes that translation and redirection to a desired document or information.
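
For readers who like to see the mechanics, the Use/Used for redirection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the terms, the `USE` table, and the function names are hypothetical and are not taken from any particular taxonomy or software product.

```python
# Hypothetical Use / Used-for pairs: nonpreferred term -> preferred term.
USE = {
    "cars": "automobiles",
    "motor cars": "automobiles",
    "heart attack": "myocardial infarction",
}

# Hypothetical set of authorized (preferred) terms.
PREFERRED = {"automobiles", "myocardial infarction"}


def normalize(entry):
    """Lowercase the entry and collapse runs of whitespace."""
    return " ".join(entry.lower().split())


def resolve(entry):
    """Return the authorized term for an entry, or None if there is no access point."""
    key = normalize(entry)
    if key in PREFERRED:
        return key
    return USE.get(key)


def used_for(term):
    """Reverse view for the taxonomist: the nonpreferred terms that lead to `term`."""
    return sorted(np for np, pref in USE.items() if pref == term)
```

Here `resolve` plays the part of a See reference, while `used_for` is the kind of display a taxonomist or database indexer might consult when reviewing a term's synonyms.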

Meyers decried computer-generated indexes just as taxonomists do computer-generated taxonomies. He observed that computers can’t sort out multiple entries at the same level within page ranges or differentiate subentries. A taxonomist faces this challenge in determining when one concept is distinct from another or is a subtype, part, or instance of another, i.e. a narrower term to a more general concept in a taxonomy. In my indexing role, I might turn to scope notes from the taxonomist explaining how the term should be used for indexing. Identifying the boundary of a concept, when to make a subentry or a narrower term, can be difficult for humans and is practically impossible for computer systems. We all worry over the use of “and”[5], which can blur those boundaries and lead to unwelcome mixing of concepts.

We recognize the importance of human judgment. As Deborah Patton observed in a 2006 speech, “…indexing is an art and not a science.”[6] We know Google is no substitute for a good index or retrieval device, and we share support for good metadata[7]. No simplistic approach has succeeded; a computer can’t match the product of a human indexer, taxonomist, or … um, indexer of the other sort.

I was interested to see a review of Heting Chu’s book, “Information Representation and Retrieval in the Digital Age”[8]. I find it an excellent introduction to approaches to information retrieval online. To my surprise, the review focused little on the book’s content but mostly critiqued its index, a fine example of blindly touching different parts of an elephant. Now more aware of BOTB indexes, though, I was recently struck by a bad index of a book on taxonomies. The reference to “mullets” made me wonder: is this a new concept in taxonomy construction? No, just an embarrassingly pointless reference…computer-generated, no doubt.

I identified with Julia Marshall’s “part two” article on controlled vocabularies[9], but wished I had the first part. I cheered for Peggy Ruppel’s “Rules of Database Indexing”[10], concurring with every point. This will be framed and hung on my wall!

Our differences

Despite the many similarities, there are undeniable differences. As a database indexer and taxonomist, my medium is electronic for content that is ever expanding. Your index, in most cases, is a nearly flat list on paper for a static product. You organize references alphabetically, a common default organizational strategy. I aim to organize terms in a logical conceptual hierarchy, an outline format designed to help users find their way to a concept without having to know the word. The hierarchy also helps users learn accepted terminology, understand how the subject area is organized, and discover interesting points along the way. The terms are applied to documents as descriptive subject metadata to support retrieval by a search engine. Your index is complete with your last entry, while a taxonomy for indexing is a dynamic creature growing to reflect evolving concepts and terminology.
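
A small Python sketch may make the contrast with an alphabetical list concrete. It assumes a toy vocabulary in which each term records its broader terms; the terms and the names `BROADER`, `narrower`, and `outline` are hypothetical. A term with two broader terms (a polyhierarchy, the counterpart of double posting) simply appears under both parents in the outline.

```python
# Hypothetical toy taxonomy: each term maps to its broader terms.
# "biochemistry" has two broader terms, i.e. a polyhierarchy.
BROADER = {
    "science": [],
    "biology": ["science"],
    "chemistry": ["science"],
    "biochemistry": ["biology", "chemistry"],
}


def narrower(term):
    """Return the terms that list `term` as a broader term, alphabetically."""
    return sorted(t for t, parents in BROADER.items() if term in parents)


def outline(term, depth=0):
    """Return the hierarchy under `term` as indented outline lines."""
    lines = ["  " * depth + term]
    for child in narrower(term):
        lines.extend(outline(child, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(outline("science")))
```

A user browsing this outline can reach "biochemistry" from either "biology" or "chemistry" without knowing the word in advance, which is the point of the hierarchical display.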

You capture an individual author’s word choices and may use more common expressions as See references. I take the opposite approach. I respect the author’s keywords but aim to standardize them, translating to established taxonomy terms for indexing. Unique expressions can be found by free text search but don’t help searchers find all the material on the topic in a database. If there isn’t an equivalent term for a keyword, I index with a more general term. Wearing my taxonomist hat, I research the concept to determine how others express it and choose a term that is widely recognized, capturing the keyword as a synonym. In some cases, I construct a phrase to represent a concept that multiple authors express in various ways across the expanding collection of documents.
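
The standardizing step can be sketched as follows. This is a minimal illustration, not a description of any real indexing system: the vocabulary, the `GENERAL_FALLBACK` table (which stands in for the indexer's judgment about a more general term), and the function name `index_record` are all assumptions made for the example.

```python
# Hypothetical controlled vocabulary.
PREFERRED = {"social tagging", "metadata", "information retrieval"}

# Hypothetical synonyms (nonpreferred terms) pointing to preferred terms.
SYNONYMS = {
    "folksonomies": "social tagging",
    "tag clouds": "social tagging",
}

# Stand-in for indexer judgment: a more general term to use when a
# keyword has no equivalent in the vocabulary.
GENERAL_FALLBACK = {
    "faceted browsing": "information retrieval",
}


def index_record(author_keywords):
    """Translate author keywords to taxonomy terms.

    Returns (terms, candidates): the taxonomy terms applied to the record,
    and the keywords with no equivalent term, which a taxonomist might
    research and later add as new terms or synonyms.
    """
    terms, candidates = [], []
    for keyword in author_keywords:
        key = " ".join(keyword.lower().split())
        if key in PREFERRED:
            term = key
        elif key in SYNONYMS:
            term = SYNONYMS[key]
        else:
            term = GENERAL_FALLBACK.get(key)  # may be None
            candidates.append(key)
        if term and term not in terms:
            terms.append(term)
    return terms, candidates
```

Two different author keywords ("folksonomies" and "tag clouds") collapse to one taxonomy term, so a search on that term retrieves both records, while the unmatched keyword is set aside for the taxonomist.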

Most ASI indexers deal with a book and typically aim for three references per page. I deal with an electronic record representing a research article, conference paper, book chapter, paragraph, or abstract, and tag it with anywhere from two to ten taxonomy terms that describe its “aboutness.” Productivity for all of us depends on length and complexity. I index about three to twelve records an hour.

In some circumstances, we have very different perspectives on facilitating retrieval. I was intrigued by the case study of an A-Z site index created to speed access to accurate information on a call center intranet, to supplement the Google-style free search. One operator/user stated, “As long as I know what I am looking for, the A-Z index does just fine,” and another said, “Sometimes the A-Z index has items phrased differently…I look under the wrong letter.” Offering alternate routes to finding information is ideal, and best practices in information architecture advise providing multiple search options to suit different users’ knowledge, strategies, and preferences. I would suggest a third option here: a visible and logically organized taxonomy of topic categories, rich in synonyms, with content indexed and linked to those topics, would significantly improve search.

Bridging the Divide

In discussing indexing, we are using the same word for different meanings, violating Ruppel’s Rule #11 and ANSI/NISO Z39.19, the taxonomy standard. We are talking about concepts that, if not distinct, are at least overlapping. To complicate things further, our friends in computer science mean yet another thing by “indexing,” but we won’t go there. The key difference separating us is construction and use of a controlled vocabulary; the rest is just details. As ASI expands into taxonomy territory, we should be clear how we use that core word “indexing.” But overall, we have many key similarities and not so many differences.

A Key Words report showed indexers at the conference of the Association of Southern African Indexers and Bibliographers, and presumably worldwide, pondering the future of the industry and the impact of technology[11]. It is a concern for taxonomists and database and web indexers, as well. Proponents offer systems that, they claim, automatically sort masses of content into concept classes, with the click of a seemingly magic button. IT people assert quick and efficient results without human interference and complication. But watch out for the noise, the false hits—the “mullets”—and the missed content. We know it takes a human to provide precise and consistent results that discerning searchers demand.

Seth Maislin looked to a future when “indexing is not indexing”, when traditional indexing fades, giving way in an online environment[12]. Your 2004 survey indicated only 9% of respondents did either online document or database indexing[13]; I expect the number will rise with more indexers comfortable with both types of indexing work. Asserting the lasting value of the work, Maislin suggested individuals adapt by renaming the profession. Whatever we call it, information work in its many forms, performed at the highest quality level, will not go away.

ASI is to be applauded for its leadership in establishing a SIG dedicated to taxonomy work, addressing a glaring gap in support for this special interest by professional societies. We’ve been largely unaware of each other, neighbors separated by a common word, “indexing”. As we each learn more about similarities and differences in our perspectives, we can bridge the great indexing divide, melding our skills and adopting effective strategies. Ultimately this will provide better service for our customers and end users searching for information.

References

  1. Linda K. Fetters, Review of Wheatley, Henry Benjamin, How to make an index. First published 1902, London: Elliot Stock. 236 pp. Key Words, April-June 2004: 63-65.
  2. “Indexing Body of Knowledge”, Key Words, July-September 2006: 84-85.
  3. “Triennial International Indexing Meeting and Conference, American Society of Indexers (ASI), Indexing and Abstracting Society of Canada, Joint Conference, 15 to 17 June 2006, Toronto, Ontario”, Key Words, July-September 2006: 91-99.
  4. William P. Meyers, “Indexing Books: Lessons in Language Computations, Part I”, Key Words, April-June 2005: 47-50.
  5. Enid Zafran, “And What About ‘and’?”, Key Words, April-June 2005: 48-49.
  6. Deborah Patton, “2006 American Society of Indexers H.W. Wilson Award for Excellence in Indexing, Presentation Speech, Jun 16, 2006. Toronto, Ontario, Canada”, Key Words, July-September 2006: 86-87.
  7. Pilar Wyman, “Metadata Is Our Friend”, Key Words, April-June 2005: 40.
  8. Pilar Wyman, Review of Chu, Heting, Information Representation and Retrieval in the Digital Age. Information Today Inc. (ITI); Medford, NJ USA. September 2003, 250 pp. Key Words, April-June 2005: 63-64.
  9. Julia Marshall, “Controlled Vocabularies: Implementation and Evaluation,” Key Words, April-June 2005: 53-59.
  10. Peggy Ruppel, “Ruppel’s Rules of Database Indexing”, Key Words, April-June 2005: 57.
  11. Elna Schoeman, “Association of Southern African Indexers and Bibliographers (ASAIB) Conference Report,” Key Words, April-June 2005: 58-60.
  12. Marlene London, “Business, Ethics & Technology”, Key Words, April-June 2004: 61, 69.
  13. “American Society of Indexers 2004 Professional Activities and Salary Survey,” Key Words, January-March 2005: 18-21.