The Strategic Business Value of a Well-Constructed Taxonomy

December 8, 2010  
Posted in Access Insights, Business strategy, semantic, Taxonomy

By Bert Carelli

An organization’s website is today the primary strategic tool not only for delivering information and promoting its scholarly mission, but also for developing and maintaining relationships with and among its members and the larger scholarly community. A unified semantic strategy is the best way of bringing together all elements of that mission and leveraging all of the organization’s intellectual assets in support of these goals. These assets fall within three broad groups – Content, People, and Programs:

  1. Content – journal articles, conference proceedings, website information about the society, multimedia materials, newsletters, etc.;
  2. People – society members, authors, editors, peer reviewers, reporters, bloggers, event chairs, etc.; and
  3. Programs – conferences, workshops, special interest groups, and awards.

Each of these assets generates revenue and helps to support the scholarly mission. Traditionally, each has been treated separately, through wholly separate platforms and support functions. However, in our current Web 2.0 environment, this separation is neither valid nor economically supportable. Like many of the former news and media giants, who discovered too late that the growth of their online popularity came at the expense of their established print and broadcast businesses, scholarly organizations are learning that in order to thrive in the new paradigm, it is necessary to find ways to break down these silos of information-rich assets and organize them in a way that supports the whole organization.

A well-constructed and systematically applied taxonomy can unify all of these various assets, and in the process can itself become a valuable repository of institutional knowledge and a source of competitive advantage for the organization. It can be the key to illuminating the connections between the content, people, and programs of the organization, enabling the whole to become greater than the sum of its parts.

A taxonomy that includes all of the major terms and concepts of the disciplines addressed by the organization, that is robust enough to include rules for disambiguating similar or closely related terms, and that has a systematic mechanism for acquiring, evaluating, and adding new terms offers a number of potential applications, for example:

  • Improved search engine optimization (SEO). Thanks to machine-aided indexing, tagging every document with the most meaningful terms is now cost-efficient.
  • Improved website navigation – addressing needs of different kinds of visitors (member and non-member), with suggested terms based on popularity or user profile
  • Better search – auto-completion of search terms, including recognizing synonyms and suggesting controlled terms (e.g., “Did you mean ___?”); suggesting related, broader, and narrower terms; and presenting results in different views.
  • Adding value to legacy archives – increasing the findability and discovery of “long-tail” articles and facilitating the cost-effective reuse and repurposing of materials
  • Organizing unstructured content, which often requires uncovering relationships between materials originating from different media
  • Providing links to larger collections. For example, providing users with equivalent or related terms from the MeSH taxonomy, which enables links between related articles from different societies
  • Creating links to buy products or enroll in high-value programs and other content offered by the organization
  • Providing a structure for discussion forums by suggesting controlled terms for user-contributed content and integrating user-generated tags within the larger taxonomy
  • Enabling visualization and interactive functionality, including enriched “tag clouds”: maps of pre-defined concepts extracted from a defined set of documents and linked via machine-aided indexing to the structured taxonomy, serving as a gateway to further discovery
  • Providing users with easy-to-use current awareness tools that alert them to new articles in their specific areas of interest. Such tools, when supported by a rule-based taxonomy, deliver dramatically more precise results than keyword or subject-heading alerts.
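Several of the search features listed above (synonym recognition, “Did you mean ___?” prompts, and broader, narrower, and related terms) rest on the same underlying structure: each controlled term carries its own set of relationships. The Python sketch below is only an illustration of that idea under simple assumptions. The `Term` class, the `suggest` function, and the two sample entries are hypothetical and do not represent any particular product or published vocabulary.

```python
from dataclasses import dataclass, field
from difflib import get_close_matches


@dataclass
class Term:
    """One controlled term with its thesaurus relationships."""
    label: str
    synonyms: list = field(default_factory=list)
    broader: list = field(default_factory=list)
    narrower: list = field(default_factory=list)
    related: list = field(default_factory=list)


# Hypothetical sample entries, invented for illustration only.
TAXONOMY = {
    "myocardial infarction": Term(
        label="myocardial infarction",
        synonyms=["heart attack", "cardiac infarction"],
        broader=["heart diseases"],
        narrower=["anterior wall myocardial infarction"],
        related=["coronary thrombosis"],
    ),
    "heart diseases": Term(
        label="heart diseases",
        synonyms=["cardiac diseases"],
        narrower=["myocardial infarction"],
    ),
}

# Map every label and synonym to the controlled term it stands for.
ENTRY_INDEX = {}
for term in TAXONOMY.values():
    ENTRY_INDEX[term.label] = term.label
    for synonym in term.synonyms:
        ENTRY_INDEX[synonym] = term.label


def suggest(query):
    """Return the controlled Term for a query, or None if nothing is close."""
    key = query.strip().lower()
    if key not in ENTRY_INDEX:
        # Tolerate small misspellings before giving up.
        close = get_close_matches(key, list(ENTRY_INDEX), n=1, cutoff=0.8)
        if not close:
            return None
        key = close[0]
    return TAXONOMY[ENTRY_INDEX[key]]
```

A search box could call `suggest("heart attack")`, show “Did you mean myocardial infarction?”, and offer the term's `broader`, `narrower`, and `related` lists as ways to widen or narrow the search.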

Until recently, scholarly organizations considered a taxonomy a “nice to have” option. This is no longer the case. A proprietary taxonomy has now become essential to a scholarly publisher’s strategy for several reasons:

  • A taxonomy strengthens the publisher’s own brand. Content has become increasingly “unbundled” from its traditional packaging, the journal subscription. This trend is unlikely to change. However, a taxonomy that is proprietary to the publisher and reflects the standards of the discipline reinforces the importance of the organization and provides competitive advantage.
  • A taxonomy, well integrated with search, promotes the publisher’s relationship with the end user. With the rise of full-text crawling by Google, the economy of the Web is increasingly based around the article, rather than the journal. End users go directly to articles, based on the matching of keywords and the page rank of Google results. The great majority of valuable works thus become, effectively, more “needles in a haystack,” often only randomly discovered, due to the vagaries of the Google algorithm.
  • Searching scholarly content without a taxonomy to help guide the search and filter the results does not work well. Searches – whether through Google or on the site itself – often fail to find the most relevant content, due to ambiguous terms and the limitations of search technology. When users don’t find what they want, they often simply turn to Google. This results in degradation in the way a scholarly society’s corpus of work is encountered by the public and lowers subscribers’ perception of value.

The benefits a taxonomy provides in increased precision and recall are well documented and recognized by most people familiar with scholarly research. However, some publishers have felt that a taxonomy might be too expensive and resource-intensive to develop, too difficult to maintain, or too burdensome for editors, who must apply terms to each published article. For this reason, until recently, taxonomies remained largely within the domain of abstracting and indexing (A&I) databases. In fact, Access Innovations built many of the taxonomies that are still in use by the leading aggregators and A&I databases. The Data Harmony suite of thesaurus management and content management tools is the result of 32 years of continual work on making the various parts of that process more efficient and more accurate. A major breakthrough has been the development of M.A.I.™ (the Data Harmony Machine Aided Indexer), which makes it possible for human categorizers (“indexers”) to increase their indexing efficiency and consistency while adding superior descriptive data. Customers have experienced up to a seven-fold increase in productivity using M.A.I. while measurably improving the consistency and coverage of individual records. M.A.I. improves consistency by providing the same term in the same conditions every time, preventing editorial drift.
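To make the idea of “the same term in the same conditions every time” concrete, here is a minimal sketch of rule-based indexing in Python. It is not the M.A.I. rule syntax or implementation; the `Rule` class, the `index_text` function, and the sample rules for the ambiguous word “mercury” are all hypothetical and assume a very simple condition: a trigger word plus at least one context word in the same text.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Assign `term` when `trigger` appears with any of the `context` words."""
    trigger: str
    context: frozenset
    term: str


# Hypothetical disambiguation rules, invented for illustration only.
RULES = [
    Rule("mercury", frozenset({"planet", "orbit", "solar"}), "Mercury (planet)"),
    Rule("mercury", frozenset({"toxicity", "metal", "exposure"}), "Mercury (element)"),
]


def index_text(text, rules=RULES):
    """Return the sorted controlled terms whose rules fire on the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(
        {rule.term for rule in rules if rule.trigger in words and rule.context & words}
    )
```

Because the rules are deterministic, the same input always yields the same suggested terms, which is the property that guards against editorial drift. In this sketch, text that mentions “mercury” with no supporting context receives no term at all, leaving the decision to a human indexer.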

As with any strategic investment, the best way to evaluate the cost of building a taxonomy is to look at the potential return on investment. This should be evaluated in terms of value added to the aforementioned three organizational assets – content, people, and programs. For many organizations, thanks to Google, PubMed, and other third-party referring sites, the number of visitors to the website who are not recognized as subscribers is much greater – often by one or more orders of magnitude – than the total number of its subscribers, members, and conference attendees. Unfortunately for most publishers, with the exception of a small percentage who buy individual articles, this additional traffic is largely unmonetizable. A taxonomy, coupled with improved search and discovery, addresses this underserved market directly, by providing a new level of engagement with these visitors – helping them discover articles that match their interests and informing them of the full scope of benefits and activities that come with greater involvement in the organization.

For any scholarly organization, there is no better investment than one that simultaneously increases the value of the content, people, and programs that its scholarly mission comprises. A taxonomy serving as the basis for rule-based tagging, enabled and supported by Access Innovations using its Data Harmony suite of software tools, could be the foundation of such an investment, turning the website into a core strategic asset that will yield significant benefits for many years to come.

A Scholarly Web 2.0 Scenario:

Thanks to search engine optimized tagging, a scholar from an adjacent field discovers via Google an article on a publisher’s website.

Viewing the abstract on the website, the scholar is also presented with closely related articles and information drawn from the publisher’s own journals, conference proceedings, discussion groups, and event calendar, along with links to authors and commentators with similar interests.

The scholar is now engaged at a deep level with the society and its programs, and is a potential candidate not only for buying an article, but also for becoming a member, attending a conference, and subscribing to additional content.

All of this is made possible by the robust, multi-faceted taxonomy that has been developed and used to index all the above materials. The website is thus transformed from being simply a vehicle for delivering content to the society’s core audience into a strategic tool for promoting the programs and overall mission of the organization.
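One simple way to picture how shared indexing surfaces related material across content, people, and programs is to rank items by how many taxonomy terms they have in common. The Python sketch below assumes every item has already been tagged from the same taxonomy; the `related_items` function, the item identifiers, and the tags are hypothetical, and the Jaccard overlap used for scoring is just one reasonable choice among many.

```python
def related_items(item_id, tags_by_item, limit=3):
    """Rank other items by Jaccard overlap of taxonomy terms with item_id."""
    source = tags_by_item[item_id]
    scored = []
    for other_id, tags in tags_by_item.items():
        if other_id == item_id:
            continue
        shared = source & tags
        if shared:
            # Jaccard overlap: shared terms divided by all terms on either item.
            scored.append((len(shared) / len(source | tags), other_id))
    # Highest overlap first; ties are broken alphabetically by identifier.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other_id for _, other_id in scored[:limit]]


# Hypothetical items of different kinds, all indexed with the same taxonomy.
TAGS = {
    "article:101": {"protein folding", "molecular dynamics"},
    "proceedings:7": {"protein folding", "molecular dynamics", "drug design"},
    "forum:42": {"protein folding"},
    "conference:2011": {"drug design", "molecular dynamics"},
    "member:jdoe": {"crystallography"},
}
```

Here `related_items("article:101", TAGS)` ranks a proceedings paper, a forum thread, and a conference alongside one another, because all were indexed with the same terms regardless of their type.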