The Thesaurus as a Domain Model in the Modern World

July 20, 2015  
Posted in Access Insights, Featured, Taxonomy

Source: Dreamstime

Nowadays, taxonomies and thesauri are used largely for web navigation and for information search and retrieval. It wasn’t always that way. In fact, it’s largely the digital information revolution that has made their use for information search and retrieval a vital necessity in research, business, and numerous other types of activities.

Taxonomies and thesauri are sometimes referred to as domain models. However, the term is often limited to graphic constructs specifically designed as visual tools for problem solving. As explained in the relevant Wikipedia article:

A domain model in problem solving and software engineering is a conceptual model of all the topics related to a specific problem. It describes the various entities, their attributes, roles, and relationships, plus the constraints that govern the problem domain. … The domain model is created in order to represent the vocabulary and key concepts of the problem domain. The domain model also identifies the relationships among all the entities within the scope of the problem domain, and commonly identifies their attributes.

Before the proliferation of thesauri and taxonomies for search, most taxonomies served, in effect, as domain models. As such, they reflected and furthered our understanding of the relationships of things and creatures and areas of knowledge. Even today, by their nature, most or perhaps all taxonomies and hierarchical thesauri are domain models. We just don’t use them that way, for the most part. The main exception is in the world of classification of biological organisms.

The history of taxonomies is replete with the names of naturalists and other scientists who strove to categorize the natural world through the use of hierarchical schemes. Even today, mention “taxonomy” to a biologist, and he or she is likely to think of one or more taxonomies that serve to categorize the members of some family or genus (or whatever) of plants and/or animals and/or other types of organisms. In such taxonomies, the focus of the categorization isn’t on reports or articles or books or videos about the organisms (although those taxonomies could certainly be used for that kind of categorization, and often are). Rather, the focus is on how the organisms themselves, as represented by one form or another of their names, are categorized within the taxonomy.

Much of this work was done in the 1700s and subsequent centuries. The earliest of these taxonomies were based on the work of the philosophers of ancient Greece and Rome. The aim of all these naturalists and philosophers was to better understand the world around them by modeling the domains of nature, using semantic methods of representing the individual concepts. We still use this semantic approach! It has stood the test of time, as has the hierarchical design.

These taxonomies helped people to understand their world. Might we not also use taxonomies, or better yet, their more complex version, hierarchical thesauri, as graphical tools to understand our world and perhaps to gain insight into and solve our problems?

Barbara Gilles, Taxonomist
Access Innovations, Inc.

IntegraCoder® Version 2.0, Powered by Data Harmony®, Now Available Through Major EMR companies

July 13, 2015  
Posted in Access Insights, Featured, indexing

IntegraCoder Version 2.0, the medical coding application developed by Access Integrity, a division of Access Innovations, Inc., and coding experts Find-A-Code, is now available as a component in the offerings of some of the nation’s largest electronic medical record (EMR) companies. Medical providers will now have access to the latest version of this application based on Data Harmony, Access Innovations’ patented, award-winning software.

IntegraCoder is a web-based solution combining the technologies of Access Integrity and Find-A-Code. Access Integrity’s engine, which is powered by Data Harmony’s semantic analysis technology, analyzes content in EMRs and provides highly relevant diagnosis and procedure suggestions, while Find-A-Code provides exhaustive coding and revenue cycle/denial management resources. IntegraCoder’s indexing system recognizes key concepts within an EMR and delivers suggested codes for users to select from, which increases accuracy in clinical documentation.

Version 2.0 presents far-reaching improvements to the application:

  • An integrated edit capability, allowing providers to get more precise recommendations.
  • An ICD-10 readiness tool encompassing documentation and coding nuance.
  • Pre-claim code scrub for medical necessity.
  • Charge functionality seamlessly generates a claim submission with IntegraCoder.

“It’s exciting to have IntegraCoder version 2.0 in the marketplace,” says John Kuranz, CEO of Access Integrity. “The hard work that has been put in by all members of the team really shows in the product and how smoothly it integrates into the EMR system. IntegraCoder’s encounter note editing and research capabilities and ICD-10 tool set will greatly enhance the end-to-end patient engagement process.”

“The new version of IntegraCoder will give coders and billers the speed, efficiency, and instant access to critical coding data needed to keep up with the changes in medical coding,” explains Find-A-Code CEO LaMont Leavitt, “and adding the capability to seamlessly generate a claim from IntegraCoder’s results is a game changer, especially with the transition from the ICD-9 to ICD-10 code sets later this year.” Leavitt is confident that the integration of the two technologies will fulfill a long-time need in the practice management sector.

Jay Ven Eman, CEO of Access Innovations, the parent company of Access Integrity, remarks, “The latest version of IntegraCoder presents a great opportunity for disruption in the medical practice management sphere. We are very excited to be in this marketplace and expect great things to come from IntegraCoder.”


About Access Innovations, Inc.

Access Innovations has extensive experience with Internet technology applications, master data management, database creation, thesaurus/taxonomy creation, and semantic integration. Access Innovations’ Data Harmony software includes machine aided indexing, thesaurus management, an XML Intranet System (XIS), and metadata extraction for content creation developed to meet production environment needs. Data Harmony is used by publishers, governments, and corporate clients throughout the world.

About Access Integrity
Access Integrity provides a patented technology for complete and compliant EMR analysis. Access Integrity plays an important role in medical transaction processing by extracting rule based relevant data (concept extractor) from medical records, increasing coding accuracy, clinical decision support, and overall understanding of a patient encounter. Access Integrity is the first company to employ Data Harmony’s semantic enrichment and rule-based concept extraction technology in the healthcare industry. The award-winning and world-renowned Data Harmony software suite has been used in the content management and information technology industries for more than 15 years.

About Find-A-Code, LLC

Find-A-Code, LLC is dedicated to providing the most complete medical coding and billing resource library available anywhere. Find-A-Code’s online libraries include extensive information for all major code sets (ICD-9, CPT®, HCPCS, DRG, APC, NDC, ICD-10 and more) along with a wealth of supplemental information such as newsletters and manuals (AHA Coding Clinic®, AMA CPT Assistant, DH Newsletters, Medicare Manuals). All code information and newsletter databases are indexed, searchable and organized for quick access and extensively cross-referenced. Find-A-Code also provides tools for code set translation (such as ICD-9 to ICD-10), code validation (edits) and claim scrubbing.

Smart Thesauri: Using Taxonomies with Linked Data

July 6, 2015  
Posted in Access Insights, Featured, Taxonomy

Note: A full version of this article is scheduled to be published in an upcoming issue of the IKO eNewsletter.

As interest in Linked Data (LD) continues to grow, many organizations—publishers, corporations, universities, libraries—are increasingly interested in strategies to jump-start LD initiatives. Any organization that has an existing taxonomy (or other controlled vocabulary) can expedite the move to LD by leveraging its existing semantic structures as a bridge to an advanced LD-based semantic strategy.

Why Linked Data?

There are three primary reasons motivating organizations to move towards LD:

  1. To use resources on the Web to enhance internal knowledge environments.

Once a link to an external data source (e.g., DBpedia or Wikidata) is established, references to other content (Wikipedia articles, definitions, images, social media and news feeds, and other information) can be queried and added to internal resources to enrich content or Web portals.

  2. To add backlinks to internal resources and content to LD portals, thereby pointing to an organization as an authority on a topic or topics.

Adding reciprocal links from LD sources to content (or publicly available web resources) enables other LD users to find and reference an organization’s expertise on a given subject.

  3. To add an organization’s expertise to the growing Semantic Web of knowledge and information, i.e., to contribute to the sum of available human knowledge.

More idealistically, many LD enthusiasts are motivated by the goal of adding their curated and well-formed specialized knowledge to the expanding network of LD sources—in other words, attaching their datasets and knowledge organization systems (KOS) to the LD community.

Figure 1: The Ubiquitous LD Graph Diagram. Linking Open Data cloud diagram 2014, by Max Schmachtenberg, Christian Bizer, Anja Jentzsch and Richard Cyganiak. Reproduced here under the terms of the Creative Commons Attribution-ShareAlike 3.0 Unported license (CC BY-SA 3.0).

Note that in this diagram (as in most such graphics) DBpedia still occupies the central position, as it is the most robust and oft-linked universal (i.e., not subject-specific) LD source. Although it has its shortcomings, DBpedia is still the best choice for stable LD URIs, and this is not likely to change anytime soon; accordingly, we will use DBpedia as the sample LD source in this article.

From Taxonomy to Linked Data

Although DBpedia Spotlight is good at recognizing existing LD concepts in a block of text, in most cases it doesn’t have the robust synonymy and/or rule-based concept extraction used in most semantic platforms. Additionally, an organization with a well-formed taxonomy and rich semantic strategy will already have indexed content—possibly a very large volume of content—so “re-indexing” a large legacy dataset using DBpedia Spotlight is an unwieldy proposition.

Instead, if you assert a link between each term in an existing taxonomy and the corresponding concept in an LD source, there is no need to run legacy content through Spotlight (or a similar LD-matching service), since the link is asserted at the thesaurus level instead of the document level. This is much more efficient, and it provides a simpler mechanism to curate and establish LD links in the future.

Sample Process

Imagine that you have a large set of content about physics—you could be a publisher, laboratory, or research organization—and that you have an existing, well-formed taxonomy of physics concepts and wish to pursue LD. Basically, you want to assert that each term (more on this later) in your taxonomy corresponds to a DBpedia URI on the same topic.

For example, you might have a term (or branch) in your vocabulary called “Optics”. The first step is to add a field to the term record in your taxonomy management system (most commercial taxonomy applications have a mechanism for this) to hold a URI; this could be a dedicated ’live’ URI field or just a simple text field. Next, in this field for your term “Optics”, you add the corresponding DBpedia URI:

…and you’re done.[1]

You have now achieved something like this:


Figure 2: Connecting Resources on the Same Topic Using LD

Using a SPARQL endpoint, you can now query information from the DBpedia page or use it to reference or link to other pages from DBpedia. For example, if you have a topical web portal on Optics, you could automatically add definitions, images, news, social media feeds, or links to other publishers with Optics content.
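As a rough illustration of the step above, the following Python sketch builds a SPARQL query for the abstract of the concept linked from the “Optics” term and encodes it as a request URL. The endpoint address, the resource URI pattern, the dbo:abstract property, and the format parameter are assumptions about DBpedia’s public service rather than details given in this article, and the function names are hypothetical.

```python
from urllib.parse import urlencode

# Assumptions: DBpedia's public endpoint and its usual resource-URI pattern.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"
OPTICS_URI = "http://dbpedia.org/resource/Optics"


def build_abstract_query(resource_uri: str, lang: str = "en") -> str:
    """Return a SPARQL query asking for the abstract of one LD resource."""
    return (
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "SELECT ?abstract WHERE {\n"
        f"  <{resource_uri}> dbo:abstract ?abstract .\n"
        f'  FILTER (lang(?abstract) = "{lang}")\n'
        "}"
    )


def build_request_url(endpoint: str, query: str) -> str:
    """Encode the query as a GET URL that asks for JSON results."""
    # The "format" parameter is an assumption about the endpoint software.
    params = {"query": query, "format": "application/sparql-results+json"}
    return f"{endpoint}?{urlencode(params)}"


query = build_abstract_query(OPTICS_URI)
url = build_request_url(DBPEDIA_ENDPOINT, query)
print(query)
```

The resulting URL could be fetched with any HTTP client; the sketch stops short of the network call so that it runs offline.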

Automation, Problems, and Quality Control

Can this process—matching terms from your taxonomy to DBpedia—be automated? Yes, but it requires careful quality control. We have found the most success using Spotlight as a starting point and validating the results by hand; this is faster than matching each term manually, and at the same time ensures accuracy.

As mentioned earlier, Spotlight is better at matching blocks of text (leveraging semantic proximity) than single concepts from a taxonomy, but it’s accurate enough to decrease some of the effort of manual matching.

The primary sticking point, however, is that any specialized thesaurus will be more granular than DBpedia is. Your taxonomy on Physics, in the thought experiment above, is going to have far more specific terms than DBpedia in many, many places. The top two or three levels of your taxonomy will probably have corresponding LD pages, but the more specific topics will not.

For example, there’s a robust DBpedia page on Optics, as well as one on Nonlinear optics; more specific topics within Nonlinear optics, however, are far less likely to have a corresponding page (e.g., “Photonic metamaterials” has no corresponding LD page we’ve found so far—and many scientific and technical vocabularies get even more granular than this).

Possible solutions to this problem are as follows:

  • Forget it for the time being and check back later to see whether a corresponding page emerges.

This is the easiest option, but does not accomplish much.

  • “Roll up” more granular topics to the next-nearest Broader Term in the taxonomy.

This procedure at least provides LD pages for every topic in the taxonomy, but leaves much to be desired; every term in a large branch might point to the same LD page, which is not particularly useful.

  • Proactively add new pages on not-yet-existing topics (in practice, Wikipedia articles, since DBpedia is extracted from Wikipedia), and add the backlink to your content/vocabulary as the first link; the Web should come and fill in the blanks eventually.

Attractive, altruistic, and useful, though far more time consuming, this option is ideal from an information science perspective but may not be practical in the scope of your LD initiative.
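The “roll up” option can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the taxonomy fragment, the URIs (which follow the usual DBpedia pattern), and the function name resolve_ld_uri are hypothetical, and a real thesaurus may give a term more than one Broader Term.

```python
from typing import Dict, Optional

# Hypothetical taxonomy fragment: each term maps to its Broader Term
# (None marks a top term).
BROADER: Dict[str, Optional[str]] = {
    "Physics": None,
    "Optics": "Physics",
    "Nonlinear optics": "Optics",
    "Photonic metamaterials": "Nonlinear optics",
}

# Terms that already carry an asserted LD link (URIs are assumed).
LD_URI: Dict[str, str] = {
    "Optics": "http://dbpedia.org/resource/Optics",
    "Nonlinear optics": "http://dbpedia.org/resource/Nonlinear_optics",
}


def resolve_ld_uri(term: str) -> Optional[str]:
    """Return the term's own LD URI, or that of its nearest Broader Term."""
    current: Optional[str] = term
    seen = set()  # guards against accidental cycles in the hierarchy
    while current is not None and current not in seen:
        if current in LD_URI:
            return LD_URI[current]
        seen.add(current)
        current = BROADER.get(current)
    return None  # nothing linked anywhere up the branch


print(resolve_ld_uri("Photonic metamaterials"))
```

In this fragment “Photonic metamaterials” resolves to the URI asserted for “Nonlinear optics”, which shows both the benefit (every term resolves to something) and the drawback (a whole branch can collapse onto one page).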

Margie Hlava, President and Bob Kasenchak, Product Development
Access Innovations



For more information, we recommend David Wood, Marsha Zaidman, Luke Ruth, and Michael Hausenblas, Linked Data: Structured Data on the Web (Shelter Island, NY: Manning Publications, 2013).

[1] A subsequent step might involve adding a backlink to your topical/library page on Optics to the DBpedia dbpedia-owl:wikiPageExternalLink field, if you want your content to be publicly available.

ICD-10 and Taxonomies

June 29, 2015  
Posted in Access Insights, Featured, Taxonomy

In Canada, the implementation of ICD-10 caused a drop of roughly 50% in coding productivity among health professionals. Reports show that a year after implementation, productivity had returned to only 80% of the original ICD-9 baseline. Now, more than a decade after Canada adopted the new coding system, coders there have yet to reach their original productivity levels.

This reduction in productivity is understandable. ICD-10 is vastly more complex than ICD-9, and people sometimes have difficulty adapting to change. It’s naturally going to cause fear in the industry. The 3-page bill proposed to the U.S. Congress by Rep. Ted Poe (R-TX), HR 2126, would allay those fears for a while by delaying U.S. implementation of ICD-10 once more; implementation has already been postponed several times. As noted in the referenced article, however, neither committee has decided to hear the bill, so it seems highly likely that ICD-10 implementation will occur in the United States on October 1, as (currently) scheduled.

But tell me: do we really need to know that someone was “bitten by a turtle” (W59.21XA) rather than “struck by a turtle” (W59.22XA)? And if the person was bitten on the toe by that turtle, do we really need to know whether it was the right big toe (S90.471A) or the left big toe (S90.472A)? What if the incident happened, for some reason, while the person was waterskiing (Y93.17)? Do we need to know that? It is definitely helpful to know something about laterality (which side the injury was on) and about the person’s activity when the injury occurred. It is also nice to know how many people a turtle bit this year, as well as how many waterskiing accidents there were.

In another (less funny but more likely) example, someone driving a pickup strikes a lamppost while texting. The person’s head strikes the dashboard, causing a contusion (S00.83XA), while the truck’s airbag expands and strikes them on the right side of the chest, causing a contusion there (S20.211A). That it happened in a pickup matters (V57.5XXA), as does the fact that they were texting at the time (Y93.C2). This is all highly useful information, but does it have to be captured in a highly complex, pre-coordinated classification system? The format of the ICD-10 codes, and for that matter the ICD-9 codes, is quite old-fashioned in a post-coordinate world.

A taxonomy (and it would be a large one) would provide the desired data better, and in a much more flexible form, than ICD-10 does. Just a single laterality rule for left and right would remove a goodly portion of the listed codes. If it is in the interest of the Centers for Medicare and Medicaid Services (CMS) to gather more data about health and health care in the USA, then why not apply some proven, inexpensive, easy-to-implement algorithms rather than cause a Y2K-style panic throughout the entire healthcare industry?

In example one above, we could code for “turtle,” “toe,” “bite,” and even water or waterskiing. If any of those items was missing in the source data, i.e., the electronic medical record (EMR), we could still do a good job of determining the cause of the bite and collating the data for later reporting and retrieval. It would be possible to mine the data more effectively than using the classification system, since the data would be disaggregated and available. For example, all kinds of water-related accidents could be retrieved, as could all turtle-related injuries, bite-related injuries, etc.
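To make the post-coordinate idea concrete, here is a minimal Python sketch in which each encounter is indexed with independent concept terms that are combined only at retrieval time. The records, the term labels, and the function names are invented for illustration and do not describe any actual Access Integrity system.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical encounters, each indexed with independent concept terms.
records: Dict[str, Set[str]] = {
    "enc-001": {"turtle", "bite", "toe", "right", "waterskiing"},
    "enc-002": {"turtle", "bite", "toe", "left"},
    "enc-003": {"contusion", "head", "pickup truck", "texting"},
    "enc-004": {"fall", "waterskiing"},
}


def build_index(recs: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Invert the records into a term -> record-ids lookup."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for rec_id, terms in recs.items():
        for term in terms:
            index[term].add(rec_id)
    return index


def search(index: Dict[str, Set[str]], *terms: str) -> List[str]:
    """Return the ids of records that carry every requested term."""
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(hits)


index = build_index(records)
print(search(index, "turtle"))                   # all turtle-related injuries
print(search(index, "waterskiing"))              # all waterskiing accidents
print(search(index, "turtle", "bite", "right"))  # terms combined at search time
```

Note that laterality is just another term here (“left” or “right”), so it does not multiply the number of codes, and a record with a missing term is still retrievable by the terms it does have.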

Going one step further, one could even link the data as RDF triples for a full semantic enrichment and ontology approach, which would surface all kinds of fascinating relationships. One could then visualize the data in various presentations for a quick understanding of how much danger turtle bites and waterskiing pose to the general populace.
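A toy version of the triple-based approach might look like the sketch below, which stores subject-predicate-object statements as plain Python tuples rather than using an RDF library. Every identifier (the ex: prefix, the predicate names, the encounter ids) is hypothetical and chosen only to mirror the turtle example.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical statements about two encounters, plus one ontology-style
# link saying that waterskiing is a kind of water sport.
TRIPLES: List[Triple] = [
    ("ex:enc-001", "ex:injuryType", "ex:Bite"),
    ("ex:enc-001", "ex:causedBy", "ex:Turtle"),
    ("ex:enc-001", "ex:bodyPart", "ex:Toe"),
    ("ex:enc-001", "ex:duringActivity", "ex:Waterskiing"),
    ("ex:enc-002", "ex:injuryType", "ex:Bite"),
    ("ex:enc-002", "ex:causedBy", "ex:Turtle"),
    ("ex:Waterskiing", "ex:broaderActivity", "ex:WaterSport"),
]


def match(triples: List[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Return triples matching the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def encounters_during(triples: List[Triple], activity: str) -> List[str]:
    """Encounters whose activity is the given one or one level narrower."""
    narrower = {s for s, _, _ in match(triples, p="ex:broaderActivity", o=activity)}
    wanted = {activity} | narrower
    return sorted({s for s, _, o in match(triples, p="ex:duringActivity")
                   if o in wanted})


print(encounters_during(TRIPLES, "ex:WaterSport"))
```

The point of the extra ex:broaderActivity statement is that a query for water sports finds the waterskiing encounter even though no record mentions water sports directly; that is the kind of relationship a linked, ontology-backed dataset can surface.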

Access Integrity has built a system to review EMRs and instantly provide a suggested list of ICD-10 codes, just as it has done for ICD-9. For this, ICD-10 is certainly a good move forward. We still maintain that a human should review and make the final selection of the codes submitted for billing, due to the complexity of the classification system, the ambiguities in the EMR as written by the healthcare provider, and of course the liabilities in a litigious world. For those of us in the information business, more data is always better than less, so even if ICD-10 is imperfect, and it is, there is no doubt that it’s a step in the right direction.

Marjorie M.K. Hlava
President, Access Innovations

TaxoDiary Blog Achieves Milestones in Sharing Information

June 22, 2015  
Posted in Access Insights, Featured, Taxonomy

The TaxoDiary blog, which adds new posts every Monday through Friday, recently published its 3,000th blog post.

TaxoDiary was started in August 2010 and is sponsored and maintained by Access Innovations, Inc., under the leadership of Access Innovations President Marjorie Hlava and Chief Executive Officer Jay Ven Eman.

TaxoDiary covers all types of knowledge organization systems (KOS) and related subjects, with daily posts sharing news and opinions. “Monday features” dig a little deeper into taxonomies, semantic technology, and other key areas of interest. There are over 200 of these feature articles that have been researched, written, and shared by the team at Access Innovations.

“The blog is designed as an information vehicle that highlights news of interest and topics that content professionals face daily,” Dr. Ven Eman commented. “Networking and providing access to methods of making content findable are helpful to us all.”

TaxoDiary is designed to provide taxonomists, indexers, and content professionals with news and opinions about categorization and about the application of KOS and controlled vocabularies to increase the findability of information objects within or across large collections of information, whether structured in databases or unstructured in content repositories.

Subscribing to TaxoDiary will deliver the posts directly to email, either as they are posted or as a daily summary.


About Access Innovations, Inc.

Access Innovations has extensive experience with Internet technology applications, master data management, database creation, thesaurus/taxonomy creation, and semantic integration. Access Innovations’ Data Harmony software includes machine aided indexing, thesaurus management, an XML Intranet System (XIS), and metadata extraction for content creation developed to meet production environment needs. Data Harmony is used by publishers, governments, and corporate clients throughout the world.

Father of Library Science

June 15, 2015  
Posted in Access Insights, Featured, Taxonomy

Father’s Day is coming up soon, so we thought we’d pay homage to Shiyali Ramamrita (S. R.) Ranganathan. As described in the Wikipedia article about him, Ranganathan “is considered to be the father of library science, documentation, and information science in India and is widely known throughout the rest of the world for his fundamental thinking in the field.” He is also regarded by many information science professionals throughout the world as the father of library science. That’s a lot of fathering.

Ranganathan is perhaps best known for devising a system of faceted classification that had enormous influence on classification and indexing science. We’ve already written (in “Ranganathan, Classification, and British Toys”) about how his career path, along with a peek into the window of a toy store, provided the background and inspiration for his Colon Classification system.

In 1931, Ranganathan’s book The Five Laws of Library Science was published. While some of the more specific recommendations in the text have been rendered obsolete by technological advances, there are many passages that are still relevant. Let’s have a sampling.

From pages 1, 6-7:

The first law of Library Science is: BOOKS ARE FOR USE. No one will question the correctness of this law. But, in actual practice, the story is different. The law is seldom borne in mind by library authorities. We may examine the history of any aspect of library practice and we shall find ample evidence of a deplorable neglect of this law.

[This is followed by several illustrative anecdotes focusing on library authorities who fortunately have remained nameless.]

On the other hand a modern librarian, who has faith in the law that ‘BOOKS ARE FOR USE,’ is happy only when his readers make his shelves constantly empty. It is not the books that go out that worry him. It is the stay-at-home volumes that perplex and distress him. He too will constantly cross the yard to meet his Agassizes. But he will go to them, not to snatch away the books they are using, but to distribute the new arrivals that need to be introduced to them as rapidly as possible.

From page 49 (where it’s evident that “indexing” wasn’t an entirely accepted word for the activity quite yet, at least not in a library context):

Not infrequently one comes across a bumptious upstart, who has the cheek to say, “What is there in indexing?” meaning by ‘indexing’, Cataloguing. One only wishes that he was allowed to try his hand at ‘indexing’ for a couple of months to discover for himself what a mess he is capable of making.

From page 50, which brings to mind the love-hate relationship between taxonomists and subject matter experts:

Another, a specialist quite jealous of the rights of his line of experts, may make a flippant remark, “That is not the way to classify. This is the way to catalogue. Reference-work is not in your province. It is the preserve of the Professors” and so on. One has to tell him “Mr. Specialist, I am a specialist in my line as much as you are, Sir, in yours. If your field is clouded in mystery and needs prolonged formal initiation, so is mine. Remember what you will think of any uninitiated Tom, Dick or Harry who attempts to poke his nose into your sphere.”

From pages 293-294, on the Second Law, Every reader his/her book:

It is a peculiar sort of knowledge that is needed to find for EVERY PERSON HIS BOOK. People at all levels will seek the help of the Library Staff to find their books. It may be a freshman that wants help to prepare for the scholarship examination; it may be a senior student who wants to lead a debate on feminism; it may be a professor who wants to settle a point in the phonology of the Dravidian vowel system; it may be a physicist who wants the book that will give him just enough and no more of Matrices to understand Heisenberg’s treatment of Wave Mechanics. …

No person can depend on his memory to say what his library resources are on such a bewildering range of subjects. The Library Staff have necessity to depend on certain recognised mechanical aids, to discharge their obligations in helping EVERY PERSON TO FIND HIS BOOK.

From pages 382-383, and largely true of taxonomies and research databases, as well as physical libraries:

The Fifth Law is: A LIBRARY IS A GROWING ORGANISM. It is an accepted biological fact that a growing organism alone will survive. An organism which ceases to grow will petrify and perish. The Fifth Law invites our attention to the fact that the library, as an institution, has all the attributes of a growing organism. A growing organism takes in new matter, casts off old matter, changes in size and takes new shapes and forms. Apart from sudden and apparently discontinuous changes involved in metamorphosis, it is also subject to a slow continuous change which leads to what is known as ‘variation’, in biological parlance, and to the evolution of new forms. … The one thing that has been persisting through all those changes of form has been the vital principle of life. So it is with the library.

From pages 397-398, where the Fifth Law leads us to a discussion of classification approaches:

Another important matter that needs to be examined in the light of the Fifth Law is the classification of books. In the first place, as A LIBRARY IS A GROWING ORGANISM and as knowledge itself is growing, it is necessary that the “classification must be comprehensive, embracing all past and present knowledge and allowing places for any possible additions to knowledge”. Indeed this has been set down by Mr. Sayers [William Charles (W.C.) Berwick Sayers, Ranganathan’s mentor in library science at the University of London] as the first canon of classification. To quote Sayers again, “A classification must be elastic, expansible, and hospitable in the highest degree. That is to say, it must be so constructed that any new subject may be inserted into it without dislocating its sequence”. Cases like that of Wave Mechanics, Matrices, Raman Effect, Internal Combustion Engine, Radium, Behaviourism, Dalton Plan and the entire subject of Sociology have had to be accommodated within living memory. It can not be said that all the printed schemes in force have come quite unscathed out of this trial.

And we’ll conclude with an excerpt from page 414, where Ranganathan looks towards the future (as do all good fathers):

What further stages of evolution are in store for this GROWING ORGANISM — the library — we can only wait and see. Who knows that a day may not come — at least [H. G.] Wells has pictured a world in which dissemination of knowledge will be effected by direct thought transfer, in the Dakshinamurti fashion, without the invocation of the spoken or the printed word — that a day may not come when the dissemination of knowledge, which is the vital function of libraries, will be realised by libraries even by means other than those of the printed book?

Barbara Gilles, Taxonomist
Access Innovations, Inc.

Photo: S. R. Ranganathan’s photo at City Central Library, Hyderabad, India. Photo by Krzna, CC BY-SA 3.0.

Down the Rabbit Hole

June 8, 2015  
Posted in Access Insights, Featured, Taxonomy

As many readers know, the main U.S. standard for taxonomies, thesauri, and other controlled vocabularies is ANSI/NISO Z39.19, Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies. The material on vocabulary displays covers types of vocabulary displays that might be unfamiliar. I’m guessing that some of these displays emerged as the result of technological limitations, and have persisted simply because of inertia. This seems to be the case with the flat format display, about which I’ve already written (see “Blind Alleys, Dead Ends, and Mazes”).

This also might be the case with the display practice described in Z39.19’s Section 8.4.3, Node Labels for Related Terms: “In order to bring closely related concepts together in the alphabetical array under a given term, related terms may be divided [grouped?] into categories that do not form part of a logical hierarchy. These related terms should then be identified by a node label.” What do we get here? A shallow, illogical hierarchy? Related terms without a useful hierarchy?

And then there’s Section, Faceted Display: “Some controlled vocabularies provide a display of the terms organized according to the broad categories or facets to which the term belongs. Facets may have a hierarchical arrangement as well so that narrower facets are arranged within broader categories.” And then again, there might not be a hierarchical arrangement. What then?

At best, we get a relatively flat list with very general groupings, as with the related terms display. At worst, perhaps, the faceted display might be associated with a thesaurus hierarchy. We get a display of categories that look like the top terms in a thesaurus hierarchy, but that actually obscure the hierarchy. Let’s dive into such a thesaurus, shall we?

Again, I’m going to pick on the U.S. Department of Education’s ERIC Thesaurus, which I’ve picked on before for the flat format displays used for its individual terms. To its credit, it dates back to 1964, but it’s showing its age. The main point of entry for browsing is the webpage that lists the main “categories”. It would be easy for you to assume that those categories are the top terms in the hierarchical structure of the thesaurus. (After all, it is a “thesaurus”, right?) But no. This vocabulary is something of a hybrid creature. As mentioned in Z39.19, Appendix C (at C.13, Faceted Displays), a faceted display “Provides a view of the vocabulary that is complementary to any strict hierarchical arrangement”. The categories look like they’re there to simplify navigation, but the hierarchy is nowhere in sight.

Picking at random: if you click on the category “Arts”, you’ll see a flat list of several terms. One of them is “Art History”. Click on “Art History”, and you’ll see that it doesn’t have a narrower term, but it does have a broader term, “Intellectual History”. Okay, let’s travel back up to the top (not that we’ve traveled very far down). If you click on “Intellectual History”, you’ll discover that it’s in the category of “Humanities”, with the category “Arts” nowhere to be seen. “Intellectual History” also has a broader term, “History”, which in turn has two broader terms, neither of which takes us back up to where we jumped in.

Try jumping in somewhere else. You’re likely to find the same kind of inconsistency on your way back up.
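To make that mismatch concrete, here is a minimal sketch that climbs the broader-term links from “Art History” and checks whether the climb ever returns to the category we started in. The `TERMS` dictionary and both function names are hypothetical; the data is hand-entered from the walk-through above, not pulled from ERIC, and the two broader terms of “History” are omitted because they aren’t named here.

```python
# Hypothetical, hand-entered snapshot of the three ERIC terms discussed above.
# The broader terms (and category) of "History" are left empty because the
# walk-through does not name them.
TERMS = {
    "Art History": {"category": "Arts", "broader": ["Intellectual History"]},
    "Intellectual History": {"category": "Humanities", "broader": ["History"]},
    "History": {"category": None, "broader": []},
}


def ancestors(term, terms):
    """Return every term reachable by following broader-term links upward."""
    seen = []
    stack = list(terms[term]["broader"])
    while stack:
        current = stack.pop()
        if current in seen or current not in terms:
            continue
        seen.append(current)
        stack.extend(terms[current]["broader"])
    return seen


def climbs_back_to_category(term, terms):
    """True if at least one ancestor sits in the same category as the term."""
    start = terms[term]["category"]
    return any(terms[a]["category"] == start for a in ancestors(term, terms))


print(ancestors("Art History", TERMS))
print(climbs_back_to_category("Art History", TERMS))
```

With this data, the climb from “Art History” passes through “Intellectual History” and “History” without ever touching the “Arts” category, which is the inconsistency described above.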

In Appendix C (at C.13, Faceted Displays), the Z39.19 standard warns that with a faceted display, it “Can be difficult for users to locate a specific term”. Yes, indeed!

Barbara Gilles, Taxonomist
Access Innovations, Inc.

Illustration by Jessie Willcox Smith

Marjorie M.K. Hlava to Receive the John Cotton Dana Award from the Special Libraries Association

June 1, 2015  
Posted in Access Insights, Featured

Access Innovations, Inc. is pleased to announce that its president and founder, Marjorie M.K. Hlava, has been named the recipient of the highest honor given by the Special Libraries Association (SLA), the John Cotton Dana Award.


Named for the founder and original president of SLA, this prestigious honor recognizes a lifetime of achievement in the field of library science and exceptional service to the association and to the field at large. Hlava, a Fellow of the association, served on the SLA Board of Directors from 1990 through 1992, received the President’s Award for her outstanding service to the organization in 2000, and has received multiple honors over the years for her work in the industry.

“Marjorie Hlava personifies the spirit of innovation and the commitment to professionalism that were hallmarks of John Cotton Dana’s career, and it is fitting that she should receive SLA’s highest honor,” said SLA 2015 President Jill Strand. “She has been a mentor, a leader, and a friend to many information professionals over the course of her career, and I look forward to seeing her receive the Dana Award in Boston.”

A member of SLA since 1976, she has served on many committees on the board, chapter, and division levels. She was co-creator and served as Chair of the Taxonomy Division and chaired the Nominations and Professional Development Committees therein. Additionally, she served as Chair of the Information Technology Division and, within that, served on the Executive, Nominations, and Networking Committees, as well as serving as the Division Archivist.

On the Association Level, Hlava served as Director at Large of the Board of Directors, chaired the DACOLT, Long Range Plan, Non-Serial Publication Review, Directory, Technical Standards, and Bylaws Committees, as well as serving on the Bylaws and Finance Committees. Additionally, she served as the SLA voting representative to the National Information Standards Organization (NISO), created and implemented the SLA thesaurus and taxonomy, and consulted on the redesign of the SLA website navigation system.

Her Chapter activities have included serving as Rio Grande Chapter President and Vice President and chairing the Special Projects, Employment, Membership, Career Counselor, and Nominations Committees, in addition to numerous other roles at the Chapter Level.

“I am surprised, delighted, and humbled by this honor,” commented Ms. Hlava. “I have always enjoyed my years of service to SLA and found the meetings and presentations a springboard for new ideas. The insights gained from networking with other members have fueled my desire to undertake new (and sometimes daring!) developments with my company’s software and services. These conversations have often helped find creative ways to address the applications of information science and its challenges. I look forward to many more years of continued involvement in SLA.”

For over 40 years, Hlava has been a thought leader in library and information science. She is well known internationally for her work in the implementation of information science principles and the technology and standards that support them. She has served as the president of a number of industry associations, including the National Federation of Advanced Information Services (NFAIS) and the Association for Information Science and Technology (ASIS&T). Hlava is the author of The Taxobook (Morgan & Claypool, 2014-2015), a three-volume book series on taxonomies and thesauri. She is now developing ontological structures to serve linked data, which she feels is the future of semantics and information science.

The presentation of the John Cotton Dana Award will take place at the SLA 2015 Annual Conference in Boston, Massachusetts on June 14, 2015. In addition to receiving the award, Hlava will conduct a workshop on advanced taxonomy concepts at the conference.


About Access Innovations, Inc.

Access Innovations has extensive experience with Internet technology applications, master data management, database creation, thesaurus/taxonomy creation, and semantic integration. Access Innovations’ Data Harmony software includes machine aided indexing, thesaurus management, an XML Intranet System (XIS), and metadata extraction for content creation developed to meet production environment needs. Data Harmony is used by publishers, governments, and corporate clients throughout the world.


About SLA

The Special Libraries Association is the global organization for innovative information professionals and their strategic partners that promotes and strengthens its members through learning, advocacy, and networking initiatives. SLA is a nonprofit corporation incorporated in the State of New York and organized for the purpose of “provid[ing] an association of individuals and organizations having a professional interest in the strategic use of information.” SLA is organized into 56 regional chapters and 26 divisions representing subject interests, fields, or types of information-handling techniques.

Cloudy with a chance of crab cakes

May 25, 2015  
Posted in Access Insights, Featured, Taxonomy

(Note: Back in November of 2013, TaxoDiary published the post “A Cloud Drifting Toward a Classification”, about the cloud formation tentatively labeled as Asperatus undulatus and the quest to achieve an official classification for it. The article below, which first appeared in the Apalachicola Times on April 8, 2015, focuses on matters of terminology concerning the same cloud, from the viewpoint of a scholar of ancient Latin language and literature. Republished by kind permission of the Apalachicola Times, which holds all rights to the article.)


The cloud’s name would be undulatus asperatus, Latin for “wave-like (and) rough.”– ALLEN GATHMAN

As locals and visitors alike will attest, all the Apalach/St. George Island folk are strong and good-looking, the children are above average, and there’s rarely a cloud in the sky.

Yet just when it seemed Ecclesiastes was right to observe there’s “nothing new under the sun” (nihil sub sole novum, as in “nihilist,” “solar,” and “novelty”), cloud buffs and cooperating scientists believe they’ve identified a previously unclassified cloud type – the first since 1951 – a variant of the undulatus class they are proposing to call undulatus asperatus, Latin for “wave-like (and) rough.” There’re lots of photos at the Cloud Appreciation Society (CAS) website (search on asperatus then undulatus) and a compilation of videos on YouTube (search “undulatus asperatus compilation”).

Gavin Pretor-Pinney, founder of the CAS, resourcefully tapped into the mother tongue for the name of the cloud formation, which he and Graeme Anderson, a meteorologist assisting his research, hope to have officially recognized by the United Nations World Meteorological Organization (WMO) this year or next. Undulatus, like English “UNDulate,” means to “move like ocean waves,” from the noun unda/“wave.” That same root word gives us “redUNDant,” a synonym of “repetitive,” which, once you know this bit of Latin, evokes an image of waves breaking again and again (re-), REpeatedly, against the shore and sending the sand crabs skittering. The root of asperatus appears in “exASPERate,” literally “to roughen” but commonly (as when your spouse describes your mannerisms!) meaning “to irritate, annoy, anger.” You might also know the noun “ASPERity,” meaning roughness – of touch, of climate, or of a person’s behavior. Vergil in his first century B.C. epic poem the Aeneid used the phrase asperat undas of a wintry storm at sea that “makes the waves rough.”

The Latin word for clouds in general is nubes, and the Romans called the centaurs nubigenae, “cloud-born” (from gen-, “to beget,” as in “GENerate” and “proGENitor”), since the murderous Ixion had seduced a cloud-image of the goddess Hera, enGENdering a deformed son Centauros, who later mated with a group of mares and sired those mythic man-horse critters! A rain-cloud specifically is a nimbus, which gives us “cumuloNIMBUS” for those menacing, heaped up, towering clouds that can signal a coming rainstorm or hail: Latin cumulus is a “heap” and to “acCUMULate” is to pile things up – like the little mounds of sand those tiny sand crabs often kick up round their holes along the St. George shoreline. “Cumulonimbus” was originally called “Cloud Nine” in the inaugural edition of the WMO’s International Cloud Atlas, published in 1896; that work’s title likely influenced the name of a series of piano compositions by Yoko Ono’s first husband, Toshi Ichiyanagi, whose music in turn inspired the title of David Mitchell’s sci-fi novel “Cloud Atlas” and its 2012 film adaptation.

Here’s wishing you a euphoric, not rainy, Cloud Nine day, and may all your nubes be lined with silver (argentum, chemical symbol Ag). As for me, I’m heading down to Water Street in quest of crab cakes, and as I slather them with remoulade I’ll try not to think of their crustacean brethren frolicking on the cloudless beaches of St. George Island.

By Rick LaFleur

Rick LaFleur is retired from 40 years of teaching Latin language and literature at the University of Georgia, which during his tenure there came to have the largest Latin enrollment of all of the nation’s colleges and universities; he is not quite sure whether he loves Latin or Apalachicola more.

To Cap or Not to Cap?

Every once in a while, the issue of capitalization in taxonomies and thesauri pops up. Some of us in taxonomy land believe that it does make a difference what capitalization (versus lower case) style you use. We just don’t necessarily agree what that style should be.

The National Information Standards Organization (NISO) standard for controlled vocabularies (ANSI/NISO Z39.19, Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies), which was last revised in 2005 and reaffirmed in 2010, has this to say on the subject, on page 34:

It is recommended that predominantly lowercase characters be used for terms in controlled vocabularies… Capitals should be used only for the initial letter(s) of proper names, trade names, and for those components of taxonomic names, such as genus, which are conventionally capitalized. Capitals should be used for all the letters of initialisms or where featured in unusual positions in product or corporate names. Because lowercase letters can occur in unusual positions in proper names, using a combination of capitals and lowercase letters in controlled vocabularies indicates to the user the correct orthography of a term in natural language and serves to distinguish common nouns from similar proper names. 

Example 57: Capitalization of proper and trade names



information systems

Information Systems Corp.
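As a rough illustration of how that recommendation might be applied to a batch of legacy all-caps terms, here is a minimal sketch. The function name `apply_lowercase_style` and its two parameters are hypothetical, not part of the standard or of any thesaurus tool. It assumes an editor supplies the lists of proper names and initialisms, since software can’t reliably tell a common noun from a similar proper name on its own.

```python
def apply_lowercase_style(term, proper_names=(), initialisms=()):
    """Lowercase a term, keeping editor-supplied proper names and initialisms.

    Both lists are assumptions of this sketch: a human decides what counts
    as a proper name or an initialism.
    """
    # Whole-term proper names keep the exact orthography the editor supplied.
    by_folded = {name.casefold(): name for name in proper_names}
    if term.casefold() in by_folded:
        return by_folded[term.casefold()]

    # Otherwise lowercase each word, except initialisms, which go to all caps.
    caps = {item.casefold() for item in initialisms}
    words = []
    for word in term.split():
        words.append(word.upper() if word.casefold() in caps else word.lower())
    return " ".join(words)


print(apply_lowercase_style("INFORMATION SYSTEMS"))
print(apply_lowercase_style("INFORMATION SYSTEMS CORP.",
                            proper_names=["Information Systems Corp."]))
```

The first call yields the common noun phrase in lowercase; the second returns the trade name with the capitalization supplied in `proper_names`, mirroring the pair in Example 57.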


[A note about “should”: Per ANSI/NISO Z39.19, page 2, “The conventions used in this Standard to indicate the force of recommendations are: must (required for meeting the Standard), should (recommended), and may (optional). The Standard also uses the conventions must not (not allowed in order to be in compliance with the Standard) and should not (not recommended).” So the NISO standard recommends the practice above, but does not insist on it.]

[Another note: A reconsideration/revision of Z39.19 is due soon.]

Most of this makes sense to me. It’s certainly much better than the solid caps default of early machine-readable taxonomies and thesauri. From what I understand, they were completely capitalized because of technological limitations and space-saving considerations. Have you ever tried to browse the printed records of those vocabularies? They’re horribly difficult to read.

At the same time, the readability issue is what makes me object to Z39.19’s recommendation. Specifically, I have problems with the terms that begin with lowercase letters. They sort of merge with the line above, rather than clearly being separate terms. They don’t have the visual boundaries that capitalization can provide.

I do appreciate the rationale of Z39.19 that “using a combination of capitals and lowercase letters in controlled vocabularies indicates to the user the correct orthography of a term in natural language.” And I know fellow taxonomists who strongly agree with the lowercase-unless-it’s-a-proper-noun approach. For a general controlled vocabulary that serves as a reference for how terms appear in natural language, that dictionary-ish approach kind of makes sense.

My take on that, though, as far as taxonomies and hierarchical thesauri are concerned, is that taxonomies and their kin are more like outlines than like dictionaries, and outline items are capitalized for clarity, to indicate where new items start. Moreover, most traditional dictionaries have the visual benefit (for our purposes) of tiny text filling up the distance between terms, whereas in hierarchical taxonomy displays (which are generally the most useful views), the terms appear on consecutive lines. And indents can confuse things even more if terms are lowercase; the narrower terms look like runover lines.

Taxonomist Heather Hedden has written a blog post on the subject of capitalization in taxonomies. She views initial capitalization of terms as analogous with capitalization style in headings:

A “taxonomy” implies a hierarchical classification or categorization of concepts. When we think of categories we think of labels or headings with subcategories. Headings in general tend to have initial capitalization or title capitalization. Thus, if it’s a strictly hierarchical taxonomy, where all terms are interconnected into a single hierarchy or a limited number of hierarchies, then it will more likely have initial capitalization or title capitalization. Such capitalization is particularly common on the relatively smaller/less detailed taxonomies that are proliferating on websites, intranets, and content management systems. It fits in with the web design style of capitalization on headings and categories.

As Heather points out, initial capitalization is a fairly common practice, despite Z39.19. I think she’s referring mostly to initial (letter) capitalization of the first word; I haven’t seen that Title Style is common at all. (In fact, I don’t remember seeing it at all in a taxonomy.)

I have seen modern taxonomies and thesauri that have solid (every letter) capitalization just on the top level, to indicate major categories. These are usually just category designations, rather than indexing terms. Heather comments: “A good application of the mixed capitalization style is if the top level terms were not actually to be used in indexing/tagging but are really just categories/groupings of the actual index terms, which in-turn are arranged hierarchically underneath.”

Ultimately, it’s up to the taxonomy owners to determine what style to use. (And might I remind you, a reconsideration of Z39.19 is due soon.) The main factors to consider are readability, clarity, and usability.

By Barbara Gilles, Taxonomist
Access Innovations, Inc.
