ICD-10 and Taxonomies

June 29, 2015  
Posted in Access Insights, Featured, Taxonomy

In Canada, the implementation of ICD-10 caused an approximately 50% drop in coding productivity in the health profession. Reports show that a year after implementation, productivity had returned to only 80% of the original ICD-9 baseline. Now, more than a decade after Canada adopted the new coding system, coders still have not reached their original productivity levels.

This reduction in productivity is understandable. ICD-10 is vastly more complex than ICD-9, and people sometimes have difficulty adapting to change, so the switch naturally causes fear in the industry. HR 2126, the 3-page bill introduced in the US Congress by Rep. Ted Poe (R-TX), would allay those fears for a while by once again delaying U.S. implementation of ICD-10, which has already been postponed several times. As noted in the referenced article, however, neither committee has decided to hear the bill, so, yes, it seems highly likely that ICD-10 implementation will occur in the United States on October 1, as (currently) scheduled.

But tell me: do we really need to know that someone was “bitten by a turtle” (W59.21XA) rather than “struck by a turtle” (W59.22XA)? And if the person was bitten on the toe by that turtle, do we really need to know whether it was the right big toe (S90.471A) or the left big toe (S90.472A)? What if the incident happened, for some reason, while the person was waterskiing (Y93.17)? Do we need to know that? It is definitely helpful to know something about laterality (which side the injury was on) and about the person’s activity when the injury occurred. It is also nice to know how many people turtles bit this year, as well as how many waterskiing accidents there were.

In another (less funny but more likely) example, someone driving a pickup truck strikes a lamppost while texting. The driver’s head strikes the dashboard, causing a contusion (S00.83XA), while the truck’s airbag deploys and strikes them on the right side of the chest, causing a contusion there (S20.211A). That it happened in a pickup matters (V57.5XXA), as does the fact that the driver was texting at the time (Y93.C2). This is all highly useful information, but does it have to be captured in a precoordinated, highly complex classification system? The format of the ICD-10 codes, and for that matter the ICD-9 codes, is quite old-fashioned in a postcoordinated world.

A taxonomy (and it would be a large one) would provide the desired data better, and in a much more flexible form, than ICD-10 does. A single laterality rule for left and right would by itself eliminate a goodly portion of the listed codes. If it is in the interest of the Centers for Medicare and Medicaid Services (CMS) to gather more data about health and health care in the USA, then why not apply some proven, inexpensive, easy-to-implement algorithms rather than cause a Y2K-style panic throughout the entire healthcare industry?
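To see why a single laterality rule matters, consider the arithmetic. A precoordinated scheme needs a separate code for every combination of facet values, so its size grows as the product of the facet sizes; a postcoordinated taxonomy needs one term per value, so its size grows as the sum. The minimal sketch below uses hypothetical round-number facet sizes, not actual ICD-10-CM counts.

```python
from math import prod

# Hypothetical facet sizes for illustration only; these are NOT real ICD-10-CM counts.
facets = {
    "body_site": 40,    # e.g. great toe, front wall of thorax, ...
    "injury_type": 12,  # e.g. bite, contusion, ...
    "laterality": 3,    # right, left, unspecified
    "encounter": 3,     # initial, subsequent, sequela
}

def precoordinated_count(sizes):
    """Precoordination: one code per combination of values, i.e. the product of facet sizes."""
    return prod(sizes.values())

def postcoordinated_count(sizes):
    """Postcoordination: one term per value, combined at indexing time, i.e. the sum."""
    return sum(sizes.values())

# Same scheme with laterality pulled out into a separate rule.
without_laterality = {k: v for k, v in facets.items() if k != "laterality"}

print(precoordinated_count(facets))              # 4320
print(postcoordinated_count(facets))             # 58
print(precoordinated_count(without_laterality))  # 1440, plus one 3-value laterality rule
```

Pulling laterality out of the precoordinated codes and treating it as a separate facet divides the code count by the number of laterality values, which is the kind of reduction the paragraph above has in mind.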

In the first example above, we could code for “turtle,” “toe,” “bite,” and even “water” or “waterskiing.” Even if some of those items were missing from the source data, i.e., the electronic medical record (EMR), we could still do a good job of determining the cause of the bite and collating the data for later reporting and retrieval. The data could be mined more effectively than with the classification system, since it would be disaggregated and available. For example, all kinds of water-related accidents could be retrieved, as could all turtle-related injuries, bite-related injuries, and so on.
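Here is a minimal sketch of that kind of disaggregated retrieval, assuming each EMR has already been tagged with one taxonomy term per facet. The records, facet names, and broader-term links are all hypothetical. The point is that a missing value (record 2 has no activity) does not stop a record from being found through its other facets, and that narrower terms roll up to broader ones.

```python
# Hypothetical EMR-derived records, each tagged with one taxonomy term per facet.
records = [
    {"id": 1, "agent": "turtle", "injury": "bite", "body_site": "great toe",
     "side": "right", "activity": "waterskiing"},
    {"id": 2, "agent": "turtle", "injury": "bite", "body_site": "finger",
     "side": "left", "activity": None},  # activity not recorded in the EMR
    {"id": 3, "agent": "boat", "injury": "contusion", "body_site": "chest",
     "side": "right", "activity": "swimming"},
]

# Hypothetical broader-term (BT) links, so narrower terms roll up to broader ones.
broader = {
    "waterskiing": "water sports",
    "swimming": "water sports",
    "water sports": "water activities",
}

def matches(value, term):
    """True if value equals term or has term somewhere above it in the BT chain."""
    while value is not None:
        if value == term:
            return True
        value = broader.get(value)
    return False

def retrieve(records, facet, term):
    """Ids of records whose value for the facet matches term or a narrower term."""
    return [r["id"] for r in records if matches(r.get(facet), term)]

print(retrieve(records, "agent", "turtle"))               # [1, 2]
print(retrieve(records, "activity", "water activities"))  # [1, 3]
print(retrieve(records, "side", "right"))                 # [1, 3]
```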

Going one step further, one could even link the data as RDF triples for full semantic enrichment and an ontology-based approach, which would surface all kinds of fascinating relationships. The data could then be visualized in various presentations, giving a quick understanding of how much danger turtle bites and waterskiing pose to the general populace.
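As a rough illustration of the triple idea, the sketch below stores statements as plain (subject, predicate, object) tuples and answers a simple pattern query. The ex: prefix and all property names are hypothetical; a production system would more likely use an RDF library (rdflib, for example, in Python) and SPARQL, with real ontology URIs.

```python
# Statements as (subject, predicate, object) tuples.
# The "ex:" prefix and every property name are hypothetical placeholders.
triples = {
    ("ex:injury1", "ex:causedBy", "ex:Turtle"),
    ("ex:injury1", "ex:injuryType", "ex:Bite"),
    ("ex:injury1", "ex:duringActivity", "ex:Waterskiing"),
    ("ex:injury2", "ex:causedBy", "ex:Turtle"),
    ("ex:injury2", "ex:injuryType", "ex:Bite"),
    ("ex:Waterskiing", "ex:broader", "ex:WaterActivity"),
    ("ex:Swimming", "ex:broader", "ex:WaterActivity"),
}

def match(s=None, p=None, o=None):
    """Return the triples matching a pattern, sorted; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )

# Which injuries occurred during an activity whose broader concept is WaterActivity?
water_activities = {s for s, _, _ in match(p="ex:broader", o="ex:WaterActivity")}
water_injuries = sorted(s for s, _, o in match(p="ex:duringActivity") if o in water_activities)

print(match(o="ex:Turtle"))  # both turtle-caused injuries
print(water_injuries)        # ['ex:injury1']
```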

Access Integrity has built a system that reviews EMRs and instantly provides a suggested list of ICD-10 codes, just as it did for ICD-9. For this purpose, ICD-10 is certainly a good move forward. We still maintain that a human should review and make the final selection of the codes submitted for billing, due to the complexity of the classification system, the ambiguities in the EMR as written by the healthcare provider, and of course the liabilities in a litigious world. For those of us in the information business, more data is always better than less, so even if ICD-10 is imperfect (and it is), there is no doubt that it’s a step in the right direction.
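The following is not Access Integrity’s system. It is a deliberately naive, hypothetical sketch of keyword-rule code suggestion, using only the codes cited earlier in this post, to show where the human reviewer fits in. Real EMR text needs far more than bag-of-words matching (negation, abbreviations, “big toe” versus other toes, and so on), which is part of why a coder should make the final selection.

```python
import re

# Hypothetical keyword rules; the codes and glosses are the ones cited in this post.
RULES = [
    ({"turtle", "bitten"}, "W59.21XA", "bitten by turtle"),
    ({"turtle", "struck"}, "W59.22XA", "struck by turtle"),
    ({"bitten", "right", "toe"}, "S90.471A", "bite, right big toe"),
    ({"bitten", "left", "toe"}, "S90.472A", "bite, left big toe"),
    ({"waterskiing"}, "Y93.17", "activity: waterskiing"),
]

def suggest_codes(note):
    """Suggest every code whose keywords all appear in the note.
    These are suggestions only; a human coder makes the final selection."""
    words = set(re.findall(r"[a-z]+", note.lower()))
    return [(code, label) for keywords, code, label in RULES if keywords <= words]

note = "Patient was bitten on the right big toe by a turtle while waterskiing."
for code, label in suggest_codes(note):
    print(code, label, "(pending human review)")
```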

Marjorie M.K. Hlava
President, Access Innovations

TaxoDiary Blog Achieves Milestones in Sharing Information

June 22, 2015  
Posted in Access Insights, Featured, Taxonomy

The TaxoDiary blog, which can be read at www.taxodiary.com and publishes new posts every Monday through Friday, recently published its 3,000th post.

TaxoDiary was started in August 2010 and is sponsored and maintained by Access Innovations, Inc., under the leadership of Access Innovations President Marjorie Hlava and Chief Executive Officer Jay Ven Eman.

TaxoDiary covers all types of knowledge organization systems (KOS) and related subjects, with daily posts sharing news and opinions. “Monday features” dig a little deeper into taxonomies, semantic technology, and other key areas of interest. More than 200 of these feature articles have been researched, written, and shared by the team at Access Innovations.

“The blog is designed as an information vehicle that highlights news of interest and topics that content professionals face daily,” Dr. Ven Eman commented. “Networking and providing access to methods of making content findable are helpful to us all.”

TaxoDiary is designed to provide taxonomists, indexers, and content professionals with news and opinions about categorization and about applying KOS, using controlled vocabularies, to increase the findability of information objects within or across large collections of information, whether structured in databases or unstructured in content repositories.

Subscribing to TaxoDiary will deliver the posts directly to email, either as they are posted or as a daily summary.


About Access Innovations, Inc. – www.accessinn.com, www.dataharmony.com, www.taxodiary.com

Access Innovations has extensive experience with Internet technology applications, master data management, database creation, thesaurus/taxonomy creation, and semantic integration. Access Innovations’ Data Harmony software includes machine-aided indexing, thesaurus management, an XML Intranet System (XIS), and metadata extraction for content creation developed to meet production environment needs. Data Harmony is used by publishers, governments, and corporate clients throughout the world.

Father of Library Science

June 15, 2015  
Posted in Access Insights, Featured, Taxonomy

Father’s Day is coming up soon, so we thought we’d pay homage to Shiyali Ramamrita (S. R.) Ranganathan. As described in the Wikipedia article about him, Ranganathan “is considered to be the father of library science, documentation, and information science in India and is widely known throughout the rest of the world for his fundamental thinking in the field.” He is also regarded by many information science professionals throughout the world as the father of library science. That’s a lot of fathering.

Ranganathan is perhaps best known for devising a system of faceted classification that had enormous influence on classification and indexing science. We’ve already written (in “Ranganathan, Classification, and British Toys”) about how his career path, along with a peek into the window of a toy store, provided the background and inspiration for his Colon Classification system.

In 1931, Ranganathan’s book The Five Laws of Library Science was published. While some of the more specific recommendations in the text have been rendered obsolete by technological advances, there are many passages that are still relevant. Let’s have a sampling.

From pages 1, 6-7:

The first law of Library Science is: BOOKS ARE FOR USE. No one will question the correctness of this law. But, in actual practice, the story is different. The law is seldom borne in mind by library authorities. We may examine the history of any aspect of library practice and we shall find ample evidence of a deplorable neglect of this law.

[This is followed by several illustrative anecdotes focusing on library authorities who fortunately have remained nameless.]

On the other hand a modern librarian, who has faith in the law that ‘BOOKS ARE FOR USE,’ is happy only when his readers make his shelves constantly empty. It is not the books that go out that worry him. It is the stay-at-home volumes that perplex and distress him. He too will constantly cross the yard to meet his Agassizes. But he will go to them, not to snatch away the books they are using, but to distribute the new arrivals that need to be introduced to them as rapidly as possible.

From page 49 (where it’s evident that “indexing” wasn’t an entirely accepted word for the activity quite yet, at least not in a library context):

Not infrequently one comes across a bumptious upstart, who has the cheek to say, “What is there in indexing?” meaning by ‘indexing’, Cataloguing. One only wishes that he was allowed to try his hand at ‘indexing’ for a couple of months to discover for himself what a mess he is capable of making.

From page 50, which brings to mind the love-hate relationship between taxonomists and subject matter experts:

Another, a specialist quite jealous of the rights of his line of experts, may make a flippant remark, “That is not the way to classify. This is the way to catalogue. Reference-work is not in your province. It is the preserve of the Professors” and so on. One has to tell him “Mr. Specialist, I am a specialist in my line as much as you are, Sir, in yours. If your field is clouded in mystery and needs prolonged formal initiation, so is mine. Remember what you will think of any uninitiated Tom, Dick or Harry who attempts to poke his nose into your sphere.”

From pages 293-294, on the Second Law, Every reader his/her book:

It is a peculiar sort of knowledge that is needed to find for EVERY PERSON HIS BOOK. People at all levels will seek the help of the Library Staff to find their books. It may be a freshman that wants help to prepare for the scholarship examination; it may be a senior student who wants to lead a debate on feminism; it may be a professor who wants to settle a point in the phonology of the Dravidian vowel system; it may be a physicist who wants the book that will give him just enough and no more of Matrices to understand Heisenberg’s treatment of Wave Mechanics. …

No person can depend on his memory to say what his library resources are on such a bewildering range of subjects. The Library Staff have necessity to depend on certain recognised mechanical aids, to discharge their obligations in helping EVERY PERSON TO FIND HIS BOOK.

From pages 382-383, and largely true of taxonomies and research databases, as well as physical libraries:

The Fifth Law is: A LIBRARY IS A GROWING ORGANISM. It is an accepted biological fact that a growing organism alone will survive. An organism which ceases to grow will petrify and perish. The Fifth Law invites our attention to the fact that the library, as an institution, has all the attributes of a growing organism. A growing organism takes in new matter, casts off old matter, changes in size and takes new shapes and forms. Apart from sudden and apparently discontinuous changes involved in metamorphosis, it is also subject to a slow continuous change which leads to what is known as ‘variation’, in biological parlance, and to the evolution of new forms. … The one thing that has been persisting through all those changes of form has been the vital principle of life. So it is with the library.

From pages 397-398, where the Fifth Law leads us to a discussion of classification approaches:

Another important matter that needs to be examined in the light of the Fifth Law is the classification of books. In the first place, as A LIBRARY IS A GROWING ORGANISM and as knowledge itself is growing, it is necessary that the “classification must be comprehensive, embracing all past and present knowledge and allowing places for any possible additions to knowledge”. Indeed this has been set down by Mr. Sayers [William Charles (W.C.) Berwick Sayers, Ranganathan’s mentor in library science at the University of London] as the first canon of classification. To quote Sayers again, “A classification must be elastic, expansible, and hospitable in the highest degree. That is to say, it must be so constructed that any new subject may be inserted into it without dislocating its sequence”. Cases like that of Wave Mechanics, Matrices, Raman Effect, Internal Combustion Engine, Radium, Behaviourism, Dalton Plan and the entire subject of Sociology have had to be accommodated within living memory. It can not be said that all the printed schemes in force have come quite unscathed out of this trial.

And we’ll conclude with an excerpt from page 414, where Ranganathan looks towards the future (as do all good fathers):

What further stages of evolution are in store for this GROWING ORGANISM — the library — we can only wait and see. Who knows that a day may not come — at least [H. G.] Wells has pictured a world in which dissemination of knowledge will be effected by direct thought transfer, in the Dakshinamurti fashion, without the invocation of the spoken or the printed word — that a day may not come when the dissemination of knowledge, which is the vital function of libraries, will be realised by libraries even by means other than those of the printed book?

Barbara Gilles, Taxonomist
Access Innovations, Inc.

Photo, S. R. Ranganathan’s photo at City Central Library, Hyderabad, India. Photo by Krzna, http://commons.wikimedia.org/wiki/File:S._R._Ranganathan.jpg, CC BY-SA 3.0.

Next stop – Boston!

June 9, 2015  
Posted in Access Insights, News, Taxonomy

Are you headed to the Special Libraries Association (SLA) Conference in Boston this week? We will see you there. Margie Hlava, President of Access Innovations, Inc. and Bob Kasenchak, Head of Product Development, will be leading a workshop titled, “So You Have a Taxonomy…Now What? Advanced Taxonomy Topics.”

The workshop addresses the next steps beyond just having a taxonomy. It will cover topics involving maintaining, implementing, and leveraging your taxonomy. Topics will include vocabulary revision and upkeep, content review, semantic enrichment, places to use a taxonomy in your workflow, taxonomy-driven analytics, data visualization based on taxonomy terms, and other uses (and abuses!) of taxonomies, thesauri, and authority files. They will also cover some common pitfalls to avoid.

The SLA 2015 Annual Conference is loaded with many networking opportunities. We hope to see you there.

Melody K. Smith

Sponsored by Access Innovations, the world leader in taxonomies, metadata, and semantic enrichment to make your content findable.

Down the Rabbit Hole

June 8, 2015  
Posted in Access Insights, Featured, Taxonomy

As many readers know, the main U.S. standard for taxonomies, thesauri, and other controlled vocabularies is ANSI/NISO Z39.19, Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies. Its material on vocabulary displays describes some display types that might be unfamiliar. I’m guessing that some of these displays emerged as the result of technological limitations, and have persisted simply because of inertia. This seems to be the case with the flat format display, about which I’ve already written (see “Blind Alleys, Dead Ends, and Mazes”).

This also might be the case with the display practice described in Z39.19’s Section 8.4.3, Node Labels for Related Terms: “In order to bring closely related concepts together in the alphabetical array under a given term, related terms may be divided [grouped?] into categories that do not form part of a logical hierarchy. These related terms should then be identified by a node label.” What do we get here? A shallow, illogical hierarchy? Related terms without a useful hierarchy?

And then there’s the section on Faceted Display: “Some controlled vocabularies provide a display of the terms organized according to the broad categories or facets to which the term belongs. Facets may have a hierarchical arrangement as well so that narrower facets are arranged within broader categories.” And then again, there might not be a hierarchical arrangement. What then?

At best, we get a relatively flat list with very general groupings, as with the related terms display. At worst, perhaps, the faceted display might be associated with a thesaurus hierarchy. We get a display of categories that look like the top terms in a thesaurus hierarchy, but that actually obscure the hierarchy. Let’s dive into such a thesaurus, shall we?

Again, I’m going to pick on the U.S. Department of Education’s ERIC Thesaurus, which I’ve picked on before for the flat format displays used for its individual terms. To its credit, it dates back to 1964, but it’s showing its age. The main point of entry for browsing is the webpage that lists the main “categories”. It would be easy for you to assume that those categories are the top terms in the hierarchical structure of the thesaurus. (After all, it is a “thesaurus”, right?) But no. This vocabulary is something of a hybrid creature. As mentioned in Z39.19, Appendix C (at C.13, Faceted Displays), a faceted display “Provides a view of the vocabulary that is complementary to any strict hierarchical arrangement”. The categories look like they’re there to simplify navigation, but the hierarchy is nowhere in sight.

Picking at random: if you click on the category “Arts”, you’ll see a flat list of several terms. One of them is “Art History”. Click on “Art History”, and you’ll see that it doesn’t have a narrower term, but it does have a broader term, “Intellectual History”. Okay, let’s travel back up to the top (not that we’ve traveled very far down). If you click on “Intellectual History”, you’ll discover that it’s in the category “Humanities”, with the category “Arts” nowhere to be seen. “Intellectual History” also has a broader term, “History”, which in turn has two broader terms, neither of which takes us back up to where we jumped in.
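This kind of inconsistency can be checked mechanically. The minimal sketch below encodes only the relationships reported above (as observed in 2015; the two broader terms of “History” are not named, so they are left out) and flags any term whose category differs from that of one of its broader terms. It is only a heuristic: a faceted display may legitimately place a term in more than one category, so a flag is a prompt for review, not proof of an error.

```python
# Only the ERIC relationships described in this post; everything else is omitted.
category = {
    "Art History": "Arts",
    "Intellectual History": "Humanities",
}
broader = {
    "Art History": ["Intellectual History"],
    "Intellectual History": ["History"],
}

def ancestors(term):
    """All terms reachable by following broader-term (BT) links upward, in visit order."""
    seen, stack = [], list(broader.get(term, []))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.append(t)
            stack.extend(broader.get(t, []))
    return seen

def category_mismatches():
    """Flag (term, its category, ancestor, ancestor's category) where the categories differ."""
    problems = []
    for term, cat in category.items():
        for anc in ancestors(term):
            anc_cat = category.get(anc)
            if anc_cat is not None and anc_cat != cat:
                problems.append((term, cat, anc, anc_cat))
    return problems

print(category_mismatches())
# [('Art History', 'Arts', 'Intellectual History', 'Humanities')]
```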

Try jumping in somewhere else. You’re likely to find the same kind of inconsistency on your way back up.

In Appendix C (at C.13, Faceted Displays), the Z39.19 standard warns that with a faceted display, it “Can be difficult for users to locate a specific term”. Yes, indeed!

Barbara Gilles, Taxonomist
Access Innovations, Inc.

Illustration by Jessie Willcox Smith

Marjorie M.K. Hlava to Receive the John Cotton Dana Award from the Special Libraries Association

June 1, 2015  
Posted in Access Insights, Featured

Access Innovations, Inc. is pleased to announce that its president and founder, Marjorie M.K. Hlava, has been named the recipient of the highest honor given by the Special Libraries Association (SLA), the John Cotton Dana Award.


Named for the founder and original president of SLA, this prestigious honor recognizes a lifetime of achievement in the field of library science and exceptional service to the association and to the field at large. Hlava, a Fellow of the association, served on the SLA Board of Directors from 1990 through 1992, received the President’s Award for her outstanding service to the organization in 2000, and has received multiple honors over the years for her work in the industry.

“Marjorie Hlava personifies the spirit of innovation and the commitment to professionalism that were hallmarks of John Cotton Dana’s career, and it is fitting that she should receive SLA’s highest honor,” said SLA 2015 President Jill Strand. “She has been a mentor, a leader, and a friend to many information professionals over the course of her career, and I look forward to seeing her receive the Dana Award in Boston.”

A member of SLA since 1976, she has served on many committees on the board, chapter, and division levels. She was co-creator and served as Chair of the Taxonomy Division and chaired the Nominations and Professional Development Committees therein. Additionally, she served as Chair of the Information Technology Division and, within that, served on the Executive, Nominations, and Networking Committees, as well as serving as the Division Archivist.

At the association level, Hlava served as Director at Large on the Board of Directors; chaired the DACOLT, Long Range Plan, Non-Serial Publication Review, Directory, Technical Standards, and Bylaws Committees; and served on the Bylaws and Finance Committees. Additionally, she served as the SLA voting representative to the National Information Standards Organization (NISO), created and implemented the SLA thesaurus and taxonomy, and consulted on the redesign of the SLA website navigation system.

Her chapter activities have included serving as Rio Grande Chapter President and Vice President and chairing the Special Projects, Employment, Membership, Career Counselor, and Nominations Committees, in addition to numerous other roles at the chapter level.

“I am surprised, delighted, and humbled by this honor,” commented Ms. Hlava. “I have always enjoyed my years of service to SLA and found the meetings and presentations a springboard for new ideas. The insights gained from networking with other members have fueled my desire to undertake new (and sometimes daring!) developments with my company’s software and services. These conversations have often helped find creative ways to address the applications of information science and its challenges. I look forward to many more years of continued involvement in SLA.”

For over 40 years, Hlava has been a thought leader in library and information science. She is well known internationally for her work in the implementation of information science principles and the technology and standards that support them. She has served as the president of a number of industry associations, including the National Federation of Advanced Information Services (NFAIS) and the Association for Information Science and Technology (ASIS&T). Hlava is the author of The Taxobook (Morgan & Claypool, 2014-2015), a three-volume book series on taxonomies and thesauri. She is now developing ontological structures to serve linked data, which she feels is the future of semantics and information science.

The presentation of the John Cotton Dana Award will take place at the SLA 2015 Annual Conference in Boston, Massachusetts, on June 14, 2015. In addition to receiving the award, Hlava will conduct a workshop on advanced taxonomy concepts at the conference.



About SLA – www.sla.org

The Special Libraries Association is the global organization for innovative information professionals and their strategic partners that promotes and strengthens its members through learning, advocacy, and networking initiatives. SLA is a nonprofit corporation incorporated in the State of New York and organized for the purpose of “provid[ing] an association of individuals and organizations having a professional interest in the strategic use of information.” SLA is organized into 56 regional chapters and 26 divisions representing subject interests, fields, or types of information-handling techniques.

Cloudy with a chance of crab cakes

May 25, 2015  
Posted in Access Insights, Featured, Taxonomy

(Note: Back in November of 2013, TaxoDiary published the post “A Cloud Drifting Toward a Classification,” about the cloud formation tentatively labeled as Asperatus undulatus and the quest to achieve an official classification for it. The article below, which first appeared in the Apalachicola Times on April 8, 2015, focuses on matters of terminology concerning the same cloud, from the viewpoint of a scholar of ancient Latin language and literature. Republished by kind permission of the Apalachicola Times, which holds all rights to the article.)


The cloud’s name would be undulatus asperatus, Latin for “wave-like (and) rough.” – ALLEN GATHMAN

As locals and visitors alike will attest, all the Apalach/St. George Island folk are strong and good-looking, the children are above average, and there’s rarely a cloud in the sky.

Yet just when it seemed Ecclesiastes was right to observe there’s “nothing new under the sun” (nihil sub sole novum, as in “nihilist,” “solar,” and “novelty”), cloud buffs and cooperating scientists believe they’ve identified a previously unclassified cloud type – the first since 1951 – a variant of the undulatus class they are proposing to call undulatus asperatus, Latin for “wave-like (and) rough.” There are lots of photos at the Cloud Appreciation Society (CAS) website at cloudappreciationsociety.org (search on asperatus then undulatus) and a compilation of videos on YouTube (search “undulatus asperatus compilation”).

Gavin Pretor-Pinney, founder of the CAS, resourcefully tapped into the mother tongue for the name of the cloud formation, which he and Graeme Anderson, a meteorologist assisting his research, hope to have officially recognized by the United Nations World Meteorological Organization (WMO) this year or next. Undulatus, like English “UNDulate,” means to “move like ocean waves,” from the noun unda/“wave.” That same root word gives us “redUNDant,” a synonym of “repetitive,” which, once you know this bit of Latin, evokes an image of waves breaking again and again (re-), REpeatedly, against the shore and sending the sand crabs skittering. The root of asperatus appears in “exASPERate,” literally “to roughen” but commonly (as when your spouse describes your mannerisms!) meaning “to irritate, annoy, anger.” You might also know the noun “ASPERity,” meaning roughness – of touch, of climate, or of a person’s behavior. Vergil in his first-century B.C. epic poem the Aeneid used the phrase asperat undas of a wintry storm at sea that “makes the waves rough.”

The Latin word for clouds in general is nubes, and the Romans called the centaurs nubigenae, “cloud-born” (from gen-, “to beget,” as in “GENerate” and “proGENitor”), since the murderous Ixion had seduced a cloud-image of the goddess Hera, enGENdering a deformed son Centauros, who later mated with a group of mares and sired those mythic man-horse critters! A rain-cloud specifically is a nimbus, which gives us “cumuloNIMBUS” for those menacing, heaped-up, towering clouds that can signal a coming rainstorm or hail: Latin cumulus is a “heap” and to “acCUMULate” is to pile things up–like the little mounds of sand those tiny sand crabs often kick up round their holes along the St. George shoreline. “Cumulonimbus” was originally called “Cloud Nine” in the inaugural edition of the International Cloud Atlas (now published by the WMO), which appeared in 1896; that work’s title likely influenced the name of a series of piano compositions by Yoko Ono’s first husband, Toshi Ichiyanagi, whose music in turn inspired the title of David Mitchell’s sci-fi novel “Cloud Atlas” and its 2012 film adaptation.

Here’s wishing you a euphoric, not rainy, Cloud Nine day, and may all your nubes be lined with silver (argentum, chemical symbol Ag). As for me, I’m heading down to Water Street in quest of crab cakes, and as I slather them with remoulade I’ll try not to think of their crustacean brethren frolicking on the cloudless beaches of St. George Island.

By Rick LaFleur

Rick LaFleur is retired from 40 years of teaching Latin language and literature at the University of Georgia, which during his tenure there came to have the largest Latin enrollment of all of the nation’s colleges and universities; he is not quite sure whether he loves Latin or Apalachicola more.

To Cap or Not to Cap?

Every once in a while, the issue of capitalization in taxonomies and thesauri pops up. Some of us in taxonomy land believe that it does make a difference which capitalization (versus lowercase) style you use. We just don’t necessarily agree on what that style should be.

The National Information Standards Organization (NISO) standard for controlled vocabularies (ANSI/NISO Z39.19, Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies), which was last revised in 2005 and reaffirmed in 2010, has this to say on the subject, on page 34:

It is recommended that predominantly lowercase characters be used for terms in controlled vocabularies… Capitals should be used only for the initial letter(s) of proper names, trade names, and for those components of taxonomic names, such as genus, which are conventionally capitalized. Capitals should be used for all the letters of initialisms or where featured in unusual positions in product or corporate names. Because lowercase letters can occur in unusual positions in proper names, using a combination of capitals and lowercase letters in controlled vocabularies indicates to the user the correct orthography of a term in natural language and serves to distinguish common nouns from similar proper names. 

Example 57: Capitalization of proper and trade names

information systems
Information Systems Corp.

[A note about “should”: Per ANSI/NISO Z39.19, page 2, “The conventions used in this Standard to indicate the force of recommendations are: must (required for meeting the Standard), should (recommended), and may (optional). The Standard also uses the conventions must not (not allowed in order to be in compliance with the Standard) and should not (not recommended).” So the NISO standard recommends the practice above, but does not insist on it.]

[Another note: A reconsideration/revision of Z39.19 is due soon.]

Most of this makes sense to me. It’s certainly much better than the solid caps default of early machine-readable taxonomies and thesauri. From what I understand, they were completely capitalized because of technological limitations and space-saving considerations. Have you ever tried to browse the printed records of those vocabularies? They’re horribly difficult to read.
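Converting one of those old solid-caps vocabularies to the Z39.19 style cannot be done from the letters alone, since “INFORMATION SYSTEMS CORP.” gives no hint that it is a proper name. A minimal sketch, assuming a hand-maintained authority list of forms that keep their capitals (the entries here are hypothetical examples), might look like this:

```python
# Hypothetical authority lists of forms that keep their capitals:
# whole proper/trade names, and individual words such as initialisms.
PROPER_TERMS = {"information systems corp.": "Information Systems Corp."}
PROPER_WORDS = {"xml": "XML", "niso": "NISO"}

def z3919_case(term):
    """Lowercase a term by default, restoring capitals only where the
    authority lists say the form is a proper name, trade name, or initialism."""
    key = term.lower()
    if key in PROPER_TERMS:
        return PROPER_TERMS[key]
    return " ".join(PROPER_WORDS.get(word, word) for word in key.split())

for old in ["INFORMATION SYSTEMS", "INFORMATION SYSTEMS CORP.", "XML SCHEMAS"]:
    print(z3919_case(old))
```

Everything not in the authority list comes out lowercase, matching the standard’s lowercase-by-default recommendation; the quality of the result depends entirely on how complete that list is.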

At the same time, the readability issue is what makes me object to Z39.19’s recommendation. Specifically, I have problems with the terms that begin with lowercase letters. They sort of merge with the line above, rather than clearly being separate terms. They don’t have the visual boundaries that capitalization can provide.

I do appreciate the rationale of Z39.19 that “using a combination of capitals and lowercase letters in controlled vocabularies indicates to the user the correct orthography of a term in natural language.” And I know fellow taxonomists who strongly agree with the lowercase-unless-it’s-a-proper-noun approach. For a general controlled vocabulary that serves as a reference for how terms appear in natural language, that dictionary-ish approach kind of makes sense.

My take on that, though, as far as taxonomies and hierarchical thesauri are concerned, is that taxonomies and their kin are more like outlines than like dictionaries, and outline items are capitalized for clarity, to indicate where new items start. Moreover, most traditional dictionaries have the visual benefit (for our purposes) of tiny text filling up the distance between terms, whereas in hierarchical taxonomy displays (which are generally the most useful views), the terms appear on consecutive lines. And indents can confuse things even more if terms are lowercase; the narrower terms look like runover lines.

Taxonomist Heather Hedden has written a blog post on the subject of capitalization in taxonomies. She views initial capitalization of terms as analogous to capitalization style in headings:

A “taxonomy” implies a hierarchical classification or categorization of concepts. When we think of categories we think of labels or headings with subcategories. Headings in general tend to have initial capitalization or title capitalization. Thus, if it’s a strictly hierarchical taxonomy, where all terms are interconnected into a single hierarchy or a limited number of hierarchies, then it will more likely have initial capitalization or title capitalization. Such capitalization is particularly common on the relatively smaller/less detailed taxonomies that are proliferating on websites, intranets, and content management systems. It fits in with the web design style of capitalization on headings and categories.

As Heather points out, initial capitalization is a fairly common practice, despite Z39.19. I think she's referring mostly to initial (letter) capitalization of the first word; I haven't found Title Style to be common. (In fact, I don't remember ever seeing it in a taxonomy.)

I have seen modern taxonomies and thesauri that use solid (every-letter) capitalization on just the top level, to indicate major categories. These are usually just category designations, rather than indexing terms. Heather comments: "A good application of the mixed capitalization style is if the top level terms were not actually to be used in indexing/tagging but are really just categories/groupings of the actual index terms, which in-turn are arranged hierarchically underneath."

Ultimately, it’s up to the taxonomy owners to determine what style to use. (And might I remind you, a reconsideration of Z39.19 is due soon.) The main factors to consider are readability, clarity, and usability.

By Barbara Gilles, Taxonomist
Access Innovations, Inc.

Access Innovations, Inc. Releases Data Harmony® Version 3.10

May 11, 2015  
Posted in Access Insights, Featured, Taxonomy

Access Innovations, Inc. is pleased to announce the release of Version 3.10 of its Data Harmony Suite of software tools.

The Data Harmony Suite provides content management solutions to improve information organization by systematically applying a taxonomy or thesaurus in total integration, with patented content extraction methods. MAIstro, the award-winning flagship software module of the Data Harmony product line, combines Thesaurus Master® (for taxonomy creation and maintenance) with M.A.I. (Machine Aided Indexer) for interactive text analysis and better subject tagging. XIS® (XML Intranet System) offers powerful content management and metadata creation tools and completes the Data Harmony Suite.

Data Harmony Version 3.10 includes significant enhancements and new features throughout the software suite. These changes greatly improve editorial efficiency through better functionality and overall clarity.

Most significantly, Data Harmony Version 3.10 makes the software much easier to use from non-Java platforms, such as PHP and .NET. It has always been possible to connect to the software from a non-Java platform, but this release gives a broader base of users easy access to the Data Harmony software.

“It is exciting for me to see these changes put into action,” said Lamine Idjeraoui, the lead software engineer of the Access Innovations programming team. “Data Harmony 3.10 has so much that will make our clients’ work easier and more efficient.”

System-wide improvements include:

  • expanded API options, including JSON as a format for the getSuggestedTerms API
  • an improved, modernized look and feel to the GUI, including the ability to change font style and size for improved customizability
  • updated notifications

Improvements to M.A.I. include:

  • color-coding and line numbering in the Rule Building screen, greatly increasing manageability of long rules
  • a new dropdown suggested syntax code menu, making rule building easier and more viewable
  • an expanded maximum rule length, facilitating even the most complex concepts

Thesaurus Master includes the following improvements:

  • assignment of multiple facet notations to a single thesaurus term
  • drag-and-drop functionality in the thesaurus view
  • ability to double-click a term in the thesaurus view to change the term

XIS improvements include the following:

  • unlimited search parameter fields for unlimited granularity
  • multiple ways to sort search results for improved customization
  • the ability to change field colors of the GUI, allowing users to highlight specific fields

“These changes to Data Harmony are outstanding. I look forward to our current and future users seeing all the improvements we have made,” states Bob Kasenchak, head of product development. “This is the best release to date. We continue to embrace OWL, SKOS, RDF, and other formats even as the platform broadens its base.”

For further information, visit www.accessinn.com.

About Access Innovations, Inc. – www.accessinn.com, www.dataharmony.com, www.taxodiary.com

Founded in 1978, Access Innovations has extensive experience with Internet technology applications, master data management, database creation, thesaurus/taxonomy creation, and semantic integration. Access Innovations’ Data Harmony software includes machine aided indexing, thesaurus management, an XML Intranet System (XIS), and metadata extraction for content creation developed to meet production environment needs. Data Harmony is used by publishers, governments, and corporate clients throughout the world.

Taxonomy Blogs – The Big Picture

May 4, 2015  
Posted in Access Insights, Featured, Taxonomy

Blog Icons Photo © Icefields | Dreamstime.com

We who blog on TaxoDiary know that it’s not the only blog that has to do with taxonomies and such. There are a few others out in cyberspace, and each has its own character. Let’s take a look at some of them, starting with what we know best.

TaxoDiary

You must already know something about TaxoDiary; you're reading it right now. Maybe you haven't read the description, though, which might provide some new insight:

TaxoDiary covers all types of knowledge organization systems (KOS), and related subjects. It is designed to provide the taxonomist, indexing and content professional with news and opinions about metatagging, and the application of KOS to increase findability of information objects within or across large collections of information, structured in databases, or unstructured in content repositories using controlled vocabularies. These activities are not unique to a single country or language, but rather shared and active globally.  It is part of our effort to keep abreast with the constantly changing field and to provide us information for research and used in consideration for the creation of new products and services. We will provide a regular stream of information about these topics and hope to share with you an informative and lively forum for discussion worldwide.

TaxoDiary is unusual in that new posts appear every weekday.

The Accidental Taxonomist

The Accidental Taxonomist blog is written by Heather Hedden, perhaps best known for her book The Accidental Taxonomist. In the first post, in 2011, Heather gave a preview that turned out to hold true for the subsequent years:

Where will my new blog post ideas come from?

As a consultant, I am constantly engaging in new taxonomy projects with new experiences, new lessons to be learned, and new insights into the field. My client names should be kept confidential, so writing complete case studies may not be feasible, but the short informal nature of a blog post is quite appropriate to share some thoughts.

I also attend a number of conferences during the course of a year, and there are always new ideas coming out of these events. Some of my blog posts will be based on my own presentation topics, but not a repeat of the slide bullets, though. Instead I will provide some commentary about the presentation topic, such as why it is significant, timely, of interest, or what my concerns are. Other posts will be my observations an ideas gleaned form what others presented.

I may decide to revisit a topic in my book for a blog post. But I could also explore some new direction of topics related to taxonomies, such as content management, information architecture, search, or digital asset management.

The Accidental Taxonomist blog averages about one very substantial post per month.

The Taxonomy Blog

From the name, it’s fairly clear what this blog is about. While no longer active, it is still online (at least for the time being) and has some very useful posts on taxonomy philosophy and methodology. It was formerly maintained by taxonomist Marlene Rockmore, with help from Heather Hedden. To give you an idea of the approach, here’s how Marlene describes herself:

I call myself the “Classy Taxonomist.” I help organizing concepts which leads to clear thinking, better analysis, and results. I use taxonomies to help you figure out how to communicate by sorting meaning into buckets. Once you have buckets, you can then build interfaces and processes more efficiently. Big Bird once said “One of these things is not like the other.” I’ve been doing this since 1986. Clients include Harvard Business School, Digital, 6.2Million Tax Override, Conoco Philips, Boston College, O’Reilly, and Google.

Earley & Associates blog

Earley & Associates is a consulting organization headed by prominent knowledge management expert Seth Earley. Their blog covers a wide range of information management topics, but the posts indexed with “Taxonomy” far outnumber the posts indexed with other topics. The website has a page of research suggestions, one of which is to “check out our blog” if “you want to get a pulse on what is new and hot.”

Taxonomy Watch

This blog had the tagline "A weblog about taxonomies and their application in organizing digital content. Also includes related topics such as controlled vocabulary, thesauri, topic maps, ontologies and semantic technologies." It was maintained by Gwen Harris and discontinued in 2012. At that time, Gwen wrote that the blog would be taken down soon, but as I write this, it's still there. There's a lot of good stuff there; take a look.

Barbara Gilles, Taxonomist
Access Innovations, Inc.
