Data Harmony® v.3.10 Named 2015 Trend Setting Product by KMWorld

Access Innovations, Inc., the industry leader in data organization and innovators of the Data Harmony® software suite, is pleased to announce that KMWorld has named Data Harmony Version 3.10 to its list of Trend Setting Products for 2015.

“It is vital to stay at the forefront of knowledge management and, with Data Harmony v.3.10, we have delivered the most integrated, flexible, productive, streamlined, and user-friendly semantic enrichment software on the market,” notes Marjorie Hlava, president of Access Innovations, Inc. “We will continue developing new and innovative ways to analyze, enhance, and access data to increase findability and distribution options for our customers.”

The proven, patented Data Harmony software is the knowledge management solution to index information resources and, with Version 3.10, has pushed the envelope further, with a more modern graphical look, increased search functionality, auto-complete, color-coding for easier readability, and much more. With these improvements in place, Data Harmony offers a richer, more advanced, and friendlier customer experience.

The Trend Setting Product awards from KMWorld began in 2003. Speaking on behalf of the judging panel, KMWorld Editor-in-Chief Hugh McKellar says, “In each and every case, the thoughtfulness and elegance of the software certainly warrants deep examination. Depending on customer needs, the products on the list can dramatically boost organizational performance.”

McKellar adds, “The panel, which consists of editorial colleagues, market and technology analysts, KM theoreticians, practitioners, customers and a select few savvy users (in a variety of disciplines), reviewed more than 200 vendors, whose combined product lineups include more than 1,000 separate offerings. The products identified fulfill the ultimate goal of knowledge management—delivering the right information to the right people at the right time.”

Data Harmony v.3.10 is available through the cloud, a hosted SaaS version, or an enterprise version hosted on a client’s server. More information about Data Harmony and its 14 software modules is available at


About Access Innovations, Inc.
Founded in 1978, Access Innovations has extensive experience with Internet technology applications, master data management, database creation, thesaurus/taxonomy creation, and semantic integration. Access Innovations’ Data Harmony software includes machine aided indexing, thesaurus management, an XML Intranet System (XIS), and metadata extraction for content creation developed to meet production environment needs. Data Harmony is used by publishers, governments, and corporate clients throughout the world.


About KMWorld
KMWorld is the leading information provider serving the Knowledge Management systems market and covers the latest in Content, Document and Knowledge Management, informing more than 40,000 subscribers about the components and processes – and subsequent success stories – that together offer solutions for improving business performance.

KMWorld is a publishing unit of Information Today, Inc.

Choosing Related Terms

August 17, 2015  
Posted in Access Insights, Featured, Term lists

Source: Dreamstime

Which quiche? You’ll have to read to the end to find out.

As many of our readers are aware, hierarchical thesauri are distinguished from other taxonomies by (among other things) the inclusion of non-hierarchical relationships among terms. One main type of non-hierarchical relationship is the equivalence relationship, which is usually expressed as a preferred term – non-preferred “term” (generally a synonym or quasi-synonym) pairing. The other main type of non-hierarchical relationship is the associative relationship, in which regular thesaurus terms are paired. (Note: I’m using “regular terms” here to refer to what are commonly called “preferred terms”, meaning terms that aren’t non-preferred terms.)

The paired terms are called “related terms”; while terms in a thesaurus can be “related” in various ways, “related terms” are those that carry a reciprocal associative relationship. If Term A has Term B as a related term, then Term B will likewise have Term A as a related term. Good taxonomy management software will be responsive to a taxonomist’s addition of a related term to a term record, and will automatically add the reciprocal relationship to the other term’s term record.
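The reciprocal behavior described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular taxonomy management product implements it; the class and method names are hypothetical.

```python
from collections import defaultdict


class Thesaurus:
    """Toy thesaurus that stores only associative (related-term) links."""

    def __init__(self):
        # Maps each term to the set of its related terms.
        self.related = defaultdict(set)

    def add_related(self, term_a, term_b):
        """Link two terms and record the reciprocal link automatically."""
        if term_a == term_b:
            raise ValueError("a term cannot be related to itself")
        self.related[term_a].add(term_b)
        self.related[term_b].add(term_a)

    def remove_related(self, term_a, term_b):
        """Remove the link in both directions so the records stay in sync."""
        self.related[term_a].discard(term_b)
        self.related[term_b].discard(term_a)


thesaurus = Thesaurus()
thesaurus.add_related("Term A", "Term B")
print(sorted(thesaurus.related["Term B"]))
```

Because every addition and removal touches both term records, the two sides of the relationship cannot drift apart.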

The Z39.19 standard (ANSI/NISO Z39.19-2005 (R2010), “Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies”) offers a somewhat vague definition of the associative relationship: “A relationship between or among terms in a controlled vocabulary that leads from one term to other terms that are related to or associated with it.” (Subsection 4.1) As the standard comments later (8.4), “the associative relationship is the most difficult one to define.”

Subsection 8.4 does capture the essence of the associative relationship: “This relationship covers associations between terms that are neither equivalent nor hierarchical, yet the terms are semantically or conceptually associated to such an extent that the link between them should be made explicit in the controlled vocabulary, on the grounds that it may suggest additional terms for use in indexing or retrieval.”

Z39.19 discusses associative relationships between sibling terms, as well as the more common (and perhaps more valuable) associative relationships between terms in different hierarchies within a thesaurus. With the sibling relationships, the conceptual relationship should be stronger than simply being part of the same broader concept; otherwise, there is no point in adding the associative relationship. (I should point out here that there are those who advocate for always adding associative relationships among sibling terms.) Even so, when one browses or navigates a thesaurus, siblings are readily visible. The value of associated siblings is mainly in search.

Associative relationships across hierarchies are another matter. They call attention to terms (and content indexed with those terms) that the thesaurus user or searcher should perhaps be aware of, and otherwise might miss. Some of the example pairings in Z39.19 are weaving and cloth; pathogens and infections; surface tension and liquids; and ducks and rubber ducks. These examples are from Z39.19’s Subsection 8.4.2, which illustrates the various types of associative relationships (such as Process / Agent) listed in a table in section 8.1. While it’s instructive to be acquainted with these types, there’s no need to memorize them or refer to them, unless one is looking for inspiration for adding more “related terms” to a term.

Here are some of the problems I’ve seen with related terms in thesauri:

Few or no related terms; not taking advantage of the ability to add related terms, and missing out on the benefits that they can provide.

Too many related terms, with the ones that could be valuable getting lost in the mix, and the network of relationships becoming cumbersome.

Very vague related terms, with no real value. This often goes hand in hand with too many related terms.

Related terms that might better serve as broader or narrower terms, in a hierarchical relationship.

And last but not least, terms that would be great as related terms, but that are in an inappropriate hierarchical relationship.

One simple example of the last problem has to do with quiche. As mentioned elsewhere in the TaxoDiary blog: “In a food thesaurus, “Quiche” or “Quiches” would not be appropriate as a narrower term under “Vegetarian foods,” because some quiches contain bacon or ham.”

However, Quiche would make a fine related term for Vegetarian foods, because many quiches are suitable for vegetarian consumption.

So choose wisely, and Bon appetit!

Barbara Gilles, Communicator
Access Innovations, Inc.

Classifying Exoplanets: Where does Earth 2.0 fall?

August 10, 2015  
Posted in Access Insights, Featured, Taxonomy

On the 9th of January, 1992, astronomers around the world rejoiced. For the first time ever, they had definitive proof of a planet orbiting another star. These early observations were extremely limited, mainly focused on noticing the wobble of the parent star, but they opened the proverbial floodgate for exoplanet discovery. A little over two decades later, the list of known exoplanets continues to grow every day, with the number of verified planets well above a thousand, and the number of candidates exponentially more than that. All of these new planets have highlighted the rather humbling fact that, before now, we really knew nothing about the sheer number and diversity of exoplanets within the galaxy.

For a hundred years, it was assumed that other planets would roughly reflect what we see in our own solar system: A number of small rocky worlds close to the star, with a number of large gas giants circling farther out beyond the Goldilocks Zone (where liquid water can exist). The reality is quite different.

We now know, for instance, that planetary migration is common. Large gas planets, which form far away from their parent star (a requirement to keep their gas from becoming heated and stripped away during the planetary formation process), often tend to fall closer to their star over time. These “Hot Jupiters,” as they are called, often orbit extremely close to their star. This tight orbit opens up a world of geologic possibilities for the planet (and its moons). Imagine a planet like Jupiter, which has vast amounts of frozen water. As such a planet drifts closer to its star, the water ice melts and the planet develops oceans deeper than the diameter of Earth. Water worlds like this have often been referenced in science fiction, but now they have become known as scientific fact.

How do we go about classifying these exoplanets when each one illustrates how little we actually know? Pluto was only recently kicked out of the planetary club, with its eviction predicated on our defining some of the most basic aspects of a “planet” (size, gravitational impact, etc.). How do we go about sub-classifying the many exoplanets we’re now beginning to find when we can barely agree on what constitutes a planet in the first place?

Should planets be classified on material composition in accordance with historical precedent (rocky worlds vs. gas giants)? This seemed to be effective at classifying the planets in our solar system, but when we know that gas giants can migrate to extremely close orbits, and their frozen gas compositions can change drastically, does this standard still hold up? The reverse of this is also true as we find an increasingly large number of “Super Earths”, planets that are rocky like earth and our inner solar system neighbors, but closer in size to the gas giants.

What about the type of star that a planet orbits? One would think that perhaps that could be the common denominator. But now we know that some planets orbit large hot stars, others orbit old cool stars, and some orbit two stars, while some have been flung out of their orbits altogether to float lonely out in space forever. Large planets evicted from their solar systems in this way aren’t even considered planets but are instead considered “sub-brown dwarfs”, somewhere in the gray zone between planets and stars.

As we have learned about exoplanets, we quickly realized that we knew practically nothing about them. What we now know has shattered our old classification system. Whatever eventually replaces it will need to be far more sophisticated and take into account the vast diversity we now know exists. The Star Trek dream of discovering Class M planets simultaneously seems further away and closer than ever before, and I for one am eager to see what happens.

Win Hansen, Production Manager
Access Innovations, Inc.

Registration Now Open for the 12th Annual Data Harmony Users Group Meeting

August 3, 2015  
Posted in Access Insights, Featured

Registration is now open for the twelfth annual Data Harmony Users Group (DHUG) meeting, scheduled for February 9-10, 2016 at the Access Innovations, Inc. offices located at 4725 Indian School Road NE, in Albuquerque, New Mexico.

Access Innovations, the company behind the Data Harmony product line, hosts the annual meeting. Case studies presented by Data Harmony software users make up most of the program, and most attendees indicate that these are the primary reason they attend DHUG.

“Based on feedback from the 2015 meeting, we have altered the schedule for DHUG 2016,” said Heather Kotula, Director of Communications at Access Innovations. “We are a client-driven organization and our users’ needs reflect the new arrangements, the same as updates and new features in the Data Harmony software.”

Monday, February 8, attendees will meet at the Access Innovations office, where introductory training topics will be covered. “Having the training at our office makes it possible for everyone attending to have a hands-on experience with the software,” remarked Win Hansen, Director of Production Services. “Whether they are experienced users, new users, or still considering Data Harmony software for their organization, this approach is the most beneficial.”

Tuesday and Wednesday will feature a combination of case studies presented by users and presentations by Access Innovations and Data Harmony staff. The Welcome and Features Update by Margie Hlava, president of Access Innovations, normally a three-hour presentation on the second day of the meeting, will now be split between the two days. The Welcome portion will be presented in the morning on Tuesday. The Features Update will be combined with the wrap-up on Wednesday afternoon. Sessions on these two days will be held in the Mega Classroom at NMSU, adjacent to the Access Innovations office.

Thursday, February 11, is a full day of in-depth training, also held in the Access Innovations office. “The segments on this day are geared toward users who will work with the admin module, importing and exporting data, and performing batch processes,” explained Jack Bruce, Senior Taxonomist. “We will also cover information about XIS, our XML database system.”

Meeting registration includes a networking reception at the nearby Hampton Inn on Monday evening and dinner on Tuesday evening. The Tuesday evening dinner will be held at the Unser Racing Museum, which uses modern technologies to educate and immerse visitors in the exciting world of racing.

“By hosting the meeting at the Access Innovations home office, we are able to make the entire staff available to discuss technical and tactical issues,” remarked Bob Kasenchak, Director of Product Development. “It also fosters better communication among the various parties, and we are able to identify and solve issues much faster.”

To register for the meeting please visit the DHUG registration page.

Data Harmony software users are encouraged to submit case study abstracts to present at the meeting. The submission form uses the Data Harmony Smart Submit software module.

For information about planning a trip to Albuquerque for the meeting, go to



Everything Goes Every Place It Fits

July 27, 2015  
Posted in Access Insights, Featured, Taxonomy

The traditional taxonomy is monohierarchical – there is one and only one place for every term. This is only sensible for many purposes; no one wants to look up a book in a library catalog only to find out that it is either in Section A or Section F, and no biologist would ever identify a new species as being both a fungus and an animal. “A place for everything and everything in its place” is the guiding principle of the monohierarchical taxonomy.

But there are times when that guiding principle just isn’t appropriate.

Take synthetic biology, for instance. To some biologists, DNA is just a biopolymer – a biologically created compound made up of distinct smaller compounds. A synthetic biologist wouldn’t be likely to disagree with that characterization; DNA is a biopolymer, so the traditional biological taxonomy still stands. It is also a type of information storage. The traditional biologist wouldn’t disagree with that characterization either, but might question its relevance in building a biology taxonomy. When the synthetic biologist is designing devices that modify DNA strands to store data and other devices to read that data, though, it becomes relevant. DNA is a data medium as well as a biopolymer. It belongs with magnetic tape and optical discs as much as it does with cellulose and starches. In the same way, genetically engineered spider silk is both an animal product and an artificial fiber in equal measure – pigeonholing it into a specific location in the taxonomy keeps it out of an equally good location.

What about household electronics? By 2008, Sony’s Blu-Ray discs had overtaken Toshiba’s HD-DVDs and were clearly going to be the next generation video format of choice. But Blu-Ray players were still relatively rare and expensive. If you knew where to look, or even to look at all, one of the most economical options for a new Blu-Ray player was the Playstation 3. Unfortunately, even online retailers weren’t marketing the game console as a Blu-Ray player. A monohierarchical taxonomy at the retailer would obviously classify the systems as game consoles – that’s exactly what they were. But they were also functional and affordable Blu-Ray players. There’s no way of knowing for certain, of course, but it is entirely possible that online retailers could have sold even more Playstation 3 consoles if their customers had seen the consoles as an option when searching for Blu-Ray players.

These aren’t isolated situations. Synthetic biology isn’t the only multidisciplinary field. Modern science includes chemical physics, astrophysics, neuroeconomics, and a whole host of other fields that draw from two or more distinct disciplines. The whole point of these multidisciplinary sciences is to study the places where the parent disciplines converge. Technological convergence is a real and growing trend. Microwave televisions may not be the wave of the future, but a smart oven that can bake a pie based on a recipe it downloaded off the internet and then call you on your cell phone when it’s done may be just around the corner; if your next oven comes from a computer manufacturer, where will it be listed in the online catalog?

Luckily, modern taxonomy has a tool to deal with that problem: polyhierarchy. In a polyhierarchical thesaurus, you might still find DNA in the traditional biological place, as a child term of biopolymers, but you might also find it as a child term of storage media. Likewise, you could find that Playstation 3 either by browsing the list of game consoles or by browsing the list of Blu-Ray players. It isn’t suitable for every situation, but it makes for a more flexible thesaurus that provides added value in many circumstances. An article on spider silk is tagged in a way that lets both the researcher interested in animal products and the one interested in artificial fibers know that it may be relevant. An online store search returns the microwave television in a search for either microwaves or televisions. The polyhierarchical thesaurus replaces “A place for everything and everything in its place” with “Everything goes every place it fits.”
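As a rough illustration of polyhierarchy, the Python sketch below lets a term carry more than one broader term. The class and method names are hypothetical and the structure is deliberately minimal; real thesaurus software stores much more on each term record.

```python
from collections import defaultdict


class PolyhierarchicalThesaurus:
    """Toy hierarchy in which a term may have several broader terms."""

    def __init__(self):
        self.broader = defaultdict(set)   # term -> its broader terms
        self.narrower = defaultdict(set)  # term -> its narrower terms

    def add_broader(self, term, broader_term):
        """Place a term under a broader term without removing other parents."""
        self.broader[term].add(broader_term)
        self.narrower[broader_term].add(term)

    def ancestors(self, term):
        """Return every term reachable by following broader-term links upward."""
        seen = set()
        stack = [term]
        while stack:
            for parent in self.broader[stack.pop()]:
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


catalog = PolyhierarchicalThesaurus()
catalog.add_broader("Playstation 3", "Game consoles")
catalog.add_broader("Playstation 3", "Blu-Ray players")
print(sorted(catalog.narrower["Blu-Ray players"]))
print(sorted(catalog.broader["Playstation 3"]))
```

A shopper browsing either category reaches the same term record, which is the practical meaning of “Everything goes every place it fits.”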

Tim Soholt, Webmaster
Access Innovations, Inc.

The Thesaurus as a Domain Model in the Modern World

July 20, 2015  
Posted in Access Insights, Featured, Taxonomy

Source: Dreamstime

Nowadays, taxonomies and thesauri are used largely for web navigation and for information search and retrieval. It wasn’t always that way. In fact, it’s largely the digital information revolution that has made their use for information search and retrieval a vital necessity in research, business, and numerous other types of activities.

Taxonomies and thesauri are sometimes referred to as domain models. However, the term is often limited to graphic constructs specifically designed as visual tools for problem solving. As explained in the relevant Wikipedia article:

A domain model in problem solving and software engineering is a conceptual model of all the topics related to a specific problem. It describes the various entities, their attributes, roles, and relationships, plus the constraints that govern the problem domain. … The domain model is created in order to represent the vocabulary and key concepts of the problem domain. The domain model also identifies the relationships among all the entities within the scope of the problem domain, and commonly identifies their attributes.

Before the proliferation of thesauri and taxonomies for search, most taxonomies served, in effect, as domain models. As such, they reflected and furthered our understanding of the relationships of things and creatures and areas of knowledge. Even today, by their nature, most or perhaps all taxonomies and hierarchical thesauri are domain models. We just don’t use them that way, for the most part. The main exception is in the world of classification of biological organisms.

The history of taxonomies is replete with the names of naturalists and other scientists who strove to categorize the natural world through the use of hierarchical schemes. Even today, mention “taxonomy” to a biologist, and he or she is likely to think of one or more taxonomies that serve to categorize the members of some family or genus (or whatever) of plants and/or animals and/or other types of organisms. In such taxonomies, the focus of the categorization isn’t on reports or articles or books or videos about the organisms (although those taxonomies could certainly be used for that kind of categorization, and often are). Rather, the focus is on how the organisms themselves, as represented by one form or another of their names, are categorized within the taxonomy.

Much of this work was done in the 1700s and subsequent centuries. The earliest of these taxonomies were based on the work of the philosophers of ancient Greece and Rome. The aim of all these naturalists and philosophers was to better understand the world around them by modeling the domains of nature, using semantic methods of representing the individual concepts. We still use this semantic approach! It has stood the test of time, as has the hierarchical design.

These taxonomies helped people to understand their world. Might we not also use taxonomies, or better yet, their more complex version, hierarchical thesauri, as graphical tools to understand our world and perhaps to gain insight into and solve our problems?

Barbara Gilles, Taxonomist
Access Innovations, Inc.

IntegraCoder® Version 2.0, Powered by Data Harmony®, Now Available Through Major EMR Companies

July 13, 2015  
Posted in Access Insights, Featured, indexing

IntegraCoder Version 2.0, the medical coding application developed by Access Integrity, a division of Access Innovations, Inc., and coding experts Find-A-Code, is now available as a component in the offerings of some of the nation’s largest electronic medical record (EMR) companies. Medical providers will now have access to the latest version of this application based on Data Harmony, Access Innovations’ patented, award-winning software.

IntegraCoder is a web-based solution combining the technologies of Access Integrity and Find-A-Code. Access Integrity’s engine, which is powered by Data Harmony’s semantic analysis technology, analyzes content in EMRs and provides highly relevant diagnosis and procedure suggestions, while Find-A-Code provides exhaustive coding and revenue cycle/denial management resources. IntegraCoder’s indexing system recognizes key concepts within an EMR and delivers suggested codes for users to select from, which increases accuracy in clinical documentation.

Version 2.0 presents far-reaching improvements to the application:

  • An integrated edit capability, allowing providers to get more precise recommendations.
  • An ICD-10 readiness tool encompassing documentation and coding nuance.
  • Pre-claim code scrub for medical necessity.
  • Charge functionality seamlessly generates a claim submission with IntegraCoder.

“It’s exciting to have IntegraCoder version 2.0 in the marketplace,” says John Kuranz, CEO of Access Integrity. “The hard work that has been put in by all members of the team really shows in the product and how smoothly it integrates into the EMR system. IntegraCoder’s encounter note editing and research capabilities and ICD-10 tool set will greatly enhance the end-to-end patient engagement process.”

“The new version of IntegraCoder will give coders and billers the speed, efficiency, and instant access to critical coding data needed to keep up with the changes in medical coding,” explains Find-A-Code CEO LaMont Leavitt, “and adding the capability to seamlessly generate a claim from IntegraCoder’s results is a game changer, especially with the transition from the ICD-9 to ICD-10 code sets later this year.” Leavitt is confident that the integration of the two technologies will fulfill a long-time need in the practice management sector.

Jay Ven Eman, CEO of Access Innovations, the parent company of Access Integrity, remarks, “The latest version of IntegraCoder presents a great opportunity for disruption in the medical practice management sphere. We are very excited to be in this marketplace and expect great things to come from IntegraCoder.”



About Access Integrity
Access Integrity provides a patented technology for complete and compliant EMR analysis. Access Integrity plays an important role in medical transaction processing by extracting rule based relevant data (concept extractor) from medical records, increasing coding accuracy, clinical decision support, and overall understanding of a patient encounter. Access Integrity is the first company to employ Data Harmony’s semantic enrichment and rule-based concept extraction technology in the healthcare industry. The award-winning and world-renowned Data Harmony software suite has been used in the content management and information technology industries for more than 15 years.

About Find-A-Code, LLC

Find-A-Code, LLC is dedicated to providing the most complete medical coding and billing resource library available anywhere. Find-A-Code’s online libraries include extensive information for all major code sets (ICD-9, CPT®, HCPCS, DRG, APC, NDC, ICD-10 and more) along with a wealth of supplemental information such as newsletters and manuals (AHA Coding Clinic®, AMA CPT Assistant, DH Newsletters, Medicare Manuals). All code information and newsletter databases are indexed, searchable and organized for quick access and extensively cross-referenced. Find-A-Code also provides tools for code set translation (such as ICD-9 to ICD-10), code validation (edits) and claim scrubbing.

Smart Thesauri: Using Taxonomies with Linked Data

July 6, 2015  
Posted in Access Insights, Featured, Taxonomy

Note: A full version of this article is scheduled to be published in an upcoming issue of the IKO eNewsletter.

As interest in Linked Data (LD) continues to grow, many organizations—publishers, corporations, universities, libraries—are increasingly interested in strategies to jump-start LD initiatives. Any organization that has an existing taxonomy (or other controlled vocabulary) can expedite the move to LD by leveraging its existing semantic structures as a bridge to an advanced LD-based semantic strategy.

Why Linked Data?

There are three primary reasons motivating organizations to move towards LD:

  1. To use resources on the Web to enhance internal knowledge environments.

Once a link to an external data source (e.g., DBpedia or Wikidata) is established, references to other content—Wikipedia articles, definitions, images, social media and news feeds, and other information—can be queried off and added to internal resources to enrich content or Web portals.

  2. To add backlinks to internal resources and content to LD portals, thereby pointing to an organization as an authority on a topic or topics.

Adding reciprocal links from LD sources to content (or publicly available web resources) enables other LD users to find and reference an organization’s expertise on a given subject.

  3. To add an organization’s expertise to the growing Semantic Web of knowledge and information, i.e., to contribute to the sum of available human knowledge.

More idealistically, many LD enthusiasts are motivated by the goal of adding their curated and well-formed specialized knowledge to the expanding network of LD sources—in other words, attaching their datasets and knowledge organization systems (KOS) to the LD community.

Figure 1: The Ubiquitous LD Graph Diagram. Linking Open Data cloud diagram 2014, by Max Schmachtenberg, Christian Bizer, Anja Jentzsch and Richard Cyganiak. Reproduced here under the terms of the Creative Commons Attribution-ShareAlike 3.0 Unported license (CC BY-SA 3.0).

Note that in this diagram (as in most such graphics) DBpedia still occupies the central position, as it is the most robust and oft-linked universal (i.e., not subject-specific) LD source. Although it has its shortcomings, DBpedia is still the best choice for stable LD URIs, and this is not likely to change anytime soon; accordingly, we will use DBpedia as the sample LD source in this article.

From Taxonomy to Linked Data

Although DBpedia Spotlight is good at recognizing existing LD concepts in a block of text, in most cases it doesn’t have the robust synonymy and/or rule-based concept extraction used in most semantic platforms. Additionally, an organization with a well-formed taxonomy and rich semantic strategy will already have indexed content—possibly a very large volume of content—so “re-indexing” a large legacy dataset using DBpedia Spotlight is an unwieldy proposition.

Instead, asserting a link between terms in an existing taxonomy and the corresponding concepts in an LD source makes it unnecessary to run legacy content through Spotlight (or a similar LD-matching service), since the link is asserted at the thesaurus level instead of the document level. This is much more efficient and provides a simpler mechanism to curate and establish LD links in the future.

Sample Process

Imagine that you have a large set of content about physics—you could be a publisher, laboratory, or research organization—and that you have an existing, well-formed taxonomy of physics concepts and wish to pursue LD. Basically, you want to assert that each term (more on this later) in your taxonomy corresponds to a DBpedia URI on the same topic.

For example, you might have a term (or branch) in your vocabulary called “Optics”. The first step is to add a field to the term record in your taxonomy management system (most commercial taxonomy applications have a mechanism for this) to hold a URI; this could be a dedicated ’live’ URI field or just a simple text field. Next, in this field for your term “Optics”, you add the corresponding DBpedia URI:

…and you’re done.[1]
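
As a rough illustration of what this looks like in data terms, here is a minimal Python sketch. The `TermRecord` class, the `ld_uri` field name, and the document IDs are all hypothetical, and the URI shown simply follows the usual DBpedia resource pattern; your taxonomy management system will have its own record structure.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TermRecord:
    """One taxonomy term, with an added field to hold a Linked Data URI."""
    label: str
    broader: Optional[str] = None  # label of the Broader Term, if any
    ld_uri: Optional[str] = None   # the new field described above


# Illustrative value only: DBpedia resource URIs generally take this form.
optics = TermRecord(
    label="Optics",
    broader="Physics",
    ld_uri="http://dbpedia.org/resource/Optics",
)

taxonomy: Dict[str, TermRecord] = {optics.label: optics}

# Legacy index: documents already tagged with taxonomy terms (hypothetical IDs).
document_index: Dict[str, List[str]] = {
    "doc-001": ["Optics"],
    "doc-002": ["Optics"],
}


def ld_uris_for(doc_id: str) -> List[str]:
    """Resolve a document's LD links through its terms, with no re-indexing."""
    uris = []
    for label in document_index.get(doc_id, []):
        record = taxonomy.get(label)
        if record is not None and record.ld_uri:
            uris.append(record.ld_uri)
    return uris
```

Because the URI lives on the term record, every document already indexed with “Optics” picks up the link without being touched.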

You have now achieved something like this:


Figure 2: Connecting Resources on the Same Topic Using LD

Using a SPARQL endpoint, you can now query information from the DBpedia page or use it to reference or link to other pages from DBpedia. For example, if you have a topical web portal on Optics, you could automatically add definitions, images, news, social media feeds, or links to other publishers with Optics content.
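
A query of this kind might be assembled as in the sketch below. It only builds the query text and request URL; it does not send anything. We assume the public DBpedia endpoint at https://dbpedia.org/sparql, the `dbo:abstract` property, and a `format` request parameter for JSON results; check the endpoint's current documentation before relying on any of these. The function names are our own.

```python
from urllib.parse import urlencode

# Assumption: the public DBpedia SPARQL endpoint.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"


def abstract_query(resource_uri: str, lang: str = "en") -> str:
    """Build a SPARQL query asking for the abstract of one resource."""
    return (
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "SELECT ?abstract WHERE {\n"
        f"  <{resource_uri}> dbo:abstract ?abstract .\n"
        f'  FILTER (lang(?abstract) = "{lang}")\n'
        "}"
    )


def request_url(query: str) -> str:
    """Encode the query as a GET URL (the 'format' parameter is an assumption)."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return f"{DBPEDIA_ENDPOINT}?{urlencode(params)}"


# The URI would normally be read from the term record's URI field.
query = abstract_query("http://dbpedia.org/resource/Optics")
url = request_url(query)
```

The resulting URL can be fetched with any HTTP client, and the returned abstract could serve as the definition on a topical Optics page.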

Automation, Problems, and Quality Control

Can this process—matching terms from your taxonomy to DBpedia—be automated? Yes, but it requires careful quality control. We have found the most success using Spotlight as a starting point and validating the results by hand; this is faster than matching each term manually, and at the same time ensures accuracy.

As mentioned earlier, Spotlight is better at matching blocks of text (leveraging semantic proximity) than single concepts from a taxonomy, but it’s accurate enough to decrease some of the effort of manual matching.
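
One way to organize the hand validation is to turn the machine suggestions into a review worksheet. The sketch below is only an illustration under our own assumptions: the `Candidate` structure and its fields are hypothetical and do not reflect Spotlight's actual output format. Nothing is accepted automatically; exact label matches are merely flagged so an editor can confirm them quickly.

```python
from typing import Dict, List, NamedTuple


class Candidate(NamedTuple):
    term: str       # label from your taxonomy
    uri: str        # URI suggested by the matching service
    uri_label: str  # human-readable label for that URI


def build_review_queue(candidates: List[Candidate]) -> List[Dict[str, str]]:
    """Turn machine suggestions into rows for a human editor to validate."""
    rows = []
    for c in candidates:
        exact = c.term.strip().lower() == c.uri_label.strip().lower()
        rows.append({
            "term": c.term,
            "suggested_uri": c.uri,
            "hint": "labels match" if exact else "check closely",
            "status": "pending review",  # every row still needs a human decision
        })
    # Quick confirmations first, harder cases after (the sort is stable).
    return sorted(rows, key=lambda r: r["hint"] != "labels match")


queue = build_review_queue([
    Candidate("Photonic metamaterials", "http://dbpedia.org/resource/Optics", "Optics"),
    Candidate("Optics", "http://dbpedia.org/resource/Optics", "Optics"),
])
```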

The primary sticking point, however, is that any specialized thesaurus will be more granular than DBpedia is. Your taxonomy on Physics, in the thought experiment above, is going to have far more specific terms than DBpedia in many, many places. The top two or three levels of your taxonomy will probably have corresponding LD pages, but the more specific topics will not.

For example, there’s a robust DBpedia page on Optics, as well as one on Nonlinear optics; more specific topics within Nonlinear optics, however, are far less likely to have a corresponding page (e.g., “Photonic metamaterials” has no corresponding LD page we’ve found so far—and many scientific and technical vocabularies get even more granular than this).

Possible solutions to this problem are as follows:

  • Forget it for the time being and check back later to see whether a corresponding page emerges.

This is the easiest option, but does not accomplish much.

  • “Roll up” more granular topics to the next-nearest Broader Term in the taxonomy.

This procedure at least provides LD pages for every topic in the taxonomy, but leaves much to be desired; every term in a large branch might point to the same LD page, which is not particularly useful.

  • Proactively add new DBpedia pages on not-yet-existing topics, and add the backlink to your content/vocabulary as the first link; the Web should come and fill in the blanks eventually.

Attractive, altruistic, and useful, though far more time consuming, this option is ideal from an information science perspective but may not be practical in the scope of your LD initiative.
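
The “roll up” option can be sketched as a walk up the Broader Term chain until a term with an LD link is found. The branch below is hypothetical and the function name is our own; the URIs follow the usual DBpedia resource pattern.

```python
from typing import Dict, Optional, Tuple

# Hypothetical branch: each term points to its Broader Term (None at the top).
broader: Dict[str, Optional[str]] = {
    "Physics": None,
    "Optics": "Physics",
    "Nonlinear optics": "Optics",
    "Photonic metamaterials": "Nonlinear optics",
}

# Terms that already have a validated LD link.
ld_uri: Dict[str, str] = {
    "Optics": "http://dbpedia.org/resource/Optics",
    "Nonlinear optics": "http://dbpedia.org/resource/Nonlinear_optics",
}


def rolled_up_uri(term: str) -> Optional[Tuple[str, str]]:
    """Return (uri, source_term) from the term itself or its nearest ancestor."""
    seen = set()
    current: Optional[str] = term
    while current is not None and current not in seen:
        seen.add(current)  # guards against accidental cycles in the hierarchy
        if current in ld_uri:
            return ld_uri[current], current
        current = broader.get(current)
    return None  # no term in the chain has an LD link
```

Returning the source term alongside the URI makes it visible when a link was inherited rather than asserted directly, which helps when checking back later for pages that have since appeared.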

Margie Hlava, President and Bob Kasenchak, Product Development
Access Innovations



For more information, we recommend David Wood, Marsha Zaidman, Luke Ruth, and Michael Hausenblas, Linked Data: Structured data on the Web (Shelter Island, NY: Manning Publications, 2013).

[1] A subsequent step might involve adding a backlink to your topical/library page on Optics to the DBpedia dbpedia-owl:wikiPageExternalLink field, if you want your content to be publicly available.

ICD-10 and Taxonomies

June 29, 2015  
Posted in Access Insights, Featured, Taxonomy

In Canada, the implementation of ICD-10 caused an approximately 50% drop in coding productivity in the health profession. Reports show that a year after implementation, productivity had returned to only 80% of the original ICD-9 baseline. Now, more than a decade after Canada adopted the new coding system, productivity has yet to reach its original level.

This reduction in productivity is understandable. ICD-10 is vastly more complex than ICD-9, and people sometimes have difficulty adapting to change. It’s naturally going to cause fear in the industry. The 3-page bill proposed to the U.S. Congress by Rep. Ted Poe (R-TX), HR 2126, would allay those fears for a while by delaying U.S. implementation of ICD-10, which has already been postponed several times. As noted in the referenced article, however, neither committee has decided to hear the bill, so yes, it seems highly likely that ICD-10 implementation will occur in the United States on October 1, as (currently) scheduled.

But tell me: do we really need to know that someone was “bitten by a turtle” (W59.21XA) rather than “struck by a turtle” (W59.22XA)? And if the person was bitten on the toe by that turtle, do we really need to know whether it was the right big toe (S90.471A) or the left big toe (S90.472A)? What if the incident happened, for some reason, while the person was waterskiing (Y93.17)? Do we need to know that? It is definitely helpful to know something about laterality (which side the injury was on) and about the person’s activity when the injury occurred. It is also nice to know how many people a turtle bit this year, as well as how many waterskiing accidents there were.

In another (less funny but more likely) example, someone driving a pickup strikes a lamppost while texting. The person’s head strikes the dashboard, causing a contusion (S00.83XA), while the truck’s airbag expands and strikes them on the right side of the chest, causing a contusion there (S20.211A). That it happened in a pickup matters (V57.5XXA), as does the fact that they were texting at the time (Y93.C2). This is all highly useful information, but does it have to be captured in a pre-coordinate, highly complex classification system? The format of the ICD-10 codes, and for that matter the ICD-9 codes, is quite old-fashioned in a post-coordinate world.

A taxonomy (and it would be a large one) would provide the desired data better, and in a much more flexible form, than ICD-10. Just a single laterality rule for left and right would remove a goodly portion of the listed codes. If it is in the interest of the Centers for Medicare and Medicaid Services (CMS) to gather more data about health and health care in the USA, then why not apply some proven, inexpensive, easy-to-implement algorithms rather than cause a Y2K-style panic throughout the entire healthcare industry?

In example one above, we could code for “turtle,” “toe,” “bite,” and even water or waterskiing. If any of those items were missing in the source data, i.e., the electronic medical record (EMR), we could still do a good job of determining the cause of the bite and collating the data for later reporting and retrieval. It would be possible to mine the data more effectively than with the classification system, since the data would be disaggregated and available. For example, all kinds of water-related accidents could be retrieved, as could all turtle-related injuries, bite-related injuries, etc.
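
To make the post-coordinate idea concrete, here is a small Python sketch. The record IDs and concept labels are hypothetical, and the code is an illustration of the retrieval principle rather than a description of any production system. Each incident is stored as a set of simple concepts, with laterality as a single “left” or “right” tag instead of a separate code for every body part and side.

```python
from typing import Dict, List, Set

# Hypothetical post-coordinate records: one set of simple concepts per incident.
incidents: Dict[str, Set[str]] = {
    "emr-1": {"turtle", "bite", "toe", "right", "waterskiing"},
    "emr-2": {"turtle", "strike", "toe", "left"},
    "emr-3": {"contusion", "chest", "right", "pickup", "texting"},
}


def find(*concepts: str) -> List[str]:
    """Return the IDs of incidents tagged with every requested concept."""
    wanted = set(concepts)
    return sorted(rec for rec, tags in incidents.items() if wanted <= tags)


turtle_injuries = find("turtle")          # any turtle-related incident
turtle_bites = find("turtle", "bite")     # concepts combined at query time
right_side = find("right")                # one laterality tag spans all body parts
```

The combinations are made at retrieval time, so a missing concept in one record narrows what that record can answer without invalidating the rest of its data.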

Going one step further, one could even link the data as RDF triples for a full semantic enrichment and ontology approach, which would surface all kinds of fascinating relationships. One could then visualize the data in various presentations for quick understanding of how much danger turtle bites and waterskiing pose to the general populace.
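
As a minimal sketch of what such triples might look like, the Python below writes one incident as N-Triples lines. The `http://example.org/` namespace, the property names, and the function name are placeholders of our own, not an established vocabulary.

```python
from typing import Dict, List

# Placeholder namespace; a real project would use its own or a published one.
EX = "http://example.org/injury/"


def incident_triples(incident_id: str, facets: Dict[str, str]) -> List[str]:
    """Return one N-Triples line per (property, concept) pair for an incident."""
    subject = f"<{EX}incident/{incident_id}>"
    return [
        f"{subject} <{EX}{prop}> <{EX}concept/{value}> ."
        for prop, value in sorted(facets.items())
    ]


lines = incident_triples("emr-1", {"agent": "turtle", "mechanism": "bite"})
```

Once the concepts are URIs rather than strings, the same “turtle” node can connect injury records to any other dataset that refers to it.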

Access Integrity has built a system to review EMRs and instantly provide a suggested list of ICD-10 codes, just as it has done for ICD-9. For this, ICD-10 is certainly a good move forward. We still maintain that a human should review and make the final selection of the codes submitted for billing, due to the complexity of the classification system, the ambiguities in the EMR as written by the healthcare provider, and of course the liabilities in a litigious world. For those of us in the information business, more data is always better than less, so even if ICD-10 is imperfect, and it is, there is no doubt that it’s a step in the right direction.

Marjorie M.K. Hlava
President, Access Innovations

TaxoDiary Blog Achieves Milestones in Sharing Information

June 22, 2015  
Posted in Access Insights, Featured, Taxonomy

The TaxoDiary blog, which adds new posts every Monday through Friday, recently published its 3,000th blog post.

TaxoDiary was started in August 2010 and is sponsored and maintained by Access Innovations, Inc., under the leadership of Access Innovations President Marjorie Hlava and Chief Executive Officer Jay Ven Eman.

TaxoDiary covers all types of knowledge organization systems (KOS) and related subjects, with daily posts sharing news and opinions. “Monday features” dig a little deeper into taxonomies, semantic technology, and other key areas of interest. There are over 200 of these feature articles that have been researched, written, and shared by the team at Access Innovations.

“The blog is designed as an information vehicle that highlights news of interest and topics that content professionals face daily,” Dr. Ven Eman commented. “Networking and providing access to methods of making content findable are helpful to us all.”

TaxoDiary is designed to provide taxonomists, indexers, and content professionals with news and opinions about categorization and about applying KOS and controlled vocabularies to increase the findability of information objects within or across large collections of information, whether structured in databases or unstructured in content repositories.

Subscribing to TaxoDiary will deliver the posts directly to email, either as they are posted or as a daily summary.


About Access Innovations, Inc.

Access Innovations has extensive experience with Internet technology applications, master data management, database creation, thesaurus/taxonomy creation, and semantic integration. Access Innovations’ Data Harmony software includes machine aided indexing, thesaurus management, an XML Intranet System (XIS), and metadata extraction for content creation developed to meet production environment needs. Data Harmony is used by publishers, governments, and corporate clients throughout the world.
