A Celebration of Roget’s Taxonomy

January 12, 2015  
Posted in Access Insights, Featured

Later this week is January 18th, which for taxonomists is notable for two things: 1) it’s Thesaurus Day; and 2) it’s the birthday of Peter Mark Roget. This double occurrence is no coincidence. We may consider Doctor Roget to be the inventor of the thesaurus (or at least one of its pioneers), and a person whose birthday is cause for taxonomists’ celebration.


Yes, this is the man who compiled the first “Thesaurus of English Words and Phrases.” He started writing it in 1805 but didn’t have it published until much later, in 1852. The full title of the first edition was Thesaurus of English Words and Phrases, Classified and Arranged so as to Facilitate the Expression of Ideas.


Photo, http://www.pbagalleries.com/view-auctions/catalog/id/339/lot/104070/?url=%2Fview-auctions%2Fcatalog%2Fid%2F339%3Fcat%3D9

Did you catch the “Classified” part of the title? And the “Arranged”?

Most people think of Roget’s thesaurus as a simple list of words and their synonyms. This is understandable, as some of the more recent synonymies that include “thesaurus” in their titles really are just strictly alphabetical lists of words, annotated with some synonyms. Taxonomists sometimes consider Roget’s synonym resource to be very different from modern taxonomic thesauri. After all, hasn’t it always lacked any sort of classification scheme?

No, no, no.

Roget, a habitual list maker since the age of eight, recognized that the full potential of a lengthy vocabulary could not be realized without some sort of categorization or classification of its entries. Classification was an intrinsic part of Roget’s compilation of synonyms throughout its long development.

As he explained in the preface to the first edition of the Thesaurus of English Words and Phrases: 

“It is now nearly fifty years since I first projected a system of verbal classification similar to that on which the present work is founded. Conceiving that such a compilation might help to supply my own deficiencies [as a writer], I had, in the year 1805, completed a classed catalogue of words on a small scale, but on the same principle, and nearly in the same form, as the Thesaurus now published. I had often during that long interval found this little collection, scanty and imperfect though it was, of much use to me in literary composition, and often contemplated its extension and improvement; but a sense of the magnitude of the task, amidst a multitude of other avocations, deterred me from the attempt. Since my retirement from the duties of Secretary to the Royal Society, however, finding myself possessed of more leisure, and believing that a repertory of which I had myself experienced the advantage might, when amplified, prove useful to others, I resolved to embark in an undertaking which, for the last three or four years, has given me incessant occupation.” (“Roget’s Thesaurus: The Original Manuscript”)

Part of Roget’s classification efforts involved choosing a single term to represent each concept, rather than repeating each synonym in some other part of the list. This is akin to modern taxonomic thesauri, in which each concept is represented by only one term, and alternative ways of expressing that concept are indicated in the term record as non-preferred terms. Roget’s approach was oriented toward findability of a concept through the choice of words that users were most likely to associate with particular concepts.
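The one-concept, one-term principle is easy to see in miniature. The Python sketch below is a hypothetical illustration, not a model of Roget’s book or of any particular thesaurus software; the `TermRecord` and `Vocabulary` names and the sample entries are assumptions. Each alternative wording is stored once and resolves to a single preferred term, and a wording that already leads to a different concept is rejected rather than silently reassigned.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TermRecord:
    """One concept: a single preferred term plus its non-preferred variants."""
    preferred: str
    non_preferred: list[str] = field(default_factory=list)


class Vocabulary:
    """Hypothetical vocabulary that routes every entry form to one preferred term."""

    def __init__(self) -> None:
        self.records: dict[str, TermRecord] = {}
        # Maps each lowercased entry form (preferred or not) to its preferred term.
        self._use: dict[str, str] = {}

    def add(self, record: TermRecord) -> None:
        forms = [record.preferred, *record.non_preferred]
        # Validate first so a rejected record leaves the vocabulary unchanged.
        for form in forms:
            owner = self._use.get(form.lower())
            if owner is not None and owner != record.preferred:
                # A wording may lead to only one concept; otherwise it is ambiguous.
                raise ValueError(f"{form!r} already leads to {owner!r}")
        self.records[record.preferred] = record
        for form in forms:
            self._use[form.lower()] = record.preferred

    def resolve(self, wording: str) -> str | None:
        """Return the preferred term for any known wording, else None."""
        return self._use.get(wording.lower())


vocab = Vocabulary()
vocab.add(TermRecord("Courage", ["Bravery", "Valor", "Fortitude"]))
print(vocab.resolve("valor"))    # Courage
print(vocab.resolve("Courage"))  # Courage
```

Lowercasing every form is a deliberate simplification; a real vocabulary would also need to cope with acronyms and other case-sensitive forms.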

Beyond that, though, the overall structure of the thesaurus was hierarchical. The table of contents of Project Gutenberg’s presentation of Roget’s thesaurus shows the organization of the book into six main classes, with numerous subdivisions. Wikipedia provides an “Outline of Roget’s Thesaurus” that shows the hierarchy to a depth of seven levels; this resource also includes links from many of the categories to relevant Wikipedia articles, as does the related Wiktionary resource “Appendix: Roget’s thesaurus classification”.

Roget crafted the thesaurus categories and subdivisions according to principles set out by some eminent philosophers, as explained in the Wikipedia article on “Roget’s Thesaurus”:

“Each class is composed of multiple divisions and then sections. This may be conceptualized as a tree containing over a thousand branches for individual “meaning clusters” or semantically linked words. These words are not exactly synonyms, but can be viewed as colours or connotations of a meaning or as a spectrum of a concept. One of the most general words is chosen to typify the spectrum as its headword, which labels the whole group.

“Roget’s schema of classes and their subdivisions is based on the philosophical work of Leibniz (see Leibniz—Symbolic thought), itself following a long tradition of epistemological work starting with Aristotle. Some of Aristotle’s Categories are included in Roget’s first class “abstract relations”.”

So was Roget an inventor? An originator? A pioneer? Consider these eclectic accomplishments:

  • He invented the log-log slide rule, which greatly simplified exponential and root calculations.
  • He designed a pocket chessboard and invented several chess problems.
  • He made insightful observations about the perception of motion, thus contributing to the development of mechanical animation devices and, more importantly, to the early development of cinema.
  • He helped found the wonderfully named Society for the Diffusion of Useful Knowledge.
  • He was a co-founder of the Medical and Chirurgical Society of London, the forerunner of the Royal Society of Medicine.
  • He was the first Fullerian Professor of Physiology at the Royal Institution.
  • He helped establish the University of London.
  • He compiled Roget’s Thesaurus, which writers still use to perfect their prose.
  • He developed a classification approach that set an example for modern taxonomists and thesaurians.

Yes, I think we can conclude that Peter Mark Roget was an inventor, an originator, and a pioneer. And a thesaurian, of course. And yes, a taxonomist.

All good reason to celebrate his birthday on Thesaurus Day!

Barbara Gilles, Taxonomist
Access Innovations, Inc.

Marjorie M.K. Hlava’s Taxobook Published by Morgan & Claypool

January 5, 2015  
Posted in Access Insights, Featured, Taxonomy

Access Innovations, Inc. is proud to announce the publication of The Taxobook, a three-volume series on taxonomies and thesauri, written by Marjorie M.K. Hlava, president of Access Innovations. The three volumes are part of a larger series, Synthesis Lectures on Information Concepts, Retrieval, and Services, edited by Gary Marchionini, Dean of the School of Information and Library Science, University of North Carolina at Chapel Hill, and published by Morgan & Claypool Publishers.

Volume 1, The Taxobook: History, Theories, and Concepts of Knowledge Organization, introduces the foundations of classification, covering theories from the ancient Greek philosophers to modern thinkers. This volume also includes a glossary that covers all three volumes.

Volume 2, The Taxobook: Principles and Practices of Taxonomy Construction, outlines the basic principles of creation and maintenance of taxonomies and thesauri. It also provides step-by-step instructions for building a taxonomy or thesaurus and discusses the various ways to get started on a taxonomy construction project.

Volume 3, The Taxobook: Applications, Implementation, and Integration in Search, covers putting taxonomies into use in as many ways as possible to maximize retrieval for users.

“This book has been a labor of love for me,” said Ms. Hlava. “I believe firmly in the value of taxonomies and their place within information systems, and I have wanted to share my thoughts with a larger audience for some time. I hope these books will contribute to a better understanding of the different ways taxonomies can be implemented and why information management professionals should embrace them.”

“Margie Hlava’s lectures on taxonomy pack a lifetime of experience creating vocabularies for corporations and organizations into narrative and case studies that will delight researchers and teachers and inspire students,” remarked Gary Marchionini. “Her love of language and organizational structure comes through in every chapter of the work.”

“It is our pleasure to have Margie Hlava as a Morgan & Claypool author!” commented Diane Cerra of Morgan & Claypool. “She and her Access Innovations team have made a much needed contribution to our publishing program, and to the community at large. These volumes will serve practitioners for many years to come. In addition, Margie and her group are a joy to work with: a personable, responsible, and responsive team that enabled us to quickly produce this collection.”

The books are available through Morgan & Claypool Publishers in either online or print format.


About Access Innovations, Inc. – www.accessinn.com, www.dataharmony.com, www.taxodiary.com

Founded in 1978, Access Innovations has extensive experience with Internet technology applications, master data management, database creation, thesaurus/taxonomy creation, and semantic integration. Access Innovations’ Data Harmony software includes automatic indexing, thesaurus management, an XML Intranet System (XIS), and metadata extraction for content creation developed to meet production environment needs. Data Harmony is used by publishers, governments, and corporate clients throughout the world.

Raising a Glass to Rich and Complicated Words

December 29, 2014  
Posted in Access Insights, Featured, Taxonomy, Term lists

Photo, by Wikimedia user UFA66, of an artwork titled “Hip, Hip, Hurrah!” by Danish painter P.S. Krøyer, 1888, at the Gothenburg Museum of Art in Sweden. CC BY-SA 3.0. http://commons.wikimedia.org/wiki/File:Hipp_hipp_hurra!_Konstn%C3%A4rsfest_p%C3%A5_Skagen_-_Peder_Severin_Kr%C3%B8yer.jpg

As the new Gregorian year approaches, those of us who are taxonomists look forward with renewed anticipation to the taxonomic challenges that certain kinds of words bring. Take “glass”, for example.

“Glass” is one of those words that contain an abundance of possible meanings, and that abundance creates the potential for ambiguity. What makes this particular situation ironic is that the ambiguity clouds the very clarity that the word often symbolizes. Ambiguous words are tricky to work with in constructing and developing taxonomies and thesauri. They also make effective indexing rules harder to write; because those rules must disambiguate, they need to be crafted with particular care.

Another basic challenge is posed by words representing concepts that fit (sometimes neatly, and sometimes, not so neatly) into a variety of categories, and that can be subdivided (again, sometimes neatly, and sometimes, not so neatly) into a variety of sub-categories or sub-concepts. Glass is one of those words, too.

When we raise a glass, or see one as half full or half empty (your choice), that glass is a drinking vessel. A looking glass is (or at least used to be) a mirror. When we venture into the plural form, “glasses” are often understood to be eyeglasses. And search engines have their own opinions, it seems: when I googled “glass”, the top hit offered a very different interpretation, reflecting a certain bias on the part of a well-known search engine. Glass also has its artistic side, especially with stained glass. Getting down to basics, and to what many of the various meanings have in common, glass is a kind of material.
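The senses listed above are exactly what an indexing rule for “glass” has to choose among. As a rough sketch only (this is not the rule syntax of any particular indexing product, and the concept labels, cue words, and five-word window are assumptions), a context-based rule might assign a sense when a cue word appears near the trigger word, and otherwise fall back to the material sense:

```python
from __future__ import annotations

import re

# Hypothetical indexing rules for the trigger word "glass": each candidate
# concept is assigned when one of its cue words occurs near the trigger.
# The concept labels and cue words are illustrative, not from a real vocabulary.
GLASS_RULES: dict[str, set[str]] = {
    "Drinking vessels": {"wine", "drink", "toast", "beer", "water"},
    "Mirrors": {"looking", "mirror", "reflection"},
    "Magnifiers": {"magnifying"},
    "Eyeglasses": {"lens", "lenses", "optician", "prescription", "frames"},
    "Google Glass": {"google", "wearable", "headset"},
    "Stained glass": {"stained", "window", "leaded", "chapel"},
}
# Fallback when no cue fires: the sense most meanings have in common.
DEFAULT_CONCEPT = "Glass (material)"
TRIGGERS = {"glass", "glasses"}


def index_glass(text: str, window: int = 5) -> set[str]:
    """Return the concepts to index for each occurrence of 'glass' in text."""
    words = re.findall(r"[a-z]+", text.lower())
    concepts: set[str] = set()
    for i, word in enumerate(words):
        if word not in TRIGGERS:
            continue
        # Look only at the words within `window` positions on either side.
        nearby = set(words[max(0, i - window): i + window + 1])
        matched = {concept for concept, cues in GLASS_RULES.items() if cues & nearby}
        concepts |= matched or {DEFAULT_CONCEPT}
    return concepts


print(index_glass("She raised a glass of wine to the new year."))
# {'Drinking vessels'}
```

A real rule base would need many more cues, and exclusions as well: in this sketch, a plain window of ordinary glass would wrongly be indexed as stained glass.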

A glass – a drinking vessel

Photo by Derek Jensen, http://commons.wikimedia.org/wiki/File:Glass-of-water.jpg

Looking glass – a reflection on ourselves

Illustration by John Tenniel, for Lewis Carroll’s Through the Looking-Glass, http://commons.wikimedia.org/wiki/File:Aliceroom3.jpg

Magnifying glass – the bigger to see with

Photo, http://www.basilrathbone.net/films/#Sherlock

Glasses/eyeglasses/spectacles (Spectacles? There’s some big ambiguity there.) – the better to see with

Photo, http://www.therpf.com/f9/doctor-who-10th-doctors-glasses-have-they-been-ided-26658/

Google Glass – for when what you normally see (and hear) isn’t enough by itself

Photo by Dan Leveille, http://commons.wikimedia.org/wiki/File:Google_Glass_photo.JPG, CC BY-SA 3.0

Stained glass – for when stained means pretty

Photo by Kate Jewell, of a stained glass window designed by Ronald Whiting and constructed by Chapel Studios of Kings Lynn, http://www.geograph.org.uk/photo/1461459, CC BY-SA 2.0

Glass – a kind of material (getting down to basics)


But what kind of material? Glass as a material presents several categorization challenges. For one thing, artists, art collectors, historians, engineers, and materials scientists deal with a large variety of glass materials. What most of us think of simply as glass, scientists classify as soda-lime glass. The Wikipedia category page on glass types lists forty-two separate articles, and it is by no means exhaustive.

And then there is the matter of state of matter. Is glass a liquid or a solid (presumably at room temperature)? The consensus has gone back and forth. For a long time (before the complications of solid-state physics), it was assumed to be a solid. Then some scientists speculated that it was a supercooled liquid. Supposed evidence came from antique windowpanes, many of which were thicker at the bottom; this was taken as proof that the glass was flowing downward, ever so slowly. This persistent urban myth was solidly debunked only twenty-five years ago, in an article in the Journal of Chemical Education, although it still pops up in textbooks and science classes. Later, in 2000, the same journal published an article with the wonderful title “Glass Doesn’t Flow and Doesn’t Crystallize and It Isn’t a Liquid.”

Well, if glass doesn’t crystallize, it isn’t a solid, either, at least not in the traditional solid-state physics sense. So what is it?

Here’s one answer (of sorts) that I found on a University of California, Riverside webpage:

“There is no clear answer to the question “Is glass solid or liquid?” In terms of molecular dynamics and thermodynamics it is possible to justify various different views that it is a highly viscous liquid, an amorphous solid, or simply that glass is another state of matter that is neither liquid nor solid. The difference is semantic. In terms of its material properties we can do little better. There is no clear definition of the distinction between solids and highly viscous liquids. All such phases or states of matter are idealisations of real material properties. Nevertheless, from a more common sense point of view, glass should be considered a solid since it is rigid according to everyday experience.”

Well, that was kind of iffy. But then I found that a team of scientists recently tried to settle the question by performing a series of experiments and measurements on a chunk of amber (yes, amber turns out to be a kind of glass) that was twenty million years old.

Photo of Dominican amber by Brocken Inaglory, http://commons.wikimedia.org/wiki/File:Ant_in_amber1.jpg, CC BY-SA 3.0

Their report, at http://www.nature.com/ncomms/journal/v4/n4/full/ncomms2809.html, has been summarized in an io9 blog post by George Dvorsky as follows:

“The team performed a series of calorimetric and stress relaxation experiments on the Dominican amber. They measured its relaxation times (intermolecular rearrangements) at various temperatures, including above its fictive temperature. The team observed that the amber relaxation times did not diverge — meaning that it couldn’t possibly be a kind of fluid.”

Okay, so it’s a solid, I thought. But then, I came across a fairly recent news article with this comment:

“The solids catalog used to be pretty straightforward. Solid stuff was either a crystal or a glass. Crystals fill up space with atoms or molecules in specific, fairly rigid patterns. The positions of the atoms are fixed such that if you take any section of pure crystal and slide it up, down, in, out or sideways a given distance, it will fit perfectly in the new position. That’s translational symmetry. You can also spin the crystal through certain angles and the atoms also will line up; that’s rotational symmetry.

“Glasses have neither symmetry. They’re just a random arrangement of their components, as if you’d taken a liquid and suddenly frozen everything in place without giving the atoms a chance to get in order. Which, in fact, is how metallic glasses are made.”

Okay, so glass isn’t a liquid, and it isn’t a crystalline solid. So what is it, really?

The modern scientific consensus seems to be that glass is a special kind of non-crystalline solid. As explained in Wikipedia’s “Amorphous solid” article:

“In condensed matter physics and materials science, an amorphous (from the Greek a, without, morphé, shape, form) or non-crystalline solid is a solid that lacks the long-range order characteristic of a crystal. In some older books, the term has been used synonymously with glass. Nowadays, “amorphous solid” is considered to be the overarching concept, and glass the more special case: A glass is an amorphous solid that exhibits a glass transition. Polymers are often amorphous. Other types of amorphous solids include gels, thin films, and nanostructured materials.”

So non-crystallinity is a defining characteristic of glass. Here we encounter another irony, and another potential source of ambiguity. Fine drinking glasses used to be made of “lead crystal”, but paradoxically, lead crystal is actually a kind of glass, and not one bit crystalline. Nor are its modern successors, known as crystal glass or lead-free crystal, which substitute a safer oxide for the lead oxide. So when you’re raising a fancy glass, rest assured that it’s really a glass (unless it’s plastic or some such).

Let’s raise that glass to the wonderful richness and complexity of words! Cheers!

Barbara Gilles, Taxonomist
Access Innovations

The Ghosts of Concepts Future

The bell struck twelve.


The Phantom slowly, gravely, silently, approached. … It was shrouded in a deep black garment, which concealed its head, its face, its form, and left nothing of it visible save one outstretched hand. But for this it would have been difficult to detach its figure from the night, and separate it from the darkness by which it was surrounded.

(From A Christmas Carol in Prose; Being a Ghost Story of Christmas, by Charles Dickens.)

And so it is with emerging concepts, those concepts whose forms we can but vaguely discern at the present point in time, whose true reality lurks in the future.

As taxonomists, we have a responsibility to discern those future concepts, although they may still be invisible to most. When such concepts surface in search logs under various expressions, we can keep those expressions from being dismissed as vocabulary candidates simply because they are still infrequent. In a taxonomy or thesaurus, we can provide labels that consolidate the indexing for a concept for which researchers have not yet settled on a name. In some cases, especially with widely used vocabularies, we may even help determine the standard name by which a concept will come to be known.

This role is itself one of the emerging responsibilities of taxonomists. In “What Next, Taxonomy?” (posted on The Taxonomy Blog on November 4, 2011), taxonomist Marlene Rockmore concludes that taxonomists need to deal with emerging technologies in a variety of ways, including collecting relevant content:

“So what next, taxonomy? What is nice to hear is that more taxonomists are surviving because their organizations understand their core roles. What’s the emerging topics and challenges –  how to distribute and decentralize (localize)  while having authority and control, how to collect new content on emerging, current topics, visualization, how to be more agile, how to fit in with new technologies like social media, mobile, and big data. Phew! That’s a challenge. Taxonomists have a chance to build relationships not only between terms, but with stakeholders on the way to a compelling, visualized, multidimensional content strategy. Good luck.”

This challenge has been growing in step with the rapid advances in science and technology. One example among the many advances in science is the ability of biologists to recognize new and emerging species, as well as life forms that have existed for a while but were formerly overlooked. The Live Science page Newfound Species observes:

“Science has identified some 2 million species of plants, animals and microbes on Earth, but scientists estimated there are millions more left to discover, and new species are constantly discovered and described. The most commonly discovered new species are typically insects, a type of animal with a high degree of biodiversity. Newly discovered mammal species are rare, but they do occur, typically in remote places that haven’t been well-studied previously. Some animals are found to be new species only when scientists peer at their genetic code, because they look outwardly similar to another species — these are called cryptic species. Some newfound species come from museum collections that haven’t been previously combed through and, of course, from fossils.”

Even the humble hosta has its own emergings, due in part to technological and social advances in communication.


A Rookie’s Guide to Hostas, Hostas, Hostas observes:

“In past centuries, we used to talk about people “discovering” new species of plants. What this usually meant was that European, English or American plant explorers traveled to remote parts of the world and found plants that were new to them. Now, of course, we know that local people in those other parts of the world were often quite familiar with these plants all along. Many of the so-called new plants, including hostas, have been found in local paintings and documents produced long before the Westerners started poking around. In more recent times, however, with better communications, we more universally share the knowledge of different horticultural communities.”

As far as actually emerging species are concerned, evolutionary biologist Rob DeSalle of the American Museum of Natural History has indicated the continuing nature of species emergence:

“Identifying a new species as it emerges is the holy grail of evolutionary biology. … Species must be emerging someplace on earth. The best places to look would be places with lots of species, like rain forests, and islands, because isolation opens new niches.”  (In “Q & A; Emerging Species” by C. Claiborne Ray, published June 17, 2003 in The New York Times)

The ScienceDaily website has a webpage dedicated to news about “new” species of plants and animals. While most of these will escape public awareness, Time magazine has sifted through the barrage of information to identify the “Top 10 New Species” of 2014. According to author Bryan Walsh, “The collection includes a dragon tree, a skeleton shrimp, a gecko and a microbe that likes to hang out in the clean rooms where spacecraft are assembled.”

Speaking of top things of 2014, and moving on to emerging technologies, the Massachusetts Institute of Technology’s online Technology Review has published a list, “10 Breakthrough Technologies 2014.” The list includes such things as brain mapping, genome editing, and agile robots.


The Wikipedia article “Emerging technologies” emphasizes the role of technology convergence in the emergence of new technologies. The article mentions an acronym of particular interest to those in the information technology world:

“NBIC, an acronym for Nanotechnology, Biotechnology, Information technology and Cognitive science, is currently the most popular term for emerging and converging technologies, and was introduced into public discourse through the publication of Converging Technologies for Improving Human Performance, a report sponsored in part by the U.S. National Science Foundation.”

Wikipedia also has a “List of emerging technologies” containing brief descriptions of “some of the most prominent ongoing developments, advances, and innovations in various fields of modern technology.” More than two hundred emerging technologies are listed.

There are and will continue to be many new and emerging concepts in science, technology, and other fields. Taxonomies can help define the terminology for those concepts. This is perhaps most readily evident for genus-species-subspecies-etc. names, whose designation is the territory of the biological taxonomist, or the biologist temporarily acting as taxonomist. Elsewhere, taxonomists can identify predominant labels and the occasionally used synonyms, and then use that information to add appropriate preferred terms and non-preferred synonyms to a vocabulary. They can also add definitions and scope notes. The skills of the taxonomist can bring clarity to formerly mysterious concepts and nomenclature. 
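Identifying the predominant label can be roughed out in code. The sketch below is hypothetical: it assumes you have already counted how often each competing label appears in your content, and the counts shown are invented placeholders, not real measurements. It proposes the most frequent label as the preferred term and keeps every other label as a candidate non-preferred term, rather than discarding the rare ones.

```python
from __future__ import annotations

from collections import Counter


def propose_term_record(label_counts: Counter[str]) -> tuple[str, list[str]]:
    """Suggest the most frequent label as the preferred term, the rest as NPT candidates.

    This is only a starting point; the taxonomist still reviews every candidate.
    """
    if not label_counts:
        raise ValueError("no labels to choose from")
    ranked = label_counts.most_common()  # highest count first
    preferred = ranked[0][0]
    non_preferred = [label for label, _ in ranked[1:]]
    return preferred, non_preferred


# Invented counts standing in for how often each label appears in a content set.
observed = Counter({"genome editing": 55, "gene editing": 30, "genome engineering": 8})
print(propose_term_record(observed))
# ('genome editing', ['gene editing', 'genome engineering'])
```

Frequency is only one signal; a widely used vocabulary or a field’s own naming conventions may justify a less frequent label as the preferred term.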

No fog, no mist; clear, bright, jovial, stirring, cold; cold, piping for the blood to dance to; Golden sunlight; Heavenly sky; sweet fresh air; merry bells. Oh, glorious! Glorious!


So don’t be scared of the ghosts of future concepts. Think of them as true spirits of the future, taking flight with the benefit of well-chosen terms and synonyms in a taxonomy or thesaurus.

Every time a new term rings true, an emerging concept gets its wings.



Barbara Gilles, Taxonomist
Access Innovations


Originally posted December 30, 2013. Updated for 2014.

The Gift of SMEs

December 15, 2014  
Posted in Access Insights, Featured, reference

A typical thesaurus construction project for a scholarly publisher, policy clearinghouse, medical institution, or any other client with a technical vocabulary involves the input of a number of stakeholders.

At a certain point, usually about three-quarters of the way through the construction of the taxonomy, it’s vital to get the input of subject matter experts (also written subject-matter experts, and known as SMEs or domain experts). These SMEs generally work for the client, often as technical editors but just as frequently in other capacities, although we at Access Innovations are occasionally asked to provide them.

Figure 1: An SME in his native habitat

In either case, the SME’s job is to review the emergent thesaurus from the perspective of an expert in the field. Even the best taxonomists can garner only a finite amount of knowledge through research, especially in complex or highly sophisticated domains; without a PhD in particle physics, your branch on quantum field theory will only be so comprehensive.

The SME is, ideally, well acquainted with the state of research in the field, conversant with the current hot topics, steeped in the journal literature, and familiar with both the stable and fluctuating terminologies used by other practitioners in their discipline.

A vocabulary that’s missing important concepts, or terms covering exploding areas of research, will not make a good public face for your client’s organization. Imagine browsing a computer science taxonomy that’s missing the term “Big Data” in 2014: would you assume the organization behind that vocabulary is competent and up to date? The SME’s role is all the more important because the end users who will be searching for content in, for example, a large repository of physics papers will very likely resemble the SME engaged in the review.

Dealing with SMEs can be an excellent and productive experience; unsurprisingly, this process also involves a number of challenges.

(1) [Most] SMEs are not taxonomists: you have to explain it to them.

Your physicist, oncologist, or social scientist SME (unless they happen also to have attended library school) is probably not intimately familiar with tagging, information retrieval, or taxonomies—at least, not nearly to the extent that they understand their chosen area of study.

Don’t assume that whoever wrangled the SMEs for a call with you explained the project, what’s expected of them, or anything else. Be prepared to give a very short expository talk on why the taxonomy matters, what it will do, and why they’ve been asked to participate.

Figure 2: (Probably?) not a taxonomist

The bright side is that SMEs tend to be intelligent, so they’ll pick it up. Just don’t expect to be able to dive right into the hierarchy without explaining the background of the project, the purpose of their input, and what you expect them to do.

This last point is especially important. Explain clearly what you do and don’t expect the SMEs to provide feedback on. You definitely want:

(a) Input on missing terms/concepts. What needs to be added?

(b) Input on NPTs. What other names, especially acronyms, can supplement the existing terms? What is the fancy new name that everyone has been using for a concept since Dr. Johnson wrote his famous 2009 paper?

(c) Input on term placement and hierarchical organization. Do the branches make sense? Are the top- and second-level terms a good outline of the field? (More on this below.)

(2) Some SMEs will be more engaged than others.

Almost predictably, some of the SMEs on a given project will take the minimum amount of time to provide feedback (there may even be some initial grumbling), and you’ll never hear from them again. That’s okay; accept the feedback (asking questions if necessary) and let them go back to their job. For many SMEs, their involvement is an extra assignment over and above their normal workload, so it’s understandable.

Invariably, though, you’ll get at least one SME who gets it—who is excited and engaged and thinks the thesaurus is cool and wants to help. This is exactly what you’re looking for, so make sure to match their level of engagement and enthusiasm. Once they get used to thinking about the taxonomy, they’ll be an invaluable resource.

(3) Disagreements between SMEs and taxonomists

See (1), above. SMEs are not used to thinking like taxonomists, so their ideas about term placement, term formation, and warrant are probably not influenced by things like the ISO and ANSI/NISO standards governing thesaurus construction. They will also not be very sensitive to ambiguous terms, and may be familiar only with the portion of relevant content covering their particular sub-area of expertise. You’ll want to watch out for a few specific issues:

(a) Literary warrant. SMEs will want to add terms covering their entire field, not just the terms required to index the content in question. When considering terms suggested by SMEs, remember to check the content for warrant; reject any terms that don’t meet your criteria.

(b) Term placement. SMEs will have ideas that make plenty of sense to them but violate, for example, the all-some rule (all instances of a narrower term must belong to its broader term, while only some instances of the broader term belong to the narrower one). Stand your ground here; no matter how you cut it, “dog food” is not a dog. Be ready to suggest using associative relationships (RTs), and explain why they’re helpful.

(c) Top- and second-level term structure. This requires a little more flexibility on the part of the taxonomist; while “Particle physics” is clearly a child of “Microphysics” (as it’s a sub-discipline of that field), if your physicist SME insists that it’s a major enough topic to be a Top Term, you should listen.
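Checking literary warrant for an SME’s suggestions can be partly automated. This minimal sketch assumes the content is available as plain-text strings and that a term has warrant if it appears in at least `min_docs` documents; both the whole-word matching and the threshold are placeholder assumptions, since each project sets its own warrant criteria. It only sorts the suggestions; the taxonomist still makes the final call.

```python
from __future__ import annotations

import re


def split_by_warrant(
    suggested: list[str], documents: list[str], min_docs: int = 3
) -> tuple[dict[str, int], dict[str, int]]:
    """Split SME-suggested terms by how many documents in the content mention them.

    Matching is a simple case-insensitive whole-word search; min_docs is a
    placeholder threshold, since real warrant criteria are set per project.
    """
    lowered = [doc.lower() for doc in documents]
    accepted: dict[str, int] = {}
    rejected: dict[str, int] = {}
    for term in suggested:
        pattern = re.compile(r"\b" + re.escape(term.lower()) + r"\b")
        hits = sum(1 for doc in lowered if pattern.search(doc))
        (accepted if hits >= min_docs else rejected)[term] = hits
    return accepted, rejected


docs = [
    "Big data pipelines for detector output",
    "Scaling big data analysis on clusters",
    "Notes on quantum chromodynamics",
]
print(split_by_warrant(["Big data", "Lattice QCD"], docs, min_docs=2))
# ({'Big data': 2}, {'Lattice QCD': 0})
```

Terms that begin or end with punctuation, such as “C++”, would need a different pattern, because `\b` only marks boundaries next to letters, digits, and underscores.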

(4) Spec creep: how much time for SME review?

A taxonomy, as we know, is a living document that’s subject to constant revision and review, but at a certain point you have to call it complete and deliver the project. This is where your enthusiastic SME can cause problems; they will want to make tweak after tweak ad infinitum. Set a schedule for SME reviews, including a timetable for providing material, getting feedback, integrating that feedback, and returning the revised taxonomy. One more round of changes is acceptable, but if you allow for more, it’ll never end.

Try to allow for about eight hours to process the feedback from each SME. Each suggestion, addition, term move, and deletion needs to be considered carefully, so make sure to allow your taxonomists time to properly weigh SME input.

(5) How many SMEs do you need?

This really depends on the size—and, moreover, scope—of the vocabulary. A thesaurus covering All of Science will require more reviewers than one on Acoustics. If you have many SMEs, try to keep them from stepping on one another’s toes.

(6) Tips on presenting the taxonomy and soliciting feedback

Ideally, you can provide a hierarchical display (naturally, a read-only version) that the SMEs can access; this allows them to see the entire term record, including non-preferred terms (NPTs), related terms (RTs), and multiple broader terms (BTs).

In conjunction with a hierarchical view (if possible), the best mechanism for SME feedback is [still] a spreadsheet with a hierarchical display of terms. (If you can, provide each SME with just the branches of the hierarchy that they’re being asked to review.) A spreadsheet allows the SME to make comments, changes, suggestions, additions, and other input using colors, adding cells, or leaving remarks in adjacent fields. Make sure that the taxonomist can see the feedback at a glance, so they don’t have to spend time poring over the document looking for comments.
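As a rough illustration of that spreadsheet approach, the Python sketch below writes a single branch of a taxonomy to CSV, placing each term in the column for its level and adding an empty column for SME comments. The nested-dictionary taxonomy, the function names, and the column layout are hypothetical; a real thesaurus management tool would provide its own export.

```python
import csv
import io

# Hypothetical taxonomy: each term maps to a dict of its narrower terms.
taxonomy = {
    "Physics": {
        "Microphysics": {
            "Particle physics": {},
            "Nuclear physics": {},
        },
        "Acoustics": {},
    },
    "Chemistry": {"Organic chemistry": {}},
}


def branch_rows(tree, depth=0):
    """Yield one row per term, with the term placed in the column for its level."""
    for term, children in tree.items():
        yield [""] * depth + [term]
        yield from branch_rows(children, depth + 1)


def export_branch(tree, top_term):
    """Return CSV text for only the branch under top_term, plus a comments column."""
    rows = list(branch_rows({top_term: tree[top_term]}))
    width = max(len(row) for row in rows)
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([f"Level {i + 1}" for i in range(width)] + ["SME comments"])
    for row in rows:
        # Pad each row so the comments column lines up for every term.
        writer.writerow(row + [""] * (width - len(row)) + [""])
    return buf.getvalue()


csv_text = export_branch(taxonomy, "Physics")
print(csv_text)
```

Opened in a spreadsheet program, the column position shows each term’s level at a glance, and because only the requested branch is exported, each SME sees just their own part of the vocabulary.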

(7) You don’t have to integrate every single comment–you’re the filter.

On receiving SME feedback, the taxonomist’s job is not to make every change suggested by the SMEs; rather, the SME’s input is raw material for the taxonomist to consider using. In other words, the SME’s expert opinion has to be run through the taxonomist’s filter to accept, reject, or re-format for inclusion in the taxonomy.

Figure 3: A grain (or more) of salt

On the other hand, have respect for the SME’s expertise. Be flexible when you can, and try to accommodate the SME’s point of view wherever possible. Often the SME will, for example, suggest adding a term that already exists but is phrased another way (a conceptual duplicate); this can trigger the addition of an NPT, a change to the preferred version of the term, or some other unintended but nevertheless useful action.

Figure 4: Best. Present. Ever.

SMEs can be a great gift for any taxonomy project—if you have a strategy, provide a clear set of expectations, and maintain good communication throughout the process.

Bob Kasenchak, Project Coordinator
Access Innovations

Access Innovations Named to EContent’s Annual Top 100 Companies

December 8, 2014  
Posted in Access Insights, Featured, metadata

Access Innovations, Inc., a leader in digital data organization, is proud to announce its inclusion for the fourth time on EContent magazine’s annual list of the top 100 companies in the digital content industry in the category of SEO and Search Analytics.

“This year we had three new judges and lots of new companies to consider. We also included a new category: Big Data. These days, data is the driving force behind almost everything on the web. From the targeted ads you see while surfing your favorite sites, to the articles and videos that those sites serve up to you, data is behind it all,” remarked Theresa Cramer, editor of EContent Magazine. “Congratulations to all of the companies on this year’s EContent 100 List and kudos on all they contribute to the digital content industry.”

Jay Ven Eman, CEO of Access Innovations, said he is honored by the distinction. “We appreciate winning a spot on the EContent list,” commented Ven Eman. “Access Innovations is dedicated to continuously improving and expanding the software and services we offer. Recognition like this is quite rewarding.”

“We enjoy pushing the envelope in many areas of digital content,” added Marjorie M.K. Hlava, president of Access Innovations. “The emergence of Big Data has made vocabulary control increasingly vital for effective data mining, and we’re placing a special emphasis on disambiguation of sources, such as authors and affiliations, for distinctly improved search retrieval. Our company has always done new things, and we will continue to evolve, so we can meet the challenges that our clients face in our ever-evolving information universe.”

EContent magazine’s annual list of the top 100 companies that matter most in the digital industry was compiled by an independent panel of industry insiders. Unlike many other trade lists, inclusion cannot be purchased and is at the sole discretion of the judging panel. The full list of the top 100 companies that matter most in the digital content industry is available at www.econtentmag.com.

About Access Innovations, Inc. – www.accessinn.com, www.dataharmony.com, www.taxodiary.com

Founded in 1978, Access Innovations has extensive experience with Internet technology applications, master data management, database creation, thesaurus/taxonomy creation, and semantic integration. Access Innovations’ Data Harmony software includes automatic indexing, thesaurus management, an XML Intranet System (XIS), and metadata extraction for content creation developed to meet production environment needs. Data Harmony is used by publishers, governments, and corporate clients throughout the world.

About EContent – www.econtentmag.com

EContent is a leading authority on the businesses of digital publishing, media, and marketing, targeting executives and decision-makers in these fast-changing markets. By covering the latest tools, strategies, and thought leaders in the digital content ecosystem, EContent magazine and Econtentmag.com keep professionals ahead of the curve so they can maximize their investment in digital content strategies while building sustainable, profitable business models.

Passive Searching for KM Taxonomy Resources

December 1, 2014  
Posted in Access Insights, Featured, search

“The bad news is time flies. The good news is you’re the pilot.” (Michael Altshuler)

In past blogs we’ve already looked at several ways to actively search the open web for taxonomy, thesaurus, and ontology resources. Today we’ll consider a few tools that enable passive – “set it and forget it” – searching. In these passive search methods, the searching doesn’t require the user’s continuous and direct attention. This search strategy allows you to turn your attention elsewhere while the tool trawls the web for content that is of interest to you. Let’s consider…

o   Alerts

o   RSS feeds

o   Customized dashboards

o   TOC services


You can set alerts in Google, for instance, using your own search criteria, just as you would when conducting an active search. You can take advantage of truncation, wildcards, Boolean operators, and other features usually found under “advanced” search.



Noticeable in the screenshot above are several preset “suggested” categories. However, you’ll want to customize your alerts to fit your personal interests. To avoid drinking directly and continually from the information fire hose, you can have your alerts delivered to your e-mail inbox at one of the frequencies available from the drop-down menu: “As-it-happens,” “At most once a day,” or “At most once a week.”

The following settings menu appears once you’ve entered your search terms.


Below this menu you will also see a helpful “Alert preview” that shows you a sampling of search engine results. Before actually hitting the blue “CREATE ALERT” button, you can tweak your search string for better accuracy. Watch the sample search results change as you experiment with different search terms and search term word order.

Additionally, you can limit or expand your search pool(s) to the following sources: Automatic, News, Blogs, Web, Video, Books, or Discussions. You can also refine alerts by language or region, or even limit them to “best results.” The search results are then sent to the e-mail address that you assign.


To avoid flooding your inbox, you might be interested in RSS (Rich Site Summary or Really Simple Syndication). Subscribing to a website’s RSS feed removes the need to check the site manually for new content; you’ll be alerted to updates automatically.

When you have discovered sites and pages that interest you, check whether or not the site offers an RSS feed. Look for the following symbol or icon at the site.


Feed readers can be web-based, desktop-based, or mobile-device-based. You’ll need to find one that is compatible with your browser or your operating system (Windows, Mac, or Linux). Some browsers, such as the current versions of Firefox and Safari, have built-in RSS readers. If you’re using a browser that doesn’t currently support RSS, consider one of the many RSS news readers available for download; lists of them are easy to find online.

If you want to manage all of your feeds from a single place, you may wish to consider https://www.feedmyinbox.com/

Here is an example of an RSS subscribe invitation at TaxoDiary.


Although each RSS reader has its own way of adding a new feed, try clicking on the link or small XML button near the feed you want. You’ll see a page displaying XML code. For example, clicking on the RSS symbol in the example above results in the following display choices.


In the example above, click “View Feed XML” (the last tab at the bottom right of the screenshot) to open the feed’s XML source, then copy the URL from your web browser’s address bar. With some readers, you then need to paste that URL into the reader’s “Add New Channel” section. Once the feed has been added to your reader, it will start to display content and regularly update the headlines for you.
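For the curious, here is roughly what a feed reader does with that XML behind the scenes. This Python sketch parses a hypothetical RSS 2.0 document (the rss, channel, and item elements, with title and link inside each item) using the standard library’s `xml.etree.ElementTree`, and reports only items it has not seen before. The sample feed and the `unseen_items` function are illustrative assumptions; a real reader would download the feed from its URL (for example with `urllib.request.urlopen`) on a schedule and remember the links it has already shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed standing in for the XML returned by a feed URL.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Taxonomy Blog</title>
    <link>https://example.com/</link>
    <item>
      <title>Working with SMEs on a thesaurus</title>
      <link>https://example.com/posts/smes</link>
    </item>
    <item>
      <title>Passive searching for taxonomy resources</title>
      <link>https://example.com/posts/passive-search</link>
    </item>
  </channel>
</rss>"""


def unseen_items(feed_xml, seen_links):
    """Return (title, link) pairs for feed items whose links are not in seen_links."""
    root = ET.fromstring(feed_xml)
    results = []
    # iter("item") searches the whole tree, so it finds items inside <channel>.
    for item in root.iter("item"):
        # findtext looks only at the item's direct children, not the channel title.
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        if link not in seen_links:
            results.append((title, link))
    return results


already_shown = {"https://example.com/posts/passive-search"}
for title, link in unseen_items(SAMPLE_FEED, already_shown):
    print(f"{title} -> {link}")
```

Keying on each item’s link is a simplification; feeds that supply a `guid` element offer a more reliable identifier.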


An example of a passive search tool that supports personalization is www.netvibes.com.


Once you create a free personal account, you are able to monitor web content by means of tailored dashboards. There is great flexibility in choosing various displays for your dashboards. Since the searcher decides when to log in and check the dashboards, such alerts are non-intrusive and under the complete control of the dashboard creator.

Still another way to avoid cluttering your e-mail inbox is the use of TOC services.


A common complaint among professionals is that while they want to stay current in their field of work or study, various constraints constantly thwart that desire. Either they cannot afford all of the peer-reviewed professional journals that apply to their field, or they cannot set aside the time to read them.

A table of contents (TOC) alerting service can bring to you the most recent articles and titles in the subject or topic of your choice. By regularly perusing TOCs and abstracts, the taxonomist or knowledge manager is better able to recognize emerging terms for concepts on the basis of literary or industry warrant. Scan up-to-date, freshly published scholarly resources by browsing, viewing, saving, and searching across thousands of journal tables of contents from hundreds of publishers. Cost-free registration allows you to create a customized list of your most important and favorite journals.

Give the JournalTOCs TOC alert service (http://www.journaltocs.ac.uk/) a try, and follow your journals by title, subject, or publisher.


Some publishers and libraries have their own TOC services to alert you to new publications and additions to their collections. Springer Publishing offers another example of this passive search technique; Springer’s alphabetized list of 2,200 journals is available on its website.

Some web sources that you may wish to track do not offer convenient RSS feeds. However, the following work-arounds will help you stay current with changes at the site(s).





If it’s imperative that you keep up with important product updates, another alternative is http://www.copernic.com/en/products/tracker/. This Internet monitoring software is designed for everyone from home users to competitive intelligence researchers.
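One generic way to watch a page that lacks an RSS feed is to save a fingerprint of its text and compare it on each visit. The Python sketch below does this with a SHA-256 hash from the standard `hashlib` module, collapsing whitespace first so that reformatting alone does not count as a change. This is an assumed, simplified technique, not a description of how Copernic Tracker or any other product works; the function names and sample pages are hypothetical, fetching the page and scheduling the checks are left out, and pages with rotating ads or timestamps would trigger false alarms unless those parts are stripped first.

```python
import hashlib


def fingerprint(page_text):
    """Return a SHA-256 hex digest of the page text with whitespace normalized."""
    normalized = " ".join(page_text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def has_changed(previous_digest, page_text):
    """True if the page text differs from the snapshot the digest was taken from."""
    return fingerprint(page_text) != previous_digest


# Hypothetical snapshots of a product-update page on three visits.
old_page = "<h1>Products</h1>\n<p>Version 2.1 released.</p>"
reflowed_page = "<h1>Products</h1>  <p>Version 2.1 released.</p>"
new_page = "<h1>Products</h1>\n<p>Version 2.2 released.</p>"

saved = fingerprint(old_page)
print(has_changed(saved, reflowed_page))  # False: only whitespace differs
print(has_changed(saved, new_page))       # True: the version number changed
```

Storing only the digest keeps the saved state tiny, but it tells you only that something changed, not what; keeping the previous text as well would let you show a diff.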


Implementing these passive search approaches can also help you discover the knowledge management taxonomy resources you need.

Eric Ziecker, Information Consultant
Access Innovations

Additional Active Searching Techniques for Taxonomy Resources

November 24, 2014  
Posted in Access Insights, Featured, Taxonomy

To continue our theme from last time: “For every minute spent in organizing, an hour is earned” (Unknown).

This quotation reminds us of the importance of developing online search techniques that help you “save” time (or at least allocate your time wisely!). Previously, we looked at keyword searching on the open web and in full-text search environments. Today, let’s consider another active search technique, one that takes advantage of online website directories.

There are a few free, high-quality online directories; “quality” and “free” are not mutually exclusive. Current, curated directories are useful for bringing in specific, targeted user traffic. Unlike a keyword search, a directory allows the searcher to browse general subjects until he or she is ready to drill down to a very specific classification, category, or topic. Finding a “good” directory is not entirely subjective: objectively, “good” directories aim to offer only trustworthy and timely listings. The listings in a quality directory have been pre-evaluated, or vetted, by a human curator or editor.

It is necessary to think categorically when looking for taxonomy, thesaurus, or ontology resources. When it comes to “taxonomy management” or “thesaurus management,” you will want to first look at the top level terms and categories of the directory. In searches for taxonomy resources within a directory, the following pattern is observable at both http://www.dmoz.org/ and www.bestoftheweb.com.


At both of these directory locations, drill down by clicking the “reference” path. Under “reference,” look for “knowledge management.” Since taxonomies assist in “knowledge discovery” (as in navigation), the searcher might find something useful there. Otherwise, “knowledge representation” or “knowledge retrieval” may be the searcher’s next tab(s) of choice. “Classification” is also sometimes a useful subcategory to explore. Although “classification” may refer to a physical location (as in a book’s call number), it sometimes overlaps with the ordering of terms that describe the concepts in information resources. For example, as you scroll down the 29 entries at http://www.dmoz.org/Reference/Libraries/Library_and_Information_Science/Technical_Services/Cataloguing/Classification/ you’ll notice some taxonomy entries near the bottom of the alphabetized list (see the red underlining in the screenshot below).


Interestingly, each directory has its own categories and may take a variety of approaches and strategies to hit your target (if the directory even contains your target!).

Another directory you might try is http://dir.yahoo.com/. Notice, however, the different sequence and thought process needed to find your resource(s). Here is the path to “Knowledge Management”:

Directory > Business and Economy > Business to Business > Management > Knowledge Management

Earlier in the path, the searcher could also have detoured at Directory > Business and Economy > Business to Business > Information and found pertinent resources on information or knowledge. So, in this case, your primary top-level category was not “Reference” but “Business and Economy.”

You might also consider other directory resources, such as http://www.refdesk.com/toc.html. As you scroll down the page, look for “Refdesk Subject Categories” in the final portion of the middle column.


The Internet Public Library offers both a search window and a subject directory; see http://www.ipl.org/div/subject/ or http://www.ipl.org/div/about/sitemap.html

The WWW Virtual Library can be found at http://vlib.org/. Click on Information and Libraries to yield the following categories of interest for knowledge and information management:

Another resource is http://www.e-journals.org/. A simple search for “taxonomy management” yielded 244 results. Try variations like “business taxonomy” for additional results.



Another resource to try is http://www.exactseek.com/


The serendipity that results from browsing often yields better results than keyword searching. For example, notice the rich resources located at http://www.brint.com/km/


At the same site, you’ll find various portals down the rightmost column at http://www.brint.com/km/#definition



Experimenting with different search term combinations will return different results that still fall within your desired search parameters. For example, try searching “business taxonomy,” “project taxonomy,” or “operative taxonomy.” Other resources to explore include http://www.best-web-directories.com/free-directories.htm and http://www.controlledvocabulary.com/links.html.

[Although several paid directories exist, they are outside the scope of this blog.]

Next time we’ll consider some passive search strategies in order to find taxonomy, thesaurus, or ontology resources.

Eric Ziecker, Information Consultant
Access Innovations



Access Innovations Delivers Thesaurus for AAAS

November 17, 2014  
Posted in Access Insights, Featured

Access Innovations, Inc. is pleased to announce the delivery of an extensive thesaurus for the American Association for the Advancement of Science (AAAS).

Data Harmony With a Century of Content

In the summer of 2013, AAAS, the publisher of Science Magazine and various specialty journals (such as Science Signaling and Science Translational Medicine), contacted Access Innovations, Inc. regarding a thesaurus project to index and tag their data using Access Innovations’ patented, award-winning Data Harmony software. Because the publications represent the entire spectrum of science, this presented a unique challenge to Access Innovations. “The taxonomy needed to address all areas with a great degree of specificity to improve discovery for AAAS users,” remarked Bob Kasenchak, Project Coordinator for Access Innovations.

Formed in 1848, AAAS has content dating back to the 1880s. Since then, scientific language and terminology have undergone significant changes. With a database of 250,000 articles spanning 120 years of science, it is crucial for AAAS to implement granular topical indexing and provide a mechanism for browsing the full corpus of electronic content. The implementation of Data Harmony will greatly improve the accuracy of the free text searches used on the AAAS website.

A Cooperative Effort

Access Innovations has now delivered the thesaurus and continues to streamline content for AAAS. Because its content covers an extremely broad range of topics, AAAS has an unusual number of in-house subject matter experts (SMEs). Access Innovations worked closely with the SMEs to review and refine the thesaurus. Will Schweitzer, Business Director of AAAS, notes, “We’re excited to use our new thesaurus throughout all our platforms and products. We’ll first use the thesaurus to make our peer-review processes more efficient and to improve our readers’ browsing and search experience.”

As AAAS begins leveraging the thesaurus for website navigation, Access Innovations continues to refine the index and manage the taxonomy.


About Access Innovations, Inc. – www.accessinn.com, www.dataharmony.com, www.taxodiary.com

Access Innovations has extensive experience with Internet technology applications, master data management, content-aware database creation, thesaurus/taxonomy creation, and semantic integration. Access Innovations’ Data Harmony software includes machine aided indexing, thesaurus management, an XML Intranet System (XIS), and metadata extraction for content creation developed to meet production environment needs. Data Harmony is used by publishers, governments, and corporate clients throughout the world. Access Innovations: changing search to found since 1978.

The Use of Data in Election Cycles

Election Day has once again come and gone. Incumbents are ousted, bills are passed, and the political ads have finally stopped, so now the fallout begins. But regardless of where you fall on the political spectrum, there’s one thing you can be sure of: lobbying groups and political campaigns have used Big Data to try to secure your vote and will place increasing importance on it. On a smaller scale, data analysis has been happening in this realm for years, but it really became huge during President Barack Obama’s 2012 reelection campaign.

During that campaign, Obama’s team assembled a huge staff of analysts to work with the terabytes of collected voter data. Their analytics produced a number of strategies for targeting specific constituents and sending them information and donation requests based on exactly the issues that matter most to them. For the voter, that not only reduces the noise in their e-mail but also personalizes the election, and, as we’ve seen plenty of times through the years, people tend to vote for someone to whom they feel a personal connection.

People felt that connection to Obama in 2008 through his particular personality and brand of speaking; while none of that changed much over the following four years, 2012 saw him reach audiences through the use of data, as well.

Although American politicians make heavy use of Big Data, they aren’t the leaders in it. In India’s elections earlier this year, the Bharatiya Janata Party (BJP) used its mountains of data to secure funds, advertise, and organize events aimed, again, at putting a personal face on the election. BJP leader Narendra Modi is now India’s prime minister and has reached out to the world using the same strategies. His six million Twitter followers suggest that they work.

Then we have the BBC. We’ve already seen their innovative uses of Big Data and Linked Data for BBC Nature and the 2012 Olympics, but for the UK elections this past May, they devised a system by which they could aggregate their data and disseminate news and information in sophisticated, highly useful ways. In this case, the analysis served their own reporting, but the richness with which they were able to deliver the news to their readers and viewers was nothing short of fantastic.

These are three strong examples of how Big Data, Linked Data, and semantic enrichment are changing the way election campaigns are conducted and covered for the better, but all three are top-down processes. By that, I mean that in all these cases, we are being directed to look at or think about particular subjects and issues. But in this increasingly interconnected online world, I’m not the only person who would rather decide for myself which issues are important to me.

In its Olympics coverage, the BBC proved beyond a shadow of a doubt that Linked Data can be used effectively to make knowledge easy for individuals to access. You might recognize all the members of the gold-medal-winning women’s gymnastics team; we couldn’t hear enough about them during the event. What about the Russian team that won the silver, though? We didn’t hear much about them, but the BBC website would tell you not only their names but also where they’re from, what other events they’ve competed in, and much, much more. Elections are far more important than the Olympics, so why not do the exact same thing with candidates?

We know what the ballot will look like well in advance. All that would have to happen to get this process started is to lay the ballot out on a publicly accessible website. Each candidate would have a link to a page for that individual. From there, we could see candidate X’s voting history, other candidates who voted alongside her, links to her campaign speeches and writings, news reports on her, and, most likely, many other things I can’t think of right now.

We could do the same thing with ballot measures with little additional trouble. For these, we could look at a particular measure’s history, others like it voted on previously or in other regions, reporting on it, and all sorts of statistics.

All of this is in the service of information and knowledge, which helps us as voters make more reasoned, coherent decisions. We are being monitored constantly for the sake of targeted advertising, whether in politics or elsewhere. On a personal level, I don’t really care about that, but there’s no good reason why we couldn’t have access to candidate data in an easily digestible form. The information is out there, but it’s a lot of labor for each of us to take on individually. I hope the day soon comes when I’ll be able to go to a single place and learn what I need to make the best possible decisions in the voting booth.

Obviously, I’m hugely impressed with how the BBC has embraced this new philosophy about the way they deliver their content. They’ve made it easy for individuals to gather knowledge in all kinds of realms. They don’t yet offer exactly what I’m looking for, but between the Olympic Data Service and Vote 2014, it’s clear they have both the mindset and the capability to make it happen. When will media outlets on this side of the pond follow suit? Quickly, I hope.

Daryl Loomis
Access Innovations
