Are You a Traditionalist?

November 23, 2015  
Posted in Access Insights, Featured, Taxonomy

When I say Thanksgiving, do you immediately think of a basted, golden brown bird on a large platter adorned with oranges, cranberries and sage? Maybe you think of a juicy oven-roasted ham bearing the traditional clove and pineapple scored design baked into its caramelized goodness?

Traditional Thanksgiving Food –

Thanksgiving seems a holiday that’s as American as apple pie, or pumpkin pie for that matter. But actually, there are variants of this holiday all around the globe. Their meanings, dates and customs may vary, but they all revolve around the concept of gratitude and food, of course.

I think of these foods as traditional to the North American Thanksgiving holiday since I am North American. Other countries view traditional holiday foods through their own cultural lens.

For instance, while Americans and Canadians both celebrate Thanksgiving Day, there are several differences between the traditions, practices, and foods of the two neighboring countries. While the basic Thanksgiving foods are similar in name, in practice they are quite different.

Canadian pumpkin pie, for example, is spicy, with ginger, nutmeg, cloves, and cinnamon, while American pumpkin pie is typically sweet and custard-based.

The American Thanksgiving holiday has been celebrated nationally on and off since 1789, following a proclamation by George Washington, and every year since 1863. The event that Americans commonly call the “First Thanksgiving” was celebrated by the pilgrims after their first harvest in the New World in 1621. This feast lasted three days and was attended by 90 Native Americans and 53 pilgrims, who gathered to offer thanks for their blessings.

The First Thanksgiving 1621, oil on canvas by Jean Leon Gerome Ferris (1899). The painting shows common misconceptions about the event that persist to modern times: Pilgrims did not wear such outfits, and the Wampanoag are dressed in the style of Native Americans from the Great Plains.

Despite what we were taught in school plays, for many of the pilgrims, England was just a layover on the way to America. Approximately 40 percent of the adults on the Mayflower were coming from Leiden in the Netherlands. The people of Leiden still celebrate the American settlers who once lived there with a non-denominational church service on the fourth Thursday of November. Afterwards, there’s no turkey, but refreshments of cookies and coffee.

Canada’s Thanksgiving celebrates the harvest and other blessings of the past year. It has been an annual Canadian holiday since 1879, when Parliament declared a national day of thanksgiving, and since 1957 it has been observed on the second Monday in October.

Other countries have their own version of this holiday. Germany sees this celebration as a religious holiday that often takes place on the first Sunday of October. Erntedankfest is essentially a harvest festival that gives thanks for a good year and good fortune. Although turkeys are making inroads, chickens and geese are favored for the feast.

A food decoration for Erntedankfest, a Christian Thanksgiving harvest festival celebrated in Germany.

A variation on North America’s Thanksgiving can be found in the West African nation of Liberia. This country was founded in the 19th century by freed slaves from the United States. Liberians take the concept of the cornucopia and fill their churches with baskets of local fruits like bananas, papayas, mangoes, and pineapples. An auction for these is held after the service, and then families retreat to their homes to feast.

Kinrō Kansha no Hi is a national public holiday in Japan that celebrates hard work and community involvement. It is derived from ancient harvest festival rituals known as Niinamesai. Today it is marked with labor organization–led festivities and with children creating crafts and gifts for local police officers. It is an exception among these holidays in that food is not central to it, and turkey has no traditional role.

Tradition has its place in every culture, but more and more new generations are looking to make their mark on the culinary expectations of holidays. “Foodies” like to experiment and cook outside the classification.

What is seen as non-traditional will vary with geographical area and history. I have a friend who, through a series of unfortunate events, failed to procure a turkey in time to safely thaw and prepare it before the family feast last year. He instead prepared some stuffed pork tenderloins, and the response from his family was joyous. They have declared this their new “tradition”.

Fusion is a result of mixed cultures, and it is represented in food more and more. The gourmet food magazine Food & Wine offers alternatives to the Thanksgiving menu, and they aren’t referring to just swapping a Cornish hen for a turkey. Outside-the-box ideas like mushroom lasagna and sausages would give even the most traditional among us pause.

Wherever you fall in the food spectrum – traditionalist or adventurer – there are many options available both for home preparation and dining out. More restaurants than ever are open on this holiday to give your favorite home chef the day off so everyone can gather and celebrate in their own way — together.

Melody Smith, Blog Wrangler and Extreme Foodie
Access Innovations

National Association of Government Web Professionals (NAGW) 2015 Annual Conference held in Albuquerque, NM, September 23-25, 2015

November 16, 2015  
Posted in Access Insights, Featured, Taxonomy


Conveniently held in our hometown of Albuquerque, the program for the National Association of Government Web Professionals (NAGW) 2015 Annual Conference was sufficiently compelling to warrant our participation. Two of us attended sessions and receptions and networked with an enthusiastic group of professionals.

A first observation is in the name. NAGW members prefer “web professionals” over “webmasters”. The difference in meaning (semantics) can have a significant impact on how words are perceived. Master versus professional? I’ll let you draw your own conclusions.

They are a very professional group, in my observation, and the meeting focused on the challenges and triumphs of running an essentially entrepreneurial effort in a highly political, bureaucratic environment. City, county, and state governments and agencies, as well as some federal agencies, were represented. Issues included web site organization, discovery, security, mobile venues, measuring success, Section 508 compliance, look and feel, branding, training, support, dealing with citizens, and a host of issues common to all web professionals. Technical sessions at the coding level were also on the program.

Besides challenges, there were plenty of triumphs chronicled by various presenters as well as NAGW’s annual “Pinnacle Awards”. The Pinnacle Awards are divided into the population size of the government entity – small, medium, large, etc. Some of the award criteria included team size, content, organization, design, performance and flexibility, accessibility, standards, and interactivity. It was nice to see a significant number of entrants in each category. It can be intimidating having your work evaluated by your peers, but it can be very instructive, leading to an improved site.

Delving into the politics of government websites is out of my purview. What gets posted to a government website brings with it an assumed imprimatur. Verifying, checking, and getting approvals (often multiple) of every content item is costly and time consuming. Resisting blatant or even subtle propaganda posting can be hazardous to one’s career! Being responsive to a new mayor with their unfunded mandates requires a great deal of creativity and maneuverings. Government departments are often fiefdoms and getting cooperation on design issues, what to name things, and providing access to important, useful content is not easy.

A challenge that I can address is discovery. Ron Pringle, City of Boulder, gave a great and candid presentation, “Improving Search: Lessons from the Trenches”. His remarks addressed citizen-facing websites versus internal portals. Why do citizens go to their city’s website? To find resources that answer questions like: What can be recycled? What day is trash pickup? Where do I vote? Who is the city council person for my district? Many city websites seem to be geared to wooing tourists. They are awash in pretty pictures, while a simple listing of government services is woefully missing.



Tourism website for Los Angeles, California



Citizens’ website for Los Angeles, California

Search boxes are hard to find. Navigating is often difficult, although some cities’ websites, like that of Los Angeles, California, were highlighted as quite good.

A good place to start is by analyzing search logs. This will tell you what citizens are trying to find. It beats guessing. The most requested resources should be the easiest to find. Simple listings and navigation tabs are helpful.
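To make that concrete, here is a minimal sketch of a search-log tally in Python; the file name and the assumption of a CSV log with a “query” column are mine, so a real log would need its own parsing:

    from collections import Counter
    import csv

    def top_queries(log_path, n=20):
        """Count raw query strings from a search log and return the n most frequent.

        Assumes a CSV log with a 'query' column; adjust the parsing for your format.
        """
        counts = Counter()
        with open(log_path, newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                query = row["query"].strip().lower()
                if query:
                    counts[query] += 1
        return counts.most_common(n)

    if __name__ == "__main__":
        for query, hits in top_queries("search_log.csv"):  # hypothetical file name
            print(f"{hits:6d}  {query}")

The queries that top such a list are the resources that should be one click from the home page.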

Even a simple listing of a city’s major departments can be difficult to assemble. Do you list an agency by its official name or by what most citizens call it? Should the Solid Waste Management Department be listed as such, or should it be called the garbage department or the sanitation department on the website listing? Again, what your citizens call a department should provide a clue. Navigation aids should be just that – clues that help citizens find the resources they need. Once at the right resource, the official name of a department can be, and should be, prominently displayed. A drop-down navigation aid on the home page does not have to use the official or technical name. Do you want to lead with “HHW”, with household hazardous waste disposal, or maybe just waste disposal? Lead with a common, general term and then get more specific. From “waste” a citizen might navigate to “hazardous waste” and “nonhazardous waste”. Under hazardous waste could be a list, but again, use common names and not the scientific name: “antifreeze”, not “ethylene glycol”. Under types of antifreeze, you could then list ethylene glycol along with propylene glycol, etc., as each may have different disposal requirements.


Albuquerque’s citizen website shows “Trash & Recycling” under the “Community” tab

Lists are good, but what about the ubiquitous search box? This is where a good taxonomy is invaluable. It is the foundation of your navigation lists and aids. A good taxonomy provides the basis for sound navigation and rapid, accurate discovery. It does this by mapping the language of the citizen to the language of city bureaucrats. What the citizen calls the garbage department, the city calls the sanitation department, or the solid waste department, or… A taxonomy will bridge this gap. Taxonomies can help resolve the hundreds of acronyms that are so prevalent in government. It provides a reliable connection between the vernacular and the formal, or more scientific, terminology.
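As a minimal sketch of that bridging (the department names and synonyms below are invented for illustration), a search layer can map citizen vocabulary onto the official entry points before running the query:

    # Hypothetical non-preferred -> preferred mappings; a real taxonomy would be
    # far larger and maintained in thesaurus-management software.
    CITIZEN_TO_OFFICIAL = {
        "garbage department": "Solid Waste Management Department",
        "trash pickup": "Solid Waste Management Department",
        "sanitation department": "Solid Waste Management Department",
        "hhw": "Household Hazardous Waste Program",
        "antifreeze disposal": "Household Hazardous Waste Program",
    }

    def resolve(query: str) -> str:
        """Return the official department name for a citizen's query, if one is known."""
        return CITIZEN_TO_OFFICIAL.get(query.strip().lower(), query)

    print(resolve("garbage department"))  # Solid Waste Management Department
    print(resolve("HHW"))                 # Household Hazardous Waste Program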

I encourage you to investigate the rich resources on semantics, thesauri, and taxonomies found at our company website. I also encourage you to investigate NAGW if you are a web professional in the government arena.

Jay Ven Eman, CEO
Access Innovations

Access Innovations Calls for Beta Testers for Ontology Master

November 9, 2015  
Posted in Access Insights, Featured, ontology

Access Innovations is proud to announce that it has reached the testing phase on its latest product, Ontology Master (OM). This adds to their current lineup of Data Harmony software offerings, including Thesaurus Master, MAIstro, and the Data Harmony Suite.

Ontologies are a growing trend in the information science industry. They are intended to provide a language that can be used to describe classes and the relationships between them. They formalize a knowledge domain by defining classes and the properties of those classes, while providing semantic meaning for the relationships between entities. Ontologies are typically expressed as Resource Description Framework (RDF) triples using the Web Ontology Language (OWL), a format standardized by the W3C in 2004 to provide a common language for linked data and data sharing across the web.
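To make the RDF and OWL terminology concrete, here is a small, hypothetical sketch using the open source rdflib Python library (not part of the Data Harmony product line); it declares two classes and an object property relating them, then serializes the triples as Turtle:

    from rdflib import Graph, Literal, Namespace, OWL, RDF, RDFS

    EX = Namespace("http://example.org/vocab#")  # hypothetical namespace
    g = Graph()
    g.bind("ex", EX)

    # Two classes and a property linking them: the kind of relationship an
    # ontology adds beyond a purely hierarchical thesaurus.
    g.add((EX.Cheese, RDF.type, OWL.Class))
    g.add((EX.Milk, RDF.type, OWL.Class))
    g.add((EX.madeFrom, RDF.type, OWL.ObjectProperty))
    g.add((EX.madeFrom, RDFS.domain, EX.Cheese))
    g.add((EX.madeFrom, RDFS.range, EX.Milk))
    g.add((EX.Cheese, RDFS.label, Literal("Cheese", lang="en")))

    print(g.serialize(format="turtle"))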

“Ontology Master extends the already powerful Data Harmony software tools by allowing users to create relationships between terms in a vocabulary,” comments Win Hansen, head of project development for Access Innovations. “This will give our users much more detail and richness when working with their content.”

Ontology Master is still in development, but Access Innovations has reached the stage in the project where testing is required. The company is now calling for beta testers to use and comment on the software to help refine it before release.

Jay Ven Eman, CEO of Access Innovations, remarks, “Ontologies are increasingly becoming the norm in information science, because software agents can now make more reliable inferences, helping get users to the web resources they need. Our team has worked long and hard to bring Ontology Master to our clients and we’re very excited to have people beta test the software.”

If you are interested in becoming a beta tester for Ontology Master, contact Access Innovations at


About Access Innovations, Inc.

Access Innovations has extensive experience with Internet technology applications, master data management, database creation, thesaurus/taxonomy creation, and semantic integration. Access Innovations’ Data Harmony software includes machine aided indexing, thesaurus management, an XML Intranet System (XIS), and metadata extraction for content creation developed to meet production environment needs. Data Harmony is used by publishers, governments, and corporate clients throughout the world.

Using N-grams to discover taxonomy terms

November 2, 2015  
Posted in Access Insights, Featured, Taxonomy, Term lists

Words that occur together frequently are likely to encode important concepts. Therefore, simply sorting a list of phrases according to their frequency of occurrence in text is an automatic way of capturing important concepts in that text. Word order is important, since meaning tends to be associated with order. Thus, word order must be preserved when creating phrases from text.

The basic idea of N-gram analysis is to count phrases consisting of N sequential words from a document. Sorting a list of all of the phrases of all sizes contained in a corpus of documents by frequency presents a list of candidate phrases. These are likely to encode concepts important in the corpus. Frequency of occurrence will occasionally be correlated with the importance of the concept, though occurrences at the highest frequency level often can be less than helpful. They may well be commonly seen phrases, so they can’t simply be taken on their own; the human element must still come into play.
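As a minimal sketch of the idea (not the production implementation used at Access Innovations), the following counts word N-grams of several lengths across a tiny corpus and lists the most frequent candidate phrases:

    from collections import Counter
    import re

    def ngrams(tokens, n):
        """Yield each run of n sequential tokens, preserving word order."""
        return zip(*(tokens[i:] for i in range(n)))

    def candidate_phrases(documents, sizes=(2, 3, 4)):
        """Count N-grams of the given sizes over a corpus of text documents."""
        counts = Counter()
        for doc in documents:
            tokens = re.findall(r"[a-z]+", doc.lower())
            for n in sizes:
                counts.update(" ".join(gram) for gram in ngrams(tokens, n))
        return counts

    corpus = [
        "Cold fusion experiments drew wide attention.",
        "Reports of cold fusion could not be replicated.",
    ]
    for phrase, freq in candidate_phrases(corpus).most_common(5):
        print(freq, phrase)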

Concepts are not necessarily captured by phrases of a particular length. Indeed, some concepts might be complex enough to require several sentences to describe them. Therefore, it is appropriate to explore phrases of various lengths when searching for concepts in text. Of course, human propensity for acronyms and economy of communication tends to drive the representation of important concepts toward shorter words or phrases.

Thus there are two opposing forces at work that adjust the balance in how complex ideas are represented: short sequences of characters that must be supported by a large dictionary of complex concepts, and long sequences of words that can be supported by a smaller dictionary of simpler concepts.

A fine example of this is the comparison of the Windows and Linux operating systems. There is a stark contrast between the point-and-grunt paradigm of Windows, where enormously complex concepts are embodied in the act of pointing at a single button, and the verbosity of Linux, where paragraphs of text are used to describe the same operation.

New ideas can be discovered by finding combinations of words that have not been seen before or that are occurring with higher (or lower) frequency than in the past. Therefore, having the capability of detecting changes in the frequency of occurrence of phrases can be a path toward discovery of new or evolving concepts. In addition, when starting a new taxonomy, useful groups of words can be selected from the N-grams as a starting point for the taxonomy.
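One way to operationalize that detection, sketched here on the assumption that N-gram counts for an older and a newer slice of the corpus are already in hand (for instance, from the counting sketch above), is to compare each phrase’s share of the total in the two slices:

    from collections import Counter

    def frequency_shift(old_counts: Counter, new_counts: Counter, min_count=2):
        """Compare each phrase's share of all N-grams in two time slices of a corpus.

        A strongly negative shift suggests a fading concept; a strongly positive
        shift suggests a new or growing one.
        """
        old_total = sum(old_counts.values()) or 1
        new_total = sum(new_counts.values()) or 1
        shifts = {}
        for phrase in set(old_counts) | set(new_counts):
            if old_counts[phrase] + new_counts[phrase] < min_count:
                continue
            shifts[phrase] = new_counts[phrase] / new_total - old_counts[phrase] / old_total
        return sorted(shifts.items(), key=lambda item: item[1])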

N-grams are not only good for discovering new concepts, though. Equally important is the ability to use N-grams to discover concepts that are no longer being discussed. Take a journal on nuclear physics. As early as the 1920s, scientists were hypothesizing about the concept that would come to be known as “cold fusion.” Papers were published on the topic all the way into the late 1980s, when Martin Fleischmann and Stanley Pons drew wide media attention after reporting that their experiment actually worked. The idea was a cause for celebration in the wake of rising energy costs and the need for cheap, clean energy.

Had N-grams been run on the corpus of that nuclear physics journal in 1988, “cold” and “fusion” might frequently have been seen together, making the phrase an obvious choice for a candidate term in a taxonomy. Just a year later, however, after nobody could replicate the Fleischmann-Pons experiment, the concept was debunked and quickly considered a joke. Afterward, almost nobody wrote on the concept, and it became extremely rare to find anything on the topic in a reputable journal. Running N-grams on the same journal today would reveal it as a concept that may no longer belong in the vocabulary at all. N-grams have considerable value in understanding the evolution of a single concept or an entire discipline.

They can also be extremely useful in limiting concepts in a taxonomy to things that are useful. Say, for instance, I’m building a taxonomy of food for a website and I come to a branch for “Cheese.” There are thousands of different styles of cheese and I could fairly easily get a list of cheeses and add them all into the branch. That’s simple, but extremely time consuming and, ultimately, not very useful. If there is no content on this website about Abondance, an excellent but relatively unpopular cheese, and nobody is searching for content about it, why would it be in the taxonomy? It’ll just sit there uselessly. The answer, of course, is to run N-grams on the site content and the visitor search logs. The cheeses that appear in the results are the ones that could be considered highly useful in the taxonomy, helping to keep it clean, concise and, especially, relevant to your content.
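A hedged sketch of that filtering step: keep only the candidate terms that clear a minimum frequency in both the site-content counts and the search-log counts (the cheese counts below are invented):

    from collections import Counter

    def warranted_terms(content_counts: Counter, search_counts: Counter, min_hits=1):
        """Keep candidate terms that appear in both site content and visitor searches."""
        return sorted(
            term for term in content_counts
            if content_counts[term] >= min_hits and search_counts[term] >= min_hits
        )

    content = Counter({"cheddar": 42, "brie": 7, "abondance": 0})
    searches = Counter({"cheddar": 310, "brie": 12, "gouda": 5})
    print(warranted_terms(content, searches))  # ['brie', 'cheddar']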

N-grams may not be perfect, but they’re a great beginning for a controlled vocabulary. Their quick analysis is brilliant for working through large volumes of content, but they still absolutely require the human element to be useful. We at Access Innovations use N-grams in close conjunction with our taxonomists to help bring out the most in our clients’ content.

Daniel Vasicek, Programmer
Daryl Loomis, Business Development
Access Innovations

What is JATS and Why Should I Care

October 26, 2015  
Posted in Access Insights, Featured

Search for “jats” and you will find two very distinct concepts:


14th Murray’s Jat Lancers (Risaldar Major), by A.C. Lovett (1862-1919)

This post doesn’t discuss the Jats, an ethnic group of South Asia, interesting as they are. This post covers information tagging, specifically a specialized set of XML elements for journal articles.

Problem and need

Interoperability has posed an ever greater hurdle for scholarly publishers over the past few years. With multiple organizations publishing journal articles on an open access basis and offering free content to casual readers, scientific and technical journals have banded together to adopt a set of standards to streamline the sharing of documents.

As more scientific, medical, technical, and engineering writings are pushed out the door of major publishers, the need to structure this data in a robust and interoperable manner greatly increases. Scholarly publishers, universities, and multiple institutions within the scientific community require access to simple tools in order to convert from one format to another.

The Journal Archiving and Interchange Tag Suite (JATS) provides a comprehensive list of XML elements and attributes so that published articles can move easily among multiple data repositories and archives. Tag sets defined by JATS provide information spanning entity identification for authors, editors, and reviewers, as well as the institutions with which the authors are affiliated. Regardless of where the content was originally published, the tag suite allows publishers and archives to capture the semantic components of each document without additional formatting or processing issues.
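To give a feel for what JATS tagging looks like, here is a small illustrative sketch (not an official sample from the standard) that assembles a skeletal article header with Python’s ElementTree; the element names are drawn from the JATS tag suite, but a real, valid instance would carry far more detail:

    import xml.etree.ElementTree as ET

    # Skeletal JATS-style front matter; element names follow the JATS tag suite,
    # but this fragment is for illustration only and is not a complete instance.
    article = ET.Element("article", {"article-type": "research-article"})
    front = ET.SubElement(article, "front")
    meta = ET.SubElement(front, "article-meta")

    title_group = ET.SubElement(meta, "title-group")
    ET.SubElement(title_group, "article-title").text = "An Example Article"

    contrib_group = ET.SubElement(meta, "contrib-group")
    contrib = ET.SubElement(contrib_group, "contrib", {"contrib-type": "author"})
    name = ET.SubElement(contrib, "name")
    ET.SubElement(name, "surname").text = "Doe"
    ET.SubElement(name, "given-names").text = "Jane"

    print(ET.tostring(article, encoding="unicode"))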



JATS logo


In 2003, the National Library of Medicine introduced the NLM DTD v1.0 set of standardized XML elements used to mark up scientific and medical journal articles.

Prior to 2000, articles were published in SGML, TeX, LaTeX, PDF, or various proprietary formats. The varied and often rigid nature of these formats, along with a lack of metadata and structure, caused problems with conversion, sharing, loading, findability, and retention. Since then, major revisions have been implemented in order to satisfy needs to mark up headers, metadata, full text, formulas, and references. The JATS schema evolved from the NLM DTD v3.0 standard. An NLM 3.1 version was slated for production, but it was superseded by the joint efforts of publishers and the new features added to the JATS 1.0 DTD instead.

After the adoption of JATS, journal publishers from all sectors, from for-profit to open access, began creating repositories for JATS content. Several institutions within the scientific community sharing open access journals utilized open access repositories, including PubMed Central and SciELO.

Discussions of additional JATS applications have evolved since 2012. Frameworks have been established to capture additional keyword descriptors from multiple sources, to flag them within the full body of articles, and to assign a relevance-based frequency count to these keywords within the metadata fields. Further enhancements of JATS could extend to defining additional roles for content creators beyond simply distinguishing between authors, editors, and reviewers. The Contributor Roles Taxonomy (CRediT), developed by CASRAI, aims to add additional role types to the JATS standard to expose data curators, software used, methodologies, supervisors, and funding sources along with authors, reviewers, and editors.

The growth potential for JATS is immense. Projects to assign unique identifiers to individual contributors, such as ORCID, have begun to develop within the past few years. Since authors may write or appear within multiple journal articles, news articles, or conference proceedings, archives and repositories must accurately assign individuals to each of their contributed papers. However, since authors share names, locations, and backgrounds, the importance of using a single identifier code to disambiguate authors is even greater now than in previous years.


Content requires structure. Content regarding emerging scientific fields of study, new medical advancements, and solutions to engineering and design problems demands discoverability and ease of access. While converting older articles into newer formats may be a drain on time and resources, publishers must account for changes made to their content within the next decade. Reformatting content into an interchangeable and interoperable format is the only method for success in sharing, hosting, and providing content to end users.


NISO JATS v1.0 is formalized in the US standard ANSI/NISO Z39.96-2012, approved August 22, 2012. Discussions have begun on another revision of the specification, NISO JATS v1.1.

JATS-CON is the central conference for those implementing JATS or for those who wish to know more about the standard.


Jack Bruce, Senior Taxonomist
Access Innovations

In Defense of Taxonomies: In Response to the Recent Scholarly Kitchen Posts about Google Scholar, Indexing, and Content Findability

October 19, 2015  
Posted in Access Insights, Featured, indexing

Several interesting points were raised over the course of John Sack’s two posts about Google Scholar, and, notably, in the resulting comments featuring Anurag Acharya.

Google Scholar is a wonderful tool and resource, and it is not the goal here to disparage or otherwise belittle its importance or contribution to research. But some of the observations and conclusions are confusing — especially as regards the utility of taxonomic indexing vs. the sort of broad indexing Google Scholar has implemented.

Many scholarly and other society publishers have, as Bruce Gossett pointed out in his comment, invested considerable time, effort, and money to build bespoke taxonomies/thesauri to index the specific corpus of their content. It’s misleading to insinuate (per Anurag’s response to that comment) that this is a wasted effort on their part.

1)    I don’t know what taxonomy Anurag is thinking about:

“Taxonomies are often too broad for answering user queries. User queries are usually more specific than taxonomy terms/labels. Full-text matching & ranking matches user expectations better and usually goes a long way towards returning useful results.”

…but scholarly associations often have taxonomies of 3,000-10,000 terms or more — extremely granular subject terms designed specifically to cover their content. Since Google Scholar indexes content from every field, any robust subject-specific thesaurus is almost guaranteed to be more granular with regard to the discipline in question than a generic indexing can provide.

Whether Google Scholar can find a way to leverage this indexing is another matter.

2)    Since we don’t know what Google Scholar is using to “index” the papers, it’s very hard to argue that the indexing is “better” than that done with the bespoke thesaurus of a scholarly publisher.

The information at this link… is not very helpful from an indexing perspective.

One suspects that it’s literally a very large inverted index simply using words that appear in the text — with no synonymy or disambiguation (the two lynchpins of good subject categorization). This is subject indexing 101: it’s not the words that are important, it’s the concepts being expressed.

This cannot be stressed enough.

Consider the following two searches for (what are indisputably) the same concept:


aluminum / aluminium

Any thesaurus would equate “aluminum” and “aluminium” (the latter is the common British spelling; the former the American) and return results for both searches.
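A minimal sketch of the kind of synonym handling a thesaurus makes possible, using an invented synonym ring: expand the user’s query to all equivalent spellings before matching.

    # Hypothetical synonym rings; a real thesaurus would supply these equivalences.
    SYNONYM_RINGS = [
        {"aluminum", "aluminium"},
        {"sulfur", "sulphur"},
    ]

    def expand_query(term: str) -> set:
        """Return the term plus any equivalent spellings from the synonym rings."""
        term = term.lower()
        for ring in SYNONYM_RINGS:
            if term in ring:
                return set(ring)
        return {term}

    print(expand_query("aluminium"))  # {'aluminum', 'aluminium'} (set order may vary)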

3)    It’s also hard to argue that, absent any kind of surfaced indexing/subject browse and disambiguation, Google Scholar’s indexing is always helpful.


So… a search for “mercury” (never mind the absence of any kind of disambiguation: what am I looking for? Planets? Silvery metallic substances? Automobiles? A Roman god?) yields over 2.2 million results (“finding is easy!”) to look through, the “most relevant” of which is from 1969? (Based on what? Frequency?) Note, also, that two of the top four results are for a visualization tool called “Mercury” (apparently used for the analysis of crystals).

Naturally, there are advanced search options available in Google Scholar to further curate this result set. But the lack of synonymy and disambiguation persists through Advanced Search as well.

Even simple singular/plural pairs yield different results:


This is a bit distressing. Is there no NLP in the background? Are literally only the words that occur being indexed?

4)    Uncontrolled keywords are basically useless metadata from an information science perspective. Author-supplied keywords are notoriously inconsistent; further, even a keyword that is “helpful” in the context of a particular discipline becomes ambiguous and unusable in another (see example below). It’s not clear to what extent Google Scholar uses or ignores these keywords, but they seem to come up in searches.


From an information science perspective, this is poor practice. Keywords, unless well mapped to a central taxonomy of some kind, should be the last thing considered for search indexing (after title, abstract, full text, etc.).


Again, the goal here is not to disparage Google Scholar — but rather to point out the extreme importance of discipline-specific (and, more importantly, content-set specific) taxonomies (and thesauri, ontologies, authority files, etc.) constructed to index specific bodies of content.

That Google Scholar chooses not to map or leverage these important vocabularies is not an indication that this work is fruitless; on the contrary, perhaps the most useful activity Google Scholar could do with regard to indexing would be to gather and map the taxonomies from various large scholarly publishers (to a central ontology? or some other structure?) and leverage them to deliver more focused search results.

If indeed “search is the new browse”, we need somewhat fewer than 2.2 million results to cultivate, unless we’re all granted a limitless supply of research assistants.

Bob Kasenchak, Director of Business Development
Access Innovations

Difficulties in the Classification of Musical Genres

October 12, 2015  
Posted in Access Insights, Featured

In the world of classification, there may be no subject more hotly debated by people outside the taxonomy world than music genres. As we recently saw with video games, genre classification helps consumers find something they might like. Most forms of entertainment are relatively easy to classify: “action” or “role playing” for video games; “comedy” or “horror” for movies.

Music is treated quite differently. At its highest level, these classifications act much the same: “country,” “rock,” “metal,” or what have you. But unlike other forms of entertainment, those terms don’t actually describe a whole lot about that particular style. Music lovers look far more granularly into subgenres to better understand what to expect when they play that record for the first time.

Even then, there is little agreement as to where a particular piece of music should be classified; fans are especially quick to hop into a semantic debate on the subject. To that end, we asked Allex Lyons, standout musician and member of our programming team, to list some of these subgenres and lay out a few of their characteristics. One can easily see how certain characteristics make up one of the genres, but looking at the list, it amazes me just how diverse the characteristics are within a single genre… no wonder there is such a debate.



Country

Classic – heavy reliance on steel dobro guitar, violins, heavy twang in vocals, guitars seldom vary from clean sound

Pop – heavy use of violin, occasional steel guitar, slight twang, guitars use distortion to more emulate pop rock

Bluegrass – can use banjo and/or violin, often at a fast-paced tempo

Bro – drums emulate dance beat, some violin, heavy use of autotune in vocals



Rock

Pop – guitar distortion kept to minimum, can use synths, usually strives for catchy hooks and lyrics

Southern – guitar uses bluesy dirty sound, slight rasp in vocals, lyrics vary from melancholy to pridefulness

Punk – dirty guitars, simplistic chords and lyrics, songs usually short in length

Prog – can rely heavily on synths, long solos and long songs, cryptic lyrics

Grunge – dirty guitars with heavy distortion, often played in Mixolydian key signatures, wailing vocals, anti-establishment and sometimes cryptic lyrics

Glam – distorted guitars, lyrics usually highly sexual, fast solos, heavy emphasis on catchy riffs, clothing and hairstyles

Funk – uses blues rhythms and chords with clean guitars, sometimes wah, medium tempos with heavy bass, avoids triplet rhythms



Metal

Thrash – fast-paced, heavy distortion, screaming lyrics

Nu – screaming lyrics, reliance on drop-D tuning or 7-string guitars for muddier sound, tempo can vary



Rap

Classic – uses instrumentation/some sampling, PG-rated lyrics

Gangsta – minimal instrumentation, lyrics often cover harshness of urban life, heavy use of expletives

Crunk – minimal instrumentation, lyrics often sexual or materialistic



Dance

Disco – heavy use of strings and bass, medium tempos

Techno – heavy use of synths and electronic instrumentation, heavy use of autotune

Swing – heavy reliance on complete horn section, often played in triplet rhythm

East-Coast-Swing – uses most hallmarks of swing music but at more relaxed tempos



Latin

Cuban – heavy reliance on drums, highly syncopated rhythms, can use horn sections

Mexican — mostly fast-paced, guitar rhythms on downbeat, heavy use of horn sections, often accordion



Reggae

Classic – slow tempos, guitar rhythms on downbeat, laid-back vocals

Dance Hall – medium tempos, lyrics sung fast

Ska – fast tempos, reliance on horn sections

While this list is highly interesting and informative, it doesn’t even scratch the surface of the subgenres out there. Techno, for instance, has at least a dozen subgenres that have their own classification, and it gets even more granular below that. There have even been algorithms written to attempt automatic genre classification (with varying degrees of success).
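Those algorithms typically reduce each track to a handful of measurable features and train a classifier on labeled examples. As a toy sketch (the feature values and labels below are invented, and scikit-learn’s decision tree simply stands in for whatever method a real system would use):

    from sklearn.tree import DecisionTreeClassifier

    # Invented feature vectors: [tempo in BPM, distortion level 0-1, horn section 0/1]
    X = [
        [80, 0.1, 0], [95, 0.2, 0],    # classic country
        [170, 0.8, 0], [180, 0.9, 0],  # thrash metal
        [140, 0.2, 1], [150, 0.3, 1],  # ska
    ]
    y = ["country", "country", "metal", "metal", "ska", "ska"]

    clf = DecisionTreeClassifier().fit(X, y)
    print(clf.predict([[175, 0.85, 0]]))  # likely ['metal'] on this toy data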

While it can be easy and even enjoyable to go down the rabbit hole of trying to get every piece of music classified just so and for definable reasons, there really isn’t a point. The second one accomplished this fool’s errand, somebody would release an album that defies that nifty classification. You may not have to start all over, but you’d never finish. Plus, you’d have no time left to actually listen to the music you’ve so meticulously categorized. Where’s the fun in that?

Allexander Lyons, Programmer
Access Innovations

Daryl Loomis, Business Development
Access Innovations

Access Innovations and Access Integrity Complete ICD-10 Readiness Tool

October 5, 2015  
Posted in Access Insights, Featured, indexing

Access Innovations, Inc. is pleased to announce the delivery of a comprehensive ICD-10 readiness tool for their Access Integrity division.

Access Integrity was formed in 2011 to leverage Access Innovations’ award-winning, patented Data Harmony® software in the health care industry. Using their long-established semantic enrichment toolset, the technology automatically reads patient encounter notes to deliver highly relevant code suggestions to health care providers, administrators, billers, and coders.

This application, known as IntegraCoder, is a web-based solution that connects the semantic analysis technology of Data Harmony with the exhaustive coding and revenue cycle/denial management resources of coding experts Find-A-Code. IntegraCoder analyzes content in electronic medical records (EMRs) and provides highly relevant diagnosis and procedure suggestions. Its indexing system recognizes key concepts within an EMR and delivers suggested codes for users to select from, which increases accuracy in clinical documentation.

ICD-9, the accepted diagnostic coding standard in the United States since 1978, utilized approximately 14,000 codes, while the updated ICD-10 standard expands this number to more than 68,000. This has sent ripples throughout the health care industry, but Access Integrity stands ready to handle the logistical challenge.

“We had previously established ourselves with the success of our ICD-9, CPT, and HCPCS coding tools,” commented Access Integrity CEO John Kuranz, “but with the October 1st changeover to ICD-10, we have addressed a huge change in the industry and a very real fear. We have worked very hard to get this ready for launch and are excited to show how much it will enhance clinical documentation and aid in the billing and coding process.”

“It was a great challenge to tackle this enormous codeset,” states Access Innovations Head of Production, Win Hansen. “This is a very complex project and our team enjoyed the challenge, and we’re proud of what we accomplished. It stretched our imaginations, but we stayed flexible and it shows in the results. Solving these kinds of problems is what makes Data Harmony continue to improve and evolve in the marketplace, and it’s what makes it fun to work at Access Innovations.”

IntegraCoder is available as an integration with a number of top EMR systems, as well as a coding aid within Find-A-Code, where it is known as Code-A-Note. For more information, visit their website at


About Access Innovations, Inc.

Access Innovations has extensive experience with Internet technology applications, master data management, database creation, thesaurus/taxonomy creation, and semantic integration. Access Innovations’ Data Harmony software includes machine aided indexing, thesaurus management, an XML Intranet System (XIS), and metadata extraction for content creation developed to meet production environment needs. Data Harmony is used by publishers, governments, and corporate clients throughout the world.


About Access Integrity
Access Integrity provides a patented technology for complete and compliant EMR analysis. Access Integrity plays an important role in medical transaction processing by extracting relevant, rule-based data (concept extraction) from medical records, increasing coding accuracy, supporting clinical decisions, and improving overall understanding of a patient encounter. Access Integrity is the first company to employ Data Harmony’s semantic enrichment and rule-based concept extraction technology in the healthcare industry. The award-winning and world-renowned Data Harmony software suite has been used in the content management and information technology industries for more than 15 years.


About Find-A-Code, LLC

Find-A-Code, LLC is dedicated to providing the most complete medical coding and billing resource library available anywhere. Find-A-Code’s online libraries include extensive information for all major code sets (ICD-9, CPT®, HCPCS, DRG, APC, NDC, ICD-10 and more) along with a wealth of supplemental information such as newsletters and manuals (AHA Coding Clinic®, AMA CPT Assistant, DH Newsletters, Medicare Manuals). All code information and newsletter databases are indexed, searchable and organized for quick access and extensively cross-referenced. Find-A-Code also provides tools for code set translation (such as ICD-9 to ICD-10), code validation (edits) and claim scrubbing.

Classifying Video Games to Best Inform Their Audience

September 28, 2015  
Posted in Access Insights, Featured, Taxonomy

Trying to decide what video games to buy and play can be tricky; game classification is one way to better aid consumers in making those choices. Most stores classify games by the console on which they can be played. Online, it’s more common to see games classified by their content rather than by the media they are played on, but the content of some games can be hard to classify.

Game classification is important for another reason: the cost of videogames is relatively high compared to other forms of entertainment, at least up front. Any gamer will tell you that, for their favorite games, the $60-70 spent was a great investment. However, if you spend that money on a game you end up not liking and you go to return that game, you’ll find that you only get back a fraction of that price. This makes initial impressions very important for consumers, and classification is just one of those impressions.

Let’s look at a popular game this year, Rocket League, for an example.


Rocket League is a game that resembles soccer, but the players are in rocket-powered cars. There is a ball and two goal areas, and teams can range from four players down to just one. This game would be classified by some as a sports game, due to its objectives. However, because the players ride around in vehicles and have many options for customizing those vehicles, some would classify the game as a racing game. Which of these is correct? Both, clearly, but that can lead to some confusion when choosing how to present the game’s content to a potential customer.

If you decided to classify this game as a sports game, you might scare away consumers who never purchase or play sports games. The same could be said if you’d classified this as a racing game. General classifications, such as “action games” could be used here, but the less specific you get, the less interesting the game may sound. The game is sometimes referred to as a Demolition Derby type of game, but that is the opposite end of the spectrum: far too specific to cater to many types of players.

Another recent blockbuster game, Destiny, tends to defy classification because it combines two very popular types of game: First-person shooters (FPSs) and Massively Multiplayer Online Role Playing Games (MMORPGs). It is a game based in space and the objectives basically involve shooting aliens. However, this game also includes many features known as staples in MMORPGs such as character customization, shared-world experiences with many other players, and “loot drops” (in-game currency and equipment) for completing tasks.


This game in particular divides its players between those who come from MMORPGs and those who come from FPS games. The MMO players often enjoy the customization and raid activities, while the FPS players stay for the smooth shooting mechanics and a large variety of cool weapons to choose from. Because of this, Destiny’s developers have tried not to define the game as a shooter or a role-playing game, but rather as a “shared-world shooter”. This classification helps to bring in players from many different backgrounds, rather than exclude an entire subset of people who enjoy a certain type of videogame.

Videogames are often also classified by the intensity of the content, or by ratings. The Entertainment Software Ratings Board (ESRB) is responsible for rating all games in North America and placing restrictions on games with higher ratings.

Ratings for younger audiences allow parents of gamers to choose what content they want to expose to their children. These games are fun for children and often include educational benefits as well. Games rated M (Mature 17+) are restricted to young adults and cannot be purchased by children without their parents present. These games usually include violence, strong language, or sexual content. Classifying games in this manner provides customers with feedback on the content of the games and serves to limit exposure for certain age groups.

Trying to classify videogames can be very difficult for retailers, game developers, and publishers, but it is necessary in order to properly sell games to customers. There are many games out there that appeal to a very wide audience, and many more that appeal to a small subset of gamers. Games, like all entertainment media, must present an initial impression that grabs a customer and compels them to buy, and classification is one aspect of that impression.

Samantha Lewis, Taxonomist
Access Innovations

Choosing Terms for a Taxonomy or Other Controlled Vocabulary

September 21, 2015  
Posted in Access Insights, Featured, Term lists

Original source: First Edition of the Oxford English Dictionary. Image redrawn by User:DavidPKendal after the diagram by James Murray, first editor of the OED.


Recently, we’ve looked at choosing controlled vocabulary terms. More specifically, we’ve considered related terms, choosing non-preferred terms, and choosing broader and narrower terms. In this final installment of our “choosing terms” series, let’s broaden our scope to the task of choosing terms for inclusion in a vocabulary. And once again, let’s consult our usual trusty guide. That would, of course, be the Z39.19 standard (ANSI/NISO Z39.19-2005 (R2010)), “Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies”.

Yes, there it is. Section 6.1 of Z39.19 covers “Choice of Terms”. But wait. It’s only two inches long. There must be more to choosing terms than that. In fact, the first paragraph states, “Many issues need to be considered in selecting terms for a controlled vocabulary.” While that’s absolutely true, could this be a cop-out by our trusty guide?

And then, there they are — the cross-references:

  • The information space or domain to which the vocabulary will be applied (section 11.1.1)
  • Literary, user, and organizational warrant (section 5.3.5)
  • Specificity or granularity of the terms (section 11.1.7)
  • Relationship with other, related controlled vocabularies (section 10.9)

Let’s identify and examine the relevant passages from each of these sections.

The first cross-reference, regarding “The information space or domain to which the vocabulary will be applied”, is to section 11.1.1. Strangely enough, that section is headed “Avoid Duplicating Existing Vocabularies”. There doesn’t seem to be anything in that section that’s directly relevant to our topic. Anyway, most of us know (or can guess) that a controlled vocabulary for a particular subject area domain should include the terminology of that domain, and perhaps some of the terminology of peripheral subject areas, and not go too far astray from the core subject areas.

Looking elsewhere, we find that section 6.6.1 (Usage) has some advice relevant to vocabulary domains vis-à-vis term selection: “Terms should reflect the usage of people familiar with the domain of the controlled vocabulary as reflected in literary, organizational, and user warrant (see section 5.3.5).”

Coincidentally and conveniently, the next cross-reference listed above also tells us to see section 5.3.5 regarding “literary, user, and organizational warrant”. There, we find some important advice that’s directly relevant to our topic:

“The process of selecting terms for inclusion in controlled vocabularies involves consulting various sources of words and phrases as well as criteria based on:

  • the natural language used to describe content objects (literary warrant),
  • the natural language of users, (user warrant), and
  • the needs and priorities of the organization (organizational warrant).”

The subsequent subsections go into a bit more detail. Additionally, going back to section 6.6.1, we find that much of the discussion on usage has to do with literary, user, and organizational warrant.

The next cross-reference, to section 11.1.7 (Levels of Specificity), has to do with “specificity or granularity of the terms”. The main piece of advice there is as follows: “The addition of highly specific terms is usually restricted to the core area of the subject field covered by a controlled vocabulary because the proliferation of such terms in fringe areas is likely to lead to a controlled vocabulary that is difficult to manage.” There are other considerations that are not mentioned there, such as the degree of specificity needed to properly index and search for content that is associated with the vocabulary.

The final cross-reference, to section 10.9, is regarding “Relationship with other, related controlled vocabularies”. The title for section 10.9 is “Storage and Maintenance of Relationships among Terms in Multiple Vocabularies”. Much of this section has to do with mapping between vocabularies. “This option requires designating one controlled vocabulary as the master with others as subsidiaries. The goal is to map the terminologies of the various controlled vocabularies to be included against a common classification scheme.” I think that where the term choice element comes into play here is making sure that the “common classification scheme” is complete enough to encompass the concepts represented by all the terms in all the vocabularies.

Mapping does not necessarily involve a subsidiary vocabulary per se. It could involve selected portions of a vocabulary. And the completeness aspect could involve considerations for making linked data effective. Here’s a tiny case study illustrating the need for adding a term to accommodate both:

“An example of the editorial work needed to create truly linked data is the process of mapping the implied conceptual links to actual links. For instance, the nationality/culture controlled list within ULAN should now map to terms in the AAT. While much mapping could be done through algorithms, comparing the ULAN nationality term to AAT terms, it had to be vetted by the editorial staff. Where “East German” was a historical nationality in the ULAN list, it did not exist in the AAT; the term was added to the AAT so that the link could be made.” (Patricia Harpring, “Linked Open Data in the Cultural Heritage World: Issues for Information Creators and Users.” Post on the Council on Library and Information Resources website, March 20, 2014. Permalink:
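A hedged sketch of the mechanical part of such a mapping pass: compare one vocabulary’s terms against another and flag the concepts (like “East German” above) that have no match and may need to be added before the link can be made. The miniature vocabularies here are invented for illustration:

    def unmapped_terms(source_terms, target_vocabulary):
        """Return source terms with no exact match in the target vocabulary.

        Real mapping also handles synonyms and near matches, and every candidate
        still needs vetting by editorial staff.
        """
        target = {term.lower() for term in target_vocabulary}
        return [term for term in source_terms if term.lower() not in target]

    # Invented miniature vocabularies for illustration.
    ulan_nationalities = ["French", "East German", "Japanese"]
    aat_terms = ["French", "German", "Japanese"]
    print(unmapped_terms(ulan_nationalities, aat_terms))  # ['East German']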

There are many other factors to be considered, such as clarity and lack of ambiguity. Those factors, plus the factors mentioned above, along with a goodly amount of common sense, should provide a good foundation for choosing vocabulary terms well.

Barbara Gilles, Communicator
