Emojis: Communication without Words

March 16, 2015  
Posted in Access Insights, Featured, Technology

Each year since 2000, the Global Language Monitor has selected the Top Words of the Year, which they derive through statistical analysis of word usage. Some may have thought the organization had gone insane this year when they announced the selections for 2014, because at the very top of the list was not a word at all, but the heart emoji. It’s true; a tiny cartoon heart got used so often last year that it supplanted all actual words.

People might shake their heads at that fact and lament what happened to language with kids these days, but it’s how people communicate online, which is basically how people communicate at all anymore, so it’s well worth looking into how and why this has happened.

No matter how one might personally feel about it, there’s no denying the rampant popularity of emojis. They are more commonly used on Twitter than the digit 5, and the single most popular emoji is more commonly used than the tilde. Those facts are crazy to me, and text analytics company Luminoso has compiled even more. Emojis have taken over at lightning speed and there’s no stopping them, so we might as well start trying to find the meaning in them.

As I discussed previously, I have found myself fascinated the last few weeks after I discovered emojitracker.com. By making use of Twitter’s streaming API, it tracks emoji use across the globe in real time. Though it only uses Twitter and not all the other places where the characters are used, the numbers are still mind-blowing.  The most popular character, “face with tears of joy,” has been used more than 632 million times since the site opened and it, along with the others at the top of the list, increases at an extraordinary pace.

It’s not just a flood of numbers, either. You can click on each icon to see a feed of the tweets in which that emoji was used, as well as see those results as JSON data; this is the stuff that I find highly interesting. The feeds for the top-ranked emojis move far too quickly to understand anything with the naked eye, but there are things to look at in some of the less popular ones.

Take, for instance, the emoji labelled “Pedestrian,” which is simply a man walking. Oddly, of the 16 million times this symbol has been tweeted, nearly every one is in Arabic. Why are they all walking? To see this stuff with the eye, one has to wade through so much material that it would be simply too daunting to actually find larger meaning in any of this.

Computers could easily parse it all out, though. The trouble is that, while it’s interesting to see the data stream, nothing is really being done with it. Despite the fact that it’s already an example of linked data, there is next to no analysis. That site lives in a vacuum, but emoji usage doesn’t. It grows and evolves more rapidly than text language does, and people from different cultures and groups assign their own meanings to single characters and groups.
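To give a sense of how little code the first step would take, here is a minimal Python sketch that tallies emoji by tweet language. It is an illustration only, not how Emojitracker works: the `lang` and `text` field names, the sample tweets, and the simplified code point ranges are all assumptions made for the example.

```python
from collections import Counter, defaultdict

# Assumed, simplified emoji ranges; the full Unicode emoji set is larger.
EMOJI_RANGES = [
    (0x1F300, 0x1FAFF),  # pictographs, emoticons, transport symbols
    (0x2600, 0x27BF),    # miscellaneous symbols and dingbats
]


def is_emoji(ch):
    """Return True if the character falls in one of the assumed ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in EMOJI_RANGES)


def count_emoji_by_language(tweets):
    """Map each emoji to a Counter of the languages it appeared in.

    `tweets` is an iterable of dicts with hypothetical 'lang' and 'text' keys.
    """
    counts = defaultdict(Counter)
    for tweet in tweets:
        for ch in tweet["text"]:
            if is_emoji(ch):
                counts[ch][tweet["lang"]] += 1
    return counts


# Made-up sample data: pedestrian, face with tears of joy, and heart.
sample = [
    {"lang": "ar", "text": "\U0001F6B6\U0001F6B6"},
    {"lang": "en", "text": "so funny \U0001F602"},
    {"lang": "ar", "text": "\U0001F6B6 \u2764"},
]

by_language = count_emoji_by_language(sample)
print(by_language["\U0001F6B6"].most_common(1))  # [('ar', 3)]
```

A tally like this only says where a symbol is used, not why; the "why" still needs someone to read the tweets.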

Yet, in spite of that evolution that, for whatever reason, makes the pedestrian symbol appealing to Arabic speakers, emojis also have somewhat universally defined meanings that make actual communication possible. The Wall Street Journal allows you to translate their headlines into emojis and, though you sometimes have to stretch a bit, it’s pretty easy to see where they’re coming from. Likewise, in a far more absurd example, Herman Melville’s classic Moby Dick has been turned into emoji. Of course, all the deep contextual and literary meaning will be lost in translation, so to speak, but if the words can be communicated in any kind of comprehensible fashion, that’s pretty impressive, if rather pointless.

The problem with all of this from a semantics perspective is that if the meaning does continue to evolve, how could one possibly analyze the data in a meaningful way? Were one to get a comprehensible result today, would they get that same result later? It’s important in semantic analysis to get provable, repeatable results. You can’t see patterns in data when the rules keep changing.

Like it or not, this is the way people communicate today and, whether or not I think emojis are a lasting phenomenon or will be an enduring part of language (I don’t), they don’t seem to be a product of laziness. Instead, they are about speed and clarity of communication. If we can express a complex emotion like love using one symbol rather than many, people are going to gravitate toward it, just like they gravitated toward texting and Twitter rather than tedious old email.

Words and their meanings are always in flux, just very slowly. The difference between our current English and Geoffrey Chaucer’s is massive, but it happened over six centuries. Still, the language is comprehensible without translation. The meaning of emojis may change at a faster pace, but their meanings are still being communicated to people around the world, regardless of language or cultural barriers. To me, that alone is reason enough to want a much deeper understanding of how they’re being used.

Daryl Loomis
Access Innovations

Access Innovations Named in KMWorld’s Annual “100 Companies That Matter in Knowledge Management”

March 9, 2015  
Posted in Access Insights, Featured

Access Innovations, Inc., a leader in digital data organization, is pleased to announce its inclusion on KMWorld’s annual list of the “Top 100 Companies That Matter in Knowledge Management.”

Access Innovations is featured for its fourth year after first debuting on the list in 2009. Other notable companies given a spot on 2015’s top 100 list include Oracle, Google, IBM, and Microsoft.

“The criteria for inclusion on the list vary, but each of those listed have things in common. Each has either helped to create a market, redefine it, enhance or extend it,” said Hugh McKellar, KMWorld Editor-in-Chief. “They all share a fundamental motivation to innovatively meet and anticipate the widely diverse needs of customers with robust solutions to meet evolving customer requirements challenges.”

Marjorie M.K. Hlava, president of Access Innovations, is honored that the company is included on the list. “Access Innovations enjoys driving technology to face new challenges in knowledge management,” she says. “It’s stimulating and rewarding to be leaders in knowledge management, and it’s delightful to be recognized as a leader in the field. Making content findable for our customers and their users is and always will be our top priority.”

The annual Top 100 Companies That Matter list is compiled by editorial colleagues, analysts, theorists, and practitioners and, unlike many other trade lists, inclusion is not purchased and is at the sole discretion of KMWorld’s editors.

For a full list of the Top 100 Companies That Matter in Knowledge Management, pick up the March issue of KMWorld, available at newsstands now, or visit the following link to view the article online: http://www.kmworld.com/Articles/Editorial/Features/KMWorld-100-COMPANIES-That-Matter-in-Knowledge-Management-102189.aspx

About Access Innovations, Inc. – www.accessinn.com, www.dataharmony.com, www.taxodiary.com

Founded in 1978, Access Innovations has extensive experience with Internet technology applications, master data management, database creation, thesaurus/taxonomy creation, and semantic integration. Access Innovations’ Data Harmony software includes machine aided indexing, thesaurus management, an XML Intranet System (XIS), and metadata extraction for content creation developed to meet production environment needs. Data Harmony is used by publishers, governments, and corporate clients throughout the world.

About KMWorld

The leading information provider serving the knowledge, document, and content management systems market, KMWorld informs more than 45,000 subscribers about the components and processes—and subsequent success stories—that together offer solutions for improving business performance. KMWorld is a publishing unit of Information Today, Inc.

Data Analysis in a Standards-Challenged World

We deal pretty heavily around here in words, what they mean, and how they’re used. It should go without saying, but it’s a fundamental part of what we do and is what makes us so concerned with standards, in both taxonomies and the written word. The two go hand in hand; it’s a whole lot easier to have one when the other is compliant.

Academic publishing, which we deal with the most, and most publishing in general, is pretty good about standards, and so we’re able to easily go in and build a taxonomy or mine the content for data analysis. There has been plenty of talk about how useful that enriched content can be in regard to linked open data, direct consumer advertising, and all that. It’s all well and good, but in places where there aren’t standards, it’s a whole lot more difficult to deal with, at least in a semantic sense.

Only a short time ago, nearly all disseminated content would go through some kind of editing process to make sure that simple things like spelling, grammar, and syntax were correct, but also to be sure that it complied with appropriate standards. Only once that process was complete would the public see anything, at least for the most part. This, of course, made for perfectly readable and understandable content, assuming you were familiar with the language and the jargon.

Then the Internet happened. Now, poorly constructed text is the norm and standards have gone out the window. Much of the Internet is about speed of delivery, and content is given a cursory edit, at best. Compliance with standards and delivery speed rarely make good bedfellows.

I’m not here to argue the rightness of accuracy over speed; the change has happened and there’s no going back. Blogging and especially social media have blown up the old ways. This is just how people communicate today and, to me, this is content that is begging for analysis.

But how does one even begin? The one who worries about grammar or syntax in a tweet is a rare beast indeed, and that doesn’t even take into account how things are spelled. Multiple Z’s instead of a single S, fifteen O’s when writing “love,” numbers in place of letters, all sorts of ridiculous things.
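One crude place to begin is normalizing the spellings themselves. The Python sketch below is a rough heuristic under my own assumptions, not an established method: the digit-to-letter table is hypothetical, and the rules will mangle legitimate words ("jazz" and "cool" with extra O's both come out wrong).

```python
import re

# Hypothetical digit-for-letter substitutions; real usage varies widely.
DIGIT_TO_LETTER = str.maketrans(
    {"0": "o", "1": "l", "3": "e", "4": "a", "5": "s"}
)


def normalize(token):
    """Apply three rough rules to a single word-like token."""
    token = token.lower()
    # Only swap digits inside tokens that contain letters, so "2015" survives.
    if any(ch.isalpha() for ch in token):
        token = token.translate(DIGIT_TO_LETTER)
    # Collapse a run of three or more identical letters down to one.
    token = re.sub(r"([a-z])\1{2,}", r"\1", token)
    # Treat a trailing run of z's as a stand-in for a single s.
    token = re.sub(r"z+$", "s", token)
    return token


words = ["looooooooooooooove", "L0VE", "boyzzz", "2015"]
print([normalize(w) for w in words])  # ['love', 'love', 'boys', '2015']
```

Rules like these would need a dictionary check behind them before anyone trusted the output.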

Add into that the multitude of languages that people use online, along with widespread disregard for traditional spelling and grammar, and the number of variations explodes. However, because more people from more cultures are communicating with one another, it seems even more important to find a way to be able to control and structure all this data that we have for the same reason that we structure vocabularies for scholarly publishing: quick and easy search.

In theory, that’s what tags on blogs and hashtags on social media are for, but when anybody can come up with their own tags, it’s plain chaos. This anarchy is something that has no place in an information realm that requires at least some degree of standardization. Tags might never be completely standardized, but a system of organizing them into broad concepts may be a solution. There will always be things like #janetiswinning, #imeatingdinner or whatever, so noise is going to be inevitable, but some kind of broader classification could help people find what they’re looking for, given how much new content is produced every hour of every day from every corner of the world.
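As a rough illustration of what organizing tags into broad concepts could look like, here is a minimal Python sketch. The concept table is entirely hypothetical; a real one would be curated the way a controlled vocabulary is, and anything it doesn't recognize is simply left as noise.

```python
import re
from collections import defaultdict

# Hypothetical mapping from known hashtags to broad concepts.
CONCEPTS = {
    "dinner": "Food",
    "imeatingdinner": "Food",
    "vote": "Politics",
}

HASHTAG = re.compile(r"#(\w+)")


def classify_hashtags(posts):
    """Group every hashtag found in the posts under a broad concept."""
    grouped = defaultdict(list)
    for post in posts:
        for tag in HASHTAG.findall(post):
            concept = CONCEPTS.get(tag.lower(), "Unclassified")
            grouped[concept].append(tag)
    return dict(grouped)


posts = ["Pasta night #ImEatingDinner", "Go #vote today", "#janetiswinning"]
grouped_tags = classify_hashtags(posts)
print(grouped_tags)
```

The lookup is case-insensitive, so #ImEatingDinner and #imeatingdinner land in the same place, while #janetiswinning stays in the unclassified pile.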

That noise will always exist, but we can’t dismiss it all as trash. Communicating through social media has become too important a part of our lives to pretend that there’s no value in at least some of the millions of tweets, Facebook posts, Instagrams, Vines, and all the rest. If there’s no value from a scholarly standpoint, there still is from anthropological and political ones, and the power that marketing gains using this kind of data analysis is abundantly clear.

This is conceptually pretty simple when we’re talking about data in the form of text, whether that’s a post or a hashtag. They’re all words, after all, even if they’re spelled tragically wrong. What about things that aren’t words, but still convey concepts? Instagram and Vine are currently two of the fastest growing social media sites and, though they use hashtags, they deliver content visually.

And then there’s the whole new issue of emojis. That might seem like a small thing at first, but they aren’t necessarily used at random, and some are used very specifically. An additional wrinkle with these is that they communicate meaning across languages. It seems to have huge potential for analysis, but is effective analysis even possible given the amount of noise?

I think that the answer is almost certainly yes. This kind of data is too valuable not to mine, especially when the technologies for doing so are already being developed for other purposes. For text, there are developments in sentiment analysis that have already been implemented to analyze social media for political campaigns, and its uses are only going to evolve. Less has been done on this level for imagery, but if a computer algorithm can be built that can accurately identify Jackson Pollock paintings and if a self-driving car can determine spatial proximity and object identification in real time, certainly the potential exists for use in social media. Almost nothing has been done with analyzing emoji use, though there is Emojitracker, which absolutely fascinates me (and which I will write about at more length in a later post).

We used to communicate almost exclusively by the written word, but now the technology exists to communicate meaning in a large number of ways. Shouldn’t we analyze and study that meaning? I don’t have answers to the questions, but the more we explore these new realms, the more it seems like time to start thinking about semantics in a slightly broader way. Standards are important and I’m all for them. But I’m also all for people communicating with each other. The least we can do, as people who work in semantics, is to try to find ways to see meaning in the content of social media, even if it seems like a great bog of nonsense some of the time.

Daryl Loomis
Access Innovations

Making Connections

February 23, 2015  
Posted in Access Insights, Featured

The 11th annual Data Harmony Users Group (DHUG) meeting just wrapped up, and as we reflect on the week-long event, making connections is the unofficial theme that emerged. This makes perfect sense, because the benefit that our attendees mention most often is the networking opportunity – the connections we make with colleagues – during the DHUG meetings.

This year we asked our attendees to fill out a short survey about the meeting. I have included some of their responses in the following list of connections that the DHUG meetings make or explain.

  • We meet and connect with other people who are doing what we do. One attendee said “Networking is an important aspect of the meeting” and “Excellent opportunity to establish and deepen networks for later follow up.”
  • We connect with people who can help overcome challenges that we are facing – because they are facing similar challenges. The most valuable thing on the agenda for one attendee? “Case studies. These are the main reason I come. Love seeing what other folks are doing.”
  • We connect ideas and find answers.
  • We get new ideas for additional things we can do with our data or systems.
  • We connect our users’ viewpoints with the functionality of the software and the services we offer – we explain not only WHAT the software does, but also HOW it works and WHY we designed it that way.
  • We connect authors with editors and peer reviewers.
  • We connect content to other related content.
  • We connect related concepts – not just the words that appear, but the meanings of those words.

Here are a few other comments taken from our survey responses:

“Really enjoyed it – right balance of updates and networking opportunities”

“Really well organized event with delightful people – thank you!”

“I think the range of topics and presentations was good. It’s good to have exposure to the variety of subjects.”

“Hearing from a researcher was great, implementing stories and creative uses also great”

“I honestly like [the meeting] the way it is”

If you missed the DHUG meeting, consider joining us in 2016! We have tentatively scheduled the 12th Annual DHUG meeting for February 22-26, 2016. Marjorie M.K. Hlava’s annual features update will kick off the core part of the meeting on February 23, and will be followed by case studies from our users on February 23 and 24. February 22, 25, and 26 are optional days reserved for hands-on software training and one-on-one meetings.

Watch TaxoDiary for an official announcement and further information.

Heather Kotula, Director of Communications
Access Innovations

Win Hansen Named Production Manager at Access Innovations, Inc.

February 16, 2015  
Posted in Access Insights, Featured

Access Innovations, Inc. is pleased to announce another big change to its corporate structure, a move that will streamline workflow and improve the efficiency of Access Innovations’ client projects.

Win Hansen has now been moved into his new role as production manager. Since starting at Access Innovations in 2009, Win has performed myriad tasks for the company and has learned every aspect of the business. This makes him uniquely suited to the wide range of tasks a production manager is required to perform. His versatility, taxonomy building expertise, and people management skills make him the perfect person for the role. Margie Hlava, President of Access Innovations, stated, “Win has been one of our most flexible and versatile employees for some time. He’s willing to get his hands dirty with any project, no matter how strange, with all of his enthusiasm and effort. He is a valuable asset to the Access Innovations family and we are all thrilled with his promotion to Production Manager.”

Win remarks, “I am very excited about this great opportunity. While I know there will be a lot to learn, I am certain that I am more than ready for what’s in store for me and I’m prepared, through this promotion, to help take Access Innovations into the future.”

Win started as a taxonomist at Access Innovations and, later, served as office manager and the company’s graphic designer. He has been involved in projects of all kinds, from taxonomy development to animation and more. Win has led projects for Triumph Learning, the American Society of Civil Engineers (ASCE), JSTOR, the American Institute of Physics (AIP), Harvard Business Publishing, and many more.

Win attended the University of New Mexico, where he earned degrees in history, religion, and art, and he vigorously continues in the classroom to this day. His interests include ceramics, photography, travel, chicken breeding, and beekeeping.

 

A Newbie’s Guide to DHUG, Part 2

February 9, 2015  
Posted in Access Insights, Featured

It’s hard to believe that the Data Harmony Users Group Meeting starts a week from today. If I thought things were buzzing in the office a couple of weeks ago, I didn’t know what was in store for me. And since there’s so much to do, it’s definitely at the front of my mind.

The last time I wrote about DHUG 2015, I decided to focus on a few of the featured talks that really stuck out to me as particularly suited to my interests. However, the week is filled with promising and intriguing presentations, from both our clients and on the home front.

It starts from the very top with our own Margie Hlava, who kicks off DHUG 2015 with “Taxonomy 101: Fundamentals, Construction, and Application,” where she’s going to start at the beginning and walk the attendees through the whole reasoning behind taxonomies and how they can effectively be used. For a taxonomist like me, who’s still fairly wet behind the ears, this is the kind of thing that can make a big difference.

With my knowledge base hopefully somewhat more beefed up, I’ll be able to hit the rest of the week running. Last time, I wrote about Helen Atkins from the Public Library of Science (PLOS), who will discuss their Fate Predictor Project. Well, we have another speaker from PLOS, as well: Jonas Dupuich. His case study, “Using MAI and the PLOS Thesaurus for Matching Activities,” will look at how they have leveraged Data Harmony’s semantic enrichment capabilities to match authors with peer reviewers based on subject matter. This speeds up the peer review process, but it also has clear relevance outside of that process.

An employer, using Data Harmony in this manner, could collect information on the skill sets of their employees (hopefully, only for good). Suddenly a strange new project comes up and the employer has to assemble teams of people with very specific skills. No more, “Hey, anybody know somebody who knows how to write technical documentation while playing water polo?” That information is right there to sift through. It not only makes searching for people faster and easier, it allows connections to be made that might otherwise get missed.
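To make the idea concrete, here is a minimal Python sketch of matching on shared subject terms. It is not how Data Harmony or MAI does it; it swaps in a plain set-overlap (Jaccard) score as a stand-in, and the reviewer names and subject terms are invented for the example.

```python
def jaccard(a, b):
    """Share of subject terms two sets have in common (0.0 to 1.0)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_matches(query_terms, candidates):
    """Return candidate names with any overlap, best match first."""
    scored = [
        (jaccard(query_terms, terms), name)
        for name, terms in candidates.items()
    ]
    scored = [(score, name) for score, name in scored if score > 0]
    # Sort by descending score, then by name to keep ties stable.
    return [name for score, name in sorted(scored, key=lambda p: (-p[0], p[1]))]


# Hypothetical reviewers, each indexed with a handful of subject terms.
reviewers = {
    "Reviewer A": {"Taxonomies", "Indexing"},
    "Reviewer B": {"Mining safety", "Occupational health"},
    "Reviewer C": {"Indexing", "Peer review", "Taxonomies", "Linked data"},
}
manuscript = {"Taxonomies", "Indexing"}

ranked = rank_matches(manuscript, reviewers)
print(ranked)  # ['Reviewer A', 'Reviewer C']
```

The same function would work for the employer example: swap reviewers for employees and manuscripts for project requirements.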

Next, we’re getting to the heart of what we do at Access Innovations with Audrey Glowacki of the National Institute for Occupational Safety and Health (NIOSH). Her talk, “Development and Implementation of the Site Browser: Faceted Navigation Tool for Browsing NIOSH Mining Web Site Content,” gets to the nuts and bolts of what we do. NIOSH has been a Data Harmony user for a long time. We first built a custom mining safety and health thesaurus. Next, a custom web content management system (WCMS) was developed that allows users to build custom web pages. Their feedback has been very positive, and I look forward to hearing more about how they are using the software that I myself use.

Finally, we recently got a surprise guest speaker who looks to be giving a pretty interesting talk. He’s Paul G. Kotula, an award-winning author, researcher, and peer reviewer who works in the Materials Characterization department at Sandia National Laboratories. A Google Scholar search for his name reveals over 1600 results as an author, co-author, and in citations. He knows his stuff and he knows how our clients use the content that our software enriches.

His talk, “Six Months of Work in the Lab Will Save You Half a Day in the Library or 30 Minutes Online,” will explain some of this, which should prove useful to the DHUG audience, as well as us here at Access Innovations. He’s addressing a few specific points about how people in his field use information. How do researchers use collections? What do authors of scientific papers think about the publishing and peer review processes? What sort of resources do they use and why do they think they are the best or most reliable?

Answers to these sorts of questions are the type of things that allow us to better serve our clients and to better relay the message of how much our software helps researchers, authors, and peer reviewers alike. It’s a real scientist talking about his real needs as a researcher and sharing his firsthand experience from the side of publishing that we often hear too little from: the authors.

With all of these talks, the training that I’m sure will teach me as much as anyone, and of course the catering, it’s going to be a week to remember. This week is going to fly by in anticipation, but I’m sure next week will go by even faster.

Daryl Loomis
Access Innovations

Groundhog Day: Names and Recursions

February 2, 2015  
Posted in Access Insights, Featured, Taxonomy

I’m sure you’re all just like me and waiting anxiously to hear the results from Punxsutawney, Pennsylvania, whence this very day we will find out from Punxsy Phil whether spring will come early this year or we have to wait six more weeks (pro tip: In the Northern Hemisphere, it’s always going to fall on March 20th or 21st).  As ridiculous as the holiday might seem to some of us, though, there are things about groundhogs and Groundhog Day that are pretty interesting.

Photo, Aaron Silvers, http://www.flickr.com/photos/silvers/24543841/ / CC BY-SA 2.0

Firstly, nobody seems capable of agreeing on what the rodent is called. The holiday would suggest that groundhog is the accepted term, but growing up, I always knew them as woodchucks. And there’s the well-known tongue twister (“How much wood would a woodchuck chuck if a woodchuck could chuck wood?”), which lends credence to its status as the accepted term. But depending on where one resides, the critter is also known as land-beaver, land-squirrel, rock chuck, pasture pig, and my personal favorite: whistle-pig. Some also call it a marmot, but that’s really a broader classification of the genus to which the groundhog belongs (Latin name: Marmota monax). All groundhogs are marmots, but not all marmots are groundhogs, which is plain old Taxonomy 101.

Kingdom: Animalia
Phylum: Chordata
Class: Mammalia
Order: Rodentia
Family: Sciuridae
Genus: Marmota
Species: M. monax

While there are plenty of names for the animal writ large, there are also more celebrity groundhogs than you may be aware of; although Punxsy Phil is the most prominent, plenty of states have them. Georgia boasts General Beauregard Lee; Ohio, Buckeye Chuck; North Carolina celebrates Groundhog Day with Sir Walter Wally; and Alabama holds Smith Lake Jake to be the true authority on winter’s end. Montana has three: Warren Whitefish, Dayton Dennis, and Moose City Moses. Wiarton, Ontario has a whole festival, complete with a hockey tournament, built around the albino groundhog Wiarton Willie.

There’s even a song about it, “Oh, Murmeltier” (sung to the tune of “Oh, Tannenbaum”) for which professor and marmot scholar K.B. Armitage of the University of Kansas has written English lyrics:

“Oh Whistlepig, oh Whistlepig,

We celebrate your famous day.

Oh Whistlepig, to you we pray

That winter soon will go away.

We like the sun and daffodils.

We’ve had too much of winter’s chills.

Oh, marmot friend, we’re warning you,

If winter stays, you’ll be rockchuck stew!”

 

…which is just plain weird.

Then, we have “Groundhog Day,” one of the most enduring comedy films of recent decades. In it, a meteorologist named Phil Connors (played by Bill Murray) travels to Punxsutawney to cover the Groundhog Day event. While there, he gets stuck in a recursive feedback loop, in which February 2nd is replayed over and over, while he tries to break the loop and move on to February 3rd (and get the heck out of Punxsutawney).

[Image: Bill Murray]

All comedy hijinks aside, movies are ripe for classification. Genres, while easily arguable, are the broadest way by which we classify them. In the case of “Groundhog Day,” it’s a comedy, but we also have drama, horror, etc. Sometimes, such as in this case, the classification is fairly obvious, but some films rightly belong to multiple genres, such as horror-comedies, or dramedies (a term that I personally despise, but it’s out there in common use).

Then, for some movies, we sub-classify by the film’s content or style. Film noir, for instance, isn’t a genre of its own; these films are dramas, but a particular kind of drama with a specific tone and stylistic touch. If somebody wants to watch something of that nature, it’s much smarter to search for “film noir” than to try wading through the thousands of “dramas” that have been released in the century-plus of cinema and would thereby be returned in an online search.

But we classify movies in ways other than genre, as well. The MPAA rating system is designed to tell consumers whether the movie is suitable for their age group or comfort level. Sometimes, we classify by their overarching plot, such as the biopic, the road movie, or the coming-of-age film, independent of genre. One can classify them by country of origin, or level of the movie’s budget, or really any way at all.

But let’s go back to “Groundhog Day” and the recursive feedback loop in which the main character gets stuck. It’s funny when it happens to Bill Murray, but it can be devastating to taxonomy. Say, for instance, you have a taxonomy with a top term of Business. A sensible narrower term under this could be Risk. That could be used for any number of kinds of risk, but in this case, the taxonomist adds a narrower term of Risk Management. Under that, one could place Insurance, which easily falls under Risk Management. So far, everything looks just right:

Business
    Risk
        Risk Management
            Insurance

Then, somebody comes along to screw around with the taxonomy, and looks at Insurance without looking at the broader terms first. If one isn’t paying attention, it’s easy to argue that Risk Management could go under Insurance, and that a primary topic of Risk Management is, of course, Risk.

Insurance
    Risk Management
        Risk

When that happens, you get this:

[Image: Recursion]

Recursions of this kind are the taxonomic equivalent of what happens in “Groundhog Day,” and it’s not good, or even funny. You’ll go on forever in this loop, getting nowhere and draining system resources at an increasing pace.
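Fortunately, loops like this are easy for software to catch before they do any damage. Here is a minimal Python sketch of one way to check, using an ordinary depth-first search; it is not how any particular thesaurus tool does it, and the dictionary of narrower terms is just my own encoding of the example above.

```python
def find_cycle(narrower, start):
    """Return the first loop of terms reachable from start, or None.

    `narrower` maps each term to a list of its narrower terms.
    """
    path = []        # terms on the current branch, in order
    on_path = set()  # same terms, for fast membership checks
    done = set()     # terms already fully explored without finding a loop

    def visit(term):
        if term in on_path:
            # We walked back into our own branch: that is the loop.
            return path[path.index(term):] + [term]
        if term in done:
            return None
        path.append(term)
        on_path.add(term)
        for child in narrower.get(term, []):
            cycle = visit(child)
            if cycle:
                return cycle
        path.pop()
        on_path.remove(term)
        done.add(term)
        return None

    return visit(start)


# The original hierarchy plus the careless additions under Insurance.
taxonomy = {
    "Business": ["Risk"],
    "Risk": ["Risk Management"],
    "Risk Management": ["Insurance", "Risk"],
    "Insurance": ["Risk Management"],
}

cycle = find_cycle(taxonomy, "Business")
print(cycle)  # ['Risk Management', 'Insurance', 'Risk Management']
```

Run the same check on the hierarchy as it stood before the careless edit and it comes back with nothing, which is exactly what you want.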

So today, we can all have a laugh at a movie, watch some hockey, and gather around to see a groundhog (or whatever you want to call it) leave its burrow, all because of Groundhog Day. But stay warm, because (spoiler alert) there are absolutely six more weeks of winter to come.

Daryl Loomis
Access Innovations

Bob Kasenchak Named Head of Product Development at Access Innovations, Inc.

January 26, 2015  
Posted in Access Insights, Featured

Access Innovations, Inc. is pleased to announce an exciting change to its corporate structure, a change designed to increase revenue and maximize the considerable talent of its staff.

Bob Kasenchak has now shifted into the role of head of Product Development. He started at Access Innovations in 2011 and succeeded so thoroughly in shepherding projects from initial lead to completion, as well as building a presence in the marketplace, that the company decided to leverage his talents to help develop new product offerings (such as the forthcoming Ontology Master). He will also be helping deliver exciting new projects to the company. Margie Hlava, President of Access Innovations, stated, “Having worked in production for the last two years, Bob is uniquely suited to take on product development on the cutting edges of information, including ontology implementation, linked data, text mining, and text analytics, which build very effectively on thesauri and taxonomies we have so widely implemented as a firm.”

Bob remarks, “This is the best fit for my combination of skills, and I look forward to working on projects with clients and within the company. I am especially looking forward to projects that will make information more easily available and expose it to its full potential through linking, mining, and stronger search leveraging of the actual content for a better understanding of that content and to support management decisions with content-based facts.”

Bob started as a project manager at Access Innovations, providing oversight and support of editorial projects at the company. The projects that he led involved thesaurus creation and development, as well as the development of indexing rule bases that were associated with those thesauri. He handled a wide range of customer specifications and communications. Bob has led taxonomy development and other projects for JSTOR, McGraw-Hill, Wolters Kluwer, the American Society of Civil Engineers (ASCE), Engineering Research Education (ERE), the American Association for the Advancement of Science (AAAS) and the U.S. Mine Safety and Health Administration (MSHA).

Bob attended St. John’s College, the New England Conservatory of Music, and the University of Texas at Austin, completing his master’s degree in theoretical studies and doctoral work in music theory. He lists his interests as tea, music, design, philosophy, and literature. He is married with one cat.

About Access Innovations, Inc. – www.accessinn.com, www.dataharmony.com, www.taxodiary.com

Founded in 1978, Access Innovations has extensive experience with Internet technology applications, master data management, database creation, thesaurus/taxonomy creation, and semantic integration. Access Innovations’ Data Harmony software includes machine aided indexing, thesaurus management, an XML Intranet System (XIS), and metadata extraction for content creation developed to meet production environment needs. Data Harmony is used by publishers, governments, and corporate clients throughout the world.

A Newbie’s Guide to DHUG Meetings

January 19, 2015  
Posted in Access Insights, Featured

The biggest week of the year at Access Innovations is almost upon us. Every year, we present the Data Harmony Users Group (DHUG) meeting, where our esteemed clients come from all over the nation to meet and learn from the people who built the software and use it on a daily basis. Right about now, there starts to be a lot of buzz around the office. There are a lot of people coming to Albuquerque for this, and everyone here is pretty excited to swap ideas with them, because they’ve come up with some interesting uses of our software, things that have made us better in the process.

Now, I haven’t been in the taxonomy game as long as most of the people here, so, like some of the attendees, I’m brand new to DHUG meetings. I don’t know exactly what to expect, but there are some things that I’m definitely looking forward to seeing. The workshops that we’re conducting for the attendees will certainly be interesting and informative for a newbie like me, but what I’m most anticipating is hearing from our guest speakers. These are people with different perspectives who are removed from the office echo chamber, which helps breathe fresh life into taxonomies.

This year, we have great guests who are gracious enough with their time to discuss their experiences with Data Harmony software and how they use it within their organizations. Its applications are broad, and each case study is unique, so what sorts of things am I going to hear about?

Kicking off these case studies are Sharon Garewal and Ron Snyder from JSTOR, one of the largest and most respected shared digital libraries in the world.  We’ve done a lot of work with them and, this year, they’re launching the JSTOR Sustainability Collection and discussing it at DHUG 2015.

This interdisciplinary collection is composed of journals, reports, and working papers from the realms of academic publishing, scholarly societies, industry groups, research institutes, and universities to look at how the environment, human activities, and industry can be made sustainable in the long term.  This has become an increasingly important issue, and they will discuss how the JSTOR Thesaurus, which was built using Data Harmony, makes crossing through the many fields of study a fairly straightforward affair.

One of the really interesting things about taxonomies and modern data analytics is how indexing can be leveraged to see information that would have taken a mountain of time and effort to figure out before. That’s precisely what Helen Atkins of the Public Library of Science (PLOS) will discuss with attendees in her talk, “The Fate Predictor Project.”

PLOS ONE, their international, open-access, online journal, has semantically enriched their content recently. Using the metadata that got extracted, they were able to see statistics about acceptance and rejection of papers. Using this data, along with data about country of origin, author, number of authors, etc., they are able to predict with accuracy whether a currently submitted paper will get accepted or rejected. That doesn’t take away the need for peer review, but knowing what kinds of things flag often for rejection will be able to save the PLOS editors huge amounts of time.

This is just one example of how sophisticated data usage can open eyes to otherwise unseen patterns. Marketing companies use it to see buyer patterns, leading to all those advertisements directed to individuals. This is how the Internet of Things will work, so that your refrigerator knows what resides inside and for how long, and can recommend recipes, keep your shopping list, and tell you when your milk has gone sour. Maybe its biggest current application is in security, where it’s being used in myriad overt and covert ways. This is right in line with the kind of semantic enrichment that Access Innovations does.

The talk that I’m most interested in will come from Kevin Ford of MarkLogic. His presentation, “Implementation of Taxonomy Triples from Data Harmony Exports,” will explore how companies can convey more accurate information, make data-driven decisions, and reduce risk by taking content from documents and data and combining them with RDF triples into a single architecture. By enabling search across different kinds of information from many sources, this kind of architecture can help users glean greater insights, and will help customer bases quickly and accurately mine knowledge from the data.
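To make the idea of taxonomy triples a little more concrete, here is a minimal sketch in Python. It is not the Data Harmony export format or MarkLogic's API. The taxonomy fragment, the document identifiers, and the function names are all made up for illustration, and the only borrowed vocabulary is the standard SKOS property skos:broader. The sketch turns broader/narrower relationships into subject-predicate-object triples, then uses them to find documents indexed with a term or with anything beneath it.

```python
# Hypothetical taxonomy fragment: each term maps to its narrower terms.
taxonomy = {
    "Sustainability": ["Renewable energy", "Conservation"],
    "Renewable energy": ["Solar power"],
}

# Hypothetical documents, each indexed with one taxonomy term.
documents = {
    "doc-1": "Solar power",
    "doc-2": "Conservation",
    "doc-3": "Music theory",
}


def to_triples(taxonomy):
    """Express each narrower/broader link as a (subject, predicate, object) triple."""
    triples = []
    for broader, narrower_terms in taxonomy.items():
        for narrower in narrower_terms:
            triples.append((narrower, "skos:broader", broader))
    return triples


def ancestors(term, triples):
    """Follow skos:broader links upward from a term (assumes one parent per term)."""
    parents = {s: o for s, p, o in triples if p == "skos:broader"}
    chain = []
    while term in parents:
        term = parents[term]
        chain.append(term)
    return chain


def search(concept, documents, triples):
    """Return documents indexed with the concept or with any term beneath it."""
    return sorted(
        doc_id
        for doc_id, term in documents.items()
        if term == concept or concept in ancestors(term, triples)
    )


triples = to_triples(taxonomy)
print(ancestors("Solar power", triples))
print(search("Sustainability", documents, triples))
```

A search for "Sustainability" picks up the documents indexed under "Solar power" and "Conservation" even though neither uses that exact term, which is the kind of cross-source retrieval the talk description points toward.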

Ontologies are taking an increasingly prominent place in the world of semantics, and many believe that their use will take a big step toward genuine artificial intelligence. How far off that might be is certainly up in the air, but it’s presentations like this one that will start to reveal how it might work, if not when it might work.

These aren’t the only presentations at DHUG 2015. There will be more case studies from our users, as well as panels by the highly knowledgeable staff from Access Innovations. Those, in conjunction with meeting new people over great food and conversation, are going to make February 16-20 a pretty great week.

Daryl Loomis
Access Innovations

A Celebration of Roget’s Taxonomy

January 12, 2015  
Posted in Access Insights, Featured

Later this week is January 18th, which for taxonomists is notable for two things: 1) it’s Thesaurus Day; and 2) it’s the birthday of Peter Mark Roget. This double occurrence is no coincidence. We may consider Doctor Roget to be the inventor of the thesaurus (or at least one of its pioneers), and a person whose birthday is cause for taxonomists’ celebration.


Yes, this is the man who compiled the first “Thesaurus of English Words and Phrases.” He started writing it in 1805 but didn’t have it published until much later, in 1852. The full title of the first edition was Thesaurus of English Words and Phrases, Classified and Arranged so as to Facilitate the Expression of Ideas.


Photo, http://www.pbagalleries.com/view-auctions/catalog/id/339/lot/104070/?url=%2Fview-auctions%2Fcatalog%2Fid%2F339%3Fcat%3D9

Did you catch the “Classified” part of the title? And the “Arranged”?

Most people think of Roget’s thesaurus as a simple list of words and their synonyms. This is understandable, as some of the more recent synonymies that include “thesaurus” in their titles really are just strictly alphabetical lists of words, annotated with some synonyms. Taxonomists sometimes consider Roget’s synonym resource to be much different from modern taxonomic thesauri. After all, hasn’t it always lacked any sort of classification scheme?

No, no, no.

As much of a habitual list maker as Roget was (since he was eight years old, in fact), he recognized that the full potential of a lengthy vocabulary could not be achieved unless there was some sort of categorization or classification of the list entries. Classification was an intrinsic part of Roget’s compilation of synonyms throughout its long development.

As he explained in the preface to the first edition of the Thesaurus of English Words and Phrases: 

“It is now nearly fifty years since I first projected a system of verbal classification similar to that on which the present work is founded. Conceiving that such a compilation might help to supply my own deficiencies [as a writer], I had, in the year 1805, completed a classed catalogue of words on a small scale, but on the same principle, and nearly in the same form, as the Thesaurus now published. I had often during that long interval found this little collection, scanty and imperfect though it was, of much use to me in literary composition, and often contemplated its extension and improvement; but a sense of the magnitude of the task, amidst a multitude of other avocations, deterred me from the attempt. Since my retirement from the duties of Secretary to the Royal Society, however, finding myself possessed of more leisure, and believing that a repertory of which I had myself experienced the advantage might, when amplified, prove useful to others, I resolved to embark in an undertaking which, for the last three or four years, has given me incessant occupation.” (“Roget’s Thesaurus: The Original Manuscript”)

Part of Roget’s classification efforts involved choosing a single term to represent each concept, rather than repeating each synonym in some other part of the list. This is akin to modern taxonomic thesauri, in which each concept is represented by only one term, and alternative ways of expressing that concept are indicated in the term record as non-preferred terms. Roget’s approach was oriented toward findability of a concept through the choice of words that users were most likely to associate with particular concepts.
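The one-term-per-concept idea can be sketched in a few lines of Python. This is only an illustration, not how Roget's thesaurus or any particular thesaurus software stores its records. The TermRecord class, the build_lookup function, and the sample terms are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TermRecord:
    """One concept, labelled by a single preferred term."""
    preferred: str
    # Alternative labels that point to the preferred term.
    non_preferred: list = field(default_factory=list)


def build_lookup(records):
    """Map every label, preferred or not, to its preferred term (case-insensitive)."""
    lookup = {}
    for record in records:
        for label in [record.preferred, *record.non_preferred]:
            lookup[label.lower()] = record.preferred
    return lookup


# Made-up sample records for illustration.
records = [
    TermRecord("Happiness", ["joy", "gladness", "cheerfulness"]),
    TermRecord("Motion", ["movement", "locomotion"]),
]
lookup = build_lookup(records)

print(lookup["joy"])
print(lookup["movement"])
```

Whichever word a user starts from, the lookup lands on the same term record, which is what makes the concept findable without repeating it under every synonym.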

Beyond that, though, the overall structure of the thesaurus was hierarchical. The table of contents of Project Gutenberg’s presentation of Roget’s thesaurus shows the organization of the book into six main classes, with numerous subdivisions. Wikipedia provides an “Outline of Roget’s Thesaurus” that shows the hierarchical depth to seven levels; this resource also includes links from many of the categories to relevant Wikipedia articles, as does the related Wiktionary resource “Appendix: Roget’s thesaurus classification”.

Roget crafted the thesaurus categories and subdivisions according to principles set out by some eminent philosophers, as explained in the Wikipedia article on “Roget’s Thesaurus”:

“Each class is composed of multiple divisions and then sections. This may be conceptualized as a tree containing over a thousand branches for individual “meaning clusters” or semantically linked words. These words are not exactly synonyms, but can be viewed as colours or connotations of a meaning or as a spectrum of a concept. One of the most general words is chosen to typify the spectrum as its headword, which labels the whole group.

“Roget’s schema of classes and their subdivisions is based on the philosophical work of Leibniz (see Leibniz—Symbolic thought), itself following a long tradition of epistemological work starting with Aristotle. Some of Aristotle’s Categories are included in Roget’s first class “abstract relations”.”

So was Roget an inventor? An originator? A pioneer? Consider these eclectic accomplishments:

  • He invented the log-log slide rule, which greatly simplified exponential and root calculations.
  • He designed a pocket chessboard and invented several chess problems.
  • He made insightful observations about the perception of motion, thus contributing to the development of mechanical animation devices and, more importantly, to the early development of cinema.
  • He helped found the wonderfully named Society for the Diffusion of Useful Knowledge.
  • He was a co-founder of the Medical and Chirurgical Society of London, the forerunner of the Royal Society of Medicine.
  • He was the first Fullerian Professor of Physiology at the Royal Institution.
  • He helped establish the University of London.
  • He compiled Roget’s Thesaurus, which writers still use to perfect their prose.
  • He developed a classification approach that set an example for modern taxonomists and thesaurians.

Yes, I think we can conclude that Peter Mark Roget was an inventor, an originator, and a pioneer. And a thesaurian, of course. And yes, a taxonomist.

All good reason to celebrate his birthday on Thesaurus Day!

Barbara Gilles, Taxonomist
Access Innovations, Inc.
