A Short Roundup of Recent Taxonomy Books

April 13, 2015  
Posted in Access Insights, Featured, reference, Taxonomy

http://www.dreamstime.com/royalty-free-stock-image-3d-people-around-books-image24579066

© Nasir1164 | Dreamstime.com – 3d People Around Books Photo

Over the past couple of decades, the field of knowledge organization systems (such as taxonomies and thesauri) has matured. This maturation has led KOS experts to write books that consolidate and share the theories, insights, and techniques that have emerged. Below is a roundup of some of the more recent books in the area of taxonomies and related knowledge organization systems.

The Taxobook

One of the most recent books, published as a trio of monographs, is The Taxobook, written by Access Innovations President Marjorie Hlava and published by Morgan & Claypool Publishers. TaxoDiary recently had a blog post about The Taxobook. We’ll reiterate the summary:

Volume 1, The Taxobook: History, Theories, and Concepts of Knowledge Organization, introduces the foundations of classification, covering theories from the ancient Greek philosophers to modern thinkers. This volume also includes a glossary that covers all three volumes.

Volume 2, The Taxobook: Principles and Practices of Taxonomy Construction, outlines the basic principles of creation and maintenance of taxonomies and thesauri. It also provides step-by-step instructions for building a taxonomy or thesaurus and discusses the various ways to get started on a taxonomy construction project.

Volume 3, The Taxobook: Applications, Implementation, and Integration in Search, covers putting taxonomies into use in as many ways as possible to maximize retrieval for users.

The Accidental Taxonomist

This book, by well-known taxonomy expert Heather Hedden, was published by Information Today Inc. in 2010. Here’s the publisher’s summary:

The Accidental Taxonomist is the most comprehensive guide available to the art and science of building information taxonomies. Heather Hedden—one of today’s leading writers, instructors, and consultants on indexing and taxonomy topics—walks readers through the process, displaying her trademark ability to present highly technical information in straightforward, comprehensible English.

Drawing on numerous real-world examples, Hedden explains how to create terms and relationships, select taxonomy management software, design taxonomies for human versus automated indexing, manage enterprise taxonomy projects, and adapt taxonomies to various user interfaces. The result is a practical and essential guide for information professionals who need to effectively create or manage taxonomies, controlled vocabularies, and thesauri. 

Introduction to Controlled Vocabularies: Terminologies for Art, Architecture, and Other Cultural Works 

This book, originally published in 2010 by J. Paul Getty and revised in 2013, focuses on controlled vocabularies for the world of museums and cultural studies. Author Patricia Harpring is managing editor of the Vocabulary Program at the Getty Research Institute, which maintains some highly respected thesauri and other controlled vocabularies, including the Art & Architecture Thesaurus (AAT), the Union List of Artist Names (ULAN), the Getty Thesaurus of Geographic Names (TGN), and the Cultural Objects Name Authority (CONA). Co-author Murtha Baca is Head of Digital Art History at the Getty Research Institute. Here’s the Institute’s description of the 2013 revision:

This primer on the characteristics, scope, uses, and methods for building and maintaining controlled vocabularies for art and cultural materials explains how vocabularies should be integrated in cataloging systems; utilized for indexing and retrieval; and structured to group synonyms and arrange concepts into categories.

The updated edition reflects recent developments in the field, including new national and international standards, current trends such as Linked Open Data, and revisions to the Getty vocabularies. The glossary and bibliography have also been updated.

Structures for Organizing Knowledge: Exploring Taxonomies, Ontologies, and Other Schema

This book, published in 2010 by Neal-Schuman Publishers, was written by June Abbas, whose research focuses on the development of user-centered digital libraries, institutional repositories, and knowledge organization structures. In the Preface, she outlines the three major sections of the book:

Traditional Structures for Organizing Knowledge—Part I looks at structures used in libraries, such as MARC records, subject headings, and classification schemes, as well as traditional structures that may not be as familiar, such as those from natural science. The historical contributions to the organization of knowledge from fields such as library and information science, philosophy, natural science, and cognitive science are examined. Exemplars of how the structures have remained the same and/or have been adapted for use in the digital environment are also included in this section.

Personal Structures for Organizing Knowledge are the focus of Part II. These are systems developed by individuals in both home- and work-related contexts. Several research streams from library and information science (knowledge organization and human information behavior) and human–computer interaction (personal information management) are introduced, and research in each area of personal knowledge structures is explored.

Socially-Constructed Structures for Organizing Knowledge, or those that are beginning to emerge as the result of individual and collaborative uses of social bookmarking and social cataloging Web 2.0 sites, are examined in Part III. Research focused on these new environments is becoming more prevalent and providing information professionals with a glimpse into how people organize their own collections.

Metadata

In 2008, the American Library Association published the first edition of this book by Marcia Lei Zeng and Jian Qin, two experts in the field of knowledge organization systems and in the metadata connected with those systems. The second edition is scheduled to be released in 2016. Here’s the ALA’s description of the new edition:

Metadata remains the solution for describing the explosively growing, complex world of digital information, and continues to be of paramount importance for information professionals. Providing a solid grounding in the variety and interrelationships among different metadata types, Zeng and Qin’s thorough revision of their benchmark text offers a comprehensive look at the metadata schemas that exist in the world of library and information science and beyond, as well as the contexts in which they operate. Cementing its value as both an LIS text and a handy reference for professionals already in the field, this book

Lays out the fundamentals of metadata, including principles of metadata, structures of metadata vocabularies, and metadata descriptions

Surveys metadata standards and their applications in distinct domains and for various communities of metadata practice

Examines metadata building blocks, from modeling to defining properties, and from designing application profiles to implementing value vocabularies

Describes important concepts such as resource identification, metadata as linked data, consumption of metadata, interoperability, and quality measurement

Offers an updated glossary to help readers navigate metadata’s complex terms in easy-to-understand definitions

An online resource of web extras, packed with exercises, quizzes, and links to additional materials, completes this definitive primer on metadata.

Organising Knowledge: Taxonomies, Knowledge and Organisational Effectiveness

This book, by knowledge management consultant Patrick Lambe, was published by Chandos Knowledge Management in 2007. In the book’s introduction, Lambe offers an overview:

In the first half of this book we’ll challenge a number of assumptions about taxonomies and the work of taxonomy building, and relate this work to organization effectiveness and knowledge management.…

In the second half of this book, we take a more practical approach and guide you through the steps involved in a ‘typical’ taxonomy project. Here we challenge the assumption that taxonomy development can be done in the abstract, by a consultant, sitting apart from the information and knowledge world of the organisation it is intended for. Very few taxonomies can be developed in that distant, unengaged way.…

To close, in Chapter 10 we take a forward look at issues and challenges on the horizon for knowledge managers. What do the semantic web, folksonomies, ontologies and social tagging mean for taxonomy work? Will we need taxonomies at all?

Those of us involved with TaxoDiary believe that taxonomies, thesauri, and other controlled vocabularies will continue to be relevant to knowledge management and information retrieval. And we look forward to seeing new insights and approaches, and new books.

Barbara Gilles, Taxonomist
Access Innovations, Inc.

Using Taxonomy to Save a Butterfly

April 6, 2015  
Posted in Access Insights, Featured, Taxonomy

Adult monarch butterfly nectaring on a swamp milkweed (Asclepias incarnata) in Ramsey County, Minnesota. Photo by Steven Katovich, USDA Forest Service, Bugwood.org, http://www.forestryimages.org/browse/detail.cfm?imgnum=5524769, CC BY 3.0

The monarch butterfly is a beautiful sight, whether it’s fluttering through a garden or resting on a flower. Understandably admired, it’s the state insect of Alabama, Idaho, Illinois, Minnesota, and Texas. It’s also the state butterfly of Vermont and West Virginia.

Unfortunately, the monarch butterfly population has been dropping drastically over the past decades. The situation is explained in a recent article authored jointly by Daniel M. Ashe, Director, U.S. Fish and Wildlife Service, and Collin O’Mara, CEO, National Wildlife Federation:

As recently as 1996, the estimated monarch population wintering in Mexico was more than one billion butterflies, turning forests into seas of orange and black. Last year, however, the wintering population numbered only about 56 million butterflies, gathered on fewer than three acres of forest.

Monarch butterflies, as well as other butterfly species, bees, birds and bats help move pollen from one plant to another, fertilizing flowers and making it possible for plants to produce seeds, berries, fruits and nuts that feed people and wildlife. More than a third of the food that we eat requires pollinators to grow. Yet like the monarch, many of these pollinators are declining, with habitat loss, pesticides and climate change all contributing to their struggles.

We need to know more about exactly why monarch butterflies are disappearing. But we don’t need to wait to take the actions that scientists tell us are necessary to redirect the monarch’s future skyward.

What can we do? Well, for one thing, we can plant milkweed.

Many of us in North America and elsewhere know that monarch butterfly larvae need to feed on milkweed (genus Asclepias) in order to achieve their transformation into winged butterflies. (As a matter of fact, so do the larvae of the monarch butterfly’s closest relatives, the other “milkweed butterflies” of the genus Danaus.) The monarch’s fate hinges on the available supply of milkweed in its natural geographic distribution. Various factors, including well-intentioned weeding, have caused that milkweed supply to dwindle.

Practically anyone can grow a successful patch of milkweed, as long as the right kind of milkweed is chosen and the right conditions are provided. Texas Butterfly Ranch’s aptly named “Got Milkweed?” planting guide offers this caution:

Those of us who have attempted cultivation of native milkweeds from seed in our home gardens have often met frustration and failure. The very traits that make native plants so hardy also often make them extremely particular about their soil, drainage, moisture and available light. As George Cates, chief seed wrangler at Native American Seed Co. in Junction, Texas told me: “These milkweeds have a mind of their own.”

Another reason to plant a species that’s native to your geographic location is that the monarch migration is largely dependent on the timing of milkweed blooms. Tropical milkweed, in particular, when grown outside of extreme southern areas of Texas and Florida, can throw off migration patterns, leading to disease and other problems. Ironically, gardeners wanting to help the monarchs have been planting the more readily available non-native milkweeds. As explained in an article in the February 2015 issue of the Proceedings of the Royal Society B:

Each autumn, monarchs migrate from breeding grounds in the eastern US and Canada to wintering sites in central Mexico. However, some monarchs have become non-migratory and breed year-round on exotic milkweed in the southern US. We used field sampling, citizen science data and experimental inoculations to quantify infection prevalence and parasite virulence among migratory and sedentary populations. Infection prevalence was markedly higher among sedentary monarchs compared with migratory monarchs, indicating that diminished migration increases infection risk. Virulence differed among parasite strains but was similar between migratory and sedentary populations, potentially owing to high gene flow or insufficient time for evolutionary divergence. More broadly, our findings suggest that human activities that alter animal migrations can influence pathogen dynamics, with implications for wildlife conservation and future disease risks.

Gardens planted to accommodate monarch migration are sometimes referred to as waystations. In an article on butterfly gardening for monarchs (where the ubiquitous and irresistible “Got Milkweed?” pops up again), Carole Sevilla Brown recommends, “Plant a Monarch Waystation. Go to the USDA Plants database to determine which species of Asclepias are appropriate for your garden.” Chances are that the species appropriate for your garden are the ones native to your area. (Essentially, that’s what the USDA database shows.)

Ah, species. A look at the Encyclopedia of Life’s Asclepias taxonomies shows the complexity. There are dozens of Asclepias species, viewable by traditional biological taxonomy structure (the NCBI Taxonomy or the Integrated Taxonomic Information System), as well as by Extant and Habitat resource, as well as a few other slice-and-dice approaches. It’s apparent from these resources that there are milkweeds for all kinds of different growing conditions, from wetlands to scrubland.

Wherever you live, you can use one of the various online taxonomic resources, or a database that’s correlated with a detailed taxonomy, to determine what kinds of milkweed to grow.

Got milkweed? No? Use a taxonomy, and get milkweed!

Barbara Gilles, Taxonomist
Access Innovations, Inc.

Heather Kotula Named Director of Communications at Access Innovations, Inc.

March 30, 2015  
Posted in Access Insights, Featured

Access Innovations, Inc. is happy to announce an important change to its corporate structure, a change designed to best represent the talents of its staff.

Heather Kotula, formerly the marketing coordinator for Access Innovations, has expanded her role to assume the duties of Director of Communications. Since starting at Access Innovations in 1995, Heather has succeeded admirably in each role she has taken on and, through her hard work and skill, has been integral to the growth of the business. This will continue as she oversees marketing and sales, and shepherds new initiatives to fruition. Margie Hlava, President of Access Innovations, remarked, “Heather knows this business as well as anybody, and her unique skillset is perfect for her new role. We at Access Innovations are very excited about the new prospects and opportunities she will bring to the company.”

Heather remarks, “I am thrilled to be stepping into this new role. Over the years, having seen nearly every aspect of the business, I am in a fantastic position to leverage my experience and knowledge into the emergent markets that we are pursuing. It will be exciting to see what the future holds for myself and Access Innovations.”

Heather began as a financial analyst at Access Innovations and has been a part of nearly every aspect of the company, from production work to administration to marketing and sales. She has led various departments to develop budgets, design and produce marketing materials, and acquire space and hire staff for corporate expansion. Additionally, she has worked directly with clients to ensure customer satisfaction, has attended and participated in conferences and trade shows, and will continue to coordinate the annual Data Harmony Users Group (DHUG) meetings.

Heather earned her bachelor’s degree in foreign languages from the University of New Mexico and her MBA from New Mexico State University. When not organizing new marketing and sales initiatives, Heather enjoys various crafting activities, such as quilting and painting, as well as gourmet cooking, hiking, and spending time with her husband, daughter, and three dogs.

 

About Access Innovations, Inc. – www.accessinn.com, www.dataharmony.com, www.taxodiary.com

Founded in 1978, Access Innovations has extensive experience with Internet technology applications, master data management, database creation, thesaurus/taxonomy creation, and semantic integration. Access Innovations’ Data Harmony software includes machine aided indexing, thesaurus management, an XML Intranet System (XIS), and metadata extraction for content creation developed to meet production environment needs. Data Harmony is used by publishers, governments, and corporate clients throughout the world.

Among Good Company

March 25, 2015  
Posted in Access Insights, News, Technology

TEMIS joins the prestigious list of 100 Companies That Matter in Knowledge Management. KMWorld recently made its annual announcement recognizing the top industry influencers in content, document, and knowledge management. This information came to us from Bio IT World in their article, “TEMIS Named to KMWorld’s 2015 List of ‘100 Companies that Matter In Knowledge Management.’”

This was the 15th edition of the list, which is compiled by KM practitioners, theorists, analysts, vendors, and their customers and colleagues.

Access Innovations was also included in this prestigious list. For a full list of the Top 100 Companies That Matter in Knowledge Management, click here.

Melody K. Smith

Sponsored by Access Innovations, the world leader in thesaurus, ontology, and taxonomy creation and metadata application.

The Promise and Pitfalls of Classifying Food

March 24, 2015  
Posted in Access Insights, Featured, Taxonomy

As a lover of all things tasty, my mind often turns to the kinds of food and drink that I love. As someone who works with taxonomies and thesauri, I tend to try to classify them. Often, though, I don’t get very far with it because I start to get hungry. However, I just ate, so I don’t think it’ll be so much of an issue this time.

Classifying food is at once extraordinarily basic and mind-bogglingly complex, depending on how deep you want to go with it. At its simplest, you have the USDA food pyramid (or plate, depending on when you’re talking about). As a simple guide to make sure your kids are eating the right proportions, the USDA guide can be helpful, if problematic. As a way to look at meaningful relationships between food, though, it’s simply too narrow to be of any real use.

To see a much more complex food classification system, one can simply enter a grocery store. In there, thousands of food items sit on shelves, organized in a very specific, scientifically driven way. This organization, though, is based on sales maximization, not organizational consistency. That’s how rice ends up in multiple places, with the cheap basic stuff alongside the other staples and the nicely packaged, pricier styles in International Foods, or somewhere similar. So while there are very good reasons for how the items in a grocery store are arranged, this isn’t the kind of organization that I mean.

I’m thinking of organization based on what food and drink is and how it is viewed by cooks and eaters, not on how to boost sales of the latest in frozen pot pies. However, this can get extremely complicated.

With thousands of various types of food and drink in this world, questions immediately arise that confuse the issue. We all know what bread is, and we all know what cake is. They use nearly all the same ingredients and the result is similar, if very distinct. If we’re building a taxonomy, are they distinct concepts? Is cake a type of bread, or is the fact that one is eaten mostly for dessert, while the other generally isn’t, a big enough difference to keep them separated? How about a box of macaroni and cheese? Obviously, a greater part of what’s in that box is pasta, but in the grocery store, it’s generally nowhere near an actual package of macaroni. Does that little bag of weird cheese powder in the box make it an entirely different product? It seems like a subset to me.

There are problems like this everywhere, which makes attempts at organization seem futile. Where do we even start? In taxonomic terms, the basically useless (for our purposes) food pyramid gives us a few broadest terms to work with. It’s woefully incomplete, but it’s a place to start. Australia did something a little like this with their Australian Health Survey Classification System, which was designed “to group similar foods and report trends in consumption by food category.” While it’s useful and quite interesting from a public health perspective, the nearly 700-line spreadsheet makes it indecipherable for your average eater.

Unless all we want is an organized but flat list of foods and beverages, it seems we must decide on the purpose of the classification, because nothing is going to be one-size-fits-all. There isn’t a comprehensive food taxonomy out there, at least that I know of, but there are some really intriguing things that people have done with very specific kinds of classification.

In his book, On Food and Cooking: The Science and Lore of the Kitchen, food scientist and writer Harold McGee includes a two-page table (which I cannot post here) that lists, on the y-axis, the names of commonly used herbs, and on the x-axis, the chemical compounds that give each its distinct flavor. One look at the table reveals how much is shared between different herbs. Say you’re cooking something and need basil, but are unexpectedly all out of it. McGee’s table can show you what other herbs contain the chemical or chemicals that you need to match that flavor. You might get some extra stuff in the dish that you didn’t need, but you will have the flavors that you want.

Then there’s chef Marc Powell, who built a food app that reads menus, turns them into XML documents, and tags them with taxonomy-based metadata for taste, texture, and other food characteristics. This metadata can then be used to make recommendations for balancing the flavors of a dish, to provide a list of ingredients for concocting possible dishes, or for any number of other possibilities.

I would absolutely love to use that tool; it’s exactly the sort of thing that I want, though for it to work the way I have it in my head, I don’t think a simple taxonomy, no matter how large, would be enough, precisely because of the complications that I describe above. On the other hand, an ontology that relates ingredients to associated recipes could be extremely useful. Being able to open my refrigerator or pantry, search the ontology for the ingredients that I have in there, and get back possible dishes that use only what I have would change the game for me. With the Internet of Things coming closer and closer to reality for the masses, this doesn’t seem all that far-fetched.
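The pantry search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the recipes, the ingredient names, and the `dishes_from_pantry` function are all hypothetical, and a real ontology would carry far richer relationships than a flat mapping from dish to ingredients.

```python
# Hypothetical recipe data: each dish maps to the set of ingredients it needs.
RECIPES = {
    "macaroni and cheese": {"macaroni", "cheese", "milk", "butter"},
    "buttered toast": {"bread", "butter"},
    "grilled cheese": {"bread", "butter", "cheese"},
}


def dishes_from_pantry(pantry, recipes=RECIPES):
    """Return the dishes whose required ingredients are all on hand."""
    on_hand = set(pantry)
    # A dish qualifies only if its ingredient set is a subset of the pantry.
    return sorted(dish for dish, needed in recipes.items() if needed <= on_hand)


print(dishes_from_pantry(["bread", "butter", "cheese"]))
```

The subset test is what enforces "only what I have": macaroni and cheese is left out because the pantry lacks macaroni and milk, even though it shares two ingredients with the other dishes.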

All of this talk about food has given me quite an appetite, but at least I could complete the thought this time.

Daryl Loomis
Access Innovations

Emojis: Communication without Words

March 16, 2015  
Posted in Access Insights, Featured, Technology

Each year since 2000, the Global Language Monitor has selected the Top Words of the Year, which they derive through statistical analysis of word usage. Some may have thought the organization had gone insane this year when they announced the selections for 2014, because at the very top of the list was not a word at all, but the heart emoji. It’s true; a tiny cartoon heart got used so often last year that it supplanted all actual words.

People might shake their heads at that fact and lament what happened to language with kids these days, but it’s how people communicate online, which is basically how people communicate at all anymore, so it’s well worth looking into how and why this has happened.

No matter how one might personally feel about it, there’s no denying the rampant popularity of emojis. They are more commonly used on Twitter than the digit 5, and the single most popular emoji is more commonly used than the tilde. Those facts are crazy to me, and text analytics company Luminoso has compiled even more. Emojis have taken over at lightning speed and there’s no stopping them, so we might as well start trying to find the meaning in them.

As I discussed previously, I have been fascinated for the last few weeks, ever since I discovered emojitracker.com. By making use of Twitter’s streaming API, it tracks emoji use across the globe in real time. Though it only uses Twitter and not all the other places where the characters are used, the numbers are still mind-blowing. The most popular character, “face with tears of joy,” has been used more than 632 million times since the site opened, and its count, along with those of the others at the top of the list, increases at an extraordinary pace.

It’s not just a flood of numbers, either. You can click on each icon to see a feed of the tweets that the emojis were used in, as well as see those results in JSON format; this is the stuff that I find highly interesting. The feeds for the top-ranked emojis move far too quickly to understand anything with the naked eye, but there are things to look at in some of the less popular ones.

Take, for instance, the emoji labelled “Pedestrian,” which is simply a man walking. Oddly, of the 16 million times this symbol has been tweeted, nearly every one is in Arabic. Why are they all walking? To see this stuff with the eye, one has to wade through so much material that it would be simply too daunting to actually find larger meaning in any of this.

Computers could easily parse it all out, though. The trouble is that, while it’s interesting to see the data stream, nothing is really being done with it. Despite the fact that it’s already an example of linked data, there is next to no analysis. That site lives in a vacuum, but emoji usage doesn’t. It grows and evolves more rapidly than text language does, and people from different cultures and groups assign their own meanings to single characters and groups.
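To give a sense of how little code a first pass would take, here is a minimal Python sketch that tallies a few emojis across a batch of tweets delivered as JSON. The sample tweets, the `text` field name, and the short table of tracked characters are assumptions made up for illustration; this is not how emojitracker.com or Twitter’s API actually structures its data.

```python
import json
from collections import Counter

# Hypothetical lookup table: a few emoji characters and the labels to report.
TRACKED = {
    "\U0001F602": "face with tears of joy",
    "\u2764": "heavy black heart",
    "\U0001F6B6": "pedestrian",
}


def count_emojis(lines, tracked=TRACKED):
    """Tally tracked emoji characters across JSON-encoded tweets."""
    counts = Counter()
    for line in lines:
        # Assumes one JSON object per line with the tweet body under "text".
        text = json.loads(line).get("text", "")
        for ch in text:
            if ch in tracked:
                counts[tracked[ch]] += 1
    return counts


# Made-up sample data standing in for a real feed.
SAMPLE = [
    json.dumps({"text": "so funny \U0001F602\U0001F602"}),
    json.dumps({"text": "out for a walk \U0001F6B6"}),
    json.dumps({"text": "no emoji here"}),
]

print(count_emojis(SAMPLE).most_common())
```

Counting is the easy part. Grouping the same tallies by language or by neighboring words is where questions like the Arabic pedestrian could start to get answers.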

Yet, in spite of that evolution that, for whatever reason, makes the pedestrian symbol appealing to Arabic speakers, emojis also have somewhat universally defined meanings that make actual communication possible. The Wall Street Journal allows you to translate their headlines into emojis and, though you sometimes have to stretch a bit, it’s pretty easy to see where they’re coming from. Likewise, in a far more absurd example, Herman Melville’s classic Moby Dick has been turned into emoji. Of course, all the deep contextual and literary meaning will be lost in translation, so to speak, but if the words can be communicated in any kind of comprehensible fashion, that’s pretty impressive, if rather pointless.

The problem with all of this from a semantics perspective is that if the meaning does continue to evolve, how could one possibly analyze the data in a meaningful way? Were one to get a comprehensible result today, would they get that same result later? It’s important in semantic analysis to get provable, repeatable results. You can’t see patterns in data when the rules keep changing.

Like it or not, this is the way people communicate today and, whether or not I think emojis are a lasting phenomenon or will be an enduring part of language (I don’t), they don’t seem to be a product of laziness. Instead, they are about speed and clarity of communication. If we can express a complex emotion like love using one symbol rather than many, people are going to gravitate toward it, just like they gravitated toward texting and Twitter rather than tedious old email.

Words and their meanings are always in flux, just very slowly. The difference between our current English and Geoffrey Chaucer’s is massive, but it happened over six centuries. Still, the language is comprehensible without translation. The meaning of emojis may change at a faster pace, but their meanings are still being communicated to people around the world, regardless of language or cultural barriers. To me, that alone is reason enough to want a much deeper understanding of how they’re being used.

Daryl Loomis
Access Innovations

Access Innovations Named in KMWorld’s Annual “100 Companies That Matter in Knowledge Management”

March 9, 2015  
Posted in Access Insights, Featured

Access Innovations, Inc., a leader in digital data organization, is pleased to announce its inclusion on KMWorld’s annual list of the “Top 100 Companies That Matter in Knowledge Management.”

Access Innovations is featured for its fourth year after debuting on the list in 2009. Other notable companies given a spot on 2015’s top 100 list include Oracle, Google, IBM, and Microsoft.

“The criteria for inclusion on the list vary, but each of those listed have things in common. Each has either helped to create a market, redefine it, enhance or extend it,” said Hugh McKellar, KMWorld Editor-in-Chief. “They all share a fundamental motivation to innovatively meet and anticipate the widely diverse needs of customers with robust solutions to meet evolving customer requirements challenges.”

Marjorie M.K. Hlava, president of Access Innovations, is honored that the company is included on the list. “Access Innovations enjoys driving technology to face new challenges in knowledge management,” she says. “It’s stimulating and rewarding to be leaders in knowledge management, and it’s delightful to be recognized as a leader in the field. Making content findable for our customers and their users is and always will be our top priority.”

The annual Top 100 Companies That Matter list is compiled by editorial colleagues, analysts, theorists, and practitioners and, unlike many other trade lists, inclusion is not purchased and is at the sole discretion of KMWorld’s editors.

For a full list of the Top 100 Companies That Matter in Knowledge Management, pick up the March issue of KMWorld, available at newsstands now, or visit the following link to view the article online: http://www.kmworld.com/Articles/Editorial/Features/KMWorld-100-COMPANIES-That-Matter-in-Knowledge-Management-102189.aspx

About KMWorld

The leading information provider serving the knowledge, document, and content management systems market, KMWorld informs more than 45,000 subscribers about the components and processes—and subsequent success stories—that together offer solutions for improving business performance. KMWorld is a publishing unit of Information Today, Inc.

Data Analysis in a Standards-Challenged World

We deal pretty heavily around here in words, what they mean, and how they’re used. It should go without saying, but it’s a fundamental part of what we do and is what makes us so concerned with standards, in both taxonomies and the written word. The two go hand in hand; it’s a whole lot easier to have one when the other is compliant.

Academic publishing, which we deal with the most, and most publishing in general, is pretty good about standards, and so we’re able to easily go in and build a taxonomy or mine the content for data analysis. There has been plenty of talk about how useful that enriched content can be in regard to linked open data, direct consumer advertising, and all that. It’s all well and good, but in places where there aren’t standards, it’s a whole lot more difficult to deal with, at least in a semantic sense.

Only a short time ago, nearly all disseminated content would go through some kind of editing process to make sure that simple things like spelling, grammar, and syntax were correct, but also to be sure that it complied with appropriate standards. Only once that process was complete would the public see anything, at least for the most part. This, of course, made for perfectly readable and understandable content, assuming you were familiar with the language and the jargon.

Then the Internet happened. Now, poorly constructed text is the norm and standards have gone out the window. Much of the Internet is about speed of delivery, and content is given a cursory edit, at best. Compliance with standards and delivery speed rarely make good bedfellows.

I’m not here to argue the rightness of accuracy over speed; the change has happened and there’s no going back. Blogging and especially social media have blown up the old ways. This is just how people communicate today and, to me, this is content that is begging for analysis.

But how does one even begin? The one who worries about grammar or syntax in a tweet is a rare beast indeed, and that doesn’t even take into account how things are spelled. Multiple Z’s instead of a single S, fifteen O’s when writing “love,” numbers in place of letters, all sorts of ridiculous things.

Add to that the multitude of languages that people use online, along with widespread disregard for traditional spelling and grammar, and the number of variations explodes. However, because more people from more cultures are communicating with one another, it seems even more important to find a way to control and structure all this data, for the same reason that we structure vocabularies for scholarly publishing: quick and easy search.

In theory, that’s what tags on blogs and hashtags on social media are for, but when anybody can come up with their own tags, it’s plain chaos. This anarchy is something that has no place in an information realm that requires at least some degree of standardization. Tags might never be completely standardized, but a system of organizing them into broad concepts may be a solution. There will always be things like #janetiswinning, #imeatingdinner or whatever, so noise is going to be inevitable, but some kind of broader classification could help people find what they’re looking for, given how much new content is produced every hour of every day from every corner of the world.
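To make the idea of rolling free-form tags up into broad concepts a little more concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the concept names, the keyword list, the digit-for-letter table, and the function names `normalize_tag` and `classify_tag` are hypothetical and do not describe any existing product or vocabulary.

```python
import re

# Hypothetical keyword-to-concept mapping, for illustration only.
BROAD_CONCEPTS = {
    "love": "Emotions",
    "dinner": "Food",
    "election": "Politics",
}

# Hypothetical digit-for-letter substitutions seen in informal spelling.
DIGIT_TO_LETTER = str.maketrans({"0": "o", "1": "l", "3": "e", "4": "a", "5": "s"})


def normalize_tag(tag: str) -> str:
    """Lowercase a tag, undo digit substitutions, and collapse repeated characters."""
    text = tag.lstrip("#").lower().translate(DIGIT_TO_LETTER)
    # Collapse any run of one character to a single character: "loooove" -> "love".
    # This is lossy ("dinner" becomes "diner"), so keywords are normalized the same way.
    return re.sub(r"(.)\1+", r"\1", text)


def classify_tag(tag: str) -> str:
    """Return a broad concept for a tag, or "Unclassified" when nothing matches."""
    normalized = normalize_tag(tag)
    for keyword, concept in BROAD_CONCEPTS.items():
        if normalize_tag(keyword) in normalized:
            return concept
    return "Unclassified"


if __name__ == "__main__":
    for tag in ["#L0000VE", "#imeatingdinner", "#janetiswinning"]:
        print(tag, "->", classify_tag(tag))
```

A sketch like this only shows the shape of the problem. Substring matching and aggressive letter collapsing will misfire on real data, and a working system would need a maintained vocabulary rather than a three-entry dictionary. Tags that match nothing stay in an "Unclassified" bucket, which is where the inevitable noise ends up.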

That noise will always exist, but we can’t dismiss it all as trash. Communicating through social media has become too important a part of our lives to pretend that there’s no value in at least some of the millions of tweets, Facebook posts, Instagrams, Vines, and all the rest. If there’s no value from a scholarly standpoint, there still is from anthropological and political ones, and the power that marketing gains using this kind of data analysis is abundantly clear.

This is conceptually pretty simple when we’re talking about data in the form of text, whether that’s a post or a hashtag. They’re all words, after all, even if they’re spelled tragically wrong. What about things that aren’t words, but still convey concepts? Instagram and Vine are currently two of the fastest growing social media sites and, though they use hashtags, they deliver content visually.

And then there’s the whole new issue of emojis. That might seem like a small thing at first, but they aren’t necessarily used at random, and some are used very specifically. An additional wrinkle with these is that they communicate meaning across languages. It seems to have huge potential for analysis, but is effective analysis even possible given the amount of noise?

I think that the answer is almost certainly yes. This kind of data is too valuable not to mine, especially when the technologies for doing so are already being developed for other purposes. For text, there are developments in sentiment analysis that have already been implemented to analyze social media for political campaigns, and its uses are only going to evolve. Less has been done on this level for imagery, but if a computer algorithm can be built that can accurately identify Jackson Pollock paintings and if a self-driving car can determine spatial proximity and identify objects in real time, certainly the potential exists for use in social media. Almost nothing has been done with analyzing emoji use, though there is Emojitracker, which absolutely fascinates me (and which I will write about at greater length in a later post).

We used to communicate almost exclusively by the written word, but now the technology exists to communicate meaning in a large number of ways. Shouldn’t we analyze and study that meaning? I don’t have answers to these questions, but as we explore these new realms, it seems like time to start thinking about semantics in a slightly broader way. Standards are important and I’m all for them. But I’m also all for people communicating with each other. The least we can do, as people who work in semantics, is to try to find ways to see meaning in the content of social media, even if it seems like a great bog of nonsense some of the time.

Daryl Loomis
Access Innovations

Making Connections

February 23, 2015  
Posted in Access Insights, Featured

The 11th annual Data Harmony Users Group (DHUG) meeting just wrapped up, and as we reflect on the week-long event, making connections is the unofficial theme that emerged. This makes perfect sense, because the benefit that our attendees mention most often is the networking opportunity – the connections we make with colleagues – during the DHUG meetings.

This year we asked our attendees to fill out a short survey about the meeting. I have included some of their responses in the following list of connections that the DHUG meetings make or explain.

  • We meet and connect with other people who are doing what we do. One attendee said “Networking is an important aspect of the meeting” and “Excellent opportunity to establish and deepen networks for later follow up.”
  • We connect with people who can help overcome challenges that we are facing – because they are facing similar challenges. The most valuable thing on the agenda for one attendee? “Case studies. These are the main reason I come. Love seeing what other folks are doing.”
  • We connect ideas and find answers.
  • We get new ideas for additional things we can do with our data or systems.
  • We connect our users’ viewpoints with the functionality of the software and the services we offer – we explain not only WHAT the software does, but also HOW it works and WHY we designed it that way.
  • We connect authors with editors and peer reviewers.
  • We connect content to other related content.
  • We connect related concepts – not just the words that appear, but the meanings of those words.

Here are a few other comments taken from our survey responses:

“Really enjoyed it – right balance of updates and networking opportunities”

“Really well organized event with delightful people – thank you!”

“I think the range of topics and presentations was good. It’s good to have exposure to the variety of subjects.”

“Hearing from a researcher was great, implementing stories and creative uses also great”

“I honestly like [the meeting] the way it is”

If you missed the DHUG meeting, consider joining us in 2016! We have tentatively scheduled the 12th Annual DHUG meeting for February 22-26, 2016. Marjorie M.K. Hlava’s annual features update will kick off the core part of the meeting on February 23, and will be followed by case studies from our users on February 23 and 24. February 22, 25, and 26 are optional days reserved for hands-on software training and one-on-one meetings.

Watch TaxoDiary for an official announcement and further information.

Heather Kotula, Director of Communications
Access Innovations

Hands-on Learning

February 19, 2015  
Posted in Access Insights, News

The Data Harmony Users Group Meeting continued today with a presentation by Paul G. Kotula of the Materials Characterization department at Sandia National Laboratories. In the presentation, “Six Months of Work in the Lab will Save You Half a Day in the Library or 30 Minutes Online”, he shared his experience as both a consumer and a producer of peer-reviewed, published scientific literature.

Paul is an award-winning author, researcher, and peer reviewer who knows his stuff and knows how our clients use the content that our software enriches. During his presentation, he got specific about how people in his field use information and how researchers use collections.

Today wrapped up the case studies. Over the next two days there will be specific hands-on training, networking, and learning opportunities for the clients. Everyone seems to be eager to get their hands “dirty” and the Access Innovations staff are here and available to answer any questions.

Don’t forget to like our Facebook page to keep up with the latest news and information.

Melody K. Smith

Sponsored by Access Innovations, the world leader in taxonomies, metadata, and semantic enrichment to make your content findable.
