It’s hard to believe that the Data Harmony Users Group Meeting starts a week from today. If I thought things were buzzing in the office a couple of weeks ago, I didn’t know what was in store for me. And since there’s so much to do, it’s definitely at the front of my mind.
The last time I wrote about DHUG 2015, I decided to focus on a few of the featured talks that really stuck out to me as particularly suited to my interests. However, the week is filled with promising and intriguing presentations, from both our clients and on the home front.
It starts from the very top with our own Margie Hlava, who kicks off DHUG 2015 with “Taxonomy 101: Fundamentals, Construction, and Application,” where she’s going to start at the beginning and walk the attendees through the whole reasoning behind taxonomies and how they can effectively be used. As a taxonomist who’s still fairly wet behind the ears, this is the kind of thing that can make a big difference for somebody like me.
With my knowledge base hopefully somewhat more beefed up, I’ll be able to hit the rest of the week running. Last time, I wrote about Helen Atkins from the Public Library of Science (PLOS), who will discuss their Fate Predictor Project. Well, we have another speaker from PLOS, as well: Jonas Dupuich. His case study, “Using MAI and the PLOS Thesaurus for Matching Activities,” will look at how they have leveraged Data Harmony’s semantic enrichment capabilities to match authors with peer reviewers based on subject matter. This speeds up the peer review process, but it also has clear relevance outside of that process.
An employer, using Data Harmony in this manner, could collect information on the skill sets of their employees (hopefully, only for good). Suddenly a strange new project comes up and the employer has to assemble teams of people with very specific skills. No more, “Hey, anybody know somebody who knows how to write technical documentation while playing water polo?” That information is right there to sift through. It not only makes searching for people faster and easier, it allows connections to be made that might otherwise get missed.
Next, we’re getting to the heart of what we do at Access Innovations with Audrey Glowacki of the National Institute for Occupational Safety and Health (NIOSH). Her talk, “Development and Implementation of the Site Browser: Faceted Navigation Tool for Browsing NIOSH Mining Web Site Content,” gets down to the nuts and bolts. NIOSH has been a Data Harmony user for a long time. We first built a custom mining safety and health thesaurus. Next, a custom web content management system (WCMS) was developed that allows users to build custom web pages. Their feedback has been very positive, and I look forward to hearing more about how they are using the software that I myself use.
Finally, we recently got a surprise guest speaker who looks to be giving a pretty interesting talk. He’s Paul G. Kotula, an award-winning author, researcher, and peer reviewer who works in the Materials Characterization department at Sandia National Laboratories. A Google Scholar search for his name reveals over 1600 results as an author, co-author, and in citations. He knows his stuff and he knows how our clients use the content that our software enriches.
His talk, “Six Months of Work in the Lab Will Save You Half a Day in the Library or 30 Minutes Online,” will explain some of this, which should prove useful to the DHUG audience, as well as us here at Access Innovations. He’s addressing a few specific points about how people in his field use information. How do researchers use collections? What do authors of scientific papers think about the publishing and peer review processes? What sort of resources do they use and why do they think they are the best or most reliable?
Answers to these sorts of questions are the type of things that allow us to better serve our clients, or how to better relay the message of how much our software helps researchers, authors, and peer reviewers alike. It’s a real scientist talking about his real needs as a researcher and sharing his firsthand experience from the side of publishing that we often hear too little from: the authors.
With all of these talks, the training that I’m sure will teach me as much as anyone, and of course the catering, it’s going to be a week to remember. This week is going to fly by in anticipation, but I’m sure next week will go by even faster.
I’m sure you’re all just like me and waiting anxiously to hear the results from Punxsutawney, Pennsylvania, whence this very day we will find out from Punxsy Phil whether spring will come early this year or we have to wait six more weeks (pro tip: In the Northern Hemisphere, it’s always going to fall on March 20th or 21st). As ridiculous as the holiday might seem to some of us, though, there are things about groundhogs and Groundhog Day that are pretty interesting.
Photo, Aaron Silvers, http://www.flickr.com/photos/silvers/24543841/ / CC BY-SA 2.0
Firstly, nobody seems capable of agreeing on what the rodent is called. The holiday would suggest that groundhog is the accepted term, but growing up, I always knew them as woodchucks. And there’s the well-known tongue twister (“How much wood would a woodchuck chuck if a woodchuck could chuck wood?”), which lends credence to woodchuck as the accepted term. But depending on where one resides, the critter is also known as land-beaver, land-squirrel, rock chuck, pasture pig, and my personal favorite: whistle-pig. Some also call it a marmot, but that’s really the broader genus (Marmota) to which the groundhog (Latin name: Marmota monax) belongs. All groundhogs are marmots, but not all marmots are groundhogs, which is plain old Taxonomy 101.
While there are plenty of names for the animal writ large, there are also more celebrity groundhogs than you may be aware of; although Punxsy Phil is the most prominent, plenty of states have them. Georgia boasts General Beauregard Lee; Ohio, Buckeye Chuck; North Carolina celebrates Groundhog Day with Sir Walter Wally; and Alabama holds Smith Lake Jake to be the true authority on winter’s end. Montana has three: Warren Whitefish, Dayton Dennis, and Moose City Moses. Wiarton, Ontario has a whole festival surrounding the albino groundhog Wiarton Willie, which even features a hockey tournament.
There’s even a song about it, “Oh, Murmeltier” (sung to the tune of “Oh, Tannenbaum”) for which professor and marmot scholar K.B. Armitage of the University of Kansas has written English lyrics:
“Oh Whistlepig, oh Whistlepig,
We celebrate your famous day.
Oh Whistlepig, to you we pray
That winter soon will go away.
We like the sun and daffodils.
We’ve had too much of winter’s chills.
Oh, marmot friend, we’re warning you,
If winter stays, you’ll be rockchuck stew!”
…which is just plain weird.
Then, we have “Groundhog Day,” one of the most enduring comedy films of recent decades. In it, a meteorologist named Phil Connors (played by Bill Murray) travels to Punxsutawney to cover the Groundhog Day event. While there, he gets stuck in a recursive feedback loop, in which February 2nd is replayed over and over, while he tries to break the loop and move on to February 3rd (and get the heck out of Punxsutawney).
All comedy hijinks aside, movies are ripe for classification. Genres, while easily arguable, are the broadest way by which we classify them. In the case of “Groundhog Day,” it’s a comedy, but we also have drama, horror, etc. Sometimes, such as in this case, the classification is fairly obvious, but some films rightly belong to multiple genres, such as horror-comedies, or dramedies (a term that I personally despise, but it’s out there in common use).
Then, for some movies, we sub-classify by the film’s content or style. Film noir, for instance, isn’t a genre of its own; they’re dramas, but they’re particular kinds of dramas with a specific tone and stylistic touch. If somebody wants to watch something of that nature, it’s much smarter to search for “film noir” than to try wading through the thousands of “dramas” that have been released in the century-plus of cinema—and would thereby be returned in an online search.
But we classify movies in ways other than genre, as well. The MPAA rating system is designed to tell consumers whether the movie is suitable for their age group or comfort level. Sometimes, we classify by their overarching plot, such as the biopic, the road movie, or the coming-of-age film, independent of genre. One can classify them by country of origin, or level of the movie’s budget, or really any way at all.
But let’s go back to “Groundhog Day” and the recursive feedback loop in which the main character gets stuck. It’s funny when it happens to Bill Murray, but it can be devastating to a taxonomy. Say, for instance, you have a taxonomy with a top term of Business. A sensible narrower term under this could be Risk. That could be used for any number of kinds of risk, but in this case, the taxonomist adds a narrower term of Risk Management. Under that, one could place Insurance, which easily falls under Risk Management. So far, everything looks just right.
Then, somebody comes along to screw around with the taxonomy, and looks at Insurance without looking at the broader terms first. If one wasn’t paying attention, it’s easy to argue that Risk Management could go under Insurance, and of course a primary topic of Risk Management is Risk.
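When that happens, you get something like this: Business > Risk > Risk Management > Insurance > Risk Management > Insurance, and so on without end.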
Recursions of this kind are the taxonomic equivalent of what happens in “Groundhog Day,” and it’s not good, or even funny. You’ll go on forever in this loop, getting nowhere and draining system resources at an increasing pace.
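For the curious, here is a minimal sketch of how a loop like this could be caught before it does any damage. It assumes the taxonomy is held as a simple map from each term to its narrower terms; the `narrower` dictionary and the `find_cycle` function are names invented for illustration, and nothing here reflects how Data Harmony itself stores or checks a thesaurus.

```python
# Hypothetical narrower-term map; the term names come from the example above.
narrower = {
    "Business": ["Risk"],
    "Risk": ["Risk Management"],
    "Risk Management": ["Insurance"],
    "Insurance": ["Risk Management"],  # the careless edit that closes the loop
}


def find_cycle(narrower, top_term):
    """Return the first looping path found below top_term, or None."""
    path = []        # terms on the branch currently being walked, in order
    on_path = set()  # the same terms, for fast membership checks
    done = set()     # terms whose whole subtree is known to be loop-free

    def visit(term):
        if term in on_path:
            # We have come back to a term that is already above us.
            return path[path.index(term):] + [term]
        if term in done:
            return None
        path.append(term)
        on_path.add(term)
        for child in narrower.get(term, []):
            cycle = visit(child)
            if cycle:
                return cycle
        path.pop()
        on_path.remove(term)
        done.add(term)
        return None

    return visit(top_term)


cycle = find_cycle(narrower, "Business")
print(" > ".join(cycle) if cycle else "no recursion found")
```

The walk is an ordinary depth-first search that remembers which terms sit on the current branch. The moment a narrower term turns out to be one of its own ancestors, the search stops and reports the offending path instead of looping forever.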
So today, we can all have a laugh at a movie, watch some hockey, and gather around to see a groundhog (or whatever you want to call it) leave its burrow, all because of Groundhog Day. But stay warm, because (spoiler alert) there is absolutely six more weeks of winter to come.
Access Innovations, Inc. is pleased to announce an exciting change to its corporate structure, a change designed to increase revenue and maximize the considerable talent of its staff.
Bob Kasenchak has now shifted into the role of head of Product Development. He started at Access Innovations in 2011 and succeeded so thoroughly in shepherding projects from initial lead to completion, as well as building a presence in the marketplace, that the company decided to leverage his talents to help develop new product offerings (such as the forthcoming Ontology Master). He will also be helping deliver exciting new projects to the company. Margie Hlava, President of Access Innovations, stated, “Having worked in production for the last two years, Bob is uniquely suited to take on product development on the cutting edges of information, including ontology implementation, linked data, text mining, and text analytics, which build very effectively on thesauri and taxonomies we have so widely implemented as a firm.”
Bob remarks, “This is the best fit for my combination of skills, and I look forward to working on projects with clients and within the company. I am especially looking forward to projects that will make information more easily available and expose it to its full potential through linking, mining, and stronger search leveraging of the actual content for a better understanding of that content and to support management decisions with content-based facts.”
Bob started as a project manager at Access Innovations, providing oversight and support of editorial projects at the company. The projects that he led involved thesaurus creation and development, as well as the development of indexing rule bases that were associated with those thesauri. He handled a wide range of customer specifications and communications. Bob has led taxonomy development and other projects for JSTOR, McGraw-Hill, Wolters Kluwer, the American Society of Civil Engineers (ASCE), Engineering Research Education (ERE), the American Association for the Advancement of Science (AAAS), and the U.S. Mine Safety and Health Administration (MSHA).
Bob attended St. John’s College, the New England Conservatory of Music, and the University of Texas at Austin, completing his master’s degree in theoretical studies and doctoral work in music theory. He lists his interests as tea, music, design, philosophy, and literature. He is married with one cat.
About Access Innovations, Inc. – www.accessinn.com, www.dataharmony.com, www.taxodiary.com
Founded in 1978, Access Innovations has extensive experience with Internet technology applications, master data management, database creation, thesaurus/taxonomy creation, and semantic integration. Access Innovations’ Data Harmony software includes machine aided indexing, thesaurus management, an XML Intranet System (XIS), and metadata extraction for content creation developed to meet production environment needs. Data Harmony is used by publishers, governments, and corporate clients throughout the world.
The biggest week of the year at Access Innovations is almost upon us. Every year, we present the Data Harmony Users Group (DHUG) meeting, where our esteemed clients come from all over the nation to meet and learn from the people who built the software and use it on a daily basis. Right about now, there starts to be a lot of buzz around the office. There are a lot of people coming to Albuquerque for this, and everyone here is pretty excited to swap ideas with them, because they’ve come up with some interesting uses of our software, things that have made us better in the process.
Now, I haven’t been in the taxonomy game as long as most of the people here, so like its attendees, DHUG meetings are brand new to me. I don’t know exactly what to expect, but there are some things that I’m definitely looking forward to seeing. The workshops that we’re conducting for the attendees will certainly be interesting and informative for a newbie like me, but the people I’m most anticipating are our guest speakers. These are people with different perspectives who are removed from the office echo chamber, which helps breathe fresh life into taxonomies.
This year, we have great guests who are gracious enough with their time to discuss their experiences with Data Harmony software and how they use it within their organizations. Its applications are broad, and each case study is unique, so what sorts of things am I going to hear about?
Kicking off these case studies are Sharon Garewal and Ron Snyder from JSTOR, one of the largest and most respected shared digital libraries in the world. We’ve done a lot of work with them and, this year, they’re launching the JSTOR Sustainability Collection and discussing it at DHUG 2015.
This interdisciplinary collection is composed of journals, reports, and working papers from the realms of academic publishing, scholarly societies, industry groups, research institutes, and universities to look at how the environment, human activities, and industry can be made sustainable in the long term. This has become an increasingly important issue, and they will discuss how the JSTOR Thesaurus, which was built using Data Harmony, makes crossing through the many fields of study a fairly straightforward affair.
One of the really interesting things about taxonomies and modern data analytics is how indexing can be leveraged to see information that would have taken a mountain of time and effort to figure out before. That’s precisely what Helen Atkins of the Public Library of Science (PLOS) will discuss with attendees in her talk, “The Fate Predictor Project.”
PLOS ONE, their international, open-access, online journal, has semantically enriched their content recently. Using the metadata that got extracted, they were able to see statistics about acceptance and rejection of papers. Using this data, along with data about country of origin, author, number of authors, etc., they are able to predict with accuracy whether a currently submitted paper will get accepted or rejected. That doesn’t take away the need for peer review, but knowing what kinds of things flag often for rejection will be able to save the PLOS editors huge amounts of time.
This is just one example of how sophisticated data usage can open eyes to otherwise unseen patterns. Marketing companies use it to see buyer patterns, leading to all those advertisements directed to individuals. This is how the Internet of Things will work, so that your refrigerator knows what resides inside and for how long, and can recommend recipes, keep your shopping list, and tell you when your milk has gone sour. Maybe its biggest current application is in security, where it’s being used in myriad overt and covert ways. This is right in line with the kind of semantic enrichment that Access Innovations does.
The talk that I’m most interested in will come from Kevin Ford of MarkLogic. His presentation, “Implementation of Taxonomy Triples from Data Harmony Exports,” will explore how companies can convey more accurate information, make data-driven decisions, and reduce risk by taking content from documents and data and combining them with RDF triples into a single architecture. By enabling search across different kinds of information from many sources, this kind of architecture can help users glean greater insights, and will help customer bases quickly and accurately mine knowledge from the data.
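To make the idea of taxonomy triples a little more concrete, here is a rough sketch in plain Python. It is not the Data Harmony export format or MarkLogic’s API, neither of which is described here. It simply assumes that each statement is a (subject, predicate, object) tuple, borrows the predicate names `skos:broader` and `dc:subject` from the SKOS and Dublin Core vocabularies, and uses term and document identifiers that are invented for illustration.

```python
# Taxonomy and tagging statements as (subject, predicate, object) triples.
# Predicate names follow SKOS and Dublin Core; all identifiers are invented.
triples = [
    ("term:Insurance", "skos:broader", "term:RiskManagement"),
    ("term:RiskManagement", "skos:broader", "term:Risk"),
    ("term:Risk", "skos:broader", "term:Business"),
    ("doc:1", "dc:subject", "term:Insurance"),
    ("doc:2", "dc:subject", "term:Risk"),
    ("doc:3", "dc:subject", "term:Business"),
]


def narrower_closure(triples, term):
    """Return term plus every term below it, following skos:broader links."""
    found = {term}
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for subject, predicate, obj in triples:
            if predicate == "skos:broader" and obj == current and subject not in found:
                found.add(subject)
                frontier.append(subject)
    return found


def documents_about(triples, term):
    """Return documents tagged with term or with any of its narrower terms."""
    terms = narrower_closure(triples, term)
    return sorted(
        subject
        for subject, predicate, obj in triples
        if predicate == "dc:subject" and obj in terms
    )


print(documents_about(triples, "term:Risk"))  # ['doc:1', 'doc:2']
```

Because the hierarchy and the document tags live in the same pile of statements, a search for Risk can pick up the document indexed only under Insurance, which is the kind of cross-source insight the paragraph above describes.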
Ontologies are taking an increasingly prominent place in the world of semantics, and many believe that their use will take a big step toward genuine artificial intelligence. How far off that might be is certainly up in the air, but it’s presentations like this one that will start to reveal how it might work, if not when it might work.
These aren’t the only presentations at DHUG 2015. There will be more case studies from our users, as well as panels by the highly knowledgeable staff from Access Innovations. Those, in conjunction with meeting new people over great food and conversation, are going to make February 16-20 a pretty great week.
Later this week is January 18th, which for taxonomists is notable for two things: 1) it’s Thesaurus Day; and 2) it’s the birthday of Peter Mark Roget. This double occurrence is no coincidence. We may consider Doctor Roget to be the inventor of the thesaurus (or at least one of its pioneers), and a person whose birthday is cause for taxonomists’ celebration.
Yes, this is the man who compiled the first “Thesaurus of English Words and Phrases.” He started writing it in 1805 but didn’t have it published until much later, in 1852. The full title of the first edition was Thesaurus of English Words and Phrases, Classified and Arranged so as to Facilitate the Expression of Ideas.
Did you catch the “Classified” part of the title? And the “Arranged”?
Most people think of Roget’s thesaurus as a simple list of words and their synonyms. This is understandable, as some of the more recent synonymies that include “thesaurus” in their titles really are just strictly alphabetical lists of words, annotated with some synonyms. Taxonomists sometimes consider Roget’s synonym resource to be much different than modern taxonomic thesauri. After all, hasn’t it always lacked any sort of classification scheme?
No, no, no.
As much of a habitual list maker as Roget was (since he was eight years old, in fact), he recognized that the full potential of a lengthy vocabulary could not be achieved unless there was some sort of categorization or classification of the list entries. Classification was an intrinsic part of Roget’s compilation of synonyms throughout its long development.
As he explained in the preface to the first edition of the Thesaurus of English Words and Phrases:
“It is now nearly fifty years since I first projected a system of verbal classification similar to that on which the present work is founded. Conceiving that such a compilation might help to supply my own deficiencies [as a writer], I had, in the year 1805, completed a classed catalogue of words on a small scale, but on the same principle, and nearly in the same form, as the Thesaurus now published. I had often during that long interval found this little collection, scanty and imperfect though it was, of much use to me in literary composition, and often contemplated its extension and improvement; but a sense of the magnitude of the task, amidst a multitude of other avocations, deterred me from the attempt. Since my retirement from the duties of Secretary to the Royal Society, however, finding myself possessed of more leisure, and believing that a repertory of which I had myself experienced the advantage might, when amplified, prove useful to others, I resolved to embark in an undertaking which, for the last three or four years, has given me incessant occupation.” (“Roget’s Thesaurus: The Original Manuscript”)
Part of Roget’s classification efforts involved choosing a single term to represent each concept, rather than repeating each synonym in some other part of the list. This is akin to modern taxonomic thesauri, in which each concept is represented by only one term, and alternative ways of expressing that concept are indicated in the term record as non-preferred terms. Roget’s approach was oriented toward findability of a concept through the choice of words that users were most likely to associate with particular concepts.
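As a small illustration of that one-term-per-concept principle, here is a sketch of how a term record with non-preferred terms might be used to route a searcher to the preferred term. The record layout, the sample terms, and the function names are all assumptions made for this example, not the structure of any particular thesaurus tool.

```python
# Hypothetical term records: one preferred term per concept, with its
# non-preferred (entry) terms listed on the record.
term_records = {
    "Automobiles": ["Cars", "Motor cars", "Autos"],
    "Eyeglasses": ["Spectacles", "Glasses"],
}


def build_entry_index(term_records):
    """Map every preferred and non-preferred term to its preferred term."""
    index = {}
    for preferred, non_preferred in term_records.items():
        index[preferred.lower()] = preferred
        for variant in non_preferred:
            index[variant.lower()] = preferred
    return index


def preferred_term(index, word):
    """Return the preferred term for word, or None if the word is unknown."""
    return index.get(word.strip().lower())


entry_index = build_entry_index(term_records)
print(preferred_term(entry_index, "spectacles"))  # Eyeglasses
```

Each concept is stored once, under its preferred term, and every synonym simply points back to it, so a user who thinks in terms of “spectacles” still lands on the same record as one who thinks “eyeglasses”.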
Beyond that, though, the overall structure of the thesaurus was hierarchical. The table of contents of Project Gutenberg’s presentation of Roget’s thesaurus shows the organization of the book into six main classes, with numerous subdivisions. Wikipedia provides an “Outline of Roget’s Thesaurus” that shows the hierarchical depth to seven levels; this resource also includes links from many of the categories to relevant Wikipedia articles, as does the related Wiktionary resource “Appendix: Roget’s thesaurus classification”.
Roget crafted the thesaurus categories and subdivisions according to principles set out by some eminent philosophers, as explained in the Wikipedia article on “Roget’s Thesaurus”:
“Each class is composed of multiple divisions and then sections. This may be conceptualized as a tree containing over a thousand branches for individual “meaning clusters” or semantically linked words. These words are not exactly synonyms, but can be viewed as colours or connotations of a meaning or as a spectrum of a concept. One of the most general words is chosen to typify the spectrum as its headword, which labels the whole group.
“Roget’s schema of classes and their subdivisions is based on the philosophical work of Leibniz (see Leibniz—Symbolic thought), itself following a long tradition of epistemological work starting with Aristotle. Some of Aristotle’s Categories are included in Roget’s first class “abstract relations”.”
So was Roget an inventor? An originator? A pioneer? Consider these eclectic accomplishments:
- He invented the log-log slide rule, which greatly simplified exponential and root calculations.
- He designed a pocket chessboard and invented several chess problems.
- He made insightful observations about the perception of motion, thus contributing to the development of mechanical animation devices and, more importantly, to the early development of cinema.
- He helped found the wonderfully named Society for the Diffusion of Useful Knowledge.
- He was a co-founder of the Medical and Chirurgical Society of London, the forerunner of the Royal Society of Medicine.
- He was the first Fullerian Professor of Physiology at the Royal Institution.
- He helped establish the University of London.
- He compiled Roget’s Thesaurus, which writers still use to perfect their prose.
- He developed a classification approach that set an example for modern taxonomists and thesaurians.
Yes, I think we can conclude that Peter Mark Roget was an inventor, an originator, and a pioneer. And a thesaurian, of course. And yes, a taxonomist.
All good reason to celebrate his birthday on Thesaurus Day!
Barbara Gilles, Taxonomist
Access Innovations, Inc.
Access Innovations, Inc. is proud to announce the publication of The Taxobook, a three-volume series on taxonomies and thesauri, written by Marjorie M.K. Hlava, president of Access Innovations. The three volumes are part of a larger series, Synthesis Lectures on Information Concepts, Retrieval, and Services, edited by Gary Marchionini, Dean of the School of Information and Library Science, University of North Carolina at Chapel Hill, and published by Morgan & Claypool Publishers.
Volume 1, The Taxobook: History, Theories, and Concepts of Knowledge Organization, introduces the foundations of classification, covering theories from the ancient Greek philosophers to modern thinkers. This volume also includes a glossary that covers all three volumes.
Volume 2, The Taxobook: Principles and Practices of Taxonomy Construction, outlines the basic principles of creation and maintenance of taxonomies and thesauri. It also provides step-by-step instructions for building a taxonomy or thesaurus and discusses the various ways to get started on a taxonomy construction project.
Volume 3, The Taxobook: Applications, Implementation, and Integration in Search, covers putting taxonomies into use in as many ways as possible to maximize retrieval for users.
“This book has been a labor of love for me,” said Ms. Hlava. “I believe firmly in the value of taxonomies and their place within information systems, and I have wanted to share my thoughts with a larger audience for some time. I hope these books will contribute to a better understanding of the different ways taxonomies can be implemented and why information management professionals should embrace them.”
“Margie Hlava’s lectures on taxonomy pack a lifetime of experience creating vocabularies for corporations and organizations into narrative and case studies that will delight researchers and teachers and inspire students,” remarked Gary Marchionini. “Her love of language and organizational structure comes through in every chapter of the work.”
“It is our pleasure to have Margie Hlava as a Morgan & Claypool author!” commented Diane Cerra of Morgan & Claypool. “She and her Access Innovations team have made a much needed contribution to our publishing program, and to the community at large. These volumes will serve practitioners for many years to come. In addition, Margie and her group are a joy to work with: a personable, responsible, and responsive team that enabled us to quickly produce this collection.”
The books are available through Morgan & Claypool Publishers in either online or print format.
Photo, by Wikimedia user UFA66, of an artwork titled “Hip, Hip, Hurrah!” by Danish painter P.S. Krøyer, 1888, at the Gothenburg Museum of Art in Sweden. CC BY-SA 3.0. http://commons.wikimedia.org/wiki/File:Hipp_hipp_hurra!_Konstn%C3%A4rsfest_p%C3%A5_Skagen_-_Peder_Severin_Kr%C3%B8yer.jpg
As we anticipate the approaching new Gregorian year, those of us who are taxonomists are looking forward with renewed anticipation to the taxonomic challenges that certain kinds of words bring. Take “glass”, for example.
“Glass” is one of those words that contain an abundance of possible meanings. Ironically, this poses the potential for ambiguity. What makes this particular situation even more ironic is that this ambiguity clouds the very clarity that the word often symbolizes. Ambiguous words are tricky to work with in constructing and developing taxonomies and thesauri. Moreover, they make the writing of effective indexing rules challenging. Taking care in the crafting of those rules becomes all the more important, because of the need for disambiguation.
Another basic challenge is posed by words representing concepts that fit (sometimes neatly, and sometimes, not so neatly) into a variety of categories, and that can be subdivided (again, sometimes neatly, and sometimes, not so neatly) into a variety of sub-categories or sub-concepts. Glass is one of those words, too.
When we raise a glass, or see one as half full or half empty (your choice), that glass is a drinking vessel. A looking glass is (or at least used to be) a mirror. When we venture into the plural form, “glasses” are often understood to be eyeglasses. And search engines have their own opinions, it seems; when I googled “glass”, the top hit offered a very different interpretation, reflecting a certain bias on the part of a well-known search engine. Glass also has its artistic side, especially with stained glass. Getting down to basics, and to what many of the various meanings have in common, glass is a kind of material.
A glass – a drinking vessel
Photo by Derek Jensen, http://commons.wikimedia.org/wiki/File:Glass-of-water.jpg
Looking glass – a reflection on ourselves
Illustration by John Tenniel, for Lewis Carroll’s Through the Looking-Glass, http://commons.wikimedia.org/wiki/File:Aliceroom3.jpg
Magnifying glass – the bigger to see with
Glasses/eyeglasses/spectacles (Spectacles? There’s some big ambiguity there.) – the better to see with
Google Glass – for when what you normally see (and hear) isn’t enough by itself
Photo by Dan Leveille, http://commons.wikimedia.org/wiki/File:Google_Glass_photo.JPG, CC BY-SA 3.0
Stained glass – For when stained means pretty
Photo by Kate Jewell, of a stained glass window designed by Ronald Whiting and constructed by Chapel Studios of Kings Lynn, http://www.geograph.org.uk/photo/1461459, CC BY-SA 2.0
Glass – a kind of material (getting down to basics)
But what kind of material? Glass as a material presents several categorization challenges. For one thing, artists, art collectors, historians, engineers, and materials scientists deal with a large variety of glass materials. What most of us think of simply as glass, scientists classify as soda-lime glass. The Wikipedia category page on glass types lists forty-two separate articles, and it is by no means exhaustive.
And then there is the matter of state of matter. Is glass a liquid or a solid (presumably at room temperature)? The consensus has gone back and forth. For a long time (before the complications of solid-state physics), it was assumed to be a solid. Then some scientists speculated that it was a supercooled liquid. Evidence was supposedly provided by antique windowpanes, many of which were thicker at the bottom; this supposedly proved that the glass was flowing downward, ever so slowly. This persistent urban myth was solidly debunked only twenty-five years ago, in an article in the Journal of Chemical Education, although it still pops up in textbooks and science classes. Later, in 2000, the same journal published an article with the wonderful title “Glass Doesn’t Flow and Doesn’t Crystallize and It Isn’t a Liquid.”
Well, if glass doesn’t crystallize, it isn’t a solid, either, at least not in the traditional solid-state physics sense. So what is it?
Here’s one answer (of sorts) that I found on a University of California, Riverside webpage:
“There is no clear answer to the question “Is glass solid or liquid?” In terms of molecular dynamics and thermodynamics it is possible to justify various different views that it is a highly viscous liquid, an amorphous solid, or simply that glass is another state of matter that is neither liquid nor solid. The difference is semantic. In terms of its material properties we can do little better. There is no clear definition of the distinction between solids and highly viscous liquids. All such phases or states of matter are idealisations of real material properties. Nevertheless, from a more common sense point of view, glass should be considered a solid since it is rigid according to everyday experience.”
Well, that was kind of iffy. But then I found that a team of scientists recently tried to settle the question by performing a series of experiments and measurements on a chunk of amber (yes, amber turns out to be a kind of glass) that was twenty million years old.
Photo of Dominican amber by Brocken Inaglory, http://commons.wikimedia.org/wiki/File:Ant_in_amber1.jpg, CC BY-SA 3.0
Their report, at http://www.nature.com/ncomms/journal/v4/n4/full/ncomms2809.html, has been summarized in an io9 blog post by George Dvorsky as follows:
“The team performed a series of calorimetric and stress relaxation experiments on the Dominican amber. They measured its relaxation times (intermolecular rearrangements) at various temperatures, including above its fictive temperature. The team observed that the amber relaxation times did not diverge — meaning that it couldn’t possibly be a kind of fluid.”
Okay, so it’s a solid, I thought. But then, I came across a fairly recent news article with this comment:
“The solids catalog used to be pretty straightforward. Solid stuff was either a crystal or a glass. Crystals fill up space with atoms or molecules in specific, fairly rigid patterns. The positions of the atoms are fixed such that if you take any section of pure crystal and slide it up, down, in, out or sideways a given distance, it will fit perfectly in the new position. That’s translational symmetry. You can also spin the crystal through certain angles and the atoms also will line up; that’s rotational symmetry.
“Glasses have neither symmetry. They’re just a random arrangement of their components, as if you’d taken a liquid and suddenly frozen everything in place without giving the atoms a chance to get in order. Which, in fact, is how metallic glasses are made.”
Okay, so glass isn’t a liquid, and it isn’t a crystalline solid. So what is it, really?
The modern scientific consensus seems to be that glass is a special kind of non-crystalline solid. As explained in Wikipedia’s “Amorphous solid” article:
“In condensed matter physics and materials science, an amorphous (from the Greek a, without, morphé, shape, form) or non-crystalline solid is a solid that lacks the long-range order characteristic of a crystal. In some older books, the term has been used synonymously with glass. Nowadays, “amorphous solid” is considered to be the overarching concept, and glass the more special case: A glass is an amorphous solid that exhibits a glass transition. Polymers are often amorphous. Other types of amorphous solids include gels, thin films, and nanostructured materials.”
So non-crystallinity is a defining characteristic of glass. Here we encounter another irony, and another potential source of ambiguity. Fine drinking glasses used to be made of “lead crystal”, but paradoxically, lead crystal is actually a kind of glass, and not one bit crystalline. Neither are its modern replacements, known as crystal glass or lead-free crystal, which substitute a safer oxide for the lead oxide. So when you’re raising a fancy glass, be assured that it’s really a glass (unless it’s plastic or some such).
Let’s raise that glass to the wonderful richness and complexity of words! Cheers!
Barbara Gilles, Taxonomist
The bell struck twelve.
The Phantom slowly, gravely, silently, approached. … It was shrouded in a deep black garment, which concealed its head, its face, its form, and left nothing of it visible save one outstretched hand. But for this it would have been difficult to detach its figure from the night, and separate it from the darkness by which it was surrounded.
(From A Christmas Carol in Prose; Being a Ghost Story of Christmas, by Charles Dickens.)
And so it is with emerging concepts, those concepts whose forms we can but vaguely discern at the present point in time, whose true reality lurks in the future.
As taxonomists, we have a responsibility to discern those future concepts, although they may still be invisible to most. When expressions of those concepts turn up in search logs, we can keep them from being rejected as vocabulary candidates simply because they still appear infrequently. In a taxonomy or thesaurus, we can provide labels that will consolidate the indexing for a concept for which researchers have not yet settled on a name. In some cases, especially with widely used vocabularies, we can perhaps determine the name by which a concept will be known on a standard basis.
This role in itself is one of the emerging responsibilities for taxonomists, thanks to the rapid advances in science and technology. In “What Next, Taxonomy?” (posted on The Taxonomy Blog on November 4, 2011), taxonomist Marlene Rockmore concludes that taxonomists need to deal with emerging technologies in a variety of ways, including collection of relevant content:
“So what next, taxonomy? What is nice to hear is that more taxonomists are surviving because their organizations understand their core roles. What’s the emerging topics and challenges – how to distribute and decentralize (localize) while having authority and control, how to collect new content on emerging, current topics, visualization, how to be more agile, how to fit in with new technologies like social media, mobile, and big data. Phew! That’s a challenge. Taxonomists have a chance to build relationships not only between terms, but with stakeholders on the way to a compelling, visualized, multidimensional content strategy. Good luck.”
This challenge has been growing in step with the rapid advances in science and technology. One example among the many advances in science is the ability of biologists to recognize new and emerging species, as well as life forms that have existed for a while but were formerly overlooked. The Live Science page Newfound Species observes:
“Science has identified some 2 million species of plants, animals and microbes on Earth, but scientists estimated there are millions more left to discover, and new species are constantly discovered and described. The most commonly discovered new species are typically insects, a type of animal with a high degree of biodiversity. Newly discovered mammal species are rare, but they do occur, typically in remote places that haven’t been well-studied previously. Some animals are found to be new species only when scientists peer at their genetic code, because they look outwardly similar to another species — these are called cryptic species. Some newfound species come from museum collections that haven’t been previously combed through and, of course, from fossils.”
Even the humble hosta has its own emergings, due in part to technological and social advances in communication.
A Rookie’s Guide to Hostas, Hostas, Hostas observes:
“In past centuries, we used to talk about people “discovering” new species of plants. What this usually meant was that European, English or American plant explorers traveled to remote parts of the world and found plants that were new to them. Now, of course, we know that local people in those other parts of the world were often quite familiar with these plants all along. Many of the so-called new plants, including hostas, have been found in local paintings and documents produced long before the Westerners started poking around. In more recent times, however, with better communications, we more universally share the knowledge of different horticultural communities.”
As far as actually emerging species are concerned, evolutionary biologist Rob DeSalle of the American Museum of Natural History has indicated the continuing nature of species emergence:
“Identifying a new species as it emerges is the holy grail of evolutionary biology. … Species must be emerging someplace on earth. The best places to look would be places with lots of species, like rain forests, and islands, because isolation opens new niches.” (In “Q & A; Emerging Species” by C. Claiborne Ray, published June 17, 2003 in The New York Times)
The ScienceDaily website has a webpage dedicated to news about “new” species of plants and animals. While most of these will escape public awareness, Time Magazine has sifted through the barrage of information to identify the “Top 10 New Species” of 2014. According to author Bryan Walsh, “The collection includes a dragon tree, a skeleton shrimp, a gecko and a microbe that likes to hang out in the clean rooms where spacecraft are assembled.”
Speaking of top things of 2014, and moving on to emerging technologies, the Massachusetts Institute of Technology’s online Technology Review has published a list of “10 Breakthrough Technologies 2014.” The list includes such things as brain mapping, genome editing, and agile robots.
The Wikipedia article “Emerging technologies” emphasizes the role of technology convergence in the emergence of new technologies. The article mentions an acronym of particular interest to those in the information technology world:
“NBIC, an acronym for Nanotechnology, Biotechnology, Information technology and Cognitive science, is currently the most popular term for emerging and converging technologies, and was introduced into public discourse through the publication of Converging Technologies for Improving Human Performance, a report sponsored in part by the U.S. National Science Foundation.”
Wikipedia also has a “List of emerging technologies” containing brief descriptions of “some of the most prominent ongoing developments, advances, and innovations in various fields of modern technology.” More than two hundred emerging technologies are listed.
There are and will continue to be many new and emerging concepts in science, technology, and other fields. Taxonomies can help define the terminology for those concepts. This is perhaps most readily evident for genus-species-subspecies-etc. names, whose designation is the territory of the biological taxonomist, or the biologist temporarily acting as taxonomist. Elsewhere, taxonomists can identify predominant labels and the occasionally used synonyms, and then use that information to add appropriate preferred terms and non-preferred synonyms to a vocabulary. They can also add definitions and scope notes. The skills of the taxonomist can bring clarity to formerly mysterious concepts and nomenclature.
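For readers who like to see the idea in concrete form, here is a minimal sketch of how a preferred term, its non-preferred synonyms, and a scope note might sit together in one record. The `TermRecord` structure, the `build_lookup` helper, the synonyms, and the scope note wording are all assumptions made up for illustration; they are not the data model of any particular thesaurus tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TermRecord:
    """One concept in a vocabulary (hypothetical structure, for illustration only)."""
    preferred: str                                           # the predominant label
    non_preferred: List[str] = field(default_factory=list)   # occasionally used synonyms
    scope_note: str = ""                                     # definition or usage guidance


def build_lookup(records: List[TermRecord]) -> Dict[str, str]:
    """Map every label, preferred or not, to its preferred term (case-insensitive)."""
    lookup = {}
    for record in records:
        for label in [record.preferred, *record.non_preferred]:
            lookup[label.lower()] = record.preferred
    return lookup


# Illustrative record; the synonyms and scope note are assumptions.
records = [
    TermRecord(
        preferred="Genome editing",
        non_preferred=["Gene editing", "Genome engineering"],
        scope_note="Targeted modification of DNA in a living organism.",
    ),
]
lookup = build_lookup(records)
print(lookup["gene editing"])
```

Whichever label a researcher happens to use, the lookup resolves it to the same preferred term, which is what consolidates the indexing for a concept whose name has not yet settled.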
No fog, no mist; clear, bright, jovial, stirring, cold; cold, piping for the blood to dance to; Golden sunlight; Heavenly sky; sweet fresh air; merry bells. Oh, glorious! Glorious!
So don’t be scared of the ghosts of future concepts. Think of them as true spirits of the future, taking flight with the benefit of well-chosen terms and synonyms in a taxonomy or thesaurus.
Every time a new term rings true, an emerging concept gets its wings.
Barbara Gilles, Taxonomist
Originally posted December 30, 2013. Updated for 2014.
A typical thesaurus construction project for a scholarly publisher, policy clearinghouse, medical institution, or any other client with a technical vocabulary involves the input of a number of stakeholders.
At a certain point—usually about three-quarters of the way through the construction of the taxonomy—it’s vital to get the input of subject matter experts (or: subject-matter experts, SMEs, or domain experts). These SMEs generally work for the client—often as technical editors, but just as frequently in other capacities—although we at Access Innovations are occasionally asked to provide them.
Figure 1: An SME in his native habitat
In either case, the SME’s job is to review the emergent thesaurus from the perspective of an expert in the field. Even the best taxonomists can garner a finite amount of knowledge through research, especially in complex, complicated, or highly sophisticated domains; without a PhD in particle physics, your branch on quantum field theory will only be so comprehensive.
The SME is, ideally, well acquainted with the state of research in the field, conversant with the current hot topics, steeped in the journal literature, and familiar with both the stable and fluctuating terminologies used by other practitioners in their discipline.
A vocabulary that’s missing important concepts or terms covering exploding areas of research will not present a good public-facing website for your client’s organization. Imagine browsing a computer science taxonomy that’s missing the term “Big Data” in 2014—would you assume the parent of that vocabulary is competent and up-to-date? Therefore, the SME is vitally important, as the end users who will be searching for content in, for example, a large repository of physics papers will very likely be similar to the SME engaged in the review.
Dealing with SMEs can be an excellent and productive experience; unsurprisingly, this process also involves a number of challenges.
(1) [Most] SMEs are not taxonomists: you have to explain it to them.
Your physicist, oncologist, or social scientist SME (unless they happen also to have attended library school) is probably not intimately familiar with tagging, information retrieval, or taxonomies—at least, not nearly to the extent that they understand their chosen area of study.
Don’t assume that whoever wrangled the SMEs for a call with you explained the project, what’s expected of them, or anything else. Be prepared to give a very short expository talk on why the taxonomy matters, what it will do, and why they’ve been asked to participate.
Figure 2: (Probably?) not a taxonomist
The bright side is that SMEs tend to be intelligent, so they’ll pick it up. Just don’t expect to be able to dive right into the hierarchy without explaining the background of the project, the purpose of their input, and what you expect them to do.
This last point is especially important. Explain clearly what you do and don’t expect the SMEs to provide feedback on. You definitely want:
(a) Input on missing terms/concepts. What needs to be added?
(b) Input on non-preferred terms (NPTs). What other names, especially acronyms, can supplement the existing terms? What is the fancy new name for a term that everyone’s using since Dr. Johnson wrote his famous 2009 paper?
(c) Input on term placement and hierarchical organization. Do the branches make sense? Are the top- and second-level terms a good outline of the field? (More on this below.)
(2) Some SMEs will be more engaged than others.
Almost predictably, some of the SMEs on a given project will take the minimum amount of time to provide feedback (there may even be some initial grumbling), and you’ll never hear from them again. That’s okay; accept the feedback (asking questions if necessary) and let them go back to their job. For many SMEs, their involvement is an extra assignment over and above their normal workload, so it’s understandable.
Invariably, though, you’ll get at least one SME who gets it—who is excited and engaged and thinks the thesaurus is cool and wants to help. This is exactly what you’re looking for, so make sure to match their level of engagement and enthusiasm. Once they get used to thinking about the taxonomy, they’ll be an invaluable resource.
(3) Disagreements between SMEs and taxonomists
See (1), above. SMEs are not used to thinking like taxonomists, so their ideas about term placement, term formation, and warrant are probably not influenced by things like the ISO and ANSI/NISO standards governing thesaurus construction. They will also not be very sensitive to ambiguous terms, and may be familiar only with the portion of relevant content covering their particular sub-area of expertise. You’ll want to watch out for a few specific issues:
(a) Literary warrant. SMEs will want to add terms covering their entire field, not just the terms required to index the content in question. When considering terms suggested by SMEs, remember to check the content for warrant; reject any terms that don’t meet your criteria.
(b) Term placement. SMEs will have ideas that make plenty of sense to them, but violate (for example) the all-some rule. Stand your ground here; no matter how you cut it, “dog food” is not a dog. Be ready to suggest using associative relationships (RTs), and explain why they’re helpful.
(c) Top- and second-level terms structure. This requires a little more flexibility on the part of the taxonomist; while “Particle physics” is clearly a child of “Microphysics” (as it’s a sub-discipline of that field), if your physicist SME insists that it’s a major enough topic to be a Top Term, you should listen.
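The warrant check in (a) can be roughed out in a few lines. This is only a sketch under stated assumptions: the function names, the sample documents, and the `min_docs` threshold of two are all hypothetical, and a real project would set its own warrant criteria and run against the actual content set.

```python
import re


def warrant_counts(candidates, documents):
    """Count how many documents mention each candidate term (case-insensitive, whole phrase)."""
    counts = {}
    for term in candidates:
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        counts[term] = sum(1 for doc in documents if pattern.search(doc))
    return counts


def split_by_warrant(candidates, documents, min_docs=2):
    """Split SME suggestions into those that meet the (assumed) threshold and those that don't."""
    counts = warrant_counts(candidates, documents)
    accepted = [term for term in candidates if counts[term] >= min_docs]
    rejected = [term for term in candidates if counts[term] < min_docs]
    return accepted, rejected


# Made-up sample content standing in for the client's repository.
documents = [
    "Recent results in particle physics constrain the Higgs coupling.",
    "A survey of particle physics detectors and quantum field theory methods.",
    "Quantum field theory on curved backgrounds.",
]
candidates = ["Particle physics", "Quantum field theory", "String cosmology"]

accepted, rejected = split_by_warrant(candidates, documents)
print("Accepted:", accepted)
print("Rejected:", rejected)
```

A count like this is a starting point for the taxonomist's judgment, not a replacement for it; a rejected suggestion may still be worth keeping as an NPT.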
(4) Spec creep: how much time for SME review?
A taxonomy, as we know, is a living document that’s subject to constant revision and review, but at a certain point you have to call it complete and deliver the project. This is where your enthusiastic SME can cause problems; they will want to make tweak after tweak ad infinitum. Set a schedule for SME reviews, including a timetable for providing material, getting feedback, integrating that feedback, and returning the revised taxonomy. One more round of changes is acceptable, but if you allow for more, it’ll never end.
Try to allow for about eight hours to process the feedback from each SME. Each suggestion, addition, term move, and deletion needs to be considered carefully, so make sure to allow your taxonomists time to properly weigh SME input.
(5) How many SMEs do you need?
This really depends on the size—and, moreover, scope—of the vocabulary. A thesaurus covering All of Science will require more reviewers than one on Acoustics. If you have many SMEs, try to keep them from stepping on one another’s toes.
(6) Tips on presenting the taxonomy and soliciting feedback
Ideally, you can provide a hierarchical display (naturally, a read-only version) that the SMEs can access; this allows them to see the entire term record, including non-preferred terms (NPTs), related terms (RTs), and multiple broader terms (BTs).
In conjunction with a hierarchical view (if possible), the best mechanism for SME feedback is [still] a spreadsheet with a hierarchical display of terms. (If you can, provide each SME with just the branches of the hierarchy that they’re being asked to review.) A spreadsheet allows the SME to make comments, changes, suggestions, additions, and other input using colors, adding cells, or leaving remarks in adjacent fields. Make sure that the taxonomist can see the feedback at a glance, so they don’t have to spend time poring over the document looking for comments.
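As a rough illustration of the spreadsheet approach, the sketch below writes one branch of a hierarchy to CSV with one column per level and an empty comment column for the SME. The nested-dict input, the column headings, and the sample branch (including the “Physics” top term) are assumptions for the example, not an export format from any specific tool.

```python
import csv
import io


def hierarchy_rows(tree, depth=0):
    """Yield (depth, term) pairs in display order from a nested dict of terms."""
    for term, children in tree.items():
        yield depth, term
        yield from hierarchy_rows(children, depth + 1)


def write_review_sheet(tree, out):
    """Write an indented hierarchy: one column per level, plus a blank comment column."""
    rows = list(hierarchy_rows(tree))
    levels = max(depth for depth, _ in rows) + 1
    writer = csv.writer(out)
    writer.writerow([f"Level {i + 1}" for i in range(levels)] + ["SME comment"])
    for depth, term in rows:
        cells = [""] * levels
        cells[depth] = term          # indent the term by placing it in its level's column
        writer.writerow(cells + [""])


# Hypothetical branch handed to a single reviewer.
branch = {
    "Physics": {
        "Microphysics": {
            "Particle physics": {},
        },
        "Acoustics": {},
    },
}

buffer = io.StringIO()
write_review_sheet(branch, buffer)
sheet = buffer.getvalue()
print(sheet)
```

Keeping the comment column in a fixed position is one way to let the taxonomist scan the feedback at a glance instead of hunting through colored cells.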
(7) You don’t have to integrate every single comment–you’re the filter.
On receiving SME feedback, the taxonomist’s job is not to make every change suggested by the SMEs; rather, the SME’s input is raw material for the taxonomist to consider using. In other words, the SME’s expert opinion has to be run through the taxonomist’s filter to accept, reject, or re-format for inclusion in the taxonomy.
Figure 3: A grain (or more) of salt
On the other hand, have respect for the SME’s expertise. Be flexible when you can, and try to accommodate the SME’s point of view wherever possible. Oftentimes the SME will, for example, make a suggestion to add a term that already exists phrased another way (a conceptual duplicate); this can trigger the addition of an NPT, changing the preferred version of the term, or some other action—one that was not intended, but nevertheless turned out to be useful.
Figure 4: Best. Present. Ever.
SMEs can be a great gift for any taxonomy project—if you have a strategy, provide a clear set of expectations, and maintain good communication throughout the process.
Bob Kasenchak, Project Coordinator
There are many pieces that need to fit together to implement a semantic strategy. Just as the instruments in an orchestra combine to provide a beautiful sound, a semantic strategy can provide a wonderful platform for your user community to enjoy. To really make it work, however, you need to understand each instrument and how to organize them for the best effect.
Register today and plan to attend the Orchestrating an Effective Semantic Strategy Webinar on December 16 from 11:00 a.m. to 12:00 p.m. (Mountain Time). The webinar will be facilitated by our own Margie Hlava, who will outline the full process from beginning to end, from article creation by the author to the eventual presentation on the distribution platform. If you have been concentrating on semantics (metadata strategy, taxonomy creation, tagging in XML or another format, connecting to a content repository, and integrating display and search on a platform), this is the webinar for you.
Space is limited so register today!
Marjorie M. K. Hlava is a popular speaker on topics involving information organization, semantic enrichment, and taxonomy and thesaurus creation. She is the author of over 200 articles and three books. Parts 1 and 2 of “The Taxobook”, her most recent book, are available now via Morgan & Claypool Publishers. Part 3 is in production now.
Melody K. Smith
Sponsored by Access Innovations, the world leader in indexing and making content findable.