Data Analysis in a Standards-Challenged World
We deal pretty heavily around here in words, what they mean, and how they’re used. It should go without saying, but it’s a fundamental […]
As the new Gregorian year approaches, those of us who are taxonomists look forward with renewed anticipation to the taxonomic challenges that certain kinds of words bring. Take “glass”, for example. “Glass” is one of those words that carry an abundance of possible meanings, and that abundance creates the potential for ambiguity. Ironically, this ambiguity clouds the very clarity that the word often symbolizes. Ambiguous words are tricky to work with in constructing and developing taxonomies and thesauri. Moreover, they make it challenging to write effective indexing rules. Because those rules must disambiguate, taking care in crafting them becomes all the more important.
As taxonomists, we have a responsibility to discern those future concepts, although they may still be invisible to most. We can save the various expressions of those concepts in search logs from being rejected from consideration for a vocabulary simply on account of their as yet infrequent appearance. In a taxonomy or thesaurus, we can provide labels that will consolidate the indexing for a concept for which researchers have not yet settled on a name. In some cases, especially with widely used vocabularies, we can perhaps determine the name by which a concept will be known on a standard basis.
We all know Roget as the word man, but he clearly was one brilliant man who loved all sorts of things, words among them. Jen Bryant and Melissa Sweet captured that in their latest release, The Right Word: Roget and His Thesaurus. The authors’ approach presents Roget’s lifelong passion for word lists as well as much more.
I like games of all types, but especially word games. My love of Scrabble developed early on, as my mother and I would play for hours on end. My siblings had no interest, so it was quality mom-and-me time that we still take advantage of when time and situations allow. That doesn't happen as often as I'd like, but I do appreciate her passing on a passion for words, books and vocabulary to me.
People often ask us how much time it will take to manage a rule base with Data Harmony software. We reply with specific customer experience numbers and tell them that it takes a few hours per month of editorial time to maintain both the thesaurus and the rule base. One customer of ours, the American Institute of Physics, found that maintaining their thesaurus and rule base takes less than 15 hours per month at a throughput of 2000 articles per week. Another customer, The Weather Channel, manages breaking news all day long with four hours per month of maintenance.
When we (at least those of us in Greater Mexico) hear of or read about Cinco de Mayo, there is no question in our minds that “Mayo” refers to the month of May. The preceding “Cinco de” (Spanish for “Fifth of”) pretty much clinches it. Of course, if the overall content is in Spanish, there might still be some ambiguity about whether it is the holiday that is being referred to, or simply a date that happens to be the one after the fourth of May. (As in “Hey, what day do we get off work?” “The fourth of July, I think.”)
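To make the idea concrete, here is a minimal Python sketch of a context-based rule in which the preceding “Cinco de” settles what “Mayo” means. It is only an illustration under stated assumptions: the function name `index_mayo` and the term labels are hypothetical, and this is not Data Harmony rule syntax. It also does not attempt the harder holiday-versus-date question in Spanish-language content.

```python
import re

# Hypothetical patterns for illustration; not taken from any real rule base.
CINCO_DE_MAYO = re.compile(r"\bcinco\s+de\s+mayo\b", re.IGNORECASE)
MAYO = re.compile(r"\bmayo\b", re.IGNORECASE)


def index_mayo(text):
    """Return hypothetical index terms for occurrences of 'Mayo' in text."""
    terms = []
    remainder = text
    # Rule 1: "Mayo" preceded by "Cinco de" is indexed as the phrase as a whole.
    if CINCO_DE_MAYO.search(text):
        terms.append("Cinco de Mayo")
        # Remove the matched phrase so its "Mayo" is not counted a second time.
        remainder = CINCO_DE_MAYO.sub(" ", text)
    # Rule 2: any other "Mayo" has no disambiguating context here, so flag it.
    if MAYO.search(remainder):
        terms.append("Mayo (ambiguous)")
    return terms


if __name__ == "__main__":
    print(index_mayo("We celebrate Cinco de Mayo every year."))
    print(index_mayo("Pass the mayo, please."))
```

A real rule base would go on to resolve the flagged cases with further context (a clinic, a condiment, a county), which is where the careful crafting of rules comes in.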
When you use a thesaurus for indexing content covering multiple disciplines, the need for disambiguation of terms increases. This fact of thesaurus life was well illustrated in a presentation at this year’s DHUG (Data Harmony Users Group) meeting. The presentation, by Rachel Drysdale, Taxonomy Manager of the Public Library of Science (PLOS), was titled “The PLOS Thesaurus: the first year.”