Access Innovations, Inc. is pleased to announce the delivery of an extensive thesaurus for the American Association for the Advancement of Science (AAAS).
Data Harmony With a Century of Content
In the summer of 2013, AAAS, the publisher of Science Magazine and various specialty journals (such as Science Signaling and Science Translational Medicine), contacted Access Innovations, Inc. regarding a thesaurus project to index and tag their data using Access Innovations’ patented, award-winning Data Harmony software. Because the publications represent the entire spectrum of science, this presented a unique challenge to Access Innovations. “The taxonomy needed to address all areas with a great degree of specificity to improve discovery for AAAS users,” remarked Bob Kasenchak, Project Coordinator for Access Innovations.
Formed in 1848, AAAS has content dating back to the 1880s. Since then, scientific language and terminology have undergone significant changes. With a database of 250,000 articles spanning 120 years of science, it is crucial for AAAS to implement granular topical indexing and provide a mechanism for browsing the full corpus of electronic content. The implementation of Data Harmony will greatly improve the accuracy of the free text searches used on the AAAS website.
A Cooperative Effort
Access Innovations has now delivered the thesaurus and continues to streamline content for AAAS. Due to the extremely broad range of topics in their content, AAAS has access to an unusual number of in-house subject matter experts (SMEs). Access Innovations worked in close conjunction with the SMEs to review and refine the thesaurus. Will Schweitzer, Business Director of AAAS, notes, “We’re excited to use our new thesaurus throughout all our platforms and products. We’ll first use the thesaurus to make our peer-review processes more efficient and to improve our readers’ browsing and search experience.”
As AAAS begins leveraging the thesaurus for website navigation, Access Innovations continues to refine the index and manage the taxonomy.
About Access Innovations, Inc. – www.accessinn.com, www.dataharmony.com, www.taxodiary.com
Access Innovations has extensive experience with Internet technology applications, master data management, content-aware database creation, thesaurus/taxonomy creation, and semantic integration. Access Innovations’ Data Harmony software includes machine aided indexing, thesaurus management, an XML Intranet System (XIS), and metadata extraction for content creation developed to meet production environment needs. Data Harmony is used by publishers, governments, and corporate clients throughout the world. Access Innovations: changing search to found since 1978.
Election Day has once again come and gone. Incumbents are ousted, bills are passed, and the political ads have finally stopped, so now the fallout begins. But regardless of where you fall on the political spectrum, there’s one thing you can be sure of: lobbying groups and political campaigns have used Big Data to try to secure your vote, and they will place ever greater importance on it. Data analysis on a smaller scale has been happening in this realm for years, but it really took off during President Barack Obama’s 2012 reelection campaign.
During that campaign, Obama’s team assembled a huge staff of analysts to work with the terabytes of collected voter data. Their analytics yielded a number of strategies for targeting specific constituents and sending them information and donation requests keyed to exactly the issues that mattered most to them. For the voter, that not only reduces the noise in their inbox but also personalizes the election, and, as we’ve seen plenty of times through the years, people tend to vote for someone to whom they feel a personal connection.
People felt that connection to Obama in 2008 through his particular personality and brand of speaking; while none of that changed much over the following four years, 2012 saw him reach audiences through the use of data, as well.
As much as American politicians use Big Data, they aren’t the leaders in it. In India’s elections earlier this year, the Bharatiya Janata Party (BJP) used its mountains of data to secure funds, advertise, and organize events directed toward, again, putting a personal face on the election. BJP leader Narendra Modi is now India’s prime minister and has reached out to the world using the same strategies. His six million Twitter followers would suggest that they work.
Then we have the BBC. We’ve already seen their innovative uses of Big Data and Linked Data for BBC Nature and the 2012 Olympics, but for the UK elections this past May, they devised a system for aggregating their data and disseminating news and information in sophisticated, highly useful ways. In this case, it’s more about analysis to serve their own reporting, but the richness with which they were able to deliver the news to their readers and viewers was nothing short of fantastic.
These are three strong examples of how Big Data, Linked Data, and semantic enrichment are changing the way election campaigns are conducted and covered for the better, but all three are top-down processes. By that, I mean that in all these cases, we are being directed to look at or think about particular subjects and issues. But in this increasingly interconnected online world, I’m not the only person who would rather direct myself, to tell myself what issues are important to me.
In its Olympics coverage, the BBC proved beyond a shadow of a doubt that Linked Data can be used effectively to make knowledge easy for the individual to access. You might recognize all the members of the gold medal-winning women’s gymnastics team; we couldn’t hear enough about them during the event. What about the Russian team that won the silver, though? We don’t hear much about them, but going to the BBC website would tell you not only their names, but where they’re from, what other events they’ve competed in, and much, much more. Elections are far more important than the Olympics, so why not do the exact same thing with candidates?
We know what the ballot will look like well in advance. All that would have to happen to get this process started is to lay that out on a publicly accessible website. Each candidate would have a link attached that would take the viewer to a page for the individual. From there, we could see the voting history of candidate X, other candidates who voted alongside her, links to campaign speeches, writings, and news reports on the person, and, most likely, many other things I can’t think up right now.
We could do the same thing with ballot measures with little additional trouble. For these, we could look at a particular measure’s history, others like it voted on previously or in other regions, reporting on it, and all sorts of statistics.
All of this is in the service of information and knowledge, which helps us as voters make more reasoned, coherent decisions. We are being monitored constantly in service of directed advertising, whether it’s in the political spectrum or elsewhere. On a personal level, I don’t really care about that, but there’s no good reason why we couldn’t have access to candidate data in an easily digestible form. The information is out there, but it’s a lot of labor for individuals to take on by ourselves. I hope the day soon comes when I’ll be able to go to a single place and learn what I need so that I make the best possible decisions while in the voting booth.
Obviously, I’m hugely impressed with how the BBC has embraced this new philosophy about the way they deliver their content. They’ve made it easy for individuals to collect knowledge in all kinds of realms. Now, they don’t yet have what I’m looking for, either, but between the Olympic Data Service and Vote 2014, it’s clear they have both the mindset and the capability to make it happen. When will media outlets on this side of the pond follow suit? Quickly, I hope.
The rules for academic publishing really haven’t changed in centuries. Once, there was a large percentage of the populace who were skeptical of academic research, as was apparent when Philosophical Transactions of the Royal Society began its publication life in 1665. To make the system work against that pushback, the method had to be codified. As a result, access to research material was difficult to attain, to the extent that scientists as late as the 19th century actively condoned criminal behavior just to have access to corpses for study and presentation.
It took a while, but given the advances that made everyone’s life better, people eventually put more trust in scientific research, so the bodies could stay resting in the ground and scientists could do their research in relative peace. Even so, publishing was still expensive and research material hard to find. In order to facilitate and disseminate the research, then, academic publishing took on a model that made it look much like a guild system, with all the benefits for those inside and all the roadblocks for the rest.
It made sense in those days, but you would think that advances in science and technology, along with our general faith in the goodness of those things, would have opened up access to research materials. The reality, however, is that at best the system has stayed the same, even with the rise of computers, which has made publishing fast and inexpensive. Even though it’s supposed to be about the science, it has become increasingly about the revenue.
As a result, the cost of a particular journal can run into the thousands of dollars and, as everywhere else today, organizational budgets for libraries have shrunk to the point that they are having to make hard decisions about which journals to cut out of their subscription loop. That’s plain sad, because, again, it’s supposed to be about the science.
Happily, though, we are in a particularly interesting place in history, in which our use of computer technology has become so sophisticated that it makes the old system appear rather silly. As individuals, we can go online and find mountains of content on any subject of interest, teach ourselves to do virtually anything, and make sense of things that people only a generation ago hadn’t the tools to even begin.
If it’s that easy for us, shouldn’t the path also be made clear for academics, scientists, and researchers, who are the ones advancing the fields that allow us as individuals to collect so much knowledge and information? Now, there are obvious considerations to keep in mind. First, and most importantly, accessing that material is cost-prohibitive for individuals, researchers included. That’s why researchers affiliate with budgeted organizations that can collect and store the materials for use. That’s great, but organizations can’t pay the prices—as much as $40,000 for a single year of a top journal—that the publishers charge, not for all the journals they might want or need, at least.
The model has to change, and that appears to be happening as I write. Organizations like the Public Library of Science (PLOS) are in full support of open access to scientific research, while the World Wide Web Consortium (W3C) is actively engaged in creating what they’re calling the Semantic Web, a new way to look at the Internet, one that focuses on data in general rather than simply documents. This view puts entities (people, places, and things) in relationship to one another. By linking with one or more of the existing datasets out there (DBpedia or Wikidata, for instance), you can access related content from around the web, while your data is added to the pile for others to access.
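To make the entity-and-relationship view concrete, here is a toy sketch in Python of facts stored as subject-predicate-object triples, the basic shape of Linked Data. Every identifier below is invented for illustration, with one local record linked out to an external dataset by a shared name:

```python
# Facts as (subject, predicate, object) triples -- the basic
# Linked Data shape. All identifiers are invented for illustration.
triples = [
    ("journal:example", "publishes", "article:123"),
    ("article:123", "about", "dbpedia:Koala"),
    ("dbpedia:Koala", "classOf", "dbpedia:Marsupial"),
]

def objects_of(subject, predicate):
    """Return every object linked to subject via predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]

# Follow the links: what is the article about, and what broader
# class does that entity belong to in the external dataset?
topic = objects_of("article:123", "about")[0]
print(topic, "->", objects_of(topic, "classOf"))
```

Real Linked Data uses full URIs and query languages like SPARQL for this traversal, but the principle of hopping from your own record into someone else’s dataset is the same.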
The greater the number of participating organizations, the better this is going to work. But I firmly believe that once people see the wealth of possibilities inherent in such a venture, their eyes will be opened to possibilities I can’t even imagine. Just look at what the BBC did with Linked Data for their coverage of the Olympics or the BBC Nature website. This stuff is absolutely amazing. BBC Nature, especially, thrills me. The deluge of information you return on a simple search for “koala” makes me want to learn everything I can about the little guys. How could anybody not want access like this?
You can’t put the genie back in the bottle, so as individuals and organizations get used to working in the Semantic Web, with all the access to information that comes with it, they’ll start to demand it everywhere. To publishers, the most important thing is always going to be the bottom line. There’s no real way to change that, except to drag them kicking and screaming into this brave new world of information exchange.
One of the biggest questions floating around every year in October is this: “What movies will you watch this month?”—or some derivative of that. All the major networks play their own hand-picked list of 25-50 “horror” movies all month long. Farms open their doors to a public that can roam around haunted barns and mazes; even the malls host their very own haunted houses (generally not advised for children). Many of these attractions are partly or fully inspired by horror films that over the generations have become staples every October.
So, what movies are you watching this month? Are you a fan of the black and white films with beautiful stars and horrible creatures? Or perhaps your taste is more modern, going for the iconic slashers from the 1980s and 1990s. Do you get more of a thrill from the horror movies that make you re-evaluate everything you thought you knew?
In the event that instead of being curled up on your couch, you and your friends are sharing the horror experience with a room full of strangers, how do you decide which films to spend your money on?
If you need some advice this Halloween, our Taxonomy of Horrors might be just the thing.
How do you choose what movies to watch with your friends late at night, curled up on the couch with a handful of popcorn stretched halfway to your open mouth? The list of horror films grows constantly, and it seems every October, new films are released into the wild jungle of the movie theaters. Our Horror Film Aficionados have exhaustively included everything scary under the sun (though not including the sun, sorry for all you solarphobes).
Maybe where the horror takes place is something that grabs you. Remote lakes or foggy forests after midnight are popular stages for terror and mystery—and of course, the abandoned cabin in the middle of the forest, sitting next to a lake is the ultimate spot for devious deeds. Old homes with history attached, basements below, and attics above should not be overlooked, either.
What creature scares you most? The person that cannot gaze upon a full moon without growing hair and ferocious canine claws surely raises the hair on the back of some people’s necks, and yet others find that the monsters of reality bring on the true fear. The psychopath that kills because of his troubled past or out of a twisted sense of morality forces some people to bolt the door and sleep with the lights on, and others still will cover their eyes when ghosts make a subtle appearance in the corner of the television screen.
Horror films come in all flavors—and levels of blood and gore—and it’s impossible to say which type will scare a person the most. Perhaps though, true horror is not always necessary. Perhaps revisiting films that scared us as children or that we even find funny is the goal. After all, Halloween should be about respecting our darker sides and enjoying our time with friends and family.
Oh, and eating tons of candy. Tons. Happy Halloween!
Daryl Loomis and Samantha Lewis
Registration is now open for the 11th annual Data Harmony Users Group (DHUG) meeting, scheduled for February 16-20, 2015 at the Access Innovations, Inc. offices at 4725 Indian School Road Northeast, in Albuquerque, New Mexico.
Access Innovations, the developer and seller of the Data Harmony product line, is hosting the meeting. Several new features and options will be unveiled during the Annual Features Update report, including an introduction to the new graphical user interface (GUI).
A full day of training on building taxonomies is included on Monday, February 16, 2015. The Annual Features Update report will be presented by Access Innovations President Marjorie M.K. Hlava on Tuesday morning from 9:00 a.m. to noon. On Tuesday afternoon and on Wednesday, Data Harmony users will present case studies detailing their implementations of the software. Two full days of hands-on software training sessions led by Access Innovations and Data Harmony staff members are scheduled for Thursday, February 19, and Friday, February 20.
The meeting also includes a Monday evening networking reception and a networking dinner on Tuesday evening. The Tuesday evening dinner will be held at the Indian Pueblo Cultural Center, a museum and educational center dedicated to preserving and perpetuating Pueblo culture and to advancing understanding, by presenting, with dignity and respect, the accomplishments and evolving history of the Pueblo people of New Mexico.
Because the meeting is hosted at the Access Innovations home office, the entire staff will be able to participate in discussions. “Each year, this meeting provides our members an opportunity to share ideas and address issues and methodologies with colleagues,” said Ms. Hlava. “We enjoy talking with our clients and finding out what items are on their wish lists for future software developments, and the new releases reflect those requests.”
Also, members have the opportunity to discuss technical and tactical issues with Access Innovations staff in person. “Just by sitting down together, we can work through key issues quickly and to everyone’s benefit,” said Bob Kasenchak, Production Coordinator at Access Innovations. “Even little questions that come up during these discussions can get resolved – questions that don’t seem important enough to bring up during conference calls or in email correspondence.”
To register for the meeting, go to www.dataharmony.com/dhug/regform/.
For information about planning a trip to Albuquerque for the meeting, go to www.dataharmony.com/dhug/dhugtripplanning/.
To see the provisional agenda, go to www.dataharmony.com/wp-content/uploads/2014/10/Agenda-2015.pdf.
“Time is free, but it’s priceless. You can’t own it, but you can use it. You can’t keep it, but you can spend it. Once you’ve lost it you can never get it back.” So muses American businessman Harvey MacKay.
We have no choice in the matter. Time cannot be “saved” …only spent. Our responsibility is to determine how we wish to allocate it. Otherwise, time will not only be spent but also wasted. How valuable, then, are those skills and tools that help us distribute our time in ways that we consider most useful and productive!
Shiyali Ramamrita Ranganathan proposed the fourth law of library science to be: “Save the time of the reader.” Fast and accurate retrieval of relevant information is one of the fundamental arguments in favor of enterprise taxonomy development and usage. Let’s consider some active search strategies that will help you avoid tail-chasing and wearying labyrinths when searching for project taxonomy resources to assist you in your knowledge management.
If you have ever conducted a keyword search on the open web for online taxonomy resources, you may have had some difficulty hitting your target. After simply typing the keyword “taxonomy” or “taxonomies” or “thesaurus” or “thesauri” into your favorite search engine window, you may have obtained less than satisfactory results. How many of your results were even remotely related to information structures, knowledge organization, or contextualized concepts organized by term?
How can you get better search results in less time?
- Consider using operators and/or advanced search techniques
- Isolate exact search phrases for use in full-text searches
- Once you’ve found a good online resource, take one additional step to find similar results
Each of these three tactics is discussed below.
1. If your favorite search engine allows for operators, try enabling them under “advanced search settings.” Operators may be the common Boolean operators (AND, OR, NOT), or they may be different symbols serving the same functions. Familiarize yourself with your particular engine’s operators and vernacular. In Google, for example, go to the “Settings” link located at the bottom right of your Google search page (in the Internet Explorer or Firefox browser). From the drop-down menu (a pop-up menu in this case), choose “Advanced search.” Scroll down to the entry at the left that reads “Use operators in the search box.”
Explore this page’s many options. A little time invested here typically yields large dividends in your future searches. You may also decide to use the advanced search boxes or choose to use the Boolean shortcuts like OR, – (NOT), AND. You may also truncate words or employ wildcards using the asterisk *. An additional descriptor, like “business,” “knowledge management” or “project” added to “taxonomy” will better identify your target.
The time you take to carefully construct your search query will help “prefilter” your results and increase their relevance. Your searching will begin to look more like this:
(KM taxonom* OR enterprise taxonom* OR business taxonom* OR project taxonom* OR corporate taxonom*) AND (manag* OR software)
2. In order to trim down the number of results, try the strategy of isolating exact phrases for full-text searching with quotation marks (“”). Your search queries will begin to look more like this:
(“project taxonomy” OR “enterprise taxonomy” OR “corporate taxonomy”)
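If you find yourself assembling many queries like these, the pattern is easy to script. Below is a minimal Python sketch (the helper names are my own and not tied to any particular engine): it ORs term stems inside parentheses, ANDs the groups together, and wraps exact phrases in quotation marks:

```python
def or_group(terms):
    """OR a list of term stems together inside parentheses."""
    return "(" + " OR ".join(terms) + ")"

def and_groups(groups):
    """AND several OR-groups into one query string."""
    return " AND ".join(or_group(g) for g in groups)

def phrase(text):
    """Wrap a phrase in quotation marks for exact matching."""
    return '"' + text + '"'

# Strategy 1: truncated stems, OR-ed into groups and AND-ed together.
q1 = and_groups([
    ["KM taxonom*", "enterprise taxonom*", "business taxonom*"],
    ["manag*", "software"],
])
print(q1)
# → (KM taxonom* OR enterprise taxonom* OR business taxonom*) AND (manag* OR software)

# Strategy 2: exact phrases for full-text searching.
q2 = or_group([phrase("project taxonomy"), phrase("enterprise taxonomy")])
print(q2)
# → ("project taxonomy" OR "enterprise taxonomy")
```

The output strings can then be pasted into any engine that honors Boolean operators and quoted phrases.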
Although it is possible to conduct the same searches in the “advanced search” option of most search engines, why not “save time” by learning a few of these shortcuts and experimenting with them?
3. Once you’ve discovered a resource or webpage that contains relevant content, try utilizing a few related searches that will turn up similar useful and relevant results. In Google, for instance, you can type info: followed immediately by the web address (the Uniform Resource Locator, or URL). For example, if you liked what you saw at www.taxonomystrategies.com, you would type info:www.taxonomystrategies.com into the search box.
After you receive your results, look to the bottom of the page for additional related options.
Consider another example. If you were pleased with what you found at www.taxobank.org, try the same kind of operator search with that URL. Type the following into your search box window, and note the differences and nuances of the results rendered. (Note: leave no space on either side of the colon.)
- info:www.taxobank.org
- related:www.taxobank.org
- link:www.taxobank.org
You might also try typing the URL into www.similarsites.com. (Beware; many of the “results” here are interspersed with ads!)
In the next post in this series, we will consider additional active search strategies to assist you in using time wisely to ferret out resources for your taxonomy needs.
Eric Ziecker, Information Consultant
Access Innovations, Inc.
Once upon a time, there was a real art to finding something in a library, and the card catalog, in a way, was the medium. Those giant wooden cabinets were filled with mysteries to be uncovered, but the first mystery was how to navigate it. There were always the artists—the librarians—who could help you through it, but that is really only viable for a limited amount of content; librarians have other duties, after all.
For people with larger goals—authors, researchers, and the like—it could get complicated really fast. They were never in the position of needing a single book or a single article; they needed a mountain of them. If they were working in a very narrow subject, that made things a little easier. But the broader the subject, or the greater the number of narrow subjects, the more quickly it became clear just how much work it would be to successfully find everything they needed.
What’s more, it was virtually impossible to enrich the research with material they never knew existed, at least not without the direct help of a colleague or expert who could recommend new material to them.
It gets even more complicated when you start to consider expanding the search beyond specific titles into authors, publishers, or tangentially related subjects. Then you start to get into cross-references; those are complete sets of records in themselves. By now it’s an unmanageably huge amount of information to deal with, and librarians, magical though they may be, could only do so much.
The thing is that “once upon a time” really isn’t that long ago; advances in information sciences have turned that magic into something more accessible to everyone. Tagging documents with metadata to identify the author name, institutions, subject matter, or any relevant piece of information at all brings all of those card catalogs into a single databank, accessible all at once.
It opens up wide possibilities for content usage, but what about applying those same “tagging” principles to people? We like to call it Semantic Fingerprinting because, it turns out, tagging a person’s electronic record actually does reveal the uniqueness of the person.
In academic publishing, the benefit of this fingerprinting is pretty clear. Knowing the author’s name, date of birth, institution, or really anything you want allows him or her to be identified quickly and, more importantly, with accuracy. This is important for a couple of reasons.
On the author’s side, proper credit for their work is of course important. With their name and, likely, their institution already tagged in their book or article, their identity points straight at their tagged record, establishing them as the true author. Additionally, if the subject matter they’ve written about is tagged in their record as well, a new article submission can be placed intelligently into the peer review process. If you write about nanotechnology, experts in the field can quickly be identified and sent the article for review, eliminating one of the many possible slowdowns in a tedious but necessary process.
For the publisher, it’s just as important, as it makes categorization of various authors easier. With the subjects tagged, it becomes really easy to see in which journal the article belongs, but it also aids in sales and subscriptions, which are becoming more important to the whole process than ever.
Subscription prices are going up while institutional budgets are slashed, meaning that a university has to make some hard choices about which journals are most important to them. So for the publisher to be able to look at their author and institution identities is a big deal. If they get word that a university library is planning to cancel their subscription, they can match who from that institution published in the journal and suggest that maybe they reconsider, given that their faculty has published in the journal whatever number of times over the last ten years. It’s unfortunate to think of the bottom line all the time, but we’ve all got to keep the lights on.
Many of these same things apply for researchers, which gets back to the original problem of sifting through content in a library. When the document is tagged, the researcher can quickly identify all of an author’s published work, when it was published, and on what subjects. From those subjects, they can then see other authors who published on the same or related topics and, soon, you see a network of information starting to build that is massively useful to people all throughout the publishing process.
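As a minimal sketch of how such a network falls out of tagged records (the documents, authors, and subjects below are all invented for illustration):

```python
from collections import defaultdict

# Hypothetical tagged documents; the fields are invented for illustration.
documents = [
    {"author": "Smith", "subjects": ["nanotechnology", "materials"]},
    {"author": "Jones", "subjects": ["nanotechnology"]},
    {"author": "Patel", "subjects": ["materials", "polymers"]},
]

# Invert the tags: subject -> set of authors who published on it.
by_subject = defaultdict(set)
for doc in documents:
    for subj in doc["subjects"]:
        by_subject[subj].add(doc["author"])

# Authors connected to Smith through at least one shared subject tag.
smith_subjects = next(d["subjects"] for d in documents if d["author"] == "Smith")
related = {a for s in smith_subjects for a in by_subject[s]} - {"Smith"}
print(sorted(related))  # → ['Jones', 'Patel']
```

The same inversion scales from three records to millions; once the tags exist, the author-subject network is just an index away.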
And while we talk about academic publishing a lot around these parts, the private sector can get just as much use out of Semantic Fingerprinting as the public. Suppose, as a random example, the manager of a corporate marketing department is trying to put together a team of people for a big campaign. The manager needs people with very specific skills that may or may not go along with their job descriptions. Let’s say that the manager had employees take a survey at some previous point, which suggested individual skill sets. What if, then, each individual had those skills tagged within their employee record? Rather than have to hunt or, worse, simply hope that the chosen employees can perform the duties, the manager could just look at those skill tags and pinpoint exactly who will do for what task.
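That last scenario might look something like this in code (a sketch only; the names, skills, and record layout are hypothetical, not any real HR system’s schema):

```python
# Hypothetical employee records with tagged skill sets.
employees = [
    {"name": "Ana",   "skills": {"copywriting", "SEO"}},
    {"name": "Ben",   "skills": {"graphic design", "video"}},
    {"name": "Carla", "skills": {"SEO", "analytics"}},
]

def find_candidates(employees, required):
    """Return names of employees whose skill tags cover every required skill."""
    return [e["name"] for e in employees if required <= e["skills"]]

print(find_candidates(employees, {"SEO"}))               # → ['Ana', 'Carla']
print(find_candidates(employees, {"SEO", "analytics"}))  # → ['Carla']
```

The set-containment test (`required <= e["skills"]`) is the whole trick: once skills are stored as tags rather than prose, matching people to tasks becomes a simple query.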
I don’t know how many companies out there are doing stuff like that, but I can see so many possibilities in working with semantic fingerprints. I can imagine possibilities in just about any industry I can think of, and I’m sure there’s a mountain of uses that I haven’t fathomed yet. In connection with Linked Data, it could be almost endless.
“The Award of Merit is our society’s highest award,” explained Richard Hill, Executive Director of ASIS&T, “and Margie has definitely earned it through her achievements. She has created opportunities where none previously existed, thereby expanding the field itself. In addition, as a member of ASIS&T, she has contributed countless hours of volunteer service to the great benefit of the Society.”
“Marjorie Hlava has spent forty years demonstrating how published theories of information science work in large-scale environments. Information professionals, and in fact people not even aware they are part of the information industry, use things she has created without realizing it. She has a keen eye for identifying ways in which fundamental principles of knowledge organization can become useful in the less-than-perfect environment of everyday applications,” wrote Harry Bruce, ASIS&T president, in the meeting program. “She could easily have led an academic life; however, she chose a different, and in many ways more difficult, way of shaping information science. She created a company and set of products and solutions (standards, schemas, languages, databases, taxonomies) that both applied principles and drove research by demonstrating what worked and what needs to be done.
“Patents, a diversity of projects, and a spirit of entrepreneurship have illustrated and strengthened key linkages between associated fields. Her nomination packet includes five letters, all of which are from significant information scientists, demonstrating how Marjorie is an example of how ASIS&T is unique in supporting a special blend of applied and theoretical work.”
Ms. Hlava was interviewed in April of 2014 as part of the “Leaders of Information Science and Technology Worldwide: In Their Own Words” initiative sponsored by ASIS&T under the guidance of the Special Interest Group, History and Foundations of Information Science (SIG/HFIS) and the 75th Anniversary Task Force of ASIS&T. A video of the interview is posted on the ASIS&T website.
“I am surprised, delighted, and humbled by this honor,” commented Ms. Hlava. “I have always enjoyed my membership in ASIS&T and found the presentations to be a springboard for new ideas to try.”
Access Innovations CEO Jay Ven Eman observed, “The insights Margie has gained from attending the meetings and networking with other members have fueled her desire to undertake new (and sometimes daring!) developments with the company’s service offerings and, later, the software. Conversations with other members have helped her find creative ways to address the applications of information science and its challenges. We look forward to many more years of continued involvement in ASIS&T.”
According to the ASIS&T website, “The Award of Merit was established in 1964 and is administered by the Awards and Honors Committee. The purpose of the award is to recognize an individual deemed to have made noteworthy contributions to the field of information science. Such contributions may include the expression of new ideas, the creation of new devices, the development of better techniques, or substantial research efforts which have led to further development of thought or devices or applications, or outstanding service to the profession of information science, as evidenced by successful efforts in the educational, social, or political processes affecting the profession.
“The award is a once-in-a-lifetime award and is sponsored by the Society-at-Large and is administered by the Awards and Honors Committee. The award shall be announced and presented to the winner by the ASIS&T President, with appropriate ceremony, at the banquet of the annual meeting of the Society.”
The Award of Merit and the society’s other awards will be presented by Harry Bruce, the current ASIS&T president, at the Awards Luncheon of the upcoming ASIS&T Annual Meeting in Seattle, Washington, on Tuesday, November 4, 2014.
About Access Innovations, Inc.
www.accessinn.com, www.dataharmony.com, www.taxodiary.com
Founded in 1978, Access Innovations has extensive experience with Internet technology applications, master data management, database creation, thesaurus and taxonomy creation, and semantic integration. Access Innovations’ Data Harmony® software includes automatic indexing, thesaurus management, an XML Intranet System (XIS), and metadata extraction for content creation, developed to meet production environment needs. Data Harmony is used by publishers, governments, and corporate clients throughout the world.
About ASIS&T – www.asis.org
Since 1937, the Association for Information Science and Technology (ASIS&T) has been the association for information professionals leading the search for new and better theories, techniques, and technologies to improve access to information. ASIS&T brings together diverse streams of knowledge, focusing what might be disparate approaches into novel solutions to common problems. ASIS&T bridges the gaps not only between disciplines, but also between the research that drives and the practices that sustain new developments. ASIS&T counts among its membership some 4,000 information specialists from such fields as computer science, linguistics, management, librarianship, engineering, law, medicine, chemistry, and education – individuals who share a common interest in improving the ways society stores, retrieves, analyzes, manages, archives and disseminates information, coming together for mutual benefit.
Nobody is going to deny that publishing is, and always has been, a sometimes messy process, but sophisticated use of metadata and taxonomies can help clean it up. It fascinates me how intimately metadata can work at every step of the process to make things easier on everybody, from the author writing the piece to the institution that publishes it, all the way to its marketing and use.
Let’s start at the beginning, with the writer. Presumably, the person is an expert in his or her field, or at least working toward it, but that doesn’t make them an expert in searching for the information they need. That’s what has always made library science so valuable, and while librarians remain extremely valuable (I don’t want to offend my librarian friends out there), the rise of enriched metadata means the content a writer needs for research can be laid out clearly and concisely in front of them. This lets them work in a noise-free environment and produce their best possible work.
So the work is done and it’s time to submit it to publishers. As we’ve seen, this can be an ordeal, but semantically enriched content can, once again, ease the process for both the author and the publication. Tagged with relevant thesaurus terms, a submission can be analyzed to identify its subject, then sorted and sent to properly qualified experts in the field for peer review. This might seem like a small step, but any time saved is a big benefit to an author, who is often under the crushing weight of tenure deadlines.
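The routing step described above can be sketched in a few lines. This is a minimal illustration, not Data Harmony’s actual method: the thesaurus terms, reviewer roster, and simple substring-matching rule are all invented for the example.

```python
# Hypothetical reviewer roster keyed by thesaurus term.
# Real thesauri and routing rules are far richer than this.
REVIEWERS = {
    "gene expression": ["Dr. Alvarez", "Dr. Chen"],
    "signal transduction": ["Dr. Chen"],
    "climate modeling": ["Dr. Osei"],
}

def tag_submission(abstract: str) -> list[str]:
    """Return the thesaurus terms whose text appears in the abstract."""
    text = abstract.lower()
    return [term for term in REVIEWERS if term in text]

def suggest_reviewers(abstract: str) -> list[str]:
    """Collect reviewers qualified for any matched term, de-duplicated."""
    names = []
    for term in tag_submission(abstract):
        for name in REVIEWERS[term]:
            if name not in names:
                names.append(name)
    return names

abstract = "We report changes in gene expression downstream of signal transduction."
print(tag_submission(abstract))    # ['gene expression', 'signal transduction']
print(suggest_reviewers(abstract)) # ['Dr. Alvarez', 'Dr. Chen']
```

Even this toy version shows the payoff: once subject terms are attached to a submission, sorting it toward qualified reviewers becomes a lookup rather than a manual triage step.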
Once the author’s submission is out the door and in the hands of peer reviewers, it goes through its revision process, sent back and forth until everything is squared away. This, of course, can take a long time, but once the work is ready for publication, metadata takes on its most important role. The same (or similar) subject terms that helped direct the submission into peer review now help ensure it is directed to the most relevant journal, where the right people can easily find it.
This is the point at which, with the right tools and the right people in place, the metadata can really shine, because there’s so much that can be done with it. Once an article is published, either in an open access format like PLOS One or a more traditional subscription journal, its metadata can be used for an increasing number of purposes, anything from simple organization to highly advanced linked data.
Whatever that data is used for, the most important thing is that the content can be found. Everything else is useless if the work sits in the ether, hidden where nobody can read it. And as is likely clear by now, metadata is absolutely crucial at this end stage, where other researchers need to locate the content to conduct their own work. Just as the original author needed clear, concise search results at the start of the process, these new researchers are hampered if their results are muddled with noise, or if a relevant result is missed completely. That can prevent an author’s work from reaching the people who need it and keep it from furthering work in the field.
That’s counterproductive to research, obviously, but it’s also totally unnecessary. It shouldn’t take much to get people to see how this kind of metadata enrichment can make authors’ and publishers’ lives easier. It’s relatively new and there are a lot of buzzy words attached to it, but that doesn’t change the value of the core concept.
The good news is that semantically enriched metadata is starting to show up all over the place. Software like Data Harmony from Access Innovations automates much of this to help academic journals and institutions facilitate research. The pile of metadata is already gigantic, so it’s vital that the new content that journals are constantly publishing gets analyzed and tagged swiftly and accurately.
To me, the furthering of research is the most important thing, but there is another step in the process: marketing and sales. The same principle applies as everywhere else here: you can’t buy what you can’t find. The site with the clearest inroads to the content a consumer is looking for will be the one that wins. And the sooner people adopt the ideas behind semantically enriched metadata, the sooner we all win.
Access Innovations recently debuted Data Harmony Version 3.9. Among its new features and fixes is a sneakily clever module called Inline Tagging. On the surface, it does exactly what the name says: it lets the user see, quickly and clearly, which concepts in a piece of content, and exactly where in the text, triggered subject tagging by the software. It seems simple enough, a handy tool, but upon closer inspection it really opens doors for the user.
Once the text is tagged, it becomes a question of what the user wants to do with it. That’s where the possibilities get really intriguing. For one thing, Inline Tagging lets an editor do some very helpful things internally. Once a document’s indexing triggers are tagged, the editor could, for instance, jump to a term’s thesaurus listing to see its broader and related terms, synonyms, or any other facet of the taxonomy.
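As a rough picture of what inline tagging does, here is a minimal sketch that wraps each occurrence of a thesaurus term in markup recording which term fired and where. The term list, the markup format, and the naive case-insensitive matching are assumptions for illustration only; Data Harmony’s actual rule base is far more sophisticated.

```python
import re

# Hypothetical flat term list; a real thesaurus also carries
# synonyms, broader/narrower terms, and conditional rules.
THESAURUS = ["cell cycle", "mitosis"]

def inline_tag(text: str) -> str:
    """Wrap each term occurrence in <term> markup naming the concept that fired."""
    for term in THESAURUS:
        pattern = re.compile(re.escape(term), re.IGNORECASE)
        # Keep the original capitalization of the matched text.
        text = pattern.sub(
            lambda m: f'<term id="{term}">{m.group(0)}</term>', text
        )
    return text

print(inline_tag("Mitosis is one phase of the cell cycle."))
# <term id="mitosis">Mitosis</term> is one phase of the <term id="cell cycle">cell cycle</term>.
```

The tagged output makes the editor’s review concrete: each `<term>` element points back to a specific thesaurus concept, which is exactly the hook needed to jump from a passage to that concept’s listing.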
Thus, Inline Tagging is a helpful aid to the editing process, but my thoughts are moving more toward the end user right now. It’s they who can truly reap its benefits, because Inline Tagging can easily serve as a conduit for linked data, which has the potential to dramatically enrich a user’s search experience, something absolutely crucial in publishing.
We’ve already seen how massive the amount of data in the world has become, and we’ve seen the need to understand and control it. We see the emergent patterns in that data, and we work with it to discover new avenues for viewership or revenue or education. But that’s using just a handful of datasets. No matter how large they might be, the size of that data pales in comparison to the data in the world. If we could harness that power, what could we do?
Linked data, which has emerged as one of the most important concepts in data publishing, could well be the answer. In a database that implements Inline Tagging, the key terms and concepts in the documents are pinpointed at their exact occurrences within those documents. Inline Tagging thus turns a passage of text into a data item that can be quickly plucked for analysis. But how does that help us?
It can work on a number of levels. This can be as simple as having a taxonomy term link to a definition page, with broader and narrower terms, synonyms, etc. That right there can help with clarity, speed, and accuracy, but that’s just the beginning. There could also be a more substantial relationship between a thesaurus and the world’s data, one that allows users to take those data items and send them out to mine the web for related tags, drawing them back to the original page as related materials.
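One way a thesaurus term could be exposed for this kind of linking is as a SKOS concept serialized as JSON-LD. SKOS is a real W3C vocabulary designed for thesauri, but the URIs and term relationships below are invented for illustration; this is a sketch of the idea, not a description of any particular product’s output.

```python
import json

def term_as_jsonld(term, broader=None, related=()):
    """Build a minimal JSON-LD record for a thesaurus term using SKOS properties."""
    slug = lambda t: t.replace(" ", "-")
    doc = {
        "@context": {"skos": "http://www.w3.org/2004/02/skos/core#"},
        # example.org URIs are placeholders for a site's real term pages.
        "@id": f"http://example.org/thesaurus/{slug(term)}",
        "skos:prefLabel": term,
    }
    if broader:
        doc["skos:broader"] = {"@id": f"http://example.org/thesaurus/{slug(broader)}"}
    if related:
        doc["skos:related"] = [
            {"@id": f"http://example.org/thesaurus/{slug(r)}"} for r in related
        ]
    return doc

print(json.dumps(term_as_jsonld("cheetah", broader="big cats",
                                related=["predation"]), indent=2))
```

Because the term is now addressed by a stable URI with machine-readable broader and related links, other sites (and the site’s own search) can follow those links outward, which is exactly the conduit from a tagged passage to the wider web of data described above.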
Say somebody is starting to write a paper on how a cheetah raises its young. They go online to research it and find a paper that addresses the topic perfectly. Now, this website also happens to implement linked data, so when the user queried “cheetahs raising young,” not only did the search result in a strong match on the site, it also, in turn, queried the cloud of data in the web. On its own, it locates information on other sites on the same topic and pulls down additional links: a wiki page, other related articles and papers, videos, or really anything.
It’s well known that people love one-stop shopping. That’s true in retail and that’s true in publishing. If the researcher can get all that information, curated personally for them in a clear, concise, and most importantly, highly accurate manner, they’ll almost certainly make that site their primary resource.
Some of these concepts have already been implemented, notably by the BBC, whose Sport Ontology, created for the 2012 Olympic Games, revealed just some of the potential of linked data. The idea was to personalize how viewers watched the Olympics, understanding that enriched, relevant information delivered in real time would drive traffic to the site.
There are even bigger ways linked data is being used, or potentially being used. The European Union is funding a project called Digitised Manuscripts to Europeana (DM2E), which aims to link all of Europe’s memory institutions to Europeana, the EU’s largest cultural heritage portal, to give free access to the stores of European history.
What if, in theory, a medical organization had access to linked data during flu season? That organization could pull information from not only medical records, but from, say, community records, school data, and other sources to try to predict when and where outbreaks might occur to minimize the damage. Certainly, there are issues with privacy and other hurdles that would need to be addressed, but even though that example is theoretical, the potential is massive.
Of course, proper implementation of linked data takes plenty of cooperation, so the jury is still out on how much, or how soon, sophisticated linked data usage will come about. But the possibilities for academia, cultural awareness, and even retail look too enticing for it not to flourish. I, for one, am looking forward to a day when information I never dreamed of is right at my fingertips. I don’t know exactly what that will look like, but it should be a fun ride.