Covered In Metadata: Semantic Fingerprinting

Once upon a time, there was a real art to finding something in a library, and the card catalog, in a way, was the medium. Those giant wooden cabinets were filled with mysteries to be uncovered, but the first mystery was how to navigate them. There were always the artists, the librarians, who could help you through it, but that was really only viable for a limited amount of content; librarians have other duties, after all.

For people with larger goals (authors, researchers, and the like), it could get complicated really fast. They were never in the position of needing a single book or a single article; they needed a mountain of them. If they were working in a very narrow subject, that made things a little easier. But the broader the subject or the greater the number of narrow subjects, the more quickly it became clear just how much work it would be to successfully find everything they needed.

What’s more, it was virtually impossible to enrich the research with material they never knew existed, at least not without the direct help of a colleague or expert who could recommend new material to them.

It gets even more complicated when you start to consider expanding the search beyond specific titles into authors, publishers, or tangentially related subjects. Then you start to get into cross-references; those are complete sets of records in themselves. By now it’s an unmanageably huge amount of information to deal with, and librarians, magical though they may be, can only do so much.

The thing is that “once upon a time” really isn’t that long ago; advances in information sciences have turned that magic into something more accessible to everyone. Tagging documents with metadata to identify the author name, institutions, subject matter, or any relevant piece of information at all brings all of those card catalogs into a single databank, accessible all at once.

It opens up wide possibilities for content usage, but what about applying those same “tagging” principles to people? We like to call it Semantic Fingerprinting because, it turns out, tagging a person’s electronic record actually does reveal the uniqueness of the person.

In academic publishing, the benefit of this fingerprinting is pretty clear. Knowing the author’s name, date of birth, institution, or really anything you want allows him or her to be identified quickly and, more importantly, with accuracy. This is important for a couple of reasons.

On the author’s side, proper credit for their work is of course important. With their name and, likely, their institution already tagged in their book or article, the work points straight at their tagged record, establishing them as the true author. Additionally, if the subject matter they’ve written about is tagged in their record as well, a new article submission can be placed intelligently into the peer review process. If you write about nanotechnology, experts in the field can quickly be identified and sent the article for review, eliminating one of the many possible slowdowns in a tedious but necessary process.
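To make the peer review idea a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the record layout, the `find_reviewers` name, and the sample people are all hypothetical, and a real system would surely weigh more than shared subject tags.

```python
def find_reviewers(article_subjects, author_name, person_records):
    """Rank possible reviewers by how many subject tags they share with the article.

    The submitting author is skipped so nobody reviews their own work.
    """
    ranked = []
    for person in person_records:
        if person["name"] == author_name:
            continue
        shared = set(article_subjects) & set(person["subjects"])
        if shared:
            ranked.append((len(shared), person["name"]))
    # Most shared tags first; ties broken alphabetically by name.
    ranked.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in ranked]


# Hypothetical tagged records, one per person.
records = [
    {"name": "A. Author", "subjects": {"nanotechnology", "materials"}},
    {"name": "B. Expert", "subjects": {"nanotechnology", "materials", "optics"}},
    {"name": "C. Expert", "subjects": {"nanotechnology"}},
    {"name": "D. Other", "subjects": {"medieval history"}},
]

reviewers = find_reviewers({"nanotechnology", "materials"}, "A. Author", records)
print(reviewers)  # ['B. Expert', 'C. Expert']
```

The point is only that once the subjects live in the person's record, finding candidate reviewers becomes a lookup rather than a hunt.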

For the publisher, it’s just as important, as it makes categorization of various authors easier. With the subjects tagged, it becomes really easy to see in which journal the article belongs, but it also aids in sales and subscriptions, which are becoming more important to the whole process than ever.

Subscription prices are going up while institutional budgets are slashed, meaning that a university has to make some hard choices about which journals are most important to it. So it’s a big deal for the publisher to be able to look at its author and institution identities. If the publisher gets word that a university library is planning to cancel its subscription, it can look up who from that institution has published in the journal and suggest that the library reconsider, given how many times its faculty has published there over the last ten years. It’s unfortunate to think of the bottom line all the time, but we’ve all got to keep the lights on.
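Here is a rough sketch of that lookup in Python, assuming each article record carries tags for its journal, its year, and its authors' institutions. The field names, the `institution_publication_counts` function, and the sample data are all made up for the example.

```python
from collections import Counter


def institution_publication_counts(articles, journal, since_year):
    """Count articles per institution for one journal, from since_year onward."""
    counts = Counter()
    for article in articles:
        if article["journal"] == journal and article["year"] >= since_year:
            # Use a set so an article with two authors from the same
            # institution is only counted once for that institution.
            for institution in set(article["institutions"]):
                counts[institution] += 1
    return counts


# Hypothetical tagged article records.
articles = [
    {"journal": "Journal of Examples", "year": 2016, "institutions": ["Example University"]},
    {"journal": "Journal of Examples", "year": 2019, "institutions": ["Example University", "Other College"]},
    {"journal": "Journal of Examples", "year": 2001, "institutions": ["Example University"]},
    {"journal": "Another Journal", "year": 2018, "institutions": ["Example University"]},
]

counts = institution_publication_counts(articles, "Journal of Examples", since_year=2014)
print(counts["Example University"])  # 2
```

With a count like that in hand, the conversation with the library has an actual number behind it.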

Many of these same things apply for researchers, which gets back to the original problem of sifting through content in a library. When the document is tagged, the researcher can quickly identify all of an author’s published work, when it was published, and on what subjects. From those subjects, they can then see other authors who published on the same or related topics and, soon, you see a network of information starting to build that is massively useful to people all throughout the publishing process.
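As a small illustration of how that network starts to form, here is a Python sketch that indexes authors by subject tag and then finds everyone who shares a subject with a given author. The document layout and the function names (`build_subject_index`, `related_authors`) are assumptions of mine, not a description of any particular product.

```python
from collections import defaultdict


def build_subject_index(documents):
    """Map each subject tag to the set of authors who have published on it."""
    index = defaultdict(set)
    for doc in documents:
        for subject in doc["subjects"]:
            index[subject].update(doc["authors"])
    return index


def related_authors(author, documents):
    """Return other authors who share at least one subject tag with `author`."""
    index = build_subject_index(documents)
    related = set()
    for doc in documents:
        if author in doc["authors"]:
            for subject in doc["subjects"]:
                related |= index[subject]
    related.discard(author)
    return related


# Hypothetical tagged documents.
documents = [
    {"authors": ["Ada"], "subjects": ["taxonomy", "indexing"]},
    {"authors": ["Ben"], "subjects": ["indexing"]},
    {"authors": ["Cy", "Dee"], "subjects": ["taxonomy", "linked data"]},
    {"authors": ["Eve"], "subjects": ["poetry"]},
]

print(sorted(related_authors("Ada", documents)))  # ['Ben', 'Cy', 'Dee']
```

Repeat the same step for each related author and the network keeps growing outward from the original search.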

And while we talk about academic publishing a lot around these parts, the private sector can get just as much use out of Semantic Fingerprinting as the public. Suppose, as a random example, the manager of a corporate marketing department is trying to put together a team of people for a big campaign. The manager needs people with very specific skills that may or may not go along with their job descriptions. Let’s say that the manager had employees take a survey at some previous point, which suggested individual skill sets. What if, then, each individual had those skills tagged within their employee record? Rather than have to hunt or, worse, simply hope that the chosen employees can perform the duties, the manager could just look at those skill tags and pinpoint exactly who is suited to which task.
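In code, that manager's lookup could be as simple as the following Python sketch. The employee names, the skill tags, and the `pick_team` function are hypothetical; the only assumption that matters is that each employee record holds a set of skill tags.

```python
def pick_team(required_skills, employees):
    """For each required skill, list the employees whose record carries that tag."""
    return {
        skill: sorted(name for name, tags in employees.items() if skill in tags)
        for skill in required_skills
    }


# Hypothetical employee records: name -> skill tags taken from the survey.
employees = {
    "Pat": {"copywriting", "video editing"},
    "Sam": {"analytics"},
    "Lee": {"copywriting", "analytics"},
}

team = pick_team(["copywriting", "analytics", "illustration"], employees)
print(team["copywriting"])   # ['Lee', 'Pat']
print(team["illustration"])  # []
```

An empty list is useful too: it tells the manager right away which skills the department does not have in-house.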

I don’t know how many companies out there are doing stuff like that, but I can see so many possibilities in working with semantic fingerprints. I can imagine possibilities in just about any industry I can think of, and I’m sure there’s a mountain of uses that I haven’t fathomed yet. In connection with Linked Data, it could be almost endless.

Daryl Loomis
Access Innovations
