Accessing the Data

July 31, 2014  
Posted in indexing, metadata, News, search, Taxonomy

Atypon has released Literatum 14.1 for professional and scholarly publishers. Literatum powers the websites of over 200 professional and scholarly publishers, including many of the most prestigious publishers in the world. This interesting news came from Knowledgespeak in their article, “Atypon unveils Literatum 14.1.”

This being the first of multiple releases scheduled for 2014, Literatum 14.1 adds dozens of new features such as taxonomy management capabilities.

Access to data is important. It can be achieved by creating strong taxonomies. Proper indexing against a strong standards-based taxonomy increases the findability of data. Access Innovations is one of a very small number of companies able to help its clients generate ANSI/ISO/W3C-compliant taxonomies.

Melody K. Smith

Sponsored by Access Innovations, the world leader in thesaurus, ontology, and taxonomy creation and metadata application.

Managing Change with Consistency

July 29, 2014  
Posted in indexing, metadata, News, Taxonomy

As with most things in life, change is everywhere. Technology is no exception; more than likely, it is the leader. As it evolves, new challenges come with it. Information architecture has its own changing landscape: new content types, an expanding or collapsing taxonomy, modified metadata, and shifting relationships between content types and their associated business rules. How do you manage that kind of change? This interesting topic was found on KM Institute in their article, “Governing the Information Architecture – Model, Taxonomy, Metadata.”

It is important to promote the consistency of the information used across the platforms. By managing enterprise content, its metadata, and associated taxonomy, users will find the content they are looking for when and how they need it.

Proper indexing against a strong standards-based taxonomy increases the findability of data. Access Innovations is one of a very small number of companies able to help its clients generate ANSI/ISO/W3C-compliant taxonomies.

Melody K. Smith


Locating and Finding Are Not Equal

July 29, 2014  
Posted in News, search, storage, Taxonomy

New technologies are responsible for the electronic storage of information. Interestingly enough, though, 90% of businesses are still storing documents in paper format, according to a study by JSE-listed storage management provider Metrofile. This interesting information was brought to our attention by Business Day Live in their article, “Firms prefer paper to electronic records.”

Apparently 75% of respondents store the original paper documents on site, while only 45% of businesses scan the original paper documents as back-up. While the physical location of data still matters, it is becoming increasingly irrelevant.

Access to data is important. It can be achieved by creating strong taxonomies. Proper indexing against a strong standards-based taxonomy increases the findability of data. Access Innovations is one of a very small number of companies able to help its clients generate ANSI/ISO/W3C-compliant taxonomies.

Melody K. Smith


Buzzwords, Bling, and Being Snowed

July 28, 2014  
Posted in Access Insights, Featured, Taxonomy

 

Image: Breath of Snow, http://blackjack0919.deviantart.com/art/Breath-Of-Snow-336525153 / CC BY-ND 3.0

I recently had the opportunity to see webinars featuring a couple of software systems for taxonomy construction/management and content categorization. The systems were both impressive and, if I didn’t have 20 years in the business, I would have been totally awed … and snowed. It’s easy to be overwhelmed by a slick appearance and professional presentation.

Early in my career, I was overwhelmed and confused by the terminology—its abundance, multiplicity, and ambiguity. Each software company used different words, all very catchy, developed by a creative marketing department. I didn’t get whether they were talking about different concepts or the same in different verbal wrappers. Cutting through the terminology to identify key software features and functions can be tough. Yet that’s just what must be done for an informed buying decision.

One of the buzzwords I came across in these recent webinars was “content-driven” (or “data-driven”) to describe a taxonomy. To my amazement, this was described as a “trend” in taxonomy construction by the presenter for the company “with over 15 years of experience.” Apparently it was intended as a strike at a “top-down” approach to pulling together terms for a taxonomy based on an abstract, authoritative view of a domain. The top-down approach was described as more complex than necessary and including nodes not reflecting your content.

However, the discussion ignored the equally familiar and long-established counterpart to top-down. This is the “bottom-up” approach, which draws terms directly from the documents to be categorized, i.e., content-driven. Here’s a link to a brief description of the strategies written in 1996 by Jessica Milstead.

In most cases, building a taxonomy or thesaurus requires a hybrid approach, with the overall organization based on a top-down approach for navigation and the bulk of terms reflecting the preferred terms for concepts in the domain and drawn from the actual documents. The strategies are most often used in balance, with the taxonomist providing a logical “top” structure into which the content-linked terms can fit.

The software on display generated a list of candidate terms, offering words and phrases from the content as terms. But this was just a starting point in taxonomy construction. Time for the taxonomist to add the value of organization through hierarchical, associative, and equivalence relationships.

Ah, “relationships” takes me to semantics, another buzzword that sounds very impressive and truly represents the power of taxonomies. The key thing to remember is that semantics in a taxonomy starts with the hierarchical, associative, and equivalence relationships. (Actually, a taxonomy with all those features is more accurately called a thesaurus.) Organizing terms in a hierarchy of broader and narrower concepts, from general to specific, and recognizing synonymous alternative expressions and internal conceptual links all add semantic richness to terms by providing context based on the meanings of words. These are features built into a well-developed taxonomy, providing pivot points from one term to another through logical semantic associations. Applied as metadata to content items, the taxonomy terms provide semantic enrichment.
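To make the three relationship types concrete, here is a minimal Python sketch of a thesaurus. The class and method names are hypothetical, not taken from any product, and the sample synonym and related term are illustrative assumptions; only the placement of “Hypertension” under “Vascular medicine” echoes an example used elsewhere on this blog.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One preferred term with the three standard relationship types."""
    label: str
    broader: set = field(default_factory=set)   # hierarchical (BT)
    narrower: set = field(default_factory=set)  # hierarchical (NT)
    related: set = field(default_factory=set)   # associative (RT)
    use_for: set = field(default_factory=set)   # equivalence (UF synonyms)


class Thesaurus:
    """Hypothetical container; not the API of any real tool."""

    def __init__(self):
        self.terms = {}

    def add(self, label):
        return self.terms.setdefault(label, Term(label))

    def add_hierarchy(self, broader, narrower):
        self.add(broader).narrower.add(narrower)
        self.add(narrower).broader.add(broader)

    def add_related(self, a, b):
        self.add(a).related.add(b)
        self.add(b).related.add(a)

    def add_synonym(self, preferred, synonym):
        self.add(preferred).use_for.add(synonym)

    def preferred(self, text):
        """Resolve a label or synonym to its preferred label (case-insensitive)."""
        wanted = text.lower()
        for term in self.terms.values():
            synonyms = {s.lower() for s in term.use_for}
            if term.label.lower() == wanted or wanted in synonyms:
                return term.label
        return None

    def pivots(self, label):
        """Terms reachable from this one through hierarchy or association."""
        t = self.terms[label]
        return {"BT": sorted(t.broader), "NT": sorted(t.narrower), "RT": sorted(t.related)}


# Illustrative data (assumed for the example).
thesaurus = Thesaurus()
thesaurus.add_hierarchy("Vascular medicine", "Hypertension")
thesaurus.add_related("Hypertension", "Blood pressure")
thesaurus.add_synonym("Hypertension", "High blood pressure")

print(thesaurus.preferred("high blood pressure"))
print(thesaurus.pivots("Hypertension"))
```

The equivalence relationship lets a searcher's wording resolve to the preferred term, while the hierarchical and associative links supply the pivot points from one term to another.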

Another slick webinar focused on semantic enrichment with an artfully designed but effective presentation. As jaded as I have become, I was duly impressed by the appealing motifs, the jazzy colors, the graphics in motion, and the requisite buzzwords in the opening. This is the part you show to the CIO, CTO, etc., the one with final budget authority. We are still talking about semantically enriching content with metadata from a domain-specific taxonomy. You say, “This is just what I need!”

Several modules were described. One extracts words and phrases as key topics for a taxonomy-ish product, called by a name not found in the ANSI/NISO Z39.19 standard for taxonomy construction. Another is for human taxonomy building from scratch, if the ready-built domain taxonomies are not a good fit. Others serve categorizing/indexing/tagging/annotating content (choose your favorite expression), also known as applying taxonomy terms as metadata or … semantically enriching the content.

I must admit I was impressed, but not snowed. I’m an editor, not in marketing or in an art department. I knew what this was all about because this is basic taxonomy and indexing work that I do daily, using software that delivers these functions. I know that slick is cool to look at, but it comes at a price.

Image: Ferrari, http://commons.wikimedia.org/wiki/File:Ferrari_F430_2.jpg

“Gee, thanks for the spin in your Ferrari, but I was hoping for a Chevy price tag and Honda/Volvo/Subaru reliability.”

Photo by Ahmad Mukhlis, http://en.wikipedia.org/wiki/Honda_Civic#mediaviewer/File:Honda_Civic_Hybrid_%28Malaysia%29.jpg / CC BY-SA 3.0

I also know that essential functions are available in products much more accessible to organizations on a budget.

If you are interested in software for taxonomy creation, management, and application, don’t get snowed by the buzzwords and bling. Know the basics of taxonomy construction and implementation, and use that knowledge as a starting point when comparing software. Know the functions you need to perform and avoid slick but unnecessary frills, as alluring as they may be. Know if the product will work with other systems and whether you’ll need a high-priced mechanic or an editor to do the work. When you hear about trends, consider established history and experience. Data Harmony software from Access Innovations was developed for a demanding production setting, starting in 1995 and continually improving over the years. It may be eclipsed on the slick presentation front, but the software has proven it’s up to the job. Just ask any of our satisfied customers. Contact us; we’d be happy to give you the list.

Alice Redmond-Neal, Senior Taxonomist
Access Innovations

Don’t Underestimate the Value of a Taxonomy

July 25, 2014  
Posted in News, Taxonomy

EBSCO Information Services and Infotrieve have joined forces to combine their search, content access, rights management, and document delivery in a single platform. This news came to us from KM World in their article, “EBSCO and Infotrieve Partner.”

Customers will be able to use Infotrieve’s Mobile Library as their e-content access and management platform while using the search and extensive metadata from EBSCO Discovery Service (EDS).

It is important to remember the value of a solid taxonomy and its role in the search process. How the content is classified impacts the findability of your data. Professionals should look for an experienced builder of solid standards-based taxonomies to associate content for appropriate machine-assisted indexing.

Melody K. Smith


Trees, Fractals, and Taxonomies

July 21, 2014  
Posted in Access Insights, Featured, Taxonomy

Image by Solkoll,
en.wikipedia.org/wiki/Patterns_in_nature#mediaviewer/File:Dragon_trees.jpg

If you look at a branch of a typical deciduous tree, you can see that it looks like a smaller tree. Likewise, that branch branches off into smaller branches that look like even smaller trees.

This characteristic of trees is an example of what mathematicians, biologists, and systems scientists call self-similarity. Self-similar systems repeat their basic geometry at smaller and smaller scales, creating multiple miniatures of themselves at different scales. In general, natural and mathematical systems in which self-similarity results in complex and detailed patterns are referred to as fractal systems.

Many natural phenomena are or can be fractal:

snowflakes,


Photo of a 12-sided snowflake by Becky Ramotowski,
www.srh.noaa.gov/abq/?n=features_snowflake

ocean waves,


Painting by Katsushika Hokusai,
www.katsushikahokusai.org/Mount-Fuji-Seen-Below-a-Wave-at-Kanagawa.html
/CC BY-NC-ND 3.0

and even broccoli.


Photo by Jon Sullivan,
en.wikipedia.org/wiki/Romanesco_broccoli#mediaviewer/File:Fractal_Broccoli.jpg

Trees are loosely fractal. While the trunks don’t keep replicating, the branches do. As the Fractal Explorer observes:

If you don’t know anything about fractals a tree might seem as a very random object. No patterns, no rules. But if you know something about fractals and look closer you can see that basically a tree is a trunk with trees on it. That is a basic pattern that every tree follows.

Taxonomies are often described as taxonomic trees, or as having a tree-like structure. To carry the analogy further, we often refer to the progressively more specific and more numerous hierarchical subdivisions in a taxonomy as branches. The overall domain of a taxonomy, while sometimes referred to as its root, might also be viewed as its trunk.

So this raises the question: Are taxonomies fractal? As it turns out, several authors have written articles on the fractal nature of biological genus-and-species taxonomies. These articles discuss the branching characteristics of these taxonomies, the same branching characteristics that we see in taxonomies outside the realm of biological species categorization. They also discuss the mathematical tendencies of the proportions of the various branches, tendencies that could perhaps be a natural result of the degree to which things in a group need to be different before we find it appropriate to give them different names.

In recent years, interdisciplinary scientists such as Christophe Eloy have been studying the natural forces that make trees grow the way they do, and how their growth patterns might make them resilient in windstorms. Interestingly enough, these scientists have been inspired, in part, by an observation that another person with an interdisciplinary approach, Leonardo da Vinci, made 500 years ago.

As Joe Palca explains in “The Wisdom Of Trees (Leonardo Da Vinci Knew It)”:

Leonardo noticed that when trees branch, smaller branches have a precise, mathematical relationship to the branch from which they sprang. Many people have verified Leonardo’s rule, as it’s known, but no one had a good explanation for it. …

Leonardo’s rule is fairly simple, but stating it mathematically is a bit, well, complicated. Eloy did his best:

“When a mother branch branches in two daughter branches, the diameters are such that the surface areas of the two daughter branches, when they sum up, is equal to the area of the mother branch.”

Translation: The surface areas of the two daughter branches add up to the surface area of the mother branch.
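A small worked example may help here. It assumes that “area” in the rule means the cross-sectional area of each branch, which is how Leonardo’s rule is usually stated, so that a mother branch of diameter d and daughters of diameters d1 and d2 satisfy d² = d1² + d2². The function names and the sample diameter below are illustrative only.

```python
import math


def cross_section_area(diameter):
    """Area of a circular cross-section with the given diameter."""
    return math.pi * (diameter / 2) ** 2


def equal_daughter_diameter(mother_diameter, n_daughters=2):
    """Diameter of each of n equal daughter branches under Leonardo's rule.

    Assumes the daughters' cross-sectional areas sum to the mother's,
    so n * d_daughter**2 == d_mother**2.
    """
    return mother_diameter / math.sqrt(n_daughters)


mother = 10.0  # illustrative diameter, arbitrary units
daughter = equal_daughter_diameter(mother, 2)

print(f"each daughter diameter: {daughter:.3f}")
print(f"mother area:            {cross_section_area(mother):.3f}")
print(f"sum of daughter areas:  {2 * cross_section_area(daughter):.3f}")
```

Note that under this reading each of two equal daughters is about 71% as thick as the mother, not half as thick, because area grows with the square of the diameter.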

Here’s another explanation, from Esther Inglis-Arkell’s article “Scientists Still Puzzled by a Fractal Discovered 500 Years Ago”, that might be more intuitive:

Strip the leaves off of the average tree, soak the whole thing in water until it gets mushy, bundle the branches up together, and you’ll get what looks like one long trunk. That’s what Leonardo Da Vinci said in the fifteen hundreds. If a tree trunk splits off into three main branches, each of the branches will be one third the size of the trunk. When each of those branches splits into three again, making nine branches on the second ‘tier’ of the tree, each of these second tier branches will be one ninth the size of the trunk. As the branches grow and split, they will always be a particular fraction of the size of the trunk, and adding together all the fractional bits of each ‘tier’ of branches will always add up to ‘one trunk.’ This isn’t the case in all trees, but the majority hold to this pattern.

Can we gain a new perspective on taxonomies from all this? I think the lesson might have to do with scope, specificity, and detail. According to da Vinci’s observation, tree branches uniformly become ever thinner until they taper off, yet their total bulk at most levels of the tree will be approximately the same. So, in a taxonomy that grows naturally, we might expect that the terms at any given depth might be at approximately the same level of specificity. At the same time, their individual scopes at any given depth will add up to a sum total that will ideally (I think) cover the same scope as the top level of terms. As with tree branches tapering off, though, this will be less true as the taxonomy branches naturally taper off and end at the most specific levels.

Inglis-Arkell sums up with some interesting observations about the beauty of branches:

This pattern of growth has a mathematical, as well as physical, beauty. Trees are natural fractals, patterns that repeat smaller and smaller copies of themselves. Each tree branch, from the trunk to the tips, is a copy of the one that came before it. Branches split off from the highest tip the same way they do from the trunk, and set of branches splits off at the same angle to each other. Physics, math, and biology come together to create the simplest and most efficient growth pattern. It just took Leonardo Da Vinci to first notice it, the big show-off.

Barbara Gilles, Taxonomist
Access Innovations

Taxonomies Can Level the Playing Field

July 21, 2014  
Posted in indexing, metadata, News, search, semantic, Taxonomy

We already know that metadata and metatagging enable findability in search-based applications, but what happens when there are diverse formats? Search Content Management brought this thought to our attention in their article, “Metadata tagging and the innovation behind search-based applications.”

When we are looking for information within an ever-widening array of technologies, from mobile devices to the cloud, where does search technology come in? A search-based application can query a variety of structures and return the results of the query in a single, unified view. This is powerful because it encompasses all types of content.

This is where a solid taxonomy comes into play, as it can provide consistency for tags that don’t quite line up with one another. Indexing metadata against this taxonomy results in solid and comprehensive search results. Access Innovations is one of a very small number of companies able to help its clients generate ANSI/ISO/W3C-compliant taxonomies.

Melody K. Smith


The Value of a Taxonomy Tool

July 18, 2014  
Posted in indexing, News, Taxonomy

TEMIS has launched Luxid 7, the seventh generation of its flagship semantic content enrichment platform. Luxid 7 promises a beefed-up, scalable, and robust semantic enrichment pipeline and includes a dedicated ontology management tool. Broadway World Geeks brought this to our attention in their article, “TEMIS Integrates Ontology Management and Semantic Enrichment in Luxid 7.”

When building a taxonomy or ontology, you want that model to be available across the enterprise and not tied to one single program. All of this is done to make it a dynamic and comprehensive system with outstanding search results. Access Innovations provides Search Harmony to allow that same taxonomy/ontology to be used on the user search side to leverage the tagging of the documents and further enhance search results. A user can easily change the configuration of the ontology/thesaurus/taxonomy through our administrative module, so the data model retains its integrity by following the guidelines of the standards while modifications are made to meet user needs. This easily integrated feature is critical to quality, progressive search results.

Melody K. Smith

Sponsored by Data Harmony, a unit of Access Innovations, the world leader in indexing and making content findable.

Standards and Taxonomies – Match Made in Heaven

July 17, 2014  
Posted in News, Standards, Taxonomy

A recent study revealed that an open standard for fixed-income reference data would better coordinate taxonomies, even with the Enterprise Data Management (EDM) Council’s Financial Industry Business Ontology (FIBO). Waters Technology brought this news to our attention in their article, “Study Recommends Open Standard For Taxonomies.”

Standards create consistency, and in taxonomies that is key to success. A common language and classification from the beginning lead to fast and thorough results.

Proper indexing against a strong standards-based taxonomy increases the findability of data. Access Innovations is one of a very small number of companies able to help its clients generate ANSI/ISO/W3C-compliant taxonomies.

Melody K. Smith


 

Thesaurus evolution – a case study in “Synthetic biology”

July 14, 2014  
Posted in Access Insights, Featured, Taxonomy

The following post, by Rachel Drysdale, originally appeared in PLOS BLOGS on April 8, 2014.

Science does not stand still and neither does the PLOS thesaurus. With more than 10,700 Subject Area terms, we use the thesaurus to index our articles and provide useful links to related papers, enhanced search functions, and, for PLOS ONE (more than 90 articles published every day!), customizable Subject Area-based email alerts and Subject Area landing pages.

Sometimes we decide to renovate a sector of the thesaurus to better reflect the make-up of the PLOS corpus. For example, we’ve long had a Subject Area term for “Synthetic biology,” sitting beneath “Biology and life sciences.” We even have a healthy Synthetic Biology Collection. However, the Subject Area term “Synthetic biology” was being applied to only a handful of articles despite the fact that many more PLOS articles were about synthetic biology and should ideally have been indexed accordingly. Why was this?

Part of the explanation is that ‘synthetic biology’ is not a phrase that is frequently used in natural language. So whereas an article about hypertension may use the word ‘hypertension’ 26 times within the text, an article about synthetic biology might state ‘synthetic biology’ rarely, if at all. This poses a challenge to the Machine Aided Indexing process which assigns Subject Areas to articles based on the frequency of matches in the text.

The way around this is to introduce a level of abstraction to the rulebase that governs the Machine Aided Indexing. The base rules are very literal: “if I see ‘synthetic biology’ in the text I’m going to use the ‘Synthetic biology’ Subject Area term.” But there are additional words and phrases that are diagnostic of synthetic biology topics, such as “biobricks” and “Registry of Standard Biological Parts.” Adding rules for these terms – for example “if I see ‘Registry of Standard Biological Parts’ in the text I’m going to use ‘Synthetic biology’” – increases the frequency of indexing to “Synthetic biology” and thus the retrieval of relevant articles in our searches.
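As a rough illustration of the idea, and not a depiction of the actual PLOS rulebase or indexing software, the following Python sketch maps phrases found in the text to Subject Area terms and counts the matches. The rule table, the `min_hits` threshold, and the sample sentence are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical rulebase: phrase seen in the text -> Subject Area term to assign.
RULES = {
    "synthetic biology": "Synthetic biology",                      # literal base rule
    "biobricks": "Synthetic biology",                              # diagnostic phrase
    "registry of standard biological parts": "Synthetic biology",  # diagnostic phrase
    "hypertension": "Hypertension",
}


def index_text(text, rules=RULES, min_hits=1):
    """Return Subject Area terms whose rules match at least min_hits times."""
    lowered = text.lower()
    hits = Counter()
    for phrase, subject in rules.items():
        # Whole-phrase, case-insensitive matching.
        pattern = r"\b" + re.escape(phrase) + r"\b"
        hits[subject] += len(re.findall(pattern, lowered))
    return {subject: n for subject, n in hits.items() if n >= min_hits}


sample = ("We assembled the circuit from BioBricks listed in the "
          "Registry of Standard Biological Parts.")

literal_only = {"synthetic biology": "Synthetic biology"}
print(index_text(sample, rules=literal_only))  # the literal rule alone finds nothing
print(index_text(sample))                      # the added rules pick up the topic
```

The sample never uses the phrase “synthetic biology,” so only the added diagnostic rules let it be indexed to that Subject Area.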

A second factor is to do with the hierarchical structure of the thesaurus – an especially important factor given that our search functionality is designed to utilize this hierarchy. For example, a Subject search for “Vascular medicine,” beneath which Hypertension sits, retrieves articles indexed specifically with Hypertension, even if they have not been explicitly tagged with “Vascular medicine.” In earlier versions of the PLOS thesaurus “Synthetic biology” had no narrower terms, and this was doing it no favours with regard to how useful it was for retrieving relevant articles. We therefore reviewed essays about synthetic biology, scope descriptions from relevant institutional and departmental web sites, and proceedings from synthetic biology conferences, all in light of the content of our articles, and introduced new, narrower terms to sit beneath our existing “Synthetic biology” where that made sense.  So we went from having the single “Synthetic biology” term to the new structure of 30 terms in one renovation.  Here is what we have now:

[Image: the renovated “Synthetic biology” sector of the PLOS thesaurus]
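The hierarchy-aware Subject search described above can be sketched in a few lines of Python. This is only an illustration under assumed data: the two-branch hierarchy fragment and the article identifiers are made up, and the real search implementation is not described in this post.

```python
# Hypothetical fragment of the hierarchy: broader term -> narrower terms.
NARROWER = {
    "Vascular medicine": ["Hypertension"],
    "Biology and life sciences": ["Synthetic biology"],
}

# Hypothetical index: article id -> Subject Area terms assigned to it.
ARTICLE_TERMS = {
    "article-1": {"Hypertension"},
    "article-2": {"Vascular medicine"},
    "article-3": {"Synthetic biology"},
}


def expand(term, narrower=NARROWER):
    """Return the term together with all of its descendants."""
    found, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(narrower.get(current, []))
    return found


def subject_search(term, index=ARTICLE_TERMS):
    """Find articles tagged with the term or any narrower term beneath it."""
    wanted = expand(term)
    return sorted(article for article, terms in index.items() if terms & wanted)


print(subject_search("Vascular medicine"))
print(subject_search("Hypertension"))
```

A search on the broader term picks up articles tagged only with the narrower one, which is why a term with no narrower terms beneath it gains nothing from the hierarchy.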

Much of the evolution of the PLOS thesaurus is gradual, as for example when we realised that “puma” can be used as an abbreviation for “p53 upregulated modulator of apoptosis” as well as a kind of big cat, or learned that asteroids can be starfish. Dealing with these indexing missteps requires small-scale changes to specific rules. But sometimes the change needs to be more radical. Our new “Synthetic biology” sector was implemented in Ambra 2.9.12 (released March 26th, 2014). Where previously only a handful of articles was indexed with “Synthetic biology,” now a Subject search across all PLOS journals retrieves over 400 “Synthetic biology” articles – much more fitting for this important and developing field.

For more about the work PLOS is doing with Synthetic biology see “An Invitation to Contribute to the Second Life of the Synthetic Biology Collection.”
