Semantic enhancement extends beyond journal article indexing, though helping users easily find all the relevant articles in a search remains its central purpose. In addition to articles, semantic fingerprinting is now used to identify and cluster ancillary published resources, media, events, authors, members or subscribers, and industry experts.
The systems you choose to enhance the value of your assets, and the people behind them, are extraordinarily important.
It starts with a profile of your electronic collection. It may include a profile of your organization as well. As you choose the concepts that represent the areas of research today and in the past, the ideas and thoughts of your most articulate representatives, and the emerging methods and technologies, you bring together a picture of the overall effort. This can be done with a thesaurus: a taxonomy, or organized list of terms representing those concepts, enhanced with relationship links between terms.
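To make the idea of a taxonomy enhanced with relationship links concrete, here is a minimal sketch in Python. It is an illustration only, not the data model of any particular product: the class names, method names, and sample terms are all assumptions made for this example. The relationship types (broader, narrower, related, and non-preferred synonyms) follow the conventions commonly used in thesaurus standards.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Term:
    """One preferred term and its relationship links (hypothetical model)."""
    label: str
    broader: set[str] = field(default_factory=set)    # parent concepts
    narrower: set[str] = field(default_factory=set)   # child concepts
    related: set[str] = field(default_factory=set)    # associative links
    use_for: set[str] = field(default_factory=set)    # non-preferred synonyms


class Thesaurus:
    def __init__(self):
        self.terms = {}  # label -> Term

    def add_term(self, label):
        # Create the term on first use, otherwise return the existing one.
        return self.terms.setdefault(label, Term(label))

    def add_hierarchy(self, broader, narrower):
        # Hierarchical links are stored in both directions.
        self.add_term(broader).narrower.add(narrower)
        self.add_term(narrower).broader.add(broader)

    def add_related(self, a, b):
        # Associative links are symmetric.
        self.add_term(a).related.add(b)
        self.add_term(b).related.add(a)

    def add_synonym(self, preferred, variant):
        self.add_term(preferred).use_for.add(variant)

    def preferred(self, label):
        """Resolve a label or synonym to its preferred term, or None."""
        if label in self.terms:
            return label
        for term in self.terms.values():
            if label in term.use_for:
                return term.label
        return None

    def ancestors(self, label):
        """Walk the broader-term links upward, nearest ancestors first."""
        found = []
        queue = sorted(self.terms[label].broader)
        while queue:
            current = queue.pop(0)
            if current not in found:
                found.append(current)
                queue.extend(sorted(self.terms[current].broader))
        return found


# Sample terms invented for the example.
thesaurus = Thesaurus()
thesaurus.add_hierarchy("Information science", "Indexing")
thesaurus.add_hierarchy("Indexing", "Machine-aided indexing")
thesaurus.add_related("Indexing", "Taxonomies")
thesaurus.add_synonym("Machine-aided indexing", "MAI")

print(thesaurus.preferred("MAI"))
print(thesaurus.ancestors("Machine-aided indexing"))
```

The hierarchy alone would be a taxonomy; it is the related-term and synonym links that turn it into a thesaurus, letting a search for a variant such as "MAI" land on the preferred term and its neighbors.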
Human intelligence is still the most powerful engine driving the development and maintenance of this lexicographic profile. Technology tools help with the content mining, frequency analyses, and other measures valuable to a taxonomist, but the organization, concept expressions, and relationship building are still best done by humans.
Similarly, the application of the thesaurus is best done by humans. Because of the volume of content items created every day, however, it may not be possible for human indexers to review each one. Our automated systems can match 90% of what a human indexer would choose, but that still leaves a gap, so high-value content is still indexed by humans. While this is accomplished much more efficiently than in the past, it still must be done by humans. The rest of the content also depends on humans, who inform the algorithm in actual natural language. Fully enabled, the automated system produces impressive precision in identifying the aboutness of a piece of content.
How can a system achieve accuracy and consistency? Our approach is to reflect the reasoning process of humans, using a set of rules. Our rule base is simple to enhance and simple to maintain and, like the thesaurus, flexible enough to accommodate new terminology in a discipline as it evolves. About 80% of the rules work well just as initially created. The other 20% achieve better precision when touched by a human who adds conditions to limit, broaden, or disambiguate the use of the term triggering the rule.
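The split between simple rules and rules refined with conditions can be sketched as follows. This is a hypothetical illustration in Python, not the actual rule syntax or engine of any product: the `Rule` fields, the `apply_rules` function, and the sample rules are all assumptions made for this example. A rule with no conditions fires whenever its trigger text appears; a refined rule adds conditions that limit, broaden, or disambiguate it.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    """A hypothetical indexing rule: trigger text mapped to a thesaurus term."""
    trigger: str              # text that fires the rule
    term: str                 # term assigned when the rule fires
    require_any: tuple = ()   # limit/disambiguate: one of these must also appear
    exclude_any: tuple = ()   # limit: none of these may appear
    also_match: tuple = ()    # broaden: extra text that fires the same rule


def contains(text, phrase):
    # Whole-word, case-insensitive match.
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None


def apply_rules(text, rules):
    """Return the terms assigned to the text, in rule order, without repeats."""
    assigned = []
    for rule in rules:
        triggers = (rule.trigger,) + tuple(rule.also_match)
        if not any(contains(text, t) for t in triggers):
            continue
        if rule.require_any and not any(contains(text, w) for w in rule.require_any):
            continue
        if any(contains(text, w) for w in rule.exclude_any):
            continue
        if rule.term not in assigned:
            assigned.append(rule.term)
    return assigned


# Sample rules invented for the example.
rules = [
    # A simple rule that works as initially created.
    Rule("taxonomy", "Taxonomies", also_match=("taxonomies",)),
    # Two refined rules that disambiguate the same trigger word.
    Rule("mercury", "Mercury (planet)", require_any=("orbit", "planet", "solar")),
    Rule("mercury", "Mercury (element)", require_any=("toxic", "thermometer", "metal")),
]

sentence = "Mercury has the shortest orbit of any planet in the solar system."
print(apply_rules(sentence, rules))
```

Because each condition is plain data attached to a rule, a human editor can tighten or loosen a single rule without touching the others, which is one way to keep a rule base simple to maintain as terminology changes.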
Metadata makes digital content findable. However, findability works only when a proper taxonomy is in place. Proper indexing against a strong standards-based taxonomy increases the findability of data. Access Innovations is one of a very small number of companies able to help its clients generate ANSI/ISO/W3C-compliant taxonomies.
Data Harmony is a fully customizable suite of software products designed to maximize precise and efficient information management and retrieval. Our suite includes tools for taxonomy and thesaurus construction, machine-aided indexing, database management, information retrieval, and explainable artificial intelligence.
Melody K. Smith
Sponsored by Access Innovations, the intelligence and the technology behind world-class explainable AI solutions.