The New Search

July 8, 2014  
Posted in News, search, semantic

Digital marketing is changing in every industry, but none more than the insurance industry. There are now tactics more appropriate and effective than the search engine optimization and keyword linking of days gone by. Google has made that very clear. Using proper keywords and acquiring backlinks is still a relevant strategy for ranking your insurance agency or carrier website in search engines, but the volume of keywords and the accumulation of backlinks must look natural.

This change is being driven by semantic search. This kind of search can make associations between things in ways that come closer to how we humans make such connections, so search results that provide resources more relevant to our actual needs are of higher value than link-driven results. This interesting information came from Small Business Trends in their article, “How Semantic Search is Changing Insurance Industry Digital Marketing.”

Melody K. Smith

Sponsored by Data Harmony, a unit of Access Innovations, the world leader in indexing and making content findable.

The Science of Search

July 3, 2014  
Posted in News, search, semantic

Best known as a method used by technical writers, structured authoring is fast becoming the preferred choice for enterprise-level content production. This interesting information came from ClickZ in their article, “Structured Authoring: The New Normal in Content Production.”

The science of information retrieval and processing is ever changing. The evolution is evidenced by last fall’s Hummingbird update and the changes at Schema.org. Adjusting to semantic search is key if marketers want to keep up with this evolutionary process. Consider it much like the impact the printing press had when technology hit the print world.

To be most effective, modern content needs to be machine-readable for semantic technologies. It needs to be responsive to users at ever-more-granular levels and easily deployed on mobile and stationary devices alike.

Melody K. Smith

Sponsored by Data Harmony, a unit of Access Innovations, the world leader in indexing and making content findable.

The Time of Data

June 27, 2014  
Posted in metadata, News, semantic

Informatica has announced its Intelligent Data Platform that is designed to provide “the right data at the right time.” Sci-Tech Today brought us this information in their article, “Informatica Intros Its Planned Intelligent Data Platform.”

There are three components to Informatica’s Intelligent Data Platform. A data intelligence layer delivers self-service data for businesses, collecting metadata, semantic data and usage information. A second component, the infrastructure, offers clean and connected data. And a data engine aggregates and manages data.

Not yet a completed product, the platform is being developed as “a combination of existing Informatica platform capabilities and new product initiatives, some of which are in early beta testing.” Apparently some platform capabilities will be available as packaged offerings and reference architecture by the end of this year.

Melody K. Smith

Sponsored by Access Innovations, the world leader in taxonomies, metadata, and semantic enrichment to make your content findable.

Rule Base Solutions

People often ask us how much time it takes to manage a rule base with Data Harmony software. We reply with specific customer experience: a few hours per month of editorial time to maintain both the thesaurus and the rule base. One customer of ours, the American Institute of Physics, found that maintaining their thesaurus and rule base takes less than 15 hours per month for a throughput of 2,000 articles per week. Another customer, The Weather Channel, manages breaking news all day long with four hours per month of maintenance. It takes the editorial team just a few hours per month to keep up with the changing trends and events within their field and transfer those into the organizational knowledge base represented by the M.A.I.™ rule base. This small investment gives the organization the highest level of accuracy in coding (usually well over 90% hits without human intervention), and it also supports analysis of trends in the business, the creation of author profiles, semantic fingerprints of the entire organizational holdings, and extraction of real meaning from all the data. Other customers, such as IEEE and the US GAO, find the accuracy of their Data Harmony implementations so high that they now only sample the data periodically to glean new terms and trends; they do not see the need to review every single item.

The real question, though, is one of control. If a rule-based solution maintained by the editorial staff is the approach taken, then full control remains with the editorial department. If a programmatic learning system – the seductive call of the purely automatic system – is the choice, then oversight either remains with the vendor or moves to the IT (information technology) department. The lower accuracy of the indexing returns (usually in the 60% range) means much more time spent by the editorial department producing taxonomy-tagged items. Time that would have been spent improving the knowledge base is instead spent processing records, because of the lower accuracy.

Here’s an example: let’s assume 1,000 articles per month. Comparing 90% accuracy with 60% accuracy, how much extra production time is involved? Let’s also suppose, for easy calculation, that there are 10 terms per article. If our rule base indexing is 90% accurate, then only one term per article will need to be reviewed, researched, and replaced or discarded. If alternative indexing methods produce 60% accuracy, then there are four terms per record to research, replace, or discard. The time to research a term and decide on its disposition is conservatively two minutes. So two minutes per term, at one term per article, is just 33.3 hours per month. But if four terms per article (60% accuracy) need reviewing, then 133.3 editorial hours per month are needed – obviously, four times the effort. Moreover, the rule base improves over time with this small editorial input, so the maintenance time continues to decrease.
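For readers who want to check the arithmetic, here is a minimal sketch of the calculation, using the hypothetical figures from the example above rather than measurements from any particular installation:

```python
# Minimal sketch of the review-time arithmetic above. The figures
# (1,000 articles, 10 terms each, 2 minutes per reviewed term) are the
# hypothetical ones from the example, not a Data Harmony specification.

def monthly_review_hours(articles_per_month, terms_per_article,
                         accuracy, minutes_per_term=2):
    """Estimate editorial hours spent reviewing incorrectly assigned terms."""
    wrong_terms_per_article = terms_per_article * (1 - accuracy)
    total_minutes = articles_per_month * wrong_terms_per_article * minutes_per_term
    return total_minutes / 60

rule_based = monthly_review_hours(1000, 10, 0.90)   # ~33.3 hours
statistical = monthly_review_hours(1000, 10, 0.60)  # ~133.3 hours
print(f"90% accuracy: {rule_based:.1f} hours/month")
print(f"60% accuracy: {statistical:.1f} hours/month")
```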

A statistical approach can appear to be a gift on a silver platter, but beware – such an approach means more time spent on production, less on building a knowledge base, lower accuracy, higher throughput costs, and no chance to learn about the data through semantic fingerprinting. To make matters even more frustrating, you have little control of the system. It has to be improved and worked on by the vendor or the IT department. New terms require a full revamping of the system each time, resulting in costly delays, rather than the real-time, instant updates that a system based on Java object-oriented programming allows. As a result, the taxonomy is not responsive to the organization’s data.

It is tempting to think that content can be classified without a vetted taxonomy properly applied, or that the taxonomy provides only a convenient file folder naming convention. Unfortunately, the cost of that choice is high. With a statistical system, accuracy is lower, throughput is slower, and the clerical aspect of the indexing process increases. In addition, control is no longer with the editorial department, but shifts to IT and the vendor. The power dynamic of the choice is clear: IT versus editorial. Who do you want to be in control of your indexing?

Marjorie M.K. Hlava
President, Access Innovations

Data Harmony Version 3.9 Includes MAI Batch GUI – A New Interface For M.A.I.™ (Machine Aided Indexer) and MAIstro™ Modules

June 16, 2014  
Posted in Access Insights, Featured, metadata, semantic

Access Innovations, Inc. has announced the inclusion of the MAI Batch Graphical User Interface (GUI) as part of the recent Data Harmony Version 3.9 software update release. MAI Batch GUI is a new interface for running a full directory of files through the M.A.I. Concept Extractor, enabling large amounts of text to be processed with a single command. Usually used with legacy or archival files, it allows complete semantic enrichment of entire back files in a short time. Once a batch has run, terms from the thesaurus or taxonomy become part of each record.
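Conceptually, a batch run over a directory amounts to something like the following rough sketch (not Data Harmony code); the index_document() call is a hypothetical stand-in for the M.A.I. Concept Extractor, whose real invocation is handled by MAI Batch GUI or the older command-line interface:

```python
# Conceptual sketch only: what a "full directory in one pass" batch run looks
# like in ordinary Python. index_document() is a hypothetical placeholder for
# the M.A.I. Concept Extractor call.
from pathlib import Path

SUPPORTED = {".pdf", ".doc", ".htm", ".html", ".rtf", ".xml"}

def run_batch(input_dir, index_document):
    """Apply an indexing function to every supported file under a directory."""
    results = {}
    for path in Path(input_dir).rglob("*"):
        if path.suffix.lower() in SUPPORTED:
            results[path.name] = index_document(path)  # suggested terms per file
    return results
```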

“For Data Harmony Version 3.9, we decided to add the interface to the MAIstro and M.A.I. modules to allow use directly from the desktop, giving more power to the user,” remarked Marjorie M. K. Hlava, President of Access Innovations, Inc. “It’s a fast, easy way to perform machine-aided indexing on batches of documents, without any need for command-line instructions.”

“M.A.I.’s batch-indexing capability has been in place for years via command line interface,” noted Bob Kasenchak, Production Manager at Access Innovations. “This new GUI makes it really easy to use. Customers only need to open ‘MAI Batch app’ in their Data Harmony Administrative Module, choose the files or directories to process, and submit the job.”

The purpose of MAI Batch is to provide immediate processing of data files on demand. MAI Batch can be deployed to achieve rapid subject indexing of legacy text collections.

MAI Batch GUI offers semantic enrichment by extracting concepts from input text in most file formats, including the following:

  • Adobe PDFs
  • MS Word DOC files
  • HTM/HTML pages
  • RTF documents
  • XML files

For XML files, the ‘XML Tags’ option permits users to define specific XML elements for MAI Batch GUI to analyze during batch processing. This option opens the door for indexing source documents that are tagged according to different XML schemas. XML Tags also permits the exclusion during indexing of sections in the document structure, as designated by the user.
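To illustrate the idea, the following sketch approximates element selection with the Python standard library. The element names are hypothetical, and the actual XML Tags option is configured through the MAI Batch GUI rather than through code like this:

```python
# Illustrative only: an approximation, in standard-library Python, of restricting
# indexing input to selected XML elements and skipping excluded sections.
# The element names (title, abstract, body, references) are hypothetical.
import xml.etree.ElementTree as ET

INCLUDE_TAGS = {"title", "abstract", "body"}   # elements whose text is analyzed
EXCLUDE_TAGS = {"references"}                  # subtrees left out of indexing

def collect_text(element, chunks):
    """Walk the tree, gathering text from included elements."""
    if element.tag in EXCLUDE_TAGS:
        return                                 # skip this element and its children
    if element.tag in INCLUDE_TAGS and element.text:
        chunks.append(element.text.strip())
    for child in element:
        collect_text(child, chunks)

def extract_indexable_text(xml_path):
    chunks = []
    collect_text(ET.parse(xml_path).getroot(), chunks)
    return "\n".join(chunks)
```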

The interface’s Input and Output panes present a practical view of the batch during processing, enabling a degree of interactivity – M.A.I. is a very accessible automatic indexing system, and it remains a ‘machine-aided’ software approach even when applied to batches of documents. IT support can be helpful but is not required to run and maintain the Data Harmony Suite of products.

When the documents already contain indexing terms, MAI Batch GUI will derive accuracy statistics for the batch and include them in the output, comparing M.A.I.’s suggested terms from Concept Extractor with the previously applied subject terms. This powerful method for enhancing the accuracy of subject indexing is based on reports generated by the M.A.I. Statistics Collector, giving a taxonomy administrator all the data needed to continually improve the results based on the system’s recommendations, selections, and additions.
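At its core, that comparison is a per-batch tally of suggested terms against the terms already on each record. Here is a rough sketch of the kind of calculation involved; the simple hit-rate measure and the sample data are assumptions, not the Statistics Collector’s actual report format:

```python
# Sketch of a per-batch comparison between machine-suggested terms and
# previously applied subject terms. The hit-rate measure is an assumption
# for illustration only.

def batch_hit_rate(documents):
    """documents: iterable of (suggested_terms, existing_terms) pairs."""
    hits = total = 0
    for suggested, existing in documents:
        existing_set = set(existing)
        hits += sum(1 for term in suggested if term in existing_set)
        total += len(suggested)
    return hits / total if total else 0.0

batch = [
    (["lasers", "optics", "photonics"], ["lasers", "optics", "fiber optics"]),
    (["plasma physics", "fusion"], ["plasma physics", "fusion", "tokamaks"]),
]
print(f"Suggested-term hit rate: {batch_hit_rate(batch):.0%}")  # 80%
```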

About Access Innovations, Inc. – www.accessinn.com, www.dataharmony.com, www.taxodiary.com

Founded in 1978, Access Innovations has leveraged semantic enrichment of text for internet technology applications, master data management, database creation, thesaurus/taxonomy creation, and semantic integration. Access Innovations’ Data Harmony software includes machine aided indexing, thesaurus management, an XML Intranet System (XIS), and metadata extraction for content creation developed to meet production environment needs.  Data Harmony is used by publishers, governments, and corporate clients throughout the world.

Preventing Damage Before it Starts

June 12, 2014  
Posted in News, semantic

Malware is the bane of any user’s existence. It harms everyone, from the most naive user to the most competent tech guru.

Danfeng (Daphne) Yao, associate professor of computer science at Virginia Tech College of Engineering, and colleagues have developed the first workable proactive system to detect malware in individual computers or in networks before any damage can be done. Examiner brought this news to us in their article, “Malware protection based on semantics proves successful.”

The problem with the majority of present malware detection systems is that they work in hindsight. The new program is the first malware protection program that can actively detect malware before it installs itself, and it can also prevent an infected computer from contaminating the rest of a network. Using the history of the user and the network, the semantics-based program determines whether there is a basis for a relationship and whether the activity is foreign and potentially harmful. Way to go, semantics – and Dr. Yao.

Melody K. Smith

Sponsored by Data Harmony, a unit of Access Innovations, the world leader in indexing and making content findable.

 

Marco? Polo!

June 9, 2014  
Posted in News, search, semantic

Making content findable by users has become easier and more intuitive through semantic search, a technique used to determine the actual intent and contextual meaning of the keywords that a person types into a search engine. It works on the principles of language and is based on the context, substance, intent, and concept of the search phrase. Semantic search can also incorporate location, synonyms of a term, current trends, and other natural language elements as part of the search. This interesting topic came from the Colorado CEO blog and their post, “The Semantic Search and How Search is Changing.”

Semantic search is a comprehensive and powerful technology that can return exact results based on the context, content, and user intent. Consumers benefit in this new world of search.

Semantic technology requires a special knowledge of terminology and coding to reduce errors. Access Innovations, developer of the M.A.I. machine-assisted indexing system and specializing in complex coding, tagging, and indexing, provides a range of services that deliver tag integrity.

Melody K. Smith

Sponsored by Access Innovations, the world leader in thesaurus, ontology, and taxonomy creation and metadata application.

Semantic Technology Gets Real

June 4, 2014  
Posted in News, semantic

The possibilities and limitations of semantic search continue to be evaluated as more and more enterprises and applications utilize the technology. Semantic search can be challenging to understand, let alone implement. If all your experience of search engine use has been the Boolean search of the past, then this is a new world for you to explore.

The good news is that while semantic search may be difficult to implement at the Google level, for the end user it has actually become a little easier. This topic was inspired by the post on David Amerland’s blog, “Semantic Search – Three Basic Principles You Need to Know About.”

Semantic technology requires a special knowledge of terminology and coding to reduce errors. Access Innovations, developer of the M.A.I. machine-assisted indexing system and specializing in complex coding, tagging, and indexing, provides a range of services that deliver tag integrity.

Melody K. Smith

Sponsored by Access Innovations, the world leader in thesaurus, ontology, and taxonomy creation and metadata application.

Data Everywhere

June 3, 2014  
Posted in metadata, News, semantic

We talk a lot about big data. It is important, but what about smart data? Adding more data isn’t the solution to all problems. Getting value from data comes down to how easily you can do something with it. This interesting perspective came from Technical.ly in their article, “Forget big data, here comes ‘smart data’: Semantic Web Meetup.”

Smart data is data with semantics attached. It is also about adding value to previously overlooked resources. Many pieces of metadata might seem unimportant, but if known and used they will improve search results. For example, knowledge of a document’s authorship can be used to calculate its relevance and importance when ranking results.
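As a toy illustration of that authorship example (the weight and the “known author” signal are invented for the purpose, not taken from any particular search engine), a ranking function might fold such metadata into an existing relevance score:

```python
# Toy illustration: a base text-relevance score is adjusted by authorship
# metadata that might otherwise be ignored. Weights and fields are invented.

def ranked(results, author_weight=0.2):
    """results: list of dicts with 'score' (text relevance, 0-1) and
    'author_known' (whether authorship metadata links the document to a
    recognized author in the field)."""
    def combined(doc):
        boost = author_weight if doc.get("author_known") else 0.0
        return doc["score"] + boost
    return sorted(results, key=combined, reverse=True)
```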

Even with semantic technology involved, information management for any type of business is critical for fast, easy, and comprehensive findability. One key way to ensure this is through a solid taxonomy, based on standards, built by someone with years of experience in the field.

Melody K. Smith

Sponsored by Access Innovations, the world leader in taxonomies, metadata, and semantic enrichment to make your content findable.

Access Innovations, Inc. Announces Release of the Semantic Fingerprinting Web Service Extension for Data Harmony Version 3.9

June 2, 2014  
Posted in Access Insights, Featured, semantic

Access Innovations, Inc. announces the Semantic Fingerprinting Web service extension as part of their Data Harmony Version 3.9 release. Semantic Fingerprinting is a managed Web service offered to scholarly publishers to disambiguate author names and affiliations by leveraging semantic metadata within an existing publishing pipeline.

The Semantic Fingerprinting Web service data mines a publisher’s document collection to build a database of named authors and affiliated institutions, and then expands the database over time with customization and administration services provided by Access Innovations during configuration. The author/affiliation database powers M.A.I.™ (Machine Aided Indexer) algorithms for matching names in new content received from contributors. During the configuration phase, an essential component is the graphical user interface (GUI) where users disambiguate unmatched names using clues that M.A.I. surfaces as a result of rigorous document analysis.

“Like a fingerprint, each author has a unique ‘semantic profile’ that captures the specific disciplines and topic areas in which they publish – reflecting subject areas covered in their body of research. Data Harmony generates subject keywords that describe the document’s content, to increase the number of author name matches a reviewer can find during editorial review of unresolved names,” explained Kirk Sanders, Access Innovations Taxonomist and Data Harmony Technical Editor.
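As a simplified illustration of that idea (not Data Harmony’s patented algorithms), a matching step might combine name similarity with the overlap between a new document’s subject keywords and each candidate author’s accumulated profile; the weights, threshold, and data layout below are assumptions for the sketch:

```python
# Simplified sketch of the "semantic fingerprint" idea: candidate author
# records are scored by combining name similarity with subject-keyword
# overlap against each author's profile. All weights and field names are
# illustrative assumptions.
from difflib import SequenceMatcher

def match_score(new_name, new_keywords, candidate):
    """candidate: dict with 'name' and 'keywords' from the author database."""
    name_sim = SequenceMatcher(None, new_name.lower(),
                               candidate["name"].lower()).ratio()
    profile = set(candidate["keywords"])
    overlap = len(set(new_keywords) & profile) / len(profile) if profile else 0.0
    return 0.6 * name_sim + 0.4 * overlap      # illustrative weighting

def best_match(new_name, new_keywords, candidates, threshold=0.75):
    """Return the strongest candidate, or None to route the name to review."""
    best, best_score = None, 0.0
    for candidate in candidates:
        score = match_score(new_name, new_keywords, candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best if best_score >= threshold else None
```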

“Semantic Fingerprinting is a versatile addition to the Data Harmony software lineup,” said Marjorie M. K. Hlava, President of Access Innovations, Inc. “Publishers can incorporate Semantic Fingerprinting to build each author’s profile, precisely reflecting that person’s research and publication achievements and institutional affiliations – all driven by information that’s already moving through the pipeline. It’s an elegant approach to data-mining a document stream for highly practical purposes, an approach presenting immediate benefits for the scholarly publisher.”

“Semantic Fingerprinting is driven by patented natural language processing algorithms,” responded Bob Kasenchak, Production Manager at Access Innovations, when asked to comment on the module’s inclusion in the Version 3.9 software update release. “The Web service enables a publisher to move far beyond adding subject metadata in their pipeline by supplementing it with the author’s research profile. This module and the process also offer a new way to improve precise document search and retrieval. Enhancements to document metadata also present opportunities to support other functions related to marketing or assigning appropriate peer reviewers.”

The Semantic Fingerprinting extension from Data Harmony 3.9 is a Web service (managed by Access Innovations) that relates terms from a publisher’s controlled vocabulary (a taxonomy or thesaurus) to the contributing authors, their affiliated institutions, and other relevant metadata information. Software components such as the user interfaces and entity-matching algorithms are adjustable, because every data set needs a targeted approach. As more data is processed by the matching algorithms and/or human editors, the name authority file and other processes require routine monitoring and adjustments. In many cases, suggestions for adjustments will come from human editors, based on questionable entities that they resolve by searching the name authority file in the Semantic Fingerprinting interface.

About Access Innovations, Inc. – www.accessinn.com, www.dataharmony.com, www.taxodiary.com

Founded in 1978, Access Innovations has extensive experience with Internet technology applications, master data management, database creation, thesaurus/taxonomy creation, and semantic integration. Access Innovations’ Data Harmony software includes machine aided indexing, thesaurus management, an XML Intranet System (XIS), and metadata extraction for content creation developed to meet production environment needs.  Data Harmony is used by publishers, governments, and corporate clients throughout the world.
