To better understand the complexities of life science data, one company is transforming unstructured text into rich, contextualized machine-readable content with ontologies. Bio-IT World brought this news to us in their article, “A New Machine Learning Approach To Document Classification – A Pfizer/SciBite Collaboration.”

SciBite leverages ontologies to enable computers to understand millions of concepts relevant to life science. Referred to as VOCabs, these hand-curated ontologies cover over 100 concept types. At the heart of these technologies sits TERMite (TERM identification, tagging & extraction), a named entity recognition (NER) and extraction engine.

Knowledge transfer is crucial to successfully integrating external research projects or commercial acquisitions into an organization. Much of this knowledge resides in a myriad of free-text documents that must be cataloged and integrated with internal data management systems. This is where the ontologies become even more useful.

The implications of this work are impressive. It is part of a pioneering effort to use advanced machine learning and natural language processing to make strategic business investments more efficient.

Melody K. Smith

Sponsored by Data Harmony, a unit of Access Innovations, the world leader in indexing and making content findable.