Access Innovations, Inc. was recently engaged by the George A. Smathers Libraries to help them define processes that can be used to identify and organize content for the Portal of Florida History, including the development and application of enhanced metadata using controlled vocabulary.
A set of 25,000 digital and digitized University of Florida theses and dissertations was selected as the test content for the project. The objective was to apply enhanced subject and geographic metadata, derived using controlled vocabulary, to each publication and then identify those in which Florida itself is the subject rather than merely the place of publication. Access Innovations developed a metadata schema for the project using its XIS® (XML Intranet System), building an extended Dublin Core application to hold the metadata. Once the schema was tested and approved, Access Innovations launched a XIS® project to accommodate the full text and metadata.
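As a rough illustration of what an extended Dublin Core record could look like, the following Python sketch builds one with the standard library. The Dublin Core namespace URIs are the published ones; the `ext` namespace, the `floridaIsSubject` element, the `record` wrapper, and the sample values are hypothetical and are not taken from the actual XIS® schema.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"   # Dublin Core elements namespace
DCTERMS = "http://purl.org/dc/terms/"     # Dublin Core terms namespace
EXT = "http://example.org/portal-ext/"    # hypothetical extension namespace

ET.register_namespace("dc", DC)
ET.register_namespace("dcterms", DCTERMS)
ET.register_namespace("ext", EXT)


def build_record(title, subjects, places, florida_is_subject):
    """Build one metadata record as an XML element.

    subjects: controlled-vocabulary subject terms
    places: geographic terms
    florida_is_subject: True when Florida is the topic, not just the origin
    """
    record = ET.Element("record")  # hypothetical wrapper element
    ET.SubElement(record, f"{{{DC}}}title").text = title
    for term in subjects:
        ET.SubElement(record, f"{{{DC}}}subject").text = term
    for place in places:
        ET.SubElement(record, f"{{{DCTERMS}}}spatial").text = place
    # Hypothetical extension field carrying the subject-versus-source decision.
    flag = ET.SubElement(record, f"{{{EXT}}}floridaIsSubject")
    flag.text = "true" if florida_is_subject else "false"
    return record


example = build_record(
    title="Hydrology of a Subtropical Wetland",  # made-up sample title
    subjects=["Wetlands", "Hydrology"],
    places=["Everglades"],
    florida_is_subject=True,
)
xml_text = ET.tostring(example, encoding="unicode")
```

Keeping the Florida flag in a separate namespace leaves the core Dublin Core elements untouched, so tools that only understand plain Dublin Core could still read the record.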
After testing several thesauri, Access Innovations and the Libraries selected the JSTOR thesaurus as the controlled subject vocabulary for the pilot project. Access Innovations extracted subject metadata from the full text of the UF theses and dissertations using a MAIstro™ instance and deposited the full metadata record along with the full text into a XIS® project. In addition, Access Innovations extracted a set of “Florida-specific terms” to be used to identify candidate theses and dissertations for inclusion in the Portal of Florida History. The resulting taxonomy will include Florida place names, notable people, and other terms indicative of Floridian content, and it can be extended for use with print collections.
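The idea of using Florida-specific terms to flag candidate documents can be sketched in a few lines of Python. This is a simplified illustration, not the MAIstro™ indexing method: the term list, the function names, and the threshold of two distinct terms are all assumptions made for the example.

```python
import re

# Hypothetical sample of Florida-specific terms (place names, people, etc.).
FLORIDA_TERMS = ["Everglades", "Tallahassee", "St. Johns River", "Seminole"]


def find_florida_terms(text, terms=FLORIDA_TERMS):
    """Return a dict mapping each matched term to its occurrence count.

    Matching is case-insensitive and anchored so that a term is not
    matched inside a longer word.
    """
    hits = {}
    for term in terms:
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        count = len(re.findall(pattern, text, flags=re.IGNORECASE))
        if count:
            hits[term] = count
    return hits


def is_candidate(text, min_distinct=2):
    """Flag a document when it mentions enough distinct Florida terms.

    min_distinct is an assumed threshold chosen only for illustration.
    """
    return len(find_florida_terms(text)) >= min_distinct


about_florida = "Water flow in the Everglades and the St. Johns River; Everglades restoration."
not_about_florida = "A thesis submitted in Gainesville on lattice gauge theory."
```

Here `is_candidate(about_florida)` is true and `is_candidate(not_about_florida)` is false, which mirrors the distinction between Florida as a subject and Florida as the place a thesis was written. A match only marks a document as a candidate; it does not establish that Florida is its subject.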
“Recent large-scale initiatives have focused on the need for significantly expanded and enhanced metadata for our digital collections, both retrospective and prospective,” said Judith Russell, Dean of University Libraries at the University of Florida. “We intend to apply the automated tools and the techniques learned from this initial project with Access Innovations to other digital collections.”
“This has been an intriguing challenge for us,” said Marjorie M.K. Hlava, president of Access Innovations. “This is an opportunity to apply subject metadata from deep thesauri to library content, improving access and discoverability of collections far beyond what the current cataloging process allows.”
Access Innovations looks forward to the next steps in this project with the University of Florida Libraries.
Founded in 1978, Access Innovations has extensive experience with Internet technology applications, master data management, database creation, thesaurus/taxonomy creation, and semantic integration. Access Innovations’ Data Harmony software includes automatic indexing, thesaurus management, an XML Intranet System (XIS), and metadata extraction for content creation developed to meet production environment needs. Data Harmony is used by publishers, governments, and corporate clients throughout the world.