Access Innovations, a leading data management firm, has partnered with AIP Publishing LLC, one of the world’s largest physical science publishers, to produce a comprehensive list of about 980,000 discrete academic authors and 33,000 institutions involved in publishing physics papers worldwide. The complete dataset contains more than 800,000 articles, the earliest dating back to the early 20th century.
Marjorie M.K. Hlava, president of Access Innovations, said the project provides a basis for reliable, accurate retrieval of records by author name.
Access Innovations used the peripheral metadata about each author in a sophisticated disambiguation process that took into account dates, affiliations, and subject areas to determine which name variants referred to the same author. The data was then subjected to human editorial review to refine the process.
Each author or affiliation record includes a full list of the available DOIs (digital object identifiers) for every article from that source in the AIP collection. The result is a database of publishing physicists complete with a record of affiliations, areas of expertise, papers published, and co-authors.
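As a rough illustration of the record described above, an author entry could be modeled as follows. This is a minimal sketch, not the actual schema: the class name, the field names, and the sample DOI are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AuthorRecord:
    """Hypothetical author record; field names are illustrative assumptions."""

    name: str
    affiliations: set[str] = field(default_factory=set)
    subject_areas: set[str] = field(default_factory=set)
    dois: list[str] = field(default_factory=list)
    coauthors: set[str] = field(default_factory=set)

    def add_article(self, doi, affiliation, subjects, coauthors):
        """Attach one article's metadata to this author."""
        if doi not in self.dois:  # keep each DOI only once
            self.dois.append(doi)
        self.affiliations.add(affiliation)
        self.subject_areas.update(subjects)
        # An author is not their own co-author.
        self.coauthors.update(c for c in coauthors if c != self.name)


# Usage with made-up values (the DOI below is a placeholder, not a real one).
record = AuthorRecord("R. Smith")
record.add_article(
    "10.0000/example.1", "Example University", ["optics"], ["R. Smith", "J. Doe"]
)
```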
The original list of 5.5 million potential names resulted in about 980,000 unique author records. “The first challenge was to figure out who was who,” noted Bob Kasenchak, project manager for Access Innovations. “I’m sure you can imagine just how many people named ‘Robert Smith’ have published an article in a physics journal in the past 100 years. We found over 6,000 authors with the surname ‘Wang.’ In addition, one journal might only use an author’s first initial. Another journal may include an author’s middle initial. We used all of the information and resources available to us in order to better determine unique identities.”
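The name-variant problem Kasenchak describes can be sketched in a few lines. The following is only an illustration under stated assumptions, not the method Access Innovations used: it assumes names arrive as “Surname, Given” strings, treats an initial as compatible with any given name starting with that letter, and then requires at least one shared affiliation or subject before calling two entries the same person. All function names and dictionary keys are hypothetical.

```python
def parse_name(raw):
    """Split 'Surname, Given M.' into (surname, [given tokens]), lowercased."""
    surname, _, given = raw.partition(",")
    tokens = given.replace(".", " ").lower().split()
    return surname.strip().lower(), tokens


def names_compatible(a, b):
    """True if two name strings could plausibly denote the same person."""
    surname_a, given_a = parse_name(a)
    surname_b, given_b = parse_name(b)
    if surname_a != surname_b:
        return False
    # Compare given-name tokens position by position; a single letter
    # is treated as an initial that matches any name starting with it.
    for x, y in zip(given_a, given_b):
        if len(x) == 1 or len(y) == 1:
            if x[0] != y[0]:
                return False
        elif x != y:
            return False
    return True


def likely_same_author(a, b, min_shared=1):
    """a, b: dicts with hypothetical keys 'name', 'affiliations', 'subjects'."""
    if not names_compatible(a["name"], b["name"]):
        return False
    shared = len(a["affiliations"] & b["affiliations"])
    shared += len(a["subjects"] & b["subjects"])
    return shared >= min_shared
```

A compatible name alone is deliberately not enough here: “Smith, R.” matches both “Smith, Robert” and “Smith, Richard”, so the supporting metadata has to break the tie, and borderline cases would still need the kind of human review the project relied on.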
“Because we were able to add subject information and keywords from each individual author’s article, we’re able to create conceptual profiles of an author’s area of expertise,” Hlava explained. “This project is not only a vast improvement to the interconnectivity of the physics publishing community, but it will also better facilitate networking among colleagues.”
Institution records underwent the same rigorous disambiguation process. From the original set of more than 800,000 articles, about 33,000 universities, laboratories, and other research institutions emerged. “Not only are there many ways of writing ‘Oxford University,’ we also had to contend with colleges that merged, disappeared, or changed names, or which are expressed in languages other than English, as well as those institutions in countries that no longer exist,” Kasenchak explained. “For example, South China Normal University used to be called Guangdong Teacher’s College, and before that, it was known as the Education College of Xiangqin.”
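The renaming example lends itself to a small alias table that maps every known variant of an institution’s name to a single canonical form. This is a minimal sketch under stated assumptions: the table, the normalization rules, and the choice of canonical names are illustrative, and a real system would need far more variants plus editorial review for names that do not match.

```python
import re

# Hypothetical alias table: normalized variant -> canonical name.
ALIASES = {
    "south china normal university": "South China Normal University",
    "guangdong teacher's college": "South China Normal University",
    "education college of xiangqin": "South China Normal University",
    "oxford university": "University of Oxford",
    "university of oxford": "University of Oxford",
}


def normalize_key(raw):
    """Lowercase, unify apostrophes, drop periods/commas, collapse spaces."""
    text = raw.replace("\u2019", "'").lower()
    text = re.sub(r"[.,]", " ", text)
    return " ".join(text.split())


def canonical_institution(raw):
    """Return the canonical name, or None if the variant is unknown."""
    return ALIASES.get(normalize_key(raw))
```

Returning None for an unknown variant, rather than guessing, leaves room to route unmatched names to a human editor.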
“This project, when incorporated into AIP Publishing’s Scitation platform, will improve the interconnectivity among publishing physicists around the globe,” said Evan Owens, CIO, AIP Publishing. “The next step is to have our users utilize the profiles that have been created and provide feedback that will help us identify areas where additional refinements to the process are needed.”
Founded in 1978, Access Innovations has extensive experience with Internet technology applications, master data management, database creation, thesaurus/taxonomy creation, and semantic integration. The Access Innovations Data Harmony software includes automatic indexing, thesaurus management, an XML Intranet System (XIS), and metadata extraction for content creation developed to meet production environment needs. Data Harmony is used by publishers, governments, and corporate clients throughout the world.
About AIP Publishing – http://journals.aip.org
AIP Publishing LLC is a scholarly publisher in the physical and related sciences, providing the global science community with a comprehensive collection of highly cited, peer-reviewed scientific information. Accessed by researchers at nearly 4,000 institutions worldwide, AIP Publishing’s portfolio of 17 journals includes prestigious titles such as Applied Physics Letters, Journal of Applied Physics, and The Journal of Chemical Physics, as well as the AIP Conference Proceedings. AIP Publishing LLC is a wholly owned subsidiary of the American Institute of Physics, and publishes on behalf of several of AIP’s Member Societies and other publishing partners.