The HathiTrust Research Center has expanded its services to support computational research across the full HathiTrust collection, one of the world’s largest digital libraries. This news came to us from the center's press release, “One of the world’s largest digital libraries opens doors to text mining scholars.”

Previously, scholars could mine only the public-domain portion of the HathiTrust collection. Now the collection can be analyzed in its entirety. Algorithms allow for the exploration of more than 14 million digitized items, including over 7 million books, 725,000 U.S. federal documents and 350,000 serials. This allows scholars to find trends and make comparisons across time and subject. The range of data that can be gathered is immense and can lead to new insights.

The HathiTrust Research Center is a cooperative service of Indiana University, the University of Illinois and HathiTrust.

The critical part of working with any data, regardless of type, is being able to find the content you are looking for quickly and easily. One way to ensure findability, even at big-data scale, is through a taxonomy. A standards-based taxonomy gives your data a clear and concise order, which enables comprehensive search results. Standards are key to a solid taxonomy and comprehensive indexing.

Jennifer Crawford
Access Innovations

Sponsored by Access Innovations, the world leader in indexing and making content findable.