Texas Digital Library Conference System, 2014 Texas Conference on Digital Libraries

Examining Massive Digital Libraries
Andrew Weiss

Last modified: 2014-03-25


Massive Digital Libraries (MDLs) can be defined as digitized book collections that rival or even surpass the size of most physical "brick and mortar" libraries. Many of these MDLs hold several million volumes. The largest is Google Books, at nearly 30 million volumes; HathiTrust is a distant second at 11 million volumes.

This presentation will examine the results of two related studies. The first, currently in progress, examines levels of access in four Massive Digital Libraries (Google Books, HathiTrust, the Open Content Alliance’s Open Library, and the Internet Archive) using random samples of Spanish-language and English-language books. A preliminary study noted and compared differences in the level of access between Spanish and English language books. The current study provides a more complete examination, drawing on nearly 1,200 records culled randomly from a library catalog.

In the second study, the author examines rates of error and problems associated with scanning Japanese language books found in the Google Books and HathiTrust Massive Digital Libraries. The study is based on interviews conducted by the author at Keio University in Tokyo, Japan, the sole Japanese organization to partner with Google Books, and on a current examination of randomized records retrieved from OCLC WorldCat. The results show a number of errors in both metadata and scanning.

The results of both studies suggest that aggregated content development in massive digital libraries may be negatively affected by a lack of diversity in partnerships.

Furthermore, problems in the mass digitization of non-English, non-Western books arise not only from the limited numbers available but also from issues of copyright clearance, availability of materials, and non-Western book binding techniques and printing technologies.


digital libraries, analysis, metadata, diversity, Japanese language, Spanish language

Full Text: Slideshow