Harvesting Quality: Evaluating Metadata for Digital Collections
Paromita Biswas

Metadata creation practices for digital library projects vary widely among libraries. Digital library projects often have to deal with multiple metadata creators, new formats and resources, and dynamic metadata standards for different communities (Park & Tosaka, 2010). As a result, while field practitioners prioritize accuracy and consistency in metadata, records created for specific digital projects may lack the quality needed to support successful resource discovery and access by end users. Park and Tosaka’s survey of metadata quality control in digital repositories and collections reveals that digital repositories often rely on periodic sampling or peer review of original metadata records as mechanisms for quality assurance (Park & Tosaka, 2010).

This poster proposal presents another means of running quality checks on metadata created for digital projects, based on Hunter Library’s experience with the WorldCat Digital Collection Gateway, a tool for harvesting metadata for digital collections into WorldCat. Hunter Library’s digital collections are described using Dublin Core in CONTENTdm, and the Library has recently started harvesting its collections into WorldCat using the Gateway. During harvesting, the Gateway by default places the names of “creators” and “contributors”, which are recorded in separate fields in the local metadata environment, into one broad “Author” field for WorldCat users. A cursory review of this “Author” field in WorldCat for several harvested items from one of the library’s collections revealed an unexpected presence of corporate body names alongside personal names. This prompted an evaluation of how the “creator” and “contributor” fields had been used in that collection.

The “Frequency Analysis” feature in the Gateway proved particularly useful in this evaluation, since it provided a breakdown of each field in a particular collection by the values used in that field and the number of times each value had been used. For example, a high frequency for a particular name indicated that its usage had not been a random mistake but had been consistent. A subsequent analysis of the metadata for the library’s digital collections using “Frequency Analysis” revealed that, in some collections, the “contributor” field had been used to record entities whose roles in relation to the item described ranged from publisher and printer to editor or letter recipient. However, the library’s then-current metadata schema had limited the “contributor” field to entities with a direct but secondary role in the creation of an item, such as editors or illustrators.
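
The kind of check described above can be approximated outside the Gateway. The following is a minimal sketch, not the Gateway’s actual implementation: the sample records and names are invented, the merge of “creator” and “contributor” values into one list only mimics the default “Author” mapping described above, and the comma test for spotting possible corporate bodies is a rough, hypothetical heuristic meant to flag values for manual review.

```python
from collections import Counter

# Hypothetical Dublin Core records; all names are invented for illustration.
records = [
    {"creator": ["Doe, Jane"], "contributor": ["Example Printing Co."]},
    {"creator": ["Doe, Jane"], "contributor": ["Roe, Richard"]},
    {"creator": ["Smith, Ann"], "contributor": ["Example Printing Co."]},
]


def author_values(record):
    """Combine creator and contributor values into one "Author" list.

    Assumption: this only approximates the default behavior of placing
    both fields into a single WorldCat "Author" field.
    """
    return record.get("creator", []) + record.get("contributor", [])


def frequency_analysis(records, field):
    """Return a Counter of how often each value occurs in `field`."""
    counts = Counter()
    for record in records:
        counts.update(record.get(field, []))
    return counts


def possible_corporate_names(counts):
    """Rough heuristic (an assumption, not a cataloging rule): personal
    names are usually inverted as "Surname, Forename", so values without
    a comma are flagged for manual review as possible corporate bodies."""
    return [value for value in counts if "," not in value]


if __name__ == "__main__":
    contributor_counts = frequency_analysis(records, "contributor")
    for value, count in contributor_counts.most_common():
        print(f"{count:3d}  {value}")
    print("Review:", possible_corporate_names(contributor_counts))
```

As in the Gateway’s report, a value that recurs many times (here, the hypothetical printer) suggests a consistent local practice rather than a one-off error, which is what points the review toward the schema rather than individual records.
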

This discrepancy between the library’s metadata schema and the actual usage of the “contributor” field led to a redefinition of the “contributor” role. The schema now accommodates the many roles that “contributors” can have in relation to an item and recommends that each contributor’s role be explained in the “description” field to account for this diversity. Updating the schema has thus promoted consistency in recording the “contributor” field across the library’s digital collections, and may also benefit users searching for an item by the various names associated with it.

Keywords: digital libraries; metadata; quality control; harvesting

