Texas Digital Library Conference System, 2016 Texas Conference on Digital Libraries

OpenRefine: or How I Learned to Stop Worrying and Love Data Transformation
Kara Long, Darryl Stuhr

Last modified: 2016-03-22

Abstract


The Baylor University Library launched its first digitization project in 1999, with the Spencer Collection of American Popular Sheet Music. The first phase of the project scanned and placed online 1,000 of the nearly 30,000 pieces of music in the print collection. The online collection now comprises over 7,000 titles of American sheet music from the 18th to the 20th centuries. A major challenge throughout the project has been generating rich metadata for the digital objects. In 2008, the Library contracted with Flourish Music Cataloging to outsource the creation of MARC records describing the print collection. These records are also transformed into metadata describing the digital collection.

This presentation will cover the history of the project and its evolving workflow, and will demonstrate the most recent change: integrating OpenRefine into the workflow to quality-check and transform the metadata from the catalog records. The transformed metadata is then used to generate a CONTENTdm load file. The presentation will be of interest to metadata librarians, especially those curious about OpenRefine, and to CONTENTdm administrators.


Keywords


metadata; OpenRefine; digital collections

Full Text: Slideshow