Texas Digital Library Conference System, 2016 Texas Conference on Digital Libraries

Novel Workflow for Large Scale Thesis Digitization
Todd Peters, Jeremy Moore, Jason Long

Last modified: 2016-03-22

Abstract


Texas State University recently began digitizing approximately 6,000 theses to create digital preservation copies and electronic versions that may eventually be used for patron access. This presentation will discuss our novel workflow, which allows student workers to rapidly scan, process, and perform quality control on the images while managing the metadata necessary for future ingest into our institutional repository. In brief, the process begins with students debinding and scanning theses, downloading MARC records with MarcEdit, and using an in-house web application to sort images based on content. Students then process the images with a combination of Bash scripts, ImageMagick, and Adobe Photoshop, performing quality control and fixing any errors they find. The resulting preservation TIFFs are OCR’d and combined into PDFs using ABBYY FineReader 12. The Digital Media Specialist then performs a final quality control step, which completes the electronic conversion. The workflow allows a student to process approximately 50 theses in a 20-hour work week.
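
The throughput figures quoted above imply a per-thesis handling time and a rough total effort for the collection. The short Python sketch below works through that arithmetic. The three constants come from the abstract and are approximate; the number of students working in parallel is a hypothetical parameter added only for illustration and is not stated in the abstract.

```python
import math

# Approximate figures taken from the abstract.
TOTAL_THESES = 6000       # theses to digitize
THESES_PER_WEEK = 50      # processed by one student per work week
HOURS_PER_WEEK = 20       # length of a student work week


def minutes_per_thesis(theses_per_week: float, hours_per_week: float) -> float:
    """Average student time spent on one thesis, in minutes."""
    return hours_per_week * 60 / theses_per_week


def student_weeks(total_theses: float, theses_per_week: float) -> float:
    """Total effort for the collection, in 20-hour student work weeks."""
    return total_theses / theses_per_week


def calendar_weeks(total_theses: float, theses_per_week: float, students: int) -> int:
    """Elapsed weeks if `students` work in parallel.

    `students` is a hypothetical staffing level, not a figure from the abstract.
    """
    return math.ceil(total_theses / (theses_per_week * students))


if __name__ == "__main__":
    print(minutes_per_thesis(THESES_PER_WEEK, HOURS_PER_WEEK))   # 24.0
    print(student_weeks(TOTAL_THESES, THESES_PER_WEEK))          # 120.0
    print(calendar_weeks(TOTAL_THESES, THESES_PER_WEEK, 3))      # 40
```

At the stated rate, a student spends about 24 minutes per thesis, and the full collection represents roughly 120 student work weeks. The estimate covers student processing time only and leaves out the final quality control step performed by the Digital Media Specialist.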


Keywords


ETD; Theses; Digitization; Metadata; Project Management; Scanning

Full Text: Slideshow