Texas Digital Library Conference System, TCDL 2012

Metadata Quality Enhancement for Large Digital Collections: Web Browser Automation with Selenium IDE
Andrew James Weidner, Daniel Gelaw Alemneh

Last modified: 2012-04-16


Creating and maintaining accurate descriptive metadata for digital objects is one of the best ways to connect with digital library users and maintain those connections over the long term. Good metadata empowers users not only to discover exactly what they searched for, but also to locate relevant resources that they did not expect to find. Metadata quality characteristics for digital libraries depend on many factors, including the types of resources the repository offers and the users’ needs, which vary across the spectrum of user communities. The metadata quality issue is particularly acute when multiple institutions participate in collaborative digital projects and employ diverse naming schemes for their documents and files. Furthermore, harvesting large sets of documents from open repositories presents a number of challenges for creating accurate descriptive metadata. For example, metadata schemas do not always map well to one another, creating disconnects when the records are published in the local repository. In these cases, substantial rework is usually required to create descriptive data that meets local repository standards.

The University of North Texas (UNT) digital libraries group uses various tools and mechanisms to ensure metadata consistency and precision across all digital resources. Pre-populated controlled vocabulary terms in its Web-based Dashboard editing interface enable metadata operators to easily select standard values via drop-down menus and auto-suggest for text input fields. In addition, careful mapping prior to ingest facilitates accurate conversions among various metadata element sets. Crosswalks also facilitate exporting metadata records to other systems. To support these activities in cases where post-ingest metadata normalization will enhance recall and precision for its digital objects, the UNT Libraries recently implemented Selenium IDE as a tool for streamlining the process of editing large sets of metadata records. Created by the Web development community to simplify the testing of Web applications, Selenium IDE is a Firefox browser plug-in that provides an integrated development environment for creating, debugging, and running Web browser automation scripts.
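
To make the idea of post-ingest normalization concrete, the following is a minimal Python sketch of the logic involved: matching variant values in one field against a controlled vocabulary and listing the records that would need editing. It is not a Selenium IDE script and not the UNT Libraries' implementation. The vocabulary entries, the field name, the record structure, and the function names are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical controlled vocabulary: lowercase variant -> preferred form.
PREFERRED_FORMS = {
    "unt": "University of North Texas",
    "univ. of north texas": "University of North Texas",
    "university of north texas": "University of North Texas",
}


def normalize_value(value, vocabulary=PREFERRED_FORMS):
    """Collapse whitespace, then return the preferred form if one is known."""
    cleaned = re.sub(r"\s+", " ", value).strip()
    return vocabulary.get(cleaned.lower(), cleaned)


def normalize_records(records, field):
    """Return normalized copies of the records and the ids that changed.

    Each record is assumed to be a dict with an "id" key (an assumption
    made for this sketch). Records lacking the field are copied unchanged.
    """
    updated = []
    changed_ids = []
    for record in records:
        if field not in record:
            updated.append(dict(record))
            continue
        old = record[field]
        new = normalize_value(old)
        if new != old:
            changed_ids.append(record["id"])
        updated.append({**record, field: new})
    return updated, changed_ids
```

In a workflow like the one described here, the list of changed ids would identify which records an automation script needs to open and edit in the Web-based interface, while the remaining records are left alone.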

This poster will discuss the complex set of tools and actions required to maintain usable and sustainable digital collections and demonstrate how Selenium IDE facilitates metadata editing for large digital collections by automating a range of data entry tasks. Any institution that employs a content management system with a Web-based metadata editing interface can potentially benefit from Selenium IDE’s automation capabilities.


descriptive metadata; controlled vocabularies; metadata normalization; software