Texas Digital Library Conference System, TCDL 2013

Current Practices in Quality Assurance for Web Archives
Brenda Reyes

Last modified: 2013-04-19


Web archiving is the process of storing and maintaining Internet resources (such as websites) to preserve them as a historical, informational, legal, or evidential record. The process involves three stages: selecting relevant resources for preservation, gathering and storing them, and providing access to them. In recent years, web archiving has become increasingly common in libraries around the world, as national libraries, such as the Library of Congress and the National Library of Australia, seek to preserve their national digital heritage. Many universities have also begun archiving the web, usually to create subject-specific collections of websites that supplement their existing print and digital collections.

Within the web archiving community, a step that often goes unmentioned is the quality assurance (QA) process, which measures the quality of an archived site by comparing it to a standard that must be met. Currently, each institution conducts its QA process independently, using a wide variety of standards and software tools. The result is a considerable knowledge gap: practitioners do not know whether or how their peers conduct a QA process, and they generally do not share this information. Consequently, there are no agreed-upon quality standards or processes.

The study presented here attempts to address this information gap in the web archiving community. To this end, we investigated how several institutions conduct their quality control processes. It is worth noting that quality control procedures are often not publicly available and, where they are documented at all, are rarely documented thoroughly. Much of the information presented here was obtained from reports, electronic communications, listserv discussions, and interviews with staff involved in the QA process. These initial findings led us to design a survey instrument to gather information in a more thorough and structured manner. The results of this survey are included here.


web archives; quality assurance; digital preservation

Full Text: Poster