Texas Digital Library Conference System, 2015 Texas Conference on Digital Libraries

Automation, Virtualization, and Integration of a Digital Repository Server Architecture or How to Deploy Three Production DSpaces in One Night and Be Home for Dinner
James Creel, John Micah Cooper, Jeremy Huff

Last modified: 2015-03-17

Abstract


Texas A&M University Libraries have operated a DSpace repository since 2004.  For most of this period, the public-facing production server ran on dedicated hardware and was installed by hand by a system administrator, with numerous tweaks for the local architecture.  DSpace requires several interdependent sub-applications, including a user interface (XMLUI or JSPUI), a Solr indexer, an OAI service, and a handle server.  Even after careful pre-production releases and testing, redeployments can take all night and lead to days or weeks of bug fixing.

Yet migrations and upgrades are inevitable and desirable.  The open-source community is constantly implementing new features.  In addition, with usage and content submission, a production DSpace instance will eventually outgrow its server hardware and need to be redeployed.

With a diversity of special requirements from repository stakeholders, TAMU Libraries has a history of heavy customization of DSpace.  An abridged list of customizations includes a search interface for ETDs, additional options for item licensing, a request-a-copy feature for restricted items, an expanding/collapsing browser for the community/collection hierarchy, and improvements to administrative views, all presented with a branded theme.  These customizations touch every level of the code, from the Java backend to the XML front-ends.

Although DSpace-related software development at TAMU produced important contributions early on, most notably the XMLUI front-end (aka Manakin), over the years local demands for new features quickly outweighed the imperative to package features for submission back to the core DSpace code.  This focus proved shortsighted, because DSpace upgrades became increasingly difficult and fraught as customizations accumulated.  Lead times for redeployments grew intolerable, as developers were forced with each upgrade to rewrite customizations and examine thousands of lines of configuration.

Recently, three factors have brought about a profound improvement in developers' efficiency at TAMU Libraries, though not without cost.  First, the Libraries' system administration leadership saw a need to automate server deployment tasks.  Second, in a related initiative, it moved to a virtualized server infrastructure in which servers are deployed to generic, commodity virtual machines using resources from a transparently allocated hardware pool.  Finally, since DSpace 3.x, the official DSpace code has been refactored in such a way as to facilitate customization with independent sub-modules that need not disturb the build structure.  Repaying the accumulated technical debt has required about a year, during which customers made do with very little new feature development.  However, having returned to a standard build with modular customizations, we are now better equipped to submit our customizations back to the community.

For build automation we use Chef, a Ruby-based framework that encapsulates a multitude of common deployment functions such as writing and templating files, managing users and permissions, and enabling services.  For our virtualization infrastructure, we started on OpenStack and have recently migrated to VMware.

In this talk, we will recount experiences with systems and customers during our lengthy transition to automation and virtualization and conclude with some recent success stories about production DSpace deployments.


Keywords


DSpace; automation; virtualization; Chef; continuous integration; VMware; virtual machines
