Texas Digital Library Conference System, TCDL 2012

Connecting with Users beyond Language Boundaries through Multilingual Information Access for Digital Collections
Jiangping Chen

Building: AT&T Executive Education and Conference Center
Room: Amphitheatre 204
Date: 2012-05-25 08:30 AM – 09:00 AM
Last modified: 2012-04-25


Very few digital collections in the United States support multilingual information access (MLIA), which enables non-English-speaking users to search, browse, recognize, and use information from multilingual digital objects. In an increasingly global knowledge society, libraries and museums need to design and implement effective and efficient MLIA in order to serve broader user groups and to share information with global societies.

This presentation will discuss and demonstrate a research project titled "Enabling Multilingual Information Access to Digital Collections: An Investigation of Metadata Records Translation," which is a collaboration among four entities: the Department of Library and Information Sciences in the College of Information at the University of North Texas (UNT); the UNT Libraries Digital Projects Unit (DPU); the School of Information Management at Wuhan University, China; and the Autonomous University of the State of Mexico (UAEM) in Mexico. The project is jointly funded by the U.S. Institute of Museum and Library Services (IMLS: http://www.imls.gov/) and UNT. It aims to evaluate the extent to which current machine translation technologies generate adequate translations of metadata records, and to identify the most effective metadata record translation strategies for digital collections.

During the first year of this project, the research team developed HeMT (http://txcdk-v10.unt.edu/HeMT/), a multilingual participatory platform for human evaluation of machine translation. HeMT is used by three types of users: translators, evaluators, and reviewers. It consists of six major modules: User Management, Manual Translation, User Training, Evaluation, Result Visualization, and Multilingual Lexicon Management. HeMT can be used by the digital library and machine translation communities for conducting manual translation and machine translation evaluation tasks. Usability testing was conducted during the development of HeMT. Evaluators recruited from China and Mexico have used HeMT to evaluate machine translations of metadata records. The evaluation results can be presented visually and viewed in real time.

The second phase of this project will focus on exploring effective Multi-engine Machine Translation (MEMT) strategies in order to provide guidance for digital libraries that are interested in implementing MLIA. In order to train our MEMT system, we are seeking collaborations with libraries in China and Mexico through our partners in these two countries. Specifically, we expect to obtain metadata records in Chinese and Spanish to develop language models of metadata records for English-Chinese and English-Spanish machine translation. Our future work will focus on evaluating and implementing the metadata record translation strategies identified in this project by collaborating with one or two digital collections in different subject domains.

Digital libraries should connect effectively with their users and collaborators for sustainable development and services. This presentation will also discuss challenges and benefits of crowdsourcing and collaboration based on our experience in this project.


metadata records, machine translation evaluation, multilingual information access

Full Text: Slideshow