TCDL 2013

Determining and Mapping Locations of Study in Scholarly Documents: A Spatial Representation and Visualization Tool for Information Discovery
James Creel, Katherine Weimer

Theses and dissertations play a significant role in the scholarly literature and often refer to locations of interest or regions under study. Using geoparsing, the identification and disambiguation of place names, we have created a tool that generates interactive maps of the geographic locations referenced in electronic theses and dissertations (ETDs). The visualization shows which locations are being researched and which departments and majors are studying each one. More broadly, the interface supports multidisciplinary research, student recruitment and faculty collaboration. By combining geographic and gazetteer metadata with open-source mapping applications, the tool helps researchers discover serendipitous geographic and interdisciplinary connections. The beta version consists of several DSpace curation tasks that take a given ETD through each step of the metadata creation and mapping processes. Once the tool has suggested geospatial metadata for an ETD, curators can approve the suggested values through the DSpace administrative interface.

Our geoparser integrates several open-source tools with specialized heuristics to automate name extraction and disambiguation. We use the OpenNLP and Stanford NLP libraries for name extraction and the Geonames gazetteer as our source of referenced entities. A preliminary evaluation indicates an accuracy of 84% in disambiguating names to specific Geonames IDs. Work toward improving the accuracy is ongoing.
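To make the disambiguation step and the accuracy measure concrete, the following is a minimal sketch and not the tool's actual heuristics, which are not detailed here. It assumes a small hypothetical in-memory gazetteer (the names `Candidate`, `GAZETTEER`, `disambiguate` and `accuracy` are illustrative, and the IDs and populations are placeholders rather than real Geonames data). It resolves an extracted name by preferring candidates in regions mentioned elsewhere in the document, then by population, and scores the result against hand-labeled IDs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    geoname_id: int   # placeholder ID, not a real Geonames identifier
    name: str
    region: str       # illustrative region code
    population: int   # illustrative value, not real data


# Hypothetical gazetteer; a real system would look names up in Geonames.
GAZETTEER = {
    "springfield": [
        Candidate(1001, "Springfield", "US-IL", 115000),
        Candidate(1002, "Springfield", "US-MO", 168000),
    ],
    "austin": [Candidate(2001, "Austin", "US-TX", 950000)],
}


def disambiguate(name, context_regions=frozenset()):
    """Pick one gazetteer entry for an extracted place name.

    Assumed heuristic: prefer candidates whose region also appears in the
    document context, then break ties by larger population.
    """
    candidates = GAZETTEER.get(name.lower(), [])
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda c: (c.region in context_regions, c.population),
    )


def accuracy(predicted, gold):
    """Fraction of gold-labeled names resolved to the correct ID."""
    if not gold:
        return 0.0
    correct = sum(1 for name, gid in gold.items() if predicted.get(name) == gid)
    return correct / len(gold)
```

With no context, the sketch falls back to the most populous candidate; supplying a region seen elsewhere in the document overrides that default. An accuracy figure such as the one reported above would correspond to `accuracy` computed over a hand-labeled sample.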

The visualization component of the tool reads geospatial metadata as KML and renders the referenced locations in any of three map visualization options selected by the reader: OpenLayers, OpenStreetMap and Google Maps. Once a site of interest is located on the map, the reader may follow a link to the complete thesis or dissertation stored in DSpace, our university's institutional repository. The long-term goal of this project is to extend coverage to all TDL ETDs so that the tool can serve as a widely used search mechanism.
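As an illustration of the KML that a map client could consume, the sketch below builds a small KML document with Python's standard library. It is an assumption-laden example rather than the tool's own export code: the function name `build_kml`, the record fields (`place`, `url`, `lon`, `lat`) and the sample values are hypothetical. Each location becomes a `Placemark` whose description carries a link back to the repository item.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def build_kml(records):
    """Serialize location records as a KML string, one Placemark per record.

    Each record is a dict with hypothetical keys: place, url, lon, lat.
    """
    ET.register_namespace("", KML_NS)  # emit KML as the default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for rec in records:
        pm = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(pm, f"{{{KML_NS}}}name").text = rec["place"]
        # The description holds the link to the full thesis or dissertation.
        ET.SubElement(pm, f"{{{KML_NS}}}description").text = rec["url"]
        point = ET.SubElement(pm, f"{{{KML_NS}}}Point")
        # KML orders coordinates as longitude,latitude.
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = (
            f"{rec['lon']},{rec['lat']}"
        )
    return ET.tostring(kml, encoding="unicode")


# Illustrative record; the URL and coordinates are placeholders.
sample = [
    {
        "place": "Example Site",
        "url": "https://example.org/handle/placeholder",
        "lon": -96.0,
        "lat": 30.0,
    }
]
kml_text = build_kml(sample)
```

Because KML is a plain XML format, the same output can be loaded by any of the map options named above without client-specific changes to the metadata.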


geoparsing; geocoding; toponyms; named entity recognition; name disambiguation; natural language processing

