Karma: An Open Source Tool for Linked Data

TCDL 2015 Pre-Conference Workshop

Date: Sunday, April 26, 2015
Time: 1:00 PM - 5:00 PM
Location: Perry-Castañeda Library Room 4.102
Cost to attend: $25.00


Workshop Description

Karma is an open source tool that makes it easy to convert data from a variety of formats into Linked Data.

Users load the ontologies for their application into Karma, along with a sample of each data file to be converted. Karma makes the conversion process easy: it provides an intuitive graphical user interface for visualizing and editing the mapping of data files to ontologies. Karma is flexible: it can import data from a wide variety of formats and sources (SQL, XML, JSON, CSV, Excel, Avro, Web services), and it allows users to define Python scripts that reformat and clean the data before conversion. Karma learns to map the data, providing guidance on how to map each data field to the ontology. Karma can produce RDF for loading into a triple store and JSON for loading into NoSQL stores such as MongoDB and Elasticsearch. Karma scales to very large datasets (40 million documents, 1 billion triples) and can refresh the data periodically (e.g., every hour). Karma is a free, open source tool available at http://github.com/usc-isi-i2/Web-Karma.

The workshop will introduce the basic capabilities of Karma and give attendees hands-on training with the tool. Attendees will learn how to provide ontologies to Karma, how to load data, how to define URIs, how to transform data using Python scripts, how to map the data to the ontology, how to save, reuse, and share mapping files, and how to produce RDF and JSON.
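As a rough illustration of the kind of cleanup and URI construction covered in the workshop, the sketch below uses plain Python only. It does not use Karma's own scripting interface; the function names, the sample value, and the `http://example.org/artist/` namespace are all hypothetical and chosen for illustration.

```python
import re
from urllib.parse import quote

# Hypothetical namespace, used only for this illustration.
BASE_URI = "http://example.org/artist/"


def clean_name(raw: str) -> str:
    """Collapse runs of whitespace and turn 'Last, First' into 'First Last'."""
    name = re.sub(r"\s+", " ", raw).strip()
    if "," in name:
        last, first = [part.strip() for part in name.split(",", 1)]
        name = f"{first} {last}"
    return name


def make_uri(name: str) -> str:
    """Build a URI from a cleaned name by lowercasing and hyphenating it."""
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    return BASE_URI + quote(slug)


if __name__ == "__main__":
    cleaned = clean_name("  Homer,   Winslow ")
    print(cleaned)
    print(make_uri(cleaned))
```

In a real mapping, logic like this would run once per row on a source column, and the resulting URI would identify the entity that the row describes.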

About the Instructor

Dr. Pedro Szekely is a Project Leader at the USC Information Sciences Institute (ISI) and a Research Associate Professor at the USC Computer Science Department. Dr. Szekely joined USC in 1988 after receiving his M.S. and Ph.D. degrees in Computer Science from Carnegie Mellon University in 1982 and 1987, respectively. His research interests include big data, the Semantic Web, and human-computer interaction. His research focuses on techniques and tools to extract and integrate data from a wide variety of sources (Web pages, databases, spreadsheets, etc.), and on methods to index the integrated data to support accurate querying and sophisticated analysis. The resulting software tool, Karma, released as open source, has been used in a variety of applications, including intelligence analysis, bioinformatics, environmental engineering, and cultural heritage. A notable example is the work with the Smithsonian American Art Museum to publish the metadata about the museum’s collection as Linked Open Data.

