A First Step Towards RDF in Koha RFC


A First Step Towards RDF in Koha RFC

Status: unknown
Sponsored by: Stockholm University/The National Library of Sweden
Developed by: Magnus Enger, David Cook
Expected for:
Bug number: Bug
Work in progress repository:
Description: A first step towards adding support for RDF (linked data) in Koha, using OAI-PMH to harvest the data and then using it to enrich Koha's OPAC.


Background

Linked Open Data (LOD) is public data that is available in digital form and published according to the principles of linked data. These principles require that the data is published in a standardized format (such as RDF) and that each data element is given a unique identifier. RDF (Resource Description Framework) is a framework for expressing information about resources. It was originally a data model designed for metadata, but has evolved into a general method for expressing information. RDF is one of the foundational technologies of the Semantic Web. More information on RDF can be found at: https://www.w3.org/TR/2014/NOTE-rdf11-primer-20140225/
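To make the data model concrete, here is a minimal sketch in Python using the rdflib library (illustrative only; the project does not prescribe a language or toolkit). It builds a single triple stating that a resource has a title; the resource URI is a hypothetical example:

  from rdflib import Graph, Literal, URIRef
  from rdflib.namespace import DCTERMS

  g = Graph()
  resource = URIRef("http://example.org/bib/1")  # hypothetical resource identifier
  # One triple: subject (the resource), predicate (dcterms:title), object (a literal value)
  g.add((resource, DCTERMS.title, Literal("An example title")))
  # Serialize as Turtle, e.g. <http://example.org/bib/1> dcterms:title "An example title" .
  print(g.serialize(format="turtle"))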

There has been plenty of interest within the library sector in the Semantic Web and linked open data; the Library of Congress, for example, is developing BIBFRAME to eventually replace MARC as its metadata standard. Other examples of ongoing development in this area can be found in Scandinavia, where both Oslo Public Library and the National Library of Sweden are developing RDF-based cataloguing for their respective library systems.

Today Koha relies on MARC (of various flavors) as its metadata structure, and it is deeply embedded in the system. There is, however, a long-term ambition within the community to open Koha up to more metadata standards in general, and to linked data specifically. Moving beyond MARC as a metadata format for Koha is a very large undertaking and beyond the scope of this project, so the idea is to find an approach that lets us keep MARC while at the same time exploring and taking advantage of new technologies.


Aim of the project

Stockholm University Library, with funding from the National Library of Sweden, aims to take a step in that direction by enabling Koha to handle, ingest, store and, above all, make use of linked open data (essentially RDF in some form). The idea is to build on the workflows between the Swedish union catalogue LIBRIS and a local Koha instance, while showing the possibilities of linked open data within an ILS. However, the aim is also to support stand-alone local installations, not only workflows built around union catalogues.


The basic essentials (nitty-gritty)

Getting data into Koha would be done in one of two ways:

  1. Using an external source like LIBRIS, and potentially BIBFRAME in the future. That is, the data is catalogued somewhere else and harvested into Koha through OAI-PMH. The dataflow from LIBRIS comes in two steps: first, a MARCXML record is harvested and imported into Koha, and a named graph is created in a triplestore when the MARCXML record is added; the RDF graph is then in turn harvested and stored in the triplestore (see the sketch after this list).
  2. Using existing MARCXML records in the catalog and creating named graphs from them. The graphs can then be populated manually or from other sources. Using an existing tool like marc2rdf (built by Oslo Public Library), as proposed in the original Linked Data RFC, is also a possibility.
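The following is a minimal sketch in Python of the harvesting step in option 1, assuming an OAI-PMH endpoint that can serve RDF/XML and a triplestore exposing the SPARQL 1.1 Graph Store HTTP Protocol. The endpoint URLs, the metadataPrefix and the record identifier are illustrative assumptions, not part of the actual LIBRIS or Koha interfaces:

  import xml.etree.ElementTree as ET
  import requests

  OAI_BASE = "https://example.org/oai"             # hypothetical OAI-PMH endpoint
  GRAPH_STORE = "http://localhost:3030/koha/data"  # hypothetical graph store endpoint

  def harvest_record(identifier):
      """Fetch one record via OAI-PMH GetRecord (the metadataPrefix is an assumption)."""
      params = {"verb": "GetRecord", "identifier": identifier, "metadataPrefix": "rdfxml"}
      response = requests.get(OAI_BASE, params=params, timeout=30)
      response.raise_for_status()
      return response.text

  def extract_rdf(oai_response):
      """Pull the RDF/XML payload out of the OAI-PMH envelope."""
      ns = {"oai": "http://www.openarchives.org/OAI/2.0/",
            "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#"}
      root = ET.fromstring(oai_response)
      rdf_element = root.find(".//oai:metadata/rdf:RDF", ns)
      return ET.tostring(rdf_element, encoding="unicode")

  def store_named_graph(graph_uri, rdf_xml):
      """Replace the named graph for this record in the triplestore (HTTP PUT)."""
      response = requests.put(
          GRAPH_STORE,
          params={"graph": graph_uri},
          data=rdf_xml.encode("utf-8"),
          headers={"Content-Type": "application/rdf+xml"},
          timeout=30,
      )
      response.raise_for_status()

  record_id = "oai:example.org:bib/1234"  # hypothetical identifier
  rdf = extract_rdf(harvest_record(record_id))
  store_named_graph("http://example.org/graph/bib/1234", rdf)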

Once the linked data is stored in a triplestore, the rest of the project would mostly be about how to enrich Koha's OPAC with information from that triplestore, both in result lists (for instance, how multiple editions of the same work are handled) and in the record display. Pages showing authors, subjects etc. in the OPAC can probably be generated as well.
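As a rough illustration, an OPAC detail page could fetch related editions with a SPARQL query like the one below. This is a sketch in Python assuming a SPARQL 1.1 query endpoint at a hypothetical URL and Dublin Core terms as the vocabulary; the actual predicates will depend on how the harvested data is modelled:

  import requests

  SPARQL_ENDPOINT = "http://localhost:3030/koha/sparql"  # hypothetical query endpoint

  def other_editions(work_uri):
      """Find other editions linked to the same work (the vocabulary is an assumption)."""
      query = """
          PREFIX dcterms: <http://purl.org/dc/terms/>
          SELECT ?edition ?title WHERE {
              ?edition dcterms:isVersionOf <%s> ;
                       dcterms:title ?title .
          }
      """ % work_uri
      response = requests.get(
          SPARQL_ENDPOINT,
          params={"query": query},
          headers={"Accept": "application/sparql-results+json"},
          timeout=30,
      )
      response.raise_for_status()
      return [(b["edition"]["value"], b["title"]["value"])
              for b in response.json()["results"]["bindings"]]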

Work that needs to be done

  • Develop Icarus (Koha's OAI-PMH harvester) to be able to harvest RDF as well as MARCXML
  • Add the downloaded RDF to a triplestore and import the MARCXML into Koha
  • Build a way to create a named graph in the triplestore when new MARCXML records are added (see the sketch after this list)
  • Create an interface in the OPAC for exploring the data in the triplestore
  • Build an interface for constructing SPARQL queries and associated templates
  • Create an interface in Koha for enriching the RDF in the triplestore with external sources, the knowledge of local librarians, etc.
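The following Python sketch illustrates the third item above: creating a named graph when a new MARCXML record is added. It assumes a SPARQL 1.1 Update endpoint at a hypothetical URL and maps only the title (MARC 245 $a), using Dublin Core as an illustrative vocabulary; a real mapping (for example via marc2rdf) would cover far more fields:

  import xml.etree.ElementTree as ET
  import requests

  UPDATE_ENDPOINT = "http://localhost:3030/koha/update"  # hypothetical update endpoint
  MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

  def title_from_marcxml(marcxml):
      """Extract the title (245 $a) from a MARCXML record."""
      root = ET.fromstring(marcxml)
      node = root.find('.//marc:datafield[@tag="245"]/marc:subfield[@code="a"]', MARC_NS)
      return node.text if node is not None else ""

  def create_named_graph(biblionumber, marcxml):
      """Seed a named graph for a newly added record with a single title triple."""
      graph_uri = "http://example.org/bib/%s" % biblionumber  # assumed URI pattern
      title = title_from_marcxml(marcxml).replace('"', '\\"')
      update = """
          PREFIX dcterms: <http://purl.org/dc/terms/>
          INSERT DATA {
              GRAPH <%s> { <%s> dcterms:title "%s" . }
          }
      """ % (graph_uri, graph_uri, title)
      # SPARQL 1.1 Protocol: updates can be sent as a form-encoded 'update' parameter
      response = requests.post(UPDATE_ENDPOINT, data={"update": update}, timeout=30)
      response.raise_for_status()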

Triplestore

Some of the steps involve a triplestore for storing and querying semantic data. Wikipedia has a long List of triplestore implementations; the key is to stick with a specific version of SPARQL, so that users can choose any triplestore that supports that version. Goal: Koha should be able to support any triplestore, as long as it supports the required version of SPARQL (1.1 at the moment). This is also the approach preferred by other systems, such as DSpace [1].
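A minimal sketch in Python of what this triplestore-agnostic approach could look like in practice: Koha would only need to know the endpoint URLs, so any store speaking SPARQL 1.1 over HTTP could be plugged in. The configuration keys and URLs below are illustrative assumptions:

  import requests

  # Hypothetical configuration; any SPARQL 1.1-compliant store could be used here.
  triplestore_config = {
      "query_endpoint": "http://localhost:3030/koha/sparql",
      "update_endpoint": "http://localhost:3030/koha/update",
  }

  def endpoint_answers_sparql(config):
      """Send a trivial ASK query to confirm the configured store answers SPARQL 1.1."""
      response = requests.get(
          config["query_endpoint"],
          params={"query": "ASK {}"},
          headers={"Accept": "application/sparql-results+json"},
          timeout=10,
      )
      response.raise_for_status()
      return response.json().get("boolean", False)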


Outside the scope of the project

  • Anything and everything related to cataloging in Koha.
  • Using the linked data in the local triplestore to make the library holdings available on the web/indexed by the big search engines (this is a very promising possibility of linked data, but it is also outside the scope of this project)
  • Tools to convert MARC data to RDF (but this can be added later)
  • Search functionality using the RDF data (this can also be added later)
  • Replacing MARC with RDF
  • Adding support for various thesauri/ontologies in Koha (like TemaTres)

See also