Easy integration of external data RFC
Background
New identifiers that are highly relevant for libraries are emerging on the Web, but they are hard to use in our online presence because we are tied to the fossilized MARC format. Which MARC field should hold a MusicBrainz ID or a DBpedia URI? And how do we make use of them once they are stored? This RFC describes a novel way to tie identifiers to bibliographic records and to use data from the web in the OPAC.
Two main approaches
There are two things we might like to do:
- Link to external websites: if we have the MusicBrainz ID for an album, we can link to that album's page on MusicBrainz
- Mash data from external sources into the OPAC ("mashin"), so we can enrich the user experience with data from the web
Basic mechanics of a mashin
The second point above can be implemented as follows:
- Construct a URL, based on the identifier and a URL template
- Fetch data from the URL
- Parse the data into a datastructure
  - JSON can be parsed directly
  - XML can be parsed with a generic parser, or one tailored to the data in question, e.g. RSS
- Pass the datastructure and a template for rendering the datastructure to the template engine
- Render the template with the data from the external source as part of the OPAC detail template, using the "eval" filter
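The fetch-and-parse steps above can be sketched in Python as follows. This is only an illustration: the OPAC itself would do this in Perl (the RFC mentions RDF::Query::Client), and the helper names `build_url`, `http_get`, `parse` and `run_mashin` are hypothetical. It assumes targeturl only contains the {{ID}} placeholder, and the generic XML converter is deliberately simple: attributes are dropped, namespaces are stripped and repeated elements become lists.

```python
import json
import urllib.request
import xml.etree.ElementTree as ET
from typing import Any, Callable


def build_url(targeturl: str, identifier: str) -> str:
    # Step 1: fill the {{ID}} placeholder in identifiertype.targeturl.
    return targeturl.replace("{{ID}}", identifier)


def http_get(url: str) -> str:
    # Step 2: plain HTTP GET with a timeout, returning the body as text.
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


def local_name(tag: str) -> str:
    # "{http://example/ns}release" -> "release"
    return tag.rsplit("}", 1)[-1]


def element_to_data(elem: ET.Element) -> Any:
    # Generic XML -> nested dicts: leaf elements become their text,
    # repeated child tags become lists, attributes are ignored.
    children = list(elem)
    if not children:
        return elem.text
    out: dict = {}
    for child in children:
        key = local_name(child.tag)
        value = element_to_data(child)
        if key in out:
            if not isinstance(out[key], list):
                out[key] = [out[key]]
            out[key].append(value)
        else:
            out[key] = value
    return out


def parse(raw: str, datatype: str) -> Any:
    # Step 3: JSON maps directly onto dicts/lists; XML (including RSS)
    # goes through the generic converter, keeping the root element's name.
    if datatype == "json":
        return json.loads(raw)
    if datatype in ("xml", "rss"):
        root = ET.fromstring(raw)
        return {local_name(root.tag): element_to_data(root)}
    raise ValueError(f"unsupported datatype: {datatype}")


def run_mashin(identifier: str, targeturl: str, datatype: str, template: str,
               fetch: Callable[[str], str] = http_get) -> dict:
    # Step 4: bundle the parsed data with the display template,
    # ready to be handed to the template engine.
    raw = fetch(build_url(targeturl, identifier))
    return {"data": parse(raw, datatype), "template": template}
```

Passing a fake `fetch` function (e.g. `fetch=lambda url: "{}"`) makes it possible to exercise the whole pipeline without network access.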
Tables and columns
Table: identifiers
Columns:
- identifierid
  - Primary key, integer, auto increment
- biblionumber
  - Foreign key for biblios
- identifier
  - This can be any kind of ID, including a URI
- identifiertype
  - Foreign key for the identifiertype table
Table: identifiertype
Columns:
- identifiertypeid
  - Primary key, integer, auto increment
- targeturl
  - The URL to which an HTTP request will be made in order to fetch data. This can be treated as a template, where e.g. {{ID}} is replaced with the value from identifiers.identifier (and, for SPARQL, {{SPARQL}} is replaced with the URL-encoded query built from identifiertype.sparql)
- sparql
  - A SPARQL query template, in which {{ID}} is replaced in the same way
- datatype
  - Possible values: json, xml, rss (or we might skip this and autodetect the format of the data)
- template
  - A Template Toolkit (TT) template that displays the data
- locationonpage
  - An identifier used for selecting where on the detail view to display this data, e.g. belowitems, aboveitems
- cachetime
  - How long data from remote sites should be cached
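As a concrete, hypothetical rendering of the two tables, here they are as SQLite DDL driven from Python's `sqlite3` module, together with the join the detail page would need to find every mashin for one record. SQLite is used only so the sketch runs standalone; a real schema would live in the OPAC's own database and enforce the biblionumber foreign key. The unit of `cachetime` (seconds) is an assumption, since the RFC does not specify one, and `create_db` / `mashins_for` are illustrative names.

```python
import sqlite3

SCHEMA = """
CREATE TABLE identifiertype (
    identifiertypeid INTEGER PRIMARY KEY AUTOINCREMENT,
    targeturl        TEXT,     -- may contain {{ID}} / {{SPARQL}} placeholders
    sparql           TEXT,     -- optional SPARQL template
    datatype         TEXT,     -- 'json', 'xml' or 'rss'
    template         TEXT,     -- TT template used for display
    locationonpage   TEXT,     -- e.g. 'belowitems', 'aboveitems'
    cachetime        INTEGER   -- assumed unit: seconds
);
CREATE TABLE identifiers (
    identifierid   INTEGER PRIMARY KEY AUTOINCREMENT,
    biblionumber   INTEGER NOT NULL,   -- the bibliographic record (FK not enforced here)
    identifier     TEXT NOT NULL,      -- any kind of ID, including a URI
    identifiertype INTEGER NOT NULL REFERENCES identifiertype(identifiertypeid)
);
"""


def create_db() -> sqlite3.Connection:
    # In-memory database holding both tables.
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def mashins_for(conn: sqlite3.Connection, biblionumber: int) -> list:
    # Everything needed to build the mashins for one detail page.
    return conn.execute(
        """SELECT i.identifier, t.targeturl, t.datatype, t.template, t.locationonpage
           FROM identifiers AS i
           JOIN identifiertype AS t ON i.identifiertype = t.identifiertypeid
           WHERE i.biblionumber = ?""",
        (biblionumber,),
    ).fetchall()
```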
Use cases
Simple links
identifiertype.targeturl is empty, so no HTTP request is made. identifiers.identifier is simply passed to the template, which renders it as one or more links.
identifiers.identifier:
f4d4ba9a-bb17-49fd-91de-360c3b8f9a78
identifiertype.template:
<ul> <li><a href="http://musicbrainz.org/release/{{ID}}">MusicBrainz</a></li> </ul>
{{ID}} is replaced with the value from identifiers.identifier, producing an exact link to that release on MusicBrainz.
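For the simple-link case, rendering amounts to a placeholder substitution. The sketch below (hypothetical function name `render_simple_link`) also HTML-escapes the identifier before inserting it into the attribute; the RFC does not mention escaping, but it matters if identifiers can ever contain quotes or angle brackets.

```python
from html import escape


def render_simple_link(template: str, identifier: str) -> str:
    # No HTTP request: just substitute the (escaped) ID into identifiertype.template.
    return template.replace("{{ID}}", escape(identifier, quote=True))
```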
REST API
identifiers.identifier:
f4d4ba9a-bb17-49fd-91de-360c3b8f9a78
identifiertype.targeturl:
http://musicbrainz.org/ws/2/release/{{ID}}?inc=artists+labels+recordings
Make an HTTP GET request to the resulting URL and retrieve the XML data:
http://musicbrainz.org/ws/2/release/f4d4ba9a-bb17-49fd-91de-360c3b8f9a78?inc=artists+labels+recordings
Parse the XML into a datastructure and pass it to the template engine, along with the template from identifiertype.template, which might look something like this:
Tracks on this album:
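<ol>
[% FOREACH track IN mashin.belowitems.data.metadata.release.item('medium-list').medium.item('track-list').track %]
  <li>[% track.recording.title | html %]</li>
[% END %]
</ol>
(TT does not allow hyphens in dotted variable names, so keys such as medium-list have to be looked up with the hash item() method.)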
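The same traversal the template performs can be written directly against the XML with Python's `xml.etree.ElementTree`, which is handy for checking what a response actually contains. This assumes the response uses the MusicBrainz ws/2 XML namespace `http://musicbrainz.org/ns/mmd-2.0#` and the release, medium-list, medium, track-list, track, recording, title nesting used in the template above; `track_titles` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Default namespace assumed for MusicBrainz ws/2 XML responses.
NS = {"mb": "http://musicbrainz.org/ns/mmd-2.0#"}


def track_titles(xml_text: str) -> list:
    # Root element is <metadata>; walk every medium and every track on it.
    root = ET.fromstring(xml_text)
    titles = []
    for medium in root.findall("mb:release/mb:medium-list/mb:medium", NS):
        for track in medium.findall("mb:track-list/mb:track", NS):
            title = track.find("mb:recording/mb:title", NS)
            if title is not None:
                titles.append(title.text)
    return titles
```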
SPARQL
identifiers.identifier:
http://dbpedia.org/resource/Master_of_Puppets
identifiertype.sparql:
select ?last ?next where { <{{ID}}> <http://dbpedia.org/property/lastAlbum> ?last . <{{ID}}> <http://dbpedia.org/property/nextAlbum> ?next . }
These two are combined into this SPARQL query:
select ?last ?next where { <http://dbpedia.org/resource/Master_of_Puppets> <http://dbpedia.org/property/lastAlbum> ?last . <http://dbpedia.org/resource/Master_of_Puppets> <http://dbpedia.org/property/nextAlbum> ?next . }
This query is then URL-encoded and substituted for {{SPARQL}} in the URL template from identifiertype.targeturl:
http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query={{SPARQL}}&format=application/sparql-results+json&timeout=0&debug=on
The complete request sent to the endpoint (e.g. by RDF::Query::Client) then looks like this:
http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=select+%3Flast+%3Fnext+where+{+%0D%0A++%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FMaster_of_Puppets%3E+%3Chttp%3A%2F%2Fdbpedia.org%2Fproperty%2FlastAlbum%3E+%3Flast+.%0D%0A++%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FMaster_of_Puppets%3E+%3Chttp%3A%2F%2Fdbpedia.org%2Fproperty%2FnextAlbum%3E+%3Fnext+.%0D%0A}+&format=application/sparql-results+json&timeout=0&debug=on
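The two substitution steps (first {{ID}} into the SPARQL template, then the URL-encoded query into {{SPARQL}} in the target URL) can be sketched with `urllib.parse.quote_plus`. The function names are hypothetical. The exact encoding differs slightly from the request above (`quote_plus` also encodes `{` as `%7B`, for instance), but both are valid query-string encodings of the same query. The ID is inserted into the SPARQL text unescaped, so this assumes identifiers are well-formed URIs without `>` characters.

```python
from urllib.parse import quote_plus


def expand_sparql(sparql_template: str, identifier: str) -> str:
    # Replace every {{ID}} in identifiertype.sparql with the identifier URI.
    return sparql_template.replace("{{ID}}", identifier)


def sparql_request_url(targeturl: str, query: str) -> str:
    # URL-encode the whole query and drop it into the {{SPARQL}} slot.
    return targeturl.replace("{{SPARQL}}", quote_plus(query))
```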
The endpoint responds with JSON data like this:
{ "head": { "link": [], "vars": ["last", "next"] }, "results": { "distinct": false, "ordered": true, "bindings": [ { "last": { "type": "literal", "xml:lang": "en", "value": "Ride the Lightning" } , "next": { "type": "literal", "xml:lang": "en", "value": "...And Justice for All" }} ] } }
The JSON data is then parsed into a data structure and passed along for rendering, together with the template from identifiertype.template, which might look something like this:
[% row = mashin.aboveitems.data.results.bindings.0 %]
<ul>
  <li>Search for previous album: <a href="?q=[% row.last.value | uri %]">[% row.last.value | html %]</a></li>
  <li>Search for next album: <a href="?q=[% row.next.value | uri %]">[% row.next.value | html %]</a></li>
</ul>
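The shape of the SPARQL JSON results is worth spelling out: `results.bindings` is a list with one object per solution row, and each variable maps to an object whose `value` holds the actual string. That is why a template has to go through the first element of `results.bindings` rather than reading `results.last` directly. A hypothetical Python helper that flattens the first row:

```python
import json


def first_binding_values(sparql_json: str) -> dict:
    # Return {variable: value} for the first solution row, or {} if there is none.
    result = json.loads(sparql_json)
    bindings = result["results"]["bindings"]
    if not bindings:
        return {}
    row = bindings[0]
    # Unbound variables are simply absent from the row, so skip them.
    return {var: row[var]["value"] for var in result["head"]["vars"] if var in row}
```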
Passing stuff to TT
Conceptually, the datastructure passed to TT should look something like this:
mashin.[locationonpage].data     = [data from json or xml]
mashin.[locationonpage].template = [template from identifiertype.template]
Or if we want to have multiple pieces of data in the same location, [locationonpage] could be an array of hashes:
mashin.[locationonpage][0].data     = [data from json or xml]
mashin.[locationonpage][0].template = [template from identifiertype.template]
mashin.[locationonpage][1].data     = [data from json or xml]
mashin.[locationonpage][1].template = [template from identifiertype.template]
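A minimal sketch of assembling that structure, using the array-of-hashes variant so that several identifiers can target the same location. `build_mashin` is a hypothetical name; its input tuples would come from the identifiers/identifiertype join plus the fetch-and-parse step.

```python
from typing import Any, Dict, Iterable, List, Tuple


def build_mashin(entries: Iterable[Tuple[str, Any, str]]) -> Dict[str, List[dict]]:
    # entries: (locationonpage, parsed data, template) tuples.
    # Each location maps to a list of {"data": ..., "template": ...} hashes.
    mashin: Dict[str, List[dict]] = {}
    for location, data, template in entries:
        mashin.setdefault(location, []).append({"data": data, "template": template})
    return mashin
```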
Locating things on the page
In the template for the OPAC detail page, we could put something like this in each location we want to "open up" to data from external sources:
[% PROCESS mashin location='belowitems' %]
Then define a single BLOCK that processes the data and template for the given location, if there is any:
[% BLOCK mashin %]
  [% IF mashin.$location %]
    [% mashin.$location.template | eval %]
  [% END %]
[% END %]
(The TT code here has not actually been tested, but something similar to it should hopefully work.)
Serverside or AJAX?
The description above assumes that all processing happens serverside, so that a complete page is sent to the client. This can of course cause delays, because we have to wait for one or more external services, which may be slow or not respond at all. An alternative would be to send the basic page to the client first, then use AJAX to fetch HTML fragments (produced on the server by methods similar to the ones described above) and shoehorn them into the page as they arrive.
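Whichever approach is chosen, the delay problem can be reduced by honoring identifiertype.cachetime and putting a short timeout on every external request. The class below is a sketch of that idea, not a prescribed design: `MashinCache` is a hypothetical name, `cachetime` is assumed to be in seconds, the 3-second timeout is arbitrary, and when a service is down it falls back to stale cached data or to nothing. The same `get` call could back either serverside rendering or an AJAX endpoint returning an HTML fragment.

```python
import time
import urllib.request
from typing import Callable, Dict, Optional, Tuple


class MashinCache:
    """Caches fetched payloads per URL for `cachetime` seconds (assumed unit)."""

    def __init__(self, fetcher: Optional[Callable[[str], str]] = None,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetcher or self._http_get
        self._clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}  # url -> (fetched_at, body)

    @staticmethod
    def _http_get(url: str) -> str:
        # A short timeout keeps a slow or dead service from stalling the page.
        with urllib.request.urlopen(url, timeout=3) as resp:
            return resp.read().decode("utf-8")

    def get(self, url: str, cachetime: float) -> Optional[str]:
        now = self._clock()
        hit = self._store.get(url)
        if hit is not None and now - hit[0] < cachetime:
            return hit[1]  # still fresh
        try:
            body = self._fetch(url)
        except OSError:
            # Unreachable or timed out (URLError and timeouts are OSErrors):
            # serve stale data if we have it, otherwise render without this mashin.
            return hit[1] if hit is not None else None
        self._store[url] = (now, body)
        return body
```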
Other stuff
- Display records that share the same ID as "related"
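Finding "related" records could be a self-join on the identifiers table. The sketch below treats two records as related when they share both the identifier and its identifiertype, a design choice the RFC leaves open; `related_biblionumbers` is a hypothetical helper and SQLite stands in for the real database.

```python
import sqlite3


def related_biblionumbers(conn: sqlite3.Connection, biblionumber: int) -> list:
    # Other records carrying the same identifier of the same identifiertype.
    rows = conn.execute(
        """SELECT DISTINCT other.biblionumber
           FROM identifiers AS mine
           JOIN identifiers AS other
             ON other.identifier = mine.identifier
            AND other.identifiertype = mine.identifiertype
           WHERE mine.biblionumber = ?
             AND other.biblionumber != mine.biblionumber
           ORDER BY other.biblionumber""",
        (biblionumber,),
    ).fetchall()
    return [r[0] for r in rows]
```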