LDR Processing
1 Generate SIPs
RDF Turtle (Terse RDF Triple Language) files are generated automatically from the accessions database, following the data model established for each digital collection. See the documentation for each digital collection, e.g., constructing SIPs for Campus Publication accessions since 2015.
2 Ingest SIPs
2.1 Test the validity of the Turtle files as follows:
for file in *.ttl
do
    rapper -r -i turtle -c "$file"
done
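The -i turtle option sets the input syntax, and -c makes rapper count the triples instead of printing them. A valid file therefore produces no triple output, only a brief parsing report, while an invalid file produces error messages. A modern RDF triplestore should be able to ingest Turtle files directly. If RDF/XML is required, omit -c, add -o rdfxml, and redirect the output to a file with a .xml extension.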
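The loop above prints a report for every file, which is hard to scan when there are many. The function below is a minimal sketch that lists only the files that fail, assuming rapper exits with a non-zero status when a file does not parse. The name validate_turtle is hypothetical and not part of the LDR scripts.

```shell
#!/bin/sh
# validate_turtle [DIR]  (hypothetical helper)
# Parse every *.ttl file in DIR (default: current directory) with rapper,
# print the name of each file that fails, and return non-zero if any failed.
# Assumes rapper exits non-zero on a parse error.
validate_turtle () {
    dir=${1:-.}
    failed=0
    for file in "$dir"/*.ttl
    do
        # If nothing matches, the pattern stays literal; skip it.
        [ -e "$file" ] || continue
        # -c counts triples instead of printing them; discard rapper's report.
        if ! rapper -i turtle -c "$file" > /dev/null 2>&1
        then
            echo "INVALID: $file"
            failed=$((failed + 1))
        fi
    done
    echo "$failed file(s) failed validation"
    [ "$failed" -eq 0 ]
}
```

For example, running validate_turtle in the SIP directory before step 2.2 avoids deleting the graph when some of the new files cannot be loaded.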
2.2 Delete the graph for the collection, e.g.:
DUMPME_RDF campub
(See ancillary documentation for details.)
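To confirm that the deletion worked, the graph can be queried for its triple count with the CURL wrapper described in section 3. This is a minimal sketch: the helper name count_triples and the query file name count_triples.q are hypothetical, and it assumes CURL is in the current directory. After DUMPME_RDF the value of ?n in the returned XML should be 0; after loading (2.3) it should be non-zero.

```shell
#!/bin/sh
# count_triples HOST GRAPH_IRI  (hypothetical helper)
# Write a SPARQL COUNT query for one named graph to count_triples.q and run
# it through the CURL wrapper, which prints the SPARQL XML result.
count_triples () {
    host=$1
    graph=$2
    printf 'select (count(*) as ?n)\nfrom <%s>\nwhere { ?s ?p ?o }\n' \
        "$graph" > count_triples.q
    sh CURL "$host" count_triples.q
}
```

Usage: count_triples {host} http://lib.uchicago.edu/campub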
Note: In future, we need to update only those triples which have changed. It should not be necessary to delete the whole graph.
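One possible approach to the incremental update described in the note, sketched under assumptions and not an established LDR procedure: convert the old and new Turtle for a collection to N-Triples (one triple per line), for example with rapper -i turtle -o ntriples, and compare the sorted lines. The helper triple_diff is hypothetical. It is only reliable for triples without blank nodes, because blank node labels are not stable between serializations, and applying the resulting additions and deletions to the triplestore (for example as SPARQL Update INSERT DATA and DELETE DATA requests) is not shown.

```shell
#!/bin/sh
# triple_diff OLD.nt NEW.nt  (hypothetical helper)
# Print "+ <triple>" for each triple only in NEW and "- <triple>" for each
# triple only in OLD. Both inputs must be N-Triples (one triple per line).
triple_diff () {
    work=$(mktemp -d) || return 1
    # comm needs both inputs sorted with the same collation, so force C.
    LC_ALL=C sort -u "$1" > "$work/old"
    LC_ALL=C sort -u "$2" > "$work/new"
    LC_ALL=C comm -13 "$work/old" "$work/new" | sed 's/^/+ /'
    LC_ALL=C comm -23 "$work/old" "$work/new" | sed 's/^/- /'
    rm -r "$work"
}
```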
2.3 Ingest the Turtle files into the triplestore:
for file in *.ttl
do
    LOADME_RDF "$file" campub
done
(See ancillary documentation for details.)
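The loop above carries on if one file fails to load, which can leave the graph silently incomplete. A minimal alternative sketch, assuming LOADME_RDF exits with a non-zero status on failure (not confirmed here), stops at the first failure and reports which file it was. The function name load_graph is hypothetical.

```shell
#!/bin/sh
# load_graph GRAPH [DIR]  (hypothetical helper)
# Load every *.ttl file in DIR (default: current directory) into GRAPH with
# LOADME_RDF, stopping at the first file that fails.
# Assumes LOADME_RDF exits non-zero when a load fails.
load_graph () {
    graph=$1
    dir=${2:-.}
    loaded=0
    for file in "$dir"/*.ttl
    do
        # If nothing matches, the pattern stays literal; skip it.
        [ -e "$file" ] || continue
        if ! LOADME_RDF "$file" "$graph"
        then
            echo "Load failed at $file after $loaded file(s); graph $graph is incomplete" >&2
            return 1
        fi
        loaded=$((loaded + 1))
    done
    echo "Loaded $loaded file(s) into $graph"
}
```

Usage, in the directory holding the Turtle files: load_graph campub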
3 Generate DIPs for a collection
For example:
sh CURLER {host}
In this example, CURLER is
#!/bin/sh
# Run each CAMPUB SPARQL query against the triplestore on the given host
# and save the formatted results as XML.
usage () {
    echo "Usage: sh ${0##*/} host" >&2
}
case $# in
    0)
        usage
        exit 1
        ;;
    *)
        host=$1
        sh CURL "$host" CAMPUBjpg.q > CAMPUBjpg.xml
        sh CURL "$host" CAMPUBmetadata.q > CAMPUBmetadata.xml
        sh CURL "$host" CAMPUBocr.q > CAMPUBocr.xml
        sh CURL "$host" CAMPUBpageNumbers.q > CAMPUBpageNumbers.xml
        sh CURL "$host" CAMPUBpdf.q > CAMPUBpdf.xml
        sh CURL "$host" CAMPUBthumbnail.q > CAMPUBthumbnail.xml
        sh CURL "$host" CAMPUBtiffinfo.q > CAMPUBtiffinfo.xml
        ;;
esac
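CURLER repeats the same command once per query. A minimal sketch of a more general alternative, assuming each collection's query files share a prefix (CAMPUB here) and sit in the current directory alongside CURL, loops over the matching .q files and derives each output name from the query name. The function name run_queries is hypothetical; with it, adding a query only requires adding a .q file.

```shell
#!/bin/sh
# run_queries PREFIX HOST  (hypothetical helper)
# Run every PREFIX*.q SPARQL query in the current directory through the
# CURL wrapper, writing the results to the matching PREFIX*.xml file.
run_queries () {
    prefix=$1
    host=$2
    for query in "$prefix"*.q
    do
        # If nothing matches, the pattern stays literal: report and stop.
        [ -e "$query" ] || { echo "No ${prefix}*.q files found" >&2; return 1; }
        sh CURL "$host" "$query" > "${query%.q}.xml" || return 1
    done
}
```

Usage: run_queries CAMPUB {host}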
CURL is
#!/bin/sh
# POST a SPARQL query file to the triplestore's SPARQL endpoint on port 8003
# and pretty-print the XML results.
usage () {
    echo "Usage: sh ${0##*/} host sparql_query" >&2
}
case $# in
    [0-1])
        usage
        exit 1
        ;;
    *)
        host=$1
        sparql_query=$2
        curl --silent --anyauth --user [user:pass] \
            -H "Content-type: application/sparql-query" \
            -H "Accept: application/sparql-results+xml" \
            --data-binary "@./${sparql_query}" \
            "http://${host}:8003/v1/graphs/sparql" |
            xmllint --format -
        ;;
esac
CAMPUBjpg.q is
prefix dc: <http://purl.org/dc/elements/1.1/>
prefix dcterms: <http://purl.org/dc/terms/>
prefix edm: <http://www.europeana.eu/schemas/edm/>
prefix mix: <http://www.loc.gov/mix/v20>
prefix ore: <http://www.openarchives.org/ore/terms/>
prefix premis: <info:lc/xmlns/premis-v2>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix xs: <http://www.w3.org/2001/XMLSchema#>

select ?jpg
from <http://lib.uchicago.edu/campub>
where {
    ?jpg dc:format "image/jpeg" .
    ?jpg a edm:WebResource .
}
order by ?jpg
(See ancillary documentation for full details.)
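The query results are SPARQL XML (application/sparql-results+xml), with one result element per row and each IRI in a uri element. When only a plain list of IRIs is wanted, for example from CAMPUBjpg.xml, a minimal sketch is to extract the uri elements with sed. This assumes the one-element-per-line layout produced by xmllint --format in CURL; the helper name list_uris is hypothetical, and XML entities such as &amp; are left undecoded.

```shell
#!/bin/sh
# list_uris RESULTS.xml  (hypothetical helper)
# Print the text of every <uri> element in a SPARQL XML results file,
# one per line. Assumes each <uri> element sits on a line of its own.
list_uris () {
    sed -n 's:.*<uri>\(.*\)</uri>.*:\1:p' "$1"
}
```

Usage: list_uris CAMPUBjpg.xml > CAMPUBjpg.txt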