LDR Processing

1 Generate SIPs

RDF Turtle (Terse RDF Triple Language) files are generated automatically from the accessions database, following the data model established for each digital collection. See the documentation for each digital collection, e.g., constructing SIPs for Campus Publication accessions since 2015.

2 Ingest SIPs

2.1 Test the validity of the Turtle files as follows:

for file in *.ttl
do
    rapper -r -i turtle -c "$file"
done

With -c, rapper only parses and counts the triples, reporting the count and any errors without writing the triples themselves. A modern RDF triplestore should be able to ingest Turtle files directly. If RDF/XML is required, omit -c, add -o rdfxml, and redirect the output to a file with a .xml extension.
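The loop above prints rapper's messages but does not stop or summarize when a file is bad. A minimal sketch that collects failures and returns a non-zero status, assuming rapper exits non-zero when a file does not parse; validate_ttl is a hypothetical helper name, not part of the existing toolkit:

```sh
#!/bin/sh
# Sketch: check every Turtle file in the current directory with rapper
# and report those that fail. validate_ttl is a hypothetical helper.

validate_ttl () {
    failed=0
    for file in *.ttl
    do
        # With no .ttl files the glob stays literal; skip it.
        [ -e "$file" ] || continue
        # -c counts triples instead of serializing them; a non-zero
        # exit status from rapper is treated as a failure.
        if ! rapper -i turtle -c "$file"
        then
            echo "invalid: $file" >&2
            failed=1
        fi
    done
    return "$failed"
}

# Usage: validate_ttl || echo "fix the files listed above before ingesting"
```

Returning a status lets the ingest steps that follow be skipped automatically when any file is invalid.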

2.2 Delete the graph for the collection, e.g.:

DUMPME_RDF campub

(See ancillary documentation for details.)
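For a server that accepts SPARQL 1.1 Update over HTTP, the same effect can be sketched with a single DROP request. This assumes the update is accepted at the same endpoint that CURL (below) queries and that [user:pass] stands for real credentials; drop_graph is a hypothetical helper and is not necessarily how DUMPME_RDF works:

```sh
#!/bin/sh
# Sketch: drop a named graph with a SPARQL 1.1 Update request.
# Assumptions: the endpoint accepts application/sparql-update;
# [user:pass] is a placeholder. drop_graph is a hypothetical helper.

drop_graph () {
    host=$1
    graph=$2
    # SILENT: do not report an error if the graph is already absent.
    printf 'DROP SILENT GRAPH <%s>\n' "$graph" |
        curl --silent --show-error --fail --anyauth --user "[user:pass]" \
            -H "Content-type: application/sparql-update" \
            --data-binary @- \
            "http://${host}:8003/v1/graphs/sparql"
}

# Usage: drop_graph {host} http://lib.uchicago.edu/campub
```

The graph URI in the usage line is the one named in the FROM clause of the collection's queries.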

Note: in future, only the triples that have changed should be updated; it should not be necessary to delete the whole graph.
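One way to find the changed triples is to compare two N-Triples snapshots line by line. A minimal sketch, assuming the new snapshot comes from rapper -i turtle -o ntriples and the old one is exported from the triplestore (how depends on the server), and that neither contains blank nodes, whose labels are not stable between exports; triple_diff is a hypothetical helper name:

```sh
#!/bin/sh
# Sketch: split two N-Triples snapshots into added and removed triples.
# Assumes one triple per line and no blank nodes.
# triple_diff is a hypothetical helper.

triple_diff () {
    old=$1
    new=$2
    scratch=$(mktemp -d) || return 1
    # comm needs sorted input; LC_ALL=C gives a stable byte-wise order.
    LC_ALL=C sort -u "$old" > "$scratch/old"
    LC_ALL=C sort -u "$new" > "$scratch/new"
    # Only in the new snapshot: triples to insert.
    LC_ALL=C comm -13 "$scratch/old" "$scratch/new" > added.nt
    # Only in the old snapshot: triples to delete.
    LC_ALL=C comm -23 "$scratch/old" "$scratch/new" > removed.nt
    rm -rf "$scratch"
}

# Usage: triple_diff campub-old.nt campub-new.nt
```

Because N-Triples lines are valid SPARQL triple syntax, removed.nt and added.nt could then be wrapped in DELETE DATA and INSERT DATA requests for the collection's graph.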

2.3 Ingest the Turtle files into the triplestore:

for file in *.ttl
do
    LOADME_RDF "$file" campub
done

(See ancillary documentation for details.)
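Where the triplestore implements the SPARQL 1.1 Graph Store HTTP Protocol, each file can instead be POSTed straight into the named graph, which merges its triples. A sketch, assuming that protocol is served at /v1/graphs on port 8003 (as on a MarkLogic REST instance, which the /v1/graphs/sparql path in CURL suggests) and that [user:pass] is a placeholder; load_ttl is a hypothetical helper and is not necessarily how LOADME_RDF works:

```sh
#!/bin/sh
# Sketch: merge every Turtle file into a named graph via the SPARQL 1.1
# Graph Store HTTP Protocol. Endpoint path and credentials are
# assumptions; load_ttl is a hypothetical helper.

load_ttl () {
    host=$1
    graph=$2
    for file in *.ttl
    do
        # With no .ttl files the glob stays literal; skip it.
        [ -e "$file" ] || continue
        # POST merges the file's triples into the graph; stop on failure.
        curl --silent --show-error --fail --anyauth --user "[user:pass]" \
            -H "Content-type: text/turtle" \
            --data-binary "@$file" \
            "http://${host}:8003/v1/graphs?graph=${graph}" || return 1
    done
}

# Usage: load_ttl {host} http://lib.uchicago.edu/campub
```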

3 Generate DIPs for a collection

For example:

sh CURLER {host}

In this example, CURLER is

#!/bin/sh

usage () {
    echo "Usage: sh ${0##*/} host" >&2
}

case $# in
    0)
        usage
        exit 1
        ;;
    *)
        host=$1
        # Run each saved SPARQL query against the host and save the
        # XML results under a matching file name.
        sh CURL "$host" CAMPUBjpg.q > CAMPUBjpg.xml
        sh CURL "$host" CAMPUBmetadata.q > CAMPUBmetadata.xml
        sh CURL "$host" CAMPUBocr.q > CAMPUBocr.xml
        sh CURL "$host" CAMPUBpageNumbers.q > CAMPUBpageNumbers.xml
        sh CURL "$host" CAMPUBpdf.q > CAMPUBpdf.xml
        sh CURL "$host" CAMPUBthumbnail.q > CAMPUBthumbnail.xml
        sh CURL "$host" CAMPUBtiffinfo.q > CAMPUBtiffinfo.xml
        ;;
esac
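The seven calls in CURLER follow one pattern, so a loop over the query files could replace them. A sketch, assuming CURL and the .q files sit in the current directory as CURLER expects; run_queries is a hypothetical helper name:

```sh
#!/bin/sh
# Sketch: run every query whose file name starts with a prefix through
# CURL, writing CAMPUBjpg.q's result to CAMPUBjpg.xml, and so on.
# run_queries is a hypothetical helper.

run_queries () {
    host=$1
    prefix=$2
    for query in "$prefix"*.q
    do
        # With no matching files the glob stays literal; skip it.
        [ -e "$query" ] || continue
        # ${query%.q} strips the .q suffix.
        sh CURL "$host" "$query" > "${query%.q}.xml"
    done
}

# Usage: run_queries {host} CAMPUB
```

With this design a new query only needs a new .q file, at the cost of running every query that matches the prefix.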

CURL is

#!/bin/sh

usage () {
    echo "Usage: sh ${0##*/} host sparql_query" >&2
}

case $# in
    [0-1])
        usage
        exit 1
        ;;
    *)
        host=$1
        sparql_query=$2
        # POST the query file to the SPARQL endpoint, request XML
        # results, and pretty-print them.
        curl --silent --anyauth --user "[user:pass]" \
            -H "Content-type: application/sparql-query" \
            -H "Accept: application/sparql-results+xml" \
            --data-binary @"./${sparql_query}" \
            "http://${host}:8003/v1/graphs/sparql" | xmllint --format -
        ;;
esac

CAMPUBjpg.q is

prefix dc: <http://purl.org/dc/elements/1.1/>
prefix dcterms: <http://purl.org/dc/terms/>
prefix edm: <http://www.europeana.eu/schemas/edm/>
prefix mix: <http://www.loc.gov/mix/v20>
prefix ore: <http://www.openarchives.org/ore/terms/>
prefix premis: <info:lc/xmlns/premis-v2>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix xs: <http://www.w3.org/2001/XMLSchema#>
select ?jpg
from <http://lib.uchicago.edu/campub>
where {
  ?jpg dc:format "image/jpeg" .
  ?jpg a edm:WebResource .
}
order by ?jpg

(See ancillary documentation for full details.)
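Each result file is in the SPARQL Query Results XML format, with every bound URI in a <uri> element. A sketch for pulling those values out of a file such as CAMPUBjpg.xml, assuming it came from CURL (whose xmllint --format step leaves each <uri> element on its own line) and that the URIs contain no XML-escaped characters such as &amp;; result_uris is a hypothetical helper name:

```sh
#!/bin/sh
# Sketch: print the URI values from a formatted SPARQL XML result file.
# Assumes one <uri> element per line and no XML escapes in the values.
# result_uris is a hypothetical helper.

result_uris () {
    # -n with the p flag prints only lines where a <uri> was matched.
    sed -n 's:.*<uri>\(.*\)</uri>.*:\1:p' "$1"
}

# Usage: result_uris CAMPUBjpg.xml > jpg-uris.txt
```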

Author: Charles Blair (repository@lib.uchicago.edu)

Date: 2015-06-24
