Digital Collections: Specifications for Ingest into the Library Digital Repository

1 Introduction

The Library Digital Repository (LDR)1 consists of collections. It includes manuscript collections (for example, the papers of a faculty member), digital collections (for example, Chopin Early Editions), websites, videos of lectures, etc.

This document is concerned with specifications for digital collections disseminated from the LDR.

The goal of many preservation repositories is to maintain usable versions of intellectual entities over time. For an intellectual entity to be displayed, played, or otherwise made usable to a human, all of the files making up at least one version of that intellectual entity must be identified, stored, and maintained so that they can be assembled and rendered to a user at any given point. A representation is the set of files required to do this. (PREMIS Data Dictionary for Preservation Metadata, version 2.2, July 2012, p. 8.)

Each digital collection consists of intellectual entities (e.g., a musical score or a manuscript) consisting of one or more representations of that entity (e.g., the digital masterfile for each page object, a METS representation for the digital masterfiles, associated OCR, and descriptive metadata, or a PDF representation of the entity).

For each digital collection, and for each intellectual object in that digital collection, this document specifies the required components. For any collection, specifications may change over time, for instance, as the result of changes in equipment creating the digital objects forming part of a representation.

Not all preservation repositories will be concerned with representations. A repository might, for example, preserve file objects only and rely on external agents to assemble these objects into usable representations. If the repository does not manage representations, it does not need to record metadata about them.

This document contains the specifications for ensuring that the LDR contains all the digital objects necessary for usable representations of the intellectual entities in a digital collection. These specifications also assist in the creation of software to assemble these representations. Representations will be maintained in the LDR as pointers to the latest versions of accessioned digital objects using the Turtle (Terse RDF Triple Language) linked-data syntax.2

Not all of the digital collections listed in the Library's Digital Library Collections & Activities pages are included in what follows. Some may be out of scope (e.g., a simple website, archival and manuscript finding aids, the digital components of finding aids), and some may not have been deposited (e.g., the components of the Digital South Asia Library).

Notes on nomenclature: "Deposit" and "accession" are used interchangeably in this documentation. "SIP" is the abbreviation for Submission Information Package in OAIS (Open Archival Information System) terminology.

2 American Environmental Photographs, 1891-1936

Digital images from 4,442 glass lantern slides, glass negatives, and photographic prints created by faculty members and students of the University of Chicago Department of Botany between 1891 and 1936.

From Judith Dartt, 2014-09-19:

There are 210 tiffs in each of the tiff-CROPPED and tiff-UNCROPPED directories. The total of AEP tiff files then is 4,632, which is the same count as the descriptive metadata records in the database table aepDescriptive.

When I did the scanning, it was determined that some images would be cropped if there was a "bad spot" in them, overly bright, scratched, etc. I believe the most sensible approach would be to add the uncropped versions to the Tiff directory and delete the crops, since the crop approach isn't current practice.

As to the missing 12 aep-mas2-13 files, … Harvard claimed ownership of those we had of the Arnold Arboretum. That is why there is a difference of 12 in the derivatives, 4,620 instead of 4,632. We pulled them from online …

This collection was hosted by the Library of Congress until 2014-09. Judith Dartt will edit a few of the metadata records. We will then:

  • deposit the complete set of 4,422 records for images we own as XML
  • re-derive the OAI DC records and re-populate the OAI-PMH provider with the corrected records
  • delete the DCMI Metadata Terms records from the provider
  • deposit the 210 uncropped TIFF images into the LDR as a new accession, superseding those in the old accession, acq6q9q0t50.

3 The Automatic Age

The Automatic Age is a magazine-like publication.

This collection has yet to be added to the LDR.

4 Campus Publications

http://campub.lib.uchicago.edu/

Campus Publications is designed to include both University and student publications, for example, back issues of the University Record, Cap and Gown, and the Maroon.

Campus Publications is an open collection. New titles and issues will be added.

For file-naming conventions, see Campus Publications Specifications.

What follows are the specifications needed to ensure the completeness of a deposit for each intellectual entity in the collection.

For all accessions, each intellectual entity in the collection consists of two representations, one suitable for browser-based page-turning, one a PDF file, suitable for printing.

4.1 For accessions through 2014

Each intellectual entity is an aggregation of two representations and a description.

The description:

  • a .dc.xml file of descriptive metadata (title, date, identifier, description)

4.1.1 Representation 1

  • a .txt file of structural metadata (object, page, milestone)

For each page object

  • a .tif masterfile
  • a .jpg derivative
  • a .xml file of OCR
  • a .pos file of position data

4.1.2 Representation 2

  • a .pdf file
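
The component lists above can be turned into a simple completeness check. The following is a minimal sketch, not the repository's actual tooling. It assumes the per-page files sit in tif/, jpg/, xml/, and pos/ subdirectories, as in the example paths later in this section, and that the .dc.xml, .txt, and .pdf files are named after the issue identifier. The function name missing_components is hypothetical.

```python
from pathlib import PurePosixPath

# Per-page components from section 4.1.1 as (subdirectory, extension).
# The subdirectory names follow the example paths in this document;
# treat them as an assumption for any other deposit.
PAGE_COMPONENTS = [("tif", ".tif"), ("jpg", ".jpg"), ("xml", ".xml"), ("pos", ".pos")]


def missing_components(identifier, files):
    """Return a sorted list of components missing from one intellectual entity.

    identifier: the issue identifier, e.g. "mvol-0002-0017-0001"
    files: paths relative to the entity's directory, e.g. "tif/00000001.tif"
    """
    present = {str(PurePosixPath(f)) for f in files}
    missing = []

    # Entity-level files: description, structural metadata, and the PDF.
    # Naming all three after the identifier is an assumption.
    for ext in (".dc.xml", ".txt", ".pdf"):
        name = identifier + ext
        if name not in present:
            missing.append(name)

    # Page numbers are taken from the masterfiles; every page must also
    # have a derivative, an OCR file, and a position file.
    pages = sorted(
        PurePosixPath(f).stem
        for f in present
        if f.startswith("tif/") and f.endswith(".tif")
    )
    if not pages:
        missing.append("tif/*.tif")
    for page in pages:
        for subdir, ext in PAGE_COMPONENTS:
            path = f"{subdir}/{page}{ext}"
            if path not in present:
                missing.append(path)
    return sorted(missing)
```

An empty result means every listed component was found. The check takes the set of pages from the masterfiles, so a page whose .tif is itself absent will not be reported.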

4.1.3 SIP

A SIP is constructed from the above, as indicated by the following (partial) example.

@prefix edm: <http://www.europeana.eu/schemas/edm/>.
@prefix dc: <http://purl.org/dc/elements/1.1/>.
@prefix dcterms: <http://purl.org/dc/terms/>.
@prefix erc: <http://purl.org/kernel/elements/1.1/>.
@prefix foaf: <http://xmlns.com/foaf/0.1/>.
@prefix mix: <http://www.loc.gov/mix/v20>.
@prefix oai: <http://www.openarchives.org/OAI/2.0/>.
@prefix owl: <http://www.w3.org/2002/07/owl#>.
@prefix ore: <http://www.openarchives.org/ore/terms/>.
@prefix premis: <info:lc/xmlns/premis-v2>.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix skos: <http://www.w3.org/2004/02/skos/core#>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
@base <http://ark.lib.uchicago.edu/ark:/61001/>.

<s9gx23kjrzvh8/rem/2014-006/mvol/0002/0017/0001>
dcterms:created "2014-01-21T11:24:06"^^xsd:dateTime;
dcterms:creator <http://repository.lib.uchicago.edu/>;
ore:describes <s9gx23kjrzvh8/aggregation/2014-006/mvol/0002/0017/0001>;
a ore:ResourceMap.

<s9gx23kjrzvh8/aggregation/2014-006/mvol/0002/0017/0001>
edm:aggregatedCHO <s9gx23kjrzvh8/2014-006/mvol/0002/0017/0001>;
edm:dataProvider "University of Chicago Library";
edm:object <s9gx23kjrzvh8/2014-006/mvol/0002/0017/0001/jpg/00000001.jpg>;
edm:isShownAt <http://pi.lib.uchicago.edu/1001/dig/campub/mvol-0002-0017-0001>;
edm:isShownBy <s9gx23kjrzvh8/2014-006/mvol/0002/0017/0001/mvol-0002-0017-0001.pdf>;
edm:provider "University of Chicago Library";
edm:rights <http://creativecommons.org/licenses/by-nc/4.0/>;
ore:isDescribedBy <s9gx23kjrzvh8/rem/2014-006/mvol/0002/0017/0001>;
a ore:Aggregation.

### The ProvidedCHO contains descriptive metadata about the item in
### simple Dublin Core. It indicates the logical components of the
### item. It contains a pointer to at least one file of descriptive
### metadata, in this case, a dc.xml file, if that is provided.
<s9gx23kjrzvh8/2014-006/mvol/0002/0017/0001>
dc:coverage "Chicago";
dc:date "1924-11";
edm:year "1924";
dc:description "The alumni magazine of the University of Chicago.";
dc:identifier "mvol-0002-0017-0001";
dc:language "en";
dc:rights <http://creativecommons.org/licenses/by-nc/4.0/>;
dc:title "The University of Chicago Magazine";
dc:type "Text";
edm:type "TEXT";
dcterms:hasPart <s9gx23kjrzvh8/2014-006/mvol/0002/0017/0001/00000001>;
dcterms:hasPart <s9gx23kjrzvh8/2014-006/mvol/0002/0017/0001/00000002>;
## [etc. Not all of these parts are shown below.]
dcterms:hasPart <s9gx23kjrzvh8/2014-006/mvol/0002/0017/0001/00000055>;
## replace NNNN in what follows with the mvol number, e.g., 0001 for
## Cap and Gown.
dcterms:isPartOf <http://repository.lib.uchicago.edu/collections/mvol-NNNN>;
## dc:creator -> erc:who; if no dc:creator, then ":unav";
erc:who ":unav";
## dc:title -> erc:what
erc:what "The University of Chicago Magazine";
## dc:date -> erc:when
erc:when "1924-11";
## the URI for the edm:ProvidedCHO (i.e., the subject of these assertions) -> erc:where
erc:where <s9gx23kjrzvh8/2014-006/mvol/0002/0017/0001>;
a edm:ProvidedCHO.

### For the provided Dublin Core file.
<s9gx23kjrzvh8/2014-006/mvol/0002/0017/0001/mvol-0002-0017-0001.dc.xml>
dc:format "application/xml";
ore:proxyFor <s9gx23kjrzvh8/2014-006/mvol/0002/0017/0001>;
ore:proxyIn <s9gx23kjrzvh8/aggregation/2014-006/mvol/0002/0017/0001>;
a ore:Proxy.

<s9gx23kjrzvh8/2014-006/mvol/0002/0017/0001/mvol-0002-0017-0001.pdf>
dc:format "application/pdf";
dcterms:isFormatOf <s9gx23kjrzvh8/2014-006/mvol/0002/0017/0001>;
premis:objectIdentifierType "ARK";
premis:objectIdentifierValue <s9gx23kjrzvh8/2014-006/mvol/0002/0017/0001/mvol-0002-0017-0001.pdf>;
premis:objectCategory "file";
premis:compositionLevel 0;
premis:messageDigestAlgorithm "SHA-256";
premis:messageDigest "4f6237c25a51382c3f6c489e550f3b2a241574abbfc57adbf9e0f9b6c674b1a5";
premis:messageDigestOriginator "/sbin/sha256";
premis:size 31011220;
premis:formatName "application/pdf";
premis:originalName "mvol-0002-0017-0001.pdf";
premis:eventIdentifierType "ARK";
premis:eventIdentifierValue "s9gx23kjrzvh8"; 
premis:eventType "creation";
premis:eventDateTime "2014-01-21T11:24:06"^^xsd:dateTime;
a edm:WebResource.

<s9gx23kjrzvh8/2014-006/mvol/0002/0017/0001/00000001>
# A ProvidedCHO must have either dc:title or dc:description. Both may
# be provided if available. Page titles come from the file of
# structural metadata (.txt file). Descriptions come from the OCR .pos
# and .xml files.
#
dc:description <s9gx23kjrzvh8/2014-006/mvol/0002/0017/0001/pos/00000001.pos>;
dc:description <s9gx23kjrzvh8/2014-006/mvol/0002/0017/0001/xml/00000001.xml>;
dc:language "en";
dc:rights <http://creativecommons.org/licenses/by-nc/4.0/>;
dc:type "Text";
edm:type "TEXT";
dcterms:isPartOf <s9gx23kjrzvh8/2014-006/mvol/0002/0017/0001>;
a edm:ProvidedCHO.

<s9gx23kjrzvh8/rem/2014-006/mvol/0002/0017/0001/00000001>
dcterms:created "2014-01-21T11:24:06"^^xsd:dateTime;
dcterms:creator <http://repository.lib.uchicago.edu/>;
ore:describes <s9gx23kjrzvh8/aggregation/2014-006/mvol/0002/0017/0001/00000001>;
a ore:ResourceMap.

<s9gx23kjrzvh8/aggregation/2014-006/mvol/0002/0017/0001/00000001>
edm:aggregatedCHO <s9gx23kjrzvh8/2014-006/mvol/0002/0017/0001/00000001>;
edm:dataProvider "University of Chicago Library";
edm:isShownBy <s9gx23kjrzvh8/2014-006/mvol/0002/0017/0001/tif/00000001.tif>;
edm:object <s9gx23kjrzvh8/2014-006/mvol/0002/0017/0001/jpg/00000001.jpg>;
edm:provider "University of Chicago Library";
edm:rights <http://creativecommons.org/licenses/by-nc/4.0/>;
ore:isDescribedBy <s9gx23kjrzvh8/rem/2014-006/mvol/0002/0017/0001/00000001>;
a ore:Aggregation.

<s9gx23kjrzvh8/2014-006/mvol/0002/0017/0001/00000002>
dc:description "[contents of the corresponding .pos file]";
dc:language "en";
dc:rights <http://creativecommons.org/licenses/by-nc/4.0/>;
dc:title "Page 1";
dc:type "Text";
edm:type "TEXT";
dcterms:isPartOf <s9gx23kjrzvh8/2014-006/mvol/0002/0017/0001>;
edm:isNextInSequence <s9gx23kjrzvh8/2014-006/mvol/0002/0017/0001/00000001>;
a edm:ProvidedCHO.

<s9gx23kjrzvh8/rem/2014-006/mvol/0002/0017/0001/00000006>
dcterms:created "2014-01-21T11:24:06"^^xsd:dateTime;
dcterms:creator <http://repository.lib.uchicago.edu/>;
ore:describes <s9gx23kjrzvh8/aggregation/2014-006/mvol/0002/0017/0001/00000006>;
a ore:ResourceMap.

<s9gx23kjrzvh8/2014-006/mvol/0002/0017/0001/tif/00000001.tif>
dc:format "image/tiff";
mix:fileSize 5947938;
mix:formatName "image/tiff";
mix:messageDigestAlgorithm "MD5";
mix:messageDigest "204d464e1e6b57753c2c2c870c39275f";
mix:imageWidth 2208;
mix:imageHeight 2688;
mix:bitsPerSampleUnit "integer";
premis:objectIdentifierType "ARK";
premis:objectIdentifierValue <s9gx23kjrzvh8/2014-006/mvol/0002/0017/0001/tif/00000001.tif>;
premis:objectCategory "file";
premis:compositionLevel 0;
premis:messageDigestAlgorithm "SHA-256";
premis:messageDigest "59f151f37b77b7603f467c5f45fa242f11f45d0702e76c73be7cf97c0894d875";
premis:messageDigestOriginator "/sbin/sha256";
premis:size 1383736;
premis:formatName "image/tiff";
premis:originalName "00000001.tif";
premis:eventIdentifierType "ARK";
premis:eventIdentifierValue "s9gx23kjrzvh8"; 
premis:eventType "creation";
premis:eventDateTime "2014-01-21T11:24:06"^^xsd:dateTime;
a edm:WebResource.

<s9gx23kjrzvh8/2014-006/mvol/0002/0017/0001/jpg/00000001.jpg>
dc:format "image/jpeg";
a edm:WebResource.

<s9gx23kjrzvh8/2014-006/mvol/0002/0017/0001/pos/00000001.pos>
dc:format "text/plain";
a rdfs:Resource.

<s9gx23kjrzvh8/2014-006/mvol/0002/0017/0001/xml/00000001.xml>
dc:format "application/xml";
a rdfs:Resource.

These statements depend on definitions such as the following for the collection:

@prefix dc: <http://purl.org/dc/elements/1.1/>.
@prefix dcmitype: <http://purl.org/dc/dcmitype/>.

<http://repository.lib.uchicago.edu/ead/mvol-007>
dc:title """Campus Publications""";
dc:description """Official reports, addresses, actions of Ruling Bodies,
notices of campus events, and activities of University of Chicago faculty.""";
a dcmitype:Collection.

and for the creating agent:

@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix dcmitype: <http://purl.org/dc/dcmitype/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/>.

<http://repository.lib.uchicago.edu/>
foaf:name "The University of Chicago Library Digital Repository";
foaf:page <http://repository.lib.uchicago.edu/>;
a dcmitype:Agent.
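
The comments in the SIP example above map dc:creator, dc:title, and dc:date to erc:who, erc:what, and erc:when, and the example pairs dc:date "1924-11" with edm:year "1924". A minimal sketch of that derivation follows. The function names are hypothetical. The document specifies the ":unav" fallback only for a missing dc:creator, so applying it to a missing title or date is an assumption.

```python
import re

UNAVAILABLE = ":unav"


def erc_from_dc(dc, cho_uri):
    """Derive the ERC kernel elements from simple Dublin Core values.

    dc: dict with optional keys "creator", "title", "date"
    cho_uri: URI of the edm:ProvidedCHO (the subject of the assertions)
    """
    return {
        # Specified in the document: no dc:creator -> ":unav".
        "erc:who": dc.get("creator") or UNAVAILABLE,
        # Assumption: the same fallback is used for a missing title or date.
        "erc:what": dc.get("title") or UNAVAILABLE,
        "erc:when": dc.get("date") or UNAVAILABLE,
        "erc:where": cho_uri,
    }


def edm_year(dc_date):
    """Return the year component of a dc:date of the form YYYY[-MM[-DD]]."""
    match = re.fullmatch(r"(\d{4})(-\d{2}(-\d{2})?)?", dc_date)
    if match is None:
        raise ValueError(f"unexpected dc:date: {dc_date!r}")
    return match.group(1)
```

For the issue shown above, which has no dc:creator, erc_from_dc yields ":unav" for erc:who and edm_year("1924-11") yields "1924".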

4.2 For accessions from 2015

@prefix edm: <http://www.europeana.eu/schemas/edm/>.
@prefix dc: <http://purl.org/dc/elements/1.1/>.
@prefix dcterms: <http://purl.org/dc/terms/>.
@prefix erc: <http://purl.org/kernel/elements/1.1/>.
@prefix foaf: <http://xmlns.com/foaf/0.1/>.
@prefix mix: <http://www.loc.gov/mix/v20>.
@prefix oai: <http://www.openarchives.org/OAI/2.0/>.
@prefix owl: <http://www.w3.org/2002/07/owl#>.
@prefix ore: <http://www.openarchives.org/ore/terms/>.
@prefix premis: <info:lc/xmlns/premis-v2>.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix skos: <http://www.w3.org/2004/02/skos/core#>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
@base <http://ark.lib.uchicago.edu/ark:/61001/>.

### For an issue: Resource Map, providedCHO, Aggregation, WebResource (the PDF file).
<[NOID]/rem/[path/to/providedCHO]>
## dcterms:created is machine-generated
dcterms:created "[YYYY]-[MM]-[DD]T[HH]:[MM]:[SS]"^^xsd:dateTime;
## dcterms:creator is a constant
dcterms:creator <http://repository.lib.uchicago.edu/>;
## ore:describes is mandatory in a ResourceMap
ore:describes <[NOID]/aggregation/[path/to/providedCHO]>;
a ore:ResourceMap.

<[NOID]/aggregation/[path/to/providedCHO]>
## dcterms:created is machine-generated
dcterms:created "[YYYY]-[MM]-[DD]T[HH]:[MM]:[SS]"^^xsd:dateTime;
## dcterms:modified is machine-generated
dcterms:modified "[YYYY]-[MM]-[DD]T[HH]:[MM]:[SS]"^^xsd:dateTime;
## edm:aggregatedCHO is a required element.
edm:aggregatedCHO <[NOID]/[path/to/providedCHO]>;
## edm:dataProvider is a constant. It is a required element.
edm:dataProvider "University of Chicago Library";
## Either edm:isShownAt or edm:isShownBy is required. Both may be provided. One must be.
edm:isShownAt <http://pi.lib.uchicago.edu/1001/dig/campub/mvol-[NNNN]-[MMMM]-[PPPP]>;
edm:isShownBy <[NOID]/[path/to/providedCHO]/mvol-[NNNN]-[MMMM]-[PPPP].pdf>;
## Ask about edm:object. It may vary.
edm:object <[NOID]/[path/to/providedCHO]/jpg/00000001.jpg>;
## edm:provider is a constant. It is a required element.
edm:provider "University of Chicago Library";
## edm:rights is a constant. It is a required element.
edm:rights <http://creativecommons.org/licenses/by-nc/4.0/>;
## ore:isDescribedBy and ore:describes are reciprocal
ore:isDescribedBy <[NOID]/rem/[path/to/providedCHO]>;
a ore:Aggregation.

### The following are the mandatory properties for the ProvidedCHO.
### dc:title or dc:description. Both may be provided. One must be provided.
### dc:language for text objects.
### dc:subject or dc:type or dc:coverage or dcterms:spatial. 
###   For campus publications provide coverage, since the value is always "Chicago".
### edm:type. Always "TEXT" for campus publications.
###
<[NOID]/[path/to/providedCHO]>
## For campus publications, the value of dc:coverage is always
## Chicago. Otherwise, ask, for it might vary.
dc:coverage "Chicago";
## Supply from [NOID]/[path/to/providedCHO]/mvol-[NNNN]-[MMMM]-[PPPP].dc.xml
dc:date "[YYYY[-MM[-DD]]]";
edm:year "[YYYY]";
## Supply from [NOID]/[path/to/providedCHO]/mvol-[NNNN]-[MMMM]-[PPPP].dc.xml
dc:description "A string.";
## Supply from [NOID]/[path/to/providedCHO]/mvol-[NNNN]-[MMMM]-[PPPP].dc.xml
dc:identifier "mvol-[NNNN]-[MMMM]-[PPPP]";
## For campus publications, dc:language is a constant. Otherwise, ask, for it might vary.
dc:language "en";
## dc:rights is a constant
dc:rights <http://creativecommons.org/licenses/by-nc/4.0/>;
## Supply from [NOID]/[path/to/providedCHO]/mvol-[NNNN]-[MMMM]-[PPPP].dc.xml
dc:title "[A string]";
## For campus publications, we assume dc:type "Text".
dc:type "Text";
## edm:type is UPPER CASE
edm:type "TEXT";
## This is a link to the plain-text OCR for the issue. We use dc:description for this.
dc:description <[NOID]/aggregation/[path/to/providedCHO]/mvol-[NNNN]-[MMMM]-[PPPP].txt>; 
dcterms:hasPart <[NOID]/[path/to/providedCHO]/00000001>;
dcterms:hasPart <[NOID]/[path/to/providedCHO]/00000002>;
dcterms:hasPart <[NOID]/[path/to/providedCHO]/00000003>;
dcterms:hasPart <[NOID]/[path/to/providedCHO]/00000006>;
dcterms:hasPart <[NOID]/[path/to/providedCHO]/00000055>;
## replace NNNN in what follows with the mvol number, e.g., 0001 for
## Cap and Gown.
dcterms:isPartOf <http://repository.lib.uchicago.edu/collections/mvol-NNNN>;
## dc:creator -> erc:who; if no dc:creator, then ":unav";
erc:who ":unav";
## dc:title -> erc:what
erc:what "[A string]";
## dc:date -> erc:when
erc:when "[YYYY[-MM[-DD]]]";
## the URI for the edm:ProvidedCHO (i.e., the subject of these assertions) -> erc:where
erc:where <[NOID]/[path/to/providedCHO]>;
a edm:ProvidedCHO.

### For the provided Dublin Core file.
<[NOID]/[path/to/providedCHO]/mvol-[NNNN]-[MMMM]-[PPPP].dc.xml>
dc:format "application/xml";
ore:proxyFor <[NOID]/[path/to/providedCHO]>;
ore:proxyIn <[NOID]/aggregation/[path/to/providedCHO]>;
a ore:Proxy.

<[NOID]/[path/to/providedCHO]/mvol-[NNNN]-[MMMM]-[PPPP].pdf>
## These values are derived from generated technical metadata, unless constant.
dc:format "application/pdf";
dcterms:isFormatOf <[NOID]/[path/to/providedCHO]>;
## premis:objectIdentifierType "ARK" is a constant.
premis:objectIdentifierType "ARK";
premis:objectIdentifierValue <[NOID]/[path/to/providedCHO]/mvol-[NNNN]-[MMMM]-[PPPP].pdf>;
## premis:objectCategory "file" is a constant.
premis:objectCategory "file";
## premis:compositionLevel 0 is a constant.
premis:compositionLevel 0;
## premis:messageDigestAlgorithm "SHA-256" is a constant.
premis:messageDigestAlgorithm "SHA-256";
premis:messageDigest "4f6237c25a51382c3f6c489e550f3b2a241574abbfc57adbf9e0f9b6c674b1a5";
## premis:messageDigestOriginator "/sbin/sha256" is a constant unless
## we start using a different program.
premis:messageDigestOriginator "/sbin/sha256";
premis:size 31011220;
premis:formatName "application/pdf";
premis:originalName "mvol-[NNNN]-[MMMM]-[PPPP].pdf";
## premis:eventIdentifierType "ARK" is a constant.
premis:eventIdentifierType "ARK";
premis:eventIdentifierValue "[NOID]"; 
## premis:eventType "creation" is a constant.
premis:eventType "creation";
premis:eventDateTime "[YYYY]-[MM]-[DD]T[HH]:[MM]:[SS]"^^xsd:dateTime;
a edm:WebResource.

### For a page object: providedCHO, Resource Map, Aggregation, WebResource (the TIFF images).
<[NOID]/[path/to/providedCHO]/[00000001]>
## Insert the contents of ALTO/mvol-[NNNN]-[MMMM]-[PPPP]_[QQQQ].xml
## here. In this case it would be ...0000_0001.xml
dc:description <[NOID]/[path/to/providedCHO]/ALTO/mvol-[NNNN]-[MMMM]-[PPPP]_[QQQQ].xml>;
dc:language "en";
dc:rights <http://creativecommons.org/licenses/by-nc/4.0/>;
dc:type "Text";
edm:type "TEXT";
dcterms:isPartOf <[NOID]/[path/to/providedCHO]>;
a edm:ProvidedCHO.

<[NOID]/rem/[path/to/providedCHO]/[00000001]>
dcterms:created "[YYYY]-[MM]-[DD]T[HH]:[MM]:[SS]"^^xsd:dateTime;
dcterms:creator <http://repository.lib.uchicago.edu/>;
ore:describes <[NOID]/aggregation/[path/to/providedCHO]/[00000001]>;
a ore:ResourceMap.

<[NOID]/aggregation/[path/to/providedCHO]/[00000001]>
## dcterms:created is machine-generated
dcterms:created "[YYYY]-[MM]-[DD]T[HH]:[MM]:[SS]"^^xsd:dateTime;
## dcterms:modified is machine-generated
dcterms:modified "[YYYY]-[MM]-[DD]T[HH]:[MM]:[SS]"^^xsd:dateTime;
edm:aggregatedCHO <[NOID]/[path/to/providedCHO]/[00000001]>;
edm:dataProvider "University of Chicago Library";
edm:isShownBy <[NOID]/[path/to/providedCHO]/TIFF/mvol-0007-0013-0001_0001.tif>;
edm:object <[NOID]/[path/to/providedCHO]/JPEG/mvol-0007-0013-0001_0001.jpg>;
edm:provider "University of Chicago Library";
edm:rights <http://creativecommons.org/licenses/by-nc/4.0/>;
ore:isDescribedBy <[NOID]/rem/[path/to/providedCHO]/[00000001]>;
a ore:Aggregation.

### Each page object after the first needs to point back to the
### preceding page object using edm:isNextInSequence.
<[NOID]/[path/to/providedCHO]/[00000002]>
## Insert the contents of ALTO/mvol-[NNNN]-[MMMM]-[PPPP]_[QQQQ].xml
## here. In this case it would be ...0000_0002.xml
dc:description <[NOID]/[path/to/providedCHO]/ALTO/mvol-[NNNN]-[MMMM]-[PPPP]_[QQQQ].xml>;
dc:identifier "mvol-[NNNN]-[MMMM]-[PPPP]_[QQQQ]";
dc:language "en";
dc:title "Page 1";
dc:type "Text";
edm:type "TEXT";
dcterms:isPartOf <[NOID]/[path/to/providedCHO]>;
edm:isNextInSequence <[NOID]/[path/to/providedCHO]/[00000001]>;
a edm:ProvidedCHO.

### For a WebResource that is a digital masterfile.
<[NOID]/[path/to/providedCHO]/TIFF/mvol-0007-0013-0001_0001.tif>
dc:format "image/tiff";
## Extract the following MIX elements from
## .../[NOID]/[path/to/providedCHO]/mvol-NNNN-MMMM-PPPP.mets.xml for
## each digital masterfile (TIFF image).
mix:fileSize 5947938;
mix:formatName "image/tiff";
mix:messageDigestAlgorithm "MD5";
mix:messageDigest "204d464e1e6b57753c2c2c870c39275f";
mix:imageWidth 2208;
mix:imageHeight 2688;
mix:bitsPerSampleUnit "integer";
premis:objectIdentifierType "ARK";
premis:objectIdentifierValue <[NOID]/[path/to/providedCHO]/TIFF/mvol-0007-0013-0001_0001.tif>;
premis:objectCategory "file";
premis:compositionLevel 0;
premis:messageDigestAlgorithm "SHA-256";
premis:messageDigest "59f151f37b77b7603f467c5f45fa242f11f45d0702e76c73be7cf97c0894d875";
premis:messageDigestOriginator "/sbin/sha256";
premis:size 1383736;
premis:formatName "image/tiff";
premis:originalName "[00000001].tif";
premis:eventIdentifierType "ARK";
premis:eventIdentifierValue "[NOID]"; 
premis:eventType "creation";
premis:eventDateTime "[YYYY]-[MM]-[DD]T[HH]:[MM]:[SS]"^^xsd:dateTime;
a edm:WebResource.

### For a WebResource that is a derivative (access copy).
<[NOID]/[path/to/providedCHO]/jpg/[00000001].jpg>
dc:format "image/jpeg";
a edm:WebResource.

### For each file of OCR data in XML format.
<[NOID]/[path/to/providedCHO]/ALTO/mvol-[NNNN]-[MMMM]-[PPPP]_[QQQQ].xml>
dc:format "application/xml";
a rdfs:Resource.
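
The linking statements in the template above (dcterms:hasPart, dcterms:isPartOf, and edm:isNextInSequence) are repetitive enough to generate. The sketch below shows one way to do this. It assumes page objects are numbered consecutively from 00000001 with eight digits, and that the output is placed in a file that already declares the prefixes and @base shown above. The function name page_links is hypothetical.

```python
def page_links(cho, page_count):
    """Return Turtle statements linking an issue to its page objects in order.

    cho: relative URI of the issue's ProvidedCHO
    page_count: number of page objects, assumed to be numbered 1..page_count
    """
    pages = [f"{cho}/{n:08d}" for n in range(1, page_count + 1)]
    lines = []
    # The issue lists each page as a part ...
    for page in pages:
        lines.append(f"<{cho}> dcterms:hasPart <{page}>.")
    # ... and each page points back to the issue.
    for page in pages:
        lines.append(f"<{page}> dcterms:isPartOf <{cho}>.")
    # Every page after the first points back to the page that precedes it.
    for previous, current in zip(pages, pages[1:]):
        lines.append(f"<{current}> edm:isNextInSequence <{previous}>.")
    return lines
```

Each statement is emitted as a complete triple rather than folded into the semicolon-separated stanzas used above. Both forms express the same statements in Turtle.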

5 A Century of Progress: The 1933-34 World's Fair

Pamphlets selected from the Library's collection of official pamphlets, brochures, and booklets published specifically for the Century of Progress World's Fair between 1933 and 1935.

For all accessions, each intellectual entity in the collection consists of two representations, one suitable for browser-based page-turning, one a PDF file, suitable for printing.

Each intellectual entity has two representations.

For each representation

  • an oai_dc-century-NNNN.xml file of descriptive metadata, where NNNN is the number assigned to the intellectual entity (starting from 0001)

5.0.1 Representation 1

For each page object

  • a .tif masterfile observing the following naming convention: centuryNNNN-MMM.tif, where NNNN is the number assigned to the intellectual entity (starting from 0001) and MMM is the number assigned to the page object (starting from 001, but see note below)

5.0.2 Representation 2

  • a .pdf file observing the following naming convention, centuryNNNN.pdf, where NNNN is the number assigned to the intellectual entity (starting from 0001)

A SIP shall consist of all the bulleted items, above.

Note: Some analysis may be required to understand how to construct a pamphlet from the deposited objects, as the directory structure is not consistent among deposits for intellectual entities, and the naming convention for masterfiles sometimes deviates slightly from what is noted above, e.g.,

  • acv4b6v9b0s/Century/century0352/century0352-003.tif
  • acv4b6v9b0s/Century/century0352/century0352-003a.tif
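
Software that assembles a pamphlet needs to tolerate the deviation shown above. The sketch below parses masterfile names of the form centuryNNNN-MMM.tif, allowing an optional trailing letter as in century0352-003a.tif. Ordering a lettered page after the unlettered page with the same number is an assumption, since the intended order is not stated here. The function names are hypothetical.

```python
import re

# centuryNNNN-MMM.tif, with an optional trailing letter (e.g. "003a").
MASTERFILE = re.compile(r"century(\d{4})-(\d{3})([a-z]?)\.tif")


def parse_masterfile(path):
    """Return (entity, page, suffix) for a masterfile path, or None if it does not match."""
    name = path.rsplit("/", 1)[-1]
    match = MASTERFILE.fullmatch(name)
    if match is None:
        return None
    entity, page, suffix = match.groups()
    return entity, int(page), suffix


def page_order(paths):
    """Sort masterfile paths by entity, page number, then suffix letter.

    Paths that do not match the naming convention are dropped.
    """
    parsed = [(parse_masterfile(p), p) for p in paths]
    return [p for key, p in sorted(item for item in parsed if item[0] is not None)]
```

Other deviations, such as a different directory layout, would need their own handling. Files that do not match are better logged than silently dropped when this is used for validation.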

6 The Chicagoan

http://chicagoan.lib.uchicago.edu/

A jazz-age magazine, modeled on the New Yorker, that aimed to portray the city as a cultural hub and counter its image as a place of violence and vice. The magazine contains a wealth of material on the literary, cultural, artistic, athletic and social milieu of Chicago between 1926 and 1934.

The Chicagoan is an open collection: a few issues have yet to be located and added.

For all accessions, each intellectual entity in the collection consists of two representations, one suitable for browser-based page-turning, one a PDF file, suitable for printing.

For file-naming conventions, see The Chicagoan: Workflow.

Each intellectual entity has two representations.

For each representation

  • a .dc.xml file of descriptive metadata (title, date, identifier, relation, description, source)

6.0.1 Representation 1

  • a .txt file of structural metadata (object, page, milestone)

For each page object

  • a .tif masterfile
  • a .xml file of OCR
  • a .pos file of position data

6.0.2 Representation 2

For each intellectual entity

  • a .pdf file

A SIP shall consist of all the bulleted items, above.

7 Chopin Early Editions

http://chopin.lib.uchicago.edu/

Digitized version of the Library's collection of early printed editions of Chopin's musical compositions. The collection can be searched by a variety of data points including uniform title, genre, plate number, dedicatee, publisher, place of publication, etc., allowing scholars to study the differences between scores as they were published concurrently in different countries with variant texts.

Chopin Early Editions is an open collection: scores continue to be acquired and added.

Each intellectual entity in the collection consists of one representation.

7.1 Representation

  • a mods-chopin-NNN.xml file (where NNN is the number assigned to the intellectual entity, starting from 001) or a .mrc record (a MARC record in MARC communications format) of descriptive metadata

For each page object

  • a .tif masterfile observing the following naming convention: chopinNNN-MMM.tif, where NNN is the number assigned to the intellectual entity (starting from 001) and MMM is the number assigned to the page object (starting from 001)

A SIP shall consist of all the bulleted items, above.
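
Because page objects are numbered from 001, a deposit can be screened for gaps in the page sequence. The following sketch groups chopinNNN-MMM.tif masterfiles by entity number and reports missing page numbers and whether a mods-chopin-NNN.xml file is present. It does not look for .mrc records, because their naming convention is not given here, so a missing MODS file does not by itself mean that descriptive metadata is absent. The function name check_chopin is hypothetical.

```python
import re
from collections import defaultdict

MASTERFILE = re.compile(r"chopin(\d{3})-(\d{3})\.tif")
MODS = re.compile(r"mods-chopin-(\d{3})\.xml")


def check_chopin(filenames):
    """Report, per entity number, MODS presence and gaps in page numbering."""
    pages = defaultdict(set)
    has_mods = set()
    for name in filenames:
        match = MASTERFILE.fullmatch(name)
        if match:
            pages[match.group(1)].add(int(match.group(2)))
            continue
        match = MODS.fullmatch(name)
        if match:
            has_mods.add(match.group(1))

    report = {}
    for entity in sorted(set(pages) | has_mods):
        found = pages.get(entity, set())
        # Pages are expected to run from 001 up to the highest page found.
        expected = set(range(1, max(found, default=0) + 1))
        report[entity] = {
            "mods": entity in has_mods,
            "missing_pages": sorted(expected - found),
        }
    return report
```

The check can only detect gaps below the highest page number present. A score missing its final pages would pass unnoticed.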

8 The First American West: The Ohio River Valley, 1750-1820

A collaboration between the University of Chicago Library and the Filson Historical Society of Louisville, Kentucky, to digitize 745 rare books, pamphlets, newspapers, maps, prints, and manuscripts presenting a wide-ranging historical overview of the Ohio River Valley and trans-Appalachian West from the earliest Euro-American settlement to the passing of the frontier beyond the Mississippi River.

This collection is currently hosted by the Library of Congress.

9 The Goodspeed Manuscript Collection

http://goodspeed.lib.uchicago.edu/

"The Edgar J. Goodspeed Manuscript Collection comprises 68 early Greek, Syriac, Ethiopic, Armenian, Arabic, and Latin manuscripts ranging in date from the 5th to the 19th centuries." (From the website.)

The Goodspeed Manuscript Collection is an open collection: some manuscripts have yet to be added.

10 Collection of Manuscripts on Cultural Anthropology (MCA)

11 Map Collection

http://www.lib.uchicago.edu/e/collections/maps/

The University of Chicago Map Collection is one of the largest university map libraries in North America. Its 470,000 maps, 10,000 air photos, 2000 books, and hundreds of gigabytes of spatial data constitute a rich source of information for scholars and other users. (From the website.)

The LDR contains the digital portion of this collection.

Each intellectual entity in the collection consists of one representation.

11.1 Representation

For each intellectual entity

  • a .mrc record in MARC communications format of descriptive metadata observing the following naming convention: G4104-C6-1892-R3-NW.mrc, where G4104-C6-1892-R3-NW in this example is the call number assigned to the intellectual entity

For each page object

  • a .tif masterfile observing the following naming convention: G4104-C6-1892-R3-NW.tif, where G4104-C6-1892-R3-NW in this example is the call number assigned to the intellectual entity

A SIP shall consist of all the bulleted items, above.

Note: An intellectual entity often but not always corresponds to one digital object. Where more than one digital object is needed to represent an intellectual entity, there will be one MARC record but more than one digital masterfile.
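
The one-record, many-masterfiles case in the note above means that files cannot always be paired by identical names. The sketch below groups masterfiles under MARC records. It assumes that a masterfile belongs to a record when its name equals the record's call number or extends it with a hyphenated part suffix such as -NW. That pattern is inferred from the example identifiers in this section and is not a stated rule. The function name group_masterfiles is hypothetical.

```python
from pathlib import PurePosixPath


def _stems(filenames, ext):
    """Return the sorted file stems (name without extension) for one extension."""
    return sorted(PurePosixPath(f).stem for f in filenames if f.endswith(ext))


def group_masterfiles(filenames):
    """Map each MARC record stem to the .tif masterfiles it describes.

    Returns (groups, orphans); orphans are masterfiles with no matching record.
    """
    records = _stems(filenames, ".mrc")
    groups = {record: [] for record in records}
    orphans = []
    for tif in _stems(filenames, ".tif"):
        # Assumption: a masterfile matches a record with the same call number,
        # or one whose call number is followed by a part suffix such as "-NW".
        candidates = [r for r in records if tif == r or tif.startswith(r + "-")]
        if candidates:
            # Prefer the longest (most specific) matching call number.
            groups[max(candidates, key=len)].append(tif + ".tif")
        else:
            orphans.append(tif + ".tif")
    return groups, orphans
```

A record left with an empty list, or any orphaned masterfile, indicates an incomplete SIP under these assumptions.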

11.2 SIP

@prefix edm: <http://www.europeana.eu/schemas/edm/>.
@prefix dc: <http://purl.org/dc/elements/1.1/>.
@prefix dcterms: <http://purl.org/dc/terms/>.
@prefix erc: <http://purl.org/kernel/elements/1.1/>.
@prefix foaf: <http://xmlns.com/foaf/0.1/>.
@prefix ldr: <http://repository.lib.uchicago.edu/ldr/>.
@prefix oai: <http://www.openarchives.org/OAI/2.0/>.
@prefix owl: <http://www.w3.org/2002/07/owl#>.
@prefix ore: <http://www.openarchives.org/ore/terms/>.
@prefix premis: <info:lc/xmlns/premis-v2>.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix skos: <http://www.w3.org/2004/02/skos/core#>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
@base <http://ark.lib.uchicago.edu/ark:/61001/>.

### Note: We use the base URL for ARK identifiers in the repository.
<x0971s4d8g8wb/rem/Maps/Chi1890/G4104-C6P33-1897-B536>
dcterms:creator <http://repository.lib.uchicago.edu/>;
dcterms:created "2014-01-21T11:24:06"^^xsd:dateTime;
### add dcterms:modified if anything pointed to by the resource map
### has been modified
ore:describes <x0971s4d8g8wb/aggregation/Maps/Chi1890/G4104-C6P33-1897-B536>;
a ore:ResourceMap.

### The ProvidedCHO contains descriptive metadata about the item in
### simple Dublin Core. It indicates the logical components of the
### item. To extract these elements for Maps, do the following:
###   yaz-marcdump -i marc -o marcxml -f marc8 -t utf8 \
###     G4104-C6P33-1897-B536.mrc > G4104-C6P33-1897-B536.xml
###
### Map the output to Dublin Core using the following:
###   erlsh marc2dcttl main G4104-C6P33-1897-B536.xml 2>> errs |
###     xmllint --encode utf8 --output G4104-C6P33-1897-B536.dc.xml --format -
### or an equivalent, e.g.,
###   http://www.loc.gov/standards/marcxml/xslt/MARC21slim2OAIDC.xsl
<x0971s4d8g8wb/Maps/Chi1890/G4104-C6P33-1897-B536>
dc:creator """Blanchard, Rufus, 1821-1904.""";
dc:title """New map of Chicago showing street car lines in colors and street numbers in even hundreds /Rufus Blanchard.""";
dc:date """1897""";
dc:coverage """United States -- Illinois -- Chicago""";
dc:description """Insets: Lake shore north of Chicago -- East Chicago, Hammond, Wolf Lake area -- Harvey, Chicago Heights and other suburbs.""";
dc:format """Scale [ca. 1:21,000].""";
dc:format """1 map : col. ; 216 x 118 cm.""";
dc:identifier """Northwest quarter http://pi.lib.uchicago.edu/1001/maps/chi1890/G4104-C6P33-1897-B536-NW""";
dc:identifier """Northeast quarter http://pi.lib.uchicago.edu/1001/maps/chi1890/G4104-C6P33-1897-B536-NE""";
dc:identifier """Southwest quarter http://pi.lib.uchicago.edu/1001/maps/chi1890/G4104-C6P33-1897-B536-SW""";
dc:identifier """Southeast quarter http://pi.lib.uchicago.edu/1001/maps/chi1890/G4104-C6P33-1897-B536-SE""";
dc:publisher """Chicago : Rufus Blanchard""";
dc:subject """Local transit -- Illinois -- Chicago -- Maps.""";
dc:subject """Local transit -- Illinois -- Chicago Metropolitan Area -- Maps.""";
dc:subject """Chicago (Ill.) -- Maps.""";
dc:subject """Chicago Metropolitan Area (Ill.) -- Maps.""";
dc:type """cartographic""";
dcterms:hasPart <x0971s4d8g8wb/Maps/Chi1890/G4104-C6P33-1897-B536/G4104-C6P33-1897-B536-NW>;
dcterms:hasPart <x0971s4d8g8wb/Maps/Chi1890/G4104-C6P33-1897-B536/G4104-C6P33-1897-B536-NE>;
dcterms:hasPart <x0971s4d8g8wb/Maps/Chi1890/G4104-C6P33-1897-B536/G4104-C6P33-1897-B536-SW>;
dcterms:hasPart <x0971s4d8g8wb/Maps/Chi1890/G4104-C6P33-1897-B536/G4104-C6P33-1897-B536-SE>;
dcterms:isPartOf <http://ldr.lib.uchicago.edu/ead/ICU.MAPS.CHI1890>;
### derive edm:year from dc:date, using just the year component.
edm:year "1897";
### The following element is required. The value is fixed for all maps.
edm:type "IMAGE";
### dc:creator -> erc:who; if no dc:creator, then """:unav""";
erc:who """Blanchard, Rufus, 1821-1904.""";
### dc:title -> erc:what
erc:what """New map of Chicago showing street car lines in colors and street numbers in even hundreds /Rufus Blanchard."""; 
### dc:date -> erc:when
erc:when """1897""";
### the URI for the edm:ProvidedCHO (i.e., the subject of these assertions) -> erc:where
erc:where <x0971s4d8g8wb/Maps/Chi1890/G4104-C6P33-1897-B536>;
a edm:ProvidedCHO.

<x0971s4d8g8wb/aggregation/Maps/Chi1890/G4104-C6P33-1897-B536>
edm:aggregatedCHO <x0971s4d8g8wb/Maps/Chi1890/G4104-C6P33-1897-B536>;
edm:dataProvider "University of Chicago Library";
### edm:isShownAt is derivable from the URL portion of the dc:identifier
edm:isShownAt <http://pi.lib.uchicago.edu/1001/maps/chi1890/G4104-C6P33-1897-B536-NW>;
edm:isShownAt <http://pi.lib.uchicago.edu/1001/maps/chi1890/G4104-C6P33-1897-B536-NE>;
edm:isShownAt <http://pi.lib.uchicago.edu/1001/maps/chi1890/G4104-C6P33-1897-B536-SW>;
edm:isShownAt <http://pi.lib.uchicago.edu/1001/maps/chi1890/G4104-C6P33-1897-B536-SE>;
edm:provider "University of Chicago Library";
edm:rights <http://creativecommons.org/licenses/by-nc/4.0/>;
ore:isDescribedBy <x0971s4d8g8wb/rem/Maps/Chi1890/G4104-C6P33-1897-B536>;
a ore:Aggregation.

### For the provided MARC record
<x0971s4d8g8wb/Maps/Chi1890/G4104-C6P33-1897-B536/G4104-C6P33-1897-B536.mrc>
dc:format "application/marc";
ore:proxyFor <x0971s4d8g8wb/Maps/Chi1890/G4104-C6P33-1897-B536>;
ore:proxyIn <x0971s4d8g8wb/aggregation/Maps/Chi1890/G4104-C6P33-1897-B536>;
a ore:Proxy.

### premis:eventDateTime should be the same as dcterms:created in the
### ore:ResourceMap. If the latter contains dcterms:modified, then
### premis:eventDateTime should reflect that.
<x0971s4d8g8wb/Maps/Chi1890/G4104-C6P33-1897-B536/G4104-C6P33-1897-B536-NW.tif>
dc:format "image/tiff";
premis:objectIdentifierType "ARK";
premis:objectIdentifierValue <x0971s4d8g8wb/Maps/Chi1890/G4104-C6P33-1897-B536/G4104-C6P33-1897-B536-NW.tif>;
premis:objectCategory "file";
premis:compositionLevel 0;
premis:messageDigestAlgorithm "MD5";
premis:messageDigest "60e63c3e6d68c51608edecfed80d2ebe";
premis:messageDigestAlgorithm "SHA-256";
premis:messageDigest "59f151f37b77b7603f467c5f45fa242f11f45d0702e76c73be7cf97c0894d875";
premis:messageDigestOriginator "/sbin/sha256";
premis:size 382384244;
premis:formatName "image/tiff";
premis:originalName "G4104-C6P33-1897-B536-NW.tif";
premis:eventIdentifierType "ARK";
premis:eventIdentifierValue "x0971s4d8g8wb"; 
premis:eventType "creation";
premis:eventDateTime "2014-01-21T11:24:06"^^xsd:dateTime;
a edm:WebResource.

<x0971s4d8g8wb/Maps/Chi1890/G4104-C6P33-1897-B536/G4104-C6P33-1897-B536-NE.tif>
dc:format "image/tiff";
premis:objectIdentifierType "ARK";
premis:objectIdentifierValue <x0971s4d8g8wb/Maps/Chi1890/G4104-C6P33-1897-B536/G4104-C6P33-1897-B536-NE.tif>;
premis:objectCategory "file";
premis:compositionLevel 0;
premis:messageDigestAlgorithm "MD5";
premis:messageDigest "60e63c3e6d68c51608edecfed80d2ebe";
premis:messageDigestAlgorithm "SHA-256";
premis:messageDigest "59f151f37b77b7603f467c5f45fa242f11f45d0702e76c73be7cf97c0894d875";
premis:messageDigestOriginator "/sbin/sha256";
premis:size 382384244;
premis:formatName "image/tiff";
premis:originalName "G4104-C6P33-1897-B536-NE.tif";
premis:eventIdentifierType "ARK";
premis:eventIdentifierValue "x0971s4d8g8wb"; 
premis:eventType "creation";
premis:eventDateTime "2014-01-21T11:24:06"^^xsd:dateTime;
a edm:WebResource.

<x0971s4d8g8wb/Maps/Chi1890/G4104-C6P33-1897-B536/G4104-C6P33-1897-B536-SW.tif>
dc:format "image/tiff";
premis:objectIdentifierType "ARK";
premis:objectIdentifierValue <x0971s4d8g8wb/Maps/Chi1890/G4104-C6P33-1897-B536/G4104-C6P33-1897-B536-SW.tif>;
premis:objectCategory "file";
premis:compositionLevel 0;
premis:messageDigestAlgorithm "MD5";
premis:messageDigest "60e63c3e6d68c51608edecfed80d2ebe";
premis:messageDigestAlgorithm "SHA-256";
premis:messageDigest "59f151f37b77b7603f467c5f45fa242f11f45d0702e76c73be7cf97c0894d875";
premis:messageDigestOriginator "/sbin/sha256";
premis:size 382384244;
premis:formatName "image/tiff";
premis:originalName "G4104-C6P33-1897-B536-SW.tif";
premis:eventIdentifierType "ARK";
premis:eventIdentifierValue "x0971s4d8g8wb"; 
premis:eventType "creation";
premis:eventDateTime "2014-01-21T11:24:06"^^xsd:dateTime;
a edm:WebResource.

<x0971s4d8g8wb/Maps/Chi1890/G4104-C6P33-1897-B536/G4104-C6P33-1897-B536-SE.tif>
dc:format "image/tiff";
premis:objectIdentifierType "ARK";
premis:objectIdentifierValue <x0971s4d8g8wb/Maps/Chi1890/G4104-C6P33-1897-B536/G4104-C6P33-1897-B536-SE.tif>;
premis:objectCategory "file";
premis:compositionLevel 0;
premis:messageDigestAlgorithm "MD5";
premis:messageDigest "60e63c3e6d68c51608edecfed80d2ebe";
premis:messageDigestAlgorithm "SHA-256";
premis:messageDigest "59f151f37b77b7603f467c5f45fa242f11f45d0702e76c73be7cf97c0894d875";
premis:messageDigestOriginator "/sbin/sha256";
premis:size 382384244;
premis:formatName "image/tiff";
premis:originalName "G4104-C6P33-1897-B536-SE.tif";
premis:eventIdentifierType "ARK";
premis:eventIdentifierValue "x0971s4d8g8wb"; 
premis:eventType "creation";
premis:eventDateTime "2014-01-21T11:24:06"^^xsd:dateTime;
a edm:WebResource.
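
The derivation rules given as comments in the example above (dc:date to edm:year, dc:creator to erc:who, the URL portion of dc:identifier to edm:isShownAt, and so on) can be sketched in Python. This is a minimal sketch, not the LDR's ingest code. The function name derive_fields, the dictionary layout, and the use of the first value when an element repeats are assumptions made for illustration. The rules above do not say what to do when dc:title or dc:date is absent, so the sketch returns None in those cases.

```python
import re

UNAVAILABLE = ":unav"  # value prescribed above when there is no dc:creator


def derive_fields(dc, cho_uri):
    """Derive edm: and erc: values from simple Dublin Core values.

    `dc` maps a Dublin Core element name (without prefix) to a list of
    string values; `cho_uri` is the URI of the edm:ProvidedCHO.
    Both parameter shapes are assumptions of this sketch.
    """
    creators = dc.get("creator", [])
    titles = dc.get("title", [])
    dates = dc.get("date", [])

    # edm:year keeps only the year component of dc:date
    # (assumed here to be the first run of four digits).
    year_match = re.search(r"\d{4}", dates[0]) if dates else None

    # edm:isShownAt is the URL portion of each dc:identifier.
    shown_at = []
    for identifier in dc.get("identifier", []):
        url_match = re.search(r"https?://\S+", identifier)
        if url_match:
            shown_at.append(url_match.group(0))

    return {
        "edm:year": year_match.group(0) if year_match else None,
        "edm:type": "IMAGE",  # fixed for all maps
        "erc:who": creators[0] if creators else UNAVAILABLE,
        "erc:what": titles[0] if titles else None,
        "erc:when": dates[0] if dates else None,
        "erc:where": cho_uri,
        "edm:isShownAt": shown_at,
    }


# Usage with values taken from the example above.
record = {
    "creator": ["Blanchard, Rufus, 1821-1904."],
    "date": ["1897"],
    "identifier": [
        "Northwest quarter http://pi.lib.uchicago.edu/1001/maps/chi1890/G4104-C6P33-1897-B536-NW",
    ],
}
fields = derive_fields(record, "x0971s4d8g8wb/Maps/Chi1890/G4104-C6P33-1897-B536")
```

Keeping the derivation in one function means the erc: and edm: values cannot drift from the Dublin Core values they are copied from.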

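The premis:size and premis:messageDigest values recorded for each masterfile in the example above can be computed as sketched below. The example names /sbin/sha256 as the digest originator; this sketch uses Python's standard hashlib module instead, purely for illustration. The function name fixity_values and the keys of the returned dictionary are hypothetical.

```python
import hashlib
import os


def fixity_values(path, chunk_size=1024 * 1024):
    """Return the size in bytes and the MD5 and SHA-256 hex digests of a file."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so that large masterfiles are not loaded
        # into memory all at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return {
        "size": os.path.getsize(path),
        "MD5": md5.hexdigest(),
        "SHA-256": sha256.hexdigest(),
    }
```

Both digests are updated in a single pass over the file, which matters for masterfiles of several hundred megabytes such as those in the example.
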
12 Middle East Photograph Archive (MEPA)

A digital archive of early photographs of the Middle East. Most of the photographs date to the second half of the nineteenth century. The archive is particularly strong in photographs of nineteenth-century Cairo.

13 Rose and Chess

http://roseandchess.lib.uchicago.edu

The University of Chicago is celebrating the acquisition of a manuscript of Le Roman de la Rose (The Romance of the Rose) and its reunion with Le Jeu des échecs moralisé (The Moralized Game of Chess), a manuscript that has been in the Library’s collection since 1931. Each of these two popular medieval texts — one a courtly romance, the other a treatise on medieval society that uses the game of chess as its framework — was written and decorated in France, ca. 1365. (From the website.)

14 The Speculum Romanae Magnificentiae Digital Collection

http://speculum.lib.uchicago.edu/

The Speculum Romanae Magnificentiae is a series of late Renaissance and early Baroque prints of Rome. It is a closed collection: it is not anticipated that new prints will be added.

Each intellectual entity in the collection consists of one representation.

14.1 Representation

For each intellectual entity:

  • a work entry in speculum.xml, a VRA Core 4.0 file containing descriptive metadata for all intellectual entities in the collection

For each page object:

  • a .tif masterfile observing the following naming convention: speculum-NNNN-MMM.tif, where NNNN is the number assigned to the intellectual entity (starting from 0001) and MMM is the number assigned to the page object (starting from 001)

A SIP shall consist of all the bulleted items above.

Note: For this collection, there is a one-to-one correspondence between intellectual entity and page object (each intellectual entity is one print consisting of one image).
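
The masterfile naming convention above can be checked before a SIP is assembled. The following is a minimal sketch in Python, assuming that the convention is exactly four digits for the intellectual entity and exactly three for the page object. The function name parse_masterfile_name is hypothetical and is not part of any LDR tooling.

```python
import re

# speculum-NNNN-MMM.tif: NNNN is the intellectual entity number,
# MMM is the page object number.
MASTERFILE_NAME = re.compile(r"speculum-(\d{4})-(\d{3})\.tif")


def parse_masterfile_name(filename):
    """Return (entity_number, page_number), or None if the name does not conform."""
    match = MASTERFILE_NAME.fullmatch(filename)
    if match is None:
        return None
    entity, page = int(match.group(1)), int(match.group(2))
    # Numbering starts from 0001 and 001, so zero is not a valid number.
    if entity < 1 or page < 1:
        return None
    return entity, page
```

For example, parse_masterfile_name("speculum-0001-001.tif") returns (1, 1), while a name with the wrong number of digits, such as "speculum-1-1.tif", returns None.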

15 The University of Chicago Photographic Archive (UCPA)

http://photoarchive.lib.uchicago.edu

The University of Chicago Photographic Archive documents the history of the University of Chicago and the development of its campus, academic programs, and community life. (From the website.)

UCPA is an open collection: new accessions are expected.

Footnotes:

1 For background on the LDR, see Library Digital Repository.

2 For a technical overview of the LDR, see the LDR technical documentation.

Author: Charles Blair (c-blair@uchicago.edu)

Date: 2016-01-26
