The University of Chicago Library's Digital Repository is a preservation repository for digital content for which the University of Chicago Library has assumed curatorial responsibility. Its primary purpose is to ensure that this content persists through time. Persistence in a digital context may require transforming deposited content into new digital formats if the originally deposited formats are expected to become obsolete. At bottom, ensuring persistence requires two things: that bitstreams are physically safe (the bits have not been corrupted or destroyed), and that bitstreams are logically safe (the bits can still be converted back into usable information by a machine, such as a desktop computer, that consumes the bitstreams and renders them meaningfully). The core responsibility of digital repository management is to ensure these two kinds of persistence.
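Physical safety of a bitstream is commonly checked by comparing a checksum recorded at deposit with one recomputed later. The following is a minimal sketch of that idea, not a description of how the Repository actually verifies its content; the function names and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_physically_safe(path: Path, recorded_digest: str) -> bool:
    """True if the file's current digest matches the digest recorded at deposit."""
    return sha256_of(path) == recorded_digest.lower()
```

A check of this kind addresses only physical safety. Whether the bits can still be rendered meaningfully (logical safety) depends on the format and has to be assessed separately.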
The University of Chicago Library's Digital Repository is managed by the Digital Library Development Center (DLDC). Currently, it consists of two mirrored computer systems. Nightly, content is copied from the primary system onto a second system, which can serve as a live backup to the first in case of need. From there, content is transferred to NSIT's centralized TSM tape-storage system for disaster recovery. When content is deposited into the Repository (discussed below), it is inspected for at-risk digital formats (formats that are currently expected to become obsolete); if any are detected, content in these formats is converted into formats that are expected to persist for some time.
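The inspection step described above can be sketched as a scan of a deposit directory against a policy table of at-risk formats. This is only an illustration under stated assumptions: the table entries, the function name, and the use of file extensions are hypothetical, and the source does not say which formats the Repository treats as at-risk or how it identifies them. Production systems typically identify formats by file signature rather than by extension.

```python
from pathlib import Path

# Hypothetical policy table: at-risk extension -> preferred target extension.
# These entries are illustrative placeholders, not the Repository's actual policy.
AT_RISK_FORMATS = {
    ".wpd": ".pdf",
    ".bmp": ".tiff",
}


def plan_migrations(deposit_dir: Path) -> list[tuple[Path, str]]:
    """List (file, target extension) pairs for files in at-risk formats."""
    plan = []
    for path in sorted(deposit_dir.rglob("*")):
        # Match on the lower-cased extension so "SCAN.BMP" is also caught.
        target = AT_RISK_FORMATS.get(path.suffix.lower())
        if path.is_file() and target is not None:
            plan.append((path, target))
    return plan
```

The sketch only produces a migration plan. The conversion itself would be delegated to format-specific tools, and the originally deposited file would normally be kept alongside the converted copy.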
The Repository will also include a multi-terabyte storage array dedicated to scientific datasets from the Sloan Digital Sky Survey (SDSS). This array will be a mirror of one at Fermilab. A second mirror will exist at Johns Hopkins University.
It is assumed that digital content for which the University of Chicago Library has assumed curatorial responsibility is more or less analogous to non-digital content for which such responsibility has already been assumed. Examples include a digitally reformatted brittle book which is already part of the Library's collections, a map in digital form, an electronic thesis or dissertation, archival materials in digital form (e.g., letters, manuscripts, correspondence), and so on. However, the digital world is not precisely analogous to the physical world, and one can imagine new types of objects whose persistence the Library might want to ensure: for example, a website, a weblog, or a scientific dataset.
It is not envisioned at this time that the Digital Repository will contain all digital content for which the Library wants to assume curatorial responsibility. Instead, it is envisioned that the Repository will preserve those digital objects for which other solutions do not (at least at present) exist. Examples of content which the Repository might not house include licensed resources which are being preserved by Portico, books digitized by Google and destined for Hathi Trust, and so on. Digital Repository management thus operates in the context of, and with an awareness of, other relationships and agreements which the Library might enter into for this common purpose. Selection for the Repository must be carried out with the same context and awareness.
Digital content entering the Repository has a life-cycle: (a) Deposit; (b) Accessioning; (c) Processing.
Deposit requires an interaction between the depositor and the Repository, and has the following components.
Accessioning and Processing are internal Repository functions. Accessioning means to move the deposit from the place where it was originally transferred into a place where it can be managed. Processing means to take the deposited and accessioned content, and the description of that content, and package it according to established standards for packaging digital content, such as METS, and best practices for the application of those standards.
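As an illustration of the packaging step, the sketch below builds a very small METS document listing an object's files and a structural map that points at them. It is a minimal example under stated assumptions, not the Repository's METS profile: the function name, the `USE` and `TYPE` values, and the file identifier scheme are hypothetical, descriptive metadata sections are omitted, and the output is not validated against the METS schema.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def build_mets(object_id: str, files: list[tuple[str, str]]) -> ET.Element:
    """Build a minimal METS tree for (relative path, MIME type) pairs."""
    mets = ET.Element(f"{{{METS_NS}}}mets", {"OBJID": object_id})
    # fileSec inventories the content files of the object.
    file_sec = ET.SubElement(mets, f"{{{METS_NS}}}fileSec")
    file_grp = ET.SubElement(
        file_sec, f"{{{METS_NS}}}fileGrp", {"USE": "preservation"}
    )
    # structMap describes how the files are organized within the object.
    struct_map = ET.SubElement(
        mets, f"{{{METS_NS}}}structMap", {"TYPE": "physical"}
    )
    root_div = ET.SubElement(struct_map, f"{{{METS_NS}}}div", {"TYPE": "object"})
    for index, (href, mime_type) in enumerate(files, start=1):
        file_id = f"FILE{index:04d}"  # hypothetical identifier scheme
        file_el = ET.SubElement(
            file_grp,
            f"{{{METS_NS}}}file",
            {"ID": file_id, "MIMETYPE": mime_type},
        )
        ET.SubElement(
            file_el,
            f"{{{METS_NS}}}FLocat",
            {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": href},
        )
        # Each file pointer links the structural map back to a file entry.
        ET.SubElement(root_div, f"{{{METS_NS}}}fptr", {"FILEID": file_id})
    return mets
```

Calling `ET.tostring(build_mets("obj-1", [("page1.tif", "image/tiff")]), encoding="unicode")` serializes the package description. A fuller package would also carry the depositor's description of the content in a descriptive metadata section.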
In addition to its core function of ensuring the physical and logical persistence of the digital content it contains, and in addition to a Deposit function, a digital repository may support a Discovery function, and must support an Access (Delivery) function. Deposit, Discovery and Access functions all presuppose answers to the questions, Who may Deposit? Who may Discover? and Who may Access? Who may Deposit has been addressed above. Who may Discover and Who may Access are determined by the rights and permissions associated with the deposited content. Some materials raise the further question, When may these be accessed? For example, some archival materials are embargoed for some period of time (e.g., 25 years) before access is allowed. Because these rights and permissions issues considerably complicate automation, Discovery and Access functions for the Digital Repository are being implemented after the Deposit function. Currently, the Repository supports viewing simple lists of what has been deposited at http://repository.lib.uchicago.edu/; an RSS feed has also been implemented for convenience. Access is currently mediated: requests are sent by email to repository@lib.uchicago.edu. More sophisticated Discovery functions for materials for which there are no rights and permissions issues are being built. The first of these will take the form of an OAI-PMH (Open Archives Initiative - Protocol for Metadata Harvesting) provider, allowing services that know how to harvest from OAI-PMH providers to extract metadata for freely available content; these metadata will have been created for such content from the originally deposited metadata.
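To make the harvesting idea concrete, the sketch below shows how a harvesting service might construct an OAI-PMH `ListRecords` request. The verb and the `metadataPrefix`, `set`, and `from` arguments come from the OAI-PMH protocol, and `oai_dc` is the metadata format every provider must support. The base URL is a hypothetical placeholder, since the address of the planned provider is not given here, and the function name is an assumption for illustration.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical endpoint; the address of the planned provider is not stated.
BASE_URL = "http://repository.example.org/oai"


def list_records_url(
    base_url: str,
    metadata_prefix: str = "oai_dc",
    set_spec: Optional[str] = None,
    from_date: Optional[str] = None,
) -> str:
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec  # restrict harvesting to one set
    if from_date:
        params["from"] = from_date  # selective harvesting by datestamp
    return f"{base_url}?{urlencode(params)}"
```

A harvester would issue an HTTP GET on the resulting URL and parse the XML response. Because only metadata for freely available content would be exposed, rights and embargo checks would have to be applied on the provider side before a record is made harvestable.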
More interactive Discovery mechanisms are being considered, but any forward motion on this path must be preceded by the question, How should Repository discovery interoperate with other Library discovery services such as LENS, or discovery services that are being contemplated as part of Project Bamboo? In other words, rushing to create yet another siloed interface, instead of thinking carefully about how a repository might best interoperate with what exists or is being contemplated, would be neither well thought out nor coordinated, and potentially not cost-effective. That would be true at any time, but especially in the current economic climate, which restricts the resources available for all of the Library's initiatives.
The University of Chicago Library's Digital Repository is not a so-called institutional repository. An institutional repository as currently construed is designed to hold the scholarly or research output of an institution, specifically faculty publications or pre-publications, but institutional repositories do not necessarily guarantee the persistence through time of the content they contain. They are designed primarily for public access and discovery. Though the Digital Repository may in future support public access and discovery for some materials, its primary purpose is to ensure the preservation of digital content selected by recognized selection processes. If the University were to impose a "self-archiving mandate," discovery and access for content deposited according to that mandate would have to be provided, but the need for the Digital Repository as a place to preserve content indefinitely would not go away; either the Repository would have to provide these functions itself for these materials, or it might serve as the preservation component of systems that already provide these functions, such as DSpace, EPrints, or Fedora.