The CIC Metadata Portal

CIC Proposal

Proposal to Host and Develop an OAI-PMH Metadata Harvesting Service for the CIC

Introduction

Scope of work to be performed by UIUC Library

Overview
Specific UIUC Library tasks & objectives
Contributions by participating CIC member libraries

Appendix One: DLIOC OAI-PMH Proposal to CIC Library Directors (3-31-03)

Executive summary
Key questions and answers
What is OAI-PMH?
What are the benefits?
Project features
Project management
OAI-PMH metadata provider technical specifications

Introduction

On 31 March 2003 the CIC Digital Library Initiatives Overview Committee (CIC-DLIOC) submitted a proposal (Attachment I) to the CIC Library Directors to implement an experimental CIC-wide metadata harvesting service based on the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). The proposal recommended that the University of Illinois Library at Urbana-Champaign host this service and be asked to create and develop it in consultation with participating CIC member libraries. Ten CIC member libraries have agreed to participate in this project and will contribute the financial resources to fund an award supporting implementation and investigation of the experimental service. A memorandum of agreement (MOA) between each participating member library and the CIC will be signed to formalize the project. Under the terms of the MOA, the service will be funded for three years.

Scope of work to be performed by UIUC Library

Overview

The Library of the University of Illinois at Urbana-Champaign (UIUC) proposes to create and implement an OAI-PMH metadata harvesting service to aggregate metadata describing information resources held by participating CIC Libraries. The UIUC Library will make this metadata aggregation available to end-users (students, faculty, and the general public), both within and outside of the CIC, using appropriate, state-of-the-art search and discovery tools and browsing and navigation interfaces. In collaboration with participating CIC member libraries, the UIUC Library will research issues relating to consortial metadata aggregation, normalization, and best practice authoring, will investigate search and discovery and browsing and navigation issues that arise across a metadata aggregation describing both freely available and restricted license content, and will provide recommendations to the CIC Library Directors regarding long-term implementation of OAI-PMH and the relationship between OAI-PMH and the CIC Virtual Electronic Library (VEL). Specific UIUC Library deliverables are detailed below.

Specific UIUC Library Tasks & Objectives

The UIUC Library will undertake to accomplish the following tasks.

Project years 1 and 2:

- Various metadata schemas and metadata application profiles are in use across the CIC member libraries and even within individual CIC member libraries. The UIUC Library will develop strategies to integrate these schemas and application profiles in a manner that allows useful aggregation of the harvested metadata, and will help develop, in collaboration with participating CIC member libraries, consortial metadata best practices, schemas, and crosswalks between schemas.

- Along these same lines, the UIUC Library will specifically research, develop, and validate (if proven useful) metadata normalization strategies necessary to facilitate searching and browsing of metadata harvested in different native metadata schemas.

- The UIUC Library will implement CIC-specific value-added and harvesting service features. Priority features will be identified in consultation with participating CIC member libraries and with DLIOC, and will include, for example, ways to effectively manage authorization and access for the searching of restricted-access metadata.
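
As a rough illustration of the schema crosswalks mentioned in the first task above, the following Python sketch maps fields from a local record to unqualified Dublin Core elements. The local field names, the mapping table, and the function name are hypothetical and chosen only for illustration; they do not represent the crosswalks the project will actually develop.

```python
# Hypothetical crosswalk: local field name -> unqualified Dublin Core element.
# Real crosswalks would be agreed with the participating libraries.
CROSSWALK = {
    "main_title": "title",
    "author": "creator",
    "topic": "subject",
    "pub_date": "date",
}


def to_dublin_core(record):
    """Map a local record (field -> value or list of values) to DC elements."""
    dc = {}
    for field, value in record.items():
        element = CROSSWALK.get(field)
        if element is None:
            continue  # fields with no defined mapping are dropped
        values = value if isinstance(value, list) else [value]
        dc.setdefault(element, []).extend(values)
    return dc


# Example: "shelf" has no mapping, so it does not appear in the result.
sample = {"main_title": "Poster collection", "topic": ["posters", "war"], "shelf": "A1"}
dc_record = to_dublin_core(sample)
```

A table-driven mapping of this kind keeps the crosswalk itself as data, which may make it easier for participating libraries to review and revise.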

Project years 2 and 3:

- The UIUC Library will work to make available for end-user use, by no later than the middle of year 2 of this project, a CIC-specific metadata search and discovery interface for searching all metadata harvested from participating CIC member libraries. Alpha and beta versions of this interface will be available earlier for preliminary testing by staff at participating CIC member libraries, but the interface will not be made available for public access until vetted by participants in a manner acceptable to DLIOC.

- Once the interface is available, the UIUC Library will lead and coordinate usability testing of end-user search and discovery and browse and navigation interfaces, both locally at UIUC and at selected other participating CIC member library sites.

- The UIUC Library will work with CIC, DLIOC, and participating member libraries to promote the end-user services developed under this project to CIC librarians, faculty, students, staff, and other potential end-users. This will be done through journal and conference publications describing the work, local and (when feasible) remote workshops, and implementation of a project Website that includes appropriate descriptive and promotional content suitable for redistribution in both print and electronic format.

- The UIUC Library will collaborate with CIC, DLIOC and participating member libraries to help identify and prioritize CIC-wide collaborative metadata & digital collection needs, and to present this needs assessment to CIC member library directors.

Ongoing tasks during all 3 project years:

- The UIUC Library will routinely harvest metadata from participating CIC member libraries using OAI-PMH on a schedule appropriate to frequency of metadata change and updating.

- The UIUC Library will provide central coordination for this experiment, including reporting of results and observations and day-to-day operation and maintenance of the harvesting service. This will be done with guidance and input from the CIC-DLIOC.

- The UIUC Library will provide technical advice and support (remote) for implementation of OAI by participating CIC member libraries. This will include consultation regarding details and interpretation of the OAI-PMH specification, suggestions regarding OAI metadata provider service architectures, and test harvesting and validation of participating CIC member library OAI metadata provider implementations.

- The UIUC Library will lead collaborative study and investigation of sustainability issues and implications for a next-generation CIC VEL.

- The UIUC Library will support collaborative grant submissions and projects, undertaken by participating CIC member libraries, that make use of the metadata aggregation testbed created as part of this experiment. This will include making available information about the character and scope of the aggregation, harvesting service performance metrics, and results of related research conducted by the UIUC Library singly or in collaboration with other participants.

Contributions by participating CIC member libraries

As detailed in the MOA, participating CIC member libraries will contribute effort equivalent to six weeks of staff member time. During year 1 of this project, this contribution will focus on establishing and/or expanding OAI metadata provider services and supporting UIUC-led investigations of metadata schema crosswalks, metadata authoring best practices, and metadata normalization. During project years 2 and 3 this contribution will focus on user interface evaluation and usability testing, on identification and evaluation of future consortial metadata service priorities, on promotion of the service, and on local development issues.

Appendix One: DLIOC OAI-PMH Proposal to CIC Library Directors (3/31/03)

Executive Summary

The Digital Library Initiatives Overview Committee (DLIOC) recommends harvesting Open Archives Initiative (OAI) metadata for CIC-related digital materials. The purpose of the harvesting is to:

  • improve access to selected resources at CIC member libraries;
  • advertise these resources;
  • prepare member institutions for future grant-mandated OAI-based resource sharing;
  • serve as a useful testbed for future grant-funded projects.

OAI Protocol for Metadata Harvesting (OAI-PMH) also offers a way to reinvent the CIC's Virtual Electronic Library (VEL) in order to unlock the hidden web of resources that are available at CIC institutions.

As of January 2003, seven CIC member institutions (University of Illinois-UC, University of Michigan, Michigan State University, University of Wisconsin, Indiana University, University of Minnesota, and University of Chicago) have implemented or are about to implement OAI-compliant metadata provider services.

$6500 in cash per year for three years from at least eight CIC institutions will suffice for the infrastructure work, thanks to prior grant-funded research. An additional six weeks of time for one systems staff member will be needed to establish or extend local OAI provider services and to collaborate on evaluation.

Key Questions and Answers

What is the estimated total cost of the project?
$156,000 over the course of 3 years for development, implementation, and testing. With 8 participating institutions, this works out to $6,500 per year per participant for 3 years. See Appendix 3 for an itemized budget.

How much local staff time will be involved at what level of expertise?
Six weeks of time for one systems staff member to be spent on bringing up provider services, integrating access into local online resources, developing best practices, advertising availability, and doing evaluation.

Would some participating libraries need to hire staff to obtain that level of expertise?
No.

To whom will the service be available?
The harvested metadata will be available to all, except perhaps for metadata describing licensed materials.

What specifically will be the deliverables?
Becoming a functional OAI provider; item-level access to digital resources; highlighting CIC resources; and building a testbed for future projects.

What is OAI-PMH?

The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is designed to enable resource discovery across distributed and heterogeneous collections. Originally developed to facilitate interoperability among e-print archives, OAI-PMH is now in use by numerous communities to expose and allow aggregation of metadata describing a wide range of collections.
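
To make the protocol more concrete: a harvester issues ordinary HTTP requests to a provider's base URL, naming one of six protocol verbs (Identify, ListMetadataFormats, ListSets, ListIdentifiers, ListRecords, GetRecord) together with arguments such as metadataPrefix or from. The Python sketch below only builds such request URLs; the base URL and the helper name are hypothetical and used for illustration.

```python
from urllib.parse import urlencode

BASE_URL = "http://example.org/oai"  # hypothetical provider base URL


def oai_request(verb, **params):
    """Build an OAI-PMH request URL for the given verb and arguments."""
    query = {"verb": verb}
    for key, value in params.items():
        # 'from' is a Python keyword, so callers pass it as 'from_'.
        query[key.rstrip("_")] = value
    return BASE_URL + "?" + urlencode(query)


# Ask the provider to describe itself.
identify_url = oai_request("Identify")

# Ask for Dublin Core records added or changed since 1 January 2003.
records_url = oai_request("ListRecords", metadataPrefix="oai_dc", from_="2003-01-01")
```

The from argument is what allows a harvester to request only records that have changed since its previous visit.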

CIC member institutions have played a leading role in the design and development of the OAI-PMH. UIUC and the University of Michigan were among the first to establish OAI-compliant metadata harvesting services, with funding provided by the Andrew W. Mellon Foundation. As a result of these early efforts by member institutions, CIC is now positioned to lead in future development and evolution of OAI-based services.

What are the benefits?

OAI offers a way to reinvent the CIC's Virtual Electronic Library (VEL) in order to unlock the hidden web of resources that are available at CIC institutions. In 1999 a CIC task force chaired by Bonnie MacEwan (Penn State) called for the VEL "to provide seamless access to both traditional and digital collections across the CIC member institutions." OAI-PMH will provide this access to our digital collections and will strike a balance between institutional control and centralization. Metadata providers maintain ultimate control over their metadata and their content, while benefiting simultaneously from access to consortium-wide metadata.

Access

OAI offers cost-effective item-level access to digital resources through a single discovery mechanism. The current VEL system requires cataloging each item in MARC, which raises scaling, cost and granularity issues. One record for each image seems prohibitive, but only one record for a whole large collection may under-represent the materials.

Awareness

A common CIC OAI-based resource will make students and faculty more aware of resources at other CIC institutions. It will highlight the amount and variety of digital resources available to faculty and students at CIC institutions.

Experience

Since both the Institute of Museum and Library Services (IMLS) and the National Science Digital Library (NSDL) have chosen OAI-PMH as a strategic tool for uniting digital collections, CIC institutions need experience with building OAI infrastructure.

Testbed

A CIC OAI-based resource provides a testbed for future grant-funded projects. The scale and variety of such a CIC testbed will make it useful for projects such as interfaces with course management systems and authenticated access to licensed databases.

Project Features

A successful OAI-PMH project needs to look beyond the technology and collections issues to integrate the resources into the teaching and research aspects of the participating universities. The VEL in its current form is largely a staff tool. An OAI-PMH-extended VEL also needs to be a tool for students and faculty. While specific goals for this three-year project need to be sufficiently limited to be realistic, the intent is to build an infrastructure that can meet the needs of the whole campus at every CIC institution.

Collections

The DLIOC proposes creating a new OAI harvesting service for all CIC digital collections. Our target audience is the teaching and research community at our institutions. In order to be most useful to the widest range of users, the project should include metadata for digitized content only. The metadata may include locally created content as well as materials purchased or licensed by all participating institutions.

The focus will encourage contributions that support the instructional needs of the institutions and materials that are not represented at the item level in the catalog. Since more detailed information often yields more useful OAI records, the project will favor materials with richer metadata. Some mapping of metadata types will be necessary as part of the infrastructure development.

Examples of specific collections include: the Chopin collection (Chicago), the Wright American Fiction collection (Indiana), the National Gallery of the Spoken Word (Michigan State), Making of America (Michigan), Belgian-American Research collection (Wisconsin), and the World War I and II Posters collection (Minnesota).

Metadata Research and User Interface Design

This research will enhance interoperability, and point to best practices. The aggregated OAI metadata will have varying levels of granularity. Increasingly complex relationships, navigational modes, access conditions and electronic formats may require richer metadata than Dublin Core.

The CIC OAI-harvester project provides an opportunity for greater focus on interface and metadata issues than was possible in the Mellon grants. Techniques for designing the interface will include data normalization, index-based browsing and/or search limiting, result clustering, and data mining, in addition to the usual layout/presentation issues. Many of these techniques rely on the underlying metadata.
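
As one small example of what data normalization can mean in practice, the Python sketch below reduces free-text dates to a four-digit year and puts subject strings into a uniform form, so that records from different providers can share one browse index. The rules and function names are assumptions made for illustration, not the normalization strategies the project will adopt.

```python
import re

# A four-digit year (1000-2099) that is not part of a longer run of digits.
YEAR_PATTERN = re.compile(r"(?<!\d)(1\d{3}|20\d{2})(?!\d)")


def normalize_date(raw):
    """Return the first four-digit year found in a free-text date, or None."""
    match = YEAR_PATTERN.search(raw)
    return match.group(1) if match else None


def normalize_subject(raw):
    """Trim whitespace, drop a trailing period, and lower-case a subject string."""
    return raw.strip().rstrip(".").strip().lower()


# Dates as they might appear in harvested records from different providers.
years = [normalize_date(d) for d in ["c1918.", "[ca. 1942]", "March 31, 2003", "undated"]]
```

Values that cannot be normalized (here, "undated") are returned as None rather than guessed, so they can be reported back to the data provider.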

Interface Evaluation

Evaluation will be done at each institution, and may vary in complexity and extent. The intent of the evaluation process is ongoing feedback about choices and directions for the OAI metadata harvesting service. Each participating institution will conduct usability testing of local users using collaboratively developed tools and testing procedures. Centrally maintained transaction logs will also be analyzed. This testing will be done at least annually, and will provide necessary feedback for development staff.

Future Plans

Future plans include interfaces with courseware management systems, exploring the inclusion of finding aids that point at non-digital or non-shareable materials, addressing authentication issues, and seeking grants to support further work.

Project Management

The management of this project needs to use existing expertise and committee structures that include a cross-committee evaluation team. It also needs to rely on active involvement from each participating institution. Both the costs and the management structure reflect these principles.

Costs

DLIOC members have consulted about the support needs for the harvesting tools. Significant development on these tools has already been done through the Mellon Foundation grant. UIUC believes that $6500 in cash per year for three years from at least eight of the CIC institutions will suffice for the infrastructure work. The money would be held in a CIC account and made available as needed. It will be used to pay for staff time and other infrastructure development costs, primarily at UIUC, which will also contribute local and grant-funded resources. The work includes coordination efforts, customizing the search engine, adding new fields, normalizing data, providing feedback to data providers, and writing usability testing scripts. Michigan will continue to support OAI in DLXS.

In addition the DLIOC recommends that participating institutions contribute at least six weeks of time for one systems staff member, which would be spent on bringing up provider services, integrating access into local online resources, developing best practices, advertising availability, and doing evaluation.

Structures

The DLIOC recommends that the Directors appoint a management team that oversees the financial and administrative aspects of the project. The DLIOC as a whole should remain closely involved with the implementation.

The DLIOC also recommends a cross-sectional advisory team with representation from CIC committees on courseware, public services, reference, collections, and technical services. This team could work on interface and evaluation issues, and could contribute to a final report. Timothy Cole at UIUC will oversee the day-to-day work of the project.

Project Evaluation and Dissemination

The DLIOC will evaluate the progress and success of the project annually, in consultation with the cross-sectional team. Criteria will include actual use, content growth, interface development, user testing results, and the establishment of best practices. DLIOC members will also disseminate the results within their institutions and more broadly through conferences and publications in the library and digital library world.

OAI-PMH Metadata Provider Technical Specifications

By design, the technical barrier and required effort for metadata providers wishing to conform to the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) are low. OAI-PMH allows institutions great flexibility in how they choose to implement the protocol and in how they choose to integrate OAI-PMH functionality with existing metadata creation workflow. OAI-PMH is built on top of the ubiquitous HTTP and XML standards. A wide range of Open Source tools for creating new OAI-PMH metadata provider implementations is available on the SourceForge Website (http://www.sourceforge.net/).

The essential requirements for implementing an OAI-PMH metadata provider are:

  • A Web server with CGI capability (may be used for other functions simultaneously)
  • XML validation and parsing software
  • Accessible metadata with defined mappings to DC
  • Staff time to create, adapt, & maintain CGI scripts required to tie these components together
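
To give a sense of the XML involved, the Python sketch below serializes a set of Dublin Core values into the oai_dc metadata format that OAI-PMH providers are expected to support. It uses only the standard library; the function name and sample values are hypothetical, and the xsi:schemaLocation attribute that a schema-valid record would also carry is omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for the oai_dc container and the Dublin Core elements.
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)


def build_oai_dc(dc_fields):
    """Serialize a dict of DC element -> list of values as an oai_dc record."""
    root = ET.Element("{%s}dc" % OAI_DC_NS)
    for element, values in dc_fields.items():
        for value in values:
            child = ET.SubElement(root, "{%s}%s" % (DC_NS, element))
            child.text = value
    return ET.tostring(root, encoding="unicode")


# Sample values only; a provider would read these from its own metadata store.
xml_text = build_oai_dc({"title": ["World War I poster"], "date": ["1918"]})
```

In a working provider, a fragment like this would be wrapped in the protocol's response envelope and returned by the CGI script that answers the harvester's request.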

While there is obviously some incremental demand on metadata provider IT infrastructure, the protocol allows implementers to manage these additional demands as appropriate to their situation:

  • Metadata provider controls maximum number of metadata records sent at one time in response to any OAI-PMH request.
  • Metadata provider can set minimum interval between servicing requests from any one harvester.
  • Metadata provider can define how many simultaneous harvests it services at any time.
  • Metadata provider can terminate an in-progress harvest at any time, and can specify the interval the harvester should wait before retrying.
  • Metadata provider can use standard Web server functionality to block or limit who can harvest (i.e., by either IP address or paired userid - password strings).
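
The first control listed above, limiting how many records are sent per response, works through the protocol's resumption tokens: the provider returns one batch plus a token, and the harvester sends the token back to obtain the next batch. The Python sketch below simulates that exchange in memory. The batch size, the function names, and the use of a list offset as the token are assumptions for illustration; in the protocol the token is an opaque string whose format the provider chooses.

```python
BATCH_SIZE = 2  # hypothetical provider-chosen maximum records per response


def list_records_page(records, resumption_token=None, batch_size=BATCH_SIZE):
    """Return (batch, next_token); next_token is None when the list is complete."""
    # Here the token is simply the offset of the next record (an assumption).
    start = int(resumption_token) if resumption_token else 0
    batch = records[start:start + batch_size]
    next_start = start + batch_size
    next_token = str(next_start) if next_start < len(records) else None
    return batch, next_token


def harvest_all(records):
    """Simulate a harvester following resumption tokens until none is returned."""
    harvested, token = [], None
    while True:
        batch, token = list_records_page(records, token)
        harvested.extend(batch)
        if token is None:
            return harvested


first_batch, first_token = list_records_page(["a", "b", "c", "d", "e"])
```

The other controls in the list (minimum intervals and retry delays) are typically signalled with standard HTTP mechanisms, such as a 503 status with a Retry-After header, rather than in the record list itself.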

Typical staff time required to implement OAI-PMH (assuming accessible metadata with a defined DC mapping) is at most 2 to 4 person-weeks of initial development (programming and customizing Open Source tools, writing supplemental CGI scripts, and creating XSLT or other metadata-transforming utilities). The technical level is such that the bulk of this work may be done by graduate assistants or other part-time programming staff. A small amount of ongoing maintenance is required to deal with errors in metadata, ongoing modifications in metadata workflow, and character encoding issues, and to respond to bug reports from harvesting agents. This work is typically incorporated into existing system administration workflow.

Illustrative implementations (many others are possible):

Example 1:

  • Apache / Tomcat Web server (Open Source) running on a Linux server.
  • Metadata stored in a MySQL database (Open Source).
  • Java servlets running as extensions of the Tomcat Web server component connect to the MySQL database and service OAI-PMH requests.

Example 2:

  • Microsoft Internet Information Server running on a Windows 2000 server.
  • Metadata stored as XML files in MODS or MARCXML format.
  • Active Server Page scripts running as extensions of the Web server service OAI-PMH requests, using XSLT stylesheets to transform metadata files as required.