Proposal to Host and Develop
an OAI-PMH Metadata Harvesting Service for the CIC
Introduction
Scope of work to be performed by UIUC Library
Overview
Specific UIUC Library tasks & objectives
Contributions by participating CIC member libraries
Appendix One: DLIOC OAI-PMH Proposal to CIC Library Directors (3-31-03)
Executive summary
Key questions and answers
What is OAI-PMH?
What are the benefits?
Project features
Project management
OAI-PMH metadata provider technical specifications
On 31 March 2003 the CIC Digital Library Initiatives Overview Committee
(CIC-DLIOC) submitted a proposal (Attachment I) to the CIC Library Directors
to implement an experimental CIC-wide metadata harvesting service based
on the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH).
The DLIOC proposal recommended that the University of Illinois Library at
Urbana-Champaign host this metadata harvesting service and be asked
to create and develop it in consultation with participating
CIC member libraries. Ten CIC member libraries have agreed to participate
in this project and will contribute the financial resources to fund an
award to support implementation and investigation of this experimental
CIC metadata harvesting service. A memo of agreement between each participating
member library and the CIC will be signed to formalize the project.
Under the terms of the MOA, this experimental service will be funded
for three years.
The Library of the University of Illinois at Urbana-Champaign (UIUC)
proposes to create and implement an OAI-PMH metadata harvesting service
to aggregate metadata describing information resources held by participating
CIC Libraries. The UIUC Library will make this metadata aggregation available
to end-users (students, faculty, and the general public), both within
and outside of the CIC, using appropriate, state-of-the-art search and
discovery tools and browsing and navigation interfaces. In collaboration
with participating CIC member libraries, the UIUC Library will research
issues relating to consortial metadata aggregation, normalization, and
best practice authoring, will investigate search and discovery and browsing
and navigation issues that arise across a metadata aggregation describing
both freely available and restricted license content, and will provide
recommendations to the CIC Library Directors regarding long-term implementation
of OAI-PMH and the relationship between OAI-PMH and the CIC Virtual Electronic
Library (VEL). Specific UIUC Library deliverables are detailed
below.
The UIUC Library will undertake to accomplish the following tasks.
Project years 1 and 2:
- Various metadata schemas and metadata application profiles are in
use across the CIC member libraries and even within individual CIC
member libraries. The UIUC Library will develop strategies to integrate
these various metadata schemas & application profiles in a manner
that allows useful aggregation of the harvested metadata, and will help
develop, in collaboration with participating CIC member libraries,
consortial metadata best practices, schemas, and crosswalks between
schemas.
- Along these same lines, the UIUC Library will specifically research,
develop, and validate (if proven useful) metadata normalization strategies
necessary to facilitate searching and browsing of metadata harvested
in different native metadata schemas.
- The UIUC Library will implement CIC-specific value-added and harvesting
service features. Priority features will be identified in consultation
with participating CIC member libraries and with DLIOC, and will include
such things as ways to manage authorization and access effectively
for searches of restricted-access metadata.
Project years 2 and 3:
- The UIUC Library will work to make available for end-user use, by
no later than the middle of year 2 of this project, a CIC-specific metadata
search and discovery interface for searching all metadata harvested from
participating CIC member libraries. Alpha and beta versions of this interface
will be available earlier for preliminary testing by staff at participating
CIC member libraries, but the interface will not be made available for
public access until vetted by participants in a manner acceptable to
DLIOC.
- Once the interface is available, the UIUC Library will lead and
coordinate usability testing of the end-user search and discovery and
browse and navigation interfaces, both locally at UIUC and at selected
other participating CIC member library sites.
- The UIUC Library will work with CIC, DLIOC, and participating member
libraries to promote the end-user services developed under this project
to CIC librarians, faculty, students, staff, and other potential
end-users. This will
be done through journal and conference publications describing the
work, local and (when feasible) remote workshops, and implementation
of a project Website that includes appropriate descriptive and promotional
content suitable for redistribution in both print and electronic format.
- The UIUC Library will collaborate with CIC, DLIOC and participating
member libraries to help identify and prioritize CIC-wide collaborative
metadata & digital collection needs, and to present this needs
assessment to CIC member library directors.
Ongoing tasks during all 3 project years:
- The UIUC Library will routinely harvest metadata from participating
CIC member libraries using OAI-PMH on a schedule appropriate to the
frequency with which each library's metadata changes.
- The UIUC Library will provide central coordination for this experiment,
including reporting of results and observations and day-to-day operation
and maintenance of the harvesting service. This will be done with guidance
and input from the CIC-DLIOC.
- The UIUC Library will provide remote technical advice and support
for implementation of OAI by participating CIC member libraries. This
will include consultation regarding details and interpretation of the
OAI-PMH specification, suggestions regarding OAI metadata provider service
architectures, and test harvesting and validation of participating CIC
member library OAI metadata provider implementations.
- The UIUC Library will lead collaborative study and investigation
of sustainability issues and implications for a next-generation CIC VEL.
- The UIUC Library will support collaborative grant submissions and
projects, undertaken by participating CIC member libraries, which make
use of the metadata aggregation testbed created as part of this experiment.
This will include making available information about the character and
scope of the aggregation, harvesting service performance metrics, and
results of related research conducted by the UIUC Library singly or in
collaboration with other participants.
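The routine harvesting task above can be sketched as a short loop. This is
a minimal illustration, not the UIUC harvester: it assumes OAI-PMH 2.0
responses, and the function name harvest_identifiers, the fetch callable,
and the from_date parameter are hypothetical names introduced for the
example. It requests records changed since the last harvest and follows
resumptionTokens until the provider reports that the list is complete.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespace of OAI-PMH 2.0 response documents.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def harvest_identifiers(base_url, fetch, from_date=None, metadata_prefix="oai_dc"):
    """Collect record identifiers, following resumptionTokens until exhausted.

    `fetch` is any callable that takes a URL and returns the response body
    as text (hypothetical hook, so the loop can run without a network).
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # only records changed since this date
    identifiers = []
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urlencode(params)))
        for header in root.iterfind(".//oai:record/oai:header", OAI_NS):
            identifiers.append(header.findtext("oai:identifier", namespaces=OAI_NS))
        token = (root.findtext(".//oai:resumptionToken", namespaces=OAI_NS) or "").strip()
        if not token:
            return identifiers  # missing or empty token: the list is complete
        # A resumptionToken is an exclusive argument: it is sent with the verb only.
        params = {"verb": "ListRecords", "resumptionToken": token}
```

In practice fetch could wrap urllib.request.urlopen, and from_date would be
the date of the previous successful harvest of that provider, so that each
scheduled run transfers only new or changed records.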
As detailed in the MOA, participating CIC member libraries will contribute
effort equivalent to six weeks of staff member time. During year 1 of
this project, this contribution will focus on establishing and/or expanding
OAI metadata provider services and supporting UIUC-led investigations
of metadata schema crosswalks, metadata authoring best practices, and
metadata normalization. During project years 2 and 3 this contribution
will focus on user interface evaluation and usability testing, on identification
and evaluation of future consortial metadata service priorities, on promotion
of the service, and on local development issues.
The Digital Library Initiatives Overview Committee (DLIOC) recommends harvesting
Open Archives Initiative (OAI) metadata for CIC-related digital materials.
The purpose of the harvesting is to:
- improve access to selected resources at CIC member libraries;
- advertise these resources;
- prepare member institutions for future grant-mandated OAI-based
resource sharing;
- serve as a useful testbed for future grant-funded projects.
The OAI Protocol for Metadata Harvesting (OAI-PMH) also offers a way to
reinvent the CIC's Virtual Electronic Library (VEL) in order to unlock
the hidden web of resources that are available at CIC institutions.
As of January 2003, seven CIC member institutions (University of Illinois-UC,
University of Michigan, Michigan State University, University of Wisconsin,
Indiana University, University of Minnesota, and University of Chicago)
have implemented or are about to implement OAI-compliant metadata
provider services.
$6,500 in cash per year for three years from at least eight CIC institutions
will suffice for the infrastructure work, thanks to prior grant-funded
research. An additional six weeks of time for one systems staff member
will be needed to establish or extend local OAI provider services and
to collaborate on evaluation.
What is the estimated total cost of the project?
$156,000 over the course of 3 years for development, implementation,
and testing. With 8 participating institutions, this works out to $6,500
per year per participant for 3 years. See Appendix 3 for an itemized
budget.
How much local staff time will be involved at what level of expertise?
Six weeks of time for one systems staff member to be spent on bringing
up provider services, integrating access into local online resources,
developing best practices, advertising availability, and doing evaluation.
Would some participating libraries need to hire staff to obtain that
level of expertise?
No.
To whom will the service be available?
The harvested metadata will be available to all, except perhaps for
licensed materials.
What specifically will be the deliverables?
Becoming a functional OAI provider; item-level access to digital resources;
highlighting CIC resources; and building a testbed for future projects.
The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is designed
to enable resource discovery across distributed and heterogeneous collections.
Originally developed to facilitate interoperability among e-print archives,
OAI-PMH is now in use by numerous communities to expose and allow aggregation
of metadata describing a wide range of collections.
CIC member institutions have played a leading role in the design and
development of the OAI-PMH. UIUC and the University of Michigan were
among the first to establish OAI-compliant metadata harvesting
services, with funding provided by the Andrew W. Mellon Foundation. As
a result of these early efforts by member institutions, CIC is now positioned
to lead in future development and evolution of OAI-based services.
OAI offers a way to reinvent the CIC's Virtual Electronic Library (VEL)
in order to unlock the hidden web of resources that are available at
CIC institutions. In 1999 a CIC task force chaired by Bonnie MacEwan
(Penn State) called for the VEL "to provide seamless access to
both traditional and digital collections across the CIC member institutions." OAI-PMH
will provide this access to our digital collections and will strike
a balance between institutional control and centralization. Metadata
providers maintain ultimate control over their metadata and their content,
while benefiting simultaneously from access to consortium-wide metadata.
Access
OAI offers cost-effective item-level access to digital resources through
a single discovery mechanism. The current VEL system requires cataloging
each item in MARC, which raises scaling, cost and granularity issues.
One record for each image seems prohibitive, but only one record for
a whole large collection may under-represent the materials.
Awareness
A common CIC OAI-based resource will make students and faculty more aware
of resources at other CIC institutions. It will highlight the amount
and variety of digital resources available to faculty and students
at CIC institutions.
Experience
Since both the Institute of Museum and Library Services (IMLS) and the
National Science Digital Library (NSDL) have chosen OAI-PMH as a strategic
tool for uniting digital collections, CIC institutions need experience
with building OAI infrastructure.
Testbed
A CIC OAI-based resource provides a testbed for future grant-funded projects.
The scale and variety of such a CIC testbed will make it useful for
projects such as interfaces with course management systems and authenticated
access to licensed databases.
A successful OAI-PMH project needs to look beyond the technology and
collections issues to integrate the resources into the teaching and
research aspects of the participating universities. The VEL in its
current form is largely a staff tool. An OAI-PMH-extended VEL needs
also to be a tool for students and faculty. While specific goals for
this three-year project need to be sufficiently limited to be realistic,
the intent is to build an infrastructure that can meet the needs of
the whole campus at every CIC institution.
Collections
The DLIOC proposes creating a new OAI harvesting service for all CIC
digital collections. Our target audience is the teaching and research
community at our institutions. In order to be most useful to the widest
range of users, the project should include metadata for digitized content
only. The metadata may include locally created content as well as materials
purchased or licensed by all participating institutions.
The focus will encourage contributions that support the instructional
needs of the institutions and materials that are not represented at the
item level in the catalog. Since more detailed information often yields
more useful OAI records, the project will favor materials with richer
metadata. Some mapping of metadata types will be necessary as part of
the infrastructure development.
Examples of specific collections include: the Chopin collection (Chicago),
the Wright American Fiction collection (Indiana), the National Gallery
of the Spoken Word (Michigan State), Making of America (Michigan), Belgian-American
Research collection (Wisconsin), and the World War I and II Posters collection
(Minnesota).
Metadata Research and User Interface Design
This research will enhance interoperability, and point to best practices.
The aggregated OAI metadata will have varying levels of granularity.
Increasingly complex relationships, navigational modes, access conditions
and electronic formats may require richer metadata than Dublin Core.
The CIC OAI-harvester project provides an opportunity for greater focus
on interface and metadata issues than was possible in the Mellon grants.
Techniques for designing the interface will include data normalization,
index-based browsing and/or search limiting, result clustering, and data
mining, in addition to the usual layout/presentation issues. Many of
these techniques rely on the underlying metadata.
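As an illustration of how normalization can support index-based browsing,
the sketch below reduces free-text date values to a four-digit year and
groups records by decade. It is only an example under stated assumptions:
the proposal does not specify which fields will be normalized or how, and
the function names and the year pattern (1500 through 2099) are
hypothetical choices made for this sketch.

```python
import re
from collections import Counter

# Assumed pattern: a standalone four-digit year between 1500 and 2099.
YEAR = re.compile(r"(?<!\d)(1[5-9]\d\d|20\d\d)(?!\d)")


def normalize_year(raw_date):
    """Return the first plausible four-digit year in a free-text date, or None."""
    match = YEAR.search(raw_date or "")
    return match.group(1) if match else None


def browse_index_by_decade(raw_dates):
    """Count records per decade, a simple basis for a browse list."""
    decades = Counter()
    for raw in raw_dates:
        year = normalize_year(raw)
        decades[f"{year[:3]}0s" if year else "undated"] += 1
    return dict(decades)
```

For example, the values "c. 1917", "1918", and "1942-05-03" would fall
under the 1910s, 1910s, and 1940s, while a value such as "n.d." would be
counted as undated rather than guessed at.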
Interface Evaluation
Evaluation will be done at each institution, and may vary in complexity
and extent. The intent of the evaluation process is ongoing feedback
about choices and directions for the OAI metadata harvesting service.
Each participating institution will conduct usability testing of local
users using collaboratively developed tools and testing procedures.
Centrally maintained transaction logs will also be analyzed. This testing
will be done at least annually, and will provide necessary feedback
for development staff.
Future Plans
Future plans include interfaces with courseware management systems, exploring
the inclusion of finding aids that point at non-digital or non-shareable
materials, addressing authentication issues, and seeking grants to
support further work.
The management of this project needs to use existing expertise and
committee structures that include a cross-committee evaluation team.
It also needs to rely on active involvement from each participating
institution.
Both the costs and the management structure reflect these principles.
Costs
DLIOC members have consulted about the support needs for the harvesting
tools. Significant development of these tools has already been done through
the Mellon Foundation grant. UIUC believes that $6,500 in cash per year
for three years from at least eight of the CIC institutions will suffice
for the infrastructure work. The money would be held in a CIC account
and made available as needed. It will be used to pay staff time and
other infrastructure development costs, primarily at UIUC, which will
also contribute local and grant-funded resources. The work includes
coordination efforts, customizing the search engine, adding new fields,
normalizing data, feedback to data providers, and writing usability
testing scripts. Michigan will continue to support OAI in DLXS.
In addition the DLIOC recommends that participating institutions contribute
at least six weeks of time for one systems staff member, which would
be spent on bringing up provider services, integrating access into local
online resources, developing best practices, advertising availability,
and doing evaluation.
Structures
The DLIOC recommends that the Directors appoint a management team that
oversees the financial and administrative aspects of the project. The
DLIOC as a whole should remain closely involved with the implementation.
The DLIOC also recommends a cross-sectional advisory team with representation
from CIC committees on courseware, public services, reference, collections,
and technical services. This team could work on interface and evaluation
issues, and could contribute to a final report. Timothy Cole at UIUC
will oversee the day-to-day work of the project.
Project Evaluation and Dissemination
The DLIOC will evaluate the progress and success of the project annually,
in consultation with the cross-sectional team. Criteria will include
actual use, content growth, interface development, user testing results,
and the establishment of best practices. DLIOC members will also
disseminate the results within their institutions and more broadly
through conferences and publications in the library and digital library
world.
By design, the technical barrier and required effort for metadata
providers wishing to conform to the Open Archives Initiative Protocol
for Metadata Harvesting (OAI-PMH) are low. OAI-PMH allows institutions
great flexibility in how they implement the protocol and in how they
integrate OAI-PMH functionality with existing metadata creation
workflows.
OAI-PMH is built on top of the ubiquitous HTTP and XML standards.
A wide range of Open Source tools for creating new OAI-PMH metadata
provider implementations is available on the SourceForge Website
(http://www.sourceforge.net/).
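Because the protocol is plain HTTP plus XML, a request is simply a URL
with a verb argument and a response is an XML document. The sketch below
shows this under stated assumptions: it follows OAI-PMH 2.0, and the base
URL, record identifier, sample response, and the function names
build_request and extract_titles are invented for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# The six request verbs defined by OAI-PMH 2.0.
VERBS = {"Identify", "ListMetadataFormats", "ListSets",
         "ListIdentifiers", "ListRecords", "GetRecord"}
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def build_request(base_url, verb, **arguments):
    """Build an HTTP GET URL for one OAI-PMH request."""
    if verb not in VERBS:
        raise ValueError(f"not an OAI-PMH verb: {verb}")
    return base_url + "?" + urlencode({"verb": verb, **arguments})


# Hand-written sample of a GetRecord response carrying unqualified Dublin Core.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <GetRecord>
    <record>
      <header><identifier>oai:example.org:poster-17</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample poster</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>"""


def extract_titles(xml_text):
    """Return every dc:title value found in an OAI-PMH response."""
    return [e.text for e in ET.fromstring(xml_text).iterfind(".//dc:title", NS)]
```

A harvester would issue build_request("http://example.org/oai",
"GetRecord", identifier="oai:example.org:poster-17",
metadataPrefix="oai_dc") and pass the body it receives to extract_titles.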
The essential requirements for implementing an OAI-PMH metadata provider
are:
- A Web server with CGI capability (may be used for other functions
simultaneously)
- XML validation and parsing software
- Accessible metadata with defined mappings to Dublin Core (DC)
- Staff time to create, adapt, & maintain the CGI scripts
required to tie these components together
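The "defined mappings to DC" requirement above amounts to a crosswalk from
local field names to Dublin Core elements. The following sketch shows one
way such a mapping could be applied; the local field names, the CROSSWALK
table, and the function to_oai_dc are hypothetical, and a production
record would also carry the xsi:schemaLocation attribute that this
example omits.

```python
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)

# Hypothetical crosswalk: local field name -> unqualified Dublin Core element.
CROSSWALK = {"caption": "title", "artist": "creator", "year": "date", "topic": "subject"}


def to_oai_dc(local_record, crosswalk=CROSSWALK):
    """Serialize a local record (a dict) as an oai_dc:dc XML fragment."""
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for field, value in local_record.items():
        element = crosswalk.get(field)
        if element is None:
            continue  # local fields without a defined mapping are left out
        values = value if isinstance(value, list) else [value]
        for v in values:  # repeatable fields become repeated DC elements
            ET.SubElement(root, f"{{{DC}}}{element}").text = str(v)
    return ET.tostring(root, encoding="unicode")
```

Keeping the mapping in a table rather than in code is a design choice that
lets each institution adjust its crosswalk without touching the script
that serves the records.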
While there is obviously some incremental demand on metadata provider
IT infrastructure, the protocol allows implementers to manage these
additional demands as appropriate to their situation:
- The metadata provider controls the maximum number of metadata records
sent at one time in response to any OAI-PMH request.
- The metadata provider can set a minimum interval between servicing
requests from any one harvester.
- The metadata provider can define how many simultaneous harvests it
services at any time.
- The metadata provider can terminate an in-progress harvest at any time,
and can specify the interval the harvester should wait before retrying.
- The metadata provider can use standard Web server functionality to block
or limit who can harvest (e.g., by IP address or by paired
userid - password strings).
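Two of the controls listed above can be sketched in a few lines. The
example assumes OAI-PMH 2.0 conventions (resumption tokens for partial
lists, and an HTTP 503 status with a Retry-After header to ask a
harvester to wait); the function names and the use of a plain offset as
the token are simplifications chosen for illustration, not a required
design.

```python
def list_records_page(records, resumption_token=None, max_records=100):
    """Return (batch, next_token); next_token is None once the list is complete.

    The token here is simply the offset of the next record, as a string.
    """
    try:
        start = int(resumption_token) if resumption_token else 0
    except ValueError:
        raise ValueError("badResumptionToken") from None
    if start < 0 or start > len(records):
        raise ValueError("badResumptionToken")
    batch = records[start:start + max_records]
    next_start = start + len(batch)
    next_token = str(next_start) if next_start < len(records) else None
    return batch, next_token


def busy_response(retry_after_seconds):
    """HTTP status and headers asking a harvester to come back later."""
    return 503, {"Retry-After": str(retry_after_seconds)}
```

With max_records set to 100, a harvest of 250 records is served as three
responses, and the provider can lower that limit at any time to reduce
the load each request places on its server.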
Typical staff time required to implement OAI-PMH (assuming accessible
metadata with a defined DC mapping) is 2 - 4 person-weeks at most of
initial development time (programming and customizing Open Source tools,
writing supplemental CGI scripts, creating XSLT or other metadata
transforming utilities). The technical level is such that the bulk of
this work may be done by graduate assistants or other part-time
programming staff. A small amount of ongoing maintenance is required to
deal with errors in metadata, ongoing modifications in metadata workflow,
and character encoding issues, and to respond to bug reports from
harvesting agents. This work is typically incorporated into existing
system administration workflow.
Illustrative implementations (many others are possible):
Example 1:
- Apache / Tomcat Web server (Open Source) running on a Linux server.
- Metadata stored in a MySQL database (Open Source).
- Java servlets running as extensions of the Tomcat Web server component
connect to the MySQL database and service OAI-PMH requests.
Example 2:
- Microsoft Internet Information Server running on a Windows 2000 server.
- Metadata stored as XML files in MODS or MARCXML format.
- Active Server Page scripts running as extensions of the Web server
service OAI-PMH requests, using XSLT stylesheets to transform metadata
files as required.