IUG 2009 – OAI Harvesting for Encore

DeeAnn Allison

University of Nebraska-Lincoln


The goal is an integrated search that brings together the UNL catalog with all of the unique digital collections being created.


Search the catalog along with other resources.

Empowered account holders can add their own tags.

Send searches to ResearchPro to find articles.


Library has a number of multimedia collections in CONTENTdm – still images, audio, video, and data.

More than 67 collections containing over 189,000 items.

The CONTENTdm data is shared by many departments across campus, so it can be difficult for the library to know when and how often content is updated. There may also be issues with data quality, since the library doesn’t handle all data entry duties.

Also needed to pull in EAD and TEI data.


Worked with OAI-PMH2 tool from Virginia Tech.
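
For anyone unfamiliar with the protocol: an OAI-PMH harvest is essentially a series of ListRecords requests, repeated with the resumption token from each response until the repository stops returning one. The sketch below is only an illustration in Python, not the Virginia Tech tool. The base URL, set name, and sample response are made up, and it builds request URLs and parses a response rather than contacting a real server.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, token=None):
    """Build a ListRecords request URL for a repository's base URL."""
    if token:
        # resumptionToken is an exclusive argument: when present, it is sent
        # with the verb only, without metadataPrefix or set.
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) from a response."""
    root = ET.fromstring(xml_text)
    ids = [
        el.text
        for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    # An empty resumptionToken element marks the last page of results.
    token = token_el.text.strip() if token_el is not None and token_el.text else None
    return ids, token


# Hypothetical, heavily trimmed response used only for illustration.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.edu:coll/1</identifier></header></record>
    <record><header><identifier>oai:example.edu:coll/2</identifier></header></record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

ids, token = parse_list_records(SAMPLE_RESPONSE)
next_url = list_records_url("https://example.edu/oai", token=token)
```

A real harvester would fetch each URL, keep looping while a token comes back, and then hand the records to whatever cleanup step the data needs before indexing.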



The library is more interested in being a data provider than in being a search engine developer.


Brought together data from various collections through Encore.


What should we catalog vs. what should we harvest? Ideally, the library would prefer to harvest. The collection has many legacy records that were cataloged. However, many of the harvested TEI records have to be cleaned up. Harvesting is the goal, but the library is still working through some data issues.

The Latest on CONTENTdm: New Capabilities, New Possibilities

Excerpts from OCLC’s presentation. Their full presentation will be posted online after the conference.


CONTENTdm 5 will use Webalyzer for reports.


CONTENTdm will be added to the FirstSearch Base Package and will include:

Full-function CONTENTdm hosted by OCLC

3 Project Clients for collection building (items also may be added with the simple web add form or Connexion digital import)

3,000 item limit and 10 GB storage

Available May 1, 2009


Digital Collection Gateway

Improve access and presence for digital collections

Synchronize non-MARC metadata with WorldCat




1. Unicode

Full support of Unicode for importing, storing, displaying and searching Unicode languages

OCR support expanded – 184 languages

Supports the creation of digital collections in any language


2. Find Search Engine (used for WorldCat)

Find search engine integrated into CONTENTdm software

More robust capability and the ability to offer additional search features


Faceted searching

Spelling suggestions

Unicode searching

Search in any language
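
As a rough illustration of what faceted searching means in practice, the sketch below counts how many results carry each value of a metadata field, which is the kind of tally a facet sidebar displays. This is not how the Find engine works internally; the sample records and field names are invented for the example.

```python
from collections import Counter


def facet_counts(records, field):
    """Count how many result records carry each value of a multi-valued field."""
    counts = Counter()
    for record in records:
        for value in record.get(field, []):
            counts[value] += 1
    # Most frequent values first, as a facet list would usually show them.
    return counts.most_common()


# Hypothetical search results; the titles and values are illustrative only.
RESULTS = [
    {"title": "Campus aerial view", "format": ["image"], "subject": ["Campus", "Aerial photography"]},
    {"title": "Commencement address", "format": ["audio"], "subject": ["Campus"]},
    {"title": "Stadium construction", "format": ["image"], "subject": ["Athletics"]},
]

format_facet = facet_counts(RESULTS, "format")
```

Selecting a facet value then amounts to filtering the result list down to the records that carry it.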


3. Controlled Vocabularies

Adds efficiency to collection building by providing pre-loaded thesauri for cataloging

Integration with OCLC Terminologies Service

Providing nine new thesauri for CONTENTdm users


4. Reports

More robust, scalable reporting module integrated into software

Provides expanded reports

Views by collection and item


5. Flexible workflows

Added more options for approving and indexing items

New batch and subset handling of pending items

One-click approve and index on demand

Scheduling options for approve and index

Background processing


6. Registration

New registration process added during installation

One-click sends server information to OCLC

Registered servers called once a month to gather data on usage


7. Project Client

New client application replaces old version

New programming language

New, more intuitive interface

Unicode support

More robust

Project Settings Manager – Metadata Templates

Different templates for different file types

Images: JPEG, JPEG2000, TIF

PDF, compound objects, URL, audio, video

Options for generating data from different file types

Images – Colorspace, Bits per sample

PDF – Extracts content from embedded fields (application, author, date modified, date created, etc.)


8. File Transfer

Replaced FTP with custom HTTP transfer protocol

Uploading items occurs in the background

Continue working while items are uploaded

Pause process and resume later


9. EAD

New import process and display options

Custom metadata mapping

Full text searching

Search term highlighting within EAD

Multiple display views

XML web service

Users control metadata mapping and display
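
To make "custom metadata mapping" a little more concrete, here is a minimal sketch of pulling a few collection-level EAD elements into named fields. It is not CONTENTdm's import process. The field map and sample finding aid are hypothetical, and the sample assumes an EAD 2002 document without an XML namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping: target field name -> path inside the EAD <archdesc>.
FIELD_MAP = {
    "title": "did/unittitle",
    "date": "did/unitdate",
    "creator": "did/origination/persname",
    "description": "did/abstract",
}


def map_ead(xml_text, field_map=FIELD_MAP):
    """Extract mapped fields from the collection-level description of an EAD file."""
    root = ET.fromstring(xml_text)
    archdesc = root.find("archdesc")
    if archdesc is None:
        return {}
    record = {}
    for field, path in field_map.items():
        el = archdesc.find(path)
        # Skip fields whose source element is absent or empty.
        if el is not None and el.text:
            record[field] = el.text.strip()
    return record


# Illustrative finding aid, trimmed to a handful of elements.
SAMPLE_EAD = """<ead>
  <archdesc level="collection">
    <did>
      <unittitle>Example Family Papers</unittitle>
      <unitdate>1890-1920</unitdate>
      <origination><persname>Example, Jane</persname></origination>
    </did>
  </archdesc>
</ead>"""

record = map_ead(SAMPLE_EAD)
```

Changing the mapping is just a matter of editing the dictionary, which is the general idea behind letting users control how EAD elements map to display fields.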


10. Capacity

Increased capacity throughout application

Supports more collections, items for batch processing, and metadata fields.

Expand metadata schemas to incorporate preservation metadata or more custom fields

Faster batch processing and conversion from existing databases