Archive for the ‘digital libraries’ Category

This session examined the experiences of three different schools in maintaining and marketing their institutional repositories.

 

Michelle Harper

OCLC Moderator

Director of Special Collections

 

Sarah L. Shreeves

University of Illinois at Urbana-Champaign

Coordinator for the Illinois Digital Environment for Access to Learning and Scholarship (IDEALS)

http://www.ideals.illinois.edu/

 

The library manages the front-end and services portions of the project. Campus IT handles the infrastructure. This project has always had strong administrative support. This is a DSpace-based repository, and it houses content from departments across campus. There have been 1.2 million downloads from the repository.

 

They are shifting their thinking from a repository-centered focus to a services-centered focus. The idea of just filling a box with stuff is a dead end. They think in terms of both services and collections. This is a substantial shift in thinking, and something of a trend in the field.

 

IDEALS is leading the way in the library’s other digital preservation efforts. Existing policies and procedures can be applied where applicable. Technical reports and occasional papers are added to IDEALS. It’s becoming more fully integrated into campus workflows.

 

They have tried to eliminate bureaucracy and enable departments to add content on their own. When collections are set up for departments, the departments are given free rein to develop their own policies and procedures. The managers have tried to help weave it into the fabric of the university.

 

They are also looking at serving non-traditional users and special missions of the university.

 

They try to think about the roles of their repository: access, dissemination, and long-term preservation.

 

MacKenzie Smith

MIT Libraries

Associate Director for Technology

http://dspace.mit.edu/

 

The managers of MIT’s project think of the institutional repository as part of the library’s mission of preserving university-generated content.

 

The repository has 40-50 thousand documents of high-quality content.

The success of a repository depends on how well you define, use, and market the repository.

 

If you look at it simply as a piece of infrastructure, it’s cheaper than your link resolver. If you look at it as a suite of services that are critical to the future of your organization, it has to be sustainable.

 

Conflating the institutional repository with things like open access is a mistake. You cannot pin the success of one on the success of another. Just because one is (or is not) successful, that doesn’t necessarily apply to the other.

 

How are we going to define success in terms of financial sustainability?

Are libraries comfortable with the blurring of the lines between libraries, museums, and archives?

What are the added value services?

Is it realistic for libraries to be in charge of their own technology fate?

Is it even useful to talk about institutional repositories outside of the context of libraries in general?

 

People don’t visit IRs just to see what an institution produced recently. They come because of a subject or because of types of content.

 

Catherine Mitchell

Director, Publishing Services

California Digital Library

http://www.cdlib.org/

 

What might a sustainable IR look like?

Viable financial model

Interoperable design

Relevant

 

Even within one’s own infrastructure, the IR should be able to connect to other infrastructures.

 

We have to understand the nature of value in relation to academic research.

Who are the users and what do they need? It’s not enough to just build a place for stuff.

 

The managers realized that they weren’t engaging with their users. The IR only had 30,000 total documents while the university was producing more than 26,000 documents per year.

 

Ideological and practical irrelevance

Few on campus understood the term open access

Fewer seemed to understand or feel comfortable with the term repository

Virtually no one had heard of eScholarship (the brand name of this institution’s repository)

 

There was a need for support for:

Campus-based journal and monographic publishing programs

Multimedia publications

Data sets

Conferences

Non-traditional publications

 

In other words . . . Needs=value.

 

A rebranding initiative was conducted with a new focus on:

Providing a targeted and compelling publishing services infrastructure

Integrating those services into the scholarly research lifecycle

 

IR Deposit is a natural by-product of services rendered, rather than an end in itself.

 

Reinventing the IR as open access publisher

 

eScholarship Site Redesign

Emphasize services, not policy

Contextualization of content: engaging with problems of authority and legitimacy

Enhance research tools and publication display

Remain true to our development philosophy of simplicity, generalizability, and scalability

 

Enhanced publishing services

Journals, books, conference papers, seminars

 

Marketing: What’s in it for the faculty?

Keep your copyright

Reach more readers

Publish when you want to

Protect your work’s future

 

Value Propositions

To enable scholars to have direct control over the creation and dissemination of the full range of their work.

To provide solutions for current and emerging scholarly publishing needs within UC that aren’t met by traditional publishing models.

To coordinate with UC Press to provide a sustainable publishing model that extends the University’s capacity to disseminate its creative output to the world.

 

Questions about formats for archiving and data migration

IDEALS offers three tiers of preservation support. Under the highest level of preservation support, there is an effort to maintain the viability, renderability, understandability, and functionality of the original digital object. For more information, see the IDEALS Digital Preservation Support Policy.

Dao Rong Gong

Lucas Mak

Michigan State University

 

As a quick note to self, this looks like it could be very useful for a pending project I have in mind. Can’t wait to get my hands on the conference presentation handout.

 

Innovative uses its own type of XML data. This can be retrieved through HTTP queries.

 

The data arrangement is based on MARC fields, but MARC fields and their subfields are siblings.

Two types of XML records that can be retrieved from Millennium: Brief records and full records

 

The Millennium System and XML

Encore has built-in functionality that allows it to harvest OAI-compliant services.

 

XSLT

Manipulation of XML documents by creating a new document based on the original document’s content

 

XSLT uses XPath expressions to select and filter data nodes

 

Sunday School books collection

Needed to batch load records into Content Pro. Original data source is based on MARC.

 

One option is to create a list of records as a review file. Records could then be converted to Qualified Dublin Core using MarcEdit.

 

Used an HTTP query to request the Innovative XML. Then turned that into Qualified Dublin Core with XSLT.
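The presenters did this conversion with XSLT, but the shape of the step can be sketched in Python instead. The record layout and the two-entry crosswalk below are invented for illustration (real Innovative element names and a full MARC-to-QDC mapping would differ):

```python
import xml.etree.ElementTree as ET

# Invented stand-in for the retrieved Innovative XML: MARC fields and
# their subfields appear as sibling elements, not nested ones.
sample = """<record>
  <field tag="245"/>
  <subfield code="a">History of accounting</subfield>
  <field tag="260"/>
  <subfield code="c">1923</subfield>
</record>"""

# Illustrative crosswalk: (MARC tag, subfield code) -> QDC element.
QDC_MAP = {("245", "a"): "dc:title", ("260", "c"): "dcterms:issued"}

def to_qdc(xml_text):
    """Walk the sibling elements, remembering the current field tag,
    and emit (QDC element, value) pairs for mapped subfields."""
    pairs, tag = [], None
    for el in ET.fromstring(xml_text):
        if el.tag == "field":
            tag = el.get("tag")
        elif el.tag == "subfield":
            name = QDC_MAP.get((tag, el.get("code")))
            if name:
                pairs.append((name, el.text))
    return pairs

print(to_qdc(sample))
# [('dc:title', 'History of accounting'), ('dcterms:issued', '1923')]
```

Note the need to carry the current field tag through the loop: because the subfields are siblings rather than children of their field, the crosswalk key cannot be read from any single element.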

 

Issues with Converting Innovative XML Data

 

Data is structured differently from MARC21 XML

Availability of existing "Innovative XML to DC/QDC" XSLT?

 

Not optimized for data manipulation

Complications in data selection

Selection of data node by matching criteria against values in individual elements

A series of matches may be needed just to select one node
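That selection complication can be made concrete. With nested MARC21-style XML, one path expression finds a node; with the sibling layout described above (element names invented here), the same selection takes a chain of matches while walking the siblings:

```python
import xml.etree.ElementTree as ET

# Nested MARC21-style XML: a single path expression selects the node.
nested = """<record>
  <datafield tag="245"><subfield code="a">Title here</subfield></datafield>
</record>"""
title = ET.fromstring(nested).find("datafield[@tag='245']/subfield[@code='a']")

# Sibling layout: the subfield carries no link to its field, so we
# must walk the elements in order, tracking which field we are "in".
flat = """<record>
  <field tag="100"/><subfield code="a">Author here</subfield>
  <field tag="245"/><subfield code="a">Title here</subfield>
</record>"""
found = None
current = None
for el in ET.fromstring(flat):
    if el.tag == "field":                              # match 1: field marker
        current = el.get("tag")
    elif current == "245" and el.get("code") == "a":   # matches 2 and 3
        found = el.text
        break

print(title.text, found)  # both yield "Title here"
```

The nested form needs one expression; the flat form needs three separate tests per candidate node, which is exactly the kind of overhead the presenters describe.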

DeeAnn Allison

University of Nebraska-Lincoln

 

The goal is an integrated search that brings together the UNL catalog with all of the unique digital collections being created.

 

Search the catalog along with other resources.

Empowered account holders can add their own tags.

Send searches to ResearchPro to find articles

 

Library has a number of multimedia collections in CONTENTdm – still images, audio, video, and data.

Over 67 collections with over 189,000 items.

Because the CONTENTdm data is shared by many departments across campus, it can be difficult for the library to know when and how often content is updated. There may also be issues with data quality since the library doesn’t handle all data entry duties.

Also needed to pull in EAD and TEI data.

 

Worked with OAI-PMH2 tool from Virginia Tech.

http://www.openarchives.org/OAI/openarchivesprotocol.html
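Whatever tool does the harvesting, OAI-PMH responses come back wrapped in the protocol and Dublin Core namespaces. A minimal sketch of pulling titles out of a pared-down ListRecords response with Python’s standard library (the record content here is invented; real responses carry full headers and more metadata):

```python
import xml.etree.ElementTree as ET

# Pared-down OAI-PMH ListRecords response with an invented record.
response = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sunday School hymnal</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

# Namespace prefixes must be declared for ElementTree path lookups.
NS = {"oai": "http://www.openarchives.org/OAI/2.0/",
      "dc": "http://purl.org/dc/elements/1.1/"}

root = ET.fromstring(response)
titles = [t.text for t in root.findall(".//dc:title", NS)]
print(titles)
```

Real responses are paged: when a `resumptionToken` element is present, a harvester re-issues the request with that token and loops until none is returned.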

 

The library is more interested in being a data provider than in being a search engine developer.

 

Brought together data from various collections through Encore.

 

What should we catalog vs. what should we harvest? Ideally, they would prefer to harvest. Collection has many legacy records that were cataloged. However, there are many harvested TEI records that have to be cleaned up. Harvesting is the goal, but they are still working through some data issues.

I came across an interesting New York Times article several days ago: Exploring a ‘Deep Web’ That Google Can’t Grasp. The article explores a shortcoming of current search technologies that librarians have known about and struggled with for quite some time. As good as current search engines may be, they rely primarily on crawlers or spiders that essentially trace a web of links to their ends. That works for a lot of content out on the Internet, but it doesn’t do so well for information contained in databases. So . . . library catalogs, digital library collections, a lot of the things that libraries do aren’t being picked up by the major search engines.

Of course at some level that makes perfect sense. When a web crawler comes to a page with a search box, how is it supposed to know what to do? It needs to input search terms to retrieve search results, but what search terms are appropriate? Is it searching an online shopping website? A tech support knowledgebase? A library catalog? This discussion surfaces again and again particularly as we talk about one of our digital collections. There is a wealth of information here for people researching the history of accounting, but it resides in a database. The database works perfectly well for humans doing a search. The only problem is that they have to find out about the database first. Now we’ve done a number of things to get the word out: papers, conference presentations, a Wikipedia article . . . If we’re lucky, these things will get users to the top level of the collection. Hopefully once they’re there, their research will draw them in. (In case anyone notices, I should get credit for positioning that set of homonyms like that!)

But getting them there in the first place – that’s the hard part. That’s why I have so much hope for deep web indexing. If researchers can build tools that will look into our databases intelligently, then extensive new levels of content will be opened up to everyone. In particular I think about students who decide that the first few search engine hits are “good enough” for their school project. Usually they’re not good enough, but the students don’t always realize that. If new search engines can truly open up the deep web, the whole playing field changes!

Excerpts from OCLC’s presentation. Their full presentation will be posted online after the conference.

 

CONTENTdm 5 will use Webalizer for reports.

 

CONTENTdm will be added to the FirstSearch Base Package and will include

Full-function CONTENTdm hosted by OCLC

3 Project Clients for collection building (items also may be added with the simple web add form or Connexion digital import)

3,000 item limit and 10 GB storage

Available May 1, 2009

 

Digital Collection Gateway

Improve access and presence for digital collections

Synchronize non-MARC metadata with WorldCat

 

CONTENTdm 5

 

1. Unicode

Full support of Unicode for importing, storing, displaying and searching Unicode languages

OCR support expanded – 184 languages

Supports the creation of digital collections in any language

 

2. Find Search Engine (used for WorldCat)

Find search engine integrated into CONTENTdm software

More robust capability and the ability to offer additional search features

Relevancy

Faceted searching

Spelling suggestions

Unicode searching

Search in any language

 

3. Controlled Vocabularies

Adds efficiency to collection building by providing pre-loaded thesauri for cataloging

Integration with OCLC Terminologies Service

Providing nine new thesauri for CONTENTdm users

 

4. Reports

More robust, scalable reporting module integrated into software

Provides expanded reports

Views by collection and item

 

5. Flexible workflows

Added more options for approving and indexing items

New batch and subset handling of pending items

One-click approve and index on demand

Scheduling options for approve and index

Background processing

 

6. Registration

New registration process added during installation

One-click sends server information to OCLC

Registered servers called once a month to gather data on usage

 

7. Project Client

New client application replaces old version

New programming language

New, more intuitive interface

Unicode support

More robust

Project Settings Manager – Metadata Templates

Different templates for different file types

Images, JPEG, JPEG2000, TIF

PDF, compound objects, URL, audio, video

Options for generating data from different file types

Images – Colorspace, Bits per sample

PDF – Extracts content from embedded fields (application, author, date modified, date created, etc.)

 

8. File Transfer

Replaced FTP with custom HTTP transfer protocol

Uploading items occurs in the background

Continue working while items are uploaded

Pause process and resume later

 

9. EAD

New import process and display options

Custom metadata mapping

Full text searching

Search term highlighting within EAD

Multiple display views

XML web service

Users control metadata mapping and display

 

10. Capacity

Increased capacity throughout application

Supports more collections, items for batch processing, and metadata fields.

Expand metadata schemas to incorporate preservation metadata or more custom fields

Faster batch processing and conversion from existing databases

Wikipedia. Love it or hate it, admit it or not, lots and lots of people use it. People continue to express concerns about the accuracy and verifiability of Wikipedia’s information, and rightfully so, but library users are going there for information. Google searches (another favorite of library users) are turning up more and more Wikipedia entries. And librarians are using the site as well. One of the best descriptions of Wikipedia use came from a reference librarian. The basic idea was that when neither the librarian nor the patron knows enough about a topic to research it, Wikipedia usually gives a number of relevant keywords and subjects that can guide further research in library resources.

Since USERS ARE GOING THERE, it’s worthwhile to provide accurate information when and where we can. Now I’m not suggesting that librarians begin poring over the website, ferreting out inaccuracies, and posting updates duly attributed to reputable, verifiable sources. That’s fine if people have the time, but most don’t. No, instead I’m talking about contributing to the wider body of knowledge through Wikipedia when and where it is appropriate.

A couple of days ago I wrote about making some updates to the web pages for our Digital Accounting Collection. As we were talking about the collection, it occurred to me that Wikipedia might be a good place to share information about the collection. An entry might describe some of the collections as well as giving some history on the digitization project. When I came across this University of Florida entry, I was even more convinced that this was a good idea.

I created my account, and started experimenting in the sandbox. I worked with a couple of colleagues to develop the entry, and I posted an entry on the Digital Accounting Collection today. It was an interesting process to work through. Interestingly, this is the first Wikipedia entry for our library — interesting in an ironic sort of way I suppose. The Digital Accounting Collection was our first fully searchable digital collection, so I guess it’s only fitting that our first Wikipedia entry is about this collection.

So there it is. It’s out there. Since we wrote about things that we know and have worked with ourselves, the information is as accurate as it can possibly be to the best of our knowledge.

At least until somebody else edits it.  ;-)

It’s that time of year again. A colleague is getting ready for a conference presentation, and we need to do some web page updates for the Digital Accounting Collection. A number of new items have been added, and records for an entirely new collection have been created.

Several years after the fact we’re still proud of this one. This project was our library’s first fully searchable online digital collection. It was a big project for us. Perhaps making the digitized items full-text searchable was a bit ambitious for our first digitization project, but it works and there is a lot of good content here!