Accepted Presentations for User Group Sessions

Session 1

DSpace: Governance and Architecture

Moderator: Tyler Walters

  • DSpace Status Update
    MacKenzie Smith

    A review of and update on the current status of the DSpace open source software community governance plan, including a separate 501(c)(3) non-profit corporation, and a series of activities that the organization may undertake to help the community continue to use the DSpace platform more easily. This plan is the result of a series of recommendations by an ad-hoc DSpace advisory group that met in 2006 to discuss the long-term governance and sustainability of the project. Information about this is available on the project wiki at http://wiki.dspace.org/index.php/DspaceGovernance.

  • DSpace Architecture Update
    John Ockerbloom

    A review of the recently approved new technical architecture for DSpace that will improve key aspects of the system such as its add-on mechanism and customizable workflows, as well as an improved data model with better support for versioning and FRBR-like structure. Detailed information from this group's deliberations is available on the project wiki at http://wiki.dspace.org/index.php/ArchReview.

  • Introducing Manakin: Overview and Architecture
    Scott Phillips, Cody Green, Alexey Maslov, Adam Mikeal, and John Leggett

    Manakin is the second release of the DSpace XML UI project. Manakin introduces a modular interface layer, enabling an institution to easily customize DSpace according to the specific needs of a particular repository, community, or collection. Manakin’s modular architecture enables developers to add new features to the system without affecting existing functionality. First the project’s goals will be introduced, followed by a discussion of Manakin’s relationship with DSpace. Finally, an architectural overview of the primary components will be given.

EPrints: Experiences (I)

Moderator: Leslie Carr

  • A World-Wide repository: the technical challenge of E-LIS
    Zeno Tajoli, CILEA, Italy

    E-LIS is the largest world-wide disciplinary repository for Library and Information Science. It stores and delivers metadata and digital papers in different Unicode scripts (Latin, Chinese, Greek and others). Contributions come from more than 80 countries in all continents. At present it contains around 4,500 full-text documents. The presentation describes the technical improvements implemented in order to manage linguistic differences in uploading, searching and disseminating contents, and to help the editors share their review tasks according to their country. We conclude with an analysis of the beta version of EPrints 3 against some problems that are still open.

  • Research Funding Agencies: repository requirements to support open access mandates
    Pauline Simpson, National Oceanography Centre, UK

    The last few years have seen a number of research funding agencies worldwide mandate the deposit of publications resulting from their research grants into institutional or subject repositories. In the UK, the Wellcome Trust led the way, but in 2006, after a long consultation, Research Councils UK published its Position Statement and each Research Council issued its own mandate, encouraging or requiring its grantees to deposit full text in open access repositories. Some research councils had existing repositories; others needed to build one. Funders' repositories, whilst primarily supporting open access, also wish to exploit the data for other purposes and require more from repository software than its vanilla version supports. Examples from a university and a research council exemplar illustrate how mandates and ‘one record for many purposes’ advocacy have contributed to the bespoke repurposing of repository software.

  • "Latest News": Eprints meets Web 2.0
    Anita Coleman and Joseph Roback, University of Arizona, US

    A key Web 2.0 tenet is that users add value and expand the usefulness of the software. Eprints, originally envisioned as software for building a digital repository, is now being extended in many ways by its users. We report on the development of ‘Latest News’, a small feature we added to our eprints-2.0 based archive, dLIST. Latest News is wildly popular as a social networking tool with the dLIST communities. dLIST is an interdisciplinary and cross-institutional archive for the Information Sciences with about 10 editors who connect the fragmented communities in these related areas. It has become obvious that a more blog-like News module, in which multiple editors can post News to stay in touch with their respective communities, would greatly enhance our efforts to grow active users for the repository. We are now investigating the development of News as an Eprints 3.0 plug-in. Scholarly behavior, including self-archiving, varies by discipline, but features such as News help all scholars to see themselves as active participants not just in repository growth and use but also in its design and software development.

Fedora: Design Strategies for Repositories (Strategy/Requirements)

Moderator: Carol Minton Morris

  • Best Practices in Developing a Digital Library Repository Using Fedora
    Leslie Johnston

    In three years of development, the University of Virginia Library has revisited decisions it made in establishing a digital library repository using Fedora for the management and delivery of digital collections. It has revised its implementation strategy and started the process of developing content models for new media types: video, datasets, audio, and GIS. This presentation will describe the processes used for the identification of functional requirements, inventorying of digital media, development of production standards, and the translation of the output of those processes into content models and disseminators.

  • Adventures in architecting and implementing digital repository services: a case study of the Tufts digital repository
    Robert Chavez, Anoop Kumar, Nikolai Schwertner

    The Digital Repository Group at Tufts University has designed and implemented a Fedora-based service-oriented digital repository to provide for long-term management and integration of both new and legacy digital collections at Tufts University. This presentation describes the current architecture of the Tufts digital repository and several associated services that facilitate the use and dissemination of repository content. The end-user services that will be discussed include an ingestion service, a naming service, a personal publication service, a social tagging service, a collection browsing service, and the general dissemination services of the repository. A discussion of the following applications will illustrate the convergence of the systems oriented and customer oriented approaches and demonstrate a variety of dissemination types available to applications via the Tufts digital repository service: Perseus Project digital library and Open Content Alliance content, American Antiquarian Society Early American Voting Records web environment, Tufts Fine Arts Department web-based curriculum tool, and the Visual Understanding Environment.

  • Versioning of Digital Objects in a Fedora-based Repository
    Matthias Razum

    eSciDoc is a shared project of the Max Planck Society and FIZ Karlsruhe that aims to realize a platform for communication and publication in scientific research organizations. This presentation gives an overview of the complex versioning requirements for digital objects within the eSciDoc project and discusses our solution based on Fedora. Besides ensuring permanent access to the research results and research materials of the Max Planck Society, the eSciDoc project as a whole is intended to support scientific collaboration in future eScience scenarios. This goal requires a shift from traditional digital library systems to a more interactive environment in which information consumers also become information producers. Collaborative authoring raises an issue familiar to software developers: versioning of digital objects. All intermediate or working versions of artifacts should become part of the repository, not just the final versions. Four major requirements pertaining to versioning have to be fulfilled by the underlying repository architecture: 1) versioning at the object level, 2) fixed and floating object references, 3) internal and public versions, and 4) container objects.

Session 2

DSpace

Moderator: Catherine Jannik

  • Using DSpace for Digitized Collections
    Lisa Spiro, Marie Wise, Sidney Byrd and Geneva Henry

    As organizations adopt institutional repositories (IR) to store and make accessible scholarly materials, they are finding new and expanded uses for these powerful tools. Institutional repositories can archive not only “born-digital” assets such as pre-prints and dissertations, but also digitized materials such as books, photographs, and recordings. Such primary source materials serve as building blocks for research, particularly in the humanities and social sciences. Although DSpace, one of the leading IR systems, was originally designed for born-digital resources, Rice University has adopted it as a platform for digitized materials as well. Using a single IR for different kinds of scholarly assets provides unified access to diverse materials and can be more efficient than running multiple systems. Making DSpace work for complex collections of digitized materials can require developing new tools and processes. Rice is using DSpace for several digitization projects, including the Travelers in the Middle East Archive (TIMEA), a collection of XML-encoded texts, images, and maps focused on Western interactions with the Middle East; the Shepherd School Archive of digital audio of performances at the music school; and the Rice Institute Pamphlets Archive, PDFs and XML-encoded text of a significant academic journal. Each of these projects poses unique challenges. This presentation will include a discussion of how Rice has confronted these challenges in employing DSpace for digitized assets.

  • Digital Repository Projects at the North Carolina State University Libraries
    James Jackson Sanborn and Jim Tuttle

    The North Carolina State University Libraries has undertaken a number of projects related to the development of digital repositories. In the creation of our open repository of scholarly content, we have taken an uncommon approach. Rather than develop or deploy a repository system and solicit contributions from campus researchers and organizations, the NCSU Libraries has worked to leverage existing collections and to identify other untapped resources already rich in content. This approach will result in three major collections: published faculty papers, which draws upon a pre-existing database of over 20,000 citations of papers published by NCSU-affiliated authors; NCSU-promulgated technical reports from departments and institutes across campus, which was created through content harvesting and automated ingest; and a collection of NCSU electronic theses and dissertations (ETDs) submitted through legacy ETD software. DSpace is also functioning as the repository for geospatial data acquired through the NCSU Libraries NDIIPP (National Digital Information Infrastructure Preservation Program) investigation. The NDIIPP DSpace instance is a controlled-access archive selected to investigate repository-agnostic pre-ingest workflows on highly complex content. This session will share the high-level architecture of our system, focusing on the ways we have worked to integrate DSpace with other systems and processes. Management, ingest and access tools created for these projects will be presented. We will also discuss the decisions made regarding planning, policy and implementation, as well as future goals for these and other digital repository projects.

EPrints: Experiences (II)

Moderator: Leslie Carr

  • Aquatic Commons
    Stephanie Haas, University of Florida Libraries

    The International Association of Aquatic and Marine Science Libraries and Information Centers (IAMSLIC) is a global non-profit organization that provides a forum for discussion of information issues related to marine and aquatic science. During the last ten years, its members have developed resource sharing mechanisms that support research throughout the world. With an existing Z39.50 Distributed Library that supports interlibrary loan functions, a more comprehensive integrative Aquatic Commons model was developed to address the need for and growth of repositories and harvesters. This presentation will discuss the model and the development of the E-prints Aquatic Commons repository.

  • The Classification of Open Source Software and its Implications for Open Access Research
    Fernando Elichirigoity (University of Illinois Graduate School of Library and Information Science) and Cheryl Malone (University of Arizona), US

    The open source/open access movement can be seen as both "pushback" to counter the increasing protection of copyrighted information at the expense of use, even fair use, and as a new mode of production. But as open source software has developed into an industry demonstrating that traditional copyright protection is not necessary to earn profits, other factors have begun to drive the development of new kinds of software, such as EPrints for creating open repositories of information. At the same time, the difficulty of conceptualizing and classifying open source software production apparent in information infrastructures such as the North American Industry Classification System (NAICS) suggests a parallel difficulty in conceptualizing and classifying open access research. NAICS organizes economic data on a principle of "like production processes." For the Information Sector of the economy, as formulated in NAICS, a key production process is the acquisition and defense of copyright. With open source, copyleft licensing eliminates copyright acquisition and protection as major production processes, suggesting that the open source software industry warrants a separate NAICS category. Similarly, the open access movement presents its own challenges in terms of appropriate categories and metrics, and it's not clear that traditional copyright and open access can coexist in the scholarly publishing industry, despite evidence from ROMEO (romeo.eprints.org) that many publishers allow self-archiving. In this presentation we will analyze the intertwined historical development of EPrints software and EPrints repositories in an effort to suggest the implications.

  • Preservation Services: External Expert Support for EPrints Managers
    Jessie Hey, University of Southampton, UK

    "It is a truth universally acknowledged, that a repository in possession of a good collection of digital objects, must be in want of a preservation solution." Finding the best prospect for long-term preservation is quite as taxing for a repository manager as it was for Jane Austen's characters. The complete range of digital preservation activities is complex and is likely best managed by specialists – preservation service providers - and unfortunately we don’t yet know what a preservation service provider looks like. This presentation shows the work of the JISC PRESERV project in developing the beginnings of a cross-platform preservation service for digital repositories. In conjunction with The British Library and The National Archives, the first of a planned portfolio of preservation tools was deployed through the Registry of Open Access Repositories to demonstrate the utility of Preservation Services to EPrints (and other) repository managers.

Fedora: Workflow for Ingest and Validation (Technology)

Moderator: Richard Green

  • Submission of Content to a Digital Object
    Andreas Hense and Johannes Mueller

    The prototype of a workflow system for the submission of content to a digital object repository is presented. It is based entirely on open-source standard components and features a service-oriented architecture. The front-end consists of Java Business Process Management (jBPM), Java Server Faces (JSF), and Java Server Pages (JSP). A Fedora Repository and a MySQL database management system serve as a back-end. The communication between front-end and back-end uses a SOAP minimal binding stub. In this presentation we present the prototype, show the software architecture, and discuss the possibilities and limitations of workflow creation by administrators.

  • Developing an Ingest Service for Fedora
    Ryan Scherle and Muzaffer Ozakca

    Although the Fedora architecture holds great promise for storing and managing digital collections, tools for ingesting content into a Fedora repository do not provide the combination of power and flexibility needed to ingest many types of collections. As a result, the community has not standardized on a single method of ingestion. We will review the current tools available for ingesting content into Fedora and describe the need for a tool that combines the power of content-specific tools with the flexibility of more general-purpose tools. In addition we will present the architecture of the Ingest Service being built for the digital repository at Indiana University. The Ingest Service provides a simplified API for updating objects that have been placed in the repository, which will ease the creation of tools for cataloging and automatically ingesting objects from external sources.

  • RIFF: Referential, Rule and Schema based Content Model Validation Tool for Fedora
    Stephan Drescher and Toke Eskildsen

    Logical preservation systems require an active evaluation of the integrity of their hosted data and implemented data models. All parts of the system have to be referenced via standards to which we have given the name contracts. RIFF aims to capture different ways of designing Fedora data objects--later called content models--and gives these a framework for validation. RIFF distinguishes between object-to-object relational checks, object type checks (e.g. schemas) and specific object content checks. The framework includes autonomous validators for those tasks, which can also be combined with each other. RIFF checks can operate either on a running Fedora installation or independently on a filesystem's FOXML data repository (for example, before ingest).

Session 3

DSpace

Moderator: Karen Zimmerman

  • Introducing New Services with DSpace
    Julie Griffin, Kent Woynowski and Susan Wells Parham

    The Georgia Tech (GT) Library and Information Center established SMARTech (http://smartech.gatech.edu/), our DSpace Institutional Repository (IR), in August 2004. We envisioned an open access (OA) system of user-submitted scholarly faculty output. It soon became clear that few faculty members would submit their own work. Submitting to SMARTech was also not the highest priority for their graduate students or administrative assistants. To obtain initial content for the system, we began batch loading technical and research reports from departmental and lab web sites. We enriched the descriptive metadata by including our own subject headings and keywords. We began to think of SMARTech as less of a product, and more of a service. This service-oriented focus broadened the collecting scope of our IR, and expanded our use of DSpace as a tool for providing publishing and preservation services to the GT community. Our first service was to submit faculty research ourselves; to supply item level metadata and review copyright. We decided adding more publishing services would make supplying content for SMARTech easier for faculty. We also decided it would be mutually beneficial to expand our use of DSpace to include these new conference and journal publishing services (http://epage.gatech.edu/) because faculty include publications, conference participation, and editorial positions in tenure and promotion packages. The new services would offer faculty a low-cost model for creating and maintaining conference web sites and OA journals, allowing them more time to focus on content rather than system support. These expanded services will reinforce the position of SMARTech as a valuable service to the GT community.

  • SPECTRa - Federated Data Reposition Using DSpace
    Jim Downing and Alan Tonge

    The SPECTRa (Submission, Preservation and Exposure of Chemistry Teaching and Research Data) project is a JISC-funded collaboration between the university libraries and chemistry departments at the University of Cambridge and Imperial College, London. The project addresses the provision of open access to primary research data ("Open Data") in experimental chemistry through Institutional Repositories. This presentation will describe the project and its outputs, go into depth on the technical interactions with DSpace, and investigate how SPECTRa could inform federation interactions between Institutional Repositories and institutional science research.

EPrints: Training (I)

Moderator: Leslie Carr

  • EPrints Tutorial
    • How to Configure EPrints
    • Description of EPrints Capabilities
    • Walk Through the EPrints Configuration Files

Fedora: Preservation and Archiving (Strategy/Requirements)

Moderator: Sandy Payette

  • Fedora Preservation Services: A Working Group Report
    Ron Jantz

    A major objective of the Fedora Preservation Services Working Group (WG) is to define the requirements and architecture for preservation services that can be integrated into Fedora. We believe our work will provide capabilities for Fedora users to build trusted repositories. To accomplish our objectives, the Working Group is specifying services and technologies that can be readily integrated into the Fedora Framework. In the specification process, the WG is focused on the underlying capabilities to support digital object persistence, life cycle management, multidisciplinary collections, and management of the repository environment (e.g. storage, memory, operating system, etc). Capabilities and features currently under consideration include checksum creation and validation, event management and messaging, and a repository history service. This presentation will provide a WG progress report with special emphasis on the concept architecture, key features, and the development roadmap for preservation services. The discussion will also cover how Fedora preservation services relate to the PREMIS data model and the audit checklist for trusted digital repositories.
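    As an illustration of the "checksum creation and validation" capability listed above, a minimal sketch follows. It is not the Working Group's design or a Fedora API: the function names, the default SHA-256 algorithm, and the chunk size are assumptions made for this example only.

```python
import hashlib


def create_checksum(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of the file at `path`.

    The file is read in chunks so that large datastreams do not have
    to fit in memory. Names and defaults here are hypothetical.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_checksum(path, expected, algorithm="sha256"):
    """Return True if the file's current digest equals the stored one."""
    return create_checksum(path, algorithm) == expected
```

    In this sketch a repository would store the digest when an object is created and call validate_checksum() periodically; a False result signals that the stored bytes have changed.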

  • The AGU Digital Archive: A Case Study
    Carter Glass

    The American Geophysical Union is a leading publisher of scholarly journals in the Earth and space sciences and has published every article in a fully electronic version since 2002. As a scientific publisher, AGU has an obligation to preserve its digital content. AGU selected FEDORA as a long-term 'deep' archive for its electronic publications. AGU built tools and processes on top of the FEDORA platform. The paper will describe how AGU selected FEDORA as an archiving platform, how AGU preserves electronic article components and meta-data, and AGU strategies for long-term preservation and migration. The complete AGU archive architecture will be presented as well as samples of working source code.

  • A service-oriented workflow for the ingest and preservation of complex digital objects
    Mark Hedges

    The Arts and Humanities Data Service (AHDS) is a body that maintains a repository for the preservation and dissemination of digital resources arising from research in arts and humanities disciplines, mostly resources resulting from publicly funded projects carried out by academics at UK institutions of higher education. These resources are very varied in size and data format, and can be of very complex structure. To facilitate the management of these collections, the AHDS is migrating its repository to a Fedora-based system. We will present the AHDS vision to develop a workflow that supports all the required pre-ingest functions, from initial deposit up to ingest into Fedora, while minimising the need for human intervention. For example, these functions will include the creation of metadata and the normalization of deposited files to formats suitable for preservation. Our approach is not to develop a monolithic tool, but a set of modular web services, each encapsulating a well-defined unit of functionality at an appropriate level of granularity, which can be configured and combined to produce workflows. Access control will be implemented for all user inputs to identify the agent who will in turn be recorded in the preservation metadata.

Session 4

DSpace

Moderator: Sayeed Choudhury

  • DM-DSpace and PF-DSpace: Standards-based Peer-to-Peer DSpace Federation and Federating DSpace-Based Digital Museums in China
    Wei Liu, Xukun Shen, Yue Qi, Yuhong Xiong, Baoyao Zhou, James Rutherford, Xiaoyu Li, Weihua Huang, Shu Wang, Bailiang Chen, and John Erickson

    The China Digital Museum Project (CDMP) is an ongoing collaboration involving the Chinese Ministry of Education, HPLabs and several Chinese universities. The goal of CDMP has been to enable a federation of universities to provide a large-scale infrastructure based on DSpace to store, manage, preserve and disseminate the digitised versions of university museum artefacts. In the final phase of CDMP, there will be more than 100 university museums with digital artefacts stored in federated DSpace installations. The federation architecture of both DM-DSpace and PF-DSpace consists of a number of repository nodes exposing an OAI-PMH data provider interface to enable authorised repositories to harvest METS Dissemination Information Packages (DIPs). These DIPs are the currency that enables a harvesting repository to replicate the underlying digital object. We are also able to distribute OAI-PMH “friends” lists within a pool of nodes: given the OAI-PMH baseURLs for a set of repositories, a PF-DSpace instance asks each repository what repositories it knows of and adds those to the set, recursing through the total set. Our next steps will be to explore possible vocabularies for performing selective harvests of remote collections and relate this to the existing METS ingest/dissemination mechanism in DSpace; implement and verify a joint protocol for admitting new peers into federations; and develop a set of “strawman” administrative best practices for distributed, peer-based repository federations. We also plan to explore the integration of PF-DSpace into the research environment (“LabSpace”) by peering it with other elements of the research infrastructure, including departmental wikis, existing technical report repositories and the like.
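    The "friends" list discovery described above amounts to computing the set of repositories reachable from a few seed baseURLs. The sketch below is one hedged way to express that idea, not the PF-DSpace implementation: discover_peers and list_friends are hypothetical names, the example.org URLs are invented, and an in-memory dictionary stands in for real OAI-PMH requests. The abstract speaks of recursing; a queue is used here instead to reach the same set without deep recursion.

```python
from collections import deque


def discover_peers(seed_urls, list_friends):
    """Return every baseURL reachable from the seeds via friends lists.

    `list_friends` is a hypothetical callable: given one baseURL, it
    returns the baseURLs that repository says it knows of.
    """
    seeds = list(seed_urls)
    known = set(seeds)
    queue = deque(seeds)
    while queue:
        base_url = queue.popleft()
        for friend in list_friends(base_url):
            if friend not in known:  # ask each repository only once
                known.add(friend)
                queue.append(friend)
    return known


# Invented sample federation standing in for live repositories.
network = {
    "http://a.example.org/oai": ["http://b.example.org/oai"],
    "http://b.example.org/oai": ["http://c.example.org/oai",
                                 "http://a.example.org/oai"],
    "http://c.example.org/oai": [],
}
peers = discover_peers(["http://a.example.org/oai"],
                       lambda url: network.get(url, []))
```

    Tracking the set of known baseURLs is what keeps the walk from looping when two repositories list each other as friends.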

  • Configurable Submission System for DSpace
    Tim Donohue

    The DSpace Submission User Interface is somewhat limited in its abilities to be configured for locally developed policies and procedures. DSpace does allow for custom metadata schemas and metadata gathering interfaces, but there is little configurability beyond that. UIUC has developed what we call the Configurable Submission System, which modularizes the DSpace submission process into a series of “steps”. Each “step” generally represents a single submission “module”, in charge of gathering specific information important to constitute a single DSpace submission package. In this session, the presenter will discuss the benefits and usage of the Configurable Submission System for DSpace. He will provide high level details of how to rearrange, remove and create new steps within the normal DSpace submission process, as well as how you can customize the submission process at the collection level. Finally, the presenter will discuss some of the upcoming ideas/plans that IDEALS has for providing more automation to the submission process, by implementing custom non-interactive steps.
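    The idea of a submission process built from rearrangeable steps, with per-collection overrides, can be sketched as follows. This is an illustration only, written in Python rather than DSpace's Java, and it does not reproduce the Configurable Submission System's actual classes or configuration format: every name below (Submission, the step functions, COLLECTION_STEPS, the collection names) is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Submission:
    """Hypothetical in-progress submission package."""
    collection: str
    data: dict = field(default_factory=dict)


# Each step gathers one piece of information for the package.
def describe_step(submission):
    submission.data["title"] = "Untitled"  # placeholder metadata


def upload_step(submission):
    submission.data["files"] = []  # placeholder file list


def license_step(submission):
    submission.data["license_accepted"] = True


# Hypothetical configuration: a default order plus collection overrides.
DEFAULT_STEPS = [describe_step, upload_step, license_step]
COLLECTION_STEPS = {
    "theses": [describe_step, license_step, upload_step],  # reordered
    "datasets": [upload_step, license_step],  # describe step removed
}


def run_submission(collection):
    """Run the steps configured for `collection`, in order."""
    submission = Submission(collection)
    for step in COLLECTION_STEPS.get(collection, DEFAULT_STEPS):
        step(submission)
    return submission
```

    Because the process is just an ordered list, rearranging, removing, or adding a step is a configuration change rather than a change to the code that drives the submission.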

EPrints: Training (II)

Moderator: Leslie Carr

  • EPrints Hands-on Exercises - Bring a laptop!
    • Branding with Confidence
    • How to Make New Views and Searches
    • How to Manage New Kinds of Digital Object for Deposit
    • How to Manage New Metadata Requirements
    • Configuring the Deposit Workflow

Fedora: Repository Exposure (Technology)

Moderator: Ross Wayland

  • Simplifying Fedora Frontends with XForms and Fedora Disseminators
    Matt Zumwalt

    One of the primary barriers to adoption of Fedora is the difficulty in deploying custom frontends to the system. MediaShelf is currently refining techniques that leverage XForms and Fedora disseminators to address this challenge. We will present MediaShelf, LLC, which provides hosted online media archival solutions as well as consulting services focused on planning, development, and deployment of the Fedora Repository Architecture.

  • Building an Institutional Repository Interface Using EJB3 and JBoss Seam
    Peter Murray

    The vision of the Ohio Digital Resource Commons (DRC) is to leverage statewide economies of scale with a content repository that enables higher education and other Ohio institutions to rapidly publish and comprehensively access the wealth of research, historic and creative materials produced by Ohio's scholarly communities. The tools and services of the DRC feature an interface that appears as if the repository is at a member's institution using the institution's URLs and institutional look-and-feel branding. In reality, OhioLINK maintains the underlying hardware and software—allowing institutions to redirect resources from building their own physical systems to adding and supporting content in the institution's repository space on the DRC. The first service facet to be constructed is the DRC Institutional Repository, featuring a community-oriented interface to ingesting and presenting PDF and related content. The DRC vision also encompasses other service facets on top of the content repository, including digital libraries, an online exhibition interface, e-journal publishing, integration with collaborative learning environments, ePortfolio, and electronic records management. This session will provide an overview of the Ohio DRC vision, a demonstration of initial DRC-IR functionality, and a discussion of using EJB3/SEAM to create an interface for a FEDORA repository.

  • Federated Authentication and Authorization for Fedora
    Chi Nguyen

    Fedora's popularity amongst institutions is largely due to its scalability and flexibility to handle a large variety of data types. With the increased take-up rate, the need to support federated authentication and flexible authorization is becoming more and more evident. The main drivers are the need by end users to share data across institutional boundaries, and the ability to specify new and different authorization policies for repository data over time. In this presentation, we outline a new architecture for authentication and authorization in Fedora, based on open standards, that will address the new authentication and authorization requirements while preserving the need to support multiple GUIs for Fedora.

Session 5

DSpace

Moderator: John Erickson

  • If we build it, will they come?
    Philip Davis and Matthew Connolly

    Much of the work on institutional repositories has focused on their rationale, design, or implementation. While institutions have devoted significant resources to implementing IRs, there has been a scarcity of work on evaluating their IRs. If IRs are to achieve the vision of "a universal service for author self-archived scholarly literature" (Ginsparg, Luce, & Van de Sompel, 1999), strong contributions from faculty are absolutely necessary. This presentation will present a multi-faceted approach to evaluating the success of one institution’s implementation of DSpace in terms of faculty participation. First, we provide an empirical analysis of participation in and growth of the Cornell University DSpace, using item submissions and downloads as primary metrics. We identify three typical patterns of community growth and investigate the properties of the most highly-downloaded objects. Second, we provide a comparative analysis of data harvested from other institutional DSpace sites to compare patterns of growth and models of organization. For example, is it more effective to organize DSpace as a small number of general communities, or a large number of specific communities? Lastly, we report on a series of detailed interviews conducted with Cornell faculty across disciplines to better understand how faculty disseminate the findings of their research. We consider attitudes, motivations and rationale for behaviors such as sharing preprints with colleagues, depositing preprints in disciplinary archives, and posting published articles on personal websites. How scholars communicate is largely determined by the reward structures of their discipline. Based on these structures, we suggest why participation in digital repositories has become culturally engrained in some disciplines and largely ignored in others.

  • A DSpace-based Preservation Repository Design
    Joseph Pawletko and Ekaterina Pechekhonova

    At NYU's Digital Library we are building a Digital Preservation Repository (PR) that uses DSpace as a core component. During the system design phase we were faced with the question "Should we build a monolithic application that does everything, or distribute the preservation functionality over a collection of components?" We decided upon the latter approach. In this talk we will discuss why we chose the component approach; the DSpace features and add-ons that enabled us to use DSpace as a component; the role DSpace plays in the overall PR architecture; our strategy for dealing with large files (> 4GB); other components and implementation technologies used in the PR (Java, Ruby, SRW/U, XML-RPC, Shibboleth, the Handle system, METS, MODS, LC-AV, SRB, and others); the current system development status; and future plans.

  • DSpace as a Platform: Creating Custom Interfaces with Content Packaging Plugins
    Don Gourley and Larry Stone

    The latest major release of DSpace, 1.4, introduced a simple plugin mechanism that provides a powerful means to extend and customize DSpace. This presentation includes two parts: First, I describe the new extension to the DSpace architecture and some of the plugins included in DSpace 1.4. Then, as an example of usage, I recount how content packaging plugins are used at the Washington Research Library Consortium (WRLC) to integrate DSpace with local tools that create and present digital objects. This case study illustrates how plugins can be used to define a custom network interface of simple HTTP GET and POST requests to access DSpace resources and services, opening DSpace up as a flexible and customizable repository platform.

EPrints: Official Launch of EPrints 3.0

Moderator: Leslie Carr

  • EPrints 3, walkthroughs and user demos including:
    • Streamlined eprint deposit and management interface
    • Comprehensive multimedia deposit types
    • Auto-completion of author & journal names for metadata quality enhancement
    • Full eprint audit history to support preservation applications
    • Comprehensive import and export facilities via RSS, BibTeX, EndNote, METS, OpenURL ContextObjects, XML, spreadsheets etc.
    • Support for Web 2.0 data facilities - e.g. Google Maps, Timelines etc.

Fedora: Project Highlights (Shorts)

Moderator: Ron Jantz

  • NSDL 2.0: Building a Collaborative Digital Library
    Dean Krafft

    The NSDL (http://NSDL.org) has recently created the NSDL Data Repository (NDR), built on Fedora, to serve as the premier source for science, technology, engineering and mathematics (STEM) education on the net. This repository is the foundation for creating an ecosystem of collaborative, contributory tools that fully integrate with the library. In this presentation we will discuss the challenges and significance of integrating repositories with semantic web technologies and "Web 2.0" tools. The NDR and its API support the creation, organization, discovery and dissemination of resources, sourced metadata statements about resources, arbitrary aggregations of resources and other aggregations, and agents, who serve as the providers of metadata and the selectors for aggregations. The NDR also supports arbitrary RDF-style relationships among resources and queries on those relationships. The production NDR, currently built primarily on aggregated OAI-PMH metadata records from over 125 collections, contains roughly 2 million science education resources and about the same number of metadata statements about those resources. We will give an overview of the design of the NDR, showing the objects and relationships and describing the overall workflow. We will present the API, showing how it provides authenticated access to modify and query the NDR while preserving the constraints of the NDR data model. In addition to overviews of current and planned tools and services, we will review applications of these tools to enhance science education, facilitate the transfer of science research to science education, and create a dynamic, living library of science, technology, engineering and mathematics education.

  • MPTStore: Implementing a fast, scalable, and stable RDBMS-backed triplestore for Fedora and the NSDL
    Chris Wilper and Aaron Birkland

    The National Science Digital Library (NSDL) hosts a large Fedora-based metadata repository that depends heavily on Fedora's RDF Resource Index (RI). Currently, it holds about two million digital objects with an average of about 100 triples per object. Developers from Fedora and the NSDL have developed MPTStore, an alternative RDBMS-backed triplestore, with the explicit goal of supporting a very large, rapidly changing Resource Index in Fedora. We will describe MPTStore, which is based on the idea of mapped predicate tables (MPT). Each predicate in the triplestore is mapped to a single relation in a relational database. A triple, therefore, is represented as a subject and object pair within a specific predicate relation. MPTStore itself is a Java library that provides access to the triples, manages mapping metadata and triple relations, and generates SQL for both simple and complex queries. NSDL has now adopted MPTStore as the Resource Index in its production deployment of Fedora. So far, this move has eliminated the RI corruptions and has greatly sped up ingest and update processes with bulk concurrent loads.

  • RepoMMan: using web services and BPEL to facilitate interaction with a Fedora repository
    Richard Green, Chris Awre, Ian Dolphin, and Robert Sherratt

    RepoMMan is developing a standards-based, flexible workflow tool through which users can interact with Fedora, using the repository not just as a public archive for finished works, but also as a tool that can support the development of such materials through a 'My Repository' facility. Using BPEL (the Business Process Execution Language) to orchestrate web service calls to Fedora and other software, RepoMMan is developing processes to manage a user's "works-in-progress". Automated metadata generation is the second major area being addressed by the RepoMMan project: this will be a further function of the tool that is being developed and will also be managed via BPEL and web service calls.

  • Fedora Outreach and Communications: A Working Group Report
    Carol Minton Morris, Eric Jansson, Stacy Pennington, and David Dinham

    Every project has a story, as the Fedora Outreach and Communications Group discovered by conducting the Fedora Users Interview Survey from August to October 2006. The Fedora community is a patchwork of compelling narratives with both broad and concise goals for their institutions and their projects. The purpose of the Survey was to piece narratives together to find out what distributed Fedora development means in the context of a variety of use cases. Survey responses provide a compelling framework for developing a strategic plan for Fedora outreach, communications, and marketing. This information has the potential to encourage the growth, sustainability and ongoing development of a Fedora that is not only a technical solution, but also a vibrant community of collaborators. In this presentation we will review the Fedora Users Interview Survey methodology, results, and significant findings related to developing a plan for increasing Fedora Outreach and Communications activities. We will conclude with suggestions for a Fedora outreach, communications, and marketing plan to increase public awareness of Fedora, particularly among institutional leaders.
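
As an illustration of the mapped predicate table idea in the MPTStore abstract above, here is a minimal Python sketch using SQLite. It is not the MPTStore library (which the abstract describes as a Java library) and makes no claim about its actual schema or API: the class name, the pred_map mapping table, and the t<N> table naming are all hypothetical, chosen only to show one relation per predicate holding subject and object pairs.

```python
import sqlite3


class MappedPredicateStore:
    """Toy triplestore: one two-column (s, o) table per predicate.

    Hypothetical sketch only; names and schema are assumptions,
    not the real MPTStore design.
    """

    def __init__(self, conn):
        self.conn = conn
        # Mapping metadata: each predicate URI gets an integer id,
        # and that id determines the name of its table.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS pred_map ("
            "id INTEGER PRIMARY KEY, predicate TEXT UNIQUE NOT NULL)"
        )

    def _table_for(self, predicate, create=False):
        """Return the table name mapped to a predicate, or None."""
        row = self.conn.execute(
            "SELECT id FROM pred_map WHERE predicate = ?", (predicate,)
        ).fetchone()
        if row is not None:
            return f"t{row[0]}"
        if not create:
            return None
        cur = self.conn.execute(
            "INSERT INTO pred_map (predicate) VALUES (?)", (predicate,)
        )
        # Table names are built from integer ids only, never from user text.
        table = f"t{cur.lastrowid}"
        self.conn.execute(f"CREATE TABLE {table} (s TEXT NOT NULL, o TEXT NOT NULL)")
        self.conn.execute(f"CREATE INDEX {table}_s ON {table} (s)")
        return table

    def add(self, s, p, o):
        """Store a triple as an (s, o) row in the predicate's table."""
        table = self._table_for(p, create=True)
        self.conn.execute(f"INSERT INTO {table} (s, o) VALUES (?, ?)", (s, o))

    def objects(self, s, p):
        """Generate and run the SQL for the pattern (s, p, ?o)."""
        table = self._table_for(p)
        if table is None:
            return []
        rows = self.conn.execute(
            f"SELECT o FROM {table} WHERE s = ? ORDER BY o", (s,)
        )
        return [r[0] for r in rows]


if __name__ == "__main__":
    store = MappedPredicateStore(sqlite3.connect(":memory:"))
    store.add("obj:1", "rel:memberOf", "coll:b")
    store.add("obj:1", "rel:memberOf", "coll:a")
    store.add("obj:1", "dc:title", "Example")
    print(store.objects("obj:1", "rel:memberOf"))
```

In this sketch a lookup for a known predicate touches only that predicate's table, so its cost depends on the number of triples for that predicate rather than on the size of the whole store.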

Session 6

DSpace: Manakin Themes and Applications

Moderator: Robert Tansley

  • Manakin Themes: customizing the look-and-feel of DSpace
    Alexey Maslov, Cody Green, Adam Mikeal, Scott Phillips, and John Leggett

    A cursory examination of the more than 150 registered DSpace instances reveals a striking degree of conformity in style. Although there are many reasons for institutions to customize the look and feel of their repository -- such as institutional branding or imparting context -- the current JSP paradigm makes this a tedious task. Furthermore, some customization tasks, such as applying a different style to a specific collection or community, are currently not supported. Manakin enables these customizations to be easily applied to communities, collections or the entire repository. This portion of the presentation demystifies the process of creating a Manakin theme and adapting it to the unique needs of your institutional repository. The presentation will be structured into four main sections: theme components, basic theme development techniques, complex theme development techniques, and an overview of advanced topics.

  • Manakin Case Study: visualizing geospatial metadata & complex items
    Adam Mikeal, Cody Green, Alexey Maslov, Scott Phillips, Kathy Weimer and John Leggett

    Increasingly, repositories are responsible for preserving complex items, and items with specific or unique metadata, such as geospatial metadata. These collections present unique challenges for the repository interface, and traditional approaches often fail to provide adequate visualization mechanisms. This portion of the Manakin presentation is a case study of a particular collection that exhibits a Manakin solution to both of these challenges. The Geologic Atlas of the United States is a series of 227 folios published by the USGS between 1894 and 1945. Each folio consists of 10 to 40 pages of mixed content, including maps, text, and photographs, with an emphasis on the natural features and economic geology of the coverage area.

  • Content Interchange and the Invisible Repository
    Scott Yeadon

    The Australian National University (ANU) will be undertaking development work for the Australian Partnership for Sustainable Repositories (APSR) in 2007. Much of this work will be focused around repository interoperability and the integration of a repository service within the university’s application infrastructure. This presentation will discuss and demonstrate some of the prototype DSpace-related development work undertaken so far and planned for further development in 2007. Specifically: a METS SIP/DIP profile intended to be used as a national standard for the meaningful exchange of digital objects between repositories; separation of concerns at a functional level so an institution can select best-of-breed software, with an example using Open Journal Systems (OJS) to manage publication workflow, DSpace to manage preservation and Manakin as an access/publication point; and a Manakin theme incorporating Google Earth and Google Maps functionality.

EPrints: Official Launch of EPrints 3.0

Moderator: Leslie Carr

  • EPrints 3, the technical issues
    • Installing a new EPrints v3 repository
    • Upgrading from v2 to v3
    • Customisation and personalisation of deposit workflow
    • Support for third party plugins
    • Improved XML native import/export format
    • Programming applications with Web Services

Fedora: Plenary

Moderator: Thornton Staples

  • Fedora Project Plenary Presentation
    Sandy Payette, Co-Director, Fedora Project

    As many open source projects are discovering, there are many unknowns and challenges in sustainability. The directors of the Fedora Project and the Fedora Advisory Board are currently working on the transition plan for moving Fedora into a non-profit organization named Fedora Commons. We are working on multiple strategies to facilitate startup of this new entity and to maximize the potential for sustainability over the long haul. This entails pursuing new funding opportunities, building strong relationships with related open source projects, and focusing on outreach to new communities and new potential adopters. With more funding and more invested stakeholders, we plan for the Fedora Commons to take on a more ambitious agenda that we describe as “Scholarly and Scientific Service Oriented Architecture.” In this presentation, I will discuss plans for the Fedora Project in 2007 and the vision for the new Fedora Commons organization that is emerging.

  • Risk, What Risk: Choosing Fedora for the National Science Digital Library
    Kaye Howe

    In a world of technical and social innovation--from the proverbial high school student who cobbles together a killer app in a basement, to institutional leaders who go out on a limb promising end-to-end solutions on tight deadlines--the National Science Digital Library's (NSDL) development strategies fit somewhere in the middle. NSDL made the decision to build a Fedora-based technical platform to enable user participation and collaboration across more than 200 partner digital libraries and other science, technology, engineering, and mathematics discipline communities in support of NSDL's educational mission. Dr. Howe will discuss the Fedora vision, why it inspired her management team to "take the leap," and how she balanced inherent risk and investment against long-term benefits. She will give an overview of a unique Fedora-enabled educational application and discuss why it solves a fundamental problem for K-12 teachers. Finally, she will reflect on why she believes that, in the end, you will achieve none of your goals by avoiding risks.