The Project Formerly-Formerly Known as DRSR

“What’s in a name?  That which we call a rose by any other word would smell as sweet.”  –William Shakespeare, Romeo and Juliet

As all of you know, we’ve gone through several iterations of names for this project.  Our working title started out as “Digital Repository Services for Research,” but we always knew it wasn’t a name we wanted to put in front of the end users of the services we’re building and linking.  So we moved on to “Research Cloud Services,” something we felt might work better as a name for public consumption.  But that name never took off within Libraries or IT, and most of us went back to calling it DRSR or “DRSR or whatever we’re calling it these days.”  Some branding work on the IT side of our project helped us see that we don’t need to name the suite of services for our users; they’ll be more concerned with the individual services than with how all of the services work together.  But we all need to understand these services as parts of a larger whole so that we can better help researchers throughout the research lifecycle.  So internally, and when and if needed, we’re back to DRSR for this project.  No more name changes.  We promise.

Fall 2017 Update on Research Storage Project

STATUS REPORT: Maturing Research Data Storage Infrastructure Services 
High-quality infrastructure services enable scholars to focus on the research work itself

NYU’s commitment to growth as a research institution, coupled with the growing data needs in the research process, drives the need to deliver strong services and support for storing and managing digital content within and between each stage of the Research Data Lifecycle.

The Research Data Lifecycle is a series of sequentially related stages or phases in which information is produced, processed and shared. In recent years, the amount of information has been growing exponentially.

The lines between circles represent the transitions that occur in research as work is finished and passed to the next stage. It is critical for a research institution such as NYU to support each stage of the lifecycle and to facilitate the smooth transition between stages for maximum use and impact.

In 2014, NYU IT and NYU Libraries teamed up to take a holistic view of the Research Data Lifecycle in order to consider services and environments that are interconnected and that would benefit everyone involved in these facets of research.

Last year we began planning for new infrastructure to support researchers’ needs. In particular, we delineated two areas of focus, both of which would likely comprise multiple components, applications, or layers:

  1. A dynamic storage environment that allows for easy and fast access, sharing, and workflow management tasks.
  2. A publication environment that allows for deposit of finished digital content with persistent links, preservation, and discovery and access controls.

In Fall 2016 we began designing and building out pilots for both these areas. To facilitate that work, we brought on board storage architecture specialists, research data management librarians, application developers, and repository specialists from across our two organizations.

This year, to address dynamic workspace needs, we put into production the Open Science Framework for Institutions (OSF) software, which allows for easy group and research workflow management and connects to existing NYU storage options like NYU Drive (Google) and NYU Box, as well as to many other standard storage platforms.

To enhance underlying infrastructure options, we are now piloting a large storage system that is mountable from computers on the NYU network. This storage system allows researchers to access, manipulate, and analyze data from a large external drive with performance similar to that of their own desktop. The soft launch of the production version is expected in Fall 2017 on new, robust NetApp hardware. Researchers will get free access to 2TB of space, with a competitive cost structure for purchasing more. In addition, development will continue on integrating OSF with our new storage hardware, on creating more web accessibility for the storage, and on creating more and better publication repository options.

For the publication environment, we have made improvements to existing repository offerings like the Faculty Digital Archive, enhancing the user interface and metadata authoring capabilities. We have also used the parallel development of the new Spatial Data Repository as a use case to guide future development for other static preservation needs. Lastly, the OSF is giving users new ways to publish their work by minting static Digital Object Identifiers (DOIs) at a project’s end.

Project Overview

Let’s Start at the Very Beginning, a Very Good Place to Start

This project began with the Next Generation Repository Planning Project, launched in December 2014 as a collaboration between IT and Libraries. That project’s primary goal was to create a roadmap for central repository services at NYU by studying the national landscape to see how other institutions are addressing the needs of researchers and librarians with respect to the research data lifecycle, and learning about the latest approaches being employed and explored from a technological and organizational perspective. This planning work was completed in July 2015, and it informs our approach to the Digital Repository Services for Research Project.

Why Repository Services? Why Now?

Providing support services for NYU research is a key responsibility for the Libraries and IT. High-quality infrastructure services enable scholars to focus on the research work itself. In scientific and humanities research, there are growing needs for storing, moving, finding, sharing, and accessing digital assets. The research process moves through a series of sequentially related stages or phases – outlined in the chart below – in which information is produced, processed and shared. In recent years, the amount of information has been growing exponentially.

Research Data Lifecycle figure

Figure created by Vicky Steeves, New York University

The lines between circles represent the transitions that occur in research as work is finished and passed to the next stage. (Note: The term “data” here is used generally for any digital content that serves as raw material for research.) It is critical for the institution to support each stage of the lifecycle and to facilitate the transitions between stages for maximum use and impact.

NYU’s commitment to growth as a research institution, coupled with the growing data needs in the research process, drives the need to deliver strong services and support for storing and managing digital content within and between each stage of the research data lifecycle. By 2014, several internal and external changes had occurred that warranted taking a closer look at the existing support environment for research data within IT and the Libraries to assess the current state and evaluate how to improve it. These changes included:

 

  • Data sharing requirements for grant funding
    Increasing numbers of granting agencies now require investigators to share the primary data, samples, physical collections, and other supporting materials created or gathered in the course of funded research work.
  • Data explosion, need for safe backup
    With the huge increase in the amount of data used in research, storage and backup of large files is a critical need.
  • Moving and sharing files
    As data files become larger and more widely used, researchers are spending more time managing files (moving and sharing), often at the expense of focusing on their subject work. Improved, integrated tools are needed to make these tasks easier and more efficient.
  • Upgrades needed for existing repository services
    Existing NYU repository services (e.g. Faculty Digital Archive and Spatial Data Repository) provide valuable functionality for preserving, sharing, and citing digital assets and for making them discoverable. However, the technology design of the current services cannot accommodate the expanding requirements for preserving and sharing digital content.
  • Growth in Libraries’ digital collecting
    The Libraries’ digital collection is growing at a rapid rate. Meanwhile, scholars’ expectations for gaining access to digital materials quickly and easily are increasing.

Rather than addressing these issues individually, Libraries and IT teamed up to take a holistic view of the research data lifecycle in order to consider services and environments that are interconnected and would benefit everyone involved in these facets of research, including librarians, professional support staff, technical staff, and the researchers themselves.

 

Sounds Good, But How Do We Create That?

This project proposes the creation of an integrated model for digital repository services.  In addition to creating a scalable foundation, the integrated model builds on existing strengths in our working relationship between Libraries and IT and offers the most robust service model.  The core of this project is building a shared portfolio of four research-related storage service bands that are distinct from a technical perspective but will satisfy the needs of all dimensions of the research data lifecycle seamlessly for users. One or more of these four bands may encompass multiple end-user services, but they will be unified internally, i.e. each band will share an overarching design and will use a shared foundational infrastructure where possible.

Additionally, the four service bands will be planned so they are interconnected from an architectural and support perspective. The end-user services within the four bands will be utilized by library curatorial staff in building collections as well as by researchers throughout all stages of their research process.

It is these services that will collectively be referred to as the Digital Repository Services for Research, or DRSR.

The four service bands will provide:

  1. Temporary storage for data analysis requiring very fast access to data, including large files
  2. A storage environment designed for ongoing activities
  3. A feature-rich publication environment with preservation and curation options available
  4. An environment for working with high-security restricted data

Where the Rubber Meets the Road: Working Groups

To develop these four service bands, we created four working groups.  These groups will collaborate on determining the portfolio of end-user services that rely on each of Service Bands 1-4 and on defining the underlying infrastructure model for the bands. They will also deliver the following:

  • Architectural Working Group:  determine the design of each of the four service bands, plus the overarching design and interconnectedness.
  • Technical Working Group:  make software and technology recommendations for fulfilling the Architectural Working Group’s design recommendations.
  • Functional Validation & Prioritization Working Group:  represent user needs to development teams, elaborate on functional specs, finalize requirements, recommend priorities, prepare for transition to live service.
  • Policy Working Group:  identify policy issues and needs around each of the services. (Note: this work may feed additional requirements to the Functional Validation and Prioritization group.)

For more information on all the teams associated with the DRSR project, see Meet the Teams.

Welcome to Our Project Updates Site

“A great research university produces, preserves, and transmits new ideas, insights, and knowledge. Its basic research activities promote and nurture scientific progress, develop artistic and creative expression, and sustain an informed democratic society and its political life.” – NYU Framework 2031

This site provides information for members of NYU IT and the NYU Division of Libraries regarding the joint Digital Repository Services for Research (DRSR) project.  Here you will find background information on this project, details of the project’s working groups, and regular updates on DRSR progress.