Using Research Workspace for Data Transfer

When we launched Research Workspace one year ago, we promoted the service as a place where NYU researchers can store and share their work. As early adopters of the service, Digital Library Technology Services (DLTS) staff have incorporated Research Workspace into our production workflows in order to transfer data between different repositories and from our repositories to end users. Below, we describe two examples of this transfer usage.

Research Workspace as a dropbox for submissions to the Faculty Digital Archive
Researchers from the Courant Institute share large data sets produced on NYU’s High Performance Computing (HPC) cluster through the Faculty Digital Archive (FDA), NYU’s institutional repository, which is managed by DLTS. The standard way to upload material involves manually downloading data from HPC servers and then manually uploading it to the FDA. This can be a cumbersome process for data sets that contain many files and are large in total size; a recent data set shared through the FDA contained over one million individual files. Research Workspace simplifies this process because it can be mounted on both the HPC transfer nodes and our DLTS server. The research team from Courant moves their data and metadata into the Research Workspace mounted on the HPC cluster, and it is automatically packaged and transferred into their collection on the FDA. Currently, we have an 80 GB data set ready for upload.
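
As a rough illustration of the packaging step, the sketch below zips everything in a drop folder and records a SHA-256 checksum for each file. It is not the DLTS implementation, which this post does not describe: the function names, the `manifest.json` file name, and the folder layout are all assumptions made for the example.

```python
import hashlib
import json
import zipfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def package_deposit(drop_dir: Path, package_path: Path) -> dict:
    """Zip every file under drop_dir and add a checksum manifest.

    drop_dir stands in for a folder on the mounted share and package_path
    for the archive to create; both are hypothetical. Write the archive
    outside drop_dir so that it never packages itself.
    """
    files = sorted(p for p in drop_dir.rglob("*") if p.is_file())
    manifest = {p.relative_to(drop_dir).as_posix(): sha256_of(p) for p in files}
    with zipfile.ZipFile(package_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for p in files:
            archive.write(p, p.relative_to(drop_dir).as_posix())
        # Store the manifest alongside the data so the receiver can verify it.
        archive.writestr("manifest.json", json.dumps(manifest, indent=2, sort_keys=True))
    return manifest
```

A scheduled job on the receiving server could call `package_deposit` on the mounted folder and hand the resulting archive to whatever ingest tool the repository uses. For data sets with a million or more files, a streaming archive format may be a better fit than a single zip.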

This same mechanism could be used by other collection administrators who make regular deposits into the FDA. Individual admins could mount Research Workspace on a local computer instead of HPC transfer nodes and submit materials for the FDA.

Research Workspace for transferring digital materials to patrons
The DLTS preservation repository contains digital special collections materials which are available for use by researchers in the reading room. This material is transferred from the preservation repository to researchers upon their request. In order to perform these transfers, we mount Research Workspace on a DLTS server. The repository administrator copies collection materials directly into it. Staff then mount Research Workspace on a secure dedicated computer in the reading room and transfer these materials for patron use. With this new approach, we are able to save time and eliminate complexities caused by moving requested materials on external hard drives.
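
A transfer like this is usually paired with a fixity check, so that staff can confirm the copy in the reading room matches what left the repository. The sketch below shows one way to do that under stated assumptions: the directory arguments are placeholders for the mounted share and the local destination, and the post does not say which tools DLTS actually uses.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source_dir: Path, target_dir: Path) -> list[str]:
    """Copy source_dir into target_dir and return paths whose checksums differ.

    An empty list means every copied file matches its source.
    """
    shutil.copytree(source_dir, target_dir, dirs_exist_ok=True)
    mismatches = []
    for src in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        rel = src.relative_to(source_dir)
        if file_digest(src) != file_digest(target_dir / rel):
            mismatches.append(rel.as_posix())
    return mismatches
```

Calling `copy_and_verify(Path("/mnt/workspace/request-001"), Path("/data/reading-room/request-001"))` (both paths invented for illustration) would copy the request and report any files that need to be transferred again.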

These examples demonstrate the value of mounting Research Workspace in a variety of environments in order to transfer data. The Research Workspace service team is available to assist others with similar use cases. 

Click here to learn more about this service and to request a share.

EARLY WINS FOR NEW RESEARCH WORKSPACE SERVICE

The research data lifecycle
Vicky Steeves, New York University

Now that we’re up and running in the new dynamic mountable storage service called Research Workspace (also known internally as the “SB2” service), we’ve had a couple of great early successes in using the service.

Don Mennerich in the Digital Library Technology Services and Archival Collections Management group in the library has noted how it has already improved the technical work environment. In their case, the infrastructure has allowed for the creation of new workflows that more closely model the traditional processes of the transfer and accessioning of born-digital archival collections into NYU Libraries’ Special Collections. Before this storage was available, the process consisted of a series of manual hops of unique collection materials between various storage media and the preservation repository. The advantages afforded by this new infrastructure are many: granular permissions, system security, and the bit-level preservation of collection materials prior to deposit into the preservation repository. Together they will provide Special Collections with the flexible and scalable infrastructure required for its increasing collection of born-digital archives.

In another library-collections success story, we’ve already been able to take advantage of this infrastructure to meet an emerging need for scholars to access larger data objects. The Libraries recently acquired a large set of “text-as-data” materials drawn from the ProQuest Historical News and Congressional Record products. This 8 TB corpus of historical sources provides researchers with direct machine access to scanned and OCR’ed XML data that is highly valuable for machine learning, natural language processing, and other applications. Rather than providing this data only for individual download and analysis, we can now use the Research Workspace (RWS) environment to provide access to that data as a share to interested users, thereby eliminating the resource-intensive process of retrieval.
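
To give a sense of what working from a share looks like in practice, here is a minimal sketch that counts words across XML files read in place from a mounted folder. The schema of the corpus files is not described here, so the code simply takes all text content from each file. The function name and tokenizing rule are assumptions made for the example.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path

# Simple tokenizer: runs of ASCII letters and apostrophes (an assumption).
WORD = re.compile(r"[A-Za-z']+")


def count_words(share_dir: Path) -> Counter:
    """Count lowercase word tokens across every .xml file under share_dir."""
    counts = Counter()
    for xml_path in sorted(share_dir.rglob("*.xml")):
        try:
            root = ET.parse(xml_path).getroot()
        except ET.ParseError:
            continue  # skip malformed files rather than abort the whole run
        text = " ".join(root.itertext())
        counts.update(token.lower() for token in WORD.findall(text))
    return counts
```

Because the files are read from the share, nothing needs to be downloaded first; a researcher would point `share_dir` at wherever the share is mounted on their machine.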

These new use cases nicely illustrate how having solid infrastructure available to us can open doors to new services and processes.

Research Workspace Goes Live

Research Workspace gives you storage, just like a folder on your desktop
Research Workspace helps you manage and share

After completing an extended pilot phase between fall 2017 and spring 2018, the DRSR mountable storage service known as Research Workspace became a full service for researchers on June 15. As those familiar with the DRSR project know, this service has been a long-anticipated addition to the suite of storage services available at NYU. Research Workspace consists of mountable storage designed to be accessed by researchers on any of their local computers (or by way of a High Performance Computing account) to provide fast, reliable storage for management of research data. Researchers are able to request 2 TB of storage, with more capacity available for specialized projects requiring more significant storage in the near-to-medium term. Research Workspace is primarily aimed at users for whom cloud storage like NYU Box or NYU Drive, and local storage options such as external hard drives and standalone servers, do not provide adequate data transfer performance or the robust backups and snapshots needed at scale for these particular projects.

Powering through the final phase of getting the full Research Workspace service off the ground was a truly collaborative effort encompassing members of NYU IT, Digital Library Technology Services, and Data Services. May and early June in particular involved a sprint to wrap up knowledge base articles, complete service level agreements, test provisioning systems, ensure cross-platform compatibility, and configure backup systems. Research Workspace joins a large ecosystem of NYU IT technologies, and it has been a rewarding experience for partners in NYU Libraries to learn about all the moving parts of standing up a new service. This collaboration between NYU IT technologists and NYU Libraries allows us to go further than we could independently and, we hope, makes for a dependable and helpful resource.

Information on Research Workspace is available on its service home page at https://www.nyu.edu/life/information-technology/research-and-data-support/research-workspace.html. NYU Libraries’ Data Management team (https://guides.nyu.edu/data_management) will be adding it to the list of recommended services that the team covers in its classes and consultations. And for researchers, we hope that managing the data storage needs of ongoing and future projects will be that much easier with Research Workspace in production.

–Nicholas Wolf

The Project Formerly-Formerly Known as DRSR

“What’s in a name?  That which we call a rose by any other word would smell as sweet.”  –William Shakespeare, Romeo and Juliet

As all of you know, we’ve gone through several iterations of names for this project. Our working title started out as “Digital Repository Services for Research,” but we always knew it wasn’t a name that we wanted to present to the end users of the services we’re building and linking. So we moved on to “Research Cloud Services,” something we felt might work better as a name for public consumption. But that name never took off within Libraries or IT, and most of us went back to calling it DRSR or “DRSR or whatever we’re calling it these days.” Some branding work on the IT side of our project helped us see that we don’t need to name the suite of services for our users: they’ll be more concerned with the individual services than with how all of the services work together. But we all need to understand these services as parts of a larger whole so that we might be better able to help researchers throughout the research lifecycle, so internally, and whenever else it is needed, we’re back to DRSR for this project. No more name changes. We promise.

Fall 2017 Update on Research Storage Project

STATUS REPORT: Maturing Research Data Storage Infrastructure Services 
High quality infrastructure services enable scholars to focus on the research work itself

NYU’s commitment to growth as a research institution, coupled with the growing data needs of the research process, drives the need to deliver strong services and support for storing and managing digital content within and between the stages of the Research Data Lifecycle.

The Research Data Lifecycle is a series of sequentially related stages or phases in which information is produced, processed and shared. In recent years, the amount of information has been growing exponentially.

The lines between circles represent the transitions that occur in research as work is finished and passed to the next stage. It is critical for a research institution such as NYU to support each stage of the lifecycle and to facilitate the smooth transition between stages for maximum use and impact.

In 2014, NYU IT and NYU Libraries teamed up to take a holistic view of the Research Data Lifecycle in order to consider services and environments that are interconnected and that would benefit everyone involved in these facets of research.

Last year we began planning for new infrastructure to support researchers’ needs. In particular, we delineated two areas of focus, both of which would likely comprise multiple components, applications, or layers:

  1. A dynamic storage environment that allows for easy and fast access, sharing, and workflow management tasks.
  2. A publication environment that allows for deposit of finished digital content with persistent links, preservation, and discovery and access controls.

In Fall 2016 we began designing and building out pilots for both these areas. To facilitate that work, we brought on board storage architecture specialists, research data management librarians, application developers, and repository specialists from across our two organizations.

This year, to address dynamic workspace needs, we put into production the Open Science Framework for Institutions (OSF) software, which allows for easy management of groups and research workflows and connects to existing NYU storage options like NYU Drive (Google) and NYU Box, as well as to many other standard storage platforms.

To enhance underlying infrastructure options, we are now piloting a large storage system that is mountable from computers on the NYU network. This storage system allows researchers to access, manipulate, and analyze data from a large external drive with performance similar to that of their own desktop. The soft launch of the production version is expected to roll out in Fall 2017 on new, robust NetApp hardware. Researchers will get free access to 2 TB of space, with a competitive cost structure for purchasing more. In addition, development will continue on integrating OSF with our new storage hardware, on making the storage more accessible via the web, and on creating more and better publication repository options.

For the publication environment, we have made improvements to existing repository offerings like the Faculty Digital Archive, enhancing the user interface and metadata authoring capabilities. We have also used the parallel development of the new Spatial Data Repository as a use case for future development of other static preservation needs. Lastly, the OSF is giving users new ways to publish their work by minting static Digital Object Identifiers (DOIs) at a project’s end.

Mountable Storage Pilot: First Impressions

Background

Our pilot group has started interacting with the new “mountable storage” element of our Service Band 2 infrastructure. As you’ll remember, this storage is meant to provide remotely accessible, fast-as-desktop storage for users and their in-process work. Users of this storage interact with it much as one does with a drive on one’s own computer: it appears as an extra drive in your menus, and you can easily move files in and out as desired, share access to your drive with others, and even connect to it via existing NYU computing environments like HPC. In short, it’s speedy, reliable, shareable within teams, and centrally managed, all things that our researchers tell us they require for their work. Our pilot group members (8-10 researchers across a wide spectrum of disciplines) now have access to the beta version of this resource and are giving us feedback to help us develop a production version, hopefully for fall 2017.

Early Feedback

One of our users has reported that though file copying was a bit slow, once files are there, accessing and using them in computing applications has been quite smooth. We’re hoping that the production version will have increased speed and capacity, and understanding how users’ workflows are structured will help us develop the right sorts of tools to make these kinds of tasks run faster.

With the semester winding down, other testers reported their excitement at putting the storage through its paces. One researcher expressed satisfaction at having this storage directly connected to HPC, while others are just interested in seeing how it will interact with and impact their ongoing work, in everything from computation to publication and visualization applications.

We’ve also been investigating the desire among some testers for parallel, remote access to the storage, possibly using a tool like Pydio that would provide a Graphical User Interface (GUI). This would allow for other ways of accessing the drive, and might open up development opportunities that would interface with other parts of the Research Cloud Services portfolio, like OSF for Institutions or our Faculty Digital Archive.

We look forward to continuing to work with our early pilot users over the summer and see how we can build the best tool possible to meet researcher needs. Stay tuned!

 

Mountable Storage Pilot Under Way

You may recall that when we started this project we laid out four service bands. One of those bands, Service Band 2, focused on our need to provide access to a storage environment designed for ongoing activities. Our target users for such services are both researchers and library curators who need to store, organize, process, and analyze data and digital assets, sometimes in collaboration with colleagues. Since no such services existed at NYU, we have made building out Service Band 2 a Research Cloud Services project priority, and we’re delighted to announce that we’ve begun a pilot of a service that will address some of these needs.

One of the major Service Band 2-related needs that has surfaced for our researchers is the need for mountable storage.  While NYU Box and NYU Drive fulfill a lot of the requirements in this space, there’s still a need for storage for larger data sets and other types of materials that don’t easily work in these other solutions.  We’ve been hard at work to create a new service that allows researchers to store and share these files.  The new service, which we’re referring to as mountable storage (new name coming soon!), has officially started this month as a pilot, with 10 researchers using it and giving us feedback.

Special thanks to Dylan Simon, our Research Data Storage Architect, for his hard work getting this new service going.  We will share more information about this service as the pilot progresses.

We have a new name!

The DRSR project has a new name! It is Research Cloud Services. With this new name, we can begin marketing our services to the public. We are currently ironing out the details of branding, Web presence, and other aspects of the effort to open our doors to the public.

Under this name, we will include services such as the Faculty Digital Archive, Open Science Framework, HPC Archive, and the new mountable storage pilot currently underway.

DRSR at the February 2017 TorchTech Share Fair

At the TorchTech Share Fair in February, we had the opportunity to present an update on the DRSR project. Here is our poster, which describes the many services that are part of this effort. The fair was very well attended, with participation from across the University, and many visitors came by with questions about our services.

TorchTech ShareFair Poster 2017

 

DRSR Project Update for December 2016

The DRSR Project work continues to chug along!  This month our work is mostly focused in three areas: mountable storage, researcher workspaces, and the FDA (Faculty Digital Archive).

Mountable storage and researcher workspaces are both part of our Service Band 2 and can be thought of as “working storage.”  We’re working to get ready for a mountable storage pilot project, which will offer researchers:

  • Mountable network drives, available for researchers from their own computing environments
  • Fast, secure, shareable storage for in-process work, with some built-in applications
  • Connections to other parts of the storage environment, including HPC, workspaces, and publication environments.

You’ll be hearing more from us on this work in early Spring 2017!

Researcher workspaces, the second part of our current Service Band 2 work, will support individual and group work throughout the research lifecycle and integrate with existing and future storage tools and applications. To that end, we’ve launched an institutional version of the Open Science Framework (OSF), which is now available via NYU single sign-on (SSO).

The DRSR project doesn’t only create new services; it also works to improve and connect existing services like our Faculty Digital Archive. The FDA is part of Service Band 3, our research publication environment, and it was recently upgraded to allow for easier use, easier uploads, and better discoverability and visibility of scholars’ work on the open web. With this upgrade, the FDA now follows best practices for bit-level preservation and is file-format agnostic. It also allows scholars to comply with publisher and grantor requirements on openness, data release and management, and persistent linking. When the DRSR project is complete, the FDA will connect to other parts of the storage environment, including working storage and workspaces.