EARLY WINS FOR NEW RESEARCH WORKSPACE SERVICE

The research data lifecycle
Vicky Steeves, New York University

Now that the new dynamic mountable storage service called Research Workspace (also known internally as the “SB2” service) is up and running, we’ve had a couple of great early successes with it.

Don Mennerich, in the Digital Library Technology Services and Archival Collections Management group in the Libraries, has noted how the service has already improved the technical work environment. In their case, the infrastructure has allowed for new workflows that more closely model the traditional processes for transferring and accessioning born-digital archival collections into NYU Libraries’ Special Collections. Before this storage was available, the process consisted of a series of manual hops of unique collection materials between various storage media and the preservation repository. The advantages afforded by the new infrastructure are many, including granular permissions, system security, and bit-level preservation of collection materials prior to deposit into the preservation repository. Together, these provide Special Collections with the flexible and scalable infrastructure it needs for its growing collection of born-digital archives.

In another library-collections success story, we’ve already been able to use this infrastructure to meet an emerging need among scholars for access to larger data objects. The Libraries recently acquired a large set of “text-as-data” materials drawn from the ProQuest Historical News and Congressional Record products. This 8TB corpus of historical sources gives researchers direct machine access to scanned and OCR’ed XML data that is highly valuable for machine learning, natural language processing, and other applications. Rather than providing this data only for individual download and analysis, we can now use the Research Workspace (RWS) environment to provide access to the data as a share for interested users, eliminating the resource-intensive retrieval process.

These new use cases nicely illustrate how having solid infrastructure available to us can open doors to new services and processes.

Research Workspace Goes Live

Research Workspace gives you storage, just like a folder on your desktop
Research Workspace helps you manage and share

After completing an extended pilot phase between fall 2017 and spring 2018, the DRSR mountable storage service known as Research Workspace became a full service for researchers on June 15. As those familiar with the DRSR project know, this service has been a long-anticipated addition to the suite of storage services available at NYU. Research Workspace consists of mountable storage designed to be accessed by researchers from any of their local computers (or by way of a High Performance Computing account), providing fast, reliable storage for managing research data. Researchers can request 2 TB of storage, with additional capacity available for specialized projects that require more significant storage in the near-to-medium term. Research Workspace is primarily aimed at users for whom cloud storage such as NYU Box or NYU Drive, and local storage options such as external hard drives and standalone servers, do not provide adequate data-transfer performance or the robust backups and snapshots these projects need at scale.

Powering through the final phase of getting the full Research Workspace service off the ground was a truly collaborative effort involving members of NYU IT, Digital Library Technology Services, and Data Services. May and early June in particular involved a sprint to wrap up knowledge base articles, complete service level agreements, test provisioning systems, ensure cross-platform compatibility, and configure backup systems. Research Workspace joins a large ecosystem of NYU IT technologies, and it has been rewarding for partners in NYU Libraries to learn about all the moving parts involved in standing up a new service. This collaboration between NYU IT technologists and NYU Libraries allows us to go further than we could independently and, we hope, to build a dependable and helpful resource.

Information on Research Workspace is available on its service home page at https://www.nyu.edu/life/information-technology/research-and-data-support/research-workspace.html. NYU Libraries’ Data Management team (https://guides.nyu.edu/data_management) will add it to the list of recommended services it teaches researchers about in its classes and consultations. We hope that, with Research Workspace in production, researchers will find it that much easier to manage the data storage needs of ongoing and future projects.

–Nicholas Wolf

Fall 2017 Update on Research Storage Project

STATUS REPORT: Maturing Research Data Storage Infrastructure Services 
High quality infrastructure services enable scholars to focus on the research work itself

NYU’s commitment to growth as a research institution, coupled with the growing data needs of the research process, drives the need to deliver strong services and support for storing and managing digital content within and between each stage of the Research Data Lifecycle.

The Research Data Lifecycle is a series of sequentially related stages or phases in which information is produced, processed and shared. In recent years, the amount of information has been growing exponentially.

In the lifecycle diagram, the lines between circles represent the transitions that occur as research work is finished and passed to the next stage. It is critical for a research institution such as NYU to support each stage of the lifecycle and to facilitate smooth transitions between stages for maximum use and impact.

In 2014, NYU IT and NYU Libraries teamed up to take a holistic view of the Research Data Lifecycle in order to consider services and environments that are interconnected and that would benefit everyone involved in these facets of research.

Last year we began planning for new infrastructure to support researchers’ needs. In particular, we delineated two areas of focus, both of which would likely comprise multiple components, applications, or layers:

  1. A dynamic storage environment that allows for easy and fast access, sharing, and workflow management tasks.
  2. A publication environment that allows for deposit of finished digital content with persistent links, preservation, and discovery and access controls.

In Fall 2016 we began designing and building out pilots for both these areas. To facilitate that work, we brought on board storage architecture specialists, research data management librarians, application developers, and repository specialists from across our two organizations.

This year, to address dynamic workspace needs, we put into production the Open Science Framework for Institutions (OSF) software, which allows for easy group and research workflow management and connects to existing NYU storage options like NYU Drive (Google) and NYU Box, as well as to many other standard storage platforms.

To enhance underlying infrastructure options, we are now piloting a large storage system that is mountable from computers on the NYU network. This storage system allows researchers to access, manipulate, and analyze data on what behaves like a large external drive, with performance similar to that of their own desktops. The production version is expected to soft-launch in Fall 2017 on new, robust NetApp hardware. Researchers will get free access to 2TB of space, with a competitive cost structure for purchasing more. In addition, development will continue on integrating OSF with our new storage hardware, on improving web access to the storage, and on creating more and better publication repository options.

For the publication environment, we have made improvements to existing repository offerings like the Faculty Digital Archive, enhancing its user interface and metadata authoring capabilities. We have also used the parallel development of the new Spatial Data Repository as a use case for future development addressing other static preservation needs. Lastly, the OSF gives users new ways to publish their work by minting static Digital Object Identifiers (DOIs) at a project’s end.

Mountable Storage Pilot: First Impressions

Background

Our pilot group has started interacting with the new “mountable storage” element of our Service Band 2 infrastructure. As you’ll remember, this storage is meant to provide remotely accessible, fast-as-desktop storage for users and their in-process work. Users interact with it much as they would with a drive on their own computer: it appears as an extra drive in their menus, and they can easily move files in and out as desired, share access to the drive with others, and even connect to it from existing NYU computing environments like HPC. In short, it’s speedy, reliable, shareable within teams, and centrally managed, all things that our researchers tell us they require for their work. Our pilot group members, 8-10 researchers across a wide spectrum of disciplines, now have access to the beta version of this resource and are giving us feedback to help us develop a production version, hopefully for fall 2017.

Early Feedback

One of our users has reported that although copying files was a bit slow, accessing and using those files in computing applications has been quite smooth once they are in place. We’re hoping that the production version will offer increased speed and capacity, and understanding how users’ workflows are structured will help us develop the right tools to make these kinds of tasks run faster.

With the semester winding down, other testers reported their excitement at putting the storage through its paces. One researcher expressed satisfaction at having this storage directly connected to HPC, while others are just interested in seeing how it will interact with and impact their ongoing work, in everything from computation to publication and visualization applications.

We’ve also been investigating the interest among some testers in parallel, remote access to the storage, possibly using a tool like Pydio, which would provide a graphical user interface (GUI). This would open up other ways of accessing the drive, and might create development opportunities to interface with other parts of the Research Cloud Services portfolio, like OSF for Institutions or our Faculty Digital Archive.

We look forward to continuing to work with our early pilot users over the summer and to seeing how we can build the best tool possible to meet researcher needs. Stay tuned!

 

We have a new name!

The DRSR project has a new name: Research Cloud Services! With this new name, we can begin marketing our services to the public. We are currently ironing out the details of branding, Web presence, and other aspects of the effort to open our doors to the public.

This name will encompass services such as the Faculty Digital Archive, Open Science Framework, HPC Archive, and the new mountable storage pilot currently underway.

DRSR at the February 2017 TorchTech Share Fair

At the TorchTech Share Fair in February, we had the opportunity to present an update on the DRSR project. Here is our poster, which describes the many services that are part of this effort. The fair was very well attended, with participation from across the University, and many visitors came by with questions about our services.

TorchTech ShareFair Poster 2017

 

DRSR Project Update for December 2016

The DRSR project work continues to chug along! This month our work is mostly focused on three areas: mountable storage, researcher workspaces, and the FDA (Faculty Digital Archive).

Mountable storage and researcher workspaces are both part of our Service Band 2 and can be thought of as “working storage.” We’re getting ready for a mountable storage pilot project, which will offer researchers:

  • Mountable network drives, available for researchers from their own computing environments
  • Fast, secure, shareable storage for in-process work, with some built-in applications
  • Connections to other parts of the storage environment, including HPC, workspaces, and publication environments.

You’ll be hearing more from us on this work in early Spring 2017!

Researcher workspaces, the second part of our current Service Band 2 work, will support individual and group work throughout the research lifecycle and integrate with existing and future storage tools and applications. To that end, we’ve launched an institutional version of the Open Science Framework (OSF), which is now available via NYU single sign-on (SSO).

The DRSR project doesn’t only create new services; it also works to improve and connect existing services like our Faculty Digital Archive. The FDA is part of Service Band 3, our research publication environment, and it was recently upgraded to allow for easier use and uploads and greater discoverability and visibility of scholars’ work on the open web. With this upgrade, the FDA now follows best practices for bit-level preservation and is file-format agnostic. It also allows scholars to comply with publication and grantor requirements on openness, data release and management, and persistent linking. When the DRSR project is complete, the FDA will connect to other parts of the storage environment, including working storage and workspaces.

Groups Management Project

The goal of the Groups Management project is to evolve and extend NYU IT’s central group management platform, enabling wide use of self-service and official University-wide groups shared across systems and applications.

Through this project, faculty, students, and staff will have the ability to create and manage collaborations and shared materials in support of their research, projects, and other activities.

Key desired capabilities include support for user-managed groups, automated provisioning and deprovisioning, administrative functions, and dissemination of standardized approaches and policies for integrating group data with many types of applications and storage systems. For DRSR, this means that researchers will be able to create and self-manage groups for sharing files and data sets.

DRSR begins the implementation phase

Over the summer, Dylan Simon joined the project as Research Data Storage Architect. He brings experience in both post-doctoral neuroscience research and storage industry software development, and has already designed and built one data repository here at NYU. He will drive new technology offerings and continue designing new features based on the architecture and functional guidance provided by researchers, librarians, and technology specialists throughout this year.

One of Dylan’s first areas of emphasis is addressing the research activities we identified in Service Band 2, where the University currently has the most unmet needs. Through the DRSR project, NYU hopes to offer a new research file storage service that will provide a secure, network-based collaboration space for active research data, accessible directly from any application on OS X, Windows, or Linux. By consolidating many existing lab- and department-based file servers, hard disks, and other shared and personal storage, this new file storage service will reduce the risk, hassle, and cost of managing many types of research data. It will also complement storage services offered in Service Band 3, such as the Faculty Digital Archive, the RSTAR preservation repository, and the Spatial Data Repository, which offer additional preservation and curation options. Future offerings will aim to make it easier to share, manage, and follow these data throughout the research data lifecycle.

Upcoming DRSR Brown Bag Lunch

Please mark your calendars for an upcoming brown bag lunch update and discussion of the DRSR project. The brown bag lunch will be Thursday, November 17th, from 12:00-1:30pm in the Avery Room (Bobst Library, 2nd floor, in the Avery Fisher Center).

DRSR is a joint Libraries-IT project that is working to create central repository services that address the needs of NYU researchers and librarians with respect to the research data lifecycle. The project has been underway for just over a year, and you may have heard updates about it at several meetings in the spring or at a Digital Infrastructure Roadshow over the summer. We thought a fall update in an informal setting, with plenty of room for questions and answers, might be a good idea!

We’ll provide soft drinks and cookies to complement your brown bag lunch.