The research data lifecycle
Vicky Steeves, New York University

Now that the new dynamic mountable storage service, Research Workspace (known internally as the “SB2” service), is up and running, we’ve had a couple of great early successes with it.

Don Mennerich, of the Digital Library Technology Services and Archival Collections Management group in the library, has noted how the service has already improved the technical work environment. In their case, the infrastructure has allowed for the creation of new workflows that more closely model the traditional processes of transferring and accessioning born-digital archival collections into NYU Libraries’ Special Collections. Before this storage was available, the process consisted of a series of manual hops of unique collection materials between various storage media and the preservation repository. The advantages afforded by this new infrastructure are many, including granular permissions, system security, and bit-level preservation of collection materials prior to deposit into the preservation repository. Together they will provide Special Collections with the flexible and scalable infrastructure required for its increasing collection of born-digital archives.

In another library-collections success story, we’ve already been able to take advantage of this infrastructure to meet an emerging need for scholars to access larger data objects. The Libraries recently acquired a large set of “text-as-data” materials drawn from the ProQuest Historical News and Congressional Record products. This 8 TB corpus of historical sources provides researchers with direct machine access to scanned and OCR’ed .xml data that is highly valuable for machine learning, natural language processing, and other applications. Rather than providing this data only for individual download and analysis, we can now use the Research Workspace (RWS) environment to provide access to that data as a share to interested users, thereby eliminating the resource-intensive process of retrieval.
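As an illustration of what working against such a share might look like, here is a minimal Python sketch that walks a directory of .xml files and tallies word frequencies without first copying the files elsewhere. The mount path `CORPUS_ROOT`, the helper names, and the idea that every text node is worth counting are all assumptions made for this example; the actual share location and the XML schema of the ProQuest files are not described in this post.

```python
from collections import Counter
from pathlib import Path
import re
import xml.etree.ElementTree as ET

# Hypothetical mount point; the real path depends on how the share is mounted.
CORPUS_ROOT = Path("/mnt/research-workspace/text-as-data")

# Very simple tokenizer: runs of letters and apostrophes.
TOKEN_RE = re.compile(r"[A-Za-z']+")


def iter_document_text(root_dir):
    """Yield (path, text) for every parseable .xml file under root_dir."""
    for path in sorted(Path(root_dir).rglob("*.xml")):
        try:
            tree = ET.parse(path)
        except ET.ParseError:
            # Skip malformed files rather than stopping the whole pass.
            continue
        # Join all text nodes; the real schema may call for specific elements.
        yield path, " ".join(tree.getroot().itertext())


def count_terms(root_dir):
    """Return a Counter of lowercased tokens across all documents."""
    counts = Counter()
    for _path, text in iter_document_text(root_dir):
        counts.update(tok.lower() for tok in TOKEN_RE.findall(text))
    return counts


if __name__ == "__main__":
    if CORPUS_ROOT.is_dir():
        for term, n in count_terms(CORPUS_ROOT).most_common(10):
            print(term, n)
    else:
        print(f"{CORPUS_ROOT} is not mounted on this machine")
```

Because the files are read one at a time from the mounted path, a pass like this needs only enough local memory for the running counts, not for the corpus itself.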

These new use cases nicely illustrate how having solid infrastructure available to us can open doors to new services and processes.

Research Workspace Goes Live

Research Workspace gives you storage, just like a folder on your desktop
Research Workspace helps you manage and share

After completing an extended pilot phase between fall 2017 and spring 2018, the DRSR mountable storage service known as Research Workspace became a full service for researchers on June 15. As those familiar with the DRSR project know, this service has been a long-anticipated addition to the suite of storage services available at NYU. Research Workspace consists of mountable storage designed to be accessed by researchers from any of their local computers (or by way of a High Performance Computing account) to provide fast, reliable storage for the management of research data. Researchers are able to request 2 TB of storage, with more capacity available for specialized projects requiring more significant storage in the near-to-medium term. Research Workspace is primarily aimed at users for whom cloud storage like NYU Box or NYU Drive, and local storage options such as external hard drives and standalone servers, do not provide adequate data transfer performance or the type of robust backup and snapshots that are needed at scale for these particular projects.

Powering through the final phase of getting the full Research Workspace service off the ground was a truly collaborative effort encompassing members of NYU IT, Digital Library Technology Services, and Data Services. May and early June in particular involved a sprint to wrap up knowledge base articles, complete service level agreements, test provisioning systems, ensure cross-platform compatibility, and configure backup systems. Research Workspace joins a large ecosystem of NYU IT technologies, and it has been a rewarding experience for partners in NYU Libraries to learn about all the moving parts of standing up a new service. This collaboration between NYU IT technologists and NYU Libraries allows us to go further than we could independently and, we hope, makes for a dependable and helpful resource.

Information on Research Workspace is available on its service home page at https://www.nyu.edu/life/information-technology/research-and-data-support/research-workspace.html. NYU Libraries’ Data Management team (https://guides.nyu.edu/data_management) will be adding it to the list of recommended services that the team covers in its classes and consultations. For researchers, we hope that managing the data storage needs of ongoing and future projects will be that much easier with Research Workspace in production.

–Nicholas Wolf

Fall 2017 Update on Research Storage Project

STATUS REPORT: Maturing Research Data Storage Infrastructure Services 
High quality infrastructure services enable scholars to focus on the research work itself

NYU’s commitment to growth as a research institution, coupled with the growing data needs of the research process, drives the need to deliver strong services and support for storing and managing digital content within and between each stage of the Research Data Lifecycle.

The Research Data Lifecycle is a series of sequentially related stages or phases in which information is produced, processed and shared. In recent years, the amount of information has been growing exponentially.

The lines between circles represent the transitions that occur in research as work is finished and passed to the next stage. It is critical for a research institution such as NYU to support each stage of the lifecycle and to facilitate the smooth transition between stages for maximum use and impact.

In 2014, NYU IT and NYU Libraries teamed up to take a holistic view of the Research Data Lifecycle in order to consider services and environments that are interconnected and that would benefit everyone involved in these facets of research.

Last year we began planning for new infrastructure to support researchers’ needs. In particular, we delineated two areas of focus, both of which would likely comprise multiple components, applications, or layers:

  1. A dynamic storage environment that allows for easy and fast access, sharing, and workflow management tasks.
  2. A publication environment that allows for deposit of finished digital content with persistent links, preservation, and discovery and access controls.

In Fall 2016 we began designing and building out pilots for both these areas. To facilitate that work, we brought on board storage architecture specialists, research data management librarians, application developers, and repository specialists from across our two organizations.

This year, to address dynamic workspace needs, we put into production the Open Science Framework for Institutions (OSF) software, which allows for easy group and research workflow management and connects to existing NYU storage options like NYU Drive (Google) and NYU Box, as well as to many other standard storage platforms.

To enhance underlying infrastructure options, we are now piloting a large storage system that is mountable from computers on the NYU network. This storage system allows researchers to access, manipulate, and analyze data from a large external drive with performance similar to that of their own desktop. The soft launch of the production version is expected to roll out in Fall 2017 on new, robust NetApp hardware. Researchers will get free access to 2 TB of space, with a competitive cost structure for purchasing more. In addition, development will continue on integrating OSF with our new storage hardware, on creating more web accessibility for the storage, and on creating more and better publication repository options.

For the publication environment, we have made improvements to existing repository offerings like the Faculty Digital Archive, enhancing the user interface and metadata authoring capabilities. We have also used the parallel development of the new Spatial Data Repository as a use case for future development of other static preservation needs. Lastly, the OSF is giving users new ways to publish their work by minting static Digital Object Identifiers (DOIs) at a project’s end.

Mountable Storage Pilot: First Impressions


Our pilot group has started interacting with the new “mountable storage” element of our Service Band 2 infrastructure. As you’ll remember, this storage is meant to provide remotely accessible, fast-as-desktop storage for users and their in-process work. Users of this storage interact with it much as they would with a drive on their own computer: it appears as an extra drive in your menus, and you can easily move files in and out as desired, share access to your drive with others, and even connect to it via existing NYU computing environments like HPC. In short, it’s speedy, reliable, shareable within teams, and centrally managed, all things that our researchers tell us they require for their work. Our pilot group members (8 to 10 researchers across a wide spectrum of disciplines) now have access to the beta version of this resource and are giving us feedback to help us develop a production version, hopefully for fall 2017.

Early Feedback

One of our users has reported that although file copying was a bit slow, once the files were in place, accessing and using them in computing applications has been quite smooth. We’re hoping that the production version will have increased speed and capacity, and understanding how users’ workflows are structured will help us develop the right sorts of tools to make these kinds of tasks run faster.

With the semester winding down, other testers reported their excitement at putting the storage through its paces. One researcher expressed satisfaction at having this storage directly connected to HPC, while others are just interested in seeing how it will interact with and impact their ongoing work, in everything from computation to publication and visualization applications.

We’ve also been investigating the desire among some testers for parallel, remote access to the storage, possibly using a tool like Pydio that would provide a Graphical User Interface (GUI). This would allow for access to the drive in other ways, and might open up development opportunities that would interface with other parts of the Research Cloud Services portfolio, like OSF for Institutions or our Faculty Digital Archive.

We look forward to continuing to work with our early pilot users over the summer and see how we can build the best tool possible to meet researcher needs. Stay tuned!


A new partnership between the Center for Open Science (COS) and NYU


The Center for Open Science and NYU’s Data Services are excited to announce the launch of OSF for Institutions at NYU. The Center for Open Science is a non-profit technology company that provides free and open-source services to increase inclusivity and transparency of research. Researchers who rely on the Open Science Framework (OSF) to manage their project workflows, build research transparency, encourage project visibility, and collaborate with other scholars are eligible to join an enhanced OSF for Institutions at New York University. Current users can now affiliate their OSF projects with NYU’s institutional membership, joining the growing OSF community at the University while continuing to enjoy all the benefits and functionality of the framework. Doing so will not require changing any of your projects, public or private. To affiliate your existing OSF account with NYU, you will need to:

  1. Add your NetID@nyu.edu e-mail address to your profile as a primary or secondary e-mail if it isn’t one of your account e-mails currently.
  2. Log in via the OSF Single Sign On link.
  3. Go to your public project, go to the “Settings” tab, and look for “Project Affiliation/Branding” to list your project under NYU’s research page.

New users can now log in through NYU’s single sign on, using their NetIDs and passwords, and have a new OSF account created. Through this partnership, users can leverage:

  • Easy project management from the inception of a project to its publication!
  • The OSF connects to many services that you might already use (Google Drive, GitHub, Zotero, etc.), so you can have all your project materials in one place!
  • Benefit from easier connections to NYU resources, storage apps, and research support made available through our institutional account.
  • Increase the discoverability of your research among local scholars.

For an overview of the OSF, see these materials created by Data Services’ Vicky Steeves (data management & reproducibility librarian). For questions or a simple walk-through of how to get started, email data.service@nyu.edu.

See the original post and more at the Data Dispatch.