EARLY WINS FOR NEW RESEARCH WORKSPACE SERVICE – Digital Repository Services for Research Project Updates

The research data lifecycle — Vicky Steeves, New York University

Now that the new dynamic mountable storage service, Research Workspace (known internally as the “SB2” service), is up and running, we’ve already had a couple of great early successes with it.

Don Mennerich, of the Digital Library Technology Services and Archival Collections Management group in the library, has noted how the service has already improved the technical work environment. In their case, the infrastructure has allowed for the creation of new workflows that more closely model the traditional processes of transferring and accessioning born-digital archival collections into NYU Libraries’ Special Collections. Before this storage was available, the process consisted of a series of manual hops of unique collection materials between various storage media and the preservation repository. The advantages afforded by this new infrastructure are many: granular permissions, system security, and the bit-level preservation of collection materials prior to deposit into the preservation repository. Together they give Special Collections the flexible and scalable infrastructure required for its growing collection of born-digital archives.

In another library-collections success story, we’ve already been able to take advantage of this infrastructure to meet an emerging need for scholars to access larger data objects. The Libraries recently acquired a large set of “text-as-data” materials drawn from the ProQuest Historical News and Congressional Record products. This 8 TB corpus of historical sources gives researchers direct machine access to scanned and OCR’ed XML data that is highly valuable for machine learning, natural language processing, and other applications. Rather than providing this data only for individual download and analysis, we can now use the Research Workspace environment to offer it to interested users as a share, thereby eliminating the resource-intensive process of retrieval.

These new use cases nicely illustrate how having solid infrastructure available to us can open doors to new services and processes.
