Audio collections are coming!

Director Robert Flaherty. Credit: Sense of Cinema

DLTS published our first full audio collection, Fales's Robert Flaherty Film Seminar Archive, earlier this year. We've been hard at work on the next several collections, including a re-publication of Voices from the Food Revolution that supports HTML5 streaming (and removes the need for Flash). Each collection has presented its own challenges, but our workflow is improving, and we should be publishing hundreds of hours of audio very soon.

Born-Digital workflows and the Jeremy Blake Papers

(This post is a shortened version of the originally published NDSR blog post, and the first in a series of updates by NYU Libraries Fellow Julia Kim.)

While the Digital Library and Technical Services department has long worked to digitize invaluable materials, my post will introduce my National Digital Stewardship Residency's task: creating access-driven workflows for handling complex, born-digital media. My work, then, does not stop at ingest but must account for researcher access. Collections can range in size from 30 MB on 2 floppy disks to multiple terabytes from an institution's RAID. Collection content may comprise simple .txt and ubiquitous .doc files or, as with material collected from computer hard drives, may hold hundreds of unique and proprietary file types. Further complicating the task of workflow creation, collections of born-digital media often present thorny privacy and intellectual property issues, especially with regard to personally identifying information (e.g., Social Security numbers), which is generally considered off-limits in areas of public access.
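Workflows like this often include an automated scan for obviously sensitive strings before material is opened to researchers. As a toy illustration only (not the actual NYU procedure, which would involve richer tools and human review), a formatted Social Security number check might look like:

```python
import re

# Toy pattern for formatted US Social Security numbers (NNN-NN-NNNN).
# Illustrative only: it misses unformatted SSNs and other PII entirely.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def flag_sensitive(text: str) -> list[str]:
    """Return any SSN-like strings found in a document's extracted text."""
    return SSN_PATTERN.findall(text)

sample = "Contact: J. Doe, SSN 123-45-6789, re: exhibition loan."
print(flag_sensitive(sample))  # ['123-45-6789']
```

A scan like this only flags files for review; deciding what is genuinely off-limits still requires a human archivist.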

At this point in the fellowship, I have conducted preliminary surveys of several small collections with relatively simple image, text, moving image, and sound file formats. By focusing on accessibility with these smaller collections first, I'll develop a workflow that encompasses disparate collection characteristics. These initial efforts will help me formulate a workflow as I approach two large, incredibly complex collections: the Jeremy Blake Papers and the Exit Art Collection. I'll spend the rest of this post discussing the Blake Papers.

Jeremy Blake (1971-2007) was an American digital artist best known for his "time-based paintings" and his innovations in new media. The Winchester trilogy exemplifies his methodology, which traversed myriad artistic practices: here, he combined 8mm film, vector graphics, and hand-painted imagery to create distinctive color-drenched, even hallucinatory, atmospheric works. Blake cemented his reputation as a gifted artist with his early artistic and commercial successes, such as his consecutive Whitney Biennial entries (2000, 2002, and 2004) and his animated sequences in P.T. Anderson's Punch-Drunk Love (2002).

The Jeremy Blake Papers include over 340 pieces of legacy physical media, spanning optical media, short-lived Zip and Jaz disks, digital linear tape cartridges, and multiple duplicative hard drives. Much of what we recovered seems to be a carefully kept personal working archive of drafts, digitized and digital source images, and various backups in multiple formats, made both for himself and for exhibition. While the content was often bundled into stacks by artwork title (as exhibited), we know that multiple individuals combed through the archive before and after acquisition, which makes any certainty as to provenance and dating impossible for now.

Through the work I'll be doing over the course of this fellowship (stay tuned), researchers will be able to explore Blake's work process, the software tools he used, and the different digital drafts of his moving images. Processing the Jeremy Blake Papers will necessitate exploration of the problems inherent in the treatment of digital materials. Are emails, with their ease of transmission and seeming immateriality, actually analogous to the paper-based drafts and correspondence in the kinds of archives we have been processing for years? Or are we newly facing a transition to a medium that requires seriously rethinking our understandings and retooling our policies and procedures to protect privacy and prevent future vulnerability? While we haven't explicitly addressed the issue yet, these are some of the bigger questions our field will need to explore as we balance our obligations to donors and future researchers.

Examples of media from the Jeremy Blake Papers (note the optical media that looks like vinyl).

At this point, Blake's collections have been previewed, preliminarily processed, and arranged with AccessData's FTK software, a powerful but expensive program that makes an archivist's task of dynamically sifting through vast quantities of digital materials plausible within a 9-month project. While my mentor, Don Mennerich, and I manage the imaging and processing, we've also started discussing what access might look like. This necessitates discussions with representatives from all three of NYU's archival bodies (Fales, University Archives, and Tamiment), as well as the head of its new, trans-archive processing department, the Archival Collections Management Department. In our inaugural meeting last week, we discussed making a very small (30 MB) collection accessible to researchers in the very near future as a test case for providing access to some of our larger collections.

More specifically, we have also set up hardware and software configurations that may help us understand Blake's artistic output. In the past two weeks, for example, Don identified the various Adobe Photoshop versions that Blake used by examining the files' raw bytes in a hex editor. We have sought out those obsolete versions of Adobe Photoshop, and my office area is now crowded with different computers configured to read materials from software versions common to Blake's most active years of artistic production. Redundancy isn't just about preventing data loss, however: we still need multiple methods with which to view and assess Blake's working files. In addition to using multiple operating systems, write-blockers, imaging techniques, and programs, I spent several days installing emulators on our contemporary computers. After imaging material, we'll start systematically accessing outdated Photoshop files via these older environments, both emulated and actual.

Hex editor view used to help identify software versions used (extra points if you recognize what Blake piece this file is from).
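The hex-level inspection described above can be approximated in a few lines of code. This is my own illustrative sketch, not Don's actual procedure: it dumps a file's opening bytes the way a hex editor would. Photoshop documents do begin with the 4-byte signature `8BPS` followed by a 2-byte version field (1 for PSD, 2 for large-document PSB); pinpointing which application version wrote a file relies on other internal markers not shown here.

```python
import struct

def hexdump_head(data: bytes, length: int = 16) -> str:
    """Render a file's first bytes as hex plus ASCII, hex-editor style."""
    head = data[:length]
    hex_part = " ".join(f"{b:02x}" for b in head)
    ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in head)
    return f"{hex_part}  |{ascii_part}|"

# Synthetic header for demonstration: '8BPS' signature, then a
# big-endian version field (1 = PSD, 2 = PSB), then padding.
fake_psd_header = b"8BPS" + struct.pack(">H", 1) + b"\x00" * 10

print(hexdump_head(fake_psd_header))  # '8BPS' visible in the ASCII column
signature, version = struct.unpack(">4sH", fake_psd_header[:6])
print(signature.decode("ascii"), version)  # → 8BPS 1
```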

In the meantime, I still need to make a number of decisions, and the workflow is very much a work in progress! This underpins a larger point: this fellowship necessitates documentation to address gaps like these. That is, while there are concrete deliverables for each phase of the project, in order to deliver I'll need to understand and investigate the intricacies of the overall digital preservation strategy here at NYU. While working with very special collections like the Jeremy Blake Papers is a great opportunity, it's also great that the questions we address will be useful at our host sites for many other projects down the line.

CNI fall 2014 Dispatch: It’s all about relationships

Image by Elco van Staveren used under a CC BY-SA 2.0 license.

[Author’s note: this post was also published on my personal website and is the second of two posts there about CNI.]

At CNI 2014, I decided to pick a topic and focus, rather than graze on all the various and interesting issues being discussed. So I attended all the sessions that had to do with linked data and its related technologies and standards.

Why linked data? It has the potential to radically change the way that library and other research data can be consumed, repurposed, discovered, and displayed on the web, through search engines, and in specialized catalogs and websites. But until yesterday, I didn't really understand how it worked. Basically, linked data allows you to express relationships between and among things (or "entities") and to make those relationships actionable (clickable, usable, sortable, displayable) by machines and humans. And, with enough linked data out there on the Internet, you can leverage and expose relationships among bits of information stored in different places. (I've found that most introductions to linked data get complicated quickly; I'm happy to take recommendations.)
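To make the idea concrete, here is a toy sketch of the core shape of linked data: facts stored as subject-predicate-object triples, with triples from two different "places" merged and queried together. This uses plain Python tuples rather than an RDF library, and every identifier is invented for illustration:

```python
# Each fact is a (subject, predicate, object) triple.
# 'library_a' and 'library_b' stand in for datasets held at
# two different institutions; all names are made up.
library_a = [
    ("ex:blake", "ex:created", "ex:winchester"),
    ("ex:winchester", "ex:title", "The Winchester Trilogy"),
]
library_b = [
    ("ex:winchester", "ex:exhibitedAt", "ex:whitney2002"),
]

# Because both datasets use the same identifier for the work,
# simply merging them links information stored in different places.
graph = library_a + library_b

def about(subject):
    """Everything the merged graph says about one entity."""
    return [(p, o) for s, p, o in graph if s == subject]

print(about("ex:winchester"))
```

Real linked data uses URIs as identifiers and serializations like RDF/Turtle, but the merge-and-query idea is the same.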

At the update on the Linked Data For Libraries (LD4L) project, Dean Krafft (Cornell) and Tom Cramer (Stanford) shared their LD4L project use cases. This one is a good example of what linked data can do for scholars:

Example story: As a faculty member or librarian, I want to create a virtual collection or exhibit containing information resources from multiple collections across multiple universities either by direct selection or by a set of resource characteristics, so that I can share a focused collection with a <class, set of researchers, set of students in a disciplinary area>.

As you can see from this user story, linked data can connect things and facilitate discovery across the Internet in ways that we just can’t do without it.

In their session, Barbara Bushman and Nancy Fallgren described what they're doing at the National Library of Medicine. See their GitHub repository of code and documentation for their beta MeSH RDF project.

In yet another linked data session, Kenning Arlitsch (Montana State University) underscored the need for libraries to mark up their data in ways that make it more discoverable on the web (see markup and Google Knowledge Graph cards, for instance).
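Markup of this kind is commonly embedded in a page as JSON-LD using vocabulary. A minimal hand-rolled example follows; the name and URL values are invented placeholders, and a real record would be generated from the catalog:

```python
import json

# A minimal JSON-LD description of a library, shaped the way a
# search engine would consume it. All values are placeholders.
record = {
    "@context": "",
    "@type": "Library",
    "name": "Example University Library",
    "url": "",
}

jsonld = json.dumps(record, indent=2)
print(jsonld)  # this string would be embedded in a <script> tag on the page
```

Round-tripping the string through `json.loads` is a cheap sanity check that the markup is well-formed before publishing it.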

Another related technology is BIBFRAME, a new way to express library descriptive information (intended to replace MARC) and to leverage linked data to connect information. Karim Boughida (George Washington U) explained that the platform for library data is not just the library catalog anymore; it's the web itself. BIBFRAME allows us to structure our data in a way that is ultimately consumable as linked data on the web.

Taking a break from linked data, I attended Jerome McDonough's session on preserving our intangible cultural heritage (cuisine, dance, games, etc.), which requires that we stop thinking about archiving information and instead think about preserving knowledge. Jerry gave the example of archiving a game: to preserve it as an interpretable object you need to also preserve a body of information about and around the game (e.g., development documentation from the company, advertisements, collectibles created in conjunction with the game, videos related to it). Moreover, preserved materials on gaming or cuisine will be stored in many places: libraries, museums, archives, etc. Jerry underscored that all the information you need for intellectual context is out there in the world, but the places where this context is stored don't communicate with each other. So what do we do? Jerry suggests we stop educating people to be librarians, archivists, and curators, and start training "librarchators" (his humorous word) who will think more broadly than we currently do about collecting and preserving knowledge. "To make progress we can't just change the technologies we use, we also need to change the social side of our work." (So relationships are important here as well.)

I also saw a session in which funders (the National Endowment for the Humanities, the Council on Library and Information Resources, the Institute of Museum and Library Services, and the National Historical Publications and Records Commission, part of NARA) all emphasized their interest in engaging the public in our work. Examples include the NEH's "Public Scholar" grant program and the NHPRC's Literacy and Engagement grant. Kathleen Williams summed it up when she said that the nature of the relationship between historical records and users has changed, and that it's time to think in new ways to engage the public.

So I would say that the theme of this conference for me was the importance of RELATIONSHIPS, including creating relationships using linked data, collecting metadata and context in order to preserve our intangible cultural heritage, and especially the relationships among all the people who came together at CNI fall 2014 to share what they know with each other.

Format migration

When we preserve digitized content in our repository, we make a commitment not just to the content as it exists today, but also to migrating that content to new formats as standards and technologies change. We have two collections from Tamiment-Wagner, the Poster and Broadside Collection and the United Automobile Workers of America, District 65 Photographs. These images came to us several years ago in a proprietary Kodak format, and we preserved them to the technical specifications recommended at the time. Today the standard has changed, so we will be migrating to a new file format. We will also be updating the colorspace for these images, as our standards have changed in that regard as well.
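A migration like this is largely bookkeeping: for every legacy master, record a fixity checksum, derive the new target filename, convert, and verify. The sketch below shows only that bookkeeping layer under invented names (the actual NYU pipeline, the Kodak source format, and the target specification are not shown; `.pcd` and `.tif` here are hypothetical extensions for illustration):

```python
import hashlib
from pathlib import Path

def plan_migration(sources, target_ext=".tif"):
    """Map each legacy master file to its migration target filename."""
    return {src: str(Path(src).with_suffix(target_ext)) for src in sources}

def fixity(data: bytes) -> str:
    """Checksum recorded before and after migration to detect corruption."""
    return hashlib.sha256(data).hexdigest()

# Hypothetical masters; real paths would come from the repository inventory.
masters = ["posters/box1_0001.pcd", "uaw/d65_0042.pcd"]
print(plan_migration(masters))
```

The actual pixel conversion and colorspace transform would be handled by an imaging tool; the value of a plan-and-checksum layer like this is that every source/target pair can be verified after the fact.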