This past spring semester (2019), APH student Greg Ferguson interned at CUNY TV’s Library and Archives in New York City. Below you’ll find Greg’s blog post about his experience.

—————————————————————————————————————————————————————

For my internship, I worked on open source digital preservation software at CUNY TV’s Library and Archives. The Archives maintain the station’s digital repository holding all of its programming as well as the Himan Brown Radio collection, which contains content digitized from audio tapes and playscripts donated by the family of the noted radio drama producer Himan Brown.

To manage the intake and storage of all of this digital content, the Archives rely on their own internally developed open source software. The code is organized in modular microservices, which are short pieces of code designed to accomplish specific tasks. Individual microservices are strung together to build more complex processes that automate the ingest of various file formats, along with the creation of preservation metadata and derivative service and access copies.
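
To make the idea of chaining microservices concrete, here is a minimal Python sketch. It is not the Archives’ code: the function names, the dictionary "package" and the file-naming rule are all hypothetical, chosen only to show how small single-purpose steps can be strung together into a larger ingest process.

```python
from typing import Callable, Dict, List

# A "package" here is just a dictionary describing one file being ingested.
# This structure is an assumption made for illustration only.
Package = Dict[str, object]
Microservice = Callable[[Package], Package]


def identify_format(package: Package) -> Package:
    """Record the file format, guessed from the filename extension."""
    extension = str(package["filename"]).rsplit(".", 1)[-1].lower()
    return {**package, "format": extension}


def record_metadata(package: Package) -> Package:
    """Attach a small (hypothetical) preservation metadata record."""
    return {**package, "metadata": {"source": package["filename"]}}


def plan_access_copy(package: Package) -> Package:
    """Choose a name for the derivative access copy (hypothetical rule)."""
    stem = str(package["filename"]).rsplit(".", 1)[0]
    return {**package, "access_copy": stem + "_access.m4a"}


def run_pipeline(package: Package, steps: List[Microservice]) -> Package:
    """Run each microservice in order, passing the result along."""
    for step in steps:
        package = step(package)
    return package


# A larger process is just an ordered list of small steps.
INGEST_AUDIO = [identify_format, record_metadata, plan_access_copy]
result = run_pipeline({"filename": "episode01.wav"}, INGEST_AUDIO)
```

Because each step only takes a package and returns a package, steps can be reordered, swapped out or reused in another process without touching the rest.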

The Archives’ software was developed from the digital preservation application Archivematica, which offers a lightweight open source option for digital archiving in line with the OAIS (Open Archival Information System) reference model.

My objectives were to work on the microservices for audio and image files to make them more robust and less narrowly tailored to the Archives’ local workflows. To accomplish this, I used a suite of tools, including the text editor TextMate, the version control platform GitHub and Apple’s Terminal command line interface, to edit and test the station’s code. As I had no previous experience with coding, I relied on my supervisors for advice and instruction and ultimately succeeded in making several improvements to the code, including:

  • Replacing MP3 files with M4A files for audio podcasting (M4A files offer superior sound at lower bitrates/file sizes, making them ideal for podcasts)
  • Optimizing PDF settings to produce color access files smaller than the previous monochrome files, without impeding OCR (Optical Character Recognition)
  • Embedding descriptive PBCore metadata (title, creator, date, etc.) in access files for display in iTunes and Adobe Acrobat
  • Testing the audio and image microservices to ensure they will work on files at other organizations without CUNY-specific attributes such as CUNY filenames or metadata
  • Automatically producing quality control reports for newly digitized audio files to identify problems like dead air, phase issues or bad levels
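
As a rough illustration of the first and third improvements above, the following Python sketch builds a command that converts an audio file to M4A and embeds a few descriptive tags. It assumes the widely used ffmpeg command line tool; the function name, the default bitrate, the sample filenames and the mapping of "creator" to the "artist" tag are assumptions for the example, not the Archives’ actual settings.

```python
from typing import Dict, List


def build_m4a_command(
    source: str,
    target: str,
    tags: Dict[str, str],
    bitrate: str = "128k",  # assumed value, not the Archives' setting
) -> List[str]:
    """Build an ffmpeg command that makes an M4A access copy with tags."""
    command = [
        "ffmpeg",
        "-i", source,    # input file
        "-vn",           # ignore any video or cover-art stream
        "-c:a", "aac",   # encode audio as AAC, the usual codec in M4A
        "-b:a", bitrate, # target audio bitrate
    ]
    # Each descriptive field becomes one "-metadata key=value" pair.
    for key, value in tags.items():
        command += ["-metadata", f"{key}={value}"]
    command.append(target)  # output file goes last
    return command


# Hypothetical descriptive metadata for one recording.
example_tags = {"title": "Sample Episode", "artist": "Sample Creator", "date": "2019"}
command = build_m4a_command("episode01.wav", "episode01.m4a", example_tags)
```

The resulting list could be passed to `subprocess.run(command, check=True)` on a machine with ffmpeg installed. Keeping the command builder separate from the code that runs it makes the settings easy to test without touching any real files.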

As someone with no previous coding experience, I found the internship very challenging but also very rewarding. It was especially valuable after having taken the program’s Digital Archives course. The internship gave me the opportunity to see up close what we had learned about in Digital Archives, including the OAIS model, the DCC Curation Lifecycle model (especially the value of active preservation planning), the architecture of microservice software, and the open source ethos.

Code for embedding PBCore metadata into audio files (Photo Courtesy: Greg Ferguson)