In the summer of 2020, Digital Scholarship Services was approached by NYU professor Jacqueline Bishop about finding a new home for Calabash: A Journal of Caribbean Arts and Letters. Multilingual and focused on centering unheard voices, Calabash was a pioneering journal showcasing poetry, literature, and visual arts from across the Caribbean. The journal, which Dr. Bishop edited from 2000 to 2008, had since ceased publishing, and the NYU server that had been hosting the site was due to be retired.
A team consisting of Zach Coble (Head, Digital Scholarship Services), Jonathan Greenberg (Digital Scholarly Publishing Specialist), Marii Nyrop (Digital Humanities Technology Specialist), Kate Pechekhonova (Senior DevOps Engineer), Alexandra Provo (Metadata Librarian for Arts & Cultural Heritage Resources), Nick Wolf (Interim Co-Head of Data Services), and Deb Verhoff (Digital Collections Manager) came together to migrate the article PDFs and metadata. After discussing options, checking with our colleagues about the decision to collect this material for the library, and consulting with the new Digital Collecting and Preservation Planning working groups, the team decided to move the content to our Faculty Digital Archive. That solution would allow the Libraries to preserve the journal content while providing access and discovery at the journal, issue, and article level, using the Libraries’ existing infrastructure.
While it is not unusual to need to migrate content when systems become obsolete, this request required us to adapt existing workflows and develop some new ones. To kick off the work, we created a metadata application profile outlining the fields we wanted to include in the records. Because no metadata records existed, Alex modified a web scraping script from a recent NYU Libraries Library Carpentry workshop to extract article metadata from the Calabash website and populate some of those fields in a more automated way. The metadata was further prepared using OpenRefine, open source software for data transformation and cleanup (and Alex’s favorite tool). Meanwhile, Marii and Zach pulled the PDFs from the Calabash site, removed cover pages, and stored the files in a GitHub repository. With Nick’s help, we used some of the scraped article metadata to register a DOI for each article through Crossref. Finally, we inserted the DOIs into our metadata spreadsheets and created file directories for each issue so that Kate could use SAF builder by Peter Dietz to prepare files and metadata for bulk ingest into our instance of DSpace, NYU’s Faculty Digital Archive.
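For readers unfamiliar with what a DSpace bulk ingest expects, here is a minimal sketch of one item in DSpace's Simple Archive Format: a directory holding a `dublin_core.xml` file, a `contents` file listing the bitstreams, and the PDF itself. This is not the SAF builder tool or the code used in this project. The function name `write_saf_item`, the spreadsheet column names, the field mapping, and the `||` separator for multiple authors are all assumptions made for illustration.

```python
import shutil
from pathlib import Path
from xml.etree import ElementTree as ET

# Hypothetical spreadsheet columns mapped to (Dublin Core element, qualifier).
# A real project would follow its own metadata application profile.
FIELD_MAP = {
    "title": ("title", "none"),
    "authors": ("contributor", "author"),
    "issued": ("date", "issued"),
    "doi_url": ("identifier", "uri"),
}


def write_saf_item(row, pdf_path, item_dir):
    """Write one Simple Archive Format item directory for a metadata row."""
    item_dir = Path(item_dir)
    item_dir.mkdir(parents=True, exist_ok=True)

    # Build dublin_core.xml: one <dcvalue> per non-empty value.
    root = ET.Element("dublin_core")
    for column, (element, qualifier) in FIELD_MAP.items():
        # Assumed convention: multiple values in one cell are joined by "||".
        for value in row.get(column, "").split("||"):
            value = value.strip()
            if value:
                dcvalue = ET.SubElement(
                    root, "dcvalue", element=element, qualifier=qualifier
                )
                dcvalue.text = value
    ET.ElementTree(root).write(
        str(item_dir / "dublin_core.xml"), encoding="utf-8", xml_declaration=True
    )

    # Copy the PDF next to the metadata and list it in the contents file.
    pdf_path = Path(pdf_path)
    shutil.copy(pdf_path, item_dir / pdf_path.name)
    (item_dir / "contents").write_text(pdf_path.name + "\n", encoding="utf-8")
    return item_dir
```

Calling `write_saf_item` once per spreadsheet row, with one parent directory per issue, would produce a set of item directories shaped like those a DSpace batch import reads.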
The migration work was multifaceted, iterative, and cross-departmental. To collaborate, we relied on GitHub and Google Sheets. Along the way, we encountered some challenges, such as data that wouldn’t scrape, a need to reorder names, and decisions about which Faculty Digital Archive (FDA) import method to use. These challenges pushed us to learn more about web scraping, OpenRefine, and the DSpace import process. The scripted and semi-scripted methods we used got us part of the way there, but not all the way. To reach the finish line, in 2021 and 2022 we had the help of two outstanding students from the NYU/LIU Palmer Dual Degree program, Vita Kurland and Katherine Santana, who enhanced the descriptive metadata to improve discoverability so that the journal’s rich content can now reach a wider audience.