About

This site and the Early American Cookbooks collection on HathiTrust were created by Gioia Stevens, Special Collections Cataloger, New York University Libraries. The project was completed in January 2017 as a capstone project for the Digital Humanities track of the Master of Liberal Studies program at the CUNY Graduate Center.

For more information on the project, see "New Metadata Recipes for Old Cookbooks: Creating and Analyzing a Digital Collection Using the HathiTrust Research Center Portal" in The Code4Lib Journal, issue 37 (July 2017).

Project Overview

The idea for this project came to me after cataloging hundreds of early print cookbooks for the Marion Nestle Food Studies Collection at the Fales Library & Special Collections at New York University Libraries. These books are an incredibly important resource for food historians or anyone with a passion for old cookbooks, but there are very few full-text resources available online.

The purposes of the project are to create a freely available, searchable online collection of early American cookbooks, to offer an overview of the scope and contents of the collection, and to use digital humanities tools to explore trends and patterns in the metadata and the full text of the collection.

Workflow and Tools

Building the Collection:

  1. Create public permanent collection in HathiTrust from full-text search results
  2. Evaluate contents of collection
  3. Check and de-dupe individual records (multiple printings, editions, and scans)
  4. Download collection metadata and upload volume ID numbers to create workset in HathiTrust Research Center Portal (HTRC)
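As a rough illustration of step 4, the volume ID numbers can be pulled out of the downloaded collection metadata with a few lines of Python. This is only a sketch, not the process used for the project: it assumes the metadata file is tab-separated with a header row and that the ID column is named htid, and the function name volume_ids is made up for the example. It removes exact repeats of the same ID only; deciding between multiple printings, editions, and scans (step 3) still requires checking the records by hand.

```python
import csv
import io


def volume_ids(metadata_text, id_column="htid"):
    """Return the unique volume IDs from tab-separated collection metadata.

    Assumption: the file has a header row and the ID column is called
    "htid". Pass a different id_column if the download uses another name.
    """
    reader = csv.DictReader(io.StringIO(metadata_text), delimiter="\t")
    seen = set()
    ids = []
    for row in reader:
        vid = (row.get(id_column) or "").strip()
        # Keep the first occurrence of each ID and preserve file order.
        if vid and vid not in seen:
            seen.add(vid)
            ids.append(vid)
    return ids


# Placeholder rows for illustration only, not records from the collection.
SAMPLE_METADATA = (
    "htid\ttitle\n"
    "example.001\tFirst example title\n"
    "example.002\tSecond example title\n"
    "example.001\tFirst example title\n"
)
```

Calling volume_ids(SAMPLE_METADATA) gives a list with one entry per distinct ID, which can then be written out one per line in whatever form the workset upload expects.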

Analyzing Collection Metadata:

  1. Download catalog records in MARCXML format using Marc Downloader tool in HTRC
  2. Convert and join MARCXML records and export selected fields as CSV file using MarcEdit
  3. Clean up metadata and create subgroups of titles using OpenRefine
  4. Upload CSV files to Tableau to explore data and produce visualizations
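The project used MarcEdit for step 2, but the same kind of export can be sketched in Python for readers who want to see what the conversion involves. The example below is an assumption-laden illustration rather than a substitute for MarcEdit: the choice of fields (245 $a for title, 100 $a for author, 260 $c for date) is mine, the function names are hypothetical, and only the first matching subfield of each record is kept.

```python
import csv
import io
import xml.etree.ElementTree as ET

MARC_URI = "http://www.loc.gov/MARC21/slim"
MARC_NS = {"marc": MARC_URI}

# Column name -> (MARC tag, subfield code). An illustrative selection only.
FIELDS = {"title": ("245", "a"), "author": ("100", "a"), "date": ("260", "c")}


def first_subfield(record, tag, code):
    """Return the first matching subfield's text, or "" if it is absent."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    node = record.find(path, MARC_NS)
    return (node.text or "").strip() if node is not None else ""


def marcxml_to_rows(xml_text):
    """Turn a MARCXML collection (or a single record) into a list of dicts."""
    root = ET.fromstring(xml_text)
    # iter() includes the root itself, so a lone <record> also works.
    return [
        {name: first_subfield(rec, tag, code)
         for name, (tag, code) in FIELDS.items()}
        for rec in root.iter(f"{{{MARC_URI}}}record")
    ]


def rows_to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(FIELDS))
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Placeholder record for illustration only, not taken from the collection.
SAMPLE_MARCXML = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <datafield tag="100" ind1="1" ind2=" ">
      <subfield code="a">Example, Author.</subfield>
    </datafield>
    <datafield tag="245" ind1="1" ind2="0">
      <subfield code="a">Example cookbook</subfield>
    </datafield>
    <datafield tag="260" ind1=" " ind2=" ">
      <subfield code="c">1800.</subfield>
    </datafield>
  </record>
</collection>"""
```

rows_to_csv(marcxml_to_rows(SAMPLE_MARCXML)) produces a small CSV that could be opened in OpenRefine for cleanup. Trailing punctuation such as the period in "1800." is left in place here, since that kind of normalization is what the OpenRefine step is for.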

Analyzing Full Text:

  1. Search selected terms in collection using public-facing keyword search and download metadata
  2. Upload new worksets in HTRC for keyword search sets as well as subgroups of titles created using OpenRefine
  3. Run HTRC algorithms on worksets. This project used the Meandre Topic Modeling algorithm for topic modeling and the Meandre Dunning Log-likelihood to Tagcloud algorithm to compare and contrast two worksets.
  4. Export results from HTRC.
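
To give a sense of what a Dunning log-likelihood comparison between two worksets measures, here is a minimal Python sketch. It is not the HTRC Meandre implementation, whose internals are not described here. It uses the commonly cited two-corpus form of the statistic, G2 = 2 * (a * ln(a / E1) + b * ln(b / E2)), where a and b are a word's counts in each workset and E1 and E2 are the counts expected if the word were equally common in both. The function names and the signed-score convention are assumptions made for the example.

```python
import math
from collections import Counter


def log_likelihood(a, b, total_a, total_b):
    """Dunning log-likelihood (G2) for one word across two corpora.

    a, b: the word's count in corpus A and corpus B.
    total_a, total_b: total token counts of the two corpora.
    """
    if total_a <= 0 or total_b <= 0:
        raise ValueError("both corpora must contain at least one token")
    expected_a = total_a * (a + b) / (total_a + total_b)
    expected_b = total_b * (a + b) / (total_a + total_b)
    g2 = 0.0
    # A zero count contributes nothing (the limit of x * ln(x) as x -> 0).
    if a > 0:
        g2 += a * math.log(a / expected_a)
    if b > 0:
        g2 += b * math.log(b / expected_b)
    return 2 * g2


def compare_worksets(tokens_a, tokens_b):
    """Score every word; positive means relatively more frequent in A,
    negative means relatively more frequent in B (a convention chosen
    for this sketch)."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    scores = {}
    for word in set(counts_a) | set(counts_b):
        a, b = counts_a[word], counts_b[word]
        g2 = log_likelihood(a, b, total_a, total_b)
        over_in_a = a / total_a > b / total_b
        scores[word] = g2 if over_in_a else -g2
    return scores
```

For example, compare_worksets(["butter", "butter", "flour"], ["flour", "flour", "sugar"]) gives "butter" a positive score and "sugar" a negative one. Words with the largest absolute scores are the ones that most distinguish the two worksets, which is the kind of contrast a tag cloud of the results is meant to show.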