Gioia Stevens – Early American Cookbooks

“Vegetarian” timeline

The number of books in the Early American Cookbooks collection which contain the word “vegetarian” in the text increases slowly in the late 19th century and then grows substantially in the years from 1900 to 1920. The vegetarian movement in the United States grew over the same timespan and publishers began producing cookbooks devoted to a purely vegetarian diet. The timeline also reflects the increased number of references to a vegetarian diet not only in books such as How to Cook Vegetables (1891) by the bestselling author Sarah Tyson Rorer, but also in general cookbooks such as Fannie Farmer’s A New Book of Cookery (1917).

Use of word “vegetarian” over time

Vegetarian over-represented terms

Early vegetarian cookbooks featured recipes containing nuts and new forms of protein foods based on nuts. These ingredients are prominent in this word cloud showing over-represented terms in vegetarian cookbooks when compared to the full set of the Early American Cookbooks collection. Unusual words such as protose, nuttolene, trumese, and terralac (all nut-based mixtures to be used instead of meat) appear along with new terms for grain products (granose, granola). Many of these products were invented by John Harvey Kellogg, an early proponent of vegetarianism and the inventor of Corn Flakes cereal.

This visualization was created by comparing two sets of texts, vegetarian cookbooks and the full Early American Cookbooks collection, using the Meandre Dunning Log-likelihood to Tagcloud algorithm in the HathiTrust Research Center Portal.

Frugal cookbooks over and under-represented terms

A text analysis comparison between cookbooks containing the word “frugal” and the full Early American Cookbooks set shows some interesting differences. The over-represented terms in the “frugal” books are everyday words such as them, that, good, not, should, and your, which have no obvious connection to cooking. The under-represented terms include kitchen measurement terms such as teaspoons and names of ingredients, notably some more luxurious items such as chicken, chocolate, cake, butter, and pineapple. While it is not possible to form definitive conclusions, the frugal books appear to emphasize ordinary language (perhaps directed toward expenditure and lifestyle choices with a healthy dose of “should” and “not”?) and do not offer a wealth of different ingredient names.

Frugality over-represented terms (Meandre Dunning Log Likelihood to Tagcloud Algorithm)

Frugal under-represented terms (Meandre Dunning Log Likelihood to Tagcloud Algorithm)

This visualization was created by comparing two sets of texts, cookbooks containing the word “frugal” and the full Early American Cookbooks set, using the Meandre Dunning Log-likelihood to Tagcloud algorithm in the HathiTrust Research Center Portal.

“Frugal” timeline

The number of books in the Early American Cookbooks collection which contain the word “frugal” in the text increases over the years 1800 to 1920. This increase may simply be a reflection of the overall increase in the number of books published over time in the collection (see books by year chart). The peaks in the numbers at the end of the 19th century may reflect an increase in the number of books directed at young, inexperienced housekeepers with a small budget such as The Cottage Kitchen: A Collection of Practical and Inexpensive Receipts or Motherly Talks: The Home, How to Make and Keep It.

Use of word “frugal” over time
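The caveat about overall collection growth can be checked by comparing the raw count of “frugal” books with their share of all books in the same period. The following is a minimal sketch, not the method used to build the chart: the function name share_by_decade and the (year, text) input format are assumptions made for illustration.

```python
import re
from collections import defaultdict

def share_by_decade(books, word):
    """Count books containing `word` per decade, alongside the decade total.

    `books` is an iterable of (year, text) pairs (a hypothetical input format).
    Returns {decade: (matching_books, all_books, share)}.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for year, text in books:
        decade = year // 10 * 10
        totals[decade] += 1
        # Whole-word, case-insensitive match
        if word.lower() in re.findall(r"[a-z]+", text.lower()):
            hits[decade] += 1
    return {d: (hits[d], totals[d], hits[d] / totals[d]) for d in sorted(totals)}
```

If the raw count rises from one decade to the next while the share stays flat, the rise reflects growth in the collection rather than growing interest in frugality.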

Comparing two sets of texts

One very useful text analysis tool in the HathiTrust Research Center Portal is the Meandre Dunning Log-likelihood to Tagcloud algorithm. The algorithm compares two worksets by identifying the words that are more and less common in one workset than in the other. This tool has helped show how different subsets of the Early American Cookbooks collection differ from the collection as a whole. Tag clouds of over- and under-represented terms are available for government publications, Fannie Farmer’s cookbooks, and the different census regions of the United States (Northeastern, Southern, Midwestern, and Western).

How it works:
• calculates Dunning Log-likelihood based on two worksets provided as inputs: an “analysis workset” and a “reference workset”;
• loads each page of each workset, removes the first and last line of each page, joins hyphenated words that occur at the end of the line;
• performs part of speech tagging (selecting only NN|NNS|JJ.*|RB.*|PRP.*|RP|VB.*|IN);
• lowercases the tokens remaining;
• counts the tokens remaining for all volumes for each collection;
• compares counts from each collection using the Dunning Log-likelihood statistic; the “overused” tokens in the analysis collection relative to the reference collection (200 tokens by default) are displayed as a tag cloud and made available as a CSV file; the “underused” tokens in the analysis collection relative to the reference collection (also 200 tokens by default) are likewise displayed as a tag cloud and made available as a CSV file.
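The steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Meandre implementation: the function names are hypothetical, the part-of-speech filter is omitted (it needs an external tagger), tokens are simplified to runs of the letters a to z, and the statistic is the standard four-cell G² form of Dunning's log-likelihood. Both worksets are assumed to be non-empty.

```python
import math
import re
from collections import Counter

def clean_page(page):
    """Drop the first and last line of a page and rejoin end-of-line hyphenation."""
    text = "\n".join(page.splitlines()[1:-1])
    return re.sub(r"(\w)-\n(\w)", r"\1\2", text)  # "vege-\ntarian" -> "vegetarian"

def count_tokens(pages):
    """Lowercase and count tokens over all pages (POS filtering omitted here)."""
    counts = Counter()
    for page in pages:
        counts.update(re.findall(r"[a-z]+", clean_page(page).lower()))
    return counts

def _term(observed, expected):
    # A cell with zero observations contributes nothing to G2.
    return observed * math.log(observed / expected) if observed > 0 else 0.0

def dunning_g2(a, b, total_a, total_b):
    """G2 for a 2x2 table: this token vs. all others, analysis vs. reference."""
    n = total_a + total_b
    with_token = a + b
    without_token = n - with_token
    cells = [
        (a, total_a * with_token / n),
        (b, total_b * with_token / n),
        (total_a - a, total_a * without_token / n),
        (total_b - b, total_b * without_token / n),
    ]
    return 2.0 * sum(_term(obs, exp) for obs, exp in cells)

def compare(analysis, reference, top=200):
    """Return (overused, underused) lists of (token, G2), highest scores first."""
    total_a, total_b = sum(analysis.values()), sum(reference.values())
    over, under = [], []
    for token in set(analysis) | set(reference):
        a, b = analysis[token], reference[token]
        score = dunning_g2(a, b, total_a, total_b)
        # G2 is unsigned, so direction comes from the relative frequencies.
        target = over if a / total_a >= b / total_b else under
        target.append((token, score))
    by_score = lambda item: (-item[1], item[0])
    return sorted(over, key=by_score)[:top], sorted(under, key=by_score)[:top]
```

A call such as compare(count_tokens(analysis_pages), count_tokens(reference_pages)) would give the two ranked lists that the tag clouds visualize, with word size in the cloud corresponding to the score.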

What is topic modeling?

Topic modeling is a useful way to look for trends and patterns in the collection that may add to our understanding of early cookbooks.

As Megan R. Brett explains in Topic Modeling: A Basic Introduction, topic modeling is a form of text mining, a way of identifying patterns in a corpus. You take your corpus and run it through a tool which groups words across the corpus into ‘topics’ (Brett 2012). Miriam Posner has described topic modeling as “a method for finding and tracing clusters of words (called ‘topics’ in shorthand) in large bodies of texts” (Posner 2012).

Topic modeling is an automated text mining technique that offers a “suite of algorithms to discover hidden thematic structure in large collections of texts” (Blei 2013, 7). Topic modeling is a methodology developed in computer science, machine learning, and natural language processing that has recently become very popular in the digital humanities (Meeks and Weingart 2013). New digital tools such as MALLET (McCallum 2002) generate comprehensive lists of subjects through statistical analysis of word occurrences in a corpus. The content of the documents, not a human indexer, determines the topics (Jockers 2013, 124). Unlike traditional classification systems with a pre-existing taxonomy of terms, topic modeling creates topics by clustering words that frequently occur together in a text. The resulting topical clusters can be readily interpreted as subject facets by human readers, allowing them to browse the topics of a collection quickly and find relevant material using topically expanded keyword searches (Mimno and McCallum 2007).

The topic models for Early American Cookbooks were generated using the Meandre Topic Modeling algorithm created by Loretta Auvil and available via the HathiTrust Research Center Portal. The algorithm serves to “identify ‘topics’ in a workset based on words that have a high probability of occurring close together in the text. Topics are models trained on co-occurring text using Latent Dirichlet Allocation (LDA), where each topic is treated as a generative model and volumes are assigned a probability of how likely each topic is to have generated that text. The most likely words for a topic are displayed as a word cloud.” Please see Topic Models for Early American Cookbooks and Topic Models for Government Publications for the word cloud results and the About page for more details on the workflow.
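To make the LDA description more concrete, here is a toy collapsed Gibbs sampler in plain Python. It is a sketch for illustration only and not the Meandre algorithm: the function name lda_gibbs, the default hyperparameters alpha and beta, and the iteration count are all assumptions, and a real analysis would use an established tool such as MALLET.

```python
import random
from collections import Counter

def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0, top_n=5):
    """Toy LDA via collapsed Gibbs sampling.

    `docs` is a list of token lists. Returns (top_words, theta), where
    top_words[k] lists the most frequent words assigned to topic k and
    theta[d][k] is the estimated proportion of topic k in document d.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]          # tokens in doc d assigned to topic k
    topic_word = [Counter() for _ in range(num_topics)]   # times word w is assigned to topic k
    topic_total = [0] * num_topics                        # tokens assigned to topic k overall
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z = [rng.randrange(num_topics) for _ in doc]
        assignments.append(z)
        for w, k in zip(doc, z):
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample: how much doc d likes topic t, times how much topic t likes word w.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta) / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    top_words = [
        [w for w, c in topic_word[k].most_common(top_n) if c > 0]
        for k in range(num_topics)
    ]
    theta = [
        [(doc_topic[d][k] + alpha) / (len(doc) + num_topics * alpha) for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    return top_words, theta
```

The top_words lists correspond to what a word cloud displays for each topic, and each row of theta plays the role of the per-volume topic probabilities mentioned in the quoted description. As in the project itself, naming a topic remains an interpretive step for the human reader.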

WORKS CITED

Blei, David M. 2013. “Topic Modeling and Digital Humanities.” Journal of Digital Humanities.

Brett, Megan R. 2012. “Topic Modeling: A Basic Introduction.” Journal of Digital Humanities.

Jockers, Matthew Lee. 2013. Macroanalysis: Digital Methods and Literary History. Urbana: University of Illinois Press.

McCallum, Andrew Kachites. 2002. “MALLET: A Machine Learning for Language Toolkit.”

Meeks, Elijah, and Scott Weingart. 2013. “The Digital Humanities Contribution to Topic Modeling.” Journal of Digital Humanities.

Mimno, David, and Andrew McCallum. 2007. “Organizing the OCA: Learning Faceted Subjects from a Library of Digital Books.” In Proceedings of the 7th ACM/IEEE-CS Joint Conference on Digital Libraries, 376–385. New York: ACM.

Posner, Miriam. 2012. “Very Basic Strategies for Interpreting Results from the Topic Modeling Tool.” Miriam Posner’s Blog, October 29, 2012.

Topic modeling for government publications

Here are the topic modeling results for the United States government publications in the collection. The ten word clouds in the chart below show different topics or clusters of words that recur across all of the texts. The names of the topics were not generated by the algorithm but rather added as a way to label and interpret the clusters. While it is impossible to draw definitive analytical conclusions, the topics do provide an interesting snapshot of the subject matter.

The government publications include primarily military cooking manuals with some additional USDA recipe booklets focusing on nutrition and use of substitute ingredients during wartime rationing. The subject matter of these publications is quite different from the rest of the cookbooks in the collection and this difference is demonstrated in the topic word clouds. The topics represent a more clearly defined, scientific approach to cooking with clear groups of ingredients and measurements. Topic 2 (dairy), topic 5 (bread), topic 6 (stew), topic 7 (meat), and topic 10 (equipment) are all quite straightforward descriptions of basic kitchen items. Topic 4 (meat analysis) and topic 9 (bread analysis) emphasize weights, measures, and nutrition terms such as results, protein and digestibility. Topic 1 (mess hall) includes words such as men, mess, recipe, meal, rations and serves to describe the daily workings of a military kitchen. Topic 8 (labor and costs) addresses the economic aspects of running a large food service operation.

Topic 1: Mess hall	Topic 2: Dairy
Topic 3: Nutrition	Topic 4: Meat analysis
Topic 5: Bread	Topic 6: Stew
Topic 7: Meat	Topic 8: Labor and costs
Topic 9: Bread analysis	Topic 10: Equipment

Clippings

Cookbook owners often pasted newspaper clippings to the inside covers and blank pages of their books. These clippings frequently include recipes or practical tips for food preparation and hygiene. This one for ptomaine poison was pasted inside an 1882 edition of How to Feed the Sick by Charles Gatchell, M.D. along with several other clippings: Acetylene Cooking, Beer Diet for Anthrax, Cleansing the Oyster, Sickroom Hints (1916), Hygienic Scorecard (1916), and Food Hints (1917).

Soy flour as meat substitute during World War I

Government pamphlets were published during World War I to help housewives adapt to wartime food rationing. Many recipes use substitutions to make “mock” versions of traditional dishes. These recipes use soybean flour as a substitute for meat, a technique also used in early vegetarian cookbooks. Modern vegetarian cooking continues this practice with recipes using soy products such as tofu, tempeh, and textured vegetable protein.

Recipe from Use soy-bean flour to save wheat, meat, and fat (Washington, DC: U.S. Dept. of Agriculture, Office of the Secretary, 1918).

Whale a la mode

While “a la mode” commonly means topped with ice cream in modern recipes, early cookbooks often used the term to describe a method of cooking beef by larding it and braising it with vegetables and herbs. This recipe for “whale a la mode” comes from a United States Bureau of Fisheries report published in 1918. The report offers several recipes using whale and porpoise meat instead of beef and suggests that food production may become the future of the declining whale fishery.

Recipe from Whales and porpoises as food by Lewis Radcliffe (Washington, DC: Government Printing Office, 1918)