Topic modeling for government publications

Here are the topic modeling results for the United States government publications in the collection. The ten word clouds in the chart below show different topics, or clusters of words, that recur across all of the texts. The names of the topics were not generated by the algorithm; they were added afterward as a way to label and interpret the clusters. While it is impossible to draw definitive analytical conclusions, the topics do provide an interesting snapshot of the subject matter.

The government publications include primarily military cooking manuals, with some additional USDA recipe booklets focusing on nutrition and the use of substitute ingredients during wartime rationing. The subject matter of these publications is quite different from the rest of the cookbooks in the collection, and this difference is demonstrated in the topic word clouds. The topics represent a more clearly defined, scientific approach to cooking, with clear groups of ingredients and measurements. Topic 2 (dairy), topic 5 (bread), topic 6 (stew), topic 7 (meat), and topic 10 (equipment) are all quite straightforward descriptions of basic kitchen items. Topic 4 (meat analysis) and topic 9 (bread analysis) emphasize weights, measures, and nutrition terms such as results, protein, and digestibility. Topic 1 (mess hall) includes words such as men, mess, recipe, meal, and rations, and serves to describe the daily workings of a military kitchen. Topic 8 (labor and costs) addresses the economic aspects of running a large food service operation.

 

Topic 1: Mess hall
 

Topic 2: Dairy
 

Topic 3: Nutrition
 

Topic 4: Meat analysis
 

Topic 5: Bread
 

Topic 6: Stew
 

Topic 7: Meat
 

Topic 8: Labor and costs
 

Topic 9: Bread analysis
 

Topic 10: Equipment

Government publications over-represented terms

Government publications over-represented terms (Meandre Dunning Log Likelihood to Tagcloud Algorithm)

What do the words feces, urine, experiment, grams, and ration have to do with cookbooks? They are all over-represented terms in United States government publications on cooking. These publications include primarily military cooking manuals with some additional USDA recipe booklets focusing on nutrition and use of substitute ingredients during wartime rationing. The subject matter of these publications is quite different from the rest of the cookbooks in the collection and this difference is demonstrated in the word cloud above. The government publications take a much more scientific approach to cooking, focusing on experiments, nutrition and digestion, measurements, and rations per man. 

The word “feces” was a valuable clue in interpreting and correcting the data visualizations in this project. The word first appeared in a tag cloud of over-represented terms for books published in the Southern census region of the United States. It seemed hard to believe that cookbooks on Southern cuisine featured feces, so a re-examination of the dataset was in order. Washington, D.C. is part of the Southern census region, but it is also the place of publication for large numbers of government documents. Separating the books published by government agencies from the larger Southern set proved to be the answer to the problem. Without the government publications, the over-represented terms for the Southern set no longer contained feces, urine, or any of the other nutrition-related terms.

This visualization was created by comparing two sets of texts, government publications and the full Early American Cookbooks set, using the Meandre Dunning Log-likelihood to Tagcloud algorithm in the HathiTrust Research Center Portal.
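For readers curious about how such a comparison can work, the following is a minimal sketch of a Dunning log-likelihood calculation in Python. It is not the Meandre implementation, whose internals are not described here. It uses the commonly cited two-term form of the statistic, and the function names and the toy word lists are hypothetical, made up purely for illustration. A word counts as over-represented when its relative frequency in the analysis set exceeds its relative frequency in the reference set, and the score indicates how strongly the two frequencies differ.

```python
import math
from collections import Counter


def dunning_g2(a, b, c, d):
    """Two-term Dunning log-likelihood (G2) score for one word.

    a: count of the word in the analysis set
    b: count of the word in the reference set
    c: total number of tokens in the analysis set
    d: total number of tokens in the reference set
    """
    # Expected counts if the word were equally frequent in both sets.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    total = 0.0
    if a > 0:
        total += a * math.log(a / e1)
    if b > 0:
        total += b * math.log(b / e2)
    return 2 * total


def overrepresented_terms(analysis_tokens, reference_tokens, top_n=10):
    """Return (word, score) pairs for words over-represented in the analysis set."""
    analysis = Counter(analysis_tokens)
    reference = Counter(reference_tokens)
    c = sum(analysis.values())
    d = sum(reference.values())
    scored = []
    for word, a in analysis.items():
        b = reference.get(word, 0)
        # Keep only words that are relatively more frequent in the analysis set.
        if a / c > b / d:
            scored.append((word, dunning_g2(a, b, c, d)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy word lists, invented for illustration only (not project data).
government = "ration grams ration protein flour".split()
full_set = "flour butter sugar flour eggs butter".split()

for word, score in overrepresented_terms(government, full_set):
    print(word, round(score, 2))
```

In a tag cloud built from such scores, words with higher scores would be drawn larger. A real comparison would tokenize the full texts of both sets rather than short word lists.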