500 Themes

In Macroanalysis: Digital Methods and Literary History (UIUC Press, 2013), I explain how I extracted 500 themes from a corpus of 19th-century novels using Latent Dirichlet Allocation. On this page you can select any one of the 500 themes to see a cloud visualization of the key words and then a series of plots showing the prevalence of the theme in relation to time, author-gender, and author-nationality.

Assigning labels to topic clusters is a subjective process. The labels I have assigned here are most often derived from the topic headwords. Some may find the labels unhelpful or even controversial. My goal was not to label the topics in a way that would satisfy all tastes or interpretations, but to create a workable title by which I could easily refer to a given topic. By default the modeling process assigns each topic a number (topic 1, topic 2, and so on). While referring to topics by number is certainly less controversial, it’s not a very useful way to talk about them. These labels should be read as “general terms of convenience” and not as definitive statements on the ultimate meaning of the word cluster.
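
For readers who want to see how a headword-based label can be formed, here is a minimal sketch in Python. It is not the procedure used for this page, where the labels were assigned by hand. The function name `default_label`, the `n_headwords` parameter, and the example weights are all hypothetical and invented for illustration. The sketch assumes each topic is available as a mapping from words to weights.

```python
def default_label(topic_id, word_weights, n_headwords=3):
    """Build a placeholder label from a topic's highest-weighted words."""
    # Sort by weight, highest first; break ties alphabetically so the
    # result is stable from run to run.
    ranked = sorted(word_weights.items(), key=lambda kv: (-kv[1], kv[0]))
    headwords = [word for word, _ in ranked[:n_headwords]]
    return f"topic {topic_id}: " + ", ".join(headwords)


# Hypothetical word weights, made up for this example only.
example_topic = {"ship": 0.09, "sea": 0.12, "captain": 0.07, "deck": 0.03}
print(default_label(1, example_topic))
```

A label produced this way is only a starting point. A human reader still has to decide whether the headwords add up to a theme worth naming.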