500 Themes from a Corpus of 19th-Century Fiction

In Macroanalysis: Digital Methods and Literary History (UIUC Press, 2013), I explain how I extracted 500 themes from a corpus of 19th-century novels using Latent Dirichlet Allocation (LDA). On this page you can select any one of the 500 themes to see a word cloud of its key words, followed by a series of plots showing the theme's prevalence in relation to time, author gender, and author nationality.
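The prevalence plots depend on a simple aggregation step. Suppose each novel carries its LDA document-topic proportions (often written theta) along with metadata such as publication year, author gender, and author nationality. In that case, a theme's prevalence within a group can be summarized as the mean proportion of that theme across the group's novels. The sketch below is illustrative only. The record layout, the field names, the three-topic toy data, and the `prevalence_by` helper are all hypothetical. The unweighted mean is one plausible choice and is not necessarily the calculation used in the book.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: each novel's metadata plus its document-topic
# proportions (theta) from an LDA model. Only three topics are shown here
# for brevity; the model described above has 500.
novels = [
    {"year": 1820, "gender": "F", "nationality": "British", "theta": [0.10, 0.60, 0.30]},
    {"year": 1820, "gender": "M", "nationality": "American", "theta": [0.30, 0.20, 0.50]},
    {"year": 1850, "gender": "M", "nationality": "British", "theta": [0.50, 0.25, 0.25]},
    {"year": 1850, "gender": "F", "nationality": "Irish", "theta": [0.20, 0.40, 0.40]},
]


def prevalence_by(records, topic, field):
    """Return the mean proportion of `topic`, grouped by a metadata field.

    This is a hypothetical helper. It takes an unweighted mean over novels,
    so every novel counts equally regardless of its length.
    """
    groups = defaultdict(list)
    for record in records:
        groups[record[field]].append(record["theta"][topic])
    # Sort by group key so that years, for example, come out in order.
    return {key: mean(values) for key, values in sorted(groups.items())}


if __name__ == "__main__":
    for field in ("year", "gender", "nationality"):
        print(field, prevalence_by(novels, topic=1, field=field))
```

Calling `prevalence_by(novels, topic=1, field="gender")` returns one average per gender. Plotting those averages, or the per-year series for `field="year"`, produces the kind of prevalence chart shown for each theme.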

Assigning labels to topic clusters is a subjective process. Most of the labels I have assigned here are derived from the topic headwords. Some readers may find the labels unhelpful or even controversial. My goal was not to label the topics in a way that would satisfy all tastes or interpretations, but to create a workable title by which I could easily refer to a given topic. By default, the modeling process assigns each topic a number (e.g., topic 1, topic 2, etc.). Referring to topics by number is certainly less controversial, but it is not a very useful way to talk about them. These labels should be read as "general terms of convenience" and not as definitive statements on the ultimate meaning of the word cluster.
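The headwords used for labeling are the highest-weighted words in each topic. The sketch below shows one way to rank those words and attach a hand-assigned convenience label to a topic number. The vocabulary, the weights, the `headwords` helper, and the second label are hypothetical. "Animals and Beasts" is one of the labels shown on this page. As described above, the label itself is chosen by a person after reading the headwords; the code only ranks the words.

```python
# Hypothetical topic-word weights (for example, the number of word tokens
# assigned to each topic by an LDA sampler). A real model would have 500
# rows and a much larger vocabulary.
vocab = ["horse", "dog", "beast", "sea", "ship", "wave"]
topic_word = [
    [40, 25, 30, 1, 0, 2],
    [0, 1, 2, 50, 35, 20],
]

# Convenience labels assigned by hand after inspecting the headwords.
# The model itself only knows topics by number. The second label is a
# made-up example.
labels = {0: "Animals and Beasts", 1: "Sea and Ships (hypothetical)"}


def headwords(weights_by_topic, vocabulary, topic, n=3):
    """Return the n highest-weighted words for a topic number."""
    weights = weights_by_topic[topic]
    ranked = sorted(range(len(vocabulary)), key=lambda i: weights[i], reverse=True)
    return [vocabulary[i] for i in ranked[:n]]


if __name__ == "__main__":
    for topic in range(len(topic_word)):
        print(topic, labels[topic], headwords(topic_word, vocab, topic))
```

Keeping the labels in a separate mapping leaves the topic numbers as the stable identifiers. A label can therefore be revised without touching the model output.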

