A few weeks ago Ben Schmidt posted a provocative blog entry titled “Typical TV episodes: visualizing topics in screen time.” It’s worth a careful read. . .

Ben began by topic modeling the closed captioning data from a number of popular TV series and then visualizing the ten most common topics over the time span of each episode. In other words, the x-axis is time, and the y-axis is a measure of topical presence. The end result is something that begins to look a bit like what we could call plot.
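To make the x-axis/y-axis idea concrete, here is a minimal sketch in Python. It is not Ben's code: it assumes each word in a text has already been assigned a single topic label by some topic model, and the function name `topic_presence_over_time` and the `n_bins` parameter are hypothetical names chosen for illustration. The text is cut into equal slices of narrative time, and each topic's share of the words in each slice is reported.

```python
from collections import Counter


def topic_presence_over_time(topic_assignments, n_bins=10):
    """Return {topic: [share of words in bin 0, ..., share in the last bin]}.

    topic_assignments is a list with one topic label per word, in reading
    order (an assumed input format, not taken from the original posts).
    """
    n = len(topic_assignments)
    if n == 0:
        raise ValueError("need at least one word")
    bins = [Counter() for _ in range(n_bins)]
    for position, topic in enumerate(topic_assignments):
        # Map the word's position onto one of n_bins slices of narrative time.
        b = position * n_bins // n
        bins[b][topic] += 1
    topics = sorted(set(topic_assignments))
    return {
        # An empty slice (possible when the text is shorter than n_bins) gets 0.0.
        t: [c[t] / sum(c.values()) if c else 0.0 for c in bins]
        for t in topics
    }


# Toy example: "school" words up front, "punishment" words at the end.
toy = ["school"] * 4 + ["punishment"] * 4
presence = topic_presence_over_time(toy, n_bins=2)
```

Plotting each list against its bin index gives the kind of line described above: time on the x-axis, topical presence on the y-axis.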

Ben followed this post with an even more provocative one on 12/16/14, “Fundamental plot arcs, seen through multidimensional analysis of thousands of TV and movie scripts.” This post led a number of us (Underwood, Mimno, Cherny, etc.) to question what the approach might reveal if applied to novels . . .

In my own recent work, I have been attempting to model plot movement in narrative fiction by analyzing the rise and fall of emotional valence across narrative time. It has been clear to me, however, that my method is somewhat impoverished by a lack of context for the emotions I am measuring; Ben’s topic-based approach to plot structure might be just the context I’m missing, and some correlation analysis might be just the right recipe . . . as usual, Ben has given us a lot to think about (i.e., Happy Holidays!).

After following the discussion on Twitter and on Ben’s blog, David Mimno wrote to me about whipping up some of these topical plot lines based on the 500-topic model that I had built for Macroanalysis. Needless to say, I thought this was a great idea. (David and I had previously revisited my topical data for an article in Poetics.) Within a few hours, David had run the entire collection of 500 topics and produced 500 graphs showing the general behavior of each topic across all of the 3,500 texts in my corpus. You will find the output of David’s work here: http://mimno.infosci.cornell.edu/novels/plot.html
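One simple way to get a single curve per topic out of thousands of novels of different lengths is to put every novel on the same normalized time scale and average bin by bin. The sketch below illustrates that idea only; it is an assumption about how such a graph could be built, not a description of David's actual procedure. The function name `mean_trajectory` is hypothetical, and each novel is assumed to have already been reduced to a list of per-bin shares for the topic in question.

```python
def mean_trajectory(trajectories):
    """Average equal-length per-text trajectories, bin by bin.

    trajectories is a list of lists; each inner list holds one text's
    share of a given topic in each slice of narrative time (assumed input).
    """
    if not trajectories:
        raise ValueError("need at least one trajectory")
    n_bins = len(trajectories[0])
    if any(len(t) != n_bins for t in trajectories):
        raise ValueError("all trajectories must have the same number of bins")
    return [
        sum(t[i] for t in trajectories) / len(trajectories)
        for i in range(n_bins)
    ]


# Toy example: two "novels", each split into two slices of narrative time.
corpus_curve = mean_trajectory([[0.5, 0.25], [0.25, 0.25]])
```

A plain mean gives every novel equal weight regardless of length; weighting by word count would be a different, equally defensible choice.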

In David’s short introductory paragraph, he calls our attention to two specific topic graphs, one for the topic labeled “school” and another labeled “punishment.” You can find my graphs for these two topics here (school) and here (punishment). In referencing these two plots, David points to one topic (school) that appears prominently at the beginnings of novels in this corpus (think Bildungsroman, perhaps?) and another topic (punishment) that tends to be prominent at the ends of novels (think Newgate novels or Oliver Twist, perhaps?).

Like the data from Ben, this data David has mined from my 19th-century novels topic model is incredibly rich and demands deeper inspection. I’ve only begun to digest it in bits, but I do observe that a lot of topics carrying negative valence seem to rise over the course of narrative time. This makes intuitive sense if we believe that the central conflict of a novel must grow more intense as the novel progresses. The exciting thing to do next is to move from the macro to the micro scale and look at the individual novels within this larger context. Perhaps we’ll be able to identify archetypal patterns and then observe which novels stick to the archetypes and which digress. . . what a feast!

Luckily we have a whole new year to indulge!