While studying anthropology at the University of Chicago, Kurt Vonnegut proposed writing a master’s thesis on the shape of narratives. He argued that “the fundamental idea is that stories have shapes which can be drawn on graph paper, and that the shape of a given society’s stories is at least as interesting as the shape of its pots or spearheads.” The idea was rejected.

In 2011, Open Culture featured a video in which Vonnegut expanded on this idea and suggested that computers might someday be able to model the shape of stories, that is, the movement of the narratives, the plots. The video is about four minutes long; it’s worth watching.

About the same time that I discovered this video, I was working on a project in which I was applying the tools and techniques of sentiment analysis to works of fiction.[1] Initially I was interested in tracing the evolution of emotional content in novels over the course of the 19th century. By accident I discovered that the sentiment I was detecting and measuring in the fiction could be used as a highly accurate proxy for plot movement.

Joyce’s Portrait of the Artist as a Young Man is a story that I know fairly well. Once upon a time a moo cow came down along the road. . .and so on . . .

Here is the shape of Portrait of the Artist as a Young Man that my computer drew based on an analysis of the sentiment markers in the text:


If you are familiar with the plot, you’ll readily see that the computer’s version of the story is accurate. As it happens, I was teaching Portrait last fall, so I projected this image onto the white board and asked my students to annotate it. Here are a few of the high (and low) points that we identified.


Because the x-axis represents the progress of the narrative as a percentage, it is easy to move from the graph to the actual pages in the text, regardless of the edition one happens to be using. That’s precisely what we did in the class. We matched our human reading of the book with the points on the graph on a page-by-page basis.

Here is a graph from another Irish novel that you might know; this is Wilde’s Picture of Dorian Gray.


If you remember the story, you’ll see how well this plot line models the movement of the story. Discovering the accuracy of these graphs was quite thrilling.

This next image shows Dan Brown’s blockbuster novel The Da Vinci Code. Notice how much more regular the fluctuations are. This is the profile of a page turner. Notice too how the more generalized blue trend line hovers above neutral in terms of its emotional valence. Dan Brown never lets the plot become too troubled or too much of a downer. He baits us and teases us with fluctuating emotion.


Now compare Da Vinci Code to one of my favorite contemporary novels, Cormac McCarthy’s Blood Meridian. Blood Meridian is a dark book and the more generalized blue trend line lingers in the realms of negative emotion throughout the text; it is a very different book from The Da Vinci Code.[2]


I won’t get into the precise details of how I am measuring emotional valence in these books here.[3] It’s a bit too complicated for an already too long blog post. I will note, however, that the process involves two major components: a controlled vocabulary of positive and negative sentiment markers collected by Bing Liu of the University of Illinois at Chicago and a machine model that I trained to identify and score passages as positive or negative.

In a follow-up post, I’ll describe how I normalized the plot shapes in 40,000 novels in order to compare the shapes and discover what appear to be six archetypal plots!

[1] In the field natural language processing there is an area of research known as sentiment analysis or, sometimes, opinion mining. And when our colleagues engage in this kind of work, they very often focus their study on a highly stylized genre of non-fiction: the review, specifically movie reviews and product reviews. The idea behind this work is to develop computational methods for detecting what we, literary folk, might call mood, or tone, or sentiment, or perhaps even refer to as affect. The psychologists prefer the word valence, and valence seems most appropriate to this research of mine because the psychologists also like to measure degrees of positive and negative valence. I am not aware of anyone working in sentiment analysis who is specifically interested in modeling emotional valence in fiction. In fact, the great majority of work in this field is so far removed from what we care about in literary studies that I spent about six months simply wondering whether or not the methods developed by folks trying to gauge opinions in movie reviews could even be usefully employed in studies of literature.
[2] I gained access to some of these novels through a data transfer agreement made between the University of Nebraska and a private company that is no longer in business. See Unfolding the Novel.
[3] I’m working on a longer and more formal version of this research report for publication. The longer version will include all the details of the methodology. Stay Tuned:-)