Eight months ago I began a series of blog posts about my experiments using sentiment analysis as a proxy for plot movement. At the time, I had done a fair bit of anecdotal analysis of how well the sentiments detected by a machine matched my own sense of the sentiments in a series of familiar novels. In addition to the anecdotal spot-checking, I had also hand-coded every sentence of James Joyce’s novel Portrait of the Artist as a Young Man and then compared the various machine methods to my own human coded values. The similarities (seen in figure 1) were striking.
Soon after my first post about this work, David Bamann hired five Mechanical Turks to code the sentiment in each scene of Shakespeare’s Romeo and Juliet. David posted his results online and then Ted Underwood compared the trajectory produced by David’s turks to the machine values produced by the Syuzhet R package I had developed. Even though David’s Turks had coded scenes and Syuzhet had coded sentences, the human and machine trajectories that resulted were very similar. Figure 2 shows the two graphs, first from David’s blog and then from Ted’s.
Before releasing the package, I was fairly confident that the machine was doing a good job of approximating what human beings would think. I hoped that others, like David and Ted, would provide further validation. Many folks posted results online and many more emailed me saying the tool was producing trajectories that matched their sense of the novels they applied it to, but no one conducted anything beyond anecdotal spot checking. After returning to UNL in late August (I had been on leave for a year), I hired four students to code the sentiment of every sentence in six contemporary novels: All the Light We Cannot See by Anthony Doerr, The Da Vinci Code by Dan Brown, Gone Girl by Gillian Flynn, The Secret Life of Bees by Sue Monk Kidd, The Lovely Bones by Alice Sebold, and The Notebook by Nicholas Sparks.
These novels were selected to cover several major contemporary genres. They span a period from 2003 to 2014. None are experimental in the way that Portrait of the Artist is, but they do cover a range of styles between what we might call “low-brow” to “high-brow.” Each sentence of each novel was sentiment coded by three human raters. The precise details of this study, including statistics about inter-rater agreement and machine-to-human agreement, are part of a larger analysis I am conducting with Aaron Dominguez.
What follows are six graphs showing moving averages of the human coded sentiment along side moving averages from two of the sentiment detection methods implemented in the Syuzhet R package. The similarity of the shapes derived from the the human and machine data is quite striking.