More Syuzhet Validation
Back in December I posted results from a human validation experiment in which machine-extracted sentiment values were compared to human-coded values. The results were encouraging. In the spring, we mined the human-coded sentences to help create a new sentiment dictionary that would, in theory, be more sensitive to the sort of sentiment words common to fiction (existing sentiment dictionaries tend to be derived from movie and/or product review corpora). This dictionary was implemented as the default in the latest release of the Syuzhet R package (2016-04-28).
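For readers who want to try the new dictionary themselves, here is a minimal sketch of the basic workflow in R; the file path is hypothetical and stands in for any plain-text novel.

```r
library(syuzhet)

# Load a plain-text novel and split it into sentences.
# The path below is a hypothetical placeholder.
novel_text <- get_text_as_string("path/to/novel.txt")
sentences  <- get_sentences(novel_text)

# Score each sentence with the new default "syuzhet" dictionary.
machine_values <- get_sentiment(sentences, method = "syuzhet")
```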
Over the summer, a new group of six human coders was hired to read novels and score the sentiment of every sentence. Each novel was read by three coders. In the graphs that follow, a simple moving average is used to plot the mean sentiment of the three coders (black line) alongside the values derived from the new “Syuzhet” dictionary (red line). Each graph reports the Pearson product-moment correlation coefficient.
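The comparison above can be reproduced along these lines; this is a sketch only, assuming `machine_values` from the previous snippet, a hypothetical `human_means` vector holding the mean of the three coders’ scores per sentence, and an assumed smoothing window size.

```r
window <- 101  # assumed window size for the simple moving average

# Centered simple moving averages; stats::filter leaves NAs at the edges.
human_sma   <- as.numeric(stats::filter(human_means, rep(1/window, window), sides = 2))
machine_sma <- as.numeric(stats::filter(machine_values, rep(1/window, window), sides = 2))

# Plot the smoothed human mean (black) alongside the machine values (red).
plot(human_sma, type = "l", col = "black",
     xlab = "Sentence", ylab = "Mean sentiment")
lines(machine_sma, col = "red")

# Pearson product-moment correlation between the two smoothed series,
# dropping the NA values introduced at the edges by the moving average.
cor(human_sma, machine_sma, method = "pearson", use = "complete.obs")
```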
This fall we will continue gathering human data by having coders read additional books. Once a few more books have been coded, we’ll post a more detailed report, including data on inter-coder agreement and on which machine methods produce results closest to the human scores.