I have just posted the syllabus for my spring macroanalysis class focusing on Characterization in Literature. The class is experimental in many senses of the word. We will be experimenting in the class and the class will be an experiment. If all goes according to plan, the only thing about this class that will be different from a research lab is the grade I have to assign at the end—that is the one remaining bit about collaborative learning that still kicks me . . .

To be successful, everyone is going to have to be high-performing and self-motivated, me included. For me, at least, the motivation comes from what I think is a really tough nut to crack: algorithmic detection and analysis of character and character types. So far the work in this area has been largely about character networks: how is Hamlet related to Gertrude, etc. That’s good work, but it depends heavily upon the human coding of character metadata before processing. That is precisely why our early experiments at the Stanford Literary Lab focused on drama: the character names are already explicit in the speaker markup. Beyond drama, there have been some important steps taken in the direction of auto-detection of character in fiction, such as those by Graham Sack and Elson et al., but I think we still have a lot more stepping to do, a whole lot more.

The work I envision for the course will include leveraging obvious tools such as those for named entity recognition and then thinking through and dealing with the more complicated problems of pronoun disambiguation. But my deeper interest here goes far beyond simple detection of entities. The holy grail that I see here lies not in detecting the presence or absence of individual characters but in detecting and tracking character archetypes on a grand macroscale. What if we could begin to answer questions such as these:

  • Are there different classes of villains in the 19th century novel?
  • Do we see a rise in the number of minor characters over the 20th century?
  • What are the qualities that define heroines?
  • How, if at all, do those qualities change over time (think Jane Austen’s Emma vs. Stieg Larsson’s Lisbeth)?
  • Etc.

We may get nowhere; we may fail miserably. (Of course, if I did not already have a couple of pretty good ideas for how to get at these questions, I would not be bothering . . . but that, for now, is the secret sauce 😉 )

At the more practical, “skills” level, I’m requiring students to learn LaTeX and submit all their work in it! (This may prove to be controversial or crazy; I only learned LaTeX six months ago.) To that end, they will also be learning how to use the knitr package for R in order to embed R code directly into the LaTeX, and all of this work will take place inside the (awesome) R IDE, RStudio. Hold on to your hats; it’s going to be a wild ride!
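
For anyone who has not seen knitr before, here is a minimal sketch of what embedding R in LaTeX might look like in an .Rnw file. The chunk label and the toy vector of names are made up for illustration; they are not taken from the course materials.

```latex
\documentclass{article}
\begin{document}

% An R code chunk: knitr runs the code and typesets both the code
% and its output in the compiled PDF.
<<name-lengths, echo=TRUE>>=
# Hypothetical toy data, for illustration only
names <- c("Emma", "Hamlet", "Gertrude")
nchar(names)
@

% \Sexpr{} evaluates a short R expression inline, inside a sentence.
The longest name in this toy list has \Sexpr{max(nchar(names))} characters.

\end{document}
```

In RStudio, a file like this is typically compiled with the Compile PDF button, which runs knitr first and then LaTeX, assuming the project is set to weave .Rnw files with knitr rather than Sweave.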