Academic Blog

  • Rethinking Range in the Age of Generative AI 

    I recently reread David Epstein’s Range (2019), a book I first encountered a few years ago when it seemed every leadership forum was extolling the virtues of grit, 10,000 hours, and early specialization. Epstein pushed back, persuasively arguing that generalists, not specialists, are better equipped to solve complex problems, especially in domains where rules are unclear and…

    read more…

  • Revisiting Chapter Nine of Macroanalysis

    Back when I was working on Macroanalysis, Gephi was a young and sometimes buggy application. So when it came to the network analysis in Chapter 9, I was limited in terms of the amount of data that could be visualized. For the network graphs, I reduced the number of edges from 5,660,695 down to 167,770…

    read more…

  • Syuzhet 1.0.4 now on CRAN

    On Friday I posted an updated version of Syuzhet (1.0.4) to CRAN. This version has been available over on GitHub for a while now. Version 1.0.4 adds support for sentiment detection in several languages by using the expanded NRC lexicon from Saif Mohammad. The lexicon includes sentiment values for 13,901 words in each…

    read more…

  • Resurrecting a Low Pass Filter (well, kind of)

    On April 6th, 2015, I posted Requiem for a low pass filter acknowledging that the smoothing filter as I had implemented it in the beta version of Syuzhet was not performing satisfactorily. Ben Schmidt had demonstrated that the filter was artificially distorting the edges of the plots, and prior to Ben’s post, Annie Swafford had…

    read more…

  • More Syuzhet Validation

    Back in December I posted results from a human validation experiment in which machine extracted sentiment values were compared to human coded values. The results were encouraging. In the spring, we mined the human coded sentences to help create a new sentiment dictionary that would, in theory, be more sensitive to the sort of sentiment…

    read more…

  • That Sentimental Feeling

    Eight months ago I began a series of blog posts about my experiments using sentiment analysis as a proxy for plot movement. At the time, I had done a fair bit of anecdotal analysis of how well the sentiments detected by a machine matched my own sense of the sentiments in a series of familiar novels. In…

    read more…

  • Cumulative Sentiments

    This morning Andrew N. Jackson posted an interesting alternative to the smoothing of sentiment trajectories.  Instead of smoothing the trajectories with a moving average, lowess, or, dare I say it, low-pass filter, Andrew suggests cumulative summing as a “simple but potentially powerful way of re-plotting” the sentiment data.  I spent a little time exploring and thinking about his approach this…

    read more…

  • Requiem for a low pass filter

    Ben Schmidt’s and Scott Enderle’s recent entries into the syuzhet discussion have beaten the last of the low pass filter out of me. I’m not entirely ready to concede that Fourier is useless for the larger problem, but they have convinced me that a better solution than the low pass is possible and probably warranted. What that better solution is remains an…

    read more…

  • My Sentiments (Exactly?)

    While developing the Syuzhet package–a tool for tracking relative shifts in narrative sentiment–I spent a fair amount of time gut-checking whether the sentiment values returned by the machine methods were a good match for my own sense of the narrative sentiment.  Between 70% and 80% of the time, they were what I considered to be good sentence level matches. .…

    read more…

  • A Ringing Endorsement of Smoothing

    On March 7, Annie Swafford posted an interesting critique of the transformation method implemented in Syuzhet.  Her basic argument is that setting the low-pass filter too low may result in misleading ringing artifacts.[1]  This post takes up the issue of ringing artifacts more directly and explains how Annie’s clever method of neutralizing values actually demonstrates just…

    read more…

  • Is that Your Syuzhet Ringing?

    Over the weekend, Annie Swafford published another installment in her ongoing critique of Syuzhet, the R package that I released in early February. In her recent blog post, she proposes an interesting approach for testing the get_transformed_values function.[1] Previously Annie had noted how using the default values for the low-pass filter may result in too much information loss, to which I…

    read more…

  • Some thoughts on Annie’s thoughts . . . about Syuzhet

    Annie Swafford has raised a couple of interesting points about how the syuzhet package works to estimate the emotional trajectory in a novel, a trajectory which I have suggested serves as a handy proxy for plot (in the spirit of Kurt Vonnegut). Annie expresses some concern about the level of precision the tool provides and…

    read more…

  • The Rest of the Story

    My blog on February 2, about the Syuzhet package I developed for R (now available on CRAN), generated some nice press that I was not expecting: Motherboard, then The Paris Review, and several R blogs (Revolutions, R-Bloggers, inside-R) all featured the work.  The press was nice, but I was not at all prepared for the focus to be placed on the one piece…

    read more…

  • Revealing Sentiment and Plot Arcs with the Syuzhet Package

    Introduction This post is a followup to A Novel Method for Detecting Plot posted June 15, 2014. For the past few years, I have been exploring the relationship between sentiment and plot shape in fiction. Earlier today I posted an R package titled “syuzhet” to github. The package is designed to extract sentiment and plot…

    read more…

  • Plot Arcs (Schmidt Style)

    A few weeks ago Ben Schmidt posted a provocative blog entry titled “Typical TV episodes: visualizing topics in screen time.” It’s worth a careful read. . . Ben began by topic modeling the closed captioning data from a series of popular TV series and then visualizing the ten most common topics over the time span…

    read more…

  • NHC Summer Institutes in Digital Humanities

    I’m pleased to announce that Willard McCarty and I are leading a two-year set of summer institutes in digital humanities at the National Humanities Center. Here is the official announcement: “The first of the National Humanities Center’s summer institutes in digital humanities, devoted to digital textual studies, will convene for two one-week sessions, first in…

    read more…

  • Reading Macroanalysis: The Hard Way!

    This past November, Judge Denny Chin ruled to dismiss the Authors Guild’s case against Google; the Guild vowed they would appeal the decision and two months ago their appeal was submitted. I’ll leave it to my legal colleagues to discuss the merit (or lack) in the Guild’s various arguments, but one thing I found curious…

    read more…

  • A Novel Method for Detecting Plot

    While studying anthropology at the University of Chicago, Kurt Vonnegut proposed writing a master’s thesis on the shape of narratives. He argued that “the fundamental idea is that stories have shapes which can be drawn on graph paper, and that the shape of a given society’s stories is at least as interesting as the shape…

    read more…

  • So What?

    Over the past few days, several people have written to ask what I thought about the article by Adam Kirsch in the New Republic (“Technology Is Taking Over English Departments: The false promise of the digital humanities”). In short, I think it lacks insight and new knowledge. But, of course, that is precisely the complaint that…

    read more…

  • Text Analysis with R . . . coming soon.

    My new book, Text Analysis with R for Students of Literature, is due from Springer sometime in May. I got the cover proofs this week (below). Looking good :-)

    read more…

  • Simple Point of View Detection

    [Note 4/6/14 @ 2:24 CST: oops, had a small error in the code and corrected it: the second if statement should have been “< 1.5”, which made me think of a still simpler way to code the function as edited.] [Note 4/6/14 @ 2:52 CST: After getting some feedback from Jonathan Goodwin about Ford’s The…

    read more…

  • Experimenting with “gender” package in R

    Yesterday afternoon, Lincoln Mullen and Cameron Blevins released a new R package that is designed to guess (infer) the gender of a name. In my class on literary characterization at the macroscale, students are working on a project that involves a computational study of character genders. . . needless to say, the ‘gender’ package couldn’t…

    read more…

  • Characterization in Literature and the Macroanalysis Lab

    I have just posted the syllabus for my spring macroanalysis class focusing on Characterization in Literature. The class is experimental in many senses of the word. We will be experimenting in the class and the class will be an experiment. If all goes according to plan, the only thing about this class that will be…

    read more…

  • A Festivus Miracle: Some R Bingo code

    A few weeks ago my daughter’s class was gearing up to celebrate the Thanksgiving Holiday, and I was asked to help prepare some “holiday bingo cards” for the kids’ party. Naturally, I wrote a program in R for the job! (I know, I know, Maslow’s hammer.) Since I learned a few R tricks for making…

    read more…

  • Text Analysis with R for Students of Literature

    [Update (9/3/13 8:15 CST): Contributors list now active at the main Text Analysis with R for Students of Literature Resource Page] Below this post you will find a link where you can download a draft of Text Analysis with R for Students of Literature. The book is under review with Springer as part of a…

    read more…

  • Obi Wan McCarty

    [Below is the text of my introduction of Willard McCarty, winner of the 2013 Busa Award.] As the chair of the awards committee that selected Prof. McCarty for this award it is my pleasure to offer a few words of introduction. I’m going to go out on a limb this afternoon and assume that you…

    read more…

  • 25 days until the 2013 DH Fun Run

    Below is the route/elevation for the July 18, 2013 Unofficial (as in: run at your own risk; this has nothing to do with the conference) DH 2013 Fun Run. The route begins and ends on the north side of the UNL Student Union (fountain area). From campus we will go a few blocks east to…

    read more…

  • “Secret” Recipe for Topic Modeling Themes

    The recently (yesterday) published issue of JDH is all about topic modeling. It’s a great issue, and it got me thinking about some of the lessons I have learned over seven or eight years of modeling literary corpora. One of the important things I have learned is that the quality of the final model (which…

    read more…

  • “A Matter of Scale”

    Back in November, Julia Flanders and I were invited to stage a debate on the matter of “scale” in digital humanities research for the “Boston Area Days of DH” conference keynote: Julia was to represent the micro scale and I the macro. Julia and I met up during the MLA conference in January and began…

    read more…

  • Pronouns in 19th Century Fiction

    Some folks I follow on Twitter (@scott_bot, @benmschmidt, @rayncordell, @foxyfolklorist, and others) were engaged in a conversation this week about the frequency of gendered pronouns in a corpus of 233 fairy tales from @foxyfolklorist’s dissertation. For a bit of literary contextualization, I tweeted a bar graph showing the frequency of 13 pronouns in a corpus…

    read more…

  • Unfolding the Novel

    I’m excited to announce a new research project dubbed “Unfolding the Novel” (which is a play on both “paper” and “protein” folding). In collaboration with colleagues from the Stanford Literary Lab and Arizona State University and in partnership with researchers of the Book Genome project of BookLamp.com we have begun work that traces stylistic and…

    read more…

  • Thoughts on a Literary Lab

    [For the “Theories and Practices of the Literary Lab” roundtable at MLA yesterday, panelists were asked to speak for 5 minutes about their vision of a literary lab. Here are my remarks from that session–#147] I take the descriptor “literary lab” literally, and to help explain my vision of a literary lab I want to…

    read more…

  • Some Advice for DH Newbies

    In preparation for a panel session at DH Commons today, I was asked to consider the question: “What one step would you recommend a newcomer to DH take in order to join current conversations in the field?” and then speak for 3 – 4 minutes. Below is the 5 minute version of my answer. .…

    read more…

  • Computing and Visualizing the 19th-Century Literary Genome

    I was unable to attend the DH 2012 meeting in Hamburg, but I recorded my paper as a screencast, and my ever-faithful colleague Glen Worthey kindly delivered it on my behalf. The full presentation can be viewed here as a QuickTime movie.

    read more…

  • DH2012 and the 2013 Busa Award

    I could not make it to the DH conference in Hamburg this year (though I did manage to appear virtually). As chair of the Busa Award committee I had the pleasure of announcing that Willard McCarty had won the award. Willard will accept the award in 2013 when DH meets at the University of Nebraska.…

    read more…

  • Amicus Brief Filed

    In the last chapter of my forthcoming book, I write about the challenges of copyright law and how many a digital humanist is destined to become a 19th-centuryist if the law isn’t reformed to specifically allow for and recognize the importance of “non-expressive” use of digitized content.* This week the Amicus Brief that I co-authored…

    read more…

  • Macroanalysis

    In preparation for the publication of my book (Macroanalysis: Digital Methods and Literary History, UIUC Press, 2013), I’ve begun posting some graphs and other data to my (new) website. To get the ball rolling, I have created an interactive “theme viewer” where visitors will find a drop down menu of the 500 themes I harvested…

    read more…

  • The LDA Buffet is Now Open; or, Latent Dirichlet Allocation for English Majors

    For my forthcoming book, which includes a chapter on the uses of topic modeling in literary studies, I wrote the following vignette. It is my imperfect attempt at making the mathematical magic of LDA palatable to the average humanist. Imperfect, but hopefully more fun than plate notation. . . . . . imagine a quaint…

    read more…

  • Aberrant Adjectives in 19th Century Novels

    I created the visualization below using Many Eyes and a data set derived from part-of-speech tagged novels from 19th century Britain. Found here are the 100 most “aberrant adjectives.” Aberrant here is determined by selecting those words that have the greatest amount of usage deviation (measured by relative frequency) over a 13 decade time period.…

    read more…

  • On Distant Reading and Macroanalysis

    Earlier this week Kathryn Schulz of the New York Times published a rather provocative, challenging, and in my opinion under-researched and over-sensationalized article about my colleague Franco Moretti’s work theorizing a mode of literary analysis that he has termed “distant reading.” Others have already pointed out some of the errors Schulz made, and I’m fairly certain…

    read more…

  • Kansas Irish Reprint

    Rowfont Press of Wichita, Kansas has just published a newly illustrated edition of Charles Driscoll’s memoir Kansas Irish (with my Critical Introduction). The book is available at Amazon. Kansas Irish and the two sequels that follow provide the most complete and authentic rendering of Irish life on the American prairie in the 19th Century.

    read more…

  • On Pamphleteering and Pamphlet One

    Several months ago, a group of us from the Stanford Literary Lab wrote and sent out for review the article that now appears in Pamphlet 1 of the Lab. The article, titled “Quantitative Formalism: an Experiment” was submitted, peer-reviewed, and approved for publication in a prestigious literary journal. There was, however, a catch. The editors…

    read more…

  • Unigrams, and bigrams, and trigrams, oh my

    I’ve been watching the ngrams flurry online, on Twitter, and on various email lists over the last couple of days. Though I think there is great stuff to be learned from Google’s ngram viewer, I’m advising colleagues to exercise restraint and caution. First, we still have a lot to learn about what can and cannot…

    read more…

  • SEASR Grant

    This month a group of researchers at Stanford, University of Illinois, University of Maryland, and George Mason were awarded a $790,000 grant from the Mellon Foundation to advance the prior work of the SEASR project. I’ll be serving as the overall Project Director and as one of the researchers in the Stanford component of the…

    read more…

  • On Collaboration

    I’ve been hearing a lot about “collaboration,” especially in the digital humanities. Lisa Spiro at Rice University has written a very informative post about Collaborative Authorship in the Humanities as well as another post providing Examples of Collaborative Digital Humanities Projects. Both of these posts are worth reading, and Spiro offers some well-thought-out and…

    read more…

  • Auto Converting Project Gutenberg Text to TEI

    Those who do corpus level computational text analysis are always hungry for more and more texts to analyze. Though we’ve become adept at locating texts from a wide range of sources (our own institutional repositories as well as a number of other places including Google Books, the Internet Archive, and Project Gutenberg), we still face…

    read more…

  • Panning for Memes

    Over in the English Department Literature Lab, we have been experimenting with Topic Modeling as a means of discovering latent themes (aka topics) in a corpus of 19th century novels. Topic Modeling is an unsupervised machine learning process that employs Latent Dirichlet allocation. “It posits that each document is a mixture of a small number…

    read more…

  • What is a Literature Lab: Not Grunts and Dullards

    Yesterday’s Chronicle of Higher Education ran an article by Marc Parry about the work we are doing here in our new Literature Lab with “big data.” It’s awfully nice to be compared to Lewis and Clark exploring the frontiers of literary scholarship, but I think the article fails to give due credit to the exceptional…

    read more…

  • Stalker (R) and the journey of the Jockers iPhone

    Lots of hoopla in the last few days over the discovery that the iPhone keeps a database of locations it has traveled. Wasn’t long before someone in the R community figured out how to tap into this file, and with a mere two lines of code you can visualize where your phone has been on…

    read more…

  • Digital Humanities: Methodology and Questions

    Students in our new Literature Lab doing what English Majors do! Folks keep expressing concern about the future of the humanities, and the “need” for a next big thing. In fact, the title of a blog entry in the April 23, 2010 New York Times takes it for granted that the humanities need “saving.” The…

    read more…

  • Who’s Your DH Blog Mate: Match-Making the Day of DH Bloggers with Topic Modeling

    Social Networking for digital humanities nerds? Which DH bloggers are you most compatible with? Let’s get the right nerds with the right nerds–match making made in digital humanities heaven. After seeing Stefan Sinclair’s Voyeuristic analysis of the Day of DH Blog posts, I wrote and asked him how to get access to the “corpus” of…

    read more…

  • Analyze This (Page)

    “TAToo” is a fun Flash widget developed by Peter Organisciak at the University of Alberta. Peter works under the supervision of Digital Humanists Par Excellence and TAPoR Gurus Geoffrey Rockwell and Stan Ruecker. The widget (just some embed-able code) does “layman’s” text analysis on the web pages in which its code is embedded. I’ve added…

    read more…

  • 65,000 Texts to Mine?

    A story in the Feb. 7th issue of the Telegraph reports that the British Library is going to make 65,000 first edition texts available for public download via Amazon’s Kindle. This news is almost as exciting as Google’s decision some years ago to partner with a consortium of big libraries in order to digitize all…

    read more…

  • Is it the Joyce Industry or the Shakespeare Industry?

    At the recent Digital Humanities Conference in Maryland, Matthew Wilkens and I got into a discussion about famous authors and the “industries” of scholarship that their works have inspired (see Matt’s blog post about our discussion and his survey analysis of the MLA bibliography). The first time I ever heard the term “industry” used in…

    read more…

  • Machine-Classifying Novels and Plays by Genre

    In the post that follows here, I describe some recent experiments that I (and others) have conducted. The goal of these experiments was to accurately machine-classify novels and plays (Shakespeare’s) by genre. One of the most interesting results ends up having more to do with feature extraction than classification algorithm. Background: Several weeks ago, Mike…

    read more…

  • Executing R in Php

    For their final project, the students in my Introduction to Digital Humanities seminar decided to analyze narrative style in Faulkner’s The Sound and the Fury. In addition to significant off-line analysis, we are building a web-based application that allows visitors to compare the different sections of the novel to each other and also to new, unseen…

    read more…

  • Chronicle of Higher Education Article

    This week the Chronicle of Higher Education ran an article written by Jennifer Howard about “literary geospaces.” The article featured some work I have done mapping Irish-American literature using Google Earth (and also profiled the work of Janelle Jenstad who has been mapping early modern London). Photo by Noah Berger The bit about my Google…

    read more…

  • POS Tagging XML with xGrid and the Stanford Log-linear Part-Of-Speech Tagger

    Recently (4/2008) I had reason to Part-Of-Speech tag a whole mess of novels, around 1200. I installed the Stanford Tagger and ran my first job of 250 novels on an old G4 under my desk. Everything worked fine, but the job took six days. After that experience, I figured out how to utilize xGrid for…

    read more…

  • Sophie Online Editing Environment

    This is worth a look for those thinking about online collaboration: http://www.sophieproject.org/ “Sophie’s raison d’être is to enable people to create robust, elegant rich-media, networked documents without recourse to programming.” There is a useful demo movie at http://www.futureofthebook.org/sophie/files/Making_a_Sophie_Book.html

    read more…