Requiem for a low pass filter

Ben Schmidt’s and Scott Enderle’s recent entries into the syuzhet discussion have beaten the last of the low pass filter out of me. I’m not entirely ready to concede that Fourier is useless for the larger problem, but they have convinced me that a better solution than the low pass is possible and probably warranted. What that better solution is remains an open question, but Ben has given us some things to consider.

In a nutshell, there were two essential elements to Vonnegut’s challenge that the low pass method seemed to be solving. According to Vonnegut, this business of story shape “is an exercise in relativity” in which “it is the shape of the curves that matter and not their point of origin.” Vonnegut imagined a system of plot in which the highs and lows of good fortune and ill fortune are internally relative. In this way, a very negative book such as Blood Meridian will have an absolute high and an absolute low that can be compared to another book that, though more positive on the whole, will also have an absolute high and an absolute low. The object of analysis is not the degree of positive or negative valence but the location of the spikes and troughs of that valence relative to the beginning and end of the book. When conceived of in these terms, the ringing artifacts of the low pass filter seem rather trivial because the objective was not to perfectly represent the valence but to dramatize the shifts in valence.
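One way to make Vonnegut’s relativity concrete is to rescale each trajectory onto a common range before comparing shapes, and then ask only where the high and the low fall. The sketch below illustrates that idea under stated assumptions. It is not the package’s code: `rescale` and `peak_and_trough` are hypothetical helper names, min-max scaling to [-1, 1] is an assumed choice of normalization, and the two tiny “books” are made-up values.

```python
import numpy as np

def rescale(values):
    """Map a trajectory onto [-1, 1] so only its relative highs and lows remain."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    if hi == lo:  # a perfectly flat trajectory has no shape to compare
        return np.zeros_like(values)
    return 2 * (values - lo) / (hi - lo) - 1

def peak_and_trough(values):
    """Locations of the highest and lowest points, as a percentage of narrative time."""
    values = np.asarray(values, dtype=float)
    step = 100 / (len(values) - 1)  # assumes at least two values
    return float(np.argmax(values) * step), float(np.argmin(values) * step)

gloomy = [-3, -5, -1]   # hypothetical, mostly negative book
cheerful = [2, 0, 4]    # hypothetical, mostly positive book
print(rescale(gloomy), rescale(cheerful))  # both become [0, -1, 1]
print(peak_and_trough(gloomy))             # high at the end, low at the midpoint
```

After rescaling, the gloomy and the cheerful book have the same shape, which is the sense in which “it is the shape of the curves that matter and not their point of origin.”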

As Ben has pointed out, however, the Fourier method presents a different sort of problem at the edges: it assumes that story plots are periodic, repeating signals. The problem, as Ben puts it, is that the method “imposes an assumption that the start of [a] plot lines up with the end of a plot.”
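The periodicity assumption is easy to demonstrate. This NumPy sketch implements a generic FFT low-pass filter. It is not syuzhet’s own code, and `keep=3` is an assumed cutoff. It is applied to a plot that declines steadily from its best moment (+1) to its worst (-1):

```python
import numpy as np

def low_pass(values, keep=3):
    """Keep only the `keep` lowest-frequency Fourier components, then invert."""
    values = np.asarray(values, dtype=float)
    spectrum = np.fft.rfft(values)
    spectrum[keep:] = 0                           # drop every higher frequency
    return np.fft.irfft(spectrum, n=len(values))  # back to narrative time

decline = np.linspace(1, -1, 100)  # a plot that starts high (+1) and ends low (-1)
smooth = low_pass(decline)
# The transform treats the series as one cycle of a repeating signal, so the
# jump from -1 back to +1 gets smoothed away: both endpoints land near 0.
print(round(smooth[0], 2), round(smooth[-1], 2))
```

The filtered curve cannot end at the bottom, because the DFT treats the last value as adjacent to the first. This is the same mechanism that pulls the green line upward at the end of a book.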

Over the weekend, Ben and I exchanged a few emails, and I acknowledged that I had been overlooking these edge distortions in favor of a big-picture perspective on the general shape. Some amount of distortion, after all, must be tolerated if we want to produce a smooth shape. As Israel Arroyo pointed out in a tweet, “endpoints are problematic in most smoothers and filters.” With a simple centered rolling window, for example, the averaging can’t start until we are already half a window’s length into the sequence, and it must stop the same distance before the end. Figure 1, which shows four options for smoothing Portrait of the Artist, highlights the moving average problem in blue.[1]
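A centered rolling mean shows the edge problem directly. In this minimal NumPy sketch, the function name and the NaN padding are illustrative choices and not the method used to draw Figure 1. Roughly half a window at each end has no complete window, so those positions stay empty:

```python
import numpy as np

def centered_moving_average(values, window):
    """Centered rolling mean; positions without a full window are left as NaN."""
    values = np.asarray(values, dtype=float)
    kernel = np.ones(window) / window
    # mode="valid" returns only the len(values) - window + 1 fully covered positions
    inner = np.convolve(values, kernel, mode="valid")
    out = np.full(values.shape, np.nan)
    half = (window - 1) // 2           # how far into the sequence the first average sits
    out[half:half + len(inner)] = inner
    return out

smoothed = centered_moving_average(np.arange(20.0), window=5)
print(smoothed[:4])    # two NaNs, then the first real averages
print(smoothed[-4:])   # the last real averages, then two NaNs
```

Filling those NaN gaps (flat, by reflection, or with a shrinking window) is a separate design decision. Every such choice invents values that the data did not supply.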

Figure 1

Looking only at Figure 1, it would be hard to argue against Fourier as a beautiful representation of the plot shape. Figure 2 shows the same four methods applied to Dorian Gray. Here again, the Fourier method seems to provide a fair representation. In this case, however, we begin to see a problem forming at the end of the book: the red lowess line is trending down while the green Fourier line is reaching up in order to complete its cycle. The beginning still looks good, and perhaps the distortion at the end can be tolerated, but it’s certainly not ideal.
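The lowess line behaves differently at the edges because each point is estimated only from its own neighborhood. No cycle has to close. The following bare-bones sketch of that local-fitting idea makes some assumptions. It uses local linear fits with tricube weights and omits the robustness iterations that lowess implementations usually add. `loess` and `frac` are hypothetical names, and the code is not R’s implementation.

```python
import numpy as np

def loess(values, frac=0.3):
    """Local linear regression with tricube weights (no robustness iterations)."""
    y = np.asarray(values, dtype=float)
    n = len(y)                              # assumes at least four values
    x = np.arange(n, dtype=float)
    k = min(n, max(4, int(np.ceil(frac * n))))  # neighbors used for each local fit
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        h = np.sort(dist)[k - 1]            # radius reaching the k-th nearest point
        u = np.clip(dist / h, 0.0, 1.0)
        w = (1 - u ** 3) ** 3               # tricube: 1 at the center, 0 at the radius
        X = np.column_stack([np.ones(n), x - x[i]])
        XtW = X.T * w                       # weight each observation
        intercept, _slope = np.linalg.solve(XtW @ X, XtW @ y)
        fitted[i] = intercept               # local line evaluated at x[i]
    return fitted

decline = np.linspace(1, -1, 100)  # a plot that ends on its lowest note
smooth = loess(decline)
print(round(smooth[-1], 3))        # the final point stays at the bottom, close to -1
```

A local linear fit reproduces a straight decline exactly, including its last point. A low-pass filter, by contrast, would pull that point back toward the starting value.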

Figure 2

Unfortunately, some sentiment trajectories appear to create a far more pronounced problem. At Ben’s suggestion, I ran the same experiments with Madame Bovary; the resulting plot is shown in Figure 3. I’ve not read Bovary in many years, so I can’t recall many details of the plot, but I do remember that it does not end well for anyone. The shape of the green Fourier line at the end of Figure 3, however, suggests some sort of uptick in positive sentiment that I suspect is not present in the text. The start of the shape, on the left, also looks problematic compared to the other smoothers.

Figure 3

With the first two figures, I think a case can be made that the Fourier line offers a fair representation of the emotional trajectory. Making such a case for Bovary is not inconceivable if we ignore the edges, but it is clearly a stretch, and there is no denying that the lowess smoother does a better job.

In our email exchange about these different options, Ben included a graphic showing how various methods model four different books. At least in these examples, loess (fifth row of Figure 4) appears to be the top contender if we seek a representation that is both as smooth as possible and as close as possible to the raw values.
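The two goals pull against each other. One way to make them measurable is to score each smoother twice: once for how wiggly its curve is and once for how far it strays from the raw values. The helpers below are hypothetical and are not the criteria behind Figure 4. They simply sketch the trade-off on a toy series:

```python
import numpy as np

def roughness(smoothed):
    """Mean squared second difference: 0 for a straight line, larger when wiggly."""
    d2 = np.diff(np.asarray(smoothed, dtype=float), n=2)
    return float(np.mean(d2 ** 2))

def fit_error(raw, smoothed):
    """Root-mean-square distance between the raw values and a smoothed curve."""
    raw = np.asarray(raw, dtype=float)
    smoothed = np.asarray(smoothed, dtype=float)
    return float(np.sqrt(np.mean((raw - smoothed) ** 2)))

raw = [0, 1, 0, 1, 0]
flat = [0.4] * 5
print(roughness(raw), fit_error(raw, raw))    # perfectly faithful, very rough
print(roughness(flat), fit_error(raw, flat))  # perfectly smooth, less faithful
```

A good smoother keeps both numbers low at once. Neither the raw values themselves nor a flat line manages that.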

Figure 4

In order to fully solve Vonnegut’s challenge, an alternative to percentage chunking is still necessary. Because percentage chunking divides every book into the same number of segments, longer books produce longer segments, and averaging the sentiment of more sentences tends to pull each segment toward a neutral valence. Figuring that out is work for the future. For now, the Bovary example provides precisely the sort of validation/invalidation I was hoping to elicit by putting the package online.
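The neutralizing effect of long segments is easy to simulate. In this sketch, the “books” are random sentence scores centered on zero, so they are simulated data rather than real texts. `percentage_chunks` is a hypothetical helper that splits a book into 100 near-equal segments and averages each one:

```python
import numpy as np

def percentage_chunks(values, n_chunks=100):
    """Split a book's sentence scores into n_chunks near-equal parts and average each."""
    parts = np.array_split(np.asarray(values, dtype=float), n_chunks)
    return np.array([part.mean() for part in parts])

rng = np.random.default_rng(0)
short_book = rng.normal(0.0, 1.0, 2_000)   # simulated: 20 sentences per chunk
long_book = rng.normal(0.0, 1.0, 20_000)   # simulated: 200 sentences per chunk
short_spread = percentage_chunks(short_book).std()
long_spread = percentage_chunks(long_book).std()
print(short_spread, long_spread)  # the long book's chunk means hug zero more tightly
```

Each chunk mean in the longer book averages ten times as many sentences. For noise like this, its spread therefore shrinks by roughly a factor of √10, and genuine swings in sentiment get flattened in the same way.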

RIP low-pass filter.[2]

FOOTNOTES:

[1] There are more elegant ways to fill in the flat edges, but I am keeping it simple here for illustration.

[2] I’m grateful to everyone who has engaged in this discussion, especially Annie Swafford, Daniel Lepage, Ted Underwood, Andrew Piper, David Bamman, Scott Enderle, and Ben Schmidt. It has been a very engaging couple of weeks, and along the way I could not help but think of what this discussion might have looked like in print: it would have taken years to unfold! Despite some emotional highs and lows of its own, this has been a productive exercise and a great example of how valuable open code and the digital commons can be for progress.

Matthew L. Jockers

“Everything . . . in nature's vast workshop from the extinction of some remote sun to the blossoming of one of the countless flowers which beautify our public parks is subject to a law of numeration as yet unascertained.” (Joyce, Ulysses, 1922)