Reading Macroanalysis: The Hard Way!

This past November, Judge Denny Chin ruled to dismiss the Authors Guild’s case against Google; the Guild vowed they would appeal the decision and two months ago their appeal was submitted. I’ll leave it to my legal colleagues to discuss the merit (or lack) in the Guild’s various arguments, but one thing I found curious was the Guild’s assertion that 78% of every book is available, for free, to visitors to the Google Books pages.

According to the Guild’s appeal:

Since 2005, Google has displayed verbatim text from copyrighted books on these pages. . . Google generally divides each page image into eighths, which it calls “snippets.”. . . Once a user retrieves a book through her initial search, she can enter any other search terms she chooses, and the author’s verbatim words will be displayed in three snippets for each search. Although Google has stated that any given search by a user “only” displays three snippets of each book, a single user can view far more than three snippets from a Library Project book by performing multiple searches using different terms, including terms suggested by Google. . . Even minor variations in search terms will yield different displays of text. . . Google displays snippets from each book, except that it withholds display of 10% of the pages in each book and of one snippet per page. . .Thus, Google makes the vast majority of the text of these books—in all, 78% of each work—available for display to its users.
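
As a rough sanity check, the 78% figure can be reproduced from the numbers in the quoted passage. The sketch below assumes that every page is split into eight equal snippets, that one snippet per page is withheld, and that 10% of pages are withheld entirely. This is an assumption about how the arithmetic works; the passage does not spell out the calculation, and the variable names are illustrative only.

```python
# Numbers taken from the quoted passage of the appeal.
SNIPPETS_PER_PAGE = 8            # "divides each page image into eighths"
WITHHELD_SNIPPETS_PER_PAGE = 1   # "one snippet per page"
WITHHELD_PAGE_SHARE = 0.10       # "10% of the pages in each book"

# Assumption: each snippet holds an equal share of a page's text,
# and the two kinds of withholding simply multiply.
visible_pages = 1 - WITHHELD_PAGE_SHARE
visible_snippets = (SNIPPETS_PER_PAGE - WITHHELD_SNIPPETS_PER_PAGE) / SNIPPETS_PER_PAGE
viewable_share = visible_pages * visible_snippets

print(f"Theoretically viewable share of the book: {viewable_share:.2%}")
```

Under these assumptions the result lands just under 79%, in line with the Guild's "78% of each work." Note that this is a ceiling on what could ever be displayed, and it says nothing about how much effort it takes to get there.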

I decided to test the Guild’s assertion, and what better book to use than my own: Macroanalysis: Digital Methods and Literary History.

In the “Preview,” Google displays the front matter (table of contents, acknowledgements, etc.) followed by the first 16 pages of my text. I consider this tempting pabulum for would-be readers and within the bounds of fair use, not to mention free advertising for me. The last sentence in the displayed preview is cut off; it ends as follows: “We have not yet seen the scaling of our scholarly questions in accordance with the massive scaling of digital content that is now. . . ” Thus ends page 16 and thus ends Google’s preview.

According to the Authors Guild, however, a visitor to this book page can access much more of the book by using a clever method of keyword searching. What the Guild does not tell us, however, is just how impractical and ridiculous such searching is. But that is my conclusion and I’m getting ahead of myself here. . .

To test the Guild’s assertion, I decided to read my book for free via Google Books. I began by reading the material just described above, the front matter and the first 16 pages (very exciting stuff, BTW). At the end of this last sentence, it is pretty easy to figure out what the next word would be; surely any reader of English could guess that the next word, after “. . .scaling of digital content that is now. . . ” would be the word “available.”

Just to be sure, though, I double-checked that I was guessing correctly by consulting the print copy of the book. Crap! The next word was not “available.” The full sentence reads as follows: “We have not yet seen the scaling of our scholarly questions in accordance with the massive scaling of digital content that is now held in twenty-first-century digital libraries.”

Now why is this mistake of mine important to note? Reading 78% of my book online, as the Guild asserts, requires that the reader anticipate what words will appear in the concealed sections of the book. When I entered the word “available” into the search field, I was hoping to get a snippet of text from the next page, a snippet that would allow me to read the rest of the sentence. But because I guessed wrong, I in fact got non-contiguous snippets from pages 77, 174, 72, 9, 56, 15, 37, 162, 8, 4, 80, 120, 154, 46, 133, 79, 27, 97, 147, and 17, in that order. These are all the pages in the book where I use the word “available” but none include the rest of the sentence I want to read. Ugh.

Fortunately, I have a copy of the full text on my desk. So I turn to page 17 and read the sentence. Aha! I now conduct a search for the word “held.” This search results in eight snippets; the last of these, as it happens, is the snippet I want from page 17. This new snippet contains the next 42 words. The snippet is in fact just the end of the incomplete sentence from page 16 followed by another incomplete sentence ending with the words: “but we have not yet fully articulated or explored the ways in which. . . ”

So here I have to admit that I’m the author of this book, and I have no idea what follows. I go back to my hard copy to find that the sentence ends as follows: “. . . these massive corpora offer new avenues for research and new ways of thinking about our literary subject.”

Without the full text by my side, I’d be hard pressed to come up with the right search terms to get the next snippet. Luckily I have the original text, so I enter the word “massive” hoping to get the next contiguous snippet. Six snippets are revealed; the last of these includes the sentence I was hoping to find and read. After the word “which,” I am rewarded with “these massive corpora offer new avenues for” and then the snippet ends! Crap, I really want to read this book for free!

So I think to myself, “What if, instead of trying to guess a keyword from the next sentence, I just use a keyword from the last part of the snippet?” The word “avenues” seems like a good candidate, so I plug it in. Crap! The same snippet is shown again. Looks like I’m going to have to keep guessing. . .

Let’s see, “new avenues for. . . ” perhaps new avenues for “research”? (OK, I’m cheating again by going back to the hard copy on my desk, but I think a savvy user determined to read this book for free might guess the word “research.”) I plug it in. . . 38 snippets are returned! I scroll through them and find the one from page 17. The key snippet now includes the end of the sentence: “research and new ways of thinking about our literary subject.”

Now I’m making progress. Unfortunately, I have no idea what comes next. Not only is this the end of a sentence, but it looks like it might be the end of a paragraph. How to read the next sentence? I try the word “subject” and Google simply returns the same snippet again (along with assorted others from elsewhere in the book). So I cheat again and look at my copy of the book. I enter the word “extent” which appears in the next sentence. My cheating is rewarded and I get most of the next sentence: “To some extent, our thus-far limited use of digital content is a result of a disciplinary habit of thinking small: the traditionally minded scholar recognizes value in digital texts because they are individually searchable, but this same scholar, as a. . . ”

Thank goodness I have tenure and nothing better to do!

The next word is surely the word “result,” which I now dutifully enter into the search field. Among the 32 snippets that the search returns, I find my target snippet. I am rewarded with a copy of the exact same snippet I just saw with no additional words. Crap! I’m going to have to be even more clever if I’m going to game this system.

Back to my copy of the book I turn. The sentence continues “as a result of a traditional training,” so I enter the word “traditional,” and I’m rewarded with . . . the same damn passage again! I have already seen it twice, now thrice. My search for the term “traditional” returns a hit for “traditionally” in the passage I have already seen and, importantly, no hit for the instance of “traditional” that I know (from reading the copy of the book on my desk) appears in the next line. How about “training,” I wonder. Nothing! Clearly Google is on to me now. I get results for other instances of the word “training” but not for the one that I know appears in the continuation of the sentence I have already seen.

Well, this certainly is reading Macroanalysis the hard way. I’ve now spent 30 minutes to gain access to exactly 100 words beyond what was offered in the initial preview. And, of course, my method involved having access to the full text! Without the full text, I don’t think such a process of searching and reading is possible, and if it is possible, it is certainly not feasible!

But let’s assume that a super savvy text pirate, with extensive training in English language syntax, could guess the right words to search and then perform at least as well as I did using a full text version of my book as a crutch. My book contains, roughly, 80,000 words. Not counting the ~5k offered in the preview, that leaves 75,000 words to steal. At my rate of 100 words per half hour, or 200 words per hour, it would take this super savvy text pirate 375 hours to reconstruct my book. That’s about 47 days of full-time, eight-hour work.
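
For anyone who wants to check the back-of-the-envelope math, here is a minimal sketch of the estimate. It assumes the pirate sustains the same rate I managed (100 words in 30 minutes) across the whole book, which is generous; the constant names are illustrative.

```python
# Figures from the experiment described above.
BOOK_WORDS = 80_000       # rough length of the book
PREVIEW_WORDS = 5_000     # roughly what the free preview already shows
WORDS_GAINED = 100        # words recovered beyond the preview
MINUTES_SPENT = 30        # time it took to recover them
WORKDAY_HOURS = 8         # one full-time working day

# Assumption: the observed rate holds for the entire book.
words_per_hour = WORDS_GAINED / (MINUTES_SPENT / 60)
words_to_steal = BOOK_WORDS - PREVIEW_WORDS
hours_needed = words_to_steal / words_per_hour
workdays_needed = hours_needed / WORKDAY_HOURS

print(f"Rate: {words_per_hour:.0f} words per hour")
print(f"Hours needed: {hours_needed:.0f}")
print(f"Eight-hour days needed: {workdays_needed:.1f}")
```

Changing `WORDS_GAINED` or `MINUTES_SPENT` shows how sensitive the estimate is to the search rate; without the print copy as a crutch, the real rate would presumably be far lower.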

I get it. Times are tough and some folks simply need to steal books from snippet view because they can’t afford to buy them. I’m sympathetic to these folks; they need to satisfy their intense passion for reading and knowledge and who could blame them? Then again, if we consider the opportunity cost at $7.25 per hour (the current minimum wage), then stealing this book from snippet view would cost a savvy text pirate $2,718.75 in lost wages. The eBook version of my text, linked to from the Google Books web page, sells for $14.95. Hmmm?
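
The opportunity cost is simply the hourly wage multiplied by the hours of searching. A minimal sketch, assuming the 375-hour estimate and treating every hour as paid at minimum wage (the constant names are illustrative):

```python
# Figures from the text.
MINIMUM_WAGE = 7.25     # dollars per hour
HOURS_NEEDED = 375      # estimated hours to reconstruct the book from snippets
EBOOK_PRICE = 14.95     # dollars, price of the eBook edition

# Assumption: each hour spent searching is an hour of forgone minimum-wage pay.
lost_wages = MINIMUM_WAGE * HOURS_NEEDED
books_instead = lost_wages / EBOOK_PRICE

print(f"Lost wages: ${lost_wages:,.2f}")
print(f"eBook copies that money would buy: {books_instead:.0f}")
```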