Harvard: April 3, 2015

Introduction to Text Analysis with R

General Description:

This workshop provides a practical introduction to text analysis using the R programming language. We will cover basic text processing, data ingestion, data preparation, and analysis. The main computing environment for the workshop will be R: “the open source programming language and software environment for statistical computing and graphics.”

While no programming experience is required, students must have basic computer skills, must be familiar with their computer’s file system, and must be comfortable entering commands in a command line environment.

Suggested Workshop Preparation:

While not required, participants are encouraged to work through at least the first two of the seven basic R lessons available at R Code School prior to taking this workshop.

In advance of the workshop, students should:

  1. Download the current version of R (at the time of this writing, version 3.0.2) from the CRAN website (see http://cran.at.r-project.org) by clicking on the link appropriate to your operating system:
    • If you use MS Windows, click on “base” and then on the link to the executable (“.exe”) setup file.
    • If you are running Mac OS X, choose the link to the most current package (the “.pkg” file).
    • If you use Linux, choose your distribution and then the installer file.
    Follow the installation instructions for your system and install R in the standard (“default”) directory. You will now have the base installation of R on your system.
  2. Download and install RStudio.
    • RStudio is an application that offers a convenient environment for writing and running R programs. It is an IDE (Integrated Development Environment) for R, and it runs on Windows, Mac, and Linux. After you have installed R by following the instructions above, download the “Desktop” version (not the Server version) of RStudio from http://www.rstudio.com. Follow the installation instructions and then launch RStudio as you would any other application. When you launch RStudio, you do *not* need to also launch the R program, because RStudio uses the R installation from the first step. Here is a link to a video describing the RStudio programming environment.
  3. You will also need to download the workshop materials.
  4. Please also make sure that your Java Runtime Environment (JRE) installation is up to date.

Workshop Syllabus:

IMPORTANT: It is critical that you arrive on time to every session and be ready to roll with RStudio installed and running. The workshop will begin on schedule, and if you miss the first few minutes of any session you will be lost!

Summary: In this workshop you will be introduced to the R programming language while learning the basics of computational text analysis. You will learn basic R syntax and be introduced to the RStudio programming environment. Text analysis topics will include text ingestion and tokenization, word frequency analysis, dispersion plots, and, if time permits, correlation analysis and maybe more!

  • SESSION ONE (9:30 – 12:30)
    • The R computing environment
    • R console vs. RStudio
    • Basic text manipulation in R
    • Word Frequency
  • BREAK (12:30 – 1:30)
  • SESSION TWO (1:30 – 4:30)
    • Dispersion Plots
    • Correlation
    • Bonus Material (time permitting)