Stat programming workshop – week 3 tasks

April 2013 update: this post is part 3 of a 6-part series designed to help beginning R programmers get up and running with some simple data analyses. The posts were originally private for a specific course in Summer 2012, but they’re now public in case the tips might be useful for a broader audience. -Brian

In week 1 we looked at installing and running R, while week 2 focused on importing data. This week we start exploring how to write re-usable R code.
– Brian

  1. There are many advantages to doing statistical analysis with a programming language like R. Obviously, we use it to make the computer do what we want, but more importantly R enables us to write reusable code. This means three things: first, we want to be able to come back and run our programs long after we wrote them; second, we want to save time by not re-writing everything from scratch for each analysis; and third, we want our analyses to be reproducible by other researchers.
  2. Open R, and open the text editor of your choice. If you’re using R on Windows or Mac you can use R’s built-in text editor. On Macs, go to File >> New Document. It should be similar in Windows. You can also use a separate plain-text editor like Notepad (Windows), gedit (Linux), or TextEdit (Mac).
  3. Enter the code for opening the car MPG data (from week 2 – you should have it saved as ‘cars.txt’) in the new document: cardata = read.table('cars.txt')
  4. On the next line, edit the names of the variables in the cardata data frame: names(cardata) = c('mpg', 'cylinders', 'displacement', 'horsepower', 'weight', 'acceleration', 'year', 'origin', 'name')
  5. On the next line, compute and display a table that shows the unique values in the cylinders column and how many observations there are for each number of cylinders: print(table(cardata$cylinders)). Note the $ syntax, which refers to a variable (column) by name within the cardata data frame.
  6. Save your program with a .R extension. For example, mine is called “mpg_analysis.R”.
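  7. Run your program. Switch back to the R command line and type source([your program name]), with the file name in quotes. For me, this is source('mpg_analysis.R'). You should see the table of cylinder values and counts come up. If R cannot find the file, check that R’s working directory (shown by getwd()) is the folder where you saved the program.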
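Put together, steps 3 to 5 give a program along the lines of the sketch below. So that it runs on its own, this version first writes a tiny stand-in data file with three made-up rows (these are not the real week 2 data, and the names sample_file and cyl_counts are just illustrative). In your own mpg_analysis.R you would skip that part and read 'cars.txt' directly.

```r
# Stand-in for cars.txt: three made-up rows (NOT the real week 2 data),
# written to a temporary file so this sketch runs without any other files.
sample_file = tempfile(fileext = '.txt')
writeLines(c(
  '18.0 8 307.0 130.0 3504 12.0 70 1 "car a"',
  '24.0 4 113.0 95.0 2372 15.0 70 3 "car b"',
  '27.0 4 97.0 88.0 2130 14.5 70 3 "car c"'
), sample_file)

# Step 3: read the data (in your own program: read.table('cars.txt'))
cardata = read.table(sample_file)

# Step 4: give the columns meaningful names
names(cardata) = c('mpg', 'cylinders', 'displacement', 'horsepower',
                   'weight', 'acceleration', 'year', 'origin', 'name')

# Step 5: count the observations for each number of cylinders
cyl_counts = table(cardata$cylinders)
print(cyl_counts)
```

Storing the table in a variable before printing it is optional; it just makes the counts available for later lines of the program.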