Stat programming workshop – week 4 tasks

April 2013 update: this post is part 4 of a 6-part series designed to help beginning R programmers get up and running with some simple data analyses. The posts were originally private for a specific course in Summer 2012, but they’re now public in case the tips might be useful for a broader audience. -Brian

In weeks 1-3 we installed and opened R, imported data from a text file, and started editing and saving code in source files. This week’s focus is R’s basic plot functions. One of the most important parts of any statistical analysis is communicating the results to other people, and plots are often a very effective way to do this.

  1. Open R and load the car data from weeks 2 and 3. Name the variables in the car data appropriately (see the week 3 tasks for the variable names).
  2. Last week we used the table command to see how many cars there are in our data set for each number of cylinders. Now we’ll check out the same information visually, using the command hist to produce a histogram. In the command window, type hist(cardata$cylinders).
  3. Save this plot to a file using a sequence of three commands: 1. png("[filename].png"), with the file name in quotes; 2. hist([some variable]); 3. dev.off(), which closes the file so the image is written. If you want a pdf file instead of a png image, you can substitute the command pdf in place of png. Similarly, we will work with various plot commands in addition to hist; just substitute your command in place of hist.
  4. Figure out how to make the plot pretty: change the default title and axis labels. Hint: use the help(hist) command to see descriptions of the arguments to hist.
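  5. Now make a histogram of the weight variable. Notice that hist chooses a default number of break points, because the weights don’t naturally fall into a nice number of bins like the number of cylinders. Re-do the plot with 20 bins. Note that hist treats a single number given for its breaks argument as a suggestion, so the result may not have exactly 20 bins.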
  6. Suppose we want to investigate how mpg changes over the years. Make a scatterplot of mpg vs. year with the plot(cardata$year, cardata$mpg) command.
  7. Bonus: Add a horizontal line to the plot that shows where the overall mean of mpg is. Use the command abline(h=mean(cardata$mpg)). Try to make this line thicker than the default, and try to make it red (default is black).
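
The plotting steps above can be sketched roughly as follows. This is one possible approach, not the only one. Because the course data file is not included here, the sketch builds a small made-up stand-in for cardata; the column names weight and year, the output file name, and all titles and labels are assumptions for illustration, so substitute the names from the week 3 tasks.

```r
# Made-up stand-in so the sketch runs on its own; use the real car data instead.
set.seed(1)
cardata <- data.frame(
  cylinders = sample(c(4, 6, 8), 50, replace = TRUE),
  weight    = round(runif(50, 1600, 5000)),   # assumed column name
  mpg       = round(runif(50, 10, 45), 1),
  year      = sample(70:82, 50, replace = TRUE)  # assumed column name
)

# Steps 3-4: open a png file, draw a labelled histogram, then close the file.
outfile <- file.path(tempdir(), "cylinders.png")  # hypothetical file name
png(outfile)
hist(cardata$cylinders,
     main = "Cars by number of cylinders",
     xlab = "Cylinders",
     ylab = "Number of cars")
dev.off()

# Step 5: breaks = 20 is only a suggestion to hist().
# To get exactly 20 bins, pass the 21 bin edges explicitly.
edges <- seq(min(cardata$weight), max(cardata$weight), length.out = 21)
h <- hist(cardata$weight, breaks = edges,
          main = "Car weights", xlab = "Weight")

# Steps 6-7: scatterplot of mpg against year, plus a thick red line at mean mpg.
plot(cardata$year, cardata$mpg,
     main = "Fuel economy by model year",
     xlab = "Year", ylab = "Miles per gallon")
abline(h = mean(cardata$mpg), col = "red", lwd = 3)
```

The lwd argument sets the line width (the default is 1) and col sets the colour. The same main, xlab, and ylab arguments work for both hist and plot.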