April 2013 update: this post is part 6 of a 6-part series designed to help beginning R programmers get up and running with some simple data analyses. The posts were originally private for a specific course in Summer 2012, but they’re now public in case the tips are useful to a broader audience. -Brian

In week 6 we look at loops in R and how to avoid them at all costs. If you’ve had some programming experience in other languages like C, C++, or Python, you know that looping is an important part of many programs. In R, however, explicit loops can be much slower than vectorized operations, which do their looping in compiled code. When dealing with large data sets it becomes really important to avoid loops so your programs finish in a reasonable amount of time. Fortunately R has several built-in functions that are very good at replacing loops, although it takes some practice to learn to think in R’s “vectorized” fashion.
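To make the idea of “vectorized” code concrete, here is a minimal sketch that adds up the same numbers twice: once with an explicit loop and once with the built-in `sum()`. The vector `x`, its length, and the helper name `loop_sum` are arbitrary choices for illustration, and `system.time()` results vary by machine, so no timings are shown.

```r
# Arbitrary example data: one million random numbers
set.seed(1)
x = runif(1e6)

# Hypothetical helper: visit each element one at a time in R code
loop_sum = function(v) {
  total = 0
  for (i in seq_along(v)) {
    total = total + v[i]
  }
  total
}

# `<-` is needed inside system.time(), because `=` there would name an argument
system.time(s1 <- loop_sum(x))
system.time(s2 <- sum(x))   # vectorized: the looping happens in compiled code

# Both approaches agree, up to floating-point rounding
all.equal(s1, s2)
```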

-Brian

**The first task is to get the average of each row in a matrix.** Create a 10 x 5 matrix:

```r
m = matrix(1:50, nrow=10, byrow=TRUE)
```

- Use a *for loop* to find the row averages. Here’s how I would do it:

```r
avg = rep(NA, 10)   # preallocate one slot per row
for(i in 1:10) {
  avg[i] = mean(m[i,])
}
```

- The *apply* function allows us to avoid the for loop in this case. Check the help file for *apply*, then use it to find the row means. Make sure the answer is correct.

```r
avg = apply(m, 1, mean)
```

Note that the second argument is a 1 (instead of a 2) because we want to apply the function *mean* to each row of our matrix. Making this a 2 would find the column means.
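As a quick check of the note above, this sketch (using the same `m`) computes column means with `apply(m, 2, mean)` and with the built-in `colMeans()`, the column counterpart of `rowMeans()`. The variable names `col_avg` and `col_avg2` are just for illustration.

```r
m = matrix(1:50, nrow=10, byrow=TRUE)

# MARGIN = 2: apply mean() to each column instead of each row
col_avg = apply(m, 2, mean)

# Built-in column version
col_avg2 = colMeans(m)

col_avg   # 23.5 24.5 25.5 26.5 27.5
all.equal(col_avg, col_avg2)
```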

- For common operations, R often has built-in functions that accomplish the task most efficiently. In this case we can use the function *rowMeans* to do this job.

```r
avg = rowMeans(m)
```
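One way to “make sure the answer is correct” is to check that all three approaches agree. The sketch below does that for `m`, then times `apply()` against `rowMeans()` on a larger random matrix; the 10,000 x 50 size and the variable names are arbitrary choices, and timings depend on your machine.

```r
m = matrix(1:50, nrow=10, byrow=TRUE)

# Three ways to get the row means of m
avg_loop = rep(NA, nrow(m))
for (i in 1:nrow(m)) {
  avg_loop[i] = mean(m[i,])
}
avg_apply = apply(m, 1, mean)
avg_rowmeans = rowMeans(m)

avg_rowmeans   # 3 8 13 18 23 28 33 38 43 48
stopifnot(isTRUE(all.equal(avg_loop, avg_apply)),
          isTRUE(all.equal(avg_apply, avg_rowmeans)))

# A bigger (arbitrary) matrix to compare speed; timings vary by machine
big = matrix(runif(10000 * 50), nrow=10000)
system.time(apply(big, 1, mean))
system.time(rowMeans(big))
```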

**The second task is to compute statistics about a data set arranged as a “ragged array”.** The cars data set that we’ve used in previous weeks is a good example of this. Suppose we want to find the mean gas mileage for each year. This is a ragged array because each year has a different number of rows.
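To see why this is called a ragged array, this sketch builds a small data frame with the same two columns used below (`mpg` and `year`). The name `cars_demo` and all of its values are made up for illustration and are not the course’s cars data. `split()` groups the mileages by year, and the groups come out with different lengths.

```r
# Made-up example data standing in for the course's cars data set
cars_demo = data.frame(
  mpg  = c(18, 15, 18, 16, 17, 24, 22, 18, 21, 27, 26),
  year = c(70, 70, 70, 70, 71, 71, 71, 72, 72, 72, 72)
)

# One vector of mileages per year
by_year = split(cars_demo$mpg, cars_demo$year)
by_year
lengths(by_year)   # 70: 4, 71: 3, 72: 4
```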
- Use the *table* command to get a list of the years and the number of observations for each year.
- Think about how you would use a for-loop to find the mean gas mileage for each year. It’s kind of a pain.
- The *tapply* command is more elegant and usually faster than the for-loop solution, so let’s use that instead. Look up the command’s help file, and use it to find the mean mileage for each year.

```r
avg = tapply(cars$mpg, INDEX=cars$year, FUN=mean)
```
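For comparison, here is a sketch of the for-loop version next to `tapply()`, run on a small made-up data frame (`cars_demo`, a hypothetical stand-in, not the course’s data set). The loop has to find the unique years, preallocate a result vector, and subset the mileages for each year by hand; `tapply()` does all of that in one call.

```r
# Made-up example data standing in for the course's cars data set
cars_demo = data.frame(
  mpg  = c(18, 15, 18, 16, 17, 24, 22, 18, 21, 27, 26),
  year = c(70, 70, 70, 70, 71, 71, 71, 72, 72, 72, 72)
)

# Number of observations per year
table(cars_demo$year)

# For-loop version: one subset-and-average per year
years = sort(unique(cars_demo$year))
avg_loop = rep(NA, length(years))
names(avg_loop) = years
for (i in seq_along(years)) {
  avg_loop[i] = mean(cars_demo$mpg[cars_demo$year == years[i]])
}

# tapply version: group mpg by year, then apply mean to each group
avg_tapply = tapply(cars_demo$mpg, INDEX=cars_demo$year, FUN=mean)

avg_tapply   # 70: 16.75, 71: 21, 72: 23
all.equal(unname(avg_loop), as.vector(avg_tapply))
```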
