Programming Assignment 1 Air Pollution Coursera

Coursera Computing for Data Analysis Assignment 1 Part 3 Week 2

Write a function that takes a directory of data files and a threshold for complete cases and calculates the correlation between
sulfate and nitrate for monitor locations where the number of completely observed cases (on all variables) is greater than the
threshold. The function should return a vector of correlations for the monitors that meet the threshold requirement. If no
monitors meet the threshold requirement, then the function should return a numeric vector of length 0.

For this function you will need to use the 'cor' function in R which calculates the correlation between two vectors. Please read
the help page for this function via '?cor' and make sure that you know how to use it.

Please save your code to a file named corr.R. To run the test script for this part, make sure your working directory has the
file corr.R in it and then run:

source("http://spark-public.s3.amazonaws.com/compdata/scripts/corr-test.R")
corr.testscript()
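
As a rough sketch of what the task asks for (not the official solution), the function below loops over the CSV files in a directory, keeps the complete rows of each file, and computes the correlation only when the number of complete rows exceeds the threshold. The column names sulfate and nitrate are assumed from the assignment text, and the assumption that every monitor is stored as its own .csv file is mine.

```r
## A minimal sketch, assuming each CSV file has columns named
## "sulfate" and "nitrate" (names assumed from the assignment text).
corr <- function(directory, threshold = 0) {
    ## list every CSV file in the directory
    files <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)
    ## start with a numeric vector of length 0
    correlations <- numeric(0)
    for (f in files) {
        monitor <- read.csv(f)
        ## keep only rows with no missing values in any column
        complete <- monitor[complete.cases(monitor), ]
        ## only monitors above the threshold contribute a correlation
        if (nrow(complete) > threshold) {
            correlations <- c(correlations,
                              cor(complete$sulfate, complete$nitrate))
        }
    }
    correlations
}
```

If no file passes the threshold, the loop never appends anything and the function returns the empty numeric vector it started with, which is what the assignment asks for.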

The week 2 assignment is fairly tough if you have not used R before, and the video lectures do not really prepare you for it. If you have not taken the swirl tutorial yet, I strongly recommend finishing it at the beginning of week 2. You will also want to start working on the assignment as early as possible.

Derek Franks wrote a great tutorial. If you follow it closely, step by step, you should have no problem finishing some of the problems in assignment 1. Here is the link to the tutorial:

https://github.com/derekfranks/practice_assignment/blob/master/Practice_Assignment.pdf

The second challenge I had with this assignment was that I did not know how to return a data frame from a function. After experimenting a bit, I finally got it to work. Here is the code for returning a data frame from a function.

## initiate the data frame
results <- data.frame()
## loop through the files
for (i in id) {
    ## read file and get completed cases
    ## add to the data frame.
    results <- rbind(results, data.frame(id=i, nobs=completed_cases))
}
## return the data frame
return(results)
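
To make the skeleton above concrete, here is one hedged way to fill it in. The function name count_complete, the directory argument, and the three-digit file naming (001.csv, 002.csv, and so on) are assumptions for illustration; the post does not show them.

```r
## A sketch only: count_complete and the file naming scheme are
## illustrative assumptions, not code from the course.
count_complete <- function(directory, id) {
    ## initiate the data frame
    results <- data.frame()
    ## loop through the monitor ids
    for (i in id) {
        ## build a path such as "<directory>/001.csv" (assumed naming)
        path <- file.path(directory, sprintf("%03d.csv", i))
        monitor <- read.csv(path)
        ## count rows with no missing values in any column
        completed_cases <- sum(complete.cases(monitor))
        ## add one row per monitor to the data frame
        results <- rbind(results, data.frame(id = i, nobs = completed_cases))
    }
    ## return the data frame
    results
}
```

Calling count_complete("some_directory", 1:3) would then give a data frame with one row per monitor and the columns id and nobs. Growing a data frame with rbind inside a loop is slow for very large inputs, but for a few hundred files it is perfectly workable.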

The function cor is used in one of the problems, but it is not taught; you are supposed to figure it out by yourself. The usage is actually quite easy. Suppose you read the file and store it in a data frame called data. To calculate the correlation between column 2 and column 3, you use cor this way.

cor(data[,2], data[,3])
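
One thing to watch for: with its default settings, cor returns NA if either vector contains a missing value. The small data frame below is made up for illustration (its column names and values are assumptions, not the course data); the use argument is part of the standard cor function.

```r
## Made-up data for illustration; the second and third columns
## stand in for the two pollutant measurements.
data <- data.frame(day     = 1:5,
                   sulfate = c(1.0, 2.0, NA, 4.0, 5.0),
                   nitrate = c(2.1, 3.9, 6.2, NA, 10.0))

## with the default settings, any NA in the inputs makes the result NA
default_result <- cor(data[, 2], data[, 3])

## use = "complete.obs" drops the rows where either value is missing
complete_result <- cor(data[, 2], data[, 3], use = "complete.obs")

## the same result, filtering the rows by hand first
ok <- complete.cases(data[, 2], data[, 3])
manual_result <- cor(data[ok, 2], data[ok, 3])
```

Either approach works; filtering the rows yourself has the advantage that you also know how many complete cases there are, which is what the threshold in this assignment is compared against.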
