STAT10020: Exploratory Data Analysis
Statistical Programming with R
Lab 1
Log in using your student number and password. Click on the start button and then click on “Application Window”. Choose the “Mathematics and Statistics” folder and then “R v2.4.1”. Double-click on the icon and wait for it to open. When R opens, it will look like this:
We will do all our typing and see most of our results in this window. We will mainly be making objects and performing operations on them using functions.
The simplest sort of object you will meet is a single number or string of letters. To create an object called number1 with the value 7, type
number1 <- 7
and press enter. To see the value of the object called number1, type its name and press enter:
number1
You can make objects whose values are letters instead of numbers. To make an object called word1 with the value “Word”, type
word1 <- "Word"
Notice that letters go inside quotation marks; numbers don’t.
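As a quick check of the steps above, the short sketch below creates both objects and looks at them. The calls to print() and class() are additions for illustration only and are not needed for the lab.

```r
# Create a numeric object and a character object, as in the text
number1 <- 7
word1 <- "Word"

# Typing a name at the prompt shows its value; print() does the same thing
print(number1)
print(word1)

# class() reports what kind of value an object holds
class(number1)  # "numeric"
class(word1)    # "character"
```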
Suppose you want an object called number2 whose value is two greater than number1. You have two choices. One is to type
number2 <- number1 + 2
This makes R find the value you already gave to number1, add 2 to it, and store the result as an object called number2.
You can also use a function – the sum function. Type
number2 <- sum(number1, 2)
This asks R to take everything inside the brackets, perform the sum function on them (i.e. add them up), and store the result as an object called number2. You can find more information on this function by typing
help(sum)
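To see that the two choices give the same answer, you could try the sketch below. The object names a and b are made up for this illustration.

```r
number1 <- 7

# Choice 1: the + operator
a <- number1 + 2

# Choice 2: the sum function
b <- sum(number1, 2)

a        # 9
b        # 9
a == b   # TRUE

# sum will add up as many values as you give it
sum(number1, 2, 10)   # 19
```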
We aren’t limited to working with one number or word at a time. Another type of object is a vector. A vector is just a list of words or numbers. If you want to make a vector containing the numbers
1 3 6
you use a function called c. This function combines a list of objects into one vector. Type
vector1 <- c(1, 3, 6)
That is, take the three numbers in brackets (notice that they are separated by commas), perform the c function on them (i.e. turn them into a vector) and save the result as an object called vector1. You can find out more about this function by typing
help(c)
to view the help files.
To see the vector, just type its name.
You can use the c function on vectors as well as on numbers. If you want to add the values of number1 and number2 to the vector you just made, you can type
vector2 <- c(vector1, number1, number2)
This takes one vector with three numbers in it, and two more numbers, and puts them all together to make one vector with five numbers in it. Type
vector2
to see your new vector.
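The sketch below repeats these steps in one place and uses the length function (an addition for illustration, not part of the lab steps) to count how many values each vector holds.

```r
number1 <- 7
number2 <- number1 + 2

# A vector of three numbers
vector1 <- c(1, 3, 6)

# c can join a vector and single numbers into one longer vector
vector2 <- c(vector1, number1, number2)

vector2           # 1 3 6 7 9
length(vector1)   # 3
length(vector2)   # 5
```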
You can also make a vector that contains words. Type
vector3 <- c("one", "three", "six", "seven", "nine")
This is just the same as making a vector with numbers in it, except that the words in the brackets are in quotation marks.
Type
vector3
to make sure that worked.
The third type of object we will use is called a dataframe. A dataframe is like a matrix or a table. You can make one by putting two or more vectors together. The function to do this is called data.frame.
data1 <- data.frame(vector2, vector3)
makes an object called data1 which contains both vectors as columns. Type
data1
to see what it looks like. Notice that the columns are headed with the names of the vectors, and the rows are numbered. You can find more information on this function by typing
help(data.frame)
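The sketch below builds the same dataframe from scratch and then checks its shape. The dim and names functions are additions for illustration. Note that both vectors have five values, so each row pairs one number with one word.

```r
vector2 <- c(1, 3, 6, 7, 9)
vector3 <- c("one", "three", "six", "seven", "nine")

# Put the two vectors side by side as columns
data1 <- data.frame(vector2, vector3)
data1

# Number of rows, then number of columns
dim(data1)     # 5 2

# Column headings are taken from the names of the vectors
names(data1)   # "vector2" "vector3"
```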
A teacher has the following data available on five students:
Name     Exam 1 Mark    Exam 2 Mark
John     92             82
Anne     75             96
Terry    98             60
Fred     62             55
Maria    79             72
We will begin by making a table in R with this data. To do that, we first make a column in R for each of the three variables.
At the prompt, type
studentname <- c("John", "Anne", "Terry", "Fred", "Maria")
and press enter. Notice that there are commas between the names. The names are in quotation marks because they are made of letters, not numbers. This makes a single list of names in R. The list is called “studentname”. To check that it has worked, type
studentname
You should see the list of names printed in the window. Now we will do the same thing for Exam 1 and Exam 2:
exam1 <- c(92, 75, 98, 62, 79)
Because these are numbers, there are no quotation marks. Make a third list of numbers called “exam2” containing the marks for Exam 2.
*Complete part A on your answer sheet*
Now, we want to make one table with all this information in it. Type
results <- data.frame(studentname, exam1, exam2)
This makes a single table (which R calls a “dataframe”) with three columns and five rows. To see what it looks like, type
results
Next, the teacher wants to find the average of the two exams for each student. So we need to make a fourth column in the “results” dataset containing this value for each student. Type:
results$avg <- (results$exam1 + results$exam2) / 2
Notice how we name columns by typing the name of the dataset, then a dollar sign, then the name of the column.
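To see the dollar sign method at work on a smaller table, here is a sketch using made-up data. The dataframe scores and its columns student, test1 and test2 are hypothetical and are not the lab's marks.

```r
# Hypothetical data, not the lab's marks
scores <- data.frame(student = c("A", "B", "C"),
                     test1 = c(10, 20, 30),
                     test2 = c(20, 40, 50))

# Add the two columns together row by row and halve the result;
# assigning to scores$avg creates a new column called avg
scores$avg <- (scores$test1 + scores$test2) / 2

scores$avg      # 15 30 40
names(scores)   # "student" "test1" "test2" "avg"
```

R does the arithmetic one row at a time, so each student gets their own average.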
Next, we will assign letter grades to the students. The bands are as follows:
< 50        D
50 – 69     C
70 – 84     B
85 – 100    A
Type:
results$grade <- cut(results$avg, breaks = c(0, 49, 69, 84, 100), labels = c("D", "C", "B", "A"))
Notice how we are still naming columns with the dataset – dollar sign – column name method, and how the letters are inside quotation marks but the numbers are not.
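It can help to see exactly where cut draws the lines between bands. By default each band includes its upper break but not its lower one, so with the breaks used above a half mark such as 49.5 lands in the band above. The sketch below uses made-up averages (the vector avg here is hypothetical) to show this, followed by one possible alternative, offered only as an assumption about what the bands intend, that starts each band at its lower limit instead.

```r
# Hypothetical averages chosen to sit on or near the band edges
avg <- c(49, 49.5, 69, 70, 84.5, 100)

# Same breaks and labels as in the lab
grade <- cut(avg, breaks = c(0, 49, 69, 84, 100),
             labels = c("D", "C", "B", "A"))
as.character(grade)    # "D" "C" "C" "B" "A" "A"

# An average of exactly 0 falls outside the first band and becomes NA
is.na(cut(0, breaks = c(0, 49, 69, 84, 100),
          labels = c("D", "C", "B", "A")))   # TRUE

# One possible alternative: bands that include their lower limit,
# with include.lowest = TRUE so that 100 still counts as an A
grade2 <- cut(avg, breaks = c(0, 50, 70, 85, 100),
              labels = c("D", "C", "B", "A"),
              right = FALSE, include.lowest = TRUE)
as.character(grade2)   # "D" "D" "C" "B" "B" "A"
```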
*Complete part B on your answer sheet*
Next, we want to sort the data from highest mark to lowest mark. Type:
results <- results[order(-results$avg),]
What is going on here? Well, we are asking R to sort all of the data in “results” in order of the data in the column called “avg”. Because R sorts in ascending order by default, we asked it to sort in the opposite order by putting a minus sign before “results$avg”.
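The order function does not sort the values itself. It returns the row positions in the order they should appear, and the square brackets then pick out the rows in that order. The sketch below shows this on made-up data (scores, student and avg here are hypothetical).

```r
# Hypothetical averages
avg <- c(15, 40, 30)

# Positions of the values from smallest to largest
order(avg)     # 1 3 2

# Putting a minus sign in front reverses the order: largest first
order(-avg)    # 2 3 1

scores <- data.frame(student = c("A", "B", "C"), avg = avg)

# Rows go before the comma, columns after it;
# leaving the column part empty keeps every column
sorted <- scores[order(-scores$avg), ]
as.character(sorted$student)   # "B" "C" "A"
sorted$avg                     # 40 30 15
```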
Lastly, suppose we want to print only the students’ names and letter grades. First, we make a new dataframe with only those two columns in it. Type
results2 <- data.frame(results$studentname, results$grade)
And then print it:
results2
*Complete part C on your answer sheet*
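When you build a dataframe this way, R makes the column headings from what you typed, replacing the dollar sign with a full stop. If you would prefer shorter headings, one option is to name the columns inside data.frame. The sketch below shows both on made-up data (scores, subset1 and subset2 are hypothetical names).

```r
# Hypothetical data
scores <- data.frame(student = c("A", "B", "C"), avg = c(15, 40, 30))

# Headings are built from the expressions typed inside the brackets
subset1 <- data.frame(scores$student, scores$avg)
names(subset1)   # "scores.student" "scores.avg"

# Giving each column a name produces cleaner headings
subset2 <- data.frame(student = scores$student, avg = scores$avg)
names(subset2)   # "student" "avg"
```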
STAT10020: Exploratory Data Analysis
Answer Sheet: Lab 1
Name:
Student Number:
Today’s Date:
Part A
What code creates the list of exam2 marks?
Part B
Who has a better grade, Maria or Terry?
Part C
Suppose you wanted to print out only the names and exam 1 marks of the students. What code would you type?