STAT10020: Exploratory Data Analysis

Academic year: 2021

Statistical Programming with R

Lab 1

Log in using your student number and password. Click on the start button and then click on “Application Window”. Choose the “Mathematics and Statistics” folder and then “R v2.4.1”. Double-click on the icon and wait for it to open. When R opens, it will look like this:

We will do all our typing and see most of our results in this window. We will mainly be making objects and performing operations on them using functions.

The simplest sort of object you will meet is a single number or string of letters. To create an object called number1 with the value 7, type

number1 <- 7

and press Enter. To see the value of the object called number1, type its name and press Enter. You can make objects whose values are letters instead of numbers. To make an object called word1 with the value “Word”, type

word1 <- "Word"

Notice that letters go inside quotation marks; numbers don’t.
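
As a quick check of the quotation-mark rule, the sketch below makes one object of each kind and asks R what type of value it holds. The class function is not used elsewhere in this lab; it appears here only as an illustration.

```r
# A number is typed without quotation marks
number1 <- 7

# Letters are typed inside straight quotation marks
word1 <- "Word"

# class() reports what kind of value an object holds
class(number1)   # "numeric"
class(word1)     # "character"
```

If you leave out the quotation marks around Word, R looks for an object called Word and reports an error when it cannot find one.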

Suppose you want an object called number2 whose value is two greater than number1. You have two choices. One is to type

number2 <- number1 + 2

This makes R find the value you already gave to number1, add 2 to it, and store the result as an object called number2.

You can also use a function: the sum function. Type

number2 <- sum(number1, 2)

This asks R to take everything inside the brackets, apply the sum function to it (i.e. add the values up), and store the result as an object called number2. You can find more information on this function by typing

help(sum)
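
To confirm that the two choices give the same answer, you could store each result under its own name and compare them. The names plus_result and sum_result are made up for this sketch and are not part of the lab.

```r
number1 <- 7

# Choice 1: the + operator
plus_result <- number1 + 2

# Choice 2: the sum function
sum_result <- sum(number1, 2)

plus_result                 # 9
sum_result                  # 9

# == asks whether two values are equal
plus_result == sum_result   # TRUE
```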

We aren’t limited to working with one number or word at a time. Another type of object is a vector. A vector is just a list of words or numbers. If you want to make a vector containing the numbers

1 3 6

you use a function called c. This function combines a list of objects into one vector. Type

vector1 <- c(1, 3, 6)

That is, take the three numbers in brackets (notice that they are separated by commas), perform the c function on them (i.e. turn them into a vector) and save the result as an object called vector1. You can find out more about this function by typing

help(c)

to view the help files.

To see the vector, just type its name.

You can use the c function on vectors as well as on numbers. If you want to add the values of number1 and number2 to the vector you just made, you can type

vector2 <- c(vector1, number1, number2)

This takes one vector with three numbers in it, and two more numbers, and puts them all together to make one vector with five numbers in it. Type

vector2

to see your new vector.
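
The steps so far can be run together as one short sketch. The length function, which counts the values in a vector, is an addition for illustration and is not otherwise used in this lab.

```r
number1 <- 7
number2 <- number1 + 2

# A vector of three numbers
vector1 <- c(1, 3, 6)

# c() joins the existing vector and the two single numbers, in the order given
vector2 <- c(vector1, number1, number2)

vector2           # [1] 1 3 6 7 9
length(vector1)   # 3
length(vector2)   # 5
```

Note that the result is one flat vector of five numbers, not a vector sitting inside another vector.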

A vector can hold words instead of numbers. To make one, type

vector3 <- c("one", "three", "six", "seven", "nine")

This is just the same as making a vector with numbers in it, except that the words in the brackets are in quotation marks.

Type

vector3

to make sure that worked.
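
A vector holds only one kind of value. As a small illustration (the object name mixed is made up for this sketch), combining numbers and words in a single call to c turns every value into text:

```r
# Two numbers and one word in the same vector
mixed <- c(1, "three", 6)

# R has converted the numbers to text so that all values are the same kind
class(mixed)   # "character"

# Arithmetic no longer works on the converted values, so this is FALSE
is.numeric(mixed)
```

This is why the lab keeps the numbers and the words in separate vectors.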

The third type of object we will use is called a dataframe. A dataframe is like a matrix or a table. You can make one by putting two or more vectors together. The function to do this is called data.frame.

data1 <- data.frame(vector2, vector3)

makes an object called data1 which contains both vectors as columns. Type

data1

to see what it looks like. Notice that the columns are headed with the names of the vectors, and the rows are numbered. You can find more information on this function by typing

help(data.frame)
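
Besides printing the whole dataframe, you can ask R about its shape. The sketch below rebuilds data1 from scratch and uses names, nrow and ncol, three functions that are not otherwise covered in this lab.

```r
vector2 <- c(1, 3, 6, 7, 9)
vector3 <- c("one", "three", "six", "seven", "nine")

# Both vectors have five values, so they fit together as two columns
data1 <- data.frame(vector2, vector3)

names(data1)   # "vector2" "vector3"
nrow(data1)    # 5
ncol(data1)    # 2
```

The vectors you combine should all be the same length, since each one becomes a column with one value per row.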

A teacher has the following data available on five students:

Name     Exam 1 Mark   Exam 2 Mark
John     92            82
Anne     75            96
Terry    98            60
Fred     62            55
Maria    79            72

We will begin by making a table in R with this data. To do that, we first make a column in R for each of the three variables.

At the prompt, type

studentname <- c("John", "Anne", "Terry", "Fred", "Maria")

and press Enter. Notice that the names are separated by commas. The names are in quotation marks because they are made of letters, not numbers. This makes a single list of names in R. The list is called “studentname”. To check that it has worked, type

studentname

You should see the list of names printed in the window. Now we will do the same thing for Exam 1 and Exam 2:

exam1 <- c(92, 75, 98, 62, 79)

Because these are numbers, there are no quotation marks. Make a third list of numbers called “exam2” containing the marks for Exam 2.

*Complete part A on your answer sheet*

Now, we want to make one table with all this information in it. Type

results <- data.frame(studentname, exam1, exam2)

This makes a single table (which R calls a “dataframe”) with three columns and five rows. To see what it looks like, type

results

Next, the teacher wants to find the average of the two exams for each student. So we need to make a fourth column in the “results” dataset containing this value for each student. Type:

results$avg <- (results$exam1 + results$exam2) / 2

Notice how we name columns by typing the name of the dataset, then a dollar sign, then the name of the column.
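
The dollar-sign naming can be tried on a small made-up dataset. Everything in this sketch (the marks dataframe, the pupils and their scores) is hypothetical and separate from the lab's data; rowMeans is shown only as one alternative way to average across columns.

```r
# Hypothetical data, not the lab's students
marks <- data.frame(pupil = c("Ann", "Bob", "Cat"),
                    test1 = c(80, 55, 70),
                    test2 = c(90, 65, 40))

# Read an existing column: dataset name, dollar sign, column name
marks$test1   # 80 55 70

# Assigning to a column name that does not exist yet creates the column
marks$avg <- (marks$test1 + marks$test2) / 2
marks$avg     # 85 60 55

# rowMeans() averages across the chosen columns, one result per row
rowMeans(marks[, c("test1", "test2")])
```

The arithmetic is done row by row, so each pupil's average uses only that pupil's two scores.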

Next, we will assign letter grades to the students. The bands are as follows:

Average     Grade
< 50        D
50 – 69     C
70 – 84     B
85 – 100    A

Type:

results$grade <- cut(results$avg, breaks = c(0, 49, 69, 84, 100), labels = c("D", "C", "B", "A"))

Notice how we are still naming columns with the dataset – dollar sign – column name method, and how the letters are inside quotation marks but the numbers are not.
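
The cut function places each value in the interval between two neighbouring breaks, and by default each interval includes its upper break but not its lower one. The sketch below uses made-up marks (scores) to show where values near the band edges end up with the lab's breaks. The second call is only one possible reading of the bands, under the assumption that an average such as 49.5 should count as "less than 50"; the lab does not say how half marks are graded.

```r
# Hypothetical averages close to the band edges
scores <- c(49, 49.5, 50, 69.5, 84.5, 85)

# Same breaks and labels as the lab's command:
# intervals are (0,49], (49,69], (69,84], (84,100]
lab_grades <- cut(scores, breaks = c(0, 49, 69, 84, 100),
                  labels = c("D", "C", "B", "A"))
as.character(lab_grades)   # "D" "C" "C" "B" "A" "A"

# Assumed alternative: intervals [0,50), [50,70), [70,85), [85,100]
alt_grades <- cut(scores, breaks = c(0, 50, 70, 85, 100),
                  labels = c("D", "C", "B", "A"),
                  right = FALSE, include.lowest = TRUE)
as.character(alt_grades)   # "D" "D" "C" "C" "B" "A"
```

For whole-number averages the two versions agree; they differ only for values that fall between the printed bands, such as 49.5.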

*Complete part B on your answer sheet*

Next, we want to sort the data from highest mark to lowest mark. Type:

results <- results[order(-results$avg),]

What is going on here? We are asking R to rearrange the rows of “results” in order of the values in the column called “avg”. Because R sorts in ascending order by default, we asked for the opposite order by putting a minus sign before “results$avg”. The comma before the closing square bracket, with nothing after it, tells R to keep all of the columns.
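
It can help to look at what order returns on its own before using it inside the square brackets. The marks dataframe below is made up for this sketch and is not the lab's data.

```r
# Hypothetical data, not the lab's students
marks <- data.frame(pupil = c("Ann", "Bob", "Cat"),
                    avg   = c(60, 85, 55))

# order() returns row positions, not the sorted values themselves
order(marks$avg)    # 3 1 2  (ascending: 55, 60, 85)
order(-marks$avg)   # 2 1 3  (descending: 85, 60, 55)

# Positions go before the comma to pick rows in that order;
# nothing after the comma means every column is kept
sorted <- marks[order(-marks$avg), ]
as.character(sorted$pupil)   # "Bob" "Ann" "Cat"
```

Writing order(marks$avg, decreasing = TRUE) is another way to ask for descending order, and it gives the same row positions here.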

Lastly, suppose we want to print only the students’ names and letter grades. First, we make a new dataframe with only those two columns in it. Type

results2 <- data.frame(results$studentname, results$grade)

And then print it:

results2
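
When a dataframe is built this way, the new column headings include the name of the original dataset. The sketch below, on a made-up marks dataframe, compares this with one alternative: choosing columns by name inside square brackets, which keeps the original headings. The names by_dollar and by_name are invented for illustration.

```r
# Hypothetical data, not the lab's students
marks <- data.frame(pupil = c("Ann", "Bob", "Cat"),
                    avg   = c(60, 85, 55),
                    grade = c("C", "A", "C"))

# The approach used in the lab: build a new dataframe from two columns
by_dollar <- data.frame(marks$pupil, marks$grade)
names(by_dollar)   # "marks.pupil" "marks.grade"

# Alternative: keep all rows and pick columns by name
by_name <- marks[, c("pupil", "grade")]
names(by_name)     # "pupil" "grade"
```

Both objects hold the same values; only the column headings differ.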

Answer Sheet: Lab 1

Name:

Student Number:

Today’s Date:

Part A

What code creates the list of exam2 marks?

Part B

Who has a better grade, Maria or Terry?

Part C

Suppose you wanted to print out only the names and exam 1 marks of the students. What code would you type?
