• No results found

Getting Data from the Web with R

N/A
N/A
Protected

Academic year: 2021

Share "Getting Data from the Web with R"

Copied!
72
0
0

Loading.... (view fulltext now)

Full text

(1)

Getting Data from the Web with R

Part 2: Reading Files from the Web

Gaston Sanchez

April-May 2014

Content licensed underCC BY-NC-SA 4.0

(2)
(3)

Readme

License:

Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License http://creativecommons.org/licenses/by-nc-sa/4.0/

You are free to:

Share— copy and redistribute the material Adapt— rebuild and transform the material

Under the following conditions:

Attribution— You must give appropriate credit, provide a link to the license, and indicate if changes were made.

NonCommercial— You may not use this work for commercial purposes.

Share Alike— If you remix, transform, or build upon this work, you must distribute your contributions under the same license to this one.

(4)

Lectures Menu

Slide Decks 1. Introduction

2. Reading files from the Web 3. Basics of XML and HTML 4. Parsing XML / HTML content 5. Handling JSON data

6. HTTP Basics and the RCurl Package 7. Getting data via Web Forms

(5)

Reading Files

from the Web

(6)

Data Files from the Web...

(7)

Goal

From the web to R

The goal of these slides is to show you different ways to read (data) files from the Web into R

(8)

Synopsis

In a nutshell

We’ll cover a variety of situations you most likely will find yourself dealing with:

I reading raw (plain) text

I reading tabular (spreadsheet-like) data

I reading structured data (xml, html) as text

I reading R scripts and Rdata files

(9)

Some References

I R Data Import / Export Manual by R Core Team

I Data Manipulation with R by Phil Spector

I R Programming for Bioinformatics by Robert Gentleman

I The Art of R Programming by Norm Matloff

I XML and Web Technlogies for Data Sciences with R by Deb Nolan and Duncan Temple Lang

(10)

Considerations

Keep in mind

All the material described in this presentation relies on 3 key assumptions:

I we knowwhere the data is located

I we knowhow the data is stored (i.e. type of file)

I all we want is to import the data in R

(11)

Reading Files

from the Web

(12)
(13)

Documentation

R Data Import / Export Manual

This is the authoritative source of information to read and learn almost all about importing —and exporting— data in R:

I html version

http://cran.r-project.org/doc/manuals/r-release/R-data.html I pdf version

http://cran.r-project.org/doc/manuals/r-release/R-data.pdf Good news: pretty much everything you need to know it’s there

Bad news: it is not beginner friendly =(

(14)

Basics First

R is equipped with a set of handy functions that allow us to read a wide range of data files

The trick to use those functions depends on the format of the data we want to read, and the way R handles the imported values:

I what type of objets(eg vector, list, data.frame)

I what kind of modes (eg character, numeric, factor)

(15)

Basics First (con’t)

Fundamentals

Let’s start with the basic reading functions and some R technicalities

I scan()

I readLines()

I connections

(16)

Conceptual Diagram

connections

scan() readLines()

read.table()

read.csv() read.fwf()

(17)

Built-in reading functions

I The primary functions to read files in R are scan()and readLines()

I readLines()is the workhorse function to read raw text in R as character strings

I scan()is a low-level function for reading data values, and it is extended byread.table() and its related functions

I When reading files, there’s the special concept under the hood called Rconnections

I Both scan() and readLines() take a connectionas input

(18)

Connections

R connections?

Connection is the term used in R to define a mechanism for handling input (reading) and output (writing) operations.

What do they do?

A connection is just an object that tells R to be prepared for opening a data source (eg file in local directory, or a file url) See the full help documentation with: ?connections

(19)

Types of Connections

Functions to create connections Function Input

file() path to the file to be opened or complete URL url() a complete URL

gzfile() path to a file compressed by gzip bzfile() path to a file compressed by bzip2 xzfile() path to a file compressed by xz

unz() path to the zip file with .zip extension pipe() command line to be piped to or from fifo() path of the fifo

By default, creating a connection does not open the connection. But they may be opened with the argumentopen

(20)

Connections (con’t)

Usefulness

Connections provide a means to have more control over the way R will “comunicate” with the resources to be read (or written).

Keep in mind

Most of the times you don’t need to use or worry about connections.

However, you should know that they can play an important role behind the built-in fuctions to read data in R.

(21)

Important Connections

file()

The most commonly used connection is file(), which is used by most reading functions (to open a local file for reading or writing data).

url()

Because we’re interested in getting data from the web, the one connection that becomes a protagonist is theurl() connection.

(22)

Connection for the web

Using url()

url(description, open = "", blocking = TRUE, encoding = getOption("encoding"))

The main input for url()is the descriptionwhich has to be a complete URL, including scheme such as http://, ftp://, or file://

(23)

Example of url connection

For instance, let’s create a connection to the R homepage:

# creating a url connection to the R homepage r_home=url("http://www.r-project.org/")

# what's in r_home r_home

## description class

## "http://www.r-project.org/" "url"

## mode text

## "r" "text"

## opened can read

## "closed" "yes"

## can write

## "no"

# is open?

isOpen(r_home)

## [1] FALSE

Note that we are just defining a connection. By default, the connection does not open anything

(24)

About Connections

Should we care?

I Again, most of the times we don’t need to explicitly useurl().

I Connections can be used anywhere a file name could be passed to functions likescan()or read.table().

I Usually, the reading functions —eg readLines(), read.table(), read.csv()— will take care of the URL connection for us.

I However, there may be occassions in which we will need to specify a url() connection.

(25)

Reading Text

(26)

Objective

Reading Text Files As Text

In this section we’ll talk about reading text files with the purpose of importing their contents as raw text (ie character strings) in R.

(27)

About Text Files

“In computer literature, there is often a distinction made between text files and binary files. That distinction is somewhat misleading —every file is binary in the sense that it consists of 0s and 1s. Let’s take the term text files to mean a file that consists mainly of ASCII characters ...

and that uses newline characters to give the humans the perception of lines.”

Norman Matloff (2011) The Art of R Programming

(28)

About Text Files

Some considerations so we can all be on the same page:

I By text files we mean plain text files

I Plain text as an umbrella term for any file that is in a human-readable form (eg .txt, .csv, .xml, .html)

I Text files stored as a sequence of characters

I Each character stored as a single byte of data

I Data is arranged in rows, with several values stored on each row

I Text files that can be read and manipulated with a text editor

(29)

Reading Text Functions

Functions for reading text

I readLines()is the main function to read text files as raw text in R

I scan()is another function that can be used to read text files.

It is more generic and low-level but we can specify some of its parameters to import content as text

(30)

About readLines()

Function readLines()

I readLines()is the workhorse function to read text files as raw text in R

I The main input is the file to be read, either specified with a connectionor with the file name

I readLines()treats each line as a string, and it returns a character vector

I The output vector will contain as many elements as number of lines in the read file

(31)

readLines()

Using readLines()

readLines(con = stdin(), n = -1L, ok = TRUE, warn = TRUE, encoding = "unknown")

I cona connection, which in our case will be a complete URL

I nthe number of lines to read

I ok whether to reach the end of the connection

I warn warning if there is no End-Of-Line

I encodingtypes of encoding for input strings

(32)

Moby Dick

(33)

Example

Project Gutenberg

A great collection of texts are available from the Project Gutenberg which has a catalog of more than 25,000 free online books:

http://www.gutenberg.org

Moby Dick

Let’s consider the famous novel Moby Dick by Herman Melville. A plain text file of Moby Dick can be found at:

http://www.gutenberg.org/ebooks/2701.txt.utf-8

(34)

Moby Dick text file

http://www.gutenberg.org/ebooks/2701.txt.utf-8

(35)

Reading Raw text

Here’s how you could read the first 500 lines of cotent with readLines()

# url of Moby Dick (project Gutenberg)

moby_url=url("http://www.gutenberg.org/ebooks/2701.txt.utf-8")

# reading the content (first 500 lines) moby_dick=readLines(moby_url,n=500)

# first five lines moby_dick[1:5]

## [1] "The Project Gutenberg EBook of Moby Dick; or The Whale, by Herman Melville"

## [2] ""

## [3] "This eBook is for the use of anyone anywhere at no cost and with"

## [4] "almost no restrictions whatsoever. You may copy it, give it away or"

## [5] "re-use it under the terms of the Project Gutenberg License included"

Note that each line read is stored as an element in the character vector moby dick

(36)

Goot to Know

Terms of Service

Some times, reading data directly from a website may be against the terms of use of the site.

Web Politeness

When you’re reading (and “playing” with) content from a web page, make a local copy as a courtesy to the owner of the web site so you don’t overload their server by constantly rereading the page. To make a copy from inside of R, look at the download.file() function.

(37)

Download Moby Dick

Downloading

It is good advice to download a copy of the file to your computer, and then play with it.

Let’s use download.file()to save a copy in our working directory.

In this case we create the file mobydick.txt

# download a copy in the working directory

download.file("http://www.gutenberg.org/cache/epub/2701/pg2701.txt",

"mobydick.txt")

(38)

Abut scan()

Function scan()

Another very useful function that we can use to read text is scan().

By default, scan() expects to read numeric values, but we can change this behavior with the argument what

scan(file = "", what = double(), nmax = -1, n = -1, sep = "", quote = if(identical(sep, "\n")) "" else "'\"", dec = ".", skip = 0, nlines = 0, na.strings = "NA",

flush = FALSE, fill = FALSE, strip.white = FALSE,

quiet = FALSE, blank.lines.skip = TRUE, multi.line = TRUE, comment.char = "", allowEscapes = FALSE,

fileEncoding = "", encoding = "unknown", text)

(39)

Function scan() (con’t)

Some scan() arguments

I file the file name or a connection

I what type of data to be read

I nmaximum number of data values to read

I septype of separator

I skip number of lines to skip before reading values

I nlinesmaximum number of lines to be read

(40)

Moby Dick Chapter 1

Chapter 1 starting at line 536.

How do we get the first lines of that chapter?

(41)

Reading Some Lines

Let’s make it more interesting

If we want to read just a pre-specified number of lines, we have to loop over the file lines and read the content with scan(). For instance, let’s skip the first 535 lines, and then read the following 10 lines of Chapter 1

# empty vector to store results moby_dick_chap1 =rep("",10)

# number of lines to skip until Chapter 1 skip=535

# reading 10 lines (line-by-line using scan) for(iin1L:10) {

one_line=scan("mobydick.txt",what="",skip= skip,nlines=1)

# pasting the contents in one_line

moby_dick_chap1[i]=paste(one_line,collapse=" ") skip=skip +1

}

Note that we are usingpaste() to join (collapse) all the scanned values in one line

(42)

Reading Some Lines (con’t)

# lines 536-545 moby_dick_chap1

## [1] "CHAPTER 1. Loomings."

## [2] ""

## [3] ""

## [4] "Call me Ishmael. Some years ago--never mind how long precisely--having"

## [5] "little or no money in my purse, and nothing particular to interest me on"

## [6] "shore, I thought I would sail about a little and see the watery part of"

## [7] "the world. It is a way I have of driving off the spleen and regulating"

## [8] "the circulation. Whenever I find myself growing grim about the mouth;"

## [9] "whenever it is a damp, drizzly November in my soul; whenever I find"

## [10] "myself involuntarily pausing before coffin warehouses, and bringing up"

(43)

Reading an HTML file

HTML File

Our third example involves reading the contents of an html file.

We’re just illustrating how to import html content as raw text in R.

(We are not parsing html; we’ll see that topic in the next lecture)

Egyptian Skulls

Let’s consider the file containing information about the Egyptian Skulls data set by Thomson et al :

http://lib.stat.cmu.edu/DASL/Datafiles/EgyptianSkulls.html

(44)

HTML File

http://lib.stat.cmu.edu/DASL/Datafiles/EgyptianSkulls.html

(45)

Skulls Data

To read the html content we use readLines()

# read html file content as a string vector

skulls=readLines("http://lib.stat.cmu.edu/DASL/Datafiles/EgyptianSkulls.html")

head(skulls,n=10)

## [1] "<TITLE>Egyptian Skulls Datafile</TITLE>"

## [2] ""

## [3] ""

## [4] "<hr size=2><center><table border=1 cellpadding=0 cellspacing=0><tr><td align=center><table border=1 cellpadding=1 cellspacing=0><tr><td><A HREF=\"../DataArchive.html\"><IMG SRC=\"../InlineImages/mainmenu.gif\" alt=\"Go to Main Menu\"></a></td></tr></table></td><td align=center><table border=1 cellpadding=1 cellspacing=0><tr><td><A HREF=\"/cgi-bin/dasl.cgi\"><IMG SRC=\"../InlineImages/powersearchsmall.gif\" alt=\"Go to Power Search\"></a></td></tr></table></td><td align=center><table border=1 cellpadding=1 cellspacing=0><tr><td><A HREF=\"../allsubjects.html\"><IMG SRC=\"../InlineImages/allsubjects.gif\" alt=\"Go to Datafile Subjects\"></a></td></tr></table></td></tr></table></center><hr size=2>"

## [5] "<B><DT>Datafile Name:</B> Egyptian Skulls"

## [6] "<B><DT>Datafile Subjects:</B> <dsubjects>"

## [7] "<A HREF=\"/cgi-bin/dasl.cgi?query=Archeology&submit=Search&metaname=dsubjects&sort=swishrank\">Archeology</A>"

## [8] ", "

## [9] ""

## [10] "<A HREF=\"/cgi-bin/dasl.cgi?query=Biology&submit=Search&metaname=dsubjects&sort=swishrank\">Biology</A>"

(46)

Reading Tabular Data

(47)

About Tabular Data

Tables

R is great for reading data in tabular (spreadsheet-like) format.

Tabular data, also known as rectangular data, are typically text files (ie can be read and manipulated with a text editor)

The conventional form is data values that can be seen as an array of rows and columns

(48)

Tabular Data File Formats

Two main formats

I delimited formats

I fixed-width formats Delimited

In a delimited format, values within a row are separated by a special character, or delimiter.

Fixed-Width

In a fixed-width format, each value is allocated a fixed number of

(49)

Data Table Toy Example

Imagine we have some tabular data

Name Gender Homeland Born Jedi

Anakin male Tatooine 41.9BBY yes

Amidala female Naboo 46BBY no

Luke male Tatooine 19BBY yes

Leia female Alderaan 19BBY no

Obi-Wan male Stewjon 57BBY yes

Han male Corellia 29BBY no

Palpatine male Naboo 82BBY no

R2-D2 unknown Naboo 33BBY no

(50)

Data table formats

space delimited

Name Gender Homeworld Born Jedi Anakin male Tatooine 41.9BBY yes Amidala female Naboo 46BBY no Luke male Tatooine 19BBY yes Leia female Alderaan 19BBY no Obi-Wan male Stewjon 57BBY yes Han male Corellia 29BBY no Palpatine male Naboo 82BBY no R2-D2 unknown Naboo 33BBY no

tab delimited

Name Gender Homeworld Born Jedi Anakin male Tatooine 41.9BBY yes Amidala female Naboo 46BBY no Luke male Tatooine 19BBY yes Leia female Alderaan 19BBY no Obi-Wan male Stewjon 57BBY yes Han male Corellia 29BBY no Palpatine male Naboo 82BBY no R2-D2 unknown Naboo 33BBY no

comma delimited

Name,Gender,Homeworld,Born,Jedi Anakin,male,Tatooine,41.9BBY,yes Amidala,female,Naboo,46BBY,no Luke,male,Tatooine,19BBY,yes Leia,female,Alderaan,19BBY,no

Fixed width

Name Gender Homeworld Born Jedi Anakin male Tatooine 41.9BBY yes Amidala female Naboo 46BBY no Luke male Tatooine 19BBY yes Leia female Alderaan 19BBY no

(51)

Functions

Main functions

I scan()reads data values (one by one)

I read.table() main function for reading tabular data

I read.csv()convenient wraper of read.table() designed for reading comma separated values (CSV) files

I read.delim()wrapper of read.table() for any delimited file format

I read.fwf()designed for reading files with fixed width separated values

(52)

Taxon Data

The R Book

Example from The R Book by Michael Crawley

http://www.bio.ic.ac.uk/research/mjcraw/therbook/

(53)

Taxon Data

Taxon Data (from The R Book)

http://www.bio.ic.ac.uk/research/mjcraw/therbook/data/taxon.txt

(54)

Taxon Data

Let’s read the data "taxon"

# url of taxon data

taxon_url="http://www.bio.ic.ac.uk/research/mjcraw/therbook/data/taxon.txt"

# import data in R

taxon=read.table(taxon_url,header=TRUE,row.names=1)

head(taxon)

## Petals Internode Sepal Bract Petiole Leaf Fruit

## 1 5.621 29.48 2.462 18.20 11.28 1.128 7.876

## 2 4.995 28.36 2.429 17.65 11.04 1.198 7.025

## 3 4.768 27.25 2.570 19.41 10.49 1.004 7.817

## 4 6.299 25.92 2.066 18.38 11.80 1.614 7.672

## 5 6.489 25.21 2.902 17.31 10.12 1.813 7.758

(55)

Iris Example

Iris Data

Data set "iris"from UCI Machine Learning Repo

https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data

Example from the UCI Machine Learning Repository

gastonsanchez.com Getting data from the web with R CC BY-SA-NC 4.0 55 / 72

(56)

Iris Data

How do we read the data?

If you try to simply use read.csv(), you’ll be disappointed:

# URL of data file

iris_file="https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"

# this won't work

iris_data=read.csv(iris_file,header=FALSE)

Note that the URL starts withhttps://, that means a secured connection. The solution requires some special functions:

I We need to use the R package"RCurl" to make an HTTPS request with getURL()

(57)

Reading Iris Data

This is how to successfully read the iris data set in R:

# URL of data file

iris_file="https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"

library(RCurl)

iris_url=getURL(iris_file)

iris_data=read.csv(textConnection(iris_url),header=FALSE)

head(iris_data)

## V1 V2 V3 V4 V5

## 1 5.1 3.5 1.4 0.2 Iris-setosa

## 2 4.9 3.0 1.4 0.2 Iris-setosa

## 3 4.7 3.2 1.3 0.2 Iris-setosa

## 4 4.6 3.1 1.5 0.2 Iris-setosa

## 5 5.0 3.6 1.4 0.2 Iris-setosa

## 6 5.4 3.9 1.7 0.4 Iris-setosa

(58)

Excel File Example

Excel file

Example from Data Mining Course by Lluis Belanche

http://www.lsi.upc.edu/˜belanche/Docencia/mineria/mineria.html

(59)

Excel alpha data

Alpha Data (from Data Mining course)

(60)

Reading alpha Data

We need to use the function read.xls() from the package

"gdata" (you need to have Perl installed in your machine)

# load package 'gdata' library(gdata)

# excel file (1st worksheet named "dades")

alpha_xls="http://www.lsi.upc.edu/˜belanche/Docencia/mineria/Practiques/alpha.xls"

Count the number of sheets in excel file, and list sheet names:

# how many sheets sheetCount(alpha_xls)

## [1] 2

# names of sheets sheetNames(alpha_xls)

(61)

Reading alpha Data

Since the data set is in the first worksheet we use the argument sheet = 1:

# import sheet 1 (dades) in R

alpha_data=read.xls(alpha_xls, sheet=1) head(alpha_data)

## iden beta1 gamma1 delta1 alfa1 altra1 estr1 beta2 gamma2 delta2 alfa2 edat

## 1 1 0 0 0 0 1 0 1 0 0 0 40

## 2 2 0 0 0 0 1 0 0 0 0 1 57

## 3 3 0 0 0 0 1 0 0 0 0 1 35

## 4 4 0 0 0 0 1 0 0 1 0 0 54

## 5 5 0 0 1 0 0 0 0 0 1 0 43

## 6 6 0 0 1 0 0 0 0 1 0 0 19

## memb estd eciv prof1 prof2 ingr

## 1 4 12 1 1 0 190

## 2 2 12 1 1 0 220

## 3 1 16 0 1 0 220

## 4 3 14 1 1 0 220

## 5 5 14 1 0 1 110

## 6 6 14 0 0 0 220

(62)

Google Spreadsheet

Cars2004 Data

Example with data in Google Docs Spreadsheet

(63)

Reading Cars2004 Google Doc

To read data from a Google Doc Spreadsheet we need to use the R package "RCurl"(to connect via a secured HTTP). In addition we need to know the publick key of the document. Here’s how to read the Cars2004 google doc:

# load package RCurl library(RCurl)

# google docs spreadsheets url

google_docs ="https://docs.google.com/spreadsheet/"

# public key of data 'cars'

cars_key="pub?key=0AjoVnZ9iB261dHRfQlVuWDRUSHdZQ1A4N294TEstc0E&output=csv"

# download URL of data file

cars_csv=getURL(paste(google_docs, cars_key,sep=""))

# import data in R (through a text connection)

cars2004=read.csv(textConnection(cars_csv),row.names=1,header=TRUE)

(64)

Reading Cars2004 Google Doc

cars2004=read.csv(mycars,row.names=1,header=TRUE)

## Cylinders Horsepower Speed Weight Width Length

## Citroen C2 1.1 Base 1124 61 158 932 1659 3666

## Smart Fortwo Coupe 698 52 135 730 1515 2500

## Mini 1.6 170 1598 170 218 1215 1690 3625

## Nissan Micra 1.2 65 1240 65 154 965 1660 3715

## Renault Clio 3.0 V6 2946 255 245 1400 1810 3812

## Audi A3 1.9 TDI 1896 105 187 1295 1765 4203

## Peugeot 307 1.4 HDI 70 1398 70 160 1179 1746 4202

## Peugeot 407 3.0 V6 BVA 2946 211 229 1640 1811 4676

## Mercedes Classe C 270 CDI 2685 170 230 1600 1728 4528

## BMW 530d 2993 218 245 1595 1846 4841

## Jaguar S-Type 2.7 V6 Bi-Turbo 2720 207 230 1722 1818 4905

## BMW 745i 4398 333 250 1870 1902 5029

## Mercedes Classe S 400 CDI 3966 260 250 1915 2092 5038

## Citroen C3 Pluriel 1.6i 1587 110 185 1177 1700 3934

## BMW Z4 2.5i 2494 192 235 1260 1781 4091

## Audi TT 1.8T 180 1781 180 228 1280 1764 4041

## Aston Martin Vanquish 5935 460 306 1835 1923 4665

## Bentley Continental GT 5998 560 318 2385 1918 4804

## Ferrari Enzo 5998 660 350 1365 2650 4700

(65)

Wikipedia Table

Wikipedia Table

Let’s read an HTML table from Wikipedia. This is not technically a file, but a piece of content from an html document

http://en.wikipedia.org/wiki/World_record_progression_1500_metres_freestyle

(66)

Reading data in an HTML Table

To read an HTML table we need to use the function readHTMLTablefrom the R package "XML"

# load XML library(XML)

# url

swim_wiki="http://en.wikipedia.org/wiki/World_record_progression_1500_metres_freestyle"

Since we want the first table, we specify the parameter which = 1

# reading HTML table

swim1500=readHTMLTable(swim_wiki,which=1,stringsAsFactors=FALSE)

(67)

Reading data in an HTML Table

head(swim1500)

## # Time Name Nationality

## 1 1 22:48.4 Taylor , Henry Henry Taylor Great Britain

## 2 2 22:00.0 Hodgson , George George Hodgson Canada

## 3 3 21:35.3 Borg , Arne Arne Borg Sweden

## 4 4 21:15.0 Borg , Arne Arne Borg Sweden

## 5 5 21:11.4 Borg , Arne Arne Borg Sweden

## 6 6 20:06.6 Charlton , Boy Boy Charlton Australia

## Date Meet

## 1 01908-07-25-0000Jul 25, 1908 Olympic Games

## 2 01912-07-10-0000Jul 10, 1912 Olympic Games

## 3 01923-07-08-0000Jul 8, 1923 -

## 4 01924-01-30-0000Jan 30, 1924 -

## 5 01924-07-13-0000Jul 13, 1924 -

## 6 01924-07-15-0000Jul 15, 1924 Olympic Games

## Location Ref

## 1 United Kingdom, London ! London, United Kingdom

## 2 Sweden, Stockholm ! Stockholm, Sweden

## 3 Sweden, Gothenburg ! Gothenburg, Sweden

## 4 Australia, Sydney ! Sydney, Australia

## 5 France, Paris ! Paris, France

## 6 France, Paris ! Paris, France

(68)

R script and RData

R script and RData

Last but not least, we can also import data inside an R script and in an .RDatafile. In this case the data files come from John

Maindonald’s website

I The table is in the form of an R script

http://maths-people.anu.edu.au/˜johnm/r/misc-data/travelbooks.R I The other type of data is in reality a bunch of data sets in the

form of an .RDtat file

http://maths-people.anu.edu.au/˜johnm/r/dsets/usingR.RData

(69)

travelbooks Data

Travelbooks Data (by John Maindolnald)

(70)

Reading R script with source()

To read the script we simply need to use the functionsource()

# url

travelbooks ="http://maths-people.anu.edu.au/˜johnm/r/misc-data/travelbooks.R"

# sourcing file source(travelbooks)

travelbooks

## density area type

## Aird's Guide to Sydney 0.71 270.1 Guide

## Moon's Australia handbook 0.88 245.0 Guide

## Explore Australia Road Atlas 0.83 552.0 Roadmaps

## Australian Motoring Guide 1.13 601.4 Roadmaps

## Penguin Touring Atlas 1.15 928.8 Roadmaps

## Canberra - The Guide 0.91 306.5 Guide

(71)

RData

.RData file usingR.RData contains several data frames

(72)

Reading RData data sets

For those data sets that are inside an .RData file, we need to use the function load()and pass the file withurl()

# let's remove all objects in session rm(list=ls())

# url with .RData

load(url("http://maths-people.anu.edu.au/˜johnm/r/dsets/usingR.RData"))

# list of read data sets ls()

## [1] "ais" "alpha_csv" "alpha_data" "alpha_xls"

## [5] "anesthetic" "austpop" "cars2004" "Cars93.summary"

## [9] "dewpoint" "dolphins" "elasticband" "florida"

## [13] "hills" "huron" "i" "iris_data"

## [17] "islandcities" "kiwishade" "leafshape" "milk"

## [21] "moby_dick" "moby_dick_chap1" "moby_text" "moths"

## [25] "mycars" "myiris" "myRData" "myswim"

References

Related documents

To understand how the establishment of tourism clusters has infl uenced the situation in the region in the fi rst years of their existence, the periods before and after

Whether we are speaking of corporate training, continuing education, academic courses, or even entire degree programs, the traditional mainstays of quality assurance such as

It serves as a channel of communication for various activities of the library and works as a knowledge portal to all the library users including information professionals

It clusters and analyses ongoing interventions in Egypt and Tunisia and explores the different stages that constitute the entrepreneurial life cycle and the six impact dimensions

During Phase 1, the Failure Mode and Effects Analysis (FMEA) methodology is used as a risk assessment tool to help systematically determine the organization’s threshold for acceptable

We have, for example, instruction manuals listing codes of conduct (e.g. what to wear and say on ritual occasions, what emotions to convey) alongside treatises explaining the

Influences on Problem Development: Contextual Predisposing Factors Parent-child Factors in Early Life.

For brick/block masonry cracks may be sealed with cement mortar, when crack width is between 5 and 10 mm and the thickness of the wall is small.. In cases where the wall is of