
Table 1.3 library() and data() usage

1.4.3 Other methods of data entry

What follows is a short description of other methods of data entry. It can be skipped on first reading and referenced as needed.

Cut and paste

Copying and pasting from some other program, such as a web browser, is a very common way to import data. If the data is separated by commas, then wrapping it inside the c() function works well. Sometimes, though, a data set doesn’t already have commas in it. In this case, using c() can be tedious. Use the function scan() instead. This function reads in the input until a blank line is entered.

For example, the whale data could have been entered as

> whales = scan()
1: 74 122 235 111 292 111 211 133 156 79
11:
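Because scan() with no arguments waits for input at the keyboard, it cannot be shown running inside a script. As a minimal sketch, the same values can be handed to scan() through its text= argument, which parses a character string the way it would parse typed or pasted input. The variable name pasted is an illustrative assumption, not part of the original example.

```r
# Values as they might be pasted from another program (no commas).
pasted = "74 122 235 111 292 111 211 133 156 79"

# scan() splits on white space; text= stands in for typed input here.
whales = scan(text = pasted)

length(whales)   # how many values were read
```

This form is also handy for checking how scan() will interpret a block of text before pasting it at the prompt.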

Using source() to read in R commands

The function dump() can be used to write values of R objects to a text file. For example, dump("x", "somefile.txt") will write the contents of the variable x into the file somefile.txt, which is in the current working directory. Find this with getwd(). We can dump more than one object per file by specifying a vector of object names. The source() function will read in the output of dump() to restore the objects, providing a convenient way to transfer data sets from one R session to another.
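The round trip described above might look like the following sketch. The objects x and y and their values are illustrative assumptions, and a temporary file from tempfile() is used in place of somefile.txt so that nothing is written to the working directory.

```r
# Two objects to save; the names and values are illustrative.
x = c(74, 122, 235)
y = c("texas", "florida")

# A temporary file keeps the example out of the working directory.
f = tempfile(fileext = ".txt")

# A character vector of names dumps several objects into one file.
dump(c("x", "y"), file = f)

# Remove the objects, then restore them by sourcing the dumped file.
rm(x, y)
source(f)

x
y
```

The dumped file is plain R code, so it can be opened in a text editor to see exactly what source() will run.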

The function source() reads in a file of R commands as though they were typed at the prompt. This allows us to type our commands into a file using a text editor and read them into an R session. There are many advantages to this. For example, the commands can be edited all at once or saved in separate files for future reference.
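As a small sketch of this workflow, the commands below are written to a file and then run with source(). In practice the file would be typed in a text editor; writeLines() is used here only to keep the example self-contained, and the variable names script and whale.mean are assumptions made for illustration.

```r
# Create a two-line script file, as a text editor would.
script = tempfile(fileext = ".R")
writeLines(c(
  "whales = c(74, 122, 235, 111, 292, 111, 211, 133, 156, 79)",
  "whale.mean = mean(whales)"
), script)

# source() runs the commands as though they were typed at the prompt.
source(script)

whale.mean
```

Calling source(script, echo = TRUE) instead prints each command as it is evaluated, which more closely resembles an interactive session.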

For the most part, this book uses an interactive style to interface with R, but this is mostly for pedagogic reasons. Once the basic commands are learned, we begin to do more complicated combinations with the commands. At this point using source(), or something similar, is much more convenient.

Reading data from formatted data sources

Data can also be found in formatted data files, such as a file of numbers for a single data set, a table of numbers, or a file of comma-separated values (csv). R has ways of reading each of these (and others).

For example, if the Texas whale data were stored in a file called “whale.txt” in this format

74 122 235 111 292 111 211 133 156 79

then scan() could be used to read it in, as in

> whale = scan(file="whale.txt")
Read 10 items

Options exist that allow some formatting in the data set, such as specifying a separator like a comma (sep=) or allowing for comment lines (comment.char=).
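A sketch of these two options together is shown below, assuming a file in which the whale values are separated by commas and preceded by a comment line that starts with "#". The file layout and the comment text are assumptions made for illustration; the file is created in a temporary location.

```r
# Build a small comma-separated file with one comment line.
f = tempfile(fileext = ".txt")
writeLines(c(
  "# whale data, comma separated",
  "74,122,235,111,292",
  "111,211,133,156,79"
), f)

# sep= names the separator; comment.char= marks text to be ignored.
whale = scan(file = f, sep = ",", comment.char = "#")

whale
```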

Tables of data can be read in with the read.table() function. For example, if “whale.txt” contained data in this tabular format, with numbers separated by white space,

texas florida
74 89
122 254
…
79 90

then the data could be read in as

> read.table("whale.txt",header=TRUE)
   texas florida
1     74      89
2    122     254
….
10    79      90

The extra argument header=TRUE says that a header includes information for the column names. The function read.csv() will perform a similar task, only on csv files. Most spreadsheets can export csv files, which is a convenient way to import spreadsheet data.

Both read.table() and read.csv() return a data frame storing the data.
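The following sketch reads the same small table in both formats and confirms that each call returns a data frame. Only the three rows printed in the text are used, since the remaining rows are elided there; the temporary file names and the variable names whale and whale.csv are assumptions.

```r
# White-space-separated table; only the rows shown in the text are included.
tab = tempfile(fileext = ".txt")
writeLines(c(
  "texas florida",
  "74 89",
  "122 254",
  "79 90"
), tab)
whale = read.table(tab, header = TRUE)

# The same rows in csv form, as a spreadsheet might export them.
csv = tempfile(fileext = ".csv")
writeLines(c("texas,florida", "74,89", "122,254", "79,90"), csv)
whale.csv = read.csv(csv)   # read.csv() assumes a header line by default

class(whale)
names(whale)
```

Because the result is a data frame, a single column can be extracted by name, as in whale$florida.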

Specifying the file

In the functions scan(), source(), read.table(), and read.csv(), the argument file= is used to specify the file name. The function file.choose() allows us to choose the file interactively, rather than typing it. It is used as follows:

> read.table(file = file.choose())

We can also specify the file name directly. A file is referred to by its name and sometimes its path. While R is running, it has a working directory to which file names may refer. The working directory is returned by the getwd() function and set by the setwd() function. If a file is in the working directory, then the file name may simply be quoted.
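The sketch below shows the working directory in use. It switches to a temporary directory, writes a small file there, reads it back by its bare name, and then restores the original directory. The file name example.txt and its contents are hypothetical.

```r
# Remember the current working directory so it can be restored.
old = getwd()

# Switch to a temporary directory and write a small file there.
setwd(tempdir())
writeLines("74 122 235", "example.txt")   # hypothetical file name

# A file in the working directory needs only its quoted name.
x = scan("example.txt")

# Clean up and return to the original working directory.
file.remove("example.txt")
setwd(old)

x
```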

When a file is not in the working directory, it can be specified with its path. The syntax varies, depending on the operating system. UNIX traditionally uses a forward slash to separate directories, Windows a backward slash. As the backward slash has another use, following UNIX convention, as the escape character inside R character strings, it must be written with two backward slashes when used to separate directories. Windows users can also use the forward slash.

For example, both "C:/R/data.txt" and "C:\\R\\data.txt" refer to the same file, data.txt, in the R directory on the “C” drive.
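A short sketch can make the doubled backslash less mysterious: each pair typed in the quoted string is stored as a single backslash. No file is opened here, so the code runs on any operating system; the variable names win and fwd are assumptions.

```r
win = "C:\\R\\data.txt"   # each \\ typed here is stored as one backslash
fwd = "C:/R/data.txt"

nchar(win)        # counts the characters actually stored
cat(win, "\n")    # prints the string as the operating system would see it

# Replacing each stored backslash with a forward slash gives the other form.
gsub("\\", "/", win, fixed = TRUE) == fwd

# file.path() joins components with forward slashes.
file.path("C:", "R", "data.txt")
```

Building paths with file.path() avoids typing either kind of slash by hand.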

With a UNIX operating system, we can specify the file as is done at the shell:

> source(file="~/R/data.txt") # tilde expansion works

Finding files from the internet

R can also read files from the internet using the url() function. Suppose the webpage http://www.math.csi.cuny.edu/UsingR/Data/whale.txt contained data in tabular format.

Then the following would read this web page as if it were a local file.

> site = "http://www.math.csi.cuny.edu/UsingR/Data/whale.txt"
> read.table(file=url(site), header=TRUE)

The url() function is used only for clarity; the file will be found without it, as in

> read.table(file=site, header=TRUE)

1.4.4 Problems

1.20 The built-in data set islands contains the size of the world’s land masses that exceed