
Arithmetic operators and special values

In document Statistics and Data With R (Page 36-52)

R includes the usual arithmetic operators and logical operators. It also has a set of symbols for special values, or no values at all.

1.4.1 Arithmetic operators

Arithmetic operators consist of +, -, *, / and the power operator ^. All of these operate on vectors, element by element:

> x <- 1 : 3 ; x^2
[1] 1 4 9
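As a small illustration of element-by-element arithmetic, the sketch below applies each operator to two vectors of equal length. The names a and b are ours, chosen only for illustration.

```r
a <- c(2, 4, 6)
b <- c(1, 2, 3)
a + b   # 3 6 9
a - b   # 1 2 3
a * b   # 2 8 18
a / b   # 2 2 2
a ^ b   # 2 16 216
```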


1.4.2 Logical operators

Logical operators include “and” and “or”, denoted by & and |. We compare values with the logical operators >, >=, <, <=, == and !=, standing for greater than, greater than or equal to, less than, less than or equal to, equal to and not equal to. ! is the negation operator. Upon evaluation, logical operators return the logical values TRUE or FALSE. If the operation cannot be accomplished, then NA is returned. With vectors, logical operators work as usual, one element at a time. Here are some examples:

> 5 == 4 & 5 == 5
[1] FALSE

Like any other operation, if you operate on vectors, the values returned are element-by-element comparisons among the vectors. The rules of extending vectors to equal length still stand.

Consider what happens when you compare two vectors of different lengths. Suppose we create two vectors, x and y, with lengths of 2 and 1, and column bind them. R extends y with one element. When we then apply logical operators to x and y, R compares the vector 5, 4 to 4, 4. To make sure that you get what you want when comparing vectors, always make sure that they are of equal length.
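A minimal sketch consistent with the description above. The vector values 5, 4 and 4 come from the text; the name x and the particular operators shown are assumptions made for illustration.

```r
x <- c(5, 4)      # length 2
y <- 4            # length 1
cbind(x, y)       # y is extended to 4, 4 so both columns have 2 rows
x == y            # FALSE  TRUE
x > y             # TRUE  FALSE
x != y            # TRUE  FALSE
```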

1.4.3 Special values

Because R’s orientation is toward data and statistical analysis, there are features to deal with logical values, missing values and results of computations that at first sight do not make sense. Sooner or later you will face these in your data and analysis. You need to know how to distinguish among these values and test for their existence. Here are the important ones.

Logical values

Logical values may be represented by the tokens TRUE or FALSE. You can specify them as T or F. However, you should avoid the shorthand notation. Here is an example why.

We wish to construct a vector with three logical elements, all set to TRUE. So we do this

> T <- 5 ...

> (x <- c(T, T, T))
[1] 5 5 5

Some time earlier during the session, we happened to assign 5 to T. Then, forgetting this fact, we assign c(T, T, T) to x. The result is not what we expect. Because TRUE and FALSE are reserved words, R will not permit the assignment TRUE <- 5. The tokens TRUE and FALSE are represented internally as 1 or 0. Thus,

> TRUE == 0 ; TRUE == 1
[1] FALSE
[1] TRUE
> TRUE == -0.1 ; FALSE == 0
[1] FALSE
[1] TRUE

NA

This token stands for “Not Available”. It is used to designate missing data. In the next example, we create a vector x with five elements, the first of which is missing.

To test for NA, we use the function is.na(). This function returns TRUE if an element is NA and FALSE otherwise.

> (x <- c(NA, 2 : 5))
[1] NA 2 3 4 5
> (test <- is.na(x))
[1] TRUE FALSE FALSE FALSE FALSE

It is important to realize that is.na() returns FALSE if the element tested for is not NA. Why? Because there are other values that are not numbers. They may result from computations that make no sense, but they are not NA.
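Because is.na() returns a logical vector, it can be combined with other functions to count, locate or drop missing elements. The following is a short sketch of that idea, reusing the vector x from above; sum() and which() are standard R functions.

```r
x <- c(NA, 2 : 5)
sum(is.na(x))        # number of missing elements: 1
which(is.na(x))      # position of the missing element: 1
x[!is.na(x)]         # the non-missing elements: 2 3 4 5
```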


NaN and Inf

These designate “Not a Number” and infinity, respectively. Dividing zero by zero does not result in a number; it therefore returns NaN. Dividing a nonzero number by zero returns Inf (or -Inf). You may wish to assign Inf to a vector (for example, when you wish any value to be smaller than Inf in a comparison). In both cases, these are not NA; they are NaN and Inf, respectively. Furthermore, Inf is a number (you can verify this with the function is.numeric()). NaN is, by definition, not a number, although R stores it as a numeric value, so is.numeric() returns TRUE for it as well. To distinguish among these possibilities, use the function is.nan().
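The following sketch shows how these values arise and how they can be told apart. In addition to is.nan(), it uses the base R function is.finite(), which is not discussed in the text.

```r
1 / 0                            # Inf
-1 / 0                           # -Inf
0 / 0                            # NaN
is.nan(0 / 0)                    # TRUE
is.nan(1 / 0)                    # FALSE: Inf is not NaN
is.finite(c(1, Inf, NaN, NA))    # TRUE FALSE FALSE FALSE
5 < Inf                          # TRUE
```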

Distinguishing among NA, NaN and Inf

Distinguishing among these in data can be confusing. Unless interested, you may skip this topic. Consider the following vector:

> (x <- c(NA, 0 / 0, Inf - Inf, Inf, 5))
[1] NA NaN NaN Inf 5

Here 0/0 is undefined and therefore not a number. So is Inf-Inf. Albeit not a real number, Inf is part of the set of numbers called extended real numbers. We need to distinguish among vector elements that are a number, NA, NaN and Inf in x. First, let us test for NA:

> is.na(x)

[1] TRUE TRUE TRUE FALSE FALSE

As you can see, NA and NaN are undefined and therefore the test returns TRUE for both. Now let us test x with is.nan():

> is.nan(x)

[1] FALSE TRUE TRUE FALSE FALSE

The first element of x is NA. It is distinguishable from NaN and we get FALSE for it.

Finally, because Inf is a value, we test it as usual with the logical operator ==. This operator returns TRUE if the left equals the right hand side:

> x == Inf

[1] NA NA NA TRUE FALSE

Note what happens. Because NA and NaN are undefined, comparing them to a defined value (Inf) returns NA. We therefore expect a similar result from the test

> x == 5

[1] NA NA NA FALSE TRUE

The next table summarizes these results.

     x    is.na(x)  is.nan(x)  x == Inf  x == 5
1    NA   TRUE      FALSE      NA        NA
2    NaN  TRUE      TRUE       NA        NA
3    NaN  TRUE      TRUE       NA        NA
4    Inf  FALSE     FALSE      TRUE      FALSE
5    5    FALSE     FALSE      FALSE     TRUE
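A summary like this can be assembled directly in R. The sketch below is one possible way to do it with data.frame(); the object name summary_table and the column names are hypothetical.

```r
x <- c(NA, 0 / 0, Inf - Inf, Inf, 5)
summary_table <- data.frame(
  x         = x,
  is_na     = is.na(x),    # TRUE for both NA and NaN
  is_nan    = is.nan(x),   # TRUE for NaN only
  equal_inf = x == Inf,    # NA wherever x is NA or NaN
  equal_5   = x == 5
)
summary_table
```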

1.5 Objects

We discussed objects on numerous occasions before. That was necessary because we introduced other topics that required the notion of objects (learning R cannot be linear). Here we discuss these and additional object-related topics in more detail.

Understanding objects is key to working with R effectively.

In the next few statements, we assign values to x. We also explore the type of object created by the assignments:

> x <- 2 # x is a vector of length 1

> x <- vector() # x is a vector of 0 length

> x <- matrix() # x is a matrix of 1 column, 1 row

> x <- 'Hello Dolly' # x is a vector containing 1 string

> x <- c('Hello', 'Dolly') # x is a vector with 2 strings

> x <- function(){} # x is a function that does nothing
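The comments in the statements above can be checked directly. The sketch below repeats some of the assignments and inspects each result with length(), mode() and dim(), all standard R functions.

```r
x <- 2
length(x)                 # 1
x <- vector()
length(x) ; mode(x)       # 0 and "logical"
x <- matrix()
dim(x)                    # 1 1
x <- c('Hello', 'Dolly')
length(x) ; mode(x)       # 2 and "character"
x <- function(){}
mode(x) ; x()             # "function"; calling it returns NULL
```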

As we have seen, vectors are atomic objects—all of their elements must be of the same mode. In most cases, we work with vectors of modes logical, numeric or character.

Most other types of objects in R are more complex than vectors. They may consist of collections of vectors, matrices, data frames and functions. When an object is created (for example with the assignment <-), R must allocate memory for the object. The amount of memory allocated depends on the mode of the object. Besides their mode and length, objects have other properties which we will learn about as we progress.

1.5.1 Orientation

The following is a general exposition of the idea of objects. This section is not related to R directly. Rather, it is conceptual. It is intended to demystify some of the baffling aspects of R.

Usually, computer software that deals with data (e.g. Excel, Oracle, other database management systems, programming languages) distinguishes between what we call data types. For example, in Excel, you can format a column so that it is known to contain numbers, or text, or dates. In the programming language C, you distinguish between data that represent integers, floating (decimal point) numbers, single characters, collections of characters (called strings) and so on.

“Why do we need to make these distinctions?” you might ask. The short answer is because of efficiency and error checking. If the software knows the intended use of data, it will allocate as much memory as is needed for it and no more. For example, the amount of memory that is needed to represent an integer is less than the amount needed to represent a string that contains 100 characters. So if you tell the software that x is intended to represent integers and y strings, computations will be more efficient than otherwise. Other reasons for specifying data types are consistency, ability to check for errors, pointer arithmetic and so on. For example, if the software knows that x and y represent numbers, then it will take special actions if you ask it to compute x/y when y = 0.

This leads to the definition of simple data types. These are types that cannot be broken into simpler data types (unless you are ready to deal with bits). An integer, a decimal number and a character are examples. From these, more elaborate data types can be constructed. For example, a string is a collection of characters and a collection of integers is a vector.

This gives rise to the idea of structures. Instead of defining simple data types, such as integers, floating point numbers and characters, we can define data structures. For example, we can define a structure named vector and specify that such a structure contains a set of numbers. Then we can tell the software that x is a vector and assign data to it with a statement like x <- c(1, 2, 3). Better yet, we can define a structure named matrix, for example, that contains two or more vectors of the same data type and same length. We can then tell the software that y is a matrix and write

> (y <- cbind(letters[1 : 4], LETTERS[1 : 4]))
     [,1] [,2]
[1,] "a"  "A"
[2,] "b"  "B"
[3,] "c"  "C"
[4,] "d"  "D"

(cbind() is a function that binds vectors as columns). Structures do not need to be atomic. For example, a structure may contain a numeric and a character vector. In short, structures are user-defined data types. But why should we stop with structures?

After all, we often apply similar actions to similar structures. Consider, for example, printing. All matrices are printed in the same way: numbers arranged in columns and rows. The only difference in printing matrix objects is their number of rows and columns. This leads to the idea of object types (also called classes). An object type is a definition of a collection of structures (data) and actions (functions) that we may apply to these structures.

Viewing a vector as a type, we can define it as a collection of elements (data) and a collection of actions (functions), such as printing and multiplying one vector by another.

An object type is a specification. As such, it is an abstract definition. It simply says what kinds of data and actions an object that is declared to be of that type can have. An object is a realization of a type. When we say that x is an object of type vector, we are creating a concrete object of type vector. By concrete we mean that R actually assigns memory to the object and we can assign data to it.

Suppose that we define a function print() for the object type vector. We also define objects of type matrix and a print() for it. Next, we say that x is an object of type vector and y is an object of type matrix. When we say print(x), the software knows that we are calling print() for vectors by context; that is, it knows we are asking for print() for vectors because x is of type vector. If we type print(y) then print() for matrices is invoked.

As you may guess, the whole approach can become much more syntactically involved, but we will not pursue it further. Instead, let us get back to R and see how all of this applies. Say we define a vector to be a collection of numbers:

> x <- 1 : 10

and a matrix

> y <- cbind(letters[1 : 4], LETTERS[1 : 4])

We can print x and y by simply saying

> x

[1] 1 2 3 4 5 6 7 8 9 10

and

> y
     [,1] [,2]
[1,] "a"  "A"
[2,] "b"  "B"
[3,] "c"  "C"
[4,] "d"  "D"

By the assignments above, R knows that we wish to create a vector x and a matrix y.

When we say y, R knows that we wish to print y and it invokes the matrix print() function because y is an object of type matrix. To convince yourself that this in fact is the case, try this:

> x <- 1 : 10 ; x

[1] 1 2 3 4 5 6 7 8 9 10

> print(x)

[1] 1 2 3 4 5 6 7 8 9 10

Observe that x and print(x) produce identical results; in other words, the statement x and the function call print(x) are one and the same.
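The idea that print() is chosen according to the type of the object can be sketched with a user-defined type. The example below assumes R's S3 convention, in which a function named print.classname is called for objects whose class attribute is classname. The class name temperature and the object temp are hypothetical, introduced only for this illustration.

```r
# 'temperature' is a hypothetical class name used only for this sketch
temp <- structure(c(20.5, 22.1), class = "temperature")

# A print method for objects of class 'temperature'
print.temperature <- function(x, ...) {
  cat("Temperatures:", paste0(unclass(x), " C"), "\n")
  invisible(x)
}

temp          # typing the name calls print.temperature()
print(temp)   # the explicit call produces the same output
```

Both statements print the same line because typing the name of an object at the prompt invokes the print() function that matches the object's type.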

Of course we can have object types that are more complicated than the atomic types vector and matrix. Both are atomic because they must contain a single mode: strings of characters only, numbers, or logical values. Lists and data frames are complex objects. Lists, for example, may consist of a collection of objects of any type (mode), including lists.
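As a brief sketch of a complex object, the list below holds components of different modes, including another list. The name z and the component names are chosen only for illustration.

```r
# A list may hold objects of different modes, including another list
z <- list(numbers = 1 : 3, label = 'abc', flag = TRUE, inner = list(a = 1))
mode(z)            # "list"
sapply(z, mode)    # the mode of each component
```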

This, then, is the story of objects—behind every object lurks a type.

1.5.2 Object attributes

Object attributes can be examined and set with various functions: mode(), attributes(), attr(), typeof(), dim() (for dimension) and dimnames() (for dimension names). Instead of defining object attributes, we shall discuss these functions.

Here we discuss mode(), is.x() and as.x() where x is the object type. The other functions to set and explore object attributes will be discussed when needed.

The functions mode(), is.object() and as.object()

The mode attribute of an object is obtained with the function mode():

> x <- 1 : 5 ; mode(x)
[1] "numeric"
> x <- c('a', 'b', 'c') ; mode(x)
[1] "character"
> x <- c(TRUE, FALSE) ; mode(x)
[1] "logical"


Here are the modes we will deal with:

> mode(mean) ; mode(1) ; mode(c(TRUE, FALSE))
[1] "function"
[1] "numeric"
[1] "logical"
> mode(letters)
[1] "character"

Any of these can be created, tested and set (coerced) with the functions “mode name”, “is” and “as”. Setting a mode from one to another is called coercion. Beware of coercion. If the coercion is not well defined (for example, attempting to change the mode of a vector from character to numeric when the strings do not represent numbers), R will go through the coercion but will set the elements it cannot convert to NA and issue a warning. Keep in mind that objects of types other than vector also have functions mode(), is.x() and as.x(), where x stands for the object type.

For example, x may be matrix, list or data.frame. Then, mode(), is.list() and as.list() parallel mode(), is.vector() and as.vector(). All of these functions take the object name as an argument.
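The warning about ill-defined coercion can be seen in a short sketch. The vector s is hypothetical; suppressWarnings() is used here only to keep the output quiet, since R otherwise warns that NAs were introduced by coercion.

```r
s <- c('1', '2.5', 'abc')
n <- suppressWarnings(as.numeric(s))   # 'abc' cannot be converted
n                                      # 1.0 2.5 NA
is.na(n)                               # FALSE FALSE TRUE
```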

We follow with some examples. Generalizing these examples to R’s rules of coercion and naming is immediate. First, we create vectors of various modes:

> logical(3) # a vector of 3 logical elements
[1] FALSE FALSE FALSE
> (x <- numeric(3)) # a vector of 3 numeric values
[1] 0 0 0
> (x <- integer(3)) # a vector of 3 integers
[1] 0 0 0
> (x <- character(3)) # a vector of 3 empty strings
[1] "" "" ""

Here we test a vector of mode logical for its mode:

> x <- c(TRUE, FALSE, TRUE, FALSE)

> # test for mode:

> is.logical(x) ; is.numeric(x) ; is.integer(x)
[1] TRUE
[1] FALSE
[1] FALSE
> is.character(x)
[1] FALSE

Here we coerce a logical vector to numeric and character modes:

> as.numeric(x) ; as.character(x)
[1] 1 0 1 0
[1] "TRUE" "FALSE" "TRUE" "FALSE"

Here we test a numeric vector for its mode:

> (x <- runif(3, 0, 20))
[1] 0.97567 0.14065 12.31121
> is.numeric(x) ; is.integer(x) ; is.character(x)
[1] TRUE
[1] FALSE
[1] FALSE
> is.logical(x)
[1] FALSE

Here we coerce a numeric vector (x above) to integer, character and logical modes:

> as.integer(x) ; as.character(x) ; as.logical(x)
[1] 0 0 12
[1] "0.975672248750925" "0.140645201317966" "12.3112069442868"
[1] TRUE TRUE TRUE
> # integer is numeric but numeric is not an integer
> is.numeric(as.integer(x))
[1] TRUE

The code indicates that TRUE is coerced to 1 and FALSE to 0, and that coercion from numeric to integer discards the fractional part. Note that in the coercion from numeric to character, R attempts to produce x in its internal representation, hence the added decimal digits in the numeric strings above. Exact internal representation is not guaranteed. So you may lose precision in the process of

> as.numeric(as.character(x))

(where x is originally numeric) due to rounding errors.
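A small sketch of this round trip follows. The value 1/3 is our choice; whether the result is exactly identical to the original can depend on the value and on the version of R, so the sketch compares with all.equal(), which allows for a small numerical tolerance.

```r
x <- 1 / 3
s <- as.character(x)      # a decimal string with a limited number of digits
y <- as.numeric(s)
identical(x, y)           # may be FALSE: digits can be lost in the string
isTRUE(all.equal(x, y))   # TRUE: equal up to a small numerical tolerance
```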

Length

This object attribute is obtained with the function length():

> (x <- c(1 : 5, 8)) ; length(x)
[1] 1 2 3 4 5 8
[1] 6

length() applies to matrix, data.frame and list objects as well.
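What length() counts depends on the object type, as the following sketch suggests. The objects m, d and l are hypothetical examples.

```r
m <- matrix(1 : 6, nrow = 2)
length(m)        # 6: the number of elements
d <- data.frame(a = 1 : 3, b = 4 : 6)
length(d)        # 2: the number of columns
l <- list(1, 'a', TRUE)
length(l)        # 3: the number of components
```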

1.6 Programming

Like other programming languages, R includes the usual conditional execution, loops and such constructs. In this section, we discuss these constructs briefly. Because of its rich collection of functions and packages and because of its object oriented approach, we will avoid programming in R as much as possible. There are, however, situations where we will need to rely on programming.

1.6.1 Execution controls

Occasionally, we need to execute some statements based on some condition. On other occasions we need to repeat execution.


Conditional execution

Conditional execution is accomplished with the if else idiom. It has the following syntax

if (test) {
    executes something
} else {
    executes something else
}

test must return a logical value. If the result of test is TRUE, then R executes something (a collection of zero or more statements); otherwise, R executes something else (also a collection of zero or more statements). Here is an example:

> x <- TRUE

> if(x) y <- 1 else y <- 0
> y
[1] 1

Note that because there are single statements following if and else, we do not need to group them with braces {}. If you do not need the alternative for if, you can drop the else.

> x <- FALSE ; if(!x){y <- 1 ; z <- 2} ; y ; z
[1] 1
[1] 2

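You can use & and | or && and || with if to accomplish more elaborate tests than shown thus far. The operators & and | apply to vectors element-wise. The operators && and || work on single logical values. Older versions of R used only the first element of longer vectors; R 4.3.0 and later signal an error when either side has more than one element.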
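The difference between the two families of operators can be sketched as follows. The names a, b and size are hypothetical.

```r
a <- c(TRUE, FALSE, TRUE)
b <- c(TRUE, TRUE, FALSE)
a & b                  # element-wise: TRUE FALSE FALSE
a | b                  # element-wise: TRUE TRUE TRUE

x <- 5
if (x > 0 && x < 10) { # both sides are single logical values
  size <- 'small'
} else {
  size <- 'other'
}
size                   # "small"
```

Using && inside if is a common choice because the test of an if statement is expected to be a single logical value.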

Repetitive execution and break

To repeat execution, you can use the loop statements for, repeat and while. While you are within a repetitive execution you can break out of the loop with the break statement. Here is an example:

1 > x <- as.logical(as.integer(runif(5, 0, 2))) ; x

2 [1] FALSE FALSE FALSE FALSE TRUE

3 > y <- vector() ; y

4 logical(0)

5 > for(i in 1 : length(x)){if(x[i]){y[i] <- 1}

6 + else {y[i] <- 0}}

7 > y

8 [1] 0 0 0 0 1

The first line produces a 5-element logical vector with randomly dispersed TRUE and FALSE values. To see this, we parse the innermost statement and then move out (always follow this approach to analyze code). First, we use runif(5, 0, 2) to produce 5 random numbers between 0 and 2. All the numbers between 0 and 2 have the same probability of occurrence. It so happens that the first 4 were less than 1 and the last one was greater than 1. Once these numbers are produced, they are coerced into integers. So the first 4 are turned into 0 and the last into 1. Next, the integers are coerced into logical values. By now we know that 0 is turned into FALSE and 1 into TRUE. Finally, we assign these 5 logical values to x. The assignment generates a vector of mode logical.

In the third line, we create an empty vector y. By default, its mode is logical. If we assign data of any other mode to y, then y will be coerced into the appropriate mode automatically. In the fifth line we use both for and if. We add the braces to clarify the execution groupings. We repeat the loop for i in the sequence 1 : 5 where 5 is the length of x. Now inside the loop, if x[i] is TRUE, then y[i] is set to 1. Otherwise, it is set to zero. Because the first 4 elements of x are FALSE, the first four elements of y are set to 0. The result can be achieved with fewer statements, but here we do not intend to be unduly terse.

As we progress with our study of statistics and R, we shall meet loops and execution controls again. Please be aware that because R is object oriented, you can accomplish many tasks without having to resort to loops. Avoid loops whenever you can. The execution will be faster and less prone to errors. Here is how the previous example is done with vectors.

> x
[1] FALSE FALSE FALSE FALSE TRUE
> (y <- ifelse(x, 1, 0))
[1] 0 0 0 0 1

The function ifelse(a, b, c) returns, element by element, b[i] if a[i] is TRUE and c[i] if a[i] is FALSE.
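For this particular task, coercion offers another loop-free route. The sketch below compares the two approaches on a fixed logical vector; the names y1 and y2 are hypothetical.

```r
x <- c(FALSE, FALSE, FALSE, FALSE, TRUE)
y1 <- ifelse(x, 1, 0)     # element-by-element choice between 1 and 0
y2 <- as.numeric(x)       # coercion: FALSE becomes 0, TRUE becomes 1
identical(y1, y2)         # TRUE
```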

To use R efficiently, you should avoid using loops. There are numerous functions that help, but without motivation, it makes little sense to talk about them now. We shall meet these functions when we need them.

1.6.2 Functions

R has a rich set of functions. Before deciding to write a function of your own, see if one that does what you need already exists (refer to Section 1.7 for more details).

Occasionally, you may need to write your own functions.

A function has a name and zero or more arguments. It has a body and often returns values. So the general form of a function is

function.name <- function(arguments){
    body and return values
}

Here is a simple example:

> dumb <- function(){1}

> dumb()
[1] 1

dumb() takes no arguments and when called it returns 1. Another example:

> dumber <- function(x){x + 1}
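A short sketch of calling such a function follows. It repeats the definition of dumber() so that it runs on its own; add_n() and its argument n are hypothetical, added to illustrate a function with a default argument value.

```r
dumber <- function(x){x + 1}
dumber(1)          # 2
dumber(1 : 3)      # 2 3 4: the body works element by element

# add_n() is a hypothetical name; n has a default value of 1
add_n <- function(x, n = 1){
  x + n
}
add_n(5)           # 6
add_n(5, n = 10)   # 15
```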

