6.2 Understanding Types, Classes, and Coercion
6.2.4 As-Dot Coercion Functions
You’ve seen different ways to modify an object after it’s been created, for example by accessing and overwriting elements. But what about the structure of the object itself and the type of data contained within?
Converting from one object or data type to another is referred to as coercion. Like other features of R you’ve met so far, coercion is performed either implicitly or explicitly. Implicit coercion occurs automatically when elements need to be converted to another type in order for an operation to complete. In fact, you’ve come across this behavior already, in Section 4.1.4, for example, when you used numeric values for logical values. Remember that logical values can be thought of as integers: one for TRUE and zero for FALSE. Implicit coercion of logical values to their numeric counterparts occurs in lines of code like this:
R> 1:4+c(T,F,F,T)
[1] 2 2 3 5
In this operation, R recognizes that you’re attempting an arithmetic calculation with +, so it expects numeric quantities. Since the logical vector is not in this form, the software internally coerces it to ones and zeros before completing the task.
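The same implicit coercion is what makes it possible to count or average logical values. The following is a minimal sketch; the object names flags, n.true, prop.true, and shifted are chosen here purely for illustration.

```r
# Logical values are implicitly coerced to 1 (TRUE) and 0 (FALSE)
# whenever an arithmetic operation needs numbers.
flags <- c(TRUE, FALSE, FALSE, TRUE)   # same values as c(T,F,F,T) in the text

n.true <- sum(flags)       # number of TRUE entries
prop.true <- mean(flags)   # proportion of TRUE entries
shifted <- 1:4 + flags     # the addition shown in the text

n.true      # 2
prop.true   # 0.5
shifted     # 2 2 3 5
```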
Another frequent example of implicit coercion is when paste and cat are used to glue together character strings, as explored in Section 4.2.2. Noncharacter entries are automatically coerced to strings before the concatenation takes place. Here’s an example:
R> foo <- 34
R> bar <- T
R> paste("Definitely foo: ",foo,"; definitely bar: ",bar,".",sep="")
[1] "Definitely foo: 34; definitely bar: TRUE."
Here, the number 34 and the logical T are implicitly coerced to characters since R knows the output of paste must be a string.
In other situations, coercion won’t happen automatically and must be carried out by the user. This explicit coercion can be achieved with the as-dot functions. Like is-dot functions, as-dot functions exist for most typical R data types and object classes. The previous two examples can be coerced explicitly, as follows.
R> as.numeric(c(T,F,F,T))
[1] 1 0 0 1
R> 1:4+as.numeric(c(T,F,F,T))
[1] 2 2 3 5
R> foo <- 34
R> foo.ch <- as.character(foo)
R> foo.ch
[1] "34"
R> bar <- T
R> bar.ch <- as.character(bar)
R> bar.ch
[1] "TRUE"
R> paste("Definitely foo: ",foo.ch,"; definitely bar: ",bar.ch,".",sep="")
[1] "Definitely foo: 34; definitely bar: TRUE."
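A few other as-dot functions behave in ways worth checking before you rely on them. The sketch below uses made-up input values and illustrative object names (ints, lgls, chrs); it is not an exhaustive tour of the as-dot family.

```r
# as.integer drops the fractional part (truncation toward zero, not rounding)
ints <- as.integer(c(3.9, -3.9))

# as.logical treats zero as FALSE and any other number as TRUE
lgls <- as.logical(c(0, 1, 2, -3))

# as.character turns each element into its string representation
chrs <- as.character(c(TRUE, FALSE))

ints   # 3 -3
lgls   # FALSE TRUE TRUE TRUE
chrs   # "TRUE" "FALSE"
```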
Coercions are possible in most cases that “make sense.” For example, it’s easy to see why R is able to read something like this:
R> as.numeric("32.4")
[1] 32.4
However, the following conversion makes no sense:
R> as.numeric("g'day mate")
[1] NA
Warning message:
NAs introduced by coercion
Since there is no logical way to translate “g’day mate” into numbers, the entry is returned as NA (in this case, R has also issued a warning message). In certain cases, multiple coercions are needed to attain the final result. Suppose, for example, you have the character vector c("1","0","1","0","0") and you want to coerce it to a logical-valued vector. Direct coercion of these strings to logical values is not possible: as.logical recognizes strings such as "TRUE" and "FALSE" but not digits, and even if all the character strings contained numbers, there is no guarantee in general that they would all be ones and zeros.
R> as.logical(c("1","0","1","0","0"))
[1] NA NA NA NA NA
However, you know that character string numbers can be converted to a numeric data type, and you know that ones and zeros are easily coerced to logicals. So, you can perform the coercion in those two steps, as follows:
R> as.logical(as.numeric(c("1","0","1","0","0")))
[1] TRUE FALSE TRUE FALSE FALSE
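Because as.logical treats every nonzero number as TRUE, it can be worth confirming that the intermediate numeric vector really contains only ones and zeros before taking the second step. The following is a sketch under that assumption; the names chars, nums, lgl, and words are illustrative only.

```r
chars <- c("1", "0", "1", "0", "0")

# Step 1: character to numeric
nums <- as.numeric(chars)

# Guard: stop if any string failed to convert or is not 0 or 1
stopifnot(!any(is.na(nums)), all(nums %in% c(0, 1)))

# Step 2: numeric to logical
lgl <- as.logical(nums)
lgl     # TRUE FALSE TRUE FALSE FALSE

# For comparison, strings that as.logical does recognize directly;
# anything else, such as "yes", becomes NA
words <- as.logical(c("TRUE", "T", "false", "yes"))
words   # TRUE TRUE FALSE NA
```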
Not all data-type coercion is entirely straightforward. Factors, for example, are trickier because R treats the levels as integers. In other words, regardless of how the levels of a given factor are actually labeled, the software will refer to them internally as level 1, level 2, and so on. This is clear if you try to coerce a factor to a numeric data type.
R> baz <- factor(x=c("male","male","female","male"))
R> baz
[1] male   male   female male
Levels: female male
R> as.numeric(baz)
[1] 2 2 1 2
Here, you see that R has assigned the numeric representation of the factor in the stored order of the factor labels (alphabetic by default). Level 1 refers to female, and level 2 refers to male. This example is simple enough, though it’s important to be aware of the behavior since coercion from factors with numeric levels can cause confusion.
R> qux <- factor(x=c(2,2,3,5))
R> qux
[1] 2 2 3 5
Levels: 2 3 5
R> as.numeric(qux)
[1] 1 1 2 3
The numeric representation of the factor qux is c(1,1,2,3). This highlights again that the levels of qux are simply treated as level 1 (even though it has a label of 2), level 2 (which has a label of 3), and level 3 (which has a label of 5).
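If what you want is the numbers shown in the labels rather than the internal level codes, one common approach is to go through the labels first. The sketch below shows two ways of doing so; the names codes, via.char, and via.levels are illustrative.

```r
qux <- factor(x = c(2, 2, 3, 5))

# Direct coercion returns the internal level codes
codes <- as.numeric(qux)

# Coerce to character first, then to numeric, to recover the labels
via.char <- as.numeric(as.character(qux))

# Alternatively, convert the levels once and index them by the factor
via.levels <- as.numeric(levels(qux))[qux]

codes        # 1 1 2 3
via.char     # 2 2 3 5
via.levels   # 2 2 3 5
```

Both alternatives rely on every label being a valid number; a label that is not would come back as NA with a warning, as with any other character-to-numeric coercion.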
Coercion between object classes and structures can also be useful. For example, you might need to store the contents of a matrix as a single vector.
R> foo <- matrix(data=1:4,nrow=2,ncol=2)
R> foo
     [,1] [,2]
[1,]    1    3
[2,]    2    4
R> as.vector(foo)
[1] 1 2 3 4
Note that as.vector has coerced the matrix by “stacking” the columns into a single vector. The same column-wise deconstruction occurs for higher-dimensional arrays, in order of layer or block.
R> bar <- array(data=c(8,1,9,5,5,1,3,4,3,9,8,8),dim=c(2,3,2))
R> bar
, , 1

     [,1] [,2] [,3]
[1,]    8    9    5
[2,]    1    5    1

, , 2

     [,1] [,2] [,3]
[1,]    3    3    8
[2,]    4    9    8

R> as.matrix(bar)
      [,1]
 [1,]    8
 [2,]    1
 [3,]    9
 [4,]    5
 [5,]    5
 [6,]    1
 [7,]    3
 [8,]    4
 [9,]    3
[10,]    9
[11,]    8
[12,]    8
R> as.vector(bar)
[1] 8 1 9 5 5 1 3 4 3 9 8 8
You can see that as.matrix stores the array as a 12 × 1 matrix, and as.vector stores it as a single vector. Commonsense rules like those for data types also apply to coercion between object structures. For example, coercing the following list baz to a data frame produces an error:
R> baz <- list(var1=foo,var2=c(T,F,T),var3=factor(x=c(2,3,4,4,2)))
R> baz
$var1
     [,1] [,2]
[1,]    1    3
[2,]    2    4

$var2
[1]  TRUE FALSE  TRUE

$var3
[1] 2 3 4 4 2
Levels: 2 3 4

R> as.data.frame(baz)
Error in data.frame(var1 = 1:4, var2 = c(TRUE, FALSE, TRUE), var3 = c(1L, :
  arguments imply differing number of rows: 2, 3, 5
The error occurs because the variables do not have matching lengths. But there is no problem with coercing the list qux, shown here, which has equal-length members:
R> qux <- list(var1=c(3,4,5,1),var2=c(T,F,T,T),var3=factor(x=c(4,4,2,1)))
R> qux
$var1
[1] 3 4 5 1

$var2
[1]  TRUE FALSE  TRUE  TRUE

$var3
[1] 4 4 2 1
Levels: 1 2 4

R> as.data.frame(qux)
  var1  var2 var3
1    3  TRUE    4
2    4 FALSE    4
3    5  TRUE    2
4    1  TRUE    1
This stores the variables as a data set in a column-wise fashion, in the order that your list supplies them as members.
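One way to anticipate whether a list will coerce cleanly is to compare the number of rows each member would contribute before calling as.data.frame. The sketch below re-creates the two lists from the text; same.rows is a hypothetical helper written for this illustration, not a built-in R function.

```r
foo <- matrix(data = 1:4, nrow = 2, ncol = 2)
baz <- list(var1 = foo, var2 = c(TRUE, FALSE, TRUE),
            var3 = factor(x = c(2, 3, 4, 4, 2)))
qux <- list(var1 = c(3, 4, 5, 1), var2 = c(TRUE, FALSE, TRUE, TRUE),
            var3 = factor(x = c(4, 4, 2, 1)))

# NROW gives the row count for a matrix and the length for a vector or factor
rows.baz <- sapply(baz, NROW)   # 2 3 5, the numbers quoted in the error
rows.qux <- sapply(qux, NROW)   # 4 4 4

# Hypothetical helper: TRUE if every member would contribute the same rows
same.rows <- function(x) length(unique(sapply(x, NROW))) == 1

same.rows(baz)   # FALSE
same.rows(qux)   # TRUE

# Coerce only the list that passes the check
qux.df <- as.data.frame(qux)
dim(qux.df)             # 4 3
sapply(qux.df, class)   # "numeric" "logical" "factor"
```

Note that the data types of the members carry over to the columns: the factor remains a factor and the logical vector remains logical.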
This discussion on object classes, data types, and coercion is not exhaustive, but it serves as a useful introduction to how R deals with the formal identification, description, and handling of the objects you create, issues that are present in most high-level languages. Once you’re more familiar with R, the help files (such as the one accessed by entering ?as at the prompt) provide further details about object handling in the software.
Exercise 6.3
a. Identify the class of the following objects. For each object, also state whether the class is explicitly or implicitly defined.
i. foo <- array(data=1:36,dim=c(3,3,4))
ii. bar <- as.vector(foo)
iii. baz <- as.character(bar)
iv. qux <- as.factor(baz)
v. quux <- bar+c(-0.1,0.1)
b. For each object defined in (a), find the sum of the result of calling is.numeric and is.integer on it separately. For example, is.numeric(foo)+is.integer(foo) would compute the sum for (i). Turn the collection of five results into a factor with levels 0, 1, and 2, identified by the results themselves. Compare this factor vector with the result of coercing it to a numeric vector.
c. Turn the following:
     [,1] [,2] [,3] [,4]
[1,]    2    5    8   11
[2,]    3    6    9   12
[3,]    4    7   10   13
into the following:
[1] "2" "5" "8" "11" "3" "6" "9" "12" "4" "7" "10" "13"
d. Store the following matrix:
34  0  1
23  1  2
33  1  1
42  0  1
41  0  2
Then, do the following:
i. Coerce the matrix to a data frame.
ii. As a data frame, coerce the second column to be logical-valued.
iii. As a data frame, coerce the third column to be factor-valued.
Important Code in This Chapter
Function/operator  Brief description                  First occurrence
Inf, -Inf          Value for ±infinity                Section 6.1.1, p. 104
is.infinite        Element-wise check for Inf         Section 6.1.1, p. 105
is.finite          Element-wise check for finiteness  Section 6.1.1, p. 105
NaN                Value for invalid numerics         Section 6.1.2, p. 106
is.nan             Element-wise check for NaN         Section 6.1.2, p. 107
NA                 Value for missing observation      Section 6.1.3, p. 108
is.na              Element-wise check for NA or NaN   Section 6.1.3, p. 109
na.omit            Delete all NAs and NaNs            Section 6.1.3, p. 110
NULL               Value for “empty”                  Section 6.1.4, p. 110
is.null            Check for NULL                     Section 6.1.4, p. 111
attributes         List explicit attributes           Section 6.2.1, p. 114
attr               Obtain specific attribute          Section 6.2.1, p. 115
dimnames           Get array dimension names          Section 6.2.1, p. 116
class              Get object class (S3)              Section 6.2.2, p. 117
is.                Object-checking functions          Section 6.2.3, p. 120
as.                Object-coercion functions          Section 6.2.4, p. 121