Logical Record Subsets

In document The Book of R (Page 134-138)

5.2 Data Frames

5.2.3 Logical Record Subsets

In Section 4.1.5, you saw how to use logical flag vectors to subset data structures. This is a particularly useful technique with data frames, where you’ll often want to examine a subset of entries that meet certain criteria. For example, when working with data from a clinical drug trial, a researcher might want to examine the results for just male participants and compare them to the results for females. Or the researcher might want to look at the characteristics of individuals who responded most positively to the drug.

Let’s continue to work with mydata. Say you want to examine all records corresponding to males. From Section 4.3.1, you know that the following line will identify the relevant positions in the sex factor vector:

R> mydata$sex=="M"

[1] TRUE FALSE FALSE TRUE TRUE TRUE

This flags the male records. You can use this with the matrix-like syntax you saw in Section 5.2.1 to get the male-only subset.

R> mydata[mydata$sex=="M",]
  person age sex funny age.mon
1  Peter  42   M  High     504
4  Chris  14   M   Med     168
5 Stewie   1   M  High      12
6  Brian   7   M   Med      84

This returns data for all variables for only the male participants. You can use the same behavior to pick and choose which variables to return in the subset. For example, since you know you are selecting the males only, you could omit sex from the result using a negative numeric index in the column dimension.

R> mydata[mydata$sex=="M",-3]
  person age funny age.mon
1  Peter  42  High     504
4  Chris  14   Med     168
5 Stewie   1  High      12
6  Brian   7   Med      84
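To try these subsets without the earlier sections to hand, mydata can be rebuilt from the values printed in this section. The following is a minimal sketch: the factor levels are assumed to match those described in Exercise 5.2, and the object names males, male.all, and male.nosex are introduced here purely for illustration.

```r
# Rebuild mydata from the values shown in this section's output.
mydata <- data.frame(
  person = c("Peter", "Lois", "Meg", "Chris", "Stewie", "Brian"),
  age    = c(42, 40, 17, 14, 1, 7),
  sex    = factor(c("M", "F", "F", "M", "M", "M")),
  funny  = factor(c("High", "High", "Low", "Med", "High", "Med"),
                  levels = c("Low", "Med", "High")),  # assumed level order
  stringsAsFactors = FALSE  # keeps person as character in older R versions
)
mydata$age.mon <- mydata$age * 12

# Store the logical flag vector once so it can be reused.
males <- mydata$sex == "M"

male.all   <- mydata[males, ]    # all columns for the male records
male.nosex <- mydata[males, -3]  # same rows, third column (sex) dropped

print(male.all)
print(male.nosex)
```

Storing the flag vector in its own object is a matter of taste; it simply avoids retyping the condition when several subsets use the same rows.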

If you don’t have the column number or if you want to have more control over the returned columns, you can use a character vector of variable names instead.

R> mydata[mydata$sex=="M",c("person","age","funny","age.mon")]
  person age funny age.mon
1  Peter  42  High     504
4  Chris  14   Med     168
5 Stewie   1  High      12
6  Brian   7   Med      84
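The index-based and name-based column selections should give the same result, and a column can also be dropped by name rather than by position. The sketch below checks this on a copy of mydata rebuilt from the values shown in this section; the use of setdiff and the names by.index, by.name, keep, and by.drop are illustrative choices, not something the text above prescribes.

```r
# Rebuild mydata from the values shown in this section's output.
mydata <- data.frame(
  person = c("Peter", "Lois", "Meg", "Chris", "Stewie", "Brian"),
  age    = c(42, 40, 17, 14, 1, 7),
  sex    = factor(c("M", "F", "F", "M", "M", "M")),
  funny  = factor(c("High", "High", "Low", "Med", "High", "Med"),
                  levels = c("Low", "Med", "High")),  # assumed level order
  stringsAsFactors = FALSE
)
mydata$age.mon <- mydata$age * 12

males <- mydata$sex == "M"

# Drop the sex column by position, then keep the other columns by name.
by.index <- mydata[males, -3]
by.name  <- mydata[males, c("person", "age", "funny", "age.mon")]

# Drop a column by name: keep every variable name except "sex".
keep    <- setdiff(names(mydata), "sex")
by.drop <- mydata[males, keep]

# Compare the three results.
print(all.equal(by.index, by.name))
print(all.equal(by.name, by.drop))
```

Selecting by name is more robust if the column order of the data frame might change later, since a numeric index such as -3 silently refers to whatever column happens to be third.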

The logical conditions you use to subset a data frame can be as simple or as complicated as you need them to be. The logical flag vector you place in the square brackets just has to match the number of records in the data frame. Let’s extract from mydata the full records for individuals who are more than 10 years old OR have a high degree of funniness.

R> mydata[mydata$age>10|mydata$funny=="High",]
  person age sex funny age.mon
1  Peter  42   M  High     504
2   Lois  40   F  High     480
3    Meg  17   F   Low     204
4  Chris  14   M   Med     168
5 Stewie   1   M  High      12
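Conditions can be combined with AND in the same way as with OR. The sketch below, run on a copy of mydata rebuilt from the values shown in this section, builds one flag vector with & and one with |, and confirms that each has one entry per record. The names male.over10 and old.or.funny are hypothetical.

```r
# Rebuild mydata from the values shown in this section's output.
mydata <- data.frame(
  person = c("Peter", "Lois", "Meg", "Chris", "Stewie", "Brian"),
  age    = c(42, 40, 17, 14, 1, 7),
  sex    = factor(c("M", "F", "F", "M", "M", "M")),
  funny  = factor(c("High", "High", "Low", "Med", "High", "Med"),
                  levels = c("Low", "Med", "High")),  # assumed level order
  stringsAsFactors = FALSE
)
mydata$age.mon <- mydata$age * 12

# & and | compare element by element, giving one flag per record.
male.over10  <- mydata$sex == "M" & mydata$age > 10
old.or.funny <- mydata$age > 10 | mydata$funny == "High"

# The flag vector must be as long as the data frame has rows.
print(length(male.over10) == nrow(mydata))

print(mydata[male.over10, ])   # males older than 10
print(mydata[old.or.funny, ])  # the OR subset from the text
```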

Sometimes, asking for a subset will yield no records. In this case, R returns a data frame with zero rows, which looks like this:

R> mydata[mydata$age>45,]

[1] person  age     sex     funny   age.mon
<0 rows> (or 0-length row.names)

In this example, no records are returned from mydata because there are no individuals older than 45. To check whether a subset will contain any records, you can use nrow on the result; if this is equal to zero, then no records have satisfied the specified condition(s).
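The nrow check can be sketched as follows, using a cut-down copy of mydata that holds only the person and age columns shown in this section. The name over45 is hypothetical, and the any call is an optional alternative that tests the condition before subsetting.

```r
# Cut-down copy of mydata: only the columns needed for this check.
mydata <- data.frame(
  person = c("Peter", "Lois", "Meg", "Chris", "Stewie", "Brian"),
  age    = c(42, 40, 17, 14, 1, 7),
  stringsAsFactors = FALSE
)

# Ask for a subset that no record satisfies.
over45 <- mydata[mydata$age > 45, ]

# nrow is zero when no records met the condition.
print(nrow(over45))
if (nrow(over45) == 0) {
  cat("No records satisfy the condition\n")
}

# Alternative: test the flag vector itself before subsetting.
print(any(mydata$age > 45))
```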

Exercise 5.2

a. Create and store this data frame as dframe in your R workspace:

person   sex funny
Stan     M   High
Francine F   Med
Steve    M   Low
Roger    M   High
Hayley   F   Med
Klaus    M   Med

The variables person, sex, and funny should be identical in nature to the variables in the mydata object studied throughout Section 5.2. That is, person should be a character vector, sex should be a factor with levels F and M, and funny should be a factor with levels Low, Med, and High.

b. Stan and Francine are 41 years old, Steve is 15, Hayley is 21, and Klaus is 60. Roger is extremely old: 1,600 years. Append these data as a new numeric column variable in dframe called age.

c. Use your knowledge of reordering the column variables based on column index positions to overwrite dframe, bringing it in line with mydata. That is, the first column should be person, the second column age, the third column sex, and the fourth column funny.

d. Turn your attention to mydata as it was left after you included the age.mon variable in Section 5.2.2. Create a new version of mydata called mydata2 by deleting the age.mon column.

e. Now, combine mydata2 with dframe, naming the resulting object mydataframe.

f. Write a single line of code that will extract from mydataframe just the names and ages of any records where the individual is female and has a level of funniness equal to Med OR High.

g. Use your knowledge of handling character strings in R to extract all records from mydataframe that correspond to people whose names start with S. Hint: Recall substr from Section 4.2.4 (note that substr can be applied to a vector of multiple character strings).
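As a reminder of how substr behaves on a vector, here is a small sketch on made-up data unrelated to the exercise. The vector pets and the names first and starts.R are hypothetical.

```r
# Hypothetical character vector, unrelated to the exercise data.
pets <- c("Rex", "Bella", "Rover", "Milo")

# substr is applied to each string in turn; here it takes the first character.
first <- substr(pets, start = 1, stop = 1)
print(first)

# Comparing the result gives a logical flag vector usable for subsetting.
starts.R <- first == "R"
print(pets[starts.R])
```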

Important Code in This Chapter

Function/operator   Brief description                First occurrence
list                Create a list                    Section 5.1.1, p. 89
[[ ]]               Unnamed member reference         Section 5.1.1, p. 90
[ ]                 List slicing (multiple members)  Section 5.1.1, p. 91
$                   Get named member/variable        Section 5.1.2, p. 92
data.frame          Create a data frame              Section 5.2.1, p. 96
[ , ]               Extract data frame row/columns   Section 5.2.1, p. 96

6 Special Values, Classes, and Coercion

You’ve now learned about numeric values, logicals, character strings, and factors, as well as their unique properties and applications. Now you’ll look at some special values in R that aren’t as well-defined. You’ll see how they might come about and how to handle and test for them. Then you’ll look at different data types in R and some general object class concepts.

6.1 Some Special Values

Many situations in R call for special values. For example, when a data set has missing observations or when a practically infinite number is calculated, the software has some unique terms that it reserves for these situations. These special values can be used to mark abnormal or missing values in vectors, arrays, or other data structures.
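As a rough preview of the two situations just mentioned, the sketch below uses R's standard Inf and NA values. Which special values the chapter goes on to cover, and in what order, is an assumption here, and the object names big and obs are hypothetical.

```r
# A number too large for R to represent is stored as Inf.
big <- 10^400
print(big)
print(is.infinite(big))

# A missing observation in a vector is marked with NA.
obs <- c(2.5, NA, 4.0)
print(is.na(obs))

# Calculations involving NA give NA unless missing values are removed.
print(mean(obs))
print(mean(obs, na.rm = TRUE))
```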
