object.
1.5.7 Attributes and classes
Objects have a set of associated attributes (such as names of variables, dimensions, or classes) which can be displayed or sometimes changed. While a powerful concept, this can often be initially confusing. For example, we can find the dimension of the matrix defined in Section 1.5.5.
> attributes(A)
$dim
[1] 2 3
Other types of objects include lists (ordered objects that are not necessarily rectangular, Section 1.5.4), regression models (objects of class lm), and formulas (e.g., y ~ x1 + x2). Examples of the use of formulas can be found in Sections 3.4.1 and 4.1.1.
Many objects have an associated class attribute, which causes that object to inherit (or take on) properties depending on the class. Many functions have special capabilities when operating on a particular class. For example, when summary() is applied to an lm object, the summary.lm() function is called, while summary.aov() is called when an aov object is given as argument. The class() function returns the classes to which an object belongs, while the methods() function displays all of the classes supported by a function (e.g., methods(summary)).
The attributes() command displays the attributes associated with an object, while the typeof() function reports the internal storage type of the object (e.g., logical, integer, double, complex, character, or list).
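The following is a minimal sketch of how attributes(), class(), and typeof() relate, and of how a class attribute steers a generic function to a class-specific method. The matrix M, the class name myclass, and the method print.myclass() are hypothetical names introduced only for illustration.

```r
# A small integer matrix; the values are illustrative only
M = matrix(1:6, nrow=2, ncol=3)
attributes(M)     # a list whose only element is dim
class(M)          # includes "matrix" (and also "array" in R >= 4.0.0)
typeof(M)         # "integer": the storage type of the elements

# Attaching a class attribute changes how generic functions dispatch.
# myclass and print.myclass() are hypothetical, not part of R.
obj = structure(list(value=42), class="myclass")
print.myclass = function(x, ...) {
   cat("<myclass object with value", x$value, ">\n")
   invisible(x)
}
print(obj)        # print() dispatches to print.myclass()
inherits(obj, "myclass")
```

This mirrors the summary() and summary.lm() relationship described above: the generic function looks at the class of its argument and calls the matching method.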
1.5.8 Options
The options() function can be used to change various default behaviors. For example, the default number of digits to display in output can be specified using the command options(digits=n), where n is the preferred number (see 2.13.1). The command help(options) lists all of the other settable options.
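As a short sketch of this mechanism, the code below changes the digits option, prints a value, and then restores the earlier setting. The variable name old is an arbitrary choice; the pattern relies on options() returning the previous values of the options it changes.

```r
old = options(digits=3)   # change the setting; the previous value is returned
print(pi)                 # displayed with 3 significant digits
options(old)              # restore the earlier setting
getOption("digits")       # query a single option
```

Saving and restoring the old value in this way keeps a temporary change from affecting later output.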
1.6 Built-in and user-defined functions
1.6.1 Calling functions
Fundamental actions are carried out by calling functions (either built-in or user-defined), as seen previously. Multiple arguments may be given, separated by commas. The function operates on these arguments through a series of predefined expressions, then returns a value (an object such as a vector or list) that is displayed (by default) or saved by assignment to an object.
As an example, the quantile() function takes a vector and returns the minimum, 25th percentile, median, 75th percentile and maximum, though if an optional vector of quantiles is given, those are calculated instead.
> vals = rnorm(1000) # generate 1000 standard normals
> quantile(vals)
0% 25% 50% 75% 100%
-3.1180 -0.6682 0.0180 0.6722 2.8629
> quantile(vals, c(.025, .975))
 2.5% 97.5%
-2.05  1.92
Return values can be saved for later use.
> res = quantile(vals, c(.025, .975))
> res[1]
2.5%
-2.05
Options are available for many functions. These are named arguments for the function, and are generally added after the other arguments, also separated by commas. The documentation specifies the default action if named arguments (options) are not specified. For the quantile() function, there is a type option which allows specification of one of nine algorithms for calculating quantiles. Setting type=3 specifies the “nearest even order statistic” option, which is the default for some other packages.
> res = quantile(vals, c(.025, .975), type=3)
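To make the effect of the type option concrete, here is a small sketch on a four-element vector. The data in v are made up for illustration. With the default algorithm the median is interpolated between the two middle values, while type=3 returns one of the observed values.

```r
v = c(2, 4, 7, 11)                     # illustrative data, not from the text
q7 = quantile(v, probs=0.5)            # default algorithm interpolates
q3 = quantile(v, probs=0.5, type=3)    # type=3 returns an observed value
q7
q3
q3 %in% v                              # the type=3 result is an element of v
```

Arguments can be matched by name (probs=, type=) or by position, so quantile(v, 0.5, type=3) is an equivalent call.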
Some functions allow a variable number of arguments. An example is the paste() function (see usage in 2.4.6). The calling sequence is described in the documentation in the following manner.
paste(..., sep=" ", collapse=NULL)
To override the default behavior of a space being added between elements output by paste(), the user can specify a different value for sep (see 7.1.2).
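A brief sketch of these arguments in use follows; the strings are arbitrary examples. The ... in the calling sequence accepts any number of vectors, sep controls what is placed between corresponding elements, and collapse joins the resulting elements into a single string.

```r
paste("a", "b", "c")                    # default separator: "a b c"
paste("a", "b", "c", sep="")            # no separator: "abc"
paste("x", 1:3, sep="_")                # vectorized: "x_1" "x_2" "x_3"
paste(c("a", "b", "c"), collapse="+")   # single string: "a+b+c"
```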
1.6.2 Writing functions
One of the strengths of R is its extensibility, which is facilitated by its programming interface. A new function (here named newfun) is defined in the following way.
newfun = function(arglist) body
The body is made up of a series of commands (or expressions), enclosed between an opening { and a closing }. Here, we demonstrate a function to calculate the estimated confidence interval for a mean from Section 3.1.7.
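# calculate a t confidence interval for a mean
ci.calc = function(x, ci.conf=.95) {
   sampsize = length(x)
   # critical value of the t distribution (degrees of freedom given as sampsize)
   tcrit = qt(1-((1-ci.conf)/2), sampsize)
   mymean = mean(x)
   mysd = sd(x)
   # return the interval limits and the confidence level as a list
   return(list(civals=c(mymean-tcrit*mysd/sqrt(sampsize),
                        mymean+tcrit*mysd/sqrt(sampsize)),
               ci.conf=ci.conf))
}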
Here the appropriate quantile of the t distribution is calculated using the qt() function, and the appropriate confidence interval is calculated and returned as a list. The function is stored in the object ci.calc, which can then be run interactively on our vector from Section 1.5.1.
> ci.calc(x)
$civals
[1] 0.6238723 12.0427943
$ci.conf
[1] 0.95
If only the lower confidence limit is needed, this can be saved as an object.
> lci = ci.calc(x)$civals[1]
> lci
[1] 0.6238723
The default confidence level is 95%; this can be changed by specifying a different value as the second argument.
> ci.calc(x, ci.conf=0.90)
$civals
[1] 1.799246 10.867421
$ci.conf
[1] 0.9
This is equivalent to running ci.calc(x, 0.90). Other sample programs can be found in Sections 2.4.22 and 3.6.4 as well as Chapter 7.
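The ci.calc() function passes sampsize to qt() as the degrees of freedom. The usual one-sample t interval uses n - 1 degrees of freedom, which is also what the built-in t.test() function reports. The sketch below shows that variant under the hypothetical name ci.calc2(); the vector y is illustrative data, and the function is an assumption-laden example rather than a replacement for the one above.

```r
# Hypothetical variant of a t interval using n - 1 degrees of freedom
ci.calc2 = function(x, ci.conf=.95) {
   n = length(x)
   tcrit = qt(1 - (1 - ci.conf)/2, df=n - 1)   # critical value with n - 1 df
   halfwidth = tcrit * sd(x) / sqrt(n)         # half-width of the interval
   list(civals=c(mean(x) - halfwidth, mean(x) + halfwidth),
        ci.conf=ci.conf)
}

y = c(5, 7, 9, 13, -4, 8)     # illustrative data
res = ci.calc2(y)
res$civals
t.test(y)$conf.int            # built-in interval, for comparison
```

Because the last expression evaluated in a function body is its return value, the explicit return() call can be omitted, as it is here.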
1.6.3 The apply family of functions
Operations are most efficiently carried out using vector or list operations rather than looping. The apply() function can be used to perform many actions.
While somewhat subtle, the power of the vector language can be seen in this example. The apply() command is used to calculate column means or row means of the previously defined matrix in one fell swoop.
> A
[,1] [,2] [,3]
[1,] 5 9 -4
[2,] 7 13 8
> apply(A, 2, mean)
[1]  6 11  2
> apply(A, 1, mean)
[1] 3.333333 9.333333
Option 2 specifies that the mean should be calculated for each column, while option 1 calculates the mean of each row. Here we see some of the flexibility of the system, as functions (such as mean()) are also objects that can be passed as arguments to functions.
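Since any function can be passed in this way, apply() also works with user-defined and anonymous functions. The following sketch rebuilds a matrix with the same values as the one printed above under the name B; the helper colrange() is a hypothetical function written for this illustration.

```r
B = matrix(c(5, 7, 9, 13, -4, 8), nrow=2)   # same values as the matrix shown above
colrange = function(v) max(v) - min(v)      # hypothetical helper function
apply(B, 2, colrange)                       # applied to each column
apply(B, 1, function(v) sum(v > 6))         # anonymous function, applied to each row
all.equal(apply(B, 2, mean), colMeans(B))   # agrees with the built-in colMeans()
```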
Other related functions include lapply(), which is helpful in avoiding loops when using lists; sapply() (see 2.3.2), which does the same but simplifies the result to a vector or matrix where possible; mapply(), which applies a function in parallel over several vectors or lists; and tapply() (see 3.1.2), which performs an action on subsets of an object.
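A compact sketch of these four functions on small made-up inputs is given below. All object names (lst, scores, group) are illustrative.

```r
lst = list(a=1:3, b=c(10, 20))
lapply(lst, mean)                        # returns a list with elements a and b
sapply(lst, mean)                        # same values, simplified to a named vector

mapply(function(u, v) u + v, 1:3, 4:6)   # elementwise over two vectors: 5 7 9

scores = c(1, 2, 3, 4)
group = c("x", "y", "x", "y")
tapply(scores, group, mean)              # mean of scores within each group
```

The choice among them depends mainly on the shape of the input and the desired shape of the result: a list (lapply), a simplified vector or matrix (sapply), several inputs processed together (mapply), or groups defined by a factor (tapply).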