1.7.1 Introduction to libraries and packages
Additional functionality is added through packages, which consist of libraries of bundled functions, datasets, examples and help files that can be downloaded from CRAN. To download and install a package, use the function install.packages() or the windowing interface under Packages and Data. The library() function loads a package that has already been installed. As an example, to install and load the Hmisc package, two commands are needed:
install.packages("Hmisc")
library(Hmisc)
Once a package has been installed, it can be loaded whenever a new session is run by executing the function library(libraryname). A package only needs to be installed once for a given version of R.
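Because installation happens once but loading happens every session, a script can check for a package before trying to load it. The following is a minimal sketch, not part of the original text: is_installed() is a hypothetical helper name, and the check relies on requireNamespace(), which loads a package's namespace without attaching it.

```r
# Hypothetical helper (not part of R): TRUE if `pkg` can be found and loaded.
# requireNamespace() loads the namespace but does not attach the package.
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

pkg <- "Hmisc"
if (is_installed(pkg)) {
  # Already installed: attach it for this session.
  library(pkg, character.only = TRUE)
} else {
  # Not installed: installation is a one-time step that needs network access.
  message(pkg, " is not installed; run install.packages(\"", pkg, "\") once.")
}
```

The character.only = TRUE argument is needed because the package name is held in a variable rather than typed directly.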
1.7. ADD-ONS: LIBRARIES AND PACKAGES 19
If a package is not installed, running the library() command will yield an error. Here we try to load the Zelig package (which had not yet been installed).
> library(Zelig)
Error in library(Zelig) : there is no package called 'Zelig'
> install.packages("Zelig")
trying URL 'http://cran.stat.auckland.ac.nz/cran/bin/macosx/leopard/contrib/2.10/Zelig_3.4-5.tgz'
Content type 'application/x-gzip' length 14460464 bytes (13.8 Mb)
opened URL
==================================================
downloaded 13.8 Mb
The downloaded packages are in
/var/folders/Tmp/RtmpXaN7Kk/downloaded_packages
> library(Zelig)
Loading required package: MASS
Loading required package: boot
## Zelig (Version 3.4-5, built: 2009-03-13)
## Please refer to http://gking.harvard.edu/zelig for full
## documentation or help.zelig() for help with commands and
## models supported by Zelig.
##
## To cite individual Zelig models, please use the citation
## format printed with each model run and in the documentation.
A user can test whether a package is available by running require(packagename); this will load the package and return TRUE if it is installed, and will otherwise issue a warning and return FALSE rather than stopping with an error. The update.packages() function should be run periodically to ensure that packages are up-to-date.
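The difference between library() and require() when a package is missing can be seen in a short sketch. The package name notARealPackage is made up purely to trigger the failure path; it is an assumption of this example, not a real package.

```r
# "notARealPackage" is a made-up name used only to trigger the failure paths.
pkg <- "notARealPackage"

# require() returns FALSE (normally with a warning) instead of stopping.
ok <- suppressWarnings(require(pkg, character.only = TRUE, quietly = TRUE))
print(ok)

# library() signals an error, which tryCatch() can intercept.
msg <- tryCatch({
  library(pkg, character.only = TRUE)
  "loaded"
}, error = function(e) conditionMessage(e))
print(msg)
```

In a script, the logical value returned by require() makes it convenient inside an if() statement, while library() is the safer choice when a missing package should halt execution.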
As of February 2010, there were 2172 packages available from CRAN [20].
While each of these has met a minimal standard for inclusion, it is important to keep in mind that packages are created by individuals or small groups, and not endorsed by the R core group. As a result, they do not necessarily undergo the same level of testing and quality assurance that the core R system does.
Hadley Wickham’s Crantastic (http://crantastic.org) is a community site that reviews and tags CRAN packages.
1.7.2 CRAN task views
A very useful resource for finding packages is the collection of Task Views on CRAN (http://cran.r-project.org/web/views). These are listings of relevant packages within a particular application area (such as multivariate statistics, psychometrics, or survival analysis). Table 1.1 displays the Task Views available as of January 2010.
Bayesian             Bayesian inference
ChemPhys             Chemometrics and computational physics
Clinical Trials      Design, monitoring, and analysis of clinical trials
Cluster              Cluster analysis & finite mixture models
Distributions        Probability distributions
Econometrics         Computational econometrics
Environmetrics       Analysis of ecological and environmental data
Experimental Design  Design and analysis of experiments
Finance              Empirical finance
Genetics             Statistical genetics
Graphics             Graphic displays, dynamic graphics, graphic devices, and visualization
gR                   Graphical models in R
HPC                  High-performance and parallel computing with R
Machine Learning     Machine and statistical learning
Medical Imaging      Medical image analysis
Multivariate         Multivariate statistics
NLP                  Natural language processing
Optimization         Optimization and mathematical programming
Pharmacokinetics     Analysis of pharmacokinetic data
Psychometrics        Psychometric models and methods
Robust               Robust statistical methods
Social Sciences      Statistics for the social sciences
Spatial              Analysis of spatial data
Survival             Survival analysis
Time Series          Time series analysis
Table 1.1: CRAN Task Views
1.7.3 Installed libraries and packages
Running the command library(help="libraryname") will display information about an installed package. Entries in the book that utilize packages include a line specifying how to access that library (e.g., library(foreign)). Vignettes showcase the ways that a package can be useful in practice: the command vignette() will list installed vignettes.
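As an illustration of these commands, the sketch below queries two packages that ship with R. The choice of stats and grid is an assumption made for the example, and the vignette listing may be empty on installations built without documentation.

```r
# library(help = ...) returns an object of class "packageInfo".
info <- library(help = "stats")
class(info)

# The installed version of a package.
packageVersion("stats")

# vignette() restricted to one package; the result holds a character matrix
# with one row per installed vignette.
vigs <- vignette(package = "grid")
unname(vigs$results[, "Title"])
```

Typing info or vigs at an interactive prompt displays the same listing that the bare commands would show.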
As of January 2010, the R distribution comes with the following packages.
base       Base R functions
datasets   Base R datasets
grDevices  Graphics devices for base and grid graphics
graphics   R functions for base graphics
grid       A rewrite of the graphics layout capabilities
methods    Formally defined methods and classes, plus programming tools
splines    Regression spline functions and classes
stats      R statistical functions
stats4     Statistical functions using S4 classes
tcltk      Interface and language bindings to Tcl/Tk GUI elements
tools      Tools for package development and administration
utils      R utility functions
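These are installed with every copy of R and are effectively part of the base system. Several of them (base, datasets, graphics, grDevices, methods, stats and utils) are attached automatically at startup; the others, such as grid or splines, are installed but still require a library() call.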
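One way to check this for a given installation is to compare the packages marked with base priority against the packages on the search path. This is a minimal sketch; the variable names are illustrative.

```r
# Packages that ship as part of R itself.
base_pkgs <- rownames(installed.packages(priority = "base"))

# Packages attached in the current session, taken from the search path.
attached <- sub("^package:", "", grep("^package:", search(), value = TRUE))

# Base packages that are installed but not attached right now.
setdiff(base_pkgs, attached)
```

Any name returned by the last line can be attached with library() without installing anything.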
1.7.4 Recommended packages
A set of recommended packages is included in all binary distributions of R. As of January 2010, these were the following.
KernSmooth  Functions for kernel smoothing (and density estimation)
MASS        Functions and datasets from the main package of Venables and Ripley, “Modern Applied Statistics with S”
Matrix      A Matrix package
boot        Functions and datasets for bootstrapping
class       Functions for classification (k-nearest neighbor and LVQ)
cluster     Functions for cluster analysis
codetools   Code analysis tools
foreign     Functions for reading and writing data stored by statistical software like Minitab, S, SAS, SPSS, Stata, Systat, etc.
lattice     Lattice graphics, an implementation of Trellis Graphics functions
mgcv        Routines for GAMs and other generalized ridge regression problems
nlme        Fit and compare Gaussian linear and nonlinear mixed-effects models
nnet        Software for single hidden layer perceptrons (“feed-forward neural networks”) and for multinomial log-linear models
rpart       Recursive partitioning and regression trees
spatial     Functions for kriging and point pattern analysis from MASS
survival    Functions for survival analysis, including penalized likelihood
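Recommended packages are installed alongside R but, unlike most base packages, must be loaded explicitly. The sketch below lists the ones present on the current system; the result depends on how R was installed, so no particular output is assumed.

```r
# Recommended packages found in the current library paths (may be empty
# on a minimal installation).
rec <- rownames(installed.packages(priority = "recommended"))
print(rec)

# A recommended package still needs library() before its functions are visible.
if ("MASS" %in% rec) {
  library(MASS)
}
```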
1.7.5 Packages referenced in the book
Other packages referenced in the book but not included in the R distribution are listed below (to see more information about a particular package, run the command citation(package="packagename")).
amer            Additive mixed models with lme4
chron           Chronological objects which can handle dates and times
circular        Circular statistics
coda            Output analysis and diagnostics for MCMC
coin            Conditional inference procedures in a permutation test framework
dispmod         Dispersion models
ellipse         Functions for drawing ellipses and ellipse-like confidence regions
elrm            Exact logistic regression via MCMC
epitools        Epidemiology tools
exactRankTests  Exact distributions for rank and permutation tests
frailtypack     Frailty models using maximum penalized likelihood estimation
gam             Generalized additive models
gee             Generalized estimation equation solver
GenKern         Functions for kernel density estimates
ggplot2         An implementation of the Grammar of Graphics
gmodels         Various R programming tools for model fitting
gtools          Various R programming tools
Hmisc           Harrell miscellaneous functions
irr             Various coefficients of interrater reliability and agreement
lars            Least angle regression, lasso and forward stagewise
lme4            Linear mixed-effects models using S4 classes
lmtest          Testing linear regression models
lpSolve         Interface to Lp_solve v. 5.5 to solve linear/integer programs
maps            Draw geographical maps
Matching        Propensity score matching with balance optimization
MCMCpack        Markov Chain Monte Carlo (MCMC) package
mice            Multivariate imputation by chained equations
mitools         Tools for multiple imputation of missing data
mix             Multiple imputation for mixed categorical and continuous data
multcomp        Simultaneous inference in general parametric models
multilevel      Multilevel functions
nnet            Feed-forward neural networks and multinomial log-linear models
nortest         Tests for normality
odfWeave        Sweave processing of Open Document Format (ODF) files
plotrix         Various plotting functions
plyr            Tools for splitting, applying and combining data
prettyR         Pretty descriptive stats
pscl            Political science computational laboratory, Stanford University
pwr             Basic functions for power analysis
quantreg        Quantile regression
Rcmdr           R Commander
RColorBrewer    ColorBrewer palettes
reshape         Flexibly reshape data
rms             Regression modeling strategies
RMySQL          R interface to the MySQL database
ROCR            Visualizing the performance of scoring classifiers
RSQLite         SQLite interface for R
scatterplot3d   3D scatterplot
sos             Search help pages of R packages
sqldf           Perform SQL selects on R data frames
survey          Analysis of complex survey samples
tmvtnorm        Truncated multivariate normal distribution
vcd             Visualizing categorical data
VGAM            Vector generalized linear and additive models
XML             Tools for parsing and generating XML within R
Zelig           Everyone’s statistical software [31]
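The citation() command mentioned before the list can be tried on any installed package. The sketch below uses stats only because it is always present; for packages that are part of R, the result is the citation for R itself.

```r
# Citation information for an installed package.
cit <- citation(package = "stats")

# Plain-text form, suitable for pasting into a document.
print(cit, style = "text")

# BibTeX form, for use with a reference manager.
toBibtex(cit)
```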
Many of these must be installed and loaded prior to use (see install.packages(), require(), library() and Section 1.7.1). To facilitate this process, we have created a script file to load those needed to replicate the example code in one step (see Section 1.2.1).
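The authors' script is not reproduced here. As a rough idea of what loading several packages in one step might look like, the following hypothetical sketch attaches each package in a short illustrative list and reports any that are not installed. The package selection and variable names are assumptions of this example.

```r
# Illustrative subset of package names; replace with the packages needed.
pkgs <- c("Hmisc", "lattice", "MASS")

# Try to attach each package; require() returns TRUE or FALSE per package.
loaded <- vapply(pkgs, function(p) {
  suppressWarnings(suppressPackageStartupMessages(
    require(p, character.only = TRUE, quietly = TRUE)
  ))
}, logical(1))

# Report packages that still need install.packages().
missing_pkgs <- names(loaded)[!loaded]
if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
}
```

Reporting the missing packages, rather than installing them automatically, keeps the script usable on machines without network access.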
1.7.6 Datasets available with R
A number of datasets are available within the datasets package that is included in the R distribution. The data() function lists these, while the package option can be used to specify datasets from within a specific package.
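A brief sketch of both uses of data() follows. The cars dataset is chosen only as a familiar example from the datasets package.

```r
# List the datasets in a specific package; the result holds a character
# matrix with columns including "Item" (name) and "Title" (description).
res <- data(package = "datasets")$results
head(res[, c("Item", "Title")])

# Load one dataset by name into the workspace and inspect it.
data("cars", package = "datasets")
head(cars)
```

Calling data() with no arguments at an interactive prompt lists the datasets in all currently attached packages.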