
1.7 Add-ons: Libraries and packages

1.7.1 Introduction to libraries and packages

Additional functionality is added through packages, which bundle functions, datasets, examples, and help files and can be downloaded from CRAN. Packages are downloaded and installed with the install.packages() function or through the windowing interface under Packages and Data. The library() function loads a package that has already been installed with install.packages(). For example, installing and loading the Hmisc package takes two commands:

install.packages("Hmisc")
library(Hmisc)

Once a package has been installed, it can be loaded whenever a new session is run by executing the function library(libraryname). A package only needs to be installed once for a given version of R.
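The install-once, load-every-session pattern can be wrapped in a small helper so that a script works both on a fresh installation and on one where the package is already present. This is a minimal sketch: the name ensure_package() is hypothetical (it is not part of R or of any package), and the install step assumes a CRAN mirror is configured.

```r
# Hypothetical helper (not part of base R): attach a package, installing it
# from CRAN first only if it is not already installed.
ensure_package <- function(pkg) {
  # requireNamespace() returns FALSE (rather than stopping) when pkg is missing
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # needs a CRAN mirror; see options("repos")
  }
  # character.only = TRUE lets library() take the name stored in `pkg`
  library(pkg, character.only = TRUE)
  # TRUE if the package is now attached to the search path
  invisible(pkg %in% .packages())
}

# Usage: the install step runs at most once; later sessions only attach it
# ensure_package("Hmisc")
```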

If a package is not installed, running the library() command will yield an error. Here we try to load the Zelig package (which had not yet been installed).

> library(Zelig)
Error in library(Zelig) : there is no package called 'Zelig'
> install.packages("Zelig")
trying URL 'http://cran.stat.auckland.ac.nz/cran/bin/macosx/leopard/contrib/2.10/Zelig_3.4-5.tgz'
Content type 'application/x-gzip' length 14460464 bytes (13.8 Mb)
opened URL
==================================================
downloaded 13.8 Mb
The downloaded packages are in
        /var/folders/Tmp/RtmpXaN7Kk/downloaded_packages
> library(Zelig)
Loading required package: MASS
Loading required package: boot
## Zelig (Version 3.4-5, built: 2009-03-13)
## Please refer to http://gking.harvard.edu/zelig for full
## documentation or help.zelig() for help with commands and
## models supported by Zelig.
##
## To cite individual Zelig models, please use the citation
## format printed with each model run and in the documentation.
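In a script, the error that library() raises for a missing package can be caught instead of halting execution. A minimal sketch using base R's tryCatch(); the package name notInstalledPkg is hypothetical and chosen so that it should not exist on any system.

```r
# Try to attach a (hypothetical) package that is not installed and capture
# the error message instead of stopping the script.
msg <- tryCatch(
  {
    library("notInstalledPkg", character.only = TRUE)
    "loaded"  # only reached if the package exists
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

The captured message is the same "there is no package called" text shown in the transcript above.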

A user can load a package and test whether it is available by running require(packagename): this loads the package if it is installed and returns TRUE, but if it is not installed it issues a warning and returns FALSE rather than generating an error. The update.packages() function should be run periodically to ensure that installed packages are up to date.
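Because require() returns a logical value, it can drive a conditional in a script, and requireNamespace() performs the same availability check without attaching the package. A minimal sketch; notInstalledPkg is a hypothetical name assumed not to be installed.

```r
# require() returns TRUE/FALSE (invisibly) instead of stopping with an error.
ok_stats <- require(stats)  # stats ships with R, so this is TRUE

# For a missing (hypothetical) package, require() warns and returns FALSE;
# quietly = TRUE and suppressWarnings() keep the console output clean.
ok_missing <- suppressWarnings(
  require("notInstalledPkg", character.only = TRUE, quietly = TRUE)
)

# requireNamespace() checks availability without attaching the package
has_grid <- requireNamespace("grid", quietly = TRUE)

if (!ok_missing) message("notInstalledPkg is not installed")
c(stats = ok_stats, missing = ok_missing, grid = has_grid)
```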

As of February 2010, there were 2172 packages available from CRAN [20].

While each of these has met a minimal standard for inclusion, it is important to keep in mind that packages are created by individuals or small groups, and not endorsed by the R core group. As a result, they do not necessarily undergo the same level of testing and quality assurance that the core R system does.

Hadley Wickham’s Crantastic (http://crantastic.org) is a community site that reviews and tags CRAN packages.

1.7.2 CRAN task views

A very useful resource for finding packages is the set of Task Views on CRAN (http://cran.r-project.org/web/views). These are listings of relevant packages within a particular application area (such as multivariate statistics, psychometrics, or survival analysis). Table 1.1 displays the Task Views available as of January 2010.

Task View            Topic
Bayesian             Bayesian inference
ChemPhys             Chemometrics and computational physics
Clinical Trials      Design, monitoring, and analysis of clinical trials
Cluster              Cluster analysis & finite mixture models
Distributions        Probability distributions
Econometrics         Computational econometrics
Environmetrics       Analysis of ecological and environmental data
Experimental Design  Design and analysis of experiments
Finance              Empirical finance
Genetics             Statistical genetics
Graphics             Graphic displays, dynamic graphics, graphic devices, and visualization
gR                   Graphical models in R
HPC                  High-performance and parallel computing with R
Machine Learning     Machine and statistical learning
Medical Imaging      Medical image analysis
Multivariate         Multivariate statistics
NLP                  Natural language processing
Optimization         Optimization and mathematical programming
Pharmacokinetics     Analysis of pharmacokinetic data
Psychometrics        Psychometric models and methods
Robust               Robust statistical methods
Social Sciences      Statistics for the social sciences
Spatial              Analysis of spatial data
Survival             Survival analysis
Time Series          Time series analysis

Table 1.1: CRAN Task Views

1.7.3 Installed libraries and packages

Running the command library(help="libraryname") will display information about an installed package. Entries in the book that utilize packages include a line specifying how to access that library (e.g., library(foreign)). Vignettes showcase the ways that a package can be useful in practice: the command vignette() will list installed vignettes.
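The same information can be retrieved as R objects rather than only displayed, which is handy in scripts. A minimal sketch using the stats and grid packages, chosen because they ship with every R installation; the number of vignettes found depends on how R was built, so it may be zero.

```r
# Package information as an object (class "packageInfo"); assigning it
# avoids opening the pager that printing would use.
info <- library(help = "stats")

# Fields from the package's DESCRIPTION file
desc <- packageDescription("stats")
cat(desc$Package, desc$Version, "\n")

# Vignettes installed for one package; $results has one row per vignette
vig <- vignette(package = "grid")
cat("grid vignettes installed:", nrow(vig$results), "\n")
```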

As of January 2010, the R distribution comes with the following packages.

base        Base R functions
datasets    Base R datasets
grDevices   Graphics devices for base and grid graphics
graphics    R functions for base graphics
grid        A rewrite of the graphics layout capabilities
methods     Formally defined methods and classes, plus programming tools
splines     Regression spline functions and classes
stats       R statistical functions
stats4      Statistical functions using S4 classes
tcltk       Interface and language bindings to Tcl/Tk GUI elements
tools       Tools for package development and administration
utils       R utility functions

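These are installed along with R, so install.packages() is never needed for them. Most of them (base, datasets, grDevices, graphics, methods, stats, and utils) are also attached by default when R starts and are effectively part of the base system; the others (grid, splines, stats4, tcltk, and tools) must be loaded with library() before use.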
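Which packages ship with R, and which of them are attached at startup, can be checked from within R. A minimal sketch: installed.packages() with priority = "base" lists the packages distributed with R, getOption("defaultPackages") gives the ones attached automatically (base itself is always attached), and search() shows what is currently on the search path.

```r
# Packages distributed with R itself carry priority "base"
base_pkgs <- rownames(installed.packages(priority = "base"))
print(base_pkgs)

# Packages attached automatically at startup (in addition to base)
print(getOption("defaultPackages"))

# grid is installed with R but normally not attached by default,
# so library() puts it on the search path before use.
attached_before <- "package:grid" %in% search()
library(grid)
attached_after <- "package:grid" %in% search()
c(before = attached_before, after = attached_after)
```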

1.7.4 Recommended packages

A set of recommended packages is included in all binary distributions of R. As of January 2010, these included the following list.

KernSmooth   Functions for kernel smoothing (and density estimation)
MASS         Functions and datasets from the main package of Venables and Ripley, “Modern Applied Statistics with S”
Matrix       A Matrix package
boot         Functions and datasets for bootstrapping
class        Functions for classification (k-nearest neighbor and LVQ)
cluster      Functions for cluster analysis
codetools    Code analysis tools
foreign      Functions for reading and writing data stored by statistical software like Minitab, S, SAS, SPSS, Stata, Systat, etc.
lattice      Lattice graphics, an implementation of Trellis Graphics functions
mgcv         Routines for GAMs and other generalized ridge regression problems
nlme         Fit and compare Gaussian linear and nonlinear mixed-effects models
nnet         Software for single hidden layer perceptrons (“feed-forward neural networks”) and for multinomial log-linear models
rpart        Recursive partitioning and regression trees
spatial      Functions for kriging and point pattern analysis from MASS
survival     Functions for survival analysis, including penalized likelihood
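Whether the recommended packages are actually present can be checked with installed.packages() and priority = "recommended". This is a sketch under the assumption that some installations (for example, certain Linux distributions that package the recommended set separately) may lack some of them, so the result can be shorter than the list above.

```r
# Recommended packages present in this installation (may be empty on
# minimal builds); as.character() turns a NULL result into character(0)
rec_pkgs <- unique(as.character(
  rownames(installed.packages(priority = "recommended"))
))
print(rec_pkgs)

# The recommended packages named in the list above (as of January 2010)
listed <- c("KernSmooth", "MASS", "Matrix", "boot", "class", "cluster",
            "codetools", "foreign", "lattice", "mgcv", "nlme", "nnet",
            "rpart", "spatial", "survival")

# Which of them are missing here?
missing_rec <- setdiff(listed, rec_pkgs)
print(missing_rec)
```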

1.7.5 Packages referenced in the book

Other packages referenced in the book but not included in the R distribution are listed below (to see citation information for a particular package, run the command citation(package="packagename")).

amer            Additive mixed models with lme4
chron           Chronological objects which can handle dates and times
circular        Circular statistics
coda            Output analysis and diagnostics for MCMC
coin            Conditional inference procedures in a permutation test framework
dispmod         Dispersion models
ellipse         Functions for drawing ellipses and ellipse-like confidence regions
elrm            Exact logistic regression via MCMC
epitools        Epidemiology tools
exactRankTests  Exact distributions for rank and permutation tests
frailtypack     Frailty models using maximum penalized likelihood estimation
gam             Generalized additive models
gee             Generalized estimation equation solver
GenKern         Functions for kernel density estimates
ggplot2         An implementation of the Grammar of Graphics
gmodels         Various R programming tools for model fitting
gtools          Various R programming tools
Hmisc           Harrell miscellaneous functions
irr             Various coefficients of interrater reliability and agreement
lars            Least angle regression, lasso and forward stagewise
lme4            Linear mixed-effects models using S4 classes
lmtest          Testing linear regression models
lpSolve         Interface to Lp solve v. 5.5 to solve linear/integer programs
maps            Draw geographical maps
Matching        Propensity score matching with balance optimization
MCMCpack        Markov Chain Monte Carlo (MCMC) package
mice            Multivariate imputation by chained equations
mitools         Tools for multiple imputation of missing data
mix             Multiple imputation for mixed categorical and continuous data
multcomp        Simultaneous inference in general parametric models
multilevel      Multilevel functions
nnet            Feed-forward neural networks and multinomial log-linear models
nortest         Tests for normality
odfWeave        Sweave processing of Open Document Format (ODF) files
plotrix         Various plotting functions
plyr            Tools for splitting, applying and combining data
prettyR         Pretty descriptive stats
pscl            Political science computational laboratory, Stanford University
pwr             Basic functions for power analysis
quantreg        Quantile regression
Rcmdr           R Commander
RColorBrewer    ColorBrewer palettes
reshape         Flexibly reshape data
rms             Regression modeling strategies
RMySQL          R interface to the MySQL database
ROCR            Visualizing the performance of scoring classifiers
RSQLite         SQLite interface for R
scatterplot3d   3D scatterplot
sos             Search help pages of R packages
sqldf           Perform SQL selects on R data frames
survey          Analysis of complex survey samples
tmvtnorm        Truncated multivariate normal distribution
vcd             Visualizing categorical data
VGAM            Vector generalized linear and additive models
XML             Tools for parsing and generating XML within R
Zelig           Everyone’s statistical software [31]

Many of these must be installed and loaded prior to use (see install.packages(), require(), library() and Section 1.7.1). To facilitate this process, we have created a script file to load those needed to replicate the example code in one step (see Section 1.2.1).
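The idea of loading many packages in one step can be illustrated with a short loop over require(). This is a hypothetical sketch, not the authors' script: the function name load_book_packages() and the package name notInstalledPkg are invented for illustration, and the function only reports missing packages rather than installing them.

```r
# Hypothetical one-step loader: attach every package in `pkgs` and
# report the ones that are not installed.
load_book_packages <- function(pkgs) {
  # vapply() returns a named logical vector, one entry per package
  loaded <- vapply(pkgs, function(p) {
    suppressWarnings(require(p, character.only = TRUE, quietly = TRUE))
  }, logical(1))
  if (any(!loaded)) {
    message("Not installed; run install.packages() for: ",
            paste(pkgs[!loaded], collapse = ", "))
  }
  invisible(loaded)
}

# Usage with two packages that ship with R and one hypothetical name
status <- load_book_packages(c("stats", "grid", "notInstalledPkg"))
status
```

Returning the logical vector rather than stopping lets a calling script decide whether a missing package is fatal.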

1.7.6 Datasets available with R

A number of datasets are available within the datasets package that is included in the R distribution. The data() function lists these, while its package argument (e.g., data(package="MASS")) lists the datasets within a specific package.
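The listing returned by data() can also be used as an object. A minimal sketch that inspects the datasets package and loads one dataset; the $results component is a character matrix whose "Item" and "Title" columns hold each dataset's name and description.

```r
# List the datasets shipped with the datasets package
ds <- data(package = "datasets")
head(ds$results[, c("Item", "Title")])

# Load one dataset explicitly into the workspace and check its size
data("mtcars")
dim(mtcars)
```

For mtcars, dim() reports 32 rows (cars) and 11 columns (variables).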
