Applications of Models for Longitudinal and Multilevel Data in R and Stan

ICPSR Summer Program

Georges Monette, York University

John Fox, McMaster University

JULY 17: Introduction to the R Statistical Computing Environment

JULY 18-21: Applications of Models for Longitudinal and Multilevel Data

SECTION 1

Introduction to the R Statistical Computing Environment: One-Day Workshop

John Fox, McMaster University

ICPSR Training Program, Summer 2017

The R statistical programming language and computing environment is the de facto standard among statisticians for writing statistical software and has become very popular in other fields, including the social sciences: it is now possibly the most widely used statistical software in the world. R is a free, open-source implementation and extension of the S language, and is available for Windows, Mac OS X, and Unix/Linux systems. The substantial capabilities of the basic R software are augmented by nearly 10,000 contributed R “packages” for various statistical methods, freely available on the Comprehensive R Archive Network (CRAN) <https://cran.r-project.org/>.

This one-day workshop provides a basic introduction to R and RStudio, a sophisticated and free editor (“integrated development environment” or IDE) customized for R. The goal of the workshop is to prepare participants who are unfamiliar with R for the subsequent four-day workshop on longitudinal and multilevel modeling.

Most, but not all, of the material for the R workshop is drawn from Fox and Weisberg, An R Companion to Applied Regression, Second Edition, and from the third edition of this book, which is in preparation. Topics to be covered in the workshop include:

1. Getting started with R and RStudio

2. Workflow in R and RStudio: Enabling reproducible research

3. Reading and manipulating data in R

4. Basic statistical graphics in R

5. Fitting and working with linear and generalized linear models in R

Lecture Series Web Site

Materials for the workshop will be deposited at <http://socserv.socsci.mcmaster.ca/jfox/Courses/R/York-R-course/> (abbreviated <tinyurl.com/York-R-course>), which also has links to a variety of resources.

Acquiring R and RStudio

R, RStudio, and Stan are all free software, available on the internet. Stan implements state-of-the-art methods for Bayesian inference, and may be accessed through R via the rstan package. Instructions for installing R, RStudio, and Stan are on the workshop web site at <http://socserv.socsci.mcmaster.ca/jfox/Courses/R/York-R-course/R-install-instructions.html>, with a link on the workshop home page.

Selected Bibliography

Publishers of statistical texts have been producing a steady stream of books on R. Of particular note are Springer's Use R! series <http://www.springer.com/series/6991> and Chapman and Hall/CRC’s The R Series <http://www.crcpress.com/browse/series/crctherser>.

For a more extensive bibliography, see the syllabus for my R lectures at the ICPSR Summer Program in Ann Arbor.

Basic Text

A principal reference for this workshop is J. Fox and S. Weisberg, An R Companion to Applied Regression, Second Edition, Sage (2011), but you should be able to follow the workshop without reading the book. Additional materials are available on the web site for the book <http://socserv.mcmaster.ca/jfox/Books/Companion/index.html>, including several appendices (on multivariate linear models, structural-equation models, mixed models, survival analysis, and more). As mentioned, a third edition of this book is in preparation. The book is associated with the car and effects packages.

Manuals

R is distributed with a set of manuals, which are also available at the CRAN web site <https://cran.r-project.org/manuals.html>.

A great deal of information about using the RStudio integrated development environment is available on the RStudio website at <https://support.rstudio.com/hc/en-us> (see under “Documentation”).

Mixed-Effects Models in R

Also see the package listing on CRAN <https://cran.r-project.org/web/packages/index.html> and the Bayesian Inference and Statistics for the Social Sciences CRAN “task views” <https://cran.r-project.org/web/views/index.html>.

A. Gelman, J. B. Carlin, H. S. Stern, D. B. Dunson, A. Vehtari, and D. Rubin, Bayesian Data Analysis, Third Edition. Boca Raton: CRC/Chapman & Hall, 2013. More demanding than McElreath’s text, described below, this is a tour-de-force exposition of Bayesian methods, including for mixed-effects models. An appendix to the text explains how to use R and Stan for Bayesian inference. Andrew Gelman and Aki Vehtari are among the developers of Stan.

A. Gelman and J. Hill, Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge: Cambridge University Press, 2007. A wide-ranging and accessible yet deep treatment of hierarchical models and various related topics, predominantly but not exclusively from a Bayesian perspective, using both R and BUGS software.

R. McElreath, Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Boca Raton: CRC/Chapman & Hall, 2016. The title is reasonably descriptive of this very readable introduction to modern Bayesian methods. The use of R and Stan in the book is somewhat idiosyncratic, employing the author’s rethinking package, which is freely available but not from CRAN.

J. C. Pinheiro and D. M. Bates, Mixed-Effects Models in S and S-PLUS. New York: Springer, 2000. An extensive treatment of linear and nonlinear mixed-effects models in S, focused on the authors' nlme package. Does not cover Bates, Maechler, and Bolker’s newer lme4 package.

W. N. Venables and B. D. Ripley. Modern Applied Statistics with S, Fourth Edition. New York: Springer, 2002. An influential and wide-ranging treatment of data analysis using S and R, including a chapter on mixed-effects models. Many of the facilities described in the book are programmed in the associated (and very useful) MASS, nnet, and spatial packages, which are included in the standard R distribution. This text is more advanced and has a broader focus than the R Companion. I once considered the MASS book the best moderately advanced reference on statistical data analysis in S and R. The book is still very useful, but it is showing its age.

SECTION 2

Applications of Models for Longitudinal and Multilevel Data: Four-Day Workshop

Georges Monette, York University

ICPSR Training Program, Summer 2017

In the past 25 years, there have been enormous advances in statistical methods for the analysis of complex data. As tools become more powerful, researchers tackle data structures and ask research questions that would have been impossible to broach not long ago.

For example, with multilevel and longitudinal data it is possible to perform analyses that come as close to valid causal inference as is possible short of a randomized experiment (Arjas and Parner 2004; Morgan and Winship 2014).

Most of the leading methods for longitudinal data analysis have limitations: they may be appropriate only for normally distributed response variables, or they may allow categorical response variables but provide limited possibilities for random effects and dependencies over time. The most general methods use Markov chain Monte Carlo (MCMC) methods, but until recently the implementation of these methods was forbidding for most researchers.

A recent approach to MCMC, known as Hamiltonian Monte Carlo, has been used to create a new modeling environment, the Stan modelling language (Carpenter et al. 2017), that is computationally efficient and whose use is accessible to researchers who are not statistical specialists.

This course does not assume any prior experience with hierarchical or longitudinal models, nor with MCMC or other Bayesian methods.

The first two days are devoted to an in-depth study of the classical methods used for normally distributed responses. After a review of statistical concepts using graphs and geometry as tools for statistical reasoning (Friendly, Monette, and Fox 2013), we follow the presentation of models for longitudinal data in well-established textbooks (Raudenbush and Bryk 2002; Singer and Willett 2003; Snijders and Bosker 2012). We will learn to implement these methods with the ‘nlme’ package in R (Pinheiro et al. 2017).
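As a rough illustration of the kind of model fitted with ‘nlme’, the sketch below fits a linear growth model with a random intercept and a random slope for each subject. It is not taken from the course materials: the Orthodont data set (dental growth measurements that ship with nlme) and the particular model are assumptions chosen only because they are readily available.

```r
library(nlme)  # a recommended package, included in standard R distributions

# Orthodont: repeated measurements of 'distance' at several ages
# for each child ('Subject'); used here purely as an example data set.
fit <- lme(distance ~ age,            # fixed effects: population-average growth line
           random = ~ age | Subject,  # random intercept and slope for each child
           data = Orthodont)

fixef(fit)    # estimated population-average intercept and slope
VarCorr(fit)  # estimated variance components
summary(fit)  # fuller summary of the fitted model
```

The fixed-effects formula describes the average trajectory, while the `random` argument lets each subject's line deviate from that average; this is the basic structure behind most of the classical longitudinal models.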

We will also consider interesting questions related to the causal interpretation of longitudinal models (Morgan and Winship 2014; Raudenbush 2001).

In the last two days we will learn to use the Stan modelling language (Carpenter et al. 2017) to extend the methods learned in the first two days. We begin with an overview of Bayesian approaches to statistical inference and then apply Stan to a broad range of situations:

1. binary, multinomial and count responses, e.g. overdispersed Poisson,

2. multivariate responses, including hurdle models, zero-inflated models and models with mixed continuous and categorical responses,

3. models with measurement error in predictors,

4. models with responses at one point in time that are treated as covariates for the response at the next point in time.
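To make the term “overdispersed Poisson” in the first item concrete, the following base-R sketch simulates counts and compares their variance-to-mean ratios. It is only an illustration under assumed settings (the seed, sample size, mean, and negative-binomial `size` are arbitrary choices) and uses simulation and a quasi-Poisson fit rather than Stan.

```r
set.seed(2017)  # arbitrary seed, for reproducibility only
n  <- 5000      # assumed sample size
mu <- 4         # assumed common mean

# Poisson counts: variance equals the mean
y_pois <- rpois(n, lambda = mu)

# Negative-binomial counts with the same mean: variance exceeds the mean
y_nb <- rnbinom(n, mu = mu, size = 1.5)

# Variance-to-mean ratio: close to 1 for Poisson data, larger if overdispersed
dispersion <- function(y) var(y) / mean(y)
c(poisson = dispersion(y_pois), negbin = dispersion(y_nb))

# A quasi-Poisson fit estimates the same dispersion factor from the data
fit_q <- glm(y_nb ~ 1, family = quasipoisson)
summary(fit_q)$dispersion
```

A ratio well above 1 signals that a plain Poisson model would understate the uncertainty in the counts, which is one motivation for the more flexible models listed above.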

Throughout the course, conceptual and theoretical discussions will be interwoven with practical examples and exercises in which you use the methods on your own computer, thus ensuring that you will be well equipped with the tools you need to use the methods explored in the course.

References:

Arjas, Elja, and Jan Parner. 2004. “Causal Reasoning from Longitudinal Data.” Scandinavian Journal of Statistics 31 (2): 171–87. doi:10.1111/j.1467-9469.2004.02-134.x.

Carpenter, Bob, Andrew Gelman, Matthew D Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell. 2017. “Stan: A Probabilistic Programming Language.” Journal of Statistical Software 76 (1): 1–32. doi:10.18637/jss.v076.i01.

Friendly, Michael, Georges Monette, and John Fox. 2013. “Elliptical Insights: Understanding Statistical Methods through Elliptical Geometry.” Statistical Science 28 (1): 1–39. doi:10.1214/12-STS402.

Morgan, Stephen L, and Christopher Winship. 2014. Counterfactuals and Causal Inference. Cambridge University Press.

Pinheiro, Jose, Douglas Bates, Saikat DebRoy, Deepayan Sarkar, and R Core Team. 2017. nlme: Linear and Nonlinear Mixed Effects Models. https://CRAN.R-project.org/package=nlme.

Raudenbush, Stephen W, and Anthony S Bryk. 2002. Hierarchical Linear Models: Applications and Data Analysis Methods. Sage.

Raudenbush, Stephen W. 2001. “Comparing Personal Trajectories and Drawing Causal Inferences from Longitudinal Data.” Annual Review of Psychology 52 (1): 501–25. doi:10.1146/annurev.psych.52.1.501.

Singer, Judith D, and John B Willett. 2003. Applied Longitudinal Data Analysis: Modeling Change and Event Occurrence. Oxford University Press.

Snijders, Tom AB, and Roel J. Bosker. 2012. Multilevel Analysis. Springer.
