RANDALL E. SCHUMACKER is a Professor of Educational Research at The University of Alabama, where he teaches courses in structural equation modeling, multivariate statistics, multiple regression, and program evaluation. His research interests are varied, including modeling interaction in SEM, robust statistics (normal scores, centering and variance inflation factor issues), specification search issues as well as measurement model issues related to estimation, mixed-item formats, and reliability. He has taught several international and national workshops on structural equation modeling.
Randall has written and co-edited several books, including A Beginner’s Guide to Structural Equation Modeling (third edition); Advanced Structural Equation Modeling: Issues and Techniques; Interaction and Non-Linear Effects in Structural Equation Modeling; New Developments and Techniques in Structural Equation Modeling; Understanding Statistical Concepts Using S-PLUS; Understanding Statistics Using R; Learning Statistics Using R; and Using R with Multivariate Statistics.
Randall has published in several journals including Academic Medicine, Educational and Psychological Measurement, Journal of Applied Measurement, Journal of Educational and Behavioral Statistics, Journal of Research Methodology, Multiple Linear Regression Viewpoints, and Structural Equation Modeling. He has served on the editorial boards of numerous journals and is a member of the American Educational Research Association, Past-President of the Southwest Educational Research Association, and Emeritus Editor of Structural Equation Modeling and Multiple Linear Regression Viewpoints.
Dr. Schumacker was the 1996 recipient of the Outstanding Scholar Award, and the 1998 recipient of the Charn Oswachoke International Award. In 2010, he launched the DecisionKit App for the iPhone and iPad, which can assist researchers in making decisions about which measurement, research design, or statistic to use in their research projects. In 2011, he received the Apple iPad Award, and in 2012, he received the CIT Faculty Technology Award. In 2013, he received the McCrory Faculty Excellence in Research Award from the College of Education at the University of Alabama. In 2014, Dr. Schumacker was the recipient of the Structural Equation Modeling Service Award at the American Educational Research Association, where he founded the Structural Equation Modeling Special Interest Group. He can be contacted at The University of Alabama, College of Education, P.O. Box 870231, 316 Carmichael Hall, Tuscaloosa, AL 35487-0231, USA or by e-mail at [email protected].
RICHARD G. LOMAX is a Professor in the Department of Educational Studies at The Ohio State University. He received his Ph.D. in Educational Research Methodology from the University of Pittsburgh. His research focuses on models of literacy acquisition, multivariate statistics, analysis of variance, and assessment. He has thrice served as a Fulbright Scholar and is a Fellow of the American Educational Research Association. Richard can be contacted at The Ohio State University, College of Education and Human Ecology, 153 Arps Hall, 1945 N. High Street, Columbus, OH 43210, USA or by e-mail at [email protected].
Chapter 1
INTRODUCTION
CHAPTER CONCEPTS
What is structural equation modeling?
History of structural equation modeling
Why conduct structural equation modeling?
Structural equation modeling software
WHAT IS STRUCTURAL EQUATION MODELING?
Structural equation modeling (SEM) depicts relations among observed and latent variables in various types of theoretical models, which provide a quantitative test of a hypothesis by the researcher. Basically, various theoretical models are hypothesized and tested in SEM. The SEM models hypothesize how sets of variables define constructs and how these constructs are related to each other. In SEM, the construct is called a latent variable. For example, an educational researcher might hypothesize that a student’s home environment influences her later achievement in school. A marketing researcher may hypothesize that consumer trust in a corporation leads to increased product sales for that corporation. A health care professional might believe that a good diet and regular exercise reduce the risk of a heart attack.
In each example, based on theory and empirical research, the researcher wants to test whether a set of variables define the constructs that are hypothesized to be related in a certain way. The goal of SEM is to test whether the theoretical model is supported by sample data. If the sample data support the theoretical model, then the hypothesized relations exist amongst the constructs. If the sample data do not support the theoretical model, then either an alternative model will need to be specified and tested, or another theoretical model hypothesized and tested. Consequently, SEM tests theoretical models using the scientific method of hypothesis testing to advance our understanding of the complex relations among constructs.
SEM can test various types of theoretical models. The first types discussed in this book include regression, path, and confirmatory factor analysis (CFA) models, which form the basis for understanding the many different types of SEM models. The regression models use observed variables, while path models can use either observed variables or latent variables. CFA models by definition use observed variables to define latent variables; however, second-order CFA models test relations using additional latent variables. Therefore, these two types of variables, observed variables and latent variables, are used depending upon the type of SEM model.
Latent variables (constructs or factors) are variables that are not directly observed or measured. Latent variables are indirectly observed or measured, and hence are inferred from a set of observed variables that we actually measure using tests, surveys, scales, and so on. For example, intelligence is a latent variable that represents a psychological construct. The confidence of consumers in American business is another latent variable, one representing an economic construct. The physical condition of adults is a third latent variable, one representing a health-related construct.
The observed variables (measured or indicator variables) are a set of variables that we use to define or infer the latent variable or construct. For example, the Wechsler Intelligence Scale for Children—Revised (WISC-R) is an instrument that produces a measured variable (scores), which is used to infer the construct of a child’s intelligence. Additional indicator variables from other intelligence tests would be used to indicate or define the construct of intelligence (latent variable). The Dow Jones index is a standard measure of the American corporate economy construct. Other indicator variables could include gross national product, retail sales, and export sales. Blood pressure is an indicator of a health-related latent variable that could be defined as Fitness. Other indicator variables could be exercise and diet.
Researchers use several indicator variables to define a latent variable. The CFA measurement model tests whether the indicator variables are a good fit in defining the latent variable. The SEM structural model then tests the hypothesized relations amongst the latent variables.
Latent variables are defined as either independent variables or dependent variables. An independent variable is a variable that is not manipulated or influenced by any other variable in the model. A dependent variable is a variable that is influenced by other variables in the model. The researcher specifies the independent and dependent variables. The educational researcher hypothesizes that a student’s home environment (independent latent variable) influences school achievement (dependent latent variable). The marketing researcher believes that consumer trust in a corporation (independent latent variable) leads to increased product sales (dependent latent variable). The health care professional wants to determine whether a good diet, regular exercise, and physiology (independent latent variable) influence the frequency of heart attacks (dependent latent variable).
The basic SEM models (regression, path, and CFA) illustrate the use of observed variables and latent variables which are defined as independent or dependent in the model. A regression model consists solely of observed variables where a single dependent observed variable is predicted or explained by one or more independent observed variables; for example, a parent’s income level (independent observed variable) is used to predict his or her child’s achievement score (dependent observed variable). A path model can also be specified entirely with observed variables, but the flexibility allows for multiple independent observed variables and multiple dependent observed variables; for example, export sales, gross national product, and NASDAQ index (independent observed variables) influence consumer trust and consumer spending (dependent observed variables). Path models test more complex models than regression models, include direct and indirect effects, and can include latent variables. For example, home environment, school environment, and relations with peers (independent latent variables) can explain student achievement and student–teacher relations (dependent latent variables).
Confirmatory factor models consist of observed variables that are hypothesized to measure both the independent and dependent latent variables; for example, diet, exercise, and physiology are observed measures of the independent latent variable, Fitness, while blood pressure, cholesterol, and stress are observed measures of the dependent latent variable, Heart Attack Proneness. As another example, an independent latent variable (home environment) influences a dependent latent variable (achievement), where both types of latent variables are measured by multiple observed (indicator) variables.
HISTORY OF STRUCTURAL EQUATION MODELING
To discuss the history of structural equation modeling, we explain the following four types of related models and their chronological order of development: regression, path, confirmatory factor, and structural equation modeling.
The first model involves linear regression models that use a correlation coefficient and the least squares criterion to compute regression weights. Regression models were made possible because Karl Pearson created a formula for the correlation coefficient in 1896 that provided an index for the relation between two variables (Pearson, 1938). The regression model permits the prediction of dependent observed variable scores (Y scores), given a linear weighting of a set of independent observed scores (X scores). The linear weighting of the independent variables is done using regression coefficients, which are determined based on minimizing the sum of squared residual error values. The selection of the regression weights is therefore based on the Least Squares Criterion. The mathematical basis for the linear regression model is found in basic algebra. Regression analysis provides a test of a theoretical model that may be useful for prediction, for example, admission to graduate school or budget projections. Delucchi (2006), for example, used regression analysis to predict student exam scores in statistics (dependent variable) from a series of collaborative learning group assignments (independent variables). The results provided some support for collaborative learning groups improving statistics exam performance, although not for all tasks.
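The least squares criterion described above can be illustrated with a minimal sketch for the one-predictor case. The scores and all function names below are hypothetical and chosen only for illustration; they do not come from Delucchi (2006) or any other study cited here. The sketch computes the regression weight (slope) and intercept that minimize the sum of squared residuals, and shows that the slope equals the Pearson correlation rescaled by the ratio of the standard deviations.

```python
import math

# Hypothetical X and Y scores, for illustration only (not data from the text).
x_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
y_scores = [2.0, 4.0, 5.0, 4.0, 5.0]


def mean(values):
    return sum(values) / len(values)


def sums_of_squares(x, y):
    """Return (sxx, syy, sxy): sums of squared and cross-product deviations."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxx, syy, sxy


def least_squares_fit(x, y):
    """Return (intercept, slope) that minimize the sum of squared residuals."""
    sxx, _, sxy = sums_of_squares(x, y)
    slope = sxy / sxx
    intercept = mean(y) - slope * mean(x)
    return intercept, slope


def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    sxx, syy, sxy = sums_of_squares(x, y)
    return sxy / math.sqrt(sxx * syy)


def sum_squared_residuals(x, y, intercept, slope):
    """Sum of squared differences between observed and predicted Y scores."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


intercept, slope = least_squares_fit(x_scores, y_scores)
r = pearson_r(x_scores, y_scores)
sxx, syy, _ = sums_of_squares(x_scores, y_scores)

best_ssr = sum_squared_residuals(x_scores, y_scores, intercept, slope)
# Any other weight gives a larger sum of squared residuals.
worse_ssr = sum_squared_residuals(x_scores, y_scores, intercept, slope + 0.1)

print("intercept:", round(intercept, 3), "slope:", round(slope, 3))
print("slope from correlation:", round(r * math.sqrt(syy / sxx), 3))
print("SSR at least squares solution:", round(best_ssr, 3))
print("SSR with a perturbed slope:", round(worse_ssr, 3))
```

With more than one independent variable the same criterion applies, but the weights are obtained by solving a set of equations jointly rather than from this single-predictor formula.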
Some years later, Charles Spearman (1904, 1927) used the correlation coefficient to determine which items correlated or went together to create a measure of general intelligence. His basic idea was that if a set of items correlated or went together, individual responses to the set of items could be summed to yield a score that would measure or define a construct. Spearman was the first to use the term factor analysis in defining a two-factor construct for a theory of intelligence. D. N. Lawley and L. L. Thurstone in 1940 further developed applications of factor models, and proposed instruments (sets of items) that yielded observed scores from which constructs could be inferred. Most of the aptitude, achievement, and diagnostic tests, surveys, and inventories in use today were created using factor analytic techniques. The term confirmatory factor analysis (CFA) is used today based in part on earlier work by Howe (1955), Anderson and Rubin (1956), and Lawley (1958). The CFA method was more fully developed by Karl Jöreskog in the 1960s to test whether a set of items defined a construct. Jöreskog completed his dissertation in 1963, published the first article on CFA in 1969, and subsequently helped develop the first CFA software program. Factor analysis has been used for more than 100 years to create measurement instruments in many academic disciplines. CFA today uses observed variables derived from measurement instruments to test the existence of a theoretical construct. Goldberg (1990) used CFA to confirm the Big Five model of personality. His five-factor model of extraversion, agreeableness, conscientiousness, neuroticism, and intellect was confirmed through the use of multiple indicator variables for each of the five hypothesized constructs.
The path model was developed by Sewall Wright (1918, 1921, 1934), a biologist. Path models use correlation coefficients and multiple regression equations to model more complex relations amongst observed variables. The first application of path models dealt with animal behavior. Unfortunately, path analysis was largely overlooked until econometricians reconsidered it in the 1950s as a form of simultaneous equation modeling (Wold, 1954) and sociologists rediscovered it in the 1960s (Duncan, 1966) and 1970s (Blalock, 1972). A path analysis involves solving a set of simultaneous regression equations that theoretically establish the relations amongst the observed variables in the path model. Parkerson et al. (1984) conducted a path analysis to test Walberg’s theoretical model of educational productivity for fifth- through eighth-grade students. The relations amongst the following variables were analyzed in a single model: home environment, peer group, media, ability, social environment, time on task, motivation, and instructional strategies. All of the hypothesized paths among those variables were shown to be statistically significant, providing support for the educational productivity path model.
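To make the idea of solving simultaneous regression equations from correlation coefficients concrete, here is a minimal sketch of the smallest possible path model: X influences Y directly and also indirectly through M. The three correlations, the variable labels, and the function name are hypothetical and are not taken from Parkerson et al. (1984). The sketch assumes all variables are standardized, so the path coefficients are standardized regression weights.

```python
def standardized_path_coefficients(r_xm, r_xy, r_my):
    """Solve the two regression equations of a three-variable path model.

    Model (all variables standardized):
        M = a * X + error
        Y = direct * X + b * M + error
    Returns (a, b, direct).
    """
    # Regressing M on the single predictor X: the weight equals the correlation.
    a = r_xm
    # Regressing Y on X and M: standard two-predictor standardized weights.
    denominator = 1.0 - r_xm ** 2
    direct = (r_xy - r_my * r_xm) / denominator
    b = (r_my - r_xy * r_xm) / denominator
    return a, b, direct


# Hypothetical correlations, e.g. X = home environment, M = motivation,
# Y = achievement. These values are invented for illustration.
r_xm, r_xy, r_my = 0.5, 0.4, 0.6

a, b, direct = standardized_path_coefficients(r_xm, r_xy, r_my)
indirect = a * b            # effect of X on Y that passes through M
total = direct + indirect   # reproduces the observed correlation r_xy

print("a (X -> M):", round(a, 3))
print("b (M -> Y):", round(b, 3))
print("direct effect (X -> Y):", round(direct, 3))
print("indirect effect (X -> M -> Y):", round(indirect, 3))
print("total effect:", round(total, 3))
```

In this simple model the direct and indirect effects add up to the original correlation between X and Y, which is one way a path model decomposes a correlation into its component effects.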
Structural equation models (SEM) combine path models and confirmatory factor models when establishing hypothesized relations amongst latent variables. The early development of SEM models was due to Karl Jöreskog (1969, 1973), Ward Keesling (1972), and David Wiley (1973); the approach was initially known as the JKW model, but became known as the linear structural relations model (LISREL) with the development of the first software program, LISREL, in 1973. Jöreskog and van Thillo originally developed the LISREL software program at the Educational Testing Service (ETS) using a matrix command language that used Greek and matrix notation. The first publicly available version, LISREL III, was released in 1976. By 1993, LISREL8 was released; it introduced the SIMPLIS (SIMPle LISrel) command language in which equations were written using variable names. In 1999, the first interactive version of LISREL was released. LISREL8 introduced the dialog box interface using pull-down menus and point-and-click features to develop models. The path diagram mode permitted drawing a program to develop models. LISREL9 has since been released with new features to address categorical and continuous variables. Karl Jöreskog was recognized by Cudeck, DuToit, and Sörbom (2001) who edited a Festschrift in honor of his contributions to the field of structural equation modeling. Their volume contains chapters by scholars who addressed the many topics, concerns, and applications in the field of structural equation modeling today, including milestones in factor analysis; measurement models; robustness, reliability, and fit assessment; repeated measurement designs; ordinal data; and interaction models.
The field of structural equation modeling across all disciplines has expanded since 1994. Hershberger (2003) found that between 1994 and 2001 the number of journal articles concerned with SEM increased, the number of journals publishing SEM research increased, SEM became a popular choice amongst multivariate methods, and the journal Structural Equation Modeling became the primary source for technical developments in structural equation modeling, and continues so today. SEM research articles are now more prevalent than ever in professional journals of several different academic disciplines (medicine, psychology, business, education, etc.).
WHY CONDUCT STRUCTURAL EQUATION MODELING?
Why is structural equation modeling popular? There are at least four major reasons for the popularity of SEM. First, researchers are becoming more aware of the need to use multiple observed variables to investigate their area of scientific inquiry. Basic statistical methods utilize only a limited number of independent and dependent variables, and thus do not test theoretical relations amongst multiple variables. The use of a small number of variables to understand complex phenomena is limited. For instance, the bivariate correlation is not sufficient for examining prediction when using multiple variables in a regression equation. In contrast, structural equation modeling permits relations amongst multiple variables to be modeled and statistically tested. SEM techniques are therefore a preferred method to confirm (or disconfirm) theoretical models.
Second, a greater recognition has been given to the validity and reliability of observed scores from measurement instruments. Specifically, measurement error has become a major issue in many disciplines, although historically, statistical analysis of data has ignored measurement error. Structural equation modeling techniques explicitly take measurement error into account when statistically analyzing data. SEM analysis includes latent and observed variables with their associated measurement error terms in the many different SEM models.
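One way to see why ignoring measurement error matters is the classical attenuation relation, sketched below. This is an illustration under classical test theory assumptions (random, uncorrelated measurement errors) and is not a description of how any particular SEM program estimates a model. The function names, the true correlation, and the reliability values are all hypothetical.

```python
import math


def attenuated_correlation(true_r, reliability_x, reliability_y):
    """Correlation expected between two observed scores that contain error.

    Assumes classical test theory: each reliability is the proportion of
    observed-score variance that is true-score variance, and the measurement
    errors are random and uncorrelated.
    """
    for reliability in (reliability_x, reliability_y):
        if not 0.0 < reliability <= 1.0:
            raise ValueError("reliability must be in the interval (0, 1]")
    return true_r * math.sqrt(reliability_x * reliability_y)


def disattenuated_correlation(observed_r, reliability_x, reliability_y):
    """Estimate the correlation between constructs from an observed correlation."""
    return observed_r / math.sqrt(reliability_x * reliability_y)


# Hypothetical values: the constructs correlate 0.60, but the two
# instruments have score reliabilities of 0.80 and 0.70.
true_r = 0.60
observed_r = attenuated_correlation(true_r, 0.80, 0.70)
recovered_r = disattenuated_correlation(observed_r, 0.80, 0.70)

print("correlation between constructs:", true_r)
print("correlation between observed scores:", round(observed_r, 3))
print("correlation after correcting for unreliability:", round(recovered_r, 3))
```

The observed-score correlation understates the relation between the constructs whenever reliability is below 1, which is the kind of bias that modeling measurement error terms explicitly is meant to address.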
Third, structural equation modeling has matured over the past 40 years, especially the software programs and the ability to analyze more advanced theoretical SEM models. For example, group differences in theoretical models can be tested with multiple-group SEM models. The analysis of educational data collected at more than one level (school districts, schools, and teachers) with student data is now possible using multi-level SEM models. SEM models are no longer limited to linear relations; interaction terms can now be included in an SEM model so that main effects and interaction effects can be tested. The improvement in SEM software has led to advanced SEM models and techniques, which have provided researchers with an increased capability to analyze sophisticated theoretical models of complex phenomena.
Fourth, SEM software programs have become increasingly user-friendly. Until 1993, SEM modelers had to input the program syntax for their models using Greek and matrix notation. At that time, many researchers sought help because of the complex programming and knowledge of the SEM syntax that was required. Today, SEM software programs are easier to use and contain features similar to other Windows-based software packages, for example, pull-down menus, data spreadsheets, and a simple set of commands. However, the user-friendly SEM software, which permits more access by researchers, comes with concerns about proper usage. Researchers need the prerequisite training in statistics and specifically in SEM modeling. Fortunately, courses, workshops, and textbooks are available to help researchers avoid mistakes and errors in analyzing sophisticated theoretical models using SEM. Which SEM software program you choose may also influence your level of expertise and what type of SEM models can be analyzed.
STRUCTURAL EQUATION MODELING SOFTWARE
Structural equation modeling can be easily understood if the researcher has a grounding in basic statistics, correlation, regression, and path analysis. Some SEM software programs provide a pull-down menu with these capabilities, while others come included in a statistics package where they can be computed. Although the LISREL program was the first SEM software program, other software programs have been developed since the mid-1980s. These include EQS, Mplus, Mx, R, Proc Calis (SAS), AMOS (SPSS), Sepath (Statistica), and SEM (STATA), to name a few. These software programs are each unique in their own way, with some offering specialized features for conducting different SEM applications. Many of these SEM software programs provide statistical analysis of raw data (means, correlations, missing data conventions), provide routines for handling missing data and detecting outliers, generate the program’s syntax, diagram the model, and provide for import and export of data and figures of a theoretical model. Also, many of the SEM software programs come with sets of data and program examples that are clearly explained in their user guides. Many of these software programs have been reviewed in the journal Structural Equation Modeling.
The pricing information for SEM software varies depending on individual, group, or site license arrangements; corporate versus educational settings; and even