
The Pennsylvania State University The Graduate School

Department of Educational Psychology, Counseling, and Special Education

THE PERFORMANCE OF MODEL FIT MEASURES BY

ROBUST WEIGHTED LEAST SQUARES ESTIMATORS IN

CONFIRMATORY FACTOR ANALYSIS

A Dissertation in Educational Psychology

by Yu Zhao

© 2015 Yu Zhao

Submitted in Partial Fulfillment of the Requirements

for the Degree of Doctor of Philosophy


The dissertation of Yu Zhao was reviewed and approved* by the following:

Pui-Wa Lei

Associate Professor of Education
Dissertation Advisor
Chair of Committee

Hoi K. Suen

Distinguished Professor of Educational Psychology

Jonna M. Kulikowich
Professor of Education

Aleksandra Slavkovic

Associate Professor of Statistics

Robert J. Stevens

Professor of Educational Psychology

Program Coordinator of Educational Psychology

ABSTRACT

Despite the prevalence of ordinal observed variables in applied structural equation modeling (SEM) research, limited attention has been given to model evaluation methods suitable for ordinal variables, leaving practitioners in the field with few guidelines to follow. This dissertation represents a first attempt to thoroughly examine the performance of five fit measures — the 𝜒2 statistic, the Comparative Fit Index (CFI), the Tucker-Lewis Index (TLI), the Root Mean Square Error of Approximation (RMSEA), and the Standardized Root Mean Square Residual (SRMR) — in Confirmatory Factor Analysis (CFA) model evaluation. The fit measures were produced by the mean- and variance-corrected Weighted Least Squares (WLSMV) estimator in Mplus 7 and the Diagonally Weighted Least Squares (DWLS) estimator in LISREL 9.1, both forms of the Robust Weighted Least Squares (RWLS) estimator designed to accommodate ordinal and nonnormal observed variables. Their performance was examined under various realistic sample, data, and model conditions, especially different types and degrees of model misspecification. This study also empirically examined the applicability of the most widely used cut-off criteria for the fit indices, proposed by Hu and Bentler (1999), in RWLS estimation with ordinal variables.

Results showed that in evaluating the goodness-of-fit of CFA models with ordinal variables, fit measures generated by Mplus WLSMV seemed to be more effective and reliable than those produced by LISREL DWLS across the studied conditions. The WLSMV fit measures generally maintained good Type I error control and were powerful enough to detect moderate model misspecification, provided that the model was not too large. The DWLS fit measures, on the other hand, were susceptible to the influence of small sample size and could be greatly inflated or deflated when a small sample was used to evaluate a large model. In addition, Hu and Bentler's (1999) cut-off criteria, despite their popularity among applied SEM researchers, were not universally applicable in RWLS model evaluation, mainly because all of the fit indices examined varied systematically with the size of the proposed model. Based on the results of the current study, recommendations are made at the end of the dissertation on practical issues pertaining to real-life CFA model evaluation with ordinal observed variables, such as the minimum sample size required and how to use the information provided by the RWLS fit measures to make model-data fit decisions, while taking into consideration the sample, data, and model characteristics specific to researchers' own studies.


TABLE OF CONTENTS

List of Figures ... vii

List of Tables ... ix

Acknowledgements ... x

Chapter 1 Introduction ... 1

1.1 Background ... 1

1.2 Factors that affect the performance of the model fit measures ... 12

1.2.1 Estimation methods ... 12

1.2.2 Model misspecification ... 13

1.2.3 Sample size and model size ... 17

1.2.4 Scale and distribution of the observed variables ... 19

1.3 Evaluation of model fit indices ... 20

1.4 Limitations of past research ... 21

1.5 Statement of the problem ... 24

1.6 Hypotheses ... 25

1.7 Significance and contribution of the study ... 26

Chapter 2 Literature Review ... 29

2.1 The Confirmatory Factor Analysis model ... 29

2.1.1 Model specification ... 29

2.1.2 Model estimation ... 30

2.1.3 Model evaluation ... 41

2.2 Empirical study results of the RWLS-based model fit measures ... 47

2.2.1 Findings on model fit measures ... 47

2.2.2 Findings on cut-off criteria for the model fit indices ... 55

2.3 Purpose of the study ... 60

Chapter 3 Method ... 61

3.1 Design factors ... 63

3.1.1 Estimation methods ... 63

3.1.2 Model Size ... 64

3.1.3 Sample size ... 66

3.1.4 Level of nonnormality ... 67

3.1.5 Number of variable categories ... 68

3.1.6 Model misspecification ... 69

3.2 Data generation ... 76

Chapter 4 Results ... 80

4.1 Convergence rates ... 80

4.2 The 𝜒2 statistics ... 90

4.3 CFI ... 99

4.4 TLI ... 106

4.5 RMSEA ... 113

4.6 SRMR ... 121

Chapter 5 Discussion ... 128

5.1 Mplus WLSMV vs. LISREL DWLS ... 129

5.2 The RWLS fit statistics and indices ... 131

5.3 Model size effect ... 137

5.4 Cut-off criteria for the fit indices ... 140

5.5 Types and degrees of model misspecification ... 142

5.6 Conclusion ... 144

5.7 Recommendations ... 146

5.8 Examples of applying recommendations in practice ... 148

5.8.1 Example 1: Planning sample size with existing instruments (recommendation #2) ... 150

5.8.2 Example 2: Using 𝜒2 statistic with fit indices to justify decision (recommendation #4) ... 151

5.8.3 Example 3: Treating SRMR with caution (recommendation #5) ... 154

5.9 Limitations and future directions ... 156

Appendix A: Population Parameters of the Models ... 159

Appendix B: Examples of Programming Codes ... 181

Appendix C: Tables of the Convergence Rates, Estimated Mean Values and Rejection Rates of the RWLS Fit Measures ... 184

Appendix D: Examples of RMSEAs from LISREL 9.1, LISREL 8.8, and Mplus 7.0 ... 275


LIST OF FIGURES

Figure 1-1: Percentage distribution of the 50 CFA studies that used 2- to 6+-point categorical variables ... 9

Figure 1-2: Percentage distribution of the 50 CFA studies that used different sample sizes ... 18

Figure 1-3: Percentage distribution of the 50 CFA studies that used different N:t ratios ... 18

Figure 3-1: Relative frequency distribution of model sizes (# of indicators) from the 50 CFA studies ... 65

Figure 3-2: Misspecification of the number of factors ... 70

Figure 3-3: Misspecification of the cross-loadings ... 71

Figure 3-4: Misspecification of the error correlations. ... 72

Figure 3-5: Misspecification of the factor correlation. ... 73

Figure 4-1: Mplus convergence rates by MTD and sample size ... 84

Figure 4-2: Mplus convergence rates by variable scale and sample size ... 85

Figure 4-3: Mplus convergence rates by variable distribution and sample size. ... 86

Figure 4-4: LISREL convergence rates by model size, variable scale, and sample size. ... 88

Figure 4-5: LISREL convergence rates by model size, variable distribution, and sample size ... 89

Figure 4-6: Mplus chi-square rejection rates by MTD and sample size ... 94

Figure 4-7: Mplus chi-square rejection rates by model size ... 95

Figure 4-8: LISREL S-B scaled chi-square rejection rates by MTD and sample size ... 97

Figure 4-9: LISREL S-B adjusted chi-square rejection rates by MTD and sample size ... 98

Figure 4-10: Distributions of the Mplus and the LISREL CFIs by studied factors (correct model) ... 100

Figure 4-11: Mplus CFI by MTD and model size ... 103

Figure 4-12: LISREL CFI by MTD and model size ... 105

Figure 4-14: Distributions of the Mplus and the LISREL TLIs by studied factors (correct model) ... 107

Figure 4-15: Mplus TLI by MTD and model size ... 111

Figure 4-16: LISREL NNFI by MTD and model size ... 112

Figure 4-17: LISREL NNFI by sample size ... 113

Figure 4-18: Distributions of the Mplus and the LISREL RMSEAs by studied factors (correct model). ... 114

Figure 4-19: Mplus RMSEA by MTD and model size ... 118

Figure 4-20: LISREL RMSEA by model size and sample size ... 119

Figure 4-21: LISREL RMSEA by MTD ... 120

Figure 4-22: Distributions of the Mplus and the LISREL SRMRs by studied factors (correct model). ... 122

Figure 4-23: Mplus SRMR by sample size ... 125


LIST OF TABLES

Table 1-1: Summary of 50 CFA Empirical Studies ... 4

Table 3-1: Summary of the simulation design ... 63

Table 3-2: List of cut points used in the study ... 77

Table 4-1: Proportion of Variance Due to Studied Factors: Convergence Rates ... 83

Table 4-2: Proportion of Variance Due to Studied Factors: Chi-square Rejection Rates ... 93

Table 4-3: Proportion of Variance Due to Studied Factors: CFI ... 102

Table 4-4: Proportion of Variance Due to Studied Factors: TLI ... 109

Table 4-5: Proportion of Variance Due to Studied Factors: RMSEA ... 117

Table 4-6: Proportion of Variance Due to Studied Factors: SRMR ... 124


ACKNOWLEDGEMENTS

This dissertation would not have been possible without many people. First and foremost, I would like to express my deepest gratitude to my academic and dissertation advisor, and a good friend, Dr. Pui-Wa Lei, for her excellent guidance, support, patience, and encouragement during my entire doctoral study. She not only ignited my interests in education-related methodological and statistical studies, but also saw me through the struggling stages with basic mathematical theories and psychometric concepts. Her guidance was strategic while flexible, allowing me the room to grow as an independent researcher. She believed in my abilities more than I did in myself, and challenged me with higher level academic courses and advanced research topics I would not have otherwise completed. She is also a cheerful friend, who has always been able to look at things from a positive side to enlighten my days. I am so fortunate to have Dr. Lei and could not have asked for a better advisor.

I also owe my thanks to my committee members, Dr. Hoi Suen, Dr. Jonna Kulikowich, and Dr. Aleksandra Slavkovic. Dr. Suen has been a role model for my future career. He is not just an accomplished researcher, but also an amazing instructor, who has the ability to enliven abstract methodological theories and concepts, making his measurement classes fun to attend. He is also a supportive and reassuring friend to me. Dr. Kulikowich constantly inspired me with her passion and persistence in connecting methodological studies to real-life educational research. Dr. Slavkovic is my dissertation committee member as well as my master's advisor in applied statistics. She taught me how to be professional and precise not only as an educational researcher, but also as a statistician. Their insightful and constructive feedback on my dissertation and other related research projects, and their flexibility to work along a timeline most suitable to me, are much appreciated.


Special thanks must go to other professors with whom I have worked closely during my doctoral studies, including Dr. Lisa Lenze, Dr. Paul Morgan, and Khanjan Mehta. They have provided me with invaluable opportunities to apply the methodological theories I have learned to real-life practice. They mentored me with their expertise, treated me with patience, and respected me as an immature but independent researcher. It has been my honor to work with them.

I am indebted to my family and friends for many reasons. I want to thank my parents for supporting me and each one of my decisions along the way. They saw my doctoral studies and my dissertation as the utmost important endeavor at this stage of my life, and tried their best to help me with other things so that I could focus on this academic goal. My husband, Kang, has been my biggest moral support. He encouraged me when I felt uninspired, comforted me when I was frustrated, complimented me on each small achievement, and embraced me during every little setback. I would not have come so far without his endless love. And my precious daughter Khloe, who has been so sweet and understanding, has brightened my days and brought so much joy to my life since she was born. My thanks also go to the friends I have worked with during my doctoral studies and those I have not had a chance to work with but who have always stood by my side. Thank you all for being part of my life.


To my husband Kang Zhao,
my daughter Khloe Zhao,
my parents Zhihong Sun and Qiusheng Zhao,
and my son, who is still in my belly

Chapter 1

Introduction

1.1 Background

Structural equation modeling (SEM) has been widely used in social and behavioral research to model relationships among multiple variables, observed and/or latent. It is a family of versatile statistical modeling tools that encompass exploratory modeling (e.g., exploring direct effects; Asparouhov & Muthén, 2009), confirmatory modeling (e.g., confirming factor structures), cross-sectional modeling (e.g., testing factor invariance across groups), longitudinal modeling (e.g., modeling latent growth over time), and many more. SEM is a covariance structure analysis technique. Covariance structure analysis tests theories about correlated variables that are represented in a system of equations describing the "unidirectional and bidirectional influences of several variables on each other" (Bentler & Bonett, 1980, p.588). At the heart of covariance structure analysis, as its name indicates, lie the correlations and covariances of the variables. Researchers derive models from substantive theories and fit them to empirical data to see whether the models can explain the inter-correlations or covariances among the variables.

Confirmatory factor analysis (CFA) is a member of the general SEM family. The multiple-indicator measurement models analyzed in CFA represent "half the basic rationale of analyzing covariance structures in SEM" (Kline, 2011, p.230), with the other half being the analysis of structural models. CFA plays an important role in SEM analysis because many […] little sense to relate constructs within an SEM model if the factors specified as part of the model are not worthy of further attention" (p.110). The method of CFA involves analyzing measurement models in which not only the number of latent factors but also their specific relations to the indicators are pre-specified based on substantive theories. Theory plays a paramount role in the specification, testing, and interpretation of the models. Because of the theory-driven nature of the predictions, much stronger inferences can be made from confirmatory than from exploratory models (Curran, 1994).
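In standard LISREL-type notation (a sketch consistent with common textbook treatments; the dissertation's own formulation appears in chapter 2), a CFA measurement model and the covariance structure it implies can be written as:

```latex
% Observed indicators x are linear functions of latent factors \xi plus
% measurement errors \delta; \Lambda holds the (pre-specified) loadings.
\[
x = \Lambda \xi + \delta
\]
% The model-implied covariance matrix fitted to the sample data, where
% \Phi is the factor covariance matrix and \Theta_{\delta} the (usually
% diagonal) error covariance matrix.
\[
\Sigma(\theta) = \Lambda \Phi \Lambda^{\top} + \Theta_{\delta}
\]
```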

CFA is a popular technique that can be used to test a variety of hypotheses about a set of observed variables. For many applied researchers, it is a primary tool for construct validation (e.g., Gupta, Ganster, & Kepes, 2013; McDermott et al., 2011; Prati, 2012; Van Eck, Finney, & Evans, 2010), scale refinement (e.g., Immekus & Imbrie, 2010; Rowe, Kim, Baker, Kamphaus, & Horne, 2010), assessment of measurement invariance (e.g., Libbrecht, Lievens, & Schollaert, 2010; Randall & Engelhard, 2010; Segeritz & Pant, 2013), and even theory development (e.g., Greiff, Wüstenberg, & Funke, 2012; Kahraman, De Champlain, & Raymond, 2012). It is also used extensively in simulation studies (e.g., Beauducel & Herzberg, 2006; DiStefano, 2010; Flora & Curran, 2004; Lei, 2009; Muthén, du Toit, & Spisic, 1997; Yu, 2002).

Most applications of CFA involve five consecutive steps: model specification, identification, estimation, evaluation, and (possibly) modification/respecification. Since CFA is inherently a confirmatory modeling technique, relationships between observed variables and their latent constructs need to be specified a priori, with substantive justification. Pre-specified CFA models have to be identified so that a unique set of parameter estimates can be obtained when the model is estimated. After data are collected, a proper estimation method needs to be selected depending on the characteristics of the data and the model. Every estimation method generates a set of fit statistics and indices, which can be used to evaluate whether the model fits the empirical data well. If the fit of the model as evaluated by the fit statistics and indices is less than acceptable, then either the model needs to be modified post hoc, or a completely different model needs to be re-specified.

Although all steps involved in a CFA application are essential to the success of the analysis, model estimation and evaluation are arguably the most important (Bentler & Bonett, 1980; Hu & Bentler, 1999; Yuan, 2005). Traditionally, parameters in structural equation models are estimated by maximum likelihood (ML) estimation and model-data fit is evaluated by chi-square tests. However, because some of the assumptions required by ML estimation and chi-square tests in SEM, such as multivariate normality of the observed variables and exact fit of the hypothesized model to the population covariance matrix, are not realistic in practice (Bollen, 1989), other estimators (e.g., weighted least squares, or WLS) and fit indices (e.g., approximate fit indices) with more relaxed assumptions were developed.

Ever since Bentler and Bonett (1980) introduced the use of fit indices1 to compare and evaluate different models, many fit indices have been developed and examined for this purpose. With the advance of such indices, applied SEM researchers have come to rely more on fit indices than on chi-square tests for decisions about model fit. For example, a review of 50 randomly selected CFA empirical studies published recently (2010-2014) in major journals of educational and psychological research (e.g., Applied Psychological Measurement, Applied Measurement in Education, Educational and Psychological Measurement, Journal of Applied Psychology, Psychological Assessment, etc.; see Table 1-1) revealed that all of these papers used information provided by fit indices (e.g., the comparative fit index, the root mean square error of approximation, etc.) to support their model fit decisions.

1 Here "fit indices" refers to the approximate fit indices and does not include the 𝜒2 statistic. The model fit measures can be roughly classified into two categories: (1) model test statistics, such as the 𝜒2 statistic, which evaluate the proposed model using statistical hypothesis testing; and (2) fit indices, which are considered mainly descriptive. The current paper will follow this wording convention to distinguish the fit statistic from the fit indices. See Yuan (2005) for more details.

Table 1-1
Summary of 50 CFA Empirical Studies

| # | First author | Year | Sample size | Number of indicators | Scale of indicators | Estimation method | Fit measures | Cut-off criteria | Software |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Adelson | 2010 | 303 | 18 | 4- & 5-point | ML | 𝜒2, CFI, TLI, RMSEA | | Amos |
| 2 | Benson | 2013 | 730 | 31 | | ML | 𝜒2, CFI, RMSEA, SRMR, AIC, BIC | Hu & Bentler, 1999 | Amos |
| 3 | Berzonsky | 2013 | 440 | 27 | 5-point | ML | 𝜒2, RMSEA, SRMR | Hu & Bentler, 1999 | LISREL |
| 4 | Bowden | 2011 | 2200/680 | | | ML/MLM | 𝜒2, CFI, TLI, RMSEA, SRMR, ECVI | | Mplus |
| 5 | Brown | 2011 | 1282 | 14 | 5-point | ML | 𝜒2, CFI, RMSEA, SRMR | Hu & Bentler, 1999 | LISREL |
| 6 | Burrow-Sánchez | 2014 | 106 | 25 & 10 | 5-point | ML | 𝜒2, CFI, NFI, RMSEA | Byrne, 2010 | Amos |
| 7 | Cai | 2012 | 367 | 40 | Dichotomous & 5-point | DWLS | 𝜒2, CFI, RMSEA, SRMR, AIC | Hu & Bentler, 1999 | LISREL |
| 8 | Cieslak | 2013 | 247 | 7 | 7-point | ML | CFI, RMSEA, SRMR | | Amos |
| 9 | Coates | 2013 | 128 | 10 | 5-point | WLSMV | 𝜒2, CFI, RMSEA | Hu & Bentler, 1999 | Mplus |
| 10 | Curseu | 2012 | 102 | 25 | 5-point | ML | 𝜒2, CFI, TLI, RMSEA | Yuan & Bentler, 2004 | Amos |
| 11 | Dedrick | 2011 | 378 | 12 | 5-point | ML | CFI, RMSEA, SRMR, BIC | Hu & Bentler, 1999 | Mplus |
| 12 | Dunn | 2012 | 175 | 8 | 7-point | | 𝜒2, CFI, GFI, … | | |
| 13 | Esbjørn | 2013 | 974 | 30 | 4-point | WLSMV | CFI, TLI, RMSEA | Schreiber, Nora, Stage, Barlow, & King, 2006 | Mplus |
| 14 | France | 2010 | 350 | 8 | 5-point | ML | 𝜒2, CFI, RMSEA, SRMR | Hu & Bentler, 1999 | LISREL |
| 15 | Gagné | 2010 | 1115/529 | 12 | 4- & 7-point | RML | 𝜒2, CFI, RMSEA | Yᵃ | |
| 16 | Greiff | 2012 | 114 | 33 | dichotomous | WLSMV | 𝜒2, CFI, TLI, RMSEA | Hu & Bentler, 1999 | Mplus |
| 17 | Gupta | 2013 | 445 | 14 | 5-point | | 𝜒2, RMSEA, SRMR | | Amos |
| 18 | Hardré | 2012 | 130 | 60 | 7-point | | 𝜒2, CFI, RMSEA | Schumacker & Lomax, 2004 | |
| 19 | Immekus | 2010 | 745/1500 | 20 | 5-point | WLSMV | 𝜒2, CFI, RMSEA, SRMR | Hu & Bentler, 1999 | Mplus |
| 20 | Joseph | 2010 | 280 | 16 & 50 | 5-point | | 𝜒2, CFI, TLI, RMSEA, SRMR | | LISREL |
| 21 | Kahraman | 2012 | 9000 | 11 | 9-point | ML | 𝜒2, CFI, TLI, AIC | Yᵃ | Mplus |
| 22 | Koster | 2011 | 237 | 8 | 5-point | ML | GFI, RMSEA | Jaccard & Wan, 1996 | LISREL |
| 23 | Kuenssberg | 2011 | 110 | 72 | 4-point | MLM | 𝜒2, CFI, TLI, RMSEA, SRMR | Hu & Bentler, 1999 | Mplus |
| 24 | Lac | 2013 | 289 | 17 | 7-point | RML | 𝜒2, CFI, IFI, RMSEA | Browne & Cudeck, 1993; MacCallum, Browne, & Sugawara, 1996 | EQS |
| 25 | Lakin | 2012 | 2631 | 108 | dichotomous | WLSMV | 𝜒2, CFI, RMSEA | Kline, 2004 | Mplus |
| 26 | Leite | 2010 | 14211 | 16 | dichotomous | WLSMV | 𝜒2, CFI, TLI, … | | |
| 27 | Libbrecht | 2010 | 154 | 16 | 5-point | RML | 𝜒2, CFI, IFI, RMSEA | Yᵃ | EQS |
| 28 | Little | 2013 | 20181 | 16 | 4-point | WLSMV | 𝜒2, CFI, RMSEA, SRMR | Hu & Bentler, 1999; Byrne, 2006 | Mplus |
| 29 | Loke | 2014 | 275 | 12 | 5-point | WLSMV | 𝜒2, CFI, TLI, RMSEA | Hu & Bentler, 1999 | Mplus |
| 30 | Martin | 2010 | 21579 | 44 | 7-point | RML | 𝜒2, CFI, RMSEA, SRMR, AIC | Browne & Cudeck, 1993 | Mplus |
| 31 | McDermott | 2011 | 980 | 48 | 3-point | | CFI, RMSEA | Hu & Bentler, 1999 | |
| 32 | Merz | 2014 | 5313 | 12 | 4-point | MLM | 𝜒2, RMSEA, SRMR | Hu & Bentler, 1999 | Mplus |
| 33 | Mesmer-Magnus | 2010 | 390 | 31 | 5-point | RML | 𝜒2, CFI, IFI, RMSEA | Hu & Bentler, 1999 | EQS |
| 34 | Myers | 2010 | 257-419 | 18 | 5-point | WLSMV | 𝜒2, CFI, TLI, RMSEA, SRMR | | Mplus |
| 35 | NG | 2010 | 640 | 33 | 5-point | | 𝜒2, CFI, NNFI, AGFI, PGFI, PNFI, RMSEA, SRMR | | LISREL |
| 36 | Prati | 2012 | 863 | 108 | 4- & 5-point | WLSMV | 𝜒2, CFI, NFI, RMSEA | | Mplus |
| 37 | Randall | 2010 | 569/219 | 12 | dichotomous | WLSMV | 𝜒2, CFI, RMSEA | Byrne, 2006; Kline, 2005 | Mplus |
| 38 | Rowe | 2010 | 322 | 26 | 4-point | | 𝜒2, CFI, RMSEA, SRMR | Hu & Bentler, 1999 | Amos |
| 39 | Ryser | 2010 | 1263/803 | 39 | 4-point | RML | 𝜒2, CFI, RMSEA, SRMR | Yu & Muthén, 2002 | LISREL |
| 40 | Schroeders | 2011 | 200 | 57 | dichotomous | WLSMV | 𝜒2, CFI, RMSEA, … | | |
| 41 | Segeritz | 2013 | 1200/1400/13000 | 45 | | RML | 𝜒2, CFI, TLI, RMSEA, SRMR | Browne & Cudeck, 2003 | |
| 42 | Su | 2014 | 100 | 12 | Continuous (summed) | ML | 𝜒2, CFI, IFI, TLI, RMSEA, SRMR, AIC | Hu & Bentler, 1999 | LISREL |
| 43 | Takishima-Lacasa | 2014 | 604 | 29 | 5-point | WLSMV | 𝜒2, CFI, TLI, RMSEA | Hu & Bentler, 1999; Browne & Cudeck, 1993 | Mplus |
| 44 | Teo | 2010 | 313 | 18 | 5-point | ML | 𝜒2, CFI, TLI, RMSEA, SRMR | Hu & Bentler, 1999 | Amos |
| 45 | Thomas | 2013 | 570/478 | 42 | 5-point | ML | 𝜒2, CFI, RMSEA, SRMR | Hu & Bentler, 1999 | LISREL |
| 46 | Van den Broeck | 2013 | 641 | 50 | 5-point | WLSMV | 𝜒2, CFI, TLI, RMSEA | Hu & Bentler, 1999 | Mplus |
| 47 | Van Eck | 2010 | 250 | 18 | 4-point | RML/WLSM/WLSMV | 𝜒2, CFI, RMSEA | Yu & Muthén, 2002 | LISREL/Mplus |
| 48 | Williams, M. | 2014 | 940/235/424 | 39 & 26 | 5-point | ML | 𝜒2, CFI, NNFI, RMSEA, SRMR, AIC | Schermelleh-Engel et al., 2003 | SAS |
| 49 | Williams, S. | 2014 | 173/184 | 17 | 6-point | | 𝜒2, CFI, RMSEA, SRMR | Weston & Gore, 2006 | |
| 50 | Xu | 2010 | 681/306 | 15 | 4-point | RML | 𝜒2, CFI, RMSEA, SRMR | Hu & Bentler, 1999; Browne & Cudeck, 1993; Byrne, 2008 | EQS |

Note. ᵃThis means that the paper employed cut-off values of some sort, but did not specify sources.


Although most of these papers reported the values of the 𝜒2 statistics and their associated p values, few of them treated the chi-square test as evidence of model-data fit. The advantage of fit indices is that they are single summary measures intended to convey overall model fit, and they are easy to interpret because many of them vary along a 0-to-1 continuum. Despite these advantages, however, robustness studies have found that the performance of fit indices can be differentially affected by the estimation method used, sample size, the scale and distribution of the variables, and other factors that may not be completely understood by researchers. The instability of a fit index essentially means that it can give different or even contradictory information about model fit when, for example, different estimation methods are employed to analyze the same model, or the same model is tested with samples of different sizes.

What makes the matter even more complicated is the evaluation of the fit indices themselves. That is, after researchers obtain fit indices from SEM software, how do they know whether a given value indicates a "close" fit? Some general guidelines have been offered (e.g., Bentler & Bonett, 1980; Browne & Cudeck, 1993; Hu & Bentler, 1999; Marsh & Hau, 1996; Rigdon, 1996; Yu, 2002) as to the ranges of values that constitute an "acceptable" model fit for different fit indices. These guidelines are employed in most applied SEM studies published in major educational and psychological journals (i.e., 41 of the 50 papers reviewed used cut-off values for decision making; see Table 1-1). However, only a few simulation studies have systematically examined the appropriateness of these proposed cut-off values, and only under limited sets of conditions. With such a small number of simulated conditions, the external validity of the proposed cut-off values is questionable.


In social and behavioral science research, it is common for a set of ordinal-scaled items (e.g., items with five categories: 1 = strongly disagree, 2 = disagree, 3 = neither agree nor disagree, 4 = agree, 5 = strongly agree) to be used to measure one or more psychological constructs. For example, 82% of the 50 randomly chosen CFA empirical studies reviewed analyzed ordinal variables with five or fewer categories (see Figure 1-1). Such ordinally scaled items yield observed variables that are "coarse" and "crude" categorizations of the underlying latent continuous variables (Finney & DiStefano, 2013). While Finney and DiStefano (2013) pointed out that all ordinal variables are inherently non-continuous, whether a variable with many categories can be treated as continuous in analysis seems to be a subjective matter (e.g., LISREL by default treats observed variables with 15 or fewer categories as ordered categorical). All in all, in many SEM applications, variables with five or fewer categories are considered ordinal rather than continuous, and ML estimation is not recommended for analyzing ordinal variables (Finney & DiStefano, 2013).
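To make the "coarse categorization" idea concrete, the following minimal sketch (hypothetical thresholds and sample size, not the dissertation's actual data-generation design, which is described in chapter 3) discretizes a latent continuous response variable into a 5-point ordinal item:

```python
import numpy as np

rng = np.random.default_rng(42)

# Latent continuous response variable (standard normal for simplicity).
latent = rng.standard_normal(1000)

# Hypothetical thresholds carving the latent continuum into five categories;
# unevenly spaced thresholds yield a skewed (nonnormal) observed distribution.
thresholds = np.array([-0.5, 0.2, 0.8, 1.5])

# Each latent value is mapped to the ordinal category 1..5 it falls into.
ordinal = np.searchsorted(thresholds, latent) + 1

# Observed category proportions for categories 1 through 5.
print(np.bincount(ordinal)[1:] / len(ordinal))
```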

Figure 1-1. Percentage distribution of the 50 CFA studies that used 2- to 6+-point categorical variables. [Pie chart; segment values: 12%, 2%, 21%, 47%, 18%.]


The mean- and variance-corrected weighted least squares (WLSMV) and the diagonally weighted least squares (DWLS) are estimation techniques developed to accommodate ordinal variables without the computational intensity of the full WLS estimation. The WLSMV is implemented in Mplus (Muthén & Muthén, 1998-2012) and the DWLS is built in in LISREL (Jöreskog & Sörbom, 2012). Both estimators are forms of robust weighted least squares (RWLS), which does not make any distributional assumptions. RWLS is commonly used with polychoric correlations to better estimate associations of the latent response variables underlying categorical observed variables. The WLSMV and the DWLS differ in the weight matrices they employ and the calculation of the 𝜒2 statistics (which is elaborated in chapter 2). As a result, the fit indices

produced by the two estimators may also vary, though the values are most likely to be asymptotically equivalent (Muthén, 2005, 2006).
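In standard notation (a sketch following the general WLS literature; the estimator-specific details are elaborated in chapter 2), the weighted least squares family minimizes a quadratic form in the residuals of the sample statistics:

```latex
% s stacks the sample statistics (here, polychoric correlations and
% thresholds); \sigma(\theta) is the model-implied counterpart; W is a
% consistent estimate of the asymptotic covariance matrix of s.
\[
F_{\mathrm{WLS}}(\theta) = \left(s - \sigma(\theta)\right)^{\top} W^{-1} \left(s - \sigma(\theta)\right)
\]
% Robust WLS variants avoid inverting the full W by using only its
% diagonal in the fitting function,
\[
F_{\mathrm{RWLS}}(\theta) = \left(s - \sigma(\theta)\right)^{\top} \left[\operatorname{diag}(W)\right]^{-1} \left(s - \sigma(\theta)\right)
\]
% while the full W is retained for the mean (and variance) corrections to
% the test statistic and for robust standard errors.
```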

Initial research findings about the two estimators are encouraging. When the model was correctly specified, the WLSMV produced a 𝜒2 statistic whose Type I error rate (i.e., the rate of rejecting a correctly specified model) was close to the nominal level of 5% with relatively small samples (e.g., N > 250 when the model was not too complex) (Bandalos, 2008; Beauducel & Herzberg, 2006; Flora & Curran, 2004; Lei, 2009; Muthén et al., 1997). The DWLS was also a significant improvement over the WLS in terms of convergence rates, and the DWLS 𝜒2 statistic had acceptable Type I error rates at sample sizes smaller than those required by the WLS (DiStefano, 2010; Yang-Wallentin, Jöreskog, & Luo, 2010). However, because the two estimators were developed relatively recently (in 1997 and 2000, respectively, for the WLSMV and the DWLS), studies of their performance are rather limited, and some observations remain preliminary. For example, little is known about the performance of the two estimators when the model is misspecified. Given the large number of applied CFA studies that utilize ordinal data, and the high probability that a model is misspecified to some extent in real-life research (Curran, 1994; MacCallum, 1995), many more studies are needed to comprehensively assess the functioning of the two estimators and their associated fit measures (Finney & DiStefano, 2013).

Hu and Bentler (1999) once discussed two pressing issues pertaining to model evaluation faced by methodological and applied researchers:

"The first pressing issue is determination of adequacy of fit indexes under various data and model conditions often encountered in practice. These conditions include sensitivity of fit index to model misspecification, small sample bias, estimation method effect, effects of violation of normality and independence, and bias of fit indexes resulting from model complexity. The second pressing issue is the selection of the 'rules of thumb' conventional cutoff criteria for given fit indexes used to evaluate model fit" (p.4).

Keeping these two issues in mind, this study sought to gain a better understanding of the performance of five fit measures — the 𝜒2 statistic, the Comparative Fit Index (CFI), the Tucker-Lewis Index (TLI), the Root Mean Square Error of Approximation (RMSEA), and the Standardized Root Mean Square Residual (SRMR) — under the WLSMV and the DWLS estimators, with small samples, large models, and highly nonnormal variables, and with correct and misspecified models. In addition, this study empirically examined the behavior of the most widely used cut-off values, proposed by Hu and Bentler (1999), under broader conditions than the ones simulated in the original study, as an effort to investigate the external validity of those cut-off values.
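For reference, the conventional definitions of the four fit indices are sketched below using their common formulas; the 𝜒2 statistic itself is used for hypothesis testing. Software- and estimator-specific versions, discussed in chapter 2, can differ (e.g., in using N versus N − 1):

```latex
% \chi^2_M, df_M: statistic and degrees of freedom of the proposed model;
% \chi^2_B, df_B: those of the baseline (independence) model.
\[
\mathrm{RMSEA} = \sqrt{\frac{\max(\chi^2_M - df_M,\, 0)}{df_M\,(N-1)}}
\]
\[
\mathrm{CFI} = 1 - \frac{\max(\chi^2_M - df_M,\, 0)}{\max(\chi^2_B - df_B,\ \chi^2_M - df_M,\, 0)}
\]
\[
\mathrm{TLI} = \frac{\chi^2_B/df_B - \chi^2_M/df_M}{\chi^2_B/df_B - 1}
\]
% SRMR is the square root of the average squared difference between the
% sample and model-implied correlations (the standardized residuals r_{ij}
% and \hat{\rho}_{ij}).
\[
\mathrm{SRMR} = \sqrt{\overline{\left(r_{ij} - \hat{\rho}_{ij}\right)^2}}
\]
```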


1.2 Factors that affect the performance of the model fit measures

Hu and Bentler (1998) pointed out that there are four major problems involved in using fit indices for model evaluation: sensitivity of fit indices to model misspecification, estimation method effect, small-sample bias, and effects of violation of normality and independence (p.427). The first issue is related to the main practical point of using fit indices, that is, the ability of the fit indices to discriminate well- from badly-fitted models (Maiti & Mukherjee, 1991). The latter three are all natural consequences of the fact that "(fit) indices typically are based on chi-square tests" (Hu & Bentler, 1998, p.427). As will be discussed briefly below, and in more detail in chapter 2, the adequacy of a chi-square test may depend on the particular assumptions it requires. Violation of these assumptions may affect the performance of a chi-square test, and subsequently the fit indices. This section discusses these possible influences on the model fit measures and how they may relate to model evaluation under RWLS estimation.

1.2.1 Estimation methods

Choice of estimation method affects the quality of model parameter estimates, their associated standard error estimates, and the overall model fit statistics in CFA modeling (Lei & Wu, 2009). There are many estimation methods available for CFA. Each method is derived under a specific set of assumptions that must be met for proper estimation. Violation of these assumptions may affect the performance of the fit measures produced by the estimation methods, leading to inaccurate information and invalid conclusions about the fit of the model.

The RWLS estimation is based upon full WLS estimation, which is a distribution-free method. Furthermore, the RWLS estimation analyzes polychoric correlations, which take into consideration the categorical nature of the ordinal observed variables. Two implicit assumptions of the RWLS estimation are that the model is properly specified and the sample is sufficiently large. When these assumptions are met, the test statistics follow a chi-square distribution.

Previous research has found that when the model was properly specified, both WLSMV and DWLS 𝜒2 statistics had good control of Type I error at relatively small sample sizes (between 250 and 400; e.g., Bandalos, 2009; DiStefano, 2010; Lei, 2009; Yang-Wallentin, 2010). However, no consistent conclusions have been reached when the estimators are used with misspecified models, and little is known about the performance of the RWLS fit indices.
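The notion of Type I error control can be illustrated with a small Monte Carlo sketch (hypothetical degrees of freedom and replication count; the simulated statistics are a stand-in for actual WLSMV/DWLS output): when the assumptions hold, the test statistic follows a central chi-square distribution, so a .05-level test should reject a correct model in about 5% of replications.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Under a correctly specified model (and met assumptions), the test
# statistic is approximately central chi-square with df degrees of freedom.
df, n_reps, alpha = 24, 1000, 0.05
chi2_stats = rng.chisquare(df, size=n_reps)   # stand-in for estimator output
p_values = stats.chi2.sf(chi2_stats, df)      # upper-tail p-values

# Empirical Type I error rate: proportion of replications rejected at alpha.
print(np.mean(p_values < alpha))              # ~ 0.05
```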

1.2.2 Model misspecification

The performance of the RWLS-based 𝜒2 statistics and fit indices in misspecified models has received little attention from methodological researchers. A correctly specified model is one that matches well the population from which the sample is drawn. A model is said to be misspecified if it is overparameterized, underparameterized, or both (Hu & Bentler, 1998). Overparameterized models estimate one or more parameters whose population values are zero, whereas underparameterized models fix to zero one or more parameters whose population values are in fact non-zero. MacCallum (1995) observed that most models specified in applied research are misspecified to some extent. Therefore, simulation results about model fit measures based on properly specified models may have limited generalizability in practice.

A model can be misspecified in many ways. For CFA models, Brown (2006) listed three ways that misspecification can occur in practice: (1) a misspecified number of factors; (2) misspecified indicators or factor loadings; and (3) misspecified correlated errors.


Misspecification of the number of factors can happen when a researcher either does not have a complete understanding of the latent factors that affect the observed variables, or is not aware that a set of observed variables is affected by factors other than those modeled. The former usually takes place when a theory is just forming. For example, when investigating the factor structure of the Psychopathy Checklist: Youth Version (PCL:YV), Jones, Cauffman, Miller, and Mulvey (2006) noted that "(a)s this review demonstrates, the factor structure of the PCL:YV is far from clear" (p.35); and Ward (2006) stated that "BDI-II factor structure has not, however, been completely consistent in the several investigations that have been reported" (p.81), as he compared different factor structure models for the Beck Depression Inventory-II. When the latent factor structure is unclear to researchers, it is likely that they mistakenly model too many or too few factors. The latter situation occurs when a researcher has overlooked a second factor that is affecting a subset of the observed variables he/she studies. For example, the existence of a "method effect" due to positive and negative wording in self-esteem measurement has been repeatedly documented by several authors (as cited in Bandalos, 2008). However, a researcher who is not aware of such a method effect may overlook this factor when he or she analyzes self-esteem questionnaires.

If too many factors are specified, it is likely that two or more factors will be highly correlated and thus have poor discriminant validity. Cohen, Cohen, West, and Aiken (2003) suggested that a factor correlation of .85 or higher might signify problematic discriminant validity, so specifying too many factors can be easily detected by researchers. If too few factors are modeled, CFA will fail to reproduce the observed relationships among the indicators; parameter estimates will be affected, which will in turn affect the theoretical interpretation of the results. However, there is no direct way of knowing this. Modification indices may suggest correlated errors, but those might not be the real source of misfit. Therefore, it is important to know whether the overall model fit indices are sensitive enough to pick up this kind of model misspecification.

The most common form of misspecification in CFA is omission of cross-loadings, because a standard CFA model assumes no cross-loadings. Consequently, cross-loadings that exist in the population may be overlooked, intentionally or unintentionally, in practice. In some situations, a researcher may intentionally ignore cross-loadings when a simple-structure CFA model is preferred (e.g., Greiff et al., 2012, deleted all items with cross-loadings from the Space Shuttle measure in order to analyze a "clean" CFA model). Or a researcher may simply be unaware of the multidimensionality of the items. For example, an item that measures mathematical ability can also assess reading ability. For researchers who analyze achievement tests on which both abilities are required, failing to recognize the effect of reading ability on the success of the mathematical items may result in omitted cross-loadings.

Suppose the observed variable X1 loads on two latent factors F1 and F2 in the population. A simulation study (Brown, 2006) showed that omitting the cross-loading (e.g., X1F2) may inflate the estimated factor correlation between F1 and F2 and the estimated loading of X1 on F1, and underestimate the other factor loadings on F1. When these factor loadings and correlations are interpreted, some effects may be overstated while others are understated. Depending on the magnitude and importance of the effects, this type of model misspecification can pose serious or negligible problems in different scientific contexts.
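A small worked sketch (standardized variables and hypothetical loading values, consistent with the pattern Brown, 2006, describes) shows why omitting a cross-loading pushes other estimates upward:

```latex
% Population: x_1 = \lambda_{11} F_1 + \lambda_{12} F_2 + \delta_1, and an
% indicator x_j of F_2 alone, with Corr(F_1, F_2) = \phi. Then
\[
\operatorname{Cov}(x_1, x_j) = \lambda_{j2}\left(\lambda_{11}\phi + \lambda_{12}\right)
\]
% If the fitted model omits \lambda_{12}, the only path linking x_1 and
% x_j is \hat{\lambda}_{11}\,\hat{\phi}\,\hat{\lambda}_{j2}; reproducing
% the same covariance therefore forces \hat{\lambda}_{11}\hat{\phi}
% upward. E.g., with \lambda_{11} = .6, \lambda_{12} = .3, \phi = .5:
\[
\lambda_{11}\phi + \lambda_{12} = .6 \times .5 + .3 = .60
\quad \text{vs.} \quad \lambda_{11}\phi = .30
\]
```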

In a CFA model, when all error variances are specified to be independent of one another, the researcher is claiming that all covariation among indicators is accounted for by the set of latent factors in the model and that all measurement errors are random. Correlated errors are […] but rather by some other omitted common causes. One such common cause is the method effect discussed earlier. Depending on the purpose of the analysis and the underlying theoretical justification, a method effect can be modeled as error covariances among indicators measured with the same method instead of as a separate factor (Brown, 2006).

Correlated error variances can be misspecified in two ways: estimating a non-existent error covariance and omitting an existing error covariance. Empirical examples are difficult to document because, in practice, whether error covariances truly exist is not known. As Brown (2006) described, estimating a non-existent error covariance by mistake is readily detectable from non-significant statistical or clinical results (e.g., a non-significant test statistic or a very small parameter estimate for the path). More difficulty resides in detecting salient correlated errors that are mistakenly omitted from the solution. Brown (2006) showed that in some cases this type of misspecification may not be captured by overall model fit measures; researchers have to examine the standardized residuals and modification indices to catch the mistake. In reality, however, some researchers may not proceed to check the standardized residuals and modification indices when the overall fit indices indicate an acceptable model fit. Consequences of this type of misspecification include overestimation of the factor loadings of the indicators whose errors are supposed to co-vary, and underestimation of the factor loadings of the other indicators that load on the same factor. Therefore, it is imperative to examine to what extent the overall model fit measures are sensitive to this kind of model misspecification.

Another type of model misspecification that is not discussed by Brown (2006), but is commonly encountered in personality research, is misspecification of the factor correlations. In personality psychology, it is commonly believed that the Big Five personality factors (i.e., openness, conscientiousness, extraversion, agreeableness, and neuroticism) are independent of one another, so personality researchers specify and test orthogonal factor models of various personality measures. However, it has been repeatedly shown that when the Big Five are operationalized, the factors in personality measures are usually correlated with one another (see Saucier, 2002, for a more detailed discussion). In these cases, it is important that the fit indices catch such misfit of the model to the data, so that researchers become aware of the existence of the factor correlations, whether due to imperfect theory or unsatisfactory measurement.

1.2.3 Sample size and model size

Ding, Velicer, and Harlow (1995) described a good fit index as being sensitive only to model misspecification, but "robust against … factors that are irrelevant to the correctness of the specified model" (p.120). Sample size and model size are such factors. As a large-sample technique, SEM has a minimum recommended sample size of 200 (Hoogland & Boomsma, 1998; Kline, 2005). What complicates matters is that the minimum sample size required often depends on the size of the model tested. For example, 200 cases may seem large enough for a 6-indicator-2-factor CFA model, but will probably not suffice when a 50-indicator-10-factor CFA model is tested.

Problems arise when a relatively small sample is used to estimate a large number of parameters. Sampling error may be too large and parameter estimates may not be reliable. The recommended ratio of sample size (N) to number of parameters estimated (t) is at least 10:1 (Hoogland & Boomsma, 1998; Kline, 2005; Nunnally, 1967).
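As a quick arithmetic sketch of the 10:1 rule (assuming a standardized-factor CFA with one loading and one error variance per indicator, one correlation per factor pair, and no cross-loadings or error covariances; counting conventions vary):

```python
# Free parameters t for a CFA model with standardized factors, no
# cross-loadings, and no error covariances.
def min_sample_size(n_indicators: int, n_factors: int, ratio: int = 10) -> int:
    t = 2 * n_indicators + n_factors * (n_factors - 1) // 2
    return ratio * t

print(min_sample_size(6, 2))    # t = 13  -> N >= 130: 200 cases suffice
print(min_sample_size(50, 10))  # t = 145 -> N >= 1450: 200 cases fall short
```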

However, it has been observed in applied CFA studies that it is not rare for a small sample to be used to estimate a relatively large model. Figures 1-2 and 1-3 display the percentage distribution of the sample sizes and the N:t ratios used in the 50 applied CFA studies. As can be seen from the graphs, 17% of these studies used a sample smaller than 200 cases and 27% used an N:t ratio of less than 5:1. Indeed, in reality, it is not easy for individual researchers to collect large samples, even when the number of indicators they analyze is fairly large.

Figure 1-2. Percentage distribution of the 50 CFA studies that used different sample sizes. [Pie chart: ≤ 200, 17%; 201-500, 39%; 501-1000, 24%; > 1000, 20%.]

Figure 1-3. Percentage distribution of the 50 CFA studies that used different N:t ratios. [Pie chart: < 5:1, 27%; 5:1-10:1, 32%; > 10:1, 41%.]


As discussed earlier, both WLSMV and DWLS are built upon WLS, which requires a very large sample size to provide stable estimates. In addition, both WLSMV and DWLS produce accurate model test statistics only when correct asymptotic variances of the sample correlations are calculated. Although research (Beauducel & Herzberg, 2006; Flora & Curran, 2004; Yang-Wallentin et al., 2010) has shown that neither WLSMV nor DWLS needs a sample as large as that required by WLS to perform well, it is very likely that these estimators will not behave properly with an extremely small sample. Therefore, it is important to find out whether the model fit measures produced by WLSMV and DWLS can be trusted when the sample is small relative to the model.

1.2.4 Scale and distribution of the observed variables

Micceri (1989) observed that most data collected from achievement and other measures are not normally distributed. This is generally true in educational and psychological studies, where ordinal observed variables are often used to measure underlying psychological constructs. Among the 50 CFA studies reviewed, most of those that specifically mentioned checking the distributions of the observed variables reported nonnormal variables, some with severe nonnormality. Nonnormal or categorical observed variables should not pose serious threats to RWLS-based fit measures, because RWLS estimation takes into account the categorical nature of ordinal variables and is distribution-free. However, previous research showed that RWLS fit statistics and indices were differentially affected by nonnormal distributions and the scale of the observed variables. For example, more severely nonnormal ordinal variables were found to affect the convergence rates of the WLSMV and the DWLS estimators (e.g., DiStefano, 2010; Lei, 2009) and the Type I error control of the overall model fit measures (e.g., Lei, 2009; Yu, 2002), with higher nonconvergence rates and inflated Type I error rates associated with more severe nonnormality. Nonnormality and the scale of the observed variables also affected Type II error (i.e., accepting a misspecified model) rates, with more Type II errors associated with increasing nonnormality and decreasing numbers of variable categories (Bandalos, 2008). Therefore, it would be interesting to find out whether the WLSMV and the DWLS can produce acceptable overall model fit statistics/indices under commonly encountered, less-than-ideal conditions, such as nonnormality and small sample sizes combined with different types of model misspecification.

1.3 Evaluation of model fit indices

An important issue closely relevant to model assessment is the evaluation of the model fit indices themselves. In a highly influential simulation study, Hu and Bentler (1999) evaluated different cut-off values for a group of better-performing fit indices under different data and sample conditions (the specifics are discussed in chapter 2), with correct and misspecified models. They proposed a set of viable cut-off values for the fit indices examined, based on the finding that these cut-off values generally resulted in minimal incorrect rejection and acceptance rates2. They believed that these rule-of-thumb cut-off values would help researchers arrive at a more "objective" decision regarding model-data fit (Hu & Bentler, 1999).
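As a minimal sketch of how such rule-of-thumb criteria are applied in practice (the cut-off values follow Hu and Bentler, 1999, as restated in the hypotheses below; the function and variable names are hypothetical):

```python
# Joint decision rule using Hu and Bentler's (1999) suggested cut-offs:
# CFI and TLI >= .95, RMSEA <= .06, SRMR <= .08.
def acceptable_fit(cfi: float, tli: float, rmsea: float, srmr: float) -> bool:
    return cfi >= 0.95 and tli >= 0.95 and rmsea <= 0.06 and srmr <= 0.08

print(acceptable_fit(cfi=0.97, tli=0.96, rmsea=0.04, srmr=0.05))  # True
print(acceptable_fit(cfi=0.93, tli=0.90, rmsea=0.08, srmr=0.09))  # False
```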

Although Hu and Bentler (1998, 1999) repeatedly stressed that the application of these cut-off values should be restricted to conditions similar to the ones studied in their papers (mainly, ML estimation and continuous variables), these tentative cut-off values became extremely popular as "golden rules" in SEM practice. Among the 50 CFA studies reviewed, 41 (82%) specifically claimed to have used cut-off criteria of some sort when making model-fit decisions, and 23 (56%) of those decisions were based on Hu and Bentler's (1999) suggested cut-off values, despite the fact that almost all of these studies analyzed ordered categorical data and 62% of them explicitly reported using an estimation method other than regular ML estimation (see Table 1-1).

2 Hu and Bentler called these Type I and Type II error rates. Although, strictly speaking, the incorrect rejection and acceptance rates of a fit index are not Type I and Type II error rates because the fit indices are mainly used as descriptive statistics, this study will follow the convention of the field and loosely term the incorrect rejection of a true model as Type I error, the incorrect acceptance of a misspecified model as Type II error, and the correct rejection of a misspecified model as power.

On the one hand, researchers do need criteria to assess the "goodness" of their proposed models. On the other hand, applying these rule-of-thumb cut-off values as "golden rules" without considering the sample, data, and model conditions specific to the study may lead to incorrect interpretations. The overgeneralization and overinterpretation of Hu and Bentler's (1999) cut-off criteria may be due partly to the fact that few other studies have examined the appropriateness of the cut-off values under conditions not covered by Hu and Bentler (1999). Therefore, more research is needed to assess the workability of these cut-off thresholds under different conditions, such as RWLS estimation with ordinal variables.

1.4 Limitations of past research

Compared to research on traditional estimation methods such as ML, studies of the WLSMV and the DWLS estimators and their associated fit measures are rather limited. While some general guidelines (e.g., minimum sample size required, cut-off values for different fit indices, etc.) have been offered for ML estimation, equivalents have not been established for RWLS estimation. Although a few consistent conclusions can be drawn from past research, many questions remain unanswered about these two estimators. This section discusses several limitations in the existing research that the current study attempts to address.

First of all, although model misspecification is one of the biggest concerns of applied researchers when they specify and test their models, few simulation studies have incorporated model misspecification as a condition when examining the performance of the RWLS fit statistics and indices. The sensitivity of the fit measures to model misspecification can be examined empirically in simulation studies through the power of the fit statistic/index: the power of a fit index is the probability that the index indicates a bad fit when the model is indeed misspecified. Lei (2009) introduced a small dosage of model misspecification in her simulation study examining the performance of the WLSMV chi-square test. She found that the empirical power of the WLSMV chi-square test was generally low unless the sample size was very large (N = 1000). It was unclear, however, whether the low power was inherent to the estimator or simply due to the small degree of model misspecification. Yu (2002) also looked at the performance of the WLSMV chi-square test and other fit indices with misspecified models. Contrary to Lei (2009), Yu (2002) found that the power of the WLSMV-based fit statistic and indices was high, especially when the misspecification involved omitted factor covariances; when the misspecification involved omitted cross-loadings, however, power was low. Bandalos (2008) also addressed model misspecification under WLSMV estimation: she examined the power of the 𝜒2 statistic, CFI, and RMSEA when two latent factors in the population were specified as one in the sample, and found very low power for the WLSMV fit statistic and indices, with power seeming to decrease as variable nonnormality increased. Overall, it is unclear what caused the high and the low power of the RWLS fit measures in these different studies. Is it the small degree of model misspecification in the design? Is it the different types of model misspecification? Or is it inherent to the different RWLS fit statistics/indices?
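Empirical power, as defined above, can be sketched the same way as Type I error (hypothetical degrees of freedom and noncentrality; under misspecification the test statistic behaves approximately like a noncentral chi-square, with noncentrality growing with the degree of misspecification and the sample size):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)

# Stand-in statistics for a misspecified model: noncentral chi-square,
# where the noncentrality nc reflects the degree of misspecification and N.
df, nc, n_reps = 24, 20.0, 1000
stats_mis = rng.noncentral_chisquare(df, nc, size=n_reps)

# Empirical power: proportion of replications exceeding the .05-level
# critical value, i.e., correctly flagging the misspecified model.
crit = stats.chi2.ppf(0.95, df)
print(np.mean(stats_mis > crit))
```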

Secondly, there are not enough simulation studies to fully account for the behavior of DWLS estimation, though the estimator is being used in applied studies (e.g., Cai, 2012; Pandolfi, Magyar, & Dill, 2009; Wan & Henly, 2012). Only two simulation studies have been found on the performance of the DWLS estimator in LISREL. Yang-Wallentin et al. (2010) found that it had good convergence rates with small samples and large models, and produced 𝜒2 statistics close to their expected values when the model was correctly specified; variable distribution and variable scale did not seem to affect the 𝜒2 estimates. DiStefano (2010) investigated not only the DWLS-based chi-square test, but also the fit indices. Unlike Yang-Wallentin et al. (2010), DiStefano (2010) encountered convergence and estimation difficulties with DWLS, especially when the sample was small and the variables were highly nonnormal. Overall, the DWLS 𝜒2 had good Type I error control but low power when the sample was small (N = 150). DiStefano (2010) also found that the DWLS-based CFI, RMSEA, and SRMR were sensitive, to varying degrees, to sample, data, and model characteristics, and that some of the fit indices (i.e., CFI and RMSEA) had low power to detect misspecified models. DiStefano and Hess (2005) reported that LISREL was the most popular computer program among applied SEM researchers. Given its popularity, more research is needed to replicate and extend the findings on DWLS estimation from Yang-Wallentin et al. (2010) and DiStefano (2010).

The last, but not least important, limitation is the lack of studies examining Hu and Bentler's (1999) proposed cut-off criteria for the fit indices under RWLS estimation. Only Yu (2002) has tested the cut-off values of the fit indices under WLSMV estimation. Yu (2002) found that Hu and Bentler's (1999) cut-off criteria were somewhat applicable to WLSMV estimation and ordinal variables. She proposed several new cut-off criteria that were not very different from Hu and Bentler's (1999), but worked better under the conditions she simulated. However, the generalizability of Yu's (2002) study is limited because she included only dichotomous variables, with almost exactly the same CFA population models and misspecification conditions as those studied by Hu and Bentler (1999). Other conditions that are realistic and common in practice, such as smaller or larger models, different misspecification types and degrees, and ordinal variables with more categories and more severe nonnormality, need to be simulated to see how Hu and Bentler's (1999) cut-off criteria perform in situations drastically different from the ones they created.

1.5 Statement of the problem

The purpose of this dissertation was to address the abovementioned limitations in existing research on the performance of the WLSMV- and the DWLS-based fit measures under various conditions designed to simulate real CFA practice. These conditions include: different types of model misspecification, varying degrees of model misspecification, small and large samples, small and large models, mildly- to severely-nonnormal variable distributions, and ordinal variables with few and many categories. In addition, this study also empirically examined Hu and Bentler’s (1999) cut-off criteria with RWLS estimation methods and ordinal variables. Results from the current study will contribute to the understanding of model evaluation using RWLS estimation methods, and help establish more rigorous guidelines for CFA modeling with ordinal observed variables.

Specifically, this study was designed to answer the following research questions:

(1) How do 𝜒2 statistics, CFI, TLI, RMSEA, and SRMR vary with different types and degrees of model misspecification, sample sizes, model sizes, scale, and distribution of the observed ordinal variables, when WLSMV or DWLS is used? Are the fit statistics and indices relatively robust or sensitive to these studied factors?

(2) Using Hu and Bentler’s (1999) cut-off criteria, what are the Type I error rates of the WLSMV- and DWLS-based fit statistics and indices?

(3) Using Hu and Bentler’s (1999) cut-off criteria, how powerful are the WLSMV- and the DWLS-based fit statistics and indices in detecting model misspecification?

(4) Are the Type I error rates and power of the RWLS-based fit measures relatively stable across different sample sizes, model sizes, and scales and distributions of the observed variables; i.e., are Hu and Bentler's (1999) suggested cut-off criteria relatively robust to factors that are irrelevant to the correctness of the specified models?

(5) Are there alternative cut-off criteria that can be recommended for use with RWLS estimators in practice, if Hu and Bentler’s (1999) cut-offs do not work?

(6) Are the WLSMV- and the DWLS-based fit statistics and indices differentially sensitive to different types of model misspecification?

Hypotheses

Based on an extensive review of the literature (presented in Chapter 2), several hypotheses were made to answer some of the above questions. It was expected that:

(1) The WLSMV- and the DWLS-based fit measures, evaluated at p > .05 for 𝜒2, CFI and TLI > .95, RMSEA < .06, and SRMR < .08, will maintain acceptable rates of Type I error when the sample size is not too small, the model size is not too large, and the distribution is not highly nonnormal.

(2) When the sample size is small, the model size is large, and/or the distribution is highly nonnormal, Type I error rates of the WLSMV- and the DWLS-based fit measures will likely be inflated or deflated.

(3) The power of the WLSMV- and the DWLS-based fit measures, evaluated using the same criteria above, will differ across types and degrees of model misspecification. Power should be higher for more severe model misspecification holding the type of misspecification constant, though it is hard to predict which type of misspecification will result in higher power.

(4) Model misspecification might interact with other factors examined in the study, such as sample size, model size, and the scale and distribution of the observed variables, in their effects on the power of the WLSMV- and the DWLS-based fit measures. In general, higher power will be associated with larger samples, smaller models, and less skewed and kurtotic observed variables.
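Operationally, the Type I error and power rates referenced in these hypotheses are simple rejection proportions across replications. The Python sketch below illustrates only the tallying logic; the function name and the randomly generated fit-index values are hypothetical placeholders, not output from Mplus or LISREL.

    import numpy as np

    def rejection_rate(chi2_p, cfi, tli, rmsea, srmr):
        # Proportion of replications flagged as misfit under Hu and Bentler's
        # (1999) criteria. Each argument is an array with one value per
        # replication. For a correctly specified model these proportions are
        # empirical Type I error rates; for a misspecified model, power.
        return {
            "chi2":  float(np.mean(chi2_p < .05)),
            "CFI":   float(np.mean(cfi < .95)),
            "TLI":   float(np.mean(tli < .95)),
            "RMSEA": float(np.mean(rmsea > .06)),
            "SRMR":  float(np.mean(srmr > .08)),
        }

    # Hypothetical demonstration values standing in for 500 replications.
    rng = np.random.default_rng(1)
    print(rejection_rate(rng.uniform(0, 1, 500),     # chi-square p-values
                         rng.normal(.97, .02, 500),  # CFI
                         rng.normal(.97, .02, 500),  # TLI
                         rng.normal(.04, .02, 500),  # RMSEA
                         rng.normal(.05, .02, 500))) # SRMR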

Significance and contribution of the study

Model estimation and evaluation are important parts of the modeling process. As Bentler and Bonett (1980) described, the main statistical problems in covariance structure analysis are to estimate the parameters of the proposed model and to evaluate the model's goodness-of-fit. Model estimation involves choosing estimation methods appropriate for the types of variables analyzed, and model evaluation concerns the use of model fit statistics and indices to assess how closely a model fits empirical data.


Given the prevalence of ordinal and nonnormal variables in social and behavioral research, it is imperative to gain a comprehensive understanding of how various fit measures perform under estimation methods that are appropriate for such variables, namely the WLSMV and the DWLS estimators. However, this issue has received only limited attention from methodological researchers. This study attempted to fill several gaps in existing research. Specifically, it will contribute to the SEM literature and inform SEM practice in three ways:

First, this study will help build a better understanding of the performance of the RWLS-based fit measures in detecting model misspecification, through the systematic variation of different types and degrees of model misspecification in the study design. Previous research is inconclusive with regard to the power of the RWLS-based fit measures because different types and degrees of model misspecification were examined in different studies with non-comparable sample, model, and data designs. Putting these conditions together in one controlled environment will help paint a clearer picture of the performance of the RWLS-based fit measures.

Second, this study is expected to offer more empirical evidence for or against the use of Hu and Bentler's (1999) cut-off values in RWLS estimation with ordinal variables. Because few studies have examined these cut-off values under situations drastically different from those established by Hu and Bentler (1999), the external validity of these cut-offs is rather questionable. Given the extreme popularity of these cut-offs in the applied SEM community, regardless of estimation methods and sample characteristics, it is essential to evaluate their generalizability. Results of this study will inform SEM researchers whether these cut-offs can be generalized to RWLS estimation with ordinal observed variables under a variety of sample, data, and model conditions that are common in applied settings.

Third, this study will benefit applied SEM researchers who are most familiar with the LISREL computer program. As DiStefano and Hess (2005) observed, the LISREL software package "was the most popular choice to conduct CFA … because it was one of the first packages widely distributed for PC use" (p. 232). However, solid simulation studies on the DWLS estimator in LISREL are currently rather limited, which may in part have discouraged LISREL users from adopting this estimator. This study is expected to yield detailed results on the DWLS estimator, so that LISREL users will be informed about its performance in model evaluation with ordinal variables.

Chapter 2

Literature Review

This chapter revisits the fundamentals of the CFA model and of the WLSMV and DWLS estimators. To keep the dissertation within a manageable scope, only critical concepts essential to the understanding of the current study are presented. References are provided for readers who are interested in the mathematical and technical details behind the concepts reviewed here. Previous simulation results on the performance of the WLSMV and the DWLS fit measures are also discussed in detail.

The Confirmatory Factor Analysis model

Model specification

Let 𝚺(θ) stand for the model-implied covariance matrix as a function of the parameter vector θ, and 𝚺 stand for the population covariance matrix of the observed variables. SEM tests whether the null hypothesis H0: 𝚺(θ) = 𝚺 holds. The θ vector contains the parameters of the structural equation model. CFA is a special form of structural equation model. The general form of CFA is

𝐱 = 𝚲𝝃 + 𝜹, (2.1)
where x represents the observed variables, 𝝃 is a vector of latent factors, 𝜹 is a vector of measurement errors, and 𝚲 is a matrix of factor loadings relating x to 𝝃.

CFA models presume that observed variables are functions of one or more latent common factors, and their relations can be described using factor loadings, which are analogous to regression coefficients. Measurement errors represent the part of the observed variables that is not explained by the latent factors. The assumptions of standard CFA models are that the model is properly specified, that measurement errors are independent of the latent factors, and that measurement errors are uncorrelated with each other; these assumptions can be relaxed in nonstandard CFA models.

The model-implied covariance matrix of CFA models can be written as:

𝚺(θ) = 𝚲𝚽𝚲′ + 𝚯, (2.2)
which shows that the model-implied covariance matrix can be decomposed in terms of the parameters in 𝚲, in 𝚽 (the covariance matrix of 𝝃), and in 𝚯 (the covariance matrix of 𝜹). If the CFA model is correctly specified and all parameter values are known, then the population covariance matrix of x (i.e., 𝚺) will be reproduced exactly. However, since the values of the model parameters, and hence 𝚺, are unknown in reality, the parameters must be estimated so that the model closely reproduces the observed covariance matrix of x, S.
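As a numeric illustration of equation 2.2, the short Python sketch below assembles 𝚺(θ) = 𝚲𝚽𝚲′ + 𝚯 for a hypothetical two-factor model with three indicators per factor; all parameter values are invented for illustration only.

    import numpy as np

    # Loadings for six indicators on two factors (invented values).
    Lambda = np.array([[.8, 0], [.7, 0], [.6, 0],
                       [0, .8], [0, .7], [0, .6]])
    Phi = np.array([[1.0, .3],
                    [.3, 1.0]])                   # factor covariance matrix
    # Unique variances chosen so standardized indicators have unit variance.
    Theta = np.diag(1 - np.diag(Lambda @ Phi @ Lambda.T))

    Sigma = Lambda @ Phi @ Lambda.T + Theta       # equation 2.2
    print(np.round(Sigma, 3))                     # diagonal elements equal 1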

Model estimation

Maximum likelihood estimation. When the model is correctly specified and the population parameters are known, the null hypothesis 𝚺(θ) = 𝚺 holds exactly. However, since this is not usually the case, model parameters are estimated to minimize the discrepancy between the implied covariance matrix, 𝚺(θ), and the observed covariance matrix, S. Several fit functions can be used to measure this discrepancy, of which the maximum likelihood (ML) function is the most widely used. The ML fit function that is minimized is:


𝐹𝑀𝐿 = ln|𝚺(θ)| − ln|𝑺| + 𝑇𝑟(𝑺𝚺(θ)−1) − 𝑝, (2.3)
where p is the number of observed variables. ML is a popular estimator because it has the much desired properties of asymptotic unbiasedness, consistency, and efficiency. This means that the expected value of the ML estimate equals the parameter it estimates, that the ML estimate approaches the true parameter as the sample size increases toward infinity, and that the variability of the ML estimate is the smallest among consistent estimators (Lei & Wu, 2012). Furthermore, under the null hypothesis H0: 𝚺(θ) = 𝚺, the distribution of 𝑇𝑀𝐿 = (𝑁 − 1)𝐹𝑀𝐿, where N is the sample size and 𝐹𝑀𝐿 is the minimum value reached in the last step of the iterative ML procedure, asymptotically follows a chi-square distribution with 𝑝(𝑝 + 1)/2 − 𝑡 degrees of freedom, with t being the number of free parameters to be estimated. 𝑇𝑀𝐿 is thus often called the 𝜒2 statistic and is used in tests of overall model fit. However, 𝑇𝑀𝐿 is asymptotically chi-square distributed only under a set of assumptions. In SEM, the important assumptions include (Bollen, 1989):

(1). The observed variables have no excess kurtosis, relative to a multivariate normal distribution;

(2). The sample is sufficiently large, as 𝑇𝑀𝐿 only approximates a chi-square distribution asymptotically; and

(3). H0: 𝚺(θ) = 𝚺 holds, which means that the hypothesized model is accurately specified. If the observed variables are multivariate normal, the sample size is fairly large, and the model is correctly specified, then the ML 𝜒2 statistic is appropriate for statistical significance tests of model fit. However, in practice, one or more of these assumptions are often violated, and when the assumptions do not hold, the ML chi-square test may not be valid. For example, research has repeatedly demonstrated that nonnormality leads to excessively large 𝜒2 estimates, with the inflation becoming worse under more severe nonnormality (Chou, Bentler, & Satorra, 1991; Curran, West, & Finch, 1996; Hu, Bentler, & Kano, 1992; Yu, 2002). Similarly, small samples also inflate the Type I error rates of the 𝜒2 statistic (Anderson & Gerbing, 1984; Boomsma, 1983; Curran, Bollen, Paxton, Kirby, & Chen, 2002; Hu et al., 1992), making conclusions drawn from such tests untrustworthy.
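To make equation 2.3 and the resulting 𝜒2 test concrete, the Python sketch below computes 𝐹𝑀𝐿, 𝑇𝑀𝐿, and the degrees of freedom for a deliberately tiny two-variable example; both matrices and the free-parameter count are invented placeholders rather than results of an actual estimation.

    import numpy as np
    from scipy.stats import chi2

    def f_ml(Sigma_theta, S):
        # Equation 2.3: ln|Sigma(theta)| - ln|S| + Tr(S Sigma(theta)^-1) - p.
        p = S.shape[0]
        return (np.log(np.linalg.det(Sigma_theta)) - np.log(np.linalg.det(S))
                + np.trace(S @ np.linalg.inv(Sigma_theta)) - p)

    S = np.array([[1.0, .5], [.5, 1.0]])            # observed covariances
    Sigma_theta = np.array([[1.0, .4], [.4, 1.0]])  # model-implied counterpart
    N, t = 300, 1                                   # sample size, free parameters
    p = S.shape[0]

    T_ml = (N - 1) * f_ml(Sigma_theta, S)           # the chi-square statistic
    df = p * (p + 1) // 2 - t                       # p(p+1)/2 - t
    print(T_ml, df, chi2.sf(T_ml, df))              # statistic, df, p-value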

Polychoric correlations. Ordinal variables, such as variables with Likert-type scales, are commonly used in social and behavioral research. These ordinal variables are considered indicators of continuous latent variables, and when such ordinal variables are analyzed in SEM, interest often focuses on the relationships among the latent responses that give rise to these ordinal variables (Edwards, Wirth, Houts, & Xi, 2012). The relationship between a latent distribution, y*, and an observed ordinal distribution, y, can be described as (Flora & Curran, 2004):

y = c, if τ_c < y* < τ_{c+1}, (2.4)
with τ_1, τ_2, …, τ_{C−1} the threshold parameters defining the categories c = 0, 1, 2, …, C − 1, where τ_0 = −∞ and τ_C = ∞. Thus, the value of y, the observed ordinal variable, changes when a threshold τ is exceeded on the latent response variable y*. Because of this non-linear relationship between the observed and latent variables, Pearson product-moment correlations of the observed variables are not consistent estimators of the population correlations of the underlying latent responses (Edwards et al., 2012) and are not suitable for the analysis of ordinal observed variables (Muthén, 1983).

Indeed, earlier work (cited by Bollen, 1989, p. 434) found that the Pearson product-moment correlations, when used to estimate relations between ordinal variables, generally underestimated the strength of the relationships among the underlying continuous variables.
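This attenuation is easy to demonstrate with a short simulation sketch: a bivariate normal pair of latent responses is categorized according to equation 2.4, and the Pearson correlation of the ordinal versions falls below the latent correlation. The correlation value, thresholds, and sample size below are arbitrary choices for illustration.

    import numpy as np

    rng = np.random.default_rng(7)
    rho = .5                             # latent correlation (invented)
    y_star = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]],
                                     size=100_000)

    tau = np.array([-1.0, 0.0, 1.0])     # thresholds for a 4-category item
    # Equation 2.4: y = c whenever tau_c < y* < tau_{c+1}, with tau_0 = -inf
    # and tau_C = +inf; searchsorted returns the category index c.
    y = np.searchsorted(tau, y_star)

    print(round(np.corrcoef(y_star.T)[0, 1], 3))  # close to .50
    print(round(np.corrcoef(y.T)[0, 1], 3))       # attenuated, below .50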
