Theoretical Introduction to Factor Analysis and Structural Equation Modelling
7.4 Five Step Process for Structural Equation Modelling
There are five main steps involved in structural equation modelling (SEM): Model Specification, Model Identification, Model Estimation, Model Testing, and Model Modification. Each of these steps is explained in the following section to clarify how SEM was used in this research. Three popular software packages for factor analysis and SEM (used during the second and third steps in particular) are LISREL, Mplus, and Amos. Mplus was chosen as the best package for this research project. Mplus differs from the other two packages primarily in that models cannot be specified simply by drawing a path diagram, and in its ability to estimate a wider range of models.
7.4.1 Model Specification
The first step in the five-step SEM process is Model Specification, in which the theoretical model is developed and defined in terms of both fixed and free parameters. Schumacker and Lomax highlight the importance of model specification to SEM by pointing out that “path analysis does not provide a way to specify the model, but rather estimates the effects among the variables once the model has been specified a priori by the researcher on the basis of theoretical considerations” (2010, p. 147). A model can be specified as a verbal description, a path diagram, or a set of equations. Path diagrams were used to specify the various models in this research because they are easier to understand. Furthermore, because computer programmes are used to estimate the relationships between variables at a later stage, writing out complicated equations is unnecessary. A path diagram also draws attention to variables that have been excluded and links that have been missed, which may in turn improve model conceptualization (Diamantopoulos, 1994). Path diagrams also give a better understanding of structural models, help in constructing appropriate input files, and reduce specification errors (Diamantopoulos & Siguaw, 2000; Raykov & Marcoulides, 2000).
Hair et al. (2006) highlight the importance of both previous empirical results and theory at this stage of the process. Generally speaking, fixed parameters are set to zero, which indicates that no relationship exists between the variables (a parameter can also be fixed to another value, such as 1 to set the scale of a latent variable), while free parameters are estimated from the observed data. When deciding where relationships are expected, it is up to the researcher to designate each parameter as either fixed or free. In other words, the researcher must determine which parameters account for the observed sample variances and covariances. This determination is usually made when the researcher formulates the a priori hypotheses. The parameters are used later in the SEM process to determine how the components of the model (the path diagram, the covariance matrix, and the sample variances) will be compared.
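To make the fixed/free distinction concrete, the specification of a hypothetical two-factor confirmatory model with six indicators can be written as parameter matrices. This is a minimal Python sketch assuming NumPy. The convention used here, where NaN marks a free parameter, is chosen only for illustration. It is not Mplus syntax and not one of the models used in this research.

```python
import numpy as np

FREE = np.nan  # marker for a free parameter (estimated from the data)

# Hypothetical two-factor model: six indicators (rows) by two factors (columns).
# 0.0 = fixed at zero (no path); 1.0 = fixed at one (sets the factor's scale).
loadings = np.array([
    [1.0,  0.0],   # x1 -> F1 (scaling indicator, fixed)
    [FREE, 0.0],   # x2 -> F1
    [FREE, 0.0],   # x3 -> F1
    [0.0,  1.0],   # x4 -> F2 (scaling indicator, fixed)
    [0.0,  FREE],  # x5 -> F2
    [0.0,  FREE],  # x6 -> F2
])

# Factor variances and the factor covariance are free.
factor_cov = np.array([
    [FREE, FREE],
    [FREE, FREE],
])

# One free error variance per indicator; error covariances are fixed at zero.
error_var = np.full(6, FREE)


def n_free(matrix, symmetric=False):
    """Count free (NaN) entries; in a symmetric matrix count each pair once."""
    m = np.asarray(matrix, dtype=float)
    if symmetric:
        m = m[np.tril_indices(m.shape[0])]
    return int(np.isnan(m).sum())


# 4 free loadings + 3 factor (co)variances + 6 error variances
q = n_free(loadings) + n_free(factor_cov, symmetric=True) + n_free(error_var)
print("free parameters:", q)
```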
7.4.2 Model Identification and Model Estimation
After model specification, the identification of the model and its parameters takes place. A model is identified when it is impossible for two distinct sets of parameter estimates to reproduce the same population variance-covariance matrix. In other words, a unique set of parameter estimates can be derived from the observed variances and covariances. Models can be classified as under-identified, just-identified, or over-identified (Kelloway, 1998; Hair et al., 2006). In a just-identified model, the number of distinct observed variances and covariances (the known values) is exactly equal to the number of free parameters to be estimated, so the model has zero degrees of freedom. When there are more free parameters than known values, not all parameters can be uniquely estimated, and the model is under-identified. Finally, when there are more known values than free parameters, the model is over-identified. It then has positive degrees of freedom and imposes constraints on the correlation or covariance matrix, which is what allows its fit to be tested.
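The counting comparison behind these categories, often called the t-rule, can be sketched in Python. The sketch assumes that p observed variables supply p(p+1)/2 distinct variances and covariances. The function name and the example models are hypothetical. A negative result guarantees under-identification, but a zero or positive result is only a necessary condition, so a model that passes this check may still be under-identified for other reasons.

```python
def identification_status(p, q):
    """Compare knowns with free parameters (the counting rule, or t-rule).

    p: number of observed variables
    q: number of free parameters in the specified model
    Returns (degrees of freedom, label). df < 0 guarantees under-identification;
    df >= 0 is necessary but not sufficient for identification.
    """
    knowns = p * (p + 1) // 2  # distinct observed variances and covariances
    df = knowns - q
    if df < 0:
        label = "under-identified"
    elif df == 0:
        label = "just-identified"
    else:
        label = "over-identified"
    return df, label


# Hypothetical one-factor models with the factor variance fixed at 1,
# so q = p loadings + p error variances (+ any extra free parameters).
print(identification_status(6, 12))
print(identification_status(3, 6))
print(identification_status(3, 7))
```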
Model estimation involves obtaining a numeric value for each free model parameter (element).
A properly specified model usually contains a mixture of fixed and free parameters, and the free parameters must be estimated from the data (Lei & Wu, 2007).
The process of estimation begins with the calculation of an appropriate correlation or covariance matrix of the observed variables. It moves on to the assignment of trial values to the parameters, and ends with the calculation of the correlation or covariance matrix that these values imply. The main statistical benefit of analysing the covariance matrix (rather than the correlation matrix) is that both the standard errors of the estimates and the fit indices are correct.
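The calculation of the implied matrix from trial values can be sketched in Python for a confirmatory factor model. The sketch assumes NumPy and the standard CFA parameterisation Σ = ΛΦΛ′ + Θ, where Λ holds the loadings, Φ the factor covariances and Θ the (diagonal) error variances. This notation is not used elsewhere in this chapter, and the trial values are hypothetical.

```python
import numpy as np


def implied_cov(loadings, factor_cov, error_var):
    """Model-implied covariance matrix for a CFA model:
    Sigma = Lambda @ Phi @ Lambda.T + Theta, with Theta diagonal."""
    lam = np.asarray(loadings, dtype=float)
    phi = np.asarray(factor_cov, dtype=float)
    theta = np.diag(np.asarray(error_var, dtype=float))
    return lam @ phi @ lam.T + theta


# Hypothetical trial values for a one-factor model with three indicators.
lam = np.array([[0.8], [0.7], [0.6]])
phi = np.array([[1.0]])
theta = np.array([0.36, 0.51, 0.64])

sigma = implied_cov(lam, phi, theta)
print(np.round(sigma, 2))
```

In an estimation program, this implied matrix would be compared with the sample matrix, and the trial values would be updated until the two agree as closely as possible.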
In practice, many survey responses involve ordinal and non-normal data. When treated as continuous, such data can cause a number of problems, such as biased estimates and misleading fit statistics. Fortunately, Mplus and other software packages can compute appropriate matrices for variables of many scale types (for example, polychoric correlations for ordinal variables).
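One of the problems with treating ordinal data as continuous can be illustrated with a short simulation. When two correlated normal variables are coarsened into three-point items, their Pearson correlation is attenuated. This is a Python sketch assuming NumPy. The correlation, sample size and category thresholds are hypothetical, and the sketch does not show how polychoric correlations are estimated.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical continuous responses with a known correlation of 0.6.
rho = 0.6
n = 100_000
cov = np.array([[1.0, rho], [rho, 1.0]])
z = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n)

# Coarsen each variable into a 3-point ordinal item (categories 0, 1, 2)
# using assumed thresholds.
thresholds = [-0.5, 0.8]
ordinal = np.digitize(z, thresholds)

# Pearson correlations: continuous scores versus ordinal codes.
r_continuous = np.corrcoef(z[:, 0], z[:, 1])[0, 1]
r_ordinal = np.corrcoef(ordinal[:, 0], ordinal[:, 1])[0, 1]
print(round(r_continuous, 3), round(r_ordinal, 3))
```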
The estimation of free parameters is an iterative attempt to minimize the discrepancy between the observed covariance matrix (supplied by the data) and the model-implied covariance matrix (supplied by the model). This process can fail, either because the model cannot be estimated at all or because it produces an improper solution (for example, a negative error variance). Problems with estimation usually arise when the model is not identified, when variables are too highly correlated, and/or when the sample size is too small (Lei & Wu, 2007).
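The minimisation described above can be sketched for a one-factor model using the maximum likelihood (ML) discrepancy function, one common choice: F_ML = ln|Σ| + tr(SΣ⁻¹) − ln|S| − p. This is a minimal sketch assuming NumPy and SciPy. The data are hypothetical, and S is built from known loadings so that the minimum discrepancy is zero. Parameterising error variances on the log scale is an assumption made here for simplicity. SEM programs typically leave error variances unconstrained by default, which is why improper solutions such as negative error variances can appear.

```python
import numpy as np
from scipy.optimize import minimize


def f_ml(S, sigma):
    """ML discrepancy: ln|Sigma| + tr(S Sigma^-1) - ln|S| - p."""
    p = S.shape[0]
    sign, logdet_sigma = np.linalg.slogdet(sigma)
    if sign <= 0:
        return np.inf  # Sigma not positive definite: reject these trial values
    _, logdet_s = np.linalg.slogdet(S)
    return logdet_sigma + np.trace(S @ np.linalg.inv(sigma)) - logdet_s - p


def one_factor_sigma(theta, p):
    """Implied covariance of a one-factor model, factor variance fixed at 1.
    theta = p loadings followed by p log error variances (an assumption)."""
    lam = theta[:p].reshape(p, 1)
    err = np.exp(theta[p:])
    return lam @ lam.T + np.diag(err)


# Hypothetical "observed" covariance matrix generated from known values.
true_lam = np.array([0.8, 0.7, 0.6, 0.5])
S = np.outer(true_lam, true_lam) + np.diag(1.0 - true_lam**2)

p = S.shape[0]
start = np.concatenate([np.full(p, 0.5), np.log(np.full(p, 0.5))])  # trial values
result = minimize(lambda t: f_ml(S, one_factor_sigma(t, p)), start, method="BFGS")

# Loadings are identified only up to a joint sign flip, so compare magnitudes.
est_lam = np.abs(result.x[:p])
print(np.round(est_lam, 3), result.fun)
```

Starting from trial values of 0.5, the optimiser should recover the generating loadings, with a minimum discrepancy close to zero.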
7.4.3 Testing Model Fit
Once the model has been specified and estimated, its fit must be assessed. In basic terms, a model is a hypothesised representation of the phenomena being investigated; if the data and the model are inconsistent, the model should be rejected. The primary statistics used to test model fit in this research were χ², df, TLI, RMSEA, SRMR, and AIC. The main statistics are discussed briefly in the following paragraphs.
Generally, fit indices can be classified as either 1) absolute indices or 2) incremental indices (Hu & Bentler, 1999). Absolute indices assess the similarity between the observed and fitted model matrices. “Absolute indices evaluate the overall discrepancy between observed and implied covariance matrices; fit improves as more parameters are added to the model and degrees of freedom decrease”, while incremental indices are used to “assess absolute or parsimonious fit relative to a baseline model” (Hancock & Mueller, 2010, p. 490). In other words, incremental indices assess how much better the hypothesised model fits than a baseline model, usually the independence (null) model in which the observed variables are assumed to be uncorrelated.
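TLI (used in this research) and CFI are both incremental indices. One common form of each is sketched in Python below. Each compares a model's χ² with that of the independence baseline model. The χ² values are hypothetical, and individual programs may differ in details such as how negative values are truncated.

```python
def cfi(chi2_m, df_m, chi2_b, df_b):
    """Comparative Fit Index: hypothesised model (m) versus baseline model (b)."""
    d_m = max(chi2_m - df_m, 0.0)
    d_b = max(chi2_b - df_b, d_m)  # d_m >= 0, so this also truncates at zero
    return 1.0 - d_m / d_b if d_b > 0 else 1.0


def tli(chi2_m, df_m, chi2_b, df_b):
    """Tucker-Lewis Index (non-normed fit index)."""
    ratio_b = chi2_b / df_b
    ratio_m = chi2_m / df_m
    return (ratio_b - ratio_m) / (ratio_b - 1.0)


# Hypothetical results: hypothesised model chi-square = 30 on 20 df,
# independence (baseline) model chi-square = 500 on 28 df.
cfi_value = cfi(30, 20, 500, 28)
tli_value = tli(30, 20, 500, 28)
print(round(cfi_value, 3), round(tli_value, 3))
```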
In terms of absolute fit indices, the chi-square (χ²) statistic is commonly used to test whether a model fits the data. Before using this statistic, the null hypothesis must be established, which in this case is that the model fits the data (Lei & Wu, 2007).
When using this statistic, the researcher's aim is to fail to reject the null hypothesis. A small, non-significant χ² value therefore indicates good fit, whereas a large, significant value indicates poor fit. Bentler (2007) recommends using adjunct fit indices such as CFI and RMSEA to support the χ² test, while Hoyle and Panter (1995) advise researchers always to report the chi-square value in research reports, despite its limitations (notably its sensitivity to sample size).
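The sensitivity of χ² to sample size can be illustrated with a short Python sketch. The sketch assumes SciPy and the common definition T = (N − 1)F_min, where F_min is the minimised discrepancy; some programs multiply by N instead. The discrepancy value and sample sizes are hypothetical.

```python
from scipy.stats import chi2


def chi_square_test(f_min, n, df):
    """Model chi-square T = (N - 1) * F_min and its p-value under chi2(df).

    Assumes the (N - 1) convention; some programs multiply by N instead.
    """
    t = (n - 1) * f_min
    return t, chi2.sf(t, df)


# The same hypothetical minimum discrepancy (F_min = 0.05, df = 9)
# evaluated at two sample sizes.
results = {n: chi_square_test(0.05, n, 9) for n in (201, 2001)}
for n, (t, p) in results.items():
    print(f"N = {n}: chi-square = {t:.2f}, p = {p:.4g}")
```

With N = 201 the χ² is non-significant. The identical discrepancy gives a highly significant χ² at N = 2001, which is one reason adjunct indices are recommended.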
Browne and Cudeck (1993) describe the Root Mean Square Error of Approximation (RMSEA) as a measure of “discrepancy per degree of freedom” in a model.
One of the strengths of this statistic is that its sampling distribution is known, so both significance tests and confidence intervals can be calculated for it. Browne and Cudeck (1993) made several recommendations concerning cut-offs for good model fit, including advising against using models with an RMSEA greater than 0.1.
They stated that “a value of the RMSEA of about 0.05 or less would indicate a close fit of the model in relation to the degrees of freedom,” while “the value of about 0.08 or less for the RMSEA would indicate a reasonable error of approximation” (p. 144).
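The RMSEA point estimate is commonly computed from χ², df and N as √(max(χ² − df, 0) / (df(N − 1))). The Python sketch below applies this formula together with the Browne and Cudeck (1993) cut-offs quoted above. It treats "about 0.05" and "about 0.08" as hard boundaries, which is a simplification. Some programs use N rather than N − 1, and the confidence interval is not computed here. The input values are hypothetical.

```python
import math


def rmsea(chi_square, df, n):
    """RMSEA point estimate: sqrt(max(chi-square - df, 0) / (df * (N - 1)))."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


def browne_cudeck_label(value):
    """Label an RMSEA value with the Browne and Cudeck (1993) cut-offs,
    treating 'about 0.05' and 'about 0.08' as hard boundaries (a simplification)."""
    if value <= 0.05:
        return "close fit"
    if value <= 0.08:
        return "reasonable error of approximation"
    if value <= 0.10:
        return "neither close nor reasonable"
    return "above 0.10: not recommended"


# Hypothetical results: chi-square = 50 on 20 df with N = 201.
value = rmsea(50.0, 20, 201)
print(round(value, 3), browne_cudeck_label(value))
```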
Information criterion indices are an alternative to the indices used to assess the absolute fit of a specific model. They are used to compare and rank competing models, and the best model is the one with the smallest value. The Akaike Information Criterion (AIC) is one of several information criterion indices (Akaike, 1987) and was used in this research. In determining overall model fit, researchers must establish to what extent the sample data support the theoretical model. Various goodness-of-fit indices are used to evaluate the model, for example the comparative fit index (CFI), the root mean square error of approximation (RMSEA), and the χ²/df ratio (Schumacker & Lomax, 2004).
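Ranking competing models by AIC can be sketched in Python using one common definition, AIC = −2 ln L + 2k, where L is the maximised likelihood and k is the number of free parameters. The log-likelihoods and parameter counts below are hypothetical. AIC values are only comparable for models fitted to the same data.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: -2 ln L + 2k (smaller is better)."""
    return -2.0 * log_likelihood + 2.0 * n_params


# Hypothetical competing models fitted to the same data:
# name -> (maximised log-likelihood, number of free parameters).
models = {
    "A": (-1000.0, 20),
    "B": (-995.0, 23),
    "C": (-990.0, 31),
}

scores = {name: aic(ll, k) for name, (ll, k) in models.items()}
ranking = sorted(scores, key=scores.get)  # best (smallest AIC) first
print(scores, ranking)
```

Model C has the highest likelihood, but its extra parameters are penalised, so model B ranks first.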
7.4.4 Model Modification
Many researchers end up with mis-specified models. According to Diamantopoulos and Siguaw (2000), a model can be modified by changing the parameters that connect indicators to a latent variable from free to fixed, or from fixed to free. This either permits or restricts the correlations among the measurement errors or among the latent variables. When model modification is necessary, the type of specification error determines how the model should be modified, as Hancock and Mueller (2010) explain:
With regard to external specification errors – when irrelevant variables were included in the model or substantively important ones were left out –
remediation can only occur by respecifying the model based on more relevant theory. On the other hand, internal specification errors – when unimportant paths among variables were included or when important paths were omitted – can potentially be diagnosed and remedied using Wald statistics and Lagrange multiplier statistics (p. 491).
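For a single parameter, the Wald statistic mentioned in this quotation reduces to the squared ratio of the estimate to its standard error, referred to a χ² distribution with one degree of freedom. A non-significant value suggests that the path could be fixed at zero. This Python sketch assumes SciPy and uses a hypothetical estimate and standard error. The Lagrange multiplier side is not computed here.

```python
from scipy.stats import chi2


def wald_single(estimate, std_error):
    """Wald test of H0: parameter = 0 for one free parameter.

    W = (estimate / SE)^2, referred to a chi-square distribution with 1 df.
    """
    w = (estimate / std_error) ** 2
    return w, chi2.sf(w, 1)


# Hypothetical free path: estimate 0.05, standard error 0.04.
w, p = wald_single(0.05, 0.04)
print(f"W = {w:.4f}, p = {p:.3f}")  # non-significant: candidate for fixing at zero
```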
When measures of either component fit or overall fit indicate mis-specification, researchers generally have two options: reject the model or make minor modifications. If a strict confirmatory approach is taken, the model is rejected, but this option is uncommon because many consider it too inflexible (Jöreskog, 1993). The more flexible approach to dealing with mis-specified models is to make minor modifications. This usually involves either adding model parameters or omitting measurement paths.
When using the popular method of introducing additional model parameters, the Modification Index (MI) can be used to identify which fixed parameters, if freed, would most improve (reduce) the overall chi-square (Bechger, Verstralen & Verhelst, 2002). However, researchers should keep in mind that adding parameters often requires post-hoc explanations of why those parameters were not included in the first place (Ruxton & Beauchamp, 2008).
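A modification index approximates the drop in χ² expected if a single fixed parameter were freed. It is therefore commonly screened against the 5% critical value of χ² with one degree of freedom (about 3.84). The re-estimated model can then be compared with the original using a chi-square difference test. The Python sketch below assumes SciPy; the parameter labels, MI values and χ² values are hypothetical.

```python
from scipy.stats import chi2

# 5% critical value of chi-square with 1 df (about 3.84), a common reference
# point when screening modification indices.
CRITICAL_1DF = chi2.ppf(0.95, 1)


def flag_modification_indices(mis, critical=CRITICAL_1DF):
    """Return fixed parameters whose MI exceeds the critical value, largest first."""
    flagged = [(name, mi) for name, mi in mis.items() if mi > critical]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


def chi_square_difference(chi2_restricted, df_restricted, chi2_free, df_free):
    """Chi-square difference (likelihood-ratio) test for two nested models."""
    d_chi2 = chi2_restricted - chi2_free
    d_df = df_restricted - df_free
    return d_chi2, d_df, chi2.sf(d_chi2, d_df)


# Hypothetical MIs for three fixed parameters.
mis = {"error cov x3-x5": 12.4, "loading x2 on F2": 5.1, "error cov x1-x6": 1.2}
flagged = flag_modification_indices(mis)
print(flagged)

# Hypothetical chi-square values before and after freeing the top parameter.
d_chi2, d_df, p_diff = chi_square_difference(48.6, 25, 37.1, 24)
print(round(d_chi2, 2), d_df, round(p_diff, 4))
```

The simple difference test assumes ordinary ML chi-square values. Robust or scaled chi-square statistics require a corrected difference test.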
The other option, omitting paths in the measurement part of the model, usually involves either removing latent variable indicators with low factor loadings or creating a composite score from multiple indicators of a latent variable. Some argue that a latent variable loses meaning when indicators that represent important aspects of it are removed. However, Bollen and Lennox (1991) highlight the interchangeable nature of indicators of roughly equal reliability, while pointing out that composite indicators and latent variables are not equivalent to one another. An increasingly popular alternative to these two options is to test a number of competing models and accept the strongest and most appropriate one (Jöreskog, 1993). This model comparison approach was used in this research and was based on the a priori development of a number of models. It was deemed the most effective approach because the other approaches are difficult both to apply in practice and to replicate.