5.5 Statistical techniques

5.5.2 Path analysis

Path analysis belongs to the structural equation modelling (SEM) family of statistical procedures, which also includes factor analysis. Factor analysis is a method of data reduction that groups correlated observed variables into hypothetical constructs, or latent (unobserved) variables (Kline, 2011).

Modelling socioeconomic position as a latent variable would be one option for the analysis of the pathway life course model. However, for this analysis, the substantive interest was in the pathways from specific measures of childhood socioeconomic position to quality of life, which latent variable modelling loses when combining several measures of socioeconomic position into a single variable. Therefore, factor analysis is not considered further.

Path analysis involves the depiction (via a path diagram) of a mathematical model that is hypothesised to explain the correlations amongst observed variables (Olobatuyi, 2006).

Path diagrams visually represent the hypothesised relationships between the observed variables. Figure 5.2 displays a simple path diagram in which an individual’s education level is thought to indirectly influence their quality of life via their current income (interpretation of the coefficients is explained in section 5.5.2.1 below). In addition, education level and current income are each considered to directly affect current quality of life. Single-headed arrows therefore depict hypothesised causal relationships.

Double-headed arrows can also be used to indicate covariance between variables, either where causality is thought to operate in both directions or where the variables are simply correlated. For example, current income influences current wealth, but current wealth can also generate income. Path analysis has the advantage of being able to test the direct, indirect, and total effects of one variable on another, and it allows the relative contributions of alternative paths of influence to be compared (Kline, 2011; Olobatuyi, 2006).

5.5.2.1 Interpretation of path coefficients

Figure 5.2: A hypothetical example of a path diagram showing associated path coefficients for the influence of different measures of socioeconomic position on quality of life

[Path diagram: Education level → Current income (coefficient 2); Current income → Current quality of life (coefficient 5); Education level → Current quality of life (direct path)]

Path coefficients are interpreted in a similar way to regression coefficients. In Figure 5.2 above, hypothetical unstandardised path coefficients are shown on the arrows. The path coefficient from education level to current income is two; this means that a one-unit increase in education level is associated with a two-unit increase in current income. Similarly, the path coefficient from current income to current quality of life is five, so a one-unit increase in current income is associated with a five-unit increase in quality of life. To calculate an indirect effect, one multiplies together the path coefficients of each direct effect along the pathway between the variables of interest. In the above example, the indirect effect of education level through current income is equal to 10 (two multiplied by five). This can be interpreted as the expected increase in quality of life for every one-unit increase in education level via its prior effect on current income. The total effect is simply the sum of the direct and indirect effects, which in the example above would be 11 for the effect of education level on quality of life (the indirect effect of 10 plus the direct effect, which this total implies is one).
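The arithmetic above can be sketched in a few lines of Python. This is an illustrative sketch only, not the software used in the study: the variable names are hypothetical, and the direct coefficient of one from education level to quality of life is an assumption inferred from the stated total effect of 11. The sketch multiplies coefficients along each route and sums across routes, and it assumes a recursive model (no feedback loops).

```python
# Hypothetical unstandardised path coefficients from the worked example.
# The direct education -> quality_of_life coefficient (1.0) is an assumption
# inferred from the stated total effect of 11.
PATHS = {
    ("education", "income"): 2.0,
    ("income", "quality_of_life"): 5.0,
    ("education", "quality_of_life"): 1.0,
}


def directed_paths(paths, source, target):
    """Return every directed route from source to target as a list of nodes.

    Assumes the diagram has no cycles (a recursive path model).
    """
    if source == target:
        return [[target]]
    routes = []
    for start, end in paths:
        if start == source:
            for tail in directed_paths(paths, end, target):
                routes.append([source] + tail)
    return routes


def path_product(paths, route):
    """Multiply the coefficients along one route."""
    product = 1.0
    for start, end in zip(route, route[1:]):
        product *= paths[(start, end)]
    return product


def effects(paths, source, target):
    """Split the total effect into direct (one arrow) and indirect parts."""
    direct = 0.0
    indirect = 0.0
    for route in directed_paths(paths, source, target):
        value = path_product(paths, route)
        if len(route) == 2:
            direct += value
        else:
            indirect += value
    return {"direct": direct, "indirect": indirect, "total": direct + indirect}


result = effects(PATHS, "education", "quality_of_life")
print(result)
```

With more mediators, the same functions would sum the products over every additional route, which is how the relative contributions of alternative pathways can be compared.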

5.5.2.2 Estimation

All path models were estimated using maximum likelihood estimation. Single-level path analysis was used, with dummy variables controlling for country fixed effects in the pooled analysis across the 13 countries, and with stratification by welfare regime in the analysis of welfare state differences. Although multilevel path analysis was available, it is recommended that this technique be avoided when the number of higher-level units is below 100 (Hox & Maas, 2001), a threshold far above the 13 countries available here. Inaccurate estimates may arise if the number of higher-level groups is small (around 50 is specified) and the intraclass correlation is low; in addition, the residual variances and standard errors may be underestimated. For these reasons, multilevel path analysis was not adopted.
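To illustrate what country dummy variables do in a pooled single-level model, the sketch below uses a small, noise-free, entirely hypothetical data set (three made-up countries, a true slope of two, and different country intercepts). It is not the study's data or code. It relies on a standard regression result: for a single path, including a dummy for each non-reference country gives the same slope as removing each country's mean from the variables before fitting.

```python
# Hypothetical data: y = country intercept + 2 * x, with no noise.
# Country labels, intercepts and the slope of 2 are illustrative assumptions.
INTERCEPTS = {"A": 0.0, "B": 10.0, "C": 20.0}
X_BY_COUNTRY = {"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0], "C": [7.0, 8.0, 9.0]}

groups, xs, ys = [], [], []
for country, values in X_BY_COUNTRY.items():
    for x in values:
        groups.append(country)
        xs.append(x)
        ys.append(INTERCEPTS[country] + 2.0 * x)

# Dummy coding with country "A" as the reference category:
# one 0/1 column for each remaining country.
NON_REFERENCE = ["B", "C"]
dummies = [[1 if g == c else 0 for c in NON_REFERENCE] for g in groups]


def slope(x_values, y_values):
    """Ordinary least squares slope of y on a single predictor x."""
    mean_x = sum(x_values) / len(x_values)
    mean_y = sum(y_values) / len(y_values)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_values, y_values))
    sxx = sum((x - mean_x) ** 2 for x in x_values)
    return sxy / sxx


def demean_by_group(values, group_labels):
    """Subtract each group's mean from its members' values."""
    totals, counts = {}, {}
    for value, label in zip(values, group_labels):
        totals[label] = totals.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return [v - totals[g] / counts[g] for v, g in zip(values, group_labels)]


# Ignoring country mixes between-country differences into the slope.
pooled_slope = slope(xs, ys)

# Removing country means is equivalent, for the slope, to adding the dummies.
within_slope = slope(demean_by_group(xs, groups), demean_by_group(ys, groups))

print("pooled slope (no country control):", pooled_slope)
print("slope with country fixed effects:", within_slope)
```

In this constructed example the uncontrolled slope is inflated because countries with higher values of x also have higher intercepts, whereas the fixed-effects slope recovers the value of two used to build the data.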

5.5.2.3 Assessment of model fit

In the structural equation modelling literature there is a range of fit statistics that can be used to assess how well hypothesised models fit the data (Hu & Bentler, 1999). Mplus provides several of these in the output generated after running a model. It should be noted that the use of model fit statistics is controversial. Some have argued that their use, particularly the cut-off criteria used to accept or reject models, risks sacrificing substantive theory for the sake of meeting arbitrary statistical criteria (Barrett, 2007).

Thus, the approach taken in this study was to test the hypothesised path model and report the model fit statistics. Paths were added or removed on the basis of theory, using the fit statistics to inform, but not dictate, the final models. A brief description of the key model fit statistics is therefore required.

The chi-squared goodness of fit measure assesses the degree of discrepancy between the sample and fitted covariance matrices (Hu & Bentler, 1999). If the difference between the model-implied covariances and the observed sample covariances is larger than would be expected by chance, usually judged at a 0.05 significance threshold, the model is considered not to fit the data (Barrett, 2007). However, a weakness of this test is that it is almost always statistically significant (i.e. indicating poor fit) when sample sizes are large. Incremental fit indices, such as the comparative fit index (CFI) and the Tucker-Lewis index (TLI), are therefore used as measures of the implied model's fit relative to the null model (von Stumm et al., 2010). These indicate the size of the residual correlations relative to the size of the original correlations (Weiner et al., 2012). Values above 0.90 and 0.95 have been suggested to indicate good model fit (Hu & Bentler, 1999; McDonald & Ho, 2002). The root mean square error of approximation (RMSEA) is also used as an absolute close-fit index (which indicates the overall extent of the residual correlations); adequate model fit is thought to be indicated by values below 0.06 (Hu & Bentler, 1999; Weiner et al., 2012). In addition, the Akaike Information Criterion (AIC) can be used to compare the fit of different nested models when appropriate, for example a path model with and without a particular direct effect. AIC is a measure of the goodness of fit of a statistical model given the data used, penalised for the number of estimated parameters; the model with the lowest value is considered to fit better (Hook & Regal, 1997). AIC can also be used to assess the fit of multilevel models.
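As a rough illustration of how these indices relate to the chi-squared statistic, the sketch below computes CFI, TLI, RMSEA and AIC from their commonly cited textbook formulas. All input values are hypothetical, the function names are illustrative, and software packages differ in detail (for example, some use N rather than N - 1 in the RMSEA denominator), so the results need not match Mplus output exactly.

```python
import math


def cfi(chi2_model, df_model, chi2_null, df_null):
    """Comparative fit index: model misfit relative to null-model misfit."""
    d_model = max(chi2_model - df_model, 0.0)
    d_null = max(chi2_null - df_null, 0.0)
    denominator = max(d_model, d_null)
    return 1.0 if denominator == 0 else 1.0 - d_model / denominator


def tli(chi2_model, df_model, chi2_null, df_null):
    """Tucker-Lewis index, based on chi-squared / df ratios (df_model > 0)."""
    ratio_null = chi2_null / df_null
    ratio_model = chi2_model / df_model
    return (ratio_null - ratio_model) / (ratio_null - 1.0)


def rmsea(chi2_model, df_model, n):
    """Root mean square error of approximation, N - 1 form (df_model > 0)."""
    return math.sqrt(max(chi2_model - df_model, 0.0) / (df_model * (n - 1)))


def aic(log_likelihood, n_parameters):
    """Akaike Information Criterion: lower values indicate better fit."""
    return -2.0 * log_likelihood + 2.0 * n_parameters


# Hypothetical values for a fitted path model and its null (baseline) model.
CHI2_MODEL, DF_MODEL = 30.0, 10
CHI2_NULL, DF_NULL = 1000.0, 15
SAMPLE_SIZE = 501

cfi_value = cfi(CHI2_MODEL, DF_MODEL, CHI2_NULL, DF_NULL)
tli_value = tli(CHI2_MODEL, DF_MODEL, CHI2_NULL, DF_NULL)
rmsea_value = rmsea(CHI2_MODEL, DF_MODEL, SAMPLE_SIZE)

# Hypothetical log-likelihoods for a model without and with one extra path.
aic_without_path = aic(-1200.0, 8)
aic_with_path = aic(-1198.0, 9)

print(f"CFI = {cfi_value:.3f}, TLI = {tli_value:.3f}, RMSEA = {rmsea_value:.3f}")
print(f"AIC without path = {aic_without_path}, with path = {aic_with_path}")
```

With these made-up inputs the CFI and TLI exceed 0.95 while the RMSEA sits just above 0.06, which shows how the indices can disagree and why they are better used to inform, rather than dictate, model choice.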

In document An examination of the relationship between life course socioeconomic position and quality of life among Europeans in early old age and the influence of the welfare regime (Page 118-121)