Measurement Invariance Testing - Appendix to Chapter 2

2.10 Appendix to Chapter 2

2.10.5 Measurement Invariance Testing

Once the measurement model was established for each hypothesised construct, measurement invariance (MI) testing was performed to ensure that the latent variables have a consistent interpretation across independent groups. A multi-group analysis setting was chosen instead of the simpler Multiple Indicators Multiple Causes (MIMIC) model because a previous study reported difficulty in detecting non-invariance of factor loadings using a MIMIC model (Kim, Yoon and Lee, 2012).

The purpose of measurement invariance (MI) testing is to ensure that the latent constructs (factors) can be compared consistently across groups in terms of means as well as their qualitative interpretations. Loadings and intercepts are tested for invariance; loadings indicate the contribution that the observed items make to factor interpretation. Intercepts are the expected means of the observed variable when the factor score is zero.
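As a brief illustration of why both loadings and intercepts matter for comparing means, consider a single continuous indicator of one factor. The symbols used here ($\nu_g$ for the intercept, $\lambda_g$ for the loading, $\alpha_g$ for the factor mean in group $g$) and the linear form are assumptions made for this sketch; the items in this study are ordinal or binary and are modelled through thresholds (see Table 2.10).

```latex
\begin{align*}
y_{ig} &= \nu_g + \lambda_g \eta_{ig} + \varepsilon_{ig},
  \qquad E(\varepsilon_{ig}) = 0, \quad E(\eta_{ig}) = \alpha_g \\
E(y_{ig}) &= \nu_g + \lambda_g \alpha_g \\
E(y_{i1}) - E(y_{i2}) &= (\nu_1 - \nu_2) + (\lambda_1 \alpha_1 - \lambda_2 \alpha_2) \\
  &= \lambda (\alpha_1 - \alpha_2)
  \quad \text{if } \nu_1 = \nu_2 \text{ and } \lambda_1 = \lambda_2 = \lambda
\end{align*}
```

Only when both the intercepts and the loadings are equal does a difference in observed means reflect a difference in factor means alone.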

In MI testing, the more restricted model is retained if the less restricted model does not fit statistically significantly better, which is assessed using a Chi-square difference (likelihood ratio) test. The four widely accepted levels of MI are, in order from the least to the most strict, configural, weak, strong and strict MI (Byrne, 2012; Millsap, 2011). Starting with the configural model, in which the same number of observed items is mapped to each latent variable in every group, the restrictions for the next level of invariance are added in turn. A weak invariance model constrains the loadings of the observed variables to be equal across groups, in addition to the constraints of the configural model. A strong invariance model additionally constrains the intercepts (thresholds, in the case of ordinal/binary observed items) to be the same across groups, while strict invariance also tests whether the residual variances are identical between groups (Millsap, 2011; Millsap and Yun-Tein, 2004). The sequence of MI testing follows that of Millsap (2011). The configuration and corresponding equations of the measurement models for group g, with G groups in total, are shown in Table 2.10.
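The retention rule for nested models can be sketched in Python as follows. This is a minimal illustration, assuming maximum-likelihood Chi-square values for two nested models; the function names and the fit statistics in the example are hypothetical and are not taken from the study. Where DIFFTEST is used in Mplus, the difference statistic is adjusted by the software and cannot be obtained by simple subtraction, so the sketch shows only the logic of the comparison.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variate.

    Uses the power series for the regularised lower incomplete gamma
    function; adequate for the moderate values typical of fit statistics.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term, total, n = 1.0, 1.0, 0
    while term > 1e-15 * total:
        n += 1
        term *= z / (a + n)
        total += term
    lower = math.exp(a * math.log(z) - z - math.lgamma(a + 1.0)) * total
    return min(1.0, max(0.0, 1.0 - lower))


def difference_test(chi2_restricted, df_restricted, chi2_free, df_free, alpha=0.05):
    """Compare a restricted model with the less restricted model it is nested in.

    Returns (delta_chi2, delta_df, p_value, retain_restricted).
    """
    delta_chi2 = chi2_restricted - chi2_free
    delta_df = df_restricted - df_free
    if delta_df <= 0:
        raise ValueError("the restricted model must have more degrees of freedom")
    p_value = chi2_sf(delta_chi2, delta_df)
    # Retain the restricted model when the freer model is not significantly better.
    return delta_chi2, delta_df, p_value, p_value >= alpha


# Hypothetical fit statistics (weak vs configural model), for illustration only.
delta, ddf, p, retain = difference_test(112.4, 58, 101.9, 52)
print(f"delta chi2 = {delta:.1f}, delta df = {ddf}, p = {p:.3f}, retain restricted: {retain}")
```

The same comparison would be repeated for each successive pair of models (configural against weak, weak against strong, and so on), stopping at the first level where the restricted model is rejected.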

In this study, MI testing is performed using three grouping variables: age (30-39 and 40-49), gender (male and female) and household income (below or above the median), in a multi-group analysis framework in Mplus. This framework allows a separate set of parameters to be estimated for each sub-population simultaneously. The equality


Table 2.10 Measurement invariance models in the multi-group analysis framework

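| MI level | Model description | Equation |
| --- | --- | --- |
| Configural invariance | The factor(s) predicts the same number(s) of observed items across groups. | $\mathbf{y}^{*}_{g} = \boldsymbol{\Lambda}_{gy}\boldsymbol{\eta}_{g} + \boldsymbol{\varepsilon}_{g}$, $\boldsymbol{\tau}_{gy}$, $\tau_{cq1} \neq \cdots \neq \tau_{cqG}$ |
| Weak invariance | Factor loadings are restricted to be the same across groups. | $\mathbf{y}^{*}_{g} = \boldsymbol{\Lambda}_{y}\boldsymbol{\eta}_{g} + \boldsymbol{\varepsilon}_{g}$, $\tau_{cq1} \neq \cdots \neq \tau_{cqG}$ |
| Strong invariance | The thresholds and factor loadings are restricted to be the same across groups. | $\mathbf{y}^{*}_{g} = \boldsymbol{\Lambda}_{y}\boldsymbol{\eta}_{g} + \boldsymbol{\varepsilon}_{g}$, $\tau_{cq1} = \cdots = \tau_{cqG}$ |
| Strict invariance | The error variances are restricted to be the same across the groups. | $\mathbf{y}^{*} = \boldsymbol{\Lambda}_{y}\boldsymbol{\eta}_{g} + \boldsymbol{\varepsilon}$, $\tau_{cq1} = \cdots = \tau_{cqG}$ |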

Note: Weak invariance is alternatively called metric invariance. An alternative name for strong invariance is scalar invariance.

of parameters across groups is tested using the Chi-square difference test, DIFFTEST in Mplus (see Muthén and Muthén, 2017). This test is performed at the three levels of MI first, before locating the source of non-invariance (the observed items) using the same test. Rejecting the null hypothesis, which states that the restricted model fits the data as well as the model with group-specific estimates, would indicate that at least some items need to be allowed to vary across groups.
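The item-level step can be sketched as follows, assuming two groups and that each comparison frees the loading of a single item, so that each difference test has one degree of freedom. The function names, item names and Chi-square differences are hypothetical and chosen only for illustration; they are not results from the study, and the adjustment applied by DIFFTEST is not reproduced.

```python
import math


def p_value_1df(delta_chi2):
    """Upper-tail chi-square probability for one degree of freedom."""
    if delta_chi2 <= 0:
        return 1.0
    # For 1 df, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(delta_chi2 / 2.0))


def flag_non_invariant(item_deltas, alpha=0.05):
    """Return the items whose freed loading significantly improves fit."""
    flagged = []
    for item, delta in item_deltas.items():
        if p_value_1df(delta) < alpha:
            flagged.append(item)
    return flagged


# Hypothetical chi-square differences from freeing one loading at a time.
example_deltas = {"item_a": 0.8, "item_b": 7.2, "item_c": 3.1, "item_d": 12.5}
flagged_items = flag_non_invariant(example_deltas)
print(flagged_items)
```

Items flagged in this way are candidates for group-specific estimates in a partially invariant model.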

The results from the MI testing using age and gender show weak invariance, while partial weak invariance is achieved when using the dichotomised income variable. The loadings for the items ‘run-out’, ‘money-left’, ‘credit’ and ‘understand’ differ depending on whether household income is below or above the median (see Figure 2.6). This implies that the factors are interpreted slightly differently across income groups. In particular, as financial resilience essentially captures individuals’ economic agency, to which income is a crucial element, assuming full MI would lead to biased estimates in both the measurement and structural parts of the model. However, keeping this measurement structure is only possible when the income variable is categorical. This can be interpreted along the lines of an ‘interaction effect’; that is, the loading of an item on the factor varies depending on group membership. This poses an issue when the MI testing variable is continuous: in the current analytical setting in Mplus, it is not possible to measure an interaction effect between a latent factor (i.e.


Fig. 2.6 Factor loadings by dichotomised income groups: below (Lower) or above (Upper) the median household income (2012/14, n=5,755)

Note: For the survey questions corresponding to the names, see Table 2.3.

financial resilience) and a continuous control variable (i.e. income), whether such an interaction is additive or multiplicative in nature (see Van Der Weele and Knol, 2014).

Additionally, these MI testing results would vary depending on how the income variable is categorised. Different categorisation methods may also produce more than two groups, in which case the extent of invariance may differ from that obtained with dichotomised groups. Another important consideration is whether household income should be categorical for the purpose of the study. Because household income is one of the key variables in the study, categorising it arbitrarily may be costly, so it is better kept as a continuous variable. Therefore, factor loadings are assumed to be invariant across implicit income groups.

The intercepts, which are the expected means of the survey questions when the factor score is zero, may also vary by group. The initial MI testing using the binary income variable suggested partial weak invariance, in which thresholds for observed variables were freely estimated. As the sample size is relatively large, and even small differences can therefore be statistically significant, it is useful to examine the extent of


difference graphically and to determine whether the differences are indeed meaningful. Figure 2.7 shows the proportion of respondents in each response category for selected observed items by income group. Except for ‘credit’, the patterns differ between the Lower and Upper income groups, and these differences are deemed non-ignorable. As a continuous income variable is preferred to a categorical one, these differences can be accommodated by introducing direct effects of income on these observed variables.
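The kind of tabulation behind such a comparison can be sketched as below. This is an illustrative helper only: the function name, the group labels and the responses are hypothetical and do not reproduce the survey data or the figures reported in this appendix.

```python
from collections import Counter, defaultdict


def category_proportions(records):
    """Proportion of responses in each category, computed within each group.

    records: iterable of (group, response_category) pairs.
    """
    counts = defaultdict(Counter)
    for group, response in records:
        counts[group][response] += 1
    proportions = {}
    for group, counter in counts.items():
        total = sum(counter.values())
        proportions[group] = {cat: n / total for cat, n in sorted(counter.items())}
    return proportions


# Hypothetical responses to one ordinal item, for illustration only.
example = [("Lower", 1), ("Lower", 1), ("Lower", 2), ("Lower", 3),
           ("Upper", 2), ("Upper", 3), ("Upper", 3), ("Upper", 3)]
result = category_proportions(example)
print(result)
```

Plotting these within-group proportions side by side for each item gives a direct view of how far the response patterns diverge.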

MI testing by gender also identified three non-invariant items, but the extent of non-invariance was not substantial. Therefore, the cost of assuming full MI in this case is assumed to be low. This is further explained in the Appendix to Chapter 3, which investigates gender differences in retirement saving. The factors were measurement invariant with respect to age (not reported). Therefore, a weak invariance model, in which income has a direct effect on the above-mentioned items, is used below.
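The direct-effect specification can be written out as a rough sketch. The notation is assumed here for illustration ($x_i$ for household income, $\beta_j$ for the direct effect of income on item $j$, $\gamma$ for the effect of income on the factor, $\tau_{jc}$ for a threshold of item $j$); the model actually estimated is the one shown in the path diagram in Fig. 2.8.

```latex
\begin{align*}
y^{*}_{ij} &= \lambda_j \eta_i + \beta_j x_i + \varepsilon_{ij},
  \qquad \beta_j = 0 \text{ for items without a direct effect} \\
\eta_i &= \gamma x_i + \zeta_i \\
y^{*}_{ij} > \tau_{jc} &\iff \lambda_j \eta_i + \varepsilon_{ij} > \tau_{jc} - \beta_j x_i
\end{align*}
```

Under these assumptions, a non-zero $\beta_j$ acts like a threshold for item $j$ that shifts with income at a given level of the factor, which is how a continuous income variable can absorb the item-level differences without forming income groups.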

Fig. 2.7 Response pattern differences between upper and lower income groups (2012/14)


Fig. 2.8 Path diagram of the direct effect of income on financial resilience and its items only (Standardised coefficients, 2012/14, n=5,755)


In document Younger adults’ retirement saving and wealth accumulation in Britain a quantitative investigation (Page 79-84)