Chapter 4: Model Structure and Analytical Issues
4.6 Testing Model Validity
4.6.1 Omitted Variables
A version of the Ramsey RESET test adapted to the IV–GMM environment (Baum, Schaffer et al. 2003) was used to test all models for omitted variables or structural incompleteness. The test used was the “difference-in-Sargan” or “GMM–distance” test described by Pesaran & Taylor (1999), which tests the impact of including the square of the fitted value of the dependent variable in the main equation.
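A minimal sketch of this test follows, under assumptions not stated in the text: homoskedastic errors (so efficient GMM reduces to 2SLS and the GMM distance to a difference in Sargan criteria), a fitted value built from the first-stage predicted regressors times the 2SLS coefficients (so that it depends only on exogenous variables), and simulated data. The helper names (`tsls`, `iv_reset` and so on) are hypothetical, not those of any econometrics package. Both criteria are scaled by the error variance from the augmented equation, which keeps the statistic non-negative.

```python
import math
import random


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[piv] = M[piv], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for k in reversed(range(n)):
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x


def coef(cols, v):
    """Least-squares coefficients of v on the given columns (normal equations)."""
    return solve([[dot(a, b) for b in cols] for a in cols], [dot(a, v) for a in cols])


def fit(cols, v):
    """Fitted values of v from a least-squares regression on the given columns."""
    c = coef(cols, v)
    return [sum(cj * col[i] for cj, col in zip(c, cols)) for i in range(len(v))]


def tsls(y, X, Z):
    """2SLS with regressor columns X and instrument columns Z; returns (beta, resid, Xhat)."""
    Xhat = [fit(Z, x) for x in X]          # first-stage predictions P_Z x
    beta = coef(Xhat, y)                   # (Xhat'Xhat)^-1 Xhat'y
    resid = [y[i] - sum(b * x[i] for b, x in zip(beta, X)) for i in range(len(y))]
    return beta, resid, Xhat


def criterion(u, Z):
    """Unscaled 2SLS objective u' P_Z u."""
    return dot(u, fit(Z, u))


def iv_reset(y, X, Z):
    """GMM-distance RESET: add yhat^2 to the equation and test its coefficient (1 df)."""
    beta, _, Xhat = tsls(y, X, Z)
    yhat = [sum(b * xh[i] for b, xh in zip(beta, Xhat)) for i in range(len(y))]
    w = [yh * yh for yh in yhat]           # squared fitted value, exogenous under H0
    Z2 = Z + [w]
    _, u_r, _ = tsls(y, X, Z2)             # restricted: coefficient on w fixed at 0
    _, u_u, _ = tsls(y, X + [w], Z2)       # unrestricted: w enters the equation
    sigma2 = dot(u_u, u_u) / len(y)        # common error variance keeps stat >= 0
    stat = max((criterion(u_r, Z2) - criterion(u_u, Z2)) / sigma2, 0.0)
    return stat, math.erfc(math.sqrt(stat / 2))   # chi-square(1) upper tail


rng = random.Random(42)
n = 400
z1 = [rng.gauss(0, 1) for _ in range(n)]
z2 = [rng.gauss(0, 1) for _ in range(n)]
v = [rng.gauss(0, 1) for _ in range(n)]
u = [0.6 * vi + rng.gauss(0, 1) for vi in v]   # error correlated with x through v
x = [0.8 * a + 0.5 * b + vi for a, b, vi in zip(z1, z2, v)]
one = [1.0] * n
y_lin = [1 + 2 * xi + ui for xi, ui in zip(x, u)]
y_quad = [1 + 2 * xi + 0.5 * xi ** 2 + ui for xi, ui in zip(x, u)]

stat_lin, p_lin = iv_reset(y_lin, [one, x], [one, z1, z2])
stat_quad, p_quad = iv_reset(y_quad, [one, x], [one, z1, z2])
print(f"linear data:    stat = {stat_lin:.2f}, p = {p_lin:.3f}")
print(f"quadratic data: stat = {stat_quad:.2f}, p = {p_quad:.2g}")
```

With data generated from a linear model the statistic should usually be small, whereas the omitted quadratic term in the second data set should produce a clear rejection.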
4.6.2 Goodness of Fit and Comparing Alternative Specifications
Comparison of non-nested specifications was required in several situations, including the choice between the SEIFA measure and mean personal income as indicators of the socio-economic status of the community, and the choice between the linear and log-log functional forms.
The R² is not appropriate for comparing non-nested specifications. Further, the standard R² reported by the software is not appropriate for the structural equations, as it is based on the actual values of the endogenous variables, while the second-stage estimation is based on their predicted values. Pesaran & Smith (1994) suggested a generalised R², which is, in effect, the R² of the second-stage estimation based on the predicted values of the endogenous variables; again, it is not readily interpreted. In the context of structural equation modelling, the relevant issue is not the overall goodness of fit but the strength of the coefficient estimates, and this is the view taken in this thesis. The R² is reported only for the reduced-form models.
These concerns about the nature of the R² carry over to the standard criteria for comparing alternative variables within a specification, such as the Akaike Information Criterion (AIC). Andrews (1999) proposed a GMM analogue, the GMM–AIC, together with related measures, which are further considered by Hall & Peixe (2003) and Peixe, Hall et al. (2006). The latter two articles show that the GMM–AIC is not efficient when there is substantial redundancy, and suggest a canonical correlation approach. For relatively simple comparisons of non-nested variables such as SEIFA and income, however, the GMM–AIC is suitable and was used. To compare the linear and transformed data, the approach outlined in Section 4.3.2, based on the tests of MacKinnon, White et al. (1983), was followed.
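The text does not give the formula used. One standard form of Andrews' (1999) moment selection criterion, assumed here, is MSC = J − (L − K)κ, where J is the Hansen statistic, L the number of moment conditions, K the number of parameters, and κ = 2 for the GMM–AIC (ln n for the GMM–BIC); the specification with the smaller value is preferred. The sketch below uses hypothetical J values and dimensions purely to show the arithmetic; they are not results from this study.

```python
import math


def gmm_msc(j_stat, n_moments, n_params, kind="aic", n_obs=None):
    """Assumed Andrews-style criterion MSC = J - (L - K) * kappa; smaller is preferred."""
    overid = n_moments - n_params
    if overid < 0:
        raise ValueError("equation is under-identified")
    if kind == "aic":
        kappa = 2.0
    elif kind == "bic":
        if n_obs is None:
            raise ValueError("GMM-BIC needs the sample size")
        kappa = math.log(n_obs)
    else:
        raise ValueError(f"unknown criterion: {kind}")
    return j_stat - overid * kappa


# Hypothetical J statistics for two non-nested specifications (illustrative only).
specs = {
    "spec_a": dict(j_stat=3.1, n_moments=8, n_params=5),
    "spec_b": dict(j_stat=4.0, n_moments=7, n_params=5),
}
scores = {name: gmm_msc(**s) for name, s in specs.items()}
best = min(scores, key=scores.get)
print(scores, "->", best)
```

When both specifications use the same numbers of moments and parameters, the comparison reduces to choosing the smaller J.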
4.6.3 Orthogonality of Exogenous Variables
As all structural models were over-identified, the Hansen J-statistic was used to test the overall orthogonality of the instruments (Baum, Schaffer et al. 2003). This test rejects when endogenous variables have been wrongly specified as exogenous, or when instruments in the subsidiary equations are correlated with the error in the main equation. As the J-statistic does not identify the source of the problem, the Sargan C-statistic (difference-in-Sargan test) was used to establish the source of any failed test, allowing the model to be corrected.
In practice, the orthogonality tests were the major determinant of the instrument set, leading to the exclusion of a number of proposed instruments that were found to be correlated with the error in the main equation.
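The J and C statistics can be illustrated in a homoskedastic setting, where Hansen's J reduces to Sargan's statistic. In this hypothetical sketch on simulated data, the instrument `z3` is deliberately made correlated with the structural error. J tests the two over-identifying restrictions jointly, and C is the difference between J with and without `z3`, both scaled by the full-instrument error variance so that the difference cannot be negative. The helper names are illustrative only.

```python
import math
import random


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[piv] = M[piv], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for k in reversed(range(n)):
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x


def coef(cols, v):
    return solve([[dot(a, b) for b in cols] for a in cols], [dot(a, v) for a in cols])


def fit(cols, v):
    c = coef(cols, v)
    return [sum(cj * col[i] for cj, col in zip(c, cols)) for i in range(len(v))]


def tsls_resid(y, X, Z):
    """2SLS residuals y - X beta, with beta from regressing y on the first-stage fits."""
    beta = coef([fit(Z, x) for x in X], y)
    return [y[i] - sum(b * x[i] for b, x in zip(beta, X)) for i in range(len(y))]


def sargan(y, X, Z, sigma2=None):
    """u' P_Z u / sigma^2 at the 2SLS residuals; sigma^2 = u'u/n unless supplied."""
    u = tsls_resid(y, X, Z)
    if sigma2 is None:
        sigma2 = dot(u, u) / len(y)
    return dot(u, fit(Z, u)) / sigma2, u


def chi2_sf(stat, df):
    """Upper-tail chi-square probability; closed forms for df = 1 or 2 only."""
    stat = max(stat, 0.0)
    if df == 1:
        return math.erfc(math.sqrt(stat / 2))
    if df == 2:
        return math.exp(-stat / 2)
    raise ValueError("this sketch only handles df = 1 or 2")


rng = random.Random(7)
n = 500
z1 = [rng.gauss(0, 1) for _ in range(n)]
z2 = [rng.gauss(0, 1) for _ in range(n)]
v = [rng.gauss(0, 1) for _ in range(n)]
u = [0.6 * vi + rng.gauss(0, 1) for vi in v]       # structural error, correlated with x
z3 = [rng.gauss(0, 1) + 0.5 * ui for ui in u]      # invalid: correlated with u
x = [0.8 * a + 0.5 * b + 0.5 * c + vi for a, b, c, vi in zip(z1, z2, z3, v)]
y = [1 + 2 * xi + ui for xi, ui in zip(x, u)]
one = [1.0] * n
X, Z_full, Z_sub = [one, x], [one, z1, z2, z3], [one, z1, z2]

j_full, u_full = sargan(y, X, Z_full)              # 4 - 2 = 2 over-identifying restrictions
s2_full = dot(u_full, u_full) / n
j_sub_common, _ = sargan(y, X, Z_sub, sigma2=s2_full)
c_stat = max(j_full - j_sub_common, 0.0)           # C statistic for z3, 1 df
p_full, p_c = chi2_sf(j_full, 2), chi2_sf(c_stat, 1)
print(f"J (all instruments): {j_full:.1f}, p = {p_full:.2g}")
print(f"C (suspect z3):      {c_stat:.1f}, p = {p_c:.2g}")
```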
4.6.4 Weak Instruments
Staiger & Stock (1997) analysed the case where instruments are only weakly correlated with the endogenous regressors, and found that conventional asymptotic results can fail even in large samples: when instruments are weak, IV estimates can be badly biased and confidence intervals need not cover the true values.
When there is a single endogenous regressor, Staiger & Stock (1997) recommended a first-stage F statistic above 10 as an indicator of adequate instruments. With multiple endogenous regressors, however, the test is less clear-cut, as the relationships among the first-stage regression equations must also be considered. Stock & Yogo (2002) derived critical values for the statistic proposed by Cragg & Donald (1993); these critical values depend on the number of endogenous regressors, the number of instruments and the acceptable bias of the IV estimator relative to the OLS estimator.
Hahn & Hausman (2002) showed that the number of instruments, the R² of the reduced-form equation, the correlation between the error terms and the sample size all influence the bias, which suggests that it is desirable to minimise the number of instruments used.
Except where otherwise noted, the final specifications of the equations met the Stock & Yogo (2002) criterion of at most 10 per cent bias (relative to OLS) for the Cragg–Donald statistic. As far as possible, the number of instruments used was minimised.
4.6.5 Multicollinearity
Multicollinearity was assessed using the Variance Inflation Factor (VIF) (Greene 2003, p. 57). A VIF greater than 10 for a variable implies that the R² of a notional auxiliary regression, with that variable as the dependent variable and all other explanatory variables as regressors, exceeds 0.9. The general advice (Hamilton 2004, p. 212) is that a VIF over 10 warrants further consideration and possibly remedial action.
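The threshold follows directly from the definition of the VIF, where R_j² is the R² of the auxiliary regression of variable j on the other explanatory variables:

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2},
\qquad
\mathrm{VIF}_j > 10 \iff 1 - R_j^2 < 0.1 \iff R_j^2 > 0.9
```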
While collinearity among the original variables was rarely a concern, the inclusion of squared variables and interaction terms led to extreme collinearity. The mean VIF was calculated for each equation, and where it, or any individual VIF, was extreme, alternative approaches were sought.
4.6.6 Overfitting
Overfitting arises when a model contains too many explanatory variables: it fits the estimation data well but does not predict well out of sample (see, for example, Lui & Enders 2003; Babyak 2004; Clark 2004). Following Shao (1993), overfitting was tested by Monte Carlo cross-validation with 1,000 replications, each estimating the model on a random 70 per cent of the data and predicting the remaining 30 per cent. Overfitting would be suspected if the average mean square error in the prediction samples were significantly different from the average in the estimation samples (Harrell 2001, p. 90); however, this did not arise.