
Comparison to previous studies using FLS data

We now compare the MTP results with those of the other popular model selection strategies reviewed in Section 1.2.

PcGets/Autometrics

The similarity between multiple testing and general-to-specific also becomes visible when comparing the variables found significant (Table 1.6).14 Hendry and Krolzig (2004) find 16 significant variables, compared to 15 for the bootstrap approach at γ = 0.1, and 14 of these coincide. Only Outward orientation is included by the bootstrap but not by PcGets/Autometrics, whereas the latter includes the Number of years as an open economy, unlike the bootstrap method.15 Even these small differences between the two models should probably not be overstated. Outward orientation and Number of years as an open economy readily substitute for each other, that is, they exhibit negative jointness in the terminology of Doppelhofer and Weeks (2009). Hence, the final specifications are likely to be even more similar to each other than the overlap of 14 significant variables suggests.

13Results using the procedures of Storey, Taylor, and Siegmund (2004) and Benjamini, Krieger, and Yekutieli (2006) can be found in Table 1.19 in the Appendix.

To the extent that our Monte Carlo study is informative about cross-section growth data sets, this result is plausible. Consider the column ρ = 0.5, which arguably comes closest to real-world growth data sets. In Table 1.3, for example, PcGets/Autometrics has higher power (at the expense of a higher FDR) than the MTPs and should hence be expected to produce slightly more rejections.
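A small sketch makes the power/FDR trade-off concrete by showing how both criteria can be computed for one Monte Carlo replication. The false discovery proportion is V/max(R, 1), where V is the number of false selections and R the total number of selections. Power is the share of truly relevant variables that are selected, and averaging the proportion over replications estimates the FDR. The variable names and selections are hypothetical, and the exact averaging behind Table 1.3 may differ.

```python
def selection_metrics(selected, relevant):
    """Return (false discovery proportion, power) for one replication.

    selected: variables a procedure keeps (its rejections of H0: beta_j = 0)
    relevant: variables that truly enter the data-generating process
    """
    selected, relevant = set(selected), set(relevant)
    false_discoveries = selected - relevant
    true_discoveries = selected & relevant
    # FDP = V / max(R, 1), so an empty selection has FDP 0
    fdp = len(false_discoveries) / max(len(selected), 1)
    # Power: share of the truly relevant variables that were selected
    power = len(true_discoveries) / len(relevant) if relevant else 0.0
    return fdp, power


def average_metrics(replications, relevant):
    """Average FDP (a Monte Carlo estimate of the FDR) and average power."""
    results = [selection_metrics(sel, relevant) for sel in replications]
    n = len(results)
    mean_fdp = sum(fdp for fdp, _ in results) / n
    mean_power = sum(power for _, power in results) / n
    return mean_fdp, mean_power


if __name__ == "__main__":
    truth = {"x1", "x2", "x3", "x4"}        # hypothetical relevant set
    conservative = {"x1", "x2"}              # fewer rejections
    liberal = {"x1", "x2", "x3", "x5"}       # more rejections
    print(selection_metrics(conservative, truth))  # (0.0, 0.5)
    print(selection_metrics(liberal, truth))       # (0.25, 0.75)
```

The "liberal" selection gains power at the price of one false discovery. This mirrors the pattern described for PcGets/Autometrics relative to the MTPs.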

Bayesian model averaging

Table 1.6 shows that BMA using the settings of FLS yields eight variables with a posterior inclusion probability above the 50% threshold employed here. This is roughly comparable to the 10 variables found significant by the MTPs at γ = 0.05. Again, these results are plausible given the findings of our Monte Carlo simulations: consider once more column ρ = 0.5 of Table 1.3. BMA using the FLS settings (m = k/2, g = 1/k², fixed θ), BH and the bootstrap method find around 10 relevant variables, with the MTPs being somewhat more powerful in this particular scenario. The five variables with the highest marginal posterior probability in FLS are also significant according to the bootstrap method at γ = 0.05.
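The 50% rule can be illustrated with a minimal sketch. A variable's posterior inclusion probability (PIP) is the sum of the posterior probabilities of all models that contain it, and the variable is retained if this sum exceeds the threshold. The toy model space and its probabilities are hypothetical. In applied work they come from BMA software, and this sketch does not reproduce any particular package.

```python
def inclusion_probabilities(models):
    """Posterior inclusion probability (PIP) of each variable.

    models: list of (set of included variables, posterior model probability).
    Probabilities are renormalised in case they do not sum exactly to one.
    """
    total = sum(prob for _, prob in models)
    variables = set().union(*(included for included, _ in models))
    pip = {var: 0.0 for var in variables}
    for included, prob in models:
        for var in included:
            pip[var] += prob / total
    return pip


def select_by_pip(models, threshold=0.5):
    """Variables whose PIP exceeds the threshold (50% as in the text)."""
    pip = inclusion_probabilities(models)
    return sorted(var for var, p in pip.items() if p > threshold)


if __name__ == "__main__":
    toy_models = [({"a", "b"}, 0.4), ({"a"}, 0.3), ({"b", "c"}, 0.2), ({"c"}, 0.1)]
    print(select_by_pip(toy_models))  # ['a', 'b']
```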

14Hoover and Perez (2004) also apply their variant of Gets to growth data. The dataset they work with, however, differs somewhat from the one used in FLS, and hence we prefer to compare the MTPs to the results of Hendry and Krolzig (2004), who use the same data as FLS.

15A J-test (Davidson and MacKinnon, 1981) rejects both models, although Hendry and Krolzig’s model is rejected only at larger significance levels.

Figure 1.2: Log predictive scores, sorted across the 150 subsamples.

Note: Sorted log predictive scores for the eight priors from Ley and Steel (2009), BH and the best model using the FLS prior settings. 150 subsamples.

Again, some of the apparent differences in results between the methods are likely driven by jointness effects. For instance, specific religion variables are often assigned some importance by one method but not by another. Fraction Hindu, for example, is significant at γ = 0.01, while its marginal posterior probability is only 0.097. Conversely, the fraction of Muslims has a rather high posterior inclusion probability, but is included neither by any of the MTPs nor by PcGets/Autometrics. This suggests that the relevance of a variable should be analyzed not only marginally, but also jointly with that of other, related variables that may either complement or substitute for it. Consequently, the approach of Doppelhofer and Weeks (2009) may enrich the lessons that can be drawn from the marginal view inherent in both the BMA variant analyzed here and the MTPs.
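The jointness idea can be sketched with a log posterior odds ratio in the spirit of Doppelhofer and Weeks (2009), as we read it. The statistic compares the posterior mass of including two variables together or not at all with the mass of including only one of them. Negative values point to substitutes and positive values to complements. The sketch assumes that posterior model probabilities are available, and the toy models and probabilities are hypothetical.

```python
import math


def jointness(models, i, j):
    """Log posterior odds ratio of joint vs. separate inclusion of i and j.

    models: list of (set of included variables, posterior model probability).
    Negative values suggest substitutes, positive values complements.
    """
    total = sum(prob for _, prob in models)
    both = neither = only_i = only_j = 0.0
    for included, prob in models:
        w = prob / total
        has_i, has_j = i in included, j in included
        if has_i and has_j:
            both += w
        elif has_i:
            only_i += w
        elif has_j:
            only_j += w
        else:
            neither += w
    if min(both, neither, only_i, only_j) == 0.0:
        raise ValueError("an inclusion pattern has zero posterior mass; log odds undefined")
    return math.log((both * neither) / (only_i * only_j))


if __name__ == "__main__":
    substitutes = [({"a"}, 0.4), ({"b"}, 0.4), ({"a", "b"}, 0.05), (set(), 0.15)]
    print(round(jointness(substitutes, "a", "b"), 2))  # -3.06
```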

Regarding the robustness of BMA results, Ley and Steel (2009) show that for the eight prior choices considered in their paper (cf. Section 1.2), the posterior mean model size ranges from 6.03 with m = k/2 = 20.5, random θ and g = 1/k² to 19.84 with m = k/2 = 20.5, fixed θ and g = 1/n. Their Table II also shows that the ranking of variables by marginal posterior inclusion probability is highly sensitive to the prior settings. Comparing the prior choice g = 1/n, fixed θ and m = 20.5 with g = 1/n, fixed θ and m = 7, they note: ‘Fraction Hindu, the Labor force size, and Higher education enrollment go from virtually always included with m = 20.5 to virtually never included with m = 7.’
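One way to see why m and θ matter is to look at the prior on model size that they imply. The following sketch rests on our reading of Ley and Steel (2009). With a fixed θ = m/k, model size has a Binomial(k, θ) prior. With a random θ ~ Beta(1, (k − m)/m), model size follows a beta-binomial prior with the same mean m but a much larger spread. We set k = 41 because m = k/2 = 20.5 in the text. The g-prior, which also differs across the eight settings, is not modelled here.

```python
import math


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def size_prior_fixed(k, m):
    """Binomial(k, theta) prior on model size with fixed theta = m / k."""
    theta = m / k
    return [math.comb(k, s) * theta**s * (1 - theta) ** (k - s) for s in range(k + 1)]


def size_prior_random(k, m):
    """Beta-binomial prior: theta ~ Beta(1, (k - m) / m), so E[size] = m."""
    a, b = 1.0, (k - m) / m
    return [
        math.comb(k, s) * math.exp(log_beta(s + a, k - s + b) - log_beta(a, b))
        for s in range(k + 1)
    ]


def mean_and_sd(pmf):
    """Mean and standard deviation of a distribution over sizes 0..k."""
    mean = sum(s * p for s, p in enumerate(pmf))
    var = sum((s - mean) ** 2 * p for s, p in enumerate(pmf))
    return mean, math.sqrt(var)


if __name__ == "__main__":
    k = 41  # m = k/2 = 20.5 in the text
    for m in (20.5, 7):
        fixed = mean_and_sd(size_prior_fixed(k, m))
        rand = mean_and_sd(size_prior_random(k, m))
        print(f"m={m}: fixed mean/sd={fixed[0]:.2f}/{fixed[1]:.2f}, "
              f"random mean/sd={rand[0]:.2f}/{rand[1]:.2f}")
```

With m = k/2 the random-θ prior is uniform over all sizes from 0 to 41. This shows how much less informative about model size it is than the fixed-θ version. The code requires Python 3.8 or later for `math.comb`.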

Similarly, recent work by Eicher, Papageorgiou, and Raftery (2011) shows that some alternative ‘default’ priors can lead to rather different growth models using the FLS data, with ‘as few as three and as many as 22 regressors’ being found to influence growth. They recommend a unit information prior that is very closely related to the UIP discussed in Section 1.3 (cf. their Table I). It will be interesting to see whether the BMA literature will henceforth adopt this choice, or whether different models continue to be put forward using different variants of BMA.16

Table 1.7: Log predictive scores

Notes: LPS are calculated from the FLS data, using 75% of the data (i.e. 54 observations) as a training sample and the remaining 18 of the n = 72 observations as a holdout sample. 150 subsamples. The BMA variants are those considered by Ley and Steel (2009); the best model uses the settings of FLS.

These findings may help explain the differences between the MTPs and the marginal posterior inclusion probabilities of FLS, as well as the differences between the latter and the results of the other model selection procedures (see below). The findings of Ley and Steel (2009) and Eicher, Papageorgiou, and Raftery (2011) imply that claims about the robustness of BMA results must be interpreted with care.

We also follow FLS in evaluating the predictive performance of the BH procedure and of all BMA procedures considered by Ley and Steel (2009). Predictive performance is measured by the log predictive score (LPS), a statistic that increases with both lack of predictive fit and sampling uncertainty, so that lower values indicate better predictions. We employ the R (R Core Team, 2012) package BMS of Feldkircher and Zeugner (2009) and follow the design of FLS. That is, we randomly split the n = 72 observations into a training (or ‘inference’) subsample of size 0.75 · 72 = 54 and a holdout (or ‘prediction’) subsample of size 18. Figure 1.2 reports results for 150 subsamples. Table 1.7 gives the corresponding minimum, mean and maximum LPS.
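The random split and the score can be sketched as follows. The sketch assumes the FLS definition of the LPS as minus the average log predictive density over the holdout observations. Gaussian predictive densities and the numbers used are simplifying assumptions for illustration. This is not the BMS implementation.

```python
import math
import random


def split_indices(n=72, train_share=0.75, seed=None):
    """Randomly split observation indices into training and holdout parts."""
    rng = random.Random(seed)
    n_train = round(train_share * n)  # 0.75 * 72 = 54
    train = sorted(rng.sample(range(n), n_train))
    holdout = sorted(set(range(n)) - set(train))  # the remaining 18
    return train, holdout


def normal_logpdf(y, mean, sd):
    """Log density of a normal distribution (assumed predictive form)."""
    return -0.5 * math.log(2 * math.pi * sd**2) - (y - mean) ** 2 / (2 * sd**2)


def log_predictive_score(y_holdout, means, sds):
    """LPS = -(1/n_P) * sum over holdout points of the log predictive density.

    Lower values are better.
    """
    logs = [normal_logpdf(y, mu, sd) for y, mu, sd in zip(y_holdout, means, sds)]
    return -sum(logs) / len(logs)


if __name__ == "__main__":
    train, holdout = split_indices(seed=1)
    print(len(train), len(holdout))  # 54 18
    y = [0.02, 0.01]  # hypothetical holdout growth rates
    print(round(log_predictive_score(y, y, [0.01, 0.01]), 3))  # -3.686
```

When the outcomes are growth rates of the order of 0.01, predictive densities easily exceed one. The log densities are then positive and the LPS is negative, as are the values reported in Table 1.7.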

16As with the MTPs, our discussion of alternative variants of BMA is constrained by space considerations and focuses on those that we believe are most prominent in the literature. Other recent proposals include Liang et al. (2008), Feldkircher and Zeugner (2009) or Crespo Cuaresma (2011).

For BH at γ = 0.05, we find a minimum, mean and maximum LPS of −3.35, −2.66 and −0.81 over the 150 subsamples. These values are higher, hence worse, than, for instance, those for the BMA prior settings of FLS (m = k/2, g = 1/k², fixed θ), i.e. −3.48, −2.92 and −0.99. This reflects a finding that is also familiar from the forecast combination literature: using evidence from multiple models tends to improve out-of-sample performance. That said, the differences seem to be modest.17 In general, the different BMA settings have quite similar LPS in the center of the distributions, and hence also similar mean LPS, all of which are better than that of BH. The different BMA settings, however, lead to rather different best and worst LPS. The cases m = k/2, g = 1/k², random θ and m = 7, g = 1/k², random θ, for instance, have worse best-case LPS than BH. On the other hand, all BMA settings lead to better worst-case LPS than BH. BH thus seems to be more competitive with the BMA procedures when favorable LPS are considered than when looking at bad ones.
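The forecast-combination argument can be illustrated with a BMA-style predictive density, which is a posterior-weighted mixture of model-specific predictive densities. The minimal sketch below assumes Gaussian predictive densities and uses two hypothetical models with made-up weights of 0.6 and 0.4.

```python
import math


def normal_pdf(y, mean, sd):
    """Normal density (assumed form of each model's predictive density)."""
    return math.exp(-((y - mean) ** 2) / (2 * sd**2)) / (sd * math.sqrt(2 * math.pi))


def mixture_lps(y_holdout, weights, model_forecasts):
    """LPS of a posterior-weighted mixture of model predictive densities.

    weights[j]: posterior probability of model j (assumed to sum to one)
    model_forecasts[j][f]: (mean, sd) of model j's predictive density for point f
    """
    total = 0.0
    for f, y in enumerate(y_holdout):
        density = sum(w * normal_pdf(y, *model_forecasts[j][f]) for j, w in enumerate(weights))
        total += math.log(density)
    return -total / len(y_holdout)


def single_model_lps(y_holdout, forecasts):
    """LPS of one model, i.e. a degenerate mixture with weight one."""
    return mixture_lps(y_holdout, [1.0], [forecasts])


if __name__ == "__main__":
    y = [0.02, -0.01, 0.03]            # hypothetical holdout outcomes
    model_a = [(0.02, 0.01)] * 3       # hypothetical highest-probability model
    model_b = [(0.00, 0.01)] * 3       # hypothetical runner-up
    lps_a = single_model_lps(y, model_a)
    lps_b = single_model_lps(y, model_b)
    lps_mix = mixture_lps(y, [0.6, 0.4], [model_a, model_b])
    print(round(lps_a, 2), round(lps_b, 2), round(lps_mix, 2))
```

By Jensen's inequality, when the weights sum to one the mixture's LPS can never exceed the weighted average of the individual models' LPS. It need not beat the single best model in every sample, which is consistent with the modest and sample-dependent differences described above.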

Overall, the distance between BH and the BMA variants appears modest when compared to the variance of the LPS of the different procedures across subsamples: the predictive performance of an average BH model is much better than that of a poor BMA model. Hence, on average we expect BMA to predict more accurately, but there is no guarantee that this holds for any given sample used for prediction in practice.

Finally, the BH procedure, like the BMA variants, outperforms the best model, i.e. the model with the highest posterior probability in the BMA exercise using the FLS settings.

Sala-i-Martin

In addition to varying the selection of three control variables, Sala-i-Martin (1997) forces three further variables that he deems important by default (GDP level in 1960, Life expectancy and Primary school enrollment) into every regression of his empirical application. He finds 22 significant variables in addition to these three default variables, whose relevance he assumes. Five of the variables found significant by Sala-i-Martin (1997), including defaults, can be confirmed by the bootstrap approach (γ = 0.05), FLS and PcGets/Autometrics: GDP level in 1960, Fraction Confucian, Life expectancy, Equipment investment and the Sub-Saharan dummy.
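For comparison, Sala-i-Martin's criterion can be sketched under our reading of his ‘non-normal’ CDF(0) measure. Each regression, one per triple of control variables, contributes the normal probability mass of its coefficient estimate above zero. These contributions are averaged with regression weights, which are likelihood-based in the original and are simply supplied as inputs here. A variable is called significant if the larger side exceeds 0.95. The estimates below are hypothetical, and the code does not run the regressions itself.

```python
import math


def normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def cdf0(estimates):
    """estimates: list of (beta_hat, std_error, weight), one per regression,
    each regression using a different triple of control variables."""
    total_w = sum(w for _, _, w in estimates)
    # weighted average of the normal probability mass above zero
    above = sum(w * normal_cdf(b / s) for b, s, w in estimates) / total_w
    # report the larger side, so values near one indicate a stable sign
    return max(above, 1.0 - above)


def is_significant(estimates, level=0.95):
    """Significance rule: CDF(0) above the chosen level (0.95 here)."""
    return cdf0(estimates) > level


if __name__ == "__main__":
    stable = [(0.5, 0.1, 1.0), (0.4, 0.15, 2.0), (0.6, 0.2, 1.0)]
    unstable = [(0.3, 0.2, 1.0), (-0.2, 0.2, 1.0)]
    print(is_significant(stable), is_significant(unstable))  # True False
```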

Beyond that, there is little agreement with the MTPs. Even when tolerating an FDR of up to 10%, we can confirm only 9 of his 25 significant variables. In light of our Monte Carlo results, it is not implausible to attribute the large number of variables that Sala-i-Martin declares significant to a high FDR of his approach.
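For reference, the step-up logic of BH, one of the MTPs considered here, can be sketched as follows. The tolerated FDR is γ. The p-values are hypothetical, and the bootstrap procedure is not reproduced.

```python
def benjamini_hochberg(pvalues, gamma=0.05):
    """Indices of hypotheses rejected by the BH step-up procedure at FDR level gamma."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p-value
    k_max = 0
    for rank, i in enumerate(order, start=1):
        # largest rank whose ordered p-value lies below the line rank * gamma / m
        if pvalues[i] <= rank * gamma / m:
            k_max = rank
    # step-up: reject all hypotheses up to that rank, even if some earlier
    # ordered p-values lie above their own threshold
    return sorted(order[:k_max])


if __name__ == "__main__":
    p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.058, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(p, gamma=0.05))  # [0, 1]
    print(benjamini_hochberg(p, gamma=0.10))  # [0, 1, 2, 3, 4, 5]
```

At γ = 0.10 the third and fourth p-values lie above their own thresholds but are still rejected, because the sixth one passes. This step-up feature distinguishes BH from a simple one-at-a-time comparison.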

Lasso

Schneider and Wagner’s (2012) model includes 15 variables, of which 12 coincide with those identified by the bootstrap method at γ = 0.1. These 12 variables are also among the 16 selected by Hendry and Krolzig (2004). The three additional variables selected by the Lasso are Fraction Muslim, Rule of law and Non-Equipment investment. The Lasso includes neither the Number of years as an open economy nor the Spanish, French and British colony dummies. Of these, only the Spanish dummy is also included by both BH and the bootstrap. Overall, this indicates some robustness in the significance of the 12 variables that are selected by the bootstrap, PcGets/Autometrics and the Lasso. Interestingly, some of these 12 variables have very low marginal posterior probabilities under BMA. The Lasso, however, agrees with BMA in including the fraction of Muslims.

17The values for the FLS settings are a bit larger than those reported by FLS. This suggests that the 20 subsamples drawn by FLS may have happened to be rather favorable to prediction.

Overall, five variables are found relevant in FLS, Sala-i-Martin (1997) and Hendry and Krolzig (2004), by the Lasso and by the MTPs at γ ≥ 0.05: GDP level in 1960, Fraction Confucian, Life expectancy, Equipment investment and the Sub-Saharan dummy.

Hence, the relevance of these variables appears quite robust. Most of them also have a plausible economic, cultural or religious motivation.