7.1.1 Data
Since the emergence of classical political economy, economists have tried to make sense of the apparent link between demographic and economic variables observed in the stylized facts. In attempting to verify the four classical principles empirically, statisticans face the diculties of decient preindustrial data and endogeneity between demographic and economic variables.
Regarding the data, with the construction of Wrigley and Schoeld's (1981) preindustrial time series on birth rates and death rates, quantitative studies were able to yield evidence of falsiable hypotheses for the British case, although Clark's data used here on preindustrial GDP per capita are as we have seen still debated. Moreover, in search of an overarching unied growth theory formed by universal principles, it is essential to compare the existence of the principles internationally. Mitchell's international historical statistics arguably provide the longest and most comprehensive ocial national series on vital rates and GDP per capita.1 This historical database, partly corrected by the author to eliminate some obvious typing errors, will be made use of in this chapter.2
7.1.2 Vectorautoregression
For dealing with endogeneity, the author regards time series analysis as being the most appro-priate tool. Lee (1981) was the rst author to evaluate the relationship between vital rates and economic variables by employing distributed lag regressions on Wrigley and Schoeld's dataset.
Eckstein et al. (1986) were one the rst to test economic hypotheses using a VAR model. Using
1 Since population is an internationally relatively immobile factor of production, an estimation on the country level is expected to yield signicant results.
2 The adjusted data can be found on the attached data media.
95
a Vectorautoregressive (VAR) model allows for solving the problem of endogeneity by treat-ing all variables as endogenous.3 However, a VAR model must be applied with care, as its form strongly depends on the underlying theoretical foundation. Nicolini (2007) rened Lee's approach by using a VAR model and illustrating impulse response functions that allowed assessment of the qualitative relationships between the three variables. Building on the VAR and developing a more complex methodology, Herzer et al. (2012) employed a Vector error correction (VEC) model to account for possible cointegration between the variables, while Rathke and Sarferaz (2014) introduced time-varying coecients.
This paper will retain the traditional VAR approach, since it is most arguably the easiest and most transparent estimation, as it mainly requires the knowledge of simple OLS estimation.
The idea of the VAR approach is to recover the relevant coecients from an OLS regression of contemporary values on lagged values of the variables and to use the recorded parameters to project the average impact of an exogenous shock in one of the variables over time. The obtained impulse response functions are expected to conform to the classical principles as formulated in chapter three. However, for the linear system to qualify as a VAR representation, some further reservations will be made in the following. To evaluate the principles in question, the system of our unied growth model constructed in the last chapter might initially be written in matrix notation as
To apply the VAR approach consistently, the presumed real relationships are expected to be linear. While the estimations of Lee, Nicolini, Herzer et al. and Rathke and Sarferaz were usually based on the usage of a level variable of real wages or real GDP per capita, they will in this case be replaced with growth rates of real GDP per capita, since the major part of the true relationships between the variables of our system becomes linear only when employing growth rates as has been justied in the theoretical parts of his work. Moreover, as growth rates display the same internationally valid unit of measurement, the principles may be simultaneously estimated across countries. Thirdly, instead of level variables, growth rates are most arguably stationary, which is required to avoid spurious autoregressions when not specically accounting for cointegration.
7.1.3 Stationarity of the variables
An OLS estimation over time requires at least some of the data series to be stationary, as integrated or trended variables will almost certainly give spurious results. Since the British data provide the longest national time series available, ranging from the year 1541 to 2010, tests on the order of integration as well as the tests for lag selection will be representatively conducted on this sample. The annual data on which the VAR model will be based are displayed in Figure 7.2.4
3 See, for example, Lee and Anderson (2002) and Crafts and Mills (2009).
4 GDP per capita growth is divided by ten for better visualization.
Figure 7.1: Britain: Time series on birth rate, death rate, GDP per capita growth 15412010. Sources: Clark (2009), Mitchell (2013), Wrigley and Schoeld (1981).
Sources: GDP: Clark (2009) for 15411871, Mitchell (2013) for 18712010, Vital Rates: Wrigley and Schoeld (1981) for 15411871, Mitchell (2013) for 18712010.
In the case of GDP per capita growth, the results from running Augmented DickeyFuller tests on nonstationarity seem to unequivocally indicate stationarity of the variable (see Table 7.1), while the application of the same test to death rate and particularly to birth rate does not always reject the null hypothesis of nonstationarity on a 1% level (see Table 7.1). Indeed, the pattern of birth rate and death rate has led to a debate on their order of integration.
Table 7.1: Unit root tests on the relevant variables.
Augmented DickeyFuller test for unit root Number of obs = 468
1% Crit. Test Stat. Test Stat. Test Stat. Test Stat.
model Value GDP pc gr Birth Rate Death Rate Pop growth 2 lags, no constant -2.580 -16.737*** -1.235 -1.451 -3.747***
2 lags, constant -3.443 -17.460*** -0.941 -3.539*** -6.303***
2 lags, linear trend -3.981 -17.620*** -1.682 -5.925*** -6.304***
*** indicates signicance at 1% level
Firstly, following Nicolini (2007), vital rates could be treated as stationary variables, as their values represent (population) growth rates and are by denition restricted to lie within the range (0,1). Besides, despite vital rates displaying high persistence, it seems implausible to believe that they have ever exceeded a certain maximum value, say ten percent, or that they have fallen below a minimum value, say zero, in the long run. Accordingly, they cannot in reality follow a random walk or a trend and the assumption 0 < α1, α3 < 1 should hold. Nevertheless, stationarity
of these two variables might be questioned by having found evidence of the variable natural population growth rate being stationary on a one percent level. As the latter is by denition a linear combination of birth rate and death rate, there is strong indication for the vital rates being cointegrated.5 However, as was pointed out by Fanchon and Wendel,
VAR models can be estimated with data on stationary and non-stationary variables if the non-stationary data is also cointegrated because recent theoretical work proves that estimation with such data will yield consistent parameter estimates.6
Thirdly, as was suggested by Sims (1980), if we are more interested in the nature of relationships between variables with the end purpose being estimation of the impulse response functions to capture the dynamic responses and less interested in point estimates, estimating a VAR with non-stationary variables can give us important insights into their relationships. Accordingly, the question of the order of integration of the vital rates does not pose problems with regard to a consistent estimation of a VAR model.
7.1.4 Lag order selection
In the foregoing simulation, the benets from the division of labor were strongly simplied.
However, there are at least two important reasons complicating their measurement in empirical analyses. Firstly, since we will employ national data without accounting for an international labor division, the eects of foreign population growth on domestic output are not captured in the regression. Since external trade shocks might be suspected to cause a major part of the strong uctuations of GDP per capita data as shown in Figure 7.1, this eect should not be underestimated. Secondly, to roughly illustrate the positive delayed eect of births on the extent of the division of labor, a lag of fteen years was employed in the simulation. For all real applications, the exact timing of an average individual entering the division of labor cannot be suciently determined, much less the resulting benets, which are arguably spread over the subsequent lifetime. It is assumed that a VAR model is too costly in terms of parameters to be able to properly estimate the eect of the PoLD, which is why the fteenth lag will be eliminated from estimation. This issue will be dealt with in chapter eight. Vice versa, omission of the fteenth lag greatly increases the number of degrees of freedom, which is particularly valuable when using small sample sizes.
Nevertheless, it is advisable to include a third lag by which the additional information stemming from the PoLD, stored in the remaining error terms, might be captured. The use of a VAR(3) model is supported by running a series of lagselection tests on the British data, as the most parsimonious model is suggested by the SchwarzBayesian information criterion to use three lags (see Table C1 in Appendix 11.3). With regard to a delayed fertility decision when accounting for the PoG and the PoP, a lag of three years appears plausible as well, whereas every additional lag
5 Using a VEC model specication similar to that of Herzer et al. (2012), explicitly accounting for the cointe-grated variables birth rate and death rate, or estimating a restricted structural VAR model do not yield very dierent results.
6 Fanchon and Wendel (1992).
may unnecessarily increase the number of parameters to estimate. Replacement of the fteenth
where the trend h employed in the simulation has been excluded, since it is supposed to be empirically captured by Φ1.
7.1.5 Ordering of the variables
The estimation of an unrestricted VAR model of the above system of equations is complicated by its inclusion of contemporaneous eects required to measure the PoDR and other possible immediate eects. To analyze the interactions between annual demographic and economic vari-ables, Nicolini (2007) proposed a recursive VAR structure based on Theil (1971) of the vector form
A0Yt=Ps
j=1AjYt−j+ ut
where the vector Ytcontains the contemporary values of the endogenous variables, each of which depends on its own lagged values and on contemporaneous and lagged values of the other vari-ables. Aj are the coecient matrices of the lagged values. The components of the residuals ut
are supposed to be uncorrelated, i.e. clean of those contemporaneous eects that are already included in the coecient matrix A0 (orthogonalized residuals). Multiplying both sides by A−10 yields the conventional VAR form
where consistent estimators of Σ and the Φj's are easily obtained by running OLS regressions equation by equation.
Additionally, estimation of A−10 is necessary to recover the contemporaneous response of the variables to orthogonalized shocks. However, as this requires estimation of an additional number of parameters, the system is not identied. A sucient condition to reduce the amount of parameters is to restrict the VAR model by imposing lower triangularity of the matrix A−10
from using a Cholesky decomposition Σ = A−10 A−10 0. Multiplying the residuals by a lower triangular matrix implies that, given a particular ordering inside the vector Yt, each variable is allowed to react within the current period to a shock in any of the variables of a higher ordering, while it is completely unresponsive to shocks in variables that are lower in the ordering. In this
context, yearly demographic variables seem to t the framework almost ideally as it can be clearly distinguished between contemporaneous and lagged eects. In the last chapters it was argued that childbirth rarely takes place in the same year as the fertility decision, in particular due to a pregnancy lag. Since this natural lag prevents it from being contemporaneously eected by death rate and GDP per capita, birth rate is the only plausible candidate to be the rst variable in the vector Yt. Furthermore, the death rate is placed as second variable to preserve the possibility of contemporaneous eects of GDP per capita growth on moratlity by an extended operation of the PoDR and the PoLD, which have so far been assumed to merely operate through the birth rate only. Finally, it is assumed that a change in GDP per capita does not aect the death rate in the same year, while a delayed negative eect retains the possibility of an endogenized mortality, yielding the following system for estimation:
To nd evidence for the classical growth model, the suggested linear relations should be approx-imately recovered by applying impulse response analysis to the above restricted VAR(3) model.
To this end, nine orthogonalized impulse response functions are computed by shocking the error terms of each variable's equation by one standard deviation. The initial shock instantly aects the assigned contemporaneous variables and subsequently propagates through the system. Since childbirth is, as a response to shocks in death rate and GDP per capita growth, most arguably spread over a number of years, it is reasonable to expect accumulated orthogonalized impulse response functions to yield more pronounced and signicant eects. On the other hand, as the period in question should not exceed the short term, a time horizon of more than ve years seems inappropriate, granting that the fertility decision is usually made after four periods and that a longer horizon will not provide additional information. To test for signicant eects, two hundred bootstrap replications are used to generate 95% condence intervals. The size of the shocks will be given by the standard deviation of the corresponding variables.
If the considerations made in chapter three are correct, the causal relationships given by the estimated cumulative orthogonalized impulse response functions (coirfs) following a shock in the corresponding variable should be of the form
high persistence (+) short run (+)1 short run (+)2
(x) high persistence (+) (x)
contempor.(−)3, long run (+)4 (x) low persistence (+)
where (+)1 is expected to display the positive average eect of the PoG and (+)2 to capture the positive average eects of the PoP. (−)3 is supposed to reect the negative eect of the PoDR. This relation exists by denition and the eect will be observed as long as it is not outweighed by the impact of the PoLD. As was mentioned, (+)4 will not be captured suciently well to account for the positive eect of the PoLD and its presence even poses a threat to a clear identication of the PoDR. Consequently, since PoDR and PoLD may not be clearly identiable
within the VAR(3) framework, their existence can certainly not be falsied in this framework.
Therefore, the subsequent investigation will focus on the evaluation of the hypotheses of the PoG and the PoP, the classical theory of population. The existence of the PoLD and the PoDR will be conjointly estimated in the next chapter by applying on OLS estimation to equation 3.6 Persistence eects are expected to be measured for the variables birth rate and death rate, much less for GDP per capita growth. The remaining three impulse response functions denoted (x) will also be estimated to capture further potential eects by which the classical model might be extended ex post. The resulting impulse responses should be interpreted with care, as the eects of the PoG and the PoP are supposed to be timevarying, whereas the estimation can merely give average results over the whole period in question.