2. Economic specifications and the database
2.6. The econometric methods
Data preparation
For all estimation methods described below, I do not use the panel data as they are but in a prepared form. This data transformation helps to address the problem of possible endogeneity (inverse causality) and therefore lessens the risk of obtaining biased and inconsistent estimators.
I prepare the data in the following way: for every country, I use five-year means for the observations of the endogenous variables and observations of the beginning year of each five-year period for the exogenous variables. For example, if a country's observation for FLF is the mean of the years 1980-1984, the corresponding observation of lnGDP is from 1980. This implies that I use lagged exogenous variables. This procedure addresses possible endogeneity because it prevents FLF from inversely affecting lnGDP: FLF observed in 1984 cannot have an impact on GDP per capita in 1980.
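The aggregation rule can be sketched in plain Python for a single country. This is a minimal illustration, not the author's actual code: it assumes yearly values are stored in dictionaries keyed by year, and the helper name `quinquennial_panel` and the numbers are hypothetical.

```python
from statistics import mean

def quinquennial_panel(flf, lngdp, first_year, last_year):
    """Five-year observations for one country (hypothetical helper).

    flf, lngdp: dicts mapping year -> value; missing years are simply absent.
    Returns (period start, mean FLF over available years, lnGDP in the start year).
    """
    rows = []
    for start in range(first_year, last_year + 1, 5):
        flf_values = [flf[y] for y in range(start, start + 5) if y in flf]
        # lnGDP is taken from the first year of the period, so it is observed
        # no later than any of the FLF values that enter the mean.
        if start not in lngdp or not flf_values:
            continue  # the whole period drops out if beginning-year lnGDP is missing
        rows.append((start, mean(flf_values), lngdp[start]))
    return rows

flf = {1980: 40.0, 1981: 41.0, 1982: 42.0, 1983: 43.0, 1984: 44.0, 1985: 45.0, 1986: 46.0}
lngdp = {1980: 9.1, 1981: 9.15, 1985: 9.3}
rows = quinquennial_panel(flf, lngdp, 1980, 1989)
print(rows)  # [(1980, 42.0, 9.1), (1985, 45.5, 9.3)]
```

If lnGDP for 1985 were missing, the period 1985-1989 would vanish together with both of its FLF observations, which illustrates how a missing beginning-year value costs data.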
The data preparation procedure yields quinquennial data. To avoid inverse causality, I could also simply carry out the estimations based on yearly observations and use one-year lags of the exogenous variables. For example, if a country's observation for FLF were from 1985, the corresponding observation of lnGDP would be from 1984. I nevertheless use five-year means for the observations of the endogenous variables because the technical structure of the data is unknown. The data contain mainly yearly observations, but in some countries the data are probably collected only every five to ten years. The database from Barro and Lee (2000), for example, provides observations of the educational variable EDU only at 5-year intervals. Moreover, dividing the observation period into five-year sections reduces the influence of short-term time-series variation, since five-year intervals are less likely to be serially correlated than annual data.
The downside of this data preparation is that it reduces the number of observations. A comparison of table 10 to table 11 shows that the data preparation reduces the number of observations by 50 to 75% for each variable apart from EDU.27 Furthermore, using observations of the beginning year of each period for the exogenous variables also entails the risk of losing data. If the observation of the beginning year is missing, all observations belonging to the left-hand side (up to five observations that form the respective mean) drop out as well. The reduced number of observations increases the risk of biased and inconsistent estimates. On the other hand, in a complete-case estimation based on unedited data, every missing observation of the endogenous variable would reduce the database. Using means for the endogenous variable mitigates the problem caused by missing observations of FLF, FAR and RAR, which in turn lowers the risk of obtaining biased and inconsistent estimates.
As the data contain observations that vary over time, the cross-sectional time series may exhibit a trend. In this case, the data would be non-stationary, meaning that the mean and the variance of a variable's observations change over time. Standard inference would then be invalid: the t-values would tend to be too high, so that the regressions could indicate significant relationships where none exist (spurious regression). To test for possible non-stationarity, I apply the panel data unit root test of Levin, Lin and Chu (2002). A time series contains a "unit root" when the coefficient on its lagged value equals 1, meaning that the current value "keeps its past value completely in memory". If the coefficient is smaller than 1 in absolute value, the memory decreases geometrically with the length of the lag, meaning that the time series is stationary. The test's null hypothesis is that each individual time series contains a unit root; the alternative is that each time series is stationary. The unit root test requires balanced panel data. Therefore I apply the test to a subset of the quinquennial data, using only observations of the OECD countries for the years 1980-2000. This seems appropriate since time trends are especially important for homogeneous groups of countries. The drawback is that the balanced data have a smaller time dimension (only five periods) than the original data. I perform the unit root test for the four variables FLF, FAR, RAR and lnGDP. Table 12 shows the test results. The coefficient on the lagged level is negative and significant for all four series, so the null hypothesis of a unit root is rejected. Hence all four variables are stationary, which implies that it is appropriate to apply standard inference to the estimation results. Nevertheless, it cannot be ruled out that the test indicates stationarity only because of the small time dimension of the data.
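For reference, the regression behind the Levin, Lin and Chu (2002) test can be written as an augmented Dickey-Fuller equation per country $i$, with the coefficient on the lagged level restricted to be common across countries. The symbols $y_{i,t}$, $\rho$, $\delta$, $\theta_{iL}$ and $p_i$ are introduced here for illustration only; $y_{i,t}$ stands for FLF, FAR, RAR or lnGDP.

$$\Delta y_{i,t} = \alpha_i + \delta\, y_{i,t-1} + \sum_{L=1}^{p_i} \theta_{iL}\, \Delta y_{i,t-L} + \varepsilon_{i,t}, \qquad \delta = \rho - 1$$
$$H_0:\ \delta = 0 \ \text{(unit root)} \qquad \text{vs.} \qquad H_1:\ \delta < 0 \ \text{(stationarity)}$$

Here $\rho$ is the coefficient on the lagged value in $y_{i,t} = \alpha_i + \rho\, y_{i,t-1} + \dots$, so a coefficient of 1 corresponds to $\delta = 0$, and a negative, significant estimate of the coefficient on the lagged level leads to rejecting the unit root. A country-specific time trend can be added to the deterministic part if required.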
OLS-estimation
Based on the quinquennial data, I start with a pooled OLS regression that exploits both between- and within-country variation. Pooled OLS results should be regarded with caution, because the estimated OLS coefficients may be biased and inconsistent due to omitted variables. This problem occurs if the estimation model omits important exogenous variables that help explain the levels of female labour market participation. Estimation model (1), which contains only lnGDP and (lnGDP)² as exogenous variables, very likely produces biased OLS estimates. Estimation model (2) contains only two additional exogenous variables, FERT and EDU, and captures further effects through dummy variables. Large dummy variable coefficients would indicate that the estimation model does not sufficiently describe the endogenous variable.
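The omitted-variable problem of pooled OLS can be illustrated with a small numpy simulation. This is a sketch on synthetic data only: the assumed relationship between FERT and lnGDP and all coefficients are invented for illustration, and the model is kept linear in lnGDP for brevity rather than using the quadratic form of model (1).

```python
import numpy as np

def pooled_ols(y, *regressors):
    """Pooled OLS: stack all country-period observations and add a constant."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
n = 2000  # stacked country-period observations (synthetic)
lngdp = rng.uniform(6.0, 10.5, size=n)
fert = 8.0 - 0.6 * lngdp + rng.normal(0.0, 0.5, size=n)  # assumed: FERT falls with lnGDP
flf = 10.0 + 2.0 * lngdp - 3.0 * fert + rng.normal(0.0, 1.0, size=n)

short = pooled_ols(flf, lngdp)        # FERT omitted
long_ = pooled_ols(flf, lngdp, fert)  # FERT included
# Omitting FERT shifts the lnGDP coefficient towards 2 + (-3) * (-0.6) = 3.8.
print("lnGDP coefficient, FERT omitted: ", round(short[1], 2))
print("lnGDP coefficient, FERT included:", round(long_[1], 2))
```

The bias equals the omitted variable's coefficient times the slope of its relationship with the included regressor, which is why adding FERT and EDU in model (2) can change the lnGDP estimates.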
Fixed Effects-estimation
The fixed effects model captures only within-country variation and therefore controls for level differences (between groups of countries of different income levels). Using a fixed effects model, I control for unobserved country-specific variables that are constant over time (country-specific dummy variables). Therefore, the fixed effects model avoids biased estimation results caused by omitted variables that are constant over time. On the other hand, by introducing country dummies into the estimation equation, which eliminates all time-constant variables, the fixed effects model can weaken the significance of the estimated coefficients, because the dummies absorb the between-country variation and use up degrees of freedom.
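The fixed effects estimator is numerically identical to OLS on data from which each country's time mean has been subtracted (the within transformation), which is equivalent to including a full set of country dummies. A minimal numpy sketch on synthetic data, assuming a single regressor for brevity; the names and numbers are illustrative only.

```python
import numpy as np

def within_transform(values, country):
    """Subtract each country's time mean (equivalent to a full set of country dummies)."""
    out = np.empty(values.shape, dtype=float)
    for c in np.unique(country):
        mask = country == c
        out[mask] = values[mask] - values[mask].mean()
    return out

rng = np.random.default_rng(1)
n_countries, n_periods = 50, 6
country = np.repeat(np.arange(n_countries), n_periods)
effect = rng.normal(0.0, 5.0, size=n_countries)[country]              # time-constant country effect
lngdp = 8.0 + 0.3 * effect + rng.normal(0.0, 1.0, size=country.size)  # correlated with the effect
flf = 2.0 * lngdp + effect + rng.normal(0.0, 1.0, size=country.size)  # true lnGDP coefficient: 2.0

# Pooled OLS omits the country effect and is biased upwards here.
X = np.column_stack([np.ones(country.size), lngdp])
pooled_slope = np.linalg.lstsq(X, flf, rcond=None)[0][1]

# Fixed effects: demean within countries, then OLS through the origin.
x_w, y_w = within_transform(lngdp, country), within_transform(flf, country)
fe_slope = (x_w @ y_w) / (x_w @ x_w)
print(round(pooled_slope, 2), round(fe_slope, 2))
```

Applying `within_transform` to the time-constant `effect` returns zeros, which is why time-constant variables cannot be estimated in the fixed effects model.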
Random Effects-estimation
The random effects model captures both within- and between-country variation by assuming that the time-constant country-specific effects are random factors and that the exogenous variables are uncorrelated with this random effect. If this is the case, unobserved country-specific variables that are constant over time are captured by an additional error component, and the estimators are consistent. I carry out a Hausman test in order to check whether this assumption is appropriate and to choose between the fixed effects and the random effects model: if the assumption holds, both estimators are consistent and the random effects estimator is more efficient, so their coefficients should not differ systematically; a significant difference favours the fixed effects model.
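The Hausman statistic compares the two coefficient vectors, weighted by the difference of their covariance matrices, and is asymptotically chi-square distributed with as many degrees of freedom as compared coefficients. The numpy sketch below assumes the estimates and covariance matrices are already available; the function name and the input values are hypothetical.

```python
import numpy as np

def hausman(b_fe, b_re, v_fe, v_re):
    """Hausman statistic for the coefficients common to both models.

    Under H0 (regressors uncorrelated with the random effect) both estimators
    are consistent and RE is efficient; a large value favours fixed effects.
    """
    diff = np.atleast_1d(np.asarray(b_fe, dtype=float) - np.asarray(b_re, dtype=float))
    v_diff = np.atleast_2d(np.asarray(v_fe, dtype=float) - np.asarray(v_re, dtype=float))
    stat = float(diff @ np.linalg.solve(v_diff, diff))
    return stat, diff.size  # compare with a chi-square with diff.size degrees of freedom

# Hypothetical estimates for lnGDP and (lnGDP)^2, for illustration only.
stat, df = hausman(b_fe=[-12.0, 0.8], b_re=[-10.0, 0.7],
                   v_fe=[[4.0, 0.0], [0.0, 0.02]], v_re=[[2.0, 0.0], [0.0, 0.01]])
print(round(stat, 2), df)
```

With two degrees of freedom the 5% critical value is 5.99, so a statistic of about 3.0 would not reject the random effects assumption. In practice the covariance difference can fail to be positive definite, in which case the statistic must be interpreted with care.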
Instrumental Variables Estimator (2SLS)
In addition to the data preparation, I use an instrumental variables estimator to further control for possible endogeneity. For the basic model (1), I use lagged values of lnGDP as instruments for lnGDP and lagged values of (lnGDP)² as instruments for (lnGDP)². I again create the lagged variables from the quinquennial data, and I perform the IV regression in two steps (Two Stage Least Squares estimator).
I start by estimating a reduced form in the first step:
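$$\ln GDP_{i,t} = \beta_1 + \beta_2 \ln GDP_{i,t-1} + \varepsilon_{i,t} \tag{11.3}$$
which regresses the endogenous regressor $\ln GDP_{i,t}$ on the instrument $\ln GDP_{i,t-1}$. Then I calculate the fitted values $\widehat{\ln GDP}_{i,t}$ based on the estimated coefficients $\hat{\beta}_1$ and $\hat{\beta}_2$, and I calculate their square $(\widehat{\ln GDP}_{i,t})^2$.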
In the second step, I estimate female labour market participation (FLF, FAR and RAR) based on $\widehat{\ln GDP}_{i,t}$ and $(\widehat{\ln GDP}_{i,t})^2$:
$$\text{FemaleLabourMarketParticipation}_{i,t} = \beta_1 + \beta_2 \widehat{\ln GDP}_{i,t} + \beta_3 (\widehat{\ln GDP}_{i,t})^2 + \varepsilon_{i,t} \tag{11.4}$$
Concerning model (2), which includes other exogenous variables, I use lagged variables of FERT as instruments for FERT and lagged variables of EDU as instruments for EDU.
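The two steps of equations (11.3) and (11.4) can be sketched with numpy on synthetic data. This is an illustration under stated assumptions, not the thesis code: the centred lnGDP values, the lagged instrument and the error correlation that creates endogeneity are all invented, and `ols` is a hypothetical helper.

```python
import numpy as np

def ols(y, *regressors):
    """OLS with a constant; returns [constant, slope_1, slope_2, ...]."""
    X = np.column_stack([np.ones_like(y), *regressors])
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(2)
n = 20000
lngdp_lag = rng.normal(0.0, 1.0, size=n)        # instrument lnGDP_{i,t-1} (synthetic, centred)
v = rng.normal(0.0, 0.5, size=n)
lngdp = 0.9 * lngdp_lag + v                      # endogenous regressor lnGDP_{i,t}
u = 0.8 * v + rng.normal(0.0, 0.5, size=n)       # error correlated with lnGDP_{i,t}
flf = 1.0 - 0.5 * lngdp + 0.3 * lngdp ** 2 + u   # true slopes: -0.5 and 0.3

# Step 1, as in (11.3): regress lnGDP_{i,t} on lnGDP_{i,t-1}, form fitted values.
b1, b2 = ols(lngdp, lngdp_lag)
lngdp_hat = b1 + b2 * lngdp_lag
# Step 2, as in (11.4): regress participation on the fitted values and their square.
iv = ols(flf, lngdp_hat, lngdp_hat ** 2)
naive = ols(flf, lngdp, lngdp ** 2)              # ignores endogeneity
print("two steps:", iv.round(2), " naive OLS:", naive.round(2))
```

The two-step slopes come close to the true values, while the naive lnGDP coefficient is pulled upwards by the correlation between lnGDP and the error. Because the second step uses generated regressors, the standard errors from a plain OLS in step 2 are not valid; dedicated 2SLS routines correct them.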
System GMM-estimation
Generalized Method of Moments (GMM) estimators for dynamic panel data are appropriate to capture both between-country and within-country variation. GMM allows unobserved country-specific variables that are constant over time to be eliminated and accounts for possible endogeneity at the same time. The difference GMM estimator goes back to Arellano and Bond (1991), who transform the estimation equation into first differences and use lagged levels of the endogenous and exogenous variables as instruments. Differencing removes country-specific variables that are constant over time, but it magnifies gaps in panels with missing observations. I use a one-step System GMM estimator that applies forward orthogonal deviations instead of differencing (based on Arellano and Bover, 1995; Blundell and Bond, 1998). Instead of subtracting the previous observation from the current one, it subtracts the average of all future available observations of a variable, which minimises data loss. System GMM combines the equation in levels and the transformed equation as a "system". Like differencing, making orthogonal deviations reduces the risk that the stochastic processes of the exogenous variables are non-stationary. Furthermore, the System GMM specification differs from the other estimation models by including the lagged endogenous variable (L.FLF, L.FAR or L.RAR, respectively) among the exogenous variables. This allows controlling for the dynamics of adjustment.
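Why orthogonal deviations lose less data than first differences in a panel with gaps can be sketched in plain Python. The function names are hypothetical; the rescaling factor sqrt(T/(T+1)), with T the number of available future observations, is the usual one for forward orthogonal deviations. Dedicated GMM routines perform this transformation internally.

```python
from math import sqrt

def forward_orthogonal_deviations(series):
    """Forward orthogonal deviations of one country's series (None = missing).

    Each available value is compared with the mean of all future available
    values and rescaled by sqrt(T / (T + 1)), T = number of those values.
    The last available observation (no future values) is lost.
    """
    out = []
    for t, x in enumerate(series):
        future = [f for f in series[t + 1:] if f is not None]
        if x is None or not future:
            out.append(None)
            continue
        T = len(future)
        out.append(sqrt(T / (T + 1)) * (x - sum(future) / T))
    return out

def first_differences(series):
    """First differences; a gap destroys the observation before and after it."""
    return [None] + [
        None if a is None or b is None else b - a
        for a, b in zip(series, series[1:])
    ]

with_gap = [1.0, None, 3.0, 4.0]
print(first_differences(with_gap))              # [None, None, None, 1.0]
print(forward_orthogonal_deviations(with_gap))  # two usable values instead of one
```

A series that is constant over time, such as a country-specific effect, is mapped to zeros by both transformations, so the country effect drops out either way.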