Identification and Estimation of the wage returns to Education in

We now return to the dynamic multisector model of education choice described in section 2.2. This is a more complex model because it recognises the sequential nature of education choice and allows for uncertainty, which is revealed gradually between educational stages. It also allows for the possibility of comparative advantage for a particular educational level.²⁶ Our aim is to review approaches to identifying and estimating measures of the returns to education, such as the average treatment effect or the local average treatment effect.

25 Valid in the traditional sense of being uncorrelated with unobservables and correlated with education.

26 See, for example, Heckman and Sedlacek (1985).

There is a vast literature on discrete treatment effects and their identification. Some of the most important results are presented in Heckman and Robb (1985), Heckman, LaLonde and Smith (1999), Imbens and Angrist (1994), Heckman and Vytlacil (2005) and Carneiro, Heckman and Vytlacil (2010), among many others. Most papers define statistical assumptions that lead to identification. Some take the additional important step of relating these assumptions to the underlying economic behaviour, a prime example being Vytlacil (2002).

Two issues arise when using the model of section 2.2 as an organising framework. First, to what extent is the full model nonparametrically identified? Second, if the full model is not identified, under what conditions can we at least identify the marginal distribution of earnings for each education level, so as to recover the average returns to education?

Magnac and Thesmar (2002) explicitly analyse identification in dynamic discrete choice models with uncertainty where the data include only the discrete choices and observations on the relevant state variables. In our case these would be, respectively, the education choices and the variables determining the costs of education (Z). No outcomes motivating such choices, such as income, are observed. They show that even in the absence of persistent unobserved heterogeneity the model is seriously underidentified: to identify the within-period utility without functional form restrictions, one needs to know the distribution of the shocks, the discount factor and the current and future preferences for a reference alternative. Exclusion restrictions between alternatives can improve things, but not by much.

However, when we observe outcome variables such as earnings, and when we can link the choice of a level of education to the observed outcome, as the model presented in section 2.2 does, the prospects for identification improve. Heckman and Navarro (2007) present a number of identification theorems relating to the dynamic structural model itself and to the distribution of earnings in each education level. Many of the issues can be understood by taking the simpler framework of Heckman, Urzua and Vytlacil (2006a,b) (HUV henceforth).

Consider first a simple framework where education choice can be expressed as a once-and-for-all choice at a point in time. Individuals choose the level of education among a set of possible levels. However, the levels are not necessarily ordered. The underlying reason why the choices are not ordered is the dynamics: it is possible that an individual choosing between dropping out of school and attending high school would choose the former in the absence of any other choice, but that if the choice of college is added, they would progress to college (via high school graduation). HUV study identification of models with discrete treatments and unordered choices and provide identification results, which we outline briefly here.
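The dropout example can be made concrete with a small sketch. The lifetime values below are purely hypothetical numbers chosen for illustration; they are not taken from the model of section 2.2. The point is only that an individual who prefers dropping out (S) to terminal high school (H) may still pass through high school once college (C) is available.

```python
# Hypothetical lifetime values of each final education level for one individual.
# These numbers are illustrative assumptions, not estimates.
values = {"S": 1.0, "H": 0.8, "C": 1.5}

def best_choice(available):
    """Return the final education level with the highest value among those available."""
    return max(available, key=values.get)

# With only dropout and high school available, the individual drops out.
without_college = best_choice(["S", "H"])
# Once college is added, the individual progresses to college via high school.
with_college = best_choice(["S", "H", "C"])
print(without_college, with_college)
```

Because the preferred level jumps from S to C without H ever being the preferred final level, the three outcomes cannot be generated by a single latent index crossing ordered thresholds.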

Write the net payoff to education as

$$R^J(Z^J, X) = \vartheta^J(Z^J, X) - V^J, \qquad J = S, H, C \tag{35}$$

where $Z^J$ is the set of variables that affect education choice J; these could be the costs specific to a particular education level, such as fees or transport costs to the closest educational institution. Because of dynamics, all $Z^J$ may be the same (see below). Let $Z = \{Z^S, Z^H, Z^C\}$. The payoff to education is earnings, given by equation 5. Now make the following assumptions, as in HUV:

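1. The unobservables are jointly independent of Z, X and age: $(\tau^S + \varepsilon_t^J, V^J, J = S, H, C) \perp\!\!\!\perp \{Z, \text{age}, X\}$

2. The supports (supp) of the functions $\vartheta^J(Z^J, X)$ and $m^J(\text{age}, X)$ do not restrict each other, so that the joint support is the product of the individual supports:

$$\text{supp}\{\vartheta^J(Z^J, X), m^J(\text{age}, X), J = S, H, C\} = \text{supp}\{\vartheta^S(Z^S, X)\} \times \text{supp}\{m^S(\text{age}, X)\} \times \text{supp}\{\vartheta^H(Z^H, X)\} \times \text{supp}\{m^H(\text{age}, X)\} \times \text{supp}\{\vartheta^C(Z^C, X)\} \times \text{supp}\{m^C(\text{age}, X)\}$$

3. The structure of the functions $\vartheta^J(Z^J)$ and of the variables $Z^J$ is such that the support of $\vartheta^J$ is at least as large as the support of $V^J$:

$$\text{supp}\{\vartheta^J(Z^J)\} \supseteq \text{supp}\{V^J\}$$

4. Given age and Z, X has full rank.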

These assumptions imply that we can find combinations of values of Z such that the probability of any choice J becomes 1. Within such a "limit set", as Heckman and Navarro (2007) call it, we can identify the marginal distribution of earnings $Y^J$ conditional on age and X. This follows from the independence assumption, which ensures that the distribution of earnings is the same whatever the value of Z, and from the rank condition, which ensures that whatever the values of Z and age there is sufficient variation in X. In addition, if all we are interested in is average earnings given X and age, then all we need is that the errors in the earnings equation are mean independent of Z, age and X.
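The limit-set argument can be illustrated with a small simulation. This is a minimal sketch under assumptions that are not in the text: normal unobservables, a single hypothetical college cost shifter `z_c` with wide support, $\vartheta^S = \vartheta^H = 0$, and college earnings with population mean 2.0 whose error is correlated with $V^C$. All names and parameter values are illustrative.

```python
import numpy as np

def simulate_limit_set(n=200_000, seed=0):
    """Compare mean college earnings of choosers inside and outside a limit set."""
    rng = np.random.default_rng(seed)
    # Hypothetical shifter entering only the college payoff (assumed wide support).
    z_c = rng.uniform(-4.0, 12.0, size=n)
    # Unobserved costs V^S, V^H, V^C, drawn independently of z_c (assumption 1).
    v = rng.normal(size=(n, 3))
    # College earnings error is correlated with V^C, so choosers are selected.
    u_c = -0.8 * v[:, 2] + 0.6 * rng.normal(size=n)
    y_c = 2.0 + u_c  # potential college earnings, population mean 2.0
    # Net payoffs R^J = theta^J - V^J with theta^S = theta^H = 0, theta^C = z_c.
    theta = np.column_stack([np.zeros(n), np.zeros(n), z_c])
    chose_c = np.argmax(theta - v, axis=1) == 2
    in_limit = z_c > 10.0  # region where college is chosen almost surely
    return {
        "share_c_in_limit": chose_c[in_limit].mean(),
        "mean_y_limit": y_c[chose_c & in_limit].mean(),
        "mean_y_selected": y_c[chose_c & (z_c < 0.0)].mean(),
    }

result = simulate_limit_set()
print(result)
```

In this sketch, mean earnings of college choosers with low `z_c` overstate the population mean because only individuals with low $V^C$ select into college, whereas in the region where the choice probability is essentially one the selected mean coincides with the population mean.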

This identification result suggests an estimation strategy for mean earnings. Suppose the only X regressor was age. Then we can estimate mean earnings for education level J at age a as

$$\hat{Y}(a, J) = \frac{\sum_{i=1}^{N} K(\hat{p}(Z_i) - 1)\, Y_i^J(\text{age}_i = a)}{\sum_{i=1}^{N} K(\hat{p}(Z_i) - 1)}$$

where $K(\hat{p}(Z_i) - 1)$ is a kernel giving maximum weight when $\hat{p}(Z_i) = 1$. This is a weighted average of the earnings of (potentially) all individuals with education level J, with the weights being higher the higher the predicted probability of attaining that level. As the sample size increases, the weight is concentrated increasingly on individuals whose probability of achieving this level is close to one, but at a rate slower than the growth in sample size. Clearly this estimation procedure is only justified if the limit sets exist in the population and the assumptions detailed above hold. Below we discuss further the support assumptions and the consequences of their being violated. But before this we turn to the model that is explicitly dynamic.
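The kernel-weighted average described above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: the one-sided Epanechnikov kernel, the explicit `bandwidth` argument and the function name are all choices made here, and the sample is restricted to observations at age `a` in both numerator and denominator so that the weights sum to one. The inputs are assumed to cover only individuals with education level J, with `p_hat` a previously estimated probability of attaining that level.

```python
import numpy as np

def limit_set_mean_earnings(y, p_hat, age, a, bandwidth):
    """Kernel-weighted mean earnings at age a, weighting towards p_hat = 1."""
    y = np.asarray(y, dtype=float)
    p_hat = np.asarray(p_hat, dtype=float)
    age = np.asarray(age)
    at_age = age == a
    # Scaled distance from the limit p_hat = 1 (bandwidth is an assumed tuning choice).
    u = (p_hat - 1.0) / bandwidth
    # Epanechnikov kernel: largest weight at u = 0, zero weight once |u| >= 1.
    weights = np.where(np.abs(u) < 1.0, 0.75 * (1.0 - u**2), 0.0) * at_age
    if weights.sum() == 0.0:
        raise ValueError("no observations at this age close enough to p_hat = 1")
    return float(np.sum(weights * y) / np.sum(weights))

# Tiny illustrative sample: four individuals with education level J.
estimate = limit_set_mean_earnings(
    y=[10.0, 20.0, 30.0, 40.0],
    p_hat=[0.5, 0.95, 1.0, 0.99],
    age=[30, 30, 30, 40],
    a=30,
    bandwidth=0.1,
)
print(estimate)
```

Shrinking `bandwidth` as the sample grows concentrates the weight on observations with predicted probability near one, which is the sense in which the estimator relies on the limit set being populated.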

The question is how the dynamic context differs in terms of the assumptions required for identifying the marginal distributions of earnings corresponding to different education levels. First, examining equations 12, 13 and 14, it is apparent that all probabilities depend on all Zs, so long as these are all known when the decisions are made sequentially over time. Second, it is also apparent that the probabilities are nonlinear functions of unobserved heterogeneity, and that the decision problem is not separable in observables and unobservables as in equation 35. The education choice model has the non-separable form

$$R^J = \vartheta^J(Z, X, e), \qquad J = S, H, C \tag{36}$$

where $R^J$ is the lifecycle value of alternative J and where e is a vector of unobservables. Of course there is some structure to this, which will matter both for understanding whether the assumptions are valid and for identifying the dynamic discrete choice model. Heckman and Navarro (2007) make assumptions on the primitives of the model so that the support conditions discussed above carry over to this context. As before, we need to be able to argue that the required limit sets exist. Given that they exist, the same identification argument applies as before.
