MULTIPLE ENDOGENOUS EXPLANATORY VARIABLES

Extending the previous methods to a vector of endogenous explanatory variables, w, is, in principle, straightforward, assuming that we can

estimate E(w

|

x) and Var(w

|

x). We now write the structural model as

E(y

|

w,c) = a + wb, (6.1) where w is a 1

*

k vector and b is a k

*

1 vector. The key assumptions are identical to Assumptions 2.1 and 2.2, except that the scalar w is replaced with the vector w. We still refer to these as Assumptions 2.1 and 2.2.

Allowing for a vector in (6.1) considerably expands the scope of models. For example, suppose that v is a scalar and the structural model is

E(y

|

z,c) = a + b z + b z ,₁ ₂ (6.2) so that the model is quadratic in the explanatory variable of interest, z.

model (6.2) can be written as in (6.1) with w = (z,z ). Estimating E(w

|

x) and Var(w

|

x) means that we must estimate the first four moments of z given x. We might do this by estimating a very flexible model for the distribution of

z given x, and then extracting the first four moments.

Model (6.1) also includes the treatment effect setup with multiple treatment states. For example, a person may participate full time, part time, or not at all in a training program. Then, w could contain two

indicators for full time participation. Or, w could contain indicators for participation in different programs that are not mutually exclusive. In such examples, the practical difficulty is estimating E(w

|

x) and Var(w

|

x).

Multinomial response models, such as multinomial logit or probit, can be used with flexible functions of x.

The following proposition is a straightforward extension of Proposition 2.1:

PROPOSITION 6.1: Under Assumptions 2.1 and 2.2 (but where w is now a

other words, provided Var(w

|

_ )

(x) is nonsingular, -1 E(b

|

x) =

)

(x) Cov(w,y

|

x) (6.3) -1 =

)

(x) E{[w -

M

(x)]

’

|

x}, (6.4) where

M

(x)

_

E(w

|

x).

PROOF: First, the equivalence between (6.3) and (6.4) is immediate. Next, by Assumption 2.2,

M

(x) = E(w

|

x,c) and

)

(x) = Var(w

|

x,c). Under Assumption 2.1 we can write

y = a + wb + u, E(u

|

w,c,x) = 0. (6.5) Therefore,

[w -

M

(x)]

’

y = [w -

M

(x)]

’

a + [w -

M

(x)]

’

+ [w -

M

(x)]

’

u. (6.6) By (6.5), the third term on the right hand side of (6.6) has zero expectation conditional on x. By Assumption 2.2, the first term also has zero

expectation given x because E{[w -

M

(x)]

|

x,a} = 0. Therefore, taking the expectation of (6.6) conditional on x gives

E{[w -

M

(x)]

’

|

x} = E(E{[w -

M

(x)]

’

|

x,c}b

|

x) = E(E{[w -

M

(x)]

’

|

x}b

|

x) = E[Var(w

|

x)b

|

x] =

)

(x)E(b

|

x).

It follows that E(b

|

x) is equal to (6.4) provided that

)

(x) is positive definite. This completes the proof.

)

As in the scalar case, it follows by iterated expectations that -1

B

= E[E(b

|

x)] = E{

)

(x) [w -

M

(x)]

’

y}, (6.7) and so a consistent estimator of

B

^ -1 ^ -1 ^

B

= n

S

{

)

(x )_i [w_i -

M

(x )]_i

’

y },_i (6.8) i= 1

^ ^ for suitable estimators

M

(

W

) and

)

(

W

For some purposes, especially for the case of multiple treatments, it is convenient to express the equation of interest without necessarily having an intercept, as in

E(y

|

w,c) = wc, (6.9)

where now w is 1

*

(k + 1) and c is (k + 1)

*

1. For example, let w be a set of mutually exclusive treatment indicators, w

_

(w ,w ,...,w ), where w₀ ₁ _k ₀

might correspond to no treatment. Let y , y , ..., y₀ ₁ _k denote the

counterfactual outcomes and assume, as in Section 4, that each y_j is mean independent of w, conditional on x. Then we can write

y = w E(y₀ ₀

|

x) + w E(y₁ ₁

|

x) + ... + w E(y_k _k

|

x) + u, (6.10) where u = w u_{0 0} + w u_{1 1} + ... + w u ._{k k} It follows under the conditional mean independence assumption that E(u

|

w,x) = 0, which means we can take c

_

[E(y₀

|

x),E(y₁

|

x),...,E(y_k

|

x)], which shows that Assumption 2.1 holds. As in the scalar case, Assumption 2.2 holds by construction since c is a function of x.

A very slight modification of the proof of Proposition 6.1 gives -1

E(c

|

x) = [

*

(x)] E(w

’

|

x), (6.11) where

*

(x)

_

E(w

’

|

x). By the usual iterated expectations argument, the APEs,

G _

E(c), can be obtained as

-1

G

= E{[

*

(x)] w

’

y}. (6.12) Estimation is now straightforward, once

*

(x) has been estimated.

Why might we use (6.11)? In the case of mutually exclusive, exhaustive treatments, (6.11) gives a simple way to derive the multivariate extension of the Horvitz-Thompson estimator. (We could use (6.8), but it is much more tedious.) When w is defined as the the vector of mutually exclusive

treatment indicators, with w₀ = 1 indicating no treatment, w

’

w is simply the th

(k + 1)

*

(k + 1) diagonal matrix with j diagonal element w , since w w_j _{j h} = 2

0, h

$

j, and w_j = w ._j Therefore,

*

(x) is the (k + 1)

*

(k + 1) diagonal th

matrix with j diagonal

p (_j x)

_

P(w_j = 1

|

x) > 0, j = 1,...,k, (6.13) the probability of receiving treatment level j as a function of the

-1 th

covariates. Therefore, [

*

(x)] w

’

y is the (k + 1)

*

1 vector with j element w y/p (x), which means the estimator of_j _j g is_j

^ -1 ^

g = n_j

S

[w_{ij i}y /p (_j x )],_i (6.14) i= 1

which is simply the average of outcomes over treatment level j, weighted by the inverse of the propensity score for treatment level j. Since g =_j

E[E(y_j

|

x)] = E(y ),_j g is a consistent estimator of the expected level under_j treatment regime j; it is the standard inverse probability weighted

estimator. The treatment effect for treatment level j, compared with no treatment (j = 0), is consistently estimated as

^ ^ ^

b = g - g ,_j _j ₀ (6.15)

^ ^ ^ ^

where p (x) = 1 - p (x) - ... - p (x) is used in obtaining₀ ₁ _k g . Depending on₀ ^

the nature of the treatments, the p_j could come from an unordered multinomial response model, such as multinomial logit or probit, or an ordered response model, such as ordered logit or probit. It is easy seen that when k = 1, (6.15) reduces to the Horvitz-Thompson estimator in (4.5).

Estimating conditional APEs is also a straightforward extension of the methods in Section 5.

7. APPLICATION

We apply the previous methods to estimating the effect of class

attendence on final exam performance in principles of microeconomics. The data were collected by Ronald C. Fisher and Carl E. Liedholm, both of whom taught Economics 201 during Fall 1996 at Michigan State University.

Attendance was taken electronically, using a card reader monitored by

teaching assistants. The identical final exam was given in all sections of the course.

The variable to be explained is the standardized final exam score (stndfnl) and the key explanatory variable w is the fraction of courses

attended (atndrte). The elements of x include prior grade point average, ACT score, the squares and cubes of these, and binary indicators for first-year and second-year studentas. The sample size used is n = 680.

The estimated coefficient on the attendance rate from the "kitchen sink" ^

OLS regression is b = .667 (standard error = .240), where the standard error, and all that follow, are robust to heteroskedasticity. To apply the

estimator in (3.2), we need models for E(w

|

x) and Var(w

|

x). Because atndrte is bounded between zero and one, E(w

|

x) is specified as the logistic function -- with the same x as in the OLS regression. The logit quasi-MLE is used to estimate the parameters of the conditional mean [see Papke and Wooldridge (1996)]. A choice for the variance is less obvious function. In order to keep the variance estimates nonnegative, we use an exponential function that includes as explanatory variables a cubic in the estimated mean function. The parameters are estimated using the Poisson QMLE, since this is fully robust to distributional misspecification and easy to obtain in Stata 7.0.

Given the mean and variance estimates, we construct r_i as in (3.8) and, ^

rather than compute (3.2), we use r_i as an IV for w_i in y_i = bw + e_i _i ^

(without a constant). This gives b = .730 (standard error = .363). This is somewhat larger than the OLS estimate, and, not surprisingly, it has a

notably larger standard error.

When estimating a conditional APE, the different estimation strategies do give markedly different results. Suppose we want to know the APE at various values of prior grade point averages. As the sample average prior GPA is about 2.6, we write E(b

|

prigpa) = d + d (prigpa - 2.6), so that d is₀ ₁ ₀

the partial effect of atndrte at the average of prigpa. To estimate d and₀ d by OLS, we include atndrte and the interaction atndrte(prigpa - 2.6) in1 the regression, along with the other controls. We obtain

E(b

|

prigpa) = .815 + .581 (prigpa - 2.6), (7.1) (.251) (.444)

which suggests that the CAPE increases with prior GPA, although the effect is only significant at the 20 percent level.

When we apply the estimator from Section 5 -- specifically, equation (5.5) with the controls listed above in g(x) -- we obtain

E(b

|

prigpa) = .679 + 1.325 (prigpa - 2.6). (7.2) (.283) (.466)

Now the CAPE depends very strongly on prigpa, and the t statistic is very significant, too (t = 2.85). To test whether the two sets of parameter

^ ^

estimates differ significantly, r_i and r (prigpa_i _i - 2.6) -- the instruments ^

used for atndrte_i and r (prigpa_i _i - 2.6) -- are added to the regression used to obtain (7.1). This gives a regression-based Hausman test with two degrees-of- freedom. The heteroskedasticity-robust test -- more precisely, the F-type exclusion restriction statistic reported by Stata 7.0 -- gives a p-value of .0046, which shows that the two estimates are statistically different, too.

8. CONCLUSIONS

I have shown how to estimate average partial effects, both

unconditionally and conditional on a set of observed covariates, in a model with a nonconstant partial effect. The partial effect is allowed to depend on unobserved heterogeneity as well as on observed covariates. The key requirements are specification of a structural conditional expectation that is linear in the endogenous explanatory variable and the presence of good proxy variables for unobserved heterogeneity. Also, the mean and variance of the EEV, conditional on the set of covariates, need to be estimated. Here, I have focused on flexible parametric methods, but it seems reasonable to

---

expect

r

n-asymptotically normal estimation of b when nonparametric methods

are used.

A sensible way to view the new estimators of the APE is that they are extensions of the standard "kitchen sink" regressions that are used when the treatment effect is assumed constant. In the special case of a linear, homoskedastic model for w given x, the particular kitchen sink regression turns out to be consistent, even if b is not constant. We also showed that, under the weaker assumption that Var(w

|

x) is uncorrelated with the slope b,

an OLS regression that simply adds the estimated mean function, E(w

|

x), consistently estimates b.

A limitation of this paper in the scalar case is that, except in the case of a binary treatment, the model imposes a particular functional form on how the EEVs affect the outcome. Nevertheless, this is often what economists have in mind when a partial effect depends on unobserved heterogeneity. To

some extent the functional form can be made more flexible by adding

polynomials in the EEVs and using the methods of Section 6; certainly the multiple treatment effect case can be handled in this way.

The approach to estimating conditional APEs in Section 5 is simple, but assumes that E(b

|

q) is linear in parameters. [Alternatively, we always consistently estimate L(b

|

q).] Because the CAPEs are nonparametrically identified, a useful topic for future research is to obtain estimators and asymptotic results for an interesting class of CAPEs.

The nonrandom sample selection problem also deserves further study because whether some data are missing may be systematically related to the random slope, b.

REFERENCES

Angrist, J. (1991), "Instrumental Variables Estimation of Average Treatment Effects in Econometrics and Epidemiology," NBER Technical Working PaperNo. 115.

Angrist, J., G.W. Imbens, and D. Rubin (1996), "Identification of Causal Effects Using Instrumental Variables," Journal of the American Statistical

Association 91, 444-455.

Barnow, B., G. Cain, and A. Goldberger (1980), "Issues in the Analysis of Selectivity Bias," Evaluation Studies 5, 42-59.

Bjorklund, A. and R. Moffitt (1987), "Estimation of Wage Gains and Welfare Gains in Self-Selection Models," Review of Economics and Statistics 69, 42- 49.

Dehejia, R.H. and S. Wahba (1999), "Causal Effects in Non-Experimental Studies: Evaluating the Evaluation of Training Programs," JASA 94, 1053- 1062.

Hahn, J. (1998), "On the Role of the Propensity Score in Efficient Semiparametric Estimation," Econometrica 66, 315-331.

Garen, J. (1984), "The Returns to Schooling: A Selectivity Bias Approach With a Continuous Choice Variable," Econometrica 52, 1199-1218.

Heckman, J. (1978), "Dummy Endogenous Variables in a Simultaneous System,"Econometrica 46, 931-960.

Heckman, J. (1997), "Instrumental Variables: A Study of Implicit Behavioral Assumptions Used in Making Program Evaluations," Journal of Human Resources 32, 441-462.

Heckman, J. and B. Honoré (1990), "The Empirical Content of the Roy Model,"

Econometrica 58, 1121-1149.

Heckman, J., H. Ichimura, and P. Todd (1997), "Matching as an Econometric Evaluation Estimator: Theory and Methods," Review of Economic Studies 64, 605-654.

Heckman J. and R. Robb (1985), "Alternative Methods for Evaluating the Impact of Interventions." In Longitudinal Analysis of Labor Market Data, J. Heckman and B. Singer (eds.), 156-245. New York: Wiley.

Heckman, J.J. and E. Vytlacil (1998), "Instrumental Variables Methods for the Correlated Random Coefficient Model," Journal of Human Resources 33, 974-987. Hirano, K., G.W. Imbens, and G. Ridder (2003), "Efficient Estimation of

Average Treatment Effects Using the Estimated Propensity Score," Econometrica 71, 1161-1189.

Horvitz, D. and D. Thompson (1952), "A Generalization of Sampling without Replacement from a Finite Population," Journal of the American Statistical

Association 47, 663-685.

Imbens, G.W. and J.D. Angrist (1994), "Identification and Estimation of Local Average Treatment Effects," Econometrica 62, 467-475.

Maddala, G.S. (1983), Quantitative and Limited Dependent Variable Models in

Econometrics. Cambridge: Cambridge University Press.

Manksi, C.F. (1986), "Learning About Treatment Effects from Experiments with Random Assignments of Treatments," Journal of Human Resources 31, 709-733. Newey, W.K. and D. McFadden (1994), "Large Sample Estimation and Hypothesis Testing," in R.F. Engle and D. McFadden (eds.), Handbook of Econometrics, Volume 4. Amsterdam: North Holland, 2111-2245.

Papke, L.E. and J.M. Wooldridge (1996), "Econometric Methods for Fractional Response Variables with an Application to 401(k) Plan Participation Rates,"

Journal of Applied Econometrics 11, 619-632

Robins, J.M. and S. Greenland (1996), "Identification of Causal Effects Using Instrumental Variables: Comment," Journal of the American Statistical

Association 91, 456-458.

Robins, J.M. and A. Rotnitzky (1995), "Semiparametric Efficiency in

Multivariate Regression Models with Missing Data," Journal of the American

Statistical Association 90, 122-129.

Rosenbaum, P.R. and D.B. Rubin (1983), "The Central Role of the Propensity Score in Observational Studies for Causal Effects," Biometrika 70, 41-55. Vella, F. and M. Verbeek (1999), "Estimating and Interpreting Models with Endogenous Treatment Effects," Journal of Business and Economic Statistics 17, 473-478.

Wooldridge, J.M. (1999), "Distribution-Free Estimation of Some Nonlinear Panel Data Models," Journal of Econometrics 90, 77-97.

Wooldridge, J.M. (2003a), "Further Results on Instrumental Variables

Estimation of Average Treatment Effects in the Correlated Random Coefficient Model," Economics Letters 79, 185-191.

Wooldridge, J.M. (2003b), "Inverse Probability Weighted M-Estimators for General Missing Data Problems," mimeo, Michigan State University Department of Economics.

In document Estimating average partial effects under conditional moment independence assumptions (Page 31-41)