Binary Outcome Models: Endogeneity and Panel Data

(1)

ECMT 676 (Econometric II) Lecture Notes

TAMU

April 14, 2014

(2)

Topics

Issues in binary response models:

Endogeneity

- Continuous endogenous x
  - Control function approach
  - IV probit approach
- Binary endogenous x

Panel data

- RE
- CRE
- FE

(3)

Binary response model with endogeneity

Endogeneity arises naturally in the latent variable model

LPM (linear probability model) with 2SLS is a handy solution for a binary response with endogeneity, but it does not allow for individual-specific marginal effects.

We will deal with probit/logit with endogeneity.

Why does endogeneity cause a problem in a binary model? The answer is not as transparent as in the linear model.

(4)

The binary outcome model:

$$y_i = I\{y_i^* > 0\}, \qquad y_i^* = x_i'\beta + u_i$$

Now we allow $E(x_i u_i) \neq 0$ by assuming $u_i \mid x_i \sim N(q(x_i), 1)$. Then

$$P(y_i = 1 \mid x_i) = P(-u_i < x_i'\beta \mid x_i)$$

$$= P(-u_i + q(x_i) < x_i'\beta + q(x_i) \mid x_i) = \Phi(x_i'\beta + q(x_i))$$

So probit of $y_i$ on $x_i$ is not consistent.

We will see that general endogeneity ($\sigma_{uv} \neq 0$) introduces not only a non-constant mean but also a non-unit variance.
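The response probability under a non-zero conditional mean can be checked by simulation. The following is a minimal sketch, assuming hypothetical values for $x_i'\beta$ and $q(x_i)$ (the names `xb` and `q` are illustrative and not from the notes). It compares the simulated frequency with $\Phi(x_i'\beta + q(x_i))$ and with the value $\Phi(x_i'\beta)$ that a standard probit would assume.

```python
import random
from statistics import NormalDist


def simulated_prob(xb, q, n_draws=200_000, seed=0):
    """Monte Carlo estimate of P(y = 1 | x) when u | x ~ N(q, 1)."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_draws) if xb + rng.gauss(q, 1.0) > 0)
    return hits / n_draws


# Hypothetical values of x'beta and q(x), chosen only for illustration.
xb, q = 0.3, 0.5
Phi = NormalDist().cdf

p_sim = simulated_prob(xb, q)   # simulated frequency of y = 1
p_formula = Phi(xb + q)         # Phi(x'beta + q(x)) from the derivation
p_naive = Phi(xb)               # what a probit ignoring q(x) would imply
```

The simulated frequency should be close to `p_formula` and visibly different from `p_naive`.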

(5)

When the endogenous variable is continuous

A control function (CF) approach

This part is based on Blundell and Powell (2004, REStud) and Wooldridge (Section 15.7.2).

Open question: why not use the traditional two-stage procedure?

The model

$$y_{1i} = I\{y_{1i}^* > 0\}$$

$$y_{1i}^* = x_i'\beta_0 + u_i = z_{1i}'\beta + \alpha y_{2i} + u_i$$

Now suppose part of $x_i$ (called $y_{2i}$, assumed to be a scalar) is endogenous:

$$x_i = (z_{1i}', y_{2i})'$$

The vector of IVs is $z_i = (z_{1i}', z_{2i}')'$. The reduced form (RF) is

$$y_{2i} = z_i'\delta + v_i = z_{1i}'\delta_1 + z_{2i}'\delta_2 + v_i$$

(6)

The parameters $(\alpha, \beta)$ index the average structural function (ASF) (or, the response probability).

Again, this is defined by fixing all observed explanatory variables and integrating out the unobservable, $u$:

$$\mathrm{ASF}(y_2, z_1) = E_u\{1[z_1'\beta + \alpha y_2 + u > 0]\}$$

$$= \Phi(z_1'\beta + \alpha y_2)$$

Thus, $\alpha$ and $\beta$ are the parameters that appear in the APEs (derivatives of the ASF).
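As a worked step, the partial effects implied by the ASF follow from the chain rule, with $\phi$ the standard normal density. The second expression assumes that $z_{1j}$ is a continuous element of $z_1$ with coefficient $\beta_j$, which is notation introduced here for illustration.

```latex
\frac{\partial\,\mathrm{ASF}(y_2, z_1)}{\partial y_2}
  = \alpha\,\phi(z_1'\beta + \alpha y_2),
\qquad
\frac{\partial\,\mathrm{ASF}(y_2, z_1)}{\partial z_{1j}}
  = \beta_j\,\phi(z_1'\beta + \alpha y_2)
```

Averaging these expressions over the sample values of $(y_2, z_1)$ gives the APEs.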

(7)

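Assume

$$\begin{pmatrix} u_i \\ v_i \end{pmatrix} \Big|\; z \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \sigma_{uv} \\ \sigma_{uv} & \sigma_v^2 \end{pmatrix} \right)$$

Endogeneity comes from $\sigma_{uv} \neq 0$.

Normality of $v$ is not realistic if $y_{2i}$ is binary. So we are assuming for now that the endogenous variable is continuous.

Probit has some advantage here over logit, because we can then assume joint normality.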

(8)

Rivers and Vuong (1988) proposed a "control function (CF)" approach:

"which introduces residuals from the reduced form for the regressors as covariates in the binary response model to account for endogeneity" (BP, 2004).

Linear projection

$$u = \theta v + e, \qquad \theta = \sigma_{uv}/\sigma_v^2,$$

where $e$ is also normal.

Having the linear projection, we get

$$I(y_{1i} = 1) = I(z_{1i}'\beta + \alpha y_{2i} + u_i > 0)$$

$$= I(\underbrace{z_{1i}'\beta + \alpha y_{2i} + \theta v_i}_{\text{the mean part}} + e_i > 0)$$

We will need $D(e \mid z_1, y_2, v)$.

(9)

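First we have

$$\begin{pmatrix} e \\ v \end{pmatrix} = \begin{pmatrix} u - \theta v \\ v \end{pmatrix} = \begin{pmatrix} 1 & -\theta \\ 0 & 1 \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix} \sim N\left( 0, \begin{pmatrix} 1 - \rho^2 & 0 \\ 0 & \sigma_v^2 \end{pmatrix} \right),$$

given $z$, where $\rho = \mathrm{corr}(u, v) = \sigma_{uv}/\sigma_v$.

Then $e$ is independent of $v$: uncorrelatedness of $e$ and $v$ (which also follows from the linear projection) implies independence, since they are jointly normal.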
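The entries of the covariance matrix of $(e, v)$ can be verified directly from $e = u - \theta v$ with $\theta = \sigma_{uv}/\sigma_v^2$ and $\mathrm{Var}(u) = 1$. A short worked computation:

```latex
\begin{aligned}
\operatorname{Cov}(e, v) &= \operatorname{Cov}(u - \theta v,\, v)
  = \sigma_{uv} - \theta\sigma_v^2 = 0, \\
\operatorname{Var}(e) &= \operatorname{Var}(u) - 2\theta\sigma_{uv} + \theta^2\sigma_v^2
  = 1 - \frac{\sigma_{uv}^2}{\sigma_v^2} = 1 - \rho^2 .
\end{aligned}
```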

(10)

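Let $D(\cdot \mid \cdot)$ denote a conditional distribution.

$$D(e \mid z_1, y_2, v) = D(e \mid z, v) \quad \text{(since } y_2 \text{ is generated by } z \text{ and } v\text{)}$$

$$= D(e \mid z) \quad \text{(since } e \text{ is independent of } v\text{)}$$

$$= N(0, 1 - \rho^2).$$

So the standard assumptions hold: conditional normality and homoskedasticity. But the normalization used for identification fails: $\mathrm{Var}(e) = 1 - \rho^2 \neq 1$.

The outcome equation becomes (CF)

$$y_{1i}^* = z_{1i}'\beta + \alpha y_{2i} + \theta v_i + e_i$$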

(11)

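Then

$$P(y_1 = 1 \mid x, v) = P(y_1 = 1 \mid z_1, y_2, v)$$

$$= P(z_{1i}'\beta + \alpha y_{2i} + \theta v_i + e_i > 0 \mid z_1, y_2, v)$$

$$= P(e_i > -(z_{1i}'\beta + \alpha y_{2i} + \theta v_i) \mid z_1, y_2, v)$$

$$= \Phi\left( \frac{z_{1i}'\beta + \alpha y_{2i} + \theta v_i}{\sqrt{1 - \rho^2}} \right) \qquad (1)$$

$v_i$ can be estimated from the RF, as $\hat v_i$.

A probit of $y_1$ on $z_1$, $y_2$ and $\hat v$ gives the estimates, called $\tilde\beta$, $\tilde\alpha$ and $\tilde\theta$.

So

$$\tilde\beta \to \beta_\rho = \beta/\sqrt{1 - \rho^2}, \qquad \tilde\alpha \to \alpha_\rho = \alpha/\sqrt{1 - \rho^2}, \qquad \tilde\theta \to \theta_\rho = \theta/\sqrt{1 - \rho^2}$$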

(12)

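Recall we are interested in $\beta$ and $\alpha$ (especially $\alpha$).

The problem is now how to estimate $\rho$.

Note that

$$\theta_\rho = \theta/\sqrt{1 - \rho^2}, \qquad \theta = \sigma_{uv}/\sigma_v^2 = \rho\sigma_v/\sigma_v^2 = \rho/\sigma_v,$$

where $\theta_\rho$ is the probability limit of the second-step probit coefficient on $\hat v$. Solve the system for $\rho$ and $\theta$, since $\sigma_v$ and $\theta_\rho$ can be estimated.

So $\sqrt{1 - \rho^2}$ can be estimated: substituting $\rho = \theta\sigma_v$ gives $\theta_\rho^2\sigma_v^2(1 - \rho^2) = \rho^2$, hence

$$\sqrt{1 - \rho^2} = \frac{1}{\sqrt{1 + \theta_\rho^2\sigma_v^2}}.$$

So

$$\hat\beta = \frac{\tilde\beta}{\sqrt{1 + \tilde\theta^2\hat\sigma_v^2}}, \qquad \hat\alpha = \frac{\tilde\alpha}{\sqrt{1 + \tilde\theta^2\hat\sigma_v^2}}$$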
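The rescaling step can be sketched in a few lines. This is a minimal illustration, assuming the second-step probit coefficients and the first-stage residual standard deviation are already available; the function name `rescale_cf_estimates` and the numbers in the round-trip check are hypothetical.

```python
import math


def rescale_cf_estimates(beta_tilde, alpha_tilde, theta_tilde, sigma_v_hat):
    """Undo the 1/sqrt(1 - rho^2) scaling of the second-step probit coefficients."""
    # sqrt(1 + theta_tilde^2 * sigma_v^2) estimates 1 / sqrt(1 - rho^2).
    scale = math.sqrt(1.0 + theta_tilde ** 2 * sigma_v_hat ** 2)
    beta_hat = [b / scale for b in beta_tilde]
    alpha_hat = alpha_tilde / scale
    rho_hat = theta_tilde * sigma_v_hat / scale
    return beta_hat, alpha_hat, rho_hat


# Round-trip check with hypothetical "true" values.
beta, alpha, rho, sigma_v = [0.5, -1.0], 0.8, 0.6, 2.0
s = math.sqrt(1.0 - rho ** 2)
theta = rho / sigma_v
beta_hat, alpha_hat, rho_hat = rescale_cf_estimates(
    [b / s for b in beta], alpha / s, theta / s, sigma_v
)
```

Feeding in the scaled values recovers the original $\beta$, $\alpha$ and $\rho$ up to floating-point error.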

(13)

An alternative derivation of (1)

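By Fact 1 (see below), $u_i \mid v_i \sim N(\theta v_i, 1 - \rho^2)$.

This conditional normality is essentially all we need (we don't really need joint normality).

So

$$E(y_{1i} \mid x_i, v_i) = P(u_i > -x_i'\beta_0 \mid x_i, v_i)$$

$$= 1 - P(u_i < -x_i'\beta_0 \mid x_i, v_i)$$

$$= 1 - P\left( \frac{u_i - \theta v_i}{\sqrt{1 - \rho^2}} < \frac{-x_i'\beta_0 - \theta v_i}{\sqrt{1 - \rho^2}} \;\Big|\; x_i, v_i \right)$$

$$= 1 - \Phi\left( \frac{-x_i'\beta_0 - \theta v_i}{\sqrt{1 - \rho^2}} \right)$$

$$= \Phi\left( \frac{x_i'\beta_0 + \theta v_i}{\sqrt{1 - \rho^2}} \right)$$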

(14)

Fact 1

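This result is also crucial in the Kalman filter.

If

$$\begin{pmatrix} u \\ v \end{pmatrix} \sim N\left( \begin{pmatrix} m_u \\ m_v \end{pmatrix}, \begin{pmatrix} S_{uu} & S_{uv} \\ S_{vu} & S_{vv} \end{pmatrix} \right)$$

then

$$u \mid v \sim N(m_{u|v}, S_{u|v})$$

where

$$m_{u|v} = m_u + S_{uv} S_{vv}^{-1}(v - m_v), \qquad S_{u|v} = S_{uu} - S_{uv} S_{vv}^{-1} S_{vu}$$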
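For scalar $u$ and $v$, Fact 1 reduces to two lines of arithmetic. The sketch below uses a hypothetical helper `conditional_normal` and illustrative values of $\sigma_v$, $\rho$ and $v$ to confirm that, with zero means, $S_{uu} = 1$, $S_{uv} = \sigma_{uv}$ and $S_{vv} = \sigma_v^2$, the conditional mean is $\theta v$ and the conditional variance is $1 - \rho^2$.

```python
def conditional_normal(m_u, m_v, s_uu, s_uv, s_vv, v):
    """Mean and variance of u | v for a bivariate normal (scalar case of Fact 1)."""
    mean = m_u + s_uv / s_vv * (v - m_v)
    var = s_uu - s_uv ** 2 / s_vv
    return mean, var


# Illustrative values (not from the notes).
sigma_v, rho, v = 2.0, 0.6, 1.5
sigma_uv = rho * sigma_v          # because Var(u) = 1
theta = sigma_uv / sigma_v ** 2   # linear projection coefficient

cond_mean, cond_var = conditional_normal(0.0, 0.0, 1.0, sigma_uv, sigma_v ** 2, v)
```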

(15)

Since $v_i$ has to be estimated, the CF regression involves a generated regressor.

So the usual probit standard errors are not valid, and obtaining correct ones is hard.

An alternative approach: MLE (also called IV probit).

(16)

Control function approach: in the linear model, it leads to the same estimate as 2SLS.

Reference: Wooldridge, Section 6.2, p. 127, and Problem 5.1.

Exercise: show CF = 2SLS (using OLS algebra).
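The equivalence can also be checked numerically. The following is a small sketch under simplifying assumptions that are not in the notes: one endogenous regressor, one instrument, no intercept, and simulated mean-zero data with arbitrary coefficients. It computes the 2SLS estimate and the CF estimate (OLS of $y$ on $x$ and the first-stage residual) and compares them.

```python
import random


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


# Simulated data; all coefficients below are arbitrary illustrative choices.
rng = random.Random(1)
n = 2000
z = [rng.gauss(0, 1) for _ in range(n)]
v = [rng.gauss(0, 1) for _ in range(n)]
u = [0.5 * vi + rng.gauss(0, 1) for vi in v]   # u correlated with v: endogeneity
x = [0.8 * zi + vi for zi, vi in zip(z, v)]
y = [1.5 * xi + ui for xi, ui in zip(x, u)]

# 2SLS (just-identified case): sum(z*y) / sum(z*x).
beta_2sls = dot(z, y) / dot(z, x)

# Control function: first-stage residuals, then OLS of y on (x, v_hat).
delta_hat = dot(z, x) / dot(z, z)
v_hat = [xi - delta_hat * zi for xi, zi in zip(x, z)]
sxx, sxv, svv = dot(x, x), dot(x, v_hat), dot(v_hat, v_hat)
sxy, svy = dot(x, y), dot(v_hat, y)
det = sxx * svv - sxv * sxv
beta_cf = (sxy * svv - sxv * svy) / det     # coefficient on x
theta_cf = (sxx * svy - sxv * sxy) / det    # coefficient on v_hat
```

The two estimates of the coefficient on $x$ agree up to rounding, while `theta_cf` picks up the correlation between $u$ and $v$.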

(17)

When the endogenous variable is continuous

IV probit approach

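IV probit: the key is to obtain $f(y_1, y_2 \mid z)$:

$$f(y_1, y_2 \mid z) = f(y_1 \mid y_2, z)\, f(y_2 \mid z)$$

First, since $y_2 = z_i'\delta + v_i$, we have $y_2 \mid z \sim N(z_i'\delta, \sigma_v^2)$, so

$$f(y_2 \mid z) = \phi((y_2 - z_i'\delta)/\sigma_v)/\sigma_v$$

Second, to get $f(y_1 \mid y_2, z)$, essentially $P(y_1 = 1 \mid y_2, z)$, we need the distribution of $u$ given $y_2, z$, where $y_{1i}^* = z_{1i}'\beta + \alpha y_{2i} + u_i$.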

(18)

By Fact 1,

$$f(u \mid y_2, z) = f(u \mid z, v) \sim N(\sigma_{uv}\sigma_v^{-2} v,\; 1 - \sigma_{uv}^2\sigma_v^{-2}) \quad \text{given } z$$

This can be compared with the exogenous probit, in which case

$$f(u \mid y_2, z) = f(u \mid z, v) \overset{u \perp v}{=} f(u \mid z) \sim N(0, 1)$$

So endogeneity introduces not only a non-zero mean but also a non-unit variance in the conditional distribution of $u$.

(19)

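So

$$P(y_1 = 1 \mid y_2, z) = P(z_{1i}'\beta + \alpha y_{2i} + u_i > 0 \mid y_2, z) = P(u_i > -(z_{1i}'\beta + \alpha y_{2i}) \mid y_2, z)$$

$$= P\left( \frac{u_i - \sigma_{uv}\sigma_v^{-2} v}{\sqrt{1 - \rho^2}} > \frac{-z_{1i}'\beta - \alpha y_{2i} - \sigma_{uv}\sigma_v^{-2} v}{\sqrt{1 - \rho^2}} \;\Big|\; y_2, z \right)$$

$$= \Phi\left( \frac{z_{1i}'\beta + \alpha y_{2i} + \rho\sigma_v^{-1} v}{\sqrt{1 - \rho^2}} \right)$$

$$= \Phi\left( \frac{z_{1i}'\beta + \alpha y_{2i} + \overbrace{\rho\sigma_v^{-1}}^{=\theta}(y_2 - z'\delta)}{\sqrt{1 - \rho^2}} \right) \equiv \Phi(w)$$

The likelihood for the $i$-th observation:

$$l_i = \Phi(w)^{y_1}[1 - \Phi(w)]^{1 - y_1}\, \phi((y_2 - z_i'\delta)/\sigma_v)/\sigma_v$$

Log likelihood for the sample:

$$L(\beta, \alpha, \rho, \sigma_v, \delta) = \sum_{i=1}^{n} \log l_i$$

Maximization is over $\beta, \alpha, \rho, \sigma_v, \delta$.

Standard errors: use the estimated Hessian, or the outer product of the score.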
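The per-observation log likelihood can be written down directly from the two factors. The sketch below is illustrative only: the function name `ivprobit_loglik_i`, its argument layout, and the numerical inputs are assumptions, and it makes no attempt at numerical safeguards (for example, $\Phi(w)$ rounding to 0 or 1).

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal


def ivprobit_loglik_i(y1, y2, z1, z, beta, alpha, rho, sigma_v, delta):
    """log l_i = log f(y1 | y2, z) + log f(y2 | z) for one observation."""
    v = y2 - sum(zj * dj for zj, dj in zip(z, delta))        # reduced-form error
    index = sum(zj * bj for zj, bj in zip(z1, beta)) + alpha * y2
    w = (index + (rho / sigma_v) * v) / math.sqrt(1.0 - rho ** 2)
    p = STD.cdf(w)
    log_f_y1 = math.log(p) if y1 == 1 else math.log(1.0 - p)
    log_f_y2 = math.log(STD.pdf(v / sigma_v) / sigma_v)
    return log_f_y1 + log_f_y2


# Hypothetical inputs: z1 is the included part of z, z also holds one excluded IV.
z1, z = [1.0, 0.5], [1.0, 0.5, 2.0]
beta, alpha, sigma_v, delta = [0.2, -0.4], 0.7, 1.5, [0.1, 0.3, 0.5]
ll_exog = ivprobit_loglik_i(1, 1.0, z1, z, beta, alpha, 0.0, sigma_v, delta)
ll_endog = ivprobit_loglik_i(1, 1.0, z1, z, beta, alpha, 0.5, sigma_v, delta)
```

With $\rho = 0$ the contribution splits into an ordinary probit term plus a normal regression term, which is one way to sanity-check an implementation. Summing the contributions over $i$ and passing the negative sum to a numerical optimizer would give the MLE.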

(20)

A simple test of exogeneity: ρ=0.

Control function approach: it focuses only on $f(y_1 \mid y_2, z)$ and is thus a limited-information procedure.

The CF approach replaces $v$ by $\hat v$.

(21)

When the endogenous variable is binary

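A binary endogenous variable needs special treatment because the linear model in the first stage is no longer reasonable.

The model

$$y_{1i} = I\{y_{1i}^* > 0\}$$

$$y_{1i}^* = x_i'\beta_0 + u_i = z_{1i}'\beta + \alpha y_{2i} + u_i$$

$$y_{2i} = I\{z_i'\delta + v_i > 0\}$$

As before, we assume $(u, v)$ is jointly normal given $z$, but here $\mathrm{Var}(v) = 1$ for identification.

Assume, given $z$,

$$\begin{pmatrix} u_i \\ v_i \end{pmatrix} \Big|\; z_i \sim N\left( 0, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right)$$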

(22)

The likelihood of observation $i$:

$$l_i = f(y_{1i}, y_{2i} \mid z_i) = f(y_{1i} \mid y_{2i}, z_i)\, f(y_{2i} \mid z_i)$$

The second factor is easy to obtain:

$$f(y_2 \mid z) = P(y_2 = 1 \mid z)^{y_2}[1 - P(y_2 = 1 \mid z)]^{1 - y_2}$$

$$= \Phi(z'\delta)^{y_2}[1 - \Phi(z'\delta)]^{1 - y_2}$$

The first factor is one of four cases:

$$P(y_1 = 1 \mid y_2 = 1, z)$$

$$P(y_1 = 0 \mid y_2 = 1, z) \;[= 1 - P(y_1 = 1 \mid y_2 = 1, z)]$$

$$P(y_1 = 1 \mid y_2 = 0, z)$$

$$P(y_1 = 0 \mid y_2 = 0, z) \;[= 1 - P(y_1 = 1 \mid y_2 = 0, z)]$$

Thus the MLE is obtained by maximizing

$$L(\beta, \alpha, \rho, \delta) = \sum_{i=1}^{n} \log l_i$$

We will calculate these four cases now.

(23)

Fact 2

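If $v$ is standard normal,

$$\mathrm{PDF}(v \mid v > a) = \phi(v)/P(v > a) = \phi(v)/\Phi(-a) \quad \text{for } v > a$$

and

$$\mathrm{PDF}(v \mid v < a) = \phi(v)/P(v < a) = \phi(v)/[1 - \Phi(-a)] \quad \text{for } v < a$$

(proof?)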
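One way to prove Fact 2 is to write down the conditional CDF and differentiate it. A sketch of the argument, using $t$ as the evaluation point:

```latex
\begin{aligned}
\text{For } t > a:\quad
P(v \le t \mid v > a) &= \frac{\Phi(t) - \Phi(a)}{1 - \Phi(a)}
\;\Longrightarrow\;
\frac{d}{dt} P(v \le t \mid v > a) = \frac{\phi(t)}{1 - \Phi(a)} = \frac{\phi(t)}{\Phi(-a)}, \\
\text{For } t < a:\quad
P(v \le t \mid v < a) &= \frac{\Phi(t)}{\Phi(a)}
\;\Longrightarrow\;
\frac{d}{dt} P(v \le t \mid v < a) = \frac{\phi(t)}{\Phi(a)} = \frac{\phi(t)}{1 - \Phi(-a)} .
\end{aligned}
```

Both lines use the symmetry $\Phi(-a) = 1 - \Phi(a)$.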

(24)

When the endogenous variable is binary

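$$P(y_1 = 1 \mid y_2 = 1, z) = E(I(y_1 = 1) \mid y_2 = 1, z)$$

$$= E\{E[I(y_1 = 1) \mid v, z] \mid y_2 = 1, z\} \quad \text{(the smaller conditioning set dominates)}$$

$$= E(P(y_1 = 1 \mid v, z) \mid y_2 = 1, z)$$

$$\overset{(*)}{=} E\left( \Phi\left( \frac{z_{1i}'\beta + \alpha y_{2i} + \rho v}{\sqrt{1 - \rho^2}} \right) \Big|\; y_2 = 1, z \right)$$

$$= E\left( \Phi\left( \frac{z_{1i}'\beta + \alpha y_{2i} + \rho v}{\sqrt{1 - \rho^2}} \right) \Big|\; v_i > -z_i'\delta, z \right)$$

$$= \int_{\text{support of } v} \Phi\left( \frac{z_{1i}'\beta + \alpha y_{2i} + \rho v}{\sqrt{1 - \rho^2}} \right) f(v \mid v_i > -z_i'\delta, z)\, dv$$

$$\overset{\text{Fact 2}}{=} \int_{-z_i'\delta}^{\infty} \Phi\left( \frac{z_{1i}'\beta + \alpha y_{2i} + \rho v}{\sqrt{1 - \rho^2}} \right) \frac{\phi(v)}{\Phi(z_i'\delta)}\, dv,$$

where (*) uses a projection of $u$ on $v$ (with $\sigma_v = 1$, so $\theta = \rho$).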

(25)

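Similarly

$$P(y_1 = 1 \mid y_2 = 0, z) = \int_{-\infty}^{-z_i'\delta} \Phi\left( \frac{z_{1i}'\beta + \alpha y_{2i} + \rho v}{\sqrt{1 - \rho^2}} \right) \frac{\phi(v)}{1 - \Phi(z_i'\delta)}\, dv$$

Similarly we can compute $P(y_1 = 0 \mid y_2 = 1, z)$ and $P(y_1 = 0 \mid y_2 = 0, z)$.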
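These conditional probabilities are one-dimensional integrals and can be evaluated numerically. The sketch below uses a plain trapezoid rule on a truncated range; the function name `prob_y1_given_y2`, the grid size, the truncation bound and the numerical inputs are all illustrative assumptions, and it presumes $|z_i'\delta|$ is well inside the bound.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal


def prob_y1_given_y2(xb, alpha, rho, zd, y2, n_grid=4000, bound=8.0):
    """P(y1 = 1 | y2, z) by the trapezoid rule; xb = z1'beta, zd = z'delta."""
    # y2 = 1 means v > -zd; y2 = 0 means v < -zd. The tails are cut at +/- bound.
    lo, hi = (-zd, bound) if y2 == 1 else (-bound, -zd)
    denom = STD.cdf(zd) if y2 == 1 else 1.0 - STD.cdf(zd)
    s = math.sqrt(1.0 - rho ** 2)
    h = (hi - lo) / n_grid
    total = 0.0
    for k in range(n_grid + 1):
        v = lo + k * h
        weight = 0.5 if k in (0, n_grid) else 1.0
        total += weight * STD.cdf((xb + alpha * y2 + rho * v) / s) * STD.pdf(v)
    return total * h / denom


# Check 1: with rho = 0 the integral collapses to Phi(z1'beta + alpha).
p_exog = prob_y1_given_y2(0.2, 0.5, 0.0, 0.3, y2=1)

# Check 2: with alpha = 0, averaging over y2 must return P(u > -xb) = Phi(xb).
p1 = prob_y1_given_y2(0.2, 0.0, 0.5, 0.3, y2=1)
p0 = prob_y1_given_y2(0.2, 0.0, 0.5, 0.3, y2=0)
p_marginal = p1 * STD.cdf(0.3) + p0 * (1.0 - STD.cdf(0.3))
```

The remaining two cases follow as one minus these values. An equivalent route is to express each case through the bivariate normal CDF.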

(26)

EXAMPLE: Effects of Children on Labor Force Participation, LABSUP.txt

$y_1$ = worked, $y_2$ = morekids (a dummy for having more than two children).

Population is women with at least two children.

$$\text{worked} = 1[\alpha\,\text{morekids} + \beta_0 + \beta_1\,\text{nonmomi} + \beta_2\,\text{educ} + \beta_3\,\text{age} + \beta_4\,\text{age}^2 + \beta_5\,\text{black} + \beta_6\,\text{hispan} + u > 0]$$

The binary variable samesex is the IV for morekids.

(27)

Binary panel data model

Pooled probit/logit

Panel model without unobserved effects:

$$P(y_{it} = 1 \mid x_{it}) = \Phi(x_{it}'\beta)$$

Strict exogeneity (SE): $D(y_{it} \mid x_i) = D(y_{it} \mid x_{it})$ for $t = 1, \ldots, T$, where $x_i = (x_{i1}, \ldots, x_{iT})'$. ($D$ means distribution.)

This is stronger than strict exogeneity in the linear model, which is stated in terms of conditional expectations.

One difficulty of using MLE (which is necessary for binary models) is that $y_{it}$ is not independent over $t$, which is inconvenient in constructing the likelihood function.

Conditional independence (CI): $y_{i1}, y_{i2}, \ldots, y_{iT}$ are independent conditional on $x_i$.

(28)

Under these two assumptions:

$$f(y_{i1}, y_{i2}, \ldots, y_{iT} \mid x_i) \overset{CI}{=} \prod_{t=1}^{T} f(y_{it} \mid x_i) \overset{SE}{=} \prod_{t=1}^{T} f(y_{it} \mid x_{it}),$$

where $f(y_{it} \mid x_{it}) = \Phi(x_{it}'\beta)^{y_{it}}[1 - \Phi(x_{it}'\beta)]^{1 - y_{it}}$.

Pooled log likelihood:

$$\sum_{i=1}^{n} \sum_{t=1}^{T} [y_{it}\log\Phi(x_{it}'\beta) + (1 - y_{it})\log(1 - \Phi(x_{it}'\beta))]$$

Maximizer: the pooled probit estimator.

Partial likelihood theory says it is still consistent if CI doesn't hold.
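The pooled log likelihood is a double sum over units and periods. A minimal sketch, assuming the data are stored as nested Python lists (`y[i][t]` in {0, 1} and `x[i][t]` a list of regressors); the function name and the toy data are hypothetical.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal


def pooled_probit_loglik(beta, y, x):
    """Sum over i and t of y*log(Phi(x'b)) + (1 - y)*log(1 - Phi(x'b))."""
    ll = 0.0
    for y_i, x_i in zip(y, x):
        for y_it, x_it in zip(y_i, x_i):
            p = STD.cdf(sum(b * xk for b, xk in zip(beta, x_it)))
            ll += y_it * math.log(p) + (1 - y_it) * math.log(1.0 - p)
    return ll


# Toy panel: n = 2 units, T = 2 periods, regressors = (constant, one covariate).
y = [[1, 0], [0, 1]]
x = [[[1.0, 0.2], [1.0, -0.5]], [[1.0, 1.3], [1.0, 0.4]]]
ll_zero = pooled_probit_loglik([0.0, 0.0], y, x)   # every probability is 0.5
ll_other = pooled_probit_loglik([0.1, 0.3], y, x)
```

The pooled probit estimator maximizes this function over `beta`; when CI fails, inference would need standard errors that are robust to serial dependence within $i$.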

(29)

RE model

Now we add unobserved effects.

Response probability:

$$P(y_{it} = 1 \mid x_{it}, c_i) = \Phi(x_{it}'\beta + c_i)$$

SE: $D(y_{it} \mid x_i, c_i) = D(y_{it} \mid x_{it}, c_i)$ for $t = 1, \ldots, T$.

CI: $y_{i1}, y_{i2}, \ldots, y_{iT}$ are independent conditional on $x_i, c_i$.

$y_{i1}, y_{i2}, \ldots, y_{iT}$ are unconditionally dependent because of $c_i$.

(30)

CI implies:

$$f(y_{i1}, y_{i2}, \ldots, y_{iT} \mid x_i, c_i) \overset{CI}{=} \prod_{t=1}^{T} f(y_{it} \mid x_i, c_i) \overset{SE}{=} \prod_{t=1}^{T} f(y_{it} \mid x_{it}, c_i),$$

where $f(y_{it} \mid x_{it}, c_i) = \Phi(x_{it}'\beta + c_i)^{y_{it}}[1 - \Phi(x_{it}'\beta + c_i)]^{1 - y_{it}}$.

But $c_i$ is unobservable, so $c_i$ shouldn't appear in the likelihood function.

Viewing the $c_i$ as parameters along with $\beta$ leads to an incidental parameters problem.

It means that the MLEs $\hat c_i$ and $\hat\beta$ are inconsistent when $T$ is fixed (unlike in the linear case, in which we have $\sqrt{n}$-consistency).

(31)

Hallmark of FE analysis (in nonlinear models): no specification of a distribution for $c_i$ given $x_i$.

RE (random effects) probit: assuming

$$c_i \mid x_i \sim N(0, \sigma_c^2) \quad \text{(the RE assumption)}$$

Since the first element of $x_i$ is 1, it implies $c_i \sim N(0, \sigma_c^2)$. Then a conditional ML can be applied to $\beta$ and $\sigma_c^2$ as follows.

(32)

Dealing with $c_i$: integrate out $c_i$.

$$f(y_{i1}, y_{i2}, \ldots, y_{iT} \mid x_i) = \int_{-\infty}^{\infty} \prod_{t=1}^{T} f(y_{it} \mid x_{it}, c)\, \frac{\phi(c/\sigma_c)}{\sigma_c}\, dc$$

The likelihood function for the entire sample is then straightforward: take the product over $i$.

It is called the RE probit estimator.

The estimate is also called population averaged.
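The integral over $c_i$ has no closed form for general $T$, so it is evaluated numerically. The sketch below uses a simple trapezoid rule over $\pm 8\sigma_c$; the function name `re_probit_lik_i`, the grid settings and the inputs are assumptions made for illustration, and Gauss-Hermite quadrature is the more common choice in practice.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal


def re_probit_lik_i(beta, sigma_c, y_i, x_i, n_grid=2000, width=8.0):
    """Likelihood of (y_i1, ..., y_iT) given x_i with c_i ~ N(0, sigma_c^2) integrated out."""
    lo, hi = -width * sigma_c, width * sigma_c
    h = (hi - lo) / n_grid
    total = 0.0
    for k in range(n_grid + 1):
        c = lo + k * h
        weight = 0.5 if k in (0, n_grid) else 1.0
        prod = 1.0
        for y_it, x_it in zip(y_i, x_i):
            p = STD.cdf(sum(b * xk for b, xk in zip(beta, x_it)) + c)
            prod *= p if y_it == 1 else 1.0 - p
        total += weight * prod * STD.pdf(c / sigma_c) / sigma_c
    return total * h


# With T = 1 the integral has the closed form Phi(x'beta / sqrt(1 + sigma_c^2)).
lik_t1 = re_probit_lik_i([1.0], math.sqrt(3.0), [1], [[0.7]])
closed_form = STD.cdf(0.7 / math.sqrt(1.0 + 3.0))

# A T = 2 example with hypothetical data.
lik_t2 = re_probit_lik_i([0.5, -0.2], 1.0, [1, 0], [[1.0, 0.3], [1.0, 1.1]])
```

Taking logs and summing over $i$ gives the sample log likelihood to be maximized over $\beta$ and $\sigma_c$.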

(33)

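As shown before, under the RE assumption, viewing $c_i + u_{it}$ as the composite error in the latent variable model and applying pooled probit gives an attenuation bias.

This is in contrast to the linear model.

Latent variable model

$$y_{it}^* = x_{it}'\beta + c_i + u_{it} \equiv x_{it}'\beta + v_{it}, \qquad y_{it} = I(y_{it}^* > 0)$$

$$\mathrm{Var}(v_{it} \mid x_{it}) = \mathrm{Var}(c_i + u_{it} \mid x_{it}) \overset{u_{it} \perp c_i}{=} \mathrm{Var}(u_{it} \mid x_{it}) + \sigma_c^2 \overset{u_{it} \perp c_i}{=} \mathrm{Var}(u_{it} \mid x_{it}, c_i) + \sigma_c^2 = 1 + \sigma_c^2$$

Response probability:

$$P(y_{it} = 1 \mid x_{it}) = \Phi\left( x_{it}'\beta / \sqrt{1 + \sigma_c^2} \right)$$

So the pooled probit estimator satisfies $\hat\beta \to_p \beta/\sqrt{1 + \sigma_c^2}$.

This result also motivates the need to specify the distribution of $c$ (in contrast to the linear RE model).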

(34)

CRE model

RE and CI might be too strong.

Correlated RE (CRE) probit (Chamberlain's): a relaxation of RE:

$$c_i \mid x_i \sim N(x_i'\xi, \sigma_c^2),$$

allowing dependence between $c_i$ and $x_i$.

Estimation is straightforward: it only needs a modification of the density of $c$:

$$f(y_{i1}, y_{i2}, \ldots, y_{iT} \mid x_i) = \int_{-\infty}^{\infty} \prod_{t=1}^{T} f(y_{it} \mid x_{it}, c)\, \frac{\phi((c - x_i'\xi)/\sigma_c)}{\sigma_c}\, dc$$

(35)

FE logit

Panel logit model:

$$P(y_{it} = 1 \mid x_{it}, c_i) = G(x_{it}'\beta + c_i), \quad \text{where } G(x) = e^x/(1 + e^x).$$

An important advantage of panel logit is that we can obtain a $\sqrt{N}$-consistent and asymptotically normal estimator of $\beta$ without any assumptions on $D(c_i \mid x_i)$; that is, we achieve fixed effects (FE) estimation.

Assumptions: SE, CI, and that each element of $x_{it}$ is time-varying.

Define $N_i = \sum_{t=1}^{T} y_{it}$.

It turns out that $D(y_i \mid x_i, c_i, N_i)$ does not depend on $c_i$. The functional form of logit helps.

We can't do this in probit.

(36)

Consider $T = 2$. Then $N_i \in \{0, 1, 2\}$. Note that

$$P(y_{i1} = 1 \mid x_i, c_i, N_i = 0) = 0, \qquad P(y_{i2} = 1 \mid x_i, c_i, N_i = 0) = 0,$$

$$P(y_{i1} = 1 \mid x_i, c_i, N_i = 2) = 1, \qquad P(y_{i2} = 1 \mid x_i, c_i, N_i = 2) = 1,$$

none of which is informative for $\beta$.

So we only consider $N_i = 1$.

As we will show, consistent estimation of $\beta$ can be obtained by a standard logit of $y_{i2}$ on $x_{i2} - x_{i1}$ using the observations for which $N_i = 1$.

(37)

$$P(y_{i2} = 1 \mid x_i, c_i, N_i = 1) = \frac{P(y_{i2} = 1, N_i = 1 \mid x_i, c_i)}{P(N_i = 1 \mid x_i, c_i)}$$

where

$$\text{Top} = P(y_{i2} = 1, y_{i1} = 0 \mid x_i, c_i) \overset{CI}{=} P(y_{i2} = 1 \mid x_i, c_i)\, P(y_{i1} = 0 \mid x_i, c_i)$$

$$\overset{SE}{=} G(x_{i2}'\beta + c_i)[1 - G(x_{i1}'\beta + c_i)] = \frac{e^{x_{i2}'\beta + c_i}}{(1 + e^{x_{i2}'\beta + c_i})(1 + e^{x_{i1}'\beta + c_i})},$$

$$\text{Bottom} = P(y_{i2} = 1, y_{i1} = 0 \mid x_i, c_i) + P(y_{i2} = 0, y_{i1} = 1 \mid x_i, c_i)$$

$$= G(x_{i2}'\beta + c_i)[1 - G(x_{i1}'\beta + c_i)] + [1 - G(x_{i2}'\beta + c_i)]\, G(x_{i1}'\beta + c_i)$$

$$= \frac{e^{x_{i1}'\beta + c_i} + e^{x_{i2}'\beta + c_i}}{(1 + e^{x_{i2}'\beta + c_i})(1 + e^{x_{i1}'\beta + c_i})}.$$

(38)

So

$$P(y_{i2} = 1 \mid x_i, c_i, N_i = 1) = \frac{e^{x_{i2}'\beta + c_i}}{e^{x_{i1}'\beta + c_i} + e^{x_{i2}'\beta + c_i}} = \frac{e^{(x_{i2} - x_{i1})'\beta}}{1 + e^{(x_{i2} - x_{i1})'\beta}}$$

$$= G((x_{i2} - x_{i1})'\beta)$$

$c_i$ is canceled!

On the other hand,

$$P(y_{i1} = 1 \mid x_i, c_i, N_i = 1) = P(y_{i2} = 0 \mid x_i, c_i, N_i = 1)$$

$$= 1 - G((x_{i2} - x_{i1})'\beta)$$

(39)

$$f(y_{i1}, y_{i2} \mid x_i, c_i, N_i = 1)$$

$$= P(y_{i1} = 1 \mid x_i, c_i, N_i = 1) \ \text{or} \ P(y_{i2} = 1 \mid x_i, c_i, N_i = 1)$$

$$= P(y_{i1} = 1 \mid x_i, c_i, N_i = 1)^{y_{i1}}\, P(y_{i2} = 1 \mid x_i, c_i, N_i = 1)^{y_{i2}}$$

So the conditional log likelihood function for observation $i$ is

$$l_i(\beta) = I(N_i = 1)[y_{i1}\log(1 - G((x_{i2} - x_{i1})'\beta)) + y_{i2}\log G((x_{i2} - x_{i1})'\beta)]$$

$$\overset{y_{i1} = 1 - y_{i2}}{=} I(N_i = 1)[(1 - y_{i2})\log(1 - G((x_{i2} - x_{i1})'\beta)) + y_{i2}\log G((x_{i2} - x_{i1})'\beta)]$$

We select the observations for which $N_i = 1$.

Computationally, it is just a standard logit of $y_{i2}$ on $x_{i2} - x_{i1}$ using the observations for which $N_i = 1$.
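The $T = 2$ result can be illustrated in code. The following sketch, with hypothetical function names and toy inputs, computes the conditional log-likelihood contribution and also evaluates Top/Bottom directly for several values of $c_i$ to show that the ratio does not depend on $c_i$.

```python
import math


def G(a):
    """Logistic CDF, e^a / (1 + e^a)."""
    return 1.0 / (1.0 + math.exp(-a))


def fe_logit_loglik_i(beta, y_i, x_i):
    """Conditional log-likelihood contribution for T = 2; zero unless N_i = 1."""
    if y_i[0] + y_i[1] != 1:
        return 0.0
    dx = [x2 - x1 for x1, x2 in zip(x_i[0], x_i[1])]
    p = G(sum(b * d for b, d in zip(beta, dx)))
    return math.log(p) if y_i[1] == 1 else math.log(1.0 - p)


def prob_y2_given_n1(beta, x_i, c_i):
    """P(y_i2 = 1 | x_i, c_i, N_i = 1) computed as Top / Bottom."""
    a1 = sum(b * xk for b, xk in zip(beta, x_i[0])) + c_i
    a2 = sum(b * xk for b, xk in zip(beta, x_i[1])) + c_i
    top = G(a2) * (1.0 - G(a1))
    bottom = top + (1.0 - G(a2)) * G(a1)
    return top / bottom


# Toy inputs: two time-varying regressors observed in t = 1, 2.
beta = [0.8, -0.5]
x_i = [[0.2, 1.0], [1.1, 0.4]]
target = G(sum(b * (x2 - x1) for b, x1, x2 in zip(beta, x_i[0], x_i[1])))
probs = [prob_y2_given_n1(beta, x_i, c) for c in (-2.0, 0.0, 3.0)]
ll_switch = fe_logit_loglik_i(beta, [0, 1], x_i)
ll_stayer = fe_logit_loglik_i(beta, [1, 1], x_i)
```

Units with $N_i \in \{0, 2\}$ contribute zero to the log likelihood, which is why only the switchers identify $\beta$.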