Binary Outcome Models: Endogeneity and Panel Data


Academic year: 2021


(1)

ECMT 676 (Econometric II) Lecture Notes

TAMU

April 14, 2014

(2)

Topics

Issues in binary response models:

Endogeneity

- Continuous endogenous x
  - Control function approach
  - IV probit approach
- Binary endogenous x

Panel data

- RE
- CRE
- FE

(3)

Binary response model with endogeneity

Endogeneity arises naturally in the latent variable model

LPM (linear probability model) with 2SLS is a handy solution for binary response with endogeneity. But it does not allow for individual-specific marginal effects.

We will deal with probit/logit with endogeneity

Why does endogeneity cause a problem in a binary model? The reason is not as transparent as in the linear model.

(4)

The binary outcome model:

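$$y_i = I\{y_i^* > 0\}, \qquad y_i^* = x_i'\beta + u_i$$

Now we allow $E(x_i u_i) \neq 0$ by assuming $u_i \mid x_i \sim N(q(x_i), 1)$. Then

$$\begin{aligned}
P(y_i = 1 \mid x_i) &= P(u_i > -x_i'\beta \mid x_i) \\
&= P(u_i - q(x_i) > -x_i'\beta - q(x_i) \mid x_i) = \Phi(x_i'\beta + q(x_i))
\end{aligned}$$

Because the index contains $q(x_i)$, a probit of $y_i$ on $x_i$ is not consistent for $\beta$.

We will see that general endogeneity ($\sigma_{uv} \neq 0$) introduces not only a non-constant mean but also a non-unit variance.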

(5)

When the endogenous variable is continuous

A control function (CF) approach

This part is based on Blundell and Powell (2004, REStud) and also Wooldridge (Section 15.7.2).

Unanswered question: why not use the traditional two-stage approach?

The model

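$$y_{1i} = I\{y_{1i}^* > 0\}$$

$$y_{1i}^* = x_i'\beta_0 + u_i = z_{1i}'\beta + \alpha y_{2i} + u_i$$

Now suppose part of $x_i$ (called $y_{2i}$, assumed to be a scalar) is endogenous:

$$x_i = (z_{1i}', y_{2i})'.$$

The vector of IVs is $z_i = (z_{1i}', z_{2i}')'$. The reduced form (RF) is

$$y_{2i} = z_i'\delta + v_i = z_{1i}'\delta_1 + z_{2i}'\delta_2 + v_i$$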

(6)

The parameters $(\alpha, \beta)$ index the average structural function (ASF), that is, the response probability.

Again, this is defined by fixing all observed explanatory variables and integrating out the unobservable, $u$:

$$ASF(y_2, z_1) = E_u\{1[z_1'\beta + \alpha y_2 + u > 0]\} = \Phi(z_1'\beta + \alpha y_2)$$

Thus, $\alpha$ and $\beta$ are the parameters that appear in the APEs (derivatives of the ASF).
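To make the ASF and its derivative concrete, here is a minimal sketch. The function names `asf` and `asf_slope_y2` and all parameter values are hypothetical, chosen only for illustration.

```python
import numpy as np
from scipy.stats import norm

def asf(z1, y2, beta, alpha):
    """ASF(y2, z1) = Phi(z1'beta + alpha*y2)."""
    return norm.cdf(z1 @ beta + alpha * y2)

def asf_slope_y2(z1, y2, beta, alpha):
    """Derivative of the ASF with respect to y2: alpha * phi(z1'beta + alpha*y2)."""
    return alpha * norm.pdf(z1 @ beta + alpha * y2)

# Hypothetical values, for illustration only.
beta = np.array([0.2, -0.5])
alpha = 0.8
z1 = np.array([1.0, 0.4])   # first entry is the constant

asf_at_0 = asf(z1, 0.0, beta, alpha)            # index is 0, so the ASF is 0.5
slope_at_0 = asf_slope_y2(z1, 0.0, beta, alpha)
print(asf_at_0, slope_at_0)
```

Averaging `asf_slope_y2` over the sample values of `z1` and `y2` would give an APE for the continuous variable $y_2$.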

(7)

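Assume

$$\begin{pmatrix} u_i \\ v_i \end{pmatrix} \Big|\, z \sim N\left(0, \begin{pmatrix} 1 & \sigma_{uv} \\ \sigma_{uv} & \sigma_v^2 \end{pmatrix}\right)$$

Endogeneity comes from $\sigma_{uv} \neq 0$.

Normality of $v$ is not realistic if $y_{2i}$ is binary. So for now we assume the endogenous variable is continuous.

Probit has some advantage over logit here, because we can then assume joint normality.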

(8)

Rivers and Vuong (1988) proposed a "control function (CF)" approach,

"which introduces residuals from the reduced form for the regressors as covariates in the binary response model to account for endogeneity" (BP, 2004).

Linear projection:

$$u = \theta v + e, \qquad \theta = \sigma_{uv}/\sigma_v^2,$$

and $e$ is also normal.

Having the linear projection, we get

$$\begin{aligned}
I(y_1 = 1) &= I(z_{1i}'\beta + \alpha y_{2i} + u_i > 0) \\
&= I(\underbrace{z_{1i}'\beta + \alpha y_{2i} + \theta v_i}_{\text{the mean part}} + e_i > 0)
\end{aligned}$$

We will need $D(e \mid z_1, y_2, v)$.

(9)

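First we have

$$\begin{pmatrix} e \\ v \end{pmatrix} = \begin{pmatrix} u - \theta v \\ v \end{pmatrix} = \begin{pmatrix} 1 & -\theta \\ 0 & 1 \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix} \sim N\left(0, \begin{pmatrix} 1 - \rho^2 & 0 \\ 0 & \sigma_v^2 \end{pmatrix}\right),$$

given $z$, where $\rho = corr(u, v) = \sigma_{uv}/\sigma_v$.

Then $e$ is independent of $v$: uncorrelatedness of $e$ and $v$ (which also follows from the linear projection) implies independence, since they are jointly normal.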
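The covariance matrix of $(e, v)$ can be checked with a few lines of matrix algebra. This is only a numerical sanity check; the values of `sigma_uv` and `sigma_v` are arbitrary illustrative choices.

```python
import numpy as np

sigma_uv, sigma_v = 0.6, 1.5            # illustrative values
Sigma = np.array([[1.0, sigma_uv],
                  [sigma_uv, sigma_v**2]])   # Var of (u, v)'
theta = sigma_uv / sigma_v**2
A = np.array([[1.0, -theta],
              [0.0, 1.0]])              # (e, v)' = A (u, v)'

cov_ev = A @ Sigma @ A.T                # Var of (e, v)'
rho = sigma_uv / sigma_v
print(cov_ev)                           # off-diagonal entries are zero
print(1 - rho**2, sigma_v**2)           # the two diagonal entries
```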

(10)

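Let $D(\cdot \mid \cdot)$ denote a conditional distribution.

$$\begin{aligned}
D(e \mid z_1, y_2, v) &= D(e \mid z, v) && \text{(since } y_2 \text{ is generated by } z \text{ and } v\text{)} \\
&= D(e \mid z) && \text{(since } e \text{ is independent of } v\text{)} \\
&= N(0, 1 - \rho^2).
\end{aligned}$$

So the standard assumptions hold: conditional normality and homoskedasticity. But the identification condition fails: $Var(e) = 1 - \rho^2 \neq 1$.

The outcome equation becomes (CF)

$$y_{1i}^* = z_{1i}'\beta + \alpha y_{2i} + \theta v_i + e_i$$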

(11)

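Then

$$\begin{aligned}
P(y_1 = 1 \mid x, v) &= P(y_1 = 1 \mid z_1, y_2, v) \\
&= P(z_{1i}'\beta + \alpha y_{2i} + \theta v_i + e_i > 0 \mid z_1, y_2, v) \\
&= P(e_i > -(z_{1i}'\beta + \alpha y_{2i} + \theta v_i) \mid z_1, y_2, v) \\
&= \Phi\left((z_{1i}'\beta + \alpha y_{2i} + \theta v_i)/\sqrt{1 - \rho^2}\right) \qquad (1)
\end{aligned}$$

$v_i$ can be estimated from the RF, as $\hat v_i$.

A probit of $y_1$ on $z_1$, $y_2$ and $\hat v$ gives the estimates, called $\tilde\beta$, $\tilde\alpha$ and $\tilde\theta$.

So

$$\tilde\beta \xrightarrow{p} \beta/\sqrt{1 - \rho^2}, \qquad \tilde\alpha \xrightarrow{p} \alpha/\sqrt{1 - \rho^2}, \qquad \tilde\theta \xrightarrow{p} \theta/\sqrt{1 - \rho^2}$$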

(12)

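Recall we are interested in $\beta$ and $\alpha$ (especially $\alpha$).

The problem is now how to estimate $\rho$.

Let $\tilde\beta$, $\tilde\alpha$ and $\tilde\theta$ be the coefficients from the probit of $y_1$ on $z_1$, $y_2$ and $\hat v$, and write $\theta_\rho$ for the probability limit of $\tilde\theta$. Note that

$$\theta_\rho = \theta/\sqrt{1 - \rho^2}, \qquad \theta = \sigma_{uv}/\sigma_v^2 = \rho\sigma_v/\sigma_v^2 = \rho/\sigma_v$$

Solve the system for $\rho$ and $\theta$, since $\sigma_v$ and $\theta_\rho$ can be estimated.

So $\sqrt{1 - \rho^2}$ can be estimated:

$$\sqrt{1 - \rho^2} = 1/\sqrt{1 + \theta_\rho^2\sigma_v^2}.$$

So

$$\hat\beta = \tilde\beta/\sqrt{1 + \tilde\theta^2\hat\sigma_v^2}, \qquad \hat\alpha = \tilde\alpha/\sqrt{1 + \tilde\theta^2\hat\sigma_v^2}$$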
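The two-step control function procedure can be sketched on simulated data as follows. Everything here is an illustrative assumption: the data-generating values, the helper `probit_fit` (a plain BFGS probit MLE written for this sketch), and the variable names. The last line runs a naive probit that ignores endogeneity, for comparison.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def probit_fit(y, X):
    """Probit MLE via BFGS on the average negative log-likelihood."""
    q = 2 * y - 1                      # +1 if y = 1, -1 if y = 0
    def nll(b):
        return -np.mean(norm.logcdf(q * (X @ b)))
    def grad(b):
        a = q * (X @ b)
        lam = q * np.exp(norm.logpdf(a) - norm.logcdf(a))
        return -(X.T @ lam) / len(y)
    return minimize(nll, np.zeros(X.shape[1]), jac=grad, method="BFGS").x

# Simulated data with hypothetical true values.
rng = np.random.default_rng(0)
n = 20000
beta0, beta1, alpha = 0.3, -0.5, 0.7
sigma_v, rho = 1.0, 0.5
z1, z2 = rng.normal(size=n), rng.normal(size=n)
v = sigma_v * rng.normal(size=n)
u = rho * v / sigma_v + np.sqrt(1 - rho**2) * rng.normal(size=n)   # Var(u) = 1
y2 = 0.5 * z1 + 1.0 * z2 + v
y1 = (beta0 + beta1 * z1 + alpha * y2 + u > 0).astype(float)

# Step 1: OLS reduced form of y2 on z, keep residuals v_hat.
Z = np.column_stack([np.ones(n), z1, z2])
v_hat = y2 - Z @ np.linalg.lstsq(Z, y2, rcond=None)[0]
sigma_v_hat = np.sqrt(v_hat @ v_hat / (n - Z.shape[1]))

# Step 2: probit of y1 on (1, z1, y2, v_hat) gives the scaled coefficients.
X = np.column_stack([np.ones(n), z1, y2, v_hat])
b_tilde = probit_fit(y1, X)

# Step 3: undo the scaling by sqrt(1 + theta_tilde^2 * sigma_v_hat^2).
scale = np.sqrt(1 + b_tilde[3]**2 * sigma_v_hat**2)
alpha_hat = b_tilde[2] / scale

alpha_naive = probit_fit(y1, X[:, :3])[2]   # probit ignoring endogeneity
print(alpha_hat, alpha_naive)
```

In this design the rescaled estimate should be close to the assumed true value 0.7, while the naive probit coefficient is noticeably larger.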

(13)

An alternative derivation of (1)

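By Fact 1 (see below), $u_i \mid v_i \sim N(\theta v_i, 1 - \rho^2)$.

This conditional normality is essentially all we need (we don't really need joint normality).

So

$$\begin{aligned}
E(y_{1i} \mid x_i, v_i) &= P(u_i > -x_i'\beta_0 \mid x_i, v_i) \\
&= 1 - P(u_i < -x_i'\beta_0 \mid x_i, v_i) \\
&= 1 - P\left(\frac{u_i - \theta v_i}{\sqrt{1 - \rho^2}} < \frac{-x_i'\beta_0 - \theta v_i}{\sqrt{1 - \rho^2}} \,\Big|\, x_i, v_i\right) \\
&= 1 - \Phi\left(\frac{-x_i'\beta_0 - \theta v_i}{\sqrt{1 - \rho^2}}\right) \\
&= \Phi\left(\frac{x_i'\beta_0 + \theta v_i}{\sqrt{1 - \rho^2}}\right)
\end{aligned}$$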

(14)

Fact 1

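This result is also crucial in the Kalman filter.

If

$$\begin{pmatrix} u \\ v \end{pmatrix} \sim N\left(\begin{pmatrix} m_u \\ m_v \end{pmatrix}, \begin{pmatrix} S_{uu} & S_{uv} \\ S_{vu} & S_{vv} \end{pmatrix}\right)$$

then

$$u \mid v \sim N(m_{u|v}, S_{u|v})$$

where

$$m_{u|v} = m_u + S_{uv}S_{vv}^{-1}(v - m_v), \qquad S_{u|v} = S_{uu} - S_{uv}S_{vv}^{-1}S_{vu}$$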
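Fact 1 translates directly into code. The sketch below uses a hypothetical helper `conditional_normal` and applies it to the bivariate $(u, v)$ case with illustrative values, where it should return mean $\theta v$ and variance $1 - \rho^2$.

```python
import numpy as np

def conditional_normal(m_u, m_v, S_uu, S_uv, S_vv, v):
    """Mean and covariance of u | v for jointly normal (u, v)."""
    gain = S_uv @ np.linalg.inv(S_vv)          # S_uv S_vv^{-1}
    mean = m_u + gain @ (v - m_v)
    cov = S_uu - gain @ S_uv.T                 # S_vu = S_uv'
    return mean, cov

# Bivariate case: Var(u) = 1, Cov(u, v) = sigma_uv, Var(v) = sigma_v^2.
sigma_uv, sigma_v = 0.6, 1.5                   # illustrative values
v_obs = np.array([2.0])
mean, cov = conditional_normal(
    np.zeros(1), np.zeros(1),
    np.array([[1.0]]), np.array([[sigma_uv]]), np.array([[sigma_v**2]]),
    v_obs,
)
theta = sigma_uv / sigma_v**2
rho = sigma_uv / sigma_v
print(mean, theta * v_obs)      # conditional mean equals theta * v
print(cov, 1 - rho**2)          # conditional variance equals 1 - rho^2
```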

(15)

Since $v_i$ has to be estimated, the CF regression involves a generated regressor.

So getting the correct standard errors is hard: the usual second-step probit standard errors ignore the estimation error in $\hat v_i$.

An alternative approach: MLE (also called IV probit)

(16)

Control function approach: in the linear model, it leads to the same estimate as 2SLS.

See Wooldridge, Section 6.2, p. 127, and Problem 5.1.

Exercise: show CF = 2SLS (using OLS algebra).
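The equivalence in the linear model can be checked numerically before proving it. The sketch below uses simulated data with arbitrary illustrative coefficients; `ols` is a small helper defined here, not a library routine.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
z1, z2 = rng.normal(size=n), rng.normal(size=n)
v = rng.normal(size=n)
u = 0.6 * v + rng.normal(size=n)                 # u correlated with v
y2 = 1.0 + 0.5 * z1 + 0.8 * z2 + v               # endogenous regressor
y1 = 2.0 - 1.0 * z1 + 0.5 * y2 + u               # linear outcome

def ols(y, W):
    """OLS coefficients of y on the columns of W."""
    return np.linalg.lstsq(W, y, rcond=None)[0]

Z = np.column_stack([np.ones(n), z1, z2])        # all instruments
y2_hat = Z @ ols(y2, Z)                          # first-stage fitted values
v_hat = y2 - y2_hat                              # first-stage residuals

# 2SLS: replace y2 by its fitted value.
b_2sls = ols(y1, np.column_stack([np.ones(n), z1, y2_hat]))

# CF: keep y2 and add v_hat as an extra regressor.
b_cf = ols(y1, np.column_stack([np.ones(n), z1, y2, v_hat]))

print(b_2sls)
print(b_cf[:3])      # same as 2SLS; b_cf[3] is the coefficient on v_hat
```

The algebra behind this is the Frisch-Waugh result: partialling $\hat v$ out of $(1, z_1, y_2)$ leaves $(1, z_1, \hat y_2)$.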

(17)

When the endogenous variable is continuous

IV probit approach

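IV probit: the key is to obtain $f(y_1, y_2 \mid z)$:

$$f(y_1, y_2 \mid z) = f(y_1 \mid y_2, z)\, f(y_2 \mid z)$$

First, $y_2 = z_i'\delta + v_i$ with $y_2 \mid z \sim N(z_i'\delta, \sigma_v^2)$, so

$$f(y_2 \mid z) = \phi\left((y_2 - z_i'\delta)/\sigma_v\right)/\sigma_v$$

Second, to get $f(y_1 \mid y_2, z)$, essentially $P(y_1 = 1 \mid y_2, z)$, we need the distribution of $u$ given $y_2, z$, where $y_{1i}^* = z_{1i}'\beta + \alpha y_{2i} + u_i$.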

(18)

By Fact 1,

$$f(u \mid y_2, z) = f(u \mid z, v) \sim N\left(\sigma_{uv}\sigma_v^{-2}v,\; 1 - \sigma_{uv}^2\sigma_v^{-2}\right) \text{ given } z$$

This can be compared with the exogenous probit, in which case

$$f(u \mid y_2, z) = f(u \mid z, v) \overset{u \perp v}{=} f(u \mid z) \sim N(0, 1)$$

So endogeneity introduces not only a non-zero mean but also a non-unit variance in the conditional distribution of $u$.

(19)

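So

$$\begin{aligned}
P(y_1 = 1 \mid y_2, z) &= P(z_{1i}'\beta + \alpha y_{2i} + u_i > 0 \mid y_2, z) = P(u_i > -(z_{1i}'\beta + \alpha y_{2i}) \mid y_2, z) \\
&= P\left(\frac{u_i - \sigma_{uv}\sigma_v^{-2}v_i}{\sqrt{1 - \rho^2}} > \frac{-z_{1i}'\beta - \alpha y_{2i} - \sigma_{uv}\sigma_v^{-2}v_i}{\sqrt{1 - \rho^2}} \,\Big|\, y_2, z\right) \\
&= \Phi\left((z_{1i}'\beta + \alpha y_{2i} + \rho\sigma_v^{-1}v_i)/\sqrt{1 - \rho^2}\right) \\
&= \Phi\Big(\underbrace{(z_{1i}'\beta + \alpha y_{2i} + \rho\sigma_v^{-1}(y_{2i} - z_i'\delta))/\sqrt{1 - \rho^2}}_{w}\Big) \\
&\equiv \Phi(w)
\end{aligned}$$

The likelihood for the $i$-th observation:

$$l_i = \Phi(w)^{y_1}[1 - \Phi(w)]^{1 - y_1}\,\phi\left((y_2 - z_i'\delta)/\sigma_v\right)/\sigma_v$$

Log-likelihood for the sample:

$$L(\beta, \alpha, \rho, \sigma_v, \delta) = \sum_{i=1}^{n} \log l_i$$

Maximization is over $\beta, \alpha, \rho, \sigma_v, \delta$.

Standard error: the estimated Hessian, or the outer product of the score.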
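The sample log-likelihood can be coded in a few lines. This is a sketch under the joint normality assumption; the function name `ivprobit_loglik`, the argument layout, and the simulated data are all hypothetical. In practice one would pass the negative of this function to a numerical optimizer.

```python
import numpy as np
from scipy.stats import norm

def ivprobit_loglik(beta, alpha, rho, sigma_v, delta, y1, y2, Z1, Z):
    """Sum over i of log l_i, with l_i = f(y1 | y2, z) * f(y2 | z)."""
    v = y2 - Z @ delta                                   # reduced-form error
    w = (Z1 @ beta + alpha * y2 + rho * v / sigma_v) / np.sqrt(1 - rho**2)
    ll_y1 = y1 * norm.logcdf(w) + (1 - y1) * norm.logcdf(-w)
    ll_y2 = norm.logpdf(v / sigma_v) - np.log(sigma_v)   # log of phi(v/s)/s
    return np.sum(ll_y1 + ll_y2)

# Small simulated sample with illustrative parameter values.
rng = np.random.default_rng(3)
n = 200
Z = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
Z1 = Z[:, :2]
beta, alpha, rho, sigma_v = np.array([0.2, -0.4]), 0.6, 0.4, 1.0
delta = np.array([0.1, 0.5, 1.0])
v = sigma_v * rng.normal(size=n)
u = rho * v / sigma_v + np.sqrt(1 - rho**2) * rng.normal(size=n)
y2 = Z @ delta + v
y1 = (Z1 @ beta + alpha * y2 + u > 0).astype(float)

ll = ivprobit_loglik(beta, alpha, rho, sigma_v, delta, y1, y2, Z1, Z)
print(ll)
```

With `rho = 0` the function reduces to an ordinary probit log-likelihood plus a normal regression log-likelihood, which is the exogenous case.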

(20)

A simple test of exogeneity: $H_0: \rho = 0$.

The control function approach focuses only on $f(y_1 \mid y_2, z)$ and is thus a limited information procedure.

The CF approach replaces $v$ by $\hat v$.

(21)

When the endogenous variable is binary

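A binary endogenous variable needs special treatment because the linear model in the first stage is no longer reasonable.

The model

$$y_{1i} = I\{y_{1i}^* > 0\}$$

$$y_{1i}^* = x_i'\beta_0 + u_i = z_{1i}'\beta + \alpha y_{2i} + u_i$$

$$y_{2i} = I\{z_i'\delta + v_i > 0\}$$

As before, we assume $(u, v)$ is jointly normal given $z$. But here $Var(v) = 1$ for identification.

Assume, given $z$,

$$\begin{pmatrix} u_i \\ v_i \end{pmatrix} \Big|\, z_i \sim N\left(0, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\right)$$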

(22)

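The likelihood of observation $i$: $l_i = f(y_{1i}, y_{2i} \mid z_i) = f(y_{1i} \mid y_{2i}, z_i)\, f(y_{2i} \mid z_i)$.

The second factor is easy to obtain:

$$\begin{aligned}
f(y_2 \mid z) &= P(y_2 = 1 \mid z)^{y_2}[1 - P(y_2 = 1 \mid z)]^{1 - y_2} \\
&= \Phi(z'\delta)^{y_2}[1 - \Phi(z'\delta)]^{1 - y_2}
\end{aligned}$$

The first factor is one of four cases:

$P(y_1 = 1 \mid y_2 = 1, z)$

$P(y_1 = 0 \mid y_2 = 1, z)$ $[= 1 - P(y_1 = 1 \mid y_2 = 1, z)]$

$P(y_1 = 1 \mid y_2 = 0, z)$

$P(y_1 = 0 \mid y_2 = 0, z)$ $[= 1 - P(y_1 = 1 \mid y_2 = 0, z)]$

Thus the MLE is obtained by maximizing $L(\beta, \alpha, \rho, \delta) = \sum_{i=1}^{n} \log l_i$.

We will calculate these four cases now.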

(23)

Fact 2

If $v$ is standard normal,

$$PDF(v \mid v > a) = \phi(v)/P(v > a) = \phi(v)/\Phi(-a), \qquad v > a,$$

and

$$PDF(v \mid v < a) = \phi(v)/P(v < a) = \phi(v)/[1 - \Phi(-a)], \qquad v < a$$

(proof?)
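A quick numerical check of Fact 2 (not a proof): each truncated density should integrate to one over its own support. The cutoff `a` is an arbitrary illustrative value.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

a = 0.3   # illustrative truncation point

def pdf_given_above(v):
    """Density of v | v > a, for standard normal v."""
    return norm.pdf(v) / norm.cdf(-a)

def pdf_given_below(v):
    """Density of v | v < a, for standard normal v."""
    return norm.pdf(v) / (1 - norm.cdf(-a))

mass_above, _ = quad(pdf_given_above, a, np.inf)
mass_below, _ = quad(pdf_given_below, -np.inf, a)
print(mass_above, mass_below)   # both should be numerically equal to 1
```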

(24)

When the endogenous variable is binary

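$$\begin{aligned}
P(y_1 = 1 \mid y_2 = 1, z) &= E(I(y_1 = 1) \mid y_2 = 1, z) \\
&= E\{E[I(y_1 = 1) \mid v, z] \mid y_2 = 1, z\} && \text{(smaller conditioning set dominates)} \\
&= E(P(y_1 = 1 \mid v, z) \mid y_2 = 1, z) \\
&\overset{(*)}{=} E\left(\Phi\left((z_{1i}'\beta + \alpha y_{2i} + \rho v)/\sqrt{1 - \rho^2}\right) \mid y_2 = 1, z\right) \\
&= E\left(\Phi\left((z_{1i}'\beta + \alpha y_{2i} + \rho v)/\sqrt{1 - \rho^2}\right) \mid v_i > -z_i'\delta, z\right) \\
&= \int_{\text{support of } v} \Phi\left((z_{1i}'\beta + \alpha y_{2i} + \rho v)/\sqrt{1 - \rho^2}\right) f(v \mid v_i > -z_i'\delta, z)\,dv \\
&= \int_{-z_i'\delta}^{\infty} \Phi\left((z_{1i}'\beta + \alpha y_{2i} + \rho v)/\sqrt{1 - \rho^2}\right) \phi(v)/\Phi(z_i'\delta)\,dv && \text{(by Fact 2)}
\end{aligned}$$

where (*) uses a projection, and $y_{2i} = 1$ in this case.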

(25)

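Similarly

$$P(y_1 = 1 \mid y_2 = 0, z) = \int_{-\infty}^{-z_i'\delta} \Phi\left((z_{1i}'\beta + \alpha y_{2i} + \rho v)/\sqrt{1 - \rho^2}\right) \phi(v)/[1 - \Phi(z_i'\delta)]\,dv$$

with $y_{2i} = 0$ in this case. Similarly we can compute $P(y_1 = 0 \mid y_2 = 1, z)$ and $P(y_1 = 0 \mid y_2 = 0, z)$.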
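The conditional probabilities can be evaluated by one-dimensional quadrature and cross-checked against the bivariate normal CDF. This is a sketch: `p_y1_given_y2` is a hypothetical helper, `idx1` stands for $z_1'\beta$, `idx2` for $z'\delta$, and the numbers are illustrative.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm, multivariate_normal

def p_y1_given_y2(y2, idx1, idx2, alpha, rho):
    """P(y1 = 1 | y2, z) with idx1 = z1'beta and idx2 = z'delta."""
    s = np.sqrt(1 - rho**2)
    def integrand(v):
        return norm.cdf((idx1 + alpha * y2 + rho * v) / s) * norm.pdf(v)
    if y2 == 1:                                   # event v > -z'delta
        num, _ = quad(integrand, -idx2, np.inf)
        return num / norm.cdf(idx2)
    num, _ = quad(integrand, -np.inf, -idx2)      # event v < -z'delta
    return num / (1 - norm.cdf(idx2))

idx1, idx2, alpha, rho = 0.2, -0.4, 0.5, 0.3      # illustrative values
p11 = p_y1_given_y2(1, idx1, idx2, alpha, rho)
p10 = p_y1_given_y2(0, idx1, idx2, alpha, rho)

# Cross-check: P(y1 = 1, y2 = 1 | z) is a bivariate normal probability.
biv = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])
p11_check = biv.cdf([idx1 + alpha, idx2]) / norm.cdf(idx2)
p10_check = (norm.cdf(idx1) - biv.cdf([idx1, idx2])) / (1 - norm.cdf(idx2))
print(p11, p11_check)
print(p10, p10_check)
```

The remaining two cases follow as `1 - p11` and `1 - p10`.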

(26)

EXAMPLE: Effects of Children on Labor Force Participation, LABSUP.txt

$y_1$ = worked, $y_2$ = morekids (a dummy for having more than two children).

Population is women with at least two children.

$$worked = 1[\alpha\, morekids + \beta_0 + \beta_1 nonmomi + \beta_2 educ + \beta_3 age + \beta_4 age^2 + \beta_5 black + \beta_6 hispan + u > 0]$$

The binary variable samesex is the IV for morekids.

(27)

Binary panel data model

Pooled probit/logit

Panel model without unobserved effects:

$$P(y_{it} = 1 \mid x_{it}) = \Phi(x_{it}'\beta)$$

Strict exogeneity (SE): $D(y_{it} \mid x_i) = D(y_{it} \mid x_{it})$ for $t = 1, \ldots, T$, where $x_i = (x_{i1}, \ldots, x_{iT})'$. ($D$ means distribution.)

This is stronger than the strict exogeneity assumption in the linear model, which is stated in terms of conditional expectations.

One difficulty of using MLE (which is necessary for binary models) is that $y_{it}$ is not independent over $t$, which is inconvenient in constructing the likelihood function.

Conditional independence (CI): $y_{i1}, y_{i2}, \ldots, y_{iT}$ are independent conditional on $x_i$.

(28)

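Under these two assumptions:

$$f(y_{i1}, y_{i2}, \ldots, y_{iT} \mid x_i) \overset{CI}{=} \prod_{t=1}^{T} f(y_{it} \mid x_i) \overset{SE}{=} \prod_{t=1}^{T} f(y_{it} \mid x_{it}),$$

where $f(y_{it} \mid x_{it}) = \Phi(x_{it}'\beta)^{y_{it}}[1 - \Phi(x_{it}'\beta)]^{1 - y_{it}}$.

Pooled log-likelihood:

$$\sum_{i=1}^{n}\sum_{t=1}^{T}\left[y_{it}\log\Phi(x_{it}'\beta) + (1 - y_{it})\log(1 - \Phi(x_{it}'\beta))\right]$$

Maximizer: pooled probit estimator.

Partial likelihood theory says it is still consistent if CI doesn't hold.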
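The pooled log-likelihood is a double sum that vectorizes directly. In this sketch the array layout (`y` of shape `(n, T)`, `X` of shape `(n, T, k)`) and the name `pooled_probit_loglik` are assumptions made for illustration, and the data are simulated.

```python
import numpy as np
from scipy.stats import norm

def pooled_probit_loglik(beta, y, X):
    """Pooled probit log-likelihood; y is (n, T), X is (n, T, k)."""
    xb = X @ beta                                   # (n, T) index
    # log(1 - Phi(a)) is computed as log Phi(-a) for numerical stability
    return np.sum(y * norm.logcdf(xb) + (1 - y) * norm.logcdf(-xb))

# Small simulated panel with illustrative values.
rng = np.random.default_rng(5)
n, T = 100, 4
X = np.stack([np.ones((n, T)), rng.normal(size=(n, T))], axis=2)
beta_true = np.array([0.3, 0.9])
y = (X @ beta_true + rng.normal(size=(n, T)) > 0).astype(float)

ll_true = pooled_probit_loglik(beta_true, y, X)
ll_zero = pooled_probit_loglik(np.zeros(2), y, X)   # equals n*T*log(0.5)
print(ll_true, ll_zero)
```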

(29)

Binary panel data model

RE model

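Now we add unobserved effects.

Response probability:

$$P(y_{it} = 1 \mid x_{it}, c_i) = \Phi(x_{it}'\beta + c_i)$$

SE: $D(y_{it} \mid x_i, c_i) = D(y_{it} \mid x_{it}, c_i)$ for $t = 1, \ldots, T$.

CI: $y_{i1}, y_{i2}, \ldots, y_{iT}$ are independent conditional on $x_i, c_i$.

$y_{i1}, y_{i2}, \ldots, y_{iT}$ are dependent when we do not condition on $c_i$, because they share $c_i$.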

(30)

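CI implies:

$$f(y_{i1}, y_{i2}, \ldots, y_{iT} \mid x_i, c_i) \overset{CI}{=} \prod_{t=1}^{T} f(y_{it} \mid x_i, c_i) \overset{SE}{=} \prod_{t=1}^{T} f(y_{it} \mid x_{it}, c_i),$$

where $f(y_{it} \mid x_{it}, c_i) = \Phi(x_{it}'\beta + c_i)^{y_{it}}[1 - \Phi(x_{it}'\beta + c_i)]^{1 - y_{it}}$.

But $c_i$ is unobservable, so $c_i$ should not appear in the likelihood function.

Viewing the $c_i$ as parameters to be estimated along with $\beta$ leads to an incidental parameters problem.

It means that the MLEs $\hat c_i$ and $\hat\beta$ are inconsistent with fixed $T$ (unlike in the linear case, in which we have $\sqrt{n}$-consistency).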

(31)

Hallmark of FE analysis (in nonlinear models): no specification of a distribution for $c_i$ given $x_i$.

RE (random effects) probit: assuming

$$c_i \mid x_i \sim N(0, \sigma_c^2) \qquad \text{(the RE assumption)}$$

Since the first element of $x_i$ is 1, it implies $c_i \sim N(0, \sigma_c^2)$. Then a conditional ML can be applied to $\beta$ and $\sigma_c^2$ as follows.

(32)

Dealing with $c_i$: integrate out $c_i$

$$f(y_{i1}, y_{i2}, \ldots, y_{iT} \mid x_i) = \int \prod_{t=1}^{T} f(y_{it} \mid x_{it}, c)\,\phi(c/\sigma_c)/\sigma_c\,dc.$$

The likelihood function for the entire sample is then straightforward: the product over $i$.

It is called the RE probit estimator.

The estimate is also called population averaged.
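One common way to compute the integral over $c$ is Gauss-Hermite quadrature. The sketch below is one possible implementation, not necessarily what any particular package does; the function name `re_probit_loglik`, the array layout and the simulated data are assumptions for illustration.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss
from scipy.stats import norm

def re_probit_loglik(beta, sigma_c, y, X, n_nodes=32):
    """RE probit log-likelihood; y is (n, T), X is (n, T, k)."""
    nodes, weights = hermgauss(n_nodes)          # rule for weight exp(-x^2)
    c = np.sqrt(2.0) * sigma_c * nodes           # change of variables
    xb = X @ beta                                # (n, T)
    q = 2 * y - 1                                # +1 / -1 coding of y
    # f(y_it | x_it, c_k) = Phi(q_it * (x_it'beta + c_k)), shape (n, T, K)
    p = norm.cdf(q[:, :, None] * (xb[:, :, None] + c[None, None, :]))
    lik_i = (p.prod(axis=1) @ weights) / np.sqrt(np.pi)   # integrate out c
    return np.sum(np.log(lik_i))

# Small simulated panel with illustrative values.
rng = np.random.default_rng(4)
n, T = 50, 3
X = np.stack([np.ones((n, T)), rng.normal(size=(n, T))], axis=2)
beta = np.array([0.2, 0.8])
c_true = 0.9 * rng.normal(size=n)
y = (X @ beta + c_true[:, None] + rng.normal(size=(n, T)) > 0).astype(float)

ll = re_probit_loglik(beta, 0.9, y, X)
print(ll)
```

Setting `sigma_c = 0` collapses the integral and returns the pooled probit log-likelihood, which is a convenient check of the quadrature weights.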

(33)

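As shown before, under the RE assumption, viewing $c_i + u_{it}$ as the composite error in the latent variable model and applying pooled probit gives an attenuation bias.

This is in contrast to the linear model.

Latent variable model

$$y_{it}^* = x_{it}'\beta + c_i + u_{it} \equiv x_{it}'\beta + v_{it}, \qquad y_{it} = I(y_{it}^* > 0)$$

$$Var(v_{it} \mid x_{it}) = Var(c_i + u_{it} \mid x_{it}) \overset{u_{it} \perp c_i}{=} Var(u_{it} \mid x_{it}) + \sigma_c^2 \overset{u_{it} \perp c_i}{=} Var(u_{it} \mid x_{it}, c_i) + \sigma_c^2 = 1 + \sigma_c^2$$

Response probability:

$$P(y_{it} = 1 \mid x_{it}) = \Phi\left(x_{it}'\beta/\sqrt{1 + \sigma_c^2}\right)$$

So the pooled probit estimator satisfies $\hat\beta \xrightarrow{p} \beta/\sqrt{1 + \sigma_c^2}$.

This result also motivates the need to specify the distribution of $c$ (in contrast to the linear RE model).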
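The attenuation formula says that averaging $\Phi(x'\beta + c)$ over $c \sim N(0, \sigma_c^2)$ gives $\Phi(x'\beta/\sqrt{1 + \sigma_c^2})$. A short numerical check, with an arbitrary index value and $\sigma_c$ chosen for illustration:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def averaged_response(index, sigma_c):
    """E_c[Phi(index + c)] for c ~ N(0, sigma_c^2), by numerical integration."""
    def integrand(c):
        return norm.cdf(index + c) * norm.pdf(c, scale=sigma_c)
    value, _ = quad(integrand, -np.inf, np.inf)
    return value

index, sigma_c = 0.7, 1.2                        # illustrative values
lhs = averaged_response(index, sigma_c)
rhs = norm.cdf(index / np.sqrt(1 + sigma_c**2))  # attenuated index
print(lhs, rhs)
```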

(34)

Binary panel data model

CRE model

RE and CI might be too strong.

Correlated RE (CRE) probit (Chamberlain's): a relaxation of RE:

$$c_i \mid x_i \sim N(x_i'\xi, \sigma_c^2),$$

allowing dependence between $c_i$ and $x_i$.

Estimation is straightforward: it only needs a modification of the density of $c$:

$$f(y_{i1}, y_{i2}, \ldots, y_{iT} \mid x_i) = \int \prod_{t=1}^{T} f(y_{it} \mid x_{it}, c)\,\phi\left((c - x_i'\xi)/\sigma_c\right)/\sigma_c\,dc.$$

(35)

Binary panel data model

FE logit

Panel logit model:

$$P(y_{it} = 1 \mid x_{it}, c_i) = G(x_{it}'\beta + c_i), \qquad \text{where } G(x) = e^x/(1 + e^x).$$

An important advantage of panel logit is that we can obtain a $\sqrt{N}$-consistent and asymptotically normal estimator of $\beta$ without any assumptions on $D(c_i \mid x_i)$, thereby achieving fixed effects (FE) estimation.

Assumptions: SE, CI, and that each element of $x_{it}$ is time-varying.

Define $N_i = \sum_{t=1}^{T} y_{it}$.

It turns out that $D(y_i \mid x_i, c_i, N_i)$ does not depend on $c_i$. The functional form of logit helps.

We can't do this in probit.

(36)

Consider $T = 2$. Then $N_i \in \{0, 1, 2\}$. Note that

$P(y_{i1} = 1 \mid x_i, c_i, N_i = 0) = 0$

$P(y_{i2} = 1 \mid x_i, c_i, N_i = 0) = 0$

$P(y_{i1} = 1 \mid x_i, c_i, N_i = 2) = 1$

$P(y_{i2} = 1 \mid x_i, c_i, N_i = 2) = 1$

none of which is informative for $\beta$.

So we only consider $N_i = 1$.

As we will show, consistent estimation of $\beta$ can be obtained by a standard logit of $y_{i2}$ on $x_{i2} - x_{i1}$ using the observations for which $N_i = 1$.

(37)

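$$P(y_{i2} = 1 \mid x_i, c_i, N_i = 1) = \frac{P(y_{i2} = 1, N_i = 1 \mid x_i, c_i)}{P(N_i = 1 \mid x_i, c_i)}$$

where

$$\begin{aligned}
\text{Top} &= P(y_{i2} = 1, y_{i1} = 0 \mid x_i, c_i) \overset{CI}{=} P(y_{i2} = 1 \mid x_i, c_i)\,P(y_{i1} = 0 \mid x_i, c_i) \\
&\overset{SE}{=} G(x_{i2}'\beta + c_i)[1 - G(x_{i1}'\beta + c_i)] = \frac{e^{x_{i2}'\beta + c_i}}{(1 + e^{x_{i2}'\beta + c_i})(1 + e^{x_{i1}'\beta + c_i})},
\end{aligned}$$

$$\begin{aligned}
\text{Bottom} &= P(y_{i2} = 1, y_{i1} = 0 \mid x_i, c_i) + P(y_{i2} = 0, y_{i1} = 1 \mid x_i, c_i) \\
&= G(x_{i2}'\beta + c_i)[1 - G(x_{i1}'\beta + c_i)] + [1 - G(x_{i2}'\beta + c_i)]G(x_{i1}'\beta + c_i) \\
&= \frac{e^{x_{i1}'\beta + c_i} + e^{x_{i2}'\beta + c_i}}{(1 + e^{x_{i2}'\beta + c_i})(1 + e^{x_{i1}'\beta + c_i})}.
\end{aligned}$$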

(38)

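So

$$\begin{aligned}
P(y_{i2} = 1 \mid x_i, c_i, N_i = 1) &= \frac{e^{x_{i2}'\beta + c_i}}{e^{x_{i1}'\beta + c_i} + e^{x_{i2}'\beta + c_i}} = \frac{e^{(x_{i2} - x_{i1})'\beta}}{1 + e^{(x_{i2} - x_{i1})'\beta}} \\
&= G((x_{i2} - x_{i1})'\beta)
\end{aligned}$$

$c_i$ is canceled!

On the other hand

$$\begin{aligned}
P(y_{i1} = 1 \mid x_i, c_i, N_i = 1) &= P(y_{i2} = 0 \mid x_i, c_i, N_i = 1) \\
&= 1 - G((x_{i2} - x_{i1})'\beta)
\end{aligned}$$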
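The cancellation of $c_i$ is easy to confirm numerically: computing Top/Bottom for several values of $c$ gives the same number each time. The helper names and the values of `x1`, `x2`, `beta` below are hypothetical.

```python
import numpy as np

def G(a):
    """Logistic CDF."""
    return 1.0 / (1.0 + np.exp(-a))

def p_y2_given_one_success(x1, x2, beta, c):
    """P(y_i2 = 1 | x_i, c_i, N_i = 1) computed as Top / Bottom."""
    top = G(x2 @ beta + c) * (1 - G(x1 @ beta + c))
    bottom = top + (1 - G(x2 @ beta + c)) * G(x1 @ beta + c)
    return top / bottom

x1 = np.array([0.5, -1.0])            # illustrative regressors, period 1
x2 = np.array([1.5, 0.2])             # illustrative regressors, period 2
beta = np.array([0.8, -0.3])

probs = [p_y2_given_one_success(x1, x2, beta, c) for c in (-3.0, 0.0, 2.5)]
target = G((x2 - x1) @ beta)          # the formula without c
print(probs, target)
```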

(39)

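$$\begin{aligned}
f(y_{i1}, y_{i2} \mid x_i, c_i, N_i = 1) &= P(y_{i1} = 1 \mid x_i, c_i, N_i = 1) \text{ or } P(y_{i2} = 1 \mid x_i, c_i, N_i = 1) \\
&= P(y_{i1} = 1 \mid x_i, c_i, N_i = 1)^{y_{i1}}\,P(y_{i2} = 1 \mid x_i, c_i, N_i = 1)^{y_{i2}}
\end{aligned}$$

So the conditional log-likelihood function for observation $i$ is

$$\begin{aligned}
l_i(\beta) &= I(N_i = 1)\left[y_{i1}\log(1 - G((x_{i2} - x_{i1})'\beta)) + y_{i2}\log G((x_{i2} - x_{i1})'\beta)\right] \\
&\overset{y_{i1} = 1 - y_{i2}}{=} I(N_i = 1)\left[(1 - y_{i2})\log(1 - G((x_{i2} - x_{i1})'\beta)) + y_{i2}\log G((x_{i2} - x_{i1})'\beta)\right]
\end{aligned}$$

We select out the observations for which $N_i = 1$.

Computationally, it is just a standard logit of $y_{i2}$ on $x_{i2} - x_{i1}$ using the observations for which $N_i = 1$.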
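A simulation sketch of the $T = 2$ FE logit. The design is an assumption made for illustration: $c_i$ is deliberately correlated with the regressors, `logit_fit` is a small Newton-Raphson routine written here (no intercept, since the constant differences out), and the true `beta` is hypothetical.

```python
import numpy as np

def G(a):
    """Logistic CDF."""
    return 1.0 / (1.0 + np.exp(-a))

def logit_fit(y, X, n_iter=25):
    """Logit MLE by Newton-Raphson, without adding an intercept."""
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = G(X @ b)
        score = X.T @ (y - p)
        info = (X * (p * (1 - p))[:, None]).T @ X   # minus the Hessian
        b = b + np.linalg.solve(info, score)
    return b

rng = np.random.default_rng(2)
n = 20000
beta = np.array([1.0, -0.5])                        # hypothetical true value
x1 = rng.normal(size=(n, 2))
x2 = rng.normal(size=(n, 2))
c = 0.8 * (x1[:, 0] + x2[:, 0]) + rng.normal(size=n)   # c depends on x
y1 = (rng.uniform(size=n) < G(x1 @ beta + c)).astype(float)
y2 = (rng.uniform(size=n) < G(x2 @ beta + c)).astype(float)

switch = (y1 + y2) == 1                             # keep N_i = 1 only
beta_hat = logit_fit(y2[switch], (x2 - x1)[switch])
print(switch.sum(), beta_hat)
```

Only the units with $N_i = 1$ enter the estimation, so the effective sample size is `switch.sum()`, not `n`.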
