A Proposed Estimator for Dynamic Probit Models


Munich Personal RePEc Archive

Proposed Estimators for Dynamic and Static Probit Models with Panel Data

Gao, Wei and Yao, Qiwei and Bergsman, Wicher

Northeast Normal University, London School of Economics and Political Science, London School of Economics and Political Science

15 July 2013


Wei Gao, Qiwei Yao and Wicher Bergsman

Key Laboratory for Applied Statistics of MOE, School of Mathematics and Statistics, Northeast Normal University, Changchun, Jilin 130024, China

Department of Statistics, London School of Economics, London, UK

1 Introduction

Suppose there are $n$ independent individuals and that two observations are made on each. For the $i$-th individual, the observations $d_{i1}$ and $d_{i2}$ are binary responses, 0 or 1. Suppose they are generated from the latent dynamic model

$$d_{i1} = I\{\tau_i + x_{i1}'\beta + \epsilon_{i1} > 0\}, \qquad d_{i2} = I\{\tau_i + \gamma d_{i1} + x_{i2}'\beta + \epsilon_{i2} > 0\}, \tag{1}$$

where $I$ denotes the indicator function, $\epsilon_{i1}$ and $\epsilon_{i2}$ are independently and identically distributed with mean 0 and variance 1, $\tau_i$ is the individual effect, which captures heterogeneity among individuals, $x_{i1}$ and $x_{i2}$ are $k$-dimensional covariates independent of $\tau_i$ and $(\epsilon_{i1}, \epsilon_{i2})'$, and $\beta$ and $\gamma$ are the parameters of interest.
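To make the data-generating process in Model (1) concrete, the following is a minimal Python sketch that draws one pair of responses per individual. The function name `simulate_panel`, its arguments, the helper callables `draw_tau` and `draw_x`, and the choice of standard normal errors (the probit case) are assumptions made for illustration; they do not come from the paper.

```python
import random

def simulate_panel(n, gamma, beta, draw_tau, draw_x, seed=0):
    """Draw n records (d1, d2, x1, x2) from Model (1) with N(0, 1) errors.

    draw_tau(rng) returns one individual effect tau_i (hypothetical helper);
    draw_x(rng) returns the covariate pair (x_i1, x_i2) as two lists.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        tau = draw_tau(rng)
        x1, x2 = draw_x(rng)
        # First period: no lagged response enters the latent index.
        index1 = tau + sum(a * b for a, b in zip(x1, beta)) + rng.gauss(0.0, 1.0)
        d1 = 1 if index1 > 0 else 0
        # Second period: the lagged response d1 enters with coefficient gamma.
        index2 = (tau + gamma * d1
                  + sum(a * b for a, b in zip(x2, beta)) + rng.gauss(0.0, 1.0))
        d2 = 1 if index2 > 0 else 0
        rows.append((d1, d2, x1, x2))
    return rows

# Example: one covariate, tau_i ~ N(0, 5^2), gamma = 1 (all values illustrative).
rows = simulate_panel(
    n=1000, gamma=1.0, beta=[0.5],
    draw_tau=lambda rng: rng.gauss(0.0, 5.0),
    draw_x=lambda rng: ([rng.gauss(0.0, 1.0)], [rng.gauss(0.0, 1.0)]),
)
```

Passing the distribution of the individual effect as a callable keeps the sketch agnostic about the form of that distribution.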

Model (1) is adopted by Heckman (1978), Arellano and Honore (2001), and Hsiao (2005, p. 208). The parameter $\gamma$ expresses the dynamic relationship between the previous state and the future state and is of considerable substantive interest. State dependence with $\gamma \neq 0$ has been termed real (or true) state dependence by Heckman (1978, 1980), which means that an individual who has experienced the event will behave differently in [...] and state dependence with $\gamma = 0$ has been termed spurious state dependence, in the sense that temporally persistent unobservables determine both previous and future experiences or choices, which therefore behave similarly. The model has many applications in microeconomic data analysis.

When $\tau_i$ is treated as a fixed effect and $\epsilon_{i1}$ and $\epsilon_{i2}$ follow logistic distributions, Chamberlain (1980, 1985), Honore and Kyriazidou (2000), and Lancaster (2002) give consistent estimators of $\gamma$ and show their convergence rates. In more general cases this is the incidental parameters problem, a challenging one for microeconometrics and statistics. For probit models ($\epsilon_{i1}$ and $\epsilon_{i2}$ normally distributed), Heckman (1980) showed, in the simulation studies reported in his Table 4.2, that the maximum likelihood estimator of $\gamma$ behaves badly when the variance of the individual effect is large.

Once $\tau_i$ is treated as a random effect, one must specify its prior distribution. Chamberlain (1980, 1985) also discusses maximum likelihood estimation of $\beta$ when $\gamma$ is 0 and the prior distribution of $\tau$ is given. For long panels, Arellano and Bonhomme (2009) proved that the estimates are robust to the choice of prior. But for such a short panel ($T = 2$), different priors may lead to quite different estimates of $\gamma$, so it is necessary to choose a proper prior. In most cases, we do not know how to choose a suitable prior.

Manski (1987) proposes maximum score methods to estimate $\beta$ when the distribution of the errors is unknown and $\gamma$ is equal to zero in Model (1). Smoothed maximum score estimators were later developed by Horowitz (1992). Arellano (2003) surveys the existing approaches to binary panel data for static models with individual effects. By introducing a quadratic exponential model, Bartolucci and Farcomeni [...] data.

In this paper, we consider estimation in Model (1) when $\epsilon_{i1}$ and $\epsilon_{i2}$ are normally distributed, that is, the probit model conditional on covariates and individual effects. New estimation methods are proposed in Section 2 for $\gamma$, for $\beta$, and for $\gamma$ and $\beta$ simultaneously. Simulation studies are carried out in Section 3.

2 Proposed Estimating methods

2.1 A proposed estimator of γ when covariates are zero

When the covariates are zero and $\tau_i$ has density $f(x)$ and is independent of $\epsilon_{i1}$ and $\epsilon_{i2}$, then

$$P\{d_{i1}=0, d_{i2}=0\} = \int \Phi(-x)\Phi(-x) f(x)\,dx, \qquad P\{d_{i1}=0, d_{i2}=1\} = \int \Phi(-x)\Phi(x) f(x)\,dx,$$

$$P\{d_{i1}=1, d_{i2}=0\} = \int \Phi(x)\Phi(-x-\gamma) f(x)\,dx, \qquad P\{d_{i1}=1, d_{i2}=1\} = \int \Phi(x)\Phi(x+\gamma) f(x)\,dx,$$

where $\Phi(x)$ is the standard normal distribution function. If $f(x)$ is known, the maximum likelihood estimator of $\gamma$ is consistent and asymptotically normal as the sample size $n$ tends to infinity. Following the comments of Heckman (1978), we can deduce whether $\gamma$ is equal to, greater than, or less than 0 from the ratio of $P\{d_{i1}=1, d_{i2}=0\}$ to $P\{d_{i1}=0, d_{i2}=1\}$, which can be estimated by

$$W = \frac{\sum_{i=1}^{n} I\{d_{i1}=1, d_{i2}=0\}}{\sum_{i=1}^{n} I\{d_{i1}=0, d_{i2}=1\}}. \tag{2}$$

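Theorem 1. If

$$f(x) = \frac{1}{\sigma_\tau}\, g\!\left(\frac{x-\mu_\tau}{\sigma_\tau}\right), \tag{3}$$

where $g(x)$ is a density function with mean 0 and variance 1 that is continuous at 0, and $g(0)$ is finite, then

$$\lim_{\sigma_\tau\to\infty} \frac{\int \Phi(x)\Phi(-x-\gamma) f(x)\,dx}{\int \Phi(-x)\Phi(x) f(x)\,dx} = -\sqrt{\pi}\,\gamma\,\Phi\!\left(-\frac{\gamma}{\sqrt{2}}\right) + \exp\left\{-\frac{\gamma^2}{4}\right\}. \tag{4}$$

Proof. This follows from Lemma 1 and Lemma 2 given in the Appendix.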

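Thus, when $\sigma_\tau$ is sufficiently large, a proposed estimator of $\gamma$ is given by

$$\hat{\gamma} = G^{-1}(W), \tag{5}$$

where

$$G(x) = -\sqrt{\pi}\,x\,\Phi(-x/\sqrt{2}) + \exp\{-x^2/4\}$$

and $W$ is given by (2). Furthermore, for sufficiently large $\sigma_\tau$, the samples with $(d_{i1}=0, d_{i2}=0)$ or $(d_{i1}=1, d_{i2}=1)$ cannot supply more information about $\gamma$, since

$$\begin{aligned}
\lim_{\sigma_\tau\to\infty} P\{d_{i1}=0, d_{i2}=0\} &= \lim_{\sigma_\tau\to\infty} \int \Phi(-x)\Phi(-x) f(x)\,dx \\
&= \lim_{\sigma_\tau\to\infty} \int \Phi(-x)\Phi(-x)\,\frac{1}{\sigma_\tau}\, g\!\left(\frac{x-\mu_\tau}{\sigma_\tau}\right) dx \\
&= \lim_{\sigma_\tau\to\infty} \int \Phi(-\sigma_\tau t-\mu_\tau)\Phi(-\sigma_\tau t-\mu_\tau)\, g(t)\,dt \\
&= G_1(0)
\end{aligned}$$

and

$$\begin{aligned}
\lim_{\sigma_\tau\to\infty} P\{d_{i1}=1, d_{i2}=1\} &= \lim_{\sigma_\tau\to\infty} \int \Phi(x)\Phi(x+\gamma) f(x)\,dx \\
&= \lim_{\sigma_\tau\to\infty} \int \Phi(x)\Phi(x+\gamma)\,\frac{1}{\sigma_\tau}\, g\!\left(\frac{x-\mu_\tau}{\sigma_\tau}\right) dx \\
&= \lim_{\sigma_\tau\to\infty} \int \Phi(\sigma_\tau t+\mu_\tau)\Phi(\sigma_\tau t+\mu_\tau+\gamma)\, g(t)\,dt \\
&= 1 - G_1(0),
\end{aligned}$$

where $G_1(x)$ is the distribution function of $g(x)$. This may be one reason why the maximum likelihood estimator of $\gamma$ performs badly in the simulation studies of Heckman (1980) when $\sigma_\tau$ is large: the variance of parameter estimates becomes larger when additional information has no direct connection with the parameters of interest.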
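The estimator $\hat{\gamma} = G^{-1}(W)$ in (5) has no closed form, but since $G'(x) = -\sqrt{\pi}\,\Phi(-x/\sqrt{2}) < 0$, the function $G$ is strictly decreasing and can be inverted numerically. The sketch below uses bisection; the function names, the bracket $[-20, 20]$, and the tolerance are assumptions chosen for illustration, not choices made in the paper.

```python
import math

def Phi(x):
    """Standard normal distribution function, via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def G(x):
    """G(x) = -sqrt(pi) * x * Phi(-x / sqrt(2)) + exp(-x^2 / 4)."""
    return -math.sqrt(math.pi) * x * Phi(-x / math.sqrt(2.0)) + math.exp(-x * x / 4.0)

def G_inverse(w, lo=-20.0, hi=20.0, tol=1e-10):
    """Solve G(x) = w by bisection, using that G is strictly decreasing."""
    if not (G(hi) < w < G(lo)):
        raise ValueError("w is outside the range of G on the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if G(mid) > w:
            lo = mid  # G(mid) is too large, so the root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)

def gamma_hat(pairs):
    """Estimate gamma from (d1, d2) pairs through the ratio W in (2)."""
    n10 = sum(1 for d1, d2 in pairs if d1 == 1 and d2 == 0)
    n01 = sum(1 for d1, d2 in pairs if d1 == 0 and d2 == 1)
    if n10 == 0 or n01 == 0:
        raise ValueError("W is zero or undefined for this sample")
    return G_inverse(n10 / n01)
```

Only the discordant pairs $(1,0)$ and $(0,1)$ enter the computation, in line with the observation that concordant pairs carry no further information about $\gamma$ when $\sigma_\tau$ is large.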

Theorem 2. For any $\epsilon > 0$,

$$\lim_{\sigma_\tau\to\infty}\ \lim_{n\to\infty} P\{|\hat{\gamma}-\gamma| \ge \epsilon\} = 0.$$

Proof: This follows easily from the law of large numbers, Theorem 1, and the continuity of $G(x)$.

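Theorem 3. If $\sigma_\tau = a\sqrt{n}$ ($a > 0$), then for all $t$,

$$\lim_{n\to\infty} P\left\{ \sqrt{\sum_{i=1}^{n} I\{d_{i1}=0, d_{i2}=1\}}\,(\hat{\gamma}-\gamma) < t \right\} = \Phi(t/\sigma),$$

where

$$\sigma^2 = \frac{G(\gamma) + G^2(\gamma)}{[G'(\gamma)]^2} = \frac{G(\gamma) + G^2(\gamma)}{\pi\,\Phi^2(-\gamma/\sqrt{2})}.$$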

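Proof: By the delta method, we can prove

$$\sqrt{n}\,\bigl(\hat{\gamma} - G^{-1}(p_{10})\bigr) = \sqrt{n}\,\bigl(G^{-1}(W) - G^{-1}(p_{10})\bigr) \Longrightarrow N(0, \sigma^{*2}),$$

where

$$p_{01} = P\{d_{i1}=0, d_{i2}=1\}, \qquad p_{10} = P\{d_{i1}=1, d_{i2}=0\}$$

and

$$\sigma^{*2} = \frac{1}{[G'(\gamma)]^2}\left[\frac{p_{10}}{p_{01}^2} + \frac{p_{10}^2}{p_{01}^3}\right].$$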

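Then

$$\sqrt{\sum_{i=1}^{n} I\{d_{i1}=0, d_{i2}=1\}}\,\bigl(\hat{\gamma} - G^{-1}(p_{10})\bigr) \Longrightarrow N(0, \sigma^2)$$

by $\sum_{i=1}^{n} I\{d_{i1}=0, d_{i2}=1\}/n \longrightarrow p_{01}$ in probability and (4).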

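Moreover,

$$G^{-1}(p_{10}) - G^{-1}(G(\gamma)) = \frac{p_{10} - G(\gamma)}{G'(G(\gamma))} + o\bigl(p_{10} - G(\gamma)\bigr) = c \times \sigma_\tau^{-1} + o(\sigma_\tau^{-1})$$

by Lemma 3 given in the Appendix, with

$$c = \frac{\sqrt{\pi}\left\{ \dfrac{\gamma^2}{2}\,\Phi\!\left(-\dfrac{\gamma}{\sqrt{2}}\right) + \Phi\!\left(-\dfrac{\gamma}{\sqrt{2}}\right) - \displaystyle\int_{-\infty}^{-\gamma/\sqrt{2}} t^2\phi(t)\,dt \right\}}{G'(G(\gamma))}.$$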

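So

$$\begin{aligned}
\sqrt{\sum_{i=1}^{n} I\{d_{i1}=0, d_{i2}=1\}}\,(\hat{\gamma}-\gamma)
&= \sqrt{\sum_{i=1}^{n} I\{d_{i1}=0, d_{i2}=1\}}\,\bigl[\hat{\gamma} - G^{-1}(p_{10})\bigr] \\
&\quad + \sqrt{\sum_{i=1}^{n} I\{d_{i1}=0, d_{i2}=1\}}\,\bigl[G^{-1}(p_{10}) - G^{-1}(G(\gamma))\bigr] \\
&= \sqrt{\sum_{i=1}^{n} I\{d_{i1}=0, d_{i2}=1\}}\,\bigl[\hat{\gamma} - G^{-1}(p_{10})\bigr] + o_p(1),
\end{aligned}$$

which implies that the Theorem holds.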
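Theorem 3 suggests an approximate standard error for $\hat{\gamma}$ of $\sigma/\sqrt{n_{01}}$, where $n_{01}$ is the number of $(0,1)$ pairs. The following sketch evaluates $\sigma^2$ and forms a normal-approximation interval. Plugging $\hat{\gamma}$ in for $\gamma$ and using the 1.96 quantile are assumptions made here for illustration; the paper does not describe interval construction.

```python
import math

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def G(x):
    """G(x) = -sqrt(pi) * x * Phi(-x / sqrt(2)) + exp(-x^2 / 4)."""
    return -math.sqrt(math.pi) * x * Phi(-x / math.sqrt(2.0)) + math.exp(-x * x / 4.0)

def sigma2(gamma):
    """sigma^2 = (G + G^2) / (pi * Phi(-gamma / sqrt(2))^2), as in Theorem 3."""
    g = G(gamma)
    return (g + g * g) / (math.pi * Phi(-gamma / math.sqrt(2.0)) ** 2)

def approx_interval(gamma_est, n01, z=1.96):
    """Plug-in interval gamma_est +/- z * sigma(gamma_est) / sqrt(n01)."""
    se = math.sqrt(sigma2(gamma_est) / n01)
    return gamma_est - z * se, gamma_est + z * se

# Example with illustrative numbers: an estimate of 0.5 from 200 (0, 1) pairs.
lower, upper = approx_interval(0.5, 200)
```

At $\gamma = 0$ the formula reduces to $\sigma^2 = 8/\pi$, which gives a quick sanity check on any implementation.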

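Remark:

$$\frac{1}{n}\sum_{i=1}^{n} I\{d_{i1}=0, d_{i2}=1\} \longrightarrow f(\mu_\tau)\int \Phi(x)\Phi(-x)\,dx + o(\sigma_\tau^{-1})$$

and

$$\frac{1}{n}\sum_{i=1}^{n} I\{d_{i1}=1, d_{i2}=0\} \longrightarrow f(\mu_\tau)\int \Phi(x)\Phi(-x-\gamma)\,dx + o(\sigma_\tau^{-1}).$$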

2.2 Estimation of β when γ is zero

Let

$$D_n = \{(d_{i1}, d_{i2})' : d_{i1} + d_{i2} = 1 \text{ for } i = 1, \cdots, n\}$$

and $m = \# D_n$, the number of elements in $D_n$. Without loss of generality, suppose that $d_{i1} + d_{i2} = 1$ for $i = 1, \cdots, m$.

The conditional probability is

$$P\{d_{i1}=1, d_{i2}=0 \mid d_{i1}+d_{i2}=1, x_{i1}, x_{i2}\} = \frac{\int \Phi(x_{i1}'\beta+t)\Phi(-x_{i2}'\beta-t) f(t)\,dt}{\int \Phi(x_{i1}'\beta+t)\Phi(-x_{i2}'\beta-t) f(t)\,dt + \int \Phi(-x_{i1}'\beta-t)\Phi(x_{i2}'\beta+t) f(t)\,dt}.$$

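Under (3), we can similarly prove

$$\lim_{\sigma_\tau\to\infty} P\{d_{i1}=1, d_{i2}=0 \mid d_{i1}+d_{i2}=1, x_{i1}, x_{i2}\} = \frac{G((x_{i2}-x_{i1})'\beta)}{G((x_{i2}-x_{i1})'\beta) + G(-(x_{i2}-x_{i1})'\beta)}.$$

For sufficiently large $\sigma_\tau$, we can replace the conditional likelihood of $\beta$ given $D_n$ by

$$L(\beta) = \prod_{i=1}^{m} p_i^{z_i}(1-p_i)^{1-z_i}, \tag{6}$$

where $z_i = I\{d_{i1}=1, d_{i2}=0\}$ and $1-z_i = I\{d_{i1}=0, d_{i2}=1\}$, and

$$p_i = \frac{G((x_{i2}-x_{i1})'\beta)}{G((x_{i2}-x_{i1})'\beta) + G(-(x_{i2}-x_{i1})'\beta)}. \tag{7}$$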

If we define the function

$$K(t) = \frac{G(t)}{G(t) + G(-t)},$$

we can show that $K(t)$ is monotonic in $t$, and then (7) can be expressed as a generalized linear model with the link function $K^{-1}(t)$, that is,

$$K^{-1}(p_i) = (x_{i2}-x_{i1})'\beta.$$

So related results for generalized linear models given by McCullagh and Nelder (1989) can be applied to (6). Under some regularity conditions and $\sigma_\tau \to \infty$, the consistency of the estimator of $\beta$ can be obtained.
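As an illustration of the approximate conditional likelihood (6) with $p_i = K((x_{i2}-x_{i1})'\beta)$, the sketch below evaluates its logarithm for a candidate $\beta$. The names `K`, `log_likelihood`, `diffs`, and `z` are assumptions introduced for this example; no optimiser is included, and any general-purpose maximiser could be applied to the returned value.

```python
import math

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def G(x):
    """G(x) = -sqrt(pi) * x * Phi(-x / sqrt(2)) + exp(-x^2 / 4)."""
    return -math.sqrt(math.pi) * x * Phi(-x / math.sqrt(2.0)) + math.exp(-x * x / 4.0)

def K(t):
    """K(t) = G(t) / (G(t) + G(-t)); its inverse plays the role of the link."""
    return G(t) / (G(t) + G(-t))

def log_likelihood(beta, diffs, z):
    """Logarithm of (6).

    diffs[i] is the covariate difference x_i2 - x_i1 (a list of length k);
    z[i] is 1 for a (1, 0) pair and 0 for a (0, 1) pair.
    """
    total = 0.0
    for delta, zi in zip(diffs, z):
        p = K(sum(a * b for a, b in zip(delta, beta)))
        total += zi * math.log(p) + (1 - zi) * math.log(1.0 - p)
    return total

# Example with two discordant pairs and a scalar covariate (illustrative data).
value = log_likelihood([0.3], [[1.0], [-2.0]], [1, 0])
```

Because $G$ is decreasing, $K$ is decreasing as well, so a larger value of $(x_{i2}-x_{i1})'\beta$ makes a $(1,0)$ pair less likely relative to a $(0,1)$ pair.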

2.3 Simultaneous estimation of γ and β for Model (1)

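As in Section 2.2, we have

$$\lim_{\sigma_\tau\to\infty} P\{d_{i1}=1, d_{i2}=0 \mid d_{i1}+d_{i2}=1, x_{i1}, x_{i2}\} = \frac{G(\gamma + (x_{i2}-x_{i1})'\beta)}{G(\gamma + (x_{i2}-x_{i1})'\beta) + G(-(x_{i2}-x_{i1})'\beta)}.$$

For large $\sigma_\tau$, we replace the conditional likelihood of $\gamma$ and $\beta$ given $D_n$ by

$$L(\gamma, \beta) = \prod_{i=1}^{m} p_i^{z_i}(1-p_i)^{1-z_i}, \tag{8}$$

where $z_i = I\{d_{i1}=1, d_{i2}=0\}$ and $1-z_i = I\{d_{i1}=0, d_{i2}=1\}$, and

$$p_i = \frac{G(\gamma + (x_{i2}-x_{i1})'\beta)}{G(\gamma + (x_{i2}-x_{i1})'\beta) + G(-(x_{i2}-x_{i1})'\beta)}. \tag{9}$$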
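A minimal sketch of how (8) and (9) might be evaluated and maximised jointly in $\gamma$ and $\beta$ follows. The crude grid search over a scalar $\beta$ is a stand-in chosen for brevity, not the method used in the paper, and all function and argument names are hypothetical.

```python
import math

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def G(x):
    """G(x) = -sqrt(pi) * x * Phi(-x / sqrt(2)) + exp(-x^2 / 4)."""
    return -math.sqrt(math.pi) * x * Phi(-x / math.sqrt(2.0)) + math.exp(-x * x / 4.0)

def cond_prob(gamma, t):
    """p_i in (9), where t stands for (x_i2 - x_i1)' beta."""
    num = G(gamma + t)
    return num / (num + G(-t))

def joint_log_likelihood(gamma, beta, diffs, z):
    """Logarithm of (8); diffs[i] = x_i2 - x_i1, z[i] = 1 for a (1, 0) pair."""
    total = 0.0
    for delta, zi in zip(diffs, z):
        p = cond_prob(gamma, sum(a * b for a, b in zip(delta, beta)))
        total += zi * math.log(p) + (1 - zi) * math.log(1.0 - p)
    return total

def grid_search(gammas, betas, diffs, z):
    """Return the (gamma, beta) grid point with the largest log-likelihood.

    Scalar beta only; intended as a starting value for a proper optimiser.
    """
    return max(((g, b) for g in gammas for b in betas),
               key=lambda gb: joint_log_likelihood(gb[0], [gb[1]], diffs, z))
```

Setting $\gamma = 0$ in `cond_prob` recovers the static expression (7), so the same routine covers both cases.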

Let

$$X^* = (x_{12}-x_{11},\ x_{22}-x_{21},\ \cdots,\ x_{m2}-x_{m1}).$$

Theorem 4. (9) is identifiable for $\gamma$ and $\beta$ if the rank of $X^*$ is equal to $k$ (the dimension of $x_{i2}-x_{i1}$) and there exist at least one $j$ and indices $1 \le s_1, \cdots, s_k \le m$ which satisfy

$$x_{j2}-x_{j1} = a_1(x_{s_1 2}-x_{s_1 1}) + a_2(x_{s_2 2}-x_{s_2 1}) + \cdots + a_k(x_{s_k 2}-x_{s_k 1}),$$

where $a_1, \cdots, a_k$ are non-positive real numbers.

Proof: By Lemma 4 given in the Appendix, this can be proved with $r_i = p_i/(1-p_i)$ and $x_i = x_{i2}-x_{i1}$.

The conditions in Theorem 4 are sufficient, and they can be satisfied with probability near 1 for a large sample size $n$ if the covariate $x_{i2}-x_{i1}$ is a continuous variable.

Corollary. Under the condition in Theorem 4, let $1_m$ be the $m$-dimensional vector with all components equal to 1; then the rank of $(1_m, X^{*\prime})$ is $k+1$.

Proof: Without loss of generality, suppose that $x_{12}-x_{11}, \cdots, x_{k2}-x_{k1}$ are linearly independent and

$$x_{k+1,2}-x_{k+1,1} = a_1(x_{12}-x_{11}) + \cdots + a_k(x_{k2}-x_{k1}),$$

where $a_1, \cdots, a_k$ are non-positive real numbers. Then the determinant

$$\begin{vmatrix}
x_{12}'-x_{11}' & 1 \\
x_{22}'-x_{21}' & 1 \\
\vdots & \vdots \\
x_{k2}'-x_{k1}' & 1 \\
x_{k+1,2}'-x_{k+1,1}' & 1
\end{vmatrix} \tag{10}$$

is equal to

$$\begin{vmatrix}
x_{12}'-x_{11}' \\
x_{22}'-x_{21}' \\
\vdots \\
x_{k2}'-x_{k1}'
\end{vmatrix}
\left[ 1 - (x_{k+1,2}-x_{k+1,1})'
\begin{pmatrix}
x_{12}'-x_{11}' \\
x_{22}'-x_{21}' \\
\vdots \\
x_{k2}'-x_{k1}'
\end{pmatrix}^{-1} 1_k \right]
= \bigl| x_{12}-x_{11},\ x_{22}-x_{21},\ \cdots,\ x_{k2}-x_{k1} \bigr| \left[ 1 - \sum_{i=1}^{k} a_i \right] \neq 0$$

by the assumption. This implies that the rank of (10) is $k+1$.

Since the rank of $(1_m, X^{*\prime})$ is equal to that of $(X^{*\prime}, 1_m)$, which is an $m \times (k+1)$ matrix, and (10) is the matrix formed by the first $k+1$ rows of $(X^{*\prime}, 1_m)$, the rank of $(1_m, X^{*\prime})$ is $k+1$.

From the Corollary, it seems that the identifiability conditions for (9) are stronger than those for linear models, since the rank of the design matrix is equal to the dimension of [...]

3 Simulation studies

For each given sample size $n$, the simulation is repeated 100 times, and the estimation results are listed in the following table.

γ        n = 1000            n = 5000

         U(-3,3)             U(-10,10)
-2       -1.955 (0.158)      -2.066 (0.202)
-1.5     -1.426 (0.233)      -1.506 (0.188)
-1       -1.027 (0.230)      -1.002 (0.142)
-0.5     -0.514 (0.198)      -0.514 (0.151)
0        -0.010 (0.154)      -0.009 (0.128)
0.5       0.514 (0.205)       0.517 (0.312)
1         1.067 (0.169)       0.984 (0.141)
1.5       1.495 (0.153)       1.495 (0.153)
2         1.997 (0.246)       2.037 (0.175)

         N(0,4)              N(0,25)
-2       -1.788 (0.206)      -1.955 (0.158)
-1.5     -1.483 (0.146)      -1.506 (0.188)
-1       -0.970 (0.199)      -0.994 (0.127)
-0.5     -0.496 (0.148)      -0.507 (0.117)
0        -0.030 (0.146)      -0.010 (0.111)
0.5       0.509 (0.161)       0.472 (0.098)
1         1.029 (0.172)       1.008 (0.110)
1.5       1.507 (0.176)       1.496 (0.120)
2         2.073 (0.219)       2.032 (0.164)

Appendix

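Lemma 1. If $f(x)$ satisfies the conditions given in Theorem 1, then

$$\int \Phi(x)\Phi(-x-\gamma) f(x)\,dx = f(\mu_\tau)\int \Phi(x)\Phi(-x-\gamma)\,dx + o(\sigma_\tau^{-1})$$

and

$$\int \Phi(-x)\Phi(x) f(x)\,dx = f(\mu_\tau)\int \Phi(-x)\Phi(x)\,dx + o(\sigma_\tau^{-1}).$$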

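Proof.

$$\begin{aligned}
&\sigma_\tau\left[\int \Phi(x)\Phi(-x-\gamma) f(x)\,dx - f(\mu_\tau)\int \Phi(x)\Phi(-x-\gamma)\,dx\right] \\
&\quad = \int \Phi(x)\Phi(-x-\gamma)\, g\!\left(\frac{x-\mu_\tau}{\sigma_\tau}\right) dx - g(0)\int \Phi(x)\Phi(-x-\gamma)\,dx \\
&\quad \le \int_{x>M} \Phi(x)\Phi(-x-\gamma)\, g\!\left(\frac{x-\mu_\tau}{\sigma_\tau}\right) dx + \int_{x<-M} \Phi(x)\Phi(-x-\gamma)\, g\!\left(\frac{x-\mu_\tau}{\sigma_\tau}\right) dx \\
&\qquad + g(0)\int_{x>M} \Phi(x)\Phi(-x-\gamma)\,dx + g(0)\int_{x<-M} \Phi(x)\Phi(-x-\gamma)\,dx \\
&\qquad + \int_{|x|\le M} \Phi(x)\Phi(-x-\gamma)\left| g\!\left(\frac{x-\mu_\tau}{\sigma_\tau}\right) - g(0) \right| dx \\
&\quad \le \Phi(-M-\gamma) + \Phi(-M) + g(0)\int_{x>M} \Phi(x)\Phi(-x-\gamma)\,dx + g(0)\int_{x<-M} \Phi(x)\Phi(-x-\gamma)\,dx \\
&\qquad + \int_{|x|\le M} \Phi(x)\Phi(-x-\gamma)\left| g\!\left(\frac{x-\mu_\tau}{\sigma_\tau}\right) - g(0) \right| dx.
\end{aligned}$$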

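For given $\gamma$, $\Phi(-M-\gamma)$ and $\Phi(-M)$ can be made arbitrarily small for sufficiently large $M$. Furthermore, $\Phi(x)\Phi(-x-\gamma)$ is integrable, so $\int_{x<-M} \Phi(x)\Phi(-x-\gamma)\,dx$ and $\int_{x>M} \Phi(x)\Phi(-x-\gamma)\,dx$ can also be made arbitrarily small for sufficiently large $M$. For given $M$, $\int_{|x|\le M} \Phi(x)\Phi(-x-\gamma)\left| g\!\left(\frac{x-\mu_\tau}{\sigma_\tau}\right) - g(0) \right| dx$ can also be made arbitrarily small for sufficiently large $\sigma_\tau$. So

$$\int \Phi(x)\Phi(-x-\gamma) f(x)\,dx = f(\mu_\tau)\int \Phi(x)\Phi(-x-\gamma)\,dx + o(\sigma_\tau^{-1}).$$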

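Lemma 2.

$$\int \Phi(-x)\Phi(x+\beta)\,dx = \beta\,\Phi\!\left(\frac{\beta}{\sqrt{2}}\right) + \frac{1}{\sqrt{\pi}}\exp\left\{-\frac{\beta^2}{4}\right\}.$$

Proof. By the fact that $d\bigl(x\Phi(x) + \phi(x)\bigr) = \Phi(x)\,dx$ and integration by parts,

$$\begin{aligned}
\int \Phi(-x)\Phi(x+\beta)\,dx &= \int \phi(x)\bigl[(x+\beta)\Phi(x+\beta) + \phi(x+\beta)\bigr]\,dx \\
&= \beta\int \phi(x)\Phi(x+\beta)\,dx + \int x\phi(x)\Phi(x+\beta)\,dx + \int \phi(x)\phi(x+\beta)\,dx \\
&= \beta\,\Phi\!\left(\frac{\beta}{\sqrt{2}}\right) + 2\int \phi(x)\phi(x+\beta)\,dx \\
&= \beta\,\Phi\!\left(\frac{\beta}{\sqrt{2}}\right) + \frac{1}{\sqrt{\pi}}\exp\left\{-\frac{\beta^2}{4}\right\}.
\end{aligned}$$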
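The identity in Lemma 2 can be checked numerically. The sketch below compares a trapezoid-rule approximation of the left-hand side with the closed form on the right; the truncation to $[-12, 12]$ and the number of steps are assumptions that are adequate here because the integrand decays like a Gaussian tail in both directions.

```python
import math

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def lemma2_lhs(beta, half_width=12.0, steps=4800):
    """Trapezoid approximation of the integral of Phi(-x) * Phi(x + beta)."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        x = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0  # end points get half weight
        total += weight * Phi(-x) * Phi(x + beta)
    return total * h

def lemma2_rhs(beta):
    """Closed form: beta * Phi(beta / sqrt(2)) + exp(-beta^2 / 4) / sqrt(pi)."""
    return beta * Phi(beta / math.sqrt(2.0)) + math.exp(-beta * beta / 4.0) / math.sqrt(math.pi)

# Largest discrepancy over a few illustrative values of beta.
max_gap = max(abs(lemma2_lhs(b) - lemma2_rhs(b)) for b in (-1.5, 0.0, 2.0))
```

At $\beta = 0$ both sides equal $1/\sqrt{\pi}$, the value of $\int \Phi(-x)\Phi(x)\,dx$ that appears in the denominator of (4).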

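Lemma 3. If $f(x)$ satisfies the conditions given in Theorem 1 and is differentiable at $\mu_\tau$, then

$$\int \Phi(-x)\Phi(x) f(x)\,dx - G(\gamma) = \left\{ \frac{\gamma^2}{2}\,\Phi\!\left(-\frac{\gamma}{\sqrt{2}}\right) + \Phi\!\left(-\frac{\gamma}{\sqrt{2}}\right) - \int_{-\infty}^{-\gamma/\sqrt{2}} t^2\phi(t)\,dt \right\} \frac{\sqrt{\pi}}{\sigma_\tau} + o(\sigma_\tau^{-1}).$$

Proof: Expand the function

$$\int \Phi(-x)\Phi(x) f(x)\,dx$$

at $\sigma_\tau = \infty$, and the Lemma follows.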

Lemma 4. Let $x_1, x_2, \cdots, x_k, x_{k+1} \in R^k$ satisfy: (a) $x_1, x_2, \cdots, x_k$ are linearly [...] real numbers, and let $r_1, \cdots, r_k, r_{k+1}$ be positive real numbers. Then the system of equations

$$\begin{cases}
G(x_1'\beta+\alpha) - r_1 G(-x_1'\beta) = 0 \\
G(x_2'\beta+\alpha) - r_2 G(-x_2'\beta) = 0 \\
\qquad\cdots \\
G(x_k'\beta+\alpha) - r_k G(-x_k'\beta) = 0 \\
G(x_{k+1}'\beta+\alpha) - r_{k+1} G(-x_{k+1}'\beta) = 0
\end{cases} \tag{11}$$

has a unique solution $\beta$ and $\alpha$.

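Proof: For fixed $\alpha$, let

$$u_\alpha(z) = \frac{G(z+\alpha)}{G(-z)},$$

so that

$$\begin{aligned}
\frac{du_\alpha(z)}{dz} &= \frac{G'(z+\alpha)G(-z) + G(z+\alpha)G'(-z)}{G^2(-z)} \\
&= -\sqrt{\pi}\,\frac{\Phi(-(z+\alpha)/\sqrt{2})\,G(-z) + G(z+\alpha)\,\Phi(z/\sqrt{2})}{G^2(-z)} \\
&< 0.
\end{aligned}$$

So $u_\alpha(z)$ is decreasing in $z$, with $\lim_{z\to-\infty} u_\alpha(z) = \infty$ and $\lim_{z\to\infty} u_\alpha(z) = 0$. Thus, for fixed $\alpha$, the system of equations

$$\begin{cases}
G(x_1'\beta+\alpha) - r_1 G(-x_1'\beta) = 0 \\
G(x_2'\beta+\alpha) - r_2 G(-x_2'\beta) = 0 \\
\qquad\cdots \\
G(x_k'\beta+\alpha) - r_k G(-x_k'\beta) = 0
\end{cases} \tag{12}$$

has a unique solution when $x_1, \cdots, x_k$ are linearly independent.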

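Let $\beta^* = (\beta_1(\alpha), \cdots, \beta_k(\alpha))'$ be the solution of (12). Then

$$\frac{d\beta^*}{d\alpha} = -X'^{-1}\delta,$$

where

$$\delta = (\delta_1, \cdots, \delta_k)', \qquad \delta_i = \frac{\Phi(-(x_i'\beta^*+\alpha)/\sqrt{2})}{\Phi(-(x_i'\beta^*+\alpha)/\sqrt{2}) + r_i\,\Phi(x_i'\beta^*/\sqrt{2})}$$

and

$$X = (x_1, x_2, \cdots, x_k).$$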

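Define

$$t(\alpha) = G(x_{k+1}'\beta^*+\alpha) - r_{k+1}\,G(-x_{k+1}'\beta^*),$$

and then

$$\begin{aligned}
\frac{dt(\alpha)}{d\alpha} &= -\sqrt{\pi}\left\{ \left[ \Phi\!\left(-\frac{x_{k+1}'\beta^*+\alpha}{\sqrt{2}}\right) + r_{k+1}\,\Phi\!\left(\frac{x_{k+1}'\beta^*}{\sqrt{2}}\right) \right] x_{k+1}'\frac{d\beta^*}{d\alpha} + \Phi\!\left(-\frac{x_{k+1}'\beta^*+\alpha}{\sqrt{2}}\right) \right\} \\
&= -\sqrt{\pi}\left\{ \left[ \Phi\!\left(-\frac{x_{k+1}'\beta^*+\alpha}{\sqrt{2}}\right) + r_{k+1}\,\Phi\!\left(\frac{x_{k+1}'\beta^*}{\sqrt{2}}\right) \right] \left( \sum_{j=1}^{k} c_j\delta_j \right) + \Phi\!\left(-\frac{x_{k+1}'\beta^*+\alpha}{\sqrt{2}}\right) \right\} \\
&< 0,
\end{aligned}$$

which implies that $t(\alpha) = 0$ has a unique solution, and the Lemma follows.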

References

Arellano, M. (2003). Discrete choices with panel data. Investigaciones Economicas, 27, 423-458.

Arellano, M. and Bonhomme, S. (2009). Robust priors in nonlinear panel data models. Econometrica, 77, 489-536.

Arellano, M. and Honore, B. (2001). Panel data models: some recent developments. Handbook of Econometrics, Vol. V, ed. by J. Heckman and E. Leamer.

Bartolucci, F. and Farcomeni, A. (2009). A multivariate extension of the dynamic logit model for longitudinal data based on a latent Markov heterogeneity structure. Journal of the American Statistical Association, 104, 816-833.

Bartolucci, F. and Nigro, V. (2010). A dynamic model for binary panel data with unobserved heterogeneity admitting a √n-consistent conditional estimator. Econometrica, 78, 719-733.

Chamberlain, G. (1980). Analysis of covariance with qualitative data. Review of Economic Studies, 47, 225-238.

Chamberlain, G. (1985). Heterogeneity, omitted variables bias, and duration dependence. Longitudinal Analysis of Labor Market Data, ed. by J. Heckman and B. Singer. Cambridge University Press.

Heckman, J. (1980). The incidental parameters problem and the problem of initial conditions in estimating a discrete time-discrete data stochastic process. Structural Analysis of Discrete Data with Econometric Applications, ed. by C. F. Manski and D. McFadden, pp. 179-195. Cambridge, MA: MIT Press.

Heckman, J. (1978). Simple statistical models for discrete panel data developed and applied to test the hypothesis of true state dependence against the hypothesis of spurious state dependence. Annales de l'INSEE, 30/31, 227-269.

Hsiao, C. (2005). Analysis of Panel Data (Second Ed.). New York: Cambridge University Press.

Honore, B. and Kyriazidou, E. (2000). Panel data discrete choice models with lagged dependent variables. Econometrica, 68, 611-629.

Horowitz, J. L. (1992). A smoothed maximum score estimator for binary response model. Econometrica, 60, 505-531.

Lancaster, T. (2002). Orthogonal parameters and panel data. Review of Economic Studies, 647-666.

Manski, C. (1987). Semiparametric analysis of random effects linear models from binary panel data. Econometrica, 55, 357-362.

McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models. London: