Tests in Censored Models when the Structural Parameters Are Not Identified

Leandro M. Magnusson
Department of Economics
Tulane University
[email protected]
December 2, 2008

Abstract

This paper presents tests for the structural parameters of a censored regression model with endogenous explanatory variables. These tests have the correct size even when the identification condition for the structural parameter fails. My approach starts from the estimation of the unrestricted parameters, which does not depend on the identification of the structural parameter. Next, I set up the optimal minimum distance objective function, from which I derive the tests. The proposed robust tests can be implemented in many statistical software packages, since they require only the ‘Tobit’ and the ‘ordinary least squares’ estimation functions. By simulating their power curves, I compare the robust tests with the Wald and the likelihood ratio tests. An application to the labor supply of married women illustrates the use of the robust tests for the construction of confidence intervals.

JEL Classification: C12, C34.

Keywords: Endogenous Tobit, weak instruments, minimum distance estimation, female labor supply.


1 Introduction

The purpose of this paper is to present tests for the structural parameters of censored models which have the correct size under the null hypothesis even when those parameters are not identified. These tests depart from the minimum distance objective function and can be performed in many statistical software packages.

In nonlinear models, general identification conditions for the structural parameters are hard to obtain. However, a necessary global identification condition is that the expected value of the Jacobian of the objective function under the true distribution must be a full rank matrix (Newey and McFadden (1994)). The lack of identification misguides the usual asymptotic theory behind point estimation and hypothesis testing (see Staiger and Stock (1997) and Stock and Wright (2000)).

New tests have been developed to overcome the deficiencies of the Wald, Lagrange multiplier and likelihood ratio tests when the identification condition fails. In the case of the linear simultaneous equations model, the pioneering test is the AR-test (Anderson and Rubin (1949)). Kleibergen (2002) proposes a Lagrange multiplier test, also known as the K-test, based on the asymptotic independence between the empirical moment restriction and its Jacobian under the null hypothesis. This principle comes from the partition of an invariant sufficient statistic into two independent ones. Tests can then be performed by conditioning one statistic on the other. Using this principle, Moreira (2003) derives the conditional Wald (CW) and the conditional likelihood ratio (CLR) tests, which are not pivotal. Departing from the objective function of the continuous updating estimator (CUE), Kleibergen (2005) extends the K- and the CLR-tests to the generalized method of moments framework.

However, for models in which the structural parameters are not separable from the others, the K-test demands the identification and the consistent estimation of untested parameters under the null hypothesis. It also requires the estimation of the covariance between the empirical moments and the Jacobian. Often, the estimation of untested parameters and the covariance matrix between moments and the Jacobian are computationally intensive.

I derive weak instruments robust tests for censored models departing from the minimum distance objective function. This approach avoids the estimation of untested parameters and of the covariance matrix between moments and Jacobian under the null hypothesis. Moreover, the tests can be implemented using regular statistical software such as Stata and R. These robust tests are modifications of existing ones, so I use the subscript M to denote them.


structural parameters. In the unrestricted model, the auxiliary parameters are well identified regardless of the presence of weak instruments. The auxiliary parameters can be estimated either by the two-stage conditional maximum likelihood method proposed by Smith and Blundell (1986), or by any other consistent estimator, such as the symmetrically censored least squares (Powell (1986)) and the winsorized mean estimator (Lee (1995)). Simple linear restrictions on the unrestricted parameters are enough to obtain the minimum distance objective function for the structural parameter. Robust tests are derived from the minimum distance objective function following the same lines as Kleibergen (2005). The minimum distance approach allows the extension of the weak instrument robust tests to other classes of limited dependent variable models such as endogenous probit and endogenous ordered probit (see Magnusson (2006)).

In the next section I present the censored model with endogenous explanatory variables and the assumptions behind it. I also discuss the failure of identification and the weak instruments asymptotics for that model. The third section deals with the derivation of weak instruments robust tests using the minimum distance approach. The fourth section presents simulations of the rejection probability curves in order to compare the performance between the proposed tests and the Wald and likelihood ratio tests. In the fifth section I use the weak instruments robust tests to build confidence intervals for the structural parameter of a female labor supply model. The sixth section summarizes and concludes the paper. Proofs, mathematical passages and data description are in the appendices.

2 The Censored Model with Endogenous Explanatory Variables

2.1 Model and Identification

The censored model with endogenous explanatory variables, also known as the endogenous Tobit, was first addressed in the economic literature in the seventies (see Amemiya (1979) and Lee (1981)). It departs from the following structural latent linear simultaneous equation model:

$$Y_t^* = X_t\beta + U_t, \qquad X_t = Z_t\Pi_z + V_t$$

where $Y_t^*$ and $U_t$ are scalars, $X_t$ and $V_t$ are $1 \times m$ vectors of endogenous variables and residuals, and $Z_t$ is a $1 \times k$ vector of excluded instruments. $Y_t^*$ is observed only if $Y_t^* > 0$. Included exogenous


system with one censored endogenous variable is defined as:

$$\begin{cases} Y_t = \max\{0,\; X_t\beta + U_t\} \\ X_t = Z_t\Pi_z + V_t \\ D_t = 1(Y_t^* > 0) \end{cases} \tag{2.1}$$

where $1(\cdot)$ is a binary indicator function which assumes the value 1 if $Y_t^* > 0$ and 0 otherwise.

The residuals are independent and follow a multivariate normal distribution.

Assumption 1. Let $\{U_t, V_t\}_{t=1}^{T}$ be a sequence of independent random variables. Each pair $(U_t, V_t)$ follows a multivariate normal distribution conditional on $Z_t$, i.e.,

$$(U_t, V_t)\,|\,Z_t \sim N\left(0,\; \begin{bmatrix} \sigma_u^2 & \Sigma_{uv} \\ \Sigma_{vu} & \Sigma_{vv} \end{bmatrix}\right)$$

There are several ways to estimate the parameters of the above limited information model under assumption 1. Some examples include the maximum likelihood estimator (MLE), the Amemiya generalized least squares (AGLS - Amemiya (1979)), the two-stage conditional maximum likelihood (TSCML - Smith and Blundell (1986)) and the Newey conditional generalized least squares (CGLS - Newey (1987)).

The regular identification condition for the structural parameter $\beta$ requires that $\Pi_z$ has full column rank, i.e., the instruments should be correlated with the endogenous variables. If there exists a matrix without full rank in a “small” neighborhood of $\Pi_z$, or if $\Pi_z$ itself does not have full rank, then the exogenous variables are labeled as weak instruments. In their presence, the small sample distribution of an estimator differs from its asymptotic approximation. Additionally, statistical inference based on the classical tests (Wald, Lagrange multiplier (LM) and likelihood ratio (LR)) depends on nuisance parameters (Stock and Wright (2000)) and so is invalid.

Figure 1 illustrates the log-likelihood functions when the instruments are strong and weak. In this example there is only one instrumental variable $Z_t$, which follows a standard normal distribution. I set $\Pi_z = 1$ and $\Pi_z = 0.1$ in order to mimic, respectively, strong and weak instruments. The residuals $U_t$ and $V_t$ are jointly normally distributed with $\sigma_u^2 = \Sigma_{vv} = 1$ and $\Sigma_{uv} = 0.5$. The log-likelihood functions are evaluated assuming that the covariance terms are known.

In the case of the strong instrument, the log-likelihood is globally concave and is uniquely maximized. When the instrument is weak the log-likelihood resembles a quasiconcave function. The smoothness along the line where Πz = 0 indicates the lack of global identification of β.


Fig. 1: Endogenous Tobit log-likelihood functions with strong and weak instrument. [Two surface plots, titled "strong instrument" and "weak instrument"; axes: Π, β and log-likelihood (scale ×10^4).]

Staiger and Stock (1997) model Πz as local to zero in order to describe weak instruments

asymptotics. I adopt the same assumption which is reproduced below as a definition.

Definition 1. Let C be a full rank matrix. Πz has the following asymptotic behavior in case of strong, weak and irrelevant instruments, respectively:

i) $\Pi_z = C$,

ii) $\Pi_z = \Pi_T = C/\sqrt{T}$,

iii) $\Pi_z = 0$.
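
The three cases of Definition 1 can be illustrated numerically. The following is a minimal sketch, assuming a single standard normal instrument, a scalar $C = 1$ and $\sigma_v^2 = 1$ (all hypothetical choices, not taken from the paper). It computes the quantity $\Pi_z'Z'Z\Pi_z/\sigma_v^2$, which grows with $T$ under strong instruments, stays bounded under the local-to-zero sequence and is zero for irrelevant instruments.

```python
import numpy as np

def concentration(pi, Z, sigma_v2=1.0):
    """pi' Z'Z pi / sigma_v^2 for a single endogenous regressor."""
    pi = np.atleast_1d(pi).astype(float)
    return float(pi @ (Z.T @ Z) @ pi / sigma_v2)

rng = np.random.default_rng(0)
C = 1.0  # hypothetical full-rank constant (a scalar in this sketch)
results = {}
for T in (100, 10_000):
    Z = rng.standard_normal((T, 1))
    strong = concentration(C, Z)                # case i):   Pi_z = C
    weak = concentration(C / np.sqrt(T), Z)     # case ii):  Pi_z = C / sqrt(T)
    irrelevant = concentration(0.0, Z)          # case iii): Pi_z = 0
    results[T] = (strong, weak, irrelevant)
    print(T, round(strong, 1), round(weak, 3), irrelevant)
```

Under case ii) the printed value stays close to $C^2 E[Z_t^2]$ however large $T$ becomes, which is why the usual asymptotic approximations do not improve with the sample size.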

2.2 Likelihood and Score for the Structural Model

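From the properties of the multivariate normal distribution, we have:

$$U_t = V_t\alpha + \varepsilon_t, \qquad \varepsilon_t\,|\,V_t, Z_t \sim N(0, \sigma_\varepsilon^2)$$

where $\alpha = \Sigma_{vv}^{-1}\Sigma_{vu}$, $\sigma_\varepsilon^2 = \sigma_u^2(1 - \rho'\rho)$ and $\rho = \Sigma_{vv}^{-1/2}\Sigma_{vu}/\sigma_u$. The conditional structural model is obtained by substituting the above relation into equation (2.1):

$$\begin{cases} Y_t = \max\{0,\; X_t\beta + V_t\alpha + \varepsilon_t\} \\ X_t = Z_t\Pi_z + V_t \\ D_t = 1(Y_t^* > 0) \end{cases} \tag{2.2}$$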
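
The projection coefficients above can be checked with a few lines of arithmetic. This is an illustrative sketch with hypothetical covariance values for $m = 2$; it computes $\alpha$, $\rho$ and $\sigma_\varepsilon^2$ as defined in the text and compares $\sigma_\varepsilon^2$ with the direct expression $\sigma_u^2 - \Sigma_{uv}\Sigma_{vv}^{-1}\Sigma_{vu}$.

```python
import numpy as np

# Hypothetical covariance values, chosen only for illustration (m = 2).
sigma_u2 = 1.0
Sigma_vu = np.array([0.5, 0.3])
Sigma_vv = np.array([[1.0, 0.2],
                     [0.2, 1.0]])

alpha = np.linalg.solve(Sigma_vv, Sigma_vu)       # alpha = Sigma_vv^{-1} Sigma_vu

# rho = Sigma_vv^{-1/2} Sigma_vu / sigma_u, using the symmetric square root
w, Q = np.linalg.eigh(Sigma_vv)
Sigma_vv_inv_half = Q @ np.diag(w ** -0.5) @ Q.T
rho = Sigma_vv_inv_half @ Sigma_vu / np.sqrt(sigma_u2)

sigma_eps2 = sigma_u2 * (1.0 - rho @ rho)         # sigma_eps^2 = sigma_u^2 (1 - rho'rho)

# Direct check: Var(U - V alpha) = sigma_u^2 - Sigma_uv Sigma_vv^{-1} Sigma_vu
sigma_eps2_direct = sigma_u2 - Sigma_vu @ alpha
print(alpha, sigma_eps2, sigma_eps2_direct)
```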


and conditional distributions, since $\varepsilon_t$ is conditionally independent of $X_t$. The log-likelihood function for the endogenous Tobit after concentrating $\Sigma_{vv}$ out is:

$$\ell_T(\beta, \alpha, \sigma_\varepsilon^2, \Pi_z, \Sigma_{vv}) = \sum_{t=1}^{T} \ell_{y|(x,z)}(y_t\,|\,x_t, z_t;\, \beta, \alpha, \sigma_\varepsilon^2, \Pi_z) + \sum_{t=1}^{T} \ell^{c}_{x|z}(x_t\,|\,z_t;\, \Pi_z)$$

where $\ell_{y|(x,z)}$ is the log-likelihood of the Tobit model with latent mean $x_t\beta + v_t\alpha$ and variance $\sigma_\varepsilon^2$, and $\ell^{c}_{x|z}$ is the concentrated log-likelihood of the multivariate normal density.

Setting moment restrictions which are valid under the null hypothesis is the starting point for testing the value of the structural parameter $\beta$. In the maximum likelihood setup, the score functions provide the natural moments. Before presenting them, let me introduce more notation. The vector

$$\eta = \begin{bmatrix} \alpha' & \sigma_\varepsilon^2 & \operatorname{vec}(\Pi_z)' \end{bmatrix}'$$

is defined as the vector of parameters that are not being tested. The pseudo-residuals are further defined as follows:2

$$e^{(1)}_t(\beta, \eta) = d_t\,\frac{y_t - w_t\delta}{\sigma_\varepsilon} - (1 - d_t)\,\frac{\phi_t}{1 - \Phi_t} \tag{2.3}$$

$$e^{(2)}_t(\beta, \eta) = d_t\left[\left(\frac{y_t - w_t\delta}{\sigma_\varepsilon}\right)^2 - 1\right] + (1 - d_t)\,\frac{w_t\delta}{\sigma_\varepsilon}\,\frac{\phi_t}{1 - \Phi_t} \tag{2.4}$$

where $w_t\delta = x_t\beta + v_t\alpha$, and $\phi_t$ and $\Phi_t$ are, respectively, the normal density and cumulative distribution functions evaluated at $w_t\delta/\sigma_\varepsilon$. Under the true data generating process, the following holds:

$$E\left[e^{(1)}_t(\beta_0, \eta_0)\,|\,x_t, z_t\right] = E\left[e^{(2)}_t(\beta_0, \eta_0)\,|\,x_t, z_t\right] = 0$$
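
The zero-mean property of the pseudo-residuals can be checked by simulation. The sketch below is illustrative only: the function names, the sample size and the distribution of the index $w_t\delta$ are assumptions made for the example. It evaluates (2.3) and (2.4) at the true parameters and prints their sample means, which should be close to zero.

```python
import math
import numpy as np

def norm_pdf(a):
    return np.exp(-0.5 * a ** 2) / math.sqrt(2.0 * math.pi)

def norm_cdf(a):
    return 0.5 * (1.0 + np.vectorize(math.erf, otypes=[float])(a / math.sqrt(2.0)))

def pseudo_residuals(y, d, index, sigma_eps):
    """e1 and e2 of (2.3)-(2.4); `index` is w_t delta = x_t beta + v_t alpha."""
    a = index / sigma_eps
    mills = norm_pdf(a) / (1.0 - norm_cdf(a))     # phi_t / (1 - Phi_t)
    z = (y - index) / sigma_eps
    e1 = d * z - (1 - d) * mills
    e2 = d * (z ** 2 - 1.0) + (1 - d) * a * mills
    return e1, e2

# Simulated check at the true parameters (all values hypothetical).
rng = np.random.default_rng(1)
T, sigma_eps = 200_000, 1.5
index = rng.normal(0.5, 1.0, size=T)              # stands in for w_t delta
y_star = index + sigma_eps * rng.standard_normal(T)
d = (y_star > 0).astype(float)                    # participation indicator
y = np.maximum(0.0, y_star)                       # censored outcome
e1, e2 = pseudo_residuals(y, d, index, sigma_eps)
print(e1.mean(), e2.mean())
```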

The score functions are, in matrix notation:

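$$\nabla_\beta \ell_T(\beta, \eta) = \frac{1}{\sigma_\varepsilon}\, e_1'X \tag{2.5}$$

$$\nabla_\eta \ell_T(\beta, \eta) = \begin{bmatrix} \dfrac{1}{\sigma_\varepsilon}\, e_1'V \\[2ex] \dfrac{1}{2\sigma_\varepsilon^2}\, e_2'l \\[2ex] \operatorname{vec}\left(Z'V\left(\dfrac{V'V}{T}\right)^{-1}\right)' - \dfrac{\alpha' \otimes e_1'Z}{\sigma_\varepsilon} \end{bmatrix} \tag{2.6}$$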

where $l$ is a column vector whose elements are all 1. The K-test is based on the asymptotic independence between the moment conditions and the expected Jacobian. Since $\beta$ lies in a subset of the parameter space, the K-test requires the estimation of $\eta$ under the null hypothesis $H_0: \beta = \beta_0$.

2 The pseudo-residuals play the same role as the generalized residuals proposed by Gourieroux et al. (1987).

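The identification condition for $\eta$ demands that

$$E\left[\nabla_{\eta\eta}\ell_t(\beta_0, \eta)\right]$$

should be continuous with respect to the parameters of the model and full rank at the true value $(\beta_0, \eta_0)$ (see Kleibergen (2005), assumption 3). In this example, checking whether the identification assumption holds is not straightforward. A practical solution is to assume identification of $\eta$. Let $\hat\eta_{\beta_0}$ be a consistent estimator for $\eta$. The subsequent calculation of the K-test requires an estimator for the covariance matrix between the score and the Hessian under the null hypothesis, i.e., a statistic for:

$$\frac{1}{T}\sum_{t=1}^{T} E\left\{ \operatorname{vec}\begin{bmatrix} \nabla_{\beta\beta}\ell_t(\beta, \eta) & \nabla_{\beta\eta}\ell_t(\beta, \eta) \\ \nabla_{\eta\beta}\ell_t(\beta, \eta) & \nabla_{\eta\eta}\ell_t(\beta, \eta) \end{bmatrix} \begin{bmatrix} \nabla_\beta\ell_t(\beta, \eta) & \nabla_\eta\ell_t(\beta, \eta) \end{bmatrix} \right\}\Bigg|_{(\beta,\eta) = (\beta_0, \hat\eta_{\beta_0})}$$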

Finding the above covariance matrix is analytically difficult, and numerical approximations can be computationally unstable in some regions of the parameter space. Thus, the use of the K-test for the construction of confidence intervals is challenging. In order to avoid the inherent difficulties behind the K-test, I devise alternative weak instruments robust tests for censored models, which are elaborated in section 3. The next subsection provides some theoretical results necessary for the derivation of the new tests.

2.3 Unrestricted Model and Its Likelihood

Instead of working directly with the structural model, I use the minimum distance framework to derive weak instruments robust tests. I first present a consistent asymptotically normal estimator for the unrestricted parameters and their covariance matrix.

From (2.2), the unrestricted conditional model is:

$$\begin{cases} Y_t = \max\{0,\; Z_t\pi_z + V_t\gamma + \varepsilon_t\} \\ X_t = Z_t\Pi_z + V_t \\ D_t = 1(Y_t > 0) \end{cases} \tag{2.7}$$

Under Assumption 1, a simple parametric estimator for the unrestricted parameters is the TSCML. I choose the TSCML because it allows the implementation of the robust tests in almost any statistical software. Moreover, the TSCML is the same as the maximum likelihood estimator and therefore shares the efficiency properties of the latter (see Newey (1987), proposition 7). In the first stage, I obtain an estimate of $\Pi_z$ by ordinary least squares. The remaining parameters are estimated from the conditional Tobit likelihood:

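$$L^{c}_T(y_t\,|\,z_t;\, \pi_z, \gamma, \sigma_\varepsilon^2, \operatorname{vec}(\hat\Pi_z)) = \prod_{t=1}^{T} \left[\Phi\left(-\frac{z_t\pi_z + \hat v_t\gamma}{\sigma_\varepsilon}\right)\right]^{1-d_t} \left[\frac{1}{\sigma_\varepsilon}\,\phi\left(\frac{y_t - (z_t\pi_z + \hat v_t\gamma)}{\sigma_\varepsilon}\right)\right]^{d_t}$$

where $\hat v_t$ is the ordinary least squares residual. Instead of relying on the normality assumption, any semi-parametric estimator of the unrestricted parameters is also suitable. Some examples are the symmetrically censored least squares and the winsorized mean (see Powell (1986) and Lee (1995), respectively).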
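
The two stages described above can be sketched in code. The block below is a hedged illustration, not the author's implementation: `first_stage_ols` and `tobit_loglik` are hypothetical helper names, the parameter vector is ordered as $(\pi_z, \gamma, \log\sigma_\varepsilon)$ by assumption, and the maximization step is left to whatever numerical optimizer or built-in Tobit routine is available. The simulated check only verifies that the log-likelihood is higher at the true unrestricted parameters than at a perturbed point.

```python
import math
import numpy as np

def norm_cdf(a):
    # Phi(a) through erfc, which stays accurate in the lower tail
    return 0.5 * np.vectorize(math.erfc, otypes=[float])(-a / math.sqrt(2.0))

def first_stage_ols(X, Z):
    """Stage 1: OLS of X on Z; returns Pi_hat and the residuals v_hat."""
    Pi_hat, *_ = np.linalg.lstsq(Z, X, rcond=None)
    return Pi_hat, X - Z @ Pi_hat

def tobit_loglik(params, y, Z, v_hat):
    """Stage 2 objective: log of the conditional Tobit likelihood.

    params = (pi_z [k], gamma [m], log sigma_eps); the log keeps sigma_eps positive.
    """
    k, m = Z.shape[1], v_hat.shape[1]
    pi_z, gamma, sigma = params[:k], params[k:k + m], math.exp(params[-1])
    index = Z @ pi_z + v_hat @ gamma
    d = y > 0
    ll_cens = np.log(norm_cdf(-index[~d] / sigma))            # terms with d_t = 0
    resid = (y[d] - index[d]) / sigma
    ll_unc = -0.5 * resid ** 2 - 0.5 * math.log(2.0 * math.pi) - math.log(sigma)
    return ll_cens.sum() + ll_unc.sum()

# Simulated data (hypothetical design: k = 3 instruments, m = 1 regressor).
rng = np.random.default_rng(2)
T, k = 2000, 3
Z = rng.standard_normal((T, k))
Pi = np.full((k, 1), 0.5)
V = rng.standard_normal((T, 1))
X = Z @ Pi + V
beta, alpha, sigma_eps = 1.0, 0.5, 1.0
eps = sigma_eps * rng.standard_normal(T)
y = np.maximum(0.0, X[:, 0] * beta + V[:, 0] * alpha + eps)

Pi_hat, v_hat = first_stage_ols(X, Z)
# Unrestricted parameters implied by the model: pi_z = Pi beta, gamma = alpha + beta
truth = np.concatenate([Pi[:, 0] * beta, [alpha + beta], [math.log(sigma_eps)]])
ll_true = tobit_loglik(truth, y, Z, v_hat)
ll_off = tobit_loglik(truth + 0.5, y, Z, v_hat)
print(ll_true, ll_off)
```

Working with $\log\sigma_\varepsilon$ is a design choice of this sketch; it lets an unconstrained optimizer be used in the second stage.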

The reduced form parameters from (2.7) are identified under mild assumptions, regardless of whether the instruments are weak. Moreover, the likelihood function is twice continuously differentiable. Therefore the Law of Large Numbers and the Central Limit Theorem hold under the true data generating process, and the estimator for the unrestricted parameters is consistent and asymptotically normal:

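Lemma 1. If Assumption 1 holds, $E\|Z_t'Z_t\| < +\infty$ and $E[Z_t'Z_t]$ is nonsingular, we have:

a) The TSCML estimator for the unrestricted parameters is consistent, i.e., as $T \to +\infty$,

$$\begin{bmatrix} \hat\pi_z' & \hat\theta' & \operatorname{vec}(\hat\Pi_z)' \end{bmatrix} \xrightarrow{p} \begin{bmatrix} \pi_{z0}' & \theta_0' & \operatorname{vec}(\Pi_{z0})' \end{bmatrix} \tag{2.8}$$

where $\theta_0 = \begin{bmatrix} \gamma_0' & \sigma^2_{\varepsilon 0} \end{bmatrix}'$.

b) As $T \to +\infty$,

$$\sqrt{T}\begin{bmatrix} (\hat\pi_z - \pi_{z0})' & (\hat\theta - \theta_0)' & \operatorname{vec}(\hat\Pi_z - \Pi_{z0})' \end{bmatrix}' \xrightarrow{d} N\left(0,\; G^{-1}\Omega\, G^{-1\prime}\right) \tag{2.9}$$

where

$$G = -\begin{bmatrix} \Omega_{\pi_z\pi_z} & \Omega_{\pi_z\theta} & -\gamma_0' \otimes \Omega_{\pi_z\pi_z} \\ \Omega_{\theta\pi_z} & \Omega_{\theta\theta} & -\gamma_0' \otimes \Omega_{\theta\theta} \\ 0 & 0 & I_m \otimes E[Z_t'Z_t] \end{bmatrix}, \qquad \Omega = \begin{bmatrix} \Omega_{\pi_z\pi_z} & \Omega_{\pi_z\theta} & 0 \\ \Omega_{\theta\pi_z} & \Omega_{\theta\theta} & 0 \\ 0 & 0 & \Sigma_{vv} \otimes E[Z_t'Z_t] \end{bmatrix}$$

and

$$\begin{bmatrix} \Omega_{\pi_z\pi_z} & \Omega_{\pi_z\theta} \\ \Omega_{\theta\pi_z} & \Omega_{\theta\theta} \end{bmatrix}$$

is the Fisher information matrix derived from the Tobit model.

Proof. See Appendix A.2.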

Lemma 1 shows that the presence of weak instruments affects neither the consistency of the TSCML estimator for the reduced form parameters nor the consistency of the asymptotic covariance matrix estimator.


Since our interest is to test only the structural parameter $\beta$, we focus on the restriction $\pi_z = \Pi_z\beta$. The next lemma presents the joint asymptotic distribution of $\hat\pi_z$ and $\hat\Pi_z$, which is an important result for the definition of the minimum distance objective function.

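Lemma 2. Under the assumptions of Lemma 1, we have:

a) The joint asymptotic distribution of $\hat\pi_z$ and $\hat\Pi_z$ is

$$\sqrt{T}\begin{bmatrix} \hat\pi_z - \pi_{z0} \\ \operatorname{vec}(\hat\Pi_z - \Pi_{z0}) \end{bmatrix} \xrightarrow{d} N\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix},\; \begin{bmatrix} \Omega^{-1}_{\pi_z\pi_z.\theta} + \gamma_0'\Sigma_{vv}\gamma_0\, E[Z_t'Z_t]^{-1} & \gamma_0'\Sigma_{vv} \otimes E[Z_t'Z_t]^{-1} \\ \Sigma_{vv}\gamma_0 \otimes E[Z_t'Z_t]^{-1} & \Sigma_{vv} \otimes E[Z_t'Z_t]^{-1} \end{bmatrix} \right) \tag{2.10}$$

where $\Omega_{\pi_z\pi_z.\theta} = \Omega_{\pi_z\pi_z} - \Omega_{\pi_z\theta}\Omega_{\theta\theta}^{-1}\Omega_{\theta\pi_z}$.

b) Let $\hat\Omega_{\pi_z\pi_z.\theta}$ be an estimator for $\Omega_{\pi_z\pi_z.\theta}$ as defined in the appendix. One may show that, as $T \to +\infty$,

$$\hat\Omega_{\pi_z\pi_z.\theta} \xrightarrow{p} \Omega_{\pi_z\pi_z.\theta}$$

Proof. See Appendix A.3.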

Any statistical software with least squares and Tobit functions can provide estimates for the unrestricted parameters, $\hat\Sigma_{vv}$ and $\hat\Omega^{-1}_{\pi_z\pi_z.\theta}$. Since $\hat\gamma$ is a consistent estimator for $\gamma_0$, getting an estimate for the asymptotic variance in (2.10) is straightforward by the “plug-in” method.

3 Weak Instruments Robust Tests for the Endogenous Tobit Model

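In this section I present the weak instruments robust tests for the endogenous Tobit model. They are modified versions of existing tests and are denoted by the subscript M. Pre-multiplying (2.10) by $\begin{bmatrix} I_k & -\beta' \otimes I_k \end{bmatrix}$ results in

$$\sqrt{T}\left[(\hat\pi_z - \hat\Pi_z\beta) - (\pi_{z0} - \Pi_{z0}\beta)\right] \xrightarrow{d} N(0, \Psi_\beta) \tag{3.1}$$

where:

$$\Psi_\beta = \Omega^{-1}_{\pi_z\pi_z.\theta} + (\gamma_0 - \beta)'\Sigma_{vv}(\gamma_0 - \beta)\, E[Z_t'Z_t]^{-1}$$

Rewrite $\pi_{z0}$ as follows:

$$\pi_{z0} = \Pi_{z0}\beta_0 + \Pi^{\perp}_{z0}\zeta$$

where $\Pi^{\perp}_{z0}$ is a $k \times (k - m)$ matrix, orthogonal to $\Pi_{z0}$, and $\zeta$ is a $(k - m) \times 1$ vector. A simple weak instruments robust test for the structural parameter $\beta$ is derived from the quadratic form of (3.1):

$$S(\beta) = T\left[(\hat\pi_z - \hat\Pi_z\beta) - (\Pi_{z0}(\beta_0 - \beta) + \Pi^{\perp}_{z0}\zeta)\right]' \Psi_\beta^{-1} \left[(\hat\pi_z - \hat\Pi_z\beta) - (\Pi_{z0}(\beta_0 - \beta) + \Pi^{\perp}_{z0}\zeta)\right] \tag{3.2}$$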


Under the null hypothesis $H_0^S: \beta = \beta_0,\ \zeta = 0$, $S(\beta)$ converges in distribution to a $\chi^2$-distribution with $k$ degrees of freedom, the number of instruments, as stated in the following theorem:

Theorem 3.1. Define

$$S_M(\beta_0) = T\,(\hat\pi_z - \hat\Pi_z\beta_0)'\, \hat\Psi_{\beta_0}^{-1}\, (\hat\pi_z - \hat\Pi_z\beta_0) \tag{3.3}$$

where $\hat\Psi_{\beta_0} = \hat\Omega^{-1}_{\pi_z\pi_z.\theta} + (\hat\gamma - \beta_0)'\hat\Sigma_{vv}(\hat\gamma - \beta_0)\left(\frac{Z'Z}{T}\right)^{-1}$. Under $H_0^S: \beta = \beta_0,\ \zeta = 0$ and the hypotheses of Lemma 2, we have:

$$S_M(\beta_0) \xrightarrow{d} \chi^2(k)$$

independently of the quality of the instruments.

Proof. It follows directly from Lemma 2 and equation (3.2).
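
Once the unrestricted estimates are available, the statistic in (3.3) is a short plug-in computation. The following is a minimal sketch under stated assumptions: the function names and all numerical inputs are hypothetical, and the estimates $\hat\pi_z$, $\hat\Pi_z$, $\hat\gamma$, $\hat\Sigma_{vv}$ and $\hat\Omega^{-1}_{\pi_z\pi_z.\theta}$ are taken as given from the least squares and Tobit steps.

```python
import numpy as np

def psi_hat(beta0, gamma_hat, Sigma_vv_hat, Omega_inv_hat, ZtZ_over_T):
    """Plug-in Psi_hat(beta0) = Omega_hat^{-1} + (g - b0)' Sigma_vv (g - b0) (Z'Z/T)^{-1}."""
    diff = np.atleast_1d(gamma_hat) - np.atleast_1d(beta0)
    scale = float(diff @ np.atleast_2d(Sigma_vv_hat) @ diff)
    return Omega_inv_hat + scale * np.linalg.inv(ZtZ_over_T)

def s_m(beta0, pi_hat, Pi_hat, gamma_hat, Sigma_vv_hat, Omega_inv_hat, ZtZ_over_T, T):
    """S_M(beta0) of (3.3); to be compared with a chi-square(k) critical value."""
    g = pi_hat - Pi_hat @ np.atleast_1d(beta0)       # pi_hat - Pi_hat beta0
    Psi = psi_hat(beta0, gamma_hat, Sigma_vv_hat, Omega_inv_hat, ZtZ_over_T)
    return float(T * g @ np.linalg.solve(Psi, g))

# Hypothetical estimates with k = 3 instruments and m = 1 endogenous regressor.
pi_hat = np.array([0.52, 0.47, 0.55])
Pi_hat = np.array([[0.5], [0.5], [0.5]])
stat = s_m(1.0, pi_hat, Pi_hat, gamma_hat=[1.4], Sigma_vv_hat=[[1.0]],
           Omega_inv_hat=np.eye(3), ZtZ_over_T=np.eye(3), T=300)
CHI2_3_95 = 7.815   # 95% quantile of the chi-square(3) distribution
print(stat, stat > CHI2_3_95)
```

With these illustrative inputs the statistic is well below the critical value, so $\beta_0 = 1$ would not be rejected at the 5% level.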

The $S_M(\beta_0)$-test is the minimum distance estimator objective function for the structural parameter evaluated at the hypothesized value. As explicitly shown in (3.2), the $S_M$-test tests simultaneously two hypotheses: one for the parameter value and the other for location. The second hypothesis is about the overidentification restriction.

The $S_M$-test may always reject the null hypothesis if $\zeta \neq 0$ and, consequently, the confidence regions constructed by inverting this statistic may be empty. On the other hand, if the instruments are weak, the test may not reject the null at any point in the parameter space, resulting in unbounded confidence regions.

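In the context of the linear limited dependent variable model, Anderson and Rubin (1949) proposed the following F-test:

$$AR(\beta_0) = \frac{1}{k}\,\frac{(y - X\beta_0)'P_z(y - X\beta_0)}{\hat\sigma_\varepsilon^2(\beta_0)} = \frac{T}{k}\left\{(\hat\pi_z - \hat\Pi_z\beta_0)'\,\hat V_{\beta_0}^{-1}\,(\hat\pi_z - \hat\Pi_z\beta_0)\right\} \tag{3.4}$$

where:

$$\hat\pi_z = (Z'Z)^{-1}Z'y, \qquad \hat\Pi_z = (Z'Z)^{-1}Z'X, \qquad \hat V_{\beta_0} = \hat\sigma_\varepsilon^2(\beta_0)\left(\frac{Z'Z}{T}\right)^{-1}, \qquad \hat\sigma_\varepsilon^2(\beta_0) = \frac{(y - X\beta_0)'M_z(y - X\beta_0)}{T - k} \tag{3.5}$$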
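
The equality of the two expressions in (3.4) can be verified numerically. The sketch below uses simulated data with hypothetical parameter values ($k = 3$, one endogenous regressor) and computes the AR statistic both from the projection form and from the minimum distance form.

```python
import numpy as np

rng = np.random.default_rng(3)
T, k = 200, 3
Z = rng.standard_normal((T, k))
X = Z @ np.array([0.5, 0.5, 0.5]) + rng.standard_normal(T)
y = 1.0 * X + rng.standard_normal(T)
beta0 = 1.0

u0 = y - X * beta0
ZtZ = Z.T @ Z
Pz_u0 = Z @ np.linalg.solve(ZtZ, Z.T @ u0)        # P_z (y - X beta0)
sigma2 = (u0 @ u0 - u0 @ Pz_u0) / (T - k)         # (y - X b0)' M_z (y - X b0) / (T - k)

ar_projection = (u0 @ Pz_u0) / (k * sigma2)       # first expression in (3.4)

pi_hat = np.linalg.solve(ZtZ, Z.T @ y)
Pi_hat = np.linalg.solve(ZtZ, Z.T @ X)
g = pi_hat - Pi_hat * beta0                       # pi_hat - Pi_hat beta0
V_hat = sigma2 * np.linalg.inv(ZtZ / T)
ar_distance = (T / k) * g @ np.linalg.solve(V_hat, g)   # second expression in (3.4)
print(ar_projection, ar_distance)
```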

Therefore the SM-test is the extension of the AR-test to the endogenous Tobit model. Both

tests project the moments onto the space spanned by the instruments (or a function of them), which does not depend on the nuisance parameter.

One disadvantage of the $S_M$-test is that the degrees of freedom equal the number of excluded instruments. Hence the power against the alternative hypothesis decreases as the number of instruments increases. This weakness motivated the development of robust tests in which the degrees of freedom and the number of structural parameters are the same.

Kleibergen’s solution comes from the asymptotic independence between the moment condition and its expected Jacobian (see Kleibergen (2004) and Kleibergen (2005)). I propose to derive the robust test based on the independence between ˆπz− ˆΠzβ, the mapping between unconstrained and

constrained parameters, and the Hessian of the minimum distance function (3.3). For now, assume that ζ = 0.

I start from the asymptotic joint distribution of ˆπz− ˆΠzβ and ˆΠz.

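Theorem 3.2. Given that $\zeta = 0$, under the null hypothesis $H_0^K: \beta = \beta_0$ and the assumptions of Lemma 1, the asymptotic joint distribution of $\sqrt{T}(\hat\pi_z - \hat\Pi_z\beta_0)$ and $\sqrt{T}\operatorname{vec}(\hat\Pi_z)$ is:

$$\sqrt{T}\begin{bmatrix} \hat\pi_z - \hat\Pi_z\beta_0 \\ \operatorname{vec}(\hat\Pi_z - \Pi_{z0}) \end{bmatrix} \xrightarrow{d} N\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix},\; \begin{bmatrix} \Psi_{\beta_0} & (\gamma_0 - \beta_0)'\Sigma_{vv} \otimes (E[Z_t'Z_t])^{-1} \\ \Sigma_{vv}(\gamma_0 - \beta_0) \otimes (E[Z_t'Z_t])^{-1} & \Sigma_{vv} \otimes (E[Z_t'Z_t])^{-1} \end{bmatrix} \right) \tag{3.6}$$

Proof. Immediate from Lemma 2.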

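The next corollary shows that the asymptotic independence between $\sqrt{T}(\hat\pi_z - \hat\Pi_z\beta_0)$ and $\sqrt{T}(\hat\Pi_z - \Pi_{z0})$ is obtained by pre-multiplying (3.6) by the lower block triangular matrix:

$$\begin{bmatrix} I_k & 0 \\ -\Sigma_{vv}(\gamma_0 - \beta_0) \otimes (\Psi_{\beta_0} E[Z_t'Z_t])^{-1} & I_m \otimes I_k \end{bmatrix}$$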

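Corollary 1. Given $\zeta = 0$, under the null hypothesis $H_0^K: \beta = \beta_0$ and the assumptions of Lemma 1, we have:

$$\sqrt{T}\begin{bmatrix} \hat\pi_z - \hat\Pi_z\beta_0 \\ \operatorname{vec}(\bar\Pi_{\beta_0} - \Pi_{z0}) \end{bmatrix} \xrightarrow{d} N\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix},\; \begin{bmatrix} \Psi_{\beta_0} & 0 \\ 0 & \Xi_{\beta_0} \end{bmatrix} \right) \tag{3.7}$$

where:

$$\bar\Pi_{\beta_0} = \hat\Pi_z - (E[Z_t'Z_t])^{-1}\Psi_{\beta_0}^{-1}(\hat\pi_z - \hat\Pi_z\beta_0)(\gamma_0 - \beta_0)'\Sigma_{vv}$$

$$\Xi_{\beta_0} = \Sigma_{vv} \otimes E[Z_t'Z_t]^{-1} - \Sigma_{vv}(\gamma_0 - \beta_0)(\gamma_0 - \beta_0)'\Sigma_{vv} \otimes \left(E[Z_t'Z_t]\,\Psi_{\beta_0}\,E[Z_t'Z_t]\right)^{-1}$$

Proof. Immediate from Theorem 3.2.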
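
The block-diagonalization behind Corollary 1 can be checked with Kronecker-product arithmetic. The sketch below uses hypothetical population values for $E[Z_t'Z_t]$, $\Omega^{-1}_{\pi_z\pi_z.\theta}$, $\Sigma_{vv}$, $\gamma_0$ and $\beta_0$ (with $k = 3$, $m = 1$). It builds the covariance matrix in (3.6), applies the lower block triangular matrix from the text, and compares the result with $\Psi_{\beta_0}$ and $\Xi_{\beta_0}$.

```python
import numpy as np

k, m = 3, 1
# Hypothetical population quantities, for illustration only.
Q = np.array([[1.0, 0.2, 0.0],
              [0.2, 1.0, 0.1],
              [0.0, 0.1, 1.0]])             # E[Z_t' Z_t]
Omega_inv = 0.8 * np.eye(k)                 # Omega^{-1}_{pi pi . theta}
Sigma_vv = np.array([[1.5]])
gamma0, beta0 = np.array([1.4]), np.array([1.0])

d = (gamma0 - beta0).reshape(m, 1)
Q_inv = np.linalg.inv(Q)
Psi = Omega_inv + (d.T @ Sigma_vv @ d).item() * Q_inv

# Covariance matrix in (3.6)
C21 = np.kron(Sigma_vv @ d, Q_inv)          # lower-left block, (km x k)
C22 = np.kron(Sigma_vv, Q_inv)              # lower-right block, (km x km)
cov = np.block([[Psi, C21.T], [C21, C22]])

# Lower block triangular matrix from the text
A = -np.kron(Sigma_vv @ d, np.linalg.inv(Psi @ Q))
L = np.block([[np.eye(k), np.zeros((k, k * m))], [A, np.eye(k * m)]])

rotated = L @ cov @ L.T
Xi = C22 - np.kron(Sigma_vv @ d @ d.T @ Sigma_vv, np.linalg.inv(Q @ Psi @ Q))
print(np.round(rotated, 4))
```

The off-diagonal blocks of `rotated` vanish, and its lower-right block coincides with `Xi`, as stated in (3.7).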

The statistic $\bar\Pi_{\beta_0}$, whose distribution depends on $\Pi_{z0}$, is a random variable which is asymptotically independent of $\hat\pi_z - \hat\Pi_z\beta_0$ under the null hypothesis $H_0^K$. Therefore, the distribution of $\hat\pi_z - \hat\Pi_z\beta_0$ conditional on a given value of $\bar\Pi_{\beta_0}$ does not depend on $\Pi_{z0}$. I use this property to derive the modified K-test in the next theorem.

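Theorem 3.3. Define the modified version of the K-test as:

$$K_M(\beta_0) = T\,(\hat\pi_z - \hat\Pi_z\beta_0)'\,\hat\Psi_{\beta_0}^{-1}\hat{\bar\Pi}_{\beta_0}\left(\hat{\bar\Pi}_{\beta_0}'\hat\Psi_{\beta_0}^{-1}\hat{\bar\Pi}_{\beta_0}\right)^{-1}\hat{\bar\Pi}_{\beta_0}'\hat\Psi_{\beta_0}^{-1}\,(\hat\pi_z - \hat\Pi_z\beta_0) = T\,(\hat\pi_z - \hat\Pi_z\beta_0)'\,\hat\Psi_{\beta_0}^{-1/2\prime}\,\hat{\bar P}_{\beta_0}\,\hat\Psi_{\beta_0}^{-1/2}\,(\hat\pi_z - \hat\Pi_z\beta_0) \tag{3.8}$$

where:

$$\hat{\bar P}_{\beta_0} = \hat\Psi_{\beta_0}^{-1/2}\,\hat{\bar\Pi}_{\beta_0}\left(\hat{\bar\Pi}_{\beta_0}'\hat\Psi_{\beta_0}^{-1}\hat{\bar\Pi}_{\beta_0}\right)^{-1}\hat{\bar\Pi}_{\beta_0}'\,\hat\Psi_{\beta_0}^{-1/2\prime} \tag{3.9a}$$

$$\hat{\bar\Pi}_{\beta_0} = \hat\Pi_z - \left(\frac{Z'Z}{T}\right)^{-1}\hat\Psi_{\beta_0}^{-1}(\hat\pi_z - \hat\Pi_z\beta_0)(\hat\gamma - \beta_0)'\hat\Sigma_{vv} \tag{3.9b}$$

$$\hat\Psi_{\beta_0} = \hat\Omega^{-1}_{\pi_z\pi_z.\theta} + (\hat\gamma - \beta_0)'\hat\Sigma_{vv}(\hat\gamma - \beta_0)\left(\frac{Z'Z}{T}\right)^{-1} \tag{3.9c}$$

Given $\zeta = 0$, under the null hypothesis $H_0^K: \beta = \beta_0$ and the assumptions of Lemma 1, as $T \to +\infty$:

$$K_M(\beta_0) \xrightarrow{d} \chi^2(m) \tag{3.10}$$

regardless of whether the instruments are strong, weak or irrelevant as in Definition 1.

Proof. See Appendix A.3.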

The negative expected value of the Hessian of the minimum distance estimator is $\Psi_{\beta_0}^{-1}\Pi_{z0}$. Equation (3.8) shows that the $K_M$-test is a quadratic form of the restriction mapping $\hat\pi_z - \hat\Pi_z\beta$ projected onto the space spanned by an estimator of the Hessian. As explained by Moreira (2003), if the Hessian is estimated independently from the restriction function, then the conditional null distribution of the test is free from the nuisance parameter.

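The K-test for the linear limited dependent variable model is (see Kleibergen (2002)):

$$K(\beta_0) = \frac{(Y - X\beta_0)'Z\tilde\Pi_{\beta_0}\left(\tilde\Pi_{\beta_0}'Z'Z\tilde\Pi_{\beta_0}\right)^{-1}\tilde\Pi_{\beta_0}'Z'(Y - X\beta_0)}{\hat\sigma_\varepsilon^2(\beta_0)}$$

where:

$$\tilde\Pi_{\beta_0} = (Z'Z)^{-1}Z'\left[X - (Y - X\beta_0)\,\frac{Y'M_zX}{(Y - X\beta_0)'M_z(Y - X\beta_0)}\right]$$

Using the relations in (3.5) and defining $\hat\Sigma_{uv} = \frac{Y'M_zX}{T - k}$, $\tilde\Pi_{\beta_0}$ becomes:

$$\hat\Pi_z - \left(\frac{Z'Z}{T}\right)^{-1}\hat V_{\beta_0}^{-1}(\hat\pi_z - \hat\Pi_z\beta_0)\,\hat\Sigma_{uv}$$

and the K-test is expressed as:

$$K(\beta_0) = T\,(\hat\pi_z - \hat\Pi_z\beta_0)'\,\hat V_{\beta_0}^{-1}\tilde\Pi_{\beta_0}\left(\tilde\Pi_{\beta_0}'\hat V_{\beta_0}^{-1}\tilde\Pi_{\beta_0}\right)^{-1}\tilde\Pi_{\beta_0}'\hat V_{\beta_0}^{-1}\,(\hat\pi_z - \hat\Pi_z\beta_0) = T\,(\hat\pi_z - \hat\Pi_z\beta_0)'\,\hat V_{\beta_0}^{-1/2\prime}\,\tilde P_{\beta_0}\,\hat V_{\beta_0}^{-1/2}\,(\hat\pi_z - \hat\Pi_z\beta_0) \tag{3.11}$$

where:

$$\tilde P_{\beta_0} = \hat V_{\beta_0}^{-1/2}\,\tilde\Pi_{\beta_0}\left(\tilde\Pi_{\beta_0}'\hat V_{\beta_0}^{-1}\tilde\Pi_{\beta_0}\right)^{-1}\tilde\Pi_{\beta_0}'\,\hat V_{\beta_0}^{-1/2\prime}$$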

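Hence, similarly to the AR-test, the K-test also has a minimum distance interpretation. The first order condition of the continuous updating estimator derived from the $S_M$-test is:

$$\frac{\partial S_M(\beta)}{\partial\beta} = -2\,(\hat\pi_z - \hat\Pi_z\beta)'\hat\Psi_\beta^{-1}\hat\Pi_z - \left[(\hat\pi_z - \hat\Pi_z\beta)'\hat\Psi_\beta^{-1} \otimes (\hat\pi_z - \hat\Pi_z\beta)'\hat\Psi_\beta^{-1}\right]\frac{\partial\operatorname{vec}(\hat\Psi_\beta)}{\partial\beta} = -2\,(\hat\pi_z - \hat\Pi_z\beta)'\hat\Psi_\beta^{-1}\hat{\bar\Pi}_\beta \tag{3.12}$$

Therefore, as in the original K-test, the $K_M$ is a score statistic with the $S_M$-test as its objective function. From (3.12), the minimum distance estimator

$$\hat\beta_{MD} = \left(\hat{\bar\Pi}_{\hat\beta_{MD}}'\hat\Psi_{\hat\beta_{MD}}^{-1}\hat\Pi_z\right)^{-1}\hat{\bar\Pi}_{\hat\beta_{MD}}'\hat\Psi_{\hat\beta_{MD}}^{-1}\hat\pi_z \tag{3.13}$$

is never rejected by the $K_M$-test, implying that confidence regions derived from inverting the $K_M$-test are not empty.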

Nevertheless, the $K_M$-test loses power at inflection, local minimum and local maximum points, since they also satisfy (3.12). This failure is related to the underlying hypothesis that the overidentification restriction is valid, i.e., $\zeta = 0$. Hence, a complementary test for overidentification is necessary, given that the value of the structural parameter is correct ($\beta = \beta_0$). The following test is the adaptation of the JK-test suggested by Kleibergen (2004), which is orthogonal to the $K_M$-test:6

$$JK_M(\beta_0) = T\,(\hat\pi_z - \hat\Pi_z\beta_0)'\,\hat\Psi_{\beta_0}^{-1/2\prime}\,\hat{\bar M}_{\beta_0}\,\hat\Psi_{\beta_0}^{-1/2}\,(\hat\pi_z - \hat\Pi_z\beta_0) \tag{3.14}$$

$$JK_M(\beta_0) \xrightarrow{d} \chi^2(k - m) \tag{3.15}$$

where $\hat{\bar M}_{\beta_0} = I_k - \hat{\bar P}_{\beta_0}$. Clearly, the $JK_M$- and the $K_M$-tests are independent. From (3.3) and (3.8), the $S_M$-test can be decomposed into two orthogonal statistics, i.e.,

$$S_M(\beta_0) = K_M(\beta_0) + JK_M(\beta_0)$$
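
The decomposition can be made concrete with a small computation. The sketch below is an illustration under stated assumptions: `km_jkm` is a hypothetical helper, the symmetric inverse square root is used for $\hat\Psi_{\beta_0}^{-1/2}$, and the inputs $g = \hat\pi_z - \hat\Pi_z\beta_0$, $\hat\Psi_{\beta_0}$ and $\hat{\bar\Pi}_{\beta_0}$ are made-up numbers with $k = 3$ and $m = 1$.

```python
import numpy as np

def km_jkm(g, Psi_hat, Pi_bar_hat, T):
    """Return (K_M, JK_M, S_M) for g = pi_hat - Pi_hat beta0, following (3.3), (3.8), (3.14)."""
    # Symmetric inverse square root of Psi_hat
    w, Q = np.linalg.eigh(Psi_hat)
    Psi_inv_half = Q @ np.diag(w ** -0.5) @ Q.T
    h = Psi_inv_half @ g                          # standardized restriction
    B = Psi_inv_half @ Pi_bar_hat                 # standardized Pi_bar
    P = B @ np.linalg.solve(B.T @ B, B.T)         # projection matrix of (3.9a)
    k_m = T * h @ P @ h
    jk_m = T * h @ (np.eye(len(g)) - P) @ h
    s_m = T * g @ np.linalg.solve(Psi_hat, g)
    return float(k_m), float(jk_m), float(s_m)

# Hypothetical inputs (k = 3, m = 1), for illustration only.
g = np.array([0.02, -0.03, 0.05])
Psi_hat = np.array([[1.2, 0.1, 0.0],
                    [0.1, 1.0, 0.1],
                    [0.0, 0.1, 1.1]])
Pi_bar_hat = np.array([[0.5], [0.4], [0.6]])
k_m, jk_m, s_m_value = km_jkm(g, Psi_hat, Pi_bar_hat, T=300)
print(k_m, jk_m, s_m_value)
```

Because the projection matrix and its complement sum to the identity, the first two outputs add up to the third for any inputs.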

At points where the $K_M$-test suffers a spurious decline of power, the $JK_M$-test assumes the value of the $S_M$-test, which always has discriminatory power in those regions of the parameter space. Combining the $K_M$- and the $JK_M$-tests defines a new statistic for the structural parameter. Let $\tau_{K_M}$ and $\tau_{JK_M}$ be the levels of significance of $K_M$ and $JK_M$, respectively. The combination test $K_MJK_M$ has a significance level of approximately $\tau = \tau_{K_M} + \tau_{JK_M}$.7 Since our principal interest is to test the value of the structural parameter $\beta$, a choice for $\tau_{K_M}$ is 0.04 and for $\tau_{JK_M}$ is 0.01.

6 The original $JK(\beta_0)$ is $T\,(\hat\pi_z - \hat\Pi_z\beta_0)'\,\hat V_{\beta_0}^{-1/2\prime}\,\tilde M_{\beta_0}\,\hat V_{\beta_0}^{-1/2}\,(\hat\pi_z - \hat\Pi_z\beta_0)$, where $\tilde M_{\beta_0} = I_k - \tilde P_{\beta_0}$.

In the context of linear simultaneous equation models with only one endogenous variable, Moreira (2003) shows that the conditional likelihood ratio test is written as:

$$CLR(\beta_0) = \frac{1}{2}\left[AR^*(\beta_0) - r(\beta_0) + \sqrt{(AR^*(\beta_0) + r(\beta_0))^2 - 4\,JK(\beta_0)\,r(\beta_0)}\right]$$

where $AR^*(\beta_0) = k \times AR(\beta_0)$ and $r(\beta_0)$ is a statistic that tests $\Pi_z = 0$ under the assumption that $\beta = \beta_0$. The $AR^*(\beta_0)$ statistic can be decomposed as $AR^*(\beta_0) = K(\beta_0) + JK(\beta_0)$.8 For the endogenous Tobit model, the modified conditional likelihood ratio test is:

$$CLR_M(\beta_0) = \frac{1}{2}\left[S_M(\beta_0) - r_M(\beta_0) + \sqrt{(S_M(\beta_0) + r_M(\beta_0))^2 - 4\,JK_M(\beta_0)\,r_M(\beta_0)}\right] \tag{3.16}$$

where:

$$r_M(\beta_0) = T\left\{\hat{\bar\Pi}_{\beta_0}'\,\hat\Xi_{\beta_0}^{-1}\,\hat{\bar\Pi}_{\beta_0}\right\}$$

$$\hat\Xi_{\beta_0} = \hat\Sigma_{vv} \otimes \left(\frac{Z'Z}{T}\right)^{-1} - \hat\Sigma_{vv}(\hat\gamma - \beta_0)(\hat\gamma - \beta_0)'\hat\Sigma_{vv} \otimes \left(\frac{Z'Z}{T}\,\hat\Psi_{\beta_0}\,\frac{Z'Z}{T}\right)^{-1}$$

The asymptotic distribution of the $CLR_M$ test is not pivotal since it depends on the value of $r_M(\beta_0)$. However, it is possible to simulate the critical values for the test by generating independent values of $\chi^2(m)$ and $\chi^2(k - m)$, as explained by Moreira (2003).9 The $CLR_M$-test is a function of the $K_M$ and the $JK_M$ tests. Therefore there is no spurious decline of power. The limiting behavior of the $CLR_M$-test as a function of $r_M$ is:

$$CLR_M \to S_M \ \text{as}\ r_M \to 0 \qquad \text{and} \qquad CLR_M \to K_M \ \text{as}\ r_M \to +\infty$$
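
The simulation of conditional critical values can be sketched as follows. This is an illustration under stated assumptions, not the author's code: the function names, the number of draws and the seed are hypothetical. Conditional on $r_M$, independent $\chi^2(m)$ and $\chi^2(k - m)$ draws stand in for $K_M$ and $JK_M$, and the critical value is the upper quantile of the resulting $CLR_M$ draws.

```python
import numpy as np

def clr_m(s_m, jk_m, r_m):
    """CLR_M of (3.16) computed from S_M, JK_M and r_M."""
    return 0.5 * (s_m - r_m + np.sqrt((s_m + r_m) ** 2 - 4.0 * jk_m * r_m))

def clr_critical_value(r_m, k, m, level=0.05, draws=100_000, seed=0):
    """Simulated critical value of CLR_M conditional on the value of r_M."""
    rng = np.random.default_rng(seed)
    k_draw = rng.chisquare(m, size=draws)         # plays the role of K_M
    jk_draw = rng.chisquare(k - m, size=draws)    # plays the role of JK_M
    stat = clr_m(k_draw + jk_draw, jk_draw, r_m)  # uses S_M = K_M + JK_M
    return float(np.quantile(stat, 1.0 - level))

cv_small_r = clr_critical_value(0.0, k=3, m=1)    # close to the chi-square(3) quantile
cv_large_r = clr_critical_value(1e6, k=3, m=1)    # close to the chi-square(1) quantile
print(cv_small_r, cv_large_r)
```

The two printed values illustrate the limiting behavior stated above: for $r_M = 0$ the critical value is that of $S_M$, and for very large $r_M$ it approaches that of $K_M$.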

7 Let $CR_{KJ}$, $CR_K$ and $CR_J$ be the critical regions for the $K_MJK_M$, $K_M$ and $JK_M$ tests. Then:

$$\begin{aligned} \Pr(K_MJK_M \in CR_{KJ}) &= \Pr(\{K_M \in CR_K\} \cap \{JK_M \in CR_J\}) + \Pr(\{K_M \in CR_K\} \cap \{JK_M \notin CR_J\}) \\ &\quad + \Pr(\{K_M \notin CR_K\} \cap \{JK_M \in CR_J\}) \\ &= \tau_{K_M}\tau_{JK_M} + \tau_{K_M}(1 - \tau_{JK_M}) + (1 - \tau_{K_M})\tau_{JK_M} \\ &= \tau_{K_M} + \tau_{JK_M} - \tau_{K_M}\tau_{JK_M} \approx \tau. \end{aligned}$$

8 $K(\beta_0)$ is defined in (3.11) and $JK(\beta_0)$ is defined in footnote 6.

9 See Kleibergen (2005) for the version of the conditional robust likelihood ratio test with two or more endogenous variables.

4 Power Comparison

I investigate the rejection probability of the robust tests described in the previous section by simulating their power curves. I also investigate the power of the classical likelihood ratio and Wald tests, which are defined as:

$$LR(\beta_0) = 2\left[\ell(\hat\beta, \hat\eta) - \ell(\beta_0, \tilde\eta_0)\right], \qquad W(\beta_0) = (\hat\beta - \beta_0)'\,\hat V(\hat\beta)^{-1}(\hat\beta - \beta_0)$$

where $\ell(\hat\beta, \hat\eta)$ and $\ell(\beta_0, \tilde\eta_0)$ are the unconstrained and constrained log-likelihood functions and $\hat V(\hat\beta)$ is the variance of the unrestricted estimator of $\beta$. The simulations depart from the following linear latent model:

$$Y_t^* = X_t\beta + U_t, \qquad X_t = Z_t\Pi_z + V_t, \qquad (U_t, V_t) \sim N\left(0,\; \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}\right)$$

I consider one endogenous variable and 3 instruments, which are drawn from independent normal distributions with zero mean and unit variance. The instruments are the same for all simulations. The values of the concentration parameter $\lambda_z = \Pi_z'Z'Z\Pi_z/(k\sigma_v^2)$ are 20, 10, 3, 1 and 0.01, in order to mimic very strong, strong, medium, weak and inept instruments, respectively. The correlation coefficient $\rho$ assumes values of 0, 0.5 and 0.9 and the number of observations is 300. Table 1 summarizes the simulations.

Table 1: Simulation design

k     ρ                 λ_z (a)
3     0.0, 0.5, 0.9     20, 10, 3, 1, 0.01

(a) $\lambda_z = \Pi_z'Z'Z\Pi_z/(k\sigma_v^2)$.

I generate 2000 endogenous Tobit samples from the above latent model. I test the hypothesis $H_0: \beta = 0$ for each simulation and compute the proportion of rejected tests in order to build the power curves using a 5% significance level. This section reports the results in which $\rho = 0.9$, leaving the remaining ones to appendix B.
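
One replication of this design can be sketched as follows. The block is a hedged illustration: the paper does not state how $\Pi_z$ is chosen for a given $\lambda_z$, so the equal-entries direction used in `scale_first_stage` is an assumption, as are the function names and the seed.

```python
import numpy as np

def scale_first_stage(Z, lam, sigma_v2=1.0):
    """Pi_z with equal entries such that Pi' Z'Z Pi / (k sigma_v^2) equals lam."""
    k = Z.shape[1]
    direction = np.ones(k)                        # hypothetical choice of direction
    base = direction @ (Z.T @ Z) @ direction / (k * sigma_v2)
    return direction * np.sqrt(lam / base)

def simulate_endogenous_tobit(Z, Pi, beta, rho, rng):
    """One sample from the latent model with corr(U, V) = rho and unit variances."""
    T = Z.shape[0]
    V = rng.standard_normal(T)
    U = rho * V + np.sqrt(1.0 - rho ** 2) * rng.standard_normal(T)
    X = Z @ Pi + V
    Y = np.maximum(0.0, X * beta + U)             # censoring at zero
    return Y, X

rng = np.random.default_rng(4)
T, k = 300, 3
Z = rng.standard_normal((T, k))                   # instruments held fixed across replications
Pi = scale_first_stage(Z, lam=1.0)                # the "weak" design in the text
Y, X = simulate_endogenous_tobit(Z, Pi, beta=0.0, rho=0.9, rng=rng)
lam_check = Pi @ (Z.T @ Z) @ Pi / k
print(lam_check, (Y == 0).mean())
```

Repeating the last simulation call 2000 times, and applying each test to every sample, would give the rejection proportions that make up a power curve.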

When the instruments are very strong ($\lambda_z$ equals 20), the power curves of the $K_M$-, $K_MJK_M$- and $CLR_M$-tests have the same shape as those of the classical tests, as shown in the top-left panel of figure 2. Therefore, the efficiency loss from using these weak instruments robust tests is minimal. It is also possible to detect the gain of power of the $K_M$-, $K_MJK_M$- and $CLR_M$-tests over the $S_M$-test due to the model's overidentification.

As the structural parameters move towards the lack of identification, the classical tests start to perform poorly. When the instruments are weak, both the Wald and the LR tests overreject the null hypothesis at the true value of $\beta$ (the rejection probabilities of LR and Wald are above 7% and 10%, respectively). In the case of inept instruments, the rejection probability rises above 10% for the LR test and to nearly 30% for the Wald test. Since those tests are based on the unrestricted estimate of $\beta$, it is clear that the maximum likelihood estimator is biased and that this bias affects the two classical tests.

The weak instruments robust tests perform well even when the instruments are inept, attaining the expected rejection proportion of 5% under the null hypothesis. As the identification condition fails, their power curves approach a horizontal line, indicating that confidence intervals derived by inverting those tests widen according to the weakness of the instruments.

Although the CLRM test seems to dominate the remaining robust tests when the instruments

Fig. 2: Power curves for Robust, Wald and LR test. [Five panels, for λz = 20, 10, 3, 1 and 0.01, all with ρ = 0.9; horizontal axis: β minus β0; vertical axis: rejection probability; one curve per robust test plus the Wald (W) and LR tests.]

5 Empirical Application: Labor Supply of Married Females

Blundell and Walker (1986) describe a model for married female labor supply as:

$$\begin{cases} Y_t = \max\{0,\; X_t\beta + W_t\gamma + U_t\} \\ X_t = Z_t\Pi_z + W_t\Pi_w + V_t \\ D_t = 1(Y_t > 0) \end{cases}$$

where $Y_t$ is weekly hours in paid work and $X_t$ is other household income, which includes the husband's income, unearned income and savings. Besides a constant term, $W_t$ includes demographic variables: female age and its square, education and its square, child dummy variables and a race dummy variable. The instruments $Z_t$ include the regional unemployment rate, husband occupation and housing tenure dummies. The term $D_t$ is a labor force participation indicator. More details about the variables are in appendix B.

I use the data set from Lee (1995), obtained from the 1987 cross-section of the Michigan Panel Study of Income Dynamics. The author selected married couples with nonnegative total family income and with a wife of working age (18-64) who is not self-employed. Of the 3382 married females in the sample, 895 were not working (approximately 26.5%). Table 2 reports estimates for the structural parameter obtained from different estimation procedures and the first-stage F-test.

Table 2: Model estimates for the structural parameter (a, b)

Method                       Estimate    Standard deviation
Maximum likelihood (mle)     0.1878      0.0612
TSCML                        0.1331      0.0592
CGLS                         0.1328      0.0546

(a) First-stage F-test = 42.23.
(b) Exogeneity t-test = −5.44.

The F-test is a measure of the strength of the instruments. Since its value is above 20, it suggests that the instruments are not weak. This explains why the magnitude of the estimates is almost the same. The t-test proposed by Smith and Blundell (1986) rejects the hypothesis that other income is an exogenous variable.

I use robust tests to construct 95% confidence intervals for the structural parameter β. The points of the parameter space which do not reject the hypothesis H0: β = β0 at the 5% level belong to the confidence interval. The intersection between the 1 − p-value curves and the 95% horizontal line delimits the confidence intervals. I also report the confidence intervals obtained by the mle, TSCML and CGLS methods.
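The construction of a confidence set by test inversion can be sketched as follows. The helper `invert_test` and the grid are hypothetical; the statistic used for illustration is a simple Wald-type statistic built from the TSCML row of Table 2, under the assumption that the reported standard deviation can be used as a standard error. The robust statistics of the paper would be plugged in at the same place.

```python
def invert_test(statistic, grid, critical_value):
    """Return the grid points beta0 at which H0: beta = beta0 is not rejected."""
    return [b0 for b0 in grid if statistic(b0) <= critical_value]

# Illustrative inputs: TSCML estimate and standard deviation from Table 2.
estimate, std_dev = 0.1331, 0.0592
CHI2_1_95 = 3.841458820694124  # 95% quantile of a chi-square with 1 d.o.f.

def wald(b0):
    """Wald-type statistic for H0: beta = b0 (an assumption for this sketch)."""
    return ((estimate - b0) / std_dev) ** 2

grid = [-0.15 + 0.0001 * i for i in range(7501)]  # beta0 from -0.15 to 0.60
accepted = invert_test(wald, grid, CHI2_1_95)
lower, upper = min(accepted), max(accepted)
print(f"95% confidence interval by test inversion: [{lower:.4f}, {upper:.4f}]")
```

For a Wald statistic the accepted set is always an interval, so taking the minimum and maximum is harmless. For robust statistics the accepted set may be empty, unbounded or disconnected, in which case the full list `accepted` should be inspected instead.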

Fig. 3: Confidence intervals for the structural parameter. [Horizontal axis: β, from −0.15 to 0.6. Vertical axis: 1 − p-value. Curves: S, K_M, J_K, LR_M. Marked intervals: mle, TSCML, CGLS.]

In this particular example, the S_M-test generates an empty confidence interval, suggesting that the overidentification condition is not satisfied. The K_M and the CLR_M tests produce confidence intervals very similar to each other, but different from the ones generated by the mle, TSCML and CGLS methods.

The graph also shows that the K_M-test is not minimized at the maximum likelihood estimator but at the minimum distance estimator (3.13). Therefore it differs from the original K-test, which attains its minimum at the limited information maximum likelihood estimator.

6 Conclusion

I show how to obtain tests that are robust to weak instruments for censored models with endogenous explanatory variables. These tests depart from the minimum distance objective function. This approach has two advantages: first, it requires less restrictive assumptions about the identification of untested parameters than the K-test; second, it is computationally simple to implement.

I carry out an empirical application of the robust test to build confidence intervals. It becomes evident that classical tests are jeopardized by the presence of weak instruments.


Another possible extension is to use semi-parametric methods for the estimation of the unrestricted parameters.


References

Amemiya, T. (1979). The estimation of a simultaneous-equation Tobit model. International Economic Review 20 (1), 169–181.

Anderson, T. W. and H. Rubin (1949). Estimation of the parameters of a single equation in a complete system of stochastic equations. The Annals of Mathematical Statistics 20 (1), 46–63.

Blundell, R. and I. Walker (1986). A life-cycle consistent empirical model of family labour supply using cross-section data. Review of Economic Studies 53, 539–558.

Blundell, R. W. and R. J. Smith (1989). Estimation in a class of simultaneous equation limited dependent variable models. Review of Economic Studies 56 (1), 37–58.

Gourieroux, C., A. Monfort, E. Renault, and A. Trognon (1987). Generalized residuals. Journal of Econometrics 34, 5–32.

Kleibergen, F. (2002, September). Pivotal statistics for testing structural parameters in instrumental variables regression. Econometrica 70 (5), 1781–1803.

Kleibergen, F. (2004, April). Generalizing weak instrument robust IV statistics towards multiple parameters, unrestricted covariance matrices and identification statistics. Working Paper, Department of Economics, Brown University.

Kleibergen, F. (2005). Testing parameters in GMM without assuming that they are identified. Econometrica 73 (4), 1103–1124.

Lee, L.-F. (1981). Simultaneous equations models with discrete and censored dependent variables. In C. F. Manski and D. McFadden (Eds.), Structural Analysis of Discrete Data with Econometric Applications, Chapter 9, pp. 346–364. MIT Press.

Lee, M.-J. (1995). Semi-parametric estimation of simultaneous equations with limited dependent variables: a case study of female labor supply. Journal of Applied Econometrics 10 (2), 187–200.

Magnusson, L. M. (2006). Robust test against weak instruments for limited dependent variable models. Working Paper, Department of Economics, Brown University.

Moreira, M. J. (2003). A conditional likelihood ratio test for structural models. Econometrica 71 (4), 1027–1048.


Newey, W. and D. McFadden (1994). Large sample estimation and hypothesis testing. In R. Engle and D. McFadden (Eds.), Handbook of Econometrics, Volume IV, Chapter 36, pp. 2111–2245. Elsevier.

Newey, W. K. (1987). Efficient estimation of limited dependent variable models with endogenous explanatory variables. Journal of Econometrics 36.

Olsen, R. J. (1978). Note on the uniqueness of the maximum likelihood estimator for the tobit model. Econometrica 46 (5), 1211–1215.

Powell, J. L. (1986). Symmetrically trimmed least squares estimation for Tobit models. Econometrica 54 (6), 1435–1460.

Smith, R. J. and R. W. Blundell (1986). An exogeneity test for a simultaneous equation Tobit model with an application to labor supply. Econometrica 54 (3), 679–685.

Staiger, D. and J. H. Stock (1997). Instrumental variables regression with weak instruments. Econometrica 65 (3), 557–586.

Stock, J. H. and J. Wright (2000). GMM with weak identification. Econometrica 68, 1055–1096.

Zivot, E., R. Startz, and C. R. Nelson (1998). Valid confidence intervals and inference in the presence of weak instruments. International Economic Review 39 (4), 1119–1144.


A Proofs

A.1 Some results for the Endogenous Tobit Model

A.1.1 The score as a moment restriction

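The concentrated log-likelihood function for the endogenous Tobit model is:

$$
\ell_T(\beta,\alpha,\sigma_\varepsilon^2,\Pi_z) \propto \sum_{t=1}^{T}\left\{(1-d_t)\ln\left[1-\Phi\left(\frac{w_t\delta}{\sigma_\varepsilon}\right)\right] + d_t\ln\left[\frac{1}{\sigma_\varepsilon}\,\phi\left(\frac{y_t-w_t\delta}{\sigma_\varepsilon}\right)\right]\right\} - \frac{T}{2}\ln\left|\frac{(X-Z\Pi_z)'(X-Z\Pi_z)}{T}\right| \tag{A.1}
$$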

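The score function is defined in (2.5) and (2.6). The expected values of $e_t^{(1)}(\beta,\eta)$ and $e_t^{(2)}(\beta,\eta)$ conditional on $V_t$ and $Z_t$ are:

$$
\begin{aligned}
E\left[e_t^{(1)}(\beta,\eta)\right] &= E\left[e_t^{(1)}(\beta,\eta)\,\middle|\,D_t=1\right]\Pr(D_t=1) + E\left[e_t^{(1)}(\beta,\eta)\,\middle|\,D_t=0\right]\Pr(D_t=0) \\
&= E\left[\frac{Y_t-W_t\delta}{\sigma_\varepsilon}\,\middle|\,D_t=1\right]\Phi_t - \phi_t = 0 \\
E\left[e_t^{(2)}(\beta,\eta)\right] &= E\left[e_t^{(2)}(\beta,\eta)\,\middle|\,D_t=1\right]\Pr(D_t=1) + E\left[e_t^{(2)}(\beta,\eta)\,\middle|\,D_t=0\right]\Pr(D_t=0) \\
&= \phi_t\frac{W_t\delta}{\sigma_\varepsilon} - \left\{\Phi_t - \left[1-\frac{\phi_t}{\Phi_t}\frac{W_t\delta}{\sigma_\varepsilon}\right]\Phi_t\right\} = 0
\end{aligned}
$$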
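The two zero-mean results can be checked by simulation. The sketch below assumes that the generalized residuals take the same form as the pseudo residuals in (A.2) and (A.3), with a standardized error u and index h standing for W_tδ/σ_ε; the function names, the value of h and the number of draws are illustrative choices.

```python
import math
import random

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def generalized_residuals(u, h):
    """Return (e1, e2) for a standardized error u and index h.

    The observation is uncensored (D = 1) when u > -h.
    """
    hazard = phi(h) / Phi(-h)        # phi_t / (1 - Phi_t)
    if u > -h:
        return u, u * u - 1.0        # uncensored contributions
    return -hazard, h * hazard       # censored contributions

def mean_residuals(h, draws=200_000, seed=1):
    """Monte Carlo averages of e1 and e2 at a fixed index h."""
    rng = random.Random(seed)
    sum1 = sum2 = 0.0
    for _ in range(draws):
        e1, e2 = generalized_residuals(rng.gauss(0.0, 1.0), h)
        sum1 += e1
        sum2 += e2
    return sum1 / draws, sum2 / draws

mean_e1, mean_e2 = mean_residuals(h=0.5)
print(f"mean e1 = {mean_e1:.4f}, mean e2 = {mean_e2:.4f}")
```

Both averages should be close to zero up to simulation noise, which is what allows the score to be used as a moment restriction.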

A.1.2 The variance-covariance matrix for the unrestricted model

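Similarly to equations (2.3) and (2.4), define the pseudo residuals:

$$
\varphi_t^{(1)}(\pi_z,\theta,\Pi_z) = d_t\,\frac{y_t - s_t\kappa}{\sigma_\varepsilon} - (1-d_t)\,\frac{\phi_t}{1-\Phi_t} \tag{A.2}
$$

$$
\varphi_t^{(2)}(\pi_z,\theta,\Pi_z) = d_t\left[\left(\frac{y_t - s_t\kappa}{\sigma_\varepsilon}\right)^2 - 1\right] + (1-d_t)\,\frac{s_t\kappa}{\sigma_\varepsilon}\,\frac{\phi_t}{1-\Phi_t} \tag{A.3}
$$

where $s_t\kappa = z_t\pi_z + v_t\gamma$, and $\phi_t$ and $\Phi_t$ are, respectively, the normal density and cumulative distribution functions evaluated at $s_t\kappa/\sigma_\varepsilon$. The contribution of one observation to the score functions and to the OLS moment restriction is:

$$
g_t(\pi_z,\theta,\Pi_z) = \left[\ \frac{1}{\sigma_\varepsilon}\varphi_t^{(1)} Z_t \quad \frac{1}{\sigma_\varepsilon}\varphi_t^{(1)} V_t \quad \frac{1}{2\sigma_\varepsilon^2}\varphi_t^{(2)} \quad \mathrm{vec}(X_t - Z_t\Pi_z)'(I_m\otimes Z_t)\ \right]
$$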

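Define $\lambda(h_t) = \phi(h_t)/\Phi(h_t)$, the inverse Mills ratio, evaluated at $h_t = s_t\kappa/\sigma_\varepsilon$. Using the information equality, the variance-covariance matrix $\Omega$ and the expected value of the Jacobian $G$ are, respectively:

$$
\Omega = E\begin{pmatrix}
Z_t'Z_t\,\iota_t^{(1)} & Z_t'V_t\,\iota_t^{(1)} & Z_t'\,\iota_t^{(2)} & 0 \\
V_t'Z_t\,\iota_t^{(1)} & V_t'V_t\,\iota_t^{(1)} & V_t'\,\iota_t^{(2)} & 0 \\
Z_t\,\iota_t^{(2)} & V_t\,\iota_t^{(2)} & \iota_t^{(3)} & 0 \\
0 & 0 & 0 & \Sigma_{vv}\otimes Z_t'Z_t
\end{pmatrix} \tag{A.4}
$$

$$
G = -E\begin{pmatrix}
Z_t'X_t\,\iota_t^{(1)} & Z_t'V_t\,\iota_t^{(1)} & Z_t'\,\iota_t^{(2)} & -\gamma_0'\otimes Z_t'Z_t\,\iota_t^{(1)} \\
V_t'X_t\,\iota_t^{(1)} & V_t'V_t\,\iota_t^{(1)} & V_t'\,\iota_t^{(2)} & -\gamma_0'\otimes V_t'Z_t\,\iota_t^{(1)} \\
X_t\,\iota_t^{(2)} & V_t\,\iota_t^{(2)} & \iota_t^{(3)} & -\gamma_0'\otimes Z_t\,\iota_t^{(2)} \\
0 & 0 & 0 & I_m\otimes Z_t'Z_t
\end{pmatrix} \tag{A.5}
$$

where:

$$
\iota_t^{(1)} = \frac{1}{\sigma_\varepsilon^2}E\left[\varphi_t^{(1)}\varphi_t^{(1)}\,\middle|\,X_t,Z_t\right] = \frac{1}{\sigma_\varepsilon^2}\left[\Phi_t - (1-\Phi_t)\lambda'(-h_t)\right] = \frac{1}{\sigma_\varepsilon^2}\left\{\Phi_t + \phi_t\left[\lambda(-h_t) - h_t\right]\right\} \tag{A.6a}
$$

$$
\iota_t^{(2)} = \frac{1}{2\sigma_\varepsilon^3}E\left[\varphi_t^{(1)}\varphi_t^{(2)}\,\middle|\,X_t,Z_t\right] = \frac{1}{2\sigma_\varepsilon^3}(1-\Phi_t)\left[\lambda(-h_t) + \lambda'(-h_t)h_t\right] = \frac{1}{2\sigma_\varepsilon^3}\left[\phi_t + \phi_t h_t^2 - \phi_t\lambda(-h_t)h_t\right] \tag{A.6b}
$$

$$
\iota_t^{(3)} = \frac{1}{4\sigma_\varepsilon^4}E\left[\varphi_t^{(2)}\varphi_t^{(2)}\,\middle|\,X_t,Z_t\right] = \frac{1}{4\sigma_\varepsilon^4}\left\{2\Phi_t\left[1-\lambda(h_t)h_t\right] + (1-\Phi_t)\left[\lambda(-h_t) - \lambda'(-h_t)h_t\right]h_t\right\} = \frac{1}{4\sigma_\varepsilon^4}\left[2\Phi_t + \lambda(-h_t)\phi_t h_t^2 - \phi_t h_t - \phi_t h_t^3\right] \tag{A.6c}
$$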
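The closed forms in (A.6a) to (A.6c) can be verified numerically. The sketch below sets σ_ε = 1, so that the terms in square brackets equal the conditional second moments of the pseudo residuals, and compares them with moments obtained by Simpson quadrature over the uncensored region plus the censored point mass. The function names and the grid of h values are illustrative assumptions.

```python
import math

def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def mills(x):
    """Inverse Mills ratio lambda(x) = phi(x) / Phi(x)."""
    return phi(x) / Phi(x)

def simpson(f, a, b, n=20000):
    """Composite Simpson rule with an even number n of subintervals."""
    step = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * step)
    return total * step / 3.0

def moments_by_quadrature(h, upper=12.0):
    """E[r1*r1], E[r1*r2], E[r2*r2] for the pseudo residuals, sigma = 1."""
    lam = mills(-h)          # phi_t / (1 - Phi_t)
    p_censored = Phi(-h)     # probability that u <= -h
    m11 = simpson(lambda u: u * u * phi(u), -h, upper) + p_censored * lam ** 2
    m12 = (simpson(lambda u: u * (u * u - 1.0) * phi(u), -h, upper)
           - p_censored * lam * (h * lam))
    m22 = (simpson(lambda u: (u * u - 1.0) ** 2 * phi(u), -h, upper)
           + p_censored * (h * lam) ** 2)
    return m11, m12, m22

def closed_forms(h):
    """Bracketed terms on the right-hand sides of (A.6a)-(A.6c)."""
    lam = mills(-h)
    a = Phi(h) + phi(h) * (lam - h)
    b = phi(h) + phi(h) * h ** 2 - phi(h) * lam * h
    c = 2.0 * Phi(h) + lam * phi(h) * h ** 2 - phi(h) * h - phi(h) * h ** 3
    return a, b, c

for h in (-1.0, 0.0, 0.5, 2.0):
    print(h, moments_by_quadrature(h), closed_forms(h))
```

Agreement between the two columns at each h supports the reconstruction of the last equality in each of (A.6a), (A.6b) and (A.6c).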

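One may show that, for all $h\in\mathbb{R}$, $\lambda'(-h) < 0$,11 $\lambda(-h) > h$ and $1 - \lambda(h)h - \lambda^2(h) > 0$. Also, assuming $0 < \sigma_\varepsilon < +\infty$, there exist finite real numbers $c_1$, $c_2$ and $c_3$ such that:

$$
c_1 > \iota^{(1)} > 0, \quad c_2 > \iota^{(2)} > 0 \quad\text{and}\quad c_3 > \iota^{(3)} > 0 \tag{A.7}
$$

Define $\Omega_{\pi_z\theta} = \begin{bmatrix}\Omega_{\pi_z\gamma} & \Omega_{\pi_z\sigma_\varepsilon^2}\end{bmatrix} = \Omega_{\theta\pi_z}'$ and $\Omega_{\theta\theta} = \begin{bmatrix}\Omega_{\gamma\gamma} & \Omega_{\gamma\sigma_\varepsilon^2}\\ \Omega_{\sigma_\varepsilon^2\gamma} & \Omega_{\sigma_\varepsilon^2\sigma_\varepsilon^2}\end{bmatrix}$. From (A.4) and (A.5) one may find that:

$$
G^{-1}\Omega G^{-1\prime} =
\begin{pmatrix}
\begin{pmatrix}\Omega_{\pi_z\pi_z} & \Omega_{\pi_z\theta}\\ \Omega_{\theta\pi_z} & \Omega_{\theta\theta}\end{pmatrix}^{-1} + \gamma'\Sigma_{vv}\gamma\begin{pmatrix}(E[Z_t'Z_t])^{-1} & 0\\ 0 & 0\end{pmatrix}
&
\gamma'\Sigma_{vv}\otimes\begin{pmatrix}\Omega_{\pi_z\pi_z} & \Omega_{\pi_z\theta}\\ \Omega_{\theta\pi_z} & \Omega_{\theta\theta}\end{pmatrix}^{-1}\begin{pmatrix}\Omega_{\pi_z\pi_z}\\ \Omega_{\theta\pi_z}\end{pmatrix}(E[Z_t'Z_t])^{-1}
\\[6pt]
\Sigma_{vv}\gamma\otimes(E[Z_t'Z_t])^{-1}\begin{pmatrix}\Omega_{\pi_z\pi_z} & \Omega_{\pi_z\theta}\end{pmatrix}\begin{pmatrix}\Omega_{\pi_z\pi_z} & \Omega_{\pi_z\theta}\\ \Omega_{\theta\pi_z} & \Omega_{\theta\theta}\end{pmatrix}^{-1}
&
\Sigma_{vv}\otimes(E[Z_t'Z_t])^{-1}
\end{pmatrix} \tag{A.8}
$$

11 $\lambda'(-h) = -\lambda(-h)\left(-h+\lambda(-h)\right) = -\dfrac{\phi(-h)}{\Phi^2(-h)}\left(-h\Phi(-h)+\phi(-h)\right) = -\dfrac{\phi(-h)}{\Phi^2(-h)}\displaystyle\int_{-\infty}^{-h}\Phi(w)\,dw < 0$. The inequality holds because the integral of a strictly positive function is positive.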
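The three inequalities for the inverse Mills ratio, and the identity used in footnote 11, can be checked on a grid. This is a numerical illustration only, not a proof; the grid bounds and the helper names are assumptions made for the sketch.

```python
import math

def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def mills(x):
    """Inverse Mills ratio lambda(x) = phi(x) / Phi(x)."""
    return phi(x) / Phi(x)

def mills_derivative(x):
    """lambda'(x) = -lambda(x) * (x + lambda(x))."""
    lam = mills(x)
    return -lam * (x + lam)

def integral_of_Phi(x):
    """Closed form of the integral of Phi(w) over (-inf, x]."""
    return x * Phi(x) + phi(x)

grid = [-6.0 + 0.05 * i for i in range(241)]  # h from -6 to 6
checks = []
for h in grid:
    lam_plus, lam_minus = mills(h), mills(-h)
    footnote_rhs = -phi(-h) / Phi(-h) ** 2 * integral_of_Phi(-h)
    checks.append((
        mills_derivative(-h) < 0.0,                     # lambda'(-h) < 0
        lam_minus > h,                                  # lambda(-h) > h
        1.0 - lam_plus * h - lam_plus ** 2 > 0.0,       # 1 - lambda(h)h - lambda(h)^2 > 0
        math.isclose(mills_derivative(-h), footnote_rhs, rel_tol=1e-6),
    ))

all_hold = all(all(row) for row in checks)
print("all inequalities hold on the grid:", all_hold)
```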


A.2 Proof of Lemma 1

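Let $\hat\Pi_z$ be the ordinary least squares estimator, which is derived from $Z'(X - Z\Pi_z) = 0$. Since $E[V_t|Z_t] = 0$ and $E[Z_t'Z_t]$ is invertible and bounded, it follows that $\hat\Pi_z \xrightarrow{p} \Pi_{z0}$. Let $\hat\pi_z$, $\hat\gamma$ and $\hat\sigma_\varepsilon^2$ be the solution for the conditional maximum likelihood problem evaluated at $\hat\Pi_z$:

$$
\max_{\pi_z,\gamma,\sigma_\varepsilon^2}\frac{1}{T}\tilde\ell_T^{\,c}(\pi_z,\gamma,\sigma_\varepsilon^2) \equiv \max_{\pi_z,\gamma,\sigma_\varepsilon^2}\frac{1}{T}\sum_{t=1}^{T}\left\{(1-d_t)\ln\Phi\left(-\frac{z_t\pi_z + (x_t - z_t\hat\Pi_z)\gamma}{\sigma_\varepsilon}\right) + d_t\ln\left[\frac{1}{\sigma_\varepsilon}\,\phi\left(\frac{y_t - z_t\pi_z - (x_t - z_t\hat\Pi_z)\gamma}{\sigma_\varepsilon}\right)\right]\right\} \tag{A.9}
$$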

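Olsen (1978) proved that the log-likelihood is concave under the reparametrization $\xi_1 = \pi_z/\sigma_\varepsilon$, $\xi_2 = \gamma/\sigma_\varepsilon$ and $\xi_3 = 1/\sigma_\varepsilon$. Since the mapping $(\xi_1,\xi_2,\xi_3)\to(\pi_z,\gamma,\sigma_\varepsilon)$ is bijective and differentiable, if $\hat\xi \xrightarrow{p} \xi_0$, where $\hat\xi = \arg\max_\xi \ell_T^{c}(\xi,\hat\Pi_z)$, then $(\hat\pi_z,\hat\gamma,\hat\sigma_\varepsilon) \xrightarrow{p} (\pi_{z0},\gamma_0,\sigma_{\varepsilon 0})$.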
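A small sketch of the reparametrization may help. It assumes a scalar instrument z and a scalar first-stage residual v, writes the objective in (A.9) in terms of (ξ1, ξ2, ξ3), and checks midpoint concavity on randomly drawn pairs of points. The simulated data, the parameter ranges and the function names are all illustrative assumptions.

```python
import math
import random

def Phi(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def to_olsen(pi_z, gamma, sigma):
    """Map (pi_z, gamma, sigma) to (xi1, xi2, xi3)."""
    return (pi_z / sigma, gamma / sigma, 1.0 / sigma)

def from_olsen(xi1, xi2, xi3):
    """Inverse map, defined for xi3 > 0."""
    return (xi1 / xi3, xi2 / xi3, 1.0 / xi3)

def loglik_olsen(xi, data):
    """Tobit log-likelihood, up to an additive constant, in Olsen's form."""
    xi1, xi2, xi3 = xi
    total = 0.0
    for y, z, v, d in data:
        index = z * xi1 + v * xi2
        if d:   # uncensored: ln(xi3) + ln phi(xi3*y - index) without the constant
            total += math.log(xi3) - 0.5 * (xi3 * y - index) ** 2
        else:   # censored: ln Phi(-index)
            total += math.log(Phi(-index))
    return total

rng = random.Random(0)
data = []
for _ in range(200):
    z, v = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    y = max(0.0, 0.5 * z + 0.8 * v + rng.gauss(0.0, 1.0))
    data.append((y, z, v, 1 if y > 0 else 0))

def random_point():
    return (rng.uniform(-2.0, 2.0), rng.uniform(-2.0, 2.0), rng.uniform(0.2, 3.0))

concave_on_sample = True
for _ in range(200):
    a, b = random_point(), random_point()
    mid = tuple(0.5 * (p + q) for p, q in zip(a, b))
    if loglik_olsen(mid, data) < 0.5 * (loglik_olsen(a, data) + loglik_olsen(b, data)) - 1e-7:
        concave_on_sample = False

print("midpoint concavity holds on all sampled pairs:", concave_on_sample)
```

The check is only a sanity test on a finite sample of points; the general result is the one proved by Olsen (1978).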

Let $N_{\xi_0}$ and $M_{\xi_0}$ be two open neighborhoods of $\xi_0$ on $X$, an open convex set, such that $N_{\xi_0}\subset M_{\xi_0}$. I want to prove that, as $T\to\infty$, $P(\hat\xi\in N_{\xi_0})\to 1$.

From concavity of the likelihood function in $\xi$, given that $\Pi_z$ is fixed, $\ell_T(\xi,\Pi_z)\xrightarrow{p}\ell_0(\xi,\Pi_z)$ uniformly in any compact subset of $X$. The limiting function $\ell_0(\xi,\Pi_z)$ is also a concave function of $\xi$. Assume that $\ell_0$ is uniquely maximized at $(\xi_0,\Pi_{z0})$ and define the compact set $A = N_{\xi_0}^{c}\cap M_{\xi_0}^{c}$. By the continuity of $\ell_0$ there exists a $\xi^*$ which solves $\max_{\xi\in A}\ell_0(\xi,\Pi_{z0})$.

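Define $B = A^{c}\cap M_{\xi_0}^{c}$. I claim that there is no $\xi\in B$ such that $\ell_0(\xi,\Pi_{z0}) > \ell_0(\xi^*,\Pi_{z0})$. By contradiction, suppose that there exists $\tilde\xi\in B$ with $\ell_0(\tilde\xi,\Pi_{z0})\ge\ell_0(\xi^*,\Pi_{z0})$. There is a line connecting $\xi_0$ to $\tilde\xi$, a scalar $\varsigma\in(0,1)$ and a point $\xi'\in A$ such that $\xi' = \varsigma\tilde\xi + (1-\varsigma)\xi_0$. By concavity, $\ell(\xi',\Pi_{z0})\ge\varsigma\,\ell(\tilde\xi,\Pi_{z0}) + (1-\varsigma)\,\ell(\xi_0,\Pi_{z0}) > \ell(\xi^*,\Pi_{z0})$, which contradicts the definition of $\xi^*$ as the maximizer over $A$.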

Set $e = \ell(\xi_0,\Pi_{z0}) - \ell(\xi^*,\Pi_{z0})$ and define:

Using the above inequalities one may find that $e > \ell_0(\hat\xi,\Pi_{z0}) - \ell_0(\xi_0,\Pi_{z0})$. Hence $\cap_i C_T^i$ implies $\hat\xi\in N_{\xi_0}$, or $P(\cap_i C_T^i)\le P(\hat\xi\in N_{\xi_0})$. Finally, one may show that $P(\cap_i C_T^i)\to 1$ as $T\to\infty$.

The asymptotic normality of the TSCML estimator follows directly from Newey and McFadden (1994), theorem 3.1, with variance-covariance matrix given by:

$$
G^{-1}\Omega G^{-1\prime} \tag{A.10}
$$

where $G^{-1}\Omega G^{-1\prime}$ is defined in (A.8).


A.3 Proof of Lemma 2

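Define the selection matrix $F$ as:

$$
F = \begin{pmatrix} I_k & 0 & 0 \\ 0 & 0 & I_{km}\end{pmatrix}
$$

Pre- and post-multiplying $G^{-1}\Omega G^{-1\prime}$ by $F$ and $F'$, respectively, results in:

$$
\begin{pmatrix}
\Omega_{\pi_z\pi_z.\theta}^{-1} + \gamma_0'\Sigma_{vv}\gamma_0\,(E[Z_t'Z_t])^{-1} & \gamma_0'\Sigma_{vv}\otimes(E[Z_t'Z_t])^{-1} \\
\Sigma_{vv}\gamma_0\otimes(E[Z_t'Z_t])^{-1} & \Sigma_{vv}\otimes(E[Z_t'Z_t])^{-1}
\end{pmatrix} \tag{A.11}
$$

where $\Omega_{\pi_z\pi_z.\theta} = \Omega_{\pi_z\pi_z} - \Omega_{\pi_z\theta}\Omega_{\theta\theta}^{-1}\Omega_{\theta\pi_z}$. Therefore, pre-multiplying (2.9) by $F$ gives:

$$
\sqrt{T}\begin{pmatrix}\hat\pi_z - \pi_z \\ \mathrm{vec}(\hat\Pi_z - \Pi_z)\end{pmatrix}
\xrightarrow{d}
N\left(\begin{pmatrix}0\\0\end{pmatrix},
\begin{pmatrix}
\Omega_{\pi_z\pi_z.\theta}^{-1} + \gamma_0'\Sigma_{vv}\gamma_0\,(E[Z_t'Z_t])^{-1} & \gamma_0'\Sigma_{vv}\otimes(E[Z_t'Z_t])^{-1} \\
\Sigma_{vv}\gamma_0\otimes(E[Z_t'Z_t])^{-1} & \Sigma_{vv}\otimes(E[Z_t'Z_t])^{-1}
\end{pmatrix}\right) \tag{A.12}
$$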
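The appearance of the inverse of Ω_{πzπz.θ} in (A.11) rests on the standard partitioned-inverse result: the upper-left block of the inverse of a partitioned matrix equals the inverse of the Schur complement. The sketch below illustrates this with a hypothetical 4 × 4 positive definite matrix whose first two coordinates play the role of π_z and the last two the role of θ; the matrix entries and helper functions are assumptions made only for illustration.

```python
def inverse(matrix):
    """Gauss-Jordan inverse with partial pivoting, for small dense matrices."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [value / p for value in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

# Hypothetical symmetric, diagonally dominant (hence positive definite) matrix.
omega = [[4.0, 1.0, 0.5, 0.2],
         [1.0, 3.0, 0.3, 0.1],
         [0.5, 0.3, 2.0, 0.4],
         [0.2, 0.1, 0.4, 1.5]]
k = 2  # dimension of the pi_z block
o_pp = [row[:k] for row in omega[:k]]
o_pt = [row[k:] for row in omega[:k]]
o_tp = [row[:k] for row in omega[k:]]
o_tt = [row[k:] for row in omega[k:]]

# Schur complement: Omega_pp - Omega_pt * Omega_tt^{-1} * Omega_tp
correction = matmul(matmul(o_pt, inverse(o_tt)), o_tp)
schur = [[o_pp[i][j] - correction[i][j] for j in range(k)] for i in range(k)]

upper_left_of_inverse = [row[:k] for row in inverse(omega)[:k]]
inverse_of_schur = inverse(schur)
print(upper_left_of_inverse)
print(inverse_of_schur)
```

The two printed matrices should coincide up to floating-point error.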

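Let $\hat\Omega$, an estimator for the variance, be:

$$
\hat\Omega = \frac{1}{T}\begin{pmatrix}
Z'\hat\Lambda^{(1)}Z & Z'\hat\Lambda^{(1)}\hat V & Z'\hat\Lambda^{(2)}l & 0 \\
\hat V'\hat\Lambda^{(1)}Z & \hat V'\hat\Lambda^{(1)}\hat V & \hat V'\hat\Lambda^{(2)}l & 0 \\
l'\hat\Lambda^{(2)}Z & l'\hat\Lambda^{(2)}\hat V & l'\hat\Lambda^{(3)}l & 0 \\
0 & 0 & 0 & \hat\Sigma_{VV}\otimes Z'Z
\end{pmatrix} \tag{A.13}
$$

where, for $i = 1, 2, 3$, $\hat\Lambda^{(i)} = \mathrm{diag}(\iota_1^{(i)},\ldots,\iota_T^{(i)})$ evaluated at the TSCML estimator, $\hat V = X - Z\hat\Pi_z$ and $\hat\Sigma_{VV} = \hat V'\hat V/(T-k)$. Since the TSCML estimator is consistent and $E\left[\sup\lVert\Omega(\pi_z,\gamma,\sigma_\varepsilon,\Pi_z)\rVert\right] < +\infty$, $\hat\Omega\xrightarrow{p}\Omega$ as $T\to+\infty$ by the law of large numbers combined with the continuous mapping theorem. Consequently, $\hat\Omega_{\pi_z\pi_z.\theta}^{-1}\xrightarrow{p}\Omega_{\pi_z\pi_z.\theta}^{-1}$.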
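The first-stage ingredients of this estimator are straightforward to compute. The following sketch covers the simplest case of one endogenous regressor and one instrument without intercept (so k = 1); the simulated design and the function name `first_stage` are assumptions made for illustration.

```python
import random

def first_stage(x, z):
    """OLS of x on a single instrument z (no intercept).

    Returns (pi_hat, residuals v_hat, sigma_vv_hat), where the variance
    estimate uses the divisor T - k with k = 1.
    """
    T, k = len(x), 1
    pi_hat = sum(zi * xi for zi, xi in zip(z, x)) / sum(zi * zi for zi in z)
    v_hat = [xi - zi * pi_hat for xi, zi in zip(x, z)]
    sigma_vv_hat = sum(v * v for v in v_hat) / (T - k)
    return pi_hat, v_hat, sigma_vv_hat

# Hypothetical design: Pi_z = 0.8 and Var(V) = 1.
rng = random.Random(42)
z = [rng.gauss(0.0, 1.0) for _ in range(5000)]
x = [0.8 * zi + rng.gauss(0.0, 1.0) for zi in z]

pi_hat, v_hat, sigma_vv_hat = first_stage(x, z)
print(f"pi_hat = {pi_hat:.3f}, sigma_vv_hat = {sigma_vv_hat:.3f}")
```

With several instruments or several endogenous regressors the same quantities would be obtained from a matrix OLS routine; the scalar case is used here only to keep the sketch self-contained.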

A.4 Proof of Theorem 3.3

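Define the random variables $\mathcal{Q}$ and $\mathcal{G}$, where:

$$
\sqrt{T}\,(\hat\pi_z - \hat\Pi_z\beta_0)\xrightarrow{d}\mathcal{G}, \qquad \sqrt{T}\,(\hat{\bar\Pi}_{\beta_0} - \Pi_{z0})\xrightarrow{d}\mathcal{Q}
$$

$\mathcal{G}$ and $\mathcal{Q}$ are independent normal random variables (see (3.7)). The proof is divided into three cases:

(i) The instruments are strong such that $\Pi_{z0} = C$: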


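When the instruments are strong, $\hat{\bar\Pi}_{\beta_0}\xrightarrow{p}C$, which implies that:

$$
\hat{\bar\Pi}_{\beta_0}'\hat\Psi_{\beta_0}^{-1}\sqrt{T}\,(\hat\pi_z - \hat\Pi_z\beta_0)\xrightarrow{d}C'\Psi_{\beta_0}^{-1}\mathcal{G}\sim N\left(0,\;C'\Psi_{\beta_0}^{-1}C\right)
$$

Hence, the limiting distribution of the $K_M$-test is:

$$
K_M(\beta_0)\xrightarrow{d}\mathcal{G}'\Psi_{\beta_0}^{-1}C\left(C'\Psi_{\beta_0}^{-1}C\right)^{-1}C'\Psi_{\beta_0}^{-1}\mathcal{G}\sim\chi^2(m)
$$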

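(ii) The instruments are weak such that $\Pi_{z0} = C/\sqrt{T}$:

When the instruments are weak we have:

$$
\sqrt{T}\,\hat{\bar\Pi}_{\beta_0}\xrightarrow{d}\mathcal{Q}+C, \qquad T\,\hat{\bar\Pi}_{\beta_0}'\hat\Psi_{\beta_0}^{-1}\hat{\bar\Pi}_{\beta_0}\xrightarrow{d}(\mathcal{Q}+C)'\Psi_{\beta_0}^{-1}(\mathcal{Q}+C)
$$

The following conditional distribution:

$$
\left[(\mathcal{Q}+C)'\Psi_{\beta_0}^{-1}(\mathcal{Q}+C)\right]^{-\frac{1}{2}}(\mathcal{Q}+C)'\Psi_{\beta_0}^{-1}\mathcal{G}\;\Big|\;\mathcal{Q} \tag{A.14}
$$

is $N(0, I_m)$, which does not depend on $\mathcal{Q}$. Therefore, the unconditional distribution is also $N(0, I_m)$ and the limiting distribution of the $K_M$-test is:

$$
K_M(\beta_0)\xrightarrow{d}\chi^2(m) \tag{A.15}
$$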

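(iii) The instruments are irrelevant such that $\Pi_{z0} = 0$:

In the case of irrelevant instruments we have:

$$
\sqrt{T}\,\hat{\bar\Pi}_{\beta_0}\xrightarrow{d}\mathcal{Q}, \qquad T\,\hat{\bar\Pi}_{\beta_0}'\hat\Psi_{\beta_0}^{-1}\hat{\bar\Pi}_{\beta_0}\xrightarrow{d}\mathcal{Q}'\Psi_{\beta_0}^{-1}\mathcal{Q}
$$

One may derive the conditional distribution:

$$
\left[\mathcal{Q}'\Psi_{\beta_0}^{-1}\mathcal{Q}\right]^{-\frac{1}{2}}\mathcal{Q}'\Psi_{\beta_0}^{-1}\mathcal{G}\;\Big|\;\mathcal{Q}\;\equiv\;N(0, I_m) \tag{A.16}
$$

which does not depend on $\mathcal{Q}$. Therefore, using the same argument presented above, the limiting distribution of the $K_M$-test is:

$$
K_M(\beta_0)\xrightarrow{d}\chi^2(m)
$$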
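The conditioning argument used in the three cases can be illustrated with a small Monte Carlo experiment on the limiting random variables. The sketch below makes several simplifying assumptions that are not taken from the paper: m = 1, Ψ equal to the identity matrix of dimension k = 3, and a standard normal vector for the Q term. It draws the limiting quadratic form with the direction vector equal to C (strong), Q + C (weak) or Q (irrelevant) and records how often it exceeds the 5% critical value of a chi-square with one degree of freedom.

```python
import random

CHI2_1_95 = 3.841458820694124  # 95% quantile of a chi-square with 1 d.o.f.

def k_limit_draw(rng, c, include_q):
    """One draw of G'D (D'D)^{-1} D'G with Psi = I, where D = C or Q + C."""
    k = len(c)
    g = [rng.gauss(0.0, 1.0) for _ in range(k)]
    q = [rng.gauss(0.0, 1.0) if include_q else 0.0 for _ in range(k)]
    d = [qi + ci for qi, ci in zip(q, c)]
    numerator = sum(di * gi for di, gi in zip(d, g)) ** 2
    denominator = sum(di * di for di in d)
    return numerator / denominator

def rejection_rate(c, include_q, draws=40_000, seed=0):
    """Share of draws exceeding the 5% chi-square critical value."""
    rng = random.Random(seed)
    rejections = sum(k_limit_draw(rng, c, include_q) > CHI2_1_95
                     for _ in range(draws))
    return rejections / draws

cases = {
    "strong":     ([1.0, 0.5, -0.5], False),  # direction is C itself
    "weak":       ([1.0, 0.5, -0.5], True),   # direction is Q + C
    "irrelevant": ([0.0, 0.0, 0.0], True),    # direction is Q
}
rates = {name: rejection_rate(c, include_q) for name, (c, include_q) in cases.items()}
for name, rate in rates.items():
    print(f"{name:>10}: rejection rate = {rate:.4f}")
```

In all three designs the rejection frequency should be close to 5%, reflecting the fact that the limiting distribution does not depend on the strength of the instruments.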


B Data Description

The data set was extracted from the 1987 wave of the Michigan Panel Study of Income Dynamics (PSID). We rescale the variables in order to match the definitions used by Blundell and Smith (1989).

Table 3: Definition of the variables, 3382 observations, 895 left-censored, 1987 US PSID

Variable         Definition
h_f              wife's working hours per week
w_other          the household's other income in $1000
a_f (a)          (Age − 40)/10
a_f^2            (Age − 40)^2/100
ed_f (b)         (education − 8)
ed_f^2           (education − 8)^2
C1               1 for any child between ages 0 and 5, and 0 otherwise
C2               1 for any child between ages 6 and 13, and 0 otherwise
C3               1 for any child between ages 14 and 17, and 0 otherwise
Race             1 if non-white and 0 otherwise
Tenure 1         1 if home is owned by the household and 0 otherwise
Tenure 2         1 if home is on mortgage and 0 otherwise
Husband occ 1    1 if husband is manager or professional and 0 otherwise
Husband occ 2    1 if husband is sales worker or clerical or craftsman and 0 otherwise
Husband occ 3    1 if husband is farm-related worker and 0 otherwise
un               local unemployment rate in %

a Age of the wife in years.
b
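The rescaling of age and education listed in Table 3 amounts to the following transformation. The function name and the dictionary keys are hypothetical; only the formulas come from the table.

```python
def rescale(age, education):
    """Apply the age and education transformations listed in Table 3."""
    centered_age = age - 40
    centered_education = education - 8
    return {
        "a_f": centered_age / 10,          # (Age - 40)/10
        "a_f_sq": centered_age ** 2 / 100, # (Age - 40)^2/100
        "ed_f": centered_education,        # education - 8
        "ed_f_sq": centered_education ** 2,
    }

example = rescale(age=50, education=12)
print(example)
```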


C Power Curves

Fig. 4: Power curves for Robust, Wald and LR tests. [Five panels: λ_z = 20, 10, 3, 1 and 0.01, all with ρ = 0. Horizontal axis: β minus β0. Vertical axis: rejection probability. Legend: S, K_M, J_KM, KJ_M, LR_M, W, LR.]


Fig. 5: Power curves for Robust, Wald and LR tests. [Five panels: λ_z = 20, 10, 3, 1 and 0.01, all with ρ = 0.5. Horizontal axis: β minus β0. Vertical axis: rejection probability. Legend: S, K_M, J_KM, KJ_M, LR_M, W, LR.]