Three Essays in Econometric Methods

(1)

ABSTRACT

RIQUELME WON, JUAN ANDRES. Three Essays in Econometric Methods. (Under the direction of Mehmet Caner.)

In this document I present three essays that comprise my doctoral dissertation. These essays

are positioned in the area of inference with instrumental variables setups and model and moment

selection. In the first essay the problem of empirical inference in instrumental variables (iv)

setups where the instruments do that do not perfectly satisfy the exclusion restriction is studied.

The fractionally resampled Anderson–Rubin Berkowitz et al (2012) [Journal of Econometrics,

Vol. 166, pp. 255–266 (2012)] is applied under different scenarios of correlation between the

instruments and the structural variables and used to find empirical bounds to the subsampling

blocks that allows to perform valid inference over the structural parameters. In the second paper

the relative performance of several moment selection criteria currently used in the literature

is compared in terms of (1) how accurate is the moment selection and (2) how good is the

post–estimation performance in term of bias and root mean square errors. The analysis is

performed under different correlation structures in the presence of fixed and diverging number

of moments. In the third paper an alternative estimator to the one in Lee et al (2014) [Journal

of the Royal Statistical Society: Series B, forthcoming] is proposed. This estimator can find the

variables that switch the regimes consistently in a linear setup. The paper starts by showing

that the `∞ bound can be represented as a multiple of the lasso tuning parameter and then this bound is used in a thresholded lasso-type estimator. This estimator achieves a better

(2)

Three Essays in Econometric Methods

by

Juan Andres Riquelme Won

A dissertation submitted to the Graduate Faculty of North Carolina State University

in partial fulfillment of the requirements for the Degree of

Doctor of Philosophy

Economics

Raleigh, North Carolina

2015

APPROVED BY:

Barry Goodwin Robert Hammond

Roger von Haefen Mehmet Caner

(3)

DEDICATION

(4)

BIOGRAPHY

Juan Andres Riquelme Wonwas born in Chile in 1979. He studied at the University of Concepci´on, where he earned a Bachelor Degree in Commercial Engineering with specialization

in Economics, and a Master of Science in Natural Resources and Environmental Economics.

After working for four years as teacher and researcher he started his doctoral studies at North

Carolina State University, USA, and earned his Ph.D in Economics in 2015, specializing in

(5)

ACKNOWLEDGEMENTS

I am deeply grateful to my advisor Dr. Mehmet Caner for his constant support, guidance

and encouragement. I also acknowledge the financial support provided by conicyt(Chilean

(6)

TABLE OF CONTENTS

LIST OF TABLES . . . vii

LIST OF FIGURES . . . ix

CHAPTER 1 VALID TEST WHEN INSTRUMENTAL VARIABLES DO NOT PER-FECTLY SATISFY THE EXCLUSION RESTRICTION . . . 1

1.1 Preliminaries . . . 1

1.2 Introduction . . . 4

1.3 Inferences When Instruments are not Perfectly Exogenous . . . 5

1.4 The Fractionally Resampled artest . . . 8

1.5 The farcommand . . . 10

1.5.1 Syntax . . . 10

1.5.2 Description . . . 10

1.5.3 Options . . . 10

1.5.4 Saved Results . . . 11

1.5.5 Example . . . 11

1.6 Simulations . . . 21

1.7 Conclusion . . . 24

1.8 References . . . 28

CHAPTER 2 MOMENT AND IV SELECTION APPROACHES: A COMPARATIVE SIMULATION STUDY . . . 35

2.1 Preliminaries . . . 35

2.3 Theoretical Framework . . . 39

2.3.1 Moment Selection Methods . . . 39

2.3.2 Parameter Estimation . . . 43

2.4 Monte Carlo Simulations . . . 44

2.5 Results . . . 46

2.5.1 Model Selection . . . 50

2.5.2 Post Selection Performance . . . 51

2.7 Additional Table . . . 59

2.8 References . . . 60

CHAPTER 3 THRESHOLDED LASSO: VARIABLE SELECTION IN HIGH DIMEN-SIONS . . . 62

3.2 Weighted Lasso for Threshold Regression . . . 63

3.3 Performance Bounds . . . 65

3.4 Thresholded Weighted Lasso . . . 70

3.4.1 Assumptions in Theorem 3 of Lee et al. (2014) . . . 71

3.5 Monte Carlo Simulations . . . 73

(7)

APPENDICES . . . 85

Appendix A A Note on the Use of lars andglmnet . . . 86

A.1 Introduction . . . 86

A.2 The lasso problem . . . 86

A.3 Customizing the λgrid for speed improvement . . . 95

A.4 Conclusion . . . 103

A.5 References . . . 104

Appendix B Compact Storage for High–dimensional Models . . . 105

B.1 Introduction . . . 105

B.2 The key–lock method . . . 105

B.3 Conclusion . . . 111

B.4 References . . . 113

(8)

LIST OF TABLES

Table 1.1: Results Saved by thefarcommand . . . 12

Table 1.2: Fractionally resampled Anderson and Rubin test. Default Options. . . 13

Table 1.3: fartest after using the reps()and kappa()options. . . 14

Table 1.4: fartest. Use of the theta() option. . . 14

Table 1.5: fartest. Use of the ci andgrid()options. . . 16

Table 1.6: Ther(ci) matrix. . . 17

Table 1.7: fartest. Use of the kappa() andreps()options. . . 18

Table 1.8: fartest. Use of the ci andgrid()options. . . 18

Table 1.9: Ther(ci) matrix. . . 19

Table 1.10: Size of thefar test at θ0= 0. . . 23

Table 1.11: Power of thefartest at θ0= 0, covariance setup 1. . . 25

Table 2.1: Summary of the Performance of the Moment Selection Techniques . . . 48

Table 2.2: Summary of the Performance of the Post Selection Techniques . . . 49

Table 2.3: Probabilities: Moment Selection Criteria. Setup 1 . . . 52

Table 2.4: Probabilities: Moment Selection Criteria. Setup 2 . . . 53

Table 2.5: Monte Carlo results for ˆθ. Setup 1,σ_zz2 = 0.5·Iq. . . 55

Table 2.6: Monte Carlo results for ˆθ. Setup 1,σ_zz2 = 1·Iq. . . 56

Table 2.7: Monte Carlo results for ˆθ. Setup 2,σ2_zz = 0.5·Iq. . . 57

Table 2.8: Monte Carlo results for ˆθ. Setup 2,σ_zz2 = 1·Iq. . . 58

Table 2.9: R2 of the First Stage Estimation for Each Method . . . 59

Table 3.1: Simulation Results forM = 50, ρ= 0,n= 200 . . . 80

Table 3.2: Lasso and Thresholded Lasso,ρ= 0, Increasing Number of Observations . 81 Table 3.3: Lasso and Thresholded Lasso,ρ= 0, Increasing Number of Variables . . . 82

Table 3.4: Lasso and Thresholded Lasso,ρ= 0,D =Ip . . . 83

Table 3.5: Lasso and Thresholded Lasso,ρ= 0,q =X(1). . . 84

Table A.1: Parameters estimates. . . 89

Table A.2: Results forλ= 10. . . 90

Table A.3: Results after scaling. . . 92

Table A.4: Results after scaling using theexactoption. . . 93

Table A.5: Results for fixed lambda. . . 94

Table A.6: Scaling factors forλfor some common lasso problems . . . 95

Table A.7: Results with the modified grid. . . 97

Table A.8: Results from using the custom grid. . . 99

Table A.9: Comparison of the three proposed approaches. . . 101

Table B.1: Output from thebincomb() function. . . 107

Table C.1: Simulation Results forM = 50, ρ= 0.3,n= 200 . . . 115

Table C.2: Lasso and Thresholded Lasso,ρ= 0.3, Increasing Number of Observations 116 Table C.3: Lasso and Thresholded Lasso,ρ= 0.3, Increasing Number of Variables . . 117

(9)

(10)

LIST OF FIGURES

Figure 1.1: Code line using thefarcommand syntax. . . 13

Figure 1.2: Code to use thereps()and kappa() options. . . 14

Figure 1.3: Code to use thetheta() option. . . 14

Figure 1.4: Code to use theci() option. . . 15

Figure 1.5: Code to use thegrid()option. . . 15

Figure 1.6: Code to call ther(ci)matrix. . . 16

Figure 1.7: Code to modify theκ value. . . 17

Figure 1.10: Code to get the upper limit of the 95% confidence interval. . . 20

Figure 1.11: Source code of thefarcommand . . . 30

Figure A.1: Loading of thelars and glmnetpackages. . . 87

Figure A.2: Parameter setup. . . 88

Figure A.3: Data generation. . . 88

Figure A.4: Creation of thelarsand glmnetobjects. . . 88

Figure A.5: Code to get the lasso parameters. . . 89

Figure A.6: Suppression of the Standardization. . . 90

Figure A.7: Code to scale theglmnetestimator. . . 91

Figure A.8: Use of theexactoption. . . 93

Figure A.9: Scaling with fixed lambda. . . 94

Figure A.10:The default grid. . . 96

Figure A.11:Modified grid. . . 96

Figure A.12:Estimation with the modified grid. . . 96

Figure A.13:Modification of theglmnetdefault grid. . . 98

Figure A.14:Custom grid estimation . . . 98

Figure A.15:Themethod1() function. . . 99

Figure A.18:Code to compare the three proposed approaches. . . 101

Figure A.19:Execution times for each method, 1000 repetitions . . . 102

Figure B.1: Thebincomb() function. . . 106

Figure B.2: Call to thebincomb() fuunction. . . 106

Figure B.3: Size of the combination matrix. . . 107

Figure B.4: Thegencomb() function. . . 108

Figure B.5: List after using thegencomb function. . . 109

Figure B.6: Size of the combinations list. . . 109

Figure B.7: Theindexes function. . . 110

Figure B.8: Output of theindexes function. . . 110

Figure B.9: Another output of theindexes function. . . 110

Figure B.10: Therev.indexesfunction. . . 111

(11)

C h a p t e r 1 : Va l i d t e s t s w h e n i n s t r u m e n ta l va r i a b l e s d o n o t

p e r f e c t ly s at i s f y t h e e x c l u s i o n r e s t r i c t i o n

1.1 Preliminaries

This chapter is intended for the applied researcher that is not sure about the validity of their

instruments in an instrumental variables estimation setup. In particular, we consider situations

in which the instruments do not perfectly satisfy the exclusion restriction, but come close to it.

This situation is common in the applied literature when the instrumental variables approach is

used. For example Acemoglu et al. (2001) use early settler mortality data from as far back as

the fifteenth and sixteenth centuries as an instrument for contemporary institutions, Angrist

(1990) use draft lotteries as instrument for the veteran status to study lifetime earnings, and

Card (1995) uses a variable measuring whether a man grew up in the vicinity of a four–year

college as instrument for estimating the returns to schooling. In each case, however, there are

good reasons to believe that the exclusion restriction is not necessarily perfect (for technical

details see Wooldridge (2002), pp. 89–90.)

In this chapter we give a non–technical description of a modification of the Anderson–Rubin

(1949) test proposed by Berkowitz, Caner and Fang (2012) that accounts for violations of the

(12)

this test is that in small sample sizes, the resampling blocks can be modified to obtain valid

critical values. We work under the following ivsetup:

y =W B+Y θ0+u

Y =WΓ +ZΠ +V

whereyis an×1 vector of outcomes,nis the sample size,Y is an×mmatrix of endogenous

variables andZ is an×kmatrix of instruments, the magnitude of the block size modification is

data–driven and it is assumed that the violation of the exclusion restriction has a local–to–zero

form:

E[Zjuj] =

C

√

n

where C is a k×1 vector (one component for each instrument), and each element of C,

denoted by Cj (j = 1, . . . , k) is a constant. The sign of each Cj depends on the sign of the

covariance between thej-th instrument and the error term. The proof in Berkowitz, Caner and

Fang (2012) require only that Cj belong to a compact subset ofRk, but their specific values are unknown. This poses a challenge to the empirical researcher since the modification to the

resampling block sizes are depending on the value ofCj and can lead to different results when

testing the exclusion restriction.

This chapter has three main objectives:

1. To implement an estimation tool for the fartest for applied researchers.

2. To analyze the size and power properties of the far test under different covariance

structures and degrees of violation of the exclusion restriction.

3. Make use of these results to find numerical bounds for the vector C.

(13)

Using this command, extensive simulations are performed to achieve the objective number two.

These simulations provide the empirical bounds for Cj and the resampling block sizes. Our

results for small samples exhibit good size and power combinations when we select approximately

the 20–25% of the total sample for the resampling block sizes. The use of this percentages is

proposed as a rule of thumb.

The source code is provided in the last section of the chapter and can be installed directly

(14)

1.2 Introduction

Instrumental variables methods are used in economics to study major questions including the

impact of institutions on economic performance and the returns to schooling. Valid instruments

must be relevant and exogenous. In the case of relevance, there has been substantial progress

made in understanding the asymptotic properties of weak instruments. Stock and Wright (2000)

show how the Anderson and Rubin (1949) test (for herein, denoted theartest) can be used to

draw valid inferences when the instruments are weak.

In the case of exogeneity, however, there is a growing concern among researchers about the

difficulty of picking instruments that perfectly satisfy the exclusion restriction. For example, in

an influential study of the impact of institutions on long run growth, Acemoglu et al. (2001)

use early settler mortality data from as far back as the fifteenth and sixteenth centuries as an

instrument for contemporary institutions∗. Glaeser, La Porta, Lopez-de-Silanes and Shleifer (2004) argue that the early settlers brought their attitudes about education to their colonies,

affecting the long run growth through their influence on the human capital accumulation process.

In a similar manner, draft lotteries (Angrist, 1990) and whether a man grew up in the vicinity

of a four-year college (Card, 1995) are influential instruments for estimating the returns to

schooling. In each case, however, there are good reasons to believe that the exclusion restriction

is not necessarily perfect (See Wooldridge (2002), pp. 89-90).

This chapter contains a non-technical summary of the new test statistic derived in Berkowitz,

Caner and Fang (2012) for instruments that come “close” to satisfying the exclusion restriction

but do not satisfy it perfectly. In our analysis, we use the ar test because it is robust to

weak identification. However, because the ar test uses the overly strong assumption that

an instrument is perfectly exogenous it can have bad small sample properties (Caner, 2010;

∗

(15)

Guggenberger, 2012). The fractionally resampled ar(far) test modifies the artest based on

results from Wu (1990, section 2) accounting for the extent to which an instrument violates the

orthogonality condition and it is not oversized in large samples.

The rest of the chapter is as follows: In Section 1.3 we describe the artest in a setup that

allows for instruments that do not perfectly satisfy the orthogonality condition. Section 1.4

summarizes the far test and shows how the block size for the far test can be adjusted to

improve the test size and power. Section 1.5 describes the syntax and output of our user-written

stataprogram and discusses the different available options using an example from Acemoglu et al. (2001, 2011). Section 1.6 presents the results of size and power simulations under different

levels of violation of the orthogonality condition. Section 1.7 concludes.

1.3 Inferences When Instruments are not Perfectly Exogenous

Consider the following setup:

y=W B+Y θ0+u (1.1)

Y =WΓ +ZΠ +V (1.2)

In this system of equations, y is an×1 vector of outcomes, n is the sample size, Y is a

n×m matrix of endogenous variables andZ is an×k matrix of instruments. For example,y

can be long rungnp per capita, ndenotes the number of countries that are former colonies and

Y is a set of contemporary institutions. In Acemoglu and Johnson (2005)m= 2, and includes

property rights and contract enforcement. For simplicity, and without loss of generality, we

consider the case where Y is a n×1 vector of property rights institutions.

There are a host of exogenous covariates in W, which is a n×l matrix. For example, if

l = 3, then W could include gnp, human capital and temperature in 1960. The coefficients

(16)

W from the system. By using the projection matrixP =W(W0W)−1W0 we define:

yW =y−P y

ZW =Z−P Z

YW =Y −P Y

And the system of equations in (1.1) and (1.2) can be written as:

yW =YWθ0+uW (1.3)

YW =ZWΠ +VW (1.4)

thus, the vector W of covariates can be ignored.

In Acemoglu et al. (2001), the parameter of interest θ0 in equation (1.3) is the impact of

institutions on long run growth. Because long rungnpper capita also influences institutions and

there are potentially omitted variables in the residual uW that influence both institutions and gdpper capita, the variable Y is endogenous. Technically this means thatcov(YW, uW)6= 0. In

order to correct for the endogeneity of institutions, an instrument or a set of instruments,ZW is

used as an exogenous source of variation for institutions. The instruments satisfy the condition:

E[ZW iVW i0 ] = 0, i= 1, . . . , n (1.5)

Acemoglu et al. (2001) use early settler mortality data from hundreds of years ago as an

instrument for contemporary institutions.

There is large literature for drawing inferences when instruments are weak but still sufficiently

relevant (Stock and Wright, 2000) and there are now commands for implementing valid tests in

stata(see Moreira and Poi, 2003). Here we consider tests for instruments that are not perfectly exogenous, in which case the standard t-statistic and thear test for testingH0 :θ=θ0 have

(17)

as in equation (1.5). More realistically, a set of instruments may exhibit near exogeneity as

follows:

E[ZW iuW i] =

C

√

n (1.6)

Equation (1.6) allows for a slight covariance between the instruments and the error term.

C is a k×1 vector (one component for each instrument), and each element ofC, denoted by

Cj (j = 1, . . . , k) is a constant. The sign of each Cj depends on the sign of the covariance

between thej-th instrument and the error term. For example, when k= 2, then we can have

C= (−1,2)0. For simplicity and without loss of generality we assume that the upper and lower bound of the set containing theCj values are the same for all the instruments. Further technical

details are described in the section 2 of Berkowitz, Caner and Fang (2012).

In order test the null hypothesis H0 :θ =θ0, the ar-test is preferred for several reasons.

First, it can be used when the instruments are weak. Moreover, Guggenberger (2012) shows

that the ar-test is the best choice for limiting size distortion when the exclusion restriction

is slightly violated. Caner (2010) also shows that the ar-test is slightly oversized in a many

instruments framework.

Let then×1 vector of residuals of the structural equation under the null be denoteduW(θ0):

uW(θ0) =yW −YWθ0 (1.7)

Then, the ar test for testingH0:θ=θ0, assumes thatC= 0 (i.e., the instruments perfectly

satisfy the exclusion restriction). The test statistic is given by:

AR(θ0) =n×S¯n0(θ0) ˆΩ−1S¯n(θ0) (1.8)

where ˆΩ = 1_nPn

i=1ZW iZ

0

W i(uW(θ0))2 and ¯Sn(θ0) =

h_Z0

WuW(θ0) n

i

(18)

k×1 vector of estimated covariances between the instruments and the residuals in the structural

equation under the null hypothesisH0 :θ=θ0.

The limiting distribution of the ar test is central chi square with k degrees of freedom.

Berkowitz, Caner and Fang (2008, 2012) show that the ar–test over–rejects the null when the

orthogonality condition is not perfectly satisfied. Moreover, in small samples the test can be

oversized even when the correlation between the instruments and structural error is close to zero.

This size distortion gets worse as the correlation between an instrument and the structural error

terms gets stronger. This problem arises because thear–test assumes that C= 0 in equation

(1.6). In the next section we explain how the fartest accounts forC 6= 0 and, thus, allow the

researcher to draw valid, but conservative inferences.

1.4 The Fractionally Resampled

_ar

test

The far test uses Wu’s (1990) jackknife histogram estimator to recover the limits of the

population mean of θ by taking a subset of sizeb from then observations in the full sample.

There are n_b

blocks of size b with equal probability of being selected, which are drawn via

simple random sampling without replacement. To test the null hypothesisH0:θ=θ0, we need

to estimateZ_W0 uW(θ0). Following Berkowitz, Caner and Fang (2012) we use the subscript (*) to label the resampled estimates. Using this notation thefar test can be written as:

F AR(θ0) =

bS¯_b0(θ0) ˆΩ−1S¯b(θ0)

(1−f) (1.9)

where ¯Sb(θ0) = Pb

i=1Ziui

b andf is the fraction of the sample that generates the block of size

b.∗ Note that ˆΩ is obtained from the full sample and replaced by (1−f) ˆΩ in each iteration. From Theorem 1 in Berkowitz, Caner and Fang (2012), pp. 258, under suitable assumptions the

statistic Jb(t) =P∗(F AR(θ0)≤t), whereP∗ stands for the resampled probability, converges to ∗

(19)

φmf(t), the cumulative distribution of

1 +

√

f

√

1−f

2

χ2_k,nc (1.10)

where χ2_k,nc is the non-central χ2 with k degrees of freedom and noncentrality parameter nc = ₁₊₂√1

f√1−f CΩ−1C

2 . If half of the sample is resampled, f = 1/2 and the limit in (1.10) becomes:

4χ2_k+ 4C0Ω−1L+C0Ω−1C (1.11)

where L≡N(0,1) whereas the artest limit is

χ2_k+ 2C0Ω−1L+C0Ω−1C

Equation (1.11) is used for testing H0 : θ = θ0 when C 6= 0 and corrects for the size

distortions obtained in the standard artest. This version of thefar test is very conservative,

especially in small samples. To correct for this, Theorem 1 in Berkowitz, Caner and Fang (2012),

pp. 258, shows that the resampled fractionf can be modified:

fn= 1/2−κn (1.12)

Whereκn>0 is a data driven deterministic sequence converging to 0. In practice,κn=κ/ √

n

is used. For example if n= 100 andκ= 2.5 thenκn= 2.5/ √

100 = 0.25 andfn= 0.25, so each

resampling consists of 25 observations. κn= 2.5/ √

nprovides good power in our simulations,

and κn = 3/ √

n is recommended when the researcher is confident that that the instrument

comes close to perfectly satisfying the exclusion restriction.

Our user-written farcommand takes advantage of the flexibility and fast execution of the

(20)

The command is introduced in the next section.

1.5 The

far

command

1.5.1 Syntax

far depvar

varlist 1

(varlist2 = varlist iv)if

in

, reps(#) kappa(#) theta(numlist1) ci

grid(numlist2) level(#)

1.5.2 Description

The farcommand performs the Fractionally Resampled Anderson Rubin test

(Berkowitz, Caner and Fang, 2012) for the joint significance of the endogenous regressors in an

instrumental variables regression ofdepvar using the optional controls invarlist1, the endogenous

regressors invarlist2 and the instrumental variables invarlist iv.

1.5.3 Options

The following options are provided:

reps(#)specifies the number of repetitions of the resampling procedure. A large number of repetitions is necessary for the results in the section 1.4 to be valid. The default value is

reps(10000) and it gives fast and reliable estimates in small samples (n < 100). If the number of repetitions is not large enough, the fartest p-values may vary.

kappa(#)specifies the value of the κ constant. Note thatκn=κ/ √

nin equation (1.12). Any

positive real number can be used. The default value is kappa(3)(see the simulations section for the justification of the selected default value).

(21)

is not specified the farcommand will perform a significance test (all the values in numlist1 will be set as zero). By implementing this option the far test can be inverted to find

confidence intervals for θ0.

cienables the user to test for a grid of different values ofθ0 and search for the (1−α)% confidence interval for the true scalarθ. The significance level and the grid can be customized by using

the options level(#)andgrid(numlist2). This option is available when there is only one endogenous variable.

level(#)is the significance level for the test in the grid search. The default value islevel(95).

grid(numlist2)specifies the grid for the values ofθ0 to be tested. numlist2 consists of three elements: the minimum level, the maximum level and the increments of the grid. The default

values aregrid(-30, 30, 0.01).

1.5.4 Saved Results

As an r-class command, farstores the results in the Table 1.1: We use an example to illustrate the use of the farcommand.

1.5.5 Example

Acemoglu, Johnson and Robinson (2001) use two stage least squares methods to estimate

the effect of institutions on long run economic growth. Their baseline data set consists of 64

countries that are former European colonies. They use the log of per-capita gdp using ppp

(purchasing power corrected) prices (logpgp95) as the measure of long run growth, an index of

protection against expropriation during 1985-95 (avexpr) as the measure of institutions and

the log early settler mortality of colonizers (logem4) as the instrument for institutions. The

fundamental identifying assumption then is that early settler mortality influences long term

growth exclusively through the quality of contemporary institutions (see equation (1.5)).

(22)

Table 1.1: Results Saved by the farcommand

Scalars

r(ar) Full samplearStatistic r(arp) Full samplep-value

r(farp) arp-value r(reps) Resampling repetitions

r(kappa) The constantκ r(k) Number of instruments

r(l) Number of controls r(m) Number of endogenous

r(n) Number of observations variables

Macros

r(title) Title in the output r(cmdline) Executed command line

r(depvar) Dependent variable r(exogenous) List of controls

r(endogenous) List of endogenous r(instruments) List of instruments

variables r(grid) Grid values

Matrices

r(theta) Endogenous parameters tested

r(ci) Fractionally resampled p-values for the parame-ters in the grid

robustness checks is the incidence of malaria in 1994 (malfal94). There are two missing

values for this variable, which reduces the sample size to 62. This control is critical for their

exclusion restriction because it offsets the potential impact of early settler mortality through

the contemporary disease environment. However, even after controlling for the contemporary

disease environment, there are still reasons to argue that the exclusion restriction that Acemoglu,

Johnson and Robinson (2001) employ is not perfect (see, for example, Glaeser et al. 2004). Thus,

we relax the strict exclusion restriction in equation (1.5) and allow for the early settler mortality

instrument to exhibit near exogeneity as in equation (1.6). We compare the arandfartest to

examine how the potential correlation between the instrument and structural error will affect

inference∗. In the next 2 command lines we load the local data filefardata.dtaand call the ∗

Acemoglu et al. (2011) point out that the inclusion of the variablemalfal94 is “highly problematic” because the current prevalence of malaria is endogenous. In our example we includemalfal94 to show how ourfar

(23)

farcommand for the specified iv regression∗:

. use fardata, clear

. far logpgp95 malfal94 (avexp = logem4)

Figure 1.1: Code line using thefarcommand syntax.

Table 1.2: Fractionally resampled Anderson and Rubin test. Default Options.

Full sample Full sample FAR

statistic p-value p-value reps N

AR-Test 5.5421 0.0186 0.1462 10000 62

The output in the Table 1.2 displays the full sample ar statistic, the full sample and

the fractionally resampled p-values, the number of resampling repetitions and the number of

observations. In this case under the full samplear test, the hypothesis H0 :θ0= 0 is rejected

at 5% (with ap-value of 1.86%), but thefar test does not reject it. This is consistent with the

result in equation 1.11 that shows that thefar is a more conservative test.

To show other available options of thefar command we perform the same hypothesis test, but this time we increased the number of repetitions to 100,000 and set κ= 2. Note that the

null is not rejected at the 15% under thefar test after decreasing κ (see Table 1.3):

∗

(24)

. far logpgp95 malfal94 (avexp = logem4), reps(100000) kappa(2)

Figure 1.2: Code to use the reps()andkappa() options.

Table 1.3: far test after using the reps()andkappa() options.

AR-Test 5.5421 0.0186 0.1505 100000 62

To test if the θ0 parameter is equal to (say) 3, use:

. far logpgp95 malfal94 (avexp = logem4), theta(3)

Figure 1.3: Code to use the theta() option.

Table 1.4: fartest. Use of the theta()option.

(25)

Note that the p-value of the far test increases when testingH0 :θ0= 3. We have rejected

that θ0 is equal to 0 and 3 already. To look for theθ0 values for which the null hypothesis is not

rejected at some fixed α significance level we can perform a grid search. The implementation of

the grid search is done by using theci option. To test the null under the default grid∗ simply use

. far logpgp malfal94 (avexp = logem4), ci

(output omitted)

Figure 1.4: Code to use the ci() option.

We are not presenting the default grid here due to its extension.† The user can list the grid stored in ther(ci)matrix to inspect it. It is enough to say that all thefarp-values are greater than 0.05, thus the 95% confidence interval forθ0 obtained from this search is [−∞,+∞]. A

portion of the default grid can be displayed using the lines in the Figures 1.5 and 1.6.

. far logpgp95 malfal94 (avexp = logem4), ci grid(-1,1,0.1)

Figure 1.5: Code to use thegrid()option.

∗

This is equivalent to execute:

. far logpgp malfal94 (avexp = logem4), ci grid(-30, 30, 0.01) level(95) †

(26)

Table 1.5: far test. Use of the ciand grid()options.

AR-Test 5.5421 0.0186 0.1495 10000 62

. matrix list r(ci)

Figure 1.6: Code to call ther(ci) matrix.

The first column of the r(ci)matrix in the Table 1.6 contains the grid ofθ0 values defined by the grid(numlist2)option. The second column corresponds to thefartest p-values at each differentθ0 values. The third column contains a dummy variable that takes the value of 1 if the

corresponding θ0 is included in the confidence interval defined by theleveloption (this occurs if thep-value in the column 2 is greater than the critical α level).

The default confidence level corresponds to anα level of 5%, therefore the elements in the

third column will be one if the corresponding far p-value is greater than 0.05. In the next

example we derive a bounded confidence interval. In the light of the debate between Albouy

(2008) and Acemoglu, Johnson and Robinson (2001), Acemoglu, Johnson and Robinson (2011)

recommend capping the settler mortality at 250 per 1,000 per annum. We can generate a

transformed variable, estimate thearandfartest and perform the grid search in two command

lines. We increased the number of resampling repetitions to 100,000 to improve the precision of

the estimated interval and setκ= 3.1 to show the full usage of the grid search:

(27)

Table 1.6: Ther(ci) matrix.

theta FAR-p test

r1 -1 .2191 1

r2 -.9 .2166 1

r3 -.8 .2095 1

r4 -.7 .205 1

r5 -.6 .1967 1

r6 -.5 .1921 1

r7 -.4 .1791 1

r8 -.3 .1782 1

r9 -.2 .1706 1

r10 -.1 .1621 1

r11 0 .1536 1

r12 .1 .146 1

r13 .2 .1437 1

r14 .3 .1617 1

r15 .4 .2446 1

r16 .5 .3633 1

r17 .6 .6744 1

r18 .7 .9634 1

r19 .8 .7547 1

r20 .9 .6336 1

r21 1 .5617 1

. gen malaria250 = min(malfal94, 0.250) if malfal != . (2 missing values generated)

. far logpgp95 malaria250 (avexp = logem4), kappa(3.1) reps(100000)

(28)

Table 1.7: far test. Use of thekappa() and reps()options.

AR-Test 9.2185 0.0024 0.0180 100000 62

search gives a 95% confidence interval forθ0 of [0.34; 4.39]. Acemoglu et al. (2011) obtained the

confidence interval [0.27; 0.95] using the ar test and including other covariates. Ours is more

conservative, but it does not suffer the small sample problems discussed in Section 1.3.

The lower limit of the confidence interval can be obtained using the following command line:

. far logpgp95 malaria250 (avexp = logem4), kappa(3.1) reps(10000) ci > grid(.3,.5,.01)

Table 1.8: far test. Use of the ciand grid()options.

(29)

. matrix list r(ci)

Table 1.9: Ther(ci) matrix.

theta FAR-p test

r1 .3 .0402 0 r2 .31 .0397 0 r3 .32 .0426 0 r4 .33 .0464 0 r5 .34 .053 1 r6 .35 .054 1 r7 .36 .0595 1 r8 .37 .0694 1 r9 .38 .07 1 r10 .39 .0892 1 r11 .4 .0939 1 r12 .41 .0974 1 r13 .42 .1123 1 r14 .43 .1257 1 r15 .44 .1459 1 r16 .45 .1592 1 r17 .46 .1761 1 r18 .47 .2027 1 r19 .48 .2243 1 r20 .49 .2423 1 r21 .5 .27 1

By inspecting the grid in the Table 1.9, it is easy to see that the lower limit of the interval is

(30)

To make the dummy in the third column take the value 1 based on the two decimal places

roundedfarp-values, the user must set the confidence level to 95.5. In this example the rounded

lower limit is 0.33.

In a similar manner, the upper limit can be obtained by using:

. far logpgp malaria250 (avexp=logem4), reps(100000) ci kappa(3.1) > grid(4.3,4.5,0.01)

Figure 1.10: Code to get the upper limit of the 95% confidence interval.

This last result is for illustrative purposes and it needs to be carefully considered. With

a sample of 62 observations, selecting κ = 3.1 corresponds to a resampled fraction f = 0.11,

which implies a block size of size b= 7. This fraction is too small. As the block size diminishes,

the resampling technique turns into a subsampling procedure and Berkowitz, Caner and Fang

(2012) (section 4) show that as f → 0 the ar test is always oversized. We choose κ = 3.1

only because it generates a bounded interval although in our simulations we find that the best

combinations of size and power are obtained by selecting sub-sample sizes between 20–25%

of the total observations. κ values that generates f <0.2 generates unreliable and unstable

results but there should be further examination of this topic. In our example the best choice is

κ <2. The statistical implications of the estimates obtained by this smallerκ values and the

best choice of the block size are beyond the scope of this chapter.

In order to empirically obtain valid confidence intervals we suggest exploring the default

grid under κ values that corresponds to f above 0.2 to check the overall sequence of the test

(31)

trials to find the presented bounded interval for this data set. In general the confidence set can

be bounded, disjoint or even infinite if the model is misspecified, which implies that the grid

search might become excessively time consuming. We believe that the option of user-defined

grids gives the researcher enough flexibility for finding a solution that is not too time consuming

and not too computationally intensive.

1.6 Simulations

To choose the default value of the constantκ in the farcommand we simulated the system of equations in (1.3) and (1.4) under different scenarios in which the exclusion condition is violated

and explore the far test size and power properties. We choose scenarios similar to those in

Berkowitz, Caner and Fang (2012), but with smaller correlations between the structural error

and the instruments. For empirical purposes we assume that the researcher chooses imperfect

instruments that come close to satisfying the exclusion restriction, so the covariance between

the instruments and the residuals in the structural equation is very small, but nonzero. The

data forzi, ui, vi is generated from a joint normal distribution N(0,Σ) where:

Σ =



    

1 σzu 0

σzu 1 0.9

0 0.9 1



    

(1.13)

were σzu=cov(ZW iuW i).

In equation (1.13) we setup σ_z2 = σ2_u = σ_v2 = 1, σzv = 0 and σuv = 0.9. Note that the

upper-left 2×2 submatrix corresponds to our simulated version of the Ω matrix in equation

(32)

In the first setup we have σzu local to zero as in equation (1.6):

σzu =

h

√

n

and we choose h equal to 0.5 and 1 for the simulations. The largerh becomes, the worse is

the selected instrument.

The second setup corresponds toσzu constant:

σzu=D

and we chose Dequal to 0.1 and 0.25 for the simulations.

In the third setup we have σzu consistent with the bounds of the compact set containing C:

σzu=

an1/3 n1/2

and ais equal to 0.25 and 0.5 in the simulations.

To explore the size properties of the fartest we simulate one endogenous variable (m= 1),

one instrument (k = 1) and two controls (l= 2), one of them being a constant: B = (1,2)0. To model strong identification we set Π = 2 in equation (1.4). We get the similar results (not

reported) for the weak identification case. The sample size nis equal to 100 and 200 and κ is

equal to 1.5, 2, 2.5 and 3, so κn is equal to 1.5/ √

n, 2/√n, 2.5/√nand 3/√n in the equation

(1.12). We give the data a heteroskedastic structure by using the following error form:

u∗W i= abs(ZW i)uW i

Each scenario was simulated 1,000 times using 1,000 resampling iterations. The results for

(33)

patterns than those found by Berkowitz, Caner and Fang (2012). Given that our correlations

are smaller, the test is undersized whenκn is equal 1.5/ √

nand 2/√n, but but the undersize is

corrected when κn is equal 2.5/ √

n and 3/√n, especially in setups 2 and 3 when the sample

size is 100. Note that when n = 200 the far test is undersized in all the setups due to its

conservative nature.

Table 1.10: Size of thefar test at θ0= 0.

Size at 10%

n= 100 n= 200

κ= 1.5 2.0 2.5 3.0 3.5 4.0 4.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0

Setup 1: σzu=h/

√

n

h= 0.5 0.0 0.1 0.3 2.0 3.3 8.0 16.2 0.0 0.7 1.7 2.3 3.1 6.1 7.7

h= 1.0 0.0 0.2 1.3 2.0 6.7 12.3 21.4 0.3 0.7 2.3 4.8 6.3 9.4 12.3

Setup 2: σzu=D

D= 0.1 0.0 0.0 0.6 2.9 6.4 11.5 20.9 0.4 1.3 4.2 6.7 8.4 14.9 19.8

D= 0.25 0.1 1.6 5.6 12.4 24.4 36.4 51.8 7.0 14.9 29.2 36.4 50.9 59.4 64.9

Setup 3: σzu=an1/3/n1/2

a= 0.25 0.0 0.0 0.8 3.6 7.4 14.0 23.4 0.4 1.4 4.5 7.3 8.5 15.5 20.8

a= 0.5 0.1 1.6 5.2 10.4 21.9 32.1 49.2 3.1 8.7 18.7 25.0 36.0 45.1 52.2

Size at 5%

Setup 1: σzu=h/

√

n

h= 0.5 0.0 0.0 0.0 0.5 0.5 1.8 4.6 0.0 0.0 0.2 0.7 0.2 1.8 1.7

h= 1.0 0.0 0.0 0.0 0.4 1.2 2.8 5.8 0.0 0.0 0.1 0.6 0.8 2.2 4.4

Setup 2: σzu=D

D= 0.1 0.0 0.0 0.0 0.6 1.0 2.7 6.7 0.0 0.2 0.6 1.6 2.0 4.2 6.0

D= 0.25 0.0 0.1 0.4 2.5 6.2 12.8 27.5 0.1 3.0 7.2 11.4 20.0 30.1 37.1

Setup 3: σzu=an1/3/n1/2

a= 0.25 0.0 0.0 0.0 0.6 1.3 3.6 7.9 0.0 0.2 0.7 1.6 2.0 4.4 6.2

a= 0.5 0.0 0.1 0.3 1.9 5.5 10.5 23.5 0.0 1.2 4.0 7.9 12.1 18.2 25.0

(34)

To explore the power properties of the fartest, we simulate scenarios with θ0 equal to −2,

−1.5,−1,−0.5, 0.5, 1, 1.5 and 2 and tested for θ0 = 0. The results are presented in the Tables

1.11, 1.12 and 1.13. We focus on power and not in size-adjusted power because the fartest uses

only one critical value in its application. The simulation exercise shows the test has low power

when θ0 is equal to -0.5 and 0.5 and κn is equal to 1.5/ √

nand 2/√n. The power improves

whenκnis equal to 2.5/ √

nand 3/√n. Considering these results we decided to setκ= 3 as the

default value in the farcommand. Thisκ value is the one that gave us the best size and power combinations and corresponds to a resampling fractionf = 0.2 whenn= 100. The researcher

can easily adjust κ to obtain resampling fractions above the 20% of the total sample. Lowerf

values generates unreliable and unstable results as discussed in section 1.5. Further discussion

and other setups for the covariance matrix can be found in Berkowitz, Caner and Fang (2012).

1.7 Conclusion

We have shown how the fartest can be used to draw valid inferences when the instruments

do not perfectly satisfy the exclusion condition. Our simulations for n = 100 exhibit good

size and power combinations when we select approximately the 20–25% of the total sample

for the resampling block sizes. This corresponds to κn = 3/ √

n in equation (1.6). κ values

that generate smaller block sizes are not recommended. By taking advantage of the speed of

(35)

Table 1.11: Power of thefar test at θ0= 0, covariance setup 1.

θ

-2 -1.5 -1 -0.5 0.5 1 1.5 2

κ h n= 100

1.5 0.5 96.8 95.4 84.7 14.4 16.8 72.0 90.1 92.2

1.0 97.3 93.8 84.5 16.4 9.0 72.8 90.7 93.5

2.0 0.5 99.7 99.1 98.0 51.8 54.5 95.1 98.7 99.6

1.0 99.4 99.3 98.3 59.1 46.1 94.3 98.9 99.5

2.5 0.5 100.0 99.7 99.2 78.4 80.3 98.9 99.9 99.8

1.0 99.9 100.0 99.4 82.1 74.2 98.7 99.7 99.9

3.0 0.5 100.0 99.9 99.8 88.5 89.8 99.5 99.7 100.0

1.0 100.0 100.0 99.5 92.6 85.7 99.5 99.9 100.0

3.5 0.5 100.0 100.0 99.9 92.4 94.2 99.6 99.9 100.0

1.0 100.0 100.0 99.8 93.6 92.1 99.8 100.0 100.0

4.0 0.5 100.0 100.0 100.0 96.2 97.1 99.8 100.0 100.0

1.0 100.0 99.9 99.9 97.2 96.6 99.8 99.9 100.0

4.5 0.5 100.0 100.0 100.0 98.0 98.7 100.0 99.8 99.9

1.0 100.0 100.0 99.9 98.8 98.0 99.8 99.9 100.0

n= 200

1.5 0.5 99.9 98.0 91.5 8.9 5.6 77.9 94.6 98.3

1.0 99.0 98.7 91.9 5.1 8.9 80.7 95.5 98.3

2.0 0.5 100.0 99.9 99.8 59.6 52.7 99.1 99.7 99.9

1.0 100.0 100.0 99.7 50.0 62.9 99.2 99.9 100.0

2.5 0.5 100.0 100.0 99.9 87.0 86.3 99.7 100.0 100.0

1.0 100.0 100.0 99.9 83.7 90.0 99.8 100.0 100.0

3.0 0.5 100.0 100.0 100.0 96.7 97.2 100.0 100.0 100.0

1.0 100.0 100.0 100.0 95.2 97.8 100.0 100.0 100.0

3.5 0.5 100.0 100.0 100.0 98.9 99.2 100.0 100.0 100.0

1.0 100.0 100.0 100.0 98.2 99.2 100.0 100.0 100.0

4.0 0.5 100.0 100.0 100.0 99.0 99.7 100.0 100.0 100.0

1.0 100.0 100.0 100.0 98.8 99.3 100.0 100.0 100.0

4.5 0.5 100.0 100.0 100.0 100.0 99.7 100.0 100.0 100.0

1.0 100.0 100.0 99.9 99.6 99.9 100.0 100.0 100.0

5.0 0.5 100.0 100.0 100.0 99.9 99.7 100.0 100.0 100.0

1.0 100.0 100.0 100.0 99.8 99.7 100.0 100.0 100.0

5.5 0.5 100.0 100.0 100.0 100.0 99.9 100.0 100.0 100.0

1.0 100.0 100.0 100.0 99.6 100.0 100.0 100.0 100.0

6.0 0.5 100.0 100.0 100.0 99.9 100.0 100.0 100.0 100.0

1.0 100.0 100.0 100.0 100.0 99.9 100.0 100.0 100.0

Setup 1 corresponds tocov[ZW iuW i] =h/

√

(36)

θ

-2 -1.5 -1 -0.5 0.5 1 1.5 2

κ D n= 100

1.5 0.1 97.2 95.5 84.0 10.9 20.3 73.1 89.9 92.2

0.25 98.5 96.2 82.3 2.3 28.7 77.4 91.9 92.6

2.0 0.1 99.8 99.5 97.8 43.7 62.0 95.2 98.7 99.5

0.25 99.8 99.4 97.1 21.7 72.6 95.9 98.8 99.1

2.5 0.1 100.0 99.7 99.1 70.8 84.7 99.0 99.9 99.9

0.25 99.9 100.0 99.3 46.3 93.7 99.3 99.8 99.8

3.0 0.1 100.0 99.9 99.8 83.9 91.6 99.7 99.8 99.9

0.25 100.0 99.9 99.6 63.9 95.7 99.2 99.9 100.0

3.5 0.1 100.0 100.0 99.9 89.2 95.8 99.7 99.9 100.0

0.25 100.0 100.0 99.8 72.7 97.4 99.8 99.9 100.0

4.0 0.1 100.0 100.0 100.0 94.6 98.0 99.8 100.0 100.0

0.25 100.0 100.0 99.9 82.6 99.4 99.9 99.9 99.9

4.5 0.1 100.0 100.0 100.0 96.7 99.1 100.0 99.9 99.9

0.25 100.0 100.0 100.0 88.1 99.6 99.9 100.0 100.0

n= 200

1.5 0.1 99.9 98.1 91.5 5.2 10.7 79.5 94.7 98.5

0.25 99.4 98.9 90.2 0.4 24.8 84.8 95.4 98.4

2.0 0.1 100.0 99.9 99.8 45.7 65.9 99.2 99.6 99.9

0.25 100.0 100.0 99.6 13.9 85.1 99.6 99.9 100.0

2.5 0.1 100.0 100.0 99.9 77.4 91.8 99.8 100.0 100.0

0.25 100.0 100.0 99.9 45.0 96.3 99.8 100.0 100.0

3.0 0.1 100.0 100.0 100.0 92.4 98.6 100.0 100.0 100.0

0.25 100.0 100.0 100.0 65.8 99.5 100.0 100.0 100.0

3.5 0.1 100.0 100.0 100.0 97.0 99.5 100.0 100.0 100.0

0.25 100.0 100.0 100.0 82.2 99.9 100.0 100.0 100.0

4.0 0.1 100.0 100.0 100.0 98.0 100.0 100.0 100.0 100.0

0.25 100.0 100.0 100.0 88.6 99.9 100.0 100.0 100.0

4.5 0.1 100.0 100.0 100.0 99.6 99.8 100.0 100.0 100.0

0.25 100.0 100.0 99.9 94.1 99.9 100.0 100.0 100.0

5.0 0.1 100.0 100.0 100.0 99.8 99.9 100.0 100.0 100.0

0.25 100.0 100.0 100.0 95.8 99.9 100.0 100.0 100.0

5.5 0.1 100.0 100.0 100.0 100.0 99.9 100.0 100.0 100.0

0.25 100.0 100.0 100.0 96.3 100.0 100.0 100.0 100.0

6.0 0.1 100.0 100.0 100.0 99.9 100.0 100.0 100.0 100.0

0.25 100.0 100.0 100.0 97.8 99.9 100.0 100.0 100.0

Setup 2 corresponds tocov[ZW iuW i] =D. The corresponding correction factor for

(37)

θ

-2 -1.5 -1 -0.5 0.5 1 1.5 2

κ a n= 100

1.5 .25 97.3 95.3 83.9 9.8 21.3 73.3 89.6 92.0

.5 98.4 96.1 82.3 2.7 27.1 77.5 91.8 92.8

2.0 .25 99.9 99.5 98.0 41.8 64.1 95.4 98.7 99.5

.5 99.7 99.4 97.2 24.2 71.5 95.9 98.7 99.2

2.5 .25 100.0 99.7 99.2 67.8 85.6 99.0 99.9 99.9

.5 99.9 100.0 99.3 49.4 92.8 99.3 99.8 99.8

3.0 .25 100.0 99.9 99.7 81.1 92.3 99.7 99.8 99.9

.5 100.0 99.9 99.6 66.3 95.6 99.2 99.9 100.0

3.5 .25 100.0 100.0 99.9 87.8 96.0 99.7 99.9 100.0

.5 100.0 100.0 99.8 75.5 97.1 99.8 99.9 100.0

4.0 .25 100.0 100.0 100.0 94.0 98.4 99.8 100.0 100.0

.5 100.0 100.0 99.9 84.9 99.2 99.9 99.9 100.0

4.5 .25 100.0 100.0 100.0 96.0 99.3 100.0 99.9 99.9

.5 100.0 100.0 100.0 89.3 99.6 99.9 100.0 100.0

n= 200

1.5 .25 99.9 98.2 91.4 5.2 11.1 79.5 94.7 98.5

.5 99.3 98.9 91.0 0.6 20.5 83.5 95.6 98.5

2.0 .25 100.0 99.9 99.8 44.9 66.2 99.2 99.6 99.9

.5 100.0 100.0 99.6 20.0 81.5 99.3 99.9 100.0

2.5 .25 100.0 100.0 99.9 76.9 91.8 99.8 100.0 100.0

.5 100.0 100.0 99.9 55.3 95.8 99.8 100.0 100.0

3.0 .25 100.0 100.0 100.0 92.3 98.6 100.0 100.0 100.0

.5 100.0 100.0 100.0 76.3 99.4 100.0 100.0 100.0

3.5 .25 100.0 100.0 100.0 96.8 99.5 100.0 100.0 100.0

.5 100.0 100.0 100.0 88.6 99.9 100.0 100.0 100.0

4.0 .25 100.0 100.0 100.0 98.0 100.0 100.0 100.0 100.0

.5 100.0 100.0 100.0 92.5 99.8 100.0 100.0 100.0

4.5 .25 100.0 100.0 100.0 99.6 99.8 100.0 100.0 100.0

.5 100.0 100.0 99.9 97.3 99.8 100.0 100.0 100.0

5.0 .25 100.0 100.0 100.0 99.7 99.9 100.0 100.0 100.0

.5 100.0 100.0 100.0 98.1 99.9 100.0 100.0 100.0

5.5 .25 100.0 100.0 100.0 100.0 99.9 100.0 100.0 100.0

.5 100.0 100.0 100.0 98.2 100.0 100.0 100.0 100.0

6.0 .25 100.0 100.0 100.0 99.8 100.0 100.0 100.0 100.0

.5 100.0 100.0 100.0 99.0 99.9 100.0 100.0 100.0

Setup 3 corresponds tocov[ZW iuW i] =an1/3/n1/2. The corresponding correction

(38)

1.8 References

Acemoglu, D., S. Johnson, and James A. Robinson (2001) “The Colonial Origins of Comparative Development: An Empirical Investigation,”American Economic Review, Vol. 91, pp. 1369–1401.

(2008) “Reply to the Revised (2008) version of David Albouy’s “The Colonial Origins of Comparative Development: An Investigation of the Settler Mortality Data,”.”

Acemoglu, Daron and Simon Johnson (2005) “Unbundling Institutions,” Journal of Political

Economy, Vol. 113, pp. 949–995.

Acemoglu, Daron, Simon Johnson, and James A. Robinson (2011) “Hither Thou Shalt Come, But No Further: Reply to “The Colonial Origins of Comparative Development: An Empirical Investigation: Comment”,” Working Paper 16966, National Bureau of Economic Research.

Albouy, David Y. (2008) “The Colonial Origins of Comparative Development: An Investigation of the Settler Mortality Data,” Working Papers series 14130, National Bureau of Economic Research, Inc.

Anderson, T. W. and H. Rubin (1949) “Estimation of the Parameters of a Single Equation in a Complete System of Stochastic Equations,” The Annals of Mathematical Statistics, Vol. 20, pp. 46–63.

Angrist, Joshua D (1990) “Lifetime Earnings and the Vietnam Era Draft Lottery: Evidence from Social Security Administrative Records,” American Economic Review, Vol. 80, pp. 313–336.

Berkowitz, Daniel, Mehmet Caner, and Ying Fang (2008) “Are “Nearly Exogenous Instruments” reliable?” Economics Letters, Vol. 101, pp. 20–23.

(2012) “The Validity of Instruments Revisited,” Journal of Econometrics, Vol. 166, pp. 255–266.

Caner, M. (2010) “Near Exogeneity and Weak Identification in Generalized Empirical Likelihood Estimators: Many Moment Asymptotics,” Working Paper, Department of Economics, North

Carolina State University.

Card, D. (1995) “Using Geographic Variation in College Proximity to Estimate the Returns to Schooling.,” In L.N. Christofides et al editors, Aspects of Labour Market Behaviour: Essays in

honor of John Vanderkamp. University of Toronto Press:, pp. 201–221.

Glaeser, Edward L., Rafael La Porta, Florencio Lopez de Silanes, and Andrei Shleifer (2004) “Do Institutions Cause Growth?” Journal of Economic Growth, Vol. 9, pp. 271–303.

(39)

Moreira, M. J. and B. P. Poi (2003) “Implementing Tests With Correct Size in the Simultaneous Equations Model,” Stata Journal, Vol. 3, pp. 57–70.

Riquelme, A., Berkowitz, D, and Caner, M. (2000) “Valid tests when instrumental variables do not perfectly satisfy the exclusion restriction,” Stata Journal, Vol. 13, pp. 528–546.

Stock, James H and Jonathan H Wright (2000) “GMM with weak identification,”Econometrica, Vol. 68, pp. 1055–1096.

Wooldridge, Jeffrey M. (2002) Econometric Analysis of Cross Section and Panel Data, Cam-bridge, MA: MIT Press.

(40)

(41)

*! V e r s i o n 1.0 J . A n d r e s R i q u e l m e 13 m a y 2 0 1 2

/* T h i s p r o g r a m p e r f o r m s t h e F r a c t i o n a l l y R e s a m p l e d A n d e r s o n R u b i n t e s t

( B e r k o w i t z , D . , Caner , M . a n d Y . Fang , 2 0 1 2 : T h e v a l i d i t y of i n s t r u m e n t s r e v i s i t e d . J o u r n a l of E c o n o m e t r i c s 166 , pp . 2 5 5 - 2 6 6 .

P r o g r a m m e d by : J . A n d r e s R i q u e l m e . P l e a s e s e n d a n y c o m m e n t to : j a r i q u e l @ n c s u . e d u */

c a p t u r e p r o g r a m d r o p far p r o g r a m d e f i n e far , r c l a s s

v e r s i o n 1 0 . 0

s y n t a x a n y t h i n g [ if ] [ in ] [ , r e p s ( i n t e g e r 1 0 0 0 0 ) k a p p a ( r e a l 3) t h e t a ( n u m l i s t ) ci l e v e l ( r e a l 95) G R i d ( n u m l i s t ) ]

l o c a l 0 ‘ a n y t h i n g ’

g e t t o k e n e x o g 0 : 0 , p a r s e ( " ( " ) g e t t o k e n d e p v a r e x o g : e x o g g e t t o k e n tmp 0 : 0 , p a r s e ( " ( " ) g e t t o k e n e n d o g 0 : 0 , p a r s e ( " = " ) g e t t o k e n tmp 0 : 0 , p a r s e ( " = " ) g e t t o k e n i n s t r 0 : 0 , p a r s e ( " ) " ) // B e g i n i n p u t c h e c k i n g < -// C h e c k if t h e r e a r e a n y c o n t r o l s

if ( w o r d c o u n t ( " ‘ exog ’ " ) == 0) l o c a l e x o g " " // i m m e d i a t e l y d e l e t e a n y b l a n k c h a r s // C h e c k f o r b e t a i n p u t ( no i n p u t m e a n s s i g n i f i c a n c e t e s t )

c a p t u r e m a t r i x d r o p _ _ t h e t a t e m p v a r _ t h e t a

l o c a l mm = w o r d c o u n t ( " ‘ endog ’ " )

m a t r i x _ _ t h e t a = J ( ‘ mm ’ , 1 , 0) // d e f a u l t if ( " ‘ theta ’ " != " " ) {

l o c a l bb = w o r d c o u n t ( " ‘ theta ’ " ) if ( ‘ mm ’ != ‘ bb ’) {

n o i s i l y : d i s p l a y " { e r r o r :{ bf : E R R O R -} { it : t h e t a () }: N u m b e r of g i v e n p a r a m e t e r s m u s t be e q u a l to the n u m b e r of e n d o g e n o u s v a r i a b l e s .} "

e r r o r ( 4 9 9 ) // c o d e 4 9 9 : g e n e r i c e r r o r }

f o r v a l u e s j = 1/ ‘ bb ’ {

l o c a l _a : w o r d ‘j ’ of ‘ theta ’ m a t r i x _ _ t h e t a [ ‘ j ’ , 1] = ‘ _a ’ }

}

// C h e c k C o n f i d e n c e i n t e r v a l i n p u t : if ( ‘ level ’ < 0 | ‘ level ’ > 1 0 0 ) {

n o i s i l y : d i s p l a y " { e r r o r :{ bf : E R R O R -} { it : l e v e l () }: C o n f i d e n c e l e v e l m u s t be b e t w e e n 0 and 1 0 0 . } "

// C h e c k t h e " g r i d " // d e f a u l t g r i d : l o c a l g r _ l o w -30 l o c a l g r _ u p p 30 l o c a l g r _ i n c r = .01 if ( " ‘ grid ’ " != " " ) {

if ( w o r d c o u n t ( " ‘ grid ’ " ) != 3) {

n o i s i l y : d i s p l a y " { e r r o r :{ bf : E R R O R -} { it : g r i d () }: G r i d o p t i o n r e q u i r e s 3 i n p u t s .} "

l o c a l g r _ l o w : w o r d 1 of ‘ grid ’ l o c a l g r _ u p p : w o r d 2 of ‘ grid ’ l o c a l g r _ i n c r : w o r d 3 of ‘ grid ’ if ( ‘ gr_low ’ >= ‘ gr_upp ’ ) {

n o i s i l y : d i s p l a y " { e r r o r :{ bf : E R R O R -} { it : g r i d () }: L o w e r l i m i t m u s t be l o w e r t h a n u p p e r l i m i t .} "

(42)

n o i s i l y : d i s p l a y " { e r r o r :{ bf : E R R O R -} { it : g r i d () }: I n c r e m e n t m u s t be a p o s i t i v e i n t e g e r .} "

}

if ( " ‘ ci ’ " != " " & w o r d c o u n t ( " ‘ theta ’ " ) > 1) {

n o i s i l y : d i s p l a y " { e r r o r :{ bf : E R R O R -} { it : ci }: C o n f i d e n c e I n t e r v a l o p t i o n v a l i d o n l y w i t h one (1) e n d o g e n o u s v a r i a b l e .} "

// C h e c k t h e n u m b e r of r e p e t i t i o n s : if ‘ reps ’ <= 0 {

n o i s i l y : d i s p l a y " { e r r o r :{ bf : E R R O R -} { it : r e p s () } m u s t be a p o s i t i v e i n t e g e r .} " e r r o r ( 4 9 9 ) // c o d e 4 9 9 : g e n e r i c e r r o r

}

// C h e c k t h e k a p p a v a l u e : if ( ‘ kappa ’ < 0 ) {

n o i s i l y : d i s p l a y " { e r r o r :{ bf : E R R O R -} { it : k a p p a } out of range , it has to be a p o s i t i v e n u m b e r .} "

// - - - > E n d I n p u t C h e c k i n g m a r k s a m p l e t o u s e

m a t a : f a r _ 1 ( " ‘ depvar ’ " , " ‘ exog ’ " , " ‘ endog ’ " , " ‘ instr ’ " , " ‘ touse ’ " , ‘ reps ’ , ‘ kappa ’ , // /

" ‘ ci ’ " , ‘ level ’ , ‘ gr_low ’ , ‘ gr_upp ’ , ‘ gr_incr ’ ) // s e n d v a r s to m a t a

// O u t p u t p r e p a r a t i o n a n d d i s p l a y .

l o c a l t i t l e " F r a c t i o n a l l y r e s a m p l e d A n d e r s o n and R u b i n t e s t " l o c a l n_ : d i s p l a y % 7 . 0 f ‘ r ( N ) ’

l o c a l f a r p _ : d i s p l a y % 9 . 4 f ‘ r ( F A R p ) ’ l o c a l a r p _ : d i s p l a y % 9 . 4 f ‘ r ( ARp ) ’ l o c a l ar_ : d i s p l a y % 9 . 4 f ‘ r ( AR ) ’ l o c a l r e p s _ : d i s p l a y % 9 . 0 f ‘ r ( R E P S ) ’ d i s p l a y " "

d i s p l a y " { i n p u t : ‘ title ’.} "

d i s p l a y " { h l i n e 1 5 } { c TT }{ h l i n e 60} "

d i s p l a y " { i n p u t : { c |} F u l l s a m p l e F u l l s a m p l e FAR } "

d i s p l a y " { i n p u t : { c |} s t a t i s t i c p - v a l u e p - v a l u e r e p s N } "

d i s p l a y " { h l i n e 1 5 } { c +}{ h l i n e 60} "

d i s p l a y " { i n p u t : AR - t e s t { c |}} { r e s u l t : ‘ ar_ ’} { r e s u l t : ‘ arp_ ’} { r e s u l t : ‘ farp_ ’} { r e s u l t : ‘ reps_ ’} { r e s u l t : ‘ n_ ’} "

d i s p l a y " { h l i n e 1 5 } { c BT }{ h l i n e 60} " r e t u r n s c a l a r l e v e l = ‘ level ’

r e t u r n s c a l a r n = r ( N ) r e t u r n s c a l a r m = r ( M ) r e t u r n s c a l a r l = r ( L ) r e t u r n s c a l a r k = r ( K )

r e t u r n s c a l a r k a p p a = r ( k a p p a ) r e t u r n s c a l a r r e p s = r ( R E P S ) r e t u r n s c a l a r f a r p = r ( F A R p ) r e t u r n s c a l a r arp = r ( ARp ) r e t u r n s c a l a r ar = r ( AR )

if ( ‘ reps ’ != 1 0 0 0 ) l o c a l o p t r " r e p s ( ‘ reps ’) " if ( ‘ kappa ’ != 3) l o c a l o p t k " c ( ‘ kappa ’) " if ( " ‘ theta ’ " != " " ) l o c a l o p t b " t h e t a ( ‘ theta ’) " if ( ‘ level ’ != 95 ) l o c a l o p t l " l e v e l ( ‘ level ’) "

if ( " ‘ grid ’ " != " " ) l o c a l o p t g " g r i d ( ‘ gr_low ’ , ‘ gr_upp ’ , ‘ gr_incr ’) " r e t u r n l o c a l g r i d ‘ grid ’

(43)

r e t u r n l o c a l c m d l i n e " far ‘ a n y t h i n g ’ ‘ if ’ ‘ in ’ ‘ optc ’ ‘ optr ’ ‘ optk ’ ‘ optb ’ ‘ ci ’ ‘ optl ’ ‘ optg ’ "

r e t u r n l o c a l t i t l e ‘ title ’ r e t u r n m a t r i x t h e t a _ _ t h e t a

m a t r i x c o l n a m e s rci = t h e t a FAR - p t e s t r e t u r n m a t r i x ci rci

end

c a p t u r e m a t a m a t a d r o p f a r _ 1 () m a t a

v o i d f a r _ 1 ( s t r i n g s c a l a r depvar , s t r i n g s c a l a r exog , s t r i n g s c a l a r endog , // / s t r i n g s c a l a r instr , s t r i n g s c a l a r touse , // /

r e a l nb , r e a l kappa , s t r i n g s c a l a r ci , r e a l level , r e a l gr_low , r e a l gr_upp , r e a l g r _ i n c r )

{

r e a l m a t r i x M , Y , Z , W r e a l c o l v e c t o r y

r e a l s c a l a r n , m , ar , far M = Y = Z = W = y =.

t h e t a = s t _ m a t r i x ( " _ _ t h e t a " )

s t _ v i e w ( y , . , t o k e n s ( d e p v a r ) , t o u s e ) if ( e x o g != " " ) {

s t _ v i e w ( W , . , t o k e n s ( e x o g ) , t o u s e ) }

e l s e {

W = J ( r o w s ( y ) ,1 ,1) // d u m m y s i n g l e c o l u m n to c h e c k f o r m i s s i n g s }

s t _ v i e w ( Y , . , t o k e n s ( e n d o g ) , t o u s e ) s t _ v i e w ( Z , . , t o k e n s ( i n s t r ) , t o u s e ) // E l i m i n a t e m i s s i n g v a l u e s f r o m t h e s a m p l e

M = y , W , Y , Z t f i l t e r = 1:: r o w s ( M )

for ( i =1; i <= r o w s ( M ) ; i ++) { for ( j =1; j <= c o l s ( M ) ; j ++) {

if (!( M [ i , j ] <.) ) t f i l t e r [ i ] = . }

}

f i l t e r = min ( t f i l t e r )

for ( k = min ( f i l t e r ) +1; k <= r o w s ( t f i l t e r ) ; k ++) {

if ( t f i l t e r [ k ,1] != .) f i l t e r = f i l t e r \ t f i l t e r [ k ,1] }

f i l t e r = filter ’ n = l e n g t h ( f i l t e r )

M = M [( f i l t e r ) , 1.. c o l s ( M ) ] y = y [( f i l t e r ) , 1.. c o l s ( y ) ] // G e t t h e E n d o g e n o u s

Y = Y [( f i l t e r ) , 1.. c o l s ( Y ) ]

m = c o l s ( Y ) // m : n u m b e r of e n d o g e n o u s // C h e c k c o l l i n e a r i t y : e n d o g e n o u s

if ( r a n k ( Y ) < m ) {

e r r p r i n t f ( " E r r o r : C o l l i n e a r e n d o g e n o u s v a r i a b l e s \ n " ) e x i t ( 4 9 9 )

}

// G e t t h e C o n t r o l s

if ( e x o g == " " ) W = J ( n ,1 ,1) // no c o n t r o l s

if ( e x o g != " " ) W = J ( n ,1 ,1) , W [( f i l t e r ) , 1.. c o l s ( W ) ] // c o n t r o l s l = c o l s ( W ) // l : n u m b e r of c o n t r o l s

// C h e c k f o r c o l l i n e a r i t y : c o n t r o l s if ( e x o g != " " & r a n k ( W ) < l ) {

e r r p r i n t f ( " E r r o r : C o l l i n e a r c o n t r o l s \ n " ) e x i t ( 4 9 9 )

}

// G e t t h e I n s t r u m e n t s

Z = Z [( f i l t e r ) , 1.. c o l s ( Z ) ]

k = c o l s ( Z ) // k : n u m b e r of i n s t r u m e n t s // C h e c k f o r c o l l i n e a r i t y : i n s t r u m e n t s

if ( r a n k ( Z ) < k ) {

(44)

e x i t ( 4 9 9 ) }

// P r o j e c t o u t c o n t r o l s

P = W * i n v s y m ( W ’* W ) * W ’ // P r o j e c t i o n m a t r i x Yw = Y - P * Y

yw = y - P * y Zw = Z - P * Z

// E s t i m a t e s w i t h o u t c o n f i d e n c e i n t e r v a l

r e s u l t s = f a r _ 2 ( yw , Yw , Zw , n , m , kappa , nb , t h e t a ) ar = r e s u l t s [1]

arp = r e s u l t s [2] f a r p = r e s u l t s [3]

// E x p o r t r e s u l t s to m a i n r o u t i n e s t _ r c l e a r ()

s t _ n u m s c a l a r ( " r ( N ) " , n ) s t _ n u m s c a l a r ( " r ( L ) " , l ) s t _ n u m s c a l a r ( " r ( M ) " , m ) s t _ n u m s c a l a r ( " r ( K ) " , k ) s t _ n u m s c a l a r ( " r ( AR ) " , ar ) s t _ n u m s c a l a r ( " r ( F A R p ) " , f a r p ) s t _ n u m s c a l a r ( " r ( ARp ) " , arp ) s t _ n u m s c a l a r ( " r ( R E P S ) " , nb ) s t _ n u m s c a l a r ( " r ( k a p p a ) " , k a p p a ) // C o n f i d e n c e i n t e r v a l

g r i d = ( t h e t a [1 ,1] , farp , farp >= l e v e l / 1 0 0 ) if ( ci != " " ) {

g r i d = r a n g e ( gr_low , gr_upp , g r _ i n c r )

g r i d = g r i d , J ( r o w s ( g r i d ) , 2 ,0) // M a k e t h e g r i d a n d r e s u l t s for ( k =1; k <= r o w s ( g r i d ) ; k ++) {

r e s u l t s = f a r _ 2 ( yw , Yw , Zw , n , m , kappa , nb , g r i d [ k , 1 ] ) g r i d [ k ,2] = r e s u l t s [3]

g r i d [ k ,3] = g r i d [ k ,2] >= (100 - l e v e l ) / 1 0 0 } }

s t _ m a t r i x ( " rci " , g r i d ) }

end

c a p t u r e m a t a m a t a d r o p f a r _ 2 () m a t a

f u n c t i o n f a r _ 2 ( r e a l m a t r i x yw , r e a l m a t r i x Yw , r e a l m a t r i x Zw , // / r e a l n , r e a l m , r e a l kappa , r e a l nb , r e a l m a t r i x b0 )

{

// F u l l S a m p l e H e t e r o s k e d a s t i c i t y r o b u s t A n d e r s o n R u b i n T e s t err = Zw :*( yw - Yw * b0 ) // S t r u c t u r a l e r r o r

vc = ( err ’* err ) / n // E s t i m a t e d c o v a r i a n c e m a t r i x

ar = (( yw - Yw * b0 ) ’* Zw * i n v s y m ( vc ) * Zw ’*( yw - Yw * b0 ) ) / n // H e t . r o b u s t A n d e r s o n R u b i n t e s t // F A R v a l u e s

f = 1/2 - k a p p a / s q r t ( n ) // F r a c t i o n of o r i g i n a l s a m p l e to be r e s a m p l e d b = c e i l ( n * f ) // B l o c k s i z e

f = b / n

arb = J ( nb ,1 ,0) // A n d e r s o n - R u b i n t e s t m a t r i x pre - a l l o c a t i o n for ( i =1; i <= nb ; i ++) {

blq = J ( b ,1 ,0)

r1 = c e i l (( n :: n - b +1) :* u n i f o r m ( b ,1) ) + ( 0 : : b -1) // R a n d o m s a m p l e ( w i t h r e p e t i t i o n ) r2 = 1:: n

for ( h =1; h < b ; h ++) {

r2 [( h , r1 [ h ]) ] = r2 [( r1 [ h ] , h ) ] // R a n d o m r e s a m p l i n g }

blq = r2 [ ( 1 . . b ) ,1] // R a n d o m r e s a m p l e ( no r e p e t i t i o n ) e r r 2 = yw - Yw * b0

arb [ i ] = ( b /(1 - f ) ) *(( e r r 2 [ blq ] ’* Zw [ blq , . ] ) / b ) * i n v s y m ( vc ) *(( Zw [ blq ,.] ’* e r r 2 [ blq ]) / b ) }

arb = s o r t ( arb ,1)

arp = 1 - c h i 2 ( m , ar ) // A n d e r s o n - R u b i n p v a l u e e s t i m a t e

f a r p = m e a n ( ar : <= arb ) // R e s a m p l e d A n d e r s o n R u b i n p v a l u e e s t i m a t e r e t u r n ( ar , arp , f a r p )

(45)

C h a p t e r 2 : M o m e n t a n d I V S e l e c t i o n A p p r o a c h e s : A

C o m pa r at i v e S i m u l at i o n S t u d y

2.1 Preliminaries

The focus of the first chapter was the problem of the inference when the instrumental variables

do not that perfectly satisfy the exclusion restriction, but are still used because of the difficulty

of finding alternative ones. The underlying problem was that the best the researcher could do

was work with a set of imperfect instruments. In this chapter a different but related problem

is addressed: when the researcher has a large –and probably diverging– number of candidate

instruments and only some of them are valid, two concerns arise: (1) the inclusion of invalid

instruments generate bias in the estimations, and (2) it is know the fact thatgmm estimators

have poor finite sample properties in highly overidentified models. Under this circumstances,

the relevant question is how to pick up the valid instruments and discard the invalid ones? We

refer to this issue as the moment selection problem, which may be adjudicated statistically with

the J-test which indicates whether overidentified restrictions are valid. If the null hypothesis is

rejected, the researcher would need to employ a moment selection criteria (msc) that would

allow distinguishing between the valid and invalid moment conditions.

(46)

we focus on information-based methods and review three of the moment selection criteria used in

the current literature: (i) the shrinkage procedure as in Liao (2013), (ii) the information-based

criteria with gmmin Andrews (1999), and (iii) the information based criterion using generalized

empirical likelihood of Hong, Preston and Shum (2003).

The main objective of this chapter is to compare the aforementioned msc in a fairly

comprehensive manner. By using Monte Carlo simulations we compare these methods in their

performance in selecting valid moments in linear settings under several relevant scenarios: small

and large sample sizes, fixed and increasing number of moment conditions, weak and strong

identification, local-to-zero moment conditions, homoskedastic and heteroskedastic errors. We

also analyze the second stage estimation performance of each msc by considering the finite

sample properties of structural parameter estimators. To this end, we employ Okui (2011)

model averaging technique to get better Mean Squared Error, and smaller bias for the structural

parameters. We find that the Adaptive Lasso in the model selection stage, coupled with either

unpenalized gmmor moment averaging of Okui delivers generally the smallest rmse for the

second stage coefficient estimators.

2.2 Introduction

It is not uncommon to encounter a large number of instruments or moment conditions in

the applications of instrumental variables (iv) or the Generalized Method of Moments (gmm)

estimators. Some ivs or moments may be invalid, but the researcher does not know a priori

which ones. This problem may be adjudicated statistically with the J-test which indicates

whether overidentified restrictions are valid. If the null is rejected, the researcher would need

to a moment selection technique that would allow distinguishing between the valid and invalid

moment conditions. A few techniques have been proposed, each one with advantages (for