Linear IV Model - Additional Examples - Dennis_unc_0153D

3.6 Additional Examples

3.6.4 Linear IV Model

Andrews and Cheng (2012a,b) demonstrate that the linear instrumental variable model

$$y_{1,i} = y_{2,i}\pi + u_i^*, \qquad y_{2,i} = Z_i'\beta + v_i^*$$

fits within their framework when estimated via limited information maximum likelihood (LIML). In particular, the reduced-form equations $y_{1,i} = \pi \cdot Z_i'\beta + u_i$ and $y_{2,i} = Z_i'\beta + v_i$, with $u_i = u_i^* + \pi v_i^*$, $v_i = v_i^*$, and $(u_i, v_i)' \sim N(0, Y)$, are estimated by minimizing the Gaussian likelihood criterion

$$Q_n(\theta) = \log|Y| + \frac{1}{n}\sum_{i=1}^{n} \varepsilon_i(\beta,\pi)'\, Y^{-1} \varepsilon_i(\beta,\pi),$$

where $\varepsilon_i(\beta,\pi) = \big(y_{1,i} - \pi \cdot Z_i'\beta,\; y_{2,i} - Z_i'\beta\big)'$. As in the discussion of the nonlinear binary choice model, their theory accommodates only a single endogenous covariate in this setup, because it does not allow for mixed identification strength in $\pi$. The theory developed in this paper, however, can be used to analyze models in this setup with more than one endogenous covariate.¹⁴ Consider the structural model

$$y_i = x_{1,i}\pi_1 + x_{2,i}\pi_2 + u_i^*, \qquad x_{1,i} = Z_{1,i}'\beta_1 + v_i^*, \qquad x_{2,i} = Z_{2,i}'\beta_2 + \eta_i^*.$$

The reduced form equations are

$$y_i = Z_{1,i}'\beta_1\pi_1 + Z_{2,i}'\beta_2\pi_2 + u_i, \qquad x_{1,i} = Z_{1,i}'\beta_1 + v_i, \qquad x_{2,i} = Z_{2,i}'\beta_2 + \eta_i,$$

where $u_i = v_i^*\pi_1 + \eta_i^*\pi_2 + u_i^*$, $v_i = v_i^*$, $\eta_i = \eta_i^*$, and, analogously to the single-covariate case, we can assume $(u_i, v_i, \eta_i)' \sim N(0, Y)$. LIML estimation of instrumental variables models with weak instruments has been studied by Bound, Jaeger, and Baker (1996), Staiger and Stock (1997), Moreira (2003), Andrews, Moreira, and Stock (2006), Chao and Swanson (2007), and many others.
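To make the criterion $Q_n(\theta)$ concrete for the three-equation reduced form with two endogenous covariates, the following Python sketch evaluates it on simulated data. It is a minimal illustration under stated assumptions, not the author's implementation: it assumes scalar instruments $Z_{1,i}$ and $Z_{2,i}$ (so $\theta = (\beta_1, \beta_2, \pi_1, \pi_2)$), and the helper names `reduced_form_residuals`, `liml_criterion`, and `concentrated_criterion`, as well as all simulated parameter values, are hypothetical. It also uses the standard fact that, for fixed $\theta$, minimizing $\log|Y| + \operatorname{tr}(Y^{-1}\hat S)$ over $Y$ gives $Y = \hat S(\theta) = n^{-1}\sum_i \varepsilon_i\varepsilon_i'$, so the profiled criterion equals $\log|\hat S(\theta)| + 3$.

```python
import numpy as np

def reduced_form_residuals(theta, y, x1, x2, z1, z2):
    """Stack eps_i(theta) = (u_i, v_i, eta_i) for the three-equation reduced
    form, assuming scalar instruments z1, z2 (hypothetical simplification)."""
    beta1, beta2, pi1, pi2 = theta
    return np.column_stack([
        y - z1 * beta1 * pi1 - z2 * beta2 * pi2,  # u_i
        x1 - z1 * beta1,                          # v_i
        x2 - z2 * beta2,                          # eta_i
    ])

def liml_criterion(theta, cov, y, x1, x2, z1, z2):
    """Q_n(theta) = log|Y| + (1/n) sum_i eps_i' Y^{-1} eps_i for a given Y (= cov)."""
    eps = reduced_form_residuals(theta, y, x1, x2, z1, z2)
    _, logdet = np.linalg.slogdet(cov)
    quad = np.einsum("ij,jk,ik->i", eps, np.linalg.inv(cov), eps)
    return logdet + quad.mean()

def concentrated_criterion(theta, y, x1, x2, z1, z2):
    """Q_n with Y profiled out: Y_hat(theta) = eps'eps / n, value log|Y_hat| + 3."""
    eps = reduced_form_residuals(theta, y, x1, x2, z1, z2)
    cov_hat = eps.T @ eps / eps.shape[0]
    return np.linalg.slogdet(cov_hat)[1] + eps.shape[1]

# Simulated structural model with endogeneity (all values hypothetical)
rng = np.random.default_rng(0)
n = 2000
z1, z2 = rng.standard_normal(n), rng.standard_normal(n)
v_star, eta_star = rng.standard_normal(n), rng.standard_normal(n)
u_star = 0.5 * v_star + 0.5 * eta_star + rng.standard_normal(n)
beta1, beta2, pi1, pi2 = 1.0, 0.8, 0.5, -0.3
x1 = z1 * beta1 + v_star
x2 = z2 * beta2 + eta_star
y = x1 * pi1 + x2 * pi2 + u_star
truth = (beta1, beta2, pi1, pi2)

print(concentrated_criterion(truth, y, x1, x2, z1, z2))
print(concentrated_criterion((beta1, beta2, 0.0, 0.0), y, x1, x2, z1, z2))
```

Minimizing `concentrated_criterion` over $\theta$ with a numerical optimizer would give LIML-type estimates. When $\beta_1$ or $\beta_2$ is near zero, the corresponding $\pi_1$ or $\pi_2$ barely enters the $y$ equation, so the criterion is nearly flat in that direction; this is the weak-identification situation the theory is designed to handle.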

In addition to not allowing mixed identification strength (Andrews and Cheng, 2012a, 2013, 2014) and restricting the class of allowable models (Cheng, 2015), previous results for identification-robust inference do not consider high-dimensional parameters or max tests. In contrast, our theory allows a high-dimensional parameter to be tested by estimating many parsimoniously constructed models and basing the test on the maximum of the resulting sequence of estimators.
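The max-test idea can be illustrated in its simplest form with a short Python sketch. It assumes a linear model with mean-zero regressors and no nuisance parameters, so that under $H_0: \theta = 0$ each parsimonious model is a one-regressor OLS fit of $y$ on a single column of $X$ and the model error is $y$ itself. The function name `parsimonious_max_test` is hypothetical, and the p-value is simulated from an estimated Gaussian approximation to the joint distribution of the $\sqrt{n}$-scaled estimators. This is a simplified stand-in, not the identification-robust procedure developed in this paper.

```python
import numpy as np

def parsimonious_max_test(y, X, n_sim=5000, seed=0):
    """Max test of H0: all k coefficients are zero, built from k parsimonious
    one-regressor OLS fits (no intercept; assumes mean-zero data)."""
    n, k = X.shape
    sxx = np.sum(X**2, axis=0) / n            # diagonal of D = diag(x_i'x_i / n)
    theta_hat = (X.T @ y / n) / sxx           # one estimate per parsimonious model
    stat = np.sqrt(n) * np.max(np.abs(theta_hat))
    # Under H0 the error is y itself, so sqrt(n) * theta_hat is approximately
    # N(0, V) with V = D^{-1} S D^{-1} and S = (1/n) sum_t x_t x_t' y_t^2.
    scores = X * y[:, None]
    S = scores.T @ scores / n
    V = S / np.outer(sxx, sxx)
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(np.zeros(k), V, size=n_sim)
    p_value = np.mean(np.max(np.abs(draws), axis=1) >= stat)
    return theta_hat, stat, p_value

# Hypothetical data: only the fourth of 20 coefficients is nonzero
rng = np.random.default_rng(1)
n, k = 500, 20
X = rng.standard_normal((n, k))
y = 0.5 * X[:, 3] + rng.standard_normal(n)
theta_hat, stat, p_value = parsimonious_max_test(y, X)
print(np.argmax(np.abs(theta_hat)), stat, p_value)
```

Because the critical value comes from the simulated maximum of a correlated Gaussian vector rather than from a Bonferroni bound, dependence among the parsimonious estimators is taken into account.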

Inference in models with many parameters is typically conducted under an imposed sparsity assumption, forcing a large number of the parameters to equal zero with a penalized estimator such as the LASSO (Tibshirani, 1996) in a way that precludes inference on those parameters. As a result, in many cases valid inference can be conducted only on the remaining nonzero parameters. Recent work on this inference issue has relied on ‘desparsifying’ (van de Geer et al., 2014; Caner and Kock, 2018; Dezeure, Bühlmann, and Zhang, 2017) or ‘debiasing’ (Belloni et al., 2014b; Wooldridge and Zhu, forthcoming) the LASSO estimator; however, using these procedures to conduct inference when some parameters are weakly identified has not been studied. In particular, one of the attractive features of the LASSO is that it is a convex relaxation of a nonconvex problem, but this convexity is not guaranteed when it is applied to nonlinear models.

¹⁴ Andrews and Stock (2007) note that the most important case in empirical applications involves only a single right-hand-side endogenous covariate; however, this does not mean that the ability to analyze a system with more than one endogenous covariate is unimportant.

Further, the LASSO sets exactly equal to zero any parameter that cannot be statistically distinguished from zero. Belloni et al. (2016), Leeb and Pötscher (2008), and Pötscher (2009) note that this can be problematic for inference in approximately sparse models that include both strong predictors and variables with small but nonzero coefficients: the LASSO will exclude the variables with small coefficients, which, the authors note, can lead to omitted variable bias and irregular sampling behavior. Our approach differs in that we estimate a collection of parsimonious models, considering each parameter in turn, and evaluate the maximum of the estimated values, thereby allowing inference on all parameters (Ghysels et al., 2016a; Hill and Dennis, 2018; Ghysels, Hill, and Motegi, forthcoming).
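The exclusion of small coefficients is easiest to see in the textbook special case of an orthonormal design ($X'X/n = I$), where the LASSO solution to $\min_b (2n)^{-1}\|y - Xb\|^2 + \lambda\|b\|_1$ is the OLS estimate soft-thresholded at $\lambda$. The Python sketch below assumes that special case; the function name `lasso_orthonormal`, the penalty level, and the coefficient values are hypothetical and chosen only for illustration.

```python
import numpy as np

def lasso_orthonormal(X, y, lam):
    """LASSO for (1/2n)||y - Xb||^2 + lam * ||b||_1 when X'X/n = I:
    soft-threshold the OLS estimate X'y/n at lam. Returns (OLS, LASSO)."""
    n = X.shape[0]
    b_ols = X.T @ y / n
    return b_ols, np.sign(b_ols) * np.maximum(np.abs(b_ols) - lam, 0.0)

rng = np.random.default_rng(2)
n = 400
Q, _ = np.linalg.qr(rng.standard_normal((n, 4)))
X = np.sqrt(n) * Q                              # orthonormal design: X'X/n = I
beta = np.array([1.0, 0.05, 0.05, -0.8])        # two strong, two small but nonzero
y = X @ beta + 0.5 * rng.standard_normal(n)
b_ols, b_lasso = lasso_orthonormal(X, y, lam=0.2)
print(np.round(b_ols, 3))
print(np.round(b_lasso, 3))
```

The two small coefficients are set exactly to zero, and the surviving ones are shrunk toward zero by $\lambda$; this is the exclusion, and the accompanying shrinkage bias, that the cited authors flag for approximately sparse models.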

In general, we may wish to test a large subset of our parameters based on economic reasoning or functional form. For example, Belloni et al. (2014a,b) perform a follow-up study of the effect of legalized abortion on crime (Donohue and Levitt, 2001, 2008; Foote and Goetz, 2008) in which they examine inference on a treatment effect after selection among a high-dimensional set of controls. They include a large set of controls that allows for flexible trends varying with state-level characteristics. In particular, they alter the baseline model of Donohue and Levitt (2001) to include 284 variables¹⁵ that allow for a “cubic trend for the level of the crime rate and abortion rate which is allowed to depend on observed state-level characteristics.” The data set consists of only 600 observations, and they illustrate the poor performance of OLS due to the large number of covariates relative to observations.

¹⁵ “the levels, differences, initial level, initial difference, and within-state average of the eight state-specific time-varying observables, the initial level and initial difference of the abortion rate relevant for crime type, quadratics in each of the preceding variables, interactions of all the aforementioned variables with $t$ and $t^2$, and the main effects $t$

Their LASSO double-selection method suggests that (i) results based on a small set of intuitively selected controls differ from results obtained through formal variable selection and (ii) accounting for nonlinear trends in the data also affects the results. Given this discrepancy between results based on formal selection and on intuitive selection, we can use the framework developed in this paper to examine whether the group of intuitively or economically relevant controls is relevant for the regression. Alternatively, we can use the max test to test the relevance of the controls added for fidelity, such as the group of all interactions of variables that are meant to allow for a more flexible functional form.

For simplicity of exposition, consider the model with one endogenous covariate

$$y_t = x_t\pi + Z_t'\omega + u_t^*, \qquad x_t = Z_t'\beta + v_t^*,$$

where $\beta \in \mathbb{R}^{d_\beta}$ with $d_\beta = o(n)$, and $t$ is used for the observation index to avoid confusion with the parsimonious model index below. Here we wish to test the relevance of a potentially large subset of instruments, so the null hypothesis is $H_0 : (\beta_2', \omega_2')' = 0$ for some subvector $\beta_2$ of $\beta = (\beta_1', \beta_2')'$, and similarly for $\omega_2$. The reduced-form parsimonious models are

$$y_t = Z_{1,t}'(\beta_1\pi + \omega_1) + Z_{2,i,t}'(\beta_{2,i}\pi + \omega_{2,i}) + u_{i,t}, \qquad x_t = Z_{1,t}'\beta_1 + Z_{2,i,t}'\beta_{2,i} + v_{i,t}.$$

In its simplest form, $Z_1$ will be empty ($\beta_1 = 0$), so each parsimonious model will have exactly one exogenous covariate, $Z_{2,i}$.
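In this simplest case, each parsimonious model $i$ consists of two one-regressor reduced-form equations, one for $y_t$ and one for $x_t$, on the single instrument $Z_{2,i,t}$. The Python sketch below fits them by OLS and forms a max statistic over all models and both equations. It assumes mean-zero data, scalar $Z_{2,i,t}$, no intercept, and simulated parameter values; the function name `parsimonious_iv_estimates` is hypothetical. Critical values would come from the joint limiting distribution of the estimators and are not computed here.

```python
import numpy as np

def parsimonious_iv_estimates(y, x, Z2):
    """For each column i of Z2, OLS fits of the parsimonious reduced forms
        y_t = Z2[t, i] * gamma_i + u_{i,t},  gamma_i = beta_{2,i} * pi + omega_{2,i}
        x_t = Z2[t, i] * beta_{2,i} + v_{i,t}
    (Z1 empty, no intercept). Returns both estimate vectors and a max statistic."""
    n = Z2.shape[0]
    szz = np.sum(Z2**2, axis=0)
    gamma_hat = Z2.T @ y / szz
    beta_hat = Z2.T @ x / szz
    stat = np.sqrt(n) * np.max(np.abs(np.concatenate([gamma_hat, beta_hat])))
    return gamma_hat, beta_hat, stat

# Hypothetical DGP: 10 candidate instruments, only the third is relevant
rng = np.random.default_rng(3)
n, k, pi = 400, 10, 0.5
Z2 = rng.standard_normal((n, k))
beta = np.zeros(k)
beta[2] = 0.7
v_star = rng.standard_normal(n)
u_star = 0.6 * v_star + rng.standard_normal(n)  # endogeneity via correlated errors
x = Z2 @ beta + v_star
y = x * pi + u_star                              # omega = 0 in this DGP
gamma_hat, beta_hat, stat = parsimonious_iv_estimates(y, x, Z2)
print(np.argmax(np.abs(beta_hat)), np.round(gamma_hat[2], 2), stat)
```

Because $\gamma_i = \beta_{2,i}\pi + \omega_{2,i}$, the $y$-equation estimate mixes instrument strength and the direct effect of the instrument; under $H_0$ both reduced-form coefficients are zero for every $i$, which is why the statistic takes the maximum over both equations.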

This is related to the literature on estimation and testing with many weak instruments (Bekker, 1994; Bekker and Kleibergen, 2003; Chao and Swanson, 2005; Chamberlain and Imbens, 2004; Andrews and Stock, 2007; Hansen, Hausman, and Newey, 2012; Hausman, Newey, Woutersen, Chao, and Swanson, 2012; among many others). In particular, Andrews and Stock (2007) examine the properties of certain tests and discuss the rate condition $k^3/n \to 0$ needed for correct
