Appendix: Notation

The framework utilized here is based upon that developed in Andrews and Cheng (2012a) and Cheng (2015); hence we find it convenient to borrow their notation. This section is meant to be a reference for the notation found throughout the remainder of the paper. Readers familiar with Cheng (2015) may wish to skip this section and return if needed.

The parameter vector $\theta \in \Theta^*$ can be partitioned into three subvectors $\theta = (\beta', \zeta', \pi')'$, where the parameters $\beta$ and $\zeta$ are always strongly identified, and the identification strength of $\pi$ is determined by $\beta$. The parameter $\zeta$ does not affect the identification of $\pi$ or $\beta$. For the observations $\{W_t = (Y_t', X_t', Z_t')' : t \le n\}$, $\{Z_t\}$ are the variables associated with the parameter $\zeta$ which are not associated with $\beta$ or $\pi$. The variables $X_t$ are associated with $\beta$ and $\pi$ but not with $\zeta$. For any $\theta \in \Theta^*$, we denote by $F_\gamma$ the distribution of $\{W_t : t \le n\}$ and by $E_\gamma$ its expectation, where $\gamma = (\theta, \phi) \in \Gamma$ and $\phi \in \Phi^*$ is a possibly infinite dimensional nuisance parameter such that the distribution is fully characterized by $\gamma$. In the framework of Andrews and Cheng (2012a), all elements of $\pi$ are only allowed to exhibit a single identification strength that is determined by the value of $\beta$.

This can be demonstrated with a simple example in which we estimate scalar parameters $(\beta, \pi)$ from the nonlinear function $Y_t = \beta g(X_t, \pi) + \varepsilon_t$ for some smooth non-linear function $g$. It is well known that when $\beta \ne 0$, $\pi$ can be (strongly) identified, and when $\beta = 0$, $\pi$ cannot be identified. In order to develop a unifying testing framework, Andrews and Cheng (2012a) utilize a thought experiment which can be characterized with the notion of drifting sequences of true parameters. Let $\beta = \beta_n$ be a sequence of true parameters that are drifting to $0$, the point that induces identification failure. Then the strength of identification of $\pi$ is categorized by the speed at which $\beta_n \to 0$. When $\sqrt{n}\beta_n \to \infty$, $\pi$ is characterized as being semi-strongly identified, and when $\sqrt{n}\beta_n \to b \in (0, \infty)$, we say $\pi$ is weakly identified. In the latter case our estimator $\hat{\pi}_n$ is not consistent for the true $\pi_0$, and converges instead to a random variable. These drifting sequences
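The contrast between a fixed $\beta$ and a drifting $\beta_n$ can be seen in a small Monte Carlo sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the choice $g(x, \pi) = e^{-\pi x}$, the uniform design for $X_t$, standard normal errors, the grid for $\pi$, and the function names. The sketch estimates $\pi$ by grid-search least squares with $\beta$ profiled out, and compares the spread of $\hat{\pi}_n$ when $\beta = 1$ with the spread when $\beta_n = n^{-1/2}$.

```python
import numpy as np

def g(x, pi):
    # Hypothetical smooth nonlinear function, chosen only for illustration.
    return np.exp(-pi * x)

def profile_nls_pi(y, x, grid):
    """Grid-search least squares for pi, with beta profiled out by OLS."""
    best_pi, best_ssr = grid[0], np.inf
    for pi in grid:
        z = g(x, pi)
        beta_hat = (z @ y) / (z @ z)          # OLS of y on g(x, pi)
        ssr = np.sum((y - beta_hat * z) ** 2)
        if ssr < best_ssr:
            best_pi, best_ssr = pi, ssr
    return best_pi

def simulate_pi_hat(n, beta_n, pi0=1.0, reps=200, seed=0):
    """Draw `reps` samples of size n and return the estimates of pi."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.1, 3.0, 59)          # assumed parameter space for pi
    draws = np.empty(reps)
    for r in range(reps):
        x = rng.uniform(0.0, 2.0, size=n)
        y = beta_n * g(x, pi0) + rng.standard_normal(n)
        draws[r] = profile_nls_pi(y, x, grid)
    return draws

n = 1000
strong = simulate_pi_hat(n, beta_n=1.0)               # beta fixed away from 0
weak = simulate_pi_hat(n, beta_n=1.0 / np.sqrt(n))    # sqrt(n) * beta_n = 1

print("std of pi_hat, beta = 1        :", strong.std())
print("std of pi_hat, beta_n = n^-1/2 :", weak.std())
```

Under the drifting sequence the estimates stay dispersed over the grid rather than concentrating near $\pi_0$, which is the finite-sample counterpart of $\hat{\pi}_n$ converging to a random variable.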

While this setup allows for uniform inference within the parameter space, missing from their framework is the ability to account for identification strengths that differ across elements of $\pi$. Cheng (2015) augments this theory specifically for the additive non-linear model to allow for mixed identification strength by pairing subvectors of $\beta$ with subvectors of $\pi$, and allowing the subvectors of $\beta$ to drift to zero at differing rates. To allow for uniformity over $\gamma \in \Gamma$, all true parameters are indexed by the sample size $n$. That is, the true parameter is $\gamma_n = (\theta_n', \phi_n')'$, where $\theta_n = (\beta_n', \zeta_n', \pi_n')'$ with $\beta_n = (\beta_{1,n}', \ldots, \beta_{p,n}')'$ and $\pi_n = (\pi_{1,n}', \ldots, \pi_{p,n}')'$. These parameters drift to the limiting values $\theta_n \to \theta_0 = (\beta_0', \zeta_0', \pi_0')' \in \Theta^*$ and $\gamma_n \to \gamma_0 \in \Gamma$.

B.1.1 Drifting Sequences

In this framework, the identification strength of $\pi_i$, $i = 1, \ldots, p$, is determined by the rate at which $\|\beta_{i,n}\|$ converges to $0$ as $n \to \infty$, with $\pi_i$ being strongly identified only if $\beta_{i,n} \to \beta_i \ne 0$. In the case that $\beta_{i,0} = 0$, the speed at which $\beta_{i,n} \to \beta_{i,0} = 0$ affects the asymptotic analysis. In particular, when $\|\beta_{i,n}\| \to 0$ fast enough, given by case (i) below, we say the parameter $\pi_{i,0}$ is weakly identified. In this case, the estimator $\hat{\pi}_{i,n}$ is not consistent. Hence, following Cheng (2015), we divide the space of drifting sequences into three identification categories of $\pi_i$:

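(i) Weak Identification: $\beta_{i,n} \to 0$ with $n^{1/2}\beta_{i,n} \to b_i \in R^{d_{\beta_i}}$

(ii) Semi-Strong Identification: $\beta_{i,n} \to 0$ with $n^{1/2}\|\beta_{i,n}\| \to \infty$

(iii) Strong Identification: $\beta_{i,n} \to \beta_i \ne 0$.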

Observe that the case $\beta_{i,n} = 0$ for all $n$ is allowed under case (i); hence this case includes non-identification. The category (ii) of semi-strong identification is necessary for uniform results in Cheng's (2015) work. She groups subvectors of $\pi$ by the identification category above and, for subvectors in the semi-strong identification category, by the rate of convergence to zero. This grouping allows a convenient inductive argument to be used to prove estimation results.
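For intuition, the three categories can be written as a rule on polynomial drifting sequences. The sketch below assumes, purely for illustration, that $\|\beta_{i,n}\| = c\, n^{-a}$ with $c \ge 0$ and $a \ge 0$; this parametric family and the function name `classify_rate` are not part of the paper, and general sequences need not have this form.

```python
def classify_rate(c, a):
    """Classify the sequence ||beta_{i,n}|| = c * n**(-a), with c >= 0, a >= 0.

    Returns "weak", "semi-strong" or "strong" following categories (i)-(iii).
    """
    if c < 0 or a < 0:
        raise ValueError("expected c >= 0 and a >= 0")
    if c == 0 or a >= 0.5:
        # n^{1/2} * ||beta_{i,n}|| stays bounded; c == 0 is non-identification.
        return "weak"
    if a == 0:
        # beta_{i,n} stays away from zero.
        return "strong"
    # 0 < a < 1/2: beta_{i,n} -> 0 but n^{1/2} * ||beta_{i,n}|| -> infinity.
    return "semi-strong"

for c, a in [(1.0, 0.0), (1.0, 0.25), (1.0, 0.5), (0.0, 0.0)]:
    print((c, a), classify_rate(c, a))
```

The boundary exponent $a = 1/2$ falls in the weak category because $n^{1/2}\|\beta_{i,n}\|$ then converges to the finite constant $c$.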

B.1.2 Grouping Notation

To facilitate sequential analysis, we follow the notation in Cheng (2015). Let $\|\beta_i\|$ denote the norm of vector $\beta_i$. We group subvectors of $\beta$ and their associated pairings in $\pi$ with the following procedure.

(i) All $\|\beta_{j,n}\|$ that have non-zero limit are put in the first group. If all $\|\beta_{j,n}\|$ have zero limits, the first group is empty.

(ii) All $\|\beta_{j,n}\|$ that are $O(n^{-1/2})$ are put in the last group.

(iii) For those that converge to $0$ but at a rate slower than $n^{-1/2}$, members in group $k$ converge to $0$ slower than members in group $k'$ for any $k' > k$, and members in the same group converge to $0$ at the same rate.

The first group is associated with strong identification, the last group is associated with weak identification, and the middle groups are associated with semi-strong identification, ordered by the rate of convergence. Note that the group index $k$ is a property associated with the drifting sequence $\{\beta_{j,n} : n \ge 1\}$. Therefore the group index $k$ does not change with the sample size $n$. See Cheng (2015) for details.
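The grouping procedure can be sketched in code for the special case of polynomial rates. As an assumption made only for this illustration, each sequence is summarized by a pair `(c, a)` meaning $\|\beta_{j,n}\| = c\, n^{-a}$; the function name `group_by_rate` and the dictionary layout of the result are hypothetical.

```python
from collections import defaultdict

def group_by_rate(rates):
    """Group indices j = 1, ..., p by the rate of ||beta_{j,n}|| = c_j * n**(-a_j).

    `rates` is a list of (c_j, a_j) pairs with c_j >= 0 and a_j >= 0.
    Returns the strong group, the semi-strong groups ordered from the
    slowest to the fastest convergence to zero, and the weak group.
    """
    strong, weak = [], []
    semi = defaultdict(list)              # exponent a -> indices sharing that rate
    for j, (c, a) in enumerate(rates, start=1):
        if c < 0 or a < 0:
            raise ValueError("expected c >= 0 and a >= 0")
        if c == 0 or a >= 0.5:
            weak.append(j)                # O(n^{-1/2}): last group
        elif a == 0:
            strong.append(j)              # non-zero limit: first group
        else:
            semi[a].append(j)             # same exponent, same middle group
    middle = [semi[a] for a in sorted(semi)]   # smaller a = slower convergence
    return {"strong": strong, "semi_strong": middle, "weak": weak}

rates = [(1.0, 0.0), (2.0, 0.25), (1.0, 0.5), (3.0, 0.25), (1.0, 0.1), (0.0, 0.0)]
groups = group_by_rate(rates)
print(groups)
```

Because the exponents are fixed features of each sequence, the resulting group membership does not depend on $n$, in line with the remark that the group index is a property of the drifting sequence.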

Next, suppose there are $K$ groups and $\beta_{k_1}, \ldots, \beta_{k_{p_k}}$ are the elements in group $k$. Let $l_k = \{k_1, \ldots, k_{p_k}\}$ denote the indices for group $k$. Use the subscript $l_k$ to denote a sub-vector associated with group $k$: $\beta_{l_k} = (\beta_{k_1}', \ldots, \beta_{k_{p_k}}')' \in R^{d_k}$ and $\pi_{l_k} = (\pi_{k_1}', \ldots, \pi_{k_{p_k}}')' \in R^{d_{\pi_{l_k}}}$. $\beta_{l_k,n}$ denotes the true value of $\beta_{l_k}$ when the sample size is $n$ and $\beta_{l_k,0}$ denotes its limit. In particular, the grouping rule implies that $\|\beta_{l_{k'},n}\| = o(\|\beta_{l_k,n}\|)$ for $k' > k$ between groups, and $\|\beta_{j',n}\|$ converges at the same rate as $\|\beta_{j,n}\|$ for any $j, j' \in l_k$ and $k = 1, \ldots, K-1$. In the presence of weak identification, $\beta_{l_k,n} = O(n^{-1/2})$ for $k = K$. If all regressors are in the semi-strong or strong

Finally, we describe one more partition of the vectors $\beta$ and $\pi$ based on the grouping notation above that will be used to sequentially analyze the limiting behavior of the estimators.

Consider $\pi_{l_k}$, and denote by $\pi_{k-}$ the elements of $\pi$ in the previous groups $l_1, \ldots, l_{k-1}$ and by $\pi_{k+}$ the elements of $\pi$ in the subsequent groups $l_{k+1}, \ldots, l_K$. That is, $\pi_{k-} = (\pi_{l_1}', \ldots, \pi_{l_{k-1}}')'$ and $\pi_{k+} = (\pi_{l_{k+1}}', \ldots, \pi_{l_K}')'$. Observe that $\pi = (\pi_{k-}', \pi_{l_k}', \pi_{k+}')'$, and that the identification strength of these subvectors is in decreasing order by definition. The same notation applies to $\beta$, where the subvectors in $\beta = (\beta_{k-}', \beta_{l_k}', \beta_{k+}')'$ are ordered by decreasing magnitude by definition.

It is important to note that $\pi_{l_1}$ is strongly identified. All strongly identified elements of $\pi$ are included in this group in order to analyze them together with the strongly identified parameters $\beta$ and $\zeta$. The semi-strongly identified and weakly identified elements of $\pi$ are analyzed using the sequential procedure outlined in Cheng (2015). If no elements of $\pi$ are strongly identified, $l_1 = \emptyset$ and $\pi_{l_1}$ disappears.
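A small worked case may help fix the notation. The number of subvectors and the rates below are assumptions chosen only for illustration; they do not come from the paper or from Cheng (2015).

```latex
% Hypothetical example: p = 5 subvectors with the following drifting rates.
\begin{align*}
  &\|\beta_{1,n}\| \to c_1 > 0, \qquad
   \|\beta_{2,n}\| \asymp n^{-1/4}, \qquad
   \|\beta_{3,n}\| \asymp n^{-1/4}, \\
  &\|\beta_{4,n}\| \asymp n^{-1/3}, \qquad
   \|\beta_{5,n}\| = O(n^{-1/2}).
\end{align*}
% The grouping rule gives K = 4 groups:
\begin{align*}
  l_1 = \{1\}, \qquad l_2 = \{2, 3\}, \qquad l_3 = \{4\}, \qquad l_4 = \{5\}.
\end{align*}
% For k = 2 the partition of pi is
\begin{align*}
  \pi_{k-} = \pi_{l_1} = \pi_1, \qquad
  \pi_{l_k} = (\pi_2', \pi_3')', \qquad
  \pi_{k+} = (\pi_{l_3}', \pi_{l_4}')' = (\pi_4', \pi_5')'.
\end{align*}
```

Here $\pi_{k-}$ is strongly identified, $\pi_{l_k}$ is semi-strongly identified at the slowest of the drifting rates, and $\pi_{k+}$ collects one faster semi-strong element and one weakly identified element.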

B.1.3 Concentrated Criterion Functions

The least squares estimator $\hat{\theta}_n$ minimizes $Q_n(\theta)$ over $\theta \in \Theta$, where $\Theta = B \times Z \times \Pi$. Here $B = \times_{j=1}^{p} B_j$, where $B_j$ for $j = 1, \ldots, p$ are compact sets, as are $Z$ and $\Pi$. We assume all true values and parsimonious model counterparts in $\Theta^*$ are in the interior of the optimization space $\Theta$.

Proof of the consistency of the strongly and semi-strongly identified components of the estimator follows from sequential analysis in order of decreasing identification strength. In particular, we sequentially concentrate out parameters and analyze the concentrated criterion function

$$Q_n^c(\pi_{l_k}, \pi_{k+}) = Q_n(\hat{\psi}_{k-}(\pi_{l_k}, \pi_{k+}), \pi_{l_k}, \pi_{k+})$$

where $\psi_{k-} = (\beta', \zeta', \pi_{k-}')'$ collects the parameters that have been concentrated out, and the true values of these parameters are denoted with the additional subscripts $\psi_{k-,n} = (\beta_n', \zeta_n', \pi_{k-,n}')'$ and $\psi_{k-,0} = (\beta_0', \zeta_0', \pi_{k-,0}')'$.

Due to the mixed identification strength along differing subvectors of $\pi$, it becomes necessary to evaluate expansions around the points of sequential identification failure, $\beta_{l_k}^0 = 0$ and $\beta_{k+}^0 = 0$, rather than the true values $\beta_{l_k,n}$ and $\beta_{k+,n}$, as is commonly done in Andrews and Cheng (2012a). We use the superscript $0$ notation to define

$$\psi_{k-,n}^0 = (\beta_{k-,n}', \beta_{l_k}^{0\prime}, \beta_{k+}^{0\prime}, \zeta_n', \pi_{k-,n}')'$$

to be the parameter vector consisting of the concentrated out parameters evaluated at the point of sequential identification failure $\beta_{l_k}^0 = 0$ and $\beta_{k+}^0 = 0$. Observe that the difference is $\psi_{k-,n} - \psi_{k-,n}^0 = (0', \beta_{l_k,n}', \beta_{k+,n}', 0', 0')'$. This is done so that under our basic assumptions the centering term $Q_n(\psi_{k-,n}^0, \pi_{l_k}, \pi_{k+})$ does not depend on $(\pi_{l_k}', \pi_{k+}')'$.
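The role of the centering term can be illustrated in the scalar model $Y_t = \beta g(X_t, \pi) + \varepsilon_t$. The sketch below is a simplified analogue rather than the paper's criterion: it assumes $g(x, \pi) = e^{-\pi x}$, takes $Q_n$ to be the average squared residual, and uses hypothetical function names. It checks numerically that the criterion evaluated at the point of identification failure $\beta = 0$ is flat in $\pi$, while the concentrated criterion is not.

```python
import numpy as np

def g(x, pi):
    # Hypothetical smooth nonlinear function, chosen only for illustration.
    return np.exp(-pi * x)

def Q_n(beta, pi, y, x):
    """Least squares criterion (assumed here to be the mean squared residual)."""
    return np.mean((y - beta * g(x, pi)) ** 2)

def Q_n_concentrated(pi, y, x):
    """Criterion with beta concentrated out by OLS for a given pi."""
    z = g(x, pi)
    beta_hat = (z @ y) / (z @ z)
    return Q_n(beta_hat, pi, y, x)

rng = np.random.default_rng(1)
n = 500
x = rng.uniform(0.0, 2.0, size=n)
y = 0.5 * g(x, 1.0) + rng.standard_normal(n)

pis = np.linspace(0.2, 3.0, 8)
at_failure = np.array([Q_n(0.0, pi, y, x) for pi in pis])
concentrated = np.array([Q_n_concentrated(pi, y, x) for pi in pis])

print("Q_n(0, pi)  :", at_failure)     # identical for every pi
print("Q_n^c(pi)   :", concentrated)   # varies with pi
```

With $\beta = 0$ the term $\beta g(X_t, \pi)$ vanishes, so the criterion reduces to the mean of $Y_t^2$ for every $\pi$; this is the sense in which the centering term carries no information about the weakly or semi-strongly identified parameters.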

B.2 Appendix: Limit Theory for Models with Mixed Identification Strength
