The framework used here builds on that developed in Andrews and Cheng (2012a) and Cheng (2015); hence we find it convenient to borrow their notation. This section serves as a reference for the notation used throughout the remainder of the paper. Readers familiar with Cheng (2015) may wish to skip this section and return to it if needed.
The parameter vector $\theta \in \Theta^*$ can be partitioned into three subvectors, $\theta = (\beta', \zeta', \pi')'$, where the parameters $\beta$ and $\zeta$ are always strongly identified and the identification strength of $\pi$ is determined by $\beta$. The parameter $\zeta$ does not affect the identification of $\pi$ or $\beta$. For the observations $\{W_t = (Y_t', X_t', Z_t')' : t \le n\}$, $\{Z_t\}$ are the variables associated with the parameter $\zeta$ that are not associated with $\beta$ or $\pi$. The variables $X_t$ are associated with $\beta$ and $\pi$ but not with $\zeta$. For any $\theta \in \Theta^*$, we denote by $F_\gamma$ the distribution of $\{W_t : t \le n\}$ and by $E_\gamma$ its expectation, where $\gamma = (\theta, \phi) \in \Gamma$ and $\phi \in \Phi^*$ is a possibly infinite-dimensional nuisance parameter such that the distribution is fully characterized by $\gamma$. In the framework of Andrews and Cheng (2012a), all elements of $\pi$ are allowed to exhibit only a single identification strength, which is determined by the value of $\beta$.
This can be demonstrated with a simple example in which we estimate scalar parameters $(\beta, \pi)$ from the nonlinear model $Y_t = \beta g(X_t, \pi) + \varepsilon_t$ for some smooth nonlinear function $g$. It is well known that when $\beta \neq 0$, $\pi$ can be (strongly) identified, and when $\beta = 0$, $\pi$ cannot be identified. In order to develop a unifying testing framework, Andrews and Cheng (2012a) utilize a thought experiment that can be characterized by drifting sequences of true parameters. Let $\beta = \beta_n$ be a sequence of true parameters drifting to 0, the point that induces identification failure. Then the strength of identification of $\pi$ is categorized by the speed at which $\beta_n \to 0$. When $\sqrt{n}\beta_n \to \infty$, $\pi$ is characterized as being semi-strongly identified, and when $\sqrt{n}\beta_n \to b \in (0, \infty)$, we say $\pi$ is weakly identified. In the latter case our estimator $\hat{\pi}_n$ is not consistent for the true $\pi_0$ and converges instead to a random variable. These drifting sequences
While this setup allows for uniform inference within the parameter space, missing from their framework is the ability to account for identification strengths that differ across elements of $\pi$. Cheng (2015) augments this theory specifically for the additive nonlinear model to allow for mixed identification strength by pairing subvectors of $\beta$ with subvectors of $\pi$ and allowing the subvectors of $\beta$ to drift to zero at differing rates. To allow for uniformity over $\gamma \in \Gamma$, all true parameters are indexed by the sample size $n$. That is, the true parameter is $\gamma_n = (\theta_n', \phi_n')'$, where $\theta_n = (\beta_n', \zeta_n', \pi_n')'$ with $\beta_n = (\beta_{1,n}', \ldots, \beta_{p,n}')'$ and $\pi_n = (\pi_{1,n}', \ldots, \pi_{p,n}')'$. These parameters drift to the limiting values $\theta_n \to \theta_0 = (\beta_0', \zeta_0', \pi_0')' \in \Theta^*$ and $\gamma_n \to \gamma_0 \in \Gamma$.
B.1.1 Drifting Sequences
In this framework, the identification strength of $\pi_i$, $i = 1, \ldots, p$, is determined by the rate at which $\|\beta_{i,n}\|$ converges to 0 as $n \to \infty$, with $\pi_i$ being strongly identified only if $\beta_{i,n} \to \beta_i \neq 0$. In the case that $\beta_{i,0} = 0$, the speed at which $\beta_{i,n} \to \beta_{i,0} = 0$ affects the asymptotic analysis. In particular, when $\|\beta_{i,n}\| \to 0$ fast enough, as in case (i) below, we say the parameter $\pi_{i,0}$ is weakly identified. In this case, the estimator $\hat{\pi}_{i,n}$ is not consistent. Hence, following Cheng (2015), we divide the space of drifting sequences into three identification categories of $\pi_i$:
(i) Weak Identification: $\beta_{i,n} \to 0$ with $n^{1/2}\beta_{i,n} \to b_i \in \mathbb{R}^{d_{\beta_i}}$
(ii) Semi-Strong Identification: $\beta_{i,n} \to 0$ with $n^{1/2}\|\beta_{i,n}\| \to \infty$
(iii) Strong Identification: $\beta_{i,n} \to \beta_i \neq 0$.
Observe that the case $\beta_{i,n} = 0$ for all $n$ is allowed under case (i); hence this case includes non-identification. The category (ii) of semi-strong identification is necessary for uniform results in Cheng's (2015) work. She groups subvectors of $\pi$ by the identification category above and, for subvectors in the semi-strong identification category, by the rate of convergence to zero. This grouping allows a convenient inductive argument to be used to prove estimation results.
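As an illustration of the three categories, the following Python sketch simulates the scalar example $Y_t = \beta_n g(X_t, \pi) + \varepsilon_t$ under a strong, a semi-strong, and a weak drifting sequence and reports the Monte Carlo standard deviation of $\hat{\pi}_n$. Everything specific here is an assumption made for illustration and is not taken from Andrews and Cheng (2012a) or Cheng (2015): the choice $g(x, \pi) = e^{-\pi x}$, the uniform regressor, the standard normal errors, the grid search over $\pi$, and the helper names `profile_nls_pi` and `pi_hat_spread`.

```python
import numpy as np


def g(x, pi):
    # Hypothetical smooth nonlinear function (an assumption of this sketch).
    return np.exp(-pi * x)


def profile_nls_pi(y, x, pi_grid):
    """Grid-search least squares estimate of pi with beta profiled out.

    For fixed pi the least squares beta is (g'y)/(g'g), so the sum of squared
    residuals equals y'y - (g'y)^2/(g'g). Minimizing it over pi is the same
    as maximizing (g'y)^2/(g'g).
    """
    G = g(x[:, None], pi_grid[None, :])  # G[t, m] = g(x_t, pi_grid[m])
    gy = G.T @ y
    gg = np.sum(G ** 2, axis=0)
    return pi_grid[np.argmax(gy ** 2 / gg)]


def pi_hat_spread(n, beta_n, pi0=1.0, reps=200, seed=0):
    """Monte Carlo standard deviation of pi_hat for sample size n and true beta_n."""
    rng = np.random.default_rng(seed)
    pi_grid = np.linspace(0.1, 3.0, 59)
    draws = np.empty(reps)
    for r in range(reps):
        x = rng.uniform(0.0, 2.0, size=n)
        y = beta_n * g(x, pi0) + rng.standard_normal(n)
        draws[r] = profile_nls_pi(y, x, pi_grid)
    return draws.std()


if __name__ == "__main__":
    for n in (200, 3200):
        sequences = {
            "strong (beta_n = 1)": 1.0,
            "semi-strong (beta_n = n^(-1/4))": n ** -0.25,
            "weak (beta_n = 2 n^(-1/2))": 2.0 * n ** -0.5,
        }
        for label, beta_n in sequences.items():
            print(n, label, round(pi_hat_spread(n, beta_n), 3))
```

Under these assumptions one would expect the spread of $\hat{\pi}_n$ to shrink with $n$ in the strong case, to shrink more slowly in the semi-strong case, and not to vanish in the weak case, in line with the lack of consistency noted above.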
B.1.2 Grouping Notation
To facilitate sequential analysis, we follow the notation in Cheng (2015). Let $\|\beta_i\|$ denote the norm of the vector $\beta_i$. We group subvectors of $\beta$ and their associated pairings in $\pi$ with the following procedure.
(i) All $\|\beta_{j,n}\|$ that have a non-zero limit are put in the first group. If all $\|\beta_{j,n}\|$ have zero limits, the first group is empty.
(ii) All $\|\beta_{j,n}\|$ that are $O(n^{-1/2})$ are put in the last group.
(iii) For those that converge to 0 but at a rate slower than $n^{-1/2}$, members in group $k$ converge to 0 slower than members in group $k'$ for any $k' > k$, and members in the same group converge to 0 at the same rate.
The first group is associated with strong identification, the last group is associated with weak identification, and the middle groups are associated with semi-strong identification, ordered by the rate of convergence. Note that the group index $k$ is a property of the drifting sequence $\{\beta_{j,n} : n \ge 1\}$. Therefore the group index $k$ does not change with the sample size $n$. See Cheng (2015) for details.
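The grouping procedure can be sketched in Python under a simplifying assumption that is not part of the source: each $\|\beta_{j,n}\|$ is taken to be of exact order $n^{-a_j}$ for a known exponent $a_j \ge 0$, with $a_j = \infty$ standing for $\beta_{j,n} = 0$ for all $n$. The function name `group_by_rate` is hypothetical, indices are 0-based, and always returning a (possibly empty) last group is a convention of this sketch.

```python
def group_by_rate(exponents):
    """Group indices j by the assumed rate ||beta_{j,n}|| ~ n^(-a_j).

    Returns a list of index lists [l_1, ..., l_K]:
      - l_1 holds sequences with a non-zero limit (a_j == 0); it may be empty,
      - l_K holds sequences that are O(n^(-1/2)) (a_j >= 1/2); it may be empty,
      - the middle groups hold 0 < a_j < 1/2, one group per distinct rate,
        ordered from slowest to fastest convergence to zero.
    """
    if any(a < 0 for a in exponents):
        raise ValueError("exponents must be non-negative")
    strong = [j for j, a in enumerate(exponents) if a == 0]
    weak = [j for j, a in enumerate(exponents) if a >= 0.5]
    middle_rates = sorted({a for a in exponents if 0 < a < 0.5})
    groups = [strong]
    for rate in middle_rates:
        groups.append([j for j, a in enumerate(exponents) if a == rate])
    groups.append(weak)
    return groups


# Example: six subvectors with assumed exponents a_j.
groups = group_by_rate([0, 0.25, 0.5, 0.1, 0.25, float("inf")])
print(groups)
```

Because the exponents describe the whole sequence rather than any particular $n$, the resulting group indices do not depend on the sample size, which mirrors the remark above.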
Next, suppose there are $K$ groups and $\beta_{k_1}, \ldots, \beta_{k_{p_k}}$ are the elements in group $k$. Let $l_k = \{k_1, \ldots, k_{p_k}\}$ denote the indices for group $k$. Use the subscript $l_k$ to denote a subvector associated with group $k$: $\beta_{l_k} = (\beta_{k_1}', \ldots, \beta_{k_{p_k}}')' \in \mathbb{R}^{d_k}$ and $\pi_{l_k} = (\pi_{k_1}', \ldots, \pi_{k_{p_k}}')' \in \mathbb{R}^{d_{\pi_{l_k}}}$. Here $\beta_{l_k,n}$ denotes the true value of $\beta_{l_k}$ when the sample size is $n$, and $\beta_{l_k,0}$ denotes its limit. In particular, the grouping rule implies that $\|\beta_{l_{k'},n}\| = o(\|\beta_{l_k,n}\|)$ for $k' > k$ between groups, and that $\|\beta_{j',n}\|$ converges at the same rate as $\|\beta_{j,n}\|$ for any $j, j' \in l_k$ and $k = 1, \ldots, K-1$. In the presence of weak identification, $\beta_{l_k,n} = O(n^{-1/2})$ for $k = K$. If all regressors are in the semi-strong or strong
Finally, we describe one more partition of the vectors $\beta$ and $\pi$, based on the grouping notation above, that will be used to sequentially analyze the limiting behavior of the estimators. Consider $\pi_{l_k}$, and denote by $\pi_{k-}$ the elements of $\pi$ in the previous groups $l_1, \ldots, l_{k-1}$ and by $\pi_{k+}$ the elements of $\pi$ in the subsequent groups $l_{k+1}, \ldots, l_K$. That is, $\pi_{k-} = (\pi_{l_1}', \ldots, \pi_{l_{k-1}}')'$ and $\pi_{k+} = (\pi_{l_{k+1}}', \ldots, \pi_{l_K}')'$. Observe that $\pi = (\pi_{k-}', \pi_{l_k}', \pi_{k+}')'$ and that the identification strength of these subvectors is in decreasing order by definition. The same notation applies to $\beta$, where the subvectors in $\beta = (\beta_{k-}', \beta_{l_k}', \beta_{k+}')'$ are ordered by decreasing magnitude by definition.
It is important to note that $\pi_{l_1}$ is strongly identified. All strongly identified elements of $\pi$ are included in this group in order to analyze them together with the strongly identified parameters $\beta$ and $\zeta$. The semi-strongly identified and weakly identified elements of $\pi$ are analyzed using the sequential procedure outlined in Cheng (2015). If no elements of $\pi$ are strongly identified, $l_1 = \emptyset$ and $\pi_{l_1}$ disappears.
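The partition into previous, current, and subsequent groups can be written as a small Python helper. This is only a bookkeeping sketch: the function name `split_around_group`, the use of plain lists for the subvectors, and the 0-based indices inside each group are assumptions of this example, while the group number `k` is 1-based to match the text.

```python
def split_around_group(pi_blocks, groups, k):
    """Split the subvectors of pi into (pi_{k-}, pi_{l_k}, pi_{k+}).

    pi_blocks[j] is the subvector pi_j, groups is the list of index lists
    [l_1, ..., l_K], and k is the 1-based number of the current group.
    """
    if not 1 <= k <= len(groups):
        raise ValueError("k must lie between 1 and the number of groups")
    # Elements in the previous groups l_1, ..., l_{k-1}.
    previous = [pi_blocks[j] for group in groups[: k - 1] for j in group]
    # Elements in the current group l_k.
    current = [pi_blocks[j] for j in groups[k - 1]]
    # Elements in the subsequent groups l_{k+1}, ..., l_K.
    subsequent = [pi_blocks[j] for group in groups[k:] for j in group]
    return previous, current, subsequent


# Example with four groups; labels stand in for the subvectors pi_j.
groups = [[0], [3], [1, 4], [2]]
pi_blocks = ["pi_0", "pi_1", "pi_2", "pi_3", "pi_4"]
print(split_around_group(pi_blocks, groups, k=2))
```

With an empty first group the helper returns an empty list for the previous elements at `k=1` and `k=2`, which corresponds to $\pi_{l_1}$ disappearing.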
B.1.3 Concentrated Criterion Functions
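The least squares estimator $\hat{\theta}_n$ minimizes $Q_n(\theta)$ over $\theta \in \Theta$, where $\Theta = \mathcal{B} \times \mathcal{Z} \times \Pi$. Here $\mathcal{B} = \times_{j=1}^{p} \mathcal{B}_j$, where $\mathcal{B}_j$ for $j = 1, \ldots, p$ are compact sets, as are $\mathcal{Z}$ and $\Pi$. We assume all true values and parsimonious model counterparts in $\Theta^*$ are in the interior of the optimization space $\Theta$.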
Proof of the consistency of the strongly and semi-strongly identified components of the estimator follows from sequential analysis in order of decreasing identification strength. In particular, we sequentially concentrate out parameters and analyze the concentrated criterion function
$Q_n^c(\pi_{l_k}, \pi_{k+}) = Q_n(\hat{\psi}_{k-}(\pi_{l_k}, \pi_{k+}), \pi_{l_k}, \pi_{k+})$,
where $\psi_{k-} = (\beta', \zeta', \pi_{k-}')'$ collects the parameters that have been concentrated out, and the true values of these parameters are denoted with additional subscripts: $\psi_{k-,n} = (\beta_n', \zeta_n', \pi_{k-,n}')'$ and $\psi_{k-,0} = (\beta_0', \zeta_0', \pi_{k-,0}')'$.
Due to the mixed identification strength across differing subvectors of $\pi$, it becomes necessary to evaluate expansions around the points of sequential identification failure, $\beta_{l_k}^0 = 0$ and $\beta_{k+}^0 = 0$, rather than around the true values $\beta_{l_k,n}$ and $\beta_{k+,n}$, as is commonly done in Andrews and Cheng (2012a). We use the superscript 0 notation to define
$\psi_{k-,n}^0 = (\beta_{k-,n}', \beta_{l_k}^{0\prime}, \beta_{k+}^{0\prime}, \zeta_n', \pi_{k-,n}')'$
to be the parameter vector consisting of the concentrated-out parameters evaluated at the point of sequential identification failure, $\beta_{l_k}^0 = 0$ and $\beta_{k+}^0 = 0$. Observe that the difference is $\psi_{k-,n} - \psi_{k-,n}^0 = (0', \beta_{l_k,n}', \beta_{k+,n}', 0', 0')'$. This is done so that, under our basic assumptions, the centering term $Q_n(\psi_{k-,n}^0, \pi_{l_k}, \pi_{k+})$ does not depend on $(\pi_{l_k}', \pi_{k+}')'$.
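To make the concentration step concrete, here is a minimal Python sketch for a two-component additive model $Y_t = \beta_1 g(X_{1t}, \pi_1) + \beta_2 g(X_{2t}, \pi_2) + \varepsilon_t$ in which $\pi_1$ plays the role of $\pi_{k-}$ and $\pi_2$ is the remaining argument of the concentrated criterion. The sketch makes several assumptions that are not in the source: there is no $\zeta$, $Q_n$ is taken to be the average squared residual, $g(x, \pi) = e^{-\pi x}$, $\beta$ is concentrated out by linear least squares, and $\pi_1$ is concentrated out by a grid search. The names `q_n` and `concentrated_q` are hypothetical.

```python
import numpy as np


def g(x, pi):
    # Hypothetical smooth nonlinear function (an assumption of this sketch).
    return np.exp(-pi * x)


def q_n(beta, pi, y, X):
    """Average squared residual of the additive model at (beta, pi)."""
    fitted = sum(beta[j] * g(X[:, j], pi[j]) for j in range(len(pi)))
    return np.mean((y - fitted) ** 2)


def concentrated_q(pi_rest, y, X, pi1_grid):
    """Concentrated criterion as a function of the remaining pi components.

    For each candidate pi_1 on the grid, beta is concentrated out by least
    squares; the smallest criterion value over the grid is returned. This
    corresponds to plugging psi_hat(pi_rest) = (beta_hat, pi_1_hat) into Q_n.
    """
    best = np.inf
    for pi1 in pi1_grid:
        pi = np.concatenate(([pi1], np.atleast_1d(pi_rest)))
        G = np.column_stack([g(X[:, j], pi[j]) for j in range(len(pi))])
        beta_hat, *_ = np.linalg.lstsq(G, y, rcond=None)
        best = min(best, q_n(beta_hat, pi, y, X))
    return best


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 500
    X = rng.uniform(0.0, 2.0, size=(n, 2))
    beta_n = np.array([1.0, n ** -0.25])  # strong and semi-strong components
    pi_n = np.array([1.0, 2.0])
    y = beta_n[0] * g(X[:, 0], pi_n[0]) + beta_n[1] * g(X[:, 1], pi_n[1])
    y = y + rng.standard_normal(n)
    pi1_grid = np.linspace(0.5, 1.5, 21)
    for pi2 in (1.0, 2.0, 3.0):
        print(pi2, concentrated_q(pi2, y, X, pi1_grid))
```

By construction the concentrated criterion at any value of $\pi_2$ is no larger than $Q_n$ evaluated at any $(\beta, \pi_1)$ on the search set paired with that same $\pi_2$, which is the property the sequential argument exploits.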
B.2 Appendix: Limit Theory for Models with Mixed Identification Strength