Supplementary Material
Dynamic Linear Panel Regression Models
with Interactive Fixed Effects
Hyungsik Roger Moon
‡Martin Weidner
§August 10, 2015
S.1
Proof of Identification (Theorem 2.1)
Proof of Theorem 2.1. Let Q(β, λ, f) ≡ EkY − β·X − λ f0k2F λ 0, f0, w, where β ∈ RK, λ∈RN×R and f ∈RT×R. We have Q(β, λ, f) =EnTr (Y − β·X − λ f0)0(Y − β·X − λ f0) λ 0, f0, wo =EnTrh λ0f00−λf0−(β−β0)·X+e0 λ0f00−λf0−(β−β0)·X+ei λ 0 , f0, wo =EhTr (e0e) λ 0 , f0, wi +EnTrh λ0f00−λf0−(β−β0)·X0 λ0f00−λf0 −(β−β0)·Xi λ 0, f0, wo | {z } ≡Q∗(β,λ,f) .
In the last step we used Assumption ID(ii). Because EhTr (e0e)
λ
0
, f0, wi is independent of
β, λ, f, we find minimizing Q(β, λ, f) is equivalent to minimizing Q∗(β, λ, f). We decompose
‡Department of Economics and USC Dornsife INET, University of Southern California, Los Angeles, CA
90089-0253. Email: [email protected]. Department of Economics, Yonsei University, Seoul, Korea.
§Department of Economics, University College London, Gower Street, London WC1E 6BT, U.K., and
Q∗(β, λ, f) as follows Q∗(β, λ, f) =EnTrh λ0f00 −λf0−(β−β0)·X0 λ0f00−λf0−(β−β0)·Xi λ 0 , f0, wo =EnTrh λ0f00 −λf0−(β−β0)·X0M(λ,λ0,w) λ0f00−λf0−(β−β0)·X i λ 0 , f0, wo +EnTrh λ0f00−λf0−(β−β0)·X0P(λ,λ0,w) λ0f00−λf0−(β−β0)·X i λ 0, f0, wo
=EnTrh (βhigh−β0,high)·Xhigh
0
M(λ,λ0,w) (βhigh−β0,high)·Xhigh
i λ 0, f0, wo | {z } ≡Qhigh(βhigh,λ) +EnTrh λ0f00−λf0−(β−β0)·X0 P(λ,λ0,w) λ0f00−λf0−(β−β0)·X i λ 0, f0, wo | {z } ≡Qlow(β,λ,f) ,
where (βhigh−β0,high)·Xhigh = PK
m=K1+1(βm −β
0
m)Xm. A lower bound on Qhigh(β
high , λ) is given by Qhigh(βhigh, λ) ≥ min e λ∈RN×(R+R+rank(w)) E n Tr h
(βhigh−β0,high)·Xhigh0M(eλ,λ,w) (βhigh−β0,high)·Xhigh
i λ 0 , f0, w o = min(N,T) X r=R+R+rank(w)
µrnEh (βhigh−β0,high)·Xhigh
(βhigh−β0,high)·Xhigh
0 λ
0, f0, wio.
(S.1.1) Because Q∗(β, λ, f), Qhigh(βhigh
, λ), and Qlow(β, λ, f), are expectations of traces of positive
semi-definite matrices we have Q∗(β, λ, f) ≥ 0, Qhigh(βhigh
, λ) ≥ 0, and Qlow(β, λ, f) ≥ 0 for
all β, λ, f. Let ¯β, ¯λ and ¯f be the parameter values that minimize Q(β, λ, f), and thus also
Q∗(β, λ, f). Because Q∗(β0, λ0, f0) = 0 we have Q∗(¯β,λ,¯ f¯) = min
β,λ,fQ∗(β, λ, f) = 0. This implies Qhigh(¯βhigh,λ¯) = 0 and Qlow(¯β,λ,¯ f¯) = 0. Assumption ID(v), the lower bound (S.1.1), and Qhigh(¯βhigh,λ¯) = 0 imply ¯βhigh=β0,high. Using this, we find
Qlow(¯β,λ,¯ f¯) =E Tr
λ0f00 −¯λf¯0 −(¯βlow−β0,low)·Xlow
0
λ0f00−λ¯f¯0−(¯βlow−β0,low)·Xlow
λ 0 , f0, w , ≥min f E Tr
λ0f00−λf¯ 0−(¯βlow−β0,low)·Xlow
0
λ0f00−¯λf0−(¯βlow−β0,low)·Xlow
λ 0 , f0, w =E Tr
λ0f00 −(¯βlow−β0,low)·Xlow
0
M¯λλ0f00−(¯βlow−β0,low)·Xlow
λ 0 , f0, w , (S.1.2)
where (¯βlow−β0,low)·Xlow=PKl=11(¯βl−β
0
l)Xl. BecauseQlow(¯β,λ,¯ f¯) = 0 and the last expression in (S.1.2) is non-negative we must have
E
Tr
λ0f00−(¯βlow−β0,low)·Xlow
0
M¯λλ0f00−(¯βlow−β0,low)·Xlow
λ 0 , f0, w = 0.
UsingM¯λ =M¯λM¯λ and the cyclicality of the trace we obtain from the last equality: Tr M¯λAM¯λ = 0, where A = E λ0f00−(¯βlow−β0,low)·X low λ0f00−(¯β low −β0,low)·Xlow 0 λ 0, f0, w . The trace of a positive semi-definite matrix is only equal to zero if the matrix itself is equal to zero, so we find
M¯λAM¯λ = 0,
This together with the fact that A itself is positive semi definite implies (note that A positive semi-definite impliesA=CC0 for some matrixC, andM¯λAM¯λ = 0 then impliesM¯λC= 0, i.e.,
C =P¯λC)
A=P¯λAP¯λ,
and therefore rank(A)≤rank(P¯λ)≤R. We have thus shown rank
E
λ0f00 −(¯βlow−β0,low)·Xlow λ0f00−(¯βlow−β0,low)·Xlow
0 λ 0 , f0, w ≤R. We furthermore find R≥rank E
λ0f00−(¯βlow−β0,low)·Xlow λ0f00−(¯β low −β0,low)·Xlow 0 λ 0 , f0, w ≥rank MwE
λ0f00−(¯βlow−β0,low)·Xlow
Pf0
λ0f00−(¯βlow−β0,low)·Xlow
0 Mw λ 0 , f0, w + rank PwE
λ0f00−(¯βlow−β0,low)·Xlow
Mf0
λ0f00 −(¯βlow−β0,low)·Xlow
0 Pw λ 0 , f0, w ≥rank Mwλ0f00f0λ00Mw + rank E
(¯βlow−β0,low)·Xlow
Mf0
(¯βlow−β0,low)·Xlow
0 λ 0 , f0, w .
Assumption ID(iv) guarantees rank Mwλ0f00f0λ00Mw
= rank λ0f00f0λ00 = R, that is, we must have E
(¯βlow−β0,low)·Xlow
Mf0
(¯βlow−β0,low)·Xlow
0 λ 0, f0, w = 0.
According to Assumption ID(iii) this implies ¯βlow = β0,low, i.e., we have ¯β = β0. This also implies Q∗(¯β,λ,¯ f¯) =kλ0f00−¯λf¯0k2
F = 0, and therefore ¯λf¯0 =λ
0 f00.
S.2
Examples of Error Distributions
The following Lemma provides examples of error distributions that satisfykek=Op(
p
max(N, T)) asN, T → ∞. Example (i) is particularly relevant for us, because those assumptions oneit are imposed in Assumption 5 in the main text, i.e., under those main text assumptions we indeed havekek=Op(
p
max(N, T)).
Lemma S.2.1. For each of the following distributional assumptions on the errors eit, i = 1, . . . , N, t= 1, . . . , T, we have kek=Op(
p
max(N, T)).
(i) The eit are independent across i and t, conditional on C, and satisfy E(eit|C) = 0, and
E(e4it|C) is bounded uniformly by a non-random constant, uniformly over i, t and N, T. Here C can be any conditioning sigma-field, including the empty one (corresponding to unconditional expectations).
(ii) The eit follow different MA(∞) processes for each i, namely
eit=
∞
X
τ=0
ψiτui,t−τ , for i= 1. . . N, t= 1. . . T , (S.2.1)
where the uit, i = 1. . . N, t = −∞. . . T are independent random variables with Euit = 0 and Eu4
it uniformly bounded across i, t and N, T. The coefficients ψiτ satisfy
∞ X τ=0 τ max i=1...N ψ 2 iτ < B , ∞ X τ=0 max i=1...N|ψiτ| < B , (S.2.2) for a finite constant B which is independent of N and T.
(iii) The error matrix e is generated as e = σ1/2uΣ1/2, where u is an N × T matrix with independently distributed entriesuit andEuit= 0,Eu2it = 1, and Eu4it is bounded uniformly across i, t and N, T. Here σ is the N ×N cross-sectional covariance matrix, and Σ is the
T ×T time-serial covariance matrix, and they satisfy
max j=1...N N X i=1 |σij| < B , max τ=1...T T X t=1 |Σtτ| < B , (S.2.3)
for some finite constant B which is independent of N and T. In this example we have
Proof of Lemma S.2.1, Example (i). Latala (2005) showed that for aN×T matrix ewith independent entries, conditional on C, we have
E kekC ≤ c max i " X t E e2it C #1/2 + max j " X i E e2it C #1/2 + " X i,t E e4it C #1/4 ,
where c is some universal constant. Because we assumed uniformly bounded 4th conditional moments foreit we thus havekek=OP(
√ T) +OP( √ N) +OP((T N)1/4) = Op( p max(N, T)).
Example (ii). Let ψj = (ψ1j, . . . , ψN j) be an N ×1 vector for each j ≥ 0. Let U−j be an
N×T sub-matrix of (uit) consisting ofuit,i= 1. . . N, t= 1−j, . . . , T −j. We can then write equation (S.2.1) in matrix notation as
e= ∞ X j=0 diag(ψj)U−j = T X j=0 diag(ψj)U−j +rN T,
where we cut the sum atT, which results in the remainderrN T =
P∞
j=T+1 diag(ψj)U−j. When approximating an MA(∞) by a finite MA(T) process we have for the remainder
E(krN TkF) 2 = N X i=1 T X t=1 E(rN T) 2 ij ≤ σ 2 u N X i=1 T X t=1 ∞ X j=T+1 ψ2ij ≤σ2uN T ∞ X j=T+1 max i ψ 2 ij ≤σ2uN ∞ X j=T+1 j max i ψ 2 ij , where σ2
u is the variance of uit. Therefore, for T → ∞ we have
E (krN TkF)
2
N
!
−→ 0,
which implies (krN TkF)2 =Op(N), and thereforekrN Tk ≤ krN TkF =Op(
√ N).
Let V be the N ×2T matrix consisting of uit, i= 1. . . N, t = 1−T, . . . , T. For j = 0. . . T the matricesU−j are sub-matrices ofV, and thereforekU−jk ≤ kVk. From example (i) we know
kVk=Op(pmax(N,2T)). Furthermore, we knowkdiag(ψj)k ≤maxi ψij
Combining these results we find kek ≤ T X j=0 kdiag(ψj)k kU−jk+krN Tk ≤ T X j=0 max i ψij kVk+op( √ N) ≤ " ∞ X j=0 max i ψij # Op(pmax(N,2T)) +op(√N) ≤ Op( p max(N, T)),
as required for the proof.
Example (iii). Because σ and Σ are positive definite, there exits a symmetric N ×N matrix
φ and a symmetric T ×T matrix ψ such thatσ =φ2 and Σ =ψ2. The error term can then be generated as e=φuψ, where uis anN ×T matrix with iid entriesuit such that E(uit) = 0 and
E(u4it)<∞. Given this definition ofewe immediately haveEeit = 0 andEeitejτ =σijΣtτ. What is left to show iskek=Op(
p
max(N, T)). From example (i) we know kuk=Op(pmax(N, T)). Using the inequality kσk ≤pkσk1kσk∞ =kσk1, where kσk1 =kσk∞ because σ is symmetric
we find kσk ≤ kσk1 ≡ max j=1...N N X i=1 |σij| < L ,
and analogously kΣk < L. Because kσk = kφk2 and kΣk = kψk2, we thus find kek ≤ kφkkukkψk ≤LOp(
p
max(N, T)), i.e., kek=Op(
p
max(N, T)).
S.3
Comments on Assumption 4 on the regressors
Consistency of the LS estimator βb requires the regressors not only satisfy the standard
non-collinearity condition in assumption 4(i), but also the additional conditions on high- and low-rank regressors in assumption 4(ii). Bai (2009) considers the special cases of only high-low-rank and only low-rank regressors. As low-rank regressors he considers only cross-sectional invari-ant and time-invariinvari-ant regressors, and he shows that if only these two types of regressors are present, one can show consistency under the assumption plimN,T→∞WN T > 0 on the re-gressors (instead of assumption 4), where WN T is the K ×K matrix defined by WN T,k1k2 =
(N T)−1Tr(M
f0Xk0
objective expansion in theorem 4.1, i.e., the condition plimN,T→∞WN T >0 is very natural in the context of the interactive fixed effect models, and one may wonder whether also for the general case one can replace assumption 4 with this weaker condition and still obtain consistency of the LS estimator. Unfortunately, this is not the case, and below we present two simple counter examples that show this.
(i) Let there only be one factor (R = 1) f0
t with corresponding factor loadings λ
0
i. Let there only be one regressor (K = 1) of the form Xit =wivt+λ0ift0. Assume the N ×1 vector
w= (w1, . . . , wN)0, and theT ×1 vector v = (v1, . . . , vN)0 are such that theN×2 matrix Λ = (λ0, w) and and the T ×2 matrix F = (f0, v) satisfy plimN,T→∞(Λ0Λ/N) > 0, and
plimN,T→∞(F0F/T)>0. In this case, we have WN T = (N T)−1Tr(Mf0vw0Mλ0wv0), and
therefore plimN,T→∞WN T = plimN,T→∞(N T)−1Tr(Mf0vw0Mλ0wv0) > 0. However, β is
not identified becauseβ0X+λ0f00 = (β0+ 1)X−wv0, i.e., it is not possible to distinguish
(β, λ, f) = (β0, λ0, f0) and (β, λ, f) = (β0+ 1,−w, v). This implies that the LS estimator
is not consistent (both β0 and β0+ 1 could be the true parameter, but the LS estimator cannot be consistent for both).
(ii) Let there only be one factor (R = 1)ft0with corresponding factor loadingsλ0i. Let theN×
1 vectorsλ0,w1andw2be such that Λ = (λ0, w1, w2) satisfies plimN,T→∞(Λ0Λ/N)>0. Let
theT×1 vectorsf0,v
1 andv2 be such thatF = (f0, v1, v2) satisfies plimN,T→∞(F0F/T)>
0. Let there be four regressors (K = 4) defined by X1 = w1v01, X2 = w2v20, X3 = (w1 + λ0)(v2+f0)0,X4 = (w2+λ0)(v1+f0)0. In this case, one can easily check plimN,T→∞WN T > 0. However, again βk is not identified, because P4
k=1β 0
kXk+λ0f00 =P4k=1(β0k+ 1)Xk− (λ0 +w1 +w2)(f00 +v1 +v2)0, i.e., we cannot distinguish between the true parameters
and (β, λ, f) = (β0 + 1, −λ0−w1 −w2, f00 +v1 +v2). Again, as a consequence the LS
estimator is not consistent in this case.
In example (ii), there are only low-rank regressors with rank(Xl) = 1. One can easily check assumption 4 is not satisfied for this example. In example (i) the regressor is a low-rank regressor with rank(X) = 2. In our present version of assumption 4 we only consider low-rank regressors with rank(X) = 1, but (as already noted in a footnote in the main paper) it is straightforward to extend the assumption and the consistency proof to low-rank regressors with rank larger than one. Independent of whether we extend the assumption or not, the regressor X of example (i) fails to satisfy assumption 4. This justifies our formulation of assumption 4, because it shows in general the assumption cannot be replaced by the weaker condition plimN,T→∞WN T >0.
S.4
Some Matrix Algebra (including Proof of Lemma A.1)
The following statements are true for real matrices (throughout the whole paper and supplemen-tary material we never use complex numbers anywhere). Let A be an arbitrary n×m matrix. In addition to the operator (or spectral) norm kAk and to the Frobenius (or Hilbert-Schmidt) norm kAkF, it is also convenient to define the 1-norm, the ∞-norm, and the max-norm by
kAk1 = max j=1...m n X i=1 |Aij| , kAk∞ = max i=1...n m X j=1
|Aij| , kAkmax = max
i=1...n jmax=1...m |Aij| .
Lemma S.4.1 (Some useful inequalities). Let A be an n×m matrix, B be an m×p matrix, and C and D be n×n matrices. Then we have:
(i) kAk ≤ kAkF ≤ kAk rank (A)1/2 ,
(ii) kABk ≤ kAk kBk ,
(iii) kABkF ≤ kAkFkBk ≤ kAkF kBkF ,
(iv) |Tr(AB)| ≤ kAkFkBkF , for n=p, (v) |Tr (C)| ≤ kCkrank (C) ,
(vi) kCk ≤Tr (C) , for C symmetric and C ≥0, (vii) kAk2 ≤ kAk
1kAk∞ ,
(viii) kAkmax ≤ kAk ≤ √
nmkAkmax,
(ix) kA0CAk ≤ kA0DAk, for C symmetric and C ≤D. For C, D symmetric, and i= 1, . . . , n we have:
(x) µi(C) +µn(D) ≤ µi(C+D) ≤ µi(C) +µ1(D),
(xi) µi(C)≤ µi(C+D), for D≥0,
(xii) µi(C)− kDk ≤ µi(C+D) ≤ µi(C) +kDk.
Proof. Here we use notation si(A) for theith largest singular value of a matrix A. (i) We have kAk= s1(A), and kAk2
F =
Prank(A)
i=1 (si(A))
2. The inequalities follow directly from
this representation. (ii) This inequality is true for all unitarily invariant norms, see, e.g., Bhatia (1997). (iii) can be shown as follows
kABk2F = Tr(ABB0A0) = Tr[kBk2AA0− A(kBk2 I−BB0)A0] ≤ kBk2Tr(AA0 ) =kBk2 kAk2 F ,
where we used A(kBk2
I−BB0)A0 is positive definite. Relation (iv) is just the Cauchy Schwarz
inequality. To show (v) we decomposeC =U DO0 (singular value decomposition), whereU and
O are n×rank(C) that satisfyU0U =O0O =Iand Dis a rank(C)×rank(C) diagonal matrix with entries si(C). We then have kOk=kUk= 1 andkDk=kCk and therefore
|Tr(C)|=|Tr(U DO0)|=|Tr(DO0U)| = rank(C) X i=1 η0iDO0U ηi ≤ rank(C) X i=1
kDkkO0kkUk= rank(C)kCk.
For (vi) let e1 be a vector that satisfies ke1k = 1 and kCk = e01Ce1. Because C is symmetric
such an e1 has to exist. Now choose ei, i = 2, . . . , n, such that ei, i = 1, . . . , n, becomes a orthonormal basis of the vector space of n×1 vectors. Because C is positive semi definite we then have Tr (C) = P
ie
0
iCei ≥ e1Ce1 =kCk, which is what we wanted to show. For (vii) we
refer to Golub and van Loan (1996), p.15. For (viii) letebe the vector that satisfieskek= 1 and
kA0CAk = e0A0CAe. Because A0CA is symmetric such an e has to exist. Because C ≤ D we
then have kCk= (e0A0)C(Ae)≤(e0A0)D(Ae)≤ kA0DAk. This is what we wanted to show. For
inequality (ix) lete1 be a vector that satisfiedke1k= 1 andkA0CAk=e01A
0CAe 1. Then we have kA0CAk = e0 1A 0DAe 1 −e01A 0(D−C)Ae 1 ≤ e01A 0DAe
1 ≤ kA0DAk. Statement (x) is a special
case of Weyl’s inequality, see, e.g., Bhatia (1997). The inequalities (xi) and (xii) follow directly from (ix) because µn(D)≥0 forD≥0, and because −kDk ≤µi(D)≤ kDk for i= 1, . . . , n.
Definition S.4.2. Let A be an n×r1 matrix and B be an n×r2 matrix with rank(A) = r1
and rank(B) = r2. The smallest principal angle θA,B ∈ [0, π/2] between the linear subspaces span(A) = {Aa|a∈Rr1} and span(B) = {Bb|b ∈
Br2} of Rn is defined by
cos(θA,B) = max
0=6 a∈Rr10=6maxb∈Rr2
a0A0Bb kAakkBbk.
Lemma S.4.3. Let A be an n×r1 matrix and B be an n×r2 matrix with rank(A) = r1 and
rank(B) =r2. Then we have the following alternative characterizations of the smallest principal
angle between span(A) and span(B)
sin(θA,B) = min
06=a∈Rr1 kMBA ak kA ak = min 06=b∈Rr2 kMAB bk kB bk .
Proof. Because kMBA ak2 + kPBA ak2 = kA ak2 and sin(θA,B)2 + cos(θA,B)2 = 1, we find proving the theorem is equivalent to proving
cos(θA,B) = min
06=a∈Rr1
kPBA ak
kA ak = min06=b∈Rr2
kPAB bk
kA bk .
This last statement is theorem 8 in Galantai and Hegedus (2006), and the proof can be found there.
Proof of Lemma A.1. Let
S1(Z) = min f,λ Tr [(Z−λf 0 ) (Z0−f λ0)] , S2(Z) = min f Tr(Z MfZ 0 ), S3(Z) = min λ Tr(Z 0 MλZ), S4(Z) = min ˜ λ,f˜ Tr(MeλZ MfeZ0), S5(Z) = T X i=R+1 µi(Z0Z), S6(Z) = N X i=R+1 µi(ZZ0).
The theorem claims
S1(Z) = S2(Z) = S3(Z) = S4(Z) = S5(Z) = S6(Z).
We find:
(i) The non-zero eigenvalues of Z0Z and ZZ0 are identical, so in the sums in S5(Z) and in S6(Z) we are summing over identical values, which shows S5(Z) = S6(Z).
(ii) Starting with S1(Z) and minimizing with respect to f we obtain the first-order condition λ0Z =λ0λ f0 .
Putting this into the objective function we can integrate out f, namely Tr(Z−λf0)0(Z−λf0)= Tr (Z0Z−Z0λf0)
= Tr Z0Z−Z0λ(λ0λ)−1(λ0λ)f0
= Tr Z0Z−Z0λ(λ0λ)−1(λ0λ)λ0Z
= Tr (Z0MλZ) .
(iii) Let Mbλ be the projector on the N −R eigenspaces corresponding to the N −R smallest eigenvalues1 of ZZ0, let P
b
λ =IN −Mbλ, and let ωR be the R’th largest eigenvalue of ZZ
0.
We then know the matrixP
b λ[ZZ 0−ω RIN]Pbλ−Mλb[ZZ 0−ω RIN]Mλbis positive semi-definite.
Thus, for an arbitrary N ×R matrix λ with corresponding projector Mλ we have
0≤Tr n Pλb[ZZ0−ωRIN]Pλb−Mλb[ZZ 0− ωRIN]Mbλ Mλ−Mbλ 2o = Tr Pbλ[ZZ0−ωRIN]Pbλ+Mλb[ZZ 0− ωRIN]Mbλ Mλ−Mbλ = Tr [Z0MλZ]−Tr Z0MbλZ+ωR rank(Mλ)−rank(Mbλ) ,
and because rank(Mλb) =N −R and rank(Mλ)≤N −R we have TrZ0MbλZ ≤Tr [Z0MλZ] .
This showsMbλis the optimal choice in the minimization problem ofS3(Z), i.e., the optimal λ=bλis chosen such that the span of the N-dimensional vectorsbλr(r = 1. . . R) equals to
the span of the R eigenvectors that correspond to theR largest eigenvalues of ZZ0. This showsS3(Z) = S6(Z). Analogously one can show S2(Z) =S5(Z).
(iv) In the minimization problem in S4(Z) we can choose eλ such that the span of the N
-dimensional vectors eλr (r = 1. . . R1) is equal to the span of the R1 eigenvectors that correspond to the R1 largest eigenvalues of ZZ0. In addition, we can choose fesuch that
the span of the T-dimensional vectors fer (r = 1. . . R2) is equal to the span of the R2
eigenvectors that correspond to the (R1+ 1)-largest up to theR-largest eigenvalue ofZ0Z.
With this choice of eλ and fewe actually project out all the R largest eigenvalues of Z0Z
and ZZ0. This shows that S4(Z) ≤ S5(Z). (This result is actually best understood by
using the singular value decomposition of Z.) We can write M e λZ Mfe=Z −Ze, where e Z =P e λZ Mfe+Z Pfe.
Because rank(Z) ≤ rank(P
e
λZ Mfe) + rank(Z Pfe) = R1 +R2 = R, we can always write 1If an eigenvalue has multiplicitym, we count itm times when finding theN −R smallest eigenvalues. In
e
Z =λf0 for some appropriateN ×R and T ×R matrices λ and f. This shows that
S4(Z) = min ¯ λ,f¯ Tr(M e λZ MfeZ 0 ) ≥ min {Ze: rank(Ze)≤R} Tr((Z −Ze)(Z−Ze)0) = min f,λ Tr [(Z−λf 0 ) (Z0−f λ0)] =S1(Z).
Thus we have shown here S1(Z) ≤S4(Z)≤ S5(Z), and this holds with equality because S1(Z) =S5(Z) was already shown above.
S.5
Supplement to the Consistency Proof (Appendix A)
Lemma S.5.1. Under assumptions 1 and 4 there exists a constant B0 > 0 such that for the
matrices w and v introduced in assumption 4 we have
w0Mλ0w − B0w0w≥0, wpa1, v0Mf0v − B0v0v ≥0, wpa1.
Proof. We can decomposew=wew¯, whereweis anN×rank(w) matrix and ¯wis a rank(w)×K1
matrix. Note we has full rank, and Mw =Mwe.
By assumption 1(i) we know λ00λ0/N has a probability limit, i.e., there exists some B1 >0
such that λ00λ0/N < B1IR wpa1. Using this and assumption 4 we find for any R×1 vector
a6= 0 we have kMvλ0ak2 kλ0ak2 = a0λ00Mvλ0a a0λ00 λ0a > B B1 , wpa1.
Applying Lemma S.4.3 we find min 06=b∈Rrank(w) b0we0Mλ0 e w b b0 e w0 e w b = 06=mina∈RR a0λ00Mwλ0a a0λ00 λ0a > B B1 , wpa1.
Therefore we find for every rank(w)×1 vector b that b0(we0Mλ0
e w−(B/B1)we 0 e w)b > 0, wpa1. Thus we0Mλ0 e w − (B/B1)we 0 e
w > 0, wpa1. Multiplying from the left with ¯w0 and from the right with ¯w we obtain w0Mλ0w−(B/B1)w0w ≥ 0, wpa1. This is what we wanted to show.
Analogously we can show the statement for v.
As a consequence of the this lemma we obtain some properties of the low-rank regressors summarized in the following lemma.
Lemma S.5.2. Let the assumptions 1 and 4 be satisfied and letXlow,α=
PK1
l=1αlXl be a linear combination of the low-rank regressors. Then there exists some constant B >0 such that
min {α∈RK1,kαk=1} Xlow,αMf0Xlow0 ,α N T > B , wpa1, min {α∈RK1,kαk=1} Mλ0Xlow,αMf0Xlow0 ,αMλ0 N T > B , wpa1. Proof. Note Mλ0Xlow,αMf0Xlow0 ,αMλ0
≤ Xlow,αMf0Xlow0 ,α , because kMλ0k = 1, i.e., if
we can show the second inequality of the lemma we have also shown the first inequality. We can write Xlow,α = wdiag(α0)v0. Using Lemma S.5.1 and part (v), (vi) and (ix) of Lemma S.4.1 we find Mλ0Xlow,αMf0Xlow0 ,αMλ0 =kMλ0wdiag(α0)v0Mf0vdiag(α0)w0Mλ0k ≥B0 kMλ0wdiag(α0)v0 vdiag(α0)w0Mλ0k ≥ B0 K1 Tr [Mλ0wdiag(α0)v0 vdiag(α0)w0Mλ0] = B0 K1Tr [vdiag(α 0 )w0Mλ0wdiag(α0)v0] ≥ B0 K1 kvdiag(α0)w0Mλ0wdiag(α0)v0k ≥ B 2 0 K1 kvdiag(α0)w0wdiag(α0)v0k ≥ B 2 0 K2 1 Tr [vdiag(α0)w0wdiag(α0)v0] = B 2 0 K2 1 TrXlow,αXlow0 ,α .
Thus we have Mλ0Xlow,αMf0Xlow0 ,αMλ0
/(N T) ≥ (B0/K1)2α0WN Tlowα , where the K1 ×K1
matrixWN Tlowis defined byWN T,llow1l2 = (N T)−1Tr Xl1X
0
l2
, i.e., it is a submatrix ofWN T. Because
WN T and thusWN Tlowconverges to a positive definite matrix the lemma is proven by the inequality above.
Using the above lemmas we can now prove the lower bound on Se
(2)
N T(β, f) that was used in the consistency proof. Remember
e SN T(2)(β, f) = 1 N T Tr " λ0f00+ K X k=1 (β0k−βk)Xk ! Mf λ0f00 + K X k=1 (β0k−βk)Xk !0 P(λ0,w) # .
We want to show under the assumptions of theorem 3.1 there exist finite positive constants a0, a1, a2, a3 and a4 such that
e SN T(2)(β, f)≥ a0 βlow−β0,low 2 βlow−β0,low 2 +a1 βlow−β0,low +a2 −a3 βhigh−β0,high −a4 βhigh−β0,high βlow−β0,low , wpa1.
Proof of the lower bound on Se
(2)
N T(β, f). Applying Lemma A.1 and part (xi) of Lemma S.4.1 we find e SN T(2)(β, f)≥ 1 N T µR+1 " λ0f00+ K X k=1 (β0k−βk)Xk !0 P(λ0,w) λ0f00+ K X k=1 (β0k−βk)Xk !# = 1 N T µR+1 " λ0f00+ K1 X l=1 (β0l −βl)wlvl0 !0 λ0f00+ K1 X l=1 (β0l −βl)wlvl0 ! + λ0f00+ K1 X l=1 (β0l −βl)wlv0l !0 P(λ0,w) K X m=K1 (β0m−βm)Xm + K X m=K1 (β0m−βm)Xm0 P(λ0,w) λ0f00 + K1 X l=1 (β0l −βl)wlvl0 ! + K X m=K1 (β0m−βm)Xm0 P(λ0,w) K X m=K1 (β0m−βm)Xm # ≥ 1 N T µR+1 " λ0f00+ K1 X l=1 (β0l −βl)wlvl0 !0 λ0f00+ K1 X l=1 (β0l −βl)wlvl0 ! + λ0f00+ K1 X l=1 (β0l −βl)wlv0l !0 P(λ0,w) K X m=K1 (β0m−βm)Xm + K X m=K1 (β0m−βm)Xm0 P(λ0,w) λ0f00 + K1 X l=1 (β0l −βl)wlvl0 ! # ≥ 1 N T µR+1 λ 0 f00+ K1 X l=1 (β0l −βl)wlv0l !0 λ0f00 + K1 X l=1 (β0l −βl)wlvl0 ! −a3 βhigh−β0,high −a4 βhigh−β0,high βlow−β0,low , wpa1,
Lemma S.4.1 and the fact that 1 N T K X m=K1 (β0m−βm)Xm0 P(λ0,w) λ0f00 + K1 X l=1 (β0l −βl)wlvl0 ! ≤K βhigh−β0,high max m Xm √ N T λ0f00 √ N T +K βlow−β0,low max l wlvl0 √ N T .
Our assumptions guarantee the operator norms of λ0f00/√N T and Xm/
√
N T are bounded from above as N, T → ∞, which results in finite constants a3 and a4.
We write the above result as Se
(2) N T(β, f)≥µR+1(A0A)/(N T) + terms containingβ high , where we defined A = λ0f00 +PK1 l=1(β 0
l −βl)wlvl0. We also write A = A1 +A2 +A3, with A1 = MwA Pf0 =Mwλ0f00,A2 =PwA Mf0 =PK1 l=1(β 0 l −βl)wlvl0Mf0, A3 =PwA Pf0 =Pwλ0f00 + PK1 l=1(β 0 l −βl)wlv0lPf. We then find A0A =A01A1+ (A02+A 0 3)(A2+A3) and A0A ≥ A0A−(a1/2A03+a−1/2A02)(a1/2A3+a−1/2A2) = [A01A1−(a−1)A03A3] + (1−a−1)A02A2 ,
where≥ for matrices refers to the difference being positive definite, anda is a positive number. We choose a= 1 +µR(A01A1)/(2kA3k2). The reason for this choice becomes clear below.
Note [A01A1−(a−1)A03A3] has at most rank R (asymptotically it has exactly rank R).
The non-zero eigenvalues of A0A are therefore given by the (at most) R non-zero eigenvalues of [A01A1−(a−1)A30A3] and the non-zero eigenvalues of (1−a−1)A02A2, the largest one of the latter being given given by the operator norm (1−a−1)kA2k2. We therefore find
1 N T µR+1(A 0 A)≥ 1 N T µR+1 (A01A1−(a−1)A30A3) + (1−a−1)A02A2 ≥ 1 N T min (1−a−1)kA2k2 , µR[A 0 1A1−(a−1)A03A3] .
Using Lemma S.4.1(xii) and our particular choice of a we find
µR [A01A1−(a−1)A03A3]≥ µR(A 0 1A1)− k(a−1)A03A3k = 1 2µR(A 0 1A1). Therefore 1 N T µR+1(A 0 A)≥ 1 2N T µR(A 0 1A1) min 1, 2kA2k 2 2kA3k2+µR(A01A1) ≥ 1 N T kA2k2µR(A 0 1A1) 2kAk2+µ R(A01A1) ,
where we used kAk ≥ kA3k and kAk ≥ kA2k.
Our assumptions guarantee there exist positive constants c0,c1, c2, and c3 such that kAk √ N T ≤ kλ0f00k √ N T + K1 X l=1 |β0 l −βl| kw√lvl0k N T ≤c0+c1 βlow−β0,low , wpa1, µR(A01A1) N T = µR f0λ00Mwλ0f00 N T ≥c2, wpa1, kA2k2 N T =µ1 " K1 X l1=1 (β0l1 −βl1)wl1v 0 l1Mf0 K1 X l2=1 (β0l2 −βl2)vl2w 0 l2 # ≥c3 βlow−β0,low 2 , wpa1,
were for the last inequality we used Lemma S.5.2. We thus have 1 N T µR+1(A 0 A)≥ c3 βlow−β0,low 2 1 + c2 2 c0+c1 βlow−β0,low 2 , wpa1. Defining a0 = c22cc23 1 , a1 = 2cc10 and a2 = 2cc22 1 we thus obtain 1 N T µR+1(A 0 A)≥ a0 βlow−β0,low 2 βlow−β0,low 2 +a1 βlow−β0,low +a2 , wpa1,
i.e., we have shown the desired bound on Se
(2)
N T(β, f).
S.6
Regarding the Proof of Corollary 4.2
As discussed in the main text, the proof of Corollary 4.2 is provided in Moon and Weidner (2015). All that is left to show here is the matrix WN T = WN T(λ0, f0, Xk) does not become singular as N, T → ∞ under our assumptions.
Proof. Remember WN T = 1 N TTr(Mf0X 0 k1Mλ0Xk2).
The smallest eigenvalue of the symmetric matrix W(λ0, f0, X k) is given by µK(WN T) = min {a∈RK, a6=0} a0WN T a kak2 = min {a∈RK, a6=0} 1 N T kak2Tr " Mf0 K X k1=1 ak1X 0 k1 ! Mλ0 K X k2=1 ak2Xk2 !# = min {α∈RK1, ϕ∈RK2 α6= 0, ϕ6= 0}
TrMf0 Xlow0 ,ϕ+Xhigh0 ,α Mλ0 (Xlow,ϕ+Xhigh,α) N T (kαk2+kϕk2) ,
where we decomposeda= (ϕ0, α0)0, withϕandαbeing vectors of lengthK1andK2, respectively,
and we defined linear combinations of high- and low-rank regressors2
Xlow,ϕ = K1 X l=1 ϕlXl , Xhigh,α= K X m=K1+1 αmXm . We haveMλ0 =M(λ0,w)+P(M
λ0w), where w is theN ×K1 matrix defined in assumption 4, i.e.,
(λ0, w) is an N ×(R+K1) matrix, whereas Mλ0w is also an N ×K1 matrix. Using this we
obtain µK(WN T) = min {ϕ∈RK1, α∈RK2 ϕ6= 0, α6= 0} 1 N T (kϕk2+kαk2) TrMf0 Xlow0 ,ϕ+Xhigh0 ,α M(λ0,w) (Xlow,ϕ+Xhigh,α) + Tr Mf0 Xlow0 ,ϕ+Xhigh0 ,α P(M λ0w) (Xlow,ϕ+Xhigh,α) = min {ϕ∈RK1, α∈RK2 ϕ6= 0, α6= 0} 1 N T (kϕk2+kαk2) Tr Mf0Xhigh0 ,αM(λ0,w)Xhigh,α + TrMf0 Xlow0 ,ϕ+Xhigh0 ,α P(Mλ0w) (Xlow,ϕ+Xhigh,α) . (S.6.1)
2As in assumption 4 the components ofαare denotedα
We note there exists finite positive constants c1, c2, and c3 such that 1 N TTr Mf0Xhigh0 ,αM(λ0,w)Xhigh,α ≥ c1kαk2 , wpa1, 1 N TTr Mf0 Xlow0 ,ϕ+Xhigh0 ,α P(Mλ0w) (Xlow,ϕ+Xhigh,α) ≥0, 1 N TTr Mf0Xlow0 ,ϕP(M λ0w)Xlow,ϕ ≥ c2kϕk2 , wpa1, 1 N TTr Mf0Xlow0 ,ϕP(M λ0w)Xhigh,α ≥ −c3 2 kϕkkαk, wpa1, 1 N TTr Mf0Xhigh0 ,αP(M λ0w)Xhigh,α ≥0, (S.6.2)
and we want to justify these inequalities now. The second and the last equation in (S.6.2) are true because, e.g., TrMf0Xhigh0 ,αP(M
λ0w)Xhigh,α = TrMf0Xhigh0 ,αP(M λ0w)Xhigh,αMf0 , and the trace of a symmetric positive semi-definite matrix is non-negative. The first inequality in (S.6.2) is true because rank(f0) + rank(λ0
, w) = 2R+K1 and using Lemma A.1 and assumption
4 we have 1 N Tkαk2Tr Mf0Xhigh0 ,αM(λ0,w)Xhigh,α≥ 1 N Tkαk2µ2R+K1+1 Xhigh,αXhigh0 ,α > b , wpa1,
i.e., we can setc1 =b. The third inequality in (S.6.2) is true because according Lemma S.4.1(v)
we have 1 N TTr Mf0Xlow0 ,ϕP(M λ0w)Xhigh,α ≥ − K1 N T kXlow,ϕk kXhigh,αk ≥ − K1 N T kXlow,ϕkFkXhigh,αkF ≥ −K1K1K2kϕk kαk max k1=1...K1 Xk1 √ N T F max k2=K1+1...K Xk2 √ N T F ≥ −c3 2 kϕk kαk, where we used that assumption 4 implies
Xk/ √ N T F
< C holds wpa1 for some constant C
as, and we set c3 =K1K1K2C2. Finally, we have to argue that the third inequality in (S.6.2)
holds. Note Xlow0 ,ϕP(Mλ0w)Xlow,ϕ=X
0
low,ϕMλ0Xlow,ϕ, i.e., we need to show 1 N TTr Mf0Xlow0 ,ϕMλ0Xlow,ϕ ≥ c2kϕk2 .
Using part (vi) of Lemma S.4.1 we find 1 N TTr Mf0Xlow0 ,ϕMλ0Xlow,ϕ = 1 N TTr Mλ0Xlow,ϕMf0Xlow0 ,ϕMλ0 ≥ 1 N T Mλ0Xlow,ϕMf0Xlow0 ,ϕMλ0 ,
and according to Lemma S.5.2 this expression is bounded by some positive constant timeskϕk2
(in the lemma we havekϕk= 1, but all expressions are homogeneous inkϕk). Using the inequalities (S.6.2) in equation (S.6.1) we obtain
µK(WN T)≥ min {ϕ∈RK1, α∈RK2 ϕ6= 0, α6= 0} 1 kϕk2+kαk2 c1kαk2+ max 0, c2kϕk2−c3kϕkkαk ≥min c2 2, c1c22 c2 2+c23 , wpa1.
Thus, the smallest eigenvalue of WN T is bounded from below by a positive constant asN, T →
∞, i.e., WN T is non-degenerate and invertible.
S.7
Proof of Examples for Assumption 5
Proof of Example 1. We want to show the conditions of Assumption 5 are satisfied. Condi-tions (i)-(iii) are satisfied by the assumpCondi-tions of the example.
For condition (iv), notice Cov (Xit, Xis|C) = E(UitUis). Because |β0| <1 and supitE(e2it)<
∞, it follows 1 N T N X i=1 T X t,s=1 |Cov (Xit, Xis|C)| = 1 N T N X i=1 T X t,s=1 |E(UitUis)| = 1 N T N X i=1 T X t,s=1 ∞ X p,q=0 (β0)p+qE(eit−peis−q) <∞.
{eit} that we have for some finite constantM, 1 N T2 N X i=1 T X t,s,u,v=1 Cov eitXeis, eiuXeiv|C = 1 N T2 N X i=1 T X t,s,u,v=1 |Cov (eitUis, eiuUiv)| = 1 N T2 N X i=1 T X t,s,u,v=1 ∞ X p,q=0 β 0p+q E(eiteis−peiueiv−q)− β0 p E(eiteis−p) β0 q E(eiueiv−q) ≤ M T2 T X t,s,u,v=1 ∞ X p,q=0 β0 p+q [I{t=u}I{s−p=v−q}+I{t=v−q}I{s−p=u}] = M T2 T X t,u,s,v=1 s X k=−∞ v X l=−∞ β0 s−k+v−l I{t=u}I{k=l}+M 1 T T X s,u=1 s−u≥0 β0 s−u 1 T T X v,t=1 v−t≥0 β0 v−t = M T T X s,v=1 min{s,v} X k=−∞ β0 s+v−2k +M 1 T T X s,u=1 s−u≥0 β0 s−u 1 T T X v,t=1 v−t≥0 β0 v−t . Notice 1 T T X s,v=1 min{s,v} X k=−∞ β0 s+v−2k = 2 T T X s=2 s X v=1 v X k=−∞ β0 s−v+2(v−k) + 2 T T X s=1 s X k=−∞ β0 2(s−k) = 2 T T X s=2 s X v=1 β0 s−v ∞ X l=0 β0 2l + 2 T T X s=1 ∞ X l=0 β0 2l = 2 1− β0 2 1 T T X s=2 s X v=1 β0 s−v + 2 1− β0 2 = 2 1− β0 2 !T−1 X l=1 β0 l 1− l T + 2 1− β0 2 = O(1), and 1 T T X s,u=1 s−u≥0 β0 s−u = 1 T T X s=1 s X u=1 β0 s−u = T−1 X l=0 β0 l 1− l T =O(1).
Therefore, we have the desired result 1 N T2 N X i=1 T X t,s,u,v=1 Cov eitXeis, eiuXeiv|C =Op(1).
Preliminaries for Proof of Example 2
• Although we observeXit for 1 ≤t ≤T,here we treatZit = (eit, Xit) as having an infinite past and future. Define
Gt
τ(i) = C ∨σ({Xis :τ ≤s≤t}) and Htτ(i) =C ∨σ({Zit :τ ≤s≤t}). Then, by definition, we haveGt
τ(i),Htτ(i)⊂ Fτt(i) for allτ , t, i.By Assumption (iv) of Ex-ample 2, the time series of{Xit :−∞< t <∞}and {Zit :−∞< t <∞} are conditional
α-mixing conditioning on C uniformly in i.
• Mixing inequality: The following inequality is a conditional version of the α-mixing in-equality of Hall and Heyde (1980), p. 278. Suppose Xit is a Ft-measurable random variable with E|Xit|
max{p,q}
|C < ∞, where p, q > 1 with 1/p+ 1/q < 1. Denote
kXitkC,p = (E(|Xit| p
|C))1/p.Then, for each i, we have
|Cov (Xit, Xit+m|C)| ≤8kXitkC,pkXit+mkC,qα 1−1 p− 1 q m (i). (S.7.1)
Proof of Example 2. Again, we want to show the conditions of Assumption 5 are satisfied. Conditions (i)-(iii) are satisfied by the assumptions of the example.
For condition (iv), we apply the mixing inequality (S.7.1) with p=q >4. Then, we have 1 N T N X i=1 T X t,s=1 |Cov (Xit, Xis|C)| ≤ 2 N T N X i=1 T X t=1 T−t X m=0 |Cov (Xit, Xit+m|C)|= 2 N T N X i=1 T−1 X m=0 T−m X t=1 |Cov (Xit, Xit+m|C)| = 16 N T N X i=1 T−1 X m=0 T−m X t=1 kXitkC,pkXit+mkC,pαm(i) p−2 P ≤ 16 sup i,t kXitk2C,p ∞ X m=0 α p−2 P m ≤ Op(1),
where the last line holds because supi,tkXitk
2
C,p = Op(1) for some p > 4 as assumed in the example (2), and P∞ m=0α p−2 P m =P∞m=0m−ζ p−2 P =O(1) because of ζ >3 4p 4p−1 and p >4.
For condition (v), we need to show 1 N T2 N X i=1 T X t,s,u,v=1 Cov eitXeis, eiuXeiv|C =Op(1). Notice 1 N T2 N X i=1 T X t,s,u,v=1 Cov eitXeis, eiuXeiv|C = 1 N T2 N X i=1 T X t,s,u,v=1 E eitXeiseiuXeiv|C −EeitXeis|C E eiuXeiv|C ≤ 1 N T2 N X i=1 T X t,s,u,v=1 E eitXeiseiuXeiv|C + 1 N N X i=1 1 T T X t,s=1 E eitXeis|C !2 = I+II, say.
First, for termI,there are a finite number of different orderings among the indicest, s, u, v. We consider the caset ≤s≤u≤v and establish the desired result. The other cases can be shown analogously. Note 1 N T2 N X i=1 T X t=1 T−t X k=0 T−k X l=0 T−l X m=0 E eitXeit+keit+k+lXeit+k+l+m|C ≤ 1 N N X i=1 1 T2 T X t=1 X 0≤l,m≤k 0≤k+l+m≤T−t E eit e Xit+keit+k+lXeit+k+l+m |C +1 N N X i=1 1 T2 T X t=1 X 0≤k,m≤l 0≤k+l+m≤T−t E h eitXeit+k eit+k+lXeit+k+l+m |Ci −EeitXeit+k|C E eit+k+lXeit+k+l+m|C +1 N N X i=1 1 T2 T X t=1 X 0≤k,m≤l 0≤k+l+m≤T−t E eitXeit+k|C E eit+k+lXeit+k+l+m|C +1 N N X i=1 1 T2 T X t=1 X 0≤p,l≤m 0≤k+l+m≤T−t E h eitXeit+keit+k+l e Xit+k+l+m|C i = I1 +I2+I3+I4, say.
By applying the mixing inequality (S.7.1) to E eit e Xit+keit+k+lXeit+k+l+m |C with eit and e Xit+keit+k+lXeit+k+l+m, we have E eit e Xit+keit+k+lXeit+k+l+m |C ≤ 8keitkC,p Xeit+keit+k+lXeit+k+l+m C ,q α1− 1 p− 1 q k (i) ≤ 8keitkC,p Xeit+k C ,3q keit+k+lkC,3q Xeit+k+l+m C ,3q α1− 1 p− 1 q k (i),
where the last inequality follows by the generalized Holder’s inequality. Choose p = 3q > 4.
Then, I1 ≤ 8 N N X i=1 1 T2 T X t=1 X 0≤l,m≤k 0≤k+l+m≤T−t keitkC,p Xeit+k C ,p keit+k+lkC,p Xeit+k+l+m C ,p α1− 1 4p k (i) ≤ 8 sup i,t keitk2C,p sup i,t Xeit+k 2 C,p 1 T2 T X t=1 X 0≤l,m≤k 0≤k+l+m≤T−t α1− 1 4p k ≤ 8 sup i,t keitk2C,p sup i,t Xeit+k 2 C,p ∞ X k=0 k2α1− 1 4p k ≤ Op(1),
where the last line holds because we assume in example (2) thatsupi,tkeitk
2 C,p supi,t Xeit+k 2 C,p =
Op(1) for somep > 4,, and
P∞ m=0m 2α1− 1 4p m =P∞m=0m2−ζ 4p−1 4p =O(1) because ofζ >3 4p 4p−1 and p >4.
By applying similar arguments, we can also show
I2, I3, I4 =Op(1).
S.8
Supplement to the Proof of Theorem 4.3
Notation EC and VarC and CovC: In the remainder of this supplementary file we writeEC,
VarC and CovC for the expectation, variance and covariance operators conditional on C, i.e.,
EC(A) = E(A|C), VarC(A) = Var(A|C) and CovC(A, B) = Cov(A, B|C).
What is left to show to complete the proof of Theorem 4.3 is that Lemma B.1 and Lemma B.2 in the main text appendix hold. Before showing this, we first present two further intermediate lemmas.
Lemma S.8.1. Under the assumptions of Theorem 4.3 we have for k = 1, . . . , K, (a) kPλ0Xekk=op( √ N T), (b) kXekPf0k=op( √ N T), (c) kPλ0eXk0k=op(N3/2), (d) kPλ0ePf0k=Op(1). Proof of Lemma S.8.1. # Part (a): We have
kPλ0Xekk=kλ0(λ0 0 λ0)−1λ00Xekk ≤ kλ0(λ00 λ0)−1kkλ00 e Xkk ≤ kλ0kk(λ00λ0)−1kkλ00XekkF =Op(N−1/2)kλ00XekkF ,
where we used part (i) and (ii) of Lemma S.4.1 and Assumption 1. We have
E n EC h kλ00Xekk2F io =E R X r=1 T X t=1 EC N X i=1 λ0irXek,it !2 =E ( R X r=1 T X t=1 N X i=1 (λ0ir)2EC e Xk,it2 ) = R X r=1 T X t=1 N X i=1 E(λ0ir)2VarC(Xk,it) =Op(N T),
where we used Xek,it is mean zero and independent across i, conditional on C, and our bounds
on the moments of λ0ir and Xk,it. We therefore have kλ00XekkF = Op(
√
N T) and the above inequality thus gives kPλ0Xekk=Op(
√
T) =op(
√ N T).
# The proof for part (b) is similar. As above we first obtain kXekPf0k = kPf0Xek0k ≤
Op(T−1/2)kf00Xek0kF. Next, we have EC h kf00 e Xk0k2 F i = R X r=1 N X i=1 EC T X t=1 ftr0Xek,it !2 = R X r=1 N X i=1 T X t,s=1 ftr0fsr0EC e Xk,itXek,is ≤ " R X r=1 max t |f 0 tr| 2 # N X i=1 T X t,s=1
|CovC(Xk,it, Xk,is)|
where we used that uniformly bounded Ekf0
tk4+ implies maxt|ftr0| = Op(T1/(4+)). We thus havekf00 e Xk0k2 F =op(T √ N) and thereforekXekPf0k=op( √ N T). # Next, we show part (c). First, we have
E n EC h kλ00eXk0kF 2io =E EC R X r=1 N X j=1 N X i=1 T X t=1 λ0ireitXk,jt !2 =E ( R X r=1 N X i,j,l=1 T X t,s=1 λ0irλ0lrEC(eitelsXk,jtXk,js) ) = R X r=1 N X i,j=1 T X t=1 E(λ0ir) 2 EC e2itX 2 k,jt =O(N2T),
where we used that EC(eitelsXk,jtXk,js) is only non-zero if i = l (because of cross-sectional independence conditional onC) andt=s(because regressors are pre-determined). We can thus conclude kλ00eXk0kF =Op(N
√
T). Using this we find
kPλ0eXk0k=kλ0(λ00λ0)−1λ00eXk0k ≤ kλ0(λ00 λ0)−1kkλ00 eXk0k ≤ kλ0kk(λ00λ0)−1kkλ00eXk0kF =Op(N−1/2)Op(N √ T) = Op( √ N T).
This is what we wanted to show. # For part (d), we first find √1
N T f00eλ0 F =Op(1), because E EC f00eλ0 F √ N T !2 = E 1 N TEC N X i=1 T X t=1 eitft00λ 0 i !2 = E ( 1 N T N X i=1 N X j=1 T X t=1 T X s=1 EC(eitejs)ft00λ 0 iλ 00 j f 0 s ) = 1 N T N X i=1 T X t=1 EEC e2it ft00λ0iλ0i0ft0 = O(1),
where we used eit is independent across i and over t, conditional on C. Thus we obtain
kPλ0ePf0k=kλ0(λ00λ0)−1λ00ef0(f00f0)−1f00k ≤ kλ0k(λ0 0 λ0)−1kλ0 0 ef0k(f00f0)−1 kf00k ≤ Op(N1/2)Op(N−1)kλ00ef0kFOp(T−1)Op(T1/2) =Op(1), where we used part (i) and (ii) of Lemma S.4.1.
Lemma S.8.2. Suppose A and B are T ×T and N ×N matrices that are independent of
e, conditional on C, such that EC kAk2F
= Op(N T) and EC kBk2F
= Op(N T), and let Assumption 5 be satisfied. Then there exists a finite non-random constant c0 such that
(a) EC {Tr [(e0e−EC(e0e))A]} 2 ≤c0NEC kAk2F , (b) EC {Tr [(ee0 −EC(ee0))B]} 2 ≤c0T EC kBk2F .
Proof. # Part (a): Denote Ats to be the (t, s) th element of A. We have Tr{(e0e−EC(e0e))A}= T X t=1 T X s=1 (e0e−EC(e0e))tsAts = T X t=1 T X s=1 N X i=1 (eiteis−EC(eiteis)) ! Ats. Therefore, EC(Tr{(e0e−EC(e0e))A})2 = T X t=1 T X s=1 T X p=1 T X q=1 EC " N X i=1 (eiteis−EC(eiteis)) ! N X j=1 (ejpejq−EC(ejpejq)) !# EC(AtsApq).
Let Σit =EC(e2it). Then we find
EC ( N X i=1 (eiteis−EC(eiteis)) ! N X j=1 (ejpejq−EC(ejpejq)) !) = N X i=1 N X j=1 {EC(eiteisejpejq)−EC(eiteis)EC(ejpejq)} = ΣitΣis if (t =p)6= (s=q) and (i=j) ΣitΣis if (t =q)6= (s=p) and (i=j) EC(e4it)−Σ2it if (t =s =p=q) and (i=j) 0 otherwise. Therefore, EC(Tr{(e0e−EC(e0e))A}) 2 ≤ T X t=1 T X s=1 N X i=1 ΣitΣis EC A2ts +EC(AtsAst) + T X t=1 N X i=1 EC e4it −Σ2it ECA2tt.
Define Σi = diag (Σ i1, ...,ΣiT). Then, we have T X t=1 T X s=1 N X i=1 ΣitΣis ECA2ts = EC N X i=1 Tr A0ΣiAΣi ! ≤ N X i=1 EC AΣi 2 F ≤ N X i=1 Σi 2 ECkAk2F ≤ N sup it Σ2it ECkAk2F . (S.8.1) Also, T X t=1 T X s=1 N X i=1 ΣitΣisEC(AtsAst) = EC " N X i=1 Tr ΣiAAΣi # ≤ N X i=1 EC ΣiA F AΣi F ≤ N X i=1 Σi 2 ECkAk2F ≤ N sup it Σ2it ECkAk2F . (S.8.2) Finally, T X t=1 N X i=1 EC e4it−Σit2ECA2tt ≤ N sup it E C e4it ECkAk2F, (S.8.3)
and supitEC(e4it) is assumed bounded by Assumption 5(vi). # Part (b): The proof is analogous to the proof of part (a).
Proof of Lemma B.1. # For part (a) we have
1 √ N TTr Pf0e0Pλ0Xek = 1 √ N TTr Pf0e0Pλ0Pλ0XekPf0 ≤ √R N T kPλ 0e Pf0k Pλ0Xek kPf0k = √1 N T Op(1)op( √ N T)Op(1) =op(1),
where the second-last equality follows by Lemma S.8.1 (a) and (d). # To show statement (b) we define ζk,ijt =eitXek,jt. We then have
1 √ N TTr Pλ0eXe 0 k = R X r,q=1 " λ00λ0 N −1# rq 1 N√N T T X t=1 N X i,j=1 λ0irλ0jqζk,ijt | {z } ≡Ak,rq .
We only have EC ζk,ijtζk,lms
6
= 0 if t = s (because regressors are pre-determined) and i = l
and j =m (because of cross-sectional independence). Therefore
EEC A2k,rq =E ( 1 N3T T X t,s=1 N X i,j,l,m=1 λirλjqλlrλmqEC ζk,ijtζk,lms ) = 1 N3T T X t=1 N X i,j=1 Eλ2irλ 2 jqEC ζ 2 k,ijt =O(1/N) =op(1).
We thus have Ak,rq =op(1) and therefore also √N T1 Tr
Pλ0eXek0
=op(1).
# The proof for statement (c) is similar to the proof of statement (b). Define ξk,its =
eitXek,is−EC eitXek,is . We then have 1 √ N TTr n Pf0 h e0Xek−EC e0Xek io = R X r,q=1 " f0f T −1# rq 1 T√N T N X i=1 T X t,s=1 ftrfsqξk,its | {z } ≡Bk,rq . Therefore EC Bk,rq2 = 1 T3N N X i,j=1 T X t,s,u,v=1 ftrfsqfurfvqEC ξk,itsξk,juv ≤ max t,er |fter| 4 1 T3N N X i,j=1 T X t,s,u,v=1 CovC eitXek,is, ejuXek,jv = max t,er |ftre| 4 1 T3N N X i=1 T X t,s,u,v=1 CovC eitXek,is, eiuXek,iv =Op(T4/(4+))Op(1/T) =op(1),
where we used uniformly boundedEkf0
tk4+ implies maxt|ftr0|=Op(T1/(4+)). # Part (d) and (e): We have kλ0(λ00λ0)−1(f00f0)−1f00k=O
p((N T)−1/2), kek=Op(N1/2),
kXkk=Op(
√
N T) andkPλ0ePf0k=Op(1), which was shown in Lemma S.8.1. Therefore:
1 √ N TTr ePf 0e0Mλ0Xkf0(f00f0)−1(λ00λ0)−1λ00 = √1 N TTr Pλ0ePf0e 0 Mλ0Xkf0(f00f0)−1(λ00λ0)−1λ00 ≤ √R N T kPλ 0ePf0k kekkXkk f0(f00f0)−1(λ00λ0)−1λ00 =Op(N−1/2) = op(1) .
which shows statement (d). The proof for part (e) is analogous.
# To prove statement (f) we need to use in addition kPλ0e Xk0k=op(N3/2), which was also
shown in Lemma S.8.1. We find 1 √ N TTr e 0 Mλ0XkMf0e0λ0(λ00λ0)−1(f00f0)−1f00 = √1 N TTr e 0 Mλ0Xke0Pλ0λ0(λ00λ0)−1(f00f0)−1f00 − √1 N TTr e 0 Mλ0XkPf0e0Pλ0λ0(λ00λ0)−1(f00f0)−1f00 ≤ √R N TkekkPλ 0e Xk0k kλ0(λ00λ0)−1(f00f0)−1f00k − √R N TkekkXkkkPλ0e Pf0kkλ 0 (λ00λ0)−1(f00f0)−1f00k =op(1).
# Now we want to prove part (g) and (h) of the present lemma. For part (g) we have 1 √ N TTr [ee0−EC(ee0)] Mλ0Xkf0(f00f0)−1(λ00λ0)−1λ00 = √1 N TTr [ee0−EC(ee0)] Mλ0Xkf0(f00f0)−1(λ00λ0)−1λ00 + √1 N TTr n [ee0−EC(ee0)]Mλ0XekPf0f0(f00f0)−1(λ00λ0)−1λ00 o = √1 N TTr [ee0−EC(ee0)] Mλ0Xkf0(f00f0)−1(λ00λ0)−1λ00 + √1 N T kee 0− EC(ee0)k XekPf0 f0(f0 0 f0)−1(λ00λ0)−1λ00 = √1 N TTr [ee0−EC(ee0)] Mλ0Xkf0(f00f0)−1(λ0 0 λ0)−1λ00 +op(1).
Thus, what is left to prove is √1
N TTr
[ee0−EC(ee0)] Mλ0Xkf0(f00f0)−1(λ00λ0)−1λ00 =op(1).
For this we define
Bk =Mλ0Xkf0(f00f0)−1(λ00λ0)−1λ00 .
Using part (i) and (ii) of Lemma S.4.1 we find
kBkkF ≤R1/2kBkk ≤R1/2kXkk f0(f00f0)−1(λ00λ0)−1λ00 ≤R1/2kXkkF f0(f00f0)−1(λ00λ0)−1λ00 .
and therefore EC kBkk2F ≤Rf0(f00f0)−1(λ00λ0)−1λ00 2 EC kXkk2F =O(1), where we used EC kXkk2F
= O(N T), which is true because we assumed uniformly bounded moments of Xk,it. Applying Lemma S.8.2 we therefore find
EC 1 √ N TTr{[ee 0− EC(ee0)]Bk} 2 ≤c0 T N T EC kBkk 2 F =o(1) , and thus 1 √ N TTr{[ee 0− EC(ee0)]Bk}=op(1), which is what we wanted to show. The proof for part (h) is analogous.
# Part (i): Conditional on C the expression e2
itXitX0it−EC(e2itXitX0it) is mean zero, and it is also uncorrelated across i. This together with the bounded moments that we assume implies
VarC ( 1 N T N X i=1 T X t=1 e2itXitX0it−EC e2itXitX0it ) =Op(1/N) = op(1),
which shows the required result.
# Part (j): Define the K ×K matrix A = N T1 PN
i=1 PT t=1 e2it(Xit+Xit) (Xit− Xit) 0 . Then we have 1 N T N X i=1 T X t=1 e2it(XitX0it− XitXit0) = 1 2(A+A 0 ).
LetBk be theN ×T matrix with elements Bk,it =e2it(Xk,it+Xk,it). We havekBkk ≤ kBkkF =
Op(
√
N T), because the moments of Bk,it are uniformly bounded. The components of A can be written as Alk = N T1 Tr[Bl(Xk− Xk)0]. We therefore have
|Alk| ≤ 1
N Trank(Xk− Xk)kBlk kXk− Xkk.
We have Xk− Xk =XekPf0 +Pλ0XekMf0. Therefore rank(Xk− Xk)≤2R and |Alk| ≤ 2R N TkBlk XekPf0 + Pλ0XekMf0 ≤ 2R N TkBlk XekPf0 + Pλ0Xek = 2R N TOp( √ N T)op( √ N T) =op(1), where we used Lemma S.8.1. This shows the desired result.
Proof of Lemma B.2. Letcbe aK-vector such thatkck= 1. The required result follows by the Cramer-Wold device, if we show
1 √ N T N X i=1 T X t=1 eitX0itc ⇒ N(0, c 0 Ωc).
For this, define ξit = eitX0itc. Furthermore define ξm = ξM,m = ξN T,it, with M = N T and
m=T(i−1) +t∈ {1, . . . , M}. We then have the following:
(i) Under Assumption 5(i), (ii), (iii) the sequence {ξm, m = 1, . . . , M} is a martingale dif-ference sequence under the filtration Fm =C ∨σ({ξn:n < m}).
(ii) E(ξ4it) is uniformly bounded, because by Assumption 5(vi) ECe8it and EC(kXitk8+) are uniformly bounded by a non-random constant (applying Cauchy-Schwarz and the law of iterated expectations). (iii) M1 PM m=1ξ 2 m =c 0Ωc+o p(1).
This is true, because firstly under our assumptions we haveEC
h 1 M PM m=1 ξ 2 m−EC(ξ2m) i2 = EC n 1 M2 PM m=1 ξ 2 m−EC(ξ2m) 2o = OP(1/M) = oP(1), implying we have M1 PM m=1ξ 2 m = 1 M PM m=1EC(ξ 2 m) +op(1). We furthermore have M1 PM m=1EC(ξ 2 m) = VarC(M−1/2PMm=1ξm), and using the result in equation (14) of the main text we find VarC(M−1/2PMm=1ξm) = VarC((N T)−1/2PNi=1PTt=1ξit) =c0Ωc+op(1).
These three properties of{ξm, m= 1, . . . , M}allow us to apply Corollary 5.26 in White (2001), which is based on Theorem 2.3 in Mcleish (1974), to obtain √1
M
PM
m=1ξm →d N(0, c0Ωc). This concludes the proof, because √1
M PM m=1ξm = 1 √ N T PN i=1 PT t=1eitX 0 itc.
S.9
Expansions of Projectors and Residuals
The incidental parameter estimatorsfband bλas well as the residuals b
eenter into the asymptotic bias and variance estimators for the LS estimatorbβ. To describe the properties offb,bλ and
b
e, it is convenient to have asymptotic expansions of the projectorsM
b
λ(β) andMfb(β) that correspond
to the minimizing parameters bλ(β) and fb(β) in equation (4). Note the minimizing bλ(β) and b
f(β) can be defined for all values of β, not only for the optimal valueβ =bβ. The corresponding
Theorem S.9.1. Under Assumptions 1, 3, and 4(i) we have the following expansions M b λ(β) =Mλ0 +M (1) b λ,e +M (2) b λ,e − K X k=1 βk−β0kM(1) b λ,k+M (rem) b λ (β), M b f(β) =Mf0 +M(1) b f ,e +M (2) b f ,e − K X k=1 βk−β0k M(1) b f ,k+M (rem) b f (β), b e(β) =Mλ0e Mf0 + b e(1)e − K X k=1 βk−β0kbe(1)k +be(rem)(β),
where the spectral norms of the remainders satisfy for any series ηN T →0:
sup {β:kβ−β0k≤η N T} M (rem) b λ (β) kβ−β0k2+ (N T)−1/2kek kβ−β0k + (N T)−3/2kek3 =Op(1) , sup {β:kβ−β0k≤ηN T} M (rem) b f (β) kβ−β0k2+ (N T)−1/2kek kβ−β0k + (N T)−3/2kek3 =Op(1) , sup {β:kβ−β0k≤ηN T} be(rem)(β) (N T)1/2kβ−β0k2+kek kβ−β0k+ (N T)−1kek3 =Op(1) ,
and we have rank(be(rem)(β))≤7R, and the expansion coefficients are given by M(1) b λ,e =−Mλ0e f 0(f00 f0)−1(λ00λ0)−1λ00 − λ0(λ00λ0)−1(f00f0)−1f00e0Mλ0 , M(1) b λ,k =−Mλ0Xkf 0 (f00f0)−1(λ00λ0)−1λ00 − λ0(λ00λ0)−1(f00f0)−1f00Xk0 Mλ0 , M(2) b λ,e =Mλ0e f 0(f00 f0)−1(λ00λ0)−1λ00e f0(f00f0)−1(λ00λ0)−1λ00 +λ0(λ00λ0)−1(f00f0)−1f00e0λ0(λ00λ0)−1(f00f0)−1f00e0Mλ0 −Mλ0e Mf0e0λ0(λ00λ0)−1(f00f0)−1(λ00λ0)−1λ00 −λ0(λ00λ0)−1(f00f0)−1(λ00λ0)−1λ00e Mf0e0Mλ0 −Mλ0e f0(f00f0)−1(λ00λ0)−1(f00f0)−1f00e0Mλ0 +λ0(λ00λ0)−1(f00f0)−1f00e0Mλ0e f0(f00f0)−1(λ00λ0)−1λ00,
analogously M(1) b f ,e = −Mf0e 0 λ0(λ00λ0)−1(f00f0)−1f00 − f0(f00f0)−1(λ00λ0)−1λ00e Mf0 , M(1) b f ,k = −Mf0X 0 kλ 0(λ00 λ0)−1(f00f0)−1f00 − f0(f00f0)−1(λ00λ0)−1λ00XkMf0 , M(2) b f ,e =Mf0e 0 λ0(λ00λ0)−1(f00f0)−1f00e0λ0(λ00λ0)−1(f00f0)−1f00 +f0(f00f0)−1(λ00λ0)−1λ00e f0(f00f0)−1(λ00λ0)−1λ00e Mf0 −Mf0e0Mλ0e f0(f00f0)−1(λ00λ0)−1(f00f0)−1f00 −f0(f00f0)−1(λ00λ0)−1(f00f0)−1f00e0Mλ0e Mf0 −Mf0e0λ0(λ00λ0)−1(f00f0)−1(λ00λ0)−1λ00e Mf0 +f0(f00f0)−1(λ00λ0)−1λ00e Mf0e0λ0(λ00λ0)−1(f00f0)−1f00, and finally b e(1)k =Mλ0XkMf0 , b e(1)e =−Mλ0e Mf0e0λ0(λ00λ0)−1(f00f0)−1f00 −λ0(λ00λ0)−1(f00f0)−1f00e0Mλ0e Mf0 −Mλ0e f0(f00f0)−1(λ00λ0)−1λ00e Mf0 . Proof. The general expansion of M
b
λ(β) is given in Moon and Weidner (2015), and in the theorem we just make this expansion explicit up to a particular order. The result for M
b
f(β) is just obtained by symmetry (N ↔T,λ ↔f, e↔e0, Xk↔Xk0). For the residuals be we have
b e=Mbλ Y −X k=1 b βkXk ! =Mbλ he−βb−β0 ·X+λ0f00i ,
and plugging in the expansion of Mbλ gives the expansion of be. We have be(β) = A0+λ0f00−
b λ(β)fb0(β), whereA0 =e− P k(βk−β 0 k)Xk. Therefore be (rem)(β) = A 1+A2+A3 with A1 =A0− Mλ0A0Mf0, A2 =λ0f00−bλ(β)fb0(β), and A3 =− b
e(1)e . We find rank(A1)≤2R, rank(A2)≤2R,
rank(A3)≤3R, and thus rank(be
(rem)(β))≤7R, as stated in the theorem.
Having expansions for M
b
λ(β) andMfb(β), we also have expansions for Pλb(β) =IN −Mbλ(β)
and Pfb(β) = IT − Mfb(β). The reason why we give expansions of the projectors and not
expansions of bλ(β) andfb(β) directly is for the latter we would need to specify a normalization,
whereas the projectors are independent of any normalization choice. An expansion forbλ(β) can,
for example, be defined bybλ(β) =P
b
λ(β)λ
0
, in which case the normalization ofbλ(β) is implicitly
S.10
Consistency Proof for Bias and Variance Estimators
(Proof of Theorem 4.4)
It is convenient to introduce some alternative notation for Definition 1 in section 4.3 of the main text.
Definition Let Γ : R → R be the truncation kernel defined by Γ(x) = 1 for |x| ≤ 1, and Γ(x) = 0 otherwise. Let M be a bandwidth parameter that depends on N andT. For an N×N
matrix A with elements Aij and a T ×T matrix B with elements Bts we define (i) the diagonal truncations AtruncD = diag[(A
ii)i=1,...,N] and BtruncD = diag[(Btt)t=1,...,T]. (ii) the right-sided Kernel truncation of B, which is a T ×T matrix BtruncR with elements
BtruncR
ts = Γ
s−t M
Bts for t < s, and BtstruncR = 0 otherwise.
Here, we suppress the dependence of BtruncR on the bandwidth parameter M. Using this
notation we can represent the estimators for the bias in Definition 1 as follows:
b B1,k = 1 N Tr h P b f(be 0 Xk) truncRi , b B2,k = 1 TTr h (bebe0)truncD M b λXkfb(fb 0 b f)−1(bλ 0 b λ)−1bλ 0i , b B3,k = 1 NTr h (be0be)truncD M b fX 0 kλb(bλ 0 b λ)−1(fb0fb)−1fb0 i .
Before proving Theorem 4.4 we establish some preliminary results.
Corollary S.10.1. Under the Assumptions of Theorem 4.3 we have √N T bβ−β0
=Op(1). This corollary directly follows from Theorem 4.3.
Corollary S.10.2. Under the Assumptions of Theorem 4.4 we have
P b λ −Pλ0 = M b λ−Mλ0 =Op(N−1/2), Pfb−Pf0 = Mfb−Mf0 =Op(T −1/2 ).
Proof. Usingkek=Op(N1/2) andkXkk=Op(N) we find the expansion terms in Theorem S.9.1 satisfy M (1) b λ,e =Op(N −1/2), M (2) b λ,e =Op(N −1), M (1) b λ,k =Op(1) .
Together with corollary S.10.1 the result for M
b
λ−Mλ0
immediately follows. In addition we
haveP
b
λ−Pλ0 =−Mbλ+Mλ
0. The proof for M
b
Lemma S.10.3. Under the Assumptions of Theorem 4.4 we have A1 ≡ 1 N T N X i=1 T X t=1 e2itXitXit0 −XbitXbit0 =op(1), A2 ≡ 1 N T N X i=1 T X t=1 e2it−be2itXbitXbit0 =op(1).
Lemma S.10.4. Let fband f0 be normalized as fb0f /Tb = IR and f00f0/T = IR. Then, under
the assumptions of Theorem 4.4, there exists an R×R matrix H =HN T such that3
fb−f 0H =Op(1) , bλ−λ 0 (H0)−1 =Op(1) . Furthermore bλ(bλ 0 b λ)−1(fb0fb)−1fb0 −λ0(λ00λ0)−1(f00f0)−1f00 =Op N −3/2 . Lemma S.10.5. Under the Assumptions of Theorem 4.4 we have
(i) N−1 EC(e 0X k)−(be 0X k) truncR =op(1), (ii) N−1 EC(e 0 e)−(be0be)truncD =op(1), (iii) T−1 EC(ee 0 )−(beeb0)truncD =op(1).
Lemma S.10.6. Under the Assumptions of Theorem 4.4 we have
(i) N−1 (be 0 Xk) truncR =Op(M T 1/8), (ii) N−1 (be 0 b e)truncD =Op(1) , (iii) T−1 (bebe 0 )truncD =Op(1) .
The proof of the above lemmas is given section S.11 below. Using these lemmas we can now prove Theorem 4.4.
Proof of Theorem 4.4, Part I: show cW =W +op(1).
3We consider a limitN, T → ∞ and for differentN, T differentH-matrices can be chosen, but we write H
Using|Tr (C)| ≤ kCkrank (C) and corollary S.10.2 we find: cWk1k2−WN T ,k1k2 = (N T)−1Trh Mλb−Mλ0 Xk1MfbX 0 k2 i + (N T)−1TrhMλ0Xk1 Mfb−Mf0 Xk02i ≤ 2R N T M b λ−Mλ0 kXk1kkXk2k 2R N T Mfb−Mf 0 kXk1kkXk2k = 2R N TOp(N −1)O p(N T) + 2R N TOp(T −1)O p(N T) =op(1). Thus we have cW =WN T +op(1) = W +op(1).
Proof of Theorem 4.4, Part II: show Ω = Ω +b op(1).
Let ΩN T ≡ N T1 PN i=1 PT t=1 e 2 itXitXit0. We have Ω = ΩN T +oP(1) = Ω +b A1 +A2 +op(1) = b
Ω +oP(1), whereA1 and A2 are defined in Lemma S.10.3, and the lemma states A1 and A2 are op(1).
Proof of Theorem 4.4, Part III: show Bb1 =B1 +op(1).
LetB1,k,N T =N−1Tr [Pf0EC(e0Xk)]. According to Assumption 6 we haveB1,k =B1,k,N T+op(1).
What is left to show is B1,k,N T =Bb1,k+op(1). Using|Tr (C)| ≤ kCkrank (C) we find
B1,k,N T −Bb1 = EC 1 NTr(Pf0e 0 Xk) − 1 NTr h P b f(be 0 Xk) truncRi ≤ 1 NTr h Pf0 −P b f (be0Xk) truncRi + 1 NTr n Pf0 h EC(e0 Xk)−(be 0 Xk) truncRio ≤ 2R N Pf0 −Pfb (be 0 Xk) truncR + R N kPf0k EC(e 0 X k)−(be 0X k) truncR .
We have kPf0k= 1. We now apply Lemmas S.10.5, S.10.2 and S.10.6 to find
B1,k,N T −B1b =N −1 O p(N−1/2)Op(M N T1/8) +op(N)=op(1).
This is what we wanted to show.
Proof of Theorem 4.4, final part: show Bb2 =B2+op(1) and Bb3 =B3+op(1).
Define B2,k,N T = 1 TTr EC(ee0) Mλ0Xkf0(f00f0)−1(λ00λ0)−1λ00 .
According to Assumption 6 we have B2,k =B2,k,N T +op(1). What is left to show is B2,k,N T = b B2,k+op(1). We have B2,k−Bb2,k = 1 TTr EC(ee0) Mλ0Xkf0(f00f0)−1(λ00λ0)−1λ00 − 1 TTr h (bebe0)truncD M b λXkfb(fb 0 b f)−1(bλ 0 b λ)−1bλ 0i =1 TTr h (bebe0)truncD M b λXk f0(f00f0)−1(λ00λ0)−1λ00 −fb(fb0fb)−1(bλ 0 b λ)−1bλ 0i + 1 TTr h (bebe0)truncD Mλ0 −M b λ Xkf0(f00f0)−1(λ00λ0)−1λ00 i + 1 TTr nh EC(ee0)−(bebe0)truncD i Mλ0Xkf0(f00f0)−1(λ00λ0)−1λ00 o .
Using|Tr (C)| ≤ kCkrank (C) (which is true for every square matrix C) we find
B2,k−Bb2,k ≤ R T (bebe 0 )truncD kXkk f 0(f00 f0)−1(λ00λ0)−1λ00−fb(fb0fb)−1(bλ 0 b λ)−1bλ 0 +R T (ebbe 0 )truncD Mλ0 −M b λ kXkk f0(f00f0)−1(λ00λ0)−1λ00 +R T EC(ee 0)−( b ebe0)truncD kXkk f0(f00f0)−1(λ0 0 λ0)−1λ00 . Here we used kMf0k = Mfb = 1. Using kXkk = Op( √
N T), and applying Lemmas S.10.2, S.10.4, S.10.5 and S.10.6, we now find
B2,k−Bb2,k =T −1 Op(T)Op((N T)1/2)Op(N−3/2) +Op(T)Op(N−1/2)Op((N T)1/2)Op((N T)−1/2) +op(T)Op((N T)1/2)Op((N T)−1/2) =op(1).
This is what we wanted to show. The proof of Bb3 =B3+op(1) is analogous.
S.11
Proof of Intermediate Lemma
Here we provide the proof of some intermediate lemmas that were stated and used in section S.10. The following lemma gives a useful bound on the maximum of (correlated) random variables
Lemma S.11.1. Let Zi, i = 1,2, . . . , n, be n real valued random variables, and let γ ≥ 1 and
B >0 be finite constants (independent ofn). Assume maxi EC|Zi|γ≤B, i.e., the γ’th moment of the Zi are finite and uniformly bounded. For n → ∞ we then have
max
i |Zi|=Op n
1/γ