Dynamic Linear Panel Regression Models with Interactive Fixed Effects

(1)

Supplementary Material

Dynamic Linear Panel Regression Models

with Interactive Fixed Effects

Hyungsik Roger Moon

‡

Martin Weidner

§

August 10, 2015

S.1 Proof of Identification (Theorem 2.1)

Proof of Theorem 2.1. Let Q(β, λ, f) ≡ _EkY − β·X − λ f0k2_F λ 0_{, f}0_{, w}_{, where} _β _∈ RK, λ∈RN×R and f ∈RT×R. We have Q(β, λ, f) =_EnTr (Y − β·X − λ f0)0(Y − β·X − λ f0) λ 0_{, f}0_{, w}o =_EnTrh λ0f00−λf0−(β−β0)·X+e0 λ0f00−λf0−(β−β0)·X+ei λ 0 , f0, wo =_EhTr (e0e) λ 0 , f0, wi +_EnTrh λ0f00−λf0−(β−β0)·X0 λ0f00−λf0 −(β−β0)·Xi λ 0_{, f}0_{, w}o | {z } ≡Q∗₍_β,λ,f₎ .

In the last step we used Assumption ID(ii). Because _EhTr (e0e)

λ

0

, f0, wi is independent of

β, λ, f, we find minimizing Q(β, λ, f) is equivalent to minimizing Q∗(β, λ, f). We decompose

‡_{Department of Economics and USC Dornsife INET, University of Southern California, Los Angeles, CA}

90089-0253. Email: [email protected]. Department of Economics, Yonsei University, Seoul, Korea.

§_{Department of Economics, University College London, Gower Street, London WC1E 6BT, U.K., and}

(2)

Q∗(β, λ, f) as follows Q∗(β, λ, f) =_EnTrh λ0f00 −λf0−(β−β0)·X0 λ0f00−λf0−(β−β0)·Xi λ 0 , f0, wo =_EnTrh λ0f00 −λf0−(β−β0)·X0M₍_λ,λ0_,w₎ λ0f00−λf0−(β−β0)·X i λ 0 , f0, wo +_EnTrh λ0f00−λf0−(β−β0)·X0P₍_λ,λ0_,w₎ λ0f00−λf0−(β−β0)·X i λ 0_{, f}0_{, w}o

=_EnTrh (βhigh−β0,high)·Xhigh

0

M₍_λ,λ0_,w₎ (βhigh−β0,high)·X_high

i λ 0_{, f}0_{, w}o | {z } ≡Qhigh₍_βhigh_,λ₎ +_EnTrh λ0f00−λf0−(β−β0)·X0 P₍_λ,λ0_,w₎ λ0f00−λf0−(β−β0)·X i λ 0_{, f}0_{, w}o | {z } ≡Qlow₍_β,λ,f₎ ,

where (βhigh−β0,high)·Xhigh = PK

m=K1+1(βm −β

0

m)Xm. A lower bound on Qhigh(β

high , λ) is given by Qhigh(βhigh, λ) ≥ min e λ∈RN×(R+R+rank(w)) E n Tr h

(βhigh−β0,high)·Xhigh0M₍_e_λ,λ,w₎ (βhigh−β0,high)·Xhigh

i λ 0 , f0, w o = min(N,T) X r=R+R+rank(w)

µ_rn_Eh (βhigh−β0,high)·Xhigh

(βhigh−β0,high)·Xhigh

0 λ

0_{, f}0_{, w}io_.

(S.1.1) Because Q∗(β, λ, f), Qhigh₍_βhigh

, λ), and Qlow₍_{β, λ, f}_{), are expectations of traces of positive}

semi-definite matrices we have Q∗(β, λ, f) ≥ 0, Qhigh₍_βhigh

, λ) ≥ 0, and Qlow₍_{β, λ, f}₎ _≥ _{0 for}

all β, λ, f. Let ¯β, ¯λ and ¯f be the parameter values that minimize Q(β, λ, f), and thus also

Q∗(β, λ, f). Because Q∗(β0, λ0, f0_{) = 0 we have} _Q∗_(¯_β,_λ,_¯ _f_¯_{) = min}

β,λ,fQ∗(β, λ, f) = 0. This implies Qhigh(¯βhigh,λ¯) = 0 and Qlow(¯β,λ,¯ f¯) = 0. Assumption ID(v), the lower bound (S.1.1), and Qhigh(¯βhigh,λ¯) = 0 imply ¯βhigh=β0,high. Using this, we find

Qlow(¯β,λ,¯ f¯) =_E Tr

λ0f00 −¯λf¯0 −(¯βlow−β0,low)·Xlow

0

λ0f00−λ¯f¯0−(¯βlow−β0,low)·Xlow

λ 0 , f0, w , ≥min f E Tr

λ0f00−λf¯ 0−(¯βlow−β0,low)·Xlow

0

λ0f00−¯λf0−(¯βlow−β0,low)·Xlow

λ 0 , f0, w =_E Tr

λ0f00 −(¯βlow−β0,low)·Xlow

0

M¯_λλ0f00−(¯βlow−β0,low)·Xlow

λ 0 , f0, w , (S.1.2)

(3)

where (¯βlow−β0,low)·Xlow=PK_l₌₁1(¯βl−β

0

l)Xl. BecauseQlow(¯β,λ,¯ f¯) = 0 and the last expression in (S.1.2) is non-negative we must have

E

Tr

λ0f00−(¯βlow−β0,low)·Xlow

0

M¯_λλ0f00−(¯βlow−β0,low)·Xlow

λ 0 , f0, w = 0.

UsingM¯_λ =M¯_λM¯_λ and the cyclicality of the trace we obtain from the last equality: Tr M¯_λAM¯_λ = 0, where A = _E λ0f00₋_(¯_βlow₋_β0,low₎_·_X low λ0f00−(¯β low −β0,low)·Xlow 0 λ 0_{, f}0_{, w} . The trace of a positive semi-definite matrix is only equal to zero if the matrix itself is equal to zero, so we find

M¯_λAM¯_λ = 0,

This together with the fact that A itself is positive semi definite implies (note that A positive semi-definite impliesA=CC0 for some matrixC, andM¯_λAM¯_λ = 0 then impliesM¯_λC= 0, i.e.,

C =P¯_λC)

A=P¯_λAP¯_λ,

and therefore rank(A)≤rank(P¯_λ)≤R. We have thus shown rank

E

λ0f00 −(¯βlow−β0,low)·Xlow λ0f00−(¯βlow−β0,low)·Xlow

0 λ 0 , f0, w ≤R. We furthermore find R≥rank E

λ0f00−(¯βlow−β0,low)·Xlow λ0f00−(¯β low −β0,low)·Xlow 0 λ 0 , f0, w ≥rank MwE

Pf0

0 Mw λ 0 , f0, w + rank PwE

Mf0

λ0f00 −(¯βlow−β0,low)·Xlow

0 Pw λ 0 , f0, w ≥rank Mwλ0f00f0λ00Mw + rank E

(¯βlow−β0,low)·Xlow

Mf0

0 λ 0 , f0, w .

Assumption ID(iv) guarantees rank Mwλ0f00f0λ00Mw

= rank λ0f00_f0_λ00 = R, that is, we must have E

Mf0

0 λ 0_{, f}0_{, w} = 0.

According to Assumption ID(iii) this implies ¯βlow = β0,low, i.e., we have ¯β = β0. This also implies Q∗(¯β,λ,¯ f¯) =kλ0f00−¯λf¯0k2

F = 0, and therefore ¯λf¯0 =λ

0 f00.

(4)

S.2 Examples of Error Distributions

The following Lemma provides examples of error distributions that satisfykek=Op(

p

max(N, T)) asN, T → ∞. Example (i) is particularly relevant for us, because those assumptions oneit are imposed in Assumption 5 in the main text, i.e., under those main text assumptions we indeed havekek=Op(

p

max(N, T)).

Lemma S.2.1. For each of the following distributional assumptions on the errors eit, i = 1, . . . , N, t= 1, . . . , T, we have kek=Op(

p

max(N, T)).

(i) The eit are independent across i and t, conditional on C, and satisfy E(eit|C) = 0, and

E(e4it|C) is bounded uniformly by a non-random constant, uniformly over i, t and N, T. Here C can be any conditioning sigma-field, including the empty one (corresponding to unconditional expectations).

(ii) The eit follow different MA(∞) processes for each i, namely

eit=

∞

X

τ=0

ψ_iτui,t−τ , for i= 1. . . N, t= 1. . . T , (S.2.1)

where the uit, i = 1. . . N, t = −∞. . . T are independent random variables with Euit = 0 and _Eu4

it uniformly bounded across i, t and N, T. The coefficients ψiτ satisfy

∞ X τ=0 τ max i=1...N ψ 2 iτ < B , ∞ X τ=0 max i=1...N|ψiτ| < B , (S.2.2) for a finite constant B which is independent of N and T.

(iii) The error matrix e is generated as e = σ1/2uΣ1/2, where u is an N × T matrix with independently distributed entriesuit andEuit= 0,Eu2it = 1, and Eu4it is bounded uniformly across i, t and N, T. Here σ is the N ×N cross-sectional covariance matrix, and Σ is the

T ×T time-serial covariance matrix, and they satisfy

max j=1...N N X i=1 |σij| < B , max τ=1...T T X t=1 |Σtτ| < B , (S.2.3)

for some finite constant B which is independent of N and T. In this example we have

(5)

Proof of Lemma S.2.1, Example (i). Latala (2005) showed that for aN×T matrix ewith independent entries, conditional on C, we have

E kekC ≤ c    max i " X t E e2it C #1/2 + max j " X i E e2it C #1/2 + " X i,t E e4it C #1/4   ,

where c is some universal constant. Because we assumed uniformly bounded 4th conditional moments foreit we thus havekek=OP(

√ T) +OP( √ N) +OP((T N)1/4) = Op( p max(N, T)).

Example (ii). Let ψ_j = (ψ₁_j, . . . , ψ_{N j}) be an N ×1 vector for each j ≥ 0. Let U−j be an

N×T sub-matrix of (uit) consisting ofuit,i= 1. . . N, t= 1−j, . . . , T −j. We can then write equation (S.2.1) in matrix notation as

e= ∞ X j=0 diag(ψ_j)U−j = T X j=0 diag(ψ_j)U−j +rN T,

where we cut the sum atT, which results in the remainderrN T =

P∞

j=T+1 diag(ψj)U−j. When approximating an MA(∞) by a finite MA(T) process we have for the remainder

E(krN TkF) 2 = N X i=1 T X t=1 E(rN T) 2 ij ≤ σ 2 u N X i=1 T X t=1 ∞ X j=T+1 ψ2_ij ≤σ2_uN T ∞ X j=T+1 max i ψ 2 ij ≤σ2_uN ∞ X j=T+1 j max i ψ 2 ij , where σ2

u is the variance of uit. Therefore, for T → ∞ we have

E (krN TkF)

2

N

!

−→ 0,

which implies (krN TkF)2 =Op(N), and thereforekrN Tk ≤ krN TkF =Op(

√ N).

Let V be the N ×2T matrix consisting of uit, i= 1. . . N, t = 1−T, . . . , T. For j = 0. . . T the matricesU−j are sub-matrices ofV, and thereforekU−jk ≤ kVk. From example (i) we know

kVk=Op(pmax(N,2T)). Furthermore, we knowkdiag(ψ_j)k ≤maxi ψ_ij

(6)

Combining these results we find kek ≤ T X j=0 kdiag(ψ_j)k kU−jk+krN Tk ≤ T X j=0 max i ψ_ij kVk+op( √ N) ≤ " _∞ X j=0 max i ψ_ij # Op(pmax(N,2T)) +op(√N) ≤ Op( p max(N, T)),

as required for the proof.

Example (iii). Because σ and Σ are positive definite, there exits a symmetric N ×N matrix

φ and a symmetric T ×T matrix ψ such thatσ =φ2 and Σ =ψ2. The error term can then be generated as e=φuψ, where uis anN ×T matrix with iid entriesuit such that E(uit) = 0 and

E(u4it)<∞. Given this definition ofewe immediately haveEeit = 0 andEeitejτ =σijΣtτ. What is left to show iskek=Op(

p

max(N, T)). From example (i) we know kuk=Op(pmax(N, T)). Using the inequality kσk ≤pkσk1kσk∞ =kσk1, where kσk1 =kσk∞ because σ is symmetric

we find kσk ≤ kσk1 ≡ max j=1...N N X i=1 |σij| < L ,

and analogously kΣk < L. Because kσk = kφk2 _and _k_Σ_k ₌ _kψk2_{, we thus find} _{kek ≤} kφkkukkψk ≤LOp(

p

max(N, T)), i.e., kek=Op(

p

max(N, T)).

S.3 Comments on Assumption 4 on the regressors

Consistency of the LS estimator βb requires the regressors not only satisfy the standard

non-collinearity condition in assumption 4(i), but also the additional conditions on high- and low-rank regressors in assumption 4(ii). Bai (2009) considers the special cases of only high-low-rank and only low-rank regressors. As low-rank regressors he considers only cross-sectional invari-ant and time-invariinvari-ant regressors, and he shows that if only these two types of regressors are present, one can show consistency under the assumption plim_N,T→∞WN T > 0 on the re-gressors (instead of assumption 4), where WN T is the K ×K matrix defined by WN T,k1k2 =

(N T)−1_Tr(_M

f0X_k0

(7)

objective expansion in theorem 4.1, i.e., the condition plim_N,T_→∞WN T >0 is very natural in the context of the interactive fixed effect models, and one may wonder whether also for the general case one can replace assumption 4 with this weaker condition and still obtain consistency of the LS estimator. Unfortunately, this is not the case, and below we present two simple counter examples that show this.

(i) Let there only be one factor (R = 1) f0

t with corresponding factor loadings λ

0

i. Let there only be one regressor (K = 1) of the form Xit =wivt+λ0ift0. Assume the N ×1 vector

w= (w1, . . . , wN)0, and theT ×1 vector v = (v1, . . . , vN)0 are such that theN×2 matrix Λ = (λ0, w) and and the T ×2 matrix F = (f0, v) satisfy plim_N,T→∞(Λ0Λ/N) > 0, and

plim_N,T_→∞(F0F/T)>0. In this case, we have WN T = (N T)−1Tr(Mf0vw0M_λ0wv0), and

therefore plim_N,T_→∞WN T = plimN,T→∞(N T)−1Tr(Mf0vw0M_λ0wv0) > 0. However, β is

not identified becauseβ0X+λ0f00 _{= (}_β0_{+ 1)}_X₋_wv0_{, i.e., it is not possible to distinguish}

(β, λ, f) = (β0, λ0, f0_{) and (}_{β, λ, f}_{) = (}_β0_{+ 1}_,_{−w, v}_{). This implies that the LS estimator}

is not consistent (both β0 and β0+ 1 could be the true parameter, but the LS estimator cannot be consistent for both).

(ii) Let there only be one factor (R = 1)f_t0with corresponding factor loadingsλ0_i. Let theN×

1 vectorsλ0,w1andw2be such that Λ = (λ0, w1, w2) satisfies plimN,T→∞(Λ0Λ/N)>0. Let

theT×1 vectorsf0_,_v

1 andv2 be such thatF = (f0, v1, v2) satisfies plimN,T→∞(F0F/T)>

0. Let there be four regressors (K = 4) defined by X1 = w1v01, X2 = w2v20, X3 = (w1 + λ0)(v2+f0)0,X4 = (w2+λ0)(v1+f0)0. In this case, one can easily check plimN,T→∞WN T > 0. However, again β_k is not identified, because P4

k=1β 0

kXk+λ0f00 =P4_k₌₁(β0k+ 1)Xk− (λ0 +w1 +w2)(f00 +v1 +v2)0, i.e., we cannot distinguish between the true parameters

and (β, λ, f) = (β0 + 1, −λ0−w1 −w2, f00 +v1 +v2). Again, as a consequence the LS

estimator is not consistent in this case.

In example (ii), there are only low-rank regressors with rank(Xl) = 1. One can easily check assumption 4 is not satisfied for this example. In example (i) the regressor is a low-rank regressor with rank(X) = 2. In our present version of assumption 4 we only consider low-rank regressors with rank(X) = 1, but (as already noted in a footnote in the main paper) it is straightforward to extend the assumption and the consistency proof to low-rank regressors with rank larger than one. Independent of whether we extend the assumption or not, the regressor X of example (i) fails to satisfy assumption 4. This justifies our formulation of assumption 4, because it shows in general the assumption cannot be replaced by the weaker condition plim_N,T_→∞WN T >0.

(8)

S.4 Some Matrix Algebra (including Proof of Lemma A.1)

The following statements are true for real matrices (throughout the whole paper and supplemen-tary material we never use complex numbers anywhere). Let A be an arbitrary n×m matrix. In addition to the operator (or spectral) norm kAk and to the Frobenius (or Hilbert-Schmidt) norm kAkF, it is also convenient to define the 1-norm, the ∞-norm, and the max-norm by

kAk1 = max j=1...m n X i=1 |Aij| , kAk∞ = max i=1...n m X j=1

|Aij| , kAkmax = max

i=1...n jmax=1...m |Aij| .

Lemma S.4.1 (Some useful inequalities). Let A be an n×m matrix, B be an m×p matrix, and C and D be n×n matrices. Then we have:

(i) kAk ≤ kAk_F ≤ kAk rank (A)1/2 ,

(ii) kABk ≤ kAk kBk ,

(iii) kABk_F ≤ kAk_FkBk ≤ kAk_F kBk_F ,

(iv) |Tr(AB)| ≤ kAk_FkBk_F , for n=p, (v) |Tr (C)| ≤ kCkrank (C) ,

(vi) kCk ≤Tr (C) , for C symmetric and C ≥0, (vii) kAk2 _{≤ kAk}

1kAk∞ ,

(viii) kAkmax ≤ kAk ≤ √

nmkAkmax,

(ix) kA0CAk ≤ kA0DAk, for C symmetric and C ≤D. For C, D symmetric, and i= 1, . . . , n we have:

(x) µ_i(C) +µ_n(D) ≤ µ_i(C+D) ≤ µ_i(C) +µ₁(D),

(xi) µ_i(C)≤ µ_i(C+D), for D≥0,

(xii) µ_i(C)− kDk ≤ µ_i(C+D) ≤ µ_i(C) +kDk.

Proof. Here we use notation si(A) for theith largest singular value of a matrix A. (i) We have kAk= s1(A), and kAk2

F =

Prank(A)

i=1 (si(A))

2_{. The inequalities follow directly from}

this representation. (ii) This inequality is true for all unitarily invariant norms, see, e.g., Bhatia (1997). (iii) can be shown as follows

kABk2_F = Tr(ABB0A0) = Tr[kBk2_AA0₋ A(kBk2 I−BB0)A0] ≤ kBk2_Tr(_AA0 ) =kBk2 _kAk2 F ,

(9)

where we used A(kBk2

I−BB0)A0 is positive definite. Relation (iv) is just the Cauchy Schwarz

inequality. To show (v) we decomposeC =U DO0 (singular value decomposition), whereU and

O are n×rank(C) that satisfyU0U =O0O =_Iand Dis a rank(C)×rank(C) diagonal matrix with entries si(C). We then have kOk=kUk= 1 andkDk=kCk and therefore

|Tr(C)|=|Tr(U DO0)|=|Tr(DO0U)| = rank(C) X i=1 η0_iDO0U η_i ≤ rank(C) X i=1

kDkkO0kkUk= rank(C)kCk.

For (vi) let e1 be a vector that satisfies ke1k = 1 and kCk = e01Ce1. Because C is symmetric

such an e1 has to exist. Now choose ei, i = 2, . . . , n, such that ei, i = 1, . . . , n, becomes a orthonormal basis of the vector space of n×1 vectors. Because C is positive semi definite we then have Tr (C) = P

ie

0

iCei ≥ e1Ce1 =kCk, which is what we wanted to show. For (vii) we

refer to Golub and van Loan (1996), p.15. For (viii) letebe the vector that satisfieskek= 1 and

kA0_CAk ₌ _e0_A0_CAe_{. Because} _A0_CA _{is symmetric such an} _e _{has to exist. Because} _C _≤ _D _we

then have kCk= (e0A0)C(Ae)≤(e0A0)D(Ae)≤ kA0_DAk_{. This is what we wanted to show. For}

inequality (ix) lete1 be a vector that satisfiedke1k= 1 andkA0CAk=e01A

0_CAe 1. Then we have kA0_CAk ₌ _e0 1A 0_DAe 1 −e01A 0₍_D₋_C₎_Ae 1 ≤ e01A 0_DAe

1 ≤ kA0DAk. Statement (x) is a special

case of Weyl’s inequality, see, e.g., Bhatia (1997). The inequalities (xi) and (xii) follow directly from (ix) because µ_n(D)≥0 forD≥0, and because −kDk ≤µ_i(D)≤ kDk for i= 1, . . . , n.

Definition S.4.2. Let A be an n×r1 matrix and B be an n×r2 matrix with rank(A) = r1

and rank(B) = r2. The smallest principal angle θA,B ∈ [0, π/2] between the linear subspaces span(A) = {Aa|a∈_Rr1_} _and _span(_B_{) =} _{Bb|_b _∈

Br2} of Rn is defined by

cos(θA,B) = max

0=6 a∈Rr1₀=6maxb∈Rr2

a0A0Bb kAakkBbk.

Lemma S.4.3. Let A be an n×r1 matrix and B be an n×r2 matrix with rank(A) = r1 and

rank(B) =r2. Then we have the following alternative characterizations of the smallest principal

angle between span(A) and span(B)

sin(θA,B) = min

06=a∈Rr1 kMBA ak kA ak = min 06=b∈_Rr₂ kMAB bk kB bk .

(10)

Proof. Because kMBA ak2 + kPBA ak2 = kA ak2 and sin(θA,B)2 + cos(θA,B)2 = 1, we find proving the theorem is equivalent to proving

cos(θA,B) = min

06=a∈_Rr₁

kPBA ak

kA ak = min06=b∈_Rr₂

kPAB bk

kA bk .

This last statement is theorem 8 in Galantai and Hegedus (2006), and the proof can be found there.

Proof of Lemma A.1. Let

S1(Z) = min f,λ Tr [(Z−λf 0 ) (Z0−f λ0)] , S2(Z) = min f Tr(Z MfZ 0 ), S3(Z) = min λ Tr(Z 0 MλZ), S4(Z) = min ˜ λ,f˜ Tr(M_e_λZ M_f_eZ0), S5(Z) = T X i=R+1 µ_i(Z0Z), S6(Z) = N X i=R+1 µ_i(ZZ0).

The theorem claims

S1(Z) = S2(Z) = S3(Z) = S4(Z) = S5(Z) = S6(Z).

We find:

(i) The non-zero eigenvalues of Z0Z and ZZ0 are identical, so in the sums in S5(Z) and in S6(Z) we are summing over identical values, which shows S5(Z) = S6(Z).

(ii) Starting with S1(Z) and minimizing with respect to f we obtain the first-order condition λ0Z =λ0λ f0 .

Putting this into the objective function we can integrate out f, namely Tr(Z−λf0)0(Z−λf0)= Tr (Z0Z−Z0λf0)

= Tr Z0Z−Z0λ(λ0λ)−1(λ0λ)f0

= Tr Z0Z−Z0λ(λ0λ)−1(λ0λ)λ0Z

= Tr (Z0MλZ) .

(11)

(iii) Let M_b_λ be the projector on the N −R eigenspaces corresponding to the N −R smallest eigenvalues1 _of _ZZ0_{, let} _P

b

λ =IN −Mbλ, and let ωR be the R’th largest eigenvalue of ZZ

0_.

We then know the matrixP

b λ[ZZ 0_−ω RIN]Pbλ−Mλb[ZZ 0_−ω RIN]Mλbis positive semi-definite.

Thus, for an arbitrary N ×R matrix λ with corresponding projector Mλ we have

0≤Tr n P_λ_b[ZZ0−ωRIN]Pλb−Mλb[ZZ 0₋ ωRIN]Mbλ Mλ−Mbλ 2o = Tr P_b_λ[ZZ0−ωRIN]Pbλ+Mλb[ZZ 0₋ ωRIN]Mbλ Mλ−Mbλ = Tr [Z0MλZ]−Tr Z0M_b_λZ+ωR rank(Mλ)−rank(M_b_λ) ,

and because rank(M_λ_b) =N −R and rank(Mλ)≤N −R we have TrZ0M_b_λZ ≤Tr [Z0MλZ] .

This showsM_b_λis the optimal choice in the minimization problem ofS3(Z), i.e., the optimal λ=bλis chosen such that the span of the N-dimensional vectorsbλ_r(r = 1. . . R) equals to

the span of the R eigenvectors that correspond to theR largest eigenvalues of ZZ0. This showsS3(Z) = S6(Z). Analogously one can show S2(Z) =S5(Z).

(iv) In the minimization problem in S4(Z) we can choose eλ such that the span of the N

-dimensional vectors eλr (r = 1. . . R1) is equal to the span of the R1 eigenvectors that correspond to the R1 largest eigenvalues of ZZ0. In addition, we can choose fesuch that

the span of the T-dimensional vectors fe_r (r = 1. . . R₂) is equal to the span of the R₂

eigenvectors that correspond to the (R1+ 1)-largest up to theR-largest eigenvalue ofZ0Z.

With this choice of eλ and fewe actually project out all the R largest eigenvalues of Z0Z

and ZZ0. This shows that S4(Z) ≤ S5(Z). (This result is actually best understood by

using the singular value decomposition of Z.) We can write M e λZ Mfe=Z −Ze, where e Z =P e λZ Mfe+Z Pfe.

Because rank(Z) ≤ rank(P

e

λZ Mfe) + rank(Z Pfe) = R1 +R2 = R, we can always write 1_{If an eigenvalue has multiplicity}_m_{, we count it}_m _{times when finding the}_N ₋_R _{smallest eigenvalues. In}

(12)

e

Z =λf0 for some appropriateN ×R and T ×R matrices λ and f. This shows that

S4(Z) = min ¯ λ,f¯ Tr(M e λZ MfeZ 0 ) ≥ min {Ze: rank(Ze)≤R} Tr((Z −Ze)(Z−Ze)0) = min f,λ Tr [(Z−λf 0 ) (Z0−f λ0)] =S1(Z).

Thus we have shown here S1(Z) ≤S4(Z)≤ S5(Z), and this holds with equality because S1(Z) =S5(Z) was already shown above.

S.5 Supplement to the Consistency Proof (Appendix A)

Lemma S.5.1. Under assumptions 1 and 4 there exists a constant B0 > 0 such that for the

matrices w and v introduced in assumption 4 we have

w0M_λ0w − B0w0w≥0, wpa1, v0Mf0v − B₀v0v ≥0, wpa1.

Proof. We can decomposew=w_ew¯, wherew_eis anN×rank(w) matrix and ¯wis a rank(w)×K1

matrix. Note w_e has full rank, and Mw =Mwe.

By assumption 1(i) we know λ00λ0/N has a probability limit, i.e., there exists some B1 >0

such that λ00λ0/N < B1IR wpa1. Using this and assumption 4 we find for any R×1 vector

a6= 0 we have kMvλ0ak2 kλ0_ak2 = a0λ00Mvλ0a a0_λ00 λ0a > B B1 , wpa1.

Applying Lemma S.4.3 we find min 06=b∈_Rrank(w) b0w_e0M_λ0 e w b b0 e w0 e w b = 06=mina∈RR a0λ00Mwλ0a a0_λ00 λ0a > B B1 , wpa1.

Therefore we find for every rank(w)×1 vector b that b0(w_e0M_λ0

e w−(B/B1)we 0 e w)b > 0, wpa1. Thus w_e0M_λ0 e w − (B/B1)we 0 e

w > 0, wpa1. Multiplying from the left with ¯w0 and from the right with ¯w we obtain w0M_λ0w−(B/B1)w0w ≥ 0, wpa1. This is what we wanted to show.

Analogously we can show the statement for v.

As a consequence of the this lemma we obtain some properties of the low-rank regressors summarized in the following lemma.

(13)

Lemma S.5.2. Let the assumptions 1 and 4 be satisfied and letXlow,α=

PK1

l=1αlXl be a linear combination of the low-rank regressors. Then there exists some constant B >0 such that

min {α∈RK1,kαk=1} X_low_,αM_f0X_low0 _,α N T > B , wpa1, min {α∈RK1_,kαk=1} M_λ0X_low_,αM_f0X_low0 _,αM_λ0 N T > B , wpa1. Proof. Note M_λ0X_low_,αM_f0X_low0 _,αM_λ0

≤ X_low_,αM_f0X_low0 _,α , because kM_λ0k = 1, i.e., if

we can show the second inequality of the lemma we have also shown the first inequality. We can write Xlow,α = wdiag(α0)v0. Using Lemma S.5.1 and part (v), (vi) and (ix) of Lemma S.4.1 we find M_λ0X_low_,αM_f0X_low0 _,αM_λ0 =kM_λ0wdiag(α0)v0M_f0vdiag(α0)w0M_λ0k ≥B0 kMλ0wdiag(α0)v0 vdiag(α0)w0M_λ0k ≥ B0 K1 Tr [M_λ0wdiag(α0)v0 vdiag(α0)w0M_λ0] = B0 K1Tr [vdiag(α 0 )w0M_λ0wdiag(α0)v0] ≥ B0 K1 kvdiag(α0)w0M_λ0wdiag(α0)v0k ≥ B 2 0 K1 kvdiag(α0)w0wdiag(α0)v0k ≥ B 2 0 K2 1 Tr [vdiag(α0)w0wdiag(α0)v0] = B 2 0 K2 1 TrXlow,αXlow0 ,α .

Thus we have M_λ0Xlow_,αM_f0X_low0 _,αM_λ0

/(N T) ≥ (B0/K1)2α0W_{N T}lowα , where the K1 ×K1

matrixW_{N T}lowis defined byW_{N T,l}low₁_l₂ = (N T)−1Tr Xl1X

0

l2

, i.e., it is a submatrix ofWN T. Because

WN T and thusWN Tlowconverges to a positive definite matrix the lemma is proven by the inequality above.

Using the above lemmas we can now prove the lower bound on Se

(2)

N T(β, f) that was used in the consistency proof. Remember

e S_{N T}(2)(β, f) = 1 N T Tr " λ0f00+ K X k=1 (β0_k−β_k)Xk ! Mf λ0f00 + K X k=1 (β0_k−β_k)Xk !0 P₍_λ0_,w₎ # .

(14)

We want to show under the assumptions of theorem 3.1 there exist finite positive constants a0, a1, a2, a3 and a4 such that

e S_{N T}(2)(β, f)≥ a0 βlow−β0,low 2 βlow−β0,low 2 +a1 βlow−β0,low +a2 −a3 βhigh−β0,high −a₄ βhigh−β0,high βlow−β0,low , wpa1.

Proof of the lower bound on Se

(2)

N T(β, f). Applying Lemma A.1 and part (xi) of Lemma S.4.1 we find e S_{N T}(2)(β, f)≥ 1 N T µR+1 " λ0f00+ K X k=1 (β0_k−β_k)Xk !0 P₍_λ0_,w₎ λ0f00+ K X k=1 (β0_k−β_k)Xk !# = 1 N T µR+1 " λ0f00+ K1 X l=1 (β0_l −β_l)wlvl0 !0 λ0f00+ K1 X l=1 (β0_l −β_l)wlvl0 ! + λ0f00+ K1 X l=1 (β0_l −β_l)wlv0l !0 P₍_λ0_,w₎ K X m=K1 (β0_m−β_m)Xm + K X m=K1 (β0_m−β_m)X_m0 P₍_λ0_,w₎ λ0f00 + K1 X l=1 (β0_l −β_l)wlvl0 ! + K X m=K1 (β0_m−β_m)X_m0 P₍_λ0_,w₎ K X m=K1 (β0_m−β_m)Xm # ≥ 1 N T µR+1 " λ0f00+ K1 X l=1 (β0_l −β_l)wlvl0 !0 λ0f00+ K1 X l=1 (β0_l −β_l)wlvl0 ! + λ0f00+ K1 X l=1 (β0_l −β_l)wlv0l !0 P₍_λ0_,w₎ K X m=K1 (β0_m−β_m)Xm + K X m=K1 (β0_m−β_m)X_m0 P₍_λ0_,w₎ λ0f00 + K1 X l=1 (β0_l −β_l)wlvl0 ! # ≥ 1 N T µR+1   λ 0 f00+ K1 X l=1 (β0_l −β_l)wlv0l !0 λ0f00 + K1 X l=1 (β0_l −β_l)wlvl0 !  −a3 βhigh−β0,high −a₄ βhigh−β0,high βlow−β0,low , wpa1,

(15)

Lemma S.4.1 and the fact that 1 N T K X m=K1 (β0_m−β_m)X_m0 P₍_λ0_,w₎ λ0f00 + K1 X l=1 (β0_l −β_l)wlvl0 ! ≤K βhigh−β0,high max m Xm √ N T λ0f00 √ N T +K βlow−β0,low max l wlvl0 √ N T .

Our assumptions guarantee the operator norms of λ0f00/√N T and Xm/

√

N T are bounded from above as N, T → ∞, which results in finite constants a3 and a4.

We write the above result as Se

(2) N T(β, f)≥µR+1(A0A)/(N T) + terms containingβ high , where we defined A = λ0f00 ₊PK1 l=1(β 0

l −βl)wlvl0. We also write A = A1 +A2 +A3, with A1 = MwA Pf0 =M_wλ0f00,A₂ =P_wA M_f0 =PK1 l=1(β 0 l −βl)wlvl0Mf0, A₃ =P_wA P_f0 =P_wλ0f00 + PK1 l=1(β 0 l −βl)wlv0lPf. We then find A0A =A01A1+ (A02+A 0 3)(A2+A3) and A0A ≥ A0A−(a1/2A0₃+a−1/2A0₂)(a1/2A3+a−1/2A2) = [A0₁A1−(a−1)A03A3] + (1−a−1)A02A2 ,

where≥ for matrices refers to the difference being positive definite, anda is a positive number. We choose a= 1 +µ_R(A0₁A1)/(2kA3k2). The reason for this choice becomes clear below.

Note [A0₁A1−(a−1)A03A3] has at most rank R (asymptotically it has exactly rank R).

The non-zero eigenvalues of A0A are therefore given by the (at most) R non-zero eigenvalues of [A0₁A1−(a−1)A₃0A3] and the non-zero eigenvalues of (1−a−1)A0₂A2, the largest one of the latter being given given by the operator norm (1−a−1)kA2k2. We therefore find

1 N T µR+1(A 0 A)≥ 1 N T µR+1 (A0₁A1−(a−1)A30A3) + (1−a−1)A02A2 ≥ 1 N T min (1−a−1)kA2k2 , µR[A 0 1A1−(a−1)A03A3] .

Using Lemma S.4.1(xii) and our particular choice of a we find

µ_R [A0₁A1−(a−1)A03A3]≥ µR(A 0 1A1)− k(a−1)A03A3k = 1 2µR(A 0 1A1). Therefore 1 N T µR+1(A 0 A)≥ 1 2N T µR(A 0 1A1) min 1, 2kA2k 2 2kA3k2+µR(A01A1) ≥ 1 N T kA2k2µR(A 0 1A1) 2kAk2₊_µ R(A01A1) ,

(16)

where we used kAk ≥ kA3k and kAk ≥ kA2k.

Our assumptions guarantee there exist positive constants c0,c1, c2, and c3 such that kAk √ N T ≤ kλ0_f00_k √ N T + K1 X l=1 |β0 l −βl| kw_√lvl0k N T ≤c0+c1 βlow−β0,low , wpa1, µ_R(A0₁A1) N T = µ_R f0λ00Mwλ0f00 N T ≥c2, wpa1, kA2k2 N T =µ1 " _K₁ X l1=1 (β0_l₁ −β_l₁)wl1v 0 l1Mf0 K1 X l2=1 (β0_l₂ −β_l₂)vl2w 0 l2 # ≥c3 βlow−β0,low 2 , wpa1,

were for the last inequality we used Lemma S.5.2. We thus have 1 N T µR+1(A 0 A)≥ c3 βlow−β0,low 2 1 + _c2 2 c0+c1 βlow−β0,low 2 , wpa1. Defining a0 = c₂2_cc23 1 , a1 = 2_cc₁0 and a2 = ₂c_c22 1 we thus obtain 1 N T µR+1(A 0 A)≥ a0 βlow−β0,low 2 βlow−β0,low 2 +a1 βlow−β0,low +a2 , wpa1,

i.e., we have shown the desired bound on Se

(2)

N T(β, f).

S.6 Regarding the Proof of Corollary 4.2

As discussed in the main text, the proof of Corollary 4.2 is provided in Moon and Weidner (2015). All that is left to show here is the matrix WN T = WN T(λ0, f0, Xk) does not become singular as N, T → ∞ under our assumptions.

Proof. Remember WN T = 1 N TTr(Mf0X 0 k1Mλ0Xk2).

(17)

The smallest eigenvalue of the symmetric matrix W(λ0, f0_{, X} k) is given by µ_K(WN T) = min {a∈_RK_{, a}₆₌₀_} a0WN T a kak2 = min {a∈RK, a6=0} 1 N T kak2Tr " Mf0 K X k1=1 ak1X 0 k1 ! M_λ0 K X k2=1 ak2Xk2 !# = min {α∈_RK1_{, ϕ}∈_RK2 α6= 0, ϕ6= 0}

TrMf0 X_low0 _,ϕ+X_high0 _,α M_λ0 (Xlow_,ϕ+Xhigh,α) N T (kαk2₊_kϕk2₎ ,

where we decomposeda= (ϕ0, α0)0, withϕandαbeing vectors of lengthK1andK2, respectively,

and we defined linear combinations of high- and low-rank regressors2

Xlow,ϕ = K1 X l=1 ϕ_lXl , Xhigh,α= K X m=K1+1 αmXm . We haveM_λ0 =M₍_λ0_,w₎+P₍_M

λ0w), where w is theN ×K1 matrix defined in assumption 4, i.e.,

(λ0, w) is an N ×(R+K1) matrix, whereas Mλ0w is also an N ×K₁ matrix. Using this we

obtain µ_K(WN T) = min {ϕ∈RK1_{, α}_∈_RK2 ϕ6= 0, α6= 0} 1 N T (kϕk2₊_kαk2₎ TrMf0 X_low0 _,ϕ+X_high0 _,α M₍_λ0_,w₎ (X_low_,ϕ+X_high_,α) + Tr M_f0 X_low0 _,ϕ+X_high0 _,α P₍_M λ0w) (Xlow,ϕ+Xhigh,α) = min {ϕ∈_RK1_{, α}∈_RK2 ϕ6= 0, α6= 0} 1 N T (kϕk2₊_kαk2₎ Tr M_f0X_high0 _,αM₍_λ0_,w₎X_high_,α + TrMf0 X_low0 _,ϕ+X_high0 _,α P(M_λ0w) (Xlow,ϕ+Xhigh,α) . (S.6.1)

2_{As in assumption 4 the components of}_α_{are denoted}_α

(18)

We note there exists finite positive constants c1, c2, and c3 such that 1 N TTr Mf0X_high0 _,αM₍_λ0_,w₎X_high_,α ≥ c1kαk2 , wpa1, 1 N TTr Mf0 X_low0 _,ϕ+X_high0 _,α P(M_λ0w) (Xlow,ϕ+Xhigh,α) ≥0, 1 N TTr Mf0X_low0 _,ϕP₍_M λ0w)Xlow,ϕ ≥ c2kϕk2 , wpa1, 1 N TTr Mf0X_low0 _,ϕP₍_M λ0w)Xhigh,α ≥ −c3 2 kϕkkαk, wpa1, 1 N TTr Mf0X_high0 _,αP₍_M λ0w)Xhigh,α ≥0, (S.6.2)

and we want to justify these inequalities now. The second and the last equation in (S.6.2) are true because, e.g., TrMf0X_high0 _,αP₍_M

λ0w)Xhigh,α = TrMf0X_high0 _,αP₍_M λ0w)Xhigh,αMf0 , and the trace of a symmetric positive semi-definite matrix is non-negative. The first inequality in (S.6.2) is true because rank(f0_{) + rank(}_λ0

, w) = 2R+K1 and using Lemma A.1 and assumption

4 we have 1 N Tkαk2Tr M_f0X_high0 _,αM₍_λ0_,w₎X_high_,α≥ 1 N Tkαk2µ2R+K1+1 Xhigh,αXhigh0 ,α > b , wpa1,

i.e., we can setc1 =b. The third inequality in (S.6.2) is true because according Lemma S.4.1(v)

we have 1 N TTr Mf0X_low0 _,ϕP₍_M λ0w)Xhigh,α ≥ − K1 N T kXlow,ϕk kXhigh,αk ≥ − K1 N T kXlow,ϕkFkXhigh,αkF ≥ −K1K1K2kϕk kαk max k1=1...K1 Xk1 √ N T F max k2=K1+1...K Xk2 √ N T F ≥ −c3 2 kϕk kαk, where we used that assumption 4 implies

Xk/ √ N T F

< C holds wpa1 for some constant C

as, and we set c3 =K1K1K2C2. Finally, we have to argue that the third inequality in (S.6.2)

holds. Note X_low0 _,ϕP(M_λ0w)Xlow,ϕ=X

0

low,ϕMλ0Xlow,ϕ, i.e., we need to show 1 N TTr Mf0X_low0 _,ϕM_λ0X_low_,ϕ ≥ c2kϕk2 .

Using part (vi) of Lemma S.4.1 we find 1 N TTr Mf0X_low0 _,ϕM_λ0X_low_,ϕ = 1 N TTr M_λ0X_low_,ϕM_f0X_low0 _,ϕM_λ0 ≥ 1 N T M_λ0X_low_,ϕM_f0X_low0 _,ϕM_λ0 ,

(19)

and according to Lemma S.5.2 this expression is bounded by some positive constant timeskϕk2

(in the lemma we havekϕk= 1, but all expressions are homogeneous inkϕk). Using the inequalities (S.6.2) in equation (S.6.1) we obtain

µ_K(WN T)≥ min {ϕ∈RK1_{, α}_∈_RK2 ϕ6= 0, α6= 0} 1 kϕk2₊_kαk2 c1kαk2+ max 0, c2kϕk2−c3kϕkkαk ≥min c2 2, c1c2₂ c2 2+c23 , wpa1.

Thus, the smallest eigenvalue of WN T is bounded from below by a positive constant asN, T →

∞, i.e., WN T is non-degenerate and invertible.

S.7 Proof of Examples for Assumption 5

Proof of Example 1. We want to show the conditions of Assumption 5 are satisfied. Condi-tions (i)-(iii) are satisfied by the assumpCondi-tions of the example.

For condition (iv), notice Cov (Xit, Xis|C) = E(UitUis). Because |β0| <1 and supitE(e2it)<

(20)

{eit} that we have for some finite constantM, 1 N T2 N X i=1 T X t,s,u,v=1 Cov eitXe_is, e_iuXe_iv|C = 1 N T2 N X i=1 T X t,s,u,v=1 |Cov (eitUis, eiuUiv)| = 1 N T2 N X i=1 T X t,s,u,v=1 ∞ X p,q=0 β 0p+q E(eiteis−peiueiv−q)− β0 p E(eiteis−p) β0 q E(eiueiv−q) ≤ M T2 T X t,s,u,v=1 ∞ X p,q=0 β0 p+q [_I{t=u}_I{s−p=v−q}+_I{t=v−q}_I{s−p=u}] = M T2 T X t,u,s,v=1 s X k=−∞ v X l=−∞ β0 s−k+v−l I{t=u}I{k=l}+M    1 T T X s,u=1 s−u≥0 β0 s−u       1 T T X v,t=1 v−t≥0 β0 v−t    = M T T X s,v=1 min{s,v} X k=−∞ β0 s+v−2k +M    1 T T X s,u=1 s−u≥0 β0 s−u       1 T T X v,t=1 v−t≥0 β0 v−t   . Notice 1 T T X s,v=1 min{s,v} X k=−∞ β0 s+v−2k = 2 T T X s=2 s X v=1 v X k=−∞ β0 s−v+2(v−k) + 2 T T X s=1 s X k=−∞ β0 2(s−k) = 2 T T X s=2 s X v=1 β0 s−v ∞ X l=0 β0 2l + 2 T T X s=1 ∞ X l=0 β0 2l = 2 1− β0 2 1 T T X s=2 s X v=1 β0 s−v + 2 1− β0 2 = 2 1− β0 2 !_T₋₁ X l=1 β0 l 1− l T + 2 1− β0 2 = O(1), and 1 T T X s,u=1 s−u≥0 β0 s−u = 1 T T X s=1 s X u=1 β0 s−u = T−1 X l=0 β0 l 1− l T =O(1).

(21)

Therefore, we have the desired result 1 N T2 N X i=1 T X t,s,u,v=1 Cov eitXe_is, e_iuXe_iv|C =Op(1).

Preliminaries for Proof of Example 2

• Although we observeXit for 1 ≤t ≤T,here we treatZit = (eit, Xit) as having an infinite past and future. Define

Gt

τ(i) = C ∨σ({Xis :τ ≤s≤t}) and Htτ(i) =C ∨σ({Zit :τ ≤s≤t}). Then, by definition, we haveGt

τ(i),Htτ(i)⊂ Fτt(i) for allτ , t, i.By Assumption (iv) of Ex-ample 2, the time series of{Xit :−∞< t <∞}and {Zit :−∞< t <∞} are conditional

α-mixing conditioning on C uniformly in i.

• Mixing inequality: The following inequality is a conditional version of the α-mixing in-equality of Hall and Heyde (1980), p. 278. Suppose Xit is a Ft-measurable random variable with _E|Xit|

max{p,q}

|C < ∞, where p, q > 1 with 1/p+ 1/q < 1. Denote

kXitkC,p = (E(|Xit| p

|C))1/p.Then, for each i, we have

|Cov (Xit, Xit+m|C)| ≤8kXitkC,pkXit+mkC,qα 1−1 p− 1 q m (i). (S.7.1)

Proof of Example 2. Again, we want to show the conditions of Assumption 5 are satisfied. Conditions (i)-(iii) are satisfied by the assumptions of the example.

(22)

where the last line holds because sup_i,tkXitk

2

C,p = Op(1) for some p > 4 as assumed in the example (2), and P∞ m=0α p−2 P m =P∞_m₌₀m−ζ p−2 P =O(1) because of ζ >3 4p 4p−1 and p >4.

For condition (v), we need to show 1 N T2 N X i=1 T X t,s,u,v=1 Cov eitXeis, eiuXeiv|C =Op(1). Notice 1 N T2 N X i=1 T X t,s,u,v=1 Cov eitXe_is, e_iuXe_iv|C = 1 N T2 N X i=1 T X t,s,u,v=1 E eitXe_ise_iuXe_iv|C −_EeitXe_is|C E eiuXe_iv|C ≤ 1 N T2 N X i=1 T X t,s,u,v=1 E eitXe_ise_iuXe_iv|C + 1 N N X i=1 1 T T X t,s=1 E eitXe_is|C !2 = I+II, say.

First, for termI,there are a finite number of different orderings among the indicest, s, u, v. We consider the caset ≤s≤u≤v and establish the desired result. The other cases can be shown analogously. Note 1 N T2 N X i=1 T X t=1 T−t X k=0 T−k X l=0 T−l X m=0 E eitXe_it₊_ke_it₊_k₊_lXe_it₊_k₊_l₊_m|C ≤ 1 N N X i=1 1 T2 T X t=1 X 0≤l,m≤k 0≤k+l+m≤T−t E eit e Xit+keit+k+lXe_it₊_k₊_l₊_m |C +1 N N X i=1 1 T2 T X t=1 X 0≤k,m≤l 0≤k+l+m≤T−t E h eitXeit+k eit+k+lXeit+k+l+m |Ci −_EeitXe_it₊_k|C E eit+k+lXe_it₊_k₊_l₊_m|C +1 N N X i=1 1 T2 T X t=1 X 0≤k,m≤l 0≤k+l+m≤T−t E eitXe_it₊_k|C E eit+k+lXe_it₊_k₊_l₊_m|C +1 N N X i=1 1 T2 T X t=1 X 0≤p,l≤m 0≤k+l+m≤T−t E h eitXe_it₊_ke_it₊_k₊_l e Xit+k+l+m|C i = I1 +I2+I3+I4, say.

(23)

By applying the mixing inequality (S.7.1) to E eit e Xit+keit+k+lXeit+k+l+m |C with eit and e Xit+keit+k+lXe_it₊_k₊_l₊_m, we have E eit e Xit+keit+k+lXeit+k+l+m |C ≤ 8keitkC,p Xeit+keit+k+lXeit+k+l+m _C ,q α1− 1 p− 1 q k (i) ≤ 8keitkC,p Xeit+k _C ,3q keit+k+lkC,3q Xeit+k+l+m _C ,3q α1− 1 p− 1 q k (i),

where the last inequality follows by the generalized Holder’s inequality. Choose p = 3q > 4.

Then, I1 ≤ 8 N N X i=1 1 T2 T X t=1 X 0≤l,m≤k 0≤k+l+m≤T−t keitkC,p Xeit+k _C ,p keit+k+lkC,p Xeit+k+l+m _C ,p α1− 1 4p k (i) ≤ 8 sup i,t keitk2C,p sup i,t Xeit+k 2 C,p 1 T2 T X t=1 X 0≤l,m≤k 0≤k+l+m≤T−t α1− 1 4p k ≤ 8 sup i,t keitk2C,p sup i,t Xeit+k 2 C,p ∞ X k=0 k2α1− 1 4p k ≤ Op(1),

where the last line holds because we assume in example (2) thatsup_i,tkeitk

2 C,p sup_i,t Xeit+k 2 C,p =

Op(1) for somep > 4,, and

P∞ m=0m 2_α1− 1 4p m =P∞_m₌₀m2−ζ 4p−1 4p ₌_O_{(1) because of}_{ζ >}₃ 4p 4p−1 and p >4.

By applying similar arguments, we can also show

I2, I3, I4 =Op(1).

S.8 Supplement to the Proof of Theorem 4.3

Notation _EC and VarC and CovC: In the remainder of this supplementary file we writeEC,

VarC and CovC for the expectation, variance and covariance operators conditional on C, i.e.,

EC(A) = E(A|C), VarC(A) = Var(A|C) and CovC(A, B) = Cov(A, B|C).

What is left to show to complete the proof of Theorem 4.3 is that Lemma B.1 and Lemma B.2 in the main text appendix hold. Before showing this, we first present two further intermediate lemmas.

(24)

Lemma S.8.1. Under the assumptions of Theorem 4.3 we have for k = 1, . . . , K, (a) kP_λ0Xekk=op( √ N T), (b) kXe_kP_f0k=o_p( √ N T), (c) kP_λ0eX_k0k=o_p(N3/2), (d) kP_λ0eP_f0k=O_p(1). Proof of Lemma S.8.1. # Part (a): We have

kP_λ0Xe_kk=kλ0(λ0 0 λ0)−1λ00Xe_kk ≤ kλ0₍_λ00 λ0)−1kkλ00 e Xkk ≤ kλ0kk(λ00λ0)−1kkλ00Xe_kk_F =O_p(N−1/2)kλ00Xe_kk_F ,

where we used part (i) and (ii) of Lemma S.4.1 and Assumption 1. We have

E n EC h kλ00Xe_kk2_F io =_E    R X r=1 T X t=1 EC   N X i=1 λ0_irXe_k,it !2     =_E ( _R X r=1 T X t=1 N X i=1 (λ0_ir)2_EC e X_k,it2 ) = R X r=1 T X t=1 N X i=1 E(λ0ir)2VarC(Xk,it) =Op(N T),

where we used Xe_k,it is mean zero and independent across i, conditional on C, and our bounds

on the moments of λ0_ir and Xk,it. We therefore have kλ00Xe_kk_F = O_p(

√

N T) and the above inequality thus gives kP_λ0Xe_kk=O_p(

√

T) =op(

√ N T).

# The proof for part (b) is similar. As above we first obtain kXe_kP_f0k = kP_f0Xe_k0k ≤

Op(T−1/2)kf00Xe_k0kF. Next, we have EC h kf00 e X_k0k2 F i = R X r=1 N X i=1 EC   T X t=1 f_tr0Xe_k,it !2  = R X r=1 N X i=1 T X t,s=1 f_tr0f_sr0_EC e Xk,itXe_k,is ≤ " _R X r=1 max t |f 0 tr| 2 # _N X i=1 T X t,s=1

|CovC(Xk,it, Xk,is)|

(25)

where we used that uniformly bounded _Ekf0

tk4+ implies maxt|ftr0| = Op(T1/(4+)). We thus havekf00 e X_k0k2 F =op(T √ N) and thereforekXe_kP_f0k=o_p( √ N T). # Next, we show part (c). First, we have

E n EC h kλ00eX_k0kF 2io =_E    EC   R X r=1 N X j=1 N X i=1 T X t=1 λ0_ireitXk,jt !2     =_E ( _R X r=1 N X i,j,l=1 T X t,s=1 λ0_irλ0_lr_EC(eitelsXk,jtXk,js) ) = R X r=1 N X i,j=1 T X t=1 E(λ0ir) 2 EC e2itX 2 k,jt =O(N2T),

where we used that _EC(eitelsXk,jtXk,js) is only non-zero if i = l (because of cross-sectional independence conditional onC) andt=s(because regressors are pre-determined). We can thus conclude kλ00eX_k0kF =Op(N

√

T). Using this we find

kP_λ0eX_k0k=kλ0(λ00λ0)−1λ00eX_k0k ≤ kλ0₍_λ00 λ0)−1kkλ00 eX_k0k ≤ kλ0kk(λ00λ0)−1kkλ00eX_k0kF =Op(N−1/2)Op(N √ T) = Op( √ N T).

This is what we wanted to show. # For part (d), we first find √1

N T f00eλ0 F =Op(1), because E    EC   f00eλ0 F √ N T !2     = _E    1 N TEC   N X i=1 T X t=1 eitft00λ 0 i !2     = _E ( 1 N T N X i=1 N X j=1 T X t=1 T X s=1 EC(eitejs)ft00λ 0 iλ 00 j f 0 s ) = 1 N T N X i=1 T X t=1 EEC e2it f_t00λ0_iλ0_i0f_t0 = O(1),

where we used eit is independent across i and over t, conditional on C. Thus we obtain

kP_λ0eP_f0k=kλ0(λ00λ0)−1λ00ef0(f00f0)−1f00k ≤ kλ0k(λ0 0 λ0)−1kλ0 0 ef0k(f00f0)−1 kf00k ≤ Op(N1/2)Op(N−1)kλ00ef0kFOp(T−1)Op(T1/2) =Op(1), where we used part (i) and (ii) of Lemma S.4.1.

(26)

Lemma S.8.2. Suppose A and B are T ×T and N ×N matrices that are independent of

e, conditional on C, such that _EC kAk2_F

= Op(N T) and EC kBk2_F

= Op(N T), and let Assumption 5 be satisfied. Then there exists a finite non-random constant c0 such that

(a) _EC {Tr [(e0e−_EC(e0e))A]} 2 ≤c0NEC kAk2_F , (b) _EC {Tr [(ee0 −_EC(ee0))B]} 2 ≤c0T _EC kBk2_F .

Proof. # Part (a): Denote Ats to be the (t, s) th element of A. We have Tr{(e0e−_EC(e0e))A}= T X t=1 T X s=1 (e0e−_EC(e0e))_tsAts = T X t=1 T X s=1 N X i=1 (eiteis−EC(eiteis)) ! Ats. Therefore, EC(Tr{(e0e−EC(e0e))A})2 = T X t=1 T X s=1 T X p=1 T X q=1 EC " _N X i=1 (eiteis−EC(eiteis)) ! _N X j=1 (ejpejq−EC(ejpejq)) !# EC(AtsApq).

Let Σit =_EC(e2it). Then we find

EC ( _N X i=1 (eiteis−EC(eiteis)) ! _N X j=1 (ejpejq−EC(ejpejq)) !) = N X i=1 N X j=1 {_EC(eiteisejpejq)−EC(eiteis)EC(ejpejq)} =            ΣitΣis if (t =p)6= (s=q) and (i=j) ΣitΣis if (t =q)6= (s=p) and (i=j) EC(e4_it)−Σ2_it if (t =s =p=q) and (i=j) 0 otherwise. Therefore, EC(Tr{(e0e−EC(e0e))A}) 2 ≤ T X t=1 T X s=1 N X i=1 ΣitΣis EC A2ts +_EC(AtsAst) + T X t=1 N X i=1 EC e4it −Σ2_it ECA2tt.

(27)

Define Σi _{= diag (Σ} i1, ...,ΣiT). Then, we have T X t=1 T X s=1 N X i=1 ΣitΣis ECA2_ts = _EC N X i=1 Tr A0ΣiAΣi ! ≤ N X i=1 EC AΣi 2 F ≤ N X i=1 Σi 2 ECkAk2_F ≤ N sup it Σ2_it ECkAk2_F . (S.8.1) Also, T X t=1 T X s=1 N X i=1 ΣitΣisEC(AtsAst) = EC " _N X i=1 Tr ΣiAAΣi # ≤ N X i=1 EC ΣiA F AΣi F ≤ N X i=1 Σi 2 ECkAk2_F ≤ N sup it Σ2_it ECkAk2_F . (S.8.2) Finally, T X t=1 N X i=1 EC e4_it−Σ_it2ECA2_tt ≤ N sup it E C e4_it ECkAk2_F, (S.8.3)

and sup_it_EC(e4it) is assumed bounded by Assumption 5(vi). # Part (b): The proof is analogous to the proof of part (a).

Proof of Lemma B.1. # For part (a) we have

1 √ N TTr Pf0e0P_λ0Xe_k = 1 √ N TTr Pf0e0P_λ0P_λ0Xe_kP_f0 ≤ √R N T kPλ 0e P_f0k Pλ0Xek kPf0k = √1 N T Op(1)op( √ N T)Op(1) =op(1),

where the second-last equality follows by Lemma S.8.1 (a) and (d). # To show statement (b) we define ζ_k,ijt =eitXe_k,jt. We then have

1 √ N TTr P_λ0eXe 0 k = R X r,q=1 " λ00λ0 N −1# rq 1 N√N T T X t=1 N X i,j=1 λ0_irλ0_jqζ_k,ijt | {z } ≡Ak,rq .

(28)

We only have _EC ζk,ijtζk,lms

6

= 0 if t = s (because regressors are pre-determined) and i = l

and j =m (because of cross-sectional independence). Therefore

EEC A2k,rq =E ( 1 N3_T T X t,s=1 N X i,j,l,m=1 λirλjqλlrλmqEC ζk,ijtζk,lms ) = 1 N3_T T X t=1 N X i,j=1 Eλ2irλ 2 jqEC ζ 2 k,ijt =O(1/N) =op(1).

We thus have Ak,rq =op(1) and therefore also √_{N T}1 Tr

P_λ0eXe_k0

=op(1).

# The proof for statement (c) is similar to the proof of statement (b). Define ξ_k,its =

eitXe_k,is−EC eitXe_k,is . We then have 1 √ N TTr n Pf0 h e0Xe_k−EC e0Xe_k io = R X r,q=1 " f0f T −1# rq 1 T√N T N X i=1 T X t,s=1 ftrfsqξk,its | {z } ≡Bk,rq . Therefore EC B_k,rq2 = 1 T3_N N X i,j=1 T X t,s,u,v=1 ftrfsqfurfvqEC ξk,itsξk,juv ≤ max t,er |fter| 4 1 T3_N N X i,j=1 T X t,s,u,v=1 CovC eitXe_k,is, e_juXe_k,jv = max t,er |ftre| 4 1 T3_N N X i=1 T X t,s,u,v=1 CovC eitXek,is, eiuXek,iv =Op(T4/(4+))Op(1/T) =op(1),

where we used uniformly bounded_Ekf0

tk4+ implies maxt|ftr0|=Op(T1/(4+)). # Part (d) and (e): We have kλ0(λ00λ0)−1₍_f00_f0₎−1_f00_k₌_O

p((N T)−1/2), kek=Op(N1/2),

kXkk=Op(

√

N T) andkP_λ0eP_f0k=O_p(1), which was shown in Lemma S.8.1. Therefore:

1 √ N TTr ePf 0e0M_λ0X_kf0(f00f0)−1(λ00λ0)−1λ00 = √1 N TTr Pλ0ePf0e 0 M_λ0X_kf0(f00f0)−1(λ00λ0)−1λ00 ≤ √R N T kPλ 0eP_f0k kekkX_kk f0(f00f0)−1(λ00λ0)−1λ00 =O_p(N−1/2) = o_p(1) .

(29)

which shows statement (d). The proof for part (e) is analogous.

# To prove statement (f) we need to use in addition kP_λ0e X_k0k=o_p(N3/2), which was also

shown in Lemma S.8.1. We find 1 √ N TTr e 0 M_λ0X_kM_f0e0λ0(λ00λ0)−1(f00f0)−1f00 = √1 N TTr e 0 M_λ0X_ke0P_λ0λ0(λ00λ0)−1(f00f0)−1f00 − √1 N TTr e 0 M_λ0X_kP_f0e0P_λ0λ0(λ00λ0)−1(f00f0)−1f00 ≤ √R N TkekkPλ 0e X_k0k kλ0(λ00λ0)−1(f00f0)−1f00k − √R N TkekkXkkkPλ0e Pf0kkλ 0 (λ00λ0)−1(f00f0)−1f00k =op(1).

# Now we want to prove part (g) and (h) of the present lemma. For part (g) we have 1 √ N TTr [ee0−_EC(ee0)] Mλ0X_kf0(f00f0)−1(λ00λ0)−1λ00 = √1 N TTr [ee0−_EC(ee0)] M_λ0X_kf0(f00f0)−1(λ00λ0)−1λ00 + √1 N TTr n [ee0−_EC(ee0)]Mλ0XekPf0f0(f00f0)−1(λ00λ0)−1λ00 o = √1 N TTr [ee0−_EC(ee0)] Mλ0X_kf0(f00f0)−1(λ00λ0)−1λ00 + √1 N T kee 0₋ EC(ee0)k XekPf0 f0(f0 0 f0)−1(λ00λ0)−1λ00 = √1 N TTr [ee0−_EC(ee0)] Mλ0Xkf0(f00f0)−1(λ0 0 λ0)−1λ00 +op(1).

Thus, what is left to prove is √1

N TTr

[ee0−_EC(ee0)] M_λ0X_kf0(f00f0)−1(λ00λ0)−1λ00 =o_p(1).

For this we define

Bk =Mλ0X_kf0(f00f0)−1(λ00λ0)−1λ00 .

Using part (i) and (ii) of Lemma S.4.1 we find

kBkkF ≤R1/2kBkk ≤R1/2kXkk f0(f00f0)−1(λ00λ0)−1λ00 ≤R1/2kXkkF f0(f00f0)−1(λ00λ0)−1λ00 .

(30)

and therefore EC kBkk2F ≤Rf0(f00f0)−1(λ00λ0)−1λ00 2 EC kXkk2F =O(1), where we used _EC kXkk2F

= O(N T), which is true because we assumed uniformly bounded moments of Xk,it. Applying Lemma S.8.2 we therefore find

EC 1 √ N TTr{[ee 0₋ EC(ee0)]Bk} 2 ≤c0 T N T EC kBkk 2 F =o(1) , and thus 1 √ N TTr{[ee 0₋ EC(ee0)]Bk}=op(1), which is what we wanted to show. The proof for part (h) is analogous.

# Part (i): Conditional on C the expression e2

itXitX0it−EC(e2itXitX0it) is mean zero, and it is also uncorrelated across i. This together with the bounded moments that we assume implies

VarC ( 1 N T N X i=1 T X t=1 e2_itXitX0it−EC e2itXitX0it ) =Op(1/N) = op(1),

which shows the required result.

# Part (j): Define the K ×K matrix A = _{N T}1 PN

i=1 PT t=1 e2it(Xit+Xit) (Xit− Xit) 0 . Then we have 1 N T N X i=1 T X t=1 e2_it(XitX0it− XitXit0) = 1 2(A+A 0 ).

LetBk be theN ×T matrix with elements Bk,it =e2it(Xk,it+Xk,it). We havekBkk ≤ kBkkF =

Op(

√

N T), because the moments of Bk,it are uniformly bounded. The components of A can be written as Alk = _{N T}1 Tr[Bl(Xk− Xk)0]. We therefore have

|Alk| ≤ 1

N Trank(Xk− Xk)kBlk kXk− Xkk.

We have Xk− Xk =XekPf0 +P_λ0XekMf0. Therefore rank(X_k− X_k)≤2R and |Alk| ≤ 2R N TkBlk XekPf0 + Pλ0XekMf0 ≤ 2R N TkBlk XekPf0 + Pλ0Xek = 2R N TOp( √ N T)op( √ N T) =op(1), where we used Lemma S.8.1. This shows the desired result.

(31)

Proof of Lemma B.2. Letcbe aK-vector such thatkck= 1. The required result follows by the Cramer-Wold device, if we show

1 √ N T N X i=1 T X t=1 eitX0itc ⇒ N(0, c 0 Ωc).

For this, define ξ_it = eitX0itc. Furthermore define ξm = ξM,m = ξN T,it, with M = N T and

m=T(i−1) +t∈ {1, . . . , M}. We then have the following:

(i) Under Assumption 5(i), (ii), (iii) the sequence {ξ_m, m = 1, . . . , M} is a martingale dif-ference sequence under the filtration Fm =C ∨σ({ξn:n < m}).

(ii) _E(ξ4_it) is uniformly bounded, because by Assumption 5(vi) _ECe8it and EC(kXitk8+) are uniformly bounded by a non-random constant (applying Cauchy-Schwarz and the law of iterated expectations). (iii) _M1 PM m=1ξ 2 m =c 0_Ω_c₊_o p(1).

This is true, because firstly under our assumptions we have_EC

h 1 M PM m=1 ξ 2 m−EC(ξ2m) i2 = EC n 1 M2 PM m=1 ξ 2 m−EC(ξ2m) 2o = OP(1/M) = oP(1), implying we have _M1 PM m=1ξ 2 m = 1 M PM m=1EC(ξ 2 m) +op(1). We furthermore have _M1 PM m=1EC(ξ 2 m) = VarC(M−1/2PM_m₌₁ξm), and using the result in equation (14) of the main text we find VarC(M−1/2PM_m₌₁ξm) = VarC((N T)−1/2PN_i₌₁PT_t₌₁ξ_it) =c0Ωc+op(1).

These three properties of{ξ_m, m= 1, . . . , M}allow us to apply Corollary 5.26 in White (2001), which is based on Theorem 2.3 in Mcleish (1974), to obtain √1

M

PM

m=1ξm →d N(0, c0Ωc). This concludes the proof, because √1

M PM m=1ξm = 1 √ N T PN i=1 PT t=1eitX 0 itc.

S.9 Expansions of Projectors and Residuals

The incidental parameter estimatorsfband bλas well as the residuals b

eenter into the asymptotic bias and variance estimators for the LS estimatorbβ. To describe the properties offb,bλ and

b

e, it is convenient to have asymptotic expansions of the projectorsM

b

λ(β) andMfb(β) that correspond

to the minimizing parameters bλ(β) and fb(β) in equation (4). Note the minimizing bλ(β) and b

f(β) can be defined for all values of β, not only for the optimal valueβ =bβ. The corresponding

(32)

Theorem S.9.1. Under Assumptions 1, 3, and 4(i) we have the following expansions M b λ(β) =Mλ0 +M (1) b λ,e +M (2) b λ,e − K X k=1 β_k−β0_kM(1) b λ,k+M (rem) b λ (β), M b f(β) =Mf0 +M(1) b f ,e +M (2) b f ,e − K X k=1 β_k−β0_k M(1) b f ,k+M (rem) b f (β), b e(β) =M_λ0e M_f0 + b e(1)_e − K X k=1 β_k−β0_k_be(1)_k +_be(rem)(β),

where the spectral norms of the remainders satisfy for any series η_{N T} →0:

sup {β:kβ−β0_k_≤_η N T} M (rem) b λ (β) kβ−β0k2_{+ (}_{N T}₎−1/2_{kek kβ}₋_β0_k _{+ (}_{N T}₎−3/2_kek3 =Op(1) , sup {β:kβ−β0k≤ηN T} M (rem) b f (β) kβ−β0k2_{+ (}_{N T}₎−1/2_{kek kβ}₋_β0_k _{+ (}_{N T}₎−3/2_kek3 =Op(1) , sup {β:kβ−β0k≤ηN T} _be(rem)(β) (N T)1/2_kβ₋_β0_k2₊_{kek kβ}₋_β0_k_{+ (}_{N T}₎−1_kek3 =Op(1) ,

and we have rank(_be(rem)₍_β₎₎_≤₇_R_{, and the expansion coefficients are given by} M(1) b λ,e =−Mλ0e f 0₍_f00 f0)−1(λ00λ0)−1λ00 − λ0(λ00λ0)−1(f00f0)−1f00e0M_λ0 , M(1) b λ,k =−Mλ0Xkf 0 (f00f0)−1(λ00λ0)−1λ00 − λ0(λ00λ0)−1(f00f0)−1f00X_k0 M_λ0 , M(2) b λ,e =Mλ0e f 0₍_f00 f0)−1(λ00λ0)−1λ00e f0(f00f0)−1(λ00λ0)−1λ00 +λ0(λ00λ0)−1(f00f0)−1f00e0λ0(λ00λ0)−1(f00f0)−1f00e0M_λ0 −M_λ0e M_f0e0λ0(λ00λ0)−1(f00f0)−1(λ00λ0)−1λ00 −λ0(λ00λ0)−1(f00f0)−1(λ00λ0)−1λ00e Mf0e0M_λ0 −M_λ0e f0(f00f0)−1(λ00λ0)−1(f00f0)−1f00e0M_λ0 +λ0(λ00λ0)−1(f00f0)−1f00e0M_λ0e f0(f00f0)−1(λ00λ0)−1λ00,

(33)

analogously M(1) b f ,e = −Mf0e 0 λ0(λ00λ0)−1(f00f0)−1f00 − f0(f00f0)−1(λ00λ0)−1λ00e Mf0 , M(1) b f ,k = −Mf0X 0 kλ 0₍_λ00 λ0)−1(f00f0)−1f00 − f0(f00f0)−1(λ00λ0)−1λ00XkMf0 , M(2) b f ,e =Mf0e 0 λ0(λ00λ0)−1(f00f0)−1f00e0λ0(λ00λ0)−1(f00f0)−1f00 +f0(f00f0)−1(λ00λ0)−1λ00e f0(f00f0)−1(λ00λ0)−1λ00e M_f0 −Mf0e0M_λ0e f0(f00f0)−1(λ00λ0)−1(f00f0)−1f00 −f0(f00f0)−1(λ00λ0)−1(f00f0)−1f00e0M_λ0e M_f0 −M_f0e0λ0(λ00λ0)−1(f00f0)−1(λ00λ0)−1λ00e M_f0 +f0(f00f0)−1(λ00λ0)−1λ00e Mf0e0λ0(λ00λ0)−1(f00f0)−1f00, and finally b e(1)_k =M_λ0X_kM_f0 , b e(1)_e =−M_λ0e M_f0e0λ0(λ00λ0)−1(f00f0)−1f00 −λ0(λ00λ0)−1(f00f0)−1f00e0M_λ0e M_f0 −M_λ0e f0(f00f0)−1(λ00λ0)−1λ00e M_f0 . Proof. The general expansion of M

b

λ(β) is given in Moon and Weidner (2015), and in the theorem we just make this expansion explicit up to a particular order. The result for M

b

f(β) is just obtained by symmetry (N ↔T,λ ↔f, e↔e0, Xk↔Xk0). For the residuals be we have

b e=M_b_λ Y −X k=1 b β_kXk ! =M_b_λ he−βb−β0 ·X+λ0f00i ,

and plugging in the expansion of M_b_λ gives the expansion of _be. We have _be(β) = A0+λ0f00−

b λ(β)fb0(β), whereA₀ =e− P k(βk−β 0 k)Xk. Therefore be (rem)₍_β_{) =} _A 1+A2+A3 with A1 =A0− M_λ0A₀M_f0, A₂ =λ0f00−bλ(β)fb0(β), and A₃ =− b

e(1)e . We find rank(A1)≤2R, rank(A2)≤2R,

rank(A3)≤3R, and thus rank(be

(rem)₍_β₎₎_≤₇_R_{, as stated in the theorem.}

Having expansions for M

b

λ(β) andMfb(β), we also have expansions for Pλb(β) =IN −Mbλ(β)

and P_f_b(β) = _IT − Mfb(β). The reason why we give expansions of the projectors and not

expansions of bλ(β) andfb(β) directly is for the latter we would need to specify a normalization,

whereas the projectors are independent of any normalization choice. An expansion forbλ(β) can,

for example, be defined bybλ(β) =P

b

λ(β)λ

0

, in which case the normalization ofbλ(β) is implicitly

(34)

S.10 Consistency Proof for Bias and Variance Estimators

(Proof of Theorem 4.4)

It is convenient to introduce some alternative notation for Definition 1 in section 4.3 of the main text.

Definition Let Γ : _R → _R be the truncation kernel defined by Γ(x) = 1 for |x| ≤ 1, and Γ(x) = 0 otherwise. Let M be a bandwidth parameter that depends on N andT. For an N×N

matrix A with elements Aij and a T ×T matrix B with elements Bts we define (i) the diagonal truncations AtruncD _{= diag[(}_A

ii)i=1,...,N] and BtruncD = diag[(Btt)t=1,...,T]. (ii) the right-sided Kernel truncation of B, which is a T ×T matrix BtruncR _{with elements}

BtruncR

ts = Γ

s−t M

Bts for t < s, and BtstruncR = 0 otherwise.

Here, we suppress the dependence of BtruncR _{on the bandwidth parameter} _M_{. Using this}

notation we can represent the estimators for the bias in Definition 1 as follows:

b B1,k = 1 N Tr h P b f(be 0 Xk) truncRi , b B2,k = 1 TTr h (_be_be0)truncD M b λXkfb(fb 0 b f)−1(bλ 0 b λ)−1bλ 0i , b B3,k = 1 NTr h (_be0_be)truncD M b fX 0 kλb(bλ 0 b λ)−1(fb0fb)−1fb0 i .

Before proving Theorem 4.4 we establish some preliminary results.

Corollary S.10.1. Under the Assumptions of Theorem 4.3 we have √N T bβ−β0

=Op(1). This corollary directly follows from Theorem 4.3.

Corollary S.10.2. Under the Assumptions of Theorem 4.4 we have

P b λ −Pλ0 = M b λ−Mλ0 =O_p(N−1/2), Pfb−Pf0 = Mfb−Mf0 =Op(T −1/2 ).

Proof. Usingkek=Op(N1/2) andkXkk=Op(N) we find the expansion terms in Theorem S.9.1 satisfy M (1) b λ,e =Op(N −1/2₎_, M (2) b λ,e =Op(N −1₎_, M (1) b λ,k =Op(1) .

Together with corollary S.10.1 the result for M

b

λ−Mλ0

immediately follows. In addition we

haveP

b

λ−Pλ0 =−Mbλ+Mλ

0. The proof for M

b

(35)

Lemma S.10.3. Under the Assumptions of Theorem 4.4 we have A1 ≡ 1 N T N X i=1 T X t=1 e2_itXitXit0 −Xb_itXb_it0 =op(1), A2 ≡ 1 N T N X i=1 T X t=1 e2_it−_be2_itXb_itXb_it0 =o_p(1).

Lemma S.10.4. Let fband f0 be normalized as fb0f /Tb = I_R and f00f0/T = I_R. Then, under

the assumptions of Theorem 4.4, there exists an R×R matrix H =HN T such that3

fb−f 0_H =Op(1) , bλ−λ 0 (H0)−1 =Op(1) . Furthermore bλ(bλ 0 b λ)−1(fb0fb)−1fb0 −λ0(λ00λ0)−1(f00f0)−1f00 =Op N −3/2 . Lemma S.10.5. Under the Assumptions of Theorem 4.4 we have

(i) N−1 EC(e 0_X k)−(be 0_X k) truncR =op(1), (ii) N−1 EC(e 0 e)−(_be0_be)truncD =op(1), (iii) T−1 EC(ee 0 )−(_bee_b0)truncD =op(1).

Lemma S.10.6. Under the Assumptions of Theorem 4.4 we have

(i) N−1 (be 0 Xk) truncR =Op(M T 1/8₎_, (ii) N−1 (be 0 b e)truncD =Op(1) , (iii) T−1 (bebe 0 )truncD =Op(1) .

The proof of the above lemmas is given section S.11 below. Using these lemmas we can now prove Theorem 4.4.

Proof of Theorem 4.4, Part I: show cW =W +op(1).

3_{We consider a limit}_{N, T} _{→ ∞} _{and for different}_{N, T} _different_H_{-matrices can be chosen, but we write} _H

(36)

Using|Tr (C)| ≤ kCkrank (C) and corollary S.10.2 we find: cWk1k2−WN T ,k1k2 = (N T)−1Trh M_λ_b−M_λ0 Xk1MfbX 0 k2 i + (N T)−1TrhM_λ0X_k₁ M_f_b−Mf0 X_k0₂i ≤ 2R N T M b λ−Mλ0 kXk1kkXk2k 2R N T Mfb−Mf 0 kXk1kkXk2k = 2R N TOp(N −1₎_O p(N T) + 2R N TOp(T −1₎_O p(N T) =op(1). Thus we have cW =WN T +op(1) = W +op(1).

Proof of Theorem 4.4, Part II: show Ω = Ω +b op(1).

Let ΩN T ≡ _{N T}1 PN i=1 PT t=1 e 2 itXitXit0. We have Ω = ΩN T +oP(1) = Ω +b A₁ +A₂ +o_p(1) = b

Ω +oP(1), whereA1 and A2 are defined in Lemma S.10.3, and the lemma states A1 and A2 are op(1).

Proof of Theorem 4.4, Part III: show Bb₁ =B₁ +o_p(1).

LetB1,k,N T =N−1Tr [Pf0_E_C(e0X_k)]. According to Assumption 6 we haveB₁_,k =B₁_{,k,N T}+o_p(1).

What is left to show is B1,k,N T =Bb1,k+op(1). Using|Tr (C)| ≤ kCkrank (C) we find

B1,k,N T −Bb1 = EC 1 NTr(Pf0e 0 Xk) − 1 NTr h P b f(be 0 Xk) truncRi ≤ 1 NTr h P_f0 −P b f (_be0Xk) truncRi + 1 NTr n P_f0 h EC(e0 Xk)−(be 0 Xk) truncRio ≤ 2R N Pf0 −Pfb (be 0 Xk) truncR + R N kPf0k EC(e 0 _X k)−(be 0_X k) truncR .

We have kPf0k= 1. We now apply Lemmas S.10.5, S.10.2 and S.10.6 to find

B1,k,N T −B1b =N −1 _O p(N−1/2)Op(M N T1/8) +op(N)=op(1).

This is what we wanted to show.

Proof of Theorem 4.4, final part: show Bb₂ =B₂+o_p(1) and Bb₃ =B₃+o_p(1).

Define B2,k,N T = 1 TTr EC(ee0) Mλ0X_kf0(f00f0)−1(λ00λ0)−1λ00 .

(37)

According to Assumption 6 we have B2,k =B2,k,N T +op(1). What is left to show is B2,k,N T = b B2,k+op(1). We have B2,k−Bb2,k = 1 TTr EC(ee0) Mλ0Xkf0(f00f0)−1(λ00λ0)−1λ00 − 1 TTr h (_be_be0)truncD M b λXkfb(fb 0 b f)−1(bλ 0 b λ)−1bλ 0i =1 TTr h (_be_be0)truncD M b λXk f0(f00f0)−1(λ00λ0)−1λ00 −fb(fb0fb)−1(bλ 0 b λ)−1bλ 0i + 1 TTr h (_be_be0)truncD M_λ0 −M b λ Xkf0(f00f0)−1(λ00λ0)−1λ00 i + 1 TTr nh EC(ee0)−(_be_be0)truncD i M_λ0X_kf0(f00f0)−1(λ00λ0)−1λ00 o .

Using|Tr (C)| ≤ kCkrank (C) (which is true for every square matrix C) we find

B2,k−Bb2,k ≤ R T (bebe 0 )truncD kXkk f 0₍_f00 f0)−1(λ00λ0)−1λ00−fb(fb0fb)−1(bλ 0 b λ)−1bλ 0 +R T (ebbe 0 )truncD M_λ0 −M b λ kX_kk f0(f00f0)−1(λ00λ0)−1λ00 +R T EC(ee 0₎₋₍ b e_be0)truncD kXkk f0(f00f0)−1(λ0 0 λ0)−1λ00 . Here we used kMf0k = Mfb = 1. Using kXkk = Op( √

N T), and applying Lemmas S.10.2, S.10.4, S.10.5 and S.10.6, we now find

B2,k−Bb2,k =T −1 Op(T)Op((N T)1/2)Op(N−3/2) +Op(T)Op(N−1/2)Op((N T)1/2)Op((N T)−1/2) +op(T)Op((N T)1/2)Op((N T)−1/2) =op(1).

This is what we wanted to show. The proof of Bb₃ =B₃+o_p(1) is analogous.

S.11 Proof of Intermediate Lemma

Here we provide the proof of some intermediate lemmas that were stated and used in section S.10. The following lemma gives a useful bound on the maximum of (correlated) random variables

Lemma S.11.1. Let Zi, i = 1,2, . . . , n, be n real valued random variables, and let γ ≥ 1 and

B >0 be finite constants (independent ofn). Assume maxi EC|Zi|γ≤B, i.e., the γ’th moment of the Zi are finite and uniformly bounded. For n → ∞ we then have

max

i |Zi|=Op n

1/γ