Nonparametric Time-Varying Coefficient Panel Data Models with Fixed Effects

(1)

The University of Adelaide

School of Economics

Research Paper No. 2010-08

May 2010

Nonparametric Time-Varying Coefficient Panel

Data Models with Fixed Effects

(2)

Nonparametric Time–Varying Coefficient Panel Data Models with Fixed Effects

By Degui Li, Jia Chen and Jiti Gao

School of Economics, The University of Adelaide, Adelaide, Australia

Abstract

This paper is concerned with developing a nonparametric time–varying coefficient model with fixed effects to characterize nonstationarity and trending phenomenon in nonlinear panel data analysis. We develop two methods to estimate the trend function and the coeffi-cient function without taking the first difference to eliminate the fixed effects. The first one eliminates the fixed effects by taking cross–sectional averages, and then uses a nonparamet-ric local linear approach to estimate the trend function and the coefficient function. The asymptotic theory for this approach reveals that although the estimates of both the trend function and the coefficient function are consistent, the estimate of the coefficient function has a rate of convergence of (T h)−1/2 that is slower than that of the trend function, which has a rate of (N T h)−1/2. To estimate the coefficient function more efficiently, we propose a pooled local linear dummy variable approach. This is motivated by a least squares dummy variable method proposed in parametric panel data analysis. This method removes the fixed effects by deducting a smoothed version of cross–time average from each individual. It estimates the trend function and the coefficient function with a rate of convergence of (N T h)−1/2_{. The asymptotic distributions of both of the estimates are established when} _T

tends to infinity and N is fixed or both T and N tend to infinity. Simulation results are provided to illustrate the finite sample behavior of the proposed estimation methods.

JEL Classifications: C13, C14, C23.

Abbreviated Title: Time–Varying Coefficient Models

Keywords: Fixed effects, local linear estimation, nonstationarity, panel data, specification testing, time–varying coefficient function.

(3)

1. Introduction

Panel data analysis has received a lot of attention during the last two decades due to applications in many disciplines, such as economics, finance and biology. The double–index panel data models enable researchers to estimate complex models and extract information which may be difficult to obtain by applying purely cross section or time series models. There exists a rich literature on parametric linear and nonlinear panel data models, see the books by Baltagi (1995), Arellano (2003) and Hsiao (2003). However, it is known that the parametric panel data models may be misspecified, and estimators obtained from misspecified models are often inconsistent. To address such issues, some nonparametric methods have been used in both panel data model estimation and specification testing, see Ullah and Roy (1998), Lin and Ying (2001), Hjellvik, Chen and Tjøstheim (2004), Cai and Li (2008), Henderson, Carroll and Li (2008), and Mammen, Støve and Tjøstheim (2009).

Meanwhile, the trending econometric modeling of nonstationary processes has also gained a great deal of attention in recent years. For example, it is generally believed that the increase in carbon dioxide emissions through the 20th century has caused global warming problem and it is important to model the trend of the global temperature. Some existing literature, such as Gao and Hawthorne (2006), revealed that the parametric linear trend does not approximate well the behavior of global temperature data. Hence, nonpara-metric modelling of the trending phenomenon has since attracted interest. One of the key features of the nonparametric trending model is that it allows for the data to “speak for themselves” with regard to choosing the form of the trend. For the recent development in nonparametric and semiparametric trending modelling of time series or panel data, see Gao and Hawthorne (2006), Cai (2007), Robinson (2008), Atak, Linton and Xiao (2009) and the references therein.

While there is a rich literature on parametric and nonparametric time–varying coefficient time series models (Robinson 1989; Phillips 2001; Cai 2007), as far as we know, few work has been done in the panel data case. The recent work by Robinson (2008) may be among the first to introduce a trending time–varying model for the panel data case where cross– section dependence is incorporated. In both theory and applications, various explanatory variables are of significant interest when modeling the trend of a panel data. Thus, it may be more informative and useful to add such explanatory variables into a time–varying panel data model when modeling the trend of a panel data.

(4)

data model of the form Yit = ft+ d P j=1 βt,jXit,j+αi+eit = ft+Xit>βt+αi+eit, i= 1,· · ·, N, t= 1,· · ·, T, (1.1)

whereXit = (Xit,1,· · ·, Xit,d)>, βt= (βt,1,· · ·, βt,d)>, allβt and ft are unknown functions,

{αi}reflects unobserved individual effect, and{eit}is stationary and weakly dependent for each i and independent of {Xit} and {αi}, T is the time series length and N is the cross section size.

Model (1.1) is called a fixed effects model as{αi}is allowed to be correlated with{Xit} with an unknown correlation structure. Model (1.1) is called a random–effects model if

{αi}is uncorrelated with {Xit}. For the purpose of identification, we assume that

N

X

i=1

αi = 0, (1.2)

throughout the paper.

Model (1.1) includes many interesting nonparametric panel data models. For example, when{βt}does not vary over time and reduces to a vector of constants, model (1.1) becomes a semiparametric trending panel data model with fixed effects. When βt ≡0d (0d is a d– dimensional null vector), model (1.1) reduces to a nonparametric trending panel data model as discussed in Robinson (2008), which allows for cross–sectional dependence for{eit}.

The aim of this paper is to construct consistent estimates for the time trendftand time– varying coefficient vectorβtbefore we establish asymptotic properties for the estimates. As in Robinson (1989, 2008) and Cai (2007), we suppose that the time trend functionft and the coefficient vectorβt satisfy

ft=f _t T and βt,j =βj _t T , t= 1,· · ·, T, (1.3)

wheref(·) andβj(·) are unknown smooth functions.

In this paper, we consider two classes of local linear estimates. As PN

i=1

αi = 0, the first method eliminates the fixed effects by taking cross–sectional averages, and we call it the averaged local linear method. We establish asymptotic distributions for the resulting estimates off(·) andβ(·) under mild conditions. The asymptotic results reveals that as both

T and N tend to infinity, the rate of convergence for the estimate of the coefficient function

(5)

f(·) is OP((N T h)−1/2). To improve the rate of convergence for the coefficient function, a local linear dummy variable approach is proposed. This is motivated by a least squares dummy variable method proposed in parametric panel data analysis (see Chapter 3 of Hsiao 2003 for example). This method removes the fixed effects by deducting a smoothed version of cross–time average from each individual. As a consequence, it is shown that the rate of convergence ofOP((N T h)−1/2) for both of the estimates is achievable. The simulation study in Section 3 confirms that the local linear dummy variable estimate ofβ(·) outperforms the averaged local linear estimate.

The rest of this paper is organized as follows. The two classes of local linear estimates as well as their asymptotic distributions are given in Section 2. The simulated example is provided in Section 3. Section 4 summarizes some conclusions and discusses future research. All the mathematical proofs of the asymptotic results are relegated to Appendix A.

2. Nonparametric estimation method and asymptotic theory

In this section, we introduce two classes of local linear estimates and establish the asymp-totic distributions of the proposed estimates. In Section 2.1, we consider the averaged local linear estimation method. Section 2.2 discusses the local linear dummy variable approach.

2.1. Averaged local linear estimation

To introduce the estimation method, we introduce some notation. Define

Y·t= 1 N N X i=1 Yit, X·t= 1 N N X i=1 Xit and e·t= 1 N N X i=1 eit.

By taking averages over iand using N

P

i=1

αi = 0, we have

Y·t=ft+X·>tβt+e·t, t= 1,· · ·, T, (2.1) where the individual effects αi’s are eliminated.

The averaged model (2.1) can be treated as a nonparametric time–varying coefficient time series model, see Robinson (1989) and Cai (2007) for example. The formulation in model (2.1) makes it possible to directly estimate f(·) and βj(·) nonparametrically.

Letting Y = (Y·1,· · ·, Y·T)>, f = (f1,· · ·, fT)>, B(X, β) =

X_·>₁β1,· · ·, X·>TβT

>

and

e= (e·1,· · ·, e·T)>, model (2.1) can then be rewritten as the following vector form:

(6)

We then adopt the conventional local linear approach (see, for example, Fan and Gijbels 1996) to estimate

β∗(·) = (f(·), β1(·),· · ·, βd(·))>. For given 0< τ <1, define

D(τ) =        1 X_·>₁ 1−_{T h}τ T 1−_{T h}τ TX_·>₁ .. . ... ... ... 1 X_·>_T T_{T h}−τ T T−_{T h}τ TX_·>_T        and W(τ) = diag K ₁₋_{τ T} T h ,· · ·, K _T₋_{τ T} T h .

Assuming that β∗(·) has continuous derivatives of up to the second order, by Taylor

expansion we have

β∗(t) =β∗(τ) +β∗0(τ)(t−τ) +O

(t−τ)2, (2.3) where 0< τ <1 andβ∗0(·) is the derivative ofβ∗(·). Based on the local linear approximation

in (2.3), β_∗>(τ), β_∗0>(τ)> can be estimated by solving the optimization problem: arg min

a∈Rd+1_,b_∈_Rd+1

Y −D(τ)(a>, b>)>>W(τ)Y −D(τ)(a>, b>)>. (2.4) Following the standard argument, the local linear estimator of β∗(τ) is

b

β∗(τ) = [Id+1, 0d+1]

h

D>(τ)W(τ)D(τ)i−1D>(τ)W(τ)Y, (2.5) whereId+1 is a (d+ 1)×(d+ 1) identity matrix and0d+1 is a (d+ 1)×(d+ 1) null matrix.

To establish asymptotic results, we need to introduce the following regularity conditions. Here and in the sequel, defineµj =R ujK(u)du andνj =RujK2(u)du.

A1 The probability kernel function K(·) is symmetric and Lipschitz continuous with a compact support [−1, 1].

A2 (i) {(Xi, ei), i ≥ 1} is a sequence of independent and identically distributed (i.i.d.) variables, whereXi = (Xit, t≥1) and ei = (eit, t≥1). Furthermore, for each i≥1,

{(Xit, eit), t ≥ 1} is stationary and α–mixing with mixing coefficient αk satisfying

αk=O(k−τ), where τ >(δ+ 2)/δ for someδ >0 involved in (ii) below. (ii)E(Xit) = 0d and there exists a positive definite matrix ΣX :=E

XitXit> . Fur-thermore,EkXitk2(2+δ)

(7)

(iii) The error process{eit}is independent of{Xit}withE[eit] = 0 andσ2e =E e2 it < ∞. Furthermore, Eh|eit|2+δ i < ∞ and σ2_eΣX + 2 ∞ P t=1 cX(t)ce(t) is positive definite, wherecX(t) =E XisXi,s>+t and ce(t) =E(eisei,s+t).

A3 The trend function f(·) and the coefficient function β(·) have continuous derivatives up to the second order.

A4The bandwidth hsatisfies that h→0 andT h→ ∞ asT → ∞.

Remark 2.1. The conditions on the kernel function K(·) in A1 are imposed for brevity of our proofs and they can be weakened. For example, the compact support assumption can be removed if we impose certain restriction on the tail of the kernel function. In A2 (i), we assume that {(Xi, ei), i ≥ 1} is cross–sectional independent and each time series component is α–mixing. Such assumptions are reasonable and verifiable and cover many linear and nonlinear time series models (see, for example, Fan and Yao 2003; Gao 2007; Li and Racine 2007). Condition A2(iii) imposes the homoscedasticity assumption on {eit}. Since model (1.1) allows for the inclusion of the fixed effects term{αi}and therefore model (1.1) allows for endogeneity, the homoscedasticity assumption is less restrictive than in the time series case. Having mentioned this, the independence between {eit} and {Xit} may be weakened through allowing eit = σ(Xit, t)it, where σ(x, t) is a positive function and Lipschitz–continuous int, and{it}satisfies A2(iii) withE[it] = 0 andE[2it] = 1. Both A3 and A4 are mild common conditions on the smoothness of the functions and the bandwidth involved in the local linear fitting. For the brevity of the proofs in Appendix A, we use Assumptions A.1–A.4 throughout this paper.

Define ∆∗_X =    1 0>_d 0d ΣX   , ΛX =ν0     σ2 e + 2 ∞ P t=1 ce(t) 0 > d 0d σe2ΣX + 2 ∞ P t=1 cX(t)ce(t)     .

We state an asymptotic distribution for βb_∗(τ) in the following theorem.

Theorem 2.1.Consider models (1.1) and (1.2). Suppose that A1–A4 are satisfied. Then, as T → ∞, DN T b β∗(τ)−β∗(τ)−b(τ)h2+oP(h2) _d −→N0d+1,∆X∗−1ΛX∆∗−X 1 (2.6)

(8)

for given 0 < τ < 1, where DN T = diag √ N T h,√T hId , b(τ) = 1₂µ2β∗00(τ) and β∗00(·) = (f00(·), β₁00(·),· · ·, β_d00(·))>. In particular, √ N T hfb(τ)−f(τ)−b_f(τ)h2+o_P(h2) _d −→N 0, ν0 σ2e+ 2 ∞ X t=1 ce(t) !! (2.7) and √ T hβb(τ)−β(τ)−b_β(τ)h2+o_P(h2) _d −→N 0d,Σ_X−1ν0 σe2ΣX + 2 ∞ X t=1 cX(t)ce(t) ! Σ−_X1 ! , (2.8) where bf(τ) = 1₂µ2f00(τ), bβ(τ) = 1₂µ2β00(τ) and β00(·) = (β100(·),· · ·, β00d(·)) > .

Remark 2.2. The above theorem complements some existing results (see Robinson 1989, 2008; Cai 2007 for example). Theorem 2.1 considers only the interior point in the interval (0,1) and the case of the boundary points can be dealt with analogously (see Theorem 4 in Cai 2007 for example).

Remark 2.3. The asymptotic distribution for the case of E(Xit)6=0d can be obtained as a corollary of Theorem 2.1. Note that (1.1) can be rewritten as

Yit=ft+Xit>βt+αi+eit=ft∗+Xe_it>β_t+α_i+e_it,

wheref_t∗ =ft+E(Xit>)βt and Xe_it =X_it−E(X_it). By Theorem 2.1, we can establish the

asymptotic distribution for the averaged local linear estimator off∗(τ), β>(τ). Then by the Cram´er–Wold device, we can further obtain an asymptotic distribution for the averaged local linear estimate off(τ), β>(τ) by noting that

f∗(τ), β>(τ)>=    1 E(X_it>) 0d Id    f(τ), β>(τ)>.

Remark 2.4. The asymptotic distribution (2.6) is established by letting T → ∞. This implies that two cases are included: (i) T tends to infinity andN is fixed and (ii) both T

and N tend to infinity. It can be seen from (2.7) and (2.8) that the rate of convergence of βb(τ) is slower than that of fb(τ), which is confirmed by the simulation study in Section

3. To get an asymptotically more efficient estimate ofβ(·), we will introduce a local linear dummy variable approach in Section 2.2 below.

(9)

As shown in Theorem 2.1, the rate of convergence of the averaged local linear estimate of β(·) is (T h)−1/2. To get an estimate that has faster rate of convergence, we propose a local linear dummy variable approach. Recently, Su and Ullah (2006) considered a profile likelihood dummy variable approach in partially linear models with fixed effects, and Sun, Carroll and Li (2009) discussed a local linear dummy variable method in varying coefficient models with fixed effects. We now propose using a nonparametric dummy variable technique for the trending time–varying coefficient panel data model (1.1).

Note that (1.1) can be rewritten as

e Y =fe+Be(X, β) +De₀α₀+_ee, (2.9) where e Y =Y₁>,· · ·, Y_N>>, e_e=e>₁,· · ·, e>_N>, Yi= (Yi1,· · ·, YiT)>, ei = (ei1,· · ·, eiT)>, e f =IN ⊗(f1,· · ·, fT)> =IN ⊗f, e B(X, β) =X₁₁>β1,· · ·, X1>TβT, X21>β1,· · ·, XN T> βT > , α0= (α1,· · ·, αN)>, De₀=I_N ⊗I_T,

in which Ik is ak–dimensional vector of ones and f is as defined in Section 2.1. As

N

P

i=1

αi = 0, (2.9) can be further rewritten as

e Y =fe+Be(X, β) +Dαe +e,_e (2.10) whereα= (α2,· · ·, αN)> and De = −IN−1, IN−1 > ⊗IT.

The two–step algorithm for the local linear dummy variable method is described as follows.

Step i: For givenβ and 0< τ <1, we first solve the optimization problem:

min α e Y −fe−Be(X, β)−Dαe > K(τ)Ye −fe−Be(X, β)−Dαe , (2.11) whereK(τ) =IN⊗W(τ).

Taking derivative of (2.11) with respect to α and setting the result to zero, we obtain

b α :=αb(τ) = e D>K(τ)De −1 e D>K(τ)Ye −fe−Be(X, β) .

(10)

Step ii: Letting α in (2.11) replaced by α_b and based on the Taylor expansion (2.3), we obtain the local linear estimators ofβ∗(τ) andβ∗0(τ) through minimizing the weighted least

squares: min a,b e Y −De(τ)(a>, b>)> > W∗(τ)Ye −De(τ)(a>, b>)> > , (2.12) whereDe>(τ) = D>₁(τ),· · ·, D_N>(τ) with Di(τ) =        1 X_i>₁ 1−_{T h}τ T 1−_{T h}τ TX_i>₁ .. . ... ... ... 1 X_iT> T−_{T h}τ T T−_{T h}τ TX_iT>        , W∗₍_τ_{) =}_W>₍_τ₎_K₍_τ₎_W₍_τ_{) and}_W₍_τ_{) =}_I N T−De e D>K(τ)De −1 e

D>K(τ). Observe that for any τ,W(τ)Dαe = 0. Hence, the fixed effects termDαe is eliminated in (2.12).

By simple calculation, we obtain the solution to the minimization problem (2.12) as

e β∗(τ) = [Id+1, 0d+1] h e D>(τ)W∗(τ)De(τ) i−1 e D>(τ)W∗(τ)Y .e (2.13) e

β∗(τ) is the local linear dummy variable estimator ofβ∗(τ) and its asymptotic distribution

is given in the following theorem.

Theorem 2.2. Consider models (1.1) and (1.2). Suppose that A1–A4 are satisfied. For given0< τ <1, as T → ∞, √ N T hβe_∗(τ)−β_∗(τ)−b(τ)h2+o_P(h2) _d −→N0d+1,∆∗−X 1ΛX∆ ∗−1 X . (2.14) In particular, √ N T hfe(τ)−f(τ)−b_f(τ)h2+o_P(h2) _d −→N 0, ν0 σe2+ 2 ∞ X t=1 ce(t) !! (2.15) and √ N T hβe(τ)−β(τ)−b_β(τ)h2+o_P(h2) _d −→N 0d,Σ−X1ν0 σ 2 eΣX+ 2 ∞ X t=1 cX(t)ce(t) ! Σ−_X1 ! . (2.16)

Remark 2.5. When E[Xit]6=0d, define

e ∆∗_X =    1 EhX_it>i E[Xit] ΣX   ,

(11)

e ΛX = ν0     σ_e2+ 2 ∞ P t=1 ce(t) σ_e2+ 2 ∞ P t=1 ce(t) (EX_it>) σ2_e+ 2 ∞ P t=1 ce(t) (EXit) σe2ΣX+ 2 ∞ P t=1 cX(t)ce(t)     .

Then, the asymptotic distribution in (2.14) still holds if∆e∗_X andΛe_X are positive definite

and ∆∗_X and ΛX are replaced by∆e∗_X andΛe_X, respectively. On the other hand, if either ∆∗_X

or ΛX is non-positive definite, we need to use the transformation in Remark 2.3 to obtain the estimates of f(·) and β(·).

Remark 2.6. As in Theorem 2.1, both N being fixed or N tending to infinity are allowed in Theorem 2.2. Moreover, the local linear dummy variable estimates of bothf(·) andβ(·) have a rate of convergence of (N T h)−1/2_{. This implies that the local linear dummy variable}

estimate of β(·) is asymptotically more efficient than the averaged local linear estimate of β(·). In addition, as the averaged local linear method, the individual fixed effects are eliminated in the estimation procedure without taking the first difference, and the fixed effects do not affect the asymptotic distributions of the two estimates.

3. Simulations

In this section, we provide a simulated example to compare the small sample behavior of the proposed local linear estimation methods with that of the ordinary local linear approach ignoring the fixed effects in nonparametric random effects panel data models. Throughout this section, we use a uniform kernel function of the formK(u) = 1₂I[−1,1](x) and apply the

leave-one-out cross validation method to choose the bandwidth involved in each of the local linear estimates.

Example 3.1. Consider a trending time–varying coefficient model of the form

Yit=f(t/T) +β(t/T)Xit+αi+eit, i= 1,· · ·, N, t= 1,· · ·, T, (3.1)

wheref(u) =u2+u+ 1, β(u) = sin(πu),{Xit} is generated by the AR (1) process

Xit= 1

2Xi,t−1+xit, t≥1, Xi,0 = 0,

{xit, t≥1} is a sequence of independent and identically distributed N(0,1) random vari-ables,{xit, t≥1} is independent of {xjt, t≥1} fori6=j,

αi =θ0Xi·+ui, i= 1,· · ·, N −1, αN =− N−1

X

i=1

(12)

Xi· = _T1

T

P

t=1

Xit, θ0 = 0,1,2, {ui, i ≥ 1} is a sequence of independent and identically distributed N(0,1) random variables, {eit} is an AR(1) process generated by

eit = 1

2ei,t−1+ηit,

{ηit, t≥1} is a sequence of independent and identically distributed N(0,1) random vari-ables,{ηit, t≥1}is independent of {ηjt, t≥1} fori6=j, and {ηit, 1≤i≤N,1≤t≤T} is independent of {Xit, 1≤i≤N,1≤t≤T}.

Model (3.1) is similar to the simulated example in Sun, Carroll and Li (2009) and they studied the case of random design time–varying coefficient panel data models. It is easy to check that (3.1) becomes the random effects case whenθ0 = 0 in (3.2). Otherwise

(θ0= 1, 2), (3.1) is the fixed effects panel data model. We next compare three classes of local

linear estimation methods: the averaged local linear estimates (ALLE), local linear dummy variable estimates (LLDVE) and the ordinary local linear estimates (OLLE) ignoring the fixed effects. We compare the average mean squared errors (AMSE) of the three estimators. The AMSE of an estimator fboff is defined as

AMSE(fb) = 1 R R X r=1 1 T T X t=1 b f₍_r₎(t/T)−f(t/T)2 ! ,

where fb₍_r₎(·) denotes the estimate of f(·) in the rth replication and R is the number of

replications which is chosen as R = 500 in our simulation. Similarly, the AMSE of an estimator ofβ is AMSE(βb) = 1 R R X r=1 1 T T X t=1 b β(r)(t/T)−β(t/T) 2 ! .

The simulation results for estimates of f(·) and β(·) with different values ofN and T

(N, T = 5, 10, 20) and with both random effects (θ0 = 0) and fixed effects (θ0 = 1, 2)

are summarized in Tables 3.1(a)–(c) and Tables 3.2(a)–(c). Tables 3.1(a)–(c) contain the AMSEs and their standard deviations (in parentheses) of the three estimates off(·) for the three cases ofθ0 = 0, 1 and 2, and those ofβ(·) are listed in Tables 3.2(a)–(c).

From Tables 3.1–3.2, we can see the performances of all the three estimates of the trend function f(·) are satisfactory even when both N and T are small. And all of the three estimates of f(·) improve as eitherT or N increases. However, when we observe the simulation results for estimates ofβ(·), we observe that when keepT fixed, the performance of the average local linear estimate of the coefficient functionβ(·) does not improve as N

(13)

increases. But asT increases, its performance improves significantly. In contrast, the local linear dummy variable estimate and the ordinary local linear estimate perform better and better as eitherN orT increases. This confirms the asymptotic theory given in Section 2: the rate of convergence of the averaged local linear estimate of β(·), which is (T h)−1/2, is slower than that of the local linear dummy variable estimate, which is (N T h)−1/2. However, the averaged local linear estimate and the local linear dummy variable estimate off(·) have the same rate of convergence of (N T h)−1/2.

The simulation results also reveal that in the random effects case (θ0 = 0), the

perfor-mances of the local linear dummy variable estimate and the ordinary local linear estimator are comparable with the local linear dummy variable estimate performing slightly better than the ordinary local linear estimate. As θ0 increases, the performance of the ordinary

local linear estimate becomes worse, while the performance of the dummy variable local linear dummy variable estimate isn’t influenced by the increase of θ0 and remains quite

(14)

Table 3.1(a) AMSE for estimators of f(·) with random effects (θ0 = 0)

AMSE for estimators off(·)

N\T 5 10 20 5 ALLE 0.9976(4.1267) 0.2008(0.2370) 0.1282(0.1634) LLDVE 0.1898(0.1795) 0.1344(0.1207) 0.0914(0.0815) OLLE 0.2405(0.2551) 0.1512(0.1397) 0.0999(0.0855) 10 ALLE 0.6950(3.9111) 0.1117(0.2133) 0.0514(0.0515) LLDVE 0.0978(0.0993) 0.0670(0.0617) 0.0502(0.0388) OLLE 0.1061(0.1030) 0.0727(0.0678) 0.0530(0.0403) 20 ALLE 0.2426(0.6344) 0.0525(0.0665) 0.0318(0.0282) LLDVE 0.0525(0.0460) 0.0331(0.0285) 0.0273(0.0184) OLLE 0.0562(0.0513) 0.0350(0.0299) 0.0286(0.0203)

Table 3.1(b) AMSE for estimators of f(·) with fixed effects (θ0 = 1)

(15)

Table 3.1(c) AMSE for estimators of f(·) with fixed effects (θ0= 2)

Table 3.2(a) AMSE for estimators of β(·) with random effects (θ0 = 0)

AMSE for estimators ofβ(·)

(16)

Table 3.2(b) AMSE for estimators of β(·) with fixed effects (θ0 = 1)

Table 3.2(c) AMSE for estimators of β(·) with fixed effects (θ0 = 2)

(17)

4. Conclusions and Discussion

We have considered a nonparametric time–varying coefficient panel data model with fixed effects. Two classes of nonparametric estimates have been proposed and studied. The first one is based on an averaged local linear estimation method while the second one relies on a local linear dummy variable approach. Asymptotic distributions of the proposed estimates have been established with the second estimation method providing a faster rate of convergence than the first one. Some detailed simulation results have been provided to illustrate the asymptotic theory and support the finite–sample performance of the proposed estimates.

There are some limitations in this paper. The first one is the assumption on cross– sectional independence. One future topic is to extend the discussion of Robinson (2008) to model (1.1) under cross–sectional dependence. The second one is that there is no endogene-ity between{eit}and {Xit}. Another future topic is to accommodate such endogeneity in a general model.

5. Acknowledgments

The authors would like to thank Professor Peter Phillips for his constructive comments when the authors discussed the paper with him during his recent visit to Adelaide in Febru-ary 2010. Thanks also go to the Australian Research Council Discovery Grant Program under Grant Number: DP0879088 for its financial support.

Appendix: Proofs of the main results

In this section, we provide the proofs for the asymptotic results in Section 2. We first give a central limit theorem on weighted sums ofα–mixing processes, which can be derived from Theorem 2.2 in Peligrad and Utev (1997).

Lemma A.1.Suppose that {ank, k= 1,· · ·, n, n≥1} is a triangular array of real numbers

satisfying sup n n X k=1 a2_nk <∞ and max 1≤k≤n|ank| →0

and {Xk} is a stationary α–mixing sequence satisfying

E[Xk] = 0 and E

h

|Xk|2+δ

i

(18)

Furthermore, E _n P k=1 ankXk 2 → 1 and ∞ P k=1

k2/δαk <∞, where αk is the α–mixing

coeffi-cient. Then, we have

n

X

k=1

ankXk−→d N(0,1).

We now provide the detailed proof of Theorem 2.1.

Proof of Theorem 2.1. We only consider the case when T and N tend to infinity simul-taneously. The proof for the case whenT tends to infinity andN is fixed is similar and we therefore omit the details here.

Note that b β∗(τ) = [Id+1, 0d+1] h D>(τ)W(τ)D(τ)i−1D>(τ)W(τ)Y = [Id+1, 0d+1] h D>(τ)W(τ)D(τ)i−1D>(τ)W(τ) (B(X, β∗) +e) (A.1) and b β∗(τ)−β∗(τ) = [Id+1, 0d+1] h D>(τ)W(τ)D(τ)i−1D>(τ)W(τ)B(X, β∗)−β∗(τ) + [Id+1, 0d+1] h D>(τ)W(τ)D(τ)i−1D>(τ)W(τ)e =: IN T(1) +IN T(2), (A.2) whereB(X, β∗) = f1+X·>1β1,· · ·, fT +X·>TβT > .

Note that (2.7) and (2.8) can be derived from (2.6). Hence, we only provide the detailed proof of (2.6), which follows from the following three propositions.

Proposition A.1.Suppose that A1, A2 (i), (ii) and A4 are satisfied. Then, as T, N → ∞

simultaneously, P_{N T}> D>(τ)W(τ)D(τ)PN T −Λµ⊗∆∗X =oP(1), (A.3) wherePN T = diag 1 √ T h, q N T hId, 1 √ T h, q N T hId

,Λµ= diag (µ0, µ2) and∆∗X is as defined in

Section 2.

Proof. Observe that

P_{N T}> D>(τ)W(τ)D(τ)PN T =     T P t=1 e X·tXe_·>_tK t−τ T T h T P t=1 e X·tXe_·>_t t−τ T T h Kt−_{T h}τ T T P t=1 e X·tXe_·>_t t−τ T T h Kt−_{T h}τ T T P t=1 e X·tXe_·>_t t−τ T T h 2 Kt−_{T h}τ T     ,

(19)

whereXe_·_t=P_{N T}∗ 1, X·>t > and P_{N T}∗ = diag 1 √ T h, q N T hId . We need only to prove that

T X t=1 e X·tXe_·>_tK _t₋_{τ T} T h =µ0∆∗X +oP(1), (A.4)

since the proofs for the other components inP_{N T}> D>(τ)W(τ)D(τ)PN T are analogous. For simplicity, define Qt = Xe_·_tXe_·>_t, K_t = K

t−τ T T h and Xe_it = P_{N T}∗ (1, X_it>)>. By A2(i)(ii), we have T h N2E   N X i=1 e Xit ! _N X i=1 e Xit !> =    1 0>_d 0d ΣX   = ∆ ∗ X.

By A1 and the above equation, it is easy to check that as T, N → ∞,

T P t=1 E[QtKt] = _N12 T P t=1 Kt  E   N P i=1 e Xit ! N P i=1 e Xit !>    = _{T h}1 T P t=1 Kt  T h_N2E   N P i=1 e Xit ! N P i=1 e Xit !>    = µ0∆∗X 1 +O(_{T h}1 ). (A.5)

We then consider the variance of PT

t=1 QtKt. Note that Var T X t=1 QtKt ! = T X t=1 Var (QtKt) + T X t=1 X s6=t Cov (QtKt, QsKs) =: ΠN T(1) + ΠN T(2).

It follows from A2 (i)(ii) that

sup 1≤t≤T E Xe·t 2(2+γ) ≤C(T h)−(2+γ), (A.6)

for some 0≤γ ≤δ and some constant C >0, whereδ is as defined in A2. Furthermore, by A1 and (A.6) with γ= 0, we have

ΠN T(1) = T P t=1 K_t2Var(Qt) = T P t=1 K_t2Var(Xe_·_tXe_·>_t) ≤ _T2C_h2 T P t=1 K_t2 =O_{T h}1 . (A.7)

(20)

Meanwhile, for ΠN T(2), notice that ΠN T(2) = T P t=1 P 0<|s−t|≤qT Cov (QtKt, QsKs) + T P t=1 P |s−t|>qT Cov (QtKt, QsKs) = ΠN T(3) + ΠN T(4), (A.8) whereqT → ∞ andqT =o(T h).

As {Xi, i≥1} is cross–sectional independent and for each i, {Xit, t ≥1} is stationary andα–mixing, both{X·t}and{Qt}are still stationary andα–mixing with the same mixing coefficientαt. As a consequence, we are able to employ some existing results forα–mixing processes.

By A1, A2 (i)(ii), (A.6) and Corollary A.2 in Hall and Heyde (1980), we have

ΠN T(4) ≤ C T P t=1 Kt P |s−t|>qT Cov (Qt, Qs) ≤ C T2_h2 T P t=1 Kt P k>qT αδ/_k (2+δ)=o_{T h}1 , (A.9) asqT → ∞and αk=O(k−τ) for τ >(δ+ 2)/δ. Since K(·) is Lipschitz continuous by A1, we have

|Kt−Ks| ≤C

qT

T h when|t−s| ≤qT. (A.10)

Standard calculation gives

ΠN T(3) = T P t=1 K_t2 P 0<|s−t|≤qT Cov (Qt, Qs) − PT t=1 Kt P 0<|s−t|≤qT (Kt−Ks)Cov (Qt, Qs) =: ΠN T(5) + ΠN T(6). (A.11)

By (A.10) and the fact that

∞ P t=2 kCov(Q1, Qt)k=O (T h)−2, we have kΠN T(6)k ≤ CqT T3_h3 T X t=1 Kt=o ₁ T h . (A.12)

Furthermore, similarly to the proof of (A.7), we can show that

kΠN T(5)k ≤ C T2_h2 T X t=1 K_t2 =O ₁ T h . (A.13)

In view of (A.7)–(A.9) and (A.11)–(A.13),

Var T X t=1 QtKt ! =O ₁ T h . (A.14)

(21)

By (A.5) and (A.14), we have shown that (A.4) holds. The proof of Proposition A.1 is completed.

Proposition A.2.Suppose that A1, A2 (i)(ii), A3 and A4 are satisfied. Then, as T, N → ∞, IN T(1) = 1 2µ2β 00 ∗(τ)h2+oP(h2). (A.15)

Proof. By A3 and Taylor expansion, we have

β∗(t/T)≈β∗(τ) +β∗0(τ)(t/T−τ) +

1 2β

00

∗(τ)(t/T −τ)2.

Equation (A.15) then follows from the definition of IN T(1), the above equation and Proposition A.1.

Proposition A.3.Suppose that A1, A2 and A4 are satisfied. Then, as T, N → ∞,

DN TIN T(2) d

−→N0d+1,∆X∗−1ΛX∆∗−X 1

, (A.16)

where DN T andΛX are defined as in Theorem 2.1.

Proof. By A2 (i)(iii), it is easy to check that

E(IN T(2)) =0d+1. (A.17)

We then calculate the variance of DN TIN T(2). Noticing that

N2EX·tX·>te2·t = N 2 N4E   N X i=1 Xit N X j=1 X_jt> N X k=1 ekt N X l=1 elt   = 1 N2 N X i=1 N X k=1 E(XitXit>)E(e2kt) = ΣXσ2e and N2Cov (X·te·t, X·se·s) =E XitXis> E(eiteis) =cX(|t−s|)ce(|t−s|), we have as T, N → ∞, N2 T Var T X t=1 X·te·t ! → σ_e2ΣX + 2 ∞ X t=1 cX(t)ce(t) ! . (A.18)

By (A.18), the fact that _{T h}1 PT

t=1

K2

t =ν0+oP(1) and following the proof of (A.14), we have N2 T hVar T X t=1 KtX·te·t ! =ν0 σe2ΣX+ 2 ∞ X t=1 cX(t)ce(t) ! +o(1). (A.19)

(22)

Analogously, we have N T hVar T X t=1 Kte·t ! =ν0 σ2e+ 2 ∞ X t=1 ce(t) ! +o(1) (A.20) and N3/2 T h E T X t=1 K_t2X·te2·t ! =0d. (A.21)

Furthermore, by (A.19)–(A.21) and Proposition A.1 and noting that

DN TIN T(2) =DN T[Id+1, 0d+1]PN T h P_{N T}> D>(τ)W(τ)D(τ)PN T i−1 P_{N T}> D>(τ)W(τ)e, we have Var (DN TIN T(2)) = ∆∗−X 1ΛX∆∗−X 1+o(1). (A.22) By A2, (A.17), (A.22) and Lemma A.1, we have shown that (A.16) holds.

Proof of Theorem 2.2. Note that

e β∗(τ)−β∗(τ) = [Id+1, 0d+1] h e D>(τ)W∗₍_τ₎ e D(τ)i−1De>(τ)W∗(τ)Ye −β_∗(τ) = [Id+1, 0d+1] h e D>(τ)W∗₍_τ₎ e D(τ)i−1De>(τ)W∗(τ) e f +Be(X, β) −β∗(τ) + [Id+1, 0d+1] h e D>(τ)W∗₍_τ₎ e D(τ)i−1De>(τ)W∗(τ)Dαe + [Id+1, 0d+1] h e D>(τ)W∗₍_τ₎ e D(τ)i−1De>(τ)W∗(τ)_ee =: ΞN T(1) + ΞN T(2) + ΞN T(3). (A.23) By the definition ofW∗₍_τ_{), we have}

W∗(τ) =W>(τ)K(τ)W(τ) =K(τ)− K(τ)De e D>K(τ)De −1 e D>K(τ),

which implies that

ΞN T(2) = [Id+1, 0d+1] h e D>(τ)W∗(τ)De(τ) i−1 e D>(τ)K(τ)Dαe − [Id+1, 0d+1] h e D>(τ)W∗(τ)De(τ) i−1 e D>(τ)K(τ)De e D>K(τ)De −1 e D>K(τ)Dαe = [Id+1, 0d+1] h e D>(τ)W∗(τ)De(τ) i−1 e D>(τ)K(τ)Dαe − [Id+1, 0d+1] h e D>(τ)W∗(τ)De(τ) i−1 e D>(τ)K(τ)Dαe ≡ 0d+1.

(23)

Therefore, in order to establish the asymptotic distribution (2.14) in Theorem 2.2, we need only to consider ΞN T(1) and ΞN T(3). The following propositions establish the asymp-totic properties of ΞN T(1) and ΞN T(3), which lead to (2.14). And (2.14) implies (2.15) and (2.16). Theorem 2.2 holds for both N being fixed and N tending to infinity, but we only give the proof for the case of N tending to infinity simultaneously with T. The proof for fixedN is similar.

Proposition A.4.Suppose that A1, A2 (i), (ii) and A4 are satisfied. Then, as T, N → ∞

simultaneously, 1 N T hDe >₍_τ₎_W∗₍_τ₎ e D(τ)−Λµ⊗∆∗X =oP(1), (A.24)

where Λµ is defined as in Proposition A.1.

Proof. By the definition ofW∗(τ), we have

e D>(τ)W∗(τ)De(τ) =De>(τ)K(τ)De(τ)−De>(τ)K(τ)De e D>K(τ)De −1 e D>K(τ)De(τ). (A.25) Following the proof of Proposition A.1, we have

1

N T hDe

>

(τ)K(τ)De(τ) = Λ_µ⊗∆∗_X+o_P(1). (A.26)

In view of (A.25), we need only to prove

e D>(τ)K(τ)De e D>K(τ)De −1 e D>K(τ)De(τ) =o_P(N T h). (A.27)

To do so, we first consider the term De>K(τ)De. Define Z_T =

T

P

t=1

Kt−_{T h}T τ. By the definition ofDe and standard calculation, we have

e D>K(τ)De =        2ZT ZT · · · ZT .. . ... ... ZT ZT · · · 2ZT        =        ZT ZT · · · ZT .. . ... ... ZT ZT · · · ZT        + diag (ZT,· · ·, ZT). LettingA= diag (ZT,· · ·, ZT),B = √ ZT,· · ·, √ ZT > ,C = 1 andD= √ZT,· · ·, √ ZT , thenDe>K(τ)De =A+BCD. Noticing that

(24)

which can be found in Poirier 1995, we have e D>K(τ)De −1 =            1 ZT − 1 N ZT − 1 N ZT · · · − 1 N ZT −_{N Z}1 T 1 ZT − 1 N ZT · · · − 1 N ZT .. . ... ... −_{N Z}1 T − 1 N ZT · · · 1 ZT − 1 N ZT            . (A.29) Define ZT(i) = T P t=1 Kt−_{T h}T τXit and Ze_T(i) = T P t=1 t−T τ T h Kt−_{T h}T τXit. By standard arguments, we have e D>(τ)K(τ)De =            0 0 · · · 0 ZT(2)−ZT(1) ZT(3)−ZT(1) · · · ZT(N)−ZT(1) 0 0 · · · 0 e ZT(2)−Ze_T(1) Ze_T(3)−Ze_T(1) · · · Ze_T(N)−Ze_T(1)            .

which together with (A.29) implies that

e D>(τ)K(τ)De e D>K(τ)De −1 e D>K(τ)De(τ) =            0 0 0 0 0 AN T 0 CN T 0 0 0 0 0 DN T 0 BN T            , (A.30) where AN T = N X k=2 AN T(k)(ZT(k)−ZT(1))>, BN T = N X k=2 BN T(k)(Ze_T(k)−Ze_T(1))>, CN T = N X k=2 AN T(k)(Ze_T(k)−Ze_T(1))>, DN T = N X k=2 BN T(k)(ZT(k)−ZT(1))>, and AN T(k) =ZT(k) N −1 N ZT − 1 N ZT N X j=1,6=k ZT(j) = ZT(k) ZT − 1 N ZT N X i=1 ZT(i) BN T(k) =Ze_T(k) N −1 N ZT − 1 N ZT N X j=1,6=k e ZT(j) = e ZT(k) ZT − 1 N ZT N X i=1 e ZT(i).

(25)

By the consistency of the nonparametric estimator in the fixed design case, we have for each i≥1, 1 T hZT =µ0+oP(1), 1 T hZT(i) =oP(1), 1 T hZeT(i) =oP(1). (A.31)

By (A.30) and (A.31), we have shown that (A.27) holds. The proof of Proposition A.4 is completed.

Proposition A.5.Suppose that A1, A2 (i)(ii), A3 and A4 are satisfied. Then, as T, N → ∞ simultaneously, ΞN T(1) = 1 2µ2β 00 ∗(τ)h2+oP(h2) =b(τ)h2+oP(1). (A.32)

Proof. By A3, Taylor expansion ofβ∗(·) atτ and Proposition A.4, we can show that (A.32)

holds.

Proposition A.6.Suppose that A1, A2 and A4 are satisfied. Then, we have

√ N T hΞN T(3)−→d N 0d+1,∆∗−_X 1ΛX∆∗−_X 1 (A.33) as T, N → ∞ simultaneously.

Proof. By Propositions A.4 and A.5, to prove (A.33), it suffices to prove 1 √ N T hDe >₍_τ₎_W∗₍_τ₎ e e−→d N 02d+2,Λν⊗ΛX, (A.34) where Λν = diag (ν0, ν2).

By the definition ofW∗₍_τ_{), we have}

e D>(τ)W∗(τ)ee=De >₍_τ₎_K₍_τ₎ e e−De>(τ)K(τ)De e D>K(τ)De −1 e D>K(τ)ee. (A.35)

By (A.35), it is enough to show that 1 √ N T hDe >₍_τ₎_K₍_τ₎ e e−→d N 02d+2,Λν⊗ΛX. (A.36) and e D>(τ)K(τ)De e D>K(τ)De −1 e D>K(τ)ee=oP( √ N T h). (A.37) Define Lj _t₋_{T τ} T h = _t₋_{T τ} T h j K _t₋_{T τ} T h

(26)

forj= 0,1.

To prove (A.36), we need only to prove

1 √ N T h N X i=1 T X t=1 Lj _t₋_{T τ} T h    1 Xit   eit d −→N 0d+1, ν2jΛX . (A.38) By A2, we have 1 N T hVar    N X i=1 T X t=1 Lj _t₋_{T τ} T h    1 Xit   eit   =ν2jΛX +o(1). (A.39) Denote {ξj, j = 1,· · ·, N T} ={(X11, e11),· · ·,(X1T, e1T),(X21, e21),· · ·,(XN T, eN T)}, then{ξ} is stationary and mixng with α–mixing coefficient

b αk=      αk or 0, k < T, 0, k≥T.

By (A.39) and Lemma A.1, (A.38) holds.

We then turn to the proof of (A.37). Let eT(i) = T

P

t=1

Kt−_{T h}T τeit. Following the proof of (A.30), we have

e

D>K(τ)_ee= (eT(2)−eT(1),· · ·, eT(N)−eT(1))>. (A.40) By (A.29), (A.30) and (A.40), we have

e D>(τ)K(τ)De e D>K(τ)De −1 e D>K(τ)ee= 0, U_{N T}> ,0, V_{N T}> >, (A.41) where UN T = N X k=2 AN T(k)(eT(k)−eT(1)) = N X k=1 AN T(k)eT(k) and VN T = N X k=2 BN T(k)(eT(k)−eT(1)) = N X k=1 BN T(k)eT(k),

in which AN T(k) andBN T(k) are defined in the proof of Proposition A.4. By A2 (i) and (iii), we can show

EU_{N T}2 =o(N T h) and EV_{N T}2 =o(N T h), which implies UN T =oP √ N T h and VN T =oP √ N T h. (A.42)

(27)

In view of (A.41) and (A.42), (A.37) holds. The proof of Proposition A.6 is completed.

References

Arellano, M. 2003. Panel Data Econometrics. Oxford University Press, Oxford.

Atak, A., Linton, O., Xiao, Z. 2009. A semiparametric panel model for climate change in the United Kingdom. Manuscript.

Baltagi, B. H. 1995. Econometrics Analysis of Panel Data. John Wiley, New York.

Cai, Z. 2007. Trending time–varying coefficient time series models with serially correlated errors. Journal of Econometrics136, 163-188.

Cai, Z., Li, Q. 2008. Nonparametric estimation of varying coefficient dynamic panel data models. Econo-metric Theory24, 1321-1342.

Fan, J., Gijbels, I. 1996. Local Polynomial Modelling and Its Applications. Chapman and Hall, London. Fan, J., Yao, Q. 2003. Nonlinear Time Series: Nonparametric and Parametric Methods. Springer, New

York.

Gao, J. 2007. Nonlinear Time Series: Semiparametric and Nonparametric Methods. Chapman & Hall/CRC, London.

Gao, J., Hawthorne, K. 2006. Semiparametric estimation and testing of the trend of temperature series.

Econometrics Journal9, 332-355.

Hall, P., Heyde, C. 1980. Martingale Limit Theory and Its Applications. Academic Press, New York. Henderson, D., Carroll, R., Li, Q. 2008. Nonparametric estimation and testing of fixed effects panel data

models. Journal of Econometrics144, 257-275.

Hjellvik, V., Chen, R., Tjøstheim, D. 2004. Nonparametric estimation and testing in panels of intercorre-lated time series. Journal of Time Series Analysis25, 831-872.

Hsiao, C. 2003. Analysis of Panel Data. Cambridge University Press, Cambridge.

Li, Q., Racine, J. 2007. Nonparametric Econometrics: Theory and Practice. Princeton University Press, Princeton.

Lin, D., Ying, Z. 2001. Semiparametric and nonparametric regression analysis of longitudinal data (with discussion). Journal of the American Statistical Association96, 103-126.

Mammen, E., Støve, B., Tjøstheim, D. 2009. Nonparametric additive models for panels of time series.

Econometric Theory25, 442-481.

(28)

Phillips, P. C. B. 2001. Trending time series and macroeconomic activity: some present and future chal-lengers. Journal of Econometrics100, 21-27.

Poirier, D. J. 1995. Intermediate Statistics and Econometrics: a Comparative Approach. The MIT Press. Robinson, P. 1989. Nonparametric estimation of time–varying parameters. Statistical Analysis and

Fore-casting of Economic Structural Change. Hackl, P. (Ed.). Springer, Berlin, pp. 164-253. Robinson, P. 2008. Nonparametric trending regression with cross-sectional dependence. Manuscript.

Su, L., Ullah, A. 2006. Profile likelihood estimation of partially linear panel data models with fixed effects.

Economics Letters92, 75-81.

Sun, Y., Carroll, R. J., Li, D. 2009. Semiparametric estimation of fixed effects panel data varying coefficient models. Advances in Econometrics25, 101-129.

Ullah, A., Roy, N. 1998. Nonparametric and semiparametric econometrics of panel data. Handbook of Applied Economics and Statistics, Ullah, A., Giles, D.E.A. (Eds.). Marcel Dekker, New York, pp. 579-604.