Lecture 15 Panel Data Models

(1)


(2)

• A panel data set, or longitudinal data set, is one where there are repeated observations on the same units.

• The units may be individuals, households, enterprises, countries, or any set of entities that remain stable through time.

• The National Longitudinal Survey (NLS) of Youth is an example.

The same respondents were interviewed every year from 1979 to 1994. Since 1994 they have been interviewed every two years.

• Panel data sets are often very large. If there are N units and T time periods => Number of observations: NT. Great for estimation!

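Panel Data Sets

$$
\begin{bmatrix}
y_{11} & y_{21} & \cdots & y_{i1} & \cdots & y_{N1} \\
y_{12} & y_{22} & \cdots & y_{i2} & \cdots & y_{N2} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
y_{1t} & y_{2t} & \cdots & y_{it} & \cdots & y_{Nt} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
y_{1T} & y_{2T} & \cdots & y_{iT} & \cdots & y_{NT}
\end{bmatrix}
$$

Each column is the time series of one unit; each row is the cross section at one date.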

(3)

• A standard panel data set model stacks the y_i’s and the x_i’s:

y = Xβ + c + ε

X is a Σ_iT_i x k matrix

β is a k x 1 matrix

c is a Σ_iT_i x 1 matrix, associated with unobservable variables.

y and ε are Σ_iT_i x 1 matrices

Panel Data Sets

• Notation: for each unit i, stack the T observations of the dependent variable and of the k regressors:

$$
y_i = \begin{bmatrix} y_{i1} \\ y_{i2} \\ \vdots \\ y_{it} \\ \vdots \\ y_{iT} \end{bmatrix},
\qquad
X_i = \begin{bmatrix}
x_{1i1} & x_{2i1} & \cdots & x_{ki1} \\
x_{1i2} & x_{2i2} & \cdots & x_{ki2} \\
\vdots  & \vdots  &        & \vdots  \\
x_{1it} & x_{2it} & \cdots & x_{kit} \\
\vdots  & \vdots  &        & \vdots  \\
x_{1iT} & x_{2iT} & \cdots & x_{kiT}
\end{bmatrix}
$$

(4)

• Financial data

– COMPUSTAT provides financial data by firm (N = 99,000) and by quarter (T = 1962:I, 1962:II, ...).

– CRSP: daily and monthly stock and index returns from 1962 on.

– Datastream provides economic and financial data for countries. It also covers bonds and stock markets around the world.

– Intra-daily data: Olsen (exchange rates) and TAQ (stock market transaction prices). Essentially infinite T, large N.

– OptionMetrics, also known as “Ivy DB”, is a database of historical prices and implied volatilities for listed stock and option markets.

– International Financial Statistics (IFS from the IMF) covers economic and financial data for almost all countries post-WWII.

(5)

Balanced and Unbalanced Panels

• Notation:

y_i,t, i = 1,…,N; t = 1,…,T_i

• Mathematical and notational convenience:

- Balanced: NT observations (that is, every unit is surveyed in every time period).

- Unbalanced: Σ_{i=1}^N T_i observations.

Q: Is the fixed T_i assumption ever necessary? SUR models.

• The NLS of Youth is unbalanced because some individuals have not been interviewed in some years. Some could not be located, some refused, and a few have died. CRSP is also unbalanced: some firms are listed from 1962, others started to be listed later.



(6)

• With panel data we can study different issues:

- Cross sectional variation (unobservable in time series data) vs. time series variation (unobservable in cross sectional data)

- Heterogeneity (observable and unobservable individual heterogeneity)

- Hierarchical structures

- Dynamics in economic behavior

- Group effects (individual effects)

- Time effects

(7)

Panel Data Models: Simplest Model - SUR

• In Zellner’s SUR formulation we have:

(A1) y_it = x_it’β_i + ε_it

(A2) E[ε_i|X] = 0

(A3’) Var[ε_i|X] = σ_i² I_T --groupwise heteroscedasticity.
E[ε_it ε_jt|X] = σ_ij --contemporaneous correlation
E[ε_it ε_jt’|X] = 0 when t ≠ t’

(A4) Rank(X) = full rank

• Under (A1)-(A4), we have a generalized regression (GR) model with heteroscedasticity. OLS in each equation is OK, but not efficient. GLS is efficient.

• We are not taking advantage of pooling –i.e., using NT observations!

• Use LR or F tests to check if pooling (aggregation) can be done.

(8)

Panel Data Models: Simple Model - Pooling

• Assumptions

(A1) y_it = x_it’β + z_i’γ + ε_it - the DGP

i = 1, 2, ...., N - we think of i as individuals or groups.

t = 1, 2, ...., T_i - usually, N >> T.

(A2) E[ε_i|X,z] = 0 - X and z: exogenous

(A3) Var[ε_i|X,z] = σ² I - Heteroscedasticity can be allowed.

(A4) Rank(X) = full rank

• We think of X as a vector of observed characteristics. For example, firm size, Market-to-book, Z-score, R&D expenditures, etc.

• We think of z as a vector of unobserved characteristics (individual effects). For example, quality of management, growth opportunities, etc.

(9)

• The DGP (A1):

$$
y_{it} = \beta_1 + \sum_{j=2}^{k} \beta_j X_{jit} + \sum_{p=1}^{s} \gamma_p Z_{pi} + \delta t + \varepsilon_{it}
$$

- The index i refers to the unit of observation, t refers to the time period, and j and p are used to differentiate between different observed and unobserved explanatory variables.

- A time trend t allows for a shift of the intercept over time, capturing time effects –technological change, regulations, taxes, etc.

- If the implicit assumption of a constant rate of change is too strong, the trend can be replaced by a set of dummy variables, one for each time period except the reference period.

- The X_j variables are usually the variables of interest, while the Z_p variables are responsible for unobserved heterogeneity and as such constitute a nuisance component of the model.

(10)

• Since the Z_p variables are unobserved, there is no means of obtaining information about the Σ_p γ_p Z_pi component of the model. Usually, it is convenient to define a term c_i, known as the unobserved effect, representing the joint impact of the Z_p variables on y_i:

$$
c_i = \sum_{p=1}^{s} \gamma_p Z_{pi}
$$

• We can rewrite the regression model as:

$$
y_{it} = \beta_1 + \sum_{j=2}^{k} \beta_j X_{jit} + c_i + \delta t + \varepsilon_{it}
$$

The characterization of the c_i component is crucial. Different ways of thinking about the Z_p create different panel data models.

(11)

$$
y_{it} = \beta_1 + \sum_{j=2}^{k} \beta_j X_{jit} + c_i + \delta t + \varepsilon_{it}
$$

• Note that if the X_j’s are so comprehensive that they capture all the relevant characteristics of the individual, there will be no relevant unobserved characteristics.

=> Now, c_i can be dropped. Pooled OLS may be used, treating all the observations for all of the time periods as a single sample.

• In general, dropping c_i leads to a missing variables problem: bias!

• We usually think of c_i as contemporaneously exogenous to the conditional error. That is, E[ε_it|c_i] = 0, t = 1,..., T

(12)

Panel Data Models: Basic Model

• A stronger assumption, strict exogeneity, can also be imposed to solve the estimation problem. In this case,

E[ε_it|x_i1, x_i2,...,x_iT, c_i] = 0, t = 1,..., T

Then,

$$
E[y_{it} \mid x_{it}, c_i] = \beta_1 + \sum_{j=2}^{k} \beta_j X_{jit} + c_i + \delta t
$$

=> The β_j’s are partial effects holding c_i constant.

• Violations of strict exogeneity are not rare. They occur if x_it contains lagged dependent variables or if changes in ε_it affect x_it+1 (a “feedback” effect).

• But to estimate β we still need to say something about the relation between x_it and c_i. Different assumptions will give rise to different models.

(13)

• The basic DGP: y_it = x_it’β + z_i’γ + ε_it (*) & (A2)-(A4) apply.

• Cases:

(1) Pooled (Constant Effect) Model

z_i’γ is a constant: z_i’γ = α. In this case, y_it = x_it’β + α + ε_it

=> OLS estimates α and β consistently. We estimate k+1 parameters.

(2) Fixed Effects Model (FEM)

The z_i’s are correlated with X_i: E[z_i|X_i] = g(X_i). The effects are correlated with the included variables –i.e., pooled OLS that omits them will be inconsistent. Assume z_i’γ = α_i (constant; it does not vary with t). Then,

y_it = x_it’β + α_i + ε_it

=> We have a lot of parameters: k+N. We have N individual effects! OLS can be used to estimate the α_i’s and β consistently.

(14)

(3) Random Effects Model (REM)

The z_i’s are uncorrelated with the X_i. That is: E[z_i|X_i] = μ. If X_i contains a constant term, μ = 0 WLOG. Add and subtract E[z_i’γ] from (*):

y_it = x_it’β + E[z_i’γ] + (z_i’γ – E[z_i’γ]) + ε_it
= x_it’β + μ + u_i + ε_it

We have a compound (“composed”) error –i.e., u_i + ε_it. Because u_i is common to all the observations of individual i, the composed errors are correlated within group i.

=> OLS estimates μ and β consistently, but GLS will be efficient.

(4) Random Parameters Model

y_it = x_it’(β + h_i) + α_i + ε_it

h_i is a random vector that induces parameter variation. Let h_i ~ D(0, σ²_hi); then, we introduce heteroscedasticity.

(15)

Compact Notation

• Compact Notation: y_i = X_i β + c_i + ε_i

X_i is a T_i x k matrix

β is a k x 1 matrix

c_i is a T_i x 1 matrix

y_i and ε_i are T_i x 1 matrices

• Recall we stack the y_i’s and X_i’s: y = Xβ + c + ε

X is a Σ_iT_i x k matrix

β is a k x 1 matrix

c, y and ε are Σ_iT_i x 1 matrices

Or y = X*β* + ε, with X* = [X ι] - Σ_iT_i x (k+1) matrix.

(16)

Assumptions for Asymptotics (Greene)

• Convergence of moments involving cross section X_i. Usually, we assume N increasing, T or T_i assumed fixed.

– “Fixed T asymptotics” (see Greene, p. 196)

– Time series characteristics are not relevant (may be nonstationary)

– If T is also growing, need to treat as multivariate time series.

• Ranks of matrices. X must have full column rank. (X_i may not, if T_i < K.)

• Strict exogeneity and dynamics. If x_it contains y_i,t-1 then x_it cannot be strictly exogenous: x_it will be correlated with the unobservables in period t-1. Inconsistent OLS estimates! (To be revisited later.)

(17)

• We can relax assumption (A3). The new DGP model:

y = X*β* + ε, with X* = [X ι] - Σ_iT_i x (k+1) matrix.

β* = [β c]’ - (k+1) x 1 matrix

Now, we allow E[εε’|X] = Σ ≠ σ² I_{Σ_iT_i}

• Potentially, a lot of different elements in E[εε’|X] in a panel:

- Individual heteroscedasticity. Usual groupwise heteroscedasticity.

- Autocorrelation (Individual/group/firm) effects. Errors have arbitrary correlation across time for a particular individual i.

- Temporal correlation (Time) effects. Errors have arbitrary correlation across individuals at a moment in time (SUR-type correlation).

- Persistent common shocks: Errors have some correlation between different firms in different time periods (but, these shocks are assumed to die out over time, and may be ignored after L periods).

(18)

• To understand the different elements in Σ, consider the following DGP for the errors, ε_it’s:

ε_it = θ_i’f_t + η_it, f_t ~ D(0, σ_f²)

and η_it = ϕ η_it-1 + ς_it, ς_it ~ D(0, σ_ςi²)

f_t: vector of random factors common to all individuals/groups/firms.

θ_i: vector of factor loadings, specific to individual i.

ς_it: random shocks to individual i, uncorrelated across both i and t.

η_it: random shocks to i. This term generates autocorrelation effects in i.

• θ_i’f_t generates both contemporaneous (SUR) and time-varying cross-correlations between i and j. (Autocorrelations die out after L periods.)

- If f_t is uncorrelated across t => only contemporaneous (SUR) effects.

- If f_t is persistent in t => both SUR and persistent common effects.

(19)

• Different forms for E[εε’|X]:

- Individual heteroscedasticity: E[ε_i²|X] = σ_i²
=> standard groupwise heteroscedasticity.

- Autocorrelation (Individual) effects: E[ε_it ε_is|X] ≠ 0 (t≠s)
=> auto-/time-correlation for errors, ε_it.

- Temporal correlation effects: E[ε_it ε_jt|X] ≠ 0 (i≠j)
=> contemporaneous cross-correlation for errors.

- Persistent common shocks: E[ε_it ε_js|X] ≠ 0 (i≠j) and |t − s| < L
=> time-varying cross-correlation for errors.

• Remark: Heteroscedasticity points to GLS efficient estimation, but, as before, for consistent inferences we can use OLS with (adjusted for panels) White or NW SE’s.

(20)

• For consistent inferences, we can use OLS with White or NW SE’s:

- The White SE’s adjust only for heteroscedasticity:

$$
S_0 = \frac{1}{T} \sum_i e_i^2 \, x_i x_i'
$$

- The NW SE’s adjust for heteroscedasticity and autocorrelation:

$$
S_T = S_0 + \frac{1}{T} \sum_{l=1}^{L} w_L(l) \sum_{t=l+1}^{T} \left( x_{t-l} e_{t-l} e_t x_t' + x_t e_t e_{t-l} x_{t-l}' \right)
$$

• But, cross-sectional (SUR) or “spatial” dependencies are ignored. If present, the White’s or NW’s HAC need to be adjusted.

• We may have different covariance structures within the data, that vary by some characteristic (a cluster), say industry. We will need to group or “cluster” the errors for these dependencies.

• These new robust SE’s are generally called Panel corrected SE’s (PCSE).

(21)

Pooled Model

• General DGP: y_it = x_it’β + c_i + ε_it & (A2)-(A4) apply.

• The pooled model assumes that unobservable characteristics are constant, independent of i --no heterogeneity. That is, we have:

y_it = x_it’β + α + ε_it

Now, we have a CLM, with k+1 parameters.

• Stacking the variables in matrices, we have:

y = Xβ + α ι + ε

Dimensions:

- y, ι and ε are Σ_iT_i x 1

- X is Σ_iT_i x k

(22)

• We can re-write the pooled equation model as:

y = X*β* + ε, with X* = [X ι] - Σ_iT_i x (k+1) matrix.

β* = [β α]’ - (k+1) x 1 matrix

In this context, OLS produces a BLUE and consistent estimator. In this model, we refer to it as pooled OLS estimation.

• Of course, if our assumption regarding the unobservable variables is wrong, we are in the presence of an omitted variable, c_i.

• Then, we have potential bias and inconsistency of pooled OLS. The magnitude of these problems depends on how the true model behaves: ‘fixed’ or ‘random.’

(23)

Pooled Regression: Heterogeneity Bias

• In the pooled model, there is no model for group/individual i heterogeneity. Thus, pooled regression may result in heterogeneity bias.

[Figure: scatter plot of y against x. The pooled regression line, y_it = β₀ + β₁x_it + ε_it, is drawn through the data of four firms, each with its own true model (Firm 1, Firm 2, Firm 3, Firm 4).]
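A small simulation can make the heterogeneity bias concrete. The sketch below is only an illustration under made-up assumptions (four firms, firm intercepts 0, 2, 4, 6, a true slope of 1, and a regressor that is negatively related to the firm intercept); none of these numbers come from the lecture. It compares the pooled OLS slope with the slope obtained after subtracting firm means.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, beta = 4, 50, 1.0                      # hypothetical panel size and true slope
alpha = np.array([0.0, 2.0, 4.0, 6.0])       # hypothetical firm-specific intercepts
firm = np.repeat(np.arange(N), T)            # firm index of every observation

# The regressor is negatively related to the firm effect, which distorts the pooled slope.
x = -alpha[firm] + rng.normal(size=N * T)
y = alpha[firm] + beta * x + 0.1 * rng.normal(size=N * T)

# Pooled OLS: a single intercept and a single slope for all firms.
X_pooled = np.column_stack([np.ones(N * T), x])
b_pooled = np.linalg.lstsq(X_pooled, y, rcond=None)[0][1]

# Within estimator: subtract firm means, then regress without an intercept.
x_bar = np.bincount(firm, weights=x) / T
y_bar = np.bincount(firm, weights=y) / T
x_w, y_w = x - x_bar[firm], y - y_bar[firm]
b_within = (x_w @ y_w) / (x_w @ x_w)

print(f"pooled slope: {b_pooled:.3f}  within slope: {b_within:.3f}  true slope: {beta}")
```

In this design the pooled slope is pulled far below the true value because the omitted firm intercepts are correlated with x, while the demeaned regression stays close to it.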

j _j _j 

(24)

Pooled Regression: Within Transformation

• We can estimate β by centering the observations around their group/individual means. That is, from

$$
y_{it} = \beta_1 + \sum_{j=2}^{k} \beta_j x_{jit} + \varepsilon_{it}
$$

we obtain

$$
y_{it} - \bar{y}_i = \sum_{j=2}^{k} \beta_j (x_{jit} - \bar{x}_{ji}) + \varepsilon_{it} - \bar{\varepsilon}_i
$$

• This method is called within-groups estimation because it relies on variations within individuals rather than between individuals. That is, this estimator reflects the time-series or within-subject information contained in the changes within subjects across time.

• We are estimating β using the time-series information in the data.

(25)

Pooled Regression: Within Transformation

• There is a cost in the simplicity of the within-groups estimation. First, the intercept β₁ and any X variable that remains constant for each individual will drop out of the model.

The elimination of the intercept may not matter, but the loss of the unchanging explanatory variables may be frustrating.

• Under the usual assumptions, pooled OLS applied to the within-transformed data is consistent and unbiased.

(26)

Pooled Regression: Between Transformation

• There is an additional alternative to estimate β, by expressing the model in terms of group/individual means. That is, from

$$
y_{it} = \beta_1 + \sum_{j=2}^{k} \beta_j x_{jit} + \varepsilon_{it}
$$

we obtain

$$
\bar{y}_i = \beta_1 + \sum_{j=2}^{k} \beta_j \bar{x}_{ji} + \bar{\varepsilon}_i
$$

• It is called the between estimator because it relies on variations between groups/individuals. We are estimating β using the cross-sectional information in the data (the time-series variation is gone!).

• We lose observations (and power!): we have only N data points.

• Under the usual assumptions, pooled OLS using the between transformation is consistent and unbiased.

(27)

Useful Analysis of Variance Notation (Greene)

Decomposition of Total variation:

$$
\sum_{i=1}^{N} \sum_{t=1}^{T_i} (z_{it} - \bar{z})^2 = \sum_{i=1}^{N} \sum_{t=1}^{T_i} (z_{it} - \bar{z}_{i.})^2 + \sum_{i=1}^{N} T_i (\bar{z}_{i.} - \bar{z})^2
$$

Total variation = Within groups variation + Between groups variation
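The decomposition can be checked numerically. The sketch below uses an invented unbalanced panel (the group sizes and the data are assumptions for illustration) and computes the three sums of squares directly from their definitions, with the overall mean taken over all Σ_iT_i observations.

```python
import numpy as np

rng = np.random.default_rng(1)
T_i = np.array([3, 5, 2, 4])                       # hypothetical unbalanced panel
group = np.repeat(np.arange(len(T_i)), T_i)        # group index of every observation
z = rng.normal(size=T_i.sum()) + group             # data with different group means

z_bar = z.mean()                                   # overall mean
z_bar_i = np.bincount(group, weights=z) / T_i      # group means

total = np.sum((z - z_bar) ** 2)
within = np.sum((z - z_bar_i[group]) ** 2)
between = np.sum(T_i * (z_bar_i - z_bar) ** 2)

print(total, within + between)
```

The two printed numbers agree up to floating-point error, whatever group sizes are used.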

(28)

(29)

• We start with the pooled model:

y = X*β* + ε, with X* = [X ι] - Σ_iT_i x (k+1) matrix.

β* = [β α]’ - (k+1) x 1 matrix

Now, we allow E[ε_i ε_j’|X_i] = σ_ij Ω_ij

• Potentially a lot of different forms for E[ε_i ε_j’|X_i] in a panel:

- Individual heteroscedasticity: E[ε_i²|X_i] = σ_i²

- Individual/group effects: E[ε_it ε_is|X_i] ≠ 0 (t≠s)

- Time (SUR or spatial) effects: E[ε_it ε_jt|X_i] ≠ 0 (i≠j)

- Persistent common shocks: E[ε_it ε_js|X_i] ≠ 0 (i≠j) and |t − s| < L

(30)

• Heteroscedasticity points to GLS efficient estimation, but, as before, for consistent inferences we can use OLS with White or NW SE.

• But, cross-sectional (contemporaneous or time-varying) dependencies are ignored. If present, the White’s or NW’s HAC need to be adjusted.

• We may have different covariance structures within the data, that vary by some characteristic (a cluster), say industry. We need to group or “cluster” the errors for these dependencies.

• Driscoll and Kraay (1998) provide an easy extension to estimate robust NW SE’s in panels with cross-sectional dependencies: Average the x_t e_t over the clusters we suspect cause dependence.
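A minimal sketch of the Driscoll and Kraay idea, under stated assumptions: the scores x_it e_it are summed over the cross section in each period, and a Newey-West estimator with Bartlett weights is applied to the resulting time series. The function name `driscoll_kraay_cov`, the simulated data and the lag choice are hypothetical, periods are assumed to be consecutive integers, and no small-sample correction is applied.

```python
import numpy as np

def driscoll_kraay_cov(X, e, period, lags):
    """Sandwich covariance of pooled OLS coefficients (illustrative sketch)."""
    periods = np.unique(period)
    # h_t: sum of x_it * e_it over the units observed in period t (one row per period).
    h = np.vstack([(X[period == t] * e[period == t, None]).sum(axis=0) for t in periods])
    S = h.T @ h                                    # lag-0 term
    for j in range(1, lags + 1):
        w = 1.0 - j / (lags + 1.0)                 # Bartlett weight
        omega_j = h[j:].T @ h[:-j]                 # sum over t of h_t h_{t-j}'
        S += w * (omega_j + omega_j.T)
    bread = np.linalg.inv(X.T @ X)
    return bread @ S @ bread

# Hypothetical panel with a common shock f_t that hits every unit in period t.
rng = np.random.default_rng(2)
N, T = 30, 40
period = np.tile(np.arange(T), N)
f = rng.normal(size=T)
x = rng.normal(size=N * T)
X = np.column_stack([np.ones(N * T), x])
y = 1.0 + 0.5 * x + f[period] + rng.normal(size=N * T)

b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b
se_dk = np.sqrt(np.diag(driscoll_kraay_cov(X, e, period, lags=2)))
print(b, se_dk)
```

With zero lags and one observation per period the function collapses to the usual White estimator, which is a convenient sanity check.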

(31)

Pooled Model: Living with (A3’)

• Driscoll and Kraay (1998) provide an easy extension to estimate robust NW SE’s in panels with cross-sectional dependencies.

• The NW SE’s adjust for heteroscedasticity and autocorrelation:

$$
S_T = S_0 + \frac{1}{T} \sum_{l=1}^{L} w_L(l) \sum_{t=l+1}^{T} \left( x_{t-l} e_{t-l} e_t x_t' + x_t e_t e_{t-l} x_{t-l}' \right)
$$

• The w(l) are the usual Bartlett or QS weights –other weights are OK. The KxK sandwich matrix, placed between the (X’X)^-1 terms, is built from

$$
\hat{\Omega}_j = \sum_{t=j+1}^{T} h_t(b) \, h_{t-j}(b)' \quad \text{with} \quad h_t(b) = \sum_{i=1}^{N(t)} h_{it}(b)
$$

where h_it(b) = x_it e_it.

• The NW method is applied to the time series of cross-sectional averages of h_it(b). By using cross-sectional averages, estimated SE are consistent independently of the panel’s cross-sectional dimension N.

(32)

• If we do not suspect autocorrelation problems –not a strange situation, given that many panel data sets have observations that are widely spaced in time–, we can rely on White SE (S₀).

• We can cluster them by one variable (say, industry) or by several variables (say, year and industry) --“multi-level clustering”.

• We assume that the correlations within a cluster (a group of firms, a region, different years for the same firm, different years for the same region) are the same for different observations.

• These clustered standard errors are called panel corrected SE (PCSE’s) or clustered SE. The clustered White-style SE are, sometimes, called Rogers SE.

(33)

• We assume that the correlations within a cluster are the same for different observations.

• Different clusters can produce very different SE. Usually, we cluster using economic theory (clustering by industry, year, industry and year).

• It is not a bad idea to try different ways of defining clusters and see how the estimated SE are affected. Be conservative, report largest SE.

• These PCSE’s are robust to very general forms of cross-sectional (and temporal) dependence.
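The advice to try several cluster definitions can be sketched as follows. This is a bare-bones White-style clustered sandwich without any small-sample correction; the function name `cluster_cov`, the firm/year panel and the error structure are assumptions made up for the illustration.

```python
import numpy as np

def cluster_cov(X, e, cluster):
    """White-style sandwich covariance with scores summed within each cluster."""
    bread = np.linalg.inv(X.T @ X)
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(cluster):
        s_g = X[cluster == g].T @ e[cluster == g]   # sum of x_it * e_it inside cluster g
        meat += np.outer(s_g, s_g)
    return bread @ meat @ bread

# Hypothetical firm-year panel whose regressor and errors both share a firm component.
rng = np.random.default_rng(3)
N, T = 50, 10
firm = np.repeat(np.arange(N), T)
year = np.tile(np.arange(T), N)
x = rng.normal(size=N * T) + rng.normal(size=N)[firm]
u = rng.normal(size=N * T) + rng.normal(size=N)[firm]
y = 1.0 + 0.5 * x + u
X = np.column_stack([np.ones(N * T), x])

b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b

# Try two cluster definitions and keep the larger SE for each coefficient.
se_by = {name: np.sqrt(np.diag(cluster_cov(X, e, c)))
         for name, c in [("firm", firm), ("year", year)]}
se_report = np.maximum(se_by["firm"], se_by["year"])
print(se_by, se_report)
```

When every observation is treated as its own cluster, `cluster_cov` returns the ordinary White covariance matrix.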

(34)

• In general, we find that PCSE tend to be bigger than the usual OLS SE’s. The bigger the cross-sectional correlation, the bigger the SE. That is, NW SE’s tend to be smaller than Driscoll and Kraay SE.

• In simulations, it is found (as expected) that the PCSE perform better when there is cross-sectional dependence in the data. But, when there is no dependence in the cross-section, the standard White or NW SE do better. In some cases, these differences can be significant.

• Testing for cross-sectional dependence may be a good idea, especially when results are not robust to different SE. LM tests can be easily implemented. Pesaran (2004) proposes an easy test.

• More on PCSE later.

(35)

Cornwell and Rupert Data (Greene)

Cornwell and Rupert Returns to Schooling Data, 595 Individuals, 7 Years

Variables in the file are:

EXP = work experience
WKS = weeks worked
OCC = occupation, 1 if blue collar
IND = 1 if manufacturing industry
SOUTH = 1 if resides in south
SMSA = 1 if resides in a city (SMSA)
MS = 1 if married
FEM = 1 if female
UNION = 1 if wage set by union contract
ED = years of education
BLK = 1 if individual is black
LWAGE = log of wage = dependent variable in regressions

These data were analyzed in Cornwell, C. and Rupert, P., "Efficient Estimation with Panel Data: An Empirical Comparison of Instrumental Variable Estimators," Journal of Applied Econometrics, 3, 1988, pp. 149-155. See Baltagi, page 122 for further analysis. The data were downloaded from the website for Baltagi's text.

(36)

(37)

(38)

• We start with the pooled model:

y = X*β* + ε

where X* = [X ι] - Σ_iT_i x (k+1) matrix.

β* = [β α]’ - (k+1) x 1 matrix

Now, we allow E[ε_i ε_j’|X_i] = σ_ij Ω_ij

• We can use OLS with PCSE’s or we can do GLS. Note: Why GLS? Efficiency.

• Suppose Ω_ij = I_T. Then, we only have cross-equation correlation, not time correlation. We are back in the (aggregation) SUR framework.

(39)

Pooled Model with SUR - GLS

• Suppose Ω_ij = I_T. We are in the (aggregation) SUR framework:

$$
\hat{\beta}_{GLS} = (X' V^{-1} X)^{-1} X' V^{-1} y = (X' [\Sigma^{-1} \otimes I] X)^{-1} X' [\Sigma^{-1} \otimes I] y
$$

• For FGLS, use the pooled OLS residuals e_i and e_j to estimate the covariance σ_ij. Note that

$$
\hat{\Sigma} = \frac{1}{T} \sum_{t=1}^{T} e_t e_t' = \frac{1}{T} E'E
$$

where E is a TxN matrix and e_t = [e_1t e_2t ... e_Nt]’ is an Nx1 vector. We need to invert Σ̂ (an NxN matrix).

Note: In general, rank(E) ≤ T. Then, if T < N, rank(Σ̂) ≤ T < N => singularity, FGLS cannot be computed. This is a problem of the data, not the model.
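The rank problem is easy to reproduce. In the sketch below a random TxN matrix stands in for the matrix E of pooled OLS residuals (the dimensions T = 5 and N = 8 are arbitrary assumptions); the implied NxN covariance estimate cannot have rank above T.

```python
import numpy as np

rng = np.random.default_rng(4)
T, N = 5, 8                                  # fewer time periods than units
E = rng.normal(size=(T, N))                  # stand-in for the T x N residual matrix
Sigma_hat = E.T @ E / T                      # N x N estimate of Sigma

rank = np.linalg.matrix_rank(Sigma_hat)
print(f"Sigma_hat is {N}x{N} with rank {rank}")
```

Since the rank is below N, Σ̂ has no inverse and the FGLS formula cannot be evaluated with these data.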


(40)

Pooled Model with Heteroscedasticity - GLS

• Now, suppose we have groupwise heteroscedasticity. That is,
E[ε_i ε_j’|X_i] = 0 for i≠j
E[ε_i²|X_i] = σ_i²

• We do FGLS, as usual, using the pooled OLS residuals e_i to estimate the variance σ_i² and, thus, to estimate Σ:

$$
\Sigma = \begin{bmatrix}
\sigma_1^2 & 0 & \cdots & 0 \\
0 & \sigma_2^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma_N^2
\end{bmatrix}
$$

• We can test this model with H₀: σ₁² = σ₂² = ... = σ_N². We can use:

W = Σ_i (s_i² – s²_pooled)² / Var(s_i²) → χ²_N

where s_i² is computed using the pooled OLS residuals e_i.
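A two-step FGLS under groupwise heteroscedasticity can be sketched as below. The function name `fgls_groupwise`, the simulated data and the choice to divide the squared residuals by T_i (no degrees-of-freedom correction) are assumptions for illustration; group labels are assumed to be the integers 0,...,N-1.

```python
import numpy as np

def fgls_groupwise(y, X, group):
    """Two-step FGLS when Var[eps_it] = sigma_i^2 differs across groups (sketch)."""
    b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b_ols                                     # pooled OLS residuals
    counts = np.bincount(group)
    s2 = np.bincount(group, weights=e ** 2) / counts      # s_i^2 for each group
    w = 1.0 / np.sqrt(s2[group])                          # weight each row by 1 / s_i
    b_fgls = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)[0]
    return b_fgls, s2

# Hypothetical data: four groups with very different error variances.
rng = np.random.default_rng(8)
N, T = 4, 60
group = np.repeat(np.arange(N), T)
sigma = np.array([0.5, 1.0, 2.0, 4.0])
x = rng.normal(size=N * T)
X = np.column_stack([np.ones(N * T), x])
y = X @ np.array([1.0, 2.0]) + sigma[group] * rng.normal(size=N * T)

b_fgls, s2 = fgls_groupwise(y, X, group)
print(b_fgls, s2)
```

Weighting each row by 1/s_i and running OLS is numerically the same as applying the GLS formula with the estimated diagonal Σ.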

(41)

Pooled Model with Autocorrelation - GLS

• Now, suppose we have individual autocorrelation. That is,
E[ε_it ε_js|X_i] = 0 for i≠j
E[ε_it ε_it-p|X_i] ≠ 0 -for example, ε_it = ρ_i ε_it-1 + u_it, with Var[u_it|X_i] = σ_ui²

• We do FGLS, as usual, using the pooled OLS residuals e_i to estimate the ρ_i and, thus, to estimate Σ_i:

$$
\Sigma_i = \frac{\sigma_{ui}^2}{1 - \rho_i^2}
\begin{bmatrix}
1 & \rho_i & \cdots & \rho_i^{T-1} \\
\rho_i & 1 & \cdots & \rho_i^{T-2} \\
\vdots & \vdots & \ddots & \vdots \\
\rho_i^{T-1} & \rho_i^{T-2} & \cdots & 1
\end{bmatrix}
$$

• We can test this model with H₀: ρ₁ = ρ₂ = ... = ρ_N = 0. We can use an LM test to test H₀.



(42)

Pooled OLS with First Differences

• From the general DGP:

y_it = x_it’β + c_i + ε_it & (A2)-(A4) apply.

It may still be possible to use OLS to estimate β when we have individual heterogeneity. We can use OLS if we eliminate the cause of heterogeneity: c_i.

We can do this by taking first differences of the DGP. That is,

Δy_it = y_it – y_it-1 = Δx_it’β + Δc_i + Δε_it
= (x_it – x_it-1)’β + u_it

since Δc_i = 0, with u_it = Δε_it.

Note: All time invariant variables, including c_i, disappear from the model (one “diff”). If the model has a time trend –economic fluctuations–, it also disappears: it becomes the constant term (the other “diff”). Thus, this method is usually called “diffs in diffs”.

(43)

Pooled OLS with First Differences

• With strict exogeneity of (X_i,c_i), the OLS regression of Δy_it on Δx_it is unbiased and consistent, but inefficient.

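• Why? The error is no longer ε_it, but u_it = Δε_it. When the ε_it are homoscedastic and uncorrelated, Var[u_i] is given by:

$$
Var \begin{bmatrix}
\varepsilon_{i,2} - \varepsilon_{i,1} \\
\varepsilon_{i,3} - \varepsilon_{i,2} \\
\vdots \\
\varepsilon_{i,T_i} - \varepsilon_{i,T_i-1}
\end{bmatrix}
=
\begin{bmatrix}
2\sigma^2 & -\sigma^2 & 0 & \cdots & 0 \\
-\sigma^2 & 2\sigma^2 & -\sigma^2 & \cdots & 0 \\
\vdots & \ddots & \ddots & \ddots & \vdots \\
0 & \cdots & -\sigma^2 & 2\sigma^2 & -\sigma^2 \\
0 & \cdots & 0 & -\sigma^2 & 2\sigma^2
\end{bmatrix}
\quad \text{(Toeplitz form)}
$$

• That is, first differencing produces serially correlated errors: adjacent differences share one ε_it, so u_it behaves like an MA(1) process. Efficient estimation method: GLS.

• It turns out that GLS is complicated. Use OLS in first differences and use Newey-West SE/PCSE with one lag.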
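The covariance structure of the differenced errors follows from writing the first-difference operation as a matrix D, so that Var[Dε_i] = σ²DD’ when Var[ε_i] = σ²I. The short check below uses arbitrary values of T and σ² chosen only for illustration.

```python
import numpy as np

T, sigma2 = 5, 1.5                         # hypothetical number of periods and error variance
D = np.eye(T)[1:] - np.eye(T)[:-1]         # (T-1) x T operator: row r computes eps_{r+1} - eps_r
V = sigma2 * D @ D.T                       # Var[D eps] when Var[eps] = sigma2 * I

print(V)
```

The printed matrix has 2σ² on the diagonal, −σ² next to the diagonal and zeros elsewhere, which is why one lag is enough in the Newey-West correction.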

(44)

OLS with First Diffs: Treatment Application

• Suppose there is random assignment to treatment and control groups, like in a typical medical experiment.

• We compare the change in outcomes across the treatment and control groups to estimate the treatment effect

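• With two periods –i.e., before and after– and strict exogeneity:

Δy_i = y_i2 – y_i1 = δ₀ + δ₁Treatment_i + (x_i2’ – x_i1’)β + u_i

(This is a CLM. OLS is consistent and BLUE).

• In medical experiments, they use the diffs in diffs estimation. Without control variables:

$$
\hat{\delta}_1 = \overline{\Delta y} \mid \text{treatment} - \overline{\Delta y} \mid \text{control} = \text{"difference in differences" estimator}
$$

$$
\hat{\delta}_0 + \hat{\delta}_1 = \text{average change in } y_i \text{ for the "treated"}
$$

(δ̂₀ alone is the average change in y_i for the control group.)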
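With two periods and no control variables, the difference-in-differences estimator is just a difference of two mean changes, and it coincides with the OLS coefficient on the treatment dummy. The sketch below checks this on simulated data; the sample size, the common shift of 0.3 and the treatment effect of 2.0 are invented for the illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 2000
treated = rng.integers(0, 2, size=n)                  # random assignment: 1 = treatment, 0 = control
c = rng.normal(size=n)                                # unobserved individual effect
y1 = c + rng.normal(size=n)                           # outcome before
y2 = c + 0.3 + 2.0 * treated + rng.normal(size=n)     # outcome after
dy = y2 - y1                                          # first difference removes c

# Difference of the mean changes in the two groups.
did = dy[treated == 1].mean() - dy[treated == 0].mean()

# Same number from OLS of dy on a constant and the treatment dummy.
Z = np.column_stack([np.ones(n), treated])
d0, d1 = np.linalg.lstsq(Z, dy, rcond=None)[0]
print(did, d1, d0)
```

The intercept d0 equals the mean change in the control group, so d0 + d1 is the mean change for the treated.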

   

(45)

OLS with First Diffs: Natural Experiment

• Suppose you are studying the effect of X_t (say, transparency) on y_t (say, analysts’ forecasting skills). But, you suspect X_t is endogenous.

• Recall that it is very difficult to find a good IV Z_t –i.e., Z_t is exogenous to ε_t and relevant (Z_t has a good correlation with X_t).

• Finding a good exogenous event may help in the study of the effect of X_t on y_t. There are a lot of natural, exogenous events: asteroids hitting the Pacific Ocean, earthquakes in New Zealand, etc.

• But, few of them have an effect on y_t, the financial variables we are studying (say, analysts’ forecasting skills).

(46)

OLS with First Diffs: Natural Experiment

• A change in a law, a policy or a regulation can have the same exogenous characteristics as a natural event, but have a direct effect on y_t.

• In the context of studying the effect of X_t on y_t, these exogenous events are usually referred to as natural experiments.

• Now, we have two periods: Before and after the natural experiment.

• If we also have a well-defined control group, where the treatment was not administered –i.e., the natural experiment never occurred–, then, we can use the diffs in diffs estimation.

(47)

OLS with First Diffs: Natural Experiment - 1

Example 1: We are interested in the effect of labor shocks on wages and employment. Natural experiment: The 1980 Mariel boatlift, a temporary lifting of emigration restrictions in Cuba. Most of the marielitos (the 1980 Cuban immigrants) settled in Miami.

• Two periods: Before and after the 1980 Mariel boatlift.

• Control group: Low skilled workers in Houston, LA and Atlanta.

• Calculate unemployment and wages of low skilled workers in both periods. Then, regress Δy_i against a set of control variables (industry, education, age, etc.) and a treatment dummy:

Δy_i = y_i2 – y_i1 = δ₀ + δ₁Treatment_i + (x_i2’ – x_i1’)β + u_i

(48)

OLS with First Diffs: Natural Experiment - 2

Example 2: Suppose we are interested in the effect of accounting transparency on analysts’ forecast errors. We can use the July 30, 2002 Sarbanes-Oxley (SOX) law as a natural experiment.

• Two periods: Before and after SOX.

• Control group (firms under no SOX law): It may be difficult to find a good control group. Maybe, we can use analysts in Canada or in Germany as a control group.

• Calculate the dispersion of the analysts’ forecasts, y_i, in both periods & regress Δy_i against a set of control variables (experience of analyst, education, age, etc.) and a treatment dummy:

Δy_i = y_i2 – y_i1 = δ₀ + δ₁Treatment_i + (x_i2’ – x_i1’)β + u_i

(49)

Dealing with Attrition

• Attrition problem: If an unbalanced panel is the result of some selection process related to ε_it, then endogeneity is present and needs to be dealt with using some correction method. Otherwise, we have attrition bias.

• Example: In the "Quality of Life for cancer patients" study discussed in Greene, appearance for the second interview was low for people with initial low QOL (death or depression) or with initial high QOL (don’t need the treatment).

• Solutions to the attrition problem

– Heckman selection model (used in the study)

• Prob[Present at exit|covariates] = Φ(z’θ) (Probit model)

• Additional variable added to difference model: λ_i = ϕ(z_i’θ)/Φ(z_i’θ)

– The FDA solution: fill with zeros. (!)
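The additional variable in the Heckman correction is the inverse Mills ratio evaluated at the probit index. A minimal sketch with the standard library only is shown below; the function names and the index values are hypothetical, and in practice z_i’θ would come from a first-stage probit fit.

```python
import math

def norm_pdf(v):
    """Standard normal density."""
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)

def norm_cdf(v):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

def inverse_mills(index):
    """lambda = phi(index) / Phi(index), evaluated at the probit index z'theta."""
    return norm_pdf(index) / norm_cdf(index)

# Hypothetical fitted probit indices for three individuals.
for index in (-1.0, 0.0, 1.5):
    print(index, inverse_mills(index))
```

The ratio falls as the index rises: individuals who are very likely to remain in the panel receive a small correction term.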

(50)

Pooled Model: ML Estimation

• In the pooled model, y = Xβ + ε, we assume ε_t ~ N(0, Σ), where ε_t = [ε_1t, ε_2t,..., ε_Nt]’ and Σ is an NxN matrix.

• We can write the log likelihood function as:

L = log L(β, Σ|X) = –NT/2 ln(2π) – T/2 ln|Σ| – ½ Σ_t ε_t’Σ^-1 ε_t

• The ML estimator is equal to the iterated FGLS estimator.

• Testing is straightforward with the likelihood ratio test.

Example: H₀: No cross correlation across equations: The off-diagonal elements of Σ are zero.

LR = T (ln|Σ̂_R| – ln|Σ̂_U|) = T (Σ_i ln s_i² – ln|Σ̂|),

which follows a χ² with N(N–1)/2 d.f.
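The LR statistic for diagonal Σ can be computed from a TxN matrix of residuals. The sketch below takes the residuals as given and uses the same matrix for the restricted and unrestricted estimates, which is a simplification of the iterated ML procedure; the function name `lr_no_cross_correlation` and the simulated residuals are assumptions for illustration.

```python
import numpy as np

def lr_no_cross_correlation(E):
    """LR statistic for H0: Sigma is diagonal, from a T x N residual matrix E (sketch)."""
    T, N = E.shape
    Sigma_u = E.T @ E / T                          # unrestricted estimate of Sigma
    s2 = np.diag(Sigma_u)                          # restricted estimate keeps only the diagonal
    _, logdet = np.linalg.slogdet(Sigma_u)
    lr = T * (np.sum(np.log(s2)) - logdet)
    df = N * (N - 1) // 2
    return lr, df

# Hypothetical residuals: three equations sharing a common shock.
rng = np.random.default_rng(9)
T, N = 50, 3
common = rng.normal(size=(T, 1))
E = common + rng.normal(size=(T, N))

lr, df = lr_no_cross_correlation(E)
print(lr, df)
```

The statistic is never negative, because the determinant of a covariance matrix cannot exceed the product of its diagonal elements; it is compared with a χ² critical value with N(N–1)/2 degrees of freedom.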

(51)

Estimation with Fixed Effects

• The two main approaches to the fitting of models using panel data are known as

(1) Fixed effects regressions. (2) Random effects regressions.

• The key difference between these two approaches is how the unobservable characteristics –the individual effects- are modeled.

(52)

Estimation with Fixed Effects

• The fixed effects (FE) model

y_it = x_it’β + c_i + ε_it - observation for individual i at time t.

• The unobserved component, c_i, is arbitrarily correlated with x_it:

E[c_i|X_i] = g(X_i) => Cov[x_it, c_i] ≠ 0

=> Under the FE assumption, pooled OLS is biased and inconsistent.

• We summarize these individual effects in a constant α_i.

=> All time invariant characteristics of individual i (location, gender, nationality, etc.) will be swept away under this formulation.

(53)

Estimation with Fixed Effects

• Matrix notation

- In matrix notation for individual i:

y_i = X_i β + c_i + ε_i - c_i is a T_i x 1 vector. (Each individual has T_i observations.)

- In matrix notation for all individuals –i.e., stacking:

y = Xβ + c + ε - we have Σ_iT_i observations. Now, c, y, and ε are Σ_iT_i x 1 vectors.

• Dummy variable representation

$$
y_{it} = x_{it}'\beta + \sum_{j=1}^{N} c_j d_{jit} + \varepsilon_{it}; \qquad d_{jit} = 1(i = j)
$$

(54)

Assumptions for the FE Model (FEM)

• The individual unobservable characteristics (effects) are correlated with the included variables:

E[c_i|X_i ] = g(X_i) => Cov[x_it,c_i] ≠ 0

• The FE model assumes c_i = α_i (constant; it does not vary with t):

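y_i = X_i β + d_i α_i + ε_i, for each individual i.

• Stacking:

$$
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}
=
\begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_N \end{bmatrix} \beta
+
\begin{bmatrix}
d_1 & 0 & \cdots & 0 \\
0 & d_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & d_N
\end{bmatrix} \alpha
+ \varepsilon
= [X, D] \begin{bmatrix} \beta \\ \alpha \end{bmatrix} + \varepsilon
= Z\delta + \varepsilon
$$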

(55)

FEM: Estimation

• The FEM is the CLM, but with many independent variables: k+N.

• OLS is unbiased, consistent, efficient, but impractical if N is large.

• The OLS estimates of β and α are given by:

$$
\begin{bmatrix} b \\ a \end{bmatrix}
=
\begin{bmatrix} X'X & X'D \\ D'X & D'D \end{bmatrix}^{-1}
\begin{bmatrix} X'y \\ D'y \end{bmatrix}
$$

Using the Frisch-Waugh theorem:

$$
b = [X' M_D X]^{-1} X' M_D y
$$

Note (Greene): LS is an estimator, not a model. Given the formulation with a lot of dummy variables, this particular LS estimator is called the Least Squares Dummy Variable (LSDV) estimator.

(56)









$$
M_D = \begin{bmatrix}
M_D^1 & 0 & \cdots & 0 \\
0 & M_D^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & M_D^N
\end{bmatrix}
\quad \text{(The dummy variables are orthogonal)}
$$

$$
M_D^i = I_{T_i} - d_i (d_i' d_i)^{-1} d_i' = I_{T_i} - (1/T_i) \, d_i d_i'
$$

$$
X' M_D X = \sum_{i=1}^{N} X_i' M_D^i X_i, \qquad
\left( X_i' M_D^i X_i \right)_{k,l} = \sum_{t=1}^{T_i} (x_{it,k} - \bar{x}_{i.,k})(x_{it,l} - \bar{x}_{i.,l})
$$

$$
X' M_D y = \sum_{i=1}^{N} X_i' M_D^i y_i, \qquad
\left( X_i' M_D^i y_i \right)_{k} = \sum_{t=1}^{T_i} (x_{it,k} - \bar{x}_{i.,k})(y_{it} - \bar{y}_{i.})
$$

• That is, we subtract the group mean from each individual observation. Then, the individual effects disappear. Now, OLS can easily be used to estimate the k β parameters, using the demeaned data.

• We know this method to estimate the FEM: The within-groups estimation.

(57)

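• The within-groups method estimates the parameters using demeaned data. That is, starting from

$$
Y_{it} = \beta_1 + \sum_{j=2}^{k} \beta_j X_{jit} + \delta t + \alpha_i + \varepsilon_{it}
$$

and averaging over time for individual i,

$$
\bar{Y}_i = \beta_1 + \sum_{j=2}^{k} \beta_j \bar{X}_{ji} + \delta \bar{t} + \alpha_i + \bar{\varepsilon}_i
$$

we subtract the second equation from the first:

$$
Y_{it} - \bar{Y}_i = \sum_{j=2}^{k} \beta_j (X_{jit} - \bar{X}_{ji}) + \delta (t - \bar{t}) + \varepsilon_{it} - \bar{\varepsilon}_i
$$

Recall: It is called the within-groups method because the model is explaining the variations about the mean of the dependent variable in terms of the variations about the means of the explanatory variables for the group of observations relating to a given individual.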

(58)

FEM: Within Transformation Removes Effects

$$
y_{it} - \bar{y}_i = \sum_{j=2}^{k} \beta_j (x_{jit} - \bar{x}_{ji}) + \delta (t - \bar{t}) + \varepsilon_{it} - \bar{\varepsilon}_i
$$

• There are costs in the simplicity of the within-groups estimation.

• First, all X variables (including the constant) that remain constant for each individual i will drop out of the model. The elimination of the intercept may not matter, but the loss of the unchanging explanatory variables may matter.

• For example, if we are studying CEO compensation, the within group transformation will lose individual schooling, years of experience, previous job network, etc. That is, schooling effects on CEO compensation cannot be studied in this context.

(59)

• Second, the explanatory variables are likely to have much smaller variances than in the original specification. Now, they are measured as deviations from the individual mean, rather than as absolute amounts. This is likely to adversely affect the precision of the estimates of the coefficients. It is also likely to aggravate measurement error bias if the explanatory variables are subject to measurement error.

• Third, the manipulation involves the loss of N degrees of freedom (we are estimating N means!).

FEM: Within Transformation Removes Effects

$$y_{it} - \bar{y}_i = \sum_{j=2}^{k} \beta_j (x_{jit} - \bar{x}_{ji}) + \delta (t - \bar{t}) + \varepsilon_{it} - \bar{\varepsilon}_i$$

(60)

FEM: LS Dummy Variable (LSDV) Estimator

• b is obtained by within-groups least squares (group mean deviations).

• Then, we use the normal equations to estimate a:

$$D'Xb + D'Da = D'y$$

$$a = (D'D)^{-1} D'(y - Xb)$$

$$a_i = (1/T_i) \sum_{t=1}^{T_i} (y_{it} - x_{it}'b) = \bar{e}_i$$

Note:

- This is simple algebra – the estimator is just OLS

- Again, LS is an estimator, not a model. This particular least squares estimator is called LSDV estimator.
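The two-step recipe (b from group-mean deviations, then a from the normal equations) can be sketched as follows. The simulated data, the effect values in `alpha`, and the helper `group_means` are hypothetical choices made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 4, 6
groups = np.repeat(np.arange(N), T)
alpha = np.array([2.0, -1.0, 0.5, 3.0])          # hypothetical individual effects
x = rng.normal(size=N * T) + alpha[groups]
y = 1.5 * x + alpha[groups] + rng.normal(scale=0.05, size=N * T)

X = x[:, None]                                    # NT x K with K = 1
D = (groups[:, None] == np.arange(N)[None, :]).astype(float)   # NT x N dummies

def group_means(D, Z):
    """(D'D)^{-1} D'Z: one row (or entry) of means per individual."""
    return np.linalg.solve(D.T @ D, D.T @ Z)

# Step 1: b by least squares on deviations from group means (M_D X, M_D y)
X_dm = X - D @ group_means(D, X)
y_dm = y - D @ group_means(D, y)
b = np.linalg.lstsq(X_dm, y_dm, rcond=None)[0]

# Step 2: a = (D'D)^{-1} D'(y - Xb), i.e. a_i = mean over t of (y_it - x_it'b)
a = group_means(D, y - X @ b)

# One-shot OLS on [X, D] for comparison
coef = np.linalg.lstsq(np.hstack([X, D]), y, rcond=None)[0]
```

Here `coef[:1]` matches `b` and `coef[1:]` matches `a`, which is the sense in which the estimator "is just OLS".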

(61)

FEM: LSDV Estimator

• Recall the dummy variable trap: If a constant is present in the model, the number of dummy variables should be N-1. The omitted individual or group becomes the reference category.

• However, the choice of reference category is often arbitrary and, thus, the interpretation of the α_i will not be particularly interesting.

• Alternatively, we can drop the β₁ intercept and define dummy variables for all of the individuals. This is the more common approach. The α_i now become the intercepts for each of the i’s.

• If E[ε_it|X_is,c_is]≠0, then LSDV cannot be used. It is inconsistent. In this case, we need to use IVs. Or a good natural experiment.

(62)

FEM: First-Difference (FD) Method

• We can also eliminate the individual FE using the first-difference method.

• The unobserved effect is eliminated by subtracting the observation for the previous time period from the observation for the current time period, for all time periods.

$$Y_{it} - Y_{it-1} = \sum_{j=2}^{k} \beta_j (X_{jit} - X_{jit-1}) + \delta (t - (t-1)) + \varepsilon_{it} - \varepsilon_{it-1}$$

$$\Delta Y_{it} = \sum_{j=2}^{k} \beta_j \Delta X_{jit} + \delta + \varepsilon_{it} - \varepsilon_{it-1}$$

• The error term is now (ε_it – ε_it–1). As before, differencing induces a moving average autocorrelation if ε_it satisfies the CLM assumptions. Note: If ε_it is subject to AR(1) autocorrelation and ρ is close to 1, taking first differences may approximately solve the problem.
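A short derivation makes both statements concrete. It assumes that ε_it has constant variance σ_ε² and is serially uncorrelated in the first three lines, and it introduces v_it (a symbol not used in the slides) for the white-noise innovation of the AR(1) process in the last line.

```latex
\begin{aligned}
\operatorname{Var}(\Delta\varepsilon_{it}) &= \operatorname{Var}(\varepsilon_{it} - \varepsilon_{it-1}) = 2\sigma_\varepsilon^2 \\
\operatorname{Cov}(\Delta\varepsilon_{it}, \Delta\varepsilon_{it-1}) &= E[(\varepsilon_{it} - \varepsilon_{it-1})(\varepsilon_{it-1} - \varepsilon_{it-2})] = -\sigma_\varepsilon^2 \\
\operatorname{Corr}(\Delta\varepsilon_{it}, \Delta\varepsilon_{it-1}) &= -1/2, \qquad \operatorname{Cov}(\Delta\varepsilon_{it}, \Delta\varepsilon_{it-s}) = 0 \ \text{for } s \ge 2 \\
\varepsilon_{it} = \rho\,\varepsilon_{it-1} + v_{it} \ &\Rightarrow\ \Delta\varepsilon_{it} = v_{it} - (1-\rho)\,\varepsilon_{it-1} \approx v_{it} \ \text{when } \rho \approx 1
\end{aligned}
```

The differenced error is therefore MA(1) with first-order autocorrelation of -1/2 under the CLM assumptions, while under strong AR(1) persistence it is close to the uncorrelated innovation.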

(63)

FEM: Estimation – FE or FD?

• Fixed-effects (or Within) Estimator

– Each variable is demeaned, i.e., its individual-specific average is subtracted from it.

– Dummy Variable Regression -i.e., put in a dummy variable for each cross-sectional unit, along with other explanatory variables. This may cause estimation difficulty when N is large.

• FD Estimator

– Each variable is differenced once over time, so we are effectively estimating the relationship between changes of variables.

(64)

• Theoretically, when N is large and T is small but greater than 2 (for T=2, FE=FD), FE is more efficient when the ε_it are serially uncorrelated, while FD is more efficient when ε_it follows a random walk (ρ=1).

• When T is large and N is small

– FD has an advantage for processes with large positive autocorrelation. (If ρ is near 1, FD solves the nonstationarity problem!)

– FE is more sensitive to nonnormality, heteroskedasticity, and serial correlation in ε_it.

– On the other hand, FE is less sensitive to violation of the strict exogeneity assumption. Then, FE is preferred when the processes are weakly dependent over time.
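The remark that FE and FD coincide when T=2 can be verified directly: with two periods, each demeaned observation is plus or minus half of the first difference, so the two sets of normal equations are proportional. The sketch below uses a simulated two-period panel without a time trend. The data-generating values are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
N, K = 50, 2
alpha = rng.normal(size=N)
X1 = rng.normal(size=(N, K)) + alpha[:, None]     # period 1 regressors
X2 = rng.normal(size=(N, K)) + alpha[:, None]     # period 2 regressors
beta = np.array([0.8, -0.3])
y1 = X1 @ beta + alpha + rng.normal(scale=0.2, size=N)
y2 = X2 @ beta + alpha + rng.normal(scale=0.2, size=N)

# FD: regress (y2 - y1) on (X2 - X1); no intercept because the DGP has no trend
b_fd = np.linalg.lstsq(X2 - X1, y2 - y1, rcond=None)[0]

# FE: subtract each individual's two-period mean, then pooled OLS
Xbar, ybar = (X1 + X2) / 2, (y1 + y2) / 2
X_dm = np.vstack([X1 - Xbar, X2 - Xbar])
y_dm = np.concatenate([y1 - ybar, y2 - ybar])
b_fe = np.linalg.lstsq(X_dm, y_dm, rcond=None)[0]
```

For T greater than 2 the two estimators generally differ, which is where the efficiency comparison above becomes relevant.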

(65)

FEM: Calculation of Var[b|X]

• Assume strict exogeneity: Cov[ε_it,(x_js,c_j)]=0. Every disturbance in every period for each person is uncorrelated with variables and effects for every person and across periods.

• Now, we have OLS in a CLM.

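•
$$\text{Asy.Var}[b|X] = \frac{\sigma_\varepsilon^2}{\sum_{i=1}^{N} T_i}\ \text{plim}\left[ \frac{1}{\sum_{i=1}^{N} T_i} \sum_{i=1}^{N} X_i' M_D^i X_i \right]^{-1}$$

which is the usual estimator for OLS

$$\hat{\sigma}_\varepsilon^2 = \frac{\sum_{i=1}^{N} \sum_{t=1}^{T_i} (y_{it} - a_i - x_{it}'b)^2}{\sum_{i=1}^{N} T_i - N - K}$$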

(Note the degrees of freedom correction)
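The role of the degrees of freedom correction can be seen in a small numerical sketch. The simulated panel and all variable names are assumptions for illustration. The comparison shows that the within-based covariance with divisor NT - N - K reproduces the slope block of the LSDV covariance, whereas dividing by NT - K (what a naive OLS routine would do on demeaned data) understates the error variance.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T, K = 40, 4, 2
g = np.repeat(np.arange(N), T)
D = (g[:, None] == np.arange(N)).astype(float)
alpha = rng.normal(size=N)
X = rng.normal(size=(N * T, K)) + alpha[g][:, None]
y = X @ np.array([1.0, 2.0]) + alpha[g] + rng.normal(size=N * T)

# Within estimator through M_D = I - D (D'D)^{-1} D'
M = np.eye(N * T) - D @ np.linalg.solve(D.T @ D, D.T)
XMX = X.T @ M @ X
b = np.linalg.solve(XMX, X.T @ M @ y)
e = M @ (y - X @ b)                       # residuals y_it - a_i - x_it'b

s2 = e @ e / (N * T - N - K)              # corrected for the N estimated means
s2_naive = e @ e / (N * T - K)            # ignores the N estimated means
V_fe = s2 * np.linalg.inv(XMX)

# LSDV: slope block of s^2 (Z'Z)^{-1} with Z = [X, D]
Z = np.hstack([X, D])
coef = np.linalg.lstsq(Z, y, rcond=None)[0]
res = y - Z @ coef
V_lsdv = (res @ res / (N * T - N - K)) * np.linalg.inv(Z.T @ Z)[:K, :K]
```

The explicit NT x NT matrix `M` is used here only because the example is tiny. In practice one would demean by group instead of forming it.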


(66)

FEM: PCSE

• If heteroscedasticity is suspected, we can use PCSE (clustered SE) for robust inferences. All previous comments and remarks apply to the FEM.

• If we do not suspect autocorrelation problems (not a strange situation, given that many panel data sets have observations that are widely spaced in time), we can rely on White SE’s (S₀).

• We can cluster them by one variable (say, industry) or by several variables (say, year and industry), which is called “multi-level clustering.” The clustered White-style SE’s are sometimes called Rogers SE.

• If we suspect autocorrelation problems, then the Driscoll and Kraay SE should be used.

(67)

FEM: Testing for Fixed Effects

• Under H₀(No FE): α_i = α for all i.

• That is, we test whether to pool or not to pool the data.

• Different tests:

– F-test based on the LSDV dummy variable model: constant or zero coefficients for D. Test follows an F(N-1, NT-N-K) distribution.

– F-test based on FEM (the unrestricted model) vs. pooled model (the restricted model). Test follows an F(N-1, NT-N-K) distribution.

– An LR test can also be done, usually assuming normality. Test follows a χ² distribution.

(68)

FEM: Hypothesis Testing

• Based on estimated residuals of the fixed effects model.

(1) Estimate FEM:

y_it = x_it’β + α_i + ε_it => Keep residuals e_FE,it

(2) Tests as usual:

– Heteroscedasticity
• Breusch and Pagan (1980)

– Autocorrelation: AR(1)
• Breusch and Godfrey (1981)

$$LM = \frac{NT^2}{T-1} \left[ \frac{e_{FE}'\, e_{FE,-1}}{e_{FE}'\, e_{FE}} \right]^2 \xrightarrow{d} \chi^2_1$$

(69)

Application: Cornwell and Rupert Data (Greene)

Cornwell and Rupert Returns to Schooling Data, 595 Individuals, 7 Years

Variables in the file are (those in parentheses are not used in the regressions):

EXP = work experience, EXPSQ = EXP²
WKS = weeks worked
OCC = occupation, 1 if blue collar
(IND = 1 if manufacturing industry)
(SOUTH = 1 if resides in south)
SMSA = 1 if resides in a city (SMSA)
MS = 1 if married
FEM = 1 if female
UNION = 1 if wage set by union contract
ED = years of education
(BLK = 1 if individual is black)
LWAGE = log of wage = dependent variable in regressions (Y)

These data were analyzed in Cornwell, C. and Rupert, P., "Efficient Estimation with Panel Data: An Empirical Comparison of Instrumental Variable Estimators," Journal of Applied Econometrics, 3, 1988, pp. 149-155.

(70)

Application: Cornwell and Rupert (Greene)

(1) Returns to Schooling - Pooled OLS Results

[Regression output table not recovered; surviving slide annotation: K]

(71)

Application: Cornwell and Rupert (Greene)

(2) Returns to Schooling - LSDV Results

[Regression output table not recovered; surviving slide annotations: N+K; RSS & R² (X and group)]

(72)

FEM: Testing for FE (and other formulations)

• Calculations: F(594, 3566) = [(651.78 - 83.89)/594]/[83.89/3566] = 40.64
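The arithmetic of the F statistic can be reproduced in a few lines. The helper name `f_stat` is hypothetical, and reading 651.78 as the residual sum of squares of the pooled (restricted) model and 83.89 as that of the LSDV (unrestricted) model is an assumption based on their positions in the formula.

```python
def f_stat(rss_restricted, rss_unrestricted, df_num, df_den):
    """F = [(RSS_R - RSS_U) / df_num] / [RSS_U / df_den]."""
    return ((rss_restricted - rss_unrestricted) / df_num) / (rss_unrestricted / df_den)

# N - 1 = 594 restrictions (N = 595 individuals); 3566 denominator df as on the slide
F = f_stat(651.78, 83.89, df_num=594, df_den=3566)
print(round(F, 2))  # 40.64
```

A value this large relative to any conventional F(594, 3566) critical value leads to rejecting the pooled model in favor of fixed effects.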

(73)

The Random Effects Model (REM)

• Recall the general DGP:

y_it = x_it’β + z_i’γ + ε_it - observation for individual i at time t.

When the observed characteristics are constant for each individual, a FEM is not an effective tool because such variables cannot be included.

• An alternative approach, known as the random effects model (REM), provides a solution to this problem, subject to two conditions.

• Conditions:

(1) It is possible to treat each of the unobserved Z_p variables as being drawn randomly from a given distribution.

(2) The Z_p variables are distributed independently of all of the X_j variables.

(74)

The Random Effects Model (REM)

• Conditions:

(1) The unobserved Z_p variables are drawn randomly from a given distribution. Thus, the c_i may be treated as a RV (hence the name of this approach) drawn from a given distribution. Let’s call it u_i. Then,

$$Y_{it} = \beta_1 + \sum_{j=2}^{k} \beta_j X_{jit} + \delta t + u_i + \varepsilon_{it} = \beta_1 + \sum_{j=2}^{k} \beta_j X_{jit} + \delta t + w_{it}, \qquad w_{it} = u_i + \varepsilon_{it}$$

• We deal with the unobserved effect by subsuming it into a compound disturbance term, w_it. We will assume that the Z_p are drawn from a distribution with zero mean and constant variance. Then,

$$E(w_{it}) = E(u_i + \varepsilon_{it}) = E(u_i) + E(\varepsilon_{it}) = 0$$



(75)

• The zero mean assumption, E[u_i] = 0, is not crucial: any nonzero component is absorbed by the intercept, β₁.

(2) The Z_p variables are distributed independently of all of the X_j variables. Otherwise, u_i and the compounded error, w_it, will not be uncorrelated with X_j. The RE estimation will be biased and inconsistent.

Note: We would have to use the FEM, even if the first condition seems to be satisfied.

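• If (1) and (2) are satisfied, we can use the REM, but there is a complication. Now, w_it will be nonspherical: because every w_it for individual i contains the same u_i, the w_it are correlated over time within each individual.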

(76)

REM: Error Components Model

• REM Assumptions:

y_it = x_it’β + z_i’γ + ε_it = x_it’β + u_i + ε_it = x_it’β + w_it

E[ε_it|X_i] = 0    E[ε_it²|X_i] = σ_ε²
E[u_i|X_i] = 0    E[u_i²|X_i] = σ_u²

E[u_i ε_it|X_i] = E[u_i ε_jt|X_i] = 0 - u and ε are independent.
E[u_i u_j|X_i] = 0 (i≠j) - no cross-correlation of RE.
E[ε_it ε_jt|X_i] = 0 (i≠j) - no cross-correlation for the errors, ε_it.
E[ε_it ε_js|X_i] = 0 (t≠s) - there is no autocorrelation for ε_it.

$$\sigma_w^2 = \sigma_{u+\varepsilon}^2 = \sigma_u^2 + \sigma_\varepsilon^2 + 2\sigma_{u,\varepsilon} = \sigma_u^2 + \sigma_\varepsilon^2$$

$$\sigma_{w_{it_1}, w_{it_2}} = E[(u_i + \varepsilon_{it_1})(u_i + \varepsilon_{it_2})] = \sigma_u^2 \qquad (t_1 \neq t_2)$$

(77)

REM: Notation (Greene)

$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix} = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_N \end{bmatrix} \beta + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_N \end{bmatrix} + \begin{bmatrix} u_1 i_1 \\ u_2 i_2 \\ \vdots \\ u_N i_N \end{bmatrix} \qquad \begin{matrix} T_1 \text{ observations} \\ T_2 \text{ observations} \\ \vdots \\ T_N \text{ observations} \end{matrix}$$

$$= X\beta + \varepsilon + u \qquad \left( \textstyle\sum_{i=1}^{N} T_i \text{ observations} \right)$$

$$= X\beta + w$$

In all that follows, except where explicitly noted, X, X_i and x_it contain a constant term as the first element. To avoid notational clutter, in those cases, x_it etc. will simply denote the counterpart without the constant term. Use of the symbol K for the number of variables will thus

(78)

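$$\text{Var}[\varepsilon_i + u_i i] = \sigma_\varepsilon^2 I_{T_i} + \sigma_u^2 i i' = \begin{bmatrix} \sigma_\varepsilon^2 + \sigma_u^2 & \sigma_u^2 & \cdots & \sigma_u^2 \\ \sigma_u^2 & \sigma_\varepsilon^2 + \sigma_u^2 & \cdots & \sigma_u^2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_u^2 & \sigma_u^2 & \cdots & \sigma_\varepsilon^2 + \sigma_u^2 \end{bmatrix} = \Omega_i$$

$$\text{Var}[w|X] = \Omega = \begin{bmatrix} \Omega_1 & 0 & \cdots & 0 \\ 0 & \Omega_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \Omega_N \end{bmatrix}$$

(Note these differ only in the dimension T_i)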
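The structure of Ω_i is easy to build and to check against simulated compound errors. The function name `omega_i`, the variance values, and the sample size below are assumptions chosen for illustration.

```python
import numpy as np

def omega_i(T_i, sigma2_eps, sigma2_u):
    """Omega_i = sigma_eps^2 * I + sigma_u^2 * i i' for one individual."""
    i = np.ones((T_i, 1))
    return sigma2_eps * np.eye(T_i) + sigma2_u * (i @ i.T)

Om = omega_i(3, sigma2_eps=1.0, sigma2_u=0.5)

# Simulated check: w_it = u_i + eps_it for many hypothetical individuals
rng = np.random.default_rng(4)
n = 200_000
u = rng.normal(scale=np.sqrt(0.5), size=(n, 1))   # one draw per individual
eps = rng.normal(size=(n, 3))                     # idiosyncratic errors
w = u + eps
Om_hat = w.T @ w / n                              # sample second moments (mean is zero)

# Implied correlation between any two periods of the same individual
rho_w = 0.5 / (0.5 + 1.0)
```

Every off-diagonal element equals σ_u², so the correlation between two periods of the same individual is σ_u²/(σ_u² + σ_ε²) regardless of how far apart the periods are.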

REM: Notation (Greene)

• Note: If E[ε_it ε_jt|X_i] ≠ 0 (i≠j) or E[ε_it ε_js|X_i] ≠ 0 (t≠s), we no longer

(79)

REM: Assumptions - Convergence of Moments

$$\frac{X'X}{\sum_{i=1}^{N} T_i} = \sum_{i=1}^{N} f_i \frac{X_i'X_i}{T_i}, \qquad f_i = \frac{T_i}{\sum_{i=1}^{N} T_i}$$

(a weighted sum of individual moment matrices)

$$\frac{X'\Omega X}{\sum_{i=1}^{N} T_i} = \sum_{i=1}^{N} f_i \frac{X_i'\Omega_i X_i}{T_i}$$

(a weighted sum of individual moment matrices)

$$= \sigma_\varepsilon^2 \sum_{i=1}^{N} f_i \frac{X_i'X_i}{T_i} + \sigma_u^2 \sum_{i=1}^{N} f_i T_i\, \bar{x}_i \bar{x}_i'$$

Note asymptotics are with respect to N. Each matrix X_i’X_i/T_i is the moments for the T_i observations. Should be 'well behaved' in micro level data. The average of N such matrices should be likewise.

T or T_i is assum

(80)

REM: Pooled OLS Estimation (Greene)

• Standard results for the pooled OLS estimator b in the GR model
- Consistent and asymptotically normal
- Unbiased
- Inefficient

• We can use pooled OLS, but for inferences we need the true variance –i.e., the sandwich estimator:

$$\text{Var}[b|X] = \frac{1}{\sum_{i=1}^{N} T_i} \left[ \frac{X'X}{\sum_{i=1}^{N} T_i} \right]^{-1} \left[ \frac{X'\Omega X}{\sum_{i=1}^{N} T_i} \right] \left[ \frac{X'X}{\sum_{i=1}^{N} T_i} \right]^{-1} \to 0 \times Q^{-1} Q^{*} Q^{-1} \to 0$$

as N → ∞ with our convergence assumptions

(81)

REM: Sandwich Estimator for OLS (Greene)

$$\text{Var}[b|X] = \frac{1}{\sum_{i=1}^{N} T_i} \left[ \frac{X'X}{\sum_{i=1}^{N} T_i} \right]^{-1} \left[ \frac{X'\Omega X}{\sum_{i=1}^{N} T_i} \right] \left[ \frac{X'X}{\sum_{i=1}^{N} T_i} \right]^{-1}$$

$$\frac{X'\Omega X}{\sum_{i=1}^{N} T_i} = \sum_{i=1}^{N} f_i \frac{X_i'\Omega_i X_i}{T_i}, \quad \text{where } \Omega_i = E[w_i w_i'|X_i]$$

In the spirit of the White estimator, use

$$\frac{X'\hat{\Omega} X}{\sum_{i=1}^{N} T_i} = \sum_{i=1}^{N} f_i \frac{X_i' \hat{w}_i \hat{w}_i' X_i}{T_i}, \qquad \hat{w}_i = y_i - X_i b$$

Hypothesis tests are then based on Wald statistics.

THIS IS THE 'CLUSTER' ESTIMATOR

• Recall: Clustered standard errors or PCSE

There is a grouping, or “cluster,” within which the error term is possibly correlated, but outside of which (across groups) it is not.
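A minimal NumPy sketch of this cluster (sandwich) estimator for pooled OLS follows, with each individual treated as a cluster. The simulated random-effects data and the function name `cluster_cov` are assumptions for illustration, and no small-sample correction factor is applied, which statistical packages often add.

```python
import numpy as np

def cluster_cov(X, resid, groups):
    """(X'X)^{-1} [sum_i X_i' w_i w_i' X_i] (X'X)^{-1}, no small-sample factor."""
    XtX_inv = np.linalg.inv(X.T @ X)
    K = X.shape[1]
    meat = np.zeros((K, K))
    for g in np.unique(groups):
        rows = groups == g
        s = X[rows].T @ resid[rows]      # K-vector X_i' w_i
        meat += np.outer(s, s)           # rank-1 K x K outer product
    return XtX_inv @ meat @ XtX_inv

rng = np.random.default_rng(5)
N, T = 200, 4
groups = np.repeat(np.arange(N), T)
u = rng.normal(size=N)                                    # random effects
X = np.column_stack([np.ones(N * T), rng.normal(size=N * T)])
y = X @ np.array([1.0, 2.0]) + u[groups] + rng.normal(size=N * T)

b = np.linalg.lstsq(X, y, rcond=None)[0]                  # pooled OLS
w_hat = y - X @ b
V_cluster = cluster_cov(X, w_hat, groups)

# Same middle matrix via explicit T_i x T_i blocks w_i w_i' (more work, same answer)
meat_big = sum(
    X[groups == g].T @ np.outer(w_hat[groups == g], w_hat[groups == g]) @ X[groups == g]
    for g in range(N)
)
XtX_inv = np.linalg.inv(X.T @ X)
V_big = XtX_inv @ meat_big @ XtX_inv

# Conventional OLS covariance, which ignores the within-individual correlation
V_ols = (w_hat @ w_hat / (N * T - 2)) * XtX_inv
```

Accumulating the K-vectors X_i'ŵ_i avoids ever forming the T_i x T_i matrices ŵ_iŵ_i'. In this simulation the clustered variance of the intercept is larger than the conventional one, reflecting the common u_i within each individual.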

(82)

REM: Sandwich Estimator – Mechanics (Greene)





$$\text{Est.Var}[b|X] = (X'X)^{-1} \left( \sum_{i=1}^{N} X_i' \hat{w}_i \hat{w}_i' X_i \right) (X'X)^{-1}$$

ŵ_i = set of T_i OLS residuals for individual i.

X_i = T_i x K data on exogenous variables for individual i.

X_i’ŵ_i = K x 1 vector of products

(X_i’ŵ_i)(ŵ_i’X_i) = K x K matrix (rank 1, outer product)

$$\sum_{i=1}^{N} X_i' \hat{w}_i \hat{w}_i' X_i = \text{sum of } N \text{ rank 1 matrices. Rank } K.$$

We could compute this as

$$\sum_{i=1}^{N} X_i' \hat{w}_i \hat{w}_i' X_i = \sum_{i=1}^{N} X_i' \hat{\Omega}_i X_i.$$

Why not do it that way?

(83)

Clustered Standard Errors (PCSE)

• Key Assumption

– Correlations within a cluster (a group of firms, a region, different years for the same firm, different years for the same region) are the same for different observations.

• Procedure

– (1) Identify clusters using economic theory (clustered by industry, year, industry and year)

– (2) Calculate clustered standard errors

– (3) Try different ways of defining clusters and see how estimated standard errors are affected. Be conservative, report largest SE.

• Performance

– Not a lot of studies; some simulations done for simple DGPs.

– PCSE’s coverage rates are not very good (typically below their