Christophe Hurlin
University of Orléans
November 23, 2013
Introduction
The objectives of this chapter are the following:
1 Define the multiple linear regression model.
2 Introduce the ordinary least squares (OLS) estimator.
The outline of this chapter is the following:
Section 2: The multiple linear regression model
Section 3: The ordinary least squares estimator
Section 4: Statistical properties of the OLS estimator
Subsection 4.1: Finite sample properties
Subsection 4.2: Asymptotic properties
References
Amemiya T. (1985), Advanced Econometrics, Harvard University Press.
Greene W. (2007), Econometric Analysis, sixth edition, Pearson Prentice Hall (recommended).
Pelgrin F. (2010), Lecture Notes on Advanced Econometrics, HEC Lausanne (a special thanks).
Ruud P. (2000), An Introduction to Classical Econometric Theory, Oxford University Press.
Notation
fY(y): probability density or mass function
FY(y): cumulative distribution function
Pr(): probability
y: vector
Y: matrix
Be careful: in this chapter, I do not distinguish between a random vector (matrix) and a vector (matrix) of deterministic elements. For more appropriate notations, see:
The Multiple Linear Regression Model
Objectives
1 Define the concept of multiple linear regression model.
2 Semi-parametric and parametric multiple linear regression models.
3 The multiple linear Gaussian model.
Definition (Multiple linear regression model)
The multiple linear regression model is used to study the relationship between a dependent variable and one or more independent variables. The generic form of the linear regression model is
y = x1 β1 + x2 β2 + ... + xK βK + ε
where y is the dependent or explained variable and x1, ..., xK are the independent or explanatory variables.
Notations
1 y is the dependent variable, the regressand or the explained variable.
2 xj is an explanatory variable, a regressor or a covariate.
3 ε is the error term or disturbance.
IMPORTANT: do not use the term "residual" for the disturbance ε.
Notations (cont’d)
The term ε is a random disturbance, so named because it "disturbs" an otherwise stable relationship. The disturbance arises for several reasons:
1 Primarily because we cannot hope to capture every influence on an economic variable in a model, no matter how elaborate. The net effect, which can be positive or negative, of these omitted factors is captured in the disturbance.
2 There are many other contributors to the disturbance in an empirical model. Probably the most significant is errors of measurement. It is easy to theorize about the relationships among precisely defined variables; it is quite another matter to obtain accurate measures of these variables.
Notations (cont’d)
We assume that each observation in a sample {yi, xi1, xi2, ..., xiK} for i = 1, ..., N is generated by an underlying process described by
yi = xi1 β1 + xi2 β2 + ... + xiK βK + εi
Remark:
xik = value of the kth explanatory variable for the ith unit of the sample; the index order is x(unit, variable).
Notations (cont’d)
Let the N × 1 column vector xk collect the N observations on variable k, for k = 1, ..., K.
Let us assemble these data in an N × K data matrix, X.
Let y be the N × 1 column vector of the N observations y1, y2, ..., yN. Let ε be the N × 1 column vector containing the N disturbances.
Notations (cont’d)
y (N × 1) = (y1, y2, ..., yi, ..., yN)'
xk (N × 1) = (x1k, x2k, ..., xik, ..., xNk)'
ε (N × 1) = (ε1, ε2, ..., εi, ..., εN)'
β (K × 1) = (β1, β2, ..., βK)'
Notations (cont’d)
X (N × K) = (x1 : x2 : ... : xK) or equivalently

X (N × K) =
[ x11  x12  ..  x1k  ..  x1K ]
[ x21  x22  ..  x2k  ..  x2K ]
[ ..   ..   ..  ..   ..  ..  ]
[ xi1  xi2  ..  xik  ..  xiK ]
[ ..   ..   ..  ..   ..  ..  ]
[ xN1  xN2  ..  xNk  ..  xNK ]
Fact
In most cases, the first column of X is assumed to be a column of 1s, so that β1 is the constant term in the model.
x1 (N × 1) = 1 (N × 1) and

X (N × K) =
[ 1  x12  ..  x1k  ..  x1K ]
[ 1  x22  ..  x2k  ..  x2K ]
[ .. ..   ..  ..   ..  ..  ]
[ 1  xi2  ..  xik  ..  xiK ]
[ .. ..   ..  ..   ..  ..  ]
[ 1  xN2  ..  xNk  ..  xNK ]
Remark
More generally, the matrix X may contain both stochastic and non-stochastic elements, such as:
Constant;
Time trend;
Dummy variables (for specific episodes in time);
Etc.
Therefore, X is generally a mixture of fixed and random variables.
Definition (Simple linear regression model)
The simple linear regression model is a model with only one stochastic regressor: K = 1 if there is no constant,
yi = β1 xi + εi
or K = 2 if there is a constant:
yi = β1 + β2 xi2 + εi
for i = 1, ..., N, or in vector form
y = β1 + β2 x2 + ε
Definition (Multiple linear regression model)
The multiple linear regression model can be written as
y (N × 1) = X (N × K) β (K × 1) + ε (N × 1)
One key difference for the specification of the MLRM: parametric versus semi-parametric specification.
Parametric model: the distribution of the error terms is fully characterized, e.g. ε ∼ N(0, Ω).
Semi-parametric specification: only a few moments of the error terms are specified, e.g. E(ε) = 0 and V(ε) = E(εε') = Ω.
This difference does not matter for the derivation of the ordinary least squares estimator.
But this difference matters for (among others):
1 The characterization of the statistical properties of the OLS estimator (e.g., efficiency);
2 The choice of alternative estimators (e.g., the maximum likelihood estimator);
3 Etc.
Definition (Semi-parametric multiple linear regression model)
The semi-parametric multiple linear regression model is defined by
y = Xβ + ε
where the error term ε satisfies
E(ε|X) = 0 (N × 1)
V(ε|X) = σ² IN (N × N)
and IN is the identity matrix of order N.
Remarks
1 If the matrix X is non-stochastic (fixed), i.e. there are only fixed regressors, then the conditions on the error term ε read:
E(ε) = 0    V(ε) = σ² IN
2 If the (conditional) variance-covariance matrix of ε is not diagonal, i.e. if
V(ε|X) = Ω
the model is called the multiple generalized linear regression model.
Remarks (cont’d)
The two conditions on the error term ε,
E(ε|X) = 0 (N × 1)    V(ε|X) = σ² IN
are equivalent to
E(y|X) = Xβ    V(y|X) = σ² IN
Definition (The multiple linear Gaussian model)
The (parametric) multiple linear Gaussian model is defined by
y = Xβ + ε
where the error term ε is normally distributed:
ε ∼ N(0, σ² IN)
As a consequence, the vector y has a conditional normal distribution:
y|X ∼ N(Xβ, σ² IN)
Remarks
1 The multiple linear Gaussian model is (by definition) a parametric model.
2 If the matrix X is non-stochastic (fixed), i.e. there are only fixed regressors, then the vector y has a marginal normal distribution:
y ∼ N(Xβ, σ² IN)
The classical linear regression model consists of a set of assumptions that describes how the data set is produced by a data generating process (DGP):
Assumption 1: Linearity
Assumption 2: Full rank condition or identification
Assumption 3: Exogeneity
Assumption 4: Spherical error terms
Assumption 5: Data generation
Assumption 6: Normal distribution
Definition (Assumption 1: Linearity)
The model is linear with respect to the parameters β1, ..., βK.
Linearity restricts the way the parameters enter the model, not the relationship between the dependent variable and the regressors. For instance, the models
y = β0 + β1 x + u
y = β0 + β1 cos(x) + v
y = β0 + β1 (1/x) + w
are all linear (with respect to β).
In contrast, the model y = β0 + β1 x^β2 + ε is nonlinear.
Remark
The model can be linear after some transformations. Starting from y = A x^β exp(ε), one obtains a log-linear specification:
ln(y) = ln(A) + β ln(x) + ε
Definition (Log-linear model)
The log-linear model is
ln(yi) = β1 ln(xi1) + β2 ln(xi2) + ... + βK ln(xiK) + εi
This equation is also known as the constant elasticity form: in this equation, the elasticity of y with respect to changes in x does not vary with xik:
βk = ∂ln(yi) / ∂ln(xik) = (∂yi / ∂xik) (xik / yi)
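The constant-elasticity property can be checked numerically. The sketch below uses hypothetical values of A and β (chosen only for illustration) and verifies that (dy/dx)(x/y) equals β at every point:

```python
import numpy as np

# Hypothetical constant-elasticity relation y = A * x^beta
# (A and beta are illustrative values, not taken from the text).
A, beta = 2.0, 0.7
x = np.linspace(1.0, 10.0, 50)
y = A * x**beta

# Elasticity = (dy/dx) * (x/y), computed with a central finite difference.
h = 1e-6
dydx = (A * (x + h)**beta - A * (x - h)**beta) / (2 * h)
elasticity = dydx * x / y

# The elasticity is constant and equal to beta, whatever the value of x.
```

A log-linear regression of ln(y) on ln(x) would recover exactly this slope.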
Definition (Assumption 2: Full column rank)
X is an N × K matrix with rank K.
Interpretation
1 There is no exact relationship among any of the independent variables in the model.
2 The columns of X are linearly independent.
Example
Suppose that a cross-section model satisfies:
yi = β0 + β1 non labor incomei + β2 salaryi + β3 total incomei + εi
The identification condition does not hold, since total income is exactly equal to salary plus non-labor income (exact linear dependency in the model).
Remarks
1 Perfect multicollinearity is generally not difficult to spot and is signalled by most statistical software.
2 Imperfect multicollinearity is a more serious issue (see further).
Definition (Identification)
The multiple linear regression model is identifiable if and only if one of the following equivalent assertions holds:
(i) rank(X) = K
(ii) The matrix X'X is invertible
(iii) The columns of X form a basis of L(X)
(iv) Xβ1 = Xβ2 ⟹ β1 = β2, for all (β1, β2) ∈ R^K × R^K
(v) Xβ = 0 ⟹ β = 0, for all β ∈ R^K
(vi) ker(X) = {0}
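The failure of assumption A2 in the income example can be checked numerically. The data below are simulated for illustration (only the exact dependency total = salary + non-labor income is taken from the example):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100

# Simulated (hypothetical) data reproducing the exact linear dependency
# of the example: total income = salary + non-labor income.
nonlabor = rng.uniform(0, 10, N)
salary = rng.uniform(20, 50, N)
total = nonlabor + salary

# X has K = 4 columns: constant, non-labor income, salary, total income.
X = np.column_stack([np.ones(N), nonlabor, salary, total])

rank = np.linalg.matrix_rank(X)                        # 3 < K = 4: A2 fails
smallest_sv = np.linalg.svd(X, compute_uv=False)[-1]   # numerically zero
```

Since rank(X) < K, X'X is singular and the inverse (X'X)⁻¹ appearing in the OLS formula does not exist: β is not identified.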
Definition (Assumption 3: Strict exogeneity of the regressors)
The regressors are exogenous in the sense that
E(ε|X) = 0 (N × 1)
or equivalently, for all units i ∈ {1, ..., N},
E(εi|X) = 0
or equivalently
E(εi|xjk) = 0
for any explanatory variable k ∈ {1, ..., K} and any unit j ∈ {1, ..., N}.
Comments
1 The expected value of the error term at observation i (in the sample) is not a function of the independent variables observed at any observation (including the ith observation). The independent variables are not predictors of the error terms.
2 The strict exogeneity condition can be rewritten as:
E(y|X) = Xβ
3 If the regressors are fixed, this condition can be rewritten as:
E(ε) = 0 (N × 1)
Implications
The (strict) exogeneity condition E(ε|X) = 0 (N × 1) has two implications:
1 The zero conditional mean of ε implies that the unconditional mean of ε is also zero (the reverse is not true):
E(ε) = E_X(E(ε|X)) = E_X(0) = 0
2 The zero conditional mean of ε implies that (the reverse is not true):
E(εi xjk) = 0 for all i, j, k
or
Cov(εi, X) = 0 for all i
Definition (Assumption 4: Spherical disturbances)
The error terms are such that
V(εi|X) = E(εi²|X) = σ² for all i ∈ {1, ..., N}
and
Cov(εi, εj|X) = E(εi εj|X) = 0 for all i ≠ j
The condition of constant variances is called homoscedasticity. The uncorrelatedness across observations is called nonautocorrelation.
Comments
1 Spherical disturbances = homoscedasticity + nonautocorrelation.
2 If the errors are not spherical, we call them nonspherical disturbances.
3 The assumption of homoscedasticity is a strong one: it is the exception rather than the rule!
Comments
Let us consider the (conditional) N × N variance-covariance matrix of the error terms:

V(ε|X) = E(εε'|X) =
[ E(ε1²|X)    E(ε1ε2|X)  ..  E(ε1εj|X)  ..  E(ε1εN|X) ]
[ E(ε2ε1|X)   E(ε2²|X)   ..  E(ε2εj|X)  ..  E(ε2εN|X) ]
[ ..          ..         ..  ..         ..  ..        ]
[ E(εiε1|X)   ..         ..  E(εiεj|X)  ..  E(εiεN|X) ]
[ ..          ..         ..  ..         ..  ..        ]
[ E(εNε1|X)   ..         ..  E(εNεj|X)  ..  E(εN²|X)  ]
Comments
The two assumptions (homoscedasticity and nonautocorrelation) imply that:

V(ε|X) = E(εε'|X) = σ² IN =
[ σ²  0   ..  0   ..  0  ]
[ 0   σ²  ..  0   ..  0  ]
[ ..  ..  ..  ..  ..  .. ]
[ 0   ..  ..  σ²  ..  0  ]
[ ..  ..  ..  ..  ..  .. ]
[ 0   ..  0   ..  ..  σ² ]
Definition (Assumption 5: Data generation)
The data in (xi1, xi2, ..., xiK) may be any mixture of constants and random variables.
Comments
1 The analysis will be done conditionally on the observed X, so whether the elements in X are fixed constants or random draws from a stochastic process will not influence the results.
2 In the case of stochastic regressors, the unconditional statistical properties of the estimators are obtained in two steps: (1) using the result conditioned on X, and (2) finding the unconditional result by "averaging" (i.e., integrating over) the conditional distributions.
Comments
An assumption regarding (xi1, xi2, ..., xiK, yi) for i = 1, ..., N is also required. This is a statement about how the sample is drawn.
In the sequel, we assume that (xi1, xi2, ..., xiK, yi) for i = 1, ..., N are independently and identically distributed (i.i.d.).
The observations are drawn by simple random sampling from a large population.
Definition (Assumption 6: Normal distribution)
The disturbances are normally distributed:
εi|X ∼ N(0, σ²)
or equivalently
ε|X ∼ N(0 (N × 1), σ² IN)
Comments
1 Once again, this is a convenience that we will dispense with after some analysis of its implications.
2 Normality is not necessary to obtain many of the results presented below.
3 Assumption 6 implies assumptions 3 (exogeneity) and 4 (spherical disturbances).
Summary
The main assumptions of the multiple linear regression model:
A1 (linearity): the model is linear in β
A2 (identification): X is an N × K matrix with rank K
A3 (exogeneity): E(ε|X) = 0 (N × 1)
A4 (spherical error terms): V(ε|X) = σ² IN
A5 (data generation): X may be fixed or random
A6 (normal distribution): ε|X ∼ N(0 (N × 1), σ² IN)
Key Concepts
1 Simple linear regression model
2 Multiple linear regression model
3 Semi-parametric multiple linear regression model
4 Multiple linear Gaussian model
5 Assumptions of the multiple linear regression model
6 Linearity (A1), Identi…cation (A2), Exogeneity (A3), Spherical error terms (A4), Data generation (A5) and Normal distribution (A6)
The ordinary least squares estimator
Introduction
1 The multiple linear regression model assumes that the following specification is true in the population:
y = Xβ + ε
where other unobserved factors determining y are captured by the error term ε.
2 Consider a sample {xi1, xi2, ..., xiK, yi}, i = 1, ..., N, of i.i.d. random variables (be careful with the change of notation here) and only one realization of this sample (your data set).
3 How to estimate the vector of parameters β?
Introduction (cont’d)
1 If we assume that assumptions A1-A6 hold, we have a multiple linear Gaussian model (parametric model), and a solution is to use the maximum likelihood estimator (MLE). The MLE of β coincides with the ordinary least squares (OLS) estimator (cf. chapter 2).
2 If we assume that only assumptions A1-A5 hold, we have a semi-parametric multiple linear regression model, and the MLE is infeasible.
3 In this case, the only solution is to use the ordinary least squares (OLS) estimator.
Intuition
Let us consider the simple linear regression model and, for simplicity, denote xi = xi2:
yi = β1 + β2 xi + εi
The general idea of OLS consists in minimizing the "distance" between the points (xi, yi) and the regression line ŷi = β̂1 + β̂2 xi, i.e. the points (xi, ŷi), for all i = 1, ..., N.
Estimates of β1 and β2 are chosen by minimizing the sum of squared residuals (SSR):
SSR = Σ_{i=1}^N ε̂i²
This SSR can be written as:
Σ_{i=1}^N ε̂i² = Σ_{i=1}^N (yi − β̂1 − β̂2 xi)²
Therefore, β̂1 and β̂2 are the solutions of the minimization problem
(β̂1, β̂2) = arg min_{(β1, β2)} Σ_{i=1}^N (yi − β1 − β2 xi)²
Definition (OLS - simple linear regression model)
In the simple linear regression model yi = β1 + β2 xi + εi, the OLS estimators β̂1 and β̂2 are the solutions of the minimization problem
(β̂1, β̂2) = arg min_{(β1, β2)} Σ_{i=1}^N (yi − β1 − β2 xi)²
The solutions are:
β̂1 = ȳN − β̂2 x̄N
β̂2 = Σ_{i=1}^N (xi − x̄N)(yi − ȳN) / Σ_{i=1}^N (xi − x̄N)²
where ȳN = N⁻¹ Σ_{i=1}^N yi and x̄N = N⁻¹ Σ_{i=1}^N xi respectively denote the sample means of y and x.
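The closed-form solutions above can be computed directly; the data-generating values in this sketch are assumptions made up for the example:

```python
import numpy as np

# Simulate a sample from y_i = 1 + 2*x_i + eps_i (illustrative values).
rng = np.random.default_rng(42)
N = 200
x = rng.uniform(0, 5, N)
y = 1.0 + 2.0 * x + rng.normal(0, 1, N)

# Closed-form OLS solutions for the simple linear regression model.
xbar, ybar = x.mean(), y.mean()
b2_hat = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
b1_hat = ybar - b2_hat * xbar
```

With N = 200 observations, the estimates land close to the true values (1, 2) used in the simulation.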
Remark
The OLS estimator is a linear estimator (cf. chapter 1), since it can be expressed as a linear function of the observations yi:
β̂2 = Σ_{i=1}^N ωi yi
with
ωi = (xi − x̄N) / Σ_{i=1}^N (xi − x̄N)²
in the case where ȳN = 0.
Definition (Fitted value)
The predicted or fitted value for observation i is:
ŷi = β̂1 + β̂2 xi
with a sample mean equal to the sample average of the observations:
(1/N) Σ_{i=1}^N ŷi = ȳN = (1/N) Σ_{i=1}^N yi
Definition (Fitted residual)
The residual for observation i is:
ε̂i = yi − β̂1 − β̂2 xi
with a sample mean equal to zero by definition:
(1/N) Σ_{i=1}^N ε̂i = 0
Remarks
1 The fit of the regression is "good" if the sum Σ_{i=1}^N ε̂i² (or SSR) is "small", i.e., the unexplained part of the variance of y is "small".
2 The coefficient of determination, or R², is given by:
R² = Σ_{i=1}^N (ŷi − ȳN)² / Σ_{i=1}^N (yi − ȳN)² = 1 − Σ_{i=1}^N ε̂i² / Σ_{i=1}^N (yi − ȳN)²
Orthogonality conditions
Under assumption A3 (strict exogeneity), we have E(εi|xi) = 0. This condition implies that:
E(εi) = 0    E(εi xi) = 0
Using the sample analogs of these moment conditions (cf. chapter 6, GMM), one has:
(1/N) Σ_{i=1}^N (yi − β̂1 − β̂2 xi) = 0
(1/N) Σ_{i=1}^N (yi − β̂1 − β̂2 xi) xi = 0
Definition (Orthogonality conditions)
The ordinary least squares estimator can be defined from the sample analogs of the two following moment conditions:
E(εi) = 0    E(εi xi) = 0
The corresponding system of equations is just-identified.
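The two sample moment conditions form a just-identified 2 × 2 linear system in (β̂1, β̂2), and solving it reproduces the closed-form OLS estimates. A sketch on simulated data (the parameter values are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100
x = rng.normal(2, 1, N)
y = 0.5 + 1.5 * x + rng.normal(0, 1, N)

# Sample analogs of E(eps_i) = 0 and E(eps_i * x_i) = 0 rearrange to:
#   b1 + b2*mean(x)           = mean(y)
#   b1*mean(x) + b2*mean(x^2) = mean(x*y)
A = np.array([[1.0, x.mean()],
              [x.mean(), np.mean(x * x)]])
c = np.array([y.mean(), np.mean(x * y)])
b1_hat, b2_hat = np.linalg.solve(A, c)
```

The solution of the moment system is identical to the minimizer of the SSR.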
OLS and the multiple linear regression model
Now consider the multiple linear regression model
y = Xβ + ε
or
yi = Σ_{k=1}^K βk xik + εi
Objective: find an estimator (estimate) of β1, β2, ..., βK and σ² under assumptions A1-A5.
OLS and the multiple linear regression model
Different methods:
1 Minimize the sum of squared residuals (SSR).
2 Solve the same minimization problem with matrix notation.
3 Use moment conditions.
4 Geometric interpretation.
1. Minimize the sum of squared residuals (SSR):
As in the simple linear regression,
β̂ = arg min_β Σ_{i=1}^N εi² = arg min_β Σ_{i=1}^N (yi − Σ_{k=1}^K βk xik)²
One can derive the first order conditions with respect to βk for k = 1, ..., K and solve a system of K equations with K unknowns.
2. Using matrix notations:
Definition (OLS and multiple linear regression model)
In the multiple linear regression model yi = xi'β + εi, with xi = (xi1, ..., xiK)', the OLS estimator β̂ is the solution of
β̂ = arg min_β Σ_{i=1}^N (yi − xi'β)²
The OLS estimator of β is:
β̂ = (Σ_{i=1}^N xi xi')⁻¹ (Σ_{i=1}^N xi yi)
2. Using matrix notations:
Definition (Normal equations)
Under suitable regularity conditions, in the multiple linear regression model yi = xi'β + εi, with xi = (xi1 : ... : xiK)', the normal equations are
Σ_{i=1}^N xi (yi − xi'β̂) = 0 (K × 1)
2. Using matrix notations:
Definition (OLS and multiple linear regression model)
In the multiple linear regression model y = Xβ + ε, the OLS estimator β̂ is the solution of the minimization problem
β̂ = arg min_β ε'ε = arg min_β (y − Xβ)'(y − Xβ)
The OLS estimator of β is:
β̂ = (X'X)⁻¹ X'y
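A minimal numerical sketch of the matrix formula, on simulated data with assumed true parameters. Note that solving the normal equations with np.linalg.solve is numerically preferable to forming the inverse (X'X)⁻¹ explicitly:

```python
import numpy as np

rng = np.random.default_rng(7)
N, K = 100, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
beta_true = np.array([1.0, -2.0, 0.5])   # assumed values for illustration
y = X @ beta_true + rng.normal(0, 1, N)

# OLS: beta_hat = (X'X)^(-1) X'y, computed by solving the normal equations.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
```

The same estimate is returned by any least-squares routine, e.g. np.linalg.lstsq.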
2. Using matrix notations:
Definition
The ordinary least squares estimator β̂ of β minimizes the following criterion:
s(β) = ||y − Xβ||² = (y − Xβ)'(y − Xβ)
2. Using matrix notations:
The FOC (normal equations) are defined by:
∂s(β)/∂β evaluated at β̂:  −2 X'(y − Xβ̂) = 0 (K × 1)
where X' is K × N and (y − Xβ̂) is N × 1.
The second-order conditions hold:
∂²s(β)/∂β∂β' evaluated at β̂:  2 X'X (K × K) is positive definite
since, under assumption A2 (full column rank), X'X is a positive definite matrix. We have a minimum.
2. Using matrix notations:
Definition (Normal equations)
Under suitable regularity conditions, in the multiple linear regression model y = Xβ + ε, the normal equations are given by:
X'(y − Xβ̂) = 0 (K × 1)
where X' is K × N and (y − Xβ̂) is N × 1.
Definition (Unbiased variance estimator)
In the multiple linear regression model y = Xβ + ε, the unbiased estimator of σ² is given by:
σ̂² = (1/(N − K)) Σ_{i=1}^N ε̂i² = SSR / (N − K)
2. Using matrix notations:
The estimator σ̂² can also be written as:
σ̂² = (1/(N − K)) Σ_{i=1}^N (yi − xi'β̂)²
= (y − Xβ̂)'(y − Xβ̂) / (N − K)
= ||y − Xβ̂||² / (N − K)
3. Using moment conditions:
Under assumption A3 (strict exogeneity), we have E(ε|X) = 0. This condition implies:
E(εi xi) = 0 (K × 1)
with xi = (xi1 : ... : xiK)'. Using the sample analogs, one has:
(1/N) Σ_{i=1}^N xi (yi − xi'β̂) = 0 (K × 1)
We have K (normal) equations with K unknown parameters β̂1, ..., β̂K. The system is just-identified.
4. Geometric interpretation:
1 The ordinary least squares estimation method consists in determining the adjusted vector ŷ which is the closest to y (in a certain space...), such that the squared norm between y and ŷ is minimized.
2 Finding ŷ is equivalent to finding an estimator of β.
4. Geometric interpretation:
Definition (Geometric interpretation)
The adjusted vector ŷ is the (orthogonal) projection of y onto the column space of X. The fitted error term ε̂ is the projection of y onto the space orthogonal to the column space of X. The vectors ŷ and ε̂ are orthogonal.
Source: F. Pelgrin (2010), Lecture notes, Advanced Econometrics
4. Geometric interpretation:
Definition (Projection matrices)
The vectors ŷ and ε̂ are defined to be:
ŷ = P y    ε̂ = M y
where P and M denote the two following projection matrices:
P = X (X'X)⁻¹ X'
M = IN − P = IN − X (X'X)⁻¹ X'
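The algebraic properties of P and M (symmetry, idempotency, MX = 0, orthogonality of ŷ and ε̂) can be checked numerically. Everything below is an illustrative sketch on simulated data with assumed parameter values:

```python
import numpy as np

rng = np.random.default_rng(3)
N, K = 50, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=N)  # assumed values

P = X @ np.linalg.solve(X.T @ X, X.T)  # projection onto the column space of X
M = np.eye(N) - P                      # "residual maker" projection

y_hat = P @ y                          # fitted values
e_hat = M @ y                          # residuals; note y = y_hat + e_hat
```

Since P and M project onto orthogonal complementary spaces, ŷ'ε̂ = 0 and y decomposes exactly as ŷ + ε̂.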
Other geometric interpretations:
Suppose that there is a constant term in the model.
1 The least squares residuals sum to zero:
Σ_{i=1}^N ε̂i = 0
2 The regression hyperplane passes through the point of means of the data (x̄N, ȳN).
3 The mean of the fitted (adjusted) values of y equals the mean of the actual values of y:
(1/N) Σ_{i=1}^N ŷi = ȳN
Definition (Coefficient of determination)
The coefficient of determination of the multiple linear regression model (with a constant term) is the ratio of the total (empirical) variance explained by the model to the total (empirical) variance of y:
R² = Σ_{i=1}^N (ŷi − ȳN)² / Σ_{i=1}^N (yi − ȳN)² = 1 − Σ_{i=1}^N ε̂i² / Σ_{i=1}^N (yi − ȳN)²
Remark
1 The coefficient of determination measures the proportion of the total variance (or variability) in y that is accounted for by variation in the regressors (or the model).
2 Problem: the R² automatically and spuriously increases when extra explanatory variables are added to the model.
Definition (Adjusted R-squared)
The adjusted R-squared coefficient is defined to be:
R̄² = 1 − ((N − 1) / (N − p − 1)) (1 − R²)
where p denotes the number of regressors (not counting the constant term, i.e., p = K − 1 if there is a constant, p = K otherwise).
Remark
One can show that:
1 R̄² < R²
2 If N is large, R̄² ≈ R²
3 The adjusted R-squared R̄² can be negative.
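Both coefficients can be computed in a few lines; the sketch below uses simulated data with assumed parameter values, a constant term, and p = K − 1 non-constant regressors:

```python
import numpy as np

rng = np.random.default_rng(5)
N, K = 60, 4                       # K columns of X, including the constant
p = K - 1                          # regressors not counting the constant
X = np.column_stack([np.ones(N), rng.normal(size=(N, p))])
y = X @ np.array([1.0, 0.5, -0.3, 0.2]) + rng.normal(size=N)  # assumed values

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e_hat = y - X @ beta_hat

R2 = 1 - np.sum(e_hat**2) / np.sum((y - y.mean())**2)
R2_adj = 1 - (N - 1) / (N - p - 1) * (1 - R2)   # below R2 whenever p >= 1
```

Since (N − 1)/(N − p − 1) > 1 for p ≥ 1, the adjusted coefficient is always below R² unless R² = 1.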
Key Concepts
1 OLS estimator and estimate
2 Fitted or predicted value
3 Residual or fitted residual
4 Orthogonality conditions
5 Normal equations
6 Geometric interpretations of the OLS
7 Coefficient of determination and adjusted R-squared
Statistical properties of the OLS estimator
In order to study the statistical properties of the OLS estimator, we have to distinguish (cf. chapter 1):
1 The finite sample properties;
2 The large sample or asymptotic properties.
But we also have to distinguish the properties according to the assumptions made on the linear regression model:
1 Semi-parametric linear regression model (the exact distribution of ε is unknown) versus parametric linear regression model (and especially the Gaussian linear regression model, assumption A6).
2 X is a matrix of random regressors versus X is a matrix of fixed regressors.
Fact (Assumptions)
In the rest of this section, we assume that assumptions A1-A5 hold:
A1 (linearity): the model is linear in β
A2 (identification): X is an N × K matrix with rank K
A3 (exogeneity): E(ε|X) = 0 (N × 1)
A4 (spherical error terms): V(ε|X) = σ² IN
A5 (data generation): X may be fixed or random
Finite sample properties of the OLS estimator
Objectives
The objectives of this subsection are the following:
1 Compute the first two moments of the (unknown) finite sample distribution of the OLS estimators β̂ and σ̂².
2 Determine the finite sample distribution of the OLS estimators β̂ and σ̂² under particular assumptions (A6).
3 Determine whether the OLS estimators are "good": efficient estimator versus BLUE.
4 Introduce the Gauss-Markov theorem.
First moments of the OLS estimators
Moments
In a first step, we will derive the first moments of the OLS estimators:
1 Step 1: compute E(β̂) and V(β̂)
2 Step 2: compute E(σ̂²) and V(σ̂²)
Definition (Unbiased estimator)
In the multiple linear regression model y = Xβ0 + ε, under assumption A3 (strict exogeneity), the OLS estimator β̂ is unbiased:
E(β̂) = β0
where β0 denotes the true value of the vector of parameters. This result holds whether or not the matrix X is considered as random.
Proof
Case 1: fixed regressors (cf. chapter 1)
β̂ = (X'X)⁻¹ X'y = β0 + (X'X)⁻¹ X'ε
So, if X is a matrix of fixed regressors:
E(β̂) = β0 + (X'X)⁻¹ X' E(ε)
Under assumption A3 (exogeneity), E(ε|X) = E(ε) = 0. Then, we get:
E(β̂) = β0
Proof (cont'd)
Case 2: random regressors
β̂ = (X'X)⁻¹ X'y = β0 + (X'X)⁻¹ X'ε
If X includes some random elements:
E(β̂|X) = β0 + (X'X)⁻¹ X' E(ε|X)
Under assumption A3 (exogeneity), E(ε|X) = 0. Then, we get:
E(β̂|X) = β0
The OLS estimator β̂ is thus conditionally unbiased. Besides, by the law of iterated expectations, we have:
E(β̂) = E_X(E(β̂|X)) = E_X(β0) = β0
where E_X denotes the expectation with respect to the distribution of X.
So, the OLS estimator β̂ is unbiased: E(β̂) = β0
Definition (Variance of the OLS estimator, non-stochastic regressors)
In the multiple linear regression model y = Xβ0 + ε, if the matrix X is non-stochastic, the (unconditional) variance-covariance matrix of the OLS estimator β̂ is:
V(β̂) = σ² (X'X)⁻¹
Proof
β̂ = (X'X)⁻¹ X'y = β0 + (X'X)⁻¹ X'ε
So, if X is a matrix of fixed regressors:
V(β̂) = E((β̂ − β0)(β̂ − β0)')
= E((X'X)⁻¹ X' εε' X (X'X)⁻¹)
= (X'X)⁻¹ X' E(εε') X (X'X)⁻¹
Under assumption A4 (spherical disturbances), we have:
V(ε) = E(εε') = σ² IN
The variance-covariance matrix of the OLS estimator is then:
V(β̂) = (X'X)⁻¹ X' E(εε') X (X'X)⁻¹
= (X'X)⁻¹ X' σ² IN X (X'X)⁻¹
= σ² (X'X)⁻¹ X'X (X'X)⁻¹
= σ² (X'X)⁻¹
Definition (Variance of the OLS estimator, stochastic regressors)
In the multiple linear regression model y = Xβ0 + ε, if the matrix X is stochastic, the conditional variance-covariance matrix of the OLS estimator β̂ is:
V(β̂|X) = σ² (X'X)⁻¹
The unconditional variance-covariance matrix is equal to:
V(β̂) = σ² E_X((X'X)⁻¹)
where E_X denotes the expectation with respect to the distribution of X.
Proof
β̂ = (X'X)⁻¹ X'y = β0 + (X'X)⁻¹ X'ε
So, if X is a stochastic matrix:
V(β̂|X) = E((β̂ − β0)(β̂ − β0)' | X)
= E((X'X)⁻¹ X' εε' X (X'X)⁻¹ | X)
= (X'X)⁻¹ X' E(εε'|X) X (X'X)⁻¹
Proof (cont'd)
Under assumption A4 (spherical disturbances), we have:
V(ε|X) = E(εε'|X) = σ² IN
The conditional variance-covariance matrix of the OLS estimator is then:
V(β̂|X) = (X'X)⁻¹ X' E(εε'|X) X (X'X)⁻¹
= (X'X)⁻¹ X' σ² IN X (X'X)⁻¹
= σ² (X'X)⁻¹
Proof (cont'd)
We have:
V(β̂|X) = σ² (X'X)⁻¹
Since E(β̂|X) = β0 does not depend on X, the variance decomposition V(β̂) = E_X(V(β̂|X)) + V_X(E(β̂|X)) has a null second term, so the (unconditional) variance-covariance matrix of the OLS estimator is:
V(β̂) = E_X(V(β̂|X)) = σ² E_X((X'X)⁻¹)
where E_X denotes the expectation with respect to the distribution of X.
Summary
             Case 1: X stochastic             Case 2: X non-stochastic
Mean         E(β̂) = β0                       E(β̂) = β0
Variance     V(β̂) = σ² E_X((X'X)⁻¹)         V(β̂) = σ² (X'X)⁻¹
Cond. mean   E(β̂|X) = β0                     —
Cond. var    V(β̂|X) = σ² (X'X)⁻¹             —
Question
How can we estimate the variance-covariance matrix of the OLS estimator?
V(β̂OLS) = σ² (X'X)⁻¹ if X is non-stochastic
V(β̂OLS) = σ² E_X((X'X)⁻¹) if X is stochastic
Question (cont'd)
Definition (Variance estimator)
An unbiased estimator of the variance-covariance matrix of the OLS estimator is given by:
V̂(β̂OLS) = σ̂² (X'X)⁻¹
where σ̂² = (N − K)⁻¹ ε̂'ε̂ is an unbiased estimator of σ². This result holds whether X is stochastic or non-stochastic.
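The feasible variance estimator can be sketched numerically; the design and true parameter values below (σ = 1.5) are assumptions chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(11)
N, K = 120, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
y = X @ np.array([2.0, 1.0, -0.5]) + rng.normal(0, 1.5, N)  # sigma = 1.5

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e_hat = y - X @ beta_hat

sigma2_hat = e_hat @ e_hat / (N - K)            # unbiased estimator of sigma^2
V_hat = sigma2_hat * np.linalg.inv(X.T @ X)     # estimated var-cov of beta_hat
std_err = np.sqrt(np.diag(V_hat))               # standard errors of beta_hat
```

The diagonal of V̂(β̂OLS) gives the squared standard errors that regression software reports next to each coefficient.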
Summary
            Case 1: X stochastic             Case 2: X non-stochastic
Variance    V(β̂) = σ² E_X((X'X)⁻¹)         V(β̂) = σ² (X'X)⁻¹
Estimator   V̂(β̂OLS) = σ̂² (X'X)⁻¹         V̂(β̂OLS) = σ̂² (X'X)⁻¹
Definition (Estimator of the variance of the disturbances)
Under assumptions A1-A5, in the multiple linear regression model y = Xβ + ε, the estimator σ̂² is unbiased:
E(σ̂²) = σ²
where
σ̂² = (1/(N − K)) Σ_{i=1}^N ε̂i² = ε̂'ε̂ / (N − K)
This result holds whether or not the matrix X is considered as random.
We assume that X is stochastic. Let M denote the projection matrix ("residual maker") defined by:
M = IN − X (X'X)⁻¹ X'
with
ε̂ (N × 1) = M (N × N) y (N × 1)
The N × N matrix M satisfies the following properties:
1 If X is regressed on X, a perfect fit will result and the residuals will be zero, so MX = 0.
2 The matrix M is symmetric (M' = M) and idempotent (MM = M).
Proof (cont'd)
The residuals are defined to be:
ε̂ = M y
Since y = Xβ + ε, we have
ε̂ = M(Xβ + ε) = MXβ + Mε
Since MX = 0, we have
ε̂ = Mε
Proof (cont'd)
The estimator σ̂² is based on the sum of squared residuals (SSR):
σ̂² = ε̂'ε̂ / (N − K) = ε'Mε / (N − K)
The expected value of the SSR is
E(ε̂'ε̂|X) = E(ε'Mε|X)
The quantity ε'Mε is a 1 × 1 scalar, so it is equal to its trace:
E(ε'Mε|X) = E(tr(ε'Mε)|X) = E(tr(Mεε')|X)
since tr(AB) = tr(BA).
Proof (cont'd)
Since M = IN − X(X'X)⁻¹X' depends only on X, we have:
E(ε̂'ε̂|X) = tr(E(Mεε'|X)) = tr(M E(εε'|X))
Under assumptions A3 and A4, we have
E(εε'|X) = σ² IN
As a consequence,
E(ε̂'ε̂|X) = tr(σ² M IN) = σ² tr(M)
Proof (cont'd)
E(ε̂'ε̂|X) = σ² tr(M)
= σ² tr(IN − X(X'X)⁻¹X')
= σ² tr(IN) − σ² tr(X(X'X)⁻¹X')
= σ² tr(IN) − σ² tr(X'X(X'X)⁻¹)
= σ² tr(IN) − σ² tr(IK)
= σ² (N − K)
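The key step tr(M) = N − K is easy to verify numerically; X below is simulated only for illustration (any full-column-rank N × K matrix would do):

```python
import numpy as np

rng = np.random.default_rng(2)
N, K = 40, 5
X = rng.normal(size=(N, K))            # any full-rank N x K matrix

M = np.eye(N) - X @ np.linalg.solve(X.T @ X, X.T)
trace_M = np.trace(M)                  # equals N - K = 35 up to rounding error
```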
Proof (cont'd)
By definition of σ̂², we have:
E(σ̂²|X) = E(ε̂'ε̂|X) / (N − K) = σ²(N − K) / (N − K) = σ²
So, the estimator σ̂² is conditionally unbiased. By the law of iterated expectations,
E(σ̂²) = E_X(E(σ̂²|X)) = E_X(σ²) = σ²
The estimator σ̂² is unbiased:
E(σ̂²) = σ²
Remark
Following the same principle, we can compute the variance of the estimator σ̂²:
σ̂² = ε̂'ε̂ / (N − K) = ε'Mε / (N − K)
As a consequence, we have:
V(σ̂²|X) = (1/(N − K)²) V(ε'Mε|X)
and V(σ̂²) = E_X(V(σ̂²|X)), since E(σ̂²|X) = σ² does not depend on X.
But the computation takes... at least ten slides...
Definition (Variance of the estimator σ̂²)
In the multiple linear regression model y = Xβ0 + ε with normally distributed errors (assumption A6), the variance of the estimator σ̂² is
V(σ̂²) = 2σ⁴ / (N − K)
where σ² denotes the true value of the variance of the error terms. This result holds whether or not the matrix X is considered as random.
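This result can be checked by Monte Carlo under normality; the design below is an assumed illustration (N = 50, K = 2, σ² = 1), and σ̂² is simulated through the ε'Mε/(N − K) representation used in the proof:

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, sigma2 = 50, 2, 1.0
X = np.column_stack([np.ones(N), rng.normal(size=N)])
M = np.eye(N) - X @ np.linalg.solve(X.T @ X, X.T)

reps = 20000
sigma2_hats = np.empty(reps)
for r in range(reps):
    eps = rng.normal(0.0, np.sqrt(sigma2), N)   # Gaussian errors (A6)
    sigma2_hats[r] = eps @ M @ eps / (N - K)    # sigma^2-hat, one replication

mc_mean = sigma2_hats.mean()        # close to sigma2 (unbiasedness)
mc_var = sigma2_hats.var()          # close to 2*sigma2^2 / (N - K)
theory_var = 2 * sigma2**2 / (N - K)
```

With 20,000 replications, the simulated mean and variance of σ̂² match the theoretical values to within a few percent.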