Chapter 3: The Multiple Linear Regression Model

(1)

Christophe Hurlin

University of Orléans

November 23, 2013

(2)

Introduction

(3)

The objectives of this chapter are the following:

1 Define the multiple linear regression model.

2 Introduce the ordinary least squares (OLS) estimator.

(4)

The outline of this chapter is the following:

Section 2: The multiple linear regression model
Section 3: The ordinary least squares estimator
Section 4: Statistical properties of the OLS estimator
Subsection 4.1: Finite sample properties
Subsection 4.2: Asymptotic properties

(5)

References

Amemiya T. (1985), Advanced Econometrics, Harvard University Press.

Greene W. (2007), Econometric Analysis, sixth edition, Pearson - Prentice Hall (recommended).

Pelgrin F. (2010), Lecture Notes, Advanced Econometrics, HEC Lausanne (special thanks).

Ruud P. (2000), An Introduction to Classical Econometric Theory, Oxford University Press.

(6)

Notations

f_Y(y): probability density or mass function
F_Y(y): cumulative distribution function
Pr(·): probability
y: vector
Y: matrix

Be careful: in this chapter, I do not distinguish between a random vector (matrix) and a vector (matrix) of deterministic elements.

(7)

The Multiple Linear Regression Model

(8)

Objectives

1 Define the concept of multiple linear regression model.

2 Semi-parametric and parametric multiple linear regression models.

3 The multiple linear Gaussian model.

(9)

Definition (Multiple linear regression model)

The multiple linear regression model is used to study the relationship between a dependent variable and one or more independent variables. The generic form of the linear regression model is

y = x_1 \beta_1 + x_2 \beta_2 + \dots + x_K \beta_K + \varepsilon

where y is the dependent or explained variable and x_1, ..., x_K are the independent or explanatory variables.

(10)

Notations

1 y is the dependent variable, the regressand or the explained variable.

2 xj is an explanatory variable, a regressor or a covariate.

3 ε is the error term or disturbance.

IMPORTANT: do not use the term "residual" to refer to the disturbance ε.

(11)

Notations (cont’d)

The term ε is a random disturbance, so named because it “disturbs” an otherwise stable relationship. The disturbance arises for several reasons:

1 Primarily because we cannot hope to capture every influence on an economic variable in a model, no matter how elaborate. The net effect, which can be positive or negative, of these omitted factors is captured in the disturbance.

2 There are many other contributors to the disturbance in an empirical model. Probably the most significant is errors of measurement. It is easy to theorize about the relationships among precisely defined variables; it is quite another to obtain accurate measures of these variables.

(12)

Notations (cont’d)

We assume that each observation in a sample {y_i, x_{i1}, x_{i2}, ..., x_{iK}} for i = 1, ..., N is generated by an underlying process described by

y_i = x_{i1} \beta_1 + x_{i2} \beta_2 + \dots + x_{iK} \beta_K + \varepsilon_i

Remark:

x_{ik} = value of the k-th explanatory variable for the i-th unit of the sample, following the convention x_{unit, variable}.

(13)

Notations (cont’d)

Let the N × 1 column vector x_k collect the N observations on variable x_k, for k = 1, ..., K.

Let us assemble these data in an N × K data matrix X.

Let y be the N × 1 column vector of the N observations y_1, y_2, ..., y_N. Let ε be the N × 1 column vector containing the N disturbances.

(14)

Notations (cont’d)

y_{(N \times 1)} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_i \\ \vdots \\ y_N \end{pmatrix}
\qquad
x_{k \, (N \times 1)} = \begin{pmatrix} x_{1k} \\ x_{2k} \\ \vdots \\ x_{ik} \\ \vdots \\ x_{Nk} \end{pmatrix}
\qquad
\varepsilon_{(N \times 1)} = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_i \\ \vdots \\ \varepsilon_N \end{pmatrix}
\qquad
\beta_{(K \times 1)} = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_K \end{pmatrix}

(15)

Notations (cont’d)

X_{(N \times K)} = (x_1 : x_2 : \dots : x_K), or equivalently

X_{(N \times K)} = \begin{pmatrix}
x_{11} & x_{12} & \dots & x_{1k} & \dots & x_{1K} \\
x_{21} & x_{22} & \dots & x_{2k} & \dots & x_{2K} \\
\vdots & \vdots & & \vdots & & \vdots \\
x_{i1} & x_{i2} & \dots & x_{ik} & \dots & x_{iK} \\
\vdots & \vdots & & \vdots & & \vdots \\
x_{N1} & x_{N2} & \dots & x_{Nk} & \dots & x_{NK}
\end{pmatrix}

(16)

Fact

In most cases, the first column of X is assumed to be a column of ones, so that β_1 is the constant term in the model:

x_{1 \, (N \times 1)} = 1_{(N \times 1)}
\qquad
X_{(N \times K)} = \begin{pmatrix}
1 & x_{12} & \dots & x_{1k} & \dots & x_{1K} \\
1 & x_{22} & \dots & x_{2k} & \dots & x_{2K} \\
\vdots & \vdots & & \vdots & & \vdots \\
1 & x_{i2} & \dots & x_{ik} & \dots & x_{iK} \\
\vdots & \vdots & & \vdots & & \vdots \\
1 & x_{N2} & \dots & x_{Nk} & \dots & x_{NK}
\end{pmatrix}

(17)

Remark

More generally, the matrix X may contain both stochastic and non-stochastic elements such as:

Constant;

Time trend;

Dummy variables (for specific episodes in time);

Etc.

Therefore, X is generally a mixture of fixed and random variables.

(18)

Definition (Simple linear regression model)

The simple linear regression model is a model with only one stochastic regressor: K = 1 if there is no constant,

y_i = \beta_1 x_i + \varepsilon_i

or K = 2 if there is a constant:

y_i = \beta_1 + \beta_2 x_{i2} + \varepsilon_i

for i = 1, ..., N, or

y = \beta_1 + \beta_2 x_2 + \varepsilon

(19)

Definition (Multiple linear regression model)

The multiple linear regression model can be written

y_{(N \times 1)} = X_{(N \times K)} \, \beta_{(K \times 1)} + \varepsilon_{(N \times 1)}

(20)

One key difference for the specification of the MLRM: parametric versus semi-parametric specification.

Parametric model: the distribution of the error terms is fully characterized, e.g. \varepsilon \sim \mathcal{N}(0, \Omega).

Semi-parametric specification: only a few moments of the error terms are specified, e.g. E(\varepsilon) = 0 and V(\varepsilon) = E(\varepsilon \varepsilon^\top) = \Omega.

(21)

This difference does not matter for the derivation of the ordinary least squares estimator.

But this di¤erence matters for (among others):

1 The characterization of the statistical properties of the OLS estimator (e.g., efficiency);

2 The choice of alternative estimators (e.g., the maximum likelihood estimator);

3 Etc.

(22)

Definition (Semi-parametric multiple linear regression model)

The semi-parametric multiple linear regression model is defined by

y = X\beta + \varepsilon

where the error term \varepsilon satisfies

E(\varepsilon \mid X) = 0_{N \times 1} \qquad V(\varepsilon \mid X) = \sigma^2 I_N

and I_N is the identity matrix of order N.

(23)

Remarks

1 If the matrix X is non-stochastic (fixed), i.e. there are only fixed regressors, then the conditions on the error term ε read:

E(\varepsilon) = 0 \qquad V(\varepsilon) = \sigma^2 I_N

2 If the (conditional) variance-covariance matrix of ε is not equal to σ²I_N, i.e. if

V(\varepsilon \mid X) = \Omega \neq \sigma^2 I_N

then the model is called the multiple generalized linear regression model.

(24)

Remarks (cont’d)

The two conditions on the error term \varepsilon,

E(\varepsilon \mid X) = 0_{N \times 1} \qquad V(\varepsilon \mid X) = \sigma^2 I_N,

are equivalent to

E(y \mid X) = X\beta \qquad V(y \mid X) = \sigma^2 I_N

(25)

Definition (The multiple linear Gaussian model)

The (parametric) multiple linear Gaussian model is defined by

y = X\beta + \varepsilon

where the error term \varepsilon is normally distributed:

\varepsilon \mid X \sim \mathcal{N}\left(0, \sigma^2 I_N\right)

As a consequence, the vector y has a conditional normal distribution:

y \mid X \sim \mathcal{N}\left(X\beta, \sigma^2 I_N\right)

(26)

Remarks

1 The multiple linear Gaussian model is (by definition) a parametric model.

2 If the matrix X is non-stochastic (fixed), i.e. there are only fixed regressors, then the vector y has a marginal normal distribution:

y \sim \mathcal{N}\left(X\beta, \sigma^2 I_N\right)
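
As an illustration of this data generating process, here is a minimal simulation sketch in Python/NumPy (not part of the original slides); the sample size, the true β, and σ are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

N, K = 200, 3                              # sample size and number of regressors (illustrative)
beta_true = np.array([1.0, 2.0, -0.5])     # true parameter vector (illustrative)
sigma = 1.5                                # standard deviation of the disturbances

# Design matrix: a constant column plus K-1 stochastic regressors
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])

# Gaussian disturbances and dependent variable: y = X beta + eps
eps = rng.normal(scale=sigma, size=N)
y = X @ beta_true + eps
```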

(27)

The classical linear regression model consists of a set of assumptions that describes how the data set is produced by a data generating process (DGP):

Assumption 1: Linearity
Assumption 2: Full rank condition or identification
Assumption 3: Exogeneity
Assumption 4: Spherical error terms
Assumption 5: Data generation
Assumption 6: Normal distribution

(28)

Definition (Assumption 1: Linearity)

The model is linear with respect to the parameters β_1, ..., β_K.

(29)

Linearity is required with respect to the parameters, not with respect to the relationship between the dependent variable and the regressors. For instance, the models

y = \beta_0 + \beta_1 x + u

y = \beta_0 + \beta_1 \cos(x) + v

y = \beta_0 + \beta_1 \frac{1}{x} + w

are all linear (with respect to β).

In contrast, the model y = \beta_0 + \beta_1 x^{\beta_2} + \varepsilon is nonlinear (in β).

(30)

Remark

The model can be made linear after some transformation. Starting from y = A x^{\beta} \exp(\varepsilon), one obtains a log-linear specification:

\ln(y) = \ln(A) + \beta \ln(x) + \varepsilon

(31)

Definition (Log-linear model)

The log-linear model is

\ln(y_i) = \beta_1 \ln(x_{i1}) + \beta_2 \ln(x_{i2}) + \dots + \beta_K \ln(x_{iK}) + \varepsilon_i

This equation is also known as the constant elasticity form: in this equation, the elasticity of y with respect to changes in x_{ik} does not vary with x_{ik}:

\beta_k = \frac{\partial \ln(y_i)}{\partial \ln(x_{ik})} = \frac{\partial y_i}{\partial x_{ik}} \, \frac{x_{ik}}{y_i}

(32)

The classical linear regression model consists of a set of assumptions that describes how the data set is produced by a data generating process (DGP):

Assumption 1: Linearity
Assumption 2: Full rank condition or identification
Assumption 3: Exogeneity
Assumption 4: Spherical error terms
Assumption 5: Data generation
Assumption 6: Normal distribution

(33)

Definition (Assumption 2: Full column rank)

X is an N × K matrix with rank K.

(34)

Interpretation

1 There is no exact relationship among any of the independent variables in the model.

2 The columns of X are linearly independent.

(35)

Example

Suppose that a cross-section model satisfies:

y_i = \beta_0 + \beta_1 \, \text{non labor income}_i + \beta_2 \, \text{salary}_i + \beta_3 \, \text{total income}_i + \varepsilon_i

The identification condition does not hold, since total income is exactly equal to salary plus non-labor income (exact linear dependency in the model).
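
A quick numerical sketch of this identification failure (Python/NumPy, with made-up data, not part of the original slides): when total income is the exact sum of salary and non-labor income, the design matrix loses rank and X⊤X is numerically singular.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100

non_labor_income = rng.normal(50, 10, size=N)
salary = rng.normal(30, 5, size=N)
total_income = non_labor_income + salary       # exact linear dependency

# Design matrix: constant, non-labor income, salary, total income
X = np.column_stack([np.ones(N), non_labor_income, salary, total_income])

print(np.linalg.matrix_rank(X))   # 3, not K = 4: the full rank condition fails
print(np.linalg.cond(X.T @ X))    # huge condition number: X'X is (numerically) singular
```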

(36)

Remarks

1 Perfect multi-collinearity is generally not difficult to spot and is signalled by most statistical software.

2 Imperfect multi-collinearity is a more serious issue (see further).

(37)

Definition (Identification)

The multiple linear regression model is identifiable if and only if one of the following equivalent assertions holds:

(i) rank(X) = K

(ii) The matrix X^\top X is invertible

(iii) The columns of X form a basis of L(X)

(iv) X\beta_1 = X\beta_2 \implies \beta_1 = \beta_2, \quad \forall (\beta_1, \beta_2) \in \mathbb{R}^K \times \mathbb{R}^K

(v) X\beta = 0 \implies \beta = 0, \quad \forall \beta \in \mathbb{R}^K

(vi) \ker(X) = \{0\}

(38)

The classical linear regression model consists of a set of assumptions that describes how the data set is produced by a data generating process (DGP):

Assumption 1: Linearity
Assumption 2: Full rank condition or identification
Assumption 3: Exogeneity
Assumption 4: Spherical error terms
Assumption 5: Data generation
Assumption 6: Normal distribution

(39)

Definition (Assumption 3: Strict exogeneity of the regressors)

The regressors are exogenous in the sense that:

E(\varepsilon \mid X) = 0_{N \times 1}

or equivalently, for all units i \in \{1, \dots, N\},

E(\varepsilon_i \mid X) = 0

or equivalently

E(\varepsilon_i \mid x_{jk}) = 0

for any explanatory variable k \in \{1, \dots, K\} and any unit j \in \{1, \dots, N\}.

(40)

Comments

1 The expected value of the error term at observation i (in the sample) is not a function of the independent variables observed at any observation (including the i-th observation). The independent variables are not predictors of the error terms.

2 The strict exogeneity condition can be rewritten as:

E(y \mid X) = X\beta

3 If the regressors are fixed, this condition can be rewritten as:

E(\varepsilon) = 0_{N \times 1}

(41)

Implications

The (strict) exogeneity condition E(\varepsilon \mid X) = 0_{N \times 1} has two implications:

1 The zero conditional mean of \varepsilon implies that the unconditional mean of \varepsilon is also zero (the reverse is not true):

E(\varepsilon) = E_X\left( E(\varepsilon \mid X) \right) = E_X(0) = 0

2 The zero conditional mean of \varepsilon implies that (the reverse is not true):

E(\varepsilon_i x_{jk}) = 0 \quad \forall i, j, k

or

Cov(\varepsilon_i, X) = 0 \quad \forall i

(42)

The classical linear regression model consists of a set of assumptions that describes how the data set is produced by a data generating process (DGP):

Assumption 1: Linearity
Assumption 2: Full rank condition or identification
Assumption 3: Exogeneity
Assumption 4: Spherical error terms
Assumption 5: Data generation
Assumption 6: Normal distribution

(43)

Definition (Assumption 4: Spherical disturbances)

The error terms are such that:

V(\varepsilon_i \mid X) = E\left( \varepsilon_i^2 \mid X \right) = \sigma^2 \quad \text{for all } i \in \{1, \dots, N\}

and

Cov(\varepsilon_i, \varepsilon_j \mid X) = E(\varepsilon_i \varepsilon_j \mid X) = 0 \quad \text{for all } i \neq j

The condition of constant variances is called homoscedasticity. The uncorrelatedness across observations is called nonautocorrelation.

(44)

Comments

1 Spherical disturbances = homoscedasticity + nonautocorrelation

2 If the errors are not spherical, we call them nonspherical disturbances.

3 The assumption of homoscedasticity is a strong one: this is the exception rather than the rule!

(45)

Comments

Let us consider the (conditional) variance-covariance matrix (N × N) of the error terms:

V(\varepsilon \mid X) = E\left( \varepsilon \varepsilon^\top \mid X \right)
= \begin{pmatrix}
E(\varepsilon_1^2 \mid X) & E(\varepsilon_1 \varepsilon_2 \mid X) & \dots & E(\varepsilon_1 \varepsilon_j \mid X) & \dots & E(\varepsilon_1 \varepsilon_N \mid X) \\
E(\varepsilon_2 \varepsilon_1 \mid X) & E(\varepsilon_2^2 \mid X) & \dots & E(\varepsilon_2 \varepsilon_j \mid X) & \dots & E(\varepsilon_2 \varepsilon_N \mid X) \\
\vdots & \vdots & & \vdots & & \vdots \\
E(\varepsilon_i \varepsilon_1 \mid X) & \dots & & E(\varepsilon_i \varepsilon_j \mid X) & \dots & E(\varepsilon_i \varepsilon_N \mid X) \\
\vdots & \vdots & & \vdots & & \vdots \\
E(\varepsilon_N \varepsilon_1 \mid X) & \dots & & E(\varepsilon_N \varepsilon_j \mid X) & \dots & E(\varepsilon_N^2 \mid X)
\end{pmatrix}

(46)

Comments

The two assumptions (homoscedasticity and nonautocorrelation) imply that:

V(\varepsilon \mid X) = E\left( \varepsilon \varepsilon^\top \mid X \right) = \sigma^2 I_N
= \begin{pmatrix}
\sigma^2 & 0 & \dots & 0 \\
0 & \sigma^2 & \dots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \dots & \sigma^2
\end{pmatrix}

(47)

The classical linear regression model consists of a set of assumptions that describes how the data set is produced by a data generating process (DGP):

Assumption 1: Linearity
Assumption 2: Full rank condition or identification
Assumption 3: Exogeneity
Assumption 4: Spherical error terms
Assumption 5: Data generation
Assumption 6: Normal distribution

(48)

Definition (Assumption 5: Data generation)

The data (x_{i1}, x_{i2}, ..., x_{iK}) may be any mixture of constants and random variables.

(49)

Comments

1 Analysis will be done conditionally on the observed X, so whether the elements in X are fixed constants or random draws from a stochastic process will not influence the results.

2 In the case of stochastic regressors, the unconditional statistical properties of the OLS estimator are obtained in two steps: (1) using the result conditioned on X, and (2) finding the unconditional result by "averaging" (i.e., integrating over) the conditional distributions.

(50)

Comments

An assumption regarding (x_{i1}, x_{i2}, ..., x_{iK}, y_i) for i = 1, ..., N is also required. This is a statement about how the sample is drawn.

In the sequel, we assume that (x_{i1}, x_{i2}, ..., x_{iK}, y_i) for i = 1, ..., N are independently and identically distributed (i.i.d.).

The observations are drawn by simple random sampling from a large population.

(51)

The classical linear regression model consists of a set of assumptions that describes how the data set is produced by a data generating process (DGP):

Assumption 1: Linearity
Assumption 2: Full rank condition or identification
Assumption 3: Exogeneity
Assumption 4: Spherical error terms
Assumption 5: Data generation
Assumption 6: Normal distribution

(52)

Definition (Assumption 6: Normal distribution)

The disturbances are normally distributed:

\varepsilon_i \mid X \sim \mathcal{N}\left(0, \sigma^2\right)

or equivalently

\varepsilon \mid X \sim \mathcal{N}\left(0_{N \times 1}, \sigma^2 I_N\right)

(53)

Comments

1 Once again, this is a convenience that we will dispense with after some analysis of its implications.

2 Normality is not necessary to obtain many of the results presented below.

3 Assumption 6 implies assumptions 3 (exogeneity) and 4 (spherical disturbances).

(54)

Summary

The main assumptions of the multiple linear regression model:

A1 (linearity): the model is linear in β
A2 (identification): X is an N × K matrix with rank K
A3 (exogeneity): E(ε | X) = 0_{N×1}
A4 (spherical error terms): V(ε | X) = σ²I_N
A5 (data generation): X may be fixed or random
A6 (normal distribution): ε | X ~ N(0_{N×1}, σ²I_N)

(55)

Key Concepts

1 Simple linear regression model

2 Multiple linear regression model

3 Semi-parametric multiple linear regression model

4 Multiple linear Gaussian model

5 Assumptions of the multiple linear regression model

6 Linearity (A1), Identification (A2), Exogeneity (A3), Spherical error terms (A4), Data generation (A5) and Normal distribution (A6)

(56)

The ordinary least squares estimator

(57)

Introduction

1 The multiple linear regression model assumes that the following specification is true in the population:

y = X\beta + \varepsilon

where other unobserved factors determining y are captured by the error term \varepsilon.

2 Consider a sample \{x_{i1}, x_{i2}, \dots, x_{iK}, y_i\}_{i=1}^{N} of i.i.d. random variables (be careful about the change of notation here) and only one realization of this sample (your data set).

3 How to estimate the vector of parameters β?

(58)

Introduction (cont’d)

1 If we assume that assumptions A1-A6 hold, we have a multiple linear Gaussian model (parametric model), and a solution is to use the MLE. The ML estimator of β coincides with the ordinary least squares (OLS) estimator (cf. chapter 2).

2 If we assume that only assumptions A1-A5 hold, we have a semi-parametric multiple linear regression model, and the MLE is infeasible.

3 In this case, the only solution is to use the ordinary least squares (OLS) estimator.

(59)

Intuition

Let us consider the simple linear regression model and, for simplicity, denote x_i = x_{i2}:

y_i = \beta_1 + \beta_2 x_i + \varepsilon_i

The general idea of OLS consists in minimizing the "distance" between the points (x_i, y_i) and the regression line \hat{y}_i = \hat{\beta}_1 + \hat{\beta}_2 x_i, i.e. the points (x_i, \hat{y}_i), for all i = 1, ..., N.

(60)
(61)

Estimates of β_1 and β_2 are chosen by minimizing the sum of the squared residuals (SSR):

\sum_{i=1}^{N} \hat{\varepsilon}_i^2

This SSR can be written as:

\sum_{i=1}^{N} \hat{\varepsilon}_i^2 = \sum_{i=1}^{N} \left( y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i \right)^2

Therefore, \hat{\beta}_1 and \hat{\beta}_2 are the solutions of the minimization problem

\left( \hat{\beta}_1, \hat{\beta}_2 \right) = \arg\min_{(\beta_1, \beta_2)} \sum_{i=1}^{N} \left( y_i - \beta_1 - \beta_2 x_i \right)^2

(62)

Definition (OLS - simple linear regression model)

In the simple linear regression model y_i = \beta_1 + \beta_2 x_i + \varepsilon_i, the OLS estimators \hat{\beta}_1 and \hat{\beta}_2 are the solutions of the minimization problem

\left( \hat{\beta}_1, \hat{\beta}_2 \right) = \arg\min_{(\beta_1, \beta_2)} \sum_{i=1}^{N} \left( y_i - \beta_1 - \beta_2 x_i \right)^2

The solutions are:

\hat{\beta}_1 = \bar{y}_N - \hat{\beta}_2 \bar{x}_N
\qquad
\hat{\beta}_2 = \frac{\sum_{i=1}^{N} (x_i - \bar{x}_N)(y_i - \bar{y}_N)}{\sum_{i=1}^{N} (x_i - \bar{x}_N)^2}

where \bar{y}_N = N^{-1} \sum_{i=1}^{N} y_i and \bar{x}_N = N^{-1} \sum_{i=1}^{N} x_i respectively denote the sample means of y and x.
(63)

Remark

The OLS estimator is a linear estimator (cf. chapter 1) since it can be expressed as a linear function of the observations y_i:

\hat{\beta}_2 = \sum_{i=1}^{N} \omega_i y_i
\quad \text{with} \quad
\omega_i = \frac{x_i - \bar{x}_N}{\sum_{i=1}^{N} (x_i - \bar{x}_N)^2}

in the case where \bar{y}_N = 0.

(64)

Definition (Fitted value)

The predicted or fitted value for observation i is:

\hat{y}_i = \hat{\beta}_1 + \hat{\beta}_2 x_i

with a sample mean equal to the sample average of the observations:

\bar{\hat{y}}_N = \frac{1}{N} \sum_{i=1}^{N} \hat{y}_i = \bar{y}_N = \frac{1}{N} \sum_{i=1}^{N} y_i

(65)

Definition (Fitted residual)

The residual for observation i is:

\hat{\varepsilon}_i = y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i

with a sample mean equal to zero by definition:

\bar{\hat{\varepsilon}}_N = \frac{1}{N} \sum_{i=1}^{N} \hat{\varepsilon}_i = 0

(66)

Remarks

1 The fit of the regression is "good" if the sum \sum_{i=1}^{N} \hat{\varepsilon}_i^2 (or SSR) is "small", i.e., the unexplained part of the variance of y is "small".

2 The coefficient of determination or R^2 is given by:

R^2 = \frac{\sum_{i=1}^{N} (\hat{y}_i - \bar{y}_N)^2}{\sum_{i=1}^{N} (y_i - \bar{y}_N)^2} = 1 - \frac{\sum_{i=1}^{N} \hat{\varepsilon}_i^2}{\sum_{i=1}^{N} (y_i - \bar{y}_N)^2}

(67)

Orthogonality conditions

Under assumption A3 (strict exogeneity), we have E(\varepsilon_i \mid x_i) = 0. This condition implies that:

E(\varepsilon_i) = 0 \qquad E(\varepsilon_i x_i) = 0

Using the sample analogs of these moment conditions (cf. chapter 6, GMM), one has:

\frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i \right) = 0
\qquad
\frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i \right) x_i = 0

(68)

Definition (Orthogonality conditions)

The ordinary least squares estimator can be defined from the two sample analogs of the following moment conditions:

E(\varepsilon_i) = 0 \qquad E(\varepsilon_i x_i) = 0

The corresponding system of equations is just-identified.
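
The sample analogs can be checked numerically; the sketch below (Python/NumPy, with illustrative data, not part of the original slides) verifies that the OLS residuals satisfy the two empirical moment conditions up to floating-point error:

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative simple-regression data
N = 200
x = rng.normal(size=N)
y = 1.0 + 0.7 * x + rng.normal(size=N)

# OLS estimates (closed form)
b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b1 = y.mean() - b2 * x.mean()
resid = y - b1 - b2 * x

# Sample analogs of E(eps_i) = 0 and E(eps_i * x_i) = 0
print(resid.mean())          # ~ 0 (up to floating-point error)
print((resid * x).mean())    # ~ 0 (up to floating-point error)
```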

(69)

OLS and the multiple linear regression model

Now consider the multiple linear regression model

y = X\beta + \varepsilon

or

y_i = \sum_{k=1}^{K} \beta_k x_{ik} + \varepsilon_i

Objective: find an estimator (estimate) of \beta_1, \beta_2, \dots, \beta_K and \sigma^2 under the assumptions A1-A5.

(70)

OLS and multiple linear regression model

Different methods:

1 Minimize the sum of squared residuals (SSR)

2 Solve the same minimization problem with matrix notation.

3 Use moment conditions.

4 Geometrical interpretation

(71)

1. Minimize the sum of squared residuals (SSR):

As in the simple linear regression,

\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{N} \varepsilon_i^2 = \arg\min_{\beta} \sum_{i=1}^{N} \left( y_i - \sum_{k=1}^{K} \beta_k x_{ik} \right)^2

One can derive the first order conditions with respect to \beta_k for k = 1, \dots, K and solve a system of K equations with K unknowns.

(72)

2. Using matrix notations:

Definition (OLS and multiple linear regression model)

In the multiple linear regression model y_i = x_i^\top \beta + \varepsilon_i, with x_i = (x_{i1}, \dots, x_{iK})^\top, the OLS estimator \hat{\beta} is the solution of

\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{N} \left( y_i - x_i^\top \beta \right)^2

The OLS estimator of \beta is:

\hat{\beta} = \left( \sum_{i=1}^{N} x_i x_i^\top \right)^{-1} \left( \sum_{i=1}^{N} x_i y_i \right)

(73)

2. Using matrix notations:

Definition (Normal equations)

Under suitable regularity conditions, in the multiple linear regression model y_i = x_i^\top \beta + \varepsilon_i, with x_i = (x_{i1} : \dots : x_{iK})^\top, the normal equations are

\sum_{i=1}^{N} x_i \left( y_i - x_i^\top \hat{\beta} \right) = 0_{K \times 1}

(74)

2. Using matrix notations:

Definition (OLS and multiple linear regression model)

In the multiple linear regression model y = X\beta + \varepsilon, the OLS estimator \hat{\beta} is the solution of the minimization problem

\hat{\beta} = \arg\min_{\beta} \; \varepsilon^\top \varepsilon = \arg\min_{\beta} \; (y - X\beta)^\top (y - X\beta)

The OLS estimator of \beta is:

\hat{\beta} = \left( X^\top X \right)^{-1} X^\top y
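
A minimal numerical sketch of this matrix formula (Python/NumPy; the design and parameters are invented for illustration). Solving the normal equations with a linear solver is numerically preferable to forming the explicit inverse:

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative data: y = X beta + eps with a constant and two regressors
N, K = 500, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(size=N)

# OLS estimator: beta_hat = (X'X)^(-1) X'y, computed by solving the normal equations
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

print(beta_hat)                                                      # close to beta_true
print(np.allclose(beta_hat, np.linalg.lstsq(X, y, rcond=None)[0]))   # matches a least-squares solver
```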

(75)

2. Using matrix notations:

Definition

The ordinary least squares estimator \hat{\beta} of \beta minimizes the following criterion:

s(\beta) = \left\| y - X\beta \right\|_{I_N}^2 = (y - X\beta)^\top (y - X\beta)

(76)

2. Using matrix notations:

The FOC (normal equations) are defined by:

\frac{\partial s(\beta)}{\partial \beta} = -2 \, X^\top \left( y - X\hat{\beta} \right) = 0_{K \times 1}

The second-order conditions hold:

\frac{\partial^2 s(\beta)}{\partial \beta \, \partial \beta^\top} = 2 \, X^\top X \quad \text{is positive definite}

since, under assumption A2 (full column rank), X^\top X is a positive definite matrix. We have a minimum.

(77)

2. Using matrix notations:

Definition (Normal equations)

Under suitable regularity conditions, in the multiple linear regression model y = X\beta + \varepsilon, the normal equations are given by:

X^\top \left( y - X\hat{\beta} \right) = 0_{K \times 1}

(78)

Definition (Unbiased variance estimator)

In the multiple linear regression model y = X\beta + \varepsilon, the unbiased estimator of \sigma^2 is given by:

\hat{\sigma}^2 = \frac{1}{N - K} \sum_{i=1}^{N} \hat{\varepsilon}_i^2 = \frac{SSR}{N - K}
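
A short sketch of this estimator (Python/NumPy, illustrative data; the true error variance is 4.0 by construction):

```python
import numpy as np

rng = np.random.default_rng(5)

# Illustrative data and OLS fit
N, K = 300, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
y = X @ np.array([0.3, 1.5, -1.0]) + rng.normal(scale=2.0, size=N)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat

# Unbiased estimator of sigma^2: SSR / (N - K)
sigma2_hat = resid @ resid / (N - K)
print(sigma2_hat)          # close to the true variance 4.0
```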

(79)

2. Using matrix notations:

The estimator \hat{\sigma}^2 can also be written as:

\hat{\sigma}^2 = \frac{1}{N - K} \sum_{i=1}^{N} \left( y_i - x_i^\top \hat{\beta} \right)^2

\hat{\sigma}^2 = \frac{\left( y - X\hat{\beta} \right)^\top \left( y - X\hat{\beta} \right)}{N - K}

\hat{\sigma}^2 = \frac{\left\| y - X\hat{\beta} \right\|_{I_N}^2}{N - K}

(80)

3. Using moment conditions:

Under assumption A3 (strict exogeneity), we have E(\varepsilon \mid X) = 0. This condition implies:

E(\varepsilon_i x_i) = 0_{K \times 1}

with x_i = (x_{i1} : \dots : x_{iK})^\top. Using the sample analogs, one has:

\frac{1}{N} \sum_{i=1}^{N} x_i \left( y_i - x_i^\top \hat{\beta} \right) = 0_{K \times 1}

We have K (normal) equations with K unknown parameters \hat{\beta}_1, \dots, \hat{\beta}_K. The system is just-identified.

(81)

4. Geometric interpretation:

1 The ordinary least squares estimation method consists in determining the adjusted (fitted) vector \hat{y}, the closest vector to y (in a certain space...), such that the squared norm between y and \hat{y} is minimized.

2 Finding \hat{y} is equivalent to finding an estimator of β.

(82)

4. Geometric interpretation:

Definition (Geometric interpretation)

The adjusted vector \hat{y} is the (orthogonal) projection of y onto the column space of X. The fitted error term \hat{\varepsilon} is the projection of y onto the orthogonal complement of the column space of X. The vectors \hat{y} and \hat{\varepsilon} are orthogonal.

(83)

[Figure: geometric interpretation of OLS. Source: F. Pelgrin (2010), Lecture Notes, Advanced Econometrics]

(84)

4. Geometric interpretation:

Definition (Projection matrices)

The vectors \hat{y} and \hat{\varepsilon} are defined to be:

\hat{y} = P\,y \qquad \hat{\varepsilon} = M\,y

where P and M denote the two following projection matrices:

P = X \left( X^\top X \right)^{-1} X^\top
\qquad
M = I_N - P = I_N - X \left( X^\top X \right)^{-1} X^\top
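
A small numerical sketch of these projection matrices (Python/NumPy, illustrative data, not part of the original slides), checking idempotency, MX = 0, and the orthogonality of the fitted values and residuals:

```python
import numpy as np

rng = np.random.default_rng(6)

# Illustrative design matrix and response
N, K = 50, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
y = X @ np.array([1.0, 0.5, -0.7]) + rng.normal(size=N)

# P projects onto the column space of X, M onto its orthogonal complement
P = X @ np.linalg.solve(X.T @ X, X.T)
M = np.eye(N) - P

y_hat = P @ y          # fitted values
resid = M @ y          # residuals

print(np.allclose(P @ P, P), np.allclose(M @ M, M))   # both idempotent
print(np.allclose(M @ X, 0))                          # M X = 0
print(np.isclose(y_hat @ resid, 0))                   # y_hat and residuals are orthogonal
```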

(85)

Other geometric interpretations:

Suppose that there is a constant term in the model.

1 The least squares residuals sum to zero:

\sum_{i=1}^{N} \hat{\varepsilon}_i = 0

2 The regression hyperplane passes through the point of means of the data (\bar{x}_N, \bar{y}_N).

3 The mean of the fitted (adjusted) values of y equals the mean of the actual values of y:

\bar{\hat{y}}_N = \bar{y}_N

(86)

Definition (Coefficient of determination)

The coefficient of determination of the multiple linear regression model (with a constant term) is the ratio of the total (empirical) variance explained by the model to the total (empirical) variance of y:

R^2 = \frac{\sum_{i=1}^{N} (\hat{y}_i - \bar{y}_N)^2}{\sum_{i=1}^{N} (y_i - \bar{y}_N)^2} = 1 - \frac{\sum_{i=1}^{N} \hat{\varepsilon}_i^2}{\sum_{i=1}^{N} (y_i - \bar{y}_N)^2}

(87)

Remark

1 The coefficient of determination measures the proportion of the total variance (or variability) in y that is accounted for by variation in the regressors (or the model).

2 Problem: the R² automatically and spuriously increases when extra explanatory variables are added to the model.

(88)

Definition (Adjusted R-squared)

The adjusted R-squared coefficient is defined to be:

\bar{R}^2 = 1 - \frac{N - 1}{N - p - 1} \left( 1 - R^2 \right)

where p denotes the number of regressors (not counting the constant term), i.e., p = K - 1 if there is a constant or p = K otherwise.
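
A short sketch computing both coefficients (Python/NumPy, illustrative data; here p = K - 1 since the design includes a constant):

```python
import numpy as np

rng = np.random.default_rng(7)

# Illustrative fit with a constant and two regressors
N, K = 150, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
y = X @ np.array([2.0, 1.0, 0.0]) + rng.normal(size=N)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat

# R-squared and adjusted R-squared (p = K - 1 regressors besides the constant)
r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
p = K - 1
r2_adj = 1.0 - (N - 1) / (N - p - 1) * (1.0 - r2)

print(r2, r2_adj)     # the adjusted R-squared is slightly smaller than the R-squared
```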

(89)

Remark

One can show that:

1 R̄² < R²

2 if N is large, R̄² ≃ R²

3 The adjusted R-squared R̄² can be negative.

(90)

Key Concepts

1 OLS estimator and estimate

2 Fitted or predicted value

3 Residual or …tted residual

4 Orthogonality conditions

5 Normal equations

6 Geometric interpretations of the OLS

7 Coefficient of determination and adjusted R-squared

(91)

Statistical properties of the OLS estimator

(92)

In order to study the statistical properties of the OLS estimator, we have to distinguish (cf. chapter 1):

1 The finite sample properties

2 The large sample or asymptotic properties

(93)

But we also have to distinguish the properties according to the assumptions made on the linear regression model:

1 Semi-parametric linear regression model (the exact distribution of ε is unknown) versus parametric linear regression model (and especially the Gaussian linear regression model, assumption A6).

2 X is a matrix of random regressors versus X is a matrix of fixed regressors.

(94)

Fact (Assumptions)

In the rest of this section, we assume that assumptions A1-A5 hold:

A1 (linearity): the model is linear in β
A2 (identification): X is an N × K matrix with rank K
A3 (exogeneity): E(ε | X) = 0_{N×1}
A4 (spherical error terms): V(ε | X) = σ²I_N
A5 (data generation): X may be fixed or random

(95)

Finite sample properties of the OLS estimator

(96)

Objectives

The objectives of this subsection are the following:

1 Compute the first two moments of the (unknown) finite sample distribution of the OLS estimators β̂ and σ̂².

2 Determine the finite sample distribution of the OLS estimators β̂ and σ̂² under particular assumptions (A6).

3 Determine if the OLS estimators are "good": efficient estimator versus BLUE.

4 Introduce the Gauss-Markov theorem.

(97)

First moments of the OLS estimators

(98)

Moments

As a first step, we derive the first moments of the OLS estimators:

1 Step 1: compute E(β̂) and V(β̂)

2 Step 2: compute E(σ̂²) and V(σ̂²)

(99)

Definition (Unbiased estimator)

In the multiple linear regression model y = X\beta_0 + \varepsilon, under the assumption A3 (strict exogeneity), the OLS estimator \hat{\beta} is unbiased:

E\left( \hat{\beta} \right) = \beta_0

where \beta_0 denotes the true value of the vector of parameters. This result holds whether or not the matrix X is considered as random.

(100)

Proof

Case 1: fixed regressors (cf. chapter 1)

\hat{\beta} = \left( X^\top X \right)^{-1} X^\top y = \beta_0 + \left( X^\top X \right)^{-1} X^\top \varepsilon

So, if X is a matrix of fixed regressors:

E\left( \hat{\beta} \right) = \beta_0 + \left( X^\top X \right)^{-1} X^\top E(\varepsilon)

Under assumption A3 (exogeneity), E(\varepsilon \mid X) = E(\varepsilon) = 0. Then, we get:

E\left( \hat{\beta} \right) = \beta_0

(101)

Proof (cont’d)

Case 2: random regressors

\hat{\beta} = \left( X^\top X \right)^{-1} X^\top y = \beta_0 + \left( X^\top X \right)^{-1} X^\top \varepsilon

If X includes some random elements:

E\left( \hat{\beta} \mid X \right) = \beta_0 + \left( X^\top X \right)^{-1} X^\top E(\varepsilon \mid X)

Under assumption A3 (exogeneity), E(\varepsilon \mid X) = 0. Then, we get:

E\left( \hat{\beta} \mid X \right) = \beta_0

(102)

Case 2: random regressors

The OLS estimator \hat{\beta} is conditionally unbiased:

E\left( \hat{\beta} \mid X \right) = \beta_0

Besides, we have:

E\left( \hat{\beta} \right) = E_X\left( E\left( \hat{\beta} \mid X \right) \right) = E_X(\beta_0) = \beta_0

where E_X denotes the expectation with respect to the distribution of X.

So, the OLS estimator \hat{\beta} is unbiased:

E\left( \hat{\beta} \right) = \beta_0

(103)

Definition (Variance of the OLS estimator, non-stochastic regressors)

In the multiple linear regression model y = X\beta + \varepsilon, if the matrix X is non-stochastic, the unconditional variance-covariance matrix of the OLS estimator \hat{\beta} is:

V\left( \hat{\beta} \right) = \sigma^2 \left( X^\top X \right)^{-1}

(104)

Proof

\hat{\beta} = \left( X^\top X \right)^{-1} X^\top y = \beta_0 + \left( X^\top X \right)^{-1} X^\top \varepsilon

So, if X is a matrix of fixed regressors:

V\left( \hat{\beta} \right) = E\left( \left( \hat{\beta} - \beta_0 \right) \left( \hat{\beta} - \beta_0 \right)^\top \right)
= E\left( \left( X^\top X \right)^{-1} X^\top \varepsilon \varepsilon^\top X \left( X^\top X \right)^{-1} \right)
= \left( X^\top X \right)^{-1} X^\top E\left( \varepsilon \varepsilon^\top \right) X \left( X^\top X \right)^{-1}

(105)

Under assumption A4 (spherical disturbances), we have:

V(\varepsilon) = E\left( \varepsilon \varepsilon^\top \right) = \sigma^2 I_N

The variance-covariance matrix of the OLS estimator is then:

V\left( \hat{\beta} \right) = \left( X^\top X \right)^{-1} X^\top E\left( \varepsilon \varepsilon^\top \right) X \left( X^\top X \right)^{-1}
= \left( X^\top X \right)^{-1} X^\top \sigma^2 I_N X \left( X^\top X \right)^{-1}
= \sigma^2 \left( X^\top X \right)^{-1} X^\top X \left( X^\top X \right)^{-1}
= \sigma^2 \left( X^\top X \right)^{-1}

(106)

Definition (Variance of the OLS estimator, stochastic regressors)

In the multiple linear regression model y = X\beta_0 + \varepsilon, if the matrix X is stochastic, the conditional variance-covariance matrix of the OLS estimator \hat{\beta} is:

V\left( \hat{\beta} \mid X \right) = \sigma^2 \left( X^\top X \right)^{-1}

The unconditional variance-covariance matrix is equal to:

V\left( \hat{\beta} \right) = \sigma^2 \, E_X\left( \left( X^\top X \right)^{-1} \right)

where E_X denotes the expectation with respect to the distribution of X.

(107)

Proof

\hat{\beta} = \left( X^\top X \right)^{-1} X^\top y = \beta_0 + \left( X^\top X \right)^{-1} X^\top \varepsilon

So, if X is a stochastic matrix:

V\left( \hat{\beta} \mid X \right) = E\left( \left( \hat{\beta} - \beta_0 \right) \left( \hat{\beta} - \beta_0 \right)^\top \mid X \right)
= E\left( \left( X^\top X \right)^{-1} X^\top \varepsilon \varepsilon^\top X \left( X^\top X \right)^{-1} \mid X \right)
= \left( X^\top X \right)^{-1} X^\top E\left( \varepsilon \varepsilon^\top \mid X \right) X \left( X^\top X \right)^{-1}

(108)

Proof (cont’d)

Under assumption A4 (spherical disturbances), we have:

V(\varepsilon \mid X) = E\left( \varepsilon \varepsilon^\top \mid X \right) = \sigma^2 I_N

The conditional variance-covariance matrix of the OLS estimator is then:

V\left( \hat{\beta} \mid X \right) = \left( X^\top X \right)^{-1} X^\top E\left( \varepsilon \varepsilon^\top \mid X \right) X \left( X^\top X \right)^{-1}
= \left( X^\top X \right)^{-1} X^\top \sigma^2 I_N X \left( X^\top X \right)^{-1}
= \sigma^2 \left( X^\top X \right)^{-1}

(109)

Proof (cont'd)

We have:

V\left( \hat{\beta} \mid X \right) = \sigma^2 \left( X^\top X \right)^{-1}

The (unconditional) variance-covariance matrix of the OLS estimator is then:

V\left( \hat{\beta} \right) = E_X\left( V\left( \hat{\beta} \mid X \right) \right) = \sigma^2 \, E_X\left( \left( X^\top X \right)^{-1} \right)

where E_X denotes the expectation with respect to the distribution of X.

(110)

Summary

Case 1: X stochastic
  Mean: E(β̂) = β₀
  Variance: V(β̂) = σ² E_X[(X⊤X)⁻¹]
  Conditional mean: E(β̂ | X) = β₀
  Conditional variance: V(β̂ | X) = σ² (X⊤X)⁻¹

Case 2: X nonstochastic
  Mean: E(β̂) = β₀
  Variance: V(β̂) = σ² (X⊤X)⁻¹
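
These finite sample moments can be illustrated by a small Monte Carlo sketch (Python/NumPy; the fixed design, β₀, σ, and the number of replications are invented for the example, and it is not part of the original slides):

```python
import numpy as np

rng = np.random.default_rng(8)

# Monte Carlo check of E(beta_hat) = beta_0 and V(beta_hat | X) = sigma^2 (X'X)^-1
# with a fixed (nonstochastic) design and illustrative parameters
N, K, sigma, R = 100, 3, 1.0, 5000
beta0 = np.array([1.0, 2.0, -1.0])
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])   # held fixed across replications

XtX_inv = np.linalg.inv(X.T @ X)
draws = np.empty((R, K))
for r in range(R):
    y = X @ beta0 + rng.normal(scale=sigma, size=N)
    draws[r] = np.linalg.solve(X.T @ X, X.T @ y)

print(draws.mean(axis=0))                 # close to beta0 (unbiasedness)
print(np.cov(draws, rowvar=False))        # close to sigma^2 * (X'X)^-1
print(sigma**2 * XtX_inv)
```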

(111)

Question

How to estimate the variance-covariance matrix of the OLS estimator?

V(β̂_OLS) = σ² (X⊤X)⁻¹ if X is nonstochastic

V(β̂_OLS) = σ² E_X[(X⊤X)⁻¹] if X is stochastic

(112)

Question (cont'd)

Definition (Variance estimator)

An unbiased estimator of the variance-covariance matrix of the OLS estimator is given by:

\hat{V}\left( \hat{\beta}_{OLS} \right) = \hat{\sigma}^2 \left( X^\top X \right)^{-1}

where \hat{\sigma}^2 = (N - K)^{-1} \, \hat{\varepsilon}^\top \hat{\varepsilon} is an unbiased estimator of \sigma^2. This result holds whether X is stochastic or non-stochastic.
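
A minimal sketch of this estimator and the implied standard errors (Python/NumPy, illustrative data, not part of the original slides):

```python
import numpy as np

rng = np.random.default_rng(9)

# Illustrative data and OLS fit
N, K = 200, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
y = X @ np.array([1.0, 0.8, -0.3]) + rng.normal(size=N)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (N - K)

# Estimated variance-covariance matrix of beta_hat and the implied standard errors
V_hat = sigma2_hat * np.linalg.inv(X.T @ X)
std_errors = np.sqrt(np.diag(V_hat))

print(V_hat)
print(std_errors)
```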

(113)

Summary

Case 1: X stochastic
  Variance: V(β̂) = σ² E_X[(X⊤X)⁻¹]
  Estimator: V̂(β̂_OLS) = σ̂² (X⊤X)⁻¹

Case 2: X nonstochastic
  Variance: V(β̂) = σ² (X⊤X)⁻¹
  Estimator: V̂(β̂_OLS) = σ̂² (X⊤X)⁻¹
(114)

Definition (Estimator of the variance of the disturbances)

Under assumptions A1-A5, in the multiple linear regression model y = X\beta + \varepsilon, the estimator \hat{\sigma}^2 is unbiased:

E\left( \hat{\sigma}^2 \right) = \sigma^2

where

\hat{\sigma}^2 = \frac{1}{N - K} \sum_{i=1}^{N} \hat{\varepsilon}_i^2 = \frac{\hat{\varepsilon}^\top \hat{\varepsilon}}{N - K}

This result holds whether or not the matrix X is considered as random.

(115)

Proof

We assume that X is stochastic. Let M denote the projection matrix ("residual maker") defined by:

M = I_N - X \left( X^\top X \right)^{-1} X^\top

with

\hat{\varepsilon}_{(N \times 1)} = M_{(N \times N)} \, y_{(N \times 1)}

The N × N matrix M satisfies the following properties:

1 If X is regressed on X, a perfect fit will result and the residuals will be zero, so M X = 0.

2 The matrix M is symmetric (M^\top = M) and idempotent (M M = M).

(116)

Proof (cont’d)

The residuals are defined to be:

\hat{\varepsilon} = M y

Since y = X\beta + \varepsilon, we have

\hat{\varepsilon} = M (X\beta + \varepsilon) = M X \beta + M \varepsilon

Since M X = 0, we have

\hat{\varepsilon} = M \varepsilon

(117)

Proof (cont’d)

The estimator \hat{\sigma}^2 is based on the sum of squared residuals (SSR):

\hat{\sigma}^2 = \frac{\hat{\varepsilon}^\top \hat{\varepsilon}}{N - K} = \frac{\varepsilon^\top M \varepsilon}{N - K}

The expected value of the SSR is

E\left( \hat{\varepsilon}^\top \hat{\varepsilon} \mid X \right) = E\left( \varepsilon^\top M \varepsilon \mid X \right)

Since \varepsilon^\top M \varepsilon is a 1 × 1 scalar, it is equal to its trace:

E\left( \varepsilon^\top M \varepsilon \mid X \right) = E\left( \operatorname{tr}\left( \varepsilon^\top M \varepsilon \right) \mid X \right) = E\left( \operatorname{tr}\left( M \varepsilon \varepsilon^\top \right) \mid X \right)

since \operatorname{tr}(AB) = \operatorname{tr}(BA).

(118)

Proof (cont’d)

Since M = I_N - X \left( X^\top X \right)^{-1} X^\top depends only on X, it can be treated as constant conditionally on X:

E\left( \hat{\varepsilon}^\top \hat{\varepsilon} \mid X \right) = E\left( \operatorname{tr}\left( M \varepsilon \varepsilon^\top \right) \mid X \right) = \operatorname{tr}\left( M \, E\left( \varepsilon \varepsilon^\top \mid X \right) \right)

Under assumptions A3 and A4, we have

E\left( \varepsilon \varepsilon^\top \mid X \right) = \sigma^2 I_N

As a consequence,

E\left( \hat{\varepsilon}^\top \hat{\varepsilon} \mid X \right) = \operatorname{tr}\left( \sigma^2 M I_N \right) = \sigma^2 \operatorname{tr}(M)

(119)

Proof (cont’d)

E\left( \hat{\varepsilon}^\top \hat{\varepsilon} \mid X \right) = \sigma^2 \operatorname{tr}(M)
= \sigma^2 \operatorname{tr}\left( I_N - X (X^\top X)^{-1} X^\top \right)
= \sigma^2 \operatorname{tr}(I_N) - \sigma^2 \operatorname{tr}\left( X (X^\top X)^{-1} X^\top \right)
= \sigma^2 \operatorname{tr}(I_N) - \sigma^2 \operatorname{tr}\left( X^\top X (X^\top X)^{-1} \right)
= \sigma^2 \operatorname{tr}(I_N) - \sigma^2 \operatorname{tr}(I_K)
= \sigma^2 (N - K)
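
The key step tr(M) = N - K can be checked numerically; a tiny sketch (Python/NumPy, arbitrary illustrative design, not part of the original slides):

```python
import numpy as np

rng = np.random.default_rng(10)

# Numerical check that tr(M) = N - K for the residual-maker matrix
N, K = 40, 4
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])

M = np.eye(N) - X @ np.linalg.solve(X.T @ X, X.T)
print(np.trace(M), N - K)     # both equal 36 (up to floating-point error)
```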

(120)

Proof (cont’d)

By definition of \hat{\sigma}^2, we have:

E\left( \hat{\sigma}^2 \mid X \right) = \frac{E\left( \hat{\varepsilon}^\top \hat{\varepsilon} \mid X \right)}{N - K} = \frac{\sigma^2 (N - K)}{N - K} = \sigma^2

So, the estimator \hat{\sigma}^2 is conditionally unbiased. Moreover,

E\left( \hat{\sigma}^2 \right) = E_X\left( E\left( \hat{\sigma}^2 \mid X \right) \right) = E_X\left( \sigma^2 \right) = \sigma^2

The estimator \hat{\sigma}^2 is unbiased:

E\left( \hat{\sigma}^2 \right) = \sigma^2

(121)

Remark

Using the same approach, we can compute the variance of the estimator \hat{\sigma}^2:

\hat{\sigma}^2 = \frac{\hat{\varepsilon}^\top \hat{\varepsilon}}{N - K} = \frac{\varepsilon^\top M \varepsilon}{N - K}

As a consequence, we have:

V\left( \hat{\sigma}^2 \mid X \right) = \frac{1}{(N - K)^2} \, V\left( \varepsilon^\top M \varepsilon \mid X \right)
\qquad
V\left( \hat{\sigma}^2 \right) = E_X\left( V\left( \hat{\sigma}^2 \mid X \right) \right)

But it takes... at least ten slides...

(122)

Definition (Variance of the estimator \hat{\sigma}^2)

In the multiple linear regression model y = X\beta_0 + \varepsilon, the variance of the estimator \hat{\sigma}^2 is

V\left( \hat{\sigma}^2 \right) = \frac{2\sigma^4}{N - K}

where \sigma^2 denotes the true value of the variance of the error terms. This result holds whether or not the matrix X is considered as random.
