CHVtheMultipleRegressionModel.pptx

(1)

V. The Multiple Regression Model

a. The Multiple Regression Model b. The Data and Least Squares c. Inference and F-tests

d. Prediction

(2)

a. The Multiple Regression Model

Many problems involve more than one independent variable or factor which affects the dependent or response variable e.g.

– _{Single and multiple factor asset pricing models (CAPM and}

APT)

– _{Demand for a product given prices of competing brands,}

advertising, household attributes

(3)

We can add the additional variables into the simple model in a linear fashion:

0 1 1 2 2 k k

Y

   

X

 

X



L

 

X

 

Conditional Mean: E[Y|X1, …, Xk] Error Term: Part of Y unrelated to the X’s

The interpretation of the regression coefficients, β_j, can be extended from the simple regression case:





j k 1 j

X

,...,

X

|

Y

E





β

(4)

We can plot the regression plane as a surface in three dimensions.

Consider an example of Sales of a product as predicted by Price of this product (P1) and the price of a competing product (P2).

Sales = 1000 – 100P1 + 110P2

hold P2 fixed and vary P1

(5)

b. The Data and Least Squares

The data in the multiple regression model is a set of points with values of each X variable and Y

Data:

ith value of Xj in the data

Model:

i 0 1 1,i 2 2,i k k,i i

Y

  

X

 

X



L

 

X

 

ε

_i

~ N 0,σ

(

2

)

X

_1i

,X

_2i

,

K

,X

_ji

,

K

,X

_ki

,Y

_i

(6)

How do we estimate the parameters?

We use the principle of Least Squares just as we did in SLR Model:

– _{we define the fitted values}

– _{find the best fitting plane by minimizing the sum of squared}

residuals

Fitted Values:

i 0 1 1,i k k,i

(7)

Residuals:

i i

ˆ

i

e



Y



Y

Least Squares:

Standard Error of the Regression:



 



N

2 i i 1

1

s

e

N k 1

chooseb

₀

,b

₁

,

K

,

b

_k

to m

ini

m

ize

e

_i2

i1 N

(8)

Let’s run a multiple regression with the Sales and Price Data.

Model: Sales

_i



β

₀



β

₁

p1

_i



β

₂

p2

_i



ε

_i

The regression equation is Sales = 116 - 97.7 p1 + 109 p2

b₁ b₂

(9)

As in the SLR model, the residuals in the multiple regression model are purged of any relationship to the independent

variables.

( )

_j



K

corr X ,e

0; j 1, ,k

Y



Y 

ˆ

Part linearly explained by the Xs Unexplained part

(10)

Let’s see this graphically:

(11)

If the regression fits the data well, there should be a strong correlation between the actual and fitted Y values:

ˆY

_i

,Y

_i

(

)

(12)

c. Inference and F-tests

All inference from the simple regression extends to multiple regression, including:

– _{standard errors}

– _{confidence intervals} – _t-tests

Standard Errors

Since each coefficient estimator is a linear combination of normal random variables, each b_i (i = 0,1, ..., k) is normally distributed.

(13)

The variance of b_j is complicated and depends on the variation of all of the X’s.

We call the estimated standard deviations the standard errors:

j j

b b

s

 

ˆ

What are the factors which influence the size of the standard errors?

i. σ2

ii. N

iii. S_I.V.- variation of X_j which is independent of other X vars

(14)

Confidence Intervals

(

1



α

)



100

%

C

I.

:.

b

_j



t

_N__k_₁_,_α_/₂

s

_b_j

t-Tests

  

  

 





j

*

0 j j

*

j j

N k 1, / 2 b

test

H :

b

reject if t

t

(15)

c. F-tests

t-test procedures are designed to examine one coefficient at a time for statistical significance.

In many situations, we need a testing procedure that can address simultaneous hypotheses about more than one coefficient.

We will look at two important types of simultaneous or joint hypotheses:

(16)

c. F-tests

1. Overall Test of Significance

Suppose we run a regression with some 6 regressors and get a modest R2.

We may want to see if indeed there is any relationship in this data

Implicitly, we want to test the hypothesis:

0 1 2 k

(17)

c. F-tests: Country Returns Example

Surely the answer must be yes but how strong is

evidence?

data(countryret)

(18)

c. F-tests

So why don’t we just use R2? If R2 > 0, then we reject the

null hypothesis of no relationship.

Problem: R2 will always be > 0 even if there is no

relationship between Y and the X’s.

This is because least squares is content to fit “noise” in the data.

(19)

c. F-tests

To see this, let’s generate some garbage data that has nothing to do with USA returns (10 variables):

Now we will create a data frame with only the USA return and these 10 variables which are, by definition, independent of USA returns.

cbind() takes two spreadsheets and joins them (“column bind”).

(20)

c. F-tests

Look at that R2!

Look at that

R2

(21)

c. F-tests

It seems then, that an R2 of 15 per cent is pretty to close

zero in the sense as that this can be expected even if the “true” R2 is 0. As usual, we need a statistical notion of how

“close is close.”

It turns out that under the null hypothesis we can derive the distribution of a transformation of R2.

Define the F statistic for testing overall significance of the regression as follows:

f



R

2

k

1

−

R

2

(22)

c. F-tests

Properties of this statistic:

– _{can take on values between 0 and infinity} – _{the larger R}2_{, the larger F}

– _{under the null, we expect F to be clustered around small}

positive values.

What is the null distribution of f introduced on the last slide?

0 1

k N ,

k

under

H

F

~

f

_ _

(23)

c. F-tests

What kind of distribution is this?

It is a right skewed, positive valued family of distributions which is indexed by two parameters, the df in numerator and the df in the denominator. 4 3 2 1 0 0 .7 0 .6 0 .5 0 .4 0 .3 0 .2 0 .1 0 .0 f F -d en s

To test the null hypothesis, we compute a value of the F test using R2.

(24)

c. F-tests

We only get “excited” by large values of F, small values are consistent with the null!.

This is called a one-sided test.

Summary:

i. choose significance level α

ii. find critical value from INVCDF function…

0 .7 0 .6 0 .5 0 .4 0 .3 0 .2 0 .1 0 .0

F-de

ns α

(25)

c. F-tests

Summary continued…

iii. Compute f

(

)

(

)

SSE

(

N k 1

)

k SSR 1 k N R 1 k R f ₂ 2       

Two equivalent expressions for f

Let’s do it with country returns data.

(26)

c. F-tests

Step iii. Compute F Stat

(27)

c. F-tests

2. Partial F – tests

In many situations, a group of regressors are identified which we all agree are important to include in the model.

Another group of variables are somewhat questionable. (these typically have low t-stats)

Are the last group of variables worth including? We write the model as:

1 1 1 1 1 2 1 2

i 0 1 1,i k k ,i k 1 k 1,i k k k k ,i i

Y   X  K  X   _ X _  K  _ X _  

(28)

c. F-tests

We wish to test:

0

s

'

k

last

the

of

one

least

at

:

H

vs

.

0

=

...

=

:

H

2 a k + k 1 k

0 ₁ ₁ ₂





β

Intuitively, we could check the goodness of fit with and without the k₂ additional variables.

If adding in the additional k₂ variables improved the fit

(29)

c. F-tests

The problem: R2 will always increase as we add additional

variables even if these variables have zero population coefficients.

Why? Least Squares likes additional degrees of freedom to make fitted values which are closer to the observed Y values.

(30)

c. F-tests

Adjusted R2

Some have suggested computing a new quantity called adjusted R2 to take into account this problem:

2 y 2 2 s s 1 ) 1 N /( SST ) 1 k N /( SSE 1

R  

   



Note that we are looking at a ratio of variance estimates. will not necessarily increase as other regressors are added.

Also, can be < 0!

The problem with the use of is that we have no statistical _R2

R2

(31)

c. F-tests

Partial F-tests

We can develop a test for inclusion of a subset of variables by using the change in R2 as we add the variables.

. R

R R

Δ 2  2_full  _restricted2

Here:

is the R2 from the regression with all variables included and

is the R2 from the regression with only the first k

1 variables

included

This test is sometimes referred to as a partial F test or an

"inclusion/exclusion" test.

full 2 R restricted 2 R

f  ΔR

2 _k 2

1−R _full2

(32)

c. F-tests

Back to Country Returns Data

1. Run Full and Restricted Regressions

(33)

c. F-tests

Restricted regression (remove insignificant ind vars):

Compute the Critical Value for F:

(34)

c. F-tests

Compute F and P - Values:

f = ΔR

2 _k

2

1−R _full2

( ) N( −(k₁  k₂)−1) 

.566 −.547

( ) 3 1−.566

( ) 100 1.459

(35)

This is the standard error of prediction . The standard error of fit is very complicated for multiple regression

d. Prediction

There is nothing new about prediction in multiple regression that was not covered in the SLR notes.

Prediction Problem:





f j,f

i 1,i k,i

Predict Y given X

j 1,

,k

and data

Y, X ,

,X

i 1,

,N



K

We use the least squares fitted plane to provide the prediction rule. Our prediction interval is:

N k 1, / 2

*

f 0 1 1,f k k,f pred

ˆY

b

b X

t

s

  











K







( 2)

fit 2

pred s stderr

(36)

d. Prediction

Back to the Pricing and Sales Example:

N k 1, /2 *

f 0 1 1,f k k,f pred

ˆY b b X

b X

t

s

  





(37)

e. Multiple Regression Explained: Pricing Puzzle

We’ve looked only at the multiple regression of Sales on p1 and p2. What if we looked at only the relationship between sales and p1.

(38)

It appears that there is a positive, but weak, relationship between sales and the price of your product.

(39)

You are about to sue your micro-economics instructor for fraud, when your colleague says –

“Why don’t you look at the plot of P1 vs. P2? May be there is something screwy there.”

15 10 5 0 9 8 7 6 5 4 3 2 1 0 p2 p1 15 10 5 0 9 8 7 6 5 4 3 2 1 0 p2 p1

A Strong Positive Correlation!

This means that as you increase your price, the competition

(40)

e. Multiple Regression Explained

Multiple Regression: Sales on p1 and p2 SLR: Sales on p1

(41)

Why is there such a difference? The difference stems from confounding of effects.

Own price

(P1) Competitor's Price (P2) Sales

+ corr

+

(42)

How does the multiple regression work? Let’s make a multiple regression using only simple regressions.

If p1 and p2 were uncorrelated, there would be no difference between the simple and multiple regression results.

Problem: How can we “purge” p1 of it’s relationship to p2?

Solution: Use residuals from regression of p1 on p2

(43)

Proceed in two steps:

Step 1:

Regress P1 on P2 to purge P1 of relationship to P2.

(44)

Step 2:

Regress Sales on e_1.2

Same as that from multiple regression

(45)

How Does Multiple Regression Compute the Standard Error?

Key Insight:

The independent variation of P1 enables us to estimate the pure

or partial effect of P1 on Sales.

Independent variation is that part of the variation in P1 which in

unrelated to P2.

Independent Variation Covariance

(46)

f. More on the Interpretation of MR Coefficients

To use MR intelligently, it is essential that we fully understand the sources of the differences between the SLR and MR models.

We also need to understand how to interpret the coefficients for the purpose of making business decisions.

The Difference:

MR :

Y



β

₀



β

₁

X

₁



β

₂

X

₂



K +β

_k

X

_k

+ε

* 1

* 1 *

0

X

Y

:

SLR



β



β



ε

(47)

As we saw, the differences between the SLR and MR coefficients stem from the confounding of effects in SLR.

i.e. The SLR model incorporates the influences of all of the other variables in the error term

Thus, we should interpret the SLR coefficients as the effect of X₁ averaged over the values and effects of the other

variables.

SLR: Effect of X₁ on Y taking into account the co-movement of X₁ with the other variables

(48)

To see this another way, consider a multiple regression with two variables.

Suppose we want to predict Y for a given value of X₁ but we don't have any idea what value X₂ will take on.

A logical way to find the value of X₂ would be to regress X₂ on X₁ and use the fitted value from this regression.

(49)

We use where this predicted value is the expected value of X2 given X1 computed from the simple regression:

Then we have in the multiple regression:

that can be written as

or as

and it can be shown that this equation is identical to the simple linear equation.







2 2 1 1 1

ˆX

X

c (X

X )

1 1 1 2 2 2

    

ˆ ˆ

Y Y b (X X ) b (X X )

 

₁ ₁



₁



_{2 1} ₁



₁

ˆY Y b (X X ) b c (X X )

 

₁



_{2 1} ₁



₁

(50)

We can demonstrate this in the context of the sales and pricing equation.

And we compute the simple regression coefficient by: 63.7 = -97.7 + 109 ( 1.48)

(51)

If we use the MR to predict the results of a price change, we need to predict what our competitor will do! DON’T just “plug-in” the mean!

Takeaways:

i. be careful about interpreting and using MR in situations in which we don’t have full control over the X’s. In these cases, you have to construct a scenario for what you expect the other X’s to be.

(52)

Glossary of Symbols

- F distribution with υ₁ df in the numerator, υ₂ df in the denominator

f - value of F statistic

- adjusted R-squared

F

_{u u}

1, 2

2

(53)

Important Equations

0 1 1 2 2 k k

Y

  

X

 

X



L

 

X

 

(

)

(

)

SSE

(

N k 1

)

k SSR 1 k N R 1 k R f ₂ 2        2 y 2 2 s s 1 ) 1 N /( SST ) 1 k N /( SSE 1

R  

(54)

Important Equations

(

)

(

)

k ,N k k 1 0

2 1 2 full 2 2 H under F ~ 1 ) k k ( N R 1 k R Δ =

f ₂ _ ₁_ ₂_

(55)

Glossary of R Commands

• pf(f value, df1=5,df2=54)_{: Returns the probability}

left of value under the F distribution with df of numerator as 5, and df of denominator as 54.

• qf(prob,df1=5,df2=54)_{: Returns the critical value of}

the probability under the F distribution with df of numerator as 5, and df of denominator as 54.

• matrix(vec,ncol=x)_{: creates an array or matrix from}

the vector, vec, with x columns.

• cbind(col1,col2)_{: creates an array or matrix by}