• No results found

CHVtheMultipleRegressionModel.pptx

N/A
N/A
Protected

Academic year: 2020

Share "CHVtheMultipleRegressionModel.pptx"

Copied!
55
0
0

Loading.... (view fulltext now)

Full text

(1)

V. The Multiple Regression Model

a. The Multiple Regression Model b. The Data and Least Squares c. Inference and F-tests

d. Prediction

(2)

a. The Multiple Regression Model

Many problems involve more than one independent variable or factor which affects the dependent or response variable e.g.

Single and multiple factor asset pricing models (CAPM and

APT)

Demand for a product given prices of competing brands,

advertising, household attributes

(3)

a. The Multiple Regression Model

We can add the additional variables into the simple model in a linear fashion:

0 1 1 2 2 k k

Y

   

X

 

X

L

 

X

 

Conditional Mean: E[Y|X1, …, Xk] Error Term: Part of Y unrelated to the X’s

The interpretation of the regression coefficients, βj, can be extended from the simple regression case:

j k 1 j

X

X

,...,

X

|

Y

E

β

(4)

a. The Multiple Regression Model

We can plot the regression plane as a surface in three dimensions.

Consider an example of Sales of a product as predicted by Price of this product (P1) and the price of a competing product (P2).

Sales = 1000 – 100P1 + 110P2

hold P2 fixed and vary P1

(5)

b. The Data and Least Squares

The data in the multiple regression model is a set of points with values of each X variable and Y

Data:

ith value of Xj in the data

Model:

i 0 1 1,i 2 2,i k k,i i

Y

  

X

 

X

L

 

X

 

ε

i

~ N 0,σ

(

2

)

X

1i

,X

2i

,

K

,X

ji

,

K

,X

ki

,Y

i
(6)

b. The Data and Least Squares

How do we estimate the parameters?

We use the principle of Least Squares just as we did in SLR Model:

we define the fitted values

find the best fitting plane by minimizing the sum of squared

residuals

Fitted Values:

i 0 1 1,i k k,i

(7)

b. The Data and Least Squares

Residuals:

i i

ˆ

i

e

Y

Y

Least Squares:

Standard Error of the Regression:

 

N

2 i i 1

1

s

e

N k 1

chooseb

0

,b

1

,

K

,

b

k

to m

ini

m

ize

e

i2

i1 N

(8)

b. The Data and Least Squares

Let’s run a multiple regression with the Sales and Price Data.

Model: Sales

i

β

0

β

1

p1

i

β

2

p2

i

ε

i

The regression equation is Sales = 116 - 97.7 p1 + 109 p2

 

b1 b2

(9)

b. The Data and Least Squares

As in the SLR model, the residuals in the multiple regression model are purged of any relationship to the independent

variables.

( )

j

K

corr X ,e

0; j 1, ,k

Y

Y 

ˆ

Part linearly explained by the Xs Unexplained part

(10)

b. The Data and Least Squares

Let’s see this graphically:

(11)

b. The Data and Least Squares

If the regression fits the data well, there should be a strong correlation between the actual and fitted Y values:

ˆY

i

,Y

i

(

)

(12)

c. Inference and F-tests

All inference from the simple regression extends to multiple regression, including:

standard errors

confidence intervalst-tests

Standard Errors

Since each coefficient estimator is a linear combination of normal random variables, each bi (i = 0,1, ..., k) is normally distributed.

(13)

c. Inference and F-tests

The variance of bj is complicated and depends on the variation of all of the X’s.

We call the estimated standard deviations the standard errors:

j j

b b

s

 

ˆ

What are the factors which influence the size of the standard errors?

i. σ2

ii. N

iii. SI.V.- variation of Xj which is independent of other X vars

(14)

c. Inference and F-tests

Confidence Intervals

(

1

α

)

100

%

C

I.

:.

b

j

t

Nk1,α/2

s

bj

t-Tests

  

  

 

j

*

0 j j

*

j j

N k 1, / 2 b

test

H :

b

reject if t

t

(15)

c. F-tests

t-test procedures are designed to examine one coefficient at a time for statistical significance.

In many situations, we need a testing procedure that can address simultaneous hypotheses about more than one coefficient.

We will look at two important types of simultaneous or joint hypotheses:

(16)

c. F-tests

1. Overall Test of Significance

Suppose we run a regression with some 6 regressors and get a modest R2.

We may want to see if indeed there is any relationship in this data

Implicitly, we want to test the hypothesis:

0 1 2 k

(17)

c. F-tests: Country Returns Example

Surely the answer must be yes but how strong is

evidence?

data(countryret)

(18)

c. F-tests

So why don’t we just use R2? If R2 > 0, then we reject the

null hypothesis of no relationship.

Problem: R2 will always be > 0 even if there is no

relationship between Y and the X’s.

This is because least squares is content to fit “noise” in the data.

(19)

c. F-tests

To see this, let’s generate some garbage data that has nothing to do with USA returns (10 variables):

Now we will create a data frame with only the USA return and these 10 variables which are, by definition, independent of USA returns.

cbind() takes two spreadsheets and joins them (“column bind”).

(20)

c. F-tests

Look at that R2!

Look at that

R2

(21)

c. F-tests

It seems then, that an R2 of 15 per cent is pretty to close

zero in the sense as that this can be expected even if the “true” R2 is 0. As usual, we need a statistical notion of how

“close is close.”

It turns out that under the null hypothesis we can derive the distribution of a transformation of R2.

Define the F statistic for testing overall significance of the regression as follows:

f

R

2

k

1

R

2
(22)

c. F-tests

Properties of this statistic:

can take on values between 0 and infinity the larger R2 , the larger F

under the null, we expect F to be clustered around small

positive values.

What is the null distribution of f introduced on the last slide?

0 1

k N ,

k

under

H

F

~

f

(23)

c. F-tests

What kind of distribution is this?

It is a right skewed, positive valued family of distributions which is indexed by two parameters, the df in numerator and the df in the denominator. 4 3 2 1 0 0 .7 0 .6 0 .5 0 .4 0 .3 0 .2 0 .1 0 .0 f F -d en s

To test the null hypothesis, we compute a value of the F test using R2.

(24)

c. F-tests

We only get “excited” by large values of F, small values are consistent with the null!.

This is called a one-sided test.

Summary:

i. choose significance level α

ii. find critical value from INVCDF function…

0 .7 0 .6 0 .5 0 .4 0 .3 0 .2 0 .1 0 .0

F-de

ns α

(25)

c. F-tests

Summary continued…

iii. Compute f

(

)

(

)

SSE

(

N k 1

)

k SSR 1 k N R 1 k R f 2 2       

Two equivalent expressions for f

Let’s do it with country returns data.

(26)

c. F-tests

Step iii. Compute F Stat

(27)

c. F-tests

2. Partial F – tests

In many situations, a group of regressors are identified which we all agree are important to include in the model.

Another group of variables are somewhat questionable. (these typically have low t-stats)

Are the last group of variables worth including? We write the model as:

1 1 1 1 1 2 1 2

i 0 1 1,i k k ,i k 1 k 1,i k k k k ,i i

Y   X  K  X   X  K  X  

(28)

c. F-tests

We wish to test:

0

s

'

k

last

the

of

one

least

at

:

H

vs

.

0

=

=

...

=

:

H

2 a k + k 1 k

0 1 1 2

β

β

β

Intuitively, we could check the goodness of fit with and without the k2 additional variables.

If adding in the additional k2 variables improved the fit

(29)

c. F-tests

The problem: R2 will always increase as we add additional

variables even if these variables have zero population coefficients.

Why? Least Squares likes additional degrees of freedom to make fitted values which are closer to the observed Y values.

(30)

c. F-tests

Adjusted R2

Some have suggested computing a new quantity called adjusted R2 to take into account this problem:

2 y 2 2 s s 1 ) 1 N /( SST ) 1 k N /( SSE 1

R  

   

Note that we are looking at a ratio of variance estimates. will not necessarily increase as other regressors are added.

Also, can be < 0!

The problem with the use of is that we have no statistical R2

R2

(31)

c. F-tests

Partial F-tests

We can develop a test for inclusion of a subset of variables by using the change in R2 as we add the variables.

. R

R R

Δ 2  2fullrestricted2

Here:

is the R2 from the regression with all variables included and

is the R2 from the regression with only the first k

1 variables

included

This test is sometimes referred to as a partial F test or an

"inclusion/exclusion" test.

full 2 R restricted 2 R

f  ΔR

2 k 2

1−R full2

(32)

c. F-tests

Back to Country Returns Data

1. Run Full and Restricted Regressions

(33)

c. F-tests

Restricted regression (remove insignificant ind vars):

Compute the Critical Value for F:

(34)

c. F-tests

Compute F and P - Values:

f = ΔR

2 k

2

1−R full2

( ) N( −(k1  k2)−1) 

.566 −.547

( ) 3 1−.566

( ) 100 1.459

(35)

This is the standard error of prediction . The standard error of fit is very complicated for multiple regression

d. Prediction

There is nothing new about prediction in multiple regression that was not covered in the SLR notes.

Prediction Problem:

f j,f

i 1,i k,i

Predict Y given X

j 1,

,k

and data

Y, X ,

,X

i 1,

,N

K

K

K

We use the least squares fitted plane to provide the prediction rule. Our prediction interval is:

N k 1, / 2

*

f 0 1 1,f k k,f pred

ˆY

b

b X

b X

t

s

  

K

( 2)

fit 2

pred s stderr

(36)

d. Prediction

Back to the Pricing and Sales Example:

N k 1, /2 *

f 0 1 1,f k k,f pred

ˆY b b X

b X

t

s

  

(37)

e. Multiple Regression Explained: Pricing Puzzle

We’ve looked only at the multiple regression of Sales on p1 and p2. What if we looked at only the relationship between sales and p1.

(38)

e. Multiple Regression Explained: Pricing Puzzle

It appears that there is a positive, but weak, relationship between sales and the price of your product.

(39)

e. Multiple Regression Explained: Pricing Puzzle

You are about to sue your micro-economics instructor for fraud, when your colleague says –

“Why don’t you look at the plot of P1 vs. P2? May be there is something screwy there.”

15 10 5 0 9 8 7 6 5 4 3 2 1 0 p2 p1 15 10 5 0 9 8 7 6 5 4 3 2 1 0 p2 p1

A Strong Positive Correlation!

This means that as you increase your price, the competition

(40)

e. Multiple Regression Explained

Multiple Regression: Sales on p1 and p2 SLR: Sales on p1

(41)

e. Multiple Regression Explained

Why is there such a difference? The difference stems from confounding of effects.

Own price

(P1) Competitor's Price (P2) Sales

+ corr

+

(42)

e. Multiple Regression Explained

How does the multiple regression work? Let’s make a multiple regression using only simple regressions.

If p1 and p2 were uncorrelated, there would be no difference between the simple and multiple regression results.

Problem: How can we “purge” p1 of it’s relationship to p2?

Solution: Use residuals from regression of p1 on p2

(43)

e. Multiple Regression Explained

Proceed in two steps:

Step 1:

Regress P1 on P2 to purge P1 of relationship to P2.

(44)

e. Multiple Regression Explained

Step 2:

Regress Sales on e1.2

Same as that from multiple regression

(45)

e. Multiple Regression Explained

How Does Multiple Regression Compute the Standard Error?

Key Insight:

The independent variation of P1 enables us to estimate the pure

or partial effect of P1 on Sales.

Independent variation is that part of the variation in P1 which in

unrelated to P2.

Independent Variation Covariance

(46)

f. More on the Interpretation of MR Coefficients

To use MR intelligently, it is essential that we fully understand the sources of the differences between the SLR and MR models.

We also need to understand how to interpret the coefficients for the purpose of making business decisions.

The Difference:

MR :

Y

β

0

β

1

X

1

β

2

X

2

K +β

k

X

k

* 1

* 1 *

0

X

Y

:

SLR

β

β

ε

(47)

f. More on the Interpretation of MR Coefficients

As we saw, the differences between the SLR and MR coefficients stem from the confounding of effects in SLR.

i.e. The SLR model incorporates the influences of all of the other variables in the error term

Thus, we should interpret the SLR coefficients as the effect of X1 averaged over the values and effects of the other

variables.

SLR: Effect of X1 on Y taking into account the co-movement of X1 with the other variables

(48)

f. More on the Interpretation of MR Coefficients

To see this another way, consider a multiple regression with two variables.

Suppose we want to predict Y for a given value of X1 but we don't have any idea what value X2 will take on.

A logical way to find the value of X2 would be to regress X2 on X1 and use the fitted value from this regression.

(49)

f. More on the Interpretation of MR Coefficients

We use where this predicted value is the expected value of X2 given X1 computed from the simple regression:

Then we have in the multiple regression:

that can be written as

or as

and it can be shown that this equation is identical to the simple linear equation.

2 2 1 1 1

ˆX

X

c (X

X )

1 1 1 2 2 2

    

ˆ ˆ

Y Y b (X X ) b (X X )

 

1 1

1

2 1 1

1

ˆY Y b (X X ) b c (X X )

 

1

2 1 1

1
(50)

f. More on the Interpretation of MR Coefficients

We can demonstrate this in the context of the sales and pricing equation.

And we compute the simple regression coefficient by: 63.7 = -97.7 + 109 ( 1.48)

(51)

f. More on the Interpretation of MR Coefficients

If we use the MR to predict the results of a price change, we need to predict what our competitor will do! DON’T just “plug-in” the mean!

Takeaways:

i. be careful about interpreting and using MR in situations in which we don’t have full control over the X’s. In these cases, you have to construct a scenario for what you expect the other X’s to be.

(52)

Glossary of Symbols

- F distribution with υ1 df in the numerator, υ2 df in the denominator

f - value of F statistic

- adjusted R-squared

F

u u

1, 2

2

(53)

Important Equations

0 1 1 2 2 k k

Y

  

X

 

X

L

 

X

 

(

)

(

)

SSE

(

N k 1

)

k SSR 1 k N R 1 k R f 2 2        2 y 2 2 s s 1 ) 1 N /( SST ) 1 k N /( SSE 1

R  

(54)

Important Equations

(

)

(

)

k ,N k k 1 0

2 1 2 full 2 2 H under F ~ 1 ) k k ( N R 1 k R Δ =

f 2 1 2

(55)

Glossary of R Commands

• pf(f value, df1=5,df2=54): Returns the probability

left of value under the F distribution with df of numerator as 5, and df of denominator as 54.

• qf(prob,df1=5,df2=54): Returns the critical value of

the probability under the F distribution with df of numerator as 5, and df of denominator as 54.

• matrix(vec,ncol=x): creates an array or matrix from

the vector, vec, with x columns.

• cbind(col1,col2): creates an array or matrix by

References

Related documents

Results suggest that the probability of under-educated employment is higher among low skilled recent migrants and that the over-education risk is higher among high skilled

In this PhD thesis new organic NIR materials (both π-conjugated polymers and small molecules) based on α,β-unsubstituted meso-positioning thienyl BODIPY have been

The main optimization of antichain-based algorithms [1] for checking language inclusion of automata over finite alphabets is that product states that are subsets of already

Aptness of Candidates in the Pool to Serve as Role Models When presented with the candidate role model profiles, nine out of ten student participants found two or more in the pool

Spain’s Podemos, Greece’s Syriza, France’s Parti de Gauche and Portugal’s Left Bloc and Communist Party, together with like-minded parties across Europe have been inspired in

• Develops a category strategy and applies appropriate value levers and supporting techniques/tools as needed to meet value objectives (e.g., strategic sourcing, SRM,

Although the regulation does not specifically stipulate that the medical director must sign all monthly ITPs, AACVPR recommends that it should be the medical director’s signature

Proprietary Schools are referred to as those classified nonpublic, which sell or offer for sale mostly post- secondary instruction which leads to an occupation..