V. The Multiple Regression Model
a. The Multiple Regression Model
b. The Data and Least Squares
c. Inference and F-tests
d. Prediction
a. The Multiple Regression Model
Many problems involve more than one independent variable (factor) that affects the dependent (response) variable, e.g.:
– Single and multiple factor asset pricing models (CAPM and APT)
– Demand for a product given prices of competing brands, advertising, household attributes
We can add the additional variables into the simple model in a linear fashion:
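$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon$$
Conditional Mean: $E[Y \mid X_1, \ldots, X_k] = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k$
Error Term: $\varepsilon$, the part of Y unrelated to the X’s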
The interpretation of the regression coefficients, βj, can be extended from the simple regression case:
$$\beta_j = \frac{\partial E[Y \mid X_1, \ldots, X_k]}{\partial X_j}$$
We can plot the regression plane as a surface in three dimensions.
Consider an example of Sales of a product as predicted by Price of this product (P1) and the price of a competing product (P2).
Sales = 1000 – 100P1 + 110P2
For example, hold P2 fixed and vary P1: each one-unit increase in P1 lowers predicted Sales by 100.
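The plane above can be explored numerically. The following is a minimal sketch in R, assuming only the example equation Sales = 1000 – 100P1 + 110P2; the function name `sales_plane` and the chosen value P2 = 5 are illustrative assumptions, not part of the original example.

```r
# Regression plane from the example: Sales = 1000 - 100*P1 + 110*P2
# (sales_plane is a hypothetical helper name used only for this sketch)
sales_plane <- function(p1, p2) 1000 - 100 * p1 + 110 * p2

# Hold P2 fixed at an illustrative value (5 is an assumption) and vary P1
p1_grid <- 0:4
sales_at_p2_5 <- sales_plane(p1_grid, p2 = 5)
print(sales_at_p2_5)

# Each one-unit increase in P1 changes predicted Sales by the P1 coefficient
print(diff(sales_at_p2_5))
```

Repeating the exercise with a different fixed P2 shifts the line up or down but leaves its slope in P1 unchanged, which is what "holding P2 fixed" means for a plane.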
b. The Data and Least Squares
The data in the multiple regression model is a set of points with values of each X variable and Y.
Data:
$$(X_{1,i}, X_{2,i}, \ldots, X_{j,i}, \ldots, X_{k,i}, Y_i)$$
where $X_{j,i}$ is the ith value of Xj in the data.
Model:
$$Y_i = \beta_0 + \beta_1 X_{1,i} + \beta_2 X_{2,i} + \cdots + \beta_k X_{k,i} + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2)$$
How do we estimate the parameters?
We use the principle of Least Squares just as we did in the SLR model:
– we define the fitted values
– find the best fitting plane by minimizing the sum of squared residuals
Fitted Values:
$$\hat{Y}_i = b_0 + b_1 X_{1,i} + \cdots + b_k X_{k,i}$$
Residuals:
$$e_i = Y_i - \hat{Y}_i$$
Least Squares:
$$\text{choose } b_0, b_1, \ldots, b_k \text{ to minimize } \sum_{i=1}^{N} e_i^2$$
Standard Error of the Regression:
$$s = \sqrt{\frac{1}{N-k-1} \sum_{i=1}^{N} e_i^2}$$
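To make the least squares recipe concrete, here is a minimal R sketch on simulated data (the data, the seed, and the true coefficients are assumptions made purely for illustration). It fits the plane with `lm()`, then recomputes the coefficients from the normal equations and the standard error of the regression from the residuals.

```r
# Simulated data; the true coefficients (1, 2, -3) are illustrative assumptions
set.seed(1)
N <- 200
k <- 2
x1 <- rnorm(N)
x2 <- rnorm(N)
y <- 1 + 2 * x1 - 3 * x2 + rnorm(N)

# Least squares fit using R's built-in linear model function
fit <- lm(y ~ x1 + x2)

# Same coefficients from the normal equations (X'X) b = X'y
X <- cbind(1, x1, x2)
b_manual <- solve(crossprod(X), crossprod(X, y))

# Residuals and the standard error of the regression, s
e <- y - X %*% b_manual
s <- sqrt(sum(e^2) / (N - k - 1))

print(cbind(lm = coef(fit), manual = as.numeric(b_manual)))
print(c(s_manual = s, s_lm = summary(fit)$sigma))
```

The two columns of coefficients should agree, and `s` should match the "residual standard error" that `summary(fit)` reports.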
Let’s run a multiple regression with the Sales and Price Data.
Model: $Sales_i = \beta_0 + \beta_1\, p1_i + \beta_2\, p2_i + \varepsilon_i$
The regression equation is Sales = 116 - 97.7 p1 + 109 p2
(b1 = -97.7, b2 = 109)
As in the SLR model, the residuals in the multiple regression model are purged of any relationship to the independent variables.
$$\mathrm{corr}(X_j, e) = 0; \quad j = 1, \ldots, k$$
$$Y = \hat{Y} + e$$
where $\hat{Y}$ is the part linearly explained by the Xs and $e$ is the unexplained part.
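The zero-correlation property can be checked numerically. This is a small R sketch on simulated data (the variables and their relationships are assumptions for illustration only); it verifies that the residuals are uncorrelated with each regressor and that Y splits exactly into fitted values plus residuals.

```r
# Simulated data with correlated regressors (illustrative assumption)
set.seed(2)
N <- 100
x1 <- rnorm(N)
x2 <- 0.5 * x1 + rnorm(N)
y <- 1 + x1 + x2 + rnorm(N)

fit <- lm(y ~ x1 + x2)
e <- resid(fit)       # unexplained part
y_hat <- fitted(fit)  # part linearly explained by the Xs

# Sample correlations between each X and the residuals (zero up to rounding)
print(c(cor_x1_e = cor(x1, e), cor_x2_e = cor(x2, e)))

# Decomposition Y = Y-hat + e holds observation by observation
print(max(abs(y - (y_hat + e))))
```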
Let’s see this graphically:
If the regression fits the data well, there should be a strong correlation between the actual and fitted Y values:
$$\mathrm{corr}(\hat{Y}_i, Y_i)$$
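As a quick numerical illustration of this correlation, the R sketch below uses simulated data (an assumption for illustration). It also checks a standard fact for least squares with an intercept: the squared correlation between actual and fitted values equals the regression R2.

```r
# Simulated data (illustrative assumption)
set.seed(10)
N <- 80
x1 <- rnorm(N)
x2 <- rnorm(N)
y <- 2 + x1 - x2 + rnorm(N)

fit <- lm(y ~ x1 + x2)

# Correlation between actual and fitted Y values
r_actual_fitted <- cor(y, fitted(fit))

# R-squared as reported by the regression summary
r2 <- summary(fit)$r.squared

print(c(corr = r_actual_fitted, corr_squared = r_actual_fitted^2, R2 = r2))
```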
c. Inference and F-tests
All inference from the simple regression extends to multiple regression, including:
– standard errors
– confidence intervals
– t-tests
Standard Errors
Since each coefficient estimator is a linear combination of normal random variables, each bj (j = 0, 1, ..., k) is normally distributed.
The variance of bj is complicated and depends on the variation of all of the X’s.
We call the estimated standard deviations the standard errors:
$$s_{b_j} = \hat{\sigma}_{b_j}$$
What are the factors which influence the size of the standard errors?
i. σ2
ii. N
iii. $S_{I.V.}$: the variation of Xj which is independent of the other X variables
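Confidence Intervals
$$(1-\alpha) \cdot 100\%\ \text{C.I.:} \quad b_j \pm t_{N-k-1,\,\alpha/2}\; s_{b_j}$$
t-Tests
$$\text{test } H_0: \beta_j = \beta_j^{*}, \qquad t = \frac{b_j - \beta_j^{*}}{s_{b_j}}, \qquad \text{reject if } |t| > t_{N-k-1,\,\alpha/2}$$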
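The interval and test formulas can be applied directly to the output of `lm()`. The following R sketch uses simulated data (the data, the 5% level, and the null value of 0 are assumptions for illustration) and builds the confidence intervals and t statistics by hand from the coefficient table.

```r
# Simulated data; x2 has a true coefficient of zero (illustrative assumption)
set.seed(3)
N <- 60
k <- 2
x1 <- rnorm(N)
x2 <- rnorm(N)
y <- 1 + 0.5 * x1 + 0 * x2 + rnorm(N)

fit <- lm(y ~ x1 + x2)
est <- coef(summary(fit))   # columns: Estimate, Std. Error, t value, Pr(>|t|)
b <- est[, "Estimate"]
s_b <- est[, "Std. Error"]

# Critical value t_{N-k-1, alpha/2}
alpha <- 0.05
t_crit <- qt(1 - alpha / 2, df = N - k - 1)

# (1 - alpha) * 100% confidence intervals: b_j +/- t_crit * s_bj
ci <- cbind(lower = b - t_crit * s_b, upper = b + t_crit * s_b)

# t-test of H0: beta_j = 0 for each coefficient
t_stat <- (b - 0) / s_b
reject <- abs(t_stat) > t_crit

print(ci)
print(data.frame(t_stat = t_stat, reject = reject))
```

The hand-built intervals should match what `confint(fit)` returns, which is a convenient cross-check.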
c. F-tests
t-test procedures are designed to examine one coefficient at a time for statistical significance.
In many situations, we need a testing procedure that can address simultaneous hypotheses about more than one coefficient.
We will look at two important types of simultaneous or joint hypotheses:
1. Overall Test of Significance
Suppose we run a regression with some 6 regressors and get a modest R2.
We may want to see if indeed there is any relationship in this data.
Implicitly, we want to test the hypothesis:
$$H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$$
c. F-tests: Country Returns Example
Surely the answer must be yes, but how strong is the evidence?
data(countryret)
So why don’t we just use R2? If R2 > 0, then we reject the null hypothesis of no relationship.
Problem: R2 will always be > 0 even if there is no relationship between Y and the X’s.
This is because least squares is content to fit “noise” in the data.
To see this, let’s generate some garbage data that has nothing to do with USA returns (10 variables):
Now we will create a data frame with only the USA return and these 10 variables which are, by definition, independent of USA returns.
cbind() takes two spreadsheets and joins them (“column bind”).
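The experiment described above can be sketched as follows. This is a hedged illustration only: the column layout of the `countryret` data is not shown here, so the sketch uses a simulated stand-in for the USA return series, and the names `usa`, `garbage`, and `g1` to `g10` are assumptions introduced for this example.

```r
# Stand-in for the USA return series (simulated; NOT the countryret data)
set.seed(4)
N <- 60
usa <- rnorm(N)

# Ten "garbage" variables generated independently of usa
garbage <- matrix(rnorm(N * 10), ncol = 10)
colnames(garbage) <- paste0("g", 1:10)

# Column-bind the return series and the garbage variables into one data frame
df <- data.frame(cbind(usa, garbage))

# Regress the return series on all ten garbage variables
fit <- lm(usa ~ ., data = df)
r2 <- summary(fit)$r.squared
print(r2)
```

Even though the regressors are pure noise by construction, the reported R2 is strictly positive, which is the point the text makes about least squares fitting noise.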
Look at that R2!
It seems, then, that an R2 of 15 per cent is pretty close to zero, in the sense that this can be expected even if the “true” R2 is 0. As usual, we need a statistical notion of how “close is close.”
It turns out that under the null hypothesis we can derive the distribution of a transformation of R2.
Define the F statistic for testing overall significance of the regression as follows:
$$f = \frac{R^2 / k}{(1 - R^2)/(N - k - 1)}$$
Properties of this statistic:
– can take on values between 0 and infinity
– the larger R2, the larger F
– under the null, we expect F to be clustered around small positive values.
What is the null distribution of f introduced on the last slide?
$$\text{under } H_0: \quad f \sim F_{k,\,N-k-1}$$
What kind of distribution is this?
It is a right-skewed, positive-valued family of distributions which is indexed by two parameters, the df in the numerator and the df in the denominator.
[Figure: F density plotted against f]
To test the null hypothesis, we compute a value of the F test using R2.
We only get “excited” by large values of F; small values are consistent with the null!
This is called a one-sided test.
Summary:
i. choose significance level α
ii. find critical value from INVCDF function…
[Figure: F density with the tail area α marked]
Summary continued…
iii. Compute f
$$f = \frac{R^2 / k}{(1 - R^2)/(N - k - 1)} = \frac{SSR / k}{SSE/(N - k - 1)}$$
Two equivalent expressions for f
Let’s do it with the country returns data.
Step iii. Compute F Stat
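The three steps of the overall F-test can be sketched in R as below. The data are simulated under the null (an assumption for illustration, not the country returns data), and the 5% significance level is also an assumed choice. The sketch computes f both ways and compares it with the F statistic that `summary()` reports.

```r
# Simulated data under the null: y is unrelated to the k regressors (assumption)
set.seed(5)
N <- 60
k <- 10
y <- rnorm(N)
X <- matrix(rnorm(N * k), ncol = k)
fit <- lm(y ~ X)

# Step i: choose the significance level
alpha <- 0.05

# Step ii: critical value from the F distribution with (k, N-k-1) df
f_crit <- qf(1 - alpha, df1 = k, df2 = N - k - 1)

# Step iii: compute f from R-squared, and equivalently from SSR and SSE
r2 <- summary(fit)$r.squared
f <- (r2 / k) / ((1 - r2) / (N - k - 1))
sse <- sum(resid(fit)^2)
ssr <- sum((fitted(fit) - mean(y))^2)
f_alt <- (ssr / k) / (sse / (N - k - 1))

# One-sided p-value: probability to the right of f under the null
p_value <- 1 - pf(f, df1 = k, df2 = N - k - 1)

print(c(f = f, f_alt = f_alt, f_crit = f_crit, p_value = p_value))
print(f > f_crit)  # TRUE would mean "reject the null of no relationship"
```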
2. Partial F-tests
In many situations, a group of regressors is identified which we all agree are important to include in the model.
Another group of variables is somewhat questionable (these typically have low t-stats).
Is the last group of variables worth including? We write the model as:
$$Y_i = \beta_0 + \beta_1 X_{1,i} + \cdots + \beta_{k_1} X_{k_1,i} + \beta_{k_1+1} X_{k_1+1,i} + \cdots + \beta_{k_1+k_2} X_{k_1+k_2,i} + \varepsilon_i$$
We wish to test:
$$H_0: \beta_{k_1+1} = \beta_{k_1+2} = \cdots = \beta_{k_1+k_2} = 0 \quad \text{vs.} \quad H_a: \text{at least one of the last } k_2\ \beta\text{'s} \neq 0$$
Intuitively, we could check the goodness of fit with and without the k2 additional variables.
If adding in the additional k2 variables improved the fit substantially, that would be evidence against the null hypothesis.
The problem: R2 will never decrease (and will almost always increase) as we add additional variables, even if these variables have zero population coefficients.
Why? Least Squares likes additional degrees of freedom to make fitted values which are closer to the observed Y values.
Adjusted R2
Some have suggested computing a new quantity called adjusted R2 to take into account this problem:
$$\bar{R}^2 = 1 - \frac{SSE/(N-k-1)}{SST/(N-1)} = 1 - \frac{s^2}{s_y^2}$$
Note that we are looking at a ratio of variance estimates. $\bar{R}^2$ will not necessarily increase as other regressors are added.
Also, $\bar{R}^2$ can be < 0!
The problem with the use of $\bar{R}^2$ is that we have no statistical notion of how large a change is meaningful.
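A short R sketch of the adjusted R2 calculation follows, using simulated data in which one regressor is pure noise (the data and the helper name `adj_r2` are assumptions for illustration). It shows R2 not falling when the junk variable is added, and checks the hand-computed adjusted R2 against the value reported by `summary()`.

```r
# Simulated data; "junk" is unrelated to y by construction (assumption)
set.seed(6)
N <- 50
x1 <- rnorm(N)
junk <- rnorm(N)
y <- 1 + x1 + rnorm(N)

# Hypothetical helper: adjusted R-squared from SSE, SST, N and k
adj_r2 <- function(fit) {
  e <- resid(fit)
  y_obs <- fitted(fit) + e
  n <- length(e)
  k <- length(coef(fit)) - 1
  sse <- sum(e^2)
  sst <- sum((y_obs - mean(y_obs))^2)
  1 - (sse / (n - k - 1)) / (sst / (n - 1))
}

small <- lm(y ~ x1)
big <- lm(y ~ x1 + junk)

print(c(R2_small = summary(small)$r.squared, R2_big = summary(big)$r.squared))
print(c(adjR2_small = adj_r2(small), adjR2_big = adj_r2(big)))
```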
Partial F-tests
We can develop a test for inclusion of a subset of variables by using the change in R2 as we add the variables.
$$\Delta R^2 = R^2_{full} - R^2_{restricted}$$
Here: $R^2_{full}$ is the R2 from the regression with all variables included and $R^2_{restricted}$ is the R2 from the regression with only the first k1 variables included.
This test is sometimes referred to as a partial F test or an "inclusion/exclusion" test.
$$f = \frac{\Delta R^2 / k_2}{(1 - R^2_{full})/(N - (k_1 + k_2) - 1)}$$
Back to Country Returns Data
1. Run Full and Restricted Regressions
Restricted regression (remove insignificant independent variables):
Compute the Critical Value for F:
Compute F and P-Values:
$$f = \frac{\Delta R^2 / k_2}{(1 - R^2_{full})/(N - (k_1 + k_2) - 1)} = \frac{(.566 - .547)/3}{(1 - .566)/100} = 1.459$$
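The partial F-test can be sketched in R as follows. The first part uses simulated data (an assumption, since the country returns regressions are not reproduced here) and cross-checks the ΔR2 formula against `anova()`. The second part simply redoes the arithmetic of the worked example with the numbers .566, .547, 3 and 100 taken from the text.

```r
# Simulated data: only the first k1 regressors matter (illustrative assumption)
set.seed(7)
N <- 104
k1 <- 2
k2 <- 3
X <- matrix(rnorm(N * (k1 + k2)), ncol = k1 + k2)
colnames(X) <- paste0("x", 1:(k1 + k2))
dat <- data.frame(y = 1 + X[, 1] - X[, 2] + rnorm(N), X)

full <- lm(y ~ x1 + x2 + x3 + x4 + x5, data = dat)
restricted <- lm(y ~ x1 + x2, data = dat)

# Partial F statistic from the change in R-squared
delta_r2 <- summary(full)$r.squared - summary(restricted)$r.squared
df2 <- N - (k1 + k2) - 1
f <- (delta_r2 / k2) / ((1 - summary(full)$r.squared) / df2)
f_crit <- qf(0.95, df1 = k2, df2 = df2)
p_value <- 1 - pf(f, df1 = k2, df2 = df2)

# Cross-check against R's built-in nested-model comparison
f_anova <- anova(restricted, full)$F[2]
print(c(f = f, f_anova = f_anova, f_crit = f_crit, p_value = p_value))

# Arithmetic of the worked example in the text
f_slide <- ((.566 - .547) / 3) / ((1 - .566) / 100)
p_slide <- 1 - pf(f_slide, df1 = 3, df2 = 100)
print(c(f_slide = f_slide, p_slide = p_slide))
```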
d. Prediction
There is nothing new about prediction in multiple regression that was not covered in the SLR notes.
Prediction Problem:
Predict $Y_f$ given $X_{j,f}$, $j = 1, \ldots, k$, and data $(Y_i, X_{1,i}, \ldots, X_{k,i})$, $i = 1, \ldots, N$.
We use the least squares fitted plane to provide the prediction rule. Our prediction interval is:
$$\hat{Y}_f \pm t^{*}_{N-k-1,\,\alpha/2}\; s_{pred}, \qquad \hat{Y}_f = b_0 + b_1 X_{1,f} + \cdots + b_k X_{k,f}$$
$$s_{pred} = \sqrt{s^2 + \mathrm{stderr}_{fit}^2}$$
Here $s_{pred}$ is the standard error of prediction. The standard error of fit is very complicated for multiple regression.
Back to the Pricing and Sales Example:
$$\hat{Y}_f \pm t^{*}_{N-k-1,\,\alpha/2}\; s_{pred}, \qquad \hat{Y}_f = b_0 + b_1 X_{1,f} + \cdots + b_k X_{k,f}$$
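A prediction interval of this form can be obtained in R with `predict()`. The sketch below uses simulated sales and price data (the data-generating numbers and the future prices p1 = 8, p2 = 10 are assumptions for illustration, not the course data set) and rebuilds the interval by hand from s and the standard error of fit.

```r
# Simulated pricing data (illustrative assumption, not the course data)
set.seed(8)
N <- 100
p1 <- runif(N, 2, 8)
p2 <- 1 + 1.5 * p1 + rnorm(N)
sales <- 100 - 100 * p1 + 110 * p2 + rnorm(N, sd = 30)
fit <- lm(sales ~ p1 + p2)

# Future values of the X's at which to predict (assumed scenario)
new_x <- data.frame(p1 = 8, p2 = 10)
pr <- predict(fit, newdata = new_x, interval = "prediction",
              level = 0.95, se.fit = TRUE)

# Rebuild the interval: y_hat +/- t * s_pred, s_pred = sqrt(s^2 + stderr_fit^2)
s <- summary(fit)$sigma
s_pred <- sqrt(s^2 + pr$se.fit^2)
t_crit <- qt(0.975, df = df.residual(fit))
y_hat <- pr$fit[1, "fit"]
manual <- c(y_hat - t_crit * s_pred, y_hat + t_crit * s_pred)

print(pr$fit)
print(manual)
```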
e. Multiple Regression Explained: Pricing Puzzle
We’ve looked only at the multiple regression of Sales on p1 and p2. What if we looked only at the relationship between Sales and p1?
It appears that there is a positive, but weak, relationship between sales and the price of your product.
You are about to sue your micro-economics instructor for fraud, when your colleague says:
“Why don’t you look at the plot of P1 vs. P2? Maybe there is something screwy there.”
[Figure: scatter plot of p1 and p2]
A Strong Positive Correlation!
This means that as you increase your price, the competition tends to increase its price as well.
e. Multiple Regression Explained
[Regression output, two panels: Multiple Regression: Sales on p1 and p2; SLR: Sales on p1]
Why is there such a difference? The difference stems from confounding of effects.
[Diagram relating Own Price (P1), Competitor's Price (P2), and Sales; surviving labels: "+ corr", "+"]
How does the multiple regression work? Let’s make a multiple regression using only simple regressions.
If p1 and p2 were uncorrelated, there would be no difference between the simple and multiple regression results.
Problem: How can we “purge” p1 of its relationship to p2?
Solution: Use residuals from regression of p1 on p2
Proceed in two steps:
Step 1: Regress P1 on P2 to purge P1 of its relationship to P2. Call the residuals e1.2.
Step 2: Regress Sales on e1.2. The slope coefficient is the same as that from the multiple regression.
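The two-step construction can be verified numerically. The R sketch below uses simulated prices and sales (an assumption for illustration; p1 and p2 are made positively correlated on purpose) and confirms that the slope from regressing Sales on the purged price equals the p1 coefficient in the multiple regression, while the plain SLR slope differs.

```r
# Simulated data with positively correlated prices (illustrative assumption)
set.seed(9)
N <- 100
p2 <- runif(N, 0, 15)
p1 <- 0.5 * p2 + rnorm(N)
sales <- 100 - 100 * p1 + 110 * p2 + rnorm(N, sd = 30)

# Multiple regression of Sales on p1 and p2
mr <- lm(sales ~ p1 + p2)

# Step 1: regress p1 on p2; the residuals are p1 purged of p2
e12 <- resid(lm(p1 ~ p2))

# Step 2: regress Sales on the purged price
step2 <- lm(sales ~ e12)

# Simple regression of Sales on p1 alone, for comparison
slr <- lm(sales ~ p1)

print(c(mr_p1 = unname(coef(mr)["p1"]),
        two_step = unname(coef(step2)["e12"]),
        slr_p1 = unname(coef(slr)["p1"])))
```

In the econometrics literature this equivalence is usually credited to Frisch, Waugh and Lovell; the slides do not name it, so the attribution is offered only as a pointer for further reading.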
How Does Multiple Regression Compute the Standard Error?
Key Insight:
The independent variation of P1 enables us to estimate the pure or partial effect of P1 on Sales.
Independent variation is that part of the variation in P1 which is unrelated to P2.
[Figure labels: Independent Variation, Covariance]
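One way to see the role of independent variation is to compute the standard error of the p1 coefficient from it directly. The R sketch below uses simulated data (an assumption for illustration) and the standard least squares result that the standard error of b1 equals s divided by the square root of the sum of squared residuals from regressing p1 on p2.

```r
# Simulated data with correlated prices (illustrative assumption)
set.seed(11)
N <- 100
p2 <- runif(N, 0, 15)
p1 <- 0.6 * p2 + rnorm(N)
sales <- 100 - 100 * p1 + 110 * p2 + rnorm(N, sd = 30)

mr <- lm(sales ~ p1 + p2)
s <- summary(mr)$sigma

# Independent variation of p1: residuals from regressing p1 on p2
e12 <- resid(lm(p1 ~ p2))
indep_variation <- sum(e12^2)
total_variation <- sum((p1 - mean(p1))^2)

# Standard error of b1 built from the independent variation only
se_manual <- s / sqrt(indep_variation)
se_lm <- coef(summary(mr))["p1", "Std. Error"]

print(c(se_manual = se_manual, se_lm = se_lm))
print(c(independent = indep_variation, total = total_variation))
```

The more of P1's variation is shared with P2, the smaller the independent variation and the larger the standard error.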
f. More on the Interpretation of MR Coefficients
To use MR intelligently, it is essential that we fully understand the sources of the differences between the SLR and MR models.
We also need to understand how to interpret the coefficients for the purpose of making business decisions.
The Difference:
$$\text{MR}: \quad Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon$$
$$\text{SLR}: \quad Y = \beta_0^{*} + \beta_1^{*} X_1 + \varepsilon^{*}$$
As we saw, the differences between the SLR and MR coefficients stem from the confounding of effects in SLR.
i.e., the SLR model incorporates the influences of all of the other variables in the error term.
Thus, we should interpret the SLR coefficients as the effect of X1 averaged over the values and effects of the other variables.
SLR: Effect of X1 on Y taking into account the co-movement of X1 with the other variables
To see this another way, consider a multiple regression with two variables.
Suppose we want to predict Y for a given value of X1 but we don't have any idea what value X2 will take on.
A logical way to find the value of X2 would be to regress X2 on X1 and use the fitted value from this regression.
We use $\hat{X}_2$, where this predicted value is the expected value of X2 given X1 computed from the simple regression:
$$\hat{X}_2 = \bar{X}_2 + c_1 (X_1 - \bar{X}_1)$$
Then we have in the multiple regression:
$$\hat{Y} = \bar{Y} + b_1 (X_1 - \bar{X}_1) + b_2 (\hat{X}_2 - \bar{X}_2)$$
that can be written as
$$\hat{Y} = \bar{Y} + b_1 (X_1 - \bar{X}_1) + b_2 c_1 (X_1 - \bar{X}_1)$$
or as
$$\hat{Y} = \bar{Y} + (b_1 + b_2 c_1)(X_1 - \bar{X}_1)$$
and it can be shown that this equation is identical to the simple linear equation.
We can demonstrate this in the context of the sales and pricing equation.
And we compute the simple regression coefficient as b1 + b2 c1: 63.7 = -97.7 + 109(1.48)
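The identity between the SLR slope and b1 + b2 c1 can be checked in R. The sketch below uses simulated data (an assumption for illustration) for the exact check, and then redoes the slide's arithmetic with the rounded figures quoted in the text.

```r
# Simulated data with correlated regressors (illustrative assumption)
set.seed(12)
N <- 100
x1 <- runif(N, 0, 9)
x2 <- 1 + 1.5 * x1 + rnorm(N)
y <- 100 - 100 * x1 + 110 * x2 + rnorm(N, sd = 30)

# Multiple regression coefficients b1 and b2
mr <- lm(y ~ x1 + x2)
b1 <- unname(coef(mr)["x1"])
b2 <- unname(coef(mr)["x2"])

# c1: slope from the simple regression of X2 on X1
c1 <- unname(coef(lm(x2 ~ x1))["x1"])

# Simple regression slope of Y on X1 alone
b1_star <- unname(coef(lm(y ~ x1))["x1"])

print(c(slr_slope = b1_star, b1_plus_b2c1 = b1 + b2 * c1))

# Slide arithmetic with the rounded figures from the text; the result is close
# to the reported 63.7, the small gap presumably coming from rounded inputs
slide_check <- -97.7 + 109 * 1.48
print(slide_check)
```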
If we use the MR to predict the results of a price change, we need to predict what our competitor will do! DON’T just “plug-in” the mean!
Takeaways:
i. be careful about interpreting and using MR in situations in which we don’t have full control over the X’s. In these cases, you have to construct a scenario for what you expect the other X’s to be.
Glossary of Symbols
$F_{\upsilon_1, \upsilon_2}$ - F distribution with υ1 df in the numerator, υ2 df in the denominator
f - value of F statistic
$\bar{R}^2$ - adjusted R-squared
Important Equations
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon$$
$$f = \frac{R^2 / k}{(1 - R^2)/(N - k - 1)} = \frac{SSR / k}{SSE/(N - k - 1)}$$
$$\bar{R}^2 = 1 - \frac{SSE/(N-k-1)}{SST/(N-1)} = 1 - \frac{s^2}{s_y^2}$$
$$f = \frac{\Delta R^2 / k_2}{(1 - R^2_{full})/(N - (k_1 + k_2) - 1)} \sim F_{k_2,\,N-k_1-k_2-1} \quad \text{under } H_0$$
Glossary of R Commands
• pf(f value, df1=5, df2=54): Returns the probability to the left of value under the F distribution with 5 df in the numerator and 54 df in the denominator.
• qf(prob, df1=5, df2=54): Returns the critical value that has probability prob to its left under the F distribution with 5 df in the numerator and 54 df in the denominator.
• matrix(vec, ncol=x): creates an array or matrix from the vector, vec, with x columns.
• cbind(col1,col2): creates an array or matrix by