II. The Simple Linear Regression Model
a. Prediction and the Modeling Goal
b. The Simple Linear Regression Model
c. Prediction Intervals with the True Model
d. Summary of the Simple Linear Regression Model
e. Three Key Characteristics of SLR Model
f. Estimation of σ2
a. Prediction and the Modeling Goal
Prediction in R:
$\hat{Y} = f(X) = b_0 + b_1 X$
Prediction Rule
Price = -1408.93 + 57.13 Size
We are developing prediction rules:
Put in a value of X for a Y that we have not observed, and out comes a prediction (a function or black box)
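A prediction rule is just a function from X to a predicted Y. The sketch below wraps the fitted line quoted above in a small Python function (the course itself works in R; the name `predict_price` is made up for illustration).

```python
def predict_price(size, b0=-1408.93, b1=57.13):
    """Predicted price for a TV of the given size, using the fitted line above."""
    return b0 + b1 * size

# Put in a size we have not observed, and out comes a prediction.
print(round(predict_price(58), 2))  # 1904.61
```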
How should we quantify accuracy of forecasts?
One method is to specify a range of Y values that are likely, given an X value. This is called a prediction interval.
Prediction Interval: range of Y values that are likely given X
What do we mean by “likely?”
We have to develop a probability model, using a
b. The Simple Linear Regression Model
The power of statistical inference comes from the ability to make precise statements about the accuracy of the forecasts. In order to do this, we must postulate a probability model.
The Simple Linear Regression Model:
$Y = \beta_0 + \beta_1 X + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2)$
– β0 + β1 X: part of Y related to X
– ε: error term, part of Y independent of X
For convenience, we assume
$E(\varepsilon) = 0$
The size of ε is measured by its standard deviation
$\text{Std dev}(\varepsilon) = \sigma$
The “systematic” part of the regression is given by β0 + β1 X. We can interpret this as the “true” regression line:
$E[Y \mid X] = \beta_0 + \beta_1 X$
Think of E[Y|X] as the average price of flat-panel TVs with size X. Some flat-panel TVs have a price above the expected value and some below; the true line tells us what to expect on average. The error term represents the influence of factors other than size.
[Figure: the true regression line $E[Y \mid X] = \beta_0 + \beta_1 X$, plotted with Y against X]
What distribution should we use for ε ?
Justifications for using the normal distribution:
– It’s the only distribution I know of!
– It works!
– The sum of many r.v.s often has a normal-looking distribution (Central Limit Theorem)
Now the model becomes:
$Y = \beta_0 + \beta_1 X + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2)$
– Mean of ε is 0: sometimes Y is above the line, sometimes below
– Remember we think of σ as the average size of the error term
Quick Review of Normal Distribution
Properties of Normal:
i. Symmetric
ii. Uni-modal
iii. Thin-tails
[Figure: Probability Density Function for a Normal Distribution; horizontal axis X, vertical axis f_X(x); curves shown for increasing variance]
Remember the relationship between σ and where the normal distribution puts its mass.
$\Pr(\mu - 1\sigma < X < \mu + 1\sigma) \approx .68$ (One Sigma)
$\Pr(\mu - 2\sigma < X < \mu + 2\sigma) \approx .9544$ (Two Sigma)
$\Pr(\mu - 3\sigma < X < \mu + 3\sigma) \approx .997$ (Three Sigma)
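The one-, two- and three-sigma probabilities can be checked numerically. A minimal Python sketch using only the standard library; the helper name `prob_within` is made up for illustration, and the results should land close to the .68, .9544 and .997 quoted above.

```python
import math

def prob_within(k):
    """P(mu - k*sigma < X < mu + k*sigma) for a normal X.

    The answer does not depend on mu or sigma, only on k.
    """
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
```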
Let’s look at the role of σ in determining the dispersion of points about the true regression line.
[Figure: two scatterplots of Price against Size]
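To get a feel for how σ controls the scatter about the true line, one can simulate from the model with a small and a large σ and measure the spread of the deviations. This is only a sketch: the parameter values, the range of sizes and the function name `deviation_sd` are hypothetical choices, not taken from the plots.

```python
import random
import statistics

def deviation_sd(sigma, b0=-1400.0, b1=55.0, n=1000, seed=0):
    """Simulate Y = b0 + b1*X + eps, eps ~ N(0, sigma^2), and return the
    standard deviation of the deviations of Y from the true line."""
    rng = random.Random(seed)
    sizes = [rng.uniform(40, 70) for _ in range(n)]   # hypothetical X values
    prices = [b0 + b1 * x + rng.gauss(0.0, sigma) for x in sizes]
    deviations = [y - (b0 + b1 * x) for x, y in zip(sizes, prices)]
    return statistics.pstdev(deviations)

for sigma in (50, 200):
    print(sigma, round(deviation_sd(sigma), 1))
```

The spread of the simulated deviations should track the σ that was plugged in: a larger σ means points scattered further from the line.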
c. Prediction Intervals with the True Model
You are told (without looking at the data) that β0 = -1400; β1 = 55; σ = 100
and you are asked to predict the price of a TV with X=58. What do you know about Y from the model?
$Y = -1400 + 55(58) + \varepsilon$
or
$Y = 1790 + \varepsilon$
Here 1790 is the part of Y we know from X=58, and ε is the part of Y unrelated to X; we are unsure about this part.
but
$\varepsilon \sim N(0, 100^2)$
so
$Y \sim N(1790, 100^2)$
The model says that the mean value of a TV of size 58 is $1790 and the deviation from that mean is in the range ±$200 or so (2 standard deviations)
– We are 95% sure that -200 < ε < 200
– We are 95% sure that 1790 - 200 < Y < 1790 + 200
In general, given an X value and the true model parameters:
95% Prediction Interval: β0 + β1 X ± 2σ
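With the true parameters in hand, the interval is a one-line calculation. A small Python sketch of the rule above, applied to the example with β0 = -1400, β1 = 55, σ = 100 and X = 58 (the function name `prediction_interval` is made up for illustration):

```python
def prediction_interval(x, b0, b1, sigma, k=2.0):
    """Rough 95% prediction interval b0 + b1*x +/- k*sigma, with true parameters known."""
    center = b0 + b1 * x
    return center - k * sigma, center + k * sigma

low, high = prediction_interval(58, b0=-1400, b1=55, sigma=100)
print(low, high)  # 1590.0 1990.0
```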
We don’t know the true parameters; they are estimated using least squares. By using estimates, we have introduced another source of uncertainty.
Two sources of Prediction Uncertainty:
i. ε (error)
ii. estimation error in b0 and b1
d. Summary of Simple Linear Regression
Assume that all observations are drawn from the regression model discussed in section c and that errors on those observations are independent.
MODEL
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \qquad \varepsilon_i \sim \text{iid } N(0, \sigma^2)$
iid: Independent and Identically distributed
The SLR has 3 basic parameters: β0, β1 and σ
e. Key Characteristics of Linear Regression Model
Three Key Characteristics:
1. Mean of Y is linear in X
2. Error terms (deviations from line) are normally distributed (few deviations > 3 sd away from line)
f. Estimation of σ2
Recall that,
$\sigma^2 = \mathrm{Var}(\varepsilon_i) = E\big[(\varepsilon_i - E[\varepsilon_i])^2\big] = E[\varepsilon_i^2]$
and that σ drives the width of the prediction intervals.
One sensible strategy would be to estimate this population average of squared errors with the sample average of squared residuals (e_i is the i-th residual):
$\hat{\sigma}^2 = \frac{1}{N} \sum_{i=1}^{N} e_i^2$
However, this is not an unbiased estimator of σ2. We have to alter the denominator slightly:
$s^2 = \frac{1}{N-2} \sum_{i=1}^{N} e_i^2 = \frac{SSE}{N-2}$
The number of degrees of freedom is reduced by 2 because 2 have been “used up” in the estimation of b0 and b1.
Usually we want to use s (not s2). Why? s is in the same units as Y.
$s = \sqrt{\frac{SSE}{N-2}}$
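The two estimators differ only in the denominator, which is easy to see in code. The sketch below fits the least-squares line by hand and returns both the divide-by-N estimate and s. It is written in Python rather than R, the function name `fit_slr` is made up, and the data are invented purely for illustration (they are not the TV data).

```python
import math

def fit_slr(xs, ys):
    """Return least-squares b0, b1, the divide-by-N variance estimate, and s."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # SSE: sum of squared residuals about the fitted line
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    sigma2_naive = sse / n           # divides by N (biased)
    s = math.sqrt(sse / (n - 2))     # divides by N - 2, then takes the square root
    return b0, b1, sigma2_naive, s

# Invented sizes and prices, for illustration only
sizes = [32, 40, 46, 50, 55, 60, 65]
prices = [400, 750, 1200, 1350, 1600, 1950, 2100]
b0, b1, sigma2_naive, s = fit_slr(sizes, prices)
print(round(b0, 2), round(b1, 2), round(math.sqrt(sigma2_naive), 2), round(s, 2))
```

Because N - 2 < N, s is always a little larger than the square root of the divide-by-N estimate; the gap shrinks as N grows.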
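Where is s in the R output?
[Figure: R regression output, with s marked]
In R's summary() output for an lm fit, s is the value labeled “Residual standard error”.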
Remember whenever you see “standard error” read it as
f. R2 and s
Why is s (standard error of the regression) so important? It determines the width of prediction intervals!
As a rough approximation, a 95 percent Prediction Interval is the fitted value +/- 2 s!
Does a “high” value of R2 mean good prediction? Not necessarily!
R2=.83. s=$14,000.
P.I. width is +/- $28,000.
Average Price = $100,000
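The arithmetic behind this example is worth spelling out: even with R2 = .83, the rough interval is wide relative to the typical price. A tiny Python check using the figures quoted above (the variable names are made up):

```python
s = 14_000                 # standard error of the regression, in dollars
average_price = 100_000    # average price, in dollars

half_width = 2 * s         # rough 95% prediction interval is fitted value +/- 2s
print(half_width, half_width / average_price)  # 28000 0.28
```

So the rough interval is plus or minus 28 percent of the average price, despite the “high” R2.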
g. Conditional Distributions vs. Marginal Distributions
Regression models are really all about modeling the conditional distribution of Y given X.
Why are conditional distributions important? We want to develop models for forecasting. What we are doing is exploiting the information in the conditional distribution of Y given X.
The conditional distribution is obtained by “slicing” the point cloud in the scattergram to obtain the distribution of Y
Let’s slice up the scattergram in the TV data (but let’s add more data)
Now let’s plot the conditional distributions for each of the slices
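[Figure: conditional distributions of Price for each slice of Size (e.g., the conditional distribution of Price given 60 < size < 65), shown with the regression line and the marginal distribution of Price]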
Key Observations from these plots:
– Conditional distributions answer the forecasting problem: if I know that a flat-panel TV has Size between 45 and 50 inches, then the conditional distribution (depicted in the second boxplot) gives me a point forecast (the mean) and any prediction interval I want.
– The conditional means (medians) seem to line up along the regression line
– The conditional distributions have much smaller dispersion than the marginal distribution
This suggests two general points:
– If X has no forecasting power, then the marginal and conditionals will be the same.
– If X has some forecasting information or power, then conditional means will be different from the marginal or overall mean
and
Let’s develop a stronger intuition by looking at an example where X has no predictive power.
Flat-panel TV price (Y) vs. the number of HDMI connectors (X).
The conditional distribution of Y given X (Y|X)
$Y \mid X = x \sim N(\beta_0 + \beta_1 x, \sigma^2_{Y|X})$
This equation should be read as "the conditional distribution of Y given X is a normal distribution with mean β0 + β1 X and standard deviation σY|X."
Note: to assume that Y is normal conditional on X does not mean that X has to be normally distributed!
In general,
σY > σY|X if X and Y are related,
or
Marginal Variance of Y > Conditional Variance of Y | X if X is worth knowing!
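The gap between marginal and conditional dispersion can be seen by simulating from the model and “slicing” the point cloud. The sketch below reuses the parameter values from section c (β0 = -1400, β1 = 55, σ = 100); the uniform distribution for X, the slice boundaries and the function name are assumptions made purely for illustration.

```python
import random
import statistics

def marginal_and_conditional_sd(b0=-1400.0, b1=55.0, sigma=100.0, n=20000, seed=1):
    """Simulate (X, Y) pairs and compare the overall spread of Y with the
    spread of Y inside one narrow slice of X."""
    rng = random.Random(seed)
    xs = [rng.uniform(40, 70) for _ in range(n)]      # assumed distribution of sizes
    ys = [b0 + b1 * x + rng.gauss(0.0, sigma) for x in xs]
    marginal_sd = statistics.pstdev(ys)
    # Slice the point cloud: keep only the Y values whose X lies in a narrow band
    slice_ys = [y for x, y in zip(xs, ys) if 55.0 <= x < 56.0]
    conditional_sd = statistics.pstdev(slice_ys)
    return marginal_sd, conditional_sd

marginal_sd, conditional_sd = marginal_and_conditional_sd()
print(round(marginal_sd, 1), round(conditional_sd, 1))
```

Inside the slice the spread should be close to σ = 100, while the marginal spread is several times larger because it also reflects the variation in X.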
To see that this is equivalent to the earlier expression involving the error term, ε, let's compute the conditional mean and variance.
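If
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
then, since E[ε] = 0,
$E[Y \mid X = x] = \beta_0 + \beta_1 x$
and
$\mathrm{Var}(Y \mid X = x) = \mathrm{Var}(\varepsilon) = \sigma_\varepsilon^2$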
Let’s compute the marginal variance of Y and contrast this to the conditional variance of Y | X!
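$\mathrm{Var}(Y_i) = \mathrm{Var}(\beta_0 + \beta_1 X_i + \varepsilon_i) = \mathrm{Var}(\beta_1 X_i + \varepsilon_i)$
$= \beta_1^2 \mathrm{Var}(X_i) + \mathrm{Var}(\varepsilon_i) = \beta_1^2 \mathrm{Var}(X_i) + \sigma_\varepsilon^2$
(there is no cross term because ε is independent of X)
So, whenever β1 ≠ 0 and X varies,
$\mathrm{Var}(Y_i) = \beta_1^2 \mathrm{Var}(X_i) + \sigma_\varepsilon^2 > \sigma_\varepsilon^2 = \sigma^2_{Y|X}$
h. Another Example of Conditional Distributions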
A frequently used application of regression models is the Market Model or its close cousin, the CAPM.
These models are often used as benchmarks against which to gauge the performance of portfolio managers.
The models use a time series of observed portfolio returns with T observations over T time periods. Returns are the only relevant summary of performance!
Where return is defined as:
$R_t = \frac{\Delta P_t + D_t}{P_{t-1}}$
where ΔP_t is the change in price over period t and D_t is the dividend paid in period t.
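Assuming the usual definition of a one-period return (price change plus dividend, divided by the starting price), the calculation can be sketched in Python as follows. The function name and the numbers are hypothetical.

```python
def period_return(price_prev, price_now, dividend=0.0):
    """One-period return: (P_t - P_{t-1} + D_t) / P_{t-1}."""
    return (price_now - price_prev + dividend) / price_prev

# Hypothetical example: price moves from 100 to 103 and a dividend of 1 is paid
print(period_return(100.0, 103.0, dividend=1.0))  # 0.04
```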
Let's examine some data on mutual fund returns.
First compute means and standard deviations for all the data
Notice that:
– Some funds have returns less than the t-bill rate
Typically, the portfolios are plotted in µ, σ space
If two portfolios have the same variance, we would prefer the one with the higher mean return, e.g., Windsor is preferable to ValMrkt.
[Figure: Mean plotted against Std Dev for tbill, valmkt, eqmkt, windsor, scudinc, putinc, keys, fid and dref; annotation: “Dominated by Putnam Income Fund”]
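The idea of one portfolio being dominated by another in µ, σ space can be written as a simple comparison. This is one reasonable reading of “dominated” (at least as high a mean with no more risk), not necessarily the exact rule used for the plot; the function name and the numbers are hypothetical.

```python
def dominates(a, b):
    """True if portfolio a = (mean, std dev) has mean >= b's and std dev <= b's,
    with at least one of the two comparisons strict."""
    mean_a, sd_a = a
    mean_b, sd_b = b
    return mean_a >= mean_b and sd_a <= sd_b and (mean_a > mean_b or sd_a < sd_b)

# Hypothetical (mean, std dev) pairs, not the funds' actual figures
fund_a = (0.010, 0.04)
fund_b = (0.008, 0.04)
print(dominates(fund_a, fund_b))  # True
print(dominates(fund_b, fund_a))  # False
```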
A common benchmark for performance is the market index.
Let’s look at the relationship between returns on the market index and Windsor.
– Does the Windsor outperform the market?
– Does the Windsor do better in up markets than down markets?
These are questions about the distribution of Windsor given the Market or
If εt is normal, then this is a standard regression model. The Market Model assumes that R is normal
Let’s look at a scattergram of the data:
Is a linear regression model appropriate?
How do we relate the regression model to performance evaluation? We want to measure performance relative to the market. Does the slope work as a performance measure? No!
What about the intercept? This is sometimes called Jensen’s alpha.
The intercept in the above regression can be interpreted as a measure of the risk-adjusted excess return for the
Glossary of Symbols
Regression Model parameters
β0 - true line intercept
β1 - true slope
σ or σε - error standard deviation
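Important Equations
Two Versions of Simple Linear Regression Model:
As regression equation:
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \qquad \varepsilon_i \sim \text{iid } N(0, \sigma^2)$
As conditional distribution:
$Y \mid X = x \sim N(\beta_0 + \beta_1 x, \sigma^2_{Y|X})$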
Glossary of R commands
• descStat(A): produce descriptive statistics for all the