ChIITheSimpleLinearRegressionModel.pptx

(1)

II. The Simple Linear Regression Model

a. Prediction and the Modeling Goal

b. The Simple Linear Regression Model

c. Prediction Intervals with the True Model

d. Summary of the Simple Linear Regression Model

e. Three Key Characteristics of SLR Model

f. Estimation of σ²

(2)

a. Prediction and the Modeling Goal

Prediction in R:

$\hat{Y} = f(X) = b_0 + b_1 X$

Prediction Rule

Price = -1408.93 + 57.13 Size

We are developing prediction rules:

Put in a value of X for a Y that we have not observed, and out comes a prediction (a function or black box)
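
A prediction rule of this kind can be written as a small function. The sketch below is in Python rather than the R used in the course; the function name `predict_price` is made up for illustration, and the default coefficients are the ones from the prediction rule quoted above.

```python
def predict_price(size, b0=-1408.93, b1=57.13):
    """Return the predicted price for a TV of the given size.

    b0 and b1 default to the estimates in the rule
    Price = -1408.93 + 57.13 Size.
    """
    return b0 + b1 * size


# Put in a value of X (size) and out comes a prediction for Y (price).
print(round(predict_price(58), 2))
```

The function is the "black box": any size goes in, and a price prediction comes out, whether or not a TV of that size was observed.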

(3)

a. Prediction and the Modeling Goal

How should we quantify accuracy of forecasts?

One method is to specify a range of Y values that are likely, given an X value. This is called a prediction interval.

Prediction Interval: range of Y values that are likely given X

What do we mean by “likely?”

We have to develop a probability model, using a

(4)

b. The Simple Linear Regression Model

The power of statistical inference comes from the ability to make precise statements about the accuracy of the forecasts. In order to do this, we must postulate a probability model.

The Simple Linear Regression Model:

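$Y = \beta_0 + \beta_1 X + \varepsilon$, with $\varepsilon \sim N(0, \sigma^2)$

– $\beta_0 + \beta_1 X$: part of Y related to X

– $\varepsilon$: part of Y independent of X (the Error Term)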

(5)

For convenience, we assume $E(\varepsilon) = 0$.

The size of ε is measured by its standard deviation: $\text{Std dev}(\varepsilon) = \sigma$.

The “systematic” part of the regression is given by β₀ + β₁ X. We can interpret this as the “true” regression line:

$E[Y \mid X] = \beta_0 + \beta_1 X$

(6)

Think of E[Y|X] as the average price of flat-panel TVs with size X. Some flat-panel TVs have a price above the expected value and some below; the true line tells us what to expect on average. The error term represents the influence of factors other than size.

[Figure: the true regression line E[Y|X] = β₀ + β₁ X, with X on the horizontal axis and Y on the vertical axis]

(7)

What distribution should we use for ε ?

Justifications for using the normal distribution:

– It’s the only distribution I know of!

– It works!

– The sum of many r.v.s often has a normal looking distribution (Central Limit Theorem)

Now the model becomes:

Mean of ε is 0

Sometimes Y is above the

Remember we think of σ as the average size of the error term

$Y = \beta_0 + \beta_1 X + \varepsilon$

(8)

Quick Review of Normal Distribution

Properties of Normal:

i. Symmetric

ii. Uni-modal

iii. Thin-tails

[Figure: Probability Density Function for a Normal Distribution, plotting f_X(x) against X for increasing variance]

(9)

Quick Review of Normal Distribution

Remember the relationship between σ and where the normal distribution puts its mass.

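$\Pr(\mu - 1\sigma < X < \mu + 1\sigma) \approx .68$ (One Sigma)

$\Pr(\mu - 2\sigma < X < \mu + 2\sigma) \approx .9544$ (Two Sigma)

$\Pr(\mu - 3\sigma < X < \mu + 3\sigma) \approx .997$ (Three Sigma)

[Figure: normal density with the 1σ, 2σ and 3σ ranges marked]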
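
The one, two and three sigma probabilities can be checked numerically. This is a minimal Python sketch (the course itself uses R) built on the standard library's `statistics.NormalDist`; the helper name `prob_within` is made up for illustration.

```python
from statistics import NormalDist


def prob_within(k, mu=0.0, sigma=1.0):
    """Pr(mu - k*sigma < X < mu + k*sigma) for X ~ N(mu, sigma^2)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


# One, two and three sigma ranges.
for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
```

The probabilities depend only on k, not on the particular mean or standard deviation, which is why the same rule of thumb applies to any normal error term.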

(10)

Let’s look at the role of σ in determining the dispersion of points about the true regression line.

[Figure: two scatterplots of Price (vertical axis) against Size (horizontal axis)]

(11)

c. Prediction Intervals with the True Model

You are told (without looking at the data) that β₀ = -1400; β₁ = 55; σ = 100

and you are asked to predict the price of a TV with X=58. What do you know about Y from the model?

$Y = -1400 + 55(58) + \varepsilon$

or

$Y = 1790 + \varepsilon$

– 1790: part of Y we know from X = 58

– ε: part of Y unrelated to X; we are unsure about this part

but

$\varepsilon \sim N(0, 100^2)$

so

$Y \sim N(1790, 100^2)$

(12)

c. Prediction Intervals with the True Model

The model says that the mean value of a TV of size 58 is $1790 and the deviation from that mean is in the range ±$200 or so (2 standard deviations)

– We are 95% sure that $-200 < \varepsilon < 200$

– We are 95% sure that $1790 - 200 < Y < 1790 + 200$

In general, given an X value and the true model parameters:

95% Prediction Interval: β₀ + β₁ X ± 2σ
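
As a sketch of the rule β₀ + β₁ X ± 2σ, the Python function below (Python rather than the course's R; the name `prediction_interval` is made up for illustration) computes the interval when the true parameters are known, using the values from the TV example.

```python
def prediction_interval(x, beta0, beta1, sigma):
    """Approximate 95% prediction interval with known true parameters."""
    center = beta0 + beta1 * x  # the part of Y we know from X
    return center - 2 * sigma, center + 2 * sigma  # plus or minus 2 sigma


# TV example: beta0 = -1400, beta1 = 55, sigma = 100, size X = 58.
low, high = prediction_interval(58, beta0=-1400, beta1=55, sigma=100)
print(low, high)
```

This interval reflects only the uncertainty from ε; it does not account for estimating the parameters.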

We don’t know the true parameters; they are estimated using least squares. By using estimates, we have introduced another source of uncertainty.

Two sources of Prediction Uncertainty:

i. ε (error)

(13)

d. Summary of Simple Linear Regression

Assume that all observations are drawn from the regression model discussed in section c and that errors on those observations are independent.

MODEL

$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \quad \varepsilon_i \sim \text{iid } N(0, \sigma^2)$

(iid: Independent and Identically distributed)

The SLR has 3 basic parameters: β₀, β₁ and σ.
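
One way to get a feel for the model is to simulate from it. The Python sketch below (Python rather than the course's R) draws iid normal errors and adds them to the true line; the function name `simulate_slr` and the list of sizes are made up for illustration, and the parameter values are the ones from the example in section c.

```python
import random


def simulate_slr(x_values, beta0, beta1, sigma, seed=0):
    """Draw Y_i = beta0 + beta1 * X_i + eps_i with eps_i iid N(0, sigma^2)."""
    rng = random.Random(seed)
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in x_values]


# Hypothetical TV sizes, repeated to get 1000 observations.
sizes = [32, 40, 50, 58, 65] * 200
prices = simulate_slr(sizes, beta0=-1400, beta1=55, sigma=100)

# Errors are the deviations of each price from the true line.
errors = [y - (-1400 + 55 * x) for x, y in zip(sizes, prices)]
share_within_2_sigma = sum(abs(e) < 200 for e in errors) / len(errors)
print(round(share_within_2_sigma, 3))
```

The share of simulated points within 2σ of the true line should come out close to 95%, which is the basis for the ± 2σ prediction interval.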

(14)

e. Key Characteristics of Linear Regression Model

Three Key Characteristics:

1. Mean of Y is linear in X

2. Error terms (deviations from line) are normally distributed (few deviations > 3 sd away from line)

(15)

f. Estimation of σ²

Recall that

$\sigma^2 = \mathrm{Var}(\varepsilon_i) = E\left[(\varepsilon_i - E[\varepsilon_i])^2\right] = E\left[\varepsilon_i^2\right]$

and that σ drives the width of the prediction intervals.

One sensible strategy would be to estimate this population average of squared errors with the sample average of squared residuals:

$\hat{\sigma}^2 = \frac{1}{N} \sum_{i=1}^{N} e_i^2$

where $e_i$ is the residual for observation i.

(16)

However, this is not an unbiased estimator of σ². We have to alter the denominator slightly:

$s^2 = \frac{1}{N-2} \sum_{i=1}^{N} e_i^2 = \frac{SSE}{N-2}$

where SSE is the sum of squared residuals $e_i^2$. The # of degrees of freedom are reduced by 2 because 2 have been “used up” in the estimation of b₀ and b₁.

Usually we want to use s (not s²). Why? s is in the same units as Y.

$s = \sqrt{\frac{SSE}{N-2}}$
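
The calculation of s can be sketched from scratch. The Python code below (Python rather than the course's R) fits the line with the standard least squares formulas b₁ = Sxy / Sxx and b₀ = ȳ − b₁ x̄, which are not shown in this section and are assumed here, then divides SSE by N − 2. The function name and the three data points are made up for illustration.

```python
import math


def regression_standard_error(x, y):
    """Return (b0, b1, SSE, s) for a simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx            # least squares slope
    b0 = y_bar - b1 * x_bar   # least squares intercept
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)
    s = math.sqrt(sse / (n - 2))  # two degrees of freedom used by b0, b1
    return b0, b1, sse, s


# Tiny made-up data set with N = 3.
b0, b1, sse, s = regression_standard_error([0, 1, 2], [0, 1, 3])
print(b0, b1, sse, s)
```

Dividing SSE by N instead of N − 2 would give a smaller number, which is the downward bias the adjusted denominator corrects.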

(17)

Where is s in the R output?

s

Remember whenever you see “standard error” read it as

(18)

f. R² and s

Why is s (standard error of the regression) so important? It determines the width of prediction intervals!

As a rough approximation, a 95 percent Prediction Interval is the fitted value +/- 2 s!

Does a “high” value of R² mean good prediction? Not necessarily!

(19)

f. R² and s

R² = .83. s = $14,000.

P.I. width is +/- $28,000.

Average Price = $100,000
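
The arithmetic behind this example is short enough to spell out. The Python lines below (Python rather than the course's R) use only the numbers quoted above; the variable names are made up for illustration.

```python
s = 14_000                # standard error of the regression, in dollars
average_price = 100_000   # average price, in dollars

half_width = 2 * s        # rough 95% prediction interval: fitted value +/- 2s
relative_half_width = half_width / average_price
print(half_width, relative_half_width)
```

Even with R² = .83, the interval spans roughly 28 percent of the average price on either side of the fitted value, which is why R² alone does not indicate good prediction.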

(20)

g. Conditional Distributions vs. Marginal Distributions

Regression models are really all about modeling the conditional distribution of Y given X.

Why are conditional distributions important? We want to develop models for forecasting. What we are doing is exploiting the information in the conditional distribution of Y given X.

The conditional distribution is obtained by “slicing” the point cloud in the scattergram to obtain the distribution of Y

(21)

Let’s slice up the scattergram in the TV data (but let’s add more data)

Now let’s plot the conditional distributions for each of the slices

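[Figure: boxplots of the conditional distribution of Price for each Size slice (e.g. Cond Dist of Price given 60 < size < 65), with the regression line and the marginal distribution of Price]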

(22)

Key Observations from these plots:

– Conditional distributions answer the forecasting problem: if I know that a flat-panel TV has Size between 45 and 50 inches, then the conditional distribution (depicted in the second boxplot) gives me a point forecast (the mean) and any prediction interval I want.

– The conditional means (medians) seem to line up along the regression line

– The conditional distributions have much smaller dispersion
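
The slicing idea can be imitated on simulated data. The Python sketch below (Python rather than the course's R) does not use the TV data from the slides; it generates hypothetical sizes between 30 and 70 inches and prices from the illustrative true model of section c, then compares each slice with the marginal distribution.

```python
import random
from statistics import mean, stdev

rng = random.Random(1)

# Hypothetical data: sizes are made up; prices follow the illustrative
# true model with beta0 = -1400, beta1 = 55, sigma = 100.
sizes = [rng.uniform(30, 70) for _ in range(2000)]
prices = [-1400 + 55 * x + rng.gauss(0, 100) for x in sizes]

marginal_sd = stdev(prices)

# Slice the point cloud into 5-inch bands of Size and summarize Price
# within each band (the conditional distributions).
slices = {}
for lo in range(30, 70, 5):
    in_slice = [y for x, y in zip(sizes, prices) if lo <= x < lo + 5]
    slices[lo] = (mean(in_slice), stdev(in_slice))
    print(f"{lo} <= size < {lo + 5}: "
          f"mean={slices[lo][0]:.0f}, sd={slices[lo][1]:.0f}")

print(f"marginal: mean={mean(prices):.0f}, sd={marginal_sd:.0f}")
```

In this simulation the slice means rise with Size and every slice has a smaller standard deviation than the marginal distribution, mirroring the observations from the plots.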

(23)

This suggests two general points:

– If X has no forecasting power, then the marginal and conditionals will be the same.

– If X has some forecasting information or power, then conditional means will be different from the marginal or overall mean

and

(24)

Let’s develop a stronger intuition by looking at an example where X has no predictive power.

Flat-panel TV price (Y) vs. the number of HDMI connectors (X).

(25)

The conditional distribution of Y given X (Y|X):

$Y \mid X = x \sim N(\beta_0 + \beta_1 x, \sigma^2_{Y|X})$

This equation should be read as "the conditional distribution of Y given X is a normal distribution with mean β₀ + β₁ X and standard deviation σ_Y|X."

Note: to assume that Y is normal conditional on X does not mean that X has to be normally distributed!

In general,

σ_Y > σ_Y|X if X and Y are related,

or

Marginal Variance of Y > Conditional Variance of Y | X

if X is worth knowing!

(26)

To see that this is equivalent to the earlier expression involving the error term, ε, let's compute the conditional mean and variance.

If

$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

then, since E[ε] = 0,

$E[Y \mid X = x] = \beta_0 + \beta_1 x$

and

$\mathrm{Var}(Y \mid X = x) = \mathrm{Var}(\varepsilon) = \sigma_\varepsilon^2$

(27)

Let’s compute the marginal variance of Y and contrast this to the conditional variance of Y | X!

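$\mathrm{Var}(Y_i) = \mathrm{Var}(\beta_0 + \beta_1 X_i + \varepsilon_i) = \mathrm{Var}(\beta_1 X_i + \varepsilon_i)$

$= \beta_1^2 \mathrm{Var}(X_i) + \mathrm{Var}(\varepsilon_i) = \beta_1^2 \mathrm{Var}(X_i) + \sigma_\varepsilon^2$

The variance of the sum splits into two terms because ε_i is independent of X_i. Therefore

$\mathrm{Var}(Y_i) = \beta_1^2 \mathrm{Var}(X_i) + \sigma_\varepsilon^2 > \sigma_\varepsilon^2 = \sigma^2_{Y|X}$

whenever β₁ ≠ 0 and X varies.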
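
The comparison of marginal and conditional variance can be checked by simulation. The Python sketch below (Python rather than the course's R) uses made-up sizes and the illustrative parameter values from section c, and compares the sample variance of Y with β₁² Var(X) + σ².

```python
import random
from statistics import pvariance

rng = random.Random(2)
beta0, beta1, sigma = -1400, 55, 100  # illustrative true parameters

# Hypothetical X values, drawn independently of the errors.
x = [rng.uniform(30, 70) for _ in range(20000)]
eps = [rng.gauss(0, sigma) for _ in x]
y = [beta0 + beta1 * xi + ei for xi, ei in zip(x, eps)]

var_y = pvariance(y)                                 # marginal variance of Y
decomposed = beta1 ** 2 * pvariance(x) + sigma ** 2  # beta1^2 Var(X) + sigma^2
print(round(var_y), round(decomposed), sigma ** 2)
```

The two variance figures should agree closely, and both are far larger than σ², the conditional variance of Y given X.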

(28)

h. Another Example of Conditional Distributions

A frequently used application of regression models is the Market Model or its close cousin, the CAPM.

These models are often used as benchmarks against which to gauge the performance of portfolio managers.

The models use a time series of observed portfolio returns with T observations over T time periods. Returns are the only relevant summary of performance!

where return is defined as:

$R_t = \frac{\Delta P_t + D_t}{P_{t-1}}$
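
A one-period return can be computed as below. This Python sketch (Python rather than the course's R) assumes the usual definition, price change plus dividend divided by the previous period's price; the function name `simple_return` and the numbers are made up for illustration.

```python
def simple_return(price_prev, price_now, dividend=0.0):
    """One-period return: (change in price + dividend) / previous price."""
    return (price_now - price_prev + dividend) / price_prev


# Hypothetical example: price moves from 100 to 105 and pays a dividend of 2.
print(simple_return(100.0, 105.0, dividend=2.0))
```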

(29)

Let's examine some data on mutual fund returns.

First compute means and standard deviations for all the data

Notice that:

– Some funds have returns less than the t-bill rate

(30)

Typically, the portfolios are plotted in µ, σ space

If two portfolios have the same variance, we would prefer the one with the higher return, e.g., Windsor is preferable to ValMrkt.

[Figure: Mean (vertical axis) against Std Dev (horizontal axis) for tbill, valmkt, eqmkt, windsor, scudinc, putinc, keys, fid and dref; annotation: "Dominated by Putnam Income Fund"]

(31)

A common benchmark for performance is the market index.

Let’s look at the relationship between returns on the market index and Windsor.

– Does Windsor outperform the market?

– Does Windsor do better in up markets than in down markets?

These are questions about the distribution of Windsor given the Market or

If ε_t is normal, then this is a standard regression model. The Market Model assumes that R is normal.

(32)

Let’s look at a scattergram of the data:

Is a linear regression model appropriate?

(33)

(34)

How do we relate the regression model to performance evaluation? We want to measure performance relative to the market. Does the slope work as a performance measure? No!

What about the intercept? This is sometimes called Jensen’s alpha.

The intercept in the above regression can be interpreted as a measure of the risk-adjusted excess return for the

(35)

Glossary of Symbols

Regression Model parameters

β₀ - true line intercept

β₁ - true slope

σ or σ_ε - error standard deviation

(36)

Important Equations

$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \quad \varepsilon_i \sim \text{iid } N(0, \sigma^2)$

$Y \mid X = x \sim N(\beta_0 + \beta_1 x, \sigma^2_{Y|X})$

Two Versions of Simple Linear Regression Model:

top as regression equation

(37)

Glossary of R commands

• descStat(A): produce descriptive statistics for all the