ChIITheSimpleLinearRegressionModel.pptx

Academic year: 2020

(1)

II. The Simple Linear Regression Model

a. Prediction and the Modeling Goal

b. The Simple Linear Regression Model

c. Prediction Intervals with the True Model

d. Summary of the Simple Linear Regression Model

e. Three Key Characteristics of SLR Model

f. Estimation of σ²

(2)

a. Prediction and the Modeling Goal

Prediction in R:

$\hat{Y} = f(X) = b_0 + b_1 X$

Prediction Rule

Price = -1408.93 + 57.13 Size

We are developing prediction rules:

Put in a value of X for a Y that we have not observed, and out comes a prediction (a function or black box)
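As a minimal sketch of the "black box" idea, the prediction rule above can be written as a function. Python is used here only for illustration (the slides work in R), and the function name `predict_price` and the input size of 58 are hypothetical choices, not part of the slides.

```python
# Fitted coefficients taken from the prediction rule on the slide:
# Price = -1408.93 + 57.13 * Size
B0 = -1408.93  # intercept b0
B1 = 57.13     # slope b1


def predict_price(size):
    """Return the predicted price for a TV of the given size (hypothetical helper)."""
    return B0 + B1 * size


# Put in a value of X, and out comes a prediction.
# Size 58 is an illustrative input, not a data point from the slides.
predicted = predict_price(58)
print(round(predicted, 2))
```

The rule returns only a point prediction; it says nothing yet about how far the actual price may fall from it.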

(3)

a. Prediction and the Modeling Goal

How should we quantify accuracy of forecasts?

One method is to specify a range of Y values that are likely, given an X value. This is called a prediction interval.

Prediction Interval: range of Y values that are likely given X

What do we mean by “likely?”

We have to develop a probability model, using a

(4)

b. The Simple Linear Regression Model

The power of statistical inference comes from the ability to make precise statements about the accuracy of the forecasts. In order to do this, we must postulate a probability model.

The Simple Linear Regression Model:

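$Y = \beta_0 + \beta_1 X + \varepsilon$, with $\varepsilon \sim N(0, \sigma^2)$

$\beta_0 + \beta_1 X$: part of Y related to X

$\varepsilon$: part of Y independent of X (the error term)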

(5)

b. The Simple Linear Regression Model

For convenience, we assume

$E(\varepsilon) = 0$

The size of ε is measured by its standard deviation:

$\text{Std dev}(\varepsilon) = \sigma$

The “systematic” part of the regression is given by β0 + β1 X. We can interpret this as the “true” regression line:

$E[Y \mid X] = \beta_0 + \beta_1 X$

(6)

b. The Simple Linear Regression Model

Think of E[Y|X] as the average price of flat-panel TVs with size X. Some flat-panel TVs could have a price bigger than the expected value, some smaller; the true line tells us what to expect on average. The error term represents the influence of factors other than size.

[Figure: the true regression line E[Y|X] = β0 + β1 X, plotted with Y against X]

(7)

b. The Simple Linear Regression Model

What distribution should we use for ε ?

Justifications for using the normal distribution:

It’s the only distribution I know of!

It works!

The sum of many r.v.s often has a normal looking distribution (Central Limit Theorem)

Now the model becomes:

$Y = \beta_0 + \beta_1 X + \varepsilon$

Mean of ε is 0

Sometimes Y is above the

Remember we think of σ as the average size of the error term

(8)

Quick Review of Normal Distribution

Properties of Normal:

i. Symmetric

ii. Uni-modal

iii. Thin-tails

[Figure: Probability Density Function for a Normal Distribution, $f_X(x)$ against X, for increasing variance]

(9)

Quick Review of Normal Distribution

Remember the relationship between σ and where the normal distribution puts its mass.

$\Pr(\mu - 1\sigma < X < \mu + 1\sigma) \approx .68$ (One Sigma)

$\Pr(\mu - 2\sigma < X < \mu + 2\sigma) \approx .9544$ (Two Sigma)

$\Pr(\mu - 3\sigma < X < \mu + 3\sigma) \approx .997$ (Three Sigma)
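The one, two and three sigma probabilities can be checked numerically. The sketch below is in Python for illustration and uses the standard identity Pr(|Z| < k) = erf(k/√2) for a standard normal Z; the helper name `prob_within` is hypothetical.

```python
import math


def prob_within(k):
    """Probability that a normal variable falls within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))


# Print the probability mass within 1, 2 and 3 sigma of the mean.
for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
```

The result does not depend on μ or σ, which is why the same rule of thumb applies to any normal error term.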

(10)

b. The Simple Linear Regression Model

Let’s look at the role of σ in determining the dispersion of points about the true regression line.

[Figure: two scatterplots of Price against Size]

(11)

c. Prediction Intervals with the True Model

You are told (without looking at the data) that β0 = -1400; β1 = 55; σ = 100

and you are asked to predict the price of a TV with X=58. What do you know about Y from the model?

$Y = -1400 + 55(58) + \varepsilon$

or

$Y = 1790 + \varepsilon$

Here 1790 is the part of Y we know from X=58, and ε is the part of Y unrelated to X; we are unsure about this part.

but

$\varepsilon \sim N(0, 100^2)$

so

$Y \sim N(1790, 100^2)$

(12)

c. Prediction Intervals with the True Model

The model says that the mean value of a TV of size 58 is $1790 and the deviation from that mean is in the range ±$200 or so (2 standard deviations)

We are 95% sure that -200< ε < 200

We are 95% sure that 1790 – 200 < Y < 1790 +200

In general, given an X value and the true model parameters:

95% Prediction Interval: β0 + β1 X ± 2σ
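The interval arithmetic for the TV example can be sketched as follows. This is an illustrative Python version that assumes the true parameters are known, as in this section; the function name `prediction_interval` and its argument `k` (the number of standard deviations) are hypothetical.

```python
def prediction_interval(beta0, beta1, sigma, x, k=2):
    """Return (low, high) = beta0 + beta1*x -/+ k*sigma, assuming known true parameters."""
    center = beta0 + beta1 * x
    return center - k * sigma, center + k * sigma


# Parameter values from the slide: beta0 = -1400, beta1 = 55, sigma = 100, X = 58.
low, high = prediction_interval(-1400, 55, 100, 58)
print(low, high)
```

With these values the interval is 1790 ± 200, matching the statement that we are 95% sure that 1790 – 200 < Y < 1790 + 200.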

We don’t know the true parameters; they are estimated using least squares. By using estimates, we have introduced another source of uncertainty.

Two sources of Prediction Uncertainty:

i. ε (error)

(13)

d. Summary of Simple Linear Regression

Assume that all observations are drawn from the regression model discussed in section c and that errors on those observations are independent.

MODEL

$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

$\varepsilon_i \sim \text{iid } N(0, \sigma^2)$ (iid: Independent and Identically distributed)

The SLR has 3 basic parameters: β0, β1 and σ
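To make the model concrete, here is a small simulation sketch in Python (for illustration only). It draws observations from Y_i = β0 + β1 X_i + ε_i with iid normal errors, using the parameter values from section c. The range of sizes (30 to 70), the sample size and the seeds are assumptions chosen for the example, and `simulate_slr` is a hypothetical helper.

```python
import random
import statistics


def simulate_slr(beta0, beta1, sigma, xs, seed=0):
    """Draw one Y per X from Y = beta0 + beta1*X + eps, with eps ~ iid N(0, sigma^2)."""
    rng = random.Random(seed)
    return [beta0 + beta1 * x + rng.gauss(0, sigma) for x in xs]


# Assumed setup: 10,000 TV sizes drawn uniformly between 30 and 70.
x_rng = random.Random(1)
xs = [x_rng.uniform(30, 70) for _ in range(10_000)]
ys = simulate_slr(-1400, 55, 100, xs)

# Deviations from the true line should average about 0 with spread about sigma.
errors = [y - (-1400 + 55 * x) for x, y in zip(xs, ys)]
print(statistics.mean(errors), statistics.pstdev(errors))
```

In the simulated data the deviations from the true line have mean close to 0 and standard deviation close to 100, which is what the three parameters imply.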

(14)

e. Key Characteristics of Linear Regression Model

Three Key Characteristics:

1. Mean of Y is linear in X

2. Error terms (deviations from line) are normally distributed (few deviations > 3 sd away from line)

(15)

f. Estimation of σ²

Recall that

$\sigma^2 = \mathrm{Var}(\varepsilon_i) = E\left[(\varepsilon_i - E[\varepsilon_i])^2\right] = E[\varepsilon_i^2]$

and that σ drives the width of the prediction intervals.

One sensible strategy would be to estimate this population average of squared errors with the sample average of squared residuals:

$\hat{\sigma}^2 = \frac{1}{N}\sum_{i=1}^{N} e_i^2$

where $e_i$ is the i-th least-squares residual.

(16)

f. Estimation of σ²

However, this is not an unbiased estimator of σ². We have to alter the denominator slightly:

$s^2 = \frac{1}{N-2}\sum_{i=1}^{N} e_i^2 = \frac{SSE}{N-2}$

where SSE is the sum of the squared residuals $e_i$.

# of degrees of freedom are reduced by 2 because 2 have been “used up” in the estimation of b0 and b1

Usually we want to use s (not s²). Why? s is in the same units as Y.

$s = \sqrt{\frac{SSE}{N-2}}$
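The calculation of s can be sketched end to end: fit b0 and b1 by least squares, sum the squared residuals to get SSE, then divide by N − 2 and take the square root. The Python code below is an illustration, not the course's R workflow; the helper `fit_and_s` and the four data points are made up for the example.

```python
import math


def fit_and_s(xs, ys):
    """Least-squares fit of y on x; returns (b0, b1, SSE, s) with s = sqrt(SSE / (N - 2))."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx             # estimated slope
    b0 = y_bar - b1 * x_bar    # estimated intercept
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))  # two degrees of freedom used by b0 and b1
    return b0, b1, sse, s


# Toy data, invented purely for illustration.
b0, b1, sse, s = fit_and_s([1, 2, 3, 4], [1, 3, 2, 4])
print(b0, b1, sse, s)
```

For this toy data SSE is 1.8, so dividing by N gives 0.45 while dividing by N − 2 gives 0.9; the gap between the two denominators matters most in small samples.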

(17)

f. Estimation of σ²

Where is s in the R output?

[Figure: R regression output with s marked]

Remember whenever you see “standard error” read it as

(18)

f. R² and s

Why is s (standard error of the regression) so important? It determines the width of prediction intervals!

As a rough approximation, a 95 percent Prediction Interval is the fitted value +/- 2 s!

Does a “high” value of R² mean good prediction? Not necessarily!

(19)

f. R² and s

R² = .83. s = $14,000.

P.I. width is +/- $28,000.

Average Price = $100,000

(20)

g. Conditional Distributions vs. Marginal Distributions

Regression models are really all about modeling the conditional distribution of Y given X.

Why are conditional distributions important? We want to develop models for forecasting. What we are doing is exploiting the information in the conditional distribution of Y given X.

The conditional distribution is obtained by “slicing” the point cloud in the scattergram to obtain the distribution of Y

(21)

g. Conditional Distributions vs. Marginal Distributions

Let’s slice up the scattergram in the TV data (but let’s add more data)

Now let’s plot the conditional distributions for each of the slices

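[Figure: conditional distributions of Price for each Size slice (e.g. Cond Dist of Price given 60 < size < 65), shown with the regression line and the marginal distribution of Price]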

(22)

g. Conditional Distributions vs. Marginal Distributions

Key Observations from these plots:

Conditional distributions answer the forecasting problem: if I know that a flat-panel TV has Size between 45 and 50 inches, then the conditional distribution (depicted in the second boxplot) gives me a point forecast (the mean) and any prediction interval I want.

The conditional means (medians) seem to line up along the regression line

The conditional distributions have much smaller dispersion

(23)

g. Conditional Distributions vs. Marginal Distributions

This suggests two general points:

If X has no forecasting power, then the marginal and conditionals will be the same.

If X has some forecasting information or power, then conditional means will be different from the marginal or overall mean

and

(24)

g. Conditional Distributions vs. Marginal Distributions

Let’s develop a stronger intuition by looking at an example where X has no predictive power.

Flat-panel TV price (Y) vs. the number of HDMI connectors (X).

(25)

g. Conditional Distributions vs. Marginal Distributions

The conditional distribution of Y given X (Y|X):

$Y \mid X = x \sim N(\beta_0 + \beta_1 x, \sigma^2_{Y|X})$

This equation should be read as "the conditional distribution of Y given X is a normal distribution with mean β0 + β1 X and standard deviation σY|X."

Note: to assume that Y is normal conditional on X does not mean that X has to be normally distributed!

In general,

σY > σY|X if X and Y are related,

or

Marginal Variance of Y > Conditional Variance of Y | X

if X is worth knowing!

(26)

g. Conditional Distributions vs. Marginal Distributions

To see that this is equivalent to the earlier expression involving the error term, ε, let's compute the conditional mean and variance.

If

$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

then, since E[ε] = 0,

$E[Y \mid X = x] = \beta_0 + \beta_1 x$

and

$\mathrm{Var}(Y \mid X = x) = \mathrm{Var}(\varepsilon) = \sigma_\varepsilon^2$

(27)

g. Conditional Distributions vs. Marginal Distributions

Let’s compute the marginal variance of Y and contrast this to the conditional variance of Y | X!

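$\mathrm{Var}(Y_i) = \mathrm{Var}(\beta_0 + \beta_1 X_i + \varepsilon_i) = \mathrm{Var}(\beta_1 X_i + \varepsilon_i) = \beta_1^2 \mathrm{Var}(X_i) + \mathrm{Var}(\varepsilon_i) = \beta_1^2 \mathrm{Var}(X_i) + \sigma_\varepsilon^2$

$\mathrm{Var}(Y_i) = \beta_1^2 \mathrm{Var}(X_i) + \sigma_\varepsilon^2 > \sigma_\varepsilon^2 = \sigma^2_{Y|X}$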
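The comparison between marginal and conditional variance can be checked on simulated data. The Python sketch below is illustrative only. It reuses the parameter values from section c, and the range of X (30 to 70), the sample size and the seed are assumptions. It compares the sample variance of Y with β1² Var(X) + Var(ε), and with Var(ε) alone.

```python
import random
import statistics

rng = random.Random(42)
beta0, beta1, sigma = -1400, 55, 100  # values from section c

# Assumed setup: X uniform on [30, 70], errors drawn independently of X.
xs = [rng.uniform(30, 70) for _ in range(20_000)]
eps = [rng.gauss(0, sigma) for _ in xs]
ys = [beta0 + beta1 * x + e for x, e in zip(xs, eps)]

var_x = statistics.pvariance(xs)
var_eps = statistics.pvariance(eps)   # approximates the conditional variance of Y given X
var_y = statistics.pvariance(ys)      # marginal variance of Y

print(var_y, beta1 ** 2 * var_x + var_eps, var_eps)
```

In the sample the two sides differ slightly, because the sample covariance between X and ε is not exactly zero. The marginal variance is far larger than the error variance because β1² Var(X) is large in this setup.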
(28)

h. Another Example of Conditional Distributions

A frequently used application of regression models is the Market Model or its close cousin, the CAPM

These models are often used as benchmarks against which to gauge the performance of portfolio managers.

The models use a time series of observed portfolio returns with T observations over T time periods. Returns are the only relevant summary of performance!

Where return is defined as:

$R_t = \frac{\Delta P_t + D_t}{P_{t-1}}$
(29)

h. Another Example of Conditional Distributions

Let's examine some data on mutual fund returns.

First compute means and standard deviations for all the data

Notice that:

Some funds have returns less than the t-bill rate

(30)

h. Another Example of Conditional Distributions

Typically, the portfolios are plotted in µ, σ space

If two portfolios have the same variance, we would prefer the one with the higher return, e.g. Windsor is preferable to ValMrkt

[Figure: Mean against Std Dev for tbill, valmkt, eqmkt, windsor, scudinc, putinc, keys, fid and dref; annotation: Dominated by Putnam Income Fund]

(31)

h. Another Example of Conditional Distributions

A common benchmark for performance is the market index.

Let’s look at the relationship between returns on the market index and Windsor.

Does Windsor outperform the market?

Does Windsor do better in up markets than in down markets?

These are questions about the distribution of Windsor given the Market or

If εt is normal, then this is a standard regression model. The Market Model assumes that R is normal

(32)

h. Another Example of Conditional Distributions

Let’s look at a scattergram of the data:

Is a linear regression model appropriate?

(33)

h. Another Example of Conditional Distributions

(34)

h. Another Example of Conditional Distributions

How do we relate the regression model to performance evaluation? We want to measure performance relative to the market. Does the slope work as a performance measure? No!

What about the intercept? This is sometimes called Jensen’s alpha.

The intercept in the above regression can be interpreted as a measure of the risk-adjusted excess return for the

(35)

Glossary of Symbols

Regression Model parameters

β0 - true line intercept

β1 - true slope

σ or σε - error standard deviation

(36)

Important Equations

$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \quad \varepsilon_i \sim \text{iid } N(0, \sigma^2)$

$Y \mid X = x \sim N(\beta_0 + \beta_1 x, \sigma^2_{Y|X})$

Two Versions of Simple Linear Regression Model:

top as regression equation

(37)

Glossary of R commands

• descStat(A): produce descriptive statistics for all the
