ChIVPredictionandDiagnostics.pptx

(1)

Chapter IV. Slide 1

IV. Prediction and Diagnostics

a. Prediction

b. Why Regression Diagnostics?

c. Residual Diagnostic Plots

d. Putting It All Together: The Shock Absorber Example

(2)

a. Prediction

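Model:

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \qquad i = 1, \ldots, N, \qquad \varepsilon_i \overset{iid}{\sim} N(0, \sigma^2)$$

The conditional forecasting problem can be stated succinctly as:

– Predict a “future” observation, $Y_f$,

– given $X_f$ and the sample data $\{X_i, Y_i\}$, $i = 1, \ldots, N$.

The only practical solution to the prediction problem is to use the estimated parameters $b_0$ and $b_1$ in place of the unknown $\beta_0$ and $\beta_1$:

$$\hat{Y}_f = b_0 + b_1 X_f$$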
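Here is a minimal R sketch of the plug-in predictor on simulated data. The true coefficients (1 and 2), the noise level and the future value X_f = 7 are assumptions chosen for illustration, not course data. The sketch computes $\hat{Y}_f = b_0 + b_1 X_f$ by hand and checks it against R's `predict()`.

```r
# Simulated data (assumed, not the course data): Y = 1 + 2 X + noise
set.seed(1)
N <- 50
X <- runif(N, 0, 10)
Y <- 1 + 2 * X + rnorm(N, sd = 1.5)

fit <- lm(Y ~ X)
b <- unname(coef(fit))          # b[1] = b0 (intercept), b[2] = b1 (slope)

X_f <- 7                        # hypothetical future value of X
Yhat_f <- b[1] + b[2] * X_f     # plug-in predictor b0 + b1 * X_f

# predict() returns the same point forecast
Yhat_f_predict <- unname(predict(fit, newdata = data.frame(X = X_f)))
c(by_hand = Yhat_f, predict = Yhat_f_predict)
```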

(3)


a. Prediction

If we use this predictor, we will make a prediction error:

$$e_f = Y_f - \hat{Y}_f = Y_f - b_0 - b_1 X_f$$

Let’s draw this:

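[Figure: the true regression line E[Y_f | X_f] = β₀ + β₁X and the estimated line b₀ + b₁X. At X_f, the prediction error e_f is the gap between Y_f and Ŷ_f. The sampling error Ŷ_f − E[Y_f | X_f] is the gap between the two lines.]

(4)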

a. Prediction

Let’s write the prediction error so that we can see the influence of two factors:

i. the model error term, i.e., the inherent randomness in Y;

ii. estimation error in the model parameters.

$$
\begin{aligned}
e_f = Y_f - \hat{Y}_f &= \left(Y_f - E[Y_f \mid X_f]\right) - \left(\hat{Y}_f - E[Y_f \mid X_f]\right) \\
&= \varepsilon_f - \left[(b_0 - \beta_0) + (b_1 - \beta_1) X_f\right]
\end{aligned}
$$

The first term, $\varepsilon_f$, is the inherent randomness (factor i). The bracketed term is the sampling error, which comes entirely from estimating $\beta_0$ and $\beta_1$ (factor ii).

(5)


a. Prediction

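Now let’s compute a prediction interval for $Y_f$. The future error $\varepsilon_f$ is independent of the sample used to compute $\hat{Y}_f$, so the variance of the prediction error is the sum of the two variances:

$$
\begin{aligned}
\mathrm{Var}(e_f) = \mathrm{Var}(Y_f - \hat{Y}_f) &= \mathrm{Var}(\varepsilon_f) + \mathrm{Var}(\hat{Y}_f) \\
&= \sigma^2 + \sigma^2\left(\frac{1}{N} + \frac{(X_f - \bar{X})^2}{(N-1)s_X^2}\right) \\
&= \sigma^2\left(1 + \frac{1}{N} + \frac{(X_f - \bar{X})^2}{(N-1)s_X^2}\right)
\end{aligned}
$$

Replacing σ with s, the standard error of the regression, gives the predictive standard error, denoted $s_{pred}$:

$$s_{pred} = s\left(1 + \frac{1}{N} + \frac{(X_f - \bar{X})^2}{(N-1)s_X^2}\right)^{1/2}$$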
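The variance formula can be checked by simulation. The sketch below uses assumed values for the true parameters, the sample size and X_f. It holds X fixed, draws many samples, and computes the prediction error for a fresh $Y_f$ each time. It then compares the simulated variance of $e_f$ with $\sigma^2\left(1 + 1/N + (X_f - \bar{X})^2/((N-1)s_X^2)\right)$. For speed it uses the closed-form OLS estimates $b_1 = \mathrm{cov}(X, Y)/s_X^2$ and $b_0 = \bar{Y} - b_1\bar{X}$.

```r
set.seed(42)
beta0 <- 1; beta1 <- 2; sigma <- 1.5   # assumed true parameters
N <- 30
X <- runif(N, 0, 10)    # X held fixed across replications (conditional on X)
X_f <- 12               # future X, deliberately outside the sample range
n_sims <- 20000         # number of simulated samples

e_f <- replicate(n_sims, {
  Y <- beta0 + beta1 * X + rnorm(N, sd = sigma)        # a new sample
  b1 <- cov(X, Y) / var(X)                             # closed-form OLS slope
  b0 <- mean(Y) - b1 * mean(X)                         # closed-form OLS intercept
  Y_f <- beta0 + beta1 * X_f + rnorm(1, sd = sigma)    # the future observation
  Y_f - (b0 + b1 * X_f)                                # prediction error e_f
})

var_formula <- sigma^2 * (1 + 1 / N + (X_f - mean(X))^2 / ((N - 1) * var(X)))
var_sim <- var(e_f)
c(simulated = var_sim, formula = var_formula)
```

X_f is placed outside the range of the sample. This makes the $(X_f - \bar{X})^2$ term clearly visible, and that term is the extra width contributed by estimation error.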

(6)

a. Prediction

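Let’s return to the printout and fill in the formula for the prediction interval:

$$
\left(b_0 + b_1 X_f\right) \pm t^{*}_{N-2,\,\alpha/2}\, s_{pred}
= \left(b_0 + b_1 X_f\right) \pm t^{*}_{N-2,\,\alpha/2}\, s\left(1 + \frac{1}{N} + \frac{(X_f - \bar{X})^2}{(N-1)s_X^2}\right)^{1/2}
$$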

































(7)


Up to now, we have assumed that the data are generated by a linear regression model. What are the basic assumptions of the model?

1. linear conditional mean,

2. constant variance (homoskedasticity),

3. normal errors.

So, in a plot of Y against X, we should see:

– a pattern of constant variation around a line

– very few points more than 2 standard deviations away from the line (about 5% under normality)
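Here is a quick check of the "2 standard deviations" rule on data simulated under the model. The parameter values are assumptions. `rstandard()` returns standardized residuals, and the share of them beyond ±2 should be close to the normal tail probability $2\Phi(-2) \approx 0.046$.

```r
set.seed(7)
N <- 1000
X <- rnorm(N)
Y <- 3 + 0.5 * X + rnorm(N)     # all three assumptions hold by construction
fit <- lm(Y ~ X)
z <- rstandard(fit)             # standardized residuals
share_beyond_2 <- mean(abs(z) > 2)
c(observed = share_beyond_2, normal_theory = 2 * pnorm(-2))   # theory: about 0.046
```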

(8)

Why Should We Care?

If the model assumptions are violated:

– Predictions can be systematically biased

– Standard errors and t-tests can be wrong

– Someone may be able to beat you with a different and better model

How can we detect violations of the model?

– We must use graphical methods

To drive this point home, let’s look at the “famous” Anscombe data
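Anscombe's four data sets ship with R as the built-in `anscombe` data frame, with columns x1, ..., x4 and y1, ..., y4. The sketch fits the same simple regression to each set. The coefficients come out nearly identical (intercept about 3, slope about 0.5), yet the four scatter plots look completely different. That is why the plots must be examined and not only the printout.

```r
# One regression per data set: y1 ~ x1, ..., y4 ~ x4
fits <- lapply(1:4, function(i) {
  lm(reformulate(paste0("x", i), response = paste0("y", i)), data = anscombe)
})
coefs <- t(sapply(fits, coef))     # one row per data set
colnames(coefs) <- c("b0", "b1")
round(coefs, 2)                    # nearly the same line four times

# Very different pictures behind the same printout
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
       xlab = paste0("x", i), ylab = paste0("y", i),
       main = paste("Data set", i))
  abline(fits[[i]])
}
par(op)
```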

(9)


b. Why Regression Diagnostics?

(10)

(11)


(12)

(13)

c. Residual Diagnostic Plots

Two basic plots are very useful:

i. Plot of Residuals vs. Fitted Values

ii. Normal Probability Plot

When Model Assumptions Hold

A First Cut: plot Y against X (works only when you have one X)

[Figure: scatter plot of y against x]

These data look great!

Linear association with constant variance.

Normal?

(14)

i. Plot of Residuals vs. Fitted Values

What should this look like?

1. Residuals should be evenly distributed around their mean of zero

2. No relationship between the mean of the residual and the fitted value
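Here is a sketch of the residuals-vs-fitted plot in base R, on simulated data with assumed parameter values. OLS with an intercept forces the residuals to average exactly zero and to be uncorrelated with the fitted values. The plot is therefore useful only for spotting patterns a straight line cannot capture, such as curvature or spread that changes with the fitted value.

```r
set.seed(3)
N <- 100
X <- runif(N, 0, 10)
Y <- 2 + 1.5 * X + rnorm(N)     # assumed: model assumptions hold
fit <- lm(Y ~ X)

plot(fitted(fit), resid(fit), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)          # residuals should scatter evenly around zero

# Both are zero up to rounding error, by construction of least squares
c(mean_resid = mean(resid(fit)), cor_resid_fitted = cor(resid(fit), fitted(fit)))
```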

(15)

c. Residual Diagnostic Plots

[Figure: scatter plot of Y against X]

A key assumption is that the regression function is linear in X.

This is not always true.

(16)

[Figure: standardized residuals (sresids) plotted against X]

There should be no relationship between the average value of the residuals and X.

(17)

A constant elasticity relationship implies a curved regression function.
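A constant elasticity model $Y = A X^{\beta} e^{\varepsilon}$ is linear in logs, $\log Y = \log A + \beta \log X + \varepsilon$, but curved in levels. The sketch simulates such data, with A = 2, β = 0.5 and the noise level as assumptions. It fits a straight line both in levels and in logs and plots the residuals of each. The levels fit should leave an arch-shaped residual pattern; the log fit should not, and its slope estimates the elasticity β.

```r
set.seed(11)
N <- 500
A <- 2; elasticity <- 0.5              # assumed values for the simulation
X <- runif(N, 1, 100)
Y <- A * X^elasticity * exp(rnorm(N, sd = 0.1))

fit_levels <- lm(Y ~ X)                # straight line in levels (misspecified)
fit_logs <- lm(log(Y) ~ log(X))        # straight line in logs (matches the model)

op <- par(mfrow = c(1, 2))
plot(fitted(fit_levels), resid(fit_levels), main = "Levels",
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
plot(fitted(fit_logs), resid(fit_logs), main = "Logs",
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
par(op)

unname(coef(fit_logs)[2])              # estimated elasticity
```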

(18)

c. Residual Diagnostic Plots

ii. Normal Probability Plot

Use it to check the normality of the residuals. Non-normal residuals cause the following sorts of headaches:

– "t-tests" and other associated statistics may no longer be t distributed

– Least squares estimates are extremely sensitive to large $\varepsilon_i$
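The second headache can be shown numerically with a made-up example (not course data). Twenty points lie exactly on the line $Y = 2 + 0.5X$. Adding a single error of 30 to the last point moves the least squares slope from 0.5 to $0.5 + 30(20 - 10.5)/665 \approx 0.93$. Squaring the errors gives one large error this much weight.

```r
x <- 1:20
y_clean <- 2 + 0.5 * x          # points exactly on a line (no noise), for clarity
y_out <- y_clean
y_out[20] <- y_out[20] + 30     # one large error at the last point

b1_clean <- unname(coef(lm(y_clean ~ x))[2])
b1_out <- unname(coef(lm(y_out ~ x))[2])
c(clean = b1_clean, with_outlier = b1_out)   # 0.5 versus about 0.93
```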

(19)

c. Residual Diagnostic Plots

Remember that the salient characteristics of the normal distribution are thin tails and symmetry.

How can we detect departures from normality?

[Figure: two frequency histograms, n=30 and n=100]

The most basic analysis would be to graph the histogram of the standardized residuals.

Neither of these histograms looks particularly symmetric.

(20)

c. Residual Diagnostic Plots

Let’s compute a normal probability plot using the normPlot() function.

(21)

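The normal probability plot is a plot of the sample CDF on a coordinate system in which the normal CDF appears as a straight line. If the residuals are normal, the sample CDF will appear as a scatter of points around that straight line. Systematic curvature away from the line signals a departure from normality, such as heavy tails or skewness.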
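normPlot() belongs to the course materials and its source is not reproduced here. This sketch substitutes base R's `qqnorm()`/`qqline()` and also builds the plot by hand. It plots the sorted standardized residuals against the normal quantiles of the plotting positions `ppoints(n)`, which is the "sample CDF on normal coordinates" described above. The data are simulated with assumed parameters.

```r
set.seed(5)
N <- 60
X <- runif(N, 0, 10)
Y <- 1 + 2 * X + rnorm(N)          # assumed: normal errors
z <- rstandard(lm(Y ~ X))          # standardized residuals

# By hand: sorted residuals against normal quantiles of the plotting positions
theo <- qnorm(ppoints(length(z)))
plot(theo, sort(z), xlab = "Normal quantiles",
     ylab = "Sorted standardized residuals")
abline(0, 1, lty = 2)              # points near this line suggest N(0, 1) residuals

# Base R equivalent (stand-in for the course's normPlot())
qq <- qqnorm(z)
qqline(z)
```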

(22)

d. Putting It All Together: The Shock Absorber Example

Suppliers to very large manufacturing firms are facing increasing pressure to assure their customers that the parts they produce meet high quality standards.

This supplier makes gas-filled shock absorbers.

The data are measurements of the rebound force of the shock absorber. Measurements can be taken both before and after the shock absorber is fully assembled. It is cheaper to measure shock absorber performance before, rather than after, assembly. See dataset shock.

(23)

Shock Absorber Example. Slide 23

Basic Model

We must formulate a statistical model to predict rebound force after assembly using the before-assembly measurement.

This is a classic example of a regression model!

$$\text{Rebound}_{after} = \beta_0 + \beta_1\,\text{Rebound}_{before} + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2)$$

(24)

Descriptive Statistics

(25)

Marginal Distribution of Y

Doesn’t look normal! Three clumps!

(26)

Joint or Bivariate Distribution

Let’s do a scatter plot. Which variable should be on the Y axis?

(27)

Regression Analysis

(28)

Residual Diagnostics

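The residuals look much more normal than the marginal distribution of Y. That is what matters: the model assumes normal errors, i.e., normality of Y given the before measurement, not normality of the marginal distribution of Y.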

(29)

T-tests

Suppose the before measurements were “perfect” predictors. What would this mean?

One School of Thought:

All you need is very accurate predictions

Another School of Thought (no adjustment):

$H_0: \beta_1 = 1.0$ and $\beta_0 = 0$

$H_A$: otherwise
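The joint "no adjustment" hypothesis can be tested with a standard F-test. This is a swapped-in technique, not one shown on the slides. The test compares the residual sum of squares of the restricted model (after = before + ε) with that of the unrestricted regression, using 2 restrictions and N − 2 residual degrees of freedom. The shock dataset is not reproduced here, so the sketch runs on simulated stand-in data with hypothetical values. N = 35 is chosen to match the 33 degrees of freedom used in the slope test.

```r
set.seed(9)
N <- 35                                    # df = N - 2 = 33
before <- runif(N, 10, 20)                 # hypothetical before-assembly values
after <- before + rnorm(N, sd = 0.5)       # generated so that H0 is true

fit <- lm(after ~ before)                  # unrestricted: b0 and b1 estimated
ssr_u <- sum(resid(fit)^2)
ssr_r <- sum((after - before)^2)           # restricted: beta0 = 0, beta1 = 1
n_restr <- 2                               # number of restrictions in H0
F_stat <- ((ssr_r - ssr_u) / n_restr) / (ssr_u / (N - 2))
p_value <- pf(F_stat, df1 = n_restr, df2 = N - 2, lower.tail = FALSE)
c(F = F_stat, p = p_value)
```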

(30)

T-tests continued

Let’s test a slight modification. Since N is relatively small, let’s test at the 10 percent significance level.

$H_0: \beta_1 = 1.0$

$H_A$: otherwise

Step 1: Compute the t critical value

> qt(.05, df = 33)

[1] -1.692360

Step 2: Compute the t statistic

> tstat <- (.94946 - 1) / .0438

> tstat

[1] -1.153881

Step 3: Compute the p-value

> 2 * pt(-abs(tstat), df = 33)

Since |t| = 1.15 is smaller than the critical value 1.69, we do not reject $H_0$ at the 10 percent level.
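The three steps can be wrapped in a small helper for a two-sided test of $H_0: \beta_1 = \beta_1^0$. The name `slope_t_test()` is hypothetical and not part of any package. Applied to the slide's numbers (b1 = .94946, standard error .0438, 33 degrees of freedom), it reproduces t ≈ -1.154 and does not reject at the 10 percent level.

```r
# Hypothetical helper: two-sided t-test of H0: beta1 = beta1_0
slope_t_test <- function(b1, se, beta1_0, df, alpha = 0.10) {
  tstat <- (b1 - beta1_0) / se
  t_crit <- qt(1 - alpha / 2, df = df)          # positive critical value
  p_value <- 2 * pt(-abs(tstat), df = df)       # two-sided p-value
  list(t = tstat, t_crit = t_crit, p_value = p_value,
       reject = abs(tstat) > t_crit)
}

res <- slope_t_test(b1 = 0.94946, se = 0.0438, beta1_0 = 1, df = 33)
res$t        # about -1.154, as on the slide
res$reject   # FALSE: |t| is below the 10% critical value of about 1.69
```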

(31)

Prediction

(32)

Glossary of Symbols

$X_f$ - future value of X for forecasting

$Y_f$ - value of Y to be forecasted

(33)

Important Equations

$$e_f = Y_f - \hat{Y}_f = Y_f - b_0 - b_1 X_f$$

$$s_{pred} = s\left(1 + \frac{1}{N} + \frac{(X_f - \bar{X})^2}{(N-1)s_X^2}\right)^{1/2}$$

(34)

Glossary of R Commands