ChIVPredictionandDiagnostics.pptx


Academic year: 2020


Chapter IV. Slide 1

IV. Prediction and Diagnostics

a. Prediction

b. Why Regression Diagnostics?

c. Residual Diagnostic Plots


a. Prediction

Model:

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \quad i = 1, \ldots, N, \qquad \varepsilon_i \sim \text{iid } N(0, \sigma^2)$$

The conditional forecasting problem can be succinctly stated as:

Predict a “future” observation, $Y_f$, given $X_f$ and the sample data $\{X_i, Y_i\}$, $i = 1, \ldots, N$.

The only practical solution to the prediction problem is to use estimated parameters:

$$\hat{Y}_f = b_0 + b_1 X_f$$
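A minimal R sketch of forecasting with estimated parameters. The simulated data, the true parameter values, and the future value `x_f` are illustrative assumptions, not part of the slides.

```r
# Simulated sample; the true parameters (2, 0.5, sd = 1) are illustrative assumptions.
set.seed(1)
N <- 50
x <- runif(N, min = 0, max = 10)
y <- 2 + 0.5 * x + rnorm(N, sd = 1)

# Estimate b0 and b1 by least squares.
fit <- lm(y ~ x)
b <- coef(fit)

# Point forecast at a hypothetical future value X_f = 4.
x_f <- 4
y_hat_f <- unname(b[1] + b[2] * x_f)

# predict() returns the same forecast.
y_hat_check <- unname(predict(fit, newdata = data.frame(x = x_f)))
print(c(by_hand = y_hat_f, predict = y_hat_check))
```

The hand computation and `predict()` agree because both plug the least squares estimates into the fitted line.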


a. Prediction

If we use this predictor, we will make a prediction error:

$$e_f = Y_f - \hat{Y}_f = Y_f - b_0 - b_1 X_f$$

Let’s draw this:

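[Figure: the true line $E[Y_f \mid X_f] = \beta_0 + \beta_1 X$ and the fitted line $b_0 + b_1 X$, with $X_f$, $Y_f$, $\hat{Y}_f$, the sampling error, and the prediction error $e_f$ marked]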

a. Prediction

Let’s write our prediction error in a way that shows the influence of two factors:

i. the model error term, or inherent randomness

ii. estimation error in the model parameters

$$
\begin{aligned}
e_f = Y_f - \hat{Y}_f &= \left(Y_f - E[Y_f \mid X_f]\right) - \left(\hat{Y}_f - E[Y_f \mid X_f]\right) \\
&= \left(\beta_0 + \beta_1 X_f + \varepsilon_f - \beta_0 - \beta_1 X_f\right) - \left(b_0 + b_1 X_f - \beta_0 - \beta_1 X_f\right) \\
&= \varepsilon_f - \left[(b_0 - \beta_0) + (b_1 - \beta_1) X_f\right]
\end{aligned}
$$


a. Prediction

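Now let’s compute a prediction interval for $Y_f$.

$$
\begin{aligned}
\operatorname{Var}(e_f) = \operatorname{Var}(Y_f - \hat{Y}_f) &= \operatorname{Var}(\varepsilon_f) + \operatorname{Var}(\hat{Y}_f) \\
&= \sigma^2 + \sigma^2\left[\frac{1}{N} + \frac{(X_f - \bar{X})^2}{(N-1)\,s_X^2}\right] \\
&= \sigma^2\left[1 + \frac{1}{N} + \frac{(X_f - \bar{X})^2}{(N-1)\,s_X^2}\right]
\end{aligned}
$$

The predictive standard error, denoted $s_{\text{pred}}$, is then

$$s_{\text{pred}} = s\left[1 + \frac{1}{N} + \frac{(X_f - \bar{X})^2}{(N-1)\,s_X^2}\right]^{1/2}$$

where $s$ is the standard error of the regression.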
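The predictive standard error can be checked numerically. The sketch below computes it from the formula and compares it with the pieces returned by `predict(..., se.fit = TRUE)`. The simulated data and the value of `x_f` are assumptions made for illustration.

```r
# Simulated sample; all numeric settings are illustrative assumptions.
set.seed(2)
N <- 40
x <- rnorm(N, mean = 5, sd = 2)
y <- 1 + 0.8 * x + rnorm(N, sd = 1.5)

fit <- lm(y ~ x)
s <- summary(fit)$sigma   # standard error of the regression
x_f <- 7                  # hypothetical future X

# s_pred from the formula; var(x) is the sample variance s_X^2.
s_pred <- s * sqrt(1 + 1 / N + (x_f - mean(x))^2 / ((N - 1) * var(x)))

# Same quantity from predict(): se.fit is the standard error of the fitted
# value, residual.scale is s, and the two variances add.
p <- predict(fit, newdata = data.frame(x = x_f), se.fit = TRUE)
s_pred_check <- unname(sqrt(p$residual.scale^2 + p$se.fit^2))

print(c(formula = s_pred, from_predict = s_pred_check))
```

Moving `x_f` away from `mean(x)` increases `s_pred`, which is the estimation-error term in the formula at work.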

a. Prediction

Let’s return to the printout and fill in the formula for the prediction interval:

$$
b_0 + b_1 X_f \pm t^*_{N-2,\,\alpha/2}\; s\left[1 + \frac{1}{N} + \frac{(X_f - \bar{X})^2}{(N-1)\,s_X^2}\right]^{1/2}
= b_0 + b_1 X_f \pm t^*_{N-2,\,\alpha/2}\; s_{\text{pred}}
$$
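As a rough illustration of the interval formula, the sketch below builds a 95 percent prediction interval by hand and compares it with `predict(..., interval = "prediction")`. It uses R's built-in `cars` data rather than the course printout, and the choice of `x_f = 15` is an assumption.

```r
# Built-in data set: stopping distance (dist) against speed.
fit <- lm(dist ~ speed, data = cars)
N <- nrow(cars)
b <- coef(fit)
s <- summary(fit)$sigma

x_f <- 15      # hypothetical future X
alpha <- 0.05  # 95 percent interval

# Predictive standard error and t critical value with N - 2 degrees of freedom.
s_pred <- s * sqrt(1 + 1 / N +
                   (x_f - mean(cars$speed))^2 / ((N - 1) * var(cars$speed)))
t_star <- qt(1 - alpha / 2, df = N - 2)

center <- unname(b[1] + b[2] * x_f)
manual <- c(lwr = center - t_star * s_pred, upr = center + t_star * s_pred)

# Built-in equivalent.
auto <- predict(fit, newdata = data.frame(speed = x_f),
                interval = "prediction", level = 1 - alpha)

print(manual)
print(auto)
```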


b. Why Regression Diagnostics?

Up to now, we have assumed that the data are generated by a linear regression model.

What are the basic assumptions of the model?

1. linear conditional mean

2. constant variance (homoskedasticity)

3. normal errors

So we should see:

a pattern of constant variation around a line

very few points more than 2 standard deviations away from the line


b. Why Regression Diagnostics?

Why Should We Care?

If the model assumptions are violated:

Predictions can be systematically biased.

Standard errors and t-tests are wrong.

Someone may be able to beat you with a different and better model.

How can we detect violations of the model?

We must use graphical methods.

To drive this point home, let’s look at the “famous” Anscombe data.
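The Anscombe data ship with base R as the `anscombe` data frame (columns `x1`..`x4`, `y1`..`y4`). The sketch below fits the same regression to each of the four data sets. The helper `plot_anscombe()` is a hypothetical name introduced here for illustration.

```r
# Fit y_i ~ x_i for each of the four Anscombe data sets.
fits <- lapply(1:4, function(i) {
  lm(reformulate(paste0("x", i), response = paste0("y", i)), data = anscombe)
})

# One row per data set: intercept, slope, and R-squared.
summ <- t(sapply(fits, function(f) {
  c(intercept = unname(coef(f)[1]),
    slope     = unname(coef(f)[2]),
    r_squared = summary(f)$r.squared)
}))
print(round(summ, 2))

# Hypothetical helper: scatter plot with fitted line for each data set.
plot_anscombe <- function() {
  op <- par(mfrow = c(2, 2))
  on.exit(par(op))
  for (i in 1:4) {
    plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
         xlab = paste0("x", i), ylab = paste0("y", i))
    abline(fits[[i]])
  }
}
```

The numerical summaries are nearly identical across the four data sets, while calling `plot_anscombe()` shows four very different pictures, which is why the plots matter.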


c. Residual Diagnostic Plots

Two basic plots are very useful:

i. Plot of Residuals vs. Fitted Values

ii. A Normal Probability Plot

When Model Assumptions Hold

A First Cut: plot Y against X (works only when you have one X)

[Figure: scatter plot of y against x]

These data look great! Linear association with constant variance.

Normal?


c. Residual Diagnostic Plots

i. Plot of Residuals vs. Fitted Values

What should this look like?

1. Residuals should be evenly distributed around the mean.

2. No relationship between the mean of the residuals and the fitted values.


c. Residual Diagnostic Plots

[Figure: scatter plot of Y against X]

A key assumption is that the regression model is a linear function.

This is not always true.


c. Residual Diagnostic Plots

[Figure: standardized residuals (sresids) plotted against X]

There should be no relationship between the average value of the residuals and X.
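A small simulated illustration of how a nonlinear mean shows up in the residuals. The data-generating settings and the helper names `plot_resid()` and `curvature()` are assumptions made for this sketch.

```r
# Two simulated data sets: one truly linear, one with a quadratic term.
set.seed(3)
n <- 100
x <- runif(n, min = -2, max = 3)
y_linear <- 1 + 2 * x + rnorm(n)
y_curved <- 1 + 2 * x + 1.5 * x^2 + rnorm(n)

fit_linear <- lm(y_linear ~ x)
fit_curved <- lm(y_curved ~ x)   # deliberately misspecified straight line

# Hypothetical helper: standardized residuals against fitted values.
plot_resid <- function(fit, main) {
  plot(fitted(fit), rstandard(fit),
       xlab = "Fitted values", ylab = "Standardized residuals", main = main)
  abline(h = 0, lty = 2)
}

# Numeric stand-in for the visual check: correlation between the residuals
# and the squared, centered X. Near zero means no leftover curvature.
curvature <- function(fit) cor(resid(fit), (x - mean(x))^2)
print(c(linear = curvature(fit_linear), curved = curvature(fit_curved)))
```

Calling `plot_resid(fit_curved, "Misspecified")` gives a U-shaped pattern, whereas `plot_resid(fit_linear, "Correct")` gives a patternless band around zero.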


c. Residual Diagnostic Plots

A constant elasticity relationship implies a curved regression function.
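To see why, one common way to write a constant elasticity relationship is shown below. The symbols $\alpha$ and $\beta$ and the multiplicative error are assumptions, since the slides do not give the functional form.

$$
Y = \alpha X^{\beta} e^{\varepsilon}
\quad\Longrightarrow\quad
\log Y = \log \alpha + \beta \log X + \varepsilon,
\qquad
\frac{d \log Y}{d \log X} = \beta
$$

Under this form the relationship is curved in $(X, Y)$ whenever $\beta \neq 1$ but linear in $(\log X, \log Y)$, so a straight line fit to the raw data would leave curvature in the residuals.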


c. Residual Diagnostic Plots

ii. Normal Probability Plots

Use these to test normality of the residuals. Non-normal residuals cause the following sorts of headaches:

"t-tests" and other associated statistics may no longer be t distributed.

Least squares estimates are extremely sensitive to large $\varepsilon_i$.


c. Residual Diagnostic Plots

Remember that the salient characteristics of the normal distribution are thin tails and symmetry.

How can we detect departures from normality?

The most basic analysis would be to graph the histogram of the standardized residuals.

[Figure: two histograms with Frequency on the vertical axis, one for n=30 and one for n=100]

Neither of these plots looks particularly symmetric.


c. Residual Diagnostic Plots

Let’s compute a normal probability plot using normPlot().


c. Residual Diagnostic Plots

The normal probability plot is a plot of the sample CDF on a coordinate system in which the normal CDF appears as a straight line. The sample CDF will appear as a scatter of points around the normal CDF straight line.
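The sketch below uses base R's `qqnorm()` and `qqline()` as a stand-in for the `normPlot()` function named in the slides, which is not part of base R. The simulated samples and the helpers `qq_cor()` and `plot_qq()` are assumptions for illustration. Note that `qqnorm()` puts theoretical normal quantiles on the horizontal axis and sample quantiles on the vertical axis, which carries the same information as the plot described above.

```r
# One sample that is normal and one with very heavy tails.
set.seed(4)
z_normal <- rnorm(100)
z_heavy <- rcauchy(100)

# Hypothetical helper: correlation between sample and theoretical quantiles.
# Values close to 1 mean the points lie close to a straight line.
qq_cor <- function(z) {
  q <- qqnorm(z, plot.it = FALSE)
  cor(q$x, q$y)
}
print(c(normal = qq_cor(z_normal), heavy_tailed = qq_cor(z_heavy)))

# Hypothetical helper: draw the plot with a reference line.
plot_qq <- function(z, main) {
  qqnorm(z, main = main)
  qqline(z)
}
```

Calling `plot_qq(z_normal, "Normal sample")` gives points scattered tightly around the line, while `plot_qq(z_heavy, "Heavy tails")` bends sharply away from it at both ends.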


d. Putting It All Together: The Shock Absorber Example

Suppliers for very large manufacturing firms are facing increasing pressure to assure their customers that the parts they produce meet high quality standards.

This supplier is supplying gas-filled shock absorbers.

The data are measurements of the rebound force of the shock absorber. Measurements can be taken both before and after the shock absorber is fully assembled. It is cheaper to take measurements of the shock absorber performance before, rather than after, assembly. See dataset shock.


Shock Absorber Example. Slide 23

Basic Model

We must formulate a statistical model to predict rebound force after assembly using the before-assembly measurement.

This is a classic example of a regression model!

$$\text{Rebound}_{\text{after}} = \beta_0 + \beta_1\,\text{Rebound}_{\text{before}} + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2)$$


Descriptive Statistics


Marginal Distribution of Y

Doesn’t look normal! Three clumps!


Joint or Bivariate Distribution

Let’s do a scatter plot. Which variable should be on the Y axis?


Regression Analysis


Residual Diagnostics

Residuals are much more normal than the marginal distribution of Y.
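This can happen whenever X itself is clumped: Y inherits the clumps, but the regression errors need not. The sketch below uses made-up numbers, not the shock data, and the helper `qq_cor()` is a hypothetical name.

```r
# Hypothetical "before" measurements falling in three clumps (made-up values).
set.seed(5)
before <- c(rnorm(12, mean = 500, sd = 5),
            rnorm(12, mean = 550, sd = 5),
            rnorm(11, mean = 600, sd = 5))
after <- 10 + 0.95 * before + rnorm(35, sd = 5)

fit <- lm(after ~ before)

# Hypothetical helper: straightness of the normal probability plot,
# measured as the correlation between sample and theoretical quantiles.
qq_cor <- function(z) {
  q <- qqnorm(z, plot.it = FALSE)
  cor(q$x, q$y)
}

print(c(marginal_Y = qq_cor(after), residuals = qq_cor(resid(fit))))
```

The model assumes normal errors, not a normal marginal distribution for Y, so a clumpy histogram of Y is not by itself a violation.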


T-tests

Suppose before measurements were “perfect” predictors. What would this mean?

One School of Thought:

All you need is very accurate predictions

Another School of Thought (no adjustment):

$$H_0: \beta_1 = 1.0 \ \text{and}\ \beta_0 = 0$$


T-tests continued

Let’s test a slight modification:

$$H_0: \beta_1 = 1.0 \qquad H_A: \text{otherwise}$$

Since N is relatively small, let’s test at the 10 percent significance level.

Step 1: Compute t critical value

> qt(.05,df=33)
[1] -1.692360

Step 2: Compute t statistic

> t=(.94946-1)/.0438
> t
[1] -1.153881

Since |t| = 1.154 is smaller than the critical value 1.692, we fail to reject the null hypothesis at the 10 percent level.

Step 3: Compute p-value

> pt(-1.153881,df=33)*2
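The three steps can be collected into one short script. The slope estimate, standard error, and degrees of freedom are the numbers used on the slide; the variable names are assumptions.

```r
# Inputs taken from the slide.
b1 <- 0.94946     # estimated slope
se_b1 <- 0.0438   # standard error of the slope
df <- 33          # N - 2
alpha <- 0.10     # significance level

# Two-sided test of H0: beta1 = 1.
t_stat <- (b1 - 1) / se_b1
t_crit <- qt(1 - alpha / 2, df = df)
p_value <- 2 * pt(-abs(t_stat), df = df)
reject <- abs(t_stat) > t_crit

print(c(t_stat = t_stat, t_crit = t_crit, p_value = p_value))
print(reject)
```

Comparing `abs(t_stat)` with `t_crit` and comparing `p_value` with `alpha` always lead to the same decision.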


Prediction


Glossary of Symbols

$X_f$ - future value of X for forecasting

$Y_f$ - value of Y to be forecasted


Important Equations

$$\hat{Y}_f = b_0 + b_1 X_f$$

$$e_f = Y_f - \hat{Y}_f$$

$$s_{\text{pred}} = s\left[1 + \frac{1}{N} + \frac{(X_f - \bar{X})^2}{(N-1)\,s_X^2}\right]^{1/2}$$

Glossary of R Commands
