• No results found

stat4b_chapter08.pptx

N/A
N/A
Protected

Academic year: 2020

Share "stat4b_chapter08.pptx"

Copied!
28
0
0

Loading.... (view fulltext now)

Full text

(1)

1-1

Copyright © 2015, 2010, 2007 Pearson Education, Inc. Chapter 8, Slide 1

Chapter 8

(2)

1-2

Copyright © 2015, 2010, 2007 Pearson Education, Inc. Slide 10- 2Chapter 8, Slide 2

Internet Activity – Part III

(3)

1-3

Copyright © 2015, 2010, 2007 Pearson Education, Inc. Slide 10- 3Chapter 8, Slide 3

(4)

1-4

Copyright © 2015, 2010, 2007 Pearson Education, Inc. Slide 10- 4Chapter 8, Slide 4

Example: First Words and Gesell Score

Does the age at which a child begins to talk predict a future score on a test of mental ability? A study of the development of young children recorded

(5)

1-5

Copyright © 2015, 2010, 2007 Pearson Education, Inc. Slide 10- 5Chapter 8, Slide 5

Let’s Explore….

 Using your calculator, draw a scatterplot of this

data. Look at you Ch. 8 outline.

 What is the correlation coefficient?_______

 What is the slope of the LSRL?______

 What is the y intercept of the LSRL?_______

 Is a linear model appropriate? Why or why not?

 There are two children who seem to have

(6)

1-6

Copyright © 2015, 2010, 2007 Pearson Education, Inc. Slide 10- 6Chapter 8, Slide 6

More Exploring…..

 Using your calculator, for each child with unusual

(7)

1-7

Copyright © 2015, 2010, 2007 Pearson Education, Inc. Chapter 8, Slide 7

Outliers, Leverage, and Influence

 Outlying points can strongly influence a

regression. Even a single point far from the body of the data can dominate the analysis.

 Any point that stands away from the others can

(8)

1-8

Copyright © 2015, 2010, 2007 Pearson Education, Inc. Chapter 8, Slide 8

Outliers, Leverage, and Influence (cont.)

 The following scatterplot shows that something was awry

(9)

1-9

Copyright © 2015, 2010, 2007 Pearson Education, Inc. Chapter 8, Slide 9

Outliers, Leverage, and Influence (cont.)

 The red line shows the effects that one unusual point can

(10)

1-10

Copyright © 2015, 2010, 2007 Pearson Education, Inc. Chapter 8, Slide 10

Outliers, Leverage, and Influence (cont.)

 The linear model doesn’t fit points with large

residuals very well.

 Because they seem to be different from the other

(11)

1-11

Copyright © 2015, 2010, 2007 Pearson Education, Inc. Slide 10- 11Chapter 8, Slide 11

Outliers and Influential Points

 An outlier is an observation that lies outside the

overall pattern of the other observations.

 An observation is influential for a statistical

calculation if removing it would markedly change the result of the calculation. Points that are

outliers in the x direction of a scatterplot are often influential for the LSRL.

 An observation is said to have leverage, if it

changes or “levers” the slope.

(12)

1-12

Copyright © 2015, 2010, 2007 Pearson Education, Inc. Chapter 8, Slide 12

Outliers, Leverage, and Influence (cont.)

 A data point can also be unusual if its x-value is

(13)

1-13

Copyright © 2015, 2010, 2007 Pearson Education, Inc. Chapter 8, Slide 13

Outliers, Leverage, and Influence (cont.)

 A point with high leverage has the potential to

change the regression line.

 We say that a point is influential if omitting it from

(14)

1-14

Copyright © 2015, 2010, 2007 Pearson Education, Inc. Chapter 8, Slide 14

Outliers, Leverage, and Influence (cont.)

 The extraordinarily large shoe size gives the

(15)

1-15

Copyright © 2015, 2010, 2007 Pearson Education, Inc. Chapter 8, Slide 15

Getting the “Bends”

 Linear regression only works for linear models.

(That sounds obvious, but when you fit a regression, you can’t take it for granted.)

 A curved relationship between two variables

might not be apparent when looking at a

scatterplot alone, but will be more obvious in a plot of the residuals.

 Remember, we want to see “nothing” in a plot

(16)

1-16

Copyright © 2015, 2010, 2007 Pearson Education, Inc. Chapter 8, Slide 16

Getting the “Bends” (cont.)

 The scatterplot of residuals against Duration of emperor

penguin dives holds a surprise. The Linearity Assumption says we should not see a pattern, but instead there is a bend.

 Even though it means checking the Straight Enough Condition

(17)

1-17

Copyright © 2015, 2010, 2007 Pearson Education, Inc. Chapter 8, Slide 17

Sifting Residuals for Groups

 No regression analysis is complete without a

display of the residuals to check that the linear model is reasonable.

 Residuals often reveal subtleties that were not

(18)

1-18

Copyright © 2015, 2010, 2007 Pearson Education, Inc. Chapter 8, Slide 18

Sifting Residuals for Groups (cont.)

 Sometimes the subtleties we see are additional

details that help confirm or refine our understanding.

 Sometimes they reveal violations of the

regression conditions that require our attention.

 You might call the residual plot “The Judge”! It

(19)

1-19

Copyright © 2015, 2010, 2007 Pearson Education, Inc. Chapter 8, Slide 19

Extrapolation: Reaching Beyond the Data

 Linear models give a predicted value for each

case in the data.

 We cannot assume that a linear relationship in

the data exists beyond the range of the data.

 The farther the new x value is from the mean in x,

the less trust we should place in the predicted value.

 Once we venture into new x territory, such a

(20)

1-20

Copyright © 2015, 2010, 2007 Pearson Education, Inc. Chapter 8, Slide 20

Extrapolation (cont.)

 Extrapolations are dubious because they require

the additional—and very questionable —

assumption that nothing about the relationship

between x and y changes even at extreme values of x.

 Extrapolations can get you into deep trouble.

(21)

1-21

Copyright © 2015, 2010, 2007 Pearson Education, Inc. Chapter 8, Slide 21

Extrapolation (cont.)

 Here is a timeplot of the Energy Information

Administration (EIA) predictions and actual prices of oil barrel prices. How did forecasters do?

 They seemed to have missed a sharp run-up in oil prices

(22)

1-22

Copyright © 2015, 2010, 2007 Pearson Education, Inc. Chapter 8, Slide 22

Predicting the Future

 Extrapolation is always dangerous. But, when the

x-variable in the model is time, extrapolation becomes an attempt to peer into the future.

 Knowing that extrapolation is dangerous doesn’t

stop people. The temptation to see into the future is hard to resist.

 Here’s some more realistic advice: If you must

(23)

1-23

Copyright © 2015, 2010, 2007 Pearson Education, Inc. Chapter 8, Slide 23

(24)

1-24

Copyright © 2015, 2010, 2007 Pearson Education, Inc. Chapter 8, Slide 24

Lurking Variables and Causation

 No matter how strong the association, no matter

how large the R2 value, no matter how straight the line, there is no way to conclude from a

regression alone that one variable causes the other.

 There’s always the possibility that some third variable is driving both of the variables you have observed.

 With observational data, as opposed to data from

(25)

1-25

Copyright © 2015, 2010, 2007 Pearson Education, Inc. Chapter 8, Slide 25

Lurking Variables and Causation (cont.)

 The following scatterplot shows that the average

(26)

1-26

Copyright © 2015, 2010, 2007 Pearson Education, Inc. Chapter 8, Slide 26

Lurking Variables and Causation (cont.)

 This new scatterplot shows that the average life

(27)

1-27

Copyright © 2015, 2010, 2007 Pearson Education, Inc. Chapter 8, Slide 27

Lurking Variables and Causation (cont.)

 Since televisions are cheaper than doctors, send TVs to countries with low life expectancies in order to

extend lifetimes. Right?

 How about considering a lurking variable? That makes more sense…

 Countries with higher standards of living have both

longer life expectancies and more doctors (and TVs!).

 If higher living standards cause changes in these

(28)

1-28

Copyright © 2015, 2010, 2007 Pearson Education, Inc. Chapter 8, Slide 28

AP Tips

 Be prepared to read regression output with and

without an outlier. Read carefully to see if the slope has changed. Also look for changes in R2.

 Remember that some influential points have small

References

Related documents

Although the temperature used in the reactive deposition experiments of the bimetallic materials (200ºC) was generally lower than the temperatures employed in the

A wider adoption of nanopore and nanochannel-based devices would also blur the historic distinction between sequencing and mapping; the very long sequencing reads from nanopores

Such a collegiate cul- ture, like honors cultures everywhere, is best achieved by open and trusting relationships of the students with each other and the instructor, discussions

Inversely, the new roll generations have no or very little eutectic heat generation and the natural hydrodynamic of the process induces defects such as

(4) The mean path length L ρ (ωt) can be broken down into static and dynamic components, corresponding to the optical path lengths of static tissue and blood,

F·Z.3 Long Term Hydrostatic Strength (LTHS) The estimated tensile stress/strain in tbe wall oftbe pipe in the circumferential orientation due to internal hydrostatic pressure that

It is seen that sample 4 (with the lowest layer thickness of all four analysed) has the lowest fraction of the largest grains, as well as higher values for the smallest

Composing a TOSCA Service Template for a “SugarCRM” Application using Vnomic’s Service Designer, www.vnomic.com. The SugarCRM application include