Copyright © 2015, 2010, 2007 Pearson Education, Inc. Chapter 9, Slide 1
Chapter 9
Re-expressing the Data:
Straight to the Point
We cannot use a linear model unless the relationship between the two variables is linear. Often re-expression can save the day, straightening bent relationships so that we can fit and use a simple linear model.
Two simple ways to re-express data are with logarithms and reciprocals.
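As a rough illustration of how a reciprocal or a log can straighten a bent relationship, the sketch below builds made-up data in which y falls off like a reciprocal of x, then compares how strongly y, 1/y, and log(y) correlate with x. Treating |r| as a straightness score is a simplifying assumption of this sketch; the scatterplot and residual plot remain the real judges. The names `pearson_r`, `xs`, and `ys` are illustrative.

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient r; |r| near 1 means the points hug a straight line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up data for illustration: 1/y is exactly linear in x, so y itself is bent.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [1 / (0.01 + 0.002 * x) for x in xs]

r_raw = pearson_r(xs, ys)                           # y against x
r_recip = pearson_r(xs, [1 / y for y in ys])        # reciprocal of y against x
r_log = pearson_r(xs, [math.log10(y) for y in ys])  # log of y against x
print(f"|r| raw: {abs(r_raw):.4f}, reciprocal: {abs(r_recip):.4f}, log: {abs(r_log):.4f}")
```

Because the reciprocal of y is a straight-line function of x by construction, its |r| comes out essentially 1, while the raw y does not.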
Re-expressions can be seen in everyday life—
Straight to the Point (cont.)
The relationship between fuel efficiency (in miles per gallon) and weight (in pounds) for late model
Straight to the Point (cont.)
Straight to the Point (cont.)
We can re-express fuel efficiency as gallons per
Straight to the Point (cont.)
A look at the residuals plot for the new model
Goals of Re-expression
Goal 1: Make the distribution of a variable (as
Goals of Re-expression (cont.)
Goal 2: Make the spread of several groups (as
Goals of Re-expression (cont.)
Goal 3: Make the form of a scatterplot more
Goals of Re-expression (cont.)
Goal 4: Make the scatter in a scatterplot spread out evenly rather than thickening at one end.
This can be seen in the two scatterplots we
The Ladder of Powers
There is a family of simple re-expressions that move data toward our goals in a consistent way. This collection of re-expressions is called the Ladder of Powers.
The Ladder of Powers orders the effects that the
The Ladder of Powers
Power | Name | Comment
2 | Square of data values | Try with unimodal distributions that are skewed to the left.
1 | Raw data | Data with positive and negative values and no bounds are less likely to benefit from re-expression.
½ | Square root of data values | Counts often benefit from a square root re-expression.
"0" | We'll use logarithms here | Measurements that cannot be negative often benefit from a log re-expression.
–1/2 | Reciprocal square root | An uncommon re-expression, but sometimes useful.
–1 | The reciprocal of the data | Ratios of two quantities (e.g., mph) often benefit from a reciprocal.
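A small helper can apply any rung of the Ladder of Powers. This is a sketch, not a library routine: the function name `ladder` is hypothetical, power 0 is taken to mean the base-10 logarithm, and the checks encode the restrictions on zero and negative values listed under What Can Go Wrong?

```python
import math


def ladder(values, power):
    """Re-express a list of numbers with one rung of the Ladder of Powers.

    power 0 stands for the logarithm (base 10 in this sketch), as in the table.
    """
    if power == 0:
        if any(v <= 0 for v in values):
            raise ValueError("logarithms need every value to be positive")
        return [math.log10(v) for v in values]
    if power < 0 and any(v == 0 for v in values):
        raise ValueError("negative powers are undefined at zero")
    if power != int(power) and any(v < 0 for v in values):
        raise ValueError("fractional powers of negative values are not real numbers")
    return [v ** power for v in values]


print(ladder([1, 4, 9, 16], 0.5))   # square root rung, e.g. for counts
print(ladder([10, 100, 1000], 0))   # log rung
print(ladder([2, 4], -1))           # reciprocal rung
```

Raising an error, rather than silently returning nonsense, makes it obvious when a chosen rung cannot be applied to the data at hand.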
Plan B: Attack of the Logarithms
When none of the data values is zero or negative, logarithms can be a helpful ally in the search for a useful model.
Try taking the logs of both the x- and y-variable.
Then re-express the data using some
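One way to act on the log suggestion is to fit every pairing of x or log(x) with y or log(y) and compare how straight each looks. The sketch below scores straightness with |r|, which is a simplifying assumption (the residual plot is still the real check). `compare_log_models` is a hypothetical name, and the data are made up to follow the power curve y = 3x^1.5, whose log-log plot is exactly straight.

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient r between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def compare_log_models(xs, ys):
    """Return |r| for each pairing of raw or logged x with raw or logged y."""
    if min(xs) <= 0 or min(ys) <= 0:
        raise ValueError("logs need every x and y value to be positive")
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    pairings = {
        "y vs x": (xs, ys),
        "log(y) vs x": (xs, ly),
        "y vs log(x)": (lx, ys),
        "log(y) vs log(x)": (lx, ly),
    }
    return {name: abs(pearson_r(a, b)) for name, (a, b) in pairings.items()}


# Made-up data following y = 3 * x**1.5, so the log-log plot is exactly straight.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [3 * x ** 1.5 for x in xs]
scores = compare_log_models(xs, ys)
best = max(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name:>17}: |r| = {score:.5f}")
print("straightest:", best)
```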
Example: Using Models pg. 238 #2
For each of the models listed below, predict y when x = 2.
a) $\hat{y} = 1.2 + 0.8\log x$
b) $\log \hat{y} = 1.2 + 0.8x$
c) $\hat{y} = 1.2 + 0.8\sqrt{x}$
d) $\hat{y} = 1.2\,(0.8^{x})$
e) $\hat{y} = 0.8x^{2} + 1.2x + 1$
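To check answers, each model can be solved for ŷ and evaluated at x = 2. The model forms written in the comments are this sketch's reading of the exercise, and "log" is assumed to be the base-10 logarithm; the dictionary keys are illustrative.

```python
import math

x = 2  # the value given in the exercise

# Solve each model for y-hat before substituting x (logs are base 10 here).
predictions = {
    "a": 1.2 + 0.8 * math.log10(x),   # y-hat = 1.2 + 0.8 log x
    "b": 10 ** (1.2 + 0.8 * x),       # log y-hat = 1.2 + 0.8x, so undo the log
    "c": 1.2 + 0.8 * math.sqrt(x),    # y-hat = 1.2 + 0.8 sqrt(x)
    "d": 1.2 * 0.8 ** x,              # y-hat = 1.2 (0.8^x)
    "e": 0.8 * x ** 2 + 1.2 * x + 1,  # y-hat = 0.8x^2 + 1.2x + 1
}
for part, y_hat in predictions.items():
    print(f"{part}) {y_hat:.3f}")
```

Part (b) is the one most often missed: the model predicts log ŷ, so the answer is 10 raised to that value, not the value itself.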
Example: Zurich Zoo
The following data are the shoulder-hip length and the vertical thickness of the bodies of some quadrupeds at the zoo in Zurich, Switzerland. Predict the vertical thickness of a giraffe if the shoulder-hip length is 145 cm.
Animal | Length (cm) | Height (cm)
Ermine | 12 | 4
Dachshund | 35 | 12
Indian Tiger | 90 | 45
Llama | 122 | 73
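A log-log model is one reasonable candidate for size data like these. The sketch fits log(height) against log(length) by least squares using only the four animals listed above, then back-transforms the prediction at 145 cm. The choice of rung and the helper name `fit_line` are assumptions, not the slide's solution. Note that 145 cm is beyond the largest length listed (122 cm), so this is an extrapolation.

```python
import math

# The four animals listed above.
length = [12, 35, 90, 122]  # shoulder-hip length (cm)
height = [4, 12, 45, 73]    # vertical thickness (cm)


def fit_line(xs, ys):
    """Least-squares intercept and slope for ys = b0 + b1 * xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx
    return my - b1 * mx, b1


# Candidate re-expression (an assumption): log(height) against log(length).
b0, b1 = fit_line([math.log10(v) for v in length], [math.log10(v) for v in height])
log_pred = b0 + b1 * math.log10(145)  # predicted log(height) for the giraffe
pred = 10 ** log_pred                  # back-transform to centimeters
print(f"log(height) = {b0:.3f} + {b1:.3f} log(length)")
print(f"predicted vertical thickness at 145 cm: about {pred:.0f} cm")
```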
Example: Pressure and Volume
We attempt to find how the volume of a gas depends on the temperature and pressure of the gas. If temperature is held constant at 300 K, the following results are obtained. Predict the volume if the pressure is 325.
Pressure 200 250 300 350 400
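The volume measurements did not survive in this copy of the slide, so the sketch below takes them as input. It tries the reciprocal rung, fitting volume against 1/pressure, which is a natural first guess because for an ideal gas at constant temperature volume varies inversely with pressure. The function names are hypothetical, and the rung should still be confirmed with a residual plot.

```python
def fit_reciprocal(pressure, volume):
    """Least-squares fit of volume = b0 + b1 * (1 / pressure)."""
    inv = [1 / p for p in pressure]  # re-expressed x-variable
    n = len(inv)
    mx, my = sum(inv) / n, sum(volume) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(inv, volume))
    sxx = sum((x - mx) ** 2 for x in inv)
    b1 = sxy / sxx
    return my - b1 * mx, b1


def predict_volume(b0, b1, pressure):
    """Plug a new pressure into the re-expressed model."""
    return b0 + b1 / pressure
```

With the slide's pressures [200, 250, 300, 350, 400] and the matching volumes, `b0, b1 = fit_reciprocal(pressures, volumes)` followed by `predict_volume(b0, b1, 325)` gives the requested prediction.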
Example: Soil Erosion
The problem of soil erosion is faced by farmers all over the world. The following data were collected in a study in western India. Predict the amount of erosion if the wind velocity is 24 km/hr.
Velocity (km/hr) | 13.5 | 13.5 | 14 | 15 | 17.5 | 19 | 20 | 21 | 22 | 23 | 25 | 25 | 26 | 27
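The erosion measurements did not survive in this copy of the slide, so the sketch below takes velocities and erosion amounts as inputs. It re-expresses erosion with each rung of the Ladder of Powers, keeps the rung whose straight-line fit has the largest |r|, and back-transforms the prediction into the original units. Choosing a rung by |r| alone is a shortcut assumed here, not the chapter's method (look at the residuals too), and all helper names are hypothetical.

```python
import math

# Rungs from the Ladder of Powers table; 0 stands for log base 10.
LADDER = [2, 1, 0.5, 0, -0.5, -1]


def transform(v, p):
    return math.log10(v) if p == 0 else v ** p


def untransform(t, p):
    """Undo transform(): solve t = y**p (or t = log10 y) for y."""
    if p == 0:
        return 10 ** t
    if p != 1 and t <= 0:
        raise ValueError("prediction falls outside the range this rung can invert")
    return t ** (1 / p)


def fit_line(xs, ys):
    """Least-squares intercept, slope, and correlation for ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    b1 = sxy / sxx
    return my - b1 * mx, b1, sxy / math.sqrt(sxx * syy)


def ladder_predict(xs, ys, x_new):
    """Return (power, prediction) using the rung with the straightest fit."""
    if min(ys) <= 0:
        raise ValueError("every response must be positive to try all rungs")
    best = None
    for p in LADDER:
        b0, b1, r = fit_line(xs, [transform(y, p) for y in ys])
        if best is None or abs(r) > abs(best[3]):
            best = (p, b0, b1, r)
    p, b0, b1, _ = best
    return p, untransform(b0 + b1 * x_new, p)
```

Usage with the slide's data would be `power, erosion_hat = ladder_predict(velocity, erosion, 24)`, where `erosion` is the missing row of measurements.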
Example: Female Heights and Weights
Consider the data on x = height (in.) and y = average weight (lb.) for American females aged 30-39. Predict the weight of a female who is 64.5 inches tall.
Height, x (in.) | 58 | 59 | 60 | 61 | 62 | 63 | 64 | 65 | 66
Weight, y (lb.) | 113 | 115 | 118 | 121 | 124 | 128 | 131 | 134 | 137
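Before reaching for the ladder, it helps to fit the plain line and inspect its residuals. The sketch below does that with the data above; if the residuals show a pattern, a rung of the Ladder of Powers (for example the log or square root of weight) would be the next thing to try. Variable names are illustrative.

```python
heights = [58, 59, 60, 61, 62, 63, 64, 65, 66]          # x, inches
weights = [113, 115, 118, 121, 124, 128, 131, 134, 137]  # y, pounds

n = len(heights)
mean_x, mean_y = sum(heights) / n, sum(weights) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(heights, weights)) / sum(
    (x - mean_x) ** 2 for x in heights
)
intercept = mean_y - slope * mean_x

# Residuals reveal whether a straight line is adequate or a re-expression is needed.
residuals = [y - (intercept + slope * x) for x, y in zip(heights, weights)]
prediction = intercept + slope * 64.5

print(f"weight-hat = {intercept:.2f} + {slope:.2f} * height")
print("residuals:", [round(e, 2) for e in residuals])
print(f"predicted weight at 64.5 in.: {prediction:.1f} lb")
```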
Example: Shoes!
Cyrus Tist was trying to determine how the pressure exerted on the floor by the heel of a shoe depends on the width of the heel and the weight of the person wearing the shoe. He started by measuring the pressure (in psi) exerted by several people wearing a shoe with a heel width of 3.5 inches. The data are summarized below. Predict the pressure exerted by a heel 3.5 inches wide if the person weighs 175 pounds.
Why Not Just Use a Curve?
If there’s a curve in the scatterplot, why not just fit
Why Not Just Use a Curve? (cont.)
The mathematics and calculations for “curves of best fit” are considerably more difficult than those for “lines of best fit.”
Besides, straight lines are easy to understand.
We know how to think about the slope and the
What Can Go Wrong?
Don’t expect your model to be perfect.
Don’t stray too far from the ladder.
Don’t choose a model
What Can Go Wrong? (cont.)
Beware of multiple modes.
Re-expression cannot pull separate modes together.
Watch out for scatterplots that turn around.
Re-expression can straighten many bent
What Can Go Wrong? (cont.)
Watch out for negative data values.
It’s impossible to re-express negative values by any power that is not a whole number on the Ladder of Powers, or to re-express values that are zero with negative powers.
Watch for data far from 1.
Data values that are all very far from 1 may not be much affected by re-expression unless the range is very large. If all the data values are large (e.g., years), consider subtracting a
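The sketch below illustrates why values far from 1 resist re-expression. It uses a hypothetical run of years, 2000 through 2010, and measures how close log(x) stays to a straight-line function of x: an |r| near 1 means the log barely changed the shape. Subtracting 1999 (an arbitrary choice made only for this example) moves the values down to 1 through 11, and the log then bends them noticeably.

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient r between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical x-values that are all far from 1 (calendar years).
years = list(range(2000, 2011))
# Hypothetical shift: subtracting 1999 makes the values run from 1 to 11.
shifted = [y - 1999 for y in years]

# r between x and log(x) near 1 means the log left the shape almost unchanged.
r_years = pearson_r(years, [math.log10(y) for y in years])
r_shifted = pearson_r(shifted, [math.log10(s) for s in shifted])
print(f"r for years: {r_years:.7f}")
print(f"r after shifting: {r_shifted:.3f}")
```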
What have we learned?
When the conditions for regression are not met, a simple re-expression of the data may help.
A re-expression may make the:
Distribution of a variable more symmetric.
Spread across different groups more similar.
Form of a scatterplot straighter.
Scatter around the line in a scatterplot more
What have we learned? (cont.)
Taking logs is often a good, simple starting point.
To search further, the Ladder of Powers or the log-log approach can help us find a good re-expression.
Our models won’t be perfect, but re-expression
AP Tips
Make sure that you can make accurate predictions using a transformed equation.
Make sure that your descriptions use the transformed variable names, not the original variables, as appropriate.
For example, “89.6% of the variation in log(weight)…”
Don’t get lost in the technology. Most AP