Example: Boats and Manatees

(1)

Slide 1

Example: Boats and Manatees

Given the sample data in Table 9-1, find the value of the linear correlation coefficient r, then refer to Table A-6 to determine whether there is a significant linear correlation between the number of registered boats and the number of manatees killed by boats. Use method 2.

Using the same procedure previously illustrated, we find that r = 0.922.

Method 2: Referring to Table A-6, we conclude that there is a significant linear correlation between number of registered boats and number of manatee deaths from boats.

Figure 9-6

(2)

Slide 2

Use method 1

$$t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \frac{0.922}{\sqrt{\dfrac{1 - 0.922^2}{10 - 2}}} = 6.735$$
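The Method 1 arithmetic can be checked with a few lines of Python. This is a minimal sketch: the function name `t_statistic` is an illustrative choice, and r = 0.922 and n = 10 are the values used on the slide.

```python
import math

def t_statistic(r: float, n: int) -> float:
    """Test statistic for a linear correlation: t = r / sqrt((1 - r^2) / (n - 2))."""
    return r / math.sqrt((1 - r**2) / (n - 2))

# Values from the boats and manatees example.
t = t_statistic(0.922, 10)
print(f"t = {t:.3f}")  # t = 6.735
```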

(3)

Slide 3

Using either of the two methods, we find that the test statistic falls in the critical region.

Method 1: 6.735 > 2.306.

Method 2: 0.922 > 0.632.

Conclusion: We therefore reject the null hypothesis. There is sufficient evidence at significance level 0.05 to support the claim of a linear correlation between the number of registered boats and the number of manatee deaths from boats.
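The two methods always agree, because solving the t formula for r gives r = t / sqrt(t² + n – 2). The sketch below applies that algebra to the critical t value 2.306 from the slide and recovers the critical r value 0.632. The helper name `critical_r` is an illustrative choice, not something from the slides.

```python
import math

def critical_r(t_crit: float, n: int) -> float:
    """Convert a critical t value (n - 2 degrees of freedom) into a critical r value."""
    return t_crit / math.sqrt(t_crit**2 + (n - 2))

n = 10
t_crit = 2.306                    # critical t value quoted on the slide
r_crit = critical_r(t_crit, n)    # should match the Table A-6 value 0.632

r = 0.922
t = r / math.sqrt((1 - r**2) / (n - 2))

reject_by_t = abs(t) > t_crit     # Method 1
reject_by_r = abs(r) > r_crit     # Method 2
print(round(r_crit, 3), reject_by_t, reject_by_r)  # 0.632 True True
```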

(4)

Slide 4

Figure 9-4 Testing for a Linear Correlation

(5)

Slide 5

Interpreting r: Explained Variation

The value of r² is the proportion of the variation in y that is explained by the linear relationship between x and y.

Manatee example:

With r = 0.922, we get r² = 0.850.

We conclude that 0.850 (or about 85%) of the variation in manatee deaths can be explained by the linear relationship between the number of boat registrations and the number of manatee deaths from boats. This implies that about 15% of the variation in manatee deaths cannot be explained by the number of boat registrations.
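As a quick check of the explained and unexplained proportions, here is a short sketch using the slide's value r = 0.922. The variable names are illustrative.

```python
r = 0.922
r_squared = r ** 2

explained = round(r_squared, 3)        # proportion of variation in y explained by x
unexplained = round(1 - r_squared, 3)  # proportion left unexplained

print(f"explained: {explained:.3f}, unexplained: {unexplained:.3f}")
# explained: 0.850, unexplained: 0.150
```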

(6)

Slide 6

9.3 Regression

Regression Equation

Variables:

X = independent variable, predictor variable, or explanatory variable

Y = dependent variable or response variable

A straight line (linear relationship):

Y = a + bX,

where a is the y-intercept and b is the slope.

(7)

Slide 7

Assumptions

For each x-value,

• Normality: Y is a random variable having a normal (bell-shaped) distribution.

• Homogeneity: All of these y distributions have the same variance.

• Linearity: The distribution of y-values has a mean that lies on the regression line.

(Results are not seriously affected if departures from normal distributions and equal variances are not too extreme.)

(8)

Slide 8

Regression Equation

Given paired sample data (x, y) of size n satisfying the population parameter equation below:

                                      Population Parameter    Sample Statistic
y-intercept of regression equation    β₀                      b₀
Slope of regression equation          β₁                      b₁
Equation of the regression line       y = β₀ + β₁x            ŷ = b₀ + b₁x

In the regression line, b₀ estimates β₀ and b₁ estimates β₁.

Q: How to get b₀ and b₁?

(9)

Slide 9

Formula for b₀ and b₁

Formula 9-2 (slope):

$$b_1 = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}$$

Formula 9-3 (y-intercept):

$$b_0 = \bar{y} - b_1 \bar{x}$$

Calculators or computers can compute these values.
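One way a computer might carry out Formulas 9-2 and 9-3 is sketched below. The function name `regression_coefficients` and the four sample points (chosen to lie exactly on y = 1 + 2x) are hypothetical and used only for illustration.

```python
def regression_coefficients(xs, ys):
    """Return (b0, b1) for the least-squares line, using Formulas 9-2 and 9-3."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    denominator = n * sum_x2 - sum_x**2
    if denominator == 0:
        raise ValueError("all x-values are equal; the slope is undefined")

    b1 = (n * sum_xy - sum_x * sum_y) / denominator   # Formula 9-2 (slope)
    b0 = sum_y / n - b1 * (sum_x / n)                 # Formula 9-3 (y-intercept)
    return b0, b1

# Hypothetical points lying exactly on y = 1 + 2x.
b0, b1 = regression_coefficients([0, 1, 2, 3], [1, 3, 5, 7])
print(b0, b1)  # 1.0 2.0
```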

(10)

Slide 10

The regression line fits the sample points best.

(11)

Slide 11

Calculating the Regression Equation

Data

x    y
1    2
1    8
3    6
5    4

In Section 9-2, we used these values to find that the linear correlation coefficient is r = –0.135. Use this sample to find the regression equation.

n = 4, Σx = 10, Σy = 20, Σx² = 36, Σy² = 120, Σxy = 48

$$b_1 = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2} = \frac{4(48) - (10)(20)}{4(36) - (10)^2} = \frac{-8}{44} = -0.181818$$

$$b_0 = \bar{y} - b_1 \bar{x} = 5 - (-0.181818)(2.5) = 5.45$$

The estimated equation of the regression line is:

$$\hat{y} = 5.45 - 0.182x$$
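The sums and coefficients for this small sample can be reproduced as follows. This is a sketch with illustrative variable names; the formula used for r is the standard computational formula for the linear correlation coefficient, assumed here to match the one in Section 9-2.

```python
import math

xs = [1, 1, 3, 5]
ys = [2, 8, 6, 4]
n = len(xs)

sum_x, sum_y = sum(xs), sum(ys)
sum_x2 = sum(x * x for x in xs)
sum_y2 = sum(y * y for y in ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))

b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x**2)  # slope
b0 = sum_y / n - b1 * (sum_x / n)                            # y-intercept

# Linear correlation coefficient, computed from the same sums.
r = (n * sum_xy - sum_x * sum_y) / (
    math.sqrt(n * sum_x2 - sum_x**2) * math.sqrt(n * sum_y2 - sum_y**2)
)

print(sum_x, sum_y, sum_x2, sum_y2, sum_xy)  # 10 20 36 120 48
print(f"b1 = {b1:.6f}, b0 = {b0:.2f}, r = {r:.3f}")
# b1 = -0.181818, b0 = 5.45, r = -0.135
```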

(12)

Slide 12

Example: Boats and Manatees

(13)

Slide 13

Example: Boats and Manatees

Given the sample data in Table 9-1, find the regression equation.

Using the same procedure as in the previous example, or a computer, we find that b₁ = 2.27 and b₀ = –113. The estimated regression equation is:

ŷ = –113 + 2.27x

(14)

Slide 14

Predictions

In predicting a value of y based on some given value of x ...

1. If there is not a significant linear correlation, the best predicted y-value is ȳ, the mean of the sample y-values.

2. If there is a significant linear correlation, the best predicted y-value is found by substituting the x-value into the regression equation.
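The two-case prediction rule can be written as a small function. In this sketch the name `best_predicted_y` and the `significant` flag are illustrative. The flag is supplied by hand and stands for the outcome of the correlation test. The numbers come from the four-point example (b₀ = 5.45, b₁ = –0.182, mean of y = 5), and treating that sample's correlation as not significant is an assumption made here for illustration.

```python
def best_predicted_y(x, b0, b1, y_mean, significant):
    """Apply the prediction rule: use the regression line only if the
    linear correlation is significant; otherwise fall back to the mean of y."""
    if significant:
        return b0 + b1 * x
    return y_mean

# Four-point example: r = -0.135, treated here as not significant.
print(best_predicted_y(3, 5.45, -0.182, 5.0, significant=False))  # 5.0

# The same call with the flag set to True uses the regression equation instead.
line_value = best_predicted_y(3, 5.45, -0.182, 5.0, significant=True)
print(f"{line_value:.3f}")  # 4.904
```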

(15)

Slide 15

Figure 9-8 Predicting the Value of a Variable

(16)

Slide 16

Revisit Boats and Manatees Example

We do have a significant linear correlation (with r = 0.922).

The best regression equation is ŷ = –113 + 2.27x.

Assume that in 2001 there were 850,000 registered boats. Because Table 9-1 lists the numbers of registered boats in tens of thousands, this means that for 2001 we have x = 85.

Q: Given that x = 85, find the best predicted value of y, the number of manatee deaths from boats.

(17)

Slide 17

Example: Boats and Manatees

ŷ = –113 + 2.27x = –113 + 2.27(85) = 80.0

The predicted number of manatee deaths is 80.0. The actual number of manatee deaths in 2001 was 82, so the predicted value of 80.0 is quite close.
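The prediction can be reproduced in a few lines, including the unit conversion from boats to tens of thousands of boats. Variable names are illustrative. The unrounded value is 79.95, which the slide reports as 80.0.

```python
b0, b1 = -113, 2.27            # coefficients of the estimated regression equation

registered_boats = 850_000
x = registered_boats / 10_000  # Table 9-1 uses tens of thousands, so x = 85

y_hat = b0 + b1 * x            # predicted manatee deaths from boats
actual = 82                    # actual number of deaths in 2001 (from the slide)
difference = actual - y_hat

print(f"x = {x:.0f}, predicted = {y_hat:.2f}, actual - predicted = {difference:.2f}")
# x = 85, predicted = 79.95, actual - predicted = 2.05
```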

(18)

Slide 18

Guidelines for Using The Regression Equation

1. If there is no significant linear correlation, don’t use the regression equation to make predictions.

2. When using the regression equation for predictions, stay within the scope of the available sample data.

3. A regression equation based on old data is not necessarily valid now.

4. Don’t make predictions about a population that is different from the population from which the sample data was drawn.

(19)

Slide 19

Definitions

Marginal Change: The marginal change is the amount that a variable changes when the other variable changes by exactly one unit.

Outlier: An outlier is a point lying far away from the other data points.

Influential Points: An influential point strongly affects the graph of the regression line.
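For a regression line, the marginal change in the predicted y when x increases by one unit is the slope b₁. The sketch below illustrates this with the manatee equation. The function name `predict` is an illustrative choice.

```python
def predict(x, b0=-113, b1=2.27):
    """Predicted y from the manatee regression equation."""
    return b0 + b1 * x

# Increase x by exactly one unit (10,000 more registered boats).
marginal_change = predict(86) - predict(85)
print(f"{marginal_change:.2f}")  # 2.27, the slope b1
```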

(20)

Slide 20

Residuals and the Least-Squares Property

Definitions

Residual: For a sample of paired (x, y) data, the difference (y – ŷ) between an observed sample y-value and the value of ŷ, which is the value of y that is predicted by using the regression equation.

Least-Squares Property: A straight line satisfies this property if the sum of the squares of the residuals is the smallest sum possible.

(21)

Slide 21

x    1    2    4    5
y    4    24   8    32

ŷ = 5 + 4x

Figure 9-9
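The residuals for this four-point sample and the least-squares property can be illustrated numerically. In this sketch the helper `sum_squared_residuals` and the two competing lines are hypothetical and chosen only for comparison. Any line other than ŷ = 5 + 4x should give a larger sum of squared residuals.

```python
xs = [1, 2, 4, 5]
ys = [4, 24, 8, 32]

def sum_squared_residuals(b0, b1):
    """Sum of squared residuals (y - y_hat)^2 for the line y_hat = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Residuals y - y_hat for the regression line y_hat = 5 + 4x.
residuals = [y - (5 + 4 * x) for x, y in zip(xs, ys)]
print(residuals)  # [-5, 11, -13, 7]

best = sum_squared_residuals(5, 4)      # regression line
shifted = sum_squared_residuals(4, 4)   # hypothetical line with a different intercept
steeper = sum_squared_residuals(5, 5)   # hypothetical line with a different slope
print(best, shifted, steeper)  # 364 368 410
```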
