STATISTICS
INFORMED DECISIONS USING DATA
Fifth Edition, Global Edition
Chapter 4
Describing the Relation between Two Variables
4.1 Scatter Diagrams and Correlation
Learning Objectives
1. Draw and interpret scatter diagrams
2. Describe the properties of the linear correlation coefficient
3. Compute and interpret the linear correlation coefficient
4. Determine whether a linear relation exists between two variables
4.1.1 Draw and Interpret Scatter Diagrams (1 of 6)
The response variable is the variable whose value can be explained by the value of the explanatory or predictor variable.
A scatter diagram is a graph that shows the relationship between two quantitative variables measured on the same individual. Each individual in the data set is represented by a point in the scatter diagram. The explanatory variable is plotted on the horizontal axis, and the response variable is plotted on the vertical axis.
4.1.1 Draw and Interpret Scatter Diagrams (2 of 6)
EXAMPLE Drawing and Interpreting a Scatter Diagram
The data shown in the table are based on a study of rock drilling. The researchers wanted to determine whether the time it takes to dry drill a distance of 5 feet in rock increases with the depth at which the drilling begins. So, depth at which drilling begins is the explanatory variable, x, and time (in minutes) to drill five feet is the response variable, y.
[Table: Depth at Which Drilling Begins, x (in feet); Time to Drill 5 Feet, y (in minutes)]
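4.1.1 Draw and Interpret Scatter Diagrams (5 of 6)
Two variables that are linearly related are positively associated when above-average values of one variable are associated with above-average values of the other variable.
4.1.1 Draw and Interpret Scatter Diagrams (6 of 6)
Two variables that are linearly related are negatively associated when above-average values of one variable are associated with below-average values of the other variable.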
4.1.2 Describe the Properties of the Linear Correlation Coefficient (3 of 6)
Properties of the Linear Correlation Coefficient
1. The linear correlation coefficient is always between −1 and 1, inclusive. That is, −1 ≤ r ≤ 1.
2. If r = +1, then a perfect positive linear relation exists between the two variables.
3. If r = −1, then a perfect negative linear relation exists between the two variables.
4. The closer r is to +1, the stronger the evidence of a positive association between the two variables.
5. The closer r is to −1, the stronger the evidence of a negative association between the two variables.
4.1.2 Describe the Properties of the Linear Correlation Coefficient (4 of 6)
6. If r is close to 0, then little or no evidence exists of a linear relation between the two variables. So r close to 0 does not imply no relation, just no linear relation.
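7. The linear correlation coefficient is a unitless measure of association. So the unit of measure for x and y plays no role in the interpretation of r.
8. The correlation coefficient is not resistant. Therefore, an observation that does not follow the overall pattern of the data could affect the value of the linear correlation coefficient.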
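Several of the properties above can be checked numerically. The sketch below is only an illustration: the helper name pearson_r and all data values are hypothetical and are not taken from the text. It assumes the usual sample formula for r (sum of cross-deviations divided by the square root of the product of the sums of squared deviations).

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample linear correlation coefficient of paired data (hypothetical helper)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Made-up illustrative values, NOT data from the text.
x_feet = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r_feet = pearson_r(x_feet, y)

# Unitless: converting x from feet to meters leaves r unchanged.
x_meters = [0.3048 * v for v in x_feet]
r_meters = pearson_r(x_meters, y)

# Perfect positive and perfect negative linear relations.
r_up = pearson_r(x_feet, [3 * v + 1 for v in x_feet])
r_down = pearson_r(x_feet, [10 - 2 * v for v in x_feet])

# Not resistant: one observation far from the pattern changes r a lot.
r_outlier = pearson_r(x_feet + [6], y + [-10])

print(f"r (feet)     = {r_feet:.4f}")
print(f"r (meters)   = {r_meters:.4f}")
print(f"r (y = 3x+1) = {r_up:.4f}")
print(f"r (y = 10-2x)= {r_down:.4f}")
print(f"r (outlier)  = {r_outlier:.4f}")
```

In this made-up data set, adding a single unusual point is enough to flip the sign of r, which is why a scatter diagram should always accompany the coefficient.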
4.1.3 Compute and Interpret the Linear Correlation Coefficient (1 of 5)
EXAMPLE Determining the Linear Correlation Coefficient
Determine the linear correlation coefficient of the drilling data.
[Table: Depth at Which Drilling Begins, x (in feet); Time to Drill 5 Feet, y (in minutes)]
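The drilling values are not reproduced in this text, so the sketch below uses made-up placeholder data to show one common way of organizing the computation by hand: standardize each x and y, multiply the z-scores, add the products, and divide by n − 1. The function name correlation_table is hypothetical.

```python
from statistics import mean, stdev

def correlation_table(xs, ys):
    """Return the z-score work table and r for paired data (hypothetical helper)."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    rows = []
    for x, y in zip(xs, ys):
        z_x = (x - x_bar) / s_x
        z_y = (y - y_bar) / s_y
        rows.append((x, y, z_x, z_y, z_x * z_y))
    # r is the sum of the z-score products divided by n - 1
    r = sum(row[4] for row in rows) / (len(xs) - 1)
    return rows, r

# Placeholder values, NOT the drilling data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

rows, r = correlation_table(xs, ys)
print(f"{'x':>4} {'y':>4} {'z_x':>8} {'z_y':>8} {'z_x*z_y':>8}")
for x, y, z_x, z_y, prod in rows:
    print(f"{x:>4} {y:>4} {z_x:>8.4f} {z_y:>8.4f} {prod:>8.4f}")
print(f"r = {r:.3f}")
```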
4.1.3 Compute and Interpret the Linear Correlation Coefficient (4 of 5)
IN CLASS ACTIVITY
Correlation
Randomly select six students from the class and have them determine their at-rest pulse rates and then discuss the following:
1. When determining each at-rest pulse rate, would it be better to count beats for 30 seconds and multiply by 2 or count beats for 1 full minute? Explain. What are some other ways to find the at-rest pulse rate? Do any of these methods have an advantage?
2. What effect will physical activity have on pulse rate?
4.1.3 Compute and Interpret the Linear Correlation Coefficient (5 of 5)
4. Draw a scatter diagram for the pulse data using the at-rest data as the explanatory variable.
5. Comment on the relationship, if any, between the two variables. Is this consistent with your expectations?
4.1.4 Determine whether a Linear Relation Exists between Two Variables (1 of 2)
Testing for a Linear Relation
Step 1 Determine the absolute value of the correlation coefficient.
Step 2 Find the critical value in Table II for the given sample size.
Step 3 If the absolute value of the correlation coefficient is greater than the critical value, we say a linear relation exists between the two variables. Otherwise, no linear relation exists.
4.1.4 Determine whether a Linear Relation Exists between Two Variables (2 of 2)
EXAMPLE Does a Linear Relation Exist?
Determine whether a linear relation exists between time to drill five feet and depth at which drilling begins. Comment on the type of relation that appears to exist between time to drill five feet and depth at which drilling begins.
The correlation between drilling depth and time to drill is 0.773. The critical value for n = 12 observations is 0.576. Since 0.773 > 0.576, there is a positive linear relation between time to drill five feet and depth at which drilling begins.
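The comparison in this example can be written as a short sketch. The function names are hypothetical, and the critical value is passed in by hand because Table II is not reproduced here; only the numbers 0.773 and 0.576 come from the example.

```python
def linear_relation_exists(r, critical_value):
    """Compare |r| with the critical value for the sample size."""
    return abs(r) > critical_value

def describe_relation(r, critical_value):
    """Classify the relation; the sign of r gives the direction."""
    if not linear_relation_exists(r, critical_value):
        return "no linear relation"
    return "positive linear relation" if r > 0 else "negative linear relation"

# Values from the drilling example: r = 0.773, critical value 0.576 for n = 12.
drilling = describe_relation(0.773, 0.576)
print(drilling)
```

Taking the absolute value first means the same critical value serves for both positive and negative correlations.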
[Table II: Critical Values for Correlation Coefficient]
4.1.5 Explain the Difference between Correlation and Causation (1 of 8)
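According to data obtained from the Statistical Abstract of the United States, the correlation between the percentage of the female population with a bachelor’s degree and the percentage of births to unmarried mothers since 1990 is 0.940. Does this mean that a higher percentage of females with bachelor’s degrees causes a higher percentage of births to unmarried mothers?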
4.1.5 Explain the Difference between Correlation and Causation (2 of 8)
Certainly not! The correlation exists only because both percentages have been increasing since 1990. It is this relation that causes the high correlation. In general, time series data (data collected over time) may have high correlations because each variable is moving in a specific direction over time (both going up or down over time; one increasing, while the other is decreasing over time).
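A small numerical sketch of this point follows. All three series are invented for illustration and have nothing to do with the Statistical Abstract data; the helper pearson_r is hypothetical. Two unrelated quantities that both trend upward over the same years show a high positive r, and a series that trends downward shows a strongly negative r.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample linear correlation coefficient (hypothetical helper)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Invented yearly measurements, one value per year over six years.
series_a = [10, 12, 13, 15, 18, 20]   # rising over time
series_b = [26, 28, 31, 33, 34, 38]   # also rising, otherwise unrelated
series_c = [50, 47, 45, 40, 38, 33]   # falling over time

r_ab = pearson_r(series_a, series_b)  # both increasing
r_ac = pearson_r(series_a, series_c)  # one increasing, one decreasing

print(f"r(a, b) = {r_ab:.3f}")
print(f"r(a, c) = {r_ac:.3f}")
```

Neither result says anything about one series causing the other; the shared trend over time is enough to produce the large |r|.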
4.1.5 Explain the Difference between Correlation and Causation (3 of 8)
Another way that two variables can be related even though there is not a causal relation is through a lurking variable.
A lurking variable is related to both the explanatory and response variables.
4.1.5 Explain the Difference between Correlation and Causation (4 of 8)
EXAMPLE Lurking Variables in a Bone Mineral Density Study
Because colas tend to replace healthier beverages and colas contain caffeine and phosphoric acid, researchers Katherine L. Tucker and associates wanted to know whether cola consumption is associated with lower bone mineral density in women. The table lists the typical number of cans of cola consumed in a week and the femoral neck bone mineral density for a sample of 15 women. The data were collected through a prospective cohort study.
Table 4 Number of Colas per Week
4.1.5 Explain the Difference between Correlation and Causation (5 of 8)
EXAMPLE Lurking Variables in a Bone Mineral Density Study
The figure on the next slide shows the scatter diagram of the data. The correlation between number of colas per week and bone mineral density is −0.806. The critical value for correlation with n = 15 from Table II in Appendix A is 0.514. Because |−0.806| > 0.514, we conclude that a negative linear relation exists between number of colas per week and bone mineral density.
4.1.5 Explain the Difference between Correlation and Causation (7 of 8)
EXAMPLE Lurking Variables in a Bone Mineral Density Study
In prospective cohort studies, data are collected on a group of subjects through questionnaires and surveys over time. Therefore, the data are observational. So the researchers cannot claim that increased cola consumption causes a decrease in bone mineral density.
Some lurking variables in the study that could confound the results are:
• body mass index
• height
4.1.5 Explain the Difference between Correlation and Causation (8 of 8)
EXAMPLE Lurking Variables in a Bone Mineral Density Study
The authors were careful to say that increased cola consumption is associated with lower bone mineral density because of potential lurking variables. They never stated that increased cola consumption causes lower bone mineral density.
4.2 Least-squares Regression
Learning Objectives
1. Find the least-squares regression line and use the line to make predictions
2. Interpret the slope and the y-intercept of the least-squares regression line
EXAMPLE Finding an Equation that Describes Linearly Related Data (1 of 2)
Using the following sample data:
x 0 2 3 5 6 6
4.2.1 Find the Least-Squares Regression Line and Use the Line to Make Predictions (1 of 7)
The difference between the observed value of y and the predicted value of y is the error, or residual.
Using the line from the last example, and the predicted value at x = 3:
residual = observed y − predicted y
= 5.2 − 4.75
= 0.45
4.2.1 Find the Least-Squares Regression Line and Use the Line to Make Predictions (3 of 7)
The Least-Squares Regression Line
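The defining formulas did not survive on this slide. As a hedged sketch, the code below assumes the standard formulas for the least-squares line ŷ = b1·x + b0, namely b1 = r·(s_y / s_x) and b0 = ȳ − b1·x̄. The function name and the data values are hypothetical and are not from the text.

```python
from statistics import mean, stdev

def least_squares_line(xs, ys):
    """Return (slope b1, intercept b0) of the least-squares regression line."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    # linear correlation coefficient
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    b1 = r * s_y / s_x        # slope
    b0 = y_bar - b1 * x_bar   # intercept: the line passes through (x_bar, y_bar)
    return b1, b0

# Made-up illustrative values, NOT data from the text.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = least_squares_line(xs, ys)
predicted = [intercept + slope * x for x in xs]
# residual = observed y - predicted y
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]

print(f"y-hat = {slope:.4f}x + {intercept:.4f}")
print("residuals:", [round(e, 4) for e in residuals])
```

For a least-squares line the residuals add up to zero (apart from rounding), which is a quick check on a hand computation.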
4.2.1 Find the Least-Squares Regression Line and Use the Line to Make Predictions (5 of 7)
EXAMPLE Finding the Least-squares Regression Line
Using the drilling data:
(a) Find the least-squares regression line.
(b) Predict the drilling time if drilling starts at 130 feet.
(c) Is the observed drilling time at 130 feet above or below average?
(d) Draw the least-squares regression line on the scatter diagram of the data.
[Table: Depth at Which Drilling Begins, x (in feet)]
4.2.2 Interpret the Slope and the y-Intercept of the Least-Squares Regression Line (1 of 3)
Interpretation of Slope:
The slope of the regression line is 0.0116. For each additional foot of depth at which we start drilling, the time to drill five feet increases by 0.0116 minutes, on average.
4.2.2 Interpret the Slope and the y-Intercept of the Least-Squares Regression Line (2 of 3)
Interpretation of the y-Intercept:
The y-intercept of the regression line is 5.5273. To interpret the y-intercept, we must first ask two questions:
1. Is 0 a reasonable value for the explanatory variable?
2. Do any observations near x = 0 exist in the data set?
A value of 0 is reasonable for the drilling data (this indicates that drilling begins at the surface of Earth). The smallest observation in the data set is x = 35 feet, which is reasonably close to 0. So, interpretation of the y-intercept is reasonable.
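With the slope 0.0116 and y-intercept 5.5273 stated in the text, the regression line can be used for prediction. The sketch below is illustrative only; the function and constant names are hypothetical.

```python
SLOPE = 0.0116      # minutes per additional foot of starting depth (from the text)
INTERCEPT = 5.5273  # predicted minutes when drilling begins at x = 0 (from the text)

def predicted_drill_time(depth_ft):
    """Predicted time (minutes) to drill five feet, starting at depth_ft feet."""
    return SLOPE * depth_ft + INTERCEPT

time_at_130 = predicted_drill_time(130)       # prediction when drilling starts at 130 feet
time_at_surface = predicted_drill_time(0)     # the y-intercept
extra_per_foot = predicted_drill_time(131) - predicted_drill_time(130)  # the slope

print(f"predicted time at 130 ft: {time_at_130:.2f} minutes")
print(f"predicted time at 0 ft:   {time_at_surface:.4f} minutes")
print(f"change per extra foot:    {extra_per_foot:.4f} minutes")
```

An observed drilling time at 130 feet can then be compared with the predicted value of about 7.04 minutes to decide whether it is above or below average.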
4.2.3 Compute the Sum of Squared Residuals
To illustrate the fact that the sum of squared residuals for a least-squares regression line is less than the sum of squared residuals for any other line, use the “regression by eye” applet.
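The same fact can be checked numerically. In the sketch below the data and the "guessed" lines are made up for illustration, and the function names are hypothetical; the least-squares slope is computed from the standard deviation-sum formula.

```python
def sum_squared_residuals(xs, ys, slope, intercept):
    """Sum of (observed y - predicted y)^2 for the line y-hat = slope*x + intercept."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares regression line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    return slope, y_bar - slope * x_bar

# Made-up illustrative values, NOT data from the text.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

best_slope, best_intercept = least_squares_line(xs, ys)
ssr_best = sum_squared_residuals(xs, ys, best_slope, best_intercept)

# A few lines one might draw "by eye" as (slope, intercept) pairs.
guesses = [(0.5, 2.5), (1.0, 1.0), (0.7, 2.0), (0.6, 2.5)]
ssr_guesses = [sum_squared_residuals(xs, ys, m, b) for m, b in guesses]

print(f"least-squares line: SSR = {ssr_best:.4f}")
for (m, b), ssr in zip(guesses, ssr_guesses):
    print(f"y-hat = {m}x + {b}: SSR = {ssr:.4f}")
```

Every guessed line gives a larger sum of squared residuals than the least-squares line, even the ones that look nearly identical on a plot.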