COMPARING DISTRIBUTIONS
Lesson 1: Comparing centers
LESSON 1: OPENER
In the last topic, you looked at data researchers collected on the time it takes drivers to react to a change in driving environment while writing a text message. Along with this data, the researchers also collected data on how long it takes drivers to react when not engaged in any distracting activity. Here are the additional data they collected:
Reaction Time with No Distractions (in seconds)
2.7 1.0 3.0 1.4 3.0
2.0 0.8 1.2 2.1 2.2
0.7 2.4 1.1 0.8 1.0
2.2 2.5 3.1 1.7 3.3
1. Compute the five-number summary for the data.
Min = 0.7 seconds
Q1 = 1.05 seconds
Median = 2.05 seconds
Q3 = 2.6 seconds
Max = 3.3 seconds
2. Find the mean of the data.
Mean = 1.91 seconds
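The five-number summary and mean can be checked with a short script. This is a minimal sketch, assuming quartiles are found with the median-of-halves rule (Q1 is the median of the lower half of the ordered data, Q3 the median of the upper half); the function name `five_number_summary` is hypothetical and chosen only for illustration.

```python
from statistics import mean, median

# Reaction times with no distractions (seconds), copied from the Opener table.
times = [2.7, 1.0, 3.0, 1.4, 3.0,
         2.0, 0.8, 1.2, 2.1, 2.2,
         0.7, 2.4, 1.1, 0.8, 1.0,
         2.2, 2.5, 3.1, 1.7, 3.3]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule.

    Assumes at least two values. With an odd count, the middle value is
    left out of both halves.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]    # lower half of the ordered data
    upper = ordered[-half:]   # upper half of the ordered data
    return (ordered[0], median(lower), median(ordered),
            median(upper), ordered[-1])


summary = five_number_summary(times)
print("Five-number summary:", [round(v, 2) for v in summary])
print("Mean:", round(mean(times), 2))
```

Other quartile conventions, such as the interpolation methods used by some software, can give slightly different values for Q1 and Q3.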
MID-UNIT ASSESSMENT
Today you will take a mid-unit assessment.
Copyright © 2017 Charles A. Dana Center at the University of Texas at Austin, Learning Sciences Research Institute at the University of Illinois at Chicago, Agile Mind, Inc.
LESSON 1: CONSOLIDATION ACTIVITY
1. A boxplot and histogram for the set of data on the reaction time when texting are shown. Construct a boxplot and histogram for the set of data given in the Opener of reaction time with no distractions. Then record the median and the mean.
Measure   Reaction Time when Writing a Text Message   Reaction Time with No Distractions
Median    4.45 seconds                                2.05 seconds
Mean      4.605 seconds                               1.91 seconds
2. How do the shapes of the two distributions compare?
Both sets of data are roughly symmetric.
3. How do the centers of the two distributions compare?
The center of the data for “Reaction Time when Writing a Text Message” is higher than the center of the “Reaction Time with No Distractions” data. So, in general, it seems that it takes longer for a person to react to hazards in the roadway while writing a text message than when driving without any distractions.
4. Which measure of center, median or mean, best represents the data? Why?
Since both sets of data are roughly symmetric, the mean is the best measure of center.
LESSON 1: HOMEWORK
Notes or additional instructions based on whole-class discussion of homework assignment:
1. Distribution pair #1: Consider this graph of two distributions. Use the graph to answer the questions.
a. How do the shapes of the two distributions compare?
Both distributions are symmetric.
b. Which measure of center would best describe the data?
Since both distributions are symmetric, either the mean or median would be the best measure of center.
c. How do the centers of the two distributions compare?
The distributions have the same center.
2. Distribution pair #2: Consider this graph of two distributions. Use the graph to answer the questions.
a. How do the shapes of the two distributions compare?
Both distributions are symmetric.
b. Which measure of center would best describe the data?
Since both distributions are symmetric, either the mean or median would be the best measure of center.
c. How do the centers of the two distributions compare?
The center of the distribution represented by the blue line is less than the center of the distribution represented by the red line.
3. Determine whether each statement describes distribution pair 1 or distribution pair 2.
There is a difference in the “typical” data value between the two data sets. Distribution pair 2
The mean and median of the two data sets are the same. Distribution pair 1
The mean of one set of data is greater than the mean of the other set of data. Distribution pair 2
4. Choose two different car manufacturers. For each manufacturer, record the gas mileage in miles per gallon of each make of car it sells.
a. Construct a graphical representation showing the miles per gallon of each make of car for each manufacturer.
Answers will vary.
b. What can you conclude by looking at the two graphical representations?
Answers will vary.
c. Choose the most appropriate measure of center and spread for each data set. Explain your choice and what each measure means in context.
Answers will vary.
LESSON 1: STAYING SHARP
Reviewing ideas from earlier grades
1. Solve. \( \frac{3}{14} = \frac{x}{42} \)
x = 9
2. The ratio of boys to girls in a math class is 5 to 4. If there are 15 boys in the class, how many girls are in the class?
12 girls
Preparing for upcoming lessons
A survey was conducted to determine the number of high school freshmen who have cell phones. Ten students were asked whether or not they had a cell phone. Here are the students’ responses.
Student   Phone?
1         Yes
2         Yes
3         No
4         Yes
5         Yes
6         Yes
7         Yes
8         No
9         Yes
10        Yes
3. Use the data provided to fill in the table.
Phone?   Number of students
Yes      8
No       2
4. Based on this survey, what percentage of high school freshmen do not have a cell phone?
20%
Focus skill
5. When choosing a measure of center to represent a data set, when is it best to use the median? Explain.
When data are skewed or contain outliers, the median should be chosen over the mean because it is less impacted by extreme values. If data are roughly symmetric, then either the mean or median can be used.
6. If the mean is chosen as the best measure of center, what measure of spread should be used? Why?
Standard deviation. The standard deviation is computed from deviations about the mean. If the mean is the best measure of center, then the standard deviation should be used as the measure of spread.
Lesson 2: Comparing spreads
LESSON 2: OPENER
Measure   Reaction Time when Writing a Text Message   Reaction Time with No Distractions
Median    4.45 seconds                                2.05 seconds
Mean      4.605 seconds                               1.91 seconds
Which measure of spread, IQR or standard deviation, would best describe these sets of data? Justify your response.
Since both distributions are relatively symmetric, either the mean or median would be the best measure of center. If you choose the mean as the measure of center, then report the standard deviation for the spread. If you choose the median as the measure of center, then report the IQR for the measure of spread.
LESSON 2: CORE ACTIVITY
1. What conclusions might you draw about differences in reaction time based on the parallel boxplots?
Answers will vary.
Sample answer: A driver who is not distracted by a text message can react, on average, about 2 seconds quicker than one who is writing a text message. The difference in spread tells you that there is a lot more variability in the reaction time of drivers who are writing a text message. Some drivers can react quickly, while other drivers take a lot of time to react.
2. Compute the standard deviation of the reaction time data with no distractions.
Observation   Deviation   Squared deviation
2.7            0.79       0.6241
1.0           -0.91       0.8281
3.0            1.09       1.1881
1.4           -0.51       0.2601
3.0            1.09       1.1881
2.0            0.09       0.0081
0.8           -1.11       1.2321
1.2           -0.71       0.5041
2.1            0.19       0.0361
2.2            0.29       0.0841
0.7           -1.21       1.4641
2.4            0.49       0.2401
1.1           -0.81       0.6561
0.8           -1.11       1.2321
1.0           -0.91       0.8281
2.2            0.29       0.0841
2.5            0.59       0.3481
3.1            1.19       1.4161
1.7           -0.21       0.0441
3.3            1.39       1.9321
Standard deviation = 0.864 seconds
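The deviation table and the resulting standard deviation can be reproduced step by step. This is a minimal sketch, assuming the sample standard deviation (dividing the sum of squared deviations by n - 1), which is the divisor that yields the 0.864 seconds reported here; the variable names are chosen only for illustration.

```python
from math import sqrt

# Reaction times with no distractions (seconds), from the Lesson 1 Opener.
times = [2.7, 1.0, 3.0, 1.4, 3.0,
         2.0, 0.8, 1.2, 2.1, 2.2,
         0.7, 2.4, 1.1, 0.8, 1.0,
         2.2, 2.5, 3.1, 1.7, 3.3]

n = len(times)
mean_time = sum(times) / n

# One row per observation: (observation, deviation, squared deviation).
table = [(x, x - mean_time, (x - mean_time) ** 2) for x in times]
for obs, dev, sq in table:
    print(f"{obs:4.1f} {dev:6.2f} {sq:7.4f}")

sum_sq = sum(sq for _, _, sq in table)
sample_sd = sqrt(sum_sq / (n - 1))  # assumption: sample SD, divisor n - 1

print(f"Sum of squared deviations = {sum_sq:.4f}")
print(f"Standard deviation = {sample_sd:.3f} seconds")
```

Dividing by n instead of n - 1 would give the population standard deviation, which is slightly smaller for the same data.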
3. The original question posed by researchers was, "Does writing texts while driving impair a person’s ability to react to hazards on the road?" Based on the centers and spreads of the two data sets, what conclusions can you make?
Answers will vary.
Sample answer: The center of the data for “Reaction Time when Writing a Text Message” is higher than the center of the “Reaction Time with No Distractions” data. So, in general, it seems that it takes longer for a person to react to hazards in the roadway while writing a text message than when driving without any distractions.
The “Reaction Time when Writing a Text Message” data is more spread out. This suggests that writing a text message while driving has different effects on reaction time for different drivers. But from the boxplots you can see that the fastest reaction time when writing a text message is still greater than the average reaction time with no distractions.
So, the researchers can conclude that writing texts while driving does seem to increase the amount of time it takes a person to react to hazards on the roadway.
LESSON 2: CONSOLIDATION ACTIVITY
Scientists treat clouds with certain chemicals, such as silver nitrate, to try to change the amount of precipitation the clouds release. This process is called seeding.
Researchers conducted an experiment to determine the effectiveness of cloud seeding. They chose 52 clouds and randomly assigned 26 of the clouds for treatment with silver nitrate. Then they measured the rainfall (in acre-feet) produced by each of the 52 clouds. Here are the data they collected:
Measure              Unseeded          Seeded
Median               44.2 acre-feet    221.6 acre-feet
Mean                 164.6 acre-feet   442.3 acre-feet
IQR                  138.6 acre-feet   337.6 acre-feet
Standard deviation   278.4 acre-feet   650.8 acre-feet
1. What do the shapes of the parallel boxplots tell you?
The data are not symmetric. Both boxplots indicate that the data is skewed right.
2. Which measure of center best represents the “typical” data value for each data set?
Since both data sets are skewed or contain outliers, the median best represents the center of the distributions.
3. How do the medians compare, and what does this indicate about the rainfall distributions?
The median of the seeded cloud data is higher than the median of the unseeded cloud data. This indicates that the amount of rainfall produced by the typical seeded cloud was greater than the rainfall amount produced by the typical unseeded cloud.
4. Which measure of variability, or spread, best describes each data set?
The interquartile range (IQR) best describes the spread of each data set since the data are skewed or contain outliers.
5. How do the IQRs compare, and what does this indicate?
The seeded IQR appears greater than the unseeded IQR; therefore, the distribution of rainfall from seeded clouds appears to have greater variability than the distribution of rainfall from unseeded clouds.
6. What conclusions can you make based on the shapes, centers, and spreads of the two sets of data?
The seeded clouds seem to produce a larger typical amount of rainfall than unseeded clouds. The data suggests that cloud seeding is an effective way to increase precipitation.
LESSON 2: HOMEWORK
Notes or additional instructions based on whole-class discussion of homework assignment:
Homework Assignment
Part I: Complete the online More practice in the topic Comparing distributions.
Part II: Complete Lesson 2: Staying Sharp.
As you complete the More practice, record below any questions you may have or challenges you encounter with the items.
LESSON 2: STAYING SHARP
Reviewing ideas from earlier grades
1. Solve. \( \frac{5}{7} = \frac{3x}{42} \)
x = 10
2. When Jason put gas in his car, gas was priced at $1.67 per gallon. If he spent $18.37, how many gallons of gas did he put in his car?
11 gallons of gas
Preparing for upcoming lessons
24 algebra students were asked how much time they spent studying for class last night. Here is what they reported.
Student   Time (minutes)   Student   Time (minutes)
1         30               13        20
2         15               14        30
3         30               15        45
4         45               16        30
5         60               17        20
6         45               18        20
7         15               19        15
8         30               20        30
9         30               21        45
10        45               22        45
11        60               23        60
12        60               24        30
3. Complete the table.
Time spent studying          Number of students
Less than 30 minutes         6
Between 30 and 45 minutes    14
More than 45 minutes         4
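The counts in this table, and the percentage of students who studied for 30 minutes or more, can be tallied directly from the 24 reported times. This is a minimal sketch, assuming that "between 30 and 45 minutes" includes both endpoints; the variable names are chosen only for illustration.

```python
# Study times in minutes for students 1 through 24, in order.
minutes = [30, 15, 30, 45, 60, 45, 15, 30, 30, 45, 60, 60,
           20, 30, 45, 30, 20, 20, 15, 30, 45, 45, 60, 30]

# Assumption: the middle category includes both 30 and 45 minutes.
counts = {
    "Less than 30 minutes": sum(1 for m in minutes if m < 30),
    "Between 30 and 45 minutes": sum(1 for m in minutes if 30 <= m <= 45),
    "More than 45 minutes": sum(1 for m in minutes if m > 45),
}
for category, count in counts.items():
    print(f"{category}: {count}")

# Share of students who studied 30 minutes or more.
at_least_30 = sum(1 for m in minutes if m >= 30)
percent_at_least_30 = 100 * at_least_30 / len(minutes)
print(f"30 minutes or more: {percent_at_least_30:.0f}%")
```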
4. What percentage of students spent 30 minutes or more studying last night?
75%
Focus skill
Data about housing prices in two neighborhoods were collected. The summaries of the data are shown here.
Measure              Neighborhood A   Neighborhood B
Mean                 $500,000         $350,000
Median               $350,000         $345,000
Standard deviation   $100,000         $25,000
IQR                  $80,000          $24,000
5. Which value should be reported as the "typical" value of home prices in each neighborhood? Why?
The home prices in Neighborhood A seem to be skewed to the right since the mean is higher than the median. The best measure of center would be the median. The mean and median home prices in neighborhood B are about the same, meaning that either value would be representative.
6. What does the spread tell you about the home prices?
The home prices in Neighborhood A are more spread out. There are several very expensive homes in that neighborhood. The home prices in Neighborhood B are closer together. Most of the homes are within about $25,000 of the mean home price.
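The reasoning used in items 5 and 6 (compare the mean with the median, then pick matching measures of center and spread) can be expressed as a small rule of thumb. This is only a sketch: the function name `suggest_measures` and the cutoff `tolerance=0.25` are illustrative assumptions, not a standard statistical test, and a graph of the data should still be checked.

```python
def suggest_measures(mean, median, sd, tolerance=0.25):
    """Suggest a shape description, center, and spread from summary values.

    The gap between mean and median is measured in standard deviations.
    The tolerance is an arbitrary illustrative cutoff (an assumption).
    """
    gap = (mean - median) / sd
    if gap > tolerance:
        return ("skewed right", "median", "IQR")
    if gap < -tolerance:
        return ("skewed left", "median", "IQR")
    return ("roughly symmetric", "mean or median", "standard deviation or IQR")


# Summary values from the housing price table.
neighborhood_a = suggest_measures(mean=500_000, median=350_000, sd=100_000)
neighborhood_b = suggest_measures(mean=350_000, median=345_000, sd=25_000)

print("Neighborhood A:", neighborhood_a)
print("Neighborhood B:", neighborhood_b)
```

With these inputs the sketch labels Neighborhood A as skewed right and Neighborhood B as roughly symmetric, which agrees with the answer to item 5.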
Lesson 3: Comparing distributions
LESSON 3: OPENER
Researchers asked 15 women and 15 men to report their yearly salaries. Here are the data they collected:
Women: Salary (in thousands of dollars)
31 22 40 48 27
28 36 46 20 47
44 32 52 43 24
Men: Salary (in thousands of dollars)
32 43 57 38 52
46 63 30 60 34
35 54 49 40 56
1. Calculate the five-number summary of the women’s salary data.
Minimum = 20 Q1 = 27 Median = 36 Q3 = 46 Maximum = 52
2. Calculate the five-number summary of the men’s salary data.
Minimum = 30 Q1 = 35 Median = 46 Q3 = 56 Maximum = 63
LESSON 3: CORE ACTIVITY
1. Construct parallel boxplots of the women’s and men’s salary data.
2. Look at the shape, center, and spread of the parallel boxplots. What do these features tell you about the data? Determine whether each statement is true or false.
Both distributions are symmetric. True
The center of the women’s data is higher than the center of the men’s data. False
Either the mean or median is the best measure of center for the two sets of data. True
The two sets of data seem to have similar spreads. True
If the mean is chosen as the measure of center, then the interquartile range is the best measure of spread. False
3. Which measure of center would you choose to report? Find this measure of center and explain your choice.
Answers will vary.
Mean (women’s salary) = $36 thousand
Mean (men’s salary) = $45.9 thousand
Median (women’s salary) = $36 thousand
Median (men’s salary) = $46 thousand
4. Calculate the measure of spread that corresponds to the measure of center you chose.
Answers will vary. The measure of spread students choose to report depends on the measure of center they chose. If students chose to report the mean, they will find the standard deviation. If students chose to report the median, they will find the interquartile range.
Women’s Salary                                 Men’s Salary
Observation   Deviation   Squared deviation   Observation   Deviation   Squared deviation
31            -5          25                  32            -13.9       193.21
28            -8          64                  46              0.1         0.01
44             8          64                  35            -10.9       118.81
22            -14         196                 43             -2.9         8.41
36             0          0                   63             17.1       292.41
32            -4          16                  54              8.1        65.61
40             4          16                  57             11.1       123.21
46             10         100                 30            -15.9       252.81
52             16         256                 49              3.1         9.61
48             12         144                 38             -7.9        62.41
20            -16         256                 60             14.1       198.81
43             7          49                  40             -5.9        34.81
27            -9          81                  52              6.1        37.21
47             11         121                 34            -11.9       141.61
24            -12         144                 56             10.1       102.01
Standard deviation (women) = $10.5 thousand Standard deviation (men) = $10.8 thousand
The IQR for the women’s salary data is $46,000 – $27,000 = $19,000. The IQR for the men’s salary data is $56,000 – $35,000 = $21,000.
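Both pairings of center and spread (mean with standard deviation, median with IQR) can be computed side by side for the two salary data sets. This is a minimal sketch using Python's standard `statistics` module; the helper name `describe` is hypothetical. It assumes the sample standard deviation and the module's default "exclusive" quartile method, which matches the quartiles from the Opener for these 15-value data sets but is not guaranteed to match the median-of-halves rule for other sample sizes.

```python
from statistics import mean, median, quantiles, stdev

# Salaries in thousands of dollars, from the Lesson 3 Opener.
women = [31, 28, 44, 22, 36, 32, 40, 46, 52, 48, 20, 43, 27, 47, 24]
men = [32, 46, 35, 43, 63, 54, 57, 30, 49, 38, 60, 40, 52, 34, 56]


def describe(salaries):
    """Return the center and spread measures for one data set."""
    q1, _, q3 = quantiles(salaries, n=4)  # default method is "exclusive"
    return {
        "mean": mean(salaries),
        "sd": stdev(salaries),      # sample standard deviation (n - 1)
        "median": median(salaries),
        "iqr": q3 - q1,
    }


women_stats = describe(women)
men_stats = describe(men)

for label, stats in (("Women", women_stats), ("Men", men_stats)):
    print(f"{label}: mean = {stats['mean']:.1f}, sd = {stats['sd']:.1f}, "
          f"median = {stats['median']}, IQR = {stats['iqr']:.0f} "
          "(thousands of dollars)")
```

Whichever pairing is reported, the comparison has the same structure: the centers differ by about 10 thousand dollars, while the spreads are close to each other.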
5. Based on your analysis, what conclusions can you draw?
Answers will vary.
Sample answer: The center of the data collected from men is higher than the center of the data collected from women. However, the variability in the two data sets is similar.
The data suggest that there is a difference between the men’s and women’s salaries. It seems that women make less money than men.
LESSON 3: REVIEW MID-UNIT ASSESSMENT
Today you will review your mid-unit assessment.
LESSON 3: HOMEWORK
Notes or additional instructions based on whole-class discussion of homework assignment:
A new method for studying mathematics was tested with 20 freshmen enrolled in an algebra class. Their average test scores before the students used the new method and after they used the new method are listed in this table.
Student   Before   After   Student   Before   After
A         72       80      K         85       85
B         81       85      L         91       87
C         79       79      M         52       60
D         90       92      N         80       83
E         95       93      O         74       75
F         87       89      P         77       77
G         60       67      Q         61       60
H         65       72      R         88       85
I         74       92      S         86       82
J         66       70      T         70       75
1. Construct two histograms, one for exam scores before the new method and one for exam scores after the new method. Label your histograms “Before” and “After.”
Answers:
2. Describe the data sets. Be sure to compare the centers and spreads of the two data sets.
Answers will vary. Since both data sets are roughly symmetric, students could choose to report either the mean or median.
Sample answer: The Before scores have a mean of 76.65, while the After scores have a mean of 79.4. The standard deviation of the Before scores is 11.72, while the standard deviation of the After scores is 9.90. The shapes of both graphical representations are approximately symmetrical and bell shaped, with peaks in the 70 to 80 range.
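The means and standard deviations in the sample answer, along with bin counts for the two histograms, can be generated from the score table. This is a minimal sketch, assuming the sample standard deviation and bins of width 10 starting at multiples of 10; the bin width and the helper name `histogram_counts` are illustrative assumptions, and a different choice of bins can change where the tallest bar appears.

```python
from collections import Counter
from statistics import mean, stdev

# Scores for students A through T, in order.
before = [72, 81, 79, 90, 95, 87, 60, 65, 74, 66,
          85, 91, 52, 80, 74, 77, 61, 88, 86, 70]
after = [80, 85, 79, 92, 93, 89, 67, 72, 92, 70,
         85, 87, 60, 83, 75, 77, 60, 85, 82, 75]


def histogram_counts(scores, width=10):
    """Count scores per bin; a bin starting at s covers s to s + width - 1."""
    counts = Counter((score // width) * width for score in scores)
    return dict(sorted(counts.items()))


for label, scores in (("Before", before), ("After", after)):
    print(f"{label}: mean = {mean(scores):.2f}, sd = {stdev(scores):.2f}")
    for start, count in histogram_counts(scores).items():
        print(f"  {start}-{start + 9}: {'*' * count}")
```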
3. Based on the histograms you constructed and your comparison of the shapes, centers, and spreads of the data sets, should the new method for studying mathematics be implemented in the high school? Explain your reasoning.
Answers will vary.
Sample answer: Looking at the graphs, there don't seem to be any major differences between the two data sets. Also, the means and standard deviations are similar. The new study method does not appear to improve test scores.
LESSON 3: STAYING SHARP
Reviewing ideas from earlier grades
1. Solve. \( \frac{21}{27} = \frac{a}{18} \)
a = 14
2. A gallon of milk weighs 8 pounds. A shipping container of milk has a weight of 20 pounds. How many gallons of milk are in the shipping container?
2.5 gallons of milk
Preparing for upcoming lessons
50 men and 50 women were asked whether they agreed or disagreed with the statement “I use math every day at my job.” The table shows their responses.
Response   Men   Women
Agree      40    35
Disagree   10    15
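Percentages from a two-way table like this one depend on which total is used as the denominator: everyone surveyed, or only one group. This is a minimal sketch that stores the table as a nested dictionary; the variable names are chosen only for illustration.

```python
# Responses to "I use math every day at my job."
responses = {
    "Men": {"Agree": 40, "Disagree": 10},
    "Women": {"Agree": 35, "Disagree": 15},
}

# Denominator: all people surveyed.
total_people = sum(sum(group.values()) for group in responses.values())
total_agree = sum(group["Agree"] for group in responses.values())
percent_all_agree = 100 * total_agree / total_people

# Denominator: men only.
men_total = sum(responses["Men"].values())
percent_men_agree = 100 * responses["Men"]["Agree"] / men_total

print(f"All respondents who agree: {percent_all_agree:.0f}%")
print(f"Men who agree: {percent_men_agree:.0f}%")
```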
3. What percentage of all the people surveyed said they agreed with the statement “I use math every day at my job”?
75%
4. What percent of men said they agreed with the statement “I use math every day at my job?”
80%
Focus skill
Data about housing prices in two neighborhoods were collected. The summaries of the data are shown here.
Measure              Neighborhood A   Neighborhood B
Mean                 $500,000         $350,000
Median               $350,000         $345,000
Standard deviation   $100,000         $25,000
IQR                  $80,000          $24,000
5. What can you conclude about the home prices in each neighborhood?
The typical home in each neighborhood costs about the same, even though there are a few houses in Neighborhood A that are very expensive.
Neighborhood A has greater variability in home prices than Neighborhood B.
6. If you were selling your house, which neighborhood would you like to be living in? If you were buying a house, which neighborhood would you want to buy in? Explain.
Answers will vary. Some students may say to sell in Neighborhood A because there is a greater chance to make money. Some may say to buy in Neighborhood B because the prices are more predictable.