COMPARING DISTRIBUTIONS

Academic year: 2021

Lesson 1: Comparing centers

LESSON 1: OPENER

In  the  last  topic,  you  looked  at  data  researchers  collected  on  the  time  it  takes  drivers  to  react  to  a  change  in  driving   environment  while  writing  a  text  message.  Along  with  this  data,  the  researchers  also  collected  data  on  how  long  it  takes   drivers  to  react  when  not  engaged  in  any  distracting  activity.  Here  are  the  additional  data  they  collected:  

Reaction  Time  with  No  Distractions  (in  seconds)  

2.7   1.0   3.0   1.4   3.0  

2.0   0.8   1.2   2.1   2.2  

0.7   2.4   1.1   0.8   1.0  

2.2   2.5   3.1   1.7   3.3  

 

1. Compute the five-number summary for the data.

Min = 0.7 seconds, Q1 = 1.05 seconds, Median = 2.05 seconds, Q3 = 2.6 seconds, Max = 3.3 seconds

2. Find  the  mean  of  the  data.    

Mean = 1.91 seconds
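The Opener answers can be checked with a short Python sketch. It assumes the quartiles are found with the median-of-halves rule (Q1 is the median of the lower half of the sorted data, Q3 the median of the upper half), which is consistent with the values reported above; the function name `five_number_summary` is only illustrative.

```python
from statistics import mean, median

# Reaction times with no distractions (seconds), copied from the Opener table.
times = [2.7, 1.0, 3.0, 1.4, 3.0,
         2.0, 0.8, 1.2, 2.1, 2.2,
         0.7, 2.4, 1.1, 0.8, 1.0,
         2.2, 2.5, 3.1, 1.7, 3.3]

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]               # values below the median position
    upper = data[len(data) - half:]   # values above the median position
    return (data[0], median(lower), median(data), median(upper), data[-1])

summary = five_number_summary(times)
print([round(v, 2) for v in summary])  # [0.7, 1.05, 2.05, 2.6, 3.3]
print(round(mean(times), 2))           # 1.91
```

Other quartile conventions (for example, the interpolation used by many calculators and software packages) can give slightly different values for Q1 and Q3.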

MID-UNIT ASSESSMENT

Today you will take a mid-unit assessment.

Copyright  ©  2017  Charles  A.  Dana  Center  at  the  University  of  Texas  at  Austin,  Learning  Sciences  Research  Institute  at  the  University  of  Illinois  at  Chicago,  Agile  Mind,  Inc.  

LESSON 1: CONSOLIDATION ACTIVITY

1. A  boxplot  and  histogram  for  the  set  of  data  on  the  reaction  time  when  texting  is  shown.  Construct  a  boxplot  and   histogram  for  the  set  of  data  given  in  the  Opener  of  reaction  time  with  no  distractions.  Then  record  the  median   and  the  mean.    

 

Reaction Time when Writing a Text Message: Median = 4.45 seconds, Mean = 4.605 seconds

Reaction Time with No Distractions: Median = 2.05 seconds, Mean = 1.91 seconds

2. How  do  the  shapes  of  the  two  distributions  compare?  

Both sets of data are roughly symmetric.

3. How  do  the  centers  of  the  two  distributions  compare?  

The center of the data for “Reaction Time when Writing a Text Message” is higher than the center of the “Reaction Time with No Distractions” data. So, in general, it seems that it takes longer for a person to react to hazards in the roadway while writing a text message than when driving without any distractions.

4. Which  measure  of  center,  median  or  mean,  best  represents  the  data?  Why?  

Since both sets of data are roughly symmetric, the mean is the best measure of center.

LESSON 1: HOMEWORK

Notes or additional instructions based on whole-class discussion of homework assignment:

     

 

1. Distribution  pair  #1:  Consider  this  graph  of  two  distributions.  Use  the  graph  to  answer  the  questions.  

 

a. How  do  the  shapes  of  the  two  distributions  compare?  

Both distributions are symmetric.

 

b. Which  measure  of  center  would  best  describe  the  data?  

Since both distributions are symmetric, either the mean or median would be the best measure of center.

c. How  do  the  centers  of  the  two  distributions  compare?  

The distributions have the same center.  

   

2. Distribution  pair  #2:  Consider  this  graph  of  two  distributions.  Use  the  graph  to  answer  the  questions.  

 

a. How  do  the  shapes  of  the  two  distributions  compare?  

Both distributions are symmetric.

 

b. Which  measure  of  center  would  best  describe  the  data?  

Since both distributions are symmetric, either the mean or median would be the best measure of center.

c. How  do  the  centers  of  the  two  distributions  compare?  

The center of the distribution represented by the blue line is less than the center of the distribution represented by the red line.

   

3. Determine  whether  each  statement  describes  distribution  pair  1  or  distribution  pair  2.  

There  is  a  difference  in  the  “typical”  data  value  between  the  two  data  sets.   Distribution pair 2  

The  mean  and  median  of  the  two  data  sets  are  the  same.   Distribution pair 1  

The  mean  of  one  set  of  data  is  greater  than  the  mean  of  the  other  set  of  data.   Distribution pair 2  

       

4.   Choose  two  different  car  manufacturers.  For  each  manufacturer,  record  the  gas  mileage  in  miles  per  gallon  of  each   make  of  car  it  sells.    

 

  a.   Construct  a  graphical  representation  showing  the  miles  per  gallon  of  each  make  of  car  for  each  manufacturer.  

 

    Answers will vary.

 

  b.   What  can  you  conclude  by  looking  at  the  two  graphical  representations?  

 

    Answers will vary.  

 

  c. Choose the most appropriate measure of center and spread for each data set. Explain your choice and what each measure means in context.

    Answers will vary.  

   

   

LESSON 1: STAYING SHARP

Reviewing ideas from earlier grades

1. Solve. $\frac{3}{14} = \frac{x}{42}$

x = 9

2. The ratio of boys to girls in a math class is 5 to 4. If there are 15 boys in the class, how many girls are in the class?

12 girls

Preparing for upcoming lessons

A  survey  was  conducted  to  determine  the  number  of   high  school  freshmen  who  have  cell  phones.  Ten   students  were  asked  whether  or  not  they  had  a  cell   phone.  Here  are  the  students’  responses.  

Student   Phone?
1         Yes
2         Yes
3         No
4         Yes
5         Yes
6         Yes
7         Yes
8         No
9         Yes
10        Yes

3. Use the data provided to fill in the table.

Phone?   Number of students
Yes      8
No       2

4. Based on this survey, what percentage of high school freshmen do not have a cell phone?

20%

Focus skill

5. When  choosing  a  measure  of  center  to  represent  a   data  set,  when  is  it  best  to  use  the  median?  Explain.    

When data are skewed or contain outliers, the median should be chosen over the mean because it is less impacted by extreme values. If data are roughly symmetric, then either the mean or median can be used.

 

6. If the mean is chosen as the best measure of center, what measure of spread should be used? Why?

Standard deviation. The standard deviation relies upon the mean. If the mean is the best measure of center, then the standard deviation should be used.
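The reasoning in question 5 can be illustrated with a small Python sketch. The numbers below are hypothetical (they are not taken from the lesson data); they only show how a single extreme value pulls the mean much more than the median.

```python
from statistics import mean, median

# Hypothetical reaction times in seconds; not taken from the lesson data.
typical = [1.8, 1.9, 2.0, 2.1, 2.2]
with_outlier = typical + [9.0]   # the same values plus one extreme value

for label, data in [("roughly symmetric", typical), ("with outlier", with_outlier)]:
    print(label, "mean =", round(mean(data), 2), "median =", round(median(data), 2))
```

In the symmetric set the mean and median agree at 2.0 seconds. After the extreme value is added, the mean rises to about 3.17 seconds while the median moves only to 2.05 seconds.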

 

Lesson 2: Comparing spreads

LESSON 2: OPENER

 

Reaction Time when Writing a Text Message: Median = 4.45 seconds, Mean = 4.605 seconds

Reaction Time with No Distractions: Median = 2.05 seconds, Mean = 1.91 seconds

 

Which  measure  of  spread,  IQR  or  standard  deviation,  would  best  describe  these  sets  of  data?  Justify  your  response.    

Since both distributions are relatively symmetric, either the mean or median would be the best measure of center. If you choose the mean as the measure of center, then report the standard deviation for the spread. If you choose the median as the measure of center, then report the IQR for the measure of spread.

LESSON 2: CORE ACTIVITY

1. What  conclusions  might  you  draw  about  differences  in  reaction  time  based  on  the  parallel  boxplots?    

 

Answers will vary.

Sample answer: A driver who is not distracted by a text message can react, on average, about 2.5 seconds faster than one who is writing a text message. The difference in spread tells you that there is a lot more variability in the reaction time of drivers who are writing a text message. Some drivers can react quickly, while other drivers take a lot of time to react.

       

 

2. Compute  the  standard  deviation  of  the  reaction  time  data  with  no  distractions.  

Observation   Deviation   Squared deviation
2.7            0.79       0.6241
1.0           -0.91       0.8281
3.0            1.09       1.1881
1.4           -0.51       0.2601
3.0            1.09       1.1881
2.0            0.09       0.0081
0.8           -1.11       1.2321
1.2           -0.71       0.5041
2.1            0.19       0.0361
2.2            0.29       0.0841
0.7           -1.21       1.4641
2.4            0.49       0.2401
1.1           -0.81       0.6561
0.8           -1.11       1.2321
1.0           -0.91       0.8281
2.2            0.29       0.0841
2.5            0.59       0.3481
3.1            1.19       1.4161
1.7           -0.21       0.0441
3.3            1.39       1.9321

Standard deviation = 0.864 seconds
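The table-based calculation can be sketched in Python as follows. This sketch assumes the sample standard deviation (dividing the sum of squared deviations by n - 1), which is the convention that reproduces the 0.864 seconds reported above; the variable names are illustrative.

```python
from math import sqrt

# Reaction times with no distractions (seconds), from the Lesson 1 Opener.
times = [2.7, 1.0, 3.0, 1.4, 3.0,
         2.0, 0.8, 1.2, 2.1, 2.2,
         0.7, 2.4, 1.1, 0.8, 1.0,
         2.2, 2.5, 3.1, 1.7, 3.3]

n = len(times)
mean_time = sum(times) / n                       # 1.91 seconds

# One deviation and one squared deviation per observation, as in the table.
deviations = [t - mean_time for t in times]
squared = [d ** 2 for d in deviations]

# Sample variance divides by n - 1 (an assumption matching the reported answer).
sample_variance = sum(squared) / (n - 1)
sample_sd = sqrt(sample_variance)

print(round(sum(squared), 3))  # 14.198
print(round(sample_sd, 3))     # 0.864
```

Dividing by n instead of n - 1 would give the population standard deviation, which is slightly smaller.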

3. The  original  question  posed  by  researchers  was,  "Does  writing  texts  while  driving  impair  a  person’s  ability  to  react   to  hazards  on  the  road?"  Based  on  the  centers  and  spreads  of  the  two  data  sets,  what  conclusions  can  you  make?  

Answers will vary.

Sample answer: The center of the data for “Reaction Time when Writing a Text Message” is higher than the center of the “Reaction Time with No Distractions” data. So, in general, it seems that it takes longer for a person to react to hazards in the roadway while writing a text message than when driving without any distractions.

The “Reaction Time when Writing a Text Message” data is more spread out. This suggests that writing a text message while driving has different effects on reaction time for different drivers. But from the boxplots you can see that the fastest reaction time when writing a text message is still greater than the average reaction time with no distractions.

So, the researchers can conclude that writing texts while driving does seem to increase the amount of time it takes a person to react to hazards on the roadway.

LESSON 2: CONSOLIDATION ACTIVITY

Scientists  treat  clouds  with  certain  chemicals,  such  as  silver  nitrate,  to  try  to  change  the  amount  of  precipitation  the  clouds   release.  This  process  is  called  seeding.  

Researchers conducted an experiment to determine the effectiveness of cloud seeding. They chose 52 clouds and randomly assigned 26 of the clouds for treatment with silver nitrate. Then they measured the rainfall (in acre-feet) produced by each of the 52 clouds. Here are the data they collected:

 

                     Unseeded           Seeded
Median               44.2 acre-feet     221.6 acre-feet
Mean                 164.6 acre-feet    442.3 acre-feet
IQR                  138.6 acre-feet    337.6 acre-feet
Standard deviation   278.4 acre-feet    650.8 acre-feet

 

1. What  do  the  shapes  of  the  parallel  boxplots  tell  you?  

The data are not symmetric. Both boxplots indicate that the data are skewed right.

2. Which  measure  of  center  best  represents  the  “typical”  data  value  for  each  data  set?  

Since both data sets are skewed or contain outliers, the median best represents the center of the distributions.

3. How  do  the  medians  compare,  and  what  does  this  indicate  about  the  rainfall  distributions?      

The median of the seeded cloud data is higher than the median of the unseeded cloud data. This indicates that the amount of rainfall produced by the typical seeded cloud was greater than the rainfall amount produced by the typical unseeded cloud.

4. Which  measure  of  variability,  or  spread,  best  describes  each  data  set?  

The interquartile range (IQR) best describes the spread of each data set since the data are skewed or contain outliers.

5. How  do  the  IQRs  compare,  and  what  does  this  indicate?  

The seeded IQR appears greater than the unseeded IQR; therefore, the distribution of rainfall from seeded clouds appears to have greater variability than the distribution of rainfall from unseeded clouds.

6. What  conclusions  can  you  make  based  on  the  shapes,  centers,  and  spreads  of  the  two  sets  of  data?  

The seeded clouds seem to produce a larger typical amount of rainfall than unseeded clouds. The data suggest that cloud seeding is an effective way to increase precipitation.

LESSON 2: HOMEWORK

Notes or additional instructions based on whole-class discussion of homework assignment:

     

 

Homework  Assignment  

Part I: Complete the online More practice in the topic Comparing distributions.

Part II: Complete Lesson 2: Staying Sharp.

As  you  complete  the  More  practice,  record  below  any  questions  you  may  have  or  challenges  you  encounter  with  the   items.  

LESSON 2: STAYING SHARP

Reviewing ideas from earlier grades

1. Solve. $\frac{5}{7} = \frac{3x}{42}$

x = 10
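One way to confirm the answer to the proportion 5/7 = 3x/42 is to cross-multiply. The steps below are a sketch of that reasoning, not the only valid approach:

```latex
\frac{5}{7} = \frac{3x}{42}
\;\Longrightarrow\; 7 \cdot 3x = 5 \cdot 42
\;\Longrightarrow\; 21x = 210
\;\Longrightarrow\; x = 10
```

Substituting back gives 3(10)/42 = 30/42 = 5/7, so the solution checks.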

2. When  Jason  put  gas  in  his  car,  gas  was  priced  at  $1.67   per  gallon.  If  he  spent  $18.37,  how  many  gallons  of  gas   did  he  put  in  his  car?  

11 gallons of gas

Preparing for upcoming lessons

24  algebra  students  were  asked  how  much  time  they   spent  studying  for  class  last  night.  Here  is  what  they   reported.  

Student   Time (minutes)     Student   Time (minutes)
1         30                 13        20
2         15                 14        30
3         30                 15        45
4         45                 16        30
5         60                 17        20
6         45                 18        20
7         15                 19        15
8         30                 20        30
9         30                 21        45
10        45                 22        45
11        60                 23        60
12        60                 24        30

3. Complete  the  table.  

Time spent studying          Number of students
Less than 30 minutes         6
Between 30 and 45 minutes    14
More than 45 minutes         4
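The tally in this table can be reproduced with a short Python sketch. It assumes that "Between 30 and 45 minutes" includes both 30 and 45, which is the reading that matches the counts shown; the helper name `category` is illustrative. The same list also gives the share of students who studied 30 minutes or more.

```python
from collections import Counter

# Minutes studied by students 1 through 24, in order, from the table above.
minutes = [30, 15, 30, 45, 60, 45, 15, 30, 30, 45, 60, 60,
           20, 30, 45, 30, 20, 20, 15, 30, 45, 45, 60, 30]

def category(m):
    """Assign a study time to one of the three table rows (30 and 45 inclusive)."""
    if m < 30:
        return "Less than 30 minutes"
    if m <= 45:
        return "Between 30 and 45 minutes"
    return "More than 45 minutes"

counts = Counter(category(m) for m in minutes)
for row in ["Less than 30 minutes", "Between 30 and 45 minutes", "More than 45 minutes"]:
    print(row, counts[row])

# Percentage of students who studied 30 minutes or more.
percent_30_or_more = 100 * sum(1 for m in minutes if m >= 30) / len(minutes)
print(percent_30_or_more)  # 75.0
```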

4. What  percentage  of  students  spent  30  minutes  or  more   studying  last  night?  

75%

Focus skill

Data about housing prices in two neighborhoods were collected. The summaries of the data are shown here.

                     Neighborhood A   Neighborhood B
Mean                 $500,000         $350,000
Median               $350,000         $345,000
Standard deviation   $100,000         $25,000
IQR                  $80,000          $24,000

5. Which  value  should  be  reported  as  the  "typical"  value   of  home  prices  in  each  neighborhood?  Why?  

The home prices in Neighborhood A seem to be skewed to the right since the mean is higher than the median. The best measure of center would be the median. The mean and median home prices in neighborhood B are about the same, meaning that either value would be representative.

6. What  does  the  spread  tell  you  about  the  home  prices?  

The home prices in Neighborhood A are more spread out. There are several very expensive homes in that neighborhood. The home prices in Neighborhood B are closer together. Most of the homes are within about $25,000 of the mean home price.

 

Lesson 3: Comparing distributions

LESSON 3: OPENER

Researchers  asked  15  women  and  15  men  to  report  their  yearly  salaries.  Here  are  the  data  they  collected:  

Women: Salary (in thousands of dollars)

31   22   40   48   27
28   36   46   20   47
44   32   52   43   24

Men: Salary (in thousands of dollars)

32   43   57   38   52
46   63   30   60   34
35   54   49   40   56

 

1. Calculate the five-number summary of the women’s salary data.

Minimum = 20, Q1 = 27, Median = 36, Q3 = 46, Maximum = 52

2. Calculate the five-number summary of the men’s salary data.

Minimum = 30, Q1 = 35, Median = 46, Q3 = 56, Maximum = 63

LESSON 3: CORE ACTIVITY

1. Construct  parallel  boxplots  of  the  women’s  and  men’s  salary  data.    

   

2. Look  at  the  shape,  center,  and  spread  of  the  parallel  boxplots.  What  do  these  features  tell  you  about  the  data?   Determine  whether  each  statement  is  true  or  false.  

Both distributions are symmetric. True

The center of the women’s data is higher than the center of the men’s data. False

Either the mean or median is the best measure of center for the two sets of data. True

The two sets of data seem to have similar spreads. True

If the mean is chosen as the measure of center, then the interquartile range is the best measure of spread. False

3. Which  measure  of  center  would  you  choose  to  report?  Find  this  measure  of  center  and  explain  your  choice.  

Answers will vary.

Mean (women’s salary) = $36 thousand
Mean (men’s salary) = $45.9 thousand
Median (women’s salary) = $36 thousand
Median (men’s salary) = $46 thousand

4. Calculate  the  measure  of  spread  that  corresponds  to  the  measure  of  center  you  chose.  

Answers will vary. The measure of spread students choose to report depends on the measure of center they chose. If students chose to report the mean, they will find the standard deviation. If students chose to report the median, they will find the interquartile range.

Women’s Salary                                    Men’s Salary
Observation   Deviation   Squared deviation       Observation   Deviation   Squared deviation
31            -5          25                      32            -13.9       193.21
28            -8          64                      46              0.1         0.01
44             8          64                      35            -10.9       118.81
22            -14         196                     43             -2.9         8.41
36             0          0                       63             17.1       292.41
32            -4          16                      54              8.1        65.61
40             4          16                      57             11.1       123.21
46             10         100                     30            -15.9       252.81
52             16         256                     49              3.1         9.61
48             12         144                     38             -7.9        62.41
20            -16         256                     60             14.1       198.81
43             7          49                      40             -5.9        34.81
27            -9          81                      52              6.1        37.21
47             11         121                     34            -11.9       141.61
24            -12         144                     56             10.1       102.01

Standard deviation (women) = $10.5 thousand Standard deviation (men) = $10.8 thousand

The IQR for the women’s salary data is 46,000 – 27,000 = 19,000. The IQR for the men’s salary data is 56,000 – 35,000 = 21,000.
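The center and spread values reported for the salary data can be checked with the following Python sketch. It assumes the sample standard deviation (n - 1 divisor, which is what `statistics.stdev` computes) and the median-of-halves rule for quartiles; both assumptions reproduce the values above. The helper name `iqr` is illustrative.

```python
from statistics import mean, median, stdev

# Yearly salaries in thousands of dollars, from the Lesson 3 Opener.
women = [31, 22, 40, 48, 27, 28, 36, 46, 20, 47, 44, 32, 52, 43, 24]
men = [32, 43, 57, 38, 52, 46, 63, 30, 60, 34, 35, 54, 49, 40, 56]

def iqr(values):
    """Interquartile range using the median-of-halves rule (median excluded when n is odd)."""
    data = sorted(values)
    half = len(data) // 2
    q1 = median(data[:half])
    q3 = median(data[len(data) - half:])
    return q3 - q1

for label, data in [("women", women), ("men", men)]:
    print(label,
          "mean =", round(mean(data), 1),
          "SD =", round(stdev(data), 1),
          "median =", median(data),
          "IQR =", iqr(data))
```

Reporting the mean with the standard deviation, or the median with the IQR, keeps each measure of center paired with its matching measure of spread.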

5. Based  on  your  analysis,  what  conclusions  can  you  draw?  

Answers will vary.

Sample answer: The center of the data collected from men is higher than the center of the data collected from women. However, the variability in the two data sets is similar.

The data suggest that there is a difference between the men’s and women’s salaries. It seems that women make less money than men.

LESSON 3: REVIEW MID-UNIT ASSESSMENT

Today you will review your mid-unit assessment.

LESSON 3: HOMEWORK

Notes or additional instructions based on whole-class discussion of homework assignment:

     

 

A  new  method  for  studying  mathematics  was  tested  with  20  freshmen  enrolled  in  an  algebra  class.  Their  average  test   scores  before  the  students  used  the  new  method  and  after  they  used  the  new  method  are  listed  in  this  table.  

Student   Before   After     Student   Before   After
A         72       80        K         85       85
B         81       85        L         91       87
C         79       79        M         52       60
D         90       92        N         80       83
E         95       93        O         74       75
F         87       89        P         77       77
G         60       67        Q         61       60
H         65       72        R         88       85
I         74       92        S         86       82
J         66       70        T         70       75

1. Construct  two  histograms,  one  for  exam  scores  before  the  new  method  and  one  for  exam  scores  after  the  new   method.  Label  your  histograms  “Before”  and  “After.”  

Answers:

                     

2. Describe  the  data  sets.  Be  sure  to  compare  the  centers  and  spreads  of  the  two  data  sets.  

Answers will vary. Since both data sets are roughly symmetric, students could choose to report either the mean or median.

Sample answer: The Before scores have a mean of 76.65, while the After scores have a mean of 79.4. The standard deviation of the Before scores is 11.72, while the standard deviation of the After scores is 9.90. The shapes of both graphical representations are approximately symmetrical and bell shaped, with peaks in the 70 to 80 range.
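The summary values in the sample answer, and the bar heights for the two histograms, can be sketched in Python. The bin width of 10 points is an assumption (the workbook does not specify one), and the sample standard deviation (n - 1 divisor) is assumed because it reproduces the values quoted above. The helper name `bin_counts` is illustrative.

```python
from collections import Counter
from statistics import mean, stdev

# Average test scores for students A through T, in order, from the table above.
before = [72, 81, 79, 90, 95, 87, 60, 65, 74, 66,
          85, 91, 52, 80, 74, 77, 61, 88, 86, 70]
after = [80, 85, 79, 92, 93, 89, 67, 72, 92, 70,
         85, 87, 60, 83, 75, 77, 60, 85, 82, 75]

def bin_counts(scores, width=10):
    """Count scores per bin; the key 70 stands for the bin 70-79 (width is an assumption)."""
    counts = Counter((s // width) * width for s in scores)
    return dict(sorted(counts.items()))

for label, scores in [("Before", before), ("After", after)]:
    print(label,
          "mean =", round(mean(scores), 2),
          "SD =", round(stdev(scores), 2),
          "bins =", bin_counts(scores))
```

With these bins, the Before scores fall into counts of 1, 4, 6, 6, and 3 for the 50s through 90s, and the After scores into counts of 3, 6, 8, and 3 for the 60s through 90s. A different bin width would change the picture somewhat.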

3. Based  on  the  histograms  you  constructed  and  your  comparison  of  the  shapes,  centers,  and  spreads  of  the  data  sets,   should  the  new  method  for  studying  mathematics  be  implemented  in  the  high  school?  Explain  your  reasoning.  

Answers will vary.

Sample answer: Looking at the graphs, there don't seem to be any major differences in the two data sets. Also, the means and standard deviations are relatively the same. The new study method does not appear to improve test scores.

LESSON 3: STAYING SHARP

Reviewing ideas from earlier grades

1. Solve. $\frac{21}{27} = \frac{a}{18}$

a = 14

2. A gallon of milk weighs 8 pounds. A shipping container of milk has a weight of 20 pounds. How many gallons of milk are in the shipping container?

2.5 gallons of milk

Preparing for upcoming lessons

50  men  and  50  women  were  asked  whether  they  agreed   or  disagreed  with  the  statement  “I  use  math  every  day   at  my  job.”  The  table  shows  their  responses.  

 

            Men   Women
Agree       40    35
Disagree    10    15

3. What percentage of all the people surveyed said they agreed with the statement “I use math every day at my job?”

75%

4. What percent of men said they agreed with the statement “I use math every day at my job?”

80%

Focus skill

Data about housing prices in two neighborhoods were collected. The summaries of the data are shown here.

                     Neighborhood A   Neighborhood B
Mean                 $500,000         $350,000
Median               $350,000         $345,000
Standard deviation   $100,000         $25,000
IQR                  $80,000          $24,000

5. What  can  you  conclude  about  the  home  prices  in  each   neighborhood?  

The typical home in each neighborhood costs about the same, even though there are a few houses in Neighborhood A that are very expensive.

Neighborhood A has greater variability in home prices than Neighborhood B.

6. If  you  were  selling  your  house,  which  neighborhood   would  you  like  to  be  living  in?  If  you  were  buying  a   house,  which  neighborhood  would  you  want  to  buy  in?   Explain.  

Answers will vary. Some students may say to sell in Neighborhood A because there is a greater chance to make money. Some may say to buy in Neighborhood B because the prices are more predictable.
