10
Syllabus reference
Strand:
Statistics and probability
Core topic:
Data collection and
presentation
In this chapter
10A Calculating and
interpreting the mean
10B Mean, from frequency
distribution tables
10C Mean, from grouped data
10D Median and mode
10E Best summary statistics
10F Range and interquartile
range
10G Standard deviation
10H Comparing sets of data
Describing,
exploring and
comparing data
Introduction
Archie is an archeologist. He is passionate about his job, which involves digging for buried artefacts, classifying his findings and piecing them together to unravel and record the history of past civilizations. Imagine his excitement when he uncovered a site of buried skulls in Egypt!
Further investigation confirmed that these were male skulls which had originated from a race residing in Egypt. He was keen to place their existence in time. Delving into existing records, he uncovered measurements on male Egyptian skulls recorded for two time periods – one around 4000 BC and the other around AD 150.
These measurements confirmed a change in skull shape over the time period and this was taken as evidence of interbreeding of the Egyptians with migrant populations over the years. If Archie compared the measurements on record with those he made on his recently excavated skulls, he could possibly identify a time in history when this race existed.
The measurements of male Egyptian skulls on record for 4000 BC and AD 150 were:
1. breadth of skull 2. height of skull and 3. length of skull.
The recorded data for the measurements (in mm) of 30 male Egyptian skulls are collated in the table on the following page.
Where should Archie start? Statistical techniques enable us to summarise sets of data, which can then be compared. If Archie can summarise these two data sets, he could then compare them with his own measurements.
In this chapter, we shall investigate the main methods available to describe data sets such as these. These methods employ measures of central tendency, in particular the mean, median and mode. We shall also examine the range and interquartile range, the standard deviation, and stem plots and boxplots. We shall then see how these measures can be used to compare sets of data.
In the previous chapter we investigated boxplots as a tool for comparing data sets. We now explore this tool further, endeavouring to place Archie’s skull at some period in history. Combining this with other statistical tools may enable us to provide a solution for Archie.
4000 BC                  AD 150
Breadth Height Length    Breadth Height Length
131 138 89 137 123 91
125 131 92 136 131 95
131 132 99 128 126 91
119 132 96 130 134 92
136 143 100 138 127 86
138 137 89 126 138 101
139 130 108 136 138 97
125 136 93 126 126 92
131 134 102 132 132 99
134 134 99 139 135 92
129 138 95 143 120 95
134 121 95 141 136 101
126 129 109 135 135 95
132 136 100 137 134 93
141 140 100 142 135 96
131 134 97 139 134 95
135 137 103 138 125 99
132 133 93 137 135 96
129 136 96 133 125 92
132 131 101 145 129 89
126 133 102 138 136 92
135 135 103 131 129 97
134 124 93 143 126 88
128 134 103 134 124 91
130 130 104 132 127 97
138 135 100 137 125 85
128 132 93 129 128 81
127 129 106 140 135 103
131 136 114 147 129 87
124 138 101 136 133 97
You may not be familiar with some of the following statistical terms. We shall investigate them further in this chapter.
1 A set of test results is shown below.
8, 3, 6, 4, 5, 4, 9, 7, 4, 6, 5
a Arrange the scores in ascending order.
b How many scores are in the set?
c In what position does the middle score lie?
d What is the value of the middle score (the median)?
e What is the range of the data?
f Calculate the average (mean).
g How many scores are below the mean? How many above?
h Give the most frequently occurring score (mode) of the set of data.
i Comment on any difference in value between the mean, median and mode.
j Determine values for the lower and upper quartiles.
2 The mean, median and mode are measures of ‘central tendency’. Explain what this term ‘central tendency’ means.
3 The spread of the scores can be determined using a number of statistical measures. Name some measures of ‘spread’ with which you are familiar.
4 What is the relationship between the median and the quartiles?
5 In a boxplot, which of the following are true?
a The quartiles divide the data into four sections of equal length.
b The median is the score with an equal number of data values above it and below it.
c If the ‘whiskers’ are longer than the ‘box’, it means that there are more scores in the whiskers than there are in the box.
d The whole ‘box’ contains the same number of scores as the two ‘whiskers’ together.
e It is possible to calculate the mean of the set of data by observing the values in the boxplot.
6 For those statements in question 5 that are incorrect, explain why this is so. Adjust the statements to make them correct.
Calculating the mean
If you were to survey a group of people about what they believe is meant by the word ‘average’, you would find a variety of answers.
When looking at a set of statistics we are often asked for the average. The average is a figure that describes a typical score. In statistics, the correct term for the average is the mean. The mean is the first of three measures of central tendency that we shall be studying. The others are the median and the mode.
The statistical symbol for the mean is $\bar{x}$. The formula for the mean is
$$\bar{x} = \frac{\sum x}{n}$$
In mathematics, the symbol Σ (sigma) means sum or total, x represents each individual score in a list and Σx is therefore the sum of the scores. The sum is divided by n, which represents the number of scores.
A graphics calculator can be used to calculate and display many statistical functions. There are several brands of graphics calculator, but the Texas Instruments TI-83 will be the model referred to in illustrations. Other brands of calculator allow calculations and displays with similar instructions. Many of the exercises lend themselves to either manual working or graphics calculator use.
WORKED Example 1
Find the mean of the scores 17, 16, 13, 15, 16, 20, 10, 15.
1 THINK: Find the total of all scores.
  WRITE: Total = 17 + 16 + 13 + 15 + 16 + 20 + 10 + 15
         Σx = 122
2 THINK: Divide the total by 8 (the number of scores).
  WRITE: Mean = $\frac{\sum x}{n} = \frac{122}{8}$
         $\bar{x} = 15.25$
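The same calculation can be checked with a few lines of Python. This is only a sketch for readers who have a computer to hand; the function name `mean` is our own choice and is not part of the chapter.

```python
def mean(scores):
    """Return the mean: the sum of the scores divided by the number of scores."""
    if not scores:
        raise ValueError("the mean of an empty list is undefined")
    return sum(scores) / len(scores)


# Scores from worked example 1
scores = [17, 16, 13, 15, 16, 20, 10, 15]

total = sum(scores)    # the sum of the scores (sigma x)
n = len(scores)        # the number of scores
x_bar = mean(scores)   # the mean

print("Total =", total)
print("n =", n)
print("Mean =", x_bar)
```

The printed total and mean should agree with the manual working above.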
WORKED Example 2
Calculate the mean of the set of data below, using a graphics calculator.
10, 12, 15, 16, 18, 19, 22, 25, 27, 29
1 THINK: Enter the data in L1. (Press STAT, select 1:Edit... and press ENTER to access the screen.)
2 THINK: Calculate the mean.
(a) Press STAT.
(b) Highlight CALC in the top line.
(c) Highlight 1:1–Var Stats and press ENTER.
(d) Type L1 and press ENTER.
(e) A number of values are given. The top entry, $\bar{x} = 19.3$, gives us the mean.
  WRITE/DISPLAY: $\bar{x} = 19.3$
Interpreting the mean
When we use the mean, we are attempting to represent the central value of the data. Let us investigate what affects its value. Consider five scores: 1, 2, 3, 4 and 5. The value of the mean is the total (15), divided by the number of scores (5). The answer, 3, clearly lies in the centre. What would be the value of the mean if the last score had been 20 instead of 5? The answer of 6, where only one score lies above the mean and four lie below it, clearly demonstrates the influence of extreme values on the mean. Since the calculation takes into account the values of all scores, a check must be applied to determine whether the resulting value is a reasonable representation of the centre of the data.
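The check described above (counting the scores below and above the mean) can be sketched in Python. The helper name `count_below_above` is hypothetical and is used here only for illustration.

```python
def count_below_above(scores):
    """Return (mean, number of scores below the mean, number above the mean)."""
    m = sum(scores) / len(scores)
    below = sum(1 for s in scores if s < m)
    above = sum(1 for s in scores if s > m)
    return m, below, above


balanced = [1, 2, 3, 4, 5]   # the five scores from the text
extreme = [1, 2, 3, 4, 20]   # the last score replaced by 20

print(count_below_above(balanced))  # mean in the centre: 2 below, 2 above
print(count_below_above(extreme))   # mean pulled upward: 4 below, 1 above
```

When the counts below and above the mean are very unequal, as in the second list, the mean may not be a reasonable representation of the centre.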
10A Calculating and interpreting the mean
Use a graphics calculator or manual working for the following.
1 Copy and complete the following:
Another word commonly used for ‘mean’ is __________. The mean is calculated by finding the __________ of the scores, then dividing by the __________ of scores. The mean is a measure of __________ tendency. Two other measures are __________ and __________.
2 Calculate the mean of each of the following sets of scores.
a 4, 8, 3, 5, 5
b 16, 24, 30, 35, 23, 11, 45, 28
c 65, 92, 56, 84
d 9.2, 9.7, 8.8, 8.1, 5.6, 7.5, 8.5, 6.4, 7.0, 6.4
e 356, 457, 182, 316, 432, 611, 299, 355
3 Majid sits for five tests in mathematics. His percentages on the tests were 45%, 90%, 67%, 86% and 75%. Calculate Majid’s mean percentage on the five tests. How many of his percentages were above the mean, and how many below?
remember
1. The mean is the statistical term for ‘average’.
2. The mean is calculated by adding all scores then dividing by the number of scores. That is, $\bar{x} = \frac{\sum x}{n}$
3. As a measure of central tendency, the mean represents a value for the ‘centre’ of the scores.
4. Check to determine the number of scores above and below the mean.
5. The value of the mean is affected by extremes in scores.
6. Remember to include correct units in your final answer.
4 An oil company surveys the price of petrol in eight Brisbane suburbs. The results are listed below.
Manly 76.9 c/L Kenmore 72.9 c/L
Bardon 73.4 c/L Nundah 70.9 c/L
Springwood 72.3 c/L Mansfield 75.8 c/L
Oxley 73.9 c/L Boondall 71.1 c/L
Based on these results, calculate the mean price of petrol in cents per litre in Brisbane. Is this mean a realistic representation of the central value? Explain.
5 The seven players on a netball team have the following heights: 1.65 m, 1.81 m, 1.75 m, 1.78 m, 1.88 m, 1.92 m and 1.86 m. Calculate the mean height of the players on this team, correct to 2 decimal places. How many of the players have heights above the mean height?
6 A golf ball manufacturer randomly tests the mass of 10 golf balls from a batch. The batch will be considered satisfactory if the average mass of the balls is between 44.8 g and 45.2 g. The masses, in grams, of those tested are:
45.19, 45.06, 45.35, 44.78, 45.47, 44.68, 44.95, 45.32, 44.60, 44.95.
a Will the batch be passed as satisfactory?
b Which ball has a mass which is furthest from the mean: the lightest one or the heaviest one?
7 Consider the five values, 1, 2, 3, 4 and 5. The mean is calculated as 3.
a What happens to the value of the mean if 10 is added to each score?
b What effect does multiplying each score by 10 have on the mean’s value?
Means of skull measurements
Refer to the table of skull measurements for 4000 BC and AD 150 displayed earlier in the chapter.
1 Using the breadth, height and length measurements (in mm) for 4000 BC, calculate the mean for each set of data.
2 Draw up the table shown below and include the means calculated above. The means for the corresponding measurements for AD 150 have been included for comparison.
3 Note the difference between the means for the 4000 BC measurements and the corresponding ones for AD 150. Do you notice a trend?
4 Examine the breadth data for 4000 BC. How many scores are above the mean? How many scores are below the mean?
5 Examine the height and length data sets for 4000 BC and determine the number of scores above and below the mean in each set.
6 In your opinion, does the mean appear to represent a value close to the centre of each data set?
        4000 BC                    AD 150
        Breadth  Height  Length    Breadth  Height  Length
Mean                               136.2    130.3   93.5
Frequency distribution tables
In the last section, we dealt with easily manageable quantities of data. However, more commonly we are confronted with the task of processing much larger data sets. Making sense of large quantities of data is best achieved by using a frequency distribution table. The headings for this table are Score (x), Tally (optional), Frequency (f) and a fourth column, (fx), which contains the score (x) multiplied by the frequency (f). The total of this fourth column indicates the total of all the scores. The mean is then calculated by dividing this total of all scores by the sum of the frequency column (which represents the total number of scores). Written as a formula, this is:
$$\bar{x} = \frac{\sum fx}{\sum f}$$
WORKED Example 3
Complete the frequency table below, then calculate the mean.
Score (x)   Tally              Frequency (f)   fx
4           |||
5           ||||| ||
6           ||||| ||||| |
7           ||||| ||||| |||
8           ||||| |||||
9           ||||| |
                               Σf =            Σfx =
1 THINK: Complete the frequency column from the tally column.
2 THINK: Complete the fx column by multiplying each score by the frequency.
3 THINK: Sum the frequency column.
4 THINK: Sum the fx column.
  WRITE:
Score (x)   Tally              Frequency (f)   fx
4           |||                3               12
5           ||||| ||           7               35
6           ||||| ||||| |      11              66
7           ||||| ||||| |||    13              91
8           ||||| |||||        10              80
9           ||||| |            6               54
                               Σf = 50         Σfx = 338
5 THINK: Use the formula to calculate the mean.
  WRITE: $\bar{x} = \frac{\sum fx}{\sum f} = \frac{338}{50}$
         $\bar{x} = 6.76$
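For readers who prefer to check the table with a short program, the following Python sketch reproduces the fx column and the two totals from worked example 3. The layout of the table as a list of (score, frequency) pairs and the function name are our own assumptions, not something prescribed by the chapter.

```python
def mean_from_frequency_table(table):
    """table is a list of (score, frequency) pairs; return (sum f, sum fx, mean)."""
    sum_f = sum(f for _, f in table)        # total number of scores
    sum_fx = sum(x * f for x, f in table)   # total of all the scores
    return sum_f, sum_fx, sum_fx / sum_f


# Frequency table from worked example 3
table = [(4, 3), (5, 7), (6, 11), (7, 13), (8, 10), (9, 6)]

sum_f, sum_fx, x_bar = mean_from_frequency_table(table)
print("Sum of f  =", sum_f)
print("Sum of fx =", sum_fx)
print("Mean      =", x_bar)
```

Multiplying each score by its frequency gives the same total as adding every individual score, which is why the two methods give the same mean.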
Graphics Calculator tip! Calculating the mean
To enlist the aid of a graphics calculator in determining the mean in worked example 3:
1. Enter the data.
(a) To clear any previous equations press Y= and clear any functions.
(b) Press STAT, select 1:EDIT and press ENTER.
(c) Enter the scores in L1 and the frequencies in L2.
2. Set up the calculator to calculate the mean.
(a) Press STAT, select CALC, then the 1-Var Stats option. Type L1 and L2 separated by a comma.
(b) Press ENTER to display a number of statistical measures.
(c) Amongst other statistical data you can read off the number of scores, the sum of the scores and the mean.
remember
1. The mean for a large number of scores is generally calculated from a frequency distribution table. A graphics calculator can also be used.
2. The formula for the mean is $\bar{x} = \frac{\sum fx}{\sum f}$
10B Mean, from frequency distribution tables
1 Using our skull measurements for breadth for 4000 BC, draw up a frequency distribution table as shown below. The tallies for each score have been included. Copy and complete the frequency (f) column and the (fx) column; total the last two columns; then calculate the mean. Notice that its value is the same as that calculated before, using the individual scores.
a Σf: same value as n. Σfx: same value as total of scores.
b $\bar{x} = \frac{\sum fx}{\sum f}$ = ———?
Breadth (x) Tally Frequency (f) fx
119 |
124 |
125 | |
126 | |
127 |
128 | |
129 |
130 |
131 | | | |
132 | | |
134 | | |
135 | |
136 |
138 | |
139 | |
141 |
Σf = Σfx =
2 A class’s marks (out of 10) on a spelling test are recorded in the frequency table below.
Score (x)   Tally              Frequency (f)   fx
4           | |
5           | | | |
6           | | | |
7           | | | | | | | |
8           | | |
9           | | | |
10          | |
                               Σf=             Σfx=
a Copy and complete the table.
b Use the formula Mean = $\frac{\sum fx}{\sum f}$ to calculate the class’s mean.
c How many scores are greater than the mean?
3 An electrical store records the number of television sets sold each week over a year. The results are shown in the table below.
No. of television sets sold (x)   No. of weeks (f)   fx
16                                4
17                                4
18                                3
19                                6
20                                7
21                                12
22                                8
23                                2
24                                4
25                                2
                                  Σf=                Σfx=
a Copy and complete the table.
b Calculate the mean number of television sets sold each week over the year. Give your answer correct to one decimal place.
4 In a soccer season a team played 50 matches. The number of goals scored in each match is shown in the table below.
No. of goals      0   1   2    3    4   5
No. of matches    4   9   18   10   5   4
a Redraw this table in the form of a frequency distribution table.
b Use your table to calculate the mean number of goals scored each game.
c By calculating the number of scores below and above the mean, decide whether its value is suitable as a measure of central tendency. Justify your decision.
5 A clothing store records the dress sizes sold during a day. The results are shown below.
12 14 10 12 8 12 16 10 8 12 10 12 18 10 12 14 16 10 12 12 12 14 18 10 14 12 12 14 14 10
a Present this information in a frequency table.
b Calculate the mean dress size sold this day.
c Comment on your answer.
6 multiple choice
There are eight players in a Rugby forward pack. The mean mass of the players is 104 kg. The total mass of the forward pack is:
A 13 kg B 104 kg C 112 kg D 832 kg
7 multiple choice
A small business employs five people on a mean wage of $380 per week. A manager is then employed and receives $500 per week. What is the mean wage of the six employees?
A $380 B $400 C $480 D $2400
8 multiple choice
The mean height of five starting players in a basketball match is 1.82 m. During a time out, a player who is 1.78 m tall is replaced by a player 1.88 m tall. What is the mean height of the players after the replacement has been made?
A 1.78 m B 1.82 m C 1.84 m D 1.88 m
Grouping data and using grouped data
In some cases, the range of data values is so great that grouping the data into classes makes the data more manageable. For example, consider the following data set of people with ages ranging from 25 to 49. We might group the ages in intervals of 5 in the form 25–29, 30–34 etc. This means that all the values (25, 26, 27, 28 and 29) would be grouped in one class. The centre of this class would be 27, and this is the value used as the score (x). This class centre is then multiplied by the frequency, (f). In this case, the value obtained for the mean is an estimate rather than an exact value. Sometimes the choice of the size of the class intervals also has an effect on the accuracy of the mean.
WORKED Example 4
Complete the frequency distribution table and use it to estimate the mean of the distribution.
Class    Class centre (x)   Tally              Frequency (f)   fx
25–29                       ||||
30–34                       ||||| ||||
35–39                       ||||| ||||| |||
40–44                       ||||| ||||| ||
45–49                       ||||| ||
                                               Σf=             Σfx=
1 THINK: Calculate the class centres.
2 THINK: Complete the frequency column from the tally column.
3 THINK: Multiply each class centre by the frequency to complete the fx column.
4 THINK: Sum the frequency column.
5 THINK: Sum the fx column.
  WRITE:
Class    Class centre (x)   Tally              Frequency (f)   fx
25–29    27                 ||||               4               108
30–34    32                 ||||| ||||         9               288
35–39    37                 ||||| ||||| |||    13              481
40–44    42                 ||||| ||||| ||     12              504
45–49    47                 ||||| ||           7               329
                                               Σf = 45         Σfx = 1710
6 THINK: Use the formula to calculate the mean.
  WRITE: $\bar{x} = \frac{\sum fx}{\sum f} = \frac{1710}{45}$
         $\bar{x} = 38$
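The grouped-data method can also be sketched in Python. Each class is represented here by its lower limit, upper limit and frequency; that representation and the function names are assumptions made for this illustration only. The result is an estimate, because every score in a class is treated as if it were equal to the class centre.

```python
def class_centre(lower, upper):
    """Return the centre of a class, for example 27 for the class 25-29."""
    return (lower + upper) / 2


def estimate_mean_from_classes(classes):
    """classes is a list of (lower, upper, frequency); return the estimated mean."""
    sum_f = sum(f for _, _, f in classes)
    sum_fx = sum(class_centre(lo, hi) * f for lo, hi, f in classes)
    return sum_fx / sum_f


# Classes and frequencies from worked example 4
classes = [(25, 29, 4), (30, 34, 9), (35, 39, 13), (40, 44, 12), (45, 49, 7)]

centres = [class_centre(lo, hi) for lo, hi, _ in classes]
print("Class centres:", centres)
print("Estimated mean:", estimate_mean_from_classes(classes))
```

Rerunning the sketch with different class intervals for the same raw data is one way to see how the choice of interval affects the estimate.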
Graphics Calculator tip! Calculating the mean from grouped data
A graphics calculator can be used to calculate the mean from a grouped data frequency distribution. In such cases, the class centre can be entered as L1 and the frequency as L2. Remember to set up the 1-Var Stats to recognise these two lists (Xlist: L1 and Freq: L2).
10C Mean, from grouped data
1 a Using our skull measurements for breadth for 4000 BC (shown previously as an ungrouped frequency distribution), draw up the table below, using class intervals 119–121, 122–124 etc. Complete the columns and calculate the mean.
b Does the mean differ from the two previous calculations? Explain any difference.
Compare Σfx with Σfx from exercise 10B, question 1.
$\bar{x} = \frac{\sum fx}{\sum f}$ = __________?
Class     Class centre (x)   Tally   Frequency (f)   fx
119–121 120
122–124
125–127
128–130
131–133
134–136
137–139
140–142
Σf = Σfx =
remember
1. The mean is the statistical term for average.
2. The mean is calculated by adding all scores then dividing by the number of scores.
3. When calculating the mean from a frequency distribution table, a column for frequency × score (fx) is added. The mean is then calculated using the formula $\bar{x} = \frac{\sum fx}{\sum f}$.
4. If the frequency distribution uses grouped data, the fx column is calculated using class centres for the x-value.
5. The mean can also be calculated using a graphics calculator.
2 The table below shows a set of class marks on a test out of 100.
Class     Class centre (x)   Tally                Frequency (f)   fx
31–40                        |
41–50                        | | |
51–60                        | | | |
61–70                        | | | | | |
71–80                        | | | | | | | | |
81–90                        | |
91–100                       | |
                                                  Σf=             Σfx=
a Copy and complete the frequency distribution table.
b Use the table to calculate the mean class mark.
c In which class interval does the mean lie?
3 In the heats of the 100-m freestyle at a swimming meet, the times of the swimmers were recorded in the table below.
Time           Class centre (x)   No. of swimmers (f)   fx
50.01–51.00                       4
51.01–52.00                       12
52.01–53.00                       23
53.01–54.00                       38
54.01–55.00                       15
55.01–56.00                       3
a Copy and complete the frequency distribution table.
b Use the table to calculate the mean time.
c How many swimmers swam faster than this mean time?
4 A cricketer played 50 innings in test cricket for the following scores.
23 65 8 112 54 0 84 12 21 4
25 105 74 40 1 15 33 45 21 47
16 70 22 33 21 8 34 36 5 7
69 104 57 78 158 0 51 16 6 16
0 49 0 14 28 52 21 3 3 7
a Put the above information into a frequency distribution table using appropriate groupings.
b Use the table to estimate the batting average for this player.
c Repeat the exercise using a different size class interval. Compare your answers.
5 Use the statistics function on your calculator to find the mean of each of the following scores, correct to 1 decimal place.
a 11, 15, 13, 12, 21, 19, 8, 14
b 2.8, 2.3, 3.6, 2.9, 4.5, 4.2
c 41, 41, 41, 42, 43, 45, 45, 45, 45, 46, 49, 50
6 Use your calculator to find the mean from each of the following tables.
a
Score   Frequency
3       7
4       10
5       18
6       19
7       38
8       27
9       10
10      5
b
Score   Frequency
28      5
29      18
30      25
31      25
32      14
33      10
34      3
7 The table below shows the heights of a group of people.
Height    Class centre   Frequency
150–154   152            7
155–159   157            14
160–164   162            13
165–169   167            23
170–174   172            24
175–179   177            12
Calculate the mean of this distribution.
8 Seventy students were timed on a 100-m sprint during their P.E. class. The results are shown in the table below.
Time (s)   12 to <13   13 to <14   14 to <15   15 to <16   16 to <17
a Calculate the class centre for each group in the distribution.
b Use your calculator to find the mean of the distribution.
9 A drink machine is installed near a quiet beach. The number of cans sold over the first 10 weeks after its installation is shown below.
4 39 31 31 50 43 70 45 57 71 18 26 3 52 51 59 33 51 27 62 30 90 3 30 97 59 33 44 99 62 72 6 42 83 19 49 11 6 63 4 53 20 45 58 1 9 79 41 2 33 97 71 52 97 69 83 39 84 92 43 71 98 8 97 18 89 21 9 4 17
a Put this information into a frequency distribution table using the classes 1–10, 11–20, 21–30 etc.
b Calculate the mean number of cans sold per day over these 10 weeks.
c Using the raw data above, calculate the number of days on which the sales were greater than the mean.
Median and mode
So far we have used the mean as a measure of the typical score in a data set. Consider the case of someone who is analysing the typical house price in an area. On a particular day, five houses are sold in the area for the following prices:
$175 000 $149 000 $160 000 $211 000 $850 000
For these five houses the mean price is $309 000. The mean is much greater than the price of most of the houses in the data set. This is because there is one score which is much greater than all the others. For such data sets, we need to use a different measure of central tendency.
Median
The median is the middle score in a data set (of n scores), when all scores are arranged in order. If the data set consists of an odd number of scores, there is one score which lies exactly in the middle. For a data set consisting of an even number of scores, the median will always occur half way between two scores.
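As an illustration only, the definition above can be written as a short Python function. The function name `median` is our own, and the house prices are the five sale prices quoted earlier in this section.

```python
def median(scores):
    """Return the middle score, or the average of the two middle scores."""
    if not scores:
        raise ValueError("the median of an empty list is undefined")
    ordered = sorted(scores)          # the scores must be arranged in order
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:                    # odd number of scores: one middle score
        return ordered[middle]
    # even number of scores: half way between the two middle scores
    return (ordered[middle - 1] + ordered[middle]) / 2


house_prices = [175000, 149000, 160000, 211000, 850000]

print("Mean price:  ", sum(house_prices) / len(house_prices))
print("Median price:", median(house_prices))
```

For the house prices, the median is not affected by the single very expensive house, whereas the mean is.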
Using single scores
The position of the median can be found using the formula:
Median position = $\frac{n+1}{2}$th score
The median becomes more complicated when there is an even number of scores because there are two scores in the middle. When there is an even number of scores, the median is the average of the two middle scores.
WORKED Example 5
For the scores 3, 4, 8, 2, 2, 6, 9, 1, 6 calculate the median.
1 THINK: Rewrite the scores in ascending order. There are 9 scores here.
  WRITE: 1, 2, 2, 3, 4, 6, 6, 8, 9
2 THINK: The median is the middle score, that is, the $\frac{n+1}{2}$th score.
  WRITE: Median = $\frac{9+1}{2}$th score
         Median = 5th score
         Median = 4
WORKED Example 6
Find the median of the scores 13, 13, 16, 12, 19, 18, 20, 18.
1 THINK: Write the scores in ascending order.
  WRITE: 12, 13, 13, 16, 18, 18, 19, 20
2 THINK: There is an even number (8) of scores, so average the two middle scores.
  WRITE: Median = $\frac{n+1}{2}$th score
         Median = $\frac{8+1}{2}$th score
         = 4.5th score
         that is, half way between the 4th and 5th scores. The 4th score is 16. The 5th score is 18.
         Median = $\frac{16+18}{2}$
         Median = 17
Median from a frequency distribution table of ungrouped data
The median can be calculated from a frequency distribution table if we extend the table by adding a cumulative frequency column. This column ‘cumulates’ or totals the frequencies as we descend the rows. It is then possible to determine which scores are in each position. Consider the frequency distribution table following.
Score   Frequency   Cumulative frequency
4       1           1                      The 1st score is 4.
5       6           7                      The 2nd–7th scores are 5.
6       9           16                     The 8th–16th scores are 6.
7       8           24                     The 17th–24th scores are 7.
8       4           28                     The 25th–28th scores are 8.
9       2           30                     The 29th and 30th scores are 9.
There are 30 scores in this distribution and so the middle two scores will be the 15th and 16th scores. By looking down the cumulative frequency column we can see that these scores are both 6. Therefore, 6 is the median of this distribution.
WORKED Example 7
Find the median for the frequency distribution below.
Score   Frequency
34      3
35      8
36      12
37      9
38      8
39      5
1 THINK: Redraw the frequency table with a cumulative frequency column.
  WRITE:
Score   Frequency   Cumulative frequency
34      3           3
35      8           (3 + 8) 11
36      12          (11 + 12) 23
37      9           (23 + 9) 32
38      8           (32 + 8) 40
39      5           (40 + 5) 45
2 THINK: There are 45 scores and so the middle score is the 23rd score.
  WRITE: Median = $\frac{n+1}{2}$th score
         Median = $\frac{45+1}{2}$th score
         Median = 23rd score
3 THINK: Look down the cumulative frequency column to see that the 23rd score is 36.
  WRITE: Median = 36
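The cumulative frequency method can be sketched in Python as follows. The function names and the representation of the table as (score, frequency) pairs in ascending order of score are assumptions made for this illustration.

```python
def score_at_position(table, position):
    """Return the score in the given position (1st, 2nd, ...) by cumulating frequencies."""
    cumulative = 0
    for score, frequency in table:
        cumulative += frequency        # running total down the rows
        if position <= cumulative:
            return score
    raise ValueError("position is beyond the last score")


def median_from_frequency_table(table):
    """table is a list of (score, frequency) pairs in ascending order of score."""
    n = sum(frequency for _, frequency in table)
    if n % 2 == 1:                     # odd: the (n + 1) / 2 th score
        return score_at_position(table, (n + 1) // 2)
    lower = score_at_position(table, n // 2)
    upper = score_at_position(table, n // 2 + 1)
    return (lower + upper) / 2         # even: average the two middle scores


# Frequency distribution from worked example 7 (45 scores)
example_7 = [(34, 3), (35, 8), (36, 12), (37, 9), (38, 8), (39, 5)]
print("Median:", median_from_frequency_table(example_7))
```

The loop does the same job as reading down the cumulative frequency column until the required position is reached.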
Mode
There are many examples where neither the mean nor the median is the appropriate measure of the typical score in a data set.
Using single scores
Consider the case of a clothing store. It needs to re-order a supply of dresses. To know what sizes to order it looks at past sales of this particular style and gathers the following data:
8 12 14 12 16 10 12 14 16 18 14 12 14 12 12 8 18 16 12 14
For this data set the mean dress size is 13.2. Dresses are not sold in size 13.2, so this has very little meaning. The median is 13, which also has little meaning as dresses are sold only in even-numbered sizes.
What is most important to the clothing store is the dress size that sells the most. In this case size 12 occurs most frequently. The score that has the highest frequency is called the mode.
When two scores share the ‘highest’ frequency, that is, occur an equal number of times, both scores are given as the mode. In this situation the scores are bimodal. If all scores occur an equal number of times, then the distribution has no mode.
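A possible Python sketch of these rules is given below. The function name `modes` and the decision to return an empty list when there is no mode are our own choices for the illustration; `Counter` is part of Python's standard `collections` module.

```python
from collections import Counter


def modes(scores):
    """Return a sorted list of the modes; an empty list means there is no mode."""
    if not scores:
        return []
    counts = Counter(scores)                 # frequency of each score
    highest = max(counts.values())
    # If several different scores all occur equally often, there is no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(s for s, c in counts.items() if c == highest)


# Dress sizes sold, from the clothing store example
dress_sizes = [8, 12, 14, 12, 16, 10, 12, 14, 16, 18,
               14, 12, 14, 12, 12, 8, 18, 16, 12, 14]

print("Mode(s):", modes(dress_sizes))
```

A result with two entries corresponds to a bimodal set of scores.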
WORKED Example 8
Find the mode of the scores below.
4, 5, 9, 4, 6, 8, 4, 8, 7, 6, 5, 4.
THINK: The score 4 occurs most often and so it is the mode.
WRITE: Mode = 4
Mode from a frequency distribution table
To find the mode from a frequency distribution table, we simply give the score that has the highest frequency.
WORKED Example 9
For the frequency distribution below state the mode.
Score   Frequency
14      3
15      6
16      11
17      14
18      10
19      7
THINK: The highest frequency is 14 which belongs to the score 17 and so 17 is the mode.
WRITE: Mode = 17
When a table is presented using grouped data, we do not have a single mode. In these cases, the class with the highest frequency is called the modal class.
10D Median and mode
1 Copy and complete the following:
The median score is the __________ one, when the scores are __________ __________. The formula for the position of the median score is __________. For an even number of scores, it is the __________ of the two middle ones. When using a frequency distribution table, the median is obtained from the __________ __________ column.
2 The scores of seven people on a spelling test are given below.
5 6 5 8 5 9 8
Calculate the median of these marks.
3 Below are the scores of eight people who played a round of golf.
75 80 81 76 84 83 81 82
Calculate the median for this set of scores.
4 Find the median for each of the following sets of scores.
a 3, 4, 5, 5, 5, 6, 9
b 5.6, 5.2, 5.4, 5.3, 5.8, 5.4, 5.3, 5.4
c 45, 62, 39, 88, 75
d 102, 99, 106, 108, 101, 103, 102, 105, 102, 101
5 A factory has 80 employees. Over a two-week period the number of people absent from work each day was recorded and the results are shown below.
3, 1, 5, 4, 3, 25, 4, 2, 4, 5
a Calculate the median number of people absent from work each day.
b Calculate the mean number of people absent from work each day.
c Does the mean or the median give a better measure of the typical number of people absent from work each day? Explain your answer.
remember
1. The median is the middle score in a data set or the average of the two middle scores. The scores must be arranged in order.
2. The median can be found using the cumulative frequency column of a frequency table.
3. The mode is the score that occurs the most.
4. Remember to include units in the final answer.
6 The table below shows the number of cans of drink sold from a vending machine at a high school each day.
Score   Frequency   Cumulative frequency
17      4
18      9
19      6
20      12
21      8
22      5
23      4
24      2
a Copy and complete the frequency distribution table.
b Use the table to calculate the median number of cans of drink sold each day from the vending machine.
7 The table below shows the number of accidents a tow truck attends each day over a three-week period.
No. of accidents   No. of days
2                  4
3                  12
4                  3
5                  1
6                  1
Calculate the median number of accidents attended by the tow truck each day.
8 The table below shows the number of errors made by a machine each day over a 50-day period. Calculate the median number of errors made by the machine each day.
No. of errors per day   Frequency
0                       9
1                       18
2                       13
3                       6
4                       3
5                       1
9 multiple choice
There are 25 scores in a distribution. The median score will be the:
A 12th score
B 12.5th score
C 13th score
D average of the 12th and 13th scores.
10
For the scores 4, 5, 5, 6, 7, 7, 9, 10 the median is:
11
Consider the frequency table at right. The median of these scores is:
12 The table below shows the number of sick days taken by each worker in a small busi-ness.
a Copy and complete the frequency distribution table. b Calculate themedian class for this distribution. 13 Copy and complete the following:
The mode is the __________ __________ score. If two scores occur most frequently an equal number of times, we have two modes, and this is termed a __________ dis-tribution. In a frequency distribution table of grouped data, we generally do not attempt to find a single mode, but give the __________ __________.
14 For each of the following sets of scores find the mode. a 2, 5, 3, 4, 5
b 8, 10, 7, 10, 9, 8, 8 c 11, 12, 11, 15, 14, 13
d 0.5, 0.4, 0.6, 0.3, 0.2, 0.4, 0.6, 0.9, 0.4 e 110, 113, 100, 112, 110, 113, 110
15 Find the mode for each of the following. (Hint: Some are bimodal and others have no mode.)
a 16, 17, 19, 15, 17, 19, 14, 16, 17
b 147, 151, 148, 150, 148, 152, 151
c 2, 3, 1, 9, 7, 6, 8
d 68, 72, 73, 72, 72, 71, 72, 68, 71, 68
e 2.6, 2.5, 2.9, 2.6, 2.4, 2.4, 2.3, 2.5, 2.6
A 5 B 6 C 6.5 D 7
A 2
B 3
C 8
D 13
Days sickness Frequency Cumulative frequency
0–4 10
5–9 12
10–14 7
15–19 6
20–24 5
25–29 3
30–34 2
Score Frequency
1 12
2 13
3 8
4 7
5 5
EXCEL Spreadshe
et
Mode
WORKED Example
8
EXCEL Spreadshe
et
16 Use the tables below to state the mode of the distribution.
17 Use the frequency histogram below to state the mode of the distribution.
18 For each of the following grouped distributions, state the modal class.
19 The weekly wage (in dollars) of 40 people is shown below.
376 592 299 501 375 366 204 359 382 274 223 295 232 325 311 513 348 235 329 203 556 419 226 494 205 307 417 204 528 487 543 532 435 415 540 260 318 593 592 393
a Use the classes $200–$249, $250–$299, $300–$349 etc. to display the information in a frequency distribution table.
b From your table, calculate the median class.
WORKED Example
9
a Score Frequency
1 2
2 4
3 5
4 6
5 3
b Score Frequency
5 1
6 3
7 5
8 8
9 5
10 3
c Score Frequency
38 2
39 4
40 1
41 5
42 6
43 3
44 6
45 2
[Frequency histogram: Score on the horizontal axis (12 to 20), Frequency on the vertical axis (0 to 40)]
a Class Frequency
1–4 6
5–8 12
9–12 30
13–16 23
17–20 46
21–24 27
25–28 9
b Class Frequency
1–7 3
8–14 8
15–21 9
22–28 25
29–35 12
36–42 11
1 Copy the frequency table above and complete the class centre column.
2 Complete the cumulative frequency column.
3 How many scores in the data set were above 30?
4 How many scores in the data set were 40 or less?
5 Is the data set an example of grouped or ungrouped data?
6 Draw a frequency histogram for the data set.
7 On your histogram draw a frequency polygon for this data set.
8 Calculate the mean of the data.
9 In which class would the median lie?
10 Which is the modal class?
Class Class centre Frequency Cumulative frequency
1–10 5
11–20 15
21–30 29
31–40 37
41–50 11
1
Best summary statistics
Having now examined all three summary statistics, it is important to recognise when it is appropriate to use each one. In some circumstances, one summary statistic may be more appropriate than the others. For example, a shoe manufacturer notes that in a new style of sporting footwear:
mean size sold is 8.63
median size is 8.75
mode size is 9.
Summary statistics for skull measurements
Looking back at the data on Egyptian skulls, we are now in a position to summarise the measurements with respect to the mean, median and mode for each set.
1 Draw the table below. (The values for AD 150 have been included for comparison.)
2 For the time period 4000 BC:
a enter the values for the means calculated previously
b calculate the median for each set
c determine the mode for each set.
3 Compare the figures you obtained for 4000 BC with the corresponding values for AD 150. Jot down comments in the final column.
4 Write a paragraph indicating what you feel has happened to the shape of the Egyptian skulls over the time period 4000 BC to AD 150.
Statistic                Measurement   4000 BC   AD 150   Comment
Mean (x̄)                 Breadth                 136.2
                         Height                  130.3
                         Length                  93.5
Median (15.5th score)    Breadth                 137
                         Height                  130
                         Length                  94
Mode                     Breadth                 137
                         Height                  135
In this case, the mode is the most useful measure as the manufacturer needs to know which size sells the most. The mean and median are of less use to the manufacturer.
The term average is often used indiscriminately, being interpreted sometimes as the mean, sometimes as the median and sometimes as the mode. The figure that best supports the cause of the author is the one which (unfortunately) tends to be promoted. We need to be aware of this, particularly when interpreting statistics. When we summarise and report statistical information, we need to act in a responsible manner and report figures that are not misleading.
For each of these examples you will need to think carefully about the relevance of each summary statistic in terms of the particular example.
WORKED Example 10
Below are the wages of ten employees in a small business.
$220 $230 $290 $275 $265 $250 $1500 $220 $220 $240
a Calculate the mean wage.
b Calculate the median wage.
c Calculate the mode wage.
d Does the mean, median or mode give the best measure of a typical wage in this business?
THINK / WRITE
a 1 THINK: Total all the wages.
    WRITE: Total = $3710
  2 THINK: Divide the total by 10.
    WRITE: Mean = $3710 ÷ 10 = $371
b 1 THINK: Write the wages in ascending order.
    WRITE: $220 $220 $220 $230 $240 $250 $265 $275 $290 $1500
  2 THINK: Average the 5th and 6th scores to find the median.
    WRITE: Median = ($240 + $250) ÷ 2 = $245
c THINK: $220 is the score that occurs most often and so this is the mode.
  WRITE: Mode = $220
d THINK: The mean is larger than what is typical because of one very large wage, and the mode is the lowest wage and so this is not typical. Therefore, the median is the best measure.
  WRITE: The median is the best measure of the typical wage as the mode is the lowest score, which is not typical, and the mean is inflated by the $1500 wage.
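The three summary statistics in the wages example can also be checked with a short program. The sketch below is one possible way to do this, assuming Python and its standard `statistics` module; the variable names are illustrative only and are not part of the worked example.

```python
from statistics import mean, median, mode

# Wages (in dollars) of the ten employees from the worked example.
wages = [220, 230, 290, 275, 265, 250, 1500, 220, 220, 240]

mean_wage = mean(wages)      # total of all wages divided by 10
median_wage = median(wages)  # average of the 5th and 6th ordered wages
mode_wage = mode(wages)      # the wage that occurs most often

print("Mean   =", mean_wage)
print("Median =", median_wage)
print("Mode   =", mode_wage)

# To see how much the single $1500 wage inflates the mean,
# recalculate the mean with that wage left out.
wages_without_extreme = [w for w in wages if w != 1500]
print("Mean without the $1500 wage =", round(mean(wages_without_extreme), 2))
```

Comparing the last figure with the mean of all ten wages shows why the median is preferred here: one extreme score moves the mean a long way but barely affects the median.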
Best summary statistics
1 There are ten houses in a street. A real estate agent values each house with the following results.
$150 000 $190 000 $175 000 $150 000 $650 000 $150 000 $165 000 $180 000 $160 000 $180 000
a Calculate the mean house valuation.
b Calculate the median house valuation.
c Calculate the mode house valuation.
d Which of the above is the best measure of central tendency?
2 The table below shows the number of shoes of each size that were sold over a week at a shoe store.
a Calculate the mean shoe size sold.
b Calculate the median shoe size sold.
c Calculate the mode of the data set.
d Which measure of central tendency has the most meaning to the shoe store proprietor?
Size Frequency
4 5
5 7
6 19
7 24
8 16
9 8
10 7
remember
1. The three summary statistics are:
mean: calculated by adding all scores, then dividing by the number of scores
median: the middle score or average of the two middle scores (when scores are arranged in order)
mode: the score with the highest frequency.
2. Be careful when using the mean. One or two extreme scores can greatly increase or decrease its value.
3. When the mean is not a good measure of central tendency, the median is used.
4. The mode is the best measure in some examples where discrete data mean that the mean and median may have very little meaning.
10E
WORKED Example
10
Mathc
ad
3 The table below shows the crowds at football matches over a season.
a Calculate the mean crowd over the season.
b Calculate the median class.
c Calculate the modal class.
d Which measure of central tendency would best describe the typical crowd at football matches over the season?
4
Mr and Mrs Yousef research the typical price of a large family car. At one car yard they find six family cars. Five of the cars are priced between $30 000 and $40 000, while the sixth is priced at $80 000. What would be the best measure of the price of a typical family car?
5 Thirty men were asked to reveal the number of hours they spent doing housework each week. The results are given below.
1 5 2 12 2 6 2 8 14 18
0 1 1 8 20 25 3 0 1 2
7 10 12 1 5 1 18 0 2 2
a Represent the data in a frequency distribution table. (Use classes 0–4, 5–9, 10–14 etc.)
b Find the mean number of hours that the men spend doing housework.
c Find the median class for hours spent by the men at housework.
d Find the modal class for hours spent by the men at housework.
6 The resting pulse rates of 20 female athletes were measured. The results are shown below.
50 62 48 52 71 61 30 45 42 48 43 47 51 52 34 61 44 54 38 40
a Represent the data in a frequency distribution table using appropriate groupings.
b Find the mean of the data.
c Find the median class of the data.
d Find the modal class of the data.
e Comment on the similarities and differences between the three values.
Crowd Class centre Frequency
10 000 to <20 000 15 000 95
20 000 to <30 000 25 000 64
30 000 to <40 000 35 000 22
40 000 to <50 000 45 000 15
50 000 to <60 000 55 000 3
60 000 to <70 000 65 000 0
70 000 to <80 000 75 000 1
A Mean B Median C Mode D All are equally important.
7 The following data give the age of 25 patients admitted to the emergency ward of a hospital.
18 16 6 75 24 23 82 74 25 21 43 19 84 72 31 74 24 20 63 79 80 20 23 17 19
a Represent the data in a frequency distribution table. (Use classes 1–15, 16–30, 31–45, etc.)
b Find the mean age of patients admitted.
c Find the median class of age of patients admitted.
d Find the modal class for age of patients admitted.
e Do any of your statistics (mean, median or mode) give a clear representation of the typical age of an emergency ward patient?
f Give some reasons that could explain the pattern of the distribution of data in this question.
8 The batting scores for two cricket players over six innings are as follows:
Player A 31, 34, 42, 28, 30, 41 Player B 0, 0, 1, 0, 250, 0
a Find the mean score for each player.
b Which player appears to be better if the mean result is used?
c Find the median score for each player.
d Which player appears to be better when the decision is based on the median result?
e Which player do you think would be more useful to have in a cricket team and why? How can the mean result sometimes lead to a misleading conclusion?
9 The following frequency table gives the number of employees in different salary brackets for a small manufacturing plant.
a Workers are arguing for a pay rise but the management of the factory claims that workers are well paid because the mean salary of the factory is $22 100. Is this a sound argument?
b Suppose that you were representing the factory workers and had to write a short submission in support of the pay rise. How could you explain the management’s claim? Provide some other statistics to support your case.
Position Salary ($) No. of employees
Machine operator 18 000 50
Machine mechanic 20 000 15
Floor steward 24 000 10
Manager 62 000 4
Wage rise
The workers in an office are trying to obtain a wage rise. In the previous year, the ten people who work in the office received a 2% rise while the company CEO received a 42% rise.
1 What was the mean wage rise received in the office last year?
2 What was the median wage rise received in the office last year?
3 What was the modal wage rise received in the office last year?
4 The company is trying to avoid paying the rise. What statistic do you think they would quote about last year’s wage rises? Why?
5 What statistic do you think the trade union would quote about wage rises? Why?
6 Which statistic do you think is the most ‘honest’ reflection of last year’s wage rises? Explain your answer.
Summary statistics for house prices
Quoting different averages can give different impressions about what is normal. Try the following task.
1 Visit a local real estate agent and study the properties for sale in the window. Alternatively, retrieve the for-sale ads for a real estate company from the newspaper.
2 Calculate the mean, median and mode price for houses in the area.
3 If you were a real estate agent and a person wanting to sell his/her home asked what the typical property sold for in the area, which figure would you quote?
4 Which figure would you quote to a person who wanted to buy a house in the area?
Best summary statistics and comparison of samples
For this investigation, work in groups of 3 to 6 students. Examine each of the following statistics.
• The typical mark in maths among Year 11 students.
• The number of attempts taken by Years 11 and 12 students to get their driver’s licence.
• The typical number of days taken off school by Year 11 students so far this year.
1 For each of the above, gather your data by selecting a random sample.
2 Calculate the mean, median and mode for each topic.
3 Compare your results with the results of other students who will have selected their samples from the same population.
4 In each case, state the best summary statistic and explain your answer to the other groups in your class.
Measures of dispersion or spread
Once a set of scores has been collected and tabulated, we are ready to draw some conclusions about the data. Two key concepts are the range and the interquartile range, which are used to measure the spread of a set of scores.
Range
The range is the difference between the highest and the lowest score.
Range = highest score − lowest score
Range from single scores
A smaller range will usually represent a more consistent set of scores. Exceptions to this are when one or two scores are much higher or lower than most.
A graphics calculator can also be used to determine the range of a distribution. The 1-Var Stats displays min X (lowest score) and max X (highest score). The difference between these two values indicates the range of the data.
Range from a frequency distribution table
When we are calculating the range from a frequency distribution table, we find the highest and lowest scores from the score column. We do not use any information from the frequency column in calculating the range. When the data are presented in grouped form, the range is found by taking the highest score from the highest class and the lowest score from the lowest class.
WORKED Example 11
There are 17 players in the squad for a State of Origin match. The number of State of Origin matches played by each member of the squad is shown below.
2 6 12 8 1 4 8 9 24
4 5 11 14 6 11 15 10
What is the range of this distribution?
THINK / WRITE
1 THINK: The lowest number of matches played is 1.
  WRITE: Lowest score = 1 match
2 THINK: The highest number of matches played is 24.
  WRITE: Highest score = 24 matches
3 THINK: Calculate the range by subtracting the lowest score from the highest score.
  WRITE: Range = 24 − 1 = 23 matches
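The rule Range = highest score − lowest score can be expressed directly in code. The following is a minimal sketch in Python, using the squad data from the State of Origin example; the function name `score_range` is an assumption made for illustration.

```python
# Number of State of Origin matches played by each of the 17 squad members.
matches = [2, 6, 12, 8, 1, 4, 8, 9, 24,
           4, 5, 11, 14, 6, 11, 15, 10]

def score_range(scores):
    """Return the range: highest score minus lowest score."""
    return max(scores) - min(scores)

print("Lowest score  =", min(matches))
print("Highest score =", max(matches))
print("Range         =", score_range(matches))
```

Only two scores (the highest and the lowest) enter the calculation, which is why a single unusually high or low score can change the range so much.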
Interquartile range
In many cases, the range is not a good indicator of the overall spread of scores. Consider the two sets of scores below, showing the wages of people in two small businesses.
A: $240, $240, $240, $245, $250, $250, $260, $800
B: $180, $200, $240, $290, $350, $400, $500, $600
The range for business A = $800 − $240 = $560
and for business B = $600 − $180 = $420
While the range for business A is greater, by looking at the wages in the two businesses, we can see that the wages in business B are generally more spread. The range uses only two scores in its calculation. The interquartile range is usually a better measure of dispersion (spread). We looked at this in the previous chapter.
The quartiles are found by dividing the data into quarters. The lower quartile is the score below which the lowest 25% of scores lie; the upper quartile is the score above which the highest 25% of scores lie.
Before we can calculate an interquartile range we must be able to calculate the median. To calculate the median, we must first arrange the scores in ascending order. The median is the middle score if there is an odd number of scores, or the average of the two middle scores if there is an even number of scores. Remember that the median position is the (n + 1) ÷ 2 th score, where n is the number of scores.
WORKED Example 12
The frequency distribution table below shows the heights (in cm) of boys competing for a place on a basketball team.
Height Frequency
170 to <175 3
175 to <180 6
180 to <185 12
185 to <190 10
190 to <195 8
195 to <200 1
Find the range of these data.
THINK / WRITE
1 THINK: The lowest score is at the bottom of the 170 to <175 class.
  WRITE: Lowest score = 170 cm
2 THINK: The highest score is at the top of the 195 to <200 class.
  WRITE: Highest score = 200 cm
3 THINK: Range = highest score − lowest score.
  WRITE: Range = 200 − 170 = 30 cm
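For grouped data, the same idea applies to the class boundaries rather than to individual scores. The sketch below, written in Python, stores each class from the basketball heights table as a hypothetical (lower, upper, frequency) tuple; this layout is an assumption chosen for illustration, not a prescribed format.

```python
# Each tuple is (lower boundary, upper boundary, frequency) for one class,
# taken from the table of heights (in cm).
height_classes = [
    (170, 175, 3),
    (175, 180, 6),
    (180, 185, 12),
    (185, 190, 10),
    (190, 195, 8),
    (195, 200, 1),
]

# The lowest score is taken from the bottom of the lowest class and the
# highest score from the top of the highest class. Frequencies are not used.
lowest_score = min(lower for lower, upper, freq in height_classes)
highest_score = max(upper for lower, upper, freq in height_classes)
grouped_range = highest_score - lowest_score

print("Range =", grouped_range, "cm")
```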
The interquartile range is the difference between the upper quartile and the lower quartile. To find the lower and upper quartiles we arrange the scores in ascending order. The lower quartile is 1/4 of the way through the distribution and the upper quartile is 3/4 of the way through the distribution.
To find the interquartile range we follow the steps below.
1. Arrange the data in ascending order.
2. Divide the data into halves by finding the median.
(a) If there is an odd number of scores the median score should not be included in either half of the scores.
(b) If there is an even number of scores the middle will be half way between two scores and this will divide the data neatly into two sets.
3. The lower quartile will be the median of the lower half of the data.
4. The upper quartile will be the median of the upper half of the data.
5. The interquartile range will be the difference between the medians of the two halves of the data.
WORKED Example 13
Calculate the median of:
a 2, 5, 8, 8, 8, 11, 12
b 45, 69, 69, 87, 88, 92, 99, 100.
THINK / WRITE
a THINK: These scores are already arranged in ascending order, so there is no need to reorder. There are 7 scores, so the median is the (7 + 1) ÷ 2 = 4th score.
  WRITE: Median = 8
b THINK: There are 8 scores, so the median is the (8 + 1) ÷ 2 = 4.5th score, that is, the average of the 4th score and the 5th score.
  WRITE: Median = (87 + 88) ÷ 2 = 87.5
WORKED Example 14
Find the interquartile range of the following data which shows the number of home runs scored in a series of baseball matches.
12, 9, 4, 6, 5, 8, 9, 4, 10, 2
THINK / WRITE
1 THINK: Write the data in ascending order.
  WRITE: 2, 4, 4, 5, 6, 8, 9, 9, 10, 12
2 THINK: Divide the data into two equal halves.
  WRITE: 2, 4, 4, 5, 6 and 8, 9, 9, 10, 12
3 THINK: The lower quartile will be the median of the lower half.
  WRITE: Lower quartile = 4 runs
  THINK: The upper quartile will be the median of the upper half.
  WRITE: Upper quartile = 9 runs
4 THINK: The interquartile range will be the upper quartile minus the lower quartile.
  WRITE: Interquartile range = 9 − 4 = 5 runs
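The five steps for finding the interquartile range translate naturally into a short program. The sketch below is one way to follow the "median of each half" method in Python; the function names `quartiles` and `interquartile_range` are assumptions made for illustration, and the code assumes at least two scores.

```python
from statistics import median

def quartiles(scores):
    """Return (lower quartile, upper quartile) using the median-of-halves method."""
    ordered = sorted(scores)            # step 1: ascending order
    half = len(ordered) // 2            # step 2: size of each half
    lower_half = ordered[:half]         # with an odd number of scores the
    upper_half = ordered[-half:]        # middle score is left out of both halves
    return median(lower_half), median(upper_half)   # steps 3 and 4

def interquartile_range(scores):
    """Step 5: upper quartile minus lower quartile."""
    lower_q, upper_q = quartiles(scores)
    return upper_q - lower_q

# Home runs scored in a series of baseball matches.
runs = [12, 9, 4, 6, 5, 8, 9, 4, 10, 2]
lower_q, upper_q = quartiles(runs)
print("Lower quartile =", lower_q)
print("Upper quartile =", upper_q)
print("Interquartile range =", interquartile_range(runs))
```

Note that spreadsheet and statistics packages may use slightly different conventions for locating quartiles, so their answers can differ from this method for some data sets.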
The interquartile range can also be calculated using a graphics calculator.
WORKED Example 15
The data below give the amount spent (to the nearest whole dollar) by each child in a group that was taken on an excursion to the Brisbane Exhibition.
15 12 17 23 21 19 16 11 17 18 23 24 25 21 20 37 17 25 22 21 19
Calculate the interquartile range for these data.
THINK / DISPLAY
1 Enter the data.
(a) Press STAT.
(b) Select 1:Edit by pressing ENTER.
(c) Enter the data in L1.
Note: There is no need to organise the data into ascending order first.
2 Obtain the values of the quartiles.
(a) Press STAT.
(b) Select CALC. Make sure that 1-Var Stats is set up as Xlist: L1 and Freq: 1.
(c) Select 1:1–Var Stats by pressing ENTER.
(d) Type L1. Press ENTER.
3 A list of statistics appears. Locate the first and third quartiles. Scroll down the screen using the ▼ key.
Q1 = 17 and Q3 = 23
So, IQR = $23 − $17 = $6
remember
1. Measures of dispersion are used to measure the spread of a set of scores.
2. The range is calculated by subtracting the lowest score from the highest score.
3. A single outlying score can enlarge the range. The interquartile range is therefore a better measure of dispersion.
4. The interquartile range is found by subtracting the lower quartile from the upper quartile.
5. The lower and upper quartiles are found by dividing the scores into two equal halves. The median of the lower half is the lower quartile and the median of the upper half is the upper quartile.
6. Remember to show units in your final answer.
Range and interquartile range
1 Copy and complete the following:
The range is a measure of __________ or __________ of a set of scores. It can be calculated by subtracting the __________ score from the __________ score. The value of the range can be affected by a single __________ score. For this reason, the __________ range is sometimes a better measure of the spread of the scores. It can be calculated as the difference between the __________ quartile and the __________ quartile. The __________ divides the scores in half; the lower quartile represents a score below which lies __________ of the scores; the upper quartile represents a score above which __________ of the scores lie. The lower quartile, median and upper quartile divide the distribution into __________ equal parts. In each of these parts there is the same number of __________.
2 Find the range of each of the following sets of data.
a 2, 5, 4, 5, 7, 4, 3
b 103, 108, 111, 102, 111, 107, 110
c 2.5, 2.8, 3.4, 2.7, 2.6, 2.4, 2.9, 2.6, 2.5, 2.8
d 3.20, 3.90, 4.25, 7.29, 1.45, 2.77, 8.39
e 45, 23, 7, 47, 76, 89, 96, 48, 87, 76, 66
3 Use the frequency distribution tables below to find the range for each of the following sets of scores.
a Score Frequency
1 2
2 6
3 12
4 10
5 7
b Score Frequency
38 23
39 46
40 52
41 62
42 42
43 45
c Score Frequency
89 12
90 25
91 36
92 34
93 11
94 9
95 4
10F
WORKED Example
4 For the grouped distributions below, state the range.
5 The scores below show the number of points scored by two AFL teams over the first 10 games of the season.
Sydney: 110 95 74 136 48 168 120 85 99 65
Collingwood: 125 112 89 111 96 113 85 90 87 92
a Calculate the range of the scores for each team.
b Based on the results above, which team would you say is the more consistent?
6 Two machines are used to put approximately 100 Smarties into boxes. A check is made on the operation of the two machines. Ten boxes filled by each machine have the number of Smarties in them counted. The results are shown below.
Machine A: 100, 99, 99, 101, 100, 101, 100, 100, 101, 108 Machine B: 98, 104, 96, 97, 103, 96, 102, 100, 97, 104
a What is the range in the number of Smarties from the first machine?
b What is the range in the number of Smarties from the second machine?
c Ralph is the quality control officer and he argues that machine A is more consistent in its distribution of Smarties. Explain why.
7 Find the median for each of the data sets below.
a 3, 4, 4, 5, 7, 9, 10
b 17, 20, 19, 25, 29, 27, 28, 25, 29
c 52, 55, 53, 53, 54, 55, 52, 53, 54, 52
d 12, 14, 15, 12, 14, 19, 17, 15, 18, 20
e 56, 75, 83, 47, 93, 35, 84, 83, 73, 20, 66, 90
a Class Frequency
51–60 2
61–70 8
71–80 15
81–90 7
91–100 1
b Class Frequency
150 to <155 12
155 to <160 25
160 to <165 38
165 to <170 47
170 to <175 39
175 to <180 20
c Class Frequency
40–43 48
44–47 112
48–51 254
52–55 297
56–59 199
60–63 84
WORKED
Example
12
WORKED
Example
13
8 For each of the data sets in question 7, calculate the interquartile range.
9
For the frequency table below, what is the range?
10
Calculate the interquartile range of the following data. 17, 18, 18, 19, 20, 21, 21, 23, 25
11
The interquartile range is considered to be a better measure of the variability of a set of scores than the range because it:
A takes into account more scores
B is the difference between the upper and lower quartiles
C is easier to calculate
D is not affected by extreme values.
12
The distribution below shows the ranges in the heights of 25 members of a football squad.
Which of the statements below is correct?
A The range of the distribution is 40.
B The range of the distribution is 49.
C The range of the distribution is 9.
D The range can be estimated only by using the cumulative frequency.
Score Frequency
25 14
26 12
27 19
28 25
29 19
A 4 B 5 C 6 D 17
A 3 B 4 C 5 D 8
Height (cm) Class centre Frequency
Cumulative frequency
140–149 144.5 2 2
150–159 154.5 5 7
160–169 164.5 10 17
170–179 174.5 7 24
180–189 184.5 1 25
WORKED
Example
14,15
E
XCEL
Spreadshe
et
Interquartile range
Standard deviation
We have already discussed using the range and the interquartile range as measures of the spread of a data set. However, the most commonly used measure of spread is the standard deviation.
The standard deviation is a measure of how much a typical score in a data set differs from the mean.
Standard deviation from single scores
The standard deviation may be found by entering a set of scores into your calculator, just as you do when you are finding the mean. Your calculator will have a statistical function that gives the standard deviation.
There are two standard deviation functions on your calculator. The first, σn, is the population standard deviation. This function is used when the statistical analysis is conducted on the entire population.
When the statistical analysis is done using a sample of the population, a slightly different standard deviation function is used. Called the sample standard deviation, this value will be slightly higher than the population standard deviation.
The sample standard deviation will be found on your calculator using the σn − 1 or the sn function.
WORKED Example 16
Below are the scores out of 100 achieved by a class of 20 students on a science exam. Calculate the mean and the standard deviation.
87 69 95 73 88 47 95 63 91 66 59 70 67 83 71 57 82 65 84 69
THINK / WRITE
1 THINK: Enter the data set into your calculator.
2 THINK: Retrieve the mean using the x̄ function.
  WRITE: x̄ = 74.05 marks
3 THINK: Retrieve the standard deviation using the σn function.
  WRITE: σn = 13.07 marks
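The two calculator functions, σn and σn − 1, have direct counterparts in many programming languages. As a rough sketch, the Python code below uses the standard `statistics` module, where `pstdev` gives the population standard deviation and `stdev` gives the sample standard deviation; the variable names are illustrative assumptions.

```python
from statistics import mean, pstdev, stdev

# Science exam scores (out of 100) for the class of 20 students.
scores = [87, 69, 95, 73, 88, 47, 95, 63, 91, 66,
          59, 70, 67, 83, 71, 57, 82, 65, 84, 69]

mean_score = mean(scores)
population_sd = pstdev(scores)  # counterpart of the calculator's σn function
sample_sd = stdev(scores)       # counterpart of the σn − 1 function

print("Mean =", round(mean_score, 2), "marks")
print("Population standard deviation =", round(population_sd, 2), "marks")
print("Sample standard deviation =", round(sample_sd, 2), "marks")
```

Because the whole class was measured, the population figure is the appropriate one here; the sample figure is printed only to show that it comes out slightly higher for the same data.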
Ian surveys twenty Year 11 students and asks how much money they earn from part-time work each week. The results are given below.
$65 $82 $47 $78 $108 $94 $60 $79 $88 $91 $50 $73 $68 $95 $83 $76 $79 $72 $69 $97
Calculate the mean and standard deviation.
THINK WRITE
Enter the statistics into your calculator.
Retrieve the mean using the x– function. x– = $77.70
Retrieve the standard deviation using the σn − 1
function, as a sample has been