When dealing with sets of data, we are always working with either the population or a sample from the population. The mean is calculated in the same way whether the data set represents the population or a sample; however, the two means are represented by different symbols. The mean of a population is represented by the Greek letter μ (mu), and the mean of a sample is represented by x̄ (x-bar), a lower-case x with a bar on top. In this section we will only use x̄ to represent the mean.
The mean of a set of data is what is referred to in everyday language as the average. For the set of data {4, 7, 9, 12, 18}:
$\bar{x} = \frac{4 + 7 + 9 + 12 + 18}{5} = 10$
The formal definition of the mean is:
$\bar{x} = \frac{\sum x}{n}$
where Σx represents the sum of all of the observations in the data set and n represents the number of observations in the data set.
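The definition above can be sketched in a few lines of Python. This is only an illustration; the function name `mean` and the variable `data` are chosen here and do not come from the text.

```python
def mean(observations):
    """Return the sum of the observations divided by how many there are."""
    n = len(observations)
    if n == 0:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(observations) / n


data = [4, 7, 9, 12, 18]
print(mean(data))  # 10.0
```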
Note that the symbol Σ is the Greek letter sigma, which represents 'the sum of'. The mean is also referred to as a summary statistic and is a measure of the centre of a distribution. The mean is the point about which the distribution 'balances'.
Consider the masses of 7 potatoes, given in grams, in the photograph.
170 g 100 g 145 g 190 g 160 g 120 g 130 g
The mean is 145 g. The observations 130 and 160 ‘balance’ each other since they are each 15 g from the mean. Similarly, the observations 120 and 170 ‘balance’ each other since they are each 25 g from the mean, as do the observations 100 and 190. Note that the median is also 145 g. That is, for this set of data the mean and the median give the same value for the centre. This is because the distribution is symmetric.
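The 'balance' idea can be checked numerically: the distances of the observations from the mean cancel out, so they add to zero. The short sketch below uses the potato masses from the text; the variable names are chosen for illustration only.

```python
masses = [170, 100, 145, 190, 160, 120, 130]  # grams

mean = sum(masses) / len(masses)

# Signed distance of each observation from the mean.
deviations = [m - mean for m in masses]

print(mean)             # 145.0
print(sum(deviations))  # 0.0
```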
Now consider two cases in which the distribution of data is not symmetric.
Case 1
Consider the masses of a different set of 7 potatoes, given in grams below.
100 105 110 115 120 160 200
The median of this distribution is 115 g and the mean is 130 g. There are 5 observations that are less than the mean and only 2 that are more. In other words, the mean does not give us a good indication of the centre of the distribution. However, there is still a 'balance' between observations below the mean and those above, in terms of the spread of all the observations from the mean. Therefore, the mean is still useful to give a measure of the central tendency of the distribution, but in cases where the distribution is skewed, the median gives a better indication of the centre.
For a positively skewed distribution, as in the previous case, the mean will be greater than the median. For a negatively skewed distribution the mean will be less than the median.
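The figures quoted for Case 1 can be reproduced with Python's standard `statistics` module. The sketch below is a check of the numbers in the text, not part of the original working; the variable names are illustrative.

```python
import statistics

masses = [100, 105, 110, 115, 120, 160, 200]  # grams

mean = statistics.mean(masses)
median = statistics.median(masses)

# How many observations lie on each side of the mean.
below = [m for m in masses if m < mean]
above = [m for m in masses if m > mean]

# Total distance from the mean on each side: this is the 'balance'.
distance_below = sum(mean - m for m in below)
distance_above = sum(m - mean for m in above)

print(mean, median)                    # 130 115
print(len(below), len(above))          # 5 2
print(distance_below, distance_above)  # 100 100
```

Five observations sit below the mean and only two above, yet the total distance on each side is the same, which is why the mean exceeds the median in this positively skewed set.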
Case 2
Consider the data below, showing the weekly income (to the nearest $10) of 10 families living in a suburban street.
$600 $1340 $1360 $1380 $1400 $1420 $1420 $1440 $1460 $1500
In this case, $\bar{x} = \frac{13\,320}{10} = \$1332$, and the median is $1410.
One of the values in this set, $600, is clearly an outlier. As a result, the value of the mean is below the weekly income of the other 9 households. In such a case the mean is not very useful in establishing the centre; however, the ‘balance’ still remains for this negatively skewed distribution.
The mean is calculated by using the values of the observations and because of this it becomes a less reliable measure of the centre of the distribution when the distribution is skewed or contains an outlier. Because the median is based on the order of the
observations rather than their value, it is a better measure of the centre of such distributions.
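One way to see this sensitivity is to recompute both measures with the outlier left out. Dropping the $600 value is done here purely as an illustration and is not a step taken in the text; the variable names are also assumptions.

```python
import statistics

incomes = [600, 1340, 1360, 1380, 1400, 1420, 1420, 1440, 1460, 1500]

# Hypothetical comparison set: the same street without the $600 outlier.
without_outlier = [income for income in incomes if income != 600]

mean_all = statistics.mean(incomes)
median_all = statistics.median(incomes)
mean_rest = statistics.mean(without_outlier)
median_rest = statistics.median(without_outlier)

print(mean_all, median_all)              # 1332 1410.0
print(round(mean_rest, 2), median_rest)  # 1413.33 1420
```

Leaving out a single value moves the mean by about $81 but the median by only $10, which matches the claim that the median is the more stable measure of centre for this kind of data.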
Worked example
Calculate the mean of the set of data shown.
10, 12, 15, 16, 18, 19, 22, 25, 27, 29
THINK
1 Write the formula for calculating the mean, where Σx is the sum of all scores and n is the number of scores in the set.
2 Substitute the values into the formula and evaluate.
WRITE
1 $\bar{x} = \frac{\sum x}{n}$
2 $\bar{x} = \frac{10 + 12 + 15 + 16 + 18 + 19 + 22 + 25 + 27 + 29}{10} = 19.3$
The mean, x̄, is 19.3.
When calculating the mean of a data set, sometimes the answer you calculate will contain a long stream of digits after the decimal point.
For example, for the data set 44, 38, 55, 61, 48, 32, 49, the mean would be:
$\bar{x} = \frac{\sum x}{n} = \frac{327}{7} = 46.714\,285\,71\ldots$
In this case it makes sense to round the answer to either a given number of decimal places, or a given number of significant figures.
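Rounding to a given number of decimal places can be sketched with Python's built-in `round`. This is an illustration only; note that for values lying exactly halfway between two candidates, `round` may not follow the 'round 5 up' convention, which does not affect this example.

```python
data = [44, 38, 55, 61, 48, 32, 49]

mean = sum(data) / len(data)  # 327 / 7

print(round(mean, 2))  # 46.71
print(round(mean, 4))  # 46.7143
```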
Rounding to a given number of significant figures
When rounding to a given number of significant figures, we are rounding to the digits in a number which are regarded as 'significant'.
To determine which digits are significant, we can observe the following rules:
• All non-zero digits are significant
• Leading zeros can be ignored (they are placeholders and are not significant)
• Zeros included between other digits are significant
• Zeros included after decimal digits are significant
• Trailing zeros for integers are not significant (unless specified otherwise)
The following examples show how these rules work:
0.003 561: leading zeros are ignored, so this has 4 significant figures
70.036: zeros between other digits are significant, so this has 5 significant figures
5.320: zeros included after decimal digits are significant, so this has 4 significant figures
450 000: trailing zeros are not significant, so this has 2 significant figures
78 000.0: the zero after the decimal point is significant, so the zeros between it and the other digits are also significant; this has 6 significant figures
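The rules above can be sketched as a small Python function that counts significant figures in a number written as text. The function name `count_sig_figs` is hypothetical, and the sketch assumes a plain non-negative decimal number, optionally with spaces as digit separators.

```python
def count_sig_figs(text):
    """Count significant figures in a number given as a string."""
    digits = text.replace(" ", "")
    if "." in digits:
        # With a decimal point, only leading zeros are ignored;
        # zeros between digits and after decimal digits all count.
        return len(digits.replace(".", "").lstrip("0"))
    # For an integer, leading and trailing zeros are not counted.
    return len(digits.strip("0"))


for example in ["0.003 561", "70.036", "5.320", "450 000", "78 000.0"]:
    print(example, count_sig_figs(example))
```

The number is passed as text rather than as a float because a float cannot distinguish 5.320 from 5.32.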
As when rounding to a given number of decimal places, when rounding to a given number of significant figures consider the digit after the specified number of figures. If it is 5 or above, round the final digit up; if it is 4 or below, keep the final digit as is.
5067.37 rounded to 2 significant figures is 5100
3199.01 rounded to 4 significant figures is 3199
0.004 931 rounded to 3 significant figures is 0.004 93
1 020 004 rounded to 2 significant figures is 1 000 000
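A minimal sketch of this rounding rule, using Python's `decimal` module with `ROUND_HALF_UP` so that a following digit of 5 or above always rounds up. The function name `round_sig` is hypothetical, and the number is assumed to be supplied as text.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_sig(text, figures):
    """Round a number (given as a string) to `figures` significant figures."""
    value = Decimal(text.replace(" ", ""))
    if value == 0:
        return value
    # adjusted() is the power of ten of the first non-zero digit.
    exponent = value.adjusted()
    # Place value of the last significant digit to keep.
    quantum = Decimal(1).scaleb(exponent - figures + 1)
    return value.quantize(quantum, rounding=ROUND_HALF_UP)


for text, figures in [("5067.37", 2), ("3199.01", 4),
                      ("0.004 931", 3), ("1 020 004", 2)]:
    # The "f" format prints the result without scientific notation.
    print(text, "->", format(round_sig(text, figures), "f"))
```

`Decimal` is used instead of the built-in `round` because `round` does not always round a trailing 5 upwards, whereas `ROUND_HALF_UP` follows the rule stated above.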