Arithmetic Mean (Average)
The arithmetic mean (or average), $\bar{x}$, lies at the centre of a set of numeric data values.
It is found by adding up all the data values and then dividing the total by the sample size as shown in the following formula:
$$\bar{x} = \frac{\text{Sum of all the observations}}{\text{Number of observations}} = \frac{\sum x_i}{n} \qquad (3.1)$$
Where:
$\bar{x}$ = the sample arithmetic mean (average)
$n$ = the number of data values in the sample
$x_i$ = the value of the $i$th data value of random variable $x$
$\sum x_i$ = the sum of the $n$ data values, i.e. $x_1 + x_2 + x_3 + x_4 + \dots + x_n$
Example 3.1 Financial Advisors’ Training Study
The number of seminar training days attended last year by 20 financial advisors is shown in Table 3.1. What is the average number of training days attended by these financial advisors?
(See Excel file C3.1 – financial training.)
Table 3.1 Financial advisors’ training days (n = 20)
16 20 13 19 24 22 18 18 15 20
21 21 18 20 18 20 15 20 18 20
Solution
To find the average, sum the number of days for all 20 financial advisors (∑xi = 376) and divide this total by the number of financial advisors (n = 20).
$$\bar{x} = \frac{376}{20} = 18.8 \text{ days}$$
On average, each financial advisor attended 18.8 days of seminar training last year.
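The calculation in Example 3.1 can be reproduced with a few lines of Python. This is a minimal sketch: the variable names (`training_days`, `total`, `x_bar`) are illustrative choices, and the data values are those of Table 3.1.

```python
# Training days attended by the 20 financial advisors (Table 3.1).
training_days = [16, 20, 13, 19, 24, 22, 18, 18, 15, 20,
                 21, 21, 18, 20, 18, 20, 15, 20, 18, 20]

total = sum(training_days)   # sum of all the observations
n = len(training_days)       # number of observations
x_bar = total / n            # Formula 3.1

print(total, n, x_bar)       # 376 20 18.8
```

Python's standard library also provides `statistics.mean`, which gives the same result for this data.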
The arithmetic mean has the following two advantages:
It uses all the data values in its calculation.
It is an unbiased statistic (meaning that, on average, it represents the true mean).
These two properties make it the most widely used measure of central location.
The arithmetic mean, however, has two drawbacks:
It is not appropriate for categorical (i.e. nominal or ordinal-scaled) data. For example, it is not meaningful to refer to the ‘average’ colour of cars or ‘average’ preferred brand or ‘average’ gender. The arithmetic mean can only be applied to numeric (i.e. interval and ratio-scaled) data.
It is distorted by outliers. An outlier is an extreme value in a data set. For example, the mean of 3, 4, 6 and 7 is 5. However, the mean of 3, 4, 6 and 39 is 13, which is not representative of the majority of the data values.
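The effect of the outlier can be checked directly. The short sketch below uses the two data sets from the text; the names `typical` and `with_outlier` are illustrative only.

```python
from statistics import mean

typical = [3, 4, 6, 7]         # no extreme value
with_outlier = [3, 4, 6, 39]   # 7 replaced by the outlier 39

mean_typical = mean(typical)
mean_outlier = mean(with_outlier)

print(mean_typical)   # 5
print(mean_outlier)   # 13

# Count how many values lie below the mean in the second data set.
below_mean = sum(1 for x in with_outlier if x < mean_outlier)
print(below_mean)     # 3
```

Three of the four values lie below the mean of 13, which is why it does not represent the majority of the data.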
Applied Business Statistics
These two drawbacks require that other measures of central location be considered. Two alternative central location measures are the median and the mode.
Median
The median (Me) is the middle number of an ordered set of data. It divides an ordered set of data values into two equal halves (i.e. 50% of the data values lie below the median and 50% lie above it).
Follow these steps to calculate the median for ungrouped (raw) numeric data:
Arrange the n data values in ascending order.
Find the median by first identifying the middle position in the data set as follows:
– If n is odd, the median value lies in the $\left(\frac{n+1}{2}\right)$th position in the data set.
– If n is even, the median value is found by identifying the $\left(\frac{n}{2}\right)$th position and then averaging the data value in this position with the next consecutive data value.
Example 3.2(a) Monthly Car Sales (n = 9 months)
Find the median number of cars sold per month by a dealer over the past nine months based on the following monthly sales figures:
27 38 12 34 42 40 24 40 23
Solution
Order the data set:
12 23 24 27 34 38 40 40 42
Since n = 9 (i.e. n is odd), the median position is the $\frac{9+1}{2}$ = 5th position. The median value therefore lies in the 5th data position. Thus the median monthly car sales is 34 cars. This means that there were four months when car sales were below 34 cars per month and four months when car sales were above 34 cars per month (not necessarily consecutive months).
Example 3.2(b) Monthly Car Sales (n = 10 months)
Find the median number of cars sold per month by a dealer over the past ten months based on the following monthly sales figures:
27 38 12 34 42 40 24 40 23 18
Solution
Order the data set:
12 18 23 24 27 34 38 40 40 42
Since n = 10 (i.e. n is even), the data value in the $\left(\frac{10}{2}\right)$th = 5th position is 27.
Chapter 3 – Describing Data: Numeric Descriptive Statistics
Average the 5th and 6th position values (i.e. $\frac{27 + 34}{2} = 30.5$) to give the median value.
Thus the median monthly car sales is 30.5 cars. This means that there were five months when car sales were below 30.5 cars per month and five months when car sales were above 30.5 cars per month.
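The position rule used in Examples 3.2(a) and 3.2(b) can be sketched in Python as follows. The function name `median_by_position` is a hypothetical helper introduced for illustration. It sorts the data, then applies the odd or even rule described above (positions in the text are 1-based, whereas Python lists are 0-based).

```python
def median_by_position(values):
    """Median of ungrouped numeric data using the (n + 1)/2 and n/2 position rules."""
    ordered = sorted(values)          # step 1: arrange in ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    if n % 2 == 1:
        # n odd: take the value in the ((n + 1) / 2)th position
        return ordered[(n + 1) // 2 - 1]
    # n even: average the (n / 2)th value and the next consecutive value
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


sales_9 = [27, 38, 12, 34, 42, 40, 24, 40, 23]   # Example 3.2(a)
sales_10 = sales_9 + [18]                        # Example 3.2(b)

print(median_by_position(sales_9))    # 34
print(median_by_position(sales_10))   # 30.5
```

The standard library function `statistics.median` applies the same rule and returns the same values for these two data sets.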
To Calculate the Median for Grouped Numeric Data
Use these methods when the data is already summarised into a numeric frequency distribution (or ogive).
Graphical approach
Using the ‘less than’ ogive graph (i.e. cumulative frequency polygon), the median value is found by reading off the data value on the x-axis that is associated with the 50% cumulative frequency located on the y-axis.
Arithmetic approach
– Based on the sample size, n, calculate $\frac{n}{2}$ to find the median position.
– Using the cumulative frequency counts of the ‘less than’ ogive summary table, find the median interval (i.e. the interval that contains the median position, the $\left(\frac{n}{2}\right)$th data value).
– The median value can be approximated using the midpoint of the median interval, or calculated using the following formula to give a more representative median value:
$$M_e = O_{me} + \frac{c\left[\frac{n}{2} - f(<)\right]}{f_{me}} \qquad (3.2)$$
Where:
$O_{me}$ = lower limit of the median interval
$c$ = class width
$n$ = sample size (number of observations)
$f_{me}$ = frequency count of the median interval
$f(<)$ = cumulative frequency count of all intervals before the median interval
The formula takes into account ‘how far’ into the median interval the median value lies.
Example 3.3 Courier Delivery Times Study
A courier company recorded 30 delivery times (in minutes) to deliver parcels to their clients from its depot. The data is summarised in the numeric frequency distribution and ogive as shown in Table 3.2.
Table 3.2 Numeric frequency distribution and ogive for courier delivery times (minutes)
Time          Frequency   Cumulative
10 – < 20         3            3
20 – < 30         5            8
30 – < 40         9           17
40 – < 50         7           24
50 – < 60         6           30
Total            30
Find the median delivery time of parcels to clients by this courier company.
Solution
Since n = 30, the median delivery time will lie in the $\left(\frac{30}{2}\right)$th = 15th ordered data position.
Using the ogive (of cumulative counts) in Table 3.2, the 15th data value falls in the 30 – < 40 minutes interval. This identifies the median interval. An approximate median delivery time for parcels is therefore 35 minutes (the interval midpoint). However, a more representative median value can be found by using Formula 3.2, where:
$O_{me}$ = 30 minutes, $c$ = 10 minutes, $f_{me}$ = 9 deliveries, $f(<)$ = 8 deliveries
$$M_e = 30 + \frac{10\left(\frac{30}{2} - 8\right)}{9} = 30 + 7.78 = 37.78 \text{ minutes}$$
Thus the median parcel delivery time is 37.78 minutes. This means that half the deliveries occurred within 37.78 minutes while the other half took longer than 37.78 minutes.
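Formula 3.2 can be sketched in Python as shown below. The function name `grouped_median` and the representation of each interval as a `(lower_limit, upper_limit, frequency)` tuple are assumptions made for illustration. The interval frequencies are those of the courier delivery times distribution.

```python
def grouped_median(intervals):
    """Approximate the median of grouped data using Formula 3.2.

    intervals: list of (lower_limit, upper_limit, frequency) tuples,
    listed in ascending order (an assumed, illustrative layout).
    """
    n = sum(freq for _, _, freq in intervals)
    if n == 0:
        raise ValueError("the frequency distribution contains no observations")
    position = n / 2                 # median position
    cumulative_before = 0            # f(<): cumulative count before the interval
    for lower, upper, freq in intervals:
        if cumulative_before + freq >= position:
            width = upper - lower    # c: class width
            return lower + width * (position - cumulative_before) / freq
        cumulative_before += freq
    raise ValueError("median interval not found")


# Courier delivery times (minutes) and their frequency counts.
delivery_times = [(10, 20, 3), (20, 30, 5), (30, 40, 9), (40, 50, 7), (50, 60, 6)]

median_time = grouped_median(delivery_times)
print(round(median_time, 2))   # 37.78
```

The loop accumulates the frequency counts until it reaches the interval containing the median position, which mirrors reading the median interval off the ‘less than’ ogive summary table.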
The median has one major advantage over the mean: it is not affected by outliers. It is therefore a more representative measure of central location than the mean when significant outliers occur in a set of data.
A drawback of the median, however, is that it cannot be calculated for categorical data.
It makes no sense, for example, to refer to a ‘middle’ brand of fuel types. Thus a median, like the mean, can also only be applied to numeric data.
Mode
The mode (Mo) is defined as the most frequently occurring value in a set of data. It can be calculated both for categorical data and numeric data.
The following are illustrative statements that refer to the mode as the central location measure:
Colgate is the brand of toothpaste most preferred by households.
The most common family size is four.
The supermarket frequented most often in Kimberley is Checkers.
The majority of machine breakdowns last between 25 and 30 minutes.
Follow these steps to calculate the mode:
For small samples of ungrouped data, rank the data from lowest to highest, and identify, by inspection, the data value that occurs most frequently.
For large samples of discrete or categorical (nominal and ordinal-scaled) data:
– construct a categorical frequency table (see Chapter 2)
– identify the modal value or modal category that occurs most frequently.
For large samples of continuous, numeric (ratio-scaled) data:
– calculate a numeric frequency distribution (see Chapter 2)
– identify the modal interval as the interval with the highest frequency count
– use either the midpoint of the modal interval as an approximate modal value or apply the following formula to calculate a more representative modal value.
$$M_o = O_{mo} + \frac{c(f_m - f_{m-1})}{2f_m - f_{m-1} - f_{m+1}} \qquad (3.3)$$
Where:
$O_{mo}$ = lower limit of the modal interval
$c$ = width of the modal interval
$f_m$ = frequency of the modal interval
$f_{m-1}$ = frequency of the interval preceding the modal interval
$f_{m+1}$ = frequency of the interval following the modal interval
The modal formula weights (‘pulls’) the modal value from the midpoint position towards the adjacent interval with the higher frequency count. If the interval to the left of the modal interval has a higher frequency count than the interval to the right of the modal interval, then the modal value is pulled down below the midpoint value, and vice versa.
Example 3.4 Courier Delivery Times Study
Refer to Example 3.3 for the problem description and Table 3.3 for the sample data of 30 delivery times that have been summarised into a numeric frequency distribution.
Find the modal delivery time of parcels to clients from the courier service’s depot (i.e. what is the most common courier delivery time?)
Table 3.3 Numeric frequency distribution of courier delivery times (minutes)
Time intervals   Frequency count
10 – < 20               3
20 – < 30               5
30 – < 40               9
40 – < 50               7
50 – < 60               6
Total                  30
Solution
From the numeric frequency distribution, the modal interval (interval with the highest frequency) is 30 – < 40 minutes. The midpoint of 35 minutes can be used as the approximate modal courier delivery time.
To calculate a more representative modal value, apply the modal Formula 3.3 with:
$O_{mo}$ = 30 minutes, $c$ = 10 minutes, $f_m$ = 9 deliveries, $f_{m-1}$ = 5 deliveries, $f_{m+1}$ = 7 deliveries
$$M_o = 30 + \frac{10(9 - 5)}{2(9) - 5 - 7} = 30 + 6.67 = 36.67 \text{ minutes}$$
Thus the most common courier delivery time from depot to customers is 36.67 minutes.
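Formula 3.3 can be sketched in Python in the same way. The function name `grouped_mode` and the `(lower_limit, upper_limit, frequency)` tuple layout are illustrative assumptions. Two further assumptions are not covered by the text and are flagged in the comments: a missing neighbouring interval is treated as having a frequency of zero, and the interval midpoint is returned if the denominator of the formula is zero.

```python
def grouped_mode(intervals):
    """Approximate the mode of grouped data using Formula 3.3.

    intervals: list of (lower_limit, upper_limit, frequency) tuples,
    listed in ascending order (an assumed, illustrative layout).
    """
    freqs = [freq for _, _, freq in intervals]
    m = freqs.index(max(freqs))          # first interval with the highest count
    lower, upper, f_m = intervals[m]
    # Assumption: a missing neighbouring interval has a frequency of zero.
    f_prev = freqs[m - 1] if m > 0 else 0
    f_next = freqs[m + 1] if m < len(freqs) - 1 else 0
    denominator = 2 * f_m - f_prev - f_next
    if denominator == 0:
        # Assumption: fall back to the midpoint when the formula is undefined.
        return (lower + upper) / 2
    width = upper - lower                # c: width of the modal interval
    return lower + width * (f_m - f_prev) / denominator


# Courier delivery times (minutes) and frequency counts from Table 3.3.
delivery_times = [(10, 20, 3), (20, 30, 5), (30, 40, 9), (40, 50, 7), (50, 60, 6)]

modal_time = grouped_mode(delivery_times)
print(round(modal_time, 2))   # 36.67
```

Because the interval following the modal interval has the higher frequency (7 versus 5), the result lies above the midpoint of 35 minutes, as the text describes.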
The mode has several advantages:
It is a valid measure of central location for all data types (i.e. categorical and numeric).
If the data type is categorical, the mode defines the most frequently occurring category.
If the data type is numeric, the mode is the most frequently occurring data value (or the midpoint value of a modal interval, if the numeric data has been grouped into intervals).
The mode is not influenced by outliers, as it represents the most frequently occurring data value (or response category).
The mode also has one main disadvantage:
It is a representative measure of central location only if the histogram of the numeric random variable is unimodal (i.e. has one peak only). If the shape is bi-modal, there is more than one peak, meaning that two possible modes exist, in which case there is no single representative mode.
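A quick way to check whether ungrouped data has a single mode is sketched below, using `statistics.multimode` from the Python standard library (available from Python 3.8), which returns every value that shares the highest frequency. The first data set is the training days data of Table 3.1. The second, `bimodal_sample`, is a hypothetical set of values invented purely to illustrate the bi-modal case.

```python
from statistics import multimode

# Training days attended by the 20 financial advisors (Table 3.1).
training_days = [16, 20, 13, 19, 24, 22, 18, 18, 15, 20,
                 21, 21, 18, 20, 18, 20, 15, 20, 18, 20]

# Hypothetical values (not from the text) with two equally common values.
bimodal_sample = [2, 2, 3, 5, 5]

training_modes = multimode(training_days)
sample_modes = multimode(bimodal_sample)

print(training_modes)   # [20]
print(sample_modes)     # [2, 5]
```

A result with more than one value signals that no single representative mode exists for that data set.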