Arithmetic Mean (Average)

The arithmetic mean (or average), $\bar{x}$, lies at the centre of a set of numeric data values.

It is found by adding up all the data values and then dividing the total by the sample size as shown in the following formula:

\[ \bar{x} = \frac{\text{Sum of all the observations}}{\text{Number of observations}} = \frac{\sum x_i}{n} \qquad (3.1) \]

Where:

$\bar{x}$ = the sample arithmetic mean (average)

n = the number of data values in the sample

$x_i$ = the value of the i-th data value of random variable x

$\sum x_i$ = the sum of the n data values, i.e. $x_1 + x_2 + x_3 + x_4 + \dots + x_n$

Example 3.1 Financial Advisors’ Training Study

The number of seminar training days attended last year by 20 financial advisors is shown in Table 3.1. What is the average number of training days attended by these financial advisors?

(See Excel file C3.1 – financial training.)

Table 3.1 Financial advisors’ training days (n = 20)

16 20 13 19 24 22 18 18 15 20

21 21 18 20 18 20 15 20 18 20

Solution

To find the average, sum the number of days for all 20 financial advisors (∑x_i = 376) and divide this total by the number of financial advisors (n = 20).

\[ \bar{x} = \frac{376}{20} = 18.8 \text{ days} \]

On average, each financial advisor attended 18.8 days of seminar training last year.
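The calculation in Example 3.1 can be checked with a few lines of Python. This is only an illustrative sketch: the variable names are assumptions of this example and do not come from the Excel file referred to in the text.

```python
# Training days attended by the 20 financial advisors (Table 3.1)
training_days = [16, 20, 13, 19, 24, 22, 18, 18, 15, 20,
                 21, 21, 18, 20, 18, 20, 15, 20, 18, 20]

n = len(training_days)        # sample size
total = sum(training_days)    # sum of all the observations
mean_days = total / n         # Formula 3.1: sum divided by sample size

print(f"n = {n}, sum = {total}, mean = {mean_days:.1f} days")
```

The printed sum and sample size should agree with the values quoted in the solution (376 and 20).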

The arithmetic mean has the following two advantages:

It uses all the data values in its calculation.

It is an unbiased statistic (meaning that, on average, it represents the true mean).

These two properties make it the most widely used measure of central location.

The arithmetic mean, however, has two drawbacks:

It is not appropriate for categorical (i.e. nominal or ordinal-scaled) data.

For example, it is not meaningful to refer to the ‘average’ colour of cars or ‘average’ preferred brand or ‘average’ gender. The arithmetic mean can only be applied to numeric (i.e. interval and ratio-scaled) data.

It is distorted by outliers. An outlier is an extreme value in a data set.

For example, the mean of 3, 4, 6 and 7 is 5. However, the mean of 3, 4, 6 and 39 is 13, which is not representative of the majority of the data values.
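The effect of the outlier in this small example can be reproduced as follows. This is a minimal sketch; the helper name `arithmetic_mean` is an assumption made for illustration.

```python
def arithmetic_mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)

without_outlier = [3, 4, 6, 7]
with_outlier = [3, 4, 6, 39]   # 39 is the extreme value

mean_without = arithmetic_mean(without_outlier)
mean_with = arithmetic_mean(with_outlier)

# Count how many values lie below the mean once the outlier is present
below_mean = sum(1 for v in with_outlier if v < mean_with)

print(mean_without, mean_with, below_mean)
```

With the outlier included, three of the four values lie below the mean, which is why the mean no longer represents the bulk of the data.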

Applied Business Statistics

These two drawbacks require that other measures of central location be considered. Two alternative central location measures are the median and the mode.

Median

The median ($M_e$) is the middle number of an ordered set of data. It divides an ordered set of data values into two equal halves (i.e. 50% of the data values lie below the median and 50% lie above it).

Follow these steps to calculate the median for ungrouped (raw) numeric data:

Arrange the n data values in ascending order.

Find the median by first identifying the middle position in the data set as follows:

– If n is odd, the median value lies in the $\left(\frac{n + 1}{2}\right)$th position in the data set.

– If n is even, the median value is found by identifying the $\left(\frac{n}{2}\right)$th position and then averaging the data value in this position with the next consecutive data value.

Example 3.2(a) Monthly Car Sales (n = 9 months)

Find the median number of cars sold per month by a dealer over the past nine months based on the following monthly sales figures:

27 38 12 34 42 40 24 40 23

Solution

Order the data set:

12 23 24 27 34 38 40 40 42

Since n = 9 (i.e. n is odd), the median position is the $\frac{9 + 1}{2}$ = 5th position. The median value therefore lies in the 5th data position. Thus the median monthly car sales is 34 cars. This means that there were four months when car sales were below 34 cars per month and four months when car sales were above 34 cars per month (not necessarily consecutive months).

Example 3.2(b) Monthly Car Sales (n = 10 months)

Find the median number of cars sold per month by a dealer over the past ten months based on the following monthly sales figures:

27 38 12 34 42 40 24 40 23 18

Solution

Order the data set:

12 18 23 24 27 34 38 40 40 42

Since n = 10 (i.e. n is even), the data value in the $\left(\frac{10}{2}\right)$th = 5th position is 27.

Chapter 3 – Describing Data: Numeric Descriptive Statistics

Average the 5th and 6th position values (i.e. $\frac{27 + 34}{2}$ = 30.5) to give the median value.

Thus the median monthly car sales is 30.5 cars. This means that there were five months when car sales were below 30.5 cars per month and five months when car sales were above 30.5 cars per month.
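The position rule for ungrouped data (order the values, then pick the middle position for odd n or average the two middle positions for even n) can be sketched in Python. The function name `median_ungrouped` is an assumption made for this illustration; positions in the comments are 1-based, as in the text, while Python lists are indexed from 0.

```python
def median_ungrouped(values):
    """Median of raw numeric data using the odd/even position rule."""
    if not values:
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(values)            # arrange in ascending order
    n = len(ordered)
    if n % 2 == 1:
        # odd n: value in position (n + 1) / 2
        return ordered[(n + 1) // 2 - 1]
    # even n: average the values in positions n / 2 and n / 2 + 1
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

sales_9 = [27, 38, 12, 34, 42, 40, 24, 40, 23]   # Example 3.2(a)
sales_10 = sales_9 + [18]                        # Example 3.2(b)

median_9 = median_ungrouped(sales_9)
median_10 = median_ungrouped(sales_10)
print(median_9, median_10)
```

The two results should match the medians found by hand in Examples 3.2(a) and 3.2(b).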

To Calculate the Median for Grouped Numeric Data

Use these methods when the data is already summarised into a numeric frequency distribution (or ogive).

Graphical approach

Using the ‘less than’ ogive graph (i.e. cumulative frequency polygon), the median value is found by reading off the data value on the x-axis that is associated with the 50% cumulative frequency located on the y-axis.

Arithmetic approach

– Based on the sample size, n, calculate $\frac{n}{2}$ to find the median position.

– Using the cumulative frequency counts of the ‘less than’ ogive summary table, find the median interval (i.e. the interval that contains the median position, the $\left(\frac{n}{2}\right)$th data value).

– The median value can be approximated using the midpoint of the median interval, or calculated using the following formula to give a more representative median value:

\[ M_e = O_{me} + \frac{c\left[\frac{n}{2} - f(<)\right]}{f_{me}} \qquad (3.2) \]

Where:

$O_{me}$ = lower limit of the median interval

c = class width

n = sample size (number of observations)

$f_{me}$ = frequency count of the median interval

$f(<)$ = cumulative frequency count of all intervals before the median interval

The formula takes into account ‘how far’ into the median interval the median value lies.

Example 3.3 Courier Delivery Times Study

A courier company recorded 30 delivery times (in minutes) to deliver parcels to its clients from its depot. The data is summarised in the numeric frequency distribution and ogive as shown in Table 3.2.

Table 3.2 Numeric frequency distribution and ogive for courier delivery times (minutes)

Time    Frequency    Cumulative


Find the median delivery time of parcels to clients by this courier company.

Solution

Since n = 30, the median delivery time will lie in the $\left(\frac{30}{2}\right)$th = 15th ordered data position.

Using the ogive (of cumulative counts) in Table 3.2, the 15th data value falls in the 30 – < 40 minutes interval. This identifies the median interval. An approximate median delivery time for parcels is therefore 35 minutes (the interval midpoint). However, a more representative median value can be found by using Formula 3.2, where:

$O_{me}$ = 30 minutes

c = 10 minutes

$f_{me}$ = 9 deliveries

$f(<)$ = 8 deliveries

\[ M_e = 30 + \frac{10\left(\frac{30}{2} - 8\right)}{9} = 30 + 7.78 = 37.78 \text{ minutes} \]

Thus the median parcel delivery time is 37.78 minutes. This means that half the deliveries occurred within 37.78 minutes while the other half took longer than 37.78 minutes.
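Formula 3.2 can be expressed as a short Python function. This is a sketch under stated assumptions: the function name and argument layout are hypothetical, all intervals are taken to have the same width, and the frequency counts are those listed in Table 3.3, which summarises the same 30 delivery times and agrees with the values $f_{me}$ = 9 and $f(<)$ = 8 used in the solution.

```python
def grouped_median(lower_limits, width, freqs):
    """Approximate the median from a frequency distribution (Formula 3.2)."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("no observations")
    position = n / 2                    # median position
    cumulative = 0                      # f(<): count before current interval
    for lower, f in zip(lower_limits, freqs):
        if cumulative + f >= position:  # this is the median interval
            return lower + width * (position - cumulative) / f
        cumulative += f
    raise ValueError("median interval not found")

lower_limits = [10, 20, 30, 40, 50]     # interval lower limits (minutes)
freqs = [3, 5, 9, 7, 6]                 # counts as listed in Table 3.3

median_time = grouped_median(lower_limits, width=10, freqs=freqs)
print(round(median_time, 2))
```

The loop accumulates the ‘less than’ counts until it reaches the interval containing the median position, and then interpolates within that interval.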

The median has one major advantage over the mean – it is not affected by outliers. It is therefore a more representative measure of central location than the mean when significant outliers occur in a set of data.

A drawback of the median, however, is that it cannot be calculated for categorical data. It makes no sense, for example, to refer to a ‘middle’ brand of fuel. Thus a median, like the mean, can also only be applied to numeric data.

Mode

The mode ($M_o$) is defined as the most frequently occurring value in a set of data. It can be calculated both for categorical data and numeric data.

The following are illustrative statements that refer to the mode as the central location measure:

Colgate is the brand of toothpaste most preferred by households.

The most common family size is four.

The supermarket frequented most often in Kimberley is Checkers.

The majority of machine breakdowns last between 25 and 30 minutes.

Follow these steps to calculate the mode:

For small samples of ungrouped data, rank the data from lowest to highest, and identify, by inspection, the data value that occurs most frequently.

For large samples of discrete or categorical (nominal and ordinal-scaled) data:

– construct a categorical frequency table (see Chapter 2)

– identify the modal value or modal category that occurs most frequently.


For large samples of continuous, numeric (ratio-scaled) data:

– calculate a numeric frequency distribution (see Chapter 2)

– identify the modal interval as the interval with the highest frequency count

– use either the midpoint of the modal interval as an approximate modal value or apply the following formula to calculate a more representative modal value.

\[ M_o = O_{mo} + \frac{c(f_m - f_{m-1})}{2f_m - f_{m-1} - f_{m+1}} \qquad (3.3) \]

Where:

$O_{mo}$ = lower limit of the modal interval

c = width of the modal interval

$f_m$ = frequency of the modal interval

$f_{m-1}$ = frequency of the interval preceding the modal interval

$f_{m+1}$ = frequency of the interval following the modal interval

The modal formula weights (‘pulls’) the modal value from the midpoint position towards the adjacent interval with the higher frequency count. If the interval to the left of the modal interval has a higher frequency count than the interval to the right of the modal interval, then the modal value is pulled down below the midpoint value, and vice versa.

Example 3.4 Courier Delivery Times Study

Refer to Example 3.3 for the problem description and Table 3.3 for the sample data of 30 delivery times that have been summarised into a numeric frequency distribution.

Find the modal delivery time of parcels to clients from the courier service’s depot (i.e. what is the most common courier delivery time?)

Table 3.3 Numeric frequency distribution of courier delivery times (minutes)

Time intervals    Frequency count
10 – < 20         3
20 – < 30         5
30 – < 40         9
40 – < 50         7
50 – < 60         6
Total             30

Solution

From the numeric frequency distribution, the modal interval (interval with the highest frequency) is 30 – < 40 minutes. The midpoint of 35 minutes can be used as the approximate modal courier delivery time.


To calculate a more representative modal value, apply the modal Formula 3.3 with:

$O_{mo}$ = 30 minutes

c = 10 minutes

$f_m$ = 9 deliveries

$f_{m-1}$ = 5 deliveries

$f_{m+1}$ = 7 deliveries

\[ M_o = 30 + \frac{10(9 - 5)}{2(9) - 5 - 7} = 30 + 6.67 = 36.67 \text{ minutes} \]

Thus the most common courier delivery time from depot to customers is 36.67 minutes.

The mode has several advantages:

It is a valid measure of central location for all data types (i.e. categorical and numeric).

If the data type is categorical, the mode defines the most frequently occurring category.

If the data type is numeric, the mode is the most frequently occurring data value (or the midpoint value of a modal interval, if the numeric data has been grouped into intervals).

The mode is not influenced by outliers, as it represents the most frequently occurring data value (or response category).

The mode also has one main disadvantage:

It is a representative measure of central location only if the histogram of the numeric random variable is unimodal (i.e. has one peak only). If the shape is bi-modal, there is more than one peak, meaning that two possible modes exist, in which case there is no single representative mode.
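A frequency tally makes it easy to see whether a single mode exists. The sketch below uses the training days from Table 3.1; the function name `modes` and the small two-peaked data set are hypothetical and serve only to illustrate the bi-modal case.

```python
from collections import Counter

def modes(values):
    """Return every value that shares the highest frequency count."""
    counts = Counter(values)                  # tally each distinct value
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)

# Training days from Table 3.1
training_days = [16, 20, 13, 19, 24, 22, 18, 18, 15, 20,
                 21, 21, 18, 20, 18, 20, 15, 20, 18, 20]

# Hypothetical data with two equally frequent values
two_peaks = [2, 2, 2, 5, 7, 9, 9, 9]

single_mode = modes(training_days)
double_mode = modes(two_peaks)
print(single_mode, double_mode)
```

A list with one entry indicates a single mode; more than one entry signals that no single representative mode exists. The same function works for categorical values such as brand names, since it only counts occurrences.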

In document Applied Business Statistics. Methods and Excel-based Applications-Juta (2016).pdf (Page 75-80)
