Data Analysis Add-in
TheDescriptive StatisticsDescriptive Statistics option in theData AnalysisData Analysis add-in (see Figure 3.10) will compute all the above descriptive measures, except the quartiles.
Figure Figure 3.103.10 The descriptive statistics option in Data AnalysisData Analysis add-in
Chapter 3 – Describing Data: Numeric Descriptive Statistics Chapter 3 – Describing Data: Numeric Descriptive Statistics
Example 3.15 Household Electricity Consumption Study Example 3.15 Household Electricity Consumption Study
Refer to the data in Table 3.4.
Compute the descriptive statistical measures for daily household electricity consumption.
Solution Solution
The output showing all the descriptive statistics measures for daily household electricity consumption is shown in Table 3.9.
TTable able 3.93.9 Descriptive statistics measures for daily household electricity consumption Electricity usage (kWh)
Electricity usage (kWh)
Mean 46.9
Standard error 1.838
Median 46.5
Mode 43
Standard deviation 8.22 Sample variance 67.57
Kurtosis −0.854
Skewness 0.113
Range 28
Minimum 33
Maximum 61
Sum 938
Count 20
Excel does not offer an option to construct a box plot. Such options are, however, available in many other statistical software packages such as SPSS and Minitab.
Refer to Appendix 2 for a flowchart summary of Descriptive Statistics tools. It shows the appropriate statistical methods for different data types of variables. It summarises the statistical methods for a single categorical variable; for two categorical variables together;
for a single numeric variable; for two numeric variables together; and finally, for a single numeric variable segmented by one or more categorical variables.
3.10 Summary 3.10 Summary
This chapter covered all the numeric descriptive statistical measures that can be used to profile sample data. These measures expressed the location, spread and shape of the sample data in numeric terms. Each measure was defined and calculated, and the conditions under which each would be appropriate to use were identified. The influence of data type and the presence of outliers are identified as the primary criteria determining the choice of a suitable numeric descriptive measure to describe sample data. A method for identifying outliers was described.
For qualitative random variables represented by categorical data, only the mode is appropriate to describe the profile of its sample data. Measures of spread and skewness
Applied Business Statistics
are not relevant for categorical data. All numeric descriptive measures are appropriate to describe the profile of quantitative random variables.
All descriptive measures can be computed inExcel by using either appropriate function keys or theDescriptive StatisticsDescriptive Statistics option in theData AnalysisData Analysis add-in. Only the box plot cannot be produced throughExcel.
These descriptive measures – particularly the mean, standard deviation (variance) and concept of symmetry – are important for the area of inferential analysis, which will be covered in chapters 7 to 12.
The chapter ended with an overview of all the exploratory data techniques that are relevant to either a categorical or a numeric random variable. This identifies valid statistical techniques for each type of random variable (i.e. qualitative and quantitative). A summary flowchart of descriptive statistical tools is given in Appendix 2.
Chapter 3 – Describing Data: Numeric Descriptive Statistics Chapter 3 – Describing Data: Numeric Descriptive Statistics
Exercises
Exercises
This symbol denotes data available onlineii
1
1 Identify the central location measure (mean, median, mode) referred to in each statement.
(a) Half of our employees have more than 10 years’ work experience.
(b) The most popular city amongst foreign tourists is Cape Town.
(c) The average time taken to complete a task is 18 minutes.
2
2 Which non-central location measure (lower or upper quartile) is described in the following statement?
‘In terms of work experience, 25% of our employees each have more than 15 years of experience.’
3
3 For which of the following statements would the arithmetic mean be inappropriate as a measure of central location? (Give a reason.) State which measure of central location would be more appropriate, if necessary.
(a) the ages of children at a playschool
(b) the number of cars using a parking garage daily (c) the brand of cereal preferred by consumers (d) the value of transactions in a clothing store
(e) the weight of hand luggage carried by airline passengers (f) your choice of daily newspaper available in your city
(g) the responses by citizens on a five-point rating scale to the statement:
‘South Africa should be divided into two time zones.’
4
4 Which statements below are true and which are false? Give a reason for your answer.
If the median mass of five parcels for delivery by a courier service is 6.5 kg and one further 7 kg parcel is added to the consignment, then:
(a) the new median mass will be about 6.6 kg (b) the median will increase
(c) it is impossible for the new median mass to be less than it was (d) it is impossible for the new median mass to stay exactly at 6.5 kg
(e) the median may increase, but that depends on the actual masses of all five parcels.
5
5 The following measures of central location were calculated for the number of people per household in Mossel Bay: mode = 2 people; mean = 4.1 people;
median = 3 people.
If there are 9 245 households in the municipal district of Mossel Bay, which of the following methods is appropriate to calculate the likely total number of persons living in Mossel Bay?
(a) Multiply the number of households by 2.
(b) Multiply the number of households by 4.1.
(c) Multiply the number of households by 3.
6
6 X3.6 X3.6 – – unit unit truststrusts
ii
The percentage returns last year for seven general equity unit trusts were 9.2;
8.4; 10.2; 9.6; 8.9; 10.5 and 8.3. Calculate the mean and standard deviation of percentage returns.
Applied Business Statistics
The following calculations can be performed either manually, using a calculator, or The following calculations can be performed either manually, using a calculator, or using
usingExcel. 7
7 X3.7 X3.7 – – luggage luggage weightsweights
ii
The mass (in kg) of the hand luggage of seven air passengers was as follows:
11 12 8 10 13 11 9
(a) Find the average and standard deviation of the hand luggage masses.
(b) Interpret the meaning of each descriptive statistic in (a).
(c) Find the coefficient of variation of the hand luggage masses.
(d) Is there high relatively variability in the mass of hand luggage between passengers? Explain briefly.
8
8 X3.8 X3.8 – – bicycle bicycle salessales
ii
The number of bicycles sold monthly by a bicycle dealer was:
25 18 30 18 20 19 30 16 36 24
(a) Find the mean and median number of bicycles sold monthly. Interpret each descriptive statistics measure.
(b) Find the range, variance and standard deviation of the number of bicycles sold monthly. Interpret the range and standard deviation measures.
(c) Calculate the lower and upper quartiles of monthly bicycle sales. Interpret.
(d) Use the approximate skewness formula to estimate the degree of symmetry in the distribution of monthly bicycle sales. Interpret its meaning.
(e) Construct a box plot of monthly bicycle sales. Interpret the plot.
(f) If the dealer uses the formula ‘mean plus one standard deviation’ to decide on the opening stock level of bicycles at the beginning of next month, will he run out of stock during the month if he receives orders for 30 bicycles next month? Assume no extra bicycles can be ordered.
9
9 X3.9 X3.9 – – setting setting timestimes
ii
The setting time for ceramic tile glue is an important quality feature of the product.
A manufacturer of ceramic tile glue tested a sample of nine batches from a large consignment and recorded the setting times (in minutes) of each batch:
27 22 31 18 20 25 21 28 24
(a) Find the mean and standard deviation of setting times of the ceramic tile glue.
(b) How consistent are the setting times across the different batches? Compute the coefficient of variation as a consistency index measure.
(c) If the consistency index must be less than 10% for the consignment to be passed by the quality controller, will this consignment be approved for dispatch?
Explain your answer.
10
10 X3.10 X3.10 – – wage wage increasesincreases
ii
A labour consultant analysed the agreed percentage wage increases in 16 wage negotiations conducted between labour unions and employers. They were:
5.6 7.3 4.8 6.3 8.4 3.4 7.2 5.8
8.8 6.2 7.2 5.8 7.6 7.4 5.3 5.8
Chapter 3 – Describing Data: Numeric Descriptive Statistics Chapter 3 – Describing Data: Numeric Descriptive Statistics
(a) Find the mean and median negotiated percentage wage increases.
(b) Find the variance and standard deviation of the percentage wage increases.
(c) Compute two standard deviation limits about the mean. Interpret these.
(d) How consistent are the percentage wage increases agreements? Compute the coefficient of variation as a consistency index measure.
11
11 Two groups of bank trainees each wrote a banking exam with the following percentage results:
M
Meeaann VaVarriiaannccee SSaammpplle e ssiizzee
Group 1 76 110 34
Group 2 64 88 26
(a) Compute the coefficient of variation of exam scores for each trainee group.
(b) Which group showed greater consistency in exam score performance? Why?
12
12 X3.12 X3.12 – – meal meal valuesvalues
ii
A restaurant owner randomly selected and recorded the value of meals enjoyed by 20 diners on a given day. The values of meals (in rand) were:
44 65 80 72 90 58 44 47 48 35
65 56 36 69 48 62 51 55 50 44
(a) Define the random variable and its data type.
(b) Compute the mean and standard deviation of the value of meals at the restaurant.
(c) What is the median value of a meal at the restaurant? Interpret its meaning.
(d) What meal value occurs most frequently?
(e) Which central location measure would you choose? Why?
13
13 X3.13 X3.13 – – days days absentabsent
ii
The human resources department of a company recorded the number of days absent of 23 employees in the technical department over the past nine months:
5 4 8 17 10 9 30 5 6 15 10 9
2 16 15 18 4 12 6 6 15 10 5
(a) Find the mean, median and modal number of days absent over this nine-month period.
Interpret each central location measure.
(b) Compute the first quartile and the third quartile of the number of days absent.Interpret these quartile values for the human resources manager.
(c) The company’s policy is to keep its absenteeism level to within an average of one day per employee per month. Based on the findings in (a), is the company successful in managing its absenteeism level? Explain.
14
14 X3.14 X3.14 – – bad bad debtsdebts
ii
The Gauteng chamber of business conducted a survey amongst 17 furniture retailers to identify the percentage of bad debts in each company’s debtors’ book.
The bad debts percentages are as follows:
Applied Business Statistics
2.2 4.7 6.3 5.8 5.7 7.2 2.6 2.4 6.1 6.8
2.2 5.7 3.4 6.6 1.8 4.4 5.4
(a) Find the average and standard deviation of the percentage of bad debts amongst the 17 furniture retailers surveyed.
(b) Find the median percentage of bad debts amongst the 17 furniture retailers surveyed.
(c) Interpret the findings from (a) and (b).
(d) Is there a modal percentage of bad debts? If so, identify it and comment on its usefulness.
(e) Calculate the skewness coefficient for percentage of bad debts. Is the data skewed?
(f) Compute the first quartile and the third quartile of the percentage of bad debts amongst the furniture retailers surveyed. Interpret these quartile values.
(g) The chamber of business monitors bad debt levels and will advise an industry to take corrective action if the percentage of bad debts, on average, exceeds 5%.
Should the chamber of business send out an advisory note to all furniture retailers based on these sample findings? Justify your answer.
15
15 X3.15 X3.15 – – fish fish shopshop
ii
A fish shop owner recorded the daily turnover of his outlet for 300 trading days as shown in the frequency table.
D
Daaiilly y ttuurrnnoovveerr NNuummbbeer r oof f ddaayyss 500 – < 750 15
750 – < 1 000 23 1 000 – < 1 250 55 1 250 – < 1 500 92 1 500 – < 1 750 65 1 750 – < 2 000 50
(a) Compute and interpret the (approximate) average daily turnover of the fish shop. (Hint: Use the weighted average formula, with the midpoint of each interval asxi.)
(b) Find the median daily turnover of the fish shop. Interpret its meaning.
(c) What is the modal daily turnover of the fish shop?
(d) Find the maximum daily turnover associated with the slowest 25% of trading days.
(e) What daily turnover separates the busiest 25% of trading days from the rest?
16
16 X3.16 X3.16 – An economist conducted a study to identify the percentage of family income– grocery grocery spendspend
ii
allocated to the purchase of groceries. She surveyed a random sample of 50 families and compiled the following numeric frequency distribution:
Chapter 3 – Describing Data: Numeric Descriptive Statistics Chapter 3 – Describing Data: Numeric Descriptive Statistics
P
(a) Compute and interpret the (approximate) mean percentage of family income allocated to grocery purchase. (Hint: Use the weighted average formula with the midpoint of each interval asxi.)
(b) What is the maximum percentage of income that is allocated to grocery purchase by:
(i) the lower 50% of families?
(ii) the lower 25% of families?
(c) 25% of families spend more than a specific percentage of their income on groceries. What is that percentage of income value?
17
17 X3.17 – equity portfolioX3.17 – equity portfolio
ii
Find the average price paid per share in an equity portfolio consisting of:
40 shares bought for R15 each; 10 shares bought for R20 each; 5 shares bought for R40 each; and 50 shares bought for R10 each. Use the weighted average formula.
18
18 X3.18 – car salesX3.18 – car sales
ii
Value Cars, a pre-owned car dealership with branches throughout Gauteng, last month sold 5 cars at R25 000 each; 12 cars at R34 000 each; and 3 cars at R55 000 each. What was the average price per car sold by Value Cars last month?
Use the weighted average formula.
19
19 X3.19 – rental increasesX3.19 – rental increases
ii
Office rental agreements contain escalation clauses. For a particular office complex in the Nelspruit CBD, the escalation rates based on the previous year’s rental over four years were 16%, 14%, 10% and 8% respectively.
Use the geometric mean to find the average annual escalation rate in office rentals for this office complex over the four-year period.
20
20 X3.20 – sugar increasesX3.20 – sugar increases
ii
The price of a kilogram of sugar increased by 5%, 12%, 6%, 4%, 9% and 3%
over the past six years.
(a) Find the average annual percentage increase in the price of sugar (per kg) using the geometric mean.
(b) Why is the geometric mean more suitable than the arithmetic mean?
21
21 X3.21 X3.21 – – water water usageusage
ii
Thirty households in a Paarl suburb were surveyed to identify their average water usage per month (in kilolitres, kl). The usage per household was:
10 18 30 13 42 14 9 15 19 20
25 15 24 12 15 16 22 22 8 33
50 26 16 32 25 26 16 26 25 12
Applied Business Statistics
UseExcel’sData > Data Analysis > Descriptive StatisticsData > Data Analysis > Descriptive Statistics option and, where necessary, the function keyQUARTILE.INCQUARTILE.INC to answer the following questions:
(a) Find the mean, median and modal water usage across the 30 households.
(b) Find the variance and standard deviation of water usage per household.
(c) Find the first and the third quartile of water usage amongst the 30 households.
(d) Interpret the findings from (a) to (c) for the municipal officer who conducted this survey.
(e) If there are 750 households in the Paarl suburb, what is the most likely total water usage (in kl) amongst all these households:
(i) in a month?
(ii) in a year?
22
22 X3.22 X3.22 – – veal veal dishesdishes
ii
The price of a veal cordon bleu meal (in rand) was taken from the menus of 28 Durban restaurants in a survey conducted byLifestyle magazine into the cost of
‘dining out’. The prices are:
48 66 60 90 58 68 53 63 64 55 64 58 54 72
56 80 55 62 75 48 55 45 48 72 52 68 56 70
UseExcel’sData > Data Analysis > Descriptive StatisticsData > Data Analysis > Descriptive Statistics option and, where necessary, the function keysQUARTILE.INCQUARTILE.INC andPERCENTILE.INCPERCENTILE.INC, to answer the following questions:
(a) Define the random variable and its data type.
(b) Find the mean and median price of a veal cordon bleu meal. Interpret each measure.
(c) Can you identify a modal price? Give its value and discuss its usefulness.
(d) Identify the value of the standard deviation of the price of veal cordon bleu.
(e) Compute the skewness coefficient. Does the data appear to be skewed? If so, why?
(f) Which central location measure would you choose to report in the article on
‘dining out’? Why?
(g) What is the least price that a patron to one of these restaurants would pay if they dined out at any one of the most expensive 25% of restaurants?
(h) The least expensive 25% of restaurants do not charge above what price for the veal cordon bleu meal?
(i) What is the least price to be paid for the most expensive 10% of veal cordon bleu meals?
23
23 X3.23 X3.23 – – fuel fuel billsbills
ii
The monthly fuel bills of a random sample of 75 Paarl motorists who commute to work daily by car were recorded in a recent survey.
(a) Use the Data > Data Analysis > Descriptive StatisticsData > Data Analysis > Descriptive Statistics option inExcel to find the mean, median, variance, standard deviation and skewness coefficient of the monthly fuel bill of the sample of car commuters.
(b) Interpret the meaning of each descriptive statistic in (a).
(c) Is the data skewed? If so, in what direction? What would be the cause of skewness?
Chapter 3 – Describing Data: Numeric Descriptive Statistics Chapter 3 – Describing Data: Numeric Descriptive Statistics
(d) Find the coefficient of variation for monthly fuel bills. Is the relative variability between the sampled motorists’ monthly fuel bills low?
(e) Use the Excel function keyQUARTILE.INCQUARTILE.INC to find the lower and upper quartiles of monthly fuel bills of motorists. Interpret each quartile.
(f) Compile the five-number summary table for monthly fuel bills.
(g) Construct a box plot of monthly fuel bills.
(h) Describe the profile of the monthly fuel bills of Paarl motorists who use their cars to commute daily to work and back.
(i) Assume that the cost of fuel is R10 per litre and that there are 25 000 motorists who commute to work daily in Paarl by car. Estimate the most likely total amount of fuel used (in litres) by all car commuters in Paarl in a month.
24
24 X3.24 X3.24 – – service service periodsperiods
ii
The Association of Professional Engineers recently surveyed a sample of 100 of its members to identify their years of experience.
(a) UseExcel’sData > Data Analysis > Descriptive StatisticsData > Data Analysis > Descriptive Statistics option to find the mean, median, standard deviation and skewness measure of the years of experience for the sample of professional engineers.
(b) Interpret the meaning of each descriptive statistical measure in (a).
(c) Use Excel’sData > Data Analysis > HistogramData > Data Analysis > Histogram option to compute a frequency distribution and a histogram of the years of experience for the sample data.
(Hint: Use the bin range given in the database.)
(d) Compute an interval for the years of experience of professional engineers that covers one standard deviation either side of the sample mean.
What percentage of the sampled professional engineers does this represent?
(e) The Association of Professional Engineers would like to see a mix of
‘experience’ and ‘new blood’ amongst its members. Ideally they would like to have at least 20% of their members with less than three years of experience (i.e.
‘new blood’ members) and at least 20% of their members with more than 12 years of service (i.e. ‘experienced’ members). Based on the histogram and frequency distribution in (c), is this desired mix of experience being achieved? Comment.
25
25 X3.25 X3.25 – – dividend dividend yieldsyields
ii
A survey of 44 JSE-listed companies recorded their dividend yields (as a percentage) for last year as shown in the following table:
5.3 4.8 3.1 4.1 6.1 4.1 3.2 4.6 7.6 1.6
4.6 1.9 4.8 2.9 1.5 5.1 2.8 3.6 3.3 5.5
2.8 4.2 3.6 3.1 5.9 2.9 3.6 4.1 4.9 2.7
3.7 7.1 6.2 2.8 5.1 3.8 3.4 3.9 5.8 6.3
6.8 4.3 5.1 5.4
(a) Define the random variable and its data type.
(b) Use Excel’s Data > Data Analysis > Descriptive StatisticsData > Data Analysis > Descriptive Statistics option to find the mean, median, mode, standard deviation and skewness measure of the dividend yields of the JSE-listed companies.
(c) Interpret the meaning of each descriptive statistical measure in (b).
Applied Business Statistics
(d) Which central location measure would you use to report on the dividend yields of companies? Why?
(e) Use Excel’sData > Data Analysis > HistogramData > Data Analysis > Histogram option to compute a numeric frequency distribution and a histogram of the dividend yields of the sampled JSE companies. (Hint: Use the bin range given in the database.)
(f) Use Excel’s function key operations (MINMIN,MAXMAX,MEDIANMEDIAN andQUARTILE.INCQUARTILE.INC) to compute the five-number summary table.
(g) Construct a box plot of the dividend yields of the JSE-listed companies and interpret the profile of the dividend yields declared by JSE companies last year.
(h) What was the minimum dividend yield declared by the top 10% of JSE companies? (Hint: Use thePERCENTILE.INCPERCENTILE.INC function key.)
(i) What percentage of JSE companies did not declare more than a 3.5% dividend yield last year? (Hint: Compute the ogive to find the cumulative percentages.) 26
26 X3.26 X3.26 – – rosebudsrosebuds
ii
A commercial flower grower in Hazyview sells fresh-cut rosebuds to retailers in
A commercial flower grower in Hazyview sells fresh-cut rosebuds to retailers in