Single Numeric Variable
Numeric Frequency Distribution
A numeric frequency distribution summarises numeric data into intervals of equal width. Each interval shows how many data values fall within the interval.
Follow these steps to construct a numeric frequency distribution:
Determine the data range.
Range = Maximum data value − Minimum data value    (2.1)
For the age of grocery shoppers, the age data range is 69 − 23 = 46 years.
Choose the number of intervals. As a rule, choose between five and eight intervals, depending on the sample size: the smaller the sample size, the fewer the intervals, and vice versa. For n = 30 shoppers, choose five intervals.
Determine the interval width.
Interval width = Data range / Number of intervals    (2.2)
Use this as a guide to determine a ‘neat’ interval width. For the ‘age’ variable, the approximate interval width is 46 / 5 = 9.2 years. Hence choose an interval width of 10 years.
Set up the interval limits. The lower limit for the first interval should be a value smaller than or equal to the minimum data value and should be a number that is easy to use.
Since the youngest shopper is 23 years old, choose the lower limit of the first interval to be 20.
The lower limits for successive intervals are found by adding the interval width to each preceding lower limit. The upper limits are chosen to avoid overlaps between adjacent interval limits.
Lower limit    Upper limit
20             < 30 (or 29)
30             < 40 (or 39)
40             < 50 (or 49)
50             < 60 (or 59)
60             < 70 (or 69)
Applied Business Statistics
The format of <30 (less than 30) should be used if the source data is continuous, while an upper limit such as 29 can be used if the data values are discrete.
Tabulate the data values. Assign each data value to one, and only one, interval. A count of the data values assigned to each interval produces the summary table, called the numeric frequency distribution.
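The construction steps above can be sketched in a few lines of Python. This is a minimal illustration only: the list `ages` is a hypothetical sample invented for the sketch (it is not the Table 2.1 survey data), although its minimum (23) and maximum (69) match the values quoted in the text, and the variable names are assumptions.

```python
# Hypothetical ages for illustration only; NOT the Table 2.1 survey data.
ages = [23, 27, 31, 34, 35, 38, 42, 45, 47, 51, 56, 58, 62, 69]

# Data range (Formula 2.1).
data_range = max(ages) - min(ages)            # 69 - 23 = 46

# Number of intervals, then interval width (Formula 2.2).
num_intervals = 5
approx_width = data_range / num_intervals     # 46 / 5 = 9.2, a guide only
width = 10                                    # 'neat' width chosen from the guide

# Interval limits: a 'neat' first lower limit no larger than the minimum.
first_lower = 20
lower_limits = [first_lower + i * width for i in range(num_intervals)]

# Tabulate: each value falls in exactly one interval [lower, lower + width).
counts = [0] * num_intervals
for age in ages:
    counts[(age - first_lower) // width] += 1

for lower, count in zip(lower_limits, counts):
    print(f"{lower} - < {lower + width}: {count}")
```

Using integer division to locate the interval guarantees that every value is assigned to one, and only one, interval, provided the intervals cover the data range.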
When constructing a numeric frequency distribution, ensure that:
the interval widths are equal in size
the interval limits do not overlap (i.e. intervals must be mutually exclusive)
each data value is assigned to only one interval
the intervals are fully inclusive (i.e. cover the data range)
the sum of the frequency counts equals the sample size, n, or the percentage frequencies sum to 100%.
The frequency counts can be converted to percentages by dividing each frequency count by the sample size. The resultant summary table is called a percentage frequency distribution. It shows the proportion (or percentage) of data values within each interval.
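As a rough sketch, the conversion to percentages and the checks listed above might look as follows in Python. The counts are the grocery spend frequencies that appear in Table 2.8 later in this section; the variable names are assumptions made for the illustration.

```python
# Frequency counts per interval, taken from Table 2.8 (grocery spend).
lower_limits = [400, 800, 1200, 1600, 2000]
width = 400
counts = [7, 14, 5, 3, 1]

n = sum(counts)                               # sample size
percentages = [100 * c / n for c in counts]   # percentage frequency distribution

# Checks: counts sum to n, and percentages sum to 100%.
assert n == 30
assert abs(sum(percentages) - 100) < 1e-9
# Equal widths with consecutive lower limits mean no overlaps and no gaps.
assert all(b - a == width for a, b in zip(lower_limits, lower_limits[1:]))

for lower, c, p in zip(lower_limits, counts, percentages):
    print(f"{lower} - < {lower + width}: {c:2d}  {p:.1f}%")
```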
Histogram
A histogram is a graphic display of a numeric frequency distribution.
Follow these steps to construct a histogram:
Arrange the intervals consecutively on the x-axis from the lowest interval to the highest.
There must be no gaps between adjacent interval limits.
Plot the height of each bar (on the y-axis) over its corresponding interval, to show either the frequency count or percentage frequency of each interval. The area of a bar (width × height) measures the density of values in each interval.
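To see the idea without a plotting package, the sketch below prints a simple text histogram. It is an illustration under stated assumptions: the helper `text_histogram` is a hypothetical name, the bars run horizontally rather than vertically, and the counts are the grocery spend frequencies from Table 2.8 later in this section.

```python
# Counts from Table 2.8 (grocery spend); intervals are consecutive with no gaps.
lower_limits = [400, 800, 1200, 1600, 2000]
width = 400
counts = [7, 14, 5, 3, 1]

def text_histogram(lower_limits, width, counts):
    """Return one line per interval; the bar length equals the frequency count."""
    lines = []
    for lower, count in zip(lower_limits, counts):
        label = f"{lower:>5} - < {lower + width:<5}"
        lines.append(f"{label} | {'#' * count} {count}")
    return lines

for line in text_histogram(lower_limits, width, counts):
    print(line)
```

Because all intervals have the same width, the bar lengths alone are enough to compare the frequencies of the intervals.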
Example 2.3 Grocery Shoppers Survey – Profiling the Ages of Shoppers
Refer to the ‘age of shoppers’ variable in Table 2.1.
1 Construct a numeric frequency distribution for the age profile of grocery shoppers.
2 Compute the percentage frequency distribution of shoppers’ ages.
3 Construct a histogram of the numeric frequency distribution of shoppers’ ages.
Management Questions
1 How many shoppers are between 20 and 29 years of age?
2 What is the most frequent age interval of shoppers surveyed?
3 What percentage of shoppers belong to the most frequent age interval?
4 What percentage of shoppers surveyed are 60 years or older?
Solution
1 and 2 The numeric and percentage frequency distributions for the ages of grocery shoppers are shown in Table 2.6, and are based on the steps shown above.
Chapter 2 – Summarising Data: Summary Tables and Graphs
Table 2.6 Numeric (and percentage) frequency distribution – age of shoppers
3 Figure 2.5 shows the histogram of the numeric frequency distribution for shoppers’ ages.
Figure 2.5 Histogram – age of shoppers
Management Interpretation
1 There are six shoppers between the ages of 20 and 29 years.
2 The most frequent age interval is between 30 and 39 years.
3 30% of shoppers surveyed are between 30 and 39 years of age.
4 10% of shoppers surveyed are 60 years or older.
If the numeric data are discrete values in a limited range (5-point rating scales, number of children in a family, number of customers in a bank queue, for example), then the individual discrete values of the random variable can be used as the ‘intervals’ in the construction of a numeric frequency distribution and a histogram. This is illustrated in Example 2.4 below.
Example 2.4 Grocery Shoppers Survey – Profiling the Family Size of Shoppers
Refer to the random variable ‘family size’ in the database in Table 2.1. Construct a numeric and percentage frequency distribution and histogram of the family size of grocery shoppers surveyed.
Management Questions
1 Which is the most common family size?
2 How many shoppers have a family size of three?
3 What percentage of shoppers have a family size of either three or four?
Solution
Family size is a discrete random variable. The family sizes range from 1 to 5 (see data in Table 2.1). Each family size can be treated as a separate interval. To tally the family sizes, count how many shoppers have a family size of one, two, three, four and five.
Table 2.7 and Figure 2.6 show the numeric and percentage frequency tables and histogram for the discrete numeric data of family size of grocery shoppers.
Table 2.7 Numeric (and percentage) frequency distribution – family size of shoppers
Family size    Tally    Count    Percentage
Figure 2.6 Histogram – family size of shoppers
Management Interpretation
1 The most common family size of grocery shoppers is two.
2 There are eight shoppers that have a family size of three.
3 43.4% (26.7% + 16.7%) of shoppers surveyed have a family size of either three or four.
Cumulative Frequency Distribution
Data for a single numeric variable can also be summarised into a cumulative frequency distribution.
A cumulative frequency distribution is a summary table of cumulative frequency counts which is used to answer questions of a ‘more than’ or ‘less than’ nature.
Follow these steps to construct a cumulative frequency distribution:
Using the numeric frequency distribution, add an extra interval below the lower limit of the first interval.
For each interval, beginning with this extra interval, ask the question: ‘How many data values fall below this interval’s upper limit?’
The answer is: the sum of all frequency counts (or percentages) below this current interval’s upper limit.
For this extra lower interval, the cumulative frequency count (or percentage) will always be zero.
A shortcut method to find each successive cumulative frequency count is to add the current interval frequency count to the cumulative frequency immediately preceding it.
The last interval’s cumulative frequency count must equal the sample size, n (if frequency counts are summed), or 100% (if percentages are summed).
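The shortcut method amounts to a running total, which can be sketched in Python as follows. The counts are the grocery spend frequencies from Table 2.8 later in this section, with a leading zero for the extra lower interval; the variable names are assumptions for the illustration.

```python
from itertools import accumulate

# Upper limits of each interval, starting with the extra interval (0 - < 400).
upper_limits = [400, 800, 1200, 1600, 2000, 2400]
# Frequency counts from Table 2.8, with 0 for the extra lower interval.
counts = [0, 7, 14, 5, 3, 1]

# Running total: each cumulative count = previous cumulative count + current count.
cumulative_counts = list(accumulate(counts))
n = cumulative_counts[-1]                     # last cumulative count equals n
cumulative_pct = [round(100 * c / n, 1) for c in cumulative_counts]

for upper, c, p in zip(upper_limits, cumulative_counts, cumulative_pct):
    print(f"below {upper}: {c:2d}  {p:.1f}%")
```

The first cumulative value is zero and the last equals the sample size (or 100%), which matches the checks described in the steps above.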
Ogive
An ogive is a graph of a cumulative frequency distribution.
Follow these steps to construct an ogive:
On a set of axes, mark the interval limits on the x-axis (including the extra lower interval).
On the y-axis, plot each cumulative frequency count (or cumulative percentage) against the upper limit of its interval. Plot the frequency count (or percentage) of zero opposite the upper limit of the extra lower interval.
Join these cumulative frequency points to produce a line graph.
The line graph starts at zero count (or 0%) at the upper limit of the extra interval.
The line graph ends at the sample size, n (or 100%), at the upper limit of the last interval.
This ogive graph can now be used to read off cumulative answers to questions of the following type:
How many (or what percentage) of observations lie below (or above) this value?
What data value separates the data set at a given cumulative frequency (or cumulative percentage)?
Note: The ogive graph can provide answers to both ‘less than’ and ‘more than’ types of questions from the same graph.
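Reading a value off an ogive drawn with straight line segments is equivalent to linear interpolation between the plotted points. The sketch below illustrates both directions of reading, under the assumption that values are spread evenly within each interval. The function names are hypothetical, and the points are the cumulative percentages from Table 2.9 later in this section.

```python
# Ogive points from Table 2.9: (upper limit, cumulative percentage).
upper_limits = [400, 800, 1200, 1600, 2000, 2400]
cumulative_pct = [0.0, 23.3, 70.0, 86.7, 96.7, 100.0]

def percentage_below(upper_limits, cumulative_pct, value):
    """Interpolate the cumulative percentage of observations below `value`."""
    points = list(zip(upper_limits, cumulative_pct))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= value <= x1:
            return y0 + (value - x0) / (x1 - x0) * (y1 - y0)
    raise ValueError("value lies outside the ogive's range")

def value_at_percentage(upper_limits, cumulative_pct, target_pct):
    """Interpolate the data value at which the ogive reaches `target_pct`."""
    points = list(zip(upper_limits, cumulative_pct))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y1 > y0 and y0 <= target_pct <= y1:
            return x0 + (target_pct - y0) / (y1 - y0) * (x1 - x0)
    raise ValueError("target_pct lies outside the ogive's range")

print(percentage_below(upper_limits, cumulative_pct, 1200))
print(value_at_percentage(upper_limits, cumulative_pct, 20))
print(value_at_percentage(upper_limits, cumulative_pct, 50))
```

Values read by eye from a printed ogive are approximations and need not match the interpolated figures exactly.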
Example 2.5 Grocery Shoppers Survey – Analysis of Grocery Spend
Refer to the numeric variable ‘spend’ (amount spent on groceries last month) in Table 2.1.
1 Compute the numeric frequency distribution and percentage frequency distribution for the amount spent on groceries last month by grocery shoppers.
2 Compute the cumulative frequency distribution and its graph, the ogive, for the amount spent on groceries last month.
Management Questions
1 What percentage of shoppers spent less than R1 200 last month?
2 What percentage of shoppers spent R1 600 or more last month?
3 What percentage of shoppers spent between R800 and R1 600 last month?
4 What was the maximum amount spent last month by the 20% of shoppers who spent the least on groceries? Approximate your answer.
5 What is the approximate minimum amount spent on groceries last month by the top-spending 50% of shoppers?
Solution
1 The numeric frequency distribution for amount spent is computed using the construction steps outlined earlier.
The range is R2 136 − R456 = R1 680. Choosing five intervals, the interval width can be set to a ‘neat’ width of R400 (based on R1 680 / 5 = R336). The lower limit of the first interval is set at a ‘neat’ limit of R400, since the minimum amount spent is R456. The numeric and percentage frequency distributions are both shown in Table 2.8.
Table 2.8 Numeric (and percentage) frequency distributions – grocery spend
Grocery spend (R)    Count    Percentage
400 – < 800          7        23.3%
800 – < 1 200        14       46.7%
1 200 – < 1 600      5        16.7%
1 600 – < 2 000      3        10.0%
2 000 – < 2 400      1        3.3%
Total                30       100%
2 The cumulative frequency distribution (ogive) for amount spent on groceries last month is computed using the construction guidelines outlined above for the ogive.
Based on the numeric frequency distribution in Table 2.8, an additional interval (0 – < 400) is included. The cumulative frequency count for this interval is zero, since no shopper spent less than R400 on groceries last month. Referring to the upper limits for each successive interval above R400, the following cumulative counts are derived:
7 shoppers spent up to R800
21 (= 7 + 14) shoppers spent up to R1 200
26 (= 21 + 5) shoppers spent up to R1 600
29 (= 26 + 3) shoppers spent up to R2 000
all 30 shoppers (= 29 + 1) spent no more than R2 400 on groceries last month.
The cumulative frequency distributions for both the frequency counts and percentages are shown in Table 2.9.
Table 2.9 Cumulative frequency distributions (count and percentage) – grocery spend
                     Numeric frequency distribution    Cumulative distribution
Grocery spend (R)    Count    Percentage               Count    Percentage
0 – < 400            0        0.0%                     0        0.0%
400 – < 800          7        23.3%                    7        23.3%
800 – < 1 200        14       46.7%                    21       70.0%
1 200 – < 1 600      5        16.7%                    26       86.7%
1 600 – < 2 000      3        10.0%                    29       96.7%
2 000 – < 2 400      1        3.3%                     30       100.0%
Total                30       100%
Figure 2.7 shows the percentage ogive graph. Note that the % cumulative frequency is 0% at R400 (the upper limit of the extra interval) and 100% at the upper limit of R2 400 for the last interval. This means that no shopper spent less than R400 or more than R2 400 last month on groceries.
Figure 2.7 Ogive (%) – grocery spend (x-axis: amount spent last month (R); y-axis: % of grocery shoppers)
Management Interpretation
1 70% of shoppers spent less than R1 200 on groceries last month.
2 13.3% (100% − 86.7%) of shoppers spent R1 600 or more on groceries last month.
3 63.4% (86.7% − 23.3% or 46.7% + 16.7%) of shoppers spent between R800 and R1 600 on groceries last month.
4 The bottom 20% of shoppers spent no more than R770 (approximately) on groceries last month. (Using the percentage cumulative frequency polygon, this answer is found by projecting 20% from the y-axis to the polygon graph and reading off the amount spent on the x-axis.)
5 From the y-axis value at 50%, the minimum amount spent on groceries by the top-spending 50% of shoppers is (approximately) R1 000.
Note: The ogive is a ‘less than’ cumulative frequency graph, but it can also be used to answer questions of a ‘more than’ nature (by subtracting the ‘less than’ cumulative percentage from 100%, or the cumulative count from n, the sample size).
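The subtraction described in this note can be illustrated with a short Python sketch. It uses the cumulative counts and percentages from Table 2.9; the function names are hypothetical and only interval limits that appear in the table can be looked up.

```python
# 'Less than' cumulative values at each upper limit, from Table 2.9.
upper_limits = [400, 800, 1200, 1600, 2000, 2400]
less_than_pct = dict(zip(upper_limits, [0.0, 23.3, 70.0, 86.7, 96.7, 100.0]))
less_than_count = dict(zip(upper_limits, [0, 7, 21, 26, 29, 30]))
n = 30  # sample size

def pct_at_or_above(limit):
    """'More than' percentage = 100% minus the 'less than' percentage."""
    return round(100 - less_than_pct[limit], 1)

def count_at_or_above(limit):
    """'More than' count = n minus the 'less than' cumulative count."""
    return n - less_than_count[limit]

def pct_between(lower, upper):
    """Percentage from `lower` up to (but excluding) `upper`."""
    return round(less_than_pct[upper] - less_than_pct[lower], 1)

print(pct_at_or_above(1600))     # 100% - 86.7%
print(count_at_or_above(1600))   # 30 - 26
print(pct_between(800, 1600))    # 86.7% - 23.3%
```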
Box Plot
A box plot visually displays the profile of a numeric variable by showing its minimum and maximum values and various intermediate descriptive values (such as quartiles and medians).
The box plot is covered in Chapter 3, as it is constructed from descriptive statistical measures for numeric variables that will only be derived in that chapter.