
Two Categorical Variables

Cross-tabulation Table

A cross-tabulation table (also called a contingency table) summarises the joint responses of two categorical variables. The table shows the number (and/or percentage) of observations that jointly belong to each combination of categories of the two categorical variables.

This summary table is used to examine the association between two categorical measures.

Follow these steps to construct a cross-tabulation table:

Prepare a table with m rows (m = the number of categories of the first variable) and n columns (n = the number of categories of the second variable), resulting in a table with (m × n) cells.

Assign each pair of data values from the two variables to the appropriate category-combination cell in the table by placing a tick in that cell.

When each pair of data values has been assigned to a cell in the table, count the number of ticks per cell to derive the joint frequency count for each cell.

Sum each row to give row totals per category of the row variable.

Sum each column to give column totals per category of the column variable.

Sum the column totals (or row totals) to give the grand total (sample size).
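The tallying steps above can be sketched in Python as follows. This is a minimal illustration, not a procedure given in the text: the function name `cross_tabulate` and the six (store, gender) responses are hypothetical, invented only to show the mechanics. A `Counter` plays the role of the ticks, recording one tick per observation.

```python
from collections import Counter

def cross_tabulate(pairs, row_categories, col_categories):
    """Tally (row, column) pairs into joint counts plus row, column and grand totals."""
    # One "tick" per observation: Counter keys are (row category, column category) cells
    ticks = Counter(pairs)
    # m x n grid of joint frequency counts (cells with no ticks get 0)
    counts = [[ticks[(r, c)] for c in col_categories] for r in row_categories]
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    # Equals sum(col_totals); also equals len(pairs) when every pair uses a listed category
    grand_total = sum(row_totals)
    return counts, row_totals, col_totals, grand_total

# Hypothetical responses, invented for illustration only (not Table 2.1 data)
sample = [("Checkers", "F"), ("Spar", "M"), ("Checkers", "M"),
          ("Checkers", "F"), ("Pick n Pay", "F"), ("Spar", "F")]
stores = ["Checkers", "Pick n Pay", "Spar"]
genders = ["F", "M"]
counts, row_totals, col_totals, n = cross_tabulate(sample, stores, genders)

for store, row, total in zip(stores, counts, row_totals):
    print(f"{store:<12}", *row, total)
print(f"{'Total':<12}", *col_totals, n)
```

Because every hypothetical response uses a listed category, the grand total here equals the number of pairs; a response with an unlisted category would simply be left out of the table.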

These joint frequency counts can be converted to percentages for easier interpretation. The percentages can be expressed relative to the total sample size (percent of total), to the row totals (row percentages) or to the column totals (column percentages).
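In symbols, write n_ij for the joint count in row i and column j, n_i· for the total of row i, n_·j for the total of column j and n for the grand total (this notation is introduced here for convenience; it is not used elsewhere in the text). The three percentages are then:

```latex
\text{percent of total} = \frac{n_{ij}}{n} \times 100\%, \qquad
\text{row percentage} = \frac{n_{ij}}{n_{i\cdot}} \times 100\%, \qquad
\text{column percentage} = \frac{n_{ij}}{n_{\cdot j}} \times 100\%
```

For example, for female shoppers who prefer Checkers in Table 2.3 (n_ij = 7, row total 10, column total 19, n = 30), these give 7/30 ≈ 23.3% of all shoppers, 7/10 = 70% of Checkers shoppers and 7/19 ≈ 36.8% of female shoppers.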

The cross-tabulation table can be displayed graphically either as a stacked bar chart (also called a component bar chart) or a multiple bar chart.

Stacked Bar Chart

Follow these steps to construct a stacked bar chart:

Choose, say, the row variable, and plot the frequency of each category of this variable as a simple bar chart.

Split each bar into segments, one per category of the column variable, so that each segment's height equals the joint frequency count of that column category within the bar's row category.

This produces a simple bar chart of the row variable with each bar split proportionately into the categories of the column variable. The categories of the column variable are ‘stacked’ on top of each other within each category bar of the row variable.

Note: The stacked bar chart can also be constructed by choosing the column variable first and then splitting the bars of the column variable into the category frequencies of the row variable.
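As a rough sketch of the splitting step, the Python function below (a hypothetical helper, `stacked_segments`) computes where each segment of a stacked bar starts and ends, using the joint counts of Table 2.3 with store as the row variable. The bottoms it returns are the kind of values a plotting routine such as matplotlib's `bar` accepts through its `bottom` argument; the drawing itself is left out so the example runs without a plotting library.

```python
def stacked_segments(counts, row_categories, col_categories):
    """For each row-category bar, return (column category, bottom, top) segments."""
    bars = {}
    for row_cat, row_counts in zip(row_categories, counts):
        bottom = 0
        segments = []
        for col_cat, count in zip(col_categories, row_counts):
            # Each column category occupies a slice whose height equals its joint count
            segments.append((col_cat, bottom, bottom + count))
            bottom += count
        # The top of the last segment equals the row total (the simple bar's height)
        bars[row_cat] = segments
    return bars

# Joint counts from Table 2.3 (rows: stores; columns: female, male)
counts = [[7, 3], [10, 7], [2, 1]]
bars = stacked_segments(counts, ["Checkers", "Pick n Pay", "Spar"], ["Female", "Male"])
for store, segments in bars.items():
    print(store, segments)
```

Each bar's top edge (10, 17 and 3) is the row total, which is why the stacked chart shows the simple bar chart of the row variable and its split by the column variable at the same time.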

Multiple Bar Chart

Follow these steps to construct a multiple bar chart:

For each category of, say, the row variable, plot a simple bar chart constructed from the corresponding frequencies of the categories of the column variable.

Display these categorised simple bar charts next to each other on the same axes.
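The side-by-side arrangement can be sketched by computing an x position for each bar. In this minimal Python illustration, the helper name `grouped_bars` and the bar width of 0.35 are arbitrary choices, not values from the text: each cluster of bars is centred on the position of its row category, with one bar per column category, and each bar's height is the joint count from Table 2.3.

```python
def grouped_bars(counts, row_categories, col_categories, bar_width=0.35):
    """Return (row category, column category, x position, height) for each bar."""
    k = len(col_categories)
    bars = []
    for i, (row_cat, row_counts) in enumerate(zip(row_categories, counts)):
        for j, (col_cat, count) in enumerate(zip(col_categories, row_counts)):
            # Cluster i is centred on x = i; its k bars are spread symmetrically about it
            x = i + (j - (k - 1) / 2) * bar_width
            bars.append((row_cat, col_cat, x, count))
    return bars

# Joint counts from Table 2.3 (rows: stores; columns: female, male)
counts = [[7, 3], [10, 7], [2, 1]]
bars = grouped_bars(counts, ["Checkers", "Pick n Pay", "Spar"], ["Female", "Male"])
for row_cat, col_cat, x, height in bars:
    print(f"{row_cat:<11} {col_cat:<7} x={x:+.3f} height={height}")
```

Unlike the stacked version, no bar sits on another, so every bar starts at zero and the row totals are no longer visible as bar heights.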

The multiple bar chart is similar to a stacked bar chart, except that the segments of each bar are displayed next to each other rather than stacked on top of each other.

The two charts convey exactly the same information on the association between the two variables. They differ only in how they emphasise the relative importance of the categories of the two variables.

Example 2.2 Grocery Shoppers Survey – Store Preferences by Gender

Refer to the ‘store preference’ variable and the ‘gender’ variable in Table 2.1.

1 Construct a cross-tabulation table of frequency counts between ‘store preference’ (as the row variable) and ‘gender’ (as the column variable) of shoppers surveyed.

2 Display the cross-tabulation as a stacked bar chart and as a multiple bar chart.

3 Construct a percentage cross-tabulation table to show the percentage split of gender for each grocery store.

Management Questions

1 How many shoppers are male and prefer to shop at Checkers?

2 What percentage of all grocery shoppers are females who prefer Pick n Pay?

3 What percentage of all Checkers’ shoppers are female?

4 Of all male shoppers, what percentage prefer to shop at Spar for their groceries?

5 Is there an association between gender and store preference (i.e. does store preference differ significantly between male and female shoppers)?

Solution

1 The row categorical variable is ‘store preference’: 1 = Checkers; 2 = Pick n Pay; 3 = Spar. The column categorical variable is ‘gender’: 1 = female; 2 = male.

Table 2.3 Cross-tabulation table – grocery store preferences by gender

Store              Gender                  Total
                   1 = Female   2 = Male
1 = Checkers       7            3          10
2 = Pick n Pay     10           7          17
3 = Spar           2            1          3
Total              19           11         30

To produce the cross-tabulation table, count how many females prefer to shop at each store (Checkers, Pick n Pay and Spar) and then count how many males prefer to shop at each store. These joint frequency counts are shown in Table 2.3. The cross-tabulation table can also be completed using percentages (row percentages, column percentages or percentages of the total sample).

2 Figure 2.3 and Figure 2.4 show the stacked bar chart and multiple bar chart respectively for the cross-tabulation table of joint frequency counts in Table 2.3.

Figure 2.3 Stacked bar chart – grocery store preferences by gender

The stacked bar chart highlights overall store preference, with gender split by store.

Pick n Pay is the most preferred shop (17 shoppers out of 30 prefer Pick n Pay), followed by Checkers (10 out of 30) and only 3 prefer to shop at Spar. In addition, of the 17 shoppers who prefer Pick n Pay, 10 are female and 7 are male; of the 10 shoppers who prefer Checkers, 7 are female and 3 are male; and of the 3 shoppers who prefer Spar, 2 are female and only 1 is male.

Figure 2.4 Multiple bar chart – grocery store preferences by gender

The multiple bar chart places more emphasis on the gender differences between stores.

3 Table 2.4 shows, for each store separately, the percentage split by gender (row percentages), while Table 2.5 shows, for each gender separately, the percentage breakdown by grocery store preferred (column percentages).

Table 2.4 Row percentage cross-tabulation table (store preferences by gender)

Store              Gender                  Total
                   1 = Female   2 = Male
1 = Checkers       70%          30%        100%
2 = Pick n Pay     59%          41%        100%
3 = Spar           67%          33%        100%
Total              63%          37%        100%

From Table 2.4, of those shoppers who prefer Checkers, 70% are female and 30% are male. Similarly, of those who prefer Pick n Pay, 59% are female and 41% are male. Finally, 67% of customers who prefer to shop at Spar are female, while 33% are male. Overall, 63% of grocery shoppers are female, while only 37% are male.
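The row percentages in Table 2.4 can be checked with a short Python sketch: each count in Table 2.3 is divided by its row total and rounded to a whole percentage. The helper name `row_percentages` is hypothetical; the ‘Total’ row is obtained by appending the column totals as an extra row before dividing.

```python
def row_percentages(counts):
    """Express each joint count as a percentage of its row total."""
    return [[100 * c / sum(row) for c in row] for row in counts]

# Joint counts from Table 2.3 (rows: Checkers, Pick n Pay, Spar; columns: female, male)
counts = [[7, 3], [10, 7], [2, 1]]
# Append the column totals as a 'Total' row so the overall gender split is included
col_totals = [sum(col) for col in zip(*counts)]
rounded = [[round(p) for p in row] for row in row_percentages(counts + [col_totals])]
for label, row in zip(["Checkers", "Pick n Pay", "Spar", "Total"], rounded):
    print(f"{label:<11}", *(f"{p}%" for p in row))
```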

Table 2.5 Column percentage cross-tabulation table (store preferences by gender)

Store              Gender                  Total
                   1 = Female   2 = Male
1 = Checkers       37%          27%        33%
2 = Pick n Pay     53%          64%        57%
3 = Spar           11%          9%         10%
Total              100%         100%       100%

From Table 2.5, of all female shoppers, 37% prefer Checkers, 53% prefer Pick n Pay and 11% prefer to shop for groceries at Spar. For males, 27% prefer Checkers, 64% prefer Pick n Pay and the balance (9%) prefer to shop at Spar for their groceries.

Overall, 33% of all shoppers prefer Checkers, 57% prefer Pick n Pay and only 10% prefer Spar for grocery shopping.
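A similar sketch reproduces Table 2.5 by dividing each count by its column total; the helper name `column_percentages` is hypothetical. It also shows a side effect of rounding: the rounded female percentages (37%, 53% and 11%) add to 101% rather than 100%, because the unrounded values (about 36.8%, 52.6% and 10.5%) are each rounded independently.

```python
def column_percentages(counts):
    """Express each joint count as a percentage of its column total."""
    col_totals = [sum(col) for col in zip(*counts)]
    return [[100 * c / t for c, t in zip(row, col_totals)] for row in counts]

# Table 2.3 counts (female, male) with the row totals appended as a third column
counts = [[7, 3, 10], [10, 7, 17], [2, 1, 3]]
rounded = [[round(p) for p in row] for row in column_percentages(counts)]
for label, row in zip(["Checkers", "Pick n Pay", "Spar"], rounded):
    print(f"{label:<11}", *(f"{p}%" for p in row))

# Rounding each cell separately can make a column's rounded percentages miss 100%
female_sum = sum(row[0] for row in rounded)
print("Sum of rounded female percentages:", female_sum)
```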

Management Interpretation

1 Of the 30 shoppers surveyed, there are only three males who prefer to shop at Checkers.

2 33.3% (10 out of 30) of all shoppers surveyed are females who prefer to shop at Pick n Pay.

3 70% (7 out of 10) of all Checkers shoppers are female. (Refer to the row percentages in Table 2.4.)

4 Only 9% (1 out of 11) of all males prefer to shop at Spar. (Refer to the column percentages in Table 2.5.)

5 Since the gender split is reasonably similar across the three grocery stores (each store is close to the overall split of approximately 63% female and 37% male), there appears to be no association between gender and store preference.

2.3 Summarising Numeric Data

Numeric data can also be summarised in table format and displayed graphically. The table is known as a numeric frequency distribution and the graph of this table is called a histogram.

The numeric variable ‘age of shoppers’ from Table 2.1 will be used to illustrate the construction of a numeric frequency distribution and its histogram.

Single Numeric Variable