
4.3 Mining Interesting Segment-Specific Associations

For demonstration purposes, we apply the following steps primarily to the segment-specific associations of the “wine” segment found in the dataset without the HFC (cf. the market basket prototype of segment k = 1 in Figure 4.8). About 6.13% (184) of the sample’s households belong to the wine segment.

After the 3456 transactions of the segment were pooled into c_1, the APRIORI algorithm mined 388 frequent itemsets with a heuristically predefined minimum support of 1%. Since this number of frequent itemsets is too large to examine and must be reduced to a more manageable figure, the 200 frequent itemsets with the highest all-confidence values and a minimum length of two categories are chosen for further examination. A separate cache stores all single frequent sub-itemsets, since they are needed to calculate the optimization model in Section 4.4.
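The mining and filtering step can be sketched in Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: transactions are Python sets of category names, `apriori` is a plain level-wise APRIORI that rescans the pool for every candidate, and the hypothetical helper `top_k_by_all_confidence` ranks itemsets by the all-confidence measure supp(X) / max_{i ∈ X} supp({i}), keeps the k best itemsets with at least two categories, and caches the supports of their sub-itemsets.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise APRIORI: return {frozenset: relative support} for all frequent itemsets."""
    n = len(transactions)

    def frequent_only(candidates):
        # Keep the candidates whose relative support reaches the threshold.
        result = {}
        for itemset in candidates:
            s = sum(1 for t in transactions if itemset <= t) / n
            if s >= min_support:
                result[itemset] = s
        return result

    # Level 1: single categories.
    current = frequent_only({frozenset([i]) for t in transactions for i in t})
    frequent = dict(current)
    k = 2
    while current:
        # Join frequent (k-1)-itemsets; prune candidates with an infrequent (k-1)-subset.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        current = frequent_only(candidates)
        frequent.update(current)
        k += 1
    return frequent


def all_confidence(itemset, frequent):
    """supp(X) / largest single-category support in X (singletons of a frequent X are frequent)."""
    return frequent[itemset] / max(frequent[frozenset([i])] for i in itemset)


def top_k_by_all_confidence(frequent, k, min_length=2):
    """Hypothetical helper: k itemsets with the highest all-confidence, plus a sub-itemset cache."""
    ranked = sorted(
        (x for x in frequent if len(x) >= min_length),
        key=lambda x: all_confidence(x, frequent),
        reverse=True,
    )[:k]
    # Every proper sub-itemset of a frequent itemset is itself frequent, so it is in `frequent`.
    cache = {
        frozenset(sub): frequent[frozenset(sub)]
        for x in ranked
        for r in range(1, len(x))
        for sub in combinations(x, r)
    }
    return [(x, frequent[x], all_confidence(x, frequent)) for x in ranked], cache
```

Applied to the wine pool, this would correspond to `apriori(pool, 0.01)` followed by `top_k_by_all_confidence(frequent, 200)`. A scan-based APRIORI like this one is only practical for small pools; larger data would call for an optimized implementation.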

As predicted in Section 3.3.1, the distribution of category purchase frequencies within the generated transaction pools c_k of a segment is highly skewed, since a segment is characterized by the combinations of only a few categories within the sparse transaction data. For example, different kinds of wine occur disproportionately more often within the wine segment than other categories. Figure 4.16 illustrates this for the “wine” and “baby” clusters. This skew justifies filtering the frequent itemsets by their all-confidence values: because all-confidence divides the support of an itemset by the highest support of any single category it contains, itemsets that pair a very frequent category with rare ones receive low values, which reduces the risk of selecting weakly related cross-support patterns.
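To make the cross-support argument concrete, the following Python sketch compares ordinary rule confidence with all-confidence on an invented transaction pool in which one category (“red wine”) is very frequent and another (“candles”, a hypothetical category) is rare but always bought together with it. Confidence rates the rule {candles} → {red wine} as perfect, whereas all-confidence stays low. In general, all-confidence can never exceed the ratio of the smallest to the largest single-category support in the itemset.

```python
def support(itemset, transactions):
    """Relative support: share of transactions containing every category of the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent -> consequent."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)


def all_confidence(itemset, transactions):
    """supp(X) divided by the largest single-category support within X."""
    return support(itemset, transactions) / max(support({i}, transactions) for i in itemset)


# Invented skewed pool: 80 of 100 baskets contain red wine, only 2 contain candles.
pool = (
    [{"red wine", "candles"} for _ in range(2)]
    + [{"red wine"} for _ in range(78)]
    + [{"bread"} for _ in range(20)]
)

print(confidence({"candles"}, {"red wine"}, pool))    # 1.0
print(all_confidence({"candles", "red wine"}, pool))  # about 0.025
```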


Figure 4.16: Skewed distribution of purchase frequencies within (a) the baby-segment and (b) the wine-segment

The analyst can examine the frequent itemsets revealed in the wine segment more easily by applying the hierarchical cluster approach described in Section 3.3.2, which separates the frequent itemsets into smaller and more manageable association groups. Each branch of a dendrogram shows the decision maker which frequent itemsets are similar according to the distance measure given in Equation 3.7. Ward’s (1963) algorithm successively merges the itemsets on the basis of the distance matrix, which contains a value for each pair of the 200 frequent itemsets. Figure 4.17 plots the corresponding dendrogram.
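The grouping step can be sketched as follows. Because Equation 3.7 is not reproduced in this section, the sketch substitutes an assumed distance: the Jaccard distance between the sets of transactions that contain each itemset, so that itemsets bought in the same baskets end up close together. The agglomeration applies the Lance-Williams update for Ward’s method and stops once `n_groups` groups remain; `cover`, `jaccard_distance`, and `ward_groups` are hypothetical names. Ward’s criterion is formally defined for squared Euclidean distances, so applying it to another dissimilarity, as here, is a heuristic.

```python
def cover(itemset, transactions):
    """Indices of the transactions that contain every category of the itemset."""
    return {idx for idx, t in enumerate(transactions) if itemset <= t}


def jaccard_distance(a, b):
    """Assumed stand-in for Equation 3.7: 1 - |A & B| / |A | B| on transaction covers."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def ward_groups(dist, n_groups):
    """Agglomerate with Ward's Lance-Williams update until n_groups groups remain.

    dist is a symmetric n x n matrix (list of lists). Returns the groups as lists
    of item indices and the fusion level of every merge performed.
    """
    n = len(dist)
    groups = {i: [i] for i in range(n)}
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}

    def get(i, j):
        return d[(i, j)] if i < j else d[(j, i)]

    levels, next_id = [], n
    while len(groups) > max(n_groups, 1):
        i, j = min(d, key=d.get)  # closest pair of current groups
        height = d[(i, j)]
        levels.append(height)
        ni, nj = len(groups[i]), len(groups[j])
        merged = groups.pop(i) + groups.pop(j)
        # Lance-Williams update for Ward's method, computed before dropping old pairs.
        updated = {
            k: ((ni + len(g)) * get(i, k) + (nj + len(g)) * get(j, k) - len(g) * height)
            / (ni + nj + len(g))
            for k, g in groups.items()
        }
        d = {pair: v for pair, v in d.items() if i not in pair and j not in pair}
        d.update({(k, next_id): v for k, v in updated.items()})  # k < next_id always
        groups[next_id] = merged
        next_id += 1
    return list(groups.values()), levels
```

For the wine segment, `dist` would hold the pairwise distances of the 200 selected itemsets and `n_groups` the chosen number of association groups.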

[Figure: dendrogram; gray rectangles mark the groups “hard alc.” (Table 4.2) and “mustard” (Table 4.3)]

Figure 4.17: Dendrogram of frequent itemset grouping in the wine segment

Determining the “right” number of association groups l for this hierarchical cluster analysis is just as difficult as it is for the KCCA algorithms. To approximate an adequate l, we look for the largest jumps in the fusion levels within the interval l ∈ [25, 35] (cf. Decker and Schimmelpfennig 2002).
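One simple reading of the “largest jump” heuristic is sketched below in Python; the function name `choose_num_groups` is hypothetical. Given the fusion levels of a complete agglomeration of n itemsets in merge order, stopping with l groups means performing n - l merges, so the jump for l is the difference between the next fusion level and the last one performed. The l in the interval with the largest jump is returned. This illustrates the idea and is not necessarily the exact procedure of Decker and Schimmelpfennig (2002).

```python
def choose_num_groups(fusion_levels, n_items, lo, hi):
    """Return the l in [lo, hi] whose next fusion step shows the largest jump.

    fusion_levels: heights of the n_items - 1 merges of a full agglomeration,
    in merge order (non-decreasing for Ward's method).
    """
    best_l, best_jump = None, float("-inf")
    for l in range(max(lo, 2), min(hi, n_items - 1) + 1):
        # After n_items - l merges, l groups remain; compare the last merge
        # performed with the merge that would reduce l groups to l - 1.
        jump = fusion_levels[n_items - l] - fusion_levels[n_items - l - 1]
        if jump > best_jump:
            best_l, best_jump = l, jump
    return best_l


# Toy example: 6 items; the fourth merge jumps from 0.3 to 1.5, so 3 groups are chosen.
print(choose_num_groups([0.1, 0.2, 0.3, 1.5, 1.6], 6, 2, 4))  # 3
```

With the fusion levels of a full Ward agglomeration of the 200 wine itemsets, the corresponding call would be `choose_num_groups(levels, 200, 25, 35)`.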

From a managerial point of view, a value between 25 and 35 seems acceptable, since it yields a manageable number of association groups. Creating association groups splits the problem of exploring all the frequent itemsets of a segment into smaller, more easily solvable subproblems. This is even more useful if the itemsets of a group share a similar composition by which they can be sorted. Figure 4.17 depicts the dendrogram obtained from grouping the 200 itemsets mined in the wine segment. The dashed horizontal gray line cuts the tree into 26 association groups; the gray rectangles mark two of them as examples. The group on the left-hand side includes five itemsets relating categories of hard alcoholic beverages, listed in Table 4.2. The group on the right-hand side contains itemsets that include mustard (see Table 4.3).

Not every group consists of frequent itemsets whose categories share a common characteristic (such as the hard alcoholic beverages). In most cases, the groups are heterogeneous and mix itemsets containing different sorts of products, as does the mustard group in Table 4.3. Nevertheless, arranging the segment-specific associations into groups can help to identify areas of similar itemsets, to sort and separate the output lists into more manageable sublists, and to gain a better understanding of the segment’s category purchase correlations (cf. Toivonen et al. 1995, Gupta et al. 1999).

Itemset    Support    All-confidence

Table 4.2: Group of itemsets in the wine segment that include hard alcoholic beverages

Itemset    Support    All-confidence

Table 4.3: Group of itemsets in the wine segment including mustard

The suggested KCCA cluster algorithm seems to build customer segments characterized by similar category correlations. To verify whether the generated associations are specific to the segments from which they have been derived, we again use the itemset grouping. The idea is to partition the combined itemsets of two distinct segments with the hierarchical cluster analysis introduced above. The solution should then reflect two association groups, each containing frequent itemsets that clearly arise from its corresponding segment. In addition, both segments are expected to share some common frequent itemsets.

Hence, 75 mined associations from each of two segments (here, the baby and the wine segment) are combined into one set of 150 associations. The corresponding transactions in c_8 and c_1 are pooled into a joint dataset. By means of Equation 3.7, the distance for each pair of itemsets is calculated and entered into the distance matrix. After the combined associations are grouped as proposed above, the branches of the resulting dendrogram in Figure 4.18 show several smaller groups of associations on the left-hand side and two larger groups on the right-hand side of the plot. In fact, the small groups contain itemsets that were found in both segments (see the gray rectangles in Figure 4.18 and the corresponding itemsets listed in Table 4.4). The segment-specific itemsets of the wine and baby clusters, by contrast, are rearranged into the two larger association groups on the right-hand side. Consequently, the hierarchical clustering of the combined itemsets supports the cluster solution of the preceding partitioning algorithm.
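The check described above can be sketched in Python. Assuming the pooled itemsets have already been grouped (for example, by a Ward agglomeration of the combined distance matrix) and each itemset carries the label of the segment it was mined in, the hypothetical helper `describe_groups` labels every group by its dominant segment, or as mixed when no segment reaches an assumed purity threshold, and `common_itemsets` lists itemsets mined in both segments, in the spirit of Table 4.4. The example data is invented.

```python
from collections import Counter


def describe_groups(groups, sources, purity_threshold=0.9):
    """Label each group (list of itemset indices) by its dominant segment, or 'mixed'."""
    labels = []
    for group in groups:
        counts = Counter(sources[i] for i in group)
        segment, count = counts.most_common(1)[0]
        labels.append(segment if count / len(group) >= purity_threshold else "mixed")
    return labels


def common_itemsets(itemsets_a, itemsets_b):
    """Itemsets mined in both segments."""
    return set(itemsets_a) & set(itemsets_b)


# Invented example: two itemsets per segment, one of them shared.
baby = [frozenset({"diapers", "baby food"}), frozenset({"mayonnaise", "mustard"})]
wine = [frozenset({"red wine", "white wine"}), frozenset({"mayonnaise", "mustard"})]
sources = ["baby"] * len(baby) + ["wine"] * len(wine)
groups = [[0], [2], [1, 3]]  # assumed output of the grouping step
print(describe_groups(groups, sources))  # ['baby', 'wine', 'mixed']
print(common_itemsets(baby, wine))
```

Segment-specific groups should come out labeled with a single segment, while small mixed groups point to shared itemsets such as those in Table 4.4.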

Source          Itemset                            Support   All-confidence
Baby segment    {mayonnaise, mustard}              0.0143    0.1792
Wine segment    {mayonnaise, mustard}              0.1172    0.1465
Baby segment    {detergents, washing-up liquid}    0.0250    0.2618
Wine segment    {detergents, washing-up liquid}    0.0117    0.1226

Table 4.4: Groups of two identical itemsets found in both segments

[Figure: dendrogram with groups labeled “Frequent itemsets in the baby segment”, “Frequent itemsets in the wine segment”, and “Common itemsets”]

Figure 4.18: Dendrogram of frequent itemset grouping in the combined transactions of two segments

The grouping of the frequent itemsets reveals the different characteristics of the clusters with regard to the category correlations they contain. This indicates that our partitioning algorithm captures the purchasing behavior of a segment’s average households quite appropriately. If it did not, the targeted promotional campaigns would lose effectiveness, since they depend on the segment-specific associations found in the data.

In general, grouping the itemsets can provide the retailer with information about the category correlations of a specific segment. However, since the groups are difficult to extract and evaluate without visual inspection, the method is hard to implement in a fully automated program. Although it is a useful extension for mining programs, we therefore do not include it in the fully automated simulation in Section 4.5. The simulation comprises only those modules of the approach that can be computed without much user interaction, i.e., the KCCA, association rule mining, the filtering of frequent itemsets, and the optimization model.

In document Building a Data Mining Framework for Target Marketing (Page 108-111)