Customer Analysis for Software XploRe: From Data Mining to Marketing Strategy

Diploma thesis

submitted for the academic degree of Master of Science

at the School of Business and Economics, Humboldt-Universität zu Berlin

Submitted by Jianqiu Wang on 27 May 2003, matriculation no.: 161426

Examiner: Prof. Dr. Wolfgang Härdle

Index of contents

Abstract
Introduction
1. Customer analysis
1.1 Customer Behaviour
1.1.1 Customers' “Black Box”
1.1.2 Consumer buying process
1.1.3 Customer behaviour model
1.1.4 Factors influencing customer buying behaviour
1.2 Market Segmentation and Profiling
1.2.1 Market segmentation
1.2.2 Customer profiling
1.3 Market targeting and Positioning
1.3.1 Market Targeting
1.3.2 Positioning
2. Data Mining
2.1 The process of Data mining
2.1.1 Data Collection and Selection
2.1.2 Data Preparation
2.1.3 Mining
2.1.4 Result Interpretation
2.2 The Aspects of Data Mining
2.2.1 Applications
2.2.2 Operations
2.2.3 Data Mining Techniques
3. XploRe user and customer analysis
3.1 About XploRe
3.2 XploRe user (2002) and customer descriptive analysis
3.2.1 Data collection
3.2.2 Data cleaning and preparation
3.2.3 Data descriptive analysis and result
3.2.4 Comparing the user and customer of XploRe
3.2.5 Measures of Improvement
3.3 Cluster analysis for XploRe user data 2002
3.3.1 Cluster analysis of categorical data
3.3.2 Clustering with IBM Intelligent Miner
3.3.3 Cluster analysis with XploRe
3.3.4 Comparison of Cluster Analysis Results: IBM Intelligent Miner versus XploRe
3.4 Analysis of the latest User data (2003)
3.4.1 Results of analysis of 2003 data
3.4.2 Comparison of historical user data
3.5 Complementary analysis
3.5.1 Analysis of regrouped data
3.5.2 Analysis of high profitable sector
4. Suggested marketing strategy for XploRe
4.1 Marketing Strategy and Marketing mix
4.1.1 Marketing strategy
4.1.2 Marketing Mix
4.2 Develop the marketing strategy for XploRe
4.2.1 Niche market strategy
4.2.2 Target Market
4.2.3 Product position of XploRe
4.2.4 General XploRe marketing strategy pyramids
4.2.5 General Marketing Mix
4.2.6 Special marketing mix for clusters
4.2.7 Marketing research - suggestions for further analysis
References
Appendix
Appendix 1: User 220702 – Frequency Analysis
Appendix 2: Customer Frequency Analysis (Nov. 05)
Appendix 3: Customer Registration form
Appendix 4: Characteristics of User220702 Clusters by XploRe
Appendix 5: User 130303 – Frequency Analysis
Appendix 6: User 13032003 – Intelligent Miner Cluster Analysis
Appendix 7: Comparison of User and Regrouped User Data
Appendix 8: User 130303 – (Regrouped) Frequency Analysis
Appendix 9: Regrouped User – Intelligent Miner Cluster Analysis
Appendix 10: Institute Users – Frequency Analysis
Declaration of authorship

List of figures

1.1 The customers' “Black box”
1.2 A sequential model of the buying process
1.3 Consumer Behaviour model
1.4 Factors influencing consumer behaviour
1.5 The process of marketing segmentation
1.6 Alternative consumer demand categories
1.7 SAGACITY
1.8 Targeting strategies
3.1 Sample of online survey questionnaire
3.2 Clustering of Users 2002
3.3 Clustering of user 2003
3.4 Software used in 2000 and 2003
3.5 Information resource in 2000 and 2003
3.6 Clustering of regrouped user data
4.1 4P of marketing mix

List of tables

1.1 Broad-based ACORN classifications
1.2 National readership survey socio-economic groups
2.1 The aspects of data mining
3.1 Summary and description of the variables of User 22/07/02 data
3.2 Summary and description of the variables for customer data
3.3 Comparison of XploRe's Users and Customers
3.4 Characteristics of User IBM Intelligent Miner Clusters (2002)
3.5 Comparison of Clustering results with IBM Intelligent Miner and XploRe
3.6 Summary and description of the variables for User data 2003
3.7 Comparison of User 220702 and User 130303
3.8 Comparison of software used in 2000 and 2003
3.9 Comparison of information resources in 2000 and 2003
3.10 Comparison of country in 2000 and 2003
3.11 Comparison of continent in 2000 and 2003
3.12 Comparison of User clusters of 2000 and 2003
3.13 Summary and description of the variables of regrouped User data 2003
3.14 Comparison of Institute user and General user

Abstract

This thesis presents a case study of customer analysis with the purpose of developing a marketing strategy for the statistical software XploRe. The customers analysed include the users, who downloaded the free trial version of XploRe through the web site, and the actual customers, who bought XploRe. Descriptive analysis was conducted for both data sets, which led to the conclusion that research institutes are the highly profitable sector for XploRe. For the user data, the data mining method of clustering was applied to identify the customer segments. Two different clustering methods were tested on the same user data set with different software, IBM Intelligent Miner and XploRe. As a result, the users of XploRe were divided into four clusters by both methods: “Internet surfer”, “Academia”, “Linux user” and “Home worker”. Through the comparison of the user data of 2003 with the historical user data of 2000, more facts and trends of the XploRe market and its customers were discovered regarding the software used, information resources, new markets and the ongoing changes in customer segments. Based on the results of the customer analysis, suggestions for marketing strategy, marketing mix and further analysis were outlined.

Key words: customer analysis, market segmentation, data mining, clustering, marketing strategy, marketing mix

Introduction

Customer analysis is a crucial step in the development of a marketing strategy. Only when a company has a clear view of its customers can the proper strategy and actions be undertaken to gain competitive advantage in the market.

Nowadays, with the development of digital data management systems, the capability for gathering, storing and accessing information has improved dramatically. This trend creates difficulties for companies when they confront the huge amount of data. Data mining is an important technology that enables companies to conduct customer analysis on large data sets. It discovers valuable information which is useful for marketing.

The research presented in this paper tries to segment the customers and to find the trends and facts of the XploRe market, so that suggestions for marketing strategy can be derived from the results. XploRe is a statistical software package which aims at “sophisticated users who are looking for a flexible, programmable statistics package with an emphasis on more advanced procedures”.¹ It is important for the XploRe marketer to understand its customers and market. The customer data studied here include the data of XploRe users (the potential customers) and actual customers (the buyers). The user data was collected through an online questionnaire preceding the download of the XploRe trial version, while the customer data was gathered through the returned registration forms. For the purpose of comparison, two sets of user data were analysed and two clustering methods were tested with two software packages, IBM Intelligent Miner and XploRe.

The user data 2002 covers the period from October 11, 2001 to July 22, 2002 and contains 1734 profiles. The raw user data 2003 contains 2593 profiles and was collected from October 11, 2002 to March 13, 2003. The customer data comprises 32 profiles from July 1, 2000 to August 30, 2002.

Only descriptive analysis was carried out for the customer data because of its small number of records. For the user data, the data mining process of clustering was conducted to segment the market. The mining run for the user data consists of several steps: cleaning the raw data with MS Excel, transferring the data to IBM Intelligent Miner or XploRe, and performing cluster analysis. The clustering identified four groups of XploRe customers, namely “Internet surfer”, “Academia”, “Linux user” and “Home worker”. Each cluster possesses its own distinguishing features.

¹ Härdle, Klinke and Müller, 1999, P17.

The comparison of the customer data and the user data 2002 led to the discovery of research institutes as a highly profitable sector. XploRe and IBM Intelligent Miner (IM) delivered similar clustering results for the user data, but IM performed better in visualisation and computational efficiency. Comparing the results for user data 2003 with the historical user data 2000, some trends were identified. More professional users switched to command-driven software. XploRe made progress in its communication channels. Asia, especially Japan, emerged as a new market. In terms of segments, “Internet surfer” is a brand-new group in 2003, which indicates the arrival of the Internet age. The appearance of “Home worker” in 2003 instead of “Researcher” in 2002 hints at a problem in the survey questionnaire. More members of “Academia” use non-personal channels to get information, which again confirms the improvement made by XploRe in its communication channels. Linux users were very stable during the period.

Based on the findings of the analysis, some suggestions for marketing strategy and further analysis were made for the XploRe marketer.

This paper consists of four main parts. The first two sections following the introduction lay the theoretical foundation for customer analysis and data mining. Section three presents the analysis and its results. Marketing strategy and suggestions are developed in the fourth section. At the end, the summary gives a brief overview of the whole paper.

1. Customer analysis

In the current marketplace, competition is intense and the market abounds with all kinds of products. To win customers' decisions in favour of their products, companies should gain a deep insight into what customers really need and how to influence their purchasing decisions. Therefore, companies should now have a “customer focus”, conducting business with the emphasis on understanding the customers and the market.

Customer analysis is the study of customers and their behaviour, which is central to achieving a “customer focus”.² The purpose of conducting customer analysis is to achieve marketing goals such as the following:³

• Customer acquisition: finding new customers
• Customer cross-sell: further sales of different products to the same customer
• Customer up-sell: the customer makes greater use of the same product or service
• Customer retention: keeping the customer loyal

1.1 Customer Behaviour

In order to understand the customer buying behaviour, we should first understand the customer behaviour.

1.1.1 Customers' “Black Box”

Customer behaviour here means the behaviour of individuals who purchase for private or household consumption. These customers buy goods which are not part of a value chain, and the purpose of purchasing is not to generate profit.

Buying behaviour depends on the individual's reaction to internal and external stimuli; therefore, it is difficult to predict. “Black box” is the term that describes the customer's purchasing decision process, which is difficult to access but is crucial for the purchase decision.

² WWW14

³ Heygate, Richard, 1998.

In order to develop appropriate products that are attractive to customers, firms need insight into what happens in the “black box”. Figure 1.1 presents the customer's “black box”. Inside the “black box”, the customer gathers information, evaluates and compares, and then comes to a decision; this is called the “consumer buying process”.

[Figure: the marketer's 7Ps (Product, Price, Place, Promotion, People, Process, Physical environment) and the external stimuli (social pressure, legal requirements, physical factors, economic cycle) act on the consumer. The consumer side lists aspirations, motivation, education, personality and beliefs. The black box lists identification of needs, evaluation of “offers” that satisfy the need, comparison of substitute products and brands, purchase, and post-purchase evaluation.]

Fig. 1.1: The customers' “Black box”.

1.1.2 Consumer buying process

Buying decision process

The buying process starts with the customer's desire for a product. This want might be the result of internal stimuli, such as hunger and thirst, or of external stimuli, such as advertising.

The next step is the search for information. Consumers may collect information consciously or unconsciously from various sources. There are four kinds of information sources:

1. Personal sources such as family, friends, colleagues and neighbours;
2. Public sources such as the mass media and consumer organisations;
3. Commercial sources such as advertising, sales staff and brochures;
4. Experiential sources such as handling or trying the product.

[Figure: flow chart of five successive stages: recognition of the problem, the search for information, evaluation of the alternatives, the purchase decision, post-purchase behaviour.]

Fig. 1.2: A sequential model of the buying process

³ Bannes, E., McClelland, B., etc., 1997, P139.

Through information gathering, customers become aware of the various products and brands in the market; they then evaluate the alternatives and finally make the purchase decision.

After a major purchase, many people experience cognitive dissonance, also called “post-purchase anxiety”. They wonder whether they have made the correct purchasing decision. To reduce this anxiety, they look for confirmation. For example, they might ask friends to confirm that their purchase was the right choice.

Figure 1.2 summarises the stages of consumer buying process: Recognition of the problem, The search for information, Evaluation of the alternatives, The purchase decision and Post-purchase behaviour.

Companies should be present at each stage of the buying process and try to stand out among all the other products and brands of competitors. To make a brand or product the customer's final choice, companies need a clear understanding of the evaluative criteria that consumers use in comparing products, which are described in Section 1.1.3.

³ Wilson, R. W. S. and Gilligan, C., P170.

Five buying roles

The purchase process normally involves several persons, each with a distinct role. The roles do not necessarily have to be filled by different persons; one person can play several roles in a purchasing process.

The five roles in a purchasing process are:

• The initiator: the person who suggests buying the product or service.
• The influencer: the person whose comments can affect the purchase decision.
• The decider: the person who decides whether to buy and which product to buy.
• The buyer: the person who executes the purchase.
• The user: the final consumer of the product or service.

For example, when a mother buys ice cream for her child, the child is the user and the mother is the decider and buyer. A company should understand the function of each role in the buying process in order to influence customers' buying decisions effectively through appropriate action.

1.1.3 Customer behaviour model

The customer behaviour model describes the procedure and the basic elements of what happens inside the customer's “black box”, that is, the consumer buying process.

The most basic, simplest and best-known model of buyer behaviour is AIDA, which stands for Awareness, Interest, Desire and Action.⁴

The model introduced here consists of six interrelated components.⁵

1. Information or facts (F): the percept caused by a stimulus.
2. Product recognition (R): the extent to which the buyer knows enough about the product to distinguish it from other products.
3. Attitude towards the product (A): what the customer expects from the product to satisfy his or her particular needs.
4. Confidence in judging the product (C): the customer's degree of certainty that his or her evaluative judgement of a product is correct.
5. Intention to buy (I): the mental state that reflects the customer's plan to buy some specific number of products from a particular brand in some specified time period.
6. Purchase (P): caused by the intention to buy. It is defined as the point at which the customer has paid for a product or has made some financial commitment to buy some specified amount during some specified time period.

[Figure: diagram linking the six components F, R, A, C, I and P.]

Fig. 1.3: Consumer Behaviour model. F: Information, R: Product recognition, C: Confidence, A: Attitude, I: Intention, P: Purchase.

⁴ Baker, M. and Hart, S., 1999, P63.

⁵ Howard, J. A., 1994, P31-56.

When consumers evaluate a product, they also employ certain evaluative criteria, which have several aspects:

1. The product's attributes, such as its price, performance, quality and styling.
2. Their relative importance to the consumer.
3. The consumer's perception of each brand's image.
4. The consumer's utility function for each of the attributes.

These evaluative criteria overlap with the elements of the consumer behaviour model. For instance, product recognition, attitude towards the product and confidence in judgement are the three parts of the buyer's image of a product. They all have a vital impact on the consumer's buying decision.

[Figure: five groups of factors acting on the buyer. Cultural: culture, sub-culture, social class. Environmental: economic cycle, social pressure, legal requirement, new technology. Social: reference groups, family, roles and status. Psychological: motivation, learning, perception, beliefs and attitudes. Personal: age and life cycle stage, occupation, economic circumstance, lifestyle and personality.]

Fig. 1.4: Factors influencing consumer behaviour.

1.1.4 Factors influencing customer buying behaviour

Various factors influence customer buying behaviour. Generally, they can be put into five categories: psychological, cultural, social, personal and environmental factors.⁶ ⁷ ⁸

1. Psychological factors

Human needs include basic needs, such as shelter, food and drink, and higher-level needs, such as friendship and achievement. People purchase goods to satisfy their needs. Purchasing behaviour can be considered the result of internal and external stimuli.

Maslow (1943) suggested that behaviour can be explained by a hierarchy of needs. He grouped people's needs into five levels and argued that when a person is satisfied at one level of needs, he or she will strive for the next level. Maslow's five levels of needs are physiological needs, safety needs, social needs, esteem needs and self-actualisation needs.⁹

Physiological needs are the basic needs that human beings must meet to survive, such as food and drink. Only after these needs are satisfied will the other levels of needs be desired.

Safety needs refer to people's needs for security, stability and predictability. Services such as insurance and guarantees are products that satisfy safety needs.

Social needs describe the human desire for love and a sense of belonging. At this level, people will seek to join associations and clubs.

Esteem needs show themselves in the search for status, esteem, achievement and recognition. To satisfy this level of needs, people turn to luxury products, such as perfumes, high-tech products and cars.

Self-actualisation is the highest level of needs. Only after people have satisfied all the other levels of needs will they turn to the realisation of their potential, which is expressed in concern for external issues, such as volunteer work.

⁶ WWW11

⁷ Bannes, E., etc., 1997, P139-149.

⁸ Environmental factors are external factors, while the other four factor categories are internal factors that influence consumer buying behaviour.

⁹ Bannes, E., McClelland, B., etc., 1997, P139-184.

2. Personal factors

Personal factors are the set of the buyer's personal characteristics, including age, occupation, lifestyle, personality and economic circumstances.

3. Cultural factors

Cultural factors include culture, sub-culture and social class.

Culture is a set of shared values which define people's behaviour. Language is the best example of cultural difference: using a language incorrectly causes misunderstanding. There are also differences in attitude towards family and the individual between eastern and western cultures.

A large society or culture is normally divided into subculture groups, which define more subtle behavioural norms. Subculture groups include ethnic, religious, racial and geographical groups. They exhibit differences in cultural preferences, ethnic taste, attitudes, lifestyle and taboos.

Social class is also called socio-economic group. It is determined by income level, education and occupation. The commonly used social class model divides society into upper class, upper middle class, lower class, upper working class, working class and others.

4. Social factors

Social factors include reference groups, family, social roles and status.

Reference groups are defined as “all groups that have a direct (face-to-face) or indirect influence on the person's attitude or behaviour”.¹⁰ Reference groups can be divided into four types.

1. Primary membership groups are generally informal groups whose members interact with each other, such as family, neighbours, colleagues and friends.

2. Secondary membership groups are more formal than primary membership groups, and their members interact less. These include religious groups, professional groups and trade unions.

3. Aspirational groups are groups that one would like to belong to.

4. Dissociating groups are groups whose values and behaviour are rejected by the individual.

5. Environmental factors

Environmental factors consist of economic, social, political and technological aspects. The economic cycle, social pressure, legal requirements and new technology all influence the consumer's decision on which product to buy and how to buy it.

1.2 Market Segmentation and Profiling

When firms try to sell their products in customer markets, they should not only identify the factors that influence the customer's “black box”, but also estimate whether there are enough customers who need their offer. It is important for companies to compare their capabilities with the objectives of customers, so that they can decide whether they are able to serve the market profitably with appropriate products. Therefore, firms must identify market needs, segment the total customer base into potential customer groups which are likely and able to purchase the offer, and position the product or service as an attractive alternative to the other offers available to the target groups.

¹⁰ Wilson, Gilligan and Person, 1994, P160.

1.2.1 Market segmentation

“Market segmentation is the subdivision of a market into distinct subsets of customers, where any subsets may conceivably be selected as a target market to be reached with a distinct marketing mix”.¹¹

Market segmentation is inspired by Kotler's “target marketing”. As Kotler put it, in target marketing “the seller distinguishes the major market segments, targets one or more of these segments, and develops products and services tailored to each selected segment”.¹²

Because individuals differ in preferences, characteristics, tastes and interests, their buying behaviour patterns are varied and heterogeneous, and it is almost impossible or unprofitable for a company or a single product to serve all of their needs. Furthermore, communicating a marketing mix to a non-homogeneous group is inefficient. Therefore, companies search for groups with attractive attributes and then concentrate on them, developing specific products and services and using specific marketing resources to gain the maximum market return.

Segmentation identifies the subsets of buyers who share similar needs and demonstrate similar buying behaviour. It subdivides a heterogeneous total customer market into smaller, manageable and homogeneous clusters according to chosen criteria. Within each cluster, buyers show similar patterns of needs and buying behaviour, which are identifiable and relevant to the buying decision.

Customer segmentation brings major benefits to companies:¹³

• Efficiency

Because the customers are subdivided, companies can focus only on the markets of interest. Therefore, they can allocate and utilise their resources more efficiently.

• Effectiveness

Through segmentation, the needs of each customer segment can be better identified and examined. Thus, the understanding and awareness of customer needs is enhanced. Companies can tailor their products and marketing measures to meet customer needs more effectively. Owing to the improved marketing effectiveness, the customer response rate will increase, and thus the return and profit from marketing investment will also improve.

• New Market

Segmentation can help companies to identify new market opportunities. The needs and characteristics of the total market are so diverse that the unique features of a small group are not distinguishable. After segmentation, a company can discover markets with unique features, which offer valuable opportunities for entering new markets.

¹¹ Kotler, 1995, p286.

¹² Kotler, 1991, P262.

¹³ WWW29.

[Figure: flow chart of three successive steps: defining the market, selecting the base for segmentation, dividing the market and profiling.]

Fig. 1.5: The process of marketing segmentation.

The process of market segmentation¹⁴

The process of market segmentation is composed of three steps.

1. Defining the market

The total market for a product or service comprises all of the consumers who desire or potentially desire it and who are willing and able to buy it. It is necessary to analyse the market in terms of its size and pattern of demand.

There are three patterns of demand:¹⁵

1. Homogeneous demand

All consumers in a market have relatively similar needs and wants for a product or service category.

2. Diffused demand

Consumers' needs are so diverse that no clear segments can be identified. This suggests the need for customisation.

3. Clustered demand

Consumers' needs and desires can be grouped into two or more identifiable segments, each with its own set of purchase criteria.

[Figure: three panels illustrating homogeneous, diffused and clustered demand.]

Fig. 1.6: Alternative consumer demand categories.

¹⁴ Bannes, E., McClelland, B., and Meyer, R, 1997, P181-185.

2. Selecting the approach and bases for segmentation

Market segments can be identified through detailed market research or through basic analysis of the customer data held within a company. Many companies keep customer records detailing information such as age and gender.

¹⁵ Bannes, E., McClelland, B, etc., P181-183.

There are generally two types of methods for market segmentation.¹⁶ ¹⁷

1. A priori methods:

In an a priori approach, the basis for segmentation is set in advance, so primary market research is not necessary. Instead, the analysis of secondary data sources, the customer information at hand, managers' intuition and other methods are employed to set the segmentation basis for the buyers according to their usage patterns (heavy, medium, light and non-user), demographic characteristics (age, sex, income) or psychographic profiles (personality). After the basis has been set, research is conducted to identify the size, location and potential of each segment. The marketing decision then concerns which segments the marketing efforts should be concentrated on. For example, classification is an a priori approach.

2. Post hoc methods:

The post hoc approach segments the market on the basis of research findings rather than deciding the segmentation basis in advance. Primary market research is conducted to collect the classification and descriptor variables. Segments are defined only after all the relevant information has been collected and analysed. The research might highlight particular attributes, attitudes or benefits with which particular groups of customers are concerned. The result then becomes the basis for dividing the market.
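To make the distinction concrete, the following minimal Python sketch illustrates an a priori segmentation, in which the basis (age band or usage pattern) is fixed before the data is examined. The customer records, field names and age bands are hypothetical and serve only as an illustration; a post hoc method would instead derive the groups from the data, for example by cluster analysis.

```python
from collections import defaultdict

# Hypothetical customer records; field names and values are illustrative only.
CUSTOMERS = [
    {"id": 1, "age": 24, "usage": "heavy"},
    {"id": 2, "age": 37, "usage": "light"},
    {"id": 3, "age": 29, "usage": "medium"},
    {"id": 4, "age": 52, "usage": "non-user"},
    {"id": 5, "age": 18, "usage": "heavy"},
]


def age_band(age):
    """A priori basis: age bands chosen before looking at the data."""
    if age < 20:
        return "under 20"
    if age <= 30:
        return "20-30"
    return "over 30"


def segment(customers, basis):
    """Group customer ids by the segment label that `basis` assigns."""
    segments = defaultdict(list)
    for customer in customers:
        segments[basis(customer)].append(customer["id"])
    return dict(segments)


# Two alternative a priori bases applied to the same records.
by_age = segment(CUSTOMERS, lambda c: age_band(c["age"]))
by_usage = segment(CUSTOMERS, lambda c: c["usage"])

for label, ids in by_age.items():
    print(label, ids)
```

Once the segments are formed, their size and potential can be examined, which corresponds to the research step that follows the setting of the basis.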

3. Dividing the market and profiling the segments

Based on the data gathered, the market is divided into identifiable market segments. The information obtained gives details regarding the nature of the customer segments. This is called segment profiling.

Profiling associates each segment with certain characteristics, aggregating customers with similar characteristics into groups and separating them from those with different characteristics.

Criteria of customer segmentation

A market can be segmented in various ways. There are problems with segmentation, such as the relevance and quality of the data, the role of intuition, the continuous nature of the process, and over-segmentation. A good segmentation should be relevant for buying behaviour and satisfy the following requirements:¹⁸ ¹⁹

• Size: The market should be big enough to warrant segmentation. It is dangerous to over-segment an already very small market.
• Difference: Differences between the members of different segments should exist and be measurable through data collection.
• Measurability: The company is able to collect information that measures the nature of buying behaviour in the segment.
• Substantiality: The selected segment should be profitable with respect to the marketing mix resources designed especially for it.
• Accessibility: The extent to which the marketing effort can reach the segment.
• Stability over time: The segment should last for a certain period without dramatic change in its major features.
• Responsiveness to communication means: The segment should be sensitive to the marketing mix and communication means.

¹⁶ WWW31

¹⁷ Han, J. and Kamber, M, 2001, P281-319.

Variables for customer segmentation

Almost all factors which affect the customer's buying process and decision can be used as variables for customer segmentation. Generally, the variables for customer segmentation can be put into five categories: demographic, socio-economic grade, psychographics and lifestyle, behavioural, and geographic and geo-demographic.²⁰ ²¹

1. Demographic variables

Demographic variables categorise the market according to population characteristics and population profiles. Customers are subdivided into groups based on one or more demographic variables such as age, sex, religion, race, nationality, family size and stage of the family life cycle. For example, a seller may group customers by age: the group aged 20-30 consists of customers who are more likely to purchase trendy items.

ACORN group                                 1981 population      %
A  Agricultural areas                             1,811,485    3.4
B  Modern family housing, higher incomes          8,667,137   16.2
C  Older housing of intermediate status           9,420,477   17.6
D  Older terraced housing                         2,320,846    4.3
E  Better-off council estates                     6,976,570   13.0
F  Less well-off council estates                  5,032,657    9.4
G  Poorest council estates                        4,048,658    7.6
H  Multi-racial areas                             2,086,026    3.9
I  High-status non-family areas                   2,248,207    4.2
J  Affluent suburban housing                      8,514,878   15.9
K  Better-off retirement areas                    2,041,338    3.8
U  Unclassified                                     388,632    0.7

Tab. 1.1: Broad-based ACORN classifications ²³

¹⁸ WWW20

¹⁹ Wilson, R. and Gilligan, C., 1997, P275.

²⁰ Kalakota, R. and Whinston A. B..

²¹ McDonald M. and Dunbar I., P85-91.

2. Geographic and Geo-demographics

Geographic segmentation divides the market into different geographic units such as countries, regions, counties, cities and postcodes. Geo-demographic systems are based on the proposition that the neighbourhood in which you live reflects your professional status, income, life stage and behaviour. The neighbourhood types are initially identified using national census data.

ACORN (A Classification of Residential Neighbourhoods) is an example of such a geo-demographic system. ACORN classifies consumers into 43 demographically and behaviourally distinct clusters. The clusters are based on the type of neighbourhood, socio-economic status, and buying behaviour and preferences.²² A broad-based ACORN classification was conducted in Great Britain in 1981. It segments the residents of Great Britain into 12 categories (Tab. 1.1).

3. Socio-economic Grade

Buying behaviour is often influenced by the social class of a person. The factors include income, status, education, etc. The National Readership Survey scale is one of the popular classifications and is based on the occupation of the main wage earner of the household.

22Kurs, M., Ryan, B., Lamb, G. etc., 2001.

23Bannes, E., McClelland, etc., 1997, P201.

Grade  Social Classification   Occupation
A      Upper middle class      Higher managerial, professional or administrative jobs
B      Middle class            Middle managerial, professional or administrative jobs
C1     Lower middle class      Supervisory or clerical jobs, junior management
C2     Skilled working class   Skilled manual workers
D      Working class           Unskilled and semi-skilled manual workers
E      Subsistence level       Pensioners, unemployed, casual or low grade workers

Tab. 1.2: National Readership Survey socio-economic groups²⁴

A further development of the life-stage and socio-economic grade models is SAGACITY, developed by Research Services Ltd. This model combines life stages with income and social class.

4. Psychographic variables

Psychographics attempts to classify individuals by their attitudes, personality and life styles.

(1)Personality

Personality is used as a variable to segment the market. The earliest segmentation of this kind was conducted by Riesman et al. (1950). It identified three distinct types of social characterisation and behaviour: ²⁵

1. Tradition-directed behaviour, which changes little over time and, as a result, is easy to predict and to use as a basis for segmentation.

2. Other-directedness, in which the individual attempts to fit in and adapt to the behaviour of the peer group.

3. Inner-directedness, where the individual is seemingly indifferent to the behaviour of others.

(2) Attitude

Attitude includes the customers' attitudes towards risk, degree of loyalty, the likelihood of adopting new products, etc. Many of the personality variables can also be used as descriptors of attitude.

24Kurs, M., Ryan, B., Lamb, G. etc., 2001

24Blois Keith, 2000, P389.

25Wilson, Gilligan and Pearson, 1994, P291

Fig. 1.7: SAGACITY. (Life cycle: Dependent, Pre-family, Family, Late; Income: Better off, Worse off; Occupation: White-collar, Blue-collar.)

(3) Lifestyle

Consumer behaviour is also determined by the way we live our lives. It arises from a complex relationship between our aspirations, current situation, perception of self, income and attitudes. Lifestyle segmentation offers a detailed view of buyers because it is composed of numerous characteristics related to their activities, interests and opinions. Lifestyle consists mainly of three dimensions: ²⁶

1. Activities: Work, hobbies, social events, vacations, entertainment, club membership, community, shopping, sports.

2. Interests: Family, home, job, community, recreation, fashion, food, media, and achievements.

3. Opinions: Themselves, social issues, politics, business, economics, education, products, future, culture.

5. Behavioural variables

(1) Benefit sought variables

This group of variables for segmenting customers considers the motive for a purchase. It groups consumers according to the specific benefits that they seek in a product. Even if two customers buy exactly the same product, the benefits they expect may vary. Benefit segmentation is therefore based on behavioural processes, involving thought and action, as opposed to age and socio-economic class, which are defined according to individual characteristics. It closely identifies the customers' needs and represents a powerful method of understanding and influencing behaviour.

In applying this approach, a company should begin by attempting to measure consumers' value systems and their perceptions of the various brands within a given product class. The information gathered is then used as the basis for market segmentation. Benefit segmentation begins by determining the principal benefits that customers are seeking in the product, the kinds of people who look for each benefit, and the benefits delivered by each brand. For example, in the toothpaste market, four segments are identified according to the benefit sought: economy, decay prevention, cosmetic and taste.

26McDonald, M. and Dunbar, I., 2000, P89.

(2) User status

The market can be divided into five segments according to user status: non-users, ex-users, potential users, first-time users and regular users. First-time users and potential users can be further subdivided on the basis of usage rate.

(3) Loyalty Status and Brand Enthusiasm

Loyalty status categorises customers on the basis of the extent and depth of their loyalty to particular brands or products. Most typically there are four categories: hard-core loyals, soft-core loyals, shifting loyals and switchers.²⁷

1. Hard-core loyals are customers who consistently buy the same brand or product.

2. Soft-core loyals are those who are willing to choose from a limited set of brands. Their loyalty is divided among these brands or products.

3. Shifting loyals are consumers who shift their loyalty from one brand to another. After the shift, they no longer buy the former brand.

4. Switchers are those who show no loyalty to any single brand. Their buying pattern is typically determined either by the special offers available or by their search for variety.

(4) Critical events

Major or critical events generate needs that can be satisfied by the provision of a special collection of products and/or services. Typical examples are marriage, the death of someone in the family, unemployment, illness, retirement and moving house.

1.2.2 Customer profiling

Customer segmentation and customer profiling are two elements of Customer Relationship Management (CRM). Customer profiling is performed after customer segmentation. Its purpose is to “locate clusters within the customer file that outperform the average”.²⁸ It creates a customer segment profile, which labels the customers with their attributes.

27Wilson, Gilligan and Pearson, 1994, P291.

28WWW18

Identifying the characteristics of the customers helps the company to decide which segments will respond best to its marketing effort. When companies have a clearer overview of the attributes and demands of the customer segments, they can decide what action should be taken and what resources should be allocated to the selected segments. Furthermore, using pre-built models, customer profiling can also be used to find potential customers and to remove inactive or “bad” customers.

The profiling attributes are similar to the segmentation attributes. For example, they include: geographic, cultural and ethnic, economic conditions (income and/or purchasing power), age, values, attitudes and beliefs, knowledge and awareness, lifestyle, media, and recruitment method. For acquired customers, customer behaviour variables can also be employed as profiling variables, such as shopping frequency, complaint frequency, degree of satisfaction and preferences.

1.3 Market targeting and Positioning

1.3.1 Market Targeting

The next task after customer segmentation and profiling is market targeting. Companies choose one or several segments as the target market, i.e. the market that the company decides to serve. A specific marketing mix and resources are then developed to serve the target market.

Companies normally adopt one of three targeting strategies:²⁹

• Undifferentiated strategy: The company ignores the differences between customer segments and regards the whole market as a single market. A single marketing mix is adopted for the whole market. This is the so-called “mass marketing”.

• Differentiated strategy: The whole market is divided into several segments. The company develops a different marketing mix for each segment.

28Keith Blois, 2000, P398.

29Amstrong, G.and Kotler, P., 2002, P255-258.

Fig. 1.8: Targeting strategies (panels: Undifferentiated Strategy, Differentiated Strategy, Concentrated Strategy).

• Concentrated strategy: The company chooses one or several market segments but adopts only a single marketing mix. Under this strategy, the company tries to gain a high market share in one or several niche markets, instead of struggling for a small share of the whole market. For firms with limited resources, this strategy is very appealing.

1.3.2 Positioning

The purpose of target marketing is to focus on the selected target market and fine-tune the marketing mix to provide a group of potential customers with superior value, and thereby to build up a unique position for the product in the customers' view.

A product's position is “the complex set of perceptions, impressions, and feeling that it induces in consumers, compared with competing products”.³⁰ Positioning refers to how customers think about proposed and/or present brands in a market.³¹ The fundamental idea of positioning is competitive advantage.³² Through a differentiated marketing mix, the special needs and demands of customers can be satisfied. Thus, customers will view the product or brand as superior to the others and give it a distinct position. To position a product, the marketer must appeal strongly to the target customers with the product's strengths and differences, using a proper marketing mix.

30Bannes, McClelland, Meyer and Wiesehöfer, 1997, P230.

31WWW33

32WWW30


2. Data Mining

Data mining, which is also known as Knowledge Discovery in Databases (KDD),³³ is a powerful new technology that helps companies to identify the important information in a “sea” of data. Data mining technology is commonly used for customer analysis.

Fayyad defined data mining as “a non-trivial process aimed at identifying valid, novel, potentially useful and ultimately understandable pattern in data”.³⁴ Grameier and Rudolph consider data mining in terms of “all methods and techniques, which allow to analyse very large data sets to extract and discover previously unknown structures and relations out of such huge heaps of details. These information is filtered, prepared and classified so that it will be a valuable aid for decisions and strategies”.³⁵

Data mining extracts implicit, previously unknown and potentially useful information from data in order to automate the discovery of significant patterns and trends.

2.1 The process of Data mining

The process of data mining can be summarised in four stages: data collection and selection, data preparation, mining, and result interpretation.^{36 37}

2.1.1 Data Collection and Selection

The ways of collecting data include:

• In-house customer database: Companies normally keep records of customers. Customer information can be gathered from mailing lists, receipts, memberships, warranty registrations, etc.

33Kotala, P., Perera, A., Kai Zhou, J.,ect.

34Fayyad, U., Piatetsky-Shapiro, G. et. al., P6.

35Grameier, J., and Rudolph A..

36IBM’s Data Mining Technology, 1996

37Bounsaythip, C. and Rinta-Runsala, E., 2001


• External resources: There are external sources from which one can obtain information such as demographic data.

• Research survey: The often-used way to collect particular information is to conduct a survey. The survey could be conducted through face-to-face interview, telephone interview, and postal questionnaire or via Internet.

During the collection of data, two types of variables should be collected: classification variables and descriptor variables.³⁸

Classification variables classify the data set into groups. Most demographic, geographic, psychographic or behavioural variables can be used to classify customers into segments.

• Demographic variables: Age, gender, income, ethnicity, marital status, education, occupation, household size, length of residence, type of residence, etc.

• Geographic variables: City, state, zip code, census tract, county, region, metropolitan or rural location, population density, climate, etc.

• Psychographic variables: Attitudes, lifestyle, hobbies, risk aversion, personality traits, leadership traits, magazines read, television programmes watched, etc.

• Behavioural variables: Brand loyalty, usage level, benefits sought, distribution channels used, reaction to marketing factors, etc.

Descriptor variables are variables used to describe each subgroup in a data set and to distinguish the subgroups from each other. We could say that the descriptor variables stand for the characteristics of the data set they represent. Descriptor variables must be easily obtainable variables that already exist in, or can be appended to, the customer files. Many classification variables can also be used as descriptor variables.

The data is normally stored in a data warehouse. As the data warehouse contains many diverse types of data, the first step in conducting data mining is to select the data that will be used in the analysis.

38WWW7


2.1.2 Data Preparation

Before the data can be analysed, the originally collected data must first be prepared in order to make it suitable for the analysis. Data preparation consists of the following stages:

1. Data cleaning:

• Check for abnormal, out-of-bounds or ambiguous items.

• Strip out unwanted fields or items. Some attributes are useless for analysis purposes, such as version numbers, email addresses, etc.

• Resolve inconsistent data formats, data encodings, geographical spellings, abbreviations and punctuation.

2. Data description

• Supply metadata such as row counts, value counts or variables.

3. Data Transformation:

• Convert string variables into numeric or categorical variables, or replace codes with text.

• Check for missing values. Delete them or replace them with default values.

• Add computed fields as inputs or targets.

• Combine data from multiple sources under a common code.

• Identify fields that are used multiple times.

• Convert continuous variables into categorical variables for some methods.

• Convert nominal data into metric data.


4. Data Sampling³⁹

• Required for training or model building

5. Data pruning

• Identify dependent, independent and correlated columns or variables
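Some of the cleaning and transformation stages listed above can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions: the records, the field names (`age`, `country`, `email`, `version`), the age bounds and the two age bands are hypothetical and are not taken from the XploRe user data.

```python
from statistics import median

# Hypothetical raw records; field names and values are illustrative only.
raw = [
    {"age": "34", "country": "germany", "email": "a@x.org", "version": "4.2"},
    {"age": "", "country": "Germany", "email": "b@x.org", "version": "4.2"},
    {"age": "230", "country": "USA", "email": "c@x.org", "version": "4.1"},
    {"age": "27", "country": "U.S.A.", "email": "d@x.org", "version": "4.2"},
]

# Assumed lookup table for inconsistent geographical spellings.
COUNTRY_SPELLINGS = {"germany": "Germany", "usa": "USA", "u.s.a.": "USA"}
# Assumed list of fields that are useless for the analysis.
UNWANTED_FIELDS = {"email", "version"}


def clean(record):
    """Data cleaning: strip unwanted fields, unify spellings, drop out-of-bounds ages."""
    rec = {k: v for k, v in record.items() if k not in UNWANTED_FIELDS}
    rec["country"] = COUNTRY_SPELLINGS.get(rec["country"].lower(), rec["country"])
    age = int(rec["age"]) if rec["age"].isdigit() else None
    # Ages outside the assumed bounds (0, 120) are treated as missing.
    rec["age"] = age if age is not None and 0 < age < 120 else None
    return rec


def transform(records):
    """Data transformation: replace missing ages by a default and add a categorical field."""
    known = [r["age"] for r in records if r["age"] is not None]
    default_age = median(known)  # default value for missing ages
    out = []
    for r in records:
        age = r["age"] if r["age"] is not None else default_age
        band = "under 30" if age < 30 else "30 and over"  # continuous -> categorical
        out.append({"age": age, "age_band": band, "country": r["country"]})
    return out


prepared = transform([clean(r) for r in raw])
```

Replacing missing values by the median is just one possible default; deleting the affected records, as the list above also allows, would be equally valid.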

2.1.3 Mining

At the mining stage, various techniques can be used to extract valuable information from the prepared data. For example, suppose the aim is to create an accurate, symbolic classification model that predicts whether a reader will continue to subscribe to a newspaper. First, a clustering technique is applied to segment the subscriber database; then rule induction is used to create a classification model automatically for each desired cluster, through which one can predict the behaviour of a customer.

2.1.4 Result Interpretation

Result interpretation means not only visualising (graphically or logically) the output of data mining, but also filtering the information and identifying the most valuable and appropriate results to support decision making. If the interpreted result is not satisfactory, the mining stage or even the whole data mining procedure should be repeated. The final extracted information must be comprehensible.

2.2 The Aspects of Data Mining

Data mining can be viewed in terms of its applications, operations, techniques and algorithms.^{40 41}

39Ferguson, Mike

40WWW 4

41IBM’s Data Mining Technology, 1996


Applications   Database marketing
               Customer segmentation
               Customer retention
               Fraud detection
               Credit checking
               Web site analysis

Operations     Prediction and classification modelling
               Link analysis
               Database segmentation
               Deviation detection

Techniques     Supervised induction
               Clustering
               Association discovery
               Sequence discovery

Tab. 2.1: The aspects of data mining

2.2.1 Applications

Data mining is widely used in customer analysis and marketing. The following areas cover the main applications of data mining.⁴²

Customer segmentation: Data mining tools automate the process of finding predictive information in large databases. Companies, especially retailers and banks, are interested in knowing whether there are sub-groups of customers who exhibit certain characteristics. They can use data mining to cluster the customers and discover groups of interest. For example, companies use data mining to analyse historical mailing lists in order to find the groups with a high return on investment, so that they can determine the target groups for new mailings. Banks and credit companies classify credit scores to identify the customer segments with lower risk.

Relationship management: Data mining discovers and identifies previously unknown relationships hidden in the data. The buying patterns of customers are of interest to retailers and advertisers. Combined with customer segmentation, data mining can help them to find the relationships between the product items purchased and customer types, or to improve the conduct of an advertising campaign in particular media for a specific group of customers.

42Carbone, Patricia L.


2.2.2 Operations

• Predictive and classification modelling: A predictive model uses the contents of a database, which reflect historical data, to automatically generate a model that can predict future behaviour. Classification sub-divides a data set according to a number of specified outcomes. The goal of the modelling operation is to create a generalised description of the characteristics of the data. For instance, a marketing executive may be interested in predicting whether a particular consumer will switch to a new product.

• Link analysis: The goal of link analysis is to establish relationships between the records in a database. Retailers want to know which items a customer will purchase together in order to make decisions about item layout and goods purchasing. For instance, if it is found that customers buy a CD after purchasing a CD player, then the store manager may decide to put the CD counter close to the CD player counter.

• Database segmentation: A database often contains various types of data, so it is often necessary to segment the data into small groups of related records. The purpose could be either to obtain a general description of each group or to prepare for further analysis, such as model creation or link analysis. Suppose the store manager wants to know the combinations of goods purchased by customers in a particular period. The database could first be segmented according to a time period attribute, such as “Christmas sale”. Link analysis could then be conducted to find the relationships between the goods purchased together.

• Deviation detection: The aim of deviation detection is to identify the outliers in a particular data set and to determine whether their presence is due to noise, impurities or a causal reason. This operation is the opposite of database segmentation and is often carried out together with it. Because outliers express deviation from some known expectation or norm, deviation detection is often the source of true discovery.
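As a small illustration of deviation detection, the sketch below flags values that lie far from the mean of a data set. It assumes one simple statistical rule (a z-score threshold of 2) that is chosen here for brevity; the function name `deviations`, the threshold and the sales figures are hypothetical and do not come from the cited sources.

```python
from statistics import mean, stdev


def deviations(values, threshold=2.0):
    """Return the values whose distance from the mean exceeds `threshold` standard deviations."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    if sigma == 0:
        return []          # all values identical: nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical daily sales figures with one unusual day.
daily_sales = [102, 98, 101, 97, 103, 99, 100, 250]
outliers = deviations(daily_sales)
```

Whether a flagged value is noise or a genuine discovery still has to be judged by the analyst, as described above.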

2.2.3 Data Mining Techniques

Numerous techniques support the operations of data mining to find the desired groups or relationships.


Classification and predictive modelling is supported by supervised induction techniques. Clustering supports database segmentation. Association discovery and sequence discovery are used for the link analysis. The deviation detection is supported by statistical techniques.

The desired relationships to be discovered by data mining are:⁴³

Classes: the data items are allocated to predetermined groups.

Clusters: the data items are grouped according to logical relationships.

Associations: the data is mined to identify associations.

Sequential patterns: the data is mined to anticipate behaviour patterns and trends.

Supervised Induction

Supervised induction is “the process to automatically create a classification model from a set of records (examples)”,⁴⁴ where the set of records is called the training set. The records in the training set must belong to a set of pre-defined classes. Each class has a distinguishable pattern, which is generated from the existing records. Once the model has been induced, a new record can automatically be put into a class according to its pattern.

Supervised induction comprises the steps of classification and prediction, which put elements into predetermined groups according to some criterion. The number of subgroups and the features of each subgroup are defined at the beginning. The features of an observation are then compared with the criterion and the observation is put into the corresponding group.⁴⁵ This is usually done in two steps:

• Step 1: Build a model to describe the predetermined data set groups or classes. The model contains a set of classification rules (labels).

• Step 2: If the accuracy of the model or classifier is acceptable, the model can be used to classify the new unlabeled data groups or elements.
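The two steps can be illustrated with a very small classifier. The sketch below uses a nearest-centroid rule, chosen here only because it is short; it is not the rule-induction technique of any particular tool. The training records, the hold-out records, the class labels and the 90% accuracy threshold are all hypothetical.

```python
from math import dist

# Hypothetical training set: (usage hours per week, purchases per year) -> class label.
training = [
    ((1.0, 0.0), "occasional"), ((2.0, 1.0), "occasional"), ((1.5, 0.5), "occasional"),
    ((9.0, 4.0), "regular"), ((8.0, 5.0), "regular"), ((10.0, 6.0), "regular"),
]
# Hypothetical labelled records kept aside to check the accuracy of the model.
holdout = [((2.5, 1.0), "occasional"), ((9.5, 5.0), "regular")]


def build_model(records):
    """Step 1: describe each predefined class by the mean of its training records."""
    grouped = {}
    for features, label in records:
        grouped.setdefault(label, []).append(features)
    return {
        label: tuple(sum(col) / len(rows) for col in zip(*rows))
        for label, rows in grouped.items()
    }


def classify(model, features):
    """Put a record into the class whose centroid is closest."""
    return min(model, key=lambda label: dist(model[label], features))


def accuracy(model, records):
    """Share of labelled records that the model classifies correctly."""
    hits = sum(classify(model, features) == label for features, label in records)
    return hits / len(records)


model = build_model(training)
model_accuracy = accuracy(model, holdout)

# Step 2: use the model on unlabelled data only if its accuracy is acceptable.
new_label = classify(model, (7.0, 3.0)) if model_accuracy >= 0.9 else None
```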

Clustering

Clustering is a method of grouping data elements into homogeneous groups. It divides a heterogeneous data set into disjoint sub-groups so that, according to some criterion, the elements within any one cluster are highly similar, while the elements in different clusters are highly dissimilar. Clustering is an unsupervised technique and is employed when you want to find groups of similar records without any preconditions. The difference between clustering and classification is that in clustering the number of subgroups and the features (labels) of each subgroup are unknown in advance, while in classification they are defined at the beginning.

43Chung, H. M., Gray, P. and Manino, M., 1998

44IBM’s Data Mining Technology, 1996.

45Han, J. and Kamber M., 2001, P279-325

Cluster analysis has two steps:⁴⁶

• Choose a proximity measure

A proximity measure determines the similarity or “closeness” of objects. The more homogeneous the objects are, the more similar and closer they are.

• Choose a clustering strategy

In this step, the clustering algorithm and/or the initial parameters are decided. According to the chosen proximity measure and method, the whole data set is divided into groups (clusters). The elements within a group should be as close as possible, and the dissimilarity between groups should be as large as possible.

After the clusters are built, descriptive methods are normally employed to describe each cluster in order to get a comprehensive overview of the dissimilarity between clusters.

1. Proximity measure

The commonly used proximity measures include the Jaccard, Tanimoto, Simple Matching and Kulczynski coefficients and the Minkowski and Euclidean distances.
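Three of these measures are sketched below in plain Python to show how they differ. The function names and the example vectors are illustrative assumptions: the Jaccard and Simple Matching coefficients are written for binary (0/1) vectors, and the Minkowski distance for numeric vectors, with the Euclidean distance as the special case p = 2.

```python
def jaccard(a, b):
    """Jaccard similarity of two binary vectors: joint presences over all non-joint-absences."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 1.0  # two all-zero vectors are treated as identical


def simple_matching(a, b):
    """Simple matching similarity: share of positions in which the two vectors agree."""
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


def minkowski(a, b, p=2):
    """Minkowski (L_p) distance; p=2 gives the Euclidean, p=1 the city-block distance."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


# Two hypothetical customers described by five yes/no attributes.
u = [1, 1, 0, 0, 1]
v = [1, 0, 0, 1, 1]
jaccard_uv = jaccard(u, v)
matching_uv = simple_matching(u, v)
euclidean = minkowski((0.0, 0.0), (3.0, 4.0))
```

Unlike Simple Matching, the Jaccard coefficient ignores joint absences, so the two coefficients can differ noticeably when most attributes are 0 for both objects.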

2. Clustering strategy (method)

The clustering methods generally belong to several major families:⁴⁷

1. Hierarchical algorithms

2. Iterative partitioning

3. Density search

4. Factor analytic

5. Clumping

6. Graph theoretic

46Härdle, W. and Simar, L, P295-313.

47Aldenderfer M. S. and Blashfield, R. K., P35.

Here we discuss only two basic clustering methods: hierarchical algorithms and iterative partitioning algorithms.

(1) Hierarchical algorithms

Hierarchical clustering is composed of two main types of procedure: the agglomerative procedure and the splitting procedure.

• The agglomerative procedure starts from the finest partition. It considers each observation as a cluster and then puts groups together to form new clusters. At each stage of the procedure, the number of clusters is reduced by one by joining (fusing) the two groups that are considered to be the closest or most similar. The agglomerative algorithm is a frequently used procedure. It contains the following steps:^{48 49}

1. Construct the finest partition. Normally each observation is a group.

2. Compute the distance or dissimilarity matrix.

3. Find the closest or most similar groups.

4. Put the two most similar groups together to form a cluster.

5. Compute the distance or dissimilarity between the new groups to get a reduced distance or dissimilarity matrix.

6. Repeat steps 3 to 5 until the optimal clusters are formed.

• The splitting procedure is the opposite of the agglomerative procedure. It starts by considering the whole data set as one cluster and then splits the cluster into sub-groups to form new clusters.

• Linkages for the agglomerative algorithm: There are many linkages for measuring the proximity or similarity of elements and groups. The frequently used linkages are:

48Mardia, K.V., Kent, J.T. and Bibby, J.M., 1979, P360-390.

49Everitt, B. S. and Dunn, G., 1991, P99-126.


Single linkage defines the distance between two groups as the smallest distance between their individual members.

Complete linkage is the opposite of single linkage: it defines the distance between two groups as the largest distance between their individual members.

Average linkage (non-weighted and weighted) uses the average of the distances between the individual members of the two groups.

Centroid linkage uses the natural geometrical distance between the centroids of the groups as the distance between groups.

Median linkage chooses the median of the individual distances as the distance between groups.

Ward linkage is related to the centroid linkage, but it uses an “inertia” distance rather than a geometric distance.
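The agglomerative steps and the single, complete and average linkages described above can be sketched as follows. This is a simplified illustration, not the procedure of XploRe or IBM Intelligent Miner: it assumes Euclidean distance between points, recomputes the group distances on every pass instead of maintaining a reduced distance matrix, and stops at a number of clusters (`n_clusters`) given by the user, since the text does not fix a criterion for the "optimal" clusters. All names and data points are hypothetical.

```python
from itertools import combinations
from math import dist


def linkage_distance(group_a, group_b, linkage="single"):
    """Distance between two groups, derived from the distances between their members."""
    pairwise = [dist(p, q) for p in group_a for q in group_b]
    if linkage == "single":
        return min(pairwise)                  # smallest individual distance
    if linkage == "complete":
        return max(pairwise)                  # largest individual distance
    if linkage == "average":
        return sum(pairwise) / len(pairwise)  # non-weighted average distance
    raise ValueError(f"unknown linkage: {linkage}")


def agglomerate(points, n_clusters, linkage="single"):
    """Fuse the two closest groups repeatedly until n_clusters groups remain."""
    clusters = [[p] for p in points]          # step 1: finest partition
    while len(clusters) > n_clusters:
        # steps 2-3: find the pair of groups with the smallest linkage distance
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage_distance(clusters[ij[0]], clusters[ij[1]], linkage),
        )
        # step 4: fuse the two groups (i < j, so deleting j leaves index i valid)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
        # steps 5-6: distances to the new group are recomputed on the next pass
    return clusters


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
result = agglomerate(points, n_clusters=3)
```

On well-separated data such as this, the three linkages give the same partition; they tend to differ when clusters are elongated or overlap.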

(2) Iterative Partitioning algorithms

Partitioning algorithms start with given groups. The elements are then exchanged between groups until the highest homogeneity within groups and the highest heterogeneity between groups are reached, or some other criterion is met.

The iterative partitioning algorithms normally proceed according to the following steps:⁵⁰

1. Begin with an initial partition into a chosen number of clusters. Compute the centroids of these clusters.

2. Allocate each data point to the cluster that has the closest centroid.

3. Compute the new centroids of the clusters. The clusters are not updated until a complete pass through the data has been made.

4. Repeat steps (2) and (3) until no data points change clusters, i.e. until the highest similarity inside the clusters is reached.
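The four steps can be sketched as a small k-means-style routine. The sketch makes several assumptions that are not in the text: the initial partition is represented directly by its centroids (`initial_centroids`), Euclidean distance is used, an empty cluster keeps its previous centroid, and a `max_passes` limit guards against endless iteration. The data points are hypothetical.

```python
from math import dist


def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    return tuple(sum(col) / len(cluster) for col in zip(*cluster))


def iterative_partitioning(points, initial_centroids, max_passes=100):
    """Reallocate points to the nearest centroid until no point changes cluster."""
    centroids = list(initial_centroids)       # step 1: centroids of the initial partition
    assignment = None
    for _ in range(max_passes):
        # step 2: allocate each data point to the cluster with the closest centroid
        new_assignment = [
            min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points
        ]
        if new_assignment == assignment:      # step 4: stop when nothing changes
            break
        assignment = new_assignment
        # step 3: recompute the centroids only after a complete pass through the data
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:                       # an empty cluster keeps its old centroid
                centroids[k] = centroid(members)
    return assignment, centroids


points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
labels, centres = iterative_partitioning(points, [(0.0, 0.0), (10.0, 10.0)])
```

The result of such a procedure generally depends on the initial partition, so in practice it is often run from several starting points.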

Association rule discovery

Association rule discovery is an iterative approach, also known as level-wise search. Association rule methods try to discover interesting relationships between the items in the data and to identify customers' behaviour patterns. A typical example of association rules is “market basket analysis”. This analysis tries to find out which kinds of products are more likely to be put into the “shopping basket” together when customers do their shopping. Through this analysis, retailers are able to identify which items are frequently purchased together by their customers.

50Aldenderfer M. S. and Blashfield, R. K., P45-49.

An association rule is the relationship of the form X ⇒ Y , where X is the antecedent item set and Y is the consequent item set. For example: “customers who purchased item X are very likely also to purchase item Y at the same time”.⁵¹ There are two measures for each rule: support and confidence.⁵²

• Support (or prevalence) indicates the occurrence frequency of an itemset, i.e. the proportion of transactions that contain both X and Y:

s(X ⇒ Y) = P(X ∪ Y)

• Confidence (certainty or predictability) measures the validity of the pattern. It denotes the strength of the relationship between the items, i.e. to what degree one item depends on the others:

c(X ⇒ Y) = P(Y | X)

For example: among all customers, only 5% are students who buy a computer. But if a customer is a student, the probability of his buying a computer is 20%. For the rule “student ⇒ buys a computer”, 5% is the support and 20% is the confidence.

Two other important measures for association rule discovery are:

Expected confidence - the probability that an item is purchased regardless of what other items have been bought with it. For instance, if “customers buy a computer 40% of the time”, 40% is the expected confidence.

Lift - the difference between the confidence of a rule and the expected confidence, expressed either as an absolute difference or as a ratio. When the lift is negative (as a difference) or less than one (as a ratio), the items of the rule are unlikely to occur together, i.e. the two products are unlikely to be purchased at the same time.

The goal of association discovery is to find all the associations with at least s% support and c% confidence in the transaction data.
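The four measures can be computed directly from a list of transactions. The sketch below is a minimal illustration, assuming each transaction is represented as a Python set of item names; the transactions and the function names are hypothetical, and lift is computed in its ratio form.

```python
def support(transactions, itemset):
    """Share of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def rule_measures(transactions, antecedent, consequent):
    """Support, confidence, expected confidence and lift (as a ratio) of X => Y."""
    s = support(transactions, antecedent | consequent)   # X and Y bought together
    confidence = s / support(transactions, antecedent)   # P(Y | X)
    expected = support(transactions, consequent)         # P(Y), regardless of X
    return {
        "support": s,
        "confidence": confidence,
        "expected_confidence": expected,
        "lift": confidence / expected,
    }


# Hypothetical shopping baskets.
transactions = [
    {"computer", "printer"},
    {"computer", "printer", "paper"},
    {"computer"},
    {"printer", "paper"},
    {"paper"},
]
measures = rule_measures(transactions, {"computer"}, {"printer"})
```

In this toy data set the rule “computer ⇒ printer” has a lift slightly above one, so buying a computer makes a printer purchase somewhat more likely than it is on average.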

1. Data format

Two types of format are used to form the data for association discovery:

1. Horizontal format: each entry is a row and each attribute is a column.

51Kotala, P. K, Perera, A., Kai Zhou, J., etc., 2001

52WWW4