A Study of Discovering Customer Value for CRM: Integrating Customer Lifetime
Value Analysis and Data Mining Techniques
Chi-Wen Chen1 Chyan Yang2 Chiun-Sin Lin1
1Department of Management Science, National Chiao Tung University 2Institute of Information Management, National Chiao Tung University
1001 University Road, Hsinchu, Taiwan 300, ROC Corresponding Author: Chi-Wen Chen
E-Mail Address: [email protected]
Abstract
Customer relationship management (CRM) has become an important strategy for businesses. In the current competitive commercial environment, discovering, maintaining, and strengthening customer value is key for businesses to earn sustainable profits. Past studies have found that customer lifetime value (CLV) can be used to calculate each customer's contribution to the company, and that data mining can be employed as an analytical tool to discover customers' potential behavioral patterns and characteristics. Although the two are complementary, studies rarely apply both methods at the same time. This research develops a conceptual framework that combines CLV analysis and data mining techniques to enhance CRM. First, we use CLV analysis to calculate customers' current value (CCV) and customers' potential value (CPV). Next, clustering analysis is used to group customers based on each customer's CCV and CPV. Finally, data mining methods are employed to discover the characteristics and the potential purchasing behavioral patterns of each group. By establishing a customer value pyramid based on our findings and providing the marketing implications for each group, this research serves as a reference for managers engaging in CRM strategy.
Keywords: Customer Relationship Management, Customer Potential Value, Customer Lifetime Value, Data Mining
1. Introduction
The concept of customer relationship management (CRM) has pervaded several industries in the past decade. The focus of companies has shifted from treating customers as just an entity involved in the business process to treating them as a crucial component of their success (Jain and Singh 2002). Companies have been interested in actively developing relationships with targeted customers because they are becoming increasingly aware of the many potential benefits provided by CRM (Kim, Suh, and Hwang 2003). Thus, CRM is increasingly found at the top of corporate agendas (Gurau, Ranchhod, and Hackney 2003). Recently, increased emphasis has been placed on developing a measurement to understand the value that customers create and to give managers a better idea of how their CRM policies and programs are working (Winer 2001).
One good candidate for this measurement is customer lifetime value (CLV) analysis, where CLV is the present value of all future profits generated from a customer (Gupta and Lehmann 2003; Haenlein, Kaplan, and Beeser 2007). CLV is a very useful tool for measuring customer value because it is a systematic way to understand and evaluate a firm's relationship with its customers (Haenlein et al. 2007; Jain and Singh 2002). However, CLV analysis may not be able to explain or predict customers' behavioral patterns. As a result, the information that CLV analysis provides is not sufficient for managers to make sound decisions. A complementary tool is therefore needed to assist CLV analysis in carrying out the measurement of CRM.
Data mining is another good candidate for measuring customer value for CRM. It is the process of searching and analyzing data in order to find implicit, but potentially useful, information. It involves selecting, exploring, and modeling large amounts of data to uncover previously unknown patterns, and ultimately comprehensible information, from large databases (Berry 1997; Frawley, Piatetsky-Shapiro, and Matheus 1992). Therefore, data mining can identify valuable customers, predict future purchase behaviors, and enable companies to make proactive, knowledge-driven decisions (Jie Sun 2008; Rygielski, Wang, and Yen 2002). These benefits are what CLV analysis needs. Nevertheless, data mining cannot calculate the present value of all future profits generated from a customer. In other words, it cannot determine the transactional value of customers.
From the above introduction, CLV and data mining are both very useful tools, for measuring customers' value and behavioral patterns respectively, but each has its own limitation. On the one hand, CLV can calculate each customer's potential value, yet it cannot analyze customers' behavioral patterns. On the other hand, data mining can extract customers' potential buying patterns, but it is not capable of capturing customers' transactional values. The two methods appear to be complementary and need to work together, but few studies combine them for CRM. Therefore, the major purpose of our paper is to develop a process that integrates CLV analysis and data mining techniques to improve the measurement of customer value and enhance the efficiency and effectiveness of CRM.
2. Literature Review
2.1. The Importance of Measurement of CRM
CRM can be defined as managerial efforts to manage business interactions with customers by combining business processes and technologies that seek to understand a company's customers (Kim et al. 2003). Gurau and his colleagues (2003) indicated that the implementation of customer-centric systems comprises a number of essential stages: (1) collect information about customers; (2) calculate the CLV; (3) segment the customers in terms of value (profitability) and establish the priority segments. Hwang and his colleagues (2004) also reported that precise evaluation of customer value and targeted customer segmentation are critical to the success of CRM. Mulhern (1999) argued that customers are important intangible assets of a firm that should be valued and managed. Ryals (2002) suggested that in order to manage relationships as assets, companies need to know which are their most valuable and which are their least valuable relationship assets so that appropriate marketing strategies can be put in place. On the basis of this CRM literature, we may infer that there are several key success factors for implementing CRM: calculating the CLV, identifying the most and least valuable customers, segmenting the customers in terms of value, establishing the priority segments, and identifying the unique characteristics of the customers in each target segment.
2.2. Customer Lifetime Value analysis
Many CRM researchers have attempted to develop a comprehensive model of customer profitability, since the question of who and where profitable customers are is always a very important issue in CRM (Hwang, Jung, and Suh 2004). For this reason, CLV has become one of the central ideas of CRM. Hwang and his colleagues (2004) define CLV as the sum of the revenues gained from a company's customers over the lifetime of transactions after the deduction of the total cost of attracting, selling, and servicing customers, taking into account the time value of money. Pfeifer, Haskins, and Conroy (2004), from a CRM perspective, assert that CLV is the present value of the future cash flows attributed to the customer relationship. Similarly, Ryals (2002), who considered individual customers and future potential customers, suggests that CLV is the present value of a customer's future purchases. Usually, the CLV calculation is based on the expected purchases of a single customer, adjusted back to the present day using a discount rate.
The literature on CLV models has taken multiple directions. However, the basic structure of the CLV model is shown in Equation 1 (Jain and Singh 2002).

$$\mathrm{CLV} = \sum_{i=1}^{n} \frac{R_i - C_i}{(1+d)^{i-0.5}} \qquad (1)$$

Where $i$ = the period of cash flow from customer transaction; $R_i$ = revenue from the customer in period $i$; $C_i$ = total cost of generating the revenue $R_i$ in period $i$; $n$ = the total number of periods of projected life of the customer under consideration; $d$ = the discount rate. This simple CLV model has been extended to more sophisticated models that consider more critical factors. For example, Hwang and his colleagues (2004), who thought that the existing CLV models did not consider customer potential value (CPV), develop a novel CLV model which contains socio-demographic information, past profit contribution, expected future cash flow (potential value), and customer loyalty. Indeed, Bauer, Hammerschmidt, and Braehler (2003) indicate that many CLV models do not provide marketing-relevant information regarding customer-specific details, such as expected cross-selling or references. Similarly, Kim and Kim (1999) also suggest that it is important to consider cross-selling and up-selling as well to calculate customer value.
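As a minimal sketch of Equation 1, assuming it is read as the sum over periods i = 1..n of (R_i - C_i) discounted by (1 + d)^(i - 0.5), the calculation can be written as follows. The function name `basic_clv` and the sample revenues, costs, and discount rate are hypothetical and are not taken from the study's data.

```python
def basic_clv(revenues, costs, discount_rate):
    """Sum of (R_i - C_i) / (1 + d)^(i - 0.5) for periods i = 1..n."""
    if len(revenues) != len(costs):
        raise ValueError("revenues and costs must cover the same periods")
    total = 0.0
    for i, (revenue, cost) in enumerate(zip(revenues, costs), start=1):
        # The exponent i - 0.5 discounts each cash flow from mid-period.
        total += (revenue - cost) / (1.0 + discount_rate) ** (i - 0.5)
    return total


# Hypothetical customer: three periods and a 10% discount rate.
clv = basic_clv([100.0, 120.0, 90.0], [40.0, 50.0, 30.0], 0.10)
print(round(clv, 2))
```

With a discount rate of zero the function reduces to the plain sum of per-period margins, which is a quick sanity check on any implementation.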
Due to this consideration, it is important to take potential behavior factors into account in CLV analysis. Schmittlein, Morrison, and Colombo (1987) proposed a model, called the Pareto/NBD model (or SMC model), built on the idea of each customer's future activity, that is, how many purchases can be expected from each customer during any future time period of interest. Schmittlein and Peterson (1994) extended the model to explicitly incorporate the dollar volume of past purchases and further combined a purchase volume model with the Pareto/NBD customer transaction/retention model. This factor is essential since it can predict a customer's future purchases and can be incorporated into CLV analysis. Indeed, Jain and Singh (2002) assert that this model provides a sophisticated way to obtain the probability of a customer being active in each time period. These probabilities can then be used to calculate CLV.
2.3. Data Mining Techniques
Shaw and his colleagues (2001) indicate that the knowledge about customers held in databases is critical for the marketing function, but much of this useful knowledge is hidden and untapped. Since databases contain so much data, it becomes almost impossible to manually analyze them for valuable decision-making information (Goebel and Gruenwald 1999). Under such circumstances, data mining tools can help uncover the hidden knowledge and understand customers better (Shaw et al. 2001; Yen and Lee 2006). Gerry and Chye (2002) argued that data mining provides the technology to analyze large volumes of data and/or detect hidden patterns in data to convert raw data into valuable information.
To discover hidden characteristics, we need to know the process of data mining (Hui and Jha 2000). Hui and Jha (2000) propose the following steps in the data mining process: (1) Establish mining goals. (2) Select data. (3) Preprocess data. (4) Transform data. (5) Combine data tables and project the data onto working spaces. (6) Store data. (7) Mine data. (8) Evaluate mining results. (9) Perform various operations such as knowledge filtering from the output, analyzing the usefulness of extracted knowledge, and presenting the results to the user for feedback. Our research extends this process to build a customer value discovery process which combines CLV analysis and data mining techniques.
2.4. CLV Analysis and Data Mining Techniques for CRM
CLV analysis and data mining techniques have a strong relationship with CRM, because CLV analysis is a systematic way to understand and evaluate a firm's relationship with its customers (Jain and Singh 2002) and data mining techniques can help uncover hidden knowledge and understand customers better (Shaw et al. 2001). Rygielski and his colleagues (2002) also indicate that data mining has made CRM a new area where firms can gain a competitive advantage, identify valuable customers, predict future behaviors, and make proactive, knowledge-driven decisions.
3. Conceptual Framework and Process
3.1 A Conceptual Framework of Discovering CCV and CPV
Based on the literature review, our research integrates CLV analysis and data mining techniques. The conceptual framework is displayed in Fig 1.
We evaluate CLV by considering two factors: customer current value (CCV) and customer potential value (CPV). SAS Institute Inc. (2006) indicates that to manage customers' value, it is important to understand both their current value and long-term potential value. Therefore, this study extends this concept and, based on the literature, develops a CLV model which combines both CCV and CPV to better understand customers' value. For the current value, Hwang and his colleagues (2004) developed a novel CLV model which considers CCV, CPV, and customer loyalty at the same time. Our research adopts the CCV part of their model, shown in Equation 2.
$$\mathrm{CLV}_i = \sum_{t_i=0}^{N_i} \pi_p(t_i)\,(1+d)^{N_i - t_i} \qquad (2)$$

$t_i$: service period index of customer $i$
$N_i$: total service period of customer $i$
$d$: interest rate
$\pi_p(t_i)$: past profit contribution of customer $i$ at period $t_i$

The sum of $\pi_p(t_i)(1+d)^{N_i - t_i}$ represents the net present value (NPV) of the past profit contribution, where $\pi_p(t_i)$ is the profit contribution of customer $i$ at period $t_i$ and $(1+d)^{N_i - t_i}$ is the interest rate factor, which transforms the past profit into the present value.
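The current value calculation of Equation 2 can be sketched as below, assuming the equation is read as each past profit compounded forward by (1 + d)^(N - t) and then summed. The function name `current_value`, the convention that list index 0 is the first service period, and the sample figures are assumptions made for illustration only.

```python
def current_value(past_profits, interest_rate):
    """Compound each past profit forward to the present and sum the results.

    past_profits[t] is the profit contribution at service period t, where
    t = 0 is the first period and the last index plays the role of N_i
    (this indexing is an assumption of the sketch).
    """
    last_period = len(past_profits) - 1
    return sum(
        profit * (1.0 + interest_rate) ** (last_period - t)
        for t, profit in enumerate(past_profits)
    )


# Hypothetical customer with four service periods and a 2% rate per period.
profits = [12.0, 0.0, 8.5, 20.0]
print(round(current_value(profits, 0.02), 2))
```

Under this reading, earlier profits receive a larger compounding factor than recent ones, so a customer with many early transactions tends to accumulate a higher current value.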
In the potential value section, a prediction for the potential value is obtained by two elements — expected customer active probability and expected purchase volume. Hence, we obtain the following equation to compute the potential value of customer i.
$$\text{Potential Value}_i = \mathrm{Prob(Active)}_i \times \mathrm{E(Volume)}_i \qquad (3)$$

Where $\mathrm{Prob(Active)}_i$ is the probability that customer $i$ will still be active in the next transaction period, and $\mathrm{E(Volume)}_i$ is the expected purchase volume of customer $i$ in the next transaction period. That is, the equation above gives the expected purchase volume from a particular customer who is still active for the company in the next transaction period.

[Fig 1. Conceptual framework: CLV analysis and data mining techniques are combined to discover CLV, CPV, and behavioral patterns, enhancing the efficiency of CRM.]
We replicated the estimation of the Pareto/NBD model used by Schmittlein, Morrison, and Colombo (1987) to obtain the necessary parameter estimates for this study and to calculate Prob(Active) for each customer (see Appendix 1). To predict E(Volume), we use the extended Pareto/NBD model developed by Schmittlein and Peterson (1994).
The data mining techniques used in this study are association rules, OLAP, decision trees, and cluster analysis, which are broadly used for discovering potential buying patterns, confirming customer characteristics, and segmenting customers.
3.2 The Process of Discovering Customer Value
Fig 2 shows the process of discovering customer value. This process differs from that of SAS Institute Inc. (2006) in that it contains more complete and detailed information on how to empirically calculate CPV and how to combine CLV and data mining techniques. It consists of six major steps: selecting the raw data; data preprocessing; CLV analysis; discovering preliminary CCV and CPV; applying data mining techniques; and discovering CCV and CPV and implementation. Our research follows this process in analyzing the empirical data.
3.3 Data Description
The raw data of this study are two years of transaction and socio-demographic data from a hypermarket. The dataset is composed of approximately 270,000 transaction records, account information for 8,295 customers, and over 1,500 product items.
3.4 Data Preprocessing
There are a number of tables in the database, but not all the data are related to the chosen purposes. After an initial selection, our research uses four tables to constitute the data structure (star schema). The data cleaning step removes noisy, erroneous, and incomplete data. All records with missing values are deleted to avoid problems in calculation. Additionally, some customers had never bought anything; these customers are not suitable for CLV analysis, so they are deleted as well. Our research uses SQL to process these data.
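The cleaning rules described above can be illustrated with a small sketch that runs SQL through Python's standard `sqlite3` module. The table names (`customer`, `sales`), the column names, and the sample rows are hypothetical, since the study's actual schema and SQL statements are not given; only the two rules (drop records with missing values, drop customers without purchases) come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, gender TEXT, yearly_income TEXT);
CREATE TABLE sales (customer_id INTEGER, product_id INTEGER, period INTEGER, profit REAL);
INSERT INTO customer VALUES (1, 'F', '30K-50K'), (2, 'M', NULL), (3, 'F', '50K-70K');
INSERT INTO sales VALUES (1, 208, 1, 2.5), (1, 695, 2, NULL), (2, 811, 1, 1.0);
""")

# Rule 1: delete records that contain missing values.
conn.execute(
    "DELETE FROM sales WHERE customer_id IS NULL OR product_id IS NULL "
    "OR period IS NULL OR profit IS NULL"
)
conn.execute("DELETE FROM customer WHERE gender IS NULL OR yearly_income IS NULL")
# Keep the two tables consistent: drop sales of customers that were removed.
conn.execute(
    "DELETE FROM sales WHERE customer_id NOT IN (SELECT customer_id FROM customer)"
)
# Rule 2: delete customers who never bought anything.
conn.execute(
    "DELETE FROM customer WHERE customer_id NOT IN (SELECT customer_id FROM sales)"
)
conn.commit()

remaining_customers = [
    row[0] for row in conn.execute("SELECT customer_id FROM customer ORDER BY customer_id")
]
remaining_sales = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
print(remaining_customers, remaining_sales)
```

In this toy data, customer 2 is removed for a missing income value and customer 3 for having no purchases, leaving one customer with one valid sales record.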
Some data need to be transformed into a specified format (defined during the construction of the database) for the mining tasks. For example, a new column "Expected_active_pro" (expected active probability) is created from the results of the Pareto/NBD model. This new attribute is used for the subsequent CPV analysis.
Fig 2. The Process of Discovering CCV and CPV
[Figure stages: Raw Data (socio-demographic data, transaction data); Data Preprocessing (selection of data, data cleaning); CLV Analysis (probability active and expected buying; customer current value); Discovering Preliminary CCV and CPV; Data Mining Techniques (clustering, decision tree, association rule, OLAP); Discovering CCV and CPV and Implementation (CRM strategy, visual display).]
4. Empirical study
After data preprocessing, in this section, we estimate and evaluate the model from the empirical data through CLV analysis and data mining techniques.
4.1 CLV Analysis for CCV
For the CLV analysis, according to Equation 2, each individual customer's profit in every period is discounted and summed to give that customer's current value. Table 1 lists, for the top 10 customers, the discounted current value, the total number of transactions, and the time of the first transaction. In the table, Customer No.6 has a high total number of transactions and an early first transaction, so his CCV is high (317.18). In contrast, Customer No.2's total number of transactions is low, so his CCV is low (1.43), even though his transaction took place early.
4.2 CLV Analysis for CPV
The calculation of CPV is divided into two parts: customer activity probability (Prob(Active)) and expected future purchase volume (E(Volume)). This study calculates Prob(Active) based on the Pareto/NBD model developed by Schmittlein, Morrison, and Colombo (1987).
After obtaining the necessary parameter estimates for this study, we found that α < β; therefore, all parameters are put into Case A2 (see Appendix 1). During the calculation, each customer's latest transaction time and total number of transactions must be determined, and the Gauss hypergeometric function is evaluated. The results are listed in Table 2 (top 10 customers only). Table 2 shows that the more recent the latest transaction (the bigger the t), the bigger the Prob(Active) tends to be. For instance, Customer No.2's latest transaction time is t = 1, so his Prob(Active) is as low as 0.021. On the other hand, Customer No.4's latest purchase time is t = 19, so his Prob(Active) is as high as 0.980.
Table 1 CCV analysis results
Customer ID   Total number of transactions   First-time transaction   CCV results
1             30                             4                        283.37
2             1                              1                        1.43
3             12                             7                        173.60
4             19                             15                       191.65
5             7                              16                       65.08
6             26                             1                        317.18
7             7                              19                       26.45
8             11                             7                        19.91
9             8                              8                        75.95
10            5                              13                       41.12
For the expected future purchase volume, we follow the calculation method proposed by Schmittlein and Peterson (1994). First, the overall average sale and variance across the entire transaction database are calculated; they are 6.54 and 11.992, respectively. Next, each customer's average sale, variance, and reliability coefficient are calculated. Because the variance for customers who transacted only once is zero, the mean of all customers' variances, 10.837, is used in its place for such customers. The results are shown in Table 3 (top 10 customers only). The results show that the smaller the variance, the more stable the expected purchase volume for the next period. For instance, Customer No.5's average sale is 5.74 with variance 2.466, so his E(Volume) is 5.751. By contrast, Customer No.3's average sale is 9.40 with variance 25.171, so his E(Volume) is 9.064.
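One way to read the role of the reliability coefficient is as a weight that pulls each customer's own average sale toward the overall average. The sketch below assumes this weighted-average form; the text does not state the formula explicitly, so treat it as an assumption, although it gives values close to the E(volume) column of Table 3 when fed the rounded table entries. The function name `expected_volume` is hypothetical, and the reliability coefficient is taken as given rather than estimated.

```python
def expected_volume(customer_mean, reliability, overall_mean):
    """Reliability-weighted average of a customer's mean sale and the overall mean."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    return reliability * customer_mean + (1.0 - reliability) * overall_mean


OVERALL_MEAN = 6.54  # overall average sale reported in the text

# (customer ID, average sales, reliability coefficient) from Table 3
for customer_id, mean_sale, reliability in [(1, 6.77, 0.982), (2, 1.08, 0.525)]:
    estimate = expected_volume(mean_sale, reliability, OVERALL_MEAN)
    print(customer_id, round(estimate, 3))
```

Under this reading, a customer with a single transaction (low reliability) is estimated mostly from the overall average, while a customer with a long, stable history is estimated mostly from his own average.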
Table 2 Prob(Active) analysis results
Customer ID   Latest transaction time (t)   F(a2,b2;c2;z2(t))   F(a2,b2;c2;z2(T))   Prob(Active)
1             16                            1.8109              1.652012            .183
2             1                             1.153               1.010496            .021
3             18                            1.224               1.297464            .961
4             19                            1.3663              2.032917            .980
5             16                            1.1342              1.232249            .811
6             18                            1.5784              1.680811            .927
7             19                            1.1123              1.112324            .980
8             18                            1.2021              1.253595            .962
9             12                            1.2129              1.131325            .184
10            13                            1.1113              1.07563             .508
Table 3 E(volume) analysis results
Customer ID   Average sales   Variance   Reliability coefficient   E(volume)
1             6.77            6.303      .982                      6.768
2             1.08            10.837*    .525                      3.672
3             9.40            25.171     .884                      9.064
4             5.93            9.761      .980                      5.938
5             5.74            2.466      .984                      5.751
6             6.27            10.253     .973                      6.276
7             6.34            6.976      .923                      6.351
8             4.45            5.939      .965                      4.520
9             6.27            19.188     .833                      6.316
10            5.30            3.691      .942                      5.368
After Prob(Active) and E(Volume) are obtained, CPV can be calculated based on Equation 3. The results are shown in Table 4 (top 10 customers only). Customer No.3's probability of re-transaction Prob(Active) and expected future purchase volume E(Volume) are both large, so his CPV is the largest (8.715). On the other hand, Customer No.2's Prob(Active) and E(Volume) are both small, so his CPV is the smallest (0.078).
4.3 Clustering Customers for Similar CCV and CPV
In the next step, the K-means method is applied to the CCV and CPV results to group customers. This method requires the number of clusters, k, to be specified in advance. After testing k = 1, 2, 3, 4, 5, we found that k = 3 gives the best clustering. The result is provided in Table 5. With regard to CCV, the cluster center point of cluster 3 is the highest, with a total of 303 customers; the second-highest cluster center point is that of cluster 2, with a total of 1,340 customers. Cluster 3 has the highest number of customers, totaling 6,652, yet its cluster center point is the lowest. With regard to CPV, cluster 3's cluster center point is the highest, with 3,175 customers; next is cluster 1, with 2,061 customers in all. The lowest cluster center point is that of cluster 2, with 3,059 customers.
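The clustering step can be sketched with a plain one-dimensional K-means (Lloyd's algorithm). This is only an illustration: the text does not say which software or initialization was used, and clustering CCV and CPV separately into three levels each is one reading that is consistent with the separate cluster sizes reported for the two values. The function name `kmeans_1d` and its deterministic starting centers are assumptions of the sketch.

```python
def kmeans_1d(values, k, iterations=100):
    """Lloyd's algorithm on scalars; returns (sorted centers, label per value)."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centers across the sorted data.
    centers = [ordered[(2 * j + 1) * len(ordered) // (2 * k)] for j in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for value in values:
            nearest = min(range(k), key=lambda j: abs(value - centers[j]))
            groups[nearest].append(value)
        # Keep the previous center if a group ends up empty.
        updated = [
            sum(group) / len(group) if group else centers[j]
            for j, group in enumerate(groups)
        ]
        if updated == centers:
            break
        centers = updated
    centers.sort()
    labels = [min(range(k), key=lambda j: abs(value - centers[j])) for value in values]
    return centers, labels


# CCV values of the ten customers listed in Table 1, grouped into three levels.
ccv = [283.37, 1.43, 173.60, 191.65, 65.08, 317.18, 26.45, 19.91, 75.95, 41.12]
centers, labels = kmeans_1d(ccv, k=3)
print([round(c, 2) for c in centers])
print(labels)  # 0 = lowest-value level, 2 = highest-value level
```

Because the centers are sorted before labeling, the label can be used directly as a low, middle, or high level for the value being clustered.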
Table 4 CPV analysis results
Customer ID   Prob(Active)   E(Volume)   CPV
1             .183           6.768       1.241
2             .021           3.672       0.078
3             .961           9.064       8.715
4             .980           5.938       6.132
5             .811           5.751       4.665
6             .927           6.276       5.822
7             .980           6.351       6.467
8             .962           4.520       4.352
9             .184           6.316       1.167
10            .508           5.368       2.727
4.4 Customer Value Matrix and 20:80 Rules
With the horizontal axis as CPV and the vertical axis as CCV, this study constructs a customer value matrix and assigns a name and number to each cluster, as shown in Fig 3. In the figure, cluster 9 has the highest CCV and CPV, so this group of customers is named golden customers. Cluster 7 has a high CCV but a low CPV, so this group of customers must be won back. Though cluster 3 has a low CCV, its CPV is high; for this reason, it is a customer group worthy of being developed. Cluster 1 has both a low CCV and a low CPV, so it is a nomadic customer group.
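The numbering of the matrix cells described above follows a simple pattern: clusters 1 to 3 share the low CCV level, 7 to 9 share the high CCV level, and within each row the number grows with CPV. A minimal sketch of that mapping is given below; the function name `matrix_cell` and the 0/1/2 level encoding are assumptions introduced for illustration.

```python
NAMED_CELLS = {
    1: "nomadic customers",
    3: "customers worthy of being developed",
    7: "customers needing to win back",
    9: "golden customers",
}


def matrix_cell(ccv_level, cpv_level):
    """Map CCV and CPV levels (0 = low, 1 = middle, 2 = high) to a cluster number 1..9."""
    if ccv_level not in (0, 1, 2) or cpv_level not in (0, 1, 2):
        raise ValueError("levels must be 0 (low), 1 (middle), or 2 (high)")
    # Rows are CCV levels and columns are CPV levels, numbered row by row.
    return 3 * ccv_level + cpv_level + 1


cell = matrix_cell(ccv_level=2, cpv_level=0)
print(cell, NAMED_CELLS.get(cell, "unnamed cell"))
```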
To understand the profitability and number of customers of each cluster, a cross-analysis was carried out; the result is shown in Table 6. The golden customer group accounts for a mere 3.1% of the total number of customers, but its profit makes up nearly 40%. The nomadic customer group accounts for 32.2% of the total number of customers, whereas its profit makes up merely 5.14%. Overall, the low-CCV customer groups (groups 1, 2, 3) account for 80% of customers, but their profit makes up only 16.86%. By contrast, the middle-CCV and high-CCV customer groups account for about 20% of customers, but their profit reaches 83%. This tendency corresponds to the 20/80 rule: the top 20% of customers by profit contribute as much as 80% to the enterprise, while the bottom 80% contribute a mere 20%. Beyond confirming the 20/80 rule, however, this study uses potential value to examine the strategic significance behind it. Some of the customers in the top 20% (clusters 7 and 4) have low future potential and must be given attention and won back. In the customer group with profits in the bottom 80%, some (cluster 3) may have great future purchase potential and are therefore worthy of being developed. Such a concept differs from past thinking, in which importance is placed only on customers with profits in the top 20% while customers in the bottom 80% are neglected.
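The cross-analysis amounts to computing, for each cluster, its share of customers and its share of total profit. A minimal sketch follows; the function name `cluster_shares` and the four sample records are hypothetical and do not reproduce the study's data.

```python
from collections import defaultdict


def cluster_shares(records):
    """records: iterable of (cluster, profit) pairs, one per customer.

    Returns {cluster: (share of customers, share of total profit)}.
    """
    counts = defaultdict(int)
    profits = defaultdict(float)
    for cluster, profit in records:
        counts[cluster] += 1
        profits[cluster] += profit
    n_customers = sum(counts.values())
    total_profit = sum(profits.values())
    if n_customers == 0 or total_profit <= 0:
        raise ValueError("need at least one customer and a positive total profit")
    return {
        cluster: (counts[cluster] / n_customers, profits[cluster] / total_profit)
        for cluster in sorted(counts)
    }


# Hypothetical data: one golden customer (cluster 9) and three nomadic ones (cluster 1).
shares = cluster_shares([(9, 80.0), (1, 5.0), (1, 5.0), (1, 10.0)])
for cluster, (customer_share, profit_share) in shares.items():
    print(cluster, f"{customer_share:.0%} of customers", f"{profit_share:.0%} of profit")
```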
Fig 3. Customer value matrix (vertical axis: Customer Current Value; horizontal axis: Customer Potential Value)
              Low-CPV                              Middle-CPV                    High-CPV
High-CCV      7. Customers needing to win back     8. Middle-CPV, High-CCV       9. Golden customers
Middle-CCV    4. Low-CPV, Middle-CCV               5. Middle-CPV, Middle-CCV     6. High-CPV, Middle-CCV
Low-CCV       1. Nomadic customers                 2. Middle-CPV, Low-CCV        3. Worthy of being developed customers
4.5 OLAP, Association Rules and Decision Tree with CPV
The aim of this research is to discover CCV and CPV in observed databases in order to better understand the different values of different customers and to develop new strategies to provide better service. In the previous sections, we used CLV analysis to calculate customers' value and grouped customers into clusters. In this section, OLAP, association rules, and a decision tree are used to create a customer profile of the golden customers (we selected the golden customers for analysis because they are the most important and valuable group). The purposes of these methods are to discover customer characteristics, potential buying patterns, cross-selling opportunities, and buying preferences, and to generate rules for predicting who the potential customers are.
4.5.1. OLAP with Golden Customers
For the demographic features of the golden customers, attributes stored in the demographic database, namely INCOME_BY_YEARLY, TOTAL_CHILDREN, GENDER, AGE, MARTIAL_STATUS, OCCUPATION, and EDUCATION_DEGREE, are chosen for multidimensional analysis (OLAP). The following attributes of the golden customers are obtained. In the age ranges 30-39, 60-69, and 80-89, females outnumber males, but in the range 70-79, males outnumber females. Customers with a yearly income of 30,000-50,000 contribute the most, and the percentages of married and single customers are almost the same. Most customers have a senior high school education, while customers with a graduate degree are the fewest. The customer group with four children has the highest value, followed by those with three children. Customers without children have the lowest contribution, followed by those with one child. In terms of occupation, most customers are professionals.
4.5.2 Association Rule with Golden Customers
Though the golden customers have purchase potential, the preferences, features, and patterns of that potential remain to be identified. Association rules were therefore used to find associations between purchases in order to understand the possibility of cross-selling. The total number of golden customers' transactions is 30,029. With support = 1.5% and confidence = 80%, the result is shown in Table 7. A total of 11 rules describing potential purchase patterns are obtained.
Six association rules in Table 7 reach 100% confidence. The product categories of association rule No.1 are dairy products and bread, implying that customers may eat bread with dairy products at a meal. In addition, the product categories of association rules No.2, 3, and 4 are beverage and prepared food, indicating the potential for selling beverages and prepared food together in the store. The product categories of association rule No.5 are prepared food and frozen food. Finally, the products of rule No.6 all belong to the same category, beverage.
Table 7 The results of association rules for golden customers, Support = 1.581%
No.   Head                                Implies   Body                                Confidence
1.    Pro. = [208] AND Pro. = [695]       ==>       Pro. = [811] AND Pro. = [958]       100%
2.    Pro. = [1454] AND Pro. = [215]      ==>       Pro. = [21] AND Pro. = [436]        100%
3.    Pro. = [1313] AND Pro. = [177]      ==>       Pro. = [1558] AND Pro. = [648]      100%
4.    Pro. = [1208] AND Pro. = [415]      ==>       Pro. = [454] AND Pro. = [672]       100%
5.    Pro. = [415] AND Pro. = [454]       ==>       Pro. = [1208] AND Pro. = [672]      100%
6.    Pro. = [1153] AND Pro. = [1189]     ==>       Pro. = [1196] AND Pro. = [1504]     100%
7.    Pro. = [1454] AND Pro. = [21]       ==>       Pro. = [215] AND Pro. = [436]       80%
8.    Pro. = [1313] AND Pro. = [1558]     ==>       Pro. = [177] AND Pro. = [648]       80%
9.    Pro. = [1023] AND Pro. = [1251]     ==>       Pro. = [519] AND Pro. = [746]       80%
10.   Pro. = [1153] AND Pro. = [1196]     ==>       Pro. = [1189] AND Pro. = [1504]     80%
11.   Pro. = [119] AND Pro. = [284]       ==>       Pro. = [528] AND Pro. = [611]       80%
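The support and confidence measures behind the rules in Table 7 can be illustrated with a short sketch. Support is the fraction of transactions containing every product in the rule, and confidence is the fraction of transactions containing the head that also contain the body. The function names and the four sample baskets are hypothetical; the product IDs are borrowed from rule No.1 only to make the example concrete, and the baskets are not the study's data.

```python
def support(transactions, items):
    """Fraction of transactions that contain every product in `items`."""
    wanted = frozenset(items)
    hits = sum(1 for basket in transactions if wanted <= basket)
    return hits / len(transactions)


def confidence(transactions, head, body):
    """Among transactions containing `head`, the fraction also containing `body`."""
    head_set = frozenset(head)
    both = head_set | frozenset(body)
    head_hits = sum(1 for basket in transactions if head_set <= basket)
    if head_hits == 0:
        return 0.0
    both_hits = sum(1 for basket in transactions if both <= basket)
    return both_hits / head_hits


# Hypothetical baskets of product IDs.
baskets = [
    frozenset({208, 695, 811, 958}),
    frozenset({208, 695, 811, 958, 21}),
    frozenset({208, 21}),
    frozenset({436}),
]
rule_support = support(baskets, {208, 695, 811, 958})
rule_confidence = confidence(baskets, head={208, 695}, body={811, 958})
print(rule_support, rule_confidence)
```

A rule would be kept only if both values reach the chosen thresholds, which in the study are 1.5% for support and 80% for confidence.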
4.5.3. Decision Tree with Golden Customers
This study adopts a decision tree to derive rules that predict golden customers, which may help in understanding new customers. Table 8 lists the rules most likely to identify a golden customer, selected by keeping rules with a probability of 85% or above and sorting them by probability in descending order. This produced 9 decision rules.
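The rule-selection step described above (keep rules at 85% or above, sort in descending order) can be sketched as follows. Note that this is not a decision tree learner: it simply groups customers by a fixed set of attributes, treats each group like a leaf, and computes the share of golden customers in it. The function name `select_rules`, the attribute names, and the sample customers are hypothetical.

```python
from collections import defaultdict


def select_rules(customers, attributes, threshold=0.85):
    """Group customers by attribute values and keep groups with a high golden share.

    Returns a list of (conditions, probability) sorted by probability, descending.
    """
    totals = defaultdict(int)
    golden = defaultdict(int)
    for customer in customers:
        key = tuple(customer[name] for name in attributes)
        totals[key] += 1
        if customer["golden"]:
            golden[key] += 1
    rules = []
    for key, total in totals.items():
        probability = golden[key] / total
        if probability >= threshold:
            rules.append((dict(zip(attributes, key)), probability))
    return sorted(rules, key=lambda rule: rule[1], reverse=True)


# Hypothetical customers: 9 of 10 managers aged 70-79 are golden, 1 of 2 clerks is.
sample = (
    [{"age": "70-79", "occupation": "Management", "golden": True}] * 9
    + [{"age": "70-79", "occupation": "Management", "golden": False}]
    + [{"age": "30-39", "occupation": "Clerical", "golden": True}]
    + [{"age": "30-39", "occupation": "Clerical", "golden": False}]
)
selected = select_rules(sample, ["age", "occupation"])
for conditions, probability in selected:
    print(conditions, f"{probability:.2%}")
```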
5. Conclusions
Knowledge of the CCV and CPV of individual customers can help organizations segment markets, specify marketing mix elements, and allocate marketing resources in a way that returns high levels of profit. In today's competitive commercial environment, businesses are increasingly capable of storing large amounts of data in their databases, so such knowledge becomes more and more important. How to obtain knowledge of customers' CCV and CPV therefore becomes an essential issue.
This study proposes combining CLV analysis and data mining techniques. While SAS Institute Inc. has proposed a similar concept, this study further analyzes transaction data and socio-demographic data and empirically identifies nine customer segments based on CCV and CPV, together with their purchase patterns and characteristics, by using data mining techniques. The 20/80 rule behind the nine customer segments is also obtained. The entire operating procedure may be helpful for enhancing CRM.
After the analysis, this study constructs a customer value matrix, identifies the attributes of the 20/80 rule, and further examines the implications of the 20/80 rule. That is, among the customer groups (clusters 4, 5, 6, 7, 8, 9) with profits in the top 20%, 3.1% of the customers belong to the high-CPV and high-CCV "golden customer group". This type of customer makes the greatest contribution to the company. However, managers should be aware of some important issues with these clusters. While the value provided by these groups is the highest, clusters 4 and 7 do not have great potential or loyalty in the future. Therefore, managers should pay attention to these two clusters by understanding their purchasing tendencies, and use different marketing strategies to win them back. This concept is crucial for CRM: if the same marketing strategy is applied to all of them, the customer groups in the top 20% will not be managed effectively. Understanding which customers are most important and which must be won back benefits the deployment of marketing resources.
Table 8 The results of analysis of Decision Tree for Golden Customers
Rules: IF ( ), THEN Golden Customers                                                                                                                  Probability
Age = 70-79 and Annual Income = $70,000 - $90,000 and Education = College and Occupation = Management                                                 93.18%
Age = 40-49 and Gender = F and Annual Income = $130,000 - $150,000 and Education = High School Degree and Occupation = Management                     90.48%
Age = 30-39 and Gender = M and Annual Income = $70,000 - $90,000 and Education = High School and Occupation = Management                              89.93%
Age = 80-89 and Gender = F and Annual Income = $130,000 - $150,000 and Education = Graduate Degree and Occupation = Management                        89.29%
Age = 50-59 and Gender = F and Annual Income = $90,000 - $110,000 and Education = High School Degree and Occupation not = Management                  88.06%
Age = 60-69 and Gender = F and Annual Income = $150,000 + and Education = High School Degree and Occupation = Professional                            87.31%
Age = 80-89 and Gender = F and Annual Income = $90,000 - $110,000 and Education = Graduate Degree                                                     86.89%
Age = 60-69 and Gender = F and Annual Income = $70,000 - $90,000 and Education = Partial High School and Occupation not = Management                  85.93%
Age = 40-49 and Gender = F and Annual Income = $30,000 - $50,000 and
Earlier CRM thinking also held that the contribution of the bottom 80% of customers is not high and that it is not worthwhile for managers to spend time and effort on them, so little marketing effort should be invested in them. This study, however, finds that customers in the bottom 80% are not necessarily unworthy of development. Instead, there is one group to which attention must be paid: the "worthy-of-being-developed" group. While this group currently does not contribute very much, it has great potential and could make a great contribution to the company in the future if managers use the right strategy for it. Hence, it is worthwhile to allocate marketing resources to this type of customer.
Finally, this study applies data mining techniques, including association rules, OLAP, and decision trees, to the golden customers in order to uncover their behavioral patterns.
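As a rough illustration of how a probability for an IF-THEN rule of the kind reported for golden customers could be obtained, the sketch below computes the share of golden customers among the records that match a rule's conditions (the rule's confidence). This is only one plausible reading of the reported probabilities. The records, the field names, and the `rule_confidence` helper are hypothetical and are not taken from the study's data or software.

```python
from typing import Dict, List, Optional

# Hypothetical customer records; field names and values are illustrative only.
customers: List[Dict[str, object]] = [
    {"age": "40-49", "gender": "F", "occupation": "Management", "golden": True},
    {"age": "40-49", "gender": "F", "occupation": "Management", "golden": True},
    {"age": "40-49", "gender": "F", "occupation": "Management", "golden": False},
    {"age": "30-39", "gender": "M", "occupation": "Professional", "golden": False},
]


def rule_confidence(records: List[Dict[str, object]],
                    conditions: Dict[str, object]) -> Optional[float]:
    """Fraction of records matching all conditions that are golden customers."""
    matched = [r for r in records
               if all(r.get(key) == value for key, value in conditions.items())]
    if not matched:
        return None  # the rule covers no records, so no probability is defined
    return sum(1 for r in matched if r["golden"]) / len(matched)


# Hypothetical rule: IF age, gender, and occupation match, THEN golden customer.
rule = {"age": "40-49", "gender": "F", "occupation": "Management"}
confidence = rule_confidence(customers, rule)
print(f"IF {rule} THEN golden customer: {confidence:.2%}")
```

A rule that matches no record returns `None` rather than zero, so that an unsupported rule is not mistaken for a rule with a low probability.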
5.1. Managerial Implication
On the basis of the 9 clusters found in our study, we build a pyramid-type customer value structure, as shown in Fig 4. The color changes from light to dark from the bottom up, representing the level of contribution of existing value. The clusters at the bottom are groups of customers with low current value. While there are many customers in these groups, they contribute little to the company's profit (about 20% of the total profit); hence, their color is lighter. Above them is the second layer in terms of contribution of existing value, and the top layer has the highest contribution of existing value. The middle axis (inside the pyramid, with an up arrow) is called the CRM core axis, and the customer groups in contact with it form the core of the company's operations; the higher a group lies, the more important it is. These customers all have potential and deserve close care. The farther a group lies from the axis (toward the left or the right), the less important it is.
Customer groups lying in the axis area are as follows: "golden customers", "high-CPV and low-CCV", "middle-CPV and high-CCV", "customers needing to win back", and "worthy-of-being-developed". For these clusters, a data mining model should be used to further examine their potential qualities. For the golden customer group in particular, a sound marketing solution and versatile services should be provided, with retention of customer relations as the goal. For the "customers needing to win back" group, in-depth interviews are needed, and every effort should be made to find the possible causes of the group's leaving, make improvements, and win them back. The "worthy-of-being-developed" group needs care, since its possible future contribution cannot be neglected, and a marketing solution should be created for it. For the "high-CPV and low-CCV" and "middle-CPV and high-CCV" groups, apart from retaining relations, consideration should be given to how to enhance the level of relations. Customers lying away from the axis are: "low-CPV and middle-CCV", "nomadic customers", "middle-CPV and middle-CCV", and "middle-CPV and low-CCV". Investment in these customer groups should be avoided as much as possible. Except for the "middle-CPV and middle-CCV" and "low-CPV and middle-CCV" groups, whose relations need to be retained, investment of resources in the other groups must be reduced to avoid waste. The nomadic cluster in particular purchased only once or twice, and its future potential is quite low; future investment in this group should therefore be avoided. On the whole, the company should strengthen its monitoring and creation of customer value and enhance its management of customer potential and loyalty to increase the effectiveness of CRM.
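The grouping described above can be summarized as a small lookup table. The sketch below is only one hypothetical encoding: the group names and their position relative to the CRM core axis follow the text, while the `GROUPS` structure, the one-line action summaries, and the `recommended_action` helper are assumptions introduced for illustration.

```python
from typing import Dict, Tuple

# group name -> (lies on the CRM core axis?, action summarized from the text)
GROUPS: Dict[str, Tuple[bool, str]] = {
    "golden customers": (True, "retain relations with sound marketing and versatile services"),
    "customers needing to win back": (True, "interview, find causes of leaving, improve, win back"),
    "worthy-of-being-developed": (True, "create a marketing solution for future contribution"),
    "high-CPV and low-CCV": (True, "retain relations and enhance their level"),
    "middle-CPV and high-CCV": (True, "retain relations and enhance their level"),
    "middle-CPV and middle-CCV": (False, "retain relations, avoid investment where possible"),
    "low-CPV and middle-CCV": (False, "retain relations, avoid investment where possible"),
    "middle-CPV and low-CCV": (False, "reduce investment of resources"),
    "nomadic customers": (False, "avoid future investment"),
}


def recommended_action(group: str) -> str:
    """Return a one-line summary of the axis position and action for a group."""
    on_axis, action = GROUPS[group]
    position = "on the CRM core axis" if on_axis else "away from the axis"
    return f"{group} ({position}): {action}"


for name in GROUPS:
    print(recommended_action(name))
```

Keeping the axis flag separate from the action text makes it easy to list, for example, only the groups that merit further data mining.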
5.2. Limitations and Suggestions for Future Research
In this paper, we propose a conceptual framework and a process for integrating CLV analysis and data mining techniques. Currently, few CRM systems conduct clustering based on customer profits; thus, future studies may build a CRM system based on our model. Additionally, this study places more emphasis on building a CCV and CPV procedure than on comparison with different methodologies. While we employ a common model to calculate CCV and CPV, other methodologies could also be used to analyze customer value, and one of them may yield more accurate information about customers' potential buying patterns. Future studies can therefore compare different methodologies with ours and, in turn, identify the most suitable methodology for enhancing the effectiveness of CRM.
Appendix 1 The Probability that a Given Customer is Still Active
Case 1: α > β. The probability that a customer is still active, given an observed history of X = x purchases in time (0,T) since trial, with the most recent purchase at time t, is
$$
P[\text{Active} \mid \gamma, s, \alpha > \beta, X = x, t, T] = \left\{ 1 + \frac{s}{\gamma + x + s} \left[ \left( \frac{\alpha + T}{\alpha + t} \right)^{\gamma + x} \left( \frac{\beta + T}{\alpha + t} \right)^{s} F\big(a_1, b_1; c_1; z_1(t)\big) - \left( \frac{\beta + T}{\alpha + T} \right)^{s} F\big(a_1, b_1; c_1; z_1(T)\big) \right] \right\}^{-1} \tag{A1}
$$

where $a_1 = \gamma + x + s$; $b_1 = s + 1$; $c_1 = \gamma + x + s + 1$; $z_1(y) = \dfrac{\alpha - \beta}{\alpha + y}$.

Case 2: α < β

$$
P[\text{Active} \mid \gamma, s, \alpha < \beta, X = x, t, T] = \left\{ 1 + \frac{s}{\gamma + x + s} \left[ \left( \frac{\alpha + T}{\beta + t} \right)^{\gamma + x} \left( \frac{\beta + T}{\beta + t} \right)^{s} F\big(a_2, b_2; c_2; z_2(t)\big) - \left( \frac{\alpha + T}{\beta + T} \right)^{\gamma + x} F\big(a_2, b_2; c_2; z_2(T)\big) \right] \right\}^{-1} \tag{A2}
$$

where $a_2 = \gamma + x + s$; $b_2 = \gamma + x$; $c_2 = \gamma + x + s + 1$; $z_2(y) = \dfrac{\beta - \alpha}{\beta + y}$.

Case 3: α = β

$$
P[\text{Active} \mid \gamma, s, \alpha = \beta, X = x, t, T] = \left\{ 1 + \frac{s}{\gamma + x + s} \left[ \left( \frac{\alpha + T}{\alpha + t} \right)^{\gamma + x + s} - 1 \right] \right\}^{-1} \tag{A3}
$$

In (A1) and (A2), F(a, b; c; z) is the Gauss hypergeometric function.

References
1. Bauer, H. H., Hammerschmidt, M., and Brahler, M. (2003), "The customer lifetime value concept and its contribution to corporate," Yearbook of Marketing and Consumer Research, 1, 47-67.
2. Berry, M.J.A., & Linoff, G. (1997), Data mining techniques for marketing, sales, and customer support: Wiley, New York.
3. Frawley, William J., Gregory Piatetsky-Shapiro, and Christopher J. Matheus (1992), "Knowledge discovery in databases: an overview," AI Magazine, 13 (3), 57-70.
4. Gerry, Chan Kin Leong and Koh Hian Chye (2002), "Data mining and customer relationship marketing in the banking industry," Singapore Management Review, 24 (2), 1-27.
5. Goebel, Michael and Le Gruenwald (1999), "A survey of data mining and knowledge discovery software tools," ACM SIGKDD Explorations, 1, 1-20.
6. Gupta, Sunil and Donald R. Lehmann (2003), "Customers as assets," Journal of Interactive Marketing, 17 (1), 9–24.
7. Guru, Clin, Ashok Ranchhod, and Ray Hackney (2003), "Customer-centric strategic planning: integrating CRM in online business systems," Information Technology and Management, 4, 199-214.
8. Haenlein, Michael, Andreas M. Kaplan, and Anemone J. Beeser (2007), "A Model to Determine Customer Lifetime Value in a Retail Banking Context," European Management Journal, 25 (3), 221-34.
9. Hui, S. C. and G. Jha (2000), "Data mining for customer service support," Information & Management, 38, 1-13.
10. Hwang, Hyunseok, Taesoo Jung, and Euiho Suh (2004), "An LTV model and customer segmentation based on customer value: a case study on the wireless telecommunication industry," Expert Systems with Applications, 26, 181-88.
11. Jain, D. and S. S. Singh (2002), "Customer Lifetime Value Research in Marketing: A Review and Future Directions," Journal of Interactive Marketing, 16, 34-46.
12. Sun, Jie and Hui Li (2008), "Data mining method for listed companies' financial distress prediction," Knowledge-Based Systems, 21, 1-5.
13. Kim, B. and S. Kim (1999), "Measuring up-selling potential of life insurance customers: application of a stochastic frontier model," Journal of Interactive Marketing, 13 (4), 2-9.
14. Kim, J., E. Suh, and H. Hwang (2003), "A model for evaluating the effectiveness of CRM using the balanced scorecard," Journal of Interactive Marketing, 17, 5-19.
15. Mulhern, Francis J. (1999), "Customer profitability analysis: measurement, concentration, and research directions," Journal of Interactive Marketing, 13, 25-40.
16. Pfeifer, Phillip E., Mark E. Haskins, and Robert M. Conroy (2005), "Customer Lifetime Value, Customer Profitability, and the Treatment of Acquisition Spending," Journal of Managerial Issues, 17, 11-25.
17. Ryals, L. (2002), "Are your customers worth more than money?" Journal of Retailing and Consumer Services, 9, 241-51.
18. Rygielski, Chris, Jyun-Cheng Wang, and David C. Yen (2002), "Data mining techniques for customer relationship management," Technology in Society, 24 (4), 483-502.
19. SAS (2006), "Customer value management: are you customer-focused or customer-obsessed? balancing customer and shareholder value," SAS White Paper,
http://www.sas.com/offices/asiapacific/taiwan/images/gary_notes/customer_value_management.pdf, (Retrieved July 2009).
20. Schmittlein, David C., Donald G. Morrison, and Richard Colombo (1987), "Counting your customers: who are they and what will they do next?," Management Science, 33, 1-24.
21. Schmittlein, David C. and Robert A. Peterson (1994), "Customer base analysis: an industrial purchase process application," Marketing Science, 13, 41-67.
22. Shaw, Michael J., Chandrasekar Subramaniam, Gek Woo Tan, and Michael E. Welge (2001), "Knowledge management and data mining for marketing," Decision Support Systems, 31, 127-37.
23. Winer, R. S. (2001), "A framework for customer relationship management," California Management Review, 43, 89-105.
24. Yen, Show-Jane and Yue-Shi Lee (2006), "An efficient data mining approach for discovering interesting knowledge from customer transactions," Expert Systems with Applications, 30, 650-57.
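
Supplementary sketch for Appendix 1. The active-probability formulas (A1) to (A3) can be evaluated numerically along the following lines. This is a minimal sketch under stated assumptions, not the implementation used in the study: the function names `hyp2f1` and `p_active` and the parameter values in the usage lines are hypothetical, the symbol γ is written as `gamma`, and the Gauss hypergeometric function is approximated by a plain power series, which is adequate only for |z| < 1 (the case for z(y) in (A1) and (A2) when α, β > 0 and y ≥ 0).

```python
def hyp2f1(a: float, b: float, c: float, z: float,
           tol: float = 1e-12, max_terms: int = 10_000) -> float:
    """Gauss hypergeometric function F(a, b; c; z) via its power series (|z| < 1)."""
    if abs(z) >= 1:
        raise ValueError("the power series converges only for |z| < 1")
    term, total = 1.0, 1.0
    for n in range(max_terms):
        # Ratio of consecutive series terms: (a+n)(b+n) / ((c+n)(n+1)) * z
        term *= (a + n) * (b + n) / ((c + n) * (n + 1)) * z
        total += term
        if abs(term) < tol * abs(total):
            return total
    raise ArithmeticError("hypergeometric series did not converge")


def p_active(gamma: float, alpha: float, s: float, beta: float,
             x: int, t: float, T: float) -> float:
    """Probability that a customer is still active, following (A1) to (A3)."""
    rx = gamma + x   # exponent gamma + x
    a = rx + s       # a = gamma + x + s (same in both cases)
    c = a + 1        # c = gamma + x + s + 1 (same in both cases)
    if alpha == beta:
        # Case 3 (A3): no hypergeometric term is needed
        bracket = ((alpha + T) / (alpha + t)) ** a - 1
    elif alpha > beta:
        # Case 1 (A1): b = s + 1, z(y) = (alpha - beta) / (alpha + y)
        b = s + 1
        z_t = (alpha - beta) / (alpha + t)
        z_T = (alpha - beta) / (alpha + T)
        bracket = (((alpha + T) / (alpha + t)) ** rx
                   * ((beta + T) / (alpha + t)) ** s * hyp2f1(a, b, c, z_t)
                   - ((beta + T) / (alpha + T)) ** s * hyp2f1(a, b, c, z_T))
    else:
        # Case 2 (A2): b = gamma + x, z(y) = (beta - alpha) / (beta + y)
        b = rx
        z_t = (beta - alpha) / (beta + t)
        z_T = (beta - alpha) / (beta + T)
        bracket = (((alpha + T) / (beta + t)) ** rx
                   * ((beta + T) / (beta + t)) ** s * hyp2f1(a, b, c, z_t)
                   - ((alpha + T) / (beta + T)) ** rx * hyp2f1(a, b, c, z_T))
    return 1.0 / (1.0 + s / a * bracket)


# Hypothetical parameter values, chosen only to exercise the three cases.
for alpha, beta in [(12.0, 8.0), (8.0, 12.0), (10.0, 10.0)]:
    p = p_active(gamma=0.5, alpha=alpha, s=0.6, beta=beta, x=3, t=20.0, T=40.0)
    print(f"alpha={alpha}, beta={beta}: P[Active] = {p:.4f}")
```

Two quick sanity checks follow from the formulas themselves: when the most recent purchase coincides with the end of the observation period (t = T), the bracketed term vanishes and the probability equals 1; and as α approaches β, the results of (A1) and (A2) approach that of (A3). A library routine for the hypergeometric function would be preferable in practice when z is close to 1, where the series converges slowly.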