Artificial Neural Network Based Data Mining Technique for Customer Classification for Market Forecasting


Velu C. M.¹, Kashwan K. R.²*

¹ Department of Computer Science and Engineering, Dattakala Group of Institution, Swami Chincholi, Daund, Pune District-413130, Maharashtra, INDIA.

²* Department of Electronics and Communication Engineering, Sona College of Technology (An Autonomous Institution Affiliated to Anna University), Salem - 636005, TN, INDIA

* Corresponding author

Abstract

This research work investigates market forecasting for sales using data mining techniques. It attempts to classify customers into different groups according to their commodity-buying behavior observed over a long period of time. The groups are based on the purchase of a specific brand or type of commodity, the buying frequency of the commodity, and the preference for a commodity when similar commodities are available. The research also classifies customers as regular or random visitors. This helps to treat customers in different ways, for example with loyalty bonuses and special discounts. Moreover, it helps to predict commodity movements for advance procurement and inventory management. The grouping of customers is achieved by developing an online, intelligent-computation-based data mining system that automatically updates the database with each new customer purchase. The variables considered for grouping customers are the frequency of commodity buying, brand preference, and regular or random visits. The computing system is built using an intelligent model based on Artificial Neural Networks (ANN). The experimental data consist of customer transaction records. The validity of the system is cross-checked by a real-time survey that collects actual customer transaction records: customers answer questionnaires at the time of purchase, and the results are then compared with the computed forecasts. The results indicate that more than 99% of the sales forecast is met. Similarly, customer classification accuracy is estimated at 99.97% for high valued, 99.91% for medium valued and 99.94% for low valued customers. The system employs a supervised learning environment of neural networks. Data dimensions are reduced by Principal Component Analysis (PCA) to remove redundant information, which has reduced training time, as online updates require faster data processing.

Keywords: Data Mining, Artificial Neural Networks, PCA, Market Forecasting, Customer Mapping

1. Introduction

Performance-based marketing approaches are considered more effective for prediction. The challenge lies in choosing the most relevant metrics for a forecasting model, since the key metrics should produce results as close as possible to the forecast value. Artificial Neural Network (ANN) based data mining techniques are used for market segmentation. Data mining is also a comparatively more intelligent technique, based on software tools that collect the transaction details of customers in order to analyze and forecast sales over a period of time [1]. Proactive forecasting may improve returns, as it also reduces overheads such as inventory holding costs or out-of-stock penalties. It is wise to combine the traditional skills of statistical modeling with data mining techniques enabled by Information and Communication Technology (ICT) networking [2]. Using ICT for statistical models results in greater robustness and faster computation. Statistical models are more computation-oriented and need lengthy data analysis before any meaningful information is derived. Data mining techniques are quite often used for customer segmentation to forecast business-to-customer metrics [3]. One example is the airline industry, which uses them to retain customers and attract new ones because the business is highly volatile and complex [4]. Another example is the banking sector, where data mining models are used for customer segmentation. This helps in long-term planning strategy and in the retention of existing customers. The banking sector requires both the retention of existing customers and the expansion of its customer base for continued growth [5]. New statistical models developed using ICT and data mining techniques for the most relevant metrics can be useful for the banking sector [6]. Association rule mining has been used as a technique for knowledge discovery from huge databases in many applications [7]. It can be used successfully for business forecasting, as it involves retrieving relevant information from a database of customer transactions [8].

Data mining techniques are normally used for knowledge extraction and for web searching over huge collections of information. The purpose may be to improve the information management system so that customers are given better buying options [9]. It improves customer relationship management by retaining old customers and adding new ones. The model can derive information about customer preferences for buying various household items. Without ICT, customer preferences can be found by conducting surveys, which may be a time-consuming and complex process. Data mining techniques can instead use information taken directly from customer transaction and purchase records and analyze it with feature selection algorithms to extract knowledge [10]. The knowledge so obtained can be used to forecast sales and trends in the near future. Customer relationship management also concentrates on determining customer lifetime value, since many organizations would like to know the returns from their customers; this can be further used for long-term planning and forecasting [11]. Many statistical prediction models are limited to a few cases, whereas data mining techniques are more general and can be adapted to any case with small modifications. They also have higher accuracy in predicting customer lifetime value [12]. Many recent research works, such as heart registration-based segmentation, have used the data mining concept of 3D data visualization, which can be useful in decision making and in understanding complex business issues [13]. Three-dimensional analysis can be applied to slice, rotate and zoom in on data projections to obtain minute details by visual perception. Business management for marketing and decision making can be easily represented by customer knowledge management, as presented in [14] and [15]. Song et al. implemented a new method to determine dynamic, ever-changing customer behavior from customer sales data [16].
The research described in [17] amply demonstrates that it is possible to monitor changes among customers and subsequently to formulate the corresponding rules for future prediction. In recent times, e-commerce has grown to very large volumes. This has the advantage that all transactions are already recorded in computing systems, so it becomes very easy to collect large volumes of data [18]. Data analysis and knowledge extraction become an almost automatic exercise for such a system.

2. Market Prediction Metrics for Data Mining

The forecasting metrics are mainly used in monthly sales projections and revenue predictions. The customer value matrix is widely used to analyze customer value based on the customer database; it was proposed in a pioneering work by Marcus [19]. The stock exchange is the most dynamic marketplace, and data mining is very useful and relevant there because huge volumes of data are handled automatically, online and in real time, by networked computing systems [20]. The main technique used for stock market prediction may be clustering, which is simple and fast for most computational requirements [21]. The customer value matrix is a good resource for most customer information. This may include earnings, fashion or style, buying habits, use of electronic multimedia, gender, age group, customer needs, reading access, choice of style, preference for products with current technology and so on. The parameter metrics are divided into different groups, and customers are then classified into one of the groups based on their purchase records. The ANN-based computational outcomes are used to make decisions based on transaction information available online and in real-time marketing. The matrix is formed by making the information compatible with the software used for classification: a 1 is chosen for a conforming parameter and a 0 for a nonconforming parameter. In the next step, ANN-based intelligent data mining techniques are applied to find patterns of information, if any, in the value matrix. The whole operation is done automatically. New data are added online at every event, and the classification is updated instantaneously. Algorithm-based analysis is also done immediately, as the whole system runs in real time and is connected online through local networks.
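The binary encoding of the customer value matrix can be illustrated with a short sketch. It is not the authors' implementation: the attribute names in `ATTRIBUTES` are hypothetical (loosely based on the attributes listed above), and each customer profile is assumed to be a dictionary of True/False flags.

```python
# Hypothetical attribute names, loosely based on the customer information listed above.
ATTRIBUTES = ["branded_items", "uses_internet", "current_technology",
              "fashion_style", "reads_books"]


def encode_customer(profile: dict) -> list[int]:
    """Map one customer's attribute flags to a 0/1 row of the value matrix.

    A conforming (present) attribute becomes 1; a nonconforming or
    missing attribute becomes 0.
    """
    return [1 if profile.get(name, False) else 0 for name in ATTRIBUTES]


def build_value_matrix(profiles: list[dict]) -> list[list[int]]:
    """Stack the encoded rows of all customers into one matrix."""
    return [encode_customer(p) for p in profiles]


customers = [
    {"branded_items": True, "uses_internet": True, "reads_books": True},
    {"fashion_style": True},
    {},
]
matrix = build_value_matrix(customers)
for row in matrix:
    print(row)
# [1, 1, 0, 0, 1]
# [0, 0, 0, 1, 0]
# [0, 0, 0, 0, 0]
```

Missing attributes are treated as nonconforming (0), so every row has the same length however much is known about a customer.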

Figure 1 shows a basic ANN-based computational model. Its input layer of neurons receives inputs directly from the computing network as transactions occur in the market in real time.

The second layer, called the hidden layer, performs a computation based on the mathematical function selected for the purpose. Many such functions can be used, for example logarithmic, sigmoid or probabilistic functions; these are also called threshold functions. During training, the neural network updates its weight matrix each time a training sample is presented, and training continues until the weights cease to change. This condition is called convergence. Training time is often critical, because data processing speed is very important for an online system that must update its results as transactions occur in real time. The output neuron simply produces the classification result: it assigns the input to a class if the input belongs to that class and otherwise discards it. More than one output neuron may be used when the output has to be classified into more than one class.
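A forward pass through the three-layer model of Figure 1 can be sketched as follows. This is a minimal illustration under stated assumptions: the sigmoid is chosen from the threshold functions named above, each neuron is given a bias term, the weights are hypothetical values picked by hand rather than trained, and each output is thresholded at 0.5 to decide class membership.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic sigmoid, one of the threshold functions mentioned above."""
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weights):
    """Apply one fully connected sigmoid layer.

    Each weight row holds one weight per input, followed by a bias term.
    """
    return [sigmoid(sum(w * v for w, v in zip(row, inputs)) + row[-1])
            for row in weights]


def classify(x, w_hidden, w_out, threshold=0.5):
    """Input layer -> hidden layer -> output neuron(s), then threshold each output."""
    outputs = layer(layer(x, w_hidden), w_out)
    return [1 if o >= threshold else 0 for o in outputs]


# Hypothetical weights: 3 inputs (e.g. New Customer, Life style, Brand),
# 2 hidden neurons and 1 output neuron.
W_HIDDEN = [[1.0, 2.0, 2.0, -1.0],
            [-1.0, 1.0, 1.0, 0.0]]
W_OUT = [[3.0, 1.0, -2.0]]

print(classify([1, 1, 1], W_HIDDEN, W_OUT))  # [1]
print(classify([0, 0, 0], W_HIDDEN, W_OUT))  # [0]
```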

Figure 1. Basic ANN based computational model (input neurons: New Customer, Life style, Brand; hidden neurons; output neuron: Classification)

3. Customer Segmentation and Classification

A simplified ANN-based model, as shown in Figure 1, is used for customer segmentation and classification. The model is simple and fast for data mining and can be used to find patterns, associations and correlations in the customer database. The technique converges rapidly compared to other software-based techniques, and it can handle many attributes. It needs only a reasonable sample of customer transactions from the billing system. The system developed here is programmed using neural network algorithms; the computation runs automatically, and the customer data are updated for the next computation. For this system, it is most important that all applicable rules are found with the help of ANN-based computation. The final computation aims at finding the attributes that contribute most to classifying customers into different groups. Once the clusters are formed and all customer records are exhausted, the result forms the database for the next segmentation operation. Since customer behavior changes a great deal, the computation follows the dynamic nature of patterns in the database, and customer segmentation is repeated at multiple levels for a better classification result. The ANN-based classification of customers works according to rules based on the following principles:

• Attributes have an inherent hierarchy and thus can be classified by ANN architectures.

• Attributes with low values contribute little and thus can be discarded.

• Rules for finding attributes are chosen to suit different applications.

• The dimensions of transaction data can be reduced by using Principal Component Analysis (PCA) so that only the dominant attributes are represented. This reduces algorithm execution time drastically.

• Feedback can be incorporated between the levels of multi-level mining to improve accuracy; its main objective is to modify the weight matrix.

• The classification can be in terms of multiple classes or only one class of interest. Attributes that do not belong to the class of interest are discarded.

• The classification results are compared with historical results and previously forecast results for validation; if they do not agree, corrections are applied.

• The model is repeated for the next round of computation.
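The principle that low-value attributes can be discarded does not say how an attribute's contribution is measured. The sketch below uses column variance as one simple stand-in (an assumption, not the authors' criterion): an attribute that is almost always 0 or almost always 1 cannot separate customers. The attribute names and the `min_variance` cut-off are hypothetical.

```python
def column_variance(matrix, j):
    """Population variance of attribute column j across all customers."""
    col = [row[j] for row in matrix]
    mean = sum(col) / len(col)
    return sum((v - mean) ** 2 for v in col) / len(col)


def drop_low_contribution(matrix, names, min_variance=0.05):
    """Keep only attributes whose variance across customers exceeds min_variance."""
    keep = [j for j in range(len(names)) if column_variance(matrix, j) > min_variance]
    reduced = [[row[j] for j in keep] for row in matrix]
    return reduced, [names[j] for j in keep]


names = ["brand", "internet", "loyalty_card"]   # hypothetical attributes
matrix = [[1, 1, 1],
          [0, 1, 1],
          [1, 0, 1],
          [0, 0, 1]]                             # loyalty_card is always 1
reduced, kept = drop_low_contribution(matrix, names)
print(kept)     # ['brand', 'internet']
print(reduced)  # [[1, 1], [0, 1], [1, 0], [0, 0]]
```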

The classification of customers into different categories provides an opportunity to analyze the market environment for the next forecast. First, redundant data are removed from the database, because they lengthen training time and thereby slow down the computational system. For this purpose, the PCA technique is adopted for dimensionality reduction. The technique is not described in detail here; see [22] for further details. Reference [22] applies PCA together with power spectrum analysis to reduce the dimensions of rolling-bearing data, whereas this research work applies PCA to reduce the dimensions of market transaction data. Only the reduced data are used for further classification.
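The PCA reduction to the first three components, together with the 99% check listed in Table 1 (interpreted here as the fraction of total variance explained), can be sketched in pure Python. This is an illustrative sketch rather than the authors' code: eigenvectors are found by power iteration with deflation, and the toy data are invented. In practice a library routine such as NumPy's `numpy.linalg.eigh` or scikit-learn's `PCA` (with its `explained_variance_ratio_` attribute) would normally be used.

```python
import random


def covariance(data):
    """Sample covariance matrix of the columns, plus the mean-centered data."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    return cov, centered


def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def top_components(cov, k, iters=1000, seed=0):
    """Leading k eigenpairs of a symmetric matrix via power iteration and deflation."""
    rng = random.Random(seed)
    d = len(cov)
    m = [row[:] for row in cov]
    comps, values = [], []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
        for _ in range(iters):
            w = mat_vec(m, v)
            norm = sum(x * x for x in w) ** 0.5
            if norm == 0.0:  # nothing left to explain
                break
            v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, mat_vec(m, v)))  # Rayleigh quotient
        comps.append(v)
        values.append(lam)
        # Deflation: remove the found component so the next pass finds the next one.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return comps, values


def pca_reduce(data, k=3):
    """Project data onto its first k principal components.

    Returns the reduced data and the fraction of total variance explained.
    """
    cov, centered = covariance(data)
    comps, values = top_components(cov, k)
    total = sum(cov[i][i] for i in range(len(cov)))
    explained = sum(values) / total if total else 0.0
    reduced = [[sum(a * b for a, b in zip(row, c)) for c in comps] for row in centered]
    return reduced, explained


# Toy data: 5 attributes, one constant and one duplicating the first.
data = [[4, 2, 1, 5, 4],
        [-4, 2, -1, 5, -4],
        [4, -2, -1, 5, 4],
        [-4, -2, 1, 5, -4]]
reduced, explained = pca_reduce(data, k=3)
print(len(reduced[0]), explained > 0.99)  # 3 True
```

In the toy data the constant fourth column and the duplicated fifth column carry no extra information, so three components explain essentially all of the variance.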

4. Data Compatibility and Data Enhancement

First, the data size is reduced by removing redundant data entries that do not add any significant information, that is, the data sets that the PCA results indicate contribute no information. Business days are also converted into months, and months into years. Data enhancement is performed to derive certain results from indirect information that may be present in hidden form in the transaction records. Priorities may be introduced by classifying attributes as high priority or low priority to deal with customer relationship classification. As an example, consider a customer who is interested in fashion and style, uses the internet for information updates, uses current technology, buys branded items and reads books as a hobby to gain knowledge. If data on all of these features are available from past online transaction records, this customer may be classified as a high valued customer for new products launched in the near future. The attributes derived from this example are used to classify all future transactions into a customer segment or group. An analysis of this nature can be an important factor in forecasting future customer patterns. Data enhancement can be used in data processing to improve the interpretation of information for better results and understanding. Data enhancement also refers to converting data into a format that computers understand; as in the example above, an attribute can be represented by binary 0 for its absence and binary 1 for its presence. The computational system likewise produces binary 0 for an output that does not conform to expectation and binary 1 for an output that conforms. Thresholding functions can be used to approximate any fractional value that arises from the computation as binary 0 or binary 1.
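The roll-up of business days into months and months into years can be sketched as a two-stage aggregation. The records are assumed to be `(datetime.date, amount)` pairs, and the dates and amounts below are invented for illustration.

```python
from collections import defaultdict
from datetime import date


def roll_up(daily):
    """Aggregate daily purchase totals to (year, month) and then to year."""
    monthly = defaultdict(float)
    for day, amount in daily:
        monthly[(day.year, day.month)] += amount
    yearly = defaultdict(float)
    for (year, _month), amount in monthly.items():
        yearly[year] += amount
    return dict(monthly), dict(yearly)


daily_sales = [(date(2013, 11, 2), 120.0), (date(2013, 11, 15), 80.0),
               (date(2013, 12, 1), 50.0), (date(2014, 1, 7), 200.0)]
monthly, yearly = roll_up(daily_sales)
print(monthly)  # {(2013, 11): 200.0, (2013, 12): 50.0, (2014, 1): 200.0}
print(yearly)   # {2013: 250.0, 2014: 200.0}
```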

5. Artificial Neural Network Based Model

In its simplest form, an ANN can be viewed as a statistical tool that models the relationship between independent and dependent variables; with linear activation functions it reduces to linear regression. Independent variables are also called explanatory variables, whereas dependent variables are called response variables. If there is only one explanatory variable, the model is a simple regression; otherwise it is a multiple regression. Almost all real-world problems are solved using multiple regression. Linear regression is well suited to prediction and forecasting models, in which the observed data are used to find the model parameters. Predictions can subsequently be made without observing the dependent variable: the independent variables are set arbitrarily, and the dependent variable is predicted by the regression model that has been developed. In its simplest form, a regression model is expressed as

$$F_i(x) = x_i B + e_i, \qquad i = 1, 2, \ldots, n$$

where $F_i(x)$ is called the regressand, measured variable or response variable; $x_i$ are the regressors, explanatory variables or predictor variables; $B$ is the parameter vector of regression coefficients; and $e_i$ is the error term, an unobserved random variable that can be treated as noise in the model. The proposed model focuses on the determination of $B$.
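The text does not say how $B$ is determined. One standard choice, assumed here, is the ordinary least-squares estimate, which solves the normal equations $(X^\top X)\hat{B} = X^\top F$, where the rows of $X$ are the $x_i$ and $F$ collects the $F_i(x)$. The sketch below solves them by Gaussian elimination in pure Python; the data are invented and follow $F = 1 + 2a - b$ exactly, with a leading column of ones for the intercept.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def estimate_b(xs, fs):
    """Least-squares B for F_i = x_i B + e_i via the normal equations."""
    k = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xtf = [sum(x[i] * f for x, f in zip(xs, fs)) for i in range(k)]
    return solve(xtx, xtf)


# Each x_i = [1, a, b]; the leading 1 gives an intercept.
xs = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 2, 1]]
fs = [1, 3, 0, 2, 4]
B = estimate_b(xs, fs)
print([round(b, 6) for b in B])  # [1.0, 2.0, -1.0]
```

With noisy observations the same code returns the coefficients that minimize the sum of squared errors $\sum_i e_i^2$.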

The proposed model first identifies customer classes using a clustering algorithm and then generates association rules using neural networks. Similarity and dissimilarity are measured among the rules, and the rules are matched against a chosen threshold, which determines the degree of change. The degree of change in knowledge can be used to determine trends of change in customer behavior. The ANN-based model for sales projections is illustrated in Figure 2, which shows the data flow and computation paths. The model collects data from transaction records, which are then processed by the ANN to extract useful information.
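The rule matching and degree-of-change step can be sketched as follows. The paper does not define its similarity measure, so this sketch assumes one: the average Jaccard similarity of the two rules' antecedents and consequents. A new rule counts as emerging when no earlier rule reaches the threshold, and the degree of change is taken to be the fraction of emerging rules. The item names are hypothetical.

```python
def rule(lhs, rhs):
    """An association rule lhs -> rhs as a pair of frozensets of items."""
    return frozenset(lhs), frozenset(rhs)


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def rule_similarity(r1, r2):
    """Average Jaccard similarity of the two antecedents and the two consequents."""
    return (jaccard(r1[0], r2[0]) + jaccard(r1[1], r2[1])) / 2


def degree_of_change(old_rules, new_rules, threshold=0.5):
    """Match each new rule against its most similar old rule.

    Returns (matched, emerging, degree), where degree is the fraction of
    new rules with no old rule at or above the threshold.
    """
    matched, emerging = [], []
    for r in new_rules:
        best = max((rule_similarity(r, o) for o in old_rules), default=0.0)
        if best >= threshold:
            matched.append(r)
        else:
            emerging.append(r)
    degree = len(emerging) / len(new_rules) if new_rules else 0.0
    return matched, emerging, degree


old = [rule({"bread"}, {"milk"}), rule({"brand_x_soap"}, {"brand_x_shampoo"})]
new = [rule({"bread"}, {"milk"}), rule({"bread", "eggs"}, {"milk"}),
       rule({"tea"}, {"sugar"})]
matched, emerging, degree = degree_of_change(old, new)
print(len(matched), len(emerging), round(degree, 3))  # 2 1 0.333
```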

6. Sales Forecasting Model

The data were collected online, directly from the billing counter, through computing systems. Part of the data was also collected through a direct questionnaire given to customers after they had made a purchase. These hard data are used to test the results produced by the proposed software-based model. The main objective is to classify customers into one of three classes, high valued, medium valued and low valued, corresponding to high, medium and low income groups respectively, as shown in Figure 3. The model performed the classification using transaction information from the last year, and the direct questionnaire results are available for verification. The study focuses mainly on the type and amount of purchases made by customers over a long period. Next, data transformation is carried out as explained in the previous section, and data enrichment is then performed using software models. For this, a classification technique based on a supervised neural network is applied to find possible clusters. A supervised classification model is implemented using instances; it consists of input, hidden and output layers of neurons, as shown in Figure 1. The ANN algorithms are then used to validate the classification results obtained by supervised classification. The PCA technique is used to remove redundant dimensions for faster computation. The idea is to validate the model multiple times and thereby further improve the accuracy of customer segmentation. The steps of the ANN and PCA computations are summarized in Table 1. The PCA step reduces dimensions by keeping only the first three principal components.

Figure 2. ANN based model for sales projections (blocks: Transaction Information from Online Records; Neural Networks; Training Data; Test Data; Results; Classification of Customers for Forecasting Sales; Results Validation; Historical Data)

Table 1. Steps for ANN and PCA algorithm execution flow

Artificial Neural Networks
Step 1: Initialize weights with all values as 1
Step 2: Get inputs
Step 3: Produce outputs
Step 4: Compare output to estimate error
Step 5: Stop if error ≤ threshold, else go to Step 6
Step 6: Propagate error back through network
Step 7: Update the weight matrix of all the layers
Step 8: Loop - Repeat Steps 2 through 7

Principal Component Analysis
Step 1: Get data with all dimensions
Step 2: Determine first three principal components
Step 3: Check that the first three components account for more than 99% of the variance in the attributes
Step 4: Replace original data with reduced dimensional data
Step 5: Perform ANN based computation
Step 6: Record the time session for new simulation
Step 7: For next data, Loop - go to Step 2

Neural network training consists of updating the weights automatically so as to minimize the error between the desired outputs and the actual outputs. ANN-based intelligent algorithms are a widely accepted method for this error minimization. Figure 2 shows the steps and sequence for validating and testing the developed model. Similarly, PCA is applied to reduce the dimensions of the data.
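The ANN column of Table 1 corresponds to a standard backpropagation loop, sketched below for one hidden layer of sigmoid neurons. It is a minimal illustration, not the authors' code, and it departs from Step 1 in one respect: the weights start as small seeded random values instead of all ones, because identical initial weights give every hidden neuron the same output and the same update, so the hidden neurons never differentiate. The error threshold of Step 5 is assumed to be the mean squared error over one pass through the data, and the toy task (label 1 when at least two of three binary attributes are present) is hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_h, w_o):
    """Steps 2-3: propagate inputs through the hidden layer to the output neuron."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + row[-1]) for row in w_h]
    y = sigmoid(sum(w * hj for w, hj in zip(w_o, h)) + w_o[-1])
    return h, y


def train(samples, n_hidden=3, lr=1.0, max_epochs=10000, tol=0.01, seed=1):
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    # Step 1 (deviation): small random weights instead of all ones; the last
    # entry of each weight row is a bias term.
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_o = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    for _ in range(max_epochs):                  # Step 8: loop
        sq_err = 0.0
        for x, target in samples:
            h, y = forward(x, w_h, w_o)          # Steps 2-3
            err = target - y                     # Step 4
            sq_err += err * err
            # Step 6: back-propagate; the sigmoid derivative is y * (1 - y).
            d_out = err * y * (1 - y)
            d_hid = [d_out * w_o[j] * h[j] * (1 - h[j]) for j in range(n_hidden)]
            # Step 7: update the weights of both layers.
            for j in range(n_hidden):
                w_o[j] += lr * d_out * h[j]
            w_o[-1] += lr * d_out
            for j, row in enumerate(w_h):
                for i in range(n_in):
                    row[i] += lr * d_hid[j] * x[i]
                row[-1] += lr * d_hid[j]
        if sq_err / len(samples) <= tol:         # Step 5, checked once per epoch
            break
    return w_h, w_o


def predict(x, w_h, w_o):
    return 1 if forward(x, w_h, w_o)[1] >= 0.5 else 0


# Toy task: label 1 when at least two of three binary attributes are present.
samples = [([a, b, c], 1 if a + b + c >= 2 else 0)
           for a in (0, 1) for b in (0, 1) for c in (0, 1)]
w_h, w_o = train(samples)
print([predict(x, w_h, w_o) for x, _ in samples])
```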


Table 2. Number of customers forecasted for the next five months

Classification technique   High valued   Medium valued   Low valued   Total customers
Survey - current month             928            1192         1568              3688
ANN - current month                952            1087         1654              3693
Forecast month 1                   940            1081         1685              3706
Forecast month 2                  1030            1052         1660              3742
Forecast month 3                  1024            1103         1630              3757
Forecast month 4                  1041            1236         1598              3875
Forecast month 5                  1062            1256         1674              3992

7. Results and Discussions

The experimental work was carried out over a long period on the daily purchasing activities of customers in a supermarket. Customers were asked direct questions, while a software-based model generated instant results by acquiring transaction information online. A total of 3688 customers' transactions, based on the survey, were analyzed and used as inputs to the segmentation model. The direct questionnaire results were also compiled manually for validation and testing. Table 2 shows the results, and Figure 3 and Figure 4 show the simulated results graphically. As shown in Figure 4, three different classes of customers are marked. A shaded area indicates that two classes lie very close together and that customers from one class may migrate into the other. The forecasts for the next five months are also projected in Figure 3 and Figure 4, and the projected forecast results are illustrated in Figure 5. The simulation results show that the classification of customers was quite close to the actual results observed subsequently. The direct questionnaire classification is taken as the standard reference, and the classification is compared with it in terms of the total number of customers. The observed correct-classification rates for the current month are 99.97% for high valued customers, 99.91% for medium valued customers and 99.94% for low valued customers.

Other analyses of the results also led to the conclusion that high valued and medium valued customers together accounted for 76.82% of the total revenue forecast, while low valued customers may have contributed a proportionately smaller share of revenue. High valued customers generate a high revenue contribution and are therefore considered more important to retain; an increase in their number would be a good indication of higher revenue in the future. At the same time, medium and low valued customers are also very important, as their numbers are quite large. Figure 3 illustrates the classification produced for the different customers. The supermarket could easily interpret the results and use professional knowledge to offer value-added services to its customers and improve customer relationships.

Figure 4. Intelligent Classification of Customers into Dynamic Segmentations (x-axis: Number of Months; y-axis: Number of Customers; the shaded area marks customers likely to be in both the high valued and medium valued classes)

8. References

[1] Lim Chia Yean and Khoo, V.K.T., “Customer relationship management: Computer-assisted Tools for Customer Lifetime Value Prediction”, Information Technology, ITSim, IEEE International Symp. Pages 1180 – 1185, 2010.

[2] Archana Kumari, Umesh Prasad and Pradip Kumar Bala, “Retail Forecasting Using Neural Network and Datamining Technique: A Review and Reflection”, International Journal of Emerging Trends of Technology in Computer Science, Vol. 2, No. 6, Pages 266 – 269, 2013.

[3] Mehzabin Shaikh and Gyankamal J. Chhajed, “Review on Financial Forecasting Using Neural Network and Data Mining Technique”, Oriental Journal of Computer Science & Technology, Vol. 5, No. 2, Pages 263 – 267, 2011.

[4] Huirong Zhang and Yun Chen, “An Analysis of the Applications of Data Mining in Airline Company CRM”, Fuzzy Systems and Knowledge Discovery, Sixth IEEE International Conference on, Vol. 7, Pages 290 – 293, 2009.

[5] Wu Dong Sheng, “Application Study on Banks’s CRM Based on Data Mining Technology”, Electrical Information and Control Engineering, ICEICE, IEEE International Conference on, Pages 5727 – 5731, 2011.

[6] Shaw M. J., Subramaniam C., Tan G. W. and Welge M. E., “Knowledge management and Data Mining for marketing”, Decision Support Systems, Pages 127 – 137, 2001.

[7] Sallam Osman Fageeri, Rohiza Ahmad, Baharum B. Baharudin, “BBT: An Efficient Association Rules Mining Algorithm Using Binary-Based Technique”, International Journal of Advancement in Computing Technology (IJACT), Vol. 6, No. 4, Pages 14 – 25, 2014.

[8] Alireza Fazlzadeh, Mostafa Moshiri Tabrizi and Kazem Mahboobi, “Customer relationship management in small-medium enterprises: The case of science and technology parks of Iran”, African Journal of Business Management, Vol. 5, No. 15, Pages 6159-6167, 2011.


[9] Jill Dyche, “The CRM Handbook: A Business Guide to CRM”, Addison-Wesley Professional, First Edition, 2002.

[10] Zulaiha Ali Othman, Zurina Muda, Low Mei Theng and Muhamed Rafique Othman, “Record to Record Feature Selection Algorithm for Network Intrusion Detection”, International Journal of Advancement in Computing Technology (IJACT), Vol. 6, No. 2, Pages 163 – 175, 2014.

[11] Huaping Gong and Qiong Xia, “Study on Application of Customer Segmentation Based on Data Mining Technology”, Future Computer and Communication, FCC, IEEE International Conference on, Pages 167 – 170, 2009.

[12] Young Sung Cho and Keun Ho Ryu, “Implementation of Personalized Recommendation System Using Demographic Data and RFM Method in e-commerce”, Management of Innovation and Technology, ICMIT 2008, 4th IEEE International Conference on, Pages 475 – 479, 2008.

[13] Ken Cai, Rong-qian Yang, Li-hua Li and Xiao-ming Wu, “Automatic 3D Whole Heart Registration-Based Segmentation Using Mutual Information and B-Splines”, International Journal of Advancement in Computing Technology (IJACT), Vol. 3, No. 11, Pages 1 – 8, 2011.

[14] Das A., Ng W. K. and Woon Y. K., “Rapid Association Rule Mining”, Information and Knowledge Management, 10th International Conference on, ACM Press, Pages 474 – 481, 2001.

[15] Shuxiang Xu and Yunling Liu, “Neural Networks for Business Decision Making”, International Journal of Advancement in Computing Technology (IJACT), Vol. 6, No. 2, Pages 49 – 58, 2014.

[16] Song H. S., Kim J. K. and Kim S. H., “Mining the Change of Customer Behavior in an Internet Shopping Mall”, Expert Systems with Applications, Vol. 21, No. 3, Pages 157 – 170, 2001.

[17] Han Yan and Zhihong Zou, “Application of a Hybrid ARIMA and Neural Network Model to Water Quality Time Series Forecasting”, Journal of Convergence Information Technology (JCIT), Vol. 8, No. 4, Pages 59 – 70, 2013.

[18] Chun-Cho Chen, Ching-Sung Wu and Rebecca Chung-Fern Wu, “e-Service Enhancement Priority Matrix: The case of an IC Foundry”, Elsevier Journal of Information & Management, Vol. 43, No. 5, Pages 572 – 586, 2006.

[19] Marcus C., “A Practical yet Meaningful Approach to Customer Segmentation”, Journal of Consumer Marketing, Vol. 15, No. 5, Pages 494 – 504, 1998.

[20] Debashish Das and Mohammad Shorif Uddin, “Data Mining and Neural Network Techniques in Stock Market Prediction: A Methodological Review”, International Journal of Artificial Intelligence & Applications (IJAIA), Vol. 4, No. 1, Pages 117 – 127, 2013.

[21] Abhishekh Gupta and Samidha D Sharma, “Clustering-Classification based Prediction of Stock Market Future Prediction”, International Journal of Computer Science and Information Technologies (IJCSIT), Vol. 5, No. 3, Pages 2806 – 2809, 2014.

[22] Su Yang, Yuan Zhong-hu and Qi Xiao-xuan, “A Feature Extraction Method of Rolling Bearings Faults based on PCA and Power Spectrum Analysis”, International Journal of Advancement in Computing Technology (IJACT), Vol. 5, No. 4, Pages 681 – 688, 2013.