Using Evolving Fuzzy Classifiers to Classify Consumers with Different Model Architectures


Physics Procedia 25 ( 2012 ) 1627 – 1636

2012 International Conference on Solid State Devices and Materials Science


Rong Zhao^a, Chunlai Chai^a, Xiaowei Zhou^b

^a College of Computer Science and Information Engineering, Zhejiang Gongshang University, Hangzhou, Zhejiang, P.R. China

^b HWC, Lancaster, UK

Abstract

This study introduces two alternative families of evolving fuzzy classifiers (eClass and FLEXFIS-Class) for classifying consumers into different categories in order to guide marketing decisions. We describe in detail the learning mechanisms of these classifiers and their model architectures: single-model (SM) and multi-model (MM) architectures. The single-model architectures differ in their consequents: singletons corresponding to class labels, linear consequents regressing over the features, and eClass MIMO, which is applicable to multi-class classification. We emphasize the classification accuracy and effectiveness of these approaches and compare the proposed classifiers with well-established ones such as CART and k-NN, as well as with the popular SVM method. The results indicate that they compare favorably with the other methods in terms of accuracy. With these model architectures, managers can use the introduced approaches to assign consumers to categories and to support the most profitable decisions.

Keywords : evolving fuzzy classifiers; single-model architectures; multi-model architectures; classification accuracy and effectiveness; classify consumers into different categories.

1.Introduction

Marketers widely recognize that nearly 80% of their business comes from as few as 20% of their consumers [1]. In addition, attracting a new consumer can cost up to five times as much as keeping a loyal one [2]. An accurate method of classifying consumers is therefore needed, so that companies can design different marketing strategies for individuals of different value to them and manage those relationships successfully.


The classification of consumers is influenced by many factors, such as social position, emotion and personal characteristics, and therefore involves relatively high uncertainty and randomness. Many researchers are consequently exploring artificial intelligence techniques, such as neural networks [4] and the CART method [12]. Fuzzy classifiers are often applied to decision making [3] because of the high predictive quality and accuracy they can achieve, and in this respect they can outperform traditional statistical methods. In this paper, we focus on two alternative data-driven evolving modeling approaches, called eClass and FLEXFIS-Class.

The objective of this study is, first, to evaluate the effectiveness of these two families of evolving fuzzy rule-based classifiers in classifying consumers and, second, to investigate their advantages compared with other methods on the same data set.

2.Two model structures

2.1.Fuzzy classifiers with SM architecture

• Singleton consequent:

Fuzzy classifiers with singletons as class labels are also called the ‘classical fuzzy classification model’ and are often used in classification tasks; examples can be found in [13]. This ‘classical’ architecture has a fuzzy premise part and singleton outputs that correspond to the class labels. The ith rule has the following form:

$$\text{Rule}_i:\ \text{IF } x_1 \text{ IS } F_{i1} \text{ AND } \dots \text{ AND } x_p \text{ IS } F_{ip} \text{ THEN } y_i = L_i \tag{1}$$

where $p$ is the dimensionality of the input space, $F_{i1}, \dots, F_{ip}$ are the $p$ antecedent fuzzy sets and $L_i$ is the crisp output class label from the set $\{1, \dots, K\}$, with $K$ the number of classes. When classifying a new sample $\vec{x}$, the final crisp class label is obtained by taking the class label of the rule with the highest activation degree, i.e. by calculating:

$$y = L_{i^*} \quad \text{with} \quad i^* = \arg\max_{1 \le i \le R} \tau_i \tag{2}$$

where $R$ is the number of rules and $\tau_i$ is the activation degree of rule $i$, defined by a t-norm that is usually the product operator, $\tau_i = \prod_{j=1}^{p} \mu_{ij}(x_j)$, where $\mu_{ij}$ are the membership functions of the antecedent fuzzy sets, defined by a Gaussian:

$$\mu_{ij}(x_j) = e^{-\frac{1}{2}(x_j - c_{ij})^2 / \sigma_{ij}^2} \tag{3}$$

where $c_{ij}$ denotes the center and $\sigma_{ij}$ the spread of the membership function. Furthermore, we introduce $K$ rule weights $w_{i1}, \dots, w_{iK}$, which signify the confidence of the ith rule in each of the $K$ classes; the highest weight corresponds to the output class $L_i$. The weights are determined from the support of the clusters:

$$w_{il} = \frac{n_{il}}{n_i} \tag{4}$$

where $n_i$ is the number of samples assigned to the antecedent part of the ith rule during the incremental learning process and $n_{il}$ is the number of these samples that fall into class $l$. Based on these confidences (rule weights), the classifier confidence $CF_l$ in the overall output class label $y = L = l$, $l \in \{1, \dots, K\}$, is calculated through the following weighted average:

$$CF_l = \frac{\sum_{i=1}^{R} w_{il}\, \tau_i}{\sum_{i=1}^{R} \tau_i} \tag{5}$$
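To make the inference in (2)-(5) concrete, the following minimal Python sketch evaluates a singleton classifier on one sample. It is an illustration under stated assumptions, not the authors' implementation: the rule container (a dict with `centers`, `spreads` and `counts`), the function names and the two example rules are hypothetical, and classes are indexed from 0 rather than 1.

```python
import math

def gaussian_membership(x, c, sigma):
    """Eq. (3): Gaussian membership of value x in the fuzzy set (c, sigma)."""
    return math.exp(-0.5 * (x - c) ** 2 / sigma ** 2)

def activation(x, centers, spreads):
    """Product t-norm over the p antecedent fuzzy sets of one rule."""
    tau = 1.0
    for xj, cj, sj in zip(x, centers, spreads):
        tau *= gaussian_membership(xj, cj, sj)
    return tau

def classify_singleton(x, rules):
    """Return (label, confidences) following Eqs. (2), (4) and (5).

    Each rule is a dict (hypothetical layout) with 'centers', 'spreads' and
    'counts', where counts[l] is n_il, the number of samples of class l
    that were assigned to the rule's cluster.
    """
    taus = [activation(x, r["centers"], r["spreads"]) for r in rules]
    # Eq. (4): rule weights w_il = n_il / n_i
    weights = [[n / sum(r["counts"]) for n in r["counts"]] for r in rules]
    # Eq. (2): the rule with the highest activation provides the crisp label,
    # which is the class with the highest weight in that rule
    winner = max(range(len(rules)), key=lambda i: taus[i])
    label = max(range(len(weights[winner])), key=lambda l: weights[winner][l])
    # Eq. (5): activation-weighted average of the rule weights per class
    total = sum(taus)
    conf = [sum(w[l] * t for w, t in zip(weights, taus)) / total
            for l in range(len(weights[0]))]
    return label, conf

# Two illustrative rules in a two-dimensional, two-class problem
rules = [
    {"centers": [0.2, 0.3], "spreads": [0.1, 0.1], "counts": [9, 1]},
    {"centers": [0.8, 0.7], "spreads": [0.1, 0.1], "counts": [2, 8]},
]
label, conf = classify_singleton([0.75, 0.70], rules)
```

Because the weights of each rule sum to one, the confidences returned for a sample also sum to one, which makes them easy to read as a soft class assignment.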

• Linear or polynomial consequent functions:

If the singleton consequent in (1) is replaced by a linear hyper-plane or, more generally, by a polynomial, a fuzzy rule is defined as follows:

$$\text{Rule}_i:\ \text{IF } x_1 \text{ IS } F_{i1} \text{ AND } \dots \text{ AND } x_p \text{ IS } F_{ip} \text{ THEN } y_i = l_i(\vec{x}) \tag{6}$$

with $l_i$ a polynomial up to a certain degree.

Then the overall output is calculated as a regression over the feature vector $\vec{x}$, using the multi-input-single-output (MISO) TS fuzzy system:

$$y = \sum_{i=1}^{R} \lambda_i(\vec{x})\, l_i(\vec{x}) \tag{7}$$

with

$$\lambda_i(\vec{x}) = \frac{\tau_i}{\sum_{l=1}^{R} \tau_l} \tag{8}$$

and

$$l_i(\vec{x}) = a_{i0} + a_{i1} x_1 + a_{i2} x_2 + \dots + a_{ip} x_p \tag{9}$$

• Multiple consequents:

This model structure [8], named eClass MIMO, is based on the multi-input-multi-output (MIMO) extension of the evolving fuzzy Takagi-Sugeno models [6]:

$$\text{Rule}_i:\ \text{IF } x_1 \text{ IS } F_{i1} \text{ AND } \dots \text{ AND } x_p \text{ IS } F_{ip} \text{ THEN } y_i = l_{im}(\vec{x}) \tag{10}$$

where $l_{im}$ is a linear-in-parameters consequent per class, $m \in \{1, \dots, K\}$. The winning class label is obtained as follows:

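$$y = \operatorname{class}(\vec{x}) = \arg\max_{m = 1, \dots, K} \hat{y}_m(\vec{x}) \tag{11}$$

with

$$\hat{y}_m(\vec{x}) = \sum_{i=1}^{R} \lambda_i(\vec{x})\, l_{im}(\vec{x}) \tag{12}$$

where

$$l_{im}(\vec{x}) = a_{m0} + a_{m1} x_1 + a_{m2} x_2 + \dots + a_{mp} x_p, \qquad m = 1, \dots, K \tag{13}$$

with one set of coefficients $a_{m0}, \dots, a_{mp}$ for each rule $i$ and class $m$.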
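The Takagi-Sugeno inference in (7)-(9) and its MIMO extension in (11)-(13) can be sketched in a few lines of Python. This is a minimal illustration, assuming Gaussian antecedents as in (3); the rule layout (`centers`, `spreads`, and `coeffs` holding one coefficient list per class) and the function names are hypothetical, classes are indexed from 0, and the MISO case corresponds to a single coefficient list per rule.

```python
import math

def activations(x, rules):
    """Activation degree tau_i of every rule (product of Gaussian memberships)."""
    taus = []
    for r in rules:
        tau = 1.0
        for xj, cj, sj in zip(x, r["centers"], r["spreads"]):
            tau *= math.exp(-0.5 * (xj - cj) ** 2 / sj ** 2)
        taus.append(tau)
    return taus

def linear(coeffs, x):
    """Eqs. (9)/(13): a_0 + a_1 x_1 + ... + a_p x_p."""
    return coeffs[0] + sum(a * xj for a, xj in zip(coeffs[1:], x))

def mimo_outputs(x, rules):
    """Eq. (12): one regression output per class."""
    taus = activations(x, rules)
    total = sum(taus)
    lambdas = [t / total for t in taus]  # Eq. (8): normalized activations
    num_classes = len(rules[0]["coeffs"])
    return [sum(lam * linear(r["coeffs"][m], x) for lam, r in zip(lambdas, rules))
            for m in range(num_classes)]

def mimo_class(x, rules):
    """Eq. (11): the class with the largest output wins."""
    y_hat = mimo_outputs(x, rules)
    return max(range(len(y_hat)), key=lambda m: y_hat[m])

# Two illustrative rules with constant consequents acting as class indicators
rules = [
    {"centers": [0.2, 0.3], "spreads": [0.2, 0.2],
     "coeffs": [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]},
    {"centers": [0.8, 0.7], "spreads": [0.2, 0.2],
     "coeffs": [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]},
]
predicted = mimo_class([0.75, 0.70], rules)
```

With indicator-like consequents as in this example, the outputs reduce to the normalized activations, so they sum to one for any input.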


2.2.Fuzzy classifiers with MM architecture

The multi-model (MM) architecture is based on K MISO TS fuzzy regression models (6), trained on K indicator matrices, one per class. Each indicator matrix is derived from the original classification matrix, which contains the features and a label entry: in the ith indicator matrix, the label entry of a row is set to 1 if the row belongs to the ith class and to 0 otherwise. The confidence $CF_k$ in the overall output value $y = m \in \{1, \dots, K\}$ is given by normalizing the maximal output value by the sum of the output values of the K models:

$$CF_k = \frac{\max_{m = 1, \dots, K} \hat{y}_m(\vec{x})}{\sum_{m=1}^{K} \max\bigl(0, \hat{y}_m(\vec{x})\bigr)} \tag{14}$$
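As a small worked example of the indicator coding and of (14), the sketch below builds the K binary targets for one labeled sample and computes the winning class and its confidence from the K model outputs. The function names are hypothetical, classes are indexed from 0, and returning a confidence of 0 when no output is positive is an assumption made here to avoid a division by zero.

```python
def indicator_targets(label, num_classes):
    """Binary regression targets: 1 for the sample's class, 0 for all others."""
    return [1.0 if m == label else 0.0 for m in range(num_classes)]

def mm_decision(y_hat):
    """Winning class and confidence (Eq. 14) from the K MISO model outputs."""
    winner = max(range(len(y_hat)), key=lambda m: y_hat[m])
    denom = sum(max(0.0, y) for y in y_hat)
    # Assumption: confidence is 0 when no model produces a positive output
    conf = y_hat[winner] / denom if denom > 0 else 0.0
    return winner, conf

targets = indicator_targets(2, 4)
winner, conf = mm_decision([0.1, 0.7, -0.2, 0.2])
```

In this example the negative output of the third model is clipped to zero, so the confidence is 0.7 / (0.1 + 0.7 + 0.2).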

3.Evolving classifier with FLEXFIS-Class

FLEXFIS-Class (Flexible Fuzzy Inference Systems for Classification) is an evolving mechanism derived from FLEXFIS [9], an incremental training procedure for evolving TS fuzzy models during on-line processing of data streams or measurement recordings, which is itself applicable to function approximation and regression tasks.

3.1.FLEXFIS-Class SM

The training procedure for the rules’ antecedent parts in (1) and (6) is based on an evolving version of vector quantization (eVQ), whose workflow is detailed in Fig. 1. After each new sample, the updated or newly created cluster in the high-dimensional data space is projected onto the axes to form one-dimensional Gaussian fuzzy sets, i.e. $\mu_{ij}(x_j) = e^{-\frac{1}{2}(x_j - c_{ij})^2 / \sigma_{ij}^2}$. The fuzzy sets projected from one cluster onto each axis form the antecedent part of one rule. In this way, the antecedent parameters are updated whenever a cluster moves, and new rules and fuzzy sets are evolved whenever a new cluster is generated.


Fig. 1 Workflow of evolving version of vector quantization.
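Since the workflow of Fig. 1 is only described in words here, the following Python sketch shows one plausible reading of an evolving vector quantization step: find the nearest cluster center, move it towards the sample if it is close enough, and otherwise open a new cluster. The class name `EvolvingVQ`, the `vigilance` threshold, the Euclidean distance and the decreasing learning gain `0.5 / count` are all assumptions made for illustration and are not taken from the text; the projection of the clusters onto the axes (centers and spreads of the Gaussian fuzzy sets) is omitted.

```python
import math

class EvolvingVQ:
    """Minimal evolving vector quantization: clusters are created on demand."""

    def __init__(self, vigilance):
        self.vigilance = vigilance  # assumed distance threshold for a new cluster
        self.centers = []           # one center per cluster (rule)
        self.counts = []            # number of samples that formed each cluster

    def update(self, x):
        """Process one sample and return the index of the cluster it joined."""
        if self.centers:
            dists = [math.dist(x, c) for c in self.centers]
            win = min(range(len(dists)), key=lambda i: dists[i])
            if dists[win] < self.vigilance:
                self.counts[win] += 1
                gain = 0.5 / self.counts[win]  # assumed decreasing learning gain
                self.centers[win] = [c + gain * (v - c)
                                     for c, v in zip(self.centers[win], x)]
                return win
        # No cluster is close enough: the sample becomes a new cluster center
        self.centers.append(list(x))
        self.counts.append(1)
        return len(self.centers) - 1

evq = EvolvingVQ(vigilance=0.3)
stream = [[0.10, 0.10], [0.12, 0.11], [0.90, 0.90], [0.88, 0.92]]
assigned = [evq.update(x) for x in stream]
```

Each cluster found this way would correspond to one rule, and the per-cluster sample counts are the quantities from which the rule weights in (4) are formed.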

For the singleton SM architecture (1), the crisp class label in the consequent part of the ith rule is obtained by computing, for each of the K classes, the relative proportion of the samples that formed the corresponding cluster, and taking the class with the maximal proportion:

$$L_i = \arg\max_{1 \le l \le K} (w_{il}) \tag{15}$$

For the SM architecture with polynomial or linear consequent functions (6), (weighted) recursive least squares (wRLS) [6] is used to update the linear parameters.
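The recursive update of the linear consequent parameters can be illustrated with a generic weighted recursive least squares step. The sketch below uses the textbook form of the algorithm and is not claimed to reproduce the exact variant of [6]: the function name `wrls_update`, the weight argument `psi` (which would be the normalized rule activation in local learning) and the large initial matrix `P` are assumptions.

```python
def wrls_update(w, P, r, y, psi=1.0):
    """One weighted RLS step for the model y ~ r^T w.

    w: parameter list, P: symmetric inverse-Hessian estimate (list of lists),
    r: regressor [1, x_1, ..., x_p], y: target, psi: sample weight (> 0).
    """
    n = len(w)
    Pr = [sum(P[i][j] * r[j] for j in range(n)) for i in range(n)]
    denom = 1.0 / psi + sum(r[i] * Pr[i] for i in range(n))
    gain = [v / denom for v in Pr]
    err = y - sum(r[i] * w[i] for i in range(n))
    w_new = [w[i] + gain[i] * err for i in range(n)]
    # P <- P - gain * (r^T P); since P is symmetric, r^T P equals (P r)^T
    P_new = [[P[i][j] - gain[i] * Pr[j] for j in range(n)] for i in range(n)]
    return w_new, P_new

# Recover y = 1 + 2 x from five noise-free samples, one sample at a time
w = [0.0, 0.0]
P = [[1e6, 0.0], [0.0, 1e6]]
for x in [0.0, 1.0, 2.0, 3.0, 4.0]:
    w, P = wrls_update(w, P, [1.0, x], 1.0 + 2.0 * x)
```

Each step touches only the current sample, which is what makes the procedure suitable for the on-line setting described above.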

3.2.FLEXFIS-Class MM

For the MM structure, FLEXFIS-Class applies eVQ for incremental clustering, as shown in Fig. 1, combined with a cluster evolution condition and an alternative distance strategy for sample-wise winner selection. The updated cluster centers and surfaces are projected onto the axes after each incremental learning step in order to adjust the fuzzy sets and rules. Moreover, it uses recursive local learning by exploiting wRLS [6] for the linear consequent parameters, which gives a near-optimal solution in the least-squares (LS) sense [9].

In addition, FLEXFIS-Class MM can be extended easily when a new class appears, simply by opening up a new TS fuzzy system for that class. The algorithm is summarized as follows.

FLEXFIS-Class MM

1. Collect N data points, enough for the actual dimensionality, and estimate the ranges of all features from these points.

2. Generate K initial TS fuzzy models for the K classes present in the N data points by using batch mode FLEXFIS [9], i.e. eVQ and axis projection first and LS afterwards.

Incremental Phase:

3. Take the next sample $\vec{x}$, normalize it in all dimensions using the estimated ranges, and elicit its class label, named k.

4. IF k ≤ K, update the kth TS fuzzy model by taking y=1 as target value and using FLEXFIS(-MOD) [9].

5. ELSE, a new class label is introduced and a new TS fuzzy model is initiated by using FLEXFIS(-MOD) [9]; K ← K + 1.

6. Update all other TS fuzzy models by taking y=0 as target value and using FLEXFIS(-MOD) [9].

7. Update the ranges of the features.

8. IF new data points are still available, THEN go to Step 3; ELSE stop.
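The control flow of the incremental phase (Steps 3 to 8) can be sketched as follows. Only the bookkeeping is shown: `StubModel` is a hypothetical placeholder that records its targets instead of running FLEXFIS(-MOD), the function names are illustrative, class labels are assumed to be consecutive integers starting at 1, and a new class is assumed to arrive with label K + 1.

```python
class StubModel:
    """Placeholder for one TS fuzzy model; it only records its targets."""

    def __init__(self):
        self.targets = []

    def update(self, x, y):
        self.targets.append(y)

def normalize(x, lo, hi):
    """Scale each feature to [0, 1] using the current range estimates."""
    return [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(x, lo, hi)]

def incremental_phase(stream, models, lo, hi):
    for x, k in stream:
        xn = normalize(x, lo, hi)                  # Step 3
        if k > len(models):                        # Step 5: K <- K + 1
            models.append(StubModel())
        for m, model in enumerate(models, start=1):
            # Steps 4 and 6: target 1 for the sample's class, 0 for the rest
            model.update(xn, 1.0 if m == k else 0.0)
        lo = [min(l, v) for l, v in zip(lo, x)]    # Step 7
        hi = [max(h, v) for h, v in zip(hi, x)]
    return models, lo, hi

models = [StubModel(), StubModel()]
stream = [([0.5, 2.0], 1), ([1.5, 1.0], 3), ([0.0, 3.0], 2)]
models, lo, hi = incremental_phase(stream, models, [0.0, 0.0], [1.0, 2.0])
```

In this run the second sample introduces a third class, so a third model is opened and receives the target 1 for that sample and 0 afterwards.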

4.Evolving classifiers with eClass

eClass is similar to FLEXFIS-Class in some respects, but there are also a number of differences, which are described below. Classifiers in the eClass family are efficient in learning and optimizing rules, and their rule-base structure is compact and highly flexible. They are robust and conservative in creating new rules, and the rules have high generalization capability and descriptive power [6]. The eClass family derives from eTS, the first evolving fuzzy rule-based system, introduced in [6,7] and further developed in [8]. eTS was first applied to classification in the form of an MM classifier and was later extended to the MIMO case with first-order linear regression consequents.

4.1.eClassMIMO

The antecedent part of the rule base in eClass is designed using the evolving clustering approach eClustering. It originates from Mountain clustering and its extension, subtractive clustering, but is recursive and applicable on-line. The core idea is that of a potential, a value that signifies the density in the data space [6]. Data samples with high potential are candidates to form a cluster. The potential of a sample is defined as a Cauchy function of the squared distances between that sample and all the other data samples, so that it decreases as these distances grow:

$$P_k(x(k)) = \frac{1}{1 + \frac{1}{k-1} \sum_{i=1}^{k-1} \lVert x(k) - x(i) \rVert^2} \tag{16}$$

By exploiting its recursive expression, an on-line algorithm has been developed:

$$P_k(x(k)) = \frac{k-1}{(k-1)\bigl(\alpha(k) + 1\bigr) - 2\zeta(k) + \beta(k)} \tag{17}$$

where

$$\alpha(k) = \sum_{j=1}^{p} x_j^2(k), \qquad \beta(k) = \beta(k-1) + \alpha(k-1), \quad \beta(1) = 0$$

and

$$\zeta(k) = \sum_{j=1}^{p} x_j(k)\, \eta_j(k), \qquad \eta_j(k) = \eta_j(k-1) + x_j(k-1), \quad \eta_j(1) = 0$$

eClass MIMO

• Read the next input x(k) and label L(k), and classify the input with the existing classifier using (11).

• Form the binary target output by assigning ’1’ to the corresponding output of the MIMO TS classifier and setting the other K-1 outputs to ’0’.

• Calculate recursively the potential of the newly read data sample z(k) using (17).

• Update the potentials of all existing focal points using (18), where $x^{i*}$ denotes the focal point of the ith rule:

$$P_k(x^{i*}) = \frac{(k-1)\, P_{k-1}(x^{i*})}{k - 2 + P_{k-1}(x^{i*}) + P_{k-1}(x^{i*}) \sum_{j=1}^{p} \bigl(x_j^{i*} - x_j(k)\bigr)^2} \tag{18}$$

• Compare the potential of the newly read data sample with the updated potentials of all the focal points. Update the covariance matrix and the consequent parameters of the rules.
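The recursive potential computation used in the steps above can be checked against its batch definition with a short Python sketch. It assumes the potential is the Cauchy function of the mean squared distance to the other samples, which is the form for which the recursion (17) and the focal-point update (18) are exact; the class and function names are hypothetical, and setting the potential of the very first sample to 1 is an assumption.

```python
def batch_potential(x, others):
    """Potential from the mean squared distance to the other samples, cf. (16)."""
    s = sum(sum((a - b) ** 2 for a, b in zip(x, o)) for o in others)
    return 1.0 / (1.0 + s / len(others))

class RecursivePotential:
    """Keeps only beta(k) and eta_j(k); no past samples are stored."""

    def __init__(self, p):
        self.k = 0
        self.beta = 0.0
        self.eta = [0.0] * p

    def potential(self, x):
        """Return P_k(x(k)) via (17), then absorb x(k) into beta and eta."""
        self.k += 1
        k = self.k
        alpha = sum(v * v for v in x)
        zeta = sum(v * e for v, e in zip(x, self.eta))
        if k == 1:
            pot = 1.0  # assumption: the first sample gets potential 1
        else:
            pot = (k - 1) / ((k - 1) * (alpha + 1.0) - 2.0 * zeta + self.beta)
        self.beta += alpha
        self.eta = [e + v for e, v in zip(self.eta, x)]
        return pot

def update_focal_potential(p_prev, focal, x, k):
    """Potential of a focal point after the k-th sample x arrives, cf. (18)."""
    d2 = sum((f - v) ** 2 for f, v in zip(focal, x))
    return (k - 1) * p_prev / (k - 2 + p_prev + p_prev * d2)

samples = [[0.1, 0.2], [0.4, 0.1], [0.9, 0.8], [0.5, 0.5]]
rp = RecursivePotential(p=2)
pots = [rp.potential(s) for s in samples]
# Treat the second sample as a focal point and track it as samples 3 and 4 arrive
p3 = update_focal_potential(pots[1], samples[1], samples[2], 3)
p4 = update_focal_potential(p3, samples[1], samples[3], 4)
```

The recursive values agree with the batch definition while storing only p + 1 accumulated quantities, which is what makes the clustering applicable on-line.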

4.2.Other members of the eClass family

• eClass0 uses class labels as consequents and eClustering, with the class label determined by (2).

• eClass MISO can be seen as a simplified version of eClass MIMO that uses a single linear polynomial consequent function, as described in (6). For problems with more than two classes it requires fewer computational resources than eClass MIMO, but its performance is poorer.

• eClassM consists of M models of eClass MISO type. The outputs are generated by using (11).

5.Data and methodology

For this study, we use consumer purchasing data from the data warehouse of a retail enterprise. We randomly selected 12931 data samples and eliminated 1939 of them, leaving 10992 samples, which are classified into 10 different classes. This data set is divided randomly into a training set of 7492 samples (about two-thirds of all samples) and a test set of 3500 samples (about one-third), which is used to measure the predictive accuracy of the models.

According to the sales characteristics of the retail business, the consumers can be categorized into four major groups: core consumers (Group a), potential consumers who should be focused on first (Group b), available consumers who are of some, but not great, value to the company (Group c), and consumers with ‘bad prospects’ for the enterprise (Group d). Since the data set contains many more consumers in Group b and Group c than in either of the other two groups, we further divide the samples of Group b into four categories, and likewise those of Group c. We therefore use ten numbers (0, 1, 2, 3, 4, 5, 6, 7, 8, 9) to denote ten classes ordered by the importance of the consumers to the company: 0 denotes the core consumers and 9 denotes the consumers with bad prospects.


TABLE 1 Features of the data set which is used to classify consumers

| Aspect | Features |
|---|---|
| Basic information | 1. age of consumer; 2. gender of consumer; 3. job of consumer; 4. education; 5. marital status; 6. income per month |
| Current value | 7. basic purchases; 8. cost of service; 9. word-of-mouth; 10. consumption in competitive enterprise; 11. average purchase cycle |
| Potential value | 12. average growth rate of purchase; 13. repurchase rate; 14. number of recommendations; 15. number of recommended individuals; 16. share of wallet |

Before using the data set to evaluate the classifiers, we pre-process the samples in two ways: qualitative variables are quantified, e.g. gender (1: male; 0: female), and certain variables are discretized with rough set theory, e.g. income per month (0: more than 100 thousand Yuan; 2: 50~100 thousand Yuan; 3: 10~50 thousand Yuan; 4….).

Using this data set of 7492 training samples and 3500 test samples with 16 features, we apply the different classifiers with the two model architectures to consumer classification. The classifiers are trained on the training set, building up the classification models in an evolving manner, and their performance is measured on the test set, which is used for classification only. Since the proposed classifiers are designed for on-line data streams, we simulate a pseudo-stream by loading the data samples one at a time and evolving the fuzzy classifiers with each new sample separately. To enable a fair comparison among the approaches, all other conditions are kept the same.
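The evaluation protocol described above (random split, sample-wise training on a pseudo-stream, then classification of the held-out set) can be sketched as follows. The data are synthetic, `MajorityClassifier` is a trivial hypothetical stand-in for an evolving classifier, and the `learn_one`/`predict` interface and the fixed seed are assumptions made for illustration only.

```python
import random

class MajorityClassifier:
    """Stand-in for an evolving classifier with a sample-wise update."""

    def __init__(self):
        self.counts = {}

    def learn_one(self, x, label):
        self.counts[label] = self.counts.get(label, 0) + 1

    def predict(self, x):
        return max(self.counts, key=self.counts.get)

def split(samples, n_train, seed=0):
    """Random division into a training set and a test set."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

def evaluate(clf, train, test):
    """Train sample by sample, then return the classification rate in percent."""
    for x, label in train:  # pseudo-stream: one sample at a time
        clf.learn_one(x, label)
    hits = sum(1 for x, label in test if clf.predict(x) == label)
    return 100.0 * hits / len(test)

# Synthetic one-feature data with two classes, for illustration only
data = [([i / 10.0], 0 if i % 3 else 1) for i in range(30)]
train, test = split(data, 20)
rate = evaluate(MajorityClassifier(), train, test)
```

Replacing the stand-in with any classifier that exposes the same two methods keeps the split and the stream order identical across methods, which is one way to hold the other conditions fixed.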

6.Results and analysis

Table 2 shows the results of the study when applying different model structures to the data set mentioned above and the comparison of evolving variants to different classification methods.

TABLE 2 Results of comparison among different methods

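| Method | Option | Rate (%) | Rules |
|---|---|---|---|
| FLEXFIS-Class | SM (singletons) | 89.65 | 77 |
| FLEXFIS-Class | SM (polynomials) | 19.89 | 77 |
| FLEXFIS-Class | MM | 96.23 | Over 560 (56–60 per model) |
| eClass | MISO | 94.42 | 18 |
| eClass | MIMO | 97.40 | 18 |
| eClass | M | 97.77 | 64 (6–8 per model) |
| Incremental SVMs [5] | | 89.59 | – |
| CART [12] | | 97.68 | – |
| Probabilistic NN [4] | | 95.46 | – |
| k-NN [11] | | 97.40 | – |
| Linear regr. by indicator mat. [14] | | 82.0 | – |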

From Table 2, it can be seen that eClass and FLEXFIS-Class MM can compete with well-known classification methods (the decision tree-based method CART with optimal pruning strategy [12], k-NN [11] and probabilistic neural networks [4]) with respect to classification accuracy, and outperform the incremental SVM [5]. Note that the SVM is weaker in terms of accuracy here and is often less practical when the number of samples is large. FLEXFIS-Class SM is more transparent but less accurate, whereas FLEXFIS-Class MM produced 10 TS fuzzy models with 56-60 rules per model, which is hardly interpretable, but achieved an accuracy of over 96%, close to the results of the decision tree classifier CART and of k-NN. Since the results for CART and k-NN were obtained with a best-parameter grid search (optimal pruning strategy for CART; k varied from 1 to 20 for k-NN), the strength of FLEXFIS-Class MM on this data set should be appreciated.

The results also show that eClass achieves high accuracy with the SM architecture (18 rules) and with 6-8 rules per class in the MM case. eClass MISO and eClass MIMO remain well interpretable with 18 fuzzy rules each, and eClassM reaches the highest classification rate in Table 2 (97.77%), slightly above CART (97.68%) and k-NN (97.40%), using 10 MISO fuzzy models with just 6-8 fuzzy rules each.

The results in the last row were obtained using linear regression by indicator matrix, which serves as the ‘control group’.

7.Conclusion

In this paper, we propose two families of evolving fuzzy rule-based classifiers (eClass and FLEXFIS-Class) to help marketers take distinct actions towards customers belonging to different categories, using two model structures: single-model and multi-model architectures. Both methods evolve their rule structure and adapt the parameters of their antecedent and consequent parts on demand from newly loaded samples. They compare favorably in terms of accuracy with well-established classifiers and with the incremental version of the recently popular SVM method [5].

This study applies new methods to the classification of consumers. Although these classifiers may have shortcomings that need to be overcome in the future, the study can serve as an attempt to classify consumers in novel ways. Because of economic globalization and the spread of networks, marketers now receive more and more data in a short time and may find it difficult to deal with data that arrive continually. Conventional approaches work off-line and see the whole data set at once, whereas the proposed classifiers have recursive on-line learning mechanisms and are thus applicable in real time. Since they also compare favorably with well-established classifiers in terms of accuracy, they may be more useful for the classification of consumers, and in other fields, in the future.

References

[1] Cook VJ, Mindak WA. A search for the constants: The ‘heavy user’ revisited! , Journal of Consumer Marketing 1984;1:79–81.

[2] Stewart TA. After all you’ve done for your customers, why are they still not happy? Fortune 1995;11:178–82.

[3] H. Maturino-Lozoya, D. Munoz-Rodriguez, F. Jaimes-Romera, H. Tawfik, Handoff algorithms based on fuzzy classifiers, IEEE Trans.Vehicular Technol. 49 (6) (2000) 2286–2294.

[4] P. Wasserman, Advanced Methods in Neural Computing, Van Nostrand Reinhold, New York, 1993.

[5] C. Diehl, G. Cauwenberghs, SVM incremental learning, Proc. Internat. Joint Conf. on Neural Networks, vol. 4, 2003, pp. 2685–2690.

[6] P. Angelov, D. Filev, An approach to online identification of Takagi–Sugeno fuzzy models, IEEE Trans. Systems Man Cybernet.—Part B 34(1) (2004) 484–498.

[7] P. Angelov, Evolving Rule-Based Models: A Tool for Design of Flexible Adaptive Systems, Springer, Berlin, Germany, 2002.

[8] P. Angelov, X.-W. Zhou, Evolving fuzzy systems from data streams in real-time, in: 2006 Internat. Symp. on Evolving Fuzzy Systems, 2006, pp. 29–35.


[9] E. Lughofer, FLEXFIS: a robust incremental learning approach for evolving TS fuzzy models, IEEE Trans. Fuzzy Systems (special issue on Evolving Fuzzy Systems), 2008, to appear.

[10] E. Lughofer, Evolving Fuzzy Models—Incremental Learning, Interpretability and Stability Issues, Applications, VDM Verlag Dr. Müller, Saarbrücken, 2008.

[11] R. Duda, P. Hart, D. Stork, Pattern Classification, second ed., Wiley-Interscience, Southern Gate, Chichester, West Sussex, England, 2000.

[12] L. Breiman, J. Friedman, C. Stone, R. Olshen, Classification and Regression Trees, Chapman & Hall, Boca Raton, 1993.

[13] S. Abe, M. Lan, A method for fuzzy rules extraction directly from numerical data and its application to pattern classification, IEEE Trans. Fuzzy Systems 3 (1) (1995) 18–28.

[14] H. Ishibuchi, T. Nakashima, T. Murata, Performance evaluation of fuzzy classifier systems for multidimensional pattern classification problems, IEEE Trans. Systems Man Cybernet.—Part B: Cybernet. 29 (5) (1999) 601–618.