
NEURO FUZZY SCHEME FOR SIMULTANEOUS FEATURE SELECTION AND ERROR APPROACH

N. Angayarkanni¹, Assistant Professor

Department of Electronics and Communication Engineering, PGP College of Engineering and Technology

Namakkal-637 207, Tamil Nadu, India.

Dr. D. Kumar², Dean Research, Periyar Maniammai University

Vallam, Thanjavur-613 403, Tamil Nadu, India.

Abstract

This paper proposes a neuro-fuzzy scheme for designing a classifier together with feature selection and error analysis. In this scheme, the network learns both the important features and the classification rules. The network is trained by error back-propagation in three phases. A score function is computed for each feature in each subsequent phase of the modelling, and the feature with the highest score among the set of unused features is selected and used. In this method, feature selection is not done in an online manner. The methodology performs simultaneous feature analysis and system identification in a four-layered neuro-fuzzy framework. The proposed system is effective in obtaining the error between the training and test sets. The classifier is also sensitive to changes in the parameters of the membership functions and is therefore helpful for error analysis.

Key words: Classification, feature analysis, neuro fuzzy systems

1. INTRODUCTION

Neural networks learn input-output mappings by minimizing a suitable error function with the back-propagation training algorithm. However, neural networks cannot learn or represent knowledge explicitly, which a fuzzy system can do through fuzzy if-then rules [3]. The structure of an evaluation network includes four units and n input units from the environment. This paper presents a methodology for simultaneous feature analysis and system identification in a four-layered neuro-fuzzy framework, using three steps for classification and error analysis. In the first step, starting from a coarse definition of the initial membership functions, the network selects the important features and learns the initial rules. In the second step, the redundant nodes detected by the feature attenuators are pruned, and the network is retrained to improve its performance in the reduced architecture. In the third step, the architecture is further reduced by pruning incompatible rules. After pruning, the network represents the final set of rules.

The learning task may include the identification of the main control parameters or the development and tuning of the fuzzy membership functions used in the control rules. One can distinguish three classes of learning: supervised learning, reinforcement learning, and unsupervised learning. In supervised learning, a teacher [1] provides the desired output for each training input.


2. THE NETWORK STRUCTURE

Let the network have $x$ inputs $(a_1, a_2, \ldots, a_x)$ and $c$ classes $(c_1, c_2, \ldots, c_c)$. The first layer is the input layer, the second layer is the membership function and feature selection layer, the third layer is called the intersection layer [1], and the fourth layer is the output layer. The activation function of each node, together with its inputs and outputs, is discussed next, layer by layer.

Layer 1: The number of nodes in this layer is equal to $x$. Let $a_p$ be the input to the $p$th node in layer 1; then the output of this node is

$$y_p = a_p \tag{1}$$

Layer 2: Each node in this layer represents the membership function of a linguistic value associated with an input linguistic variable, and its output is the membership grade of the input with respect to that linguistic value. Bell-shaped membership functions are used here. All connection weights between the nodes in layer 1 and layer 2 are unity. If there are $S_i$ fuzzy sets associated with the $i$th feature, then the number of nodes in this layer is $\sum_{i=1}^{x} S_i$. For a node $n$ associated with feature $p$, the bell-shaped membership grade is

$$\hat{y}_n = \exp\left\{ -\frac{(y_p - \mu_n)^2}{\sigma_n^2} \right\} \tag{2}$$

where $\mu_n$ and $\sigma_n$ are the centre and spread, respectively, of the bell-shaped function [8]. A tunable parameter $\beta_p$, called a feature modulator, is associated with each input feature $a_p$, and the $\beta_p$'s are learned by back-propagation. The output of node $n$ is the modulated grade

$$y_n = \hat{y}_n^{\,\left(1 - e^{-\beta_p^2}\right)} \tag{3}$$

When $\beta_p$ takes a large value, the exponent approaches 1 and $y_n$ tends to $\hat{y}_n$; for small values of $\beta_p^2$, the exponent approaches 0 and $y_n$ tends to 1, thereby making the feature indifferent. In learning phase I, the parameters of the membership functions are kept fixed and, among the layer 2 parameters, only the $\beta_p$'s are learned through error back-propagation. In learning phase III, the $\beta_p$'s are kept fixed and the membership function parameters are updated.
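A minimal sketch of Eqs. (2) and (3) for a single layer 2 node, assuming plain Python floats: the bell-shaped grade is raised to the power $1 - e^{-\beta_p^2}$. The function names and the example centre, spread and input values are hypothetical, chosen only to show how $\beta_p$ moves the output between the unmodulated grade and 1.

```python
import math


def bell_membership(y_p, mu_n, sigma_n):
    """Eq. (2): bell-shaped membership grade of input y_p for fuzzy set n."""
    return math.exp(-((y_p - mu_n) ** 2) / sigma_n ** 2)


def modulated_membership(y_p, mu_n, sigma_n, beta_p):
    """Eq. (3): raise the bell grade to the power (1 - exp(-beta_p**2)).

    Large beta_p -> exponent near 1 -> output near the bell grade (feature used).
    Small beta_p -> exponent near 0 -> output near 1 (feature made indifferent).
    """
    exponent = 1.0 - math.exp(-beta_p ** 2)
    return bell_membership(y_p, mu_n, sigma_n) ** exponent


# Hypothetical example: input 4.0, fuzzy set centred at 1.5 with spread 1.0.
for beta in (0.01, 1.0, 5.0):
    print(beta, modulated_membership(4.0, 1.5, 1.0, beta))
```

With a poorly matching input, a small $\beta_p$ pushes the grade close to 1, so the feature stops discriminating between rules; a large $\beta_p$ leaves the bell grade essentially unchanged.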

Fig. 1. The network structure

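Layer 3: This layer is called the antecedent layer. Each node in this layer represents the IF part of a fuzzy rule. Since the network starts with all possible rules, there is one node for every combination of fuzzy sets taking one set from each feature, giving $\prod_{i=1}^{x} S_i$ nodes; for the Iris network of Section 5 this is $3^4 = 81$. The output of the $m$th node in layer 3 is

$$y_m = \left( \frac{\sum_{n \in P_m} y_n^{\,q}}{|P_m|} \right)^{1/q} \tag{4}$$

where $P_m$ is the set of indexes of the nodes in layer 2 connected to node $m$ of layer 3 and $q$ is the exponent of this generalized mean.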
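Eq. (4) is a generalized (power) mean of the incoming grades. The sketch below assumes $q = -10$ purely for illustration; the text does not state the value used. With a negative $q$ the mean is pulled towards the smallest grade and approaches the minimum as $q \to -\infty$, so the node behaves like a soft fuzzy AND. The function name is hypothetical.

```python
def antecedent_output(grades, q=-10.0):
    """Eq. (4): generalized mean of the layer 2 grades y_n for n in P_m.

    grades must be strictly positive when q < 0 (0.0 ** negative raises
    ZeroDivisionError). q = -10 is an illustrative assumption.
    """
    total = sum(g ** q for g in grades)
    return (total / len(grades)) ** (1.0 / q)


grades = [0.9, 0.2]
# The result lies between min and max, and moves towards min as q grows more negative.
print(antecedent_output(grades), antecedent_output(grades, q=-100.0), min(grades))
```

Unlike a hard minimum, this mean is differentiable in every input grade, which is what makes the layer 2 deltas of Eq. (10) well defined.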


Layer 4: The nodes in this layer perform an OR operation that combines the outputs of the intersection nodes of layer 3 connected to them. The output of node $l$ in this layer represents the certainty with which a data point belongs to class $l$:

$$y_l = \max_{m \in P_l} \left( y_m \, \omega_{lm} \right) \tag{5}$$

where $P_l$ represents the set of indexes of the nodes in layer 3 connected to node $l$ of layer 4. Since the $\omega_{lm}$'s are interpreted as certainty factors, each should be non-negative and should lie in $[0, 1]$. The error back-propagation algorithm, or any other gradient-based search algorithm, does not guarantee that $\omega_{lm}$ will remain non-negative, even if training starts with non-negative weights [1]. Hence $\omega_{lm}$ could be modelled by $e^{-g_{lm}^2}$, which always lies in $(0, 1]$; however, non-negativity alone is enough, so $\omega_{lm} = g_{lm}^2$. Therefore, the output (activation function) of the $l$th node in layer 4 is

$$y_l = \max_{m \in P_l} \left( y_m \, g_{lm}^2 \right) \tag{6}$$

The learnable weights $\omega_{lm}$ (through $g_{lm}$) are updated in all three learning phases.
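Eq. (6) for a single output node can be sketched as below. The dictionary-based representation and the function name are hypothetical. The example deliberately uses a negative $g_{lm}$ to show that squaring keeps the effective certainty factor non-negative; note that, as the text accepts, $g_{lm}^2$ is not bounded above by 1.

```python
def class_output(rule_strengths, g_l):
    """Eq. (6) for one output node l: y_l = max over m in P_l of y_m * g_lm**2.

    rule_strengths: dict mapping a layer 3 index m (only m in P_l) to y_m.
    g_l: dict mapping m to the unconstrained weight g_lm.
    Returns (y_l, index of the winning rule node).
    """
    # omega_lm = g_lm**2 stays non-negative whatever sign training leaves on g_lm.
    best = max(rule_strengths, key=lambda m: rule_strengths[m] * g_l[m] ** 2)
    return rule_strengths[best] * g_l[best] ** 2, best


y_l, winner = class_output({0: 0.8, 1: 0.3}, {0: -0.5, 1: 1.0})
print(y_l, winner)
```

Returning the winning index is useful during training, because only that rule node receives a gradient through the max.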

3. ERROR ANALYSIS

The error back-propagation algorithm is used for error analysis. The simple perceptron can handle only linearly separable problems. By taking the derivative of the network error with respect to each weight [9], one learns the direction in which the error changes. If one takes the negative of this derivative and adds it to the weight, the error decreases until it reaches a local minimum.

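This makes sense because a positive derivative indicates that the error increases as the weight increases, so the obvious thing to do is to add a negative value to the weight; conversely, if the derivative is negative, a positive value is added. The partial derivatives are computed and applied to the weights starting with the output-to-hidden-layer weights and then moving to the hidden-to-input-layer weights, which is why the method is called the error back-propagation algorithm. All the training phases use back-propagation to minimize the error function

$$E = \frac{1}{2}\sum_{i=1}^{N} E_i = \frac{1}{2}\sum_{i=1}^{N}\sum_{l=1}^{c} \left(o_{il} - y_{il}\right)^2 \tag{7}$$

where $N$ is the number of training data points, $c$ is the number of classes, and $o_{il}$ and $y_{il}$ are the target and computed outputs of the $l$th output node for the $i$th data point.

The delta value $\Delta$ of a node in the network is defined as the influence of the node output on $E$. The derivation of the delta values and the adjustment of the weights are presented layer by layer next, for a single data point (the index $i$ is dropped).

Layer 4: The delta of node $l$ in this layer is

$$\Delta_l = \frac{\partial E}{\partial y_l} = -\left(o_l - y_l\right) \tag{8}$$

Layer 3: The delta of node $m$ in this layer is $\Delta_m = \partial E / \partial y_m$, which gives

$$\Delta_m = \sum_{l \in Q_m} \Delta_l \, g_{lm}^2 \tag{9}$$

Here $Q_m$ is the set of indexes of the nodes in layer 4 connected with node $m$ of layer 3. Because Eq. (6) takes a maximum, only the nodes $l$ for which node $m$ attains that maximum contribute to the sum.

Layer 2: The delta of node $n$ in this layer is $\Delta_n = \partial E / \partial y_n$. Differentiating Eq. (4) gives

$$\Delta_n = \sum_{m \in R_n} \Delta_m \, \frac{y_m \, y_n^{\,q-1}}{\sum_{n' \in P_m} y_{n'}^{\,q}} \tag{10}$$

where $R_n$ is the set of indexes of nodes in layer 3 connected with node $n$ in layer 2.

With the $\Delta$ values calculated for each layer, we can now write the update equations for the weights $g_{lm}$ and the feature modulators $\beta_p$. By the chain rule,

$$\frac{\partial E}{\partial g_{lm}} = \frac{\partial E}{\partial y_l} \, \frac{\partial y_l}{\partial g_{lm}} \tag{11}$$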

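Since $\partial y_l / \partial g_{lm} = 2\, y_m\, g_{lm}$ when node $m$ attains the maximum in Eq. (6), and is zero otherwise, the gradient for the winning node is

$$\frac{\partial E}{\partial g_{lm}} = 2\, \Delta_l\, y_m\, g_{lm} \tag{12}$$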

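Similarly, differentiating Eq. (3) with respect to $\beta_p$ gives

$$\frac{\partial E}{\partial \beta_p} = -\sum_{n \in R_p} \Delta_n \, 2\beta_p e^{-\beta_p^2} \, y_n \, \frac{(y_p - \mu_n)^2}{\sigma_n^2} \tag{13}$$

where $R_p$ is the set of indexes of the layer 2 nodes connected to input node $p$.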

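Hence the update equations for the weights $g_{lm}$ and the modulators $\beta_p$, with a learning rate of 0.01, are

$$g_{lm}(t+1) = g_{lm}(t) - 0.01\, \frac{\partial E}{\partial g_{lm}} \tag{14}$$

$$\beta_p(t+1) = \beta_p(t) - 0.01\, \frac{\partial E}{\partial \beta_p} \tag{15}$$

For a sufficiently small learning rate, each such update reduces the error.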
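The derivation of Eqs. (7) to (15) can be checked with a small pure-Python sketch of one phase I training step for a single data point. It assumes a fully connected network in which every layer 3 node combines one fuzzy set from each feature, a softmin exponent $q = -10$ (an assumption; the text does not give $q$), and the learning rate 0.01 of Eqs. (14) and (15). For the max in Eq. (6), the gradient flows only through the winning rule node. The function names `forward`, `gradients` and `train_step` are hypothetical, and membership parameters are held fixed as in phase I.

```python
import itertools
import math


def forward(a, mu, sigma, beta, g, q):
    """Forward pass through layers 2 to 4 (layer 1 passes the input a through)."""
    # Layer 2, Eqs. (2)-(3): node n is keyed by (feature p, fuzzy set s).
    y2 = {}
    for p, a_p in enumerate(a):
        expo = 1.0 - math.exp(-beta[p] ** 2)
        for s in range(len(mu[p])):
            bell = math.exp(-((a_p - mu[p][s]) ** 2) / sigma[p][s] ** 2)
            y2[(p, s)] = bell ** expo
    # Layer 3, Eq. (4): one antecedent node per combination of fuzzy sets.
    combos = list(itertools.product(*[range(len(mu_p)) for mu_p in mu]))
    y3 = []
    for combo in combos:
        total = sum(y2[(p, s)] ** q for p, s in enumerate(combo))
        y3.append((total / len(combo)) ** (1.0 / q))
    # Layer 4, Eq. (6): max over rule nodes of y_m * g_lm**2; keep the winner.
    y4, winners = [], []
    for g_l in g:
        m = max(range(len(y3)), key=lambda k: y3[k] * g_l[k] ** 2)
        y4.append(y3[m] * g_l[m] ** 2)
        winners.append(m)
    return y2, combos, y3, y4, winners


def error(y4, target):
    """Eq. (7) for a single data point."""
    return 0.5 * sum((o - y) ** 2 for o, y in zip(target, y4))


def gradients(a, target, mu, sigma, beta, g, q):
    """Return E and the gradients of Eqs. (12) and (13)."""
    y2, combos, y3, y4, winners = forward(a, mu, sigma, beta, g, q)
    d4 = [-(o - y) for o, y in zip(target, y4)]  # Eq. (8)
    d3 = [0.0] * len(y3)
    for l, m in enumerate(winners):  # Eq. (9), winning rule node only
        d3[m] += d4[l] * g[l][m] ** 2
    d2 = {n: 0.0 for n in y2}
    for m, combo in enumerate(combos):  # Eq. (10)
        denom = sum(y2[(p, s)] ** q for p, s in enumerate(combo))
        for p, s in enumerate(combo):
            d2[(p, s)] += d3[m] * y3[m] * y2[(p, s)] ** (q - 1) / denom
    grad_g = [[0.0] * len(y3) for _ in g]
    for l, m in enumerate(winners):  # Eq. (12)
        grad_g[l][m] = 2.0 * d4[l] * y3[m] * g[l][m]
    grad_beta = []
    for p, a_p in enumerate(a):  # Eq. (13)
        factor = 2.0 * beta[p] * math.exp(-beta[p] ** 2)
        grad_beta.append(-sum(
            d2[(p, s)] * factor * y2[(p, s)] * (a_p - mu[p][s]) ** 2 / sigma[p][s] ** 2
            for s in range(len(mu[p]))
        ))
    return error(y4, target), grad_g, grad_beta


def train_step(a, target, mu, sigma, beta, g, q=-10.0, lr=0.01):
    """One phase I step, Eqs. (14)-(15); returns the pre-update error."""
    e, grad_g, grad_beta = gradients(a, target, mu, sigma, beta, g, q)
    new_g = [[w - lr * dw for w, dw in zip(g_l, dg_l)] for g_l, dg_l in zip(g, grad_g)]
    new_beta = [b - lr * db for b, db in zip(beta, grad_beta)]
    return e, new_g, new_beta
```

For example, with two features, two fuzzy sets per feature (so four rule nodes) and two classes, `train_step` returns the error before the update together with the updated $g_{lm}$ and $\beta_p$. Comparing the returned gradients with central finite differences of `error` is a quick way to confirm Eqs. (12) and (13).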

4. TUNING OF MEMBERSHIP FUNCTION

Tuning of the membership function parameters tries to improve the performance of each rule separately. As a result, if both the modulator functions and the membership function parameters [2] are tuned simultaneously, the learning process may become unstable. Therefore, the membership function parameters should be tuned only after feature elimination.
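The ordering constraint above can be encoded as a per-phase mask of trainable parameter groups. This is a minimal sketch, assuming parameters are stored as flat lists under the hypothetical keys `g`, `beta`, `mu` and `sigma`. Phases I and III follow the text; the phase II entry is an assumption, since the text only states that the rule weights are updated in all three phases.

```python
# Parameter groups that receive gradient updates in each learning phase.
PHASE_TRAINABLE = {
    "I": {"g", "beta"},            # membership parameters kept fixed
    "II": {"g"},                   # ASSUMPTION: only rule weights retrained after pruning
    "III": {"g", "mu", "sigma"},   # modulators kept fixed, membership functions tuned
}


def apply_updates(params, grads, phase, lr=0.01):
    """Gradient-descent update restricted to the groups trainable in `phase`.

    params, grads: dicts mapping a group name to a flat list of floats.
    Groups not trainable in this phase are copied unchanged, so beta and the
    membership parameters are never moved in the same phase.
    """
    trainable = PHASE_TRAINABLE[phase]
    return {
        name: ([v - lr * d for v, d in zip(values, grads[name])]
               if name in trainable else list(values))
        for name, values in params.items()
    }


params = {"g": [1.0], "beta": [1.0], "mu": [0.0], "sigma": [1.0]}
grads = {name: [1.0] for name in params}
print(apply_updates(params, grads, "I"))
print(apply_updates(params, grads, "III"))
```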

Table 1. Summary of data sets.

5. RESULT AND DISCUSSION

The Iris data set has four features and three classes. Here we have used three fuzzy sets for each of the four features.

In this case, the initial network has 81 intersection nodes, resulting in 81 × 3 = 243 rules. After pruning the redundant nodes, the number of intersection nodes becomes 9, so at this stage the number of rules is 9 × 3 = 27. Next, the incompatible rules are pruned to obtain nine rules. For the Iris data we found four rarely used rules and removed them. The final architecture therefore represents only five rules, which are depicted in Figure 2.

Name | Total size | Test size | No. of classes | No. of features


Fig. 2. Rules for classification of Iris

Linguistic rules are shown in Table 3. In the IRIS data, features 3 and 4 represent the petal length and petal width of the iris flower, respectively. In Table 3, pl and pw represent the petal length and petal width, respectively.

Table 2. Initial architecture of the network used for IRIS

Layer | Number of nodes
Layer 1 | 4
Layer 2 | 12
Layer 3 | 81
Layer 4 | 3
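The layer sizes in Table 2 and the rule counts reported above follow from the structure in Section 2: $x$ input nodes, $\sum_i S_i$ membership nodes, one antecedent node per combination of fuzzy sets ($\prod_i S_i$), and $c$ output nodes, with every antecedent node linked to every class. A quick check in Python, assuming all combinations are present initially and that the 9 intersection nodes left after pruning correspond to the two petal features used in Table 3 (an inference, since $3^2 = 9$). The helper name is hypothetical.

```python
from math import prod


def architecture(fuzzy_sets_per_feature, n_classes):
    """Node counts of the four layers when every fuzzy-set combination is an antecedent."""
    return {
        "layer1": len(fuzzy_sets_per_feature),   # one node per input feature
        "layer2": sum(fuzzy_sets_per_feature),   # one node per fuzzy set
        "layer3": prod(fuzzy_sets_per_feature),  # one node per antecedent combination
        "layer4": n_classes,                     # one node per class
    }


iris = architecture([3, 3, 3, 3], n_classes=3)
print(iris)
print("initial rules:", iris["layer3"] * iris["layer4"])
reduced = architecture([3, 3], n_classes=3)  # assumed: two retained (petal) features
print("rules after pruning redundant nodes:", reduced["layer3"] * reduced["layer4"])
```

`math.prod` requires Python 3.8 or later.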


Table 3. Rules for IRIS data in terms of petal length and petal width

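Rule No | Rule | Class
1 | If pl is close to 1.5 & pw is close to 0.25 | Class 1
2 | If pl is close to … & pw is close to 1.25 | Class 2
3 | If pl is close to … & pw is close to 1.25 | Class 3
4 | If pl is close to … & pw is close to 2.25 | Class 3
5 | If pl is close to … & pw is close to 2.25 | …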


Pruning of redundant nodes leaves 81 intersection nodes, and the removal of incompatible rules consequently yields 81 rules. Among these 81 rules, 32 were zero rules and there were no rarely used rules.

The error versus epoch is shown in Figure 3 for class 1, Figure 4 for class 2, and Figure 5 for class 3.

Fig. 3. Error approach for class 1 (plot of error versus epoch, epochs 0 to 50)

The error is minimized within 30 epochs. The error is minimized by error back-propagation.

Fig. 4. Error approach for class 2 (plot of error versus epoch, epochs 0 to 50)


Fig. 5. Error approach for class 3 (plot of error versus epoch, epochs 0 to 50)

6. CONCLUSION

A neuro-fuzzy scheme for simultaneous feature selection and fuzzy rule-based classification has been developed. Here the network starts with all possible rules, and the training process retains only the rules required for classification, resulting in a smaller final network. The final network has a lower running time than the initial network. The error between the training and test sets was obtained. An important related problem is to find the sensitivity of the output of a neuro-fuzzy classifier with respect to its internal parameters. The authors believe that this architecture is general enough for use in other rule-based systems that perform fuzzy logic inference.

REFERENCES

[1] Berenji, H. R. and Khedkar, P., "Learning and tuning fuzzy logic controllers through reinforcements", IEEE Trans. Neural Networks, vol. 3, pp. 724-740, 1992.

[2] Chakraborty, D. and Pal, N. R., "Integrated feature analysis and fuzzy rule-based system identification in a neuro-fuzzy paradigm", IEEE Trans. Syst., Man, Cybern. B, vol. 31, pp. 391-400, 2001.

[3] Chakraborty, D. and Pal, N. R., "Neuro-fuzzy scheme for simultaneous feature selection and fuzzy rule-based classification", IEEE Trans. Neural Networks, vol. 15, pp. 110-123, 2004.

[4] Freeman, J. A. and Skapura, D. M., "Neural Networks: Algorithms, Applications, and Programming Techniques", Addison-Wesley, 1990.

[5] Ishibuchi, H., Nozaki, K., Yamamoto, N. and Tanaka, H., "Selecting fuzzy if-then rules for classification problems using genetic algorithms", IEEE Trans. Fuzzy Syst., vol. 3, pp. 260-270, 1995.

[6] Kasabov, N. and Song, Q., "DENFIS: dynamic evolving neural-fuzzy inference system and its application for time-series prediction", IEEE Trans. Fuzzy Syst., vol. 10, pp. 144-154, 2002.

[7] Mao, J. and Jain, A. K., "Artificial neural networks for feature extraction and multivariate data projection", IEEE Trans. Neural Networks, vol. 6, pp. 296-317, 1995.

[8] Paul, S. and Kumar, S., "Subsethood-product fuzzy neural inference system (SuPFuNIS)", IEEE Trans. Neural Networks, vol. 13, pp. 578-599, 2002.

[9] Kumar, S., "Neural Networks: A Classroom Approach", Tata McGraw-Hill Publishing Company, New Delhi, 2004.

[10] Setnes, M. and Kaymak, U., "Fuzzy modelling of client preference from large data sets: an application to target selection in direct marketing", IEEE Trans. Fuzzy Syst., vol. 9, pp. 153-163, 2001.