Neural network as a classier - Cost-Based Evaluation Framework

4.2 Cost-Based Evaluation Framework

4.4.1 Neural network as a classier

In the design of neural network as a classier, the target output y0 deter-

mines the type of classication result of a class as faulty or not faulty. The following conditions are taken into account to predict the accuracy of classication which are mentioned below for four neural network approaches as a

Chapter 4 Cost-Based Evaluation Framework classier: Classif ier=y0 =⇒          >0, Class is F aulty <0, N ot F aulty (4.16) From Equation 4.16, it can be noticed that if the target output of the neural network y0 is greater than `zero', then a class is classied as faulty,

else it is classied as not-faulty class". Table 4.4 shows the confusion matrix constructed before any neural network model is used as a classier. This table contains a total number of 965 classes, among which 777 classes have zero bugs and the remaining 188 classes have at least one bug.

Table 4.4: Confusion matrix Not-Faulty Faulty

Not-Faulty 777 0

Faulty 188 0

4.4.1.1 Logistic classier

The logistic regression method helps to indicate whether a class is faulty or not, but does not convey anything about the possible number of faults in the class.

Table 4.5: Confusion matrix for Logistic classier Not-Faulty Faulty

Not-Faulty 767 10

Faulty 172 16

In comparison with Table 4.4, after applying logistic regression as a classier (Table 4.5), a total of 783 (767 + 16) classes are correctly classied as not-faulty with an accuracy of 81.13%.

Chapter 4 Cost-Based Evaluation Framework 4.4.1.2 ANN as a classier

Table 4.6 and Table 4.7 show the classication matrix for Gradient Decent and Levenberg Marquardt learning techniques of ANN.

Table 4.6: Confusion matrix for Gradient Descent Not-Faulty Faulty

Not-Faulty 761 16

Faulty 157 31

In comparison with Table 4.4, Gradient Descent ANN (Table 4.6) is able to classify a total of 792 (761+31) classes as not-faulty with a accuracy of 82.07%.

Table 4.7: Confusion matrix for Levenberg Marquardt Not-Faulty Faulty

Not-Faulty 756 21

Faulty 171 17

In comparison with Table 4.4, LM model (Table 4.7) is able to classify a total of 773 (756+17) classes as not-faulty with an accuracy of 80.10%. 4.4.1.3 RBFN as a classier

Three layered RBFN is considered, in which six CK metrics are taken as input nodes, nine hidden centers are taken as hidden nodes and output is the fault classication rate. Table 4.8 shows the classication matrix when Basic RBFN is used as a classier.

From Table 4.4 it is clear that before applying Basic RBFN as classier, a total number of 777 classes contained zero bugs and 188 classes contained at least one bug. After applying Basic RBFN classier (Table 4.8), a total of 607 (517 + 90) classes are correctly classied as not-faulty with an accuracy

Chapter 4 Cost-Based Evaluation Framework Table 4.8: Confusion matrix for Basic RBFN

Not-Faulty Faulty

Not-Faulty 517 260

Faulty 98 90

a. Gradient Descent learning method

Equation 3.21 and 3.22 are used for updating center and weight during training phase. After simplifying Equation 3.21, the equation is represented as:

Cij(k+ 1) =Cij(k)−η1(y0−y)W i

φi

σ2(xj −Cij(k)) (4.17) and the modied Equation 3.22 can be framed as:

Wi(k+ 1) =Wi(k) +η2(y0−y)φi (4.18)

where σ is the width of the center and k is the current iteration number.

Table 4.9 shows the classication matrix for Gradient RBFN. Gradient RBFN classied totally 898 classes (776 + 122) as not-faulty with an accuracy rate of 93.05 %.

Table 4.9: Confusion matrix for Gradient RBFN Not-Faulty Faulty

Not-Faulty 776 1

Faulty 66 122

b. Hybrid learning method

In Hybrid learning method, centers are updated using Equation 3.23, while weights are updated using supervised learning methods. In this work, Least Mean Square Error (LMSE) algorithm is used for updating the weights.

Table 4.10 shows the classication matrix for Hybrid RBFN. Hybrid RBFN classied totally 767 classes (747 + 20) as not-faulty with an accuracy of 79.4%.

Chapter 4 Cost-Based Evaluation Framework Table 4.10: Confusion matrix for Hybrid RBFN

Not-Faulty Faulty

Not-Faulty 747 30

Faulty 168 20

4.4.1.4 FLANN as a classier

Table 4.11 shows the obtained classication matrix when FLANN technique is used as a classier.

Table 4.11: Confusion matrix for FLANN Not-Faulty Faulty

Not-Faulty 742 35

Faulty 160 28

After applying FLANN classier (Table 4.11), a total of 770 (742+28) classes are correctly classied as not-faulty with an accuracy of 79.79%. 4.4.1.5 PNN as a classier

Table 4.12 shows the classication matrix obtained by applying PNN technique as a classier.

Table 4.12: Confusion matrix for PNN Not-Faulty Faulty

Not-Faulty 775 2

Faulty 181 7

In comparison with Table 4.4, it is observed that, after applying PNN as a classier (Table 4.12), a total of 782 (775 + 7) classes are correctly classied

Chapter 4 Cost-Based Evaluation Framework The fault removal cost for AIF version 1.6 obtained by applying Logistic regression, ANN, RBFN, FLANN and PNN techniques are tabulated in Table 4.13. In this cost-based evaluation framework, the Estimated fault removal cost of the software without using fault prediction technique `Tcost' is found to be 3546.8 sta hours per defect.

Table 4.13: Fault removal cost for AIF 1.6 using various classier models

Classication model Precision TP Rate FP Rate TN Rate FN Rate Accuracy Ecost NEcost Logistic regression 61.54 08.51 01.29 98.71 91.49 81.13 3119.4 0.8795 Gradient Descent 65.96 16.49 02.06 97.94 83.51 82.07 3109.7 0.8762 Levenberg Marquardt 44.74 09.04 02.70 97.30 90.96 80.10 3145.3 0.8868 RBFN Basic 25.71 47.87 33.46 66.54 52.13 62.90 3622.3 1.0213 RBFN Gradient 99.19 64.89 00.13 99.87 35.11 93.50 2922.0 0.8238 RBFN Hybrid 40.00 10.64 03.86 96.14 89.36 79.40 3162.8 0.8917 FLANN 44.44 14.89 04.50 95.50 85.11 79.79 3162.1 0.8915 PNN 77.78 03.72 00.26 99.74 96.28 81.03 3114.3 0.8780

4.4.1.6 Comparison of cost analysis

Data set of AIF version 1.6 from PROMISE repository is chosen to evaluate the impact of fault prediction technique. The fault removal cost (NEcost) computed using the proposed framework is used to evaluate the models.

To illustrate eectiveness of the proposed cost-based evaluation framework, classier models such as Logistic regression, ANN, RBFN, FLANN and PNN are used for computing misclassication cost. The goal is to demonstrate the cost evaluation framework and suggest whether performing fault prediction using particular prediction model is useful or not rather than identifying the best" fault-prediction model.

Table 4.13 shows the various parameters related to cost evaluation framework along with NEcost. NEcost is the criterion used in evaluating a classication model to show the usefulness of fault prediction.

Chapter 4 Cost-Based Evaluation Framework From Table 4.13, it can be observed that:

i. The Gradient Descent approach of RBFN classier obtained the best classication rate of 93.50% when compared with other four models, and ii. Gradient Descent RBFN incurs less cost involved in testing (with a

NEcost ratio of 0.8238).

This indicates that performing fault prediction on the basis of classication cost involved using Hyrbid RBFN method is very much eective in comparison with LR, GD, LM, FLANN and PNN models.

It is observed that, Normalized fault removal cost (NEcost), which is the ratio of Ecost and Tcost (Equation 4.9) determines the fault prediction model's eectiveness in a succinct manner.

The proposed cost-based evaluation framework provides:

1. A binary yes or no scale whether to perform fault prediction analysis or not.

2. A criterion to choose a better fault prediction model based not only on the obtained accuracy rate but also taking NEcost into consideration.

4.5 Summary

In this chapter, an attempt has been made to design a cost based evaluation framework for nding the eciency of the developed fault prediction model using dierent neural network models as classiers. Models such as LR, ANN, RBFN FLANN and PNN were used as classiers. NEcost of the proposed cost based analysis framework was used as a criterion to evaluate the eectiveness of the respective models. From the obtained results, it is noticed that the Gradient Descent approach of RBFN obtained better results in terms of fault removal cost when compared with LR, GD, LM, Basic and Hyrbid RBFN,

Chapter 5 Test Data Generation for

Object-Oriented Program using

Meta-heuristic Search Algorithms

An automated approach for test data generation is very crucial as manual process is laborious and time consuming. It is observed that, test data generation for object-oriented programs is dicult because of inheritance, polymorphism, and data abstraction.

Here, Control ow graph for Object-Oriented program is automatically generated, referred to as Extended Control Flow Graph (ECFG). From the generated ECFG, test data are generated for a selected target path. Here test data are generated for the case study i.e., Bank ATM withdrawal task [15] using three meta-heuristic algorithms viz., Clonal selection algorithm, Binary particle swarm optimization, and Articial bee colony. Experimental results show that Articial bee colony algorithm generates suitable test data.

Chapter 5 Test Data Generation for Object-Oriented Program

5.1 Introduction

Test data generation in program testing, is the process of identifying a set of test data which satises the given testing criterion [123]. A test data generator is a tool which helps a tester in generating test data for a given program. Most of the test data generating activities are based on the aspects of either path wise generation [124, 125, 126] or data specic generation [127, 128, 129] or random test data generation [130]. However, these techniques require complex algebraic computations. Hence, articial intelligence based approaches can be applied for reducing testing eorts.

In this chapter, three meta-heuristic search algorithms viz., Clonal selection algorithm, Binary particle swarm optimization and Articial bee colony optimization are used for generating test data for Object-Oriented program. All the paths selected in a module need to be executed (at least once), and thus generating a large set of test data (manually) for these set of paths is quite a dicult task. Hence, certain degree of automation may be thought of as an alternative to minimize the use of testing resources. The meta-heuristic search algorithms help in achieving this goal by generating test data required to cover the paths in a module, in an near-optimal way.

In document Software Fault Prediction and Test Data Generation Using Articial Intelligent Techniques (Page 111-119)