Software for Improving Source Code Quality

Franco Madou, Martín Agüero, Gabriela Esperón, Daniela López De Luise

Abstract—This paper presents a new software quality support tool, a Java source code analyzer and programmer advisor based on artificial intelligence techniques. It describes a new approach to automatically evaluate and improve source code quality and, consequently, the resulting software product. This tool, called IJA (Intelligent Java Analyzer), builds a model from source code metrics, classifies attributes and then generates recommendations for the author. It is composed of an artificial neural network for data classification and an expert system for dynamic suggestion selection.

The results presented in this work show that the proposed prototype is a solid approach for supporting the software quality assurance process.


Keywords—Software metrics, artificial intelligence, neural networks, clustering algorithms, expert systems

I. INTRODUCTION

In order to prove that a computer program is mature and free of bugs, and that the Software Requirements Specification (SRS) has been met, it is necessary to have a strategy that supports this process. The goal of any software project is to accomplish the above-mentioned requirements, which means achieving the best possible quality. Historically, the word “quality” has been adapted and has evolved together with the different technologies to which it has been applied. In the thirties, the metallurgical industry defined quality as compliance with requirements; any deviation from such requirements meant loss of quality or limited trust in product quality. In the fifties, quality costs increased exponentially. Therefore, specifications including tolerance (i.e., a deviation from perfection) were proposed. Inspections ensured that the product fell within a predefined tolerance. The goal of such inspections was to avoid corrections through the identification of product deviations from the original specification [3].

Software development does not involve serial production costs, but it is an intensive activity [4], so it is reasonable to apply quality concepts to it. It requires the interaction and coordination of several specialists during all development stages. In the following subsections, different perspectives of software quality are presented. The present project proposes a tool that performs an automatic audit of Java source code. Its goal is to increase the quality of the source code by checking whether it complies with the specification. To do so, IJA uses a neural network to identify the profile of the programmer. It also builds recommendations according to the detected errors, employing software metrics, and finally advises the programmer on how to improve the source code.

Franco Madou is a researcher at Universidad de Palermo, Buenos Aires, Argentina (e-mail: francomadou19@hotmail.com).

Martín Agüero is a researcher at Universidad de Palermo ([email protected]).

Gabriela Esperón is a researcher at Universidad de Palermo, Buenos Aires, Argentina (e-mail: [email protected]).

Daniela López De Luise is a researcher at Universidad de Palermo, Buenos Aires, Argentina (e-mail: [email protected]).

A. Software Quality

"Conformance to requirements" implies that requirements must be clearly stated such that they cannot be misunderstood.

Then, in the development and production process, measurements are taken regularly to determine conformance to those requirements. The nonconformances are regarded as defects, that is, as absence of quality. For example, one requirement (specification) for a certain radio may be that it must be able to receive certain frequencies more than 30 miles away from the source of broadcast. If the radio fails to do so, then it does not meet the quality requirements and should be rejected. The "fitness for use" definition takes customers' requirements and expectations into account, which involve whether the products or services fit their uses. Since different customers may use the products in different ways, products must possess multiple elements of fitness for use. According to Juran [30], each of these elements is a quality characteristic and all of them can be classified into categories known as parameters for fitness for use. The two most important parameters are quality of design and quality of conformance [1].

B. Software industry requirements

Since software deals with man-made artifacts, it is necessary to view software development as an experimental science and to build models of the artifacts and of the processes by which they are manufactured. To do this, it is necessary to isolate and categorize the components of the discipline, define notations for representing these components, and specify the interrelationships among these components as they are manipulated. The components of the discipline consist of various processes (life cycle models, methods, techniques, tools), products (code components, requirements, designs, specifications, test plans), and other forms of experience (resource models, defect models, quality models, economic models). We need to build descriptive models of the discipline components to better understand: (1) the nature of the processes and products and their various characteristics, (2) the variations among them, (3) the weaknesses and strengths of both, and (4) mechanisms to predict and control them [2].

C. Economic Impact

Today the complexity of software is increasing at an alarming rate. Quality is defined as a bundle of attributes, and the level of those attributes holds a positive value. Few companies are interested in advanced testing techniques as a way of forecasting field reliability based on test data and of calculating defect density to benchmark quality. Standardized automated testing scripts along with standard metrics would also provide a more consistent method to determine when to stop testing. Most developers prefer early bug detection, within the same development stage. Based on software community and user surveys, the annual US cost of an inadequate infrastructure for software testing is estimated to range from $22.2 to $59.5 billion¹. Over half of these costs are borne by software users in the form of error avoidance and mitigation activities. The remaining costs are borne by software developers and reflect the additional test resources that are consumed due to inadequate testing tools and methods [18].

II. INTELLIGENT JAVA ANALYZER

This new software tool prototype employs traditional and new software metrics to model the source code content in context. Metric results are converted from continuous values to discrete ones according to their deviation from the programming language specification. Thresholds are set to obtain distances from the preferred code style. A dataset built with the Expectation Maximization (EM) data mining algorithm is the reference data source used to train a neural network. A Multilayer Perceptron (MLP) artificial neural network (NN) classifies the source code instances into the clusters previously established by the training set. In a programmed self-tuning process, the prototype can adjust each cluster profile, determining a dynamic, distinctive identity for every classification output.

The classification phase groups the source code instances that share common attributes of their syntaxes. In those cases where the attribute reveals a sign of erroneous language handling, a recommendation phase is activated. An expert system pre-loaded with rules analyzes the classification results and identifies every inaccurate source code usage. The rule engine also builds a set of recommendations based on key features detected in the code. The analysis process is completed with a report-style output advising the author on convenient procedures to improve the source code.

III. TECHNICAL BACKGROUND

This section presents the main foundations of the prototype:

A. Software Metrics

A software metric is defined as a measure of some property of a piece of software or its specifications. It represents a numeric value for a particular source code feature. Metrics are generally more useful when several of them survey software characteristics together, because they give the programmer a more precise picture. The present project analyzes source code attributes using traditional software metrics [19] (e.g., Lines of Code or Comment Percentage) and new ones specifically designed for this prototype. The metrics displayed in Table I are part of this project’s contribution.

¹ The impact estimates do not reflect “costs” associated with mission-critical software where failure can lead to extremely high costs such as loss of life or catastrophic failure.

TABLE I IJA SOFTWARE METRICS
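The two traditional metrics named in the text, Lines of Code and Comment Percentage, can be illustrated with a minimal sketch. The class name `BasicSourceMetrics` and the counting rules (non-blank lines count as code lines; a line counts as a comment when it starts with a comment marker) are assumptions made for illustration and are not taken from the IJA implementation.

```java
// Hypothetical helper; names and counting rules are illustrative assumptions.
final class BasicSourceMetrics {

    // Counts the non-blank lines of a source text.
    static int linesOfCode(String source) {
        int count = 0;
        for (String line : source.split("\n")) {
            if (!line.trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }

    // Percentage of non-blank lines that begin with a comment marker.
    static double commentPercentage(String source) {
        int total = 0;
        int comments = 0;
        for (String line : source.split("\n")) {
            String t = line.trim();
            if (t.isEmpty()) {
                continue;
            }
            total++;
            if (t.startsWith("//") || t.startsWith("/*") || t.startsWith("*")) {
                comments++;
            }
        }
        return total == 0 ? 0.0 : 100.0 * comments / total;
    }
}
```

A real analyzer would count comments through its parser rather than by line prefixes; the prefix test is only the simplest rule that makes the two metrics concrete.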

B. Samples

In order to define the coding styles universe (CSU), a set of representative and accurate samples consisting of 1000 source code files (SCF) was collected, with the features listed in Table II.

TABLE II SAMPLES ORIGIN

Table II sample features: 25% of the SCF were coded by amateur programmers with little or moderate experience, 25% were developed by authors of Java academic material, 25% come from open source applications with commercial usage (Spring Framework, MySQL JDBC Driver, iReport, JDepend and JavaNCSS), and the remaining 25% are part of the J2SE 1.5 API.

C. Data Mining

In order to find the best number of clusters for the MLP, the Expectation Maximization (EM) algorithm [21] was applied. Parameters were set so that clusters were detected automatically by using cross-validation. The EM algorithm stops when there is no significant increase in quality. The quality is measured with:

$$ r_1 P(a) + r_2 P(b) + r_3 P(c) + r_4 P(d) + r_5 P(e) \tag{1} $$

where a, b, c, d and e are the clusters and r_1, r_2, r_3, r_4 and r_5 are the parameters. The log-likelihood is a measure of the credibility of these probabilities. It is obtained from the product of the conditional probabilities over every instance x_i in the sample:

$$ \prod_i \left( r_1 P(x_i \mid a) + r_2 P(x_i \mid b) + r_3 P(x_i \mid c) + r_4 P(x_i \mid d) + r_5 P(x_i \mid e) \right) \tag{2} $$

Table III shows the clusters determined by the EM algorithm. Each cluster is a group of instances that share common attributes (represented by the software metrics).

TABLE III CLUSTERS
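The quality measure in (2) can be sketched in a few lines. This is an illustration under stated assumptions: the class name `MixtureLikelihood` is hypothetical, the conditional probabilities P(x_i | cluster) are supplied by the caller because the paper does not state which distribution models each cluster, and the logarithm of the product is accumulated as a sum of logarithms to avoid numeric underflow.

```java
// Hypothetical sketch of the mixture log-likelihood; not the EM implementation used by IJA.
final class MixtureLikelihood {

    // weights[k]        : the parameter r_k of cluster k
    // conditional[i][k] : P(x_i | cluster k), supplied by the caller
    // Returns the sum over instances i of log( sum over k of r_k * P(x_i | k) ).
    static double logLikelihood(double[] weights, double[][] conditional) {
        double sum = 0.0;
        for (double[] row : conditional) {
            double mixture = 0.0;
            for (int k = 0; k < weights.length; k++) {
                mixture += weights[k] * row[k];
            }
            sum += Math.log(mixture);
        }
        return sum;
    }
}
```

EM implementations typically compare this value between iterations and stop when the increase becomes negligible, which corresponds to the stopping rule described above.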

D. Neural Network

A multilayer perceptron neural network (NN) was selected to classify the SCF (Source Code Files) based on the attributes created by the metrics. It was trained with the back-propagation algorithm [20].
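As an illustration of how a trained multilayer perceptron assigns an instance to a cluster, the following sketch runs a forward pass through one hidden layer and picks the output with the highest score. The class name `MlpForward`, the sigmoid activation, the single hidden layer and all weights are assumptions; the paper does not give the network topology, and training by back-propagation is not shown.

```java
// Hypothetical forward pass of a small MLP; topology and weights are illustrative only.
final class MlpForward {

    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // One dense layer: out[j] = sigmoid(bias[j] + sum over i of in[i] * w[j][i]).
    static double[] layer(double[] in, double[][] w, double[] bias) {
        double[] out = new double[w.length];
        for (int j = 0; j < w.length; j++) {
            double z = bias[j];
            for (int i = 0; i < in.length; i++) {
                z += in[i] * w[j][i];
            }
            out[j] = sigmoid(z);
        }
        return out;
    }

    // Returns the index of the output unit (cluster) with the highest activation.
    static int classify(double[] metrics,
                        double[][] wHidden, double[] bHidden,
                        double[][] wOut, double[] bOut) {
        double[] scores = layer(layer(metrics, wHidden, bHidden), wOut, bOut);
        int best = 0;
        for (int k = 1; k < scores.length; k++) {
            if (scores[k] > scores[best]) {
                best = k;
            }
        }
        return best;
    }
}
```

The input vector would hold the normalized metric values of one SCF, and each output unit would stand for one of the clusters found by EM.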

The next figure shows the NN classification results in a histogram. Each column represents the sum of instances, where the predominant area color corresponds to the instances correctly classified by the neural network.

Fig. 1 NN classification results histogram

E. Expert System

An expert system, or knowledge-based system, is a computer system that makes decisions or solves problems in a particular field by means of knowledge and analytical rules defined by experts. It is made up of a knowledge base (the rules of the EXSYS, that is, the codified expert knowledge), a working memory (which stores the data received at the beginning in order to solve a problem, then the intermediate conclusions and the final results) and an inference engine, which models the human reasoning process.

The diagram in Fig. 2 represents the basic structure of an expert system. Three examples of very well-known expert system tools are CLIPS, JESS and DROOLS [22].

Fig. 2 Basic architecture of an Expert System
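The three parts described above (knowledge base, working memory and inference engine) can be made concrete with a very small sketch. It assumes forward chaining over plain string facts, which the paper does not specify; the class `TinyExpertSystem` and the fact names used with it are hypothetical and unrelated to CLIPS, JESS or DROOLS.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical miniature expert system; forward chaining is an assumption.
final class TinyExpertSystem {

    // A rule fires when all of its premises are present in working memory.
    private static final class Rule {
        final Set<String> premises;
        final String conclusion;

        Rule(Set<String> premises, String conclusion) {
            this.premises = premises;
            this.conclusion = conclusion;
        }
    }

    // Knowledge base: the codified expert rules.
    private final List<Rule> knowledgeBase = new ArrayList<>();

    void addRule(String conclusion, String... premises) {
        knowledgeBase.add(new Rule(new LinkedHashSet<>(Arrays.asList(premises)), conclusion));
    }

    // Inference engine: applies rules until no rule adds a new fact.
    Set<String> infer(Set<String> initialFacts) {
        // Working memory: initial data, then intermediate and final conclusions.
        Set<String> workingMemory = new LinkedHashSet<>(initialFacts);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Rule rule : knowledgeBase) {
                if (workingMemory.containsAll(rule.premises)
                        && workingMemory.add(rule.conclusion)) {
                    changed = true;
                }
            }
        }
        return workingMemory;
    }
}
```

For example, after `addRule("long-class", "loc>200")` and `addRule("suggest-split", "long-class", "many-methods")`, inferring from the facts `loc>200` and `many-methods` adds both `long-class` and `suggest-split` to working memory.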

IV. ARCHITECTURE

The IJA design is divided into 9 modules: 6 for core functions and 3 for support services. Fig. 3 shows component interactions, storage units and external interfaces. The analysis is executed in two main steps, classification and recommendation, both of which are coordinated by a Services Manager module.

Fig. 3 IJA architecture diagram

Its design and its XML configuration files make it a flexible and adaptable tool. It can be used not only for scientific purposes but also to suit the needs of industry. The whole tool is built on the Enterprise Java technology platform; it runs on any JEE 1.5 servlet container. Its design maximizes CPU utilization and reduces memory requirements, enabling the quick analysis of large datasets. The following sections describe the design and main features of each module.

A. Content Sequencer

The Content Sequencer is an IJA extension of a Java Collections API (Application Programming Interface) interface. It standardizes, encapsulates, and serializes the content of source code files (SCF) synchronously so that it can be processed in a transparent way. It can be configured to define specific word separator tokens.
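A sequencer with configurable separators can be sketched as an `Iterable` over tokens. The class name `TokenSequencer` and the choice of `Iterable<String>` as the implemented interface are assumptions; the paper does not say which Collections interface IJA extends.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical token sequencer; the separator set is passed in as configuration.
final class TokenSequencer implements Iterable<String> {

    private final String content;
    private final String separators;

    TokenSequencer(String content, String separators) {
        this.content = content;
        this.separators = separators;
    }

    @Override
    public Iterator<String> iterator() {
        return new Iterator<String>() {
            private int pos = 0;

            // Advances past any run of separator characters.
            private void skipSeparators() {
                while (pos < content.length()
                        && separators.indexOf(content.charAt(pos)) >= 0) {
                    pos++;
                }
            }

            @Override
            public boolean hasNext() {
                skipSeparators();
                return pos < content.length();
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                int start = pos;
                while (pos < content.length()
                        && separators.indexOf(content.charAt(pos)) < 0) {
                    pos++;
                }
                return content.substring(start, pos);
            }
        };
    }
}
```

Iterating over `new TokenSequencer("int x = 1;", " ;")` yields the tokens `int`, `x`, `=` and `1`, so later modules can consume the source as a uniform sequence regardless of how the file was laid out.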

B. Parser

The first component of the classification phase is the Syntax Analyzer. Its main function is to analyze and extract source code features. It also implements a parser algorithm that translates Java syntax into the model proposed by W. A. Woods: Augmented Transition Networks (ATN) [23]. The implementation applies the State and Memento design patterns [24]. When the algorithm detects certain reserved words, symbols and structures, it changes to a specific state. The IJA modular design makes it possible to extend all its functionality to other languages just by selecting a specific Syntax Analyzer module implementation for each programming language.
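The idea of changing state when specific symbols are detected can be shown with a scanner that is far simpler than an ATN: a finite-state machine that counts line and block comments. The class `CommentScanner` is hypothetical, it deliberately ignores string and character literals, and it is only meant to show state transitions driven by detected symbols, not the IJA parser itself.

```java
// Hypothetical finite-state scanner; a simplification, not an Augmented Transition Network.
final class CommentScanner {

    enum State { CODE, LINE_COMMENT, BLOCK_COMMENT }

    // Returns a two-element array: {number of line comments, number of block comments}.
    static int[] countComments(String src) {
        State state = State.CODE;
        int lineComments = 0;
        int blockComments = 0;
        for (int i = 0; i < src.length(); i++) {
            char c = src.charAt(i);
            char next = i + 1 < src.length() ? src.charAt(i + 1) : '\0';
            switch (state) {
                case CODE:
                    if (c == '/' && next == '/') {
                        state = State.LINE_COMMENT;
                        lineComments++;
                        i++; // consume the second character of the opening symbol
                    } else if (c == '/' && next == '*') {
                        state = State.BLOCK_COMMENT;
                        blockComments++;
                        i++;
                    }
                    break;
                case LINE_COMMENT:
                    if (c == '\n') {
                        state = State.CODE;
                    }
                    break;
                case BLOCK_COMMENT:
                    if (c == '*' && next == '/') {
                        state = State.CODE;
                        i++; // consume the closing slash
                    }
                    break;
            }
        }
        return new int[] { lineComments, blockComments };
    }
}
```

In a State-pattern implementation each enum constant would become an object with its own transition method; the enum and `switch` form is used here only to keep the sketch short.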

C. Metrics Analyzer

This is the calculation component, which translates the algebraic operations chosen by the analyst. According to the configuration, it generates software metrics by executing mathematical operations on the results obtained from the previous module. For example, a metric named v3 is the quotient of the number of methods and the number of methods with names starting with a lowercase letter². Besides some new metrics, the software also implements traditional metrics; e.g., v11 assesses the ratio between the number of comments collected in an SCF and the number of Javadocs in it. At this point, the analyst can assign a weighting value to every system operator. The set of metrics defined is important for modeling SCF quality [27].
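The v3 metric described above is a plain ratio and can be sketched as follows. The class name `NamingMetric` is hypothetical, and returning positive infinity when no method name starts with a lowercase letter is an assumption, since the text does not say how a zero denominator is handled.

```java
import java.util.List;

// Hypothetical sketch of the v3 ratio described in the text.
final class NamingMetric {

    // v3 = number of methods / number of methods whose name starts with a lowercase letter.
    static double v3(List<String> methodNames) {
        long lowercase = methodNames.stream()
                .filter(name -> !name.isEmpty() && Character.isLowerCase(name.charAt(0)))
                .count();
        if (lowercase == 0) {
            // Assumption: the source gives no convention for a zero denominator.
            return Double.POSITIVE_INFINITY;
        }
        return (double) methodNames.size() / lowercase;
    }
}
```

For the method names `getName`, `SetName`, `run` and `Stop`, the sketch returns 4 / 2 = 2.0; a value of 1.0 means every method name starts with a lowercase letter.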

D. Normalizer

This module normalizes the numeric values of each metric to the range between -1 and 1. Predefined thresholds (or quality class bounds) help evaluate the quality of the result. Each SCF is classified using the metric numeric results and the thresholds. Floating labels provide a code identification reference (ID) of the SCF accumulated in each sequence. Finally, the data is exported in ARFF (Attribute-Relation File Format) so that the results are compatible with the Intelligent Classifier module.
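The export step can be illustrated with a minimal ARFF writer. ARFF is the plain-text dataset format read by the Weka toolkit: a `@relation` line, one `@attribute` line per column, and a `@data` section with comma-separated rows. The class name `ArffWriter`, the relation name and the attribute names used below are assumptions, and every attribute is declared numeric for simplicity.

```java
// Hypothetical ARFF export; attribute names and types are illustrative assumptions.
final class ArffWriter {

    static String toArff(String relation, String[] attributes, double[][] rows) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        for (String attribute : attributes) {
            sb.append("@attribute ").append(attribute).append(" numeric\n");
        }
        sb.append("\n@data\n");
        for (double[] row : rows) {
            for (int i = 0; i < row.length; i++) {
                if (i > 0) {
                    sb.append(',');
                }
                sb.append(row[i]);
            }
            sb.append('\n');
        }
        return sb.toString();
    }
}
```

Each row would hold the normalized metric values of one SCF, so the resulting file can be loaded directly by a classifier that reads ARFF.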

E. Intelligent Classifier

This module automatically classifies the source code files previously modeled by the preceding phases. Table IV shows the results obtained when classifying SCF with the Intelligent Classifier module.

TABLE IV NEURAL NETWORK TRAINING RESULTS

The results obtained with the training set (A) and cross-validation (B) training modes reach high accuracy according to the percentage of Correctly Classified Instances and the measures described in (3), (4), (5), (6) and (7).

Kappa Statistic

$$ \kappa = \frac{P(A) - P(E)}{1 - P(E)} \tag{3} $$

where P(A) is the observed probability of concordance and P(E) is the expected probability of concordance. The Kappa statistic is used to measure the agreement between predicted and observed categorizations of a dataset [21].
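A short sketch of how (3) can be computed from a confusion matrix follows. The class name `KappaStatistic` is hypothetical; P(A) is taken as the fraction of instances on the diagonal and P(E) as the agreement expected by chance from the row and column totals, which is the usual reading of the statistic.

```java
// Hypothetical helper computing the Kappa statistic from a square confusion matrix.
final class KappaStatistic {

    static double kappa(int[][] matrix) {
        int n = matrix.length;
        double total = 0.0;
        double diagonal = 0.0;
        double[] rowSum = new double[n];
        double[] colSum = new double[n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                total += matrix[i][j];
                rowSum[i] += matrix[i][j];
                colSum[j] += matrix[i][j];
                if (i == j) {
                    diagonal += matrix[i][j];
                }
            }
        }
        double observed = diagonal / total;            // P(A)
        double expected = 0.0;                         // P(E)
        for (int i = 0; i < n; i++) {
            expected += (rowSum[i] / total) * (colSum[i] / total);
        }
        return (observed - expected) / (1.0 - expected);
    }
}
```

For the matrix {{20, 5}, {10, 15}}, P(A) = 0.7 and P(E) = 0.5, so the sketch returns 0.4. A value of 1 means perfect agreement and 0 means agreement no better than chance.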

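Mean Absolute Error

$$ \frac{|a_1 - c_1| + |a_2 - c_2| + \dots + |a_n - c_n|}{n} \tag{4} $$

This is the mean of the absolute difference between the prediction and the actual value over all the instances. Here a_i denotes the actual value and c_i the predicted value of instance i.

² A violation of the Java Language Specification [26].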

Root Mean Squared Error

$$ \sqrt{\frac{(a_1 - c_1)^2 + (a_2 - c_2)^2 + \dots + (a_n - c_n)^2}{n}} \tag{5} $$

It expresses the error in the same dimension as the prediction.

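Relative Absolute Error

$$ \frac{|a_1 - c_1| + |a_2 - c_2| + \dots + |a_n - c_n|}{|a_1 - \bar{a}| + |a_2 - \bar{a}| + \dots + |a_n - \bar{a}|} \tag{6} $$

This is the sum of the absolute differences between the predicted and the given values, relative to the sum of the absolute differences between the given values and their mean \bar{a}.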

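Root Relative Squared Error

$$ \sqrt{\frac{(a_1 - c_1)^2 + (a_2 - c_2)^2 + \dots + (a_n - c_n)^2}{(a_1 - \bar{a})^2 + (a_2 - \bar{a})^2 + \dots + (a_n - \bar{a})^2}} \tag{7} $$

This is the total squared error relative to the error that would occur if the prediction were always the mean \bar{a} of the actual values, with the square root taken so that it has the same dimension as the prediction.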
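The four error measures (4) to (7) can be computed together, as in the sketch below. It assumes that a holds the actual values and c the predicted ones, which is the reading suggested by the denominators of (6) and (7); the class name `ErrorMeasures` is hypothetical.

```java
// Hypothetical helper for the error measures (4)-(7); a = actual, c = predicted (assumed).
final class ErrorMeasures {

    private static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    static double meanAbsoluteError(double[] a, double[] c) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - c[i]);
        }
        return sum / a.length;
    }

    static double rootMeanSquaredError(double[] a, double[] c) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (a[i] - c[i]) * (a[i] - c[i]);
        }
        return Math.sqrt(sum / a.length);
    }

    static double relativeAbsoluteError(double[] a, double[] c) {
        double m = mean(a);
        double numerator = 0.0;
        double denominator = 0.0;
        for (int i = 0; i < a.length; i++) {
            numerator += Math.abs(a[i] - c[i]);
            denominator += Math.abs(a[i] - m);
        }
        return numerator / denominator;
    }

    static double rootRelativeSquaredError(double[] a, double[] c) {
        double m = mean(a);
        double numerator = 0.0;
        double denominator = 0.0;
        for (int i = 0; i < a.length; i++) {
            numerator += (a[i] - c[i]) * (a[i] - c[i]);
            denominator += (a[i] - m) * (a[i] - m);
        }
        return Math.sqrt(numerator / denominator);
    }
}
```

With actual values {1, 2, 3} and predictions {1, 2, 5}, the mean absolute error is 2/3, the root mean squared error is the square root of 4/3, the relative absolute error is 1, and the root relative squared error is the square root of 2. The two relative measures compare the classifier against a baseline that always predicts the mean.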

TABLE V CONFUSION MATRIX

In Table V, the diagonal holds the correctly classified instances. The horizontal sum corresponds to the actual total instances of each cluster, and the columns correspond to the values assigned by the NN. An overall accuracy above 90% (Correctly Classified Instances in Table IV) confirms the strong correlation between the SCF grouped in clusters by EM and the NN's capability to distinguish differences between attributes and to classify them correctly.

F. Dynamic Training Set [25]

A custom algorithm was implemented to evaluate the meaning of each cluster. It depends on the metric values of the SCF instances. In this process, each metric value of the cluster is compared to the total average (as a reference), and depending on a distance factor, a level of proximity is determined. Table VI is an example of the “cluster 0” profile:

TABLE VI CLUSTER 0 PROFILE DETAIL

Each cluster represents a programmer profile, obtained by aggregating the profile detail (see Table VI). Table VII explains the main characteristics of each profile determined by clustering.

TABLE VII CLUSTERS PROFILES

G. Rules Engine

The last component is an Expert System (EXSYS). It selects recommendations based on the SCF classification and metrics results. From predefined rules, the system selects the best improvement suggestion for each coding misuse. The knowledge base establishes a bijective function between metrics and rules, so each analyzed metric has a specific rule associated with it. The suggestion is chosen considering a distance factor from the expected value. Significant deviations mean more relevant suggestions. For example, in Table VIII the rule based on the v22 metric detects a lack of custom constructors, which could mean that the programmer does not know about the possibility of writing constructors for each class.

TABLE VIII

RECOMMENDATIONS EXAMPLES

V. PROTOTYPE

This new tool was implemented following the architecture described in Section IV and integrated as a web application that runs in a servlet container such as Apache Tomcat [28]. The user interface (UI) was designed to be intuitive and clear enough to be used without training.

In a few steps, the user can upload the source code (as .java or .jar) and, after pressing the start button, the PDF report is ready for download within a few seconds.

Fig. 4 Source code (samples) load phase

During the analysis phases the UI displays a visual log in real time, where the user can see the partial results of each process and a graphical representation of the neural network that is classifying the source code.

Fig. 5 Visualization in classification phase

Finally, the recommendations are presented in a PDF report which explains the source code author profile (Fig. 7), each misuse and how it could be solved.

Fig. 6 Recommendation report ready for download.

The report displays a general author profile and a list of recommendations for improving the author's source code quality.

Fig. 7 Recommendations report

Basically this tool builds a model from source code, classifies attributes and then generates recommendations for source code improvement.

VI. INITIAL TESTS

To test the prototype, three metrics (V1, V2 and V22) were selected, together with a Java source code file (SCF) with the following characteristics:

* 25 methods

* 205 lines of code (LOC)

* 1 class with default constructor

Table VIII shows the interval corresponding to every metric.

From the table, it can be concluded that the intervals for the metrics of this SCF are:

*V1: 201 LOC or more.

*V2: 21-40 methods per class.

*V22: 1 class with default constructor.

Once the corresponding SCF has been processed, IJA outputs the following result (only the V1, V2 and V22 metrics):

*It is necessary to decrement the number of LOC per class.

*It is suggested to decrement the number of methods per class.

*It is suggested to write constructors.

The Expert System works with normalized values to build recommendations:

*V1: Interval 0-100, normalized value: 1. Interval 101-200, normalized value: 0. Interval 201 or more, normalized value: -1.

*V2: Interval 0-20, normalized value: 1. Interval 21-40, normalized value: 0. Interval more than 40, normalized value: -1.

*V22: Interval 0, normalized value: 1. Interval 1, normalized value: 0. Interval more than 1, normalized value: -1.
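The interval-to-value mapping listed above can be written as one small function plus the three threshold pairs. The class and method names are hypothetical; the thresholds are the ones given in the list, with the upper interval of V2 read as "more than 40".

```java
// Hypothetical discretizer using the intervals listed for V1, V2 and V22.
final class MetricDiscretizer {

    // Maps a raw value to 1, 0 or -1 given the inclusive upper bounds
    // of the first interval (value 1) and the second interval (value 0).
    static int discretize(int value, int firstUpper, int secondUpper) {
        if (value <= firstUpper) {
            return 1;
        }
        if (value <= secondUpper) {
            return 0;
        }
        return -1;
    }

    static int v1(int linesOfCode) {
        return discretize(linesOfCode, 100, 200);
    }

    static int v2(int methodsPerClass) {
        return discretize(methodsPerClass, 20, 40);
    }

    static int v22(int classesWithDefaultConstructor) {
        return discretize(classesWithDefaultConstructor, 0, 1);
    }
}
```

For the test file described in this section (205 LOC, 25 methods, 1 class with a default constructor), the sketch yields -1 for V1, 0 for V2 and 0 for V22.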

Consequently, as explained previously, IJA discretizes the continuous values based on predefined threshold values in the following way:

*V1: 201 LOC or more: -1

*V2: Interval 21-40: 0

*V22: Interval 1: 0

Thus, IJA made the following recommendations from the discrete values above, which shows that they are correct:

*It is necessary to decrement the number of LOC per class.

*It is suggested to decrement the number of methods per class.

*It is suggested to write constructors.
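The step from discrete values to recommendations can be sketched as a lookup with one rule per metric. In the example above, a value of -1 produced an "It is necessary to" message and a value of 0 an "It is suggested to" message; generalizing that pattern to every metric, and emitting nothing for a value of 1, are assumptions, as is the class name `RecommendationSelector`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical recommendation selector; wording rules are inferred from the example.
final class RecommendationSelector {

    // One action per metric, mirroring the one-to-one mapping between metrics and rules.
    private static final Map<String, String> ACTIONS = new LinkedHashMap<>();

    static {
        ACTIONS.put("V1", "decrement the number of LOC per class");
        ACTIONS.put("V2", "decrement the number of methods per class");
        ACTIONS.put("V22", "write constructors");
    }

    static List<String> recommend(Map<String, Integer> discreteValues) {
        List<String> recommendations = new ArrayList<>();
        for (Map.Entry<String, String> rule : ACTIONS.entrySet()) {
            Integer value = discreteValues.get(rule.getKey());
            if (value == null || value > 0) {
                continue; // 1 means the metric lies in the preferred interval
            }
            String prefix = value < 0 ? "It is necessary to " : "It is suggested to ";
            recommendations.add(prefix + rule.getValue() + ".");
        }
        return recommendations;
    }
}
```

Given V1 = -1, V2 = 0 and V22 = 0, the sketch produces the same three sentences listed above, in the same order.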

VII. MORE IJA RESULTS

Considering the following metrics (see Table IX):

TABLE IX METRIC DEFINITIONS

Three SCF with predefined expected values (e.v.) were selected to test the effectiveness of the prototype, and the following values (o.v.) were obtained after processing (see Table X):

TABLE X FILE RESULTS

VIII. FUTURE WORK

The next step will be to strengthen the testing of the prototype. To do so, IJA will be made available on the web for intensive usage in order to gather community feedback, enabling further self-tuning capabilities in the future.

IX. CONCLUSION

This tool presents a new approach to automatically evaluate source code and provide recommendations for programmers in order to improve source code quality and, consequently, the software product itself. In the process, a dataset created by a data mining algorithm is the reference classified data source used to set up a neural network. The trained multilayer perceptron has demonstrated excellent precision in classifying this sort of data. Hence, in addition to being an unbiased cluster profiler, the system is capable of predicting software quality issues by analyzing the source code. This research establishes the foundations for defining new quality criteria for the evaluation of software and for extending the value of source code metrics through computational intelligence [29].

REFERENCES

[1] Kan, S., 1995. “Metrics and Models in Software Quality Engineering”, Addison-Wesley.

[2] Basili, V., 1999. “CMSC 735 Class Notes”, University of Maryland.

[3] Moore, pp. 652, 1958.

[4] James D. Arthur. 2002. “Managing Software Quality: A Measurement Framework for Assessments and Prediction”, Springer.

[5] ISO/IEC 9126: http://www.cse.dcu.ie/essiscope/sm2/9126ref.html

[6] Roger S. Pressman. 1998. “Ingeniería del Software: Un Enfoque Práctico”, Mc Graw Hill.

[7] Stephen H. Kan. 2002. “Metrics and Models in Software Quality Engineering”, Addison-Wesley Professional

[8] A. J. Albrecht. 1979. "Measuring application development productivity", Proc. IBM Applications Develop. Symp., pp. 83.

[9] Capers Jones. 1996. “Applied software measurement: assuring productivity and quality”, Mc Graw Hill.

[10] Bugzilla: http://www.bugzilla.org/about/

[11] http://www.oracle.com/technology/pub/articles/server_side_unit_tests.html

[12] http://www-128.ibm.com/developerworks/java/library/j-junit4.html

[13] Eva van Emden, Leon Moonen, "Java Quality Assurance by Detecting Code Smells".

[14] Rigi: http://www.rigi.csc.uvic.ca/Pages/description/whatitis.html

[15] Neil Walkinshaw, "Partitioning Object-Oriented Source Code for Inspections", University of Strathclyde, Glasgow, 2006.

[16] AgitarOne: http://www.agitar.com/solutions/products/agitarone.html

[17] Nathaniel Ayewah, William Pugh, J. David Morgenthaler, John Penix, YuQian Zhou, "Evaluating Static Analysis Defect Warnings On Production Software"

[18] National Institute of Standards and Technology. 2002. “The Economic Impacts of Inadequate Infrastructure for Software Testing”, RTI.

[19] Linda H. Rosenberg. “Applying and Interpreting Object Oriented Metrics”, Software Assurance Technology Center, NASA

[20] Hertz, Krogh and Palmer. 1991. “Introduction to the Theory of Neural Computation”, pp. 141-147, Addison-Wesley.

[21] Ian H. Witten, Eibe Frank. 2005. "Data Mining: Practical Machine Learning Tools and Techniques", pp. 265, Morgan Kaufmann.

[22] Agüero M, Madou F, Esperón G , López De Luise D. 2010. “Enhancing Source Code Metrics Scope Through Artificial Intelligence”, IEEE CITSA.

[23] W.A. Woods. 1970. “Transition Network Grammars for Natural Language Analysis”, pp. 591-606, Communications of the ACM.

[24] Bruce Eckel. 2003. “Thinking in Patterns”. Mindview, Inc.

[25] M.Agüero, F. Madou, G. Esperón, D. López De Luise. 2010. “Artificial Intelligence for Software Quality Improvement” ICCESSE 2010.

[26] James Gosling, Bill Joy, Guy Steele, Gilad Bracha. 2005. “The Java Language Specification 3rd Edition”, Prentice Hall.

[27] Daniela López De Luise, Martín Agüero. 2007. “Aplicación de Métricas Categóricas en Sistemas con Lógica Difusa”, Revista IEEE América Latina.

[28] Jason Brittain, Ian F. Darwin, 2003. “Tomcat: The Definitive Guide”, pp. 322. O'Reilly Books.

[29] Agüero M., Esperón G., Madou F, López De Luise D. 2008. "Intelligent Java Analyzer”. IEEE CERMA

[30] Juran, J. M., and F. M. Gryna, Jr.”Quality Planning and Analysis: From Product Development Through Use”, McGraw-Hill, 1970.