
An Efficient Model For Measuring Maintainability From Code Cloning And Refactoring Using Regression


Academic year: 2020


U. Devi, N. Kesswani, A. Sharma

Abstract: Software Product Lines (SPLs) have been developed to manage variability and commonality among systems. The Object-Oriented paradigm suffers from the problem of code clones, which adversely affects software qualities, especially Maintainability. A literature survey reveals that this problem persists in Feature-Oriented Programming (FOP) too. As a Software Product Line expands, many code clones are unknowingly introduced into the source code. Code clones are code fragments that are repeated or duplicated in other parts of the source code as the functionality of a program grows. They degrade performance and functionality and introduce errors, which adversely affects Software Product Lines. In the existing literature, we found that Feature-Oriented Programming was able to remove Object-Oriented code clones but, in effect, introduced FOP-related code clones. So far, no evidence has been found in the literature documenting the occurrence of code clones in Delta-Oriented Programming. Hence, the focus here is on code clones in Feature-Oriented Programming. Maintainability is the key factor considered here: the more code clones are found in the source code of a Software Product Line, the more difficult it becomes to maintain, which in turn leads to higher cost and more lines of source code. It is therefore crucial to address code clones as the SPL evolves, since maintainability grows in importance as the SPL grows, and cost can be reduced considerably if clones are nipped in the bud at early stages.

Index Terms: Software Engineering, Code Cloning, Refactoring, Software Product Line, Maintainability, Syntactical Classification, Statistics, Metrics


1. INTRODUCTION

Traditional Software Engineering (SE) offers a family of process models for different types of projects, such as the waterfall, spiral and iterative models, but each of them lacks flexibility in one or more respects. Regardless of the methodology, development passes through a number of common stages, beginning with requirements capture and ending with maintenance. Any modification to the software after installation, whether to fix a residual fault or to extend its functionality, constitutes maintenance. Thus, the way software has typically been created can be described as the development-then-maintenance model [1]. Standard software products lack diversification and therefore fail to satisfy customer requirements. Meeting customer requirements by delivering individualized products requires a huge investment from the product developer and raises development cost. SPLs were introduced as a solution that combines platform-based development with mass customization, reducing development cost while enabling a wide range of products [2].

Fig. 1. Software product line activities

A feature is a unit of functionality of a software system that satisfies a requirement, represents a design decision, and provides a potential configuration option. Feature-oriented software development (FOSD) is a paradigm for the construction, customization and synthesis of large-scale software systems. The concept of a feature is at the heart of FOSD, which aims essentially at three properties: structure, reuse and variation [3]. Reusing a piece of code for similar requirements by cloning is an efficient technique in software development that helps reduce cost and time, and the use of templates has been encouraged by some programming paradigms [4]. In particular, prior work examined whether code clones (CCs) exist in SPLs and how to distinguish them, depending on the respective implementation technique, and presented a first approach to removing CCs in SPLs by means of refactoring. Replicated code fragments (CFs), commonly referred to as CCs, have been the subject of extensive research for over two decades. Because they play a pivotal role in software maintenance, considerable effort has been devoted to examining when and how CCs negatively influence software quality and maintenance. Most commonly, researchers report inconsistent changes and the propagation and introduction of errors as the chief drawbacks of CCs for software quality [5].

Refactoring is the process of changing the structure of a program while preserving its functionality. There are many kinds of refactoring, such as extracting a piece of code, redefining a class, or changing a method signature. Each refactoring pattern involves a series of steps that keep the code consistent with the original. Each refactoring pattern also includes both a description of a refactoring opportunity, such as a set of code fragments that should be refactored, and the corresponding procedure for performing the refactoring [6].
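Extract Method is one refactoring that fits this description. As a minimal, hypothetical illustration (written in Python for brevity, whereas the projects studied here are in Java; the function names and the fee rule are invented for the example), a fragment duplicated in two functions is moved into a single extracted helper while the observable behavior stays the same:

```python
# Before refactoring: the same fee computation is cloned in two functions.
def withdraw_before(balance: float, amount: float) -> float:
    fee = amount * 0.01 if amount > 100 else 1.0
    return balance - amount - fee


def transfer_before(balance: float, amount: float) -> float:
    fee = amount * 0.01 if amount > 100 else 1.0
    return balance - amount - fee - 0.5  # extra transfer charge


# After refactoring: the cloned fragment is extracted into one helper.
def compute_fee(amount: float) -> float:
    """Shared fee rule; previously duplicated in both functions."""
    return amount * 0.01 if amount > 100 else 1.0


def withdraw_after(balance: float, amount: float) -> float:
    return balance - amount - compute_fee(amount)


def transfer_after(balance: float, amount: float) -> float:
    return balance - amount - compute_fee(amount) - 0.5


if __name__ == "__main__":
    # Behavior is preserved: both versions agree on sample inputs.
    for bal, amt in [(500.0, 50.0), (1000.0, 200.0)]:
        assert withdraw_before(bal, amt) == withdraw_after(bal, amt)
        assert transfer_before(bal, amt) == transfer_after(bal, amt)
    print("refactoring preserved behavior")
```

The clone now exists in one place, so a later change to the fee rule is made once instead of in every copy.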

2. LITERATURE SURVEY


some feature-oriented software product lines with respect to code cloning, and concluded that, although the Feature-Oriented Programming paradigm aims to alleviate the code clone problem of object-oriented systems, clones persist because of certain limitations. Although the FOP-related clones could be removed from an example product line by applying refactoring techniques, the causes of code clones could not be quantified. I. Heitlager et al. [8] measure the Maintainability Index of every component used in an SPL. Improving the feature Maintainability Index and optimizing the Maintainability Index are also discussed, and the work proposes using the Maintainability Index to make design decisions that improve global maintainability. H. H. Mui et al. [9], in "Studying Late Propagations in Code Clone Evolution Using Software Repository Mining", present an approach to extracting Late Propagations (LPs, one of the clone evolution patterns) from software repositories. The studied cases were helpful in deriving the propagation time, the clone dispersion and the effects of LPs on the software. The surveys also showed that feature modules cannot remove code clones from an implementation. Accordingly, the design of an SPL always starts from a base feature module that contains the parts common to all products, and each feature module is expected to represent exactly one product feature. Yue et al. [10] introduce CREC, a learning-based approach that recommends clones for refactoring by extracting features from the present state of software projects and from their past history. Given a set of software repositories, CREC extracts refactored clone groups (R-clones) and not-refactored clone groups (NR-clones) to build its training set. CREC extracts 34 features that characterize the content and evolution behavior of individual clones, as well as the spatial, syntactic and co-change relationships among clone peers.

Higo et al. [11] propose a metric-based approach to identifying refactoring opportunities for software clones. The metrics take a clone set as input and capture specific properties, for example the degree of cohesion, the dispersion of the clones in the class hierarchy, and the number of variables referenced and assigned within the clone. Refactoring opportunities (such as Extract Method) are recommended by checking a number of conditions that compare metric values against specified thresholds. Bian et al. [12] propose SPAPE, a Semantic-Preserving Amorphous Procedure Extraction method for the automatic refactoring of Type-3 (near-miss) clones. SPAPE applies amorphous transformations to near-miss clones so that replicated and split loop structures can be handled, and unmatched statements are incorporated by inserting control variables and conditional statements into the AST of the extracted procedure. SPAPE supports only code written in the procedural programming language C. Speicher and Bremm [13] propose a stepwise unification process for Type-3 clones based on a mapping of program dependence graphs (PDGs). They also suggest that differences in expression operators can be normalized using lambda expressions (a Java 8 feature). Their statement unification process handles differences in local variables, parameters, field and method-call identifiers, literals, object declaration types and, finally, the arguments of method calls. Kodhai et al. [14] develop a more integrated approach to the phases of clone maintenance, with an emphasis on clone modification. The approach uses refactoring to modify clones, and the CloneManager tool is used to detect them. It is implemented as an extension of the existing CloneManager tool; the extended tool was tested on open-source projects, and its results were compared with the performance of three other existing tools. Choi et al. [15] combine clone metrics as a way of extracting code clones for refactoring. The metrics take a clone set as input and compute the average length of the token sequences within the clones, the size of the clone set, and the ratio of non-repeated token sequences within the clones. Clone sets are ranked by their metric values, and a case study showed that the top-ranked clones contained the most useful refactoring opportunities.

3. PROPOSED METHODOLOGY

A. Problem Description

Software Product Lines (SPLs) have been developed to manage variability and commonality among systems. The Object-Oriented paradigm suffers from the problem of code clones, which adversely affects software qualities, especially Maintainability. As a Software Product Line expands, many code clones are unknowingly introduced into the source code. Code clones are code fragments that are repeated or duplicated in other parts of the source code as the functionality of a program grows. They degrade performance and functionality and introduce errors, which adversely affects Software Product Lines. Hence, the focus here is on code clones in Feature-Oriented Programming. Maintainability is the key factor considered here: the more code clones are found in the source code of a Software Product Line, the more difficult it becomes to maintain, which in turn leads to higher cost and more lines of source code. It is therefore crucial to address code clones as the SPL evolves, since maintainability grows in importance as the SPL grows, and cost can be reduced considerably if clones are nipped in the bud at early stages.

B. Procedure

In our work, we first studied different clone detection tools and chose CCFinder, which identifies Type-I and Type-II clones and reports metrics in several categories, namely File Metrics, Clone Set Metrics, Line Based Metrics and FOP Based Metrics. Different metrics are computed for every code clone, yielding different values. The tool also provides a scatter plot in which the portions of the source code populated by clones are shaded, so they are easy to locate; this makes removing the clones in later steps easier. After detection, the clones are further categorized by syntactic construct: If Statement, For Statement, While Statement, Do Statement, Method Declaration and Type Declaration. The values of this syntactical classification are then recorded for each program under study.
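The categorization step can be sketched as a simple tally. The sketch assumes, hypothetically, that each detected clone fragment has already been mapped to the syntactic construct that encloses it; CCFinder's actual output format is not described here, so the input list and the function name `classify_clones` are illustrative only:

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Syntactic categories used in the syntactical classification.
CATEGORIES = (
    "if statement",
    "for statement",
    "while statement",
    "do statement",
    "method declaration",
    "type declaration",
)


def classify_clones(fragments: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count clone fragments per syntactic category.

    `fragments` is a hypothetical list of (fragment_id, category) pairs.
    Fragments whose category is not one of CATEGORIES are ignored.
    """
    counts = Counter(category for _, category in fragments if category in CATEGORIES)
    # Report every category, including those with no clones.
    return {category: counts[category] for category in CATEGORIES}


if __name__ == "__main__":
    sample = [
        ("f1", "for statement"),
        ("f2", "method declaration"),
        ("f3", "for statement"),
    ]
    for category, count in classify_clones(sample).items():
        print(f"{category:<20}{count:>5}")
```

Reporting zero counts explicitly keeps every program's row the same width, which is what the per-program classification table needs.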

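Program Name   IS   FS   WS   DS   MD    TD
Bank Account    0   14    0    0   480   185

(IS = If Statement, FS = For Statement, WS = While Statement, DS = Do Statement, MD = Method Declaration, TD = Type Declaration)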


The Analyzability of the Bank Account program is 5. We compute the correlation between Analyzability and each syntactical category of clones, and we maintain a similar table of correlations for each of the other sub-characteristics of Maintainability. From these correlations we create a regression equation and a scatter plot depicting it. In the next phase of our work, we take the same programs and refactor them (removing code clones). The next correlation is formed between the Number of Refactorings done (as the independent variable) and the sub-characteristics of Maintainability whose values were derived in the previous phase.
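The text does not say which correlation coefficient or regression form is used. Assuming the usual Pearson coefficient and an ordinary least-squares line, the computation for one clone category against one sub-characteristic can be sketched as follows; the sample values in the demo are hypothetical and are not data from this study:

```python
import math
from typing import Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def linear_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((p - mx) * (q - my) for p, q in zip(x, y)) / sum((p - mx) ** 2 for p in x)
    return my - b * mx, b


if __name__ == "__main__":
    # Hypothetical values for five programs; NOT data from the study.
    for_clones = [12, 9, 3, 20, 6]     # independent: for-statement clones
    analyzability = [5, 6, 8, 4, 7]    # dependent: Analyzability score
    r = pearson_r(for_clones, analyzability)
    a, b = linear_fit(for_clones, analyzability)
    print(f"r = {r:.3f}; Analyzability ~ {a:.3f} + ({b:.3f}) * FS clones")
```

Repeating the same two calls for each category (IS, FS, WS, DS, MD, TD) and each sub-characteristic fills the correlation table described above.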

We then compute the correlation between Analyzability and the Number of Refactorings done, and a regression equation is derived from this correlation.
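Assuming the usual Pearson coefficient and a simple least-squares line (the text does not state which forms are used), let $x_i$ be the independent variable for program $i$ (a clone count or the Number of Refactorings), $y_i$ a Maintainability sub-characteristic such as Analyzability, and $n$ the number of programs. The correlation and the regression line $\hat{y} = a + b\,x$ are then:

```latex
r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}
         {\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}},
\qquad
b = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2},
\qquad
a = \bar{y} - b\,\bar{x}.
```

The slope and the correlation share the same numerator, so $b$ always has the same sign as $r$: a negative correlation between clones and Analyzability yields a downward-sloping regression line.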

C. Comparison, Result and Usefulness of the Model

After deriving the correlation values for the source code before and after refactoring, we compare them and conclude that some programs become more maintainable when refactored, while others show little or no change even after the code has been improved. With this work we have therefore tried to design a Prediction Model, based on the nature of the correlations, which suggests that Maintainability can be quantified. Our study is, however, limited in scope. It could be extended to more industrial applications to test whether the removal of clones alone ensures improved Maintainability, and how it affects the overall execution of the application throughout its lifetime.

D. Proposed Algorithm

Step 1: Collect the sample (dataset).

Step 2: Using a clone detection tool, find the clones in the sample and categorize them (if, while, etc.).

Step 3: Derive Maintainability for the code:

1) Compute the correlations (including the sub-characteristics of Maintainability).

2) Code clones are the independent variables.

3) Maintainability and its sub-characteristics are the dependent variables.

Maintainability = Analyzability + Changeability + Stability + Testability

a. Derive values for all the sub-characteristics of Maintainability on the refactored code.

b. Refactor the code clones using different refactoring methods.

c. Record how many code clones remain after the above step.

d. Record which refactorings have been done.

e. Record how many refactorings were applied to which code clone.

f. Record the types of refactoring done and the number of refactorings done.

g. Record how many refactorings were applied to how many lines of code.

h. Construct tables for all of the above-mentioned points.

4) Apply the regression equation.

5) Derive a scatter plot on the basis of the regression equation.

6) Take the maximum independent variable value (for small, medium and large data sets).

Step 4: Refactor the clones.

Step 5: Apply Step 3 again to the refactored clones.

Take the values of both (refactored and non-refactored clones) in tabular form.
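Steps 3 to 5 can be sketched end to end. This is a minimal illustration under stated assumptions, not the authors' implementation: Maintainability is taken literally as the sum in Step 3, Pearson's r is assumed as the correlation, and the helper names and all program scores in the demo are hypothetical:

```python
import math
from typing import Dict, List, Sequence

# The four sub-characteristics named in Step 3.
SUB_CHARACTERISTICS = ("analyzability", "changeability", "stability", "testability")


def maintainability(scores: Dict[str, float]) -> float:
    """Maintainability = Analyzability + Changeability + Stability + Testability."""
    return sum(scores[name] for name in SUB_CHARACTERISTICS)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; NaN when either sample has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def correlation_table(independent: Sequence[float],
                      programs: List[Dict[str, float]]) -> Dict[str, float]:
    """Correlate one independent variable with each sub-characteristic
    and with the summed Maintainability, across all sample programs."""
    table = {name: pearson_r(independent, [p[name] for p in programs])
             for name in SUB_CHARACTERISTICS}
    table["maintainability"] = pearson_r(
        independent, [maintainability(p) for p in programs])
    return table


if __name__ == "__main__":
    # Hypothetical data for three programs; NOT values from the study.
    clones_before = [25, 14, 6]        # clone count per program
    refactorings_after = [9, 5, 2]     # refactorings applied per program
    scores_before = [
        {"analyzability": 4, "changeability": 3, "stability": 5, "testability": 4},
        {"analyzability": 5, "changeability": 4, "stability": 5, "testability": 5},
        {"analyzability": 7, "changeability": 6, "stability": 6, "testability": 6},
    ]
    scores_after = [
        {"analyzability": 6, "changeability": 5, "stability": 6, "testability": 5},
        {"analyzability": 6, "changeability": 5, "stability": 5, "testability": 6},
        {"analyzability": 7, "changeability": 6, "stability": 6, "testability": 6},
    ]
    before = correlation_table(clones_before, scores_before)
    after = correlation_table(refactorings_after, scores_after)
    # Step 5: tabulate both sets of correlations side by side.
    print(f"{'characteristic':<16}{'before':>10}{'after':>10}")
    for name in before:
        print(f"{name:<16}{before[name]:>10.3f}{after[name]:>10.3f}")
```

Returning NaN for a constant column, instead of raising, keeps the table printable when a sub-characteristic happens not to vary across the sample.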

E. Proposed Flow Model

Fig. 2. Flow Model for Proposed Methodology

4. SIMULATION AND RESULTS

These experiments were performed on various SPL Java projects collected from a cloning perspective. To detect and remove code clones and duplication, CCFinder was used as the code clone detection tool, together with the Eclipse IDE for Java. Using CCFinder, various software quality metrics were obtained: file metrics, clone set metrics and line-based metrics. In addition, the clones were classified into six categories: for, if, while, do-while, method declaration and type declaration. Maintainability and its sub-characteristics were then calculated, and the syntactical classification was performed. The overall results are displayed in the tables below (Section 4.1). Refactoring results are displayed as screenshots in section 5.3, and the correlation and linear regression outcomes and comparison graphs over maintainability are depicted in section 5.3 and section 5.4.

[Fig. 2 flow-model boxes: Start; Collect dataset (Java projects); Find clones using the CCFinder code clone detection tool and categorize them (if, while etc.); Obtain different software metrics; Calculate the Maintainability; Derive values for all sub-characteristics of Maintainability (Changeability, Analyzability, Testability, Stability); Apply regression method and derive scatter plot; Take maximum independent variable value; Refactor the clones using refactoring methods; Take the values of both (refactored and non-refactored clones); Stop]

4.1 Results of CCFinder in Tabular Form: Graph Product Line

A. Before Refactoring

Table 1. File Metric

Name Min. Max. Average

LEN 1 3099 333.75

CLN 0 2 0.25

NBR 0 1 0.125

RSA 0 0.15 0.041199

RSI 0 0.257 0.082959

CVR 0 0.257 0.093258

RNR 0.873 1 0.947566

Table 2. Clone Set Metrics

Name Min. Max. Average

LEN 17 83 43.0588

POP 2 8 2.94118

NIF 1 3 1.47059

RAD 0 1 0.411765

RNR 0.9 1 0.948087

TKS 12 21 15.2353

LOOP 0 1 0.529412

COND 0 2 0.352941

McCabe 0 2 0.882353

Table 3. Line Based Metrics

Name Total Min. Max. Average

LOC 1336 6 648 83.5

SLOC 767 0 401 47.9375

CLOC 158 0 79 9.875

CVRL - 0 0.47 0.205997

Table 4. Syntactical Classification

for 9

if 6

while 0

do-while 0

method declaration 11

type declaration 7

B. After Refactoring (Extract & Use Supertype where possible)

Table 5. File Metrics

Name Min. Max. Average

LEN 1 3099 296.778

CLN 0 12 1.38889

NBR 0 2 0.333333

RSA 0 0.303 0.135343

RSI 0 0.344 0.194496

CVR 0 0.451 0.231374

RNR 0.863 1 0.947398

Table 6. Clone Set Metrics

Name Min. Max. Average

LEN 17 83 43.0588

POP 2 8 2.94118

NIF 1 3 1.47059

RAD 0 1 0.411765

RNR 0.9 1 0.948087

TKS 12 21 15.2353

LOOP 0 1 0.529412

COND 0 2 0.352941

McCabe 0 2 0.882353

Table 7. Line Based Metrics

Name Total Min. Max. Average

LOC 1536 6 702 85.3333

SLOC 767 0 401 42.6111

CLOC 158 0 79 8.77778

CVRL - 0 0.47 0.205997

Table 8. Syntactical Classification after Refactoring

for 9

if 6

while 0

do-while 0

method declaration 11

type declaration 8
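To make the before/after comparison of Step 5 concrete, the changes between the reported tables can be computed mechanically. The sketch below is illustrative: the dictionaries simply transcribe the averages from Table 1 and Table 5 and the counts from Table 4 and Table 8, and the helper name `deltas` is hypothetical:

```python
# Averages transcribed from Table 1 (before) and Table 5 (after refactoring).
FILE_AVG_BEFORE = {"LEN": 333.75, "CLN": 0.25, "NBR": 0.125, "RSA": 0.041199,
                   "RSI": 0.082959, "CVR": 0.093258, "RNR": 0.947566}
FILE_AVG_AFTER = {"LEN": 296.778, "CLN": 1.38889, "NBR": 0.333333, "RSA": 0.135343,
                  "RSI": 0.194496, "CVR": 0.231374, "RNR": 0.947398}

# Counts transcribed from Table 4 (before) and Table 8 (after refactoring).
SYNTAX_BEFORE = {"for": 9, "if": 6, "while": 0, "do-while": 0,
                 "method declaration": 11, "type declaration": 7}
SYNTAX_AFTER = {"for": 9, "if": 6, "while": 0, "do-while": 0,
                "method declaration": 11, "type declaration": 8}


def deltas(before: dict, after: dict) -> dict:
    """Return after - before for every key present in both tables."""
    return {key: after[key] - before[key] for key in before if key in after}


if __name__ == "__main__":
    for title, diff in (("file metrics (average)", deltas(FILE_AVG_BEFORE, FILE_AVG_AFTER)),
                        ("syntactical classification", deltas(SYNTAX_BEFORE, SYNTAX_AFTER))):
        print(title)
        for key, change in diff.items():
            print(f"  {key:<20}{change:+.6g}")
```

With these transcribed values, type declaration is the only syntactical category whose count changes (from 7 to 8), and the averages of CLN, NBR, RSA, RSI and CVR are all higher after refactoring.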

5. CONCLUSION AND FUTURE WORK


Prediction Model based on the nature of the correlations, which sheds light on the fact that Maintainability can be quantified and correlated. However, our study is limited in scope, and it can be extended in the following ways:

1) It could be expanded to more industrial applications to test whether the removal of clones alone ensures improved Maintainability, and to examine its effect on the overall execution of the application throughout its lifetime.

2) Besides the correlation method, another model could be developed to quantify maintainability over a period of time.

3) It can easily be used in any field and, independent of programs and platforms, will considerably reduce time if the source code is available.

REFERENCES

[1] S. R. Schach and A. Tomer. "Development/maintenance/reuse: Software evolution in product lines." In P. Donohoe (Ed.), Proceedings of the First Software Product Line Conference: Experience and Research Directions, pp. 437-450. Kluwer Academic Publishers, Norwell, MA, USA, 2000.

[2] K. Pohl, G. Böckle, and F. van der Linden. Software Product Line Engineering: Foundations, Principles, and Techniques. Springer, 2005.

[3] S. Apel and C. Kästner. "An overview of feature-oriented software development." Journal of Object Technology, vol. 8, no. 5, pp. 49-84, 2009.

[4] R. Komondoor and S. Horwitz. "Using slicing to identify duplication in source code." In International Static Analysis Symposium, pp. 40-56. Springer, Berlin, Heidelberg, 2001.

[5] F. Arcelli Fontana, M. Zanoni, A. Ranchetti, and D. Ranchetti. "Software clone detection and refactoring." ISRN Software Engineering, vol. 2013, Article ID 129437, 8 pages, 2013. http://dx.doi.org/10.1155/2013/129437.

[6] E. Kodhai and S. Kanmani. "Method-level code clone modification using refactoring techniques for clone maintenance." Advanced Computing: An International Journal (ACIJ), vol. 4, no. 2, March 2013.

[7] I. Schaefer, L. Bettini, F. Damiani, and N. Tanzarella. "Delta-oriented programming of software product lines." In SPLC, pp. 77-91. Springer, 2010.

[8] I. Heitlager, T. Kuipers, and J. Visser. "A practical model for measuring maintainability." In 6th International Conference on the Quality of Information and Communications Technology (QUATIC 2007), pp. 30-39. IEEE Computer Society, 2007.

[9] H. H. Mui, A. Zaidman, and M. Pinzger. "Studying late propagations in code clone evolution using software repository mining." In Proceedings of the Eighth International Workshop on Software Clones (IWSC 2014), in conjunction with CSMR-WCRE'14, Antwerp, Belgium, February 3, 2014.

[10] R. Yue, Z. Gao, N. Meng, Y. Xiong, X. Wang, and J. D. Morgenthaler. "Automatic clone recommendation for refactoring based on the present and the past." arXiv:1807.11184v1 [cs.SE], 30 Jul 2018.

[11] Y. Higo, S. Kusumoto, and K. Inoue. "A metric-based approach to identifying refactoring opportunities for merging code clones in a Java software system." Journal of Software Maintenance and Evolution: Research and Practice, vol. 20, no. 6, pp. 435-461, 2008.

[12] Y. Bian, G. Koru, X. Su, and P. Ma. "SPAPE: A semantic-preserving amorphous procedure extraction method for near-miss clones." Journal of Systems and Software, vol. 86, no. 8, pp. 2077-2093, 2013.

[13] D. Speicher and A. Bremm. "Clone removal in Java programs as a process of stepwise unification." In Proceedings of the 26th Workshop on Logic Programming, 2012.

[14] E. Kodhai and S. Kanmani. "Method-level code clone modification using refactoring techniques for clone maintenance." Advanced Computing: An International Journal (ACIJ), vol. 4, no. 2, March 2013.
