Tracking Software Changes: A Framework and Examples of Applications

(1)


Massimiliano Di Penta

Dept. of Engineering

University of Sannio, Benevento (Italy)

(2)

Outline

Fine-grained historical analysis

Mining historical data from software repositories

Tracking source code line evolution

Applications

Tracking clones (this talk)

Tracking design pattern evolution

[ESEC/FSE’07 Paper]

Tracking vulnerabilities

[SCAM’08 Paper]

(3)

Historical analysis

Static and dynamic analysis do not capture information such as:

How does an artifact change over time?

When was it changed?

Why was it changed?

– Evolution, bug-fixing, refactoring, re-documentation…

Who changed it?

(4)

Why is it useful?

Identify change-prone parts of your systems

Must be better designed

Identify fault-prone or vulnerable parts of your system

Must be better tested

Must be maintained

Identify artifacts that tend to change together

Impact analysis

Maintaining cloned code

Change propagation on crosscutting concerns

Triaging

Determine the most experienced developer able to perform a given maintenance task, based on her/his previous experience

(5)

Historical analysis

Analyze trails left by developers during their maintenance activities

[Figure labels: static analysis, dynamic analysis, historical analysis; product analysis; process analysis (learning from history)]

(6)

Level of detail in historical analysis

• Release level

• All changes committed between two releases…

– Is it worth considering all changes separately?

• Change set: commits that share the same author, branch, and notes, and whose distance is < 200 s [Zimmermann et al., 2004]

[Figure: commits c₁…c₆ between releases R₁ and R₂, grouped into changes according to whether consecutive commits are ≤200 s or >200 s apart; release timeline R₁, R₂, R₃]
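The change-set rule above can be sketched in a few lines. This is only an illustration, not the implementation used in the cited work: the `Commit` record and its field names are hypothetical, and "distance" is read here as the gap between consecutive commits with the same author, branch, and notes (a sliding window), which is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Commit:
    # Hypothetical record; field names are illustrative only.
    file: str
    author: str
    branch: str
    notes: str
    timestamp: int  # seconds


def group_change_sets(commits: List[Commit], max_gap: int = 200) -> List[List[Commit]]:
    """Group commits with the same author, branch and notes when each
    commit follows the previous one of its group by less than max_gap s."""
    change_sets: List[List[Commit]] = []
    open_sets: Dict[Tuple[str, str, str], List[Commit]] = {}
    for commit in sorted(commits, key=lambda c: c.timestamp):
        key = (commit.author, commit.branch, commit.notes)
        current = open_sets.get(key)
        if current is not None and commit.timestamp - current[-1].timestamp < max_gap:
            current.append(commit)      # still inside the time window
        else:
            current = [commit]          # start a new change set
            open_sets[key] = current
            change_sets.append(current)
    return change_sets


commits = [
    Commit("a.c", "max", "HEAD", "fix", 0),
    Commit("b.c", "max", "HEAD", "fix", 150),
    Commit("c.c", "max", "HEAD", "fix", 400),
    Commit("d.c", "lu", "HEAD", "doc", 410),
]
grouped = [[c.file for c in cs] for cs in group_change_sets(commits)]
print(grouped)
```

With the sample data, `a.c` and `b.c` fall into one change set, while `c.c` (250 s later) and `d.c` (different author and notes) each start their own.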

(7)

How to perform historical analysis?

As said, by integrating data from different sources

“Software repositories”

Clustering together related changes [Zimmermann et al., 2004]

Analyzing changes and tracking artifacts across file revisions

However this requires

(8)

Differencing tools

Tools that work on a structured representation (e.g. AST)

Why? We want to identify:

– Methods/functions added/removed

– Variable replacement

Change Distiller [Fluri et al., 2007]: identifies changes between ASTs

Language-dependent tools: UMLDiff, JDiff, XMLDiff

Tools that work on a flat representation of the source code (e.g. sequence of tokens or of lines)

Advantage: no need to parse the source code

The most famous is the Unix diff

(9)

Line tracking

• Versioning systems keep track of differences between two file revisions

• We know about lines added and removed but…

• Changes are also treated in terms of del+add

– How to distinguish changes from add and del?

• How to track source code lines across revisions

– Does line x of A1.1 correspond to line y of A1.2?

[Figure: revisions A 1.1 and A 1.2 side by side, with blocks marked add and del]

(10)

Limitations of the Unix diff

Difficulty in distinguishing additions and removals from changes

Change identification heuristic: when diff finds a sequence of additions and deletions starting from the same position in the file, it assumes that the block has been changed.

If a code fragment is moved upward or downward in a file, it is not possible to keep track of it.

(11)

Example

It is likely that the programmer swapped lines 3 and 4, modified them, moved line 9 to line 5, and then added line 10.

power.c (v. 1.1)

 1: int power(int n, int exp)
 2: {
 3:   int x=0;
 4:   int power;
 5:   for(x=0;x<exp;x++)
 6:   {
 7:     power=power*n;
 8:   }
 9:   printf("Computing n to the power of exp\n");
10:   return power;
11: }

power.c (next revision)

 1: int power(int n, int exp)
 2: {
 3:   int power=1;
 4:   int x;
 5:   printf("This program computes n to the power of exp\n");
 6:   for(x=0;x<exp;x++)
 7:   {
 8:     power=power*n;
 9:   }
10:   printf("Computation done: %d^%d=%d\n",n,exp,power);
11:   return power;
12: }

(12)

diff output

3,4c3,5
< int x=0;
< int power;
---
> int power=1;
> int x;
> printf("This program computes n to the power of exp\n");
9c10
< printf("Computing n to the power of exp\n");
---
> printf("Computation done: %d^%d=%d\n",n,exp,power);
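The block structure of this output can be reproduced with Python's standard `difflib`. This is a rough sketch: `difflib.SequenceMatcher` is not the algorithm used by Unix diff, and the helper `normal_diff_headers` is a hypothetical name introduced here. It only shows that a plain line differencer reports two "change" hunks and gives no hint that lines were swapped or moved.

```python
import difflib

# The two revisions of power.c from the example (indentation omitted).
OLD = r"""int power(int n, int exp)
{
int x=0;
int power;
for(x=0;x<exp;x++)
{
power=power*n;
}
printf("Computing n to the power of exp\n");
return power;
}""".splitlines()

NEW = r"""int power(int n, int exp)
{
int power=1;
int x;
printf("This program computes n to the power of exp\n");
for(x=0;x<exp;x++)
{
power=power*n;
}
printf("Computation done: %d^%d=%d\n",n,exp,power);
return power;
}""".splitlines()


def normal_diff_headers(old, new):
    """Return hunk headers in the style of diff's normal output format."""
    def span(lo, hi):
        # 0-based half-open range -> 1-based inclusive range
        return str(lo + 1) if hi - lo == 1 else f"{lo + 1},{hi}"

    headers = []
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            headers.append(f"{span(i1, i2)}c{span(j1, j2)}")
        elif tag == "delete":
            headers.append(f"{span(i1, i2)}d{j1}")
        elif tag == "insert":
            headers.append(f"{i1}a{span(j1, j2)}")
    return headers


headers = normal_diff_headers(OLD, NEW)
print(headers)
```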

(13)

Overcoming diff limitations

ldiff: language-independent line differencing tool

Enhanced line differencing tool

Overcomes the limitations of diff

Able to identify likely changed lines

Able to track moved lines

References

[Canfora et al., IEEE Software 2009, MSR 2007]

Downloadable at

(14)

Approach

Overview

Three-step approach:

1. Identification of unchanged lines through the longest common subsequence (diff)

2. Identification of hunk similarity (e.g. vector space cosine similarity)

3. Identification of line similarity (e.g. Levenshtein distance)

Steps 2 and 3 are iterated to increase the recall of the approach

[Figure: two versions, L and R, of a function bar(char c), processed from Start through Step 1 (longest common subsequence), Step 2 (hunk similarity: 0.68, 1.0, 1.0), and Step 3 (line similarity: 1.0, 1.0, 0.5); lines are labeled (U)nchanged, (C)hanged, (A)dded, (D)eleted; the result is labeled LDA(L,R)]

(15)

Identification of unchanged lines

[Figure: Step 1 on the bar(char c) example: the longest common subsequence links the unchanged lines of L and R]
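As an illustration of this first step, the sketch below computes a longest common subsequence over lines with the textbook dynamic-programming table and returns the index pairs of the lines considered unchanged. The function name `lcs_pairs` and the sample lines are made up for this example; ldiff itself relies on diff for this step.

```python
def lcs_pairs(left, right):
    """Return (i, j) index pairs of lines matched by a longest common
    subsequence of the two line lists."""
    n, m = len(left), len(right)
    # table[i][j] = LCS length of left[i:] and right[j:]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if left[i] == right[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk the table from the top-left corner to recover the matched pairs.
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if left[i] == right[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


left = ["int bar(char c) {", "int[] b;", "if (!b) {", "return 1;", "}"]
right = ["int bar(char c) {", "if (!b) {", 'printf("C");', "return 1;", "}"]
unchanged = lcs_pairs(left, right)
print(unchanged)
```

Lines that do not appear in any pair (index 1 on the left, index 2 on the right) form the hunks that the next two steps try to relate.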

(16)

Hunk similarity computation

[Figure: Step 2 on the bar(char c) example: similarity scores between hunks of L and R (0.68, 1.0, 1.0)]
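A minimal sketch of a vector-space cosine similarity between two hunks follows. The tokenizer (identifiers, numbers, and single punctuation characters, with raw term counts as weights) is an assumption made for this example; the slides do not say how ldiff builds its term vectors.

```python
import math
import re
from collections import Counter


def term_vector(lines):
    """Bag of tokens for a hunk (assumed tokenization, see lead-in)."""
    return Counter(re.findall(r"[A-Za-z_]\w*|\d+|\S", " ".join(lines)))


def cosine_similarity(hunk_a, hunk_b):
    """Cosine of the angle between the term vectors of two hunks."""
    va, vb = term_vector(hunk_a), term_vector(hunk_b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0


moved = ['printf("A");']
candidate = ['printf("A");', 'printf("B");']
unrelated = ["int b=foo(c);"]
print(cosine_similarity(moved, candidate), cosine_similarity(moved, unrelated))
```

A deleted hunk would then be paired with the added hunk that scores highest, provided the score is high enough; the cut-off is a tuning choice not given in the slides.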

(17)

Line similarity computation

[Figure: Step 3 on the bar(char c) example: similarity scores between lines of L and R (1.0, 1.0, 0.5)]

(18)

1st step

(coarse grained tracking)

Textual similarity between code fragments

[Figure: added and deleted fragments of A 1.1 and A 1.2 linked by similarity scores (0.9, 0.8, 0.2, 0.1)]

(19)

2nd step

(fine grained tracking)

Once larger fragments are tracked…

Similarity function between source code lines

[Figure: within the fragments of A 1.1 and A 1.2 matched with score 0.9, individual lines are compared]

(20)

Tracking lines

[Figure (example): lines of A 1.1 and A 1.2 labeled Added, Deleted, and Changed; changed lines are linked by minimum Levenshtein distance]
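The line-level matching can be sketched as follows. This is an illustrative approximation, not ldiff's code: similarity is taken as one minus the Levenshtein distance divided by the longer line length, pairs are chosen greedily from the most similar downward, and the `threshold` value of 0.4 is a hypothetical cut-off chosen for the example.

```python
def levenshtein(a, b):
    """Classic edit distance (insert, delete, substitute), row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def line_similarity(a, b):
    """1.0 for identical lines, approaching 0.0 for unrelated ones."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def track_lines(deleted, added, threshold=0.4):
    """Pair deleted and added lines greedily by decreasing similarity.
    Returns (changed index pairs, deleted-only indices, added-only indices)."""
    candidates = sorted(
        ((line_similarity(d, a), i, j)
         for i, d in enumerate(deleted) for j, a in enumerate(added)),
        key=lambda t: -t[0])
    changed, used_d, used_a = [], set(), set()
    for sim, i, j in candidates:
        if sim < threshold:
            break
        if i not in used_d and j not in used_a:
            changed.append((i, j))
            used_d.add(i)
            used_a.add(j)
    only_deleted = [i for i in range(len(deleted)) if i not in used_d]
    only_added = [j for j in range(len(added)) if j not in used_a]
    return sorted(changed), only_deleted, only_added


# First hunk of the power.c example: old lines 3-4, new lines 3-5.
deleted = ["int x=0;", "int power;"]
added = ["int power=1;", "int x;",
         'printf("This program computes n to the power of exp\\n");']
changed, only_deleted, only_added = track_lines(deleted, added)
print(changed, only_deleted, only_added)
```

On this hunk the sketch pairs `int x=0;` with `int x;` and `int power;` with `int power=1;`, that is, it recovers the swap that plain diff reports as a single changed block, and leaves the new `printf` as an addition.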

(21)

Tracking lines…

[Figure: lines tracked across snapshots S₁, S₂, S₃ through chg, del, and add operations; line 6: S₁(6): S₂(3), …; line 9: S₁(9): S₂(6), …]

(22)

Coming back to our example…

3,3c4,4
< int x=0;
---
> int x;
4,4c3,3
< int power;
---
> int power=1;
8a10,10
> printf("Computation done: %d^%d=%d\n",n,exp,power);
9,9c5,5
< printf("Computing n to the power of exp\n");

(23)

Visualizing differences using TKDiff

http://tkdiff.sourceforge.net

Open source front-end for diff

(24)

Example of ldiff application

Comparing 2 versions of use case:

diff

(25)

(26)

(27)

Performances - I

Manually classified changes from the ArgoUML code repository

[Figure: bar chart of precision (0.00 to 1.00) by kind of change detected: CHG, ADD, DEL, UNC]

(28)

Performances - II

Precision and recall in identifying randomly generated fragment moves in OpenSSH and PostgreSQL code

[Figure: precision and recall (0 to 1) against the number of iterations (i), from 1 to 4]

(29)

Tracking software artifacts evolution

Track how software artifacts change during their lifetime

How and when they are introduced in the system

When they change

Who changes them

In what context they change

Examples of relevant artifacts to be tracked:

– Code clones

– Design patterns

– Vulnerable instructions

The approach can be applied to any other entity and to non-code artifacts

(30)

Common analysis process

Step 1: CVS/SVN Snapshots extraction and change set identification

Step 2: Identifying artifacts in snapshots

Design patterns

Clones

Vulnerabilities

Step 3: Tracing artifacts evolution history

Using approaches exploiting ldiff

Step 4: Analyzing artifacts changes

Using ldiff or AST-based/specific differencing

E.g. what changes pattern classes underwent

Step 5: Analyzing artifacts co-change

I.e., set of lines/files that changed together clones, patterns, vulnerabilities

(31)

(32)

Why tracking clones?

Common wisdom: clones are harmful for software systems

One maintains a code fragment, and if the change is not properly propagated to its clones…

Past studies investigated the possibility of automatic clone refactoring

Too risky (could introduce bugs)

(33)

Clones are not harmful?!?

Recent (and past) studies suggested clones are not necessarily harmful [Kapser and Godfrey, 2008, and Krinke, 2007]

Developers use cloning as a development practice

E.g. code templating

However, clone evolution should be monitored

Tool for manually keeping track of clones during maintenance [Duala-Ekoko and Robillard, 2007]

Analyzing clone genealogies

(34)

Our approach

Automatic approach to track clones

Based on existing clone detectors and on the ldiff tool

Does not require manual tagging

Allows precise tracking of the movements of clone fragments within a source code artifact

– Also for gapped clones

Tracks clone evolution patterns that are not covered by existing approaches

It could be used to extract clone genealogies

– That’s not what we did here

– We basically saw how a specific set of clones evolved across file revisions

(35)

Three-step approach

Step 1: Extract change sets from the CVS or SVN repository

Step 2: Use a clone detection tool to identify clones in the snapshot of interest

CCFinder for token-based clone detection

Simscan and Bauhaus ccdiml for AST-based clone detection

Any other clone detection tool can be used instead

Step 3: Analyze changes occurring on these clones and classify clone classes according to different evolution patterns

(36)

Clone Classes and Fragments

Clone detection tools return clone classes (CC)

Each clone class is composed of two or more (near) duplicated fragments (CF)

The z-th clone class in the k-th snapshot S_k is therefore defined as:

$CC_{z,k} \equiv \{CF_1, \ldots, CF_h\}_k$

CF is the set of source code lines of file revision $f_{i,j} \in S_k$ in the interval $[l_{start}, l_{end}]_k$

Clearly, the interval $[l_{start}, l_{end}]_k$ can vary across revisions

We need to track them…

Clone Section Pairs

CF may not match perfectly

They may be gapped clones

They may change across revisions

To this aim, we introduce the notion of a Clone Section (CS) pair

It represents the mapping between similar elements of two clone fragments in a clone class.

We denote the set of all clone section pairs between two clone fragments CF_x, CF_y as

$CS_{x,y} \equiv \{cs^{1}_{x,y}, \ldots, cs^{l}_{x,y}\}$

CS pairs are identified and their evolution tracked by means of ldiff

(38)

Example of CS Pair

CrNoOutgoingTransitions.java (ver. 1.1)

    package org.argouml.uml.cognitive.critics;
    ...
    public class CrNoOutgoingTransitions extends CrUML {
      ...
      public boolean predicate2(Object dm, Designer dsgr) {
        if (!(dm instanceof MStateVertex)) return NO_PROBLEM;
        MStateVertex sv = (MStateVertex) dm;
        if (sv instanceof MState) {
          MStateMachine sm = ((MState)sv).getStateMachine();
          if (sm != null && sm.getTop() == sv) return NO_PROBLEM;
        }
        Collection outgoing = sv.getOutgoings();
        boolean needsOutgoing = outgoing == null || outgoing.size() == 0;
        if (sv instanceof MFinalState) {
          needsOutgoing = false;
        }
        if (needsOutgoing) return PROBLEM_FOUND;
        return NO_PROBLEM;
      }
    } /* end class CrNoOutgoingTransitions */

CrNoIncomingTransitions.java (ver. 1.1)

    package org.argouml.uml.cognitive.critics;
    ...
    public class CrNoIncomingTransitions extends CrUML {
      ...
      public boolean predicate2(Object dm, Designer dsgr) {
        if (!(dm instanceof MStateVertex)) return NO_PROBLEM;
        MStateVertex sv = (MStateVertex) dm;
        if (sv instanceof MState) {
          MStateMachine sm = ((MState)sv).getStateMachine();
          if (sm != null && sm.getTop() == sv) return NO_PROBLEM;
        }
        //Vector outgoing = sv.getOutgoing();
        Collection incoming = sv.getIncomings();
        //boolean needsOutgoing = outgoing == null || outgoing.size() == 0;
        boolean needsIncoming = incoming == null || incoming.size() == 0;
        if (sv instanceof MPseudostate) {
          MPseudostateKind k = ((MPseudostate)sv).getKind();
          if (k.equals(MPseudostateKind.INITIAL)) needsIncoming = false;
          //if (k.equals(MPseudostateKind.FINAL)) needsOutgoing = false;
        }
        // if (needsIncoming && !needsOutgoing) return PROBLEM_FOUND;
        if (needsIncoming) return PROBLEM_FOUND;
        return NO_PROBLEM;
      }
    } /* end class CrNoIncomingTransitions */

[Figure annotations: clone sections CS1, CS2, and CS3 are marked across the two fragments]

(39)

What are we interested in monitoring?

Changes consistently and immediately propagated to all cloned fragments belonging to the same clone class

Changes consistently propagated, but with some delay between changes performed on different clone fragments

Cases where clones are not consistently changed; instead, they evolve independently, e.g., to implement different features

Thus we define the following evolution patterns:

CO: clone fragments change consistently in the same change set

IE: clone fragments evolve independently (in the time interval we are observing)

(40)

Evolution patterns

[Figure: clone fragments CF_x and CF_y across snapshots S₀, S₁, S₂ in four scenarios: consistent change, late propagation (two cases), and independent evolution]

(41)

1. Identification of clone section pairs evolution

2. Identification of clone fragment pairs evolution

3. Identification of clone class evolution

[Figure: a clone class with fragments CF₁, CF₂, CF₃; the clone section pairs (CS1, CS2) of the fragment pairs (1,2) and (2,3) are labeled LP or CO, and the labels are aggregated from clone sections to fragment pairs to the clone class]

(42)

Classifying CS changes

We compute the distance between CS using a normalized Levenshtein distance (NLD) and we compare it with thresholds:

Consistent change (C): if NLD ≥ NLDT_high

Inconsistent change (I): if NLD ≤ NLDT_low

Unknown (U): if NLDT_low < NLD < NLDT_high
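The three-way rule can be written down directly. The sketch follows the inequalities exactly as they appear on the slide; the threshold values 0.3 and 0.7 are hypothetical placeholders, since the calibrated values are not given here.

```python
def classify_cs_change(nld, nldt_low=0.3, nldt_high=0.7):
    """Label one CS change as consistent (C), inconsistent (I) or
    unknown (U), using the inequalities stated on the slide.
    The default thresholds are placeholders, not calibrated values."""
    if nld >= nldt_high:
        return "C"
    if nld <= nldt_low:
        return "I"
    return "U"


labels = [classify_cs_change(v) for v in (0.9, 0.7, 0.5, 0.3, 0.1)]
print(labels)
```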

(43)

CS evolution patterns

We analyze the sequence of changes in CS pairs across snapshots

UN: the sequence contains at least one U state

CO: the sequence only contains C (e.g. C C C C)

IE: the sequence ends with a sequence of I (e.g. C C I I)

LP: the sequence contains at least one transition from C to I, and then from I to C again, terminating with C (e.g., C C I C)

L2: as LP, if the time interval between the I and the new C is less than or equal to 24 hours
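These rules map naturally onto a small classifier over the sequence of C/I/U labels. The sketch below makes several assumptions that the slide leaves open: the interval for L2 is measured from the first I of an inconsistent run to the C that closes it, every such run must be closed within 24 hours for the pair to count as L2, a sequence without timestamps is reported as LP, and any sequence that contains an I and ends with C is treated as late propagation.

```python
from datetime import datetime, timedelta


def classify_cs_evolution(states, times=None, l2_window=timedelta(hours=24)):
    """Classify a CS pair from its per-snapshot states ('C', 'I', 'U').
    `times` optionally gives one datetime per state (assumed layout)."""
    if "U" in states:
        return "UN"
    if all(s == "C" for s in states):
        return "CO"
    if states[-1] == "I":
        return "IE"
    # The sequence ends with C and contains at least one I.
    if times is None:
        return "LP"
    run_start = None
    for state, when in zip(states, times):
        if state == "I" and run_start is None:
            run_start = when                 # an inconsistent run begins
        elif state == "C" and run_start is not None:
            if when - run_start > l2_window:
                return "LP"                  # propagation took too long
            run_start = None
    return "L2"


t0 = datetime(2000, 1, 1)
quick = [t0, t0 + timedelta(days=1), t0 + timedelta(days=2),
         t0 + timedelta(days=2, hours=3)]
slow = quick[:3] + [t0 + timedelta(days=3, hours=6)]
print(classify_cs_evolution("CCIC", quick), classify_cs_evolution("CCIC", slow))
```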

(44)

CF evolution patterns

We analyze the evolution of its CS pairs and we classify the CF as

UN: if the evolution of at least one clone section pair is classified as UN

LP: if at least one clone section pair evolves as LP

L2: if at least one clone section pair evolves as L2 (and none as LP)

CO: if the CS pairs that evolve as CO represent at least CT% of the source code lines of the fragment's CS pairs

IE: otherwise

The CT (Consistency Threshold) has been empirically obtained
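The precedence among these rules can be made explicit in code. In this sketch each CS pair is represented as a `(pattern, n_lines)` tuple, which is a layout assumed for the example, and the default `ct=50.0` is a hypothetical value: the slides only say that CT was obtained empirically.

```python
def classify_cf(cs_pairs, ct=50.0):
    """Classify a clone fragment from its CS pairs.
    cs_pairs: list of (pattern, n_lines) tuples (assumed layout);
    ct: consistency threshold in percent (placeholder value)."""
    patterns = [pattern for pattern, _ in cs_pairs]
    if "UN" in patterns:
        return "UN"
    if "LP" in patterns:
        return "LP"
    if "L2" in patterns:
        return "L2"
    total = sum(n for _, n in cs_pairs)
    consistent = sum(n for pattern, n in cs_pairs if pattern == "CO")
    if total and 100.0 * consistent / total >= ct:
        return "CO"
    return "IE"


print(classify_cf([("CO", 12), ("IE", 4)]))   # 75% of lines in CO pairs
print(classify_cf([("CO", 3), ("IE", 9)]))    # 25% of lines in CO pairs
```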

(45)

Clone Class evolution

We analyze the evolution of its CF and

assign an evolution pattern to the CC

considering the evolution pattern that

dominates:

(46)

Calibrating the threshold

Obtained by performing a manual analysis

For the following study we chose the thresholds that achieved the best compromise between precision and recall in the classification of consistent (C) and inconsistent (I) changes

(47)

Calibrating the threshold (cont.)

(48)

Empirical study definition

The object of this empirical study is to analyze the evolution of software clones

The purpose is to investigate how changes that occurred in cloned code fragments are propagated through all clones

The quality focus is the consistency of change propagation on source code clones

The perspective is that of

researchers, who want to investigate the effects of clone maintenance,

project managers, who need to monitor how changes on cloned code are propagated.

The context consists of four open source projects of different sizes and developed with different programming languages (Java and C).

(49)

Empirical study

Research questions:

1. Classification of clones across evolution patterns

2. Relationship between evolution patterns and clone granularity

3. Relationship between evolution patterns and clone radius

4. Relationship between evolution patterns and bug proneness

Analysis done on the evolution of different OSS:

– ArgoUML (Java)

– JBoss (Java)

– OpenSSH (C)

– PostgreSQL (C)

Different clone detectors:

(50)

Context

SYSTEM      LANG  REL  SNAPS   USERS  KNLOC         FILES
ArgoUML     Java   58   5,524     32  99.5–159.5    446–2,381
JBoss       Java   27  28,474    267  363.1–2601.8  3410–25,143
OpenSSH     C      33   1,314     49  15.5–47.3     75–170
PostgreSQL  C     119   9,323     27  121.2–497.9   630–2,530

(51)

RQ1: Clone evolution patterns

Token-based

[Bar chart, rebuilt as a table: share of each evolution pattern per system]

PATTERN  ArgoUML  JBoss  OpenSSH  PostgreSQL
CO           55%    40%      71%         38%
IE           34%    52%      24%         39%
L2            4%     2%       0%          6%
LP            3%     3%       4%         16%
UN            4%     5%       1%          1%

(52)

RQ1: Clone evolution patterns

AST-based

[Bar chart, rebuilt as a table: share of each evolution pattern per system]

PATTERN  ArgoUML  JBoss  OpenSSH  PostgreSQL
CO           56%    35%      41%         35%
IE           33%    59%      55%         50%
L2            3%     0%       5%          6%
LP            4%     3%       0%          9%
UN            4%     3%       0%          0%

(53)

Examples of clone evolution patterns

Consistent Change (ArgoUML)

The class GenAncestorClasses has been cloned from GenDescendantClasses. Both are utility classes used to navigate class hierarchies in UML diagrams.

Such classes underwent different refactoring changes, always consistently propagated

Independent Evolution (ArgoUML)

The classes GeneratorJava and GeneratorDisplay contain some cloned methods.

The second class becomes more complex in newer ArgoUML versions to account for enhanced visualization features.

Starting from the revision 1.8 of GeneratorJava and 1.4 of GeneratorDisplay, such classes evolved independently.

Late Propagation (PostgreSQL)

The modules parse_oper.c and parse_func.c contain two block-size clones.

The first underwent a bug fix (August 26, 1999)

The same bug was discovered six months later in the other clone (February 20, 2000)

CVS commit note:

(54)

RQ2: Granularity

We used a Chi-Square test to see if the proportion of clone evolution patterns was different for different granularities

H₀: proportions of clone classes do not change across granularities

Class/file, function/method, code fragment

Overall, no significant difference was found, except in a few cases

E.g. in JBoss class level clones had a larger proportion of IE

(55)

RQ3: Radius

We used a Kruskal-Wallis test to see whether there was a relationship between the clone radius and the evolution pattern

H₀: the median evolution pattern does not change across different radii

With a few exceptions, no significant difference was found

(56)

RQ4: Relationship with fault-proneness

We used a prop test to see whether the proportion of bug fixings changes across evolution patterns

H₀: proportions of # of bug fixings do not

(57)

(58)

RQ4: Results

AST-based clones

Proportions significantly different for ArgoUML (p-value=2.6 × 10⁻⁶), JBoss (p-value=0.0003), and PostgreSQL (p-value=0.0009), and marginally different for OpenSSH (p-value=0.06)

Except for ArgoUML, the highest proportions are found for the L2 and LP patterns

Token-based clones

Proportions significantly different for ArgoUML (p-value=0.002, higher for LP) and PostgreSQL (p-value=0.02, higher for L2)

In other cases higher proportions were found for L2 and LP

(59)

Pieces of evidence

PoE 1: Clones are often consistently changed

PoE 2: Using clones for templating is a common phenomenon in software systems

PoE 3: Clone characteristics do not influence the evolution patterns

PoE 4: High proportions of bug fixing changes occur for clones exhibiting late propagations

(60)

Threats to validity

Construct validity

Results might depend on clone detector performances

Line tracking works well but is not perfect

Sensitive to approximations in the classification

Internal validity

Not really, this is an exploratory study

Also on relationships between clone evolution patterns and bugs we only claim correlation not causation

Evolution patterns could depend on the clone age

– however we limited the study to clones occurring at an early stage

External validity

Other systems can lead to different results

(61)

Summary

Historical analysis represents a useful “third dimension” for software analysis…

…however it requires dealing with a number of issues

Among others, tracking line changes across change sets

We developed an improved differencing tool (ldiff) that overcomes the limitations of the Unix diff

We built upon it a common framework for software historical analysis, used for

Clone tracking

Design pattern tracking

Vulnerability tracking

Work-in-progress:

Improve the framework features and availability

Use it for further fine-grained evolution studies

(62)

Ack

Research on this topic has been carried out together with:

Lerina Aversano

Gerardo Canfora

Luigi Cerulo

Tina Del Grosso

Suresh Thummalapenta

This work is partially supported by the project METAMORPHOS

(MEthods and Tools for migrAting software systeMs towards web and service Oriented aRchitectures: exPerimental evaluation, usability, and tecHnOlogy tranSfer), funded by MiUR (Ministero dell'Università e della Ricerca) under grant PRIN2006-2006098097.

(63)

(Our) References - III

Differencing

Gerardo Canfora, Luigi Cerulo, and Massimiliano Di Penta. Ldiff: an Enhanced Line Differencing Tool. To appear in proceedings of ICSE 2009 (formal demo), May 2009, Vancouver, BC.

Gerardo Canfora, Luigi Cerulo, and Massimiliano Di Penta. Tracking your changes: a language-independent approach. IEEE Software, 27(1), pp. 50-57, 2009.

Gerardo Canfora, Luigi Cerulo, Massimiliano Di Penta: Identifying Changed Source Code Lines from Version Repositories. Fourth International Workshop on Mining Software Repositories (MSR 2007), Minneapolis, MN, USA, May 19-20, 2007

Fine grained analysis of software evolution

Lerina Aversano, Gerardo Canfora, Luigi Cerulo, Concettina Del Grosso, Massimiliano Di Penta: An empirical study on the evolution of design patterns. ESEC/SIGSOFT FSE 2007: 385-394

Lerina Aversano, Luigi Cerulo, Massimiliano Di Penta: Relating the Evolution of Design Patterns and Crosscutting Concerns, in proceedings of the 8th IEEE Working Conference on Source Code Analysis and Manipulation (SCAM 2007), October 2007, Paris, France

Massimiliano Di Penta, Luigi Cerulo, Yann-Gael Guehéneuc, Giuliano Antoniol: An Empirical Study of the Relationships between Design Pattern Roles and Class Change Proneness.

Proceedings of the 24th IEEE International Conference on Software Maintenance (ICSM 2008), October 2008, Beijing, China

Lerina Aversano, Luigi Cerulo, Massimiliano Di Penta: How Clones are Maintained: An Empirical Study. 11th European Conference on Software Maintenance and Reengineering (CSMR 2007): 81-90

(64)

Related Work - I

M. Fischer, M. Pinzger, and H. Gall. Populating a release history database from version control and bug tracking systems. In ICSM ’03: Proceedings of the 19th IEEE International Conference on Software Maintenance, pages 23-32, Amsterdam, Netherlands, Sept. 2003. IEEE Computer Society Press.

H. Gall, M. Jazayeri, and J. Krajewski. CVS release history data for detecting logical couplings. In IWPSE ’03: Proceedings of the 6th International Workshop on Principles of Software Evolution, page 13. IEEE Computer Society, 2003.


T. Zimmermann, P. Weißgerber, S. Diehl, and A. Zeller. Mining version histories to guide software changes. In ICSE ’04: Proceedings of the 26th International Conference on Software Engineering, pages 563-572. IEEE Computer Society, 2004.

E. Duala-Ekoko and M. P. Robillard. Tracking code clones in evolving software. In ICSE ’07: Proceedings of the 29th International

(65)

Related Work - II

Beat Fluri, Michael Würsch, Martin Pinzger, Harald Gall: Change Distilling: Tree Differencing for Fine-Grained Source Code Change Extraction. IEEE Trans. Software Eng. 33(11): 725-743 (2007)

M. Kim, V. Sazawal, D. Notkin, and G. Murphy. An empirical study of code clone genealogies. In Proceedings of the European Software Engineering Conference and the ACM Symposium on the Foundations of Software Engineering, pages 187–196, Lisbon, Portugal, September 2005. ACM Press.

J. Krinke. A study of consistent and inconsistent changes to code clones. In 14th Working Conference on Reverse Engineering (WCRE 2007), 28-31 October 2007, Vancouver, BC, Canada, pages 170–178, Los Alamitos, CA, USA, 2007. IEEE Computer Society.

Cory Kapser, Michael W. Godfrey: "Cloning considered harmful" considered harmful: patterns of cloning in software. Empirical Software Engineering 13(6): 645-692 (2008)

(66)