Tracking Software Changes:
A Framework and Examples of
Applications
Massimiliano Di Penta
Dept. of Engineering
University of Sannio, Benevento (Italy)
Outline
Fine-grained historical analysis
Mining historical data from software repositories
Tracking source code line evolution
Applications
Tracking clones (this talk)
Tracking design pattern evolution
[ESEC/FSE’07 Paper]
Tracking vulnerabilities
[SCAM’08 Paper]
Historical analysis
Static and dynamic analysis do not capture information such as:
How does an artifact change over time?
When was it changed?
Why was it changed?
– Evolution, bug-fixing, refactoring, re-documentation…
Who changed it?
Why is it useful?
Identify change-prone parts of your systems
Must be better designed
Identify fault-prone or vulnerable parts of your system
Must be better tested
Must be maintained
Identify artifacts that tend to change together
Impact analysis
Maintaining cloned code
Change propagation on crosscutting concerns
Triaging
Determine the most experienced developer able to perform a given maintenance task, based on her/his previous experience
Historical analysis
[Figure: static analysis, dynamic analysis, and historical analysis; product analysis versus process analysis (learning from history)]
Analyze trails left by developers during their maintenance activities
Level of detail in historical analysis
• Release level
– All changes committed between two releases…
– Is it worth considering all changes separately?
• Change set: commits that share the same author, branch, and notes, and whose distance is < 200 s [Zimmermann et al., 2004]
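The change set definition above can be sketched as a sliding time window over commits. This is a minimal illustration, not the tooling used in the talk: the Commit record, its field names, and the group_change_sets function are hypothetical, and the sketch uses a gap of at most 200 s between consecutive commits (the slide text says "< 200 s" while its figure says "≤ 200 s").

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Commit:
    # Hypothetical record: one file-level commit as stored by CVS/SVN.
    author: str
    branch: str
    note: str
    timestamp: float  # seconds
    path: str


def group_change_sets(commits: List[Commit], max_gap: float = 200.0) -> List[List[Commit]]:
    """Group commits sharing author, branch, and note into change sets.

    Two consecutive commits with the same key stay in the same change set
    when they are at most max_gap seconds apart (sliding window).
    """
    change_sets: List[List[Commit]] = []
    # Sorting by key, then time, makes related commits adjacent and chronological.
    ordered = sorted(commits, key=lambda c: (c.author, c.branch, c.note, c.timestamp))
    for commit in ordered:
        last = change_sets[-1][-1] if change_sets else None
        same_key = last is not None and (
            (last.author, last.branch, last.note)
            == (commit.author, commit.branch, commit.note)
        )
        if same_key and commit.timestamp - last.timestamp <= max_gap:
            change_sets[-1].append(commit)
        else:
            change_sets.append([commit])
    return change_sets
```

Because the window slides, a change set can span more than 200 s overall as long as each consecutive gap stays within the limit.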
[Figure: commits c1–c6 between releases R1 and R2, grouped into change sets; consecutive commits ≤ 200 s apart fall in the same change set, a gap > 200 s starts a new one]
How to perform historical analysis?
As said, by integrating data from different sources
“Software repositories”
Clustering together related changes [Zimmermann et al., 2004]
Analyzing changes and tracking artifacts across file revisions
However, this requires differencing tools
Tools that work on a structured representation (e.g. AST)
Why? We want to identify:
– Methods/functions added/removed
– Variable replacement
Change Distiller [Fluri et al., 2007]: identifies changes between ASTs
Language-dependent tools: UMLDiff, JDiff, XMLDiff
Tools that work on a flat representation of the source code (e.g. sequence of tokens or of lines)
Advantage: no need to parse the source code
The most famous is the Unix diff
Line tracking
• Versioning systems keep track of differences between two file revisions
• We know about lines added and removed, but…
• Changes are also treated in terms of del+add
– How to distinguish changes from add and del?
• How to track source code lines across revisions?
– Does line x of A1.1 correspond to line y of A1.2?
[Figure: revisions A 1.1 and A 1.2 with deleted (del) and added (add) blocks]
Limitations of the Unix diff
It is difficult to distinguish additions and removals from changes
Change identification heuristic: when diff finds a sequence of additions and deletions starting from the same position in the file, it assumes that the block has been changed.
If a code fragment is moved upward or downward in a file, it is not possible to keep track of it.
Example
It is likely that the programmer swapped
lines 3,4, modified them, moved line 9
to line 5, and then added line 10.
power.c (v. 1.1)
 1: int power(int n, int exp)
 2: {
 3:   int x=0;
 4:   int power;
 5:   for(x=0;x<exp;x++)
 6:   {
 7:     power=power*n;
 8:   }
 9:   printf("Computing n to the power of exp\n");
10:   return power;
11: }

power.c (next revision)
 1: int power(int n, int exp)
 2: {
 3:   int power=1;
 4:   int x;
 5:   printf("This program computes n to the power of exp\n");
 6:   for(x=0;x<exp;x++)
 7:   {
 8:     power=power*n;
 9:   }
10:   printf("Computation done: %d^%d=%d\n",n,exp,power);
11:   return power;
12: }
diff output
3,4c3,5
< int x=0;
< int power;
---
> int power=1;
> int x;
> printf("This program computes n to the power of exp\n");
9c10
< printf("Computing n to the power of exp\n");
---
> printf("Computation done: %d^%d=%d\n",n,exp,power);
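The limitation shown by this example can be reproduced with a short script. The sketch below uses Python's standard difflib module as a stand-in for the Unix diff (an assumption: the two tools use different matching algorithms, although on this input they group the lines the same way). The OLD and NEW lists are the two revisions of power.c with indentation dropped.

```python
import difflib

# The two revisions of power.c, one list entry per source line.
OLD = [
    "int power(int n, int exp)",
    "{",
    "int x=0;",
    "int power;",
    "for(x=0;x<exp;x++)",
    "{",
    "power=power*n;",
    "}",
    r'printf("Computing n to the power of exp\n");',
    "return power;",
    "}",
]
NEW = [
    "int power(int n, int exp)",
    "{",
    "int power=1;",
    "int x;",
    r'printf("This program computes n to the power of exp\n");',
    "for(x=0;x<exp;x++)",
    "{",
    "power=power*n;",
    "}",
    r'printf("Computation done: %d^%d=%d\n",n,exp,power);',
    "return power;",
    "}",
]


def line_opcodes(old, new):
    """Return difflib's edit script: (tag, i1, i2, j1, j2) tuples."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return matcher.get_opcodes()


for tag, i1, i2, j1, j2 in line_opcodes(OLD, NEW):
    # 'replace' is a del+add block: nothing says which old line became which new line.
    print(tag, OLD[i1:i2], "->", NEW[j1:j2])
```

The edit script contains only "equal" and "replace" blocks, so the swap of the two declarations and the move of the first printf are not visible as such.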
Overcoming diff limitations
ldiff: language-independent line differencing tool
Enhanced line differencing tool
Overcomes the limitations of diff
Able to identify likely changed lines
Able to track moved lines
References
[Canfora et al., IEEE Software 2009, MSR 2007]
Downloadable at
Approach
Overview
Three-step approach:
1. Identification of unchanged lines through the longest common subsequence (diff)
2. Identification of hunk similarity (e.g. vector space cosine similarity)
3. Identification of line similarity (e.g. Levenshtein distance)
Steps 2 and 3 are iterated to increase the recall of the approach
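The three steps can be sketched as follows. This is an illustrative approximation, not the ldiff implementation: it uses difflib for step 1, a bag-of-words cosine similarity for step 2, and a normalized Levenshtein similarity for step 3, in a single pass (ldiff iterates steps 2 and 3). The function names and the two threshold values are assumptions made for the example.

```python
import difflib
import math
import re
from collections import Counter


def levenshtein(a, b):
    """Classic edit distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def line_similarity(a, b):
    """Step 3: 1.0 for identical lines, 0.0 for completely different ones."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def hunk_similarity(hunk_a, hunk_b):
    """Step 2: cosine similarity between word-count vectors of two hunks."""
    va = Counter(re.findall(r"\w+", "\n".join(hunk_a)))
    vb = Counter(re.findall(r"\w+", "\n".join(hunk_b)))
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(
        sum(v * v for v in vb.values())
    )
    return dot / norm if norm else 0.0


def track_lines(old, new, hunk_threshold=0.5, line_threshold=0.4):
    """Map old line indexes to new line indexes (unchanged or likely changed)."""
    mapping = {}
    deleted, added = [], []  # hunks, each a list of line indexes
    # Step 1: unchanged lines come from the common subsequence.
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            mapping.update(zip(range(i1, i2), range(j1, j2)))
            continue
        if i2 > i1:
            deleted.append(list(range(i1, i2)))
        if j2 > j1:
            added.append(list(range(j1, j2)))
    # Step 2: keep only pairs of hunks that are similar enough, wherever they are.
    candidates = []
    for d in deleted:
        for a in added:
            score = hunk_similarity([old[i] for i in d], [new[j] for j in a])
            if score >= hunk_threshold:
                candidates += [(line_similarity(old[i], new[j]), i, j) for i in d for j in a]
    # Step 3: greedily pair the most similar lines; the rest stay added/deleted.
    used_new = set()
    for score, i, j in sorted(candidates, reverse=True):
        if score >= line_threshold and i not in mapping and j not in used_new:
            mapping[i] = j
            used_new.add(j)
    return mapping
```

Old lines missing from the returned mapping are treated as deleted, and new lines that are never a target as added. Because hunks are compared regardless of their position, a line moved to another place in the file can still be paired with its new location.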
[Figure: overview of the approach on two revisions L and R of a function bar(): Step 1 finds unchanged lines through the longest common subsequence, Step 2 computes hunk similarity, Step 3 computes line similarity; lines end up labeled (U)nchanged, (C)hanged, (A)dded, or (D)eleted]
Identification of unchanged lines
[Figure: longest common subsequence between revisions L and R]
Hunk similarity computation
[Figure: similarity scores between hunks of L and R (0.68, 1.0, 1.0)]
Line similarity computation
[Figure: similarity scores between lines of matched hunks (1.0, 1.0, 0.5)]
1st step
(coarse grained tracking)
[Figure: added and deleted fragments of A 1.1 and A 1.2 with pairwise similarity scores 0.9, 0.8, 0.2, 0.1]
Textual similarity between code fragments
2nd step
(fine grained tracking)
[Figure: revisions A 1.1 and A 1.2; a deleted and an added fragment matched with similarity 0.9]
Once larger fragments are tracked…
Similarity function between source code lines
Tracking lines
[Figure: lines of A 1.1 and A 1.2 classified as added, deleted, or changed; changed lines are paired by minimum Levenshtein distance]
Tracking lines…
[Figure: lines tracked across snapshots S1, S2, S3, e.g. S1(6): S2(3), … and S1(9): S2(6), …]
Coming back to our example…
3,3c4,4
< int x=0;
---
> int x;
4,4c3,3
< int power;
---
> int power=1;
8a10,10
> printf("Computation done: %d^%d=%d\n",n,exp,power);
9,9c5,5
< printf("Computing n to the power of exp\n");
Visualizing differences using TKDiff
http://tkdiff.sourceforge.net
Open source front-end for diff
Example of ldiff application
Comparing 2 versions of use case:
diff
Performances - I
Manually classified changes from the
ArgoUML code repository
[Figure: bar chart of precision (0.00 to 1.00) by kind of change detected: CHG, ADD, DEL, UNC]
Performances - II
Precision and recall in identifying randomly generated fragment moves in OpenSSH and PostgreSQL code
[Figure: precision and recall (0 to 1) versus number of iterations (i), from 1 to 4]
Tracking software artifacts evolution
Track how software artifacts change during their lifetime
How and when they are introduced in the system
When they change
Who changes them
In what context they change
Examples of relevant artifacts to be tracked:
– Code clones
– Design patterns
– Vulnerable instructions
The approach can be applied to any other entity and to
non-code artifacts
Common analysis process
Step 1: CVS/SVN Snapshots extraction and change set identification
Step 2: Identifying artifacts in snapshots
Design patterns
Clones
Vulnerabilities
Step 3: Tracing artifacts evolution history
Using approaches exploiting ldiff
Step 4: Analyzing artifacts changes
Using ldiff or AST-based/specific differencing
E.g. what changes pattern classes underwent
Step 5: Analyzing artifacts co-change
I.e., sets of lines/files that changed together (clones, patterns, vulnerabilities)
Why tracking clones?
Common wisdom: clones are harmful for software systems
One maintains a code fragment, and if the change is not properly propagated to its clones…
Past studies investigated the possibility of automatically refactoring clones
Too risky (could introduce bugs)
Clones are not harmful?!?
Recent (and past) studies suggested that clones are not necessarily harmful [Kapser and Godfrey, 2008, and Krinke, 2007]
Developers use cloning as a development practice
E.g. code templating
However, clone evolution should be monitored
Tool for manually keeping track of clones during maintenance [Duala-Ekoko and Robillard, 2007]
Analyzing clone genealogies
Our approach
Automatic approach to track clones
Based on existing clone detectors and on the ldiff tool
Does not require manual tagging
Allows precise tracking of movements of clone fragments within a source code artifact
– Also for gapped clones
Tracks clone evolution patterns that have not been covered by existing approaches
It could be used to extract clone genealogies
– That’s not what we did here
– We basically saw how a specific set of clones evolved across file revisions
Three-step approach
Step 1: Extract change sets from the CVS or SVN repository
Step 2: Use a clone detection tool to identify clones in the snapshot of interest
CCFinder for token-based clone detection
Simscan and Bauhaus ccdiml for AST-based clone detection
Any other tool can be used without any problem
Step 3: Analyze changes occurring on these clones and classify clone classes according to different evolution patterns
Clone Classes and Fragments
Clone detection tools return clone classes (CC)
Each clone class is composed of two or more (near) duplicated fragments (CF)
The z-th clone class in the k-th snapshot $S_k$ is therefore defined as:
$CC_z^k \equiv \{CF_1^k, \ldots, CF_h^k\}$
CF is the set of source code lines of file revision $f_{i,j} \in S_k$ in the interval $[l_{start}, l_{end}]_k$
Clearly, the interval $[l_{start}, l_{end}]_k$ can vary across revisions
We need to track them…
Clone Section Pairs
CF may not match perfectly
They may be gapped clones
They may change across revisions
To this aim, we introduce the notion of a Clone Section (CS) pair
It represents the mapping between similar elements of two clone fragments in a clone class.
We denote the set of all clone section pairs between two clone fragments CFx, CFy as
$CS_{x,y} \equiv \{CS_{x,y}^{1}, \ldots, CS_{x,y}^{l}\}$
CS pairs are identified and their evolution tracked by means of ldiff
Example of CS Pair
CrNoOutgoingTransitions.java (ver. 1.1)

  package org.argouml.uml.cognitive.critics;
  ...
  public class CrNoOutgoingTransitions extends CrUML {
    ...
    public boolean predicate2(Object dm, Designer dsgr) {
      if (!(dm instanceof MStateVertex)) return NO_PROBLEM;
      MStateVertex sv = (MStateVertex) dm;
      if (sv instanceof MState) {
        MStateMachine sm = ((MState)sv).getStateMachine();
        if (sm != null && sm.getTop() == sv) return NO_PROBLEM;
      }
      Collection outgoing = sv.getOutgoings();
      boolean needsOutgoing = outgoing == null || outgoing.size() == 0;
      if (sv instanceof MFinalState) {
        needsOutgoing = false;
      }
      if (needsOutgoing) return PROBLEM_FOUND;
      return NO_PROBLEM;
    }
  } /* end class CrNoOutgoingTransitions */

CrNoIncomingTransitions.java (ver. 1.1)

  package org.argouml.uml.cognitive.critics;
  ...
  public class CrNoIncomingTransitions extends CrUML {
    ...
    public boolean predicate2(Object dm, Designer dsgr) {
      if (!(dm instanceof MStateVertex)) return NO_PROBLEM;
      MStateVertex sv = (MStateVertex) dm;
      if (sv instanceof MState) {
        MStateMachine sm = ((MState)sv).getStateMachine();
        if (sm != null && sm.getTop() == sv) return NO_PROBLEM;
      }
      //Vector outgoing = sv.getOutgoing();
      Collection incoming = sv.getIncomings();
      //boolean needsOutgoing = outgoing == null || outgoing.size() == 0;
      boolean needsIncoming = incoming == null || incoming.size() == 0;
      if (sv instanceof MPseudostate) {
        MPseudostateKind k = ((MPseudostate)sv).getKind();
        if (k.equals(MPseudostateKind.INITIAL)) needsIncoming = false;
        //if (k.equals(MPseudostateKind.FINAL)) needsOutgoing = false;
      }
      // if (needsIncoming && !needsOutgoing) return PROBLEM_FOUND;
      if (needsIncoming) return PROBLEM_FOUND;
      return NO_PROBLEM;
    }
  } /* end class CrNoIncomingTransitions */

[Figure annotations: clone sections CS1, CS2, and CS3 mark the corresponding regions of the two fragments]
What are we interested in monitoring?
Changes consistently and immediately propagated to all cloned fragments belonging to the same clone class
Changes consistently propagated, but with some delay between changes performed on different clone fragments
Cases where clones are not consistently changed; instead, they evolve independently, e.g., to implement different features
Thus we define the following evolution patterns:
CO: clone fragments change consistently in the same change set
IE: clone fragments evolve independently (in the time interval we are observing)
Evolution patterns
[Figure: clone fragments CFx and CFy across snapshots S0, S1, S2 under consistent change, late propagation (two variants), and independent evolution]
1. Identification of clone section pairs evolution
2. Identification of clone fragment pairs evolution
3. Identification of clone class evolution
[Figure: a clone class with fragments CF1, CF2, CF3; the clone section pairs between CF1 and CF2 and between CF2 and CF3 are each labeled with an evolution pattern (CO or LP)]
Classifying CS changes
We compute the distance between CS using a normalized Levenshtein distance (NLD) and we compare it with thresholds:
Consistent change (C): if NLD ≥ NLDT_high
Inconsistent change (I): if NLD ≤ NLDT_low
Unknown (U): if NLDT_low < NLD < NLDT_high
CS evolution patterns
We analyze the sequence of changes in CS pairs across snapshots
UN: the sequence contains at least one unknown (U) state
CO: the sequence only contains C (e.g. C C C C)
IE: the sequence ends with a sequence of I (e.g. C C I I)
LP: the sequence contains at least one transition from C to I, and then from I to C again, terminating with C (e.g., C C I C)
L2: as LP, if the time interval between the I and the new C is less than or equal to 24 hours
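The threshold comparison and the sequence rules can be sketched as two small functions. This is a hedged illustration, not the authors' implementation: the function names and the delay_hours parameter are assumptions, the threshold values used when calling classify_change are up to the caller (the slides do not state them), and sequences not covered by the rules (for example one that starts with I and ends with C) are reported as "UN" purely as a convention of this sketch.

```python
from typing import List, Optional


def classify_change(nld: float, low: float, high: float) -> str:
    """Classify one CS pair change from its NLD value and the two thresholds."""
    if nld >= high:
        return "C"  # consistent change
    if nld <= low:
        return "I"  # inconsistent change
    return "U"      # unknown


def classify_sequence(states: List[str], delay_hours: Optional[float] = None) -> str:
    """Assign an evolution pattern to a sequence of C/I/U states.

    delay_hours (hypothetical parameter) is the time between the
    inconsistent change and the change that re-aligns the clones.
    """
    if not states:
        raise ValueError("empty sequence")
    if "U" in states:
        return "UN"
    if all(s == "C" for s in states):
        return "CO"
    if states[-1] == "I":
        return "IE"
    joined = "".join(states)
    first_ci = joined.find("CI")
    # Late propagation: C -> I, later I -> C, and the sequence ends with C.
    if first_ci >= 0 and "IC" in joined[first_ci:]:
        if delay_hours is not None and delay_hours <= 24:
            return "L2"
        return "LP"
    return "UN"  # assumption: case not covered by the rules on the slide
```

For example, classify_sequence(list("CCIC")) yields "LP", and the same sequence with delay_hours=5 yields "L2".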
CF evolution patterns
We analyze the evolution of its CS pairs and we classify the CF as
UN: if the evolution of at least one clone section pair is classified as UN
LP: if at least one clone section pair evolves as LP
L2: if at least one clone section pair evolves as L2 (and none as LP)
CO: if the CS pairs that evolve as CO represent at least CT% of the source code lines of the fragment's CS pairs
IE: otherwise
The CT (Consistency Threshold) has been empirically obtained
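The precedence among these rules can be made explicit with a short function. This is a sketch under assumptions: each CS pair is represented as a hypothetical (pattern, line_count) tuple, and the CT value is passed in by the caller, since the slides only say it was obtained empirically.

```python
from typing import List, Tuple


def classify_fragment(cs_pairs: List[Tuple[str, int]], ct_percent: float) -> str:
    """Classify a clone fragment from the patterns of its CS pairs.

    cs_pairs: (evolution pattern, number of source lines) per CS pair.
    ct_percent: Consistency Threshold, as a percentage of lines.
    """
    patterns = [pattern for pattern, _ in cs_pairs]
    # The rules are checked in order, so UN wins over LP, and LP over L2.
    if "UN" in patterns:
        return "UN"
    if "LP" in patterns:
        return "LP"
    if "L2" in patterns:
        return "L2"
    total_lines = sum(lines for _, lines in cs_pairs)
    co_lines = sum(lines for pattern, lines in cs_pairs if pattern == "CO")
    if total_lines > 0 and 100.0 * co_lines / total_lines >= ct_percent:
        return "CO"
    return "IE"
```

With a CT of 70, a fragment whose CS pairs are [("CO", 8), ("IE", 2)] is classified as CO (80% of the lines evolve consistently), while [("CO", 3), ("IE", 7)] is classified as IE.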
Clone Class evolution
We analyze the evolution of its CF and
assign an evolution pattern to the CC
considering the evolution pattern that
dominates:
Calibrating the threshold
Obtained by performing a manual analysis
For the following study we chose the thresholds that achieved the best compromise between precision and recall in the classification of consistent (C) and inconsistent (I) changes
Calibrating the threshold (cont.)
Empirical study definition
The object of this empirical study is to analyze the evolution of software clones
The purpose is to investigate how changes that occurred in cloned code fragments are propagated through all clones
The quality focus is the consistency of change propagation on source code clones
The perspective is that of
researchers, who want to investigate the effects of clone maintenance,
project managers, who need to monitor how changes on cloned code are propagated.
The context consists of four open source projects of different sizes and developed with different programming languages (Java and C).
Empirical study
Research questions:
1. Classification of clones across evolution patterns
2. Relationship between evolution patterns and clone granularity
3. Relationship between evolution patterns and clone radius
4. Relationship between evolution patterns and bug proneness
Analysis done on the evolution of different OSS:
– ArgoUML (Java)
– JBoss (Java)
– OpenSSH (C)
– PostgreSQL (C)
Different clone detectors:
Context
SYSTEM      LANG  REL  SNAPS   USERS  KNLOC         FILES
ArgoUML     Java   58   5,524     32  99.5–159.5    446–2,381
JBoss       Java   27  28,474    267  363.1–2601.8  3410–25,143
OpenSSH     C      33   1,314     49  15.5–47.3     75–170
PostgreSQL  C     119   9,323     27  121.2–497.9   630–2,530
RQ1: Clone evolution patterns
Token-based
[Bar chart data: percentage of clone classes per evolution pattern]
PATTERN  ArgoUML  JBoss  OpenSSH  PostgreSQL
CO       55%      40%    71%      38%
IE       34%      52%    24%      39%
L2       4%       2%     0%       6%
LP       3%       3%     4%       16%
UN       4%       5%     1%       1%
RQ1: Clone evolution patterns
AST-based
[Bar chart data: percentage of clone classes per evolution pattern]
PATTERN  ArgoUML  JBoss  OpenSSH  PostgreSQL
CO       56%      35%    41%      35%
IE       33%      59%    55%      50%
L2       3%       0%     5%       6%
LP       4%       3%     0%       9%
UN       4%       3%     0%       0%
Examples of clone evolution patterns
Consistent Change (ArgoUML)
The class GenAncestorClasses has been cloned from GenDescendantClasses. Both are utility classes used to navigate class hierarchies in UML diagrams.
Such classes underwent different refactoring changes, always consistently propagated
Independent Evolution (ArgoUML)
The classes GeneratorJava and GeneratorDisplay contain some cloned methods.
The second class becomes more complex in newer ArgoUML versions to account for enhanced visualization features.
Starting from the revision 1.8 of GeneratorJava and 1.4 of GeneratorDisplay, such classes evolved independently.
Late Propagation (PostgreSQL)
The modules parse_oper.c and parse_func.c contain two block-size clones.
The first underwent a bug fix (August 26, 1999)
The same bug was discovered six months later in the other clone (February 20, 2000)
CVS commit note:
RQ2: Granularity
We used a Chi-Square test to see if the proportion of clone evolution patterns was different for different granularities
H0: proportions of clone classes do not change across granularities
Class/file, function/method, code fragment
Overall, no significant difference was found, except in a few cases
E.g. in JBoss, class-level clones had a larger proportion of IE
RQ3: Radius
We used a Kruskal-Wallis test to see whether there was a relationship between the clone radius and the evolution pattern
H0: the median evolution pattern does not change across different radii
With a few exceptions, no significant difference was found
RQ4: Relationship with fault-proneness
We used a prop test to see whether the proportion of bug fixings changes across evolution patterns
H0: proportions of # of bug fixings do not
RQ4: Results
AST-based clones
Proportions significantly different for ArgoUML (p-value=2.6 × 10^-6), JBoss (p-value=0.0003), and PostgreSQL (p-value=0.0009), and marginally different for OpenSSH (p-value=0.06)
Except for ArgoUML, the highest proportions are found for the L2 and LP patterns
Token-based clones
Proportions significantly different for ArgoUML (p-value=0.002, higher for LP) and PostgreSQL (p-value=0.02, higher for L2)
In other cases higher proportions were found for L2 and LP
Pieces of evidence
PoE 1: Clones are often consistently changed
PoE 2: Using clones for templating is a common phenomenon in software systems
PoE 3: Clone characteristics do not influence the evolution patterns
PoE 4: High proportions of bug fixing changes occur for clones exhibiting late propagations
Threats to validity
Construct validity
Results might depend on clone detector performances
Line tracking works well but is not perfect
Sensitive to approximations in the classification
Internal validity
Not really, this is an exploratory study
Also on relationships between clone evolution patterns and bugs we only claim correlation not causation
Evolution patterns could depend on the clone age
– however we limited the study to clones occurring at an early stage
External validity
Other systems can lead to different results
Summary
Historical analysis represents a useful “third dimension”
for software analysis…
…however, it requires dealing with a number of issues
Among others, tracking line changes across change sets
We developed an improved differencing tool (ldiff) that overcomes the limitations of the Unix diff
We built upon it a common framework for software historical analysis, used for
Clone tracking
Design pattern tracking
Vulnerability tracking
Work-in-progress:
Improve the framework features and availability
Use it for further fine-grained evolution studies
Ack
Research on this topic has been carried out together with:
Lerina Aversano
Gerardo Canfora
Luigi Cerulo
Tina Del Grosso
Suresh Thummalapenta
This work is partially supported by the project METAMORPHOS
(MEthods and Tools for migrAting software systeMs towards web and service Oriented aRchitectures: exPerimental evaluation, usability, and tecHnOlogy tranSfer), funded by MiUR (Ministero dell'Università e della Ricerca) under grant PRIN2006-2006098097.
(Our) References - III
Differencing
Gerardo Canfora, Luigi Cerulo, and Massimiliano Di Penta. Ldiff: an Enhanced Line Differencing Tool. To appear in proceedings of ICSE 2009 (formal demo), May 2009, Vancouver, BC.
Gerardo Canfora, Luigi Cerulo, and Massimiliano Di Penta. Tracking your changes: a language-independent approach. IEEE Software, 27(1), pp. 50-57, 2009.
Gerardo Canfora, Luigi Cerulo, Massimiliano Di Penta: Identifying Changed Source Code Lines from Version Repositories. Fourth International Workshop on Mining Software Repositories (MSR 2007), Minneapolis, MN, USA, May 19-20, 2007
Fine grained analysis of software evolution
Lerina Aversano, Gerardo Canfora, Luigi Cerulo, Concettina Del Grosso, Massimiliano Di Penta: An empirical study on the evolution of design patterns. ESEC/SIGSOFT FSE 2007: 385-394
Lerina Aversano, Luigi Cerulo, Massimiliano Di Penta: Relating the Evolution of Design Patterns and Crosscutting Concerns, in proceedings of the 8th IEEE Working Conference on Source Code Analysis and Manipulation (SCAM 2007), October 2007, Paris, France
Massimiliano Di Penta, Luigi Cerulo, Yann-Gael Guehéneuc, Giuliano Antoniol: An Empirical Study of the Relationships between Design Pattern Roles and Class Change Proneness.
Proceedings of the 24th IEEE International Conference on Software Maintenance (ICSM 2008), October 2008, Beijing, China
Lerina Aversano, Luigi Cerulo, Massimiliano Di Penta: How Clones are Maintained: An Empirical Study. 11th European Conference on Software Maintenance and Reengineering (CSMR 2007): 81-90
Related Work - I
M. Fischer, M. Pinzger, and H. Gall. Populating a release history database from version control and bug tracking systems. In ICSM ’03: Proceedings of the 19th IEEE International Conference on Software Maintenance, pages 23-32, Amsterdam, Netherlands, September 2003. IEEE Computer Society Press.
H. Gall, M. Jazayeri, and J. Krajewski. CVS release history data for detecting logical couplings. In IWPSE ’03: Proceedings of the 6th International Workshop on Principles of Software Evolution, page 13. IEEE Computer Society, 2003.
T. Zimmermann, P. Weisgerber, S. Diehl, and A. Zeller. Mining version histories to guide software changes. In ICSE '04:
Proceedings of the 26th International Conference on Software
Engineering, pages 563-572. IEEE Computer Society, 2004.
E. Duala-Ekoko and M. P. Robillard. Tracking code clones in evolving software. In ICSE ’07: Proceedings of the 29th International
Related Work - II
Beat Fluri, Michael Würsch, Martin Pinzger, Harald Gall: Change Distilling: Tree Differencing for Fine-Grained Source Code Change Extraction. IEEE Trans. Software Eng. 33(11): 725-743 (2007)
M. Kim, V. Sazawal, D. Notkin, and G. Murphy. An empirical study of code clone genealogies. In Proceedings of the European Software Engineering Conference and the ACM Symposium on the Foundations of Software Engineering, pages 187–196, Lisbon, Portugal, September 2005. ACM Press.
J. Krinke. A study of consistent and inconsistent changes to code clones. In 14th Working Conference on Reverse Engineering (WCRE 2007), 28-31 October 2007, Vancouver, BC, Canada, pages 170–178, Los Alamitos, CA, USA, 2007. IEEE Computer Society.
Cory Kapser, Michael W. Godfrey: "Cloning considered harmful" considered harmful: patterns of cloning in software. Empirical Software Engineering 13(6): 645-692 (2008)