Result Presentation - Why and How to Control Cloning in Software Artifacts. Elmar Juergens

Different use cases require different ways of interacting with clone detection results. This section outlines how results are presented in a quality dashboard for clone control and in an IDE for interactive clone inspection and change propagation.

Similar to postprocessing, this section focuses on presentation of code clones; all presentations can be applied to requirements clones as well, since both share the same intermediate representation. Furthermore, in many cases, ConQAT either contains similar presentation functionality for model clones, or it could be implemented in a similar fashion.

7.5.1 Project Dashboard

Project dashboards support continuous software quality control. Their goal is to provision stakeholders—including project management and developers—with relevant and accurate information on the quality characteristics of the software they are developing [48]. For this, quality dashboards perform automated quality analyses and collect, filter, aggregate and visualize result data. Through its visual data flow language, ConQAT supports the construction of such dashboards. Clone detection is one of the key supported quality analyses.

Different stakeholder roles require different presentations of clone detection results. To support them, ConQAT presents clone detection results on different levels of aggregation.

Clone Lists provide cloning information on the file level, as depicted in the screenshot in Figure 7.17. They reveal the longest clones and the clone groups with the most instances. While no replacement for clone inspection on the code level, clone lists allow developers to get a first idea about the detected clones without requiring them to open their IDEs.

Figure 7.17: Clone list in the dashboard

Treemaps [223] visualize the distribution of cloning across artifacts. They thus reveal to stakeholders which areas of their project are affected by cloning, and to what extent.

Treemaps visualize source code size, structure and cloning in a single image. We introduce their interpretation by constructing a treemap step by step. A treemap starts with an empty rectangle. Its area represents all project artifacts. In the first step, this rectangle is divided into sub-rectangles. Each sub-rectangle represents a component of the project. The size of the sub-rectangle corresponds to the aggregate size of the artifacts belonging to the component. The resulting visualization is depicted in Figure 7.18 on the left. The visualized project contains 24 components. For the largest ones, name and size (in LOC) are depicted. Since component GUI Forms (91 kLOC) is larger than component Business Logic, its rectangle occupies a proportionally larger area.

In the second step, each component rectangle is further divided into sub-rectangles for the individual artifacts contained in the component. Again, rectangle area and artifact size correspond. The result is depicted in Figure 7.18 on the right.

Figure 7.18: Treemap construction: artifact arrangement

Although position and size of the top-level rectangles did not change, they are hard to recognize due to the many individual rectangles now populating the treemap. The hierarchy between rectangles is, thus, obscured. To better convey their hierarchy, the rectangles are shaded in the third step, as depicted on the left of Figure 7.19.

7 Algorithms and Tool Support

In the last step, color is employed to reveal the amount of cloning an artifact contains and to indicate generated code. More specifically, individual artifacts are colored on a gradient between white and red for a clone coverage between 0 and 1. Furthermore, code that is generated and not maintained by hand is colored dark gray. Figure 7.19 shows the result on the right. The artifacts in component GUI Forms contain substantial amounts of cloning, whereas the artifacts in the component on the bottom-left hardly contain any. The artifacts of the component Data Access are generated and thus depicted in gray, except for the two files in its upper-left corner.

ConQAT displays tooltips with details, including size and cloning metrics, for each file. The treemaps thus reveal more information in the tool than in the screenshots.
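The construction steps above can be sketched in a few lines of Python. This is a minimal illustration and not ConQAT's implementation: it assumes a simple slice-and-dice layout (the layout algorithm ConQAT uses is not described here), and the names `layout` and `coverage_color`, the gray value, and all sizes except the 91 kLOC of GUI Forms are hypothetical.

```python
from typing import Dict, Tuple

Rect = Tuple[float, float, float, float]  # x, y, width, height
Color = Tuple[int, int, int]              # RGB, 0..255


def layout(sizes: Dict[str, float], rect: Rect, horizontal: bool) -> Dict[str, Rect]:
    """Divide rect into strips whose areas are proportional to the given sizes."""
    x, y, width, height = rect
    total = sum(sizes.values())
    rects: Dict[str, Rect] = {}
    offset = 0.0  # fraction of the rectangle that is already assigned
    for name, size in sizes.items():
        share = size / total
        if horizontal:
            rects[name] = (x + offset * width, y, share * width, height)
        else:
            rects[name] = (x, y + offset * height, width, share * height)
        offset += share
    return rects


def coverage_color(clone_coverage: float, generated: bool = False) -> Color:
    """Map clone coverage in [0, 1] to a white-to-red gradient; generated code is dark gray."""
    if generated:
        return (64, 64, 64)  # the exact gray value is an assumption
    clamped = min(max(clone_coverage, 0.0), 1.0)
    fade = round(255 * (1.0 - clamped))
    return (255, fade, fade)


# Step 1: one rectangle per component (hypothetical sizes except GUI Forms).
components = layout(
    {"GUI Forms": 91_000, "Business Logic": 60_000},
    (0.0, 0.0, 151.0, 100.0),
    horizontal=True,
)
# Step 2: one rectangle per artifact inside a component (hypothetical files).
artifacts = layout(
    {"OrderForm": 3_000, "LoginForm": 1_000},
    components["GUI Forms"],
    horizontal=False,
)
```

Alternating the split direction per hierarchy level keeps the nesting visible; the shading step from the text is omitted here.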

Trend Charts visualize the evolution of cloning metrics over time. They allow stakeholders to determine whether cloning increased or decreased during a development period. Figure 7.20 shows a trend chart of the development of clone coverage over time.

Figure 7.20: Clone coverage chart

Between April and May, clone coverage decreased since clones were removed. In May, new clones were introduced. After developers noticed this, the introduced clones were consolidated.

Clone Churn reveals clone evolution on the level of individual clones, which is required to diagnose the root cause of trend changes. Clone churn thus complements trend charts with more details. The screenshots in Figure 7.21 depict how clone churn information is displayed in the quality dashboard. On the left, the different churn lists are shown. For inspection of clones that have become inconsistent during evolution, the dashboard contains a view that displays their syntax-highlighted content and highlights differences. One such clone is shown in the screenshot on the right of Figure 7.21.

Figure 7.21: Clone churn in the quality dashboard

7.5.2 Interactive Clone Inspection

This section outlines ConQAT’s interactive clone inspection features that allow developers to investigate clones inside their IDEs and to use cloning information for change propagation when modifying software that contains clones.

ConQAT implements a Clone Detection Perspective that provides a collection of views for clone inspection. The intended use case is one-shot investigation of cloning in a software system.

A screenshot of the Clone Detection Perspective is depicted in Figure 7.22. Detailed documentation of the Clone Detection Perspective, including a user manual, is contained in the ConQAT Book [49] and outside the scope of this document. However, due to their importance for the case studies performed during this thesis, two views are explained in detail below.

The Clone Inspection View is the most important tool for inspecting individual clones on the code level. It implements syntax highlighting for all languages on which clone detection is supported. Furthermore, it highlights statement-level differences between type-3 clones. According to our experience, this view substantially increases productivity of clone inspection. We consider this crucial for case studies that involve developer inspection of cloned code.

The Clone Visualizer uses a SeeSoft visualization to display cloning information on a higher level of aggregation than the clone inspection view [63,214]. It thus allows inspection of the cloning relationships of one or two orders of magnitude more code on a single screen.

Each bar in the view represents a file. The length of the bar corresponds to the length of its file. Each colored stripe represents a clone; all clones of a clone group have the same color. The length of the stripe corresponds to the length of the clone. This visualization reveals files with substantial mutual cloning through similar stripe patterns.

ConQAT provides two SeeSoft views. The clone family visualizer displays the currently selected file, all of its clones, and all other files that are in a cloning relationship with it. However, for the other files, only their clones with the selected file are displayed. The clone family visualizer thus supports a quick investigation of the amount of cloning a file shares with other files, as depicted in Figure 7.23.
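The selection rule of the clone family visualizer can be illustrated with a short sketch. The data model (`Clone` with a group id, file and line range) and the function `clone_family` are hypothetical and chosen for illustration only; they are not ConQAT's API.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Clone:
    group_id: int    # clones of the same clone group share this id
    file: str
    start_line: int
    end_line: int


def clone_family(selected: str, clones: Iterable[Clone]) -> Dict[str, List[Clone]]:
    """Return, per file, the clones to draw when `selected` is the chosen file.

    Only clone groups with an instance in the selected file are kept, so other
    files show just the clones they share with the selected file.
    """
    clones = list(clones)
    groups = {c.group_id for c in clones if c.file == selected}
    family: Dict[str, List[Clone]] = {}
    for clone in clones:
        if clone.group_id in groups:
            family.setdefault(clone.file, []).append(clone)
    return family


clones = [
    Clone(1, "A.java", 10, 30), Clone(1, "B.java", 5, 25),
    Clone(2, "B.java", 40, 60), Clone(2, "C.java", 1, 21),
]
family = clone_family("A.java", clones)
```

In this example, `B.java` appears in the family of `A.java` with its group 1 clone only; its group 2 clone and the file `C.java` are not shown.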


Figure 7.22: Clone detection perspective

Figure 7.23: Clone family visualizer

The clone visualizer displays all source files and their clones. If the files are displayed in the order they occur on disk (or in the namespace), high-level similarities are typically too far separated to be recognized by the user. To cluster similar files, ConQAT orders them based on their amount of mutual cloning. Files that share many clones are, hence, displayed close to each other, allowing users to spot file-level cloning due to their similarly colored stripe patterns, as depicted in Figure 7.24.

Ordering files based on their amount of mutually cloned code corresponds to the traveling salesperson problem: files correspond to cities, lines of mutually cloned code correspond to travel cost, and finding an ordering that maximizes the sum of mutually cloned lines between neighboring files corresponds to finding a maximally expensive travel route. Since the traveling salesperson problem is NP-complete [75], ConQAT employs a heuristic algorithm to perform the sorting.
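The heuristic that ConQAT uses is not specified here. As an illustration of the idea, the following sketch uses a greedy nearest-neighbor strategy, which is an assumption and not necessarily what ConQAT does. The function name `order_by_mutual_cloning` and the input format (a map from unordered file pairs to mutually cloned lines) are hypothetical.

```python
from typing import Dict, FrozenSet, List


def order_by_mutual_cloning(
    files: List[str], shared: Dict[FrozenSet[str], int]
) -> List[str]:
    """Greedy ordering: repeatedly append the unplaced file that shares the
    most cloned lines with the most recently placed file."""
    if not files:
        return []

    def mutual(a: str, b: str) -> int:
        return shared.get(frozenset((a, b)), 0)

    remaining = list(files)
    # Start with the file that has the most mutual cloning overall.
    start = max(remaining, key=lambda f: sum(mutual(f, g) for g in remaining if g != f))
    remaining.remove(start)
    ordering = [start]
    while remaining:
        last = ordering[-1]
        nearest = max(remaining, key=lambda f: mutual(last, f))
        remaining.remove(nearest)
        ordering.append(nearest)
    return ordering


shared_lines = {
    frozenset(("A.java", "C.java")): 120,
    frozenset(("B.java", "D.java")): 80,
    frozenset(("A.java", "B.java")): 5,
}
ordering = order_by_mutual_cloning(
    ["A.java", "B.java", "C.java", "D.java"], shared_lines
)
```

The greedy choice runs in quadratic time and gives no optimality guarantee, but it places files with heavy mutual cloning (here `A.java`/`C.java` and `B.java`/`D.java`) next to each other.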

Figure 7.24: Clone visualizer with files ordered by mutual cloning

Clone Filtering. Apart from postprocessing, clones can be filtered during inspection, so that developers do not have to wait until detection has been re-executed. Clones can be filtered based on a set of files or clone groups (both inclusively and exclusively), based on their length, number of instances, gap positions or blacklists. Clone filters are managed on a stack that can be displayed and edited in a view.
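A filter stack of this kind can be sketched as follows. All names (`CloneGroup`, `FilterStack`, the filter factories) are hypothetical and do not mirror ConQAT's classes; the unit of clone length and the use of group ids for the blacklist are assumptions, and gap-position filters are left out.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class CloneGroup:
    group_id: int
    length: int              # clone length; the unit is an assumption
    files: Tuple[str, ...]   # one entry per clone instance


CloneFilter = Callable[[CloneGroup], bool]  # True means "keep this group"


def min_length(n: int) -> CloneFilter:
    return lambda g: g.length >= n


def min_instances(n: int) -> CloneFilter:
    return lambda g: len(g.files) >= n


def touches_files(files: Set[str], include: bool = True) -> CloneFilter:
    """Inclusive: keep groups with an instance in `files`. Exclusive: drop them."""
    return lambda g: any(f in files for f in g.files) == include


def not_blacklisted(blacklist: Set[int]) -> CloneFilter:
    return lambda g: g.group_id not in blacklist


class FilterStack:
    """A clone group is shown only if every filter on the stack keeps it."""

    def __init__(self) -> None:
        self._filters: List[CloneFilter] = []

    def push(self, clone_filter: CloneFilter) -> None:
        self._filters.append(clone_filter)

    def pop(self) -> CloneFilter:
        return self._filters.pop()

    def apply(self, groups: Iterable[CloneGroup]) -> List[CloneGroup]:
        return [g for g in groups if all(f(g) for f in self._filters)]


groups = [
    CloneGroup(1, 12, ("a.java", "b.java")),
    CloneGroup(2, 5, ("a.java", "a.java", "c.java")),
    CloneGroup(3, 30, ("c.java", "d.java")),
]
stack = FilterStack()
stack.push(min_length(10))
stack.push(touches_files({"a.java"}))
visible = stack.apply(groups)  # only group 1 passes both filters
```

Because filtering operates on the stored detection result, popping a filter restores the previous view without re-running detection.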

Clone Indication. The goal of clone indication is to provide developers with cloning information while they maintain software that contains cloning, in order to reduce the rate of unintentionally inconsistent modifications. It is integrated into the IDE in which developers work to reduce the effort required to access cloning information. We have implemented clone indication for both Eclipse (www.eclipse.org) and Microsoft Visual Studio.NET [72].

After clone detection has been performed, ConQAT displays so-called clone region markers in the editors associated with the corresponding artifacts, as depicted in Figure 7.25.

Figure 7.25: Clone region marker indicates code cloning in editors.

Clone region markers indicate clones in the source code. A single bar indicates that exactly one clone instance can be found on this line; two bars indicate that two or more clone instances can be found. The bars are also color coded orange or red: orange bars indicate that all clones of the clone group are in this file; red bars indicate that at least one clone instance is in a different file. A right click on the clone region markers opens a context menu as shown in Figure 7.25. It allows developers to navigate to the siblings of the clone or open them in a clone inspection view.
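The marker rules can be expressed compactly. The sketch below is illustrative only: `CloneInstance` and `marker_for_line` are hypothetical names, and the handling of lines covered by several clone groups (red as soon as any of them has an instance in another file) is an assumption, since the text does not define that case.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class CloneInstance:
    group_id: int
    file: str
    start_line: int
    end_line: int  # inclusive


def marker_for_line(
    file: str, line: int, instances: Iterable[CloneInstance]
) -> Optional[Tuple[int, str]]:
    """Return (number of bars, color) for a line, or None if it is not cloned."""
    instances = list(instances)
    covering = [
        c for c in instances
        if c.file == file and c.start_line <= line <= c.end_line
    ]
    if not covering:
        return None
    bars = 1 if len(covering) == 1 else 2
    groups = {c.group_id for c in covering}
    # Assumption: red if any covering group has an instance in another file.
    crosses_files = any(
        c.file != file for c in instances if c.group_id in groups
    )
    return bars, "red" if crosses_files else "orange"


instances = [
    CloneInstance(1, "A.java", 10, 20), CloneInstance(1, "A.java", 40, 50),
    CloneInstance(2, "A.java", 15, 30), CloneInstance(2, "B.java", 5, 20),
]
```

With this data, line 12 of `A.java` gets one orange bar (group 1 lies entirely in `A.java`), while line 17 gets two red bars because it is covered by two instances and group 2 also occurs in `B.java`.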

Figure 7.26: Clone indication in VS.NET.

Figure 7.26 depicts a screenshot of clone indication in Visual Studio.NET.

Tailoring Support. For each iteration of the tailoring procedure, clone detection tailoring (cf. Section 8.2) requires computation of precision and comparison of clone reports before and after tailoring. ConQAT provides tool support to make this feasible.

The order of the list of clone groups can be randomized. The first n clone groups then correspond to a random sample of size n. Each clone group can be rated as Accepted or Rejected. Both the list order and the rating are persisted when the clone report is stored. ConQAT can compute precision on the sample of rated clone groups.
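The sampling and rating procedure can be sketched as follows. The function names and the representation of ratings as a dictionary are hypothetical; precision is taken here as the share of rated clone groups that were accepted, which is the usual definition and assumed to match the one used by ConQAT.

```python
import random
from typing import Dict, List, Optional, Sequence

ACCEPTED = "Accepted"
REJECTED = "Rejected"


def randomize_order(group_ids: Sequence[int], seed: Optional[int] = None) -> List[int]:
    """Return the clone group ids in random order; the first n form a random sample."""
    shuffled = list(group_ids)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def precision(ratings: Dict[int, str]) -> float:
    """Fraction of rated clone groups that were accepted."""
    rated = [r for r in ratings.values() if r in (ACCEPTED, REJECTED)]
    if not rated:
        raise ValueError("no rated clone groups")
    return rated.count(ACCEPTED) / len(rated)


order = randomize_order(range(100), seed=42)
sample = order[:4]  # random sample of size n = 4
# Hypothetical ratings assigned by a developer during inspection.
ratings = dict(zip(sample, [ACCEPTED, ACCEPTED, REJECTED, ACCEPTED]))
sample_precision = precision(ratings)  # 3 of 4 accepted
```

Fixing the seed stands in for persisting the list order: the same sample can be revisited after the report has been stored and reloaded.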

To compare clone reports before and after tailoring, they can be subtracted from each other, revealing which clones have been removed or added through a tailoring step. Two different subtraction modes can be applied:

Fingerprint-based subtraction compares clone reports using their location-independent clone fingerprints. It can be applied when tailoring is expected to leave the positions and normalized content of detected clones intact, e.g., when the filters employed during post-processing are modified.

Clone-region-based subtraction compares clone reports based on the code regions covered by clones. It can be applied when tailoring does not leave positions or normalized content intact, e.g., when the normalization is changed or shapers are introduced that clip clones. The clone report produced
