In this section, we describe a DAD prototype tool that implements the conceptual DAD framework (see Figure 5.1). We first present the main features of this tool in Section 5.3.1. We then describe its data input in Section 5.3.2, its data processing steps in Section 5.3.3, and its outputs in Section 5.3.4. Note that this prototype tool was built on an extended Relation Algebra in order to implement the main features of the DAD approach; we describe this Relation Algebra in Appendix A. Furthermore, Appendix B demonstrates the application of the prototype tool to a commercial legacy system and the Eclipse Platform.
5.3.1 Main Features
This prototype tool implements the three goals of the DAD approach (see Section 5.2): (i) identification of degeneration-critical components and fix relationships in a given system, (ii) evaluation of the persistence of components and fix relationships in relation to architectural degeneration, and (iii) evaluation of the architectural degeneration of the system across development phases and releases. These three core features are built on three basic features: (1) identification of MCDs in a defect-fix dataset, (2) identification of fix relationships among components, and (3) measurement of components and fix relationships with MCD quantity and complexity metrics (see Section 5.2.2).
In addition to the above features, three complementary visualization features are implemented in the DAD prototype tool:
(a) Defect architecture visualization. A defect architecture is visualized as a box-and-arrow graph (Gorton, 2006, p. 117). It can highlight the degeneration-critical components and fix relationships in the system.
(b) Visualization for persistence of components and fix relationships. A persistence view is visualized with a bar or line chart. It aids in understanding the obstinate problems in the system that lead to architectural degeneration.
(c) Architectural degeneration trend visualization. An architectural degeneration trend is visualized with a bar or line chart. It aids in evaluating the system’s architectural degeneration over time.
We note that an ordinary Relation Algebra (such as Tarski’s algebra of binary relations (Tarski, 1941) or Codd’s algebra of n-ary relations (Codd, 1972)) has been used by some researchers to facilitate the visualization (Berghammer and Fronk, 2003), transformation (Krikhaar et al., 1999), abstraction (Holt, 1999), aggregation (Holt, 1999), and analysis (Feijs et al., 1998) of architectures. We thus implemented the above features of the DAD prototype tool based on an extended Relation Algebra (specifically the work by Feijs and Krikhaar (1998) and Holt (1999)). The full algebra is described in Appendix A.
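To give a rough sense of how a Relation Algebra can support architecture analysis, the following minimal sketch models binary relations as sets of pairs and shows composition, converse, and a lifting step that raises a file-level relation to component level through a part-of relation. The operator definitions are the common textbook forms; the function names (`compose`, `converse`, `lift`) and the sample files and components are illustrative assumptions and do not reproduce the extended algebra of Appendix A.

```python
# Binary relations modelled as sets of (source, target) pairs.
# All names and sample data are hypothetical illustrations.

def compose(r, s):
    """Return pairs (a, c) such that (a, b) is in r and (b, c) is in s."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}


def converse(r):
    """Return the relation with every pair reversed."""
    return {(b, a) for (a, b) in r}


def lift(file_rel, part_of):
    """Raise a file-level relation to component level.

    part_of holds (file, component) pairs.
    """
    return compose(compose(converse(part_of), file_rel), part_of)


# Hypothetical system structure: which file belongs to which component.
part_of = {("a.c", "UI"), ("b.c", "UI"), ("c.c", "Core")}

# Hypothetical file-level relation: files changed together for one fix.
co_changed = {("a.c", "c.c"), ("a.c", "b.c")}

component_rel = lift(co_changed, part_of)
print(sorted(component_rel))
```

In this example the pair between `a.c` and `c.c` becomes a relation between the components `UI` and `Core`, while the pair between `a.c` and `b.c` collapses into a self-relation on `UI`.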
5.3.2 Data Input for the Tool
The main dataset that the prototype tool investigates is the defect-fix history (defect records and change logs) of a given software system. The tool also processes system structure information, which indicates which file belongs to which component and the size of each file (in KSLOC). The required data attributes are described in Table 5.1, which is self-explanatory, so we do not describe it further.
Table 5.1: Key attributes of the data input.

Group                    Attribute    Description
Defect                   ID           The unique identity of the defect report
                         Release      The release where the defect was discovered
                         Phase        The phase where the defect was discovered
                         Component    The component where the defect was discovered
                         Submit date  The date to submit the defect to the system
                         State        The last state of the defect
Change (Fix)             ID           The unique identity of the change log
                         Release      The release where the change was made
                         Phase        The phase where the change was made
                         File         The code file where the change was made
                         Defect ID    The identity of the defect fixed by the change
System Structure (part)  Component    The component’s name
                         File         The file’s name
                         Size         The file’s size
For a typical system, the system structure information is usually available. Such a system usually contains defects, and fixing a defect requires changes (fixes) to the code base. The discovered defects are usually recorded in a defect-tracking database (collecting historical defect records, e.g., Bugzilla³) and the changes are logged in a version control system (collecting historical changes made in a code base, e.g., Concurrent Versions System or CVS⁴). Therefore, most of the key attributes of the defect records and change logs (shown in Table 5.1) can be gathered from defect-tracking databases and version control systems. We thus claim that the data required by this prototype tool is widely available.
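As a rough illustration of how the attributes in Table 5.1 might be held in memory, the sketch below declares one record type per attribute group. The class names, field names, field types, and sample values are all assumptions made for illustration; the source does not specify how the prototype tool stores its input.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DefectRecord:
    """One row of the defect-tracking data (hypothetical layout)."""
    id: str
    release: str
    phase: str
    component: str
    submit_date: date
    state: str


@dataclass(frozen=True)
class ChangeLog:
    """One logged change (fix) from version control (hypothetical layout)."""
    id: str
    release: str
    phase: str
    file: str
    defect_id: str  # refers to DefectRecord.id


@dataclass(frozen=True)
class FileInfo:
    """System structure: file-to-component membership and file size."""
    component: str
    file: str
    size_ksloc: float


# Hypothetical sample records.
defect = DefectRecord("D1", "R1", "Test", "Core", date(2010, 11, 1), "Closed")
change = ChangeLog("C1", "R1", "Test", "core/io.c", "D1")
info = FileInfo("Core", "core/io.c", 1.2)
```

The only cross-reference among the three groups is the change log's defect identity, which links a change to the defect it fixes, and the file name, which links a change to a component through the system structure.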
5.3.3 Data Processing by the Tool
Figure 5.2 illustrates the data processing by the DAD prototype tool. In particular, it specifies the steps used to implement the main features (see Section 5.3.1). We briefly describe these steps below.
• Map change logs to defect records. Each defect is associated with a set of changes (in a code base) by matching the “Defect ID” field of change logs to the “ID” field of defect records (see Table 5.1).
• Locate defects in components. Each defect is located in component(s) in which at least one code file is changed in order to fix this defect.
• Identify MCDs; see Section 5.2.1.
• Identify fix relationships; see Section 5.2.1.
• Measure components and fix relationships with the MCD quantity and complexity metrics; see Section 5.2.2.
• Set up criteria and identify degeneration-critical components and fix relationships; see Section 5.2.3.
• Create and visualize a persistence view for components and fix relationships by gathering the measures for components or fix relationships over time, which are then visualized as a bar or line chart.
³ See Bugzilla’s web site: http://www.bugzilla.org/ (last accessed in November 2010).
⁴ See a CVS web site: http://www.nongnu.org/cvs/ (last accessed in November 2010).
[Figure 5.2 (flowchart). Data input: defect records; change logs; system structure. Activities shown: map change logs to defect records; locate defects in components; identify MCDs; identify fix relationships; measure components and fix relationships with M1, M2, M3 and M4; set up criterion and identify degeneration-critical components and fix relationships; create and visualize defect architecture; create and visualize persistence view of components and fix relationships; create and visualize architectural degeneration trend. Legend: data input, activity, data flow.]
Figure 5.2: Data processing by the DAD prototype tool.
• Create and visualize the architectural degeneration trend by gathering the average measures of components or fix relationships over time, which are then visualized as a bar or line chart.
• Create and visualize defect architecture; see Section 5.2.6.
Overall, the processing steps above implement the conceptual DAD framework (see Figure 5.1) in the prototype tool.
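The first four processing steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the tool's implementation: it assumes that an MCD is a defect whose fix changes files in more than one component and that a fix relationship is an unordered pair of components changed to fix the same defect (the actual definitions are given in Section 5.2.1 and are not reproduced here). The sample data and the dictionary keys are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sample data; keys loosely follow Table 5.1.
defects = [{"id": "D1"}, {"id": "D2"}, {"id": "D3"}]
changes = [
    {"id": "C1", "defect_id": "D1", "file": "ui/menu.c"},
    {"id": "C2", "defect_id": "D1", "file": "core/io.c"},
    {"id": "C3", "defect_id": "D2", "file": "core/io.c"},
    {"id": "C4", "defect_id": "D2", "file": "core/net.c"},
]
file_component = {"ui/menu.c": "UI", "core/io.c": "Core", "core/net.c": "Core"}

# Step 1: map change logs to defect records by matching "Defect ID" to "ID".
known_ids = {d["id"] for d in defects}
changes_by_defect = defaultdict(list)
for ch in changes:
    if ch["defect_id"] in known_ids:
        changes_by_defect[ch["defect_id"]].append(ch)

# Step 2: locate each defect in the components whose files were changed.
components_by_defect = {
    defect_id: {file_component[ch["file"]] for ch in chs}
    for defect_id, chs in changes_by_defect.items()
}

# Step 3 (assumed definition): an MCD spans more than one component.
mcds = {d for d, comps in components_by_defect.items() if len(comps) > 1}

# Step 4 (assumed definition): count, per unordered component pair,
# the MCDs whose fixes changed both components.
fix_relationships = defaultdict(int)
for defect_id in mcds:
    for pair in combinations(sorted(components_by_defect[defect_id]), 2):
        fix_relationships[pair] += 1

print(mcds, dict(fix_relationships))
```

Here defect D1 touches both `UI` and `Core` and is therefore treated as an MCD, D2 stays within `Core`, and D3 has no logged change and is never located in a component.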
5.3.4 Output of the Tool
Following the above description of the data processing steps in the DAD prototype tool, we describe the outputs of using this tool on a given system:
• Characteristics of defects (including MCDs). The tool creates charts to demonstrate the quantity and complexity characteristics of defects (including MCDs) in a given system. See examples in Section B.2.
• Measures of components and fix relationships. The tool creates bar and line charts to show MCD quantity and complexity measures for components and fix relationships of the system. See examples in Section B.2.
• Defect architectures. The tool creates box-and-arrow graphs to visualize defect architectures for the system. See examples in Section B.3.
• Persistence view for components or fix relationships. The tool creates bar and line charts to visualize the measures of components or fix relationships over phases and releases. See examples in Section B.2.1.
• Architectural degeneration trend. The tool creates bar and line charts to visualize the architectural degeneration trend over phases and releases for a given system. See examples in Section B.2.2.
• Profile of the system, for example, the size (in SLOC) of a component and the number of defects and changes that occurred in a component. See examples in Section B.1.
These outputs are created by following the data processing steps shown in Figure 5.2. Later in Appendix B, we demonstrate several typical outputs of this prototype tool used on a commercial legacy system.
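The data behind a persistence view and a degeneration trend can be sketched as two simple aggregations. The example below is a hedged illustration only: the measure used (a per-component MCD count per release) stands in for the metrics of Section 5.2.2, and the release names, component names, and values are hypothetical.

```python
from collections import defaultdict

# Hypothetical measures: (release, component) -> MCD count.
measures = {
    ("R1", "UI"): 4, ("R1", "Core"): 10,
    ("R2", "UI"): 6, ("R2", "Core"): 12,
}
releases = ["R1", "R2"]

# Persistence view: one series per component, ordered by release.
persistence = defaultdict(list)
for rel in releases:
    for (r, comp), value in measures.items():
        if r == rel:
            persistence[comp].append(value)

# Degeneration trend: the average measure over components, per release.
trend = []
for rel in releases:
    values = [v for (r, _), v in measures.items() if r == rel]
    trend.append(sum(values) / len(values))

print(dict(persistence), trend)
```

Each list in `persistence` and the `trend` list could then be passed to any charting library to draw the bar or line charts described above.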