3.5 Factors in Software Practice Adoption
4.2.7 Software Metrics and Analysis
The goal of this research is to examine the internal quality of software. This research takes the approach of Reeves [72] in recognizing the software itself as the design. To measure internal quality, software from all of the academic and industry experiments was collected, and product metrics were generated and analyzed statistically. Fenton’s [29] taxonomy of software metrics identifies three classes: Process, Product, and Resource. Table 4.2 expands on these three classes [55].
This section describes the metrics collected, the tools used to generate them, and the analysis process. Desirable attributes of high-quality software are examined first, followed by a discussion of static code-based metrics and dynamic test-coverage metrics.
Desirable Attributes of Quality Software
Desirable attributes of high quality software were identified as:
• Understandability
– low complexity, high cohesion, simplicity
• Maintainability
– low complexity, high cohesion, low coupling
• Reusability
– low complexity, high cohesion, low coupling, inheritance
• Testability
Process
• Maturity Metrics
• Management Metrics
• Life Cycle Metrics
Product
• Size Metrics
• Architecture Metrics
• Structure Metrics
• Quality Metrics
• Complexity Metrics
Resource
• Personnel Metrics
• Software Metrics
• Hardware Metrics
Table 4.2: Fenton’s Taxonomy of Software Metrics
Note that complexity, coupling, and cohesion are cross-cutting measures that affect many of the desirable attributes. Although it is difficult to relegate these measures to a single attribute, a broad look at many metrics can indicate high or low internal quality. Table 4.3 and Table 4.4 identify many of the objective metrics considered for these categories.
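Several of the coupling and reusability metrics considered here (instability, abstractness, and distance from the main sequence) are simple ratios of counts. The following minimal sketch uses Martin's commonly cited definitions; it is an assumption that the tools used in this research compute them in exactly this way, and the function names are illustrative only.

```python
def instability(afferent: int, efferent: int) -> float:
    """Instability I = Ce / (Ca + Ce).

    afferent (Ca, fan-in): number of outside classes that depend on the package.
    efferent (Ce, fan-out): number of outside classes the package depends on.
    0.0 means maximally stable, 1.0 means maximally unstable.
    """
    total = afferent + efferent
    # Assumption: a package with no couplings at all is reported as stable.
    return efferent / total if total else 0.0


def abstractness(abstract_types: int, total_types: int) -> float:
    """Abstractness A = abstract classes and interfaces / all types in the package."""
    return abstract_types / total_types if total_types else 0.0


def distance_from_main_sequence(a: float, i: float) -> float:
    """Normalized distance D = |A + I - 1| from the line A + I = 1."""
    return abs(a + i - 1.0)
```

For a hypothetical package with Ca = 3, Ce = 1, and 2 abstract types out of 8, this gives I = 0.25, A = 0.25, and D = 0.5.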
Static Code Analysis
An extensive search produced twelve static code analysis metrics tools, which were then acquired and evaluated for the purposes of this research. The search focused on tools that generate metrics from C++ and Java code. These static analysis tools come from a variety of sources: some are free and open-source, and others are commercial products. For the latter, fully functional trial versions were acquired. Together the tools produce a total of 116 unique traditional and object-oriented metrics. Appendix E identifies all 116 metrics and gives a very brief description of each. The appendix also identifies which languages are supported by each tool and which tools generate each metric. In some cases, multiple tools generate the same metric, sometimes under different names; this is noted in the table. Appendix D provides brief definitions and some examples for many of the metrics collected.
Attribute | Metrics
Complexity
• McCabe’s Cyclomatic Complexity
• Halstead Complexity
• LOC/Method
• Weighted Methods per Class (WMC)
• Number of Parameters
• Depth of Inheritance Tree
• #Children (bigger can be bad)
• Specialization Index
• #Overridden Methods
• Nested Block Depth
• Cyclic Dependencies
• Limited Size Principle
• Response for Class
Coupling
• Coupling between Objects
• Fan-in, Fan-out (Afferent/Efferent Coupling)
• Information Flow
• Instability
• #Interfaces
• Cyclic Dependencies
• Direct Cyclic Dependencies
• Dependency Inversion Principle
• Encapsulation Principle
Cohesion
• Lack of Cohesion of Methods
• Weighted Methods per Class
• LOC/Method

Attribute | Metrics
Size
• LOC (source and test)
• #Modules
• #Classes
• #Methods
• #Interfaces
• Weighted Methods per Class
• LOC/Module
• LOC/Method
• LOC/Class
• #Attributes
• #Static Attributes
• #Packages
Reusability
• Depth of Inheritance Tree
• #Children (bigger is good)
• Fan-in
• Specialization Index
• Distance from Main
• Abstractness
• Instability
• #Overridden Methods
• #Interfaces
Testability
• #Asserts
• #Tests
• Line Coverage
• Branch Coverage
• Method Coverage
• Total Coverage
• Response for Class
• Depth of Inheritance Tree
• #Children
• #Overridden Methods

An automated metrics collection framework and some custom metrics collection scripts were developed by the author. The framework was an Ant-based script coupled with several Java programs that invoked the metrics tools, parsed their XML output files, and consolidated the desired metrics into comma-separated spreadsheet files by experiment. Additionally, some Java programs were written to count assert testing statements in the CS1 and CS2 C++ programs. Unfortunately, not all metrics collection could be automated. Some metrics tools, such as JStyle and RefactorIT, required invocation through a graphical user interface. Collecting metrics with these tools was manually intensive, involving many steps for each of the Java projects evaluated. Also, not all XML parsing was automated. Although this again involved many manual steps of extracting XML data through an editor or parsing it in a spreadsheet with many custom spreadsheet formulas, it was deemed faster than writing code for all of the metrics desired.
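The consolidation step described above (parse each tool's XML output, keep the desired metrics, write one comma-separated row per project) can be sketched as follows. This is an illustrative Python sketch, not the author's Ant/Java framework. The XML layout (a `metrics` root with a `project` attribute and `metric` children carrying `name` and `value`) is a hypothetical schema chosen for the example; each real tool emits its own format.

```python
import csv
import io
import xml.etree.ElementTree as ET


def parse_metrics(xml_text):
    """Return (project name, {metric name: value}) from one report.

    Assumes the hypothetical schema:
    <metrics project="..."><metric name="..." value="..."/></metrics>
    """
    root = ET.fromstring(xml_text)
    values = {m.get("name"): m.get("value") for m in root.iter("metric")}
    return root.get("project"), values


def consolidate(reports, wanted):
    """Build CSV text with one row per report and one column per wanted metric."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["project", *wanted])
    for xml_text in reports:
        project, values = parse_metrics(xml_text)
        # Metrics a tool did not report are left as empty cells.
        writer.writerow([project, *(values.get(name, "") for name in wanted)])
    return out.getvalue()
```

Leaving missing metrics blank, rather than failing, reflects the situation described above in which no single tool generates every metric.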
Project metrics were produced from CCCC [57] (C++ and Java), custom scripts to count asserts (C++), Eclipse Metrics [75] (Java), JStyle [78] (Java), and Krakatau Professional Metrics [77] (C++). Class, file, and method metrics were primarily produced using JStyle (Java), Eclipse Metrics (Java), and Krakatau Professional Metrics (C++).
Dynamic Test Coverage Analysis
All software produced was expected to have associated automated unit tests. Code from the CS1 and CS2 experiments contained assert() statements embedded in the source code but isolated in a global run_tests() function. Code from all other experiments utilized the JUnit framework, so the test code was separate from the source/production code.
Code coverage tools were employed to determine line, branch, and overall test coverage. Cobertura [23] and Clover [21] were used to generate test coverage metrics for all Java projects. Generally, all tests should pass. In the rare instances where a project contained tests that did not pass, the failing tests were omitted in order to generate test coverage metrics. Although defect density turned out to be a difficult measure to capture, a detailed analysis did examine this measure in the undergraduate software engineering projects.
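To make the line and branch coverage measures concrete, the sketch below computes both from per-line execution data. It is a simplified illustration under stated assumptions, not the algorithm used by Cobertura or Clover: the `LineRecord` structure is hypothetical, and reporting 0.0 when there is nothing to cover is an arbitrary convention chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class LineRecord:
    hits: int                  # times the line ran under the test suite
    branches_total: int = 0    # branch outcomes at this line (0 if none)
    branches_covered: int = 0  # branch outcomes exercised by the tests


def line_coverage(lines):
    """Fraction of executable lines run at least once."""
    if not lines:
        return 0.0  # assumption: nothing to cover is reported as 0.0
    return sum(1 for rec in lines if rec.hits > 0) / len(lines)


def branch_coverage(lines):
    """Fraction of branch outcomes exercised at least once."""
    total = sum(rec.branches_total for rec in lines)
    if total == 0:
        return 0.0  # assumption: code without branches is reported as 0.0
    return sum(rec.branches_covered for rec in lines) / total
```

For four executable lines with hit counts 3, 0, 1, and 2, where the third line is a two-way branch with one outcome exercised, line coverage is 0.75 and branch coverage is 0.5.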
Code coverage tools also exist for C++ [82], but test coverage metrics were not produced for the CS1 and CS2 projects. Two factors weighed in on the decision not to collect CS1/CS2 test coverage metrics. First, a not-insignificant percentage of these projects failed to compile and execute correctly. Second, even when automated tests were written, they were generally commented out before final submission for grading purposes. As a result, it was unreasonable to manually examine every CS1/CS2 project to determine which tests were working. Instead, a script was written to count the number of asserts written in each CS1/CS2 project. Although this is a very suspect metric, it gives an indication of testing effort and, when combined with the graded score on each project, provides a reasonable measure of testing.
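An assert count of this kind can be approximated with a simple textual scan. The sketch below is a Python illustration rather than the script actually used. It assumes that every textual `assert(` call is counted, including calls inside commented-out code, since the tests were often commented out before submission. The pattern and function name are illustrative.

```python
import re

# Matches a call to assert(...), allowing whitespace before the parenthesis.
# The word boundary keeps identifiers such as static_assert from matching.
ASSERT_CALL = re.compile(r"\bassert\s*\(")


def count_asserts(source: str) -> int:
    """Count textual assert( occurrences in C++ source, comments included."""
    return len(ASSERT_CALL.findall(source))


sample = """
void run_tests() {
    assert(add(1, 2) == 3);
    // assert(add(0, 0) == 0);
    static_assert(sizeof(int) >= 2, "int too small");
    assert (sub(3, 1) == 2);
}
"""
print(count_asserts(sample))  # prints 3
```

A purely textual count cannot tell whether an assert compiles, runs, or passes, which is why it is useful only alongside the graded score.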