2.5 Metrics for component-based development
2.5.3 Insufficient validation
The validation of Software Engineering proposals published in the literature is often insufficient. This holds in general, and it also holds for software metrics proposals in particular. We can identify two main threads of research concerning the validation of software metrics: experimental and theoretical validation.
Metrics experimental validation
Regarding the experimental validation of metrics proposals, the discussion in chapter 3 is fully applicable to the research area commonly referred to as “software metrics”. More often than not, metrics proposals are insufficiently validated from an experimental point of view. One of the main obstacles to the experimental validation of software metrics is the need for automatic tool support for metrics collection. This is a barrier not only for the metrics proponents, but also for peers who want to replicate metrics collection. The latter face an extra difficulty: the informal definitions used in most metrics proposals, as we will discuss later.
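As an illustration of what automatic collection with an executable, unambiguous metric definition might look like, the following is a minimal sketch assuming Python source code as input. The function `collect_metrics` is hypothetical, and so is the choice of which syntax nodes count as statements and as decision points. These are exactly the kinds of choices an informal metric definition tends to leave open.

```python
import ast

# Node types counted as decision points in this sketch. This choice is an
# assumption; published cyclomatic-complexity tools differ on what to count.
DECISION_NODES = (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)


def collect_metrics(source: str) -> dict:
    """Hypothetical collector: statement count and a McCabe-style
    cyclomatic number (decision points + 1) for Python source code."""
    tree = ast.parse(source)
    statements = 0
    decisions = 0
    for node in ast.walk(tree):
        if isinstance(node, ast.stmt):
            statements += 1
        if isinstance(node, DECISION_NODES):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` short-circuits at each extra operand
            decisions += len(node.values) - 1
    return {"statements": statements, "cyclomatic": decisions + 1}


example = """
def f(x):
    if x > 0 and x < 10:
        return 1
    for i in range(x):
        x -= i
    return x
"""

print(collect_metrics(example))  # {'statements': 6, 'cyclomatic': 4}
```

In this sketch the `def` line itself counts as a statement, and each extra operand of `and`/`or` counts as a decision. A different replicator could reasonably choose otherwise, which is why an executable definition eases replication.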
Metrics theoretical validation
Theoretical validation is also often missing from metrics proposals, perhaps because there is no generally accepted framework for it. Validation against Weyuker’s properties [Weyuker 88] is the most widely used approach.
Weyuker proposed a set of properties for the assessment of software complexity metrics [Weyuker 88]. Her approach is based on the definition of properties that complexity metrics should exhibit. Consider P, Q, and R as programs. Let |P|, |Q|, and |R| be their respective complexities, as measured by the metric under validation. Let |P; Q| be the complexity of the program obtained by composing P with Q. Weyuker’s properties, which we now present both in natural language and formally, are as follows⁸:
1. A metric that exhibits the same value for all programs is useless, as it provides no information on any of them. In other words, at least some different programs are expected to exhibit different values for the same complexity metric.
$\exists P, \exists Q : P \neq Q \wedge |P| \neq |Q|$
2. For any complexity value c, there is only a finite number n of programs with complexity c. To facilitate the formalization of this property (a formalization Weyuker did not provide), let c be a non-negative number, S the set of programs with complexity c, and n the cardinality of S.
$\forall c \in \mathbb{R}_0^+ \; \forall P : |P| = c \Rightarrow P \in S, \quad \exists n \in \mathbb{N}_0 : \#S = n$
3. Different programs P and Q may have the same complexity.
$\exists P, \exists Q : P \neq Q \wedge |P| = |Q|$
4. Different programs that are functionally equivalent (in other words, that perform the same task, as perceived from the outside) may have different complexities.
$\exists P, \exists Q : P \equiv Q \wedge |P| \neq |Q|$
5. Monotonicity is a fundamental property of all complexity measures. It follows that a program in isolation is at most as complex as its composition with another program.
$\forall P, \forall Q : |P| \leq |P;Q| \wedge |Q| \leq |P;Q|$
6. Appending the same program R to two different programs P and Q of the same complexity does not necessarily yield programs of equal complexity. Likewise, prepending R to P and to Q does not necessarily yield programs of equal complexity.
$\exists P, \exists Q, \exists R : P \neq Q \wedge |P| = |Q| \wedge |P;R| \neq |Q;R|$
$\exists P, \exists Q, \exists R : P \neq Q \wedge |P| = |Q| \wedge |R;P| \neq |R;Q|$
⁸ Note that this formalization is an adaptation of the property definitions presented by Weyuker in [Weyuker 88]. While some of her properties were presented using mathematical expressions, others were defined, either partly or completely, in natural language.
7. A program’s complexity should be responsive to the order of its statements, and hence to their potential interaction. Let P be a program and Q another program formed by permuting the order of the statements in P. We denote this permutation operation by Perm().
$\exists P, \exists Q : Q = \mathit{Perm}(P) \wedge |P| \neq |Q|$
8. If a program is a renaming of another program, then their complexities should be the same. Assume that the operation Rename() transforms program P into its renamed version Q.
$\forall P, \forall Q : Q = \mathit{Rename}(P) \Rightarrow |P| = |Q|$
9. The complexity of the composition of two programs P and Q may be greater than the sum of the complexities of programs P and Q. The extra complexity may result from the interaction between programs P and Q.
$\exists P, \exists Q : |P| + |Q| < |P;Q|$
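The properties can be exercised mechanically on a candidate metric. The sketch below is our own illustration, not Weyuker’s procedure. It assumes a hypothetical toy model in which a program is a tuple of statement strings, composition P;Q is concatenation, Rename() is naive token substitution, and the metric is statement count. Because only a small finite sample is enumerated, an existential property is confirmed when a witness is found, whereas a universal property (5, 8) can only be refuted, never proven. Property 2 (finiteness) and property 4 (functional equivalence) cannot be checked this way and are omitted.

```python
from itertools import permutations, product

# Hypothetical toy model (our assumption, not Weyuker's):
# a program is a tuple of statement strings, P;Q is concatenation,
# and the metric under validation is statement count.


def compose(p: tuple, q: tuple) -> tuple:
    """Sequential composition P;Q."""
    return p + q


def statement_count(p: tuple) -> int:
    return len(p)


def rename(p: tuple, mapping: dict) -> tuple:
    """Naive Rename(): substitute whole whitespace-separated tokens."""
    return tuple(" ".join(mapping.get(tok, tok) for tok in s.split()) for s in p)


# Small finite sample: existential properties can be confirmed by a witness,
# universal ones only refuted by a counterexample.
SAMPLE = [
    ("x = 1",),
    ("y = 2",),
    ("x = 1", "y = x"),
    ("y = x", "x = 1"),
    ("x = 1", "y = x", "print y"),
]


def check(m) -> dict:
    pairs = list(product(SAMPLE, repeat=2))
    triples = list(product(SAMPLE, repeat=3))
    return {
        "1": any(p != q and m(p) != m(q) for p, q in pairs),
        "3": any(p != q and m(p) == m(q) for p, q in pairs),
        "5": all(m(p) <= m(compose(p, q)) and m(q) <= m(compose(p, q))
                 for p, q in pairs),
        "6a": any(p != q and m(p) == m(q) and m(compose(p, r)) != m(compose(q, r))
                  for p, q, r in triples),
        "6b": any(p != q and m(p) == m(q) and m(compose(r, p)) != m(compose(r, q))
                  for p, q, r in triples),
        "7": any(m(perm) != m(p) for p in SAMPLE for perm in permutations(p)),
        "8": all(m(rename(p, {"x": "a", "y": "b"})) == m(p) for p in SAMPLE),
        "9": any(m(p) + m(q) < m(compose(p, q)) for p, q in pairs),
    }


print(check(statement_count))
# {'1': True, '3': True, '5': True, '6a': False, '6b': False, '7': False, '8': True, '9': False}
```

For statement count, the sample yields witnesses for properties 1 and 3 and no counterexamples to 5 and 8. It yields no witnesses for 6, 7, and 9, because |P;Q| = |P| + |Q| leaves no room for interaction between statements. The absence of a witness in a finite sample is only suggestive.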
Weyuker illustrated the applicability of her properties with a set of well-known structural complexity metrics (statement count, cyclomatic number, effort measure, and data flow complexity) and observed that none of them exhibited all the properties. Since their publication, Weyuker’s properties have been used to support the theoretical validation of several metrics proposals (e.g. [Chidamber 94]).
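The following worked derivation shows how one such metric, the cyclomatic number, can fail some of the properties. Write v(P) for |P|. We introduce two assumptions here that are not Weyuker’s. First, programs are structured, single-entry/single-exit, with only binary decisions, so that v(P) = d(P) + 1, where d(P) is the number of decision points. Second, P;Q denotes sequential composition, so that d(P;Q) = d(P) + d(Q).

```latex
% Assumptions (ours): v(P) = d(P) + 1 and d(P;Q) = d(P) + d(Q)
v(P;Q) = d(P;Q) + 1 = d(P) + d(Q) + 1 = v(P) + v(Q) - 1

% Property 5 holds, because v(P) \geq 1 and v(Q) \geq 1:
v(P) \leq v(P;Q) \;\wedge\; v(Q) \leq v(P;Q)

% Property 6 fails in both forms: if v(P) = v(Q), then for every R
v(P;R) = v(P) + v(R) - 1 = v(Q;R), \qquad v(R;P) = v(R) + v(P) - 1 = v(R;Q)

% Property 9 fails for every P and Q:
v(P;Q) = v(P) + v(Q) - 1 < v(P) + v(Q)

% Property 2 fails: d(P) = 0 for every straight-line program P,
% so infinitely many programs have v(P) = 1
```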
Before discussing these properties, we must stress that Weyuker did not label her properties as mandatory. She called them “desirable properties of complexity metrics”, and even provided an example where it would make sense for a particular metric not to satisfy one of these usually desirable properties: a metric that uses identifier mnemonics as an input to compute its value does not exhibit property 8, but can nevertheless be acceptable.
However, these properties are frequently referred to as “Weyuker’s axioms”, and this has been a source of long-standing controversy, from their publication to this day. Cherniavsky and Smith, who referred to the properties as axioms, recognized that the properties were proposed as “desirable” rather than as “axioms”, but claimed that satisfying all 9 properties was a necessary, but insufficient, condition for a “good” complexity measure [Cherniavsky 91]. Fenton also characterized Weyuker’s approach as axiomatic, and identified serious flaws in it, due to the use of several incompatible views of complexity [Fenton 94]. Kitchenham and Pfleeger joined Fenton in a critical review of Weyuker’s properties [Kitchenham 95]. They take complexity to mean structural rather than psychological complexity, and challenge properties 2, 5, 6, 7, 8, and 9:
• Property 2 concerns the finite number of programs for which a metric has the same value. By analogy to the Euclidean distance between two points, where an infinite number of pairs of points can have the same distance, this property is deemed unnecessary.
• Properties 5, 6, and 9 imply a numeric scale type. In practice, they are therefore too restrictive, as they exclude other scale types, in particular nominal scales.
• Property 7 was criticized for contradicting standard measurement practice, in the sense that each unit of an attribute contributing to a valid measure is equivalent. Therefore, although a reordered program would not necessarily be correct, or have the same psychological complexity as the original one, the reordering should not affect the structural complexity of that program.
• Property 8 is considered unnecessary, given the structural complexity assumption. We would add that the psychological complexity assumed by Weyuker in property 7 is dismissed in property 8, because the impact of a renaming on psychological complexity is difficult to quantify.
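The Euclidean-distance analogy used against property 2 can be made concrete with a one-line derivation. This is our own illustration, not taken from [Kitchenham 95]. Fix one point at the origin. Every point on the circle of radius d then forms a pair at distance d.

```latex
\forall d > 0,\ \forall \theta \in [0, 2\pi):\quad
\left\| (0,0) - (d\cos\theta,\ d\sin\theta) \right\|
  = \sqrt{d^2\cos^2\theta + d^2\sin^2\theta} = d
```

Infinitely many distinct pairs therefore share the distance d, yet distance is a perfectly valid measure. By analogy, a complexity metric need not map only finitely many programs to each value.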
Another line of criticism of these properties concerns the applicability of property 9 to object-oriented systems, particularly for metrics that somehow take into account the mechanism of inheritance. Gursaran and Roy noted that none of the inheritance metrics in two of the most influential metrics sets for Object-Oriented design [Chidamber 94, Abreu 94a] exhibited property 9, and argued that this property should be rejected as inapplicable to this kind of metric [Gursaran 01]. The formal proof presented by Gursaran and Roy to support their claim was then challenged by Zhang and Xie, who presented a counter-example contradicting the proof, but agreed that this property should be ignored for this kind of metric [Zhang 02]. The controversy goes on, as Sharma et al. presented two new metrics that aim at capturing complexity in the presence of inheritance, and argued that one of Chidamber and Kemerer’s metrics (LCOM - Lack of COhesion in Methods) does satisfy property 9, after all [Sharma 06].
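The following minimal sketch makes the property 9 debate around LCOM more concrete. It relies on two assumptions. First, it uses the commonly cited Chidamber and Kemerer definition of LCOM: the number of method pairs sharing no attribute minus the number of pairs sharing at least one, or zero if that difference is negative. Second, composing two classes is taken to mean merging their methods into a single class. The class model and the names `lcom` and `merge` are hypothetical. This is not the argument made in [Sharma 06]. It only shows that, under these assumptions, LCOM can exhibit |P| + |Q| < |P;Q|.

```python
from itertools import combinations

# Hypothetical class model: method name -> set of instance attributes it uses.


def lcom(cls: dict) -> int:
    """LCOM as |P| - |Q| if positive, else 0, where P counts method pairs
    sharing no attribute and Q counts pairs sharing at least one."""
    disjoint = shared = 0
    for a, b in combinations(cls.values(), 2):
        if a & b:
            shared += 1
        else:
            disjoint += 1
    return max(disjoint - shared, 0)


def merge(c1: dict, c2: dict) -> dict:
    """Assumed composition: one class holding the methods of both
    (method names are assumed not to clash)."""
    return {**c1, **c2}


A = {"m1": {"a1"}, "m2": {"a2"}}
B = {"m3": {"b1"}, "m4": {"b2"}}

print(lcom(A), lcom(B), lcom(merge(A, B)))  # 1 1 6
```

Each class on its own has a single non-cohesive method pair. After merging, all six method pairs share no attribute, so the composed value exceeds the sum of the parts.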
The lack of a widely accepted “theoretical validation” framework for metrics, and the controversy raised by the best-known set of properties typically used in that validation, motivate our choice of “experimental validation” as our criterion when analyzing metrics proposals in the remainder of this dissertation.