4.4 The FukaBeans case study

4.4.6 Interpretation

Evaluation of results and implications

It may seem surprising that most of the components in a library developed by the metrics proponents fail to meet the thresholds included in the metrics set proposal. The thresholds used in this case study were derived from a sample of a population frame which consisted of COTS components. The sample used here, in contrast, includes components developed for educational purposes, with a typically low complexity (their implementation has fewer than 10 methods). The apparent lack of reusability of most of the components, according to the heuristics, clashes with the expectation that components developed for educational purposes would be well designed, to foster the use of component development best practices by students.

Threats to validity

Perhaps the major threat to the validity of this case study concerns the characteristics of the sample used, and their implications for the kind of statistical analysis that it would make sense to perform. With only 12 components to be analyzed, it is dangerous to extrapolate definitive conclusions from this case study.

Another issue concerns the single group, single observation design of this replication. All subjects are assumed to be highly reusable, which makes it impossible to test whether or not components with a low reusability would also meet the quality thresholds of Washizaki et al.’s quality model.

Inferences

The identified threats to the validity of the case study do not imply that the effort of conducting exploratory case studies such as this one is fruitless. It is beyond the scope of this dissertation to conduct a comprehensive effort for the validation of this quality model and metrics set. With all the care that must be taken when drawing inferences from an exploratory case study, the one described in this chapter does point to a potential problem in this metrics set and its underlying quality model, concerning the thresholds proposed by Washizaki et al.

When experimentally validating statistical models, one should cover, not necessarily in a single experiment, both the internal and the external consistency of those models.

Internal consistency relates to the mathematical correctness of the statements in the model being validated. A set of inputs is collected from the system represented by the model, along with relevant information on the assumptions made about the system elements. The model allows computing a set of outputs representing the predicted behavior of the system being modeled. In an internally consistent model, the outputs are valid if the inputs are also valid.

A model exhibits external consistency if information collected from it is not contradicted by other information observed in practice. This relates to the applicability of the model, as it focuses on the extent to which the assumptions made in the model apply beyond the scope from which the model was derived.

In our exploratory case study, we noted that the number of methods defined in each of the components is fairly low. It is common to develop toy examples to illustrate techniques, in an academic context. The analyzed metrics are defined as ratios. The small number of elements in their computation may be regarded as a fragility of this heuristics-based quality model. Indeed, the standard deviation of the values for some of the metrics (most noticeably RCO) was very high. In contrast, the heuristics were computed with a larger sample of commercial components. Those COTS components were likely to have a higher complexity.
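The fragility of ratio metrics computed over small counts can be illustrated with a short Python sketch. The counts below are hypothetical, chosen only for illustration; they are not taken from the FukaBeans sample or from the COTS sample used by Washizaki et al.

```python
from fractions import Fraction


def ratio_shift(numerator: int, denominator: int) -> Fraction:
    """Change in a ratio metric when one more element enters the numerator."""
    return Fraction(numerator + 1, denominator) - Fraction(numerator, denominator)


# Hypothetical counts: a toy component with 4 elements and a larger one with 40.
small_shift = ratio_shift(2, 4)    # 3/4 - 2/4
large_shift = ratio_shift(20, 40)  # 21/40 - 20/40

print(float(small_shift))  # 0.25
print(float(large_shift))  # 0.025
```

Under these assumed counts, a single design decision moves the ratio ten times further in the small component than in the larger one, which is consistent with the high dispersion observed for some of the metrics.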

This observation suggests that if the FukaBeans are reusable, in spite of violating some of the reusability heuristics, this may indicate a lack of external consistency of the reusability model proposed by Washizaki et al. Of course, this observation requires further validation.

Lessons learned

We can organize the discussion of lessons learned in this case study around two main subjects: the formalization of the metrics set, and the limitations of scope of this metrics set which were identified while conducting this case study.

The original metrics definition is ambiguous with respect to inheritance. It is unclear how inherited features (methods and attributes) should be accounted for. Our formalization only uses the directly defined features. While for this particular sample of components this is not a problematic issue, it is possible to define hierarchies of object-oriented components where this option would have an influence on the metrics values. The shortcoming of the original definition is that we were left with the decision of whether or not we should include inherited methods. Different external validation efforts led by different research teams could have chosen to also use the indirectly available features. This would severely damage the comparability of results, particularly if their option was not made clear in the experiment report.
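The effect of this choice can be sketched in Python. The `Component` class, its fields, and the method names below are hypothetical stand-ins used only for illustration; they are not part of the original metrics definition or of our OCL formalization.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Component:
    """Hypothetical stand-in for a component class in a model."""
    name: str
    methods: Set[str] = field(default_factory=set)
    parent: Optional["Component"] = None

    def direct_methods(self) -> Set[str]:
        """Only the methods defined directly in this component."""
        return set(self.methods)

    def all_methods(self) -> Set[str]:
        """Directly defined methods plus those inherited from ancestors."""
        inherited = self.parent.all_methods() if self.parent else set()
        return inherited | self.methods


base = Component("BaseBean", {"getName", "setName", "getId"})
derived = Component("PricedBean", {"getPrice"}, parent=base)

print(len(derived.direct_methods()))  # 1
print(len(derived.all_methods()))     # 4
```

Two teams measuring the same hypothetical `PricedBean` would start from 1 or 4 methods depending on the convention adopted, so any ratio built on these counts could differ between replications.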

The original definitions of the metrics raise another, more subtle issue. When defining the ratio metrics, Washizaki et al. specified, in an arbitrary fashion, the metrics results for those components with no accessible properties (in RCO and RCC) or no business operations (in SCCp and SCCr). For instance, in the absence of accessible properties, RCC equals 1. In other words, the rate of component customizability is maximum: all of the 0 (zero) accessible properties are writable. Conversely, one could also say that none of the 0 accessible properties are writable. A similar argument can be made for each of the ratios, although the chosen value in each case varies. This issue could have been better dealt with by clearly specifying that the metrics RCO and RCC would only be applicable when the pre-condition A > 0 holds. Similarly, the metrics SCCp and SCCr would only be applicable when B > 0 holds. The improved definitions in OCL would be as presented in listing 4.13.

Listing 4.13: Defining pre-conditions for metrics definitions, in OCL.

context Component
    RCO(): Real = self.Pr() / self.A() pre: self.A() > 0
    RCC(): Real = self.Pw() / self.A() pre: self.A() > 0
    SCCp(): Real = self.Bv() / self.B() pre: self.B() > 0
    SCCr(): Real = self.Bp() / self.B() pre: self.B() > 0
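The intent of these pre-conditions can also be mimicked outside OCL. The following Python sketch is an illustration only: the function names and the choice of returning `None` for a non-applicable metric are assumptions, not part of the listing above or of Washizaki et al.'s proposal.

```python
from typing import Optional


def guarded_ratio(numerator: int, denominator: int) -> Optional[float]:
    """Return numerator/denominator, or None when the pre-condition fails."""
    if denominator <= 0:
        # Metric not applicable, rather than an arbitrary 0 or 1.
        return None
    return numerator / denominator


def rcc(writable: int, accessible: int) -> Optional[float]:
    """Sketch of RCC as Pw / A, applicable only when A > 0."""
    return guarded_ratio(writable, accessible)


print(rcc(3, 4))  # 0.75
print(rcc(0, 0))  # None
```

Reporting the metric as not applicable keeps components without accessible properties out of any aggregate statistics, instead of letting an arbitrarily chosen value influence them.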

Concerning the limitations of scope of Washizaki et al.’s metrics set, the metrics were designed to assess the reusability of fine-grained components (JavaBeans) through the analysis of their interface complexity. This somewhat limits the scope of model elements being analyzed. UML architectural components have a much richer expressiveness than the one used in these metrics, which leave out important model elements such as the provided and required interfaces, as well as the events the component may produce or consume.

Another possible concern relates to the complexity associated with parameter types in the evaluation of the complexity of method interfaces. The metrics just count the number of parameters, thus being blind to parameter type repetition and parameter type complexity. For instance, a method with N parameters of distinct types is intuitively more complex than another method with N parameters of the same type. Also, arguments of atomic types (e.g. Integer, Real or Boolean) are intuitively less complex than those of a composed type.
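A minimal Python sketch makes the difference concrete. The weighting scheme, the weight of 2 for composed types, and the `Customer` type are hypothetical assumptions introduced here for illustration; they do not belong to the metrics set under analysis.

```python
from typing import List

# Atomic types named in the text; any other type is treated as composed.
ATOMIC_TYPES = {"Integer", "Real", "Boolean"}


def parameter_count(param_types: List[str]) -> int:
    """What a plain parameter count sees: only the number of parameters."""
    return len(param_types)


def distinct_type_count(param_types: List[str]) -> int:
    """Number of different parameter types in the signature."""
    return len(set(param_types))


def weighted_complexity(param_types: List[str], composed_weight: int = 2) -> int:
    """Hypothetical weighting: atomic types cost 1, composed types cost more."""
    return sum(1 if t in ATOMIC_TYPES else composed_weight for t in param_types)


same_types = ["Integer", "Integer", "Integer"]
mixed_types = ["Integer", "Boolean", "Customer"]  # Customer: assumed composed type

print(parameter_count(same_types), parameter_count(mixed_types))          # 3 3
print(distinct_type_count(same_types), distinct_type_count(mixed_types))  # 1 3
print(weighted_complexity(same_types), weighted_complexity(mixed_types))  # 3 4
```

Both signatures are indistinguishable to a plain parameter count, while the two alternative measures separate them.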

4.4.7 Case study’s conclusions and further work

The cross validation of software metrics and quality models is an essential step toward their promotion and subsequent adoption by practitioners. As we have seen in chapter 2, the current state of the art concerning metrics for CBD clearly shows a lack of validation of metrics proposals, not only by their authors, but also, and more importantly, by their peers.

Although it can be argued that most of the proposals are fairly recent, our experience in the area of experimental software engineering leads us to think that there are a few shortcomings hampering the independent collection and cross-validation of most software metrics: either the ambiguity in their definition, when an informal metrics definition technique is used, or the usage of formal definitions using a formalism that is not easy to grasp by practitioners. Furthermore, the lack of adequate tool support for metrics collection is also a common problem.

ODM has helped us overcome these problems in an elegant and sound way. Using OCL upon the UML 2.0 metamodel, we obtain metrics that are formally defined and can be directly used to support metrics collection, as long as a UML tool with OCL support is used.

Our concern in using standard notations and technologies with a wide adoption by practitioners aims at bridging the traditional gap between research and industry. ODM can be fully integrated in the normal software development process. We regard this as an enabling condition for its widespread adoption.

By facilitating the independent replication of metrics validation efforts, we are providing an essential support for the adoption of the experimental approach advocated in chapter 3. This replication is essential so that independent teams can conduct their own validation efforts, each mitigating its own set of threats to validity. Hopefully, the independent validation efforts will cover, as a whole, the most important threats. In contrast, a validation performed by a single team, even if with replications, is more prone to repeat validity threats that may result from that team’s own biases.

With respect to the results of this independent validation effort, we identified a potential model calibration problem. Our results suggest that the model is not accurate for very fine-grained components. Further differentiated replications should be performed to confirm or refute this observation. In particular, it would be interesting to contrast highly reusable components with components that are of a lower quality, so that the heuristics-based quality model proposed by Washizaki et al. can be fine-tuned, if necessary.

4.5 Related work