Other Research Paths Explored - Modelling software quality : a multidimensional approach

CHAPTER 8: CONCLUSION

8.3 Other Research Paths Explored

During the five years we spent completing this dissertation, we explored other aspects of quality modelling that we do not detail here for lack of space and for the sake of consistency. This section briefly summarizes our findings and warns others who wish to pursue these paths of the types of problems they can expect.

Software design as an enabler. From the beginning of our research, we have believed that good software design does not necessarily imply good quality. Rather, it is an enabler: a well-structured, well-designed system should support the activities performed by a development team. Following this idea, we need two things: 1) a way to know how developers plan to modify the system, and 2) higher-order patterns (like design patterns) that indicate whether a structure adequately supports certain changes. We did some preliminary work on this subject [VS07, VS08], but decided to move on to other research dimensions because it was almost impossible to build an adequate data set describing the activities performed by developers.

The influence of changes on fault-proneness. The first two years of our work were spent building change models [VSV08]. We built a corpus comprising more than 10 systems, each in multiple versions. This required a lot of grunt work, as we had to download these versions either from source-code repositories or from software archive sites. In many cases, the code would not compile, or would not run after compilation. We had to recover the set of dependencies required by every version of each system; some dependencies were no longer available online. We also had a problem determining the origin of different software entities [GZ05]: in successive versions of a system, we had to identify whether “new” software entities were simply old ones that had been renamed. We built tools to assist with this classification, but ultimately we needed to validate the results manually.
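The renaming problem described above can be illustrated with a minimal sketch. It is not the tooling used in this work: it assumes each entity is summarised by the set of its member signatures, and it pairs entities that disappeared from one version with entities that appeared in the next using Jaccard similarity. The function names, the 0.6 threshold and the toy entity names are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (defined as 1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def rename_candidates(old, new, threshold=0.6):
    """Suggest (old_name, new_name, similarity) triples for manual review.

    `old` and `new` map an entity name to the set of its member signatures.
    Only entities that vanished from `old` or appeared in `new` are compared.
    The threshold is an arbitrary illustrative value, not a recommended one.
    """
    removed = sorted(set(old) - set(new))
    added = sorted(set(new) - set(old))
    candidates = []
    for new_name in added:
        # Pick the removed entity whose members overlap most with the new one.
        best = max(removed, key=lambda n: jaccard(old[n], new[new_name]), default=None)
        if best is None:
            continue
        score = jaccard(old[best], new[new_name])
        if score >= threshold:
            candidates.append((best, new_name, score))
    # Strongest candidates first, so a reviewer sees the likeliest renames early.
    return sorted(candidates, key=lambda c: -c[2])


# Toy data with hypothetical entity names.
v1 = {
    "Parser": {"parse()", "reset()"},
    "OldLogger": {"info()", "warn()", "error()"},
}
v2 = {
    "Parser": {"parse()", "reset()"},
    "EventLogger": {"info()", "warn()", "error()", "debug()"},
    "Cache": {"get()"},
}

suggestions = rename_candidates(v1, v2)
for old_name, new_name, score in suggestions:
    print(f"{old_name} -> {new_name} (similarity {score:.2f})")
```

Such a heuristic only proposes candidates; as noted above, the resulting classification still has to be validated manually.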

The effort required to build such a corpus was not justified from a research perspective, as the impact of changes on quality indicators was either minor or not statistically significant. Consequently, the majority of the corpus was never exploited. We must however admit that by reading the code of these systems and by building the tools to perform code analyses, we developed a clear understanding of the problems we were trying to model and of the complexity of maintaining software. In our opinion, the key problem with our approach is that we studied good open-source projects. Open-source projects tend to be abandoned or forked when maintenance costs become too high. Consequently, the data we collected corresponded mostly to programs with good structures and few maintenance problems, and maintenance problems were precisely the phenomenon we wanted to study.

Our recommendation to future researchers wishing to perform these sorts of analyses is to take the time to find projects with quality problems. These will likely be either 1) very large, as there needs to be substantial motivation to keep a project alive when it is hard to maintain, or 2) closed-source, where clients pay for this difficult maintenance. In either case, researchers should expect to spend a lot of time building, understanding and analysing their corpus.

Recovering clean bug data from version control systems. We were interested in finding objective quality indicators, and the standard indicator of “bad quality” is the presence of bugs. In two open-source systems we analysed, we tried to manually locate bugs from version control system logs and annotate classes with this data. We were confronted with two problems. First, the notion of a bug depends on who looks at the code. A bug for a user often did not correspond to a bug for the developer. In the systems analysed, these bugs often pertained to parts of a specification that were not yet implemented. In open-source projects, developers will generally deliver an “unfinished” product as soon as the important parts are ready, in order to get feedback from the community. In this context, what a user might consider a bug fix is in fact a new feature for the developer. In our efforts to build a clean corpus, we found that it was almost impossible, as an outsider, to differentiate an improvement to the code from a bug fix without an explicit indication of the intention of the developers (e.g., explicit mention of bug fixes in the versioning system).
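To make the dependence on explicit developer intent concrete, the following is a minimal sketch of a log-message classifier. It is an illustration under stated assumptions, not the procedure we followed (our annotation was manual): the keyword pattern, the function names and the sample messages are hypothetical. A message is labelled a bug fix only when it says so explicitly; everything else is left as "unknown" rather than guessed.

```python
import re
from collections import Counter

# Hypothetical keyword pattern; real projects follow different conventions.
BUG_FIX_PATTERN = re.compile(r"\b(fix(?:es|ed)?|bugs?|defects?)\b", re.IGNORECASE)


def label_commit(message):
    """Return "bug-fix" only when the log message states it explicitly."""
    return "bug-fix" if BUG_FIX_PATTERN.search(message) else "unknown"


def summarise(messages):
    """Count how many log messages fall under each label."""
    return Counter(label_commit(m) for m in messages)


# Made-up log messages, for illustration only.
log = [
    "Fixed bug in colour constant for dark grey",
    "Added support for logarithmic axes",
    "Prefix handling for labels",
]

summary = summarise(log)
print(summary)
```

The "unknown" label reflects the difficulty described above: without an explicit statement from the developer, an outsider cannot reliably tell an improvement from a bug fix.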

Second, the number of obvious bugs identified can be very low. In one of the systems we inspected (JFreeChart), we looked at all the CVS logs (and relevant code) for several years of development. We found that only a few (obvious) bugs had been released over that period of active development. Therefore, the noise-to-signal ratio was very high. The errors we found were also trivial to correct. For example, we found badly encoded RGB colour schemes: a developer had incorrectly entered a constant value, which caused only minor issues for users.

Moral. Obtaining large corpora of clean, representative software upon which to train and test models is of primary importance to any prospective researcher in this area. Researchers must necessarily spend an inordinate amount of time gathering, understanding and analysing data; the quality of the data will ultimately determine their research perspectives. If they ignore the type of data they have and try to go forward anyway, their results will be built on shaky foundations.

Our recommendation is to reuse existing data sources, collaborate with other researchers to build and maintain clean corpora, or work with a development team willing to invest its time in providing insight into what they did to a system and why they did it. The latter option is the best way to gather high-level metrics. It also requires a significant investment from researchers, who should build tools and integrate them within developers’ tool sets. If metric gathering is difficult or expensive for a development team, it will not be done properly.
