Study 3 Measures of Diffusion and Evolution of Testware in OS

Table 10.3 Study 3: Answers to the Research Questions

RQ3.1: Adoption and size - What is the level of adoption of a set of automated testing tools among open-source Android projects?

Each of the considered GUI testing tools reached an individual diffusion lower than 4.11%, and together they reached a combined adoption of about 8%, across the considered set of 15 thousand Android repositories hosted on GitHub. The projects tested with the considered tools are typically rather short-lived, with an average of 15 releases, and on average they feature very few test classes, with around 10% of the total production code devoted to testing.

RQ3.2: Evolution - How much are the GUI test classes associated with the analyzed set of tools modified across consecutive releases of an open-source Android project?

On average, 5% of the test code is modified between consecutive tagged releases of the Android repositories hosted on GitHub that feature tests associated with the six selected testing tools. Changes to classes identified as tests developed with the studied testing tools account for 4.54% of the whole maintenance effort on production code. On average, one release in five required maintenance effort on test classes, and one test class in five had to be modified at least once during the lifespan of a project. On each new release, an average of 15.43% of test classes (3.83% of test methods) feature modifications.
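The class-level evolution measures in the answer to RQ3.2 can be illustrated with a minimal Python sketch. It assumes a simplified, hypothetical data model in which each tagged release maps a test class name to a fingerprint of its source (e.g., a hash), and it counts a class as modified when its fingerprint changes between two consecutive releases. The denominators (test classes in the newer release, and all test classes ever observed) are assumptions, not necessarily the exact definitions used in the study, and line-level churn is not covered here.

```python
from typing import Dict, List, Set

# Hypothetical data model: test class name -> fingerprint of its source code.
Release = Dict[str, str]


def modified_classes(prev: Release, cur: Release) -> Set[str]:
    """Test classes present in both releases whose fingerprint changed."""
    return {name for name in cur if name in prev and prev[name] != cur[name]}


def evolution_metrics(releases: List[Release]) -> Dict[str, float]:
    """Compute test-evolution ratios over an ordered list of tagged releases."""
    transitions = list(zip(releases, releases[1:]))
    if not transitions:
        raise ValueError("at least two releases are needed")
    per_release_ratios: List[float] = []
    releases_with_changes = 0
    ever_modified: Set[str] = set()
    for prev, cur in transitions:
        changed = modified_classes(prev, cur)
        ever_modified |= changed
        releases_with_changes += bool(changed)
        if cur:  # assumption: ratio relative to test classes in the newer release
            per_release_ratios.append(len(changed) / len(cur))
    all_classes = set().union(*releases)  # every test class ever observed
    return {
        "avg_modified_class_ratio":
            sum(per_release_ratios) / len(per_release_ratios) if per_release_ratios else 0.0,
        "releases_with_test_changes": releases_with_changes / len(transitions),
        "classes_modified_at_least_once":
            len(ever_modified) / len(all_classes) if all_classes else 0.0,
    }


# Four hypothetical releases of one project.
history = [
    {"LoginTest": "a1", "MainTest": "b1"},
    {"LoginTest": "a1", "MainTest": "b2"},                        # MainTest changed
    {"LoginTest": "a1", "MainTest": "b2", "SettingsTest": "c1"},  # only an addition
    {"LoginTest": "a2", "MainTest": "b2", "SettingsTest": "c1"},  # LoginTest changed
]
print(evolution_metrics(history))
```

In this toy history, two of the three release transitions touch test classes, and two of the three test classes are modified at least once; adding a new class is not counted as a modification.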

10.3 Study 3 - Measures of Diffusion and Evolution of Testware in OS Projects

In Study 3 of this research, a data mining experiment was conducted on open-source Android apps hosted on GitHub to quantify the adoption of popular automated GUI testing tools for Android, and to measure the average size of the developed automated test suites and the amount of maintenance they require. The findings of the experiment are reported in Table 10.3.
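Individual and combined adoption are not additive: a repository that uses several of the considered tools counts once towards the combined figure, so the combined adoption is at most the sum of the individual values. The minimal Python sketch below shows the two computations. It assumes that tool detection has already been performed upstream (the input mapping is hypothetical), and the tool names and repository data are purely illustrative.

```python
from typing import Dict, Set, Tuple


def adoption(repo_tools: Dict[str, Set[str]],
             tools: Set[str]) -> Tuple[Dict[str, float], float]:
    """Return per-tool adoption and combined adoption over all repositories.

    repo_tools maps a repository name to the set of considered tools detected
    in it (an empty set if none); how tools are detected is out of scope here.
    """
    total = len(repo_tools)
    if total == 0:
        raise ValueError("no repositories to analyze")
    individual = {
        tool: sum(tool in used for used in repo_tools.values()) / total
        for tool in sorted(tools)
    }
    # A repository counts once for the combined value, however many tools it uses.
    combined = sum(bool(used & tools) for used in repo_tools.values()) / total
    return individual, combined


# Ten hypothetical repositories, one of which uses two tools.
repos = {f"repo{i}": set() for i in range(10)}
repos["repo0"] = {"Espresso"}
repos["repo1"] = {"Espresso", "Robotium"}
repos["repo2"] = {"Appium"}

individual, combined = adoption(repos, {"Appium", "Espresso", "Robotium"})
print(individual)  # {'Appium': 0.1, 'Espresso': 0.2, 'Robotium': 0.1}
print(combined)    # 0.3, lower than 0.1 + 0.2 + 0.1 because repo1 counts once
```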

The metrics about adoption and size reflect what other studies in the literature have found for mobile applications. Linares-Vasquez et al. [68] conducted a survey on automated mobile app testing and identified the tools whose diffusion we studied as the ones most used by developers, together with UI Automation (for testing iOS apps), Ranorex, Calabash, Quantum, and Qmetry. These additional tools were not considered in this study because they were either closed-source or based on languages other than Java (and hence their test code is not comparable with the production code of the mined apps).

Kochhar et al. [57], in addition to interviewing open-source developers, performed a quantitative analysis of 600 open-source Android apps mined from F-Droid. The authors found that 14% of the mined apps contained test cases (with 9% of the apps having executable test cases) with a coverage of 40%. The most widespread testing tools cited by the authors were JUnit, Monkeyrunner, Robotium, and Robolectric. These results are consistent with the findings reported in this thesis (8% of applications having test cases written with the considered testing frameworks) once two differences are taken into account: JUnit and Monkeyrunner were not among the studied frameworks (since they are not GUI-level testing frameworks), and the Espresso framework had only just been released when the considered study was published.

Cruz et al. [35] mined another set of applications from F-Droid to measure the amount of test code they contain and the correlation between the presence of test code and quality indicators (e.g., ratings and downloads on the Play Store, activity and popularity of the related GitHub repositories). They found that the presence of test code correlates with the number of contributors and the number of commits of a GitHub repository, but not with the ratings on the Play Store (and hence not with the quality of the app as perceived by its users). Regarding the adoption of testing tools, they found that Appium, Espresso, and Robotium were the most used GUI testing tools, which is compatible with the findings of the present study.

Compared with all the studies cited in this subsection, the novelty of the study documented in this thesis lies in mining projects directly from GitHub, instead of mining Android apps from F-Droid and then pairing them with the corresponding GitHub projects. In this way, Android projects that are hosted only on GitHub, and not on F-Droid, are also taken into consideration. Thanks to this mining procedure, the described metrics have been computed on the largest set of open-source Android application packages documented in the literature.

The software testing literature typically estimates that Verification and Validation account for between 20% and 50% of the total effort of a software project [37]. The metrics measured in this work show that, on average, test code written with the considered testing tools amounts to 10% of the total production code of the analyzed open-source apps, and that 5% of the total maintenance effort on production code is localized in test code. These values are lower than the effort values reported in the literature; a possible explanation is that manual testing activities coexist with a rather limited amount of automated, scripted test code in open-source Android projects.

The evolution metrics gathered in this study can be compared with evidence from papers about the maintenance of test code produced with other testing techniques. On average, 15% of the test classes written with the considered tools were modified at each new release, and 20% of the test classes had to be modified at least once during the lifespan of the project. These results can be compared to those measured for automated test cases of web applications:

In document Evolution and Fragility of Mobile Automated Test Suites (Page 174-176)