Evolution measures - Evolution and Fragility of Mobile Automated Test Suites


7.2.2 Evolution measures

The measures gathered for the evolution of test suites, obtained by comparing test classes in subsequent releases throughout the entire lifespans of the considered projects, are shown in table 7.5. The columns show, respectively, the averages for Test LOCs Ratio measured on all the releases instead of on the master release only (TLR), Modified Test LOCs Ratio (MTLR), Modified Relative Test LOCs (MRTL), Modified Releases Ratio (MRR), Test Suite Volatility (TSV), Modified Classes Ratio (MCR), Modified Methods Ratio (MMR), and Modified Classes With Modified Methods Ratio (MCMMR). The last row in the table reports the average of the individual averages for each testing tool, weighted by the size of the respective sets.
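The weighted average reported in the last row of the table can be illustrated with a minimal sketch. The function name and the example numbers are hypothetical and are not taken from the study; the sketch only assumes that each per-tool average is weighted by the number of projects in the corresponding set.

```python
def weighted_average(averages, set_sizes):
    """Average of per-tool averages, weighted by the size of each set of projects."""
    if len(averages) != len(set_sizes):
        raise ValueError("one set size per average is required")
    total = sum(set_sizes)
    if total == 0:
        raise ValueError("at least one non-empty set is required")
    return sum(a * n for a, n in zip(averages, set_sizes)) / total


# Hypothetical per-tool averages (in %) and set sizes, for illustration only.
example_averages = [10.0, 20.0]
example_sizes = [30, 10]
print(weighted_average(example_averages, example_sizes))  # 12.5
```

With this weighting, a tool adopted by few projects influences the overall figure less than a widely adopted one.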

To ease the reading of this section, the acronyms used hereafter for the evolution metrics are explained in table 7.4.

The reported values for TLR show that, when present, the test code associated with the selected testing tools amounts on average to slightly less than 10% of the whole production code of the projects. Comparing the values in table 7.5 to those in table 7.3, it is evident that the averaged TLR value is smaller than the TLR measured on the master release. This result may be evidence that test suites are built up gradually, or that they are absent in the initial releases of Android projects. The measured average values ranged from 5.11% (for the set of projects featuring Robotium) to 11.23% (for the set of projects featuring Robolectric).

The average values for the MTLR metric show that in a typical tested project, about 5% of the testing code associated with the six considered GUI testing tools is modified between consecutive releases of the project. The values for MRTL show that on average, when the selected testing tools are used, 4.54% of the total modified production LOCs belong to test classes. This metric cannot discriminate the reasons behind the maintenance performed on test code, but it gives an indication of the amount of intervention needed between subsequent releases by a typical test class written with one of the selected testing tools. Robolectric showed the highest MRTL values: this result, paired with the higher TLR for the same tool, may suggest a higher complexity of test suites written with Robolectric, which are hence bigger and need more maintenance.

The MRR metric was used as an indication of how often the developers of the inspected open source projects had to modify the individual classes associated with the studied GUI testing tools. On average over all projects, about 19% of the releases of the projects contained modifications in classes associated with the selected testing tools, with a maximum of 20.39% for projects featuring Robolectric and a minimum of 10.68% for projects featuring UI Automator. The TSV metric measures the occurrence of modifications from the point of view of the set of test classes associated with a given testing tool. In this case too, the resulting average over all the sets was about 20% (with a maximum of 25.13% for Robotium and a minimum of 18.12% for Robolectric), implying that about one in every five test classes is modified during the lifespan of a project, while the other four are never modified after they have been added to the repository.

Table 7.6 Percentage of projects without modifications in test suites, classes and methods

Tool          Unmodified suites   Unmodified classes   Unmodified methods
Espresso      24.6%               57.0%                65.8%
UIAutomator   16.0%               40.0%                55.0%
Robotium      16.6%               44.1%                60.0%
Robolectric   15.8%               45.3%                53.3%
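One possible reading of the MRR and TSV metrics discussed above can be sketched as follows. The data model (one dictionary per release, mapping a test class name to a content fingerprint) and the choice of denominators are assumptions made for illustration; the exact definitions are those given in table 7.4.

```python
def modified_classes(prev, curr):
    """Names of test classes present in both releases whose content differs."""
    return {name for name in prev.keys() & curr.keys() if prev[name] != curr[name]}


def mrr_and_tsv(releases):
    """releases: list of dicts {class_name: content_fingerprint}, oldest first.

    Assumed definitions (hypothetical, for illustration):
    MRR = transitions with at least one modified test class / all transitions
    TSV = test classes modified at least once / test classes ever present
    """
    transitions = list(zip(releases, releases[1:]))
    if not transitions:
        raise ValueError("at least two releases are needed")
    ever_seen = set().union(*releases)  # union of the class names of all releases
    ever_modified = set()
    releases_with_changes = 0
    for prev, curr in transitions:
        changed = modified_classes(prev, curr)
        if changed:
            releases_with_changes += 1
        ever_modified |= changed
    mrr = releases_with_changes / len(transitions)
    tsv = len(ever_modified) / len(ever_seen) if ever_seen else 0.0
    return mrr, tsv


# Hypothetical history of three releases of a small test suite.
history = [
    {"LoginTest": "v1", "HomeTest": "v1"},
    {"LoginTest": "v2", "HomeTest": "v1"},
    {"LoginTest": "v2", "HomeTest": "v1", "SettingsTest": "v1"},
]
mrr, tsv = mrr_and_tsv(history)
print(f"MRR = {mrr:.2f}, TSV = {tsv:.2f}")
```

In this sketch, a class that is only added or removed does not count as modified, which matches the distinction the text draws between additions and modifications.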

The average MCR metric shows that 15.43% of test classes are modified between consecutive tagged releases in the set of Android repositories considered. Average values were rather similar for all six sets of repositories, with the maximum value of 17.40% measured for projects associated with Robotium. The average value for the MMR metric indicates that 3.83% of the test methods are modified between consecutive releases of the considered Android repositories. This percentage is, as expected, smaller than MCR, because individual test classes may contain multiple test methods, and the modification of a single method is enough for the class to count in the computation of MCR. For the MMR metric as well, all the individual values for the six sets were very close to the overall average value.

Not all modified test classes contained significant modifications: some contained changed lines of code only because of syntax corrections, changed comments or changed imports. The MCMMR metric was used to quantify how many of the modified classes contained modified methods, instead of having changes limited to irrelevant sections of code. The measures for this metric showed that in almost 60% of the cases of modified test classes, the modifications also affected test methods.
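The relationship between MCR, MMR and MCMMR for a single pair of consecutive releases can be sketched as below. The data model (each class holding its non-method code under "other" and a dictionary of method bodies) and the denominators are assumptions for illustration only; they are not the study's actual tooling.

```python
def ratio(numerator, denominator):
    """Safe division that returns 0.0 for an empty denominator."""
    return numerator / denominator if denominator else 0.0


def evolution_ratios(prev, curr):
    """prev/curr: {class_name: {"other": str, "methods": {method_name: body}}}.

    Assumed definitions (hypothetical):
    MCR   = modified classes / classes in the earlier release
    MMR   = modified methods / methods in the earlier release
    MCMMR = modified classes with a modified method / modified classes
    """
    modified_classes = 0
    classes_with_modified_methods = 0
    modified_methods = 0
    for name in prev.keys() & curr.keys():
        old, new = prev[name], curr[name]
        shared = old["methods"].keys() & new["methods"].keys()
        changed = sum(1 for m in shared if old["methods"][m] != new["methods"][m])
        modified_methods += changed
        if old != new:  # any difference, including imports or comments
            modified_classes += 1
            if changed > 0:
                classes_with_modified_methods += 1
    total_classes = len(prev)
    total_methods = sum(len(c["methods"]) for c in prev.values())
    return {
        "MCR": ratio(modified_classes, total_classes),
        "MMR": ratio(modified_methods, total_methods),
        "MCMMR": ratio(classes_with_modified_methods, modified_classes),
    }


# Hypothetical pair of releases: one class changes a method, the other only an import.
release_a = {
    "LoginTest": {"other": "import a;", "methods": {"testOk": "x", "testFail": "y"}},
    "HomeTest": {"other": "import a;", "methods": {"testOpen": "z"}},
}
release_b = {
    "LoginTest": {"other": "import a;", "methods": {"testOk": "x2", "testFail": "y"}},
    "HomeTest": {"other": "import b;", "methods": {"testOpen": "z"}},
}
print(evolution_ratios(release_a, release_b))
```

In the example, both classes count as modified, but only one of them has a modified method, so MMR stays below MCR and MCMMR is one half.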

It is worth highlighting that the average values for the evolution metrics over the sets of projects showed fairly low variability: more specifically, the average values for Modified Classes Ratio and Modified Methods Ratio were very similar for all the considered testing frameworks. Since all the tools are layout-based and produce test code in Java, these results may suggest that they share similarities in terms of syntax, and hence are influenced to a similar extent by typical changes applied to an Android project.

Lastly, it must be taken into account that the values measured for the MCR, MMR and MCMMR metrics are considerably lowered when test classes and methods are added at some point in the lifespan of a project and then remain unmodified during its evolution. Table 7.6 shows statistics about the projects that have unmodified test suites (meaning that test suites are entirely unaltered for the whole lifespan of the project); those that only have unmodified test classes (meaning that no modifications are made to any test class during the whole lifespan of the project, but additions or removals of test classes are possible); and those that have modified classes but no modifications in test methods (meaning that the changes inside the test classes are limited to irrelevant portions of their code).
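The three conditions described above for Table 7.6 can be expressed as a short sketch. The `ReleaseDiff` structure and its field names are hypothetical; they summarise, for each pair of consecutive releases, what changed in the test code of a project.

```python
from dataclasses import dataclass


@dataclass
class ReleaseDiff:
    """Test-code changes between two consecutive releases (hypothetical structure)."""
    classes_added_or_removed: int
    classes_modified: int
    methods_modified: int


def unmodified_flags(diffs):
    """Return which 'unmodified' conditions hold over the whole lifespan of a project."""
    return {
        # Suite entirely unaltered: no additions, removals or modifications.
        "suite": all(d.classes_added_or_removed == 0 and d.classes_modified == 0
                     for d in diffs),
        # No test class modified; additions and removals are allowed.
        "classes": all(d.classes_modified == 0 for d in diffs),
        # Classes may be modified, but never inside test methods.
        "methods": all(d.methods_modified == 0 for d in diffs),
    }


# Hypothetical project: a class is added once, and one class later changes only its imports.
project = [ReleaseDiff(1, 0, 0), ReleaseDiff(0, 1, 0)]
print(unmodified_flags(project))
```

In this sketch the flags are nested by construction: a project whose suite is unmodified also has unmodified classes and unmodified methods, provided the per-release counts are consistent.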

Answer to RQ3.2: On average, 5% of testing code is modified between consecutive tagged releases of Android repositories hosted on GitHub featuring tests associated with the six selected testing tools. 4.54% of the whole maintenance effort on production code consists of changes in classes that are identified as tests developed with the studied testing tools. On average, one in every five releases required maintenance effort on test classes, and one in every five classes had to be modified at least once during the lifespan of a project. On each new release, an average of 15.43% of test classes (3.83% of test methods) feature modifications.

Chapter 8 Study 4: Taxonomy of fragility causes

To understand the typical causes of fragility for Android projects, and to compute their frequencies of occurrence, the Grounded Theory technique has been applied over the full set of modified test methods gathered during the previous part of the study. This study made it possible to answer the fourth research question: RQ4 - Why and with which frequency fragilities occur in tested Android projects?

A preliminary presentation of the application of Grounded Theory for the creation of a taxonomy of modification reasons of Android GUI tests has been given at the NEXTA 2018 workshop [31].

8.1 Study Design

This section contains a brief description of the Grounded Theory approach for the creation of a taxonomy, and of the way it was applied to git diff files for the bottom-up construction of several distinct types of modifications triggering fragilities.

Previous definitions of categories of reasons for modifying test classes are available in the literature. For instance, Yusifoglu et al. [101] identified four types of maintenance activities that can be performed on test code:

• Perfective maintenance: modifications performed only to improve the quality of test code, e.g. refactoring;

• Corrective maintenance: modifications performed to fix bugs in test code;

• Adaptive maintenance: modifications performed to follow the evolution of the AUT;

• Preventive maintenance: modifications performed to remove smells or redundancies, and not after the actual detection of defects.

The element of novelty in the derivation of the taxonomy performed in this section of the study is the application of the Grounded Theory technique, and the derivation of a fine-grained set of categories for modification causes that are specific to Android development.
