Diff Files Analysis - Evolution and Fragility of Mobile Automated Test Suites


8.1.2 Diff Files Analysis

The aim of this part of the study was to answer the following two research subquestions, which characterize the fragility issue for Android open-source projects:

RQ4.1 Modification Causes: what are the main causes behind the need for maintaining GUI test code in Android open-source projects?

RQ4.2 Fragility: how fragile are test methods and classes to modifications in the application under test (AUT) or in its appearance?

To construct a taxonomy of modification causes from the bottom up, the Straussian definition of the Grounded Theory method was adopted, with the Research Questions defined upfront as a follow-up of the previous part of the study rather than emerging from the research.

The site of a Grounded Theory study, which in the original Straussian definition of the technique is an organization or a group, can be interpreted in Grounded Theory for Software Engineering as an artifact or a repository of artifacts. In this study, the chosen site is the repository of Android open-source projects mined in the previous step of the study.

Among the Data Collection strategies identified by Ralph in the guidelines for Grounded Theory in Software Engineering, the one selected for this study was Technical observation, defined as “accessing, creating or copying digital artifacts such as source code, unit tests, server logs or database entries”. In particular, the copied digital artifacts were the diff files computed for each test class of the mined repositories, for each transition between consecutive releases in which the class was present.
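As a rough illustration of the Technical observation step, the sketch below computes a unified diff of one test class for each pair of consecutive releases in which the class is present. It relies on Python's standard `difflib`; the helper name `release_diffs`, its input format (a chronological list of release tags and a dictionary from tag to class source) and the sample sources are hypothetical, and the scripts actually used in the study may work differently, for instance by diffing release tags directly in the repository.

```python
import difflib

def release_diffs(releases, sources):
    """Diff one test class across consecutive releases (hypothetical helper).

    releases: release tags in chronological order.
    sources: dict mapping a release tag to the class source; a missing tag
             means the class is not featured in that release.
    """
    diffs = {}
    for old_tag, new_tag in zip(releases, releases[1:]):
        old_src, new_src = sources.get(old_tag), sources.get(new_tag)
        if old_src is None or new_src is None:
            continue  # the class must exist in both releases of the transition
        diff = list(difflib.unified_diff(
            old_src.splitlines(), new_src.splitlines(),
            fromfile=old_tag, tofile=new_tag, lineterm=""))
        if diff:  # keep only transitions where the class actually changed
            diffs[(old_tag, new_tag)] = diff
    return diffs

# Hypothetical sources of an Espresso test class in three releases.
sources = {
    "v1.0": "@Test\npublic void login() {\n  onView(withId(R.id.user)).perform(click());\n}",
    "v1.1": "@Test\npublic void login() {\n  onView(withId(R.id.username)).perform(click());\n}",
    "v1.2": "@Test\npublic void login() {\n  onView(withId(R.id.username)).perform(click());\n}",
}
diffs = release_diffs(["v1.0", "v1.1", "v1.2"], sources)
for (old_tag, new_tag), diff in diffs.items():
    print(f"{old_tag} -> {new_tag}")
    print("\n".join(diff))
```

Only the v1.0 to v1.1 transition produces a diff here, since the class is unchanged between v1.1 and v1.2.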

Starting from the modified lines in test methods, the corresponding production classes and layout files were identified and examined to understand the underlying reason for each modification emerging from the diff files. The inspection therefore moved from the usage of widgets inside the test classes to the layout files where those widgets were defined, which were inspected for changes in the definition, properties, and arrangement of the widgets. Then, the activities in charge of inflating the identified layouts were also inspected, to understand whether the modifications in the layouts or in the test code were paired with changes in the production code. Conversely, when the modifications in test methods were not evidently linked to widgets of the user interface, the search was not propagated to layout files and production code, and the modification was flagged as pertaining to test code only.
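The propagation rule described above (follow widget references from the test code to the layout files that declare them, otherwise flag the change as test code only) can be approximated as follows. This is a minimal sketch assuming Espresso-style `R.id.<name>` references and layout XML declaring `android:id="@+id/<name>"`; all helper names and inputs are hypothetical, the study's inspection was a manual examination, and the further step of inspecting the activities that inflate each layout is not shown.

```python
import re

# Assumption: Espresso-style widget references such as withId(R.id.username).
WIDGET_REF = re.compile(r"R\.id\.(\w+)")

def changed_lines(diff_lines):
    """Added/removed lines of a unified diff, without the +/- prefix or file headers."""
    return [line[1:] for line in diff_lines
            if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))]

def widget_ids(diff_lines):
    """Widget ids referenced by the changed lines of a test-class diff."""
    ids = set()
    for line in changed_lines(diff_lines):
        ids.update(WIDGET_REF.findall(line))
    return ids

def layouts_defining(widget_id, layouts):
    """Layout files (hypothetical name -> XML text mapping) declaring the id."""
    pattern = re.compile(r'android:id="@\+?id/' + re.escape(widget_id) + '"')
    return sorted(name for name, xml in layouts.items() if pattern.search(xml))

def inspection_target(diff_lines, layouts):
    """Propagate to layout files only when the change touches GUI widgets."""
    ids = widget_ids(diff_lines)
    if not ids:
        return "test code only", {}
    return "inspect GUI", {w: layouts_defining(w, layouts) for w in sorted(ids)}

diff = ["--- v1.0", "+++ v1.1", "@@ -3,1 +3,1 @@",
        "-  onView(withId(R.id.user)).perform(click());",
        "+  onView(withId(R.id.username)).perform(click());"]
layouts = {"activity_login.xml": '<EditText android:id="@+id/username" />'}
print(inspection_target(diff, layouts))
# ('inspect GUI', {'user': [], 'username': ['activity_login.xml']})
print(inspection_target(["+  Thread.sleep(500);"], layouts))
# ('test code only', {})
```

An id that no current layout declares (here `user`) hints that the widget was renamed or removed, which is exactly the kind of change the manual inspection of layout files is meant to confirm.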

Following these inspections, the categories of the taxonomy were generated incrementally through Open Coding. Each modified test method was classified under one or more categories of the taxonomy, which were therefore not mutually exclusive (i.e., two or more different causes can concur in a single modification of a test method). The open coding procedure involved two iterations over the collected set of diff files. Axial Coding was then used to identify macro-categories of modifications in the taxonomy.
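Because the categories are not mutually exclusive, each modified test method can be represented as a set of open codes, and Axial Coding as a mapping from codes to macro-categories. The sketch below assumes this representation; every code and macro-category name in it is a hypothetical placeholder, not a category of the taxonomy presented in this chapter.

```python
# Hypothetical open codes grouped into hypothetical macro-categories
# (placeholders only, not the categories derived in this study).
AXIAL_GROUPS = {
    "GUI definition": {"widget id changed", "widget text changed"},
    "Test code": {"assertion refactored", "wait logic changed"},
}

def macro_categories(codes):
    """Return the macro-categories touched by one method's set of open codes."""
    return {group for group, members in AXIAL_GROUPS.items() if codes & members}

# One modified test method may receive several (non-exclusive) codes.
coded_methods = {
    "LoginTest.login": {"widget id changed", "assertion refactored"},
    "SearchTest.query": {"wait logic changed"},
}
for method, codes in coded_methods.items():
    print(method, sorted(macro_categories(codes)))
```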

The taxonomy building procedure was applied to four different sets of diff files, related to Android repositories featuring the Espresso, UI Automator, Robolectric and Robotium testing tools. Applying the taxonomy to a large number of diff files generated with four different tools also served as a conceptual evaluation of its transferability. The percentage of occurrence was gathered for each category of the taxonomy, in order to find the most common causes of maintenance in Android test code (thus answering RQ4.1).
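Since a modified test method can carry several causes, one natural way to compute the percentage of occurrence of a category is over the number of coded methods, in which case the percentages of all categories may sum to more than 100%. The sketch below assumes that counting rule and uses hypothetical category names and data.

```python
from collections import Counter

def occurrence_percentages(coded_methods):
    """Percentage of coded test methods tagged with each category.

    Categories are not mutually exclusive, so the values may sum to > 100%.
    """
    if not coded_methods:
        return {}
    counts = Counter(code for codes in coded_methods.values() for code in set(codes))
    total = len(coded_methods)
    return {code: 100 * n / total for code, n in counts.most_common()}

# Hypothetical coded methods (placeholder category names).
coded = {
    "m1": {"widget id changed", "assertion refactored"},
    "m2": {"widget id changed"},
    "m3": {"wait logic changed"},
    "m4": {"assertion refactored"},
}
pct = occurrence_percentages(coded)
for code, value in pct.items():
    print(f"{code}: {value:.1f}%")
```

Here two categories occur in 50% of the methods and one in 25%, for a total of 125%, which is expected when labels overlap.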

Finally, the modification causes were split into causes related to test code only and causes related to changes in the AUT or, more specifically, in its GUI appearance or definition; modifications due to the latter were deemed fragile according to our definition of fragile test cases in section 2.6.1. This way, an estimate of the fragility of the test suites written with the selected GUI automation frameworks was obtained (thus answering RQ4.2).
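Assuming that a modified test method counts as fragile when at least one of its causes involves the AUT or its GUI, and that a test class counts as fragile when at least one of its modified methods is fragile, the estimate described above could be sketched as follows. The split of causes, the `Class.method` naming scheme and the data are hypothetical; the authoritative definition of fragility is the one given in section 2.6.1.

```python
# Hypothetical split: causes involving the AUT or its GUI.
AUT_RELATED = {"widget id changed", "widget text changed", "layout rearranged"}

def is_fragile(codes):
    """A modified method is fragile if any of its causes is AUT/GUI-related."""
    return bool(set(codes) & AUT_RELATED)

def fragility_ratio(coded_methods):
    """Share of modified test methods whose modification is fragile."""
    if not coded_methods:
        return 0.0
    return sum(is_fragile(c) for c in coded_methods.values()) / len(coded_methods)

def fragile_classes(coded_methods):
    """A class is fragile if at least one of its modified methods is."""
    result = {}
    for method, codes in coded_methods.items():
        cls = method.split(".")[0]  # assumes "Class.method" keys
        result[cls] = result.get(cls, False) or is_fragile(codes)
    return result

coded = {
    "LoginTest.login": {"widget id changed", "assertion refactored"},
    "LoginTest.logout": {"assertion refactored"},
    "SearchTest.query": {"layout rearranged"},
    "SettingsTest.toggle": {"wait logic changed"},
}
print(fragility_ratio(coded))  # 0.5
print(fragile_classes(coded))  # {'LoginTest': True, 'SearchTest': True, 'SettingsTest': False}
```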


8.1.3 Threats to Validity

Threats to Internal validity

The analysis of diff files in existing Android projects was conducted at release granularity, considering all the tagged releases of projects hosted on GitHub. A commit granularity would also take into account smaller and/or temporary modifications; hence, the frequencies of maintenance causes might vary considerably at that granularity.

The scripts and tools used to inspect diff files and to identify modifications inside test methods assume that test classes contain no syntax errors; the correct extraction of modified methods, and thus of the diffs considered when inspecting maintenance causes, therefore cannot be guaranteed for every project.
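To see why this assumption matters, consider a naive way of mapping modified line numbers to their enclosing test methods by counting braces. The sketch below is hypothetical (the study's scripts are not shown here) and uses a deliberately simple regular expression for Java method headers; removing a single closing brace is enough to make one method swallow the next, so a change in the second method is attributed to the first.

```python
import re

# Deliberately naive matcher for Java method headers such as
# "public void login() {" (an assumption; a real tool would use a parser).
METHOD_HEADER = re.compile(
    r"^\s*(?:public|protected|private)?\s*(?:static\s+)?"
    r"[\w<>\[\]]+\s+(\w+)\s*\([^)]*\)\s*(?:throws\s+[\w.,\s]+)?\{")

def method_ranges(source):
    """Map each method name to its (first, last) line via brace counting.

    Assumes a syntactically valid class: unbalanced braces (or braces inside
    strings and comments) silently produce wrong ranges.
    """
    ranges, current, start, depth = {}, None, 0, 0
    for lineno, line in enumerate(source.splitlines(), start=1):
        if current is None:
            match = METHOD_HEADER.match(line)
            if match:
                current, start, depth = match.group(1), lineno, 0
        if current is not None:
            depth += line.count("{") - line.count("}")
            if depth == 0:
                ranges[current] = (start, lineno)
                current = None
    return ranges

def modified_methods(source, changed_lines):
    """Names of the methods enclosing any of the changed line numbers."""
    return sorted({name for name, (first, last) in method_ranges(source).items()
                   for n in changed_lines if first <= n <= last})

test_class = """public class LoginTest {
    @Test
    public void login() {
        onView(withId(R.id.username)).perform(click());
    }

    @Test
    public void logout() {
        onView(withId(R.id.logout)).perform(click());
    }
}"""
print(method_ranges(test_class))          # {'login': (3, 5), 'logout': (8, 10)}
print(modified_methods(test_class, [9]))  # ['logout']

# Dropping the closing brace of login() (line 5) makes login() absorb logout().
broken = "\n".join(line for i, line in enumerate(test_class.splitlines(), start=1) if i != 5)
print(method_ranges(broken))              # {'login': (3, 10)}
print(modified_methods(broken, [8]))      # ['login'], although line 8 is in logout()
```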

Threats to External Validity

The findings are based solely on projects hosted on GitHub. Even though the set of projects is very large and varied in terms of application types, it is not assured that the findings generalize to closed-source Android applications or to other sets of open-source applications. This applies particularly to the frequencies of maintenance causes, which can vary significantly if test classes are written with different testing tools.

8.2 Results

This section illustrates the taxonomy of modification causes derived by applying the procedure described in section 8.1.

The open coding procedure was applied to all the diff files containing modifications in test methods for Android open-source projects featuring code written with Espresso (819 diff files), Robotium (424 diff files) and UI Automator (59 diff files). The set of diff files of projects using Robolectric was instead sub-sampled, since it was too large for manual examination: 422 diff files were randomly extracted out of the full set of 4221 (10%). In total, the open coding procedure involved the manual examination of 1724 diff files.
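The counts above can be checked with a few lines of arithmetic, together with one possible way of drawing a 10% random sample. The diff file identifiers and the fixed seed are illustrative assumptions; the text does not state how the random extraction was actually performed.

```python
import random

# Diff files examined in full, per testing tool (figures from the text).
analysed = {"Espresso": 819, "Robotium": 424, "UI Automator": 59}

robolectric_total = 4221
sample_size = round(0.10 * robolectric_total)  # 10% of 4221 -> 422
analysed["Robolectric"] = sample_size
print(sum(analysed.values()))  # 1724

# Drawing the sample without replacement (hypothetical diff file identifiers).
rng = random.Random(42)  # fixed seed only to make this sketch reproducible
robolectric_diffs = [f"robolectric-diff-{i}" for i in range(robolectric_total)]
sample = rng.sample(robolectric_diffs, sample_size)
print(len(sample))  # 422
```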

In document Evolution and Fragility of Mobile Automated Test Suites (Page 116-119)