The Analysis and Interpretation

RiPLE-TE: a Process for Product Line Testing

5.2 First Experiment

5.2.4 The Analysis and Interpretation

The third phase in the experiment is the analysis and interpretation of the gathered data. To draw valid conclusions, we analyzed every artifact produced by the subjects, including the error reporting form, the feedback questionnaire, and the source code of the test cases they developed using JUnit. The analysis was performed using descriptive statistics and is described throughout this section.

We analyzed the results of both groups (1 and 2) considering all members of each group, and we also analyzed the results considering the expertise of the subjects. To this end, we created two subgroups within each group, named the expert and non-expert subgroups.

To decide how to categorize each subject into one of these subgroups, every element of the subject's profile received a weighting factor, and all elements were combined in a formula, S_{XP}, as follows:

S_{XP} = 2θ + …

The elements that compose this formula come from the subjects' profiles and are described next:

σ = Experience in industrial development projects

ω = Experience in industrial testing projects

a = 2, if the subject has experience in industrial dev. projects; 3, if the subject has no experience in industrial dev. projects

b = 2, if the subject has experience in industrial testing projects; 3, if the subject has no experience in industrial testing projects

θ = English reading expertise (scale: 1 - basic, 2 - intermediate, 3 - advanced)

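Table 5.6 Subjects' Expertise, calculated through the S_{XP} formula.

| Group 1: Subject ID | Group 1: Score | Group 2: Subject ID | Group 2: Score |
|---|---|---|---|
| 1 | 77.3 | 4 | 49 |
| 2 | 45.7 | 7 | 177 |
| 3 | 41 | 9 | 145.3 |
| 5 | 156.7 | 12 | 122 |
| 6 | 44.3 | 13 | 107.3 |
| 8 | 168.8 | 15 | 137 |
| 10 | 194 | 16 | 175.3 |
| 11 | 159 | 17 | 49.5 |
| 14 | 143 | 19 | 110 |
| 18 | 102 | 20 | 72 |
| 21 | 101.7 | 28 | 140.3 |
| 23 | 95 | 30 | 124 |
| 24 | 155.7 | 31 | 127 |
| 25 | 130.7 | 32 | 92 |
| 26 | 84 | | |
| 27 | 107 | | |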

Table 5.7 Distribution of Subjects considering the expertise coefficient

| | Group 1 | Group 2 |
|---|---|---|
| Experts | 5, 8, 10, 11, 14, 24, 25 | 7, 9, 12, 15, 16, 28, 30, 31 |
| Non-Experts | 1, 2, 3, 6, 18, 21, 23, 26, 27 | 4, 13, 17, 19, 20, 32 |

Therefore, all subjects received a score according to their expertise. Table 5.6 shows the scores for each subject.

The resulting scores, which compose the data set, were arranged in an interval. The median, which is the middle value of the data set, was chosen as the threshold: the 50% of the samples below this value represent the non-experts, and the 50% above it represent the experts. The final distribution of subjects is presented in Table 5.7.

The formula uses arbitrary values, since we found no evidence in the literature that could help define a mathematical formula for such a grouping strategy.

However, we established a scoring scheme in which we considered the importance of each element composing the profile.
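The median split described above can be sketched in a few lines of Python. This is only an illustration: the scores are copied from Table 5.6, while the function name `split_by_median` and the choice to compute the median over the pooled scores of both groups are assumptions, since the text does not state them explicitly.

```python
from statistics import median

# Scores copied from Table 5.6, keyed by subject ID.
GROUP_1 = {1: 77.3, 2: 45.7, 3: 41, 5: 156.7, 6: 44.3, 8: 168.8,
           10: 194, 11: 159, 14: 143, 18: 102, 21: 101.7, 23: 95,
           24: 155.7, 25: 130.7, 26: 84, 27: 107}
GROUP_2 = {4: 49, 7: 177, 9: 145.3, 12: 122, 13: 107.3, 15: 137,
           16: 175.3, 17: 49.5, 19: 110, 20: 72, 28: 140.3, 30: 124,
           31: 127, 32: 92}


def split_by_median(*groups):
    """Split each group into experts and non-experts.

    Assumption: the threshold is the median of the pooled scores
    of all groups; scores above it count as expert.
    """
    pooled = [score for group in groups for score in group.values()]
    threshold = median(pooled)
    experts = [sorted(sid for sid, s in group.items() if s > threshold)
               for group in groups]
    non_experts = [sorted(sid for sid, s in group.items() if s <= threshold)
                   for group in groups]
    return threshold, experts, non_experts


threshold, experts, non_experts = split_by_median(GROUP_1, GROUP_2)
print("threshold:", threshold)
print("experts:", experts)
print("non-experts:", non_experts)
```

Under these assumptions the threshold is 116, and the resulting subgroups coincide with the distribution in Table 5.7.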

Test case effectiveness

Table 5.8 Amount of Designed Test Cases

| | Group 1 | Group 2 |
|---|---|---|
| Experts | 93 (mean: 13.3) | 78 (mean: 9.75) |
| Non-Experts | 73 (mean: 8.1) | 36 (mean: 6) |
| The whole group | 166 (mean: 10.4) | 114 (mean: 8.1) |

Table 5.9 Amount of Defects Found

| | Group 1: Valid | Group 1: Invalid | Group 2: Valid | Group 2: Invalid |
|---|---|---|---|---|
| Experts | 55 | 1 | 35 | 8 |
| Non-Experts | 44 | 5 | 19 | 1 |
| The whole group | 99 | 6 | 54 | 9 |

Regarding the use and applicability of the TCE metric, we compared the TCE of both groups in order to determine how effective each test suite was. We calculated the TCE as the ratio of the total number of valid defects found to the total number of designed test cases.

Table 5.10 Test Case Effectiveness

| | Group 1 | Group 2 |
|---|---|---|
| Experts | 59.1% | 44.8% |
| Non-Experts | 60.1% | 52.7% |
| The whole group | 59.6% | 47.3% |
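As a rough illustration of how the TCE values relate to Tables 5.8 and 5.9, the sketch below divides the number of valid defects by the number of designed test cases. The counts are copied from the tables; the dictionary layout and the function name `tce` are assumptions made for this example, and the orientation of the ratio (valid defects over test cases) is inferred from the reported percentages.

```python
# Designed test cases (Table 5.8) and valid defects (Table 5.9),
# keyed by (group, subgroup).
TEST_CASES = {
    ("Group 1", "Experts"): 93, ("Group 1", "Non-Experts"): 73,
    ("Group 1", "Whole"): 166,
    ("Group 2", "Experts"): 78, ("Group 2", "Non-Experts"): 36,
    ("Group 2", "Whole"): 114,
}
VALID_DEFECTS = {
    ("Group 1", "Experts"): 55, ("Group 1", "Non-Experts"): 44,
    ("Group 1", "Whole"): 99,
    ("Group 2", "Experts"): 35, ("Group 2", "Non-Experts"): 19,
    ("Group 2", "Whole"): 54,
}


def tce(valid_defects, test_cases):
    """Test case effectiveness as a percentage (hypothetical helper)."""
    return 100.0 * valid_defects / test_cases


for key in TEST_CASES:
    value = tce(VALID_DEFECTS[key], TEST_CASES[key])
    print(f"{key[0]:8s} {key[1]:12s} {value:5.1f}%")
```

For example, the Group 1 experts yield 55 / 93, about 59.1%. Recomputed values may differ from Table 5.10 in the last digit because of rounding.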

As a result, the Null Hypothesis H₀₁ was not rejected, since H₀₁: µ_{TCE_{ADHOC}} > µ_{TCE_{RIP}} held in all situations, both when considering the subgroups individually and when considering the whole groups.

Quality of defects found

In terms of valid defects found, Figure 5.3 shows that subjects using the process found fewer defects than subjects using an ad hoc approach. Table 5.11 gives detailed information regarding the defects found by the subjects.

Figure 5.3 BoxPlot of defects found by groups, including outliers.

All defects found were tabulated so that similarities could be extracted. We then identified 12 groups of defects and classified them according to their associated Difficulty and Severity, as can be seen in Table 5.12. The classification was based on a discussion with the Ph.D. students, considering their industry expertise.

In addition, false positives reported by subjects were considered invalid defects and were not included in the score analysis.

The score was then extracted from a function of defects found, defined on the basis of (IEEE, 1988). Every defect is classified according to its Difficulty, f(DD), and its Severity, f(SV).

Table 5.12 presents the classification. The weighting factors (k for Difficulty and r for Severity) are displayed at the bottom of the table. The values range from Low, representing trivial defects, to High, representing critical defects.

The formula for calculating the score (S_DS) is presented below. Sub corresponds to the set of subjects.

These formulas were created for the purpose of this experiment. Since we used the same formula for both treatments and their associated subgroups, and the same calculation will be performed in the second experiment, the values do not negatively influence the results in this specific context. We have no evidence about the effectiveness of such a formula in other contexts. It might not be applicable to any other experimental context unless a series of applications demonstrates otherwise. The formulas would probably have to be calibrated before being applied in another experiment.

When we consider the quality of valid defects, in terms of Difficulty and Severity levels, we have indications that subjects from both groups achieved almost the same results. This can be seen in Figure 5.4, which contains six boxplots (row 1 shows the experts from the two treatments; row 2 shows the non-experts; and row 3 shows the general results, including both groups) generated according to the score of valid defects found, as a function of the Difficulty and Severity coefficients. Column (A) shows the values including outliers, and column (B) the values without outliers.

Table 5.12 Difficulty and Severity of defects found

Difficulty Severity

Table 5.13 Amount of defects found in terms of Difficulty and Severity

Difficulty Severity

By analyzing the boxplots for the experts, we can see a clear pattern: subjects who did not use the process had better results. Thus, it may be possible to confirm this result through hypothesis testing. The results of the t-test (unpaired, two-tailed) at the 95% confidence level are shown in Table 5.14 (experts), Table 5.15 (non-experts) and Table 5.16 (both groups).

Table 5.14 Results from the t-test applied to Test Score - Experts

| Degrees of freedom (df) | p-value | t-value |
|---|---|---|
| 13 | 0.1390 | 1.5762 |

In the analysis, considering all scenarios, the t-test did not reject the Null Hypothesis H₀₂. Thus, we found no statistically significant gain from using the process instead of an ad hoc approach.

Table 5.15 Results from the t-test applied to Test Score - Non-Experts

| Degrees of freedom (df) | p-value | t-value |
|---|---|---|
| 13 | 0.278 | 1.1321 |

Table 5.16 Results from the t-test applied to Test Score - Both groups

| Degrees of freedom (df) | p-value | t-value |
|---|---|---|
| 28 | 0.1079 | 1.6609 |
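The reported p-values can be cross-checked from the t-values and degrees of freedom alone. The sketch below uses only the Python standard library and integrates the Student's t density numerically with Simpson's rule. The function names `t_pdf` and `two_tailed_p` and the number of integration steps are assumptions of this example, not part of the original analysis.

```python
import math


def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value: 1 minus the probability mass in [-|t|, |t|].

    The mass in [0, |t|] is approximated with Simpson's rule
    (steps must be even); symmetry gives the other half.
    """
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(i * h, df)
    half_mass = total * h / 3
    return 1 - 2 * half_mass


# (label, t-value, df) as reported in Tables 5.14, 5.15 and 5.16.
REPORTED = [("Experts", 1.5762, 13),
            ("Non-Experts", 1.1321, 13),
            ("Both groups", 1.6609, 28)]

for label, t_value, df in REPORTED:
    print(f"{label:12s} t={t_value:.4f} df={df} p={two_tailed_p(t_value, df):.4f}")
```

The printed values should be close to the p-values reported in Tables 5.14 to 5.16. All of them are above 0.05, which is consistent with the Null Hypothesis not being rejected at the 95% confidence level.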

Test coverage

Table 5.17 presents the average test coverage of each group, considering (1) the subgroups and (2) the whole groups. The values in this table show that subjects who did not apply the process (Group 1) covered a larger amount of code, both in the two subgroups and overall. Thus, the coverage confirms the Null Hypothesis, H₀₆: µ_{TC_{ADHOC}} ≤ µ_{TC_{RIP}}.

Table 5.17 Test Coverage

| | Group 1 | Group 2 |
|---|---|---|
| Experts | 73% | 71% |
| Non-Experts | 61% | 53% |
| The whole group | 66% | 63% |

Approach effectiveness and difficulties in using the process

Data from the subjects who applied the process, that is, Group 2, form the input for this analysis. Figure 5.5 presents the distribution of the process effectiveness according to the subjects' opinion. The factors listed next were considered, and subjects had to assign each of them a YES/NO value:

1. Subject needed additional information other than the available artifacts.

2. RiPLE-TE guidelines were properly followed.

3. RiPLE-TE was efficient in finding defects.

4. RiPLE-TE was effective in finding defects.

5. RiPLE-TE was helpful to find more defects.

In document Ivan do Carmo Machado (Page 131-139)