Second Experiment - RiPLE-TE: a Process for Product Line Testing


5.3 Second Experiment

Figure 5.5 Distribution of RiPLE-TE effectiveness.

In addition, a text field was available for subjects to comment freely on their choice.

The opinions were gathered from the feedback questionnaire, applied right after the subjects had performed the experiment. The majority of subjects approved the use of the process, with ratios higher than 60%, except for the factor on how effective the process was. This factor deserves special attention. In this case, an expert subject (ID 7) judged the process ineffective in terms of finding defects since, according to his comments, it influences neither the design of testing assets nor the finding of defects. In general, the comments regarding difficulties referred to a lack of expertise in either the Java language or the Eclipse IDE, which directly impacted the subjects' activities.

Finally, Figure 5.5 shows that the null hypothesis H₀₄: µ_{EUP_{RIP}} ≥ 20% was rejected in only one factor, while in the remaining ones it was confirmed.

5.3 Second Experiment

The second experiment replicated the first one in terms of Goal, Question and Metrics. In addition, the same project (including source code, specifications and tool support) was used.


However, unlike the first experiment, the design type used in the second was one factor with one treatment (Wohlin et al., 2000). Our intention was to investigate the results of replicating the RiPLE-TE unit testing process, using as baseline values the results from the first experiment regarding the application of the process, that is, the results of Group 2. This helps determine how much confidence can be placed in the results of the experiment.

Throughout this section we describe the significant differences between the experiments and the results achieved this time.

5.3.1 The Planning

The experiment was conducted in a Software Engineering post-graduate course at the Salvador University (UNIFACS), Brazil. Throughout the course, students take classes covering the whole development life cycle, such as: Analysis & Design of Object-Oriented Software (a 28-hour course), Software Requirements (30-hour), Software Reuse and Component-Based Development (28-hour), Software Product Lines (15-hour), Software Quality (30-hour), Software Testing and Inspection (30-hour), and other disciplines. Thus, prior to joining the experiment, students had already attended the classes that form the prerequisites for this experimental study.

As in the first experiment, all subjects were informed about the objectives of the experiment and about their roles and activities. The evaluation instruments, such as the characterization form and the feedback questionnaires, were also applied to this sample of subjects.

The schedule of the experiment differed from the previous one. Due to time constraints, we could not count on 5 days to conduct the experiment. The format of the course allowed us to use two 4-hour classes as experiment sessions. As our intention was to apply only RiPLE-TE, we rearranged the calendar to have two sessions: one for characterization (filling in the consent form and feedback questionnaire), explanation of the experiment, and training with the JUnit framework and EclEmma; and another for performing the experiment. This was feasible because subjects had a background in SPL and some theoretical background in testing: before attending the experiment, they had five 4-hour intensive classes on the topic, which satisfy the requirements of the experimental study.

The measures evaluated were the same, differing only in that this experiment was intended to compare the results of applying the process. Thus, the comparison baselines were the values resulting from the first round. The only exception is the value of µ_{EUP}, which remained the same. Therefore, the hypotheses were changed. They are listed below:


• Null Hypothesis (H₀⁰).

H₀⁰₁: µ_{TCE_{RIP}} ≤ 47%

H₀⁰₂: µ_{QFD_{RIP}} ≤ 33%

H₀⁰₃: µ_{TC_{RIP}} ≤ 63%

H₀⁰₄: µ_{EUP_{RIP}} ≥ 30%

• Alternative Hypothesis (H₁⁰).

H₁⁰₁: µ_{TCE_{RIP}} > 47%

H₁⁰₂: µ_{QFD_{RIP}} > 33%

H₁⁰₃: µ_{TC_{RIP}} > 63%

H₁⁰₄: µ_{EUP_{RIP}} < 30%

Subjects were selected by convenience sampling, and the study was conducted as a single object study, that is, a study conducted on a single subject and a single object.

Validity Evaluation

Possible threats to validity were also anticipated in this experiment, to reduce the risk of invalidating the results. Each is detailed below, according to the categories listed in (Wohlin et al., 2000) and applied in the first experiment.

Internal validity

Maturation. As in the first experiment, the practical test session will be conducted during a continuous 4-hour period, so subjects may be affected negatively (feeling bored or tired) or positively (learning) during the course of the experiment.

The same idea was applied: subjects were allowed to take short breaks, under the constraint that they not share information about the experiment with colleagues.

Testing. Subjects may respond differently at different times during the experiment, since they acquire knowledge of how the test is conducted. If familiarization with the tests is needed, it is important that the results of the test are not fed back to the subjects, so as not to support unintended learning.


External validity

Generalization of subjects. This experiment included subjects from a post-graduate course, which comprises disciplines that fulfill the requirements for taking part in it. Unlike the first experiment, in which the available time was sufficient, this experiment will be held in a time-constrained environment. Hence, we rely on the knowledge acquired by the subjects in prior disciplines as well as on their expertise in both industry and academic projects.

Generalization of scope. The experiment will be conducted within a fixed time frame, according to the schedule of the post-graduate course, which may affect the overall results. The scope is tied to the course schedule in order to make its completion feasible. Thus, although the project in question involves a large domain, only a sample scenario was selected for this experiment, the same one applied in the first experiment.

Construct validity

Experimenter expectancies. The experimenters can bias the results of a study both consciously and unconsciously, based on what they expect from the experiment. This experiment will serve as a basis for future replications, in the same way that the first experiment provided the baselines for this one. Hence, it is necessary to report the results as actually gathered, without distortion.

Hypothesis guessing. In order to minimize this risk, the formal definition and planning of the experiment are carefully designed in advance, and we searched the literature for valid measures to aid in hypothesis definition, although not all metrics are reported in the literature.

Conclusion validity

Reliability of measures. The results obtained in the first experiment will serve as baselines for this second round. This fact, together with the expertise of PhD students in experimental software engineering, will help in defining objective rather than subjective metrics, and consequently in improving the reliability of our measures and results.

Heterogeneity of subjects. As in the first experiment, heterogeneity will be present again. In a post-graduate course the students’ profiles are even more divergent than in the prior scenario, since students have different ages, expertise and objectives. With such a heterogeneous group, there is a risk that the variation due to individual differences is larger than that due to the treatment, which represents a threat to conclusion validity. Hence, the analysis will consider two groups, split according to average expertise, to reduce this risk.


5.3.2 The Operation

The experimental study was conducted on January 15th and 16th, 2010, at Salvador University, during the Software Testing course, part of the Software Engineering Post-Graduate Program.

We had at our disposal a lab containing PCs with the structure we needed, in terms of support tools and IDE, to perform the tests and report them. A total of 13 students were involved in this experiment. Table 5.18 presents a condensed view of the subjects’ profile.

Table 5.18: Subjects’ Profile in the 2nd Experimental Study

Subject  English       Particip. in Industrial      Experience in                         Testing
ID       Reading       Dev. Project  Test. Project  Programming*  Testing*  SPL*  JUnit*  Tools
1        basic         Yes           No             7             0         1     0       -
2        advanced      Yes           No             3             0         0     0       -
3        intermediate  Yes           No             5             0         0     0       -
4        intermediate  Yes           No             5             0         3     0       -
5        intermediate  Yes           No             19            0         0     0       -
6        basic         No            No             13            0         0     0       -
7        basic         Yes           Yes            9             6         0     6       -
8        basic         Yes           Yes            10            6         0     4       -
9        basic         No            Yes            2             1         0     2       -
10       intermediate  Yes           No             4             0         2     0       D
11       basic         Yes           No             2             0         1     0       -
12       intermediate  No            No             13            0         1     0       D
13       intermediate  Yes           No             22            0         0     0       -

(*) the experience is expressed in years.

The S_{XP} formula was also used to divide the subjects according to their expertise.

Table 5.19 shows the scores for each subject, and Table 5.20 presents the distribution of subjects within the Experts and Non-Experts groups.


Table 5.19 Subjects Expertise, calculated through the S_{XP} formula - 2nd exp.

Subject ID  1    2    3    4    5    6    7    8    9    10   11   12   13
Score       188  60   94   274  346  158  344  338  68   196  116  220  400

Table 5.20 Distribution of Subjects considering the expertise coefficient - 2nd exp.

EXPERTS      4 5 7 8 12 13
NON-EXPERTS  1 2 3 6 9 10 11
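The split in Table 5.20 is consistent with taking the group mean of the S_{XP} scores as the cut-off, which matches the "average expertise" criterion mentioned in the validity evaluation. The sketch below is a minimal illustration under that assumption; the variable names and the strict greater-than comparison are ours, not part of the original study.

```python
# Scores per subject ID, copied from Table 5.19.
scores = {
    1: 188, 2: 60, 3: 94, 4: 274, 5: 346, 6: 158, 7: 344,
    8: 338, 9: 68, 10: 196, 11: 116, 12: 220, 13: 400,
}

# Assumption: the cut-off is the arithmetic mean of all scores.
threshold = sum(scores.values()) / len(scores)

# Subjects strictly above the mean are treated as experts.
experts = sorted(sid for sid, s in scores.items() if s > threshold)
non_experts = sorted(sid for sid, s in scores.items() if s <= threshold)

print(f"threshold = {threshold:.1f}")
print("experts:", experts)
print("non-experts:", non_experts)
```

Under this assumption the two lists reproduce the groups of Table 5.20, with subject 12 (score 220) falling just above the cut-off.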

5.3.3 The Analysis and Interpretation

This section describes the results based on the analysis of the feedback questionnaires as well as the error reporting forms and the source code the subjects delivered at the end of the experiment.

Looking at the boxplot in Figure 5.6, which presents the distribution of subjects in the first (A) and second (B) experiments, considering all 13 subjects, we can clearly see that subjects in the second experiment have greater experience than subjects in the previous experiment. As the number of subjects in this round was not large enough, we considered all of them in the analysis.

Figure 5.6 Boxplot with the distribution of subjects in (A) first and (B) second experiments.


Table 5.21 Amount of Designed Test Cases - 2nd exp.

Experts          58 (mean: 9.7)
Non-Experts      56 (mean: 8)
The whole group  114 (mean: 8.8)

Table 5.22 Amount of Defects Found - 2nd exp.

                 Valid           Invalid
Experts          30 (mean: 5)    0
Non-Experts      29 (mean: 4.1)  6 (mean: 0.9)
The whole group  59              6

Test case effectiveness

In this first analysis, we noticed that the results for the non-expert and expert subgroups were very similar. Although experts did not report any invalid defect, we expected a better performance from them. The overall result refuted the null hypothesis H₀⁰₁, since µ_{TCE_{RIP}} = 51.7% > 47%, as can be seen in Table 5.23.
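The values in Table 5.23 can be reproduced from Tables 5.21 and 5.22 if test case effectiveness is taken as the ratio of valid defects found to test cases designed. That definition is an assumption here, since the metric is defined outside this section; the sketch only checks that it agrees with the reported figures to within 0.1 percentage point.

```python
# (valid defects found, test cases designed), from Tables 5.22 and 5.21.
groups = {
    "Experts": (30, 58),
    "Non-Experts": (29, 56),
    "The whole group": (59, 114),
}

BASELINE = 47.0  # baseline from the first experiment, in percent

# Assumption: effectiveness = valid defects / designed test cases.
tce = {name: 100.0 * defects / cases for name, (defects, cases) in groups.items()}

for name, value in tce.items():
    print(f"{name}: {value:.2f}%")

# The null hypothesis is refuted when the overall value exceeds the baseline.
exceeds_baseline = tce["The whole group"] > BASELINE
print("whole group exceeds baseline:", exceeds_baseline)
```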

Quality of defects found

We once again applied the formula to calculate the score (S_DS) as a function of severity and difficulty levels, based on the number of defects found during the experiment, as shown in Table 5.24. The score obtained with the formula is presented in Table 5.25. As the same formula was applied, we could make the comparison with confidence. The null hypothesis H₀⁰₂ was refuted, since µ_{QFD_{RIP}} = 35.1% > 33%.

Test coverage

Even though subjects had very little time to understand the business rules and the process, they covered a large amount of code. The results (Table 5.26) showed better numbers for the second experiment, which refuted the null hypothesis H₀⁰₃, since µ_{TC_{RIP}} = 81% > 63%.
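The three comparisons in this subsection reduce to checking each observed mean against its baseline from the first experiment. The sketch below restates that arithmetic with the values reported in the text; it is a plain threshold comparison, not a statistical significance test, and the dictionary layout is only illustrative.

```python
# metric -> (baseline from first experiment, observed mean in second), in percent
results = {
    "TCE": (47.0, 51.7),  # test case effectiveness
    "QFD": (33.0, 35.1),  # quality of defects found
    "TC": (63.0, 81.0),   # test coverage
}

# Each null hypothesis states observed <= baseline; it is refuted when
# the observed mean is strictly greater than the baseline.
refuted = {m: observed > baseline for m, (baseline, observed) in results.items()}

for metric, is_refuted in refuted.items():
    baseline, observed = results[metric]
    verdict = "refuted" if is_refuted else "not refuted"
    print(f"H0 for {metric}: {observed}% vs {baseline}% -> {verdict}")
```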

Table 5.23 Test Case Effectiveness - 2nd exp.

Experts 51.7%

Non-Experts 51.8%

The whole group 51.7%


Table 5.24 Amount of defects found in terms of Difficulty and Severity - 2nd exp.

             Difficulty            Severity
             Low  Medium  High     Low  Medium  High
Experts      25   1       4        3    8       19
Non-Experts  25   3       1        1    3       25

Table 5.25 Scores - 2nd exp.

Experts 46.2%

Non-Experts 32.1%

The whole group 35.1%

Approach effectiveness and difficulties in using the process

Since the subjects from the second round have, on average, more experience than the first group, they were more concerned than the subjects of the previous experiment about the availability of detailed specification documents, in order to avoid wasting time. Among the non-experts, 2 of 7 subjects (29%) mentioned the lack of detailed specification documents and/or comments in the source code as the main problem they found. In the experts group, 3 of 6 (50%) also complained about the lack of documentation. According to them, the activity would have had better results if further documentation had been provided.

On the non-experts' side, 5 of 7 subjects (71%) reported that they followed the RiPLE-TE process. On the experts' side, 4 of 6 subjects (67%) stated that they followed RiPLE-TE during the experiment.

All non-expert subjects (100%) reported positively on the efficiency of the approach. On the other hand, only 67% of experts confirmed the efficiency of the process.

Regarding effectiveness, 6 of 7 non-experts (86%) responded positively, while 67% of the experts responded positively about the effectiveness of the process.

Hence, the null hypothesis H₀⁰₄: µ_{EUP_{RIP}} ≥ 30% was confirmed in all factors analyzed, exceeding the baseline value.
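The percentages quoted for the questionnaire answers follow directly from the counts given in the text. The sketch below recomputes them, rounded to whole percentages. Only answers for which the text states explicit counts are included; the labels are ours, and the 2-of-7 count is read as referring to the non-experts group, which has seven members.

```python
# (subjects giving the answer, group size), as stated in the text.
answers = {
    "non-experts: lack of documentation": (2, 7),  # group attribution assumed
    "experts: lack of documentation": (3, 6),
    "non-experts: followed the process": (5, 7),
    "experts: followed the process": (4, 6),
    "non-experts: process is effective": (6, 7),
}

# Ratio of answers to group size, as a whole-number percentage.
percentages = {label: round(100 * n / total) for label, (n, total) in answers.items()}

for label, pct in percentages.items():
    print(f"{label}: {pct}%")
```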

Table 5.26 Test Coverage - 2nd. exp.

Experts 79.8%

Non-Experts 84%

Average 81%

In document Ivan do Carmo Machado (Page 139-147)