
RiPLE-TE: a Process for Product Line Testing

5.2 First Experiment

5.2.1 The Planning

After the definition of the experiment, the planning takes place. Whereas the definition phase establishes the foundation for the experiment, i.e., why it is conducted, the planning phase prepares how the experiment is conducted (Wohlin et al., 2000).

As in other software engineering disciplines, activities should be planned in advance, in the form of a plan that the remaining activities must follow, in order to keep the experiment under control. Hence, the planning phase of this experimental study includes the steps described in the following subsections and follows the model proposed in Wohlin et al. (2000), including some important aspects described in Jedlitschka et al. (2008).

Context Selection

As mentioned earlier, Ph.D. candidates and undergraduate students will be involved in this experiment. MATB15 classes will host the 'experimental lab'. This course was designed to train students in the concepts, principles, techniques, and tools of software testing and validation, so that they understand the theoretical background of testing as a means of gaining knowledge, and the limits of this approach as a means of controlling quality. MATC66 classes will host the experiment preparation, including a pilot project. This course is intended to teach Ph.D. students the concepts and practice of empirical software engineering, such as performing and reporting controlled experiments.

The experiment will thus be conducted in an academic environment, in which the selection of subjects, the training, and the execution of the experiment will take place. Regarding its characterization, the experiment will run off-line (i.e., not in an industrial software development setting), based on a simulated problem: an SPL project in the conference management domain. The project is further detailed in Section 5.2.2. In addition, the experiment will be conducted as a single object study, i.e., a study conducted on a single subject and a single object.

3 https://disciplinas.dcc.ufba.br/MATC66

4 https://disciplinas.dcc.ufba.br/MATB15

5 http://dmcc.dcc.ufba.br/

Pilot Project

Before performing the study, a pilot project will be conducted with the same structure defined in this planning. The pilot project will be performed by MATC66 Ph.D. students, who will be trained on how to use the proposed process. In the pilot project, the subjects will use the same material described in this planning and will be observed by the responsible researcher. This way, we can identify possible inconsistencies and avoid misunderstandings in the actual experiment.

Hypothesis formulation

In an experiment it is necessary to formally and clearly state what we intend to evaluate. In this experimental study, we chose to focus on four hypotheses. We hence state them formally and also define what measures we need to evaluate the hypotheses.

• Null Hypothesis (H₀). The Null Hypothesis states that there is no benefit in using the RiPLE-TE Unit Testing approach (denoted below as RIP) to support testing activities in product lines, compared to ad-hoc testing (denoted below as ADHOC), in terms of effectiveness. The Null Hypotheses and the defined values are:

$H_{01}: \mu_{TCE_{ADHOC}} \leq \mu_{TCE_{RIP}}$

$H_{02}: \mu_{QFD_{ADHOC}} \leq \mu_{QFD_{RIP}}$

$H_{03}: \mu_{TC_{ADHOC}} \leq \mu_{TC_{RIP}}$

$H_{04}: \mu_{EUP_{RIP}} \geq 30\%$

• Alternative Hypothesis (H₁). The Alternative Hypothesis determines that the use of the process produces benefits that justify its use. The following Alternative Hypotheses were defined:

$H_{11}: \mu_{TCE_{ADHOC}} > \mu_{TCE_{RIP}}$

$H_{12}: \mu_{QFD_{ADHOC}} > \mu_{QFD_{RIP}}$

$H_{13}: \mu_{TC_{ADHOC}} > \mu_{TC_{RIP}}$

$H_{14}: \mu_{EUP_{RIP}} < 30\%$
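The hypotheses above do not state which statistical test will be used to accept or reject them. As a minimal sketch, assuming one measurement per subject and small groups, H₀₁ to H₀₃ could be checked with an exact one-sided permutation test on the difference of group means (alternative: ADHOC mean greater than RIP mean, as in H₁₁ to H₁₃), and H₀₄ with an exact sign test against the 30% threshold. Note that the sign test concerns the median, not the mean, so it only approximates H₀₄. The function names and sample values are hypothetical and are not data from this study.

```python
from itertools import combinations
from math import comb
from statistics import mean


def permutation_p_value(group_x, group_y):
    """Exact one-sided permutation test (alternative: mu_x > mu_y).

    Returns the fraction of all relabellings of the pooled observations
    whose mean difference mean(x) - mean(y) is at least the observed one.
    """
    pooled = list(group_x) + list(group_y)
    n_x = len(group_x)
    observed = mean(group_x) - mean(group_y)
    total = 0
    extreme = 0
    # Enumerate every way of choosing which pooled indices form group x.
    for idx in combinations(range(len(pooled)), n_x):
        chosen = set(idx)
        xs = [pooled[i] for i in idx]
        ys = [v for i, v in enumerate(pooled) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point noise on ties.
        if mean(xs) - mean(ys) >= observed - 1e-12:
            extreme += 1
    return extreme / total


def sign_test_below(values, threshold):
    """Exact one-sided sign test for H1: median < threshold.

    Observations equal to the threshold are discarded, as usual.
    """
    below = sum(1 for v in values if v < threshold)
    above = sum(1 for v in values if v > threshold)
    n = below + above
    # P(X >= below) for X ~ Binomial(n, 0.5).
    return sum(comb(n, k) for k in range(below, n + 1)) / 2 ** n


if __name__ == "__main__":
    # Hypothetical per-subject metric values, for illustration only.
    adhoc = [10, 11, 12]
    rip = [1, 2, 3]
    print(permutation_p_value(adhoc, rip))  # 1 of 20 splits -> 0.05
    eup_rip = [0.10, 0.20, 0.25, 0.15]
    print(sign_test_below(eup_rip, 0.30))  # 1/16 -> 0.0625
```

Exhaustive enumeration is only practical for small groups; for larger classes a randomly sampled permutation test or a rank-based test such as Mann-Whitney would be the usual substitute.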


Variables

The independent variables are the experience of the subjects, collected through the background questionnaire, and the proposed process. The dependent variables are the quality and number of defects found by the subjects and the applicability of the proposed process.

Selection of Subjects

The non-probability sampling technique chosen for selecting subjects in this experiment is convenience sampling (Wohlin et al., 2000), which yields a non-random subset of the universe of Software Engineering students.

MATC66 graduate students will act as test managers and test architects, while MATB15 undergraduate students will act as test analysts and testers, following the roles defined in the RiPLE-TE process. Designing test assets and executing tests will be the responsibility of the MATB15 undergraduate students. Since they will perform the most laborious and time-consuming activities in this experiment, for ease of understanding we will from now on refer to this group simply as subjects.

The subjects will be divided into two groups: one will perform testing activities in an ad-hoc fashion, whereas the other will perform the tests applying the RiPLE-TE approach. The division of groups will be based on the subjects' expertise, according to the data gathered from the background questionnaire, as explained in detail below.
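The text states that the groups will be balanced by expertise but does not fix a procedure. As a minimal sketch, assuming each subject's background questionnaire has already been reduced to a single numeric expertise score (a hypothetical scoring scheme), subjects could be ranked by score and dealt to the groups in an A, B, B, A serpentine pattern, so that neither group systematically receives the stronger member of each consecutive pair. This is a swapped-in balancing technique, not necessarily the one used in the study; the function and subject identifiers are hypothetical.

```python
def assign_balanced_groups(subjects):
    """Split subjects into two groups with similar total expertise.

    `subjects` is a list of (subject_id, expertise_score) pairs. Subjects
    are ranked by score (highest first) and dealt in an A, B, B, A, ...
    serpentine pattern.
    """
    ranked = sorted(subjects, key=lambda s: s[1], reverse=True)
    group_1, group_2 = [], []
    for position, subject in enumerate(ranked):
        # Positions 0 and 3 of every block of four go to Group 1.
        if position % 4 in (0, 3):
            group_1.append(subject)
        else:
            group_2.append(subject)
    return group_1, group_2


if __name__ == "__main__":
    # Hypothetical scores derived from the background questionnaire.
    subjects = [("s1", 9), ("s2", 8), ("s3", 7), ("s4", 6),
                ("s5", 5), ("s6", 4), ("s7", 3), ("s8", 2)]
    g1, g2 = assign_balanced_groups(subjects)
    print(sum(s for _, s in g1), sum(s for _, s in g2))  # 22 22
```

A deterministic deal like this is easy to audit; randomizing the order of subjects with equal scores would be a natural refinement if ties are frequent.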

The motivation for the participants to attend the experimental study is based on the assumption that they will have an opportunity to engage in a testing project, in which they can put into practice the knowledge gained throughout the course.

Design Types

The problem has been stated, and we have chosen our variables. Moreover, we have defined the measurement scales for the variables. Hence, we are now able to design the experiment.

An experiment consists of a series of tests of the treatments. The design type to be used in this experiment is one factor with two treatments, in which we want to compare the two treatments against each other (Wohlin et al., 2000). The factor is the RiPLE-TE unit testing process, and the treatments are described as follows:

1. Testing with the RiPLE-TE unit testing process. In this treatment, subjects will be trained in the process and will have access to the process documentation (the master test plan, the unit test plan, the feature model, the requirements specification, test scenarios, process guidelines, and usage examples) before and during the test execution session. They are expected to follow the guidelines, so that we can draw conclusions regarding the metrics previously mentioned.

2. Ad-hoc testing. In this treatment, subjects will not receive any training regarding a specific process, but rather they will apply their expertise towards finding defects in the SPL project code.

Instrumentation

The background and experience of the individuals will be gathered through a survey handed out at the first lecture, here named the background questionnaire (see Appendix B.1), in which they provide information about their experience with software development, participation in projects, and experience with software product lines, software testing, and the tools the experiment requires.

These data will provide the input for characterizing the subjects, serving as a means of balancing the composition of the groups.

Subjects will be given access to the artifacts necessary to perform the experiment, such as:

• the master test plan;

• the unit test plan template;

• the project code, i.e., the components to be tested;

• the specification, i.e., requirements and use cases;

• the error reporting form, the document in which every error found must be carefully reported.

In addition, the subjects will have access to the RiPLE-TE documentation, the tutorials used in the training, and a guideline on how to report errors.

After being informed about the goals and general information on the experiment, all subjects will sign a consent form (see Appendix B.2), as a means of agreeing to join the study. Their signature means that they understand the information given about the study and in the consent form as well. They will be informed that they can withdraw from the experiment at any time, without penalty.

An important means of collecting information, perhaps the most important, is the error reporting form, in which subjects are supposed to report every defect found. Appendix B.3 depicts a copy of this form, which includes details about every error found by a subject. We chose to use such a form instead of a bug tracking system, such as Trac⁶, Bugzilla⁷ or Mantis⁸, due to time constraints. Although its fields were extracted from such systems, we believe that a spreadsheet is suitable for this experimental context. Using spreadsheets makes data extraction faster, since parsers that extract the data and generate useful information are easy to implement. In a real industrial project, however, a bug tracking system would certainly be required.

6 Trac - http://trac.edgewall.org/

7 Bugzilla - http://www.bugzilla.org/

8 Mantis Bug Tracker - http://www.mantisbt.org/
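To illustrate why a spreadsheet is easy to process, the following is a minimal sketch of such a parser, assuming the error reporting form is exported as CSV with a header row. The column names (subject, group, component, severity, description), the function name, and the sample rows are hypothetical; the actual fields of the form in Appendix B.3 may differ. It counts reported defects per group and per subject, which is the raw input for the defect-related dependent variables.

```python
import csv
import io
from collections import Counter


def summarize_error_reports(csv_text):
    """Count reported defects per group and per subject.

    Assumes a CSV export of the error reporting form whose header row
    contains (hypothetical) columns named subject, group, component,
    severity and description.
    """
    per_group = Counter()
    per_subject = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = (row.get("subject") or "").strip()
        group = (row.get("group") or "").strip()
        # Ignore rows that do not identify who reported the defect.
        if not subject:
            continue
        per_group[group] += 1
        per_subject[subject] += 1
    return per_group, per_subject


if __name__ == "__main__":
    # Hypothetical rows, for illustration only.
    sample = (
        "subject,group,component,severity,description\n"
        "s1,ADHOC,Submission,high,Crash on empty paper title\n"
        "s1,ADHOC,Review,low,Wrong date format\n"
        "s5,RIP,Submission,medium,Duplicate author accepted\n"
    )
    by_group, by_subject = summarize_error_reports(sample)
    print(by_group["ADHOC"], by_group["RIP"])  # 2 1
    print(by_subject.most_common(1))  # [('s1', 2)]
```

Counting raw reports says nothing about whether two reports describe the same defect; duplicates would still need to be resolved by the researchers before computing the metrics.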

In addition, three feedback questionnaires were designed. The first is intended for the group that performs the experiment in an ad-hoc fashion (see Appendix B.4), and the second for the group that applies the RiPLE-TE unit testing process (see Appendix B.5). These feedback questionnaires were designed to provide useful information on the use of the approach, by gathering information about the participants' satisfaction with the RiPLE-TE Unit Testing process and the other elements that comprise the experimental study.

The third questionnaire, shown in Appendix B.6, was designed to gather feedback from the subjects who performed the experiment without using the RiPLE-TE, regarding their opinion on whether they would have found more defects had they used the process. Unlike the previous ones, which are mandatory for all subjects from both groups, this one is intended only for a sample of subjects, who must answer it after the last training session, in which the RiPLE-TE is explained.

Subjects must be monitored while filling out the questionnaires, since these will provide useful data for future analysis and improvements regarding both the process and the experiment design.

Training

Table 5.1 presents the experiment training and execution schedule, based on the elements the experiment requires. It helps identify which training sessions will be performed and when they will occur throughout the experimental study. The table has columns for Groups 1 and 2, referring to the division described earlier. Both groups will work for 5 days each, attending the training sessions and the test execution session.

Regarding the training sessions, the subjects will first be made aware of the experiment's purpose and associated tasks, and will receive an introduction to the SPL topic. This session will last 4 hours. The tools to be used in the experiment are JUnit and the Eclemma code coverage tool. In order to balance the subjects' knowledge, a training on these tools, including practical exercises, will be performed.

This training will be delivered in two 4-hour sessions.

As previously mentioned, one group will perform the tests without applying the RiPLE-TE, in an ad-hoc fashion. As the table shows, Group 1, responsible for this task, will execute the tests before the training on the RiPLE-TE. After that, the training on the RiPLE-TE will take place. Both groups will attend this training, which will last 3 hours. Then, Group 2 will be able to perform the tests properly applying the RiPLE-TE. Each group will have 4 hours to perform the testing tasks.

Feedback 1 represents the moment in which subjects will report their feedback on the experiment by filling in a questionnaire: Group 1, which performed the tests without applying the process, will use the feedback questionnaire presented in Appendix B.4, and Group 2, which performed the tests using the RiPLE-TE, will use the questionnaire presented in Appendix B.5.

Feedback 2 represents the moment in which an additional feedback questionnaire will be applied to a sample of subjects from Group 1, right after they attend the training on the RiPLE-TE process. As this group performed the experiment in an ad-hoc fashion, at this moment they will give feedback on what they think could be improved, comparing their experience with the opportunities a process might have offered to improve the results they reported.

This was the main reason for enabling the first group to attend the training on the RiPLE-TE.

In summary, all activities listed in the table will be executed sequentially, following the schedule. All of them, except test execution, will be performed jointly by both groups.

Table 5.1 Experiment Training and Execution Agenda

Day 3 - Dec 3rd, 2009: Training on JUnit/Eclemma (1:00, 4 h)


Validity Evaluation

A fundamental question concerning the results of an experiment is how valid they are (Wohlin et al., 2000). Thus, it is necessary to anticipate the threats possibly involved in the context of an experiment.

Wohlin et al. (2000) adopt a four-type categorization of the threats to the validity of experimental results: internal validity, external validity, construct validity, and conclusion validity. Each type is detailed below, including the threats that fit the context of this experiment:

Internal validity

Maturation. This is the effect of subjects reacting differently as time passes. As the experiment (the practical test session) will be conducted during a continuous 4-hour period, subjects may be affected negatively (feeling bored or tired) or positively (learning) during the course of the experiment. Subjects will be free to take short breaks, but they will not be allowed to share information with other subjects.

Testing. If the test is repeated, the subjects may respond differently at different times during the course of the experiment, since they acquire knowledge about how the test is conducted. If there is a need for familiarization with the tests, it is important that the results of the test are not fed back to the subjects, in order not to support unintended learning. In addition, subjects will perform the experiment without external interference; only minor doubts, e.g., on command syntax, may be resolved by the experimental study staff (the Ph.D. students mentioned earlier).

External validity

Generalization of subjects. This is the effect of having a subject population that is not representative of the population we want to generalize to, i.e., the wrong people participate in the experiment. Although this experiment involves an SPL project, training sessions on the topic will be held, involving subjects in practical sessions in which they can become familiar with the tools to be used, the purpose of product lines, and so on. In addition, the experiment will put into practice the knowledge of software testing the subjects have acquired throughout the semester. However, even if subjects succeed in using the proposed approach, this would not be enough to generalize its use to SPL testing practitioners at large.

Generalization of scope. This is the effect of not having the experimental setting or material representative of, for example, industrial practice. The experiment will be conducted within a time frame defined by the schedule of the undergraduate course, which may affect the overall results.

The scope is tied to the course schedule in order to make its completion feasible. Thus, although the project in question belongs to a large domain, only a sample scenario was selected for this experiment; otherwise, it would not be possible to finish everything within the timetable.

This is a clear threat, since the process might not perform as well if it were applied to a larger project rather than a toy context.

Construct validity

Mono-operation bias. Since the experiment includes a single independent variable, the experiment may under-represent the construct and thus not give the full picture of the theory.

Experimenter expectancies. The experimenters can bias the results of a study, both consciously and unconsciously, based on what they expect from the experiment. We tried to reduce this threat by involving Ph.D. students from another institution who are interested in experimentation practices, as mentioned earlier in this chapter, to work together throughout the whole study. This resulted in a set of meetings, held during classes, to discuss, plan, and review the study.

Hypothesis guessing. When people take part in an experiment, they might try to figure out its purpose and intended result, and they are likely to base their behavior on their guesses about the hypotheses. To minimize this risk, the formal definition and planning of the experiment are being carefully designed in advance, and we searched the literature for valid measures to aid in defining the hypotheses, although not all metrics are reported in the literature.

Conclusion validity

Reliability of measures. The validity of an experiment is highly dependent on the reliability of its measures. Objective measures are more reliable than subjective ones, since they do not depend on human judgment. Thus, with objective metrics, replicating a phenomenon should yield the same outcome. Hence, the expertise of the Ph.D. students in experimental software engineering will help in defining objective rather than subjective metrics, improving the reliability of our measures and, consequently, of our results.

Heterogeneity of subjects. When the group is very heterogeneous, there is a risk that the variation due to individual differences is larger than the variation due to the treatment, which represents a threat to conclusion validity. Since the experiment will be conducted with undergraduate students in the final semesters of their course, and training on the required tools and techniques will be performed, we expect to reduce this risk.


In document Ivan do Carmo Machado (Page 117-125)