with selection of test cases generated from our chosen type of specification model. Moreover, we decided to further explore the benefits of using a similarity function for selecting test cases based on modified specification models, before beginning to experiment with different similarity functions. Details regarding our similarity function will be presented when explaining SART in Chapter 3.
2.5 Experimental Studies in Software Engineering
In order to reach reliable conclusions during research, we need to choose an appropriate method, such as a survey, a case study or an experiment, and then evaluate our hypothesis. Surveys are used for exploring and understanding a population based on a sample. The analysis is often performed through forms, interviews and questionnaires, allowing researchers to explain and describe the population based on the sample drawn. In turn, case studies are conducted to investigate phenomena within a specific time interval or industrial setting; hence, observations and conclusions are often limited and hard to scale up or generalize [Wohlin et al. 2012].
The main difference between those empirical methods and an experiment is that the latter is based on a formal, rigorous and controlled investigation of variables, thus increasing confidence in the obtained results. The starting point is to observe a cause and effect relationship (Figure 2.5) expressed through a hypothesis: we want to study the outcome (dependent variables) after changing the input variables (independent variables) of a process. Examples of experiments include investigating the effect of changing a software development process or testing technique (examples of independent variables) on the productivity rates of developers, time to release a product, or defect detection rate (examples of dependent variables).
Figure 2.5: Experiment principles (adapted from [Wohlin et al. 2012]). Diagram labels: Theory, Observation, Cause construct, Effect construct, Cause-effect construct, Treatment, Outcome, Treatment-outcome construct, Experiment objective, Experiment operation.
Factors in an experiment are one or more independent variables whose varying values, named treatments (or levels), affect the dependent variables when changed. Thus, during an experiment, factors assume different treatments while the other independent variables (objects such as software artefacts, and the subjects/participants involved in the experiment) are controlled, and the effect of these changes is then measured through the dependent variables for subsequent analysis. Experiments require a process in order to be properly conducted. Here, we use the process of Wohlin et al., which comprises the following steps [Wohlin et al. 2012]:
• Definition: The main concern in this step is properly defining the elements of the experiment, such as the hypothesis being investigated and the context and purpose of the study.
• Planning: This step is the foundation of the experiment, where variables and their values are defined, the main activities are planned and the experimental design is determined. The latter establishes how the execution is conducted and plays a major role in enabling analysis of the dependent variables with the appropriate statistical functions and tools. Also, the null and alternative hypotheses are created, and they usually represent the causation effect between treatments of a factor. Traditionally, the null hypothesis indicates that the outcome is not affected by different treatments ($H_0 : \mu_a = \mu_b$) and the alternative hypothesis indicates otherwise ($H_1 : \mu_a \neq \mu_b$); hence, the goal is often to reject the null hypothesis.
• Operation: Preparation and actual execution of the experiment are performed during this step, e.g. setting up tools and defining scripts and questionnaires for subjects.
• Analysis: The data collected during operation is organized (e.g. reduced or processed) and interpreted to draw conclusions regarding the hypothesis. Interpretation of data is mainly done by analysing descriptive statistics and performing hypothesis testing, so that conclusions are drawn at a stated significance level.
• Packaging: This step focuses on the presentation and packaging of the results of the experiment. This last activity is important since it allows other researchers to reproduce the experiment based on the information made available; thus, it is recommended to generate research compendia with access to data, reports and the platform where the experiment was executed [González-Barahona and Robles 2012].
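To make the planning and analysis steps above more concrete, the following Python sketch tests the null hypothesis that two treatments of a single factor yield the same mean outcome. It is only an illustration under stated assumptions: the measurements are made up, the significance level of 0.05 is an arbitrary choice, and the permutation test (with the hypothetical helper names `mean` and `permutation_test`) is one of several possible tests, not the procedure prescribed by Wohlin et al. nor the one used in our experiments.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


def permutation_test(sample_a, sample_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for H0: mean(a) == mean(b).

    Returns the observed difference in means and an estimated p-value.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    observed = mean(sample_a) - mean(sample_b)
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    extreme = 0
    for _ in range(n_permutations):
        # Under H0 the treatment labels are exchangeable, so we reshuffle
        # them and recompute the difference in means.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical dependent-variable measurements (e.g. defects detected per
# test suite) under two treatments of one factor. The values are made up.
treatment_a = [12, 15, 11, 14, 13, 16, 12, 15]
treatment_b = [9, 10, 8, 11, 10, 9, 12, 10]

ALPHA = 0.05  # assumed significance level, fixed during planning

observed_diff, p_value = permutation_test(treatment_a, treatment_b)
reject_h0 = p_value < ALPHA

# Descriptive statistics first, then the outcome of the hypothesis test.
print(f"mean A = {mean(treatment_a):.3f}, mean B = {mean(treatment_b):.3f}")
print(f"observed difference = {observed_diff:.3f}, p-value = {p_value:.4f}")
print("reject H0" if reject_h0 else "fail to reject H0")
```

A permutation test is used here because it needs no distributional assumptions and only the standard library; for a real study, the test should be chosen during planning according to the experimental design and the properties of the data.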
During this process, several validity threats may appear and compromise the results, which must be valid for the population being considered in order to allow generalization. These threats are classified as conclusion, internal, external and construct validity threats [Cook and Campbell 1979]. Conclusion and internal validity threats relate to the observed effect between treatment and outcome during analysis and execution, respectively. For example, the former may result from an incorrectly established statistical relationship, and the latter from not controlling or measuring the variables properly. In turn, construct validity threats concern the transition from theory to observation, i.e. whether treatments and outcomes indeed reflect the cause and effect constructs, respectively. Last, external validity threats are concerned with generalization, i.e. whether the relationship observed during execution really implies a general cause-effect relationship.
Therefore, validity threats must be identified and reported for two main reasons. First, reporting them allows researchers to properly define which aspects of the experiment can be applied in practice, and so avoid introducing risks or compromising the integrity of the object being studied by, for example, transferring technology to production based on wrong results. The second reason is to encourage reproducibility: not only the threats but all available information must be accessible, allowing other researchers to reproduce or adapt the experiment and then expand the results. Next, we discuss some basic aspects of statistical analysis to familiarize the reader with the interpretation of data in an experiment.