A very important concept is effect strength. In computer science it appears, for example, in the term "silver bullet", which is used to denote strong factors. In software engineering experimentation, effect strength is directly related to the number of participants needed within an experiment. The analysis of this relationship, which should be part of an experiment as first described by Miller et al. [62] and later refined by Miller [61], is called power analysis. Its result is a probability value describing how likely an experiment is to successfully show a given effect strength for a given number of participants.
One source for estimating the required number of participants is the work of Cohen [20], which covers statistical tests like t-tests, χ²-tests, or correlation analysis (amongst others). Cohen presents power tables which, for a given number of participants n and an effect strength d, give the likelihood that an experiment positively shows this effect strength. Effect strength is defined based on the type of
test, and for the t-test the effect strength d is given as
$$d = \frac{m_a - m_b}{\sigma}$$
with $m_a$ and $m_b$ being the population means (of the original measurement) and
σ as the standard deviation. For example, a medium effect strength of 0.5 would have a probability of 0.8 of being shown by an experiment in which 80 participants took part. For a weak effect strength of 0.2 to reach the same probability, 500 participants would be needed, while a strong effect of 1.0 would need only 21 participants. Another source of power estimates is [12], which provides a table covering multiple tests that can also be used as a rule of thumb.
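The relationship between effect strength d, number of participants n, and power can be illustrated numerically. The following Python sketch uses a normal approximation for a one-sided comparison of two groups with n participants each. The function names, the significance level of 0.05, and the one-sided setup are assumptions made for illustration and are not taken from Cohen [20], so the results need not match the table values quoted above.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a one-sided two-group comparison.

    d           -- effect strength (difference in means divided by sigma)
    n_per_group -- participants in each of the two groups (assumed equal)
    alpha       -- significance level (assumed, not taken from the source)
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha)
    # Under the alternative, the test statistic is shifted by d * sqrt(n / 2).
    return _STD_NORMAL.cdf(d * sqrt(n_per_group / 2) - z_crit)


def approx_n_per_group(d, power=0.8, alpha=0.05):
    """Smallest group size for which the approximation reaches the given power."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha)
    z_power = _STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)


if __name__ == "__main__":
    for d in (0.2, 0.5, 1.0):
        n = approx_n_per_group(d)
        print(f"d={d}: n per group={n}, power={approx_power(d, n):.3f}")
```

Under these assumptions the required group size grows with the inverse square of d, which is why weak effects demand far more participants than strong ones.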
A power analysis was executed only for the refactoring experiment. As bootstrap methods were used for this experiment, the power analysis was executed using a resampling-based approach. To calculate the probability, experiments are simulated based on the original data. The resulting data is compared to the observed difference in means, and the fraction of simulated experiments with the same or a higher difference is used to estimate the power of an experiment. By using more participants within the simulation, power estimates for higher values of n can be calculated. An example of this post hoc analysis process is presented in Section 5.3.3.
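The resampling idea can be sketched as follows. This is a minimal illustration and not the procedure of Section 5.3.3: the function name `resampled_power`, the number of runs, the fixed seed, and the rule that a simulated experiment counts as a success once its difference in means reaches a caller-supplied `threshold` are all assumptions. Passing the observed difference in means as `threshold` corresponds to the comparison described above.

```python
import random
from statistics import mean


def resampled_power(group_a, group_b, n, threshold, runs=2000, seed=0):
    """Estimate power by simulating experiments from the original data.

    group_a, group_b -- original measurements of the two groups
    n                -- simulated participants per group
    threshold        -- difference in means a simulated experiment must reach
                        (assumed decision rule, see the text above)
    runs             -- number of simulated experiments (assumed value)
    seed             -- fixed seed so that results are reproducible
    """
    rng = random.Random(seed)
    successes = 0
    for _ in range(runs):
        # Draw n simulated participants per group, with replacement.
        sim_a = rng.choices(group_a, k=n)
        sim_b = rng.choices(group_b, k=n)
        if mean(sim_a) - mean(sim_b) >= threshold:
            successes += 1
    # Fraction of simulated experiments that reached the threshold.
    return successes / runs


if __name__ == "__main__":
    # Hypothetical measurements, for illustration only.
    a = [5, 6, 7, 8, 9]
    b = [4, 5, 6, 7, 8]
    for n in (5, 20, 50):
        print(n, resampled_power(a, b, n, threshold=0.5))
```

Increasing `n` beyond the original group size is what allows estimates for larger experiments, at the cost of assuming that the original data is representative of further participants.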
The relationship between effect strength, number of participants, and the likelihood of showing a difference within an experiment is especially important during the design phase of an experiment. Sample sizes of 30 participants or more, as proposed for the human sciences, appear to be a stable value, at least for medium to strong effects. Because power can be estimated from data collected so far, an experiment may be tested for power during its actual execution. One rather uncommon approach would therefore be to start an experiment and continue executing it over time. In contrast, most experiments are executed only for a certain, short period of time. Apart from higher power values, such a long-running execution may also make funding a bit easier, as it can be spread over a longer period of time.
19 Results
This last chapter presents the main contributions of this thesis to embedded software engineering and software engineering experimentation. The chapter closes with plans for future work, consisting of a perfectly predictable software engineering experiment.
19.1 Agility and Embedded Systems
Although a direct positive effect of a single agile technique could not be found, some hints on the benefits of agility were found. Regarding refactoring, a reduced memory consumption for the program text of the resulting program was found. Although this was not part of a major hypothesis, the differences were found to be unlikely to be caused by randomness. The reason for this reduced memory consumption lies in the aggregation of code into single functions or, in a more general sense, in the design principle of "once and only once". As memory consumption is important for embedded systems, applying refactoring might help to conserve resources in this regard.
The analysis of types of work presented in Chapter 4 revealed that, when the technique of short releases is used, most of the function-related code is developed constantly throughout a project. In contrast, initial planning appears to lead to architecture-related development in the beginning and functional work in the end. A rather interesting result is that a higher fraction of work in short releases is spent on defect-related development and general change of software. Two interpretations of this could be given: either short iterations lead to a higher amount of failures and change, or, due to early functional development, problems are found earlier. Projects with long planning appear to have long phases where no defect-related or change-related work is done. Only at the very end does the fraction of work for these issues increase.
The final indirect result concerning agility is that the basic work style of programming is rather based on randomness (cf. Chapter 12). The randomness can be observed in the form of lines of code that are created but later deleted and thus do not appear in the final version of the program. In addition, the order in which modules are worked on does not follow a straight sequence. Changes may
occur during later stages of a software development task. This observation leads to an indirect justification of using Agile Methods, as the ability to cope with change is said to be one major property of Agile Methods [7].
Negative aspects of Agile Methods can be measured for refactoring, where an overhead for this technique can be shown. This overhead originates from the additional changes enforced by the technique. As no structural improvement was observed in return, the observation is considered problematic. Regarding the analysis of work types presented in Chapter 4, one unexpected observation was a peak in absolute development time for the short-iteration-based development right before the end of the project. As agility was assumed to circumvent such effects through constant addition of functionality and defect reduction, the results appear to support longer planning phases, which provide a fixed development sequence.
The last problem of Agile Methods is that their effect strength may not be sufficient to justify their application. Even though it is possible that their effect strength is high and merely by chance did not show in the experiments carried out, it appears to be too low to be in the area of a "silver bullet". Especially when compared to the tool chain that was used, the subjective impression is that better tools (external test environments, extensive peripheral libraries, a faster compile and debug cycle) would have improved the development more than an agile technique. The benefit of using superior tool chains might be caused by the software structure of embedded systems. This structure is shaped by real-time requirements and hardware dependence, both of which are technical problems. Solving these technical problems might only be possible with development methods designed directly for the needs of embedded software engineering.