Empiricism in a Software Engineering Context

Why should we perform experiments and other empirical studies in software engineering? The major reason for carrying out quantitative empirical studies is the opportunity to obtain objective and statistically significant results regarding the understanding, control, prediction, and improvement of software development.

Empirical studies are an important input to decision-making in an improvement-seeking organization.

Before introducing new techniques, methods, or other ways of working, an empirical assessment of the virtues of such changes is preferred. In this section, a framework for evaluation of software process changes is presented, where different empirical strategies are suggested in three different contexts: desktop, laboratory, and development projects.

To be successful in software development there are some basic requirements [7, 8, 42]:

1. Understanding of the software process and product.

2. Definition of process and product qualities.

3. Evaluation of successes and failures.

4. Information feedback for project control.

5. Learning from experience.

6. Packaging and reuse of relevant experience.

Empirical studies are important in supporting the achievement of these requirements. They fit into the context of industrial and academic software engineering research, as well as into a learning organization seeking continuous improvement. An example of a learning organization, called the Experience Factory, is proposed by Basili in conjunction with the Quality Improvement Paradigm [7], as described later in this section. This approach also includes a mechanism for defining and evaluating a set of operational goals using measurement. This mechanism, called the Goal/Question/Metric (GQM) method [17], is also described below, and in more detail by van Solingen and Berghout [172].

2.9.1 Empirical Evaluation of Process Changes

An improvement-seeking organization wants to assess the impact of process changes (e.g., a new method or tool) before introducing them to improve the way of working. Empirical studies are important in order to get objective and quantifiable information on the impact of changes. In Sects. 2.2–2.4, three empirical strategies are described: surveys, case studies and experiments, and they are compared in Sect. 2.5. This section describes how the strategies may be used when software process changes are evaluated [177]. The objective is to discuss the strategies in terms of a suitable way of handling technology transfer from research to industrial use. Technology transfer and some different steps in that process in relation to using empirical strategies are discussed in Sect. 2.10.

Fig. 2.1 Surveys, experiments and case studies. [Figure: the research environments and their strategies ordered from small scale and low risk to large scale and high risk: desktop (survey), laboratory (experiment), real life projects (case study).]

In Fig. 2.1, the strategies are placed in appropriate research environments. The order of the strategies is based on the ‘normal’ size of the study. The objective is to order the studies based on how they typically may be conducted to enable a controlled way of transferring research results into practice. Because a survey does not intervene in the software development to any large extent, the risk it carries is small.

An experiment is mostly rather limited in comparison to a real project and the case study is typically aimed at one specific project. Furthermore, an experiment may be carried out in a university environment prior to doing a study in industry, hence lowering the cost and risk, see also Linkman and Rombach [113].

The research environments are:

Desktop The change proposal is evaluated off-line without executing the changed process. Hence, this type of evaluation does not involve people who apply the method, tool, etc. In the desktop environment, it is suitable to conduct surveys, for example, through interview-based evaluations and literature studies.

Laboratory The change proposal is evaluated in an off-line laboratory setting (in vitro¹), where an experiment is conducted and a limited part of the process is executed in a controlled manner.

Real life The change proposal is evaluated in a real life development situation, i.e., it is observed on-line (in vivo²). This involves, for example, pilot projects. In this environment it is often too expensive to conduct controlled experiments. Instead, case studies are often more appropriate.

In Fig. 2.1, the placement of the different research environments indicates an increase in scale and risk. In order to try out, for example, a new design method in a large-scale design project and in a realistic environment, we may apply it in a development project as a pilot study. This is, of course, more risky compared to a laboratory or desktop study, as failure of the process change may endanger the quality of the delivered product. Furthermore, it is often more expensive to carry out experiments and case studies than desktop evaluations, as a desktop study does not involve the execution of a development process. It should be noted that the costs refer to the cost of investigating the same thing. For example, it is probably less costly to first interview people about the expected impact of a new review method than to perform a controlled experiment, which in turn is less costly than actually using the new method in a project, with the risks involved in adopting new technology.

¹ Latin for “in the glass”; refers to chemical experiments in the test tube.

² Latin for “in life”; refers to experiments in a real environment.

Before a case study is carried out in a development project, limited studies in the desktop environment, the laboratory environment, or both should be carried out to reduce risks. However, there is no general conclusion on order and cost; for every change proposal, a careful assessment should be made of which empirical strategies are most effective for the specific situation. The key issue is to choose the best strategy based on cost and risk. In many cases it is recommended to start on a small scale and then scale the study up as knowledge increases and risk decreases.
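The typical small-to-large progression described above can be illustrated with a short sketch. It is only an illustration under stated assumptions: the `Environment` type, the numeric `rank` values, and the `max_rank` parameter are hypothetical names introduced here, and the fixed ordering reflects the usual case in Fig. 2.1, not a general rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Environment:
    name: str
    strategy: str
    rank: int  # assumed position on the scale/risk axis of Fig. 2.1 (1 = smallest)


# The three research environments from the text, in their typical order.
ENVIRONMENTS = [
    Environment("desktop", "survey", 1),
    Environment("laboratory", "experiment", 2),
    Environment("real life", "case study", 3),
]


def plan_evaluation(max_rank: int) -> list[Environment]:
    """Return the environments to use, smallest scale and risk first.

    max_rank is a hypothetical knob: the largest-scale environment the
    organization is currently prepared to use for this change proposal.
    """
    staged = sorted(ENVIRONMENTS, key=lambda env: env.rank)
    return [env for env in staged if env.rank <= max_rank]


# Example: evaluate a change proposal off-line only (desktop, then laboratory).
for env in plan_evaluation(max_rank=2):
    print(f"{env.name}: {env.strategy}")
```

In practice the order and the stopping point would be decided per change proposal, based on the assessed cost and risk.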

Independently of which research strategy we use, there is a need for methodology support in terms of how to work with improvement, how to collect data, and how to store the information. These issues are discussed in the following subsections.

2.9.2 Quality Improvement Paradigm

The Quality Improvement Paradigm (QIP) [7] is a general improvement scheme tailored for the software business. QIP is similar to the Plan/Do/Study/Act cycle [23, 42], and includes six steps as illustrated in Fig. 2.2.

These steps are explained below [16].

1. Characterize. Understand the environment based upon available models, data, intuition, etc. Establish baselines with the existing business processes in the organization and characterize their criticality.

2. Set goals. On the basis of the initial characterization and of the capabilities that have a strategic relevance to the organization, set quantifiable goals for successful project and organization performance and improvement. The reasonable expectations are defined based upon the baseline provided by the characterization step.

3. Choose process. On the basis of the characterization of the environment and the goals that have been set, choose the appropriate processes for improvement, and supporting methods and tools, making sure that they are consistent with the goals that have been set.

4. Execute. Perform the product development and provide project feedback based upon the data on goal achievements that are being collected.

5. Analyze. At the end of each specific project, analyze the data and the information gathered to evaluate the current practices, determine problems, record findings, and make recommendations for future project improvements.

6. Package. Consolidate the experience gained in the form of new, or updated and refined, models and other forms of structured knowledge gained from this and prior projects.

Fig. 2.2 The six steps of the Quality Improvement Paradigm [7]. [Figure: a cycle of the steps Characterize, Set goals, Choose process, Execute, Analyze, and Package.]
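The six steps can be read as one pass through a loop whose output feeds the next pass. The following is a minimal sketch of that reading, not an implementation prescribed by the QIP: the function `run_qip_cycle`, the handler dictionary, and all numbers (a baseline of 10.0, a 10 % improvement target, a measured value of 8.5) are hypothetical and chosen only to make the example run.

```python
from typing import Callable

# The six QIP steps in the order given in the text.
QIP_STEPS = [
    "characterize",
    "set goals",
    "choose process",
    "execute",
    "analyze",
    "package",
]


def run_qip_cycle(
    handlers: dict[str, Callable[[dict], dict]], experience: dict
) -> dict:
    """Run one pass through the six steps.

    Each handler receives the state accumulated so far and returns the
    entries it adds or updates. The returned state can seed the next pass.
    """
    state = dict(experience)
    for step in QIP_STEPS:
        state.update(handlers[step](state))
    return state


# Hypothetical handlers; the measure could be, e.g., defects per KLOC.
handlers = {
    "characterize": lambda s: {"baseline": s.get("baseline", 10.0)},
    "set goals": lambda s: {"target": s["baseline"] * 0.9},
    "choose process": lambda s: {"process": "code inspections"},
    "execute": lambda s: {"measured": 8.5},  # assumed project outcome
    "analyze": lambda s: {"goal_met": s["measured"] <= s["target"]},
    "package": lambda s: {"baseline": s["measured"]},  # refined baseline
}

result = run_qip_cycle(handlers, {"baseline": 10.0})
print(result["goal_met"], result["baseline"])
```

The design point the sketch tries to show is that Package writes back what Characterize reads, so each cycle starts from an updated baseline.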

The QIP implements two feedback cycles [16], see also Fig. 2.2:

• The project feedback cycle (control cycle) is the feedback provided to the project during the execution phase. Whatever the goals of the organization, the project used as a pilot should use its resources in the best possible way; therefore, quantitative indicators at project and task level are useful in order to prevent and solve problems.

• The corporate feedback cycle (capitalization cycle) is the feedback loop that is provided to the organization. It has a double purpose: to provide analytical information about project performance at project completion time, by comparing the project data with the nominal range in the organization and analyzing concordance and discrepancy, and to accumulate reusable experience in a form that is useful and applicable to other projects.
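Comparing project data with the organization's nominal range can be sketched as follows. The text does not define how the nominal range is computed; the sketch assumes, purely for illustration, that it is the mean of earlier projects plus or minus `k` sample standard deviations. The function names and the example values are hypothetical.

```python
from statistics import mean, stdev


def nominal_range(history: list[float], k: float = 1.0) -> tuple[float, float]:
    """Assumed definition: mean of past projects +/- k sample standard deviations."""
    centre = mean(history)
    spread = stdev(history)
    return (centre - k * spread, centre + k * spread)


def compare_to_nominal(value: float, history: list[float], k: float = 1.0) -> str:
    """Classify a completed project's value as concordant or discrepant."""
    low, high = nominal_range(history, k)
    if value < low:
        return "below nominal range"
    if value > high:
        return "above nominal range"
    return "within nominal range"


# Hypothetical data: defect densities of three earlier projects.
history = [4.0, 5.0, 6.0]
print(nominal_range(history))            # (4.0, 6.0)
print(compare_to_nominal(7.5, history))  # above nominal range
```

A discrepancy found this way is a prompt for analysis, not a verdict; the cause may lie in the project, the process, or the baseline itself.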

2.9.3 Experience Factory

The QIP is based on the premise that improving software development requires continuous learning. Experience should be packaged into experience models that can be effectively understood and modified. Such experience models are stored in a repository called the experience base. The models are accessible and can be modified for reuse in current projects.

Fig. 2.3 Experience Factory. [Figure: the Experience Factory, with its experience base and project support, alongside the Project Organization, annotated with the six QIP steps: 1. Characterize, 2. Set goals, 3. Choose process, 4. Execute, 5. Analyze, 6. Package.]

QIP focuses on a logical separation of project development (performed by the Project Organization) from the systematic learning and packaging of reusable experience (performed by the Experience Factory) [8]. The Experience Factory is thus a separate organization that supports product development by analyzing and synthesizing all kinds of experience, acting as a repository for such experience, and supplying that experience to various projects on demand, see Fig. 2.3.

The Experience Factory packages experience by “building informal, formal, or schematised models and measures of various processes, products, and other forms of knowledge via people, documents, and automated support” [16].

The goal of the Project Organization is to produce and maintain software. The project organization provides the Experience Factory with project and environment characteristics, development data, resource usage information, quality records, and process information. It also provides feedback on the actual performance of the models processed by the experience factory and utilized by the project.

The Experience Factory processes the information received from the development organization, and returns direct feedback to each project, together with goals and models tailored from similar projects. It also provides baselines, tools, lessons learned, and data, tailored to the specific project.
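The exchange between the two organizations can be sketched as a small repository that stores packaged experience and returns what was learned in similar projects. This is a toy model under stated assumptions: the classes `ExperiencePackage` and `ExperienceBase`, the method names, and the notion of similarity (number of shared project characteristics) are all hypothetical and not taken from the Experience Factory literature.

```python
from dataclasses import dataclass, field


@dataclass
class ExperiencePackage:
    name: str
    characteristics: set[str]  # project and environment characteristics
    lessons: list[str] = field(default_factory=list)


class ExperienceBase:
    """Toy repository: stores packages and retrieves those from similar projects."""

    def __init__(self) -> None:
        self._packages: list[ExperiencePackage] = []

    def package(self, pkg: ExperiencePackage) -> None:
        """Store experience delivered by a project organization."""
        self._packages.append(pkg)

    def similar_to(
        self, characteristics: set[str], min_overlap: int = 1
    ) -> list[ExperiencePackage]:
        """Return packages sharing at least min_overlap characteristics,
        most similar first (assumed similarity: size of the overlap)."""
        scored = [
            (len(pkg.characteristics & characteristics), pkg)
            for pkg in self._packages
        ]
        matches = [pair for pair in scored if pair[0] >= min_overlap]
        matches.sort(key=lambda pair: pair[0], reverse=True)
        return [pkg for _, pkg in matches]


# Hypothetical usage: a new embedded project asks for relevant experience.
base = ExperienceBase()
base.package(ExperiencePackage("inspection model", {"embedded", "C"},
                               ["inspect interfaces first"]))
base.package(ExperiencePackage("estimation model", {"web", "Java"}))
for pkg in base.similar_to({"embedded", "C", "safety-critical"}):
    print(pkg.name, pkg.lessons)
```

A real experience base would also hold baselines, tools, and data, and tailoring would involve more than a lookup.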

To be able to improve, a software development organization needs to introduce new technology. It needs to experiment and record its experiences from development projects and eventually change the current development process. When the technology is substantially different from the current practice, the evaluation may be off-line in order to reduce risks. The change evaluation, as discussed above, may take the form of a controlled experiment (for detailed evaluation in the small) or of a case study (to study the scale effects). In both cases, the Goal/Question/Metric method, as described subsequently, provides a useful framework.

Fig. 2.4 GQM model hierarchical structure. [Figure: goals at the conceptual level are refined into questions at the operational level, which are in turn associated with metrics at the quantitative level.]

2.9.4 Goal/Question/Metric Method

The Goal/Question/Metric (GQM) method [17, 26, 172] is based upon the assumption that for an organization to measure in a purposeful way it must:

1. Specify the goals for itself and its projects,

2. Trace those goals to the data that is intended to define those goals operationally, and

3. Provide a framework for interpreting the data with respect to the stated goals.

The result of the application of the GQM method is a specification of a measurement model targeting a particular set of issues and a set of rules for the interpretation of the measurement data.

The resulting measurement model has three levels, as illustrated by the hierarchical structure in Fig. 2.4:

1. Conceptual level (Goal). A goal is defined for an object, for a variety of reasons, with respect to various models of quality, from various points of view, relative to a particular environment. Objects of measurement are products, processes, and resources (see also Chap. 3).

2. Operational level (Question). A set of questions is used to characterize the way the assessment/achievement of a specific goal is going to be performed based on some characterization model. Questions try to characterize the objects of measurement (product, process and resource) with respect to a selected quality aspect and to determine its quality from the selected viewpoint.

3. Quantitative level (Metric). A set of data is associated with every question in order to answer it in a quantitative way (either objectively or subjectively).

The process of setting goals is critical to the successful application of the GQM method. Goals are formulated based on (1) policies and strategies of the organization, (2) descriptions of processes and products, and (3) organization models. When goals have been formulated, questions are developed based on these goals. Once the questions have been developed, we proceed to associating the questions with appropriate metrics.
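The three levels and the top-down derivation (goal, then questions, then metrics) map naturally onto a small tree structure. The sketch below is one possible representation, not a format defined by the GQM method; the class names, the `objective` flag, and the example goal about a review method are hypothetical. It also assumes that one metric may serve more than one question.

```python
from dataclasses import dataclass, field


@dataclass
class Metric:  # quantitative level
    name: str
    objective: bool = True  # False for subjectively collected data


@dataclass
class Question:  # operational level
    text: str
    metrics: list[Metric] = field(default_factory=list)


@dataclass
class Goal:  # conceptual level
    purpose: str
    questions: list[Question] = field(default_factory=list)

    def metric_names(self) -> list[str]:
        """Trace the goal down to the distinct metrics that operationalize it."""
        names: list[str] = []
        for question in self.questions:
            for metric in question.metrics:
                if metric.name not in names:
                    names.append(metric.name)
        return names


# Hypothetical example: evaluating a new review method.
defects = Metric("defects found per review")
effort = Metric("review effort in hours")
rating = Metric("reviewer satisfaction rating", objective=False)

goal = Goal(
    "Evaluate the effectiveness of a new review method",
    [
        Question("How many defects does a review find?", [defects]),
        Question("What does a found defect cost?", [defects, effort]),
        Question("How do reviewers perceive the method?", [rating]),
    ],
)
print(goal.metric_names())
```

Because every metric hangs under at least one question, data that cannot be traced back to a goal never enters the measurement model.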

Practical guidelines on how to use GQM for measurement-based process improvement are given by Briand et al. [26] and by van Solingen and Berghout [172].

In Chap. 3, general aspects of measurement are further described.
