Conclusions - Optimization Techniques for Automated Software Test Data Generation

In this chapter we discussed the fundamentals of software testing from its beginnings. Testing is not a new concept: the Romans already used it to assess the quality of metals. The idea remains valid today, when the aim is to assess the quality of software. We also formally defined the testing problems addressed in this PhD dissertation. First, we defined the problems related to structural testing: the Test Data Generation Problem and the Multi-objective Test Data Generation Problem. We are aware that there is a non-negligible cost that is generally not considered, the oracle cost. For this reason, the formulation of these problems takes into account both the quality of the test suite and the oracle cost. Second, we defined the problems related to functional testing, which fall into two groups. The first group relates to the Classification Tree Method: the Prioritized Pairwise Test Data Generation Problem and the Test Sequence Generation Problem with the Extended Classification Tree Method. The second group relates to Software Product Lines (SPL): the Pairwise Test Data Generation Problem, with and without priorities, and the Multi-Objective Test Data Generation Problem in SPL. This chapter thus provides the reader with the exact versions of the problems solved throughout this PhD thesis.

Fundamentals of Metaheuristics

A heuristic is a simple and intuitive technique that uses problem-specific information to produce close-to-optimal solutions for a given complex problem. The solutions are not guaranteed to be optimal or perfect, but they are sufficient for the immediate goals [155]. When finding an optimal solution is impossible or impractical, heuristic methods can be used to speed up the process of finding a satisfactory solution. Heuristics can be seen as mental shortcuts that ease the cognitive load of making a decision. They are adapted to the problem at hand and try to take full advantage of its particularities. However, they can get trapped in a local optimum and thus fail, in general, to obtain the global optimum. In addition, they have the drawback of being problem-specific: a good heuristic for one problem will help little when solving a different problem. Consequently, a more general-purpose technique was proposed: metaheuristics [116].

Metaheuristics, in contrast, are more problem-independent techniques. They are generally conceived as high-level heuristics that employ heuristic methods, guiding them over the search space in order to exploit their best capabilities and achieve better solutions, especially with incomplete or imperfect information or limited computation capacity. Their main advantage over a simple heuristic is that they have mechanisms to avoid getting trapped in local optima. This chapter presents metaheuristics, the main techniques used to solve the different variants of the automatic test data generation problem tackled in this PhD thesis. We formally define what a metaheuristic is, classify the main metaheuristics, and introduce them ahead of their detailed description in the next chapter. Then, we explain the multi-objective paradigm, used when more than one objective must be optimized at the same time. Finally, we describe the quality indicators and the statistical methods used to measure the quality of the solutions and the practical importance of the results of this research.

3.1 Formal Definition

As a first approach, optimization techniques can be classified into exact and approximate. Exact techniques, which are based on finding the optimal solution mathematically or on an exhaustive search until the optimum is found, guarantee the optimality of the obtained solution. However, these techniques present some drawbacks. The time they require, though bounded, may be very large, especially for NP-hard problems. Furthermore, it is not always possible to find such an exact technique for every problem. As a result, exact techniques are often a poor choice, since both their time and memory requirements can become unreasonably high for large-scale problems. For this reason, approximate techniques have been widely used by the international research community in the last few decades. These methods sacrifice the guarantee of finding the optimum in favor of providing a satisfactory solution within a reasonable time [116].

Among approximate algorithms, we can find two types: ad hoc heuristics and metaheuristics. We focus this chapter on the latter, but first briefly describe ad hoc heuristics, which can in turn be divided into constructive heuristics and local search methods [172]. Constructive heuristics are usually the fastest methods. They construct a solution from scratch by iteratively incorporating components until a complete solution is obtained, which is returned as the algorithm output. Finding some constructive heuristic is easy in many cases, but the obtained solutions are generally of low quality, since simple rules are used for the construction. In fact, designing a constructive method that actually produces high-quality solutions is a nontrivial task, since it depends mainly on the problem and requires a thorough understanding of it. For example, in problems with many constraints it could happen that many partial solutions do not lead to any feasible solution.
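As an illustration of the constructive scheme just described, the following minimal Python sketch builds a solution component by component using a single simple rule. The problem (a tour over points in the plane) and the names `nearest_neighbour_tour` and `tour_length` are assumptions chosen only for this example; they do not come from the thesis.

```python
import math


def nearest_neighbour_tour(points, start=0):
    """Constructive heuristic (illustrative): grow a tour one city at a time."""
    unvisited = set(range(len(points)))
    unvisited.remove(start)
    tour = [start]
    while unvisited:
        last = points[tour[-1]]
        # Simple construction rule: append the closest city not yet in the tour
        # (ties broken by the smallest index).
        nxt = min(unvisited, key=lambda i: (math.dist(last, points[i]), i))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


def tour_length(points, tour):
    """Length of the closed tour that returns to its first city."""
    return sum(
        math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


points = [(0, 0), (3, 0), (1, 0), (2, 0)]
tour = nearest_neighbour_tour(points)
```

The rule is cheap to evaluate and always yields a complete solution, but nothing in it looks ahead, which is why solutions built this way are often of modest quality.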

Local search or gradient descent methods start from a complete solution. They rely on the concept of neighbourhood to explore the part of the search space defined around the current solution until they find a local optimum. The neighbourhood of a given solution s, denoted as N(s), is the set of solutions (neighbours) that can be reached from s through the use of a specific modification operator (generally referred to as a movement). A local optimum is a solution having equal or better objective function value than any other solution in its own neighbourhood. The process of exploring the neighbourhood, finding the best neighbour and keeping it, is repeated until a local optimum is found. Complete exploration of a neighbourhood is often impractical, so some modification of this generic scheme has to be adopted. Depending on the movement operator, the neighbourhood varies and so does the manner of exploring the search space, simplifying or complicating the search process as a result.

Out of the many descriptions of metaheuristics that can be found in the literature [35, 86], the following fundamental features can be highlighted:

• They are general strategies or templates that guide the search process.

• Their goal is to provide an efficient exploration of the search space to find (near) optimal solutions.

• They are not exact algorithms and their behavior is generally non-deterministic (stochastic).

• They may incorporate mechanisms to avoid visiting non-promising (or already visited) regions of the search space.

• Their basic scheme has a predefined structure.

• They may use specific problem knowledge for the problem at hand, by using some specific heuristic controlled by the high level strategy.

In other words, a metaheuristic is a general template for a stochastic process that has to be filled with specific data from the problem to be solved (solution representation, specific operators to manipulate them, etc.), and that can tackle problems with high dimensional search spaces. In these techniques, the success depends on the correct balance between diversification and intensification. The term diversification refers to the evaluation of solutions in distant regions of the search space (with some distance function previously defined for the solution space); it is also known as exploration of the search space. The term intensification refers to the evaluation of solutions in small bounded regions, or within a neighbourhood (exploitation of the search space). The balance between these two opposed aspects is of the utmost importance, since the algorithm has to quickly find the most promising regions (exploration), but also those promising regions have to be thoroughly searched (exploitation).
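A minimal Python sketch may help to see how the two notions interact. It combines best-improvement local search over a neighbourhood N(s) (intensification) with random restarts (diversification). The bit-string representation, the one-bit-flip movement and the toy objective `two_peaks` are assumptions made only for this illustration.

```python
import random


def neighbours(s):
    """N(s): all bit strings reachable from s by flipping exactly one bit."""
    return [s[:i] + (1 - s[i],) + s[i + 1:] for i in range(len(s))]


def local_search(s, f):
    """Best-improvement local search (maximization) until a local optimum."""
    while True:
        best = max(neighbours(s), key=f)
        if f(best) <= f(s):
            return s  # no neighbour is strictly better: s is a local optimum
        s = best


def random_restart_search(f, n, restarts, rng):
    """Diversify with random starting points, intensify with local search."""
    best = None
    for _ in range(restarts):
        start = tuple(rng.randint(0, 1) for _ in range(n))  # exploration
        candidate = local_search(start, f)                  # exploitation
        if best is None or f(candidate) > f(best):
            best = candidate
    return best


def two_peaks(s):
    """Toy objective: local optimum at all zeros, global optimum at all ones."""
    ones = sum(s)
    return max(2 * ones, len(s) - ones)


stuck = local_search((0,) * 8, two_peaks)
climbed = local_search((1, 1, 1, 0, 0, 0, 0, 0), two_peaks)
best = random_restart_search(two_peaks, n=8, restarts=20, rng=random.Random(0))
```

Started from the all-zeros string, plain local search cannot leave that local optimum, whereas the restarts give the search further chances of starting inside the basin of the better optimum.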

We can distinguish two kinds of search strategy in metaheuristics. First, there are “intelligent” extensions of local search methods (trajectory-based metaheuristics in Figure 3.1). These techniques add to the basic local search method some mechanism to escape from local optima, in which it would otherwise remain trapped. Tabu Search (TS) [84], Iterated Local Search (ILS) [86], Variable Neighbourhood Search (VNS) [157] and Simulated Annealing (SA) [119] are some techniques of this kind. These metaheuristics operate with a single solution at a time, and one (or more) neighbourhood structures. A different strategy is followed in Ant Colony Optimization (ACO) [64], Particle Swarm Optimization (PSO) [48] and Evolutionary Algorithms (EAs) [86]. These techniques operate with a set of solutions at any time (called colony, swarm or population, depending on the case), and use a learning factor as they, implicitly or explicitly, try to grasp the correlation between design variables in order to identify the regions of the search space with high-quality solutions (population-based techniques in Figure 3.1). In this sense, these methods perform a biased sampling of the search space.
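To make the trajectory-based idea concrete, the sketch below shows one common textbook form of Simulated Annealing: a single current solution, a neighbourhood move, and an acceptance rule that sometimes takes worsening moves in order to leave local optima. The geometric cooling schedule, the parameter values (`t0`, `alpha`, `steps`) and the toy integer objective are assumptions for illustration, not settings taken from this thesis.

```python
import math
import random


def simulated_annealing(f, start, neighbour, rng, t0=2.0, alpha=0.95, steps=2000):
    """Minimize f while keeping a single current solution (trajectory-based)."""
    current = best = start
    t = t0
    for _ in range(steps):
        candidate = neighbour(current, rng)
        delta = f(candidate) - f(current)
        # Improvements are always accepted; a worsening move is accepted with
        # probability exp(-delta / t), which shrinks as the temperature drops.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current = candidate
            if f(current) < f(best):
                best = current
        t = max(t * alpha, 1e-9)  # geometric cooling, floored to stay positive
    return best


def f(x):
    """Toy objective over the integers with its minimum at x = 3."""
    return (x - 3) ** 2


def step(x, rng):
    """Movement operator: move one unit left or right."""
    return x + rng.choice((-1, 1))


start = 20
result = simulated_annealing(f, start, step, random.Random(0))
```

A population-based method would replace the single `current` solution with a set of solutions that is sampled and filtered at every iteration.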

A formal definition of metaheuristics can be found in [138], with an extension in [45]. A basic formulation of a metaheuristic is presented in the following definition:

Definition 3.1.1 (Metaheuristic). A metaheuristic M is a tuple consisting of eight components as follows:

$$M = \langle T, \Xi, \mu, \lambda, \Phi, \sigma, U, \tau \rangle , \tag{3.1}$$

where:

• $T$ is the set of elements operated by the metaheuristic. This set contains the search space, and in many cases they both coincide.

• $\Xi = \{(\xi_1, D_1), (\xi_2, D_2), \ldots, (\xi_v, D_v)\}$ is a collection of $v$ pairs. Each pair is formed by a state variable of the metaheuristic and the domain of said variable.

• $\mu$ is the number of solutions operated by $M$ in a single step.

• $\lambda$ is the number of new solutions generated in every iteration of $M$.

• $\Phi : T^{\mu} \times \prod_{i=1}^{v} D_i \times T^{\lambda} \to [0, 1]$ represents the operator that produces new solutions from the existing ones. The function must verify for all $x \in T^{\mu}$ and for all $t \in \prod_{i=1}^{v} D_i$,

$$\sum_{y \in T^{\lambda}} \Phi(x, t, y) = 1 . \tag{3.2}$$

• $\sigma : T^{\mu} \times T^{\lambda} \times \prod_{i=1}^{v} D_i \times T^{\mu} \to [0, 1]$ is a function that selects the solutions that will be manipulated in the next iteration of $M$. This function must verify for all $x \in T^{\mu}$, $z \in T^{\lambda}$ and $t \in \prod_{i=1}^{v} D_i$,

$$\sum_{y \in T^{\mu}} \sigma(x, z, t, y) = 1 , \tag{3.3}$$

$$\forall y \in T^{\mu},\ \sigma(x, z, t, y) = 0 \;\lor\; \Big( \sigma(x, z, t, y) > 0 \;\land\; \big( \forall i \in \{1, \ldots, \mu\},\ (\exists j \in \{1, \ldots, \mu\},\ y_i = x_j) \lor (\exists j \in \{1, \ldots, \lambda\},\ y_i = z_j) \big) \Big) . \tag{3.4}$$

• $U : T^{\mu} \times T^{\lambda} \times \prod_{i=1}^{v} D_i \times \prod_{i=1}^{v} D_i \to [0, 1]$ represents the updating process for the state variables of the metaheuristic. This function must verify for all $x \in T^{\mu}$, $z \in T^{\lambda}$ and $t \in \prod_{i=1}^{v} D_i$,

$$\sum_{u \in \prod_{i=1}^{v} D_i} U(x, z, t, u) = 1 . \tag{3.5}$$

• $\tau : T^{\mu} \times \prod_{i=1}^{v} D_i \to \{false, true\}$ is a function that decides the termination of the algorithm.

The previous definition represents the typical stochastic behavior of most metaheuristics. In fact, the functions $\Phi$, $\sigma$ and $U$ should be considered as conditional probabilities. For instance, the value of $\Phi(x, t, y)$ is the probability of generating the offspring vector $y \in T^{\lambda}$, given that the current set of individuals in the metaheuristic is $x \in T^{\mu}$ and its internal state is given by the state variables $t \in \prod_{i=1}^{v} D_i$. Note that the constraints imposed on the functions $\Phi$, $\sigma$ and $U$ are what allow them to be interpreted as conditional probabilities.
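One possible way to read Definition 3.1.1 operationally is sketched below in Python. Instead of representing Φ, σ and U as probability functions, the sketch represents each one by a procedure that draws a sample from the corresponding conditional distribution. This is a modelling choice made for the example, not part of the definition. The class name `Metaheuristic`, the helper `make_mu_plus_lambda`, the OneMax objective and all parameter values are likewise hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Solution = Tuple[int, ...]  # here T is the set of fixed-length bit strings
State = Dict[str, int]      # current values t of the state variables


@dataclass
class Metaheuristic:
    mu: int
    lam: int
    sample_phi: Callable[[List[Solution], State, random.Random], List[Solution]]
    sample_sigma: Callable[
        [List[Solution], List[Solution], State, random.Random], List[Solution]
    ]
    sample_u: Callable[[List[Solution], List[Solution], State, random.Random], State]
    tau: Callable[[List[Solution], State], bool]

    def run(self, x, t, rng):
        while not self.tau(x, t):
            z = self.sample_phi(x, t, rng)       # z drawn from Phi(x, t, .)
            y = self.sample_sigma(x, z, t, rng)  # y drawn from sigma(x, z, t, .)
            # Size checks, plus constraint (3.4): survivors come from x or z.
            assert len(z) == self.lam and len(y) == self.mu
            assert all(s in x or s in z for s in y)
            t = self.sample_u(x, z, t, rng)      # u drawn from U(x, z, t, .)
            x = y
        return x, t


def onemax(s):
    """Toy objective (assumption): number of ones in the bit string."""
    return sum(s)


def make_mu_plus_lambda(mu, lam, n, max_iters):
    """Hypothetical instance: a (mu + lambda) evolutionary algorithm."""

    def phi(x, t, rng):
        offspring = []
        for _ in range(lam):
            parent = rng.choice(x)
            # Bit-flip mutation with rate 1/n.
            offspring.append(
                tuple(1 - b if rng.random() < 1.0 / n else b for b in parent)
            )
        return offspring

    def sigma(x, z, t, rng):
        # Truncation selection: a degenerate distribution with all mass on one y.
        return sorted(x + z, key=onemax, reverse=True)[:mu]

    def update(x, z, t, rng):
        return {"iteration": t["iteration"] + 1}

    def tau(x, t):
        return t["iteration"] >= max_iters or max(map(onemax, x)) == n

    return Metaheuristic(mu, lam, phi, sigma, update, tau)


rng = random.Random(1)
n = 20
initial = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(5)]
algo = make_mu_plus_lambda(mu=5, lam=10, n=n, max_iters=200)
final, state = algo.run(initial, {"iteration": 0}, rng)
```

In this instance the only state variable is an iteration counter, so Ξ contains a single pair. A technique such as Simulated Annealing would fit the same template with µ = λ = 1 and the temperature as a state variable.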
