
Automatically Configuring (µ + λ) Evolutionary Algorithms

4.2 Incorporating Problem Difficulty Measure to Build Automatic Algo-

4.2.5 Automatically Configuring (µ + λ) Evolutionary Algorithms

Problem

The utility of the proposed automatic algorithm configuration method is illustrated by configuring (µ + λ) evolutionary algorithms (EAs) to solve the unique input output sequence problem. First, the unique input output sequence problem is introduced. Second, the target algorithm, the (µ + λ) EA, is described together with the domains of possible values for its parameters. Third, the performance metrics used to evaluate the target algorithm and the proposed automatic algorithm configuration method are defined, followed by a description of the training data set.

4.2.5.1 Unique Input Output Sequence Problem

Finite state machines (FSMs) have been widely used to model software, communication protocols and circuits [74]. In FSM-based testing, a standard test strategy consists of two parts, namely, transition test and tail state verification. The former aims to determine whether a transition of an implementation under test (IUT) produces the expected output, while the latter checks that the IUT arrives at the specified state when a transition test is finished. Nearly all FSMs have Unique Input/Output Sequences (UIOs) for each state [2], and UIOs are the most widely used technique to generate robust and compact test sequences in FSM-based testing.

Computing UIOs is an NP-hard problem [74]. Lee and Yannakakis [74] note that adaptive distinguishing sequences and UIOs may be produced by constructing a state splitting tree. However, no rule is explicitly defined to guide the construction of an input sequence. Metaheuristics have proven efficient and effective in providing good solutions to some NP-hard problems in software engineering. Similarly to other problems in search-based software engineering [53], the UIO problem has been reformulated as an optimisation problem, and several evolutionary algorithms (EAs), such as genetic algorithms and simulated annealing [32, 50], were developed to tackle it. The experimental results show that EAs can efficiently find UIOs for some FSMs.

Theoretical investigations have confirmed that EAs can outperform random search on the UIO problem [75]. The expected running time of the (1+1) EA on a counting FSM instance class is polynomial, while random search needs exponential time [75]. Since the UIO problem is NP-hard, one can expect that EA-hard instance classes exist. Theoretical results show that the EA configuration (operators and parameters) has an essential impact on how efficiently UIOs can be found for an FSM [76]. The remainder of this section introduces the definitions and notation of the UIO problem.

Definition 10 (Finite State Machine).

A finite state machine (FSM) is a quintuple $M = (S, X, Y, \delta, \lambda)$, where $X$, $Y$ and $S$ are finite and nonempty sets of input symbols, output symbols, and states, respectively; $\delta : S \times X \to S$ is the state transition function; and $\lambda : S \times X \to Y$ is the output function.

Definition 11 (Unique Input Output Sequence).

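A unique input output sequence for a state $s_i$ in an FSM is an input/output sequence $x/y$, where $x \in X^*$, $y \in Y^*$, $\lambda(s_i, x) = y$, and $\lambda(s_i, x) \neq \lambda(s_j, x)$ for all $s_j \neq s_i$ (with $\lambda$ extended from single input symbols to input sequences).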
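Definitions 10 and 11 can be made concrete with a small sketch. The three-state machine below is a hypothetical example that does not come from the cited works, and the helper names `run`, `is_uio` and `shortest_uio` are illustrative assumptions. The brute-force search is only meant to show what a UIO is, not how one would compute it in practice.

```python
from itertools import product

# Hypothetical 3-state FSM over inputs {0, 1} and outputs {"a", "b"}.
# DELTA[(state, input)] is the next state, LAMBDA[(state, input)] the output.
STATES = [0, 1, 2]
DELTA = {(0, 0): 1, (0, 1): 0,
         (1, 0): 2, (1, 1): 1,
         (2, 0): 0, (2, 1): 2}
LAMBDA = {(0, 0): "a", (0, 1): "b",
          (1, 0): "a", (1, 1): "a",
          (2, 0): "b", (2, 1): "a"}


def run(state, inputs):
    """Output function extended to sequences: the outputs seen from `state`."""
    outputs = []
    for symbol in inputs:
        outputs.append(LAMBDA[(state, symbol)])
        state = DELTA[(state, symbol)]
    return tuple(outputs)


def is_uio(state, inputs):
    """True if no other state yields the same output sequence on `inputs`."""
    expected = run(state, inputs)
    return all(run(other, inputs) != expected
               for other in STATES if other != state)


def shortest_uio(state, max_len):
    """Exhaustive search by increasing length; viable only for tiny machines."""
    for length in range(1, max_len + 1):
        for inputs in product((0, 1), repeat=length):
            if is_uio(state, inputs):
                return inputs, run(state, inputs)
    return None


if __name__ == "__main__":
    for s in STATES:
        print(s, shortest_uio(s, max_len=3))
```

In this toy machine, states 0 and 2 each have a UIO of length one, while state 1 needs two input symbols because every single input leaves it indistinguishable from some other state.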

Section 2 in [50] gives an example of the unique input output sequence problem. To generate a UIO using an EA, candidate solutions are represented by input strings restricted to $X^n = \{0, 1\}^n$, where $n$ is the number of states of the FSM. The task is to search for a UIO of input string length $n$ for state $s_1$ in all FSM instances. The fitness function is defined as a function of the state partition tree induced by the input sequence [50, 75, 76].

Definition 12 (UIO fitness function).

For an FSM $M$ with $m$ states, the fitness function $f : X^n \to \mathbb{N}$ is defined as $f(x) := m - \gamma_M(s, x)$, where $s$ is the initial state and $\gamma_M(s, x) := |\{t \in S \mid \lambda(s, x) = \lambda(t, x)\}|$.

The instance size of the UIO problem is defined as the length $n$ of the input sequence. The value of $\gamma_M(s, x)$ is the number of states in the leaf node of the state partition tree that contains state $s$, and lies in the interval from 1 to $m$. If the shortest UIO for state $s$ in FSM $M$ has length at most $n$, then $f(x)$ has an optimum of $m - 1$.
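The fitness function of Definition 12 can be sketched directly from its formula. The machine below is a hypothetical three-state example, and the names `output_sequence` and `uio_fitness` are assumptions introduced for illustration only.

```python
# Hypothetical 3-state FSM (m = 3) over inputs {0, 1}; not taken from the source.
STATES = [0, 1, 2]
DELTA = {(0, 0): 1, (0, 1): 0,
         (1, 0): 2, (1, 1): 1,
         (2, 0): 0, (2, 1): 2}
LAMBDA = {(0, 0): "a", (0, 1): "b",
          (1, 0): "a", (1, 1): "a",
          (2, 0): "b", (2, 1): "a"}


def output_sequence(state, x):
    """Outputs produced when the input sequence x is applied from `state`."""
    outputs = []
    for symbol in x:
        outputs.append(LAMBDA[(state, symbol)])
        state = DELTA[(state, symbol)]
    return tuple(outputs)


def uio_fitness(s, x):
    """f(x) = m - gamma_M(s, x), where gamma counts the states (s included)
    whose output sequence on x equals that of s."""
    target = output_sequence(s, x)
    gamma = sum(1 for t in STATES if output_sequence(t, x) == target)
    return len(STATES) - gamma


if __name__ == "__main__":
    print(uio_fitness(0, (1,)))  # x separates state 0 from every other state
    print(uio_fitness(0, (0,)))  # state 1 answers like state 0, so gamma = 2
```

Because $\gamma_M(s, x)$ always counts $s$ itself, the fitness ranges from 0 (no state is separated from $s$) to $m - 1$ (the input is a UIO for $s$).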

4.2.5.2 Target Algorithm

The (µ + λ) EA described in Algorithm 3 is employed to tackle the UIO problem. It has three parameters: population size, variation operator, and selection operator. The domains of possible values for these parameters are listed below:

• Population size: Three different (µ + λ) options are considered: {(4 + 4), (7 + 3), (3 + 7)}.

• Variation operator $N_j$ ($j = 1, 2, \ldots, 12$): Three variation operators with different probabilities are considered:

– $N_1(x)$ to $N_5(x)$: Bit-wise mutation, which flips each bit with probability $p = c/n$, where $c \in \{0.5, 1, 2, n/2, n - 1\}$;

– $N_{10}(x)$ to $N_{12}(x)$: Non-uniform mutation [20], which flips each bit $i$, $1 \le i \le n$, with probability $\chi(i) = c/(i + 1)$, where $c \in \{0.5, 1, 2\}$.

In total 12 variation operators are considered. These variation operators are applied to generate 12 fitness-probability clouds and obtain a set of 12 aep measures for each UIO instance.

• Selection operator $S_i$ ($i = 1, 2$): Two selection operators are considered:

– Truncation Selection: Sort all µ + λ individuals in $P^{(k)}$ and $P_m^{(k)}$ by their fitness values, then select the µ best individuals as the next generation $P^{(k+1)}$.

– Roulette Wheel Selection: Retain all the best individuals in $P^{(k)}$ and $P_m^{(k)}$ directly, and select the rest of the individuals of the population by roulette wheel.
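The operators listed above, together with the loop of Algorithm 3, can be sketched as follows. Several details are not fixed by the text and are assumptions here: each offspring mutates a uniformly chosen parent, roulette wheel sampling is done with replacement on fitness plus one, evaluations are counted as µ for the initial population plus λ per generation, and a OneMax stand-in replaces the UIO fitness so that the block runs on its own. All function names are illustrative.

```python
import random


def bitwise_mutation(x, c, rng):
    """Flip each bit independently with probability p = c / n."""
    p = min(1.0, c / len(x))
    return [b ^ 1 if rng.random() < p else b for b in x]


def nonuniform_mutation(x, c, rng):
    """Flip bit i (1-based) with probability chi(i) = c / (i + 1)."""
    return [b ^ 1 if rng.random() < min(1.0, c / (i + 1)) else b
            for i, b in enumerate(x, start=1)]


def truncation_selection(parents, offspring, fitness, mu, rng):
    """Keep the mu fittest individuals of parents and offspring combined."""
    return sorted(parents + offspring, key=fitness, reverse=True)[:mu]


def roulette_selection(parents, offspring, fitness, mu, rng):
    """Keep every best individual, then fill up by fitness-proportional draws."""
    pool = parents + offspring
    scores = [fitness(x) for x in pool]
    best = max(scores)
    chosen = [x for x, s in zip(pool, scores) if s == best][:mu]
    rest = [(x, s) for x, s in zip(pool, scores) if s != best]
    slots = mu - len(chosen)
    if slots > 0 and rest:
        # Assumption: weight = fitness + 1 so zero-fitness strings stay eligible.
        weights = [s + 1 for _, s in rest]
        chosen += rng.choices([x for x, _ in rest], weights=weights, k=slots)
    return chosen


def mu_plus_lambda_ea(fitness, n, mu, lam, mutate, select, optimum,
                      max_evals, seed=0):
    """Run the loop of Algorithm 3 until the optimum or the budget is reached."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(mu)]
    evals = mu  # assumed accounting: one evaluation per new individual
    while max(map(fitness, population)) < optimum and evals < max_evals:
        # Assumption: each offspring mutates a uniformly chosen parent.
        offspring = [mutate(rng.choice(population), rng) for _ in range(lam)]
        evals += lam
        population = select(population, offspring, fitness, mu, rng)
    return max(population, key=fitness), evals


if __name__ == "__main__":
    # OneMax (number of ones) is a placeholder for the UIO fitness function.
    best, evals = mu_plus_lambda_ea(
        fitness=sum, n=10, mu=4, lam=4,
        mutate=lambda x, rng: bitwise_mutation(x, 1.0, rng),
        select=truncation_selection, optimum=10, max_evals=20000)
    print(best, evals)
```

A configuration θ in this sketch corresponds to the choice of `mu`, `lam`, `mutate` and `select`, which is what the configuration method has to pick per instance.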

Algorithm 3: (µ + λ) EA

    Choose µ initial solutions $P^{(0)} = \{x_1^{(0)}, x_2^{(0)}, \ldots, x_\mu^{(0)}\}$ uniformly at random from $\{0, 1\}^n$
    $k \leftarrow 0$
    while termination criterion not satisfied do
        $P_m^{(k)} \leftarrow N_j(P^{(k)})$    %% mutation
        $P^{(k+1)} \leftarrow S_i(P^{(k)}, P_m^{(k)})$    %% selection
        $k \leftarrow k + 1$
    end

4.2.5.3 Performance Metric

The performance metric H and the performance evaluation function F need to be clearly defined. The former evaluates the performance of the algorithm configuration method over an entire set of testing instances; the latter evaluates the performance of the target algorithm when executed under configuration θ on a problem instance inst.

Definition 13 (Performance metric H and Performance evaluation function F).

Let inst be a UIO instance. Given a cut-off time t, F is defined as the number of function evaluations taken by the target algorithm executed under configuration θ to find a unique input output sequence for inst. H is defined as the mean value of F over an entire set of testing UIO instances. The lower the value of H, the better the algorithm configuration method.
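As a rough illustration of how F and H relate, the sketch below computes both for a made-up table of run costs. The definition does not say what F is when no UIO is found before the cut-off; charging the cut-off value in that case is an assumption made here, as are the names `performance_f`, `metric_h`, `toy_run` and the toy numbers.

```python
from statistics import mean

# Made-up evaluation counts per (instance, configuration); None marks a run
# that never finds a UIO. These numbers are illustrative, not results.
TOY_COSTS = {("inst_a", "theta1"): 120, ("inst_a", "theta2"): 400,
             ("inst_b", "theta1"): None, ("inst_b", "theta2"): 250}


def toy_run(inst, theta, cutoff):
    """Stand-in for the target algorithm: evaluations used, or None on failure."""
    cost = TOY_COSTS[(inst, theta)]
    return cost if cost is not None and cost <= cutoff else None


def performance_f(run_target, inst, theta, cutoff):
    """F: evaluations needed under theta on inst (cut-off charged on failure)."""
    evals = run_target(inst, theta, cutoff)
    return cutoff if evals is None else evals


def metric_h(configurator, run_target, instances, cutoff):
    """H: mean of F over the test instances, using the configurator's choice."""
    return mean([performance_f(run_target, inst, configurator(inst), cutoff)
                 for inst in instances])


if __name__ == "__main__":
    instances = ["inst_a", "inst_b"]
    fixed = lambda inst: "theta1"
    per_instance = lambda inst: "theta1" if inst == "inst_a" else "theta2"
    print(metric_h(fixed, toy_run, instances, cutoff=1000))
    print(metric_h(per_instance, toy_run, instances, cutoff=1000))
```

A configurator that picks a suitable θ per instance obtains a lower H than one that always returns the same θ, which is the sense in which lower H indicates a better configuration method.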

4.2.5.4 Training Data Generation

The SVM takes as input a set of training samples $\{D_1, \ldots, D_m\}$ and their labels $\{L_1, \ldots, L_m\}$, where $L_i \in \{1, -1\}$.

A training sample is $D_i = (PF, \theta)$. Here $PF$ denotes a problem instance, which is represented by a set of aep measures computed from the fitness-probability clouds generated using different neighbourhood operators. In this case, a UIO instance is represented by a set of 12 aep measures $\{aep_1, \ldots, aep_{12}\}$, computed from the fitness-probability clouds generated using 12 different neighbourhood operators. The component θ represents a configuration of the target algorithm, the (µ + λ) EA; the candidate configurations are listed in Section 4.2.5.2.

The label $L_i$ for a training sample $D_i$ is determined by the performance of the target algorithm, the (µ + λ) EA, executed under configuration θ on the UIO instance. 100 independent runs are performed, and the performance is measured by F averaged over these runs, that is, the mean number of function evaluations taken by the target algorithm to find a unique input output sequence for the UIO instance. This value is then compared to a pre-defined threshold to determine the label $L_i$: if F < threshold, $L_i = 1$; otherwise $L_i = -1$.
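The construction of one labelled training sample can be sketched as below. The numeric encoding of θ as (µ, λ, operator index, selection index) is an assumption, since the text does not state how a configuration is turned into SVM input; the function names, the aep values and the run counts are likewise illustrative.

```python
from statistics import mean


def label_from_runs(evals_per_run, threshold):
    """Return 1 if the mean number of evaluations is below threshold, else -1."""
    return 1 if mean(evals_per_run) < threshold else -1


def make_sample(aep_measures, theta):
    """Build D_i from 12 aep measures and a configuration theta.

    Assumption: theta is encoded as (mu, lam, operator_index, selection_index).
    """
    if len(aep_measures) != 12:
        raise ValueError("expected one aep measure per neighbourhood operator")
    mu, lam, operator_index, selection_index = theta
    return list(aep_measures) + [mu, lam, operator_index, selection_index]


if __name__ == "__main__":
    aep = [0.1] * 12                 # placeholder values, not measured data
    theta = (4, 4, 2, 1)             # (4 + 4) EA, operator N_2, selection S_1
    runs = [900, 1100, 1000]         # the text uses 100 runs; 3 shown for brevity
    sample = make_sample(aep, theta)
    label = label_from_runs(runs, threshold=1200)
    print(len(sample), label)
```

The comparison is strict, so a mean exactly equal to the threshold yields the label −1, matching the rule stated above.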