Genetic Algorithm-based Optimized Test Case Design Using UML
Santi Swarup Basa 1*, Santosh Kumar Swain 2 and Durga Prasad Mohapatra 3
*1 Assistant Professor, Department of Computer Science, North Orissa University, Baripada, INDIA.
2 Professor, Department of Computer Engineering, KIIT Deemed to be University, Bhubaneswar, INDIA.
3 Professor, Department of Computer Engineering, NIT, Rourkela, INDIA.
email: 1[email protected], 2[email protected], 3[email protected]
(Received on: September 7, 2018)
ABSTRACT
Thorough and effective testing of large software is time-consuming. Test case design is an important task among the testing activities of software development, and automating it is valuable for testing a software product. In this paper, we use the UML state chart diagram for early test case generation. Feasible and optimized paths are generated using a Genetic Algorithm. The proposed approach generates optimized and efficient test cases, which help to uncover state-based interaction faults, sequence and scenario faults, and transaction-based condition faults.
Keywords: Test case, Genetic Algorithm, Unified Modeling Language, State Chart.
1. INTRODUCTION
Software systems play an increasingly important role in society, both economically and socially, and there is always demand for good software systems. Software testing plays an important role in developing good-quality, reliable software. Testing can be carried out manually or automatically by generating test cases. Model-based automated software testing saves time and cost if it is started from the design phase of software development [1]. It provides a disciplined approach to building robust, good-quality software. A test case is a sequence of inputs and outputs satisfying specific coverage criteria in a particular state. With the help of test cases, testers find faults where the requirement specifications are not satisfied and check the output against the input [2,3]. UML models are a good source of information for model-based test case generation, so UML-based test case generation improves the quality and reliability of software [2,4,5].
UML state chart diagrams are used to elaborate the changes between the different states of an object in a use case [6,7]. A UML state chart model represents the behavioral information of each object when it participates in the execution of a certain activity [4]. We can obtain test cases from UML state chart diagrams for unit-level testing. The state chart model of a system generates test requirements to meet state-based and transition-based coverage criteria [8,9,10]. A good test case should cover as many test objectives as possible. The efficiency of a testing process relies on the quality, not the quantity, of test cases. In this paper, we present a technique to generate optimal test cases by eliminating redundant ones, so that the time consumed in the testing phase can be reduced.
In this paper, a test case generation and optimization technique from UML state chart diagrams is proposed, based on the total cost of each path traversed. The proposed technique uses a Genetic Algorithm (GA) for test sequence generation and optimization; test cases are then generated from the test sequences. The generated test cases can be used to identify state-interaction and state-sequence faults using the pre- and post-conditions of the events of transitions.
The rest of the paper is structured as follows. Section 2 describes basic concepts of the UML state chart diagram, graphs and testing. Section 3 gives a brief outline of GA. Section 4 discusses related work on test case generation and optimization from UML diagrams. Section 5 describes the test case generation algorithm applying GA. In Section 6, the proposed algorithm is applied to a Student Registration for Seminar case study. Finally, the conclusion and future scope are given in Section 7.
2. BASIC CONCEPTS
2.1 UML State Chart Diagram
The state chart diagram is one of the behavioral diagrams in UML; it shows the transitions between the states of an object and models the dynamic aspects of a system. It illustrates the states an object attains during the execution of a system activity as well as the transitions among those states. A transition from one state to another happens when an event occurs in a specific state of the object. This diagram is specially designed for displaying the behavior of an object. Each state chart diagram has a start state, represented by a dark circle, and one or more final states, each represented by a bordered dark circle [12].
It specializes in describing particular types of behavior, namely the shifts from one state to another. The essential features that a state chart diagram describes are states and transitions [11,13]. States are represented by boxes with rounded corners. Each arrow showing the flow from one state to another is called a transition and is marked with an event [12].
State
A state captures the information that an object can hold and shows the current condition of the object. A state has three compartments [14]:
Name: the name of the state is written in the name compartment.
Internal activities: this compartment lists the internal activities of the state.
Internal transitions: this compartment lists the internal transitions of the state.
Fig.1: Different notations of state chart diagram
Event
An event is an occurrence that triggers a transition. An event instance has some consequence for the system. Events fall into categories such as change event, signal event, call event, time event, and trigger [15,16].
Transition
A transition connects two states and shows a permitted change from one state to another [17].
2.2 Graph Concepts
Graph: A graph G is an ordered pair (V, E) of a set of vertices V and a set of edges E. The finite set V contains the nodes, and E contains the arcs connecting the nodes, each represented by a 2-element subset of V.
Graph Traversal: The process of visiting each vertex in a graph is called graph traversal. Traversal techniques are characterized by the order in which the vertices of the graph are visited. Depth-First Search (DFS) and Breadth-First Search (BFS) are the two techniques of graph traversal.
State Transition Graph: A State Transition Graph (STG) is a graph in which each node represents a state and each edge represents a transition of the UML state chart diagram. It shows the control flow and the state changes that follow the occurrence of an event in a specific state of an object during a particular use case scenario.
[Fig. 1 labels: initial state, final state, state name, event name, transition; example states "Unpaid" and "Paid" with the labels "Invoice created", "paying" and "Invoice destroyed".]
2.3 Testing Concepts
In this subsection, we discuss some concepts related to our proposed approach.
Test Path: It is a path from the initial node to an end node of the state transition graph, used to test a program or model.
Transition Path: A test path containing at least one new transition of the state chart diagram is known as a Transition Path (TP).
Testing Coverage Criterion: It is a testing measure that shows which parts of the code/software/model are exercised when a test suite is executed. Test coverage determines the amount of testing achieved by running a test suite.
Event Coverage Criterion: Each event of the transition of state chart diagram is part of at least one test case.
Transition Coverage Criterion: Each transition of the state chart diagram is part of at least one test case.
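As a minimal sketch of how the transition coverage criterion can be checked, the following Python snippet computes the fraction of transitions exercised by a set of test paths. The function name, the transition labels and the test paths are hypothetical placeholders, not taken from the case study.

```python
def transition_coverage(all_transitions, test_paths):
    """Return (covered_fraction, uncovered_set) for a set of test paths.

    Each test path is a sequence of transition labels; a transition counts
    as covered if it appears in at least one path.
    """
    all_transitions = set(all_transitions)
    covered = set()
    for path in test_paths:
        covered.update(t for t in path if t in all_transitions)
    uncovered = all_transitions - covered
    return len(covered) / len(all_transitions), uncovered


# Hypothetical model with four transitions and two test paths.
transitions = ["t1", "t2", "t3", "t4"]
paths = [["t1", "t2"], ["t1", "t3"]]
fraction, missing = transition_coverage(transitions, paths)
print(fraction, sorted(missing))   # 0.75 ['t4']
```

The same function can serve for event coverage if the paths are given as event labels instead of transition labels.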
3. GENETIC ALGORITHM
A Genetic Algorithm (GA) is an evolutionary search heuristic that can be used to solve a large class of problems by generating high-quality solutions. GA is used as an optimization technique and meta-heuristic search method motivated by Charles Darwin's theory of natural evolution of species. In GA, the fittest individuals are selected from the population to generate the offspring of the next generation. The offspring inherit the characteristics of the parent population; if the parents are fitter, the offspring tend to be better and have a better chance of survival [18]. GA is well suited to problems for which little information is available [18,19]. It can automatically produce high-quality test data for complex, real-life problems [2].
The algorithm starts with a population of individual chromosomes. Each chromosome is a binary string formed from a number of digits called genes. The initial population can be selected randomly [4]. GA relies on bio-inspired operators such as selection, crossover and mutation. Generally, the algorithm terminates when a required population fitness level has been reached or a maximum number of generations has been completed. The process can be described as follows [2]:
Randomly initialize the population
Calculate the fitness value of each individual using the fitness function
Do {
    Perform Selection operation to select the parents from the population
    Perform Crossover operation on parents
    Perform Mutation operation on population
    Calculate Fitness value of population
} While (stopping criterion is not satisfied)
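The loop above can be sketched in Python as follows. This is only an illustrative skeleton under stated assumptions: the function `run_ga`, its parameters, the list-of-bits encoding and the toy fitness function (number of 1-bits) are all hypothetical, the population size is assumed to be even, and the stopping criterion is a generation limit with an optional target fitness.

```python
import random


def run_ga(fitness, n_bits=8, pop_size=4, p_cross=0.8, p_mut=0.2,
           max_generations=50, target=None, seed=0):
    """Generic GA loop: roulette selection, one-point crossover, bit-flip mutation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(max_generations):
        if target is not None and fitness(best) >= target:
            break  # required fitness level reached
        # Selection: fitness-proportional (roulette wheel) sampling of parents.
        weights = [fitness(c) + 1e-9 for c in pop]  # avoid an all-zero wheel
        parents = rng.choices(pop, weights=weights, k=pop_size)
        # Crossover: swap the tails of consecutive parent pairs.
        children = []
        for a, b in zip(parents[0::2], parents[1::2]):
            a, b = a[:], b[:]  # work on copies, never on the parents
            if rng.random() < p_cross:
                point = rng.randint(1, n_bits - 1)
                a[point:], b[point:] = b[point:], a[point:]
            children += [a, b]
        # Mutation: flip one random bit of a child with probability p_mut.
        for c in children:
            if rng.random() < p_mut:
                c[rng.randrange(n_bits)] ^= 1
        pop = children
        best = max(pop + [best], key=fitness)  # keep the best seen so far
    return best


# Toy fitness: count of 1-bits in the chromosome.
best = run_ga(sum)
print(best, sum(best))
```

Keeping the best chromosome seen so far is a design choice of this sketch, not something the pseudocode prescribes.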
Selection Operation: A selection operator is applied to the chromosomes of the population to determine the fittest parents to mate. The fitness function, which depends on various criteria, measures the quality of a solution. Selection thereby determines which individuals pass their genes on to the offspring. The offspring are then created and survive into the next iteration (generation) [18]. Common selection methods include roulette wheel, exponential rank, binary tournament, stochastic universal sampling, linear rank and truncation.
Crossover Operation: Crossover corresponds to reproduction in biological evolution. This operator is applied to the selected chromosomes: for each pair of parents, a crossover point is chosen randomly within the genes, and a sequence of bits in the string is swapped between the two chromosomes. There are one-point, multi-point, uniform and arithmetic crossover operations. Crossover is likely to create better individuals. When only selection and crossover are applied, the algorithm may converge on a good but sub-optimal solution [19].
Mutation Operation: Random modification within the population is required to maintain genetic diversity. Six mutation operators are commonly used: bit string, uniform, non-uniform, flip bit, boundary and Gaussian [18].
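To make the crossover and mutation operators concrete, here is a small Python sketch on bit strings. The function names and the convention of counting the crossover point from the right are assumptions made for illustration, and the sketch shows generic one-point crossover and flip-bit mutation only.

```python
def single_point_crossover(a, b, bits_from_right):
    """Swap the last `bits_from_right` bits of two equal-length bit strings."""
    if len(a) != len(b) or not 0 < bits_from_right < len(a):
        raise ValueError("need equal lengths and a cut inside the string")
    cut = len(a) - bits_from_right
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def flip_bit(chromosome, index):
    """Return a copy of the bit string with the bit at `index` inverted."""
    flipped = "1" if chromosome[index] == "0" else "0"
    return chromosome[:index] + flipped + chromosome[index + 1:]


child1, child2 = single_point_crossover("00010110", "00001011", 4)
print(child1, child2)        # 00011011 00000110
print(flip_bit(child1, 0))   # 10011011
```

Both functions return new strings and leave the parents unchanged, which makes it easy to compare parents and offspring afterwards.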
4. RELATED WORK
A lot of work has already been done on test case generation from UML diagrams using GA. We present some of the relevant research in this section.
Minj et al. [4] introduced a path-oriented approach that produces test cases from UML state chart diagrams. They eliminated infeasible paths to reduce the time and cost of testing. Optimal and feasible path-based test cases are generated using a genetic algorithm, achieving the predicate coverage criterion.
Ali et al. [16] used the state chart diagram to generate test cases. Their technique helps to start the software testing process early. They used a graph-based methodology called DFSM, which covers all possible paths of the generated test cases.
Samuel et al. [5] proposed an approach to automatically generate test cases based on the UML state model. The produced test cases achieve the transition path coverage criterion. Their approach supports class-level and cluster-level testing using the dependency behavior of states.
Lefticaru et al. [12] developed an approach to produce feasible test sequences by applying GA to transitions, treating guard conditions as algebraic variables.
Yasir et al. [20] established various coverage criteria for test case generation using the UML state chart diagram. They also formulated a method to calculate the coverage percentage of the testing coverage criteria. The proposed coverage criteria include all states, all transitions and all transition pairs. They also used all-loop-free-path coverage.
Shubhangi et al. [21] proposed a test case generation technique using use case and state chart diagrams. Their method is able to detect faults such as loop and synchronization faults. They developed a semi-automatic testing method that helps to improve the quality of software while reducing the cost of software development.
Khurana et al. [22] used a combined UML sequence and state chart diagram for test case generation and optimization. They first integrated both diagrams to identify the maximum number of test cases and then optimized the identified test cases by applying GA, which helps in system testing and fault detection.
Wasiur et al. [23] proposed a method to optimize and prioritize test paths generated from the activity diagram. They used the firefly algorithm to find the critical paths, and the obtained paths are optimized by applying the Information Flow metric and their cyclomatic complexity. The optimized paths produced better results by reducing redundancy.
Sabharwal et al. [24] presented an approach to prioritize test paths by applying a Genetic Algorithm. They first generated the test paths from activity and state chart diagrams and then applied the IF model and the Genetic Algorithm to prioritize the test paths.
5. PROPOSED WORK
The details of generating optimized test cases are discussed in this section. The approach generates a set of state-based paths that covers every guard condition on the transitions of the state chart diagram. It traverses every state in the state chart diagram and generates the set of test scenarios. Then, GA is used to optimize the test paths obtained from the state chart diagram.
The steps of our proposed approach are shown in the form of a flow chart in Fig. 2 and are listed below.
Construct the state chart model of an object of the specific problem domain.
Transform the constructed state chart diagram into a state transition graph (STG).
Traverse the STG and identify all the paths.
Assign the stack-based complexity weights to the nodes of the STG.
Calculate the cost and complexity of each path.
Apply the genetic algorithm on all test paths until all path sequences are covered.
Generate the optimized test case scenarios from the test path sequences.
Fig.2: Flow chart diagram of our proposed approach
5.2 Construct the model of the SUT using state chart diagram
A UML state chart diagram is a state-transition model in which states are represented by nodes and transitions are represented by directed arrows that connect the states of an object within an activity of the system. It illustrates the dynamic behaviour of an object by capturing the changes of state in response to the various events that may occur in each state of the object. The state chart diagram depicts the different state views of the system by modelling the flow of control between the states of an object during the execution of one activity. It defines the state changes of an object during its lifetime.
5.3 Convert the state chart diagram into state transition graph
We transform the state chart diagram into a state transition graph (STG). A state transition graph is defined as $STG = (V_t, E_d)$, where $V_t$ is the set of vertices (nodes) of the STG and $E_d$ is the set of directed arcs (edges). In the STG, nodes correspond to states and edges correspond to transitions of the state chart diagram. The STG contains a start node representing the start state of the state chart diagram and one or more end nodes representing the stop/end states. The start node is the root node of the STG. The states of each nesting level form a subgraph of the STG.
5.4 Traverse the state transition graph to identify all Transition Paths (TP)
The STG is traversed using the Depth First Search algorithm to collect the information required to generate the test paths. All paths $P_1, P_2, P_3, \ldots$ are generated, where each path contains the sequence of states, events or transitions from the start node to an end node of the STG.
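A minimal Python sketch of this traversal is given below. It enumerates only loop-free paths, which is a simplifying assumption of the sketch, and the adjacency list `stg` is a hypothetical 7-node graph chosen for illustration rather than the STG of the case study.

```python
def all_paths(graph, start, end):
    """Enumerate every loop-free path from start to end by depth-first search."""
    paths = []
    stack = [start]  # nodes on the current DFS branch

    def dfs(node):
        if node == end:
            paths.append(list(stack))
            return
        for nxt in graph.get(node, []):
            if nxt not in stack:  # skip nodes already on this branch
                stack.append(nxt)
                dfs(nxt)
                stack.pop()

    dfs(start)
    return paths


# Hypothetical STG as an adjacency list (node -> successor nodes).
stg = {1: [2], 2: [3, 7], 3: [4, 7], 4: [5], 5: [6, 7], 6: [7], 7: []}
for p in all_paths(stg, 1, 7):
    print("-".join(map(str, p)))
```

The explicit `stack` list mirrors the DFS stack whose depth is used by the stack-based weights in the next subsection.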
5.5 Assign the Stack-Based Complexity Weights to each node of STG
Information flow metrics can be applied to the components of a system design according to the Information Flow (IF) model. Here, each node of the STG is taken as a component. The complexity weight $W(N)$ of each node $N$ of the STG is calculated using the IF model, as given in Equation 1.
$$W(N) = \mathrm{INDEGREE}(N) \times \mathrm{OUTDEGREE}(N) \qquad (1)$$
where INDEGREE(N) is the number of nodes calling or passing control to node N, and OUTDEGREE(N) is the number of nodes called by node N.
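Equation 1 can be illustrated with a short Python sketch that derives the weights from an edge list. The function name and the edge list are hypothetical; the sketch counts incoming and outgoing edges, which equals the number of neighbouring nodes as long as the graph has no parallel edges.

```python
from collections import Counter


def node_weights(edges):
    """Return {node: indegree * outdegree} for a list of (source, target) edges."""
    indeg, outdeg = Counter(), Counter()
    for src, dst in edges:
        outdeg[src] += 1
        indeg[dst] += 1
    nodes = set(indeg) | set(outdeg)
    # A Counter returns 0 for missing keys, so pure sources and sinks get weight 0.
    return {n: indeg[n] * outdeg[n] for n in sorted(nodes)}


# Hypothetical edge list without parallel edges.
edges = [("A", "B"), ("B", "C"), ("B", "D"), ("C", "B"), ("C", "D")]
print(node_weights(edges))   # {'A': 0, 'B': 4, 'C': 2, 'D': 0}
```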
The complexity weight of the $i$-th node ($W_i$) in a transition path is calculated by adding the complexity weight $W(i)$ and the stack-based weight, as given in Equation 2.
$$W_i = W(i) + \text{Stack-Based-Weight}(i) \qquad (2)$$
where Stack-Based-Weight(i) is the number of pop operations needed to get the $i$-th node from the stack during the DFS traversal of the STG. It is calculated from the stack-based complexity using Equation 3.
$$\text{Stack-Based-Weight}(i) = S_{max} - K \qquad (3)$$
where $S_{max}$ indicates the maximum stack size and $K$ represents the lowest number of nodes before the node in the stack at the time of traversal.
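As a small worked sketch of Equations 2 and 3 in Python, the helper names below are hypothetical, and the maximum stack size of 7 is an assumed value for a DFS over a 7-node graph.

```python
def stack_based_weight(s_max, k):
    """Equation 3: maximum stack size minus K (as defined in the text)."""
    return s_max - k


def node_complexity(w_if, stack_weight):
    """Equation 2: information-flow weight W(i) plus stack-based weight."""
    return w_if + stack_weight


S_MAX = 7  # assumed maximum stack size
weights = [stack_based_weight(S_MAX, k) for k in range(S_MAX)]
print(weights)                 # [7, 6, 5, 4, 3, 2, 1]
print(node_complexity(2, 6))   # 8
```

Nodes that sit deeper in the stack (larger K) thus receive smaller stack-based weights.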
5.6 Calculate the cost and complexity of each TP
The complexity of each path is calculated using Equation 4.
$$CW_p = \sum_{i=1}^{n} W_i \qquad (4)$$
where $CW_p$ is the complexity weight of transition path $P$, $W_i$ is the weight of the $i$-th node in the transition path of the STG, and $n$ is the total number of nodes in the transition path.
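Equation 4 amounts to a sum over the nodes of a path. The Python sketch below applies it to the node weights and the four node sequences reported for the case study in Section 6 (Tables 1 and 3); the function name and the dictionary layout are assumptions of the sketch.

```python
def path_complexity(path, node_weight):
    """Equation 4: sum of the node weights W_i along a transition path."""
    return sum(node_weight[node] for node in path)


# Node weights W_i as listed in Table 3 of the case study.
W = {1: 7, 2: 8, 3: 7, 4: 13, 5: 7, 6: 7, 7: 15}
paths = [[1, 2, 3, 7], [1, 2, 7], [1, 2, 3, 4, 5, 7], [1, 2, 3, 4, 5, 6, 7]]
costs = [path_complexity(p, W) for p in paths]
print(costs)   # [37, 30, 57, 64]
```

These sums agree with the fitness values listed for the four test paths in Table 1.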
5.7 Apply Genetic Algorithm
A Genetic Algorithm is applied to all transition paths until all TP sequences are covered. This section illustrates the application of the genetic algorithm to the test paths generated from the STG for test case generation and optimization. The proposed algorithm is outlined as follows:
Input: State Transition Graph STG
Output: An optimized test suite that achieves 100% path coverage
Step 1. Create the initial population, i.e. chromosomes ($C_1, C_2, C_3, \ldots, C_n$), from the decision nodes.
/* A chromosome (test data) is represented by a single bit or multiple bits in a binary string, depending on the number of decision nodes of the STG. For example, for four decision nodes in the STG, an eight-bit binary string (two bits per decision node) or a four-bit binary string (one bit per decision node) can be used to form a chromosome (an individual of the population). */
Step 2. Initialize the population as test case sequences.
Step 3. Calculate the fitness value.
The fitness value of each chromosome is calculated from the complexity of its path, as given in Equation 5.
$$F(X) = CW_p \qquad (5)$$
where $F(X)$ is the fitness value of chromosome $X$ and $CW_p$ is calculated using Equation 4.
Step 4. Calculate the individual probability $P(X)$ as
$$P(X) = \frac{F(X)}{\sum F(X)}$$
and the cumulative probability $CP(X_k)$ as
$$CP(X_k) = \sum_{X=1}^{K} P(X)$$
where $X_k$ is the $K$-th chromosome.
Step 5. Select the best individuals from the population so that they can mate to produce optimal solutions. The chromosomes with high fitness values are considered as the parents.
a. Select the fittest individuals. A bin is a range of probability; the bin size is based on the relative fitness.
b. Check the bins against random values between 0 and 1. If a random value falls into a bin, select the corresponding individual.
Step 6. Apply the crossover operator.
Perform single-point crossover pair-wise if the random number R < 0.8. For crossover, use the fourth bit from the right as the crossover point; then use the third bit from the right for the next two pairs.
Step 7. Mutate the new chromosomes.
Perform mutation using a mutation probability of 0.2: a random number r is generated and, if r < 0.2, the population is mutated randomly.
Step 8. Eliminate duplicates.
Step 9. Test for higher fitness.
If (the fitness value is minimized or maximized, or all transition paths are covered) Then
    Best test path is generated
Else
    Go to Step 2
End
The proposed algorithm first creates the initial population randomly and evaluates the fitness of each chromosome. Chromosomes with higher fitness are selected, and the crossover and mutation operators are applied to them to find better chromosomes. The algorithm stops when it finds the test case that covers all nodes of a specified path.
6. EXPERIMENTAL CASE STUDY
In this paper, we have taken Student Registration for Seminar as a case study, which is represented by a state chart diagram. The states considered for the seminar are Proposed, Scheduled, Open For Enrolment, Full, and Closed to Enrolment. An object starts in the start state, represented by the closed circle, and can end up in a stop state, represented by the bordered circle. The state chart diagram of the complete operation is shown in Figure 3.
Fig. 3 state chart diagram of Student Registration for Seminar
The intermediate graph (STG) for the state chart diagram of Student Registration for Seminar is shown in Figure 4.
Fig. 4 State Transition Graph for UML state chart Diagram of Student Registration for Seminar
The STG of Student Registration for Seminar contains 4 decision nodes, namely nodes 2, 3, 4 and 5. There are 15 events, e1, e2, ..., e15, which are represented by the edges, and 7 states, which are represented by the nodes. Considering the events from the start node to the final node, we can find 72 valid and invalid transition paths using the decision nodes. Out of these 72 transition paths, we have randomly chosen four as test paths, which are shown below. These test paths show test case scenarios of the seminar registration operation through the state changes of the seminar object. The test paths, containing the edges and the decision nodes, are given in Table 1.
Table 1 Generated Test Paths for the STG of Student Registration for Seminar
Sl. No   Edges            Path followed    Fitness value
1        e1-e14-e4-e8     1-2-3-7          37
2        e15-e14-e4-e8    1-2-7            30
3        e1-e2-e5-e13     1-2-3-4-5-7      57
4        e1-e2-e5-e8      1-2-3-4-5-6-7    64
The details of the node weights and complexity weights are shown in Table 2 and Table 3.
Table 2 Weight of each node in STG of Student Registration for Seminar
Nodes    K    size(max)    Weight = size(max) - K
1        0    1            7-0=7
2        1    2            7-1=6
3,7      2    3            7-2=5
4,7      3    4            7-3=4
5,6,7    4    5            7-4=3
6,7      5    6            7-5=2
7        6    7            7-6=1
Table 3 Node Complexity Weight of STG of Student Registration for Seminar
Node    Stack-based complexity    W(i)    Complexity weight (Wi)
1       7                         0       7
2       6                         2       8
3       5                         2       7
4       4                         9       13
5       3                         4       7
6       3+2=5                     2       7
7       1+2+3+4+5=15              0       15
The initial population is created using a binary encoding of the test paths. First, we generate the initial population randomly, as shown in Table 4. The initial population contains the chromosomes 00010110, 01010110, 00001011 and 00001010, shown as X in Table 4.
The fitness value F(X) is calculated using the fitness function given in Equation 5, and the random number R is drawn between 0 and 1. 'C' represents the crossover operation and 'M' represents the mutation operation. The chromosome 00001010 represents the path 1-2-3-4-5-6-7, which includes the edges e1-e2-e5-e8 and has a fitness value of 64 (i.e. 7 + 8 + 7 + 13 + 7 + 7 + 15). The chromosome 00001011 represents the path 1-2-3-4-5-7, which includes the edges e1-e2-e5-e13 and has a fitness value of 57 (i.e. 7 + 8 + 7 + 13 + 7 + 15). The fitness values of the other chromosomes are calculated likewise and given in Table 4.
For the next generation, the chromosomes with higher fitness values are selected as the new population to mate. New offspring are then generated by applying the crossover and mutation operations to the newly selected chromosomes. We have used Dev-C++ IDE 5.11 for the experiments. Next, duplicate chromosomes corresponding to the same test paths are eliminated to obtain optimized test cases, using the guard conditions on the transitions of the paths.
Table 4 Initial Population with Fitness Function
Serial no    Chromosome string (X)    Fitness value of X    Individual probability    Cumulative probability    Bin size
1            00010110                 37                    0.19680                   0.19680                   0-0.2
2            01010110                 30                    0.15957                   0.35637                   0.2-0.4
3            00001011                 57                    0.30319                   0.65956                   0.4-0.7
4            00001010                 64                    0.34042                   1                         0.7-1
Sum of F(x)                           188
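The individual and cumulative probabilities of Step 4 can be recomputed from the fitness column with a few lines of Python. The function name is hypothetical; the values below are rounded to four decimals, so they agree with the table entries up to rounding in the last digit.

```python
from itertools import accumulate


def selection_probabilities(fitness_values):
    """Return (individual, cumulative) probability lists for roulette selection."""
    total = sum(fitness_values)
    individual = [f / total for f in fitness_values]
    cumulative = list(accumulate(individual))
    cumulative[-1] = 1.0  # guard against floating-point drift in the last bin
    return individual, cumulative


# Fitness values of the initial population (37, 30, 57, 64; sum 188).
p, cp = selection_probabilities([37, 30, 57, 64])
print([round(x, 4) for x in p])    # [0.1968, 0.1596, 0.3032, 0.3404]
print([round(x, 4) for x in cp])   # [0.1968, 0.3564, 0.6596, 1.0]
```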
Table 5 New Population Selections
R         Bin falling under    Selection operation    Crossover operation    Mutation operation
0.8310    4                    00001010               00001010               00001010
0.9413    4                    00001010               00001010               00001010
0.1245    1                    00010110               00010001               00000001
0.5403    3                    00001011               00001100               00001100
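The "Bin falling under" and "Selection operation" columns follow from comparing each random number R with the cumulative probabilities of the initial population. A minimal Python sketch of this lookup is shown below; the function name and the use of binary search are choices of the sketch, while the cumulative values and random numbers are those of the initial population.

```python
from bisect import bisect_left


def select_bin(cumulative, r):
    """Return the 1-based index of the first bin whose upper bound is >= r."""
    return bisect_left(cumulative, r) + 1


# Cumulative probabilities and chromosomes of the initial population.
cumulative = [0.19680, 0.35637, 0.65956, 1.0]
population = ["00010110", "01010110", "00001011", "00001010"]
bins = [select_bin(cumulative, r) for r in [0.8310, 0.9413, 0.1245, 0.5403]]
selected = [population[b - 1] for b in bins]
print(bins)       # [4, 4, 1, 3]
print(selected)   # ['00001010', '00001010', '00010110', '00001011']
```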
Table 6 New generation of Population
Serial no    Chromosome string (X)    Fitness value of X    Individual probability    Cumulative probability    Bin size
1            00001010                 64                    0.27234                   0.27234                   0-0.28
2            00001010                 64                    0.27234                   0.54468                   0.28-0.55
3            00000001                 50                    0.21276                   0.75744                   0.55-0.76
4            00001100                 57                    0.24255                   1                         0.76-1
Sum of F(x)                           235
Table 7 New Population Selections
R1        Bin falling under    Selection operation    Crossover operation    Mutation operation
0.2075    1                    00001010               00001010               00001010
0.1754    1                    00001010               00000101               00010101
0.3460    2                    00001010               00001101               00001101
0.6843    3                    00000001               00000110               00000110
Table 8 New generation of Population
Serial no    Chromosome string (X)    Fitness value of X    Individual probability    Cumulative probability    Bin size
1            00001010                 64                    0.31840                   0.31840                   0-0.32
2            00010101                 37                    0.18407                   0.50247                   0.32-0.50
3            00001101                 57                    0.28358                   0.78605                   0.50-0.78
4            00000110                 43                    0.21393                   1                         0.78-1
Sum of F(x)                           201
Table 9 New generation Selection
R1        Bin falling under    Selection operation    Crossover operation    Mutation operation
0.3659    2                    00010101               00011010               00011010
0.2376    1                    00001010               00001010               00001010
0.6083    3                    00001101               00001010               00001010
0.7946    4                    00000110               00000110               00000110
Table 10 Fitness of new generation
Serial no    Chromosome string (X)    Fitness value of X    Individual probability    Cumulative probability    Bin size
1            00011010                 37                    0.17788                   0.17788                   0-0.2
2            00001010                 64                    0.30769                   0.48557                   0.2-0.5
3            00001010                 64                    0.30769                   0.79326                   0.5-0.8
4            00000110                 43                    0.20673                   1                         0.8-1
Sum of F(x)                           208
Table 11 New generation Selection
R1        Bin falling under    Selection operation    Crossover operation    Mutation operation
0.1589    1                    00011010               00011010               00001010
0.3475    2                    00001010               00001010               00001010
0.6890    3                    00001010               00001010               00001010
0.8567    4                    00000110               00000110               00000110
Table 12 Fitness of new generation
Serial no    Chromosome string (X)    Fitness value of X    Individual probability    Cumulative probability    Bin size
1            00001010                 64                    0.27234                   0.27234                   0-0.3
2            00001010                 64                    0.27234                   0.54468                   0.3-0.6
3            00001010                 64                    0.27234                   0.81702                   0.6-0.8
4            00000110                 43                    0.18297                   1                         0.8-1
Sum of F(x)                           235