Integrating Artificial Intelligence
iin
Software Testing
Roni Stern and Meir Kalech, ISE department, BGU
Abstract
Abstract
Di
i
Artificial Intelligence
Diagnosis
Planning
Software Engineering
Testing
Testing
Testing
g
Planning
g
Testing is Important
• Quite a few software development paradigms
• All of them recognize the importance of testing
• All of them recognize the importance of testing
• There are several types of testing
i i bl k b (f i l) i
– E.g., unit testing, black-box (functional) testing ….
Tester
• Runs tests
Programmer
Handling a Bug
Handling a Bug
1. Bugs are reported by the tester
- “Screen A crashed on step 13 of test suite 4…”p
2. Prioritized
bugs are assigned to programmers
3. Bugs are diagnosed
- What caused the bug?g
4. Bugs are fixed
H f ll
Why is Debugging Hard?
Why is Debugging Hard?
•
The developer needs to
reproduce the bug
•
Reproducing (correctly) a bug is non-trivial
Reproducing (correctly) a bug is non trivial
– Bug reports may be inaccurate or missing
S f h h i l
– Software may have stochastic elements
• Bug occurs only once in a while
– Bugs may appear only in special cases
Introducing AI!
Introducing AI!
este
r Run a test suite
Fi d b mer
Diagnose Fi h b
Te Find a bug
rogr
am Fix the bug
P
r
Tes
te
r Run a test suite
Find a bug
AI Compute
Diagnoses
m
mer Fix the bug T Find a bug
Plan next tests
ogr
Test Diagnose and Plan (TDP)
Test, Diagnose and Plan (TDP)
r r
Te
st
er Run a test suite
Find a bug
AI Compute
Diagnoses
Pl t t t amme
r Fix the bug
Plan next tests
Pr
ogr
a
1. The tester runs a set of planned tests (test suite) 2. The tester finds a bug
3. AI generates possible diagnoses
4. If there is only one candidate – pass to programmer 5 Else AI plans new tests to prune candidates
Diagnosis
Diagnosis
Find the reason of a problem
given observed symptoms
Requirement: knowledge of the diagnosed system
Gi b
• Given by experts
Model Based Diagnosis
Model-Based Diagnosis
•
Given: a
model of how the system works
Î
I f
h t
t
Î
Infer what went wrong
•
Example:
Example:
IF ok(battery) AND ok(ignition) THEN start(engine)
Where is MBD applied?
Where is MBD applied?
i
•
Printers
(Kuhn and de Kleer 2008)
•
Vehicles
(Pernestål et. al. 2006)
•
Robotics
(Steinbauer 2009) (Steinbauer 2009)
•
Spacecraft - Livingstone
(Williams and Nayak, 1996; Bernard et al., 1998)
Software is Difficult to Model
Software is Difficult to Model
•
Need to code how the software should behave
– Specs are complicated to modelp p
•
Static code analysis
ignores runtime
P l hi
– Polymorphism
– Instrumentation
Zoltar
[Ab t l 2011’]Zoltar
[Abrue et. al. 2011’]•
Construct a model from the observations
•
Observations should include
Observations should include
execution traces
execution traces
– Functions visited during execution
Ob d b / b
– Observed outcome: bug / no bug
•
Weaker model
Ok(function1) Æ function1 outputs valid value
Î A bug entails that at least one comp was not Ok
Execution Matrix
Execution Matrix
• A key component in Zoltar is the execution matrix
• It is built from the observed execution traces
– Observation 1 (BUG) : F1ÆF5ÆF6 – Observation 2 (BUG) : F2( ) ÆF5 – Observation 3 (OK) : F2ÆF6ÆF7ÆF8 F1 F2 F3 F4 F5 F6 F7 F8 Bug F1 F2 F3 F4 F5 F6 F7 F8 Bug Obs1 1 0 0 0 1 1 0 0 1 Obs2 0 1 0 0 1 0 0 0 1 Obs3 0 1 0 0 0 1 1 1 0 Obs3 0 1 0 0 0 1 1 1 0
Diagnosis = Hitting Sets of Conflicts
Diagnosis = Hitting Sets of Conflicts
In every BUG trace at least one function is faulty
•
Observations:
Observations:
– Observation 1 (BUG) : F1ÆF5ÆF6
Ob i 2 (BUG) F2ÆF5
– Observation 2 (BUG) : F2ÆF5
•
Conflicts
:
– Ok(F1) AND Ok(F5) AND Ok(F6)
Ok(F2) AND Ok(F5) Hitting sets of Conflicts
– Ok(F2) AND Ok(F5)
•
Possible diagnoses:
{F5} ,{F1,F2},{F6,F2}Software Diagnosis with Zoltar
Software Diagnosis with Zoltar
P i iti
Bug Report
Prioritize
Diagnoses
g
Zoltar
Set of
Candidate
Diagnoses
Plan More Tests
Plan More Tests
Bug Report
AIEngine
Set of
Possible
Diagnoses
Plan More Tests
Plan More Tests
Bug Report
AIEngine
A single diagnosis A single diagnosis
Suggest New Test to
P
C
did
Pacman Example
Pacman Example
Pacman Example
Pacman Example
Pacman Example
Pacman Example
Pacman Example
Pacman Example
Pacman Example
Pacman Example
Pacman Example
Pacman Example
Pacman Example
Pacman Example
Execution Trace
Execution Trace
F1
F2
F3
F1
Move
F2
Stop
F3
Eat Pill
Which Diagnosis is Correct?
Which Diagnosis is Correct?
Knowledge
Knowledge
Knowledge
Knowledge
•
Most probable cause: Eat Pill
•
Most easy to check: Stop
Most easy to check: Stop
Objective
Objective
Plan
a sequence of
tests
f d h
b
d
to find the
best diagnosis
with
minimal tester actions
1 Highest Probability (HP)
1. Highest Probability (HP)
h
b
f
f
•
Compute the prob. of every function
C
i= Sum of prob. of candidates containing
p
g
C
ii•
Plan a test to
check the most probable function
0.2
F1
0.4 0.2
0.4
1 Highest Probability (HP)
1. Highest Probability (HP)
h
b
f
f
•
Compute the prob. of every function
C
i= Sum of prob. of candidates containing
p
g
C
ii•
Plan a test to
check the most probable function
0.2 F1 0.4 0.6 0 4 F5 F2 F6 0.4
2 Lowest Cost (LC)
2. Lowest Cost (LC)
d
l f
h
( )
•
Consider only functions C
iwith
0<P(C
i)<1
•
Test the closest function
from this set
F1
3 Entropy-based
3. Entropy-based
• A test α = a set of components
Ω set of candidates that assume α will pass
– Ω+= set of candidates that assume α will pass
– Ω- = set of candidates that assume α will fail – Ω?? = set of candidates that are indifferent to α
• Prob. α will pass = prob. of candidates in Ω+
• Choose test with lowest Entropy(py(Ω++ ,,Ω- ,,Ω??)) = highest information gain
0.2 F1 0.4 0.2 0 4 0.4 F5 F2 F6 0.4
4 Planning Under Uncertainty
4. Planning Under Uncertainty
• Outcome of test is unknown
Î we would like to minimize expected cost
• Formulate as Markov Decision Process (MDP)
– StatesStates: set of performed tests and their outcomes: set of performed tests and their outcomes
– Actions: Possible tests
– TransitionTransition: Pr(: Pr(ΩΩ+) Pr(), Pr(ΩΩ-) and Pr() and Pr(ΩΩ??))
– Reward (cost): Cost of performed tests
• Current solver uses myopic sampling
Preliminary
Experimental Results
Preliminary
Experimental Results
•
Synthetic code
•
Setup
Setup
– Generated random call graphs with 300 nodes
G d f h d ll h
– Generate code from the random call graphs
– Injected 10/20 random bugs
– Generate 15 random initial tests
•
Run until a diagnosis with prob >0 9 is found
•
Run until a diagnosis with prob.>0.9 is found
Preliminary Experimental Results
Preliminary Experimental Results
• HP and Entropy perform worse than (LC) and MDP
Preliminary Experimental Results
NUnits
Preliminary Experimental Results -
NUnits
•
Real code
•
Well-known testing framework for C#
Well known testing framework for C#
•
Setup
– Generated call graphs with 302 nodes from NUnits
– Injected 10,15,20,25,30 random bugs j g
– Generate 15 random initial tests
•
Run until a diagnosis with prob >0 9 is found
•
Run until a diagnosis with prob.>0.9 is found
Preliminary Experimental Results NUnits
Preliminary Experimental Results - NUnits
Conclusion
Conclusion
•
The test, diagnose and plan (TDP) paradigm:
– AI diagnoses observed bug g g with MBD algorithmg
– AI plans further tests to find correct diagnosis
E
t t
i
AI
•
Empowers tester using AI
SCM Server Logs Source Code
SCM Server Logs Source Code
i AI Engine