Integrating Artificial Intelligence. Software Testing

(1)

Integrating Artificial Intelligence

iin

Software Testing

Roni Stern and Meir Kalech, ISE department, BGU

(2)

Abstract

Di

i

Artificial Intelligence

Diagnosis

Planning

Software Engineering

Testing

(3)

Testing

g

Planning

g

(4)

Testing is Important

• Quite a few software development paradigms

• All of them recognize the importance of testing

• There are several types of testing

i i bl k b (f i l) i

– E.g., unit testing, black-box (functional) testing ….

Tester

• Runs tests

Programmer

(5)

Handling a Bug

1. Bugs are reported by the tester

- “Screen A crashed on step 13 of test suite 4…”p

2. Prioritized

bugs are assigned to programmers

3. Bugs are diagnosed

- What caused the bug?g

4. Bugs are fixed

H f ll

(6)

Why is Debugging Hard?

• _{The developer needs to}

_{reproduce the bug}

• _{Reproducing (correctly) a bug is non-trivial}

_{Reproducing (correctly) a bug is non trivial}

– Bug reports may be inaccurate or missing

S f h h i l

– Software may have stochastic elements

• Bug occurs only once in a while

– Bugs may appear only in special cases

(7)

(8)

Introducing AI!

este

r Run a test suite

Fi d b mer

Diagnose Fi h b

Te Find a bug

rogr

am Fix the bug

P

r

Tes

te

r Run a test suite

Find a bug

AI Compute

Diagnoses

m

mer Fix the bug T Find a bug

Plan next tests

ogr

(9)

Test Diagnose and Plan (TDP)

Test, Diagnose and Plan (TDP)

r r

Te

st

er Run a test suite

Find a bug

AI Compute

Diagnoses

Pl t t t amme

r Fix the bug

Plan next tests

Pr

ogr

a

1. The tester runs a set of planned tests (test suite) 2. The tester finds a bug

3. AI generates possible diagnoses

4. If there is only one candidate – pass to programmer 5 Else AI plans new tests to prune candidates

(10)

Diagnosis

Find the reason of a problem

given observed symptoms

Requirement: knowledge of the diagnosed system

Gi b

• Given by experts

(11)

Model Based Diagnosis

Model-Based Diagnosis

• Given: a

model of how the system works

Î

I f

h t

t

Î

Infer what went wrong

• _Example:

_Example:

IF ok(battery) AND ok(ignition) THEN start(engine)

(12)

Where is MBD applied?

i

• _Printers

(Kuhn and de Kleer 2008)

• _Vehicles

(Pernestål et. al. 2006)

• _Robotics

(Steinbauer 2009) (Steinbauer 2009)

• Spacecraft - Livingstone

(Williams and Nayak, 1996; Bernard et al., 1998)

(13)

Software is Difficult to Model

• _{Need to code how the software should behave}

– Specs are complicated to modelp p

• Static code analysis

ignores runtime

P l hi

– Polymorphism

– Instrumentation

(14)

Zoltar

[Ab t l 2011’]

Zoltar

[Abrue et. al. 2011’]

• _{Construct a model from the observations}

• _{Observations should include}

_{Observations should include}

_{execution traces}

– Functions visited during execution

Ob d b / b

– Observed outcome: bug / no bug

• _{Weaker model}

Ok(function1) Æ function1 outputs valid value

Î A bug entails that at least one comp was not Ok

(15)

Execution Matrix

• A key component in Zoltar is the execution matrix

• It is built from the observed execution traces

– Observation 1 (BUG) : F1ÆF5ÆF6 – Observation 2 (BUG) : F2( ) ÆF5 – Observation 3 (OK) : F2ÆF6ÆF7ÆF8 F1 F2 F3 F4 F5 F6 F7 F8 Bug F1 F2 F3 F4 F5 F6 F7 F8 Bug Obs1 1 0 0 0 1 1 0 0 1 Obs2 0 1 0 0 1 0 0 0 1 Obs3 ₀ ₁ ₀ ₀ ₀ ₁ ₁ ₁ ₀ Obs3 ₀ ₁ ₀ ₀ ₀ ₁ ₁ ₁ ₀

(16)

Diagnosis = Hitting Sets of Conflicts

In every BUG trace at least one function is faulty

• _{Observations:}

_{Observations:}

– Observation 1 (BUG) : F1ÆF5ÆF6

Ob i 2 (BUG) F2ÆF5

– Observation 2 (BUG) : F2ÆF5

• _Conflicts

_:

– Ok(F1) AND Ok(F5) AND Ok(F6)

Ok(F2) AND Ok(F5) Hitting sets of Conflicts

– Ok(F2) AND Ok(F5)

• _{Possible diagnoses:}

{F5} ,{F1,F2},{F6,F2}

(17)

Software Diagnosis with Zoltar

P i iti

Bug Report

Prioritize

Diagnoses

g

Zoltar

Set of

Candidate

Diagnoses

(18)

Plan More Tests

Bug Report

AIEngine

Set of

Possible

Diagnoses

(19)

Plan More Tests

Bug Report

AIEngine

A single diagnosis A single diagnosis

Suggest New Test to

P

C

did

(20)

Pacman Example

(21)

Pacman Example

(22)

Pacman Example

(23)

Pacman Example

(24)

Pacman Example

(25)

Pacman Example

(26)

Pacman Example

(27)

Execution Trace

F1

F2

F3

F1

Move

F2

Stop

F3

Eat Pill

(28)

Which Diagnosis is Correct?

(29)

Knowledge

(30)

Knowledge

• _{Most probable cause: Eat Pill}

• _{Most easy to check: Stop}

_{Most easy to check: Stop}

(31)

Objective

Plan

a sequence of

tests

f d h

b

d

to find the

best diagnosis

with

minimal tester actions

(32)

1 Highest Probability (HP)

1. Highest Probability (HP)

h

b

f

• Compute the prob. of every function

C

_i

= Sum of prob. of candidates containing

p

g

C

_i_i

• Plan a test to

check the most probable function

0.2

F1

0.4 0.2

0.4

(33)

1 Highest Probability (HP)

1. Highest Probability (HP)

h

b

f

• Compute the prob. of every function

C

_i

= Sum of prob. of candidates containing

p

g

C

_i_i

• Plan a test to

check the most probable function

0.2 F1 0.4 0.6 0 4 F5 F2 F6 0.4

(34)

2 Lowest Cost (LC)

2. Lowest Cost (LC)

d

l f

h

( )

• Consider only functions C

_i

with

0<P(C

_i

)<1

• Test the closest function

from this set

F1

(35)

3 Entropy-based

3. Entropy-based

• A test α = a set of components

Ω set of candidates that assume α will pass

– Ω₊= set of candidates that assume α will pass

– Ω_- = set of candidates that assume α will fail – Ω_?_? = set of candidates that are indifferent to α

• Prob. α will pass = prob. of candidates in Ω₊

• Choose test with lowest Entropy(py(Ω₊₊ ,,Ω_- ,,Ω_?_?)) = highest information gain

0.2 F1 0.4 0.2 0 4 0.4 F5 F2 F6 0.4

(36)

4 Planning Under Uncertainty

4. Planning Under Uncertainty

• Outcome of test is unknown

Î we would like to minimize expected cost

• Formulate as Markov Decision Process (MDP)

– StatesStates: set of performed tests and their outcomes: set of performed tests and their outcomes

– Actions: Possible tests

– TransitionTransition: Pr(: Pr(ΩΩ₊) Pr(), Pr(ΩΩ_-) and Pr() and Pr(ΩΩ_?_?))

– Reward (cost): Cost of performed tests

• Current solver uses myopic sampling

(37)

Preliminary

Experimental Results

Preliminary

Experimental Results

• _{Synthetic code}

• _Setup

_Setup

– Generated random call graphs with 300 nodes

G d f h d ll h

– Generate code from the random call graphs

– Injected 10/20 random bugs

– Generate 15 random initial tests

• _{Run until a diagnosis with prob >0 9 is found}

• _{Run until a diagnosis with prob.>0.9 is found}

(38)

Preliminary Experimental Results

• HP and Entropy perform worse than (LC) and MDP

(39)

Preliminary Experimental Results

NUnits

Preliminary Experimental Results -

NUnits

• _{Real code}

• _{Well-known testing framework for C#}

_{Well known testing framework for C#}

• _Setup

– Generated call graphs with 302 nodes from NUnits

– Injected 10,15,20,25,30 random bugs j g

– Generate 15 random initial tests

• Run until a diagnosis with prob >0 9 is found

• Run until a diagnosis with prob.>0.9 is found

(40)

Preliminary Experimental Results NUnits

Preliminary Experimental Results - NUnits

(41)

Conclusion

• _{The test, diagnose and plan (TDP) paradigm:}

– AI diagnoses observed bug g g with MBD algorithmg

– AI plans further tests to find correct diagnosis

E

t t

i

AI

• _{Empowers tester using AI}

(42)

SCM Server Logs Source Code

i AI Engine