Imperfect Debugging in Software Reliability


Tevfik Aktekin and Toros Caglar∗

University of New Hampshire

Peter T. Paul College of Business and Economics, Department of Decision Sciences

and United Health Care∗


Outline


What is Meant by Software Reliability and Debugging?

Motivation and Uncertainties of Interest

Very Brief Overview of Software Reliability Models

The Proposed Model and Its Bayesian Inference

Numerical Applications of the Proposed Model

Investigating the Existence of Perfect vs. Imperfect Debugging in Software Data


What is Meant by Software Reliability?

Definition

Software Reliability: The probability that a piece of software performs successfully for a specified time interval under specified conditions. Running without any problems means that the code is generating the intended output.

A failure is said to occur when a fault (a so-called bug) causes the software to fail.

Note: If a failure does not occur for a period of time, this does not necessarily imply that the software is bug free.


Debugging Stage in Software Reliability

The goal of software testing (i.e., debugging) is to detect and fix the faults (bugs) inherent in the software code and to decide when to release the software.

Software testing is expensive, so there is a trade-off between releasing a reasonably good piece of software and continuing to debug it.

There are many examples of buggy software being released and hurting sales (for instance, through a poor first impression in the gaming software market).

The testing stage consists of several consecutive program executions. Whenever a failure occurs, the software engineer attempts to fix the problem.

As the faults reveal themselves and are eliminated by the software engineer, the reliability of the software tends to


Perfect vs. Imperfect Debugging

Definition

Perfect Debugging: If, during the debugging stage, the fault that caused the failure is eliminated permanently and no new faults are introduced, then perfect debugging is said to have occurred (software reliability improves).

Definition

Imperfect Debugging: Loosely speaking, if during the debugging stage a fault is detected but not eliminated permanently (e.g., because new faults are introduced), then imperfect debugging is said to have occurred (software reliability stays the same or worsens). Note that there are many possible definitions of imperfect debugging (dealing with multiple faults vs. a single fault, or the type of software being dealt with, as in gaming vs. word


What is Meant by Software Reliability?

$P(T \mid \theta)$ is usually referred to as a software reliability model, where $T$ represents the time to failure of a piece of software. It describes the probability that a particular piece of software will fail in a given interval.

$P(T \ge t \mid \theta)$, for some $t \ge 0$, represents the reliability function of $T$.

Note that here the notion of time is based on the mission time $t$, in other words, the time during which the software is


Statistical Motivation and Uncertainties of Interest

Carry out inference on unknown parameters, such as the number of faults inherent in the code and the fault detection rate, after each stage of testing.

Obtain the (predictive) reliability function after each debugging stage as a function of the parameters of interest:

$$P(T_{i+1} \ge t \mid \theta, D). \qquad (1)$$

Making a decision on whether to stop the testing and release the software based on the reliability assessment of the


Software Reliability Models in Literature

The following two can be considered to be the building blocks for most of the current software reliability models:

Jelinski and Moranda (JM) Model (1972)

Each fault is permanently removed upon failure (perfect debugging).

Each fault contributes equally to the failure rate at any stage of the testing.

Littlewood and Verrall (LV) Model (1973)

The LV Model relaxes both assumptions of the JM Model: perfect debugging and equally likely


Proposed Model

Consider a multiplicative failure rate model whose components evolve stochastically over testing stages (an NHPP-type model).

The proposed model is based on the Jelinski-Moranda (JM) model, as is most subsequent work in the software reliability literature.

Two of the main assumptions of the JM model are that every fault contributes equally to the failure rate at any stage of testing and that each fault is removed permanently upon failure.

Consider the case where each fault is removed permanently upon failure, along with the possibility of introducing new faults during debugging.


Proposed Model and Definitions

$t_i$, for $i = 1, \ldots, N$: time until failure during (or prior to) the $i$th stage of testing.

$\phi_i$, for $i = 1, \ldots, N$: fault detection rate per fault during (or prior to) the $i$th stage of testing.

$\lambda_i$, for $i = 1, \ldots, N$: the number of faults present in the software code during (or prior to) the $i$th stage of testing.

$\lambda_1$: the number of faults present in the software code during (or prior to) the first stage of testing (i.e., prior to testing). Note that the $\lambda_i$'s will be functions of $\lambda_1$ and the $\phi_i$'s.

$\phi_i \lambda_i$: the failure rate of the software during (or prior to) the


Proposed Model

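Assume that the inter-failure times, $t_i$, are exponentially distributed. The likelihood is then

$$L(\Phi, \Lambda; D) = \prod_{i=1}^{N} \phi_i \lambda_i \exp\{-t_i \phi_i \lambda_i\}, \qquad (2)$$

where $\Phi = \{\phi_1, \ldots, \phi_N\}$, $\Lambda = \{\lambda_1, \ldots, \lambda_N\}$ and $D = \{t_1, \ldots, t_N\}$.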
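To make (2) concrete, here is a minimal sketch of its logarithm, assuming the three sequences are aligned by stage ($t_i$, $\phi_i$, $\lambda_i$ for $i = 1, \ldots, N$). The function name `log_likelihood` is hypothetical and not taken from the original work.

```python
import math


def log_likelihood(t, phi, lam):
    """log L(Phi, Lambda; D) = sum_i [log(phi_i * lam_i) - t_i * phi_i * lam_i].

    t, phi, lam: equal-length sequences of inter-failure times,
    per-fault detection rates and fault counts for stages 1..N.
    """
    if not (len(t) == len(phi) == len(lam)):
        raise ValueError("t, phi and lam must have the same length")
    total = 0.0
    for t_i, phi_i, lam_i in zip(t, phi, lam):
        rate = phi_i * lam_i  # failure rate of the software at stage i
        if rate <= 0:
            # zero failure rate (e.g. no faults left): the exponential density is zero
            return -math.inf
        total += math.log(rate) - t_i * rate
    return total
```

Working on the log scale avoids underflow when the product over 31 or more stages is taken.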

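Therefore, the joint posterior of $\Phi$ and $\Lambda$ is given by

$$p(\Phi, \Lambda \mid D) \propto \prod_{i=1}^{N} \phi_i \lambda_i \exp\{-t_i \phi_i \lambda_i\}\, p(\Phi, \Lambda). \qquad (3)$$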


Modeling the fault detection rate per fault, $\phi_i$

Let the $\phi_i$'s be given by the following power law relationship:

$$\phi_i = \phi_{i-1}^{\beta} \times \nu_i, \quad i = 1, \ldots, N, \qquad (4)$$

where $\nu_i \sim LN(0, \sigma^2)$. Taking logarithms gives the following linear model:

$$\log(\phi_i) = \beta \log(\phi_{i-1}) + \epsilon_i, \quad i = 1, \ldots, N, \qquad (5)$$

where $\epsilon_i = \log(\nu_i)$. Equation (5) is a first-order autoregressive process of the latent fault detection rates per fault on the log scale. The conditional distributions of the $\log(\phi_i)$'s can be written as

$$\log(\phi_i) \mid \log(\phi_{i-1}), \beta \sim N\big(\beta \log(\phi_{i-1}), \sigma^2\big), \quad i = 1, \ldots, N. \qquad (6)$$
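A minimal simulation sketch of (4)-(6), assuming a fixed starting value $\phi_1$ and treating $\sigma$ as a standard deviation; it generates $\phi_2, \ldots, \phi_N$ through the AR(1) recursion on the log scale. The function `simulate_phi` and the parameter values in the demo are hypothetical.

```python
import math
import random


def simulate_phi(phi_1, beta, sigma, n_stages, seed=None):
    """Simulate phi_1, ..., phi_N from the power law (4).

    Uses the AR(1) form (5): log(phi_i) = beta * log(phi_{i-1}) + eps_i,
    eps_i ~ N(0, sigma^2). sigma is a standard deviation; phi_1 is given.
    """
    if phi_1 <= 0:
        raise ValueError("phi_1 must be positive")
    rng = random.Random(seed)
    log_phi = [math.log(phi_1)]
    for _ in range(1, n_stages):
        eps = rng.gauss(0.0, sigma)  # eps_i = log(nu_i)
        log_phi.append(beta * log_phi[-1] + eps)
    return [math.exp(x) for x in log_phi]  # back to the phi scale


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(simulate_phi(phi_1=0.02, beta=1.05, sigma=0.3, n_stages=31, seed=1))
```

With $\sigma = 0$ the path is deterministic, $\phi_i = \phi_{i-1}^{\beta}$, which makes the role of $\beta$ easy to see.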

Note the scale of the inter-failure times (all above one in our


Modeling the fault detection rate per fault, $\phi_i$

The relationship implied by (4) also dictates the type of debugging that occurs during the $i$th debugging stage.

If $\phi_i < \phi_{i-1}$, then perfect debugging is said to have occurred.

If $\phi_i \ge \phi_{i-1}$, then imperfect debugging is said to have occurred.
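Combining these definitions with (5) gives a closed form for the conditional probability of perfect debugging: $\phi_i < \phi_{i-1}$ exactly when $\epsilon_i < (1-\beta)\log(\phi_{i-1})$, so $\Pr(\phi_i < \phi_{i-1} \mid \phi_{i-1}, \beta, \sigma^2) = F_{N(0,1)}\big((1-\beta)\log(\phi_{i-1})/\sigma\big)$, where $F_{N(0,1)}$ is the standard normal CDF. The sketch below assumes $\epsilon_i \sim N(0, \sigma^2)$ with $\sigma > 0$, as implied by (4)-(5); `prob_perfect_debugging` is a hypothetical helper.

```python
import math
from statistics import NormalDist


def prob_perfect_debugging(phi_prev, beta, sigma):
    """P(phi_i < phi_{i-1} | phi_{i-1}, beta, sigma) under the AR(1) model (5).

    phi_i < phi_prev  <=>  eps_i < (1 - beta) * log(phi_prev),
    with eps_i ~ N(0, sigma^2) and sigma > 0 a standard deviation.
    """
    threshold = (1.0 - beta) * math.log(phi_prev)
    return NormalDist(mu=0.0, sigma=sigma).cdf(threshold)


if __name__ == "__main__":
    # Hypothetical values: with 0 < phi_prev < 1, beta > 1 favours perfect debugging.
    for b in (0.9, 1.0, 1.1):
        print(b, prob_perfect_debugging(phi_prev=0.02, beta=b, sigma=0.5))
```

For $0 < \phi_{i-1} < 1$ the logarithm is negative, so $\beta > 1$ pushes the probability above 1/2 and $0 < \beta < 1$ pushes it below, which is the behavior described for $\beta$ on this slide.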

In other words, under imperfect debugging, when a failure is detected at the $(i-1)$th failure epoch, a fault is detected and repaired, but a new fault is introduced during the same debugging stage. On average, $\beta$ determines how the fault detection rate per fault changes from stage to stage. For instance, when $0 < \phi_{i-1} < 1$ and $\beta > 1$, perfect debugging tends to occur; conversely, when $0 < \beta < 1$, then


Modeling the total number of faults, $\lambda_i$

Conditional on whether perfect or imperfect debugging occurred during the previous debugging stage, the total number of faults left in the software code, $\lambda_i$, is assumed to have the following structure:

$$\lambda_i = \lambda_{i-1} - \gamma_i, \quad i = 1, \ldots, N, \qquad (7)$$

where

$$\gamma_i = \begin{cases} 1, & \text{with probability } p(\phi_i < \phi_{i-1}), \\ 0, & \text{with probability } p(\phi_i \ge \phi_{i-1}), \end{cases} \quad i = 1, \ldots, N.$$
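Given a realized path of the $\phi_i$'s, the indicator form $\gamma_i = 1(\phi_i < \phi_{i-1})$ (the form used in the MCMC steps) makes the fault counts deterministic. A minimal sketch, treating $\lambda_1$ as given and applying (7) from the second stage onward; `fault_counts` is a hypothetical name.

```python
def fault_counts(lambda_1, phi):
    """Fault counts lambda_1, ..., lambda_N implied by (7) for a realized phi path.

    gamma_i = 1(phi_i < phi_{i-1}): one fault is removed under perfect
    debugging, none under imperfect debugging.
    """
    lam = [lambda_1]
    for i in range(1, len(phi)):
        gamma_i = 1 if phi[i] < phi[i - 1] else 0
        lam.append(lam[-1] - gamma_i)
    return lam


if __name__ == "__main__":
    # Hypothetical path: perfect, imperfect, perfect debugging -> [5, 4, 4, 3]
    print(fault_counts(5, [0.30, 0.20, 0.25, 0.10]))
```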

In (7), $\gamma_i$ is a Bernoulli process whose probability of success is the probability of perfect debugging, $p(\phi_i < \phi_{i-1})$. When perfect debugging occurs, $\lambda_i$ goes down by one unit, since the fault that caused the failure has been found and fixed. When imperfect


Other Model Priors

For the model on the $\phi_i$'s,

$$\sigma^2 \sim \text{Gamma}(a, b) \qquad (8)$$

and

$$\beta \sim U(c, d), \qquad (9)$$

and for the initial fault detection rate, $\phi_1$, we assume

$$\phi_1 \sim LN(e, f). \qquad (10)$$

For the initial number of inherent faults, $\lambda_1$, we assume

$$\lambda_1 \sim \text{Poisson}(\theta), \qquad (11)$$

with


Markov Chain Monte Carlo Estimation

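Goal: To generate samples from

$$p(\Phi, \lambda_1, \theta, \beta, \sigma^2 \mid D) \propto \prod_{i=1}^{N} \phi_i \lambda_i \exp\{-t_i \phi_i \lambda_i\}\, p(\Phi \mid \beta, \sigma^2)\, p(\lambda_1 \mid \theta)\, p(\beta)\, p(\sigma^2)\, p(\theta). \qquad (13)$$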

In (13), the conditional joint prior distribution of the $\phi_i$'s can be obtained using the chain rule and dropping the terms that are independent, as

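A minimal sketch of one chain-rule factorization of $p(\Phi \mid \beta, \sigma^2)$ in (13): $p(\phi_1)\prod_{i=2}^{N} p(\phi_i \mid \phi_{i-1}, \beta, \sigma^2)$, where each factor is lognormal by (6) and (10). The code evaluates the logarithm of this product. It assumes that the second argument of $LN(\cdot, \cdot)$ is a variance on the log scale (for both $f$ and $\sigma^2$) and that the recursion starts at $\phi_1$; the helper names are hypothetical.

```python
import math


def lognormal_logpdf(x, mu, var):
    """Log density of LN(mu, var) at x > 0; var is the variance of log(x)."""
    z = math.log(x) - mu
    return -math.log(x) - 0.5 * math.log(2.0 * math.pi * var) - z * z / (2.0 * var)


def log_prior_phi(phi, beta, sigma2, e, f):
    """log p(phi_1) + sum_{i>=2} log p(phi_i | phi_{i-1}, beta, sigma2)."""
    total = lognormal_logpdf(phi[0], e, f)  # phi_1 ~ LN(e, f), prior (10)
    for i in range(1, len(phi)):
        # phi_i | phi_{i-1} ~ LN(beta * log(phi_{i-1}), sigma2), from (6)
        total += lognormal_logpdf(phi[i], beta * math.log(phi[i - 1]), sigma2)
    return total
```

Adding this to the log-likelihood of (2) and the log priors on $\lambda_1$, $\beta$, $\sigma^2$ and $\theta$ gives the unnormalized log posterior in (13).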

Markov Chain Monte Carlo Estimation

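To sample from the full conditionals:

$p(\phi_i \mid \ldots, D)$ for $i = 1, \ldots, N$: use Metropolis-Hastings.

$p(\lambda_1 \mid \ldots, D)$: discrete. In addition, once the required samples are available, $\lambda_j$ can be computed as $\lambda_{j-1} - 1(\phi_j < \phi_{j-1})$ for $j = 2, \ldots, N$.

$p(\theta \mid \ldots, D)$: Gamma.

$p(\beta \mid \ldots, D)$: Normal (restricted to $(c, d)$ by the uniform prior (9)).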
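The normal form of the $\beta$ update can be sketched as follows. Writing $x_i = \log(\phi_i)$, the AR(1) terms from (6) are Gaussian in $\beta$ with mean $m = \sum x_{i-1} x_i / \sum x_{i-1}^2$ and variance $v = \sigma^2 / \sum x_{i-1}^2$, and the uniform prior (9) restricts this normal to $(c, d)$. This sketch assumes the recursion starts at $\phi_1$ (so the sums run over $i = 2, \ldots, N$) and draws from the truncated normal by inverse-CDF sampling with Python's `statistics.NormalDist`; the function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def beta_conditional_moments(log_phi, sigma2):
    """Mean and variance of the Gaussian kernel in beta from the AR(1) terms (6).

    log_phi = [x_1, ..., x_N] with x_i = log(phi_i); sums run over i = 2..N.
    """
    prev, curr = log_phi[:-1], log_phi[1:]
    sxx = sum(x * x for x in prev)                # sum of x_{i-1}^2
    sxy = sum(a * b for a, b in zip(prev, curr))  # sum of x_{i-1} * x_i
    return sxy / sxx, sigma2 / sxx


def sample_beta(log_phi, sigma2, c, d, rng):
    """Draw beta from N(m, v) truncated to (c, d) by inverse-CDF sampling.

    Assumes (c, d) is not so far in the tails that cdf() rounds to 0 or 1.
    """
    m, v = beta_conditional_moments(log_phi, sigma2)
    dist = NormalDist(mu=m, sigma=math.sqrt(v))
    lo, hi = dist.cdf(c), dist.cdf(d)
    return dist.inv_cdf(lo + (hi - lo) * rng.random())


if __name__ == "__main__":
    # Hypothetical log fault detection rates and hyperparameters.
    rng = random.Random(42)
    x = [math.log(p) for p in (0.030, 0.025, 0.027, 0.020, 0.018)]
    print(sample_beta(x, sigma2=0.1, c=0.8, d=1.1, rng=rng))
```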


Markov Chain Monte Carlo Estimation

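To generate samples from $p(\Phi, \lambda_1, \theta, \beta, \sigma^2 \mid D_N)$:

1. Choose the starting points $(\lambda_1^{(0)}, \phi_2^{(0)}, \theta^{(0)}, (\sigma^2)^{(0)}, \beta^{(0)})$ and set $l = 1$.

2. Generate $\phi_1^{(l)}$ from $(\phi_1 \mid \ldots, D)$ using $\lambda_1^{(l-1)}$, $\phi_2^{(l-1)}$, $\beta^{(l-1)}$ and $(\sigma^2)^{(l-1)}$.

3. Sequentially generate $\phi_i^{(l)}$ for $i = 2, \ldots, N$ from $(\phi_i \mid \ldots, D)$ using $\lambda_1^{(l-1)}$, $\beta^{(l-1)}$, $(\sigma^2)^{(l-1)}$ and $\phi_{i-1}^{(l)}$.

4. Generate $\beta^{(l)}$ from $(\beta \mid \ldots, D)$ using $(\sigma^2)^{(l-1)}$ and $\phi_i^{(l)}$ for $i = 1, \ldots, N$.

5. Generate $(\sigma^2)^{(l)}$ from $(\sigma^2 \mid \ldots, D)$ using $\beta^{(l)}$ and $\phi_i^{(l)}$ for $i = 1, \ldots, N$.

6. Generate $\lambda_1^{(l)}$ from $(\lambda_1 \mid \ldots, D)$ using $\phi_i^{(l)}$ for $i = 1, \ldots, N$ and $\theta^{(l-1)}$.

7. Sequentially compute $\lambda_i^{(l)}$ for $i = 2, \ldots, N$ using $\lambda_{i-1}^{(l)}$ and $\phi_i^{(l)}$ via $\lambda_i = \lambda_{i-1} - 1(\phi_i < \phi_{i-1})$.

8. Generate $\theta^{(l)}$ from $(\theta \mid \ldots, D_N)$ using $\lambda_1^{(l)}$.

9. Set $l = l + 1$ and go back to step 2.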
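Steps 2 and 3 rely on Metropolis-Hastings. The sketch below is one random-walk Metropolis-Hastings update on $x = \log(\phi_i)$ for a single stage $i > 1$, under a simplification not stated in the source: $\lambda_i$ is held fixed during the update rather than recomputed from the indicators. On the log scale the lognormal terms from (6) become Gaussian in $x$, and the proposal is symmetric, so no Hastings correction is needed. The stage $\phi_1$, whose prior is (10), is not covered. Function names, the step size and the demo inputs are hypothetical.

```python
import math
import random


def log_target_x(x, t_i, lam_i, x_prev, x_next, beta, sigma2):
    """Unnormalized log full conditional of x = log(phi_i), stage i > 1.

    Simplification: lam_i is held fixed. x_next is log(phi_{i+1}),
    or None for the last stage.
    """
    rate = math.exp(x) * lam_i
    out = math.log(rate) - t_i * rate                 # likelihood term from (2)
    out -= (x - beta * x_prev) ** 2 / (2.0 * sigma2)  # log(phi_i) | log(phi_{i-1}), (6)
    if x_next is not None:
        out -= (x_next - beta * x) ** 2 / (2.0 * sigma2)  # log(phi_{i+1}) | log(phi_i)
    return out


def mh_step(x, step, rng, **target_kw):
    """One random-walk Metropolis-Hastings update of x = log(phi_i)."""
    proposal = x + rng.gauss(0.0, step)  # symmetric proposal
    log_ratio = log_target_x(proposal, **target_kw) - log_target_x(x, **target_kw)
    if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
        return proposal, True  # accepted
    return x, False            # rejected: keep the current value


if __name__ == "__main__":
    # Hypothetical inputs for a single stage.
    rng = random.Random(7)
    kw = dict(t_i=12.0, lam_i=25, x_prev=math.log(0.02), x_next=math.log(0.019),
              beta=1.0, sigma2=0.1)
    x = math.log(0.02)
    for _ in range(1000):
        x, _ = mh_step(x, 0.3, rng, **kw)
    print(math.exp(x))
```

Updating on the log scale keeps every proposal for $\phi_i$ positive without any rejection of invalid values.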


Numerical Example

Dataset 1: The numerical application of our model is carried out on the well-known dataset first reported by Jelinski and Moranda (1972). The dataset consists of 31 software inter-failure times, 26 of which were obtained during the production stage of debugging and the remaining 5 during the rest of the testing stage. In our example, all 31 inter-failure times were used (most of the inference is based on this dataset).

Dataset 2: The military systems application data (data set 17) of John Musa of Bell Telephone Laboratories, with 38 inter-failure times (used only for comparison purposes).

Model Comparison: The harmonic mean estimator of the marginal likelihood and the DIC are used.

Models to Compare Against: MS88-M1, MS88-M2, KY96-GOS-W, KY96-RVS-P and a simple perfect debugging (PD) model.
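Both comparison criteria can be computed from MCMC output. A minimal sketch, assuming per-draw log-likelihoods $\log p(D \mid \theta^{(s)})$, $s = 1, \ldots, S$, are available, that the harmonic mean estimator is $\hat p(D) = \big[\frac{1}{S}\sum_s 1/p(D \mid \theta^{(s)})\big]^{-1}$, and that DIC uses the standard definition $\text{DIC} = \bar D + p_D$ with $D(\theta) = -2\log p(D \mid \theta)$ and $p_D = \bar D - D(\bar\theta)$. The function names are hypothetical.

```python
import math


def log_harmonic_mean(loglik_draws):
    """Harmonic mean estimate of log p(D) from draws of log p(D | theta^(s))."""
    neg = [-ll for ll in loglik_draws]
    top = max(neg)
    # log-sum-exp of -loglik for numerical stability
    lse = top + math.log(sum(math.exp(v - top) for v in neg))
    return math.log(len(loglik_draws)) - lse


def dic(loglik_draws, loglik_at_posterior_mean):
    """DIC = Dbar + pD, D(theta) = -2 log p(D | theta), pD = Dbar - D(theta_bar)."""
    d_bar = sum(-2.0 * ll for ll in loglik_draws) / len(loglik_draws)
    d_hat = -2.0 * loglik_at_posterior_mean
    p_d = d_bar - d_hat  # effective number of parameters
    return d_bar + p_d
```

A larger $\log\{p(D)\}$ and a smaller DIC both favor a model, which is how the comparison tables are read.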


Numerical Example

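| | ID | PD | MS88-M1 | MS88-M2 | KY96-GOS-W | KY96-RVS-P |
|---|---|---|---|---|---|---|
| log{p(D)} | -108.55 | -191.06 | -116.51 | -113.81 | -114.01 | -111.77 |
| DIC | 215.49 | 218.09 | 227.76 | 226.24 | 223.26 | 222.90 |

Table: log{p(D)} and DIC for the JM dataset

| | ID | PD | MS88-M1 | MS88-M2 | KY96-GOS-W | KY96-RVS-P |
|---|---|---|---|---|---|---|
| log{p(D)} | -50.41 | -50.49 | -57.87 | -53.98.81 | -53.36 | -55.03 |
| DIC | 101.95 | 100.58 | 114.52 | 107.89 | 105.45 | 109.59 |

Table: log{p(D)} and DIC for the Musa (data 17) dataset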


Summary of Findings

Figure: Boxplots of $\phi_i$ for $i = 1, \ldots, 31$ (left) and the probability of perfect debugging vs. debugging stages (right)


Summary of Findings

Figure: Posterior densities of $\beta$, $\phi_1$, $\theta$ and $\lambda_1$


Summary of Findings

Figure: $\lambda_i$ vs. debugging stages


Summary of Findings

Figure: Failure rate vs. debugging stages


Predictive reliability function estimation

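The predictive reliability function during the $i$th testing stage (given that the software has gone through $i-1$ stages of testing) can be computed via

$$R(t_i \mid D^{(i-1)}) = 1 - \frac{1}{S} \sum_{j=1}^{S} F\left(t_i \mid \phi_i^{(j)}, \lambda_i^{(j)}, D^{(i-1)}\right), \qquad (15)$$

where $S$ is the number of posterior samples and $F$ is the distribution function of the inter-failure time.

Once (15) is estimated, it can easily be used as part of an optimal software release scheme.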

Figure: Predictive reliability functions $R(t \mid D)$ vs. $t$, including the predictive reliability function after 28 stages of testing
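Under the exponential inter-failure time model behind (2), $F(t \mid \phi, \lambda) = 1 - \exp\{-\phi\lambda t\}$, so (15) is an average over posterior draws. A minimal sketch, assuming paired posterior draws of $\phi_i$ and $\lambda_i$ for the stage of interest are available as lists; `predictive_reliability` is a hypothetical name.

```python
import math


def predictive_reliability(t, phi_draws, lam_draws):
    """Monte Carlo estimate of (15) with F the exponential CDF 1 - exp(-phi*lam*t)."""
    if len(phi_draws) != len(lam_draws) or not phi_draws:
        raise ValueError("need the same non-zero number of phi and lambda draws")
    s = len(phi_draws)
    cdf_sum = sum(1.0 - math.exp(-t * phi * lam) for phi, lam in zip(phi_draws, lam_draws))
    return 1.0 - cdf_sum / s
```

Evaluating it on a grid of $t$ values traces out a curve of $R(t \mid D)$ like the one plotted after 28 stages of testing.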


Concluding Remarks and Future Work

Introduce a Markov chain type of structure on the number of bugs repaired or introduced during the debugging stage instead of the current Bernoulli setup.

Investigate the possibility of state space evolution of the $\beta$ coefficient in the power law relationship between successive fault detection rates per fault.

Note: Both extensions would be challenging from an MCMC estimation point of view.