Questions & Answers Chapter 10 Software Reliability Prediction, Allocation and Demonstration Testing

Software Reliability Engineering

1. Homework: How to derive the formula of failure rate estimate.

$$\lambda' = \frac{\chi^2_{\alpha,\,2r+2}}{2t}$$

When the times between failures follow an exponential distribution with rate λ, the number of failures in a fixed time interval of length t follows a Poisson distribution with parameter λt. A Poisson process is applicable when the probability that an event occurs in a short time interval is proportional to the length of that interval.

The probability of exactly n failures in the time interval (0,t] is given by Poisson probability mass function (in case of discrete random variables and constant failure intensity λ):

$$\mathrm{pmf}\{n, t\} = \frac{(\lambda t)^n e^{-\lambda t}}{n!}$$

The above formula can be used to obtain the upper bound of failure rate λ by solving the following equation:

$$1 - CL = \sum_{n=0}^{r} \frac{(\lambda t)^n e^{-\lambda t}}{n!}$$

Here r is the total number of failures observed in time t, CL is the confidence level, and λ is the failure rate to be estimated at confidence level CL.

We can further manipulate the above equation in some steps to show the relationships with Chi-Squared distribution. We define:

$$x = \lambda t$$

Then

$$1 - CL = \sum_{n=0}^{r} \frac{x^n e^{-x}}{n!}$$

Using this equation, for a given confidence level CL, the upper bound x of the random variable X can be solved for:

$$\Pr(X < x) = CL = 1 - \sum_{n=0}^{r} \frac{x^n e^{-x}}{n!}$$

This equation can be related to the Gamma distribution. The gamma function is defined by:

$$\Gamma(\alpha) = \int_0^\infty x^{\alpha-1} e^{-x}\,dx$$

For a Gamma distribution Y ~ Gamma(k, λ) with integer shape k and rate λ, the cumulative distribution function (cdf) is:

$$\Pr(Y < y) = F(y, k, \lambda) = 1 - \sum_{n=0}^{k-1} \frac{(\lambda y)^n e^{-\lambda y}}{n!}$$

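Comparing the last two equations with k = r + 1 and λ = 1, one can see that X follows a Gamma distribution with shape r + 1 and rate 1. Multiplying by 2 only changes the scale, so 2X follows a Gamma distribution with shape r + 1 and scale 2, which is exactly the chi-squared distribution with 2(r + 1) degrees of freedom. Therefore we can also say:

$$2X \sim \chi^2_{2(r+1)} \quad \text{and} \quad X \sim \frac{\chi^2_{2(r+1)}}{2}$$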

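Since x = λt, the upper bound of the failure rate λ is:

$$\lambda = \frac{\chi^2_{\alpha,\,2r+2}}{2t}$$

Here χ²(α, 2r+2) is the value of the chi-squared distribution with 2r + 2 degrees of freedom that is not exceeded with probability CL. Under the common upper-tail convention this corresponds to α = 1 − CL.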
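As a rough numerical check of the derivation, the sketch below solves the Poisson equation 1 − CL = Σ xⁿe⁻ˣ/n! for x = λt by bisection, using only the Python standard library. The function names and the sample inputs (r, t, CL) are illustrative assumptions and do not come from the course material.

```python
import math


def poisson_cdf(r, x):
    """P(N <= r) for a Poisson random variable with mean x."""
    term = math.exp(-x)          # n = 0 term
    total = term
    for n in range(1, r + 1):
        term *= x / n            # x^n e^-x / n! built incrementally
        total += term
    return total


def failure_rate_upper_bound(r, t, cl):
    """Upper bound on lambda at confidence level cl, given r failures in time t."""
    target = 1.0 - cl
    lo, hi = 0.0, 1.0
    while poisson_cdf(r, hi) > target:   # the cdf decreases as the mean grows
        hi *= 2.0
    for _ in range(200):                 # plain bisection on x = lambda * t
        mid = 0.5 * (lo + hi)
        if poisson_cdf(r, mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) / t


# Hypothetical inputs: 0, 1 or 2 failures in 1,000 hours at 90% confidence.
for failures in (0, 1, 2):
    print(failures, failure_rate_upper_bound(failures, 1000.0, 0.90))
```

For r = 0 the sum has a single term, so the bound reduces to −ln(1 − CL)/t. The returned value multiplied by 2t should match the chi-squared quantile with 2r + 2 degrees of freedom read from a table.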

2. Explain the meaning of each column in the 1st case study.

I explained this in the discussion slides, so I will only summarize it briefly here. The columns in the case study on forecasting failure rate after a period of growth are defined as follows.

| Column Name | Definition |
|---|---|
| CSCI | Computer Software Configuration Item: one of the software elements that make up a software subsystem. |
| UTIL | Utilization of each CSCI, expressed as a percentage of total CPU time. It is the ratio of CSCI execution time to system operating time. The total utilization of the host computer is found by adding up the utilizations of the CSCIs. [% of total CPU seconds] |
| λ_s | Initial failure rate with respect to system operating seconds. [failure/CPU seconds] |
| SYS_τ | Amount of CPU time used in system test. [40 hours = 144,000 seconds] |
| CSCI_τ | Amount of CPU time that each CSCI used. [CPU seconds] |
| β | Parameter of the basic execution time model: the decrement of failure intensity per failure occurrence. [1/CPU seconds] |
| λ(τ) | Each CSCI's failure intensity after one hour of system test. |
| λ_s(τ) | Each CSCI's failure rate with respect to system operating seconds. [failure/CPU seconds] |
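The columns fit together roughly as in the sketch below. It assumes the exponential form of the basic execution time model, λ(τ) = λ0·exp(−βτ), and assumes that a failure rate per system operating second is the execution-time intensity scaled by UTIL. The function name and the sample numbers are hypothetical and are not the values of the case study.

```python
import math

SYS_TAU = 40 * 3600  # system test time in CPU seconds (40 hours = 144,000 s)


def forecast_csci(util, lambda_s0, beta, sys_tau=SYS_TAU):
    """Return (CSCI_tau, lambda(tau), lambda_s(tau)) for one CSCI.

    util      -- fraction of system operating time spent in this CSCI (0 < util <= 1)
    lambda_s0 -- initial failure rate per system operating second
    beta      -- decay parameter of the basic execution time model [1/CPU second]
    """
    csci_tau = util * sys_tau                            # CPU seconds this CSCI executed
    lambda_0 = lambda_s0 / util                          # initial intensity per CSCI CPU second
    lambda_tau = lambda_0 * math.exp(-beta * csci_tau)   # assumed exponential decay
    lambda_s_tau = util * lambda_tau                     # back to system operating seconds
    return csci_tau, lambda_tau, lambda_s_tau


# Hypothetical CSCI: 25% utilization, 1e-5 failures per system second, beta = 2e-5.
print(forecast_csci(util=0.25, lambda_s0=1e-5, beta=2e-5))
```

Under these assumptions the forecast λ_s(τ) equals the initial λ_s multiplied by exp(−β·CSCI_τ), so a CSCI with low utilization accumulates little execution time and shows little reliability growth.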

3. What is the use of software reliability allocation?

Software reliability allocation is the process of translating an overall (global) software reliability requirement into reliability goals for each lower-level software element. For example, suppose a software subsystem consists of 10 programs and the required overall subsystem failure rate is one failure per 10,000 system operating hours. We can allocate this overall requirement to the components in proportion to each component's forecasted failure rate. This gives the failure rate requirement of each component, i.e. how reliable each component must be for the subsystem to achieve the specified overall reliability requirement.

4. What data is used to do the allocation?

The main inputs to software reliability allocation are each CSCI's forecasted failure rate with respect to system operating time (𝜆𝑙) and the required failure rate (Λ𝑅𝐸𝑄𝐷).
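A minimal sketch of proportional allocation with these two inputs follows. It assumes each CSCI receives a share of the required rate equal to its share of the total forecasted rate. The CSCI names and forecast values are hypothetical.

```python
def allocate_failure_rates(forecast_rates, required_rate):
    """Split required_rate across CSCIs in proportion to their forecasted rates."""
    total = sum(forecast_rates.values())
    if total <= 0:
        raise ValueError("forecasted failure rates must sum to a positive value")
    return {name: required_rate * rate / total
            for name, rate in forecast_rates.items()}


# Hypothetical forecasts in failures per system operating hour.
forecast = {"CSCI-A": 4e-4, "CSCI-B": 1e-4, "CSCI-C": 5e-4}
required = 1 / 10_000   # one failure per 10,000 system operating hours

allocated = allocate_failure_rates(forecast, required)
for name, rate in allocated.items():
    print(name, rate)
```

The allocated rates sum to the required rate, and a CSCI that is forecast to fail more often receives a proportionally larger share of the failure budget.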

5. What is the advantage of demonstration testing?

It is important to quantify whether software is ready for use. Demonstration testing does this by showing that the software does not exceed the maximum allowable number of failures specified in the contract or in the pre-determined software reliability specification. It can therefore be regarded as a quality assessment tool for software.

6. Which one is the best among the three methods of demonstration testing?

The three types of software demonstration test are the fixed-duration test, the failure-free execution interval test and the sequential test. None of them is best in general; each is recommended for particular conditions.

According to MIL-HDBK-781:

- A fixed-duration test plan must be selected when it is necessary to obtain an estimate of the true MTBF demonstrated by the test as well as an accept-reject decision, or when total test time and cost must be known in advance.

- A sequential test plan may be selected when it is desired to accept or reject predetermined MTBF values (θ0, θ1) with predetermined risks of error (α, β), and when uncertainty in total test time is relatively unimportant. This test saves time compared with a fixed-duration test plan having similar risks and discrimination ratio when the true MTBF is much greater than θ0 or much less than θ1.

- A failure-free execution interval test will accept software that has a failure rate much lower than λ0 more quickly than a fixed-duration test.

7. If a software has been rejected by demonstration testing, what will happen next? Get through the entire cycle (again)?

System reliability programs are often mandated through contractual requirements with which the developer must comply. Even if the software is found to fail more often than the specification allows, by the end of demonstration testing the developer will, through debugging and fault removal, probably have identified the parts of the software that are most fault-prone. It is therefore not necessary to repeat the entire software reliability engineering cycle. Instead, the developer can concentrate on unit testing of one or several units and then repeat integration and system testing to check for improvement.

8. What is the difference between producer’s risk and consumer’s risk?

Producer's risk is the risk borne by the software developer. It is the probability of rejecting software whose (unknown) true failure rate is equal to the reliability goal of the software (λ0). Simply put, the producer is disadvantaged in this situation because the rejected software already meets the goal and therefore has a lower failure rate than the customer requires. The producer then has to allocate extra resources to re-engineering that is actually unnecessary.

Consumer's risk is the risk borne by the customer. It is the probability of accepting software whose true failure rate is equal to the reliability value specified by the customer (λ1), i.e. the highest failure rate the customer will tolerate.

Prior to the demonstration test, λ0 and λ1 are agreed between the developer and the customer. The reliability goal of the software should be more demanding than the customer's minimum specification, so λ0 < λ1.
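To make the two definitions concrete, the sketch below evaluates both risks for a fixed-duration plan that accepts the software when at most c failures occur in test time T. It assumes a constant failure rate, so the failure count is Poisson distributed. The plan values T and c are hypothetical, and the function names are illustrative.

```python
import math


def poisson_cdf(c, mean):
    """P(N <= c) for a Poisson random variable with the given mean."""
    term = math.exp(-mean)
    total = term
    for n in range(1, c + 1):
        term *= mean / n
        total += term
    return total


def decision_risks(lambda0, lambda1, test_time, max_failures):
    """Return (producer's risk, consumer's risk) for an accept-if-N<=c plan."""
    producer = 1.0 - poisson_cdf(max_failures, lambda0 * test_time)  # reject at lambda0
    consumer = poisson_cdf(max_failures, lambda1 * test_time)        # accept at lambda1
    return producer, consumer


# Hypothetical plan: 30,000 test hours, accept with at most 2 failures.
alpha, beta = decision_risks(lambda0=0.00005, lambda1=0.0001,
                             test_time=30_000, max_failures=2)
print("producer's risk:", alpha)
print("consumer's risk:", beta)
```

Lengthening the test or changing c trades one risk against the other, which is why the developer and the customer agree on both risks before a plan is chosen.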

9. How do you calculate the two decision risks, which are specified as test parameters?

The two test parameters (producer's and consumer's risks) are not calculated from a mathematical formula. They are set by agreement between the software developer and the customer, or adopted from widely applied industry practice, where they usually range from 10% to 30%.

10. In the failure-free execution interval test, what if we have a failure rate lower than λ0?

In any of the three types of demonstration test, a good test plan should accept with high probability software whose true failure rate approaches λ0, and all the more so software whose failure rate is lower than λ0.
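This behaviour can be checked numerically for a failure-free execution interval test. The sketch assumes failures follow a Poisson process with constant rate and that the software is accepted as soon as any failure-free interval of length t fits inside the test time T. When t is at least T/2, acceptance happens in one of two ways: the first t time units are failure-free, or a failure at some time s ≤ T − t is followed by t failure-free units. These two cases cannot both occur, which gives P(accept) = e^(−λt)(1 + λ(T − t)). This closed form is a derivation under those assumptions, not a formula quoted from MIL-HDBK-781. The plan values are those of the d = 2.454 row of Table 1, with T normalized to 1.

```python
import math


def accept_probability(rate, total_time, free_interval):
    """P(some failure-free interval of length t lies within [0, T]), for t >= T/2."""
    if not (total_time / 2 <= free_interval <= total_time):
        raise ValueError("this closed form only holds for T/2 <= t <= T")
    slack = total_time - free_interval
    return math.exp(-rate * free_interval) * (1.0 + rate * slack)


# Table 1 row with d = 2.454: lambda1*T = 4.086, lambda0*T = 1.665, t/T = .55.
T = 1.0
t = 0.55 * T
lambda0 = 1.665 / T
lambda1 = 4.086 / T

for label, rate in (("lambda1", lambda1), ("lambda0", lambda0),
                    ("0.5 * lambda0", 0.5 * lambda0), ("0.1 * lambda0", 0.1 * lambda0)):
    print(label, accept_probability(rate, T, t))
```

The acceptance probability is about 0.30 at λ1 and about 0.70 at λ0, matching risks of .30 for this plan, and it keeps rising as the true failure rate drops below λ0.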

11. In failure-free execution interval test, how to decide the interval?

It is determined from standards. Here is an example of a failure-free execution interval test based on MIL-HDBK-781. In a failure-free execution interval test, the software is given T time units to achieve a failure-free interval of t time units.

First, the customer specifies λ1 as 0.0001 failures/hour. The producer's and consumer's risks are set at 30%. The reliability goal for the software is specified as λ0 = 0.00005 failures/hour. The discrimination ratio (d) is calculated as 0.0001/0.00005 = 2. The lower test MTBF is θ1 = 1/λ1 = 10,000 hours. This is the lowest acceptable level of MTBF for the software as required by the customer.

In Table 1, which is adapted from MIL-HDBK-781, we look up α = 0.3 (column 1) and β = 0.3 (column 2). The tabulated discrimination ratio closest to d = 2.0 is 1.995 (column 3). This row gives λ1T = 7.008 (column 4), so T = 7.008/0.0001 = 70,080 hours. This is the test time given to the developer for achieving the failure-free interval.

Since t/T = 0.40 (column 6), t = 0.40 × 70,080 = 28,032 hours. This is the failure-free interval that the software has to achieve.
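The arithmetic of this example can be reproduced with a few lines of Python. The values λ1T = 7.008 and t/T = 0.40 are the table entries quoted in the text; the function and variable names are illustrative.

```python
def interval_test_times(lambda_1, lambda1_times_T, t_over_T):
    """Return (T, t): total test time and required failure-free interval."""
    total_time = lambda1_times_T / lambda_1
    free_interval = t_over_T * total_time
    return total_time, free_interval


lambda_1 = 0.0001    # customer's maximum failure rate [failures/hour]
lambda_0 = 0.00005   # reliability goal [failures/hour]

d = lambda_1 / lambda_0        # discrimination ratio
theta_1 = 1 / lambda_1         # lower test MTBF [hours]
T, t = interval_test_times(lambda_1, lambda1_times_T=7.008, t_over_T=0.40)

print("d =", d)
print("theta_1 =", theta_1, "hours")
print("T =", T, "hours")
print("t =", t, "hours")
```

Because λ1 is given per hour, T and t come out in hours.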

Table 1 – Failure-Free Execution Interval Test Plan – MIL-HDBK-781

| α | β | d | λ1T | λ0T | t/T | ETT/T for λ0 | ETT/T for λ1 |
|---|---|---|---|---|---|---|---|
| .10 | .10 | 2.442 | 63.308 | 25.925 | .10 | .88 | .43 |
| .10 | .10 | 2.814 | 38.581 | 13.710 | .15 | .84 | .45 |
| .30 | .30 | 2.454 | 4.086 | 1.665 | .55 | .62 | .63 |