Software Reliability Engineering
Questions & Answers – Chapter 10
Software Reliability Prediction, Allocation and Demonstration Testing
1. Homework: How is the formula for the failure rate estimate derived?
$$\lambda' = \frac{\chi^2_{\alpha,\,2r+2}}{2t}$$
When the failure times follow an exponential distribution, the number of failures in a fixed time interval of length t follows a Poisson distribution with parameter λt. The Poisson process is applicable when the probability that an event occurs is proportional to the length of the time interval.
The probability of exactly n failures in the time interval (0,t] is given by Poisson probability mass function (in case of discrete random variables and constant failure intensity λ):
$$\mathrm{pmf}\{n, t\} = \frac{(\lambda t)^n e^{-\lambda t}}{n!}$$
The above formula can be used to obtain the upper bound of the failure rate λ by solving the following equation:
$$1 - CL = \sum_{n=0}^{r} \frac{(\lambda t)^n e^{-\lambda t}}{n!}$$
where r is the total number of failures, CL is the confidence level and λ is the failure rate to be estimated at confidence level CL.
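The equation above has no closed-form solution for λ when r > 0, but it can be solved numerically. The following is a minimal sketch using only the Python standard library; the function names `poisson_cdf` and `failure_rate_upper_bound` are illustrative choices, and bisection is just one convenient root-finding method.

```python
import math


def poisson_cdf(r, x):
    """Return sum_{n=0}^{r} x^n e^{-x} / n!, i.e. P(N <= r) for N ~ Poisson(x)."""
    term = math.exp(-x)  # n = 0 term
    total = term
    for n in range(1, r + 1):
        term *= x / n  # builds x^n e^{-x} / n! incrementally
        total += term
    return total


def failure_rate_upper_bound(r, t, cl, tol=1e-12):
    """Solve 1 - CL = sum_{n=0}^{r} (lambda t)^n e^{-lambda t} / n! for lambda."""
    if not 0.0 < cl < 1.0:
        raise ValueError("cl must lie strictly between 0 and 1")
    target = 1.0 - cl
    lo, hi = 0.0, 1.0
    # The Poisson sum decreases as x = lambda * t grows, so widen the
    # bracket until the sum falls to or below the target.
    while poisson_cdf(r, hi) > target:
        hi *= 2.0
    # Bisection on x = lambda * t.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if poisson_cdf(r, mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) / t


# Hypothetical inputs: 2 failures observed in 1,000 hours, 90% confidence.
print(failure_rate_upper_bound(r=2, t=1000.0, cl=0.90))
```

For r = 0 the result reduces to −ln(1 − CL)/t, which is a quick way to check the routine.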
We can manipulate the above equation in a few steps to show its relationship with the chi-squared distribution. We define:
$$x = \lambda t$$
Then
$$1 - CL = \sum_{n=0}^{r} \frac{x^n e^{-x}}{n!}$$
Using this equation, for a given confidence level CL, the upper bound x of the random variable X can be solved for:
$$\Pr(X < x) = CL = 1 - \sum_{n=0}^{r} \frac{x^n e^{-x}}{n!}$$
This equation can be related to the gamma distribution. The gamma function is defined by:
$$\Gamma(\alpha) = \int_0^{\infty} x^{\alpha-1} e^{-x}\,dx$$
For a gamma-distributed random variable Y ~ Γ(k, λ) with integer shape k and rate λ, the cumulative distribution function (cdf) is:
$$\Pr(Y < y) = F(y, k, \lambda) = 1 - \sum_{n=0}^{k-1} \frac{(\lambda y)^n e^{-\lambda y}}{n!}$$
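Comparing the last two equations with k = r + 1 and λ = 1, one can see that X follows a gamma distribution with shape r + 1 and rate 1, X ~ Γ(r+1, 1). In addition, the chi-squared distribution with 2(r+1) degrees of freedom, $\chi^2_{2(r+1)}$, is the special case of the gamma distribution with shape r + 1 and scale 2 (rate 1/2), which is the distribution of 2X. Therefore we can also say:
$$2X \sim \chi^2_{2(r+1)} \quad\text{and}\quad X \sim \frac{\chi^2_{2(r+1)}}{2}$$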
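Since x = λT, where T is the total time (written t above), the upper bound of the failure rate λ at confidence level CL = 1 − α is:
$$\lambda = \frac{\chi^2_{\alpha,\,2r+2}}{2T}$$
Here $\chi^2_{\alpha,\,2r+2}$ is the value that a chi-squared variable with 2r + 2 degrees of freedom exceeds with probability α.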
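As a quick sanity check of this result, consider the zero-failure case r = 0, writing α = 1 − CL (an assumed reading of the subscript α, consistent with the derivation above). A chi-squared variable with 2 degrees of freedom exceeds the value x with probability e^{−x/2}, so its upper-α point can be written in closed form:

```latex
\chi^2_{\alpha,\,2} = -2\ln\alpha
\quad\Longrightarrow\quad
\lambda = \frac{\chi^2_{\alpha,\,2}}{2T} = \frac{-\ln\alpha}{T},
\qquad\text{which agrees with}\qquad
e^{-\lambda T} = 1 - CL = \alpha .
```

The right-hand equation is the Poisson condition with only the n = 0 term, so both routes give the same bound. For CL = 0.90, for instance, λ ≈ 2.303/T.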
2. Explain the meaning of each column in the 1st case study.
I have explained this in the discussion slides, so here I will explain it again briefly. The columns in the case study on forecasting the failure rate after a period of growth are defined as follows.
Column Name: Definition
CSCI: Computer Software Configuration Item. The CSCIs are the software elements that make up a software subsystem.
UTIL: Utilization of each CSCI, expressed as a percentage of total CPU time. It is the ratio of CSCI execution time to system operating time. The total utilization of the host computer can be found by adding up the utilizations of the CSCIs. [% of Total CPU Seconds]
λs: Initial failure rate with respect to system operating seconds. [failure/CPU seconds]
SYSτ: The amount of CPU time used in system tests. [40 hours = 144,000 seconds]
CSCIτ: The amount of CPU time that each CSCI used. [CPU seconds]
β: Parameter of the basic execution time model. It is the decrement of failure intensity per failure occurrence. [1/CPU Seconds]
λ(τ): Each CSCI’s failure intensity after one hour system test.
λs(τ): CSCI’s failure rate with respect to system operating seconds [failure/CPU
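The formulas linking these columns are not stated in this document. The sketch below shows one plausible chain, assuming the exponential form of the basic execution time model, λ(τ) = λ·exp(−βτ), and assuming that utilization is what converts between CPU-time and system-time failure rates. The function name, the dictionary keys and the input values are all hypothetical, and UTIL is passed as a fraction rather than a percentage.

```python
import math


def forecast_csci(util, lam_s, sys_tau, beta):
    """Sketch of how the case-study columns might relate (assumed formulas).

    util    : CSCI utilization as a fraction (0.5 means 50% of CPU time)
    lam_s   : initial failure rate per system operating second
    sys_tau : system test time in seconds
    beta    : basic execution time model parameter [1/CPU second]
    """
    csci_tau = util * sys_tau  # CPU seconds executed by this CSCI
    lam = lam_s / util  # initial failure rate per CPU second (assumption)
    lam_tau = lam * math.exp(-beta * csci_tau)  # assumed model form
    lam_s_tau = util * lam_tau  # back to system operating seconds
    return {
        "csci_tau": csci_tau,
        "lam_tau": lam_tau,
        "lam_s_tau": lam_s_tau,
    }


# Hypothetical CSCI: 25% utilization over a 144,000-second system test.
print(forecast_csci(util=0.25, lam_s=1e-5, sys_tau=144_000.0, beta=2e-5))
```

The actual values should be taken from the case study itself; the sketch only makes the dependencies between the columns explicit.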
3. What is the use of software reliability allocation?
Software reliability allocation is the process of translating an overall (or global) software reliability requirement into reliability goals for each lower-level software element. For example, suppose a software subsystem contains 10 programs and the required overall subsystem failure rate is one failure per 10,000 system operating hours. We can allocate this overall requirement in proportion to each component's forecasted failure rate. This gives the failure rate requirement of each component of the subsystem, i.e. how reliable each component must be to achieve the specified overall reliability requirement.
4. What data is used to do the allocation?
The main inputs to software reliability allocation are each CSCI's forecasted failure rate with respect to system operating time (𝜆𝑙) and the required failure rate (Λ𝑅𝐸𝑄𝐷).
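A minimal sketch of proportional allocation follows, assuming the required rate is split in proportion to each CSCI's forecasted rate as described in question 3. The function name, the CSCI names and the forecast values are hypothetical; only the required rate of one failure per 10,000 hours comes from the example above.

```python
def allocate_failure_rates(forecast, required_total):
    """Split required_total across components in proportion to their forecasts.

    forecast       : dict mapping component name -> forecasted failure rate
    required_total : required overall failure rate (same units as forecast)
    """
    total_forecast = sum(forecast.values())
    if total_forecast <= 0:
        raise ValueError("forecasted failure rates must sum to a positive value")
    scale = required_total / total_forecast
    return {name: rate * scale for name, rate in forecast.items()}


# Hypothetical forecasts in failures per system operating hour.
forecast = {"CSCI_A": 2e-4, "CSCI_B": 1e-4, "CSCI_C": 1e-4}
# Requirement from the example: one failure per 10,000 hours.
allocation = allocate_failure_rates(forecast, required_total=1 / 10_000)
for name, rate in allocation.items():
    print(name, rate)
```

The allocated rates always sum to the required rate, and a component forecast to fail twice as often receives twice the allowance.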
5. What is the advantage of demonstration testing?
It is important to determine, in quantitative terms, whether software is ready for use. Demonstration testing does this by showing that the software does not exceed the maximum allowable number of failures specified in the contractual arrangement or in the predetermined software reliability specification. Thus, it can be considered a quality assessment tool for software.
6. Which one is the best among the three methods of demonstration testing?
The three types of software demonstration test are the fixed-duration test, the failure-free execution interval test and the sequential test. Each type is recommended for particular conditions.
According to MIL-HDBK-781:
A fixed-duration test plan must be selected when it is necessary to obtain an estimate of the true MTBF demonstrated by the test, as well as an accept-reject decision or when total test time and the amount of cost must be known in advance.
A sequential test plan may be selected when it is desired to accept or reject predetermined MTBF values (θ0, θ1) with predetermined risks of error (α, β) and when uncertainty in total test time is relatively unimportant. This test will save time, compared to a fixed-duration test plan having similar risks and discrimination ratio, when the true MTBF is much greater than θ0 or much less than θ1.
A failure-free execution interval test will accept software that has a failure rate much lower than λ0 more quickly than a fixed duration test.
7. If software has been rejected by demonstration testing, what will happen next? Go through the entire cycle again?
System reliability programs are often mandated through contractual requirements with which the developer must comply. Even if the software is found to contain more failures than the specification allows, by the end of demonstration testing the developer may, through debugging and fault removal, have been able to focus its efforts on the parts of the software that are most susceptible to faults. Therefore, it is not necessary for the developer to repeat the entire software reliability engineering cycle. Instead, the developer can concentrate on unit testing of one or several units and then perform integration and system testing again to check for improvement.
8. What is the difference between producer’s risk and consumer’s risk?
Producer’s risk is the risk borne by the software developer. It is the probability of rejecting software whose (unknown) true failure rate equals the reliability goal of the software (λ0). Simply put, the producer is at a disadvantage in this situation, since the rejected software actually already has a lower failure rate than the customer requires. The producer is then required to allocate extra resources to re-engineering that is actually not needed.
Consumer’s risk is the risk borne by the customer. It is the probability of accepting software whose true failure rate equals the failure rate specified by the customer (λ1).
Prior to the demonstration test, λ0 and λ1 are specified by agreement between the developer and the customer, where the reliability goal of the software should be more demanding than the customer's minimum specification, i.e. the goal failure rate is the lower one (λ0 < λ1).
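To make the two definitions concrete, the sketch below evaluates the risks actually achieved by a given plan. It assumes a fixed-duration plan that accepts the software when at most `max_failures` failures occur in `test_time`, and assumes Poisson failure counts as in question 1. The function names and all numbers are hypothetical and are not taken from MIL-HDBK-781.

```python
import math


def poisson_cdf(c, mean):
    """P(N <= c) for N ~ Poisson(mean)."""
    term = math.exp(-mean)
    total = term
    for n in range(1, c + 1):
        term *= mean / n
        total += term
    return total


def decision_risks(lam0, lam1, test_time, max_failures):
    """Return (producer's risk, consumer's risk) for an accept-if-<=c plan."""
    # Producer's risk: software at the goal rate lam0 is rejected.
    producer = 1.0 - poisson_cdf(max_failures, lam0 * test_time)
    # Consumer's risk: software at the customer's limit lam1 is accepted.
    consumer = poisson_cdf(max_failures, lam1 * test_time)
    return producer, consumer


# Hypothetical plan: 1,000 hours of testing, accept with at most 1 failure.
alpha, beta = decision_risks(lam0=0.001, lam1=0.002, test_time=1000.0, max_failures=1)
print(f"producer's risk = {alpha:.3f}, consumer's risk = {beta:.3f}")
```

Lengthening the test or changing the allowed number of failures shifts risk between the two parties, which is why the target values are fixed before the test.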
9. How do you calculate the two decision risks, which are specified as test parameters?
The two test parameters (producer’s and consumer’s risks) are not calculated from a mathematical formula. They are determined by agreement between the software developer and its customer, or adopted from widely applied industry practice, where they usually range from 10% to 30%.
10. In the failure-free execution interval test, what if we have a failure rate lower than λ0?
In any of the three types of demonstration test, a good test plan should accept with high probability software whose true failure rate approaches λ0, and even more so software whose failure rate is lower than λ0.
11. In the failure-free execution interval test, how is the interval decided?
It is determined from standard test plans. Here I give an example of a failure-free execution interval test based on MIL-HDBK-781. In a failure-free execution interval test, the software is given T time units to achieve a failure-free interval of t time units.
First, the customer specifies λ1 as 0.0001 failures/hour. The producer’s and consumer’s risks are both set at 30%. The reliability goal for the software is specified as λ0 = 0.00005 failures/hour. The discrimination ratio (d) is calculated as λ1/λ0 = 0.0001/0.00005 = 2. Next, we calculate the lower test MTBF, θ1 = 1/λ1 = 10,000 hours. This is the lowest level of MTBF acceptable to the customer.
Using Table 1, which is taken from MIL-HDBK-781, we look up the row with α = 0.3 (column 1) and β = 0.3 (column 2). Since d = 2.0, it is approximated by the tabulated value 1.995 (column 3). This row gives λ1T = 7.008 (column 4), so T = 7.008/0.0001 = 70,080 hours. This is the test time given to the developer for achieving the failure-free interval.
Since t/T = 0.40 (column 6), t = 0.40 × 70,080 = 28,032 hours. This is the failure-free interval that the software has to achieve.
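The arithmetic of this example can be reproduced with a short script. This is only a sketch: the function name and dictionary keys are illustrative, and the table values λ1T = 7.008 and t/T = 0.40 are the ones quoted in the text, not looked up independently.

```python
def failure_free_plan(lambda_1, lambda_0, lambda1_T, t_over_T):
    """Derive test quantities from the specified rates and the table values.

    lambda_1  : customer's specified failure rate [failures/hour]
    lambda_0  : reliability goal [failures/hour]
    lambda1_T : tabulated value of lambda_1 * T (column 4)
    t_over_T  : tabulated ratio t / T (column 6)
    """
    d = lambda_1 / lambda_0  # discrimination ratio
    theta_1 = 1.0 / lambda_1  # lower test MTBF [hours]
    T = lambda1_T / lambda_1  # total allowed test time [hours]
    t = t_over_T * T  # required failure-free interval [hours]
    return {"d": d, "theta_1": theta_1, "T": T, "t": t}


plan = failure_free_plan(lambda_1=0.0001, lambda_0=0.00005,
                         lambda1_T=7.008, t_over_T=0.40)
for name, value in plan.items():
    print(f"{name} = {value:,.3f}")
```

With these inputs, T comes out at about 70,080 hours and t at about 28,032 hours, since λ1 is expressed in failures per hour.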
Table 1 – Failure-Free Execution Interval Test Plan – MIL-HDBK-781
α    β    d    λ1T    λ0T    t/T    ETT/T for λ0    ETT/T for λ1
.10  .10  2.442