University of Southern California Department of Computer Science
[email protected]
ABSTRACT
The concept of software dependability is intuitively understood but difficult to quantify into a constructive model. In this paper, we present a dependability risk model to convey that dependability is a relative economic measure of the dependability attribute risk factors. A system that is “undependable” is “risky” relative to the value (or benefit) of items at risk that is expected from that system. We also propose the questions that our dependability risk model can answer and provide a simple example.
Keywords
dependability risk insurance
1. Introduction
The concept of software dependability is intuitively understood but difficult to quantify into a constructive model. The concept is analogous to hardware dependability in which the goal is to provide a measure of assurance that a system will not fail to perform in an expected manner. Dependability extends beyond system reliability (the focus of which is on how likely a system will cease to function altogether) to an aggregate of dependability attributes [1] that include the following:
Robustness: reliability, availability, survivability,
recoverability
Protection: security, safety
Quality of Service: accuracy, fidelity, performance assurance,
maintainability
Integrity: correctness, verifiability
Availability means the readiness for correct service.
Reliability means the continuity of correct service. Survivability means that a software system can repair itself
or degrade gracefully to preserve as much critical capabilities as possible in the face of attacks and failures.
Recoverability means a software system can recover itself
from faults. Security means absence of unauthorized access to, or handling of, system state. It is the concurrent existence of a) availability for authorized users only, b) confidentiality, and c) integrity with ‘improper’ taken as meaning ‘unauthorized’. Safety means absence of unauthorized disclosure of information. Maintainability means the ability to undergo repairs and modifications.
Integrity means absence of improper system state alternations.
Integrity is a prerequisite for availability, reliability and safety.
These attributes are mostly compatible and synergetic, but it is not uncommon for there to be some conflicts and tradeoffs. Some examples of this might be within a system that makes use of distributed information to increase the survivability of its data in the event that a particular data location is destroyed. In such a system security is traded off (or complicates the dependability with respect to data security) for survivability as the data must be accessible in multiple locations thereby increasing the number of entry points for possible security breaches. Other examples are building a fail-safe system whereby safety is balanced with quality of service, or employing a graceful degradation policy where survivability now contrasts with quality of service.
The dependability attributes described above only tell part of the story. The degree to which each attribute applies is relative to the expected outcome when the system is subject to negative events (e.g. a component fails, security is violated, data is incorrect).
Our view is that dependability is a relative economic measure of the dependability attribute risk factors. A system that is “undependable” is “risky” relative to the value (or benefit) of items at risk that is expected from that system. To illustrate this, consider system faults that result from undependable software (that is, faults with respect to the dependability attributes listed previously). Software faults/defects incur economic losses over time (e.g. IUM’s, reputation, reduced sales, opportunities, etc.) with relative to the dependability attributes. For example if a sales system that people depend on to process customer sales is unavailable, there will assuredly be a measurable economic loss. Such was the case when the AT&T business sales
system became bottlenecked and the loss was in thousands of dollars per minute [2].
Ultimately for any system, a return on investment (ROI) is expected due to the continued (dependable) operation of that system. In this light, we invest in dependability as an insurance policy in which we make an initial investment (such as fault tolerance) along with continual premiums (e.g. security policies, contingency measures, backup systems) to insure against non-achievement of an acceptable ROI. This involves a complex and dynamic interplay of cost, risk, and value with respect to the dependability attributes and operational constraints and priorities. We propose a possible model to help analyze this perspective along with some potentially interesting questions and some initial examples.
2. Dependability Risk Model
Clearly value is created over the time a system operates, and a respectable return on investment (ROI) is expected due to the continued (dependable enough) operation of that system. Risks within the dependability attributes reduce this expected ROI and thus the key to a “dependable” system is to ensure the total investment for the dependable operation of the software system and the expected losses due to dependability risks do not outweigh the expected gains due to flawless operation of the system.
We are looking at risk models rather than the traditional reliability models for the following reasons:
• Risk models may be more empirically accessible than other models
• Overall dependability risk is simply the sum of dependability attribute risks, regardless of dependencies
• There is a well-established theory and practice of risk assessment and management to draw upon
• It matches well with intuitive concepts of dependability
There are many possible value-risk models that may apply to dependability risk. A particularly attractive and straightforward one we have been considering is the “insurance” model [3]. It is summarized as:
∑
∑
∑
= = =−
−
=
( ) 1 1 ) ( 1)
(
)
(
)
(
N t i i m j j t N i i R Vt
R
I
t
V
t
X
where
V
i(t)
is a random variable distributed according to the value of item i “at risk” at time t,N
V(t)
is the number of value events given totally dependable operation of the system up to timet
,I
j is the amount invested to achievethe desired level for each of m dependability attributes during the development of the system,
N
R(t)
is the number of dependability faults up to timet
, andR
i(t)
is a random variable distributed according to the potential loss (risk) of each type of system failure i up to time t. The insurance process has parameters that can be estimated with empirical distributions gathered from analogous systems, with the exception ofV
i(t)
, which can be estimated via an earned-value [4] model.3. Dependability Problems To Be Answered
by Dependability Risk Model
The dependability risk model can be used to analyze cost- benefit issues such as how dependable is dependable enough. For instance, it helps answer basic dependability questions such as:
• (How much is enough?)
o Given an investment amount and value earned over time, how low can the dependability risk be before an expected ROI is unachievable? o Can we find a minimum I (amount of
investment) that insures
X(t)
will never be negative?• (Risk of Catastrophe) Is there a high risk of an effectively infinite loss (Including low probability, high loss events)?
• (Risk of Recession) Is there a time t in which
)
(t
X
will ever be negative?• (Risk of Ruin) Is there a time t after which
X(t)
will always be negative?• (Risk of Decay) Will cumulative small losses force
)
(t
X
to zero over time?One of its applications is to map various dependability approaches to dependability attributes and benchmark them with respect to risk/value. Therefore we can compare the relative dependability attribute risks and effectiveness of dependability approaches on risk reduction. Defect and fault seeding is often considered for gathering empirical estimates of defect and fault populations.
Our goals are to develop dependability risk models based on development-time and runtime characteristics, evaluate use of dependability risk models as means of determining extent to invest in dependability (e.g. defect removal, dependability mechanisms such as fault handling or a particular architecture style) with respect to system value and risks. We hope that by implementing and continuously monitoring dependability risk models, developers will gain
insight into when and how much to invest in dependability (i.e. fault removal, tolerance mechanisms, prevention, etc.).
The heading of subsections should be in Times New Roman 12- point bold with only the initial letters capitalized. (Note: For subsections and subsubsections, a word like the or a is not capitalized unless it is the first word of the header.)
4. A Simple Example
Let us consider a painfully oversimplified example to illustrate the analytical considerations described above. For this, say that the value achieved is constant, or
V
i(t)=c
for some constant c, and that the value events occur continuously over time (this is of course grossly oversimplified for any practical software system). We will assume that the number of faults up to time t is Poisson with intensityαand thatR
i(t)
is identically exponentially distributed and independent of t with meanµ
and varianceσ
2. Further, let us ignore initial investment costs for dependability. The dependability risk model will be:∑
=−
=
( ) 1)
(
t N i i RR
ct
t
X
Let us consider a simplified “how much is enough” question. The expected “profit” for the above model will be
t
c
t
X
E[
(
)]=(
−αµ)
so clearly to achieve a desirable ROIc>αµ
. However, this does not answer the critical question of when a particular ROI will be achieved. Let us deal with the question of how tolerance of dependability risks. That is, how is there a point at which we lose so much value that we should give up on the system? Thisroughly translates into the probability
ψ(u)=P{u+X(t)<0
for somet>0}
where u is the tolerance value desired. Under these conditions (assuming
c>αµ
) it can be shown that: − − = αµ µαµ ψ c c u e c u ) (
As one would expect, larger tolerance of dependability risks means it is less likely that such tolerance will be exceeded at some point.