Techneau, 10. December Risk Evaluation and Decision Support for Drinking Water Systems



© 2010 TECHNEAU

TECHNEAU is an Integrated Project funded by the European Commission under the Sixth Framework Programme, Sustainable Development, Global Change and Ecosystems Thematic Priority Area (contract number 018320). All rights reserved. No part of this book may be reproduced, stored in a database or retrieval system, or published, in any form or in any way, electronically, mechanically, by print, photoprint, microfilm or any other means without prior written permission from the publisher.


This report is:

PU = Public

Colophon

Title

Risk Evaluation and Decision Support for Drinking Water Systems

Authors

Andreas Lindhe¹, Lars Rosén¹ and Per Hokstad²

¹ Chalmers University of Technology

² SINTEF

Quality Assurance

By Thomas Pettersson, Chalmers University of Technology

Deliverables number


Summary

The vital importance of a reliable and safe drinking water supply makes efficient risk management necessary for water utilities. Risks must be assessed and possible risk-reduction measures evaluated to provide relevant decision support. The World Health Organization (WHO) emphasises the use of an integrated approach where the entire drinking water system, from source to tap, is considered when assessing and managing risks.

This report provides a background to risk evaluation and decision support for managing risks in water utilities. A special focus is put on cost optimisation, and methods for cost-benefit analysis (CBA), cost-effectiveness analysis (CEA) and multi-criteria decision analysis (MCDA) of risk reduction alternatives are presented.

A dynamic fault tree method is presented that enables quantitative, integrated risk assessment of drinking water systems. It is shown how the method can be used to evaluate uncertainties and provide information on risk levels, failure probabilities, failure rates and downtimes of the entire system and its subsystems. The fault tree method identifies where risk-reduction measures are needed most and different risk-reduction alternatives can be modelled, evaluated and compared. The method is combined with economic analysis to identify the most cost-effective risk-reduction alternative.

Integrated risk assessments of drinking water systems are commonly performed using risk ranking, where the probability and consequence of undesired events are assessed using discretised scales. There is, however, no common, structured way of using risk ranking to prioritise risk-reduction measures. Two alternative models for risk-based, multi-criteria decision analysis (MCDA) for evaluating and comparing risk-reduction measures have therefore been developed. The MCDA models are based on risk ranking, they can consider uncertainty in estimates and include criteria related to, for example, different risk types and economic aspects.

This report provides methods for integrated risk assessment that make it possible to evaluate risks and prioritise risk-reduction measures in an efficient way. This study also provides good examples of applications of these methods in Gothenburg, Sweden; Bergen, Norway; and Březnice, Czech Republic.

Based on the practical applications of these methods, it is concluded that the methods provide relevant decision support for efficient risk management in water utilities.


Contents

Summary
Contents
1 Introduction
1.1 Objective
1.2 The TECHNEAU Risk Management Framework
1.3 Notation
1.4 Abbreviations
2 Risk evaluation
2.1 Introduction
2.2 Risk measures
2.2.1 Qualitative measures of risk – risk matrix and risk ranking
2.2.2 Dimensions of risk
2.3 Risk acceptance
2.3.1 Risk Acceptance criteria (RAC)
2.3.2 The “two limit approach” to risk acceptance: ALARP
2.3.3 Target values
2.4 The decision-making process
2.4.1 Principles and criteria for risk evaluation
2.4.2 How to establish a RAC
2.4.3 Normative issues
3 Decision Support Methods developed in TECHNEAU
3.1 A generic framework for decision support
3.2 The dynamic fault tree method (DFT)
3.2.1 Method development
3.2.2 Failure types and conceptual model
3.2.3 Logic gates and dynamic calculations
3.2.4 Generic fault tree structure
3.2.5 Risk and measure of risk
3.2.6 Input data and uncertainties
3.2.7 Case study Göteborg
3.2.8 Key aspects when applying the fault tree method
3.3 Quantitative risk assessment and economic analysis using DFT
3.3.1 Modelling risk reduction
3.3.2 Economic analysis of risk-reduction measures
3.3.3 Case study Göteborg
3.3.4 Key aspects when modelling and evaluating risk reduction
3.4 Multi-criteria decision models
3.4.1 General approach
3.4.2 The discrete model
3.4.4 Performance score and matrix
3.4.5 Case study Bergen
3.4.6 Case study Březnice
3.4.7 Key aspects when applying the MCDA models
4 Discussion and Conclusions
4.1 Introduction
4.2 Advantages and limitations of the methods developed
4.2.1 The dynamic fault tree (DFT) method
4.2.2 The MCDA models
4.3 Communication and organisation
4.4 Conclusions
5 References
Appendix 1: Equations for logic gates in dynamic fault tree analysis

1 Introduction

1.1 Objective

The main objective of Work Area 4 (WA4, Risk Assessment and Risk Management) in TECHNEAU is to integrate risk assessments of the separate parts into a comprehensive decision support framework for cost-efficient risk management in safe and sustainable drinking water supply. Specific goals of WA4 are to provide tools and guiding documents to support the water utilities in their risk assessment and risk management work. A schematic illustration of the framework, guides and tools that are to be produced in WA4 is presented in Figure 1.1.

Figure 1.1. A schematic illustration of the framework, guides and tools that will be produced in WA4.

The report “Generic Framework and Methods for Risk Management in Water Safety Plans” (Rosén et al., 2007) forms the basis for WA4. It describes risk management on a general level and also provides an overview of risk analysis methods for water utilities.

The report “Methods for risk analysis of drinking water systems from source to tap” (Hokstad et al., 2009) provides a guide on the use of risk analysis for drinking water systems. Lindhe et al. (2010c) summarise the application of the developed risk assessment methods in case studies.

The report “Decision support for risk management in drinking water supply” (Rosén et al., 2010b) constitutes a literature review providing the background to several topics relevant for decision-making and risk management of drinking water supplies.

The present report treats risk evaluation and decision support methods for identifying the best alternative for risk reduction and control. A few case studies are also included.

A crucial part of risk evaluation is for stakeholders to define limits for acceptable/tolerable risk. When a risk analysis has been carried out, the estimated risk should be evaluated and compared to the risk acceptance criteria (RAC) to decide whether the risk is tolerable. Further, risk evaluation includes the process of identification of risk-reduction measures and controlling risk during operation.

The objective of this report is to provide a guide on these tasks. The main target group is management and personnel of water utilities with some basic knowledge/competence of risk management.

1.2 The TECHNEAU Risk Management Framework

The TECHNEAU framework for integrated risk management is presented in Figure 1.2 (Rosén et al., 2007). The framework includes the following main components:

- Risk Analysis
- Risk Evaluation
- Risk Reduction/Control

[Figure 1.2 content: Risk Analysis (define scope; identify hazards; estimate risks, qualitatively or quantitatively). Risk Evaluation (define tolerability criteria for water quality and water quantity; analyse risk-reduction options by ranking, cost-efficiency and cost-benefit). Risk Reduction/Control (make decisions; treat risks; monitor). Accompanying activities: acquire new information; update; analyse sensitivity; develop supporting programmes; document and assure quality; report and communicate; review, approve and audit. The figure also carries the labels Risk Assessment and Risk Management.]

Figure 1.2. The main components of the TECHNEAU generic framework for integrated risk management in WSP (after Rosén et al., 2007).

Various activities required for carrying out a risk analysis, risk evaluation and risk reduction/control are indicated in the rightmost box of Figure 1.2.

The risk evaluation requires that a risk acceptance/tolerability criterion is defined (by the water utility). The estimated risk is then compared with this acceptance criterion in order to decide whether the risk is acceptable (tolerable) or not. Furthermore, various risk-reduction measures are considered to evaluate their cost-effectiveness, and to prioritise amongst various alternatives. Principles for this decision process, including normative issues, are the topic of the present report.

1.3 Notation

The following notation and definitions of terms are applied in the TECHNEAU project:

Hazard is a source of potential harm or a situation with a potential of harm.

Hazardous agent is for example a biological, chemical, physical or radiological agent that has the potential to cause harm.

Hazardous event is an event which can cause harm.

Hazard identification is the process of recognizing that a hazard exists and defining its characteristics.

Risk is a combination of the frequency, or probability, of occurrence and the consequence of a specified hazardous event. The total risk is given by aggregating the risk of the various hazardous events.

Risk analysis is the systematic use of available information to identify hazards and to estimate the risk to individuals or populations, property or the environment.

Risk estimation is the process used to produce a measure of the level of risk being analysed. Risk estimation consists of the following steps: frequency analysis, consequence analysis, and their integration.

Risk evaluation is the process in which judgements are made on the tolerability of the risk on the basis of risk analysis and taking into account factors such as socio-economic and environmental aspects.

Risk assessment is the overall process of risk analysis and risk evaluation.

Risk management is the systematic application of management policies, procedures and practices to the tasks of analysing, evaluating and controlling risk.

Risk measure is a quantitative measure of a specified risk, i.e. a quantified measure of the combination of probabilities/frequencies and consequences of a hazardous event.

Risk-reduction measure is a preventing/detecting/controlling/mitigating measure which has the effect of reducing (or eliminating) the probability and/or the consequences of a hazardous event, i.e. reducing the risk.

Risk reduction is the process where decisions are made regarding risk-reducing measures: what needs to be done, by whom, when and at what cost.

1.4 Abbreviations

ALARP As Low As Reasonably Practicable
CBA Cost-Benefit Analysis
CCP Critical Control Points
CEA Cost-Effectiveness Analysis
CER Cost-Effectiveness Ratio
CML Customer Minutes Lost
CRA Coarse Risk Analysis
DFT Dynamic Fault Tree
HACCP Hazard Analysis and Critical Control Points
HAZID Hazard Identification
MCDA Multi-Criteria Decision Analysis
RAC Risk Acceptance Criteria
RPN Risk Priority Number
SSM Substandard Supply Minutes
THDB TECHNEAU Hazard Database
WHO World Health Organization
WSP Water Safety Plan

2 Risk evaluation

2.1 Introduction

When the risk analysis of a drinking water system has been carried out, a risk evaluation is performed to decide whether the identified risks are “acceptable” (tolerable) and what to do if there are unacceptable risks. The risks that are identified during a risk analysis can be measured (quantified) in various ways, and the risk acceptance criteria (RAC) will usually be based on the chosen risk measures.

If the risk is found to be acceptable, it may be sufficient to control the risk rather than reducing it. However, if the risk is found to be unacceptable, various risk-reduction measures/options have to be analysed and compared to identify the best alternative. Thus, both the tolerability of the various risks and the costs, as well as additional criteria, of the risk reducing options will be considered during the risk evaluation.

2.2 Risk measures

Because risk has many different aspects, there are many ways to measure it. However, the measures are usually based on the unwanted consequences and/or their probability. Examples of the application of risk measures in risk evaluation are given in Sections 3.2.5 and 3.4.1.

2.2.1 Qualitative measures of risk – risk matrix and risk ranking

In a Coarse Risk Analysis (CRA) (Hokstad et al., 2009) we analyse various hazardous events, and the risk is given by the likelihood (probability/frequency), P, and the consequence, C, of the event.

It is a very common approach—if we do not have sufficient resources or information to carry out a full quantitative analysis of P and C—to apply a classification of risks. Then probabilities and consequences are divided into categories. For the likelihood we use categories such as ‘rare’ and ‘frequent’, and various consequences could be categorised e.g. as ‘small’, ‘medium’ and ‘catastrophic’. These categories may represent a simple ranking of likelihood and consequences, or the categories can be properly defined. For instance, the probability category, ‘rare’ could be defined as meaning ‘less than once a month’. Similarly, the consequence category ‘small’ with respect to health effects could be defined as ‘at most 10 consumers with minor health effects’. This set (P, C), will be inserted in a risk matrix, see Figure 2.1. In this example there are four probability categories, ranging from P1 (rare) to P4 (very frequent), and four consequence categories, ranging from C1 (low) to C4 (catastrophic). Seven categories of risk are introduced, ranging from 1 (“very low”) to 7 (“very high”). Such a risk matrix is a common way to present risk.

In particular it is used to present the risk of the hazardous events identified in a CRA (Hokstad et al., 2009).

      C1   C2   C3   C4
P4     4    5    6    7
P3     3    4    5    6
P2     2    3    4    5
P1     1    2    3    4

Figure 2.1. Risk matrix with four categories of probability (P) and of consequence (C).
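The pattern in Figure 2.1 can be illustrated with a minimal sketch. In the matrix as printed, each risk category equals the probability index plus the consequence index minus one. The function name `risk_category` is ours and purely illustrative, and other risk matrices may well use a different scoring rule.

```python
def risk_category(p: int, c: int) -> int:
    """Risk category (1-7) for probability class p and consequence class c (each 1-4).

    Assumption: the additive rule p + c - 1, which reproduces Figure 2.1.
    """
    if not (1 <= p <= 4 and 1 <= c <= 4):
        raise ValueError("p and c must be integers from 1 to 4")
    return p + c - 1


# Rebuild the matrix row by row, from P4 (top row) down to P1 (bottom row).
matrix = {f"P{p}": [risk_category(p, c) for c in range(1, 5)] for p in range(4, 0, -1)}

for row, values in matrix.items():
    print(row, values)
```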

Another example of a risk matrix, adapted from Hokstad et al. (2009), is illustrated in Figure 2.2. Again the two axes represent likelihood and severity of consequences, and definitions of the five likelihood categories and the five consequence categories are given. Four risk categories are also defined, ranging from L (low risk) to E (extreme risk).

                 Severity of consequences
Likelihood       Insignificant  Minor  Moderate  Major  Catastrophic
Almost certain   M              H      H         E      E
Likely           M              M      H         E      E
Moderate         L              M      H         H      E
Unlikely         L              L      M         H      H
Rare             L              L      M         H      H

Note: The number of categories should reflect the need of the study.
E – Extreme risk, immediate action required;
H – High risk, management attention needed;
M – Moderate risk, management responsibility must be specified;
L – Low risk, management by routine procedures.

Examples of definitions of likelihood and severity categories that can be used in risk scoring

Item             Definition

Likelihood categories
Almost certain   Once a day
Likely           Once per week
Moderate         Once per month
Unlikely         Once per year
Rare             Once every 5 years

Severity categories
Catastrophic     Mortality expected from consuming water
Major            Morbidity expected from consuming water
Moderate         Major aesthetic impact possibly resulting in use of alternative but unsafe water sources
Minor            Minor aesthetic impact causing dissatisfaction but not likely to lead to use of alternative less safe sources
Insignificant    No detectable impact

Figure 2.2. Example of a risk matrix and definitions of likelihood and severity categories to be used in risk scoring in WSP (adapted from Davison et al., 2005). Four classes of risk are shown.

This qualitative approach will provide a ranking of hazardous events, according to the corresponding risk category.
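As a minimal sketch of such a ranking, the matrix of Figure 2.2 can be stored as a lookup table and used to sort hazardous events by risk class. The event names below are hypothetical and chosen only for illustration; the class letters are those of Figure 2.2.

```python
SEVERITY = ["Insignificant", "Minor", "Moderate", "Major", "Catastrophic"]

# One string per likelihood row of Figure 2.2; one letter per severity column.
MATRIX = {
    "Almost certain": "MHHEE",
    "Likely": "MMHEE",
    "Moderate": "LMHHE",
    "Unlikely": "LLMHH",
    "Rare": "LLMHH",
}

# Ordering of the risk classes, lowest to highest.
RANK = {"L": 0, "M": 1, "H": 2, "E": 3}


def risk_class(likelihood: str, severity: str) -> str:
    """Look up the risk class (L, M, H or E) for one hazardous event."""
    return MATRIX[likelihood][SEVERITY.index(severity)]


def rank_events(events):
    """Sort (name, likelihood, severity) tuples by risk class, highest first."""
    return sorted(events, key=lambda e: RANK[risk_class(e[1], e[2])], reverse=True)


# Hypothetical hazardous events, for illustration only.
events = [
    ("Discoloured water after pipe flushing", "Likely", "Minor"),
    ("Pathogens passing the treatment plant", "Unlikely", "Major"),
    ("Raw water contamination upstream", "Moderate", "Catastrophic"),
]

for name, likelihood, severity in rank_events(events):
    print(risk_class(likelihood, severity), name)
```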

Ranking can also be carried out based on the results of, for example, a Fault Tree Analysis (FTA). The various contributing causes (“basic events”) can be ranked according to their contribution to the “top event” (unwanted event) of the fault tree (Lindhe et al., 2009).

2.2.2 Dimensions of risk

The risk will have various dimensions (aspects), and for a water supply system there are two main dimensions:

- loss of quality (giving health effects), and
- loss of quantity (resulting in water unavailability/supply interruptions).

If risk is measured by the use of a risk matrix, there should be one risk matrix for each dimension. In general, there are several ways to quantify the various dimensions of risk (see Chapter 4, Hokstad et al., 2009).

A couple of examples related to loss of quality are:

- Probability (mean number) of consumers receiving infected water during a year.
- Probability (mean number) of consumers being infected by drinking water during a year.
- DALY = Disability Adjusted Life Years (cf. Appendix B of Hokstad et al., 2010).

A couple of examples related to loss of quantity are:

- Customer Minutes Lost (CML); the average number of minutes that drinking water is not delivered to an average consumer.
- Mean number of consumers affected by shortage of drinking water during a year.
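The CML measure is defined above only in words. One possible reading, shown here as a sketch and not as the calculation used in TECHNEAU, is a consumer-weighted average over supply interruptions. The function name and the interruption data are hypothetical.

```python
def customer_minutes_lost(interruptions, total_consumers):
    """Average minutes without delivery per consumer over the period considered.

    interruptions: iterable of (consumers_affected, duration_minutes) pairs.
    Assumption: CML is the consumer-weighted sum of outage minutes divided
    by the total number of consumers served.
    """
    if total_consumers <= 0:
        raise ValueError("total_consumers must be positive")
    return sum(n * minutes for n, minutes in interruptions) / total_consumers


# Hypothetical year: two interruptions in a system serving 100 000 consumers.
events = [(5_000, 120), (20_000, 30)]
cml = customer_minutes_lost(events, 100_000)
print(cml)  # 12.0 minutes per average consumer
```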

2.3 Risk acceptance

The main purpose of the risk evaluation is to decide whether or not the identified risks are acceptable (tolerable). The risks are therefore compared to predefined risk acceptance criteria (RAC). A detailed description of risk tolerability in the context of drinking water is presented by Rosén et al. (2010b). In this section some of the most important concepts of RAC are presented.

2.3.1 Risk Acceptance criteria (RAC)

The purpose of using RAC is to support the decision makers, and we note that RAC can be applied at different “levels”.

Various RAC at a “lower level” could be related to risk tolerability regarding the use of specific equipment or processes. For instance, we could have a RAC giving an upper limit for the frequency of raw water contamination. Similarly, there could be acceptance criteria related to the probability of failure of safety functions or treatment systems, or to the probability of hazardous events at the utility. However, the risks of hazardous events can also be merged in order to give a measure of the overall total risk of the drinking water system. Then we may define a “top level” RAC for the overall risk. Such a RAC can be given as an upper limit for a risk measure like the mean number of consumers being infected by drinking water during a year, or DALY (Disability Adjusted Life Years). Calculating such overall measures of safety (to check the RAC) requires detailed analyses, and the measures are often hard to estimate.
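Checking estimates against RAC at different levels amounts to comparing each risk measure with its upper limit. The sketch below illustrates this. All limits and estimates are hypothetical, the class name `AcceptanceCriterion` is ours, and treating an estimate exactly on the limit as tolerable is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AcceptanceCriterion:
    measure: str        # name of the risk measure the criterion applies to
    upper_limit: float  # highest value regarded as tolerable

    def is_tolerable(self, estimate: float) -> bool:
        # Assumption: an estimate equal to the limit still counts as tolerable.
        return estimate <= self.upper_limit


# Hypothetical criteria: one "lower level" and one "top level" RAC.
criteria = [
    AcceptanceCriterion("raw water contamination events per year", 0.5),
    AcceptanceCriterion("mean number of consumers infected per year", 10.0),
]

# Hypothetical estimates from a risk analysis.
estimates = {
    "raw water contamination events per year": 0.2,
    "mean number of consumers infected per year": 14.0,
}

results = {c.measure: c.is_tolerable(estimates[c.measure]) for c in criteria}
for measure, ok in results.items():
    print(f"{measure}: {'tolerable' if ok else 'not tolerable'}")
```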

The rationale behind using RAC in the decision process of risk evaluation could be to improve:

1. Risk control: Use of RAC should help to evaluate and control the undesired consequences of the planned activity to a level that is acceptable to the affected parties.

2. Efficiency of the decision process: Use of RAC should be an efficient way to structure the tasks of the decision process. Even if the RAC is tailored to the specific situation, it may not be necessary to repeat all arguments every time a risk is evaluated.

The use of RAC should contribute to more focus and involvement regarding safety issues for the affected parties. Unfortunately, the use of RAC could also lead to somewhat “automatic” decisions. A possible problem with the use of RAC is that setting a target gives no drive for improvements beyond this level. That is, the creative process of finding even better solutions and measures is in practice limited to meeting the criteria. If that is the case, the use of RAC does not play an active role in the risk management process. This has caused some authors to advise against the use of RAC (Aven and Vinnem, 2005).

However, applied in a proper way, the use of RAC in combination with other incentives would often prove useful for the decision process. This should be the case if use of RAC is combined with involvement and a drive for risk reduction.

2.3.2 The “two limit approach” to risk acceptance: ALARP

When the risks related to a drinking water system (or a subsystem of it) are evaluated, we can apply the ALARP (As Low As Reasonably Practicable) principle to decide on risk acceptance, see Figure 2.3. The ALARP principle applies two acceptance limits. An upper acceptance limit is specified, and the risk is unacceptable if it is above this limit (see red area in figure); in this case the risk must be reduced or eliminated.

But a lower limit is also specified. Risks below this limit are considered acceptable and do not need to be further investigated (see green area).

However, risks in between these two limits, in the so-called “ALARP region” (see yellow area), should be investigated further and be reduced “as far as reasonably practicable”. This means that risk-reducing measures should be investigated and their cost-effectiveness evaluated. Unless a risk-reducing measure is unreasonably expensive relative to its effect on the risk, it should be implemented. Thus, a systematic discussion to reduce risk should be carried out for any risk in the ALARP region.

[Figure 2.3 content: Unacceptable Risk: the risk cannot be accepted under any circumstances. ALARP Region: the risk can be accepted if it is economically and technically unreasonable to reduce it. Acceptable Risk.]

Figure 2.3. The ALARP (As Low As Reasonably Practicable) Principle (Melchers, 2001).

The borders (limits) between the three regions (red, yellow, green) should be decided in a process prior to the actual risk analysis; cf. the decision-making process to define RAC, as discussed in the next section.
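Once the two limits have been decided, assigning a risk estimate to one of the three regions is a simple comparison. The sketch below illustrates this with hypothetical limits. Placing a risk that lies exactly on a limit in the ALARP region is our assumption.

```python
def alarp_region(risk: float, lower_limit: float, upper_limit: float) -> str:
    """Classify a risk estimate according to the two-limit ALARP approach.

    Assumption: values exactly on a limit are placed in the ALARP region.
    """
    if lower_limit >= upper_limit:
        raise ValueError("lower_limit must be below upper_limit")
    if risk > upper_limit:
        return "unacceptable"  # red area: must be reduced or eliminated
    if risk < lower_limit:
        return "acceptable"    # green area: no further investigation needed
    return "ALARP"             # yellow area: reduce if reasonably practicable


# Hypothetical limits, in the same unit as the chosen risk measure.
LOWER, UPPER = 1.0, 10.0

for estimate in (0.3, 4.0, 25.0):
    print(estimate, alarp_region(estimate, LOWER, UPPER))
```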

Note that if a risk matrix is used to measure the risk (Figure 2.1 and Figure 2.2), we can define three areas in the matrix, green, yellow and red (cf. Figure 2.4), and this can then represent an application of the ALARP principle. A risk falling in the red region is absolutely unacceptable, and a risk in the yellow region must be investigated further to identify possible risk-reducing measures of “reasonable cost”.

[Figure 2.4 content: risk matrix with the axes Likelihood (Almost certain, Likely, Moderately likely, Unlikely, Rare) and Severity of consequences (Insignificant, Minor, Moderate, Major, Catastrophic).]

A principle, closely related to ALARP, and with the same meaning, is ALARA (As Low As Reasonably Achievable), see Davidsson et al. (2002).

2.3.3 Target values

The Water Safety Plan refers to health-based targets, and there are often politically established safety targets that can be regarded as acceptable levels of risk. For instance, the City of Göteborg defined a safety target as: the duration of interruption in delivery to the average consumer shall, irrespective of the reason, be less than a total of 10 days in 100 years (Göteborg Vatten, 2006). Whether the risk is unacceptable or not was given by the probability of exceeding the target value. Similarly, Appendix 2 presents safety targets applied by the City of Trondheim (Norway).
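To make the Göteborg target concrete, the sketch below converts 10 days in 100 years into minutes per year for the average consumer and estimates the probability of exceeding that value from a set of simulated results. Expressing the target as an annual average is our reading of it, and the sample values are hypothetical, not results from the case study.

```python
# Target: less than 10 days of interruption per 100 years for the average consumer.
TARGET_DAYS = 10
PERIOD_YEARS = 100
target_minutes_per_year = TARGET_DAYS * 24 * 60 / PERIOD_YEARS  # 144.0


def probability_of_exceeding(samples, target):
    """Fraction of simulated values that lie above the target."""
    samples = list(samples)
    if not samples:
        raise ValueError("at least one sample is required")
    return sum(s > target for s in samples) / len(samples)


# Hypothetical simulated values of minutes lost per consumer and year.
simulated = [90.0, 120.0, 150.0, 200.0, 130.0]
p_exceed = probability_of_exceeding(simulated, target_minutes_per_year)

print(target_minutes_per_year)  # 144.0
print(p_exceed)                 # 0.4
```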

2.4 The decision-making process

The discussion on use of RAC cannot be separated from the relevant decision process, involving various stakeholders. Some principles that can assist in the formulation of suitable RAC are discussed below.

Different stakeholders are involved in the risk management process in various ways and to different degrees. Note that stakeholders exposed to the risks are not always those benefiting from the risk-generating activities. For example, industries in a catchment area of a water supply may benefit from their production, but they will also contribute to water safety risks to consumers, who do not benefit from the industrial activities. This is one of the normative issues in the risk evaluation decision process, cf. Section 2.4.3.

2.4.1 Principles and criteria for risk evaluation

Due to the multi-dimensional character of the decision-making for risk issues, it is of primary importance that the evaluation of risks and the decision-making are made with respect to criteria and principles that are agreed upon among the affected stakeholders. There are different principles for evaluation of risks, and these principles form the basis for defining risk tolerability, and they should be openly communicated and accepted by the involved stakeholders.

Davidsson et al. (2002) present the following four general approaches that can be used when evaluating risk:

- Principle of reasonableness – If it is reasonable with respect to economic and technical means, the risk shall be reduced regardless of the level of risk.

- Principle of proportionality – The overall risk resulting from an activity should not be unreasonably large compared to the benefits.

- Principle of allocation – The allocation of risk in society should be reasonable/fair compared to how the benefits are allocated.

- Principle of avoidance of disasters – Risks with disastrous consequences should be avoided so that the consequences can be managed with accessible resources.

Actual RAC should be formulated taking these principles into account. Renn (2008) mentioned that technical analyses of risk have drawn much criticism from the social sciences. One reason for this is that technical analyses are not considered to include people’s perception of risk and social constructions. Klinke and Renn (2002) present nine criteria to be used for evaluating risk. These criteria are meant to include more than just the extent of damage and probability of occurrence when evaluating risks. The nine criteria are:

- Extent of damage
- Probability of occurrence
- Incertitude
- Ubiquity
- Persistency
- Reversibility
- Delay effect
- Violation of equity
- Potential of mobilization

2.4.2 How to establish a RAC

At the outset it may not be obvious which risks are tolerable, and the decision process can benefit from a documented line of arguments, e.g. by comparisons to existing risk levels. This could promote consistency in various decisions.

Various principles exist to decide on the actual limit between “acceptable” and “non-acceptable” risk. The following are two general principles to assist in attaining a numerical limit for acceptance.

1. “The Comparison criteria” (e.g. NORSOK Z-13 (NORSOK, 2001)) is essentially the same as the French GAMAB (“Globalement Au Moins Aussi Bon”) principle. This is primarily used when non-standard solutions, e.g. new technology, are to be implemented. Acceptance then requires that the solution gives at least as low a risk as the presently accepted practice/solution. In general the Comparison Criteria seem the most helpful approach for modifications of systems, e.g. the introduction of new technology, and when new utilities are to be built.

2. “The Additional risk” criteria, which can be seen as a version of the (German) MEM (Minimum Endogenous Mortality) principle. Roughly speaking, this principle starts from an existing “basic risk”. A new activity shall then not significantly increase this. By specifying such an underlying basic risk, we are assisted in also specifying a RAC. We can require that the increase in risk due to an (increase in a specific) activity shall be less than a certain percentage of the “basic risk”.
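The two principles can be written as simple checks. The sketch below assumes that risk is expressed as a single number in a common unit. The function names, the risk values and the 5 % threshold are all hypothetical and serve only as illustration.

```python
def meets_comparison_criterion(new_risk: float, accepted_risk: float) -> bool:
    """Comparison criterion: the new solution must give at least as low a risk
    as the presently accepted practice/solution."""
    return new_risk <= accepted_risk


def meets_additional_risk_criterion(
    risk_increase: float, basic_risk: float, max_fraction: float
) -> bool:
    """Additional risk criterion: the increase in risk must be less than a
    given fraction of the existing "basic risk"."""
    return risk_increase < max_fraction * basic_risk


# Hypothetical values, for illustration only.
print(meets_comparison_criterion(new_risk=0.8, accepted_risk=1.0))  # True
print(meets_additional_risk_criterion(5e-6, basic_risk=2e-4, max_fraction=0.05))  # True
print(meets_additional_risk_criterion(2e-5, basic_risk=2e-4, max_fraction=0.05))  # False
```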

In general, the following are useful inputs when the actual limit of a RAC is to be specified:

- Historical risk data and acceptability of risk in similar activities (i.e. we utilise accumulated knowledge),

- Assessment of perceived risk of stakeholders,

- Willingness to accept the risks by involved parties, see below.

The risk tolerability levels must be defined taking people’s perception and aversion of risks into consideration. The public perception has, for example, been found to have an important effect on the priorities and legislative agendas of regulatory bodies (Slovic, 2001). Examples of factors affecting people’s risk aversion are:

- Catastrophic potential
- Familiarity
- Uncertainty
- Individual or societal
- Controllability
- Voluntariness

Finally, when a RAC shall be decided, one should have the ambition to achieve continuous risk reduction, and one must be aware of the various ethical challenges, see Section 2.4.3.

2.4.3 Normative issues

The risk evaluation has obvious ethical aspects, and the decision process will benefit from including normative discussions, (e.g. regarding the choice of the RAC): Which risks can we actually tolerate?

The following are main normative issues in the decision process of a water utility, cf. Hokstad et al. (2009):

- Which dimensions (aspects) of risk shall be evaluated? Shall decision makers restrict to consider water quality and water quantity? Should special/additional attention be given e.g. to major accidents or environmental issues?

- What are the preferences and trade-offs between the various dimensions of risk (as water quality and water quantity)? That is, when we know the costs of two risk reducing measures, which of them should be given priority? And could different stakeholders/consumers be treated differently, etc.?

- How shall we arrive at a RAC (the actual acceptance limits) for various risks?

It is obvious that a discussion is needed to define the dimensions of risk to evaluate. For example, shall we only deal with “average risk values” for the total population, or do we focus on high-risk groups (Stallen et al., 1996)? Another topic is the question of the public’s perceived risks (fears). To what extent shall that be taken into consideration? Sometimes it is also an issue to achieve a “fair” distribution of risk amongst various parties affected (Hokstad and Vatn, 2008).

So a RAC-approach should have an "ethical foundation", securing that safety is not compromised. Use of RAC should be seen as a means to reduce risk, and management commitment is essential in the process.

3 Decision Support Methods developed in TECHNEAU

3.1 A generic framework for decision support

Risk management and decision-making may be described in different ways. The framework in Figure 3.1 was devised (Rosén et al., 2010b) to provide a combined structure and a generic description of risk management and decision-making in the context of drinking water supply. Here, the aim is to provide an overview of the most important steps and aspects included in water supply risk management and decision-making. The framework is based on the descriptions of risk management produced by the International Electrotechnical Commission (IEC, 1995) and of decision-making by Aven (2003). The purpose is not to describe a new framework but rather to stress the close link between the two processes and clearly illustrate the role of risk assessment results as decision support. Additional components and aspects have been added to the original descriptions to stress, for example, the importance of considering uncertainties, to acquire new information when available, to update models and analyses and to communicate results to the consumers and other stakeholders.

The framework (Figure 3.1) outlines risk management and decision-making as a proactive process where an underlying decision problem initiates a risk assessment and the results are reviewed by the decision-maker before a decision is made. Decision problems initiating risk assessments are often based on the need to prioritise possible alternatives such as risk-reduction measures. A drinking water utility may, for example, want to know the risk a new chemical facility within the watershed would pose to the water source and in the end to the consumers. Questions linked to such a problem could be whether the risk is acceptable or not, and if not what measure should be taken to reduce the risk? When managing a drinking water system it is important to consider risk related to both water quantity, i.e. supply interruptions, and water quality, i.e. health problems. There may of course also be other risk types important to consider.

As illustrated in the framework, stakeholder values reflected in goals, criteria and preferences affect the decision problems as well as the risk assessment and the subsequent review. Examples of stakeholders are the water utility, the consumers, industries located within the watershed and government authorities. A typical example of criteria used within the drinking water sector is health-based targets defined by authorities. However, water utilities may also define their own performance targets and similar criteria that affect how prioritisations are made. Furthermore, there may be competing interests in society that affect the use of water sources. For example, new roads and railroads within the watershed of a groundwater source may be needed for improved transport, although they also introduce new risks to the water supply due to possible accidents, including accidents involving hazardous goods.

[Figure 3.1 (diagram, text labels only). Decision problem; decision alternatives. Risk assessment, consisting of risk analysis (define scope; identify hazards; estimate risks: water quality, water quantity) and risk evaluation (define tolerability criteria: water quality, water quantity; decision analysis of alternatives: ranking, cost-effectiveness analysis, cost-benefit analysis, multi-criteria analysis). Decisions on risk reduction and risk control (treat risks; monitor). Managerial review and judgement. Acquire new information; update; analyse uncertainties and sensitivity; develop supporting programmes; document and assure quality; report and communicate; review, approve and audit. Stakeholder values: goals, criteria and preferences.]

Figure 3.1. A generic framework for decision support illustrating the main steps in risk management and how it is interconnected with decision-making (Rosén et al., 2010b).

Based on the decision problem, suitable methods and tools should be selected and used in the risk assessment to provide useful results that can support decision-making. A decision problem includes a vast number of different dimensions that can be perceived in different ways. In most cases it is not possible to consider all these aspects in a risk assessment. Hence, the risk assessment results provide decision support although a subsequent managerial review and judgement is necessary to consider aspects not possible to include in the risk assessment.

To support the performance of a risk assessment a team of people should be put together. The team should include people with knowledge of the system being analysed as well as people with knowledge of risk assessment and other aspects that may be relevant.

The arrows in Figure 3.1 illustrate the exchange of information between different steps as well as communication with relevant stakeholders. The task of communicating risk is important and carefully performed risk assessments may provide useful results that facilitate communication with decision-makers, consumers and other stakeholders. It is important to emphasise that risk assessment and decision-making should be a continuous and iterative process that is updated when new information becomes available and preconditions change. Furthermore, the framework emphasises that risk assessments and other work need to be reviewed in order to assure the quality.

In addition to the framework, specific tools for providing decision support to utilities have been developed within TECHNEAU. These methods are described below, together with examples of applications performed in case studies. The aim of the case studies was to evaluate the methods and tools that had been developed and to provide good examples. The descriptions are primarily based on the doctoral thesis by Lindhe (Lindhe, 2010) and papers by Lindhe et al. (2009), Lindhe et al. (2010a), Lindhe et al. (2010b), Rosén et al. (2010a) and Lindhe et al. (2011).

3.2 The dynamic fault tree method (DFT)

A quantitative and probabilistic risk assessment method for considering entire drinking water systems, from source to tap, was developed to overcome the current lack of such methods. The method is based on dynamic fault tree analysis and is presented in detail by Lindhe et al. (2009; 2010a). Rosén et al. (2010a) and Lindhe et al. (2011) showed how to use the fault tree method to model risk-reduction measures. A summary of these applications is given below.

3.2.1 Method development

The importance of considering the entire drinking water system, from source to tap, when assessing risks is emphasised within the drinking water sector, although quantitative risk assessments of entire drinking water systems are rare. It was therefore decided to develop a method that could be employed when making such risk assessments. A key requirement for such a method is that it must be able to model the complex structure of a drinking water system, including interactions between subsystems and the ability to compensate for failures. Furthermore, it was concluded that the method should be quantitative so that risk levels and other results are expressed numerically and can thus be compared easily to acceptable risk levels and other performance targets. Since risk is related to uncertainty, a probabilistic approach should be applied to enable uncertainty analysis.

Based on these requirements, fault tree analysis was identified as a suitable basis for the method. Fault tree analysis makes it possible to model failures as chains of events and thus consider interactions between components and events. However, it was concluded that traditional fault tree analysis was not sufficient to model drinking water systems correctly and provide sufficiently informative results. Consequently, a Markovian approach was used to consider the dynamic behaviour of a drinking water system. The main differences between the dynamic fault tree method presented here and traditional fault tree analysis are: (1) the logic gates that make it possible to model fault-tolerant systems with an ability to compensate for failures; (2) the possibility to calculate not only the probability of failure but also the failure rate and downtime for each event in the fault tree; and (3) the risk levels that are calculated as a function of the probability of failure and information on the proportions of consumers affected by different failures.

Fault tree analysis has previously been applied by, for example, Li (2007) to analyse cause-effect relationships in water supply systems. Beauchamp et al. (2010) used it to identify hazards in water treatment and Risebro et al. (2007) structured events of quality-related failure using a fault tree.

To facilitate the development of the dynamic fault tree method, it was applied simultaneously to the drinking water system in Gothenburg, Sweden. The Gothenburg system was thus used to identify conditions specific to drinking water systems that needed to be considered in the method. The method is, however, generic and can be applied to any type of drinking water system. A team made up of both researchers and water utility personnel contributed to the task of specifying the scope of the method and then developing and applying it.

3.2.2 Failure types and conceptual model

Failures in a drinking water system may affect the consumers in different ways. The overall failure event included in the fault tree method is termed supply failure and is defined as including: (1) quantity failure, i.e. insufficient water is delivered to the consumer; and (2) quality failure, i.e. water is delivered but does not meet water quality standards (Figure 3.2). Note that the failure types are defined based on how the consumers are affected. Quantity failure may occur due to failure of technical components such as pipes and pumps, making it impossible to transfer water. However, quantity failure can also be caused by events resulting in an unacceptable raw water or drinking water quality which, in turn, cause the water utility to stop delivery. Quality failure occurs if unacceptable water quality is not detected or if no actions are possible or sufficient and delivery is not stopped.

[Figure 3.2 (diagram, text labels only). Supply failure: categories of supply failure and their causes.

Quantity failure (Q = 0): no water is delivered to the consumer. Causes: failure of components in the system (e.g. pumps or pipes); events related to unacceptable water quality causing the water utility to stop delivery.

Quality failure (Q > 0, C'): water is delivered but does not meet water quality standards. Causes: unacceptable water quality is detected but no action is possible or sufficient and it is not possible to stop delivery; unacceptable water quality is not detected and no action is thus possible.

Q = flow (Q = 0, no water is delivered to the consumer; Q > 0, water is delivered). C' = the drinking water does not comply with water quality standards.]

The fault tree method was developed to consider entire systems so that interactions between subsystems could be considered and also to identify how much the different subsystems contribute to the risk. As illustrated in Figure 3.3, the system is divided into its three main subsystems (raw water, treatment and distribution) and it is considered that failure in one part may be compensated for by the subsequent parts. For example, if no raw water can be supplied to the treatment plant, stored water at the treatment plant and service reservoirs within the distribution system can be used and the consumers are not affected until all stored water is used. Spare components, such as reserve pumps, as well as the ability to compensate for failure within a specific subsystem, should of course also be considered.

The above-described failure types and the conceptual view of how failures may occur are of help when constructing fault tree models. Which failure types are included and how the system is divided should of course be adjusted to suit the specific application.

Figure 3.3. Conceptual model of how quantity and quality failures may occur in a drinking water system and affect the consumers (Lindhe, 2010).

3.2.3 Logic gates and dynamic calculations

To model the dynamic function of drinking water systems a Markovian, dynamic fault tree approach is used in the method presented here. Based on the function of drinking water systems and how failures may originate, four logic gates needed to be included. In addition to the traditional OR- and AND-gates, two variants of the AND-gate have been developed (Lindhe et al., 2009; Lindhe et al., 2010a). The variants of the AND-gate have been devised to model a system's ability to compensate for failures and are similar to what in dynamic fault tree applications are referred to as SPARE-gates (e.g. Durga Rao et al., 2010). Table 3.1 presents examples of the type of conditions each of the four logic gates can model. The variants of the AND-gate model how a system, given an initial failure, may prevent the failure from affecting the consumers. Using the first variant, one or more compensating components/events can be included in the model and they are described using a failure rate (λ) and a probability of failure on demand (q). The inverse of the failure rate (1/λ) corresponds to the time for which the component may compensate for failure. The probability of failure on demand is included since compensation may not be available at all when needed, for reasons such as maintenance. The second variant of the AND-gate is similar to the first variant but can include only one compensating component/event. The important difference is, however, that the second variant can model the ability of the compensating component to recover after failure, i.e. the downtime (1/μ) is considered. Since a Markov approach is used, the events are described using a failure rate (λ), or a mean time to failure (1/λ), and a repair rate (μ), or a mean downtime (1/μ). It has been found most suitable to use the failure rate and the mean downtime when discussing the characteristics of events in drinking water systems. These are therefore the variables mainly referred to in this report when characterising events. However, the rates λ and μ are used in the calculations. The Markov models for all logic gates are described in detail in Lindhe et al. (2009), Lindhe et al. (2010a) and Norberg et al. (2009).

Table 3.1. Examples of conditions in a drinking water system that the different logic gates can model (Lindhe, 2010).

LOGIC GATE: EXAMPLE

OR-gate: A raw water source may be contaminated by microbiological, chemical or other contaminants.

AND-gate: To be unable to supply the treatment plant with raw water, all water sources need to be unavailable simultaneously.

First variant of AND-gate: If no drinking water can be transferred from the treatment plant to the distribution system, water stored in reservoirs in the distribution system may compensate for failure for a limited period. Failure on demand may occur if the reservoir is not in use due, for example, to maintenance work.

Second variant of AND-gate: Unacceptable raw water quality may be compensated for by the treatment. If the quality deviation cannot be compensated for at all, the treatment fails on demand. If there is no failure on demand, the quality deviation is compensated for until the treatment efficiency is affected by a failure. When the treatment recovers after the failure compensation is possible again.

To reduce the computational demand, approximate dynamic fault tree calculations are used that do not require Markov simulations. By replacing each basic event in the four logic gates with a Markov process, equations for calculating the probability of failure, failure rate and mean downtime have been developed, see Appendix 1.
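The idea of replacing each basic event with a Markov process can be illustrated with a minimal sketch. It assumes independent two-state (working/failed) events and uses the standard steady-state results for the traditional OR- and AND-gates only. The equations in Appendix 1 may differ in form, the two AND-gate variants are not covered, and the function names and numbers are hypothetical.

```python
from math import prod


def event_probability(lam: float, mu: float) -> float:
    """Steady-state probability that a two-state Markov event is in its failed state.

    lam is the failure rate and mu the repair rate, in the same time unit.
    """
    return lam / (lam + mu)


def or_gate(events):
    """Output event occurs if any input event occurs (independent inputs assumed).

    events is a list of (lam, mu) pairs. Returns (probability, failure rate, repair rate).
    """
    p = 1.0 - prod(1.0 - event_probability(lam, mu) for lam, mu in events)
    lam_out = sum(lam for lam, _ in events)  # the output fails as soon as any input fails
    mu_out = lam_out * (1.0 - p) / p         # chosen so that p = lam_out / (lam_out + mu_out)
    return p, lam_out, mu_out


def and_gate(events):
    """Output event occurs only if all input events occur at the same time."""
    p = prod(event_probability(lam, mu) for lam, mu in events)
    mu_out = sum(mu for _, mu in events)     # the output recovers as soon as any input recovers
    lam_out = mu_out * p / (1.0 - p)         # chosen so that p = lam_out / (lam_out + mu_out)
    return p, lam_out, mu_out


if __name__ == "__main__":
    # Hypothetical example: two redundant pumps, each failing twice per year
    # with a mean downtime of 12 h (repair rate 8760 / 12 = 730 per year).
    pumps = [(2.0, 730.0), (2.0, 730.0)]
    p, lam, mu = and_gate(pumps)
    print(f"P = {p:.2e}, failure rate = {lam:.4f} per year, mean downtime = {8760 / mu:.1f} h")
```

Because each gate returns a probability together with an equivalent failure rate and repair rate, the output of one gate can be fed into the next, which is what allows results to be obtained at every level of a tree without simulating the Markov processes.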

3.2.4 Generic fault tree structure

It is not possible to provide one fault tree model that can be applied to all systems since they all look slightly different and are exposed to different risks. However, a generic fault tree structure is presented in Figure 3.4, which is in line with the conceptual model in Figure 3.3. The system is divided into its three main subsystems: raw water, treatment and distribution. Note that both quantity and quality failure are included in the same fault tree to provide one model that gives an overview of the entire system. However, the results are presented separately for the two failure types, i.e. two different top events are used. Although the final fault tree model for a system must be more detailed than the example in Figure 3.4 the figure shows a basic structure.

[Figure 3.4 (fault tree diagram, text labels only). Gate types shown: OR-gate; first variant of AND-gate. Events: supply failure; quantity failure; quality failure; raw water failure; treatment failure; distribution failure; raw water quantity failure (Q = 0); raw water quality failure (Q > 0, C'); treatment quantity failure (Q = 0); treatment quality failure (Q > 0, C'); distribution quantity failure (Q = 0); distribution quality failure (Q > 0, C'); treatment fails to compensate; distribution fails to compensate. Q = flow (Q = 0, no water is delivered to the consumer; Q > 0, water is delivered). C' = the drinking water does not comply with water quality standards.]

Figure 3.4. Generic fault tree structure illustrating quantity and quality failure in the three main subsystems (Lindhe, 2010).

3.2.5 Risk and measure of risk

In the fault tree method the risk related to both quantity and quality failure is determined by how often failure occurs (failure rate λ), the duration of failure (mean downtime 1/μ) and the mean number of people affected. The risk is expressed as the expected value of Customer Minutes Lost (CML), which corresponds to the number of minutes per year the average consumer is: (1) not supplied with water (quantity-related risk); and (2) supplied with water that does not comply with water quality standards (quality-related risk). Whilst the same unit is used for both risk types they must be presented separately to retain transparency. The risk may be calculated approximately by multiplying the failure rate (λ) by the downtime (1/μ) and the proportion of consumers affected (C). However, to take into account that the system cannot fail when already in failure mode the risk (R) should be calculated as

$$R = P_F \, C \tag{3.1}$$

where $P_F$ is the probability of failure and $C$ the proportion of consumers affected. The proportion of consumers affected is used since the risk is expressed for the average consumer. The dynamic fault tree calculations provide information on the probability of failure but not the proportion of consumers affected. Hence, the proportion of consumers affected must be defined for the main failure events in the fault tree. It cannot be defined for the top event since it may include failures that affect a very different number of people. The total risk is thus calculated as

$$R = \sum_i P_{F_i} \, C_i \tag{3.2}$$

To be able to calculate the risk in this way there can only be OR-gates between the top event and the level where Ci is defined.
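The difference between the approximate and the probability-based risk calculation can be shown with a small worked sketch. It assumes that each main failure event behaves as a two-state Markov process, so that its probability of failure is λ/(λ + μ), and that the product of probability and proportion affected is converted to CML by multiplying by the number of minutes in a year. Both the conversion factor and the example numbers are assumptions made here for illustration.

```python
HOURS_PER_YEAR = 365 * 24
MINUTES_PER_YEAR = HOURS_PER_YEAR * 60  # assumed conversion from a fraction of time to CML


def failure_probability(failure_rate_per_year: float, mean_downtime_h: float) -> float:
    """Probability of failure for a two-state Markov event."""
    repair_rate_per_year = HOURS_PER_YEAR / mean_downtime_h
    return failure_rate_per_year / (failure_rate_per_year + repair_rate_per_year)


def cml(p_f: float, c: float) -> float:
    """Risk for one main failure event, in minutes per year for the average consumer."""
    return p_f * c * MINUTES_PER_YEAR


def approx_cml(failure_rate_per_year: float, mean_downtime_h: float, c: float) -> float:
    """Approximation: failure rate times downtime times proportion of consumers affected."""
    return failure_rate_per_year * mean_downtime_h * 60 * c


def total_cml(events) -> float:
    """Total risk: sum over main failure events given as (P_F, C) pairs."""
    return sum(cml(p_f, c) for p_f, c in events)


if __name__ == "__main__":
    # Hypothetical main failure event: one failure every ten years,
    # 48 h mean downtime, half of the consumers affected.
    p_f = failure_probability(0.1, 48.0)
    print("probability-based CML:", round(cml(p_f, 0.5), 2))
    print("approximate CML:      ", round(approx_cml(0.1, 48.0, 0.5), 2))
```

The approximation is always slightly higher than the probability-based value because it counts failures as if they could start while the system is already down. The gap grows as failures become more frequent or downtimes longer.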

The use of CML within the drinking water sector is discussed by, for example, Blokker et al. (2005). It should be noted that the quality-related CML does not include any information about the possible health effects and not all drinking water is used as plain drinking water or for cooking.

3.2.6 Input data and uncertainties

The dynamic fault tree calculations are combined with Monte Carlo simulations to enable uncertainty analysis. The probabilistic approach makes it possible to: (1) analyse the uncertainties in the risk calculations; (2) calculate rank correlation coefficients, providing information on how much the uncertainty of each input variable in the fault tree contributes to the uncertainty in the results; and (3) calculate the probability of the risk exceeding specified criteria, i.e. acceptable risk levels.

All input variables in the fault tree model are thus replaced by probability distributions. Parameters (rates) λ and μ are modelled as random variables using Gamma distributions, and the proportion of consumers affected (C) as well as the probability of failure on demand (q) are modelled using Beta distributions (Lindhe et al., 2009; Lindhe et al., 2010a; Norberg et al., 2009). The distributions can be defined based on measurements and event statistics, i.e. hard data, or by using expert judgements. The Gamma distribution has one shape parameter (r) and one scale parameter (σ). Hard data used to define the Gamma distribution for variables λ can be presented by the total number of registered events (r-1) and the specific time period (1/σ). For μ the data can be presented by the total number of registered events (r-1) and the total duration of failures (1/σ).
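A minimal sketch of how hard data could be turned into Gamma distributions and propagated by Monte Carlo simulation is given below, using only the Python standard library. It follows the parameterisation described above (shape r equal to the number of registered events plus one, scale σ equal to one over the observation period or total failure duration). The event counts, the Beta parameters and the single-event risk model are hypothetical and are not taken from the Gothenburg application.

```python
import random

MINUTES_PER_YEAR = 365 * 24 * 60  # assumed conversion to Customer Minutes Lost


def gamma_from_counts(n_events: int, exposure: float, rng: random.Random) -> float:
    """Draw a rate from a Gamma distribution with shape r = n_events + 1
    and scale sigma = 1 / exposure (observation period or total failure duration)."""
    return rng.gammavariate(n_events + 1, 1.0 / exposure)


def simulate_cml(n_iter: int = 10_000, seed: int = 1) -> list:
    """Monte Carlo simulation of the risk for a single, hypothetical failure event."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_iter):
        lam = gamma_from_counts(4, 20.0, rng)      # hypothetical: 4 failures in 20 years
        mu = gamma_from_counts(4, 8.0 / 365, rng)  # hypothetical: 8 days of downtime in total
        c = rng.betavariate(2.0, 8.0)              # hypothetical Beta for proportion affected
        p_f = lam / (lam + mu)                     # two-state Markov event
        results.append(p_f * c * MINUTES_PER_YEAR)
    return sorted(results)


def percentile(sorted_values: list, q: float) -> float:
    """Simple empirical percentile of an already sorted sample."""
    index = min(len(sorted_values) - 1, int(q * len(sorted_values)))
    return sorted_values[index]


if __name__ == "__main__":
    sample = simulate_cml()
    print("P05: ", round(percentile(sample, 0.05), 1))
    print("mean:", round(sum(sample) / len(sample), 1))
    print("P95: ", round(percentile(sample, 0.95), 1))
```

Each iteration draws one value for every uncertain input and evaluates the model once, so the sorted output sample directly gives the mean and percentiles of the risk.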

Expert knowledge is often an important source of information in risk assessments since the amount of hard data is often scarce (Paté-Cornell, 1996). The elicitation of expert judgements is facilitated in the fault tree method since the events are seen as Markov processes defined using failure rates and mean downtimes. This means that no direct estimates of the probability of failure are required, which was considered an advantage in the method application in Gothenburg. Events and the function of components are described more easily using rates or times and existing data is often available in this format. Possible experts are water utility personnel or other persons with knowledge of the specific event studied. The expert can be asked to estimate a probable highest and lowest value of the failure rate (λ), the downtime (1/μ) and other variables. This information can be used as, for example, 5- and 95-percentiles to define a probability distribution. Which percentiles the values are assumed to correspond to should be based on the expected accuracy in the judgements. It is possible in a fault tree application to use different percentiles for different judgements.
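How a lowest and a highest estimate can define a distribution is sketched below. Note that the sketch swaps in a lognormal distribution, purely because its parameters follow in closed form from two percentiles. The method described here uses Gamma and Beta distributions, for which the parameters would normally be found numerically. The function names and the example values are hypothetical.

```python
from math import exp, log
from statistics import NormalDist


def lognormal_from_percentiles(low: float, high: float,
                               p_low: float = 0.05, p_high: float = 0.95):
    """Return (log_mean, log_sd) of a lognormal distribution whose p_low and
    p_high percentiles equal the expert's lowest and highest estimates."""
    z_low = NormalDist().inv_cdf(p_low)
    z_high = NormalDist().inv_cdf(p_high)
    log_sd = (log(high) - log(low)) / (z_high - z_low)
    log_mean = log(low) - log_sd * z_low
    return log_mean, log_sd


def lognormal_quantile(log_mean: float, log_sd: float, p: float) -> float:
    """Percentile of the fitted lognormal distribution."""
    return exp(log_mean + log_sd * NormalDist().inv_cdf(p))


if __name__ == "__main__":
    # Hypothetical judgement: the failure rate lies between 0.5 and 4 per year.
    m, s = lognormal_from_percentiles(0.5, 4.0)
    print("median failure rate:", round(lognormal_quantile(m, s, 0.5), 3))
    # A less confident expert: the same values treated as 10- and 90-percentiles.
    m2, s2 = lognormal_from_percentiles(0.5, 4.0, 0.10, 0.90)
    print("log-scale spread, 5/95 versus 10/90:", round(s, 3), round(s2, 3))
```

Treating the same two values as 10- and 90-percentiles rather than 5- and 95-percentiles gives a wider distribution, which is how lower expected accuracy in a judgement can be reflected.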

3.2.7 Case study Göteborg

Introduction

The fault tree method was used to analyse the drinking water system in Gothenburg, Sweden. The Gothenburg system supplies approximately 500,000 people with drinking water and includes several forms of interaction between events and subsystems. The system is based solely on surface water and the main water source is the river Göta Älv. An overview of the raw water supply in Gothenburg is presented in Figure 3.5.

Two water treatment plants are included in the system and treatment plant number 1 is under normal conditions supplied with water from the river. Water from the river is also pumped via a 12 km rock tunnel to two interconnected lakes (main reservoir lakes), which in turn supply treatment plant number 2 with raw water. Due to variable water quality in the river, the river water intake is closed regularly for about 100 days per year (e.g. Åström et al., 2007). During these periods the reservoir lakes supply both treatment plants with raw water. When the intake needs to be closed for longer periods an additional water source (additional reservoir lakes) can also be used to supply water to the main reservoir lakes or directly to treatment plant number 2.

[Figure 3.5 (schematic map, text labels only). Legend: water source / reservoir; water treatment plant; watercourse; raw water distribution. Labelled elements: main raw water source; main reservoir lakes; upper additional reservoir; lower additional reservoir; water treatment plant no. 1; water treatment plant no. 2.]

Figure 3.5. Schematic description of the raw water system in Gothenburg (Lindhe, 2010).

Both treatment plants include similar treatment processes and contribute in approximately equal parts to meeting an average water demand of 165,000 m³/d (normally demand varies between 120,000 and 210,000 m³/d).

To handle variations in the water demand and production capacity, service reservoirs in the distribution system and at the treatment plants are used. In addition, the distribution system is divided into different pressure zones and booster stations are used to ensure sufficient pressure in elevated zones. The water quality is monitored online and by means of regular additional measurements throughout the system. The decision to close the river water intake is based on the online monitoring and reports from operating bodies upstream, such as companies and municipalities.

Fault tree model

The risk assessment of the Gothenburg system included both quantity and quality failures and the drinking water quality was considered unacceptable when unfit for human consumption, a criterion defined by the Swedish quality standards for drinking water (SLVFS 2001:30). The task of identifying undesired events, structuring the fault tree and evaluating and updating the fault tree structure was carried out jointly by researchers and water utility personnel. The model building and calculations were performed using Microsoft Excel© and the add-in software Crystal Ball© was used for running Monte Carlo simulations.

The fault tree model of the Gothenburg system was based on the generic structure presented in Figure 3.4 and in total it included 116 basic events and 101 logic gates. An OR-gate was used to model that failures may occur in any of the three main subsystems (raw water, treatment and distribution). The first variant of the AND-gate was used to model that failure in one subsystem may be compensated for in the subsequent subsystems. The raw water part of the model included the water sources, the raw water supply system (i.e. pumps, siphons, pipes, tunnels etc.) and all components up to the points where the raw water enters the two treatment plants. Everything between the points where the raw water enters the treatment plants, throughout the plants and up to the points just before the treated water is pumped out into the distribution network, was included in the treatment part of the fault tree. The distribution system included all components (pumps, pipes, service reservoirs etc.) from the point where the treated water is pumped out from the treatment plants to the consumers' taps.

An OR-gate was used to separate failures in each of the three main subsystems into quantity and quality failures. In doing so it was possible to calculate the results for quantity and quality failures separately and thus retain transparency.

It is not only possible for one subsystem to compensate for failure in other parts of the system; interactions between parts within the same subsystem also provide opportunities for compensation. Both variants of the AND-gate were used to model different kinds of compensation within the three subsystems. The first variant of the AND-gate was used to model situations where the ability to compensate was limited in time, for example due to limited reservoir volume. The second variant was used to model the ability of the treatment to compensate for unacceptable raw water quality, see Appendix 1.

To make it possible to calculate risk levels expressed as CML, a suitable level in the fault tree for defining the proportions of people affected needed to be identified. In the Gothenburg fault tree, quantity failure as well as quality failure under each of the three main subsystems were divided into main failure events and the proportion of people affected was defined for these events. Quantity failures in the raw water system, for example, were divided into two events illustrating which of the two treatment plants may not be supplied with raw water. Quality failures in the distribution system were divided into events such as quality deterioration and contaminant intrusion. These events were also divided into major and minor events in order to avoid mixing events with considerably different consequences.

Hard data as well as expert judgements were used as input data for the fault tree model. The expert judgements were used as estimates of the 5- and 95-percentiles when defining probability distributions describing the uncertainties of the input data.

Case study results

In addition to the quantitative results of the fault tree analysis, the actual fault tree model and the process of constructing it are also important results. The structure of the fault tree model illustrates how the system functions and it shows the interactions between subsystems and events. Furthermore, when constructing a fault tree model, aspects of a system that may otherwise be ignored are acknowledged.

The Monte Carlo simulations were performed using 10,000 iterations and the risk, failure probabilities, failure rates and downtime were estimated at all levels in the fault tree. In Figure 3.6 and Figure 3.7 the mean CML per year (risk), probability of failure, failure rate and mean downtime are shown for quantity and quality failure respectively. The probability of failure is determined by the failure rate and mean downtime and these two variables provide additional information on the dynamic behaviour of the system. For each failure type the results are presented for the entire system as well as for the raw water, treatment and distribution parts separately. Since uncertainties are considered in the analysis the mean, 5- and 95-percentiles are presented for all variables.

By studying the contributions to risk it can be concluded that for both quantity failure and quality failure the raw water system contributes most to the total risk level (Figure 3.6 and Figure 3.7). However, when comparing the probabilities of failure it is clear that failures in the distribution system are the most probable for both quantity and quality failures. Hence, by studying the CML values together with information on probabilities it can be concluded that the raw water system contributes most to the total risk level due to more severe consequences and not because of a high probability of failure.

The failure rates and downtimes show that the high probability of distribution failure (quantity and quality) is due to frequent failures, i.e. a high failure rate, and not to long downtimes, since the downtime is short. It is also shown that the raw water system, in contrast to the distribution system, has a low failure rate but a long downtime. The long downtime in combination with the fact that many consumers are affected when something happens in the first part of the supply chain results in severe consequences. This explains why the raw water system contributes most to the total risk level. Failure in the treatment may also affect many consumers, but since the failure rate is low and the downtime is short for these events, they only have a minor influence on the total risk. It should be noted that although a quality failure has a low failure rate and short downtime, the consumers affected may be subject to severe health effects.

[Figure 3.6 (four bar-chart panels, each showing P05, mean and P95 for Tot., Raw w., Treat. and Distr.). Panels: Risk (quantity), CML (quantity) per year; Probability of failure (quantity); Failure rate (quantity), mean failure rate [year⁻¹]; Downtime (quantity), mean downtime [h].]

Figure 3.6. Diagrams showing the risk (expected value of CML), probability of failure, failure rate and mean downtime for quantity failure. The mean, 5- and 95-percentiles are presented for the entire system (Tot.) as well as the three main sub-systems (Lindhe, 2010).

[Figure 3.7 (four bar-chart panels, each showing P05, mean and P95 for Tot., Raw w., Treat. and Distr.). Panels: Risk (quality), CML (quality) per year; Probability of failure (quality); Failure rate (quality), mean failure rate [year⁻¹]; Downtime (quality), mean downtime [h].]

Figure 3.7. Diagrams showing the risk (expected value of CML), probability of failure, failure rate and mean downtime for quality failure. The mean, 5- and 95-percentiles are presented for the entire system (Tot.) as well as the three main sub-systems (Lindhe, 2010).

Figure 3.6 and Figure 3.7 show that the failure rate is higher for quantity failure than for quality failure, whereas the downtime is shorter for quantity failure. Quantity failures are therefore more common whilst quality failures have a longer duration. The percentiles show that the uncertainties in some of the variables are high. One example is the total risk level related to quantity failure, the uncertainties of which are analysed further below.

To evaluate the results, the calculated total risk level related to quantity failure was compared with a politically established safety target that can be regarded as an acceptable level of risk. The safety target is defined by the City of Gothenburg as: duration of interruption in delivery to the average consumer shall, irrespective of the reason, be less than a total of 10 days in 100 years (Göteborg Vatten, 2006). The uncertainty distribution in Figure 3.8 represents the calculated risk level and it is compared with the performance target, translated into 2.4 hours lost per customer per year (10 days × 24 h / 100 years), i.e. CML = 2.4 × 60 = 144 minutes per year. The probability of exceeding the target value was calculated to be 0.84. To be able to say whether the risk is unacceptable or not, one needs to decide to what level of certainty the target should be fulfilled.

[Figure 3.8 (plot). Risk (quantity): probability (y-axis, 0 to 0.05) against CML (quantity) per year (x-axis, 0 to 2000).]

Figure 3.8. Uncertainty distribution of quantity-related risk, including the entire system, compared with the performance target (144 CML per year) indicated by the solid vertical line. The probability of exceeding the performance target (grey area) is 0.84 (Lindhe, 2010).
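The translation of the safety target into CML and the calculation of an exceedance probability from Monte Carlo output can be sketched as follows. The target conversion follows the definition of the target, while the sample values are hypothetical stand-ins for simulation output and do not reproduce the distribution in Figure 3.8.

```python
def target_cml_per_year(days: float, years: float) -> float:
    """Convert a target of `days` of interruption in `years` years to minutes per year."""
    return days * 24 * 60 / years


def exceedance_probability(samples, target: float) -> float:
    """Fraction of Monte Carlo samples in which the risk exceeds the target."""
    return sum(1 for value in samples if value > target) / len(samples)


if __name__ == "__main__":
    target = target_cml_per_year(10, 100)
    print("target:", target, "minutes per consumer per year")
    # Hypothetical simulated CML values, for illustration only.
    simulated = [50.0, 100.0, 150.0, 200.0, 300.0]
    print("probability of exceeding target:", exceedance_probability(simulated, target))
```

With the full set of iterations from a simulation in place of the five illustrative values, the same function yields the grey-area probability shown in the figure.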

During Monte Carlo simulations, rank correlation coefficients between the input and output parameters can be calculated. To illustrate how rank correlation coefficients may be used, Figure 3.9 shows the six variables in the fault tree model that contribute most to the uncertainty in the probability of quantity failure in the distribution system. Note that Figure 3.9 presents the repair rate (μ) and not the mean downtime (1/μ), because the repair rate is the input variable in the fault tree model. Since both variables carry the same information, this does not affect the uncertainty analysis. All failure rates (λ) have a positive rank correlation coefficient, since an increase in the failure rate means that failure becomes more frequent and the probability of failure thus increases. Conversely, all repair rates (μ) have a negative rank correlation coefficient, since an increase in the repair rate means that the mean downtime (1/μ) decreases and consequently the probability of failure decreases.
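The sign pattern described above can be reproduced with a minimal Python sketch. It assumes that the probability of failure of a single event is λ/(λ + μ) and that λ and μ are independent and lognormally distributed. Both the formula and the distribution parameters are assumptions made for this illustration and are not taken from the Gothenburg model.

```python
import random


def ranks(values):
    """1-based ranks of the values; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def rank_correlation(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


rng = random.Random(42)
n = 5000
# Hypothetical input uncertainties for one event (illustrative values only).
failure_rate = [rng.lognormvariate(0.0, 0.5) for _ in range(n)]  # lambda
repair_rate = [rng.lognormvariate(5.0, 0.5) for _ in range(n)]   # mu
# Assumed relation between the rates and the probability of failure.
p_fail = [lam / (lam + mu) for lam, mu in zip(failure_rate, repair_rate)]

rho_lambda = rank_correlation(failure_rate, p_fail)  # expected to be positive
rho_mu = rank_correlation(repair_rate, p_fail)       # expected to be negative
print(f"rho(lambda) = {rho_lambda:.2f}, rho(mu) = {rho_mu:.2f}")
```

Because rank correlation depends only on the ordering of the samples, using the mean downtime 1/μ instead of μ would give a coefficient of the same magnitude with the opposite sign, which is why the choice between the two does not affect the uncertainty analysis.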

The results in Figure 3.9 show that the failure rates and repair rates of three events (failure of distribution pipe, failure of service connection and quantity failure in building) are the six variables in the fault tree that contribute most to the uncertainty in the probability of distribution failure. To reduce the uncertainty in this specific probability value most effectively, these six variables should be studied further to obtain more accurate estimates. This kind of information may thus act as a guide in further studies.

[Figure 3.9: x-axis: Rank correlation coefficient, −1 to 1; variables shown: Failure of service connection (λ), Quantity failure in building (λ), Quantity failure in building (μ), Failure of distribution pipe (λ), Failure of service connection (μ), Failure of distribution pipe (μ).]

Figure 3.9. Uncertainty analysis of the probability of quantity failure in the distribution system. The rank correlation coefficients of the six variables contributing most to the uncertainties in the probability of distribution failure are presented (Lindhe, 2010).

The rank correlation coefficients were used to analyse how the input data affect the results for the top events (quantity and quality failure) in the fault tree model of the Gothenburg system. It was concluded that the uncertainties in both the quantity- and quality-related risk levels are mainly caused by uncertainties in the consequences, i.e. the proportions of affected consumers. One reason why the consequences have a large impact on the uncertainties is that they are included in the calculations at a high level in the fault tree model. Besides the consequences, some of the events in the raw water part of the model had relatively high correlation coefficients. For the probability of failure as well as the rates λ and μ, events in the distribution part of the fault tree contributed most to the uncertainties. Compared to the raw water and treatment parts of the fault tree model, the distribution part has more basic events at a high level in the structure. This makes the results more sensitive to changes in the variables for these events than to changes for events at a lower level in the fault tree.
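Why events high in the structure tend to dominate can be illustrated with a toy fault tree. The sketch below assumes independent events and standard OR and AND gates. The tree layout and the probabilities are hypothetical and much simpler than the Gothenburg model.

```python
def or_gate(*probabilities):
    """Output probability of an OR gate with independent inputs."""
    none_occur = 1.0
    for p in probabilities:
        none_occur *= 1.0 - p
    return 1.0 - none_occur


def and_gate(*probabilities):
    """Output probability of an AND gate with independent inputs."""
    all_occur = 1.0
    for p in probabilities:
        all_occur *= p
    return all_occur


def top_event(p_a, p_b, p_c):
    """Toy tree: event A feeds the top OR gate directly, whereas events B
    and C sit one level lower, combined through an AND gate."""
    return or_gate(p_a, and_gate(p_b, p_c))


base = top_event(0.01, 0.01, 0.01)
# Double one input probability at a time and compare the change in the top event.
delta_high = top_event(0.02, 0.01, 0.01) - base  # event at a high level
delta_low = top_event(0.01, 0.02, 0.01) - base   # event at a lower level
print(f"change from A: {delta_high:.6f}, change from B: {delta_low:.6f}")
```

In this toy tree, a change in A passes almost unchanged to the top event, while the same change in B is damped by the AND gate with C. The same mechanism would make the result more sensitive to uncertainty in high-level events.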

3.2.8 Key aspects when applying the fault tree method

Based on the method development and the case study application, the following key aspects have been identified as being important to consider when using the fault tree method:

