Safety - Tools and Techniques in Dependable Distributed Control

3. Tools and Techniques in Dependable Distributed Control

3.2 Dependability

3.2.4 Safety

3.2.4.1 Safety Standards

[Jesty 1993] compares two sets of draft standards aimed at the safety of computers and programmable electronic devices, and introduces a third.

Def Stan [00-56 1991] concerns hazard analysis and safety classification and, where the safety integrity level is found to be sufficiently high, calls for the use of Def Stan 00-55. [00-55 1991] specifies the procedures and tools for developing safety critical software, namely the use of formal methods for specification, design and verification.

[IEC/SC65A] WG9 and WG10 are concerned with the software and the system respectively. Once the system safety integrity level has been defined using WG10, WG9 is used to select the appropriate design technique.

Two main criticisms of the standards concern the imposition of methods on which there is no agreement, and the relationship between safety integrity and the probability of software failure. These criticisms motivated Jesty’s proposed standard, which otherwise differs little from the other two.

Firstly, although 00-55 deals only with safety critical software, it provides strict definitions of what constitutes a suitable formal method, thereby ruling out some methods on which there is no consensus in the safety critical community. In contrast, WG9 covers all integrity levels, but determines which types of method to use at each level.

Secondly, software faults, unlike hardware faults, are purely systematic (see section 3.2.5), so the probability of failure of software cannot be based on randomness. Safety integrity levels in 00-56 imply specific minimum failure rates, but the failure probability of software is not determined by the integrity of the software development method, so good practice does not guarantee good software. For this reason Jesty introduces confidence levels. Confidence levels can be associated with probabilities of failure, and can be mapped onto levels of integrity. The strategy is that the more that is known about the design and implementation processes, the greater the confidence that can be placed in the software produced by them. Formal methods are therefore optional at every integrity level except the highest, where they are mandatory.

[Figure 3-3 diagram. Labels shown: test; retro-fit; design; verification (appearing twice); implementation; design review; validation planning; system modification; operation and maintenance; designation of safety related software; safety validation; functional requirements specification; software requirements specification; safety integrity requirements specification.]

Figure 3-3 A life cycle for safety critical software

[Figure: diagram relating frequency of occurrence and severity of hazard to risk level, safety integrity level and rigour of technique. Scale marks: high, low; normal, death. Safety integrity levels: 1 very high, 2 high, 3 medium, 4 low, 5 normal. Rigour of technique: 1 formal notation, 4 pseudo code, 5 ISO 9000. Annotation: the hazard and its frequency determine the risk level of a system. Where the safety integrity requirement is very high, then formal methods must be used in software development.]

The proposed standard borrows its safety critical life cycle from IEC; this is outlined in Figure 3-3. The designation of safety related software is established through hazard analyses. The requirements specification is made up of functional requirements and safety requirements. The latter include measures to avoid faults and measures to control faults. The design and implementation processes make frequent consistency checks, verifying each stage against itself and against the requirements specification. Testing and safety validation are performed before the software is deemed satisfactory and put into operation.

Defence Standard 00-55

Def Stan 00-55 requires a formal method to meet several criteria before it is considered suitable for safety critical applications:

• it should be recognised as a formal notation, capable of expressing specifications and designs mathematically, either by itself or with another formal method

• the design steps should be verifiable, so should have a proof of theory and guidance on how the theory can best be exploited

• it should be in the public domain, with accessible courses, textbooks, case studies and industrial tools

• it should be standardised and preferably be a standard.

Examples given of formal methods for reasoning about sequential properties are VDM and Z; the example given for concurrent and communicating properties is LOTOS.

The 00-55 standard discusses validation and checking the formal specification by:

• mechanical checks for consistency

• proof checkers and editors

• symbolic execution tools

• animation for validity, completeness, consistency, reachability and redundancy

Another way is to produce a (verified) executable prototype:

• from an executable subset of the formal method, and gives objex for OBJ as an example

• by translation into a logic programming language, e.g. Prolog

• by translation into a language with strong data typing, e.g. Pascal.

3.2.4.2 Safety Integrity Level

Very early in the development of a system, the software engineer must assess the hazards associated with the system and the likelihood of their occurring. Significant hazards are: (1) loss of human life, (2) injury to, or illness of, persons, (3) environmental pollution, and (4) loss of, or damage to, property. Likelihood is the frequency of occurrence. Where the combination of the hazard and its frequency, the risk level, is too high, the system should be treated as ‘safety related’ or even ‘safety critical’, and more stringent levels of dependability must be applied.

The major difference between a system designed for safety and one which is not is that all other (functional and non-functional) characteristics of the system are subordinate to safety, and that every effort must be made to avoid the hazard happening and to minimise its effects. This means that it can be better to ‘fail safe’, i.e. stop, when a fault occurs than to recover and continue with reliable service.

CATEGORY      DEFINITION
Catastrophic  Multiple deaths
Critical      Single death/multiple severe injuries
Marginal      Single severe injury/multiple minor injuries
Negligible    At most a single minor injury

Table 3-1 Accident severity categories

PROBABILITY  OCCURRENCE during operational life
Frequent     Likely to be continually experienced
Probable     Likely to occur often
Occasional   Likely to occur several times
Remote       Likely to occur some time
Improbable   Unlikely, but may exceptionally occur
Incredible   Extremely unlikely that the event will occur at all

Table 3-2 Probability ranges

            CATASTROPHIC  CRITICAL  MARGINAL  NEGLIGIBLE
Frequent    A             A         A         B
Probable    A             A         B         C
Occasional  A             B         C         C
Remote      B             C         C         D
Improbable  C             C         D         D
Incredible  D             D         D         D

Table 3-3 Risk classification of classes

RISK CLASS  INTERPRETATION
A           Intolerable
B           Undesirable, only accepted when risk reduction is impracticable
C           Tolerable with the endorsement of the project safety review committee
D           Tolerable with the endorsement of the normal project reviews
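
The classification tables above amount to a two-key lookup: an accident severity category and a probability range select a risk class, which in turn has an interpretation. The following is a minimal sketch of that lookup in Python. It is not part of either standard; the names SEVERITIES, RISK_MATRIX, INTERPRETATION and risk_class are illustrative assumptions, and the data is transcribed from the tables in this section.

```python
# Illustrative encoding of the severity, probability and risk class tables.
# All identifiers here are assumptions made for this sketch.

# Column order of the risk classification matrix.
SEVERITIES = ("catastrophic", "critical", "marginal", "negligible")

# One row per probability range; each character is the risk class
# for the severity in the corresponding column.
RISK_MATRIX = {
    "frequent":   "AAAB",
    "probable":   "AABC",
    "occasional": "ABCC",
    "remote":     "BCCD",
    "improbable": "CCDD",
    "incredible": "DDDD",
}

# Interpretation of each risk class.
INTERPRETATION = {
    "A": "Intolerable",
    "B": "Undesirable, only accepted when risk reduction is impracticable",
    "C": "Tolerable with the endorsement of the project safety review committee",
    "D": "Tolerable with the endorsement of the normal project reviews",
}


def risk_class(severity: str, probability: str) -> str:
    """Return the risk class letter for a severity/probability pair."""
    column = SEVERITIES.index(severity.lower())  # ValueError if unknown
    return RISK_MATRIX[probability.lower()][column]  # KeyError if unknown


if __name__ == "__main__":
    letter = risk_class("critical", "improbable")
    print(letter, "-", INTERPRETATION[letter])
```

For example, a "critical" severity combined with an "improbable" probability selects class C. Unknown category names raise an exception rather than defaulting to a class, which seems the safer choice for a sketch of this kind.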

Standards on safety in computer controlled systems recommend methods to determine the level of risk associated with the system. Once established, risk levels indicate the ‘overall system integrity level’, and recommend the minimum level of rigour of technique to adopt when developing the software. Figure 3-4 shows that software written for a system with a very high level of risk should be developed using formal methods, but [ISO 9000] must be followed when developing software for systems at all risk levels.

3.2.4.3 Hazard Analysis

It would seem hypocritical to advocate the adoption of standards without making at least partial use of them. For this reason a cursory assessment of safety integrity is made here. Def Stan [00-56 1991] is used because it specifies hazard analysis and safety classification and is currently available, albeit in interim form.

It is assumed that an industrial FMC will not be as fully guarded as the test FMC. The cell robot is considered the most dangerous hazard, because it could ‘launch’ a 5 kg mass with a linear velocity of 3 m/s. The estimated consequences are “a single death and/or multiple severe injuries”; however, this would be “unlikely, but may exceptionally occur”. With reference to the accident severity categories (Table 3-1) and the probability ranges (Table 3-2), a risk class can be determined (Table 3-3). A “critical” accident severity with an “improbable” probability of occurrence leads to accident risk class “C”, which means “tolerable with the endorsement of the project safety review committee” (Table 3-4). The safety integrity level for the first safety feature must be S4 (Table 3-5a), because the accident severity is critical, and subsequent safety features must be S2 (Table 3-5b).

Safety features are devices or methods that reduce risk, either by reducing the probability of occurrence or by reducing the accident severity, and thus improve the safety integrity. Formal methods can be used as a safety feature in specification, design and verification, and are deemed necessary for safety critical software, as specified in Def Stan 00-55. Table 3-6 permits the claim of a ‘remote’ failure rate for integrity level S4 and ‘probable’ for S2. The minimum failure rate relates only to systematic failure. The hazard analysis for the FMC determines it to be a safety critical system, which requires the use of formal methods.

In document A Petri net occam based methodology for the development of dependable distributed control software (Page 47-51)