
Triple Modular Redundancy Pattern (TMR)


7.5. Triple Modular Redundancy Pattern (TMR)

Other Names:

2-oo-3 Redundancy Pattern, Homogeneous Triplex Pattern.

Type:

Hardware Pattern.

Abstract:

This pattern is a variation of homogeneous hot redundancy that consists of three identical modules operating in parallel to detect random faults, in order to enhance reliability and safety in a system with no fail-safe state. The modules produce three results that are compared by a voting system, which produces a common result as long as two or more channels have the same result. This structure allows the system to operate and to provide functionality in the presence of a random fault without losing the input data [37].

Context:

Developing an embedded system with no fail-safe state in a situation that includes a high random failure rate and no limitation on redundancy, with the purpose of improving the safety and reliability of the system.

Problem:

How to deal with random faults and single points of failure in order to increase the safety and reliability of the system without losing the input data in the presence of faults.

Pattern Structure:

As shown in Fig. 7.5, this pattern contains three identical modules or channels that operate in parallel. This structure is used to prevent the failure of a single component from leading to a complete system failure [97]. If a single fault occurs in one channel, the other two channels continue to work correctly and produce the correct actuation control signals.

The voter plays a central role in this pattern: it applies the voting policy and takes the majority of the results, which is treated as the correct result. This pattern does not identify the type or the cause of the fault; it only determines which module contains a fault, without correcting the fault itself. The function of each unit is as follows:

• Module (Channel): (see the general pattern in Section 6.7).

[Figure: three parallel channels, Module (Channel) 1 to 3, each consisting of Input Sensor(s), Data Acquisition System (Input Processing), Data Processing (Transformations), and Output Processing. The three channel outputs feed a Voter, which produces the Actuator Control Signal(s) for the Actuator(s).]

Figure 7.5.: Triple Modular Redundancy Pattern

• Input Sensor(s): (see the general pattern in Section 6.7). It is possible to use one sensor for the three modules, but in this case the failure of this sensor would affect inputs to all of the modules. Another choice is to use three separate sensors in order to remove the possible single point of failure in this sensor.

• Actuator(s): (see the general pattern in Section 6.7).

• Data Acquisition (Input Processing): (see the general pattern in Section 6.7).

• Data Processing (Transformation): (see the general pattern in Section 6.7).

• Output Processing: (see the general pattern in Section 6.7).

• Voter: The voter receives three outputs from the output processing units of the modules and implements a voting policy to find the majority output. If at least two modules give the same output, the voter takes this result and discards the output from the deviating channel. Like the sensor, the voter can itself be a single point of failure. One option is therefore to use three separate voters instead of a single one. Another is to design this unit with particular care so that the voter is simple and reliable.
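The voting policy described for the voter can be sketched in software as follows. This is only an illustrative sketch: the pattern describes a hardware voter, and the function name `vote_2oo3`, the zero-based channel indices, and the choice to raise an error when all three outputs differ are assumptions made for this example, not part of the pattern description.

```python
from typing import Optional, Tuple, TypeVar

T = TypeVar("T")


def vote_2oo3(a: T, b: T, c: T) -> Tuple[T, Optional[int]]:
    """Return (majority_value, index_of_deviating_channel or None).

    Raises ValueError when all three outputs differ, i.e. no majority exists.
    (What the voter should do in that case is an assumption of this sketch.)
    """
    if a == b:
        # Channels 0 and 1 agree; channel 2 deviates only if it differs.
        return a, (None if b == c else 2)
    if a == c:
        return a, 1  # channel 1 is outvoted
    if b == c:
        return b, 0  # channel 0 is outvoted
    raise ValueError("no two channels agree; the voter cannot decide")


if __name__ == "__main__":
    print(vote_2oo3(42, 42, 42))  # all channels agree
    print(vote_2oo3(42, 17, 42))  # channel 1 deviates and is discarded
```

The voter only reports which channel deviates. As stated above, it neither diagnoses the cause of the fault nor corrects it.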

Implication:

This section shows the implications of this pattern relative to the basic system.

• Reliability:

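The system continues to work correctly as long as two or more channels have no fault. With R denoting the reliability of a single channel, either all three channels work (R^3) or exactly two of them work (3R^2(1−R)):

\[ R_{new} = R_{voter}\,\bigl(R^3 + 3R^2(1 - R)\bigr) = R_{voter}\,(3R^2 - 2R^3) \qquad (7.15) \]

Assume that the voter is a simple, carefully designed component with high reliability ($R_{voter} \approx 1$):

\[ R_{new} \approx 3R^2 - 2R^3 \qquad (7.16) \]

⇒ The percentage relative improvement in reliability is

\[ RRI = \frac{R_{new} - R_{old}}{1 - R_{old}} \times 100\% \]

\[ RRI = \frac{3R^2 - 2R^3 - R}{1 - R} \times 100\% \]

\[ RRI = (2R^2 - R) \times 100\% \qquad (7.17) \]

• Safety: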

The TMR pattern includes two design techniques: test by redundant hardware, and fault detection and diagnosis (voter). According to the hardware and software requirements in the standards IEC 61508-2 and IEC 61508-3 [46], the recommendations for these techniques are shown in Tab. 7.6.

Table 7.6.: Recommendations for safety integrity levels.

Techniques                               SIL1   SIL2   SIL3   SIL4
Test by redundant hardware               R      R      R      R
Fault detection and diagnosis (Voter)    –      R      HR     HR

Based on Tab. 7.6, the average recommendations of this pattern for the different safety integrity levels are shown in Tab. 7.7.

Table 7.7.: Recommendations of TMR Pattern for safety integrity levels.

Pattern                                  SIL1   SIL2   SIL3   SIL4
Triple Modular Redundancy Pattern        WR     R      MR     MR

To compute RSI:

\[ P_{UF(old)} = 1 - R \qquad (7.18) \]

The triple modular redundancy will continue to work correctly as long as two or more channels have no fault. Thus, the probability that the system will give an erroneous result that leads to an unsafe failure is

\[ P_{UF(new)} = 1 - R_{new} = 1 - (3R^2 - 2R^3) \qquad (7.19) \]


⇒ The percentage relative safety improvement is

\[ RSI = \frac{P_{UF(new)} - P_{UF(old)}}{0 - P_{UF(old)}} \times 100\% \]

\[ RSI = \left(1 - \frac{1 - (3R^2 - 2R^3)}{1 - R}\right) \times 100\% \]

\[ RSI = (2R^2 - R) \times 100\% \qquad (7.20) \]

As in the HmD and HtD patterns, it is clear from Equation (7.20) and Equation (7.17) that the relative safety improvement is equal to the relative reliability improvement.
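The reliability and safety formulas above can be checked numerically. The following sketch evaluates RRI and RSI directly from their definitions and compares them with the closed form (2R^2 − R) × 100%. The function names and the sample values of R are assumptions chosen for illustration, and the voter is taken as ideal (R_voter = 1) unless stated otherwise.

```python
def tmr_reliability(r: float, r_voter: float = 1.0) -> float:
    """Eq. (7.15): R_new = R_voter * (3R^2 - 2R^3)."""
    return r_voter * (3 * r**2 - 2 * r**3)


def relative_reliability_improvement(r: float) -> float:
    """RRI in percent, from its definition, with R_old = R and R_voter = 1."""
    r_new = tmr_reliability(r)
    return (r_new - r) / (1 - r) * 100.0


def relative_safety_improvement(r: float) -> float:
    """RSI in percent, from its definition, with P_UF = 1 - reliability."""
    puf_old = 1 - r
    puf_new = 1 - tmr_reliability(r)
    return (puf_new - puf_old) / (0 - puf_old) * 100.0


if __name__ == "__main__":
    for r in (0.6, 0.9, 0.99):  # sample channel reliabilities (assumed)
        closed_form = (2 * r**2 - r) * 100.0  # Eq. (7.17) and Eq. (7.20)
        print(r, tmr_reliability(r),
              relative_reliability_improvement(r),
              relative_safety_improvement(r), closed_form)
```

For example, R = 0.9 gives R_new = 0.972 and a relative improvement of 72%. Since 2R^2 − R is positive only for R > 0.5, the formulas indicate that TMR improves on the basic system only when each channel is more likely to work than to fail.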

• Cost:

The cost of this pattern can be classified into two parts:

– Recurring Cost:

∗ This pattern has a high recurring cost because it uses three parallel modules; the recurring cost is 300% of that of the basic system.

∗ In addition, there is the cost of the voter, which is normally a simple hardware circuit whose design depends on the type of the output control signal and on the implementation method.

– Development Cost: The three modules (channels) are identical and use the same algorithm and the same software. Therefore, the development cost of this pattern is similar to that of the basic system.

• Modifiability:

This pattern does not change the level of modifiability of the basic system, since modifying the functionality of a system with TMR requires the same effort as modifying a single channel.

It is also possible to modify this pattern to M-oo-N redundancy by increasing the number of channels in order to increase the reliability of the system. This is an easy step that only requires modifying the voter so that its voting policy takes the new channels into account.

• Impact on Execution Time:

This pattern has little influence on the execution time compared to the basic system, since the three modules run separately in parallel. The only timing effect of this pattern is the small delay of the voter hardware, which adds to the response time from reading the input signal to generating the actuator control signals.
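The M-oo-N extension mentioned under Modifiability can be sketched as a generalized voter together with the corresponding reliability expression. This is a sketch under stated assumptions: the function names are hypothetical, the channels are assumed to fail independently with the same reliability R, the voter is assumed ideal, and M is assumed to be larger than N/2 so that at most one value can reach M votes.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def vote_m_oo_n(outputs: Sequence[Hashable], m: int) -> Hashable:
    """Return the value reported by at least m of the channels.

    Assumes m > len(outputs) / 2, so the winning value is unique.
    Raises ValueError if no value reaches m votes.
    """
    value, votes = Counter(outputs).most_common(1)[0]
    if votes < m:
        raise ValueError(f"no output reached {m} of {len(outputs)} votes")
    return value


def m_oo_n_reliability(r: float, m: int, n: int) -> float:
    """Probability that at least m of n independent channels work."""
    return sum(comb(n, k) * r**k * (1 - r) ** (n - k)
               for k in range(m, n + 1))


if __name__ == "__main__":
    print(vote_m_oo_n([7, 7, 3, 7, 9], m=3))   # 3-oo-5 voting
    print(m_oo_n_reliability(0.9, m=2, n=3))   # reduces to 3R^2 - 2R^3
```

With M = 2 and N = 3 the reliability sum reduces to 3R^2 − 2R^3, in agreement with Equation (7.16).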

Implementation:

• To implement this pattern, the designer should replicate the channel, which includes replicating the hardware as well as the software.

• With respect to the sensor, there are two options: either to use a single common sensor for the three channels, which may become a single point of failure, or to use three separate sensors, one for each channel [57], which will increase the recurring cost.

• If the outputs from the three channels are binary and identical without any deviation, then the voting can be implemented as a bit-by-bit comparator that checks for a common output between two or more channels.

• If the three modules consist of different hardware components, then there might be a divergence between the correct outputs [97]. So, the designer should determine a tolerance value δ, such that two outputs A and B will be considered identical as long as |A − B| < δ.
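The two voting variants described in the last two items can be sketched as follows. The names `bitwise_majority` and `vote_with_tolerance` are hypothetical. The bitwise majority function is one common way to realize a bit-by-bit voter and is assumed here rather than taken from the source. Likewise, returning the mean of the agreeing pair is an assumption of this sketch, since the text only defines when two outputs count as identical.

```python
from itertools import combinations
from typing import Tuple


def bitwise_majority(a: int, b: int, c: int) -> int:
    """Each output bit is set when at least two of the three input bits are set."""
    return (a & b) | (a & c) | (b & c)


def vote_with_tolerance(outputs: Tuple[float, float, float],
                        delta: float) -> float:
    """Return the mean of the first pair of outputs with |A - B| < delta.

    Pairs are checked in the order (1,2), (1,3), (2,3).
    Averaging the agreeing pair is an assumption of this sketch.
    Raises ValueError if no pair agrees within delta.
    """
    for x, y in combinations(outputs, 2):
        if abs(x - y) < delta:
            return (x + y) / 2
    raise ValueError("no two channels agree within the tolerance")


if __name__ == "__main__":
    # The third channel has two flipped bits; the majority masks them.
    print(bin(bitwise_majority(0b1010, 0b1010, 0b0110)))
    # The third channel deviates by more than delta and is ignored.
    print(vote_with_tolerance((10.0, 10.05, 12.0), delta=0.1))
```

Note that a bitwise majority always produces some output, even when all three words differ, so a design that must detect the "no majority" case would need an additional word-level comparison.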

Consequences and Side Effects:

• The main drawback of this pattern is that it is not appropriate for handling systematic faults. Because the three channels are identical, they share the same possible faults, and in that case the system will continue to work while producing invalid data.

• To deal with a single point of failure in the input sensor, it is possible to use three separate sensors. This option might lead to problems, especially for different sensors with different response speeds [57].

Related Patterns:

• This pattern does not perform a check on the input data or on the actuators. So, it is possible to combine this pattern with the Single Protected Channel Pattern in order to deal with transient faults.

• In order to deal with systematic faults, this pattern can be combined with the heterogeneous design pattern to design three diverse modules that perform the same functionality. This option will increase the development cost to 300%.