RELIABILITY IMPROVEMENT THROUGH ROBUST DESIGN
5.12 ROBUST RELIABILITY DESIGN OF DIAGNOSTIC SYSTEMS
In this section we describe the development of the robust reliability design method for diagnostic systems, whose functionality differs from that of common hardware systems in that their signal and response are binary. In particular, we define and measure the reliability and robustness of such systems, evaluate the noise effects, and prioritize the noise factors.
FIGURE 5.24 (a) Main effect plots of η̂ for A, B, C, and D at levels 0 and 1; (b) interaction plot for D × B; (c) interaction plot for D × C
The steps for robust reliability design are described in detail. An automotive example is given to illustrate the method.
5.12.1 Problem Statement
Diagnostic systems are software-based built-in-test systems that detect, isolate, and indicate failures of the prime systems, where prime systems refers to the hardware systems monitored by the diagnostic systems. Use of diagnostic systems reduces the loss due to failure of the prime systems and facilitates subsequent repairs. Because of these benefits, diagnostic systems have found extensive applications in industry, especially where failure of the prime systems results in critical consequences. For instance, on-board diagnostic (OBD) systems are integrated into automobiles to monitor components and systems whose failure would cause emission concerns. When such a component or system fails, the OBD system detects the failure, illuminates a “Service Engine Soon” light on the instrument panel cluster to alert the driver to the need for repair, and stores the diagnostic trouble codes related to the failure to aid failure isolation.
In modern software-intensive diagnostic systems, algorithms are coded to perform operations for diagnosis. Ideally, the diagnosis should indicate the true state (failure or success) of the prime systems. However, if not designed adequately, the algorithms are sensitive to noise sources and thus cause diagnostic systems to commit the following two types of errors:
• Type I error (α error). This error, denoted by α, is measured by the probability that the diagnostic system detects a failure given that one did not occur.
• Type II error (β error). This error, denoted by β, is measured by the probability that the diagnostic system fails to detect a failure given that one did occur.
Because of α error, diagnostic systems may generate failure indications on surviving prime systems. Thus, α error results in unnecessary repairs to products. Manufacturers have an intense interest in eliminating or minimizing this type of error because unnecessary repairs incur substantial warranty expenses. On the other hand, diagnostic systems may not generate failure indications on failed prime systems because of β error. As a result, β error causes potential losses to customers, so manufacturers are also responsible for reducing β error. In the automotive industry, a large β error of an OBD system can trigger vehicle recalls, usually issued by a government agency. Therefore, it is imperative that both α and β errors be minimized over a wide range of noise factors. A powerful technique to accomplish this objective is robust reliability design.
5.12.2 Definition and Metrics of Reliability and Robustness
A prime system usually has a binary state: success or failure. The intended function of a diagnostic system is to diagnose the states correctly over time. That is, a diagnostic system should indicate a failure of the prime system when it occurs, and not indicate a failure if it does not occur. Thus, the reliability and robustness of a diagnostic system can be defined as follows.
• Reliability of a diagnostic system is defined as the probability of the system detecting the true states of the prime system under specified conditions for a specified period of time.
• Robustness of a diagnostic system is the capability of the system to detect the true states of the prime system consistently in the presence of noise sources.
Robustness can be measured by α and β. These two types of errors are correlated negatively; that is, α increases as β decreases, and vice versa. Therefore, it is frequently difficult to judge the performance of a diagnostic system using α and β only. Reliability is a more reasonable and comprehensive metric to measure performance.
G. Yang and Zaghati (2003) employ the total probability law and give the reliability of a diagnostic system as
R(t) = (1 − α) − (β − α)M(t),    (5.47)
where R(t) is the reliability of the diagnostic system and M(t) is the failure probability of the prime system. Equation (5.47) indicates that:
• If the prime system is 100% reliable [i.e., M(t) = 0], the reliability of the diagnostic system becomes 1 − α. This implies that the unreliability is due to false detection only.
• If the prime system fails [i.e., M(t) = 1], the reliability of the diagnostic system becomes 1 − β. This implies that the unreliability is due only to the inability of the system to detect failures.
• If α = β, the reliability becomes 1 − α (equivalently, 1 − β). This implies that M(t) has no influence on the reliability.
• R(t) lies in the interval 1 − β ≤ R(t) ≤ 1 − α if β > α (which holds in most applications).
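The properties listed above can be checked numerically. The following is a minimal Python sketch, not code from the source: the function names and the values of α and β are illustrative assumptions. It evaluates (5.47) and compares it with the total-probability form, in which a correct diagnosis is "no indication" when the prime system works and "indication" when it has failed.

```python
def diagnostic_reliability(alpha: float, beta: float, m: float) -> float:
    """Reliability of a diagnostic system per (5.47).

    alpha: type I error probability, beta: type II error probability,
    m: failure probability M(t) of the prime system.
    """
    for name, value in (("alpha", alpha), ("beta", beta), ("m", m)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be a probability in [0, 1], got {value}")
    return (1.0 - alpha) - (beta - alpha) * m


def reliability_total_probability(alpha: float, beta: float, m: float) -> float:
    """Same quantity via the total probability law:
    P(correct | working) * P(working) + P(correct | failed) * P(failed)."""
    return (1.0 - alpha) * (1.0 - m) + (1.0 - beta) * m


if __name__ == "__main__":
    alpha, beta = 0.02, 0.10  # illustrative values, not taken from the text
    for m in (0.0, 0.1, 0.5, 1.0):
        r = diagnostic_reliability(alpha, beta, m)
        # With beta > alpha, R(t) stays between 1 - beta and 1 - alpha.
        assert 1.0 - beta - 1e-12 <= r <= 1.0 - alpha + 1e-12
        print(f"M(t) = {m:.1f}  R(t) = {r:.4f}")
```

At M(t) = 0 the sketch returns 1 − α and at M(t) = 1 it returns 1 − β, matching the first two properties.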
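Taking the derivatives of (5.47) gives
$$\frac{\partial R(t)}{\partial \alpha} = M(t) - 1, \qquad \frac{\partial R(t)}{\partial \beta} = -M(t), \qquad \frac{\partial R(t)}{\partial M(t)} = -(\beta - \alpha). \tag{5.48}$$
Because the derivatives are negative (the last one whenever β > α), R(t) decreases as α, β, or M(t) increases. In most applications, M(t) is smaller than 0.5, so 1 − M(t) > M(t). Hence, |∂R(t)/∂α| > |∂R(t)/∂β|. This indicates that R(t) is influenced more by α than by β.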
Since reliability is considered as the quality characteristic, the signal-to-noise ratio for a run is computed from (5.36) as
$$\hat{\eta} = -10 \log\left[\frac{1}{l} \sum_{j=1}^{l} \left(\frac{1}{R_j} - 1\right)^{2}\right],$$
where l is the number of columns in an outer array and Rj is the reliability at the jth noise level.

TABLE 5.17 Grouped and Prioritized Noise Factors^a

Noise    Variable affected
Type     α     β     M(t)    Sensitivity          Priority
1        ×     ×     ×       −(1 + β − α)         1
2        ×     ×     -       −1                   2
3        ×     -     ×       −(1 + β − α − M)     3
4        -     ×     ×       −(M + β − α)         5
5        ×     -     -       −(1 − M)             4
6        -     ×     -       −M                   6
7        -     -     ×       −(β − α)             7
a ×, affected; -, not affected.
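The signal-to-noise ratio for a single run can be sketched in Python as follows. This is an illustration only: the function name is hypothetical, and a base-10 logarithm is assumed, as is usual for signal-to-noise ratios expressed in decibels. The quantity 1/Rj − 1 is treated as the characteristic to be minimized, so a run with uniformly high reliability receives a high ratio.

```python
import math


def snr_reliability(reliabilities):
    """Signal-to-noise ratio of one run from the reliabilities R_j at l noise levels.

    Computes -10 * log10( (1/l) * sum_j (1/R_j - 1)^2 ).
    The base-10 logarithm is an assumption of this sketch.
    """
    values = list(reliabilities)
    if not values:
        raise ValueError("at least one reliability value is required")
    if any(not 0.0 < r <= 1.0 for r in values):
        raise ValueError("each reliability must lie in (0, 1]")
    mean_square = sum((1.0 / r - 1.0) ** 2 for r in values) / len(values)
    if mean_square == 0.0:
        # Perfect reliability at every noise level: the ratio is unbounded.
        return math.inf
    return -10.0 * math.log10(mean_square)


if __name__ == "__main__":
    # Hypothetical reliabilities at three noise levels for two design settings.
    print(snr_reliability([0.99, 0.98, 0.97]))
    print(snr_reliability([0.95, 0.90, 0.80]))
```

The first hypothetical setting has higher reliability at every noise level and therefore the larger signal-to-noise ratio.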
5.12.3 Noise Effects Assessment
As discussed in Section 5.6, there are three types of noise factors: external noise, internal noise and unit-to-unit noise. Some of these noise factors disturb the diagnostic systems directly and increase α and β errors. Meanwhile, others may jeopardize the function of the prime systems and deteriorate their reliability. In general, a noise factor may affect one or more of the variables α, β and M(t).
Depending on what variables are disturbed, the noise factors can be categorized into seven types, as shown in Table 5.17. The noise factors in different types have unequal influences on the reliability of the diagnostic systems. The significance of a noise factor can be evaluated by the sensitivity of reliability to the noise factor.
The sensitivity is obtained from (5.48) by summing the partial derivatives of R(t) with respect to the variables that the noise type affects, and is summarized in Table 5.17. The table also lists the priority of the seven types of noise factors ordered by the sensitivity, assuming that M(t) > β > α. Because it is impossible to include all noise factors in an experiment, only the noise factors in high-priority groups should be considered.
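The grouping and ranking of noise types can be sketched in Python as follows. The mapping from noise type to affected variables is inferred here from the sensitivity expressions in Table 5.17, and the numerical values of α, β, and M(t) are illustrative assumptions chosen to satisfy M(t) > β > α; neither the function names nor the values come from the source.

```python
def partial_derivatives(alpha, beta, m):
    """Partial derivatives of R(t) from (5.48)."""
    return {"alpha": m - 1.0, "beta": -m, "M": -(beta - alpha)}


# Variables disturbed by each noise type (inferred from Table 5.17).
NOISE_TYPES = {
    1: ("alpha", "beta", "M"),
    2: ("alpha", "beta"),
    3: ("alpha", "M"),
    4: ("beta", "M"),
    5: ("alpha",),
    6: ("beta",),
    7: ("M",),
}


def prioritize(alpha, beta, m):
    """Return (sensitivity per type, types ordered from most to least influential)."""
    d = partial_derivatives(alpha, beta, m)
    sensitivity = {
        noise_type: sum(d[v] for v in affected)
        for noise_type, affected in NOISE_TYPES.items()
    }
    # The most negative sensitivity degrades reliability fastest, so it ranks first.
    ranked = sorted(sensitivity, key=lambda noise_type: sensitivity[noise_type])
    return sensitivity, ranked


if __name__ == "__main__":
    sens, ranked = prioritize(alpha=0.1, beta=0.2, m=0.3)  # illustrative values
    for priority, noise_type in enumerate(ranked, start=1):
        print(priority, noise_type, round(sens[noise_type], 3))
```

For these illustrative values the ranking agrees with the priority column of Table 5.17; other values that satisfy M(t) > β > α may swap types 4 and 5, whose order also depends on how large M(t) is.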
5.12.4 Experimental Layout
Signals from prime systems to diagnostic systems have a binary state: success or failure. Diagnostic systems should be robust against the states and noise factors.
In robust design the signals and noise factors go to an outer array, with the design parameters placed in an inner array. A generic experimental layout for the robust design is shown in Table 5.18. In this table, M1 = 0 indicates that the prime system is functioning and M2 = 1 indicates that the prime system has failed; αij and βij (i = 1, 2, . . . , N; j = 1, 2, . . . , l) are the values of α and β at the cross combination of row i and column j.
TABLE 5.18 Generic Experimental Layout for Diagnostic Systems

        Design Parameter       M1 = 0                       M2 = 1
Run     A  B  C  . . .         z1     z2     . . .  zl      z1     z2     . . .  zl
1                              α11    α12    . . .  α1l     β11    β12    . . .  β1l
2                              α21    α22    . . .  α2l     β21    β22    . . .  β2l
3       Orthogonal array       α31    α32    . . .  α3l     β31    β32    . . .  β3l
...                            ...                          ...
N                              αN1    αN2    . . .  αNl     βN1    βN2    . . .  βNl
Experiments are conducted according to the layout. The layout requires a diagnostic system with the same setting of design parameters to monitor both functioning and failed prime systems at various noise levels. For example, in the first run, a diagnostic system with the first set of parameters is built to diagnose the functioning prime system working at each of the l noise levels. Then the same diagnostic system is used to monitor the failed prime system at each noise level. During experimentation, record the number of failure occurrences detected by the diagnostic system while running at M1 = 0, and the number of failure occurrences that are not detected by the diagnostic system while running at M2 = 1. By definition, αij is estimated by the number of failure occurrences detected, divided by the total number of replicates given M1 = 0; the estimate of βij is the number of undetected failure occurrences divided by the total number of replicates given M2 = 1.
5.12.5 Experimental Data Analysis
At the time of interest τ (e.g., the warranty period or design life), the reliability of the diagnostic system at the cross combination of row i and column j is calculated from (5.47) as
Rij(τ) = (1 − αij) − (βij − αij)M(τ).
Estimates of reliability are used to compute the signal-to-noise ratio using (5.36).
Table 5.19 summarizes the estimates of reliability and signal-to-noise ratio.
Once the estimates of the signal-to-noise ratio are calculated, ANOVA or graphical response analysis should be performed to identify the significant factors.
Optimal levels of these factors are chosen to maximize the signal-to-noise ratio.
Finally, the optimality of the setting of design parameters selected should be verified through a confirmation test.
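A graphical response (main effect) analysis of the signal-to-noise ratios can be sketched in Python as follows. Everything specific here is hypothetical: a two-level L4 orthogonal array with three design parameters is assumed for the inner array, and the four signal-to-noise ratios are made-up numbers used only to show the mechanics. The sketch averages the ratio at each level of each factor and picks the level with the larger average.

```python
from statistics import mean

# L4 orthogonal array: each row gives the level (0 or 1) of factors A, B, C.
L4_ARRAY = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
FACTORS = ("A", "B", "C")


def main_effects(array, etas, factors=FACTORS):
    """Average signal-to-noise ratio at each level of each factor."""
    effects = {}
    for col, name in enumerate(factors):
        levels = sorted({row[col] for row in array})
        effects[name] = {
            level: mean(eta for row, eta in zip(array, etas) if row[col] == level)
            for level in levels
        }
    return effects


def best_levels(effects):
    """Level of each factor that maximizes the average signal-to-noise ratio."""
    return {name: max(by_level, key=by_level.get) for name, by_level in effects.items()}


if __name__ == "__main__":
    etas = [22.0, 25.0, 21.0, 28.0]  # hypothetical ratios for runs 1 to 4
    effects = main_effects(L4_ARRAY, etas)
    print(effects)
    print(best_levels(effects))
```

Whether a difference between level averages is significant still has to be judged by ANOVA, and the selected setting confirmed by test, as described above.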
5.12.6 Application Example
This example shows how α, β, reliability, and signal-to-noise ratio are calculated from automobile test data. The steps for robust design are standard and thus are not given in this example.
TABLE 5.19 Estimates of Reliability and Signal-to-Noise Ratio for Diagnostic Systems

Run     z1      z2      · · ·   zl      η̂
1       R̂11     R̂12     · · ·   R̂1l     η̂1
2       R̂21     R̂22     · · ·   R̂2l     η̂2
3       R̂31     R̂32     · · ·   R̂3l     η̂3
...     ...     ...     ...     ...     ...
N       R̂N1     R̂N2     · · ·   R̂Nl     η̂N
Test Method A sport utility vehicle equipped with an on-board diagnostic monitor at its current setting of design parameters was tested to evaluate the robustness of the monitor. Load and engine speed [revolutions per minute (RPM)] are the key noise factors disturbing the monitor. The combinations of load and RPM are grouped into seven noise levels; at each level both the load and RPM vary over an interval because of the difficulty in controlling the noise factors at fixed levels. Table 5.20 shows the noise levels. The vehicle was driven at different combinations of load and RPM. The prime system (component) being monitored is expected to have 10% failure probability at the end of design life (τ = 10 years). Thus, failures at 10% probability were injected into the monitored component during the test trips. The test recorded the number of failures undetected when failures were injected (M2 = 1), and the number of failures detected when no failures were injected (M1 = 0).
Test Data At each noise level, the number of injected failures, the number of injected failures undetected, the number of successful operations, and the number of failures detected from the successful operations, denoted I1, I2, S1, and S2, respectively, are shown in Table 5.20. The test data are coded to protect the proprietary information.
Data Analysis The estimates of α and β equal the values of S2/S1 and I2/I1, respectively. The reliability of the monitor at 10 years at each noise level is calculated from (5.47) with the α and β estimates and M(τ) = 0.1. Then the signal-to-noise ratio of the monitor is computed from (5.36). Table 5.21 summarizes the estimates of α, β, reliability, and signal-to-noise ratio.

TABLE 5.20 Noise Levels and Coded Test Data

Noise Level    Load          RPM (×1000)    S2/S1         I2/I1
z1             (0.0, 0.3)    (0.0, 1.6)     1/3200        0/400
z2             (0.0, 0.3)    [1.6, 3.2)     100/10,400    110/1200
z3             [0.3, 0.6)    [1.6, 3.2)     30/7500       40/800
z4             [0.6, 0.9)    [1.6, 3.2)     30/3700       100/400
z5             (0.0, 0.3)    [3.2, 4.8)     20/600        20/80
z6             [0.3, 0.6)    [3.2, 4.8)     30/4800       300/600
z7             [0.6, 0.9)    [3.2, 4.8)     160/7800      800/900

TABLE 5.21 Estimates of α, β, Reliability, and Signal-to-Noise Ratio

         z1        z2       z3        z4        z5        z6        z7
α̂        0.0003    0.01     0.004     0.008     0.033     0.006     0.02
β̂        0         0.09     0.05      0.25      0.25      0.5       0.89
R̂(τ)     0.9997    0.982    0.9914    0.9678    0.9453    0.9446    0.893
η̂        34.83
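The estimation of α, β, and reliability from the coded counts can be sketched in Python as follows. The counts are those of Table 5.20 and M(τ) = 0.1 is taken from the text; the variable and function names are hypothetical. The sketch uses unrounded ratios, so its reliabilities can differ from Table 5.21 in the fourth decimal place, where the tabulated values appear to be based on rounded estimates of α and β.

```python
# Coded counts from Table 5.20: (S2, S1, I2, I1) for each noise level.
TABLE_5_20 = {
    "z1": (1, 3200, 0, 400),
    "z2": (100, 10400, 110, 1200),
    "z3": (30, 7500, 40, 800),
    "z4": (30, 3700, 100, 400),
    "z5": (20, 600, 20, 80),
    "z6": (30, 4800, 300, 600),
    "z7": (160, 7800, 800, 900),
}

M_TAU = 0.1  # failure probability of the prime system at design life


def estimate_level(s2, s1, i2, i1, m=M_TAU):
    """Return (alpha_hat, beta_hat, R_hat) for one noise level.

    alpha_hat = S2/S1: failures indicated per successful operation.
    beta_hat  = I2/I1: injected failures left undetected per injected failure.
    R_hat follows (5.47).
    """
    alpha_hat = s2 / s1
    beta_hat = i2 / i1
    r_hat = (1.0 - alpha_hat) - (beta_hat - alpha_hat) * m
    return alpha_hat, beta_hat, r_hat


if __name__ == "__main__":
    for level, counts in TABLE_5_20.items():
        a, b, r = estimate_level(*counts)
        print(f"{level}: alpha = {a:.4f}  beta = {b:.4f}  R = {r:.4f}")
```

The estimated reliability falls steadily from z1 to z7, showing that the current monitor setting is least robust at high load and high engine speed.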