SERVICE CONDITION
4. DOE-Based Approaches to Reliability Modeling
4.4. Design the Tests
The next step is to design the experiment itself. The tests must be designed to determine the specific factor level combinations to be tested, and the order in which they will be tested. There are many things that will influence the design of the experiment, including sample availability, the cost of running the tests, the time allotted for the tests, and test equipment availability.
As an example of a simple experimental design, consider Figure 4.4-1.
Figure 4.4-1: DOE Terminology
In this example, there are three factors to be assessed, A, B and C, represented by the three right-hand columns. Each factor has two levels, a “+” indicating the high level and a “–” indicating the low level. This experiment has four runs, each one representing a treatment. A treatment refers to the combination of levels used in the tests.
Reliability Information Analysis Center
Repetition and replication are techniques used to increase the number of runs. Increasing the number of runs is valuable because multiple responses obtained at exactly the same factor levels make it possible to quantify the variability and error in the measurements. Repetition is the practice of repeating the same run sequentially. Replication is the practice of repeating a set of runs sequentially. Both practices result in multiple responses for a given set of factor levels, but the advantage of replication over repetition is that it is better able to quantify measurement error when there is a gradually changing parameter in the test or measurement system.
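The difference between the two practices is easiest to see in the resulting run order. The following is a minimal sketch; the treatment labels and the helper names `repetition_order` and `replication_order` are assumptions made for illustration and do not come from the source.

```python
def repetition_order(treatments, n):
    """Repeat each run n times back to back: T1, T1, T2, T2, ..."""
    return [t for t in treatments for _ in range(n)]


def replication_order(treatments, n):
    """Repeat the whole set of runs n times: T1, T2, ..., T1, T2, ..."""
    return [t for _ in range(n) for t in treatments]


treatments = ["T1", "T2", "T3", "T4"]  # hypothetical treatment labels
print(repetition_order(treatments, 2))
print(replication_order(treatments, 2))
```

With replication, the responses for a given treatment are separated in time, so a slow drift in the test or measurement system appears as a difference between them instead of going unnoticed.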
The full-factorial approach will be used as an example for illustrating the concepts of data analysis, followed by a discussion of other approaches.
A full-factorial design, an example of which is shown in Table 4.4-1, is the most comprehensive experimental design. It includes runs that represent all possible combinations of factor levels. The primary drawback of the full-factorial approach is that it requires many runs. In some cases this may be practical, but in many cases the cost and time required to carry out the experiments are prohibitive.
Table 4.4-1: Full-Factorial Example
Run   A   B   C   R (response)
1     +   +   +   R1
2     +   +   -   R2
3     +   -   +   R3
4     +   -   -   R4
5     -   +   +   R5
6     -   +   -   R6
7     -   -   +   R7
8     -   -   -   R8
The number of required runs is calculated as $y^x$, where “y” is the number of levels per factor (2, for this example), and “x” is the number of factors (3). In Table 4.4-1, then, the number of runs is $2^3 = 8$.
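The run count can be checked by enumerating the treatments directly. The sketch below is illustrative only; the function name `full_factorial` is an assumption, and the enumeration uses Python's standard `itertools.product`.

```python
from itertools import product


def full_factorial(levels, n_factors):
    """Return every combination of the given levels across n_factors factors."""
    return list(product(levels, repeat=n_factors))


runs = full_factorial(["+", "-"], 3)  # two levels, three factors (A, B, C)
for number, (a, b, c) in enumerate(runs, start=1):
    print(number, a, b, c)
print("number of runs:", len(runs))  # 2**3 = 8
```

The enumeration order happens to match the run order of Table 4.4-1. Adding a fourth two-level factor doubles the list to 16 runs, which is why full-factorial plans become costly so quickly.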
There are many alternatives to the full-factorial approach. “One-Factor-at-a-Time” experiments, illustrated in Figure 4.4-2, are experiments in which each run varies the level of one factor. In this manner, the effect of each factor can be assessed by comparing the responses of the two successive runs between which the factor was varied. This is generally a brute-force way to perform experiments, and is usually very inefficient.
Figure 4.4-2: One-Factor-at-a-Time Experiments
Fractional Factorial Orthogonal Array Experiments can be used when it is impractical to perform a full factorial experiment. Characteristics of orthogonal experiments are as follows:
• They use a fraction of the number of full-factorial combinations
• The treatments are chosen to provide enough information to analyze the effects of a factor using analysis of means
• “Orthogonal” means that the combinations of factor levels are balanced such that all factors carry equal weight
• “Orthogonal” also means that the effects of the factors can be assessed independently of the others
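Both senses of “orthogonal” listed above can be checked numerically when the levels are coded as +1 (high) and -1 (low). The following is a minimal sketch; the four-run, three-factor array and the helper names are assumptions chosen for illustration.

```python
from itertools import combinations

# Hypothetical four-run array for three two-level factors (+1 = high, -1 = low).
array = [
    (+1, +1, +1),
    (+1, -1, -1),
    (-1, -1, +1),
    (-1, +1, -1),
]


def is_balanced(rows):
    """True if every column holds as many high levels as low levels."""
    return all(sum(col) == 0 for col in zip(*rows))


def is_orthogonal(rows):
    """True if every pair of columns has a zero dot product."""
    cols = list(zip(*rows))
    return all(
        sum(x * y for x, y in zip(c1, c2)) == 0
        for c1, c2 in combinations(cols, 2)
    )


print(is_balanced(array), is_orthogonal(array))  # True True
```

A zero dot product between two columns means that, within each level of one factor, the other factor appears at its high and low levels equally often, which is what allows the effects to be assessed independently.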
A full-factorial array can be scaled down such that the resulting array still has the characteristics of orthogonality. Such arrays are referred to as fractional factorial arrays, since they require only a fraction of the full-factorial runs yet remain orthogonal. The naming convention for these arrays is:

$$L_a\left(y^x\right)$$

where:
a = the number of experimental runs
y = the number of levels
x = the number of factors
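In the previous examples, $y^x$ gave the number of runs required for a full-factorial design, with “y” the number of levels per factor and “x” the number of factors. In the standard DOE nomenclature, however, the number of runs is given by the subscript “a” in “$L_a$”, and it can be smaller than $y^x$. For example, a seven-factor, two-level experiment for which there will be eight runs, designated $L_8(2^7)$, is shown in Figure 4.4-3.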
Figure 4.4-3: Standard DOE Nomenclature
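One common way to obtain an eight-run array for seven two-level factors is to start from the three columns of a $2^3$ full factorial and add the four columns formed by their products. The sketch below assumes +1/-1 coding and makes no claim to reproduce the column order or sign convention of the figure or of any published table.

```python
from itertools import combinations, product
from math import prod

base = list(product([-1, +1], repeat=3))  # 2^3 full factorial: 8 runs

# One column per non-empty subset of the three base columns (7 subsets in all);
# each entry is the product of the selected base-column values for that run.
subsets = [s for r in (1, 2, 3) for s in combinations(range(3), r)]
l8 = [tuple(prod(row[i] for i in s) for s in subsets) for row in base]

n_runs, n_factors = len(l8), len(l8[0])
print(f"L{n_runs}(2^{n_factors})")  # L8(2^7)

# Orthogonality check: every pair of columns has a zero dot product.
cols = list(zip(*l8))
orthogonal = all(
    sum(x * y for x, y in zip(c1, c2)) == 0
    for c1, c2 in combinations(cols, 2)
)
print(orthogonal)  # True
```

A full factorial on seven two-level factors would need $2^7 = 128$ runs, so this array uses one sixteenth of them. The price is that every added column is itself an interaction of the base columns, which matters once interactions between factors are considered.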
Another critical element that must be considered when defining reliability tests is the potential for interactions between factors. Everything discussed thus far in this section has assumed that the effects of the factors are independent of one another. In practice, there are often interactions between factors that must be accounted for. Graphical representations of potential interactions are shown in Figure 4.4-4. Referring to the figure, if the responses for the two levels of the “B-factor”, plotted against the two levels of the “A-factor”, form parallel lines, this indicates that there is no interaction between the two factors. This is shown on the top left. In other words, the relative magnitudes of the B-response are independent of the level of “A”. If, however, the same plot looks like the one on the top right, this indicates a strong interaction between factors A and B. In this example, the level of “A” changes the entire relationship between the B-levels and the response. The plot on the bottom indicates a mild interaction between the two factors.
Figure 4.4-4: Potential Interactions
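The parallel-lines test can also be expressed numerically. The following is a minimal sketch for a two-factor, two-level case; the response values are invented for illustration and the function name `interaction_effect` is an assumption.

```python
def interaction_effect(y):
    """AB interaction from a 2x2 table keyed by (A level, B level), coded -1/+1.

    Half the difference between the effect of A at high B and at low B.
    """
    effect_a_at_b_high = y[(+1, +1)] - y[(-1, +1)]
    effect_a_at_b_low = y[(+1, -1)] - y[(-1, -1)]
    return (effect_a_at_b_high - effect_a_at_b_low) / 2


# Hypothetical responses, invented for illustration only.
parallel = {(-1, -1): 10, (+1, -1): 14, (-1, +1): 20, (+1, +1): 24}
crossing = {(-1, -1): 10, (+1, -1): 24, (-1, +1): 20, (+1, +1): 12}

print(interaction_effect(parallel))  # 0.0: parallel lines, no interaction
print(interaction_effect(crossing))  # -11.0: lines cross, strong interaction
```

A value near zero corresponds to parallel lines; the larger the magnitude relative to the main effects, the stronger the interaction.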
If the potential interactions are not accounted for in the reliability test plan, the risk is that the effects of the factors cannot be deconvolved (separated) from the interactions between the factors. There are many DOE test plans and tools that help to determine how well various plans can identify main effects and interactions.
A detailed treatment of DOE principles is beyond the scope of this book, as this has been done extensively in the literature, but it is important to understand the impact of some of the principles as they pertain to reliability testing.
Resolution is a term that describes the degree to which the main effects of factors are aliased, or confounded, with the interactions among factors. In general, the resolution number of a design is one more than the lowest-order interaction with which some main effects are aliased. For example, if some main effects are confounded with some two-factor interactions, the resolution number of the DOE is 3. Since full-factorial designs test the response of every possible combination of factor levels, there is no confounding and, therefore, they have infinite resolution. As stated previously, since the implementation of a full-factorial test is often not practical, weaker tests are often necessary. The key is to select the aliasing structure of the test such that the actual critical interactions can be
To illustrate this, consider an example of a corrosion failure mechanism that is accelerated by temperature, humidity and the level of ionic contamination. A full-factorial plan with two levels per factor would be as shown in Table 4.4-2. The “-1” and “1” designations represent the low and high levels of the factors, respectively. For this full-factorial, 2-level plan, eight runs are sufficient to test all possible combinations.
Table 4.4-2: Full and Half Factorial Example for Corrosion
T = Temperature, H = Humidity, I = Ionic contamination

                                    Main effects      Interactions
Plan                                T    H    I       T*H   T*I   H*I
Full-Factorial                      1   -1    1       -1     1    -1
                                   -1    1   -1       -1     1    -1
                                    1   -1   -1       -1    -1     1
                                   -1   -1    1        1    -1    -1
                                   -1    1    1       -1    -1     1
                                    1    1    1        1     1     1
                                   -1   -1   -1        1     1     1
                                    1    1   -1        1    -1    -1
Half-Factorial (Resolution = 3)     1    1    1        1     1     1
                                    1   -1   -1       -1    -1     1
                                   -1   -1    1        1    -1    -1
                                   -1    1   -1       -1     1    -1
Another possible plan would be a half factorial, also shown in Table 4.4-2. Notice that, for the half-factorial design, the temperature-humidity (T*H) interaction column (i.e., the product of the two) is identical to the column for ionic contamination (I). Likewise, the T*I interaction is the same as H, and the H*I interaction is the same as T. Therefore, this Resolution 3 plan is incapable of deconvolving the main effect of T, H or I from the interaction of the other two.
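The aliasing described above can be verified by multiplying the coded columns of the half-factorial plan. The sketch below takes its four runs from Table 4.4-2; the helper name `times` and the dictionary layout are assumptions made for illustration.

```python
# Half-factorial runs from Table 4.4-2, as (T, H, I) in coded units.
half = [(1, 1, 1), (1, -1, -1), (-1, -1, 1), (-1, 1, -1)]

T, H, I = (list(col) for col in zip(*half))


def times(u, v):
    """Element-wise product of two coded columns (the interaction column)."""
    return [a * b for a, b in zip(u, v)]


mains = {"T": T, "H": H, "I": I}
interactions = {"T*H": times(T, H), "T*I": times(T, I), "H*I": times(H, I)}

# For each interaction, list the main effects whose column is identical to it.
aliases = {
    name: [m for m, col in mains.items() if col == icol]
    for name, icol in interactions.items()
}
print(aliases)  # {'T*H': ['I'], 'T*I': ['H'], 'H*I': ['T']}
```

Because each interaction column is identical to a main-effect column, no analysis of the four responses can tell the two apart.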
From physics, we know that both humidity and ionic contamination are required for corrosion. Therefore, the fact that H*I is the same as T (i.e., they are confounded) is unacceptable, since we would not be able to determine if the lifetime is governed by temperature, or the combination of humidity and ionic contamination. Therefore, we need a better DOE test plan. The full-factorial plan would be the best, if it could be executed, since none of this confounding exists. For the full-factorial plan, notice that none of the interaction terms are the same as the main effects.
If we were to actually model this failure cause based on the tests defined in these plans, the general form of the reliability model may be based on the two-parameter Weibull distribution, which is:

$$R = e^{-\left(\frac{t}{\alpha}\right)^{\beta}}$$

where:
R = the reliability, or probability of survival, at time “t”
α = the characteristic life (i.e., time to 63% failure)
β = the Weibull shape parameter
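The description of the characteristic life as the time to 63% failure follows directly from the two-parameter Weibull reliability function, R = exp(-(t/α)^β), evaluated at t = α. The following is a minimal sketch; the numerical values of α and β are hypothetical and chosen only for illustration.

```python
import math


def weibull_reliability(t, alpha, beta):
    """Probability of survival at time t for characteristic life alpha and shape beta."""
    return math.exp(-((t / alpha) ** beta))


alpha, beta = 1000.0, 2.0  # hypothetical values, for illustration only
r_at_alpha = weibull_reliability(alpha, alpha, beta)
print(round(1 - r_at_alpha, 3))  # 0.632: fraction failed at t = alpha
```

At t = α the exponent is -1 regardless of β, so the fraction failed is always 1 - 1/e, or about 63%.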
The characteristic life is then developed as a function of the applicable variables. The model in this case is:
$$\alpha = e^{\alpha_0} \, e^{\frac{\alpha_1}{T}} \, H^{\alpha_2} \, I^{\alpha_3} \, (HI)^{\alpha_4}$$

where:
α0 through α4 = parameter coefficients estimated in the life modeling process
T = the temperature in degrees K (degrees C + 273)
H = the relative humidity
I = the ionic contamination
HI = the product of humidity and ionic contamination
All model parameters, α0 through α4, could be adequately quantified with the full-factorial design, but not with the half-factorial.
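One way to see this is to count how many independent columns each plan provides for the five coefficients. The sketch below is a simplified check under stated assumptions: it uses the coded -1/+1 levels of Table 4.4-2 as stand-ins for the model terms (a constant, T, H, I and H*I), and it is not the life-modeling procedure itself. The helper names `rank` and `model_matrix` are illustrative.

```python
from fractions import Fraction
from itertools import product


def rank(matrix):
    """Matrix rank by Gaussian elimination with exact rational arithmetic."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    found = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(found, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue  # no usable pivot in this column
        rows[found], rows[pivot] = rows[pivot], rows[found]
        for r in range(found + 1, len(rows)):
            factor = rows[r][col] / rows[found][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[found])]
        found += 1
    return found


def model_matrix(runs):
    """One column per coefficient: constant, T, H, I, H*I (coded units)."""
    return [[1, t, h, i, h * i] for t, h, i in runs]


full = list(product([-1, 1], repeat=3))
half = [(1, 1, 1), (1, -1, -1), (-1, -1, 1), (-1, 1, -1)]

print(rank(model_matrix(full)))  # 5: all five coefficients can be separated
print(rank(model_matrix(half)))  # 4: the H*I column duplicates the T column
```

A rank below the number of coefficients means at least one of them cannot be estimated independently of the others, which is the situation for the half-factorial plan.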
There are many other potential test plans that would be adequate, provided that the required model variables can be quantified and are not confounded with one another (Reference 1).