Experiments
Designed experiments play an important role in quality improvement. While the confidence intervals and hypothesis tests previously discussed are limited to rather simple comparisons between one sample and requirements or between two samples, the designed experiments will use ANOVA (analysis of variance) techniques to partition the variation in a response amongst the potential sources of variation.
The traditional approach, which most of us learned in high school science class, is to hold all factors constant except one; this is the one-factor-at-a-time (OFAT) approach. When this approach is used, we can be sure that the variation is due to a cause-and-effect relationship, or so we are told. However, this approach suffers from a number of problems:
It usually isn’t possible to hold all other variables constant.
There is no way to account for the effect of joint variation of independent variables, such as interaction.
There is no way to account for experimental error, including measurement variation.
The statistically designed experiment usually involves varying two or more variables simultaneously and obtaining multiple measurements under the same experimental conditions. The advantage of the statistical approach is threefold:
Interactions can be detected and measured. Failure to detect interactions is a major flaw in the OFAT approach.
Each value does the work of several values. A properly designed experiment allows a Green Belt or Black Belt to use the same observation to estimate several different effects. This translates directly to cost savings when using the statistical approach.
Experimental error is quantified and used to determine the confidence the experimenter has in his conclusions.
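The first two advantages can be illustrated with a minimal sketch of a 2-level, 2-factor experiment. The response values below are hypothetical, chosen only for illustration; the coded levels -1 (low) and +1 (high) and the helper name effect are assumptions, not part of the course material.

```python
# Hypothetical responses for a 2-level, 2-factor experiment (illustrative only).
# Keys are coded levels (A, B): -1 = low, +1 = high.
responses = {
    (-1, -1): 20.0,
    (+1, -1): 30.0,
    (-1, +1): 25.0,
    (+1, +1): 55.0,
}

def effect(contrast):
    """Mean response where the contrast is +1 minus mean response where it is -1."""
    high = [y for levels, y in responses.items() if contrast(*levels) > 0]
    low = [y for levels, y in responses.items() if contrast(*levels) < 0]
    return sum(high) / len(high) - sum(low) / len(low)

# Every estimate below uses all four observations.
effect_a = effect(lambda a, b: a)       # main effect of A
effect_b = effect(lambda a, b: b)       # main effect of B
effect_ab = effect(lambda a, b: a * b)  # A x B interaction

# OFAT view: change A only, with B held at its low level.
ofat_a = responses[(+1, -1)] - responses[(-1, -1)]

print(effect_a, effect_b, effect_ab, ofat_a)
```

With these made-up numbers the factorial estimate of the A effect is 20 and the interaction is 10, while the OFAT comparison at low B sees only a change of 10 and gives no hint that the effect of A depends on the level of B.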
Black Belts use Design of Experiments (DOE) to craft well-designed efforts to identify which process changes yield the best possible results for sustained improvement. Whereas the traditional approach addresses only one factor at a time, the DOE methodology addresses multiple factors at one time. DOE provides data that show how significantly the input variables affect the output, whether acting alone or interacting with one another.
"A branch of applied statistics dealing with planning, conducting, analyzing and interpreting controlled tests to evaluate the factors that control the value of a parameter or group of parameters."
DOE provides these advantages over other, more traditional methods:
Evaluating multiple factors at the same time can reduce the time needed for experimentation.
Some well-designed experiments do not require the use of sophisticated statistical methods to understand the results at a basic level. However, computer software can be used to yield very precise results as needed.
The costs vary depending on the experiment, but the financial benefits realized from these experiments can be substantial.
The figure below depicts an example of a relationship between the components that DOE examines:
Process input variables, normally referred to as x variables and as factors in DOE terminology.
Process output variables, normally referred to as y variables and as responses in DOE terminology.
The relationship between input variables and output variables.
The interaction, or relationship, between input variables as it relates to the output variables.
Figure 40: Example of relationship between components that DOE examines
DOE Terminology
Certain terms are frequently used with DOE that need to be defined clearly.
Factor: A predictor variable that is varied with the intent of assessing its effect on a response variable. Most often referred to as an "input variable."
Factor Level: A specific setting, value, or assignment of a factor (that is, a value of the predictor variable). In DOE, levels are frequently set as high and low for each factor. For example, if the factor is time, then the low level may be 50 minutes and the high level may be 70 minutes.
Response variable: A variable representing the outcome of an experiment. The response is often referred to as the output or dependent variable.
Treatment: The specific setting of factor levels for an experimental unit. For example, a level of temperature at 65° C and a level of time at 45 minutes describe a treatment as it relates to an output of yield.
Experimental error: The variation in the response variable beyond that accounted for by the factors, blocks, or other assignable sources while conducting an experiment. Experimental error shows up as variation in the outcome of identical tests.
Experimental run: A single performance of an experiment for a specific set of treatment conditions.
Experimental unit: The smallest entity receiving a particular treatment, subsequently yielding a value of the response variable.
Predictor Variable: A variable that can contribute to the explanation of the outcome of an experiment. Also known as an independent variable.
Repeated Measures: The measurement of a response variable more than once under similar conditions. Repeated measures allow one to determine the inherent variability in the measurement system. Repeated measures are also known as "duplication" or "repetition."
Replicate: A single repetition of the experiment. See also replication.
Replication: Performance of an experiment more than once for a given set of predictor variables.
Each of the repetitions of the experiment is called a "replicate." Replication differs from repeated measures in that it is a repeat of the entire experiment for a given set of predictor variables, not just a repeat of measurements of the same experiment. Note: Replication increases the precision of the estimates of the effects in an experiment. Replication is more effective when all elements contributing to the experimental error are included. In some cases replication may be limited to repeated measures under essentially the same conditions. In other cases, replication may be deliberately different, though similar, in order to make the results more general.
Repetition: When an experiment is conducted more than once, repetition describes this event when the factors are not reset. Subsequent test trials are run again but not necessarily under the same conditions.
DOE Applications
Planning the experiment is probably the most important task in the Improve phase when using DOE. For planning to be done well, some experts estimate that 10-25% of your time spent should be devoted to planning and organizing the experiments.
The purpose of DOE is to create an observable event from which data may be extracted and decisions made about the best methods to improve the process. DOE may be used most effectively in the following situations:
o Identifying factors that produce a specific response or outcome
o Selecting between alternative approaches to effect the best outcome
In DOE, a full factorial design combines the levels of each factor with the levels of all other factors. This basic design ensures that all combinations are used, but if there are many factors, it may take too much time or be too costly to implement. In such cases, a fractional factorial design may be selected instead, because it uses fewer treatments and therefore requires fewer runs.
DOE Planning Process
The project team decides the exact steps to follow in the Improve phase. Steps to include in the Improve phase may actually be identified in the Measure and Analyze phases and should be noted to expedite later planning in the Improve phase. What follows is a suggested guide for planning the experiment(s) to be conducted in the Improve phase. The suggested process may be modified depending on the exact situation.
To use DOE, follow these steps:
1. Establish experiment objectives: Objectives differ per project, but the designs typically fall into three categories to support different objectives:
Screening – used to identify which factors are most important.
Characterization – used to quantify the relationships and interaction between several factors.
Optimization – used to develop a more precise understanding of just one or two variables.
2. Identify factors to be considered
Label both input variables (x factors) and output variables (y responses) in the experiment.
Use information collected in prior phases to assist in the identification process.
3. Finalize an experiment design
Select a design for the experiment.
Choose a design type (full factorial, fractional factorial, or others) that meets the experiment’s objectives.
Determine how the factors are measured.
Consider the resources needed and determine whether a practice run or pilot experiment may be needed.
4. Run the experiment
Run the experiment and collect the data. Place initial data in the results column of a design array, a graphical representation of the experiment factors and results (see Table 27 for an example).
Minimize the chance of human error by identifying where it could occur and allowing for it in the planning process.
Randomize the runs to reduce confounding (defined later in this topic).
Document the results as needed depending on the experiment.
5. Analyze the results of the experiment
Review the results of the experiment(s).
Examine the relationships among input variables (factors) acting together and with regards to the output variable(s) (responses).
6. Make decisions on next steps
Based on the results, determine next steps.
Are additional runs of the experiment needed?
Do the levels need to be modified prior to conducting the experiment again?
If the results point to an optimal solution, implement the factors and the levels of choice and look at the Control phase to sustain the desired improvements.
Design Array
Table 27: Example of Design Array
RUN   A: TEMPERATURE   B: TIME   C: CATALYST VOLUME   RESULTS (YIELD %)
1     -                -         -
2     +                -         -
3     -                +         -
4     +                +         -
5     -                -         +
6     +                -         +
7     -                +         +
8     +                +         +
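The run order in the design array above follows a regular pattern: factor A alternates every run, B every two runs, and C every four. A minimal sketch of generating that array is shown below; the variable names are illustrative, and the results column is left empty because it is filled in only after the experiment is run.

```python
from itertools import product

factors = ["A: Temperature", "B: Time", "C: Catalyst Volume"]

# product() varies its last element fastest, so each tuple is reversed to make
# factor A alternate fastest, matching the run order in the design array.
design = [tuple(reversed(levels)) for levels in product("-+", repeat=len(factors))]

print("Run", *factors, sep="  ")
for run, levels in enumerate(design, start=1):
    print(run, *levels, sep="  ")
```

Changing the length of factors extends the same pattern to more factors, at the cost of doubling the number of runs for each factor added.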
Barriers to the Planning Process
In all projects, barriers present themselves as obstacles to the project’s successful completion. The Improve phase of DMAIC is no exception. The following are examples of the types of barriers to watch for during planning for an experiment:
Objectives or purpose are unclear – the objectives are not developed and fully understood.
Factor levels are either set too low or too high – factor levels set inappropriately can adversely affect the data and understanding of the relationships between factors.
Unverified or misunderstood data from previous phases may lead to errors in planning and assumptions.
Experimentation has a cost – although DOE is more cost-effective than some other options, such as OFAT (one-factor-at-a-time) experiments, the costs can still be high and need to be considered carefully.
Lack of management support – Experiments require the full support of management in order to effectively use the resources required.
Selecting Experiment Factors
Identifying process variables, both inputs/factors and outputs/responses, is an important part of the planning process. While the selection process varies based on the information gathered in the Analyze phase and the objectives of the experiment, variables should be selected that have the following basic characteristics:
Important to the process in question – Since many inputs and output variables may exist for a process, most experiments focus on only the most critical inputs and outputs for a process. Such emphasis makes it more likely to successfully improve the most relevant parts of a process and, on a practical level, limits the number of variables and the cost of conducting the experiment.
Identifiable relationships to the inputs and the outputs – If relationships are already evident based on prior information gathered, the design of the experiment can be more focused on those factors with the most positive impact on outputs.
Not extreme level values – The information related to the level values for the factors should not be extreme. Values that reflect a reasonable range around the actual performance of the factors usually yield the best results.
There is no magic formula or equation for selecting the factors. The guidelines listed above and a review of the analysis work done in previous phases should provide a good basis for selection. Remember, the goal of Improve is to model the possible combination of factors and levels that yield valid and necessary results. We recommend the use of process experts for selecting experimental factors and levels based on prior analysis.
The prior analysis should suggest what the critical factors are and where the levels should be set for a first run in the experiment.
Other Planning Considerations
The DOE planning phase may include other considerations for the project team.
Iterative process: One large experiment does not normally reveal enough information to make final decisions. Several iterations may be necessary so that the proper decisions may be made and the proper value settings verified.
Measurement methods: Ensure that measurement methods are checked out prior to the experiment to avoid errors or variations from the measurement method itself. Review measurement systems analysis to ensure methods have been reviewed and instruments calibrated as needed, etc.
Process control and stability: The results from an experiment are more accurate if the process in question is relatively stable.
Inference space: If the inference space is narrow, then the experiment is focused on a subset of a larger process such as one specific machine, one operator, one shift, or one production line. With a narrowed or focused inference space, the chance for “noise” (variation in output from factors not directly related to inputs) is much reduced. If the inference space is broad, the focus is on the entire process and the chances for noise impacting the results are much greater.
Types of Experiment Designs: Part of planning an experiment is selecting the experiment design.
o 2-level, 2-factor: The simplest of design options, the 2-level, 2-factor design uses only four combinations or runs. The number in the first column represents the run number. The "+" symbol represents the high level; the "–" symbol represents the low level.
Table 28: Example of 2-level, 2-factor
Run   Factor A   Factor B
1     +          +
2     +          -
3     -          +
4     -          -
o Full Factorial
Table 29: Example of Full Factorial
Run   A: Temperature   B: Time   C: Catalyst Volume   Results (Yield %)
1     -                -         -                    72
2     +                -         -                    90
3     -                +         -                    79
4     +                +         -                    89
5     -                -         +                    78
6     +                -         +                    88
7     -                +         +                    81
This design option includes all levels and all factors for a given process. The advantage of a full factorial design is that all factors and levels are part of the experiment, thus ensuring the most complete data.
If there are 2 levels and 6 factors (2^6), then there are 64 possible runs for the experiment. A common description of factorial experiments is the designator L^f, where f is the number of factors in the experiment and L is the number of levels.
o Fractional Factorial: This design is best used when you are unsure about which factors influence the response outcome or when the number of factors is large (usually considered to be 5 or more). A fractional factorial uses a subset of the total runs. For example, if there are 2 levels and 6 factors (2^6), then there are 64 possible runs for the experiment. For a fractional factorial, the experiment could be reduced to 32 runs or perhaps even 16 runs. An example of this may be viewed later in this topic.
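The run counts above can be checked with a short sketch. The half fraction below uses one common construction, in which the last factor is set to the product of all the others; this is an assumption made for illustration and is not necessarily the construction used in the example later in this topic. The function names and the coded levels -1 (low) and +1 (high) are also illustrative.

```python
from itertools import product
from math import prod

def full_factorial(n_factors):
    """All 2**n_factors combinations of coded levels -1 (low) and +1 (high)."""
    return list(product((-1, +1), repeat=n_factors))

def half_fraction(n_factors):
    """Half fraction: full factorial in the first n-1 factors, with the
    last factor set to the product of the others."""
    base = full_factorial(n_factors - 1)
    return [levels + (prod(levels),) for levels in base]

full = full_factorial(6)   # 2^6 runs
half = half_fraction(6)    # 2^(6-1) runs

print(len(full), len(half))
```

Every run in the half fraction is also a run of the full factorial, so the saving comes purely from leaving runs out. The price is that some effects can no longer be separated from one another, which is why the choice of fraction matters.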
Design Principles: Black Belts adhere to a set of design principles to assist in the proper experiment design.
Power: Equal to one minus the probability of a Type II error (1-β). A higher power is associated with a higher probability of detecting a statistically significant difference when one exists. Low power usually results from small sample sizes.
The beta risk (i.e., Type II error or consumer’s risk) is the probability of failing to reject the null hypothesis when a significant difference exists (for example, a product is passed on as meeting the acceptable quality level when in fact the product is bad). Typically, β = 0.10 (10%). This means there is a 90% (1-β) probability of rejecting the null hypothesis when it is false (the correct decision). Because the power of the sampling plan is defined as 1-β, the smaller the β, the larger the power.
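The link between sample size, β, and power can be sketched for the simplest case: a two-sided, one-sample z-test with a known standard deviation. This is an assumption made to keep the arithmetic short; it is not the power calculation for a full designed experiment. The function name and the input values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(delta, sigma, n, alpha=0.05):
    """Power (1 - beta) of a two-sided, one-sample z-test.

    delta: true shift in the mean to be detected
    sigma: known standard deviation
    n:     sample size
    alpha: Type I error risk
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = delta * sqrt(n) / sigma
    # Probability that the test statistic falls in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

# Hypothetical inputs: detect a shift of 1.0 when sigma is 1.5.
for n in (5, 10, 20, 40):
    power = z_test_power(delta=1.0, sigma=1.5, n=n)
    print(f"n={n:2d}  power={power:.3f}  beta={1 - power:.3f}")
```

Running the loop shows power rising and β falling as the sample size grows, which is the relationship described above.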
Sample Size: The number of sampling units in a sample.
Determining sample size is a critical decision in any experiment design. Generally, if the experimenter is interested in detecting small effects, more replicates are required than if the experimenter is interested in detecting large effects. Increasing the sample size decreases the margin of error and improves the precision of the estimate. There are several approaches to determining sample size including, but not limited to: Operating Characteristic Curves, Specifying a Standard Deviation Increase, and Confidence Interval Estimation Method.
Note: In a multistage sample, the sample size is the total number of sampling units at the conclusion of the final stage of sampling.
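As a rough illustration of the confidence interval estimation approach, the sketch below sizes a sample so that the confidence interval for a mean has a chosen margin of error, using n = (z·σ/E)^2 rounded up. It assumes a known standard deviation and a normal approximation; the function name and the input values are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_margin(sigma, margin, confidence=0.95):
    """Smallest n for which the two-sided confidence interval for a mean
    has a half-width no larger than `margin`, assuming known sigma."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / margin) ** 2)

# Hypothetical inputs: sigma = 2.0, desired margins of 0.5 and 0.25.
print(sample_size_for_margin(sigma=2.0, margin=0.5))
print(sample_size_for_margin(sigma=2.0, margin=0.25))
```

Halving the margin of error roughly quadruples the required sample size, which is why detecting small effects needs many more replicates than detecting large ones.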
Balanced Design: A design where all treatment combinations have the same number of observations.
If replication in a design exists, it would be balanced only if the replication was consistent across all the treatment combinations. In other words, the number of replicates of each treatment combination is the same.
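A balance check is easy to automate. The sketch below counts how often each treatment combination appears in a list of planned runs; the function name and the example runs are hypothetical.

```python
from collections import Counter

def is_balanced(runs):
    """True when every treatment combination present in `runs` appears the
    same number of times. Combinations that are missing altogether are not
    detected by this check."""
    counts = Counter(runs)
    return len(set(counts.values())) == 1

# Hypothetical 2-level, 2-factor plans, written as (A level, B level).
balanced_plan = [("+", "+"), ("+", "-"), ("-", "+"), ("-", "-")] * 2
unbalanced_plan = balanced_plan + [("+", "+")]  # one extra replicate of one treatment

print(is_balanced(balanced_plan), is_balanced(unbalanced_plan))
```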
Replication: Performance of an experiment more than once for a given set of predictor variables.
Each of the repetitions of the experiment is called a replicate. Replication differs from repeated measures in that it is a repeat of the entire experiment for a given set of predictor variables, not just a repeat of measurements for the same experiment.
Replication involves an independent repeat of each factor combination in random order. For example, suppose a metallurgical engineer is interested in studying the effect of two different hardening processes, oil quenching and saltwater quenching, on an aluminum alloy. If the engineer treats five alloy specimens in each of the hardening processes, ten observations are made. These should be run in random order. Replication has two important properties. First, it allows the experimenter to obtain an estimate of the experimental error, which becomes a basic unit of measurement for determining whether observed differences in the data are really statistically different. Second, if the sample mean is used to estimate the true mean response for one of the factor levels in the experiment, replication permits the experimenter to obtain a more precise estimate of this parameter.
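Both properties of replication can be seen in a small sketch based on the quenching example. The hardness readings are hypothetical, invented only to show the arithmetic, and the pooled-variance calculation is one standard way to estimate experimental error from replicates.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical hardness readings, five replicates per quench (illustrative only).
oil = [145.0, 150.0, 153.0, 148.0, 149.0]
saltwater = [152.0, 150.0, 147.0, 155.0, 151.0]
n_oil, n_salt = len(oil), len(saltwater)

# Property 1: replication gives an estimate of experimental error.
pooled_var = ((n_oil - 1) * variance(oil) + (n_salt - 1) * variance(saltwater)) / (
    n_oil + n_salt - 2
)

# Property 2: the mean of n replicates is more precise than a single reading.
se_single = sqrt(pooled_var)
se_mean = sqrt(pooled_var / n_oil)

# The error estimate is the yardstick for judging the observed difference.
difference = mean(saltwater) - mean(oil)
se_difference = sqrt(pooled_var * (1 / n_oil + 1 / n_salt))
t_statistic = difference / se_difference

print(pooled_var, se_single, se_mean, difference, t_statistic)
```

Without replicates there would be no pooled variance, and therefore no way to tell whether the difference between the two quenching media is larger than ordinary run-to-run variation.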
Repetition: When an experiment is conducted more than once, repetition describes this event when the factors are not changed or reset. Subsequent test trials are run again but not necessarily under the same conditions.
Efficiency: In experimental designs, efficiency refers to an experiment that is designed in such a way as to include the minimal number of runs and to minimize the amount of resources, personnel, and time utilized.
Randomization: The process used to assign treatments to experimental units so each experimental unit has an equal chance of being assigned a particular treatment. Randomization validates the assumptions made in statistical analysis and prevents unknown biases from impacting conclusions. By randomization we mean that both the allocation of the experimental material and the order in which the individual runs or trials of the experiment are to be performed are arbitrarily determined.
For example, suppose the specimens in a metallurgical hardness experiment are of slightly different thicknesses and the effectiveness of the quenching medium may be affected by the specimen thickness. If all the specimens subjected to the oil quench are thicker than those subjected to the saltwater quench, systematic bias may be introduced into the results. This bias handicaps one of the quenching media and consequently invalidates our results. Randomly assigning the specimens to the quenching media alleviates this problem.
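A minimal sketch of this randomization is shown below. The specimen labels and the fixed seed are assumptions made so the example is repeatable; in practice the seed would not normally be fixed.

```python
import random

rng = random.Random(42)  # fixed seed only so this sketch is repeatable

specimens = [f"S{i}" for i in range(1, 11)]  # ten hypothetical specimens

# Randomly allocate the experimental material: five specimens per medium.
shuffled = rng.sample(specimens, k=len(specimens))
assignment = {"oil": shuffled[:5], "saltwater": shuffled[5:]}

# Randomize the order in which the individual runs are performed.
runs = [(specimen, medium) for medium, group in assignment.items() for specimen in group]
rng.shuffle(runs)

for order, (specimen, medium) in enumerate(runs, start=1):
    print(order, specimen, medium)
```

Because thickness plays no part in the allocation, thicker specimens are no more likely to land in one quenching medium than the other.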
Blocking: The method of including blocks in an experiment in order to broaden the applicability of the conclusions or to minimize the impact of selected assignable causes. The randomization of the experiment is restricted and occurs within blocks.
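Restricted randomization can be sketched as follows: every treatment appears once in each block, and the run order is shuffled separately inside each block. The block names (for example, batches of alloy) and the fixed seed are hypothetical.

```python
import random

rng = random.Random(7)  # fixed seed only so this sketch is repeatable

treatments = ["oil", "saltwater"]
blocks = ["batch 1", "batch 2", "batch 3"]  # hypothetical blocks

# Randomization is restricted: it happens within each block, never across blocks.
plan = {}
for block in blocks:
    order = treatments[:]  # each block receives every treatment once
    rng.shuffle(order)
    plan[block] = order

for block, order in plan.items():
    print(block, order)
```

Differences between blocks then affect every treatment equally, so they do not bias the comparison between treatments.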
Order: The order of an experiment refers to the chronological sequence of steps to an experiment.
The trials from an experiment should be carried out in a random run order. In experimental design,