Statistics for Materials Engineers
MATLS 3J03
© Kevin Dunn, 2013
Instructor: Tim Dietrich
Overall revision number: 19 (January 2013)
Design resolution
What is “resolution”?
The length of the shortest word in the defining relation.
I $2^{7-4}$ example: shortest word = 3 letters: a $2^{7-4}_{\mathrm{III}}$ design
I main effects confounded with 2-factor interactions
I $2^{8-4}$ example: shortest word = 4 letters: a $2^{8-4}_{\mathrm{IV}}$ design (8 factors cannot reach resolution IV in only $2^{8-5} = 8$ runs)
I main effects confounded with 3-factor interactions
I $2^{5-1}$ half-fraction design
I four factors as standard factorial
I factor E = ABCD, so I = ABCDE
I it is a $2^{5-1}_{\mathrm{V}}$ design
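The word-length rule above can be checked mechanically. The sketch below is only an illustration: the function names are hypothetical, and the four generators used for the $2^{7-4}$ case (D = AB, E = AC, F = BC, G = ABC) are a common textbook choice assumed here, not taken from these slides. Only the generator E = ABCD comes from the slide.

```python
from itertools import combinations

def defining_relation(generator_words):
    """All words in the defining relation, shortest first.

    Each generator is written as a word, e.g. E = ABCD becomes "ABCDE".
    Multiplying words cancels repeated letters, because A*A = I.
    """
    words = set()
    for r in range(1, len(generator_words) + 1):
        for combo in combinations(generator_words, r):
            letters = set()
            for word in combo:
                letters ^= set(word)  # symmetric difference = cancellation
            words.add("".join(sorted(letters)))
    return sorted(words, key=lambda w: (len(w), w))

def resolution(generator_words):
    """Resolution = length of the shortest word in the defining relation."""
    return len(defining_relation(generator_words)[0])

# Half fraction from the slide: E = ABCD, so I = ABCDE.
half_fraction = ["ABCDE"]
print(resolution(half_fraction))  # 5

# ASSUMED generators for a 2^(7-4) design: D=AB, E=AC, F=BC, G=ABC.
saturated = ["ABD", "ACE", "BCF", "ABCG"]
print(resolution(saturated))      # 3
```

Four generators give 15 words in the defining relation, so checking only the generators themselves is not enough in general; a product of two long words can be shorter than either.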
Design resolution
Resolution: shows how clearly effects are separated
I Let main effects = 1
I Two-factor interactions = 2
I Three-factor interactions = 3
Resolution V designs
I 5−1 = 4: main effects confounded with 4-factor interactions
I 5−2 = 3: 2-fi confounded with 3-fi
I 5−3 = 2: 3-fi confounded with 2-fi
Aim for a higher resolution, but accept a lower resolution initially, in order to test more factors
Design resolution
Resolution III designs
I Used for: initial screening
I Type of confounding?
Resolution IV designs
I Used for: Learning about and understanding a system (characterization)
I Type of confounding?
Resolution V designs and full factorial designs
I Used for: Optimizing a process, understanding complex effects
I To develop high-accuracy models
Design resolution
Saturated designs - screening
I Resolution III design
I Screen many factors
I Good for evaluating a new system
I Lab-scale work
I New product development
I Transfer from the lab to plant scale
Saturated design example
1. Create X and y matrices from the table
2. Solve for b
3. Plot Pareto plot of main effects (they are highly confounded)
4. Usual assumption: high order interactions are small
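The four steps above might look as follows in code. This is a sketch under stated assumptions: the design table is not reproduced in these notes, so the generators (D = AB, E = AC, F = BC, G = ABC) are assumed, and the response values in `y` are invented purely so that the example runs. A text ranking stands in for the Pareto plot.

```python
import itertools
import numpy as np

# Base 2^3 factorial in A, B, C (coded -1/+1): 8 runs.
base = np.array(list(itertools.product([-1, 1], repeat=3)))
A, B, C = base.T

# ASSUMED generators; the slide's table may use different ones.
D, E, F, G = A * B, A * C, B * C, A * B * C

# Step 1: X has an intercept column plus the 7 main-effect columns.
X = np.column_stack([np.ones(8), A, B, C, D, E, F, G])

# HYPOTHETICAL responses, made up for illustration only.
y = np.array([320.0, 276.0, 306.0, 290.0, 272.0, 274.0, 290.0, 255.0])

# Step 2: solve the normal equations (X^T X) b = X^T y.
b = np.linalg.solve(X.T @ X, X.T @ y)

# Step 3: rank the main effects by absolute size (a text-only Pareto).
names = ["A", "B", "C", "D", "E", "F", "G"]
pareto = sorted(zip(names, b[1:]), key=lambda item: abs(item[1]), reverse=True)
for name, coeff in pareto:
    print(f"{name}: {coeff:+.2f}")
```

Because the design is saturated (8 runs, 8 parameters), the model reproduces `y` exactly and there are no degrees of freedom left for estimating error; this is why step 4 leans on the assumption that high-order interactions are small.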
Saturated design example
I A, C and G are significant
I E: fairly small
I $\hat{\beta}_E \rightarrow E + AC + BG + DF$
I could be due to main effect E
I or due to AC, and/or BG and/or DF interactions
I Factors B, D and F are not important
I If we remove them, we are left with 4 factors, in 8 runs
I This automatically becomes a resolution IV design
I The coefficients for other terms will not change (Why?)
Next experiments: focus on A, C, G and their interactions. Maybe keep E, but it is small enough that it is likely negligible.
Saturated designs: note
I Fractional factorials: $2^{k-p}$ runs
I for integers k and p: 4, 8, 16, 32, 64, 128, ...
I Plackett and Burman designs are for screening also:
I multiples of 4: 12, 16, 20, 24, 28, ... runs
I Box and Bisgaard paper: “What can you find out from 12 experimental runs?”
Foldover: de-aliasing
I Experiments are not a one-shot operation; always run sequential experiments
I How should we work with fractional factorials?
I highly confounded; but say one factor, C, is important
I switch the sign of C: from C to −C
I Also switch the signs of the terms that depend on it:
I e.g. if D = ABC,
I then: D = AB(−C) = −ABC
I repeat the fractional set of experiments
I combine the results of both fractions: the X and y will have double the number of rows
I it unconfounds C: this main effect will be estimated on its own, and have no confounding associated with it
Switching the sign of a factor will de-alias its main effect and all its associated two-factor interactions.
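The de-aliasing claim above can be verified numerically. The sketch below assumes a $2^{7-4}$ starting fraction with generators D = AB, E = AC, F = BC, G = ABC (an assumption, not the slides' table); the second fraction flips the sign of every generator that contains C. The helper names are hypothetical.

```python
import itertools
import numpy as np

names = "ABCDEFG"
base = np.array(list(itertools.product([-1, 1], repeat=3)))
A, B, C = base.T

def fraction(sign):
    """8-run fraction; sign=-1 flips every generator that contains C."""
    D = A * B              # does not involve C, so it is unchanged
    E = sign * A * C
    F = sign * B * C
    G = sign * A * B * C
    return np.column_stack([A, B, C, D, E, F, G])

def two_fi(design):
    """All two-factor interaction columns, keyed by name such as 'AE'."""
    pairs = itertools.combinations(range(design.shape[1]), 2)
    return {names[i] + names[j]: design[:, i] * design[:, j] for i, j in pairs}

def aliases_of_C(design):
    """2-fi columns that are identical (up to sign) to the C column."""
    c = design[:, 2]
    return [k for k, v in two_fi(design).items() if abs(c @ v) == len(c)]

def c_interactions_clear(design):
    """True if each 2-fi containing C is orthogonal to all main effects
    and to every other 2-fi."""
    tf = two_fi(design)
    for name, col in tf.items():
        if "C" in name:
            others = [v for k, v in tf.items() if k != name] + list(design.T)
            if any(col @ v != 0 for v in others):
                return False
    return True

first = fraction(+1)
combined = np.vstack([first, fraction(-1)])  # 16 runs after the foldover

print(aliases_of_C(first))      # ['AE', 'BF', 'DG']
print(aliases_of_C(combined))   # []
print(c_interactions_clear(first), c_interactions_clear(combined))
```

In the combined 16 runs, C and its two-factor interactions are only aliased with three-factor and higher interactions; the other main effects keep their two-factor aliases.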
Foldover: removing 2-fi confounding
I Run another fraction, but switch all the signs in the design table
I i.e. let A = −A, let B = −B, etc.
I Also switch the signs of the terms that are generated from it (see previous slide)
I Run another fractional factorial
I Combine both sets of experiments
I All 2-fi will be removed from the main effects; the main effects will still be confounded with higher-order interactions
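A quick numerical check of the full foldover, again assuming a $2^{7-4}$ starting fraction with generators D = AB, E = AC, F = BC, G = ABC (an assumption made for illustration). The second fraction is simply the first with every sign in the design table switched.

```python
import itertools
import numpy as np

base = np.array(list(itertools.product([-1, 1], repeat=3)))
A, B, C = base.T

# ASSUMED resolution III starting fraction (8 runs, 7 factors).
first = np.column_stack([A, B, C, A * B, A * C, B * C, A * B * C])
second = -first                      # switch all signs in the design table
combined = np.vstack([first, second])

def worst_main_vs_2fi(design):
    """Largest |dot product| between any main-effect column and any
    two-factor interaction column; 0 means no main/2-fi confounding."""
    k = design.shape[1]
    worst = 0
    for m in range(k):
        for i, j in itertools.combinations(range(k), 2):
            dot = design[:, m] @ (design[:, i] * design[:, j])
            worst = max(worst, abs(int(dot)))
    return worst

print(worst_main_vs_2fi(first))     # 8: fully confounded, e.g. D with AB
print(worst_main_vs_2fi(combined))  # 0: main effects are clear of 2-fi

# Two-factor interactions remain confounded with each other, e.g. AB and CG.
ab = combined[:, 0] * combined[:, 1]
cg = combined[:, 2] * combined[:, 6]
print(int(ab @ cg))                 # 16: identical columns
```

The combined 16-run design behaves like a resolution IV design: a main effect times a 2-fi is a product of an odd number of columns, which changes sign in the second fraction and therefore cancels.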
Projectivity
Fractional factorials collapse to full factorials when effects are insignificant.
Projectivity = P = resolution − 1 = the highest number of factors that can form a full factorial embedded in your fractionated set of experiments.
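Projectivity can be found by brute force: check every subset of columns and see whether all level combinations appear. The sketch below uses the half fraction E = ABCD described earlier in these slides, plus an assumed $2^{7-4}$ design (generators D = AB, E = AC, F = BC, G = ABC, not taken from the slides). The function names are hypothetical.

```python
import itertools
import numpy as np

def is_full_factorial(design, cols):
    """True if the chosen columns contain every one of the 2^n sign patterns."""
    distinct_rows = {tuple(row) for row in design[:, cols]}
    return len(distinct_rows) == 2 ** len(cols)

def projectivity(design):
    """Largest P such that EVERY subset of P columns is a full factorial."""
    k = design.shape[1]
    p = 0
    for size in range(1, k + 1):
        subsets = itertools.combinations(range(k), size)
        if all(is_full_factorial(design, list(s)) for s in subsets):
            p = size
        else:
            break
    return p

# ASSUMED resolution III design: 7 factors in 8 runs.
b3 = np.array(list(itertools.product([-1, 1], repeat=3)))
A, B, C = b3.T
res3 = np.column_stack([A, B, C, A * B, A * C, B * C, A * B * C])

# Resolution V half fraction: E = ABCD, 5 factors in 16 runs.
b4 = np.array(list(itertools.product([-1, 1], repeat=4)))
res5 = np.column_stack([b4, b4.prod(axis=1)])

print(projectivity(res3))  # 2 = III - 1
print(projectivity(res5))  # 4 = V - 1
```

Note that some larger subsets can still be full factorials (A, B, C in the first design), but projectivity is about the guarantee for any subset, since we do not know in advance which factors will turn out to be insignificant.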
Example (previous exam/test)
You are developing a new product, but struggling to get product stability (measured in days) to the required level. Aim for stability above 50 days. Four factors considered:
I A = monomer concentration: 30% or 50%
I B = acid concentration: low or high
I C = catalyst level: 2% or 3%
I D = temperature: 393 K or 423 K
Experiments in standard order:
Example (continued)
1. How was the experiment generated?
2. What is the defining relationship?
3. What will be aliased with A; with D; and with BC?
4. Describe the aliasing structure (resolution).
5. What is the model’s intercept; main effect for A; and for the AD interaction?
Example (continued)
If the least squares model is:
$y = 29.5 - 5.75x_A - 3.75x_B - 1.25x_C + 0.75x_D + 0.50x_Ax_B + 1.0x_Ax_C - 1.0x_Ax_D$
what is the predicted stability when operating at:
I monomer concentration of 25%
I low acid concentration
I 1.5% catalyst level
I a temperature of 408 K
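One way to work this prediction is to convert each real-world setting to coded units, using the low and high levels listed for the four factors, and substitute into the model. The sketch below assumes "low" acid is coded as −1; the helper `coded` is a hypothetical name.

```python
def coded(value, low, high):
    """Convert a real-world value to coded units (low -> -1, high -> +1)."""
    center = (low + high) / 2
    half_range = (high - low) / 2
    return (value - center) / half_range

x_A = coded(25, 30, 50)      # monomer 25%   -> -1.5
x_B = -1.0                   # low acid (ASSUMED coding: low = -1)
x_C = coded(1.5, 2, 3)       # catalyst 1.5% -> -2.0
x_D = coded(408, 393, 423)   # 408 K         ->  0.0

y_hat = (29.5 - 5.75 * x_A - 3.75 * x_B - 1.25 * x_C + 0.75 * x_D
         + 0.50 * x_A * x_B + 1.0 * x_A * x_C - 1.0 * x_A * x_D)

print(x_A, x_B, x_C, x_D)  # -1.5 -1.0 -2.0 0.0
print(y_hat)               # 48.125
```

Both $x_A$ and $x_C$ fall outside the range −1 to +1 that was tested, so the 48.1 days is an extrapolation and should be confirmed with an experiment.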
Response surface methods
Objective for the response surface method (RSM): achieve the best response using sequential experimentation.
Wasn’t the COST approach also sequential experimentation?
Different to COST: We are going to change multiple variables at a time!
George E. P. Box: he pioneered RSM
G. E. P. Box and K. B. Wilson (1951): “On the Experimental Attainment of Optimum Conditions”, Journal of the Royal Statistical Society, Series B, 13, 1-45.
[Photo credit: JMP/SAS]
I 18 October 1919 to 28 March 2013
I “... essentially, all models are wrong, but some are useful”
Single-variable case
We could have reached the optimum faster if we had used quadratic (or spline) approximations.
Single-variable case
This coincides with the COST approach:
I take exploratory steps of $\gamma_i$ towards an optimum
I refit the model once we plateau
I repeat
We are going to do exactly the same, but with multiple variables. Key points:
1. use the model to optimize
2. stop once you detect the “model” is inadequate
3. then rebuild/refit it
Analogy for finding the optimum
Motivation: an example from reactor design
I Reactor inlet temperature, T, and pressure, P, can be adjusted
I Leads to different responses, y = methanol yield, for different combinations of T and P
I What if we did not have a reliable “first-principles” model?
2-variable example
I Current baseline:
I T = 325 K
I S = 0.75 g.L$^{-1}$
I profit = \$407 per day
I Example worked out on the board
2-variable example
Questions:
I how “wide” should the initial factorial be?
I should it be a fractional or full factorial?
I how do you deal with more than 2 variables?
I how do you deal with integer variables?
I how large should steps be?
Both experiments 6 and 7 were wasteful.
2-variable example
Adding second order effects; use a central composite design.
I CCD: full factorial + axial points + center points
2-variable example
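Add quadratic terms to model:
$$\hat{y} = b_0 + b_T x_T + b_S x_S + b_{TS} x_T x_S + b_{TT} x_T^2 + b_{SS} x_S^2$$
$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}$$
$$\begin{bmatrix} y_8 \\ y_9 \\ y_{10} \\ y_{11} \\ y_6 \\ y_{13} \\ y_{14} \\ y_{15} \\ y_{16} \end{bmatrix} =
\begin{bmatrix}
1 & -1 & -1 & +1 & +1 & +1 \\
1 & +1 & -1 & -1 & +1 & +1 \\
1 & -1 & +1 & -1 & +1 & +1 \\
1 & +1 & +1 & +1 & +1 & +1 \\
1 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & -1.41 & 0 & 0 & 2 \\
1 & 1.41 & 0 & 0 & 2 & 0 \\
1 & 0 & 1.41 & 0 & 0 & 2 \\
1 & -1.41 & 0 & 0 & 2 & 0
\end{bmatrix}
\begin{bmatrix} b_0 \\ b_T \\ b_S \\ b_{TS} \\ b_{TT} \\ b_{SS} \end{bmatrix} + \mathbf{e}$$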
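The X matrix for the central composite design can be built directly from the coded settings of the nine runs, in the same row order as above. The sketch below is illustrative only: the measured profits are not listed in these notes, so the coefficients in `b_true` are invented and used to generate noise-free responses, simply to show that least squares recovers all six parameters from this layout.

```python
import numpy as np

a = np.sqrt(2)  # axial distance, shown rounded as 1.41 in the matrix

# Coded settings: 4 factorial points, 1 center point, 4 axial points.
x_T = np.array([-1, +1, -1, +1, 0, 0, a, 0, -a])
x_S = np.array([-1, -1, +1, +1, 0, -a, 0, a, 0])

# Columns: intercept, x_T, x_S, x_T*x_S, x_T^2, x_S^2.
X = np.column_stack([np.ones(9), x_T, x_S, x_T * x_S, x_T**2, x_S**2])

# HYPOTHETICAL coefficients, made up for illustration only.
b_true = np.array([700.0, 20.0, 15.0, -5.0, -12.0, -8.0])
y = X @ b_true  # noise-free synthetic responses

# Least-squares estimate of b in y = Xb + e.
b, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.linalg.matrix_rank(X))   # 6: all quadratic-model terms are estimable
print(np.round(b, 3))
```

With only the four factorial points and the center point, the two squared columns would be identical, so $b_{TT}$ and $b_{SS}$ could not be separated; the axial points are what make them estimable.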
2-variable example
I The next experiment is based on the contour plot output, i.e.
I $T^{(17)} \approx 343$ K
I $S^{(17)} \approx 1.60$ g.L$^{-1}$
I $\hat{y}^{(17)} = \$736$
I $y_{\text{actual}}^{(17)} = \$738$
General approach for RSM
1. Start at baseline; run full or fractional factorial
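I $\hat{y} = b_0 + b_A x_A + b_B x_B + \ldots + b_{AB} x_A x_B + b_{AC} x_A x_C + \ldots$
2. Main effects usually greater than 2-factor interactions
3. Estimate path of steepest ascent (or descent):
I $\dfrac{\partial \hat{y}}{\partial x_A} = b_A \qquad \dfrac{\partial \hat{y}}{\partial x_B} = b_B \qquad \dfrac{\partial \hat{y}}{\partial x_C} = b_C \quad \ldots$
I Move $b_A$ units in $x_A$; and $b_B$ units in $x_B$; and $b_C$ units in $x_C$; etc.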
I These are coded units. Unscale to real-world units!
I Implement a portion of the full step, e.g. only 25% if full step is too large.
4. Make several sequential steps until response levels off
I e.g. $y_6 = 600$; $y_7 = 800$; $y_8 = 825$; $y_9 = 750$
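Steps 3 and 4 can be sketched in a few lines. The coefficients, baseline and half-ranges below are hypothetical values chosen only to show the arithmetic; the function name is also an assumption. The four responses at the end are the ones quoted in step 4.

```python
import numpy as np

def steepest_ascent_step(b, centers, half_ranges, fraction=1.0):
    """Step of `fraction` * b in coded units, then unscaled to real units.

    b           : main-effect coefficients (the gradient, in coded units)
    centers     : current baseline in real-world units
    half_ranges : real-world change that corresponds to 1 coded unit
    """
    step_coded = fraction * np.asarray(b, dtype=float)
    new_real = np.asarray(centers, dtype=float) + step_coded * np.asarray(half_ranges)
    return step_coded, new_real

# HYPOTHETICAL model and factor ranges, for illustration only.
b = [2.0, -0.5]            # b_A, b_B
centers = [450.0, 1.0]     # baseline for factors A and B
half_ranges = [25.0, 0.2]  # 1 coded unit in real units

# Take only 25% of the full step.
step_coded, new_point = steepest_ascent_step(b, centers, half_ranges, fraction=0.25)
print(step_coded)  # [ 0.5   -0.125]
print(new_point)   # [462.5    0.975]

# Step 4: keep stepping until the response levels off, then go back to the peak.
responses = {6: 600, 7: 800, 8: 825, 9: 750}
best_run = max(responses, key=responses.get)
print(best_run)    # 8
```

For steepest descent, the same function can be called with a negative `fraction`.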
General approach for RSM
5. Use a new factorial at the previous peak (e.g. at the $y_8$ point)
I perhaps add other factors
I flip signs on binary factors
6. Repeat steps 1 to 5, until linear model is insufficient
I Curvature shows up
I 2-factor interactions similar or greater than main effects
I Contour plots will show this clearly
7. Estimate (calculate) a quadratic model
I Use a central composite design; it needs at least 3 levels per factor (5 levels when the axial points are placed at ±1.41)
I Add quadratic terms to model, e.g. $\ldots + b_{AA} x_A^2 + b_{BB} x_B^2 + \ldots$
8. Draw contour plots (surfaces) and move to next optimum
What is the response variable?
I Single y is not always feasible
I Use y = “total costs”, or y = “net profit”
I Superimpose contour surfaces
Page 579 of the Hill and Hunter review article - reference in the notes.
Evolutionary operation (EVOP)
I Similar concept to RSM
I Processes are not constant, the optimum is shifting
I heat-exchanger fouling
I build-up inside reactors and tubing
I catalyst deactivation
I slowly varying disturbances
I Iterative hunt for the process optimum:
I make small perturbations within daily production
I use replicate runs and average
I move along the response surface
General approach for experimentation
I Box: “The best time to run an experiment is after the experiment”
I Box: “To find out what happens when you interfere with a system, you must interfere with it, not passively observe it.”
I Box: “Discovering the unexpected is more important than confirming the known”
I Box: “Do not spend more than 20% to 25% of your time and budget on your first group of experiments”
Phase 1: screening runs
Phase 2: sequential experiments to augment screening
Phase 3: optimizing; RSM and full factorials
Phase 4: maintain the optimum, search for better optima
Mistakes, missing values, and constraints
I If you do not, or cannot, reach −1 or +1:
I Use a least squares model, with the coded values actually used in the experiment. E.g.:
I −1 corresponds to 425 K, and +1 corresponds to 475 K
I experiment ran at 455 K, instead of 475 K
I then use $\dfrac{455 - 450}{(475 - 425)/2} = 0.2$ in the X matrix
I (see the next slide)
I You lose some orthogonality in the X matrix
I Missing values
I main effects are estimated multiple times (so we have built-in redundancy already)
I drop out insignificant terms, e.g. a 2-fi, and estimate fewer parameters
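The coding rule for a run that missed its target, and the resulting loss of orthogonality, can be seen in a small sketch. The temperature levels are the ones from the example above; the second factor and the choice of which run went wrong are hypothetical.

```python
import numpy as np

def coded(value, low, high):
    """Real-world value to coded units (low -> -1, high -> +1)."""
    center = (low + high) / 2
    half_range = (high - low) / 2
    return (value - center) / half_range

x_actual = coded(455, 425, 475)
print(x_actual)  # 0.2

# Planned 2^2 factorial: columns are intercept, temperature, second factor.
X_planned = np.array([[1, -1, -1],
                      [1, +1, -1],
                      [1, -1, +1],
                      [1, +1, +1]], dtype=float)

# ASSUMPTION: the last run is the one that ran at 455 K instead of 475 K.
X_actual = X_planned.copy()
X_actual[3, 1] = x_actual

print(X_planned.T @ X_planned)  # diagonal: columns are orthogonal
print(X_actual.T @ X_actual)    # off-diagonal terms appear
```

The model can still be fitted by least squares with `X_actual`; the estimates are just no longer independent of each other.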
Handling of constraints
I New runs are not independent; lost orthogonality
Optimal designs
I What is sub-optimal about our existing designs? Nothing!
I Use optimal designs when:
I constraints are complex (plane constraints; k ≥ 3)
I estimating a non-standard model
I running a reduced number of experiments
I have more than 2 levels per factor
I want to add experiments to existing runs
Optimal designs
Computer-based approach:
1. User specifies the model (i.e. the parameters)
2. Computer finds all possible combinations (grid approach)
I user can augment this list, called candidate points
I center-points added
3. User specifies number of experiments
4. Computer iteratively selects the “optimal” set
Optimality criteria:
I A-optimal: minimizes $\mathrm{trace}\left((X^T X)^{-1}\right)$
I D-optimal: maximizes $\det(X^T X)$
I G-optimal: minimizes the maximum variance of $\hat{y}$
I V-optimal: minimizes the average variance of $\hat{y}$
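The four criteria can be computed for any candidate set of runs. The sketch below assumes a model with an intercept, two main effects and their interaction, a 3 × 3 candidate grid, and 4 experiments; all of these are illustrative choices. Real software selects points iteratively with exchange algorithms; here an exhaustive search over all subsets stands in for that step, which is only feasible because the problem is tiny.

```python
import itertools
import numpy as np

def model_matrix(points):
    """ASSUMED model: intercept + x1 + x2 + x1*x2."""
    pts = np.asarray(points, dtype=float)
    x1, x2 = pts[:, 0], pts[:, 1]
    return np.column_stack([np.ones(len(pts)), x1, x2, x1 * x2])

def d_value(points):
    X = model_matrix(points)
    return np.linalg.det(X.T @ X)

def criteria(points, candidates):
    """A, D, G and V values; prediction variance is in units of sigma^2."""
    X = model_matrix(points)
    XtX_inv = np.linalg.inv(X.T @ X)
    Xc = model_matrix(candidates)
    # x^T (X^T X)^-1 x for every candidate point x
    pred_var = np.einsum("ij,jk,ik->i", Xc, XtX_inv, Xc)
    return {"A": np.trace(XtX_inv), "D": np.linalg.det(X.T @ X),
            "G": pred_var.max(), "V": pred_var.mean()}

# Steps 2 and 3: candidate grid and number of experiments.
candidates = list(itertools.product([-1, 0, 1], repeat=2))
n_runs = 4

# Step 4 (exhaustive stand-in): pick the subset with the largest det(X^T X).
best = max(itertools.combinations(candidates, n_runs), key=d_value)
crit = criteria(best, candidates)

print(sorted(best))  # [(-1, -1), (-1, 1), (1, -1), (1, 1)]
print({name: round(float(value), 3) for name, value in crit.items()})
```

For this model the D-optimal choice is the four corner points, i.e. the $2^2$ full factorial.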
Optimal designs
I A full factorial design, $2^k$, is already A-, D-, G- and V-optimal.
I D-optimal designs work well; used most often.
Mixture designs
I Fine chemicals, pharmaceuticals, food manufacturing, and polymer processing
I There are screening and optimization designs for mixtures also
I Constraint for mixtures: $\sum_i x_i = 1$
I The $x_i$ cannot be changed independently
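The consequence of the mixture constraint for model fitting can be shown with a small sketch. The seven three-component blends below are hypothetical; the point is only that, because every row sums to 1, an intercept column is an exact linear combination of the component columns.

```python
import numpy as np

# HYPOTHETICAL blends of 3 components; each row is one recipe.
blends = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.5, 0.5, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 0.5, 0.5],
    [1 / 3, 1 / 3, 1 / 3],
])

row_sums = blends.sum(axis=1)
print(row_sums)  # every entry is 1 (up to rounding)

# Intercept + all three components: the columns are linearly dependent.
X_with_intercept = np.column_stack([np.ones(len(blends)), blends])
rank_with = np.linalg.matrix_rank(X_with_intercept)
rank_without = np.linalg.matrix_rank(blends)

print(X_with_intercept.shape[1], rank_with)  # 4 columns, but rank 3
print(blends.shape[1], rank_without)         # 3 columns, rank 3
```

This is why an ordinary factorial model with an intercept cannot be fitted to mixture data as-is; one common remedy is to drop the intercept and fit only the component terms.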