FENNER, JOEL SCOTT. Multi-Systems Operation and Control. (Under the direction of Jye-Chyi Lu.)
by
Joel S. Fenner
A dissertation submitted in partial satisfaction of the
requirements for the degree of
Doctor of Philosophy
in
STATISTICS
in the
GRADUATE SCHOOL
at
NC STATE UNIVERSITY
2000
Professor Jye-Chyi Lu Professor Jackie Hughes-Oliver
Chairman of Advisory Committee
Professor John Monahan Professor William Holton
BIOGRAPHY
To my dad,
Kenneth H. Fenner,
Contents
List of Figures
List of Tables
I Background
1 Introduction
2 Statistical Process Control (SPC) Charts
3 Advanced Process Control (APC) Overview
  3.1 Drift vs. Shift Controllers
  3.2 Run to Run Controllers
  3.3 Reachability, Controllability, and Observability
  3.4 Stability
  3.5 Parameter Estimation Properties
4 Linear Model Based APC
  4.1 PID Control
  4.2 EWMA Control
  4.3 PCC Control
  4.4 Bayesian Rapid Mode Controller
  4.5 TFSM Control
  4.6 Constrained Control
  4.7 Spectral Factorization
  4.8 Dynamic Programming Based Control
5 Nonlinear Model Based APC
  5.1 Neural Networks
  5.2 Wavelets
6 Adaptive APC
  6.1 EVOP and DOE Approaches
  6.2 Avoiding Scrap: Some Clever Approaches
  6.3 Self Tuning Regulators
  6.4 Dynamic Programming with Unknown Parameters
7 Complementary Roles of SPC and APC
8 Multi-Systems Control Approaches
  8.1 Serial Multi-Stage Control Approaches
    8.1.1 Rao's Multi-Stage Monitoring
    8.1.2 Leang's SPC Triggered APC Control
    8.1.3 Vaidyanathan's Multi-Stage BMWBO APC Control
  8.2 Parallel Multi-Stage Approaches
II Serial Multi-Stage Operation and Control
9 Introduction
  9.1 Overview and Motivation
  9.2 Noise Model
10 MMSE Control with Known Parameters
  10.1 Serial Multi-Stage Data
  10.2 Model Description
  10.3 MMSE Forecasting
  10.4 MMSE Adjustment
  10.5 Monitoring
  10.6 A New Objective Function
11 DP Control with Known Parameters
  11.1 One Stage Dynamic Programming Framework
  11.2 Two Stage Dynamic Programming Framework
  11.3 Unconstrained Control with Perfect Information
  11.4 Constrained Control with Perfect Information
  11.5 Comparing Constrained and Unconstrained Control with Perfect Information: An Example
  11.6 Feed-forward Control for a One Stage Perfect Information Case
  11.7 Feed-forward Control for a Two Stage Perfect Information Case
  11.8 Comparing One Stage vs. Two Stage Approaches to Control: An Example
12 Conclusion and Extensions
  12.1 Conclusion
III Parallel Multi-Stage Operation and Control
13 Introduction
  13.1 Overview and Motivation
  13.2 General Modeling Structure
    13.2.1 Pooling vs. No Pooling: A Binary Choice in Regression Modeling
14 Parallel Site Process Control
  14.1 Introduction
  14.2 Description of Simulated Data
  14.3 Bayesian Approach
  14.4 Comparison of Approaches for One Parameter Model
  14.5 Sequential Estimation
15 Uniformity
  15.1 Introduction
  15.2 A Basic Example
  15.3 Real Example of Uniformity Modeling
    15.3.1 Description of the Data
    15.3.2 SRS Approach
    15.3.3 MRS Approach
    15.3.4 Bayesian Approach
    15.3.5 Comparison of Approaches
16 Conclusion
List of Figures
4.1 EWMA weights for (a) $\theta = 0.2$, (b) $\theta = 0.5$, and (c) $\theta = 0.8$.
9.1 An example of a three stage serial process with inputs $X$ and $Z$ and outputs $Y$ at each stage.
10.1 Data available for forecasting after $\langle r+1, 1 \rangle$, $\langle r, 2 \rangle$, and $\langle r-1, 3 \rangle$ processes are completed in equal length three stage case with no time delay.
10.2 Stage one run chart for simulated data with known dynamics under adjustment.
10.3 Stage two run chart for simulated data with known dynamics under adjustment.
10.4 Baseline stage one is charted for (a) the reconstructed and actual disturbance innovations, (b) the raw material disturbance, and (c) the adjustable variable. Charts are again for simulated data with known dynamics under adjustment.
10.5 Baseline stage two is charted for (a) the reconstructed and actual disturbance innovations, (b) the raw material disturbance, and (c) the adjustable variable. Charts are again for simulated data with known dynamics under adjustment.
11.1 Constrained and unconstrained control used in the perfect information case for a one stage controller. This example is the wafer track step of the lithography sequence. In this and subsequent figures, the symbols CCPIi, CCPIt, and CCPIss stand for constrained control perfect information controllers with an iterative solution, theoretical one step solution, and the steady state solution respectively.
11.2 The difference between the iterative and theoretical steady-state solution for the matrix $S$ and its effect on the regulating matrix $L$. These elements are the only ones that change across iterations. This regulating matrix is for the wafer track step of the lithography sequence.
11.3 Constrained and unconstrained control used in the perfect information case for a one stage controller. This example is the exposure (stepper) step of the lithography sequence.
11.4 The difference between the iterative and theoretical steady-state solution for
11.5 Constrained and unconstrained control used in the perfect information case for a two stage controller. This first stage is the wafer track step of the lithography sequence for case 1.
11.6 Constrained and unconstrained control used in the perfect information case for a two stage controller. This second stage is the exposure (stepper) step of the lithography sequence for case 1.
11.7 Constrained and unconstrained control used in the perfect information case for a two stage controller. This first stage is the wafer track step of the lithography sequence for case 2.
11.8 Constrained and unconstrained control used in the perfect information case for a two stage controller. This second stage is the exposure (stepper) step of the lithography sequence for case 2.
11.9 Constrained and unconstrained control used in the perfect information case based on the SS approach. This example is the pad alignment step (stage one) for case 1. In this and subsequent figures, the symbols CCPIi and CCPIt stand for constrained control perfect information controllers with an iteratively found steady state solution and a theoretical one step solution respectively.
11.10 Constrained and unconstrained control used in the perfect information case based on the SS approach. This example is the device alignment step (stage two) for case 1.
11.11 Constrained and unconstrained control used in the perfect information case based on the MS approach. This example is the pad alignment step (stage one) for case 1.
11.12 Constrained and unconstrained control used in the perfect information case based on the MS approach. This example is the device alignment step (stage two) for case 1.
11.13 Constrained and unconstrained control used in the perfect information case based on the SS approach. This example is the pad alignment step (stage one) for case 2.
11.14 Constrained and unconstrained control used in the perfect information case based on the SS approach. This example is the device alignment step (stage two) for case 2.
11.15 Constrained control used in the perfect information case based on the MS approach. This example is the pad alignment step (stage one) for case 2.
11.16 Constrained control used in the perfect information case based on the MS approach. This example is the device alignment step (stage two) for case 2.
11.17 Constrained and unconstrained control used in the perfect information case based on the SS approach. This example is the pad alignment step (stage one) for case 3.
11.18 Constrained and unconstrained control used in the perfect information case
11.19 Constrained and unconstrained control used in the perfect information case based on the MS approach. This example is the pad alignment step (stage one) for case 3.
11.20 Constrained and unconstrained control used in the perfect information case based on the MS approach. This example is the device alignment step (stage two) for case 3.
11.21 Constrained and unconstrained control used in the perfect information case based on the SS approach. This example is the pad alignment step (stage one) for case 4.
11.22 Constrained and unconstrained control used in the perfect information case based on the SS approach. This example is the device alignment step (stage two) for case 4.
11.23 Constrained and unconstrained control used in the perfect information case based on the MS approach. This example is the pad alignment step (stage one) for case 4.
11.24 Constrained and unconstrained control used in the perfect information case based on the MS approach. This example is the device alignment step (stage two) for case 4.
14.1 An example of global manufacturing for three parallel sites. Information between the sites could be shared through a decision making control center.
14.2 Trace and kernel density estimates for several of the slope parameters used in parallel site process control.
14.3 Simulated data with actual regression lines used to simulate the data for parallel site process control.
14.4 Data with estimated regression lines based on using global least squares slope estimate, individual regressions at each site, and Bayesian quasi-common method to estimate slope for parallel site process control.
14.5 Bayesian estimate plotted against individual regression estimates for parallel site process control. The 45° line would reflect the Bayesian estimate equalling the individual regression estimate while being close to the flat line represents being close to the global regression estimate (which is an average of the intercepts for the intercept case).
14.6 Since this is simulated data, one can compare the slope estimates of each method to the actual slope used in producing the data. The bar plot is based on the sum of squared differences of the five slope estimates from the actual slope for parallel site process control.
14.7 Observed output plotted versus Bayesian and individual least squares predictions for parallel site process control.
14.8 Observed output plotted versus global prediction and comparison of the predictions of all three methods versus the observed output for parallel site process control.
15.1 Trace and kernel density estimates for several of the slope parameters for the
15.2 Bayesian estimates versus individual regression (MRS) estimates for the quasi-common parameters 1 and 2 for the Guo Sachs example.
15.3 Predicted Bayesian deposition rate (derivative response) versus actual deposition rate for the Guo Sachs example.
15.4 Prediction surfaces based on SRS approach, range based SRS approach, MRS approach, and proposed Bayesian approach for the Guo Sachs example.
15.5 Prediction surfaces based on MRS approach and proposed Bayesian approach for the Guo Sachs example.
15.6 Histograms of uniformity at the experimental design settings of the two flow rates for the Guo Sachs example.
15.7 Median prediction surface and posterior interval surfaces computed using proposed Bayesian approach for the Guo Sachs example.
15.8 Three dimensional Latin Hypercube design which was used to determine the experimental settings of nitrogen ramp down time, HCl concentration, and water flow rate for the ECE example.
15.9 Trace and kernel density estimates for several of the parameters for the ECE example.
15.10 Bayesian estimates versus individual regression (MRS) estimates for the slope parameters for the ECE example.
15.11 Predicted Bayesian deposition rate (derivative response) versus actual deposition rate for the ECE example.
15.12 Prediction surfaces based on SRS models, MRS approach, and proposed Bayesian approach for the ECE example ($x_1$ by $x_2$ with $x_3 = 3160$ and $x_4 = 3$).
15.13 Prediction surfaces based on SRS models, MRS approach, and proposed Bayesian approach for the ECE example ($x_1$ by $x_3$ with $x_2 = 0.02$ and $x_4 = 3$).
15.14 Prediction surfaces based on SRS models, MRS approach, and proposed Bayesian approach for the ECE example ($x_2$ by $x_3$ with $x_1 = 15$ and $x_4 = 3$).
15.15 Prediction surfaces based on SRS models, MRS approach, and proposed Bayesian approach for the ECE example ($x_1$ by $x_4$ with $x_2 = 0.02$ and $x_3 = 2000$).
15.16 Prediction surfaces based on SRS models, MRS approach, and proposed Bayesian approach for the ECE example ($x_2$ by $x_4$ with $x_1 = 15$ and $x_3 = 3000$).
15.17 Prediction surfaces based on SRS models, MRS approach, and proposed Bayesian approach for the ECE example ($x_3$ by $x_4$ with $x_1 = 15$ and $x_2 = 0.02$).
15.18 Histograms of uniformity at the experimental design settings (and $x_4 = 3$) for the ECE example.
15.19 Median prediction surface and posterior interval surfaces computed using proposed Bayesian approach for the ECE example ($x_1$ by $x_2$ with $x_3 = 3160$ and
List of Tables
11.1 Values of objectives and the total objective function (OF) for comparison of unconstrained control with perfect information (UCPI) versus constrained control with perfect information (CCPI). The abbreviations UCPI and CCPI are used in subsequent figures as well.
11.2 Values of objectives and the total objective function (OF) for the `single stage at a time' (SS) approach versus the multi-stage (MS) approach to control for both unconstrained control with perfect information (UCPI) and constrained control with perfect information (CCPI). The abbreviations UCPI and CCPI are used in subsequent figures as well.
13.1 General specification of the likelihood, priors, and hyperpriors used for the proposed Bayesian approach.
14.1 Likelihood, priors, and hyperpriors used in the proposed Bayesian approach for parallel site process control.
15.1 Likelihood, priors, and hyperpriors used in the proposed Bayesian approach for the Guo Sachs example.
15.2 Site parameter estimates based on MRS and proposed Bayesian approach for the Guo Sachs example.
15.3 Posterior intervals for uniformity at the experimental design values of the flow rates based on the proposed Bayesian approach for the Guo Sachs example.
15.4 Experimental design settings of machine parameters for the ECE class uniformity study.
15.5 Likelihood, priors, and hyperpriors used for the proposed Bayesian approach
Acknowledgements
Part I
Chapter 1
Introduction
The background part of the dissertation discusses various approaches to controlling a process step in isolation that have been considered in the literature, and then discusses the limited amount of research that has been devoted to serial and parallel multi-stage operation and control concepts. The background is not intended to be a comprehensive review, but rather just an introduction to the concepts used in other parts of the dissertation. Readers who are familiar with the basics of control theory and are interested only in the new ideas presented by this dissertation should read only parts II and III, which discuss serial and parallel multi-stage control respectively.
Chapter 2
Statistical Process Control (SPC) Charts
For statisticians, the idea of process control almost always means using statistical process control charts. In fact, if the process exhibits only stationary random variation that is not correlated across runs, it can be demonstrated, as Deming did in his funnel experiment (27), (28), that passive monitoring is the most efficient method of control. In his experiment, Deming shows that various adjustment schemes that try to compensate for the variation in past realizations of the process can only make the variation worse. However, in many situations SPC is not the optimal control action, as will be discussed in section 3.
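The effect described by the funnel experiment can be illustrated with a short simulation. The sketch below is only an illustration under stated assumptions: the process is pure independent Gaussian noise around a fixed target, and the adjustment scheme moves the setting by the full size of the last deviation (one of the compensating rules usually associated with the funnel experiment). The function name `funnel_experiment` and its parameters are hypothetical and do not come from the dissertation.

```python
import random
import statistics


def funnel_experiment(n_runs=20000, sigma=1.0, seed=0):
    """Compare no adjustment with a compensating rule on a stable process."""
    rng = random.Random(seed)
    setting = 0.0                      # current aim under the compensating rule
    untouched, adjusted = [], []
    for _ in range(n_runs):
        noise = rng.gauss(0.0, sigma)  # stationary, uncorrelated variation
        untouched.append(noise)        # passive monitoring: the aim is never moved
        result = setting + noise       # outcome when the aim sits at `setting`
        adjusted.append(result)
        setting -= result              # compensate for the deviation just observed
    return statistics.pvariance(untouched), statistics.pvariance(adjusted)


var_untouched, var_adjusted = funnel_experiment()
print(f"variance without adjustment: {var_untouched:.3f}")
print(f"variance with compensation:  {var_adjusted:.3f}")
```

Under these assumptions each adjusted outcome is the difference of two independent noise terms, so its variance is roughly twice that of the untouched process, which is the sense in which compensation makes the variation worse.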
version of SPC in any situation if possible.
Chapter 3
Advanced Process Control (APC) Overview
As mentioned in the previous section, SPC methods are not the answer for all situations. If there is nonstationary drift occurring in the process or if the response is correlated across runs, then advanced process control or active process control (APC) methods become the optimal form of control. APC methods use past observations to learn about how the process will vary in the future and compensate for the expected disturbance by adjusting some influential variable. Box, et al. (14) point out that the idea of waiting for a statistically significant deviation before acting, as is done in SPC, is not appropriate when there is a nonstationary inherent variation in the process, except if there is a cost associated with the control action. In general, the optimal control action should be based on the minimization of some cost functional such as the sum of squared deviations from target.
Before delving into the approaches to APC that are discussed in chapters 4, 5, and 6, the general concepts of rapid mode controllers, gradual mode controllers, run to run controllers, and batch to batch controllers will first be explored in the rest of this section.
3.1 Drift vs. Shift Controllers
The two major types of disturbances that are considered when trying to control a process are gradual drifts and sudden shifts. Hence the controllers designed to handle gradual drifts and sudden shifts are called gradual mode and rapid mode controllers respectively. A typical example given for gradual drift is that the machine gradually ages or goes out of adjustment, but there could be many other reasons for drift. Sudden shifts are often caused by some maintenance action or a change of operator. The primary emphasis in adjustment schemes is on gradual mode controllers since many processes are affected by at least mild nonstationary drift.
3.2 Run to Run Controllers
3.3 Reachability, Controllability, and Observability
Important properties of dynamic systems with respect to control include reachability, controllability, and observability. Reachability and controllability are very similar. They refer to the ability of a system to attain a particular state in a finite number of time steps. Specifically, reachability is the existence of a control law which allows the system to reach any particular state in a finite amount of time starting from any initial state. Controllability is the existence of a control law that allows the system state to get to the origin in a finite amount of time from any initial state. The complementary concept to controllability and reachability is the observability of a dynamic system. A system is observable if the initial state can be determined from the observed outputs over a finite number of time steps.
Precise mathematical conditions for whether a system is controllable, reachable, or observable have been developed in the control field. The most natural definition of these properties is in the context of state space models as will be given here, but there are many equivalent definitions in this framework as well as in the transfer function modeling framework. Consider the following state space model:
$$\begin{aligned} x_{k+1} &= A x_k + B u_k \\ y_k &= C x_k \end{aligned} \tag{3.1}$$

The evolution of the state variable $x$ is described by the first equation, which is called the system equation. The system equation shows how the current state is a function of the previous state and the input variable $u$. The actual output of the system $y$ is observed as a function of the state variable in the second equation, which is called the measurement equation. This dynamic system is reachable if the controllability matrix:

$$W_c = \begin{pmatrix} B & AB & A^2 B & \cdots & A^{n-1} B \end{pmatrix} \tag{3.2}$$

has rank $n$, where $n$ is the dimension of the state vector, and it is observable if the observability matrix:

$$W_o = \begin{pmatrix} C \\ CA \\ CA^2 \\ \vdots \\ CA^{n-1} \end{pmatrix} \tag{3.3}$$

has rank $n$.
3.4 Stability
A key requirement of a controlled system is the stability of the system. The stability of the system is related to the eigenvalues of the matrix $A$ in the state space model of equation 3.1. In a transfer function modeling approach, stability can be explored by looking at the poles and zeros of the polynomials defining the transfer function relationship. Stability is achieved if the poles lie in the left half of the complex plane or within the unit circle, depending on whether the model is formulated in continuous or discrete time. Many graphical tools can be used to investigate stability properties, including the Bode, Nyquist, and Nichols plots. The simplest stability criterion is that a bounded input results in a bounded output. However, other criteria for stability have been investigated, such as those proposed by Lyapunov or by Routh and Hurwitz. A successful control strategy that is used extensively in the engineering realm is to base the controller on these various stability criteria, as is done in pole placement design where the poles are manipulated to influence stability. The primary concern in this approach is the response of the system to various types of inputs such as a pulse, a sine wave, or some more complicated input. As a consequence, these controllers are not designed to explicitly optimize the process in the face of stochastic noise.
3.5 Parameter Estimation Properties
is just the requirement that $X^T X$ be nonsingular, where $X$ is the design matrix in a typical regression framework and the superscript $T$ denotes the transpose. Since large changes in the adjustment scheme do make for better parameter estimates, one strategy is to perturb the process deliberately and use these perturbations to get stable parameter estimates, at the cost of the immediate performance of the controller. This tradeoff between control and estimation goals is a central theme in adaptive control.
Chapter 4
Linear Model Based APC
There are many different approaches to APC in the linear model framework which are based on an array of concepts such as proportional integral derivative (PID) controllers, exponentially weighted moving averages (EWMA), transfer functions, stochastic modeling, Kalman filtering, and dynamic programming. However, many of these diverse concepts are inter-related, as will be discussed. Here, linear models refer to the model being linear in the parameters, so nonlinear transformations of the variables are permitted and are often used while still in the linear model framework.
4.1 PID Control
Proportional integral derivative (PID) controllers have been successfully used for years by control engineers. The continuous form of these controllers is the following:
$$X_t = k_0 + k_D \frac{d\varepsilon(t)}{dt} + k_P\, \varepsilon(t) + k_I \int_0^t \varepsilon(u)\, du \tag{4.1}$$

and the discrete analogue is:

$$X_t = k_0' + k_D' \nabla \varepsilon_t + k_P'\, \varepsilon_t + k_I' \sum_{j=1}^{t} \varepsilon_j \tag{4.2}$$

where $k_0$, $k_D$, $k_P$, $k_I$, $k_0'$, $k_D'$, $k_P'$, and $k_I'$ are just constants and $X_t$ is the setting of the control

These controllers are used by adjusting or tuning the constants until good performance is accomplished. This approach has proven to work fairly well in many situations. The obvious advantages of this type of control are the ease of implementation and formulation of the control rule. Also, PID controllers have been shown to be very robust to model misspecification. Tsung, et al. (67) give a good illustration of how the robustness of PI controllers can be a compelling reason to choose them over the minimum mean square error (MMSE) controllers (which are optimal if the model chosen is correct) that are discussed in later sections. The approach of PID controllers is not appropriate for all problems, but it serves as an adequate approximation in many situations, which is the reason for their popularity.
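The discrete rule of equation 4.2 can be sketched in a few lines. This is a minimal illustration, not the author's implementation: the function name `discrete_pid_settings`, the argument names, and the convention that the first difference is taken as zero at the first run are all assumptions made for the example.

```python
def discrete_pid_settings(errors, k0=0.0, k_d=0.0, k_p=0.0, k_i=0.0):
    """Return the setting X_t of the discrete PID rule for each deviation in `errors`."""
    settings = []
    running_sum = 0.0   # integral term: sum of deviations up to and including run t
    previous = None     # previous deviation, needed for the difference term
    for err in errors:
        running_sum += err
        # First difference of the deviation; assumed to be zero at the first run.
        diff = 0.0 if previous is None else err - previous
        settings.append(k0 + k_d * diff + k_p * err + k_i * running_sum)
        previous = err
    return settings


deviations = [1.0, 0.5, -0.25, 0.0]
print(discrete_pid_settings(deviations, k_p=-0.5, k_i=-0.25))
```

Tuning then amounts to trying different values of the four constants and judging the resulting closed-loop behavior, which is the trial-and-error use described above.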
4.2 EWMA Control
One of the most popular and enduring APC methods is the exponentially weighted moving average (EWMA) controller, which is based on EWMA forecasting. The definition of an EWMA, which can be used as a forecast of $y_{t+1}$ for an arbitrary time series variable $y_t$ with weight parameter $0 < \theta < 1$, is as follows:

$$\tilde{y}_t = (1-\theta)\,(y_t + \theta y_{t-1} + \theta^2 y_{t-2} + \theta^3 y_{t-3} + \cdots) \tag{4.3}$$

or equivalently in recursive form:

$$\tilde{y}_t = (1-\theta)\, y_t + \theta\, \tilde{y}_{t-1} \tag{4.4}$$
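The equivalence of the weighted-sum and recursive forms can be checked numerically. The sketch below is illustrative only; the names `ewma_recursive` and `ewma_weights`, the use of `theta` for the weight parameter, and the choice of a zero starting value are assumptions of the example.

```python
def ewma_recursive(y, theta, start=0.0):
    """Apply the recursive form: new = (1 - theta) * y_t + theta * previous."""
    smoothed = start
    out = []
    for value in y:
        smoothed = (1.0 - theta) * value + theta * smoothed
        out.append(smoothed)
    return out


def ewma_weights(theta, n_lags):
    """Weights (1 - theta) * theta**j placed on y_t, y_{t-1}, ..., y_{t-n_lags+1}."""
    return [(1.0 - theta) * theta ** j for j in range(n_lags)]


theta = 0.2
y = [3.0, 1.0, 4.0, 1.0, 5.0]
recursive_value = ewma_recursive(y, theta)[-1]
# Explicit weighted sum: the most recent observation gets the largest weight.
explicit_value = sum(w * v for w, v in zip(ewma_weights(theta, len(y)), reversed(y)))
print(recursive_value, explicit_value)
print(ewma_weights(theta, 3))
```

With a zero starting value the two computations agree up to rounding. A small weight parameter concentrates the weight on the most recent observation, while a value near one spreads it over many past observations.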
[Figure 4.1: EWMA weights for (a) $\theta = 0.2$, (b) $\theta = 0.5$, and (c) $\theta = 0.8$. Each panel plots the weight against the observation number, from $t$ back to $t-9$.]
$$y_t = c_t + A x_t + e_t \tag{4.5}$$

where $y_t$ is the response or output variable, $x_t$ is a vector of the adjustable process settings, $A$ is assumed to be a constant gain matrix which gives the coefficients of the relationship between $x_t$ and the response $y_t$, $e_t$ is the error in the model, and $c_t$ is a nonstationary drift component. A forecasting technique is applied to estimating the drift $c_t$ with the implicit assumption that the error $e_t$ that is not part of the random drift is negligible for each run. In the case of EWMA forecasting, the one step forecast of $c_{t+1}$, denoted $\hat{c}_{t+1}$, given in recursive form is:

$$\hat{c}_{t+1} = \tilde{c}_t = (1-\theta)\, c_t + \theta\, \tilde{c}_{t-1} \tag{4.6}$$

or

$$\hat{c}_{t+1} = (1-\theta)\,(y_t - A x_t) + \theta\, \tilde{c}_{t-1} \tag{4.7}$$

since the assumption is that:

$$y_t = c_t + A x_t \tag{4.8}$$

Then using the forecast $\hat{c}_{t+1}$ that is found by EWMA forecasting or some other technique, the recipe is found by replacing $y_t$ with its target value $T$ and solving the following equation:

$$T = \hat{c}_{t+1} + A x_{t+1} \tag{4.9}$$

If $x_{t+1}$ is a single process setting, the solution is unique unless an additional restriction is added to bound $x_{t+1}$. The possible solutions in the vector case are infinite unless there is a restriction on the bounds of $x_{t+1}$. In either the vector or the one dimensional case, optimization is usually needed. If there are any solutions that satisfy the above equation and the restrictions on $x_{t+1}$, then the choice between these solutions is usually based on the solution that minimizes the change in the recipe $x_{t+1}$ from its previous setting $x_t$. If no solutions satisfy the above equation (due to restrictions on the bounds of $x_{t+1}$), then the recipe is chosen to minimize

$$\left| T - \hat{c}_{t+1} - A x_{t+1} \right| \tag{4.10}$$
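For a single process setting, one run of this controller can be sketched as follows. This is a hedged illustration rather than the author's code: the function `ewma_controller_step`, its argument names, and the treatment of bounds by clipping are assumptions. Clipping is used because, for a scalar setting restricted to an interval, the clipped solution of equation 4.9 is the point that minimizes the absolute deviation in equation 4.10.

```python
def ewma_controller_step(y_t, x_t, c_prev, gain, target, theta, bounds=None):
    """One run of a scalar EWMA controller.

    Returns the next recipe and the updated drift estimate.
    """
    # Drift forecast, equation 4.7: smooth the observed offset y_t - gain * x_t.
    c_hat = (1.0 - theta) * (y_t - gain * x_t) + theta * c_prev
    # Recipe that puts the forecast response on target, equation 4.9.
    x_next = (target - c_hat) / gain
    if bounds is not None:
        low, high = bounds
        # With a bounded scalar setting, clipping minimizes |T - c_hat - gain * x|.
        x_next = min(max(x_next, low), high)
    return x_next, c_hat


# Hypothetical noise-free process with a constant offset of 2 and a gain of 2.
x, c_tilde = 0.0, 0.0
for run in range(50):
    y = 2.0 + 2.0 * x
    x, c_tilde = ewma_controller_step(y, x, c_tilde, gain=2.0, target=10.0, theta=0.5)
print(2.0 + 2.0 * x)   # response at the final recipe, which approaches the target of 10
```

In this noise-free example the drift estimate converges to the true offset, so the response converges to the target. With a drifting offset the estimate would lag behind the drift by an amount that depends on the weight parameter.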
4.3 PCC Control
The predictor corrector controller (PCC) was introduced by Butler, et al. (17) and is based on a double exponential filter. This more aggressive strategy may be more appropriate than a single EWMA for estimating process behavior that is characterized by an especially strong tendency to drift. The concept is to estimate not only the drift offset, but the trend in the drift as well. In the notation of Butler, et al. (17), the double filter examines the error of the measured response from the off-line developed model,

$$\mathrm{Delta}_t = y_t - \mathrm{Model}_{0t} \tag{4.11}$$

and estimates this error for the subsequent run with the exponentially smoothed quantity

$$\mathrm{FDelta}_t = (1-\theta)\, \mathrm{Delta}_t + \theta\, \mathrm{FDelta}_{t-1} \tag{4.12}$$

Then the trend of the error of this estimate,

$$\mathrm{PE}_t = \mathrm{Delta}_t + \mathrm{FDelta}_{t-1} \tag{4.13}$$

is estimated with the exponentially smoothed quantity

$$\mathrm{FPE}_t = (1-\theta)\, \mathrm{PE}_t + \theta\, \mathrm{FPE}_{t-1} \tag{4.14}$$

Together these two components are used as an estimate of the intercept or offset (where $\mathrm{FPE}_t$ is measuring the tendency for the intercept to drift), which we earlier denoted $\hat{c}_{t+1}$. So the final result is that the drift estimate is

$$\hat{c}_{t+1} = \mathrm{FDelta}_t + \mathrm{FPE}_t \tag{4.15}$$

4.4 Bayesian Rapid Mode Controller
Another approach to rapid mode control is an optimal Bayesian process controller as described by Sachs et al. (60), which assesses shifts and uses least squares to determine the shift magnitude, $d$, and the location of the shift in time, $m$, where the model is of the form:

$$z_t = a + e_t \quad \text{for } t \le m \tag{4.16}$$

for $t$ in region 1 (that is, up to some time $m$) and

$$z_t = a + d + e_t \quad \text{for } t > m \tag{4.17}$$

for $t$ in region 2 (that is, after time $m$), where $z_t$ is just the response minus the deterministic portion of the model without the intercept:

$$z_t = y_t - b x_t \tag{4.18}$$

where $b$ is just the slope, or what is sometimes called the sensitivity of $y_t$ to $x_t$. Then Bayesian sequential probabilities are used to assess the probability that a shift really occurred as data is gathered over time. The adjustment is scaled by the probability that a shift occurred, denoted $f_t$, and thus the control adjustment is not $d$ but instead $d f_t$, so as to avoid over-adjustment.
4.5 TFSM Control
An approach that has been used to derive minimum mean square error controllers is to use transfer functions to model the dynamics of the process while specifying the error structure with stochastic time series models. The general framework provided by the transfer function / stochastic modeling strategy (TFSM) can be shown to encompass many other control methods.
of the system. A transfer function representation is the discrete version of the general linear differential equation, which results in a general linear difference equation given by:

$$(1 + \xi_1 \nabla + \xi_2 \nabla^2 + \cdots + \xi_r \nabla^r)\, Y_t = g\,(1 + \eta_1 \nabla + \eta_2 \nabla^2 + \cdots + \eta_s \nabla^s)\, X_{t-b} \tag{4.19}$$

where $Y$ is the deviation from some target of the response variable, $X$ is the deviation from some datum of the adjustable variable, $b$ is the amount of pure delay, $g$ is the steady state gain in $Y$ that will eventually be achieved after a unit change in $X$, $(r, s)$ is the order of the process, and $\nabla$ is a difference operator such that:

$$\nabla Y_t = Y_t - Y_{t-1} \tag{4.20}$$
An alternative way to write the transfer function relationship between a response variable $Y$ and an adjustable variable $X$ is as follows:

$$(1 + \delta_1 B + \delta_2 B^2 + \cdots + \delta_r B^r)\, Y_t = g\,(1 + \omega_1 B + \omega_2 B^2 + \cdots + \omega_s B^s)\, X_{t-b} \tag{4.21}$$

where $B$ is the backshift operator such that:

$$B^k Y_t = Y_{t-k} \tag{4.22}$$

Sometimes the transfer function relationship is just written as:

$$\delta(B)\, Y_t = g\, \omega(B)\, X_{t-b} \tag{4.23}$$

where $\delta(B)$ and $\omega(B)$ are the polynomials defined in equation 4.21 of the backshift operator $B$. The same kind of transfer function model could also hold as the relationship of the response variable $Y$ and other adjustable inputs or perhaps an observable variable $Z$.

The error structure is given by the auto-regressive integrated moving average (ARIMA) stochastic time series models. Integrated means that an appropriately differenced quantity is sometimes needed to get a stationary process in the regular auto-regressive moving average (ARMA) form. The following is the form of the ARIMA model:
$$(1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p)\, \nabla^d N_t = (1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q)\, e_t \tag{4.24}$$

where $N_t$ is the error that is being modeled, $e_t$ is white noise, $B$ is the backshift operator, $(p, q)$ is the order of the error process, $d$ is the degree or number of times that differencing is required to get stationarity, and $\nabla$ is again the difference operator. White noise means that the $e_t$ are independent and identically distributed (usually with a Gaussian distribution), and the $e_t$ are often referred to as random shocks or innovations. It is important to realize that this error structure allows for nonstationarity in $N_t$. Sometimes the ARIMA model is just written as:

$$\phi(B)\, \nabla^d N_t = \theta(B)\, e_t \tag{4.25}$$

where $\phi(B)$ and $\theta(B)$ are the polynomials in the backshift operator $B$ defined by equation 4.24.

Transfer function models are used with stochastic models by assuming the influence of the deterministic transfer function relationships and the stochastic effects are additive. Letting $\varepsilon$ denote the deviation from target, the model is thus:

$$\varepsilon_{t+f+1} = L_1^{-1}(B)\, L_2(B)\, X_t + N_{t+f+1} \tag{4.26}$$

where the noise component $N_t$ is assumed to be independent of $X_t$ and where the polynomials $L_1(B)$ and $L_2(B)$ can be determined from equation 4.23. The solution to finding the MMSE control action is to substitute the MMSE forecast $\hat{N}_{t+f+1}$ for $N_{t+f+1}$ and thus effectively assume that the error in this forecast is equal to its expected value of zero. This procedure can be justified by the concept of certainty equivalence discussed in section 4.8. The resulting equation is:

$$0 = L_1^{-1}(B)\, L_2(B)\, X_t + \hat{N}_{t+f+1} \tag{4.27}$$

where we have also let $\varepsilon_{t+f+1} = 0$ for the purposes of solving for the control action that leads to no deviation from target.
Now $\varepsilon_t$ is a forecast error and thus can be expressed as a polynomial in the backshift operator of past innovations $e_t$:

$$\varepsilon_t = L_4(B)\, e_t \tag{4.28}$$

Also the forecast $\hat{N}_{t+f+1}$ can be expressed as a polynomial in the backshift operator of past innovations:

$$\hat{N}_{t+f+1} = L_3(B)\, e_t = L_3(B)\, L_4^{-1}(B)\, \varepsilon_t \tag{4.29}$$

By substituting equation 4.29 into equation 4.27 and solving for $X_t$, we obtain the optimal control action that cancels out the disturbances:

$$X_t = -\frac{L_1(B)\, L_3(B)}{L_2(B)\, L_4(B)}\, \varepsilon_t \tag{4.30}$$

or in terms of adjustment:

$$\nabla X_t = (1 - B)\, X_t = -\frac{L_1(B)\, L_3(B)\,(1 - B)}{L_2(B)\, L_4(B)}\, \varepsilon_t \tag{4.31}$$

where $\nabla$ denotes the difference operator. This general solution to finding the MMSE control action is treated in detail by Box and Jenkins (13).
Case 1. Now let's consider a special case of this general TFSM methodology with the following simple transfer function:

$$Y_t = g X_t \tag{4.32}$$

Let the error be described by a differenced series which is a one-lag moving average series. This is a special case of the general ARIMA(p,d,q) denoted ARIMA(0,1,1) or simply IMA(1,1).

$$N_t - N_{t-1} = e_t - \theta e_{t-1} \quad \text{where } e_t \sim \text{iid}(0, \sigma_e^2) \tag{4.33}$$

The complete model of the deterministic and stochastic components is:

$$\varepsilon_{t+1} = Y_{t+1} + N_{t+1} \tag{4.34}$$

Now it can be shown for this error structure that:

$$N_{t+1} = \tilde{N}_t + e_{t+1} \tag{4.35}$$

and since $e_{t+1}$ is unpredictable random noise, the MMSE forecast is $\hat{N}_{t+1} = \tilde{N}_t$ where, as noted earlier, the tilde denotes an exponentially weighted moving average (EWMA). Also note that the forecast error $e_{t+1}$ in this case is exactly the same as $\varepsilon_{t+1}$. Now equation 4.34 becomes:

$$0 = g X_{t+1} + \tilde{N}_t \tag{4.36}$$

and thus the control action using the general TFSM methodology is:

$$X_{t+1} = -\frac{1}{g}\, \tilde{N}_t \tag{4.37}$$

or in terms of adjustment:

$$\nabla X_{t+1} = -\frac{(1 - \theta)}{g}\, \varepsilon_t \tag{4.38}$$

The special case of TFSM just described is related to the EWMA controller described earlier in section 4.2. Consider the case of one adjustable variable $X_t$ and one response variable $\varepsilon_t$ with a target of zero. Recall that the drift forecast $\hat{c}_{t+1}$ for the EWMA controller is:

$$\hat{c}_{t+1} = (1 - \theta)(\varepsilon_t - g X_t) + \theta\, \tilde{c}_{t-1} = \tilde{N}_t \tag{4.39}$$

This EWMA forecast is identical to the result found in the special case of TFSM described by equations 4.32, 4.33, and 4.34. Thus the MMSE forecast of an IMA(1,1) model is always just the EWMA forecast. In addition, the control action of the EWMA control algorithm is found by solving the following equation:

$$T = 0 = \hat{c}_{t+1} + g X_{t+1} \tag{4.40}$$

which results in the same control action given above in equations 4.37 and 4.38 for the special case of TFSM discussed above.
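The behavior of the Case 1 controller can be checked numerically. The following is a minimal sketch, not taken from the source: it simulates the pure-gain process of equation 4.32 with IMA(1,1) noise from equation 4.33, applies the adjustment rule of equation 4.38 with the true parameters, and returns both the shocks and the controlled deviations. The function name `simulate_ima_control`, the zero starting values, and the Gaussian shocks are all illustrative assumptions.

```python
import random


def simulate_ima_control(theta, g, n_runs, sigma=1.0, seed=0):
    """Simulate Case 1: pure-gain process Y_t = g*X_t with IMA(1,1) noise.

    Returns (shocks, deviations): the white-noise shocks e_t and the
    controlled deviations from target eps_t for t = 1..n_runs.
    """
    rng = random.Random(seed)
    noise = 0.0       # N_0, assumed zero so the process starts on target
    prev_shock = 0.0  # e_0, assumed zero
    x = 0.0           # X_1, initial setting of the adjustable variable
    shocks, deviations = [], []
    for _ in range(n_runs):
        shock = rng.gauss(0.0, sigma)
        # IMA(1,1) disturbance (4.33): N_t - N_{t-1} = e_t - theta*e_{t-1}
        noise = noise + shock - theta * prev_shock
        # Deviation from target (4.34) with Y_t = g*X_t
        eps = g * x + noise
        # MMSE adjustment (4.38): X_{t+1} - X_t = -((1 - theta)/g) * eps_t
        x = x - (1.0 - theta) / g * eps
        shocks.append(shock)
        deviations.append(eps)
        prev_shock = shock
    return shocks, deviations


if __name__ == "__main__":
    e, eps = simulate_ima_control(theta=0.6, g=2.0, n_runs=200)
    worst = max(abs(a - b) for a, b in zip(e, eps))
    print(f"largest |eps_t - e_t| over 200 runs: {worst:.2e}")
```

Under these assumptions the controlled deviation reproduces the shock sequence up to rounding error, which is the statement above that the forecast error equals the deviation from target. With a mistuned `theta` or `g` in the adjustment rule the two sequences would no longer agree.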
In fact, this same controller case can also be viewed as a special case of the PID controller described in section 4.1. Equation 4.37 can be rewritten by summing the adjustments in equation 4.38 (with the system initially on target) to get:

$$X_{t+1} = X_0 - \frac{(1 - \theta)}{g} \sum_{j=1}^{t} \varepsilon_j \tag{4.41}$$

Case 2. Now consider another special case of TFSM. Suppose that just a first order discrete dynamical system is appropriate. So the deterministic transfer function model is the following:

$$\nabla Y_t = \frac{1}{\xi}\,(g X_{t-1} - Y_t) \quad \text{or} \quad (1 + \xi \nabla)\, Y_t = g B X_t \tag{4.42}$$

Any adjustment in $X_t$ will take full effect at time $t+1$. Let the noise again be described by equation 4.33. The deviation from target after any adjustment results in the following relationship:

$$\varepsilon_{t+1} = Y_{t+1} + N_{t+1} = \frac{g}{(1 + \xi \nabla)}\, X_t + N_{t+1} \tag{4.43}$$

Using the same EWMA forecast that was shown in Case 1 to be the MMSE forecast for this error model, the MMSE control action can be derived as:

$$\nabla X_t = -\frac{(1 - \theta)}{g}\, \big((1 + \xi)\, \varepsilon_t - \xi\, \varepsilon_{t-1}\big) \tag{4.44}$$

Summing these adjustments (assuming the system is initially on target) gives the following form:

$$X_t = -\frac{(1 - \theta)\, \xi}{g}\, \varepsilon_t - \frac{(1 - \theta)}{g} \sum_{j=1}^{t} \varepsilon_j \tag{4.45}$$

which can be seen to be a discrete PI controller, which is just a special case of a discrete PID controller with appropriately chosen constants. In some cases the resulting MMSE controller from the general TFSM methodology is just a particular form of a discrete PID controller as illustrated here, but the TFSM methodology actually encompasses a much broader class of controllers than can be described by PID controllers (13).
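The step from the adjustment form to the PI form in Case 2 can be verified numerically. The sketch below is illustrative only: it writes `theta` for the moving average parameter of equation 4.33, `xi` for the dynamics parameter of equation 4.42, and `g` for the gain, and assumes the system starts on target (a zero initial setting and a zero initial deviation). The function names are hypothetical.

```python
def pi_setting(errors, theta, xi, g):
    """Closed-form PI setting X_t of equation 4.45 for errors eps_1..eps_t."""
    gain = (1.0 - theta) / g
    # Proportional term on the latest error plus integral term on all errors.
    return -gain * xi * errors[-1] - gain * sum(errors)


def summed_adjustments(errors, theta, xi, g):
    """X_t obtained by accumulating the adjustments of equation 4.44.

    Assumes X_0 = 0 and eps_0 = 0 (system initially on target).
    """
    gain = (1.0 - theta) / g
    x, prev = 0.0, 0.0
    for eps in errors:
        x += -gain * ((1.0 + xi) * eps - xi * prev)
        prev = eps
    return x


if __name__ == "__main__":
    errs = [0.5, -1.2, 0.3, 2.0]
    print(pi_setting(errs, theta=0.6, xi=0.8, g=2.0))
    print(summed_adjustments(errs, theta=0.6, xi=0.8, g=2.0))
```

The two functions agree because the lagged terms in the sum of adjustments telescope, leaving one proportional term on the latest error and one integral term on the accumulated errors.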
noise inputs in the analysis" according to Astrom (1). More detailed discussions of how to derive TFSM control algorithms are given by Box, Jenkins, and MacGregor (14), Box and Jenkins (13), MacGregor (46), and Wilson (74). Multivariate versions of this control methodology are discussed in Harris and MacGregor (32).
4.6 Constrained Control
Constrained control not only attempts to minimize the deviation from target of the response or output variable, but also has as its objective to constrain or restrict the variation in the adjustable variable. The objective function that is minimized is the following:

$$\sigma_\varepsilon^2 + \lambda\, \sigma_X^2 \tag{4.46}$$

where $\lambda$ is a weighting parameter between the two objectives. It is often the case that substantial reductions in variation of the adjustable variable can be achieved at the expense of only a minor increase in the mean squared error from target of the output variable. Deriving constrained controllers can be difficult, but it is accomplished and explored by several authors such as (13), (14), and (8) by extending the concepts of TFSM control. A suboptimal but simpler approach is Clarke's constrained controller (20), (21). As will be discussed in section 4.8, another approach to constrained control is to use dynamic programming, which provides an explicit methodology for the desired weighted optimization problem.
4.7 Spectral Factorization
to errors in the specifications in the parameters as well as errors in specifying the model form.
4.8 Dynamic Programming Based Control
Dynamic programming is another powerful technique that has been used extensively for solving process control problems, as illustrated in detail by Bertsekas (9), Astrom (1), and others. Dynamic programming uses state space modeling to model the process and then iteratively minimizes an objective function or cost functional. The typical problem for linear systems and quadratic costs can be stated as finding the control setting which minimizes:

$$J(x_0) = E_{w_k}\left\{ x_N' Q_N x_N + \sum_{k=0}^{N-1} \left( x_k' Q_k x_k + u_k' R_k u_k \right) \right\} \tag{4.47}$$

subject to the system equation:

$$x_{k+1} = A_k x_k + B_k u_k + w_k \tag{4.48}$$

where $w_k$ are independent random disturbances, $x_k$ is the response or state variable, $u_k$ is the adjustable variable, $Q_k$ and $R_k$ are known cost matrices, and where $A_k$ and $B_k$ are known parameter matrices for modeling the change in $x_k$. The transpose of a vector or a matrix is denoted by the $'$ notation. Notice the cost functional allows incorporation of constrained control.
The dynamic programming problem is solved iteratively with:

$$J_N(x_N) = x_N' Q_N x_N \tag{4.49}$$

and

$$J_k(x_k) = \inf_{u_k \in U_k(x_k)} E_{w_k}\left\{ x_k' Q_k x_k + u_k' R_k u_k + J_{k+1}(A_k x_k + B_k u_k + w_k) \right\} \tag{4.50}$$

for $k = 0, 1, 2, \ldots, N-1$. The basic steps of the algorithm are:
1. Start with time step $N$ and get $J_N(x_N) = x_N' Q_N x_N$.
2. For $k = N-1$, find the minimum associated with $J_k(x_k)$, typically by taking a derivative.
3. For $k = N-1$, evaluate $J_k(x_k)$ at the minimum found in the last step.
4. Repeat steps 2 and 3 for $k = N-2, N-3, \ldots, 0$.
5. The control rule is the set of minimizers found in step 2.
Central to the logic of the iterative algorithm is the principle of optimality given by Bellman (7), who was an early pioneer in this field. The principle of optimality states that if the optimal policy for all time periods $t = 0, \ldots, N$ is found, then the truncated version of this policy for time periods $t = i, \ldots, N$ is also optimal for the associated subproblem.

Using the algorithm, the solution of the typical linear systems with quadratic criteria problem described by equations 4.47 and 4.48 is:

$$\mu_k(x_k) = L_k x_k \quad \text{where } L_k = -(B_k' S_{k+1} B_k + R_k)^{-1} B_k' S_{k+1} A_k \tag{4.51}$$

The control action $\mu_k(x_k)$ is the optimal setting for the adjustable variable $u_k$. The matrices $S_k$ are given recursively by:

$$S_N = Q_N \tag{4.52}$$

and

$$S_k = A_k' \left[ S_{k+1} - S_{k+1} B_k (B_k' S_{k+1} B_k + R_k)^{-1} B_k' S_{k+1} \right] A_k + Q_k \tag{4.53}$$

Equation 4.53 is called the discrete matrix Riccati equation and is the discrete time analog of a Riccati differential equation.
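The backward recursion of equations 4.51 to 4.53 is short enough to sketch for a scalar system with constant coefficients. This is an illustrative sketch under stated assumptions rather than an implementation from the source: the function name `lq_gains` and the scalar arguments `a`, `b`, `q`, `r`, and `q_final` (standing in for the matrices $A$, $B$, $Q$, $R$, and $Q_N$) are hypothetical, and the denominator $b^2 S_{k+1} + r$ is assumed to stay positive.

```python
def lq_gains(a, b, q, r, q_final, horizon):
    """Backward Riccati recursion (equations 4.51-4.53) for a scalar system.

    System: x_{k+1} = a*x_k + b*u_k + w_k, stage cost q*x_k^2 + r*u_k^2,
    terminal cost q_final*x_N^2.  Returns (gains, s) where gains[k] is L_k
    and s[k] is S_k.
    """
    s = [0.0] * (horizon + 1)
    gains = [0.0] * horizon
    s[horizon] = q_final                      # equation 4.52
    for k in range(horizon - 1, -1, -1):
        s_next = s[k + 1]
        denom = b * s_next * b + r            # B'S_{k+1}B + R, assumed > 0
        gains[k] = -(b * s_next * a) / denom  # equation 4.51
        # equation 4.53
        s[k] = a * (s_next - s_next * b * b * s_next / denom) * a + q
    return gains, s


if __name__ == "__main__":
    gains, s = lq_gains(a=1.0, b=1.0, q=1.0, r=1.0, q_final=1.0, horizon=10)
    for k in (9, 5, 0):
        print(f"k={k}: L_k={gains[k]:.6f}, S_k={s[k]:.6f}")
```

With no penalty on the control (`r = 0`) the recursion returns the gain `-a/b` at every step, so the controller cancels the predicted state completely. With a positive `r` the gains are smaller in magnitude, which is how the cost functional incorporates constrained control.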
If the matrices involved in the discrete matrix Riccati equation are constant and thus can be referred to without their subscripts, then as $k$ goes to negative infinity the solution of the $S_k$ matrices tends to a `steady-state' $S$ found by solving the algebraic matrix Riccati equation:

$$S = A' \left[ S - S B (B' S B + R)^{-1} B' S \right] A + Q \tag{4.54}$$

Furthermore, the solution of the problem of a linear system with quadratic criteria and constant matrices $A$, $B$, $Q$, and $R$ can be approximated by:

$$\mu(x) = L x \quad \text{where } L = -(B' S B + R)^{-1} B' S A \tag{4.55}$$

where $S$ is the `steady-state' matrix solved for in equation 4.54.

When solving dynamic programming control problems, it is sometimes the case that the optimal control policy where $w_k$ are stochastic disturbances is the same as the optimal control policy for the deterministic problem where $w_k$ are not random but are instead known and equal to their expected values. This property is called certainty equivalence, and section 4.5 made use of this property to derive MMSE controllers. If certainty equivalence holds, the control law only depends on the expected value of the disturbance and not on other aspects of its distribution, and hence dealing with the whole distribution is unnecessary.
Dynamic programming can be generalized using a more complicated system equation, which is also termed a state space model. The technique of adding other state variables to the original state vector $x_k$ is called state augmentation. This technique allows the problem to include correlated disturbances as shown in Bertsekas (9). For instance, a delayed first order dynamics model with an IMA(1,1) noise model is solved by MacGregor (46) using dynamic programming with state augmentation. MacGregor also obtains the same solution using a generalized version of the Wiener Hopf techniques described by Wilson (74).
Another key concept that can be embedded into dynamic programming is the problem of imperfect state information. The typical control problem for a linear system with quadratic costs can be expanded to the case where the response or state variable is seen imperfectly with noise and is never actually observed. So the problem involves minimizing the cost functional of equation 4.47 subject to the system equation 4.48 and additionally the following measurement equation:

$$y_k = C_k x_k + v_k \tag{4.56}$$

where $v_k$ is a random disturbance which may depend only on the current $x_k$ and previous $u_{k-1}$, $y_k$ is the quantity that is actually measured, which is $x_k$ measured with noise, and $C_k$ is a known parameter matrix for each $k$. The information vector keeps track of what is known:

$$I_0 = y_0; \qquad I_k = (y_0, y_1, \ldots, y_k, u_0, u_1, \ldots, u_{k-1}) \quad \text{for } k = 1, 2, \ldots, N-1 \tag{4.57}$$

The algorithm changes to become a function of the known information vector rather than the unknown $x_k$, and the expectation in this algorithm is now conditional on $I_{N-1}$ and $u_{N-1}$.

The solution of the problem of a linear system with quadratic criterion with state observed imperfectly differs from the perfect state information case only in that $x_k$ is replaced by the estimator $E[x_k \mid I_k]$. That is, the control law is identical to equation 4.51 except that:

$$\mu_k(I_k) = L_k\, E[x_k \mid I_k] \tag{4.58}$$

The $L_k$ matrices are defined the same and the $S_k$ matrices are again defined by equations 4.52 and 4.53. In this case of imperfect state information, the increase in the cost functional evaluated at the optimal control law is directly attributable to a term that can be identified as estimation error.
The above problem can also be viewed from the perspective of a pure estimation problem. The estimation problem is to find the estimator which is closest to the true state given the information available. Using the traditional least squares criterion, finding the closest estimator means finding $\hat{x}(I)$ which minimizes:

$$E_{x,I}\left\{ (x - \hat{x}(I))'\, M\, (x - \hat{x}(I)) \right\} \tag{4.59}$$

where $M$ is a positive definite symmetric matrix. It turns out that the solution to this problem is also the conditional expectation $E_x[x \mid I]$. This dual result that the optimal control law from the control perspective and the optimal estimator from the estimation perspective are identical in this case is called the separation theorem for linear systems with quadratic criteria. This theorem implies that controllers can be designed in two parts: an actuator corresponding to the control action of multiplying by $L_k$ (which can be found by solving the simpler perfect state information case) and an estimator component which involves finding $E[x_k \mid I_k]$ (which can be found without being concerned with the effect of

The next hurdle is finding $E[x_k \mid I_k]$, but in some cases the result is known. In the special case where $w_k$, $v_k$, and $x_0$ are from a spherically invariant distribution (the Gaussian distribution is the prime example), the Kalman filter gives the recursive solution (37). For the general case where the distributions are not spherically invariant, using a linear least squares estimator instead of $E[x_k \mid I_k]$ is still optimal in the class of estimators that are linear functions of the state. Furthermore, the recursive Kalman filter can also be used to find the least squares estimator by using it rather than $E[x_k \mid I_k]$ in the iterative Kalman filter algorithm.
Consider the system equation 4.48 with $B_k$ constant over time as $B$ and the measurement equation 4.56. Let $w_k \sim N(0, W_k)$ and $v_k \sim N(0, V_k)$. Then the recursive Kalman filter equations are given by a set of predictor equations:

$$\hat{x}^-_{k+1} = A_k \hat{x}_k + B u_k \tag{4.60}$$

$$P^-_{k+1} = A_k P_k A_k' + W_k \tag{4.61}$$

and a set of corrector equations:

$$K_k = P^-_k C_k' (C_k P^-_k C_k' + V_k)^{-1} \tag{4.62}$$

$$\hat{x}_k = \hat{x}^-_k + K_k (y_k - C_k \hat{x}^-_k) \tag{4.63}$$

$$P_k = (I - K_k C_k) P^-_k \tag{4.64}$$

where $I$ is an appropriately dimensioned identity matrix. The recursion starts with initial estimates $\hat{x}^-_0$ and $P^-_0$ of the state and estimation error covariance, respectively. The Kalman gain matrix $K_k$ is central to the Kalman filter since it determines the relative weight or importance of the predicted state versus the measured error in this prediction. If the covariance matrices $W_k$ and $V_k$ are constant over time, then the estimation error covariance $P_k$ and
Chapter 5
Nonlinear Model Based APC
Often nonlinear behavior can be approximated by a linear model and the techniques of the previous chapter are appropriate. However, there are approaches that more explicitly try to model nonlinear behavior in an effort to achieve more viable control schemes. Nonlinear model based approaches include using neural networks, wavelets, and the more complicated general form of dynamic programming.
5.1 Neural Networks
There are many so-called `learning' algorithms that have been applied to run to run control. The most prominent of these are based on neural networks, which generally use one hidden layer to emulate the effect of latent variables. This results in the following model:

$$y_k = g^{(2)}\left( \sum_{j=1}^{t} w_{2,j,k}\; g^{(1)}\left( \sum_{i=1}^{s} w_{1,i,j}\, x_i + \beta_j \right) + \gamma_k \right) \tag{5.1}$$

The function $g^{(1)}$ is typically an S-shaped sigmoidal function similar to a logistic function, and $g^{(2)}$ may also be a nonlinear function but is sometimes just a linear function. The $w_{1,i,j}$ and the $w_{2,j,k}$ are both sets of weights or coefficients applied to the inputs $x_i$ and to the hidden layer outputs, while the bias terms $\beta_j$ and $\gamma_k$ act as constant or intercept terms. Mozumder, et al. (50) and Stefani, et al. (66) investigate another similar layered model.
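A forward evaluation of the one-hidden-layer model in equation 5.1 can be sketched in a few lines. The code below is illustrative only: the function names, the use of nested lists for the two weight sets, the logistic choice for the hidden layer function, and the linear default for the output function are assumptions made for the example.

```python
import math


def logistic(z):
    """Logistic sigmoid, one common choice for the hidden layer function."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(inputs, w1, bias1, w2, bias2, g2=lambda z: z):
    """Evaluate a one-hidden-layer network in the form of equation 5.1.

    w1[i][j] : weight from input i to hidden unit j
    w2[j][k] : weight from hidden unit j to output k
    bias1[j], bias2[k] : bias (intercept) terms of the two layers
    g2 : output function, linear by default
    """
    hidden = [
        logistic(sum(inputs[i] * w1[i][j] for i in range(len(inputs))) + bias1[j])
        for j in range(len(bias1))
    ]
    return [
        g2(sum(hidden[j] * w2[j][k] for j in range(len(hidden))) + bias2[k])
        for k in range(len(bias2))
    ]


if __name__ == "__main__":
    out = forward(
        inputs=[0.2, -0.4],
        w1=[[1.0, -1.0], [0.5, 2.0]],
        bias1=[0.0, 0.1],
        w2=[[1.5], [-0.7]],
        bias2=[0.25],
    )
    print(out)
```

Training the weights, for example by back-propagation, is a separate step that this sketch does not attempt.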
is not interpretable and thus an underlying understanding of the process is not gained. Also, care must be taken not to overfit the model. If $p$ is the number of parameters in a particular model, overfitting can be tempered by invoking a penalty for extra parameters, that is, by minimizing an objective function such as $E[(y - \hat{y})^2] + \lambda p^2$ (where $\lambda$ represents the relative weighting of the two goals) over the possible models instead of the typical objective function of $E[(y - \hat{y})^2]$. The minimization can be accomplished through different methods, but the most common is back-propagation, which just involves taking the derivative with respect to the unknown weights to perform the usual calculus minimization. It is also necessary to set aside a portion of the data for independent cross-validation of the final model chosen.
5.2 Wavelets
A promising strategy for modeling highly nonlinear data is the use of wavelets. Some work is being done by Rying, et al. (57) to apply the concepts of wavelets to develop control systems. This recently developed mathematical technique is able to represent complicated nonlinear functions very accurately and somewhat parsimoniously using wavelet basis functions.
5.3 General DP
In section 4.8, the dynamic programming algorithm was presented for the case of a linear system with quadratic costs. However, the most general form of dynamic programming does not require a linear system or quadratic costs. The general cost functional that is minimized is:

$$J(x_0) = E_{w_k}\left\{ g_N(x_N) + \sum_{k=0}^{N-1} g_k(x_k, \mu_k(x_k), w_k) \right\} \tag{5.2}$$

subject to the system equation:

$$x_{k+1} = f_k(x_k, \mu_k(x_k), w_k) \tag{5.3}$$

The algorithm used to solve this problem uses:

$$J_N(x_N) = g_N(x_N) \tag{5.4}$$

and

$$J_k(x_k) = \inf_{u_k \in U_k(x_k)} E_{w_k}\left\{ g_k(x_k, u_k, w_k) + J_{k+1}\left[ f_k(x_k, u_k, w_k) \right] \right\} \tag{5.5}$$
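When the states, controls, and disturbances each take only finitely many values, the backward recursion of the general algorithm can be carried out by direct enumeration. The following is a minimal sketch under that assumption; the function `solve_dp` and its argument conventions are hypothetical and are not taken from the source. The terminal cost-to-go is set first, and each earlier stage takes the minimum over controls of the expected stage cost plus the next cost-to-go.

```python
def solve_dp(states, controls, disturbances, f, g, g_final, horizon):
    """Backward induction for the general DP recursion on finite sets.

    states       : list of admissible states
    controls     : function x -> list of admissible controls U(x)
    disturbances : list of (w, probability) pairs
    f(k, x, u, w): next state (must lie in `states`)
    g(k, x, u, w): stage cost;  g_final(x): terminal cost
    Returns (cost_to_go, policy) as lists of dicts indexed by stage.
    """
    cost = [dict() for _ in range(horizon + 1)]
    policy = [dict() for _ in range(horizon)]
    for x in states:
        cost[horizon][x] = g_final(x)
    for k in range(horizon - 1, -1, -1):
        for x in states:
            best_u, best_val = None, float("inf")
            for u in controls(x):
                # Expected stage cost plus cost-to-go from the next state.
                expected = sum(
                    p * (g(k, x, u, w) + cost[k + 1][f(k, x, u, w)])
                    for w, p in disturbances
                )
                if expected < best_val:
                    best_u, best_val = u, expected
            cost[k][x] = best_val
            policy[k][x] = best_u
    return cost, policy


if __name__ == "__main__":
    # Toy problem: integer state clamped to [-3, 3], unit moves, quadratic cost.
    clamp = lambda z: max(-3, min(3, z))
    cost, policy = solve_dp(
        states=list(range(-3, 4)),
        controls=lambda x: [-1, 0, 1],
        disturbances=[(-1, 0.25), (0, 0.5), (1, 0.25)],
        f=lambda k, x, u, w: clamp(x + u + w),
        g=lambda k, x, u, w: x * x,
        g_final=lambda x: x * x,
        horizon=3,
    )
    print(policy[0])
```

Enumeration of this kind grows quickly with the number of states and stages, which is one reason the linear system with quadratic cost case, with its closed-form gains, is used wherever it applies.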
Chapter 6
Adaptive APC
Up to this point, it has been assumed that the parameters of the process are known at least approximately. Control schemes that use measurements on-line to estimate the parameters and improve the control of the process are called adaptive controllers. The controller is only considered adaptive if the controller uses "measurements with advantage" according to the definition adopted by Bertsekas (9), but there is some controversy about the most appropriate definition of adaptive. Updating the drift or intercept term is not considered adaptive since it is basically just adjusting for error. The parameters that adaptive schemes estimate on-line are the slopes, or what are sometimes called the sensitivities, of the process.
6.1 EVOP and DOE Approaches
Control algorithms based on the sequential design of experiments and general response surface methods or sometimes locally weighted regression actually induce small perturbations in the process so as to explore the design space. If no perturbations are introduced as in the traditional control scenario then there is usually a lot of data; but this data is not very worthwhile since it is all gathered basically at or near one setpoint since most algorithms have as an objective to minimize the changes in the recipe in order to maintain a stable process.
most appropriate near the beginning of a process when it is desired to optimize the process on-line rather than using off-line experiments to characterize the process. The proprietary optimization software Ultramax uses a similar approach of perturbing the process in order to learn about the parameters of the process. An early version of this strategy is the evolutionary operation mode (EVOP) of Box and Draper (12), which has been used extensively in the chemical industry. Initially performance is sacrificed in these algorithms by introducing the non-optimal perturbations either at the beginning of the process or after a disturbance, but at the same time the cost of off-line experimentation is greatly reduced.
6.2 Avoiding Scrap: Some Clever Approaches
Mozumder, et al. (50) and Stefani, et al. (66) formalize the industry practice of using monitor wafers to avoid producing scrap. When the process is deemed out of control by generalized SPC, perturbations are used to requalify or tune the process on the monitor wafers instead of the actual wafers. Another approach is an enhanced version of the Bayesian rapid mode controller described earlier in section 4.4. The natural perturbation in the inputs necessitated by a shift in the process is used as an opportunity to update the slope term. A few runs under just an intercept adjustment are used to get data for a new estimate.
6.3 Self Tuning Regulators
6.4 Dynamic Programming with Unknown Parameters
Chapter 7
Complementary Roles of SPC and APC
In the past, process monitoring and active process adjustment were considered mutually exclusive alternatives. However, several authors including Box and Kramer (15) have now given convincing arguments that these two successful control strategies actually complement each other and can be used most successfully in tandem. Vander Wiel, et al. in (70) and Tucker, et al. in (68) have given one strategy called algorithmic statistical process control (ASPC) for combining active adjustment and process monitoring. Their control algorithms are based on the techniques of TFSM discussed in section 4.5. A simple analogy is presented of controlling the operation of a car. On one hand, active adjustment is needed in the form of steering, braking, and shifting gears to keep the car on the road; but just as important is the detection and correction of out of control conditions such as a flat tire or signs that the car is breaking down. However, the most appropriate way to monitor the process while active adjustment is taking place is left as an open question.
can not be detected and corrected. On the other hand, the main complaint of control engineers is that control charting is inefficient. One common misconception is that the only data that can be controlled is data that is stationary, independent, and identically distributed. However, nonstationary processes can be controlled through adjustment as long as a model for the dynamics of the system and the structure of the error can somehow be estimated.
Reasonable responses to the above criticisms can be found in all cases. The reason why a feedback controller attempts to compensate rather than correct the disturbance may be that the disturbance is inherent to the process, or its cause is not known, or it is just not feasible due to costs or other reasons to remove the disturbance. Box and Kramer (15) demonstrate that a mistuned feedback controller will result in overcompensation, but if properly tuned the feedback controller will result in a minimum mean square error (MMSE) from target controller in the case where the cost of adjustment and taking observations is negligible. This may result in a controller with no feedback component if the process is stationary about some fixed mean, but in most real situations a mildly nonstationary model will be appropriate.
The most serious criticism of feedback controllers is that they tend to conceal the disturbance, but this problem can be avoided. If the dynamics of the system are known, the actual original disturbances can be reconstructed. These independent and identically distributed disturbances are the most obvious quantity to chart in an SPC scheme if it is desired to reap the benefits of SPC process improvement within the APC framework. However, a mistuned controller can cause the pattern of the disturbance to be blurred, and in complex situations it may be necessary to filter the noise to accentuate it for detectability. Capilla, et al. (18) suggest that detecting all types of disturbances in the process necessarily requires a variety of charting schemes. In their example, which is based on Clarke's constrained control, other quantities besides the reconstructed disturbances were found to be helpful for detecting out of control conditions.
Chapter 8
Multi-Systems Control Approaches
8.1 Serial Multi-Stage Control Approaches
The usual approach to controlling the many steps of manufacturing processes is to consider each step in isolation. This does not utilize the correlation between the steps, which will be exploited by the proposed multistage modeling framework. Multistage modeling is becoming more feasible due to the extent and the sophistication of data collection techniques that are being developed in many industries. For instance, in the semiconductor industry, in situ sensors are used to collect data in real time, while techniques such as ellipsometry, scanning electron microscopy (SEM), and electrical measurements have been used to collect data even at extremely small dimensions. One manufacturer reports that they already use lasers to mark wafers to facilitate the tracking of the wafer's history as it makes its trip through the manufacturing process (62). This tracking system through multiple stages has already been used to detect correlation between response variables, which has led to significant improvements in their processing. Despite the viability and potential impact of serial multistage modeling and control, there has been little research done in this area. This section will investigate the work done to date in solving the serial multi-stage control problem.
8.1.1 Rao's Multi-Stage Monitoring
the process.
8.1.2 Leang's SPC Triggered APC Control
Leang, et al. (40) accomplish control of a three stage photolithography sequence by triggering feedback and feed-forward control through the use of SPC based alarms. Although SPC monitoring can be used in the APC framework as discussed in chapter 7, the concept of triggering APC with SPC is a very different philosophy. The APC controller that they employ to control drift when it is detected is described as "similar" to the PCC controller presented in section 4.3.
The ad hoc model for the process presumably assumes that in general the process has stationary independent and identically distributed errors so that SPC techniques are appropriate. However, from time to time the assumption is that the process parameters of the linear models of the processes change due to rapid shifts or gradual drifts which are each detected by appropriate SPC methods which generate malfunction and control alarms respectively. Malfunction alarms trigger inspection by an operator, but control alarms are handled by the controller automatically. Feed-forward alarms are also used and these are detected by a version of acceptance charts based on whether the output of one stage is acceptable as an input to the next stage.
Leang and his co-authors actually present two different control strategies termed local and global control. In the local controller the control alarms invoke a routine of weighted least squares (weighting recent data more heavily) to generate new estimates of the parameters. These new estimates are used to produce a new recipe by optimizing a weighted sum of squared output deviations from target subject to the constraints that the settings lie in restricted ranges. In addition, the feed-forward alarms work in a similar fashion except that they adjust the subsequent stage recipe within the current run. The possible need for perturbation for stable estimation is not assessed.
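The idea of re-estimating a linear model with recent data weighted more heavily can be sketched as follows. This is not the weighting scheme of Leang, et al., whose details are not given here; the exponential discount, the single-input straight-line model, and the names `weighted_line_fit` and `forgetting` are all assumptions made for illustration.

```python
def weighted_line_fit(x, y, forgetting=0.9):
    """Weighted least squares fit of y = intercept + slope * x.

    The most recent observation (last in the lists) gets weight 1, and each
    older one is discounted by a further factor of `forgetting`; this
    exponential weighting is an illustrative assumption.
    """
    n = len(x)
    weights = [forgetting ** (n - 1 - i) for i in range(n)]
    sw = sum(weights)
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / sw
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / sw
    sxx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    sxy = sum(w * (xi - x_bar) * (yi - y_bar)
              for w, xi, yi in zip(weights, x, y))
    slope = sxy / sxx  # undefined when every setting in x is identical
    intercept = y_bar - slope * x_bar
    return intercept, slope


if __name__ == "__main__":
    settings = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
    outputs = [1.0, 3.1, 4.9, 7.2, 9.0, 11.1]
    print(weighted_line_fit(settings, outputs, forgetting=0.8))
```

If every recorded setting is the same, `sxx` is zero and the slope cannot be estimated, which is the concern about perturbation for stable estimation raised above.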
a weighted cost scheme to determine the new optimal specifications. The effect of using these new specifications in the local control algorithms is that inputs from various stages get adjusted rather than just the current stage as is done in feedback control or just the subsequent stage as was done in feed-forward control.
The global controller has as its criterion to maintain the final stage output on target even if it requires sacrificing performance of intermediate stages through the adjustment of the intermediate targets. This is one way to approach the desired global optimization which will allow a multi-stage controller to outperform a controller which treats the control of each stage in isolation. A significant drawback of this controller is the strategy of waiting until the occurrence of a significant drift before implementing APC techniques. This will be inefficient if the processes actually exhibit nonstationary drift as an inherent part of the process.
8.1.3 Vaidyanathan's Multi-Stage BMWBO APC Control
Vaidyanathan (69) uses dynamic programming to develop the Batch-Wise Myopic Within Batch Optimal (BMWBO) control scheme along with other suboptimal versions of this controller for more practical implementation. The name of the controller stems from the fact that the controller uses the dynamic programming algorithm to optimize the current batch across stages rather than across runs or batches. So the controller is optimal within that batch, but does not consider all batches or runs in the optimization. Since the controller does not consider past runs explicitly, the controller is only capable of feed-forward control of the process. The objective function of the dynamic programming methodology includes quadratic forms of control variables' deviations from their nominal values for each stage to avoid excessive control action. But a quadratic form only of the final stage responses' deviation from target is included.
considered, modeling drift across runs or batches is not attempted.
The uncertainty in the parameters is handled by using the Bayesian framework to specify conjugate prior distributions. The conjugate distributions are used to simplify the derivation of the posterior distribution. The resulting posterior parameter estimates are used within the dynamic programming algorithm to derive control laws that directly consider the uncertainty in the parameters. Although some "passive learning" is accomplished over runs due to the updating of the posterior distribution, convergence of the parameters is not achieved and the objective of learning is not explicitly included in the objective function.
8.2 Parallel Multi-Stage Approaches
Part II