
Multi-Systems Operation and Control


Academic year: 2020


FENNER, JOEL SCOTT. Multi-Systems Operation and Control. (Under the direction of Jye-Chyi Lu.)


by

Joel S. Fenner

A dissertation submitted in partial satisfaction of the

requirements for the degree of

Doctor of Philosophy

in

STATISTICS

in the

GRADUATE SCHOOL

at

NC STATE UNIVERSITY

2000

Professor Jye-Chyi Lu Professor Jackie Hughes-Oliver

Chairman of Advisory Committee

Professor John Monahan Professor William Holton


BIOGRAPHY


To my dad,

Kenneth H. Fenner,


Contents

List of Figures
List of Tables

I Background
1 Introduction
2 Statistical Process Control (SPC) Charts
3 Advanced Process Control (APC) Overview
  3.1 Drift vs. Shift Controllers
  3.2 Run to Run Controllers
  3.3 Reachability, Controllability, and Observability
  3.4 Stability
  3.5 Parameter Estimation Properties
4 Linear Model Based APC
  4.1 PID Control
  4.2 EWMA Control
  4.3 PCC Control
  4.4 Bayesian Rapid Mode Controller
  4.5 TFSM Control
  4.6 Constrained Control
  4.7 Spectral Factorization
  4.8 Dynamic Programming Based Control
5 Nonlinear Model Based APC
  5.1 Neural Networks
  5.2 Wavelets
6 Adaptive APC
  6.1 EVOP and DOE Approaches
  6.2 Avoiding Scrap: Some Clever Approaches
  6.3 Self Tuning Regulators
  6.4 Dynamic Programming with Unknown Parameters
7 Complementary Roles of SPC and APC
8 Multi-Systems Control Approaches
  8.1 Serial Multi-Stage Control Approaches
    8.1.1 Rao's Multi-Stage Monitoring
    8.1.2 Leang's SPC Triggered APC Control
    8.1.3 Vaidyanathan's Multi-Stage BMWBO APC Control
  8.2 Parallel Multi-Stage Approaches

II Serial Multi-Stage Operation and Control
9 Introduction
  9.1 Overview and Motivation
  9.2 Noise Model
10 MMSE Control with Known Parameters
  10.1 Serial Multi-Stage Data
  10.2 Model Description
  10.3 MMSE Forecasting
  10.4 MMSE Adjustment
  10.5 Monitoring
  10.6 A New Objective Function
11 DP Control with Known Parameters
  11.1 One Stage Dynamic Programming Framework
  11.2 Two Stage Dynamic Programming Framework
  11.3 Unconstrained Control with Perfect Information
  11.4 Constrained Control with Perfect Information
  11.5 Comparing Constrained and Unconstrained Control with Perfect Information: An Example
  11.6 Feed-forward Control for a One Stage Perfect Information Case
  11.7 Feed-forward Control for a Two Stage Perfect Information Case
  11.8 Comparing One Stage vs. Two Stage Approaches to Control: An Example
12 Conclusion and Extensions
  12.1 Conclusion

III Parallel Multi-Stage Operation and Control
13 Introduction
  13.1 Overview and Motivation
  13.2 General Modeling Structure
    13.2.1 Pooling vs. No Pooling: A Binary Choice in Regression Modeling
14 Parallel Site Process Control
  14.1 Introduction
  14.2 Description of Simulated Data
  14.3 Bayesian Approach
  14.4 Comparison of Approaches for One Parameter Model
  14.5 Sequential Estimation
15 Uniformity
  15.1 Introduction
  15.2 A Basic Example
  15.3 Real Example of Uniformity Modeling
    15.3.1 Description of the Data
    15.3.2 SRS Approach
    15.3.3 MRS Approach
    15.3.4 Bayesian Approach
    15.3.5 Comparison of Approaches
16 Conclusion

List of Figures

4.1 EWMA weights for (a) $\theta = 0.2$, (b) $\theta = 0.5$, and (c) $\theta = 0.8$.
9.1 An example of a three stage serial process with inputs $X$ and $Z$ and outputs $Y$ at each stage.
10.1 Data available for forecasting after $\langle r+1, 1 \rangle$, $\langle r, 2 \rangle$, and $\langle r-1, 3 \rangle$ processes are completed in equal length three stage case with no time delay.
10.2 Stage one run chart for simulated data with known dynamics under adjustment.
10.3 Stage two run chart for simulated data with known dynamics under adjustment.
10.4 Baseline stage one is charted for (a) the reconstructed and actual disturbance innovations, (b) the raw material disturbance, and (c) the adjustable variable. Charts are again for simulated data with known dynamics under adjustment.
10.5 Baseline stage two is charted for (a) the reconstructed and actual disturbance innovations, (b) the raw material disturbance, and (c) the adjustable variable. Charts are again for simulated data with known dynamics under adjustment.
11.1 Constrained and unconstrained control used in the perfect information case for a one stage controller. This example is the wafer track step of the lithography sequence. In this and subsequent figures, the symbols CCPIi, CCPIt, and CCPIss stand for constrained control perfect information controllers with an iterative solution, theoretical one step solution, and the steady state solution respectively.
11.2 The difference between the iterative and theoretical steady-state solution for the matrix $S$ and its effect on the regulating matrix $L$. These elements are the only ones that change across iterations. This regulating matrix is for the wafer track step of the lithography sequence.
11.3 Constrained and unconstrained control used in the perfect information case for a one stage controller. This example is the exposure (stepper) step of the lithography sequence.
11.4 The difference between the iterative and theoretical steady-state solution for


11.5 Constrained and unconstrained control used in the perfect information case for a two stage controller. This first stage is the wafer track step of the lithography sequence for case 1.
11.6 Constrained and unconstrained control used in the perfect information case for a two stage controller. This second stage is the exposure (stepper) step of the lithography sequence for case 1.
11.7 Constrained and unconstrained control used in the perfect information case for a two stage controller. This first stage is the wafer track step of the lithography sequence for case 2.
11.8 Constrained and unconstrained control used in the perfect information case for a two stage controller. This second stage is the exposure (stepper) step of the lithography sequence for case 2.
11.9 Constrained and unconstrained control used in the perfect information case based on the SS approach. This example is the pad alignment step (stage one) for case 1. In this and subsequent figures, the symbols CCPIi and CCPIt stand for constrained control perfect information controllers with an iteratively found steady state solution and a theoretical one step solution respectively.
11.10 Constrained and unconstrained control used in the perfect information case based on the SS approach. This example is the device alignment step (stage two) for case 1.
11.11 Constrained and unconstrained control used in the perfect information case based on the MS approach. This example is the pad alignment step (stage one) for case 1.
11.12 Constrained and unconstrained control used in the perfect information case based on the MS approach. This example is the device alignment step (stage two) for case 1.
11.13 Constrained and unconstrained control used in the perfect information case based on the SS approach. This example is the pad alignment step (stage one) for case 2.
11.14 Constrained and unconstrained control used in the perfect information case based on the SS approach. This example is the device alignment step (stage two) for case 2.
11.15 Constrained control used in the perfect information case based on the MS approach. This example is the pad alignment step (stage one) for case 2.
11.16 Constrained control used in the perfect information case based on the MS approach. This example is the device alignment step (stage two) for case 2.
11.17 Constrained and unconstrained control used in the perfect information case based on the SS approach. This example is the pad alignment step (stage one) for case 3.
11.18 Constrained and unconstrained control used in the perfect information case


11.19 Constrained and unconstrained control used in the perfect information case based on the MS approach. This example is the pad alignment step (stage one) for case 3.
11.20 Constrained and unconstrained control used in the perfect information case based on the MS approach. This example is the device alignment step (stage two) for case 3.
11.21 Constrained and unconstrained control used in the perfect information case based on the SS approach. This example is the pad alignment step (stage one) for case 4.
11.22 Constrained and unconstrained control used in the perfect information case based on the SS approach. This example is the device alignment step (stage two) for case 4.
11.23 Constrained and unconstrained control used in the perfect information case based on the MS approach. This example is the pad alignment step (stage one) for case 4.
11.24 Constrained and unconstrained control used in the perfect information case based on the MS approach. This example is the device alignment step (stage two) for case 4.
14.1 An example of global manufacturing for three parallel sites. Information between the sites could be shared through a decision making control center.
14.2 Trace and kernel density estimates for several of the slope parameters used in parallel site process control.
14.3 Simulated data with actual regression lines used to simulate the data for parallel site process control.
14.4 Data with estimated regression lines based on using global least squares slope estimate, individual regressions at each site, and Bayesian quasi-common method to estimate slope for parallel site process control.
14.5 Bayesian estimate plotted against individual regression estimates for parallel site process control. The 45° line would reflect the Bayesian estimate equalling the individual regression estimate while being close to the flat line represents being close to the global regression estimate (which is an average of the intercepts for the intercept case).
14.6 Since this is simulated data, one can compare the slope estimates of each method to the actual slope used in producing the data. The bar plot is based on the sum of squared differences of the five slope estimates from the actual slope for parallel site process control.
14.7 Observed output plotted versus Bayesian and individual least squares predictions for parallel site process control.
14.8 Observed output plotted versus global prediction and comparison of the predictions of all three methods versus the observed output for parallel site process control.
15.1 Trace and kernel density estimates for several of the slope parameters for the


15.2 Bayesian estimates versus individual regression (MRS) estimates for the quasi-common parameters 1 and 2 for the Guo Sachs example.
15.3 Predicted Bayesian deposition rate (derivative response) versus actual deposition rate for the Guo Sachs example.
15.4 Prediction surfaces based on SRS approach, range based SRS approach, MRS approach, and proposed Bayesian approach for the Guo Sachs example.
15.5 Prediction surfaces based on MRS approach and proposed Bayesian approach for the Guo Sachs example.
15.6 Histograms of uniformity at the experimental design settings of the two flow rates for the Guo Sachs example.
15.7 Median prediction surface and posterior interval surfaces computed using proposed Bayesian approach for the Guo Sachs example.
15.8 Three dimensional Latin Hypercube design which was used to determine the experimental settings of nitrogen ramp down time, HCl concentration, and water flow rate for the ECE example.
15.9 Trace and kernel density estimates for several of the parameters for the ECE example.
15.10 Bayesian estimates versus individual regression (MRS) estimates for the slope parameters for the ECE example.
15.11 Predicted Bayesian deposition rate (derivative response) versus actual deposition rate for the ECE example.
15.12 Prediction surfaces based on SRS models, MRS approach, and proposed Bayesian approach for the ECE example ($x_1$ by $x_2$ with $x_3 = 3160$ and $x_4 = 3$).
15.13 Prediction surfaces based on SRS models, MRS approach, and proposed Bayesian approach for the ECE example ($x_1$ by $x_3$ with $x_2 = 0.02$ and $x_4 = 3$).
15.14 Prediction surfaces based on SRS models, MRS approach, and proposed Bayesian approach for the ECE example ($x_2$ by $x_3$ with $x_1 = 15$ and $x_4 = 3$).
15.15 Prediction surfaces based on SRS models, MRS approach, and proposed Bayesian approach for the ECE example ($x_1$ by $x_4$ with $x_2 = 0.02$ and $x_3 = 2000$).
15.16 Prediction surfaces based on SRS models, MRS approach, and proposed Bayesian approach for the ECE example ($x_2$ by $x_4$ with $x_1 = 15$ and $x_3 = 3000$).
15.17 Prediction surfaces based on SRS models, MRS approach, and proposed Bayesian approach for the ECE example ($x_3$ by $x_4$ with $x_1 = 15$ and $x_2 = 0.02$).
15.18 Histograms of uniformity at the experimental design settings (and $x_4 = 3$) for the ECE example.
15.19 Median prediction surface and posterior interval surfaces computed using proposed Bayesian approach for the ECE example ($x_1$ by $x_2$ with $x_3 = 3160$ and


List of Tables

11.1 Values of objectives and the total objective function (OF) for comparison of unconstrained control with perfect information (UCPI) versus constrained control with perfect information (CCPI). The abbreviations UCPI and CCPI are used in subsequent figures as well.
11.2 Values of objectives and the total objective function (OF) for the `single stage at a time' (SS) approach versus the multi-stage (MS) approach to control for both unconstrained control with perfect information (UCPI) and constrained control with perfect information (CCPI). The abbreviations UCPI and CCPI are used in subsequent figures as well.
13.1 General specification of the likelihood, priors, and hyperpriors used for the proposed Bayesian approach.
14.1 Likelihood, priors, and hyperpriors used in the proposed Bayesian approach for parallel site process control.
15.1 Likelihood, priors, and hyperpriors used in the proposed Bayesian approach for the Guo Sachs example.
15.2 Site parameter estimates based on MRS and proposed Bayesian approach for the Guo Sachs example.
15.3 Posterior intervals for uniformity at the experimental design values of the flow rates based on the proposed Bayesian approach for the Guo Sachs example.
15.4 Experimental design settings of machine parameters for the ECE class uniformity study.
15.5 Likelihood, priors, and hyperpriors used for the proposed Bayesian approach


Acknowledgements


Part I


Chapter 1

Introduction

The background part of the dissertation will discuss various approaches to controlling a process step in isolation that have been considered in the literature and then will discuss the limited amount of research that has been devoted to serial and parallel multi-stage operation and control concepts. The background is not intended to be a comprehensive review, but rather just an introduction to the concepts used in other parts of the dissertation. Readers who are familiar with the basics of control theory and are interested only in the new ideas presented by this dissertation should read only parts II and III of the dissertation, which discuss serial and parallel multi-stage control respectively.


Chapter 2

Statistical Process Control (SPC) Charts

For statisticians, the idea of process control almost always means using statistical process control charts. In fact, if the process exhibits only stationary random variation that is not correlated across runs, it can be demonstrated, as Deming did in his funnel experiment (27), (28), that passive monitoring is the most efficient method of control. In his experiment, Deming shows that various adjustment schemes that try to compensate for the variation in past realizations of the process can only make the variation worse. However, in many situations SPC is not the optimal control action, as will be discussed in section 3.


version of SPC in any situation if possible.


Chapter 3

Advanced Process Control (APC) Overview

As mentioned in the previous section, SPC methods are not the answer for all situations. If there is nonstationary drift occurring in the process or if the response is correlated across runs, then advanced process control or active process control (APC) methods become the optimal form of control. APC methods use past observations to learn about how the process will vary in the future and compensate for the expected disturbance by adjusting some influential variable. Box, et al. (14) point out that the idea of waiting for a statistically significant deviation before acting, as is done in SPC, is not appropriate when there is a nonstationary inherent variation in the process, except if there is a cost associated with the control action. In general, the optimal control action should be based on the minimization of some cost functional such as the sum of squared deviations from target.


Before delving into the approaches to APC that are discussed in chapters 4, 5, and 6, the general concepts of rapid mode controllers, gradual mode controllers, run to run controllers, and batch to batch controllers will first be explored in the rest of this section.

3.1 Drift vs. Shift Controllers

The two major types of disturbances that are considered when trying to control a process are gradual drifts and sudden shifts. Hence the controllers designed to handle gradual drifts and sudden shifts are called gradual mode and rapid mode controllers respectively. A typical example given for gradual drift is that the machine gradually ages or goes out of adjustment, but there could be many other reasons for drift. Sudden shifts are often caused by some maintenance action or a change of operator. The primary emphasis in adjustment schemes is on gradual mode controllers since many processes are affected by at least mild nonstationary drift.

3.2 Run to Run Controllers


3.3 Reachability, Controllability, and Observability

Important properties of dynamic systems with respect to control include reachability, controllability, and observability. Reachability and controllability are very similar. They refer to the ability of a system to attain a particular state in a finite number of time steps. Specifically, reachability is the existence of a control law which allows the system to reach any particular state in a finite amount of time starting from any initial state. Controllability is the existence of a control law that allows the system state to get to the origin in a finite amount of time from any initial state. The complementary concept to controllability and reachability is the observability of a dynamic system. A system is observable if the initial state can be determined from the observed outputs over a finite number of time steps.

Precise mathematical conditions for whether a system is controllable, reachable, or observable have been developed in the control field. The most natural definition of these properties is in the context of state space models as will be given here, but there are many equivalent definitions in this framework as well as in the transfer function modeling framework. Consider the following state space model:

$$
\begin{aligned}
x_{k+1} &= A x_k + B u_k \\
y_k &= C x_k
\end{aligned}
\tag{3.1}
$$

The evolution of the state variable $x$ is described by the first equation, which is called the system equation. The system equation shows how the current state is a function of the previous state and the input variable $u$. The actual output of the system $y$ is observed as a function of the state variable in the second equation, which is called the measurement equation. This dynamic system is reachable if the controllability matrix:

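$$W_c = \begin{pmatrix} B & AB & A^2B & \cdots & A^{n-1}B \end{pmatrix} \tag{3.2}$$

has rank $n$. Likewise, the system is observable if the observability matrix:

$$W_o = \begin{pmatrix} C \\ CA \\ CA^2 \\ \vdots \\ CA^{n-1} \end{pmatrix} \tag{3.3}$$

has rank $n$.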
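The rank conditions on the controllability and observability matrices can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the two-state matrices `A`, `B`, and `C` are hypothetical values chosen only for illustration and do not come from the dissertation.

```python
import numpy as np

def controllability_matrix(A, B):
    """Stack [B, AB, A^2 B, ..., A^(n-1) B] column-wise (equation 3.2)."""
    n = A.shape[0]
    blocks = [B]
    for _ in range(n - 1):
        blocks.append(A @ blocks[-1])
    return np.hstack(blocks)

def observability_matrix(A, C):
    """Stack [C; CA; ...; CA^(n-1)] row-wise (equation 3.3)."""
    n = A.shape[0]
    blocks = [C]
    for _ in range(n - 1):
        blocks.append(blocks[-1] @ A)
    return np.vstack(blocks)

# Hypothetical two-state system, used only to exercise the rank tests.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])

n = A.shape[0]
reachable = np.linalg.matrix_rank(controllability_matrix(A, B)) == n
observable = np.linalg.matrix_rank(observability_matrix(A, C)) == n
```

For these values both rank tests pass. Replacing `B` with a vector that only drives the first state would leave the second state unaffected by the input, and the controllability matrix would lose rank.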

3.4 Stability

A key requirement of a controlled system is the stability of the system. The stability of the system is related to the eigenvalues of the matrix $A$ in the state space model of equation 3.1. In a transfer function modeling approach, stability can be explored by looking at the poles and zeros of the polynomials defining the transfer function relationship. Stability is achieved if the poles lie in the left half of the complex plane (negative real part) for continuous-time models, or within the unit circle for discrete-time models. Many graphical tools can be used to investigate stability properties, including the Bode, Nyquist, and Nichols plots. The simplest stability criterion is that for a bounded input, the system results in a bounded output. However, other criteria for stability have been investigated, such as those proposed by Lyapunov or by Routh and Hurwitz. A successful control strategy that is used extensively in the engineering realm is basing the controller on these various stability criteria, as is done in pole placement design where the poles are manipulated to influence stability. The primary concern in this approach is the response of the system to various types of inputs such as a pulse, a sine wave, or some more complicated input. As a consequence, these controllers are not designed to explicitly optimize the process in the face of stochastic noise.
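For the discrete-time state space model of equation 3.1, the eigenvalue condition can be sketched as follows. This is an illustration under the assumption that NumPy is available; the function name `is_stable_discrete` and both example matrices are hypothetical.

```python
import numpy as np

def is_stable_discrete(A):
    """True if every eigenvalue of A lies strictly inside the unit circle."""
    return bool(np.max(np.abs(np.linalg.eigvals(A))) < 1.0)

# Hypothetical system matrices (upper triangular and diagonal, so the
# eigenvalues can be read off the diagonal).
A_stable = np.array([[0.5, 0.1], [0.0, 0.8]])    # eigenvalues 0.5 and 0.8
A_unstable = np.array([[1.1, 0.0], [0.0, 0.3]])  # eigenvalue 1.1 lies outside

stable_flag = is_stable_discrete(A_stable)
unstable_flag = is_stable_discrete(A_unstable)
```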

3.5 Parameter Estimation Properties


is just the requirement that $X^T X$ be nonsingular, where $X$ is the design matrix in a typical regression framework and the superscript $T$ denotes the transpose. Since large changes in the adjustment scheme do make for better parameter estimates, one strategy is to perturb the process deliberately and use these perturbations to obtain stable parameter estimates, at the cost of the immediate performance of the controller. This tradeoff between control and estimation goals is a central theme in adaptive control.


Chapter 4

Linear Model Based APC

There are many different approaches to APC in the linear model framework which are based on an array of concepts such as proportional integral derivative (PID) controllers, exponentially weighted moving averages (EWMA), transfer functions, stochastic modeling, Kalman filtering, and dynamic programming. However, many of these diverse concepts are inter-related, as will be discussed. Of course, linear models refer to the model being linear in the parameters, and so nonlinear transformations of the variables are permitted and are often used while still in the linear model framework.

4.1 PID Control

Proportional integral derivative (PID) controllers have been successfully used for years by control engineers. The continuous form of these controllers is the following:

$$X_t = k_0 + k_D \frac{d\varepsilon(t)}{dt} + k_P\, \varepsilon(t) + k_I \int_0^t \varepsilon(u)\, du \tag{4.1}$$

and the discrete analogue is:

$$X_t = k'_0 + k'_D \nabla \varepsilon_t + k'_P\, \varepsilon_t + k'_I \sum_{j=1}^{t} \varepsilon_j \tag{4.2}$$

where $k_0$, $k_D$, $k_P$, $k_I$, $k'_0$, $k'_D$, $k'_P$, and $k'_I$ are just constants and $X_t$ is the setting of the control


These controllers are used by adjusting or tuning the constants until good performance is accomplished. This approach has proven to work fairly well in many situations. The obvious advantages of this type of control are the ease of implementation and formulation of the control rule. Also, PID controllers have been shown to be very robust to model misspecification. Tsung, et al. (67) give a good illustration of how the robustness of PI controllers can be a compelling reason to choose them over the minimum mean square error (MMSE) controllers (which are optimal if the model chosen is correct) that are discussed in later sections. The approach of PID controllers is not appropriate for all problems, but it serves as an adequate approximation in many situations, which is the reason for their popularity.
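The discrete PID rule of equation 4.2 can be sketched in a few lines. This is only an illustration: the helper name `make_pid`, the tuning constants, and the sequence of deviations are hypothetical, and the first backward difference is taken as zero because no earlier deviation exists.

```python
def make_pid(k0, kP, kI, kD):
    """Return a stateful function mapping the deviation e_t to the setting X_t."""
    state = {"sum": 0.0, "prev": None}

    def step(err):
        state["sum"] += err  # running sum of deviations (integral term)
        # Backward difference of the deviation; assumed zero on the first run.
        diff = 0.0 if state["prev"] is None else err - state["prev"]
        state["prev"] = err
        return k0 + kD * diff + kP * err + kI * state["sum"]

    return step

# Hypothetical PI tuning (kD = 0) applied to three made-up deviations from target.
pid = make_pid(k0=0.0, kP=-0.5, kI=-0.2, kD=0.0)
settings = [pid(e) for e in [1.0, 1.0, 0.5]]
```

In practice the constants would be tuned on the process itself, as the text describes; the negative gains here simply push the setting against a positive deviation.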

4.2 EWMA Control

One of the most popular and enduring APC methods is the exponentially weighted moving average (EWMA) controller, which is based on EWMA forecasting. The definition of an EWMA, which can be used as a forecast of $y_{t+1}$ for an arbitrary time series variable $y_t$ with weight parameter $0 < \theta < 1$, is as follows:

$$\tilde{y}_t = (1-\theta)\left(y_t + \theta y_{t-1} + \theta^2 y_{t-2} + \theta^3 y_{t-3} + \cdots\right) \tag{4.3}$$

or equivalently in recursive form:

$$\tilde{y}_t = (1-\theta)\, y_t + \theta\, \tilde{y}_{t-1} \tag{4.4}$$
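The equivalence of the weighted-sum form (4.3) and the recursive form (4.4) can be checked on a short series. The sketch below writes the weight parameter as `theta`; the observations and the starting value of the recursion are made up for illustration.

```python
def ewma_recursive(ys, theta, start):
    """Apply equation 4.4 repeatedly, oldest observation first."""
    smoothed = start
    for y in ys:
        smoothed = (1.0 - theta) * y + theta * smoothed
    return smoothed

def ewma_weights(theta, n):
    """Weights (1 - theta) * theta**j on y_t, y_{t-1}, ..., as in equation 4.3."""
    return [(1.0 - theta) * theta ** j for j in range(n)]

theta = 0.2
start = 10.0                    # assumed value of the EWMA before the first observation
ys = [10.0, 12.0, 11.0, 13.0]   # made-up observations, oldest first

weights = ewma_weights(theta, len(ys))
# Weighted sum over the available history plus the geometrically decayed start value.
direct = sum(w * y for w, y in zip(weights, reversed(ys))) + theta ** len(ys) * start
recursive = ewma_recursive(ys, theta, start)
```

With a finite history the two forms agree once the decayed starting value is included; the most recent observation receives weight `1 - theta`, and older observations receive geometrically smaller weights.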

[Figure 4.1: EWMA weights for (a) $\theta = 0.2$, (b) $\theta = 0.5$, and (c) $\theta = 0.8$. Each panel plots Weight against Observation Number for observations $t, t-1, \ldots, t-9$.]

$$y_t = c_t + A x_t + e_t \tag{4.5}$$

where $y_t$ is the response or output variable, $x_t$ is a vector of the adjustable process settings, $A$ is assumed to be a constant gain matrix which gives the coefficients of the relationship between $x_t$ and the response $y_t$, $e_t$ is the error in the model, and $c_t$ is a nonstationary drift component. A forecasting technique is applied to estimating the drift $c_t$ with the implicit assumption that the error $e_t$ that is not part of the random drift is negligible for each run. In the case of EWMA forecasting, the one step forecast of $c_{t+1}$, denoted $\hat{c}_{t+1}$, given in recursive form is:

$$\hat{c}_{t+1} = \tilde{c}_t = (1-\theta)\, c_t + \theta\, \tilde{c}_{t-1} \tag{4.6}$$

or

$$\hat{c}_{t+1} = (1-\theta)(y_t - A x_t) + \theta\, \tilde{c}_{t-1} \tag{4.7}$$

since the assumption is that:

$$y_t = c_t + A x_t \tag{4.8}$$

Then, using the forecast $\hat{c}_{t+1}$ that is found by EWMA forecasting or some other technique, the recipe is found by replacing the next response $y_{t+1}$ with its target value $T$ and solving the following equation:

$$T = \hat{c}_{t+1} + A x_{t+1} \tag{4.9}$$

If $x_{t+1}$ is a single process setting, the solution is unique unless an additional restriction is added to bound $x_{t+1}$. The possible solutions in the vector case are infinite unless there is a restriction on the bounds of $x_{t+1}$. In either the vector or the one dimensional case, optimization is usually needed. If there are any solutions that satisfy the above equation and the restrictions on $x_{t+1}$, then the choice between these solutions is usually based on the solution that minimizes the change in the recipe $x_{t+1}$ from its previous setting $x_t$. If no solutions satisfy the above equation (due to restrictions on the bounds of $x_{t+1}$), then the


$$\left| T - \hat{c}_{t+1} - A x_{t+1} \right| \tag{4.10}$$
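For a single unbounded process setting, the EWMA control loop built from equations 4.7 and 4.9 can be sketched as below. The simulation is noise-free (equation 4.8 holds exactly), the gain is treated as known, and the function name, the step disturbance, and all numeric values are assumptions made for illustration.

```python
def run_ewma_control(drift, gain, target, theta, c_init=0.0):
    """Simulate y_t = c_t + gain * x_t under scalar EWMA run-to-run control.

    drift holds the true offsets c_t, which the controller never sees directly.
    Returns the list of outputs y_t.
    """
    c_smooth = c_init                    # smoothed offset from the previous run
    x = (target - c_smooth) / gain       # first recipe, solving equation 4.9
    outputs = []
    for c_true in drift:
        y = c_true + gain * x            # process run (noise-free)
        outputs.append(y)
        c_obs = y - gain * x             # offset implied by this run
        c_smooth = (1.0 - theta) * c_obs + theta * c_smooth  # equation 4.7
        x = (target - c_smooth) / gain   # next recipe, equation 4.9
    return outputs

# Hypothetical disturbance: the offset jumps from 0 to 2 at the fourth run.
drift = [0.0, 0.0, 0.0, 2.0, 2.0, 2.0, 2.0, 2.0]
outputs = run_ewma_control(drift, gain=1.5, target=10.0, theta=0.5)
```

After the jump, the deviation from target is halved on each run because `theta` is 0.5; a smaller `theta` would correct faster but would also pass more noise into the recipe when noise is present.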

4.3 PCC Control

The predictor corrector controller (PCC) was introduced by Butler, et al. (17) and is based on a double exponential filter. This more aggressive strategy may be more appropriate than a single EWMA for estimating process behavior that is characterized by an especially strong tendency to drift. The concept is to estimate not only the drift offset, but the trend in the drift as well. In the notation of Butler, et al. (17), the double filter examines the error of the measured response from the off-line developed model,

$$\mathrm{Delta}_t = y_t - \mathrm{Model}_{0t} \tag{4.11}$$

and estimates this error for the subsequent run with the exponentially smoothed quantity

$$\mathrm{FDelta}_t = (1-\theta)\, \mathrm{Delta}_t + \theta\, \mathrm{FDelta}_{t-1} \tag{4.12}$$

Then the trend of the error of this estimate,

$$\mathrm{PE}_t = \mathrm{Delta}_t - \mathrm{FDelta}_{t-1} \tag{4.13}$$

is estimated with the exponentially smoothed quantity

$$\mathrm{FPE}_t = (1-\theta)\, \mathrm{PE}_t + \theta\, \mathrm{FPE}_{t-1} \tag{4.14}$$

Together these two components are used as an estimate of the intercept or offset (where $\mathrm{FPE}_t$ is measuring the tendency for the intercept to drift), which we earlier denoted $\hat{c}_{t+1}$. So the final result is that the drift estimate is

$$\hat{c}_{t+1} = \mathrm{FDelta}_t + \mathrm{FPE}_t \tag{4.15}$$
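The double filter can be sketched as follows. This is an illustration under stated assumptions rather than the implementation of Butler, et al.: both filters share a single weight `theta`, the prediction error is taken as the current model error minus the previous smoothed value, and the input series is a made-up steady ramp.

```python
def pcc_drift_forecasts(deltas, theta, f_delta0=0.0, f_pe0=0.0):
    """Return the drift forecast after each run from the two smoothed quantities.

    deltas holds the errors of the measured response from the off-line model.
    """
    f_delta, f_pe = f_delta0, f_pe0
    forecasts = []
    for delta in deltas:
        pe = delta - f_delta                                # error of the previous smoothed estimate
        f_delta = (1.0 - theta) * delta + theta * f_delta   # smoothed offset (equation 4.12)
        f_pe = (1.0 - theta) * pe + theta * f_pe            # smoothed trend (equation 4.14)
        forecasts.append(f_delta + f_pe)                    # drift estimate (equation 4.15)
    return forecasts

# Hypothetical model errors that drift upward by one unit per run.
deltas = [1.0, 2.0, 3.0, 4.0]
forecasts = pcc_drift_forecasts(deltas, theta=0.5)
single_ewma = 3.0625   # value of the smoothed offset alone after the fourth run
```

On this ramp the next model error would be 5. The combined forecast after four runs (4.6875) is closer to it than the single smoothed offset (3.0625), which is the motivation for adding the trend term.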


4.4 Bayesian Rapid Mode Controller

Another approach to rapid mode control is an optimal Bayesian process controller as described by Sachs et al. (60), which assesses shifts and uses least squares to determine the shift magnitude, $d$, and the location of the shift in time, $m$, where the model is of the form:

$$z_t = a + e_t \quad \text{for } t \le m \tag{4.16}$$

for $t$ in region 1 (that is, up to some time $m$), and

$$z_t = a + d + e_t \quad \text{for } t > m \tag{4.17}$$

for $t$ in region 2 (that is, after time $m$), where $z_t$ is just the response minus the deterministic portion of the model without the intercept:

$$z_t = y_t - b x_t \tag{4.18}$$

where $b$ is just the slope, or what is sometimes called the sensitivity of $y_t$ to $x_t$. Then Bayesian sequential probabilities are used to assess the probability that a shift really occurred as data are gathered over time. The adjustment is scaled by the probability that a shift occurred, denoted $f_t$, and thus the control adjustment is not $d$ but instead $d f_t$, so as to avoid over-adjustment.
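The least squares step of this approach can be sketched by fitting the two-level model at every candidate shift location and keeping the best fit. The sketch below does not reproduce the Bayesian sequential probability calculation of Sachs et al.; the shift probability `f_t` is a placeholder constant, and the function name and data are made up for illustration.

```python
def fit_shift(z):
    """Least squares fit of a level a up to run m and a level a + d afterwards.

    Tries every split point and returns (m, a_hat, d_hat), where m is the
    number of observations before the shift.
    """
    best = None
    for m in range(1, len(z)):
        before, after = z[:m], z[m:]
        a_hat = sum(before) / len(before)    # mean level in region 1
        level2 = sum(after) / len(after)     # mean level in region 2
        rss = (sum((v - a_hat) ** 2 for v in before)
               + sum((v - level2) ** 2 for v in after))
        if best is None or rss < best[0]:
            best = (rss, m, a_hat, level2 - a_hat)
    return best[1], best[2], best[3]

# Made-up residual series with a shift of about 2 after the fourth run.
z = [0.1, -0.1, 0.0, 0.1, 2.1, 1.9, 2.0, 2.1]
m_hat, a_hat, d_hat = fit_shift(z)

f_t = 0.8                    # placeholder shift probability, not computed here
adjustment = d_hat * f_t     # shift estimate scaled by the shift probability
```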

4.5 TFSM Control

An approach that has been used to derive minimum mean square error controllers is to use transfer functions to model the dynamics of the process while specifying the error structure with stochastic time series models. The general framework provided by the transfer function / stochastic modeling strategy (TFSM) can be shown to encompass many other control methods.


of the system. A transfer function representation is the discrete version of the general linear differential equation, which results in a general linear difference equation given by:

$$(1 + \xi_1 \nabla + \xi_2 \nabla^2 + \cdots + \xi_r \nabla^r)\, Y_t = g\,(1 + \eta_1 \nabla + \eta_2 \nabla^2 + \cdots + \eta_s \nabla^s)\, X_{t-b} \tag{4.19}$$

where $Y$ is the deviation from some target of the response variable, $X$ is the deviation from some datum of the adjustable variable, $b$ is the amount of pure delay, $g$ is the steady state gain in $Y$ that will eventually be achieved after a unit change in $X$, $(r, s)$ is the order of the process, and $\nabla$ is a difference operator such that:

$$\nabla Y_t = Y_t - Y_{t-1} \tag{4.20}$$

An alternative way to write the transfer function relationship between a response variable $Y$ and an adjustable variable $X$ is as follows:

$$(1 + \delta_1 B + \delta_2 B^2 + \cdots + \delta_r B^r)\, Y_t = g\,(1 + \omega_1 B + \omega_2 B^2 + \cdots + \omega_s B^s)\, X_{t-b} \tag{4.21}$$

where $B$ is the backshift operator such that:

$$B^k Y_t = Y_{t-k} \tag{4.22}$$

Sometimes the transfer function relationship is just written as:

$$\delta(B)\, Y_t = g\, \omega(B)\, X_{t-b} \tag{4.23}$$

where $\delta(B)$ and $\omega(B)$ are the polynomials in the backshift operator $B$ defined in equation 4.21. The same kind of transfer function model could also hold as the relationship of the response variable $Y$ and other adjustable inputs or perhaps an observable variable $Z$.

The error structure is given by the auto-regressive integrated moving average (ARIMA) stochastic time series models. Integrated means that an appropriately differenced quantity is sometimes needed to get a stationary process in the regular auto-regressive moving average (ARMA) form. The following is the form of the ARIMA model:

$$(1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p)\, \nabla^d N_t = (1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q)\, e_t \tag{4.24}$$

where $N_t$ is the error that is being modeled, $e_t$ is white noise, $B$ is the backshift operator, $(p, q)$ is the order of the error process, $d$ is the degree or number of times that differencing is required to get stationarity, and $\nabla$ is again the difference operator. White noise means that the $e_t$ are independent and identically distributed (usually with a Gaussian distribution), and often the $e_t$ are referred to as random shocks or innovations. It is important to realize that this error structure allows for nonstationarity in $N_t$. Sometimes the ARIMA model is just written as:

$$\phi(B)\, \nabla^d N_t = \theta(B)\, e_t \tag{4.25}$$

where $\phi(B)$ and $\theta(B)$ are the polynomials in the backshift operator $B$ defined by equation 4.24.

Transfer function models are used with stochastic models by assuming that the influence of the deterministic transfer function relationships and the stochastic effects are additive. Letting $\varepsilon$ denote the deviation from target, the model is thus:

$$\varepsilon_{t+f+1} = L_1^{-1}(B)\, L_2(B)\, X_t + N_{t+f+1} \tag{4.26}$$

where the noise component $N_t$ is assumed to be independent of $X_t$ and where the polynomials $L_1(B)$ and $L_2(B)$ can be determined from equation 4.23. The MMSE control action is found by substituting the MMSE forecast $\hat{N}_{t+f+1}$ for $N_{t+f+1}$, which effectively assumes that the error in this forecast is equal to its expected value of zero. This procedure can be justified by the concept of certainty equivalence discussed in section 4.8. The resulting equation is:

$$0 = L_1^{-1}(B)\, L_2(B)\, X_t + \hat{N}_{t+f+1} \tag{4.27}$$

where we have also let $\varepsilon_{t+f+1} = 0$ for the purposes of solving for the control action that leads to no deviation from target.

Now $\varepsilon_t$ is a forecast error and thus can be expressed as a polynomial in the backshift operator applied to past innovations $e_t$:

$$\varepsilon_t = L_4(B)\, e_t \tag{4.28}$$

Also the forecast $\hat{N}_{t+f+1}$ can be expressed as a polynomial in the backshift operator applied to past innovations:

$$\hat{N}_{t+f+1} = L_3(B)\, e_t = L_3(B)\, L_4^{-1}(B)\, \varepsilon_t \tag{4.29}$$

By substituting equation 4.29 into equation 4.27 and solving for $X_t$, we obtain the optimal control action that cancels out the disturbances:

$$X_t = -\frac{L_1(B)\, L_3(B)}{L_2(B)\, L_4(B)}\, \varepsilon_t \tag{4.30}$$

or in terms of adjustment:

$$\nabla X_t = (1 - B)\, X_t = -\frac{L_1(B)\, L_3(B)\,(1 - B)}{L_2(B)\, L_4(B)}\, \varepsilon_t \tag{4.31}$$

where $\nabla$ denotes the difference operator. This general solution for the MMSE control action is treated in detail by Box and Jenkins (13).

Case 1. Now let's consider a special case of this general TFSM methodology with the following simple transfer function:

$$Y_t = g X_t \tag{4.32}$$

Let the error be described by a differenced series which is a one lag moving average series. This is a special case of the general ARIMA(p,d,q) denoted ARIMA(0,1,1) or simply IMA(1,1).

$$N_t - N_{t-1} = e_t - \theta e_{t-1} \quad \text{where } e_t \sim \text{iid}\,(0, \sigma_e^2) \tag{4.33}$$

The complete model of the deterministic and stochastic components is:

$$\varepsilon_{t+1} = Y_{t+1} + N_{t+1} \tag{4.34}$$

Now it can be shown for this error structure that:

$$N_{t+1} = \tilde{N}_t + e_{t+1} \tag{4.35}$$

and since $e_{t+1}$ is unpredictable random noise, the MMSE forecast is $\hat{N}_{t+1} = \tilde{N}_t$ where, as noted earlier, the tilde denotes an exponentially weighted moving average (EWMA). Also note that the forecast error $e_{t+1}$ in this case is exactly the same as $\varepsilon_{t+1}$. Now equation 4.34, with the forecast substituted for $N_{t+1}$ and $\varepsilon_{t+1}$ set to zero, becomes:

$$0 = g X_{t+1} + \tilde{N}_t \tag{4.36}$$

and thus the control action using the general TFSM methodology is:

$$X_{t+1} = -\frac{1}{g}\, \tilde{N}_t \tag{4.37}$$

or in terms of adjustment:

$$\nabla X_{t+1} = -\frac{(1 - \theta)}{g}\, \varepsilon_t \tag{4.38}$$
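The claim that this adjustment rule leaves only the unpredictable shock in the output can be checked numerically. The sketch below is a minimal simulation under stated assumptions: the model is known exactly, `theta` stands for the moving average parameter of equation 4.33, the process starts on target, and the function name and parameter values are hypothetical.

```python
import random


def simulate_case1(g, theta, n_steps, seed=0):
    """Simulate Y_t = g*X_t with IMA(1,1) noise under the adjustment rule
    X_{t+1} = X_t - ((1 - theta) / g) * eps_t  (equation 4.38)."""
    rng = random.Random(seed)
    x = 0.0            # adjustable variable, starts at its datum
    noise = 0.0        # N_{t-1}
    e_prev = 0.0       # e_{t-1}
    shocks, deviations = [], []
    for _ in range(n_steps):
        e_t = rng.gauss(0.0, 1.0)
        noise = noise + e_t - theta * e_prev     # N_t - N_{t-1} = e_t - theta*e_{t-1}
        eps_t = g * x + noise                    # observed deviation from target
        shocks.append(e_t)
        deviations.append(eps_t)
        x = x - (1.0 - theta) / g * eps_t        # adjustment applied for the next period
        e_prev = e_t
    return shocks, deviations


if __name__ == "__main__":
    shocks, deviations = simulate_case1(g=2.0, theta=0.6, n_steps=10)
    for e_t, eps_t in zip(shocks, deviations):
        print(f"{e_t: .6f}  {eps_t: .6f}")
```

With a correctly specified model, the two printed columns agree up to rounding: the controlled deviation from target reduces to the white noise innovation, even though the uncontrolled disturbance $N_t$ is nonstationary.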

The special case of TFSM just described is related to the EWMA controller described earlier in section 4.2. Consider the case of one adjustable variable $X_t$ and one response variable $\varepsilon_t$ with a target of zero. Recall that the drift forecast $\hat{c}_{t+1}$ for the EWMA controller is:

$$\hat{c}_{t+1} = (1 - \theta)(\varepsilon_t - g X_t) + \theta\, \tilde{c}_{t-1} = \tilde{N}_t \tag{4.39}$$

This EWMA forecast is identical to the result found in the special case of TFSM described by equations 4.32, 4.33, and 4.34. Thus the MMSE forecast of an IMA(1,1) model is always just the EWMA forecast. In addition, the control action of the EWMA control algorithm is found by solving the following equation:

$$T = 0 = \hat{c}_{t+1} + g X_{t+1} \tag{4.40}$$

which results in the same control action given above in equations 4.37 and 4.38 for the special case of TFSM discussed above.
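The equivalence between the EWMA and the MMSE forecast for IMA(1,1) noise can also be seen in a short numerical check. This is only a sketch: it assumes the EWMA places weight `1 - theta` on the newest noise value (with `theta` the moving average parameter of equation 4.33), starts the recursion at zero, and uses a hypothetical function name.

```python
import random


def ewma_forecast_errors(theta, n_steps, seed=0):
    """Return (shocks, one-step forecast errors) for an IMA(1,1) series
    forecast by an EWMA with weight (1 - theta) on the newest value."""
    rng = random.Random(seed)
    noise = 0.0        # N_{t-1}
    ewma = 0.0         # EWMA of the noise, used as the forecast of N_t
    e_prev = 0.0       # e_{t-1}
    shocks, errors = [], []
    for _ in range(n_steps):
        e_t = rng.gauss(0.0, 1.0)
        noise = noise + e_t - theta * e_prev          # IMA(1,1) recursion
        shocks.append(e_t)
        errors.append(noise - ewma)                   # N_t minus its one-step forecast
        ewma = (1.0 - theta) * noise + theta * ewma   # update the EWMA with N_t
        e_prev = e_t
    return shocks, errors
```

Under these assumptions each forecast error equals the innovation $e_t$ up to rounding, so no other forecast based on past data can have a smaller mean squared error.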

In fact, this same controller can also be viewed as a special case of the PID controller described in section 4.1. Equation 4.37 can be rewritten by summing the adjustments in equation 4.38 to get:

$$X_{t+1} = X_0 - \frac{(1 - \theta)}{g} \sum_{j=1}^{t+1} \varepsilon_{j-1} \tag{4.41}$$

Case 2. Now consider another special case of TFSM. Suppose that just a first order discrete dynamical system is appropriate. So the deterministic transfer function model is the following:

$$\nabla Y_t = \frac{1}{\xi}\,(g X_{t-1} - Y_t) \quad \text{or} \quad (1 + \xi \nabla)\, Y_t = g B X_t \tag{4.42}$$

The effect of any adjustment in $X_t$ will take full effect at time $t+1$. Let the noise again be described by equation 4.33. The deviation from target after any adjustment results in the following relationship:

$$\varepsilon_{t+1} = Y_{t+1} + N_{t+1} = \frac{g}{1 + \xi \nabla}\, X_t + N_{t+1} \tag{4.43}$$

Using the same EWMA forecast that was shown in Case 1 to be the MMSE forecast for this error model, the MMSE control action can be derived as:

$$\nabla X_t = -\frac{(1 - \theta)}{g}\, \big((1 + \xi)\, \varepsilon_t - \xi\, \varepsilon_{t-1}\big) \tag{4.44}$$

Summing these adjustments (assuming the system is initially on target) gives the following form:

$$X_t = -\frac{\xi\,(1 - \theta)}{g}\, \varepsilon_t - \frac{(1 - \theta)}{g} \sum_{j=1}^{t} \varepsilon_j \tag{4.45}$$

which can be seen to be a discrete PI controller, which is just a special case of a discrete PID controller with appropriately chosen constants. In some cases the resulting MMSE controller from the general TFSM methodology is just a particular form of a discrete PID controller as illustrated here, but the TFSM methodology actually encompasses a much broader class of controllers than can be described by PID controllers (13).
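A small simulation makes the Case 2 result concrete. The sketch below assumes a perfectly known model, writes the dynamics parameter of equation 4.42 as `xi` and the moving average parameter of equation 4.33 as `theta`, and starts the process on target; these names and the parameter values are illustrative assumptions.

```python
import random


def simulate_case2(g, xi, theta, n_steps, seed=0):
    """First order dynamics (1 + xi*nabla) Y_t = g*X_{t-1} with IMA(1,1) noise,
    controlled by the adjustment rule of equation 4.44."""
    rng = random.Random(seed)
    x = 0.0            # X_{t-1}, last setting of the adjustable variable
    y = 0.0            # Y_{t-1}, deterministic part of the response
    noise = 0.0        # N_{t-1}
    e_prev = 0.0       # e_{t-1}
    eps_prev = 0.0     # eps_{t-1}
    shocks, deviations = [], []
    for _ in range(n_steps):
        e_t = rng.gauss(0.0, 1.0)
        y = (xi * y + g * x) / (1.0 + xi)            # process dynamics
        noise = noise + e_t - theta * e_prev         # IMA(1,1) disturbance
        eps_t = y + noise                            # observed deviation from target
        shocks.append(e_t)
        deviations.append(eps_t)
        # proportional plus integral action expressed as an adjustment
        x = x - (1.0 - theta) / g * ((1.0 + xi) * eps_t - xi * eps_prev)
        e_prev, eps_prev = e_t, eps_t
    return shocks, deviations
```

As in Case 1, the simulated deviations coincide with the innovations up to rounding, which is what an MMSE controller should achieve when the model is exact.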

noise inputs in the analysis" according to Astrom (1). More detailed discussion of how to derive TFSM control algorithms is given by Box, Jenkins, and MacGregor (14), Box and Jenkins (13), MacGregor (46), and Wilson (74). Multivariate versions of this control methodology are discussed in Harris and MacGregor (32).

4.6 Constrained Control

Constrained control attempts not only to minimize the deviation of the response or output variable from target, but also to constrain or restrict the variation in the adjustable variable. The objective function that is minimized is the following:

$$\sigma_\varepsilon^2 + \lambda\, \sigma_X^2 \tag{4.46}$$

where $\lambda$ is a weighting parameter between the two objectives. It is often the case that substantial reductions in the variation of the adjustable variable can be achieved at the expense of only a minor increase in the mean squared error from target of the output variable. Deriving constrained controllers can be difficult, but it is accomplished and explored by several authors such as (13), (14), and (8) by extending the concepts of TFSM control. A suboptimal but simpler approach is Clarke's constrained controller (20), (21). As will be discussed in section 4.8, another approach to constrained control is to use dynamic programming, which provides an explicit methodology for the desired weighted optimization problem.

4.7 Spectral Factorization

to errors in the specifications of the parameters as well as errors in specifying the model form.

4.8 Dynamic Programming Based Control

Dynamic programming is another powerful technique that has been used extensively for solving process control problems, as illustrated in detail by Bertsekas (9), Astrom (1), and others. Dynamic programming uses state space modeling to model the process and then iteratively minimizes an objective function or cost functional. The typical problem for linear systems and quadratic costs can be stated as finding the control setting which minimizes:

$$J(x_0) = E_{w_k}\left\{ x_N' Q_N x_N + \sum_{k=0}^{N-1} \left( x_k' Q_k x_k + u_k' R_k u_k \right) \right\} \tag{4.47}$$

subject to the system equation:

$$x_{k+1} = A_k x_k + B_k u_k + w_k \tag{4.48}$$

where $w_k$ are independent random disturbances, $x_k$ is the response or state variable, $u_k$ is the adjustable variable, $Q_k$ and $R_k$ are known cost matrices, and $A_k$ and $B_k$ are known parameter matrices for modeling the change in $x_k$. The transpose of a vector or a matrix is denoted by the $'$ notation. Notice that the cost functional allows incorporation of constrained control.

The dynamic programming problem is solved iteratively with:

$$J_N(x_N) = x_N' Q_N x_N \tag{4.49}$$

and

$$J_k(x_k) = \inf_{u_k \in U_k(x_k)} E_{w_k}\left\{ x_k' Q_k x_k + u_k' R_k u_k + J_{k+1}(A_k x_k + B_k u_k + w_k) \right\} \tag{4.50}$$

for $k = 0, 1, 2, \ldots, N-1$. The basic steps of the algorithm are:

1. Start with time step $N$ and get $J_N(x_N) = x_N' Q_N x_N$.

2. For $k = N-1$, find the minimum associated with $J_k(x_k)$, typically by taking a derivative.

3. For $k = N-1$, evaluate $J_k(x_k)$ at the minimum found in the last step.

4. Repeat steps 2 and 3 for $k = N-2, N-3, \ldots, 0$.

5. The control rule is the set of minimizers found in step 2.

Central to the logic of the iterative algorithm is the principle of optimality given by Bellman (7), who was an early pioneer in this field. The principle of optimality states that if the optimal policy for all time periods $t = 0, \ldots, N$ is found, then the truncated version of this policy for time periods $t = i, \ldots, N$ is also optimal for the associated subproblem.

Using the algorithm, the solution of the typical linear system with quadratic criteria problem described by equations 4.47 and 4.48 is:

$$\mu_k(x_k) = L_k x_k \quad \text{where} \quad L_k = -(B_k' S_{k+1} B_k + R_k)^{-1} B_k' S_{k+1} A_k \tag{4.51}$$

The control action $\mu_k(x_k)$ is the optimal setting for the adjustable variable $u_k$. The matrices $S_k$ are given recursively by:

$$S_N = Q_N \tag{4.52}$$

and

$$S_k = A_k' \left[ S_{k+1} - S_{k+1} B_k (B_k' S_{k+1} B_k + R_k)^{-1} B_k' S_{k+1} \right] A_k + Q_k \tag{4.53}$$

Equation 4.53 is called the discrete matrix Riccati equation and is the discrete time analog of a Riccati differential equation.
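To make the backward recursion concrete, the following sketch runs it for a scalar system with constant parameters, so that every matrix reduces to a number and no linear algebra library is needed. The function name and the parameter values are assumptions chosen only for illustration.

```python
def riccati_gains(a, b, q, r, q_final, horizon):
    """Backward Riccati recursion for the scalar system x_{k+1} = a*x_k + b*u_k + w_k
    with stage cost q*x_k**2 + r*u_k**2 and terminal cost q_final*x_N**2.

    Returns (S, L): S[k] is the cost-to-go weight and L[k] the feedback gain,
    so that the control at step k is u_k = L[k] * x_k.
    """
    S = [0.0] * (horizon + 1)
    L = [0.0] * horizon
    S[horizon] = q_final                       # terminal condition S_N = Q_N
    for k in range(horizon - 1, -1, -1):       # k = N-1, N-2, ..., 0
        s_next = S[k + 1]
        denom = b * s_next * b + r             # scalar version of B'SB + R
        L[k] = -(b * s_next * a) / denom
        S[k] = a * (s_next - s_next * b * b * s_next / denom) * a + q
    return S, L


if __name__ == "__main__":
    S, L = riccati_gains(a=1.0, b=1.0, q=1.0, r=1.0, q_final=1.0, horizon=50)
    print(S[0], L[0])
```

For these particular values the weights settle within a few steps of the end of the horizon, so the early gains `L[k]` are essentially constant.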

If the matrices involved in the discrete matrix Riccati equation are constant and thus can be referred to without their subscripts, then as $k$ goes to negative infinity the solution for the $S_k$ matrices tends to a `steady-state' $S$ found by solving the algebraic matrix Riccati equation:

$$S = A' \left[ S - S B (B' S B + R)^{-1} B' S \right] A + Q \tag{4.54}$$

Furthermore, the solution of the problem of a linear system with quadratic criteria and constant matrices $A$, $B$, $Q$, and $R$ can be approximated by:

$$\mu(x) = L x \quad \text{where} \quad L = -(B' S B + R)^{-1} B' S A \tag{4.55}$$

where $S$ is the `steady-state' matrix solved for in equation 4.54.

When solving dynamic programming control problems, it is sometimes the case that the optimal control policy where $w_k$ are stochastic disturbances is the same as the optimal control policy for the deterministic problem where $w_k$ are not random but are instead known and equal to their expected values. This property is called certainty equivalence, and section 4.5 made use of this property to derive MMSE controllers. If certainty equivalence holds, the control law depends only on the expected value of the disturbance and not on other aspects of its distribution, and hence dealing with the whole distribution is unnecessary.

Dynamic programming can be generalized using a more complicated system equation, which is also termed a state space model. The technique of adding other state variables to the original state vector $x_k$ is called state augmentation. This technique allows the problem to include correlated disturbances as shown in Bertsekas (9). For instance, a delayed first order dynamics model with an IMA(1,1) noise model is solved by MacGregor (46) using dynamic programming with state augmentation. MacGregor also obtains the same solution using a generalized version of the Wiener Hopf techniques described by Wilson (74).

Another key concept that can be embedded into dynamic programming is the problem of imperfect state information. The typical control problem for a linear system with quadratic costs can be expanded to the case where the response or state variable is seen only imperfectly, with noise, and is never observed exactly. So the problem involves minimizing the cost functional of equation 4.47 subject to the system equation 4.48 and additionally the following measurement equation:

$$y_k = C_k x_k + v_k \tag{4.56}$$

where $v_k$ is a random disturbance which may depend only on the current $x_k$ and previous $u_{k-1}$, $y_k$ is the quantity that is actually measured (that is, $x_k$ measured with noise), and $C_k$ is a known parameter matrix for each $k$. The information vector keeps track of what is known at each time step:

$$I_0 = y_0, \qquad I_k = (y_0, y_1, \ldots, y_k, u_0, u_1, \ldots, u_{k-1}) \quad \text{for } k = 1, 2, \ldots, N-1 \tag{4.57}$$

The algorithm changes to become a function of the known information vector rather than the unknown $x_k$, and the expectation in this algorithm is now conditional on $I_{N-1}$ and $u_{N-1}$.

The solution of the problem of a linear system with quadratic criterion and imperfectly observed state differs from the perfect state information case only in that $x_k$ is replaced by the estimator $E[x_k \mid I_k]$. That is, the control law is identical to equation 4.51 except that:

$$\mu_k(I_k) = L_k\, E[x_k \mid I_k] \tag{4.58}$$

The $L_k$ matrices are defined the same way and the $S_k$ matrices are again defined by equations 4.52 and 4.53. In this case of imperfect state information, the increase in the cost functional evaluated at the optimal control law is directly attributable to a term that can be identified as estimation error.

The above problem can also be viewed from the perspective of a pure estimation problem. The estimation problem is to find the estimator which is closest to the true state given the information available. Using the traditional least squares criterion, finding the closest estimator means finding $\hat{x}(I)$ which minimizes:

$$E_{x,I}\left\{ (x - \hat{x}(I))'\, M\, (x - \hat{x}(I)) \right\} \tag{4.59}$$

where $M$ is a positive definite symmetric matrix. It turns out that the solution to this problem is also the conditional expectation $E_x[x \mid I]$. This dual result, that the estimator used in the optimal control law and the optimal estimator from the estimation perspective are identical in this case, is called the separation theorem for linear systems with quadratic criteria. This theorem implies that controllers can be designed in two parts: an actuator corresponding to the control action of multiplying by $L_k$ (which can be found by solving the simpler perfect state information case) and an estimator component which involves finding $E[x_k \mid I_k]$ (which can be found without being concerned with the effect of the control action).

The next hurdle is finding $E[x_k \mid I_k]$, but in some cases the result is known. In the special case where $w_k$, $v_k$, and $x_0$ are from a spherically invariant distribution (the Gaussian distribution is the prime example), the Kalman filter gives the recursive solution (37). For the general case where the distributions are not spherically invariant, using a linear least squares estimator instead of $E[x_k \mid I_k]$ is still optimal in the class of estimators that are linear functions of the state. Furthermore, the recursive Kalman filter can also be used to find the least squares estimator by using it rather than $E[x_k \mid I_k]$ in the iterative Kalman filter algorithm.

Consider the system equation 4.48 with $B_k$ constant over time as $B$ and the measurement equation 4.56. Let $w_k \sim N(0, W_k)$ and $v_k \sim N(0, V_k)$. Then the recursive Kalman filter equations are given by a set of predictor equations:

$$\hat{x}^-_{k+1} = A_k \hat{x}_k + B u_k \tag{4.60}$$

$$P^-_{k+1} = A_k P_k A_k' + W_k \tag{4.61}$$

and a set of corrector equations:

$$K_k = P^-_k C_k' (C_k P^-_k C_k' + V_k)^{-1} \tag{4.62}$$

$$\hat{x}_k = \hat{x}^-_k + K_k (y_k - C_k \hat{x}^-_k) \tag{4.63}$$

$$P_k = (I - K_k C_k)\, P^-_k \tag{4.64}$$

where $I$ is an appropriately dimensioned identity matrix. The recursion starts with initial estimates $\hat{x}^-_0$ and $P^-_0$ of the state and estimation error covariance respectively. The Kalman gain matrix $K_k$ is central to the Kalman filter since it determines the relative weight or importance of the predicted state versus the measured error in this prediction. If the covariance matrices $W_k$ and $V_k$ are constant over time, then the estimation error covariance $P_k$ and the Kalman gain $K_k$ stabilize quickly and then remain constant.
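The predictor and corrector equations 4.60 through 4.64 can be sketched for a scalar state as follows. This is a minimal illustration that assumes constant scalar parameters `a`, `b`, `c` and noise variances `w`, `v`; the function name and the values in the usage loop are hypothetical.

```python
def kalman_step(x_est, p_est, u, y_next, a, b, c, w, v):
    """One predictor-corrector cycle for the scalar model
    x_{k+1} = a*x_k + b*u_k + w_k,   y_k = c*x_k + v_k,
    where w and v are the variances of the two noise terms."""
    # predictor: project the state estimate and its error variance ahead
    x_pred = a * x_est + b * u
    p_pred = a * p_est * a + w
    # corrector: weigh the prediction against the new measurement
    gain = p_pred * c / (c * p_pred * c + v)
    x_new = x_pred + gain * (y_next - c * x_pred)
    p_new = (1.0 - gain * c) * p_pred
    return x_new, p_new, gain


if __name__ == "__main__":
    x_est, p_est = 0.0, 10.0                   # rough initial guesses
    for _ in range(20):
        x_est, p_est, gain = kalman_step(x_est, p_est, u=0.0, y_next=5.0,
                                         a=1.0, b=1.0, c=1.0, w=1.0, v=1.0)
    print(x_est, p_est, gain)
```

With constant variances the gain and the error variance settle after a handful of cycles, which illustrates why a fixed gain is often adequate in practice.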

Chapter 5

Nonlinear Model Based APC

Often nonlinear behavior can be approximated by a linear model and the techniques of the previous chapter are appropriate. However, there are approaches that more explicitly try to model nonlinear behavior in an effort to achieve more viable control schemes. Nonlinear model based approaches include using neural networks, wavelets, and the more complicated general form of dynamic programming.

5.1 Neural Networks

Many so called `learning' algorithms have been applied to run to run control. The most prominent of these are based on neural networks, which generally use one hidden layer to emulate the effect of latent variables. This results in the following model:

$$y_k = g^{(2)}\!\left( \sum_{j=1}^{t} w_{2,j,k}\; g^{(1)}\!\left( \sum_{i=1}^{s} w_{1,i,j}\, x_i + \alpha_j \right) + \beta_k \right) \tag{5.1}$$

The function $g^{(1)}$ is typically an S-shaped sigmoidal function similar to a logistic function, and $g^{(2)}$ may also be a nonlinear function but is sometimes just a linear function. The $x_i$ are the $s$ inputs, the $w_{1,i,j}$ and the $w_{2,j,k}$ are both sets of weights or coefficients, and the bias terms $\alpha_j$ and $\beta_k$ act as constant or intercept terms. Mozumder, et al. (50) and Stefani, et al. (66) investigate another similar layered model.
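The structure of equation 5.1 can be sketched as a forward pass through a network with one hidden layer. The code below is only an illustration: it assumes a logistic hidden activation, a linear output unless another function is supplied, and argument names (`w1`, `bias1`, `w2`, `bias2`) that are not taken from the original text.

```python
import math


def logistic(z):
    """S-shaped hidden activation."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w1, bias1, w2, bias2, output_activation=None):
    """One-hidden-layer network.

    x: list of inputs; w1[i][j]: weight from input i to hidden unit j;
    bias1[j]: hidden bias; w2[j][k]: weight from hidden unit j to output k;
    bias2[k]: output bias. The output is linear unless output_activation is given.
    """
    n_hidden = len(bias1)
    hidden = [
        logistic(sum(w1[i][j] * x[i] for i in range(len(x))) + bias1[j])
        for j in range(n_hidden)
    ]
    outputs = []
    for k in range(len(bias2)):
        z = sum(w2[j][k] * hidden[j] for j in range(n_hidden)) + bias2[k]
        outputs.append(z if output_activation is None else output_activation(z))
    return outputs


if __name__ == "__main__":
    x = [1.0, 2.0]
    w1 = [[0.5, -0.3, 0.1], [0.2, 0.4, -0.6]]     # 2 inputs, 3 hidden units
    w2 = [[1.0], [-1.0], [0.5]]                   # 3 hidden units, 1 output
    print(forward(x, w1, [0.0, 0.1, -0.1], w2, [0.25]))
```

Fitting the weights, for example by back-propagation, is a separate step that this sketch does not attempt.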

is not interpretable and thus an underlying understanding of the process is not gained. Also care must be taken not to overfit the model. If $p$ is the number of parameters in a particular model, overfitting can be tempered by invoking a penalty for extra parameters, that is, by minimizing an objective function such as $E[(y - \hat{y})^2] + \lambda p^2$ (where $\lambda$ represents the relative weighting of the two goals) over the possible models instead of the typical objective function $E[(y - \hat{y})^2]$. The minimization can be accomplished through different methods, but the most common is back-propagation, which just involves taking the derivative with respect to the unknown weights to perform the usual calculus minimization. It is also necessary to set aside a portion of the data for independent cross-validation of the final model chosen.

5.2 Wavelets

A promising strategy for modeling highly nonlinear data is the use of wavelets. Some work is being done by Rying, et al. (57) to apply the concepts of wavelets to develop control systems. This recently developed mathematical technique is able to represent complicated nonlinear functions very accurately and somewhat parsimoniously using wavelet basis functions.

5.3 General DP

In section 4.8, the dynamic programming algorithm was presented for the case of a linear system with quadratic costs. However, the most general form of dynamic programming does not require a linear system or quadratic costs. The general cost functional that is minimized is:

$$J(x_0) = E_{w_k}\left\{ g_N(x_N) + \sum_{k=0}^{N-1} g_k(x_k, \mu_k(x_k), w_k) \right\} \tag{5.2}$$

subject to the system equation:

$$x_{k+1} = f_k(x_k, \mu_k(x_k), w_k) \tag{5.3}$$

The algorithm used to solve this problem uses:

$$J_N(x_N) = g_N(x_N) \tag{5.4}$$

and

$$J_k(x_k) = \inf_{u_k \in U_k(x_k)} E_{w_k}\left\{ g_k(x_k, u_k, w_k) + J_{k+1}[f_k(x_k, u_k, w_k)] \right\} \tag{5.5}$$
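When the state, control, and disturbance each take only a few values, the general recursion can be carried out by direct enumeration. The sketch below is a minimal tabular version under that assumption; the function signature, the use of one control set for every state, and the toy problem in the usage section are all hypothetical.

```python
def backward_induction(states, controls, disturbances, f, g, g_final, horizon):
    """Tabular dynamic programming by backward induction.

    disturbances: dict mapping each disturbance value w to its probability.
    f(k, x, u, w) returns the next state (which must be in `states`);
    g(k, x, u, w) returns the stage cost; g_final(x) the terminal cost.
    Returns (J, policy): J[k][x] is the optimal cost-to-go and policy[k][x]
    the minimizing control at stage k in state x.
    """
    J = [dict() for _ in range(horizon + 1)]
    policy = [dict() for _ in range(horizon)]
    for x in states:
        J[horizon][x] = g_final(x)                  # terminal condition
    for k in range(horizon - 1, -1, -1):            # k = N-1, ..., 0
        for x in states:
            best_u, best_cost = None, float("inf")
            for u in controls:
                # expected stage cost plus expected cost-to-go
                cost = sum(p * (g(k, x, u, w) + J[k + 1][f(k, x, u, w)])
                           for w, p in disturbances.items())
                if cost < best_cost:
                    best_u, best_cost = u, cost
            J[k][x] = best_cost
            policy[k][x] = best_u
    return J, policy


if __name__ == "__main__":
    states, controls = [0, 1, 2], [-1, 0, 1]
    step = lambda k, x, u, w: max(0, min(2, x + u + w))   # saturating dynamics
    stage = lambda k, x, u, w: x * x + u * u
    J, policy = backward_induction(states, controls, {0: 0.5, 1: 0.5},
                                   step, stage, lambda x: x * x, horizon=3)
    print(J[0], policy[0])
```

Enumeration of this kind grows quickly with the number of states, which is one reason the linear quadratic case with its closed form solution is so widely used.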


Chapter 6

Adaptive APC

Up until this section, it has been assumed that the parameters of the process are known at least approximately. Control schemes that use measurements on-line to estimate the parameters and improve the control of the process are called adaptive controllers. The controller is only considered adaptive if it uses "measurements with advantage" according to the definition adopted by Bertsekas (9), but there is some controversy about the most appropriate definition of adaptive. Updating the drift or intercept term is not considered adaptive since it is basically just adjusting for error. The parameters that adaptive schemes estimate on-line are the slopes, or what are sometimes called the sensitivities, of the process.

6.1 EVOP and DOE Approaches

Control algorithms based on the sequential design of experiments and general response surface methods, or sometimes locally weighted regression, actually induce small perturbations in the process so as to explore the design space. If no perturbations are introduced, as in the traditional control scenario, then there is usually a lot of data; but this data is not very informative since it is all gathered at or near one setpoint, because most algorithms aim to minimize the changes in the recipe in order to maintain a stable process.

most appropriate near the beginning of a process when it is desired to optimize the process on-line rather than using off-line experiments to characterize the process. The proprietary optimization software Ultramax uses a similar approach of perturbing the process in order to learn about the parameters of the process. An early version of this strategy is the evolutionary operation mode (EVOP) of Box and Draper (12), which has been used extensively in the chemical industry. Initially performance is sacrificed in these algorithms by introducing the non-optimal perturbations either at the beginning of the process or after a disturbance, but at the same time the cost of off-line experimentation is greatly reduced.

6.2 Avoiding Scrap: Some Clever Approaches

Mozumder, et al. (50) and Stefani, et al. (66) formalize the industry practice of using monitor wafers to avoid producing scrap. When the process is deemed out of control by generalized SPC, perturbations are used to requalify or tune the process on the monitor wafers instead of the actual wafers. Another approach is an enhanced version of the Bayesian rapid mode controller described earlier in section 4.4. The natural perturbation in the inputs necessitated by a shift in the process is used as an opportunity to update the slope term. A few runs under just an intercept adjustment are used to get data for a new estimate.

6.3 Self Tuning Regulators


6.4 Dynamic Programming with Unknown Parameters


Chapter 7

Complementary Roles of SPC and APC

In the past, process monitoring and active process adjustment were considered mutually exclusive alternatives. However, several authors including Box and Kramer (15) have now given convincing arguments that these two successful control strategies actually complement each other and can be used most successfully in tandem. Vander Wiel, et al. (70) and Tucker, et al. (68) have given one strategy, called algorithmic statistical process control (ASPC), for combining active adjustment and process monitoring. Their control algorithms are based on the techniques of TFSM discussed in section 4.5. They present a simple analogy of controlling the operation of a car. On one hand, active adjustment is needed in the form of steering, braking, and shifting gears to keep the car on the road; but just as important is the detection and correction of out of control conditions such as a flat tire or signs that the car is breaking down. However, the most appropriate way to monitor the process while active adjustment is taking place is left as an open question.

can not be detected and corrected. On the other hand, the main complaint of control engineers is that control charting is inefficient. One common misconception is that the only data that can be controlled is data that is stationary, independent, and identically distributed. However, nonstationary processes can be controlled through adjustment as long as a model for the dynamics of the system and the structure of the error can somehow be estimated.

Reasonable responses to the above criticisms can be found in all cases. A feedback controller may attempt to compensate for a disturbance rather than correct it because the disturbance is inherent to the process, because its cause is not known, or because removing it is not feasible due to cost or other reasons. Box and Kramer (15) demonstrate that a mistuned feedback controller will result in overcompensation, but a properly tuned feedback controller will be a minimum mean square error (MMSE) from target controller in the case where the cost of adjustment and of taking observations is negligible. This may result in a controller with no feedback component if the process is stationary about some fixed mean, but in most real situations a mildly nonstationary model will be appropriate.

The most serious criticism of feedback controllers is that they tend to conceal the disturbance, but this problem can be avoided. If the dynamics of the system are known, the actual original disturbances can be reconstructed. These independent and identically distributed disturbances are the most obvious quantity to chart in an SPC scheme if it is desired to reap the benefits of SPC process improvement within the APC framework. However, a mistuned controller can cause the pattern of the disturbance to be blurred, and in complex situations it may be necessary to filter the noise to accentuate the disturbance for detectability. Capilla, et al. (18) suggest that detecting all types of disturbances in the process necessarily requires a variety of charting schemes. In their example, which is based on Clarke's constrained control, other quantities besides the reconstructed disturbances were found to be helpful for detecting out of control conditions.


Chapter 8

Multi-Systems Control Approaches

8.1 Serial Multi-Stage Control Approaches

The usual approach to controlling the many steps of manufacturing processes is to consider each step in isolation. This does not utilize the correlation between the steps, which will be exploited by the proposed multistage modeling framework. Multistage modeling is becoming more feasible due to the extent and the sophistication of data collection techniques that are being developed in many industries. For instance in the semiconductor industry, in situ sensors are used to collect data in real time, while techniques such as ellipsometry, scanning electron microscopy (SEM), and electrical measurements have been used to collect data even at extremely small dimensions. One manufacturer reports that they already use lasers to mark wafers to facilitate the tracking of each wafer's history as it makes its trip through the manufacturing process (62). This tracking system through multiple stages has already been used to detect correlation between response variables, which has led to significant improvements in their processing. Despite the viability and potential impact of serial multistage modeling and control, there has been little research done in this area. This section will investigate the work done to date in solving the serial multi-stage control problem.

8.1.1 Rao's Multi-Stage Monitoring


the process.

8.1.2 Leang's SPC Triggered APC Control

Leang, et al. (40) accomplish control of a three stage photolithography sequence by triggering feedback and feed-forward control through the use of SPC based alarms. Although SPC monitoring can be used in the APC framework as discussed in chapter 7, the concept of triggering APC with SPC is a very different philosophy. The APC controller that they employ to control drift when it is detected is described as "similar" to the PCC controller presented in section 4.3.

The ad hoc model for the process presumably assumes that in general the process has stationary independent and identically distributed errors so that SPC techniques are appropriate. However, from time to time the assumption is that the process parameters of the linear models of the processes change due to rapid shifts or gradual drifts which are each detected by appropriate SPC methods which generate malfunction and control alarms respectively. Malfunction alarms trigger inspection by an operator, but control alarms are handled by the controller automatically. Feed-forward alarms are also used and these are detected by a version of acceptance charts based on whether the output of one stage is acceptable as an input to the next stage.

Leang and his co-authors actually present two different control strategies termed local and global control. In the local controller the control alarms invoke a routine of weighted least squares (weighting recent data more heavily) to generate new estimates of the parameters. These new estimates are used to produce a new recipe by optimizing a weighted sum of squared output deviations from target subject to the constraints that the settings lie in restricted ranges. In addition, the feed-forward alarms work in a similar fashion except that they adjust the subsequent stage recipe within the current run. The possible need for perturbation for stable estimation is not assessed.
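The idea of weighting recent data more heavily can be illustrated with a straight line fit. The sketch below is not the routine of Leang, et al., whose weighting scheme is not specified here; it assumes exponentially decaying weights controlled by a hypothetical `forgetting` factor, with the most recent observation given weight one.

```python
def weighted_line_fit(x, y, forgetting=0.9):
    """Fit y = intercept + slope*x by weighted least squares.

    Observations are assumed to be in time order; observation i receives
    weight forgetting**(n - 1 - i), so older data count for less.
    """
    n = len(x)
    weights = [forgetting ** (n - 1 - i) for i in range(n)]
    total = sum(weights)
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / total
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / total
    sxx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    sxy = sum(w * (xi - x_bar) * (yi - y_bar)
              for w, xi, yi in zip(weights, x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


if __name__ == "__main__":
    settings = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
    outputs = [1.0, 3.1, 4.9, 7.2, 9.4, 11.9]
    print(weighted_line_fit(settings, outputs, forgetting=0.8))
```

Because the settings in routine production tend to cluster near one setpoint, `sxx` can be close to zero, which is the estimation stability concern raised at the end of the paragraph above.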

a weighted cost scheme to determine the new optimal specifications. The effect of using these new specifications in the local control algorithms is that inputs from various stages get adjusted rather than just the current stage as is done in feedback control or just the subsequent stage as was done in feed-forward control.

The global controller has as its criterion to maintain the final stage output on target even if it requires sacrificing performance of intermediate stages through the adjustment of the intermediate targets. This is one way to approach the desired global optimization which will allow a multi-stage controller to outperform a controller which treats the control of each stage in isolation. A significant drawback of this controller is the strategy of waiting until the occurrence of a significant drift before implementing APC techniques. This will be inefficient if the processes actually exhibit nonstationary drift as an inherent part of the process.

8.1.3 Vaidyanathan's Multi-Stage BMWBO APC Control

Vaidyanathan (69) uses dynamic programming to develop the Batch-Wise Myopic Within Batch Optimal (BMWBO) control scheme along with other suboptimal versions of this controller for more practical implementation. The name of the controller stems from the fact that the controller uses the dynamic programming algorithm to optimize the current batch across stages rather than across runs or batches. So the controller is optimal within that batch, but does not consider all batches or runs in the optimization. Since the controller does not consider past runs explicitly, the controller is only capable of feed-forward control of the process. The objective function of the dynamic programming methodology includes quadratic forms of the control variables' deviations from their nominal values for each stage to avoid excessive control action. But only a quadratic form of the final stage responses' deviation from target is included.


considered, modeling drift across runs or batches is not attempted.

The uncertainty in the parameters is handled by using the Bayesian framework to specify conjugate prior distributions. The conjugate distributions are used to simplify the derivation of the posterior distribution. The resulting posterior parameter estimates are used within the dynamic programming algorithm to derive control laws that directly consider the uncertainty in the parameters. Although some "passive learning" is accomplished over runs due to the updating of the posterior distribution, convergence of the parameters is not achieved and the objective of learning is not explicitly included in the objective function.

8.2 Parallel Multi-Stage Approaches


Part II
