Constant Forgetting Factors - Least Squares Parameter Estimation

CHAPTER 3 Adaptive Prediction

3.3 Least Squares Parameter Estimation

3.3.2 Constant Forgetting Factors

Since

$$S_k = \sum_{j=1}^{k} x(j)\,x(j)^T \qquad (3.38)$$

it is clear that the norm $\|S\|$ tends to infinity under persistent excitation as the number of observations increases. As a result the norm of the gain matrix $\|K\|$ tends to zero, so that the parameter estimate $\hat{\theta}$ tends to a constant value. Such behaviour is acceptable if the parameters of the process are indeed constant. However, for slowly time-varying parameters the algorithm should "track" the parameters. This can easily be done by introducing an exponential forgetting of past data, so that more weight is given to recent data, or, as below, by using an estimate based on Kalman filtering. With exponential forgetting the functional to be minimised is modified to

$$V = \sum_{j=1}^{k} \beta^{k-j}\,[y(j) - x(j)^T\theta]^2 \qquad (3.39)$$

where $0 < \beta \le 1$ is a forgetting factor. When $\beta = 1$ all data is weighted equally; with $\beta < 1$ recent measurements are given greater weighting than older ones. The recursive equations are then modified to become

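$$K_k = \frac{P_{k-1}\,x(k)}{\beta + x(k)^T P_{k-1}\,x(k)}$$

$$\hat{\theta}_k = \hat{\theta}_{k-1} + K_k\,[y(k) - x(k)^T\hat{\theta}_{k-1}] \qquad (3.40)$$

$$P_k = \frac{1}{\beta}\,[I - K_k\,x(k)^T]\,P_{k-1}$$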
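The recursion can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard form of recursive least squares with exponential forgetting (gain $K = P x / (\beta + x^T P x)$ and covariance update $[I - K x^T] P / \beta$); the function names, the two-parameter test process and its numerical values are hypothetical and are not taken from the text.

```python
import numpy as np

def rls_forgetting_step(theta, P, x, y, beta=0.98):
    """One RLS update with a constant forgetting factor beta (0 < beta <= 1)."""
    x = np.asarray(x, dtype=float)
    Px = P @ x
    e = y - x @ theta                        # a priori prediction error
    K = Px / (beta + x @ Px)                 # gain vector
    theta_new = theta + K * e                # parameter update
    P_new = (P - np.outer(K, x @ P)) / beta  # [I - K x^T] P / beta
    return theta_new, P_new, e

def run_demo(n_steps=200, beta=0.98, seed=0):
    """Noise-free identification of a hypothetical two-parameter process."""
    rng = np.random.default_rng(seed)
    theta_true = np.array([1.5, -0.7])       # illustrative values only
    theta = np.zeros(2)
    P = 1e3 * np.eye(2)                      # large initial covariance
    for _ in range(n_steps):
        x = rng.normal(size=2)               # persistently exciting regressor
        y = x @ theta_true
        theta, P, _ = rls_forgetting_step(theta, P, x, y, beta)
    return theta, theta_true, P
```

Calling `run_demo()` returns the final estimate, the true parameters and the final covariance. With `beta < 1` and exciting data the covariance settles at a non-zero level instead of decaying towards zero, which is what allows tracking.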

With the inclusion of the forgetting factor $\beta$, the update of the information matrix $S_k$ becomes

$$S_k = \beta S_{k-1} + x(k)\,x(k)^T \qquad (3.41)$$

or equivalently

$$S_k = S_{k-1} - (1-\beta)\,S_{k-1} + x(k)\,x(k)^T \qquad (3.42)$$

This shows clearly that the data matrix update involves discarding or forgetting $(1-\beta)S_{k-1}$ of the previous data together with the addition of the new data term $x(k)\,x(k)^T$.

Since $\beta < 1$, (3.40) shows that with this simple modification $\|P\|$ and $\|K\|$ do not tend to zero and the estimator can track slowly varying parameters. The estimate of $\theta$ is affected by all data, but the asymptotic sample length, or number of important observations, is given as $1/(1-\beta)$.
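The asymptotic sample length can be checked by summing the exponential weights as a geometric series. The numerical values of $\beta$ below are illustrative only:

```latex
\sum_{i=0}^{\infty} \beta^{i} = \frac{1}{1-\beta}, \qquad 0 < \beta < 1,
\qquad\text{so}\qquad
\beta = 0.95 \;\Rightarrow\; 20 \text{ samples}, \quad
\beta = 0.99 \;\Rightarrow\; 100 \text{ samples}.
```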

Use of this technique requires care when very little exciting data is available. The vector $x(k)$ may then be close to zero, which will after a period result in

$$P_k = \frac{1}{\beta}\,P_{k-1} \qquad (3.43)$$

giving an exponential blow-up of the P matrix. In picturesque speech the algorithm is forgetting but at the same time getting keener to learn. Blow up of the covariance matrix results in extreme sensitivity to disturbance and noise. In adaptive control loops this extreme sensitivity may cause rapid movements of the parameter estimates when the data vector becomes exciting after a prolonged "quiet" state and result in instability.
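The blow-up is easy to reproduce numerically. The sketch below assumes the covariance update $P_k = [I - K_k x(k)^T] P_{k-1} / \beta$ with gain $K_k = P_{k-1} x(k) / (\beta + x(k)^T P_{k-1} x(k))$ and feeds it a data vector that is identically zero; the function name and the chosen values are hypothetical.

```python
import numpy as np

def covariance_without_excitation(P0, beta, n_steps):
    """Propagate the covariance update with x(k) = 0 for n_steps samples."""
    P = np.array(P0, dtype=float)
    for _ in range(n_steps):
        x = np.zeros(P.shape[0])             # "quiet" period: no excitation
        Px = P @ x
        K = Px / (beta + x @ Px)             # gain is zero when x is zero
        P = (P - np.outer(K, x @ P)) / beta  # reduces to P / beta
    return P

P_end = covariance_without_excitation(np.eye(2), beta=0.95, n_steps=100)
growth = P_end[0, 0]                         # equals 0.95 ** -100
```

After 100 quiet samples with `beta = 0.95` each diagonal element has grown by the factor `0.95 ** -100`, which is more than two orders of magnitude.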

To avoid blow-up difficulties, various empirical methods (Goodwin and Sin 1984) have been proposed, such as putting upper bounds on the diagonal elements of the $P$ matrix or on its trace. Unfortunately the choice of upper bound is often not obvious.

3.3.3 Variable Exponential Forgetting

It is logical that the amount of exponential forgetting should depend on the information content of the data. The absence of exciting data should then cause $\beta$ to tend to unity, whereas any sudden or gradual parameter change should reduce $\beta$ so as to extract the maximum information from the new data. The following data-dependent modification was proposed by Astrom (1980)

$$\beta(k) = 1 - \alpha\,\frac{e(k)^2}{\overline{e^2}} \qquad (3.44)$$

where $e(k)$ is the prediction error at time $k$, $\overline{e^2}$ is a moving average mean value of $e(k)^2$, and $\alpha$ is a small constant (typically 1/1000). A sudden increase in the prediction errors reduces $\beta(k)$ temporarily, enabling an increase in the covariance matrix and rapid adaption.
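A minimal sketch of such a data-dependent factor follows, assuming the form $\beta(k) = 1 - \alpha\,e(k)^2/\overline{e^2}$. The text does not say how the moving average is formed, so the recursive mean with a hypothetical `window` parameter, the lower bound `beta_min` and both function names are assumptions made for illustration.

```python
def astrom_forgetting_factor(e, e2_mean, alpha=1e-3, beta_min=0.5):
    """Forgetting factor 1 - alpha * e^2 / mean(e^2), bounded below by beta_min."""
    ratio = e ** 2 / e2_mean if e2_mean > 0.0 else 0.0  # guard against a zero mean
    return max(beta_min, 1.0 - alpha * ratio)           # beta_min is an assumed safeguard

def update_mean_square(e2_mean, e, window=50):
    """Recursive approximation of a moving average of e^2 (assumed form)."""
    return e2_mean + (e ** 2 - e2_mean) / window
```

When the current squared error equals its running mean the factor is `1 - alpha`, which corresponds to an asymptotic sample length of `1 / alpha` (1000 samples for the typical value quoted above). An error ten times the typical level gives `beta = 0.9` with the default `alpha`.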

Fortescue et al (1981) develop an algorithm that calculates the required forgetting factor to keep the weighted sum of squares of posterior estimation errors constant. The weighted moving average sum of squares of the posterior errors is used as an indication of the information content of the estimator. The method can be justified for near deterministic systems by considering the posterior prediction error at each sampling interval. If the error is small then the possibilities are

*Nothing has happened, process is at steady state.

*An excitation has occurred but the estimated parameters are correct.

*The forgetting factor is already sensitive enough to reduce parameter errors.

A large error indicates the need for increased sensitivity by reducing the forgetting factor.

The algorithm finds the value of the time-varying $\beta(k)$ that keeps constant the "information measure", which is a time-varying weighted version of least squares.

$$V(k) = V(k-1)$$

An exact solution involves solving a complex quadratic relationship for $\beta(k)$.

For small excitation, where the errors $e(k)$ are close to zero, (3.46) shows that $\beta(k)$ tends to 1. For large errors $e(k)^2$ increases and makes $\beta(k)$ tend to 0 temporarily, which implies a rapid increase in $P$ so that the desired adaption can occur. After satisfactory adaption $\beta(k)$ will tend to 1 as the prediction errors decrease.

In terms of implementation $\Sigma_0$, the nominal information content, has to be chosen instead of $\beta$. A reasonable choice for $\Sigma_0$, which is related to both the asymptotic sample length and the expected noise variance, would be $\Sigma_0 = \sigma^2 N$, where $\sigma^2$ is the expected noise variance and $N$ the desired nominal ASL. A small value of $N$ will give a large covariance matrix and a sensitive system; larger values result in slower, less sensitive adaption. The tacit assumption in the above choice of $\Sigma_0$ is that the noise variance is constant. This will be a false assumption in many real processes. After an increase in noise or a sudden load disturbance, the algorithm with a fixed value of $\Sigma_0$ will still attempt to keep the information measure constant. The recent data would then be weighted strongly, whereas in the absence of any real parameter changes the old data is more accurate. In practical applications, to avoid complications with the quadratic (3.46), the approximate Fortescue algorithm

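$$e(k) = y(k) - x(k)^T\hat{\theta}_{k-1}$$

$$\alpha(k) = \frac{e(k)^2}{\Sigma_0\,[1 + x(k)^T P_{k-1}\,x(k)]} \qquad (3.47)$$

$$\beta(k) = 1 - \alpha(k)$$

$$P_k = \frac{1}{\beta(k)}\,[I - K_k\,x(k)^T]\,P_{k-1}$$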

is used. This algorithm is closely related to the algorithm of Astrom, as the following shows.
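Before turning to that comparison, the approximate algorithm can be sketched as a single update step. This is an illustration under stated assumptions, not a definitive implementation: the gain is taken as $K_k = P_{k-1} x(k) / [1 + x(k)^T P_{k-1} x(k)]$, the forgetting factor as $\beta(k) = 1 - e(k)^2 / (\Sigma_0 [1 + x(k)^T P_{k-1} x(k)])$, and a lower bound `beta_min` is added as a safeguard that is not part of the text. The names `fortescue_step`, `sigma0` and `beta_min` are hypothetical.

```python
import numpy as np

def fortescue_step(theta, P, x, y, sigma0, beta_min=0.1):
    """One update of a variable forgetting factor estimator (approximate form)."""
    x = np.asarray(x, dtype=float)
    Px = P @ x
    denom = 1.0 + x @ Px                      # 1 + x^T P x
    e = y - x @ theta                         # a priori prediction error
    K = Px / denom                            # gain with unit denominator (assumed)
    theta_new = theta + K * e
    alpha = e ** 2 / (sigma0 * denom)         # normalised squared error
    beta = max(beta_min, 1.0 - alpha)         # assumed lower bound on beta
    P_new = (P - np.outer(K, x @ P)) / beta   # [I - K x^T] P / beta(k)
    return theta_new, P_new, beta

# Zero prediction error: beta stays at 1 and P shrinks as in ordinary least squares.
th0, P0, b0 = fortescue_step(np.zeros(2), np.eye(2), [1.0, 0.0], 0.0, sigma0=1.0)
# Unit prediction error with sigma0 = 1: alpha = 0.5, so beta drops to 0.5.
th1, P1, b1 = fortescue_step(np.zeros(2), np.eye(2), [1.0, 0.0], 1.0, sigma0=1.0)
```

The only tuning quantity is `sigma0`, which plays the role of the nominal information content; a larger value makes the factor less responsive to a given prediction error.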

Rewriting (3.14)

$$y(k) = x(k)^T\theta_0 + w(k)$$

where $w(k)$ is $N(0, \sigma^2)$. The prediction error at time $k$ will be

$$e(k) = y(k) - x(k)^T\hat{\theta}_{k-1} \qquad (3.49)$$

$$\phantom{e(k)} = x(k)^T[\theta_0 - \hat{\theta}_{k-1}] + w(k)$$

From $E(\hat{\theta}_{k-1}) = \theta_0$ it follows that the expected prediction error will be zero,

$$E\,e(k) = 0 \qquad (3.50)$$

and the expected mean square prediction error is

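$$E[e(k)^2] = E\{[x(k)^T(\theta_0 - \hat{\theta}_{k-1}) + w(k)]\,[x(k)^T(\theta_0 - \hat{\theta}_{k-1}) + w(k)]^T\}$$

$$= x(k)^T E[(\theta_0 - \hat{\theta}_{k-1})(\theta_0 - \hat{\theta}_{k-1})^T]\,x(k) + 2E[x(k)^T(\theta_0 - \hat{\theta}_{k-1})\,w(k)] + E[w(k)^2] \qquad (3.51)$$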

From (3.33) for the covariance matrix and noting that w(k) is a white sequence

$$E\,e(k)^2 = x(k)^T\sigma^2 P_{k-1}\,x(k) + \sigma^2$$

$$= \sigma^2\,[x(k)^T P_{k-1}\,x(k) + 1] \qquad (3.52)$$

The mean value used in Astrom's algorithm is directly proportional to (3.52), so that the two algorithms are approximately the same. The variable forgetting methods discussed cannot guarantee that $P$ will not get very large, and in the presence of noise they can still result in blow-up. Modifications that monitor the trace of $P$ and revert to $\beta = 1$ whenever trace $P$ is bigger than some limit can be included (Cordero & Mayne 1981).
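The trace safeguard amounts to a one-line test applied before each covariance update. The sketch below is illustrative only; the function name and the `trace_limit` argument are hypothetical, and the text gives no guidance on the value of the limit.

```python
import numpy as np

def guarded_forgetting_factor(beta, P, trace_limit):
    """Return beta, or 1.0 (no forgetting) when trace(P) exceeds trace_limit."""
    return 1.0 if np.trace(P) > trace_limit else beta

beta_large_P = guarded_forgetting_factor(0.9, 10.0 * np.eye(2), trace_limit=15.0)
beta_small_P = guarded_forgetting_factor(0.9, np.eye(2), trace_limit=15.0)
```

With $\beta = 1$ the update $P_k = [I - K_k x(k)^T] P_{k-1}$ can no longer increase $P$, so the covariance stops growing until its trace falls back below the limit.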

Different variable forgetting factors can be devised by stipulating alternative measures of information to be kept constant by the variable forgetting factor. If, for example, the norm of the covariance matrix weighted by the current data vector is used as the measure of information (this is related to the sigma diagnostic of Clarke & Gawthrop (1979)), then $\beta$ must be chosen to make

$$x(k)^T P_k\,x(k) = x(k)^T P_{k-1}\,x(k) \qquad (3.53)$$

The solution for $\beta$ is

$$\beta(k) = 1 - \sigma(k), \qquad \sigma(k) = \frac{x(k)^T P_{k-1}\,x(k)}{1 + x(k)^T P_{k-1}\,x(k)} \qquad (3.54)$$

This demonstrates that when the information content is small $\beta$ will be close to 1.0; when excitation increases, $\beta$ decreases to increase the rate of adaption.
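The step from the constant-information condition to the solution for $\beta$ can be written out as follows. The derivation assumes the covariance update $P_k = [I - K_k x(k)^T] P_{k-1} / \beta(k)$ with the gain $K_k = P_{k-1} x(k) / [1 + x(k)^T P_{k-1} x(k)]$, and uses the shorthand $q$ (introduced here) for the weighted norm:

```latex
q = x(k)^T P_{k-1}\,x(k), \qquad
x(k)^T P_k\,x(k)
  = \frac{1}{\beta(k)}\left(q - \frac{q^2}{1+q}\right)
  = \frac{q}{\beta(k)\,(1+q)} .

\text{Requiring } x(k)^T P_k\,x(k) = q \text{ with } q \neq 0:
\qquad
\beta(k) = \frac{1}{1+q} = 1 - \frac{q}{1+q} .
```

For $q \to 0$ (little excitation) this gives $\beta(k) \to 1$, and for large $q$ it gives $\beta(k) \to 0$, in agreement with the behaviour described above.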

In document Prediction and control of the motions of marine structures (Page 65-71)