where I is the identity matrix and Ô is an arbitrary weight that defines whether the algorithm behaves more as a gradient descent or more as a Newton approach If ô is zero
algorithm 10 times to a 1000 point orbit In section 3.5 the dynamic error was reduced to near machine precision whereas here there are points where the dynamic error has
actually increased. Thus, for non-hyperbolic systems, the limitations o f noise reduction associated with poor function approximation will not just be o f the order o f the approximation errors, as they would be with hyperbolicity. This means that function approximation plays a very important role in achieving good noise reduction and it may well be the bottle-neck o f the process as a whole.
However this does not necessarily mean that if we do not know the dynamics exactly we must reconcile ourselves to being able to, at best, obtain moderate performance from our algorithms. This would only be true if there where no nearby mapping functions that contained shadowing trajectories o f the orbit o f interest. In fact we can hope that there will be plenty. The work by Nusse and Yorke [1988] on the shadowing o f pseudo-orbits from a one-parameter family o f one-dimensional maps by true orbits from nearby systems lends weight to this idea. Furthermore we are only interested in achieving this shadowing for a finite length trajectory. If there exist nearby functions that satisfy this weaker shadowing condition it is clear that some nearby maps will be far more effective than others when applying noise reduction. It would therefore be wise to try to optimise this in the noise reduction algorithm. Although may this appear to be a daunting task, in the next two sections, we will discuss some ideas that go some way towards achieving this aim.
4.2 Estimating the Dynamics
It is not immediately clear what the relationship between function approximation and the noise reduction process is. However if we consider the problem o f noise reduction in terms o f the minimisation o f the cost function H as in chapter 3 this is the same cost function that is minimised in an Ordinary Least Squared function approximation, as described in chapter 2. The only difference between the two is that in each case the minimisation is with respect to different parameters.
Thus it is possible to consider the two processes simultaneously by using the extended cost function H(x,p). That is: a function o f both the trajectory and the parameter family, p , o f the function approximation (although the sets o f local approximations do not have
explicit independent families o f parameters defining them it is fair to assume their existence). This is used in section 4.2.1 to investigate how the choice o f the mapping function is chosen in the two step method. In section 4.2 .2 we go on to propose an optimised algorithm.
4.2.1 Alternating Noise Reduction and Dynamics Estimation
When the problem is posed in terms o f the cost function H(x,p), alternating between function estimation and noise reduction can be seen as minimising H by means o f zigzagging down the cost function. This idea is illustrated graphically in Figure 4.2.1. However this still ignores the problem that minimising the error by making adjustments to the trajectory only implicitly tackles the shadowing problem. The cost function is still degenerate due to the existence o f a d-\-dp dimensional minimum set, where dp is the dimension o f the parameter vector p, and thus how well the cleaned orbit shadows the original data depends on the resulting point in the minimum set that is reached. This will now be a function o f how the final mapping function is chosen.
Since the shadowing problem requires that the adjustments to the trajectory are kept small the bias between reducing the error by trajectory adjustment or parameter adjustment should overwhelming tend towards optimising the mapping function estimate. This method can be justified if a gradient descent algorithm is used, since, in this case, the process takes small steps down the extended cost function, constrained to the best estimate for the mapping function. Although, theoretically, it is better to take small steps followed by repeatedly refitting the mapping function, in practice, the gradient descent method suffers on speed and the best approach will be somewhere in between a Newton type method and the gradient descent approach.
Figure 4.2.1 A graphical representation of the two step approach to noise reduction, where the adjustments to p and the adjustments to x are made independently. The resulting
minimisation
scheme involves zigzagging down the cost function H(x,p).algorithm since it allows us to consider the performance on a whole spectrum o f methods from almost Newton to almost gradient descent by varying the Levenberg-Marquardt parameter 5. The data for this comparison was from the Henon map and was the same as that used at the end o f chapter 3. The function approximation used was a set o f Gaussian radial basis functions placed at the first 10 points in the embedded data set, once the data had been rescaled to span [-0.5,0.5]. The rescaling parameter in the Gaussian was set to 0.1. This function approximation was chosen since it produced a good fit, although no attempt was made to formally optimise it and the same function approximation was used in each case. Finally the embedding dimension used was 2, as before.
The mapping function was estimated using an ordinary least squared estimate and was followed by a single step o f the Levenberg-Marquardt algorithm detailed in section 3.4.3. Both these stages were then repeated 10 times, after which the distance between the cleaned trajectory and the original deterministic orbit was measured, where is defined as:
- y f
where x is the filtered trajectory and y is the original deterministic orbit. The sum o f the