3.3 General optimisation theory

3.3.4 Iterative solution methods

Once a suitable function to be minimised has been determined, the next step is to set up a computational algorithm that finds its minimum. Many reference works describe the various methods and their characteristics. A few of the basic algorithms are treated in this section.

First-order Lagrangian method

As the name implies, this method only requires the first derivatives of the optimisation function $L(x)$ with respect to its parameters to exist in the neighbourhood of the optimum. Second-derivative information is not used. The idea of the first-order Lagrangian method is that, at any given point, the gradient $\nabla_x L(x)$ points in the direction of the steepest upward slope. A step in the opposite direction, which gives the method its alternative name of steepest descent method, is therefore expected to lead to a new point with a lower function value than the previous one:

$$ x_{k+1} = x_k - \alpha \nabla_x L(x_k) \tag{3.43} $$

with step size α > 0.

For a sufficiently small step size $\alpha$, $L(x_{k+1}) \le L(x_k)$. The step size may also vary from one iteration to the next. Larger step sizes are attractive for faster convergence. However, they risk making the iterates jump back and forth, or even jump "over" the local minimum, instead of converging gradually towards it.
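The following is a minimal Python sketch of (3.43). It uses NumPy and applies fixed-step steepest descent to the hypothetical quadratic test function $L(x) = \tfrac{1}{2} x^T A x$ with $A = \mathrm{diag}(1, 10)$. The test function and the two step sizes are illustrative assumptions, not taken from the source. For this quadratic, each component of $x$ is multiplied by $(1 - \alpha\lambda_i)$ at every step, so a fixed step converges only for $\alpha < 2/\lambda_{\max} = 0.2$.

```python
import numpy as np


def steepest_descent(grad, x0, alpha, n_steps):
    """Fixed-step steepest descent x_{k+1} = x_k - alpha * grad(x_k), cf. (3.43)."""
    x = np.asarray(x0, dtype=float)
    history = [x.copy()]
    for _ in range(n_steps):
        x = x - alpha * grad(x)
        history.append(x.copy())
    return np.array(history)


# Hypothetical test function L(x) = 0.5 x^T A x, minimum at the origin.
A = np.diag([1.0, 10.0])


def L(x):
    return 0.5 * x @ A @ x


def grad_L(x):
    return A @ x


small = steepest_descent(grad_L, [1.0, 1.0], alpha=0.05, n_steps=200)
large = steepest_descent(grad_L, [1.0, 1.0], alpha=0.25, n_steps=20)

print(L(small[-1]))  # tiny: the iterates converge to the minimum
print(L(large[-1]))  # huge: the x_2 component overshoots and grows each step
```

With $\alpha = 0.05$, the function value decreases at every step. With $\alpha = 0.25$, the second component is multiplied by $-1.5$ at each step, so the iterates jump over the minimum with growing amplitude.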

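Without taking inequality constraints into account, the stationary points of the Lagrangian function

$$ L(x, \mu) = f(x) + \mu^{T} g(x) \tag{3.44} $$

are found with the iteration below. Here $f$ is the objective function, $g(x) = 0$ the equality constraints and $\mu$ the vector of Lagrange multipliers.

$$ \begin{aligned} x_{k+1} &= x_k - \alpha \nabla_x L(x_k, \mu_k) \\ \mu_{k+1} &= \mu_k + \alpha\, g(x_k) \end{aligned} \tag{3.45} $$

The iteration combines a descent step in $x$ with an ascent step in $\mu$, since $\nabla_\mu L = g(x)$ and the solution is a saddle point of $L$.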

The convergence of this first-order algorithm is linear [42]. It converges to a local optimum near the starting point of the sequence, which is not necessarily the global one. Furthermore, when the Lagrangian function is twice continuously differentiable, there exists an upper bound on the step size. Below this bound, the sequence converges to the optimum, provided the iteration starts sufficiently close to it.
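The iteration (3.45) can be sketched in Python on a hypothetical toy problem: minimise $x_1^2 + x_2^2$ subject to $x_1 + x_2 - 1 = 0$. Its solution is $x^* = (0.5, 0.5)$ with multiplier $\mu^* = -1$. The sketch assumes the usual convention for $L = f + \mu^T g$: a descent step in $x$ and an ascent step $\mu_{k+1} = \mu_k + \alpha g(x_k)$ in the multipliers. The function names are illustrative only.

```python
import numpy as np


def first_order_lagrangian(grad_f, g, jac_g, x0, mu0, alpha, n_steps):
    """First-order Lagrangian iteration for min f(x) s.t. g(x) = 0, cf. (3.45)."""
    x = np.asarray(x0, dtype=float)
    mu = np.asarray(mu0, dtype=float)
    for _ in range(n_steps):
        grad_x_L = grad_f(x) + jac_g(x).T @ mu  # gradient of L = f + mu^T g w.r.t. x
        x_next = x - alpha * grad_x_L           # descent step in x
        mu = mu + alpha * g(x)                  # ascent step in mu, using g(x_k)
        x = x_next
    return x, mu


# Hypothetical toy problem: minimise x1^2 + x2^2 subject to x1 + x2 - 1 = 0.
def grad_f(x):
    return 2.0 * x


def g(x):
    return np.array([x[0] + x[1] - 1.0])


def jac_g(x):
    return np.array([[1.0, 1.0]])


x_opt, mu_opt = first_order_lagrangian(grad_f, g, jac_g, [0.0, 0.0], [0.0],
                                       alpha=0.1, n_steps=500)
print(x_opt, mu_opt)  # approximately [0.5 0.5] and [-1.]
```

For this particular problem, a short linearisation shows that the iteration converges linearly for $0 < \alpha < 1$ and fails to converge for $\alpha \ge 1$. This is an instance of the upper step-size limit mentioned above.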

Second-order method

The Newton-Raphson method (hereafter called the Newton method) was originally developed to find the roots of a function. In optimisation, it is applied to find the roots of the first derivative of the function to be minimised. For this, the Lagrangian function needs to be at least twice differentiable with respect to all its parameters in the neighbourhood of the local optimum. A drawback of the method is that it is attracted by local optima and only local convergence is guaranteed. In practice, the search must therefore start sufficiently close to the true optimum. The region of convergence can be enlarged using a merit function ([42], p.454). Moreover, a number of modifications have been developed to improve the global convergence of the Newton method.

The purpose of the Newton method is to find the roots $x^*$ of a given function $F(x)$, i.e. the points where $F(x^*) = 0$. In the optimisation case, it is not the function itself (the Lagrangian function $L$) but its first derivative that is set to zero:

$$ \nabla_x L(x) = 0 \tag{3.46} $$

Since the function $F: \mathbb{R}^n \to \mathbb{R}$ (here the Lagrangian $L$) has $n$ variables, (3.46) is implicitly a system of $n$ equations. To solve this set of equations, the Newton method iterates:

$$ x_{k+1} = x_k + \Delta x_k \tag{3.47} $$

$$ \nabla^2 L(x_k)\, \Delta x_k = -\nabla L(x_k) \tag{3.48} $$

Suppose an optimal solution $x^*$ exists and the Hessian $\nabla^2 F(x^*)$ is nonsingular there. It can then be proven that $\nabla^2 F(x)$ remains invertible in a neighbourhood of $x^*$. For starting points in this neighbourhood, the point sequence generated by the Newton iteration is said to be well-defined. The convergence to the local minimum is superlinear. It is at least of second order (quadratic) if $\nabla^2 F(x)$ is Lipschitz continuous³ in the neighbourhood of $x^*$.
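A minimal Python/NumPy sketch of the Newton iteration (3.47)-(3.48) follows. The test function $F(x) = x_1^2 + x_2^2 + e^{x_1 + x_2}$ is a hypothetical choice whose Hessian is positive definite everywhere, so every Newton step is well-defined. The sketch records the gradient norm at each iteration so that the fast local convergence becomes visible.

```python
import numpy as np


def newton_minimise(grad, hess, x0, tol=1e-12, max_iter=50):
    """Newton iteration on grad F(x) = 0, cf. (3.47)-(3.48)."""
    x = np.asarray(x0, dtype=float)
    grad_norms = []
    for _ in range(max_iter):
        g = grad(x)
        grad_norms.append(np.linalg.norm(g))
        if grad_norms[-1] < tol:
            break
        dx = np.linalg.solve(hess(x), -g)  # (3.48): H(x_k) dx_k = -grad F(x_k)
        x = x + dx                         # (3.47)
    return x, grad_norms


# Hypothetical test function F(x) = x1^2 + x2^2 + exp(x1 + x2).
def grad_F(x):
    e = np.exp(x[0] + x[1])
    return np.array([2.0 * x[0] + e, 2.0 * x[1] + e])


def hess_F(x):
    e = np.exp(x[0] + x[1])
    return np.array([[2.0 + e, e], [e, 2.0 + e]])


x_opt, norms = newton_minimise(grad_F, hess_F, [0.0, 0.0])
for k, n in enumerate(norms):
    print(f"iteration {k}: |grad F| = {n:.3e}")
```

Once the iterate is close to $x^*$, the number of correct digits should roughly double per step. The gradient norm then falls below $10^{-12}$ within a handful of iterations.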

Seeking global convergence

The Newton method has a few drawbacks:

• When $\nabla^2_x F(x)$ is not invertible in a particular region, the iteration process cannot compute the next step. This happens, for instance, where the function to be optimised becomes linear, so that $\nabla^2_x F(x) = 0$.

• The sequence of optimisation function values is not guaranteed to decrease at each step.

• Local extrema (minima as well as maxima) attract the iterates when the starting point is close enough to one of them, even when they are not the global optimum.

• There is no guarantee that the true global minimum is found.
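The first three drawbacks can be reproduced with a one-dimensional Python sketch on the hypothetical function $f(x) = x^4/4 - x^2/2$. This function has minima at $x = \pm 1$ and a local maximum at $x = 0$. Plain Newton steps on $f'(x) = 0$ have no notion of descent. A start near $x = 0$ is therefore drawn to the maximum while the function value increases. At $x = \pm 1/\sqrt{3}$ the second derivative vanishes and the Newton step is undefined.

```python
def newton_1d(df, d2f, x0, n_steps=20):
    """Plain Newton iteration on f'(x) = 0, cf. (3.47)-(3.48) with n = 1."""
    x = x0
    for _ in range(n_steps):
        x = x - df(x) / d2f(x)  # undefined where d2f(x) = 0
    return x


# Hypothetical test function: minima at x = -1 and x = +1, maximum at x = 0.
def f(x):
    return 0.25 * x**4 - 0.5 * x**2


def df(x):
    return x**3 - x


def d2f(x):
    return 3.0 * x**2 - 1.0


x_a = newton_1d(df, d2f, 0.8)  # converges to the minimum x = 1
x_b = newton_1d(df, d2f, 0.2)  # converges to the maximum x = 0
print(x_a, f(x_a))
print(x_b, f(x_b), "> f(0.2) =", f(0.2))
```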

The general idea for coping with these drawbacks is to interpolate between the steepest descent method (the first-order method) and the Newton method. The iteration starts with steepest-descent-like steps and gradually switches to Newton steps.

The matrix on the left-hand side of the step equation must be invertible, at least during the first steps of the iteration process. A method often used to achieve this is to modify the diagonal of the matrix so that it becomes positive definite (and, as a consequence, invertible).

The modified direction calculation becomes

$$ \left( \nabla^2 F(x_k) + \Gamma_k \right) \Delta x_k = -\nabla F(x_k) \tag{3.49} $$

with $\Gamma_k$ a diagonal matrix, chosen such that the matrix $\nabla^2 F(x_k) + \Gamma_k$ is positive definite.

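³ Lipschitz continuity is a strong form of function continuity in which the gradient (slope) between any two points of the function is bounded. That is, there is a constant $K$ such that $\|f(a) - f(b)\| \le K \|a - b\|$ for all $a$, $b$.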

An easily understood algorithm, developed in [46], is to take

$$ \Gamma_k = \gamma_k I \tag{3.50} $$

with $I$ the identity matrix of the appropriate size.

If the positive scalar parameter $\gamma_k$ is taken very large compared with the elements of $\nabla^2 F(x_k)$, the total matrix automatically becomes positive definite. However, the resulting step $\Delta x_k$ then becomes small: $\Delta x_k \approx -\nabla F(x_k)/\gamma_k$, a short step in the steepest descent direction. With $\gamma_k = 0$, the method falls back to the ordinary Newton formulation.
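The effect of $\Gamma_k = \gamma_k I$ in (3.49) can be sketched in Python. The sketch uses a hypothetical indefinite Hessian $\mathrm{diag}(2, -1)$ and gradient $(1, 1)^T$ at some iterate; these values are assumptions chosen for illustration. With $\gamma = 0$ the result is the pure Newton step, which here points uphill ($\nabla F^T \Delta x > 0$). Any $\gamma$ larger than 1 (minus the smallest eigenvalue) makes the matrix positive definite. A very large $\gamma$ gives $\Delta x \approx -\nabla F/\gamma$.

```python
import numpy as np


def regularised_step(hess, grad, gamma):
    """Solve (hess + gamma * I) dx = -grad, cf. (3.49) with Gamma_k = gamma * I (3.50)."""
    n = grad.shape[0]
    return np.linalg.solve(hess + gamma * np.eye(n), -grad)


def is_positive_definite(m):
    """Positive-definiteness test for a symmetric matrix via its eigenvalues."""
    return bool(np.all(np.linalg.eigvalsh(m) > 0.0))


# Hypothetical indefinite Hessian and gradient at some iterate x_k.
H = np.array([[2.0, 0.0], [0.0, -1.0]])
g = np.array([1.0, 1.0])

for gamma in (0.0, 10.0, 1e6):
    pd = is_positive_definite(H + gamma * np.eye(2))
    print(f"gamma={gamma:g}: positive definite={pd}, step={regularised_step(H, g, gamma)}")
```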

A possible algorithm is the following, repeated until convergence:

• Starting from a very high value of $\gamma$, calculate the step with the current $\gamma$ and take a provisional step forward.

• If the optimisation function has decreased, accept the step and lower the value of $\gamma$, e.g. divide it by 10.

• If the function has increased, reject the step and take a higher value for $\gamma$, e.g. multiply it by 10.
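The accept/reject scheme above can be sketched in Python for the unconstrained case. It closely resembles the well-known Levenberg-Marquardt damping strategy. Several choices are assumptions of this sketch rather than part of the described algorithm:

- the Rosenbrock test function;
- the starting value $\gamma_0 = 10^4$ and the gradient tolerance;
- the bounds `gamma_min` and `gamma_max`, which stop $\gamma$ from underflowing to zero or growing without limit.

```python
import numpy as np


def damped_newton(F, grad, hess, x0, gamma0=1e4, factor=10.0,
                  gamma_min=1e-12, gamma_max=1e12, tol=1e-8, max_iter=1000):
    """Accept/reject scheme with Gamma_k = gamma_k * I, cf. (3.49)-(3.50)."""
    x = np.asarray(x0, dtype=float)
    fx = F(x)
    gamma = gamma0                       # start close to steepest descent
    identity = np.eye(x.size)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:      # stationary point reached
            break
        dx = np.linalg.solve(hess(x) + gamma * identity, -g)
        f_trial = F(x + dx)              # provisional step
        if f_trial < fx:                 # decrease: accept, move towards Newton
            x, fx = x + dx, f_trial
            gamma = max(gamma / factor, gamma_min)
        else:                            # increase: reject, move towards steepest descent
            gamma *= factor
            if gamma > gamma_max:        # no decrease found even for tiny steps: stop
                break
    return x


# Hypothetical test problem: the Rosenbrock function, minimum at (1, 1).
def rosen(x):
    return (1.0 - x[0])**2 + 100.0 * (x[1] - x[0]**2)**2


def rosen_grad(x):
    return np.array([-2.0 * (1.0 - x[0]) - 400.0 * x[0] * (x[1] - x[0]**2),
                     200.0 * (x[1] - x[0]**2)])


def rosen_hess(x):
    return np.array([[2.0 - 400.0 * (x[1] - 3.0 * x[0]**2), -400.0 * x[0]],
                     [-400.0 * x[0], 200.0]])


print(damped_newton(rosen, rosen_grad, rosen_hess, [-1.2, 1.0]))
```

Started from the classical point $(-1.2, 1)$, the sketch should return a point very close to the minimum $(1, 1)$. Constraint handling, which causes the difficulties discussed below, is deliberately left out.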

As already pointed out, the algorithm takes small steps in the beginning, dictated by the steepest descent method. As the iterations progress, the Newton terms in the left-hand side of the step equation become predominant. As mentioned earlier, in the close vicinity of a minimum the Hessian $\nabla^2 F(x_k)$ is invertible as well.

However, this demonstration algorithm is not robust enough to solve complicated cases. When constraint functions are applied, the algorithm becomes indecisive: the order of magnitude of $\gamma$ does not decrease significantly over the iterations. As a result, second-order convergence is never activated, and successive iterations go back and forth without advancing to an optimum. This effect is due to the multiplication of the Lagrange multipliers with the constraint functions. The method described in Appendix B applies step size control to add robustness when the Lagrange multiplier updates fluctuate (subsection B.3.3).