Two problems should be considered in any multi-dimensional minimization procedure: firstly, how a direction $d$ is chosen, and secondly, how large a step is taken in that direction from the current point to the next. In the direction set methods, multi-dimensional minimization is reduced to a number of one-dimensional minimizations (line-minimization) (see section 5.4). A sequence of directions is constructed, and the function is minimized in each direction separately. Generally, it is desirable that a movement along a new direction does not spoil the minimization so far obtained along the previous direction $d_k$. This is the concept of mutual conjugacy, which is desirable in any direction set method. This criterion may be regarded as the major part of a minimization algorithm.
Various multi-dimensional minimization algorithms differ in the way that they define and update the next search direction. A diverse class of them is based on the fact that $f(x)$ (the function value at point $x$) increases or decreases in direction $d$ according to the sign of the directional derivative $\nabla f(x) \cdot d$ (i.e. gradient methods). Although minimization along the basis vectors can be attempted, minimizing the function only in these directions is inefficient when many tiny steps are required in a long, twisted and narrow valley. Using gradient information, on the other hand, is the most powerful means of setting the direction $d$ when the number of variables (directions) is large (Powell 1964). The methods of finding a direction in which to search for the minimum are divided into two main categories: those that do not require the evaluation of the derivatives, and those based on the directional derivatives. Although both can depend on the properties of conjugate directions, they differ in complexity and thus in accuracy and timing. Both types of method have been revised and improved several times by different workers since their initial implementations. However, in the following sections only the latest known versions of these algorithms are presented.
5.3.1- Powell method
The Powell method is a type of direction set method which does not require the evaluation of derivatives. A function $f(x)$ can be approximated by the quadratic form of its Taylor series as
$f(x) \approx c - b \cdot x + \frac{1}{2}\, x \cdot A \cdot x$, Equ. 5.1
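where the vector $b$ is the negative of the gradient (first partial derivatives) and the matrix $A$ is the Hessian (second partial derivatives) of the function, both evaluated at the expansion point. The gradient of such a function can be expressed by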
$\nabla f = A \cdot x - b$. Equ. 5.2
Moving along some direction $d_{k+1}$ towards the minimum implies a change $\delta$ in the gradient
$\delta(\nabla f) = A \cdot (\delta x)$. Equ. 5.3
As defined by e.g. Powell 1964, the gradient must stay perpendicular to the previous direction $d_k$ while moving along a new direction $d_{k+1}$, in order to satisfy the conjugacy condition for the two directions $d_k$ and $d_{k+1}$. From the above equation this is just
$d_k \cdot \delta(\nabla f) = d_k \cdot A \cdot d_{k+1} = 0$. Equ. 5.4
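The conjugacy condition of Equ. 5.4 can be checked numerically on a small example. The following is a minimal sketch in plain Python, assuming a made-up two-dimensional quadratic (the values of `A`, `b` and `x0` are illustrative only, and the helper names `dot`, `matvec`, `grad` and `line_min` are hypothetical). It minimizes along `d1`, then along a direction `d2` built to be conjugate to `d1`, and records that the gradient stays perpendicular to `d1` after the second move.

```python
# Illustrative 2-D quadratic f(x) = c - b.x + (1/2) x.A.x (values are made up).
A = [[4.0, 1.0], [1.0, 3.0]]   # symmetric, positive definite
b = [1.0, 2.0]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

def grad(x):
    # Equ. 5.2: grad f = A.x - b
    return [ax - bi for ax, bi in zip(matvec(A, x), b)]

def line_min(x, d):
    # Exact minimizer of the quadratic along x + lam*d:
    # d.grad(x + lam*d) = 0  =>  lam = -d.grad(x) / (d.A.d)
    lam = -dot(d, grad(x)) / dot(d, matvec(A, d))
    return [xi + lam * di for xi, di in zip(x, d)]

d1 = [1.0, 0.0]
# Build d2 conjugate to d1 by removing from e2 its A-weighted component along d1.
e2 = [0.0, 1.0]
coef = dot(d1, matvec(A, e2)) / dot(d1, matvec(A, d1))
d2 = [e - coef * d for e, d in zip(e2, d1)]

x0 = [5.0, -3.0]
x1 = line_min(x0, d1)          # minimize along d1
x2 = line_min(x1, d2)          # then along the conjugate direction d2

conjugacy = dot(d1, matvec(A, d2))   # Equ. 5.4: d1.A.d2
g1_dot_d1 = dot(grad(x1), d1)        # zero after minimizing along d1
g2_dot_d1 = dot(grad(x2), d1)        # remains zero after moving along d2
g2 = grad(x2)                        # full gradient after both line-minimizations
```

In this two-dimensional case the two conjugate line-minimizations are enough to bring every component of `g2` to zero (up to rounding), i.e. `x2` is the minimum of the quadratic.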
As stated by Powell 1964, a set of n (number of dimensions) linearly independent conjugate directions (or n line-minimizations) will find the minimum of a quadratic function. However, in a real situation the function is not exactly quadratic, and thus the process needs to be iterated using a number of such direction sets. Powell's quadratically convergent method is based on the properties of the conjugate directions. He used the quadratic interpolation method to find the minimum of the function $f(x)$ along a direction $d$ (for a set of n directions).
The algorithm finds a quadratic function (e.g. $Y(\lambda)$) which takes the same value as $f(x_k + \lambda d)$ for three current values of $\lambda$ (the step), where $x_k$ is the current point and $d$ is a given search direction along the line $x = x_k + \lambda d$. Having found the value of $\lambda$ at the minimum (i.e. $\lambda_m$) of the quadratic function $Y(\lambda)$, one of the old three values of $\lambda$ is replaced by $\lambda_m$. The process is repeated until the desired accuracy is obtained in the value of $\lambda_m$ in each direction.
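The interpolation step described above can be sketched as follows. This is a minimal illustration rather than Powell's exact procedure: the function names `parabola_vertex` and `quadratic_line_min` are hypothetical, the three starting step values are arbitrary, and the rule for choosing which old value to replace (here, the one with the highest function value) is an assumption, since the text does not specify it.

```python
import math

def parabola_vertex(l1, f1, l2, f2, l3, f3):
    """Abscissa of the stationary point of the parabola through three points.

    Returns None when the points do not define a unique parabola.
    """
    num = (l2 - l1) ** 2 * (f2 - f3) - (l2 - l3) ** 2 * (f2 - f1)
    den = (l2 - l1) * (f2 - f3) - (l2 - l3) * (f2 - f1)
    if den == 0.0:
        return None
    return l2 - 0.5 * num / den

def quadratic_line_min(phi, lams=(0.0, 0.5, 1.0), tol=1e-6, max_iter=50):
    """Estimate the step that minimizes phi(lam) = f(x_k + lam*d)."""
    pts = [(lam, phi(lam)) for lam in lams]
    for _ in range(max_iter):
        (l1, f1), (l2, f2), (l3, f3) = pts
        lam_m = parabola_vertex(l1, f1, l2, f2, l3, f3)
        best = min(pts, key=lambda p: p[1])
        if lam_m is None:
            return best[0]
        if abs(lam_m - best[0]) < tol:
            return lam_m
        # Assumed replacement rule: discard the point with the highest value.
        worst = max(range(3), key=lambda i: pts[i][1])
        pts[worst] = (lam_m, phi(lam_m))
    return min(pts, key=lambda p: p[1])[0]

# Usage on a smooth one-dimensional slice; the result is expected to be
# close to ln 2, the analytic minimizer of exp(lam) - 2*lam.
lam_star = quadratic_line_min(lambda lam: math.exp(lam) - 2.0 * lam)
```

The sketch does not guard against a parabola that opens downwards or against a new point that is worse than all three old ones; a production routine would need both safeguards.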
The directions $d_1, d_2, \ldots, d_n$ are defined by the algorithm outlined in the following steps. The minimization in these directions is left for section 5.4.
a) Give an initial guess as the starting point $x_0$ in the search space.
b) Initialize the n directions $d_1, d_2, \ldots, d_n$ of the first iteration with the directions of the basis vectors.
c) For $k = 1, 2, \ldots, n$, calculate the step size $\lambda_k$ by a line-minimization algorithm (see section 5.4), so that $f(x_{k-1} + \lambda_k d_k)$ is a minimum, and define $x_k = x_{k-1} + \lambda_k d_k$.
d) Find the one of the old directions in which the function made its largest decrease ($d_m$) and define $\Delta f = f(x_{m-1}) - f(x_m)$, where $1 \le m \le n$.
e) Calculate the function values $f_0 = f(x_0)$, $f_n = f(x_n)$ and $f_d = f(2x_n - x_0)$, the last being at a point somewhere further along the proposed new direction.
f) If $f_d > f_0$ and/or $(f_0 - 2f_n + f_d)(f_0 - f_n - \Delta f)^2 > \frac{1}{2}\,\Delta f\,(f_0 - f_d)^2$, repeat the above procedure (steps a to e) using the old directions $d_1, d_2, \ldots, d_n$ for the next iteration and $x_n$ for the next $x_0$; otherwise do the next step.
g) Minimize the function $f(x_n + \lambda d)$ in the new direction $d = x_n - x_0$, set $x_n + \lambda d$ as the starting point ($x_0$) for the next iteration, and replace the old direction $d_m$ by $d$; the rest of the old directions remain unchanged for the next iteration.
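Steps a to g can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not a reference implementation: the names `golden_line_min`, `powell` and `f_test` are hypothetical; a golden-section search on a fixed bracket stands in for the line-minimization of section 5.4; the stopping test on the decrease of the function is an added assumption; and the new direction replaces the direction of largest decrease, as in Powell (1964).

```python
import math

def golden_line_min(f, x, d, lo=-10.0, hi=10.0, tol=1e-10):
    """Step size lam in [lo, hi] minimizing f(x + lam*d) by golden-section search.

    Stand-in for the line-minimization of section 5.4. The bracket [lo, hi]
    is an assumption; f must be unimodal along the line inside it.
    """
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    phi = lambda lam: f([xi + lam * di for xi, di in zip(x, d)])
    a, b = lo, hi
    c, e = b - ratio * (b - a), a + ratio * (b - a)
    fc, fe = phi(c), phi(e)
    while b - a > tol:
        if fc < fe:                       # minimum lies in [a, e]
            b, e, fe = e, c, fc
            c = b - ratio * (b - a)
            fc = phi(c)
        else:                             # minimum lies in [c, b]
            a, c, fc = c, e, fe
            e = a + ratio * (b - a)
            fe = phi(e)
    return 0.5 * (a + b)

def powell(f, x0, tol=1e-12, max_iter=100):
    n = len(x0)
    # Step b: start from the basis vectors.
    dirs = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    x_start = list(x0)                    # step a
    for _ in range(max_iter):
        f0 = f(x_start)
        x, fx = list(x_start), f0
        delta_f, m = 0.0, 0
        for k, d in enumerate(dirs):      # step c: one line-minimization per direction
            lam = golden_line_min(f, x, d)
            x = [xi + lam * di for xi, di in zip(x, d)]
            f_new = f(x)
            if fx - f_new > delta_f:      # step d: track the largest decrease
                delta_f, m = fx - f_new, k
            fx = f_new
        fn = fx
        if f0 - fn <= tol:                # assumed stopping test
            return x, fn
        # Step e: function value further along the proposed direction.
        fd = f([2.0 * xn - xs for xn, xs in zip(x, x_start)])
        # Step f: decide whether to keep the old direction set.
        lhs = (f0 - 2.0 * fn + fd) * (f0 - fn - delta_f) ** 2
        rhs = 0.5 * delta_f * (f0 - fd) ** 2
        if fd > f0 or lhs > rhs:
            x_start = x
            continue
        # Step g: minimize along d = x_n - x_0 and replace direction d_m.
        d_new = [xn - xs for xn, xs in zip(x, x_start)]
        lam = golden_line_min(f, x, d_new)
        x_start = [xi + lam * di for xi, di in zip(x, d_new)]
        dirs[m] = d_new
    return x_start, f(x_start)

def f_test(p):
    # Made-up convex quadratic; its analytic minimum is at (1.6, -2.4), f = -1.4.
    x, y = p
    return (x - 1.0) ** 2 + (y + 2.0) ** 2 + 0.5 * x * y

x_min, f_min = powell(f_test, [0.0, 0.0])
```

The fixed bracket keeps the sketch short but limits each step to ten times the length of the current direction; a bracketing routine would normally be used before the line search.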
It was also shown by Powell that the way in which $d$ is defined ensures the conjugacy criterion of all the directions after n iterations. As well as being simple, the method was claimed to be more efficient than those methods which are based on the evaluation of the function derivatives. However, it shares with other direction set methods the problem of dropping into local minima.
5.3.2- Fletcher-Reeves method
The Fletcher-Reeves method (e.g. FRPRMN method) is a type of minimization algorithm which involves the calculation of the first derivative of the function. A sequence of mutually conjugate directions was constructed by Fletcher and Reeves 1964, using the derivative of a function approximated by a quadratic form. The method makes each $g_{k+1}$ (gradient of the function) orthogonal to its immediate predecessor $g_k$, and each direction $d_{k+1}$ conjugate to its predecessor $d_k$:
$g_{k+1} \cdot g_k = 0$, $d_{k+1} \cdot G \cdot d_k = 0$, Equ. 5.5
where $G$ is the Hessian matrix (which is positive definite) and $g_k$ is the gradient vector of $f(x)$. For a quadratic function $f(x)$, the minimum along the direction $d_k$ can be found at the point
$x_{k+1} = x_k + \lambda_k d_k$, Equ. 5.6
when moving from a current minimum at $x_k$. The value $\lambda_k$ is the step size required to minimize $f(x_k + \lambda_k d_k)$. The gradient at such a minimum is obtained for a quadratic function as
$g_{k+1} = g_k + \lambda_k\, G \cdot d_k$ $(k = 1, 2, \ldots, n)$. Equ. 5.7
$\lambda_k$ is chosen by a line-minimization algorithm (see section 5.4) to take the function to the minimum along the line, where the new gradient $g_{k+1}$ is orthogonal to the direction $d_k$, satisfying
$\nabla f \cdot d_k = g_{k+1} \cdot d_k = 0$. Equ. 5.8
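As a short worked step, and assuming the exactly quadratic case with a constant Hessian $G$, the step size implied by Equ. 5.7 and Equ. 5.8 can be written in closed form. Taking the scalar product of Equ. 5.7 with $d_k$ and applying Equ. 5.8 gives:

```latex
\begin{aligned}
g_{k+1} \cdot d_k &= g_k \cdot d_k + \lambda_k \, d_k \cdot G \cdot d_k = 0 \\
\Rightarrow \quad \lambda_k &= -\,\frac{g_k \cdot d_k}{d_k \cdot G \cdot d_k}
\end{aligned}
```

For a steepest-descent first step, $d_1 = -g_1$, this reduces to $\lambda_1 = (g_1 \cdot g_1)/(g_1 \cdot G \cdot g_1)$, which is positive because $G$ is positive definite. When the function is not exactly quadratic this expression is only approximate, which is why $\lambda_k$ is obtained from the line-minimization instead.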
The direction of the downhill gradient (as in the steepest descent method, $d_1 = -g_1$) is used initially. Subsequent conjugate directions are chosen by the following direction sequence, satisfying the mutual conjugacy of the directions.
$d_{k+1} = -g_{k+1} + \gamma\, d_k$ $(k = 1, 2, \ldots, n)$, Equ. 5.9