The aim of optimization is to approach the optimum iteratively from a starting point. Optimization algorithms differ mainly in how they choose the search direction and the step length. Popular algorithms for local and global optimization are surveyed as follows:
Trust region reflective algorithm
In an unconstrained problem of minimizing f(x), optimization algorithms use various approaches to seek a proper step s from the current position x that gives a smaller updated function value, f(x+s) < f(x). In the trust region reflective (TRR) algorithm, the objective function f is approximated within a neighbourhood of the current position x by a simpler function q, which is often quadratic. This neighbourhood is named the trust region R.
CHAPTER 4. OPTIMAL INPUT DESIGN FOR SYSTEM IDENTIFICATION 58
A trial step u is then computed by minimizing q over the trust region:
$$\min_{u} \; q(u), \qquad u \in R \qquad (4.6)$$
If f(x+u) < f(x), the current position is updated to x+u, the trust region is enlarged, and the procedure is repeated until the function value converges. If f(x+u) ≥ f(x), the current position is kept, the trust region is contracted, and the subproblem in equation (4.6) is solved again [89].
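The accept/reject rule above can be sketched in Python. This is a minimal sketch under stated assumptions, not the TRR implementation of any toolbox: the quadratic model q is built from the exact gradient and Hessian, the subproblem (4.6) is solved only approximately (the Newton step when it fits inside the region, otherwise the Cauchy point along the steepest descent direction), the enlarge and contract factors of 2 and 0.5 and the maximum radius are arbitrary choices, and the bound handling and reflection that give TRR its name are omitted. The function name `trust_region_minimize` is hypothetical.

```python
import numpy as np

def trust_region_minimize(f, grad, hess, x0, radius=1.0, max_radius=10.0,
                          tol=1e-8, max_iter=500):
    """Simplified trust region method (hypothetical helper, no bounds/reflection).

    The model q(u) = f(x) + g^T u + 0.5 u^T H u is minimized approximately:
    the Newton step if H is positive definite and the step fits in the region,
    otherwise the Cauchy point.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol or radius < 1e-12:
            break
        H = hess(x)
        u = None
        # Newton step: unconstrained minimizer of q when H is positive definite
        if np.all(np.linalg.eigvalsh(H) > 0):
            newton = -np.linalg.solve(H, g)
            if np.linalg.norm(newton) <= radius:
                u = newton
        if u is None:
            # Cauchy point: minimizer of q along -g inside the trust region
            gHg = g @ H @ g
            gnorm = np.linalg.norm(g)
            tau = 1.0 if gHg <= 0 else min(gnorm ** 3 / (radius * gHg), 1.0)
            u = -tau * radius / gnorm * g
        if f(x + u) < f(x):
            x = x + u                                  # accept step, enlarge region
            radius = min(2.0 * radius, max_radius)
        else:
            radius *= 0.5                              # reject step, contract region
    return x

# Example: a convex quadratic with minimizer (3, -1)
f = lambda x: (x[0] - 3) ** 2 + 2 * (x[1] + 1) ** 2
grad = lambda x: np.array([2 * (x[0] - 3), 4 * (x[1] + 1)])
hess = lambda x: np.diag([2.0, 4.0])
x_opt = trust_region_minimize(f, grad, hess, [0.0, 0.0])
print(x_opt)
```

Started from the origin, the first two iterations use Cauchy steps because the Newton step does not yet fit inside the region; once the region has been enlarged, the Newton step fits and lands on the minimizer.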
Sequential quadratic programming
Sequential quadratic programming (SQP) is one of the most popular methods for constrained optimization. For the general optimization problem in equation (4.1), the Lagrangian function is given by:
$$L(x, \lambda, \sigma) = f(x) - \lambda^{T} a(x) - \sigma^{T} b(x) \qquad (4.7)$$
where λ and σ are Lagrange multipliers. The principal idea of SQP is to solve the problem through a sequence of approximating subproblems. At the current position $x_k$, a subproblem is formed from a quadratic approximation of the Lagrangian function:
$$\min_{d} \; L(x_k, \lambda_k, \sigma_k) + \nabla L(x_k, \lambda_k, \sigma_k)^{T} d + \tfrac{1}{2} d^{T} H_k d \qquad (4.8)$$
subject to: $a(x_k) + \nabla a(x_k)^{T} d \le 0$, $\; b(x_k) + \nabla b(x_k)^{T} d = 0$
where $H_k$ is the Hessian of the Lagrangian function (in practice often a quasi-Newton approximation of it) and d is the search direction. The solution of the subproblem is used to find the next point $x_{k+1}$. This iterative process is constructed so that the sequence $\{x_k\}$ converges to a local minimum [90].
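A minimal Python sketch of the SQP iteration, restricted (as an assumption, for brevity) to equality constraints b(x) = 0 only, so that each quadratic subproblem can be solved exactly through one linear KKT system instead of a QP solver. It uses the exact Hessian of the Lagrangian $L = f - \sigma^{T} b$ and takes full steps without a line search or merit function, so it is only locally convergent. Using $\nabla f$ instead of $\nabla L$ in the linear term gives the same step d, because the linearized constraint fixes $\nabla b^{T} d$; the subproblem multiplier then directly gives the updated σ. The function name `sqp_equality` and the test problem are hypothetical.

```python
import numpy as np

def sqp_equality(grad_f, b, jac_b, hess_L, x0, sigma0, tol=1e-10, max_iter=50):
    """Local SQP for min f(x) s.t. b(x) = 0 (hypothetical helper, no globalization).

    Each iteration solves the QP subproblem
        min_d  grad_f^T d + 0.5 d^T H d   s.t.  b(x_k) + J d = 0
    through its KKT system, where H is the Hessian of L = f - sigma^T b.
    """
    x = np.asarray(x0, dtype=float)
    sigma = np.atleast_1d(np.asarray(sigma0, dtype=float))
    for _ in range(max_iter):
        J = np.atleast_2d(jac_b(x))           # one row per constraint gradient
        bx = np.atleast_1d(b(x))
        H = hess_L(x, sigma)
        n, m = x.size, bx.size
        # KKT system: H d - J^T sigma_new = -grad_f,  J d = -b
        kkt = np.block([[H, -J.T], [J, np.zeros((m, m))]])
        rhs = np.concatenate([-grad_f(x), -bx])
        sol = np.linalg.solve(kkt, rhs)
        d, sigma = sol[:n], sol[n:]           # search direction, new multipliers
        x = x + d
        if np.linalg.norm(d) < tol:
            break
    return x, sigma

# Example: minimize x1 + x2 on the circle x1^2 + x2^2 = 2 (minimum at (-1, -1))
grad_f = lambda x: np.array([1.0, 1.0])
b = lambda x: np.array([x[0] ** 2 + x[1] ** 2 - 2.0])
jac_b = lambda x: np.array([[2 * x[0], 2 * x[1]]])
hess_L = lambda x, s: -2.0 * s[0] * np.eye(2)     # Hessian of f - sigma * b
x_opt, sigma_opt = sqp_equality(grad_f, b, jac_b, hess_L, [-1.2, -0.8], -0.5)
print(x_opt, sigma_opt)
```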
Interior point algorithm
The interior point (IP) algorithm is a method for linear and nonlinear convex optimization. It transforms the general problem of equation (4.1) into an equality-constrained form given by:
$$\min_{x,s} f_{\mu}(x, s) = \min_{x,s} \Big( f(x) - \mu \sum_{i} \ln(s_i) \Big) \qquad (4.9)$$
subject to: $a(x) = 0$, $\; b(x) + s = 0$
where $-\mu \sum_i \ln(s_i)$ is a logarithmic barrier term, the slack variables $s_i$ are restricted to positive values so that the logarithm is defined, and µ > 0 is the barrier parameter. As µ is driven towards zero, the solution of equation (4.9) approaches the solution of equation (4.1). In this way, an optimization problem with equality and inequality constraints is reduced to a sequence of equality-constrained problems. Either Newton's method or the conjugate gradient method can be used to solve each approximating equality-constrained problem [91].
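The barrier idea behind equation (4.9) can be sketched as a simple primal log-barrier method in Python. Assumptions, for brevity: there are no equality constraints a(x), the inequality constraints are linear, b(x) = Ax − c ≤ 0, so the slack of b(x) + s = 0 is s = c − Ax and can be eliminated, and each barrier subproblem is solved by damped Newton steps whose backtracking line search keeps s > 0. This is not the primal-dual interior-point algorithm of any particular toolbox, and the names `barrier_method`, `shrink` and `mu_min` are hypothetical.

```python
import numpy as np

def barrier_method(f, grad_f, hess_f, A, c, x0, mu=1.0, shrink=0.1,
                   mu_min=1e-9, max_newton=50):
    """Primal log-barrier sketch for min f(x) s.t. A x <= c (hypothetical helper)."""
    A = np.atleast_2d(np.asarray(A, dtype=float))
    c = np.asarray(c, dtype=float)
    x = np.asarray(x0, dtype=float)
    if np.any(c - A @ x <= 0):
        raise ValueError("x0 must be strictly feasible (A x < c)")

    def phi(z, mu):
        s = c - A @ z                        # slack variables, must stay positive
        if np.any(s <= 0):
            return np.inf
        return f(z) - mu * np.sum(np.log(s))

    while mu > mu_min:                       # drive the barrier parameter to zero
        for _ in range(max_newton):          # Newton's method on the barrier problem
            s = c - A @ x
            g = grad_f(x) + mu * A.T @ (1.0 / s)
            H = hess_f(x) + mu * A.T @ np.diag(1.0 / s ** 2) @ A
            dx = -np.linalg.solve(H, g)
            decrement = -g @ dx              # squared Newton decrement
            if decrement < 1e-12:
                break
            t = 1.0                          # backtracking: stay feasible, decrease phi
            while phi(x + t * dx, mu) > phi(x, mu) - 0.25 * t * decrement and t > 1e-12:
                t *= 0.5
            x = x + t * dx
        mu *= shrink
    return x

# Example: minimize (x1-2)^2 + (x2-2)^2 subject to x1 + x2 <= 2 (solution (1, 1))
f = lambda x: (x[0] - 2) ** 2 + (x[1] - 2) ** 2
grad_f = lambda x: 2.0 * (x - 2.0)
hess_f = lambda x: 2.0 * np.eye(2)
x_opt = barrier_method(f, grad_f, hess_f, A=[[1.0, 1.0]], c=[2.0], x0=[0.0, 0.0])
print(x_opt)
```

Each barrier solution is warm-started from the previous one, so the iterates approach the constrained optimum from inside the feasible region as µ shrinks.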
Pattern search
The pattern search (PS) algorithm is one of the popular direct search methods and can be applied to functions that are neither continuous nor differentiable. It approaches an optimal solution iteratively without using the gradient or any higher-order derivative of the objective function. At each iteration, a set of search directions and the order in which they are tried, called a pattern, is determined first. The variables are moved from the current position along the first direction with a specified step size, and the function value at the new position is computed. If this value is smaller than the current one, the poll is successful: the new position becomes the current position for the next iteration and the step size is doubled. If the poll fails, the variables are moved along the other available directions in order with the same step size, and then with a reduced step size, until a successful poll occurs. Alternatively, pattern search can evaluate the function in all feasible directions and then move to the position with the smallest function value. However, since the number of directions, and hence of function evaluations per iteration, grows with the number of variables, this complete search is only suitable for optimization problems with a small number of variables [92].
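The polling procedure described above can be sketched as a basic compass (coordinate) pattern search in Python. Assumptions: the pattern consists of the 2n coordinate directions ±e_i polled in a fixed order, the first successful poll is accepted (rather than the complete poll), the step is doubled after a success and halved when no direction succeeds, and the search stops when the step falls below `tol`. The name `pattern_search` and its parameters are hypothetical. No gradient is used, so the example objective is deliberately non-differentiable.

```python
import numpy as np

def pattern_search(f, x0, step=1.0, tol=1e-8, max_iter=10_000):
    """Compass pattern search (hypothetical helper); uses no derivatives."""
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    n = x.size
    # Pattern: +e_1, ..., +e_n, -e_1, ..., -e_n, polled in this fixed order
    directions = np.vstack([np.eye(n), -np.eye(n)])
    for _ in range(max_iter):
        if step < tol:
            break
        for d in directions:
            trial = x + step * d
            f_trial = f(trial)
            if f_trial < fx:              # successful poll
                x, fx = trial, f_trial
                step *= 2.0               # expand the step after a success
                break
        else:                             # every direction failed
            step *= 0.5                   # contract the step and poll again
    return x, fx

f = lambda v: abs(v[0] - 1.0) + abs(v[1] + 2.0)   # non-differentiable at (1, -2)
x_opt, f_opt = pattern_search(f, [0.0, 0.0])
print(x_opt, f_opt)
```

From the origin the first poll succeeds along +e_1, the doubled step then succeeds along −e_2 and reaches (1, −2); afterwards every poll fails and the step is halved until it drops below `tol`.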
Although pattern search may not be as efficient as gradient-based deterministic algorithms, it has a distinctive merit. In non-convex optimization, gradient-based algorithms tend to converge to a local minimum, because the movement of the variables is guided by the gradient, which approaches zero at any local minimum, so the process stops there. In pattern search, by contrast, the movement of the variables is determined by an adjustable step size, so the variables can jump from one basin to another, provided that the function value at the new position is smaller than the current value. Pattern search therefore has some capability of reaching the global optimum, although this is not guaranteed.
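The argument above can be illustrated on a one-dimensional double-well function, f(x) = (x² − 1)² + 0.3x, which has a local minimum near x ≈ 0.96 and a lower, global minimum near x ≈ −1.04. The sketch below (hypothetical helper names, arbitrarily chosen learning rate and initial step) compares fixed-step gradient descent with a one-dimensional pattern search, both started at x = 2. It is an illustration under these particular settings, not a guarantee that pattern search finds the global optimum.

```python
def gradient_descent(df, x, lr=0.01, n_iter=2000):
    """Plain fixed-step gradient descent (hypothetical helper)."""
    for _ in range(n_iter):
        x -= lr * df(x)
    return x

def pattern_search_1d(f, x, step=1.0, tol=1e-8):
    """One-dimensional pattern search: poll +step, then -step (hypothetical helper)."""
    fx = f(x)
    while step >= tol:
        for d in (1.0, -1.0):
            trial = x + d * step
            f_trial = f(trial)
            if f_trial < fx:              # successful poll: move and expand
                x, fx = trial, f_trial
                step *= 2.0
                break
        else:                             # both directions failed: contract
            step *= 0.5
    return x

f = lambda x: (x ** 2 - 1.0) ** 2 + 0.3 * x     # double well, tilted to the left
df = lambda x: 4.0 * x * (x ** 2 - 1.0) + 0.3

x_gd = gradient_descent(df, 2.0)    # stays in the right-hand (local) basin
x_ps = pattern_search_1d(f, 2.0)    # reaches the left-hand (global) basin
print(x_gd, f(x_gd))
print(x_ps, f(x_ps))
```

From x = 2 the pattern search first moves to x = 1, doubles its step to 2, and the next poll reaches x = −1, where f(−1) = −0.3 is already below the value at the right-hand local minimum (about 0.29). Gradient descent, following the vanishing gradient, settles at the local minimum instead.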
Genetic algorithm
The genetic algorithm (GA) is inspired by Darwin's theory of evolution. Based on the principle of natural selection, it can be used for both local and global optimization. Unlike most gradient-based deterministic algorithms, the genetic algorithm can solve problems whose objective function is discontinuous or non-differentiable. As a stochastic algorithm, it generates a population of solutions at each iteration and selects the best one, whereas most other stochastic methods operate on a single solution. The procedure of the genetic algorithm can be briefly described as follows [93]:
Initialization: Initially a random population composed of many individual solutions is generated as the parents of the first generation. A proper population size is essential to the optimization result, since an excessively large population consumes most of the computational resources, whereas an insufficient one may miss the global optimum. Generally the random population is generated over the entire feasible region; when prior knowledge is available, however, generation can be manually restricted to a particular sub-region to increase the probability of finding the optimal value, which correspondingly improves the quality of the initial generation.
Selection: In each generation, all individual solutions are evaluated by a fitness function. Solutions with better fitness have a higher probability of being selected as “parents” to breed the next generation. However, selection is not guided solely by fitness, because completely discarding low-fitness solutions may cause the algorithm to converge quickly to a local minimum rather than the global minimum.
Regeneration: The selected individual solutions in the current generation are used to produce new solutions for the next generation according to the rules of crossover and mutation [94]. The new generation resulting from selection, crossover and mutation differs from the previous one and is likely to have better fitness, because its individuals are produced from the best “parents”. Selection and regeneration are repeated until a stopping criterion is satisfied.
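The three steps above can be sketched as a minimal real-coded genetic algorithm in Python. The specific operators are assumptions chosen for brevity: binary tournament selection (so less fit individuals still have a chance of being chosen, as the selection step requires), blend (arithmetic) crossover, Gaussian mutation applied to each variable with a fixed probability, clipping to the input bounds, and elitism that copies the best individual unchanged into the next generation. The stopping criterion is simply a fixed number of generations. The function name, parameter names and default values are hypothetical.

```python
import numpy as np

def genetic_algorithm(f, lower, upper, pop_size=60, generations=200,
                      mutation_rate=0.2, mutation_scale=0.1, seed=0):
    """Minimal real-coded GA (hypothetical helper); minimizes f inside a box."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    n = lower.size

    # Initialization: random individuals spread over the whole feasible box
    pop = rng.uniform(lower, upper, size=(pop_size, n))
    fit = np.array([f(p) for p in pop])          # lower value = better fitness

    def tournament():
        # Selection: the fitter of two random individuals wins, so weaker
        # individuals are still picked whenever they meet an even weaker one
        i, j = rng.integers(pop_size, size=2)
        return pop[i] if fit[i] < fit[j] else pop[j]

    for _ in range(generations):
        children = [pop[np.argmin(fit)].copy()]  # elitism: keep the best as-is
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            alpha = rng.uniform(size=n)
            child = alpha * p1 + (1.0 - alpha) * p2                # crossover
            mutate = rng.uniform(size=n) < mutation_rate
            child = child + mutate * rng.normal(0.0, mutation_scale, size=n)  # mutation
            children.append(np.clip(child, lower, upper))          # respect input bounds
        pop = np.array(children)
        fit = np.array([f(p) for p in pop])

    best = np.argmin(fit)
    return pop[best], fit[best]

sphere = lambda v: float(np.sum(v ** 2))
x_best, f_best = genetic_algorithm(sphere, [-5.0, -5.0], [5.0, 5.0])
print(x_best, f_best)
```

Because of elitism the best fitness never deteriorates from one generation to the next; the fixed `seed` only makes the run reproducible.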
Simulated annealing
The simulated annealing (SAN) algorithm belongs to the family of stochastic methods. It is inspired by annealing in metallurgy, in which a metal is heated and then slowly cooled to minimize its internal energy.
Initially, a state point S is randomly generated in the feasible space and an initial temperature T is given. A new state S′ is then generated from a probability distribution whose spread depends on the temperature, and the objective function is evaluated at S′. The change in the objective function value from S to S′ is calculated, and the new point is accepted if it yields a lower objective. Even if it raises the objective, S′ can still be accepted with a certain probability, which helps the search avoid being trapped in a local minimum. In the next iteration, the temperature is adjusted according to the annealing schedule, and the same process is applied to the new state point S′, or to S if the new point was not accepted [95].
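A minimal Python sketch of the procedure above, assuming the common Metropolis acceptance rule (a worse state is accepted with probability exp(−Δ/T), where Δ is the increment of the objective), Gaussian proposals whose standard deviation equals the current temperature, and a geometric annealing schedule T ← 0.999·T. These choices, the function name `simulated_annealing` and the parameter values are assumptions for illustration. The sketch also records the best state visited, since the current state may move uphill.

```python
import math
import numpy as np

def simulated_annealing(f, x0, T0=1.0, cooling=0.999, n_iter=5000, seed=0):
    """Minimal SA sketch (hypothetical helper) with Metropolis acceptance."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    best_x, best_f = x.copy(), fx
    T = T0
    for _ in range(n_iter):
        # New state S': Gaussian perturbation whose spread shrinks with T
        x_new = x + rng.normal(0.0, T, size=x.shape)
        f_new = f(x_new)
        delta = f_new - fx                     # increment of the objective
        # Accept improvements; accept worse states with probability exp(-delta/T)
        if delta < 0 or rng.uniform() < math.exp(-delta / T):
            x, fx = x_new, f_new
            if fx < best_f:
                best_x, best_f = x.copy(), fx
        T *= cooling                           # geometric annealing schedule
    return best_x, best_f

x_best, f_best = simulated_annealing(lambda v: float((v[0] - 3.0) ** 2), [0.0])
print(x_best, f_best)
```

Early on, with T close to 1, uphill moves are accepted fairly often; as T decreases the acceptance of worse states becomes rare and the search settles near a minimum.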
In theory, the result of the simulated annealing algorithm does not depend on the initial state, and it converges to the global optimum with probability 1 under a sufficiently slow cooling schedule. In practice, however, the computation time needed for SAN to reach the global optimum with high probability is often extremely long and can even exceed that of an exhaustive search over the entire region.
The three local optimization algorithms, TRR, SQP and IP, all require the second derivative of the Lagrangian function. These second-order methods are claimed to
Table 4.1: Features of optimization algorithms
                                 TRR   IP    SQP   PS    GA    SAN
Global optimization              –     –     –     √     √     √
Input bound                      √     √     √     √     √     √
Linear equality constraints      √     √     √     √     √     –
Nonlinear constraints            –     √     √     √     √     –
Gradient based                   √     √     √     –     –     –
Direct search                    –     –     –     √     –     –
Stochastic algorithm             –     –     –     –     √     √
conjugate gradient method [96]. It is also worth noting that, although global optimization algorithms are capable of finding the global optimum, they can easily be trapped at a local optimum. Every global algorithm involves a compromise between convergence rate and how thoroughly it searches for the global optimum, so the parameters of global algorithms should be selected appropriately for each application. In complex practical problems no algorithm can guarantee a global optimum within finite time; it is nevertheless acceptable to find a solution that satisfies the specific requirements, even without knowing whether a better solution exists.
The optimization algorithms mentioned above are provided as functions in the MATLAB optimization toolboxes and are used for the optimal input design work in this thesis. Their characteristics are listed in Table 4.1. The toolboxes can approximate the gradient numerically when it is needed, and the stopping criteria can be specified by the user. In this thesis, the stopping criteria for a given optimization problem are kept the same for all algorithms so that their performance can be compared fairly.