The general-purpose optimization based on Nelder–Mead, quasi-Newton and conjugate-gradient algorithms is an optimization method that includes an option for box-constrained optimization and simulated annealing. The approach denoted as "optim ()" requires the following; an initial values for the parameters to be optimized over usually referred to as "par" in R-package, a function of interest to be minimized (or maximized) with first argument being the vector of parameter over which minimization is to take place, a function to return the gradient for BFGS and L-BFGS-B method. Method "L-BFGS-B" is that of Byrd et. al. (1995) which allows box constraints, that is each variable can be given a lower and upper bound. The initial value must satisfy the constraints.
This uses a limited-memory modification of the BFGS quasi-Newton method. If non-trivial bounds are supplied, this method will be selected, with a warning.
Quasi-Newton Methods
Considering the quasi-Newton approximations, it can be noted that most of the numerical studies of interior-point methods have focused on the use of exact Hessian information. It is well known, however, that in many practical applications, second derivatives are not available, and it is therefore of interest to compare the performance of active-set and interior-point methods in this context.
47
The Quasi-Newton methods are methods used to either find zeroes or local maxima and minima of functions, as an alternative to Newton's method. They can be used if the Jacobian or Hessian is unavailable or is too expensive to compute at every iteration. The "full" Newton's method requires the Jacobian in order to search for zeros, or the Hessian for finding extrema.
More recently quasi-Newton methods have been applied to find the solution of multiple coupled systems of equations (e.g. fluid-structure interaction problems). They allow the solution to be found by solving each constituent system separately (which is simpler than the global system) in a cyclic, iterative fashion until the solution of the global system is found (Haelterman, 2009) Observing that the search for a minimum or maximum of a single-valued function is nothing other than the search for the zeroes of the gradient of that function, quasi-Newton methods can be readily applied to find extrema of a function. In other words, if is the gradient of then searching for the zeroes of the multi-valued function corresponds to the search for the extrema of the single-valued function ; the Jacobian of now becomes the Hessian of . The main difference is that the Hessian matrix now is a symmetrical matrix, unlike the Jacobian when searching for zeros. In optimizing a function, the quasi-Newtow methods (a special case of variable metric method) are algorithms for finding local maxima and minima of functions. Quasi-Newton methods are
48
based on Newton's method to find the stationary point of a function, where the gradient is zero. Newton's method assumes that the function can be locally approximated as a quadratic in the region around the optimum, and uses the first and second derivatives to find the stationary point. In higher dimensions, Newton's method uses the gradient and the Hessian matrix of second derivatives of the function to be minimized.
In quasi-Newton methods the Hessian matrix does not need to be computed rather the Hessian is updated by analyzing successive gradient vectors. Quasi-Newton methods are a generalization of the secant method to find the root of the first derivative for multidimensional problems. In multiple dimensions the secant equation is under-determined, and quasi-Newton methods differ in how they constrain the solution, typically by adding a simple low-rank update to the current estimate of the Hessian.
The first quasi-Newton algorithm was proposed by William Davidon C., a physicist who developed algorithm in 1959; the DFP updating formula, was later popularized by Fletcher and Powell in 1963. The most common quasi-Newton algorithms are currently the SR1 formula (for symmetric rank one), the BHHH method, the widespread BFGS method (suggested independently by Broyden, Fletcher, Goldfarb, and Shanno, in 1970), and its low-memory
49
extension, L-BFGS (Nocedal and Wright, 1999). The Broyden's class is a linear combination of the DFP and BFGS methods.
The SR1 formula does not guarantee the update matrix to maintain positive-definiteness and can be used for indefinite problems. The Broyden's method does not require the update matrix to be symmetric and it is used to find the root of a general system of equations (rather than the gradient) by updating the Jacobian (rather than the Hessian). One of the chief advantages of quasi-Newton methods over Newton's method is that the Hessian matrix (or, in the case of quasi-Newton methods, its approximation) does not need to be inverted.
Newton's method, and its derivatives such as interior point methods, requires the Hessian to be inverted, which is typically implemented by solving a system of linear equations and is often quite costly. In contrast, quasi-Newton methods usually generate an estimate of directly. This can be expressed as
) (x f P
B
k k
(2.4)k k
k
P
S
(2.5)) ( )
1
(
k kk
f x f x
y
(2.6)k k k
k k k k k
k k k k
k
s B s
B s s B s
y y B y
B
1
(2.7)
50
where, B is an approximation to the Hessian Matrix (H), s is the trial step, and Pk is the search direction
The L-BFGS Algorithm
The Limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) algorithm is employed for solving high-dimensional minimization problems in scenarios where both the objective function and its gradient can be computed analytically (Liu and Nocedal, 1989). The L-BFGS algorithm belongs to the class of quasi-Newton optimization routines, which solve the given minimization problem by computing approximations to the Hessian matrix of the objective function. At each iteration, quasi-Newton algorithms locally model f at the point xk using a quadratic approximation:
) (
) (
2 1 )
( ) ( )
( x f X k X X k T g k X X k T B k X X k
Q
(2.8)The procedure is iterated until the gradient is zero, with some degree of convergence tolerance. In order to optimize memory usage, the L-BFGS algorithm avoids storing the sequential approximations of the Hessian matrix.
Instead, L-BFGS stores curvature information from the last m iterations of the algorithm, and uses them to find the new search direction. More specifically, the algorithm stores information about the spatial displacement and the change in gradient, and uses them to estimate a search direction without storing or computing the Hessian explicitly. Explaining the application of the method,
51
Nash and Varadhan (2011) solved using the general-purpose optimization based on Nelder–Mead, quasi-Newton and conjugate-gradient algorithms the equation
with the form 100( ) (1 )
1 2 1
1 2
2 x
x x
y x .