Numerical Essentials
7.4 Finite Difference
This section presents finite difference and the important role it plays in computational optimization. The first subsection introduces the fundamentals of finite difference, and the second presents its pertinent accuracy issues.
7.4.1 Fundamentals of Finite Difference
A particular class of optimization algorithms is known as gradient-based algorithms. This class of algorithms uses the gradient of the objective function and of the constraints in the search for the optimal solution. As discussed in Sec. 2.5.4, the gradient of a scalar-valued function is a vector of the partial derivatives of the function with respect to each of the design variables. Specifically, the gradient of a scalar function f(x), where x is an n-dimensional column vector, is given by
$$\nabla f(x) = \left[ \frac{\partial f}{\partial x_1} \;\; \frac{\partial f}{\partial x_2} \;\; \cdots \;\; \frac{\partial f}{\partial x_n} \right]^T \tag{7.35}$$
The gradient vector is used not only to govern the search, but also to help decide when the optimum is reached and the search terminated. When the optimum is reached, we say that the optimality condition is satisfied.
Generally, the value of the objective function is evaluated using a complex computer code. As such, we do not have an explicit analytical expression for the objective function that can be used to evaluate its gradient. As a result, an adequate approximation is used
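instead. This approximation is referred to as a finite difference derivative, as opposed to an analytical derivative [4]. The finite difference derivative of f(x) at a point x0 is given by
$$\nabla f(x_0) \approx \left[ \frac{\Delta f_0}{\Delta x_1} \;\; \frac{\Delta f_0}{\Delta x_2} \;\; \cdots \;\; \frac{\Delta f_0}{\Delta x_n} \right]^T \tag{7.36}$$
with
$$\left. \frac{\partial f}{\partial x_i} \right|_{x_0} \approx \frac{\Delta f_0}{\Delta x_i} \tag{7.37}$$
where Δxi is a small deviation of xi about xi0, and Δf0 is the corresponding variation in f(x). The quantity Δf0/Δxi can be evaluated using three typical approaches: (i) forward difference, (ii) backward difference, or (iii) central difference. In each case, only xi is deviated; all other design variables remain at their values in x0. We express each as follows:
Forward Difference
$$\frac{\Delta f_0}{\Delta x_i} = \frac{f(x_{i0} + \Delta x_i) - f(x_{i0})}{\Delta x_i} \tag{7.38}$$
Backward Difference
$$\frac{\Delta f_0}{\Delta x_i} = \frac{f(x_{i0}) - f(x_{i0} - \Delta x_i)}{\Delta x_i} \tag{7.39}$$
Central Difference
$$\frac{\Delta f_0}{\Delta x_i} = \frac{f(x_{i0} + \Delta x_i) - f(x_{i0} - \Delta x_i)}{2 \Delta x_i} \tag{7.40}$$
Further, we note that the finite difference approximation entails an error that can be partially explained by the equation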
$$f(x_{i0} + \Delta x_i) = f(x_{i0}) + \left. \frac{\partial f}{\partial x_i} \right|_{x_0} \Delta x_i + \propto (\Delta x_i)^2 + \text{HOT} \tag{7.41}$$
where ∝ (Δxi)^2 is a term proportional to (Δxi)^2 that is ignored by the finite difference approximation above, and HOT represents additional Higher Order Terms (proportional to (Δxi)^nh, with nh > 2) that are also ignored. The smaller the magnitude of Δxi, the more negligible the ignored terms become and the more accurate the finite difference, at least in theory. However, as we will see later in this section, there is a limit to the acceptable smallness of Δxi in practice.
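The three evaluation options can be sketched in a few lines of Python. This is a minimal illustration rather than a production implementation: the function name `fd_gradient`, the default step `dx`, and the test function `example_objective` are assumptions introduced here, and only the standard library is used.

```python
import math


def fd_gradient(f, x0, dx=1e-6, scheme="forward"):
    """Approximate the gradient of f at x0 with a finite difference.

    f      : callable taking a list of floats and returning a float
    x0     : point at which the gradient is approximated
    dx     : deviation applied to one design variable at a time (assumed value)
    scheme : "forward", "backward", or "central"
    """
    if scheme not in ("forward", "backward", "central"):
        raise ValueError("scheme must be 'forward', 'backward', or 'central'")
    x0 = [float(v) for v in x0]
    # Forward and backward differences share one baseline evaluation at x0.
    f0 = f(x0) if scheme != "central" else None
    grad = []
    for i in range(len(x0)):
        x_plus = list(x0)
        x_minus = list(x0)
        x_plus[i] += dx    # deviate only the i-th variable forward
        x_minus[i] -= dx   # deviate only the i-th variable backward
        if scheme == "forward":
            grad.append((f(x_plus) - f0) / dx)
        elif scheme == "backward":
            grad.append((f0 - f(x_minus)) / dx)
        else:
            grad.append((f(x_plus) - f(x_minus)) / (2.0 * dx))
    return grad


def example_objective(x):
    """Hypothetical smooth test function with a known analytical gradient."""
    return x[0] ** 2 + 3.0 * x[0] * x[1] + math.sin(x[1])


point = [1.0, 2.0]
# Analytical gradient: [2*x0 + 3*x1, 3*x0 + cos(x1)]
analytical = [2.0 * point[0] + 3.0 * point[1], 3.0 * point[0] + math.cos(point[1])]
for name in ("forward", "backward", "central"):
    print(name, fd_gradient(example_objective, point, scheme=name))
print("analytical", analytical)
```

Comparing the printed values against the analytical gradient gives a quick sanity check before a finite difference routine is handed to an optimizer.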
It is instructive to view the three finite difference options in both mathematical and graphical terms. Equations 7.38, 7.39, and 7.40 give the mathematical expressions for the forward, backward, and central differences, respectively. Similarly, Figs. 7.2(a), (b), and (c) provide the respective graphical interpretations of these finite differences, in the case where x is a scalar.
Figure 7.2. Graphical Representation of Finite Difference Approximation
We make the following observations:
(i) The solid line represents the tangent line at the point x0. The slope of the tangent line is exactly the derivative of f(x) at the point x0.
(ii) The dashed line represents the so-called secant line. The slope of the secant line is equal to the finite difference value.
(iii) As Δx tends to zero, the secant line converges to the tangent line, and the finite difference (Eq. 7.38, 7.39, or 7.40) converges to the gradient (Eq. 7.35). However, as we will see shortly, excessively small values of Δx pose some numerical difficulties.
(iv) Number of Function Calls – Objective Function: When x is a scalar, the gradient is also a scalar (or a one-dimensional vector). In this case, the forward, backward, and central difference evaluations require two function calls each (see Eqs. 7.38, 7.39, and 7.40). When the vector x has dimension nx, the finite difference approximation of the gradient requires nx + 1 function evaluations for forward or backward difference (i.e., one evaluation at x0, and nx evaluations obtained after deviating each of the nx entries of the vector x in turn). Central difference requires 2nx function evaluations. In this latter case, there is no evaluation at x0; we instead deviate each variable forward and backward (see Eq. 7.40).
(v) Number of Function Calls – Constraints: When we have constraints and also use a finite difference approximation for their gradients, the number of constraint function evaluations can become large. For example, if we have neq constraints and we use forward difference, the finite difference evaluations will require neq(nx + 1) constraint function evaluations.
(vi) The central difference option generally yields more accurate answers (see Fig. 7.2), but also requires more function evaluations as discussed above.
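The function call counts in observations (iv) and (v) can be checked with a small counting wrapper. The sketch below is only illustrative: the class `CountingFunction`, the helper `count_calls`, and the sum-of-squares test function are hypothetical names and choices introduced here, and the constraints are assumed to be differentiated one at a time with their own baseline evaluation.

```python
class CountingFunction:
    """Wrap a function and count how many times it is called."""

    def __init__(self, func):
        self.func = func
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return self.func(x)


def fd_gradient(f, x0, dx=1e-6, scheme="forward"):
    """Finite difference gradient; returns a list with one entry per variable."""
    if scheme not in ("forward", "backward", "central"):
        raise ValueError("unknown scheme")
    x0 = list(x0)
    grad = []
    if scheme == "central":
        # Two evaluations per variable, and none at x0 itself.
        for i in range(len(x0)):
            x_plus, x_minus = list(x0), list(x0)
            x_plus[i] += dx
            x_minus[i] -= dx
            grad.append((f(x_plus) - f(x_minus)) / (2.0 * dx))
        return grad
    step = dx if scheme == "forward" else -dx
    f0 = f(x0)  # single baseline evaluation shared by all components
    for i in range(len(x0)):
        x_dev = list(x0)
        x_dev[i] += step
        grad.append((f(x_dev) - f0) / step)
    return grad


def count_calls(n_x, scheme, n_functions=1):
    """Total evaluations needed to differentiate n_functions functions of n_x variables."""
    total = 0
    for _ in range(n_functions):
        counted = CountingFunction(lambda x: sum(v * v for v in x))
        fd_gradient(counted, [1.0] * n_x, scheme=scheme)
        total += counted.calls
    return total


n_x, n_eq = 5, 3
print(count_calls(n_x, "forward"))                    # n_x + 1 = 6
print(count_calls(n_x, "central"))                    # 2 * n_x = 10
print(count_calls(n_x, "forward", n_functions=n_eq))  # n_eq * (n_x + 1) = 18
```

When each function evaluation is an expensive simulation, these counts are often what decides between forward and central difference.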
This discussion leads us to the all-important topic of the accuracy of the finite difference approximation.
7.4.2 Accuracy of Finite Difference Approximation
The success of all gradient-based optimization algorithms, as their name suggests, depends strongly on the accuracy of the evaluated gradient vector. An important question at this point is: How accurate is the finite difference approximation? This is a critical question, since gradient-based optimization is one of the most popular approaches in practice.
Assume that the function f(x) has nsda significant digits of accuracy. As a rule of thumb, the number of digits of accuracy of the first derivatives drops by half (to nsda/2). For the second derivatives, it drops by another half (to nsda/4). Please keep in mind that this is only a rule of thumb; in practical cases, the situation could be much worse or much better. When too few digits remain, the resulting finite difference values may become useless in the optimization algorithm and result in serious convergence difficulties.
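The rule of thumb can be explored numerically. The sketch below is an illustration under stated assumptions, not a general result: it mimics a computer code with nsda = 8 by rounding exp(x) to eight significant digits (the helpers `round_sig` and `limited_accuracy_exp` are hypothetical), and then records the relative error of a forward difference over a range of step sizes.

```python
import math


def round_sig(value, n_digits):
    """Round value to n_digits significant digits."""
    return float(f"{value:.{n_digits - 1}e}")


def limited_accuracy_exp(x, n_sda=8):
    """Stand-in for a computer code whose output has only n_sda significant digits."""
    return round_sig(math.exp(x), n_sda)


def forward_difference(f, x0, dx):
    return (f(x0 + dx) - f(x0)) / dx


x0 = 1.0
exact = math.exp(x0)  # the derivative of exp(x) is exp(x)

results = []  # (step size, relative error of the forward difference)
for k in range(1, 9):
    dx = 10.0 ** (-k)
    approx = forward_difference(limited_accuracy_exp, x0, dx)
    results.append((dx, abs(approx - exact) / exact))

for dx, rel_err in results:
    print(f"dx = {dx:.0e}   relative error = {rel_err:.2e}")

best_dx, best_err = min(results, key=lambda pair: pair[1])
print(f"best step: {best_dx:.0e}, about {-math.log10(best_err):.1f} correct digits")
```

In this setup the derivative never recovers the full eight digits of the function, and the smallest steps give values dominated by rounding, which is the behavior the rule of thumb is meant to warn about.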
Optimizing with Experimental Data
An important practical situation occurs when experimental data is used for the objective function or the constraint functions. Such data may have low accuracy (say, six digits). In this case, the first derivative may have only three digits of accuracy, and the second derivative might be practically unusable in an optimization algorithm.
In these cases, it may be useful to first develop so-called response surfaces of the resulting data. Once obtained, it might be more reliable to optimize using these surfaces, which are essentially a best-fit of the data available. At a basic level, the situation is straightforward; that is, (i) form a best-fit function of the data (make the best fit as smooth as possible) and (ii) optimize using this best fit function. This approach works quite well. The details of this topic are beyond the scope of this book. References [5] and [6] offer representative works in the area. The first is a fundamental book on response surface methodology. The second provides response surface information from the perspective of design of experiments, within one concise chapter.
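The two-step procedure (fit, then optimize) can be sketched for a single design variable. Everything specific in this example is an assumption made for illustration: the synthetic "experimental" data (a quadratic plus seeded Gaussian noise), the choice of a quadratic response surface, and the helper name `fit_quadratic`. Real response surface methodology involves considerably more care, as the cited references describe.

```python
import random


def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x**2 + b*x + c using the normal equations."""
    s = [sum(x ** k for x in xs) for k in range(5)]                   # sums of x^k
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]   # sums of y*x^k
    # Augmented normal equations for the unknowns [c, b, a].
    A = [[s[0], s[1], s[2], t[0]],
         [s[1], s[2], s[3], t[1]],
         [s[2], s[3], s[4], t[2]]]
    n = 3
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for k in range(col, n + 1):
                A[r][k] -= factor * A[col][k]
    # Back substitution.
    coeffs = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(A[r][k] * coeffs[k] for k in range(r + 1, n))
        coeffs[r] = (A[r][n] - tail) / A[r][r]
    c, b, a = coeffs
    return a, b, c


# Synthetic "experimental" data: true response (x - 2)^2 + 1 plus measurement noise.
rng = random.Random(0)
xs = [0.2 * i for i in range(21)]
ys = [(x - 2.0) ** 2 + 1.0 + rng.gauss(0.0, 0.05) for x in xs]

# Step (i): form a smooth best fit of the data.
a, b, c = fit_quadratic(xs, ys)

# Step (ii): optimize the best-fit function (closed form for a convex quadratic).
x_opt = -b / (2.0 * a)
print(f"fitted surface: {a:.3f} x^2 + {b:.3f} x + {c:.3f}")
print(f"minimizer of the fitted surface: {x_opt:.3f}")
```

Because the fitted surface is smooth, its derivative is available analytically, which sidesteps finite differencing of the noisy data altogether.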
How to Impact Finite Difference Accuracy
We can impact finite difference accuracy in three basic ways. Fortunately, most optimization codes perform well in promoting the maximum accuracy of the obtained results. However, there is much that we can also do. The three basic approaches are as follows.
(1) Adequate Scaling: Adequate scaling (as previously discussed) addresses (i) the magnitude of the design variables, (ii) the magnitude of the objective functions, (iii) the magnitude of the constraints, and (iv) the pertinent setting parameters of the optimization codes. When these issues are addressed, finite difference tends to perform more effectively.
(2) Forward, Backward, or Central Difference: As previously discussed, forward and backward differences provide similar accuracies, while central difference provides greater accuracy. However, the central difference is more computationally intensive. Depending on the computational cost of evaluating the objective functions and constraints, we may choose one option over another.
(3) The Magnitude of Δx: This last consideration is the most critical. Too small or too large a magnitude will result in excessive inaccuracies. Let us consider each scenario.
(i) Too large a magnitude of Δx makes the secant line too distinct from the tangent line. Their corresponding slopes become too dissimilar (i.e., the finite difference and gradient become too different). This situation is readily seen in Fig. 7.2.
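The effect of the step size, and the difference between forward and central difference, can be seen in a short numerical experiment. This is a sketch under assumptions chosen for illustration: the test function sin(x), the point x0 = 1, and the three step sizes are arbitrary, and only steps large enough that rounding is not yet an issue are considered.

```python
import math


def forward_diff(f, x0, dx):
    """Slope of the secant line through x0 and x0 + dx."""
    return (f(x0 + dx) - f(x0)) / dx


def central_diff(f, x0, dx):
    """Slope of the secant line through x0 - dx and x0 + dx."""
    return (f(x0 + dx) - f(x0 - dx)) / (2.0 * dx)


x0 = 1.0
exact = math.cos(x0)  # slope of the tangent line of sin(x) at x0
step_sizes = [1.0, 0.1, 0.01]

forward_errors = [abs(forward_diff(math.sin, x0, dx) - exact) for dx in step_sizes]
central_errors = [abs(central_diff(math.sin, x0, dx) - exact) for dx in step_sizes]

for dx, e_fwd, e_cen in zip(step_sizes, forward_errors, central_errors):
    print(f"dx = {dx:<5} forward error = {e_fwd:.2e}   central error = {e_cen:.2e}")
```

For this function, the error shrinks as the step is reduced from 1 to 0.01, and the central difference is the more accurate of the two at every step size tested, at the cost of one extra function evaluation per variable.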