Optimization Algorithms and Convergence

We return to optimization problem (1.1) and briefly discuss the choice of optimization algorithms to use in conjunction with the multifidelity estimator. Given the vector of design variables $x_k$ at optimization iteration $k$, we apply Algorithm 2.1 to compute $\hat{s}_{A,p}(x_k)$ and evaluate the functions $\hat{f}(x_k) = f(x_k, \hat{s}_{A,p}(x_k))$, $\hat{g}(x_k) = g(x_k, \hat{s}_{A,p}(x_k))$, and $\hat{h}(x_k) = h(x_k, \hat{s}_{A,p}(x_k))$. Due to the pseudo-randomness of Monte Carlo sampling, the objective and constraint values returned to the optimizer, $\hat{f}(x_k)$, $\hat{g}(x_k)$, and $\hat{h}(x_k)$, are noisy with respect to the exact objective and constraint values $f(x_k, s_A(x_k))$, $g(x_k, s_A(x_k))$, and $h(x_k, s_A(x_k))$, and the optimization problem becomes a stochastic optimization problem. While the level of noise is controlled by the specified root mean square error (RMSE) tolerance in Algorithm 2.1, it nevertheless poses a challenge for any optimization algorithm that is not noise tolerant. Thus, we consider three classes of optimization algorithms: stochastic approximation, sample average approximation, and derivative-free optimization.

2.3.1 Stochastic Approximation

The stochastic approximation method (also known as the Robbins-Monro method) [47] is designed to find at least a local solution to the unconstrained minimization

$$x^* = \arg\min_x f(x, s_A(x)),$$

assuming the objective function is bounded from below, using only noisy approximations of the objective. Motivated by the steepest descent method, the algorithm generates a new vector of design variables at optimization iteration $k$ as

$$x_{k+1} = x_k - \lambda_k \hat{\nabla}_x f(x_k),$$

starting from an initial vector of design variables $x_0$. The parameters $\lambda_k$, $k = 0, 1, 2, \ldots$, are a prescribed sequence of step lengths and $\hat{\nabla}_x f(x_k)$ is the gradient

II. Search direction: For some symmetric, positive definite matrix $H$ and every $0 < \xi < 1$, $\inf_{\xi < \|x - x^*\| < 1/\xi} (x - x^*)^\top H \nabla_x f(x, s_A(x)) > 0$.

III. Mean-zero noise:³ $\mathbb{E}\left[\hat{\nabla}_x f(x) - \nabla_x f(x, s_A(x))\right] = 0$.

Using the theoretically optimal step lengths $\lambda_k = \lambda/(k+1)$ for some positive constant $\lambda$, the asymptotic rate of convergence is $O(k^{-1/2})$.
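As an illustration of the iteration $x_{k+1} = x_k - \lambda_k \hat{\nabla}_x f(x_k)$ with step lengths $\lambda_k = \lambda/(k+1)$, the following is a minimal sketch, not the implementation used in this work. It assumes a hypothetical one-dimensional quadratic objective $(x - 2)^2$ whose gradient is observed with mean-zero Gaussian noise; the function names, the noise level, and the choice $\lambda = 1$ are all assumptions made for the example.

```python
import random


def noisy_gradient(x, rng, noise_std=0.5):
    """Gradient of the toy objective (x - 2)^2 plus mean-zero Gaussian noise."""
    return 2.0 * (x - 2.0) + rng.gauss(0.0, noise_std)


def robbins_monro(x0, n_iter, lam=1.0, seed=0):
    """Iterate x_{k+1} = x_k - lam / (k + 1) * (noisy gradient at x_k)."""
    rng = random.Random(seed)
    x = x0
    for k in range(n_iter):
        step = lam / (k + 1)  # prescribed, decreasing step length
        x = x - step * noisy_gradient(x, rng)
    return x


# The exact minimizer of the toy objective is x = 2.
x_final = robbins_monro(x0=10.0, n_iter=20000)
print(x_final)
```

The decreasing step lengths average out the gradient noise over many iterations, which is why the iterate settles near the minimizer even though no single gradient evaluation is accurate.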

The stochastic approximation method requires estimates of the objective gradient. The expectation operator and the gradient operator can be interchanged, for example, $\nabla_x \mathbb{E}[M(x, U(\omega))] = \mathbb{E}[\nabla_x M(x, U(\omega))]$, as long as the function and its gradient are continuous and are bounded above and below [47]. Thus, we can apply Algorithm 2.1 to the gradient output of the high-fidelity model and the gradient output of the low-fidelity model and obtain the multifidelity gradient estimator. If gradient output is unavailable, variations on the stochastic approximation method such as the simultaneous perturbation stochastic approximation method construct the gradient estimator using only noisy function evaluations [46]. The basic method can also be extended to constrained optimization by projecting the iterates of the design variable vector into the feasible region. However, the projection technique is only practical for simple constraints such as variable bounds and cannot, in general, handle nonlinear (potentially also noisy) constraints.

³ This may not be satisfied for some objective functions, but there exists an alternative set of conditions that allow for some bias [47].
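The simultaneous perturbation idea and the projection onto variable bounds can be sketched together as follows. This is a hedged illustration under stated assumptions: the noisy objective is a hypothetical toy quadratic, the gain constants `a` and `c` are arbitrary, and the decay exponents 0.602 and 0.101 are commonly quoted defaults for this method rather than values taken from this work. Each iteration uses only two noisy function evaluations, regardless of the number of design variables.

```python
import random


def noisy_objective(x, rng, noise_std=0.1):
    """Toy objective sum_i (x_i - 1)^2 observed with mean-zero Gaussian noise."""
    return sum((xi - 1.0) ** 2 for xi in x) + rng.gauss(0.0, noise_std)


def project_to_bounds(x, lower, upper):
    """Project a point onto the box lower <= x <= upper, one component at a time."""
    return [min(max(xi, lo), hi) for xi, lo, hi in zip(x, lower, upper)]


def spsa(x0, lower, upper, n_iter=5000, a=0.2, c=0.2, seed=0):
    """Projected simultaneous perturbation stochastic approximation (sketch)."""
    rng = random.Random(seed)
    x = list(x0)
    for k in range(n_iter):
        a_k = a / (k + 1) ** 0.602  # step length (assumed decay exponent)
        c_k = c / (k + 1) ** 0.101  # perturbation size (assumed decay exponent)
        delta = [rng.choice((-1.0, 1.0)) for _ in x]  # random +/-1 directions
        x_plus = [xi + c_k * di for xi, di in zip(x, delta)]
        x_minus = [xi - c_k * di for xi, di in zip(x, delta)]
        diff = noisy_objective(x_plus, rng) - noisy_objective(x_minus, rng)
        grad = [diff / (2.0 * c_k * di) for di in delta]  # gradient estimate
        x = project_to_bounds(
            [xi - a_k * gi for xi, gi in zip(x, grad)], lower, upper
        )
    return x


# The bound x_0 <= 0.5 is active, so the constrained minimizer is [0.5, 1.0].
x_spsa = spsa(x0=[3.0, -2.0], lower=[0.0, 0.0], upper=[0.5, 4.0])
print(x_spsa)
```

The projection here is a componentwise clip, which is exactly why the technique is limited to simple constraints: for a general nonlinear feasible region, computing the projection is itself an optimization problem.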

2.3.2 Sample Average Approximation

In the sample average approximation method (also known as the sample path method) [47], the same realizations $\{u_i\}_{i=1}^{n}$ of the random input vectors $U(\omega)$ are used to compute the estimators for all optimization iterations. This effectively turns $\hat{f}(x)$, $\hat{g}(x)$, and $\hat{h}(x)$ into deterministic functions of $x$, permitting a wide range of deterministic constrained nonlinear programming techniques to solve optimization problem (1.1).

However, since $\hat{f}(x)$, $\hat{g}(x)$, and $\hat{h}(x)$ are approximations to $f(x, s_A(x))$, $g(x, s_A(x))$, and $h(x, s_A(x))$, respectively, the solution of the deterministic problem using these $n$ realizations of the random input vectors, denoted as $x^{(n)}$, does not, in general, coincide with the true solution $x^*$ of (1.1). Nevertheless, as long as the function is bounded above and below, $x^{(n)} \to x^*$ as $n \to \infty$ at an asymptotic rate of $O(n^{-1/2})$ [47]. For finite sample size $n$, a confidence bound on the optimality gap can be computed to assess the quality of the solution [31].
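The effect of fixing the realizations can be illustrated with a small sketch. It assumes a hypothetical toy problem, minimizing $\mathbb{E}[(x - U)^2]$ with $U \sim N(1, 1)$, whose exact minimizer is $x = 1$; the golden-section search stands in for any deterministic optimizer and is not the method used in this work. All names are assumptions made for the example.

```python
import random


def make_saa_objective(n, seed=0):
    """Draw n realizations once; the returned estimator is deterministic in x."""
    rng = random.Random(seed)
    samples = [rng.gauss(1.0, 1.0) for _ in range(n)]  # toy input U ~ N(1, 1)

    def f_hat(x):
        # Sample average of (x - u_i)^2 over the fixed realizations.
        return sum((x - u) ** 2 for u in samples) / n

    return f_hat


def golden_section_minimize(func, lo, hi, tol=1e-8):
    """Deterministic one-dimensional minimizer for a unimodal function."""
    inv_phi = (5 ** 0.5 - 1) / 2
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    while hi - lo > tol:
        if func(c) < func(d):
            hi = d  # the minimizer lies in [lo, d]
        else:
            lo = c  # the minimizer lies in [c, hi]
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
    return 0.5 * (lo + hi)


# Solve the deterministic problem for two sample sizes.
x_small = golden_section_minimize(make_saa_objective(100), -5.0, 5.0)
x_large = golden_section_minimize(make_saa_objective(10000), -5.0, 5.0)
print(abs(x_small - 1.0), abs(x_large - 1.0))
```

Because the samples are fixed, repeated evaluations at the same $x$ return identical values, so a deterministic line search is well defined. The solution error that remains is due to the finite sample size, not to the optimizer.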

In order to use the multifidelity estimator with the sample average approximation method, $n$, $r$, and $m = rn$ must remain fixed for all optimization iterations so that the same samples $\{u_i\}_{i=1}^{m}$ can be used to compute $\bar{b}_m$ and the same subset of the samples $\{u_i\}_{i=1}^{n}$ can be used to compute $\bar{a}_n$ and $\bar{b}_n$. Once $n$ is chosen, a practical choice is to fix the value of $r$ at the optimal $r$ for the multifidelity estimator at the first optimization iteration. However, this value of $r$ may no longer be optimal for the multifidelity estimators at subsequent optimization iterations.
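The sample bookkeeping described above can be sketched as follows. Algorithm 2.1 is not reproduced in this section, so the sketch assumes a control variate form $\bar{a}_n + \alpha(\bar{b}_m - \bar{b}_n)$ with a fixed coefficient $\alpha$; that form, the two toy models, the value of `alpha`, and the restriction to an integer ratio `r` are all assumptions made for illustration only.

```python
import random


def high_fidelity(x, u):
    """Hypothetical expensive model output (toy stand-in)."""
    return (x - u) ** 2 + 0.1 * u ** 3


def low_fidelity(x, u):
    """Hypothetical cheap model output, correlated with the expensive one."""
    return (x - u) ** 2


def make_fixed_sample_estimator(n, r, alpha=1.0, seed=0):
    """Fix m = r * n samples once and reuse them at every design point x."""
    m = r * n
    rng = random.Random(seed)
    samples = [rng.gauss(0.0, 1.0) for _ in range(m)]  # toy input U ~ N(0, 1)

    def s_hat(x):
        # a_n and b_n use the same first n samples; b_m uses all m samples.
        a_n = sum(high_fidelity(x, u) for u in samples[:n]) / n
        b_n = sum(low_fidelity(x, u) for u in samples[:n]) / n
        b_m = sum(low_fidelity(x, u) for u in samples) / m
        return a_n + alpha * (b_m - b_n)

    return s_hat


# For this toy problem the exact mean of the expensive model is x^2 + 1.
s_hat = make_fixed_sample_estimator(n=200, r=10)
value = s_hat(0.5)
print(value)
```

Keeping `n`, `r`, and the sample list inside the closure is one way to guarantee that every optimization iteration sees the same realizations, which is the property the sample average approximation method relies on.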

2.3.3 Derivative-Free Optimization

If it is not possible to fix the realizations of the random input vector $U(\omega)$ for all optimization iterations, derivative-free optimization methods may be used to solve the noisy (stochastic) optimization problem. Although derivative-free optimization methods are not designed specifically for stochastic optimization problems, they are typically tolerant to small levels of noise in practice [8]. Examples of derivative-free optimization algorithms include mesh adaptive direct search (MADS) [2], implicit filtering [23], derivative-free optimization (DFO) [7], bound optimization by quadratic approximation (BOBYQA) [42], and constrained optimization by linear approximation (COBYLA) [41]. Most of these methods sample the objective and constraint functions relatively widely in the design space to determine the search direction, which has the effect of smoothing out the high-frequency noise in the function evaluations provided that the magnitude of the noise is small relative to the true function values. In typical practice, this allows the optimizer to find a solution that is close to the true optimum $x^*$ of (1.1). However, there is no guarantee that the derivative-free optimization methods will actually converge to $x^*$.

The noise tolerance of these methods can be improved by accounting for the magnitude of the noise when comparing objective function values to accept or reject candidate vectors of design variables. The dynamic accuracy framework [6] uses bounds in the comparison of objective function values in order to specify the acceptable noise level that enables progress in the optimization. A challenge with this approach is that the conservative bounds may result in acceptable noise levels that decrease rapidly from one optimization iteration to the next, requiring increasingly large amounts of computational effort to compute the estimators $\hat{f}(x_k)$, $\hat{g}(x_k)$, and $\hat{h}(x_k)$.
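The idea of accounting for the noise magnitude in the accept or reject decision can be illustrated with a much simpler device than the dynamic accuracy framework of [6]: a compass search that accepts a candidate only if its noisy objective value improves on the current one by more than a fixed threshold. The sketch below is an assumption-laden illustration, not that framework; the toy objective, the known noise standard deviation, and the factor of three in the threshold are all hypothetical choices.

```python
import random


def noisy_objective(x, rng, noise_std):
    """Toy objective sum_i (x_i - 1)^2 observed with mean-zero Gaussian noise."""
    return sum((xi - 1.0) ** 2 for xi in x) + rng.gauss(0.0, noise_std)


def noise_aware_compass_search(x0, noise_std=0.01, step=1.0,
                               min_step=1e-3, seed=0):
    """Compass search that ignores improvements smaller than the noise level."""
    rng = random.Random(seed)
    x = list(x0)
    f_x = noisy_objective(x, rng, noise_std)
    # The difference of two noisy values has standard deviation
    # sqrt(2) * noise_std; require an improvement of three such deviations.
    threshold = 3.0 * (2.0 ** 0.5) * noise_std
    while step > min_step:
        moved = False
        for i in range(len(x)):
            for sign in (1.0, -1.0):
                cand = list(x)
                cand[i] += sign * step
                f_cand = noisy_objective(cand, rng, noise_std)
                if f_cand < f_x - threshold:  # accept only clear improvements
                    x, f_x, moved = cand, f_cand, True
        if not moved:
            step *= 0.5  # no clear improvement found: refine the step
    return x


# The exact minimizer of the toy objective is [1, 1].
x_dfo = noise_aware_compass_search([4.0, -3.0])
print(x_dfo)
```

The threshold guards against accepting a candidate that only looks better because of noise, at the cost of resolution: on this quadratic, a coordinate closer to the optimum than roughly the square root of the threshold cannot produce a large enough improvement at any step size. Tightening the threshold therefore requires a lower noise level, which mirrors the growth in computational effort noted above.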