Theoretical Considerations - Multidelity methods for multidisciplinary system design

2.6 Summary

3.1.7 Theoretical Considerations

This subsection discusses some performance limitations of the proposed algorithm as well as theoretical considerations needed for robustness.

This chapter has not presented a formal convergence theory; at best, such a theory will apply only under many restrictive assumptions. In addition, we can at best guarantee convergence to a near-optimal solution (i.e., optimality to an a priori tolerance level and not to stationarity). Forcing the trust region size to decrease every time the sufficient decrease condition is not satisfied means that if the projection of the gradient onto the feasible domain is less than the parameter a, then the algorithm can fail to make progress. This limitation is presented in (3.17); however, it is not seen as severe because a can be set at any value arbitrarily close to (but strictly greater than) zero.

A second limitation is that the model calibration strategy, generating a fully linear model, is theoretically only guaranteed to be possible for functions that are twice continuously differentiable and have Lipschitz-continuous first derivatives. Though this assumption may seem to limit some desired opportunities for derivative-free optimization of functions with noise, noise with certain characteristics like that discussed in [28, Section 9.3], or the case of models with dynamic accuracy as in [31, Section 10.6] can be accommodated in this framework. Approaches such as those in [80] may be used to characterize the noise in a specific problem. However, our algorithm applies even in the case of general noise, where no guarantees can be made on the quality of the surrogate models. As will be shown in the test problem in Section 3.3, in such a case our approach exhibits robustness and significant advantages over gradient-based methods that are susceptible to poor gradient estimates.

Algorithm 3.1 is more complicated than conventional gradient-based trust-region algo-rithms using quadratic penalty functions, such as the algorithm in [31, Algorithm 14.3.1].

There are three sources of added complexity. First, our algorithm increases the penalty parameter after each trust-region subproblem as opposed to after a completed minimization of the penalty function. This aspect can significantly reduce the number of required high-fidelity evaluations, but adds complexity to the algorithm. Second, our algorithm switches between two trust region subproblems to avoid numerical issues associated with the

condi-tioning of the quadratic penalty function Hessian. The third reason for added complexity is that in derivative-free optimization the size of the trust-region must decrease to zero in order to demonstrate optimality. This aspect serves to add unwanted coupling between the penalty parameter and the size of the trust region, which must be handled appropriately.

We now discuss how these two aspects of the algorithm place important constraints on the penalty parameter µ_k.

The requirements for the penalty parameter, µ_k, are that (i) lim_k→∞µ_k∆_k = ∞, (ii) lim_k→∞µ_k∆²_k = 0, and (iii)P∞

k=01/µ_k is finite. The lower bound for the growth of µ_k, that (i) lim_k→∞µ_k∆_k = ∞, comes from the properties of the minima of quadratic penalty func-tions presented in [31, Theorem 14.3.1], properties of a fully linear model, and (3.9). If x_k is at an approximate minimizer of the surrogate quadratic penalty function, (3.6), then a bound on the constraint violation is

kh(xk)^> g⁺(xk)^> k ≤ (3.18)

κ₁(max{α∆_k, a} + κ_g∆_k) + kλ(x^∗)k + κ₂kx_k− x^∗k

µ_k ,

where x^∗ is a KKT point of 3.1, and κ₁, κ₂ are finite positive constants. If {µ_k∆_k} diverges then an iteration exists where both the bound, (3.18), holds (the trust region size must be large enough that a feasible point exists in the interior) and the constraint violation is less than the given tolerance . This enables the switching between the two trust region subproblems, (3.8) and (3.7). The upper bound for the growth of µ_k, that

(ii) limk→∞µk∆²_k = 0, comes from the smoothness of the quadratic penalty function. To establish an upper bound for the Hessian 2-norm for the subproblem in (3.8) we compare the value of the merit function at a point Υ(xk+ p, µk) with its linearized prediction based on Υ(x_k, µ_k), ˜Υ(x_k+ p, µ_k). If κ_{f g} is the Lipschitz constant for ∇f_high(x), κ_cm is the maximum Lipschitz constant for the constraints, and κcgm is the maximum Lipschitz constant for a

constraint gradient, we can show that

kΥ(x_k+ p, µ_k) − ˜Υ(x_k+ p, µ_k)k ≤ (3.19)

κ_{f g} + µ_k κ_cmkh(x_k)^> g⁺(x_k)^> k + κ_cgmkA(x_k)kkpk + κ_cmκ_cgmkpk² kpk².

We have a similar result for ˆΥ(x, µ_k) by replacing κ_{f g} with the sum κ_{f g} + κ_g, using the definition of a fully linear model, and by bounding kpk by ∆_k. Therefore, we may show the error in a linearized prediction of the surrogate model will go to zero and that Lipschitz-type smoothness is ensured provided that the sequence {µ_k∆²_k} converges to zero regardless of the constraint violation.

The final requirement, that (iii)P∞

k=01/µ_kis finite, comes from the need for the size of the trust region to go to zero in gradient-free optimization. The sufficient decrease condition, that ρ_k = 0 unless ˆΥ(x_k, µ_k) − ˆΥ(x_k+ s_k, µ_k) ≥ a∆_k ensures that the trust region size decreases unless the change in the merit function Υ(xk, µk) − Υ(xk+ sk, µk) ≥ η0a∆k. This provides an upper bound on the total number of times in which the size of the trust region is kept constant or increased. We have assumed that the merit function is bounded from below and also that the trust-region iterates remain within a level-set L(x₀, µ₀) as defined by (3.2). We now consider the merit function written in an alternate form,

Υ(x_k, µ_k) = f_high(x_k) + µ_k

2 kh(x_k)^>, g⁺(x_k)^> k². (3.20) From (3.18), if µ_k is large enough such that the bound on the constraint violation is less than unity, we may use the bound on the constraint violation in (3.18) to show that an upper bound on the total remaining change in the merit function is

f_high(x_k) − min

x∈L(x0,µ0)f_high(x)+ (3.21)

[κ₁(max{α∆_k, a} + κ_g∆_k) + kλ(x^∗)k + κ₂kx_k− x^∗k]²

µ_k .

Each term in the numerator is bounded from above because ∆k is always bounded from

above by ∆_max, kλ(x^∗)k is bounded from above because x^∗ is a regular point, and kx_k− x^∗k is bounded from above because L(x₀, µ₀) is a compact set. Therefore, if the series {1/µ_k} has a finite sum, then the total remaining improvement in the merit function is finite. Accordingly, the sum of the series {∆_k} must be finite, and ∆_k → 0 as k → ∞. The prescribed sequence for {µ_k} in our algorithm satisfies these requirements for a broad range of problems.

3.2 Multifidelity Objective and Constraint

In document Multidelity methods for multidisciplinary system design (Page 83-86)