Numerical Essentials
7.2 Numerical Conditioning—Algorithms, Matrices and Optimization Problems
We begin by asking two questions: What is numerical conditioning, and how does it relate to optimization? As we have learned, optimization depends on the numerical evaluation of the performance of the system being optimized. This evaluation typically involves coding, simulation, or software-based mathematical analysis, and often matrix manipulations (e.g., involving Hessians). Linear programming, for example, involves extensive matrix manipulations. Therefore, understanding the numerical properties of the matrices and of the algorithms used in optimization codes is important. From a practical point of view, we can think of a numerically well-conditioned problem or matrix as one that lends itself to easy numerical computation: small errors in the data or in intermediate calculations produce only small errors in the result. Conversely, a numerically ill-conditioned problem or matrix is one for which such small errors can be greatly amplified, making numerical computation difficult. A well-conditioned problem is also said to be well-posed. Numerical conditioning can also refer to a property of a particular algorithm, as you will see shortly. A well-conditioned algorithm is likely to converge relatively easily, while an ill-conditioned algorithm may need many more iterations to converge, or may not converge at all. In other words, numerical conditioning may refer to the ease with which an algorithm converges. Numerical conditioning may also have practical consequences for the quality or accuracy of the resulting solution.
The immediate questions that come to mind at this point are: How do we know if an algorithm, a matrix, or an optimization problem is well-conditioned or not? How do we quantify numerical conditioning? Well, there is some good news:
(i) Algorithms: Since this book is primarily concerned with the practical aspects of formulating and applying optimization, we will not study how to make algorithms well-conditioned. This is an advanced topic of numerical computation. Fortunately, if we use reputable optimization codes, the numerical conditioning properties of the algorithms are usually adequate.
(ii) Matrices: For our purpose, the numerical conditioning of a matrix is quantified by the condition number. For symmetric matrices (e.g., Hessians (Eq. 2.42)), the condition number is the square of the ratio of the highest to the lowest eigenvalue. The case of non-symmetric matrices involves singular values, which is an advanced topic that does not directly concern us. A condition number on the order of one is desirable, while a much higher condition number is a concern. The numerical properties of matrices play an important role in optimization algorithms. Sometimes an optimization run that does not converge will report that the “Hessian is ill-conditioned” (see Sec. 2.5.4 for the definition of the Hessian). The strong relevance of a function’s Hessian becomes fully evident in our study of the more advanced aspects of optimization presented in Part IV of this book; specifically, Chapters 12 and 13.
(iii) Optimization Problems: Posing our problems well is critically important. Two theoretically equivalent problems can have radically different numerical properties, as described later. Fortunately, we can generally deal with problem conditioning through proper scaling, which we will study in Sec. 7.3.
7.2.1 Reasons Why the Optimization Process Sometimes Fails
There are many reasons why optimization runs sometimes fail to converge to an adequate solution. The following are the prevailing ones that you should keep in mind:
1. The problem has a coding bug (a software or programming error);
2. The problem is ill-conditioned (poorly scaled);
3. The problem is incorrectly formulated (e.g., missing a constraint);
4. The problem posed does not reflect a physical design that is realistic (you can’t fool mother nature!);
5. The algorithm used is not appropriate, or not sufficiently robust, for the problem at hand.
In the event of non-convergence, or of convergence to a solution that is not to one’s liking, the above items should be explored, roughly in the order presented. Next, we briefly comment on each of them.
(1) Regarding coding errors, one simply needs to employ the debugging strategy of personal choice. This issue concerns computer coding in general, and is not exclusive to optimization.
(2) The problem of scaling is addressed in detail later in this chapter.
(3) As far as the formulation of the optimization problem is concerned, the material in this book is of direct help, in particular the previous chapter on multiobjective optimization. Proper formulation is also a matter of common sense. Failing to include a constraint, for example, could yield an undesirable design.
(4) The problem of seeking an unrealistic design should be carefully examined. Often, relaxing the constraints will allow the search process to explore physically feasible designs. Another possible cause for unknowingly seeking an unrealistic design is an inappropriate objective function (e.g., wrong weights in the weighted sum approach). As a final example, we could be trying to design a small table to support an elephant when no such design exists. All the modeling equations and optimization formulation issues might seem fine, but we are simply asking for the impossible.
(5) The final item concerns the appropriateness of the algorithm. For example: (i) the algorithm might not be sufficiently robust for problems of poor numerical conditioning; (ii) the algorithm might not be appropriate for problems of large dimensions; (iii) the algorithm might be limited to specific types of problems (continuous, discrete, or integer variables); (iv) the algorithm might not work well for noisy (i.e., non-smooth) objective functions or constraints.
Before we present the approaches to address numerical conditioning issues in optimization, it is important that we first learn about certain numerical problems that can occur independently of the optimization process. Specifically, we find that the matrices themselves can be problematic, and so can the way that they are used in a given algorithm. These two issues are addressed next.
7.2.2 Exposing Numerical Conditioning Issues—Algorithms and Matrices
Through simple examples, we illustrate how we must concern ourselves with matrices and algorithm issues, in addition to those directly related to optimization. We provide an example of how numerical conditioning issues can affect us in a seemingly simple case. This telling example will sensitize us to the critical nature of numerical issues. For the sake of simplicity of presentation, we only use a 3 × 3 matrix. We also use a numerical computation that is simple and readily understood. In practice, matrices are much larger and computations are much more complex. In spite of the simplicity of the present case, the numerical difficulties presented are quite serious.
Consider Matrix A, which depends on α, given by
A = (7.1)
The three eigenvalues of Matrix A can be evaluated as: 1/α, 1, and α. As a result, the condition number of A (the square of the ratio of the highest to the lowest eigenvalue since A is symmetric) is 1/α⁴ when α ≤ 1 and α⁴ when α ≥ 1. Therefore, we should expect that for a very low or a very high value of α, we may experience numerical difficulties, particularly when the algorithm within which it is being used is not well conditioned.
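The dependence of the condition number on α can be checked with a few lines of code. The following is a minimal sketch that works directly from the three eigenvalues stated above and uses the convention of this section (the square of the eigenvalue ratio); the function name `condition_number_of_A` is chosen here for illustration only. Note that many numerical libraries report the unsquared ratio as the 2-norm condition number of a symmetric matrix, so values from such tools would need to be squared to match this convention.

```python
def condition_number_of_A(alpha):
    """Condition number of Matrix A under this section's convention.

    Assumes the eigenvalues of A are 1/alpha, 1 and alpha (as stated in the
    text) and returns the SQUARE of the ratio of the highest to the lowest
    eigenvalue. The unsquared ratio is the more common 2-norm definition.
    """
    if alpha <= 0.0:
        raise ValueError("alpha must be positive")
    eigenvalues = (1.0 / alpha, 1.0, alpha)
    ratio = max(eigenvalues) / min(eigenvalues)
    return ratio ** 2


# The condition number grows quickly as alpha moves away from 1 in either direction.
for alpha in (0.1, 0.4, 1.0, 2.5, 10.0):
    print(f"alpha = {alpha:5.1f}   condition number = {condition_number_of_A(alpha):.4g}")
```

For α = 0.4 this gives a value of about 39, and the same value results for α = 2.5, since the expression depends only on how far α is from 1 in the multiplicative sense.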
To explore the numerical properties of Matrix A, let n be any positive integer, and consider the expressions
(7.3) where I is the identity matrix. Using elementary linear algebra, we can indeed verify that both A1 and A2 are identically equal to 3 × 3 zero matrices. We can further write the scalar equations
(7.4) (7.5) where we use the maximum norm defined as ‖M‖ = max{|mij|}, with mij denoting the ij-th entry of Matrix M.
We make the important observation that the zero answers in Eqs. 7.4 and 7.5 are exact only from a theoretical standpoint. When we compute A1 and A2 using a computer, we immediately observe that the numerical results depart markedly from the theoretical answers. To illustrate this important point, we present Table 7.1, where the incorrect answers are in bold face. In this table, we vary the parameter α in Matrix A, as well as the power n in the A1 and A2 expressions.
Table 7.1. Ill-Conditioned Matrices and Algorithms
We make three specific observations: (1) Even for high values of the condition number Cn, the quantity A1 is evaluated accurately. In fact, A1 is evaluated accurately for all the cases presented in Table 7.1. (2) Even for a condition number below 100 (α = 0.4), the quantity A2 is unacceptably inaccurate for n = 50. (3) Even though A1 and A2 are theoretically both equal to zero, the computation of A1 is more numerically robust than that of A2. Finally, as a general observation, the stability of the algorithm and the condition numbers of the matrices involved can greatly impact the accuracy of the solutions obtained. For example, in optimization, how we pose a constraint numerically can affect the success of the optimization. Next, we expose the critical need for scaling in the following example.
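Before turning to scaling, the kind of behavior described in these observations can be reproduced with a short experiment. The sketch below is only an illustration under stated assumptions: it does not use the matrix of Eq. 7.1 or the exact expressions for A1 and A2. Instead, it builds a stand-in symmetric 3 × 3 matrix with the same eigenvalues (1/α, 1, α) and compares two hypothetical expressions that are both zero in exact arithmetic: (A A⁻¹)ⁿ − I, which multiplies first and then takes the power, and Aⁿ (A⁻¹)ⁿ − I, which takes the powers first. All names in the code are chosen for illustration.

```python
def matmul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matpow(X, n):
    """X raised to the integer power n >= 1 by repeated multiplication."""
    result = X
    for _ in range(n - 1):
        result = matmul(result, X)
    return result


def max_norm_minus_identity(X):
    """Maximum norm of X - I: the largest absolute entry of the difference."""
    size = len(X)
    return max(abs(X[i][j] - (1.0 if i == j else 0.0))
               for i in range(size) for j in range(size))


# Assumption: a fixed orthogonal matrix (a Householder reflector) used to build
# a stand-in for Matrix A. It is NOT the matrix of Eq. 7.1.
Q = [[6 / 7, -2 / 7, -3 / 7],
     [-2 / 7, 3 / 7, -6 / 7],
     [-3 / 7, -6 / 7, -2 / 7]]


def from_eigenvalues(diag):
    """Symmetric matrix Q * diag * Q^T with the given eigenvalues."""
    return [[sum(Q[i][k] * diag[k] * Q[j][k] for k in range(3))
             for j in range(3)] for i in range(3)]


alpha, n = 0.4, 50
A = from_eigenvalues((1.0 / alpha, 1.0, alpha))
A_inv = from_eigenvalues((alpha, 1.0, 1.0 / alpha))  # exact inverse in theory

# Both quantities are zero in exact arithmetic.
err_products_first = max_norm_minus_identity(matpow(matmul(A, A_inv), n))
err_powers_first = max_norm_minus_identity(matmul(matpow(A, n), matpow(A_inv, n)))

print("multiply, then take the power:", err_products_first)
print("take the powers, then multiply:", err_powers_first)
```

In double precision, the first quantity should stay close to machine precision, while the second should be far from zero, because Aⁿ and (A⁻¹)ⁿ each contain entries of enormous magnitude whose rounding errors no longer cancel. The order of operations, not the matrix alone, decides whether the computation is reliable.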
We provide a simple example of an optimization problem for which proper scaling is essential. We begin by stating the general optimization problem formulation as follows.
PROB-7.2-GOPF: General Optimization Problem Formulation
(7.6) subject to (7.7) (7.8) (7.9)
Next, we consider the following seemingly trivial optimization problem given by
(7.10) subject to (7.11) (7.12)
Using the techniques on scaling presented later in this chapter, we find that the solution to this problem is x = 10⁻⁶ × {3.083, 1.541}. The important message here is that, without proper scaling, the solution produced by MATLAB is incorrect. Specifically, MATLAB converged to the solution x = 10⁻⁶ × {4.948, 2.474}, whose values of x are strongly inaccurate. (We note that different MATLAB settings may lead to different, equally erroneous answers.) The danger here is that there is no indication that we are dealing with an incorrect solution. This simple example points to the importance of using various strategies to increase our confidence in the solutions obtained by optimization codes. Indeed, one of the more important ways to increase confidence is to implement proper scaling, as mentioned earlier in this section and presented in the next section.
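To see how a poorly scaled problem can quietly return a wrong answer, consider the following minimal sketch. It is not the problem of Eqs. 7.10 to 7.12 and it does not reproduce MATLAB's behavior; it uses a hypothetical unconstrained quadratic whose minimizer has entries of order 10⁻⁶, a plain gradient descent, and an assumed absolute stopping tolerance of 10⁻⁶ on the gradient norm. All names and values in the code are chosen for illustration.

```python
import math


def gradient_descent(grad, x0, step=0.1, tol=1.0e-6, max_iter=10_000):
    """Plain gradient descent that stops when the gradient norm is <= tol."""
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)
        if math.hypot(*g) <= tol:
            break
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


# Hypothetical problem: minimize (x1 - 3e-6)^2 + (x2 - 1.5e-6)^2.
TARGET = (3.0e-6, 1.5e-6)
SCALE = 1.0e-6  # assumed typical magnitude of the design variables


def grad_unscaled(x):
    return [2.0 * (x[0] - TARGET[0]), 2.0 * (x[1] - TARGET[1])]


def grad_scaled(y):
    # Same problem in scaled variables y = x / SCALE, with the objective
    # divided by SCALE**2, so that both y and the objective are of order one.
    return [2.0 * (y[0] - TARGET[0] / SCALE), 2.0 * (y[1] - TARGET[1] / SCALE)]


x_unscaled = gradient_descent(grad_unscaled, [1.0e-5, 1.0e-5])
y_opt = gradient_descent(grad_scaled, [10.0, 10.0])
x_scaled = [SCALE * yi for yi in y_opt]  # map back to the original variables

rel_err_unscaled = abs(x_unscaled[0] - TARGET[0]) / TARGET[0]
rel_err_scaled = abs(x_scaled[0] - TARGET[0]) / TARGET[0]

print("unscaled run:", x_unscaled, "relative error in x1:", rel_err_unscaled)
print("scaled run:  ", x_scaled, "relative error in x1:", rel_err_scaled)
```

Both runs satisfy the stopping test and so both report success. In the unscaled run, however, the tolerance is of the same order as the variables themselves, so the search stops while the answer is still far from the minimizer in relative terms. In the scaled run the same tolerance is small compared with the variables, and the answer mapped back to the original variables is accurate.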
7.3 Scaling and Tolerances for Design Variables, Constraints and Objective