The generalized reduced gradient method - Constrained optimization

Nonlinear programming: theory and algorithms

5.5 Constrained optimization

5.5.1 The generalized reduced gradient method

In this section, we introduce an approach for solving constrained nonlinear programs. It builds on the method of steepest descent that we discussed in the context of unconstrained optimization. The idea is to reduce the number of variables using the constraints and then to solve the resulting reduced, unconstrained problem using the steepest descent method.

Linear equality constraints

First we consider an example where the constraints are linear equations.

Example 5.6

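$$
\begin{aligned}
\min\quad & f(x) = x_1^2 + x_2 + x_3^2 + x_4 \\
\text{s.t.}\quad & g_1(x) = x_1 + x_2 + 4x_3 + 4x_4 - 4 = 0, \\
& g_2(x) = -x_1 + x_2 + 2x_3 - 2x_4 + 2 = 0.
\end{aligned}
$$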

It is easy to solve the constraint equations for two of the variables in terms of the others. Solving for $x_2$ and $x_3$ in terms of $x_1$ and $x_4$ gives

$$x_2 = 3x_1 + 8x_4 - 8 \quad\text{and}\quad x_3 = -x_1 - 3x_4 + 3.$$

Substituting these expressions into the objective function yields the following reduced problem:

$$\min\ f(x_1, x_4) = x_1^2 + (3x_1 + 8x_4 - 8) + (-x_1 - 3x_4 + 3)^2 + x_4.$$

This problem is unconstrained and therefore it can be solved using one of the methods presented in Section 5.4.
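As a minimal sketch of the substitution approach in Example 5.6, the Python snippet below minimizes the reduced objective by setting its gradient to zero. This closed-form solve is used here in place of the steepest descent iterations of Section 5.4, which is possible because the reduced function is a convex quadratic. The function names are illustrative assumptions, not part of the text.

```python
def reduced_objective(x1, x4):
    """Objective of Example 5.6 after eliminating x2 and x3."""
    x2 = 3 * x1 + 8 * x4 - 8
    x3 = -x1 - 3 * x4 + 3
    return x1**2 + x2 + x3**2 + x4


def solve_reduced_problem():
    """Solve grad f(x1, x4) = 0 for the reduced objective.

    The gradient is (4*x1 + 6*x4 - 3, 6*x1 + 18*x4 - 9), so the stationary
    point solves the 2x2 linear system H z = rhs (Cramer's rule below).
    """
    h11, h12, h21, h22 = 4.0, 6.0, 6.0, 18.0
    rhs1, rhs2 = 3.0, 9.0
    det = h11 * h22 - h12 * h21  # positive, so the quadratic is convex
    x1 = (rhs1 * h22 - h12 * rhs2) / det
    x4 = (h11 * rhs2 - h21 * rhs1) / det
    return x1, x4


x1, x4 = solve_reduced_problem()
# Recover the eliminated variables from the constraint equations.
x2 = 3 * x1 + 8 * x4 - 8
x3 = -x1 - 3 * x4 + 3
print((x1, x2, x3, x4), reduced_objective(x1, x4))
```

Under these assumptions the stationary point is $(x_1, x_4) = (0, 0.5)$, which corresponds to $x = (0, -4, 1.5, 0.5)$ with objective value $-1.25$; this point satisfies both original constraints.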

Nonlinear equality constraints

Now consider the possibility of approximating a problem where the constraints are nonlinear equations by a problem with linear equations. To see how this works, consider the following example, which is similar to the preceding one but has constraints that are nonlinear.

Example 5.7

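$$
\begin{aligned}
\min\quad & f(x) = x_1^2 + x_2 + x_3^2 + x_4 \\
\text{s.t.}\quad & g_1(x) = x_1^2 + x_2 + 4x_3 + 4x_4 - 4 = 0, \\
& g_2(x) = -x_1 + x_2 + 2x_3 - 2x_4^2 + 2 = 0.
\end{aligned}
$$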

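We use the first-order Taylor series approximation to the constraint functions at the current point $\bar{x}$:

$$g(x) \approx g(\bar{x}) + \nabla g(\bar{x})^{\top}(x - \bar{x}).$$

Applied to the two constraints of this example, the approximation gives

$$
\begin{aligned}
g_1(x) &\approx 2\bar{x}_1 x_1 + x_2 + 4x_3 + 4x_4 - (\bar{x}_1^2 + 4) = 0, \\
g_2(x) &\approx -x_1 + x_2 + 2x_3 - 4\bar{x}_4 x_4 + (2\bar{x}_4^2 + 2) = 0.
\end{aligned}
$$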

The idea of the generalized reduced gradient (GRG) algorithm is to solve a sequence of subproblems, each of which uses a linear approximation of the constraints. In each iteration of the algorithm, the constraint linearization is recalculated at the point found in the previous iteration. Typically, even though the constraints are only approximated, the subproblems yield points that are progressively closer to the optimal point. A property of the linearization is that, at the optimal point, the linearized problem has the same solution as the original problem.

The first step in applying GRG is to pick a starting point. Suppose that we start with $x^0 = (0, -8, 3, 0)$, which happens to satisfy the original constraints. It is possible to start from an infeasible point, as we discuss later on. Using the approximation formulas derived earlier, we form our first approximation problem as follows:

$$
\begin{aligned}
\min\quad & f(x) = x_1^2 + x_2 + x_3^2 + x_4 \\
\text{s.t.}\quad & g_1(x) = x_2 + 4x_3 + 4x_4 - 4 = 0, \\
& g_2(x) = -x_1 + x_2 + 2x_3 + 2 = 0.
\end{aligned}
$$
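The linearization step can be sketched in Python as follows. This is an illustrative sketch only: the helper names `constraints`, `constraint_gradients`, and `linearize` are assumptions made for this example, and the gradients are written out by hand for the two constraints of Example 5.7.

```python
def constraints(x):
    """Constraint functions g1, g2 of Example 5.7 (each should equal zero)."""
    x1, x2, x3, x4 = x
    g1 = x1**2 + x2 + 4 * x3 + 4 * x4 - 4
    g2 = -x1 + x2 + 2 * x3 - 2 * x4**2 + 2
    return [g1, g2]


def constraint_gradients(x):
    """Analytic gradients of g1 and g2 at x."""
    x1, _, _, x4 = x
    return [
        [2 * x1, 1.0, 4.0, 4.0],
        [-1.0, 1.0, 2.0, -4 * x4],
    ]


def linearize(x_bar):
    """Return (coeffs, const) pairs so that g_i(x) ~ coeffs . x + const near x_bar."""
    result = []
    for g_val, grad in zip(constraints(x_bar), constraint_gradients(x_bar)):
        # g(x_bar) + grad . (x - x_bar) = grad . x + (g(x_bar) - grad . x_bar)
        const = g_val - sum(c * xb for c, xb in zip(grad, x_bar))
        result.append((grad, const))
    return result


for coeffs, const in linearize([0.0, -8.0, 3.0, 0.0]):
    print(coeffs, const)
```

At $x^0 = (0, -8, 3, 0)$ the coefficient vectors are $(0, 1, 4, 4)$ and $(-1, 1, 2, 0)$ with constants $-4$ and $2$, which agrees with the approximation problem stated above.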

Next we solve the equality constraints of the approximate problem to express two of the variables in terms of the others. Arbitrarily selecting $x_2$ and $x_3$, we get

$$x_2 = 2x_1 + 4x_4 - 8 \quad\text{and}\quad x_3 = -\tfrac{1}{2}x_1 - 2x_4 + 3.$$

Substituting these expressions in the objective function yields the reduced problem

$$\min\ f(x_1, x_4) = x_1^2 + (2x_1 + 4x_4 - 8) + \left(-\tfrac{1}{2}x_1 - 2x_4 + 3\right)^2 + x_4.$$

Solving this unconstrained minimization problem yields $x_1 = -0.375$, $x_4 = 0.96875$. Substituting in the equations for $x_2$ and $x_3$ gives $x_2 = -4.875$ and $x_3 = 1.25$. Thus the first iteration of GRG has produced the new point $x^1 = (-0.375, -4.875, 1.25, 0.96875)$.

To continue the solution process, we would re-linearize the constraint functions at the new point, use the resulting system of linear equations to express two of the variables in terms of the others, substitute into the objective to get the new reduced problem, solve the reduced problem for $x^2$, and so forth. Using the stopping criterion $\lVert x^{k+1} - x^k \rVert < T$, where $T = 0.0025$, we get the results summarized in Table 5.7.

This is to be compared with the optimal solution, which is $x^* = (-0.500, -4.825, 1.534, 0.610)$ and has an objective value of $-1.612$. Note that, in Table 5.7, the values of the function $f(x^k)$ for $k = 1$ and $k = 2$ are smaller than the minimum value. How is this possible? The reason is that the points $x^k$ computed by GRG are usually not feasible for the original constraints. They are only feasible for a linear approximation of these constraints.
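The iteration described above can be sketched in Python as follows. This is a sketch under stated assumptions rather than a reference implementation: the function names are hypothetical, the elimination of $x_2$ and $x_3$ is hard-coded for Example 5.7, and each reduced subproblem is minimized in closed form (its objective is a convex quadratic in $x_1$ and $x_4$) instead of by steepest descent.

```python
import math


def f(x):
    x1, x2, x3, x4 = x
    return x1**2 + x2 + x3**2 + x4


def grg_step(x_bar):
    """One iteration: linearize at x_bar, eliminate x2 and x3, minimize."""
    xb1, _, _, xb4 = x_bar
    # Linearized constraints:
    #   a*x1 + x2 + 4*x3 + 4*x4 = c1   and   -x1 + x2 + 2*x3 - b*x4 = c2
    a, c1 = 2 * xb1, xb1**2 + 4
    b, c2 = 4 * xb4, -(2 * xb4**2 + 2)
    # Eliminate: x3 = p + q1*x1 + q4*x4 and x2 = r + s1*x1 + s4*x4.
    p, q1, q4 = (c1 - c2) / 2, -(a + 1) / 2, -(4 + b) / 2
    r, s1, s4 = c1 - 4 * p, -a - 4 * q1, -4 * q4 - 4
    # The reduced objective x1^2 + x2 + x3^2 + x4 is quadratic in (x1, x4);
    # setting its gradient to zero gives the minimizer (assumes q4 != 0).
    x3 = -(s4 + 1) / (2 * q4)
    x1 = -(s1 + 2 * q1 * x3) / 2
    x4 = (x3 - p - q1 * x1) / q4
    x2 = r + s1 * x1 + s4 * x4
    return (x1, x2, x3, x4)


def grg(x0, tol=0.0025, max_iter=50):
    """Repeat grg_step until the Euclidean step length drops below tol."""
    iterates = [tuple(x0)]
    for _ in range(max_iter):
        x_new = grg_step(iterates[-1])
        step = math.dist(x_new, iterates[-1])
        iterates.append(x_new)
        if step < tol:
            break
    return iterates


for k, x in enumerate(grg((0.0, -8.0, 3.0, 0.0))):
    print(k, [round(v, 3) for v in x], round(f(x), 3))
```

Starting from $x^0 = (0, -8, 3, 0)$, the first step of this sketch returns $(-0.375, -4.875, 1.25, 0.96875)$, the point $x^1$ derived above, and later iterates approach $x^*$.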

Table 5.7 Summarized results

| $k$ | $(x_1^k, x_2^k, x_3^k, x_4^k)$ | $f(x^k)$ | $\lVert x^{k+1} - x^k \rVert$ |
|---|---|---|---|
| 0 | (0.000, −8.000, 3.000, 0.000) | 1.000 | 3.729 |
| 1 | (−0.375, −4.875, 1.250, 0.969) | −2.203 | 0.572 |
| 2 | (−0.423, −5.134, 1.619, 0.620) | −1.714 | 0.353 |
| 3 | (−0.458, −4.792, 1.537, 0.609) | −1.610 | 0.022 |
| 4 | (−0.478, −4.802, 1.534, 0.610) | −1.611 | 0.015 |
| 5 | (−0.488, −4.813, 1.534, 0.610) | −1.612 | 0.008 |
| 6 | (−0.494, −4.818, 1.534, 0.610) | −1.612 | 0.004 |
| 7 | (−0.497, −4.821, 1.534, 0.610) | −1.612 | 0.002 |
| 8 | (−0.498, −4.823, 1.534, 0.610) | −1.612 | |

Now we discuss the method used by GRG for starting at an infeasible solution: a phase 1 problem is solved to construct a feasible one. The objective function for the phase 1 problem is the sum of the absolute values of the violated constraints. The constraints for the phase 1 problem are the nonviolated ones. Suppose we had started at the point $x^0 = (1, 1, 0, 1)$ in our example. This point violates the first constraint but satisfies the second, so the phase 1 problem would be

$$
\begin{aligned}
\min\quad & \left| x_1^2 + x_2 + 4x_3 + 4x_4 - 4 \right| \\
\text{s.t.}\quad & -x_1 + x_2 + 2x_3 - 2x_4^2 + 2 = 0.
\end{aligned}
$$

Once a feasible solution has been found by solving the phase 1 problem, the method illustrated above is used to find an optimal solution.
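A short Python sketch of how the phase 1 problem might be assembled is given below. It only classifies the constraints at the starting point and builds the phase 1 objective; it does not solve the phase 1 problem. The name `build_phase1` and the tolerance `tol` are assumptions made for this illustration.

```python
def constraints(x):
    """Constraint functions g1, g2 of Example 5.7 (each should equal zero)."""
    x1, x2, x3, x4 = x
    return [
        x1**2 + x2 + 4 * x3 + 4 * x4 - 4,
        -x1 + x2 + 2 * x3 - 2 * x4**2 + 2,
    ]


def build_phase1(x0, tol=1e-9):
    """Classify constraints at x0 and return the phase 1 objective.

    Returns (violated, satisfied, objective), where the index lists are
    0-based and objective(x) sums |g_i(x)| over the violated constraints.
    """
    values = constraints(x0)
    violated = [i for i, g in enumerate(values) if abs(g) > tol]
    satisfied = [i for i, g in enumerate(values) if abs(g) <= tol]

    def phase1_objective(x):
        g = constraints(x)
        return sum(abs(g[i]) for i in violated)

    return violated, satisfied, phase1_objective


start = (1.0, 1.0, 0.0, 1.0)
violated, satisfied, phase1 = build_phase1(start)
print(violated, satisfied, phase1(start))
```

At the starting point $(1, 1, 0, 1)$ the first constraint has value 2 and the second has value 0, so only the first constraint enters the phase 1 objective while the second is kept as a constraint.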

Linear inequality constraints

Finally, we discuss how GRG solves problems having inequality constraints as well as equalities. At each iteration, only the tight inequality constraints enter into the system of linear equations used for eliminating variables (these inequality constraints are said to be active). The process is complicated by the fact that active inequality constraints at the current point may need to be released in order to move to a better solution. We illustrate the ideas with the following example:

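$$
\begin{aligned}
\min\quad & f(x_1, x_2) = \left(x_1 - \tfrac{1}{2}\right)^2 + \left(x_2 - \tfrac{5}{2}\right)^2 \\
\text{s.t.}\quad & x_1 - x_2 \ge 0, \\
& x_1 \ge 0, \\
& x_2 \ge 0, \\
& x_2 \le 2.
\end{aligned}
$$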

The feasible set of this problem is shown in Figure 5.5. The arrows in the figure indicate the feasible half-spaces dictated by each constraint. Suppose that we start from $x^0 = (1, 0)$. This point satisfies all the constraints. As can be seen from Figure 5.5, the constraints $x_1 - x_2 \ge 0$, $x_1 \ge 0$, and $x_2 \le 2$ are inactive, whereas the constraint $x_2 \ge 0$ is active. We have to decide whether $x_2$ should stay at its lower bound or be allowed to leave its bound.

[Figure 5.5 Progress of the generalized reduced gradient algorithm. The figure shows the feasible region defined by $x_1 \ge 0$, $x_2 \ge 0$, $x_2 \le 2$, and $x_1 - x_2 \ge 0$, together with the GRG iterates $x^0 = (1, 0)$, $x^1 = (0.833, 0.833)$, and $x^2 = (1.5, 1.5)$.]

We first evaluate the gradient of the objective function at $x^0$:

$$\nabla f(x^0) = \left(2x_1^0 - 1,\ 2x_2^0 - 5\right) = (1, -5).$$

This indicates that we will get the largest decrease in $f$ if we move in the direction $d^0 = -\nabla f(x^0) = (-1, 5)$, i.e., if we decrease $x_1$ and increase $x_2$. Since this direction is towards the interior of the feasible region, we decide to release $x_2$ from its bound. The new point will be $x^1 = x^0 + \alpha^0 d^0 = (1 - \alpha^0, 5\alpha^0)$, for some $\alpha^0 > 0$. The constraints of the problem induce an upper bound on $\alpha^0$: the constraint $x_1 - x_2 \ge 0$ requires $1 - 6\alpha^0 \ge 0$, namely $\alpha^0 \le 1/6 \approx 0.1667$. Now we perform a line search to determine the best value of $\alpha^0$ in this range. It turns out to be $\alpha^0 = 0.1667$, so $x^1 = (0.8333, 0.8333)$; see Figure 5.5.
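The step-length computation for this first move can be checked with a short Python sketch. The helper names are assumptions for illustration; the ratio test and the exact line search below rely on the constraints being linear and on $f$ being a quadratic with Hessian $2I$.

```python
def grad_f(x):
    """Gradient of f(x1, x2) = (x1 - 1/2)^2 + (x2 - 5/2)^2."""
    return (2 * x[0] - 1, 2 * x[1] - 5)


def max_feasible_step(x, d):
    """Ratio test: largest alpha keeping a . (x + alpha*d) >= b for all rows."""
    # Rows encode x1 - x2 >= 0, x1 >= 0, x2 >= 0, and -x2 >= -2.
    rows = [((1, -1), 0), ((1, 0), 0), ((0, 1), 0), ((0, -1), -2)]
    alpha_max = float("inf")
    for a, b in rows:
        slack = a[0] * x[0] + a[1] * x[1] - b
        rate = a[0] * d[0] + a[1] * d[1]
        if rate < 0:  # moving along d uses up this constraint's slack
            alpha_max = min(alpha_max, slack / -rate)
    return alpha_max


def exact_line_search(x, d):
    """Unconstrained minimizer of f(x + alpha*d), using Hessian 2*I."""
    g = grad_f(x)
    return -(g[0] * d[0] + g[1] * d[1]) / (2 * (d[0] ** 2 + d[1] ** 2))


x_start = (1.0, 0.0)
d0 = tuple(-g for g in grad_f(x_start))
alpha0 = min(exact_line_search(x_start, d0), max_feasible_step(x_start, d0))
x_next = (x_start[0] + alpha0 * d0[0], x_start[1] + alpha0 * d0[1])
print(d0, alpha0, x_next)
```

Here the unconstrained line search would prefer a step of 0.5, but the constraint $x_1 - x_2 \ge 0$ caps the step at $1/6 \approx 0.1667$, which gives the point $(0.8333, 0.8333)$.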

Now, we repeat the process: the constraint $x_1 - x_2 \ge 0$ has become active whereas the others are inactive. Since the active constraint is not a simple upper or lower bound constraint, we introduce a surplus variable, say $x_3$, and solve for one of the variables in terms of the others. Substituting $x_1 = x_2 + x_3$, we obtain the reduced optimization problem:

$$
\begin{aligned}
\min\quad & f(x_2, x_3) = \left(x_2 + x_3 - \tfrac{1}{2}\right)^2 + \left(x_2 - \tfrac{5}{2}\right)^2 \\
\text{s.t.}\quad & 0 \le x_2 \le 2, \\
& x_3 \ge 0.
\end{aligned}
$$

The reduced gradient is

$$\nabla f(x_2, x_3) = (2x_2 + 2x_3 - 1 + 2x_2 - 5,\ 2x_2 + 2x_3 - 1) = (-2.667,\ 0.667) \quad\text{at the point } (x_2, x_3)^1 = (0.8333, 0).$$

Therefore, the largest decrease in $f$ occurs in the direction $(2.667, -0.667)$, that is, when we increase $x_2$ and decrease $x_3$. But $x_3$ is already at its lower bound, so we cannot decrease it. Consequently, we keep $x_3$ at its bound, i.e., we move in the direction $d^1 = (2.667, 0)$ to a new point $(x_2, x_3)^2 = (x_2, x_3)^1 + \alpha^1 d^1$. A line search in this direction yields $\alpha^1 = 0.25$ and $(x_2, x_3)^2 = (1.5, 0)$. The same constraints are still active, so we may stay in the space of variables $x_2$ and $x_3$. The reduced gradient

$$\nabla f(x_2, x_3) = (0, 2) \quad\text{at the point } (x_2, x_3)^2 = (1.5, 0)$$

is perpendicular to the boundary line $x_3 = 0$ at the current solution $x^2$, and the corresponding descent direction $-\nabla f = (0, -2)$ points towards the exterior of the feasible region, so no further decrease in $f$ is possible. Therefore, we have found the optimal solution. In the space of original variables, this optimal solution is $x_1 = 1.5$ and $x_2 = 1.5$.
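The second move and the final optimality check can be traced with the following Python sketch in the reduced variables $(x_2, x_3)$. The helper names are hypothetical, and the projection shown handles only the lower bounds; the upper bound $x_2 \le 2$ is enforced separately by capping the step.

```python
def reduced_grad(x2, x3):
    """Gradient of (x2 + x3 - 1/2)^2 + (x2 - 5/2)^2 with respect to (x2, x3)."""
    common = 2 * (x2 + x3) - 1
    return (common + 2 * x2 - 5, common)


def project_direction(point, direction, lower=(0.0, 0.0)):
    """Zero out components that would push a variable below its lower bound."""
    return tuple(
        0.0 if (p <= lo and d < 0) else d
        for p, d, lo in zip(point, direction, lower)
    )


def exact_step(g, d):
    """Minimizer of the quadratic along d; its Hessian is [[4, 2], [2, 2]]."""
    curvature = 4 * d[0] ** 2 + 4 * d[0] * d[1] + 2 * d[1] ** 2
    return -(g[0] * d[0] + g[1] * d[1]) / curvature


point = (5 / 6, 0.0)  # (x2, x3) at the first iterate
g = reduced_grad(*point)
d = project_direction(point, tuple(-gi for gi in g))
# Cap the step so that x2 <= 2 still holds (valid here because d[0] > 0).
alpha = min(exact_step(g, d), (2.0 - point[0]) / d[0])
new_point = (point[0] + alpha * d[0], point[1] + alpha * d[1])

# Optimality check: the projected descent direction vanishes at the new point.
g_new = reduced_grad(*new_point)
d_new = project_direction(new_point, tuple(-gi for gi in g_new))
print(alpha, new_point, g_new, d_new)
```

In this sketch the projected direction at $(1.5, 0)$ is numerically zero: the only descent component would decrease $x_3$, which is already at its lower bound.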

This is how some of the most widely distributed nonlinear programming solvers, such as Excel’s SOLVER, GINO, CONOPT, GRG2, and several others, solve nonlinear programs, with just a few additional details such as the Newton-Raphson direction for line search. Compared with linear programs, the problems that can be solved within a reasonable amount of computational time are typically smaller and the solutions produced may not be very accurate. Furthermore, potential nonconvexity in the feasible set or in the objective function may generate local optimal solutions that are far from a global solution. Therefore, the interpretation of the output of a nonlinear program requires more care.

Exercise 5.15 Consider the following optimization problem:

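$$
\begin{aligned}
\min\quad & f(x_1, x_2) = -x_1 - x_2 - x_1 x_2 + \tfrac{1}{2}x_1^2 + x_2^2 \\
\text{s.t.}\quad & x_1 + x_2^2 \le 3, \\
& x_1^2 - x_2 = 3, \\
& (x_1, x_2) \ge 0.
\end{aligned}
$$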

Find a solution to this problem using the generalized reduced gradient approach.

5.5.2 Sequential quadratic programming

In document Optimization Methods in Finance (Page 117-122)