DYNAMIC PROGRAMMING .1 Introduction - Minimize the Absolute Value of the Deviation between Two

X At any iteration, let

4. Minimize the Absolute Value of the Deviation between Two Non-negative Variables Let the absolute value of deviation be equal to D (non-negative),

2.3 DYNAMIC PROGRAMMING .1 Introduction

Dynamic programming (DP) is ideally suited for sequential decision problems.

Sequential (or multistage) decision problems are those in which decisions are made sequentially, one after another, based on the state of the system. An example of a sequential decision problem in water resources is the reservoir operation problem, in which release decisions need to be made sequentially across time periods (e.g. months) based on the available water in storage in a period. In this section, we discuss the formulation and solution of DP prob-lems. Unlike linear programming problems, DP problems are not amenable to a single, standard algebraic formulation. Different types of sequential decision problems may need to be formulated differently considering specific features of the problems, and therefore, DP is said to be as much an art as it is a mathematical technique. In this book, we deal with only discrete DP, in which the variables are allowed to take only discrete values. The continuous state DP, which allows variables to take continuous values, has limited applications in water resources and therefore is not discussed here. The method of computa-tions of discrete DP is illustrated with examples of water allocation, reservoir operation, capacity expansion and shortest route problems.

A single-stage decision problem may be represented as in Fig. 2.9. In this representation, S is the input, X is the decision and T is the output. Because of decision X for input S, there is a return R which is, in general, a function of both S and X. The transformation of input S to output T is called the state transformation. In a serial multistage decision problem, the output T from one stage forms the input S to the next stage, and a decision is associated with each stage.

A typical serial multistage decision problem, consisting of n stages, may be represented as in Fig. 2.10 (e.g. Rao, 1999). S_n is the input to stage n, x_n is the decision taken at stage n, and R_n is the return at stage n, corresponding to the decision x_n for the input S_n. As a result of this decision, the input S_n gets transformed into output S_{n – 1}, that forms the input to the next stage n – 1. As an example of a multistage decision problem, consider the problem of water allocation among n users. Each user in such a problem defines one stage in the Fig. 2.9 A Single-stage Decision Problem

Rn Rn– 1

Xn Xn– 1

Sn Sn– 1 Sn– 2

R2 R1

X2 X1

S1 S0

Stagen Stagen– 1 Stage 2 Stage 1

Fig. 2.10 A Serial Multistage Decision Problem

multistage decision problem. The water allocated to a particular user, i, constitutes the decision x_I at that stage. Starting with a known amount of available water Q, the input to the last stage n, (i.e. S_n) will be equal to Q.

Based on the decision taken at the stage n, (i.e. the amount of water allocated to the user n), the amount of water available at stage n – 1, will be S_{n – 1} = S_n – x_n, which defines the state transformation. S_{n – 1} forms the input to stage n – 1 and is equal to the amount of water available to be allocated to all the remaining stages, including the stage n – 1. The return R_n may be the monetary returns obtained from allocating an amount of water x_n to user n. The allocation decision at any stage depends on the water available at that stage, after allocation to all previous stages.

The objective of a multistage decision problem as represented above, is to find the values of x₁, x₂, x₃, …, x_n to maximize a function of the returns (e.g. to maximize the sum of returns over all stages), while satisfying the state transfor-mation equation. The input S_i at any stage i is called the state variable. The state variable defines the state of the system at a particular stage. In the water allocation problem mentioned earlier, the state variable at a particular stage is the water available for allocation to all the remaining users. It may be noted that a multistage problem may consist of more than one state variable.

2.3.2 Solution of DP problems

The solution procedure for DP problems is based on Bellman’s principle of optimality, which states, “An optimal policy (a set of decisions) has the prop-erty that whatever the initial state and initial decisions are, the remaining deci-sions must constitute an optimal policy with respect to the state resulting from the initial decision.” This principle implies that, given the state S_i of the system at a stage i, one must proceed optimally till the last stage, irrespective of how one arrived at the state S_i. The solution procedure using this principle involves dividing the original problem with n decision variables into n subproblems, each with one decision variable. The subproblems are then solved recursively, one at each stage, either with a backward recursion or a forward recursion. The solution procedure is illustrated with a water allocation problem in the follow-ing paragraphs, by both backward and forward recursions.

In serial multistage decision making problems, such as those discussed in this section, the stage numbers may be assigned in increasing order either in the forward direction or in the backward direction, no matter which direction of recursion we use, forward or backward. We must, however, define the state variable, state transformation equation and the recursive relationship at a given

stage very precisely. The examples presented in this section illustrate different conventions that may be followed.

Backward Recursion

In backward recursion the computations proceed backwards, as illustrated in Fig. 2.11 for a three-user water allocation problem.

Fig. 2.11 Backward Recursion for a Three-satge Water Allocation Problem We define:

S₁: Amount of water available for allocation at Stage 1 (to User 3);

S₂: Amount of water available for allocation at Stage 2 (to Users 2 and 3 together);

S₃: Amount of water available for allocation at Stage 3 (to Users 1, 2, and 3 together);

f₁*(S₁): the maximum return due to allocation of S₁; f₂*(S₂): the maximum return due to allocation of S₂; f₃*(S₃): the maximum return due to allocation of S₃

Only one user, User 3, is considered in the first stage, and optimal allocation to that user is obtained for all possible values of the state variable, S₁ (the amount

of water available for allocation at Stage 1). The problem is written, for Stage 1, as,

f1^*(S₁) = max [R₃(x₃)]

0 £ x3 £ S1

0 £ S1£ Q

where f₁^*(S₁) is the maximum return at Stage 1 for given S₁, and R₃(x₃) is the return from allocating an amount of water x₃ to User 3. The bound for x₃ indicates that the search for the optimal value is carried out over all values that x₃ can take, viz. all values less than or equal to S₁, the water available for allocation at Stage 1. S₁ itself may take values less than or equal to Q, the total amount of water available for allocation to all users.

In the second stage, two users are considered together, as shown in Fig. 2.11, at Stage 2. The state variable S₂ is the amount of water available to be allocated to User 2 and User 3 together. The problem is solved for all possible values of S₂, and the optimal value is obtained as

f^*2(S₂) = max [R₂(x₂) + f₁^*(S₂– x₂)]

0 £ x2£ S2

0 £ S₂£ Q

where f ₂^*(S₂) is the maximum return at Stage 2 for given S₂, and R₂(x₂) is the return from allocating an amount of water x₂ to User 2. Note that the water available for allocation to User 3 (Stage 1) is S₂– x₂, when an allocation of x₂ is made to User 2 out of the quantity, S₂, available for allocation at Stage 2. For this availability (S₂ – x₂) for User 3, the maximum return is f₁^*(S₂ – x₂), which is obtained from the Stage 1 computations.

In the third (last) stage, all the three users are included in the optimization.

The state variable, S₃, is the amount of water available for allocation to all the three users, which is Q. The problem is solved for this value of S₃, and the optimal value is obtained as

f3^*(S₃) = max [R₁(x₁) + f ₂^*(S₃– x₁)]

0 £ x1£ S3

S₃ = Q

where f ₃^*(S₃) is the maximum return at Stage 3 for given S₃, and R₁(x₁) is the return from allocating an amount of water x₁ to User 1. When the problem is solved for the last stage, the entire amount, Q, is optimally allocated among the three users. The individual allocations are then obtained by tracing back the solution. This general procedure may be adopted for any resource allocation problem that has n users. In backward recursion, the computations start with the last user, User n, and proceed backwards stagewise, up to User 1, adding one user at every stage to the number of users included in the previous stage computations. At every stage in the backward recursion, we look forward to obtain optimal solution from the other stages that are already solved.

For discrete DP problems, the solution is conveniently obtained through a tabular procedure, as illustrated for a simple application of the water allocation problem.

Example 2.3.1 Water Allocation Problem

A total of 6 units of water is to be allocated optimally to three users. The allocation is made in discrete steps of one unit ranging from 0 to 6. With the three users denoted as User 1, User 2 and User 3 respectively, the returns obtained from the users for a given allocation are given in the following table.

Returns from Water Allocation

Amount of Water Return from

Allocated User 1 User 2 User 3

x R₁(x) R₂(x) R₃(x)

0 0 0 0

1 5 5 7

2 8 6 12

3 9 3 15

4 8 – 4 16

5 5 –15 15

6 0 –30 12

The problem is to find allocations to the three users such that the total return is maximized. The following steps show the computations of backward recursion.

Stage 1

S₁ : Amount of water available for allocation to User 3 x₃ : Amount of water allocated to User 3

x3 : Allocation to User 3, that results in f₁^*(S₁) f1^*(S₁) : Maximum return due to allocation of S₁ f1^*(S₁) = max [R₃(x₃)]

0 £ x3 £ S1

0 £ S1 £ 6

S₁ x₃ R₃(x₃) f₁^*(S₁) = x₃* max [R₃(x₃)]

0 0 0 0 0

0 0

1 1 7 7 1

0 0

2 1 7 12 2

2 12

(Contd)

S₂ : Amount of water available for allocation to Users 2 and 3 together.

x₂ : Amount of water allocated to User 2

S₂ – x₂ : Amount of water available for allocation at Stage 1 (to User 3)

(Contd)

S₃ : Amount of water available for allocation to Users 1, 2 and 3 together = 6 units

x₁ : Amount of water allocated to User 1

S₃ – x₁: Amount of water available for allocation at Stage 2 (Users 2 and

We define, for forward recursion:

S₁ : Amount of water available for allocation at stage 1 (to User 1);

S₂ : Amount of water available for allocation at stage 2 (to Users 1 and 2 together);

S₃ : Amount of water available for allocation at Stage 3 (to Users 1, 2, and 3 together);

S₁ : Amount of water available for allocation at stage 1 (User 1) x₁ : Amount of water allocated to User 1

x1 : Allocation to User 1 that results in f₁^*(S₁)

When the third (last) stage is solved, all the three users are considered for allocation. Thus the total maximum return is, f₃^*(6) = 28. The allocations to individual users are traced back as follows: From the table for Stage 3, we getx₁^* = 2. From this, the water available at Stage 2 is obtained as S₂ = Q –x₃^*

Maximum returns resulting from the allocations = 28.

Forward Recursion We will illustrate the forward recursion with the same example of water allocation.

Example 2.3.2 In forward recursion the solution starts from the first user, User 1, and proceeds in the forward direction to the third user, User 3. At every stage in the forward recursion, we look backward to obtain optimal solutions from the other stages that have already been solved.

User

S₁ x₁ R₁(x₁) f₁^*(S₁) = max [R₁(x₁)] x₁^*

S₂ : Amount of water available for allocation to Users 1 and 2 together.

x₂ : Amount of water allocated to User 2.

(Contd)

S₃ : Amount of water available for allocation to Users 1, 2 and 3 together = 6 units

x₃ : Amount of water allocated to User 3.

Trace back: x₃^* = 3

S2= S₃– x₃ = 6 – 3 = 3

\ x^*₂ = 1 … from Stage 2 table

S3 = S₂ – x₂ = 3 – 1 = 2

\ x1^* = 2 … from Stage 1 table Optimal allocation x₁^* = 2 units

x2 = 1 unit

x3 = 3 units Maximum return = 28 units.

These results are the same as those obtained by backward recursion.

The Bellman’s principle of optimality may be interpreted for backward re-cursion as: “No matter what state of what stage one may be in, in order for a policy to be optimum, one must proceed from that state and stage in an optimal manner,” and for forward recursion as: “No matter what state of what stage one may be in, in order for a policy to be optimum, one had to get to that state and stage in an optimal manner” (Loucks et al., 1981).

2.3.3 Characteristics of a DP problem

l A single n-variable problem is divided into n number of single variable problems. This requires that the objective function of the optimization problem be separable with respect to stages. For example, the function, R₁(X₁) + R₂(X₂) + … + R_n(X_n) is separable, because we can identify exactly one variable in the objective function associated with each stage.

Similarly the function, X₁.X₂.X₃…X_n is also separable. However, the func-tion, X₁X₂ + X₂X₃ + X₃X₄… is not separable and problems with such objective functions cannot be formulated as DP problems.

l A problem is divided into stages, with a policy decision required at each stage. In the water allocation problem, each user constitutes a stage, and the amount of water allocated at a stage constitutes a policy decision.

l Each stage has a number of possible states associated with it. In the water allocation problem, the amount of water available for allocation at a stage defines the state at that stage.

l The policy decision transforms the current state into a state associated with the next stage.

l A recursive relationship identifies the optimal decision at a given stage for a specific state, given the optimal decision for each state at the previous stage.

l A solution moves backward (or forward), stage by stage, till optimal deci-sion for the last stage is found. From this solution, the optimal decideci-sions for other stages are traced back.

Within this broad framework of discrete DP problems, we shall now discuss solutions for three specific problems in water resources decision making:

(a) Shortest Route Problem, (b) Reservoir Operation Problem, and (c) Capacity Expansion Problem.

2.3.4 Example Applications of DP Shortest Route Problem

Consider the problem of determining the shortest route for a pipeline from among various possible routes available from destination to source. Figure 2.12 shows the network of possible routes connecting several nodes, along with the distance between two nodes, shown along the arrows, on a pipe route. Starting from the source node A, the pipeline must run through one of the nodes B₁, B₂ or B₃, one of the nodes, C₁, C₂, or C₃, and one of the nodes, D₁, D₂ or D₃, before reaching the destination node E. The problem is to determine the shortest route between node A and node E. We may pose this problem as a multistage, sequential decision problem and solve the problem with Dynamic Programming. To do this, we define stages as shown in the figure. Starting from the source node, A, we can reach one of the nodes B₁, B₂ or B₃, in Stage 1 and one of the nodes C₁, C₂ or C₃ in Stage 2, and one of the nodes D₁, D₂ or D₃ in Stage 3, and reach the destination node E in Stage 4. Using forward recursion, we define the state of the system, S_n, at a stage n (n = 1, 2, 3, 4) as the node reached in that stage, and the decision variable as the node x_n in the previous stage, n – 1, from which the node S_n is reached. We look for that node x_n, out of all possible nodes x_n, which results in the shortest distance from the source to the node S_n. We denote this node x_n as x^*_n. In stage n, we answer the question, “if we are in node S_n, then which node in the previous stage n – 1 we must have come from so that the total distance up to the node S_n, starting from the source node A is minimum?”

Fig. 2.12 Network of Routes in the Shortest Route Problem

In Stage 1, the possible states, S₁, are B₁, B₂ and B₃. There is only one node from which any of these nodes could be reached, and that is the source node A itself, so that the decision variable, x₁, in this case is the source node A. Being the only node from which any of the nodes, B₁, B₂ and B₃ in Stage 1 may be reached, the node A also is x₁^*.

We proceed in a similar way to the other stages until Stage 4, and then trace back the solution to obtain the shortest route. The stagewise calculations are as follows:

Stage 1

S₁ : Node in Stage 1

x₁ : Node from which S₁ is reached d (x₁, S₁) : Distance between nodes x₁ and S₁

f1(S₁) : Minimum distance from source node A to node S₁

= min [d (x₁, S₁)]

S₁ x₁= A f^*₁(S₁) x₁^* d(x₁,S₁)

B₁ 10 10 A

B₂ 15 15 A

B₃ 12 12 A

Stage 2

S₂ : Node in Stage 2

x₂ : Node in Stage 1 from which S₂ is reached d (x₂, S₂) : Distance between nodes x₂ and S₂

f 2(S₂) : Minimum distance from source node A to node S₂

= min [d (x₂, S₂) + f^*₁(x₂)]

S₂ x₂ d (x₂, S₂) f^*₁(x₂) d (x₂, S₂) + f₁^*(x₂) f^*₂(S₂) x^*₂

C₁ B₁ 8 10 18

B₂ 9 15 24 18 B₁

B₃ 7 12 19

C₂ B₁ 12 10 22

B₂ 11 15 26 22 B₁

B₃ 15 12 27

C₃ B₁ 19 10 29

B₂ 13 15 28 26 B₃

B₃ 14 12 26

Stage 3

S₃ : Node in Stage 3

x₃ : Node in Stage 2 from which S₃ is reached d (x₃, S₃) : Distance between nodes x₃ and S₃

f 3(S₃) : Minimum distance from source node A to node S₃

= min [d (x₃, S₃) + f^*₂(x₃)]

S₃ x₃ d (x₃, S₃) f^*₂(x₃) d (x₃, S₃) + f^*₂(x₃) f^*₃(S₃) x^*₃

D₁ C₁ 8 18 26

C₂ 9 22 29 26 C₁

C₃ 7 26 33

(Contd)

S₃ x₃ d (x₃, S₃) f^*₂(x₃) d (x₃, S₃) + f^*₂(x₃) f^*₃(S₃) x^*₃

D₂ C₁ 12 18 30

C₂ 11 22 33 30 C₁

C₃ 15 26 41

D₃ C₁ 19 18 37

C₂ 13 22 35 35 C₃

C₃ 14 26 40

In this table, values for f^*₂(x₃) are read from the previous stage (Stage 2) table, for a given value of S₂ = x₃.

Stage 4

S₄ : Node in Stage 4 = E₁

x₄ : Node in Stage 3 from which S₄ is reached d (x₄, S₄) : Distance between x₄ and S₄

f 4(S₄) : Minimum distance from source node A to node S₄

= min [d (x₄, S₄) + f^*₃(x₄)]

S₄ x₄ d (x₄, S₄) f^*₃(x₄) d (x₄, S₄) + f^*₃(x₄) f^*₄(S₄) x^*₄

E D₁ 9 26 35

D₂ 6 30 36 35 D₁

D₃ 12 35 47

The optimal route is traced back from the last table as follows: The table shows that to reach the destination node E, we must come from the node,

x4 = D₁ in Stage 3. We enter the table for Stage 3 with S₃ (node to be reached in Stage 3) = D₁ to obtain the node in Stage 2 from which we should have reached this node in Stage 3, as x₃^* = C₁. Similarly from Stage 2 table, we get

x2 as B₁ and from Stage 1 table, x₁^* = A. The shortest route is thus, A–B₁–C₁– D₁–E, and the shortest distance is 35 units, which is the optimized objective function value, f^*₄(S₄), in the last stage.

Note that the state transformation relation for this problem is defined by the network diagram, and may be expressed as, S_{n – 1} = x_n.

Reservoir Operation Problem

A classical multistage decision problem in the area of water resources manage-ment is the reservoir operation problem. Simply stated, the problem is to specify release R_t during a time period t (such as a month, a season, etc), when the storage, S_t at the beginning of the period t, and the inflow, Q_t during the period t are known. The sequence of releases, {R_t}, in a year, is called the Reservoir Release Policy for the year. The release policy is obtained to optimize (mini-mize or maxi(mini-mize) a system performance measure (e.g. hydropower produced during the year, annual deviations of irrigation allocations from targets etc), which is, in general, a function of the storage, S_t and the release R_t. Many variations of this simple problem, with varying degrees of complexities, are discussed in Chapters 5 to 9.

The operational objective may be to maximize the total net benefit during a year, which is stated as

Maximize

( , )

t t t

B S R

å

where B_t(S_t, R_t) is the net benefit during period t for given values of S_t and R_t, and T is the number of periods in the year. The problem is illustrated here for derivation of reservoir operating policy when only the current year is of inter-est, with the initial storage, S₁, specified. Derivation of stationary policy using DP is discussed in Chapter 5.

The reservoir operation problem may be solved as a multistage decision problem using DP. The time period t for which the release decision R_t is to be made defines a stage in the DP. The storage, S_t, at the beginning of the period t is the state variable and R_t, the release during the period t is the decision variable. Further, only discrete values, (such as 0, 10, 20, …) are considered for storage, inflow and release.

The storage, S_t at the beginning of the period t changes to storage S_{t + 1} at the beginning of period t + 1 (or storage at the end of period t) because of the inflow Q_t and the release R_t (see Figure 2.13). This transformation is governed by the reservoir mass balance, or the storage continuity equation, which is written, neglecting all losses, as

S_{t + 1}= S_t + Q_t – R_t

The reservoir storage must also satisfy the capacity constraint, S_t£ K, where K is the live storage capacity of the reservoir.

St+ 1

S_t Qt

Storage Inflow Release

Fig. 2.13 Reservoir Operation Problem

Starting with the last period, T, in the year, we solve this problem with backward recursion. Letting n denote the stage in the DP, we define f_tⁿ(S_t) as the maximized net benefits up to and including the period t. Note that the period t corresponds to stage n in this notation. The index n keeps track of the

stage in the DP and the index t denotes the time period within the year. The feasible values for the release R_T, over which the search is made. The first one restricts the release to the total water available in storage in period t, and the second one ensures that the end of period storage is restricted to the live storage capacity. storage at the end of the period T – 1, which is also the storage at the beginning

In document Water Resources Systems - S Vedula and P P Mujumdar.pdf (Page 70-98)