X At any iteration, let
4. Minimize the Absolute Value of the Deviation between Two Non-negative Variables Let the absolute value of deviation be equal to D (non-negative),
2.3 DYNAMIC PROGRAMMING .1 Introduction
Dynamic programming (DP) is ideally suited for sequential decision problems.
Sequential (or multistage) decision problems are those in which decisions are made sequentially, one after another, based on the state of the system. An example of a sequential decision problem in water resources is the reservoir operation problem, in which release decisions need to be made sequentially across time periods (e.g. months) based on the available water in storage in a period. In this section, we discuss the formulation and solution of DP prob-lems. Unlike linear programming problems, DP problems are not amenable to a single, standard algebraic formulation. Different types of sequential decision problems may need to be formulated differently considering specific features of the problems, and therefore, DP is said to be as much an art as it is a mathematical technique. In this book, we deal with only discrete DP, in which the variables are allowed to take only discrete values. The continuous state DP, which allows variables to take continuous values, has limited applications in water resources and therefore is not discussed here. The method of computa-tions of discrete DP is illustrated with examples of water allocation, reservoir operation, capacity expansion and shortest route problems.
A single-stage decision problem may be represented as in Fig. 2.9. In this representation, S is the input, X is the decision and T is the output. Because of decision X for input S, there is a return R which is, in general, a function of both S and X. The transformation of input S to output T is called the state transformation. In a serial multistage decision problem, the output T from one stage forms the input S to the next stage, and a decision is associated with each stage.
A typical serial multistage decision problem, consisting of n stages, may be represented as in Fig. 2.10 (e.g. Rao, 1999). Sn is the input to stage n, xn is the decision taken at stage n, and Rn is the return at stage n, corresponding to the decision xn for the input Sn. As a result of this decision, the input Sn gets transformed into output Sn – 1, that forms the input to the next stage n – 1. As an example of a multistage decision problem, consider the problem of water allocation among n users. Each user in such a problem defines one stage in the Fig. 2.9 A Single-stage Decision Problem
Rn Rn– 1
Xn Xn– 1
Sn Sn– 1 Sn– 2
R2 R1
X2 X1
S1 S0
Stagen Stagen– 1 Stage 2 Stage 1
Fig. 2.10 A Serial Multistage Decision Problem
multistage decision problem. The water allocated to a particular user, i, constitutes the decision xI at that stage. Starting with a known amount of available water Q, the input to the last stage n, (i.e. Sn) will be equal to Q.
Based on the decision taken at the stage n, (i.e. the amount of water allocated to the user n), the amount of water available at stage n – 1, will be Sn – 1 = Sn – xn, which defines the state transformation. Sn – 1 forms the input to stage n – 1 and is equal to the amount of water available to be allocated to all the remaining stages, including the stage n – 1. The return Rn may be the monetary returns obtained from allocating an amount of water xn to user n. The allocation decision at any stage depends on the water available at that stage, after allocation to all previous stages.
The objective of a multistage decision problem as represented above, is to find the values of x1, x2, x3, …, xn to maximize a function of the returns (e.g. to maximize the sum of returns over all stages), while satisfying the state transfor-mation equation. The input Si at any stage i is called the state variable. The state variable defines the state of the system at a particular stage. In the water allocation problem mentioned earlier, the state variable at a particular stage is the water available for allocation to all the remaining users. It may be noted that a multistage problem may consist of more than one state variable.
2.3.2 Solution of DP problems
The solution procedure for DP problems is based on Bellman’s principle of optimality, which states, “An optimal policy (a set of decisions) has the prop-erty that whatever the initial state and initial decisions are, the remaining deci-sions must constitute an optimal policy with respect to the state resulting from the initial decision.” This principle implies that, given the state Si of the system at a stage i, one must proceed optimally till the last stage, irrespective of how one arrived at the state Si. The solution procedure using this principle involves dividing the original problem with n decision variables into n subproblems, each with one decision variable. The subproblems are then solved recursively, one at each stage, either with a backward recursion or a forward recursion. The solution procedure is illustrated with a water allocation problem in the follow-ing paragraphs, by both backward and forward recursions.
In serial multistage decision making problems, such as those discussed in this section, the stage numbers may be assigned in increasing order either in the forward direction or in the backward direction, no matter which direction of recursion we use, forward or backward. We must, however, define the state variable, state transformation equation and the recursive relationship at a given
stage very precisely. The examples presented in this section illustrate different conventions that may be followed.
Backward Recursion
In backward recursion the computations proceed backwards, as illustrated in Fig. 2.11 for a three-user water allocation problem.
Fig. 2.11 Backward Recursion for a Three-satge Water Allocation Problem We define:
S1: Amount of water available for allocation at Stage 1 (to User 3);
S2: Amount of water available for allocation at Stage 2 (to Users 2 and 3 together);
S3: Amount of water available for allocation at Stage 3 (to Users 1, 2, and 3 together);
f1*(S1): the maximum return due to allocation of S1; f2*(S2): the maximum return due to allocation of S2; f3*(S3): the maximum return due to allocation of S3
Only one user, User 3, is considered in the first stage, and optimal allocation to that user is obtained for all possible values of the state variable, S1 (the amount
of water available for allocation at Stage 1). The problem is written, for Stage 1, as,
f1*(S1) = max [R3(x3)]
0 £ x3 £ S1
0 £ S1£ Q
where f1*(S1) is the maximum return at Stage 1 for given S1, and R3(x3) is the return from allocating an amount of water x3 to User 3. The bound for x3 indicates that the search for the optimal value is carried out over all values that x3 can take, viz. all values less than or equal to S1, the water available for allocation at Stage 1. S1 itself may take values less than or equal to Q, the total amount of water available for allocation to all users.
In the second stage, two users are considered together, as shown in Fig. 2.11, at Stage 2. The state variable S2 is the amount of water available to be allocated to User 2 and User 3 together. The problem is solved for all possible values of S2, and the optimal value is obtained as
f*2(S2) = max [R2(x2) + f1*(S2 – x2)]
0 £ x2£ S2
0 £ S2 £ Q
where f 2*(S2) is the maximum return at Stage 2 for given S2, and R2(x2) is the return from allocating an amount of water x2 to User 2. Note that the water available for allocation to User 3 (Stage 1) is S2 – x2, when an allocation of x2 is made to User 2 out of the quantity, S2, available for allocation at Stage 2. For this availability (S2 – x2) for User 3, the maximum return is f1*(S2 – x2), which is obtained from the Stage 1 computations.
In the third (last) stage, all the three users are included in the optimization.
The state variable, S3, is the amount of water available for allocation to all the three users, which is Q. The problem is solved for this value of S3, and the optimal value is obtained as
f3*(S3) = max [R1(x1) + f 2*(S3 – x1)]
0 £ x1£ S3
S3 = Q
where f 3*(S3) is the maximum return at Stage 3 for given S3, and R1(x1) is the return from allocating an amount of water x1 to User 1. When the problem is solved for the last stage, the entire amount, Q, is optimally allocated among the three users. The individual allocations are then obtained by tracing back the solution. This general procedure may be adopted for any resource allocation problem that has n users. In backward recursion, the computations start with the last user, User n, and proceed backwards stagewise, up to User 1, adding one user at every stage to the number of users included in the previous stage computations. At every stage in the backward recursion, we look forward to obtain optimal solution from the other stages that are already solved.
For discrete DP problems, the solution is conveniently obtained through a tabular procedure, as illustrated for a simple application of the water allocation problem.
Example 2.3.1 Water Allocation Problem
A total of 6 units of water is to be allocated optimally to three users. The allocation is made in discrete steps of one unit ranging from 0 to 6. With the three users denoted as User 1, User 2 and User 3 respectively, the returns obtained from the users for a given allocation are given in the following table.
Returns from Water Allocation
Amount of Water Return from
Allocated User 1 User 2 User 3
x R1(x) R2(x) R3(x)
0 0 0 0
1 5 5 7
2 8 6 12
3 9 3 15
4 8 – 4 16
5 5 –15 15
6 0 –30 12
The problem is to find allocations to the three users such that the total return is maximized. The following steps show the computations of backward recursion.
Stage 1
S1 : Amount of water available for allocation to User 3 x3 : Amount of water allocated to User 3
*
x3 : Allocation to User 3, that results in f1*(S1) f1*(S1) : Maximum return due to allocation of S1 f1*(S1) = max [R3(x3)]
0 £ x3 £ S1
0 £ S1 £ 6
S1 x3 R3(x3) f1*(S1) = x3* max [R3(x3)]
0 0 0 0 0
0 0
1 1 7 7 1
0 0
2 1 7 12 2
2 12
(Contd)
(Contd)
S2 : Amount of water available for allocation to Users 2 and 3 together.
x2 : Amount of water allocated to User 2
S2 – x2 : Amount of water available for allocation at Stage 1 (to User 3)
(Contd)
S3 : Amount of water available for allocation to Users 1, 2 and 3 together = 6 units
x1 : Amount of water allocated to User 1
S3 – x1: Amount of water available for allocation at Stage 2 (Users 2 and
We define, for forward recursion:
S1 : Amount of water available for allocation at stage 1 (to User 1);
S2 : Amount of water available for allocation at stage 2 (to Users 1 and 2 together);
S3 : Amount of water available for allocation at Stage 3 (to Users 1, 2, and 3 together);
S1 : Amount of water available for allocation at stage 1 (User 1) x1 : Amount of water allocated to User 1
*
x1 : Allocation to User 1 that results in f1*(S1)
When the third (last) stage is solved, all the three users are considered for allocation. Thus the total maximum return is, f3*(6) = 28. The allocations to individual users are traced back as follows: From the table for Stage 3, we getx1* = 2. From this, the water available at Stage 2 is obtained as S2 = Q –x3*
Maximum returns resulting from the allocations = 28.
Forward Recursion We will illustrate the forward recursion with the same example of water allocation.
Example 2.3.2 In forward recursion the solution starts from the first user, User 1, and proceeds in the forward direction to the third user, User 3. At every stage in the forward recursion, we look backward to obtain optimal solutions from the other stages that have already been solved.
User
S1 x1 R1(x1) f1*(S1) = max [R1(x1)] x1*
S2 : Amount of water available for allocation to Users 1 and 2 together.
x2 : Amount of water allocated to User 2.
*
(Contd)
S3 : Amount of water available for allocation to Users 1, 2 and 3 together = 6 units
x3 : Amount of water allocated to User 3.
*
Trace back: x3* = 3
*
S2= S3 – x3 = 6 – 3 = 3
\ x*2 = 1 … from Stage 2 table
*
S3 = S2 – x2 = 3 – 1 = 2
\ x1* = 2 … from Stage 1 table Optimal allocation x1* = 2 units
*
x2 = 1 unit
*
x3 = 3 units Maximum return = 28 units.
These results are the same as those obtained by backward recursion.
The Bellman’s principle of optimality may be interpreted for backward re-cursion as: “No matter what state of what stage one may be in, in order for a policy to be optimum, one must proceed from that state and stage in an optimal manner,” and for forward recursion as: “No matter what state of what stage one may be in, in order for a policy to be optimum, one had to get to that state and stage in an optimal manner” (Loucks et al., 1981).
2.3.3 Characteristics of a DP problem
l A single n-variable problem is divided into n number of single variable problems. This requires that the objective function of the optimization problem be separable with respect to stages. For example, the function, R1(X1) + R2(X2) + … + Rn(Xn) is separable, because we can identify exactly one variable in the objective function associated with each stage.
Similarly the function, X1.X2.X3…Xn is also separable. However, the func-tion, X1X2 + X2X3 + X3X4… is not separable and problems with such objective functions cannot be formulated as DP problems.
l A problem is divided into stages, with a policy decision required at each stage. In the water allocation problem, each user constitutes a stage, and the amount of water allocated at a stage constitutes a policy decision.
l Each stage has a number of possible states associated with it. In the water allocation problem, the amount of water available for allocation at a stage defines the state at that stage.
l The policy decision transforms the current state into a state associated with the next stage.
l A recursive relationship identifies the optimal decision at a given stage for a specific state, given the optimal decision for each state at the previous stage.
l A solution moves backward (or forward), stage by stage, till optimal deci-sion for the last stage is found. From this solution, the optimal decideci-sions for other stages are traced back.
Within this broad framework of discrete DP problems, we shall now discuss solutions for three specific problems in water resources decision making:
(a) Shortest Route Problem, (b) Reservoir Operation Problem, and (c) Capacity Expansion Problem.
2.3.4 Example Applications of DP Shortest Route Problem
Consider the problem of determining the shortest route for a pipeline from among various possible routes available from destination to source. Figure 2.12 shows the network of possible routes connecting several nodes, along with the distance between two nodes, shown along the arrows, on a pipe route. Starting from the source node A, the pipeline must run through one of the nodes B1, B2 or B3, one of the nodes, C1, C2, or C3, and one of the nodes, D1, D2 or D3, before reaching the destination node E. The problem is to determine the shortest route between node A and node E. We may pose this problem as a multistage, sequential decision problem and solve the problem with Dynamic Programming. To do this, we define stages as shown in the figure. Starting from the source node, A, we can reach one of the nodes B1, B2 or B3, in Stage 1 and one of the nodes C1, C2 or C3 in Stage 2, and one of the nodes D1, D2 or D3 in Stage 3, and reach the destination node E in Stage 4. Using forward recursion, we define the state of the system, Sn, at a stage n (n = 1, 2, 3, 4) as the node reached in that stage, and the decision variable as the node xn in the previous stage, n – 1, from which the node Sn is reached. We look for that node xn, out of all possible nodes xn, which results in the shortest distance from the source to the node Sn. We denote this node xn as x*n. In stage n, we answer the question, “if we are in node Sn, then which node in the previous stage n – 1 we must have come from so that the total distance up to the node Sn, starting from the source node A is minimum?”
Fig. 2.12 Network of Routes in the Shortest Route Problem
In Stage 1, the possible states, S1, are B1, B2 and B3. There is only one node from which any of these nodes could be reached, and that is the source node A itself, so that the decision variable, x1, in this case is the source node A. Being the only node from which any of the nodes, B1, B2 and B3 in Stage 1 may be reached, the node A also is x1*.
We proceed in a similar way to the other stages until Stage 4, and then trace back the solution to obtain the shortest route. The stagewise calculations are as follows:
Stage 1
S1 : Node in Stage 1
x1 : Node from which S1 is reached d (x1, S1) : Distance between nodes x1 and S1
*
f1(S1) : Minimum distance from source node A to node S1
= min [d (x1, S1)]
S1 x1 = A f*1(S1) x1* d(x1,S1)
B1 10 10 A
B2 15 15 A
B3 12 12 A
Stage 2
S2 : Node in Stage 2
x2 : Node in Stage 1 from which S2 is reached d (x2, S2) : Distance between nodes x2 and S2
*
f 2(S2) : Minimum distance from source node A to node S2
= min [d (x2, S2) + f*1(x2)]
S2 x2 d (x2, S2) f*1(x2) d (x2, S2) + f1*(x2) f*2(S2) x*2
C1 B1 8 10 18
B2 9 15 24 18 B1
B3 7 12 19
C2 B1 12 10 22
B2 11 15 26 22 B1
B3 15 12 27
C3 B1 19 10 29
B2 13 15 28 26 B3
B3 14 12 26
Stage 3
S3 : Node in Stage 3
x3 : Node in Stage 2 from which S3 is reached d (x3, S3) : Distance between nodes x3 and S3
*
f 3(S3) : Minimum distance from source node A to node S3
= min [d (x3, S3) + f*2(x3)]
S3 x3 d (x3, S3) f*2(x3) d (x3, S3) + f*2(x3) f*3(S3) x*3
D1 C1 8 18 26
C2 9 22 29 26 C1
C3 7 26 33
(Contd)
(Contd)
S3 x3 d (x3, S3) f*2(x3) d (x3, S3) + f*2(x3) f*3(S3) x*3
D2 C1 12 18 30
C2 11 22 33 30 C1
C3 15 26 41
D3 C1 19 18 37
C2 13 22 35 35 C3
C3 14 26 40
In this table, values for f*2(x3) are read from the previous stage (Stage 2) table, for a given value of S2 = x3.
Stage 4
S4 : Node in Stage 4 = E1
x4 : Node in Stage 3 from which S4 is reached d (x4, S4) : Distance between x4 and S4
*
f 4(S4) : Minimum distance from source node A to node S4
= min [d (x4, S4) + f*3(x4)]
S4 x4 d (x4, S4) f*3(x4) d (x4, S4) + f*3(x4) f*4(S4) x*4
E D1 9 26 35
D2 6 30 36 35 D1
D3 12 35 47
The optimal route is traced back from the last table as follows: The table shows that to reach the destination node E, we must come from the node,
*
x4 = D1 in Stage 3. We enter the table for Stage 3 with S3 (node to be reached in Stage 3) = D1 to obtain the node in Stage 2 from which we should have reached this node in Stage 3, as x3* = C1. Similarly from Stage 2 table, we get
*
x2 as B1 and from Stage 1 table, x1* = A. The shortest route is thus, A–B1–C1– D1–E, and the shortest distance is 35 units, which is the optimized objective function value, f*4(S4), in the last stage.
Note that the state transformation relation for this problem is defined by the network diagram, and may be expressed as, Sn – 1 = xn.
Reservoir Operation Problem
A classical multistage decision problem in the area of water resources manage-ment is the reservoir operation problem. Simply stated, the problem is to specify release Rt during a time period t (such as a month, a season, etc), when the storage, St at the beginning of the period t, and the inflow, Qt during the period t are known. The sequence of releases, {Rt}, in a year, is called the Reservoir Release Policy for the year. The release policy is obtained to optimize (mini-mize or maxi(mini-mize) a system performance measure (e.g. hydropower produced during the year, annual deviations of irrigation allocations from targets etc), which is, in general, a function of the storage, St and the release Rt. Many variations of this simple problem, with varying degrees of complexities, are discussed in Chapters 5 to 9.
The operational objective may be to maximize the total net benefit during a year, which is stated as
Maximize
1
( , )
T
t t t
t
B S R
å
=where Bt(St, Rt) is the net benefit during period t for given values of St and Rt, and T is the number of periods in the year. The problem is illustrated here for derivation of reservoir operating policy when only the current year is of inter-est, with the initial storage, S1, specified. Derivation of stationary policy using DP is discussed in Chapter 5.
The reservoir operation problem may be solved as a multistage decision problem using DP. The time period t for which the release decision Rt is to be made defines a stage in the DP. The storage, St, at the beginning of the period t is the state variable and Rt, the release during the period t is the decision variable. Further, only discrete values, (such as 0, 10, 20, …) are considered for storage, inflow and release.
The storage, St at the beginning of the period t changes to storage St + 1 at the beginning of period t + 1 (or storage at the end of period t) because of the inflow Qt and the release Rt (see Figure 2.13). This transformation is governed by the reservoir mass balance, or the storage continuity equation, which is written, neglecting all losses, as
St + 1= St + Qt – Rt
The reservoir storage must also satisfy the capacity constraint, St£ K, where K is the live storage capacity of the reservoir.
Rt
St+ 1
St Qt
Storage Inflow Release
Fig. 2.13 Reservoir Operation Problem
Starting with the last period, T, in the year, we solve this problem with backward recursion. Letting n denote the stage in the DP, we define ftn(St) as the maximized net benefits up to and including the period t. Note that the period t corresponds to stage n in this notation. The index n keeps track of the
stage in the DP and the index t denotes the time period within the year. The feasible values for the release RT, over which the search is made. The first one restricts the release to the total water available in storage in period t, and the second one ensures that the end of period storage is restricted to the live storage capacity. storage at the end of the period T – 1, which is also the storage at the beginning
stage in the DP and the index t denotes the time period within the year. The feasible values for the release RT, over which the search is made. The first one restricts the release to the total water available in storage in period t, and the second one ensures that the end of period storage is restricted to the live storage capacity. storage at the end of the period T – 1, which is also the storage at the beginning