
2.6 Stochastic Dual Dynamic Programming

2.6.1 Deterministic Dual Dynamic Programming

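The concept of Dual Dynamic Programming (DDP) is illustrated with the following LP problem:

$$\min\; c_1^T x_1 + c_2^T x_2 \quad \text{s.t.}\quad A_1 x_1 \ge b_1,\quad E_1 x_1 + A_2 x_2 \ge b_2 \tag{2.22}$$

where $c_1 \in \mathbb{R}^{n_1}$, $x_1 \in \mathbb{R}^{n_1}$, $c_2 \in \mathbb{R}^{n_2}$, $x_2 \in \mathbb{R}^{n_2}$, $A_1 \in \mathbb{R}^{m_1 \times n_1}$, $b_1 \in \mathbb{R}^{m_1}$, $E_1 \in \mathbb{R}^{m_2 \times n_1}$, $A_2 \in \mathbb{R}^{m_2 \times n_2}$, $b_2 \in \mathbb{R}^{m_2}$. Problem (2.22) can be interpreted as a two-stage problem. The first-stage problem is defined as:

$$\min\; c_1^T x_1 + \alpha_1(x_1) \quad \text{s.t.}\quad A_1 x_1 \ge b_1 \tag{2.23}$$

The second-stage problem is defined as:

$$\alpha_1(x_1) = \min\; c_2^T x_2 \quad \text{s.t.}\quad A_2 x_2 \ge b_2 - E_1 x_1 \tag{2.24}$$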

In Dynamic Programming (DP), $c_1^T x_1$ represents the current cost and $\alpha_1(x_1)$ represents the future cost, which is a function of the first-stage decision variables $x_1$. DP algorithms construct the future cost function by discretizing $x_1$ into a set of trial values $\{\hat{x}_{1i},\ i = 1, \dots, n\}$ and then solving problem (2.24) for each trial value. Intermediate values of $\alpha_1(x_1)$ are obtained by interpolating between the neighbouring discretized states. Once the future cost function is constructed, problem (2.23) is solved to find the optimal solution and objective value.
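The discretize, tabulate and interpolate procedure described above can be sketched on a toy problem. Everything here is an illustrative assumption rather than data from the text: a scalar first-stage decision `x1` with unit cost `C1` and bounds `[0, X1_MAX]`, and a scalar second stage `min C2*x2 s.t. x2 >= DEMAND - x1, x2 >= 0`, chosen so that (2.24) has a closed-form solution and no LP solver is needed.

```python
# Toy data: illustrative assumptions, not values from the text.
C1, C2, DEMAND, X1_MAX = 1.0, 3.0, 6.0, 10.0

def second_stage_cost(x1):
    """alpha_1(x1) = min C2*x2 s.t. x2 >= DEMAND - x1, x2 >= 0 (closed form)."""
    return C2 * max(0.0, DEMAND - x1)

def build_table(n_points):
    """Solve the second stage at n_points evenly spaced trial values of x1."""
    step = X1_MAX / (n_points - 1)
    grid = [i * step for i in range(n_points)]
    return grid, [second_stage_cost(x) for x in grid]

def interpolate(grid, values, x1):
    """Linear interpolation of the tabulated future cost between neighbours."""
    for k in range(len(grid) - 1):
        if grid[k] <= x1 <= grid[k + 1]:
            w = (x1 - grid[k]) / (grid[k + 1] - grid[k])
            return (1 - w) * values[k] + w * values[k + 1]
    raise ValueError("x1 is outside the discretized range")

grid, values = build_table(n_points=6)  # trial values 0, 2, 4, 6, 8, 10

# First stage: choose the grid point with the smallest current + future cost.
best_x1 = min(grid, key=lambda x: C1 * x + interpolate(grid, values, x))
best_cost = C1 * best_x1 + interpolate(grid, values, best_x1)
print(best_x1, best_cost)
```

With a single decision variable a six-point table is enough. The cost of this approach only becomes visible when `x1` has many components, since the table then needs one second-stage solve per grid combination.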

DP has many attractive properties. Firstly, it extends easily to multistage and stochastic problems. Secondly, it can solve nonlinear problems with relative ease. Its drawback is that the first-stage decisions need to be discretized into a very large number of values for the future cost function to be constructed accurately. The computation can therefore become very intensive or even intractable. For example, if the vector $x_1$ has ten components and each component is discretized into five values, there are $5^{10} = 9{,}765{,}625$ (about 9.77 million) different combinations of $x_1$. This problem is well known as the Curse of Dimensionality.
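The count in the example is plain exponentiation, and a short sketch makes the growth rate concrete. The helper name `grid_size` is hypothetical.

```python
def grid_size(n_components, values_per_component):
    """Number of trial points when every component of x1 is discretized."""
    return values_per_component ** n_components

# The example from the text: ten components, five values each.
print(grid_size(10, 5))

# Growth with the dimension of x1 at a fixed resolution of five values.
for n in (2, 5, 10, 20):
    print(n, grid_size(n, 5))
```

Each grid point requires one solve of the second-stage problem (2.24), so the work grows exponentially with the number of components of $x_1$.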

An approach to avoiding the Curse of Dimensionality is to approximate the future cost function by an analytical function rather than discretizing the first-stage decision variables. Here, the future cost function is approximated by a piecewise linear function obtained by taking the dual of problem (2.24), a technique similar to the one used in Benders decomposition (Section 2.1). The dual of problem (2.24) is:

$$\alpha_1(x_1) = \max\; \pi^T (b_2 - E_1 x_1) \quad \text{s.t.}\quad A_2^T \pi \le c_2 \tag{2.25}$$

where $\pi$ is the simplex multiplier vector associated with the constraint in (2.24). By LP duality theory, the optimal objective value of the original problem coincides with that of its dual. Hence, problems (2.24) and (2.25) are equivalent and both represent the future cost function. However, the constraint $A_2^T \pi \le c_2$ in problem (2.25) does not depend on $x_1$, so the set of possible solutions $\pi$ can be obtained even before knowing the decision $x_1$.
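As a worked illustration, consider a scalar instance with assumed data $c_2 = 3$, $A_2 = 1$, $E_1 = 1$, $b_2 = 6$, together with the sign restrictions $x_2 \ge 0$ and $\pi \ge 0$, which the formulation above leaves implicit. The primal and dual pair is then:

$$\alpha_1(x_1) = \min_{x_2 \ge 0}\; 3 x_2 \quad \text{s.t.}\quad x_2 \ge 6 - x_1 \qquad\text{and}\qquad \alpha_1(x_1) = \max_{\pi \ge 0}\; \pi\,(6 - x_1) \quad \text{s.t.}\quad \pi \le 3$$

For $x_1 = 4$ the primal optimum is $x_2 = 2$ with cost $6$, and the dual optimum is $\pi = 3$ with value $3 \cdot (6 - 4) = 6$. For $x_1 = 8$ the primal optimum is $x_2 = 0$ with cost $0$, and the dual optimum is $\pi = 0$ with value $0$. In both cases the two objective values agree, and the dual feasible set $0 \le \pi \le 3$ is the same for every $x_1$.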

Let $\Pi = \{\pi_1, \dots, \pi_m\}$ be the set of all vertices of the feasible region defined by the constraint in (2.25). The future cost function can be found by complete enumeration:

$$\alpha_1(x_1) = \max_{i = 1, \dots, m}\; \pi_i^T (b_2 - E_1 x_1) \tag{2.26}$$
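The enumeration in (2.26) can be checked numerically on a toy instance. The sketch below assumes a scalar second stage `min C2*x2 s.t. x2 >= DEMAND - x1, x2 >= 0`, so the dual feasible set is the interval `0 <= pi <= C2` with the two vertices `0` and `C2`. The data and the sign restrictions are assumptions for illustration.

```python
# Toy data: illustrative assumptions, not values from the text.
C2, DEMAND = 3.0, 6.0

# Vertices of the dual feasible set {pi : 0 <= pi <= C2}.
DUAL_VERTICES = [0.0, C2]

def alpha_primal(x1):
    """Future cost from the primal second-stage problem (closed form)."""
    return C2 * max(0.0, DEMAND - x1)

def alpha_by_enumeration(x1):
    """Future cost as the maximum over dual vertices, as in (2.26)."""
    return max(pi * (DEMAND - x1) for pi in DUAL_VERTICES)

for x1 in (0.0, 4.0, 6.0, 9.0):
    print(x1, alpha_primal(x1), alpha_by_enumeration(x1))
```

The two functions agree at every `x1`, and the vertex list is built once without reference to `x1`.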

As a result, problem (2.24) is equivalent to:

$$\alpha_1(x_1) = \min\; \alpha \quad \text{s.t.}\quad \alpha \ge \pi_i^T (b_2 - E_1 x_1),\quad i = 1, \dots, m \tag{2.27}$$

where $\alpha$ is a scalar variable. Equation (2.27) shows that the future cost function is piecewise linear. Each piece is a hyperplane defined by $\pi_i^T (b_2 - E_1 x_1)$. The coefficients $\{\pi_i\}$ alone suffice to construct the future cost function, so $x_1$ does not need to be discretized. Finding all vertices $\{\pi_i\}$, however, may be very challenging. For many problems, only a subset of these vertices is needed. If $\hat{x}_{1i}$ is a trial first-stage decision, a vertex can be obtained by solving the dual of the following problem:

$$\alpha_1(\hat{x}_{1i}) = \min\; c_2^T x_2 \quad \text{s.t.}\quad A_2 x_2 \ge b_2 - E_1 \hat{x}_{1i} \tag{2.28}$$

Let $\pi_i$ be the multiplier vector of problem (2.28); $\pi_i$ therefore belongs to the set $\Pi$. Given a set of $n$ trial values $\{\hat{x}_{1i},\ i = 1, \dots, n\}$, one obtains $n$ associated multipliers $\pi_i$, $i = 1, \dots, n$, by solving (2.28) for each trial value. The future cost function can therefore be approximated as:

$$\hat{\alpha}_1(x_1) = \min\; \alpha \quad \text{s.t.}\quad \alpha \ge \pi_i^T (b_2 - E_1 x_1),\quad i = 1, \dots, n \tag{2.29}$$

Since only a subset of $\Pi$ is used to approximate the future cost function, (2.29) is a lower bound on the true future cost function. We can then solve the first-stage problem:

$$\underline{z} = \min\; c_1^T x_1 + \hat{\alpha}_1(x_1) \quad \text{s.t.}\quad A_1 x_1 \ge b_1 \tag{2.30}$$

Substituting (2.29) into (2.30) gives:

$$\underline{z} = \min\; c_1^T x_1 + \alpha \quad \text{s.t.}\quad A_1 x_1 \ge b_1,\quad \alpha \ge \pi_i^T (b_2 - E_1 x_1),\quad i = 1, \dots, n \tag{2.31}$$

The optimal value of (2.31) is a lower bound on the true optimal cost because the approximate future cost is a lower bound on the true future cost. The lower bound $\underline{z}$ is therefore calculated as:

$$\underline{z} = c_1^T \hat{x}_1 + \hat{\alpha} \tag{2.32}$$

where $\hat{x}_1$ and $\hat{\alpha}$ are the solutions of (2.31). The upper bound $\bar{z}$ is obtained by solving the second-stage problem (2.28) for the trial first-stage decision $\hat{x}_1$:

$$\bar{z} = c_1^T \hat{x}_1 + \alpha_1(\hat{x}_1) \tag{2.33}$$

$\bar{z}$ and $\underline{z}$ can be considered as the actual and predicted cost respectively. If $\bar{z} - \underline{z} \le \epsilon$ for a small tolerance $\epsilon \ge 0$, the actual and predicted costs are very close to each other and the problem is considered solved. Otherwise, a new set of trial decisions is used and the whole process repeats. The new trial decisions can be taken from the previous iteration, so that the approximate future cost function is built around previous good candidates for the optimal solution.
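The iteration between (2.28), (2.31), (2.32) and (2.33) can be sketched end to end on a toy problem. This is a minimal sketch under stated assumptions, not the implementation used in the thesis: the first stage is scalar with bounds `[0, X1_MAX]`, the second stage is `min C2*x2 s.t. x2 >= DEMAND - x1, x2 >= 0` and is solved in closed form, and the starting cut `alpha >= 0` is valid only because the toy second-stage cost is nonnegative. Since `x1` is one-dimensional, the first-stage problem is solved by checking the bounds and the points where two cuts cross, instead of calling an LP solver.

```python
# Toy data: illustrative assumptions, not values from the text.
C1, C2, DEMAND, X1_MAX = 1.0, 3.0, 6.0, 10.0

def solve_second_stage(x1):
    """Return (alpha_1(x1), pi) for the toy version of problem (2.28)."""
    shortfall = DEMAND - x1
    pi = C2 if shortfall > 0 else 0.0  # optimal dual multiplier
    return C2 * max(0.0, shortfall), pi

def solve_first_stage(cuts):
    """Toy version of (2.31); each cut is (a, s), meaning alpha >= a + s*x1.

    The objective is convex and piecewise linear in one variable, so a
    minimizer lies at a bound of x1 or where two cuts intersect.
    """
    def future(x1):
        return max(a + s * x1 for a, s in cuts)

    candidates = [0.0, X1_MAX]
    for i, (a_i, s_i) in enumerate(cuts):
        for a_j, s_j in cuts[i + 1:]:
            if s_i != s_j:
                x = (a_j - a_i) / (s_i - s_j)
                if 0.0 <= x <= X1_MAX:
                    candidates.append(x)
    x1 = min(candidates, key=lambda x: C1 * x + future(x))
    return x1, future(x1)

def ddp(tol=1e-9, max_iter=20):
    cuts = [(0.0, 0.0)]  # assumed starting cut: alpha >= 0
    history = []
    for _ in range(max_iter):
        x1, alpha_hat = solve_first_stage(cuts)
        lower = C1 * x1 + alpha_hat          # predicted cost, as in (2.32)
        alpha_true, pi = solve_second_stage(x1)
        upper = C1 * x1 + alpha_true         # actual cost, as in (2.33)
        history.append((x1, lower, upper))
        if upper - lower <= tol:
            break
        cuts.append((pi * DEMAND, -pi))      # alpha >= pi * (DEMAND - x1)
    return x1, lower, upper, history

x1_opt, z_lower, z_upper, history = ddp()
for step in history:
    print(step)
```

On this instance the first trial decision is `x1 = 0`, which produces one cut. The second first-stage solve then lands on the kink of the true future cost, where the two bounds meet.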

In summary, DDP has several advantages. Firstly, it does not require state discretization (Pereira and Pinto (1991)). Secondly, the optimal solution obtained at each iteration can be reused as the trial solution for the next iteration. Thirdly, upper and lower bounds can be calculated at every iteration and used directly in the stopping criterion. Finally, the algorithm converges after a finite number of iterations (Philpott and Guan (2008); Shapiro (2011)).

In the next section, we are going to investigate the DDP approach for the stochastic cases, which can deal with uncertain data and is more applicable to real-life situations.

2.6.2 Stochastic Dual Dynamic Programming

In the previous section, we saw how DDP solves deterministic problems, first two-stage and then multistage. In this section, we investigate Stochastic Dual Dynamic Programming (SDDP). We start with a two-stage stochastic problem and then extend it to a multistage stochastic problem.

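A two-stage stochastic problem is given as:

$$\min\; c_1^T x_1 + \sum_{j=1}^{m} p_j\, c_2^T x_{2j} \quad \text{s.t.}\quad A_1 x_1 \ge b_1,\quad E_1 x_1 + A_2 x_{2j} \ge b_{2j},\quad j = 1, \dots, m \tag{2.34}$$

where $c_1 \in \mathbb{R}^{n_1}$, $x_1 \in \mathbb{R}^{n_1}$, $p_j \in \mathbb{R}$, $c_2 \in \mathbb{R}^{n_2}$, $x_{2j} \in \mathbb{R}^{n_2}$, $A_1 \in \mathbb{R}^{m_1 \times n_1}$, $b_1 \in \mathbb{R}^{m_1}$, $E_1 \in \mathbb{R}^{m_2 \times n_1}$, $A_2 \in \mathbb{R}^{m_2 \times n_2}$, $b_{2j} \in \mathbb{R}^{m_2}$. Problem (2.34) can be interpreted as a two-stage problem with the first-stage decision $x_1$. Given the trial decision $x_1$, there are $m$ second-stage problems (subproblems):

$$\alpha_{1j}(x_1) = \min\; c_2^T x_{2j} \quad \text{s.t.}\quad A_2 x_{2j} \ge b_{2j} - E_1 x_1 \tag{2.35}$$

where $j = 1, \dots, m$. Each subproblem corresponds to a scenario that occurs with probability $p_j$. Let $\bar{\alpha}_1(x_1) = \sum_{j=1}^{m} p_j\, \alpha_{1j}(x_1)$. The first-stage problem in the DP recursion becomes:

$$\min\; c_1^T x_1 + \bar{\alpha}_1(x_1) \quad \text{s.t.}\quad A_1 x_1 \ge b_1 \tag{2.36}$$

All of the derivations for the two-stage DDP problem can be applied to the stochastic case. Furthermore, they can be extended to multistage stochastic programs.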
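One way to carry the deterministic derivation over is to average the scenario cuts with the probabilities $p_j$, which follows from the definition of $\bar{\alpha}_1$ and the cut form in (2.27). The sketch below illustrates this on an assumed toy instance: each scenario is `min C2*x2 s.t. x2 >= d_j - x1, x2 >= 0`, solved in closed form, and the scenario list, names and sign restrictions are illustrative assumptions.

```python
# Toy data: illustrative assumptions, not values from the text.
C2 = 3.0
SCENARIOS = [(0.25, 4.0), (0.5, 6.0), (0.25, 8.0)]  # (p_j, demand d_j)

def solve_scenario(x1, demand):
    """Return (alpha_1j(x1), pi_j) for one toy subproblem of type (2.35)."""
    shortfall = demand - x1
    pi = C2 if shortfall > 0 else 0.0
    return C2 * max(0.0, shortfall), pi

def expected_future_cost(x1):
    """alpha_bar_1(x1) = sum_j p_j * alpha_1j(x1)."""
    return sum(p * solve_scenario(x1, d)[0] for p, d in SCENARIOS)

def expected_cut(x1_trial):
    """Return (a, s) such that alpha >= a + s*x1 averages the scenario cuts.

    Each scenario contributes p_j * pi_j * (d_j - x1).
    """
    intercept, slope = 0.0, 0.0
    for p, d in SCENARIOS:
        _, pi = solve_scenario(x1_trial, d)
        intercept += p * pi * d
        slope -= p * pi
    return intercept, slope

a, s = expected_cut(5.0)
for x1 in (0.0, 2.0, 5.0, 8.0, 10.0):
    print(x1, a + s * x1, expected_future_cost(x1))
```

The averaged cut matches the expected future cost at the trial point `x1 = 5` and stays below it elsewhere, which is the lower-bound property the deterministic cuts have.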

In multistage stochastic problems, the SDDP algorithm performs one forward simulation and one backward simulation in every iteration. The purpose of the forward simulation is to find “good” trial decisions at each stage; it also gives an estimate of the upper bound on the objective value of the problem. Then, given the trial decisions at each stage, the backward simulation solves the subproblems corresponding to each scenario generated from the random variables. The dual solutions of these subproblems are used to construct the lower-bound piecewise-linear approximation of the future cost function at each stage. This technique is known as Benders decomposition, as explained in Section 2.1. Because the approximation is a lower bound, the objective value at the first stage can also be used as a lower bound on the objective value of the whole problem.

Depending on how the upper bound, the lower bound and the confidence interval are calculated, there are different ways to stop the SDDP algorithm (Shapiro (2011); Homem-de-Mello et al. (2011); Homem-de-Mello and Bayraksan (2014)). The stopping criterion used throughout the thesis is the one given in Shapiro (2011), which avoids an early stop caused by the large variance of the upper bound estimator (Remark 4 in Shapiro (2011)). Its principle is as follows. Every iteration of the SDDP algorithm includes a backward and a forward simulation. The backward simulation constructs a lower bound on the recourse function at every stage, so the objective value at the first stage is also a lower bound for the whole optimization problem. This is denoted as $\underline{\theta}$. Then, in the forward simulation, we have $n$ paths from stage one to the final stage. For every path $i$, we calculate the “true” cost:

$$\theta_i = \sum_{t=1}^{T} c_{ti}^T \hat{x}_{ti},\quad i = 1, \dots, n \tag{2.37}$$

where $c_{ti}$ and $\hat{x}_{ti}$ are the cost and trial decision vectors at stage $t$ for path $i$. We then calculate the mean, $\bar{\theta}$, and the variance, $\hat{\sigma}_\theta^2$, of these path costs:

$$\bar{\theta} = \frac{1}{n} \sum_{i=1}^{n} \theta_i \tag{2.38}$$

$$\hat{\sigma}_\theta^2 = \frac{1}{n-1} \sum_{i=1}^{n} (\theta_i - \bar{\theta})^2 \tag{2.39}$$

Then the confidence interval can be constructed as:

$$\left[\bar{\theta} - z_{\alpha/2}\,\hat{\sigma}_\theta/\sqrt{n},\;\; \bar{\theta} + z_{\alpha/2}\,\hat{\sigma}_\theta/\sqrt{n}\right] \tag{2.40}$$

where $z_\alpha$ denotes the $(1 - \alpha)$-quantile of the standard Normal distribution. For example, $z_{0.025} = 1.96$ gives the 95% confidence interval. If the difference between the upper confidence bound $\bar{\theta} + z_\alpha \hat{\sigma}_\theta/\sqrt{n}$ and the lower bound $\underline{\theta}$ is less than a prescribed accuracy level $\epsilon > 0$, the algorithm stops. Typically, the value of $\epsilon$ is 10%, meaning that the upper bound is no more than 10% above the lower bound (Homem-de-Mello et al. (2011)). With this stopping criterion, the optimization problem is solved to accuracy $\epsilon$ with $(1 - \alpha)$ confidence.
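The stopping test can be sketched with the Python standard library. The function name `sddp_should_stop` and the sample path costs are hypothetical. The gap is measured relative to the lower bound, which is one reading of the "10% above the lower bound" statement and assumes a positive lower bound.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def sddp_should_stop(path_costs, lower_bound, alpha=0.05, eps=0.10):
    """Return (stop, upper_conf) for the criterion described above.

    path_costs  : forward-pass costs theta_i, as in (2.37)
    lower_bound : first-stage objective value from the backward pass
    The relative gap is an assumption and requires lower_bound > 0.
    """
    if lower_bound <= 0:
        raise ValueError("relative gap needs a positive lower bound")
    n = len(path_costs)
    theta_bar = mean(path_costs)                   # (2.38)
    sigma_hat = sqrt(variance(path_costs))         # (2.39), divides by n - 1
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)    # (1 - alpha)-quantile
    upper_conf = theta_bar + z_alpha * sigma_hat / sqrt(n)
    gap = (upper_conf - lower_bound) / lower_bound
    return gap <= eps, upper_conf

# Hypothetical costs of five forward paths.
costs = [98.0, 102.0, 100.0, 104.0, 96.0]
print(sddp_should_stop(costs, lower_bound=95.0))
print(sddp_should_stop(costs, lower_bound=90.0))
```

A larger sample variance or a smaller `n` widens the upper confidence bound and delays the stop. This is how the criterion guards against stopping early on a noisy upper bound estimate.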

The accuracy of SDDP and other sampling algorithms can be increased by increasing the number of samples $n$ (Shapiro (2011); Homem-de-Mello et al. (2011); Homem-de-Mello and Bayraksan (2014)). However, increasing the number of samples may make the problem very computationally intensive or even intractable. Our aim is to reduce the error while keeping the number of samples small. The multistage SDDP algorithm is summarized in Algorithm 2. In the next section, we show an example of how sampling algorithms can give wrong solutions when using a small number of samples. After that, we review different Variance Reduction (VR) methods that can be used to address this problem, before proposing our own method based on the theory of Importance Sampling.

In document Importance sampling for stochastic programming (Page 39-44)