
COMPUTATIONAL METHODS FOR DECISION PROBLEMS

Section 13.1: Dynamic Programming

The computational methods presented in section 12 did not address decision problems that are sequential in nature or that have been transformed so that they can be assessed sequentially. Sequential problems lend themselves to another extension of optimization methods. Dynamic programming is a mathematical technique for making a sequence of interrelated decisions; it provides a systematic procedure for determining the optimal combination of decisions.

Hillier and Lieberman (2001) present the basic features that characterize dynamic programming problems. These include:

1. The problem can be divided into stages, with a decision required at each stage.

2. Each stage has a number of states associated with the beginning of that stage.

3. The consequence of the decision at each stage is to transform the current state to a state associated with the beginning of the next stage.

4. The solution procedure is designed to find an optimal choice for the overall problem.

5. Bellman’s principle (Bather, 2000): Given the current state, an optimal policy for the remaining stages is independent of the decisions adopted in previous stages.

6. The solution begins by finding an optimal choice for the last stage.

Dynamic programming is defined by a recursive relationship. The general notation used to describe a dynamic programming problem is summarized below:

N = number of stages

n = index for current stage

$\theta_n$ = current state for stage n

$x_n$ = decision variable for stage n

$x_n^*$ = optimal value of $x_n$ (given $\theta_n$)

$f_n(\theta_n, x_n)$ = contribution of stages n, n+1, ..., N to objective function

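The recursive relationship takes one of two forms, depending on whether the objective is to be maximized or minimized:

$$f_n^*(\theta_n) = \max_{x_n} f_n(\theta_n, x_n) \quad \text{or} \quad f_n^*(\theta_n) = \min_{x_n} f_n(\theta_n, x_n),$$

where $f_n^*(\theta_n) = f_n(\theta_n, x_n^*)$.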

This recursive procedure amounts to a backward form of mathematical induction: the solution starts at the last stage and works back to the first. Bellman first outlined this approach and coined the name dynamic programming to describe the techniques he had brought together to study a class of optimization problems involving sequential decisions (Bather, 2000).
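The backward recursion can be sketched for a small deterministic problem. The network below, its state names and its costs are hypothetical and chosen only for illustration; the code assumes a minimization objective in which the optimal value of a state is the immediate cost of a decision plus the optimal value of the state it leads to.

```python
# Hypothetical staged network (illustrative numbers only).
# STAGES[k] maps each state at stage k+1 to its feasible decisions,
# written as {next_state: immediate_cost}.
STAGES = [
    {"A": {"B": 2, "C": 4}},
    {"B": {"D": 7, "E": 2}, "C": {"D": 1, "E": 5}},
    {"D": {"T": 2}, "E": {"T": 6}},
]


def backward_induction(stages):
    """Solve a deterministic minimization problem by backward induction.

    Returns the optimal values for the first stage and, for every stage,
    the optimal decision in each state.
    """
    f_next = {}   # optimal values one stage ahead; nothing remains after stage N
    policy = []
    for stage in reversed(stages):          # start with the last stage
        f_star, x_star = {}, {}
        for state, decisions in stage.items():
            # Immediate cost plus optimal cost-to-go of the resulting state.
            totals = {x: cost + f_next.get(x, 0) for x, cost in decisions.items()}
            best = min(totals, key=totals.get)
            x_star[state] = best
            f_star[state] = totals[best]
        policy.insert(0, x_star)
        f_next = f_star
    return f_next, policy


f1, policy = backward_induction(STAGES)
print(f1)      # optimal total cost from each first-stage state
print(policy)  # optimal decision per state, stage by stage
```

Because the optimal decision in each state depends only on that state, the optimal route is read off by following `policy` forward from the initial state.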

Sequential problems, and consequently dynamic programming problems, can be classified into two groups: deterministic and stochastic. In the former, the state at the next stage is completely determined by the state and decision at the current stage. In the latter, which is probabilistic, a probability distribution dictates what the next state will be. The probabilistic approach is better suited to our needs and is associated with solving decision tree diagrams.

Consider the diagram above illustrating the general structure of a stochastic dynamic programming problem. Let S denote the number of possible states at stage n + 1. The system goes to state i with probability $p_i$, for i = 1, 2, ..., S, given state $\theta_n$ and decision $x_n$ at stage n. At state i, the contribution of stage n to the objective function is represented by $C_i$. This figure can be extended in the same manner to all possible stages, and diagrammatically it has the form of a decision tree.

Because a probability distribution now dictates the next state, the precise form of the objective function differs slightly from the one given above. It can now be written as:

$$f_n(\theta_n, x_n) = \sum_{i=1}^{S} p_i \left[ C_i + f_{n+1}^*(i) \right], \qquad f_{n+1}^*(i) = \min_{x_{n+1}} f_{n+1}(i, x_{n+1}),$$

where the minimization is taken over all feasible values of $x_{n+1}$.
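A minimal sketch of the stochastic case follows. It assumes the expected-cost form in which each decision leads to state i with probability p_i at cost C_i, and the value of a decision is the probability-weighted sum of C_i plus the optimal value of state i at the next stage. The two-stage problem, its state and decision names, and all numbers are hypothetical.

```python
# Hypothetical two-stage stochastic problem (illustrative numbers only).
# STAGES[k][state][decision] is a list of outcomes (p_i, C_i, next_state);
# next_state is None when the process ends after the last stage.
STAGES = [
    {"start": {"a": [(0.7, 1.0, "lo"), (0.3, 1.0, "hi")],
               "b": [(0.2, 0.0, "lo"), (0.8, 0.0, "hi")]}},
    {"lo": {"safe": [(1.0, 3.0, None)],
            "risky": [(0.5, 0.0, None), (0.5, 8.0, None)]},
     "hi": {"safe": [(1.0, 5.0, None)],
            "risky": [(0.5, 0.0, None), (0.5, 8.0, None)]}},
]


def expected_cost(outcomes, f_next):
    """Probability-weighted sum of stage cost plus optimal cost-to-go."""
    return sum(p * (c + f_next.get(nxt, 0.0)) for p, c, nxt in outcomes)


def solve(stages):
    """Backward induction with expectations taken over the next state."""
    f_next = {}   # nothing remains after the last stage
    policy = []
    for stage in reversed(stages):
        f_star, x_star = {}, {}
        for state, decisions in stage.items():
            costs = {x: expected_cost(o, f_next) for x, o in decisions.items()}
            best = min(costs, key=costs.get)   # minimize over feasible decisions
            x_star[state] = best
            f_star[state] = costs[best]
        policy.insert(0, x_star)
        f_next = f_star
    return f_next, policy


f1, policy = solve(STAGES)
print(f1, policy)
```

Unlike the deterministic case, the policy cannot be read off as a single path: the decision taken at the second stage depends on which state the random transition actually produces.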

In statistical decision theory¹, the transition from one stage to another can be said to be controlled by a sequence of actions: given the state $\theta_n$ at stage n, the choice of action $a_n$ determines the probability distribution of the next state. Because we are dealing with a Markov system, only the current state matters. The state and action variables may be discrete or continuous, and the range of possible actions may vary from state to state. In summary, the essential characteristic of such modeling is that, given $\theta_n$ at stage n, the choice of action or decision $x_n$ determines both the probability distribution of the next state $\theta_{n+1}$ and the expected cost² of the transition $\theta_n \to \theta_{n+1}$.

¹ Bather provides a concise introduction to both deterministic and probabilistic dynamic programming and its application to utility theory.

Section 13.2: Bellman’s Principle and the Bayesian Perspective

Recall that one of the fundamental elements of a decision problem is the loss function; in dynamic programming it is this objective function that we typically want to minimize. Now consider an additive function, which by definition is monotonic and separable (see Part II on Utility Theory). That is, if the first few terms are fixed, then minimizing the whole sum is equivalent to minimizing the sum of the remaining terms, a property that also holds when taking expectations. This underlies the aforementioned Bellman Principle of Optimality:

The optimal sequential decision policy for the problem which begins with $P_\theta(\cdot)$ as the decision maker’s prior for $\theta$ and has N stages to run must have the property that if, at any stage n < N, the observations $X_1 = x_1, \ldots, X_n = x_n$ have been made, then the continuation of the optimal policy must be the optimal sequential policy for the problem beginning with $P_\theta(\cdot \mid x_1, \ldots, x_n)$ as the prior and having N - n stages to run. (French and Insua, 2000)

French and Insua make it a point to emphasize the importance of the decision maker’s prior in Bellman’s principle. Since the prior describes the state of knowledge, it also describes the state of the decision making process.

For simplicity of notation, let $\pi = P_\theta(\cdot)$ and let $r_n(\pi)$ be the Bayes risk of the optimal policy with at most n stages left to run with knowledge $\pi$. Then the situation in which the decision maker has no option to make an observation and must choose an action can be expressed as:

$$r_0(\pi) = \min_{a \in A} E_\theta[l(a, \theta)]$$

where the expectation over $\theta$ is taken with respect to $\pi$. Next consider that the decision maker is given the chance to make an observation. The decision maker has two options:

1. Take an action $a \in A$ without making an observation, at an expected loss of $r_0(\pi)$.

2. Make a single observation X at cost $\gamma$ and choose an action $a \in A$ in light of her current knowledge $\pi(X)$, at an expected loss of $E_X[r_0(\pi(X))]$.
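The two options can be compared in a small worked example. Everything in it is an illustrative assumption rather than part of the source: two states $\theta_0$ and $\theta_1$, two actions that each "declare" one state, 0-1 loss, a prior $\pi = P(\theta = \theta_1) = 0.3$, and a binary observation X with $P(X = 1 \mid \theta_1) = 0.8$ and $P(X = 1 \mid \theta_0) = 0.2$.

```latex
% Illustrative assumptions: two states, 0-1 loss, prior \pi = 0.3,
% P(X = 1 | \theta_1) = 0.8, P(X = 1 | \theta_0) = 0.2.
\begin{align*}
r_0(\pi) &= \min_{a \in A} E_\theta[l(a,\theta)] = \min\{\pi,\ 1 - \pi\} = 0.3, \\
P(X = 1) &= 0.3(0.8) + 0.7(0.2) = 0.38, &
\pi(1) &= \frac{0.24}{0.38} \approx 0.632, \\
P(X = 0) &= 0.3(0.2) + 0.7(0.8) = 0.62, &
\pi(0) &= \frac{0.06}{0.62} \approx 0.097, \\
E_X[r_0(\pi(X))] &= 0.38 \cdot \frac{0.14}{0.38} + 0.62 \cdot \frac{0.06}{0.62}
                  = 0.14 + 0.06 = 0.20.
\end{align*}
```

Under these assumptions, observing first is worthwhile only if $\gamma + 0.20 < 0.30$, that is, if the observation costs less than 0.10.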

Bellman’s Principle assumes that after the observation(s), the optimal Bayes action is taken (French and Insua, 2000). Accordingly, this can be extended to n observations, which gives her either option (1) stated above or an extension of option (2), in which observations may be made over the remaining n - 1 stages at an expected loss of $E_X[r_{n-1}(\pi(X))]$. Thus, for n = 1, 2, ..., N,

$$r_n(\pi) = \min\{\, r_0(\pi),\ \gamma + E_X[r_{n-1}(\pi(X))] \,\}.$$

² This refers to the expected loss function.
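The recursion for the Bayes risk with at most n stages left can be sketched in a few lines. The problem below is hypothetical: two states, 0-1 loss, a binary observation with assumed likelihoods, and an assumed observation cost of 0.05. The function names are illustrative and not taken from the source.

```python
# Hypothetical two-state problem (all numbers are illustrative assumptions).
P_X1_GIVEN_THETA = {0: 0.2, 1: 0.8}   # P(X = 1 | theta)
GAMMA = 0.05                          # cost of a single observation


def r0(pi):
    """Bayes risk with no observation under 0-1 loss; pi = P(theta = 1)."""
    return min(pi, 1.0 - pi)


def update(pi, x):
    """Return (P(X = x), posterior P(theta = 1 | X = x)) by Bayes' rule."""
    like1 = P_X1_GIVEN_THETA[1] if x == 1 else 1.0 - P_X1_GIVEN_THETA[1]
    like0 = P_X1_GIVEN_THETA[0] if x == 1 else 1.0 - P_X1_GIVEN_THETA[0]
    marginal = pi * like1 + (1.0 - pi) * like0
    return marginal, pi * like1 / marginal


def bayes_risk(n, pi):
    """Bayes risk with at most n observations left:
    the smaller of stopping now and paying GAMMA to observe once more."""
    if n == 0:
        return r0(pi)
    observe = GAMMA
    for x in (0, 1):
        prob_x, posterior = update(pi, x)
        observe += prob_x * bayes_risk(n - 1, posterior)
    return min(r0(pi), observe)


for n in range(4):
    print(n, round(bayes_risk(n, 0.3), 4))
```

The optimal policy falls out of the same comparison: at knowledge `pi` with n stages left, stop and act whenever `r0(pi)` is the smaller of the two terms, and observe otherwise. The plain recursion visits every observation sequence, so it is only practical for a small number of stages.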

Note the form of this recursion formula: it defines the dynamic programming, or backward induction, form introduced in the previous subsection. Thus, dynamic programming allows for both the calculation of $r_n(\pi)$ and the characterization of the optimal policy. The following section extends the basic principles demonstrated in these earlier subsections to illustrate the analytic approach to solving decision problems as applied to utility theory, or more precisely, to the maximization of utilities.

In document Statistical Decision Theory: Concepts, Methods and Applications. (Special topics in Probabilistic Graphical Models) (Page 134-137)