Dynamic Programming: Variants and Applicability

Generation of Optimal Bushy Execution Plans

4.2 General Queries with Joins, Cross Products and Selections

4.2.8 Dynamic Programming: Variants and Applicability

Dynamic Programming is a general mathematical optimization principle applicable to many discrete optimization problems. All these optimization problems have one thing in common: their cost functions are decomposable [Min86]. Basically, the notion of decomposability comprises the following two properties. First, the cost function (say f) can be formulated as a recurrence¹³ involving one or more calls to f with simpler arguments (“separability”). Second, in such a recurrence the function that combines the different recursive function applications is monotonically non-decreasing with respect to the arguments where f is called recursively (“monotonicity”). For example, the function

$$f(x_1, \dots, x_n) = h(f(x_1, \dots, x_{n-1}), x_n)$$

is decomposable, provided that h is monotonically non-decreasing with respect to the first argument.
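To see why monotonicity matters, the following minimal sketch evaluates the recurrence for two hypothetical combining functions. The names `h_monotone` and `h_not_monotone`, the base value 0 and all numbers are illustrative assumptions, not taken from the text.

```python
from functools import reduce

def f(h, xs, base=0):
    """Evaluate f(x1..xn) = h(f(x1..x_{n-1}), xn); f of no arguments is `base` (assumed)."""
    return reduce(h, xs, base)

def h_monotone(c, x):
    # Non-decreasing in c: a cheaper partial result never hurts.
    return c + x

def h_not_monotone(c, x):
    # Not monotone in c: a cheaper partial result can give a worse total.
    return abs(c - 10) + x

cheap, expensive = 3, 10  # costs of two alternative partial solutions

# With a monotone h it is safe to keep only the cheaper partial solution.
safe = h_monotone(cheap, 5) <= h_monotone(expensive, 5)

# With the non-monotone h the cheaper partial solution leads to the worse total,
# so discarding the expensive one would lose the optimum.
unsafe = h_not_monotone(cheap, 5) > h_not_monotone(expensive, 5)
```

Both `safe` and `unsafe` evaluate to true here, which is exactly the situation in which pruning by partial cost is justified for the first function and not for the second.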

The basis of Dynamic Programming is the Principle of Optimality which stipulates¹⁴ that every optimal solution can only be formed by partially optimal solutions. The validity of the Principle of Optimality ensures that we can state a recurrence that computes the cost of an optimal solution (together with the optimal solution itself).
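As an illustration of such a recurrence, here is a minimal bottom-up sketch for bushy join trees with cross products. It is not the algorithm developed in this work: the cost function (sum of the sizes of all intermediate results), the inputs `card` and `sel`, and the function name `optimal_bushy_plan` are assumptions chosen to keep the example short.

```python
from math import inf

def optimal_bushy_plan(card, sel):
    """card[i]: cardinality of relation i; sel[(i, j)] with i < j: join selectivity."""
    n = len(card)

    def result_size(subset):
        members = [i for i in range(n) if (subset >> i) & 1]
        size = 1.0
        for i in members:
            size *= card[i]
        for i in members:
            for j in members:
                if i < j:
                    size *= sel.get((i, j), 1.0)  # missing predicate: cross product
        return size

    # best[subset] = (cost, plan); a single relation costs nothing in this model.
    best = {1 << i: (0.0, i) for i in range(n)}
    # Every proper subset of a bitmask is numerically smaller than the mask,
    # so increasing integer order solves subproblems before they are needed.
    for subset in range(1, 1 << n):
        if subset & (subset - 1) == 0:
            continue  # singleton, already initialised
        out = result_size(subset)
        best_cost, best_plan = inf, None
        left = (subset - 1) & subset
        while left:  # enumerate all non-empty proper subsets of `subset`
            right = subset ^ left
            if left < right:  # the cost is symmetric, so skip mirrored splits
                cost = best[left][0] + best[right][0] + out
                if cost < best_cost:
                    best_cost, best_plan = cost, (best[left][1], best[right][1])
            left = (left - 1) & subset
        best[subset] = (best_cost, best_plan)
    return best[(1 << n) - 1]

cost, plan = optimal_bushy_plan([10, 100, 1000], {(0, 1): 0.1, (1, 2): 0.01})
```

Only one plan per subset is stored, which is justified exactly because the combining function (addition) is monotone in the costs of the two subplans.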

The relation “is a subproblem of” defines a partial order P among all subproblems. The valid enumeration orders of the subproblems are exactly the linear extensions of P. Since we cannot decide a priori which of these enumeration orders is the best, it seems reasonable to choose one that can be generated efficiently, as we did in our approach. On the other hand, there are well-known methods that use cost information to direct the enumeration of subproblems. Two examples are the A^∗ and IDA^∗¹⁵ algorithms from the area of heuristic search [Pea84]. These algorithms can be used to compute an optimal path in a directed graph. Unlike the computation of optimal bushy trees, the computation of optimal left-deep trees can be stated as the problem of computing an optimal path in a graph. As far as we know, there is no generalization of A^∗ search to non-path problems.

Although A^∗ is optimal (and IDA^∗ is asymptotically optimal) among all informed search methods [Pea84], these algorithms have some drawbacks. First, compared to dynamic programming, A^∗ has the additional overhead of maintaining (potentially very large) priority queues and hash tables and of computing non-trivial lower bounds on future costs. Second, the efficiency of A^∗ depends crucially on the quality of the lower bound used. Although IDA^∗ does without priority queues and hash tables, it considers slightly more nodes than A^∗. Nevertheless, for larger problems IDA^∗ usually outperforms A^∗. We ran a few experiments which indicated that even with the excellent lower bound c × “the true future cost”, where c ∈ (0, 1), A^∗ considers essentially as many subproblems as with the trivial lower bound 0 (corresponding to best-first search).¹⁶ The number of considered subproblems decreases only if c is close to 1. At least for join ordering problems, such lower bounds seem unattainable. However, this experiment strongly indicates that cost-based pruning (e.g. as proposed by Graefe in his top-down dynamic programming algorithm or as suggested by Vance and Maier [VM96]) does not seem to reduce the number of considered subproblems substantially.

¹³ or a system of recurrences

¹⁴ Actually, the following weaker formulation would be sufficient unless we are to enumerate all optimal solutions: “There exists an optimal solution that is only formed by partially optimal solutions”.

¹⁵ Iterative Deepening A^∗

¹⁶ These results are not new. In [Poh70] it is shown that for constant relative errors the running times of A^∗ and IDA^∗ are exponential in the depth of the graph.
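The path formulation for left-deep trees can be sketched as follows. This is a minimal A^∗ sketch, not the experimental setup described above: nodes are sets of already joined relations, an edge adds one relation, and the edge cost is the size of the resulting intermediate result. The cost model, the inputs `card` and `sel`, and the parameter `lower_bound` are assumptions; the default bound 0 gives best-first search, and any bound supplied must never overestimate the remaining cost and must be 0 for the complete set.

```python
import heapq
from math import inf

def best_left_deep(card, sel, lower_bound=lambda subset: 0.0):
    """Return (cost, join order, number of expanded nodes)."""
    n = len(card)
    full = (1 << n) - 1

    def result_size(subset):
        members = [i for i in range(n) if (subset >> i) & 1]
        size = 1.0
        for i in members:
            size *= card[i]
        for i in members:
            for j in members:
                if i < j:
                    size *= sel.get((i, j), 1.0)
        return size

    best_g = {0: 0.0}                          # cheapest known path cost per node
    queue = [(lower_bound(0), 0.0, 0, ())]     # entries: (g + bound, g, node, order)
    expanded = 0
    while queue:
        _, g, subset, order = heapq.heappop(queue)
        if g > best_g.get(subset, inf):
            continue                           # stale entry, a cheaper path is known
        if subset == full:
            return g, order, expanded
        expanded += 1
        for r in range(n):
            if (subset >> r) & 1:
                continue                       # relation r is already joined
            nxt = subset | (1 << r)
            # Picking the first relation is free; every join costs its output size.
            step = 0.0 if subset == 0 else result_size(nxt)
            new_g = g + step
            if new_g < best_g.get(nxt, inf):
                best_g[nxt] = new_g
                heapq.heappush(queue, (new_g + lower_bound(nxt), new_g, nxt, order + (r,)))
    return inf, (), expanded

cost, order, expanded = best_left_deep([10, 100, 1000], {(0, 1): 0.1, (1, 2): 0.01})
```

The counter `expanded` is the quantity of interest in the discussion above: it allows comparing how many subproblems are considered under different lower bounds. The priority queue and the `best_g` table are the bookkeeping overhead that dynamic programming avoids.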

Dynamic programming algorithms are usually classified as bottom-up or top-down approaches.¹⁷ However, the names “bottom-up” and “top-down” are somewhat misleading. They refer to the order in which the subproblems are generated, not to the order in which they are solved. In both approaches the subproblems are solved bottom-up. One advantage of the top-down approach is that new subproblems can be generated and optimized dynamically during the optimization process, and multiple overlapping problems can be optimized together. For example, as in the Volcano optimizer generator [GM93], we could mix top-down dynamic programming with a transformation-based approach. In each step of the recursive optimization algorithm, first all applicable transformations are applied in turn (with provisions to avoid reverse transformations and duplicate subproblems), and then all possible decompositions into subproblems are enumerated and the subproblems are optimized recursively. Another advantage of the top-down approach is that useless subproblems are automatically avoided in the computation. For example, suppose a query gives rise to $N$ different subproblems and for each subproblem we have to account for $k$ different physical properties $p_1, \dots, p_k$, where each property $p_i$ can assume $k_i$ different values. Then we have to enumerate $N \cdot \prod_{i=1}^{k} k_i$ combinations of subproblems and physical properties. Often, not all of these combinations really make sense. While the top-down approach seems to be more flexible in dealing with subproblems, it is also less efficient in the enumeration of subproblems [Van98].
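The difference between the order of generation and the order of solution can be made visible with a small memoized top-down sketch. It reuses an assumed cost model (sum of intermediate result sizes); the names `top_down`, `card`, `sel`, `generated` and `solved` are hypothetical and serve only this illustration.

```python
from math import inf

def top_down(card, sel):
    """Return ((cost, plan), generation order, solution order) of the subproblems."""
    n = len(card)
    memo, generated, solved = {}, [], []

    def result_size(subset):
        members = [i for i in range(n) if (subset >> i) & 1]
        size = 1.0
        for i in members:
            size *= card[i]
        for i in members:
            for j in members:
                if i < j:
                    size *= sel.get((i, j), 1.0)
        return size

    def solve(subset):
        if subset in memo:
            return memo[subset]
        generated.append(subset)              # subproblem is requested here
        if subset & (subset - 1) == 0:
            result = (0.0, subset.bit_length() - 1)   # single relation
        else:
            out = result_size(subset)
            result = (inf, None)
            left = (subset - 1) & subset
            while left:
                right = subset ^ left
                if left < right:              # skip mirrored splits
                    lcost, lplan = solve(left)
                    rcost, rplan = solve(right)
                    cost = lcost + rcost + out
                    if cost < result[0]:
                        result = (cost, (lplan, rplan))
                left = (left - 1) & subset
        memo[subset] = result
        solved.append(subset)                 # subproblem is finished here
        return result

    return solve((1 << n) - 1), generated, solved

(best_cost, best_plan), generated, solved = top_down(
    [10, 100, 1000], {(0, 1): 0.1, (1, 2): 0.01})
```

The complete problem is the first entry of `generated` but the last entry of `solved`, and only subproblems that are actually requested ever appear in either list.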

Although cost bounds can be used in both the top-down and the bottom-up version of dynamic programming, they seem to have little effect. In the top-down approach it is possible to use upper bounds on the costs to “prune” whole subproblems (which is not directly possible in the bottom-up approach), but this too is rarely effective because all the subproblems are highly interdependent. It is much more beneficial to use upper bounds to save some cost computations.

Almost all references in the literature refer to this “traditional” version of dynamic programming [SAC⁺79, GD87, OL90, GM93, STY93, VM96, CYW96]. However, it is sometimes necessary to use a slightly more general version of dynamic programming, named partial order dynamic programming in [GHK92].

Let us first describe a straightforward generalization of the traditional dynamic programming scheme. In the traditional scheme, costs are numeric values that define a total ordering among all plans for a subproblem. Now suppose that we cannot always decide whether one plan is better than another plan, i.e. all we have is a plan comparison relation which defines a partial order among plans. Obviously, when the Principle of Optimality holds, it is safe to discard all suboptimal plans for subproblems (since an optimal plan cannot contain suboptimal subplans). In other words, instead of computing one optimal plan for each subproblem, we compute all plans that do not prove to be suboptimal. As to the implementation, all that changes is that we now have to deal with lists of plans instead of single plans.
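The change from single plans to lists of plans can be sketched as follows. The helper `add_plan`, the representation of a plan as a pair (cost, sort order) and the comparison relation `better` are hypothetical choices for this example; any strict partial order among plans could be substituted.

```python
def add_plan(plans, candidate, better):
    """Keep only plans that are not proven suboptimal under the relation `better`."""
    if any(better(p, candidate) for p in plans):
        return plans                      # candidate is suboptimal, discard it
    # The candidate survives; drop every stored plan it proves suboptimal.
    return [p for p in plans if not better(candidate, p)] + [candidate]

def better(a, b):
    # Assumed relation: a beats b if it is no more expensive and either delivers
    # the same sort order or b delivers no sort order at all (None).
    return a != b and a[0] <= b[0] and (a[1] == b[1] or b[1] is None)

plans = []
for candidate in [(100, None), (120, "a"), (90, None), (130, "a"), (80, "b")]:
    plans = add_plan(plans, candidate, better)
```

At the end `plans` holds the two mutually incomparable plans (120, "a") and (80, "b"): neither can be discarded, because the more expensive one delivers a sort order that the cheaper one lacks.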

The idea behind partial order dynamic programming is the following. Suppose our comparison relation ≺₁ does not fulfill the Principle of Optimality. Now, if we can find a weaker comparison relation ≺₂¹⁸ that does fulfill the Principle of Optimality, we can use the generalization described above with ≺₂ to compute a set of potentially optimal plans, from which we determine the truly optimal plan using ≺₁.

With traditional dynamic programming the cost function computes a scalar, numeric cost value and keeps track of the best plan. So does partial order dynamic programming, but now the cost function may be non-scalar. Costs are typically represented as resource vectors (or resource descriptors). A resource vector is a tuple whose components each quantify the usage of a certain resource. For example, the components of a resource vector used in parallel query processing might be the time to complete a query (elapsed time), the time when the first tuple is produced, the sum of processing times for each processor (total work), the total amount of buffer space used, disk access times, network communication costs, etc. Sometimes additional parameters, which do not reflect physical resources, are incorporated into resource vectors so that the resource vector of a plan can be computed in terms of the resource vectors of its subplans (depending on the properties of the subproblem). Obviously, if one plan uses fewer resources than another plan, the plan using fewer resources is superior. This defines a natural (partial) order among resource vectors that we can use to eliminate all suboptimal plans for a subproblem: plan p₁ is superior to plan p₂ if the resource vector of p₁ is “component-wise” smaller than the resource vector of p₂. More on partial order dynamic programming can be found in [GHK92].

¹⁷ though different orders are conceivable

¹⁸ i.e. suboptimality with respect to ≺₂ implies suboptimality with respect to ≺₁
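A minimal sketch of the component-wise comparison is given below. It reads “component-wise smaller” as “no component larger and at least one strictly smaller”, which is an assumption about the intended definition; the function name `dominates` and the example vectors (elapsed time, time to first tuple, total work) are likewise illustrative.

```python
def dominates(u, v):
    """True if resource vector u is component-wise smaller than v (assumed reading)."""
    if len(u) != len(v):
        raise ValueError("resource vectors must have the same components")
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))

# Hypothetical vectors: (elapsed time, time to first tuple, total work)
p1 = (10.0, 2.0, 30.0)
p2 = (12.0, 2.0, 35.0)
p3 = (8.0, 5.0, 30.0)

p1_beats_p2 = dominates(p1, p2)                           # p1 is never worse, twice better
incomparable = not dominates(p1, p3) and not dominates(p3, p1)
```

Here `p1` is superior to `p2`, while `p1` and `p3` are incomparable: `p3` finishes earlier but delivers its first tuple later. Incomparable pairs of this kind are the reason why the order is only partial and several plans per subproblem may have to be kept.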

In document Algebraic Query Optimization in Database Systems (Page 117-121)