
3.5 Formal Analysis

3.5.1 Result Precision

The following analysis is based on the Principle of Near-Optimality (PONO) for MOQO, introduced in the previous chapter. It states that replacing optimal sub-plans within a complete query plan by near-optimal sub-plans still yields a near-optimal complete plan for a broad class of cost metrics. The class of cost metrics to which the PONO applies is characterized by the aggregation function, i.e., the recursive function that calculates the cost of a plan according to that metric from the costs of its two sub-plans: the PONO applies to all cost metrics whose aggregation function can be represented as a combination of the operators sum, maximum, minimum, and multiplication by a constant. This applies, for instance, to metrics such as energy consumption or execution time.² The PONO has also been shown to apply to several other metrics whose aggregation formulas do not fit into this scheme, such as failure resilience or result precision. A formal definition of the PONO follows.

Definition 8 (PONO). Let $p$ be a query plan with sub-plans $p_1$ and $p_2$ and pick an arbitrary $\alpha \geq 1$. Derive $p^*$ from $p$ by replacing $p_1$ by $p_1^*$ and $p_2$ by $p_2^*$. Then $c(p_1^*) \preceq \alpha c(p_1)$ and $c(p_2^*) \preceq \alpha c(p_2)$ together imply $c(p^*) \preceq \alpha c(p)$.
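The definition can be checked numerically for a single cost metric. The following is a minimal sketch, not the thesis's implementation: the aggregation functions, the helper names `pono_holds` and `check_pono`, and the sampling ranges are all hypothetical, and costs are assumed to be non-negative scalars (one metric at a time, so the comparison reduces to an ordinary `<=`).

```python
import random

# Hypothetical aggregation functions for a single cost metric. Each one
# combines the (non-negative) costs of two sub-plans into the plan cost.
def aggregate_sum(c1, c2):
    return c1 + c2          # e.g. energy consumption

def aggregate_max(c1, c2):
    return max(c1, c2)      # e.g. execution time under parallel execution

def aggregate_mixed(c1, c2):
    # An arbitrary combination of maximum, minimum, sum and constant factors.
    return 2.0 * max(c1, c2) + 0.5 * min(c1, c2)

def pono_holds(aggregate, c1, c2, c1_star, c2_star, alpha, tol=1e-9):
    """Return True if c(p*) <= alpha * c(p) for the given aggregation."""
    return aggregate(c1_star, c2_star) <= alpha * aggregate(c1, c2) + tol

def check_pono(aggregate, alpha, trials=10_000, seed=0):
    """Sample sub-plan costs that satisfy the premises of the PONO and
    report whether the conclusion held in every sampled case."""
    rng = random.Random(seed)
    for _ in range(trials):
        c1, c2 = rng.uniform(0, 100), rng.uniform(0, 100)
        # Near-optimal replacements: c(p_i*) <= alpha * c(p_i).
        c1_star = rng.uniform(0, alpha * c1)
        c2_star = rng.uniform(0, alpha * c2)
        if not pono_holds(aggregate, c1, c2, c1_star, c2_star, alpha):
            return False
    return True

if __name__ == "__main__":
    for agg in (aggregate_sum, aggregate_max, aggregate_mixed):
        print(agg.__name__, check_pono(agg, alpha=1.5))
```

Random sampling only illustrates the property; it is not a proof. The check succeeds for these aggregations because each of them is monotone in both arguments and scales linearly with a common factor, which is the intuition behind the operator class named above.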

The following theorems are based on the PONO. We also assume monotone cost aggregation, meaning that, for each cost metric, the cost of a plan is greater than or equal to the cost of each of its sub-plans.

Theorem 9. After invoking OPTIMIZE with bounds $b$ and resolution $r$ for query $Q$, $Res^q[0..b, 0..r]$ contains an $\alpha_r$-approximate $b$-bounded Pareto plan set for each table $q \in Q$.

Proof. For each table $q$, all applicable scan plans are generated and pruned before the main loop starts. Let $p$ be an arbitrary scan plan for an arbitrary table $q$. Once procedure OPTIMIZE is invoked later for resolution $r$ and bounds $b$, there are two possibilities for $p$: either $p$ was inserted into the result plan set in prior invocations, or it is not in the result plan set at the start of the current invocation. If $p$ was not inserted before, then we must make sure that it is either inserted in the current invocation or not required to form an $\alpha_r$-approximate $b$-bounded Pareto plan set.

If $p$ was not inserted before, then it must be included in $Cand^q[0..b, 0..r]$ unless it exceeds the bounds $b$ or can be approximated by an alternative plan. In both cases, $p$ is not required for an $\alpha_r$-approximate $b$-bounded Pareto plan set. If, however, $p$ is in $Cand^q[0..b, 0..r]$ at the start of the current invocation, then procedure OPTIMIZE will retrieve and prune $p$; plan $p$ will be inserted if it is required for an $\alpha_r$-approximate $b$-bounded Pareto plan set.
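The pruning argument for scan plans can be illustrated with a small sketch. It is a simplification under stated assumptions, not the OPTIMIZE procedure itself: plans are represented only by their cost vectors, a single resolution and a single bounds vector are used, and the function names (`approx_dominates`, `prune`, `is_approx_bounded_pareto_set`) are hypothetical. The coverage condition checked at the end (every plan $p$ with $\alpha c(p) \preceq b$ must be approximated by a result plan) is read off from the proofs in this section and is likewise an assumption about the formal definition.

```python
def approx_dominates(c_a, c_b, alpha):
    """True if c_a <= alpha * c_b holds in every cost metric."""
    return all(x <= alpha * y for x, y in zip(c_a, c_b))

def within_bounds(c, bounds):
    """True if the cost vector respects the bound in every cost metric."""
    return all(x <= b for x, b in zip(c, bounds))

def prune(result, candidate, alpha, bounds):
    """Return the result set after considering one candidate cost vector.

    The candidate is discarded if it exceeds the bounds or if a plan that
    is already in the result set approximates it within factor alpha.
    """
    if not within_bounds(candidate, bounds):
        return result
    if any(approx_dominates(r, candidate, alpha) for r in result):
        return result
    # Drop result plans that the new plan dominates exactly (alpha = 1).
    kept = [r for r in result if not approx_dominates(candidate, r, 1.0)]
    kept.append(candidate)
    return kept

def is_approx_bounded_pareto_set(result, all_plans, alpha, bounds):
    """Check that every plan p with alpha * c(p) within the bounds is
    approximated by some result plan within factor alpha."""
    for p in all_plans:
        scaled = [alpha * x for x in p]
        if within_bounds(scaled, bounds):
            if not any(approx_dominates(r, p, alpha) for r in result):
                return False
    return True

if __name__ == "__main__":
    # Hypothetical scan plans with cost vectors (time, energy).
    plans = [(10, 50), (11, 49), (30, 20), (100, 5), (12, 60)]
    alpha, bounds = 1.2, (80, 80)
    result = []
    for plan in plans:
        result = prune(result, plan, alpha, bounds)
    print(result)
    print(is_approx_bounded_pareto_set(result, plans, alpha, bounds))
```

In this example the plan `(100, 5)` is dropped because it exceeds the bounds, and `(11, 49)` and `(12, 60)` are dropped because `(10, 50)` approximates them. These are the two cases in which the proof argues that a plan is not required.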

Theorem 10. After invoking OPTIMIZE with bounds $b$ and resolution $r$ for query $Q$, $Res^q[0..b, 0..r]$ contains an $\alpha_r^k$-approximate $b$-bounded Pareto plan set for each table subset $q \subseteq Q$ with cardinality $k = |q|$.

Proof. The proof is by induction over the number of tables $k$. Theorem 9 proves the base case $k = 1$. Assume that the induction hypothesis holds for all $k < K$. Let $q \subseteq Q$ be a set of $K$ tables and $p$ an arbitrary plan that joins those tables with $\alpha_r^K c(p) \preceq b$. Plan $p$ must have two sub-plans $p_1$ and $p_2$ that each join at most $K - 1$ tables. Let $q_1$ and $q_2$ be the sets of tables joined by $p_1$ and $p_2$, respectively. We assume monotone cost aggregation, which implies $\alpha_r^K c(p_1) \preceq b$ and $\alpha_r^K c(p_2) \preceq b$. The induction hypothesis applies to $p_1$ and $p_2$, so that $Res^{q_1}[0..b, 0..r]$ will contain a plan $p_1^*$ with $c(p_1^*) \preceq \alpha_r^{K-1} c(p_1)$ and $Res^{q_2}[0..b, 0..r]$ will contain a plan $p_2^*$ with $c(p_2^*) \preceq \alpha_r^{K-1} c(p_2)$ after the optimizer invocation.

Plans $p_1^*$ and $p_2^*$ can be combined into a plan $p^*$ that joins the same tables as $p$ and has similar cost according to the PONO (applied with $\alpha = \alpha_r^{K-1}$): $c(p^*) \preceq \alpha_r^{K-1} c(p)$. Plan $p^*$ is generated either in the current optimizer invocation with resolution $r$ and bounds $b$ or in one of the prior invocations. If $p^*$ is generated in the current invocation, then it is inserted unless an alternative plan $p^{**}$ with $c(p^{**}) \preceq \alpha_r c(p^*) \preceq \alpha_r^K c(p)$ is already in the result set. In that case the theorem holds. If $p^*$ was generated in prior invocations, then it was either inserted into the result set, or it was already pruned at resolution $r$ because its cost was too similar to that of one of the result plans, or it will be pruned in the current iteration. In all cases the theorem holds.

² The energy consumption of a plan is the sum of the energy consumption of its sub-plans. The plan execution time is the maximum of the execution times of the sub-plans for parallel execution, and the sum for sequential execution.

Knowing the relationship between the precision factors $\alpha_r$ and the approximation quality of the result plan sets allows us to choose the factor $\alpha_{r_M}$ for the maximal resolution as a function of the desired target precision.
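One way to make this choice concrete follows from Theorem 10: for a query joining $n$ tables, the result plan set at the maximal resolution is $\alpha_{r_M}^n$-approximate, so picking $\alpha_{r_M}$ as the $n$-th root of the target factor is sufficient. The sketch below assumes that the target precision is given as a single factor of at least 1; the function name and parameters are hypothetical.

```python
def max_resolution_factor(target_alpha, num_tables):
    """Return a precision factor alpha for the maximal resolution such
    that alpha ** num_tables equals the target approximation factor.

    Assumes the guarantee of Theorem 10: a plan set for a query joining
    num_tables tables is (alpha ** num_tables)-approximate.
    """
    if target_alpha < 1.0:
        raise ValueError("target_alpha must be at least 1")
    if num_tables < 1:
        raise ValueError("num_tables must be at least 1")
    return target_alpha ** (1.0 / num_tables)

if __name__ == "__main__":
    # Hypothetical example: plans for a 5-table query should cost at most
    # 1.5 times the cost of the plans they approximate.
    alpha = max_resolution_factor(target_alpha=1.5, num_tables=5)
    print(alpha, alpha ** 5)
```

The per-level factor shrinks towards 1 as the number of tables grows, which reflects that approximation errors compound once per join level in the induction.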

In document From Massive Parallelization to Quantum Computing: Seven Novel Approaches to Query Optimization (Page 77-79)