Query execution - Toward timely, predictable and cost-effective data analytics

rather than being updated every time the data changes, histograms are updated periodically14. Therefore, there is a risk that at times histograms will contain stale information.

3.3 Query execution

Once all plans have been quantiﬁed and the best one has been chosen, the plan is presented to the query execution module that, by following the plan as a blueprint, accesses and processes data. At the lowest level of the tree-shaped plan are access path operators (i.e., scan operators) that instruct how to access data which is placed on external storage. On top of access paths operators are usually joins and aggregations - the operators that instruct how data from different inputs can be combined. In the following we discuss access path operators and join implementation variations used in this thesis.

3.3.1 Data organization on external storage

Access path operator alternatives directly depend on the way data is organized on external storage. In general, each relation is stored in a separate file called a heap file; each file can store records either in an arbitrary way, or with respect to a particular order (we say such records are sorted).

A heap ﬁle stores tuples typically in the order of their arrival, i.e., semantically looking at tuples with respect to their attribute values, they have no particular order. Tuples are organized in pages (blocks on disk) that are allocated one by one usually sequentially on the storage medium15. As there is no order among tuples, a search for a particular tuple involves going through all the blocks of the relation.

Since searching for a particular tuple is very inefﬁcient in a heap structure, DBMS build indexes on top of data pages to retrieve tuples in much shorter time. An index is a data structure used to efﬁciently locate tuples satisfying search conditions on the search keys of the index16. Internally, entries of an index can be hashed on the search key(s) (in which case we refer to

hash indexes17), or organized in a tree-like structure used to direct the search for particular keys (in which case we refer to tree-based indexes, in particular B+ trees18). Hash indexing is only amenable for predicates of the typeattribute = value, while tree-indexing can be efﬁciently used for both equality and range expressions on search keys. Since the usability of B+ trees is higher than hash-based indexes, we focus on them in this thesis.

14 _{Some DBMS feature special commands for explicit statistics collection during which histograms will be updated.} 15 _{Due to fragmentation of the storage medium, data blocks, however, can be arbitrarily allocated.}

16 _{Search keys of the index are attributes based on which the entries are organized.} 17 _{A hash index uses a hash function to group tuples in buckets.}

18 _{B+ tree is a data structure in which all paths from the root (the up most layer) to leaves (the lowest layers) are} the same in length, i.e., the tree is balanced in height. When implementing tree-based indexing, DBMS mostly implement B+ trees.

Chapter 3. Query Processing …. ... ROOT INTERNAL NODES LEAF NODES

INDEX FILE

HEAP FILE

TUPLE

PAGES

Figure 3.1: Clustered Index

At the top of the tree-based index there is a root note; the lower levels called internal nodes contain index entries used to navigate the search. At the lowest level of the index are leaf nodes that contain data entries, which in addition to search keys have pointers to actual tuples stored in data pages in the heap ﬁle.

Indexes can be clustered (primary) as depicted in Figure3.1or unclustered (secondary) as depicted in Figure3.2. In the case of clustered indexes, the tuples in a data file are organized (sorted) based on the search keys of the index, i.e., the ordering of the tuples in the heap file is the same as the ordering of data entries in the index file. For unclustered indexes, such a correlation does not exist, i.e., two adjacent data entries from the index file can have pointers to two distant locations in the heap file. From the efficiency point of view, accessing tuples through the clustered index is cheaper than through the unclustered index, since in the former case sequential page reads of pages stored in the heap are invoked, while the latter approach involves random page reads19. Nonetheless, since every file can be organized in a single way, for each relation only one clustered index could be created, the remaining indexes (and data analytics applications have multitude of them) have to be unclustered.

3.3.2 Execution operators

Access path operators. Depending on the storage organization of relations, access path

operators can access data in several ways: a) by using index only access, when all attributes of interest of a query (payload) are covered with the search keys of the index, b) by using single

index access, if only one index is used to locate tuples of interest that are then fetched from

the heap ﬁle, c) by using multi-index access, when multiple indexes, used to match different predicates, are intersected to get a ﬁnal set of tuples, or d) by using full sequential access (full

table scan), when no indexes are used, and the entire relation is read sequentially from the

heap.

3.3. Query execution …. ... ROOT INTERNAL NODES LEAF NODES

INDEX FILE

HEAP FILE

TUPLE

PAGES

Figure 3.2: Unclustered Index

Table 3.1: Cost model parameters

Parameter Description

|R| Number of pages the relation occupies.

||R|| Number of tuples in the relation.

TP Number of tuples per page.

sel Selectivity of the query predicate(s) (%).

r andi o Cost of a random I/O access (per page).

seqi o Cost of a sequential I/O access (per page).

i nd excost Number of I/O needed to ﬁnd the ﬁrst qualifying tuple in the index.

Typically 2-4 in the case of B+ tree, 1.2 in the case of hash index.

The task of the query optimizer is to choose the most selective path, i.e., the path that will be the cheapest in terms of I/O operations. Table3.2summarizes the analytical formulas used to compute the cost of different access paths. The meaning of parameters of the cost model is shown in Table3.1. As it can be seen from Table3.2, the cost of full table scan is constant regardless of the result size and is equal to the cost of reading all pages of the relation with the sequential access. The cost of unclustered index is equal to the cost to traverse the tree or hash to find the first tuple, plus the cost to read remaining qualifying tuples, where each tuple could potentially incur one random I/O operation. The cost of clustered index is equal to the cost to traverse the tree or hash to find the first tuple, plus the cost to sequentially read the remaining data pages containing matches from the heap.

Joins. Join operators implement ways of combining two (or more) relations. Many DBMS

implement three types of join operations, nested loops join, sort-merge join and hash join20. For a single tuple of the left input (usually referred to as the outer input), nested loops accesses

20 _{These types have several subtleties, e.g. nested loops could be blocked nested loops, index nested loops; hash} join could be implemented as grace hash join (partitioned hash join) or hybrid hash join; sort-merge could be in-memory or external sort-merge.

Chapter 3. Query Processing

Table 3.2: Access path selection formulas

Access path alternative Cost formula

Full Table Scan |R| × seqi o

Unclustered Index Access i nd excost+ ||R|| × sel × r andi o

Clustered Index Access i nd excost+||R||×sel_T_P × seqi o

Table 3.3: Join implementations formulas

Join implementation Cost formula

Nested Loops Join |R| + |R| × |S|

Sort-Merge Join |R| × log(R) + |S| × log(S) + |R| + |S|

Hash Join |R| + |S|

all tuples of the right input (the inner input). This access is obviously inefficient. Nonetheless, if there is an index on the inner input, for each tuple of the outer input, matching tuples from the inner input are found with the help of the index, reducing thereby the unnecessary accesses of the inner input. Sort-merge join first sorts both inputs on the join key, after which it performs (coordinated) sequential access over both inputs producing matches directly. Hash join builds a hash table on the smaller input, and then it (reads sequentially and) probes all tuples from the bigger input to find the matches. The analytical formulas of the join operator implementations are summarized in Table3.3. The meaning of the cost model parameters is given in Table3.1; Relation R is considered to be the outer input, and relation S the inner input. These are basic cost formulas, assuming that both inputs fit in memory. Once the input size is bigger than the memory size, the formulas change depending on the join implementations. For instance, hash join introduces an additional partitioning step in which case both inputs are partitioned prior to joining; sort-merge performs external sorting that usually involves two passes over the data.

When estimating the execution performance of join algorithms various factors need to be considered. For instance, if an index exists, and the query optimizer estimated that the join will not have too many matches, index nested loops is a clear choice, since the inner input does not have to be fully traversed. If there are no indexes, and one table is much smaller than the other (meaning that it fits in the DBMS cache), hash join will outperform other solutions, because its cost is equal to the cost of sequentially reading both inputs. Nonetheless, if the hash table cannot fit in memory or the data distribution of the join attribute is highly skewed (i.e., some keys are frequently appearing), hash join might not be the optimal solution (due to hash table and buckets overflowing)21. Sort-merge is usually a winner when tables are similar in size, or if the query requests tuples to be served in a particular order (i.e., the order of the join predicate). Nonetheless, if both inputs have multiple duplicates, sort-merge join might become inefficient. Therefore, when deciding on the optimal join algorithm, the query

21 _{In the case that the hash table does not ﬁt in memory, the hash join formula degrades into 3}_{×|R|+3×|S| where} each table is partitioned (i.e., being read and written) prior to joining.

In document Toward timely, predictable and cost-effective data analytics (Page 45-49)