Two Alternate Approaches to Building a Column-Store

Now that we have described the first approach to building a column-store in detail, we describe two alternative approaches: modifying the storage manager to store tables column-by-column on disk while merging the columns on the fly at the beginning of query execution, so that the rest of the row-oriented query executor can be kept intact; and modifying both the storage manager and the query execution engine.

2.4.1 Approach 2: Modifying the Storage Layer

Unlike the first approach discussed in this chapter, this approach does not require any changes to the logical schema when converting from a row-store. All table definitions and SQL remain the same; the only change is the way tables are physically laid out on storage. Instead of mapping a two-dimensional table row-by-row onto storage, it is mapped column-by-column.

When storing a table in this way, the tuple IDs (or primary keys) needed to join together columns from the same table are not explicitly stored. Rather, implicit column positions are used to reconstruct columns (the ith value from each column belongs to the ith tuple in the table). Further, tuple headers are stored in their own separate columns, so they can be accessed separately from the actual column values. Consequently, a column stored using this approach contains just the data from that column, unlike the vertical partitioning approach, where a tuple header and a tuple ID are stored along with the column data. This solves one of the primary limitations of the previous approach. As a point of comparison, a single column of integers from the SSBM fact table stored using this approach takes just 240 MB (4 bytes × 6 × 10⁷ tuples = 240 MB). This is much smaller than the 960 MB needed to store an SSBM fact table integer column in the vertical partitioning approach above.
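The storage arithmetic above can be checked directly; this short Python sketch simply reproduces the calculation from the text (4-byte integers, 6 × 10⁷ fact-table tuples), with no per-value tuple header or tuple ID overhead:

```python
# Column-by-column layout: each value costs only its own width, because
# positions are implicit and no tuple ID or header is stored alongside it.
BYTES_PER_VALUE = 4          # a 4-byte integer column
NUM_TUPLES = 6 * 10**7       # SSBM fact table size, from the text

column_size_bytes = BYTES_PER_VALUE * NUM_TUPLES
column_size_mb = column_size_bytes / 10**6
print(column_size_mb)  # 240.0 (MB), matching the figure in the text
```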

In this approach, it is necessary to perform the process of tuple reconstruction before query execution. For each query, typically only a subset of table columns needs to be accessed (the set of accessed columns can be derived directly from inspection of the query). Only this subset of columns is read off storage and merged into rows. The merging algorithm is straightforward: the ith value from each column is copied into the ith tuple for that table, with the values from each component attribute stored consecutively. In order for the same query executor to be used as for row-oriented database systems, this merging process must be done before query execution.
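The merging algorithm described above can be sketched in a few lines of Python. The column contents are made up for illustration; the point is only that the ith value of each accessed column is copied into the ith reconstructed tuple:

```python
# Two columns of the same table, read off storage in position order.
# Positions are implicit: index i in each list belongs to tuple i.
col_a = [1, 2, 3]
col_b = ['x', 'y', 'z']

# Tuple reconstruction: pair up values by position to form row tuples
# that a row-oriented query executor can consume unchanged.
rows = [(a, b) for a, b in zip(col_a, col_b)]
print(rows)  # [(1, 'x'), (2, 'y'), (3, 'z')]
```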

Clearly, it is necessary to keep heap files stored in position order (the ith value is always after the (i − 1)th value); otherwise it would not be possible to match up values across columns without a tuple ID. In contrast, in the storage manager of a typical row-store (which, in the first approach presented in this chapter, is used to implement a column-store), the order of heap files, even on a clustered attribute, is only guaranteed through an index. This makes a merge join (without a sort) the obvious choice for tuple reconstruction in a column-store. In a row-store, since iterating through a sorted file must be done indirectly through the index, which can result in extra seeks between index leaves, an index-based merge join is a slow way to reconstruct tuples. Hence, by modifying the storage layer to guarantee that heap files are in position order, a faster join algorithm can be used to join columns together, alleviating the other primary limitation of the row-store implementation of a column-store.

2.4.2 Approach 3: Modifying the Storage Layer and Query Execution Engine

The storage manager in this approach is identical to the storage manager in the previous approach. The key difference is that in the previous approach, columns would have to be merged at the beginning of the query plan so that no modifications would be necessary to the query execution engine, whereas in this approach, this merging process can be delayed and column-specific operations can be used in query execution.

To illustrate the difference between these approaches, take, for example, the query:

SELECT X FROM TABLE WHERE Y < CONST

In approach 2, columns X and Y would be read off storage, merged into 2-attribute tuples, and then sent to a row-oriented query execution engine, which would apply the predicate on the Y attribute and extract the X attribute if the predicate passed. Intuitively, there is some wasted effort here: all tuples in X and Y are merged even though the predicate will cause some of these merged tuples to be immediately discarded. Further, the output of this query is a single column, so the executor will eventually have to unmerge ("project") the X attribute.

When modifications to the query executor are allowed, a different query plan can be used. The Y column is read off storage on its own, and the predicate is applied. The result of the predicate application is a set of positions of values (in the Y column) that passed the predicate. The X column is then scanned and values at this successful set of positions are extracted.
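This position-based plan can be sketched as follows. The column contents and the constant are made up for illustration; the two steps mirror the description above: scan Y alone to produce passing positions, then extract from X at those positions:

```python
# Hypothetical columns for: SELECT X FROM TABLE WHERE Y < CONST
Y = [10, 3, 7, 25, 1]
X = ['a', 'b', 'c', 'd', 'e']
CONST = 8

# Step 1: read only the Y column and apply the predicate, producing the
# set of positions (implicit tuple IDs) whose values pass.
positions = [i for i, y in enumerate(Y) if y < CONST]

# Step 2: scan the X column and extract the values at those positions.
# No tuples are ever merged, and no projection is needed afterwards.
result = [X[i] for i in positions]
print(result)  # ['b', 'c', 'e']
```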

There are a variety of advantages to this query execution strategy. First, it clearly avoids the unnecessary tuple merging and unmerging costs. However, there are some additional, less obvious performance benefits. First, less data is moved around in memory. In approach 2, entire tuples must be moved from memory to the CPU for predicate application (this is because memory cannot be read at a fine enough granularity to access only one attribute from a tuple; this is explained further in Section 3.2). In contrast, in this approach, only the Y column needs to be sent to the CPU for predicate application.

Second, since heap files are stored in position order as described above, we can access the heap file directly to perform this predicate application. If the column is fixed-width, it can be iterated through as if it were an array, so no calculations have to be performed to find the next value to apply the predicate to. However, once attributes have been merged into tuples, as soon as any attribute is not fixed-width, the entire tuple is not fixed-width, and the location of the next value on which to apply the predicate is no longer at a constant offset.
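The fixed-width case can be illustrated with a small Python sketch over a made-up on-disk column image. Because every value has the same width, the ith value starts at byte i × width, so the scan strides through the column like an array:

```python
import struct

# A hypothetical fixed-width column of 4-byte little-endian integers,
# packed back-to-back exactly as it might sit in a heap file.
WIDTH = 4
values = [5, 12, 9, 30]
column = b''.join(struct.pack('<i', v) for v in values)

# Predicate scan (v < 10): the i-th value is at constant offset i * WIDTH,
# so no per-tuple length computation is needed to find the next value.
matches = []
for i in range(len(column) // WIDTH):
    (v,) = struct.unpack_from('<i', column, i * WIDTH)
    if v < 10:
        matches.append(i)
print(matches)  # [0, 2]
```

With variable-width attributes merged into tuples, each tuple's length would have to be decoded before the scan could advance to the next one, which is precisely the overhead this layout avoids.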