
4.6 Related Work

4.6.3 Join Optimization

The problem of sparse matrix chain multiplication is related to join enumeration and cardinality estimation in a relational database management system. In fact, the connection becomes more obvious when sparse matrices are represented as ⟨row, col, val⟩ triple tables, as sketched in Figure 3.3c of Section 3.4. As we have already shown with the SQL expression in Listing 2.1, Section 2.6.1, a matrix multiplication can be expressed as a join on A.col = B.row, followed by a sum aggregation that groups the matrix values by their target coordinates A.row, B.col. Vice versa, certain types of joins can also be re-interpreted as a sparse matrix multiplication. For example, Amossen and Pagh (2009) describe how they accelerate a join-project by using a fast Boolean matrix multiplication.

Multi-way Join Order

A join of multiple tables (multi-way join) can be executed in many ways. Similar to the case of matrix chain multiplication, the sizes (cardinalities) of intermediate join products influence the execution time, as well as the choice of join algorithms. Indeed, the use of dynamic programming in join optimization is well established. As early as four decades ago, Selinger et al. (1979) proposed a cost-based optimization of join plans and join enumeration using a bottom-up dynamic programming approach in “System R”. In a recent paper, Leis et al. (2015) survey and evaluate the query optimizers of state-of-the-art DBMSs. They found that the optimization of multi-join queries primarily depends on the quality of the cardinality estimation, and that bad plans mostly result from estimation errors, which grow quickly as the number of joins increases. Furthermore, the authors conclude that query optimization depends only insignificantly on accurate cost models. This result, however, might hold for most join algorithms, which all depend on the number of tuples, but it is definitely not applicable to our setup, which consists of sparse and dense multiplication kernels that each have a completely different dependency on the matrix density.
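To make the parallel concrete, the following minimal sketch applies the same bottom-up dynamic programming scheme to matrix chain ordering. It is an illustration only, not the SpMachO implementation: the FLOP-based cost term and the uniform product-density estimate are deliberately simplistic placeholders.

# Sketch: bottom-up dynamic programming over a matrix chain, analogous
# to Selinger-style join enumeration. Cost and density estimates here are
# simplistic placeholders, not the detailed kernel cost model of SpMachO.

def product_density(da, db, k):
    # Naive estimate assuming uniformly scattered non-zeros: probability
    # that an output cell receives at least one non-zero contribution.
    return 1.0 - (1.0 - da * db) ** k

def chain_order(dims, densities):
    """dims[i], dims[i+1] give the shape of matrix i; densities[i] is
    its non-zero fraction. Returns the minimal cost and split table."""
    n = len(densities)
    cost = [[0.0] * n for _ in range(n)]
    dens = [[densities[i] if i == j else 0.0 for j in range(n)]
            for i in range(n)]
    split = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):              # sub-chain length
        for i in range(n - length + 1):
            j = i + length - 1
            cost[i][j] = float("inf")
            for k in range(i, j):               # split point, as in join DP
                d = product_density(dens[i][k], dens[k + 1][j], dims[k + 1])
                # Density-aware FLOP estimate for the top-level product.
                c = (cost[i][k] + cost[k + 1][j]
                     + dims[i] * dims[k + 1] * dims[j + 1]
                       * dens[i][k] * dens[k + 1][j])
                if c < cost[i][j]:
                    cost[i][j], split[i][j], dens[i][j] = c, k, d
    return cost[0][n - 1], split

# Example: a sparse-dense-sparse chain of three 1000 x 1000 matrices.
# chain_order([1000, 1000, 1000, 1000], [0.001, 0.5, 0.001])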

In contrast to matrix multiplications, joins are generally commutative. For instance, consider a three-way join on an attribute that is common to all tables: T1 ⋈_k T2 ⋈_k T3. This join could be executed as (T1 ⋈_k T2) ⋈_k T3, but alternatively in the order (T1 ⋈_k T3) ⋈_k T2, which commutes table T2 with T3. An explicit connection from joins to matrix chain multiplication is drawn by Moerkotte (2003), who considers order-preserving joins. These are non-commutative and thus more closely related to our problem. However, the cost model presented by Moerkotte (2003) merely consists of summing cardinalities. Although adequate for relational joins, it cannot be used in the same way for sparse matrix multiplication, since the intermediate cardinality of the join A.col = B.row is much higher than the result size of the matrix product. This is due to the aggregation step (Amossen and Pagh, 2009), which reduces the size of the matrix. Naturally, this aggregation must also be considered in the matrix density estimation, and it is the reason why simple join cardinality estimations cannot be applied to our case.
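The gap between join cardinality and product size is easy to see with a small example. The following snippet (illustrative only, using hypothetical coordinate sets) counts the tuples produced by the join A.col = B.row and compares them with the distinct output cells that remain after the sum aggregation:

# Illustrative only: join cardinality of A.col = B.row vs. the non-zero
# count of the product C = A * B after the SUM ... GROUP BY aggregation.
A = {(0, 0), (0, 1), (0, 2)}   # (row, col) coordinates of a dense row
B = {(0, 0), (1, 0), (2, 0)}   # (row, col) coordinates of a dense column

# Every matching pair contributes one tuple to the join result.
join_tuples = [(i, j) for (i, ka) in A for (kb, j) in B if ka == kb]

# The aggregation collapses tuples with equal target coordinates.
product_cells = set(join_tuples)

print(len(join_tuples))    # 3 intermediate join tuples
print(len(product_cells))  # 1 non-zero cell in the product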

Cardinality Estimation

Some parts of our approach to estimating the density of sparse matrix intermediates are related to methods used for cardinality estimation in relational query processing. In particular, attribute histograms are used for estimating the selectivity of a query (e.g., in Piatetsky-Shapiro and Connell, 1984), for approximate query answering, and for load balancing in join execution. A good overview of the use of histograms in databases is given by Ioannidis (2003). Furthermore, multidimensional histograms can be used for the optimization of queries on multidimensional data (e.g., in Muralikrishna and DeWitt, 1988). Moreover, the concept of integrating physical properties, such as the data representation, into the optimization process has also been treated in the context of relational plan generation. For example, Graefe and McKenna (1993) outline the dependency of the plan selection on physical properties of the corresponding intermediate results. It should be mentioned here that we prefer equi-width multidimensional histograms (divided into buckets of equal area) for the density map over equi-depth histograms (divided into buckets that contain equal numbers of entries but have varying areas). This is because equi-depth histograms would complicate the density estimation and thus negatively impact the performance of SpProdest.
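As a rough illustration of this choice, the sketch below builds an equi-width two-dimensional density map and derives a bucket-wise product-density estimate. It is a toy stand-in under a per-bucket uniformity assumption, not the actual SpProdest implementation with its entropy-based granularity adjustment.

import numpy as np

def density_map(coords, shape, buckets=4):
    """Equi-width 2D histogram: every bucket covers an equal-area region,
    so a coordinate maps to its bucket with two integer divisions."""
    rows, cols = shape
    counts = np.zeros((buckets, buckets))
    for r, c in coords:
        counts[r * buckets // rows, c * buckets // cols] += 1
    bucket_area = (rows / buckets) * (cols / buckets)
    return counts / bucket_area          # non-zero fraction per bucket

def product_density_map(da, db, k):
    """Bucket-wise density estimate for C = A * B with inner dimension k,
    assuming non-zeros are spread uniformly within each bucket."""
    b = da.shape[0]
    inner = k / b                        # inner-dimension columns per bucket
    zero = np.ones((b, b))               # P(output cell stays zero)
    for t in range(b):
        zero *= (1.0 - np.outer(da[:, t], db[t, :])) ** inner
    return 1.0 - zero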

To summarize, many aspects of prior work in the relational world inspired this chapter. Although the mathematical characteristics of matrices and their multiplication require a slightly different perspective, we find it an interesting result that some ideas of relational join optimization can be applied in a similar way to linear algebra.

4.7 SUMMARY

We argued in the introduction that integrating linear algebra operations, such as matrix multiplications, is not just a matter of adding data structures and algorithms to the database engine. This chapter revealed that the presence of an expression-based language interface incurs additional complexity, especially if linear algebra expressions are to be executed in an optimized way. In fact, due to different matrix representations, algorithms, and the presence of data skew, we observed that a naive execution of sparse matrix chain multiplications can be orders of magnitude slower than an optimized one.

In this chapter we presented SpMachO, the optimizer component in the logical layer of our Lapeg. SpMachO optimizes sparse, dense, and mixed matrix chain multiplications of arbitrary length by creating an execution plan that consists of transformation and multiplication operators. By using a detailed cost model of different sparse, dense, and mixed matrix multiplication kernels, SpMachO achieves a faster and more robust execution than widely used algebra systems. The required estimation of intermediate matrix densities is handled by our prediction approach SpProdest, which is more efficient than conventional methods because it works on density sketches rather than visiting the complete matrix representation. Moreover, with its entropy-based skew awareness, SpProdest is able to dynamically adjust the granularity of the density map, and it enables accurate memory consumption and runtime estimates at each stage of the execution plan. The density estimate of SpProdest is further used for a resource-aware, dynamic selection of the output matrix representation in the context of our adaptively tiled multiplication operator (ATmult), which is presented in Chapter 5.

Finally, and perhaps most importantly, we were able to show how optimization methods inspired by database technology can improve the execution of linear algebra expressions. In particular, by leaving the optimized execution of expressions to the system, we reduce the cognitive burden on data scientists, who should not be required to have profound knowledge about the connections between mathematical optimization, matrix characteristics, algorithmic complexities, and the hardware parameters of their system.

5 ADAPTIVE MATRICES

5.1 MOTIVATION

In Chapter 3, we introduced data structures for sparse and dense matrices in the column-oriented storage layer. In particular, we showed that efficient matrix representations integrate seamlessly with the columnar layout. Nevertheless, so far we assumed that the user is required to predefine the final data structure of a matrix, for example, whether it is stored in a sparse or in a dense format. The same selection must be made by users of math software, such as R, Matlab, or Blas libraries, where they have to choose among an even wider variety of sparse matrix types. However, the pre-configuration of matrix storage types is disadvantageous, since it can negatively impact performance, as we observed for sparse matrix chain multiplications in Chapter 4. Moreover, not only does the distinction between plain dense and sparse matrix operands influence the execution performance, but so does the physical organization of each matrix on its own. A matrix is commonly stored as a whole in a static, homogeneous sparse or dense format (e.g., in Scalapack), resulting in poor memory utilization and processing performance if the data representation was not chosen wisely. In fact, there is additional tuning potential that can be leveraged when matrices are considered as heterogeneous objects. Conventional multiplication algorithms are agnostic of density variations within matrices, while at the same time there are many efficient routines for either plain sparse or plain dense matrices. Hence, the idea is that the Lapeg splits each matrix into several sparse, dense, and potentially further constituents, in order to decompose a single multiplication operation into multiple optimized sub-multiplications, as sketched below. This motivated us to rethink and redesign data structures and processing practices for large, sparse data matrices.
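As a rough illustration of this decomposition (a hand-written sketch, not the ATmult operator itself), the following code multiplies two tiled matrices, where each tile is independently stored as sparse or dense, and every sub-multiplication dispatches to the kernel that fits its operand pair:

import numpy as np
import scipy.sparse as sp

def _to_dense(tile):
    return tile.toarray() if sp.issparse(tile) else tile

def multiply_tiled(a_tiles, b_tiles):
    """a_tiles[i][k] and b_tiles[k][j] hold tiles that are either dense
    (np.ndarray) or sparse (scipy.sparse CSR); each tile-pair product
    dispatches to the kernel matching its operand representations."""
    n_i, n_k, n_j = len(a_tiles), len(b_tiles), len(b_tiles[0])
    c = [[None] * n_j for _ in range(n_i)]
    for i in range(n_i):
        for j in range(n_j):
            acc = None
            for k in range(n_k):
                a, b = a_tiles[i][k], b_tiles[k][j]
                if sp.issparse(a) and sp.issparse(b):
                    part = a @ b                        # sparse kernel
                else:
                    part = _to_dense(a) @ _to_dense(b)  # dense kernel
                if acc is None:
                    acc = part
                elif sp.issparse(acc) and sp.issparse(part):
                    acc = acc + part                    # stays sparse
                else:
                    acc = _to_dense(acc) + _to_dense(part)
            c[i][j] = acc
    return c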

In this chapter, we push the envelope further towards a fully dynamic and adaptive physical organization of matrices that is completely transparent to the user. To this end, we developed an adaptive tile matrix (AT Matrix) data structure, which uses an optimized data layout based on the matrix's non-zero topology. Moreover, we present a matrix multiplication operator ATmult that is able to exploit the heterogeneity of the AT Matrix by using optimization techniques inspired by relational query processing.

The core contributions of this chapter are:

1. A hybrid matrix representation AT Matrix consisting of adaptive tiles to store large sparse and dense matrices of any individual non-zero pattern efficiently.

2. A time- and space-efficient matrix multiplication operator ATmult that performs dynamic tile-granular optimizations based on density estimates and a sophisticated cost model.

3. The adoption of several methods based on database technology for ATmult, such as 2D indexing, cardinality estimation, and just-in-time partial data conversions.

4. An evaluation of AT Matrix and ATmult using a wide range of synthetic and real-world matrices from various domains.

Section 5.2 starts with the description of our adaptive tile matrix data type, followed by a description of the partitioning algorithm that converts a raw matrix into an AT Matrix. In Section 5.3 we present our matrix multiplication operator ATmult, including our tile-granular optimization approach that uses just-in-time data conversions. Finally, the extensive evaluation of ATmult is presented in Section 5.5.