

4.2 Expression Optimization

4.2.2 Density-Aware Matrix Chain Optimization

Restricting the cost model in Equation (4.5) to dense-only matrices limits the applicability of the textbook approach in practice. In fact, many of the real-world matrices in big data environments are sparse. A sparse m × n matrix is not only defined by its row and column dimensions, but also by the number N_nz of non-zero elements, as well as the non-zero pattern (the distribution of non-zero elements in the matrix).

In order to quantify the sparsity, or non-zero population, of a matrix, we use a measure that relates N_nz to the absolute matrix dimensions: the matrix density

ρ = N_nz / (m · n)    (4.7)
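In plain terms, Equation (4.7) relates the non-zero count to the full matrix size. A minimal sketch (function and variable names here are illustrative, not from the text):

```python
def density(m, n, nnz):
    """Matrix density rho = nnz / (m * n), cf. Eq. (4.7)."""
    return nnz / (m * n)

# A 1000 x 1000 matrix with 5000 non-zero elements has density 0.005:
print(density(1000, 1000, 5000))  # 0.005
```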

Most of the related work on matrix chain multiplication considers dense-only multiplications (Godbole, 1973; Cormen et al., 2001). However, we identify two major reasons why the sparsity of matrices has a significant influence on the optimization:

Efficient Sparse Multiplication Kernels. The complexity of multiplying two sparse matrices differs significantly from that of the naive inner-product algorithm for dense matrices. A brief look at our sparse matrix algorithm (Alg. 10) shows that the multiplication costs depend on the number of non-zero elements in the CSR representation rather than on the matrix dimensions. Consequently, the cost model T_M in the optimization recurrence (Eq. 4.5) has to be changed so that it takes the matrix density into account.
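To illustrate why the work depends on the non-zeros rather than the dimensions, consider a Gustavson-style row-wise multiplication over CSR inputs. This is only a sketch standing in for Alg. 10; the function name and the triple-array CSR layout (row pointers, column indices, values) are assumptions:

```python
def csr_matmul(a_ptr, a_col, a_val, b_ptr, b_col, b_val):
    """Multiply two CSR matrices; work is proportional to the number of
    non-zero partial products, not to the matrix dimensions."""
    c_ptr, c_col, c_val = [0], [], []
    for i in range(len(a_ptr) - 1):
        acc = {}                                      # sparse accumulator for row i
        for ai in range(a_ptr[i], a_ptr[i + 1]):      # non-zeros of A's row i
            k, a_ik = a_col[ai], a_val[ai]
            for bi in range(b_ptr[k], b_ptr[k + 1]):  # non-zeros of B's row k
                acc[b_col[bi]] = acc.get(b_col[bi], 0.0) + a_ik * b_val[bi]
        for j in sorted(acc):                         # emit row i in column order
            c_col.append(j)
            c_val.append(acc[j])
        c_ptr.append(len(c_col))
    return c_ptr, c_col, c_val

# A = [[1, 0], [0, 2]], B = [[0, 3], [4, 0]]  =>  A @ B = [[0, 3], [8, 0]]
print(csr_matmul([0, 1, 2], [0, 1], [1.0, 2.0], [0, 1, 2], [1, 0], [3.0, 4.0]))
```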

Variation of Intermediate Densities. The dense-only approach is agnostic to the fact that the densities of intermediate result matrices can differ significantly from the initial matrix densities. For example, the density of the result matrix C = A · B can be much higher, or even lower, than that of both A and B (see Figure 4.8). Despite the asymptotic complexity advantage of sparse algorithms, it is in many cases more efficient to use a dense algorithm, even if the matrix contains a significant fraction of zero elements. The reason lies in the efficient and well-tuned implementations of dense matrix multiplication kernels in the Blas library, which have low constants in their cost functions.
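An extreme case of such fill-in can be sketched directly: multiplying a matrix whose only non-zeros form one dense column by one whose only non-zeros form one dense row yields a fully dense product, although both inputs are sparse. The example values are ours, not from the text:

```python
m = 8
# A: only the first column is non-zero; B: only the first row is non-zero.
A = [[1.0 if j == 0 else 0.0 for j in range(m)] for _ in range(m)]
B = [[1.0 if i == 0 else 0.0 for _ in range(m)] for i in range(m)]
# C[i][j] = sum_k A[i][k] * B[k][j] = A[i][0] * B[0][j] = 1 everywhere.
C = [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(m)] for i in range(m)]

def rho(M):
    """Density per Eq. (4.7)."""
    nnz = sum(1 for row in M for x in row if x != 0.0)
    return nnz / (len(M) * len(M[0]))

print(rho(A), rho(B), rho(C))  # 0.125 0.125 1.0
```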

Our idea is to take the individual characteristics of the different matrix representations and multiplication kernels into account. In particular, we exploit the potential performance benefits of changing the physical implementation of the initial matrices or intermediate results. Therefore, we construct an expression execution plan that may contain dense, sparse, and mixed dense/sparse matrix multiplications. Furthermore, the execution plan may include conversions from a sparse into a dense representation. To generate the plan, we adopt the idea of dynamic programming and modify it so that it incorporates the physical properties of the matrices: the recurrence (4.5) is extended by adding input and output storage types as independent dimensions, and by including additional cost functions for the storage type conversions:

C_{Π*(ij)}^{S_o} = min_{i ≤ k < j; S_l, S_r, S_1, S_2 ∈ S} { C_{Π*(ik)}^{S_l} + C_{Π*((k+1)j)}^{S_r} + T_T^{S_1}(A[i..k], S_l) + T_T^{S_2}(A[k+1..j], S_r) + T_M^{S_o}(A[i..k], S_1, A[k+1..j], S_2) }    (4.8)

The terms in Equation (4.8) are defined as follows:

• Π(ij): execution plan for a matrix (sub)chain multiplication. It contains the multiplication execution order, as well as all storage transformations. Π* denotes the optimal plan.

• S^X: storage type, which is either dense or sparse. The superscript X labels each of the five matrix “checkpoints” that are considered per execution node, as illustrated in Figure 4.3.

An execution node comprises one multiplication operation and potentially two preceding transformations of the left and/or the right input matrices. Hence, the checkpoints labeled with X are the following: l: left subplan output, r: right subplan output, 1: left input, 2: right input, o: execution node output matrix.

Figure 4.3: The matrix checkpoints in a multiplication execution node.

Figure 4.4: Eight of the 128 possible execution plans for the multiplication of three sparse matrices A1 · A2 · A3. × denotes a multiplication operator, and T a storage type transformation of matrices. The super-/subscripts denote the input/output matrix representation of the respective operator: S/D denotes the internal sparse/dense representation.

• ρ^X: density of the intermediate matrix at checkpoint X.

• T_T^{S_Y}(A[i..k], S_X): cost function for the conversion of a matrix from type S_X into type S_Y. The costs are zero if the storage type S_X is equal to S_Y.

The cost functions T_M in Equation (4.8) depend not only on the densities, but also on the storage types of the input and output matrices. Since single-pass storage type conversions are usually less costly than multiplications, it can be beneficial to convert a matrix from one representation into the other prior to the multiplication. Hence, besides the parenthesization split point k, we vary the input and output storage types at each step of the recurrence. For example, if the initial matrices are in a sparse representation, and the dense multiplication kernel plus the conversion has a far lower cost than the sparse multiplication, then they are first converted into the dense representation. The value of the conversion cost T_T(·) equals zero for identity transformations. Thus, if a matrix is already in the optimal representation, no transformation is performed.
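This extended recurrence can be sketched as a bottom-up dynamic program. The following is a minimal illustration only, not the thesis implementation: the cost callbacks t_mult and t_conv, and the density propagation prod_density, are hypothetical toy models standing in for T_M, T_T, and the density estimator:

```python
from itertools import product

DENSE, SPARSE = "D", "S"
STORAGE = (DENSE, SPARSE)

def t_conv(m, n, s_from, s_to, rho):
    """Toy stand-in for T_T: zero for identity transformations (as in Eq. 4.8),
    otherwise one pass over the matrix."""
    return 0.0 if s_from == s_to else float(m * n)

def t_mult(m, k, n, s1, s2, s_out, rho1, rho2):
    """Toy stand-in for T_M: dense-dense costs m*k*n; otherwise the cost scales
    with the non-zeros, with a higher constant for the sparse kernel."""
    cost = float(m * k * n) if (s1, s2) == (DENSE, DENSE) \
        else 4.0 * (rho1 * m * k) * (rho2 * n)
    if s_out == DENSE:                       # materializing a dense output
        cost += 0.5 * m * n
    return cost

def prod_density(rho1, rho2, k):
    """Toy independence-based fill-in estimate for the product density."""
    return min(1.0, 1.0 - (1.0 - rho1 * rho2) ** k)

def chain_cost(dims, rhos):
    """Minimal DP over recurrence (4.8). dims has length p+1 (A_i is
    dims[i] x dims[i+1]); rhos are the initial densities; all inputs start
    sparse. Returns the minimal estimated cost of the whole chain."""
    p = len(rhos)
    best, dens = {}, {}
    for i in range(p):
        dens[(i, i)] = rhos[i]
        for s in STORAGE:
            best[(i, i, s)] = t_conv(dims[i], dims[i + 1], SPARSE, s, rhos[i])
    for length in range(2, p + 1):
        for i in range(p - length + 1):
            j = i + length - 1
            for k in range(i, j):            # split point, cf. Eq. (4.8)
                rl, rr = dens[(i, k)], dens[(k + 1, j)]
                dens[(i, j)] = prod_density(rl, rr, dims[k + 1])
                for s_l, s_r, s1, s2, s_o in product(STORAGE, repeat=5):
                    c = (best[(i, k, s_l)] + best[(k + 1, j, s_r)]
                         + t_conv(dims[i], dims[k + 1], s_l, s1, rl)
                         + t_conv(dims[k + 1], dims[j + 1], s_r, s2, rr)
                         + t_mult(dims[i], dims[k + 1], dims[j + 1],
                                  s1, s2, s_o, rl, rr))
                    if (i, j, s_o) not in best or c < best[(i, j, s_o)]:
                        best[(i, j, s_o)] = c
    return min(best[(0, p - 1, s)] for s in STORAGE)
```

For each subchain, the inner loop enumerates the split point k together with all 2^5 storage type assignments of the five checkpoints, exactly mirroring the dimensions of the recurrence.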

Some parameters that contribute to the cost functions T_M and T_T(·) are not known prior to execution and have to be estimated, for instance the density ρ of intermediate results. Therefore, we developed the sparse matrix product density estimator (SpProdest), which is described in Section 4.4. The final costs derived from the recurrence in Equation (4.8) are minimal, given that the estimated costs encoded in T_M and T_T are determined precisely. In particular, the optimality, or goodness, of our sparse matrix chain optimizer (SpMachO) depends on two aspects that potentially contain uncertainties: first, the accuracy of the quantitative cost model of the multiplication kernels, and second, the precision of the density estimates provided by SpProdest.

The total number of possible execution plans using our model (Eq. 4.8) for a matrix chain multiplication of length p is

C_{p−1} · 2^{3(p−1)},    (4.9)

where C_{p−1} denotes the Catalan number C_n = (2n)! / ((n+1)! · n!).

C_{p−1} reflects the number of possible parenthesizations, which is the same as in the textbook case (Cormen et al., 2001). The second factor stems from the 2^3 storage type combinations {left input type, right input type, output type} associated with each of the p−1 multiplication nodes. The number in (4.9) equals the size of the plan search space, which grows exponentially with the length p. For example, it yields 2560 for a matrix chain of length p = 4, and already 1,376,256 for p = 6. Some of the possible plans for p = 3 are shown in Figure 4.4. As in the dense-only case, we solve the dynamic programming recurrence (4.8) in O(p^3) time using a bottom-up approach.
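The counts above follow directly from Equation (4.9) and can be verified with a few lines (function names are ours):

```python
from math import factorial

def catalan(n):
    """Catalan number C_n = (2n)! / ((n+1)! * n!)."""
    return factorial(2 * n) // (factorial(n + 1) * factorial(n))

def num_plans(p):
    """Plan search space size from Eq. (4.9): C_{p-1} * 2^{3(p-1)}."""
    return catalan(p - 1) * 2 ** (3 * (p - 1))

print(num_plans(3))  # 128   (cf. Figure 4.4)
print(num_plans(4))  # 2560
print(num_plans(6))  # 1376256
```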