
4.5 Proofs of theoretical results

4.5.3 Proof of Theorem 4.3.3

We address the approximation error for BLOCK-ID due to the basis update, the basis thinning procedure, and the final basis thinning separately. The final error is bounded above by the sum of these three contributions via repeated application of the triangle inequality. Throughout this proof, we will make use of several facts and assumptions:

(1) The thinning procedure is called at most $\lceil m/b \rceil$ times.

(2) The column dimension of the coefficient matrix $P$ (and equally the number of basis vectors) prior to the $i$th thinning cannot exceed $\min(ib, \ell + b)$.

(3) The rank during the $i$th thinning is reduced from at most $\ell + b$ to $p$.

(4) The row dimension of the coefficient matrix prior to the $i$th thinning is $ib$.

(5) We assume throughout this proof that $2\ell + 3b \le m$.
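Facts (1), (2), and (5) can be tabulated for concrete problem sizes. The following is a minimal Python sketch, not part of BLOCK-ID itself; the function name `thinning_schedule`, the variable `ell` (standing in for $\ell$), and the sample values of m, b, and ell are illustrative assumptions.

```python
from math import ceil


def thinning_schedule(m, b, ell):
    """Upper bounds on the number of basis vectors before each thinning.

    Hypothetical helper: returns min(i*b, ell + b) for i = 1..ceil(m/b),
    i.e. the bound in fact (2) for every thinning call counted by fact (1).
    """
    if 2 * ell + 3 * b > m:
        # Assumption (5) of the proof.
        raise ValueError("need 2*ell + 3*b <= m")
    num_calls = ceil(m / b)  # fact (1): at most ceil(m/b) thinning calls
    return [min(i * b, ell + b) for i in range(1, num_calls + 1)]


# Illustrative sizes only: 100 rows streamed in blocks of 10, basis cap 25.
schedule = thinning_schedule(m=100, b=10, ell=25)
```

For these sample sizes the bound grows with the number of rows seen so far and then saturates at ell + b once ib exceeds it.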

Error due to basis update

Let $A_{ib}$ denote the matrix $A(1 : ib, :)$, comprising the first $i$ blocks of rows of $A$, each of size $b \times n$; $B_i$ the matrix $A((i-1)b + 1 : ib, :)$, i.e., the $i$th block of $A$; $\hat{A}_{ib}$ the rank-$k$ approximation of $A_{ib}$ obtained using BLOCK-ID with maximal basis size $\ell$; and $P_{ib}$ the coefficient matrix and $I_{ib}$ the row basis index vector corresponding to $\hat{A}_{ib}$. Then, the updated approximation to the new data stream $\hat{A}_{(i+1)b} = P_{(i+1)b} A(I_{(i+1)b}, :)$ satisfies

The basis update procedure will be called $\lceil m/b \rceil$ times during the execution of the algorithm. Taking the following sum yields the contribution to the error due to the basis update when $2\ell + 3b \le m$:



Error due to thinning procedure

Let $P A(I, :)$ be the approximation to the matrix prior to the $i$th thinning, $P P_{\text{thin}}$ the updated coefficient matrix following thinning, and $A(I_{\text{thin}}, :)$ the basis following thinning. Then,

Assuming $2\ell + 3b \le m$, we take the following sum to compute the overall contribution of the thinning subroutine to the approximation error:

Error due to final thinning procedure

Prior to the final thinning, the matrix $P$ is of row dimension $m$ and column dimension at most $\ell$. By the earlier assumption that $2\ell + 3b \le m$, we also have $2\ell + 1 \le m$. Following the final thinning, the approximation rank is $k$. Let $P A(I, :)$ be the approximation to the matrix prior to the final thinning, $P P_{\text{final}}$ the updated coefficient matrix following the final thinning, and $A(I_{\text{final}}, :)$ the basis following the final thinning. Then,

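$$\|P A(I, :) - P P_{\text{final}} A(I_{\text{final}}, :)\|_2 \le \|P\|_2 \, \|A(I, :) - P_{\text{final}} A(I_{\text{final}}, :)\|_2 \le \sqrt{1 + \alpha \ell (m - \ell)} \, \sqrt{1 + \alpha k (\ell - k)} \, \sigma_{k+1}.$$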
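To get a feel for the size of the final-thinning factor, the prefactor multiplying $\sigma_{k+1}$ can be evaluated directly. The snippet below is a hedged sketch: the function names, the choice alpha = 1, and the sample dimensions are assumptions made for illustration, and `asymptotic_factor` simply evaluates the leading-order expression $\ell m^{1/2} k^{1/2}$ that appears in the asymptotic form of the bound.

```python
from math import sqrt


def final_thinning_factor(alpha, ell, m, k):
    """Prefactor of sigma_{k+1} in the final-thinning bound:
    sqrt(1 + alpha*ell*(m - ell)) * sqrt(1 + alpha*k*(ell - k))."""
    if not (1 <= k <= ell <= m):
        raise ValueError("expected 1 <= k <= ell <= m")
    return sqrt(1 + alpha * ell * (m - ell)) * sqrt(1 + alpha * k * (ell - k))


def asymptotic_factor(ell, m, k):
    """Leading-order form ell * m^(1/2) * k^(1/2) (constants dropped)."""
    return ell * sqrt(m) * sqrt(k)


# Illustrative values only (alpha = 1 is an assumption, not from the text).
exact = final_thinning_factor(alpha=1.0, ell=25, m=100, k=5)
rough = asymptotic_factor(ell=25, m=100, k=5)
```

With alpha = 1 and integer sizes of at least 1, the exact prefactor never exceeds the leading-order expression, since 1 + ell*(m - ell) <= ell*m and 1 + k*(ell - k) <= k*ell.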

Combining the contributions from the three subroutines via repeated application of the triangle inequality, we obtain the final bound for the approximation.

Asymptotically, this bound may be interpreted as

$$\|A - \hat{A}\|_2 \lesssim \ell^{1/2} m^{3/2} b^{-1} + p^{1/2} m^{3/2} \sigma_{p+1} + \ell \, m^{1/2} k^{1/2} \sigma_{k+1}.$$

This gives us the result provided in Theorem 4.3.3.
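As a sketch of where the last term of the asymptotic bound comes from, the final-thinning prefactor can be simplified as follows. This assumes $\alpha \ge 0$ is treated as a constant independent of $m$, $\ell$, and $k$, and that $1 \le k \le \ell \le m$; under these assumptions $1 \le \ell m$ and $1 \le k\ell$, so each square root is bounded by its leading-order term:

```latex
\begin{align*}
\sqrt{1 + \alpha \ell (m - \ell)} \, \sqrt{1 + \alpha k (\ell - k)} \, \sigma_{k+1}
  &\le \sqrt{(1 + \alpha)\, \ell m} \, \sqrt{(1 + \alpha)\, k \ell} \, \sigma_{k+1} \\
  &= (1 + \alpha)\, \ell \, m^{1/2} k^{1/2} \, \sigma_{k+1}
  \;\lesssim\; \ell \, m^{1/2} k^{1/2} \, \sigma_{k+1}.
\end{align*}
```

The constant $(1 + \alpha)$ is absorbed into the $\lesssim$ notation.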

4.6 Conclusions

Low-rank approximation and row subset selection are vital to a broad spectrum of applications in computational mathematics and machine learning. As the size of the matrices involved in these applications has grown, so has the importance of reducing RAM usage and memory movement and of building single-pass implementations of these tools.

To address this, we have presented two novel algorithms for single-pass computation of matrix interpolative decompositions: RBR-ID and BLOCK-ID. Both algorithms can be used to solve the RSS problem, as well as provide a low-rank approximation for the original dataset in a single pass over the input.

We provide detailed complexity and error analysis of the two methods, as well as numerical experiments verifying the theoretical results. We show that the proposed algorithms can be employed in single-pass low-rank approximation of data taken from the direct numerical simulation of particle-laden turbulent flows. Applications of this include in situ data compression and reduced order modeling for large-scale simulation of complex systems.

Potential extensions of this work include the incorporation of matrix sketching, randomized sampling (as in, e.g., [19]), and parallelism (particularly with respect to the data stream as in [203]) to increase the efficiency of RBR-ID and BLOCK-ID. Exploring the performance of other self-expressive decomposition algorithms would be a natural extension of this work as well. Finally, another interesting avenue would be to extend this single-pass framework to high-order tensor IDs.

Acknowledgements

The authors would like to thank Lluís Jofre for the datasets used in the numerical experiments.

This work was funded by the United States Department of Energy’s National Nuclear Security Administration under the Predictive Science Academic Alliance Program (PSAAP) II at Stanford University, Grant DE-NA-0002373. The work of AD was also supported by the AFOSR grant FA9550-20-1-0138.

Summary of Linear Methods and Nonlinear Extensions

The goal of this thesis is to derive and analyze pass-efficient methods for compression of large-scale data matrices arising in scientific applications. Two broad strategies are employed: linear subspace and nonlinear manifold learning methods. The linear approaches identify bases for fundamental subspace(s) of the input matrix to obtain a low-rank approximation of the data. The factor matrices comprising the low-rank approximation form a compressed version of the input data.

The nonlinear manifold learning approaches proposed in the appendix to this thesis learn nonlinear embeddings and reconstruction mappings which identify a low-dimensional manifold to approximate the input matrix. This learned manifold often captures a greater fraction of the information in the input matrix in fewer latent dimensions than linear subspace methods. However, manifold learning methods lack the approximation and convergence guarantees of many linear methods. The resulting embeddings and nonlinear reconstruction mappings are also less interpretable than the output of a low-rank approximation.

In the remaining sections of this chapter we briefly summarize the contributions of each of the previous chapters of this thesis and suggest future directions of research. For more detailed conclusions, please see Sections 2.4, 3.10, 4.6, and A.5.

5.1 Linear methods

In Chapters 2-4, we derived linear approximation methods for the low-rank compression of matrices arising in scientific simulation. In Chapter 2, we introduced two methods, one single-pass and the other two-pass, for computing the matrix interpolative decomposition based on forming a coarse grid sketch of a data matrix taken from a physical simulation. A key limitation of the single-pass method presented in this chapter was its need for an associated grid to form the coarse-to-fine grid interpolation operator. In Chapter 3, we generalized the single-pass approach to accommodate any data matrix.

In Chapter 3, we also introduced significantly more theory regarding coarse grid sketches for low-rank compression. A taxonomy of different approaches was presented, demonstrating the benefits and limitations of using such approaches. Further, a novel single-pass power iteration-based algorithm for matrix approximation was presented. Through numerical experiments, the coarse grid sketches were shown to be effective for computing the low-rank SVD and interpolative decomposition of scientific data matrices. Applying our approaches to a large-scale data compression problem, we showed that low-rank matrix methods, used in tandem with the state-of-the-art compressors SZ, FPZIP, and ZFP, can lead to significant enhancement of compression at a minimal loss of accuracy.

In Chapter 4, we derived two novel algorithms for computing matrix interpolative decompositions. Unlike the approaches presented in Chapters 2 and 3, these algorithms involved no sketching, and instead maintained a basis of rows for a data matrix streaming into working memory.

The basis size and approximation error are actively regulated by the algorithms according to user-specified parameters. Through theoretical error bounds and numerical experiments, we demonstrated that the algorithms can match an analogous offline interpolative decomposition algorithm in accuracy while being faster in terms of wall-clock runtime, being more parsimonious in terms of memory usage and movement, and achieving compression in one pass over the entire data stream.

In document Matrix Methods for Low-Rank Compression in Large-Scale Applications (Page 170-176)