Introduction - Matrix Methods for Low-Rank Compression in Large-Scale Applications

Data compression is an increasingly vital tool in dealing with the over-abundance of data in modern computing. As an example, simulations today can easily generate 1TB/s of data [207], which typically cannot be stored in RAM or even disk memory. Due to the exorbitant size of such data, significant compression is necessary in order to enable, e.g., visualization, inference, and uncertainty quantification (UQ). Techniques for data compression can be categorized as lossless or lossy. Lossless compression methods provide an exact reconstruction of the original data, but this frequently comes at the cost of small compression ratios. Lossy data compression methods, on the other hand, generate significantly compressed formats of data with the trade-off of error in reconstruction. This has made them ideal for addressing the challenge of compressing ever larger datasets generated in modern experimentation and simulation.

Examples of modern lossy compression algorithms include SZ [64, 227, 156], FPZIP [161], ZFP [160], and ISABELA [144]. Transform-based approaches, such as the discrete cosine transform (DCT) [262], wavelet-based compression approaches [51, 85, 226], and the Karhunen-Loève transform [165, 231], have also been applied successfully to problems in data compression. Low-rank approximation, and more broadly matrix decompositions, have been used for data compression in works such as [44, 9, 238, 75]. Generalizing approaches for two-dimensional data arrays (i.e., matrices), tensor decomposition algorithms [114, 240, 140, 60, 245, 8] identify low-dimensional representations of higher-order data, e.g., videos. For a comprehensive review of lossy data compression, as well as other categories of data compression, we refer the reader to the review paper [153].

Low-rank matrix approximation methods lay the foundation for the method proposed in this work. These approaches exploit the fundamental observation that a large fraction of variance in many data matrices can be captured on a low-dimensional linear subspace of the range of the matrix.

Linear dimensionality reduction methods for data matrices include, but are not limited to, the singular value decomposition (SVD) [77], known as principal component analysis (PCA) in the machine learning and statistics literature or proper orthogonal decomposition (POD) in the computational fluid dynamics (CFD) literature; self-expressive decompositions such as the interpolative decomposition (ID) [44, 176] and CUR decomposition [73, 169, 26]; and sparsifying decompositions [76].

Whether a linear dimensionality reduction approach identifies a low-rank approximation or sparse approximation, it is strictly limited to identifying a linear subspace when compressing data.

Low-rank methods such as SVD are therefore particularly useful in compressing simulation data when the system of concern features a fast-decaying Kolmogorov n-width, e.g., diffusion-dominated problems [146]. In such scenarios, an SVD (or similar low-rank approximation) will often capture a significant fraction of the variance in the data using far fewer dimensions than the original, high-dimensional feature vectors occupy. This, combined with the interpretability of the low-dimensional linear subspace in the context of the original problem, has made linear dimensionality reduction methods broadly accepted tools within the scientific community.
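To make the low-rank compression idea concrete, the following is a minimal NumPy sketch of truncated-SVD compression. It is an illustration only, not the method proposed in this work: the synthetic matrix, the 99% energy threshold, and the helper name truncated_svd_compress are all assumptions chosen for the example.

```python
import numpy as np

def truncated_svd_compress(X, energy=0.99):
    """Return rank-k SVD factors capturing at least `energy` of ||X||_F^2.

    Hypothetical helper for illustration; the threshold is an assumed parameter.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    # Smallest k whose leading singular values reach the requested energy.
    k = min(int(np.searchsorted(cumulative, energy)) + 1, s.size)
    return U[:, :k], s[:k], Vt[:k, :]

# Synthetic data (assumed): an approximately rank-5 matrix plus small noise.
rng = np.random.default_rng(0)
m, n, true_rank = 500, 200, 5
X = rng.standard_normal((m, true_rank)) @ rng.standard_normal((true_rank, n))
X += 1e-3 * rng.standard_normal((m, n))

U_k, s_k, Vt_k = truncated_svd_compress(X, energy=0.99)
X_hat = (U_k * s_k) @ Vt_k                      # low-rank reconstruction

k = s_k.size
stored = U_k.size + s_k.size + Vt_k.size        # numbers kept after compression
compression_ratio = X.size / stored
rel_error = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
print(f"rank={k}, compression ratio={compression_ratio:.1f}, relative error={rel_error:.2e}")
```

Storing the three factors requires k(m + n + 1) numbers instead of mn, which is where the compression comes from when k is much smaller than both m and n.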

Despite their many advantages, linear dimensionality reduction methods are not without limitations. In many applications, the data may be more optimally captured on a low-dimensional manifold, a locally linear but globally nonlinear space. Recent advances in nonlinear dimensionality reduction, specifically those brought on by research in deep learning and the development of software such as TensorFlow [1] and PyTorch [199], have enabled the identification of low-dimensional manifolds characterizing ever larger-scale datasets. Advection-dominated systems (such as those found in turbulence studies), or systems featuring shocks (such as those found in hypersonics), exhibit slowly decaying Kolmogorov n-widths [146]. In such scenarios, nonlinear dimensionality reduction methods have the potential to significantly outperform linear methods. However, the low-dimensional latent space learned by such approaches, i.e., the generalized coordinates of the low-dimensional nonlinear manifold, is often far less interpretable in the context of the original problem. This has made the community hesitant to replace the existing state-of-the-art linear subspace-based approaches with nonlinear manifold learning methods, despite the clear advantages of using nonlinear methods over linear approaches in certain applications.
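The contrast between diffusion-dominated and advection-dominated data can be illustrated with a small numerical experiment. The sketch below uses singular value decay of a snapshot matrix as a practical proxy for the Kolmogorov n-width. The two analytic pulse profiles, the grid sizes, the 99.9% energy threshold, and the helper rank_for_energy are assumptions made for illustration and are not taken from the cited works.

```python
import numpy as np

def rank_for_energy(snapshots, energy=0.999):
    """Number of singular values needed to capture `energy` of ||A||_F^2."""
    s = np.linalg.svd(snapshots, compute_uv=False)
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    return min(int(np.searchsorted(cumulative, energy)) + 1, s.size)

x = np.linspace(0.0, 1.0, 256)[:, None]   # spatial grid, one row per grid point
t = np.linspace(0.0, 1.0, 100)[None, :]   # snapshot times, one column per snapshot
sigma0 = 0.02                             # initial pulse width (assumed)

# Diffusion-like data: a Gaussian pulse that stays centered and spreads in time.
nu = 0.002                                # assumed diffusivity
variance = sigma0**2 + 2.0 * nu * t
diffusion = np.exp(-(x - 0.5) ** 2 / (2.0 * variance)) / np.sqrt(variance)

# Advection-like data: the same pulse translating at constant speed.
speed = 0.6                               # assumed advection speed
advection = np.exp(-(x - (0.2 + speed * t)) ** 2 / (2.0 * sigma0**2))

r_diff = rank_for_energy(diffusion)
r_adv = rank_for_energy(advection)
print(f"modes for 99.9% energy: diffusion={r_diff}, advection={r_adv}")
```

In this toy setting the translating pulse needs noticeably more linear modes than the spreading pulse to reach the same energy threshold, which is the behavior that motivates nonlinear methods for advection-dominated problems.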

Because the focus of this chapter is data compression, it is vital that we not only identify nonlinear embeddings which reduce the dimension of the dataset, but also construct preimage mappings which reconstruct approximations to the original data. To the best of our knowledge, methods which provide such preimage mappings are kernel PCA (KPCA) [216], self-organizing maps [138], generative topographic mapping [22], diffeomorphic dimensionality reduction [247], Gaussian process latent variable models (GPLVMs) [145], and autoencoders (AEs) [113]. It is also paramount that we provide scalable, numerically stable, and generalizable methods for constructing preimage mappings from nonlinear, low-dimensional feature spaces identified by the nonlinear dimensionality reduction methods. Because they satisfy all desired properties, AEs were selected from the aforementioned methods for implementation in this work.

An AE is an auto-associative neural network which generalizes PCA and SVD via the introduction of, e.g., nonlinear activation functions [139, 61, 113]. It comprises two main components: an encoder, which (usually) reduces the dimension of the input by mapping it into a learned latent space, and a decoder, which maps the reduced-dimensional latent data back to the original feature space to form an approximation to the input. A key benefit of the encoder-decoder structure of the AE is that reconstruction of the compressed data is a built-in feature. The encoder reduces the dimensionality of the data, while the corresponding decoder attempts to reconstruct the original data from the low-dimensional nonlinear feature space. Because AEs introduce nonlinearity via activation functions, they often outperform linear approaches like PCA and SVD in matrix dimensionality reduction when the singular value decay of the matrix is slow due to nonlinearity in the system. Moreover, a single-layer unbiased AE with linear activations will identify the same subspace as that spanned by the modes of the optimal linear embedding given by PCA. In this sense, AEs can be thought of as a generalization of PCA.
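The relationship between a linear AE and PCA can be checked numerically. The sketch below fits a single-layer, bias-free linear encoder-decoder pair and compares its learned subspace with the leading PCA modes. Note the assumptions: the data are synthetic, and the weights are fit by alternating least squares rather than by the gradient-based training typically used for AEs, a substitution made only to keep the example short and deterministic. The names W_enc and W_dec are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, latent_dim = 200, 10, 2

# Synthetic data (assumed): two dominant directions, randomly rotated, then centered.
scales = np.array([10.0, 5.0] + [0.1] * (n_features - 2))
rotation, _ = np.linalg.qr(rng.standard_normal((n_features, n_features)))
X = (rng.standard_normal((n_samples, n_features)) * scales) @ rotation
X -= X.mean(axis=0)

# PCA reference: leading right singular vectors of the centered data.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
V_k = Vt[:latent_dim].T                            # shape (n_features, latent_dim)

# Linear AE without biases: codes Z = X @ W_enc, reconstruction Z @ W_dec.
W_enc = rng.standard_normal((n_features, latent_dim))
for _ in range(30):
    Z = X @ W_enc                                  # encode
    W_dec = np.linalg.lstsq(Z, X, rcond=None)[0]   # best decoder for these codes
    W_enc = np.linalg.pinv(W_dec)                  # best encoder for this decoder

# Cosines of the principal angles between the encoder subspace and the PCA subspace.
Q_enc, _ = np.linalg.qr(W_enc)
cosines = np.linalg.svd(V_k.T @ Q_enc, compute_uv=False)

X_ae = X @ W_enc @ W_dec                           # linear AE reconstruction
X_pca = X @ V_k @ V_k.T                            # rank-k PCA reconstruction
print("principal-angle cosines:", cosines)
print("max reconstruction difference:", np.abs(X_ae - X_pca).max())
```

The encoder weights themselves need not equal the PCA modes; only the subspace they span (and hence the reconstruction) agrees, which is why the comparison uses principal angles rather than an entrywise match of the weights.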

AEs have been used successfully in lossy image compression, where they have been shown to compete with existing state-of-the-art approaches such as JPEG-2000 [230]. Other examples of AE-based image compression include the works [45, 268, 13, 14, 127, 43]. The authors of [47] present a variational AE-based compression scheme for data extracted from simulations of plasma physics. Nonlinear PCA, which is quite similar to AEs, has also been used to compress seismic data [209]. In [184], the authors present a physics-based AE for enhancing the low-rank approximation (and therefore low-rank compression) of convection-dominated PDEs. AEs have been used successfully in other reduced order modeling (ROM) works, including but not limited to [133, 97, 228, 112, 135, 136, 146, 123, 179, 162, 260, 88, 57, 259, 90]. ROM, though a different application than data compression, bears many structural similarities which suggest that if AEs are successful in ROM, they will be of use in compression as well. In the following subsection, we highlight AE-based data compression schemes tailored to scientific data. These methods have all been published within the past few years and constitute the state of the art in scientific data compression based on deep learning; they also bear the greatest resemblance to the efforts of this work.

A.1.1 Contribution of this work

This work presents an entirely single-pass data compression algorithm which trains a fully connected AE, comprising a nonlinear embedding (encoder) and a nonlinear reconstruction mapping (decoder), on a sketch of simulation data. By introducing the sketching procedure, we improve the scalability of fully connected AEs and enable a single-pass implementation of them. Recent works have presented methods using convolutional AEs for in situ spatial compression of large-scale
