TDB − NLA
Numerical Linear Algebra
Self reading: Introduction, dense matrices
Maya Neytcheva, TDB, Feb-March 2021
When talking about the solution of a linear system of equations:
- computational complexity - computer demands (computing time and memory consumption)
- robustness wrt to (problem, discretization and method) parameters
- numerical efficiency(later, for iterative methods - number of iterations)
- parallelization aspects, HPC flavour
Before discussing sparse matrices...
we are going to look first at dense matrices... because these are easier.
Large dense matrices
GOAL: get a global overview of issues related to
direct solution methods:
◮ Gauss elimination
◮ LU factorization, Cholesky factorization ◮ stability, pivoting, errors;
◮ complexity
Large dense matrices
An idea what matrix dimensions might have been considered very large for a dense, direct matrix computation through the years:
n Year Source
20 1950 Wilkinson 200 1965 Forsythe&Moler 2000 1980 LINPACK 20000 1995 LAPACK >200 000 some years ago (Umeå) J. Wilkinson, The algebraic eigenvalue problem, 1965
G. Forsythe& C. Moler, Computer solutions of linear algebraic systems, 1967.
Computational complexity issues: Cramer’s rule
Ax = b, A(n × n), det(A) 6= 0
a11 · · · a1,i−1 a1,i a1,i+1 · · · a1,n
· · · · ai1 · · · ai,i−1 ai,i ai,i+1 · · · ai,n
· · · · an1 · · · an,i−1 an,i an,i+1 · · · an,n
x1 · · · xi · · · xn = b1 · · · bi · · · bn
Computational complexity issues: Cramer’s rule
xi = det(A)1 a11 · · · a1,i−1 b1 a1,i+1 · · · a1,n · · · · ai1 · · · ai,i−1 bi ai,i+1 · · · ai,n
· · · · an1 · · · an,i−1 bn an,i+1 · · · an,n
Computational complexity issues
Consider products of n elements of A,
a1,α1,a2,α2,· · · , an,αn,
where α1, α2,· · · , αn is a permutation of 1, 2, · · · , n.
The number of all these products is n! . det(A) = X i=1 n! n Y j=1 (−1)γaj,αj,
thus, the computational complexity to solve the system is n!. To be more precise: (n + 1)(n!) = (n + 1)! multiplications and (n + 1)(n!) = (n + 1)! additions.
Gauss elimination/LU factorization: A(m, n) for k = 1, 2 · · · m − 1 d = 1/a(k)kk for i = k + 1, · · · m ℓ(ikk)=−a(ikk)d for j = k + 1, · · · n a(k)ij =a(k)ij + ℓika(k)kj end end end
The operational count for the LU factorization can be obtained by integrating the loops:
FlopsLU= m−1Z 1 m Z k n Z k djdidk≈ n3/3 (m = n)
Computational complexity issues: computing det(A)
Method Multiplications Additions
Gaussian Elimination 13n3+n2− 1
3n 13n3+ 12n2− 56n
Gauss-Jordan Elimination 13n3+n2− 52n + 2 13n3− 32n2+1
Cramer’s Rule n! n!
Computational complexity issues: Cramer against Gauss
A comparison of the amount of time to solve Ax = b on a Cray J90. The Cray J90 performs one trillion operations per second (one teraflop).
n Gaussian Elimination Cramer’s Rule 2 6 x 10−12secs 6 x 10−12secs 3 1.7 x 10−11secs 2.4 x 10−11secs 4 3.6 x 10−11secs 1.2 x 10−10secs 5 6.5 x 10−11secs 7.2 x 10−10secs 6 1.06 x 10−10secs 5.04 x 10−09secs 10 4.3 x 10−10secs 3.99168 x 10−05secs 20 3.06 x 10−9secs 1.622 years 100 3.433 x 10−7secs 2.9889 x 10138centuries 1000 3.3433 x 10−4secs
Computational complexity issues: hardware development
Top500, November 2021, no 1:
Supercomputer Fugaku - Supercomputer Fugaku, A64FX 48C 2.2GHz, Fujitsu
RIKEN Center for Computational Science
- performance of 537.212 petaflops on High Performance Linpack, - Tofu interconnect D
- 7,630,848 cores
Factorials...
In 2001, the value of 1000! was currently too large to be stored as a single number in the memory of a computer.
(Computational Science: Tools for a Changing World by R.A. Tapia, C. Lanius, 2001, Rice.)
The scientific calculator in Windows XP is able to calculate factorials up to at least 100000!.
(look-up tables)
Dense LU remains an active field of research:
"Dense Matrix Factorization of Linear Complexity
for Impedance Extraction of Large-Scale 3-D Integrated Circuits" Wenwen Chai, Dan Jiao School of Electrical and Computer Engineering, Purdue University, IEEE Xplore, July 2010
Abstract: A fast LU factorization of linear complexity is developed to directly solve a dense system of linear equations for the interconnect extraction of any arbitrary shaped 3-D structure embedded in
inhomogeneous materials. The proposed solver successfully factorizes dense matrices that involve more than one million unknowns in fast CPU run time and modest memory consumption. Comparisons with
state-of-the-art integral equation- based interconnect extraction tools have demonstrated its clear advantages.
Dense LU remains an active field of research:
Programming parallel dense matrix factorizations with look-ahead and OpenMP
Sandra Catalán, Adrián Castelló, Francisco D. Igual, Rafael Rodríguez-Sánchez Enrique S. Quintana-Ortí
Cluster Computing, 23, 359–375(2020)
We investigate a parallelization strategy for dense matrix
factorization (DMF) algorithms, using OpenMP, that departs from the legacy (or conventional) solution ...
Stability of Gauss elimination
Factorizing symmetric positive definite matrices
Factorize A = LLT, L – lower-triangular
Cholesky factorization
Factorizing symmetric marices
The mathematician after whom the Cholesky factorisation is named.
Major Andre-Louis Cholesky (1875-1918)
Born in France, worked in the Geodesic section of the Geographic service to the French army’s artillery branch.
At this time the system of triangulation used in France, and based on the meridian line of Paris, was being revised; new methods were needed in order to facilitate what was not yet a quick or convenient process.
Cholesky invented computation procedures based on the method of least squares, for the solution of certain data-fitting problems in geodesy, to be put into practice in his triangulation of the French and British parts of Crete, and in his work in Algeria and Tunisia. His mathematical work was posthumously published on his behalf in 1924 by a fellow officer, Benoit.
Cholesky factorization ...
% Maya’s version of Cholesky - to compare execution time
% ---function [U]=my_chol(A) A = triu(A); n = size(A,1); for k=1:n, A(1:k-1,k) = A(1:k-1,1:k-1)’\A(1:k-1,k); A(k,k) = sqrt(A(k,k) - A(1:k-1,k)’*A(1:k-1,k)); end
U = triu(A); return
Cholesky factorization ...
size(A) chol chol Ratio Matlab mine 10 0.000292 0.004360 14.9315 50 0.000183 0.6267 0.002697 0.6186 14.7377 100 0.000327 1.7869 0.002305 0.8547 7.0489 500 0.002132 6.5199 0.264100 114.5770 123.8724 1000 0.008465 3.9705 0.970080 3.6732 114.5987 5000 0.583840 68.9711 161.698800 166.6860 276.9573
Outer Product Cholesky
for k = 1 : n
A(k, k) =pA(k, k)
A(k + 1 : n, k) = A(k + 1 : n, k) − A(n : k, k − 1)/A(k, k) for j = k + 1 : n
A(j : n, j) = A(j : n, j) − A(j : n, j)A(j, k) end
end
Example of implementing Cholesky factorization
for k=1:n
xeuitb(A(1:k-1,k),A(1:k-1,1:k-1),A(1:k-1,k)) A(k,k) = sqrt(A(k,k) - A(1:k-1,k)^T*A(1:k-1,k)) end
Computes U (which overwrites A).