
Tensor Least Angle Regression for Sparse Representations of Multidimensional Signals

Ishan Wickramasingha wickrami@myumanitoba.ca
Ahmed Elrewainy elrewaia@myumanitoba.ca

Department of Electrical and Computer Engineering, University of Manitoba, Winnipeg, MB, R3T 5V6, Canada

Michael Sobhy sobhym@myumanitoba.ca

Biomedical Engineering Program, University of Manitoba, Winnipeg, MB, R3T 2N2, Canada

Sherif S. Sherif Sherif.Sherif@umanitoba.ca

Department of Electrical and Computer Engineering, University of Manitoba, Winnipeg, MB, R3T 5V6, Canada

Biomedical Engineering Program, University of Manitoba, Winnipeg, MB, R3T 2N2, Canada

Sparse signal representations have gained much interest recently in both signal processing and statistics communities. Compared to orthogonal matching pursuit (OMP) and basis pursuit, which solve the L0 and L1 constrained sparse least-squares problems, respectively, least angle regression (LARS) is a computationally efficient method to solve both problems for all critical values of the regularization parameter λ. However, none of these methods is suitable for solving large multidimensional sparse least-squares problems, as they would require extensive computational power and memory. An earlier generalization of OMP, known as Kronecker-OMP, was developed to solve the L0 problem for large multidimensional sparse least-squares problems. However, its memory usage and computation time increase quickly with the number of problem dimensions and iterations. In this letter, we develop a generalization of LARS, tensor least angle regression (T-LARS), that could efficiently solve either large L0 or large L1 constrained multidimensional, sparse, least-squares problems (underdetermined or overdetermined) for all critical values of the regularization parameter λ and with lower computational complexity and memory usage than Kronecker-OMP. To demonstrate the validity and performance of our T-LARS algorithm, we used it to successfully obtain different sparse representations of two relatively large 3D brain images, using fixed and learned separable overcomplete dictionaries, by solving both L0 and L1 constrained sparse least-squares problems. Our numerical experiments demonstrate that our T-LARS algorithm is significantly faster (46 to 70 times) than Kronecker-OMP in obtaining K-sparse solutions for multilinear least-squares problems. However, the K-sparse solutions obtained using Kronecker-OMP always have a slightly lower residual error (1.55% to 2.25%) than ones obtained by T-LARS. Therefore, T-LARS could be an important tool for numerous multidimensional biomedical signal processing applications.

Neural Computation 32, 1697–1732 (2020) © 2020 Massachusetts Institute of Technology https://doi.org/10.1162/neco_a_01304

1 Introduction

Sparse signal representations have gained much interest recently in both signal processing and statistics communities. A sparse signal representation usually results in simpler and faster processing, in addition to a lower memory storage requirement for fewer coefficients (Mallat, 2009; Wickramasingha, Sobhy, & Sherif, 2017). However, finding optimal sparse representations for different signals is not a trivial task (Mallat, 2009). Therefore, redundant signal representations using overcomplete dictionaries have been introduced to facilitate finding more sparse representations for different signals (Mallat & Zhang, 1993; Mallat, 2009; Pati, Rezaiifar, & Krishnaprasad, 1993; Wickramasingha et al., 2017). Undercomplete dictionaries could also be used to obtain approximate signal representations (Kreutz-Delgado et al., 2003; Malioutov, Çetin, & Willsky, 2005; Tosic & Frossard, 2011). We note that at their core such signal representation problems typically involve solving a least-squares problem (Donoho, 2006; Donoho & Elad, 2003).

A number of methods have been proposed to solve the sparse least-squares problem, including the Method of Frames (MOF; Daubechies, 1990), Matching Pursuit (MP; Mallat & Zhang, 1993), Orthogonal Matching Pursuit (OMP; Pati et al., 1993), Best Orthogonal Basis (BOB; Coifman & Wickerhauser, 1992), Lasso (also known as Basis Pursuit, BP; Chen, Donoho, & Saunders, 1998; Efron et al., 2004), and Least Angle Regression (LARS; Efron et al., 2004). MP and OMP obtain sparse signal representations by solving a nonconvex L0 constrained least-squares problem (Tropp, 2004). Both are heuristic methods that construct sparse signal representations by sequentially adding atoms from a given dictionary in a greedy (i.e., nonglobally optimal) manner. BP relaxes the nonconvex L0 constrained optimization problem to solve a convex L1 constrained least-squares problem instead (Chen et al., 1998). In both problem formulations, a regularization parameter λ determines the trade-off between the representation error of the signal and its sparsity, as shown in the Pareto curve in van den Berg and Friedlander (2009). A common approach to obtaining a sparse signal representation using BP is to solve the optimization problem multiple times for


different values of λ, before choosing the most suitable solution for the application at hand (van den Berg & Friedlander, 2009).

Compared to the above methods, LARS efficiently solves the L0 or, with a slight modification, the L1 constrained least-squares problem for all critical values of the regularization parameter λ (Efron et al., 2004). However, even LARS is not suitable for large-scale problems, as it would require multiplication and inversion of very large matrices (Yang, Peng, Xu, & Dai, 2009). For example, for m unknown variables and n equations, the LARS algorithm has O(m^3 + nm^2) computational complexity (Efron et al., 2004).

LARS and other algorithms to solve sparse least-squares problems are directly applicable to one-dimensional signals. Therefore, multidimensional signals that are represented by tensors (i.e., multidimensional arrays) would need to be vectorized first to enable the application of these methods (Acar, Dunlavy, & Kolda, 2011; Cichocki et al., 2015; Kolda, 2006). For

I^N vectorized variables, a dictionary of size I^N × J^N, where J > I for an overcomplete dictionary, is required to solve the sparse linear least-squares problem using LARS and the other algorithms mentioned above. N is the order of the tensor, also known as the number of modes, and I is the dimension of each mode. Therefore, the number of vectorized unknown variables I^N would increase exponentially with the order of the tensor N. Such problems would quickly become increasingly large and computationally intractable. For example, a 3D tensor with 100 unknown variables in each mode has 1 million unknowns, which requires a dictionary with at least 1 trillion (10^12) elements; a 4D tensor with 100 unknown variables in each mode has 100 million unknowns, which requires a dictionary of at least 10 quadrillion (10^16) elements.

Mathematically separable signal representations (i.e., those using separable dictionaries) have typically been used for multidimensional signals because they are simpler and easier to obtain than nonseparable representations (Caiafa & Cichocki, 2012; Sulam, Ophir, Zibulevsky, & Elad, 2016). Caiafa and Cichocki (2012) introduced Kronecker-OMP, a generalization of OMP that could represent multidimensional signals represented by tensors using separable dictionaries. They also developed the N-BOMP algorithm to exploit block-sparse structures in multidimensional signals. However, similar to OMP, Kronecker-OMP could obtain only an approximate nonglobally optimal solution of the nonconvex L0 constrained sparse least-squares problem (Elrewainy & Sherif, 2019; van den Berg & Friedlander, 2009). Also, there is currently no computationally efficient method to obtain a sparse representation of a multidimensional signal by solving the convex L1 constrained sparse least-squares problem for all critical values of the regularization parameter λ. However, two of us, Elrewainy and Sherif, have developed the Kronecker least angle regression (K-LARS) algorithm to efficiently solve either large L0 or large L1 sparse least-squares problems (overdetermined) with a particular Kronecker form A ⊗ I, for all critical values of the regularization parameter λ (Elrewainy & Sherif, 2019). They used


K-LARS to sparsely fit one-dimensional multichannel, hyperspectral imaging data to a Kronecker model A ⊗ I.

In this letter, we develop a generalization of K-LARS, tensor least angle regression (T-LARS), which could efficiently solve either large L0 or large L1 multidimensional sparse least-squares problems (underdetermined or overdetermined) for all critical values of the regularization parameter λ. The balance of the letter is organized as follows: Section 2 provides a brief introduction to tensors, tensor operations, and multilinear sparse least-squares problems. In section 3, we review LARS and describe our T-LARS algorithm in detail. Section 4 presents the computational complexity of our T-LARS algorithm and compares its computational complexity with that of Kronecker-OMP. Section 5 provides results of experiments applying both T-LARS and Kronecker-OMP. We present our conclusions in section 6.

2 Problem Formulation

2.1 Tensors and Multilinear Transformations. The term tensor has a specific mathematical definition in physics, but it has been widely accepted in many disciplines (e.g., signal processing and statistics) to mean a multidimensional array. Therefore, a vector is a first-order tensor, and a matrix is a second-order tensor. An N-dimensional array is an Nth-order tensor whose N dimensions are also known as modes (Hawe, Seibert, & Kleinsteuber, 2013; Sidiropoulos et al., 2017). The Nth-order tensor X ∈ R^(I1×...×In×...×IN) has N modes, with dimensions I1, I2, ..., IN, where vectors along a specific mode n are called mode-n fibers.

Vectorization and mode-n matricization of tensors (Hawe et al., 2013; Sidiropoulos et al., 2017) are two important tensor reshaping operations. As the names imply, vectorization of a tensor generates a vector and matricization of a tensor generates a matrix. Tensors are vectorized by stacking mode-1 fibers in reverse lexicographical order, where this vectorization is denoted vec(X):

    X \in \mathbb{R}^{I_1 \times \cdots \times I_n \times \cdots \times I_N} \longrightarrow \mathrm{vec}(X) \in \mathbb{R}^{I_N I_{N-1} \cdots I_1}.

In mode-n tensor matricization, mode-n fibers become the columns of the resulting matrix. We note that such ordering of these columns is not consistent across the literature (Kolda & Bader, 2009; Kolda, 2006). In this letter, we use reverse lexicographical order (IN ... In+1 In−1 ... I1) for the column ordering in mode-n tensor matricization. In such reverse lexicographical order, I1 varies the fastest and IN varies the slowest. Let X_(n) denote the mode-n matricization of a tensor X:

    X \in \mathbb{R}^{I_1 \times \cdots \times I_n \times \cdots \times I_N} \longrightarrow X_{(n)} \in \mathbb{R}^{I_n \times (I_N \cdots I_{n+1} I_{n-1} \cdots I_1)}.
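As an illustration, the following Matlab snippet (our own sketch, not part of algorithm 2) builds the vectorization and a mode-n matricization of a small tensor under the ordering convention just described, where I1 varies the fastest:

    % Vectorization and mode-n matricization of a small 3rd-order tensor,
    % using the reverse lexicographical ordering described above.
    X = randn(4, 5, 3);          % example tensor with I1 = 4, I2 = 5, I3 = 3
    x = X(:);                    % vec(X): stacks mode-1 fibers, I1 varies fastest

    n = 2;                       % mode along which to matricize
    N = ndims(X);
    Xn = reshape(permute(X, [n, 1:n-1, n+1:N]), size(X, n), []);
    % Xn is 5 x 12; its columns are the mode-2 fibers of X, ordered with
    % I1 varying fastest and I3 varying slowest.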


Another important operation that could be performed on a tensor is its mode-n product, that is, its multiplication by a matrix along one of its modes. Let X ∈ R^(I1×...×In×...×IN) be a tensor and Φ^(n) ∈ R^(Jn×In) a matrix; then their mode-n product, Y_n ∈ R^(I1×...×Jn×...×IN), is denoted as

    Y_n = X \times_n \Phi^{(n)},    (2.1)

where all mode-n fibers of the tensor are multiplied by the matrix Φ^(n). Equation 2.1 could also be written as Y_(n) = Φ^(n) X_(n), where Y_(n) and X_(n) are mode-n matricizations of tensors Y and X, respectively. A mode-n product could be thought of as a linear transformation of the mode-n fibers of a tensor. Therefore, a multilinear transformation of a tensor X could be defined as

    Y = X \times_1 \Phi^{(1)} \times_2 \Phi^{(2)} \times_3 \cdots \times_N \Phi^{(N)},    (2.2)

where Φ^(n); n ∈ {1, 2, ..., N} are matrices with dimensions Φ^(n) ∈ R^(Jn×In); n ∈ {1, 2, ..., N}, and Y ∈ R^(J1×...×Jn×...×JN) (Hawe et al., 2013; Sidiropoulos et al., 2017). This multilinear transformation could also be written as a product of vec(X) and the Kronecker product of matrices Φ^(n); n ∈ {1, 2, ..., N} (Hawe et al., 2013; Sidiropoulos et al., 2017):

    \mathrm{vec}(Y) = \left( \Phi^{(N)} \otimes \Phi^{(N-1)} \otimes \cdots \otimes \Phi^{(1)} \right) \mathrm{vec}(X).    (2.3)

We note that equation 2.3 is a linear system relating x = vec(X) and y = vec(Y). If matrix Φ represents a separable dictionary, that is,

    \Phi = \left( \Phi^{(N)} \otimes \Phi^{(N-1)} \otimes \cdots \otimes \Phi^{(2)} \otimes \Phi^{(1)} \right),    (2.4)

then equation 2.3 describes a representation of y = vec(Y) using a dictionary Φ, where x = vec(X) represents its coefficients (y = Φx). Similarly, we could think of equation 2.2 as a representation of tensor Y using dictionaries Φ^(n); n ∈ {1, 2, ..., N}, where tensor X represents its coefficients.
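To make the equivalence of equations 2.2 and 2.3 concrete, the following Matlab sketch (our own example, using small random matrices) applies a multilinear transformation mode by mode and checks it against the Kronecker-product form acting on vec(X):

    % Multilinear transformation (eq. 2.2) versus Kronecker form (eq. 2.3)
    I = [4, 5, 3];  J = [6, 7, 2];
    Phi = {randn(J(1), I(1)), randn(J(2), I(2)), randn(J(3), I(3))};
    X = randn(I);

    % Mode-n products, applied one mode at a time via matricization.
    Y = X;
    for n = 1:3
        rest = setdiff(1:3, n);
        sz = size(Y);
        Yn = Phi{n} * reshape(permute(Y, [n, rest]), sz(n), []);
        Y  = ipermute(reshape(Yn, [J(n), sz(rest)]), [n, rest]);
    end

    % Kronecker form: vec(Y) = (Phi3 kron Phi2 kron Phi1) * vec(X).
    y_kron = kron(Phi{3}, kron(Phi{2}, Phi{1})) * X(:);
    norm(Y(:) - y_kron)      % agrees up to round-off (~1e-14)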

2.2 Sparse Multilinear Least-Squares Problem. A sparse multilinear representation of equation 2.3 could be obtained by rewriting it as an Lp minimization problem (Wickramasingha et al., 2017),

    \tilde{x} = \arg\min_{x} \left\| \left( \Phi^{(N)} \otimes \cdots \otimes \Phi^{(n)} \otimes \cdots \otimes \Phi^{(1)} \right) \mathrm{vec}(X) - \mathrm{vec}(Y) \right\|_2^2 + \lambda \left\| \mathrm{vec}(X) \right\|_p.    (2.5)


Alternatively, using equations 2.2 and 2.3, equation 2.5 could be written as

    \tilde{X} = \arg\min_{X} \left\| X \times_1 \Phi^{(1)} \times_2 \Phi^{(2)} \times_3 \cdots \times_N \Phi^{(N)} - Y \right\|_2^2 + \lambda \left\| X \right\|_p,    (2.6)

where Y ∈ R^(J1×...×Jn×...×JN) and X ∈ R^(I1×...×In×...×IN).

For the same values of p, the Lp minimization problems in equations 2.5 and 2.6 are equivalent, even though their formulations are in vector and tensor forms, respectively. We note that for 0 ≤ p < 1, equations 2.5 and 2.6 are nonconvex optimization problems, while for p ≥ 1, they are convex optimization problems. The sparsest solution of equations 2.5 and 2.6 would be obtained when p = 0, that is, the L0 constrained problem, but its sparsity would be reduced as p increases.

The L0 minimization problem (p = 0) is a nonconvex optimization problem whose vector formulation, problem 2.5, could be solved approximately (i.e., nonglobally) using OMP or LARS (van den Berg & Friedlander, 2009). The L1 minimization problem (p = 1) is a convex optimization problem whose vector formulation, problem 2.5, could be exactly (i.e., globally) solved using BP or LARS (Donoho & Tsaig, 2008). However, OMP, BP, and LARS share a serious drawback: they are not suitable for solving very large sparse least-squares problems because they involve multiplying and inverting very large matrices.

As a way of overcoming this drawback, Caiafa and Cichocki (2012) proposed Kronecker-OMP, a tensor-based generalization of OMP for solving sparse multilinear least-squares problems. However, similar to OMP, Kronecker-OMP could obtain only a nonglobally optimal solution of the nonconvex L0 constrained sparse least-squares problem (Donoho & Tsaig, 2008).

3 Tensor Least Angle Regression (T-LARS)

Least angle regression (LARS) is a computationally efficient method to solve an L0 or L1 constrained minimization problem in vector form, problem 2.5, for all critical values of the regularization parameter λ (Efron et al., 2004). In this letter, we develop a generalization of LARS, tensor least angle regression (T-LARS), to solve large sparse tensor least-squares problems to, for example, obtain sparse representations of multidimensional signals using a separable dictionary as described by equation 2.3. As shown below, our T-LARS calculations are performed without explicitly generating or inverting large matrices, thereby keeping its computational complexity and memory requirements relatively low. Both the T-LARS and Kronecker-OMP algorithms use the Schur complement inversion formula to update the inverses of large matrices without explicitly inverting them (Caiafa & Cichocki, 2012).


3.1 Least Angle Regression. LARS solves the L0 or L1 constrained minimization problem in equation 2.5 for all critical values of the regularization parameter λ. It starts with a very large value of λ that results in an empty active columns matrix Φ_I and a solution x̃_{t=0} = 0. The set I denotes an active set of the dictionary Φ, that is, the column indices where the optimal solution x̃_t at iteration t is nonzero, and I^c denotes its corresponding inactive set. Therefore, Φ_I contains only the active columns of the dictionary Φ and Φ_{I^c} contains only its inactive columns.

At each iteration t, a new column is either added (L0) to the active set I, or added to or removed (L1) from the active set I, and λ is reduced by a calculated value δ_t^*. As a result of such iterations, new solutions with an increasing number of coefficients that follow a piecewise linear path are obtained, until a predetermined residual error ε is reached. One important characteristic of LARS is that the current solution at each iteration is the optimum sparse solution for the selected active columns.

Initialization of LARS includes setting the active set to an empty set, I = { }, the initial solution vector x̃_0 = 0, the initial residual vector r_0 = y, and the initial regularization coefficient λ_1 = max(c_1), where c_1 = Φ^T r_0. The optimal solution x̃_t at any iteration t must satisfy the following two optimality conditions:

    \left\| \Phi_{I^c}^{T} r_t \right\|_{\infty} \le \lambda_t,    (3.1)

    \Phi_{I}^{T} r_t = -\lambda_t z_t,    (3.2)

where r_t is the residual vector at iteration t, r_t = y − Φ x̃_t, and z_t is the sign sequence of c_t on the active set I.

The condition in equation 3.2 ensures that the magnitude of the correlation between all active columns and the residual is equal to |λ_t| at each iteration, and the condition in equation 3.1 ensures that the magnitude of the correlation between the inactive columns and the residual is less than or equal to |λ_t|.

For an L1 constrained minimization problem, at each iteration, if an inactive column violates condition 3.1, it is added to the active set, and if an active column violates condition 3.2, it is removed from the active set. For an L0 constrained minimization problem, only the columns that violate condition 3.1 are added to the active set at each iteration.

For a given active set I, the optimal solution x̃_t could be written as

    \tilde{x}_t = \begin{cases} \left( \Phi_{I_t}^{T} \Phi_{I_t} \right)^{-1} \left( \Phi_{I_t}^{T} y - \lambda_t z_t \right), & \text{on } I \\ 0, & \text{otherwise,} \end{cases}    (3.3)


where z_t is the sign sequence of c_t on the active set I, and c_t = Φ^T r_{t−1} is the correlation vector of all columns of the dictionary Φ with the residual vector r_{t−1} at iteration t. (See algorithm 1 for the LARS algorithm.)

3.2 Tensor Least Angle Regression Algorithm. T-LARS is a generalization of LARS to solve the sparse multilinear least-squares problem in equation 2.6 using tensors and multilinear algebra. Unlike LARS, T-LARS does not calculate large matrices such as the Kronecker dictionary Φ in equation 2.4, which is required in vectorized sparse multilinear least-squares problems. Instead, T-LARS uses much smaller mode-n dictionaries for calculations. A mapping between column indices of the dictionary Φ and column indices of the mode-n dictionaries Φ^(n); n ∈ {1, ..., N} is essential in T-LARS calculations (see appendix A).

Required inputs to the T-LARS algorithm are the tensor Y ∈ R^(J1×...×Jn×...×JN), the mode-n dictionary matrices Φ^(n); n ∈ {1, ..., N}, and the stopping criterion, given as either a residual tolerance ε or the maximum number of nonzero coefficients K (K-sparse representation). The output is the solution tensor X ∈ R^(I1×...×In×...×IN).

First, normalize the tensor Y and the columns of each dictionary Φ^(n); n ∈ {1, ..., N} to have a unit L2 norm. Note that normalizing the columns of each dictionary Φ^(n); n ∈ {1, ..., N} ensures normalization of the separable dictionary Φ in equation 2.4 (see appendix B). For notational simplicity in the following sections, we will use Y to represent the normalized tensor and Φ^(n) to represent normalized dictionary matrices.

Gram matrices are used in several steps of T-LARS. For a large separable dictionary Φ, its Gram matrix G = Φ^T Φ would be large as well. Therefore, explicitly building this Gram matrix and using it in computations could be very inefficient for large problems. Instead, T-LARS uses Gram matrices of the mode-n dictionary matrices Φ^(1), Φ^(2), ..., Φ^(N), defined as G^(1), G^(2), ..., G^(N). We can obtain a Gram matrix G^(n); n ∈ {1, ..., N} for each mode-n dictionary Φ^(n); n ∈ {1, ..., N} by

    G^{(n)} = \Phi^{(n)T} \Phi^{(n)}.    (3.4)

The tensor C_1 is the correlation between the tensor Y and the mode-n dictionary matrices Φ^(n); n ∈ {1, ..., N}:

    C_1 = Y \times_1 \Phi^{(1)T} \times_2 \cdots \times_n \Phi^{(n)T} \times_{n+1} \cdots \times_N \Phi^{(N)T}.    (3.5)

The tensor C_1 could be calculated efficiently as N mode-n products, and the initial correlation vector is obtained by vectorizing C_1, where c_1 = vec(C_1) (see appendix C).
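As a small illustration of why these mode-n quantities suffice (our own 2-D example, not part of algorithm 2), the correlation tensor computed from the mode-n dictionaries matches Φ^T vec(Y) for the full separable dictionary, without ever forming Φ or its Gram matrix:

    % 2-D example: mode-n Gram matrices and correlation tensor
    Phi1 = randn(8, 12);   Phi2 = randn(9, 15);    % mode-n dictionaries
    Y = randn(8, 9);                               % 2nd-order "tensor"

    G1 = Phi1' * Phi1;   G2 = Phi2' * Phi2;        % mode-n Gram matrices, eq. (3.4)
    C1 = Phi1' * Y * Phi2;                         % Y x1 Phi1' x2 Phi2', eq. (3.5)

    Phi = kron(Phi2, Phi1);                        % full dictionary, only for checking
    norm(C1(:) - Phi' * Y(:))                      % ~ 0 up to round-off
    % Storage: numel(G1) + numel(G2) = 369 entries, versus
    % numel(Phi' * Phi) = 32,400 entries for the full Gram matrix.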

T-LARS requires several parameters to be initialized before starting the iterations. The regularization parameter λ_1 is initialized to the maximum value of the correlation vector c_1, and the corresponding most correlated column φ_{I_1} of the separable dictionary is added to the initial active set I. The initial residual tensor R_0 is set to Y, and the initial solution vector x_0 and the initial direction vector d_0 are set to 0. The initial step size δ_0^* is also set to 0. T-LARS starts the iterations at t = 1 and runs until a stopping criterion is reached:

• Initial residual tensor: R_0 = Y
• Initial solution vector: x̃_0 = 0
• Initial direction vector: d_0 = 0
• Initial step size: δ_0^* = 0
• Initial regularization parameter: λ_1 = max(c_1)
• Active set: I = {φ_{I_1}}
• Start iterations at t = 1

The following calculations are performed at every iteration t = 1, 2, ... of the T-LARS algorithm until the stopping criterion is reached.

3.2.1 Obtain the Inverse of the Gram Matrix of the Active Columns of the Dictionary. We obtain the Gram matrix of the active columns of the dictionary, G_t = Φ_{I_t}^T Φ_{I_t}, at each iteration t. The size of this Gram matrix would either increase (dictionary column addition) or decrease (dictionary column removal) with each iteration t. Therefore, for computational efficiency, we use the Schur complement inversion formula to calculate G_t^{-1} from G_{t−1}^{-1}, thereby avoiding its full calculation (Caiafa & Cichocki, 2012; Goldfarb, 1972; Rosário, Monteiro, & Rodrigues, 2016).

Updating the Gram matrix after adding a new column k_a to the active set. Let the column k_a ∈ I be the new column added to the active matrix. Given G_{t−1}^{-1}, the inverse of the Gram matrix, G_t^{-1}, could be calculated using the Schur complement inversion formula for a symmetric block matrix (Björck, 2015; Boyd & Vandenberghe, 2010; Hager, 1989),

    G_t^{-1} = \begin{bmatrix} F_{11}^{-1} & \alpha b \\ \alpha b^{T} & \alpha \end{bmatrix},    (3.6)

where F_{11}^{-1} = G_{t-1}^{-1} + \alpha b b^{T}, b = -G_{t-1}^{-1} g_a, and \alpha = 1 / \left( g(k_a, k_a) + g_a^{T} b \right), and the column vector g_a is given by

    g_a^{T} = \begin{bmatrix} g(k_1, k_a) & \cdots & g(k_n, k_a) & \cdots & g(k_{a-1}, k_a) \end{bmatrix}_{1 \times (a-1)}.

The elements g(k_n, k_a) of g_a^T are elements of the Gram matrix G_t that are obtained using the mode-n Gram matrices G^(n); n ∈ {1, ..., N},

    g(k_n, k_a) = g^{(N)}(k_{nN}, k_{aN}) \otimes \cdots \otimes g^{(1)}(k_{n1}, k_{a1}),

where k_{nN}, ..., k_{n1} are the tensor indices corresponding to the column index k_n and k_{aN}, ..., k_{a1} are the tensor indices corresponding to the column index k_a (see appendix A).
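As a minimal numerical sketch of this block-inverse update (our own toy example, with the entries of g_a computed directly from a small explicit dictionary rather than from the mode-n Gram matrices):

    % Rank-one Schur-complement update of the inverse Gram matrix, eq. (3.6)
    Phi = randn(20, 30);   Phi = Phi ./ vecnorm(Phi);   % toy dictionary, unit-norm columns
    active = [3, 11, 25];  ka = 7;                      % current active set, new column

    Ginv_prev = inv(Phi(:, active)' * Phi(:, active));  % G_{t-1}^{-1}
    g_a  = Phi(:, active)' * Phi(:, ka);                % correlations with the new column
    g_aa = Phi(:, ka)' * Phi(:, ka);                    % g(k_a, k_a) = 1 for unit-norm columns

    b     = -Ginv_prev * g_a;
    alpha = 1 / (g_aa + g_a' * b);
    Ginv  = [Ginv_prev + alpha * (b * b'),  alpha * b;
             alpha * b',                    alpha];     % eq. (3.6)

    norm(Ginv - inv(Phi(:, [active, ka])' * Phi(:, [active, ka])))   % ~ 0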

Updating the Gram matrix after removing a column k_r from the active set. Let the column k_r ∈ I be the column removed from the active set. We move column k_r and row k_r of G_{t−1}^{-1} to become its last column and last row, respectively. We denote this new matrix as Ĝ_{t−1}^{-1}. By using the Schur complement inversion formula for a symmetric block matrix, the inverse Ĝ_{t−1}^{-1} could be interpreted as

    \hat{G}_{t-1}^{-1} = \begin{bmatrix} F_{11}^{-1} & \alpha b \\ \alpha b^{T} & \alpha \end{bmatrix}_{N \times N},    (3.7)

where F_{11}^{-1} = G_t^{-1} + \alpha b b^{T}. Therefore, we could calculate the inverse of the Gram matrix at iteration t as (Goldfarb, 1972; Rosário et al., 2016)

    G_t^{-1} = F_{11}^{-1} - \alpha b b^{T}.    (3.8)

Both F_{11}^{-1} and \alpha b b^{T} could be easily obtained from Ĝ_{t−1}^{-1} as follows (Matlab notation):

    F_{11}^{-1} = \hat{G}_{t-1}^{-1}(1:N-1, \, 1:N-1),    (3.9)

    \alpha b b^{T} = \frac{ \hat{G}_{t-1}^{-1}(1:N-1, \, N) \; \hat{G}_{t-1}^{-1}(N, \, 1:N-1) }{ \hat{G}_{t-1}^{-1}(N, N) }.    (3.10)
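A companion sketch for the removal case (again our own toy example) permutes the removed column to the last position and applies the downdate in equations 3.8 to 3.10:

    % Downdating the inverse Gram matrix after removing an active column
    Phi = randn(20, 30);   Phi = Phi ./ vecnorm(Phi);
    active = [3, 7, 11, 25];                          % current active set
    r = 2;                                            % remove the 2nd active column

    Ginv = inv(Phi(:, active)' * Phi(:, active));     % G_{t-1}^{-1}

    P = [setdiff(1:numel(active), r), r];             % move column/row r to the end
    Ghat_inv = Ginv(P, P);                            % Ghat_{t-1}^{-1}, eq. (3.7)
    Ginv_new = Ghat_inv(1:end-1, 1:end-1) ...         % F11^{-1} - alpha*b*b', eqs. (3.8)-(3.10)
             - Ghat_inv(1:end-1, end) * Ghat_inv(end, 1:end-1) / Ghat_inv(end, end);

    keep = active(P(1:end-1));
    norm(Ginv_new - inv(Phi(:, keep)' * Phi(:, keep)))   % ~ 0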

3.2.2 Obtain the Direction Vector d_t. The direction vector along which the solution x follows a piecewise linear path when an active column is added to or removed from the active set is given by

    d_t = G_t^{-1} z_t,    (3.11)

where z_t = sign(c_t(I)), that is, the sign sequence of the correlation vector over the active set.

3.2.3 Obtain v_t. A vector v_t could be defined as

    v_t = \Phi^{T} \Phi_{I_t} d_t.    (3.12)

This vector v_t could be efficiently obtained as a multilinear transformation of a direction tensor D_t by the Gram matrices G^(n); n ∈ {1, ..., N},

    V_t = D_t \times_1 G^{(1)} \times_2 \cdots \times_n G^{(n)} \times_{n+1} \cdots \times_N G^{(N)},    (3.13)

where vec(D_t(I)) = d_t and D_t(I^c) = 0. We note that vec(V_t) = v_t.
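For intuition, the following small 2-D example (our own check, with made-up dimensions and a hypothetical active set) confirms that the multilinear transformation by the mode-n Gram matrices reproduces Φ^T Φ_I d_t without forming Φ:

    % 2-D check of equations 3.12 and 3.13
    I = [4, 5];  J = [3, 4];
    Phi1 = randn(J(1), I(1));   Phi2 = randn(J(2), I(2));
    Phi  = kron(Phi2, Phi1);                   % full dictionary, for checking only
    G1 = Phi1' * Phi1;   G2 = Phi2' * Phi2;    % mode-n Gram matrices

    active = [2, 7, 11];                       % hypothetical active column indices
    d = randn(numel(active), 1);               % direction on the active set
    D = zeros(I);   D(active) = d;             % direction tensor D_t

    v_direct = Phi' * (Phi(:, active) * d);    % v_t = Phi' * Phi_I * d_t, eq. (3.12)
    V = G1 * D * G2';                          % D_t x1 G1 x2 G2, eq. (3.13)
    norm(v_direct - V(:))                      % ~ 0 up to round-off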

3.2.4 Obtain the Correlation Vector c_t. Because c_1 would be obtained at initialization, the following calculations are needed only for iterations t ≥ 2. The correlation vector c_t is given by

    c_t = \Phi^{T} \mathrm{vec}(R_{t-1}),    (3.14)

where R_{t−1} is the residual tensor from the previous iteration. Since

    \mathrm{vec}(R_{t-1}) = \mathrm{vec}(R_{t-2}) - \delta_{t-1}^{*} \Phi_{I_{t-1}} d_{t-1},    (3.15)

we could update the correlation vector c_t by

    c_t = \Phi^{T} \mathrm{vec}(R_{t-2}) - \delta_{t-1}^{*} \Phi^{T} \Phi_{I_{t-1}} d_{t-1}.    (3.16)

Substituting equation 3.12 into equation 3.16, we obtain an update for the correlation:

    c_t = c_{t-1} - \delta_{t-1}^{*} v_{t-1}.    (3.17)

3.2.5 Calculate the Step Size δ. The minimum step size for adding a new column to the active set is given by

    \delta_t^{+} = \min_{i \in I^c} \left\{ \frac{\lambda_t - c_t(i)}{1 - v_t(i)}, \; \frac{\lambda_t + c_t(i)}{1 + v_t(i)} \right\}.    (3.18)

The minimum step size for removing a column from the active set is given by

    \delta_t^{-} = \min_{i \in I} \left\{ \frac{-x_{t-1}(i)}{d_t(i)} \right\}.    (3.19)

Therefore, the minimum step size for the L1 constrained sparse least-squares problem is

    \delta_t^{*} = \min \left\{ \delta_t^{+}, \delta_t^{-} \right\}.    (3.20)

For the L0 constrained sparse least-squares problem, only new columns are added to the active set at every iteration. Therefore, the minimum step size for the L0 constrained sparse least-squares problem is δ_t^* = δ_t^+.
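The step-size rules above translate almost directly into code. The following sketch (a hypothetical helper, not the paper's implementation; restricting each minimum to positive candidates is our assumption of the usual LARS convention) computes δ_t for either mode:

    % Step size delta_t from equations 3.18-3.20 (sketch)
    function delta = tlars_step_size(lambda, c, v, x_active, d_active, inactive, mode)
        % candidates for bringing an inactive column to the active correlation level, eq. (3.18)
        cand = [(lambda - c(inactive)) ./ (1 - v(inactive));
                (lambda + c(inactive)) ./ (1 + v(inactive))];
        d_plus = min(cand(cand > 0));

        if strcmp(mode, 'L0')
            delta = d_plus;                    % L0: only column additions
            return;
        end
        % candidates for an active coefficient crossing zero, eq. (3.19)
        ratios  = -x_active ./ d_active;
        d_minus = min(ratios(ratios > 0));
        delta   = min([d_plus, d_minus]);      % eq. (3.20); an empty d_minus is ignored
    end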


3.2.6 Update the Solution x̃_t, Regularization Parameter, and Residual. The current solution x̃_t = vec(X̃_t) is given by

    \tilde{x}_t = \tilde{x}_{t-1} + \delta_t^{*} d_t.    (3.21)

Update λ_{t+1} and R_{t+1} for the next iteration:

    \lambda_{t+1} = \lambda_t - \delta_t^{*}.    (3.22)

The residual tensor R_{t+1} for the next iteration (t + 1) can be calculated as

    R_{t+1} = R_t - \delta_t^{*} \, D_t \times_1 \Phi^{(1)} \times_2 \Phi^{(2)} \times_3 \cdots \times_N \Phi^{(N)}.    (3.23)

3.2.7 Check the Stopping Criterion. Check whether either of the following stopping criteria has been reached:

    \left\| R_{t+1} \right\|_2 < \varepsilon    (3.24)

or

    \mathrm{length}(I) \ge K.    (3.25)

The T-LARS algorithm solves the sparse tensor least-squares problem in equation 2.6 to obtain a sparse solution X ∈ R^(I1×...×In×...×IN) for a tensor Y ∈ R^(J1×...×Jn×...×JN) using N mode-n dictionaries Φ^(n) ∈ R^(Jn×In); ∀n ∈ {1, ..., N}. The tensor Y and the N mode-n dictionaries Φ^(1), Φ^(2), ..., Φ^(N) are the inputs to the T-LARS algorithm, where N ≥ 1. The T-LARS algorithm can be used to solve underdetermined, square, or overdetermined sparse tensor least-squares problems, where the mode-n dictionaries Φ^(n) ∈ R^(Jn×In); ∀n ∈ {1, ..., N} are overcomplete dictionaries (Jn < In), complete dictionaries (Jn = In), or undercomplete dictionaries (Jn > In), respectively.

The complete T-LARS algorithm is summarized in algorithm 2 (using Matlab notation). This T-LARS algorithm solves the Lp sparse separable least-squares problem when the T-LARS_mode is set to Lp.

4 Algorithm Computational Complexity

In this section, we analyze the computational complexity of our T-LARS algorithm and compare it with the computational complexity of Kronecker-OMP. We show that the computational complexity of T-LARS is significantly lower than that of Kronecker-OMP when solving sparse tensor least-squares problems.

4.1 The Computational Complexity of T-LARS. Let K be the number of iterations used in T-LARS, P = I_1 × ... × I_n × ... × I_N be the number of atoms in the Kronecker dictionary Φ, and Q = J_1 × ... × J_n × ... × J_N be the total number of elements in tensor Y. In a typical sparse solution obtained by T-LARS, K ≪ P. In the following analysis, we refer to algorithm 2.

Step 1 of the T-LARS algorithm runs only once and has a computational complexity of

    \left( I_1 Q + \frac{I_1 I_2 Q}{J_1} + \cdots + \frac{P Q}{J_1 \cdots J_{n-1} \, I_{n+1} \cdots I_N} + \cdots + P J_N \right).

Step 10 of the T-LARS algorithm obtains the inverse Gram matrix using Schur complement inversion and has complexity O(Pk + k^3), where k is the iteration number, whose maximum value is K. The computational complexity of step 10 for column addition is PK + 4 Σ_{k=1}^{K} k^2, and the computational complexity of step 10 for column removal is 2 Σ_{k=1}^{K} k^2. Therefore, the maximum computational complexity of step 10 is given by

    PK + 4 \sum_{k=1}^{K} k^2.

Steps 13 and 27 of the T-LARS algorithm involve multilinear transformations. In both steps, D_t is a sparse tensor with at most k nonzero entries at any iteration k. Therefore, for K iterations, the computational complexities of steps 13 and 27 are

    2 \sum_{k=1}^{K} \sum_{n=1}^{N} k I_n \quad \text{and} \quad 2 \sum_{k=1}^{K} \sum_{n=1}^{N} k J_n + 2KQ,

respectively.

4.1.1 Case of Overcomplete Mode-n Dictionaries. For overcomplete mode-n dictionaries Φ^(n) ∈ R^(Jn×In); Jn < In, n ∈ {1, ..., N}, step 13 of the T-LARS algorithm would have higher computational complexity compared to step 27. Therefore, the computational complexity of the most computationally intensive steps of the T-LARS algorithm would be less than

    \left( I_1 Q + \frac{I_1 I_2 Q}{J_1} + \cdots + \frac{P Q}{J_1 \cdots J_{n-1} I_{n+1} \cdots I_N} + \cdots + P J_N \right) + \left( PK + 4 \sum_{k=1}^{K} k^2 \right) + \left( 2 \sum_{k=1}^{K} \sum_{n=1}^{N} k I_n + 2KQ \right).    (4.1)

4.2 Comparison of Computational Complexities of Kronecker-OMP and T-LARS. Caiafa and Cichocki (2012) earlier analyzed the computational complexity of Kronecker-OMP to solve problem 2.6 given Y ∈ R^(J1×...×Jn×...×JN), Jn = J; ∀n ∈ {1, ..., N}, and mode-n dictionaries Φ^(n) ∈ R^(J×I). From Caiafa and Cichocki (2012), after K iterations, the combined computational complexity of Kronecker-OMP was given by

    \left( 2 I^N J \left( \frac{1 - (J/I)^N}{1 - J/I} \right) K + \left( 2 I^N K + 7 \sum_{k=1}^{K} k^{2N} \right) + \left( (2NJ + N + 4) \sum_{k=1}^{K} k^{N} \right) + (N(N-1) + 3) K J^N \right).    (4.2)

To obtain the computational complexity of T-LARS to solve problem 2.6 given Y ∈ R^(J1×...×Jn×...×JN), Jn = J; ∀n ∈ {1, ..., N}, and mode-n dictionaries Φ^(n) ∈ R^(J×I), we substitute In = I and Jn = J; ∀n ∈ {1, ..., N} in equation 4.1 to obtain

    \left( 2 I^N J \left( \frac{1 - (J/I)^N}{1 - J/I} \right) + \left( I^N K + 4 \sum_{k=1}^{K} k^2 \right) + 2NI \sum_{k=1}^{K} k + 2 K J^N \right).    (4.3)

Table 1 shows a term-by-term comparison of the computational complexity of Kronecker-OMP and T-LARS given in equations 4.2 and 4.3, respectively.

Table 1: Term-by-Term Comparison of the Computational Complexity of Kronecker-OMP and T-LARS Given in Equations 4.2 and 4.3.

                  Kronecker-OMP                                T-LARS
    First term    2 I^N J ((1 - (J/I)^N) / (1 - J/I)) K        2 I^N J ((1 - (J/I)^N) / (1 - J/I))
    Second term   2 I^N K + 7 Σ_{k=1}^{K} k^{2N}               I^N K + 4 Σ_{k=1}^{K} k^2
    Third term    (2NJ + N + 4) Σ_{k=1}^{K} k^N                2NI Σ_{k=1}^{K} k
    Fourth term   (N(N - 1) + 3) K J^N                         2 K J^N

On comparing equations 4.2 and 4.3, we note that the first term of the computational complexity of T-LARS is more than K times lower than the first term of the computational complexity of Kronecker-OMP:

    2 I^N J \left( \frac{1 - (J/I)^N}{1 - J/I} \right) < 2 I^N J \left( \frac{1 - (J/I)^N}{1 - J/I} \right) K.    (4.4)

On comparing equations 4.2 and 4.3, we note that the second term of the computational complexity of T-LARS is O(I^N K + K^3), while the second term of the computational complexity of Kronecker-OMP is O(I^N K + K^{2N+1}). Therefore, for N ≥ 2 and the same number of iterations,

    I^N K + 4 \sum_{k=1}^{K} k^2 < 2 I^N K + 7 \sum_{k=1}^{K} k^{2N}.    (4.5)

On comparing equations 4.2 and 4.3, we note that the third term of the computational complexity of T-LARS is O(K^2), while the third term of the computational complexity of Kronecker-OMP is O(K^{N+1}). Therefore, for N ≥ 2 and the same number of iterations,

    2NI \sum_{k=1}^{K} k < (2NJ + N + 4) \sum_{k=1}^{K} k^{N}.    (4.6)

On comparing equations 4.2 and 4.3, we note that both fourth terms of the computational complexity of T-LARS and of Kronecker-OMP are O(J^N). Therefore,

    2 K J^N < (N(N-1) + 3) K J^N.    (4.7)

Therefore, from equations 4.4 to 4.7, we observe that the computational complexity of our T-LARS algorithm is significantly lower than that of Kronecker-OMP when solving sparse tensor least-squares problems with N ≥ 2 and the same number of iterations.

For multidimensional problems, N ≥ 2, typically K ≫ I; therefore, the second terms of the computational complexities of both T-LARS and Kronecker-OMP dominate over all other terms. Therefore, for K iterations, the asymptotic computational complexities of T-LARS and Kronecker-OMP are O(I^N K + K^3) and O(I^N K + K^{2N+1}), respectively.


5 Experimental Results

In this section, we present experimental results to compare the performance of Kronecker-OMP and T-LARS when used to obtain sparse representations of 3D brain images using both fixed and learned mode-n overcomplete dictionaries.

5.1 Experimental Data Sets. For our computational experiments, we obtained a 3D MRI brain image and a 3D PET-CT brain image from publicly available data sets.

Our 3D MRI brain image consists of 175 × 150 × 10 voxels and was obtained from the OASIS-3: Longitudinal Neuroimaging, Clinical, and Cognitive Dataset for Normal Aging and Alzheimer's Disease (LaMontagne et al., 2018). This image shows a region in the brain of a 38-year-old male patient with a tumor in his right frontal lobe.

Our 3D PET-CT brain image consists of 180 × 160 × 10 voxels and was obtained from the Cancer Genome Atlas Lung Adenocarcinoma (TCGA-LUAD) data collection (Clark et al., 2013). This image shows a region in the brain of a 38-year-old female patient.

5.2 Experimental Setup. We compared the performance of T-LARS and Kronecker-OMP when used to obtain different sparse representations for our 3D MRI and PET-CT brain images by solving L0 and L1 constrained sparse multilinear least-squares problems using both fixed and learned overcomplete dictionaries.

Our fixed mode-n overcomplete dictionaries were a union of a discrete cosine transform (DCT) dictionary and a Symlet wavelet packet dictionary with four vanishing moments. In this case of using fixed mode-n dictionaries, we obtained the required 3D sparse representations by solving either the 3D L0 or L1 minimization problem.

Our learned mode-n overcomplete dictionaries were learned using the tensor method of optimal directions (T-MOD) algorithm (Roemer, Del Galdo, & Haardt, 2014). We used T-MOD to learn three overcomplete mode-n dictionaries, Φ^(1) ∈ R^(32×38), Φ^(2) ∈ R^(32×38), and Φ^(3) ∈ R^(10×12), using random patches of 32 × 32 × 10 voxels with a 10% overlap from either one of the 3D brain images that we used: MRI or PET-CT (Elad & Aharon, 2006; Zhai, Zhang, Lv, Fu, & Yu, 2018). In this case of using learned dictionaries, we obtained the required 3D sparse representations by solving either a 4D (due to the use of image patches) L0 or L1 minimization problem. For a fair comparison of the performance of T-LARS and Kronecker-OMP, we generated our results using two learned dictionaries, Φ_KOMP and Φ_TLARS, that were obtained using Kronecker-OMP and T-LARS, respectively, as the sparse coding algorithm used by T-MOD.

To compare the performance of T-LARS and Kronecker-OMP when used to solve L0 and L1 constrained sparse multilinear least-squares problems, we designed the following experiments to obtain sparse representations of our 3D brain images under different conditions:

1. Experiment 1: Fixed mode-n dictionaries, 3D L0 minimization problem
2. Experiment 2: Learned mode-n dictionaries (Φ_KOMP), 4D L0 minimization problem
3. Experiment 3: Learned mode-n dictionaries (Φ_TLARS), 4D L0 minimization problem
4. Experiment 4: Fixed mode-n dictionaries, 3D L1 minimization problem
5. Experiment 5: Learned mode-n dictionaries (Φ_TLARS), 4D L1 minimization problem

All of our experimental results were obtained using a Matlab implementation of T-LARS and Kronecker-OMP on an MS Windows machine: two Intel Xeon E5-2637 v4 3.5 GHz CPUs, 32 GB RAM, and an NVIDIA Tesla P100 GPU with 12 GB memory.

5.3 Experimental Results for 3D MRI Brain Images. In this section, we compare the performance of T-LARS and Kronecker-OMP to obtain K-sparse representations of our 3D MRI brain image, Y, of 175 × 150 × 10 voxels by solving the L0 constrained sparse tensor least-squares problem. We also obtained similar K-sparse representations using T-LARS by solving the L1 optimization problem. Table 2 summarizes our results for the five experiments described in section 5.2. In all experiments, the algorithms were stopped when the number of nonzero coefficients K reached 13,125, which is 5% of the number of elements in Y. We note that in Table 2, the number of iterations for L1 optimization problems is larger than K because, as shown in algorithm 2, at each iteration T-LARS could either add or remove nonzero coefficients to or from the solution.

5.3.1 Experiment 1. Figures 1 and 2 show the experimental results obtained for representing our 3D MRI brain image using three fixed mode-n overcomplete dictionaries, Φ^(1) ∈ R^(175×351), Φ^(2) ∈ R^(150×302), and Φ^(3) ∈ R^(10×26), by solving the L0 minimization problem using both T-LARS and Kronecker-OMP. The residual error of the reconstructed 3D images obtained using T-LARS was ‖R‖_2 = 0.0839 (8.39%); for Kronecker-OMP, it was ‖R‖_2 = 0.0624 (6.24%).

Figure 1: Experiment 1. Original 3D MRI brain image (a), its reconstruction using 5% nonzero coefficients (K = 13,125) obtained by Kronecker-OMP (b), and T-LARS (c) using fixed mode-n overcomplete dictionaries.

Figure 2: Experiment 1. (a) Number of nonzero coefficients versus computation time. (b) Residual error versus computation time. (c) Residual error versus number of nonzero coefficients obtained by applying Kronecker-OMP and T-LARS to our 3D MRI brain image and using fixed mode-n overcomplete dictionaries.

Table 2: Summary of Experimental Results for Our 3D MRI Brain Image.

    Experiment   Image Size           Problem   Dictionary Type      Iterations   K-OMP Time (sec)   T-LARS Time (sec)
    1            175 × 150 × 10       L0        Fixed                13,125       20,144             434
    2            32 × 32 × 10 × 36    L0        Learned (Φ_KOMP)     13,125       25,002             394
    3            32 × 32 × 10 × 36    L0        Learned (Φ_TLARS)    13,125       22,646             400
    4            175 × 150 × 10       L1        Fixed                14,216       –                  495
    5            32 × 32 × 10 × 36    L1        Learned (Φ_TLARS)    14,856       –                  490

5.3.2 Experiments 2 and 3. Figures 3 and 4 show the experimental results for representing our 3D MRI brain image using our learned overcomplete dictionaries, Φ_KOMP and Φ_TLARS, by solving the L0 minimization problem using both T-LARS and Kronecker-OMP. For the Φ_KOMP dictionary, the residual error of the reconstructed 3D images obtained using T-LARS was ‖R‖_2 = 0.1368 (13.68%), and for Kronecker-OMP, it was ‖R‖_2 = 0.1143 (11.43%). For the Φ_TLARS dictionary, the residual error of the reconstructed 3D images obtained using T-LARS was ‖R‖_2 = 0.1127 (11.27%), and for Kronecker-OMP, it was ‖R‖_2 = 0.0955 (9.55%).

5.3.3 Experiment 4. Figures 5 and 6 show the experimental results for representing our 3D MRI brain image using three fixed mode-n overcomplete dictionaries, Φ^(1) ∈ R^(175×351), Φ^(2) ∈ R^(150×302), and Φ^(3) ∈ R^(10×26), by solving the L1 minimization problem using T-LARS. The residual error of the reconstructed 3D image obtained using T-LARS was ‖R‖_2 = 0.121 (12.1%).

5.3.4 Experiment 5. Figures 7 and 8 show the experimental results obtained for representing our 3D MRI brain image using our learned overcomplete dictionary Φ_TLARS, by solving the L1 minimization problem using T-LARS. The residual error of the reconstructed 3D image was ‖R‖_2 = 0.138 (13.8%).


Figure 3: Experiments 2 and 3. Original 3D MRI brain image (a), its reconstructions using 5% nonzero coefficients (K = 13,125) (b–e), and the difference images (f–i) obtained using Kronecker-OMP and T-LARS, using our learned overcomplete dictionaries.

Figure 4: Experiments 2 and 3. (a) Number of nonzero coefficients versus computation time. (b) Residual error versus computation time. (c) Residual error versus number of nonzero coefficients, obtained by applying Kronecker-OMP and T-LARS to our 3D MRI brain image and using our learned overcomplete dictionaries.

Figure 5: Experiment 4. Original 3D MRI brain image (a), its reconstruction using 5% nonzero coefficients (K = 13,125) obtained by T-LARS using fixed mode-n overcomplete dictionaries (b), and the difference image (c).

Figure 6: Experiment 4. (a) Number of nonzero coefficients versus computation time. (b) Residual error versus computation time. (c) Residual error versus number of nonzero coefficients, obtained by applying T-LARS to our 3D MRI brain image and using fixed mode-n overcomplete dictionaries.

Figure 7: Experiment 5. Original 3D MRI brain image (a) and its reconstruction using 5% nonzero coefficients (K = 13,125) obtained by T-LARS using our learned overcomplete dictionary (b) and the difference image (c).

Figure 8: Experiment 5. (a) Number of nonzero coefficients versus computation time. (b) Residual error versus computation time. (c) Residual error versus number of nonzero coefficients obtained by applying T-LARS to our 3D MRI brain image and using our learned overcomplete dictionary.

5.4 Experimental Results for 3D PET-CT Brain Images. In this section, we compare the performance of T-LARS and Kronecker-OMP to obtain K-sparse representations of our 3D PET-CT brain image, Y, of 180 × 160 × 10 voxels by solving the L0 constrained sparse tensor least-squares problem. We also obtained similar K-sparse representations using T-LARS by solving the L1 optimization problem. Table 3 summarizes our results for the five experiments. In all experiments, the algorithms were stopped when the number of nonzero coefficients K reached 14,400, which is 5% of the number of elements in Y. We note that in Table 3, the number of iterations for L1 optimization problems is larger than K because, as shown in algorithm 2, at each iteration T-LARS could either add or remove nonzero coefficients to or from the solution.

5.4.1 Experiment 1. Figures 9 and 10 show the experimental results for representing our 3D PET-CT brain image using three fixed mode-n overcomplete dictionaries, Φ^(1) ∈ R^(180×364), Φ^(2) ∈ R^(160×320), and Φ^(3) ∈ R^(10×26), by solving the L0 minimization problem using both T-LARS and Kronecker-OMP. The residual error of the reconstructed 3D images obtained using T-LARS was ‖R‖_2 = 0.054 (5.4%), and for Kronecker-OMP, it was ‖R‖_2 = 0.0368 (3.68%).

5.4.2 Experiments 2 and 3. Figures 11 and 12 show the experimental results for representing our 3D PET-CT brain image using our learned overcomplete dictionaries Φ_KOMP and Φ_TLARS, by solving the L0 minimization problem using both T-LARS and Kronecker-OMP. For the Φ_KOMP dictionary, the residual error of the reconstructed 3D images obtained using T-LARS was ‖R‖_2 = 0.096 (9.6%), and for Kronecker-OMP, it was ‖R‖_2 = 0.077 (7.7%). For the Φ_TLARS dictionary, the residual error of the reconstructed 3D images obtained using T-LARS was ‖R‖_2 = 0.0877 (8.77%), and for Kronecker-OMP, it was ‖R‖_2 = 0.0722 (7.22%).

5.4.3 Experiment 4. Figures 13 and 14 show the experimental results for representing 3D PET-CT brain images using three fixed overcomplete mode-n dictionaries, Φ^(1) ∈ R^(180×364), Φ^(2) ∈ R^(160×320), and Φ^(3) ∈ R^(10×26), by solving the L1 minimization problem using T-LARS. The residual error of the reconstructed 3D PET-CT brain images obtained using T-LARS is ‖R‖_2 = 0.0838 (8.38%).

Table 3: Summary of Experimental Results for Our 3D PET-CT Brain Image.

    Experiment   Image Size           Problem   Dictionary Type      Iterations   K-OMP Time (sec)   T-LARS Time (sec)
    1            180 × 160 × 10       L0        Fixed                14,400       29,529             505
    2            32 × 32 × 10 × 42    L0        Learned (Φ_KOMP)     14,400       33,453             476
    3            32 × 32 × 10 × 42    L0        Learned (Φ_TLARS)    14,400       31,083             490
    4            180 × 160 × 10       L1        Fixed                16,059       –                  591
    5            32 × 32 × 10 × 42    L1        Learned (Φ_TLARS)    18,995       –                  744

Figure 9: Experiment 1. Original PET-CT brain image (a) and its reconstruction using 5% nonzero coefficients (K = 14,400) obtained by Kronecker-OMP (b) and T-LARS (c) using fixed mode-n overcomplete dictionaries.

Figure 10: Experiment 1. (a) Number of nonzero coefficients versus computation time. (b) Residual error versus computation time. (c) Residual error versus number of nonzero coefficients, obtained by applying Kronecker-OMP and T-LARS to our 3D PET-CT brain image and using fixed mode-n overcomplete dictionaries.

5.4.4 Experiment 5. Figures 15 and 16 show the experimental results for representing our 3D PET-CT brain image using our learned overcomplete dictionary Φ_TLARS, by solving the L1 minimization problem using T-LARS. The residual error of the reconstructed 3D PET-CT brain images obtained using T-LARS is ‖R‖_2 = 0.106 (10.6%).

Figure 11: Experiments 2 and 3. Original 3D PET-CT brain image (a), its reconstructions using 5% nonzero coefficients (K = 14,400) (b–e), and the difference images (f–i) obtained using Kronecker-OMP and T-LARS, using our learned overcomplete dictionaries.

Figure 12: Experiments 2 and 3. (a) Number of nonzero coefficients versus computation time. (b) Residual error versus computation time. (c) Residual error versus number of nonzero coefficients, obtained by applying Kronecker-OMP and T-LARS to our 3D PET-CT brain image and using our learned overcomplete dictionaries.

Figure 13: Experiment 4. Original 3D PET-CT brain image (a), its reconstruction using 5% nonzero coefficients (K = 14,400) obtained by T-LARS using fixed mode-n overcomplete dictionaries (b), and the difference image (c).

Figure 14: Experiment 4. (a) Number of nonzero coefficients versus computation time. (b) Residual error versus computation time. (c) Residual error versus number of nonzero coefficients, obtained by applying T-LARS to our 3D PET-CT brain image and using fixed mode-n overcomplete dictionaries.

Figure 15: Experiment 5. Original 3D PET-CT brain image (a) and its reconstruction using 5% nonzero coefficients (K = 14,400) obtained by T-LARS using our learned overcomplete dictionary (b) and the difference image (c).

Figure 16: Experiment 5. (a) Number of nonzero coefficients versus computation time. (b) Residual error versus computation time. (c) Residual error versus number of nonzero coefficients obtained by applying T-LARS to our 3D PET-CT brain image and using our learned overcomplete dictionary.

6 Conclusion

In this letter, we developed tensor least angle regression (T-LARS), a generalization of least angle regression to efficiently solve large L0 or large L1 constrained multidimensional (tensor) sparse least-squares problems (underdetermined or overdetermined) for all critical values of the regularization parameter λ. An earlier generalization of OMP, Kronecker-OMP, was developed to solve the L0 problem for large multidimensional sparse least-squares problems. To demonstrate the validity and performance of our T-LARS algorithm, we successfully used it to obtain different K-sparse signal representations of two 3D brain images, using fixed and learned separable overcomplete dictionaries, by solving 3D and 4D, L0 and L1 constrained sparse least-squares problems. Our numerical experiments demonstrate that our T-LARS algorithm is significantly faster (46 to 70 times) than Kronecker-OMP in obtaining K-sparse solutions for multilinear least-squares problems. However, the K-sparse solutions obtained using Kronecker-OMP always have a slightly lower residual error (1.55% to 2.25%) than those obtained by T-LARS. These numerical results confirm our analysis in section 4.2 that the computational complexity of T-LARS is significantly lower than the computational complexity of Kronecker-OMP. In future work, we plan to exploit this significant computational efficiency of T-LARS to develop more computationally efficient Kronecker dictionary learning methods. A Matlab GPU-based implementation of our tensor least angle regression (T-LARS) algorithm, algorithm 2, is available at https://github.com/SSSherif/Tensor-Kronecker-Least-Angle-Regression.

Appendix A: Mapping of Column Indices of Dictionary Φ to Column Indices of Mode-n Dictionaries

T-LARS avoids the construction of large matrices such as the separable dictionary Φ in equation 2.4. Instead, T-LARS uses mode-n dictionaries for calculations. The following mapping between column indices of the dictionary Φ and column indices of the mode-n dictionaries Φ^(n); n ∈ {1, ..., N} is essential in T-LARS calculations.

An arbitrary column φ_k is the kth column of the separable dictionary Φ in equation 2.4, which is given by the Kronecker product of columns of the dictionary matrices Φ^(1), Φ^(2), ..., Φ^(N); n ∈ {1, ..., N}:

    \phi_k = \phi^{(N)}_{i_N} \otimes \phi^{(N-1)}_{i_{N-1}} \otimes \cdots \otimes \phi^{(1)}_{i_1}.    (A.1)

The column indices (i_N, i_{N−1}, ..., i_1) are the indices of the columns of the dictionary matrices Φ^(1), Φ^(2), ..., Φ^(N). The column index k of the separable dictionary is given by (Kolda, 2006)

    k = i_1 + \sum_{n=2}^{N} (i_n - 1) \, I_1 I_2 \cdots I_{n-1},    (A.2)

where I_1, I_2, ..., I_N are the dimensions of the columns of the dictionary matrices Φ^(1), Φ^(2), ..., Φ^(N), respectively. The following proposition shows how to obtain the column indices of the dictionary matrices (i_N, i_{N−1}, ..., i_1) that correspond to the column index k of the separable dictionary.

Proposition 1. Let k be the column index of the separable dictionary column vector φ_k, and let I_n be the dimension of the columns of each dictionary matrix Φ^(n); n ∈ {1, ..., N}. In equation A.1, the corresponding column indices i_n; n ∈ {1, ..., N} of each dictionary column φ^(n)_{i_n} are given by

    i_n = \left\lceil \frac{k}{I_1 \times \cdots \times I_{n-1}} - \sum_{p=n+1}^{N} (i_p - 1) \prod_{q=n}^{p-1} I_q \right\rceil,    (A.3)

where ⌈·⌉ indicates the ceiling function; for example,

    i_1 = \left\lceil k - (i_N - 1) \, I_{N-1} \times \cdots \times I_1 - \cdots - (i_2 - 1) \, I_1 \right\rceil,
    \vdots
    i_{N-1} = \left\lceil \frac{k}{I_1 \times \cdots \times I_{N-2}} - (i_N - 1) \, I_{N-1} \right\rceil,
    i_N = \left\lceil \frac{k}{I_1 \times \cdots \times I_{N-1}} \right\rceil.

Proof. We note that i_n; ∀n ∈ {1, 2, ..., N} are integers and 1 ≤ i_n ≤ I_n. From equation A.2,

    k = i_1 + (i_2 - 1) I_1 + \cdots + (i_N - 1) \, I_1 \times \cdots \times I_{N-1}.    (A.4)

Therefore,

    i_N - 1 = \underbrace{\frac{k}{I_1 \times \cdots \times I_{N-1}}}_{S_N} - \underbrace{\frac{i_1 + (i_2 - 1) I_1 + \cdots + (i_{N-1} - 1) \, I_1 \times \cdots \times I_{N-2}}{I_1 \times \cdots \times I_{N-1}}}_{f_N},

    i_N = S_N + (1 - f_N).    (A.5)

Since i_n ≤ I_n; ∀n ∈ {1, 2, ..., N},

    f_N \le \frac{I_1 + (I_2 - 1) I_1 + \cdots + (I_{N-1} - 1) \, I_1 \times \cdots \times I_{N-2}}{I_1 \times \cdots \times I_{N-1}} = 1.    (A.6)

Also, since 1 ≤ i_n; ∀n ∈ {1, 2, ..., N}, we have

    f_N \ge \frac{1 + (1 - 1) I_1 + \cdots + (1 - 1) \, I_1 \times \cdots \times I_{N-2}}{I_1 \times \cdots \times I_{N-1}} > 0.    (A.7)

Therefore, from equations A.6 and A.7,

    0 < f_N \le 1,    (A.8)

    0 \le 1 - f_N < 1.    (A.9)

Since i_N ∈ Z, from equations A.5 and A.9, we have

    i_N = \left\lceil i_N - (1 - f_N) \right\rceil = \left\lceil S_N \right\rceil,

    i_N = \left\lceil \frac{k}{I_1 \times \cdots \times I_{N-1}} \right\rceil,    (A.10)

where ⌈·⌉ indicates the ceiling function. Similarly,

    i_{N-1} - 1 = \underbrace{\frac{k}{I_1 \times \cdots \times I_{N-2}} - (i_N - 1) I_{N-1}}_{S_{N-1}} - \underbrace{\frac{i_1 + (i_2 - 1) I_1 + \cdots + (i_{N-2} - 1) \, I_1 \times \cdots \times I_{N-3}}{I_1 \times \cdots \times I_{N-2}}}_{f_{N-1}},

    i_{N-1} = S_{N-1} + (1 - f_{N-1}), \qquad 0 \le (1 - f_{N-1}) < 1.

Since i_{N−1} ∈ Z and 0 ≤ (1 − f_{N−1}) < 1,

    i_{N-1} = \left\lceil i_{N-1} - (1 - f_{N-1}) \right\rceil = \left\lceil S_{N-1} \right\rceil,

    i_{N-1} = \left\lceil \frac{k}{I_1 \times \cdots \times I_{N-2}} - (i_N - 1) I_{N-1} \right\rceil.

Similarly, for ∀n ∈ {1, 2, ..., N}, 0 ≤ (1 − f_n) < 1,

    i_n = \left\lceil i_n - (1 - f_n) \right\rceil = \left\lceil S_n \right\rceil,

    i_n = \left\lceil \frac{k}{I_1 \times \cdots \times I_{n-1}} - \sum_{p=n+1}^{N} (i_p - 1) \prod_{q=n}^{p-1} I_q \right\rceil.
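For a concrete check of this mapping (our own illustration), the column index formula in equation A.2 coincides with Matlab's column-major linear indexing, so sub2ind/ind2sub can be used to move between k and (i_1, ..., i_N):

    % Column index mapping between the separable dictionary and its factors
    I = [3, 4, 2];
    Phi1 = randn(5, I(1));   Phi2 = randn(6, I(2));   Phi3 = randn(4, I(3));
    Phi  = kron(Phi3, kron(Phi2, Phi1));

    i = [2, 3, 1];                                            % mode-wise column indices
    k = i(1) + (i(2) - 1) * I(1) + (i(3) - 1) * I(1) * I(2);  % eq. (A.2)
    assert(k == sub2ind(I, i(1), i(2), i(3)));

    col_kron = kron(Phi3(:, i(3)), kron(Phi2(:, i(2)), Phi1(:, i(1))));  % eq. (A.1)
    norm(Phi(:, k) - col_kron)                                % ~ 0

    [i1, i2, i3] = ind2sub(I, k);                             % recovers (i_1, i_2, i_3), Proposition 1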


Appendix B: Tensor and Dictionary Normalization

B.1 Normalization of the Tensor Y ∈ R^(J1×...×Jn×...×JN). Compute

    \hat{Y} = \frac{Y}{\|Y\|_2},    (B.1)

where \|Y\|_2 = \sqrt{\langle Y, Y \rangle} = \left( \sum_{j_1}^{J_1} \cdots \sum_{j_N}^{J_N} y_{j_1 j_2 \ldots j_N}^2 \right)^{1/2}.

B.2 Normalization of Columns of the Separable Dictionary Φ to Have a Unit L2 Norm. The column φ_k in equation A.1 is the kth column of the separable dictionary Φ. Normalization of each column vector φ_k of the separable dictionary is given by

    \hat{\phi}_k = \frac{\phi_k}{\|\phi_k\|_2}.    (B.2)

Proposition 2. Normalization of the column φ_k in equation A.1 is given by the Kronecker product of the normalized dictionary columns φ^(N)_{i_N}, φ^(N−1)_{i_{N−1}}, ..., φ^(1)_{i_1}:

    \hat{\phi}_k = \hat{\phi}^{(N)}_{i_N} \otimes \hat{\phi}^{(N-1)}_{i_{N-1}} \otimes \cdots \otimes \hat{\phi}^{(1)}_{i_1}.    (B.3)

Proof. The L2 norm of the Kronecker product of vectors is the product of the L2 norms of these vectors (Lancaster & Farahat, 1972):

    \|\phi_k\|_2^2 = \left\| \phi^{(N)}_{i_N} \otimes \phi^{(N-1)}_{i_{N-1}} \otimes \cdots \otimes \phi^{(1)}_{i_1} \right\|_2^2 = \left\| \phi^{(N)}_{i_N} \right\|_2^2 \times \left\| \phi^{(N-1)}_{i_{N-1}} \right\|_2^2 \times \cdots \times \left\| \phi^{(1)}_{i_1} \right\|_2^2.    (B.4)

From equations B.2 and B.4,

    \hat{\phi}_k = \frac{\phi_k}{\|\phi_k\|_2} = \frac{\phi^{(N)}_{i_N}}{\left\| \phi^{(N)}_{i_N} \right\|_2} \otimes \cdots \otimes \frac{\phi^{(1)}_{i_1}}{\left\| \phi^{(1)}_{i_1} \right\|_2}.    (B.5)

Therefore,

    \hat{\phi}_k = \hat{\phi}^{(N)}_{i_N} \otimes \hat{\phi}^{(N-1)}_{i_{N-1}} \otimes \cdots \otimes \hat{\phi}^{(1)}_{i_1}.
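A quick numerical check of Proposition 2 (our own example; vecnorm requires a recent Matlab release):

    % Normalizing mode-n dictionary columns normalizes the Kronecker dictionary
    A = randn(6, 4);   B = randn(5, 3);
    A = A ./ vecnorm(A);   B = B ./ vecnorm(B);    % unit L2-norm columns
    Phi = kron(B, A);
    max(abs(vecnorm(Phi) - 1))                     % ~ 0 up to round-off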


Appendix C: Obtaining the Initial Correlation Tensor C_1

In LARS, the initial correlation vector c_1 is obtained by taking the correlation between all columns of Φ and the vectorization of the tensor Y:

    c_1 = \left( \Phi^{(N)T} \otimes \Phi^{(N-1)T} \otimes \cdots \otimes \Phi^{(1)T} \right) \mathrm{vec}(Y).    (C.1)

We could also represent equation C.1 as a multilinear transformation of the tensor Y (Kolda, 2006):

    C_1 = Y \times_1 \Phi^{(1)T} \times_2 \cdots \times_n \Phi^{(n)T} \times_{n+1} \cdots \times_N \Phi^{(N)T}.    (C.2)

The tensor C_1 is the correlation between the tensor Y and the mode-n dictionary matrices Φ^(n); n ∈ {1, ..., N}. The tensor C_1 could be calculated efficiently as N mode-n products.

Appendix D: Creating a Gram Matrix for Each Mode-n Dictionary Φ^(n)

Gram matrices are used in several steps of T-LARS. For a large separable dictionary Φ, its Gram matrix would be large as well. Therefore, explicitly building this Gram matrix and using it in computations could be very inefficient for large problems. Therefore, we developed T-LARS to use Gram matrices of the mode-n dictionary matrices, Φ^(1), Φ^(2), ..., Φ^(N), defined as G^(n); n ∈ {1, ..., N}, instead of the Gram matrix Φ^T Φ:

    \Phi^{T} \Phi = \Phi^{(N)T} \Phi^{(N)} \otimes \cdots \otimes \Phi^{(n)T} \Phi^{(n)} \otimes \cdots \otimes \Phi^{(1)T} \Phi^{(1)}.    (D.1)

We can obtain the Gram matrix G^(n) for each mode-n dictionary Φ^(n) by

    G^{(n)} = \Phi^{(n)T} \Phi^{(n)}.    (D.2)

The total size of the Gram matrices G^(n); n ∈ {1, ..., N} would be much smaller than that of the Gram matrix G = Φ^T Φ, thereby allowing faster calculations and requiring less computer storage.

Acknowledgments

This work was supported by a Discovery Grant (RGPIN-2018-06453) from the Natural Sciences and Engineering Research Council of Canada.


References

Acar, E., Dunlavy, D. M., & Kolda, T. G. (2011). A scalable optimization approach for fitting canonical tensor decompositions. Journal of Chemometrics, 25, 67–86. https://doi.org/10.1002/cem.1335

Björck, Å. (2015). Numerical methods in matrix computations. Berlin: Springer.

Boyd, S., & Vandenberghe, L. (2010). Convex optimization. Cambridge: Cambridge University Press.

Caiafa, C. F., & Cichocki, A. (2012). Computing sparse representations of multidimensional signals using Kronecker bases. Neural Computation, 25, 1–35. https://doi.org/10.1162/NECO_a_00385

Chen, S. S., Donoho, D. L., & Saunders, M. A. (1998). Atomic decomposition by basis pursuit. SIAM Journal on Scientific Computing, 20(1), 33–61. https://doi.org/10.1137/S1064827596304010

Cichocki, A., Mandic, D., De Lathauwer, L., Zhou, G., Zhao, Q., Caiafa, C., & Phan, H. A. (2015). Tensor decompositions for signal processing applications: From two-way to multiway component analysis. IEEE Signal Processing Magazine, 32, 145–163. https://doi.org/10.1109/MSP.2013.2297439

Clark, K., Vendt, B., Smith, K., Freymann, J., Kirby, J., Koppel, P., & Prior, F. (2013). The cancer imaging archive (TCIA): Maintaining and operating a public information repository. Journal of Digital Imaging, 26(6), 1045–1057. https://doi.org/10.1007/s10278-013-9622-7

Coifman, R. R., & Wickerhauser, M. V. (1992). Entropy-based algorithms for best basis selection. IEEE Transactions on Information Theory, 38(2), 713–718. https://doi.org/10.1109/18.119732

Daubechies, I. (1990). The wavelet transform, time-frequency localization and signal analysis. IEEE Transactions on Information Theory, 36(5), 961–1005. https://doi.org/10.1109/18.57199

Donoho, D. L. (2006). For most large underdetermined systems of linear equations the minimal ℓ1-norm solution is also the sparsest solution. Communications on Pure and Applied Mathematics, 59(6), 797–829. https://doi.org/10.1002/cpa.20132

Donoho, D. L., & Elad, M. (2003). Optimally sparse representation in general (nonorthogonal) dictionaries via ℓ1 minimization. Proceedings of the National Academy of Sciences, 100(5), 2197–2202. https://doi.org/10.1073/pnas.0437847100

Donoho, D. L., & Tsaig, Y. (2008). Fast solution of ℓ1-norm minimization problems when the solution may be sparse. IEEE Transactions on Information Theory, 54(11), 4789–4812. https://doi.org/10.1109/TIT.2008.929958

Efron, B., Hastie, T., Johnstone, I., Tibshirani, R., Ishwaran, H., Knight, K., & Tibshirani, R. (2004). Least angle regression. Annals of Statistics, 32(2), 407–499. https://doi.org/10.1214/009053604000000067

Elad, M., & Aharon, M. (2006). Image denoising via sparse and redundant representations over learned dictionaries. IEEE Transactions on Image Processing, 15(12), 3736–3745. https://doi.org/10.1109/TIP.2006.881969

Elrewainy, A., & Sherif, S. S. (2019). Kronecker least angle regression for unsupervised unmixing of hyperspectral imaging data. Signal, Image and Video Processing,

Goldfarb, D. (1972). Modification methods for inverting matrices and solving sys-tems of linear algebraic equations. Mathematics of Computation, 26(120), 829–829. https://doi.org/10.1090/S0025-5718-1972-0317527-4

Hager, W. W. (1989). Updating the inverse of a matrix. SIAM Review, 31(2), 221–239. https://doi.org/10.1137/1031049

Hawe, S., Seibert, M., & Kleinsteuber, M. (2013). Separable dictionary learning. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (pp. 438–445). Piscataway, NJ: IEEE. https://doi.org/10.1109/CVPR.2013.63

Kolda, T. G. (2006). Multilinear operators for higher-order decompositions (Technical Report SAND2006-2081). Livermore, CA: Sandia National Laboratories. https://doi.org/10.2172/923081

Kolda, T. G., & Bader, B. W. (2009). Tensor decompositions and applications. SIAM Review, 51(3), 455–500. https://doi.org/10.1137/07070111X

Kreutz-Delgado, K., Murray, J. F., Rao, B. D., Engan, K., Lee, T.-W., & Sejnowski, T. J. (2003). Dictionary learning algorithms for sparse representation. Neural Computation, 15(2), 349–396. https://doi.org/10.1162/089976603762552951

LaMontagne, P. J., Keefe, S., Lauren, W., Xiong, C., Grant, E. A., Moulder, K. L., & Marcus, D. S. (2018). OASIS-3: Longitudinal neuroimaging, clinical, and cognitive dataset for normal aging and Alzheimer's disease. Alzheimer's and Dementia, 14(7S, pt. 2), P138–P138. https://doi.org/10.1016/j.jalz.2018.06.2231

Lancaster, P., & Farahat, H. K. (1972). Norms on direct sums and tensor products. Mathematics of Computation, 26(118), 401–414.

Malioutov, D. M., Çetin, M., & Willsky, A. S. (2005). Homotopy continuation for sparse signal representation. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 733–736). Piscataway, NJ: IEEE. https://doi.org/10.1109/ICASSP.2005.1416408

Mallat, S. S. (2009). A wavelet tour of signal processing. Amsterdam: Elsevier. https://doi.org/10.1016/B978-0-12-374370-1.X0001-8

Mallat, S. G., & Zhang, Z. (1993). Matching pursuits with time-frequency dictionaries. IEEE Transactions on Signal Processing, 41(12), 3397–3415. https://doi.org/10.1109/78.258082

Pati, Y. C., Rezaiifar, R., & Krishnaprasad, P. S. (1993). Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition. In Proceedings of the Asilomar Conference on Signals, Systems & Computers (vol. 1, pp. 40–44). Piscataway, NJ: IEEE. https://doi.org/10.1109/acssc.1993.342465

Roemer, F., Del Galdo, G., & Haardt, M. (2014). Tensor-based algorithms for learning multidimensional separable dictionaries. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 3963–3967). Piscataway, NJ: IEEE. https://doi.org/10.1109/ICASSP.2014.6854345

Rosário, F., Monteiro, F. A., & Rodrigues, A. (2016). Fast matrix inversion updates for massive MIMO detection and precoding. IEEE Signal Processing Letters, 23(1), 75–79. https://doi.org/10.1109/LSP.2015.2500682

Sidiropoulos, N. D., De Lathauwer, L., Fu, X., Huang, K., Papalexakis, E. E., & Faloutsos, C. (2017). Tensor decomposition for signal processing and machine learning. IEEE Transactions on Signal Processing, 65(13), 3551–3582. https://doi.org/10.1109/TSP.2017.2690524
