A Riemannian trust region method for the canonical tensor rank approximation problem

(1)

November 6, 2017
Paul Breiding (TU Berlin)
Nick Vannieuwenhoven (FWO / KU Leuven)

(2)

Overview

1 Introduction
2 Classic optimization
3 Riemannian optimization
4 Numerical experiments
5 Conclusions

(3)

Introduction


(4)


The tensor rank decomposition

Hitchcock (1927) introduced the tensor rank decomposition:¹

$$A = \sum_{i=1}^{r} a_i^1 \otimes a_i^2 \otimes \cdots \otimes a_i^d,$$

where the length r is assumed to be minimal.

¹ Also called CANDECOMP/PARAFAC, canonical (polyadic) decomposition, and CP decomposition.

(5)


Identifiability

Kruskal (1977) proved that the rank-1 terms appearing in

$$A = \sum_{i=1}^{r} a_i^1 \otimes a_i^2 \otimes \cdots \otimes a_i^d$$

are uniquely determined if r is small and d ≥ 3. We call the tensor

(6)

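Let $X \subset \mathbb{P}\mathbb{C}^N$ be a projective variety, and define its r-secant variety as

$$\sigma_r(X) := \overline{\sigma_r^0(X)} \quad\text{with}\quad \sigma_r^0(X) := \bigcup_{p_1, \ldots, p_r \in X} \langle p_1, \ldots, p_r \rangle,$$

where the bar denotes the Zariski closure and $\langle \cdot \rangle$ the linear span.

By construction, tensors of rank r are Zariski-dense in $\sigma_r(S)$, where

$$S := \{ a^1 \otimes \cdots \otimes a^d \mid a^k \in \mathbb{R}^{n_k} \setminus \{0\},\ k = 1, \ldots, d \}.$$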

(7)

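Then, X is generically r-identifiable if there exists a subvariety $Z \subset \sigma_r(X)$ with $\dim Z < \dim \sigma_r(X)$ such that for all $[p] \in \sigma_r(X) \setminus Z$:

$$[p] = [p_1 + \cdots + p_r] = [q_1 + \cdots + q_r] \implies \{[p_1], \ldots, [p_r]\} = \{[q_1], \ldots, [q_r]\},$$

including multiplicities.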

(8)

Generic r-identifiability of symmetric tensors in $S^d\mathbb{C}^n$,

$$A = \sum_{i=1}^{r} a_i^{\otimes d} \quad\text{with } a_i \in \mathbb{C}^n,$$

is completely understood because of Ballico (2005); Chiantini, Ottaviani, and V (2017); and Galuppi and Mella (2017).

General rule:

if $r < \frac{1}{n}\binom{n+d-1}{d}$ and $d \ge 3$ → generically r-identifiable
if $r \ge \frac{1}{n}\binom{n+d-1}{d}$ → not generically r-identifiable
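The threshold in the general rule is easy to evaluate. The Python sketch below checks only the first case of the rule (d ≥ 3 and r below (1/n)·binom(n+d−1, d)); the function names are ours, and the exceptional cases treated in the cited papers are ignored.

```python
from math import comb


def symmetric_bound(n: int, d: int) -> float:
    """Return binom(n + d - 1, d) / n, the threshold in the general rule for S^d C^n."""
    return comb(n + d - 1, d) / n


def below_symmetric_bound(n: int, d: int, r: int) -> bool:
    """First case of the general rule: d >= 3 and r strictly below the threshold."""
    return d >= 3 and r < symmetric_bound(n, d)


# Ternary cubics (n = 3, d = 3): the threshold is binom(5, 3) / 3 = 10 / 3.
print(symmetric_bound(3, 3))
print(below_symmetric_bound(3, 3, 3), below_symmetric_bound(3, 3, 4))
```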

(9)

Generic r-identifiability of tensors in $\mathbb{C}^{n_1} \otimes \cdots \otimes \mathbb{C}^{n_d}$,

$$A = \sum_{i=1}^{r} a_i^1 \otimes \cdots \otimes a_i^d \quad\text{with } a_i^k \in \mathbb{C}^{n_k},$$

is conjecturally understood because of

1 Strassen (1983) for d = 3 (partial result);
2 Bocci, Chiantini, Ottaviani (2013) for unbalanced cases;
3 Chiantini, Ottaviani, V (2014) for $n_1 \cdots n_d \le 15\,000$;
4 Abo, Ottaviani, Peterson (2009); Chiantini, Ottaviani (2012); Bocci, Chiantini (2013); Chiantini, Mella, Ottaviani (2014); etc.

Let $n_1 \ge \cdots \ge n_d$,

$$r_{cr} = \frac{n_1 \cdots n_d}{1 + \sum_{i=1}^{d} (n_i - 1)}, \qquad r_{ub} = n_2 \cdots n_d - \sum_{k=2}^{d} (n_k - 1).$$

Conjectured general rule:

if $r \ge r_{cr}$ or $d = 2$ → not generically r-identifiable
if $n_1 > r_{ub}$ and $r \ge r_{ub}$ → not generically r-identifiable
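A small Python sketch of $r_{cr}$, $r_{ub}$, and the two conjectured rules (helper names are hypothetical). The rules as stated only predict non-identifiability, so a `False` result means that neither rule applies, not that generic r-identifiability has been established.

```python
from math import prod


def critical_ranks(dims):
    """Return (r_cr, r_ub) for dims = (n_1, ..., n_d), sorted so that n_1 >= ... >= n_d."""
    n = sorted(dims, reverse=True)
    r_cr = prod(n) / (1 + sum(nk - 1 for nk in n))
    r_ub = prod(n[1:]) - sum(nk - 1 for nk in n[1:])
    return r_cr, r_ub


def conjectured_not_identifiable(dims, r):
    """True if one of the two conjectured rules predicts non-identifiability."""
    r_cr, r_ub = critical_ranks(dims)
    n1, d = max(dims), len(dims)
    return r >= r_cr or d == 2 or (n1 > r_ub and r >= r_ub)


print(critical_ranks((15, 15, 15)))
print(conjectured_not_identifiable((15, 15, 15), 30))   # neither rule applies
print(conjectured_not_identifiable((20, 3, 3), 5))      # unbalanced case, r >= r_ub
```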

(10)


A computational problem

The computational problem that we are interested in is the tensor rank approximation problem (TAP):

Given a tensor $A \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_d}$, find the rank-1 tensors $p_i := a_i^1 \otimes a_i^2 \otimes \cdots \otimes a_i^d$ such that

$$\Big\| \sum_{i=1}^{r} a_i^1 \otimes a_i^2 \otimes \cdots \otimes a_i^d - A \Big\|_F$$

is minimal.
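To make the TAP objective concrete, the following NumPy sketch evaluates the Frobenius-norm residual for given factor matrices. It is an illustration only (function names are ours), not the implementation used in the experiments.

```python
from functools import reduce

import numpy as np


def cpd_tensor(factors):
    """Return sum_i a_i^1 ⊗ ... ⊗ a_i^d, where factors[k] is the n_k x r matrix [a_i^k]_i."""
    r = factors[0].shape[1]
    return sum(reduce(np.multiply.outer, [F[:, i] for F in factors]) for i in range(r))


def tap_objective(factors, A):
    """Frobenius norm of the residual, ||sum_i a_i^1 ⊗ ... ⊗ a_i^d - A||_F."""
    return np.linalg.norm(cpd_tensor(factors) - A)   # 2-norm of the flattened array


rng = np.random.default_rng(0)
factors = [rng.standard_normal((n, 2)) for n in (3, 4, 5)]
A = cpd_tensor(factors)
print(tap_objective(factors, A))   # 0.0 for an exact decomposition
```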

(11)

Classic optimization


(12)


Classic parameterization

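In the literature, the TAP is usually formulated as a classic unconstrained optimization problem over some $\mathbb{R}^N$:

$$\min_{(A_1, \ldots, A_d) \in \mathbb{R}^{n_1 \times r} \times \cdots \times \mathbb{R}^{n_d \times r}} \Big\| \sum_{i=1}^{r} a_i^1 \otimes \cdots \otimes a_i^d - A \Big\|_F,$$

where $A_k := [a_i^k]_i$ are the factor matrices.

For applying classic optimization methods, we consider

$$\mathbb{R}^{n_1 \times r} \times \cdots \times \mathbb{R}^{n_d \times r} \simeq \mathbb{R}^{r(n_1 + \cdots + n_d)};$$

in this latter interpretation, I call them vectorized factor matrices (VFM).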
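The identification with $\mathbb{R}^{r(n_1+\cdots+n_d)}$ is just a reshaping. A minimal NumPy sketch, assuming column-major stacking of one factor matrix after the other (any fixed ordering would do; the names are illustrative):

```python
import numpy as np


def to_vfm(factors):
    """Stack A_1, ..., A_d into one vector of length r(n_1 + ... + n_d).

    Column-major order keeps each column a_i^k contiguous; this ordering is our choice.
    """
    return np.concatenate([F.ravel(order="F") for F in factors])


def from_vfm(p, dims, r):
    """Inverse of to_vfm: split p back into n_k x r factor matrices."""
    factors, start = [], 0
    for n in dims:
        factors.append(p[start:start + n * r].reshape((n, r), order="F"))
        start += n * r
    return factors


rng = np.random.default_rng(1)
dims, r = (4, 5, 6), 3
factors = [rng.standard_normal((n, r)) for n in dims]
p = to_vfm(factors)
print(p.shape)   # (45,)
```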

(13)

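Structure-exploiting Gauss–Newton methods with trust region or line search are state-of-the-art algorithms for this problem.

Let $p \in \mathbb{R}^{r(n_1 + \cdots + n_d)}$ represent the VFM and let $f(p) := \frac{1}{2}\|r_p\|^2$. Then, at every step, GN locally minimizes the model

$$m_p(x) = f(p) + (r_p^T J_p)\, x + \frac{1}{2} x^T (J_p^T J_p)\, x,$$

where

$$r_p := \sum_{i=1}^{r} a_i^1 \otimes \cdots \otimes a_i^d - A$$

is the residual and

$$J_p := \left[ \frac{\partial}{\partial p_j} \sum_{i=1}^{r} a_i^1 \otimes \cdots \otimes a_i^d \right]_j$$

is its Jacobian.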

(14)

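It is easy to compute that Terracini's matrix is

$$J_p = \begin{bmatrix} J_1 & J_2 & \cdots & J_r \end{bmatrix}, \quad\text{where}\quad J_i = \begin{bmatrix} I \otimes a_i^2 \otimes \cdots \otimes a_i^d & \cdots & a_i^1 \otimes \cdots \otimes a_i^{d-1} \otimes I \end{bmatrix}.$$

Note that each $J_i$ has a kernel of dimension $d - 1$; for example, for d = 3 the vector $(a_i^1, -a_i^2, 0)$ is mapped to $a_i^1 \otimes a_i^2 \otimes a_i^3 - a_i^1 \otimes a_i^2 \otimes a_i^3 = 0$. This complication arises because the VFM is an over-parameterization of the rank-1 tensor $a_i^1 \otimes \cdots \otimes a_i^d$.

The expected rank of $J \in \mathbb{R}^{(n_1 \cdots n_d) \times r(n_1 + \cdots + n_d)}$ is only

$$r(n_1 + \cdots + n_d - (d - 1)) < r(n_1 + \cdots + n_d).$$

This corresponds to the expected dimension of $\sigma_r(S)$, where S is the Segre variety of rank-1 tensors in $\mathbb{R}^{n_1} \otimes \cdots \otimes \mathbb{R}^{n_d}$.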
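For d = 3 the blocks $J_i$, their kernels, and the expected rank of $J$ can be checked numerically. In this sketch the rows follow NumPy's row-major flattening of the tensor; the factor sizes and the random seed are arbitrary choices.

```python
import numpy as np


def terracini_block(a, b, c):
    """J_i = [I ⊗ b ⊗ c, a ⊗ I ⊗ c, a ⊗ b ⊗ I] for d = 3; rows follow T.ravel() order."""
    n1, n2, n3 = len(a), len(b), len(c)
    Ja = np.einsum("pq,j,k->pjkq", np.eye(n1), b, c).reshape(-1, n1)
    Jb = np.einsum("p,jq,k->pjkq", a, np.eye(n2), c).reshape(-1, n2)
    Jc = np.einsum("p,j,kq->pjkq", a, b, np.eye(n3)).reshape(-1, n3)
    return np.hstack([Ja, Jb, Jc])


rng = np.random.default_rng(2)
dims, r = (4, 5, 6), 3
A, B, C = (rng.standard_normal((n, r)) for n in dims)
J = np.hstack([terracini_block(A[:, i], B[:, i], C[:, i]) for i in range(r)])

# (a_1, -b_1, 0) lies in the kernel of the first block.
J0 = terracini_block(A[:, 0], B[:, 0], C[:, 0])
print(np.linalg.norm(J0 @ np.concatenate([A[:, 0], -B[:, 0], np.zeros(6)])))

print(J.shape)                    # (120, 45)
print(np.linalg.matrix_rank(J))   # 39 = r (n1 + n2 + n3 - 2) for generic factors
```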

(15)

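Hence, the minimizer of the model $m_p(x)$ is not unique! We should compute a least-squares solution $x^\star$ of

$$J_p x^\star = r_p.$$

This $x^\star$ determines the new search direction $-x^\star$, i.e., the full step is $p - x^\star$.

An approximate solution is found by
1 computing the pseudo-inverse, $x^\star = J_p^\dagger r_p$; or
2 regularizing, $x^\star \approx (J_p^T J_p + \lambda I)^{-1} J_p^T r_p$; or
3 approximating $x^\star$ with the LSQR method; or
4 taking a subset $\tilde{J}_p$ of the columns of $J_p$ and setting $x^\star = \tilde{J}_p^\dagger r_p$.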
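The first two options can be compared on a small rank-deficient matrix that stands in for $J_p$ (the 20 × 8 rank-5 matrix, the cutoff 1e-10, and λ = 1e-6 are our illustrative choices). The truncated SVD and `numpy.linalg.lstsq` both return the minimum-norm least-squares solution; the regularized solution approaches it as λ → 0.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for J_p: 8 columns but rank 5, mimicking the kernel of Terracini's matrix.
J = rng.standard_normal((20, 5)) @ rng.standard_normal((5, 8))
r_p = rng.standard_normal(20)

# 1) Pseudo-inverse via a truncated SVD (minimum-norm least-squares solution).
U, s, Vt = np.linalg.svd(J, full_matrices=False)
keep = s > 1e-10 * s[0]
x_pinv = Vt[keep].T @ ((U[:, keep].T @ r_p) / s[keep])

# Same solution from LAPACK least squares with the same relative cutoff.
x_lstsq = np.linalg.lstsq(J, r_p, rcond=1e-10)[0]

# 2) Tikhonov regularization (J^T J + lambda I)^{-1} J^T r_p with a small lambda.
lam = 1e-6
x_reg = np.linalg.solve(J.T @ J + lam * np.eye(J.shape[1]), J.T @ r_p)

print(keep.sum(), np.linalg.norm(x_pinv - x_lstsq), np.linalg.norm(x_reg - x_pinv))
```

For LSQR, `scipy.sparse.linalg.lsqr` could be used instead; its `damp` argument corresponds to √λ in the regularized formulation.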

(16)

The general outline of the Gauss–Newton method is as follows:

S1. Choose a random starting point $p \in \mathbb{R}^{r(n_1 + \cdots + n_d)}$.
S2. While not converged, do:
S2.1. Compute the residual $r_p$ and the Jacobian $J_p$.
S2.2. Solve the least-squares problem $J_p x^\star = r_p$.
S2.3. Use a globalization method to determine the next iterate p.

Dedieu and Kim (2002) showed that the above method converges
1 quadratically to exact solutions; and
2 linearly to least-squares solutions,

where the multiplicative constants are functions of $\|J_p^\dagger\|_2$. Recall that $\|J_p^\dagger\|_2$ is also the condition number of the CPD, as considered in V (2017).
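A bare-bones version of S1 and S2 for d = 3 might look as follows. This is a sketch under simplifying assumptions: the dense Jacobian is formed explicitly, S2.2 uses `numpy.linalg.lstsq` with a singular-value cutoff to obtain the minimum-norm solution, and S2.3 is reduced to the full step $p \leftarrow p - x^\star$, i.e., there is no globalization. It is not one of the structure-exploiting solvers mentioned above.

```python
import numpy as np


def cpd3(A, B, C):
    """sum_i a_i ⊗ b_i ⊗ c_i for factor matrices A, B, C with r columns."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)


def jacobian3(A, B, C):
    """Terracini's matrix [J_1 ... J_r]; the columns of J_i belong to (a_i, b_i, c_i)."""
    blocks = []
    for i in range(A.shape[1]):
        a, b, c = A[:, i], B[:, i], C[:, i]
        Ja = np.einsum("pq,j,k->pjkq", np.eye(len(a)), b, c).reshape(-1, len(a))
        Jb = np.einsum("p,jq,k->pjkq", a, np.eye(len(b)), c).reshape(-1, len(b))
        Jc = np.einsum("p,j,kq->pjkq", a, b, np.eye(len(c))).reshape(-1, len(c))
        blocks.append(np.hstack([Ja, Jb, Jc]))
    return np.hstack(blocks)


def gauss_newton(T, A, B, C, max_iter=50, tol=1e-12):
    """Plain GN with full steps (no globalization), started from (A, B, C)."""
    A, B, C = A.copy(), B.copy(), C.copy()
    n1, n2, r = A.shape[0], B.shape[0], A.shape[1]
    for _ in range(max_iter):
        res = (cpd3(A, B, C) - T).ravel()              # S2.1: residual r_p
        if np.linalg.norm(res) < tol:
            break
        J = jacobian3(A, B, C)                         # S2.1: Jacobian J_p
        x = np.linalg.lstsq(J, res, rcond=1e-10)[0]    # S2.2: min-norm solution of J_p x = r_p
        x = x.reshape(r, -1)                           # row i holds the (a_i, b_i, c_i) update
        A -= x[:, :n1].T                               # S2.3 (simplified): p <- p - x
        B -= x[:, n1:n1 + n2].T
        C -= x[:, n1 + n2:].T
    return A, B, C


rng = np.random.default_rng(1)
A0, B0, C0 = (rng.standard_normal((n, 2)) for n in (4, 5, 3))
T = cpd3(A0, B0, C0)
start = [F + 1e-3 * rng.standard_normal(F.shape) for F in (A0, B0, C0)]
A, B, C = gauss_newton(T, *start)
print(np.linalg.norm(cpd3(A, B, C) - T))
```

Started close to an exact decomposition, the residual should drop quadratically, in line with the result of Dedieu and Kim (2002).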

(17)

Riemannian optimization


(18)


Choosing a good parameterization

Recall that we try to solve

$$\min_{\operatorname{rank}(B) \le r} \|B - A\|_F,$$

which is a constrained optimization problem.

We should understand the structure of the constraint set

$$\sigma_r^0(S) := \{ B \in \mathbb{R}^{n_1 \times \cdots \times n_d} \mid \operatorname{rank}(B) \le r \}.$$

Since it is a projection of the graph of a polynomial map, it follows from the Tarski–Seidenberg principle that $\sigma_r^0(S)$ is a semialgebraic set.

(19)

This is not great news, because there exist points $p \in \sigma_r^0(S)$ that are not locally diffeomorphic to some $\mathbb{R}^N$. This rules out applying smooth optimization methods, such as Gauss–Newton methods, directly on $\sigma_r^0(S)$.

We can circumvent this problem by considering the addition map

$$\Sigma : S \times \cdots \times S \to \sigma_r^0(S), \quad (p_1, \ldots, p_r) \mapsto \sum_{i=1}^{r} p_i.$$

This is a smooth map. Moreover, its source is the product of smooth manifolds, because $S \subset \mathbb{P}\mathbb{C}^{\Pi}$ is known to be a smooth

(20)

We could then reformulate our optimization problem:

$$\min_{\operatorname{rank}(B) \le r} \|B - A\|_F = \min_{(p_1, \ldots, p_r) \in S \times \cdots \times S} \|\Sigma(p_1, \ldots, p_r) - A\|_F.$$

Note that now we optimize
1 a (twice) differentiable function
2 over a smooth manifold.

These optimization problems are studied in Riemannian optimization.

(21)


Riemannian optimization

In general, if $M \subset \mathbb{R}^N$ is an m-dimensional embedded smooth manifold and $F : M \to \mathbb{R}^n$ a smooth function, then

$$\min_{x \in M} \frac{1}{2} \|F(x)\|^2$$

is a Riemannian optimization problem that can be solved by, e.g., a Riemannian Gauss–Newton method; see Absil, Mahoney, and Sepulchre (2008).

(22)


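Breiding and V (2017b) showed the following lemma.

Lemma. Let $x^\star$ be a local minimizer of $\min_{x \in M} \frac{1}{2}\|F(x)\|^2$. Let $\kappa = \varsigma_m(d_{x^\star}F)^{-1}$, where $\varsigma_m$ denotes the m-th largest singular value. Assume that $x_0$ is sufficiently close to $x^\star$ and that C is a sufficiently large constant. Then, the Riemannian Gauss–Newton method

1 converges quadratically, specifically
$$\|x_{k+1} - x^\star\| \le C \cdot \kappa \cdot \|x_k - x^\star\|^2 \quad\text{if } F(x^\star) = 0;$$
or
2 converges linearly, i.e.,
$$\|x_{k+1} - x^\star\| \le C \cdot \kappa^2 \|F(x^\star)\| \cdot \|x_k - x^\star\| \quad\text{if } F(x^\star) \ne 0.$$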

(23)


Analysis

There are many Riemannian optimization formulations of the TAP!

Let E be a smooth Riemannian manifold with $\dim E \ge \dim S^{\times r}$, and $\Psi : E \to \sigma_r^0(S)$ a surjective, smooth map. Then,

$$\arg\min_{x \in E} \frac{1}{2} \|\Psi(x) - A\|_F^2$$

(24)

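So which parameterization E should we choose? This depends on the problem you want to solve! In several applications, one wishes to interpret the individual rank-1 terms. I present the analysis from Breiding and V (2017c) next.

We assume there is a map $\pi : E \to S^{\times r}$ such that $\Psi$ factors through $S^{\times r}$, i.e., the diagram $E \xrightarrow{\pi} S^{\times r} \xrightarrow{\Sigma} \sigma_r^0(S)$ commutes with $\Psi = \Sigma \circ \pi$.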

(25)

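Let $m := \dim S^{\times r}$, let $x \in E$, and let $\kappa := \|(d_{\pi(x)}\Sigma)^\dagger\|_2$ be the geometric condition number from Paul's talk.

Then, we showed that

$$\frac{1}{\kappa \sqrt{r}} \cdot \frac{1}{\varsigma_m(d_x\pi)} \le \frac{\varsigma_1(d_x\Psi)}{\varsigma_1(d_x\pi)} \cdot \frac{1}{\varsigma_m(d_x\Psi)}.$$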

(26)

Taking $E = \mathbb{R}^{r(n_1 + \cdots + n_d)}$ as the classic parameterization, this yields

$$\frac{1}{\kappa \sqrt{r}} \cdot \frac{1}{\varsigma_m(d_x\pi)} \le r \cdot \frac{1}{\varsigma_m(J_x)} = r \cdot \|J_x^\dagger\|_2,$$

where
1 κ is the condition number of the trivial parameterization $E = S^{\times r}$ from Breiding and V (2017a);
2 $J_x$ is Terracini's matrix; and
3 $\|J_x^\dagger\|_2$ is the condition number of the parameterization with VFM from V (2017).

(27)

The spectrum of $d_x\pi$ was analyzed in V (2017) for norm-balanced x. We have

$$\varsigma_m(d_x\pi) = \min_{1 \le i \le r} \|a_i^1 \otimes \cdots \otimes a_i^d\|^{\frac{d-1}{d}}.$$
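For d = 3 the formula can be checked numerically on a single term. Here $d_x\pi$ for one term is the block $J_i$ of Terracini's matrix; with norm-balanced factors, $\|a\| = \|b\| = \|c\| = \alpha^{1/3}$ (our choice of α and sizes is arbitrary), its m-th singular value with $m = n_1 + n_2 + n_3 - 2$ should equal $\alpha^{2/3}$. For r terms, $d_x\pi$ is block diagonal, which gives the minimum over i.

```python
import numpy as np


def terracini_block(a, b, c):
    """d_x(pi) for one rank-1 term when d = 3: [I ⊗ b ⊗ c, a ⊗ I ⊗ c, a ⊗ b ⊗ I]."""
    n1, n2, n3 = len(a), len(b), len(c)
    Ja = np.einsum("pq,j,k->pjkq", np.eye(n1), b, c).reshape(-1, n1)
    Jb = np.einsum("p,jq,k->pjkq", a, np.eye(n2), c).reshape(-1, n2)
    Jc = np.einsum("p,j,kq->pjkq", a, b, np.eye(n3)).reshape(-1, n3)
    return np.hstack([Ja, Jb, Jc])


rng = np.random.default_rng(3)
alpha = 7.0                                  # target norm of a ⊗ b ⊗ c
beta = alpha ** (1 / 3)                      # balanced: ||a|| = ||b|| = ||c|| = alpha^(1/d)
a, b, c = [beta * v / np.linalg.norm(v) for v in (rng.standard_normal(n) for n in (4, 5, 6))]
sv = np.linalg.svd(terracini_block(a, b, c), compute_uv=False)   # descending order
m = 4 + 5 + 6 - 2                            # dim S for d = 3
print(sv[m - 1], alpha ** (2 / 3))           # agree up to rounding
```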

Paul showed that κ does not depend on the norms of the rank-1 tensors.

For a fixed set of unit-norm rank-1 tensors $p_i \in \mathbb{S}(S)$, the geometric condition number of the CPD $\sum_{i=1}^{r} \alpha_i p_i$ is the constant κ, while the classic condition number satisfies

$$\|J_x^\dagger\| \ge \frac{1}{\kappa \cdot r\sqrt{r}} \cdot \max_{1 \le i \le r} \alpha_i^{-1},$$

which blows up as some $\alpha_j \to 0$.

(28)

Consequently, for a fixed geometric condition numberκ, the convergence of the Gauss–Newton method applied to the classic parameterization can be slowed down arbitrarily by changing the norms of the rank-1 terms of the local minimizer, while the Riemannian Gauss–Newton method’s convergence is unaffected.
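The contrast between the two condition numbers can be reproduced numerically for d = 3. In this sketch we assume the characterization from Breiding and V (2017a) that κ is the inverse of the m-th singular value of $[U_1 \cdots U_r]$, where $U_i$ is an orthonormal basis of the tangent space $T_{p_i}S$; the classic condition number is $\|J_x^\dagger\|_2 = 1/\varsigma_m(J_x)$. Shrinking one rank-1 term leaves every tangent space, and hence κ, unchanged, while $\|J_x^\dagger\|_2$ grows.

```python
import numpy as np


def terracini_block(a, b, c):
    """J_i = [I ⊗ b ⊗ c, a ⊗ I ⊗ c, a ⊗ b ⊗ I] for one rank-1 term (d = 3)."""
    n1, n2, n3 = len(a), len(b), len(c)
    Ja = np.einsum("pq,j,k->pjkq", np.eye(n1), b, c).reshape(-1, n1)
    Jb = np.einsum("p,jq,k->pjkq", a, np.eye(n2), c).reshape(-1, n2)
    Jc = np.einsum("p,j,kq->pjkq", a, b, np.eye(n3)).reshape(-1, n3)
    return np.hstack([Ja, Jb, Jc])


def condition_numbers(A, B, C):
    """Return (geometric kappa, classic ||J^+||_2) for the CPD with factor matrices A, B, C."""
    r = A.shape[1]
    m_i = A.shape[0] + B.shape[0] + C.shape[0] - 2        # dim S for d = 3
    m = r * m_i                                             # dim of S^{x r}
    blocks = [terracini_block(A[:, i], B[:, i], C[:, i]) for i in range(r)]
    # Orthonormal basis U_i of T_{p_i}S, which is the column span of J_i.
    U = np.hstack([np.linalg.svd(J_i, full_matrices=False)[0][:, :m_i] for J_i in blocks])
    kappa = 1.0 / np.linalg.svd(U, compute_uv=False)[m - 1]
    classic = 1.0 / np.linalg.svd(np.hstack(blocks), compute_uv=False)[m - 1]
    return kappa, classic


rng = np.random.default_rng(4)
A, B, C = (rng.standard_normal((n, 3)) for n in (4, 5, 6))
results = {}
for t in (1.0, 1e-3, 1e-6):
    scaled = [F.copy() for F in (A, B, C)]
    for F in scaled:
        F[:, 0] *= t ** (1 / 3)       # multiplies the first rank-1 term by t
    results[t] = condition_numbers(*scaled)
    print(t, results[t])
```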

(29)

Riemannian Gauss–Newton with trust region

We propose applying a Riemannian Gauss–Newton (RGN) method with trust region to

$$\min_{p \in S \times \cdots \times S} \frac{1}{2} \|\Sigma(p) - A\|_F^2 \quad\text{for a given } A \in \mathbb{R}^{n_1 \times \cdots \times n_d}.$$

(30)

Let $M \subset \mathbb{R}^N$ be an m-dimensional embedded submanifold.

A tangent vector to M at q is a vector $t_q \in \mathbb{R}^N$ such that there exists a smooth curve $p(t) \subset M$ with $t \in (-1, 1)$ for which

$$q = p(0) \quad\text{and}\quad t_q = \frac{d}{dt} p(t) \Big|_{t=0}.$$

The tangent space $T_qM \subset \mathbb{R}^N$ of M at $q \in M$ is the m-dimensional linear subspace spanned by all tangent vectors to M at q.

(31)

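In an RGN method, the objective function

$$f(x) = \frac{1}{2} \|\Sigma(x) - A\|^2$$

is locally approximated at $p \in S^{\times r}$ by the quadratic model

$$m_p(t) := f(p) + \langle d_pf, t \rangle + \frac{1}{2} \langle t, (d_p\Sigma^* \circ d_p\Sigma)(t) \rangle,$$

where $H_p := d_p\Sigma^* \circ d_p\Sigma$ is the GN Hessian approximation, and $\langle \cdot, \cdot \rangle$ is the inner product inherited from the ambient $\mathbb{R}^N$.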

(32)

The RGN method with trust region considers the model to be accurate only within a radius Δ about p.

The trust region subproblem (TRS) is

$$\min_{t \in T_pS^{\times r}} m_p(t) \quad\text{subject to}\quad \|t\| \le \Delta.$$

(33)

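We need to advance from $p \in S^{\times r}$ to $p' \in S^{\times r}$ along the direction $t \in T_pS^{\times r}$ computed from the TRS. However, while $p + t \in T_pS^{\times r}$, this point does not lie in $S^{\times r}$ in general!

[Figure: the point p + t in the tangent space $T_pS^{\times r}$ is mapped back to $R_p(t) \in S^{\times r}$.]

We need a retraction operator (Absil, Mahoney, Sepulchre, 2008) for smoothly mapping a neighborhood of $0 \in T_pS^{\times r}$ back to $S^{\times r}$.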

(34)

RGN with trust region method:

S1. Choose random initial points $p_i \in S$.
S2. Let $p^{(0)} \leftarrow (p_1, \ldots, p_r)$, and set $k \leftarrow 0$.
S3. Choose a trust region radius $\Delta > 0$.
S4. While not converged, do:
S4.1. Solve the trust region subproblem, resulting in $p_k \in T_{p^{(k)}}S^{\times r}$.
S4.2. Compute the tentative next iterate $p^{(k+1)} \leftarrow R_{p^{(k)}}(p_k)$ via a retraction in the direction of $p_k$ from $p^{(k)}$.
S4.3. Accept or reject the next iterate. If the former, increment k.
S4.4. Update the trust region radius Δ.
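The outline does not specify how S4.3 and S4.4 are carried out. A common textbook choice, which we assume here purely for illustration (it is not necessarily the rule used in RGN-HR), compares the actual decrease of f with the decrease predicted by the model $m_p$; the thresholds 0.25 and 0.75, η = 0.1, and the cap `delta_max` are conventional values, not taken from the slides.

```python
import math


def trust_region_update(f_old, f_new, predicted_decrease, step_norm, delta,
                        delta_max=1e3, eta=0.1):
    """Sketch of S4.3-S4.4: return (accept, new_delta) from the ratio rho."""
    rho = (f_old - f_new) / predicted_decrease   # predicted_decrease = m_p(0) - m_p(t) > 0
    if rho < 0.25:
        delta = 0.25 * delta                       # poor model agreement: shrink the region
    elif rho > 0.75 and math.isclose(step_norm, delta):
        delta = min(2.0 * delta, delta_max)        # good agreement on the boundary: expand
    return rho > eta, delta


print(trust_region_update(10.0, 9.0, 1.0, 1.0, 1.0))    # (True, 2.0)
print(trust_region_update(10.0, 11.0, 1.0, 0.5, 1.0))   # (False, 0.25)
```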

(35)

Retraction

Given a retraction operator $R'$ for S, a retraction operator R for the product manifold $S^{\times r} = S \times \cdots \times S$ at $p = (p_1, \ldots, p_r)$ is

$$R_p(\cdot) := (R'_{p_1} \times R'_{p_2} \times \cdots \times R'_{p_r})(\cdot),$$

which is called the product retraction.

Some known retraction operators for S are
1 the rank-(1, ..., 1) T-HOSVD (De Lathauwer, De Moor, Vandewalle, 2000), proved to be a retraction by Kressner, Steinlechner, and Vandereycken (2014); and
2 the rank-(1, ..., 1) ST-HOSVD (V, Vandebril, and Meerbergen, 2012), proved to be a retraction by Breiding and V (2017c).
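For d = 3, a rank-(1, 1, 1) T-HOSVD projects onto the dominant left singular vector of each unfolding, and the product retraction applies it term by term. The sketch below assumes $R'_p(t) := \text{T-HOSVD}(p + t)$; it illustrates the construction and is not the authors' implementation (the ST-HOSVD variant would instead project the tensor after each mode before computing the next singular vector).

```python
import numpy as np


def thosvd_rank1(T):
    """Rank-(1, 1, 1) T-HOSVD of a 3-way tensor: project onto dominant singular vectors."""
    us = []
    for k in range(3):
        unfolding = np.moveaxis(T, k, 0).reshape(T.shape[k], -1)     # mode-k unfolding
        us.append(np.linalg.svd(unfolding, full_matrices=False)[0][:, 0])
    core = np.einsum("ijk,i,j,k->", T, *us)                         # 1 x 1 x 1 core
    return core * np.einsum("i,j,k->ijk", *us)


def product_retraction(ps, ts):
    """R_p(t) = (R'_{p_1}(t_1), ..., R'_{p_r}(t_r)) with R'_{p_i}(t_i) := thosvd_rank1(p_i + t_i)."""
    return [thosvd_rank1(p + t) for p, t in zip(ps, ts)]


rng = np.random.default_rng(5)
p1, p2 = (np.einsum("i,j,k->ijk", *(rng.standard_normal(n) for n in (4, 5, 6))) for _ in range(2))
t1, t2 = (1e-4 * rng.standard_normal(p1.shape) for _ in range(2))
q1, q2 = product_retraction([p1, p2], [t1, t2])
```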

(36)

Trust region subproblem

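In Breiding and V (2017c), the TRS is solved by combining a standard dogleg step with a hot restarting scheme.

Let $g_p$ be the coordinate representation of $d_pf$, and let $H_p$ be the matrix of $d_p\Sigma^* \circ d_p\Sigma$. The dogleg step approximates the solution of the TRS by

$$\hat{p} = \begin{cases} p^N = -H_p^\dagger g_p & \text{if } \|p^N\| \le \Delta, \\ \dfrac{\Delta}{\|p^C\|}\, p^C, \text{ where } p^C = -\dfrac{g_p^T g_p}{g_p^T H_p g_p}\, g_p & \text{if } \|p^N\| > \Delta \text{ and } \|p^C\| \ge \Delta, \\ p^I := p^C + (\tau - 1)(p^N - p^C) \text{ s.t. } \|p^I\| = \Delta & \text{otherwise,} \end{cases}$$

where $p^C$ is the Cauchy point and $\tau \in [1, 2]$.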
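A direct transcription of the dogleg formula into NumPy, assuming $H_p$ is symmetric positive semidefinite and $g_p^T H_p g_p > 0$ (the function name is ours). The intermediate case solves a scalar quadratic for $s = \tau - 1 \in [0, 1]$.

```python
import numpy as np


def dogleg(g, H, delta):
    """Dogleg approximation of argmin_t g^T t + 0.5 t^T H t subject to ||t|| <= delta."""
    p_newton = -np.linalg.pinv(H) @ g                 # p^N = -H^+ g
    if np.linalg.norm(p_newton) <= delta:
        return p_newton
    p_cauchy = -(g @ g) / (g @ H @ g) * g             # p^C: model minimizer along -g
    norm_c = np.linalg.norm(p_cauchy)
    if norm_c >= delta:
        return (delta / norm_c) * p_cauchy            # steepest-descent step cut at the boundary
    d = p_newton - p_cauchy                           # second leg of the dogleg path
    a, b, c = d @ d, 2.0 * (p_cauchy @ d), p_cauchy @ p_cauchy - delta**2
    s = (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)   # s = tau - 1 in [0, 1]
    return p_cauchy + s * d                           # p^I with ||p^I|| = delta


g = np.array([1.0, 1.0])
H = np.diag([1.0, 4.0])
for delta in (2.0, 0.8, 0.3):
    print(delta, dogleg(g, H, delta))
```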

(37)


The hot restarts strategy

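The Newton direction

$$p^N = -H_p^\dagger g_p$$

is vital to the dogleg step.

Unfortunately, the Hessian approximation $H_p = d_p\Sigma^* \circ d_p\Sigma$ can be close to a singular matrix. In fact,

$$\sqrt{\|H_p^{-1}\|_2} = \frac{1}{\varsigma_m(d_p\Sigma)} =: \kappa(p), \quad\text{where } m = \dim S^{\times r}.$$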
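The identity above is easy to check once $d_p\Sigma$ is written as an N × m matrix D in an orthonormal basis of the tangent space, so that $H_p = D^T D$. The sketch uses a random matrix as a stand-in for D (an assumption for illustration only) and then makes two columns nearly dependent to mimic approaching the ill-posed locus.

```python
import numpy as np

rng = np.random.default_rng(3)
D = rng.standard_normal((30, 6))          # stand-in for d_p Σ in orthonormal coordinates
H = D.T @ D                               # Gauss–Newton Hessian approximation H_p
kappa_H = np.sqrt(np.linalg.norm(np.linalg.inv(H), 2))
kappa_D = 1.0 / np.linalg.svd(D, compute_uv=False)[-1]
print(kappa_H, kappa_D)                   # equal up to rounding

# Nearly dependent columns push H_p towards a singular matrix and kappa(p) up.
D_bad = D.copy()
D_bad[:, 5] = D_bad[:, 4] + 1e-6 * rng.standard_normal(30)
kappa_bad = 1.0 / np.linalg.svd(D_bad, compute_uv=False)[-1]
print(kappa_bad)
```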

(38)

Let

$$I_r := \{ p \in S^{\times r} \mid \kappa(p) = \infty \} \subset S^{\times r}$$

be the ill-posed locus. It turns out that $I_r$ is a closed, nonempty, positive-dimensional subvariety of $S^{\times r}$.

In Breiding and V (2017c), we provide heuristic arguments showing that near $q \in I_r$ any RGN method will need many steps to escape $I_r$ when $\Sigma(q)$ is a tensor
1 with infinitely many rank-r decompositions; or
2 whose border rank is strictly smaller than its rank.

Open questions:
1 Could such points be attractive for the RGN process?
2 Do all CPDs in $I_r$ cause slow convergence?

(39)

Whenever $H_p$ is close to a singular matrix, we suggest applying random perturbations to the current decomposition p until $H_p$ is sufficiently well-behaved. We call this a hot restarts procedure.

Evidently, the success of this procedure depends on the average geometric condition number in the neighborhood of a $p \in I_r$.
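The slides do not specify the perturbation distribution or when $H_p$ counts as sufficiently well-behaved. The sketch below makes both explicit with hypothetical parameters (`threshold`, `scale`, `max_tries`) and a toy condition number that is infinite on a toy "ill-posed locus"; it only illustrates the control flow of a hot restart.

```python
import numpy as np


def hot_restart(p, kappa, threshold=1e6, scale=1e-2, max_tries=100, rng=None):
    """Perturb p randomly until kappa(p) <= threshold (all parameters hypothetical)."""
    rng = np.random.default_rng() if rng is None else rng
    for _ in range(max_tries):
        if kappa(p) <= threshold:
            return p
        # Relative Gaussian perturbation of the whole decomposition.
        p = p + scale * np.linalg.norm(p) * rng.standard_normal(p.shape)
    return p                               # give up after max_tries perturbations


def toy_kappa(p):
    """Toy condition number that is infinite on the 'ill-posed locus' p[0] == p[1]."""
    gap = abs(p[0] - p[1])
    return np.inf if gap == 0 else 1.0 / gap


p = hot_restart(np.array([1.0, 1.0]), toy_kappa, rng=np.random.default_rng(0))
print(p, toy_kappa(p))
```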

(40)

Numerical experiments


(41)


The proposed RGN method with trust region and hot restarts (RGN-HR) was implemented in Matlab R2016b.

We compare it with some state-of-the-art nonlinear least squares solvers in Tensorlab v3.0 (Vervliet et al., 2016), namely nls_lm and

(42)

We consider parameterized tensors in $\mathbb{R}^{n_1 \times n_2 \times n_3}$ with varying condition numbers. There are three parameters:
1 $c \in [0, 1]$ regulates the "collinearity" of the factor matrices;
2 $s \ge 1$ regulates the scaling; and
3 r is the rank.

Typically,
1 increasing c increases the geometric condition number;
2 increasing s increases the classic condition number; and
3 increasing r decreases the probability of finding a decomposition.

(43)

The true rank-r tensor is then

$$A = \sum_{i=1}^{r} a_i^1 \otimes a_i^2 \otimes a_i^3.$$

Finally, we normalize the tensor and add random Gaussian noise $E \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ of magnitude τ:

$$B = \frac{A}{\|A\|_F} + \tau \frac{E}{\|E\|_F}.$$

The tensor B is the one we would like to approximate by a tensor of rank r.
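The noise model can be written in a few lines of NumPy; the tensor size, rank, noise level, and seed below are illustrative assumptions.

```python
import numpy as np


def noisy_tensor(A, tau, rng):
    """B = A / ||A||_F + tau * E / ||E||_F with E a standard Gaussian tensor."""
    E = rng.standard_normal(A.shape)
    return A / np.linalg.norm(A) + tau * E / np.linalg.norm(E)


rng = np.random.default_rng(6)
A = np.einsum("ir,jr,kr->ijk", *(rng.standard_normal((15, 3)) for _ in range(3)))
B = noisy_tensor(A, 1e-5, rng)
print(np.linalg.norm(B - A / np.linalg.norm(A)))   # equals tau up to rounding
```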

(44)

We will choose k random starting points and then apply each of the methods to each of the starting points.

The key performance criterion (on a single processor) is the expected time to success (TTS).

Let
1 the probability of success be $p_S$,
2 the probability of failure be $p_F = 1 - p_S$,
3 a successful decomposition take $m_S$ seconds, and
4 a failed decomposition take $m_F$ seconds.

Then, the expected time to a first success is

$$\mathbb{E}[\mathrm{TTS}] = \sum_{k=1}^{\infty} p_F^{k-1} p_S \big(m_S + (k-1) m_F\big) = \frac{p_S m_S + p_F m_F}{p_S}.$$
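The closed form can be checked against a truncated version of the series; the values $p_S = 0.25$, $m_S = 2$ s, and $m_F = 3$ s are made up for illustration.

```python
def expected_tts(p_s, m_s, m_f):
    """Closed form E[TTS] = (p_S m_S + p_F m_F) / p_S with p_F = 1 - p_S."""
    p_f = 1.0 - p_s
    return (p_s * m_s + p_f * m_f) / p_s


def expected_tts_series(p_s, m_s, m_f, terms=2000):
    """Truncated series: the first success happens at attempt k after k - 1 failures."""
    p_f = 1.0 - p_s
    return sum(p_f ** (k - 1) * p_s * (m_s + (k - 1) * m_f) for k in range(1, terms + 1))


print(expected_tts(0.25, 2.0, 3.0))          # 11.0
print(expected_tts_series(0.25, 2.0, 3.0))
```

With these numbers, the expected time to success is 11 s, even though a single successful run takes only 2 s.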

(45)

Speedup of RGN-HR

[Figure: speedup of RGN-HR relative to GNDL-PCG, GNDL, and RGN-Reg on Model 1, 15×15×15 tensors, for ranks r = 15, 20, 25, 30, collinearity c ∈ {0.0, 0.25, 0.5, 0.75}, and scaling s ∈ {1, 2, 3, 4}; noise level $\tau = 10^{-3}$.]

(46)

Speedup of RGN-HR

[Figure: speedup of RGN-HR relative to GNDL-PCG, GNDL, and RGN-Reg on Model 1, 15×15×15 tensors, for ranks r = 15, 20, 25, 30, collinearity c ∈ {0.0, 0.25, 0.5, 0.75, 0.95}, and scaling s ∈ {1, 2, 3, 4}; noise level $\tau = 10^{-5}$.]

(47)

Speedup of RGN-HR

[Figure: speedup of RGN-HR relative to GNDL-PCG, GNDL, and RGN-Reg on Model 1, 15×15×15 tensors, for ranks r = 15, 20, 25, 30, collinearity c ∈ {0.0, 0.25, 0.5, 0.75, 0.95}, and scaling s ∈ {1, 2, 3, 4}; noise level $\tau = 10^{-7}$.]

(48)

Speedup of RGN-HR

[Figure: speedup of RGN-HR relative to GNDL and GNDL-PCG on Model 2, 13×11×9 tensors, as a function of the scaling s and the rank r ∈ {5, 7, 9, 11, 13}; failures are marked inf; noise level $\tau = 10^{-5}$.]

(49)

Convergence plots

Model 2, rank 7, scaling s = 2, noise level $\tau = 10^{-5}$

[Figure: objective value versus time (s) for RGN-HR, RGN-Reg, GNDL, LM, GNDL-PCG, and LM-PCG.]

(50)


Conclusions


(60)


Take-away story:

1 The classic and geometric condition numbers qualitatively predict the difference between a classic GN method and an RGN method for solving TAPs.
2 We proposed a Riemannian Gauss–Newton trust region method with dogleg step and hot restarts for solving TAPs.
3 Specifically for badly scaled problems the RGN method is

(61)


(62)


Main references

Breiding and V (2017a), The condition number of join decompositions, 2017. (Submitted)

Breiding and V (2017b), Convergence analysis of Riemannian Gauss–Newton methods and its connection with the geometric condition number, Applied Mathematics Letters, 2017. (Accepted)

Breiding and V (2017c), A Riemannian trust region method for the canonical tensor rank approximation problem, 2017. (Submitted)

(63)


General

De Lathauwer, De Moor, and Vandewalle (2000), A multilinear singular value decomposition, SIAM J. Matrix Anal. Appl.

Hitchcock (1927), The expression of a tensor or a polyadic as a sum of products, J. Math. Phys.

Kruskal (1977), Three-way arrays: rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics, Linear Algebra Appl.

V, Vandebril, and Meerbergen (2012), A new truncation strategy for the higher-order singular value decomposition, SIAM J. Sci. Comput.

Vervliet, Debals, Sorber, Van Barel, and De Lathauwer (2016), Tensorlab v3.0, Available online.

(64)


Generic identifiability I

Ballico (2005), On the weak non-defectivity of Veronese embeddings of projective spaces, Centr. Eur. J. Math.

Abo, Ottaviani, and Peterson (2009), Induction for secant varieties of Segre varieties, Trans. Amer. Math. Soc.

Bocci and Chiantini (2013), On the identifiability of binary Segre products, J. Algebraic Geom.

Bocci, Chiantini, and Ottaviani (2013), Refined methods for the identifiability of tensors, Ann. Mat. Pura Appl.

Chiantini, Mella, and Ottaviani (2014), One example of general unidentifiable tensors, J. Alg. Stat.

Chiantini and Ottaviani (2012), On generic identifiability of 3-tensors of small rank, SIAM J. Matrix Anal. Appl.

Chiantini, Ottaviani, and V (2014), An algorithm for generic and low-rank specific identifiability of complex tensors, SIAM J. Matrix Anal. Appl.

Chiantini, Ottaviani, and V (2017), On generic identifiability of

(65)


Generic identifiability II

Galuppi and Mella (2017), Identifiability of homogeneous polynomials and Cremona Transformations, arXiv.

Qi, Comon, and Lim (2016), Semialgebraic geometry of nonnegative tensor rank, SIAM J. Matrix Anal. Appl.

Strassen (1983), Rank and optimal computation of generic tensors, Linear Algebra Appl.

Conditioning

Bürgisser and Cucker (2013), Condition: The Geometry of Numerical Algorithms, Springer.

de Silva and Lim (2008), Tensor rank and the ill-posedness of the best low-rank approximation problem, SIAM J. Matrix Anal. Appl.

Lee (2013), Introduction to Smooth Manifolds.

V (2017), Condition numbers for the tensor rank decomposition.

(66)


Optimization

Absil, Mahoney, and Sepulchre (2008), Optimization Algorithms on Matrix Manifolds.

Dedieu and Kim (2002), Newton's method for analytic systems of equations with constant rank derivatives, J. Complexity.

Kressner, Steinlechner, and Vandereycken (2014), Low-rank tensor completion by Riemannian optimization, BIT.

(67)


First, compute the Cholesky decomposition, with $R \in \mathbb{R}^{r \times r}$ upper triangular, of

$$C = c\,\mathbf{1}\mathbf{1}^T + (1 - c) I = R^T R.$$

Then, the factor matrices are

$$A_k = N_k R \operatorname{diag}(s^0, s^1, s^2, \ldots, s^r),$$

where $N_k$ has standard normally distributed elements.
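A NumPy sketch of this construction. Because R is r × r, we assume the diagonal scaling has the r entries $s^0, s^1, \ldots, s^{r-1}$; this indexing is our assumption. The Cholesky factorization requires c < 1, since C is singular for c = 1.

```python
import numpy as np


def model1_factors(dims, r, c, s, rng):
    """Factor matrices A_k = N_k R D, where C = c 11^T + (1 - c) I = R^T R.

    Assumption: D = diag(s^0, s^1, ..., s^(r-1)), i.e., r entries so that D is r x r.
    """
    C = c * np.ones((r, r)) + (1.0 - c) * np.eye(r)
    R = np.linalg.cholesky(C).T       # numpy returns lower-triangular L with C = L L^T, so R = L^T
    D = np.diag(float(s) ** np.arange(r))
    return [rng.standard_normal((n, r)) @ R @ D for n in dims]


factors = model1_factors((15, 15, 15), r=15, c=0.5, s=2, rng=np.random.default_rng(7))
print([F.shape for F in factors])
```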

(68)


The settings for the Tensorlab methods were as follows:

AlgOpts = []; opts = [];
AlgOpts.TolFun = 1e-9 * tau^2;
AlgOpts.TolX = 1e-12;
AlgOpts.TolAbs = 0;
AlgOpts.MaxIter = 1000;
AlgOpts.CGTol = 1e-6;
AlgOpts.CGMaxIter = 75;
AlgOpts.LargeScale = true; % or false
opts.Compression = false;

(69)


The settings for the proposed method were:

AlgOpts = []; opts = [];
AlgOpts.TolFun = 1e-6 * tau^2;
AlgOpts.TolX = 1e-12;
AlgOpts.TolAbs = 0;
AlgOpts.MaxIter = 1000;
AlgOpts.MaxRestarts = 500;
opts.Compression = false;
opts.AlgorithmOptions = AlgOpts;

We observed a delicate dependency on the relative function value tolerance TolFun for all methods.