A Riemannian trust region method for the canonical tensor rank approximation problem
November 6, 2017
Paul Breiding (TU Berlin)
Nick Vannieuwenhoven (FWO / KU Leuven)
Overview
1 Introduction
2 Classic optimization
3 Riemannian optimization
4 Numerical experiments
5 Conclusions
Introduction
The tensor rank decomposition
Hitchcock (1927) introduced the tensor rank decomposition:¹
$$\mathcal{A} = \sum_{i=1}^{r} a_i^1 \otimes a_i^2 \otimes \cdots \otimes a_i^d,$$
where the length $r$ is assumed to be minimal.
¹ Also called CANDECOMP/PARAFAC, canonical (polyadic) decomposition, and CP decomposition.
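As a small illustration of the decomposition above, the following NumPy sketch assembles a tensor from its rank-1 terms. The function name `cp_tensor` and the convention that the columns of `factors[k]` hold the vectors $a_i^k$ are assumptions made for this example only.

```python
import numpy as np

def cp_tensor(factors):
    """Sum of r rank-1 tensors; factors[k] has shape (n_k, r)."""
    r = factors[0].shape[1]
    A = np.zeros(tuple(F.shape[0] for F in factors))
    for i in range(r):
        term = factors[0][:, i]
        for F in factors[1:]:
            # Build a_i^1 ⊗ a_i^2 ⊗ ... one mode at a time.
            term = np.multiply.outer(term, F[:, i])
        A += term
    return A

rng = np.random.default_rng(0)
factors = [rng.standard_normal((n, 2)) for n in (4, 3, 2)]
A = cp_tensor(factors)  # a 4 x 3 x 2 tensor of rank at most 2
```

Note that this only builds a tensor of rank at most $r$; it does not certify that $r$ is minimal.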
Identifiability
Kruskal (1977) proved that the rank-1 terms appearing in
$$\mathcal{A} = \sum_{i=1}^{r} a_i^1 \otimes a_i^2 \otimes \cdots \otimes a_i^d$$
are uniquely determined if $r$ is small and $d \ge 3$. We call the tensor
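Let $X \subset \mathbb{P}\mathbb{C}^N$ be a projective variety, and define its $r$-secant variety as
$$\sigma_r(X) := \overline{\sigma_r^0(X)} \quad\text{with}\quad \sigma_r^0(X) := \bigcup_{p_1,\ldots,p_r \in X} \langle p_1, \ldots, p_r \rangle.$$
By construction, tensors of rank $r$ are Zariski-dense in $\sigma_r(\mathcal{S})$, where
$$\mathcal{S} := \{ a^1 \otimes \cdots \otimes a^d \mid a^k \in \mathbb{R}^{n_k} \setminus \{0\},\ k = 1, \ldots, d \}$$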
Then, $X$ is generically $r$-identifiable if there exists a $Z$ with $\dim Z < \dim \sigma_r(X)$ such that
$$\forall [p] \in \sigma_r(X) \setminus Z:\quad [p] = [p_1 + \cdots + p_r] = [q_1 + \cdots + q_r] \quad\text{iff}\quad \{[p_1], \ldots, [p_r]\} = \{[q_1], \ldots, [q_r]\},$$
including multiplicities.
Generic $r$-identifiability of symmetric tensors in $S^d\mathbb{C}^n$,
$$\mathcal{A} = \sum_{i=1}^{r} a_i^{\otimes d} \quad\text{with } a_i \in \mathbb{C}^n,$$
is completely understood because of Ballico (2005); Chiantini, Ottaviani, and V (2017); and Galuppi and Mella (2017).
General rule:
if $r < n^{-1}\binom{n+d-1}{d}$ and $d \ge 3$ → generic $r$-identifiability
if $r \ge n^{-1}\binom{n+d-1}{d}$
Generic $r$-identifiability of tensors in $\mathbb{C}^{n_1} \otimes \cdots \otimes \mathbb{C}^{n_d}$,
$$\mathcal{A} = \sum_{i=1}^{r} a_i^1 \otimes \cdots \otimes a_i^d \quad\text{with } a_i^k \in \mathbb{C}^{n_k},$$
is conjecturally understood because of
1 Strassen (1983) for $d = 3$ (partial result);
2 Bocci, Chiantini, and Ottaviani (2013) for unbalanced cases;
3 Chiantini, Ottaviani, and V (2014) for $n_1 \cdots n_d \le 15\,000$;
4 Abo, Ottaviani, and Peterson (2009); Chiantini and Ottaviani (2012); Bocci and Chiantini (2013); Chiantini, Mella, and Ottaviani (2014); etc.
Let $n_1 \ge \cdots \ge n_d$,
$$r_{\mathrm{cr}} = \frac{n_1 \cdots n_d}{1 + \sum_{i=1}^{d} (n_i - 1)}, \qquad r_{\mathrm{ub}} = n_2 \cdots n_d - \sum_{k=2}^{d} (n_k - 1).$$
Conjectured general rule:
if $r \ge r_{\mathrm{cr}}$ or $d = 2$ → not generically $r$-identifiable
if $n_1 > r_{\mathrm{ub}}$ and $r \ge r_{\mathrm{ub}}$ → not generically $r$-identifiable
A computational problem
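The computational problem that we are interested in is the tensor canonical rank decomposition problem (TAP):
Given a tensor $\mathcal{A} \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_d}$, find the rank-1 tensors $p_i := a_i^1 \otimes a_i^2 \otimes \cdots \otimes a_i^d$ such that
$$\Big\| \sum_{i=1}^{r} a_i^1 \otimes a_i^2 \otimes \cdots \otimes a_i^d - \mathcal{A} \Big\|_F$$
is minimal.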
Classic optimization
Classic parameterization
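In the literature, the TAP is usually formulated as a classic unconstrained optimization problem over some $\mathbb{R}^N$:
$$\min_{(A_1,\ldots,A_d) \in \mathbb{R}^{n_1 \times r} \times \cdots \times \mathbb{R}^{n_d \times r}} \Big\| \sum_{i=1}^{r} a_i^1 \otimes \cdots \otimes a_i^d - \mathcal{A} \Big\|_F,$$
where $A_k := [a_i^k]_i$ are the factor matrices.
For applying classic optimization methods, we consider
$$\mathbb{R}^{n_1 \times r} \times \cdots \times \mathbb{R}^{n_d \times r} \simeq \mathbb{R}^{r(n_1 + \cdots + n_d)};$$
in this latter interpretation, I call them vectorized factor matrices (VFM).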
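Structure-exploiting Gauss–Newton (GN) methods with trust region or line search are state-of-the-art algorithms for this problem.
Let $p \in \mathbb{R}^{r(n_1 + \cdots + n_d)}$ represent the VFM. Then, at every step, GN locally minimizes the model
$$m_p(x) = f(p) + (r_p^T J_p) x + \tfrac{1}{2} x^T (J_p^T J_p) x,$$
where $f(p) = \tfrac{1}{2} \|r_p\|^2$ is the objective,
$$r_p := \sum_{i=1}^{r} a_i^1 \otimes \cdots \otimes a_i^d - \mathcal{A}$$
is the residual, and
$$J_p := \Big[ \frac{\partial}{\partial p_j} \sum_{i=1}^{r} a_i^1 \otimes \cdots \otimes a_i^d \Big]_j$$
is its Jacobian.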
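It is easy to compute that Terracini's matrix is
$$J_p = \begin{bmatrix} J_1 & J_2 & \cdots & J_r \end{bmatrix}, \quad\text{where}\quad J_i = \begin{bmatrix} \mathrm{Id} \otimes a_i^2 \otimes \cdots \otimes a_i^d & \cdots & a_i^1 \otimes \cdots \otimes a_i^{d-1} \otimes \mathrm{Id} \end{bmatrix}.$$
Note that each $J_i$ has a kernel of dimension $d - 1$. This complication arises because the VFM is an over-parameterization of the rank-1 tensor $a_i^1 \otimes \cdots \otimes a_i^d$.
The expected rank of $J \in \mathbb{R}^{(n_1 \cdots n_d) \times r(n_1 + \cdots + n_d)}$ is only
$$r(n_1 + \cdots + n_d - (d - 1)) < r(n_1 + \cdots + n_d).$$
This corresponds to the expected dimension of $\sigma_r(\mathcal{S})$, where $\mathcal{S}$ is the Segre variety of rank-1 tensors in $\mathbb{R}^{n_1} \otimes \cdots \otimes \mathbb{R}^{n_d}$.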
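The rank deficiency described above can be checked numerically. The sketch below builds Terracini's matrix with Kronecker products for a random $3 \times 3 \times 3$ example with $r = 2$; the helper names `terracini_block` and `terracini`, the row-major vectorization, and the chosen sizes are assumptions of this illustration.

```python
import numpy as np
from functools import reduce

def terracini_block(vectors):
    """J_i for one rank-1 term a^1 ⊗ ... ⊗ a^d (row-major vectorization)."""
    cols = [v.reshape(-1, 1) for v in vectors]
    blocks = []
    for k, v in enumerate(vectors):
        # Replace the k-th vector by an identity matrix: derivative w.r.t. a^k.
        parts = cols[:k] + [np.eye(v.size)] + cols[k + 1:]
        blocks.append(reduce(np.kron, parts))
    return np.hstack(blocks)

def terracini(factors):
    """J_p = [J_1 ... J_r] for factor matrices of shape (n_k, r)."""
    r = factors[0].shape[1]
    return np.hstack([terracini_block([F[:, i] for F in factors])
                      for i in range(r)])

rng = np.random.default_rng(1)
dims, r = (3, 3, 3), 2
factors = [rng.standard_normal((n, r)) for n in dims]
J = terracini(factors)

# Each J_i has n_1 + n_2 + n_3 columns but a (d - 1)-dimensional kernel.
rank_J1 = np.linalg.matrix_rank(terracini_block([F[:, 0] for F in factors]))
# Expected rank of J: r * (n_1 + n_2 + n_3 - (d - 1)).
rank_J = np.linalg.matrix_rank(J)
```

For this generic example, `J` has 18 columns, while `rank_J1` and `rank_J` come out as $9 - 2 = 7$ and $2 \cdot 7 = 14$, matching the counts above.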
Hence, the minimizer of the model $m_p(x)$ is not unique! We should compute the least-squares solution of
$$J_p x^\star = r_p.$$
This $x^\star$ is the new search direction.
An approximate solution is found by
computing the pseudo-inverse $x^\star = J_p^\dagger r_p$; or
regularizing $x^\star \approx (J_p^T J_p + \lambda \cdot \mathrm{Id})^{-1} J_p^T r_p$; or
$x^\star$ is approximated using the LSQR method; or
a subset of columns of $J_p$ is taken, $\tilde{J}_p$, and $x^\star = \tilde{J}_p^\dagger r_p$.
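The first two options in the list above can be compared on a small rank-deficient system. In this sketch the matrix `J` is a hypothetical stand-in for a Jacobian (not Terracini's matrix), and the cutoff `rcond` and the regularization parameter `lam` are arbitrary choices; the LSQR and column-subset variants are not shown.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical rank-deficient "Jacobian": 8 x 5 with rank 3.
J = rng.standard_normal((8, 3)) @ rng.standard_normal((3, 5))
res = rng.standard_normal(8)

# Minimum-norm least-squares solution via the pseudo-inverse.
x_pinv = np.linalg.pinv(J, rcond=1e-10) @ res

# Tikhonov regularization: (J^T J + lam * Id)^{-1} J^T r.
lam = 1e-8
x_reg = np.linalg.solve(J.T @ J + lam * np.eye(5), J.T @ res)

# Reference: NumPy's SVD-based least-squares solver with the same cutoff.
x_lstsq = np.linalg.lstsq(J, res, rcond=1e-10)[0]
```

With a small `lam`, the regularized solution stays close to the pseudo-inverse solution; larger values trade accuracy for better conditioning of the linear system.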
The general outline of the Gauss–Newton method is as follows:
S1. Choose a random starting point $p \in \mathbb{R}^{r(n_1 + \cdots + n_d)}$.
S2. While not converged do:
S2.1. Compute residual $r_p$ and Jacobian $J_p$.
S2.2. Solve the least-squares problem $J_p x^\star = r_p$.
S2.3. Use globalization method to determine next iterate $p$.
Dedieu and Kim (2002) showed that the above method converges
quadratically to exact solutions; and
linearly to least-squares solutions,
where the multiplicative constants are functions of $\|J_p^\dagger\|_2$. Recall that $\|J_p^\dagger\|_2$ is also the condition number of the CPD, as considered in V (2017).
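The Gauss–Newton outline above can be sketched for third-order tensors as follows. This is a minimal illustration, not the structure-exploiting implementations referred to in the talk: it forms the Jacobian densely, uses the pseudo-inverse for step S2.2, and omits the globalization of step S2.3, so it can only be expected to converge from a starting point close to a solution. The step is written as $-J_p^\dagger r_p$ and added to the iterate; all function names are assumptions of this example.

```python
import numpy as np
from functools import reduce

def cp_vec(factors):
    """Row-major vectorization of sum_i a_i^1 ⊗ a_i^2 ⊗ a_i^3."""
    return np.einsum('ir,jr,kr->ijk', *factors).ravel()

def jacobian(factors):
    """Terracini's matrix; columns ordered term by term, then mode by mode."""
    blocks = []
    for i in range(factors[0].shape[1]):
        cols = [F[:, [i]] for F in factors]
        for k, F in enumerate(factors):
            parts = cols[:k] + [np.eye(F.shape[0])] + cols[k + 1:]
            blocks.append(reduce(np.kron, parts))
    return np.hstack(blocks)

def gauss_newton(A, factors, iters=20):
    """Plain GN iteration with full steps (no trust region or line search)."""
    factors = [F.copy() for F in factors]
    for _ in range(iters):
        res = cp_vec(factors) - A.ravel()                     # S2.1: residual
        J = jacobian(factors)                                 # S2.1: Jacobian
        step = -np.linalg.pinv(J, rcond=1e-10) @ res          # S2.2: least squares
        pos = 0
        for i in range(factors[0].shape[1]):                  # unpack the step
            for F in factors:
                n = F.shape[0]
                F[:, i] += step[pos:pos + n]
                pos += n
    return factors

rng = np.random.default_rng(3)
true = [rng.standard_normal((n, 2)) for n in (4, 3, 3)]
A = np.einsum('ir,jr,kr->ijk', *true)
start = [F + 1e-3 * rng.standard_normal(F.shape) for F in true]
est = gauss_newton(A, start)
rel_residual = np.linalg.norm(cp_vec(est) - A.ravel()) / np.linalg.norm(A)
```

Starting from a random point instead, as in step S1, generally requires the globalization step that this sketch leaves out.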
Riemannian optimization
Choosing a good parameterization
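Recall that we try to minimize
$$\min_{\operatorname{rank}(\mathcal{B}) \le r} \|\mathcal{B} - \mathcal{A}\|_F,$$
which is a constrained optimization problem.
We should understand the structure of the constraint set
$$\sigma_r^0(\mathcal{S}) := \{ \mathcal{B} \in \mathbb{R}^{n_1 \times \cdots \times n_d} \mid \operatorname{rank}(\mathcal{B}) \le r \}.$$
Since it is a projection of a graph of a polynomial map, it follows from the Tarski–Seidenberg principle that $\sigma_r^0(\mathcal{S})$ is a semialgebraic set.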
This is not great news, because there exist points $p \in \sigma_r^0(\mathcal{S})$ at which the set is not locally diffeomorphic to some $\mathbb{R}^N$. This rules out smooth optimization methods, such as Gauss–Newton methods.
We can circumvent this problem by considering the addition map
$$\Sigma : \mathcal{S} \times \cdots \times \mathcal{S} \to \sigma_r^0(\mathcal{S}), \quad (p_1, \ldots, p_r) \mapsto \sum_{i=1}^{r} p_i.$$
This is a smooth map. Moreover, its source is the product of smooth manifolds, because $\mathcal{S} \subset \mathbb{P}\mathbb{C}^{\Pi}$ is known to be a smooth
We could then reformulate our optimization problem:
$$\min_{\operatorname{rank}(\mathcal{B}) \le r} \|\mathcal{B} - \mathcal{A}\|_F = \min_{(p_1, \ldots, p_r) \in \mathcal{S} \times \cdots \times \mathcal{S}} \|\Sigma(p_1, \ldots, p_r) - \mathcal{A}\|_F.$$
Note that now we optimize
1 a (twice) differentiable function,
2 over a smooth manifold.
These optimization problems are studied in Riemannian optimization.
Riemannian optimization
In general, if $\mathcal{M} \subset \mathbb{R}^N$ is an $m$-dimensional embedded smooth manifold and $F : \mathcal{M} \to \mathbb{R}^n$ a smooth function, then
$$\min_{x \in \mathcal{M}} \tfrac{1}{2} \|F(x)\|^2$$
is a Riemannian optimization problem that can be solved by, e.g., a Riemannian Gauss–Newton method; see Absil, Mahony, and Sepulchre (2008).
Breiding and V (2017b) showed the following
Lemma
Let $x^\star$ be a local minimizer of $\min_{x \in \mathcal{M}} \tfrac{1}{2} \|F(x)\|^2$. Let $\kappa = \varsigma_m(d_{x^\star} F)^{-1}$. Assume that $x_0$ is sufficiently close to $x^\star$ and that $C$ is a sufficiently large constant. Then, the Riemannian Gauss–Newton method
converges quadratically, specifically
$$\|x_{k+1} - x^\star\| \le C \cdot \kappa \cdot \|x_k - x^\star\|^2 \quad\text{if } F(x^\star) = 0; \text{ or}$$
converges linearly, i.e.,
$$\|x_{k+1} - x^\star\| \le C \cdot \kappa^2 \|F(x^\star)\| \cdot \|x_k - x^\star\| \quad\text{if } F(x^\star) \ne 0.$$
Analysis
There are many Riemannian optimization formulations of the TAP!
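Let $\mathcal{E}$ be a smooth Riemannian manifold with $\dim \mathcal{E} \ge \dim \mathcal{S}^{\times r}$, and $\Psi : \mathcal{E} \to \sigma_r^0(\mathcal{S})$ a surjective, smooth map. Then,
$$\arg\min_{x \in \mathcal{E}} \tfrac{1}{2} \|\Psi(x) - \mathcal{A}\|_F^2$$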
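So which parameterization $\mathcal{E}$ should we choose?
This depends on the problem you want to solve! In several applications, one wishes to interpret the individual rank-1 terms. I present the analysis from Breiding and V (2017c) next.
We assume there is a map $\pi : \mathcal{E} \to \mathcal{S}^{\times r}$, so that the diagram
$$\mathcal{E} \xrightarrow{\ \pi\ } \mathcal{S}^{\times r} \xrightarrow{\ \Sigma\ } \sigma_r^0(\mathcal{S}), \qquad \Psi : \mathcal{E} \to \sigma_r^0(\mathcal{S}),$$
commutes, i.e., $\Psi = \Sigma \circ \pi$.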
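Let $m := \dim \mathcal{S}^{\times r}$, let $x \in \mathcal{E}$, and let $\kappa := \|(d_{\pi(x)} \Sigma)^\dagger\|_2$ be the geometric condition number from Paul's talk.
Then, we showed that
$$\frac{1}{\kappa \cdot \sqrt{r}} \cdot \frac{1}{\varsigma_m(d_x \pi)} \le \frac{\varsigma_1(d_x \Psi)}{\varsigma_1(d_x \pi)} \cdot \frac{1}{\varsigma_m(d_x \Psi)}.$$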
Taking $\mathcal{E} = \mathbb{R}^{r(n_1 + \cdots + n_d)}$ as the classic parameterization, this yields
$$\frac{1}{\kappa \cdot \sqrt{r}} \cdot \frac{1}{\varsigma_m(d_x \pi)} \le r \cdot \frac{1}{\varsigma_m(J_x)} = r \cdot \|J_x^\dagger\|_2,$$
where
1 $\kappa$ is the condition number of the trivial parameterization $\mathcal{E} = \mathcal{S}^{\times r}$ from Breiding and V (2017a);
2 $J_x$ is Terracini's matrix; and
3 $\|J_x^\dagger\|_2$ is the condition number of the parameterization with VFM from V (2017).
The spectrum of $d_x \pi$ was analyzed in V (2017) for norm-balanced $x$. We have
$$\varsigma_m(d_x \pi) = \min_{1 \le i \le r} \|a_i^1 \otimes \cdots \otimes a_i^d\|^{\frac{d-1}{d}}.$$
Paul showed that $\kappa$ does not depend on the norms of the rank-1 tensors.
For a fixed set of unit-norm rank-1 tensors $p_i \in \mathbb{S}(\mathcal{S})$, the geometric condition number of the CPD $\sum_{i=1}^{r} \alpha_i p_i$ is the constant $\kappa$, while the classic condition number satisfies
$$\|J_x^\dagger\| \ge \frac{1}{\kappa \cdot r \sqrt{r}} \cdot \max_{1 \le i \le r} \alpha_i^{-1},$$
which blows up as some $\alpha_j \to 0$.
Consequently, for a fixed geometric condition numberκ, the convergence of the Gauss–Newton method applied to the classic parameterization can be slowed down arbitrarily by changing the norms of the rank-1 terms of the local minimizer, while the Riemannian Gauss–Newton method’s convergence is unaffected.
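The growth of the classic condition number under rescaling can be observed numerically. The sketch below fixes two unit-norm rank-1 terms in $\mathbb{R}^{3 \times 3 \times 3}$, scales the second term by $\alpha$ in a norm-balanced way (each factor vector gets $\alpha^{1/3}$), and evaluates $1/\varsigma_m(J_x)$ for Terracini's matrix. The sizes, the values of $\alpha$, and the helper name `terracini` are assumptions of this example.

```python
import numpy as np

def terracini(factors):
    """Terracini's matrix for third-order tensors (row-major vectorization)."""
    n1, n2, n3 = (F.shape[0] for F in factors)
    blocks = []
    for a, b, c in zip(*(F.T for F in factors)):
        a, b, c = a[:, None], b[:, None], c[:, None]
        blocks += [np.kron(np.eye(n1), np.kron(b, c)),
                   np.kron(a, np.kron(np.eye(n2), c)),
                   np.kron(a, np.kron(b, np.eye(n3)))]
    return np.hstack(blocks)

rng = np.random.default_rng(4)
dims, r = (3, 3, 3), 2
unit = [rng.standard_normal((n, r)) for n in dims]
unit = [U / np.linalg.norm(U, axis=0) for U in unit]   # unit-norm columns

m = r * (sum(dims) - (len(dims) - 1))                  # dim of S^{x r}, here 14
classic = []
for alpha in (1.0, 1e-2, 1e-4):
    # Norm-balanced scaling: the second term has norm alpha.
    scale = np.array([1.0, alpha]) ** (1.0 / 3.0)
    J = terracini([U * scale for U in unit])
    sigma = np.linalg.svd(J, compute_uv=False)
    classic.append(1.0 / sigma[m - 1])                 # 1 / varsigma_m(J)
```

In this construction the columns of the second block of $J_x$ scale like $\alpha^{2/3}$, so the values in `classic` grow as $\alpha$ shrinks, even though the unit-norm rank-1 terms, and hence the geometric condition number, stay fixed.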
Riemannian Gauss–Newton with trust region
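We propose applying a Riemannian Gauss–Newton (RGN) method with trust region to
$$\min_{p \in \mathcal{S} \times \cdots \times \mathcal{S}} \tfrac{1}{2} \|\Sigma(p) - \mathcal{A}\|^2,$$
for a given $\mathcal{A} \in \mathbb{R}^{n_1 \times \cdots \times n_d}$.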
Let $\mathcal{M} \subset \mathbb{R}^N$ be an $m$-dimensional embedded submanifold.
A tangent vector to $\mathcal{M}$ at $q$ is a vector $t_q \in \mathbb{R}^N$ such that there exists a smooth curve $p(t) \subset \mathcal{M}$ with $t \in (-1, 1)$ for which
$$q = p(0) \quad\text{and}\quad t_q = \frac{d}{dt} p(0).$$
The tangent space $T_q \mathcal{M} \subset \mathbb{R}^N$ of $\mathcal{M}$ at $q \in \mathcal{M}$ is the $m$-dimensional linear subspace spanned by all tangent vectors to
In an RGN method, the objective function
$$f(x) = \tfrac{1}{2} \|\Sigma(x) - \mathcal{A}\|^2$$
is locally approximated at $p \in \mathcal{S}^{\times r}$ by the quadratic model
$$m_p(t) := f(p) + \langle d_p f, t \rangle + \tfrac{1}{2} \langle t, (d_p \Sigma^* \circ d_p \Sigma)(t) \rangle,$$
where $H_p := d_p \Sigma^* \circ d_p \Sigma$ is the GN Hessian approximation, and $\langle \cdot, \cdot \rangle$ is the inner product inherited from the ambient $\mathbb{R}^N$.
The RGN method with trust region considers the model to be accurate only in a radius $\Delta$ about $p$.
The trust region subproblem (TRS) is
$$\min_{t \in T_p \mathcal{S}^{\times r}} m_p(t) \quad\text{subject to } \|t\| \le \Delta.$$
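We need to advance from $p \in \mathcal{S}^{\times r}$ to $p' \in \mathcal{S}^{\times r}$, along the direction $\mathbf{p}$. However, while $p + \mathbf{p} \in T_p \mathcal{S}^{\times r}$, this point does not lie in $\mathcal{S}^{\times r}$!
[Figure: the manifold $\mathcal{S}^{\times r}$, the tangent space $T_p \mathcal{S}^{\times r}$ at $p$, and the retracted point $R_p(\mathbf{p})$.]
We need a retraction operator (Absil, Mahony, and Sepulchre, 2008) for smoothly mapping a neighborhood of $0 \in T_p \mathcal{S}^{\times r}$ back to $\mathcal{S}^{\times r}$.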
RGN with trust region method:
S1. Choose random initial points $p_i \in \mathcal{S}$.
S2. Let $p^{(1)} \leftarrow (p_1, \ldots, p_r)$, and set $k \leftarrow 1$.
S3. Choose a trust region radius $\Delta > 0$.
S4. While not converged, do:
S4.1. Solve the trust region subproblem, resulting in $\mathbf{p}_k \in T_{p^{(k)}} \mathcal{S}^{\times r}$.
S4.2. Compute the tentative next iterate $p^{(k+1)} \leftarrow R_{p^{(k)}}(\mathbf{p}_k)$ via a retraction in the direction of $\mathbf{p}_k$ from $p^{(k)}$.
S4.3. Accept or reject the next iterate. If the former, increment $k$.
S4.4. Update the trust region radius $\Delta$.
Retraction
Given a retraction operator $R'$ for $\mathcal{S}$, a retraction operator $R$ for the product manifold $\mathcal{S}^{\times r} = \mathcal{S} \times \cdots \times \mathcal{S}$ at $p = (p_1, \ldots, p_r)$ is
$$R_p(\cdot) := (R'_{p_1} \times R'_{p_2} \times \cdots \times R'_{p_r})(\cdot),$$
which is called the product retraction.
Some known retraction operators for $\mathcal{S}$ are
the rank-$(1, \ldots, 1)$ T-HOSVD (De Lathauwer, De Moor, and Vandewalle, 2000), proved to be a retraction by Kressner, Steinlechner, and Vandereycken (2014); and
the rank-$(1, \ldots, 1)$ ST-HOSVD (V, Vandebril, and Meerbergen, 2012), proved to be a retraction by Breiding and V (2017c).
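A rank-$(1, \ldots, 1)$ ST-HOSVD retraction for a single factor $\mathcal{S}$ can be sketched as follows: the point $p + t$ is formed as a dense tensor and then truncated mode by mode to rank 1. This is an illustrative dense implementation under the assumption that $p$ and $t$ are stored as full arrays; the function names are hypothetical, and a practical implementation would exploit the structure of $p$ and $t$.

```python
import numpy as np
from functools import reduce

def sthosvd_rank1(T):
    """Rank-(1,...,1) sequentially truncated HOSVD of a dense tensor T.

    Returns vectors u_1, ..., u_d with T ≈ u_1 ⊗ ... ⊗ u_d; the last
    vector carries the scale.
    """
    vectors = []
    core = T
    for _ in range(T.ndim - 1):
        unfolding = core.reshape(core.shape[0], -1)
        # Dominant left singular vector of the current mode-1 unfolding.
        u = np.linalg.svd(unfolding, full_matrices=False)[0][:, 0]
        vectors.append(u)
        # Project onto u; the processed mode disappears from the core.
        core = np.tensordot(u, core, axes=(0, 0))
    vectors.append(core)
    return vectors

def retract(p, t):
    """R'_p(t): map the point p + t of the ambient space back to S."""
    return reduce(np.multiply.outer, sthosvd_rank1(p + t))

rng = np.random.default_rng(5)
a, b, c = (rng.standard_normal(n) for n in (4, 3, 2))
p = reduce(np.multiply.outer, (a, b, c))       # a rank-1 tensor in S
same = retract(p, np.zeros_like(p))            # R'_p(0) should return p
moved = retract(p, 1e-2 * rng.standard_normal(p.shape))
```

The product retraction defined above would apply `retract` independently to each of the $r$ rank-1 terms.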
Trust region subproblem
In Breiding and V (2017c), the TRS is solved by combining a standard dogleg step with a hot restarting scheme.
Let $g_p$ be the coordinate representation of $d_p f$, and let $H_p$ be the matrix of $d_p \Sigma^* \circ d_p \Sigma$. The dogleg step approximates the solution $\mathbf{p}$ of the TRS by
$$\hat{\mathbf{p}} = \begin{cases} p_N = -H_p^\dagger g_p & \text{if } \|p_N\| \le \Delta, \\ p_C = -\dfrac{g_p^T g_p}{g_p^T H_p g_p} g_p & \text{if } \|p_N\| > \Delta \text{ and } \|p_C\| \ge \Delta, \\ p_I := p_C + (\tau - 1)(p_N - p_C) \text{ s.t. } \|p_I\| = \Delta & \text{otherwise.} \end{cases}$$
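A dogleg step of this kind can be sketched in a few lines. The version below follows the common textbook form, which is an assumption of this example: when the Cauchy point lies outside the trust region it is scaled back to the boundary so that the returned step always satisfies $\|\hat{\mathbf{p}}\| \le \Delta$. The function name `dogleg` and the test data are illustrative.

```python
import numpy as np

def dogleg(g, H, delta):
    """Textbook dogleg step for the model g^T p + 0.5 p^T H p, ||p|| <= delta."""
    p_newton = -np.linalg.pinv(H) @ g
    if np.linalg.norm(p_newton) <= delta:
        return p_newton
    p_cauchy = -(g @ g) / (g @ H @ g) * g
    norm_c = np.linalg.norm(p_cauchy)
    if norm_c >= delta:
        # Assumption: scale the Cauchy point back to the boundary.
        return (delta / norm_c) * p_cauchy
    # Find s = tau - 1 in [0, 1] with ||p_cauchy + s (p_newton - p_cauchy)|| = delta.
    d = p_newton - p_cauchy
    a, b, c = d @ d, 2.0 * (p_cauchy @ d), p_cauchy @ p_cauchy - delta**2
    s = (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    return p_cauchy + s * d

H = np.diag([1.0, 10.0])
g = np.array([1.0, 1.0])
step_large = dogleg(g, H, 2.0)    # Newton step fits in the region
step_small = dogleg(g, H, 0.1)    # scaled Cauchy step
step_mid = dogleg(g, H, 0.5)      # interpolated step on the boundary
```

The three calls exercise the three branches of the case distinction.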
The hot restarts strategy
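The Newton direction
$$p_N = -H_p^\dagger g_p$$
is vital to the dogleg step.
Unfortunately, the Hessian approximation $H_p = d_p \Sigma^* \circ d_p \Sigma$ can be close to a singular matrix. In fact,
$$\sqrt{\|H_p^{-1}\|_2} = \frac{1}{\varsigma_m(d_p \Sigma)} =: \kappa(p),$$
where $m = \dim \mathcal{S}^{\times r}$.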
Let
$$\mathcal{I}_r := \{ p \in \mathcal{S}^{\times r} \mid \kappa(p) = \infty \} \subset \mathcal{S}^{\times r}$$
be the ill-posed locus. It turns out that $\mathcal{I}_r$ is a closed, nonempty, positive-dimensional subvariety of $\mathcal{S}^{\times r}$.
In Breiding and V (2017c), we provide heuristic arguments showing that near $q \in \mathcal{I}_r$ any RGN method will need many steps to escape $\mathcal{I}_r$ when $\Sigma(q)$ is a tensor
with infinitely many rank-$r$ decompositions; or
whose border rank is strictly smaller than its rank.
Open questions:
Could such points be attractive for the RGN process?
Do all CPDs in $\mathcal{I}_r$ cause slow convergence?
Whenever $H_p$ is close to a singular matrix, we suggest applying random perturbations to the current decomposition $p$ until $H_p$ is sufficiently well-behaved. We call this a hot restarts procedure.
Evidently, the success of this procedure depends on the average geometric condition number in the neighborhood of a $p \in \mathcal{I}_r$.
Numerical experiments
The proposed RGN method with trust region and hot restarts (RGN-HR) was implemented in Matlab R2016b.
We compare it with some state-of-the-art nonlinear least squares solvers in Tensorlab v3.0 (Vervliet et al., 2016), namely nls_lm and
We consider parameterized tensors in $\mathbb{R}^{n_1 \times n_2 \times n_3}$ with varying condition numbers. There are three parameters:
1 $c \in [0, 1]$ regulates the "collinearity" of the factor matrices,
2 $s \ge 1$ regulates the scaling, and
3 $r$ is the rank.
Typically,
1 increasing $c$ increases the geometric condition number,
2 increasing $s$ increases the classic condition number, and
3 increasing $r$ decreases the probability of finding a decomposition.
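The true rank-$r$ tensor is then
$$\mathcal{A} = \sum_{i=1}^{r} a_i^1 \otimes a_i^2 \otimes a_i^3.$$
Finally, we normalize the tensor and add random Gaussian noise $\mathcal{E} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ of magnitude $\tau$:
$$\mathcal{B} = \frac{\mathcal{A}}{\|\mathcal{A}\|_F} + \tau \frac{\mathcal{E}}{\|\mathcal{E}\|_F}.$$
The tensor $\mathcal{B}$ is the one we would like to approximate by a tensor of rank $r$.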
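The normalization and noise model above can be written down directly. In this sketch the function name `noisy_tensor`, the stand-in tensor `A`, and the random generator are assumptions for illustration.

```python
import numpy as np

def noisy_tensor(A, tau, rng):
    """Return B = A / ||A||_F + tau * E / ||E||_F with Gaussian E."""
    E = rng.standard_normal(A.shape)
    # np.linalg.norm with default arguments is the Frobenius norm here.
    return A / np.linalg.norm(A) + tau * E / np.linalg.norm(E)

rng = np.random.default_rng(6)
A = rng.standard_normal((5, 4, 3))   # stand-in for the true rank-r tensor
tau = 1e-3
B = noisy_tensor(A, tau, rng)
noise_norm = np.linalg.norm(B - A / np.linalg.norm(A))
```

By construction the perturbation has Frobenius norm exactly $\tau$ relative to the normalized tensor.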
We will choose $k$ random starting points and then apply each of the methods to each of the starting points.
The key performance criterion (on a single processor) is the expected time to success (TTS).
Let
1 the probability of success be $p_S$,
2 the probability of failure be $p_F = 1 - p_S$,
3 a successful decomposition take $m_S$ seconds, and
4 a failed decomposition take $m_F$ seconds.
Then, the expected time to a first success is
$$E[\mathrm{TTS}] = \sum_{k=1}^{\infty} p_F^{k-1} p_S (m_S + (k-1) m_F) = \frac{p_S m_S + p_F m_F}{p_S}.$$
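The closed form for the expected time to success can be checked against a truncated version of the series, with the sum running over the attempt $k \ge 1$ at which the first success occurs. The function names and the sample values below are assumptions chosen for illustration.

```python
def expected_tts(p_success, m_success, m_fail):
    """Closed form (p_S m_S + p_F m_F) / p_S."""
    p_fail = 1.0 - p_success
    return (p_success * m_success + p_fail * m_fail) / p_success

def expected_tts_series(p_success, m_success, m_fail, terms=10_000):
    """Truncated series: first success at attempt k after k - 1 failures."""
    p_fail = 1.0 - p_success
    return sum(p_fail ** (k - 1) * p_success * (m_success + (k - 1) * m_fail)
               for k in range(1, terms + 1))

closed = expected_tts(0.25, 2.0, 5.0)
series = expected_tts_series(0.25, 2.0, 5.0)
```

For $p_S = 0.25$, $m_S = 2$, and $m_F = 5$, both evaluate to $2 + 5 \cdot 0.75 / 0.25 = 17$ seconds.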
Speedup of RGN-HR
Model 1, $15 \times 15 \times 15$ tensors, noise level $\tau = 10^{-3}$
[Figure: grids of speedup factors over the scaling $s \in \{1, 2, 3, 4\}$ and the collinearity $c \in \{0.0, 0.25, 0.5, 0.75\}$; panels labeled GNDL-PCG, GNDL, and RGN-Reg for $r = 15, 20, 25, 30$.]
Speedup of RGN-HR
Model 1, $15 \times 15 \times 15$ tensors, noise level $\tau = 10^{-5}$
[Figure: grids of speedup factors over the scaling $s \in \{1, 2, 3, 4\}$ and the collinearity $c \in \{0.0, 0.25, 0.5, 0.75, 0.95\}$; panels labeled GNDL-PCG, GNDL, and RGN-Reg for $r = 15, 20, 25, 30$.]
Speedup of RGN-HR
Model 1, $15 \times 15 \times 15$ tensors, noise level $\tau = 10^{-7}$
[Figure: grids of speedup factors over the scaling $s \in \{1, 2, 3, 4\}$ and the collinearity $c \in \{0.0, 0.25, 0.5, 0.75, 0.95\}$; panels labeled GNDL-PCG, GNDL, and RGN-Reg for $r = 15, 20, 25, 30$.]
Speedup of RGN-HR
Model 2, $13 \times 11 \times 9$ tensors, noise level $\tau = 10^{-5}$
[Figure: grids of speedup factors over the scaling $s$ and the rank $r \in \{5, 7, 9, 11, 13\}$; panels labeled GNDL and GNDL-PCG; legend entries "inf" and "failed".]
Convergence plots
Model 2, rank 7, scaling $s = 2$, noise level $\tau = 10^{-5}$
[Figure: objective value (logarithmic axis, $10^{-10}$ to $10^{2}$) versus time (0 to 4 s) for RGN-HR, RGN-Reg, GNDL, LM, GNDL-PCG, and LM-PCG.]
Conclusions
Take-away story:
1 The classic and geometric condition numbers qualitatively predict the difference between a classic GN method and an RGN method for solving TAPs.
2 We proposed a Riemannian Gauss–Newton trust region method with dogleg step and hot restarts for solving TAPs.
3 Specifically for badly scaled problems the RGN method is
Main references
Breiding and V (2017a), The condition number of join decompositions, 2017. (Submitted)
Breiding and V (2017b), Convergence analysis of Riemannian Gauss-Newton methods and its connection with the geometric condition number, Applied Mathematics Letters, 2017. (Accepted)
Breiding and V (2017c), A Riemannian trust region method for the canonical tensor rank approximation problem, 2017. (Submitted)
General
De Lathauwer, De Moor, and Vandewalle (2000), A multilinear singular value decomposition, SIAM J. Matrix Anal. Appl.
Hitchcock (1927), The expression of a tensor or a polyadic as a sum of products, J. Math. Phys.
Kruskal (1977), Three-way arrays: rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics, Lin. Alg. Appl.
V, Vandebril, and Meerbergen (2012), A new truncation strategy for the higher-order singular value decomposition, SIAM J. Sci. Comput.
Vervliet, Debals, Sorber, Van Barel, and De Lathauwer (2016), Tensorlab v3.0, Available online.
Generic identifiability I
Ballico (2005), On the weak non-defectivity of Veronese embeddings of projective spaces, Centr. Eur. J. Math.
Abo, Ottaviani, and Peterson (2009), Induction for secant varieties of Segre varieties, Trans. Amer. Math. Soc.
Bocci and Chiantini (2013), On the identifiability of binary Segre products, J. Algebraic Geom.
Bocci, Chiantini, and Ottaviani (2013), Refined methods for the identifiability of tensors, Ann. Mat. Pura Appl.
Chiantini, Mella, and Ottaviani (2014), One example of general unidentifiable tensors, J. Alg. Stat.
Chiantini and Ottaviani (2012), On generic identifiability of 3-tensors of small rank, SIAM J. Matrix Anal. Appl.
Chiantini, Ottaviani, and V (2014), An algorithm for generic and low-rank specific identifiability of complex tensors, SIAM J. Matrix Anal. Appl.
Chiantini, Ottaviani, and V (2017), On generic identifiability of
Generic identifiability II
Galuppi and Mella (2017), Identifiability of homogeneous polynomials and Cremona Transformations, arXiv.
Qi, Comon, and Lim (2016), Semialgebraic geometry of nonnegative tensor rank, SIAM J. Matrix Anal. Appl.
Strassen (1983), Rank and optimal computation of generic tensors, Linear Algebra Appl.
Conditioning
Bürgisser and Cucker (2013), Condition: The Geometry of Numerical Algorithms, Springer.
de Silva and Lim (2008), Tensor rank and the ill-posedness of the best low-rank approximation problem, SIAM J. Matrix Anal. Appl.
Lee (2013), Introduction to Smooth Manifolds.
V (2017), Condition numbers for the tensor rank decomposition,
Optimization
Absil, Mahony, and Sepulchre (2008), Optimization Algorithms on Matrix Manifolds.
Dedieu and Kim (2002), Newton's method for analytic systems of equations with constant rank derivatives, J. Complexity.
Kressner, Steinlechner, and Vandereycken (2014), Low-rank tensor completion by Riemannian optimization, BIT.
First, compute the Cholesky decomposition with $R \in \mathbb{R}^{r \times r}$ upper triangular of
$$C = c \mathbf{1}\mathbf{1}^T + (1 - c) I = R^T R.$$
Then, the factor matrices are
Ak = Nk R diag(s0, s1, s2, . . . , sr),
where $N_k$ has standard normally distributed elements.
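The collinearity part of this construction can be sketched as follows. The example covers only the Cholesky step and the product $N_k R$; the diagonal scaling is left out because its exact form is not clear from the formula above. The function name `collinear_factor` is hypothetical, and $c < 1$ is assumed so that $C$ is positive definite.

```python
import numpy as np

def collinear_factor(n, r, c, rng):
    """Return N R together with R and C = c 11^T + (1 - c) I = R^T R."""
    C = c * np.ones((r, r)) + (1.0 - c) * np.eye(r)
    # NumPy returns the lower-triangular factor L with C = L L^T,
    # so R = L^T is upper triangular with R^T R = C.
    R = np.linalg.cholesky(C).T
    N = rng.standard_normal((n, r))   # standard normally distributed entries
    return N @ R, R, C

rng = np.random.default_rng(7)
A1, R, C = collinear_factor(15, 4, 0.5, rng)
```

Since the entries of `N` are independent standard normals, the Gram matrix of `N @ R` has expectation $n C$, so larger values of $c$ make the columns of the factor matrix more collinear on average.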
The settings for the Tensorlab methods were as follows:
AlgOpts = []; opts = [];
AlgOpts.TolFun = 1e-9 * tau^2;
AlgOpts.TolX = 1e-12;
AlgOpts.TolAbs = 0;
AlgOpts.MaxIter = 1000;
AlgOpts.CGTol = 1e-6;
AlgOpts.CGMaxIter = 75;
AlgOpts.LargeScale = true; % or false
opts.Compression = false;
The settings for the proposed method were:
AlgOpts = []; opts = [];
AlgOpts.TolFun = 1e-6 * tau^2;
AlgOpts.TolX = 1e-12;
AlgOpts.TolAbs = 0;
AlgOpts.MaxIter = 1000;
AlgOpts.MaxRestarts = 500;
opts.Compression = false;
opts.AlgorithmOptions = AlgOpts;
We observed a delicate dependency on the relative function value tolerance TolFun for all methods.