A Riemannian trust region method for the canonical tensor rank approximation problem




November 6, 2017

Paul Breiding (TU Berlin)
Nick Vannieuwenhoven (FWO / KU Leuven)

Overview

1. Introduction
2. Classic optimization
3. Riemannian optimization
4. Numerical experiments
5. Conclusions

Introduction

The tensor rank decomposition

Hitchcock (1927) introduced the tensor rank decomposition:¹

\[ \mathcal{A} = \sum_{i=1}^{r} a_i^1 \otimes a_i^2 \otimes \cdots \otimes a_i^d, \]

where the length $r$ is assumed to be minimal.

¹ Also called CANDECOMP/PARAFAC, canonical (polyadic) decomposition, and CP decomposition.

Identifiability

Kruskal (1977) proved that the rank-1 terms appearing in

\[ \mathcal{A} = \sum_{i=1}^{r} a_i^1 \otimes a_i^2 \otimes \cdots \otimes a_i^d \]

are uniquely determined if $r$ is small and $d \ge 3$. We call the tensor

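Let $X \subset \mathbb{P}\mathbb{C}^N$ be a projective variety, and define its $r$-secant variety as

\[ \sigma_r(X) := \overline{\sigma_r^0(X)} \quad\text{with}\quad \sigma_r^0(X) := \bigcup_{p_1,\dots,p_r \in X} \langle p_1, \dots, p_r \rangle. \]

By construction, tensors of rank $r$ are Zariski-dense in $\sigma_r(\mathcal{S})$, where

\[ \mathcal{S} := \{ a^1 \otimes \cdots \otimes a^d \mid a^k \in \mathbb{R}^{n_k} \setminus \{0\},\ k = 1, \dots, d \} \]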

Then, $X$ is generically $r$-identifiable if there exists a $Z$ with $\dim Z < \dim \sigma_r(X)$ such that

\[ \forall [p] \in \sigma_r(X) \setminus Z : \quad [p] = [p_1 + \cdots + p_r] = [q_1 + \cdots + q_r] \iff \{[p_1], \dots, [p_r]\} = \{[q_1], \dots, [q_r]\} \]

including multiplicities.

Generic $r$-identifiability of symmetric tensors in $S^d\mathbb{C}^n$,

\[ \mathcal{A} = \sum_{i=1}^{r} a_i^{\otimes d} \quad\text{with } a_i \in \mathbb{C}^n, \]

is completely understood because of Ballico (2005); Chiantini, Ottaviani, and V (2017); and Galuppi and Mella (2017).

General rule:

if $r < n^{-1}\binom{d+n-1}{d}$ and $d \ge 3$ → generic $r$-identifiability
if $r \ge n^{-1}\binom{d+n-1}{d}$

Generic $r$-identifiability of tensors in $\mathbb{C}^{n_1} \otimes \cdots \otimes \mathbb{C}^{n_d}$,

\[ \mathcal{A} = \sum_{i=1}^{r} a_i^1 \otimes \cdots \otimes a_i^d \quad\text{with } a_i^k \in \mathbb{C}^{n_k}, \]

is conjecturally understood because of

1. Strassen (1983) for $d = 3$ (partial result);
2. Bocci, Chiantini, Ottaviani (2013) for unbalanced cases;
3. Chiantini, Ottaviani, V (2014) for $n_1 \cdots n_d \le 15\,000$;
4. Abo, Ottaviani, Peterson (2009); Chiantini, Ottaviani (2012); Bocci, Chiantini (2013); Chiantini, Mella, Ottaviani (2014); etc.

Let $n_1 \ge \cdots \ge n_d$,

\[ r_{\mathrm{cr}} = \frac{n_1 \cdots n_d}{1 + \sum_{i=1}^{d} (n_i - 1)}, \qquad r_{\mathrm{ub}} = n_2 \cdots n_d - \sum_{k=2}^{d} (n_k - 1). \]

Conjectured general rule:

if $r \ge r_{\mathrm{cr}}$ or $d = 2$ → not generically $r$-identifiable
if $n_1 > r_{\mathrm{ub}}$ and $r \ge r_{\mathrm{ub}}$ → not generically $r$-identifiable

A computational problem

The computational problem that we are interested in is the tensor canonical rank decomposition problem (TAP):

Given a tensor $\mathcal{A} \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_d}$, find the rank-1 tensors $p_i := a_i^1 \otimes a_i^2 \otimes \cdots \otimes a_i^d$ such that

\[ \left\| \sum_{i=1}^{r} a_i^1 \otimes a_i^2 \otimes \cdots \otimes a_i^d - \mathcal{A} \right\|_F \]

is minimal.
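To make the objective concrete, here is a minimal sketch in plain Python for order-3 tensors stored as nested lists. The helper names `rank1` and `cp_residual_norm` and the tiny 2 × 2 × 2 example are assumptions made for illustration; they do not come from the talk.

```python
import math
from itertools import product


def rank1(a, b, c):
    """Return the order-3 rank-1 tensor a (x) b (x) c as nested lists."""
    return [[[ai * bj * ck for ck in c] for bj in b] for ai in a]


def cp_residual_norm(terms, tensor):
    """Frobenius norm of (sum of rank-1 terms) - tensor, order-3 case.

    `terms` is a list of (a, b, c) vector triples, one per rank-1 term.
    """
    n1, n2, n3 = len(tensor), len(tensor[0]), len(tensor[0][0])
    total = 0.0
    for i, j, k in product(range(n1), range(n2), range(n3)):
        approx = sum(a[i] * b[j] * c[k] for a, b, c in terms)
        total += (approx - tensor[i][j][k]) ** 2
    return math.sqrt(total)


# Hypothetical example: a 2 x 2 x 2 tensor built from two rank-1 terms.
terms = [([1.0, 0.0], [1.0, 0.0], [1.0, 0.0]),
         ([0.0, 2.0], [0.0, 1.0], [0.0, 1.0])]
t1, t2 = rank1(*terms[0]), rank1(*terms[1])
tensor = [[[t1[i][j][k] + t2[i][j][k] for k in range(2)]
           for j in range(2)] for i in range(2)]

exact = cp_residual_norm(terms, tensor)           # both terms: zero residual
rank1_only = cp_residual_norm(terms[:1], tensor)  # first term only
```

With both terms the residual vanishes; dropping the second term leaves that term's Frobenius norm as the residual.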

Classic optimization

Classic parameterization

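In the literature, the TAP is usually formulated as a classic unconstrained optimization problem over some $\mathbb{R}^N$:

\[ \min_{(A_1, \dots, A_d) \in \mathbb{R}^{n_1 \times r} \times \cdots \times \mathbb{R}^{n_d \times r}} \left\| \sum_{i=1}^{r} a_i^1 \otimes \cdots \otimes a_i^d - \mathcal{A} \right\|_F, \]

where $A_k := [a_i^k]_i$ are the factor matrices.

For applying classic optimization methods, we consider

\[ \mathbb{R}^{n_1 \times r} \times \cdots \times \mathbb{R}^{n_d \times r} \simeq \mathbb{R}^{r(n_1 + \cdots + n_d)}; \]

in this latter interpretation, I call them vectorized factor matrices (VFM).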

Structure-exploiting Gauss–Newton methods with trust region or line search are state-of-the-art algorithms for this problem.

Let $p \in \mathbb{R}^{r(n_1 + \cdots + n_d)}$ represent the VFM. Then, at every step, GN locally minimizes the model

\[ m_p(x) = f(p) + (r_p^T J_p)\, x + x^T (J_p^T J_p)\, x, \]

where

\[ r_p := \sum_{i=1}^{r} a_i^1 \otimes \cdots \otimes a_i^d - \mathcal{A} \]

is the residual and

\[ J_p := \left[ \frac{\partial}{\partial p_j} \sum_{i=1}^{r} a_i^1 \otimes \cdots \otimes a_i^d \right]_j \]

is the Jacobian.

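It is easy to compute that Terracini's matrix is

\[ J_p = \begin{bmatrix} J_1 & J_2 & \cdots & J_r \end{bmatrix}, \quad\text{where}\quad J_i = \begin{bmatrix} \mathrm{Id} \otimes a_i^2 \otimes \cdots \otimes a_i^d & \cdots & a_i^1 \otimes \cdots \otimes a_i^{d-1} \otimes \mathrm{Id} \end{bmatrix}. \]

Note that each $J_i$ has a kernel of dimension $d - 1$. This complication arises because the VFM is an over-parameterization of the rank-1 tensor $a_i^1 \otimes \cdots \otimes a_i^d$.

The expected rank of $J \in \mathbb{R}^{(n_1 \cdots n_d) \times r(n_1 + \cdots + n_d)}$ is only

\[ r(n_1 + \cdots + n_d - (d - 1)) < r(n_1 + \cdots + n_d). \]

This corresponds to the expected dimension of $\sigma_r(\mathcal{S})$, where $\mathcal{S}$ is the Segre variety of rank-1 tensors in $\mathbb{R}^{n_1} \otimes \cdots \otimes \mathbb{R}^{n_d}$.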
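As a worked instance of this count, take the shape used later in the experiments, $15 \times 15 \times 15$ with $r = 15$ and $d = 3$ (the choice of instance is ours, for illustration only):

```latex
\begin{align*}
\text{number of columns of } J &= r(n_1 + n_2 + n_3) = 15 \cdot 45 = 675, \\
\text{expected rank of } J &= r(n_1 + n_2 + n_3 - (d - 1)) = 15 \cdot 43 = 645, \\
\text{kernel dimension} &\ge 675 - 645 = 30 = r(d - 1).
\end{align*}
```

So in this instance at least 30 directions in the VFM parameter space leave the represented tensor unchanged to first order.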

Hence, the minimizer of the model $m_p(x)$ is not unique! We should compute the least-squares solution of

\[ J_p x^\star = r_p. \]

This $x^\star$ is the new search direction.

An approximate solution is found by

- computing the pseudo-inverse $x^\star = J_p^\dagger r_p$; or
- regularizing $x^\star \approx (J_p^T J_p + \lambda \cdot \mathrm{Id})^{-1} J_p^T r_p$; or
- $x^\star$ is approximated using the LSQR method; or
- a subset of columns of $J_p$ is taken, $\widetilde{J}_p$, and $x^\star = \widetilde{J}_p^\dagger r_p$.

The general outline of the Gauss–Newton method is as follows:

S1. Choose a random starting point $p \in \mathbb{R}^{r(n_1 + \cdots + n_d)}$.
S2. While not converged do:
  S2.1. Compute residual $r_p$ and Jacobian $J_p$.
  S2.2. Solve the least-squares problem $J_p x^\star = r_p$.
  S2.3. Use globalization method to determine next iterate $p$.

Dedieu and Kim (2002) showed that the above method converges

- quadratically to exact solutions; and
- linearly to least-squares solutions,

where the multiplicative constants are functions of $\|J_p^\dagger\|_2$. Recall that $\|J_p^\dagger\|_2$ is also the condition number of the CPD, as considered in V (2017).

Riemannian optimization

Choosing a good parameterization

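Recall that we try to minimize

\[ \min_{\operatorname{rank}(\mathcal{B}) \le r} \|\mathcal{B} - \mathcal{A}\|_F, \]

which is a constrained optimization problem.

We should understand the structure of the constraint set

\[ \sigma_r^0(\mathcal{S}) := \{ \mathcal{B} \in \mathbb{R}^{n_1 \times \cdots \times n_d} \mid \operatorname{rank}(\mathcal{B}) \le r \}. \]

Since it is a projection of a graph of a polynomial map, it follows from the Tarski–Seidenberg principle that $\sigma_r^0(\mathcal{S})$ is a semi-algebraic set.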

This is not great news, because there exist points $p \in \sigma_r^0(\mathcal{S})$ near which $\sigma_r^0(\mathcal{S})$ is not locally diffeomorphic to any $\mathbb{R}^N$. This rules out smooth optimization methods, such as Gauss–Newton methods.

We can circumvent this problem by considering the addition map

\[ \Sigma : \mathcal{S} \times \cdots \times \mathcal{S} \to \sigma_r^0(\mathcal{S}), \quad (p_1, \dots, p_r) \mapsto \sum_{i=1}^{r} p_i. \]

This is a smooth map. Moreover, its source is the product of smooth manifolds, because $\mathcal{S} \subset \mathbb{P}\mathbb{C}^{\Pi}$ is known to be a smooth

We could then reformulate our optimization problem:

\[ \min_{\operatorname{rank}(\mathcal{B}) \le r} \|\mathcal{B} - \mathcal{A}\|_F = \min_{(p_1, \dots, p_r) \in \mathcal{S} \times \cdots \times \mathcal{S}} \|\Sigma(p_1, \dots, p_r) - \mathcal{A}\|_F. \]

Note that now we optimize

1. a (twice) differentiable function,
2. over a smooth manifold.

These optimization problems are studied in Riemannian optimization.

Riemannian optimization

In general, if $\mathcal{M} \subset \mathbb{R}^N$ is an $m$-dimensional embedded smooth manifold and $F : \mathcal{M} \to \mathbb{R}^n$ a smooth function, then

\[ \min_{x \in \mathcal{M}} \frac{1}{2} \|F(x)\|^2 \]

is a Riemannian optimization problem that can be solved by, e.g., a Riemannian Gauss–Newton method; see Absil, Mahony, and Sepulchre (2008).

Breiding and V (2017b) showed the following.

Lemma. Let $x^\star$ be a local minimizer of $\min_{x \in \mathcal{M}} \frac{1}{2} \|F(x)\|^2$. Let $\kappa = \varsigma_m(d_{x^\star} F)^{-1}$. Assume that $x_0$ is sufficiently close to $x^\star$ and that $C$ is a sufficiently large constant. Then, the Riemannian Gauss–Newton method

- converges quadratically, specifically
  \[ \|x_{k+1} - x^\star\| \le C \cdot \kappa \cdot \|x_k - x^\star\|^2 \]
  if $F(x^\star) = 0$; or
- converges linearly, i.e.,
  \[ \|x_{k+1} - x^\star\| \le C \cdot \kappa^2 \|F(x^\star)\| \cdot \|x_k - x^\star\| \]
  if $F(x^\star) \ne 0$.

Analysis

There are many Riemannian optimization formulations of the TAP!

Let $E$ be a smooth Riemannian manifold with $\dim E \ge \dim \mathcal{S}^{\times r}$, and $\Psi : E \to \sigma_r^0(\mathcal{S})$ a surjective, smooth map. Then,

\[ \arg\min_{x \in E} \frac{1}{2} \|\Psi(x) - \mathcal{A}\|_F^2 \]

So which parameterization $E$ should we choose? This depends on the problem you want to solve! In several applications, one wishes to interpret the individual rank-1 terms. I present the analysis from Breiding and V (2017c) next.

We assume there is a map $\pi : E \to \mathcal{S}^{\times r}$, so that the diagram

\[ E \xrightarrow{\ \pi\ } \mathcal{S}^{\times r} \xrightarrow{\ \Sigma\ } \sigma_r^0(\mathcal{S}), \qquad \Psi = \Sigma \circ \pi, \]

commutes.

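Let $m := \dim \mathcal{S}^{\times r}$, let $x \in E$, and let $\kappa := \|(d_{\pi(x)} \Sigma)^\dagger\|_2$ be the geometric condition number from Paul's talk.

Then, we showed that

\[ \frac{1}{\kappa \sqrt{r}} \cdot \frac{1}{\varsigma_m(d_x \pi)} \le \frac{\varsigma_1(d_x \Psi)}{\varsigma_1(d_x \pi)} \cdot \frac{1}{\varsigma_m(d_x \Psi)}. \]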

Taking $E = \mathbb{R}^{r(n_1 + \cdots + n_d)}$ as the classic parameterization, this yields

\[ \frac{1}{\kappa \sqrt{r}} \cdot \frac{1}{\varsigma_m(d_x \pi)} \le r \cdot \frac{1}{\varsigma_m(J_x)} = r \cdot \|J_x^\dagger\|_2, \]

where

1. $\kappa$ is the condition number of the trivial parameterization $E = \mathcal{S}^{\times r}$ from Breiding and V (2017a);
2. $J_x$ is Terracini's matrix; and
3. $\|J_x^\dagger\|_2$ is the condition number of the parameterization with VFM from V (2017).

The spectrum of $d_x \pi$ was analyzed in V (2017) for norm-balanced $x$. We have

\[ \varsigma_m(d_x \pi) = \min_{1 \le i \le r} \|a_i^1 \otimes \cdots \otimes a_i^d\|^{\frac{d-1}{d}}. \]

Paul showed that $\kappa$ does not depend on the norms of the rank-1 tensors.

For a fixed set of unit-norm rank-1 tensors $p_i \in \mathbb{S}(\mathcal{S})$, the geometric condition number of the CPD $\sum_{i=1}^{r} \alpha_i p_i$ is the constant $\kappa$, while the classic condition number satisfies

\[ \|J_x^\dagger\| \ge \frac{1}{\kappa \cdot r \sqrt{r}} \cdot \max_{1 \le i \le r} \alpha_i^{-1}, \]

which blows up as some $\alpha_j \to 0$.

Consequently, for a fixed geometric condition numberκ, the convergence of the Gauss–Newton method applied to the classic parameterization can be slowed down arbitrarily by changing the norms of the rank-1 terms of the local minimizer, while the Riemannian Gauss–Newton method’s convergence is unaffected.

Riemannian Gauss–Newton with trust region

We propose applying a Riemannian Gauss–Newton (RGN) method with trust region to

\[ \min_{p \in \mathcal{S} \times \cdots \times \mathcal{S}} \frac{1}{2} \|\Sigma(p) - \mathcal{A}\|^2, \]

for a given $\mathcal{A} \in \mathbb{R}^{n_1 \times \cdots \times n_d}$.

Let $\mathcal{M} \subset \mathbb{R}^N$ be an $m$-dimensional embedded submanifold.

A tangent vector to $\mathcal{M}$ at $q$ is a vector $t_q \in \mathbb{R}^N$ such that there exists a smooth curve $p(t) \subset \mathcal{M}$ with $t \in (-1, 1)$ for which

\[ q = p(0) \quad\text{and}\quad t_q = \frac{d}{dt} p(0). \]

The tangent space $T_q \mathcal{M} \subset \mathbb{R}^N$ of $\mathcal{M}$ at $q \in \mathcal{M}$ is the $m$-dimensional linear subspace spanned by all tangent vectors to $\mathcal{M}$ at $q$.

In a RGN method, the objective function

\[ f(x) = \frac{1}{2} \|\Sigma(x) - \mathcal{A}\|^2 \]

is locally approximated at $p \in \mathcal{S}^{\times r}$ by the quadratic model

\[ m_p(t) := f(p) + \langle d_p f, t \rangle + \frac{1}{2} \langle t, (d_p \Sigma^* \circ d_p \Sigma)(t) \rangle, \]

where $H_p := d_p \Sigma^* \circ d_p \Sigma$ is the GN Hessian approximation, and $\langle \cdot, \cdot \rangle$ is the inner product inherited from the ambient $\mathbb{R}^N$.

The RGN method with trust region considers the model to be accurate only in a radius $\Delta$ about $p$.

The trust region subproblem (TRS) is

\[ \min_{t \in T_p \mathcal{S}^{\times r}} m_p(t) \quad\text{subject to } \|t\| \le \Delta. \]

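We need to advance from $p \in \mathcal{S}^{\times r}$ to $p' \in \mathcal{S}^{\times r}$, along the direction $\mathbf{p}$. However, while $p + \mathbf{p} \in T_p \mathcal{S}^{\times r}$, this point does not lie in $\mathcal{S}^{\times r}$!

[Figure: the retraction $R_p(\mathbf{p})$ maps the tangent step in $T_p \mathcal{S}^{\times r}$ back onto $\mathcal{S}^{\times r}$.]

We need a retraction operator (Absil, Mahony, Sepulchre, 2008) for smoothly mapping a neighborhood of $0 \in T_p \mathcal{S}^{\times r}$ back to $\mathcal{S}^{\times r}$.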

RGN with trust region method:

S1. Choose random initial points $p_i \in \mathcal{S}$.
S2. Let $p^{(1)} \leftarrow (p_1, \dots, p_r)$, and set $k \leftarrow 1$.
S3. Choose a trust region radius $\Delta > 0$.
S4. While not converged, do:
  S4.1. Solve the trust region subproblem, resulting in $\mathbf{p}_k \in T_{p^{(k)}} \mathcal{S}^{\times r}$.
  S4.2. Compute the tentative next iterate $p^{(k+1)} \leftarrow R_{p^{(k)}}(\mathbf{p}_k)$ via a retraction in the direction of $\mathbf{p}_k$ from $p^{(k)}$.
  S4.3. Accept or reject the next iterate. If the former, increment $k$.
  S4.4. Update the trust region radius $\Delta$.
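Steps S4.3 and S4.4 are stated abstractly. One common way to realize them is the textbook trust region rule based on the ratio of actual to predicted reduction; the sketch below follows that generic scheme, not necessarily the rule used in the talk. The thresholds (0.25, 0.75), the acceptance level `eta`, the shrink and growth factors, and both function names are assumptions.

```python
def reduction_ratio(f_old, f_new, model_old, model_new):
    """Actual reduction of the objective divided by the reduction the model predicted."""
    return (f_old - f_new) / (model_old - model_new)


def trust_region_update(rho, delta, step_norm, delta_max=10.0,
                        eta=0.1, shrink=0.25, grow=2.0):
    """Return (accept, new_delta) for one iteration.

    rho       -- reduction ratio of the tentative step
    delta     -- current trust region radius
    step_norm -- norm of the tentative step
    """
    if rho < 0.25:
        # Poor agreement between model and objective: shrink the region.
        new_delta = shrink * delta
    elif rho > 0.75 and step_norm >= 0.999 * delta:
        # Good agreement and the step hit the boundary: enlarge the region.
        new_delta = min(grow * delta, delta_max)
    else:
        new_delta = delta
    accept = rho > eta  # accept only if the objective decreased enough
    return accept, new_delta
```

For example, `trust_region_update(0.9, 1.0, 1.0)` accepts the step and doubles the radius, while `trust_region_update(0.05, 1.0, 0.5)` rejects it and shrinks the radius to a quarter.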

Retraction

Given a retraction operator $R'$ for $\mathcal{S}$, a retraction operator $R$ for the product manifold $\mathcal{S}^{\times r} = \mathcal{S} \times \cdots \times \mathcal{S}$ at $p = (p_1, \dots, p_r)$ is

\[ R_p(\cdot) := (R'_{p_1} \times R'_{p_2} \times \cdots \times R'_{p_r})(\cdot), \]

which is called the product retraction.

Some known retraction operators for $\mathcal{S}$ are

- the rank-$(1, \dots, 1)$ T-HOSVD (De Lathauwer, De Moor, Vandewalle, 2000), proved by Kressner, Steinlechner, and Vandereycken (2014); and
- the rank-$(1, \dots, 1)$ ST-HOSVD (V, Vandebril, and Meerbergen, 2012), proved by Breiding and V (2017c).
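The product construction is independent of which factor retraction is used. The sketch below illustrates only that componentwise structure. As a stand-in for $\mathcal{S}$ it uses the unit circle with the "step, then renormalize" retraction, which is a toy assumption and not one of the HOSVD-based retractions listed above. The names `circle_retraction` and `product_retraction` are hypothetical.

```python
import math


def circle_retraction(p, t):
    """Toy retraction on the unit circle: move to p + t in the plane, then renormalize."""
    q = [pi + ti for pi, ti in zip(p, t)]
    n = math.hypot(*q)
    return [qi / n for qi in q]


def product_retraction(factor_retraction):
    """Build the product retraction: apply the factor retraction to each component."""
    def retract(points, tangents):
        return [factor_retraction(p, t) for p, t in zip(points, tangents)]
    return retract


retract = product_retraction(circle_retraction)
# Two points on the circle; only the first one receives a nonzero tangent step.
out = retract([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [0.0, 0.0]])
```

A zero tangent step returns the point itself, which is the defining property $R_p(0) = p$ of a retraction.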

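Trust region subproblem

In Breiding and V (2017c), the TRS is solved by combining a standard dogleg step with a hot restarting scheme.

Let $g_p$ be the coordinate representation of $d_p f$, and let $H_p$ be the matrix of $d_p \Sigma^* \circ d_p \Sigma$. The dogleg step approximates the solution $\mathbf{p}$ of the TRS by

\[ \widehat{\mathbf{p}} = \begin{cases} \mathbf{p}_N = -H_p^\dagger g_p & \text{if } \|\mathbf{p}_N\| \le \Delta, \\ \mathbf{p}_C = -\dfrac{g_p^T g_p}{g_p^T H_p g_p}\, g_p & \text{if } \|\mathbf{p}_N\| > \Delta \text{ and } \|\mathbf{p}_C\| \ge \Delta, \\ \mathbf{p}_I := \mathbf{p}_C + (\tau - 1)(\mathbf{p}_N - \mathbf{p}_C) \text{ s.t. } \|\mathbf{p}_I\| = \Delta & \text{otherwise.} \end{cases} \]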
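A minimal plain-Python sketch of a dogleg step for a small dense system follows. It makes two assumptions that differ from the formula above: $H$ is taken to be nonsingular, so a linear solve replaces the pseudo-inverse, and when the Cauchy point lies outside the region it is scaled back to the boundary, as in the textbook dogleg. The helper names `solve`, `norm`, and `dogleg` and the 2 × 2 example are hypothetical.

```python
import math


def solve(H, b):
    """Solve H x = b by Gaussian elimination with partial pivoting (H nonsingular)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(H)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def norm(v):
    return math.sqrt(sum(t * t for t in v))


def dogleg(H, g, delta):
    """Approximate the minimizer of g^T p + 0.5 p^T H p subject to ||p|| <= delta."""
    p_newton = solve(H, [-gi for gi in g])
    if norm(p_newton) <= delta:
        return p_newton
    Hg = [sum(h * gj for h, gj in zip(row, g)) for row in H]
    gg = sum(gi * gi for gi in g)
    gHg = sum(gi * hgi for gi, hgi in zip(g, Hg))
    p_cauchy = [-(gg / gHg) * gi for gi in g]
    nc = norm(p_cauchy)
    if nc >= delta:
        # Assumption: truncate the Cauchy step to the trust region boundary.
        return [delta / nc * pc for pc in p_cauchy]
    # Find t = tau - 1 in [0, 1] with ||p_C + t (p_N - p_C)|| = delta.
    d = [pn - pc for pn, pc in zip(p_newton, p_cauchy)]
    a = sum(di * di for di in d)
    b = 2.0 * sum(pc * di for pc, di in zip(p_cauchy, d))
    c = nc * nc - delta * delta
    t = (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    return [pc + t * di for pc, di in zip(p_cauchy, d)]


# Hypothetical 2 x 2 example.
H = [[1.0, 0.0], [0.0, 4.0]]
g = [1.0, 1.0]
full_newton = dogleg(H, g, 2.0)    # Newton step fits inside the region
interpolated = dogleg(H, g, 0.8)   # lands on the dogleg path, norm 0.8
truncated = dogleg(H, g, 0.1)      # scaled steepest descent step, norm 0.1
```

In the interpolated case the returned step has norm exactly $\Delta$ up to rounding, which is the condition $\|\mathbf{p}_I\| = \Delta$.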

The hot restarts strategy

The Newton direction

\[ \mathbf{p}_N = -H_p^\dagger g_p \]

is vital to the dogleg step.

Unfortunately, the Hessian approximation $H_p = d_p \Sigma^* \circ d_p \Sigma$ can be close to a singular matrix. In fact,

\[ \sqrt{\|H_p^{-1}\|_2} = \frac{1}{\varsigma_m(d_p \Sigma)} =: \kappa(p), \]

where $m = \dim \mathcal{S}^{\times r}$.
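This identity can be checked from a singular value decomposition. A short derivation, assuming $d_p \Sigma$ is injective and expressed in orthonormal coordinates, with singular values $\varsigma_1 \ge \cdots \ge \varsigma_m > 0$ (the matrices $U$, $S$, $V$ are notation introduced here):

```latex
\begin{align*}
d_p \Sigma &= U S V^T, \qquad S = \operatorname{diag}(\varsigma_1, \dots, \varsigma_m), \\
H_p &= (d_p \Sigma)^* (d_p \Sigma) = V S^2 V^T, \\
\|H_p^{-1}\|_2 &= \|V S^{-2} V^T\|_2 = \varsigma_m^{-2}
\quad\Longrightarrow\quad
\sqrt{\|H_p^{-1}\|_2} = \frac{1}{\varsigma_m(d_p \Sigma)}.
\end{align*}
```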

Let

\[ \mathcal{I}_r := \{ p \in \mathcal{S}^{\times r} \mid \kappa(p) = \infty \} \subset \mathcal{S}^{\times r} \]

be the ill-posed locus. It turns out that $\mathcal{I}_r$ is a closed, nonempty, positive-dimensional subvariety of $\mathcal{S}^{\times r}$.

In Breiding and V (2017c), we provide heuristic arguments showing that near $q \in \mathcal{I}_r$ any RGN method will need many steps to escape $\mathcal{I}_r$ when $\Sigma(q)$ is a tensor

- with infinitely many rank-$r$ decompositions; or
- whose border rank is strictly smaller than its rank.

Open questions:

- Could such points be attractive for the RGN process?
- Do all CPDs in $\mathcal{I}_r$ cause slow convergence?

Whenever $H_p$ is close to a singular matrix, we suggest applying random perturbations to the current decomposition $p$ until $H_p$ is sufficiently well-behaved. We call this a hot restarts procedure.

Evidently, the success of this procedure depends on the average geometric condition number in the neighborhood of a $p \in \mathcal{I}_r$.
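The control flow of such a procedure can be sketched as follows. This is only an illustration of "perturb until well-conditioned": the function `hot_restart`, the perturbation scale, the threshold, and the toy condition function `toy_cond` are all assumptions, and the actual perturbation used in Breiding and V (2017c) may differ. The cap of 500 mirrors the `MaxRestarts` setting listed in the settings for the proposed method.

```python
import random


def hot_restart(p, cond, threshold, scale=1e-2, max_restarts=500, rng=None):
    """Randomly perturb p until cond(p) <= threshold or the restart cap is hit.

    p    -- current iterate as a flat list of floats
    cond -- callable returning a condition number estimate for an iterate
    Returns the (possibly perturbed) iterate and the number of restarts used.
    """
    rng = rng or random.Random(0)
    restarts = 0
    while cond(p) > threshold and restarts < max_restarts:
        p = [x + scale * rng.gauss(0.0, 1.0) for x in p]
        restarts += 1
    return p, restarts


def toy_cond(p):
    """Toy stand-in for kappa: blows up when the two coordinates coincide."""
    gap = abs(p[0] - p[1])
    return float("inf") if gap == 0.0 else 1.0 / gap


# Start exactly on the toy "ill-posed locus" p[0] == p[1].
p_new, used = hot_restart([1.0, 1.0], toy_cond, threshold=1e3)
```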

Numerical experiments

The proposed RGN method with trust region and hot restarts (RGN-HR) was implemented in Matlab R2016b.

We compare it with some state-of-the-art nonlinear least squares solvers in Tensorlab v3.0 (Vervliet et al., 2016), namely nls_lm and

We consider parameterized tensors in $\mathbb{R}^{n_1 \times n_2 \times n_3}$ with varying condition numbers. There are three parameters:

1. $c \in [0, 1]$ regulates the "collinearity" of the factor matrices,
2. $s \ge 1$ regulates the scaling, and
3. $r$ is the rank.

Typically,

1. increasing $c$ increases the geometric condition number;
2. increasing $s$ increases the classic condition number;
3. increasing $r$ decreases the probability of finding a decomposition.

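The true rank-$r$ tensor is then

\[ \mathcal{A} = \sum_{i=1}^{r} a_i^1 \otimes a_i^2 \otimes a_i^3. \]

Finally, we normalize the tensor and add random Gaussian noise $\mathcal{E} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ of magnitude $\tau$:

\[ \mathcal{B} = \frac{\mathcal{A}}{\|\mathcal{A}\|_F} + \tau \frac{\mathcal{E}}{\|\mathcal{E}\|_F}. \]

The tensor $\mathcal{B}$ is the one we would like to approximate by a tensor of rank $r$.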
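The normalization and noise step can be sketched in plain Python on a flattened tensor, since the Frobenius norm does not depend on the tensor's shape. The function names and the four-entry example are assumptions for illustration.

```python
import math
import random


def frobenius_norm(flat):
    """Frobenius norm of a tensor given as a flat list of entries."""
    return math.sqrt(sum(x * x for x in flat))


def noisy_normalized(tensor_flat, tau, rng):
    """Return A / ||A||_F + tau * E / ||E||_F with E standard Gaussian noise."""
    noise = [rng.gauss(0.0, 1.0) for _ in tensor_flat]
    na = frobenius_norm(tensor_flat)
    ne = frobenius_norm(noise)
    return [a / na + tau * e / ne for a, e in zip(tensor_flat, noise)]


# Hypothetical example: a "tensor" with four entries and ||A||_F = 5.
A = [3.0, 4.0, 0.0, 0.0]
tau = 1e-3
B = noisy_normalized(A, tau, random.Random(1))
# By construction, the distance from B to the normalized A equals tau.
distance = frobenius_norm([b - a / 5.0 for a, b in zip(A, B)])
```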

We will choose $k$ random starting points and then apply each of the methods to each of the starting points.

The key performance criterion (on a single processor) is the expected time to success (TTS).

Let

1. the probability of success be $p_S$,
2. the probability of failure be $p_F = 1 - p_S$,
3. a successful decomposition take $m_S$ seconds, and
4. a failed decomposition take $m_F$ seconds.

Then, the expected time to a first success is

\[ E[\mathrm{TTS}] = \sum_{k=1}^{\infty} p_F^{k-1} p_S \left( m_S + (k - 1) m_F \right) = \frac{p_S m_S + p_F m_F}{p_S}. \]
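The closed form can be checked numerically against a truncated version of the series, with the sum running over the attempt number $k = 1, 2, \dots$ at which the first success occurs. The function names and the sample values below are assumptions for illustration.

```python
def expected_tts(p_success, m_success, m_fail):
    """Closed form (p_S m_S + p_F m_F) / p_S for the expected time to first success."""
    if not 0.0 < p_success <= 1.0:
        raise ValueError("p_success must lie in (0, 1]")
    p_fail = 1.0 - p_success
    return (p_success * m_success + p_fail * m_fail) / p_success


def expected_tts_series(p_success, m_success, m_fail, terms=10_000):
    """Truncated series: first success at attempt k after k - 1 failed attempts."""
    p_fail = 1.0 - p_success
    return sum(p_fail ** (k - 1) * p_success * (m_success + (k - 1) * m_fail)
               for k in range(1, terms + 1))


closed = expected_tts(0.5, 2.0, 1.0)         # (0.5*2 + 0.5*1) / 0.5 = 3.0
series = expected_tts_series(0.5, 2.0, 1.0)  # agrees up to rounding
```

When every run succeeds ($p_S = 1$), the expression reduces to $m_S$, as expected.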

Speedup of RGN-HR

[Figure: speedup of RGN-HR relative to GNDL-PCG, GNDL, and RGN-Reg for Model 1, 15×15×15 tensors, one panel per rank r = 15, 20, 25, 30; axes: scaling s ∈ {1, 2, 3, 4} and collinearity c ∈ {0.0, 0.25, 0.5, 0.75}; noise level τ = 10^{-3}.]

[Figure: speedup of RGN-HR relative to GNDL-PCG, GNDL, and RGN-Reg for Model 1, 15×15×15 tensors, one panel per rank r = 15, 20, 25, 30; axes: scaling s ∈ {1, 2, 3, 4} and collinearity c ∈ {0.0, 0.25, 0.5, 0.75, 0.95}; noise level τ = 10^{-5}.]

[Figure: speedup of RGN-HR relative to GNDL-PCG, GNDL, and RGN-Reg for Model 1, 15×15×15 tensors, one panel per rank r = 15, 20, 25, 30; axes: scaling s ∈ {1, 2, 3, 4} and collinearity c ∈ {0.0, 0.25, 0.5, 0.75, 0.95}; noise level τ = 10^{-7}.]

[Figure: speedup of RGN-HR relative to GNDL and GNDL-PCG for Model 2, 13×11×9 tensors; axes: scaling s and rank r ∈ {5, 7, 9, 11, 13}; some cells are marked "inf" or "failed"; noise level τ = 10^{-5}.]

Convergence plots

[Figure: objective value (log scale, 10^{-10} to 10^{2}) versus time (0 to 4 s) for RGN-HR, RGN-Reg, GNDL, LM, GNDL-PCG, and LM-PCG; Model 2, rank 7, scaling s = 2, noise level τ = 10^{-5}.]

Conclusions

Take-away story:

1. The classic and geometric condition numbers qualitatively predict the difference between a classic GN method and a RGN method for solving TAPs.
2. We proposed a Riemannian Gauss–Newton trust region method with dogleg step and hot restarts for solving TAPs.
3. Specifically for badly scaled problems the RGN method is

Main references

Breiding and V (2017a), The condition number of join decompositions, 2017. (Submitted)

Breiding and V (2017b), Convergence analysis of Riemannian Gauss–Newton methods and its connection with the geometric condition number, Applied Mathematics Letters, 2017. (Accepted)

Breiding and V (2017c), A Riemannian trust region method for the canonical tensor rank approximation problem, 2017. (Submitted)

General

De Lathauwer, De Moor, and Vandewalle (2000), A multilinear singular value decomposition, SIAM J. Matrix Anal. Appl.

Hitchcock (1927), The expression of a tensor or a polyadic as a sum of products, J. Math. Phys.

Kruskal (1977), Three-way arrays: rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics, Lin. Alg. Appl.

V, Vandebril, and Meerbergen (2012), A new truncation strategy for the higher-order singular value decomposition, SIAM J. Sci. Comput.

Vervliet, Debals, Sorber, Van Barel, and De Lathauwer (2016), Tensorlab v3.0, Available online.

Generic identifiability I

Ballico (2005), On the weak non-defectivity of Veronese embeddings of projective spaces, Centr. Eur. J. Math.

Abo, Ottaviani, and Peterson (2009), Induction for secant varieties of Segre varieties, Trans. Amer. Math. Soc.

Bocci and Chiantini (2013), On the identifiability of binary Segre products, J. Algebraic Geom.

Bocci, Chiantini, and Ottaviani (2013), Refined methods for the identifiability of tensors, Ann. Mat. Pura Appl.

Chiantini, Mella, and Ottaviani (2014), One example of general unidentifiable tensors, J. Alg. Stat.

Chiantini and Ottaviani (2012), On generic identifiability of 3-tensors of small rank, SIAM J. Matrix Anal. Appl.

Chiantini, Ottaviani, and V (2014), An algorithm for generic and low-rank specific identifiability of complex tensors, SIAM J. Matrix Anal. Appl.

Chiantini, Ottaviani, and V (2017), On generic identifiability of

Generic identifiability II

Galuppi and Mella (2017), Identifiability of homogeneous polynomials and Cremona Transformations, arXiv.

Qi, Comon, and Lim (2016), Semialgebraic geometry of nonnegative tensor rank, SIAM J. Matrix Anal. Appl.

Strassen (1983), Rank and optimal computation of generic tensors, Linear Algebra Appl.

Conditioning

Bürgisser and Cucker (2013), Condition: The Geometry of Numerical Algorithms, Springer.

de Silva and Lim (2008), Tensor rank and the ill-posedness of the best low-rank approximation problem, SIAM J. Matrix Anal. Appl.

Lee (2013), Introduction to Smooth Manifolds.

V (2017), Condition numbers for the tensor rank decomposition,

Optimization

Absil, Mahony, and Sepulchre (2008), Optimization Algorithms on Matrix Manifolds.

Dedieu and Kim (2002), Newton's method for analytic systems of equations with constant rank derivatives, J. Complexity.

Kressner, Steinlechner, and Vandereycken (2014), Low-rank tensor completion by Riemannian optimization, BIT.

First, compute the Cholesky decomposition, with $R \in \mathbb{R}^{r \times r}$ upper triangular, of

\[ C = c\,\mathbf{1}\mathbf{1}^T + (1 - c) I = R^T R. \]

Then, the factor matrices are

Ak = Nk R diag(s0, s1, s2, . . . , sr), where Nk has standard normally distributed elements.
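The first step can be sketched in plain Python: build $C$, which has ones on the diagonal and $c$ elsewhere, and compute its upper triangular Cholesky factor. The sketch assumes $0 \le c < 1$ so that $C$ is positive definite, and it stops before the scaling step, whose exact form is not recoverable from the slide. The function names are hypothetical.

```python
import math


def collinearity_matrix(r, c):
    """C = c * ones(r, r) + (1 - c) * I: ones on the diagonal, c elsewhere."""
    return [[1.0 if i == j else c for j in range(r)] for i in range(r)]


def cholesky_upper(C):
    """Return upper triangular R with C = R^T R (C symmetric positive definite)."""
    n = len(C)
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            s = C[i][j] - sum(R[k][i] * R[k][j] for k in range(i))
            if i == j:
                R[i][i] = math.sqrt(s)
            else:
                R[i][j] = s / R[i][i]
    return R


# Hypothetical example with r = 3 and c = 0.5.
C = collinearity_matrix(3, 0.5)
R = cholesky_upper(C)
# Reassemble R^T R to compare against C.
RtR = [[sum(R[k][i] * R[k][j] for k in range(3)) for j in range(3)]
       for i in range(3)]
```

Multiplying a matrix $N_k$ with independent standard normal entries by $R$ then gives columns whose expected Gram matrix is proportional to $C$, which is how $c$ controls the collinearity.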

The settings for the Tensorlab methods were as follows:

AlgOpts = [];
opts = [];
AlgOpts.TolFun = 1e-9 * tau^2;
AlgOpts.TolX = 1e-12;
AlgOpts.TolAbs = 0;
AlgOpts.MaxIter = 1000;
AlgOpts.CGTol = 1e-6;
AlgOpts.CGMaxIter = 75;
AlgOpts.LargeScale = true; % or false
opts.Compression = false;

The settings for the proposed method were:

AlgOpts = [];
opts = [];
AlgOpts.TolFun = 1e-6 * tau^2;
AlgOpts.TolX = 1e-12;
AlgOpts.TolAbs = 0;
AlgOpts.MaxIter = 1000;
AlgOpts.MaxRestarts = 500;
opts.Compression = false;
opts.AlgorithmOptions = AlgOpts;

We observed a delicate dependency on the relative function value tolerance TolFun for all methods.
