Operations on tensors in TT format

2.2 Tensor train format

2.2.1 Operations on tensors in TT format

We now focus on operations that are intrinsic to the algorithms that will be deﬁned later, in the remaining chapters of the ﬁrst part of the thesis. We go through some operations that were skipped in Section 2.1.1 as they are straight-forward when dealing with tensors in their full representation but not when this concrete format is considered. Additionally, even for operations that are straight-forward, we want to at least present them so that we go through all the ingredients needed in the algorithms.

Scaling. Given a TT tensor, and recalling that it can be represented in terms of core tensors as in (2.2), it is clear that multiplying a constant by the tensor is the same as multiplying it by one of the cores that compose it.

Orthogonalization. The representation of a given tensor in the TT format is not unique.

If, given the decompositionU^L_μ =QR, we deﬁne that U^L_μ becomesQ while U^R_μ₊₁becomes RU^R_μ₊₁, the representation is still associated with the same tensor. IfQ has orthonormal rows, this is called the left-orthogonalization of the μ-th core. Similarly, the right-orthogonalization of a core is obtained by using the decomposition U^R_μ = (QR)^T =R^TQ^T, setting thenU^R_μ toQ^T while U^L_μ₋₁ is changed to U^L_μ₋₁R^T.

In this context, if left-orthogonalization is performed on the ﬁrst core, then for the second and so on, we obtain a left-orthogonal tensor, meaning that (U^L_μ)^TU^L_μ =I_r_μ. As a result, it is also true thatX_≤μX^T_≤μ=I_r_μ, forμ = 1, ..., d−1. Similarly, if right-orthogonalization is performed on the second core, then for the third and so on, we obtain a right-orthogonal tensor, meaning that U^R_μ(U^R_μ)^T =I_r_μ. In that case, we also have X_≥μX^T_≥μ =I_r_μ, for μ = 2, ..., d.

There is a more general deﬁnition concerning orthogonality than the possibilities of a tensor being either left-orthogonal or right-orthogonal. If (U^L_μ)^TU^L_μ = I_r_μ for μ = 1, ..., ν − 1, and U^R_μ(U^R_μ)^T = I_r_μ for μ = ν + 1, ..., d, then the tensor is said to be ν-orthogonal. In particular, being left-orthogonal is equivalent to being d-orthogonal while right-orthogonal is to being 1-orthogonal.

Addition. Consider two tensors X and Y such that rank_TT(X) = (r₀, r₁, ..., r_d) and rank_TT(Y) = (˜r₀, ˜r₁, ..., ˜r_d), represented in the format from (2.2) as

X(i₁, ..., i_d) =U₁(i₁)U₂(i₂)· · · Ud(i_d), Y(i₁, ..., i_d) = ˜U₁(i₁) ˜U₂(i₂)· · · ˜U_d(i_d).

2.2. Tensor train format

The result of adding the two tensors,Z = X + Y, can be represented as a TT tensor of TT rank (r₀+ ˜r₀, r₁+ ˜r₁, ..., r_d+ ˜r_d),

Z(i₁, ..., i_d) =V₁(i₁)V₂(i₂)· · · Vd(i_d), where

V₁(i₁) =U₁(i₁) U˜₁(i₁), V_d(i_d) =

U_d(i_d) U˜_d(i_d)

, and

V_μ(i_μ) =

U_μ(i_μ) 0 0 U˜_μ(i_μ)

, μ = 2, ..., d− 1.

Adding two tensors results in a tensor with summed entries of the corresponding TT rank vectors.

Inner product. The inner product between two tensors represented in TT format consists of the Euclidean inner product of their vectorizations.

Noting that

vec(X) = X₌₁vec(U₁) = (X_≥2⊗ In₁)vec(U₁) = vec(U₁×₃X_≥2), the inner product can be eﬃciently computed reformulating it as follows:

X, Y = vec(U₂)^TX^T₌₂Y₌₂vec(V₂)

= vec(U₂)^T(X^T_≥3⊗ In₂ ⊗ Ir₁)(Y_≥3⊗ In₂⊗ X^T_≤1Y_≤1)vec(V₂)

= vec(U₂×3X_≥3)^Tvec(V₂×1X^T_≤1Y_≤1×3Y_≥3)

=U₂×₃X_≥3, V₂×₁X^T_≤1Y_≤1×₃Y_≥3.

Therefore, we can take instead the inner product of two (d− 1)-dimensional tensors, obtained by removing the ﬁrst core of each involved tensor and changing the second core of the second appropriately – V₂ becomesV₂×1X^T_≤1Y_≤1=V₂×1(U^L₁)^TV^L₁.

As a result, the inner product can be obtained by successively reducing the dimensionality of the involved tensors. The idea is to obtain the matrices (U^L_μ)^TV^L_μ and multiply the result to the next core V_μ₊₁, starting from μ = 1 and repeating the procedure until the last case,μ = d, is reached.

Concerning the norm, induced by the inner product, ifX is ν-orthogonal, then X^T_=νX_=ν =

I_r_ν−1_n_ν_r_ν, so that

||X|| =vec(U_ν)^TX^T_=νX_=νvec(U_ν) =

vec(U_ν)^Tvec(U_ν) =||Uν||.

This means that obtaining the norm of the tensor only requires the computation of the norm of the ν-th core.

Truncation. In order to obtain a TT tensor with a certain TT rank, a generalization of the truncated SVD procedure for matrices can be used. The procedure is calledTT-SVD [Ose11b] and it basically requires that truncated SVD is applied to the diﬀerent μ-th mode unfoldings of the tensor, for μ = 1, ..., d− 1.

Given a tensor X ∈ ⁿ¹^×···×n^d, with TT rank (r₀, r₁, ..., r_d), we successively apply the best rank-r_μapproximation to the dimensionμ, which we denote by P_r^μ_μ, starting from the first mode and finishing in the mode d− 1. Such a projection is described by (P_r^μ_μX)^μ =QQ^TX^μ, where Q∈ⁿ^μ^×r^μ contains the firstr_μ left singular vectors of

X^μ. The whole projection can be represented as P_r^TT = P_r^d⁻¹

d−1◦ · · · ◦ Pr¹₁. Note that this projection goes from ⁿ¹^×···×n^d to the manifold containing the tensors of TT rank (r₀, r₁, ..., r_d).

While, as opposed to truncated SVD, TT-SVD does not fulﬁl the best approximation property, it still fulﬁls a quasi-best approximation one. This is stated in [Ose11b], and we also state it now, noting that_r stands for the manifold with tensors of TT-rankr.

Theorem 1. The result of applying the projection P_r^TT to tensor X, represented by P_r^TT(X), is such that

||X − Pr^TT(X)|| ≤√

d− 1||X − Pr(X)||, (2.6)

where P_r(X) is the projection associated with the best approximation of X within the set of tensors with TT rank r.

IfX is already given in TT format, with TT rank r, TT-SVD truncation to a prescribed TT rank ˜r can be obtained eﬃciently. Assuming that X is d-orthogonal, it will become (d− 1)-orthogonal if the SVD of U^R_d,QSV^T, is used to deﬁne new cores by setting U^R_d to V^T andU^L_d₋₁ to U^L_d₋₁QS. Furthermore, if only ˜r_d₋₁ singular values are kept, r_d₋₁ reduces to ˜r_d−1. This needs to be repeated until the core U₁ is reached in order for the TT rank to become˜r instead of the original r.

Note that such truncation only makes sense if each individual entry of ˜r is smaller than the corresponding entry of r.

An alternative to the upper bound in (2.6) for||X − Pr^TT(X)|| that is more convenient follows.

2.2. Tensor train format

Figure 2.1: Singular values ofX^μfor the stationary distribution of the large overﬂow model, ford = 4 (left plot) and d = 6 (right plot).

Theorem 2. The result of applying the projection P_r^TT to tensor X, represented by P_r^TT(X), is such that

where σ_j(·) denotes the j-th largest singular value of a matrix.

This upper bound is in fact strongly related with the TT-SVD procedure, in which the goal is to obtain the smallest possible TT rank entries for a given desired accuracy. This is a fundamental operation in the algorithms proposed in this format as we will explain in the following chapters when discussing them.

As in the matrix case, the success of the application of low-rank tensor methods strongly relies on the data – in our case, the stationary probability distribution – being well approximated by a low-rank tensor or not. According to (2.7), this can be quantified by considering the singular values of the different unfoldings. Good accuracy can only be expected when their singular values decay sufficiently fast.

We now exemplify this for a concrete model of an overﬂow queuing network that has been extensively used; see, e.g., [Buc00]; to test algorithms for solving the exact same problem that we address in the ﬁrst part of this thesis.

Figure 2.1 displays the singular values of the relevant matricizations of the mentioned model. We consider two cases: d = 4 with n_μ = 20 states per queue; and d = 6 with n_μ = 6. Note that only the ﬁrst half of the matricizations are considered since the singular values of the second half display a similar behaviour. It turns out that the singular values have a very fast decay, showing that this model can be well approximated with very low entries of the TT rank. This was expected since this model has a strong

underlying Kronecker structure; see a more detailed description of it in Section 3.4.1.

In general, statements on the size of the values of the TT rank are hard to ﬁnd. Such statements can be found in [KS15] for a particular type of model. Furthermore, we here show, again for a particular model, that the exact solution can be approximated with very small TT rank entries.

In document Low-rank tensor methods for large Markov chains and forward feature selection methods (Page 34-38)