2.2 Tensor train format
2.2.1 Operations on tensors in TT format
We now focus on operations that are intrinsic to the algorithms that will be defined later, in the remaining chapters of the first part of the thesis. We go through some operations that were skipped in Section 2.1.1 as they are straight-forward when dealing with tensors in their full representation but not when this concrete format is considered. Additionally, even for operations that are straight-forward, we want to at least present them so that we go through all the ingredients needed in the algorithms.
Scaling. Given a TT tensor, and recalling that it can be represented in terms of core tensors as in (2.2), it is clear that multiplying a constant by the tensor is the same as multiplying it by one of the cores that compose it.
Orthogonalization. The representation of a given tensor in the TT format is not unique.
If, given the decompositionULμ =QR, we define that ULμ becomesQ while URμ+1becomes RURμ+1, the representation is still associated with the same tensor. IfQ has orthonormal rows, this is called the left-orthogonalization of the μ-th core. Similarly, the right-orthogonalization of a core is obtained by using the decomposition URμ = (QR)T =RTQT, setting thenURμ toQT while ULμ−1 is changed to ULμ−1RT.
In this context, if left-orthogonalization is performed on the first core, then for the second and so on, we obtain a left-orthogonal tensor, meaning that (ULμ)TULμ =Irμ. As a result, it is also true thatX≤μXT≤μ=Irμ, forμ = 1, ..., d−1. Similarly, if right-orthogonalization is performed on the second core, then for the third and so on, we obtain a right-orthogonal tensor, meaning that URμ(URμ)T =Irμ. In that case, we also have X≥μXT≥μ =Irμ, for μ = 2, ..., d.
There is a more general definition concerning orthogonality than the possibilities of a tensor being either left-orthogonal or right-orthogonal. If (ULμ)TULμ = Irμ for μ = 1, ..., ν − 1, and URμ(URμ)T = Irμ for μ = ν + 1, ..., d, then the tensor is said to be ν-orthogonal. In particular, being left-orthogonal is equivalent to being d-orthogonal while right-orthogonal is to being 1-orthogonal.
Addition. Consider two tensors X and Y such that rankTT(X) = (r0, r1, ..., rd) and rankTT(Y) = (˜r0, ˜r1, ..., ˜rd), represented in the format from (2.2) as
X(i1, ..., id) =U1(i1)U2(i2)· · · Ud(id), Y(i1, ..., id) = ˜U1(i1) ˜U2(i2)· · · ˜Ud(id).
2.2. Tensor train format
The result of adding the two tensors,Z = X + Y, can be represented as a TT tensor of TT rank (r0+ ˜r0, r1+ ˜r1, ..., rd+ ˜rd),
Z(i1, ..., id) =V1(i1)V2(i2)· · · Vd(id), where
V1(i1) =U1(i1) U˜1(i1) , Vd(id) =
Ud(id) U˜d(id)
, and
Vμ(iμ) =
Uμ(iμ) 0 0 U˜μ(iμ)
, μ = 2, ..., d− 1.
Adding two tensors results in a tensor with summed entries of the corresponding TT rank vectors.
Inner product. The inner product between two tensors represented in TT format consists of the Euclidean inner product of their vectorizations.
Noting that
vec(X) = X=1vec(U1) = (X≥2⊗ In1)vec(U1) = vec(U1×3X≥2), the inner product can be efficiently computed reformulating it as follows:
X, Y = vec(U2)TXT=2Y=2vec(V2)
= vec(U2)T(XT≥3⊗ In2 ⊗ Ir1)(Y≥3⊗ In2⊗ XT≤1Y≤1)vec(V2)
= vec(U2×3X≥3)Tvec(V2×1XT≤1Y≤1×3Y≥3)
=U2×3X≥3, V2×1XT≤1Y≤1×3Y≥3.
Therefore, we can take instead the inner product of two (d− 1)-dimensional tensors, obtained by removing the first core of each involved tensor and changing the second core of the second appropriately – V2 becomesV2×1XT≤1Y≤1=V2×1(UL1)TVL1.
As a result, the inner product can be obtained by successively reducing the dimensionality of the involved tensors. The idea is to obtain the matrices (ULμ)TVLμ and multiply the result to the next core Vμ+1, starting from μ = 1 and repeating the procedure until the last case,μ = d, is reached.
Concerning the norm, induced by the inner product, ifX is ν-orthogonal, then XT=νX=ν =
Irν−1nνrν, so that
||X|| =vec(Uν)TXT=νX=νvec(Uν) =
vec(Uν)Tvec(Uν) =||Uν||.
This means that obtaining the norm of the tensor only requires the computation of the norm of the ν-th core.
Truncation. In order to obtain a TT tensor with a certain TT rank, a generalization of the truncated SVD procedure for matrices can be used. The procedure is calledTT-SVD [Ose11b] and it basically requires that truncated SVD is applied to the different μ-th mode unfoldings of the tensor, for μ = 1, ..., d− 1.
Given a tensor X ∈ n1×···×nd, with TT rank (r0, r1, ..., rd), we successively apply the best rank-rμapproximation to the dimensionμ, which we denote by Prμμ, starting from the first mode and finishing in the mode d− 1. Such a projection is described by (PrμμX)μ =QQTXμ, where Q∈nμ×rμ contains the firstrμ left singular vectors of
Xμ. The whole projection can be represented as PrTT = Prd−1
d−1◦ · · · ◦ Pr11. Note that this projection goes from n1×···×nd to the manifold containing the tensors of TT rank (r0, r1, ..., rd).
While, as opposed to truncated SVD, TT-SVD does not fulfil the best approximation property, it still fulfils a quasi-best approximation one. This is stated in [Ose11b], and we also state it now, noting thatr stands for the manifold with tensors of TT-rankr.
Theorem 1. The result of applying the projection PrTT to tensor X, represented by PrTT(X), is such that
||X − PrTT(X)|| ≤√
d− 1||X − Pr(X)||, (2.6)
where Pr(X) is the projection associated with the best approximation of X within the set of tensors with TT rank r.
IfX is already given in TT format, with TT rank r, TT-SVD truncation to a prescribed TT rank ˜r can be obtained efficiently. Assuming that X is d-orthogonal, it will become (d− 1)-orthogonal if the SVD of URd,QSVT, is used to define new cores by setting URd to VT andULd−1 to ULd−1QS. Furthermore, if only ˜rd−1 singular values are kept, rd−1 reduces to ˜rd−1. This needs to be repeated until the core U1 is reached in order for the TT rank to become˜r instead of the original r.
Note that such truncation only makes sense if each individual entry of ˜r is smaller than the corresponding entry of r.
An alternative to the upper bound in (2.6) for||X − PrTT(X)|| that is more convenient follows.
2.2. Tensor train format
Figure 2.1: Singular values ofXμfor the stationary distribution of the large overflow model, ford = 4 (left plot) and d = 6 (right plot).
Theorem 2. The result of applying the projection PrTT to tensor X, represented by PrTT(X), is such that
where σj(·) denotes the j-th largest singular value of a matrix.
This upper bound is in fact strongly related with the TT-SVD procedure, in which the goal is to obtain the smallest possible TT rank entries for a given desired accuracy. This is a fundamental operation in the algorithms proposed in this format as we will explain in the following chapters when discussing them.
As in the matrix case, the success of the application of low-rank tensor methods strongly relies on the data – in our case, the stationary probability distribution – being well approximated by a low-rank tensor or not. According to (2.7), this can be quantified by considering the singular values of the different unfoldings. Good accuracy can only be expected when their singular values decay sufficiently fast.
We now exemplify this for a concrete model of an overflow queuing network that has been extensively used; see, e.g., [Buc00]; to test algorithms for solving the exact same problem that we address in the first part of this thesis.
Figure 2.1 displays the singular values of the relevant matricizations of the mentioned model. We consider two cases: d = 4 with nμ = 20 states per queue; and d = 6 with nμ = 6. Note that only the first half of the matricizations are considered since the singular values of the second half display a similar behaviour. It turns out that the singular values have a very fast decay, showing that this model can be well approximated with very low entries of the TT rank. This was expected since this model has a strong
underlying Kronecker structure; see a more detailed description of it in Section 3.4.1.
In general, statements on the size of the values of the TT rank are hard to find. Such statements can be found in [KS15] for a particular type of model. Furthermore, we here show, again for a particular model, that the exact solution can be approximated with very small TT rank entries.