
5.2 Proposed algorithms

5.2.1 Algorithm from Section 5.1.1 in TT format

Our first proposed algorithm combines the algorithm described in Section 5.1.1 with the TT format by representing all the structures involved, in particular matrices and vectors, in this format.

As noted in Section 5.1.1, this new algorithm is a tensorized multigrid scheme similar to the one proposed in Section 4.1.2: both reduce the sizes of the different modes in each grid, considering an underlying 1D local topology. This should become even clearer now that we explicitly discuss the adaptation of the algorithm to the TT format.

Since both schemes aim at reducing the mode sizes in each grid, it is natural that the particularities of the tensorized multigrid scheme considered here are very similar to those considered for the scheme of Section 4.1.2. We now go through these particularities, referring to that similar algorithm as the algorithm of reference.

Restriction and interpolation. Because the corresponding aggregation has a tensorized representation, as seen in Section 5.1.1, it admits a TT representation with all entries of the TT rank equal to 1. The same holds for disaggregation. A similar statement was made for the restriction and interpolation operators of the algorithm of reference. The idea is that the small matrices involved in the Kronecker representation are simply the matrices A_μ, μ = 1, …, d, in the corresponding TT representation (2.8).

The difference is that these matrices are now of the form (5.1), adapted to the mode sizes.
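To illustrate why a Kronecker-structured aggregation yields a TT operator with all TT ranks equal to 1, the following minimal NumPy sketch applies S_1 ⊗ ⋯ ⊗ S_d to a tensor in TT format core by core and checks the result against the dense product. The pairwise aggregation factor (summing consecutive pairs of states) is a hypothetical stand-in for matrices of the form (5.1), and all function names are illustrative.

```python
import numpy as np

def pairwise_aggregation(n):
    """Hypothetical factor: aggregates states (2i, 2i+1) into coarse state i."""
    S = np.zeros((n // 2, n))
    for i in range(n // 2):
        S[i, 2 * i] = S[i, 2 * i + 1] = 1.0
    return S

def apply_kronecker_operator(factors, cores):
    """Apply S_1 kron ... kron S_d to a TT tensor with cores G_k of shape (r_{k-1}, n_k, r_k).
    Each factor only acts on the mode index of its core, so TT ranks are unchanged."""
    return [np.einsum('ij,ajb->aib', S, G) for S, G in zip(factors, cores)]

def tt_full(cores):
    """Contract TT cores into a full tensor (only for checking small examples)."""
    T = cores[0]
    for G in cores[1:]:
        T = np.tensordot(T, G, axes=([-1], [0]))
    return T.reshape([G.shape[1] for G in cores])

rng = np.random.default_rng(0)
sizes, ranks = [4, 8, 4], [1, 3, 2, 1]
cores = [rng.standard_normal((ranks[k], sizes[k], ranks[k + 1])) for k in range(3)]
factors = [pairwise_aggregation(n) for n in sizes]

coarse = apply_kronecker_operator(factors, cores)
S_dense = np.kron(np.kron(factors[0], factors[1]), factors[2])
dense_result = S_dense @ tt_full(cores).reshape(-1)
print(np.allclose(tt_full(coarse).reshape(-1), dense_result))  # True
print([G.shape for G in coarse])  # [(1, 2, 3), (3, 4, 2), (2, 2, 1)]
```

Interpolation works the same way with the transposed factors, i.e. `apply_kronecker_operator([S.T for S in factors], coarse)`; only the mode sizes change, never the TT ranks.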

As noted in the context of the algorithm of reference, when using the multigrid scheme described in Algorithm 3, the only important property of the matrix that should be kept from one level to the next is that its columns sum to 0. This holds independently of how interpolation is chosen: assuming that the columns of the matrix at level ℓ sum to 0, i.e., $\mathbf{1}^T A_\ell = \mathbf{0}^T$, we have

$$\mathbf{1}^T A_{\ell+1} = \mathbf{1}^T \left( S^{(\ell)} A_\ell P^{(\ell)} \right) = \left( \mathbf{1}^T S^{(\ell)} \right) A_\ell P^{(\ell)} = \mathbf{1}^T A_\ell P^{(\ell)} = \mathbf{0}^T$$

(where 0 denotes, again, a vector of zeros), using the fact that $\mathbf{1}^T S^{(\ell)} = \mathbf{1}^T$, i.e., each column of the restriction operator $S^{(\ell)}$ sums to 1. Here, we use the transpose of restriction for interpolation, which corresponds to taking, as for the corresponding operators of the algorithm of reference, each P_k, k = 1, …, d, to be the transpose of S_k. The reason is the one given in Section 1.1 when justifying how simple it is to obtain a representation of Q^T of the form (1.3) given the same type of representation for Q; in particular, (S_1 ⊗ ⋯ ⊗ S_d)^T = S_1^T ⊗ ⋯ ⊗ S_d^T.
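The argument above can be checked numerically. The sketch below builds a random matrix whose columns sum to 0, a two-mode restriction S = S_1 ⊗ S_2 whose columns sum to 1, and the interpolation P assembled from the transposed factors; it then verifies that P = S^T and that the coarse matrix S A P again has zero column sums. The pairwise aggregation factor is a hypothetical stand-in for (5.1).

```python
import numpy as np

def pairwise_aggregation(n):
    """Hypothetical factor: aggregates states (2i, 2i+1) into coarse state i."""
    S = np.zeros((n // 2, n))
    for i in range(n // 2):
        S[i, 2 * i] = S[i, 2 * i + 1] = 1.0
    return S

rng = np.random.default_rng(1)
n1, n2 = 4, 8
N = n1 * n2

# Random matrix with nonnegative off-diagonal entries and zero column sums (1^T A = 0^T).
A = rng.random((N, N))
np.fill_diagonal(A, 0.0)
A -= np.diag(A.sum(axis=0))

S1, S2 = pairwise_aggregation(n1), pairwise_aggregation(n2)
S = np.kron(S1, S2)          # restriction: every column sums to 1
P = np.kron(S1.T, S2.T)      # interpolation assembled from the transposed factors
A_coarse = S @ A @ P

print(np.allclose(P, S.T))                           # True
print(np.allclose(np.ones(N // 4) @ A_coarse, 0.0))  # True: column sums remain 0
```

Replacing P by any other matrix of compatible size keeps the last check true, which is exactly the point of the derivation; P = S^T is simply the choice that keeps the rank-1 Kronecker structure.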

Smoother. The requirements that the smoother must satisfy are exactly the same as in the algorithm of reference. Again, we choose GMRES.

We use three steps of GMRES on the finest grid and one step on each of the remaining grids.

While for the algorithm of reference we used three steps on all the other grids as well, in this case we verified that the extra cost of these additional steps was not worth the corresponding gain in convergence. Note that the number of steps depends on the level and not, as is typically done, on whether we are in a presmoothing or postsmoothing stage.

Given that we use even fewer smoothing steps than in the algorithm of reference, the associated storage and computational requirements are now even more negligible.
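The smoothing schedule can be sketched as follows. This is a dense NumPy version of a fixed number of unrestarted GMRES steps from a given iterate, without the TT truncations used in the actual algorithm, together with the level-dependent step count. Interpreting one "step" as one Arnoldi iteration and numbering the finest level as 0 are assumptions; `gmres_steps` and `smoothing_steps` are hypothetical names.

```python
import numpy as np

def gmres_steps(A, b, x0, m):
    """Perform m unrestarted GMRES steps starting from x0; return the new iterate."""
    r0 = b - A @ x0
    beta = np.linalg.norm(r0)
    if beta == 0.0:
        return x0.copy()
    n = b.shape[0]
    V = np.zeros((n, m + 1))        # orthonormal Krylov basis
    H = np.zeros((m + 1, m))        # upper Hessenberg matrix from Arnoldi
    V[:, 0] = r0 / beta
    k = m
    for j in range(m):
        w = A @ V[:, j]
        w_norm = np.linalg.norm(w)
        for i in range(j + 1):      # modified Gram-Schmidt orthogonalization
            H[i, j] = V[:, i] @ w
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] <= 1e-12 * w_norm:   # breakdown: the Krylov space is invariant
            k = j + 1
            break
        V[:, j + 1] = w / H[j + 1, j]
    rhs = np.zeros(k + 1)
    rhs[0] = beta
    # Minimize ||beta e_1 - H_k y||_2, i.e. the residual norm over x0 + span(V_k).
    y = np.linalg.lstsq(H[:k + 1, :k], rhs, rcond=None)[0]
    return x0 + V[:, :k] @ y


def smoothing_steps(level):
    """Steps per smoothing stage; level 0 is the finest grid (assumed numbering).
    The same value is used for presmoothing and postsmoothing."""
    return 3 if level == 0 else 1
```

At level `l`, both smoothing stages of a cycle would then call `gmres_steps(A_l, b_l, x, smoothing_steps(l))`, where `A_l` and `b_l` are hypothetical names for the level matrix and right-hand side.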

Coarsest grid solver. The coarsest grid problem is still affected by the curse of dimensionality.

In fact, just as in the algorithm of reference, the mode sizes are reduced from one level to the next while the number of modes remains unchanged. We can therefore adopt the same remedy that the algorithm proposed in this thesis applied to this problem of the algorithm of reference (recall Section 4.2.3): we use AMEn [DS14] as the coarsest grid solver.

Normalization. In (1.1), the entries of the solution are required to sum to 1. This constraint is not preserved during a cycle, so, again as in the algorithm of reference, we normalize the approximation obtained after each cycle.
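In TT format, this normalization does not require forming the full tensor: the sum of all entries is obtained by contracting each core with a vector of ones, and scaling a single core rescales the whole tensor without changing any TT rank. A minimal sketch, with hypothetical function names:

```python
import numpy as np

def tt_sum(cores):
    """Sum of all entries of a TT tensor: contract every mode index with a ones vector."""
    v = np.ones((1,))
    for G in cores:                 # G has shape (r_{k-1}, n_k, r_k)
        v = v @ G.sum(axis=1)       # summing over n_k leaves an (r_{k-1}, r_k) matrix
    return float(v[0])

def tt_normalize(cores):
    """Scale a TT tensor so that its entries sum to 1; only the first core changes."""
    s = tt_sum(cores)
    if s == 0.0:
        raise ValueError("entries sum to zero; cannot normalize")
    return [cores[0] / s] + list(cores[1:])
```

The cost is a sequence of small vector-matrix products, linear in the number of modes.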

Truncations. Truncation is again needed during a cycle to prevent excessive rank growth.

As in the algorithm of reference, the TT-SVD algorithm is used, applied at exactly the same steps of Algorithm 3.

In particular, truncation is again done after lines 6 and 8 of Algorithm 3. As for the parameters of these truncations, the restricted residual in line 6 is again truncated with constant accuracy 10⁻¹. The truncation of the updated iterates v after line 8 is now less complicated: since the dependence of their norms on the level is not as strong as in the algorithm of reference, we do not target an accuracy that depends on the level. The only adaptive element is related to the residual norm after the previous cycle: the target accuracy is that residual norm times a constant factor of 10. This is also the accuracy used for the truncations inside the GMRES smoother.
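For reference, a minimal NumPy sketch of TT-SVD with a target accuracy and an upper bound on the TT rank entries follows. It assumes the accuracy is relative to the Frobenius norm of the tensor, as in the standard formulation of TT-SVD, and it starts from a full tensor for clarity; truncating a tensor already in TT format uses the same SVD-based rank selection after an orthogonalization sweep. Function names are hypothetical.

```python
import numpy as np

def tt_svd(X, eps, max_rank):
    """TT-SVD with relative accuracy eps and an upper bound max_rank on every TT rank."""
    d, n = X.ndim, X.shape
    # Per-step threshold so that, if the rank bound is inactive, ||X - X_TT|| <= eps ||X||.
    delta = eps / np.sqrt(max(d - 1, 1)) * np.linalg.norm(X)
    cores, r_prev, C = [], 1, X
    for k in range(d - 1):
        C = C.reshape(r_prev * n[k], -1)
        U, s, Vt = np.linalg.svd(C, full_matrices=False)
        tail = np.sqrt(np.cumsum(s[::-1] ** 2))[::-1]   # tail[i] = norm of s[i:]
        small = np.nonzero(tail[1:] <= delta)[0]
        r = small[0] + 1 if small.size else s.size      # smallest rank meeting delta
        r = min(r, max_rank)                            # enforce the rank bound
        cores.append(U[:, :r].reshape(r_prev, n[k], r))
        C = s[:r, None] * Vt[:r]                        # carry S V^T to the next step
        r_prev = r
    cores.append(C.reshape(r_prev, n[-1], 1))
    return cores

def tt_full(cores):
    """Contract TT cores into a full tensor (only for checking small examples)."""
    T = cores[0]
    for G in cores[1:]:
        T = np.tensordot(T, G, axes=([-1], [0]))
    return T.reshape([G.shape[1] for G in cores])
```

When the rank bound is active, the accuracy guarantee no longer holds; this is the trade-off that an adaptive rank bound is meant to control.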

As in the algorithm of reference, an upper bound is imposed on the TT rank entries allowed after each truncation: it is initially set to 15 and grows by a factor of √2 after each cycle for which the new residual norm is larger than $\frac{9}{10}$ times the residual norm obtained with the solution from the previous cycle.
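The truncation parameters described above can be summarized as a few update rules. The sketch below encodes them; the constant names and functions are hypothetical, and rounding the (non-integer) rank bound down to an integer is an assumption, since the rounding rule is not specified.

```python
import math

RESIDUAL_TRUNCATION_ACCURACY = 1e-1   # restricted residual (line 6): constant accuracy
ITERATE_ACCURACY_FACTOR = 10.0        # iterates (line 8) and truncations inside GMRES
INITIAL_RANK_CAP = 15.0
RANK_CAP_GROWTH = math.sqrt(2.0)
STAGNATION_RATIO = 9.0 / 10.0

def iterate_truncation_accuracy(prev_residual_norm):
    """Target accuracy: 10 times the residual norm after the previous cycle."""
    return ITERATE_ACCURACY_FACTOR * prev_residual_norm

def update_rank_cap(rank_cap, new_residual_norm, prev_residual_norm):
    """Grow the bound by sqrt(2) when the residual norm drops by less than a factor 9/10."""
    if new_residual_norm > STAGNATION_RATIO * prev_residual_norm:
        return rank_cap * RANK_CAP_GROWTH
    return rank_cap

def rank_bound(rank_cap):
    """Integer bound passed to the truncation (rounding down is a hypothetical choice)."""
    return int(math.floor(rank_cap))
```

Keeping the bound as a float and rounding only when it is used avoids accumulating rounding errors over repeated growth steps.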

Parameters of AMEn in the coarsest grid problem. For the same reason that the particularities of this algorithm are in general similar to those of the algorithm of reference, it makes sense that the parameters of AMEn in the coarsest grid are similar to those of Multigrid-AMEn (recall Section 4.2). In fact, the parameters are the same: AMEn targets an accuracy equal to the residual norm after the previous multigrid cycle; the enrichment rank is 3, and the approximation of the associated residual is obtained by ALS, as suggested in [DS14]; the subproblems are solved with a direct solver for problems of size up to 1000, and with MINRES otherwise.
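The size-based choice of local solver can be sketched with NumPy and SciPy as below. The function name is hypothetical, the local matrix is assumed to be symmetric (MINRES requires it), and SciPy's default MINRES tolerance is used because the tolerance applied to the subproblems is not specified here.

```python
import numpy as np
from scipy.sparse.linalg import minres

DIRECT_SOLVER_MAX_SIZE = 1000

def solve_local_system(A, b):
    """Solve a (symmetric) local subproblem: direct solver up to size 1000, MINRES above."""
    if A.shape[0] <= DIRECT_SOLVER_MAX_SIZE:
        return np.linalg.solve(A, b)
    x, info = minres(A, b)          # info == 0 signals convergence
    if info != 0:
        raise RuntimeError(f"MINRES did not converge (info={info})")
    return x
```

For large subproblems, an iterative solver only needs matrix-vector products, which in AMEn can be applied in factored form instead of assembling the local matrix.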

Size of the coarsest grid problem. By construction of restriction and interpolation, the mode sizes in the coarsest grid can only be powers of 2, excluding 2⁰ = 1, since it is not possible to reduce the problem to a single variable per mode. This was also the case for the algorithm of reference, although for different reasons, given that its restriction and interpolation operators are different. In the comments on the size of the coarsest grid problem for Multigrid-AMEn (recall Section 4.2), it was noted that mode sizes 5 would still be too large, while mode sizes 3 are already small enough for AMEn to be applied effectively. The question is whether mode sizes 4 can still be used or whether the mode sizes must be reduced to 2. With a study similar to the one that led to those conclusions for Multigrid-AMEn, we concluded that mode sizes 4 would still be too large.

Thus, the number of levels is chosen such that the coarsest grid problem has mode sizes 2, which is the minimum possible value.
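As a small illustration of this choice, assume (for this sketch only) that each finest-grid mode size is a power of 2 and that every restriction halves it. The number of grids for a mode of size n is then log₂ n, and the coarsest problem, with every mode of size 2, still has 2^d states for d modes, which is why the coarsest grid remains subject to the curse of dimensionality. Function names are hypothetical.

```python
def num_grids(mode_size):
    """Grids n, n/2, ..., 2 for a mode of size n = 2^k (halving per level is assumed)."""
    if mode_size < 2 or mode_size & (mode_size - 1):
        raise ValueError("mode size must be a power of 2 larger than 1")
    return mode_size.bit_length() - 1   # equals log2(mode_size)

def coarsest_problem_size(d):
    """With all d modes reduced to size 2, the coarsest problem has 2^d states."""
    return 2 ** d

print(num_grids(16))               # 4 (sizes 16, 8, 4, 2)
print(coarsest_problem_size(20))   # 1048576
```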

Initial approximation of the solution. The algorithm is initialized with the tensor that results from solving the coarsest grid problem, which is then brought up to the finest level using interpolation. Just as for the algorithm of reference, this coarsest grid problem suffers from the curse of dimensionality, so we cannot apply a direct solver; as in Multigrid-AMEn (recall Section 4.2), our variant of AMEn is used instead.

Allowing the number of levels to depend on the mode. As in Multigrid-AMEn (recall Section 4.2), we allow different modes to use different numbers of levels. If, at a certain level, there are modes that we do not want to restrict further, we simply set the corresponding cores to the identity.
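A minimal sketch of how the Kronecker factors of each restriction step could be assembled when modes use different numbers of grids: a mode keeps being restricted until it has used up its grids, after which its factor is the identity. The pairwise aggregation factor is a hypothetical stand-in for (5.1), and `build_hierarchy` is an illustrative name. Since the columns of the identity also sum to 1, the column-sum argument for restriction still applies.

```python
import numpy as np

def pairwise_aggregation(n):
    """Hypothetical factor: aggregates states (2i, 2i+1) into coarse state i."""
    S = np.zeros((n // 2, n))
    for i in range(n // 2):
        S[i, 2 * i] = S[i, 2 * i + 1] = 1.0
    return S

def build_hierarchy(mode_sizes, grids_per_mode):
    """Return, for each restriction step, the list of Kronecker factors S_1, ..., S_d."""
    sizes, remaining = list(mode_sizes), list(grids_per_mode)
    hierarchy = []
    for _ in range(max(remaining) - 1):
        factors = []
        for mu, n in enumerate(sizes):
            if remaining[mu] > 1:            # keep restricting this mode
                if n < 4:
                    raise ValueError("cannot restrict a mode below size 2")
                factors.append(pairwise_aggregation(n))
                sizes[mu], remaining[mu] = n // 2, remaining[mu] - 1
            else:                            # no further restriction: identity core
                factors.append(np.eye(n))
        hierarchy.append(factors)
    return hierarchy

h = build_hierarchy([8, 4], [3, 2])
print([[F.shape for F in step] for step in h])   # [[(4, 8), (2, 4)], [(2, 4), (2, 2)]]
```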

In document Low-rank tensor methods for large Markov chains and forward feature selection methods (Page 84-87)