Differential equations for the factor matrices

1.3 An integration method

1.3.2 Differential equations for the factor matrices

The low-rank matrix we seek is represented in the form $Y(t) = U(t)S(t)V(t)^\top$ with $U(t)\in\mathbb{R}^{n_1\times r}$, $S(t)\in\mathbb{R}^{r\times r}$ and $V(t)\in\mathbb{R}^{n_2\times r}$. Using this decomposition, the Leibniz rule gives the time derivative of $Y(t)$ as

$$\dot Y(t) = \dot U(t)S(t)V(t)^\top + U(t)\dot S(t)V(t)^\top + U(t)S(t)\dot V(t)^\top. \tag{1.13}$$

Moreover, since $U(t)$ and $V(t)$ have orthonormal columns, we have $U(t)^\top U(t) = I_r$ and $V(t)^\top V(t) = I_r$. Differentiating these identities, again by the Leibniz rule, we obtain

$$\dot U(t)^\top U(t) + U(t)^\top \dot U(t) = 0_r \quad\text{and}\quad \dot V(t)^\top V(t) + V(t)^\top \dot V(t) = 0_r,$$

i.e., $U(t)^\top \dot U(t)$ and $V(t)^\top \dot V(t)$ are skew-symmetric. Translating the orthogonality conditions (1.9) to this time-dependent setting, we require more strongly

$$U(t)^\top \dot U(t) = 0_r \quad\text{and}\quad V(t)^\top \dot V(t) = 0_r. \tag{1.14}$$
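As a numerical sanity check of (1.13), and of the remark that orthonormality alone only makes $U^\top \dot U$ skew-symmetric, the following NumPy sketch uses hypothetical smooth curves built from $2\times 2$ rotations. These curves are not from the text. They keep orthonormal columns for all $t$ but violate (1.14), which shows why (1.14) is an additional requirement.

```python
import numpy as np

def rot(t):
    """2 x 2 rotation matrix."""
    return np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])

def drot(t):
    """Time derivative of rot(t)."""
    return np.array([[-np.sin(t), -np.cos(t)], [np.cos(t), -np.sin(t)]])

rng = np.random.default_rng(0)
n1, n2 = 6, 5
QU, _ = np.linalg.qr(rng.standard_normal((n1, 2)))  # fixed orthonormal bases
QV, _ = np.linalg.qr(rng.standard_normal((n2, 2)))
S0, S1 = rng.standard_normal((2, 2)), rng.standard_normal((2, 2))

# Hypothetical curves: U(t), V(t) keep orthonormal columns for every t.
def U(t): return QU @ rot(t)
def dU(t): return QU @ drot(t)
def V(t): return QV @ rot(2 * t)
def dV(t): return 2 * QV @ drot(2 * t)
def S(t): return S0 + t**2 * S1
def dS(t): return 2 * t * S1
def Y(t): return U(t) @ S(t) @ V(t).T

t, h = 0.3, 1e-5
dY_fd = (Y(t + h) - Y(t - h)) / (2 * h)  # central difference quotient
dY_113 = dU(t) @ S(t) @ V(t).T + U(t) @ dS(t) @ V(t).T + U(t) @ S(t) @ dV(t).T  # (1.13)
fd_err = np.linalg.norm(dY_fd - dY_113)  # truncation + rounding error only
print(fd_err)

K = U(t).T @ dU(t)  # skew-symmetric, but not zero for this curve
print(np.allclose(K, -K.T), np.allclose(K, 0.0))
```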

In the following, we use representation (1.13) for the time derivative of $Y$ and, for ease of presentation, omit the argument $t$.

With the unique representations (1.10)-(1.12) of the tangent factor matrices and the unique form (1.8) of $\delta Y$ at hand, we are now in a position to derive differential equations for the factor matrices $S$, $U$ and $V$.

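Recall that the minimization condition (1.4) is equivalent to the Galerkin condition (1.5), viz.,

$$\langle \dot Y - F(t,Y),\ \delta Y\rangle = 0 \quad\text{for all } \delta Y \in T_Y\mathcal{M}.$$

In the derivation below, $\dot A$ stands for the right-hand side $F(t,Y)$. The idea is to choose particular tangent matrices $\delta Y$ that yield ordinary differential equations for the factor matrices. Since the Galerkin condition holds for all tangent matrices $\delta Y \in T_Y\mathcal{M}$,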

1. The dynamical low-rank approximation

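we first choose $\delta U = \delta V = 0$, i.e., $\delta Y = U\,\delta S\,V^\top$ with arbitrary $\delta S \in \mathbb{R}^{r\times r}$. With the form (1.13) of $\dot Y$ and the orthogonality constraints (1.14), this yields

$$0 = \langle \dot Y - \dot A,\ U\,\delta S\,V^\top\rangle = \langle U^\top \dot U S V^\top V + U^\top U \dot S V^\top V + U^\top U S \dot V^\top V - U^\top \dot A V,\ \delta S\rangle = \langle \dot S - U^\top \dot A V,\ \delta S\rangle,$$

where we used $U^\top U = V^\top V = I_r$, $U^\top \dot U = 0_r$ and $\dot V^\top V = (V^\top \dot V)^\top = 0_r$. Since $\delta S$ is arbitrary, it follows that

$$\dot S = U^\top \dot A V. \tag{1.15}$$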
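The following NumPy sketch is a quick numerical check of (1.15) on random illustrative data, not data from the text. It takes $U$ and $V$ with orthonormal columns, an arbitrary matrix standing in for $\dot A$, and any $\dot U$, $\dot V$ that satisfy (1.14). With these inputs, the choice $\dot S = U^\top \dot A V$ makes the Galerkin residual vanish for every direction $\delta Y = U\,\delta S\,V^\top$.

```python
import numpy as np

def inner(X, Z):
    """Frobenius inner product <X, Z> = trace(X^T Z)."""
    return np.sum(X * Z)

rng = np.random.default_rng(1)
n1, n2, r = 8, 7, 3
U, _ = np.linalg.qr(rng.standard_normal((n1, r)))  # orthonormal columns
V, _ = np.linalg.qr(rng.standard_normal((n2, r)))
S = rng.standard_normal((r, r))
Adot = rng.standard_normal((n1, n2))               # stands in for \dot A

# Arbitrary factor derivatives satisfying the gauge conditions (1.14).
Udot = rng.standard_normal((n1, r))
Udot -= U @ (U.T @ Udot)
Vdot = rng.standard_normal((n2, r))
Vdot -= V @ (V.T @ Vdot)

Sdot = U.T @ Adot @ V                                    # (1.15)
Ydot = Udot @ S @ V.T + U @ Sdot @ V.T + U @ S @ Vdot.T  # (1.13)

# Galerkin residuals for a few random directions dY = U dS V^T.
residuals = [inner(Ydot - Adot, U @ rng.standard_normal((r, r)) @ V.T)
             for _ in range(5)]
print(max(abs(x) for x in residuals))  # zero up to rounding error
```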

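Next, to derive the differential equation for $U$, we choose $\delta Y \in T_Y\mathcal{M}$ of the form $\delta Y = \delta U\,S\,V^\top$, i.e., $\delta S = \delta V = 0$. Using (1.13), (1.14) and the form (1.15) of $\dot S$, we find

$$\begin{aligned}
0 &= \langle \dot Y - \dot A,\ \delta U\,S\,V^\top\rangle \\
  &= \langle \dot U S + U \dot S - \dot A V,\ \delta U\,S\rangle \\
  &= \langle \dot U S S^\top + U \dot S S^\top - \dot A V S^\top,\ \delta U\rangle \\
  &= \langle \dot U S S^\top + U U^\top \dot A V S^\top - \dot A V S^\top,\ \delta U\rangle.
\end{aligned}$$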
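For reference, the manipulations in this chain rely only on the following standard identities for the Frobenius inner product. They are stated here for matrices of compatible sizes and follow from the cyclic invariance of the trace:

```latex
\langle X, Z\rangle = \operatorname{trace}(X^\top Z), \qquad
\langle X, BC\rangle = \langle XC^\top, B\rangle = \langle B^\top X, C\rangle, \qquad
\langle X, Z\rangle = \langle X^\top, Z^\top\rangle .
```

For instance, take $X = \dot Y - \dot A$, $B = \delta U\,S$ and $C = V^\top$. The first identity then turns $\langle \dot Y - \dot A, \delta U\,S\,V^\top\rangle$ into $\langle (\dot Y - \dot A)V, \delta U\,S\rangle$. Applying it again with $B = \delta U$ and $C = S$ moves $S^\top$ to the left argument.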

Now, due to the orthogonality condition $U^\top \delta U = 0_r$, the tangent factor $\delta U$ lies in the orthogonal complement of the space spanned by the columns of $U$. In other words, defining

$$P_U = UU^\top \quad\text{and}\quad P_U^\perp = I_{n_1} - UU^\top$$

as the orthogonal projection onto the space spanned by the columns of $U$ and the orthogonal projection onto its orthogonal complement, respectively, we conclude that $\delta U \in \operatorname{range} P_U^\perp$. Therefore, $\delta U = P_U^\perp \delta W$ for some $\delta W \in \mathbb{R}^{n_1\times r}$. Conversely, $P_U^\perp \delta W$ satisfies the orthogonality condition for every $\delta W \in \mathbb{R}^{n_1\times r}$.
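The projector properties used in what follows can be checked with a small NumPy sketch on illustrative random data. The sketch confirms that $P_U^\perp$ is symmetric and idempotent, that it annihilates $U$, and that $P_U^\perp \delta W$ satisfies $U^\top \delta U = 0_r$ for an arbitrary $\delta W$.

```python
import numpy as np

rng = np.random.default_rng(2)
n1, r = 8, 3
U, _ = np.linalg.qr(rng.standard_normal((n1, r)))  # orthonormal columns

P_U = U @ U.T                 # projection onto the column space of U
P_U_perp = np.eye(n1) - P_U   # projection onto its orthogonal complement

dW = rng.standard_normal((n1, r))  # arbitrary matrix
dU = P_U_perp @ dW                 # admissible tangent factor

checks = {
    "symmetric": np.allclose(P_U_perp, P_U_perp.T),
    "idempotent": np.allclose(P_U_perp @ P_U_perp, P_U_perp),
    "annihilates U": np.allclose(P_U_perp @ U, 0.0),
    "U^T dU = 0": np.allclose(U.T @ dU, 0.0),
}
print(checks)
```

For large $n_1$, one would typically apply the projection as `X - U @ (U.T @ X)` rather than form the $n_1\times n_1$ matrix. That costs $O(n_1 r^2)$ instead of $O(n_1^2 r)$ operations.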

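Using that $P_U^\perp$ is symmetric, that $P_U^\perp U = 0$, and that $P_U^\perp \dot U = \dot U$ by (1.14), it follows that

$$\begin{aligned}
0 &= \langle \dot U S S^\top + U U^\top \dot A V S^\top - \dot A V S^\top,\ P_U^\perp \delta W\rangle \\
  &= \langle P_U^\perp \dot U S S^\top + P_U^\perp U U^\top \dot A V S^\top - P_U^\perp \dot A V S^\top,\ \delta W\rangle \\
  &= \langle \dot U S S^\top - P_U^\perp \dot A V S^\top,\ \delta W\rangle.
\end{aligned}$$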
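The step from the first to the last line of this chain can be verified numerically. The sketch below uses random illustrative data and a gauge-satisfying $\dot U$. It also confirms that the residual matrix $\dot U S S^\top - P_U^\perp \dot A V S^\top$ vanishes for $\dot U = P_U^\perp \dot A V S^{-1}$, which is the choice the text arrives at as (1.16).

```python
import numpy as np

def inner(X, Z):
    """Frobenius inner product <X, Z> = trace(X^T Z)."""
    return np.sum(X * Z)

rng = np.random.default_rng(3)
n1, n2, r = 9, 7, 3
U, _ = np.linalg.qr(rng.standard_normal((n1, r)))
V, _ = np.linalg.qr(rng.standard_normal((n2, r)))
S = rng.standard_normal((r, r))
Adot = rng.standard_normal((n1, n2))  # stands in for \dot A
P_perp = np.eye(n1) - U @ U.T

# Any Udot satisfying the gauge condition U^T Udot = 0 from (1.14).
Udot = P_perp @ rng.standard_normal((n1, r))

# Left argument of the inner product before and after the simplification.
M = Udot @ S @ S.T + U @ U.T @ Adot @ V @ S.T - Adot @ V @ S.T
R = Udot @ S @ S.T - P_perp @ Adot @ V @ S.T

dW = rng.standard_normal((n1, r))
gap = abs(inner(M, P_perp @ dW) - inner(R, dW))
print(gap)  # agrees up to rounding error

# Residual matrix for Udot = P_perp Adot V S^{-1}, i.e. the choice (1.16).
Udot_116 = P_perp @ Adot @ V @ np.linalg.inv(S)
R_116 = Udot_116 @ S @ S.T - P_perp @ Adot @ V @ S.T
print(np.linalg.norm(R_116))
```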

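Since this holds for every $\delta W \in \mathbb{R}^{n_1\times r}$, it yields

$$0 = \dot U S S^\top - P_U^\perp \dot A V S^\top.$$

Note that $S$ is invertible because $Y = USV^\top$ has rank $r$. Multiplying from the right first by $S^{-\top}$ and then by $S^{-1}$, we obtain the differential equation for the factor matrix $U$,

$$\dot U = P_U^\perp \dot A V S^{-1}. \tag{1.16}$$

The derivation of the differential equation for the factor matrix $V$ follows along similar lines. We now choose $\delta Y = U\,S\,\delta V^\top$, i.e., $\delta S = \delta U = 0$, in the Galerkin condition (1.5). Using the representation (1.13) of $\dot Y$, observing the orthogonality constraints (1.14) and inserting the derivative (1.15) of $S$ results in

$$\begin{aligned}
0 &= \langle \dot Y - \dot A,\ U S\,\delta V^\top\rangle \\
  &= \langle U^\top \dot U S V^\top + U^\top U \dot S V^\top + U^\top U S \dot V^\top - U^\top \dot A,\ S\,\delta V^\top\rangle \\
  &= \langle S^\top \dot S V^\top + S^\top S \dot V^\top - S^\top U^\top \dot A,\ \delta V^\top\rangle \\
  &= \langle S^\top U^\top \dot A V V^\top + S^\top S \dot V^\top - S^\top U^\top \dot A,\ \delta V^\top\rangle \\
  &= \langle V V^\top \dot A^\top U S + \dot V S^\top S - \dot A^\top U S,\ \delta V\rangle.
\end{aligned}$$

Define

$$P_V = VV^\top \quad\text{and}\quad P_V^\perp = I_{n_2} - VV^\top$$

as the orthogonal projection onto the space spanned by the columns of $V$ and the orthogonal projection onto its orthogonal complement. The orthogonality condition $V^\top \delta V = 0_r$ in (1.9) then gives $\delta V \in \operatorname{range} P_V^\perp$. Hence $\delta V = P_V^\perp \delta Z$ for some $\delta Z \in \mathbb{R}^{n_2\times r}$, and every such $P_V^\perp \delta Z$ is admissible. Using $P_V^\perp V = 0$, $P_V^\perp \dot V = \dot V$, and the symmetry and idempotence of $P_V^\perp$, we therefore find

$$\begin{aligned}
0 &= \langle V V^\top \dot A^\top U S + \dot V S^\top S - \dot A^\top U S,\ \delta V\rangle
   = \langle \dot V S^\top S - P_V^\perp \dot A^\top U S,\ P_V^\perp \delta Z\rangle \\
  &= \langle P_V^\perp \dot V S^\top S - P_V^\perp \dot A^\top U S,\ \delta Z\rangle
   = \langle \dot V S^\top S - P_V^\perp \dot A^\top U S,\ \delta Z\rangle.
\end{aligned}$$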

This equation holds for every $\delta Z \in \mathbb{R}^{n_2\times r}$, and so we conclude that

$$0 = \dot V S^\top S - P_V^\perp \dot A^\top U S.$$

Multiplying from the right first by $S^{-1}$ and then by $S^{-\top}$ yields the differential equation for $V$,

$$\dot V = P_V^\perp \dot A^\top U S^{-\top}. \tag{1.17}$$

The idea of the integration method proposed in [KL07] is to solve the system of differential equations (1.15)-(1.17) for the factor matrices of the low-rank representation of $Y$. The initial values $U(t_0)$, $S(t_0)$ and $V(t_0)$ must be chosen appropriately, and ideally they come from a truncated SVD of the initial value $A(t_0)$ of the differential equation (1.3). In a nutshell,

we solve the system of equations

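$$\begin{aligned}
\dot S(t) &= U(t)^\top \dot A(t)\, V(t), & S(t_0) &= S_0,\\
\dot U(t) &= P_U^\perp \dot A(t)\, V(t) S(t)^{-1}, & U(t_0) &= U_0,\\
\dot V(t) &= P_V^\perp \dot A(t)^\top U(t) S(t)^{-\top}, & V(t_0) &= V_0,
\end{aligned} \tag{1.18}$$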

with initial values $S(t_0) = S_0 \in \mathbb{R}^{r\times r}$, $U(t_0) = U_0 \in \mathbb{R}^{n_1\times r}$ and $V(t_0) = V_0 \in \mathbb{R}^{n_2\times r}$. Solving (1.18) up to time $t = t_1$ and multiplying the resulting factor matrices yields the low-rank approximation $Y(t_1)$ to $A(t_1)$, i.e.,

$$Y(t_1) = U(t_1)S(t_1)V(t_1)^\top.$$

Note that in practice this product is never formed explicitly, in order to avoid expensive computations.
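Because the product is not formed, quantities involving $Y(t_1)$ are evaluated through its factors. The following is a minimal NumPy sketch with hypothetical function names. Applying $Y = USV^\top$ to a vector costs $O((n_1+n_2)r + r^2)$ operations instead of $O(n_1 n_2)$. When $U$ and $V$ have orthonormal columns, $\|Y\|_F = \|S\|_F$.

```python
import numpy as np

def lowrank_matvec(U, S, V, x):
    """Compute (U S V^T) x without forming the n1 x n2 matrix."""
    return U @ (S @ (V.T @ x))

def lowrank_fro_norm(S):
    """||U S V^T||_F, valid when U and V have orthonormal columns."""
    return np.linalg.norm(S)

rng = np.random.default_rng(6)
n1, n2, r = 300, 200, 5
U, _ = np.linalg.qr(rng.standard_normal((n1, r)))
V, _ = np.linalg.qr(rng.standard_normal((n2, r)))
S = rng.standard_normal((r, r))
x = rng.standard_normal(n2)

y = lowrank_matvec(U, S, V, x)
Y_dense = U @ S @ V.T  # formed here only to check the factored versions
print(np.linalg.norm(y - Y_dense @ x),
      abs(lowrank_fro_norm(S) - np.linalg.norm(Y_dense)))
```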

To continue the integration in time, we simply take the factor matrices at time $t_1$ as initial values for the next integration step from $t_1$ to $t_2 = t_1 + h$, and so forth.
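The stepping procedure can be sketched in NumPy as follows. This is a minimal illustration under several assumptions that are not part of the text:

- $\dot A(t)$ is available as a dense matrix through a hypothetical function `Adot(t)`.
- The test matrix $A(t) = X(t)Z(t)^\top$ with $X(t) = X_0 + tX_1$ and $Z(t) = Z_0 + tZ_1$ is made up for illustration.
- Each step uses the classical explicit fourth-order Runge–Kutta method (RK4), which does not preserve orthonormality exactly.

This $A(t)$ generically has rank $r$ for the times considered. In that case the exact solution of (1.18) reproduces $A(t)$, so the final relative error reflects only the time discretization.

```python
import numpy as np

def rhs(U, S, V, Adot):
    """Right-hand side of (1.18) for factors (U, S, V) and a dense matrix Adot."""
    AV = Adot @ V                           # n1 x r
    AtU = Adot.T @ U                        # n2 x r
    S_inv = np.linalg.inv(S)                # S is r x r, assumed invertible
    dS = U.T @ AV                           # (1.15)
    dU = (AV - U @ (U.T @ AV)) @ S_inv      # (1.16), P_U^perp applied implicitly
    dV = (AtU - V @ (V.T @ AtU)) @ S_inv.T  # (1.17), S^{-T} = (S^{-1})^T
    return dU, dS, dV

def rk4_step(t, factors, Adot_fn, h):
    """One classical RK4 step of (1.18) from t to t + h; factors = (U, S, V)."""
    def f(tau, Z):
        return rhs(*Z, Adot_fn(tau))
    def shift(Z, a, K):
        return tuple(z + a * k for z, k in zip(Z, K))
    k1 = f(t, factors)
    k2 = f(t + h / 2, shift(factors, h / 2, k1))
    k3 = f(t + h / 2, shift(factors, h / 2, k2))
    k4 = f(t + h, shift(factors, h, k3))
    return tuple(z + h / 6 * (a + 2 * b + 2 * c + d)
                 for z, a, b, c, d in zip(factors, k1, k2, k3, k4))

# Hypothetical test data: A(t) = X(t) Z(t)^T of rank r.
rng = np.random.default_rng(4)
n1, n2, r = 20, 15, 3
X0, X1 = rng.standard_normal((n1, r)), 0.2 * rng.standard_normal((n1, r))
Z0, Z1 = rng.standard_normal((n2, r)), 0.2 * rng.standard_normal((n2, r))

def A(t):
    return (X0 + t * X1) @ (Z0 + t * Z1).T

def Adot(t):
    return X1 @ (Z0 + t * Z1).T + (X0 + t * X1) @ Z1.T

# Initial values from a truncated SVD of A(t0).
t, h, n_steps = 0.0, 0.01, 50
u, s, vt = np.linalg.svd(A(t), full_matrices=False)
factors = (u[:, :r], np.diag(s[:r]), vt[:r].T)

for _ in range(n_steps):
    factors = rk4_step(t, factors, Adot, h)  # factors at t + h start the next step
    t += h

U, S, V = factors
# The full matrix is formed only to measure the error of this toy example.
rel_err = np.linalg.norm(U @ S @ V.T - A(t)) / np.linalg.norm(A(t))
orth_err = max(np.linalg.norm(U.T @ U - np.eye(r)),
               np.linalg.norm(V.T @ V - np.eye(r)))
dU, dS, dV = rhs(U, S, V, Adot(t))
gauge = (np.linalg.norm(U.T @ dU), np.linalg.norm(V.T @ dV))
print(rel_err, orth_err, gauge)
```

The loop mirrors the text: the factors returned for time $t + h$ serve as initial values for the next step. In a real application, `rhs` would exploit any structure of $\dot A(t)$ (e.g. sparsity or a factored form) instead of a dense matrix.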

We remark that the initial values for the factor matrices $U(t)$ and $V(t)$ at subsequent time steps must have orthonormal columns. In other words, the solutions of the differential equations for $U(t)$ and $V(t)$ must keep orthonormal columns. Due to the orthogonality conditions (1.14), the time derivatives of $U(t)^\top U(t)$ and $V(t)^\top V(t)$ are

$$\frac{d}{dt}\big(U(t)^\top U(t)\big) = \dot U(t)^\top U(t) + U(t)^\top \dot U(t) = 0_r \quad\text{and}\quad \frac{d}{dt}\big(V(t)^\top V(t)\big) = \dot V(t)^\top V(t) + V(t)^\top \dot V(t) = 0_r,$$

respectively. Hence $U(t)^\top U(t)$ and $V(t)^\top V(t)$ are constant. Since the initial values $U(t_0)$ and $V(t_0)$ are assumed to have orthonormal columns, $U(t)$ and $V(t)$ retain orthonormal columns during the time integration of the system (1.18). From the computational perspective, when this system of ODEs is integrated numerically, the property can be preserved by, e.g., the orthogonality-preserving Runge–Kutta methods described in [HLW06, IV.9].
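With a standard integrator such as classical RK4, the columns of $U$ and $V$ drift slightly away from orthonormality. One simple remedy, which differs from the methods of [HLW06, IV.9], is to re-orthonormalize the factors after each step with thin QR factorizations and absorb the triangular factors into the core. This leaves $Y = USV^\top$ unchanged, since $U S V^\top = Q_U (R_U S R_V^\top) Q_V^\top$. The NumPy sketch below uses hypothetical helper names and artificially perturbed data.

```python
import numpy as np

def reorthonormalize(U, S, V):
    """Restore orthonormal columns of U and V without changing U S V^T.

    Uses thin QR factorizations U = Q_U R_U and V = Q_V R_V, so that
    U S V^T = Q_U (R_U S R_V^T) Q_V^T.
    """
    Q_U, R_U = np.linalg.qr(U)
    Q_V, R_V = np.linalg.qr(V)
    return Q_U, R_U @ S @ R_V.T, Q_V

def orth_error(W):
    """Frobenius distance of W^T W from the identity."""
    return np.linalg.norm(W.T @ W - np.eye(W.shape[1]))

# Illustrative factors whose columns have drifted slightly from orthonormality.
rng = np.random.default_rng(5)
n1, n2, r = 12, 10, 4
U, _ = np.linalg.qr(rng.standard_normal((n1, r)))
V, _ = np.linalg.qr(rng.standard_normal((n2, r)))
U = U + 1e-6 * rng.standard_normal((n1, r))
V = V + 1e-6 * rng.standard_normal((n2, r))
S = rng.standard_normal((r, r))

U2, S2, V2 = reorthonormalize(U, S, V)
print(orth_error(U), orth_error(U2))                   # drift before vs. after
print(np.linalg.norm(U @ S @ V.T - U2 @ S2 @ V2.T))    # product unchanged
```

Applied after every step of a standard integrator, this restores orthonormal columns up to rounding while leaving the current approximation $Y = USV^\top$ unchanged.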

In document Time integration for the dynamical low-rank approximation of matrices and tensors (Page 34-37)