Fixed Rank PSD Matrix Manifold Optimization

Low rank PSD optimization problems have been examined by Journée et al. [34], who introduced an algorithm that solves a sequence of fixed rank PSD matrix manifold optimization problems. The authors did not name this algorithm, referring to it only as a “meta algorithm”. For ease of discussion, this meta algorithm is referred to here as “the rank incremental algorithm”.

Mishra et al. [58] used the rank incremental algorithm to solve the low embedding dimension EDMC problem; their Matlab code can be found at https://bamdevmishra.in/codes/edmcompletion/. The geometry of the fixed rank problem is relevant to modelling protein structures because a protein’s Gram matrix is a fixed rank (rank 3) PSD matrix. Since Gram matrices of other ranks are not relevant to modelling protein ENMs, the details of the rank incremental algorithm will not be discussed here; see Journée et al. [34], Mishra et al. [58], and the Matlab code for further details. From this point on, the discussion focuses on the fixed rank problem only.

An optimization problem whose solution is required to be a fixed rank Gram matrix is an optimization problem on the Riemannian manifold $S_+^{n,r}$, where $r$ denotes the fixed rank. Such a problem can be solved using either the gradient descent algorithm or the trust region algorithm generalized to $S_+^{n,r}$ [58].

Optimization algorithms on $\mathbb{R}^{3n}$ (or, more generally, $\mathbb{R}^n$) can be generalized to matrix manifolds, which are Riemannian manifolds whose points can be represented by matrices; see [1]. Appendix D provides the background needed to define a Riemannian manifold and gives the compact Stiefel manifold, the non-compact Stiefel manifold, the Grassmann manifold, and the fixed rank PSD matrix manifolds as examples of matrix manifolds.

Absil et al. [1] use abstract differential geometry as a foundation and show how these abstract ideas can be applied to various matrix manifolds.

The trust region algorithm generalized to Riemannian manifolds is called the Riemannian trust region (RTR) algorithm. The trust region algorithm retains the desirable local superlinear convergence of Newton's method, but is much less sensitive to the choice of initial point (see [1], Chapter 7). Another advantage over Newton's method is that the trust region subproblem is solved by the truncated conjugate gradient (tCG) algorithm, which requires neither forming the Hessian matrix of the objective function explicitly nor inverting it; it only needs the action of the Hessian on a given direction. This means the tCG algorithm generalizes conveniently to Riemannian manifolds other than $\mathbb{R}^{3n}$, where the Hessian is instead given as a linear operator acting on a matrix or vector, as seen in Section 3.6.5. Due to these advantages, the trust region algorithm will be used in this thesis.
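To illustrate why tCG only needs Hessian-times-direction products, the following Python sketch implements the classical Steihaug-Toint truncated conjugate gradient method in the Euclidean setting. The Hessian is passed as a function `hess_op` acting on a matrix, and the Frobenius inner product stands in for a Riemannian metric. This is a simplified sketch under those assumptions (no preconditioner, no projection onto a tangent space; the names `truncated_cg`, `hess_op`, and `_step_to_boundary` are hypothetical), not the implementation used in [1] or [58].

```python
import numpy as np


def inner(a, b):
    """Frobenius inner product <a, b> = trace(a^T b) for arrays of equal shape."""
    return float(np.sum(a * b))


def _step_to_boundary(eta, p, delta):
    """Return tau >= 0 such that ||eta + tau * p||_F = delta (eta is inside the region)."""
    a = inner(p, p)
    b = 2.0 * inner(eta, p)
    c = inner(eta, eta) - delta ** 2
    return (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)


def truncated_cg(grad, hess_op, delta, max_iter=100, tol=1e-10):
    """Approximately minimise the model m(eta) = <grad, eta> + 0.5 <eta, hess_op(eta)>
    subject to ||eta||_F <= delta, using only Hessian-times-direction products."""
    eta = np.zeros_like(grad)
    r = grad.copy()              # residual: gradient of the model at eta
    p = -r                       # first search direction (steepest descent)
    r_r = inner(r, r)
    if np.sqrt(r_r) < tol:
        return eta               # zero gradient: the zero step is optimal
    for _ in range(max_iter):
        Hp = hess_op(p)          # the Hessian is only ever applied, never formed
        curvature = inner(p, Hp)
        if curvature <= 0:
            # Negative curvature: move along p until the trust-region boundary.
            return eta + _step_to_boundary(eta, p, delta) * p
        alpha = r_r / curvature
        eta_next = eta + alpha * p
        if np.sqrt(inner(eta_next, eta_next)) >= delta:
            # The CG step would leave the trust region: stop on the boundary.
            return eta + _step_to_boundary(eta, p, delta) * p
        eta = eta_next
        r = r + alpha * Hp
        r_r_new = inner(r, r)
        if np.sqrt(r_r_new) < tol:
            break
        p = -r + (r_r_new / r_r) * p
        r_r = r_r_new
    return eta
```

For example, with `hess_op = lambda V: A @ V + V @ B` for symmetric positive definite `A` and `B`, a large `delta` returns (approximately) the Newton step solving `hess_op(eta) = -grad`, while a small `delta` returns a step lying on the trust-region boundary.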

Mishra et al. [58] showed that the trust region algorithm is preferable to gradient descent for the EDMC problem for the following reasons:

• The trust region algorithm requires fewer iterations to converge than gradient descent on the EDMC problem (see the discussion in [58], Sections V and VI).

• As the number of points $n$ increases, the trust region algorithm has been observed to scale better than gradient descent on the EDMC problem (see the discussion in [58], Section VI).

In Sections 3.5 and 3.6, and in the rest of this section, the Gram matrix factorization will be denoted as $X = YY^T$ instead of $X = PP^T$.

The reason for this change of notation is that $P$ has been defined to be an $n \times 3$ matrix in this thesis. However, the fixed rank problem discussed here is a subroutine called by the rank incremental algorithm. In the first iteration of the rank incremental algorithm, the $Y$ given as input to the fixed rank problem is $n \times 1$; in the next iteration, $Y$ is $n \times 2$; and so on. Thus, $Y$ is defined to be an $n \times r$ matrix rather than an $n \times 3$ matrix.

The fixed rank problem solved by Mishra et al. [58] is:

$$\min_{YY^T \in S_+^{n,r}} f(YY^T) = \left\| H \circ \left( \mathcal{K}(YY^T) - D \right) \right\|_F^2, \tag{3.2}$$

where $f(YY^T)$ is the objective function of the fixed rank problem and $\circ$ denotes the Hadamard (entrywise) product. Equation (3.2) means the optimization problem finds the rank $r$ PSD matrix $YY^T$ whose corresponding EDM, $\mathcal{K}(YY^T)$, is closest to $D$ in the $H$-weighted Frobenius norm. This is the same objective function as Equation (3.1), except that the Gram matrix is factorized.
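The objective in Equation (3.2) can be evaluated directly from a representative $Y$ without ever storing more than the $n \times n$ residual. The sketch below assumes that $\mathcal{K}$ is the usual map from a Gram matrix to the matrix of squared distances, $\mathcal{K}(X)_{ij} = X_{ii} + X_{jj} - 2X_{ij}$, that $H$ is a symmetric weight (or 0/1 mask) matrix, and that $D$ holds squared distances; these are assumptions for illustration, and the function names `K` and `f` are hypothetical.

```python
import numpy as np


def K(X):
    """Map a Gram matrix X to its EDM of squared distances (assumed definition):
    K(X)_ij = X_ii + X_jj - 2 X_ij."""
    d = np.diag(X)
    return d[:, None] + d[None, :] - 2.0 * X


def f(Y, H, D):
    """Objective of Equation (3.2): ||H o (K(Y Y^T) - D)||_F^2."""
    R = H * (K(Y @ Y.T) - D)      # Hadamard (entrywise) product with the weights H
    return float(np.sum(R * R))   # squared Frobenius norm
```

For example, if `D = K(P @ P.T)` for some $n \times 3$ matrix `P` of coordinates, then `f(P, H, D)` is zero for any `H`, and so (up to rounding) is `f(P @ Q, H, D)` for any orthogonal `Q`.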

The factorization $X = YY^T$ of the Gram matrix is invariant under the transformation

$$Y \to YQ,$$

where $Q \in O_r$, $O_r = \{Q \mid Q^TQ = QQ^T = I_r\}$, and $I_r$ is the $r \times r$ identity matrix; indeed, $(YQ)(YQ)^T = YQQ^TY^T = YY^T$. Since $X$ has rank $r$ and $Y$ is an $n \times r$ matrix, $Y$ must have full column rank $r$, so $Y$ is an element of the non-compact Stiefel manifold, $Y \in \mathbb{R}_*^{n \times r}$. The optimization problem in Equation (3.2) is therefore defined on the quotient manifold:

$$S_+^{n,r} \simeq \mathbb{R}_*^{n \times r} / O_r.$$
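The invariance $(YQ)(YQ)^T = YY^T$ can be confirmed numerically. This minimal sketch draws a random orthogonal $Q$ from the QR factorization of a Gaussian matrix (an illustrative choice; the helper `random_orthogonal` is hypothetical) and checks that the Gram matrix is unchanged and that a random $Y$ has full column rank.

```python
import numpy as np


def random_orthogonal(r, rng):
    """Draw an r x r orthogonal matrix from the QR factorization of a Gaussian matrix."""
    Q, R = np.linalg.qr(rng.standard_normal((r, r)))
    return Q * np.sign(np.diag(R))   # flip column signs so that diag(R) would be positive


rng = np.random.default_rng(0)
n, r = 8, 3
Y = rng.standard_normal((n, r))      # full column rank with probability one
Q = random_orthogonal(r, rng)

X = Y @ Y.T
X_rot = (Y @ Q) @ (Y @ Q).T          # (YQ)(YQ)^T = Y Q Q^T Y^T = Y Y^T

print(np.allclose(Q.T @ Q, np.eye(r)))      # Q is in O_r
print(np.allclose(X, X_rot))                # the Gram matrix is unchanged
print(np.linalg.matrix_rank(Y) == r)        # Y lies in R_*^{n x r}
```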

This quotient manifold has equivalence classes $[Y]$ defined as:

$$[Y] = \{YQ \mid Q \in O_r\}. \tag{3.3}$$

That is, for any $Q \in O_r$, the matrix $YQ$ represents the same point as the matrix $Y$ on the quotient manifold $S_+^{n,r} \simeq \mathbb{R}_*^{n \times r} / O_r$.
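Deciding whether two $n \times r$ matrices represent the same point of the quotient manifold amounts to asking whether $Y_2 = Y_1Q$ for some $Q \in O_r$. One way to test this, sketched below as an illustration rather than a method taken from [58], is to solve the orthogonal Procrustes problem $\min_{Q \in O_r} \|Y_1Q - Y_2\|_F$, whose solution is $Q = UV^T$ where $Y_1^TY_2 = U\Sigma V^T$ is a singular value decomposition. For full column rank matrices, an equivalent check is $Y_1Y_1^T = Y_2Y_2^T$. The function name `same_point` and the tolerance are hypothetical.

```python
import numpy as np


def same_point(Y1, Y2, tol=1e-10):
    """Return True if Y2 = Y1 Q for some orthogonal Q, i.e. [Y1] = [Y2].

    The best Q in the Frobenius sense solves an orthogonal Procrustes problem:
    with Y1^T Y2 = U S V^T, the minimiser is Q = U V^T."""
    U, _, Vt = np.linalg.svd(Y1.T @ Y2)
    Q = U @ Vt
    residual = np.linalg.norm(Y1 @ Q - Y2)   # Frobenius norm of the mismatch
    return residual <= tol * max(1.0, np.linalg.norm(Y2))
```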

When solving optimization problems on quotient manifolds, any matrix from an equivalence class can be used to represent that equivalence class. This is formalized in the following definition of a point on $S_+^{n,r} \simeq \mathbb{R}_*^{n \times r} / O_r$.

Definition 3.4.1 (A point on the quotient manifold $S_+^{n,r} \simeq \mathbb{R}_*^{n \times r} / O_r$). $S_+^{n,r} \simeq \mathbb{R}_*^{n \times r} / O_r$ is a quotient manifold, and each point on the manifold is an equivalence class, given by Equation (3.3), with potentially infinitely many elements. In numerical algorithms and when storing points in computer memory, any matrix $Y \in [Y]$ can be used to represent its equivalence class, and mathematical formulas are expressed using this “representative matrix”. This means a point on $S_+^{n,r} \simeq \mathbb{R}_*^{n \times r} / O_r$ can be represented simply by the matrix $Y$.
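Definition 3.4.1 allows any element of $[Y]$ to be stored. Given a rank $r$ PSD matrix $X$, one convenient representative can be computed from its eigendecomposition as $Y = U_r\Lambda_r^{1/2}$, where $\Lambda_r$ holds the $r$ largest eigenvalues of $X$ and $U_r$ the corresponding eigenvectors. The sketch below shows this one illustrative choice (the function name `representative` is hypothetical); any $YQ$ with $Q \in O_r$ would serve equally well.

```python
import numpy as np


def representative(X, r):
    """Return an n x r matrix Y with Y Y^T = X for a rank-r PSD matrix X.

    Uses the r largest eigenpairs: X = U diag(lam) U^T, Y = U_r diag(sqrt(lam_r))."""
    lam, U = np.linalg.eigh(X)              # eigenvalues in ascending order
    lam_r, U_r = lam[-r:], U[:, -r:]        # keep the r largest eigenpairs
    # Scale each eigenvector column by sqrt(eigenvalue); clip tiny negative round-off.
    return U_r * np.sqrt(np.clip(lam_r, 0.0, None))
```

Any other factor $Y_0$ of the same $X$ differs from this $Y$ by an orthogonal matrix, which can be confirmed by solving $Y = Y_0Q$ for $Q$ and checking that $Q^TQ = I_r$.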

Following Definition 3.4.1, expressions like $[Y][Y]^T$ will not appear in the mathematical formulas relating to the fixed rank EDMC problem; the simpler notation $YY^T$ will still be used. The notations $f(YY^T)$ and $f(Y)$ will both be used to denote the objective function in Equation (3.2), with $f(Y)$ preferred because it is easier to read. The notation $f(Y)$ is well defined on the quotient manifold because $f$ depends on $Y$ only through $YY^T$, so $f(YQ) = f(Y)$ for all $Q \in O_r$.
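Because $f$ depends on $Y$ only through $YY^T$, its Euclidean gradient with respect to $Y$ inherits the same symmetry. Assuming $\mathcal{K}(X)_{ij} = X_{ii} + X_{jj} - 2X_{ij}$ and symmetric $H$ and $D$ (assumptions, not definitions taken from this section), differentiating Equation (3.2) gives $\nabla f(Y) = 8\left(\operatorname{Diag}(E\mathbf{1}) - E\right)Y$ with $E = H \circ H \circ (\mathcal{K}(YY^T) - D)$. The sketch below implements this formula (the names `K`, `f`, and `grad_f` are hypothetical). It satisfies $\nabla f(YQ) = \nabla f(Y)Q$, and $Y^T\nabla f(Y)$ is symmetric, so the gradient is orthogonal, in the Frobenius inner product, to the directions $Y\Omega$ ($\Omega$ skew-symmetric) that move along the equivalence class $[Y]$.

```python
import numpy as np


def K(X):
    """Gram matrix -> EDM of squared distances (assumed definition)."""
    d = np.diag(X)
    return d[:, None] + d[None, :] - 2.0 * X


def f(Y, H, D):
    """Objective of Equation (3.2) evaluated at the representative Y."""
    R = H * (K(Y @ Y.T) - D)
    return float(np.sum(R * R))


def grad_f(Y, H, D):
    """Euclidean gradient of f at Y, assuming H and D are symmetric:
    grad f(Y) = 8 (Diag(E 1) - E) Y with E = H o H o (K(Y Y^T) - D)."""
    E = H * H * (K(Y @ Y.T) - D)
    return 8.0 * (np.diag(E.sum(axis=1)) - E) @ Y
```

A central finite difference, $(f(Y + tV) - f(Y - tV))/(2t) \approx \langle \nabla f(Y), V \rangle_F$ for a random direction $V$ and small $t$, is a cheap way to check a hand-derived gradient such as this one.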

As discussed in [78] and Section D.6.5, $S_+^{n,r}$ is diffeomorphic to several different geometries. In this thesis, unless stated otherwise, the diffeomorphism $S_+^{n,r} \simeq \mathbb{R}_*^{n \times r} / O_r$ is always assumed; similarly, for protein ENMs, $S_+^{n,3} \simeq \mathbb{R}_*^{n \times 3} / O_3$ is assumed.

In document Protein Structure Elastic Network Models and the Rank 3 Positive Semidefinite Matrix Manifold (Page 76-79)