CHAPTER 2: Methods for Fr´echet Means in Phylogenetic Treespace
2.5 Interior point methods for optimizing edge lengths
2.5.1 A damped Newton’s method
If these approximate optimality conditions are satisfied thenF(X∗)will not differ much from the Fr´echet function value when the lengths of edges with positive derivatives are set to 0.
2.5.1 A damped Newton’s method
The following algorithm is designed to find approximately optimal edge lengths for a fixed tree topology. Detailed explanations for the steps of this algorithm are in the following subsections. Interior point algorithm for optimal edge lengths
input:T1, T2, . . . , Tn, X0∈ Tr, >0,δ >0,0< c <1
whileapproximate optimality conditions (2.5) are not satisfied do
compute a descent directionP (Sec. 2.5.1.1)
find a feasible step-length,α, satisfying decrease condition (Sec. 2.5.1.2) letXk+1=Xk+αP
if|e|< thenremovee endwhile
2.5.1.1 Newton steps
A successful iterative algorithm will make substantial progress to an optimal point. This can be achieved using a modified Newton’s method. Newton’s method uses a descent vector which points to the minimizer of a quadratic approximation of the objective function. The quadratic approximation in Newton’s method uses the first three terms of the Taylor expansion ofF(X). For the Fr´echet function the entries of the Hessian matrix of second order partial derivatives is given in Def. 2.4.5.
The Hessian matrix is positive definite because the Fr´echet function is strictly convex. Thus, when the Hessian and gradient are well defined, the second order Taylor approximation is
g(X;p) =F(X) +X e∈E ∇(F(X))e(pe) + X e∈E X e0∈E (pe)(pe0)∇2F(X) ee0 (2.6) The minimizer inpofg(X;p)is the Newton vectorpN =−∇F(X)(∇2F(X))−1. In cases where Hessian is not well defined or poorly scaled the Newton vector either must be modified or shouldn’t be used at all. There are two such cases: (i) the Hessian is poorly scaled if any setAil haskAi
lkclose
to zero and|Ai
l|>1; and (ii) the Hessian is not well defined on the shared faces of multivistal cells
for edges in a support pair which changes from one side of the shared face to the other. In case (i), as kAi
lk approaches zero|
∇2F(X)
ee0|increases without bound for e, e0 ∈ Ail. Thus, the Hessian matrix may become ill conditioned. But even askAi
lkapproaches zero, other
entries of the gradient and Hessian are stable. Ife∈Ail, ande0∈/ Ali, ande, e0∈Ailforl6=q, then those entries|
∇2F(X)
ee0|approaches zero.
But even if case (i) occurs, the edges which have lengths approaching zero will have very little effect on the lengths of large scale edges. The mixed second order partial derivatives, with one edgeeinAilwithkAi
lk →0, and the other edgee
0inAi
lwithkAilkbounded positively from below,
are essentially zero. Therefore, these mixed partial derivatives should have very little effect in the quadratic approximationg(X;p).
In case (ii),∇2F(X)
ee0)does not agree on the shared face of multi-vistal cells. In this case a direction of improvement is based on the gradient and the algebraic form of the Hessian matrix for points strictly within that shared face. The Newton direction is obtained by solving for the minimizer of the second order Taylor series quadratic model. The solution is obtained by solving the linear system∇2F(X)p=−∇F(X). Since the entries of∇2F(X)may jump across vistal cells the entries of this matrix will change in a non-uniform way and the solutionpN will jump.
2.5.1.2 Choosing a step length
The taking a full step along the Newton direction minimizes the quadratic approximation of the Fr´echet function, however taking a full step may result in a new point which actually has a larger Fr´echet function value or that may be on outside the orthant boundaries.
The first precaution is to calculate the maximum step lengthα0such that|e|k+1 =|e|k+α0pe≥ 0for alle. Ifα0 ≤1then letα=α0c0where0< c0 <1.
Choosing step-length which satisfies the followingsufficient decrease conditionwill ensure a substantial decrease in the objective function value at every step. Let0< c1<1.
F(Xk+αp)≤F(Xk) +c1α∇Fk0p (2.7)
Thecurvature condition, which rules out unacceptably short steps, requires the step-length,α, to satisfy
∇F(Xk+αp)0p≥c2∇Fk0p (2.8)
for some constantc2in the interval(c1,1). 2.5.1.3 Initialization
For initializing an interior point search, any point inO(E)would suffice, but it is preferential to start with a good guess for edge lengths. The global search algorithms presented in the next section could provide a starting point for a local search. One good start strategy can be derived by noticing that the Fr´echet function can be separated into a quadratic part and a part involving sums of norms.
F(X) = n X i=1 ki X l=1 kAi lk2+ 2kAilkkBlik+kBlik2+ X e∈Ci (|e|2 X − |e|Ti)2 (2.9)
The only terms that cannot be expressed in a quadratic function of the edge lengths are collected into function S(X) = n X i=1 ki X l=1 2kAi lkkBlik (2.10)
The quadratic parts are collected into function
Q(X) = n X i=1 ki X l=1 kAilk2+kBlik2+ X e∈Ci (|e|X − |e|Ti)2, (2.11)
The minimizer ofQ(X),XQ∗, can be easily found by solving∇Q(X) = 0; the solution is |e|X∗ Q = Pn i=1|e|Ti n . (2.12)
The optimal value|e|X∗
Q is non-negative, and ifeis common in any ofT
1, . . . , Tn, then|e| XQ∗ is
positive. The gradient ofS(X)is non-negative at any feasibleX, which implies that the optimal edges lengths inX∗must be no larger than the edge lengths inXQ∗ i.e. the optimal solution is in the closed box
0≤ |e|X ≤ |e|X∗
Q ∀e∈E. (2.13)