Parametric Search - Parametric Sequence Alignment

Parametric Sequence Alignment

4.4 Parametric Search

The term parametric search refers to any question that involves finding a parameter vector λ^∗ that satisfies some specified property. Problems 1 and 5 of 4.2.4 fall in this category.

In both cases, λ^∗ is a vertex of the maximization diagram of Z: For ray shooting, after re-parametrization, the λ^∗ we seek is a breakpoint of Z. For inverse optimum alignment, the point λ^∗ that minimizes the diﬀerence between Z(λ) and score(A⁽⁰⁾, λ) can always be chosen to be a vertex of the maximization diagram of Z.

We review four methods of parametric search: bisection search, Newton’s method, gradient descent, and Megiddo’s method. All of these operate by generating a set of candidate values for λ^∗ and using them to narrow down the search by invoking either an evaluator (Definition 4.4) or an oracle.

4-12 Handbook of Computational Molecular Biology

DEFINITION 4.7 An oracle for a one-dimensional parametric search problem is a procedure that, given a parameter value ˆλ, determines whether or not ˆλ is less than or equal to the parameter value being sought.

Oracles and evaluators are often related. To illustrate this, consider the following ray-shooting problem that is used as a sample application of three of the methods presented here.

Indel penalty sensitivity analysis in global alignment. Let A⁽⁰⁾ be an optimum global alignment for some given indel penalty γ⁽⁰⁾. Assuming that the gap penalty is zero and that the reward for matches and the mismatch penalty are each one, ﬁnd the largest indel penalty γ^∗≥ γ⁽⁰⁾ such that A⁽⁰⁾ is optimal for every γ∈ [γ⁽⁰⁾, γ^∗].

Note that, by Remark 4.1, no generality is lost by the above choices for the weights of matches and mismatches. As seen in 4.2.2, an evaluator for this problem is the standard dynamic programming algorithm for optimum global alignment. An oracle for the problem must determine whether a given ˆγ ≥ γ⁽⁰⁾ is less than or equal to γ^∗. To test this, first use the evaluator to find an optimum alignment Â when the indel penalty equals ˆγ. Next, compare score(A⁽⁰⁾, ˆγ) and score(Â, ˆγ). If they are equal, then ˆγ ≤ γ^∗. Otherwise (since Â is optimum at ˆγ), the only possibility is that score(A⁽⁰⁾, ˆγ) < score(Â, ˆγ), and thus ˆγ > γ^∗.

The rest of this section is organized as follows. In 4.4.1–4.4.4, we give an overview of bisection search, Megiddo’s method, Newton’s method, and gradient descent. This is followed by applications of parametric search to inverse alignment (4.4.5) and sensitivity analysis (4.4.6).
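The evaluator and oracle for the indel penalty problem described above can be sketched in a few lines. This is a minimal illustration, not the handbook's implementation: the names `evaluate`, `score`, and `oracle` are hypothetical, an alignment is summarized only by its counts (w, x, y) of matches, mismatches, and indels, and the score is assumed to be w − x − γy, as implied by unit match and mismatch weights and a zero gap penalty. Passing γ as an `int` or `Fraction` keeps the equality test exact.

```python
from fractions import Fraction

def evaluate(s, t, gamma):
    """Evaluator: optimum global alignment of s and t for indel penalty gamma.
    Returns (score, (w, x, y)) where w, x, y count matches, mismatches, indels."""
    n, m = len(s), len(t)
    # Each table entry is (score, w, x, y) for a pair of prefixes.
    prev = [(-gamma * j, 0, 0, j) for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [(-gamma * i, 0, 0, i)]
        for j in range(1, m + 1):
            sc, w, x, y = prev[j - 1]
            if s[i - 1] == t[j - 1]:
                diag = (sc + 1, w + 1, x, y)      # match, reward 1
            else:
                diag = (sc - 1, w, x + 1, y)      # mismatch, penalty 1
            sc, w, x, y = prev[j]
            up = (sc - gamma, w, x, y + 1)        # indel
            sc, w, x, y = cur[j - 1]
            left = (sc - gamma, w, x, y + 1)      # indel
            cur.append(max(diag, up, left, key=lambda c: c[0]))
        prev = cur
    sc, w, x, y = prev[m]
    return sc, (w, x, y)

def score(counts, gamma):
    """Score of a fixed alignment, given its (w, x, y) counts."""
    w, x, y = counts
    return w - x - gamma * y

def oracle(s, t, counts0, gamma_hat):
    """True iff gamma_hat <= gamma*, i.e. A0 is still optimal at gamma_hat."""
    best, _ = evaluate(s, t, gamma_hat)
    return score(counts0, gamma_hat) == best
```

For the toy pair "AC" and "CA", the optimum at γ = 0 has counts (1, 0, 2); the oracle then answers "yes" up to γ = 3/2, where the alignment with two mismatches and no indels takes over.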

4.4.1 Bisection Search

Bisection search for one-parameter problems is easy to describe. Suppose λ^∗ (which, by assumption, is a breakpoint) is known to lie in some interval I on the real line. Repeatedly halve I, taking the left or right half depending on the outcome of an oracle call at the midpoint. The search stops when I is too small to contain more than one breakpoint; the sole breakpoint that remains in I must be λ^∗. We use the indel penalty sensitivity analysis problem to illustrate this technique. We show that the properties of the scoring function imply a logarithmic bound on the number of halving steps (and, therefore, the number of oracle calls) required. Similar ideas can be used to prove the efficiency of the bisection search in other applications.

The goal is to locate the first breakpoint γ^∗ of Z that follows γ⁽⁰⁾. For this, we

(i) choose a sufficiently large search interval I,

(ii) repeatedly bisect I until it has at most one breakpoint of Z, but still contains γ^∗, and

(iii) locate γ^∗ within I.

The oracle for the problem has already been described. Its running time is O(nm) (the work to compute an optimum alignment using dynamic programming). It remains to explain the implementation of steps (i)–(iii). As usual, let n and m be the lengths of the sequences, n≤ m.

Consider step (i). Let A⁽¹⁾, A⁽²⁾, . . . denote the series of optimal alignments along the interval I. Let w_i, x_i, y_i denote the number of matches, mismatches, and indels in A⁽ⁱ⁾. Let ∆w_i = w_{i+1} − w_i, ∆x_i = x_{i+1} − x_i, ∆y_i = y_{i+1} − y_i, and let γ⁽ⁱ⁾ be the breakpoint where A⁽ⁱ⁾ and A⁽ⁱ⁺¹⁾ are co-optimal. Then,

$$\gamma^{(i)} = \frac{\Delta w_i - \Delta x_i}{\Delta y_i}. \tag{4.15}$$

By integrality, ∆y_i ≥ 1 and, by inequalities (4.9), w_i ≤ n. Hence, γ⁽ⁱ⁾ ≤ n for all i. Thus, our search can be restricted to the interval I = (γ⁽⁰⁾, n].
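For readers who want the intermediate step, Equation (4.15) can be recovered by equating the scores of the two co-optimal alignments at the breakpoint. The sketch below assumes the scoring scheme of the sample problem, score(A, γ) = w − x − γy (unit match reward, unit mismatch penalty, zero gap penalty).

```latex
\begin{align*}
w_i - x_i - \gamma^{(i)} y_i &= w_{i+1} - x_{i+1} - \gamma^{(i)} y_{i+1} \\
\gamma^{(i)} \,(y_{i+1} - y_i) &= (w_{i+1} - w_i) - (x_{i+1} - x_i) \\
\gamma^{(i)} &= \frac{\Delta w_i - \Delta x_i}{\Delta y_i}
\end{align*}
```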

Consider step (ii). For any two successive breakpoints γ⁽ⁱ⁾, γ⁽ⁱ⁺¹⁾ of Z,

$$\gamma^{(i+1)} - \gamma^{(i)} = \frac{(\Delta w_{i+1} - \Delta x_{i+1})\,\Delta y_i - (\Delta w_i - \Delta x_i)\,\Delta y_{i+1}}{\Delta y_{i+1}\,\Delta y_i}. \tag{4.16}$$

By (4.9), ∆y_i ≤ 2n. Since the left-hand side of Equation (4.16) must be positive and the various ∆ terms are integers, the numerator must be at least 1. Thus, γ⁽ⁱ⁺¹⁾ − γ⁽ⁱ⁾ ≥ 1/(4n²).

Therefore, in step (ii) we stop as soon as the length of the search interval drops below 1/(4n²).
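As a rough worked count, assume γ⁽⁰⁾ ≥ 0, so that the initial interval (γ⁽⁰⁾, n] has length at most n. The number k of halvings needed to push the interval length below the stopping threshold then satisfies:

```latex
\[
\frac{n}{2^{k}} < \frac{1}{4n^{2}}
\iff 2^{k} > 4n^{3}
\iff k > 2 + 3\log_2 n ,
\]
```

so O(log n) oracle calls suffice for step (ii).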

After step (ii) is complete, we know that γ^∗ must lie in the interval I = (γ⁽⁰⁾, γ⁽¹⁾], within which Z has at most one breakpoint. To locate γ^∗ within I (step (iii)), do as follows. First, compute the optimal alignment A⁽¹⁾ at γ⁽¹⁾. There are two cases:

Case 1: There is no breakpoint inside I, and therefore there are no breakpoints beyond γ⁽⁰⁾. This is true if either score(A⁽⁰⁾, γ⁽⁰⁾) = score(A⁽¹⁾, γ⁽⁰⁾) or score(A⁽⁰⁾, γ⁽¹⁾) = score(A⁽¹⁾, γ⁽¹⁾). In this case, return γ^∗ = +∞.

Case 2: There is exactly one breakpoint inside I, which must be the value γ^∗ being sought. In this case, return the value γ^∗ such that score(A⁽⁰⁾, γ^∗) = score(A⁽¹⁾, γ^∗).

The number of bisection steps is O(log n), each requiring O(nm) time. The ﬁnal step requires computing one optimum alignment plus O(1) additional work. The total time is therefore O(nm log n).
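Steps (i)–(iii) can be put together as in the following sketch. It is an illustration under stated assumptions rather than the handbook's code: `indel_sensitivity` and `score` are hypothetical names, `evaluate(gamma)` is an assumed callable returning the (w, x, y) counts of some optimum alignment at `gamma`, the score is taken to be w − x − γy, and Case 1 is detected only through the second of the two tests given in the text. Exact rational arithmetic avoids rounding problems in the equality tests.

```python
from fractions import Fraction
import math

def score(counts, gamma):
    """Score of an alignment with counts (w, x, y) at indel penalty gamma."""
    w, x, y = counts
    return w - x - gamma * y

def indel_sensitivity(evaluate, counts0, gamma0, n):
    """Return gamma*, the largest indel penalty up to which A0 stays optimal.
    evaluate(gamma) -> (w, x, y) of an optimum alignment (assumed helper);
    counts0 are the counts of A0; n is the shorter sequence length."""
    lo, hi = Fraction(gamma0), Fraction(n)       # step (i): I = (gamma0, n]
    min_gap = Fraction(1, 4 * n * n)             # breakpoints are >= this far apart
    while hi - lo >= min_gap:                    # step (ii): bisect
        mid = (lo + hi) / 2
        # Oracle call: mid <= gamma* iff A0 is still optimal at mid.
        if score(evaluate(mid), mid) == score(counts0, mid):
            lo = mid
        else:
            hi = mid
    counts1 = evaluate(hi)                       # step (iii)
    if score(counts1, hi) == score(counts0, hi):
        return math.inf                          # Case 1: no breakpoint found
    w0, x0, y0 = counts0
    w1, x1, y1 = counts1
    # Case 2: solve score(A0, g) == score(A1, g) for g.
    return Fraction((w1 - x1) - (w0 - x0), y1 - y0)
```

With a toy evaluator that maximizes over the three candidate count vectors (1, 0, 2), (0, 2, 0), and (0, 0, 4), starting from A⁽⁰⁾ = (1, 0, 2) at γ⁽⁰⁾ = 0 with n = 2, the search returns 3/2.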

While the details above are speciﬁc to sensitivity analysis, similar ideas can be used for other search problems, such as inverse optimal alignment (see 4.4.5 and [50]). Extensions to two-parameter problems are possible. In this case, instead of maintaining an interval, we maintain a polygonal region and, instead of splitting an interval through the middle, we split the current polygonal region by a line through its centroid (see [50]).

4.4.2 Megiddo’s Method

Megiddo’s method [36, 37] provides a precise relationship between the complexity of solving a parametric problem and the complexity of the problem’s ﬁxed-parameter version. Here we discuss the one-parameter version of Megiddo’s method; generalizations to any ﬁxed number of parameters are explained elsewhere [9, 3].

In what follows, λ denotes a scalar parameter. Let the value being sought be denoted by λ^∗, which is known to be greater than or equal to some value λ⁽⁰⁾. Like bisection search, Megiddo’s method generates a sequence of test values that are used to reduce the search interval with the aid of the oracle. The key difference is that the test values are generated by simulating the execution of an algorithm for the underlying fixed-parameter problem.

This algorithm must be of a certain kind.

DEFINITION 4.8 An algorithm is piecewise linear if each value it computes is a linear combination of the input parameters.

Any reasonable dynamic programming algorithm is piecewise linear. For example, consider the standard (table-based) dynamic programming algorithm for global alignment with zero gap penalty. We argue that each entry of the dynamic programming table is a linear combination of α, β and γ. This claim is trivially true for the first row and column of the table. Now assume the claim is true for every entry (i′, j′) such that (i′, j′) is lexicographically smaller than (i, j). Entry (i, j) is the maximum of three entries of the table, each with index (i′, j′) lexicographically smaller than (i, j), plus α, minus β, or minus γ. Therefore, entry (i, j) is itself a linear combination of α, β and γ.

Megiddo’s method simulates the execution of a piecewise linear algorithm B for the underlying fixed-parameter problem in order to find B’s execution path at λ^∗. Instead of manipulating numbers, the simulation manipulates linear functions of λ. This is possible because every value v manipulated by B can be represented symbolically as v(λ) = p_v + q_v λ.

Megiddo’s method maintains an interval I = [λ⁽⁰⁾, λ⁽¹⁾) that is updated so that the following invariant holds after i steps of B have been simulated:

λ^∗ ∈ I and the first i steps of B’s execution path are the same for every λ ∈ I. (4.17)

Initially, λ⁽¹⁾ = +∞. Suppose a certain number of B’s steps have been simulated. To simulate the next step, proceed as follows.

• If the step is an arithmetic operation, execute it symbolically to obtain a new linear function of λ. We make the mild assumption that symbolic execution of an operation only increases its running time by a constant factor.

• If the step is a comparison between two numbers u(λ) = p_u + q_u λ and v(λ) = p_v + q_v λ, compute ˆλ such that u(ˆλ) = v(ˆλ). If no such ˆλ exists, u and v are either identical or one is larger than the other for all λ. In either case, the outcome of the comparison can easily be determined and the step can be executed. If ˆλ exists, invoke the oracle to determine the position of ˆλ relative to λ^∗. The outcome of the call determines the outcome of the comparison between u and v at λ^∗. If ˆλ ≤ λ^∗, set λ⁽⁰⁾ = max(λ⁽⁰⁾, ˆλ). Otherwise, set λ⁽¹⁾ = min(λ⁽¹⁾, ˆλ).

At the end of the simulation, we have an interval I such that for any λ ∈ I, algorithm B always executes the same way. Therefore Z has no breakpoints in I and, hence, λ^∗ = λ⁽⁰⁾.
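The simulation can be illustrated on the indel penalty problem. The sketch below is a simplified reading of the method, not the authors' code: `megiddo_indel`, `larger`, and `line0` are hypothetical names, each table entry is stored as a pair (p, q) standing for p + qγ, and the oracle is an assumed callable answering whether a value is at most γ^∗. Two liberties are taken and should be treated as assumptions: the oracle is skipped when the critical value already lies outside the current interval (which has the same effect as the max/min updates in the text), and the comparison is resolved by probing a point strictly inside the interval. The final test for γ^∗ = +∞ is also an addition for the case where A⁽⁰⁾ never stops being optimal.

```python
from fractions import Fraction
import math

def megiddo_indel(s, t, oracle, gamma0, line0):
    """Simulate the global-alignment DP on linear functions p + q*gamma.
    oracle(g) -> True iff g <= gamma* (assumed callable);
    line0 = (w0 - x0, -y0) is the score function of A0."""
    lo, hi = Fraction(gamma0), math.inf          # interval I = [lo, hi)

    def larger(u, v):
        """Simulated comparison: the larger of u and v throughout (lo, hi)."""
        nonlocal lo, hi
        (pu, qu), (pv, qv) = u, v
        if qu != qv:
            g = Fraction(pv - pu, qu - qv)       # critical value: u(g) == v(g)
            if lo < g < hi:                      # outcome changes inside I
                if oracle(g):
                    lo = g
                else:
                    hi = g
        # u - v now has constant sign on (lo, hi); probe an interior point.
        probe = lo + 1 if hi == math.inf else (lo + hi) / 2
        return u if pu + qu * probe >= pv + qv * probe else v

    n, m = len(s), len(t)
    prev = [(0, -j) for j in range(m + 1)]       # first row: j indels
    for i in range(1, n + 1):
        cur = [(0, -i)]                          # first column: i indels
        for j in range(1, m + 1):
            p, q = prev[j - 1]
            diag = (p + 1, q) if s[i - 1] == t[j - 1] else (p - 1, q)
            p, q = prev[j]
            up = (p, q - 1)
            p, q = cur[j - 1]
            left = (p, q - 1)
            cur.append(larger(larger(diag, up), left))
        prev = cur
    # If Z coincides with A0's score on I, A0 never loses optimality.
    return math.inf if prev[m] == line0 else lo
```

On the toy pair "AC" and "CA" with A⁽⁰⁾ scoring 1 − 2γ and an oracle that knows γ^∗ = 3/2, the simulation makes two oracle calls (at 1/2 and 3/2) and returns 3/2.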

THEOREM 4.3 Let P be a parametric search problem that has an oracle that runs in worst-case time b. Suppose that there exists a piecewise linear algorithm to evaluate Z(λ) that executes t steps in the worst case. Then, P can be solved in time O(t · b).

For example, in the indel penalty sensitivity problem, t and b are both O(nm). Thus, by Theorem 4.3, Megiddo’s method yields an O(n²m²) algorithm for the problem, which is considerably slower than bisection search. This can be improved to O(nm polylog n) by simulating a parallel alignment algorithm instead of a sequential one (see [37, 29] for details), but this is at the expense of a considerably more involved procedure.

4.4.3 Newton’s Method

Newton’s classic zero-finding method can be adapted for ray shooting (Problem 1). Recall that the question is as follows: Given a parameter vector λ⁽⁰⁾ ∈ R^d, an optimum alignment A⁽⁰⁾ at λ⁽⁰⁾, and a ray ρ originating at λ⁽⁰⁾, find the last point λ^∗ on the ray such that A⁽⁰⁾ is optimal at λ^∗. Without loss of generality, assume that the problem has been re-parameterized so that λ is a scalar. Furthermore, we restrict the search for λ^∗ to a finite interval I = (λ⁽⁰⁾, λ⁽¹⁾]. This is not a limitation in practice, since λ⁽¹⁾ can always be chosen to be large enough (an example of this is, in fact, given in 4.4.1).

The key observation is that if λ^∗ < λ⁽¹⁾, then A⁽⁰⁾ is co-optimal with some other alignment at λ^∗. This leads to the following version of Newton’s method, adapted for piecewise linear functions.

Algorithm Newton

Input: An interval I = (λ⁽⁰⁾, λ⁽¹⁾], an optimum alignment A⁽⁰⁾ at λ⁽⁰⁾, and an evaluator for Z.

Output: The largest value λ^∗ ∈ I such that A⁽⁰⁾ is optimal at λ^∗.

1. Compute an optimal alignment A⁽¹⁾ at λ⁽¹⁾.

2. Set i = 1.

3. While score(A⁽⁰⁾, λ⁽ⁱ⁾) < score(A⁽ⁱ⁾, λ⁽ⁱ⁾), do the following steps:

(a) Let λ⁽ⁱ⁺¹⁾ be the λ-value such that score(A⁽⁰⁾, λ) = score(A⁽ⁱ⁾, λ), and compute an optimal alignment A⁽ⁱ⁺¹⁾ at λ⁽ⁱ⁺¹⁾.

(b) Set i = i + 1.

4. Return λ^∗ = λ⁽ⁱ⁾.
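Algorithm Newton translates almost line by line into code once each alignment is represented by its score function. In this hedged sketch, `newton_ray_shoot` is a hypothetical name, an alignment is a pair (p, q) standing for the score p + qλ, and `evaluate(lam)` is an assumed evaluator returning the pair of some optimum alignment at `lam`.

```python
from fractions import Fraction

def newton_ray_shoot(evaluate, line0, lam1):
    """Largest lam in (lam0, lam1] at which A0 is optimal.
    line0 = (p0, q0) is A0's score function p0 + q0*lam;
    evaluate(lam) -> (p, q) of an optimum alignment at lam (assumed helper)."""
    p0, q0 = line0
    lam = Fraction(lam1)
    p, q = evaluate(lam)                          # step 1: A(1) at lam(1)
    while p0 + q0 * lam < p + q * lam:            # step 3: A0 not optimal here
        # (a) move to the point where A0 and the current alignment tie
        lam = Fraction(p - p0) / Fraction(q0 - q)
        p, q = evaluate(lam)                      # optimum at the new point
    return lam                                    # step 4
```

For example, with candidate score functions 1 − 2λ, −2, and −4λ, starting from A⁽⁰⁾ = 1 − 2λ and λ⁽¹⁾ = 2, a single Newton step lands on 3/2, where A⁽⁰⁾ is co-optimal, and the loop stops.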

The execution of Newton is illustrated in Figure 4.5. The convexity and piecewise linearity of Z imply that the λ⁽ⁱ⁾ values form a decreasing sequence and that at all times λ⁽ⁱ⁾ ≥ λ^∗. At termination, we must have score(A⁽⁰⁾, λ⁽ⁱ⁾) = score(A⁽ⁱ⁾, λ⁽ⁱ⁾), which implies that λ⁽ⁱ⁾ = λ^∗. Also, the successive alignments computed by the algorithm must have distinct score functions. This leads to the following result.

THEOREM 4.4 Algorithm Newton correctly solves the ray shooting problem. The number of evaluations it requires is at most equal to the number of optimality regions of the maximization diagram.

For feature-based scoring schemes we can invoke Theorem 4.2: If each feature is in the same integer range of size N, then the number of evaluations of Z required by the algorithm is O(N^{d(d−1)/(d+1)}), where d is the number of features. For example, in the indel penalty sensitivity analysis problem, the O(n^{2/3}) bound on the number of regions (see 4.3.1) implies that only that many evaluations are needed in the worst case, resulting in an O(n^{5/3} m) bound on the search time.

4.4.4 Gradient Descent

Gradient descent (also called steepest descent) is a numerical method to obtain the minimum of a function within a given interval [40, 44]. The method is iterative, generating a sequence of points that converges to a minimum. If the current point is not a minimum, the algorithm chooses the next point by moving some distance in the direction opposite to the direction of the gradient. The intuition is that advancing in that direction should reduce the value of the function. More formally, assume that the function F : R^d → R to be minimized is continuously differentiable and that λ^(t) is the current (non-optimum) point. The next point in the sequence is given by

λ^(t+1)= λ^(t)− θ∇F (λ^(t)),

where θ is a scalar, which denotes the step distance, and ∇F(λ^(t)), the gradient of F at λ^(t), is the vector whose elements are the partial derivatives with respect to the d dimensions. That is,

$$\nabla F = \left( \frac{\partial F}{\partial \lambda_1}, \ldots, \frac{\partial F}{\partial \lambda_d} \right).$$

FIGURE 4.5: Newton’s method for ray shooting.

To apply this approach to piecewise linear functions, which are not everywhere differentiable, we need a new concept.

DEFINITION 4.9 Let F be a function F : R^d → R. A vector s ∈ R^d is a subgradient of F at λ⁽⁰⁾ ∈ R^d if for all λ ∈ R^d

F(λ) ≥ F(λ⁽⁰⁾) + s · (λ − λ⁽⁰⁾).

The collection of subgradients at λ⁽⁰⁾ is called the sub-differential at λ⁽⁰⁾, and is denoted by ∂F(λ⁽⁰⁾).

It can be shown that ∂F(λ⁽⁰⁾) ≠ ∅ at all points λ⁽⁰⁾ [40]. Subgradients play the role of gradients in searching for the minimizer of functions that are not everywhere differentiable. In particular, it can be shown that λ⁽⁰⁾ is optimal if and only if 0 ∈ ∂F(λ⁽⁰⁾) [40].

The subgradient algorithm is as follows:

Algorithm Subgradient

Input: A point λ⁽⁰⁾ ∈ R^d, a sequence θ⁽⁰⁾, θ⁽¹⁾, . . . of real numbers, and a procedure for computing a subgradient of function F at any point.

Output: A value λ^∗ at which F(λ) is minimum.

1. Compute a subgradient s⁽⁰⁾ ∈ ∂F(λ⁽⁰⁾).

2. Set t = 0.

3. While s^(t) ≠ 0, do the following steps:

(a) Let λ^(t+1) = λ^(t) − θ^(t) s^(t).

(b) Choose a subgradient s^(t+1) ∈ ∂F(λ^(t+1)).

(c) Set t = t + 1.

4. Return λ^∗ = λ^(t).

The procedure to compute a subgradient depends on the given problem; we explain how to find a subgradient for inverse alignment in the next subsection. In practice it may be difficult to determine if 0 ∈ ∂F(λ^(t)), since only one subgradient is computed at any point. One way to handle this is by terminating the algorithm if the function has not decreased by a certain amount after some number of iterations. We note that the convergence and running time of algorithm Subgradient depend on the choice of the θ⁽ⁱ⁾ sequence [40]. Although the algorithm is fast in practice, it is not in general possible to establish combinatorial bounds on its running time.
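A minimal sketch of algorithm Subgradient for a piecewise linear convex function follows. Everything here is illustrative: `subgradient_descent` and `max_of_linear` are hypothetical names, F is assumed to be given as a maximum of linear pieces a_k · λ + b_k (so the coefficient vector of any maximizing piece is a subgradient), and the stopping rule is a plain iteration cap that remembers the best point seen, which is a simpler stand-in for the "no sufficient decrease" rule mentioned above.

```python
def subgradient_descent(F, subgrad, lam0, steps, max_iter=200):
    """Iterate lam <- lam - theta_t * s_t and return the best point seen.
    steps(t) supplies the step size theta_t (an assumed callable)."""
    lam = list(lam0)
    best_lam, best_val = lam, F(lam)
    for t in range(max_iter):
        s = subgrad(lam)
        if all(c == 0 for c in s):               # 0 is a subgradient: optimal
            return lam, F(lam)
        theta = steps(t)
        lam = [l - theta * c for l, c in zip(lam, s)]
        val = F(lam)
        if val < best_val:                       # iterates need not be monotone
            best_lam, best_val = lam, val
    return best_lam, best_val

def max_of_linear(pieces):
    """Build F(lam) = max_k (a_k . lam + b_k) and a subgradient procedure."""
    def value(a, b, lam):
        return sum(ai * li for ai, li in zip(a, lam)) + b
    def F(lam):
        return max(value(a, b, lam) for a, b in pieces)
    def subgrad(lam):
        a, _ = max(pieces, key=lambda ab: value(ab[0], ab[1], lam))
        return list(a)
    return F, subgrad
```

As a usage example, `max_of_linear([((1,), 0), ((-1,), 0)])` encodes F(λ) = |λ|; starting from λ = 3 with step sizes 1/(t + 1), the iterates overshoot the minimizer and oscillate around it with shrinking amplitude, which is why the sketch tracks the best value rather than the last one.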

4.4.5 Parametric Search and Inverse Sequence Alignment

As defined in 4.2.4, the inverse optimal alignment (Problem 5) is a parametric search problem whose goal is to find a parameter vector λ^∗ that minimizes the function F(λ) defined as

F(λ) = Z(λ) − score(A⁽⁰⁾, λ),

where A⁽⁰⁾ is a reference alignment and, as usual, Z(λ) is the optimal score function. Since Z is piecewise linear and convex and score(A⁽⁰⁾, λ) is a linear function, F is also piecewise linear and convex. In fact, the decomposition of the parameter space induced by F is identical to that induced by Z. By convexity, the solution λ^∗ to the problem can always be chosen to be a vertex of the maximization diagram of Z. We discuss how to solve the inverse optimal alignment problem through bisection search, Megiddo’s method and gradient descent. Newton’s method can also be adapted to solve this problem [46].

For bisection search and Megiddo’s method, the key is to implement the oracle. In the one-parameter case, we can determine if a given ˆλ is greater than λ^∗ by computing the optimum alignment Â at ˆλ. Then ˆλ > λ^∗ if and only if score(Â, λ) − score(A⁽⁰⁾, λ) has a positive slope. Generalizations to more parameters are discussed in [3, 50].

To apply gradient descent, we need a means to compute a subgradient in ∂F(λ^(t)). This can be done as follows. Let A^(t) be an optimum alignment at λ^(t). The function score(A^(t), λ) − score(A⁽⁰⁾, λ) has the form

$$a_0 + \sum_{i=1}^{d} a_i \lambda_i.$$

Then, the vector (a_1, . . . , a_d) is a subgradient at λ^(t). Algorithm Subgradient of the previous section can now be used to obtain the inverse optimal value.

4.4.6 Ray Shooting and Sensitivity Analysis

The sensitivity analysis problem (4.2.4, Problem 2) can be solved by repeated ray shooting. Using the notation of 4.2.4, let λ⁽⁰⁾ be a given point in the parameter space and let A⁽⁰⁾ be an optimal alignment at that point. The problem is to find the maximal region around λ⁽⁰⁾ where A⁽⁰⁾ is optimal. In the one-parameter case, this translates into finding an interval around λ⁽⁰⁾, which can be done by shooting two rays from λ⁽⁰⁾. The first, in the negative direction, yields a point λ⁽¹⁾; the other, in the positive direction, yields a point λ⁽²⁾. The interval [λ⁽¹⁾, λ⁽²⁾] is the maximal region of R¹ within which alignment A⁽⁰⁾ is optimal. We describe how to extend the idea to two-parameter problems, where the optimality regions are polygons. Extensions to higher dimensions are possible (see the notes in 4.7).

FIGURE 4.6: Ray shooting to determine an edge of the region of optimality of alignment A⁽⁰⁾.

Let F be the region of optimality of A⁽⁰⁾. The first step is to choose an arbitrary ray ρ⁽⁰⁾ emanating from λ⁽⁰⁾ and shoot a ray to find the point λ⁽¹⁾ along ρ⁽⁰⁾ that intersects the boundary of F . Ray shooting is assumed to be adapted to yield an alignment A⁽¹⁾ that is co-optimal at λ⁽¹⁾. Let l be the line defined by the intersection of score(A⁽⁰⁾, λ) and score(A⁽¹⁾, λ). Then l contributes a segment e to the boundary of F . From λ⁽¹⁾ shoot two rays ρ⁽¹⁾ and ρ⁽²⁾ in opposite directions along l, to find the end points of edge e. The process is illustrated in Figure 4.6.

To find the remaining edges, repeat the above steps with other rays emanating from λ⁽⁰⁾. Each new ray must be in a direction away from any previously discovered edges of the boundary of F. To ensure this, one can use the data structure depicted in Figure 4.7.

The solid lines there represent the edges already discovered. Edges with common endpoints are joined as these endpoints are identiﬁed. Chains of known edges are linked to each other by dashed lines, indicating unknown regions of the boundary. Each successive ray generated by the algorithm goes between the endpoints of such a region. The discovery of a new edge ﬁlls in part of the missing information for that portion of the boundary.

There are two special cases. One occurs when A⁽⁰⁾ remains optimal along the entire length of the current ray. To handle this, it is convenient to assume that the parameter space is enclosed within a large rectangular interval. We can then use one of the boundary edges of the interval as a boundary of the region. The other case is when the ray ρ^(t) goes through a vertex of the optimality region. Then A⁽⁰⁾ remains optimal along at most one of the two rays along the boundary line, allowing us to recognize the vertex.

We need three ray searches to ﬁnd each edge of F . This gives us a bound of O(e) ray searches to determine a polygon of e edges.

FIGURE 4.7: Ray shooting data structure.

In document Handbook of Computational Molecular Biology (Page 118-126)