

7.3.5.4 The ProposeStructuresFromSeed Function

The ProposeStructuresFromSeed function proposes at least one well-supported structure consistent with the seed it is given. See Figure 39 and Listing 17.

The function performs this task in a relatively straightforward manner. Using the topology count estimation technique of Section 7.3.3, it tracks the number of topologies that an OSCAR solution could generate for a given set of pathways. It then tentatively adds another pathway from the full data set and asks whether the solution now generates fewer structures. If yes, the pathway is kept as part of the solution; if no, the pathway is discarded. In either case, all pathways in the full data set are tentatively added in this fashion, until the solution generates a single structure or no pathways remain.

[Figure 39 flow chart. Function ProposeStructuresFromSeed(Seed): Add all pathways from Seed into an OSCAR Solution. For each pathway P in the entire pathway set (not just AvailableSeeds): Does adding P cause Solution to generate fewer structures? If yes, add P to the Solution, then ask: does Solution produce exactly one structure? If yes, return. Otherwise, if pathways are left, P = next pathway. When done, return Solution.GenerateStructures(); the caller will add the structures to the ProposedStructures set.]

Figure 39: A flow chart for the function ProposeStructuresFromSeed(Seed).

Function ProposeStructuresFromSeed(Seed)
    Parameter Seed has type Set of Pathways;

    Returns Set of Structures;
{
    // The variable Solution is implemented by
    // OSCAR's "Solution" data type
    Variable Solution has type OSCAR Solution;

    // Execute "AddPathway P" for each pathway P in Seed
    Add all pathways from Seed to Solution;

    // If the seed cannot produce any structures, abort
    // the search: Return the empty set and exit the function.
    if (Solution is sterile)
        return { };

    // Search all pathways in order, looking for ones that help
    // Solution converge toward a single structure
    for each pathway P in AllPathways do {
        if (adding P to Solution makes progress) {
            Add P to Solution;

            // If we have converged on a single topology, return it
            // and exit the function.
            if (Solution generates exactly 1 structure)
                return Solution.GenerateStructures();
        }
    }

    // The algorithm did not converge on a single structure, so
    // return the set of all structures that Solution can generate.
    return Solution.GenerateStructures();
}

Listing 17: The ProposeStructuresFromSeed function returns a set of well-supported structures that are consistent with the given set of Seed pathways. It tentatively combines Seed with all pathways in IsoSolve's data set (not just the AvailableSeeds) to converge toward a small number of proposed topologies.
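The control flow of Listing 17 can be illustrated with a minimal Python sketch. It is not IsoSolve's implementation: the ToySolution class is a hypothetical stand-in for OSCAR's Solution type, in which each pathway is simply mapped to the set of candidate structures it is compatible with, and exact counts replace the topology count estimate of Section 7.3.3. The sketch also assumes that "makes progress" means strictly fewer structures while leaving at least one; the pathway and structure labels are invented for illustration.

```python
from typing import Dict, FrozenSet, Iterable, Sequence, Set

# Hypothetical stand-in for OSCAR: pathway -> structures it is compatible with.
Compat = Dict[str, FrozenSet[str]]


class ToySolution:
    """Toy replacement for OSCAR's Solution type (an assumption, not OSCAR)."""

    def __init__(self, universe: Iterable[str], compat: Compat) -> None:
        self._compat = compat
        self._structures: Set[str] = set(universe)

    def structures_if_added(self, pathway: str) -> Set[str]:
        # Structures that would remain if `pathway` were added.
        return self._structures & self._compat[pathway]

    def add_pathway(self, pathway: str) -> None:
        self._structures = self.structures_if_added(pathway)

    def generate_structures(self) -> Set[str]:
        return set(self._structures)


def propose_structures_from_seed(
    seed: Sequence[str],
    all_pathways: Sequence[str],
    universe: Iterable[str],
    compat: Compat,
) -> Set[str]:
    solution = ToySolution(universe, compat)
    for p in seed:
        solution.add_pathway(p)

    # A sterile seed cannot produce any structure: return the empty set.
    if not solution.generate_structures():
        return set()

    # Tentatively add every pathway in order, keeping those that help.
    for p in all_pathways:
        remaining = solution.structures_if_added(p)
        # Assumption: progress = strictly fewer structures, but not zero.
        if 0 < len(remaining) < len(solution.generate_structures()):
            solution.add_pathway(p)
            if len(remaining) == 1:
                return solution.generate_structures()

    # No convergence: return everything the solution can still generate.
    return solution.generate_structures()


# Hypothetical data: pathways p1..p4 over candidate structures A, B, C.
example_compat: Compat = {
    "p1": frozenset("ABC"),
    "p2": frozenset("AB"),
    "p3": frozenset("BC"),
    "p4": frozenset("A"),
}
result = propose_structures_from_seed(
    ["p1"], ["p1", "p2", "p3", "p4"], "ABC", example_compat
)
print(result)  # {'B'}
```

In this toy run, p2 narrows the candidates to A and B, and p3 then leaves only B, so the function returns before p4 is ever considered.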

A few clarifications are in order here.

First, because ProposeStructuresFromSeed tentatively adds every available pathway, there is no bias against selecting pathways that are consistent with some previously-proposed structure. We know at this point that the seed is guaranteed to lead to new structures, and so we should not restrict the search to some subset of the pathways, or we may fail to generate a valid structure. Just because pathway P1 is part of structure S1 doesn't mean it can't also be part of structure S2. For example, recall from the GM1a/GM1b example (Table 26 on page 108) that some pathways can be compatible with multiple isomers. In that case, both 1273.6_898.4 and 1273.6_989.4_676.3 are compatible with GM1a and GM1b. When searching for GM1b, it would be a mistake to ignore pathway 1273.6_898.4 just because the pathway is consistent with GM1a.
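The point can be made concrete with a small Python sketch. The compatibility data below is taken from the GM1a/GM1b example above; the helper usable_pathways and its skip_reused flag are hypothetical and exist only to contrast the two policies.

```python
from typing import Dict, List, Set

# Compatibility as stated in the GM1a/GM1b example in the text.
COMPATIBLE: Dict[str, Set[str]] = {
    "1273.6_898.4": {"GM1a", "GM1b"},
    "1273.6_989.4_676.3": {"GM1a", "GM1b"},
}


def usable_pathways(already_proposed: Set[str], skip_reused: bool) -> List[str]:
    """Pathways a search may tentatively add (hypothetical helper).

    With skip_reused=True, any pathway consistent with an already
    proposed structure is excluded; this is the biased policy that
    ProposeStructuresFromSeed deliberately avoids.
    """
    if not skip_reused:
        return sorted(COMPATIBLE)
    return sorted(
        p for p, structures in COMPATIBLE.items()
        if not (structures & already_proposed)
    )


# After GM1a has been proposed, the biased policy leaves no evidence for GM1b.
biased = usable_pathways({"GM1a"}, skip_reused=True)
unbiased = usable_pathways({"GM1a"}, skip_reused=False)
print(biased)    # []
print(unbiased)  # ['1273.6_898.4', '1273.6_989.4_676.3']
```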

Second, it is possible that an isomer is present but there are insufficient data for ProposeStructuresFromSeed to find a combination of pathways that yields exactly and only that structure. In this case, the OSCAR solution manipulated by the function would fail to converge on a single structure before running out of pathways to tentatively add. Here, ProposeStructuresFromSeed will return multiple structures, namely all of the structures produced by the solution. These structures are marked with a count of structures produced in this batch, and so the analyst can easily identify the structures that were produced as a "bunch" due to insufficient data. A skilled analyst could then determine which further spectra to collect to resolve the ambiguity.
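One way to picture the batch marking is the following Python sketch. The ProposedStructure record and the tag_batch and ambiguous helpers are hypothetical names introduced here for illustration; the text does not specify how IsoSolve stores the count.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ProposedStructure:
    topology: str    # hypothetical structure identifier
    batch_size: int  # how many structures were returned together


def tag_batch(structures: Iterable[str]) -> List[ProposedStructure]:
    """Mark every structure with the size of the batch it came from."""
    batch = sorted(structures)
    return [ProposedStructure(s, len(batch)) for s in batch]


def ambiguous(proposals: Iterable[ProposedStructure]) -> List[ProposedStructure]:
    """Structures produced as a "bunch", i.e. without convergence."""
    return [p for p in proposals if p.batch_size > 1]


converged = tag_batch({"S1"})
bunch = tag_batch({"S2", "S3"})
print(ambiguous(converged + bunch))
```

A batch size of 1 indicates that the solution converged on a single structure; any larger value flags the structures an analyst may want to disambiguate with further spectra.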

7.4. Limitations/Future Work

From the above discussion, we can see a number of areas where IsoSolve can be improved in future efforts:

1. ProposeStructuresFromSeed could be modified to perform a more intelligent selection of the next pathway to tentatively add to the OSCAR solution. Currently the selection is made based on the estimated information content of the pathways (as estimated by the technique described in Section 7.3.3 on page 116), but other selection strategies should be considered. For example, selecting a complementary ion may be profitable, as we saw when sequencing the fetuin glycan m/z 3618.8 (Section 5.5 on page 84).

2. ProposeStructuresFromSeed uses OSCAR to calculate an upper bound on the number of structures the solution might generate, but that upper bound can sometimes be quite a bit higher than the actual number. This could lead IsoSolve to discard a pathway that actually reduced the number of candidate structures. Better techniques for quickly estimating topology counts would be beneficial here.

3. The scoring algorithm used to rank proposed structures suffers slightly in the presence of isobars, as discussed in Section 7.3.4. One improvement would be to avoid penalizing scores if multiple fragments on one spectrum are inconsistent with each other. That is, inconsistent pathways wouldn't all count toward AvailablePathways; rather, only the largest subset of consistent pathways would be counted.
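The improvement proposed in item 3 could be sketched in Python as follows. This is a brute-force illustration only, under the assumption that consistency can be tested pairwise; the consistent predicate and the pathway labels are hypothetical, and the search is exponential, so it suits only the handful of pathways expected on one spectrum.

```python
from itertools import combinations
from typing import Callable, Sequence, Tuple


def largest_consistent_subset(
    pathways: Sequence[str],
    consistent: Callable[[str, str], bool],
) -> Tuple[str, ...]:
    """Return a largest subset whose members are pairwise consistent.

    Tries subset sizes from largest to smallest and returns the first
    subset in which every pair passes the `consistent` predicate.
    """
    for size in range(len(pathways), 0, -1):
        for subset in combinations(pathways, size):
            if all(consistent(a, b) for a, b in combinations(subset, 2)):
                return subset
    return ()


# Hypothetical example: pathways a and b contradict each other.
conflicts = {frozenset(("a", "b"))}


def no_conflict(x: str, y: str) -> bool:
    return frozenset((x, y)) not in conflicts


kept = largest_consistent_subset(["a", "b", "c"], no_conflict)
print(kept, len(kept))  # ('a', 'c') 2
```

Under this scheme, only len(kept) pathways would count toward AvailablePathways, rather than all three.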
