
15.3 Experimental data

15.3.5 Different biological state

The msmeg data set also provides a typical alignment case, but with a different emphasis than the ecoli data set: it is a test of biological variation. It contains LC-LC-MS/MS measurements of the M. smegmatis proteome extracted from cells in three different growth phases. Digested protein extract of the early, the middle, and the stationary phase was measured in four different fractions. The pre-processing of the 12 resulting raw maps and the extraction of peptide features were the same as for the ecoli data set and are described in detail in Section 15.3.2. We again compare the six LC-MS feature map alignment algorithms OpenMSMA (implemented in OpenMS), SpecArrayMA (implemented in SpecArray), msInspectMA (implemented in msInspect), XCMSMA (implemented in XCMS), MZMineMA (implemented in MZMine), and XAlign with respect to the feature maps resulting


map represents the M. smegmatis proteome in a different cell growth phase. The alignment of the msmeg data set is a more difficult problem than that of the ecoli data set, since the proteomes of cells in different growth phases may share only a small fraction of common proteins. We evaluate the four consensus maps determined by each alignment algorithm against ground truth consensus maps that are based on reliable peptide identifications. In Section 15.3.2, we described the procedure for generating a ground truth consensus map from the feature maps to be aligned and the corresponding SEQUEST annotations for each fraction. The size of the resulting ground truth consensus map for each fraction is given in Table 15.5.

We computed recall and precision values of each alignment algorithm based on the four determined consensus maps and the corresponding ground truth consensus maps. The precision values are given only for the sake of completeness, since they do not have the same explanatory power as the recall values. As already mentioned in Section 15.3.3, the precision values are underestimated, because true positives are known only for annotated features; the correspondence of the remaining unlabeled features is not known and therefore cannot be evaluated. For most alignment algorithms the user can define the maximal deviation of feature positions within a consensus feature, given by ∆RT and ∆m/z. We optimized these parameters for each tool and set

• OpenMSMA: ∆RT := 200 s and ∆m/z := 2 Th.

• msInspectMA: ∆RT := 300 (in this case the number of scans) and ∆m/z := 1.5 Th.

• XAlign: ∆RT := 180 s and ∆m/z := 2 Th.

• MZMineMA: ∆RT := 120 s and ∆m/z := 1.5 Th.

• XCMSMA: ∆RT := 40 s (given by the parameter bw) and ∆m/z := 1.5 Th.
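To illustrate what such a pair of tolerances constrains, the following minimal sketch checks whether a group of features could form one consensus feature. It assumes, purely for illustration, that "maximal deviation" means the spread (maximum minus minimum) of retention time and m/z within the group; each tool may interpret its parameters differently. The names `Feature` and `within_tolerance` and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    rt: float  # retention time in seconds
    mz: float  # mass-to-charge ratio in Th


def within_tolerance(features, delta_rt, delta_mz):
    """Return True if the RT and m/z spread of a non-empty group stays within both limits.

    Assumption: the deviation is measured as max - min over the group.
    """
    rts = [f.rt for f in features]
    mzs = [f.mz for f in features]
    return (max(rts) - min(rts) <= delta_rt) and (max(mzs) - min(mzs) <= delta_mz)


# One hypothetical feature per growth phase (made-up positions).
group = [Feature(1200.0, 500.3), Feature(1350.0, 501.1), Feature(1290.0, 500.9)]

# RT spread is 150 s, m/z spread is about 0.8 Th.
ok_wide = within_tolerance(group, delta_rt=200.0, delta_mz=2.0)    # 200 s / 2 Th setting
ok_narrow = within_tolerance(group, delta_rt=120.0, delta_mz=1.5)  # 120 s / 1.5 Th setting
print(ok_wide, ok_narrow)
```

Under this reading, the same three features are grouped with the wider setting but split with the narrower one, which is why the tolerances were tuned separately for each tool.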

The alignment algorithm implemented in SpecArray does not provide any parameters that may be defined by the user. Table 15.8 shows the recall and precision values of the six algorithms for the four feature map alignments in the msmeg data set.
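The evaluation can be sketched as follows. The exact definitions are given in Section 15.3.3, so this block is an assumption: it treats every consensus feature as a set of feature identifiers, expands it into unordered pairwise assignments, and counts a pair as a true positive if it also occurs in the ground truth. The function names and the toy identifiers are hypothetical.

```python
from itertools import combinations


def pairwise_assignments(consensus_map):
    """Expand each consensus feature (a set of feature ids) into unordered pairs."""
    pairs = set()
    for consensus_feature in consensus_map:
        for a, b in combinations(sorted(consensus_feature), 2):
            pairs.add((a, b))
    return pairs


def recall_precision(tool_map, truth_map):
    """Recall and precision over pairwise assignments (assumed definition)."""
    tool_pairs = pairwise_assignments(tool_map)
    truth_pairs = pairwise_assignments(truth_map)
    true_positives = len(tool_pairs & truth_pairs)
    recall = true_positives / len(truth_pairs) if truth_pairs else 0.0
    precision = true_positives / len(tool_pairs) if tool_pairs else 0.0
    return recall, precision


# Ground truth covers only annotated features (A1, B1, C1, A2, B2).
truth = [{"A1", "B1", "C1"}, {"A2", "B2"}]
# The tool also groups unannotated features (A7, B7, C7, B9).
tool = [{"A1", "B1"}, {"C1", "B9"}, {"A2", "B2"}, {"A7", "B7", "C7"}]

recall, precision = recall_precision(tool, truth)
print(f"recall={recall:.2f} precision={precision:.2f}")
```

In this toy case the group of unannotated features A7, B7, C7 may well be correct, but it cannot be confirmed and therefore lowers the precision. This is the underestimation effect described above.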

Our alignment algorithm again achieves high recall values. The percentage of correctly discovered pairwise feature assignments lies between 60 % and 79 % for fractions 20, 40, and 100 and is higher than the recall values of the other tools. However, OpenMSMA failed to align the three feature maps of fraction 80 and discovered only 12 % of the expected pairwise feature assignments. The alignment of the three feature maps of fraction 80 poses a hard problem for all tools and was not solved satisfactorily by any of them. SpecArray achieved the highest recall value for fraction 80, but discovered only 49 % of the pairwise feature assignments given by the ground truth consensus map. For the other fractions, SpecArray did not achieve a recall higher than 0.54. XAlign, MZMineMA, and XCMSMA are, as in the ecoli data set, ranked be-

Table 15.8: Recall and precision values for the six algorithms aligning the feature maps of the msmeg data set.

                         OpenMSMA  SpecArrayMA  msInspectMA  MZMineMA  XCMSMA  XAlign
fraction 20   recall     0.79      0.23         0.30         0.68      0.70    0.72
              precision  0.16      0.01         0.02         0.15      0.01    0.16
fraction 40   recall     0.60      0.49         0.09         0.56      0.47    0.44
              precision  0.08      0.04         0.01         0.10      0.09    0.06
fraction 80   recall     0.12      0.49         0.31         0.25      0.25    0.28
              precision  0.06      0.04         0.02         0.06      0.06    0.05
fraction 100  recall     0.76      0.54         0.39         0.59      0.57    0.71
              precision  0.09      0.05         0.04         0.09      0.09    0.09

0.47–0.70, respectively. The three algorithms also failed to align fraction 80. The alignment algorithm of msInspect did not exceed a recall of 0.39.
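The recall figures quoted in the text can be checked against Table 15.8 with a few lines of Python. The values below are transcribed from the table, assuming that its columns follow the order of the header row. The helper names are hypothetical.

```python
# Recall values per tool and fraction, transcribed from Table 15.8.
recall = {
    "OpenMSMA":    {20: 0.79, 40: 0.60, 80: 0.12, 100: 0.76},
    "SpecArrayMA": {20: 0.23, 40: 0.49, 80: 0.49, 100: 0.54},
    "msInspectMA": {20: 0.30, 40: 0.09, 80: 0.31, 100: 0.39},
    "MZMineMA":    {20: 0.68, 40: 0.56, 80: 0.25, 100: 0.59},
    "XCMSMA":      {20: 0.70, 40: 0.47, 80: 0.25, 100: 0.57},
    "XAlign":      {20: 0.72, 40: 0.44, 80: 0.28, 100: 0.71},
}


def recall_range(tool, skip=(80,)):
    """Smallest and largest recall of a tool, ignoring the fractions in `skip`."""
    values = [v for fraction, v in recall[tool].items() if fraction not in skip]
    return min(values), max(values)


def best_tool(fraction):
    """Tool with the highest recall for the given fraction."""
    return max(recall, key=lambda tool: recall[tool][fraction])


ranges = {tool: recall_range(tool) for tool in recall}
winners = {fraction: best_tool(fraction) for fraction in (20, 40, 80, 100)}

for tool, (low, high) in ranges.items():
    print(f"{tool:12s} recall without fraction 80: {low:.2f} to {high:.2f}")
print(winners)
```

Excluding fraction 80 separates the general ranking of the tools from the one alignment that none of them solved well.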

The OpenMS alignment algorithm again outperforms the other tools not only in recall but also in runtime. The runtime measurements are subject to the caveats described on page 156. Table 15.9 lists the runtimes of the six alignment algorithms on the msmeg data set. Manual wall-clock measurements for XAlign indicated a runtime of the same order of magnitude as that of the other algorithms.

Table 15.9: Runtimes (in s) of the six alignment algorithms on the msmeg data set. For details, see Table 15.7.

              OpenMSMA  SpecArrayMA  msInspectMA  MZMineMA^a  XCMSMA^b  XAlign^c
fraction 20   3.74      282.38       9.94         55          11.27     n/a
fraction 40   2.31      37.39        9.52         3           9.43      n/a
fraction 80   1.12      28.21        7.99         2           6.43      n/a
fraction 100  1.09      66.58        8.15         2           2.89      n/a

Our approach is consistently faster than the other tools and took only 1.09 to 3.74 s to compute a consensus map. MZMineMA needed 55 s to compute the consensus map of fraction 20, but only 2 to 3 s for each of the other alignments. XCMSMA, which achieved recall values similar to those of MZMineMA, took 2.89 to 9.43 s for the alignment of fractions 40, 80, and 100, and 11.27 s for fraction 20.

The msmeg data set also represents a typical, but more complex, alignment scenario than the ecoli data set. We again demonstrated the applicability of our algorithm to real-world data, where its precise and quick alignments outperform the results of the other alignment approaches. In the following section, we will assess the robustness of the six alignment methods on simulated data.