
8.5 Optimization of all peak parameters

The peaks computed so far typically yield a reasonable approximation of the true signal, especially for well-resolved, clearly separated peaks. To further improve accuracy, we perform an additional (optional) optimization step over all picked peaks in a spectrum. In the basic peak picker (see Figure 8.2, lines 1-25), each peak is fitted independently of the others; only during the separation of overlapping peaks do we fit the sum of convolved peaks to the experimental signal. In this step, we optimize the parameters of all picked peaks in the spectrum by minimizing the sum of squared residuals between the determined peak functions and the original raw signal. Our peak model $M$ is now given by all $k$ peak functions $p_i$ picked in the spectrum,

$$M(a, x) := \sum_{i=1}^{k} p_{h_i, \lambda_i^l, \lambda_i^r, \hat{p}_i}(x) \tag{8.27}$$

where $p_i$ can represent either an L or an S peak function (compare Section 8.3). Hence, the model $M$ depends on $4k$ parameters, and the parameter vector is defined by $a := (h_1, \lambda_1^l, \lambda_1^r, \hat{p}_1, \ldots, h_k, \lambda_k^l, \lambda_k^r, \hat{p}_k)^T \in \mathbb{R}^{4k}$. Since the number of peaks in a spectrum, and thereby the number of parameters, can be very high, we decompose the optimization problem into smaller subproblems. After sorting all peak functions by position, we search linearly for connected peaks, which are then fitted simultaneously. Two peaks are connected if the distance between their positions is smaller than a certain threshold.
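The decomposition into groups of connected peaks can be sketched as a single linear pass over the sorted positions. This is a minimal illustration, not the OpenMS implementation: the function name `group_connected_peaks`, the threshold of 0.6 Th, and the extra peak at 820.0 Th are assumptions made for the example; the remaining positions are those reported for Figure 8.11.

```python
def group_connected_peaks(positions, max_distance):
    """Split peak positions into groups of connected peaks.

    After sorting, two neighboring peaks count as connected when their
    positions differ by less than max_distance (hypothetical threshold).
    """
    groups = []
    for position in sorted(positions):
        # Extend the current group if the previous peak is close enough.
        if groups and position - groups[-1][-1] < max_distance:
            groups[-1].append(position)
        else:
            groups.append([position])
    return groups


# Positions from Figure 8.11 plus one made-up, distant peak at 820.0 Th.
positions = [811.231, 810.304, 812.025, 810.718, 811.722, 820.0]
groups = group_connected_peaks(positions, max_distance=0.6)
for group in groups:
    print(len(group), group)
```

Each resulting group would then be handed to the optimizer as one subproblem with four parameters per peak.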

We use the Levenberg-Marquardt algorithm to find a local minimizer $a^*$ for the function defined in Equation 8.25. For a group of $k$ connected peaks $p_i$, the initial parameter vector $a^0 := (h_1^0, \lambda_1^{l,0}, \lambda_1^{r,0}, \hat{p}_1^0, \ldots, h_k^0, \lambda_k^{l,0}, \lambda_k^{r,0}, \hat{p}_k^0)^T \in \mathbb{R}^{4k}$ is given by the four peak parameters of each $p_i$ ($i = 1, \ldots, k$).
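To make the objective concrete, the following sketch evaluates the model $M(a, x)$ and the sum of squared residuals for a flat parameter vector with four entries per peak. It assumes an asymmetric Lorentzian of the form $h / (1 + \lambda^2 (x - \hat{p})^2)$, with $\lambda^l$ used left of the position and $\lambda^r$ right of it; this form, the function names, and all numeric values are assumptions for illustration, and the actual minimization (Levenberg-Marquardt) is not shown.

```python
def lorentzian_peak(x, height, lambda_left, lambda_right, position):
    """Asymmetric Lorentzian (assumed form) with separate left/right widths."""
    lam = lambda_left if x <= position else lambda_right
    return height / (1.0 + (lam * (x - position)) ** 2)


def model(a, x):
    """M(a, x): sum of k peak functions; a stores (h, lambda_l, lambda_r, p) per peak."""
    return sum(lorentzian_peak(x, *a[i:i + 4]) for i in range(0, len(a), 4))


def sum_of_squared_residuals(a, xs, ys):
    """Objective that the optimizer would minimize over a."""
    return sum((model(a, x) - y) ** 2 for x, y in zip(xs, ys))


# Two hypothetical connected peaks, four parameters each.
a0 = [100.0, 8.0, 6.0, 810.3, 60.0, 8.0, 6.0, 810.8]
xs = [810.0 + 0.05 * i for i in range(25)]
ys = [model(a0, x) for x in xs]  # synthetic, noise-free stand-in for raw data

perturbed = a0[:]
perturbed[3] += 0.02  # shift the first peak position slightly

print(sum_of_squared_residuals(a0, xs, ys))         # 0.0 for the generating parameters
print(sum_of_squared_residuals(perturbed, xs, ys))  # strictly positive
```

A least-squares solver would start from the initial vector and adjust all $4k$ entries of a group at once, which is why overlapping neighbors are fitted together.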

We again use the additional parameters provided by the GSL to introduce penalty terms for height and width values that fall below certain thresholds. Furthermore, we penalize large changes of position parameters during an iteration.
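One way such penalty terms could look is sketched below. The quadratic form, the weights, the thresholds, and the function names are all assumptions chosen for illustration; the text does not specify how the penalties are actually defined or how they are passed to the GSL.

```python
def shortfall(value, threshold):
    """Squared amount by which value falls below threshold (0.0 otherwise)."""
    return max(0.0, threshold - value) ** 2


def penalty(a, a_previous, min_height, min_width,
            weight_bounds=1.0e3, weight_position=1.0e2):
    """Hypothetical penalty added to the sum of squared residuals.

    a and a_previous store (h, lambda_l, lambda_r, p) per peak; a_previous
    is the parameter vector from the preceding iteration.
    """
    total = 0.0
    for i in range(0, len(a), 4):
        height, lambda_left, lambda_right, position = a[i:i + 4]
        # Penalize heights and width parameters below their thresholds.
        total += weight_bounds * shortfall(height, min_height)
        total += weight_bounds * (shortfall(lambda_left, min_width)
                                  + shortfall(lambda_right, min_width))
        # Penalize large position changes relative to the last iteration.
        total += weight_position * (position - a_previous[i + 3]) ** 2
    return total


previous = [100.0, 8.0, 6.0, 810.0]
too_low = [5.0, 8.0, 6.0, 810.0]      # height below the threshold of 10
shifted = [100.0, 8.0, 6.0, 810.5]    # position moved by 0.5

print(penalty(previous, previous, min_height=10.0, min_width=1.0))
print(penalty(too_low, previous, min_height=10.0, min_width=1.0))
print(penalty(shifted, previous, min_height=10.0, min_width=1.0))
```

Soft penalties of this kind keep the objective differentiable where the thresholds are satisfied, which suits a gradient-based method such as Levenberg-Marquardt.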

The dotted line in Figure 8.11 shows the model function $M(a, x)$ evaluated at the local minimizer $a^*$ resulting from the optimization step.

[Figure 8.11 plot: intensity (×10^5) versus m/z over the range 809 to 814. Legend: raw data points; peak functions 1a, 1b, 1c, 2, and 3; sum of peak functions.]

Figure 8.11: Optimization of all peak parameters: Charge-two isotopic pattern of bombesin and the five peaks resulting from the basic peak picker plus the separation method for overlapping peaks and the optimization of all peak parameters. Note the slight differences from Figure 8.10: the peak positions are 810.304 Th, 810.718 Th, 811.231 Th, 811.722 Th, and 812.025 Th. The dotted line shows the sum of the five peak functions.

8.5.1 The PeakPicker TOPP tool

We provide a tool for “The OpenMS Proteomics Pipeline (TOPP)” [Kohlbacher et al., 2007], called PeakPicker, that extracts peaks from mass spectra and implements the algorithm proposed in Chapter 8. The input and output format of spectra is mzData (see

Figure 8.12: Peak picking with the PeakPicker tool.

All parameters are provided by an XML-based control file. The usage of the tool is described in the TOPP documentation and an example is given in the TOPP tutorial.

The PeakPicker application, like all other TOPP tools, is based on the OpenMS library. Figure 8.13 shows the class diagram of our peak picking classes in UML format. The classes are described in the OpenMS documentation, and examples of use can be found in the OpenMS tutorial.

Experiments

The qualitative assessment of a peak picking scheme is a non-trivial problem, and a straightforward, general approach to it is still missing. Obviously, an algorithm that solves the problem should compute the peak’s centroid, height, and area as accurately as possible while featuring high sensitivity and specificity. To determine the accuracy of, e.g., a peak’s centroid, the correct mass value is needed, and thus peak picking algorithms are typically tested against a spectrum of known composition, e.g., a standard peptide mixture or the tryptic digest of a certain protein. Comparing the features of the peaks found in the spectrum with the theoretical values gives a measure of the algorithm’s capabilities, typically expressed as the average absolute and relative deviation (measured in ppm). Unfortunately, these results are heavily affected by the quality of the experimental data and by additional issues such as the calibration. Consequently, peak picking algorithms are typically tested against particularly well-resolved spectra, and internal calibration methods are employed. This usually results in high mass measurement accuracy, but the quality of the peak picking algorithms cannot be judged independently of the quality of the calibration scheme. From a user’s perspective, on the other hand, obtaining similarly well-resolved spectra is often infeasible, and internal calibration is not always an option. Thus, we have decided to demonstrate the capabilities of our approach both on LC-MS data measured by an ion trap with low resolution, containing severely overlapping isotope patterns, and on highly resolved MALDI-TOF spectra.

As described in Chapter 7, most peak picking algorithms are designed for a specific data type, and they are often not freely available. Li et al. [2005] and Bellew et al. [2006] propose algorithms for the determination of mass spectral peaks, but those methods are closely connected with their 2D feature detection procedures. Thus, they are not appropriate for comparison with our peak picking approach.

Hence, we decided to use the vendor-supplied software on the same spectra in both experiments to provide a fair means of comparison.
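The deviation measures mentioned above (average absolute deviation and average relative deviation in ppm) can be computed as in the following sketch. The function names and the measured/theoretical m/z pairs are made up for illustration and are not results from our experiments.

```python
def ppm_error(measured, theoretical):
    """Signed relative deviation of a measured m/z value in parts per million."""
    return (measured - theoretical) / theoretical * 1.0e6


def average_deviations(pairs):
    """Return (mean absolute deviation, mean absolute relative deviation in ppm).

    pairs is a list of (measured, theoretical) m/z values.
    """
    n = len(pairs)
    mean_abs = sum(abs(m - t) for m, t in pairs) / n
    mean_ppm = sum(abs(ppm_error(m, t)) for m, t in pairs) / n
    return mean_abs, mean_ppm


# Hypothetical (measured, theoretical) centroid pairs.
pairs = [(1000.001, 1000.0), (1999.996, 2000.0)]
mean_abs, mean_ppm = average_deviations(pairs)
print(mean_abs, mean_ppm)
```

Because the ppm value divides by the theoretical mass, the same absolute error weighs less at higher m/z, which is why both measures are usually reported together.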

9.1 Sample preparation and MS analysis

Peptide mix ESI: We used a peptide mix (peptide standards mix #P2693 from Sigma Aldrich) of nine known peptides (bradykinin (F), bradykinin fragment 1-5 (B), substance P (H), [Arg8]-vasopressin (E), luteinizing hormone releasing hormone, bombesin (G), leucine enkephalin (A), methionine enkephalin (C), oxytocin (D)). Sample concentration was 0.25 ng/µl, injection volume 1.0 µl. LC separation was performed on a capillary column (monolithic polystyrene/divinylbenzene phase, 60 mm × 0.3 mm) with 0.05% trifluoroacetic acid (TFA) in water (eluent A) and 0.05% TFA in acetonitrile (eluent B). Separation was achieved at a flow of 2.0 µl/min at 50 °C with an isocratic gradient of 0–25% eluent B over 7.5 min. Eluting peptides were detected in a quadrupole ion trap mass spectrometer (Esquire HCT from Bruker, Bremen, Germany) equipped with an electrospray ion source in full scan mode (m/z 500-1500).

Peptide mix MALDI: The MALDI matrix solution was prepared as a CHCA thin layer by ultrasonicating an excess of CHCA in 90% tetrahydrofuran, 0.1% trifluoroacetic acid (TFA). A PolyK-mixture with 6.4 mg/ml polylysine in 1% TFA was deposited onto the matrix and dried. Afterward, the samples were washed by depositing 2 µl of 1% TFA and 1 mM n-octylglucopyranoside, which was immediately aspirated. Peptide samples (with 19 known peptides: bradykinin (A), angiotensin II (B) and I (C), substance P-methylester (D), substance P-methylester (ox.) (E), fibrinopeptide A (F), Glu1-fibrinopeptide A (G), bombesin (H), bombesin (ox.) (I), renin substrate (human) (J), ACTH clip 1-17 (K), ACTH clip 1-17 (ox.) (L), ACTH clip 18-39 (M), ACTH clip 3-24 (N), ACTH clip 3-24 (ox.) (O), ACTH clip 1-24 (P), ACTH clip 1-24 (ox.) (Q), somatostatin (R), and Insulin B chain (ox.) (S)) were prepared using the CHCA surface affinity preparation previously described in [Gobom et al., 2001]. Mass analysis of positively charged peptide ions was performed on an Ultraflex II LIFT MALDI-TOF/TOF mass spectrometer (Bruker Daltonics, Bremen, Germany) equipped with a SmartBeam solid-state laser. Positively charged ions in the m/z range 500-4500 Da were analyzed automatically in the reflector mode. Altogether, 100 spectra were recorded, each of which was the sum of 800 single-shot spectra acquired at two different locations of each MALDI sample.

9.2 Mass accuracy and separation capability in low resolved