
Many proteomic studies require the relative or absolute amounts of the proteins present in a biological sample. Applications range from additive series in analytical chemistry [Gröpl et al., 2005], through the analysis of time series in expression experiments [Bisle et al., 2006; Niittylä et al., 2007], to clinical diagnostics [Vissers et al., 2007], where the goal is to find statistically significant markers that describe certain disease states. Although the relationship between the amount of analyte present in a sample and the measured signal intensity is complex and incompletely understood, mass spectral peak intensities of peptide ions have been shown to correlate well with protein abundances in complex samples [Bondarenko et al., 2002; Wang et al., 2003; Schulz-Trieglaff et al., 2007; Old et al., 2005]. Moreover, comparing the signals of the same peptide under different conditions can give a rough estimate of relative protein abundance between multiple proteomes [Ong and Mann, 2005].

Mass spectrometry allows for the determination of two different kinds of quantitative information. Absolute quantification experiments estimate the amount of the substance in question, whereas relative quantification experiments express the amount of a substance in relation to another measurement of the same substance. The following two sections introduce two approaches for obtaining this quantitative information.

3.3.1 Label-free quantification

Label-free quantification [Bondarenko et al., 2002; Wang et al., 2003; Schulz-Trieglaff et al., 2007; Old et al., 2005] is a promising method for relative quantification. Signal intensities of the same peptide in different LC-MS maps are compared directly. This approach requires not only the accurate determination of the signal intensities belonging to a certain peptide, but also the correct assignment of corresponding peptide signals across different measurements.
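The direct comparison of intensities can be illustrated with a minimal sketch. It assumes that the assignment of corresponding peptide signals has already been solved, so that each map is simply a dictionary from a peptide identifier to its intensity. The identifiers, the intensity values, and the function name `log2_ratios` are hypothetical and chosen for illustration only.

```python
import math

# Hypothetical feature intensities per peptide in two LC-MS maps.
# In practice, the correspondence between maps has to be established
# first; here it is assumed to be given by identical keys.
map_a = {"PEPTIDEK": 2.0e6, "SAMPLER": 5.0e5, "ONLYINA": 1.0e5}
map_b = {"PEPTIDEK": 1.0e6, "SAMPLER": 2.0e6}


def log2_ratios(a, b):
    """Return log2(intensity_b / intensity_a) for peptides found in both maps."""
    shared = sorted(set(a) & set(b))
    return {pep: math.log2(b[pep] / a[pep]) for pep in shared}


ratios = log2_ratios(map_a, map_b)
for pep, ratio in ratios.items():
    print(f"{pep}: log2 ratio = {ratio:+.2f}")
```

Peptides observed in only one map (here `ONLYINA`) yield no ratio, which reflects why the correct assignment across measurements matters as much as the intensity determination itself.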

3.3.2 Labeled quantification

Labeled quantification uses the stable isotope labeling of proteins or peptides to determine their absolute or relative quantities. This approach is mainly designed to quantify proteomes of only two to four different states. To this end, the proteins/peptides of one sample are labeled and afterward combined with the unlabeled sample. Depending on the labeling method, the same proteins/peptides show a specific mass difference in the mass spectrometric measurement. Several methods have been developed, which are mainly distinguished by the way the stable isotope labels are introduced into the protein or peptide [Ong and Mann, 2005]: spiking in an isotopically labeled analog [Gerber et al., 2003; Gröpl et al., 2005; Mayr et al., 2006; Kirkpatrick et al., 2005], incorporation through an enzyme during protein digestion [Yao et al., 2001, 2003], introducing a chemical, isotopically labeled tag onto peptides or proteins [Gygi et al., 1999; Ross et al., 2004], or having cells that incorporate the label metabolically [Oda et al., 1999; Ong et al., 2002].
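The specific mass difference can be made concrete with a small sketch. For an ion of charge z, a label that adds a mass of Δm shifts the signal by Δm/z on the m/z axis. The code below assumes a shift of 6.0201 Da (six 13C atoms replacing 12C), which is only an example value. The peak list, the tolerance, and the function names are likewise hypothetical.

```python
# Mass difference introduced by the label, in Da. 6.0201 Da corresponds to
# six 13C atoms replacing 12C; it is used here only as an example value.
LABEL_SHIFT_DA = 6.0201


def expected_mz_shift(label_shift_da, charge):
    """m/z distance between the light and heavy form of a peptide ion."""
    return label_shift_da / charge


def find_pairs(peaks, charge, tol=0.01):
    """Pair peaks whose m/z values differ by the expected shift.

    peaks: list of (mz, intensity) tuples from one spectrum.
    Returns (light_mz, heavy_mz, heavy/light intensity ratio) tuples.
    """
    shift = expected_mz_shift(LABEL_SHIFT_DA, charge)
    pairs = []
    for mz_light, int_light in peaks:
        for mz_heavy, int_heavy in peaks:
            if abs((mz_heavy - mz_light) - shift) <= tol:
                pairs.append((mz_light, mz_heavy, int_heavy / int_light))
    return pairs


# Hypothetical doubly charged peptide: light form at 500.25, heavy form
# about 3.01 m/z units higher, plus one unrelated peak.
spectrum = [(500.25, 1000.0), (503.26, 2000.0), (620.40, 300.0)]
pairs = find_pairs(spectrum, charge=2)
print(pairs)
```

The quadratic pair search is kept for readability; for realistic spectra a sorted peak list with a binary search would be the more sensible choice.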

Although these techniques bypass problems caused by ion-suppressive effects of co-eluting peptides, which can affect label-free quantification experiments, they are very costly, prevent retrospective comparisons, and complicate large studies with multiple samples.

OpenMS: An open-source framework for mass spectrometry

The high complexity and the sheer amount of MS-based proteomics data require sophisticated analytical methods. The information extraction from LC-MS data can be broken down into a series of smaller analysis steps:

• signal filtering and baseline removal: remove noise and baseline artifacts,

• peak picking: find and extract the accurate positions, heights, total ion counts, and FWHM values of all mass spectral peaks,

• identification algorithm: identify the proteins in a sample given the mass spectral peak information,

• feature detection and quantification: detect and extract patterns of peaks that correspond to the same charge variant of a peptide,

• intensity normalization: normalize the ion counts,

• multiple map alignment: correct the distortion of the RT and m/z dimension of multiple raw or feature maps; in case of feature maps, assign corresponding features afterward,

• classification algorithms and biomarker discovery: find differentially expressed peak or feature patterns that can be used to classify samples, e.g., from different cell states.

A label-free quantification protocol might consist of signal filtering and baseline removal, peak picking, quantification, normalization, multiple map alignment, and marker finding. An identification pipeline, on the other hand, might be composed of signal filtering and baseline removal, a peak picking step, and an identification algorithm. Small algorithmic components for each analytical step allow for the development of tools for both analytical aims and can be readily combined into more complex workflows or tools.
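The idea of combining small components into different workflows can be sketched as function composition. The following is a toy illustration and not the OpenMS API: every step is a placeholder that only records its name, and the helpers `make_step` and `compose` are hypothetical.

```python
from functools import reduce


def make_step(name):
    """Create a placeholder analysis step that only records that it ran."""
    def step(data):
        return {**data, "history": data["history"] + [name]}
    step.__name__ = name
    return step


# One small component per analysis step named in the text.
signal_filtering = make_step("signal_filtering")
peak_picking = make_step("peak_picking")
quantification = make_step("quantification")
normalization = make_step("normalization")
map_alignment = make_step("map_alignment")
marker_finding = make_step("marker_finding")
identification = make_step("identification")


def compose(*steps):
    """Chain steps so that the output of one is the input of the next."""
    def pipeline(data):
        return reduce(lambda current, step: step(current), steps, data)
    return pipeline


# The two workflows described above reuse the same first two components.
label_free_pipeline = compose(
    signal_filtering, peak_picking, quantification,
    normalization, map_alignment, marker_finding,
)
identification_pipeline = compose(signal_filtering, peak_picking, identification)

print(label_free_pipeline({"history": []})["history"])
print(identification_pipeline({"history": []})["history"])
```

Because each step has the same shape (a map goes in, a map comes out), a new workflow is only a new argument list to `compose`.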

In 2003, the Algorithmic Bioinformatics group at the Freie Universität Berlin and the Department for Simulation of Biological Systems of Tübingen University initiated an academic project for proteomic data analysis that realizes this modular approach to problem solving. OpenMS, a framework for mass spectrometry [Sturm et al., 2008], is flexible and serves as a basis for developing mass spectrometry data analysis tools, providing everything from basic data structures, file input/output (I/O), and visualization to sophisticated algorithms for the analysis steps mentioned above. Thus, OpenMS allows developers to focus on new algorithmic approaches instead of implementing infrastructure. The high flexibility of OpenMS stands out against other existing academic tools for proteomic data analysis, e.g., MapQuant [Leptos et al., 2006], MASPECTRAS [Hartler et al., 2007], msInspect [Bellew et al., 2006], MZMine [Katajamaa et al., 2006], SpecArray [Li et al., 2005], Trans-Proteomic Pipeline (TPP) [Keller et al., 2005], Viper [Monroe et al., 2007], Superhirn [Mueller et al., 2007], and XCMS [Smith et al., 2006]. These tools are typically monolithic and hard to adapt to new experiments. Furthermore, they often concentrate on only one step of the analysis, e.g., quantification, peptide identification, or map alignment, or combine a few steps into a pipeline.

4.1 The map concept

The data that is produced by the combination of multi-dimensional LC and subsequent MS can be viewed as a set of multidimensional discrete points. In LC-MS such a data point is described by retention time, m/z, and intensity. The collection of all these data points is called an LC-MS raw map. The analysis of this raw data is done through several steps, which in our view correspond to a series of map transformations. Figure 4.1 shows the map types and transformation steps.
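A raw map in this sense can be modeled as a plain collection of (retention time, m/z, intensity) points. The sketch below is a simplified illustration under that assumption and does not reproduce the OpenMS data structures; the class names `RawPoint` and `RawMap`, their methods, and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawPoint:
    rt: float         # retention time in seconds
    mz: float         # mass-to-charge ratio
    intensity: float  # measured signal intensity


class RawMap:
    """A collection of raw data points, kept sorted by (RT, m/z)."""

    def __init__(self, points):
        self.points = sorted(points, key=lambda p: (p.rt, p.mz))

    def retention_times(self):
        """Sorted distinct retention times, one per mass spectrum."""
        return sorted({p.rt for p in self.points})

    def spectrum_at(self, rt):
        """All points recorded at exactly the given retention time."""
        return [p for p in self.points if p.rt == rt]


# Hypothetical data: two spectra with a few points each.
raw_map = RawMap([
    RawPoint(558.88, 501.30, 120.0),
    RawPoint(558.88, 500.80, 340.0),
    RawPoint(560.10, 500.80, 310.0),
])

print(raw_map.retention_times())
print([p.mz for p in raw_map.spectrum_at(558.88)])
```

Viewing each analysis step as a function from one such map to another is what makes the notion of a map transformation concrete.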

Figure 4.1: Top: An LC-MS raw map and its mass spectrum at RT=558.88 s. Middle: The corresponding LC-MS peak map and the extracted mass spectral peaks at RT=558.88 s. Bottom: The corresponding feature map. The red arrows indicate the possible transformations.

Signal filtering and baseline removal steps are performed on raw LC-MS maps. The output of these transformations is again an LC-MS raw map, now preprocessed. Depending on the underlying type of mass spectrometer, a raw LC-MS map can have a size of several hundred megabytes up to several gigabytes, whereas only a small fraction of the data contains the signal of interest. Thus, data reduction is a central concept of OpenMS. It comprises two transformation steps: peak picking, and feature detection and quantification. During the peak picking process, the mass spectral peaks are detected and important information, such as their accurate positions, heights, total ion counts, and FWHM values, is extracted. We call the resulting data of a peak picking step an LC-MS peak map. The subsequent feature detection and quantification step is again a data reduction step, in which the two-dimensional signals created by some chemical entities (e.g., peptides) are grouped together into a so-called LC-MS feature map. A feature is characterized by its isotopic pattern in mass-to-charge dimension and by the elution profile