A distance function dsim for LC-MS maps - Analysis of mass spectrometric data: peak picking and

12.2. A distance function dsim for LC-MS maps

We first consider the similarity of LC-MS/MS maps. In LC-MS/MS maps some of the elements are annotated with reliable peptide identifications, so part of the correspondence between the maps is already given. These corresponding elements reveal the extent of the distortions in both the RT and the m/z dimension and can be used to discover the correspondence of the remaining, unannotated elements. Corresponding elements with similar 2D positions in two maps point to comparable RT and m/z dimensions, whereas common elements with different positions indicate a considerable shift in RT and m/z. The more the 2D positions of common elements vary, the greater the distance between the maps and the more dissimilar the maps are. Therefore, we measure the similarity using the distance of corresponding elements in the Euclidean space $\mathbb{R}^2$. The RT dimension is in general more distorted than the m/z dimension, hence a weighted Euclidean metric should be used instead of the standard Euclidean distance. Instead of evaluating the distance of corresponding elements, we can also evaluate the similarity of elements with similar coordinates. If the positions of common elements vary significantly between different maps, an element’s nearest neighbor in the other map will not have the same annotation. Instead of summing the distances between corresponding elements, we can therefore count the number of corresponding elements that have similar coordinates and are nearest neighbors. This approach requires a one-to-one assignment of elements in two maps, which we give in the following definition.

Definition 12.2.1: Given two LC-MS maps $M := \{m_1, \dots, m_k\}$ and $S := \{s_1, \dots, s_l\}$ and an $\varepsilon > 0$. The matching function $\mathrm{match} : \mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{B}$ with $\mathbb{B} = \{0, 1\}$ is defined as follows:

Two elements $m_i \in M$ and $s_j \in S$ are matched if their positions lie within an $\varepsilon$-environment in a weighted Euclidean metric $d : \mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R}$, and $s_j$ is the nearest neighbor of $m_i$ and vice versa:

$$
\mathrm{match}(m_i, s_j) :=
\begin{cases}
1, & d(m_i, s_j) < \varepsilon \ \text{and}\ \forall\, m_r \in M \setminus \{m_i\},\ s_t \in S \setminus \{s_j\} :\ d(m_i, s_j) \le d(m_i, s_t) \ \text{and}\ d(m_i, s_j) \le d(m_r, s_j) \\
0, & \text{otherwise}
\end{cases}
$$
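A minimal Python sketch of the matching rule in Definition 12.2.1 might look as follows. The tuple layout `(RT, m/z)`, the function names and the weight parameters `w_rt` and `w_mz` are assumptions made for illustration; they are not taken from the text.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # assumed layout: (RT, m/z)


def weighted_distance(a: Point, b: Point, w_rt: float = 1.0, w_mz: float = 1.0) -> float:
    """Weighted Euclidean distance d between two 2D positions."""
    return math.hypot(w_rt * (a[0] - b[0]), w_mz * (a[1] - b[1]))


def match(i: int, j: int, M: List[Point], S: List[Point], eps: float,
          w_rt: float = 1.0, w_mz: float = 1.0) -> int:
    """Return 1 if M[i] and S[j] are closer than eps and mutual nearest neighbors, else 0."""
    d_ij = weighted_distance(M[i], S[j], w_rt, w_mz)
    if d_ij >= eps:
        return 0
    # s_j must be a nearest neighbor of m_i among all elements of S ...
    if any(weighted_distance(M[i], s, w_rt, w_mz) < d_ij for s in S):
        return 0
    # ... and m_i must be a nearest neighbor of s_j among all elements of M.
    if any(weighted_distance(m, S[j], w_rt, w_mz) < d_ij for m in M):
        return 0
    return 1
```

Because the definition uses non-strict inequalities, two candidates at exactly the same distance would both count as nearest neighbors; this sketch keeps that behavior rather than breaking ties.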

For the annotated elements we could verify each match using the identification of the elements. The total number of matched elements with identical identifications indicates the similarity of two LC-MS/MS maps.
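The verification step for annotated elements can be sketched in a few lines of Python. The representation of identifications as optional strings and the name `count_verified_matches` are hypothetical and serve only to illustrate the counting.

```python
from typing import List, Optional, Tuple


def count_verified_matches(pairs: List[Tuple[int, int]],
                           ids_m: List[Optional[str]],
                           ids_s: List[Optional[str]]) -> int:
    """Count matched pairs (i, j) whose elements carry the same peptide identification."""
    verified = 0
    for i, j in pairs:
        # A pair with a missing annotation on either side cannot be verified.
        if ids_m[i] is not None and ids_m[i] == ids_s[j]:
            verified += 1
    return verified
```

Here `pairs` would hold the index pairs for which the matching function returns 1.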

The match function allows for an assignment of unannotated elements in two LC-MS maps and can thereby also be used in a similarity measure for LC-MS maps. Although the lack of annotations prevents the verification of the matching, we can use the intensity of the elements as an additional similarity term instead. A matching of elements with similar intensities should be rewarded, whereas a matching of two elements with extremely different intensities should be penalized. Evaluating the matching using the elements’ ion counts is a sensible assumption if the majority of peptides are not differentially expressed, which is usually the case. It should be noted that the comparison of intensities in different maps requires an intensity normalization of the maps [Katajamaa et al., 2006; Radulovic et al., 2004; Wang et al., 2007]. Together, the matching function in Definition 12.2.1 and the ion counts indicate the similarity of the matched elements’ positions and intensities in two feature maps. Hence, we are now able to define a distance function or dissimilarity measure for LC-MS maps:

Definition 12.2.2: Given LC-MS maps $M := \{m_1, \dots, m_k\}$ and $S := \{s_1, \dots, s_l\}$ and $\varepsilon > 0$. Furthermore, $(\mathrm{RT}(m_i), \mathrm{m/z}(m_i))$ is the 2D position of the element $m_i$ and $\mathrm{int}(m_i)$ its ion count. The distance or dissimilarity $d_{\mathrm{sim}} : M \times S \to \mathbb{R}$ of $M$ and $S$ is given by:

$$
d_{\mathrm{sim}}(M, S) := \max\{k, l\} - \sum_{i=1}^{k} \sum_{j=1}^{l} \mathrm{match}(m_i, s_j)\, |d(m_i, s_j) - \varepsilon|\, \frac{\min\{\mathrm{int}(m_i), \mathrm{int}(s_j)\}}{\max\{\mathrm{int}(m_i), \mathrm{int}(s_j)\}}.
$$

Given two maps $M := \{m_1, \dots, m_k\}$ and $S := \{s_1, \dots, s_l\}$, the codomain of the distance measure is $[0, \dots, \max\{k, l\}]$.
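Definition 12.2.2 can be turned into a short Python sketch. The element layout `(RT, m/z, intensity)`, the weight parameters and the assumption of strictly positive intensities are illustrative choices. The optional `scale_by_eps` flag is an addition of this sketch and not part of the definition as printed: dividing the proximity term by ε keeps every summand in [0, 1], which the stated range of the measure appears to presuppose.

```python
import math
from typing import List, Tuple

Element = Tuple[float, float, float]  # assumed layout: (RT, m/z, intensity > 0)


def dsim(M: List[Element], S: List[Element], eps: float,
         w_rt: float = 1.0, w_mz: float = 1.0,
         scale_by_eps: bool = False) -> float:
    """Dissimilarity of two maps, following Definition 12.2.2 as printed.

    scale_by_eps is NOT part of the printed definition; it divides the
    proximity term |d - eps| by eps so that each summand lies in [0, 1].
    """
    def d(a: Element, b: Element) -> float:
        return math.hypot(w_rt * (a[0] - b[0]), w_mz * (a[1] - b[1]))

    total = 0.0
    if M and S:
        dist = [[d(m, s) for s in S] for m in M]
        row_min = [min(row) for row in dist]        # nearest distance into S for each m_i
        col_min = [min(col) for col in zip(*dist)]  # nearest distance into M for each s_j
        for i, m in enumerate(M):
            for j, s in enumerate(S):
                d_ij = dist[i][j]
                # match(m_i, s_j) = 1: within eps and mutual nearest neighbors
                if d_ij < eps and d_ij <= row_min[i] and d_ij <= col_min[j]:
                    proximity = abs(d_ij - eps)
                    if scale_by_eps:
                        proximity /= eps
                    ratio = min(m[2], s[2]) / max(m[2], s[2])
                    total += proximity * ratio
    return max(len(M), len(S)) - total
```

With `scale_by_eps=True`, comparing a map with itself returns 0, since every element matches itself at distance 0 with an intensity ratio of 1.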

For all maps M, S and X, dsim satisfies the following conditions:

• dsim(M,S) ≥ 0 (non-negativity).

• dsim(M,S) = 0, if and only if M = S (identity).

• c(dsim(M,X) + dsim(X,S)) ≥ dsim(M,S) for some constant c ≥ 1 (relaxed triangle inequality).

• dsim(M,S) = dsim(S,M) (symmetry).

Similarity measures for partial matching, which give a small distance dsim(M, S) if a part of M matches a part of S, in general do not obey the triangle inequality. It therefore makes sense to formulate a weaker form, the relaxed triangle inequality [Veltkamp, 2001]. Another useful property of dsim(M, S) is symmetry, which guarantees that the order in which the maps are compared does not matter.

As an explanatory example, Figure 12.1 shows two feature maps, “feature map 1” and “feature map 2”, which share a fraction of common features. “Feature map 1” depicts data from a real measurement. 80% of the data points were copied to “feature map 2” after their RT positions had been warped by an affine transformation $T := 1.1x + 30$. Additionally, random points were added to the bounding box. Since the RT dimension is usually more distorted than the m/z dimension, we use a weighted Euclidean metric given by $d(m, s) = \sqrt{w_1^2 (m_{RT} - s_{RT})^2 + w_2^2 (m_{m/z} - s_{m/z})^2}$ with $w_1 := 1$ and $w_2 := 10$. Furthermore, we allow for an error of 22 s and 0.2 Th and obtain $\varepsilon \approx 30$. Due to the shift, the distance between the two maps is relatively large and shows up in the maximum dsim value of 195. Even with $\varepsilon := 100$ (corresponding to an error of 0.2 Th and 98 s) the dsim value of 190 indicates a large dissimilarity of the maps. In Figure 12.1 on the right hand side “feature map 1” and the dewarped “feature map 2” are shown. The common 80% of the features now have similar positions, and the dsim value of 30 indicates relatively similar maps.
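The arithmetic behind the example can be retraced with a small Python sketch. The function names are hypothetical, and the sketch assumes that dewarping simply inverts the affine transformation T. With the weights as printed (w1 = 1, w2 = 10), a tolerance of 22 s and 0.2 Th evaluates to about 22.1. A weight of 100 on the m/z dimension, which is purely an assumption here and not stated in the text, would give about 29.7 and 100.0 for the two tolerances, closer to the quoted ε values.

```python
import math


def weighted_distance(d_rt: float, d_mz: float, w_rt: float, w_mz: float) -> float:
    """Weighted Euclidean length of an offset of d_rt seconds and d_mz Th."""
    return math.hypot(w_rt * d_rt, w_mz * d_mz)


def warp_rt(rt: float, a: float = 1.1, b: float = 30.0) -> float:
    """Affine RT warp T(x) = a*x + b from the example."""
    return a * rt + b


def dewarp_rt(rt: float, a: float = 1.1, b: float = 30.0) -> float:
    """Inverse of the affine warp (assumed dewarping step): x = (T(x) - b) / a."""
    return (rt - b) / a


# Weights as printed in the text: w1 = 1, w2 = 10
print(weighted_distance(22.0, 0.2, 1.0, 10.0))
# Hypothetical alternative w2 = 100 (not from the text), for comparison
print(weighted_distance(22.0, 0.2, 1.0, 100.0))
print(weighted_distance(98.0, 0.2, 1.0, 100.0))
```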

This general distance function can be used for every type of LC and MS based experiment. Furthermore, it is also independent of the processing state of the maps, because it uses only the 2D positions and intensities of the elements. We will now use dsim to define the multiple LC-MS raw and feature map alignment problem.

[Figure 12.1: feature map plots, axes rt and m/z]

Figure 12.1: Top: Two LC-MS feature maps are shown. “Feature map 1” as well as “feature map 2” contain 195 features. The two feature maps share 156 common features, but the RT positions of these features are shifted in “feature map 2” by an affine transformation $T := 1.1x + 30$. The dsim value of the two dissimilar feature maps is 195 using ε = 30 (allowing for an error of 0.2 Th in m/z and 22 s in RT), and even with ε = 100 (allowing for an error of 0.2 Th in m/z and 98 s in RT) the two maps have a large distance of 190. Bottom: “feature map 1” and the dewarped “feature map 2” are shown. The dsim value of these two feature maps is only 30 for both ε = 30 and ε = 100.

In document Analysis of mass spectrometric data: peak picking and map alignment (Page 115-119)