
C.4 Scoring Module Reference

C.4.4 The fragmentScoring module

Bottom-up experiments can provide the masses of peptides aligned to a region of a target protein. Any adduct modifications are partially localized by the extent of the peptide, although not to specific AAs. By matching experimental peptide masses to the associated region of a guess, the quality of that guess can be evaluated. The more peptides that match and the closer the mass of a given peptide to the matching region, the better the guess. A perfect guess should match all peptides within experimental tolerances, although in some cases there will be contradictory or erroneous data, described in more detail below.

Two different scoring functions can be used to model this: deltaMassScoring, which uses scoring similar to the intact scoring module, and errorCounting, which uses scoring similar to the localized fragment scoring module described in the following section.

PIE assumes that all data it is presented with applies to the current prediction. It is possible, however, for matched peptide lists to contain peptides from multiple isoforms of the target protein. The intact mass will help PIE select as a final answer one isoform compatible with both intact and fragment data (indeed, this is the main driving force for creating PIE). However, if the peptide data is lopsided enough, data from this module can outweigh the intact isoform data, causing the intact data to be ignored. If this happens, the data set should be split so that each subset contains only one of the contradictory peptides, along with all of the other peptides. Running PIE separately on each of these data sets should then provide at least one answer consistent with both the MS/MS data and the intact mass. If contradictory peptides are present at multiple points, this splitting may need to be repeated.
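The splitting procedure described above can be sketched in a few lines of Python. This is only an illustration, not part of PIE: the function name `split_on_conflicts`, the peptide labels, and the idea of passing conflicts as explicit groups are all assumptions made for the example. Each conflict group lists peptides that contradict one another; every generated data set keeps exactly one peptide from each group plus all non-conflicting peptides, so multiple points of contradiction are handled in one pass.

```python
from itertools import product


def split_on_conflicts(peptides, conflict_groups):
    """Yield peptide lists that keep exactly one peptide per conflict group.

    peptides        -- all matched peptides (hypothetical labels)
    conflict_groups -- list of lists; each inner list holds mutually
                       contradictory peptides
    """
    conflicting = {p for group in conflict_groups for p in group}
    # Peptides that are not involved in any contradiction go into every set.
    shared = [p for p in peptides if p not in conflicting]
    # One data set per combination of choices, one choice from each group.
    for choice in product(*conflict_groups):
        yield shared + list(choice)


peptides = ["pepA", "pepB1", "pepB2", "pepC"]
datasets = list(split_on_conflicts(peptides, [["pepB1", "pepB2"]]))
for d in datasets:
    print(d)
```

With one conflict group of two peptides this produces two data sets; PIE would then be run once on each.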

It is also possible that peptides are present that match none of the detected intact mass isoforms. Again, if enough of these peptides are present, they will distract PIE as it tries to match them. The result may be useful as an alternate answer, but if the intact mass can be trusted to truly represent an isoform, another answer still exists. Again, the solution is to split the data, either on the conflicting data or possibly on the data supporting one PTM, and try again.

The deltaMassScoring algorithm

This algorithm bases the match score for each peptide on the difference between the mass of the peptide and the mass of its associated region of the guess, similar to scoring by the intactScoring module. A total score for all peptides is calculated by multiplying the individual peptide match scores together.

$$S = n^{m} \cdot \prod_{i} \frac{1}{\left| M_{e,i} - M_{t,i} \right| + E} \qquad \text{(C.4)}$$

where

$i$ = iterator over all matched peptides;

$M_{e,i}$ = experimental mass of peptide $i$;

$M_{t,i}$ = theoretical mass of the guess region aligned to peptide $i$;

$E$ = experimental error;

$n$ = novelty factor;

$m$ = count of novel modifications.
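A minimal Python sketch of this score follows. It assumes the per-peptide term has the reciprocal form 1/(|Me,i − Mt,i| + E), so that a smaller mass difference gives a larger score; the function name `delta_mass_score`, its argument names, and the example masses are hypothetical and are not taken from PIE itself.

```python
def delta_mass_score(mass_pairs, error, novelty_factor=0.5, novel_mods=0):
    """Product of per-peptide match scores, scaled by the novelty penalty.

    mass_pairs     -- (experimental mass, theoretical mass) per matched peptide
    error          -- experimental error E, in the same units as the masses
    novelty_factor -- n, applied once per novel modification
    novel_mods     -- m, count of modifications not seen in the peptide data
    """
    score = novelty_factor ** novel_mods
    for m_exp, m_theo in mass_pairs:
        # Assumed reciprocal form: closer masses give a larger factor.
        score *= 1.0 / (abs(m_exp - m_theo) + error)
    return score


pairs = [(1000.5, 1000.5), (2000.0, 2001.0)]  # made-up masses
print(delta_mass_score(pairs, error=1.0))                # no novel modifications
print(delta_mass_score(pairs, error=1.0, novel_mods=1))  # one novel modification
```

With the default novelty factor of 0.5, each novel modification halves the total score, which is how the model comes to prefer simpler guesses.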

Modification novelty is explicitly part of the model. If a guess contains modifications that were not detected in bottom-up experiments, the assumption is that the guess is less likely to be correct. This implements a type of “Occam’s razor” filtering, allowing PIE to preferentially select simpler explanations. Setting n = 0.5 is recommended because it allows a simplified interpretation of an error; this is described below in the section on errorCounting.

Although the error E has a specific meaning (see the comment on error in C.4.2, The

The errorCounting algorithm

This algorithm bases match scores on the number of matched AA and matched modifications identified by an external program such as GFS (Wisz et al., 2004) or MASCOT (Perkins et al., 1999). This is similar to the localizedFragmentScoring module.

$$S = n^{m} \cdot \prod_{i} \frac{1}{2^{a_i} \cdot 2^{m_i} + 1} \qquad \text{(C.5)}$$

where

$n$ = novelty factor,

$m$ = count of novel modifications,

$i$ = iterator over all matched peptides,

$a_i$ = count of mismatched AA over the aligned guess region for peptide $i$,

$m_i$ = count of mismatched modifications over the aligned guess region for peptide $i$,

$+1$ = small shift to avoid singularities.

By using a factor of two for each mismatch, the ratio of the scores of any two guesses that differ by one error, either an unmatched AA or an unmatched modification, is approximately 0.5. This helps maintain a quantitative interpretation of nearly-best guesses relative to the best guess. The +1 only significantly affects values with very few mismatches, although this does shift the meaning of “error” somewhat.
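The effect of the +1 shift on this ratio can be checked numerically. The sketch below assumes the per-peptide term is 1/(2^ai · 2^mi + 1), read from equation C.5; the function name `error_counting_score` and the mismatch counts are hypothetical and chosen only for illustration.

```python
def error_counting_score(mismatches, novelty_factor=0.5, novel_mods=0):
    """Product of per-peptide error-count scores, scaled by the novelty penalty.

    mismatches -- (mismatched AA count, mismatched modification count)
                  for each matched peptide
    """
    score = novelty_factor ** novel_mods
    for aa_errors, mod_errors in mismatches:
        # Each mismatch doubles the denominator; +1 is the small shift.
        score *= 1.0 / (2 ** aa_errors * 2 ** mod_errors + 1)
    return score


# Ratio of scores for guesses that differ by a single error.
few = error_counting_score([(1, 0)]) / error_counting_score([(0, 0)])
many = error_counting_score([(6, 0)]) / error_counting_score([(5, 0)])
print(few, many)
```

Under this reading, the ratio between a guess with one error and a perfect guess is 2/3, while between guesses with six and five errors it is 33/65, which is already close to 0.5. This matches the remark that the shift matters mainly when there are very few mismatches.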

See the previous description of the deltaMassScoring algorithm for details on the novelty factor.

Parameters

isFragmentScoring = false

By default, scoring by peptides is turned off, as it requires experiment-specific data.

fragmentDataFile = “fragment.txt”

If using fragmentScoring, PIE needs a data file. The file “fragment.txt” will be searched for by default, but any name can be specified.

fragmentScoringAlgorithm = “deltaProductMass”

Specifies which basic scoring algorithm is used. Two options are currently supported: “deltaProductMass”, the default, which uses mass-difference based scoring, and “errorCounting”, which uses the putative sequence and modifications of an identified peptide along with an accuracy measure to compute the score.

fragmentMassType = “AVG”

Declares the type of mass value used in the associated data file. The default is “AVG” for average masses. “MONO” is also allowed, specifying that monoisotopic mass values are given.

noveltyFactor = 0.5

Multiply the returned peptide score by this value for each modification that is of a type not detected in the peptide data. If used with the default deltaProductMass scoring algorithm, the modifications column of the associated data file will need to contain accurate data. To turn off novelty scoring, set this to 1.0.

proteinName = “OVER RIDE”

The target protein not only selects the FASTA file sequence to use, but also selects the line from the topDownDataFile that will be read and used as data for this scoring module during a run.
