help me?
That depends on what data you have and what you want to know. A fairly long list of caveats and exceptions follows. PIE currently requires some effort to use, and this list is intended to help you avoid wasting time. If you are not sure, go ahead and try PIE. Even if PIE can't help you today, it is designed for easy extension in multiple ways, and almost all the limitations below can theoretically be overcome, at least partially. We welcome comments on what PIE could or could not do for you.
A.4.1
Do you have MS data?
If not, PIE may not be useful to you. It can combine the results from multiple prediction tools, and if that is your intent, then great. But without experimental data, it will be difficult for you to evaluate whether a result is biologically meaningful.
A.4.2
Is the sequence of the underlying protein known?
PIE requires that you break your data up into queries about 1 protein target at a time, where the (canonical) sequence of the protein is known. If you are only looking at results from a purified protein, great. If you have multiple protein targets, each must be run separately, and on any given run of PIE you must tell it the one sequence you are targeting on that run and give it the data that applies to that one protein sequence only. PIE can handle data representing several different modification isoforms, but only a limited number at once.
A.4.3
Is your peptide data targeted?
PIE requires that only a limited number of peptides not matching the queried isoform be present. If you have peptide or MS/MS data collected from a small pool of proteins, PIE may be able to work with that, but it will evaluate the data in terms of only one intact mass at a time. The more dilute the peptides from that target are, the worse PIE will perform. PIE is intended to work with peptides pre-identified as associated with a protein by some other software, such as GFS (Wisz et al., 2004) or MASCOT (Perkins et al., 1999).
A.4.4
Can you specify a discrete list of all modifications?
All modifications that are possible must be in this list, and each may not span amino acids. This means methylation and dimethylation both have to be specified, and that modifications that span AAs (like cystine crosslinks) need special rules to ensure they occur in sets. It also means that variable modifications like lipids and sugars cannot be analyzed.
The shorter the list of modifications simultaneously searched for, the better. Both performance and accuracy degrade as the number of potential adduct modifications increases. In addition, each modification simultaneously searched for must have all needed prior data specified in order to use an associated prior scoring distribution (e.g. relative abundance or amino-acid preference).
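As an illustration of what a discrete modification list implies, the following minimal sketch enumerates the candidate (position, modification) pairs for a short sequence. It is not PIE's input format or code: the `Modification` class, the `candidate_sites` helper, the allowed-residue sets, and the example peptide are all hypothetical and chosen only for illustration. The mass shifts are standard monoisotopic values.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Modification:
    """Hypothetical record for one discrete, single-residue modification."""
    name: str
    delta_mass: float     # monoisotopic mass shift in Da
    residues: frozenset   # residues assumed able to carry it (illustrative)

# Methylation and dimethylation are listed as separate entries,
# because each entry stands for exactly one fixed mass shift.
MODS = [
    Modification("methyl", 14.01565, frozenset("KR")),
    Modification("dimethyl", 28.03130, frozenset("KR")),
    Modification("phospho", 79.96633, frozenset("STY")),
]

def candidate_sites(sequence, mods):
    """List (1-based position, residue, modification name) for every allowed site."""
    return [
        (pos, aa, mod.name)
        for pos, aa in enumerate(sequence, start=1)
        for mod in mods
        if aa in mod.residues
    ]

peptide = "ARTKQTARKS"  # arbitrary example peptide
all_sites = candidate_sites(peptide, MODS)
methyl_only = candidate_sites(peptide, MODS[:1])
print(len(methyl_only), len(all_sites))
```

Even in this toy case the number of candidate sites grows with every entry added to the list, which is one way to see why a shorter list is preferable.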
A.4.5
Does the protein analyzed have any sequence mutations?
The canonical sequence required by PIE must match that of the protein being analyzed. N- and C-terminal deletions are handled separately and are not considered mutations. Insertions and deletions are generally not allowed, except possibly as specific point mutations that can be treated as adducts.
Single specific point mutations (e.g. S ⇒ T) can be treated as an adduct with an appropriate delta mass, mass(T) − mass(S). Point insertions can be treated likewise as S ⇒ ST, and point deletions as S ⇒ null. This is somewhat unsatisfactory, and it is problematic because every combination with an adduct also needs to be included separately (S ⇒ T + phos, etc.). PIE has only been tested searching for about 10 adducts. Combining all 18 nonzero-mass AA mutations with 10 possible adduct modifications, even for just one mutable AA transition, adds 180 more entries to the adduct list; this is currently too large a list to allow PIE to work efficiently.
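The delta-mass bookkeeping described above can be sketched as follows. This is a minimal illustration, not PIE's code: the helper names are hypothetical, and the table holds standard monoisotopic residue masses. The count of 18 is reproduced here by counting distinct nonzero mass shifts from one residue (Leu and Ile share a mass), which is an assumed reading of the figure in the text.

```python
# Standard monoisotopic residue masses in Da (amino acid minus water).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

def substitution_delta(old, new):
    """Mass shift for a point mutation old => new, e.g. S => T."""
    return RESIDUE_MASS[new] - RESIDUE_MASS[old]

def insertion_delta(inserted):
    """Mass shift for a point insertion, e.g. S => ST adds one T residue."""
    return RESIDUE_MASS[inserted]

def deletion_delta(deleted):
    """Mass shift for a point deletion, e.g. S => null removes one S residue."""
    return -RESIDUE_MASS[deleted]

s_to_t = substitution_delta("S", "T")

# Distinct nonzero mass shifts reachable from S by substitution.
distinct_shifts = {
    round(substitution_delta("S", aa), 5) for aa in RESIDUE_MASS if aa != "S"
} - {0.0}

n_adducts = 10  # roughly the list size PIE has been tested with, per the text
extra_entries = len(distinct_shifts) * n_adducts

print(f"S => T delta mass: {s_to_t:.5f} Da")
print(f"extra adduct-list entries: {extra_entries}")
```

Note that the S ⇒ T shift (about 14.016 Da) equals the mass shift of a methylation, so the two cannot be told apart by mass alone.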
A.4.6
Do you have enough data to determine an answer?
If you don't have enough data to constrain the possible solutions and actually determine an answer, PIE can't help. But you don't really need to answer this question in advance: PIE makes the best of the data it has, and will produce blurry guesses even with weak data. PIE will let you know how poor your data is by not finding answers, by reporting the "no answer" answer of "unmodified", or by finding many indistinguishable answers. If you can add more data, you can rerun PIE to refine the answers.
A.4.7
Congratulations!
If you made it this far, then PIE can help you to quickly and automatically analyze your data. It will provide the most-likely modification pattern represented by your data, including AA-localized adducts, N-terminal cleavages, and C-terminal cleavages. It will also provide a description of the ensemble of nearly-best answers, allowing you to understand in a deep way just what it is that your data is telling you about what modifications are present, which parts of the prediction are more or less sound than others, and the relative contribution of each data type.