CHAPTER 1: The Vitamin K Cycle
2.3 Experimental
2.3.6 Data Processing
2.3.6.1 ProteinLynx Global Server – DDA
Waters® ProteinLynx Global Server (PLGS) 2.4 (RC7) is a software package that automates processing of LC-MSMS or MSE data for proteomics. PLGS loads data acquired in MassLynx directly into a data analysis environment that identifies mass-corrected peptide ions and matches these peptides against a protein database for identification. Data analysis using PLGS requires two main method components: the data preparation file and the workflow template. The data preparation file applies a lock mass correction to the acquired data and handles background subtraction and noise reduction, while automating peptide deconvolution and centroiding to generate a peak list. Data preparation methods for electrospray DDA data were evaluated with the electrospray survey and MSMS processing parameters held constant. For all methods generated, lock spray calibration was enabled for [Glu1]-fibrinopeptide B lock mass correction at m/z 785.8426 for two consecutive scans within a mass tolerance of 0.25 Daltons. Methods were varied in their noise reduction parameters as well as in their deisotoping and centroiding settings to maximize the efficiency of data analysis. The four data preparation conditions investigated were as follows: (a) adaptive background subtraction and slow deisotoping, where the number of iterations was set to 40; (b) no background subtraction and slow deisotoping; (c) no background subtraction with fast deisotoping of centroided data; and (d) no background subtraction with Savitzky-Golay smoothing (two iterations and a half-width window of three channels) and fast deisotoping of centroided data. After data treatment, peptide lists were processed with a fragment ion search workflow to search a databank of interest.
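The lock mass step described above can be illustrated with a minimal sketch. It is not the PLGS implementation, which is not documented here; it assumes a simple multiplicative correction derived from the mean of the accepted lock mass readings. The function names `lock_mass_factor` and `apply_lock_mass` are hypothetical, while the reference m/z and the 0.25 Dalton window are taken from the text.

```python
# Illustrative sketch only: assumes a multiplicative lock mass correction.
LOCK_MASS_MZ = 785.8426   # [Glu1]-fibrinopeptide B reference m/z (from the text)
LOCK_TOLERANCE = 0.25     # acceptance window in Daltons (from the text)


def lock_mass_factor(observed_lock_mzs, reference=LOCK_MASS_MZ,
                     tolerance=LOCK_TOLERANCE):
    """Return a correction factor from lock mass readings (e.g. two scans).

    Readings outside the tolerance window are ignored. If no reading is
    accepted, the factor is 1.0, i.e. the data are left uncorrected.
    """
    accepted = [mz for mz in observed_lock_mzs
                if abs(mz - reference) <= tolerance]
    if not accepted:
        return 1.0
    mean_observed = sum(accepted) / len(accepted)
    return reference / mean_observed


def apply_lock_mass(mz_values, factor):
    """Scale every measured m/z value by the correction factor."""
    return [mz * factor for mz in mz_values]


# Two consecutive lock mass scans that read slightly high.
factor = lock_mass_factor([785.8500, 785.8520])
corrected = apply_lock_mass([785.8510, 1000.0000], factor)
```

In this example the mean lock mass reading (785.8510) is mapped back onto the reference value, and every other m/z in the scan is scaled by the same factor.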
Databanks were created to include the proteins encountered during sample preparation. To account for contaminant proteins, samples were first processed with both a compiled Swiss-Prot database and a databank specific to the reported proteome of the Spodoptera frugiperda (Sf9) insect cell line. From this evaluation, a specialized databank of the most abundant proteins identified was created from the ExPASy proteomics server of the Swiss Institute of Bioinformatics (http://www.uniprot.org). UniProtKB provides a protein knowledgebase consisting of two sections: Swiss-Prot, which is manually annotated and reviewed, and TrEMBL, which is automatically annotated but not reviewed. Swiss-Prot strives to provide reliable protein sequences associated with a high level of annotation, such as the description of the function of a protein, its domain structure, post-translational modifications, and variants, with a minimal level of redundancy. The work presented was processed from the Swiss-Prot curated biological database of protein sequences. The databank created for GGCX proteolytic peptide identification included the following proteins, listed by accession number: P38435-Homo sapiens (Human) Vitamin K-dependent Gamma-Glutamyl Carboxylase modified with the HPC4 C-terminus epitope tag (EDQVDPRLIDGK); Q8I866-Spodoptera frugiperda (Sf9) Heat shock cognate 70 protein; P00766 and P00767-Bos taurus (Bovine) Chymotrypsinogen A and B, respectively; P00761-Sus scrofa (Porcine) Trypsin; and P04264 and P04259-Homo sapiens (Human) Keratin type II cytoskeletal proteins 1 and 6B, respectively. To identify false-positive protein selections, the specialized databank was randomized.
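The text does not describe how the randomization was carried out, so the following is only a sketch of one common approach: shuffling the residues of each entry so that amino acid composition and length are preserved while the sequence is scrambled. The function name `randomize_databank`, the `RANDOM_` accession prefix, and the fixed seed are assumptions made for illustration; the HPC4 tag sequence from the text serves as a toy entry.

```python
import random


def randomize_databank(databank, seed=0):
    """Return a decoy databank with each sequence shuffled residue by residue.

    `databank` maps accession -> sequence. A seeded generator makes the
    decoys reproducible between runs (an assumption, not a PLGS setting).
    """
    rng = random.Random(seed)
    decoys = {}
    for accession, sequence in databank.items():
        residues = list(sequence)
        rng.shuffle(residues)
        decoys["RANDOM_" + accession] = "".join(residues)
    return decoys


# Toy entry: the HPC4 epitope tag sequence quoted in the text.
target = {"HPC4_tag": "EDQVDPRLIDGK"}
decoy = randomize_databank(target)
```

Any peptide that matches a decoy entry at a given score threshold is by construction a false positive, which gives a rough measure of how often the same threshold admits false matches in the real databank.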
The databank of choice was uploaded into the workflow designer for peptide and protein identification. Here, both the MS and MSMS spectra were matched against the selected databank to identify the protein(s) in the original sample. The peptide and fragment ion tolerances were set to 20 ppm and 0.5 Daltons, respectively, with an estimated calibration error of 0.05 Daltons. Searches were run with trypsin, chymotrypsin, modified chymotrypsin_NS, or nonspecific cleavage parameters, allowing from one to three missed cleavages. Trypsin proteolytic cleavage was specified C-terminal to lysine (K) and arginine (R), except when the next residue is proline (P). For chymotrypsin digestions, the digestion parameters were modified to create a chymotrypsin_NS reagent that cleaves peptides C-terminal to methionine (M) and leucine (L) in addition to the traditional phenylalanine (F), tyrosine (Y), and tryptophan (W). A nonspecific digest reagent was also evaluated, in which peptide cleavage could occur at any amino acid. Peptide scoring conditions for nonspecific digestions must be more stringent because a greater number of theoretical peptides may be generated. To account for sequence variability and post-translational modifications, several common peptide modifications were added to the search parameters: N-terminus acetylation (42.011 Daltons), deamidation of asparagine and glutamine (0.984 Daltons), and oxidation of methionine (15.995 Daltons). Results were validated by limiting peptide identification to peptides with three or more consecutive fragment ions from the same series. An example of data preparation and workflow parameter conditions for DDA analyzed samples is given in Table 2.2.
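The cleavage rules and missed-cleavage settings described above can be illustrated with a small in silico digestion sketch. It is not the PLGS digestion engine; the function names and the `block_before_proline` flag are hypothetical, and the text does not state whether a proline rule applies to the chymotrypsin reagents, so the sketch leaves it off for them.

```python
# Cleavage residues as stated in the text (cleavage is C-terminal to each).
TRYPSIN = "KR"
CHYMOTRYPSIN = "FYW"
CHYMOTRYPSIN_NS = "FYWML"   # modified reagent: adds methionine and leucine


def cleavage_sites(sequence, residues, block_before_proline=False):
    """Return the indices at which the sequence is cut."""
    sites = []
    # The final residue is skipped: cutting after it yields no new peptide.
    for i, aa in enumerate(sequence[:-1]):
        if aa in residues:
            if block_before_proline and sequence[i + 1] == "P":
                continue
            sites.append(i + 1)
    return sites


def digest(sequence, residues, missed_cleavages=0,
           block_before_proline=False):
    """Return all peptides with up to `missed_cleavages` uncut sites."""
    bounds = ([0]
              + cleavage_sites(sequence, residues, block_before_proline)
              + [len(sequence)])
    peptides = []
    for start in range(len(bounds) - 1):
        for extra in range(missed_cleavages + 1):
            end = start + 1 + extra
            if end >= len(bounds):
                break
            peptides.append(sequence[bounds[start]:bounds[end]])
    return peptides


# Toy sequence: the HPC4 epitope tag quoted earlier in this section.
tag = "EDQVDPRLIDGK"
tryptic = digest(tag, TRYPSIN, missed_cleavages=1,
                 block_before_proline=True)
chymo_ns = digest(tag, CHYMOTRYPSIN_NS)
```

For this toy sequence the tryptic digest with one missed cleavage gives EDQVDPR, EDQVDPRLIDGK, and LIDGK, while chymotrypsin_NS cuts after the leucine to give EDQVDPRL and IDGK. Traditional chymotrypsin finds no F, Y, or W and leaves the tag intact, which shows why the added M and L specificity increases the number of candidate peptides.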
In addition to the databank search for DDA processed samples, AutoMod processing allows peptides with modifications, substitutions, or more than three missed cleavages to be identified against the same databank. If peptides are detected that do not meet the search criteria of the specified databank, a De Novo query can be applied, in which sequence information is extracted from the MSMS data and searched with BLAST (Basic Local Alignment Search Tool) to match homologous proteins.
For most data sets, DDA PLGS 2.4 processing used a lock mass corrected data preparation file with no background subtraction and fast deisotoping of centroided data. The fragment ion workflow template was limited to a databank search under the appropriate proteolytic digestion conditions, with the peptide and fragment ion tolerances set to 20 ppm and 0.5 Daltons, respectively. Variable peptide modifications, including N-terminus acetylation, deamidation of asparagine and glutamine, and oxidation of methionine, were also evaluated.
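To make the tolerance and modification settings concrete, the sketch below shows how a 20 ppm peptide tolerance and the three variable modification masses quoted in this section might be applied when comparing an observed mass with a theoretical one. The function names and the one-modification-per-candidate simplification are assumptions for illustration and do not reproduce the PLGS scoring algorithm.

```python
PEPTIDE_TOL_PPM = 20.0    # peptide tolerance (from the text)
FRAGMENT_TOL_DA = 0.5     # fragment ion tolerance in Daltons (from the text)

# Variable modification mass shifts in Daltons (from the text).
VARIABLE_MODS = {
    "acetyl_n_term": 42.011,
    "deamidation_NQ": 0.984,
    "oxidation_M": 15.995,
}


def ppm_error(observed, theoretical):
    """Signed relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def within_peptide_tolerance(observed, theoretical,
                             tol_ppm=PEPTIDE_TOL_PPM):
    """True if the observed mass lies within the ppm window."""
    return abs(ppm_error(observed, theoretical)) <= tol_ppm


def within_fragment_tolerance(observed, theoretical,
                              tol_da=FRAGMENT_TOL_DA):
    """True if a fragment ion lies within the absolute Dalton window."""
    return abs(observed - theoretical) <= tol_da


def candidate_masses(base_mass, mods=VARIABLE_MODS):
    """Unmodified mass plus one candidate per single modification."""
    candidates = {"unmodified": base_mass}
    for name, shift in mods.items():
        candidates[name] = base_mass + shift
    return candidates


# A 15 ppm deviation passes the 20 ppm window; 25 ppm does not.
ok = within_peptide_tolerance(1000.015, 1000.0)
too_far = within_peptide_tolerance(1000.025, 1000.0)
masses = candidate_masses(1000.0)
```

Because the peptide tolerance is relative, the absolute window widens with mass (0.02 Daltons at 1000 Daltons, 0.04 Daltons at 2000 Daltons), whereas the fragment ion window stays fixed at 0.5 Daltons.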