Phosphoproteomics - Development and application of software and algorithms for network approach

An estimated one third of eukaryotic proteins are phosphorylated [Cohen, 2000], which makes phosphorylation one of the key PTMs. In an active balance, kinases and phosphatases rapidly and reversibly modify proteins in order to alter their function [Hunter, 1995].

While phosphorylation in signaling often acts in an activating or deactivating manner [Macek et al., 2009], it can modulate almost all protein functions, including localization, half-life and interactions [Cohen, 2000]. Disruption of phosphorylation-mediated signaling leads to diseases such as cancer, which in turn makes kinases and phosphatases attractive drug targets [Ventura and Nebreda, 2006].

Since the detection of the first phosphorylation site on vitellin in 1906 by Levene and Alsberg, phosphorylation sites have been studied one site at a time. However, only an unbiased and comprehensive measurement of the phosphoproteome would allow the understanding of the effects of protein phosphorylation at a systems level. MS provides an ideal platform to study PTMs such as phosphorylation, which introduce a detectable mass shift at their modification site. The main challenges are not only to measure the phosphoproteome in a reproducible, high-throughput and deep-coverage manner, but also to adapt and extend the subsequent statistical analysis.

Figure 1.5: Node-link visualization of a large cluster of proteins from a yeast PPI network generated using the Y2H method [Uetz et al., 2000]. Protein nodes are colored according to the phenotype observed when removed (red, lethal; green, non-lethal; orange, slow growth; yellow, unknown). Adapted from [Jeong et al., 2001].

Due to the low stoichiometry of phosphorylation [Riley and Coon, 2016], phosphopeptides are hard to detect in regular proteomics experiments. Therefore, phosphoproteomic experiments are optimized to increase the coverage of phosphopeptides. First, digestion can be improved by using additional proteases, which alleviate issues with suboptimal trypsin cleavage close to phosphorylated amino acids [Wiśniewski and Mann, 2012]. Second, a phosphopeptide enrichment step greatly increases the chances of detecting low-abundance phosphopeptides (see Figure 1.6). For smaller-scale experiments, phosphopeptides can be enriched by immunoprecipitation (IP) with antibodies targeting phosphorylated residues [Grønborg et al., 2002]. TiO₂ enrichment is better suited for large-scale experiments. Here, the oxygen in the phosphoryl group interacts with the metal oxide matrix [Thingholm et al., 2006]. Alternatively, immobilized metal affinity chromatography (IMAC) [Villén and Gygi, 2008] exploits the affinity of the negatively charged phosphate group on the peptide for the positively charged metal cations in the column. Extensive prefractionation can increase the coverage of phosphopeptides further, at the cost of additional labor and measurement time. Recent streamlined protocols in 96-well format coupled with single-shot LC-MS enable high-throughput measurements of a large number of phosphoproteomic samples [Humphrey et al., 2015, 2018]. For the quantification of phosphoproteomics experiments, label-free as well as metabolic and isobaric labeling have been employed successfully [Hogrebe et al., 2018].

Compared to the identification of unmodified peptides from MS² spectra, the identification of the modified phosphopeptide and the localization of the modification on the peptide sequence require special considerations [Potel et al., 2018, Sinitcyn et al., 2018a]. In a database search framework, extending the sequence database with modified sequences allows for the identification of modified peptides. The mass shift on parts of the fragment ion series, alongside modification-specific neutral losses and diagnostic peaks, is considered when generating theoretical spectra. A final localization probability can be derived from the scores assigned to the theoretical fragment spectra of the different possible localizations [Beausoleil et al., 2006, Cox et al., 2011].
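The two steps described above, enumerating the possible localizations and turning their scores into a localization probability, can be sketched as follows. This is a minimal illustration under stated assumptions, not the scoring scheme of any particular search engine: the peptide, the scores and the function names are hypothetical, only S, T and Y are treated as acceptor residues, and the scores are assumed to lie on a -10·log10 scale so that 10^(score/10) can serve as a weight.

```python
from itertools import combinations

PHOSPHO_MASS_SHIFT = 79.96633  # monoisotopic mass added by one phosphorylation (HPO3), in Da
PHOSPHO_RESIDUES = "STY"       # assumption: only S/T/Y are considered as acceptor residues


def candidate_localizations(peptide, n_phospho):
    """Enumerate every placement of n_phospho groups on S/T/Y (0-based positions)."""
    acceptors = [i for i, aa in enumerate(peptide) if aa in PHOSPHO_RESIDUES]
    return list(combinations(acceptors, n_phospho))


def localization_probabilities(scores):
    """Normalize per-candidate scores into probabilities.

    Assumption: scores are on a -10*log10 scale, so 10**(score/10) is used as weight.
    Subtracting the best score first keeps the exponent small.
    """
    best = max(scores.values())
    weights = {cand: 10 ** ((s - best) / 10) for cand, s in scores.items()}
    total = sum(weights.values())
    return {cand: w / total for cand, w in weights.items()}


def site_probabilities(candidate_probs):
    """Sum the probabilities of all candidates that carry a phosphate at each position."""
    per_site = {}
    for positions, prob in candidate_probs.items():
        for pos in positions:
            per_site[pos] = per_site.get(pos, 0.0) + prob
    return per_site


# Hypothetical singly phosphorylated peptide with three possible sites (S2, T4, Y6).
peptide = "AASPTLYK"
candidates = candidate_localizations(peptide, n_phospho=1)
# Hypothetical scores of the theoretical spectra against one measured spectrum.
scores = dict(zip(candidates, [40.0, 20.0, 10.0]))
probs = localization_probabilities(scores)
sites = site_probabilities(probs)
precursor_shift = len(candidates[0]) * PHOSPHO_MASS_SHIFT

for pos, p in sorted(sites.items()):
    print(f"{peptide[pos]}{pos + 1}: {p:.3f}")
```

Because every candidate carries the same number of phosphate groups, the precursor mass shift is identical for all of them; only the fragment ions that contain the modified residue differ, which is why the localization has to be decided from the fragment spectrum.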

[Figure 1.6 panel labels: digestion; phosphorylated peptides; antibody-based enrichment (mAb); ionic interaction (TiO₂)]

Figure 1.6: The phosphoproteomics workflow includes an additional phosphopeptide enrichment step prior to MS analysis. Adapted from [Hein et al., 2013].

When complementary proteome data is available, stoichiometry can be derived from the measured intensities by calculating the ratios between the modified and unmodified peptide [Cox and Mann, 2008, Olsen et al., 2010, Sharma et al., 2014].
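One way to make this concrete is sketched below. It is not necessarily the exact procedure of the cited works. The sketch assumes a single site that is either phosphorylated or not, two conditions, and three measured ratios (condition 2 over condition 1): x for the modified peptide, y for its unmodified counterpart and z for the protein. With M and U denoting the modified and unmodified amounts in condition 1, conservation of the protein amount gives x·M + y·U = z·(M + U), hence M/U = (z - y)/(x - z). The function name and the example numbers are hypothetical.

```python
def occupancy_from_ratios(ratio_mod, ratio_unmod, ratio_protein):
    """Estimate site occupancy in two conditions from three ratios.

    All ratios are condition 2 / condition 1. Assumes one site with two states
    (phosphorylated, unphosphorylated) and noise-free ratios.
    Returns (occupancy_condition_1, occupancy_condition_2).
    """
    denominator = ratio_mod - ratio_protein
    if denominator == 0:
        raise ValueError("occupancy is undetermined when the modified peptide "
                         "changes exactly like the protein")
    # Amount of modified relative to unmodified peptide in condition 1 (M/U).
    mod_over_unmod = (ratio_protein - ratio_unmod) / denominator
    if mod_over_unmod < 0:
        raise ValueError("ratios are inconsistent with a two-state model")
    occupancy_1 = mod_over_unmod / (1 + mod_over_unmod)
    scaled_mod = ratio_mod * mod_over_unmod
    occupancy_2 = scaled_mod / (scaled_mod + ratio_unmod)
    return occupancy_1, occupancy_2


# Hypothetical example: 20% occupancy rising to 60% at constant protein level.
# Modified peptide triples (20 -> 60), unmodified peptide halves (80 -> 40).
occ_1, occ_2 = occupancy_from_ratios(ratio_mod=3.0, ratio_unmod=0.5, ratio_protein=1.0)
print(f"occupancy condition 1: {occ_1:.2f}, condition 2: {occ_2:.2f}")
```

Working with ratios rather than raw intensities avoids having to assume that the modified and unmodified peptide ionize equally well, but measurement noise in any of the three ratios can make the result unstable when the modified peptide ratio is close to the protein ratio.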

The identification of large numbers of phosphorylation sites has been successfully used for evolutionary studies. Phosphorylation sites were found to be conserved when occurring in structured regions of the protein, and rapidly evolving when located in unstructured regions [Collins, 2009]. To study the regulatory effect of phosphorylation, a quantitative analysis of the up to 50,000 sites detected by MS is required [Sharma et al., 2014]. The first of the two core challenges in the analysis of phosphoproteomic data is the increased variability inherent to peptide-level data. Protein-level proteome analyses benefit from peptide-to-protein aggregation, leading to higher-confidence data [Sinitcyn et al., 2018a]. The second challenge is the translation of the observed changes in phosphorylation patterns into mechanistic insights, since, in general, phosphorylation sites can exhibit excitatory, inhibitory or even combinatorial effects on the function of the substrate [Macek et al., 2009].

While many phosphoproteomic studies still pursue analysis approaches taken from proteome analysis, such as identifying differential expression or clustering on a phos-

Figure 1.7: Kinase-substrate networks can be represented as either directed, site-specific networks which naturally arise from kinase-substrate interactions or kinase motifs, or as undirected protein-level interactions modeled after large-scale PPI networks. Adapted from [Rudolph and Cox, 2018].

One core concept that is exploited by most of these analyses is the relationship between kinases and their substrates, which allows the aims of the analysis to be restated. Instead of focusing on the individual phosphorylation sites, the aim is to identify regulated kinases from large-scale phosphoproteomic data. Changes in the phosphorylation levels of the substrates imply corresponding changes in the activity of the kinase.

1.4.1 Kinase-substrate networks and kinase activities

A prerequisite for a kinase-centric analysis of phosphoproteomic data is the assignment of kinases to their specific substrates. To do so, phosphorylation events can be naturally described as a directed interaction between a kinase and a specific site on the substrate (Figure 1.7), forming a site-specific kinase-substrate network. Only a limited number of such site-specific interactions are available in databases such as PhosphoSitePlus [Hornbeck et al., 2015]. However, in order to obtain a comprehensive picture of kinase activity, a high coverage of kinases and sites would be required.

Computational tools address this issue. The statistical analysis of the peptide sequences identified in large-scale phosphoproteomics experiments allows for the identification of phosphorylation motifs, which can serve as a proxy for a kinase [Schwartz and Gygi, 2005]. The NetPhorest tool [Miller et al., 2008] integrates motif finding with the phylogeny of kinases to pinpoint the kinase family of a phosphorylation site. The KinomeXplorer platform [Horn et al., 2014] combines NetPhorest with the network context modeling approach of NetworKIN [Linding et al., 2008] to predict human kinase-substrate interactions, which can be used to supplement kinase-substrate networks from literature.
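How a motif can stand in for a kinase is illustrated by the following sketch. It does not perform motif discovery and does not reproduce any of the tools named above. It only shows how sequence windows centered on the phosphorylated residue can be assigned to motif classes once the motifs are known. The three patterns are simplified textbook-style motif classes chosen for illustration, and the sequence windows are hypothetical.

```python
import re
from collections import Counter

WINDOW_CENTER = 7  # 15-residue window, phosphorylated residue at 0-based index 7

# Simplified, illustrative motif classes (assumption, not a curated motif set).
# Each pattern is anchored so that [ST] falls on the window center.
MOTIFS = {
    "proline-directed": re.compile(r"^.{7}[ST]P"),     # [S/T]-P
    "basophilic": re.compile(r"^.{4}R..[ST]"),         # R-x-x-[S/T]
    "acidophilic": re.compile(r"^.{7}[ST]..[DE]"),     # [S/T]-x-x-[D/E]
}


def assign_motifs(window):
    """Return the names of all motif classes matched by a centered sequence window."""
    if len(window) != 2 * WINDOW_CENTER + 1:
        raise ValueError("expected a 15-residue window centered on the site")
    return [name for name, pattern in MOTIFS.items() if pattern.match(window)]


# Hypothetical sequence windows around three phosphorylation sites.
windows = [
    "AAAARAASPAAAAAA",  # matches R-x-x-S and S-P
    "AAAAAAASAAEAAAA",  # matches S-x-x-E
    "AAAAAAASAAAAAAA",  # matches none of the motif classes
]
assignments = {w: assign_motifs(w) for w in windows}
motif_counts = Counter(name for names in assignments.values() for name in names)
print(motif_counts)
```

A site may match several motif classes or none at all, which is one reason why motif-based assignments are less specific than curated kinase-substrate interactions.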

Most tools for the scoring of kinase activities from phosphoproteomic data have a common approach. Using the kinase-substrate assignments from the network, an activity score is calculated for each kinase, kinase family or motif from the observed phosphorylation changes on the assigned sites. The KSEA [Casado et al., 2013] method includes a variety of scores, including a Z-score and its associated p-value, which performs well for relative comparisons between conditions [Hernandez-Armenta et al., 2017]. For comparisons within the same condition, KARP [Wilkes et al., 2017] ranks the kinases based on their contribution to the total observed phosphorylation. Computationally more involved approaches try to optimize binding affinity between kinases and substrates (IKAP [Mischnik et al., 2015]), or even derive an entire logic model from a phosphoproteomic screen of kinase inhibitors [Terfve et al., 2015].
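The common approach can be sketched with a simple Z-score in the spirit of KSEA. The exact definition used by the published method may differ. Here the mean log2 fold change of the sites assigned to a kinase is compared to the mean of all quantified sites and scaled by the standard deviation of all sites and the square root of the number of substrates, with a two-sided p-value taken from the standard normal distribution. Kinase names, site identifiers and fold changes are hypothetical.

```python
import math
from statistics import mean, stdev


def kinase_z_scores(site_log2fc, kinase_substrates):
    """Score kinases from site-level log2 fold changes.

    site_log2fc: {(substrate, site): log2 fold change}
    kinase_substrates: {kinase: [(substrate, site), ...]}, a directed,
        site-specific kinase-substrate network.
    Returns {kinase: (z_score, two_sided_p_value)} for kinases with measured sites.
    """
    all_values = list(site_log2fc.values())
    background_mean = mean(all_values)
    background_sd = stdev(all_values)
    results = {}
    for kinase, sites in kinase_substrates.items():
        observed = [site_log2fc[s] for s in sites if s in site_log2fc]
        if not observed:
            continue  # no assigned site was quantified for this kinase
        z = (mean(observed) - background_mean) * math.sqrt(len(observed)) / background_sd
        p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value, standard normal
        results[kinase] = (z, p)
    return results


# Hypothetical data: six quantified sites and three kinases.
site_log2fc = {
    ("SUB1", "S10"): 2.0, ("SUB1", "T20"): 1.5, ("SUB2", "S5"): -1.0,
    ("SUB3", "Y7"): -0.5, ("SUB4", "S3"): 0.0, ("SUB5", "S9"): -2.0,
}
kinase_substrates = {
    "KIN_A": [("SUB1", "S10"), ("SUB1", "T20")],
    "KIN_B": [("SUB2", "S5"), ("SUB5", "S9")],
    "KIN_C": [("SUB9", "S1")],  # substrate site not quantified in this data set
}
kinase_scores = kinase_z_scores(site_log2fc, kinase_substrates)
for kinase, (z, p) in kinase_scores.items():
    print(f"{kinase}: z = {z:+.2f}, p = {p:.3f}")
```

Kinases without any quantified substrate site are skipped rather than scored, which makes the dependence of such methods on the coverage of the kinase-substrate network directly visible.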

The requirement for site-specific interactions and the focus on deriving the enzymatic activity of kinases from the data limit the scope of such methods. While the phosphorylation itself is mediated by the kinases, non-kinase proteins contribute to signaling by enabling signal transduction through complex formation and scaffolding. The PHOTON [Rudolph et al., 2016, Rudolph and Cox, 2018] method devises a more general signaling functionality score which allows it to use large-scale PPI networks to perform the analysis. The undirected, site-agnostic nature of such networks (Figure 1.7) reduces the specificity of the approach, but dramatically increases the quality of the network and the amount of data that can be utilized in the analysis. The PHOTON score itself is calculated from the average phosphorylation changes observed on the
