Black Swallowtail - larvae and butterfly
Why Proteomics?
Same Genome
Biological Complexity
• Yeast - a “simple” proteome
– 6,113 proteins = 344,855 tryptic peptides
– “is proteomics the most significant analytical challenge?”
• Qualitative analysis by LC/MS/MS
– need at least one peptide with at least 8 amino acids
• Quantitative analysis by LC/MS
– need at least one peptide pair from a differential expression study
• In theory, complete coverage of the yeast proteome at the protein level requires the analysis of “only” 6,113 peptides
• In reality, one needs multiple peptides from each protein
• Replicates needed for statistics at each time point and biological condition (4X+ for cell lines & inbred
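Peptide counts like the one above come from digesting every protein sequence in silico. Below is a minimal sketch of that step, assuming the common trypsin rule (cleave after K or R unless the next residue is P), no missed cleavages and no modifications, plus the slide's 8-residue minimum for identification. The sequence is a made-up toy string, not a real yeast protein.

```python
def tryptic_digest(sequence: str) -> list[str]:
    """Fully cleaved tryptic peptides: cut after K/R unless followed by P.

    Assumption: no missed cleavages and no modifications.
    """
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):  # C-terminal peptide that does not end in K/R
        peptides.append(sequence[start:])
    return peptides


def identifiable(peptides: list[str], min_length: int = 8) -> list[str]:
    """Keep peptides long enough to be useful for LC/MS/MS identification."""
    return [p for p in peptides if len(p) >= min_length]


if __name__ == "__main__":
    toy_protein = "MASTKPGLEVKRDNLLSAPEWFGKAR"  # hypothetical sequence
    peptides = tryptic_digest(toy_protein)
    print(peptides)                # ['MASTKPGLEVK', 'R', 'DNLLSAPEWFGK', 'AR']
    print(identifiable(peptides))  # ['MASTKPGLEVK', 'DNLLSAPEWFGK']
```

Only two of the four toy peptides pass the length filter. This illustrates why the number of tryptic peptides is far larger than the number that are actually useful for identifying each protein.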
Biologically Relevant Information
- key information required
• Changes as a result of biological challenge (disease, drug)
– expression level change?
– change in post-translational modification(s)?
– pattern of changes?
• Biological function(s) of identified proteins
– protein complex component?
– role in signaling/regulatory pathway?
– sub-cellular localization?
– multiple functions (different compartments, splice variants)
• Multiple types of information required to understand biological function
Expression Proteomics
Protein quantification
Interaction Proteomics
Protein-protein associations
Basics of Protein Identification
- same for proteomics as for other protein MS studies
- fundamental difference is the scale of analysis
[Figure: protein identification workflow; labels: proteome, protein, digest, HPLC, MALDI MS, MALDI MS/MS, ESI MS/MS, genome and EST databases, gene product I.D.]
High Throughput Proteomics
• Success in proteomics means analyzing all of the biologically relevant proteins
– obtaining high proteome coverage is more important than high throughput
– samples represent months of prior work by collaborators
– proteomic results will lead to months of work by collaborators
• Need detailed analysis, not just high throughput
• Obtaining high-quality data is the key to success in proteomics
What is High Information Content in Proteomics?
• High sample throughput
– number of replicate analyses per sample
• meaningful analytical statistics
– number of biological samples per group
• meaningful biological statistics
• High proteome coverage
– number of proteins identified
• High protein coverage
– number of peptides identified
• PTMs, splice variants
• High information content
– improved biological knowledge
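"High protein coverage" is often reported as sequence coverage, meaning the fraction of a protein's residues covered by identified peptides. Below is a minimal sketch, assuming peptides are placed by exact string matching, so every occurrence counts. The protein and peptide strings are hypothetical toy data.

```python
def sequence_coverage(protein: str, peptides: list[str]) -> float:
    """Fraction of residues in `protein` covered by at least one identified peptide.

    Assumption: peptides are located by exact string matching; all occurrences count.
    """
    if not protein:
        return 0.0
    covered = [False] * len(protein)
    for peptide in {p for p in peptides if p}:  # ignore duplicates and empty strings
        start = protein.find(peptide)
        while start != -1:
            for i in range(start, start + len(peptide)):
                covered[i] = True
            start = protein.find(peptide, start + 1)
    return sum(covered) / len(protein)


if __name__ == "__main__":
    toy_protein = "MASTKPGLEVKRDNLLSAPEWFGKAR"   # hypothetical sequence
    identified = ["MASTKPGLEVK", "DNLLSAPEWFGK"]  # hypothetical identifications
    print(f"{sequence_coverage(toy_protein, identified):.1%}")  # 88.5%
```

More identified peptides per protein raise this number. This is also what makes PTM sites and splice-variant-specific regions visible.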
Interaction Proteomics
- protein interaction networks
Characterization of protein interactions in order to understand the function of individual proteins, protein complexes, and protein interaction networks
Multi-subunit Complexes
[Figure: nuclear receptor complexes; activated (+ ligand): p/CIP, CBP/p300, NCoA, P/CAF; repressed (- ligand): NCoR/SMRT, Sin3, HDAC-1]
Biochemical Pathway
[Figure: yeast MAP kinase pathway; Gβ, Gγ, STE20 (PAK homologue), STE11 (MAPKKK), STE7 (MAPKK), KSS1 and FUS3 (MAPK)]
Interaction Proteomics Using Mass Spectrometry
[Figure: workflow: endogenous or exogenous bait protein (plasmid expressing epitope-tagged bait protein) → isolate protein complex → inspect on 1-D gel → in-gel or solution protease digest → analyze components by LC/MS/MS → database search: match peptide sequences to protein sequences]
Interaction Proteomics
- improving information content via double-tagging strategy
- minimize non-specific protein binding
[Figure: double-tagging scheme; labels: TARGET protein with HIS tag, HA tag and TEV protease cleavage site; cell expression; 1st affinity column (nickel agarose beads); TEV protease cleavage; 2nd affinity column (coupled HA11 beads); elution]
“Systematic Identification of Protein Complexes
in Saccharomyces cerevisiae by Mass
Spectrometry”
Nature, 415, 180, 2002
Kss1 = MAP kinase
Blue arrows = known interactions
Red arrows = new interactions
“Functional organization of the yeast
proteome by systematic analysis
of protein complexes”
Nature, 415, 141, 2002
Figure 4: The protein complex network and grouping of connected complexes. Links were established between complexes sharing at least one protein.
Cellular roles:
red - cell cycle
dark green - signalling
dark blue - transcription, DNA maintenance, chromatin structure
pink - protein and RNA transport
orange - RNA metabolism
light green - protein synthesis and turnover
brown - cell polarity and structure
violet - intermediate and energy metabolism
light blue - membrane biogenesis and traffic
The lower panel is an example of a complex linked to two other complexes by shared components.
It illustrates the connection between the protein and complex levels of organization.
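The rule in the Figure 4 caption, linking two complexes whenever they share at least one protein, can be sketched as a small graph construction followed by a search for connected groups. The complex names and memberships below are hypothetical. Depth-first traversal is just one way to find the groups, not necessarily how the published network was built.

```python
from itertools import combinations


def complex_links(complexes: dict[str, set[str]]) -> set[tuple[str, str]]:
    """Link every pair of complexes that shares at least one protein."""
    return {
        (a, b)
        for a, b in combinations(sorted(complexes), 2)
        if complexes[a] & complexes[b]
    }


def connected_groups(complexes: dict[str, set[str]],
                     links: set[tuple[str, str]]) -> list[set[str]]:
    """Group complexes that are connected through chains of shared proteins."""
    neighbours: dict[str, set[str]] = {name: set() for name in complexes}
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen: set[str] = set()
    groups: list[set[str]] = []
    for start in sorted(complexes):
        if start in seen:
            continue
        group, stack = set(), [start]  # depth-first traversal from `start`
        while stack:
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(neighbours[node] - group)
        seen |= group
        groups.append(group)
    return groups


if __name__ == "__main__":
    # Hypothetical complexes: C2 shares P2 with C1 and P3 with C3; C4 is isolated.
    complexes = {"C1": {"P1", "P2"}, "C2": {"P2", "P3"},
                 "C3": {"P3", "P4"}, "C4": {"P5"}}
    links = complex_links(complexes)
    print(sorted(links))                        # [('C1', 'C2'), ('C2', 'C3')]
    print(connected_groups(complexes, links))
```

In this toy case C2 plays the role of the complex in the lower panel, linked to two other complexes by shared components.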
Expression Proteomics
- 2D gels and mass spectrometry platform
...and so on, maybe 500 times
[Figure: workflow: separation & quantitation (1D- or 2D-gel) → automated spot cutting (spots 1, 2, 3 … 550) → automated in-gel digestion → MALDI/MS (peptide mass only; peptide mass fingerprint) or LC/MS/MS (peptide mass and sequence information) → database search]
1 spot or band comprises many proteins
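The MALDI/MS branch of this workflow, peptide mass fingerprinting, matches measured peptide masses against masses predicted from database sequences. Below is a minimal sketch with several assumptions: fully cleaved tryptic peptides, no modifications, monoisotopic singly protonated ions ([M+H]+), and a hypothetical 50 ppm tolerance. Scoring is a plain count of matched masses, which is a simplification; search engines use probability-based scores such as MOWSE. The candidate names and sequences are made up.

```python
# Monoisotopic residue masses (Da); Leu and Ile are isobaric.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # added once per peptide (free N- and C-terminus)
PROTON = 1.007276   # charge carrier for the [M+H]+ ion


def tryptic_digest(sequence: str) -> list[str]:
    """Cut after K/R unless followed by P; no missed cleavages (assumption)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        nxt = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and nxt != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def mh_mass(peptide: str) -> float:
    """Monoisotopic [M+H]+ of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def pmf_score(observed: list[float], protein: str, tol_ppm: float = 50.0) -> int:
    """Count observed masses that match any theoretical peptide within tol_ppm."""
    theoretical = [mh_mass(p) for p in tryptic_digest(protein)]
    return sum(
        1 for mass in observed
        if any(abs(mass - t) / t * 1e6 <= tol_ppm for t in theoretical)
    )


def rank_candidates(observed: list[float], database: dict[str, str],
                    tol_ppm: float = 50.0) -> list[tuple[str, int]]:
    """Rank candidate proteins by number of matched peptide masses."""
    scores = [(name, pmf_score(observed, seq, tol_ppm)) for name, seq in database.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    database = {  # hypothetical candidate sequences
        "toy_A": "MASTKPGLEVKRDNLLSAPEWFGKAR",
        "toy_B": "GGSAWLLNDRPTEVKFFR",
    }
    observed = [mh_mass("MASTKPGLEVK"), mh_mass("DNLLSAPEWFGK")]  # simulated peaks
    print(rank_candidates(observed, database))  # [('toy_A', 2), ('toy_B', 0)]
```

A spot or band containing several proteins would give several candidates with non-zero scores. This is one reason the LC/MS/MS branch, which adds sequence information, is useful.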
2D Gel Electrophoresis
• Most powerful analytical method for protein separation
– multi-dimensional separation technique
• IEF and PAGE
– very sensitive detection schemes
• SyproRuby fluorescence staining
– rugged quantitation
– useful for visualization of PTM changes
– sometimes described as ‘low throughput’
• not true - can be higher throughput than ‘non-gel based proteomics’
– parallel processing of gels
– analysis of only proteins undergoing expression change
• Challenges with 2D gels
– specialized gels required for ‘extreme proteins’
Expression Proteomics
[Figure: 2D gels, wild type vs knockout; labels: down-regulated spot, missing spot]
Improved Information Content
- reproducibility & meaningful biological statistics
[Figure: replicate 2D PAGE gels, pH 4 to pH 7]
Improved sample throughput by running gel samples in parallel
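Replicate gels are what turn a visual "down-regulated spot" into a statistic. Below is a minimal sketch for one matched spot, assuming spot volumes are already normalized across gels. It computes the fold change of mean volumes and Welch's unequal-variance t statistic with its Welch-Satterthwaite degrees of freedom. The four-replicate values are hypothetical, chosen to echo the 4X+ replicates mentioned earlier. A p-value would additionally need the t distribution, for example via SciPy's `scipy.stats.ttest_ind(..., equal_var=False)`.

```python
import math
from statistics import mean, variance


def fold_change(control: list[float], treated: list[float]) -> float:
    """Ratio of mean normalized spot volumes (treated / control).

    A missing spot in the treated group gives 0; a missing control spot
    (mean 0) would raise ZeroDivisionError and needs separate handling.
    """
    return mean(treated) / mean(control)


def welch_t(control: list[float], treated: list[float]) -> float:
    """Welch's t statistic; assumes >= 2 replicates per group and non-zero variance."""
    se = math.sqrt(variance(control) / len(control) + variance(treated) / len(treated))
    return (mean(treated) - mean(control)) / se


def welch_df(control: list[float], treated: list[float]) -> float:
    """Welch-Satterthwaite degrees of freedom for the same comparison."""
    a = variance(control) / len(control)
    b = variance(treated) / len(treated)
    return (a + b) ** 2 / (a ** 2 / (len(control) - 1) + b ** 2 / (len(treated) - 1))


if __name__ == "__main__":
    wild_type = [10.0, 11.0, 9.0, 10.0]  # hypothetical normalized volumes, 4 gels
    knockout = [5.0, 6.0, 4.0, 5.0]
    print(fold_change(wild_type, knockout))           # 0.5
    print(round(welch_t(wild_type, knockout), 3))     # -8.66
    print(round(welch_df(wild_type, knockout), 3))    # 6.0
```

Running the gels in parallel keeps the replicates under comparable conditions, which is what makes the variance estimates meaningful.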
Improved Sample Throughput
- automated spot cutting
[Figure: automated spot cutter; labels: CCD camera, cutting head, barcode reader, LED (excitation 480 nm, emission 520 nm LP), gel, cutting head cleaner]
• cuts blots, wet and dry gels
Expression Proteomics
- industrialized
(GeneProt, Geneva)
• GeneProt’s approach
– two serum/plasma samples
– 5 liters each
– 60,000 LC fractions analyzed by MS
– 320,000 2D gel spots analyzed by MS
– 50+ mass spectrometers
– 1,462 processor server
– 6 months
[Figure: 1,462 processor server; 40+ ion trap MS/MS systems with multiplexed LCs, 2 LCs per MS for improved throughput]
Expression Proteomics
- developing methods
• Isotope coding
– qualitative and quantitative analysis by MS and MS/MS
– different isotope codes
• isotope coded affinity tags – cys labeling
• 18O tagging
– via digestion in 18O water
• CH3 and CD3 ester formation
• Direct LC/MS
– peptide mass mapping
– ion mapping
– AMRTs
Graphic from M. Mann editorial
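The isotope codes above work because light and heavy forms of the same peptide co-elute but appear as a peak pair separated by a predictable m/z gap. The ratio of the two peak intensities gives relative abundance. Below is a minimal sketch with the following assumptions:

- the original d0/d8 ICAT reagent, one tag per cysteine;
- CH3 vs CD3 methyl esters on the C-terminus and every Asp/Glu side chain;
- full incorporation of two 18O atoms at each new C-terminus during digestion.

The scheme names (`icat_d8`, `methyl_ester_d3`, `o18_digest`) are hypothetical labels, not a library API.

```python
# Isotope mass differences (Da), from monoisotopic atomic masses.
D_MINUS_H = 2.0141018 - 1.0078250        # deuterium vs hydrogen
O18_MINUS_O16 = 17.9991610 - 15.9949146  # 18O vs 16O

# Heavy-minus-light mass shift per label; keys are hypothetical scheme names.
SHIFT_PER_LABEL = {
    "icat_d8": 8 * D_MINUS_H,          # original ICAT reagent, d0/d8
    "methyl_ester_d3": 3 * D_MINUS_H,  # CH3 vs CD3 ester
    "o18_digest": 2 * O18_MINUS_O16,   # two 18O atoms, full exchange assumed
}


def label_count(peptide: str, scheme: str) -> int:
    """Number of labelled sites on a peptide under each assumed scheme."""
    if scheme == "icat_d8":
        return peptide.count("C")  # one tag per cysteine
    if scheme == "methyl_ester_d3":
        return 1 + peptide.count("D") + peptide.count("E")  # C-terminus + acidic side chains
    if scheme == "o18_digest":
        return 1  # one new C-terminus per peptide
    raise ValueError(f"unknown scheme: {scheme}")


def pair_spacing_mz(peptide: str, scheme: str, charge: int) -> float:
    """Expected m/z gap between the light and heavy forms of one peptide."""
    return label_count(peptide, scheme) * SHIFT_PER_LABEL[scheme] / charge


def heavy_to_light(light_intensity: float, heavy_intensity: float) -> float:
    """Relative abundance from the MS peak pair (heavy / light)."""
    return heavy_intensity / light_intensity


if __name__ == "__main__":
    # Hypothetical peptide with one Asp and one Glu, observed at charge 2+.
    print(round(pair_spacing_mz("DNLLSAPEWFGK", "methyl_ester_d3", 2), 4))  # 4.5282
    print(heavy_to_light(2.0e5, 1.0e5))                                      # 0.5
```

The label count matters because a fixed per-label shift only fixes the pair spacing once you know how many sites a given peptide carries. Dividing by charge is needed because peaks are observed on an m/z scale.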
Improved Proteome Coverage
- use of multiple analysis methods
• How can analytical coverage of the proteome be improved?
– Use both non-gel based and gel-based approaches, which are complementary
• Non-gel based “Shotgun Proteomics”
– total digest of complex protein mixture
– multidimensional chromatography and tandem mass spectrometry
• LC/LC/MS/MS
Improved Proteome Coverage
- use of multiple analysis methods
- shotgun proteomics and 2D Gels
• Shotgun proteomics (LC/LC/MS/MS): minimal bias against specific protein classes
– total proteolytic digestion yields tractable peptides from
− very large and very small proteins
− acidic and basic proteins (typical 2D gel range is pI 4-7)
− hydrophobic proteins (e.g. membrane proteins)
– qualitative information contained in MS/MS spectrum
– quantitative information contained in MS spectrum
− significantly facilitated with use of isotope coded tags
• 2D gels may provide higher throughput
– batch processing of samples in parallel
Improved Proteome Coverage
- shotgun proteomics using LC/LC/MS/MS
- analysis of nuclear fraction of cancer cell line
[Figure: workflow: tissue fraction (nuclear fraction) → LC 1 fractions (IEX, 40 fractions) → LC 2 fractions (RP, 3 hr gradient) → MS 1 and MS 2 (60,000+ peptides analyzed by MS and MS/MS) → database search → results correlation per LC 2 fraction (1,500 peptides per fraction), per LC 1 fraction (60 - 400 proteins per fraction) and per tissue fraction → tissue knowledge]
1,898 proteins identified in nuclear extract
- 120 hour MS/MS acquisition
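The headline numbers on this slide are internally consistent. Assuming each of the 40 IEX fractions receives one 3 hr RP gradient run and yields about 1,500 peptides, the acquisition time and peptide count multiply out as:

```latex
\begin{aligned}
t_{\text{MS/MS}} &= 40\ \text{IEX fractions} \times 3\ \text{h per RP gradient} = 120\ \text{h} \\
N_{\text{peptides}} &\approx 40\ \text{fractions} \times 1{,}500\ \text{peptides per fraction} = 60{,}000
\end{aligned}
```

The second line matches the 60,000+ peptides analyzed by MS and MS/MS. It also shows that proteome depth here is bought mainly with instrument time.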
Improved Proteome Coverage
- multiple ionization sources for LC/MS/MS
• Post LC column split between ESI and MALDI
• Option 1 - both systems in automated MS/MS mode
– more comprehensive direct analysis
• better proteome coverage
• Option 2 - targeted analysis
– initial ESI/MS/MS analysis
• process and interpret data
– follow-up MALDI/MS/MS analysis
[Figure: nanoscale LC/LC → splitter (20% / 80%) → ESI/MS/MS and MALDI plate spotting → MALDI/MS/MS]
Improved Proteome Coverage
- Probot LC/MALDI interface
Improved Proteome Coverage
- use of multiple ionization sources for MS/MS
- 51 mitochondrial ribosomal proteins
[Figure: Venn diagram of the 51 proteins: 8 unique by LC/ESI/MS/MS, 32 identified by both, 11 unique by LC/MALDI/MS/MS]
78% identified by LC/ESI/MS/MS (16% unique)
84% identified by LC/MALDI/MS/MS (22% unique)
Significant overlap between the two datasets (63%)
Significant additional information obtained (37%)
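The percentages above follow directly from the Venn counts (8 ESI-only, 32 shared, 11 MALDI-only, 51 in total). Below is a minimal sketch that reproduces them with set operations, assuming each percentage is taken relative to all proteins found by either method. The protein identifiers are hypothetical placeholders, generated only to match the slide's counts.

```python
def overlap_summary(esi: set[str], maldi: set[str]) -> dict[str, int]:
    """Rounded percentages, relative to all proteins found by either method."""
    total = len(esi | maldi)

    def pct(n: int) -> int:
        return round(100 * n / total)

    return {
        "total": total,
        "esi_identified": pct(len(esi)),
        "esi_unique": pct(len(esi - maldi)),
        "maldi_identified": pct(len(maldi)),
        "maldi_unique": pct(len(maldi - esi)),
        "overlap": pct(len(esi & maldi)),
        "additional": pct(len(esi ^ maldi)),  # found by only one of the two methods
    }


if __name__ == "__main__":
    # Hypothetical IDs reproducing the slide's counts: 32 shared, 8 ESI-only, 11 MALDI-only.
    shared = {f"shared_{i}" for i in range(32)}
    esi = shared | {f"esi_only_{i}" for i in range(8)}
    maldi = shared | {f"maldi_only_{i}" for i in range(11)}
    print(overlap_summary(esi, maldi))
    # {'total': 51, 'esi_identified': 78, 'esi_unique': 16, 'maldi_identified': 84,
    #  'maldi_unique': 22, 'overlap': 63, 'additional': 37}
```

The "additional information" figure is the symmetric difference of the two sets, meaning the proteins that only one ionization source contributed.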
Conclusions
• Proteomics is a much bigger challenge than most people realize
– easy to identify high-abundance proteins; essentially impossible to get complete coverage of complex mixtures with current technologies
– “genes were easy”
• Multiple integrated approaches are needed to provide high information content to biological studies
– genomic and proteomic methods
– multiple proteomic methods
• gel-based and non-gel based studies
– sample depletion and fractionation methods needed
• multiple analytical methods for protein ID
• This integrated approach yields a large amount of data
– true success requires integration of different datasets
• requires significant bioinformatic resources