Introduction to Proteomics

(1)

(2)

Black Swallowtail - larvae and butterfly

Why Proteomics?

Same Genome

(3)

Biological Complexity

• Yeast - a “simple” proteome

– 6,113 proteins = 344,855 tryptic peptides

– “is proteomics the most significant analytical challenge?”

• Qualitative analysis by LC/MS/MS

– need at least one peptide with at least 8 amino acids

• Quantitative analysis by LC/MS

– need at least one peptide pair from a differential expression study

• In theory, complete coverage of the yeast proteome at the protein level requires the analysis of “only” 6,113 peptides

• In Reality, one needs multiple peptides from each protein

• Replicates needed for statistics at each time point and biological condition (4X+ for cell lines & inbred
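The peptide counts on this slide come from an in-silico tryptic digest. A minimal sketch of that calculation, assuming the common rule that trypsin cleaves after K or R unless the next residue is P, with no missed cleavages. The function names and the toy sequence are illustrative and not from the slides.

```python
def tryptic_digest(sequence):
    """Split a protein sequence after K or R, except when followed by P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide
    return peptides


def identifiable(peptides, min_length=8):
    """Keep peptides meeting the slide's 8-residue minimum for LC/MS/MS."""
    return [p for p in peptides if len(p) >= min_length]


toy_protein = "AAKPGGRCCCCCCCCKDD"  # hypothetical sequence, for illustration only
peptides = tryptic_digest(toy_protein)
print(peptides)                # ['AAKPGGR', 'CCCCCCCCK', 'DD']
print(identifiable(peptides))  # ['CCCCCCCCK']

# Average implied by the slide's yeast numbers (tryptic peptides per protein)
peptides_per_protein = round(344_855 / 6_113, 1)
print(peptides_per_protein)    # 56.4
```

Only one of the three toy peptides passes the length filter, which is why multiple peptides per protein are needed in practice.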

(4)

Biologically Relevant Information

- key information required

• Changes as a result of biological challenge (disease, drug)

– expression level change?

– change in post-translational modification(s)?

– pattern of changes?

• Biological function(s) of identified proteins

– protein complex component?

– role in signaling/regulatory pathway?

– sub-cellular localization?

– multiple functions (different compartments, splice variants)

• Multiple types of information required to understand biological function

(5)

Expression Proteomics

Protein quantification

Interaction Proteomics

Protein-protein associations

(6)

Basics of Protein Identification

- same for proteomics as other protein MS studies

- fundamental difference is scale of analysis

[Figure: protein identification workflow. Labels: PROTEOME, PROTEIN, DIGEST, HPLC, MALDI MS, MALDI MS/MS, ESI MS/MS, GENOME / EST dB, GENE PRODUCT I.D.]

(7)

High Throughput Proteomics

• Success in proteomics is the analysis of all of the biologically relevant proteins

– obtaining high proteome coverage is more important than high throughput

– samples represent months of prior work by collaborators

– proteomic results will lead to months of work by collaborators

• Need detailed analysis, not just high throughput

• Obtaining high quality data is the key to success in proteomics

(8)

What is High Information Content in Proteomics?

• High sample throughput

– number of samples replicated

• meaningful analytical statistics

– number of biological samples/group

• meaningful biological statistics

• High proteome coverage

– number of proteins identified

• High protein coverage

– number of peptides identified

• PTMs, splice variants

• High information content

– improved biological knowledge
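One common way to put a number on "high protein coverage" is sequence coverage: the fraction of a protein's residues covered by at least one identified peptide. The sketch below assumes that metric, which the slide does not name explicitly. The protein and peptide sequences are hypothetical.

```python
def sequence_coverage(protein, peptides):
    """Fraction of residues in `protein` covered by at least one peptide."""
    covered = [False] * len(protein)
    for peptide in peptides:
        if not peptide:
            continue  # an empty string would match everywhere
        start = protein.find(peptide)
        while start != -1:
            for i in range(start, start + len(peptide)):
                covered[i] = True
            start = protein.find(peptide, start + 1)
    return sum(covered) / len(protein) if protein else 0.0


protein = "MKTAYIAKQRQISFVK"        # hypothetical 16-residue protein
identified = ["TAYIAK", "QISFVK"]   # hypothetical identified peptides
coverage = sequence_coverage(protein, identified)
print(f"{coverage:.0%}")            # 75%
```

Higher sequence coverage raises the chance of observing PTM sites and splice-variant-specific peptides.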

(9)

Interaction Proteomics

- protein interaction networks

Characterization of protein interactions in order to understand the function of individual proteins, protein complexes, and protein interaction networks

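Multi-subunit Complexes

[Figure: nuclear receptor complexes. Labels: Nuclear Receptor, p/CIP, CBP/p300, NCoA, P/CAF, NCoR/SMRT, Sin3, HDAC-1; "- ligand: repressed", "+ ligand: activated"]

Biochemical Pathway

[Figure: yeast MAPK cascade. Labels: Gβ, Gγ, STE20 (PAK homologue), STE11 (MAPKKK), STE7 (MAPKK), KSS1 and FUS3 (MAPK)]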

(10)

Interaction Proteomics Using Mass Spectrometry

• Bait: endogenous or exogenous bait protein; plasmid expressing epitope-tagged bait protein

• Isolate protein complex

• Inspect on 1-D gel

• In-gel or solution protease digest

• Analyze components by LC/MS/MS

• Database search: match peptide sequences to protein sequences
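The last step, matching peptide sequences to protein sequences, can be sketched as a simple substring lookup. This is a deliberate simplification: real search engines score MS/MS spectra against theoretical spectra before any such assignment. The database entries, protein names, and peptides below are hypothetical.

```python
from collections import defaultdict


def match_peptides(peptides, database):
    """Map each protein name to the identified peptides found in its sequence."""
    hits = defaultdict(list)
    for peptide in peptides:
        for name, sequence in database.items():
            if peptide in sequence:
                hits[name].append(peptide)
    return dict(hits)


# Hypothetical protein database and peptide identifications
database = {
    "bait": "MKTAYIAKQRQISFVK",
    "prey_1": "GSHMLEDPVDAFK",
    "background": "AAAAGGGG",
}
identified = ["TAYIAK", "LEDPVDAFK", "QISFVK"]

hits = match_peptides(identified, database)
print(hits)  # {'bait': ['TAYIAK', 'QISFVK'], 'prey_1': ['LEDPVDAFK']}
```

Proteins that co-purify with the bait and collect peptide hits become candidate interaction partners.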

(11)

Interaction Proteomics

- improving information content via double-tagging strategy

- minimize non-specific protein binding

[Figure: double-tag purification. Labels: cell expression of TARGET carrying HIS tag, TEV protease cleavage site, and HA tag; 1st affinity column (nickel agarose beads); TEV protease cleavage; 2nd affinity column (coupled HA11 beads); elution]

(12)

“Systematic Identification of Protein Complexes in Saccharomyces cerevisiae by Mass Spectrometry”

Nature, 415, 180, 2002

Kss1 = MAP kinase

Blue arrows = known interactions

Red arrows = new interactions

(13)

“Functional organization of the yeast proteome by systematic analysis of protein complexes”

Nature, 415, 141, 2002

Figure 4: The protein complex network, and grouping of connected complexes. Links were established between complexes sharing at least one protein.

Cellular roles:

• red - cell cycle

• dark green - signalling

• dark blue - transcription, DNA maintenance, chromatin structure

• pink - protein and RNA transport

• orange - RNA metabolism

• light green - protein synthesis and turnover

• brown - cell polarity and structure

• violet - intermediate and energy metabolism

• light blue - membrane biogenesis and traffic

The lower panel is an example of a complex linked to two other complexes by shared components. It illustrates the connection between the protein and complex levels of organization.
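The linking rule in the caption (two complexes are connected when they share at least one protein) is easy to express in code. A minimal sketch with hypothetical complex and protein names; it is not the published analysis pipeline.

```python
from itertools import combinations


def complex_network(complexes):
    """Link every pair of complexes that share at least one protein."""
    links = {}
    for (a, members_a), (b, members_b) in combinations(sorted(complexes.items()), 2):
        shared = set(members_a) & set(members_b)
        if shared:
            links[(a, b)] = shared
    return links


# Hypothetical complexes and member proteins
complexes = {
    "complex_A": {"P1", "P2", "P3"},
    "complex_B": {"P3", "P4"},
    "complex_C": {"P5"},
    "complex_D": {"P4", "P1"},
}

links = complex_network(complexes)
for pair, shared in links.items():
    print(pair, sorted(shared))
```

Here complex_C stays unlinked, while complex_A, complex_B, and complex_D form a connected group through their shared components.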

(14)

Expression Proteomics

- 2D gels and mass spectrometry platform

• Separation & quantitation: 1D- or 2D-gel

• Automated spot cutting

– 1 spot or band comprises many proteins

• Automated in-gel digestion

• MALDI/MS: peptide mass only (peptide mass fingerprint), followed by database search

• LC/MS/MS: peptide mass and sequence information, followed by database search

• ...and so on, maybe 500 times
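A peptide mass fingerprint compares observed peptide masses with theoretical masses of peptides from a database. The sketch below uses standard monoisotopic residue masses and assumes the observed values are already neutral monoisotopic masses. The 50 ppm tolerance, the candidate peptides, and the observed masses are illustrative choices, not values from the slides.

```python
# Monoisotopic residue masses in Da
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def fingerprint_matches(observed_masses, candidate_peptides, tol_ppm=50.0):
    """Pair each observed mass with candidates within a ppm tolerance."""
    matches = []
    for observed in observed_masses:
        for peptide in candidate_peptides:
            theoretical = peptide_mass(peptide)
            if abs(observed - theoretical) / theoretical * 1e6 <= tol_ppm:
                matches.append((observed, peptide))
    return matches


candidates = ["GAK", "SVR"]      # hypothetical tryptic peptides
observed = [274.1642, 500.0]     # hypothetical measured masses
matches = fingerprint_matches(observed, candidates)
print(matches)
```

Because a fingerprint carries mass only, it becomes ambiguous when one spot or band contains many proteins; LC/MS/MS adds the sequence information needed in that case.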

(15)

2D Gel Electrophoresis

• Most powerful analytical method for protein separation

– multi-dimensional separation technique

• IEF and PAGE

– very sensitive detection schemes

• SyproRuby fluorescence staining

– rugged quantitation

– useful for visualization of PTM changes

– sometimes described as ‘low throughput’

• not true - can be higher throughput than ‘non-gel based proteomics’

– parallel processing of gels

– analysis of only proteins undergoing expression change

• Challenges with 2D gels

– specialized gels required for ‘extreme proteins’

(16)

Expression Proteomics

[Figure: 2D gels, wild type vs. knockout. Labels: down-regulated spot, missing spot]
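Spot-level comparison between two gels can be sketched as a fold-change classification on matched spot volumes. The spot identifiers, volumes, and the 2-fold cutoff are hypothetical. Real gel analysis software also handles spot matching, normalization, and replicate statistics.

```python
def classify_spots(wild_type, knockout, fold_cutoff=2.0):
    """Label each wild-type spot by its behaviour in the knockout gel."""
    calls = {}
    for spot, wt_volume in wild_type.items():
        ko_volume = knockout.get(spot, 0.0)
        if wt_volume <= 0:
            calls[spot] = "not detected in wild type"
        elif ko_volume <= 0:
            calls[spot] = "missing"
        elif wt_volume / ko_volume >= fold_cutoff:
            calls[spot] = "down-regulated"
        elif ko_volume / wt_volume >= fold_cutoff:
            calls[spot] = "up-regulated"
        else:
            calls[spot] = "unchanged"
    return calls


# Hypothetical normalized spot volumes
wild_type = {"spot_1": 1000.0, "spot_2": 800.0, "spot_3": 500.0}
knockout = {"spot_1": 950.0, "spot_2": 200.0}

calls = classify_spots(wild_type, knockout)
print(calls)
```

Only the spots flagged as changed or missing would be cut and sent for MS analysis.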

(17)

Improved Information Content

- reproducibility & meaningful biological statistics

• 2D PAGE, pH 4 to pH 7

– improved sample throughput by running gel samples in parallel

(18)

Improved Sample Throughput

- automated spot cutting

CCD camera

Cutting head

Barcode reader

LED (excitation 480 nm, emission 520 nm LP)

Gel

Cutting head cleaner

• cuts blots, wet and dry gels

(19)

Expression Proteomics

- industrialized

(GeneProt, Geneva)

• GeneProt’s approach

– two serum/plasma samples, 5 liters each

– 60,000 LC fractions analyzed by MS

– 320,000 2D gel spots analyzed by MS

– 50+ mass spectrometers

– 1,462 processor server

– 6 months

• 40+ ion trap MS/MS systems with multiplexed LCs

– 2 LCs per MS for improved throughput

(20)

Expression Proteomics

- developing methods

• Isotope coding

– qualitative and quantitative analysis by MS and MS/MS

– different isotope codes

• isotope coded affinity tags – cys labeling

• ¹⁸O tagging

– via digestion in ¹⁸O water

• CH₃ and CD₃ ester formation

• Direct LC/MS

– peptide mass mapping

– ion mapping

– AMRTs
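With isotope coding, the light and heavy forms of a peptide appear as a pair of peaks in the MS spectrum, and their intensity ratio gives relative expression. A minimal sketch for the ¹⁸O case, assuming two ¹⁸O atoms are incorporated at the peptide C-terminus during digestion (incorporation can be incomplete in practice). The intensities are hypothetical.

```python
import math

O18_MINUS_O16 = 2.004245  # Da, mass difference between 18O and 16O


def pair_spacing_mz(charge, n_labels=2):
    """Expected m/z spacing between light and heavy peaks of a peptide pair."""
    return n_labels * O18_MINUS_O16 / charge


def relative_expression(light_intensity, heavy_intensity):
    """Return the heavy/light ratio and its log2 value."""
    ratio = heavy_intensity / light_intensity
    return ratio, math.log2(ratio)


spacing = pair_spacing_mz(charge=2)
ratio, log2_ratio = relative_expression(1.0e5, 4.0e5)  # hypothetical intensities
print(round(spacing, 4), ratio, log2_ratio)
```

For a doubly charged peptide the pair is separated by about 2.004 m/z units, and in this made-up example the heavy-labeled sample shows a 4-fold (log2 = 2) higher signal.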

Graphic from M. Mann editorial


(24)

Improved Proteome Coverage

- use of multiple analysis methods

• How can analytical coverage of the proteome be improved?

– Use both non-gel based and gel-based approaches, which are complementary

• Non-gel based “Shotgun Proteomics”

– total digest of complex protein mixture

– multidimensional chromatography and tandem mass spectrometry

• LC/LC/MS/MS

(25)

Improved Proteome Coverage

- use of multiple analysis methods

- shotgun proteomics and 2D Gels

• Shotgun proteomics (LC/LC/MS/MS): minimal bias against specific protein classes

– total proteolytic digestion yields tractable peptides from

− very large and very small proteins

− acidic and basic proteins (typical 2D gel pI 4-7)

− hydrophobic proteins (e.g. membrane proteins)

– qualitative information contained in MS/MS spectrum

– quantitative information contained in MS spectrum

− significantly facilitated with use of isotope coded tags

• 2D gel may provide higher throughput

– batch processing of samples in parallel

(26)

Improved Proteome Coverage

- shotgun proteomics using LC/LC/MS/MS

- analysis of nuclear fraction of cancer cell line

[Figure: LC/LC/MS/MS workflow. Tissue fractions (nuclear fraction); LC 1 fractions: IEX, 40 fractions; LC 2 fractions: RP, 3 hr. gradient; MS 1 and MS 2: 60,000+ peptides analyzed by MS and MS/MS; database search; LC 2 results correlation: 1,500 peptides per fraction; LC 1 results correlation: 60 - 400 proteins per fraction; tissue fraction results correlation; tissue knowledge]

1,898 proteins identified in nuclear extract

- 120 hour MS/MS acquisition
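The figures on this slide are mutually consistent if each of the 40 ion-exchange fractions receives one 3-hour reversed-phase run. That one-run-per-fraction reading is an assumption, and the variable names are illustrative.

```python
iex_fractions = 40               # LC 1: ion-exchange fractions
rp_gradient_hours = 3            # LC 2: reversed-phase gradient per fraction
peptides_per_fraction = 1_500    # peptides analyzed per fraction

acquisition_hours = iex_fractions * rp_gradient_hours
acquisition_days = acquisition_hours / 24
peptides_analyzed = iex_fractions * peptides_per_fraction

print(acquisition_hours)   # 120
print(acquisition_days)    # 5.0
print(peptides_analyzed)   # 60000
```

So one nuclear fraction occupies a single LC/MS/MS system for about five days of continuous acquisition.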

(27)

Improved Proteome Coverage

- multiple ionization sources for LC/MS/MS

• Post LC column split between ESI and MALDI

• Option 1 - both systems in automated MS/MS mode

– more comprehensive direct analysis

• better proteome coverage

• Option 2 - targeted analysis

– initial ESI/MS/MS analysis

• process and interpret data

– follow-up MALDI/MS/MS analysis

(28)

Improved Proteome Coverage

- Probot LC/MALDI interface

[Figure: post-column split. Labels: nanoscale LC/LC, splitter, 20% / 80% split, ESI/MS/MS, MALDI plate spotting, MALDI/MS/MS]

(29)

Improved Proteome Coverage

- use of multiple ionization sources for MS/MS

- 51 mitochondrial ribosomal proteins

• LC/ESI/MS/MS: 78% identified; 8 unique (16%)

• LC/MALDI/MS/MS: 84% identified; 11 unique (22%)

• Both methods: 32

Significant overlap between the two datasets (63%)

Significant additional information obtained (37%)
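The percentages on this slide follow directly from the three counts (8, 11, and 32 out of 51 proteins). A short check, with variable names chosen for illustration:

```python
esi_only, maldi_only, both = 8, 11, 32
total = esi_only + maldi_only + both


def pct(count):
    """Percentage of all identified proteins, rounded to a whole number."""
    return round(100 * count / total)


summary = {
    "total proteins": total,
    "identified by ESI (%)": pct(esi_only + both),
    "identified by MALDI (%)": pct(maldi_only + both),
    "unique to ESI (%)": pct(esi_only),
    "unique to MALDI (%)": pct(maldi_only),
    "overlap (%)": pct(both),
    "additional information (%)": pct(esi_only + maldi_only),
}
for label, value in summary.items():
    print(f"{label}: {value}")
```

The 37% "additional information" is the share of proteins that a single ionization source alone would have missed relative to the overlap.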


(34)

Conclusions

• Proteomics is a much bigger challenge than most people realize

– easy to identify high-abundance proteins, essentially impossible to get complete coverage of complex mixtures with current technologies

– “genes were easy”

• Multiple integrated approaches are needed to provide high information content to biological studies

– genomic and proteomic methods

– multiple proteomic methods

• gel-based and non-gel based studies

– sample depletion and fractionation methods needed

• multiple analytical methods for protein ID

• This integrated approach yields a large amount of data

– true success requires integration of different datasets

• requires significant bioinformatic resources

• Final step