What are the most abundant proteins in a cell?


Even after reading several textbooks on proteins, one may still be left wondering which of these critical molecular players in the life of a cell are the most quantitatively abundant. Though figuring this out by pure thought alone is generally not easy, cells in the leaves of plants are that rare case in which it is relatively easy to make an estimate. The carbon-fixing enzyme Rubisco, the molecular gatekeeper between the inorganic and the organic worlds, is required at extremely high concentrations. Let's see why. As schematically depicted in Figure 1, the photon flux under full illumination is about 2000 μEinstein/m²/s. Only about 10-30% of this flux can be utilized; beyond that, the photosynthetic apparatus saturates. Roughly 10 photons supply enough energy to fix one carbon atom. Rubisco works at a sluggish maximal rate of ≈1-3 reactions per second per catalytic site. From these numbers alone, we can see that the cell needs ≈0.3-3×10⁷ Rubisco molecules per μm² of leaf cross section. A Rubisco monomer has a mass of 60 kDa (BNID 105007), so the Rubisco mass per μm² is ≈0.3-3×10⁻¹² g. Now let's estimate the total protein content in a leaf. A characteristic leaf has a height of about 200 μm. About 80% of the volume is vacuoles (BNID 103442), the dry mass is ≈30% of the remaining volume, and proteins make up about half of the dry mass, so we arrive at about 6×10⁻¹² g of protein per μm² of leaf, as derived in Figure 1. We conclude that about 5-50% of the protein mass is Rubisco. Indeed, experimental determinations in C3 plants such as wheat, potato and tobacco find that Rubisco constitutes 25-60% of all soluble proteins in such cells (BNID 101762).
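The chain of estimates above can be written out as a short calculation. This is a minimal sketch that reuses the numbers in the text; the tissue density of 1 g/cm³ (that of water) is our added assumption, and the two ends of the range pair the low utilization with the fast catalytic rate and the high utilization with the slow one.

```python
AVOGADRO = 6.022e23   # molecules per mole
UM2_PER_M2 = 1e12     # square microns per square meter

def rubisco_fraction(photon_flux_umol_m2_s=2000.0,
                     utilization=0.1,
                     photons_per_carbon=10.0,
                     kcat_per_s=1.0,
                     rubisco_mass_da=60e3,
                     leaf_height_um=200.0,
                     vacuole_fraction=0.8,
                     dry_fraction=0.3,
                     protein_fraction_of_dry=0.5,
                     density_g_per_cm3=1.0):  # assumed density (water)
    """Return (rubisco_mass_g, protein_mass_g, fraction) per um^2 of leaf."""
    # Photons arriving per um^2 per second (1 uEinstein = 1 umol of photons)
    photons = photon_flux_umol_m2_s * 1e-6 * AVOGADRO / UM2_PER_M2
    # Carbon atoms fixed per um^2 per second
    carbon_rate = photons * utilization / photons_per_carbon
    # Rubisco catalytic sites needed to keep up with that rate
    n_rubisco = carbon_rate / kcat_per_s
    rubisco_mass_g = n_rubisco * rubisco_mass_da / AVOGADRO
    # Mass of a 1 um^2 column of leaf (1 um^3 = 1e-12 cm^3)
    column_mass_g = leaf_height_um * 1e-12 * density_g_per_cm3
    protein_mass_g = (column_mass_g * (1 - vacuole_fraction)
                      * dry_fraction * protein_fraction_of_dry)
    return rubisco_mass_g, protein_mass_g, rubisco_mass_g / protein_mass_g

low = rubisco_fraction(utilization=0.1, kcat_per_s=3.0)[2]
high = rubisco_fraction(utilization=0.3, kcat_per_s=1.0)[2]
print(f"Rubisco share of leaf protein: {low:.0%} to {high:.0%}")
```

This gives roughly 7% to 60%, in line with the text's rounded 5-50% given the precision of such an estimate.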

The protein census for other organisms, even model microorganisms, is more complicated. In the late 1970s, a unique catalog of the quantities of 140 proteins under different growth rates in E. coli was created using 2D gel electrophoresis and ¹⁴C labeling (Pedersen et al., Cell 1978, BNID 106195). Newer methods have recently enabled extensive proteome-wide surveys of protein content using mass spectrometry (BNID xxx), TAP tagging (Ghaemmaghami et al., 2003, BNID 101845) and fluorescence microscopy (Taniguchi et al., 2010, BNID xxx). A new database (http://pax-db.org/) has been created to collect such data on protein abundances across organisms. The picture emerging from these kinds of experiments shows several prominent players. First, not surprisingly, ribosomal proteins and their ancillary components are highly abundant. The elongation factor EF-Tu, which delivers aminoacyl-tRNAs to the free (A) site of the ribosome, was characterized as the most abundant protein in the original 1978 catalog, with ~58,000 copies per bacterial genome. This absolute molecular count can be repackaged in concentration units and is roughly equivalent to 100 μM (BNID 104733). Recall that under different growth conditions the cell size, and thus the total protein content, can change several-fold (see, for example, the vignette on yeast size), and this dependence of the protein census on the growth medium is especially important for ribosomal proteins.
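Converting a copy number into a concentration needs only Avogadro's number and a volume. A minimal sketch, assuming (our assumption, not a value given in the text) that one genome equivalent of E. coli occupies about 1 μm³, i.e. 10⁻¹⁵ L, shows that 58,000 copies correspond to roughly 100 μM.

```python
AVOGADRO = 6.022e23     # molecules per mole
LITERS_PER_UM3 = 1e-15  # 1 cubic micron = 1 femtoliter

def copies_to_micromolar(copies, volume_um3=1.0):
    """Concentration (uM) of `copies` molecules in a volume given in um^3.

    volume_um3 defaults to 1 um^3, an assumed E. coli volume per genome.
    """
    moles = copies / AVOGADRO
    molar = moles / (volume_um3 * LITERS_PER_UM3)
    return molar * 1e6

print(f"EF-Tu: {copies_to_micromolar(58_000):.0f} uM")  # prints: EF-Tu: 96 uM
```

A handy corollary is that one molecule per μm³ is about 1.7 nM; doubling the assumed volume halves the concentration.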

Another contender for the title of most abundant protein is ACP, the acyl carrier protein, which plays an important role in fatty acid biosynthesis by carrying the fatty acid chains as they are elongated. It has been reported to be the most abundant protein in E. coli, with about 60,000 molecules per cell (BNID 106194). A recent high-throughput mass spectrometry measurement in minimal medium (Lu et al., 2007, BNID 104246) reported a value of ≈76,000, making it the third most abundant protein in that survey. Table 1 gives a rank ordering of some of the most abundant proteins found in E. coli, though it should be noted that there are inconsistencies between the different experimental approaches that have not yet been fully settled. The most abundant proteins found in this particular survey of E. coli are RplL, a ribosomal protein (estimated at ≈109,000 copies per cell and reported (Subramanian, 1975) to be present in 4 copies per ribosome, in contrast to other ribosomal proteins, which have one copy per ribosome), and TufB (the elongation factor also known as EF-Tu, estimated at ≈87,000 copies per cell). The next most abundant reported proteins are GroS (MopB, 65,000), a component of the GroEL-GroES chaperone system necessary for proper folding of many proteins, and GapA (49,000), a key enzyme in glycolysis.
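The percentages in Table 1 appear to be fractions of the summed copy numbers reported in each study, so copy numbers can be recovered by multiplying by that sum. A minimal sketch, assuming the Lu 2007 column sum of 2,500,000 proteins quoted in the table header, reproduces the counts given in the text to within the rounding of the percentages.

```python
# Percentages for E. coli in minimal medium, from the Lu 2007 column of Table 1.
percent_of_total = {"RplL": 4.4, "TufB": 3.5, "AcpP": 3.0, "GroS": 2.6, "GapA": 2.0}
SUM_OF_PROTEINS = 2_500_000  # sum of all copy numbers reported in that study

def copies_from_percent(percent, total=SUM_OF_PROTEINS):
    """Convert a percentage of the summed copy numbers into copies per cell."""
    return round(percent / 100 * total)

for name, pct in percent_of_total.items():
    print(f"{name}: ~{copies_from_percent(pct):,} copies per cell")
```

The results (about 110,000 for RplL, 87,500 for TufB, 75,000 for AcpP, 65,000 for GroS and 50,000 for GapA) match the ≈109,000, ≈87,000, ≈76,000, 65,000 and 49,000 quoted above.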

Structural proteins can also be highly abundant. FimA is the major subunit of the fimbriae (pili) of E. coli, of which each cell carries about 100-300 (BNID 101473). Every pilus contains about 1000 FimA copies (BNID 100107), so a simple estimate leads us to expect hundreds of thousands of copies of this repeating monomer on the outside of the cell.

As noted above, protein content varies with growth conditions and gene induction. For example, LacZ, the enzyme that breaks lactose into glucose and galactose, is usually repressed and present in only a small number of copies (10 to 20, BNID 106200), but under full induction it was characterized to have a concentration of 50 μM (BNID 100735), i.e. about 100,000 copies per cell. In summary, though different measurement methods can vary significantly even under similar conditions, the overall picture of the most abundant proteins in E. coli is generally consistent.
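Going in the other direction, the number of copies implied by a concentration depends on the cell volume assumed, which itself varies with growth conditions. A minimal sketch, using volumes of 1-3 μm³ chosen purely for illustration, shows that 50 μM corresponds to about 3×10⁴ copies per μm³, so the ≈100,000 copies quoted for fully induced LacZ would correspond to a cell volume of roughly 3 μm³.

```python
AVOGADRO = 6.022e23     # molecules per mole
LITERS_PER_UM3 = 1e-15  # 1 cubic micron = 1 femtoliter

def micromolar_to_copies(conc_um, volume_um3):
    """Number of molecules at concentration conc_um (uM) in volume_um3 (um^3)."""
    return conc_um * 1e-6 * volume_um3 * LITERS_PER_UM3 * AVOGADRO

# Illustrative volumes only; real E. coli volumes depend on growth conditions.
for volume in (1.0, 2.0, 3.0):
    print(f"{volume:.0f} um^3: ~{micromolar_to_copies(50, volume):,.0f} LacZ copies")

# Volume at which 50 uM corresponds to 100,000 copies
print(f"{100_000 / micromolar_to_copies(50, 1.0):.1f} um^3")
```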

As usual, it is interesting to contrast what has been discovered in bacteria with similar experiments in eukaryotic microorganisms. In yeast, an overall estimate of ≈50,000,000 proteins per cell has been reported (BNID 106198). Measurements based on a TAP tag (Ghaemmaghami et al., 2003, BNID 101845) report that, out of this huge store of proteins, only three are present at more than a million copies per cell: a cell wall protein (YKL096W-A), the plasma membrane H⁺-ATPase (YGL008C) that pumps protons out of the cell, and fructose-1,6-bisphosphate aldolase (YKL060C), which is essential for glycolysis and gluconeogenesis. Reports on the abundance of the glycolytic enzymes, an intensely studied model system, lead to an overall estimate that they make up ≈25% of total protein content (BNID 101928). As with E. coli, new high-throughput mass spectrometry data are becoming available for yeast (BNID 104245, 104188). Table 2 shows the 10 most abundant yeast proteins in rich as well as minimal media. In rich media, the most abundant proteins are mostly glycolytic. In minimal media, several of the most abundant proteins are of unknown function, which highlights how limited our knowledge of even these most elementary questions remains.

Why are people going to all the trouble of carrying out these increasingly refined censuses of some of the most favored model organisms? Many of the biochemical and regulatory pathways that make up the life of a cell have been or are now being mapped in exquisite detail, and many of the nodes have essential roles. But a wiring diagram does not a cell make. To really understand the relative rates of the various components of these pathways, we need to know the abundances of the various proteins and their substrates. Further, if one is interested in assessing the biosynthetic burden of these various molecular players, the actual abundance is critical. Similarly, the many binding reactions that underlie much of the busy biochemical activity of cells, whether specific binding between intended partners or spurious nonspecific binding between unnatural partners, are ultimately dictated by molecular counts. Finally, there is a growing appreciation of the constraints imposed on the cell by noise in copy numbers. To understand and predict such effects, it is vital to know whether one is dealing with tens of thousands of copies per cell or only tens of copies per cell, as often turns out to be the case in unicellular organisms. In this small-number limit, fluctuations are a fact of life and both we and the cell must account for them.
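To make the small-number argument concrete: if molecules are made and degraded independently, the copy number is often approximated as Poisson distributed, in which case the standard deviation is √N and the relative fluctuation is 1/√N. The Poisson model is our simplifying assumption (measured gene expression is usually noisier); the sketch below compares the relative fluctuation for tens versus tens of thousands of copies.

```python
import math

def poisson_cv(mean_copies):
    """Coefficient of variation (std/mean) of a Poisson-distributed copy number.

    Assumes Poisson statistics, for which the variance equals the mean.
    """
    return 1.0 / math.sqrt(mean_copies)

for n in (10, 100, 10_000):
    print(f"{n:>6} copies: ~{poisson_cv(n):.1%} relative fluctuation")
```

With ten copies the typical cell-to-cell spread is about 30% of the mean, whereas with ten thousand copies it drops to about 1%.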


Figure 1: Estimate of the fraction of the total protein content in a leaf cell that is Rubisco.


Table 1-2: Most abundant proteins in prokaryotes and eukaryotes. Data come from several methods: mass spectrometry (APEX, Lu et al., 2007, PMID 17187058), a yellow fluorescent protein fusion library (Taniguchi et al., 2010, PMID 20671182), a yeast fusion library in which each open reading frame is tagged with a high-affinity epitope and expressed from its natural chromosomal location (Ghaemmaghami et al., 2003, PMID 14562106), and mass spectrometry of mouse fibroblast cells (Schwanhäusser et al., 2011, PMID 21593866). Gene annotation: yeast, SGD; E. coli, EcoliWiki; mouse, UniProt. Color code: yellow, translation; cyan, glycolysis; green, chaperones. The sums quoted in the column headers were obtained by adding together all the absolute values reported in each study.

Protein rank | E. coli, minimal media (Nat Biotechnol, Lu 2007; total of 2-3×10⁶ proteins/cell; sum of proteins in reference is 2,500,000) | E. coli, M9 minimal media (Science, Taniguchi 2010; sum of proteins in reference is 95,000) | B. subtilis, minimal medium during exponential growth (Analytical Chemistry, Maass 2011; sum of proteins in reference is 2,300,000) | S. aureus, synthetic medium during exponential growth (Anal Chem, Maass 2011; sum of proteins in reference is 350,000) | Leptospira interrogans, EMJH (Ellinghausen-McCullough-Johnson-Harris) medium (Malmström 2009; sum of proteins in reference is 820,000)
1 | RplL, 4.4%, 50S ribosomal subunit (**) | CspC, 8.3%, stress protein (**) | TufA, 4.3%, Elongation factor Tu (****) | Asp23, 7.1%, Alkaline shock protein 23 (**) | LipL32, 4.6%, external encapsulating structure
2 | TufB, 3.5%, EF-Tu, Elongation Factor-Translation (********) | TufA, 3.6%, protein chain elongation factor EF-Tu (****) | CspD, 4.0%, Cold shock protein CspD (**) | SodA, 6.9%, Superoxide dismutase [Mn/Fe] 1 (***) | Peptidoglycan associated cytoplasmic membrane, 3.7%, external encapsulating structure
3 | AcpP, 3.0%, acyl carrier protein (ACP) | RpsV, 3.3%, 30S ribosomal subunit | IlvC, 3.3%, Ketol-acid reductoisomerase (**) | CspA, 4.3%, Cold shock protein (**) | 60 kDa chaperonin (Protein Cpn60) (groEL protein) (Heat shock 58 kDa protein), 2.2%, nucleotide binding
4 | GroS, 2.6%, 10 kDa chaperonin | CspE, 3.2%, DNA-binding transcriptional repressor | AhpC, 3.0%, Alkyl hydroperoxide reductase subunit C (***) | Tuf, 3.7%, Elongation factor Tu (********) | Elongation factor Tu (EF-Tu), 1.7%, hydrolase activity (****)
5 | GapA, 2.0%, glyceraldehyde 3-phosphate dehydrogenase-A (****) | DnaK, 2.5%, chaperone Hsp70 | YfmK, 2.5%, Uncharacterized N-acetyltransferase (**) | RplL, 2.9%, 50S ribosomal protein L7/L12 (**) | LipL36, 1.7%, external encapsulating structure
6 | MetE, 1.6%, Methionine synthase (**) | GapA, 2.5%, glyceraldehyde-3-phosphate dehydrogenase A (****) | YheA, 2.0%, UPF0342 protein (**) | GapA1, 2.8%, Glyceraldehyde-3-phosphate dehydrogenase 1 (****) | Flagellin protein, 1.7%, flagellum
7 | CspC, 1.6%, stress protein (**) | TufB, 2.3%, protein chain elongation factor EF-Tu (****) | Icd, 1.8%, Isocitrate dehydrogenase [NADP], participates in MAPK signaling pathway (**) | Eno, 2.1%, Enolase (***) | Electron transfer flavoprotein alpha-subunit, 1.5%, nucleotide binding
8 | RplW, 1.5%, 50S ribosomal subunit | Rho, 2.3%, transcription termination factor | GroS, 1.8%, 10 kDa chaperonin | (no name, locus SACOL2595), 1.8%, Putative uncharacterized protein | transcriptional regulator (ArsR family), 1.5%, transcription factor & regulators
9 | RpsP, 1.2%, 30S ribosomal subunit | GroS, 2.2%, 10 kDa chaperonin | SodA, 1.6%, Superoxide dismutase [Mn] (***) | (no name, locus SACOL0427), 1.7%, Putative uncharacterized protein | LipL41, 1.3%, external encapsulating structure
10 | Mdh, 1.2%, Component of malate dehydrogenase | GlyA, 1.7%, serine hydroxymethyltransferase | TrxA, 1.5%, Thioredoxin (**) | AhpC, 1.4%, Alkyl hydroperoxide reductase subunit C (***) | LipL21, 1.1%, external encapsulating structure

Protein rank | S. cerevisiae, rich media (Nat Biotechnol, Lu 2007; total of 5×10⁷ proteins/cell according to primary source; sum of proteins in reference is also 50,000,000) | S. cerevisiae, minimal media (Nat Biotechnol, Lu 2007; total of 5×10⁷ proteins/cell according to primary source; sum of proteins in reference is also 50,000,000) | S. cerevisiae, rich media (Nature, Ghaemmaghami 2003; sum of proteins in reference is 47,000,000) | M. musculus (NIH3T3 cells), light (L) SILAC medium (Nature, Schwanhäusser et al., 2011; sum of proteins in reference is 570,000,000)
1 | ENO2, 6.2%, Enolase II | ABM1, 4.6%, unknown function, required for normal microtubule organization | CWP2, 3.4%, Cell Wall Protein | ACTB, 2.8%, Actin, cytoplasmic 1 (***)
2 | FBA1, 4.0%, Fructose 1,6-bisphosphate aldolase (***) | YMR181C, 4.2%, unknown function | PMA1, 2.8%, Plasma Membrane ATPase | HIST1H4A, 2.6%, Histone H4
3 | TDH3, 4.0%, Glyceraldehyde-3-phosphate dehydrogenase | YLR407W, 4.2%, unknown function | FBA1, 2.1%, Fructose 1,6-bisphosphate aldolase (***) | HIST1H2AF, 2.6%, Histone H2A type 1-F
4 | PGK1, 3.8%, 3-phosphoglycerate kinase | ORT1, 3.0%, Ornithine transporter of the mitochondrial inner membrane | ILV5, 1.9%, IsoLeucine-plus-Valine requiring | HIST2H2BB, 1.9%, Histone H2B type 2-B
5 | ENO1, 3.6%, Enolase I (***) | YMR115W (SGD name: MGR3), 2.6%, Subunit of the mitochondrial (mt) i-AAA protease supercomplex | YEF3, 1.9%, Yeast Elongation Factor (translation) | HIST1H3B, 1.5%, Histone H3.2
6 | PDC1, 2.6%, Major of three pyruvate decarboxylase isozymes | YIL077C, 2.2%, unknown function | HHF2, 1.4%, Histone H Four | EEF1A1, 0.93%, Elongation factor 1-alpha 1 (translation)
7 | ADH1, 2.6%, Alcohol dehydrogenase | YDR193W, 2.0%, Dubious open reading frame | RPP2B, 1.4%, Ribosomal Protein P2 Beta | RPS27A, 0.9%, Ubiquitin-40S ribosomal protein S27a
8 | TEF2, 2.4%, Translational elongation factor EF-1 alpha | DOA1, 1.8%, WD repeat protein | HHF1, 1.1%, Histone H Four | S100A4, 0.75%, Protein S100-A4 (similar to Glyceraldehyde-3-phosphate dehydrogenase (GAPDH) isoform 1)
9 | TDH2, 1.9%, Glyceraldehyde-3-phosphate dehydrogenase | CCZ1, 1.6%, Protein involved in vacuolar assembly | SOD1, 1.1%, SuperOxide Dismutase | TUBB5, 0.75%, Tubulin beta-5 chain
10 | CDC19, 1.8%, Pyruvate kinase | RPS26A, 1.5%, small (40S) ribosomal subunit | RPS26B, 1.1%, Ribosomal Protein of the Small subunit | ANXA2, 0.67%, Annexin A2