What are the most abundant proteins in a cell?
Even after reading several textbooks on proteins, one may still be left wondering which of these critical molecular players in the life of a cell are the most abundant. Though figuring this out by pure thought alone is generally not easy, cells in the leaves of plants are that rare case in which it is relatively easy to make an estimate. The carbon-fixing enzyme Rubisco, the molecular gatekeeper between the inorganic and the organic worlds, is required at extremely high concentrations. Let’s see why. As schematically depicted in Figure 1, the photon flux under full illumination is about 2000 microEinstein/(m²·s), that is, ≈10⁹ photons per μm² per second. At most about 10-30% of this flux is utilized; beyond that the photosynthetic apparatus saturates. Roughly 10 photons supply enough energy to fix one carbon atom. Rubisco works at a sluggish maximal rate of ≈1-3 per second per catalytic site. From this alone, we can see that the cell needs ≈0.3-3×10⁷ Rubisco molecules per μm² of cross section. A Rubisco monomer has a mass of 60 kDa (BNID 105007) and so the Rubisco mass per μm² is ≈0.3-3×10⁻¹² g. Let’s estimate the total protein content in a leaf. A characteristic leaf has a height of about 200 microns. About 80% of the volume is vacuoles (BNID 103442), the dry mass is ≈30% of the remaining volume, and proteins constitute about half of the dry mass, so we arrive at about 6×10⁻¹² g of protein per μm² of cross section, as derived in Figure 1. We conclude that about 5-50% of the protein mass is Rubisco. Indeed, experimental determinations in C3 plants such as wheat, potato and tobacco find that Rubisco constitutes 25-60% of all soluble proteins in such cells (BNID 101762).
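The chain of multiplications in the Rubisco estimate can be checked with a short script. This is only a sketch of the arithmetic: the function and variable names are invented for illustration, the calculation is done for a column of leaf under 1 μm² of surface, and the conversion from protein volume to mass assumes a density of about 1 g/mL, which the text does not state explicitly. Because the script does not round intermediate values, its range (roughly 7-60%) differs slightly from the rounded 5-50% quoted above.

```python
AVOGADRO = 6.022e23          # molecules (or photons) per mole; 1 Einstein = 1 mole of photons
DALTON_IN_GRAMS = 1.66e-24   # grams per dalton


def protein_mass_per_um2(leaf_height_um=200.0):
    """Protein mass (g) in the leaf column under 1 um^2 of surface."""
    column_volume_um3 = leaf_height_um * 1.0       # height x 1 um^2
    non_vacuole = column_volume_um3 * 0.2          # ~80% of the volume is vacuoles
    dry_volume = non_vacuole * 0.3                 # dry mass ~30% of what remains
    protein_volume = dry_volume * 0.5              # proteins ~half of the dry mass
    return protein_volume * 1e-12                  # ASSUMPTION: 1 g/mL = 1e-12 g/um^3


def rubisco_estimate(utilized_fraction, turnover_per_s):
    """Return (Rubisco copies per um^2, Rubisco mass in g per um^2, mass fraction)."""
    photons = 2000e-6 * AVOGADRO / 1e12            # photons per um^2 per s
    carbons_fixed = photons * utilized_fraction / 10.0   # ~10 photons per carbon
    copies = carbons_fixed / turnover_per_s        # catalytic sites needed
    mass = copies * 60e3 * DALTON_IN_GRAMS         # 60 kDa per monomer
    return copies, mass, mass / protein_mass_per_um2()


# Low end: 10% of the flux used, fast enzyme. High end: 30% used, slow enzyme.
low = rubisco_estimate(utilized_fraction=0.1, turnover_per_s=3.0)
high = rubisco_estimate(utilized_fraction=0.3, turnover_per_s=1.0)
print(f"Rubisco per um^2: {low[0]:.1e} to {high[0]:.1e}")
print(f"Rubisco fraction of protein mass: {low[2]:.0%} to {high[2]:.0%}")
```

Changing any single input (the utilized fraction, the turnover rate, or the vacuole fraction) shows how sensitive the final fraction is to each assumption.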
The protein census for other organisms, even model microorganisms, is more complicated. In the late 1970s, a unique catalog of the quantities of 140 proteins under different growth rates in E. coli was created using 2D gel electrophoresis and 14C labeling (Pedersen et al., Cell 1978, BNID 106195). Newer methods have recently enabled extensive proteome-wide surveys of protein content using mass spectrometry (BNID xxx), TAP labeling (Ghaemmaghami 2003, BNID 101845) and fluorescence microscopy (Taniguchi et al., 2010, BNID xxx). A new database (http://pax-db.org/) has been created to collect such data on protein abundances across organisms. The picture emerging from these kinds of experiments shows several prominent players. First, not surprisingly, ribosomal proteins and their ancillary components are highly abundant. The elongation factor EF-Tu, responsible for mediating the entrance of the tRNA to the free site of the ribosome, was characterized as the most abundant protein in the original 1978 catalog with a copy number of ~58,000 proteins per bacterial genome. This absolute molecular count can be repackaged in concentration units and is roughly equivalent to 100 μM (BNID 104733). Recall that under different growth conditions the cell size, and thus the total protein content, can change severalfold (see, for example, the vignette on yeast size), and this media dependence of the protein census is especially important for ribosomal proteins.
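The repackaging of a copy number into a concentration can be sketched as follows. The function name is illustrative, and the default cell volume of 1 μm³ (1 femtoliter) is an assumed round number for E. coli rather than a value given in the text; the result scales inversely with whatever volume is chosen.

```python
AVOGADRO = 6.022e23  # molecules per mole


def copies_to_micromolar(copies, cell_volume_um3=1.0):
    """Convert a per-cell copy number to a concentration in micromolar.

    ASSUMPTION: the default volume of 1 um^3 (= 1e-15 L) is a round
    number for a bacterial cell, not a measured value.
    """
    liters = cell_volume_um3 * 1e-15
    molar = copies / (AVOGADRO * liters)
    return molar * 1e6


# ~58,000 copies of EF-Tu in a 1 um^3 cell
print(round(copies_to_micromolar(58_000)))  # 96, i.e. roughly 100 uM
```

A handy rule of thumb follows from the same arithmetic: in a 1 μm³ cell, one molecule corresponds to a concentration of about 1.7 nM.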
Another contender for the title of most abundant protein is ACP, the acyl carrier protein, which plays an important role in fatty acid biosynthesis. This protein carries fatty acid chains as the chains are elongated. It is claimed to be the most abundant protein in E. coli, with about 60,000 molecules per cell (BNID 106194). In a recent high-throughput mass spectrometry measurement on minimal medium (Lu, 2007, BNID 104246), a value of ≈76,000 was reported, making it the third most abundant protein reported. Table 1 gives a rank ordering of some of the most abundant proteins found in E. coli, though it should be noted that there are inconsistencies between the different experimental approaches that have not yet been fully settled. The most abundant proteins found in this particular survey of E. coli are RplL, a ribosomal protein (estimated at ≈109,000 copies per cell, and reported (Subramanian, 1975) to be present in 4 copies per ribosome, in contrast to other ribosomal proteins, which have one copy per ribosome), and TufB (the elongation factor also known as EF-Tu, estimated at ≈87,000 copies per cell). The next most abundant reported proteins are GroS (MopB, 65,000), a component of the GroEL-GroES chaperone system necessary for proper folding of many proteins, and GapA (49,000), a key enzyme in glycolysis.
Structural proteins can also be highly abundant. FimA is the major subunit of the 100-300 fimbriae (pili) of E. coli (BNID 101473). Every pilus has about 1000 copies (BNID 100107), and thus a simple estimate leads us to expect hundreds of thousands of copies of this repeating monomer on the outside of the cell.
As noted above, protein content varies based on growth conditions and gene induction. For example, LacZ, the enzyme responsible for breaking lactose into glucose and galactose, is usually repressed and the protein has only a small number of copies (10 to 20, BNID 106200), but under full induction it was characterized to have a concentration of 50 μM (BNID 100735), i.e. about 100,000 copies per cell. In summary, though different measurement methods can vary significantly even under similar conditions, the overall picture of the most abundant proteins in E. coli is generally consistent.
As usual, it is interesting to contrast what has been discovered in bacteria with similar experiments in eukaryotic microorganisms. In yeast, an overall estimate of ≈50,000,000 proteins per cell was reported (BNID 106198). Measurements based on a TAP tag (Ghaemmaghami 2003, BNID 101845) report that out of this huge store of proteins, only three are found with over a million copies per cell. These are a cell wall protein (YKL096W-a), the plasma membrane H+-ATPase (YGL008C), which pumps protons out of the cell, and fructose 1,6-bisphosphate aldolase (YKL060C), essential for glycolysis and gluconeogenesis. Different reports on the abundance of proteins in glycolysis, an intensely studied model system, led to an overall estimate of ≈25% of total protein content (BNID 101928). As with E. coli, new high-throughput mass spectrometry data are becoming available for yeast (BNID 104245, 104188). Table 1 shows the top 10 most abundant yeast proteins in rich as well as minimal media. In rich media, the proteins with highest abundance are mostly glycolytic. In minimal media, the most abundant proteins are still of unclear function, which further highlights our limited knowledge of these most elementary questions to date.
Why are people going to all the trouble of carrying out these increasingly refined censuses of some of the most favored model organisms? Many of the biochemical and regulatory pathways that make up the life of a cell have been or are now being mapped in exquisite detail, and many of the nodes have essential roles. But a wiring diagram does not a cell make. To really understand the relative rates of the various components of these pathways, we need to know the abundances of the various proteins and their substrates. Further, if one is interested in assessing the biosynthetic burden of these various molecular players, the actual abundance is critical. Similarly, the many binding reactions that are the basis for much of the busy biochemical activity of cells, whether specific binding of intended partners or spurious nonspecific binding between unnatural partners, are ultimately dictated by molecular counts. Finally, there is a growing appreciation of the constraints that are imposed on the cell as a result of noise in copy numbers. For understanding and predicting such effects, it is vital to know whether one is dealing with tens of thousands of copies per cell or only tens of copies per cell, as often turns out to be the case in unicellular organisms. In this small-number limit, fluctuations are a fact of life and both we and the cell must account for them.
Figure 1: Estimate of the fraction of Rubisco proteins of total protein content in a leaf cell.
Table 1-2: Most abundant proteins in prokaryotes and eukaryotes. The data come from several methods: mass spectrometry (APEX, Lu et al., 2007, PMID 17187058); a yellow fluorescent protein fusion library (Taniguchi et al., 2010, PMID 20671182); a yeast fusion library in which each open reading frame is tagged with a high-affinity epitope and expressed from its natural chromosomal location (Ghaemmaghami et al., 2003, PMID 14562106); and mass spectrometry of mouse fibroblast cells (Schwanhäusser et al., 2011, PMID 21593866). Gene annotation: yeast, SGD; E. coli, EcoliWiki; mouse, UniProt. Color code: yellow = translation, cyan = glycolysis, green = chaperones. The sum is based on adding together all the absolute values reported in each study.