Omics and Arrays
3.1 Molecular Techniques in Omics
Developments in molecular techniques have contributed to the various fields of ‘omics’, which include genomics, transcriptomics, proteomics, metabolomics and phenomics.
These underlying developments include advanced gel, hybridization and expression systems; cell imaging by light and electron microscopy; high-density microarrays and array experiments; and genetic readout experiments.
Using proteomics as an example, classical techniques in proteomics involve the use of two-dimensional gel electrophoresis (2DE). The proteins can be identified by excising the spot from the gel, digesting the polypeptide into smaller peptide fragments with specific proteases, and sequencing the peptides directly or analysing them by mass spectrometry (MS). Although this method is still useful and widely used, it is limited in sensitivity, in resolution and in the range of protein abundances it can cover (Zhu et al., 2003; Baginsky and Gruissem, 2004). For example, abundant proteins in the sample dominate the gel, whereas less abundant proteins might not be visible. New approaches involve both improved separation methods and advanced detection equipment, and several other new technologies are available for proteomic research (Kersten et al., 2002; Zhu et al., 2003; De Hoog and Mann, 2004).
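The protease digestion step can be illustrated with an in-silico digest. This is a minimal sketch assuming trypsin's simplified specificity (cleavage after lysine, K, or arginine, R, unless the next residue is proline, P) and no missed cleavages; the function name and the example sequence are hypothetical.

```python
import re


def tryptic_digest(sequence: str) -> list[str]:
    """In-silico trypsin digest (simplified rule, no missed cleavages).

    Cleaves after K or R unless the following residue is P.
    """
    # Zero-width split point: preceded by K/R and not followed by P.
    pieces = re.split(r"(?<=[KR])(?!P)", sequence.upper())
    # A C-terminal K/R produces an empty trailing piece; drop it.
    return [p for p in pieces if p]


if __name__ == "__main__":
    # Hypothetical fragment: the K-P bond after residue 2 is not cleaved.
    print(tryptic_digest("AKPLLEAAAQSTKGRSEK"))  # ['AKPLLEAAAQSTK', 'GR', 'SEK']
```

Real digests also contain missed cleavages, and database search tools usually allow for one or two of them.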
New detection methods and proteomic technologies are also being developed in an array format, increasingly focused on protein–protein interactions, post-translational modifications and elucidation of three-dimensional protein structure.
3.1.1 2-Dimensional gel electrophoresis
2DE is a form of gel electrophoresis commonly used to analyse proteins. In 2DE, mixtures of proteins are separated according to two properties in two dimensions. During the early years of proteomics and until recently, profiling of protein expression relied primarily on two-dimensional polyacrylamide gel electrophoresis (2D PAGE), which was later combined with MS. The basic procedure is to solubilize the protein content of an entire cell population, tissue or biological fluid, separate the protein components in the lysate by 2DE and visualize the separated proteins with silver staining. This approach displays only a limited portion of the total protein content and can identify only the relatively abundant proteins.

[Fig. 3.1 image: (a) 2DE gel with pI on one axis and molecular weight on the other, and trypsin digestion of a spot into peptides; (b) peptide chromatography (separation over time) and ESI; (c) MS spectrum, intensity versus m/z; (d) MS/MS spectrum of the peptide LLEAAAQSTK, precursor 516.27 (2+), with a, b and y fragment ions.]
Fig. 3.1. Standard protein analysis by two-dimensional electrophoresis followed by mass spectrometry. (a) Proteins are separated by two-dimensional electrophoresis: in one dimension by isoelectric point (pI) and in the second dimension by mass (molecular weight). Individual peptides are obtained by using trypsin to cleave the polypeptide chains. (b) Peptides are separated by chromatography and then ionized by electrospray ionization (ESI); they pass through the first quadrupole (q1) and the collision chamber (q2). (c) Individual ions are separated according to their mass-to-charge ratio (m/z) by a mass analyser. (d) From the MS spectrum, an individual peptide ion (516.27 (2+)) is selected for MS/MS analysis to produce peptide ion fragmentation patterns. Letters S, Q, A, A, E, L and L represent amino acids in the selected peptide, and ‘a2’, ‘b2’, ‘y3’, etc. represent different fragment ions.
2DE begins with one-dimensional electrophoresis and then separates the molecules by a second property in a direction at 90° to the first. In this technique, proteins are separated in one dimension by isoelectric point and in the second dimension by mass. In one-dimensional electrophoresis, proteins (or other molecules) are separated in a single dimension, so that all the proteins or molecules in one lane are separated from one another according to differences in a particular property (e.g. isoelectric point). The result of 2DE is a gel with the proteins spread out across its surface (Fig. 3.1a).
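Because the first dimension separates by pI, a protein's approximate position on the gel can be predicted from its sequence. The sketch below computes net charge with the Henderson–Hasselbalch equation and finds the pI by bisection, which works because net charge falls monotonically as pH rises. The pKa values are approximate textbook figures assumed for illustration; published scales differ, so tools can disagree by a few tenths of a pH unit. The mass estimate uses the rough rule of about 110 Da per residue, not an exact calculation.

```python
# Approximate pKa values (assumed for illustration; published scales differ).
PKA = {
    "Nterm": 9.0, "Cterm": 2.0,
    "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1,   # groups that become negative
    "H": 6.0, "K": 10.5, "R": 12.5,            # groups that become positive
}
POSITIVE = ("Nterm", "H", "K", "R")
NEGATIVE = ("Cterm", "D", "E", "C", "Y")


def net_charge(sequence: str, ph: float) -> float:
    """Net charge at a given pH from the Henderson-Hasselbalch equation."""
    seq = sequence.upper()
    counts = {aa: seq.count(aa) for aa in "DECYHKR"}
    counts["Nterm"] = counts["Cterm"] = 1
    positive = sum(counts[g] / (1 + 10 ** (ph - PKA[g])) for g in POSITIVE)
    negative = sum(counts[g] / (1 + 10 ** (PKA[g] - ph)) for g in NEGATIVE)
    return positive - negative


def isoelectric_point(sequence: str, tol: float = 1e-4) -> float:
    """Bisection on pH 0-14 for the pH at which net charge is zero."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(sequence, mid) > 0:
            low = mid   # still positive, so the pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2


def approx_mass_kda(sequence: str) -> float:
    """Rough rule of thumb for the second dimension: ~110 Da per residue."""
    return len(sequence) * 0.110


if __name__ == "__main__":
    for name, seq in {"acidic": "DDDDE", "basic": "KKKKR"}.items():
        print(name, round(isoelectric_point(seq), 2), round(approx_mass_kda(seq), 2))
```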
The proteins can then be visualized by a variety of staining methods; the most commonly used stains are silver nitrate and Coomassie blue. By combining electrophoresis with MS, individual proteins can be profiled (Fig. 3.1b, c), and the acquired MS profiles can be matched against theoretical profiles by a database search.
An important development in 2D PAGE is the use of immobilized pH gradients (IPGs), in which a pH gradient is fixed within the acrylamide matrix (Görg et al., 1999). Because either a wide or a narrow pH range can be fixed within the gel, IPGs can be used to detect thousands of spots on a single gel with high reproducibility.
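The benefit of a narrow pH range can be shown with simple arithmetic: on a linear gradient, the same pI difference is spread over more of the strip when the range is narrower. A minimal sketch, assuming a linear IPG and a 240 mm strip; the strip length and the two pI values are hypothetical, and non-linear gradients would need a different mapping.

```python
def focus_position_mm(pi: float, ph_min: float, ph_max: float,
                      strip_mm: float = 240.0) -> float:
    """Distance from the acidic end at which a protein focuses on a linear IPG."""
    if not ph_min <= pi <= ph_max:
        raise ValueError("pI lies outside the strip's pH range")
    return strip_mm * (pi - ph_min) / (ph_max - ph_min)


def spot_separation_mm(pi_a: float, pi_b: float, ph_min: float, ph_max: float,
                       strip_mm: float = 240.0) -> float:
    """Physical distance between two focused proteins on the same strip."""
    return abs(focus_position_mm(pi_b, ph_min, ph_max, strip_mm)
               - focus_position_mm(pi_a, ph_min, ph_max, strip_mm))


if __name__ == "__main__":
    wide = spot_separation_mm(5.00, 5.10, 3.0, 10.0)
    narrow = spot_separation_mm(5.00, 5.10, 4.5, 5.5)
    print(f"pH 3-10 strip:    {wide:.1f} mm apart")    # 3.4 mm
    print(f"pH 4.5-5.5 strip: {narrow:.1f} mm apart")  # 24.0 mm
```

A seven-fold narrower pH range gives a seven-fold larger physical separation for the same pair of proteins.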
A variation on this theme is the use of so-called ‘zoom gels’, in which the protein content of an individual sample is first fractionated into narrow pH ranges at low resolution, and each fraction is then subjected to high-resolution separation by 2D PAGE. Another innovation in 2DE is differential in-gel electrophoresis (DIGE; Ünlü et al., 1997), in which two pools of proteins are labelled with different fluorescent dyes. The labelled proteins are mixed and separated on the same 2DE gel.
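Because both samples run on the same gel, each spot yields a pair of intensities, one per dye, and the comparison reduces to a per-spot ratio. A minimal sketch, assuming background-subtracted spot volumes have already been matched between the two channels and normalizing by the median log ratio on the assumption that most proteins are unchanged; the spot names, intensities and two-fold cut-off are hypothetical.

```python
import math
from statistics import median


def dige_log_ratios(dye_a: dict[str, float], dye_b: dict[str, float]) -> dict[str, float]:
    """Median-normalized log2(B/A) ratio for every spot seen in both channels."""
    raw = {
        spot: math.log2(dye_b[spot] / dye_a[spot])
        for spot in dye_a
        if spot in dye_b
    }
    # Assume most proteins are unchanged: centre the ratios on their median,
    # which corrects for unequal labelling or loading between the two dyes.
    offset = median(raw.values())
    return {spot: r - offset for spot, r in raw.items()}


def changed_spots(log_ratios: dict[str, float], min_fold: float = 2.0) -> list[str]:
    """Spots whose normalized ratio is at least min_fold up or down."""
    cutoff = math.log2(min_fold)
    return sorted(s for s, r in log_ratios.items() if abs(r) >= cutoff)


if __name__ == "__main__":
    control = {"s1": 1000, "s2": 2000, "s3": 500, "s4": 800}   # hypothetical dye A
    treated = {"s1": 1100, "s2": 2200, "s3": 2200, "s4": 880}  # hypothetical dye B
    ratios = dige_log_ratios(control, treated)
    print(changed_spots(ratios))  # ['s3']
```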
Some of the main challenges facing expression proteomics, whether using 2D PAGE or any other approach, include the great dynamic range of protein abundance and the wide range of protein properties, including mass, isoelectric point, extent of hydrophobicity and post-translational modifications (Hanash, 2003). Reducing sample complexity prior to analysis, for example by analysing protein subsets and subcellular organelles separately, extends the reach of 2DE and other separation techniques for the quantitative analysis of low-abundance proteins. The isolation of sub-proteome components may be combined with protein tagging to further enhance sensitivity; for example, protein tagging technologies have been implemented for the comprehensive analysis of the cell-surface proteome (Shin et al., 2003).
Even with all the improvements that could be introduced, 2DE will probably remain a rather low-throughput approach that requires a relatively large amount of sample for analysis. The latter is particularly problematic when the samples to be analysed are of limited availability (Hanash, 2003). For example, laser-capture microdissection, which allows defined cell types to be isolated from tissues, yields very small amounts of protein that are difficult to reconcile with the large amounts needed for 2DE.
3.1.2 Mass spectrometry
MS is an analytical technique used to determine the composition of a physical sample by measuring the mass-to-charge ratio of its ions. It has become the method of choice for the analysis of complex protein samples (Han et al., 2008). MS-based proteomics has established itself as an indispensable technology for interpreting the information encoded in genomes. This has been made possible by technical and conceptual advances in many areas, most notably the discovery and development of protein ionization methods, as recognized by the award of the Nobel Prize in Chemistry to John B. Fenn and Koichi Tanaka in 2002. Mass spectrometry instrumentation has also made great strides in recent years in dynamic range and sensitivity (Blow, 2008).
Mass spectrometric measurements are carried out in the gas phase on ionized analytes. Mass spectrometers consist of three essential parts. The first, an ionization source, converts molecules into gas-phase ions. The ions are then separated according to their mass-to-charge ratio (m/z) by the second part, a mass analyser, and transferred by magnetic or electric fields to the third, an ion detector (Fig. 3.1b, c and d). The mass analyser is central to the technology. It uses a physical property to separate ions of a particular m/z value, which then strike the ion detector. The magnitude of the current produced at the detector as a function of time (the physical field in the mass analyser being varied over time) is used to determine the m/z values of the ions. In the context of proteomics, the key parameters of a mass analyser are sensitivity, resolution, mass accuracy and the ability to generate information-rich ion mass spectra from peptide fragments. The technique has several applications, including identifying unknown compounds from the masses of their molecules or fragments, determining the isotopic composition of elements, determining the structure of a compound by observing its fragmentation, quantifying the amount of a compound in a sample using carefully designed methods, and studying the fundamentals of gas-phase ion chemistry.
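The quantity the analyser actually separates is m/z. For a peptide charged by adding protons, m/z = (M + z × 1.007276)/z, where M is the neutral monoisotopic mass and 1.007276 Da is the proton mass. The sketch below, assuming standard monoisotopic residue masses and protonation as the only source of charge, computes the doubly charged precursor LLEAAAQSTK from Fig. 3.1d at about 516.29, close to the 516.27 label in the figure.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056    # added once for the free N- and C-termini
PROTON = 1.007276   # mass of each charge-carrying proton


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide.upper()) + WATER


def mz(mass: float, charge: int) -> float:
    """m/z of the [M + zH]z+ ion."""
    return (mass + charge * PROTON) / charge


if __name__ == "__main__":
    m = peptide_mass("LLEAAAQSTK")
    for z in (1, 2, 3):
        print(f"LLEAAAQSTK {z}+: m/z {mz(m, z):.2f}")
```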
There are many types of mass analysers, which use static or dynamic fields and magnetic or electric fields. Each analyser type has its strengths and weaknesses. Four basic types of mass analyser are used in proteomic research: ion trap, time-of-flight (TOF), quadrupole and Fourier transform mass spectrometry (FT-MS) analysers. In ion-trap analysers, the ions are first captured, or trapped, for a certain time interval and are then subjected to MS or tandem MS (MS/MS) analysis. Ion traps are robust, sensitive and relatively inexpensive. A disadvantage is their relatively low mass accuracy, due in part to the limited number of ions that can be accumulated at their point-like centre before space charging distorts their distribution and thus the accuracy of the mass measurement. The linear, or two-dimensional, ion trap is a recent development in which ions are stored in a cylindrical volume that is considerably larger than that of traditional three-dimensional ion traps, allowing increased sensitivity, resolution and mass accuracy. The FT-MS instrument is also a trapping mass spectrometer, although it captures the ions under high vacuum in a high magnetic field. It measures mass by detecting the image current produced by ions cyclotroning in the presence of the magnetic field. Its strengths are high sensitivity, mass accuracy, resolution and dynamic range. In spite of this enormous potential, the expense, operational complexity and low peptide-fragmentation efficiency of FT-MS instruments have limited their routine use in proteomic research (Aebersold and Mann, 2003). The TOF analyser uses an electric field to accelerate the ions through the same potential and then measures the time they take to reach the detector; because an ion's velocity after acceleration depends on its m/z, ions of lower m/z travel faster and arrive first.
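The physical relationships behind TOF and FT-MS can be written down directly. In an idealized linear TOF, an ion of charge ze accelerated through potential U gains kinetic energy zeU = ½mv², so its flight time over a drift length L is t = L·√(m/(2zeU)). In an FT-MS (ion cyclotron resonance) cell, the cyclotron frequency is f = zeB/(2πm). The sketch below assumes a 20 kV accelerating potential, a 1 m drift tube and a 7 T magnet, all hypothetical; real instruments are calibrated against standards rather than relying on these idealized formulas.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs
DALTON = 1.66053906660e-27           # kilograms


def kg_per_coulomb(mz: float) -> float:
    """Convert m/z (Da per elementary charge) to SI mass-to-charge (kg/C)."""
    return mz * DALTON / ELEMENTARY_CHARGE


def tof_flight_time(mz: float, accel_volts: float = 20_000.0,
                    drift_m: float = 1.0) -> float:
    """Idealized linear TOF: zeU = (1/2) m v^2, so t = L / v."""
    velocity = math.sqrt(2 * accel_volts / kg_per_coulomb(mz))
    return drift_m / velocity


def icr_frequency(mz: float, field_tesla: float = 7.0) -> float:
    """Idealized cyclotron frequency f = zeB / (2 pi m), in hertz."""
    return field_tesla / (2 * math.pi * kg_per_coulomb(mz))


if __name__ == "__main__":
    for mz in (400.0, 1000.0, 1600.0):
        print(f"m/z {mz:6.0f}: TOF {tof_flight_time(mz) * 1e6:5.1f} us, "
              f"ICR {icr_frequency(mz) / 1e3:6.1f} kHz")
```

Flight time grows with the square root of m/z, whereas the cyclotron frequency falls in inverse proportion to m/z.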
Techniques for ionization have been key to determining what types of samples can be analysed by MS. Electrospray ionization (ESI; Fenn et al., 1989) and matrix-assisted laser desorption/ionization (MALDI; Karas and Hillenkamp, 1988) are the two techniques most commonly used to volatilize and ionize proteins or peptides for MS analysis, while inductively coupled plasma sources are used primarily for metal analysis on a wide array of sample types. MALDI is usually coupled to TOF analysers that measure the mass of intact peptides, whereas ESI has mostly been coupled to ion traps and triple quadrupole instruments and used to generate fragment ion spectra (collision-induced dissociation spectra) of selected precursor ions (Aebersold and Goodlett, 2001). ESI creates ions by applying a potential to a flowing liquid, causing the liquid to charge and subsequently spray. The electrospray creates very small droplets of solvent containing the analyte. Solvent is removed by heat or some other form of energy (e.g. energetic collisions with a gas) as the droplets enter the mass spectrometer, and multiply charged ions are formed in the process. Because ESI ionizes the analytes out of a solution, it is readily coupled to liquid-based (for example, chromatographic and electrophoretic) separation tools (Fig. 3.1). MALDI creates ions by excitation of molecules that are isolated from the energy of the laser by an energy-absorbing matrix. The laser strikes the crystalline matrix, causing rapid excitation of the matrix and subsequent ejection of matrix and analyte ions into the gas phase.
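Because ESI produces multiply charged ions, a single protein gives a series of peaks, one per charge state, and two adjacent peaks are enough to recover both the charge and the neutral mass. If the peak at higher m/z carries charge z and its neighbour carries z + 1, then z = (lower m/z - 1.007276)/(higher m/z - lower m/z) and M = z × (higher m/z - 1.007276). A minimal sketch, assuming the two peaks are adjacent members of one charge-state series and that all charge comes from added protons; the 12 000 Da protein is hypothetical.

```python
PROTON = 1.007276  # Da, mass of each charge-carrying proton


def charge_and_mass(mz_high: float, mz_low: float) -> tuple[int, float]:
    """Recover charge z and neutral mass M from two adjacent ESI peaks.

    mz_high carries charge z and mz_low carries z + 1 (protonated ions):
        mz_high = (M + z*H) / z,  mz_low = (M + (z + 1)*H) / (z + 1)
    so z = (mz_low - H) / (mz_high - mz_low) and M = z * (mz_high - H).
    """
    if mz_high <= mz_low:
        raise ValueError("mz_high must be the peak at higher m/z")
    z = round((mz_low - PROTON) / (mz_high - mz_low))
    return z, z * (mz_high - PROTON)


if __name__ == "__main__":
    mass = 12_000.0  # hypothetical protein
    peaks = [(mass + z * PROTON) / z for z in (12, 13)]  # simulated 12+ and 13+
    print(charge_and_mass(*peaks))
```

In practice, deconvolution software fits the whole charge-state envelope rather than a single pair of peaks, which averages out measurement error.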
MALDI-MS is normally used to analyse relatively simple peptide mixtures, whereas integrated liquid-chromatography ESI-MS systems (LC-MS) are preferred for the analysis of complex samples.
Key developments leading to improved detection of proteins include TOF MS and relatively non-destructive methods for converting proteins into volatile ions (Zhu et al., 2003). MALDI and ESI have made it possible to analyse large molecules such as peptides and proteins. Although MALDI-TOF MS is a relatively high-throughput method compared with ESI, the latter is more easily coupled with separation techniques such as LC or high-pressure LC (HPLC) (Zhu et al., 2003).
This has provided an attractive alternative to 2DE, because even low-abundance proteins and insoluble transmembrane proteins can be detected (Ferro et al., 2002; Koller et al., 2002). Other MS techniques include gas chromatography–mass spectrometry (GC-MS) and ion mobility spectrometry/mass spectrometry (IMS/MS). MS-based protein identification requires a substantial, searchable database of predicted proteins, ideally representing the entire genome. Proteins are identified by comparing the measured masses of the resolved peptide fragments with the theoretical masses of predicted peptides in the database.
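The database comparison can be sketched as a simple peptide mass fingerprint: digest each predicted protein in silico, compute its peptide masses and count how many observed masses fall within a tolerance. A minimal sketch, assuming a simplified trypsin rule, unmodified peptides, neutral monoisotopic observed masses and a 10 ppm tolerance; the two-protein database and the observed masses are hypothetical. Real search engines use probability-based scores rather than a raw count of matches.

```python
import re

RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def tryptic_peptides(protein: str) -> list[str]:
    """Cleave after K/R unless followed by P (no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def peptide_mass(peptide: str) -> float:
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def pmf_scores(observed: list[float], database: dict[str, str],
               tol_ppm: float = 10.0) -> list[tuple[str, int]]:
    """Rank proteins by how many observed masses match one of their peptides."""
    scores = {}
    for name, protein in database.items():
        theoretical = [peptide_mass(p) for p in tryptic_peptides(protein)]
        scores[name] = sum(
            any(abs(obs - theo) / theo * 1e6 <= tol_ppm for theo in theoretical)
            for obs in observed
        )
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    database = {"protA": "AKPLLEAAAQSTKGRSEK", "protB": "MSEKGGWVRDDLK"}
    observed = [1326.7507, 231.1331, 362.1801, 1500.0]  # last one matches nothing
    print(pmf_scores(observed, database))  # [('protA', 3), ('protB', 0)]
```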
Mass spectrometers are limited in the number of ions that can be detected at any one time. Pre-fractionation of proteins, for example by isolating specific cell types or subcellular organelles, is therefore often necessary to reduce sample complexity (Lonosky et al., 2004). Another way to fractionate a complex sample is to introduce a chromatographic step before MS analysis. This method, referred to as multidimensional protein identification technology (MudPIT; Whitelegge, 2002), has been used to conduct a shotgun survey of metabolic pathways in the leaves, roots and developing seeds of rice (Koller et al., 2002). When compared with 2DE-MS, each method identified unique proteins, supporting the complementary nature of the different proteomic technologies.
3.1.3 Yeast two-hybrid system
The yeast two-hybrid assay (Fields and Song, 1989) provides a genetic approach to the identification and analysis of protein–protein interactions. Yeast two-hybrid (Y2H) systems detect not only members of known complexes but also weak or transient interactions (Jansen et al., 2005). The Y2H assay exploits the modular organization of many transcription factors, which have a DNA-binding domain and an activation domain that can function independently: when these domains are fused to two proteins that interact, their ability to control transcription is reconstituted. In this assay, hybrid proteins are generated that fuse a protein X to the DNA-binding domain and a protein Y to the activation domain of a transcription factor (Fig. 3.2a). Interaction between X and Y reconstitutes the activity of the transcription factor and leads to expression of a reporter gene that carries a recognition site for the DNA-binding domain. In the typical practice of this method, a protein of interest fused to the DNA-binding domain (the so-called bait) is screened against a library of activation-domain hybrids (prey) to select interaction partners (Phizicky et al., 2003).
The key advantages of the Y2H assay are its sensitivity and flexibility (Phizicky et al., 2003). The sensitivity derives in part from the overproduction of the hybrid proteins in vivo, their designed direction to the nuclear compartment where interactions are monitored, the large number of variant inserts that can be examined at once, and the potency of the genetic selections. This sensitivity allows the detection of interactions with dissociation constants of around 10−7 M, which is in the range of most weak protein interactions found in the cell, making the assay more sensitive than co-purification. It also allows detection of certain transient interactions that might affect only a subpopulation of the hybrid proteins. The assay is flexible because it can be calibrated to detect interactions of varying affinity by altering the expression levels of the hybrid proteins, the number and nature of the DNA-binding sites and the composition of the selection media.
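Why overproduction and calibrated expression levels matter can be seen from a simple binding calculation. A minimal sketch, assuming single-site binding with the prey in excess over the bait, so that the fraction of bait in complex is [Y]/([Y] + Kd); the Kd of 10−7 M is the value quoted above, while the prey concentrations are hypothetical.

```python
def fraction_bait_bound(kd_molar: float, prey_molar: float) -> float:
    """Single-site binding, prey in excess (free prey ~ total prey).

    Fraction of bait in complex = [Y] / ([Y] + Kd).
    """
    return prey_molar / (prey_molar + kd_molar)


if __name__ == "__main__":
    kd = 1e-7  # dissociation constant quoted in the text
    for prey in (1e-8, 1e-7, 1e-6):  # hypothetical nuclear prey concentrations
        print(f"[prey] = {prey:.0e} M -> {fraction_bait_bound(kd, prey):.2f} bound")
```

Raising the prey concentration tenfold above Kd shifts most of the bait into complex, which is one reason strong expression makes weak interactions detectable, and lowering expression makes the assay more stringent.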
Some disadvantages of the Y2H assay include the unavoidable occurrence of false negatives and false positives (Phizicky et al., 2003). False negatives include membrane proteins and secretory proteins, which are not usually amenable to nuclear-based detection systems; proteins that fail to fold correctly; and interactions that depend on domains occluded in the fusions or on post-translational modifications. False positives include colonies that do not result from a bona fide protein interaction, as well as colonies that result from a protein interaction that does not reflect an association occurring in vivo.
There are several variations of the Y2H system. In the reverse Y2H system, an interaction induces expression of the URA3 reporter, and Ura3p converts 5-fluoroorotic acid (5-FOA) into the toxic substance 5-fluorouracil, inhibiting growth. Mutated or fragmented genes are created and analysed, and only loss-of-interaction mutants are able to grow in the presence of 5-FOA. In the one-hybrid system, the bait is a target DNA fragment fused to a reporter gene. Preys that are able to bind to the DNA fragment–reporter fusion lead to activation of the reporter genes (lacZ, HIS3 and URA3). In the repressed transactivator system, the interaction of bait–DNA-binding domain fusion proteins with prey–repressor domain fusion proteins is detected by repression of the URA3 reporter. The interaction of bait and prey enables cells to grow in the presence of 5-FOA, whereas non-interactors are sensitive to 5-FOA as a result of Ura3p production. In the three-hybrid system, the interaction of bait and prey proteins requires the presence of a third molecule to form a complex. The third molecule can be, for example, a protein carrying a nuclear localization signal that acts as a bridge between bait and prey, so that transcriptional activation occurs only when all three are present.
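The two 5-FOA counter-selections differ only in whether an interaction switches URA3 on or off, which inverts the growth phenotype. A minimal logical model of the reporter readout described above, ignoring leaky expression and partial repression; the system names are hypothetical labels.

```python
def ura3_expressed(system: str, interacts: bool) -> bool:
    """URA3 reporter state in two counter-selection variants (simplified model)."""
    if system == "reverse_y2h":
        # Interaction reconstitutes the activator, which switches URA3 on.
        return interacts
    if system == "repressed_transactivator":
        # Prey carries a repressor domain: interaction switches URA3 off.
        return not interacts
    raise ValueError(f"unknown system: {system}")


def grows_on_5foa(system: str, interacts: bool) -> bool:
    """Ura3p converts 5-FOA into toxic 5-fluorouracil, so URA3 expression blocks growth."""
    return not ura3_expressed(system, interacts)


if __name__ == "__main__":
    for system in ("reverse_y2h", "repressed_transactivator"):
        for interacts in (True, False):
            print(f"{system:25s} interacts={interacts!s:5s} "
                  f"grows={grows_on_5foa(system, interacts)}")
```

In the reverse system, growth therefore selects loss-of-interaction mutants; in the repressed transactivator system, growth selects interacting pairs.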
[Fig. 3.2 image: schematic panels (a) to (e) of bait (X) and prey (Y) hybrids and the screening formats described in the caption, e.g. X screened against preys Y1 to Yn, and X1 to X96 screened against Y1 to Y96.]
Fig. 3.2. Yeast two-hybrid approaches. (a) The yeast two-hybrid system. DNA-binding and activation domains (circles) are fused to two proteins, X and Y; the interaction of X and Y leads to reporter gene expression (arrow). (b) A standard two-hybrid search. Protein X, present as a DNA-binding domain hybrid, is screened against a complex library of random inserts in the activation domain vector (shown in square brackets). (c) A two-hybrid array approach. Protein X is screened against a complete set of full-length open reading frames (ORFs) present as activation domain hybrids (shown as yeast transformants spotted onto microtitre plates). (d) A two-hybrid search using a library of full-length ORFs. The set of ORFs as activation-domain hybrids (microtitre plates in square brackets) is combined to form a low-complexity library. (e) A two-hybrid pooling strategy. Pools of ORFs as both DNA-binding domain and activation domain hybrids (in square brackets) are screened against each other. From Phizicky et al. (2003), reprinted by permission from Macmillan Publishers Ltd.

Different genome-wide two-hybrid strategies have been used to analyse protein interactions in Saccharomyces cerevisiae. One approach involved screening a large number of individual proteins against a comprehensive library of randomly generated fragments (Fig. 3.2b). A second approach used systematic one-by-one testing of every possible protein combination, using a mating assay with a comprehensive array of strains (Fig. 3.2c). A third approach used a one-by-many mating strategy in which each member of a nearly complete set of strains expressing yeast open reading frames (ORFs) as DNA-binding domain hybrids was mated to a library of strains containing activation-domain fusions of full-length yeast ORFs (Fig. 3.2d). A fourth variation involved mating of defined pools of strain arrays (Fig. 3.2e). Suter et al. (2008) reviewed current applications of Y2H and variant technologies in yeast and mammalian systems. Y2H methods will continue to play a dominant role in the assessment of protein interactomes.
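The practical appeal of the pooling strategy (Fig. 3.2e) is that most pool-against-pool matings are negative and need no follow-up. A minimal simulation, assuming a perfect reporter readout (no false positives or negatives), pools of equal size and a hypothetical set of eight baits, eight preys and one true interaction; positive pool pairs are deconvoluted by one-by-one retesting, which is only one of several possible designs.

```python
from itertools import product
from typing import Callable


def make_pools(items: list[str], pool_size: int) -> list[list[str]]:
    """Split items into consecutive pools of at most pool_size members."""
    return [items[i:i + pool_size] for i in range(0, len(items), pool_size)]


def pooled_screen(baits: list[str], preys: list[str],
                  interacts: Callable[[str, str], bool],
                  pool_size: int) -> tuple[set[tuple[str, str]], int]:
    """Screen bait pools against prey pools, then retest members of positive pools.

    Returns (confirmed pairs, total number of tests performed).
    """
    hits: set[tuple[str, str]] = set()
    tests = 0
    for bait_pool, prey_pool in product(make_pools(baits, pool_size),
                                        make_pools(preys, pool_size)):
        tests += 1  # one mating of a bait pool with a prey pool
        candidates = list(product(bait_pool, prey_pool))
        if not any(interacts(x, y) for x, y in candidates):
            continue
        for x, y in candidates:  # deconvolute the positive pool pair one-by-one
            tests += 1
            if interacts(x, y):
                hits.add((x, y))
    return hits, tests


if __name__ == "__main__":
    baits = [f"X{i}" for i in range(1, 9)]
    preys = [f"Y{i}" for i in range(1, 9)]
    true_pairs = {("X2", "Y7")}  # hypothetical interaction
    found, n_tests = pooled_screen(baits, preys,
                                   lambda x, y: (x, y) in true_pairs, pool_size=4)
    print(found, n_tests, "tests versus", len(baits) * len(preys), "one-by-one")
```

Here 20 tests replace 64 one-by-one matings; the saving grows as interactions become rarer relative to the number of possible pairs.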
3.1.4 Serial analysis of gene expression
Serial analysis of gene expression (SAGE) is a method for the comprehensive analysis of gene expression patterns. SAGE is used to produce a snapshot of the mRNA population in a sample of interest (Velculescu et al., 1995). Several variants have since been developed, most notably a more robust version, LongSAGE (Saha et al., 2002), and the most recent, SuperSAGE (Matsumura et al., 2005), which enables very precise annotation of existing genes and discovery of new genes within genomes because of its increased tag length of 25–27 bp. Three principles underlie the SAGE methodology: (i) a short sequence