CHAPTER 7 DNA Sequencing

INTRODUCTION

For many recombinant DNA experiments, knowledge of a DNA sequence is a prerequisite for its further manipulation. DNA sequencing followed by computer-assisted searching for restriction endonuclease cleavage sites is often the fastest method of obtaining a detailed restriction map (UNITS 3.1-3.3). This information is particularly useful when vectors designed to overexpress proteins or to generate protein fusions are used to subclone a gene of interest (Chapter 16). Computer-assisted identification of protein-coding regions (open reading frames, or ORFs) within the DNA sequence, followed by computer-assisted similarity searches of DNA and protein databases, can lead to important insights about the function and structure of a cloned gene and its product (UNIT 7.7). In addition, the DNA sequence is a prerequisite for a detailed analysis of the 5′ and 3′ noncoding regulatory regions of a gene. DNA sequence information is also essential for site-directed mutagenesis (UNIT 8.1).

Small amounts of DNA sequence information (sequence-tagged sites, or STSs; or expressed sequence tags, or ESTs) are the basis of methods for mapping and ordering large DNA segments cloned into yeast or bacterial artificial chromosomes (YACs; BACs) or cosmids (Olson et al., 1989; Stephens et al., 1990; Green and Olson, 1990; Adams et al., 1991; Shizuya et al., 1992). EST databases are extremely valuable in gene discovery.

DNA sequencing techniques are primarily based on electrophoretic procedures using high-resolution denaturing polyacrylamide gels. These so-called sequencing gels are capable of resolving single-stranded oligonucleotides up to 800 bases in length that differ in size by a single deoxynucleotide. In practice, for a given region to be sequenced, a set of labeled, single-stranded oligonucleotides is generated whose members have one fixed end and differ at the other end by each successive deoxynucleotide in the sequence.
The key to determining the sequence of deoxynucleotides is to generate, in four separate enzymatic or chemical reactions, all oligonucleotides that terminate at the variable end in A, T, G, or C. The oligonucleotide products of the four reactions are then resolved on adjacent lanes of a sequencing gel. Because all possible oligodeoxynucleotides are represented among the four lanes, the DNA sequence can be read directly from the four “ladders” of oligonucleotides as shown in Figure 7.0.1. In automated four-color fluorescent DNA sequencers, the oligonucleotide products terminated at each of the four bases (A, C, G, T) are run in a single lane and are distinguished because the DNA fragments ending with each of the four bases carry a different fluorescent tag.

The practical limit on the amount of information that can be obtained from a set of sequencing reactions is the resolution of the sequencing gel (see UNIT 7.6 for protocols on setting up and running sequencing gels). Current technology allows ∼500 nucleotides of sequence information to be reliably obtained in one set of sequencing reactions, although more information (up to 800 nucleotides) is often obtained using an automated sequencer. Thus, if the region of DNA to be sequenced is <500 nucleotides, a single cloning into the appropriate vector is usually all that is necessary to produce a recombinant molecule that can easily be sequenced in a single set of reactions. For a larger region of DNA, it is generally necessary to break the large fragment into smaller ones that are then individually sequenced. This can be done in a random or an ordered fashion. UNIT 7.1 contains a discussion of strategies for sequencing large regions of DNA. Two protocols for subdividing large regions of DNA are provided in UNIT 7.2. These protocols are used to create a set of ordered, or nested, deletions for DNA sequencing using exonuclease III or Bal 31 nuclease.
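As a rough illustration of the ordered strategy, the following sketch estimates where successive reads would have to begin so that a large region is covered end to end. The ∼500-nucleotide read length is taken from the text; the 50-nucleotide overlap between neighboring reads and the function name plan_read_starts are assumptions made for this example only, and they are not values or procedures from UNIT 7.1 or 7.2.

```python
def plan_read_starts(region_length, read_length=500, overlap=50):
    """Return start positions (0-based) of reads that tile a region.

    read_length: nucleotides reliably read per set of reactions (from the text).
    overlap: nucleotides shared by neighboring reads (illustrative assumption).
    """
    if region_length <= 0:
        raise ValueError("region_length must be positive")
    if not 0 <= overlap < read_length:
        raise ValueError("overlap must be smaller than read_length")

    step = read_length - overlap
    starts = []
    position = 0
    while True:
        starts.append(position)
        # Stop once the current read reaches the end of the region.
        if position + read_length >= region_length:
            break
        position += step
    return starts


# A region shorter than one read needs a single clone; a 2-kb region needs several.
print(plan_read_starts(400))
print(plan_read_starts(2000))
```

In a nested-deletion experiment the start positions would correspond to approximate deletion end points; in practice these are selected by sizing the deletion products rather than being set exactly.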
The two methods that are widely used to determine DNA sequences, the enzymatic dideoxy method and the chemical method, differ primarily in the technique used to generate the ladder of oligonucleotides. In the enzymatic dideoxy sequencing method, a DNA polymerase is utilized to synthesize a labeled, complementary copy of a DNA template. In the chemical sequencing method, a labeled DNA strand is subjected to a set of base-specific chemical reagents. These two techniques are further described below.

Contributed by Frederick M. Ausubel, Lisa M. Albright, and Jingyue Ju. Current Protocols in Molecular Biology (1999) 7.0.1-7.0.15. Copyright © 1999 by John Wiley & Sons, Inc. Supplement 47.

DIDEOXY (SANGER) SEQUENCING

Sequencing Method

The dideoxy or enzymatic method, originally developed by F. Sanger and co-workers (1977, 1980), utilizes a DNA polymerase to synthesize a complementary copy of a single-stranded DNA template. DNA polymerases cannot initiate DNA chains; rather, chain elongation occurs at the 3′ end of a primer DNA that is annealed to “template” DNA (Fig. 7.0.2). The deoxynucleotide added to the growing primer chain is selected by base-pair matching to the template DNA. Chain growth involves the formation of a phosphodiester bridge between the 3′ hydroxyl group at the growing end of the primer and the 5′ phosphate group of the incoming deoxynucleotide (see Fig. A.1.7 for line drawings of the nucleotides). Thus, overall chain growth is in the 5′→3′ direction.

Dideoxy sequencing capitalizes on the ability of the DNA polymerase to use 2′,3′-dideoxynucleoside triphosphates (ddNTPs) as substrates. When a ddNMP is incorporated at the 3′ end of the growing primer chain, chain elongation is terminated at G, A, T, or C because the primer chain now lacks a 3′ hydroxyl group. To generate the four sequencing ladders shown in Figure 7.0.1, only one of the four possible ddNTPs is included in each of the four reactions (see below). The ddNTP/dNTP ratio in each reaction is adjusted such that a portion of the elongating primer chains terminates at each occurrence of the base in the template DNA corresponding to the included complementary ddNTP. In this way, each of the four elongation reactions contains a population of extended primer chains, all of which have a fixed 5′ end determined by the annealed primer and a variable 3′ end terminating at a specific dideoxynucleotide.

[Figure 7.0.1 schematic. Sequence read from the fixed end: ATTAGACGTCCG. Products of the four reactions, resolved in the A, T, G, and C lanes of the sequencing gel:
"A" reaction: A, ATTA, ATTAGA
"T" reaction: AT, ATT, ATTAGACGT
"G" reaction: ATTAG, ATTAGACG, ATTAGACGTCCG
"C" reaction: ATTAGAC, ATTAGACGTC, ATTAGACGTCC]

Figure 7.0.1 General strategy for DNA sequencing. To sequence a fragment of DNA, a set of radiolabeled single-stranded oligonucleotides is generated in four separate reactions. In each of the four reactions, the oligonucleotides have one fixed end and one end that terminates sequentially at each A, T, G, or C, respectively. The products of each reaction are fractionated by electrophoresis on adjacent lanes of a high-resolution polyacrylamide gel. After autoradiography, the DNA sequence can be “read” directly from the gel.
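The logic of reading a sequence from four ladders can be sketched in a few lines of Python. This is a minimal illustration using the example sequence of Figure 7.0.1, not a gel-analysis tool: the function names dideoxy_lanes and read_gel are hypothetical, and band positions are idealized as exact fragment lengths.

```python
def dideoxy_lanes(synthesized_strand):
    """Group fragment lengths by the base at which each fragment terminates.

    Every prefix of the synthesized strand is one possible termination
    product; it appears in the lane of the reaction containing the ddNTP
    that matches its 3'-terminal base.
    """
    lanes = {base: [] for base in "ATGC"}
    for length, base in enumerate(synthesized_strand, start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes):
    """Read the sequence from the shortest band (gel bottom) to the longest."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


lanes = dideoxy_lanes("ATTAGACGTCCG")
print(lanes["A"])       # fragment lengths in the A lane: [1, 4, 6]
print(read_gel(lanes))  # ATTAGACGTCCG
```

Because each fragment length occurs in exactly one lane, sorting all bands by size and noting the lane of each band recovers the sequence, which is what the eye does when reading an autoradiogram from bottom to top.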

Two protocols for dideoxy sequencing using radiolabeled ddNTPs are provided in UNIT 7.4A. The original dideoxy method, which in this chapter is referred to as the Sanger procedure (Sanger et al., 1977, 1980), was developed for use with the large fragment of E. coli DNA polymerase I known as the Klenow fragment. A synthetic oligonucleotide primer is annealed to the 3′ end of the region to be sequenced on a single-stranded DNA template (Fig. 7.0.2). The annealed template + primer is then divided into four reaction mixtures, each of which contains DNA polymerase, all four deoxynucleoside triphosphates (dNTPs), one of which is radiolabeled, and one of the four ddNTPs (Fig. 7.0.2, right side). Under these conditions, the primer is extended and labeled until incorporation of a specific dideoxynucleotide causes termination. In a subsequent chase reaction, high concentrations of all four dNTPs are added so that all chains not terminated by a dideoxynucleotide will be elongated into high-molecular-weight DNA that remains unresolved at the top of the sequencing gel.

In the “labeling/termination” method, developed for use with modified T7 DNA polymerase (Sequenase; UNIT 3.5; Tabor et al., 1987; Tabor and Richardson, 1987a, 1987b, 1989a, 1989b, 1990), labeling of the primer and termination by incorporation of a dideoxynucleotide occur in two separate reactions (Fig. 7.0.2, left side). After annealing of the primer to the template, the labeling reaction occurs when the primer is elongated and labeled in the presence of a low concentration of all four dNTPs, one of which is radiolabeled. DNA synthesis continues until one or more of the nucleotide pools is exhausted, leading to almost complete incorporation of the labeled nucleotide. The termination step takes place in four separate reactions, each of which contains additional dNTPs and one of the four ddNTPs.
In the termination step, a high concentration of dNTPs ensures processive DNA synthesis until the growing chains are terminated by the incorporation of a ddNMP.

In the Sanger procedure, the average length of the sequencing products is controlled by the ddNTP/dNTP ratio, where a higher ratio leads to shorter products. In the labeling/termination protocol, the average length of the sequencing products can be modulated either by the concentration of dNTPs in the labeling reaction (a higher concentration leads to longer products) or by the ddNTP/dNTP ratio in the termination reaction.

If Sequenase is used, the labeling/termination method is capable of yielding longer sequencing products, on average, than those obtained using the original Sanger protocol. Therefore, the labeling/termination method is advantageous for obtaining the maximum amount of sequence information per template. For applications where large amounts of sequence information are not needed (such as verifying constructions or sequencing small regions of DNA), the Sanger procedure is usually adequate. The Sanger procedure may be more reliable for obtaining the first few nucleotides of sequence information after the primer.

A variety of DNA polymerases are commercially available for sequencing. A description of these polymerases and their appropriate uses can be found in UNIT 7.4A. Thermostable DNA polymerases are the newest class of enzymes available for DNA sequencing. They are useful because they can carry out a sequencing reaction at high temperatures, which allows thermal cycling to increase the yield of sequencing fragments and therefore enhances detection sensitivity. The high reaction temperature also provides a way of destabilizing secondary structures of the DNA template, which can interfere with the elongation reaction.

Vectors and Templates for Dideoxy Sequencing

Dideoxy sequencing requires a single-stranded template to which the primer can anneal.
Single-stranded templates can be easily generated using specialized vectors derived from M13, an E. coli filamentous phage that contains a single-stranded circular DNA molecule (UNITS 1.14 & 1.15; Messing, 1983, 1988). Dideoxy sequencing can also be readily carried out using double-stranded DNA, with a cycle sequencing procedure involving denaturation of the double-stranded DNA in each cycle (Chen and Seeburg, 1985; Haltiner et al., 1985; Zagursky et al., 1985; Hattori and Sakaki, 1986). Dideoxy sequencing of a double-stranded template is particularly useful when DNA sequencing is the only rapid method available for verifying a particular plasmid construction. For large-scale sequencing projects, the use of a single-stranded DNA vector system, such as the M13mp series, is recommended, because the preparation of sequencing-quality, single-stranded DNA templates is somewhat more reliable than that of plasmid DNA templates. However, plasmids offer the advantage that sequence can be obtained from both the 5′ and 3′ ends.

[Figure 7.0.2 schematic. Step 1: add primer, which anneals to the primer annealing site adjacent to the region to be sequenced.
Sanger protocol (right side): step 2, add [35S]dATP + Klenow fragment (no DNA synthesis); step 3, divide into four reactions; step 4, "A" reaction: add ddATP + 3 dNTPs, "T" reaction: ddTTP + 3 dNTPs, "G" reaction: ddGTP + 3 dNTPs, "C" reaction: ddCTP + 3 dNTPs; step 5, add chase.
Labeling/termination protocol (left side): step 2, add [35S]dATP, dTTP, dGTP, dCTP + modified T7 DNA polymerase (synthesis proceeds until dNTPs are exhausted); step 3, divide into four reactions; step 4, "A" reaction: add ddATP + 4 dNTPs, "T" reaction: ddTTP + 4 dNTPs, "G" reaction: ddGTP + 4 dNTPs, "C" reaction: ddCTP + 4 dNTPs.
The products of the four reactions are run in the A, T, G, and C lanes of the sequencing gel.]

Figure 7.0.2 Dideoxy sequencing methods. In each method, a single-stranded DNA fragment is annealed to an oligonucleotide primer for polymerization (step 1). In the Sanger protocol (right side), the Klenow fragment and radiolabeled dATP are added (step 2). The reaction is divided into four aliquots (step 3) and the other three dNTPs and either ddATP, ddTTP, ddGTP, or ddCTP are added (step 4). DNA synthesis occurs until terminated by the incorporation of a ddNTP. A “chase” of all four dNTPs (step 5) elongates chains not terminated by a ddNMP into higher-molecular-weight DNA. In the labeling/termination protocol (left side), after the first step, a limiting amount of the four dNTPs (one of which is radiolabeled) and Sequenase are added (step 2). DNA synthesis proceeds until the dNTPs are exhausted. The reaction mix is divided into four aliquots (step 3) and all four dNTPs plus either ddATP, ddTTP, ddGTP, or ddCTP are added (step 4). Synthesis resumes, but termination specifically occurs when a ddNMP is incorporated. In each method, after the reactions are terminated, samples are loaded on adjacent lanes of a sequencing gel.
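The way the ddNTP/dNTP ratio controls product length can be illustrated with a deliberately simple probability model. The sketch below assumes that the chain terminates independently at each template position calling for the ddNTP in the reaction, with a probability set by the ratio of ddNTP to dNTP and by a discrimination factor describing how strongly the polymerase prefers the dNTP. The function names, the discrimination parameter, the 25% base composition, and the ratios are all illustrative assumptions, not values from the protocols in UNIT 7.4A.

```python
def termination_probability(dd_to_d_ratio, discrimination=1.0):
    """Chance that a ddNMP rather than a dNMP is incorporated at a matching site.

    discrimination: how many times more readily the polymerase uses the dNTP
    than the ddNTP at equal concentrations (1.0 means no preference; this is
    an illustrative parameter, not a measured value).
    """
    if dd_to_d_ratio <= 0 or discrimination <= 0:
        raise ValueError("ratio and discrimination must be positive")
    return dd_to_d_ratio / (dd_to_d_ratio + discrimination)


def mean_extension_length(dd_to_d_ratio, base_fraction=0.25, discrimination=1.0):
    """Average number of nucleotides added before the chain terminates.

    With a per-nucleotide termination chance of base_fraction * p, product
    lengths follow a geometric distribution whose mean is the reciprocal.
    """
    p_site = termination_probability(dd_to_d_ratio, discrimination)
    return 1.0 / (base_fraction * p_site)


# A higher ddNTP/dNTP ratio gives shorter products, as stated in the text.
for ratio in (0.01, 0.05, 0.2):
    print(ratio, round(mean_extension_length(ratio)))
```

The model ignores sequence context and polymerase pausing, so it indicates only the direction and rough scale of the effect.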

This feature is very valuable for genomic sequencing using randomly generated DNA fragments (the “shotgun” approach). A more detailed discussion of vectors used for dideoxy sequencing is provided in UNIT 7.1, and protocols for preparation of DNA templates derived from M13, plasmid, and bacteriophage λ vectors are provided in UNIT 7.3. The products of the polymerase chain reaction (PCR) can also be sequenced by the dideoxy method, and several protocols for generating these templates are provided in UNITS 15.2 & 15.5.

Radiolabels for Dideoxy Sequencing Reactions

[α-35S]dATP. The dideoxy sequencing protocols in UNIT 7.4A involve radiolabeling nascent DNA chains with [α-35S]dATP rather than with [α-32P]dATP, for the following reasons. First, the low-energy β emissions of 35S result in sharper autoradiographic bands compared to those generated by 32P, allowing more sequence to be read from the upper portion of a gel. Second, the lower-energy emissions of 35S cause fewer breaks in the sugar-phosphate backbone of the DNA. In practice, this means that 35S-labeled reaction products can be stored at −20°C for several weeks without significant degradation; by contrast, 32P products should be electrophoresed within a day. Third, users receive a lower radiation dose with 35S than with 32P.

[α-32P]dATP. In contrast to 35S, however, 32P offers the advantage of short exposure times and is particularly useful in situations, such as verification of plasmid constructions, where maximizing resolution in the higher region of the sequencing gel is not a priority.

[α-33P]dATP. 33P has a maximum β-emission energy that is 50% stronger than that of 35S, but 5-fold weaker than that of 32P. Sequences generated using [α-33P]dATP have short exposure times like those of 32P but band resolution comparable to that of 35S (Zagursky et al., 1991).

5′ end labeling.
An alternative to labeling the nascent oligonucleotide with [α-35S]dATP is to use a 5′-end-labeled primer generated by T4 polynucleotide kinase and [γ-32P]ATP or [γ-35S]ATP (UNIT 3.10). Sequencing of large double-stranded DNA templates (such as λgt11) with 5′-end-labeled primers has been found to give better results than standard labeling techniques. Protocols for sequencing using end-labeled primers are provided in UNIT 7.4A.

CHEMICAL (MAXAM-GILBERT) SEQUENCING

Sequencing Method

In the chemical method of DNA sequencing developed by A. Maxam and W. Gilbert (Maxam and Gilbert, 1977, 1980; Rubin and Schmid, 1980; Ambrose and Pless, 1987), the four sets of deoxyoligonucleotides are generated by subjecting a purified 3′- or 5′-end-labeled deoxyoligonucleotide to a base-specific chemical reagent that randomly cleaves DNA at one or two specific nucleotides. Because only end-labeled fragments are observed following autoradiography of the sequencing gel, four DNA ladders are observed as shown in Figure 7.0.3.

The Maxam and Gilbert chemical sequencing method is based on the ability of hydrazine, dimethyl sulfate (DMS), or formic acid to specifically modify bases within the DNA molecule. Piperidine is then added to catalyze strand breakage at these modified nucleotides. The specificity resides in the first reaction with hydrazine, DMS, or formic acid, which reacts with only a few percent of the bases. The second reaction, piperidine strand cleavage, must be quantitative. The chemical mechanisms of the first reactions are as follows:

G: DMS methylates nitrogen 7 of G, and the ring then opens between carbon 8 and nitrogen 9. Piperidine then displaces the modified guanine from its sugar.
G+A: Formic acid weakens A and G glycosidic bonds by protonating purine-ring nitrogens. The purines can then be displaced by piperidine.
T+C: Hydrazine splits the rings of T and C. The fragments of these bases can then be displaced by piperidine.
C: In the presence of NaCl, only C reacts with hydrazine. The modified C can then be displaced with piperidine.

In all four reactions, piperidine also catalyzes phosphodiester bond cleavage at the position where the modified base has been displaced.

Vectors for Chemical Sequencing

Chemical sequencing reactions can be performed on either single- or double-stranded DNA, as long as only one end is labeled. Specialized vectors have been developed (Eckert, 1987; Volckaert et al., 1984; Arnold and Puhler, 1988) that allow unique labeling of only one

strand of the cloned target DNA using Tth111I or other restriction enzymes that have asymmetric recognition sites adjacent to the cloned DNA. These vectors are described more fully in UNIT 7.1 and a protocol for their use is provided in UNIT 7.5.

[Figure 7.0.3 schematic. End-labeled fragment: 5′ *pGpApTpCpGpGpApCpCpT 3′. Gel lanes: G reaction, G+A reaction, T+C reaction, C reaction. DNA fragments shown to the right of the gel, from largest to smallest: *GATCGGACCT, *GATCGGACC, *GATCGGAC, *GATCGGA, *GATCGG, *GATCG, *GATC, *GAT, *GA, *G.]

Figure 7.0.3 Chemical sequencing strategy. The ladder of oligonucleotides after gel electrophoresis of the products from the four chemical cleavage reactions is shown. The asterisk (*) indicates the position in the DNA fragment of the 32P label, which is placed on the 5′ end in this example. The direction of fragment migration is downward; smaller DNA oligonucleotides migrate faster in the sequencing gel than larger oligonucleotides. The shaded bases at the 3′ end of the fragments to the right of the gel indicate bases that have been chemically modified and then displaced from the oligonucleotide during piperidine-mediated strand scission. For example, after a limited reaction with dimethyl sulfate (DMS), which is specific for G's, followed by quantitative release of the modified G residue by piperidine, a set of oligonucleotides is generated that terminate at the base immediately 5′ of each G in the sequence. In this example, the oligonucleotide products are *pGpApTpCpGp and *pGpApTpCp. Each of these products forms a band in the G lane. For *pG, the product is *p, which would most likely run off the gel, making it difficult to determine the identity of the 5′-terminal base. Because formic acid is specific for purines (G's and A's), a fragment that terminates in G or A will produce a band in the G + A lane. Hydrazine in the absence of NaCl cleaves T's and C's, resulting in a band in the T + C lane. Hydrazine in the presence of NaCl cleaves only C's; thus, a band is observed in the C lane.

CHOOSING BETWEEN DIDEOXY AND CHEMICAL SEQUENCING METHODS

As described above, the dideoxy chain-termination method is based on the ability of DNA polymerase to synthesize DNA 5′→3′ from a defined primer annealed to the vector DNA at a site adjacent to the DNA being sequenced. Each reaction contains one of the four ddNTPs, which terminates synthesis selectively at G, A, T, or C. Dideoxy sequencing is rapid. The primer-annealing and sequencing reactions can be completed within 60 to 90 min. A large number of single- or double-stranded samples can be prepared for sequencing simultaneously. The method also offers excellent band resolution if 35S-labeled nucleoside triphosphates are used to label the DNA. The major disadvantage of dideoxy sequencing is that the composition or secondary structure of the template can sometimes cause premature termination by DNA polymerase. Klenow fragment is more prone to this problem than are T7 DNA polymerase or the other alternative polymerases discussed in UNIT 7.4A. Despite these alternative polymerases, DNA is sometimes encountered that cannot be accurately sequenced by the dideoxy method.

The second method, chemical cleavage, is based on the ability of various chemicals to cleave DNA with a high specificity. The major advantage of the chemical method is that problems associated with polymerase synthesis of DNA (i.e., premature termination due to DNA sequence or structure) are eliminated, permitting sequencing of stretches of DNA that cannot

be sequenced by the enzymatic method. In addition, obtaining the sequence of shorter regions of DNA using the chemical method does not require subcloning into an appropriate sequencing vector, such as is required for dideoxy sequencing. Finally, chemical cleavage is the only sequencing method available for small oligonucleotides.

Before the development of specialized cloning vectors, such as pSP64CS and pSP65CS, the major disadvantage of the chemical sequencing method was that preparation of the DNA prior to sequencing was very time-consuming. The fragment to be sequenced had to be end labeled and then the end of interest had to be isolated from all other labeled ends. This required several gel electrophoresis steps and often entailed a significant loss of the 32P-labeled DNA fragment. However, vectors such as pSP64CS or pSP65CS allow direct subcloning of the DNA fragment of interest and end-labeling of a single predetermined terminus adjacent to the sequence, and they eliminate the time-consuming gel purification steps otherwise necessary. These vectors also make it possible to sequence a large number of samples simultaneously, because each recombinant plasmid to be sequenced is processed in a systematic manner (UNIT 7.1).

ALTERNATIVES TO RADIOLABELED SEQUENCING REACTIONS

Chemiluminescence

Chemiluminescence is a newer detection method that is comparable in sensitivity to traditional radiolabeling. Detection of the sequencing products occurs by a chemiluminescent reaction that can be monitored by autoradiography (UNIT 7.4B). A biotinylated primer is used in the dideoxy sequencing reactions. After electrophoresis of the biotinylated sequencing products on a sequencing gel, the products are transferred and cross-linked to a nylon membrane. The sequencing products are detected using streptavidin, biotinylated alkaline phosphatase, and a dioxetane substrate for alkaline phosphatase.
The multivalent streptavidin cross-links the biotinylated sequencing product to the biotinylated alkaline phosphatase, immobilizing the phosphatase. Upon dephosphorylation by alkaline phosphatase, the dioxetane substrate emits light that is detected by autoradiography (Beck et al., 1989; Tizard et al., 1990; Creasey et al., 1991; Evans, 1991; Martin et al., 1991). Technology for end labeling DNA fragments with biotin allows this detection method to be used in chemical sequencing reactions as well (Tizard et al., 1990).

Alternatively, the dideoxy sequencing reactions can be carried out with a standard, nonbiotinylated primer without radioactivity. After electrophoresis, transfer, and cross-linking to a membrane, the sequencing products are hybridized with a biotinylated probe complementary to the primer before performing the detection described above. This method can also be used with the products of chemical sequencing.

Multiplex Sequencing

Multiplex sequencing uses hybridization to a specific probe to detect an individual sequencing ladder in a mixture of ladders. In this method, sequencing products derived from a mixture of templates are subjected to electrophoresis on a sequencing gel, transferred to a membrane, and hybridized with a probe specific for one template. After hybridization and detection of the sequencing ladder derived from a single template, the probe is removed at a high temperature and the membrane is rehybridized to a different probe, complementary to an independent set of sequencing products that have been subjected to electrophoresis in the same lanes of the sequencing gel. Thus, the amount of sequence information available from one gel is multiplied by the number of times the membrane can be rehybridized (in practice, up to 20 times).
Multiplex sequencing originally used radioactive probes and chemical sequencing technology (Church and Gilbert, 1984; Church and Kiefer-Higgins, 1988) but it is equally well adapted to chemiluminescent detection and/or dideoxy reactions. A set of sequencing vectors is commercially available (Millipore) for multiplex sequencing using the dideoxy method. They have the identical priming region but contain unique sequences between the primer locus and the cloning site for the target DNA. These unique sequences are incorporated into the sequencing reaction products. Thus, sequencing reactions using several templates and the same primer can be performed in one set of reactions and later differentiated by successive hybridizations with probes for each unique sequence.

DEVELOPMENTS IN SEQUENCING TECHNOLOGY

Commercial Kits for Sequencing

Commercially available kits eliminate the need to assemble and calibrate numerous mixes

and can save a significant amount of startup time, although they are somewhat less flexible and can limit the ability to troubleshoot reactions when necessary. Nevertheless, these kits offer an excellent option for the novice at a reasonable cost. Kits are available for constructing nested deletions (see Table 7.2.1 for suppliers) as well as for dideoxy sequencing reactions using radiolabeling or chemiluminescent detection (Table 7.4.1) and for chemical sequencing using radiolabeling (UNIT 7.5).

Automated Sequencers and Automation of Sequencing Reactions

Automated sequencing machines automate the gel electrophoresis step, detection of DNA band pattern, and analysis of bands. Currently, all commercially available automated sequencers are designed for enzymatic sequencing reactions. The four sets of oligonucleotides generated by the sequencing reactions are loaded onto the gel manually and electrophoresis is then controlled automatically. Detection occurs at a point near the bottom edge of the gel in one of two ways. In one method, applicable to either fluorescently or radioactively labeled DNA, the bands of DNA moving sequentially past a detector are recorded. In the second method, the banding pattern of fluorescently labeled DNA is detected using an imaging camera. All automated sequencers possess data-collection capabilities, and either include further analysis programs or provide portability to external data-analysis software programs.

Fluorescent label can be incorporated into the sequencing products either through the primer or the ddNTPs. In the simplest system, a single fluorescently labeled primer is used and the four reaction products are run in separate lanes (Ansorge et al., 1987; Brumbaugh et al., 1988; Kambara et al., 1988; Middendorf et al., 1988; Fujita et al., 1990; Rosenthal et al., 1990; Sears et al., 1992).
Other systems use either identical primers labeled with four different fluors (Smith et al., 1985, 1986; Connel et al., 1987; Johnston-Dow et al., 1987; Wilson et al., 1988; Mardis and Roe, 1989), or a different fluorescent tag for each ddNTP (Prober et al., 1987; Zagursky and McCormick, 1990). In these latter systems, because the primers or ddNTPs fluoresce at different wavelengths, all four reaction products are run in a single lane and the fragments terminating at the four different bases (A, C, G, T) are identified by four distinct emission colors of the fluorescent tags upon laser excitation. This quadruples the throughput compared to sequencing with radiolabels. Figure 7.0.4 shows a four-color sequencing gel image from the automated ABI 377 DNA sequencer. Among the various fluorescence detection methods, four-color DNA sequencing based on the Sanger dideoxy chain-termination method is now accepted as the technique of choice for large-scale sequencing projects.

Another promising approach is automation of the sequencing reaction by the use of robotics (Martin et al., 1985; Frank et al., 1988; Wilson et al., 1988; Mardis and Roe, 1989; Zimmerman et al., 1989; D’Cunha et al., 1990; Fujita et al., 1990; Smith et al., 1990). As this technology and that of automated sequencers are further developed, more rapid and less tedious sequence-data acquisition for large projects should be possible. For example, a fully automated sequencing system has been reported for genomic-scale sequencing (Hawkins et al., 1997).

Capillary Array DNA Sequencer

The most recent innovation in automated DNA sequencers utilizes capillary electrophoresis (CE; UNIT 2.8) and laser-induced fluorescence detection. CE using gel-filled narrow-bore (10- to 100-µm internal diameter) capillaries provides rapid, high-field, high-resolution separation of DNA fragments without heating artifacts.
The use of multiple capillaries coupled with confocal laser-induced fluorescence for DNA sequencing was first reported in 1992 (Huang et al., 1992). Capillary array DNA sequencers (MegaBACE 1000 and ABI 3700) allow automated loading of 96 samples simultaneously as well as automated sample tracking and analysis (Kheterpal and Mathies, 1999). Currently, linear polyacrylamide (LPA) is the separation matrix that provides the best resolution and the longest read length in CE sequencers. Using LPA on CE sequencers with close to single-base resolution, a read length of 1000 bp with an electrophoresis time of 1 hr has been reported (Salas-Solano et al., 1998). This is an order-of-magnitude increase in speed compared to traditional slab gel sequencing. This new generation of automated DNA sequencers, coupled with separation matrix development and improved fluorescent tags, has the potential to substantially improve the throughput, speed, and overall process of larger sequencing projects.

Thermal Cycle Sequencing

Thermal cycle sequencing is a method of dideoxy sequencing in which a small number of template DNA molecules are repetitively

utilized to generate a sequencing ladder. A dideoxy sequencing reaction mixture (consisting of template, primer, dNTPs, ddNTPs, and a thermostable DNA polymerase) is subjected to repeated rounds of denaturation, annealing, and synthesis steps, similar to PCR (Chapter 15), using a commercially available thermal cycling machine. In this manner, linear amplification of the sequencing products occurs, allowing much less template DNA to be used than is usually required. In addition, thermal cycle sequencing eliminates the requirements for a separate annealing reaction preceding the sequencing reaction itself and for denaturing double-stranded DNA templates, and is compatible with automation processes. Various protocols have been developed for thermal cycle sequencing reactions (Applied Biosystems, 1989; Carothers et al., 1989; Murray, 1989; Adams and Blakesley, 1991; Craxton, 1991; Krishnan et al., 1991; Young and Blakesley, 1991; Sears et al., 1992); two such protocols are included in UNIT 7.4A. In addition, several thermal cycle sequencing kits are now commercially available utilizing each of the detection methods described above (Table 7.4A.1).

Energy Transfer Fluorescent Labeling Technology for Four-Color DNA Sequencing

Ideally, the fluorophores used for four-color DNA sequencing should have a similar high molar absorbance at a common excitation wavelength as well as high fluorescence quantum yields, exhibit strong and well-separated fluorescence emissions, and introduce the same relative mobility shift. These criteria cannot be met optimally by the spectroscopic properties of single fluorescent dye molecules, and indeed were poorly satisfied by the fluorescent tags initially used for automated DNA sequencing. By exploiting resonance energy transfer (ET), the constraints imposed by the use of single dyes were overcome and fluorescent tags for DNA sequencing that met the performance criteria set out above were developed (Ju et al., 1995).
Figure 7.0.5 shows how the energy transfer principle is utilized to construct ET primers that exhibit much enhanced fluorescence compared to single-dye fluorescent tags. The ET primer is labeled with two fluorophores that are separated by a spacer and coupled by resonance energy transfer. The excitation energy captured by the fluorophore that absorbs at the shorter wavelength (the “donor”) is transferred to the fluorophore that absorbs at the longer wavelength (the “acceptor”), located some distance away. This transfer results in a loss of donor fluorescence and the appearance of enhanced acceptor fluorescence emission, even though the acceptor has only weak absorbance at the excitation wavelength. For resonance energy transfer to take place, the emission spectrum of the donor must overlap the absorption spectrum of the acceptor. The rate of energy transfer is proportional to the inverse sixth power of the distance separating the donor and acceptor. Thus, the transfer efficiency, and consequently the ratio of acceptor to donor emission, can be controlled through changes in the spacing between the donor and acceptor. The ET dye-labeled primers and terminators can be efficiently excited at a common wavelength and exhibit strong and distinct fluorescent emissions (Ju et al., 1996; Lee et al., 1997). They are markedly superior to single dye-labeled primers and terminators for DNA sequencing and PCR fragment analysis.

The ability to sequence directly from large-insert clones (>30 kb), such as BAC clones, is very important for closing the gaps in large genomic sequencing and mapping projects. Such large templates are very difficult to sequence because a given mass of template contains fewer molecules, and therefore fewer priming sites, than the same mass of plasmid template. The higher sensitivity provided by primers and terminators tagged with ET dyes makes it possible to sequence these large-template DNAs directly, which is a significant advantage for large-scale sequencing and mapping projects (Marra et al., 1996; Heiner et al., 1998). Sequencing protocols for large DNA templates are provided in UNIT 7.4.

Figure 7.0.4 Gel image of 48 sequencing samples (lanes 1 to 48) from the four-color ABI 377 automated DNA sequencer. This black and white facsimile of the figure is intended only as a placeholder; for the full-color version of the figure go to http://www.currentprotocols.com/colorfigures

Figure 7.0.5 In ET primers, a common donor with a high absorbance at the excitation wavelength harvests energy and transmits it efficiently to acceptor fluorophores that emit in distinct wavelength regions. To avoid fluorescence quenching, the donor and acceptor are separated by a spacer that can be an oligonucleotide or other chemical functionality. The fluorescence emission intensity of current ET primers is 2- to 24-fold higher than that of conventional single dye-labeled primers, leading to high-quality DNA sequencing data. [Figure labels: schematic of an ET primer showing donor, spacer, acceptor, ET cassette, and the desired primer sequence 5′-CAGGAAACAGCTATGACC-3′, annotated with 488 nm, energy transfer, and 605 nm; plot of relative fluorescence intensity (0 to 1.0) versus wavelength (nm), with curves labeled ETF10F, ETF10G, ETF10T, ETF10R, FAM, JOE, TAM, and ROX and wavelength ticks at 525, 555, 580, and 605.]

Solid-Phase Sequencing

Another recent innovation that is applicable to both manual and automated DNA sequencing is the use of a solid-phase capture strategy to generate single-stranded DNA templates (Hultman et al., 1989, 1991; Jones et al., 1991; Kaneoka et al., 1991; Zimmerman et al., 1992).

In this approach, one strand of a double-stranded DNA molecule is biotinylated (e.g., by amplification using PCR in which one of the two primers is biotinylated; Chapter 15). The hemibiotinylated DNA molecule is then bound to streptavidin-coated ferromagnetic beads. The strands are denatured by treating the beads with alkali, and the biotinylated strands are separated from the nonbiotinylated strands using a magnet that traps the bead complex to which the biotinylated strands are bound. Sequencing reactions can be performed using either the biotinylated strand-bead complex or the nonbiotinylated strand preparation as the template.

Fluorescent sequencing procedures (both dye primer and dye terminator methods) have disadvantages, most notably false termination and background noise. In the dye primer method, every DNA fragment extended from the primer, including false-terminated fragments, carries a fluorescent dye, and thus all are detected by the fluorescent sequencer. This causes background noise and results in inaccurate sequencing data. In the dye terminator method, the excess dye-labeled dideoxynucleotides must be removed completely. Furthermore, if RNAs and nicked DNAs are present in the DNA templates, they will act as primers and generate false termination or high background noise. Thus, a DNA sequencing method that overcomes these disadvantages is desirable. A sequencing chemistry using solid phase-capturable dideoxynucleotides was recently developed that produces much cleaner sequencing data on both slab gel and capillary array sequencers, eliminating the disadvantages of current dye primer and dye terminator chemistries (Ju et al., 1997; Ju, 1999). The procedure involves coupling fluorescent ET primers that produce high fluorescent signals with solid phase-capturable terminators such as biotinylated dideoxynucleotides.
After the sequencing reaction, the extended DNA fragments are captured with magnetic beads coated with streptavidin, while the other components in the sequencing reaction are washed away. Only the pure dideoxynucleotide-terminated extension products are released from the magnetic beads and loaded on the sequencing gel, producing high-quality data.

Sequencing with Mass Spectrometry

Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS; UNITS 10.21 & 10.22) has recently been explored for DNA sequencing (Koster et al., 1996; Monforte and Becker, 1997; Fu et al., 1998). The Sanger dideoxy procedure is used to generate the DNA sequencing fragments, and no labels are required. The fragments are mixed with a matrix compound, 3-hydroxypicolinic acid, forming microcrystals on a flat plate. The sample plate is then placed in the vacuum chamber of the mass spectrometer. Upon irradiation by laser light, a microscopic portion of the matrix molecules desorbs off the plate, entraining the DNA fragments. In an electric field, the charged DNA molecules (ions) are accelerated toward the detector, which measures the mass of the fragments based on their time of flight; lighter ions arrive first, because flight time increases with the square root of the mass-to-charge ratio. Since the mass of each nucleotide (A, C, G, T) is different, the mass difference between adjacent peaks in the mass spectrum is used to establish the sequence of the DNA templates. For example, if the mass difference between two adjacent peaks in a mass spectrum of an unknown DNA is the mass of an “A” nucleotide, then the peak with the higher mass is identified as an “A”. Thus simple computation software is all that is needed to assemble the sequence of the DNA from its mass spectrum. Compared to gel-based sequencing systems, MS produces very high resolution of the sequencing fragments (sometimes as good as 1 Da) and extremely fast separation, on a time scale of microseconds.
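The peak-to-peak reasoning above can be sketched as a short routine. This is a minimal illustration, not software described in this unit: the function name and the tolerance are hypothetical, and the residue masses are approximate average masses of the four deoxynucleotide residues in a DNA chain, supplied here as an assumption. For dideoxy-terminated fragments, the difference between neighboring peaks works out to the mass of the deoxynucleotide residue corresponding to the newly added base, which is what the lookup table encodes.

```python
# Approximate average masses (Da) of deoxynucleotide residues within a DNA chain.
# Illustrative values supplied as an assumption, not taken from this unit.
RESIDUE_MASS = {"C": 289.18, "T": 304.20, "A": 313.21, "G": 329.21}


def call_bases(peak_masses, tolerance=2.0):
    """Call one base per pair of adjacent peaks from their mass difference.

    peak_masses  masses (Da) of the terminated fragments, in any order
    tolerance    largest accepted deviation (Da) from a residue mass;
                 differences outside it are reported as "N"
    """
    peaks = sorted(peak_masses)
    calls = []
    for lighter, heavier in zip(peaks, peaks[1:]):
        delta = heavier - lighter
        # Pick the residue whose mass is closest to the observed difference.
        base, mass = min(RESIDUE_MASS.items(), key=lambda item: abs(item[1] - delta))
        calls.append(base if abs(mass - delta) <= tolerance else "N")
    return "".join(calls)


if __name__ == "__main__":
    # Build a made-up ladder: an arbitrary 5000-Da starting fragment extended by G, A, T, C.
    ladder = [5000.0]
    for base in "GATC":
        ladder.append(ladder[-1] + RESIDUE_MASS[base])
    print(call_bases(ladder))
```

The closest pair of residue masses in this table (A and T) differ by about 9 Da, so the tolerance must stay well below half of that for the calls to be unambiguous; salt adducts shift peak masses and would defeat so simple a lookup.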
The high resolution allows accurate detection of mutations and heterozygosity. Another advantage of sequencing with mass spectrometry is that the compressions associated with gel-based systems are completely eliminated. However, in order for accurate measurements of the masses of the sequencing DNA fragments to be obtained, the sample must be free from alkali and alkaline earth salts. The samples must therefore be desalted before the MS analysis. Solid-phase procedures using either biotinylated primers or immobilized templates are generally used for desalting (Koster et al., 1996; Monforte and Becker, 1997). Both approaches introduce false-terminated DNA fragments into the mass detector. An elegant method to obtain pure sequencing fragments is to use solid phase-capturable dideoxynucleotides, such as biotinylated terminators, in the Sanger reactions to generate sequencing fragments. In this procedure, only the correctly terminated DNA fragments are isolated by streptavidin-coated beads; they are subsequently released and loaded on the mass spectrometer, resulting in accurate sequencing data (Ju et al., 1997; Ju, 1999). The current limit of mass spectrometry for sequencing is in the 100-bp range. Optimized solid-phase sequencing chemistry plus improvement in detector sensitivity for large DNA fragments will further improve mass spectrometry for DNA sequencing.

Sequencing by Hybridization

Sequencing by hybridization (SBH) makes use of an array of all possible short oligonucleotides to identify the segments of sequence present in an unknown DNA (Drmanac et al., 1989, 1993). This can be explained by the following example. Suppose that the pentanucleotide 5′-CAGTA-3′, whose complementary sequence is 5′-TACTG-3′, is the sequence to be determined using a pool of all possible trinucleotides (4³ = 64). This pentanucleotide will specifically hybridize only with the complementary trinucleotides TAC, ACT, and CTG, revealing the presence of these blocks in the complementary sequence. From this the sequence 5′-TACTG-3′ can be reconstructed by overlapping the trinucleotides. In the same way, with a library of 8- to 10-mer oligonucleotides, much larger segments of DNA sequence can be established. Computational approaches are then used to assemble the complete sequence. In the current state of the art of this technology, SBH has been used for detecting mutations and for resequencing a genome as well as for detecting polymorphisms (Chee et al., 1996; Drmanac et al., 1998). Robust de novo sequencing has not yet been demonstrated. Potential applications of SBH include physical mapping (ordering) of overlapping DNA clones, sequence checking, DNA fingerprinting comparisons of normal and disease-causing genes, and identification of DNA fragments with particular sequence motifs in complementary DNA and genomic libraries.

COMPUTER ANALYSIS

Once the gels are run and autoradiograms are obtained, computer software is practically indispensable for analysis of the sequence information. Computer software can assist at three stages.
First, DNA sequence data can be entered into a computer database either by “reading” the sequencing gels manually with a digitizer system or by using an automated gel scanner. Second, several software packages are available for detecting overlaps in sequence data and then assembling contiguous DNA sequences (contigs) from individual templates. Third, computer assistance is indispensable for analyzing final sequence data, e.g., in finding open reading frames or finding homologies to other sequences present in the nucleotide (UNIT 19.2) or protein databases (UNIT 19.3). UNIT 7.7 provides an overview of software and technology currently available.

LITERATURE CITED

Adams, M.D., Keller, J.M., Gocayne, J.D., Dubnick, M., Polymeropoulos, M.H., Xaio, H., Merril, C.R., Wu, A., Olde, B., Moreno, R.F., Kerlavage, A.R., McCombie, W.R., and Venter, J.C. 1991. Complementary DNA sequencing: Expressed sequence tags and human genome project. Science 252:1651-1656.
Adams, S. and Blakesley, R. 1991. Linear amplification sequencing. Focus (BRL) 13:56-57.
Ambrose, B. and Pless, R. 1987. DNA sequencing: Chemical methods. Methods Enzymol. 52:522-538.
Ansorge, W., Sproat, B., Stegemann, J., Schwager, C., and Zenke, M. 1987. Automated DNA sequencing: Ultrasensitive detection of fluorescent bands during electrophoresis. Nucl. Acids Res. 15:4593-4602.
Applied Biosystems. 1989 to present. Model 370 and 370A Automated Sequencer User Bulletin. Applied Biosystems, Foster City, Calif.
Arnold, W. and Puhler, A. 1988. A family of high copy number plasmid vectors with single end-label sites in rapid nucleotide sequencing. Gene 70:171-179.
Beck, S., O’Keefe, T.O., Coull, J.M., and Koster, H. 1989. Chemiluminescent detection of DNA: Application for DNA sequencing and hybridization. Nucl. Acids Res. 17:5115-5123.
Brumbaugh, J., Middendorf, L., Grone, D., and Ruth, J. 1988. Continuous on-line DNA sequencing using oligonucleotide primers with multiple fluorophores. Proc. Natl. Acad. Sci. U.S.A. 85:5610-5614.
Carothers, A.M., Urlab, G., Mucha, J., Grunburger, D., and Chasin, L.A. 1989. Point mutation analysis in a mammalian gene: Rapid preparation of total RNA, PCR amplification of cDNA and Taq sequencing by a novel method. BioTechniques 7:494-499.
Chee, M., Yang, R., Hubbell, E., Berno, A., Huang, X.C., Stern, D., Winkler, J., Lockhart, D.J., Morris, M.S., and Fodor, S.P. 1996. Accessing genetic information with high-density DNA arrays. Science 274:610-614.
Chen, E.Y. and Seeburg, P.H. 1985. Supercoil sequencing: A fast and simple method for sequencing plasmid DNA. DNA (N.Y.) 4:165-170.
Church, G. and Gilbert, W. 1984. Genomic sequencing. Proc. Natl. Acad. Sci. U.S.A. 81:1991-1995.
Church, G. and Kiefer-Higgins, S. 1988. Multiplex DNA sequencing. Science 240:185-188.
Connel, C., Fung, S., Heiner, C., Bridgeham, J., Chakerian, V., Heron, E., Jones, B., Menchen, S., Mordan, W., Raff, M., Racknor, M., Smith, L., Springer, J., Woo, S., and Hunkapiller, M. 1987. Automated DNA sequence analysis. BioTechniques 5:342-348.

Craxton, M. 1991. Linear amplification sequencing: A powerful method for sequencing DNA. Methods, a companion to Methods Enzymol. 3:20-26.
Creasey, A., D’Angio, L.M., Dunne, T., Kissinger, C., O’Keefe, T., Perry-O’Keefe, H., Moran, L., Roskey, M., Shildkraut, I., Sears, L., and Slatko, B. 1991. Application of a novel chemiluminescent-based DNA detection method to single-vector and multiplex DNA sequencing. BioTechniques 11:102-109.
D’Cunha, J., Berson, B.J., Brumly, R.L., Wagner, P.R., and Smith, L.M. 1990. An automated instrument for the performance of enzymatic DNA sequencing reactions. BioTechniques 9:80-90.
Drmanac, R., Labat, I., Brukner, I., and Crkvenjakov, R. 1989. Sequencing of megabase-plus DNA by hybridization: Theory of the method. Genomics 4:114-128.
Drmanac, R., Drmanac, S., Strezoska, Z., Paunesku, T., Labat, I., Zeremski, M., Snoddy, J., Funkhouser, W.K., Koop, B., Hood, L., et al. 1993. DNA sequence determination by hybridization: A strategy for efficient large-scale sequencing. Science 260:1649-1652.
Drmanac, S., Kita, D., Labat, I., Hauser, B., Schmidt, C., Burczak, J.D., and Drmanac, R. 1998. Accurate sequencing by hybridization for DNA diagnostics and individual genomics. Nature Biotechnol. 16:54-58.
Eckert, R. 1987. New vectors for rapid sequencing of DNA fragments by chemical degradation. Gene 51:245-252.
Evans, S. 1991. Millipore’s system speeds up DNA sequencing and eliminates radioactivity. Genet. Eng. News 14:29-41.
Frank, R., Bosserhoff, A., Boulin, C., Epstein, A., Gausepohl, H., and Ashman, K. 1988. Automation of DNA sequencing reactions and related techniques: A workstation for micromanipulation of liquids. Bio/Technology 6:1211-1213.
Fu, D.J., Tang, K., Braun, A., Reuter, D., Darnhofer-Demar, B., Little, D.P., O’Donnell, M.J., Cantor, C.R., and Koster, H. 1998. Sequencing exons 5 to 8 of the p53 gene by MALDI-TOF mass spectrometry. Nature Biotechnol. 16:381-384.
Fujita, M., Usui, S., Kiyama, M., Kambara, H., Murakawa, K., Suzuki, S., Sambe, H., and Takachi, K. 1990. Chemical robot for enzymatic reactions and extraction processes of DNA in DNA sequence analysis. BioTechniques 9:584-591.
Green, E.D. and Olson, M.V. 1990. Chromosomal region of the cystic fibrosis gene in yeast artificial chromosomes: A model for human genome mapping. Science 250:92-98.
Haltiner, M., Kempe, T., and Tjian, R. 1985. A novel strategy for constructing clustered point mutations. Nucl. Acids Res. 13:1015-1025.
Hattori, M. and Sakaki, Y. 1986. Dideoxy DNA sequencing method using denatured plasmid templates. Anal. Biochem. 152:232-238.
Hawkins, T.L., McKernan, K.J., Jacotot, L.B., MacKenzie, J.B., Richardson, P.M., and Lander, E.S. 1997. A magnetic attraction to high-throughput genomics. Science 276:1887-1889.
Heiner, C.R., Hunkapiller, K.L., Chen, S.M., Glass, J.I., and Chen, E.Y. 1998. Sequencing multimegabase-template DNA with BigDye terminator chemistry. Genome Res. 8:557-561.
Huang, X.C., Quesada, M.A., and Mathies, R.A. 1992. DNA sequencing using capillary array electrophoresis. Anal. Chem. 64:2149-2154.
Hultman, T., Stahl, S., Hornes, E., and Uhlen, M. 1989. Direct solid phase sequencing of genomic and plasmid DNA using magnetic beads as solid support. Nucl. Acids Res. 17:4937-4946.
Hultman, T., Bergh, S., Moks, T., and Uhlen, M. 1991. Bidirectional solid phase sequencing of in vitro amplified plasmid DNA. BioTechniques 10:84-93.
Johnston-Dow, E., Mardis, E., Heiner, C., and Roe, B.A. 1987. Optimized methods for fluorescent and radiolabeled DNA sequencing. BioTechniques 5:754-765.
Jones, D.S., Schofield, J.P., and Vaudin, M. 1991. Fluorescent and radioactive solid phase dideoxy sequencing of PCR products in microtitre plates. J. DNA Seq. Map. 1:279-283.
Ju, J. 1999. Nucleic Acid Sequencing with Solid Phase Capturable Terminators. U.S. Patent no. 5,876,936.
Ju, J., Ruan, C., Fuller, C.W., Glazer, A.N., and Mathies, R.A. 1995. Energy transfer fluorescent dye-labeled primers for DNA sequencing and analysis. Proc. Natl. Acad. Sci. U.S.A. 92:4347-4351.
Ju, J., Glazer, A.N., and Mathies, R.A. 1996. Energy transfer primers: A new fluorescence labeling paradigm for DNA sequencing and analysis. Nature Med. 2:246-249.
Ju, J., Yan, H., Zaro, M., Doctolero, M., Goralski, T., Konrad, K., Lachenmeier, E., and Cathcart, R. 1997. DNA sequencing with solid phase capturable terminators. Microb. Comp. Genomics 2:223.
Kambara, H., Nishikawa, T., Katayama, K., and Yamaguchi, T. 1988. Optimization of parameters in a DNA sequenator using fluorescence detection. Bio/Technology 6:816-821.
Kaneoka, H., Lee, D.R., Hsu, K.-C., Sharp, G.C., and Hoffman, R.W. 1991. Solid phase DNA sequencing of allele specific polymerase chain reaction amplified HLA-DR genes. BioTechniques 10:30-40.
Kheterpal, I. and Mathies, R.A. 1999. Capillary array electrophoresis DNA sequencing. Anal. Chem. 71:31A-37A.
Koster, H., Tang, K., Fu, D.J., Braun, A., van den Boom, D., Smith, C.L., Cotter, R.J., and Cantor, C.R. 1996. A strategy for rapid and efficient DNA sequencing by mass spectrometry. Nature Biotechnol. 14:1123-1128.

Krishnan, B.R., Blakesley, R.W., and Berg, D.E. 1991. Linear amplification DNA sequencing directly from single phage plaques and bacterial colonies. Nucl. Acids Res. 19:1153.
Lee, L.G., Spurgeon, S.L., Heiner, C.R., Benson, S.C., Rosenblum, B.B., Menchen, S.M., Graham, R.J., Constantinescu, A., Upadhya, K.G., and Cassel, J.M. 1997. New energy transfer dyes for DNA sequencing. Nucl. Acids Res. 25:2816-2822.
Mardis, E.R. and Roe, B.A. 1989. Automated methods for single-stranded DNA isolation and dideoxynucleotide DNA sequencing reactions on a robotic workstation. BioTechniques 7:840-850.
Marra, M., Weinstock, L.A., and Mardis, E.R. 1996. End sequence determination from large insert clones using energy transfer fluorescent primers. Genome Res. 6:1118-1122.
Martin, W., Warmington, J., Galinski, B., Gallager, M., Davies, R., Beck, M., and Oliver, S. 1985. A system to perform the Sanger dideoxy sequencing reactions. Bio/Technology 3:911-915.
Martin, C., Bresnick, L., Juo, R.-R., Voyta, J.C., and Bronstein, I. 1991. Improved chemiluminescence DNA sequencing. BioTechniques 11:110-113.
Maxam, A.M. and Gilbert, W. 1977. A new method for sequencing DNA. Proc. Natl. Acad. Sci. U.S.A. 74:560-564.
Maxam, A.M. and Gilbert, W. 1980. Sequencing end-labeled DNA with base-specific chemical cleavages. Methods Enzymol. 65:499-559.
Messing, J. 1983. New M13 vectors for cloning. Methods Enzymol. 101:20-78.
Messing, J. 1988. M13, the universal primer and the polylinker. Focus (BRL) 10:21-26.
Middendorf, L., Brumbaugh, J., Grone, D., Morgan, C., and Ruth, J. 1988. Large scale DNA sequencing. Am. Biol. Lab. (August) 14-22.
Monforte, J.A. and Becker, C.H. 1997. High-throughput DNA analysis by time-of-flight mass spectrometry. Nature Med. 3:360-362.
Murray, V. 1989. Improved double-stranded DNA sequencing using the linear polymerase chain reaction. Nucl. Acids Res. 17:8889.
Olson, M., Hood, L., Cantor, C., and Botstein, D. 1989. A common language for physical mapping of the human genome. Science 245:1434-1435.
Prober, J., Trainor, G., Dam, R., Hobbs, F., Robertson, C., Zagursky, R., Cocuzza, R., Jensen, M., and Baumeister, K. 1987. A system for rapid DNA sequencing with fluorescent chain-terminating dideoxynucleotides. Science 238:336-341.
Rosenthal, A., Sproat, B., Voss, H., Stegemann, J., Schwager, C., Erfle, H., Zimmerman, J., Courelle, C., and Ansorge, W. 1990. Automated sequencing of fluorescently labeled DNA by chemical degradation. J. DNA Seq. Map. 1:63-71.
Rubin, J. and Schmid, C. 1980. Pyrimidine-specific chemical reactions useful for DNA sequencing. Nucl. Acids Res. 8:4613-4619.
Salas-Solano, O., Carrilho, E., Kotler, L., Miller, A.W., Goetzinger, W., Sosic, Z., and Karger, B.L. 1998. Routine DNA sequencing of 1000 bases in less than one hour by capillary electrophoresis with replaceable linear polyacrylamide solutions. Anal. Chem. 70:3996-4003.
Sanger, F., Nicklen, S., and Coulson, A.R. 1977. DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. U.S.A. 74:5463-5467.
Sanger, F., Coulson, A.R., Barrell, B.G., Smith, A.J.M., and Roe, B.A. 1980. Cloning in single-stranded bacteriophage as an aid to rapid DNA sequencing. J. Mol. Biol. 143:161-178.
Sears, L., Moran, L., Kissinger, C., Creasey, T., Perry-O’Keefe, H., Roskey, M., Sutherland, E., and Slatko, B. 1992. CircumVent thermal cycle sequencing and alternative manual and automated DNA sequencing protocols using the highly thermostable VentR (exo−) DNA polymerase. BioTechniques 13:626-633.
Shizuya, H., Birren, B., Kim, U.-J., Mancino, V., Slepak, T., Tachiiri, Y., and Simon, M. 1992. Cloning and stable maintenance of 300-kilobase-pair fragments of human DNA in Escherichia coli using an F-factor-based vector. Proc. Natl. Acad. Sci. U.S.A. 89:8794-8797.
Smith, L., Fung, S., Hunkapiller, M., Hunkapiller, T., and Hood, L. 1985. The synthesis of oligonucleotides containing an aliphatic amino group at the 5′ terminus: Synthesis of fluorescent DNA primers for use in DNA sequencing analysis. Nucl. Acids Res. 13:2399-2412.
Smith, L., Sanders, J., Kaiser, R., Hughes, P., Dodd, C., Heiner, C., Kent, S., and Hood, L. 1986. Fluorescence detection in automated DNA sequence analysis. Nature 321:674-679.
Smith, V., Brown, C.M., Banker, A.T., and Barrell, B.G. 1990. Semiautomated preparation of DNA template for large scale sequencing projects. J. DNA Seq. Map. 1:73-78.
Stephens, J.C., Cavanaugh, M.L., Gradie, M.I., Mador, M.L., and Kidd, K.K. 1990. Mapping the human genome: Current status. Science 250:237-244.
Tabor, S. and Richardson, C.C. 1987a. Selective oxidation of the exonuclease domain of bacteriophage T7 DNA polymerase. J. Biol. Chem. 262:15330-15333.
Tabor, S. and Richardson, C.C. 1987b. DNA sequence analysis with a modified bacteriophage T7 DNA polymerase. Proc. Natl. Acad. Sci. U.S.A. 84:4767-4771.
Tabor, S. and Richardson, C.C. 1989a. Selective inactivation of the exonuclease activity of bacteriophage T7 DNA polymerase by in vitro mutagenesis. J. Biol. Chem. 264:6447-6458.

Tabor, S. and Richardson, C.C. 1989b. Effect of manganese ions on the incorporation of dideoxynucleotides by bacteriophage T7 DNA polymerase and Escherichia coli DNA polymerase I. Proc. Natl. Acad. Sci. U.S.A. 86:4076-4080.
Tabor, S. and Richardson, C.C. 1990. DNA sequence analysis with a modified bacteriophage T7 DNA polymerase: Effect of pyrophosphorolysis and metal ions. J. Biol. Chem. 265:8322-8328.
Tabor, S., Huber, H., and Richardson, C.C. 1987. Escherichia coli thioredoxin confers processivity of the DNA polymerase activity of the gene 5 protein of bacteriophage T7. J. Biol. Chem. 262:16212-16223.
Tizard, R., Cate, R.L., Ramachandran, K.L., Wysek, M., Voyta, J.C., Murphy, O.J., and Bronstein, I. 1990. Imaging of DNA sequences with chemiluminescence. Proc. Natl. Acad. Sci. U.S.A. 87:4514-4518.
Volckaert, G., de Vleeschouwer, E., Blocker, H., and Frank, R. 1984. A novel type of cloning vector for ultrarapid chemical degradation sequencing of DNA. Gene Anal. Tech. 1:52-59.
Wilson, R., Yuen, A., Clark, S., Spence, C., Arakelian, P., and Hood, L. 1988. Automation of dideoxynucleotide DNA sequencing reactions using a robotic workstation. BioTechniques 6:776-787.
Young, A. and Blakesley, R. 1991. Sequencing plasmids from single colonies with the dsDNA cycle sequencing system. Focus (BRL) 13:137.
Zagursky, R. and McCormick, R. 1990. DNA sequencing separations in capillary gels on a modified commercial DNA sequencing instrument. BioTechniques 9:74-79.
Zagursky, R., Baumeister, K., Lomax, N., and Berman, M. 1985. Rapid and easy sequencing of large double-stranded DNA and supercoiled plasmid DNA. Gene Anal. Tech. 2:89-94.
Zagursky, R.J., Conway, P.S., and Kashdan, M.A. 1991. Use of 33P for Sanger DNA sequencing. BioTechniques 11:36-38.
Zimmerman, J., Voss, H., Schwager, C., Stegemann, J., Erfle, H., and Ansorge, W. 1989. Automated preparation and purification of M13 templates for DNA sequencing. Methods Mol. Cell Biol. 1:29-34.
Zimmerman, J., Dietrich, T., Voss, H., Erfle, H., Schwager, C., Stegemann, J., Hewitt, N., and Ansorge, W. 1992. Fully automated Sanger sequencing protocol for double-stranded DNA. Methods Mol. Cell Biol. 3:39-42.

KEY REFERENCES

Maxam and Gilbert, 1977. See above. Describes the chemical cleavage method.
Sanger et al., 1977. See above. Describes the traditional Sanger procedure.
Tabor and Richardson, 1987b. See above. Describes the labeling/termination method.
Sears et al., 1992. See above. Describes the thermal cycle sequencing procedure.

Frederick M. Ausubel, Lisa M. Albright, and Jingyue Ju

The chapter editors wish to acknowledge the substantial assistance of Richard L. Eckert of Case Western Reserve and Barton E. Slatko of New England Biolabs.

UNIT 7.1 Overview of DNA Sequencing Strategies

Jay A. Shendure,1 Gregory J. Porreca,2 and George M. Church2
1 University of Washington, Seattle, Washington
2 Harvard Medical School, Boston, Massachusetts

ABSTRACT

Efficient and cost-effective DNA sequencing technologies have been, and may continue to be, critical to the progress of molecular biology. This overview of DNA sequencing strategies provides a high-level review of six distinct approaches to DNA sequencing: (a) dideoxy sequencing; (b) cyclic array sequencing; (c) sequencing-by-hybridization; (d) microelectrophoresis; (e) mass spectrometry; and (f) nanopore sequencing. The primary focus is on dideoxy sequencing, which has been the dominant technology since 1977, and on cyclic array strategies, for which several competitive implementations have been developed since 2005. Because the field of DNA sequencing is changing rapidly, this unit represents a snapshot of this particular moment. Curr. Protoc. Mol. Biol. 81:7.1.1-7.1.11. © 2008 by John Wiley & Sons, Inc.

Keywords: DNA · sequencing · Sanger · dideoxy · polony

INTRODUCTION

In the mid-1960s, the first attempts at DNA sequencing followed the precedent set for protein (Ryle et al., 1955) and RNA (Holley et al., 1965): sequencing by detailed analysis of degradation products. However, the length and consequent complexity of the DNA polymer proved to be a significant obstacle (Sanger, 1988). A key moment came in 1977, when groups led by Fred Sanger and Walter Gilbert independently published descriptions of methodologies for DNA sequencing, both of which relied on gel electrophoresis to separate DNA fragments with single-nucleotide resolution (Maxam and Gilbert, 1977; Sanger et al., 1977). In the years that followed, the rapid dissemination of these technologies and their progression to robust protocols enabled a wide range of critical advances throughout the fields of genetics and molecular biology.
The development of commercially available automated sequencing platforms in the mid-1980s represented a second key breakthrough that secured the dominance of the Sanger protocol (also known as “dideoxy sequencing”) over the Maxam-Gilbert protocol (also known as “chemical sequencing”) as the method of choice for the next several decades (Hunkapiller et al., 1991). In addition to automation, a supporting cast of related technologies was developed to further reduce costs and improve sequencing throughput. These included a broad range of methods for efficient library construction and template preparation, dideoxynucleotides (ddNTPs) bearing fluorescent moieties (Prober et al., 1987), and thermostable polymerases engineered to accept them (Tabor and Richardson, 1995), as well as the implementation of efficient DNA sequence production workflows in core facilities and high-throughput sequencing centers. It is notable that much of this innovation was motivated by the Human Genome Project (HGP), which achieved completion of a draft of the canonical human genome sequence in 2001 (International Human Genome Sequencing Consortium, 2001). As a consequence of the technological innovation that enabled the HGP, the per-base cost of dideoxy sequencing has followed an exponential decline (Collins et al., 2003; Shendure et al., 2004). Importantly, the read lengths and accuracy of sequencing traces have steadily improved as well. As community-wide capacity for high-throughput DNA sequence production has been maintained in the wake of the HGP, the number of sequenced nucleotides deposited in GenBank has continued its exponential rise. As of October 2007, genome sequences for 997 bacterial species and 164 eukaryotic species are available in at least draft assembly form.
Current Protocols in Molecular Biology 7.1.1-7.1.11, January 2008. Published online January 2008 in Wiley Interscience (www.interscience.wiley.com). DOI: 10.1002/0471142727.mb0701s81. Copyright © 2008 John Wiley & Sons, Inc.

In recent years, there has been a collective sense in the technology development field that optimization of dideoxy sequencing protocols may be approaching exhaustion, and that the trend of declining sequencing costs is unlikely to continue much further without a radical change in the underlying technology. This has sparked significant academic and commercial investment in alternative technological paradigms (Shendure et al., 2004). Several of these alternatives have quickly progressed to substantial proof-of-concept, demonstrating costs competitive with conventional dideoxy sequencing for certain applications (Margulies et al., 2005; Shendure et al., 2005). Some of these platforms have recently become, or are anticipated to become, widely available in an “open-source” format or as commercial products. Although dideoxy sequencing still accounts for the vast majority of DNA sequencing production, this is unlikely to be the case several years from now.

This unit provides a high-level overview of six distinct approaches to DNA sequencing. These are: (1) dideoxy sequencing, (2) cyclic array sequencing, (3) sequencing by hybridization, (4) microelectrophoresis, (5) mass spectrometry, and (6) nanopore sequencing. Additionally, this unit presents key parameters that should be considered when choosing the DNA sequencing strategy most appropriate for a given application. It should be emphasized that the DNA sequencing field is changing rapidly, so the information in this unit represents a snapshot of this particular moment.

It is worthwhile to note that the research goals that motivate DNA sequencing may be undergoing a substantial shift as well, concurrent with the introduction of new technologies. Given that reference genome sequences for H. sapiens as well as all major model organisms are nearly complete, demand will likely shift away from de novo genome sequencing towards other areas of application, such as resequencing (identifying genetic variation in the genome of an individual for whose species a reference genome is already available) and tag counting (i.e., serial analysis of gene expression or chromatin occupancy by the sequencing of short but identifying DNA tags).
The initial generation of new technologies will deliver sequence that is substantially shorter and less accurate than state-of-the-art Sanger sequencing. However, although the utility of such sequence may be limited for de novo sequencing, it will likely be compatible with, and often preferable for, other areas of application.

DNA SEQUENCING STRATEGIES

Dideoxy Sequencing

Dideoxy sequencing, also known as Sanger sequencing, proceeds by primer-initiated, polymerase-driven synthesis of DNA strands complementary to the template whose sequence is to be determined (Fig. 7.1.1). Numerous identical copies of the sequencing template undergo the primer extension reaction within a single microliter-scale volume. Generating sufficient quantities of template for a sequencing reaction is typically achieved by either (1) miniprep of a plasmid vector into which the fragment of interest has been cloned, or (2) polymerase chain reaction (PCR) followed by a cleanup step. In the sequencing reaction itself, both the natural deoxynucleotides (dNTPs) and the chain-terminating dideoxynucleotides (ddNTPs) are present at a specific ratio that determines their relative probability of incorporation during the primer extension. Incorporation of a ddNTP instead of a dNTP results in termination of a given strand. Therefore, for any given template molecule, strand elongation will begin at the 3′ end of the primer and terminate upon incorporation of a ddNTP.

In older protocols for dideoxy sequencing, four separate primer extension reactions are carried out, each containing only one of the four possible ddNTP species (ddATP, ddGTP, ddCTP, or ddTTP), along with template, polymerase, dNTPs, and a radioactively labeled primer. The result is a collection of many terminated strands of many different lengths within each reaction.
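The chain-termination logic of the four-reaction protocol can be illustrated with a small Python sketch. It is only a toy model under stated assumptions: the function names, the example strand "GATTACA", and the 20-nucleotide primer are hypothetical, the reactions are treated as error-free, and every position matching the ddNTP in a reaction is assumed to carry the same termination probability (`dd_fraction`), which real polymerases do not guarantee.

```python
def four_lane_fragments(synthesized, primer_len=0):
    """Return {ddNTP base: fragment lengths} for the four separate reactions.

    `synthesized` is the strand built by the polymerase (complementary to the
    template), written 5'->3' starting just after the primer. In the reaction
    containing ddNTP X, strands can only terminate at positions where X is
    incorporated, so each lane holds only the lengths ending in that base.
    """
    lanes = {base: [] for base in "ACGT"}
    for position, base in enumerate(synthesized, start=1):
        lanes[base].append(primer_len + position)
    return lanes


def read_gel(lanes):
    """Read the sequence by walking bands from shortest to longest fragment."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


def termination_probability(dd_fraction, k):
    """Chance that a strand stops exactly at the k-th eligible position.

    Assumes (hypothetically) a constant probability `dd_fraction` of picking
    the ddNTP over the dNTP at each position where that base is incorporated.
    """
    return dd_fraction * (1 - dd_fraction) ** (k - 1)


lanes = four_lane_fragments("GATTACA", primer_len=20)
print(lanes)
print(read_gel(lanes))
print(termination_probability(0.5, 3))
```

Under this model, a lower ddNTP fraction shifts the fragment population toward longer strands, which is one way to picture why the dNTP:ddNTP ratio matters.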
As each reaction contains only one ddNTP species, fragments with only a subset of possible lengths will be generated, corresponding to the positions of that nucleotide in the template sequence. The four reactions are then electrophoresed in four lanes of a denaturing polyacrylamide gel to yield size separation with single-nucleotide resolution. The pattern of bands (with each band consisting of terminated fragments of a single length) across the four lanes allows one to directly interpret the primary sequence of the template under analysis.

Current implementations of dideoxy sequencing differ in several key ways from the protocol described above. Only a single primer extension reaction is performed that includes all four ddNTPs. The four species of ddNTP are labeled with fluorescent dyes that have the same excitation wavelength but different emission spectra, allowing for identification by fluorescence resonance energy transfer (FRET). To minimize the required amount of template DNA, a “cycle sequencing” reaction is used, in which multiple cycles of denaturation, primer annealing, and primer extension linearly increase the number of terminated strands. This requires the use of engineered polymerases, such as ThermoSequenase, that are thermostable and that efficiently incorporate modified ddNTPs (Tabor and Richardson, 1995).

Figure 7.1.1 Schematic of the basic principle involved in dideoxy sequencing. The sequencing template consists of an unknown region whose sequence is to be determined, flanked by known sequence to which a sequencing primer can be hybridized. Cycle sequencing (multiple cycles of primer annealing, primer extension, and denaturation) is performed with polymerase, dNTPs, and fluorescently labeled ddNTPs (where a different label is present on each species of ddNTP). Products of the cycle sequencing reaction are run into a capillary containing a denaturing polymer. This yields size-based separation with single-base-pair resolution, with the shortest fragments running the fastest. Observation of the emission spectra in four channels (corresponding to the fluorescent labels for the four ddNTP species) over time, as fragments emerge from capillary electrophoresis, can be used to infer the primary sequence of the unknown template.

The products of the cycle sequencing reaction are analyzed in an automated sequencing instrument via electrophoresis in a long capillary filled with a denaturing polymer that yields size separation with single-base-pair resolution. As fragments of each discrete length pass through a transparent component near the end of the capillary, a single wavelength of light excites the fluorophores linked to the ddNTPs. Labeled fragments fluoresce at one of four distinct wavelengths, revealing the identity of their terminal base via FRET. Simultaneous measurement of the emission spectra at these four wavelengths produces a four-color sequencing trace. Computer algorithms (“base callers”) interpret the peak heights in these traces to produce a DNA sequence.
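As a rough illustration of what a base caller does with a four-color trace, the Python sketch below picks the brightest channel at each peak. This is a deliberately naive stand-in, not the algorithm used by any real base caller: it assumes peaks have already been located and aligned across channels, and the function names and intensity values are hypothetical. The second function applies the widely used phred-style convention Q = -10 log10(P_error) for expressing per-base accuracy.

```python
import math


def call_bases(peak_intensities):
    """Call one base per peak by choosing the channel with the highest signal.

    `peak_intensities` maps each base ("A", "C", "G", "T") to a list of
    intensities, one value per already-detected peak (an assumption; real
    traces need peak detection, spacing, and dye-mobility corrections first).
    """
    channels = sorted(peak_intensities)
    n_peaks = len(peak_intensities[channels[0]])
    calls = []
    for i in range(n_peaks):
        best = max(channels, key=lambda base: peak_intensities[base][i])
        calls.append(best)
    return "".join(calls)


def phred_quality(error_probability):
    """Convert an estimated error probability into a phred-style quality value."""
    return -10.0 * math.log10(error_probability)


# Hypothetical peak intensities for a four-peak trace.
trace = {
    "A": [5, 90, 3, 4],
    "C": [80, 4, 2, 6],
    "G": [3, 5, 70, 2],
    "T": [4, 2, 6, 95],
}
print(call_bases(trace))
print(phred_quality(1e-5))
```

On this scale, an error probability of 1 in 100,000 (99.999% accuracy) corresponds to a quality value of about 50.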
Importantly, sophisticated algorithms exist that also estimate the accuracy with which individual base calls are made (Ewing and Green, 1998; Ewing et al., 1998). Although the per-base accuracy can vary substantially within a single sequencing read, the accuracy of the best base calls can be as high as 99.999%.

Nearly all dideoxy sequencing performed today makes use of automated capillary electrophoresis, which typically analyzes 96 to 384 sequencing reactions simultaneously via an array of capillaries. Major vendors include Applied Biosystems (e.g., the ABI 3730) and GE Healthcare (e.g., the MegaBACE instrument series). There is a tradeoff between long read lengths and the overall throughput of an instrument. Depending on which parameter is being optimized, conventional instruments are capable of reads just over 1000 base pairs in length, or production throughputs of over 2.5 megabases per day. Because of variation in the levels of optimization and instrument uptime, the cost of dideoxy sequencing varies widely throughout the research community. The in-house costs of high-throughput sequencing centers may be as low as 50 cents per kilobase, while core facilities and commercial entities may charge anywhere from $1 to $20 per sequencing read.

Cyclic Array Sequencing

All of the recently released, or soon-to-be-released, non-Sanger commercial sequencing platforms, including systems from 454/Roche, Solexa/Illumina, Agencourt/Applied Biosystems, and Helicos BioSystems, fall under the rubric of a single paradigm, termed cyclic array sequencing (Fig. 7.1.2). Cyclic array platforms achieve low costs by simultaneously decoding a two-dimensional array bearing millions (potentially billions) of distinct sequencing features. The sequencing features are “clonal,” in that each resolvable unit contains only one species of DNA (as a single molecule or in multiple copies) physically immobilized on the array. The features may be arranged in an ordered fashion or may be randomly dispersed. Each DNA feature generally includes an unknown sequence of interest (distinct from the unknown sequence of other DNA features on the array) flanked by universal adaptor sequences. A key point in this approach is that the features are not necessarily separated into individual wells. Rather, because they are immobilized on a single surface, a single reagent volume is applied to simultaneously access and manipulate all features in parallel.
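The economy of sharing one reagent volume across the whole array can be checked with simple arithmetic. The Python sketch below divides a shared volume by the number of features; the 100-microliter figure is an assumed illustrative value (the text of Fig. 7.1.2 says only "microliter-scale"), and the function names are hypothetical.

```python
def volume_per_feature_liters(total_volume_liters, n_features):
    """Effective reagent volume available to each feature on the array."""
    return total_volume_liters / n_features


def format_small_volume(volume_liters):
    """Express a volume in picoliters or femtoliters, whichever reads better."""
    if volume_liters >= 1e-12:
        return f"{volume_liters / 1e-12:.0f} pL"
    return f"{volume_liters / 1e-15:.0f} fL"


TOTAL_VOLUME_L = 100e-6  # assumed 100 microliters of reagent per cycle

for n_features in (1_000_000, 1_000_000_000):
    per_feature = volume_per_feature_liters(TOTAL_VOLUME_L, n_features)
    print(n_features, format_small_volume(per_feature))
```

With these assumed numbers, a million features share about 100 picoliters each and a billion features about 100 femtoliters each, compared with the microliter-scale volume consumed by a single conventional dideoxy reaction.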
The sequencing process is cyclic because in each cycle an enzymatic process is applied to interrogate the identity of a single base position for all features in parallel. The enzymatic process is coupled to either the production of light or the incorporation of a fluorescent group. At the conclusion of each cycle, data are acquired by CCD-based imaging of the array. Subsequent cycles are aimed at interrogating different base positions within the template. After multiple cycles of enzymatic manipulation, position-specific interrogation, and array imaging, a contiguous sequence for each feature can be derived from analysis of the full series of imaging data covering its position.

Although this basic paradigm serves to describe several different platforms for cyclic array sequencing, the platforms differ remarkably in the specifics of implementation. The primary areas of difference (summarized for several platforms in Table 7.1.1) are (1) the method used to generate the DNA sequencing features, and (2) the biochemistry used.

Figure 7.1.2 The concept of cyclic array sequencing platforms involves an array of DNA features to be sequenced, immobilized at constant locations on a solid substrate. At each cycle, the identity of a single base position is interrogated at each feature. Data are collected at each cycle by imaging of the array. At the conclusion of the experiment, imaging data for each feature collected over the full set of cycles can be used to infer contiguous stretches of sequence. The power of cyclic array methods to achieve low costs derives from the possibility of simultaneously sequencing millions to potentially billions of sequencing features in parallel. Also, microliter-scale reagent volumes can be used to manipulate all features in a single reaction, such that the effective reagent volume per sequencing feature is on the order of picoliters or femtoliters. For the color version of this figure go to http://www.currentprotocols.com.
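The cycle-by-cycle decoding idea can be sketched in a few lines of Python. This is a schematic model only: it assumes exactly one base is read per feature per cycle with no errors, and it replaces the enzymatic step and CCD image with a dictionary lookup. The feature coordinates, sequences, and function names are hypothetical, and actual platforms differ in both chemistry and signal encoding.

```python
def image_cycle(features, cycle):
    """Stand-in for one imaging step: the base observed at every feature.

    `features` maps an (x, y) array position to that feature's hidden
    sequence; the returned dict plays the role of one processed image.
    """
    return {xy: seq[cycle] for xy, seq in features.items() if cycle < len(seq)}


def decode_array(images):
    """Assemble one contiguous read per feature from the series of images."""
    reads = {}
    for image in images:
        for xy, base in image.items():
            reads[xy] = reads.get(xy, "") + base
    return reads


# Three hypothetical features at fixed positions on the array.
features = {(0, 0): "ACGT", (0, 1): "TTGA", (3, 2): "GGCA"}

# Each cycle interrogates one base position for all features in parallel.
images = [image_cycle(features, cycle) for cycle in range(4)]
reads = decode_array(images)
print(reads)
```

The number of cycles sets the read length, while the number of features sets how many reads are collected per run, so throughput scales with array density rather than with the number of separate reactions.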
