
Determining the base type in SMDS by fluorescence is conceptually simple: a fluorescence signal at a primer location during a given step of the sequencing cycle indicates that the base supplied in that step has been incorporated into the primer strand on that DNA template.

In practice, however, deciding whether an incorporation event has occurred is not trivial, because the rates of false-positive and false-negative signals must be taken into account.

In non-FRET single molecule sequencing, false-positive signals occur when a dye signal coincides by chance with the primer location, for example because a labeled nucleotide binds non-specifically close to the DNA template, within roughly the size of a pixel.

Figure 10. Histogram of sequence space for 4-mers composed of A and G. All traces that reached at least four incorporations are included. (A) Results for template 1 (actual sequence fingerprint: AAGA). (B) Results for template 2 (actual sequence fingerprint: AGAA). Reprinted from Braslavsky, I., Hebert, B., Kartalov, E. and Quake, S. R. (2003). Sequence information can be obtained from single DNA molecules. Proc. Natl. Acad. Sci. USA 100, 3960-3964. Copyright (2003), reprinted with permission from National Academy of Sciences (USA).

False-positive signals can also arise when the DNA polymerase mis-incorporates a labeled nucleotide. Every false-positive signal indicates that a nucleotide has been inserted when in fact none should have been, and hence introduces an error into the sequence of that particular DNA template. False-negative signals arise when a nucleotide is inserted but no fluorescent signal is detected. This could be due to defective reagents, such as unlabeled nucleotides, or to a labeled nucleotide whose attached dye has bleached during the donor observation that precedes the FRET imaging. In addition, dye blinking and out-of-focus imaging can be sources of false-negative signals.

However, the asynchronous nature of single molecule sequencing allows false signals to be discriminated against statistically for each template. For example, the sequence fingerprinting experiment described in Figure 8 was also performed with an independent template DNA sequence (Braslavsky et al., 2003). Comparing the measured sequences with the set of all possible 4-mer sequences shows that the correct sequences of the two templates can be discriminated at a 97% confidence level (see Figure 10).
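To illustrate how statistics over many independent molecules can outvote false signals, here is a minimal simulation sketch in the spirit of Figure 10. It assumes a deliberately simplified, hypothetical error model: before each true base a false positive adds a random base with probability `p_fp`, and each true base is missed (false negative) with probability `p_fn`. The function names `observe_trace` and `fingerprint_histogram` and the error rates used below are illustrative choices, not parameters from the original experiment.

```python
import random
from collections import Counter


def observe_trace(template, p_fp, p_fn, rng):
    """Bases detected for one molecule under a simplified, hypothetical error model.

    Before each true base, a false positive (spurious signal) adds a random base
    drawn from the template's own alphabet with probability p_fp; each true base
    is missed (false negative) with probability p_fn.
    """
    alphabet = sorted(set(template))  # e.g. only A and G, as in the two-base fingerprint
    detected = []
    for base in template:
        if rng.random() < p_fp:
            detected.append(rng.choice(alphabet))  # false positive: extra base
        if rng.random() >= p_fn:
            detected.append(base)  # true incorporation that was detected
    return "".join(detected)


def fingerprint_histogram(template, n_molecules, p_fp, p_fn, seed=0):
    """Count the first four detected bases over many independent molecules."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_molecules):
        trace = observe_trace(template, p_fp, p_fn, rng)
        if len(trace) >= 4:  # keep only traces with at least four incorporations
            counts[trace[:4]] += 1
    return counts


counts = fingerprint_histogram("AAGA", n_molecules=500, p_fp=0.05, p_fn=0.1)
for fingerprint, n in counts.most_common(3):
    print(fingerprint, n)
```

Individual traces carry insertions and deletions, but because the errors are scattered over many different 4-mers, the true fingerprint remains the clear peak of the histogram.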

In re-sequencing applications, reads longer than 16 to 20 bases are unique, that is, they occur only once in the reference sequence (van Dam and Quake, 2002). Thus, when read lengths of 20 bases or more are generated, the reads can be aligned to a known reference sequence (Figure 9). When high coverage of the reference sequence is obtained, the aligned reads can be averaged into a consensus, revealing mutations or disagreements with the reference sequence. By increasing the coverage, or sequencing depth, one can find rare mutations even in noisy raw sequence data. Other factors can also reduce the error rate: (1) a mis-incorporation creates a mismatch at the 3′ end of the primer, so extension of that template will probably terminate and the template is thus filtered out of the template pool; (2) a random overlap appears as a single-base insertion in the alignment, which is rare in gene sequences because it causes a shift in the reading frame, and thus can be filtered out in some cases; and (3) since the location of each molecule is known, it is possible, in principle, to sequence the same molecule twice, a procedure that would dramatically decrease the error rate.
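The averaging step can be sketched as a per-position majority vote over reads that have already been aligned to the reference. This is a minimal illustration, assuming a hypothetical input format (equal-length strings with `-` for uncovered positions) and assuming that read errors are independent with a fixed per-read error rate; real consensus callers use base qualities and more careful statistics. `consensus` and `p_no_majority` are illustrative names.

```python
from collections import Counter
from math import comb


def consensus(aligned_reads):
    """Majority-vote base call at each reference position.

    aligned_reads: equal-length strings already placed on the reference,
    with '-' where a read does not cover a position (hypothetical format).
    """
    calls = []
    for column in zip(*aligned_reads):
        bases = Counter(b for b in column if b != "-")
        calls.append(bases.most_common(1)[0][0] if bases else "N")  # N = no coverage
    return "".join(calls)


def p_no_majority(per_read_error, depth):
    """Probability that the true base fails to win a strict majority at one position.

    Assumes each covering read is wrong there independently with probability
    per_read_error; the true base loses its strict majority when at least half
    of the reads are wrong.
    """
    return sum(
        comb(depth, k) * per_read_error**k * (1 - per_read_error) ** (depth - k)
        for k in range(depth + 1)
        if 2 * k >= depth
    )


reads = ["ACGTAC--", "ACGAAC-T", "-CGTACGT", "ACGTACGT"]
print(consensus(reads))  # ACGTACGT: the A in the fourth column of the second read is outvoted
for depth in (1, 5, 15):
    print(depth, p_no_majority(0.1, depth))
```

Under these assumptions a 10% raw error rate at depth 1 drops below 1% at depth 5 (about 0.0086), which is the sense in which higher coverage lets rare mutations be distinguished from noise.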

In SMDS, each molecule carries unique and critical information, so one would like to examine the same molecule for the full duration of the experiment. The important stability constants are therefore not the equilibrium constants but the off-rates, because once a molecule leaves its anchoring position it cannot be examined further. Hence, parameters such as template stability, incorporation kinetics and others need to be optimized in order to increase read length, reduce error rates and ensure the robustness of the system.
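A short calculation shows why the off-rate, rather than an equilibrium constant such as K_d = k_off/k_on, sets the usable read length: a molecule that dissociates is washed away, so rebinding does not restore it. This sketch assumes first-order, irreversible loss, S(t) = exp(-k_off t), and uses hypothetical values for `k_off` and the cycle time; the function names are illustrative.

```python
import math


def surviving_fraction(k_off_per_s, cycle_time_s, n_cycles):
    """Fraction of templates still anchored after n_cycles.

    Assumes first-order, irreversible loss (a released molecule is washed away
    and does not rebind): S(t) = exp(-k_off * t).
    """
    return math.exp(-k_off_per_s * cycle_time_s * n_cycles)


def max_cycles(k_off_per_s, cycle_time_s, min_fraction):
    """Largest cycle count for which at least min_fraction of templates remain."""
    return math.floor(-math.log(min_fraction) / (k_off_per_s * cycle_time_s))


# Hypothetical numbers: k_off = 1e-5 per second, 300 s per cycle.
print(max_cycles(1e-5, 300, 0.5))  # 231 cycles before half the templates are lost
```

Two anchors with the same K_d but a tenfold difference in k_off would, under this model, differ tenfold in the number of cycles they support.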

Figure 11. Several important time constants play a role in determining the minimum reagent concentrations necessary and the error sources in the experiments.

Some of the potential processes of concern in SMDS are illustrated in Figure 11. A few of these concerns are explained below:

• The stability of the substrate: what is the lifetime of the polyelectrolyte multilayers or other surfaces?

• The stability of the connector between the DNA and the surface, such as biotin-streptavidin.

• The kinetics of incorporation of labeled nucleotides: the bulky labeled nucleotides are a possible bottleneck for polymerase activity; a cleavable nucleotide increases the yield tremendously.

• The stability of the primer/DNA hybridization.

• Photo-induced radicals can damage the DNA, the dye (bleaching) and other components in the flow cell.

• The oxygen scavenger system can reduce the formation of oxygen radicals, but fluctuations in the performance of the scavenger solution can affect the sequencing run. The scavenger solution might also degrade the surface.

• Non-specific sticking of the fluorescent molecules produces reading errors. This might be addressed by careful surface preparation and suitable wash solutions.
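If the loss processes listed above act independently and each is approximated as first order, their rates add, so the effective lifetime of a usable template satisfies 1/τ_total = Σ 1/τ_i and the shortest individual lifetime dominates. The sketch below assumes exactly that; the lifetimes are hypothetical placeholders, and photodamage in reality depends on illumination dose rather than elapsed time alone. `combined_lifetime` and `limiting_process` are illustrative names.

```python
def combined_lifetime(lifetimes_s):
    """Effective lifetime when independent first-order loss processes act together.

    Rates add, so 1/tau_total = sum(1/tau_i).
    """
    return 1.0 / sum(1.0 / tau for tau in lifetimes_s.values())


def limiting_process(lifetimes_s):
    """The process with the shortest lifetime contributes the largest loss rate."""
    return min(lifetimes_s, key=lifetimes_s.get)


# Hypothetical lifetimes in seconds, chosen only for illustration.
lifetimes = {
    "substrate": 1e6,
    "biotin-streptavidin connector": 5e5,
    "primer/template duplex": 2e5,
    "photodamage": 1e5,
}
print(limiting_process(lifetimes))  # photodamage
print(round(combined_lifetime(lifetimes)))  # 55556
```

The combined lifetime is always shorter than the shortest individual one, which is why each factor has to be optimized rather than only the weakest.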

While each of these factors has to be optimized in order to achieve the required high yields, none of them poses a fundamental limit. For example, G·T mispairing is known to occur at high rates naturally (Kunkel, 2004) because it causes very little local perturbation of the helix and, more importantly, leaves the global conformation of the duplex unaffected. Similar results have been reported for A·C mispairing. Since labeled nucleotides are incorporated more slowly for steric reasons, steric hindrance will also slow the incorporation of mismatched labeled nucleotides, to the point of insignificant error rates. Additionally, since synchronization is not a requirement in single molecule sequencing, incorporation does not have to be driven close to 100% at every cycle, and short cycles can thus reduce the probability of incorporating wrong bases. In the next section we will discuss the anticipated performance of SMDS by cyclic synthesis.
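The argument for short cycles can be made concrete with simple kinetics. This sketch assumes pseudo-first-order incorporation with hypothetical rate constants `k_c` (correct base) and `k_m` (mismatched base), and equal numbers of templates for which the supplied nucleotide is correct or incorrect; the function names are illustrative. The fraction of primers extended within a cycle of length t is 1 - exp(-k t).

```python
import math


def incorporated_fraction(rate_per_s, t_s):
    """First-order incorporation: fraction of primers extended within time t."""
    return -math.expm1(-rate_per_s * t_s)  # equals 1 - exp(-k t), accurate for small k t


def errors_per_correct(k_correct, k_wrong, t_s):
    """Mis-incorporations per correct incorporation after a cycle of length t.

    Assumes equal numbers of templates for which the supplied nucleotide is
    correct and incorrect, each following pseudo-first-order kinetics.
    """
    return incorporated_fraction(k_wrong, t_s) / incorporated_fraction(k_correct, t_s)


# Hypothetical rates: correct incorporation 1000 times faster than mis-incorporation.
k_c, k_m = 1e-2, 1e-5
for t in (10, 60, 1000):
    print(t, incorporated_fraction(k_c, t), errors_per_correct(k_c, k_m, t))
```

For short cycles the error ratio stays close to k_m/k_c, whereas a cycle long enough to drive correct incorporation to near completion lets mis-incorporation keep accumulating. The templates left unextended by a short cycle are not lost; because reads need not stay in step, they simply incorporate in a later cycle.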
