• No results found

Model-guided ligation strategy for optimal assembly of DNA libraries

3.3 Software development

3.4.10 Linear ligation calculator

With the free energies of up to two sticky ends in hand, the distribution of linear ligation products can be predicted (Fig. 3.11A). A major assumption in the calculation is that the chosen sticky ends are non-palindromic and do not form major byproducts. This is reasonable since the free energies of complementary base-paring (Fig. 3.10B) are, on average, ~8 kcal/mol lower than those for mismatch annealing (Fig. 3.10C, D and E); thus, byproducts can typically be safely neglected since each 1 kcal/mol increase in free energy corresponds to a 5.41-fold decrease in equilibrium affinity (equivalently, a 7.38×106-fold decrease in equilibrium affinity for a 8 kcal/mol increase in free energy).

113

Figure 3.11. Schematic and representative output from the linear ligation calculator. (A) Schematics are given for two-piece and three-piece ligations as well as for the definitions of product distribution, incorporation, and efficiency. The free energy values used in this calculation (ΔGSE1 and ΔGSE2) are

obtained from the sticky end free energy calculator (Fig. 3.10). (B) Screenshot of linear ligation prediction output showing product distributions of a three-piece ligation reaction with 20, 10, and 20 nM of DNA 1, 2, and 3, respectively. Representative outputs from the linear ligation recommendation calculator showing (C) incorporation and (D) efficiency of ligation across a range of library concentrations and molar ratios for a three-piece ligation reaction with 10 nM library construct (DNA 2). ΔGSE1 ((G)CCTA(C)) of -11.10

kcal/mol and ΔGSE2 ((T)GGAT(A)) of -11.32 kcal/mol were used in all simulations.

The first part of the linear ligation calculator predicts product concentrations, incorporation, and efficiency from input DNA concentrations for two or three linear DNA fragments with one or two sticky end free energies, respectively (Fig. 3.11A). The

114

user can specify which DNA species is “valuable” (e.g., the fragment containing diversity in a DNA library assembly application) for calculation of percent incorporation. Again, the molar DNA concentrations can be converted from mass concentrations and volumes with the ligation excel worksheet. A representative output for three-piece DNA ligation with 10 nM valuable entity (DNA 2), 2:1 molar ratios for the other DNA fragments (DNA 1 and DNA 3), and the sticky end sequences (G)CCTA(C) and (T)GGAT(A) is shown in Fig. 3.11B. Product distribution reveals that most of DNA 2 (61.9%) is incorporated into full-length product, which is driven by the excess of the other two reactants (approximately half of DNA 1 and DNA 3 remain unreacted). The overall ligation efficiency (37.2%) indicates the percentage of all input reactants (DNA 1, 2 and 3) that end up in full-length products; the remainder are either reactants or intermediate products (DNA 1 + 2 or DNA 2 + 3).

The linear ligation calculator can also predict how incorporation and efficiency vary over a range of input concentrations and molar ratios. This feature is particularly useful for determining the optimal ligation setup for a specific application. Representative outputs of incorporation and efficiency for a three-piece linear ligation with 10 nM DNA 2 (valuable entity) and the sticky ends (G)CCTA(C) and (T)GGAT(A) are shown (Fig. 3.11C and D). For a given molar ratio, increasing reactant concentrations enhances incorporation and efficiency because the product is thermodynamically favored. For a given concentration, augmenting the molar ratio monotonically increases incorporation of the valuable entity, while ligation efficiency shows an optimum at intermediate molar ratios. At high molar ratios, incorporation becomes more favorable because any increase

115

in reactants (in the absence of byproducts) will shift the equilibrium towards product formation, but the overall efficiency drops after an optimal value since increased product formation cannot compensate for the excess reactants used. If Figure 3.11D represented a routine ligation reaction where the objective is often to maximize efficiency, the optimal reaction conditions would be to use a DNA 2 concentration close to the highest concentration available (10 nM) and ~2:1 molar ratios of [DNA 1]:[DNA 2] and [DNA 3]:[DNA 2]. For more stringent applications such as DNA library assembly in which library incorporation needs to be maximized without unnecessarily wasting other DNA fragments, the optimal setup based on Figure 3.11D would be to use the maximum concentration of the library fragment and ~4:1 molar ratios of the other fragments; using still higher molar ratios would only marginally increase incorporation but efficiency would markedly decline. The chosen molar concentrations and molar ratios can be directly entered into the ligation worksheet to determine the final ligation reaction mix based on available concentrations and volumes of DNA fragments. The calculator assumes that the valuable species is limiting and that excess amounts of the other fragments are available; if this is not the case, then a lower concentration of the valuable species should be entered into the calculator such that availability of the other DNA fragments is no longer an issue. If needed, the user can iterate between the ligation calculator and the worksheet to determine a final reaction scheme that is experimentally feasible.

116 3.4.11 Circular ligation calculator

Analogous to the linear ligation calculator, the circular ligation calculator predicts product distributions and offers ligation recommendations for insert-vector assembly. Using input molar concentrations of vector and insert, two sticky-end free energy values, and the number of base pairs of the full length vector-insert product, concentrations are calculated for all products containing up to four ligated molecules (any higher molecular weight molecules are assumed to be negligible) (Fig 3.12A). The probability of ligated vector-insert molecules (and vector-insert-vector-insert molecules) forming circular products depends on the effective concentration of the sticky end in the molecule itself compared to the concentrations of free vector and insert; this effective concentration is a function of the length of the ligated molecule, which is why the number of base pairs in the product is a necessary input4,15. Since circular ligation and transformation are typically performed in series, transformation efficiency (via colony counting) can serve as an experimental proxy for ligation efficiency. For DNA library applications, percent incorporation can be useful in estimating final library size and diversity. Accuracy gives the amount of correct vector-insert product formation compared to that of vector-insert- vector-insert circular product, although this value is likely a conservative estimate since the transformation efficiency of vector-insert-vector-insert circular products will typically be less than that for the desired product since transformation efficiency decreases with increasing plasmid size40.

117

Figure 3.12. Schematic and representative output from the circular ligation calculator. (A) Schematics are given for a circular vector-insert ligation as well as for the definitions of incorporation, efficiency, and accuracy. Representative outputs from the circular ligation recommendation calculator showing (B) incorporation, (C) efficiency, and (D) accuracy of ligation across a range of insert concentrations and vector:insert molar ratios for a circular ligation with 5 nM library construct (insert). In all simulations, the total number of base pairs of vector and insert was 3500, ΔGSE1 was -11.10 kcal/mol, and ΔGSE2 was -11.32

kcal/mol.

The circular ligation calculator can also predict how incorporation (Fig. 3.12B), efficiency (Fig. 3.12C), and accuracy (Fig. 3.12D) vary over a range of insert concentrations and vector:insert molar ratios. At low vector:insert ratios, incorporation and efficiency are low because insert-vector-insert is preferentially formed over circular insert-vector products, but accuracy remains high since unbalanced molar ratios do not favor circular insert-vector-insert-vector molecules. Conversely, high vector:insert ratios result in low efficiency because vector-insert-vector products are preferentially formed and the effect on incorporation is mixed as the benefit of having excess vector to drive

118

product formation can be offset by an increase in vector-insert-vector byproduct. Maximum incorporation occurs when the vector concentration is around the value of the average equilibrium dissociation constant. In Fig. 3.12B, in which the equilibrium dissociation constants for the two sticky ends are 7.21 nM and 4.98 nM, maximal incorporation occurs in a region of concentration space that includes 3.5 nM insert with a 2:1 molar ratio (7 nM vector), 1.5 nM insert with a 4:1 ratio (6 nM vector), and 0.75 nM with an 8:1 ratio (6 nM vector). Maximum efficiency, on the other hand, is achieved at ~1:1 molar vector:insert ratio and when the insert concentration is approximately the value of the average equilibrium dissociation constant, which is around 5 nM in Fig. 3.12C. Accuracy only drops at high insert concentrations and approximately equimolar amounts of vector, although the absolute accuracy remains very high in all of concentration space and should therefore not be a major factor in optimizing circular ligation reactions. As an example optimization exercise, for normal cloning application where efficiency should be maximized, 5 nM insert with 1:1 vector:insert would be optimal; for library assembly where the balance between high incorporation and efficiency is important, 3.5 nM insert with 2:1 ratio would be preferable (Fig. 3.12B, C). Finally, the selected insert concentration and molar ratio can again be directly entered into the ligation worksheet to determine the final ligation reaction mix based on the availability of the amount of vector and insert.

119

3.5 Discussion

We have investigated the optimal ligation strategy for applications in both in vitro and in vivo directed evolution. We have reported up to 15- and 2.6-fold increases in desired products for circular and linear ligation reactions, respectively, when Type IIS restriction enzymes were used instead of Type IIP enzymes to generate DNA fragments, indicating that even order-of-magnitude improvements in library size can be achieved by this strategy. Furthermore, our simulation results can help to more narrowly focus optimization efforts and our web-based ligation calculator can enable easy determination of maximum ligation yields and library size using Type IIS restriction enzymes without extensive experimental testing.

To maximize ligation efficiency, next-generation display vectors should contain recognition sites of Type IIS restriction enzymes and cutting sites that are non- palindromic and have a matching number of G/C and A/T base-pairs. As demonstrated in this work, a convenient pair of non-palindromic cutting sites would have the same base compositions but would not exhibit any complementary to one another, thus eliminating undesired annealing. For linear ligations, the maximum concentration of digested reactants should be used, with an excess of upstream and downstream pieces to drive the equilibrium towards ligated products. For circular ligations, a concentration comparable to the K values of the sticky ends should be used with an insert:vector ratio between 1:2 and 1:1. These recommendations are different from the common ligation guidelines suggesting that insert:vector molar ratios greater than one be used and that the concentration of both should be kept within a certain range6, which are not applicable to

120

fragments generated by Type IIS restriction enzymes or in directed evolution applications where the insert library is scarce. Our recommendations and even our cutting sites should be generally applicable to any ligation with T4 DNA ligase where the incorporation of insert is essential. Other custom cutting sites with four-base overhangs are not expected to deviate greatly from our results because our two sites CGAA and GCTT both contain 50% GC content and represent moderate ΔG values.

We presented two methods for estimating the free energy of ligation: 1) using DNA thermodynamics, and 2) by experimental measurement. The two methods gave similar results for the set of experimental conditions that we examined. If a specific application requires conditions sufficiently different from those examined in this work (e.g. different temperature, salt concentration or ligase), our model could still be used to capture ligation behavior as long as the ΔG values are properly parameterized (e.g. using the simple experimental procedure outlined herein). Importantly, once this characterization has been performed for a given pair of overhangs, the model could be applied for any system that uses these annealing sequences. Again, in directed evolution experiments, a common display vector is typically used for many applications, so the trained model could be repeatedly applied to all experimental systems using a redesigned display vector with Type IIS restriction sites.

Our model is not only practically useful for ligation applications, but also provides thermodynamic insight into the annealing of sticky ends. The ligation reaction is rarely analyzed quantitatively as there is no thermodynamic information readily available in the literature. We have estimated the free energy of ligation by obtaining free energy

121

values of DNA annealing from the nearest-neighbor method23,41. However, most tabulated values are optimized at 37oC and 1 M NaCl, and salt and temperature corrections are only accurate to a certain extent and have not been systematically investigated for ligation. Empirical observations are available for specific ligation reactions but are not easily generalizable42. In contrast, considerable work has been done to characterize the free energy of sealing by ligase16,43. From the ΔG values obtained from model training, we could calculate the free energy of sticky-end annealing since we know the free energy contributed by ligase sealing. In principle, if a variety of sticky ends were investigated at different temperatures, free energy values for annealing could be obtained, providing insight into DNA thermodynamics.

In summary, for applications where ligation efficiency is crucial for experimental success, we propose that vectors be redesigned to exploit the desirable properties of commercially available Type IIS restriction enzymes. We have developed and validated a thermodynamic model of ligation to provide specific experimental strategies for in vitro

and in vivo directed evolution applications, and our web-based calculator can readily be adapted to other ligation conditions. Last but not least, our model can be used to parse the complex thermodynamics of sticky-end annealing, further contributing to our understanding of this common molecular biology technique.

122

3.6 References

1. Sambrook, J. & Russell, D. W. Plasmids and their usefulness in molecular cloning.

Molecular Cloning: A Laboratory Manual 1.157–1.159 (2001).

2. Tobias, A. V. Preparing libraries in Escherichia coli. Directed evolution library creation: methods and protocols 11–16 (2003).

3. Muller, P. Y., Studer, E. & Miserez, A. R. Molecular biocomputing suite: a word processor add-in for the analysis and manipulation of nucleic acid and protein sequence data. BioTechniques31, 1306–1313 (2001).

4. Dugaiczyk, A., Boyer, H. W. & Goodman, H. M. Ligation of EcoRI endonuclease- generated DNA fragments into linear and circular structures. J. Mol. Biol.96, 171– 184 (1975).

5. Legerski, R. J. & Robberson, D. L. Analysis and optimization of recombinant DNA joining reactions. J. Mol. Biol.181, 297–312 (1985).

6. Revie, D., Smith, D. W. & Yee, T. W. Kinetic analysis for optimization of DNA ligation reactions. Nucleic Acids Res.16, 10301–10321 (1988).

7. Dardel, F. Computer simulation of DNA ligation: determination of initial DNA concentrations favouring the formation of recombinant molecules. Nucleic Acids Res.16, 1767–1778 (1988).

8. Szybalski, W., Kim, S. C., Hasan, N. & Podhajska, A. J. Class-IIS restriction enzymes–a review. Gene100, 13–26 (1991).

9. Roberts, R. J. et al. A nomenclature for restriction enzymes, DNA methyltransferases, homing endonucleases and their genes. Nucleic Acids Res. 31, 1805–1812 (2003).

10. Roberts, R. J., Vincze, T., Posfai, J. & Macelis, D. REBASE--a database for DNA restriction and modification: enzymes, genes and genomes. Nucleic Acids Res. 38, D234–236 (2010).

11. Bath, A. J., Milsom, S. E., Gormley, N. A. & Halford, S. E. Many type IIs restriction endonucleases interact with two recognition sites before cleaving DNA.

J. Biol. Chem.277, 4024–4033 (2002).

12. Binz, H. K. et al. High-affinity binders selected from designed ankyrin repeat protein libraries. Nat. Biotechnol.22, 575–582 (2004).

123

13. Barendt, P. A., Shah, N. A., Barendt, G. A. & Sarkar, C. A. Broad-specificity mRNA-rRNA complementarity in efficient protein translation. PLoS Genet. 8, e1002598 (2012).

14. Inoue, H., Nojima, H. & Okayama, H. High efficiency transformation of

Escherichia coli with plasmids. Gene96, 23–28 (1990).

15. Jacobson, H. & Stockmayer, W. H. Intramolecular reaction in polycondensations. I. The theory of linear systems. J. Chem. Phys.18, 1600–1606 (1950).

16. Dickson, K. S., Burns, C. M. & Richardson, J. P. Determination of the free-energy change for repair of a DNA phosphodiester bond. J. Biol. Chem. 275, 15828– 15831 (2000).

17. Dreier, B. & Plückthun, A. Ribosome display: a technology for selecting and evolving proteins from large libraries. Methods Mol. Biol.687, 283–306 (2011). 18. He, M. & Taussig, M. J. Eukaryotic ribosome display with in situ DNA recovery.

Nat. Methods4, 281–289 (2007).

19. Jones, E., Oliphant, T., Peterson, P. et al. SciPy: Open source scientific tools for Python. (2001).at <http://www.scipy.org/>

20. Hunter, J. D. Matplotlib: A 2D graphics environment. Comput. Sci. Eng.9, 90–95 (2007).

21. Florián, J., Šponer, J. & Warshel, A. Thermodynamic parameters for stacking and hydrogen bonding of nucleic acid bases in aqueous solution: ab initio/Langevin dipoles study. J. Phys. Chem. B 103, 884–892 (1999).

22. Protozanova, E., Yakovchuk, P. & Frank-Kamenetskii, M. D. Stacked-unstacked equilibrium at the nick site of DNA. J. Mol. Biol.342, 775–785 (2004).

23. SantaLucia, J. A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics. Proc. Natl. Acad. Sci. U.S.A. 95, 1460–1465 (1998).

24. Allawi, H. T. & SantaLucia, J. Thermodynamics of internal G•T mismatches in DNA. Biochemistry36, 10581–10594 (1997).

25. Allawi, H. T. & SantaLucia, J. Nearest neighbor thermodynamic parameters for internal G•A mismatches in DNA. Biochemistry37, 2170–2179 (1998).

124

26. Allawi, H. T. & SantaLucia, J. Nearest-neighbor thermodynamics of internal A•C mismatches in DNA: sequence dependence and pH effects. Biochemistry 37, 9435–9444 (1998).

27. Allawi, H. T. & SantaLucia, J. Thermodynamics of internal C•T mismatches in DNA. Nucleic Acids Res.26, 2694–2701 (1998).

28. Peyret, N., Seneviratne, P. A., Allawi, H. T. & SantaLucia, J. Nearest-Neighbor Thermodynamics and NMR of DNA Sequences with Internal A•A, C•C, G•G, and T•T Mismatches. Biochemistry38, 3468–3477 (1999).

29. Zahnd, C., Amstutz, P. & Plückthun, A. Ribosome display: selecting and evolving proteins in vitro that specifically bind to a target. Nat. Methods4, 269–279 (2007). 30. Landegren, U., Kaiser, R., Sanders, J. & Hood, L. A ligase-mediated gene

detection technique. Science 241, 1077–1080 (1988).

31. Harada, K. & Orgel, L. E. Unexpected substrate specificity of T4 DNA ligase revealed by in vitro selection. Nucleic Acids Res.21, 2287–2291 (1993).

32. Cherepanov, A. V. & de Vries, S. Kinetics and thermodynamics of nick sealing by T4 DNA ligase. Eur. J. Biochem.270, 4315–4325 (2003).

33. Hanes, J. & Plückthun, A. In vitro selection and evolution of functional proteins by using ribosome display. Proc. Natl. Acad. Sci. U.S.A.94, 4937–4942 (1997). 34. Roberts, R. W. & Szostak, J. W. RNA-peptide fusions for the in vitro selection of

peptides and proteins. Proc. Natl. Acad. Sci. U.S.A.94, 12297–12302 (1997). 35. Clackson, T., Hoogenboom, H. R., Griffiths, A. D. & Winter, G. Making antibody

fragments using phage display libraries. Nature352, 624–628 (1991).

36. Francisco, J. A., Campbell, R., Iverson, B. L. & Georgiou, G. Production and fluorescence-activated cell sorting of Escherichia coli expressing a functional antibody fragment on the external surface. Proc. Natl. Acad. Sci. U.S.A. 90, 10444–10448 (1993).

37. Boder, E. T. & Wittrup, K. D. Yeast surface display for screening combinatorial polypeptide libraries. Nat. Biotechnol.15, 553–557 (1997).

38. Goffin, C., Bailly, V. & Verly, W. G. Nicks 3’ or 5' to AP sites or to mispaired bases, and one-nucleotide gaps can be sealed by T4 DNA ligase. Nucleic Acids Res.15, 8755–8771 (1987).

125

39. Gu, J. et al. XRCC4:DNA ligase IV can ligate incompatible DNA ends and can ligate across gaps. EMBO J.26, 1010–1023 (2007).

40. Hanahan, D. Studies on transformation of Escherichia coli with plasmids. J. Mol. Biol.166, 557–580 (1983).

41. SantaLucia, J., Allawi, H. T. & Seneviratne, P. A. Improved nearest-neighbor parameters for predicting DNA duplex stability. Biochemistry 35, 3555–3562 (1996).

42. Ferretti, L. & Sgaramella, V. Temperature dependence of the joining by T4 DNA