
Synthetic Biology – Engineering in Biotechnology

A report written by Sven Panke, ETH Zurich

On behalf of the Committee on applied Bioscience, Swiss Academy of Engineering Sciences

„What I cannot build, I cannot understand“ – Attributed to Richard Feynman


Executive Summary

Synthetic Biology summarizes efforts directed at the synthesis of complex biological systems to obtain useful novel phenotypes based on the exploitation of well-characterized, orthogonal, and re-utilizable building blocks. It aims at recruiting design structures well established in classic engineering disciplines, such as a hierarchy of abstractions, system boundaries, standardized interfaces and protocols, and separation of manufacturing and design, for biotechnology. Synthetic Biology generally entertains the notion that successful design of a biological system from scratch – as opposed to the current practice of adapting a potentially poorly understood system over and over again towards a specific goal – is the ultimate proof of fundamental understanding and concomitantly the most powerful way to advance biotechnology to the level required to deal with today’s challenging problems.

First achievements include the design and implementation of synthetic genetic circuits, the design of novel biochemical pathways, and the de novo synthesis of a bacterial genome. However, it is clearly the ultimate aspiration of the field to extend the mastery of biological engineering to systems complex enough to deal with problems such as the design of novel therapeutic treatments, the production of liquid transportation fuels, or the efficient manufacturing of biopharmaceuticals and fine, bulk, or fuel chemicals.

Synthetic Biology is a field in its infancy, but as it seeks to adopt structures that have proved extremely successful in industrial history, its potential benefits are enormous. The success of Synthetic Biology will depend on a number of challenging scientific and technical questions, but two points stand out: [i] Synthetic Biology needs to prove that the concept of orthogonal parts and subsystems is viable beyond the few domains for which it has been established so far (in particular the field of genetic circuits and the “Registry of Standard Biological Parts” at MIT); and [ii] Synthetic Biology needs a technological infrastructure that a) provides access to cheap but accurate non template-driven (de novo) DNA synthesis that can cope with the requirements of system-level engineering and b) allows installing and maintaining an “Institute of Biological Standards” that provides the basis for implementing the Synthetic Biology principles across the biotechnological community.


Contents

Executive summary

1. Synthetic Biology in context

2. The challenges of Synthetic Biology
2.1. Orthogonality in Synthetic Biology
2.1.1. Integration of unnatural amino acids into proteins
2.1.2. Orthogonal RNA-RNA interactions
2.1.3. Orthogonality in single molecules
2.1.4. Orthogonality by alternative chemistries
2.2. Evolution and Synthetic Biology
2.3. The technology to implement synthetic biology
2.3.1. DNA synthesis and assembly
2.3.1.1. Towards large-scale, non template-driven DNA synthesis
2.3.1.2. Oligonucleotide synthesis
2.3.1.3. Oligonucleotide assembly
2.3.1.4. Assembly of DNA fragments
2.3.1.5. Assembling DNA at genomic scale
2.3.1.6. Transferring artificial genomes into novel cellular environments
2.3.1.7. Practical considerations
2.3.2. Miniaturizing and automating laboratory protocols and analysis
2.3.2.1. Microfluidic components
2.3.2.2. Fabrication methods
2.3.2.3. Applications of microfluidics
2.4. The tool collection and modularity in Synthetic Biology
2.4.1. Modularity in DNA-based parts and devices – the Registry of Standard Biological Parts
2.4.2. Devices: Genetic circuits
2.4.3. Fine tuning of parameters: the assembly of pathways
2.4.4. Modularity in other classes of molecules
2.4.5. The chassis for system assembly: Minimal genomes
2.5. Organizational challenges in Synthetic Biology
2.6. Synthetic Biology in society
2.6.1. Synthetic Biology and biosafety
2.6.2. Synthetic Biology and biosecurity
2.6.3. Synthetic Biology and ethics

3. Summary

4. Acknowledgements

5. Useful websites

6. Abbreviations


1. Synthetic Biology in context

Since the advent of genetic engineering with the production of recombinant somatostatin and insulin by Genentech in the late 1970s, biotechnology has tried to exploit cells and cellular components rationally to achieve ever more complex tasks, primarily in the areas of health and chemistry, and more recently also in the energy sector. The manufacturing of biopharmaceuticals has advanced from simple peptide hormones (the disulfide bridges of which had to be introduced afterwards) to monoclonal antibodies complete with specific glycosylation patterns. The manufacturing of small pharmaceutical molecules has advanced from increasing the overproduction of natural products (such as the penicillins) to the assembly of entire novel pathways (e.g. for artemisinic acid production [1]), and so has the manufacturing of chemicals (e.g. from overproducing intermediates of the citric acid cycle such as citrate to compounds alien to most microbial metabolisms such as 1,3-propanediol and indigo [2, 3]).

The same can be said for other sectors that apply the tools of molecular biology, such as gene therapy [4], tissue engineering [5, 6], diagnostics [7], and many others.

However, despite the numerous successes of the biotechnology industry, major challenges remain. The chemical industry might need to replace a substantial part of its raw material base for a substantial part of its products – can biotechnology reliably come up with robust solutions for novel production routes in a suitable timeframe and with reasonable development costs? The same goes for our energy industry – can biotechnology play a significant part in the de-carbonization of our energy systems, preferably without provoking major opposition on environmental or ethical grounds? The list of questions could easily be extended, for example into the field of gene therapy or any other field in which biotechnology already plays, or could play in the future, a major role. In fact, measured against these challenges, the successes of biotechnology appear as little more than the first steps of an industry that has mounted a couple of convincing cases but needs to do much more to assume the potentially dominant role it might theoretically be well placed to play.

The reason for this lies of course in the complexity of biological systems, which makes it difficult to engineer them reliably and essentially converts every industrial development project into a research project that needs to cope with unexpected fundamental hurdles or completely new insights into the biological system. Still, the challenges we face will need to be addressed to a large extent in these complex biological systems – cells. Cells contain a multitude of different organic molecules with a multitude of interactions, most of them highly non-linear and thus prone to emergent properties. Of course, advances in Systems Biology allow us to delineate an ever larger number of these interactions in considerable detail (see below), but it is sobering to realize that even the dynamics of arguably the best understood single pathway in metabolism, glycolysis, in one of the best studied model organisms, Saccharomyces cerevisiae, can still not be satisfactorily explained in terms of its constituent members [8]. Overexpressing one recombinant gene in Escherichia coli triggers hundreds of other genes to change their transcriptional status, and only for some of them is it clear what might motivate the cell to do so [9]. Clearly, our understanding of biological systems is still far from complete.

This prompts two questions: [i] how is it possible, against such a complex background, to successfully introduce complex artificial functions at all (such as assembling a novel pathway for the production of a complex small molecule), and [ii] are we using the right concepts to deal with complexity in biotechnology?

For the first question, it is safe to argue that the successes in recombinant biotechnology are primarily the result of a non-straightforward process in which a basic idea was put to the test, modified, and expanded for as long as it took until every (unexpected) difficulty along the way was taken care of to such an extent that the resulting process was still economical (I restrict the argument to biotechnological manufacturing processes here, but it is easy to make the equivalent argument for other areas as well). To put it differently, today’s successfully operating fermentation processes employing cells that had to undergo substantial genetic engineering are not the result of a targeted planning effort by a group of people who had a clear idea of the problems they were facing and, in addition, a clear idea of the tools at their disposal to work around such problems in a reasonable amount of time. Or, to paraphrase the key point once more, the implementation of essentially any novel biotechnological process today primarily remains a research project, as pointed out above. The existing examples illustrate that it is possible to be successful with such research projects, but the effectively small number of examples (relative to the number of, e.g., novel chemical processes that have been introduced over the same time) also suggests that such research endeavors remain difficult and costly.

For the second question, it might be helpful to compare the current “design processes” in biotechnology with design processes in more mature fields, such as the classical engineering disciplines (e.g. mechanical or electrical engineering), in which the success rate of design efforts is orders of magnitude higher than in biotechnology, and then to ask whether there is a chance that we could recruit some of the key practices for biotechnology in the future.

This comparison has been at the heart of the emerging field of Synthetic Biology, and consequently it has been treated in a number of very instructive papers over the last years [10-18]. In summary, five points have been identified that are crucial in engineering but by and large absent from biotechnology:

(6)

1) Comprehensiveness of available relevant knowledge
2) Orthogonality
3) Hierarchy of abstraction
4) Standardization
5) Separation of design and manufacturing

Knowledge of relevant phenomena in classical engineering disciplines is rather comprehensive. Mechanics is a mature science from which mechanical engineers can select the elements relevant to their specific tasks, and so is electrophysics (for electrical engineering), while chemical engineering can draw on an – at least theoretically – comprehensive formalism to describe the effects of mass, heat, and momentum transfer on chemical reactions. In all of these examples, rather elaborate mathematical formalisms are in place that allow the behavior of such systems over time to be described, as are techniques that provide reasonable estimates for any unknown parameters required for these descriptions.

Biotechnology works under rather different circumstances with respect to the five points listed above: First of all, our knowledge of the molecular events in a cell is far from complete. Even though the cataloguing of biological molecules (genomics, transcriptomics, proteomics, metabolomics) has made tremendous progress since the first complete genome sequences became available [19], the interactions between these molecules and their dynamics remain to the largest extent unknown, and in many cases we do not even know basic functions (e.g. 24% of the genes in E. coli remain without proper functional characterization [20]).

Next, the issue of orthogonality (“independence”): This concept has been adopted from computer science, where it refers to the important system design property that modifying the technical effect produced by one component of a system neither creates nor propagates side effects in other components of the system. In other words, orthogonality describes the fact that adjusting the rear-view mirror of a car does not affect its steering, and that accelerating does not affect listening to the car’s radio. It is the prerequisite for dividing a system into subsystems that can be developed independently. As such, it is also the prerequisite for modularity: a system property that allows various (orthogonal) subsystems to be combined at will, i.e. that they have the proper connections to fit the other modules. Various elements contribute to orthogonality in non-biological designs: for example, electrical circuits and thermal elements can be insulated, chemical reactions can be confined to separate compartments, and mechanical elements can be placed spatially separated so that they do not interact.

In contrast, cells are complex, and orthogonality is mostly absent and difficult to implement. In bacterial cells, for example, only one intracellular reaction space is available, the cytoplasm, and it hosts hundreds of different simultaneous chemical reactions while at the same time working on the duplication of its information store. Introducing and expressing a single recombinant gene in E. coli changes the expression patterns of hundreds of genes [9], indicating the degree of connection between various cellular functions. On the other hand, nature provides ample examples of orthogonality, for example in eukaryotic cells that have organelles available to carry out specialized tasks, or in higher organisms that have organs dedicated to specific functions. However, it is probably safe to say that consciously engineering orthogonality into biological systems to help the design process has so far played no role in biotechnology.

The third point, the hierarchy of abstraction, reflects the assembly of complex non-biological systems from orthogonal sub-systems. If it is possible to separate the overall system into meaningful subsystems, which in turn might be separated once more into meaningful subsystems, and so on, then the design task can be distributed over several levels of detail at the same time (provided that the transitions from one level to the next/one sub-system to the next are defined in advance). This has two major implications – the development time is much reduced by parallel advances, and every level of system development can be addressed by individuals who are specialists for this specific level of detail, so the overall quality of the design improves. To adopt an example from the design of integrated electronic circuits, diodes, transistors, resistors, and capacitors are required to fabricate electronic logic gates (AND, OR, NOT, NAND, NOR), which in turn are the elements to make processors (e.g. for a printer). The term “transistor” describes a function (controlling a large electrical current/voltage with a small electrical current/voltage), but it also hides detailed information which is in most cases not relevant for the person who uses the transistor to build an AND gate. The AND gate in turn describes a function (an output signal is obtained if two input signals are received concomitantly) and can be implemented in various forms composed of various circuit elements, which influence its performance, but the details of how this was achieved are not relevant for the designer of the processor. In other words, at every level of the design process, detailed information is hidden under abstract descriptions. The higher the level, the more information is hidden.
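This principle of information hiding can be made concrete with a small sketch. The code below is purely illustrative and not part of the report – the function names and threshold values are invented – but it shows how each level exposes only a function and hides the implementation details of the level below:

```python
# Minimal sketch of a three-level abstraction hierarchy (illustrative only).

def transistor(gate_voltage: float, threshold: float = 0.7) -> bool:
    """Lowest level: a switch controlled by a small input voltage."""
    return gate_voltage > threshold

def and_gate(input_a: float, input_b: float) -> bool:
    """Middle level: built from transistors, but its user only sees the AND function."""
    return transistor(input_a) and transistor(input_b)

def coincidence_detector(signal_1: float, signal_2: float) -> str:
    """Top level: a device that relies on the AND function alone,
    without caring about thresholds or transistor physics."""
    return "output" if and_gate(signal_1, signal_2) else "no output"

print(coincidence_detector(1.2, 0.9))  # output
print(coincidence_detector(1.2, 0.3))  # no output
```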

Again in contrast, the description of biological systems is usually strongly focused on the molecular level, and formalized, abstract, or functionalized descriptions in the sense discussed above are rare. A frequently used analogy is that the equivalent of programming cells today (writing down a specific recombinant DNA sequence) is programming a computer in binary code instead of using one of the many powerful, specialized, user-friendly software tools. The advent of Systems Biology is about to establish the idea of functional modules in the cell [21], but clearly we are nowhere near the sophistication with which we are familiar in classical engineering disciplines when it comes to structuring a biological engineering approach. Even worse, in essentially all instances our adoption of functional terms (e.g. “promoter”) reflects a qualitative understanding of a higher-level function, but it hides the fact that detailed quantitative information on the specifics of this function is not or only partly available, should we be interested in the details of the next level of the hierarchy (which proteins bind to regulate activity – how strong is the binding – what is the importance of the nucleotides between the -35 and the -10 region – what is the promoter clearance rate under which circumstances?).

As already mentioned, the hierarchy of abstraction requires that the transitions between the different levels are properly specified at the beginning, so that the various levels can integrate seamlessly in the end. To continue the example from above, to be able to assemble a processor from the various logic gates, it is helpful if the gates are cascadable – the electronic currents leaving one gate have to be in the proper range to be recognized by the next gate, and the two gates need to be physically connected to each other, for which it helps if the number and size of output pins of the first gate are compatible with the number and size of the receiving sockets of the next gate. In addition, if these transitions are made according to more general specifications, it should be possible to integrate (potentially superior) parts from other manufacturers. In other words, the transitions should be standardized. In fact, standardization needs to encompass many more elements of the design process: For example, measurement protocols need to be standardized so that practices become comparable and can be made available to everybody involved in the design process. The same is true for the documentation of system components and the storage of this information in databases. Only sufficient compliance with these standards ensures that a designed element of the system has a high chance of re-utilization.

Biotechnology is characterized by the absence of standards. As orthogonality in cells is at best an emerging concept, no clear ideas on how to standardize information exchange between such elements exist. In terms of standardizing experimental protocols, it is only now that with the advent of Systems Biology certain standards on data quality and measurement techniques start to produce an impact [22] in the face of the need to handle ever larger sets of experimental data delivered by the experimental “-omics” technologies. But the impact of the lack of standards in experimental techniques goes much further: a simple discussion about the relative strength of two different promoters between two scientists from two different labs will quickly reveal that the discussion is pointless, because in the corresponding experiments they have used different E. coli strains on different media after having placed the promoter on replicons with different copy numbers per cell to drive expression of two different reporter genes, of which one encoded a protein tagged for accelerated turnover. In summary, it is very difficult to answer basic questions that can be expected to be important for designing biological systems, and even if comparable experiments exist, it almost always requires an extensive search through the available scientific literature to locate the information.

The fifth key difference identified between current engineering practice and “biological engineering” is the separation of design and manufacturing. Put differently, the design of a car is done by a different group of people than the car’s assembly at the assembly line, and the assembly actually requires comparatively little effort. The different groups of people also have different qualifications and have received different training – they are specialists, and can afford to be, because the design was so detailed that the manufacturing can be confidently expected to cause no problems.

On the other hand, the major part of the time an average PhD student in molecular biotechnology spends at the bench is still spent on manipulating DNA – isolating a gene from the genomic DNA of a specific strain, pruning it of specific restriction sites inside the gene, adapting the ends for cloning into a specific expression vector, adapting codon usage, playing with RNA and protein half-life, etc. In other words, the manufacturing of the system is still a major part of the research project, and in fact in many cases it is still a research project of its own.

Against this background, Synthetic Biology, in the definition adopted for this report, aims at providing the conceptual and technological fundamentals to implement orthogonality, a hierarchy of abstraction, standardization, and the separation of design and manufacturing in biotechnology. In other words, it aims at providing the prerequisites for operating biotechnology as a “true” engineering discipline. This implies aiming [i] at making biological systems amenable to large-scale rational manipulation and [ii] at providing the technological infrastructure to implement such large-scale changes. In this effort, Synthetic Biology depends on a substantial increase in the knowledge base of biotechnology, which is expected from Systems Biology.

2. The challenges of Synthetic Biology

From the points discussed above, six main challenges arise – two scientific, two technological, one organizational, and one societal.

One key scientific challenge of Synthetic Biology will be to implement the concept of orthogonality in complex biological systems. Rational, forward-oriented engineering on a considerable scale for example in cells will only be possible if we can limit the impact of our modifications on the cellular background into which we want to implement our instructions. Failing here, success in large-scale cellular manipulations will necessarily remain a matter of trial and error, because it appears presumptuous to assume that we will understand cellular complexity to such an extent that we can comprehensively integrate it into the design process.

In addition, orthogonality will be a fundamental prerequisite for applying the hierarchy of abstraction. The latter depends on our ability to define meaningful sub-systems that can be worked on separately and later combined. This only makes sense if the number of interactions between the different subsystems is limited, ideally to the few interactions that are desired and eventually standardized.


The second key scientific challenge will be how to deal with evolution. Though this question has so far hardly played a role in the Synthetic Biology discussion, it is intrinsically connected to the long-term vision of Synthetic Biology: because evolution is a permanent source of change in living systems, it will always represent an obstacle when it comes to preserving the integrity of artificial designs in the long run. In the short term, the problem might not be particularly dramatic – after all, we have learned how to deal with unstable but highly efficient production strains over the approximately 70 generations it takes to fill the large scale reactor starting from a single cell from the cell bank. However, if the ideas of standardization, abstraction, and orthogonality can be effectively implemented, it is easy to foresee that strains undergo various generations of modifications, and it will be important to be sure that the strain that receives the final modification is still the one that received the first generation of modifications.
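The figure of roughly 70 generations can be rationalized with a back-of-the-envelope doubling calculation. The reactor volume and cell density below are illustrative assumptions and not taken from the report:

```python
import math

# Illustrative assumptions (not from the report): a 100 m^3 production reactor
# harvested at roughly 1e9 cells per mL.
reactor_volume_mL = 100 * 1000 * 1000          # 100 m^3 expressed in mL
final_density_cells_per_mL = 1e9

final_cell_number = reactor_volume_mL * final_density_cells_per_mL  # ~1e17 cells

# Doublings needed to go from a single cell to the final population.
doublings = math.log2(final_cell_number)
print(f"~{doublings:.0f} doublings")           # ~56 doublings

# Seed-train transfers and growth at sub-maximal densities add further
# generations, consistent with the ~70 generations quoted in the text.
```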

One technological challenge for Synthetic Biology is clearly to adapt our current protocols for strain manipulation to the drastic change in synthetic scope that will be required. The ambition to use ever more complex catalysts in biotechnology will inevitably lead to ever more voluminous synthetic tasks: on the one hand, our ambition to adapt cellular systems to concepts such as orthogonality will require large-scale adaptation of existing cells, which is currently only possible by working through complex, lengthy laboratory protocols. On the other hand, the DNA-encoding of ever larger sets of instructions will require the synthesis of ever longer DNA fragments. At the same time, we will need to adapt our data acquisition technologies to the requirements of an engineering discipline with respect to accuracy, reproducibility, and the amount of labor to be invested per measurement. It should be clear that it will be impossible to simply scale up linearly our current efforts in the construction of strains for biotechnology if the vision is that at least dozens of gene adaptations are required for any given new strain generation. It should be equally obvious that by transforming biotechnology from a research-driven discovery science into an engineering activity, the reliable determination of fundamental data under pre-defined circumstances will become much more important and has to become much more routine, and we will require the corresponding technology to accomplish this.

A second technological challenge is to make available the tools of the trade – the “parts” and “devices” from which complex systems can be assembled rationally, and the strains and cell lines into which these assemblies can be implemented. This is of course intimately connected to solving the problem of orthogonality. An additional aspect here is to provide suitable modes of assembly – it would be helpful if we could develop parts in such a way that they can be easily combined, i.e. that they in fact behave as modules.

Organizationally, much of the impact of Synthetic Biology will depend on whether it is possible to gather a critical mass of scientists to adopt engineering-inspired standards and standard operating procedures, for example for the measurement of variables of central importance in Synthetic Biology (e.g., the promoter clearance rate of recombinant promoters) and their suitable subsequent documentation, for example in a centralized data repository. Beyond the experimental arena, this call for a critical mass needs to be extended to the adoption of certain strategies for dealing with questions of ownership – it will be difficult to motivate lively use of central repositories, for example, if the user has to go through several layers of patent research before use.

Finally, Synthetic Biology will inevitably touch on questions regarding its ethical boundaries and aspects of biosafety (the safety of scientists and the public with respect to the application of Synthetic Biology in the laboratory) and biosecurity (the security of people against military or terrorist exploitation of the technology). The notion of making life “engineer-able at the discretion of the engineer” will undoubtedly trigger the need to communicate at least the broader aims of this scientific community. Efforts to define a “minimal-life chassis” (see below) will most probably trigger ethical concerns about who has the authority to make such definitions. Certain aspects of the debate on the safety of genetic engineering from the 1980s and 90s will be re-visited once large-scale manipulation of living systems becomes a reality, and most probably there will be a need to develop strategies for how to use the novel technologies of Synthetic Biology to increase biosafety. And finally, the potential for military abuse of specific technological developments needs to be assessed.

In the following, I will discuss work that has addressed the challenges for Synthetic Biology mentioned above. It is important to point out that much of the work described here was not carried out under the label of Synthetic Biology. Rather – next to some of the “hallmark experiments” of Synthetic Biology – I have collected work that indicates that the ambitious goals of Synthetic Biology might indeed be reached at some point in the future.

2.1. Orthogonality in Synthetic Biology

Orthogonality is a concept that is in a way intrinsically alien to many current notions of biology. In particular, the various “-omics” disciplines convey the image that biology is above all about complex interactions that might even be too complex to ever fathom completely, let alone be mastered to a biotechnologically relevant extent. While the complexity of, for example, living cells is certainly a daunting obstacle in the way of implementing Synthetic Biology, a number of experiments have shown that orthogonality can be “engineered” into a cell provided a suitable experimental strategy is chosen. Specifically, directed evolution coupled to a proper selection strategy can be used to identify the system variant that complies, at least to a considerable extent, with the requirements of orthogonality, as can be seen from the examples discussed below.

2.1.1. Integration of unnatural amino acids into proteins (Schultz-lab, Scripps, San Diego)

Protein synthesis at the ribosome is usually limited to a set of twenty amino acids, the proteinogenic amino acids. Even though the cell has developed a variety of strategies to extend the diversity of protein composition post-translationally (e.g. by glycosylation, phosphorylation, or chemical modification of selected amino acids), there is considerable interest in extending the canon of proteinogenic amino acids beyond twenty, for example to overcome limitations in the structural analysis of proteins or to provide precise anchor points for posttranslational modifications and thereby facilitate the chemical manipulation of biologically produced proteins [23].

The specificity in protein synthesis is maintained at various points throughout the process: First, an amino acid is attached by a synthetase to a corresponding empty tRNA. Each amino acid is recognized by a specific synthetase (interface 1) and the resulting aa-enzyme (aa for aminoacyl) complex discharges its amino acid onto a specific (set of) tRNAs (interface 2). Finally, the anticodon of the charged tRNA interacts with the mRNA at the ribosome (interface 3). Consequently, to introduce a novel, unnatural amino acid into cellular protein synthesis, there are at least the interactions at three different interfaces to consider. Each of these interfaces requires consideration of two times two specific questions:

Interface 1:
a1) The unnatural amino acid must be recognized by one empty synthetase …
a2) … but it must not fit into any other synthetase.
b1) The synthetase must recognize the new amino acid, but …
b2) … it must not recognize any other amino acid.

Interface 2:
c1) The empty tRNA must be recognized and charged by the charged aa-synthetase complex …
c2) … but it must not be recognized nor charged by any other charged aa-synthetase complex.
d1) The charged aa-synthetase complex must recognize and charge the empty tRNA …
d2) … but it must not recognize nor charge any other empty tRNA.

Interface 3:
e1) The codon on the mRNA must recognize the anticodon on the charged aa-tRNA …
e2) … but it must not recognize any other aa-tRNA.
f1) The charged aa-tRNA must recognize the codon on the mRNA …
f2) … but it must not recognize any other codon.

Clearly, not all 12 requirements really need to be fulfilled for a system to work efficiently – even in the wild-type system a tRNA can in some cases recognize several codons, all of which code for the same amino acid (“wobble”), and some requirements are equivalent (a1/b1, c1/d1, e1/f1). But for the introduction of an unnatural amino acid into the process of protein synthesis, there should ideally be a completely orthogonal system in which each of the three interfaces is specific enough to ensure meeting all of the 9 unique requirements. As becomes clear from the structure of each pair of requirements, each requirement in a pair can be enforced by a combination of positive (“must recognize”) and negative selection (“must not recognize”). Consequently, by undergoing at most 6 experimental pairs of positive and negative selection, it should be possible to select a system that can integrate an unnatural amino acid without interfering with the remaining protein synthesis machinery.

This strategy has been realized by the group of P. Schultz in San Diego [23]. They used a tyrosine-transferring tRNA from Methanococcus jannaschii and changed the anticodon such that it would recognize the amber STOP codon UAG on an E. coli mRNA (interface 3). Next, they established two selection strategies:

To engineer interface 2, they put a library of the engineered tRNA into an E. coli strain harboring a gene encoding a toxic gene product (barnase). The gene contained two amber STOP codons. Expressing a library of variants of the engineered tRNA in this strain eliminated all those tRNA variants that could be charged by any of the E. coli synthetases (the charged tRNA variants would suppress the amber STOP codons in the barnase gene, leading to cell death – negative selection). In a second step, those library members in strains that survived step 1 (i.e. that could not be charged by the E. coli machinery) were recovered and transferred into an E. coli strain that contained the recombinant Tyr-tRNA-synthetase from M. jannaschii together with a β-lactamase (ampicillin resistance) gene inactivated by another amber STOP codon. Expressing the variants of the tRNA library in the presence of ampicillin allowed only those strains to survive that harbored variants that could be charged by the recombinant synthetase, because only those were able to suppress the STOP codon in the resistance gene (positive selection).

In a next step, the amino acid-specificity of the recombinant synthetase was engineered (interface 1), again by a combination of negative and positive selection. A library of synthetase-variants was transferred into an E. coli strain that contained an antibiotic resistance gene with an amber mutation. The library was then grown in the presence of the antibiotic to select for those variants that could suppress the mutation, presumably because a synthetase variant was able to charge the cognate tRNA, either with the unnatural amino acid from the medium or with one of the natural amino acids. To eliminate variants that suppressed with natural amino acids, the survivors of the positive selection were transferred into a strain with a mutated barnase gene (see above). Expressing this library in the absence of the unnatural amino acid eliminated all synthetase variants that charged natural amino acids onto the cognate tRNA.
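The logic of this synthetase selection amounts to two successive filters applied to a library. The sketch below is a schematic rendering of that logic with invented variant properties; it is not a simulation of the actual experiments:

```python
from dataclasses import dataclass

@dataclass
class SynthetaseVariant:
    name: str
    charges_unnatural_aa: bool  # loads the unnatural amino acid onto the cognate tRNA
    charges_natural_aa: bool    # also (mis)loads one of the twenty natural amino acids

library = [
    SynthetaseVariant("v1", charges_unnatural_aa=True,  charges_natural_aa=False),
    SynthetaseVariant("v2", charges_unnatural_aa=True,  charges_natural_aa=True),
    SynthetaseVariant("v3", charges_unnatural_aa=False, charges_natural_aa=True),
    SynthetaseVariant("v4", charges_unnatural_aa=False, charges_natural_aa=False),
]

# Positive selection (+ unnatural amino acid, + antibiotic): survivors must suppress
# the amber codon in the resistance gene, i.e. charge the tRNA with something.
after_positive = [v for v in library if v.charges_unnatural_aa or v.charges_natural_aa]

# Negative selection (- unnatural amino acid, barnase gene with amber codons): variants
# that still suppress must be charging a natural amino acid and are eliminated.
after_negative = [v for v in after_positive if not v.charges_natural_aa]

print([v.name for v in after_negative])  # ['v1'] - the orthogonal variant
```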

With this experimental system, requirements a1, b1/b2, c1/c2, d1, e1, and f1 could be covered. Assuming further that requirement f2 is guaranteed by the selection of a “wobble”-free tRNA, the insertion of unnatural amino acids could be engineered to meet 6 of the 9 unique requirements with this system. In this specific case, requirement e2 is also covered in a manner of speaking – a suppressed STOP codon is by definition only recognized by the suppressor tRNA. However, the STOP codon also appears in a variety of other genes, where it should preferably maintain its old function, so the system does not allow proper selection against the “natural” function of the amber STOP codon.


Nevertheless, the system has been used very successfully to engineer the insertion of unnatural amino acids into proteins [23]. This gives a clear indication that the power of directed evolution can be recruited to help the development of orthogonal systems once it is possible to design the proper selection systems.

2.1.2. Orthogonal RNA-RNA interactions (Chin-lab, MRC Cambridge, and others)

Another very promising experimental system has been implemented by the group of J. Chin in Cambridge: They used the specificity of the interaction between the ribosome binding site (RBS) on the mRNA and the 16S rRNA on the ribosome to create orthogonal ribosome populations that do not translate wild-type mRNAs but only those mRNAs whose gene is preceded by an orthogonal RBS [24]. The experimental strategy followed is very similar to the one discussed above: The authors used a gene fusion connecting an antibiotic resistance gene (for positive selection) and a gene for a uracil phosphoribosyl transferase, whose gene product catalyzes the formation of a toxic product from the precursor 5-fluorouracil (for negative selection). In a first round, RBSs that do not give rise to translation in a normal E. coli background were selected from a library of all possible variants of the natural RBS by negative selection. In the second round, the survivors were used to complement a library of mutated 16S rRNA molecules, and mRNA/ribosome pairs that could produce antibiotic resistance were positively selected [25].

In an alternative approach from the laboratory of J. Collins in Boston, the interaction between RNA molecules was used to either repress or induce gene translation. Translation was repressed when an mRNA was equipped at the 5’-end with a sequence that would produce an RBS-sequestering secondary structure. The repression could be relieved by producing a trans-activating (ta) RNA that forced the 5’-end into a different secondary structure which made the RBS-sequence available. The authors produced 4 pairs of 5’-RNA sequences and corresponding taRNA sequences. When they checked for interactions between non-corresponding 5’-mRNA and taRNA sequences, they did not find any, suggesting that these combinations of RNA sequences were orthogonal to each other and thus might be a general tool to be used with a variety of different promoters [26].

2.1.3. Orthogonality in single molecules

Orthogonality can also be engineered into the elements that make up single molecules. So far, three groups of molecules have been intensively discussed: DNA, RNA, and proteins. The notion that promoters, ribosome binding sites, genes, and transcriptional terminators can be combined freely to a large extent is one of the pillars of genetic engineering. Even in systems where DNA sequences have multiple uses, this can usually be changed: For example, in phage T7 certain parts of the DNA sequence are used to encode the C-terminus of one protein and simultaneously the regulatory region and the N-terminus of a second protein, and the implications of this “double usage” are quantitatively unclear. “Rectifying” this situation by expanding the DNA and making the corresponding elements “mono-functional” led to viable viruses – implying that “orthogonalization” of DNA sequences can work – that produced, however, much smaller plaques – implying that the orthogonalization had a direct impact on the fitness of the virus [27].

However, as DNA is only the store of information and the information is extracted via RNA molecules and then in many cases executed by proteins, the impact of this orthogonality is limited by the orthogonality of the subsequent operators. As a consequence, it is often difficult to attach a specific quantitative measure to a specific DNA sequence. For example, the efficiency of an RBS might be a strong function of the surrounding sequence context – even though an RBS might actually be very efficient in initiating translation, being sequestered by the surrounding mRNA prevents it from functioning at all. Consequently, to make use of quantitative descriptions of the effect of specific sequences (promoter clearance rate connected to a specific DNA promoter sequence – rate of translation initiation as a function of RBS sequence – percentage of terminated mRNA syntheses as a function of transcriptional terminator sequence) we will most probably need to define specific DNA contexts that allow direct use of the information, because we can then be sure that, for example, no sequestering of the RBS is involved.
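One way to make such context definitions operational is to treat every characterized sequence as a record that carries its measurement context along with its quantitative value. The sketch below is illustrative only – the field names and the example values are invented, not taken from any registry:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CharacterizedPart:
    """A quantitative value is only meaningful together with its measurement context."""
    sequence: str              # DNA sequence of the part itself
    value: float               # e.g. promoter clearance rate or translation initiation rate
    unit: str
    host_strain: str           # context: strain, medium, copy number, flanking DNA
    medium: str
    plasmid_copy_number: int
    upstream_context: str      # defined insulating sequence (prevents e.g. RBS sequestering)
    downstream_context: str

example_promoter = CharacterizedPart(
    sequence="(promoter sequence here)",
    value=0.03, unit="transcripts per second (illustrative)",
    host_strain="E. coli MG1655", medium="M9 + 0.4% glucose",
    plasmid_copy_number=5,
    upstream_context="insulator_up_v1", downstream_context="insulator_down_v1",
)
```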

Again, such sequences can be engineered given the right selection system: orthogonality has, for example, been engineered into artificial regulatory RNA elements, as discussed above for interactions between separate RNA molecules, but also within single RNA molecules, as recently demonstrated in [28]. The authors designed RNA molecules that contained [i] an aptamer sequence that was responsive to a small molecule and [ii] a segment that was complementary to a part of the mRNA sequence containing the start codon of a recombinant gene. The presence of the small molecule in the cytoplasm induced a change in the alignment of complementary sequences in the synthetic riboregulator and either sequestered or exposed the part of the riboregulator that could sequester the start codon on the mRNA. These synthetic riboregulators were orthogonal in that the aptamer parts could be interchanged without significant change in regulatory behavior and in that the regulated gene could be exchanged as well (of course within the limits of the tested set of combinations).

Various degrees of orthogonality across protein domains are in fact well established, in particular in regulatory proteins, which can frequently be separated at least into effector domains and protein-DNA interaction domains. One important domain class in DNA-protein interactions is the zinc-finger protein (ZFP) domain [29]. Here, one such domain, stabilized by a zinc ion, typically recognizes a DNA base triplet. When more than 3 (or in some cases 4) base pairs are needed for recognition, several ZFP domains can be connected with a small linker without interfering with the interactions of the first ZFP domain. Ideally, orthogonal ZFP domains would be available for all 64 possible DNA base triplets, and by simply selecting the proper set of ZFP domains it would be possible to create highly selective DNA-binders for arbitrary DNA sequences. In fact, multiple applications can be envisioned around such a technology [30-35]. Even though this perfect scenario is not yet possible, considerable progress has been made in designing orthogonal ZFP domains for 48 of the 64 possible triplets by rational and phage display methods, and up to 6 of these ZFP domains have been combined to produce selective binders with a recognition sequence of 18 bp [36].
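A simple counting argument shows why an 18 bp recognition sequence is attractive. The genome sizes below are common textbook figures used here as assumptions, not numbers from the report:

```python
# Expected chance occurrences of a fixed 18 bp site in a random genome sequence.
site_length = 18
sequence_space = 4 ** site_length  # ~6.9e10 distinct 18-mers

for genome, size_bp in [("E. coli", 4.6e6), ("human", 3.2e9)]:
    expected_hits = size_bp / sequence_space  # expected matches on one strand
    print(f"{genome}: ~{expected_hits:.2g} expected chance matches")

# E. coli: ~6.7e-05, human: ~0.046 - an 18 bp site is effectively unique even in
# large genomes, which is why linking 6 ZFP domains promises highly selective targeting.
```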

However, pushing orthogonality to such extremes also revealed current limitations: for example, while designed binders did bind to their target sequences, they also bound, with lower affinity, to similar sequences [37]. Such limitations notwithstanding, ZFPs have found widespread use [32], in particular in the area of genome editing, where the ZFP domains are coupled to the DNA-cleaving domain of specific restriction endonucleases and thus can be used to produce targeted DNA double-strand cuts. These cuts in turn trigger the exchange of chromosomal genes for externally provided genes, an attractive proposition for gene therapy [38].

Next to the ZFPs, the frequent orthogonality between protein domains has also been exploited in a number of alternative systems, preferably to re-program signaling pathways in cells [39, 40].

2.1.4. Orthogonality by alternative chemistries

An alternative approach to orthogonality could be to operate with molecules that do not occur in typical cells and for which therefore no interactions have evolved naturally. Of course, there might be ample unintended interactions, so such strategies need to be carefully controlled. Still, this concept could be applied on various levels. For example, while central carbon metabolism defines a set of standard routes from glucose to the standard starting metabolites for anabolism (glycolysis, pentose phosphate pathway, etc.), it is possible to design alternative routes on paper relying on novel intermediates which, for example, might not have the regulatory effects that glycolytic intermediates have. In fact, work is being done on the establishment of such routes, in which pathways are built up by going backward from an intermediate (the end of the pathway) and successively evolving the enzyme to catalyze the step before [41].

Alternatively, one could propose a different set of molecules to encode genetic information that is replicated by dedicated enzymes and thus represents an orthogonal store of information in the cell [12]. First steps toward the implementation of this strategy have been successful; for example, it was possible to generate DNA-polymerase variants that could incorporate alternative nucleotides with novel hydrogen bond patterns in PCR reactions [42].

In fact, many other examples can be quoted in which the re-engineering of the interface of a molecular interaction has allowed the introduction of novel small molecules into the set of cellular interactions, and these novel molecules could potentially all behave orthogonally [43, 44]. However, the corresponding studies were usually performed with a rather narrow experimental focus and thus all lack the proper controls to determine whether the newly engineered interaction is indeed orthogonal beyond the immediate scope of the experiment and does not lead to additional, unanticipated interference with other cellular functions.


As discussed, a variety of concepts exist for realizing orthogonality in biological systems, and from my point of view it is likely that some of these strategies will be successful in the end – in particular when orthogonality becomes selectable, as in the example of the in vivo introduction of unnatural amino acids. This particular experimental system has also been shown to be rather robust, as many different unnatural amino acids have been inserted into proteins following this strategy [23]. But even here it is clear that the system is not fully orthogonal – most importantly, it would be necessary to prune the genome of the exploited cells of all other uses of the codon that is used to encode the novel amino acid. It is also clear that many interactions might simply remain undetected, because we do not bother to look or simply would not know how to look. These hidden interactions might or might not turn out to be important in the long run, when an existing orthogonal system is made part of a more complex design. After all, it is completely unclear what degree of completeness we have to insist on for orthogonality in the various schemes of cellular reorganization. In summary, the field is only at its beginning, but at an important beginning. In the words of Sismour and Benner: “Ultimately, Synthetic Biology succeeds or fails as an engineering discipline depending on where the independence approximations become useful in the continuum between the atomic and macroscopic worlds.” [12]

2.2. Evolution and Synthetic Biology

As pointed out above, the long-term implications of evolution have hardly been discussed in the relevant literature so far, even though they have already become visible (see the discussion of the partial re-design of phage genomes above [27]). Clearly, orthogonality, for example, might make cellular behavior more predictable and easier to manipulate rationally, but in many instances this will be connected to expanding the amount of genomic information. By this I mean that in order to achieve the same functionality, a thoroughly orthogonal system might need substantially more DNA to encode the information and more proteins, enzymes, and small molecules to implement the functionality. Obviously, these are bad starting conditions when competing for a limited pool of resources, and left on their own, it is probably safe to argue that from an evolutionary point of view synthetic biological systems are poised to lose out.

How to deal with this evolutionary pressure? Two layers of action can be envisioned in theory:

a) interfering with the evolutionary machinery itself; and
b) making control and repair of biological systems easier.

Point a) refers to manipulations of the enzymes involved in replicating and repairing DNA in cells. Even though it might appear unlikely at the moment, it might be possible to improve these molecules in terms of accurate DNA propagation. Alternatively, all strategies that enable robust and conditional synchronization of replication would contribute to reducing the impact of evolutionary pressure. Point b) refers to technological solutions that first identify the modified section of DNA and then provide means to rapidly return to the previous state (essentially, DNA sequencing and DNA synthesis – see below in the technical sections).

Available Synthetic Biology prototypes are so small that they do not drastically interfere with the fitness of the cell, and none of the applications has been so rewarding that long-term experiments regarding phenotype-stability were performed. Only one experiment has addressed this problem to some extent, however in an exceptional experimental context: a population-density regulation was shown to be stably maintained in a chemostat over appr. 50 generations, but the reactor volume (and therefore the population pool to generate critical mutations) was only 16 nL (roughly 6 orders of magnitude less than typical small-scale chemostat experiments), which makes comparison to traditional data difficult [45].

However, as will be discussed below, the ambition of synthetic biologists regarding the scope of their synthetic systems is rapidly changing – synthetic genomes are becoming available [46, 47], and it is easy to predict that evolutionary pressure will become an important point on Synthetic Biology’s agenda.

2.3. The technology to implement synthetic systems

As much as a conceptual problem, the progress of Synthetic Biology is a technological problem. The extent of changes in cellular functions that are required to implement Synthetic Biology is of a different order of magnitude than biotechnology has traditionally addressed. Just to name two examples: eliminating all amber stop codons from the genome of E. coli, so that it could be used to insert unnatural amino acids only at the desired positions, alone would require around 300 mutations; and re-synthesizing the genome of Mycoplasma genitalium required the synthesis and assembly of 580 kbp [48].
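The figure of roughly 300 amber codons can in principle be checked directly against an annotated genome. The sketch below assumes Biopython is installed and that an E. coli K-12 GenBank record has been downloaded locally; the file name is a placeholder:

```python
from Bio import SeqIO  # assumes Biopython is available

# Placeholder file name - e.g. a locally saved E. coli K-12 MG1655 GenBank record.
record = SeqIO.read("e_coli_K12_MG1655.gb", "genbank")

amber_terminated = 0
for feature in record.features:
    if feature.type != "CDS":
        continue
    cds = feature.extract(record.seq)  # strand-aware coding sequence
    if len(cds) % 3 == 0 and str(cds[-3:]).upper() == "TAG":
        amber_terminated += 1

print(f"CDS ending in the amber stop codon TAG: {amber_terminated}")
# Each of these codons would have to be recoded (e.g. to TAA) before TAG could be
# reassigned exclusively to an unnatural amino acid.
```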

Such order-of-magnitude differences require a substantial leap in technological proficiency, but also in providing “content” to synthesize. First the technological side: Just as the “-omics” technologies have changed the analytical side of biology from single genes to genomes, so does Synthetic Biology need to implement the methods to move from the manipulation of single genes to that of suites of genes and eventually even genomes. Secondly, we need to adapt our measurement tools to the fact that the accurate analysis of dynamic systems will become of crucial importance, recognizing that this will require covering our measurements much better statistically (i.e. more measurements per data point).

There are two broad lines of technological advances that in my view will determine the rate with which Synthetic Biology can advance: [i] the advance in our capacity to synthesize de novo (non-template driven) and error-free large (> 5 kbp) segments of DNA and [ii] the miniaturization and automation of current laboratory protocols for the manipulation and analysis of biological systems.


2.3.1. DNA synthesis and assembly

2.3.1.1. Towards large-scale, non template-driven DNA synthesis

Biological systems store the instructions they require for maintenance, growth, replication, and differentiation in DNA-molecules. Consequently, the ability to write new DNA is central to efforts in biological systems design. However, so far our ability to manipulate DNA has been rather limited, as will be argued in the following:

Cells need to produce DNA whenever a cell divides, in order to provide each daughter cell with the same set of DNA-encoded information. To do so, cells copy existing DNA molecules. To be more precise, they produce the complementary version of an existing DNA strand. This copying process is of exceptional quality – typical error rates of in vivo DNA replication, for example when the bacterium E. coli divides, are on the order of 10⁻⁷ to 10⁻⁸ per base pair (i.e. roughly one base substitution every 5 genome duplications [49]). The (few) errors made in copying are an important source of the genetic variation that is required for biological systems to evolve. Still, DNA is naturally propagated by copying only.
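The quoted fidelity can be turned into an intuitive number with a short calculation; the genome size used below is the usual ~4.6 Mbp figure for E. coli, taken here as an assumption:

```python
genome_size_bp = 4.6e6  # assumed E. coli genome size

for error_rate in (1e-7, 1e-8):  # substitutions per bp per duplication (range quoted above)
    errors_per_duplication = genome_size_bp * error_rate
    duplications_per_error = 1 / errors_per_duplication
    print(f"error rate {error_rate:.0e}: one substitution every "
          f"~{duplications_per_error:.0f} genome duplications")

# ~2 and ~22 duplications, respectively - bracketing the "roughly one every
# 5 duplications" figure quoted in the text.
```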

This is also reflected in the laboratory tools that we have developed to synthesize DNA. For example, the polymerase chain reaction (PCR) exploits thermostable DNA-polymerases to duplicate the two strands of an existing template-DNA double strand. Repeating this duplication over and over again allows exponential amplification of the template until enough material for further experiments or analysis is produced. The same is true for example for our current protocols that are used to determine DNA sequences. Irrespective of the specific protocol used, all methods rely on the reconstruction of a second strand of DNA along a template-strand. This reconstruction is then exploited analytically in various ways (for example by inserting nucleotides that cause the extension reaction to stop (Sanger sequencing [50]) or by controlling the availability of the next nucleotide for strand extension and recording any chemical reaction that might or might not have taken place (pyrosequencing [51])), but at the heart of all the processes is the ability to synthesize DNA while concomitantly evaluating the information encoded on the complementary strand.

The process of copying can be adapted to modifications – the conditions in which a PCR reaction takes place can be adjusted so that the error rate of the synthesis enzyme increases (the principle of directed evolution). We can also introduce specific sequence modifications into a DNA template that is supposed to be amplified by selecting adequate, chemically synthesized oligonucleotides to start the PCR process. But in both cases, the introduced modifications are minor compared to the sequence of the original template and in the latter case limited to the sequence of the oligonucleotides (as the rest of the molecule is again produced by copying from the template).

In order to demonstrate how inadequate these procedures are for addressing the task of (re-)designing large sections of chromosomal DNA, let us examine an example: the recruitment of a cytochrome P450 monooxygenase from Artemisia annua (sweet wormwood), which catalyzes the 3-step oxidation of amorphadiene to artemisinic acid, as part of a novel pathway from glucose to artemisinic acid in E. coli and S. cerevisiae [52]. First of all, the codon usage of the novel gene needs to be changed from that of the plant A. annua to that of, e.g., the Gram-negative bacterium E. coli. Next, the gene needs to be integrated into the regulatory structure of the new pathway – it might for example be part of an operon requiring tight transcriptional and translational coupling to the genes before and after it. Furthermore, we might want to fine-tune the amount of enzyme available relative to the other pathway members by influencing e.g. the efficiency of the ribosome binding site, the half-life of the corresponding section of the mRNA (by introducing specific secondary structures, see below) or of the protein itself (by adding specific tags to the protein, also see below). Alternatively, the gene might need to receive its own regulatory structure, including promoter and transcriptional terminator. Finally, in order to allow the rapid insertion of improved variants of the gene into the operon, it might be desirable to have the gene flanked by unique restriction sites, while at the same time internal restriction sites might have to be eliminated.

To go comprehensively through all these modifications, step by step, with the methods described above is so laborious that there is just no way that this can be done for more than a few genes in any given project. To be truly able to modify large stretches of DNA and adapt them to our specific requirements, we need to switch completely to de novo, non template-dependent DNA synthesis methods (Fig. 1). Only this change can give the power to implement comprehensively engineered DNA sequences on a significant scale into novel biological systems.

Even though this switch is desirable and absolutely vital for Synthetic Biology, it is not easy. In order to design DNA sequences de novo, we need to rely on our capability to synthesize DNA chemically – without the requirement for a template. In this process, a single-stranded DNA molecule is built up nucleotide by nucleotide, and the nucleotide sequence is determined solely by the sequence of reactions chosen by the operator (Fig. 2). Such chemically produced stretches of DNA (oligonucleotides or “oligos”) are typically between 20 and 100 bp long. The technology as such is well established and is, for example, an essential prerequisite for PCR reactions. But, as already indicated above, synthetic oligonucleotides are short, and to make up meaningful novel DNA sequences they need to be assembled into larger and larger molecules. The assembly of these short oligonucleotides into (ultimately) DNA sequences of genome size is one of the technologies at the heart of Synthetic Biology.

2.3.1.2. Oligonucleotide synthesis

Before discussing the assembly of ever larger DNA sequences from oligonucleotides, it is worth pointing out why oligonucleotides used for assembling longer DNA sequences tend to be no longer than about 100 bp: the manufacturing of oligonucleotides is error-prone, and the likelihood of sequence errors increases with increasing length (Fig. 3).

Errors are introduced at various levels: [i] on the level of chemical synthesis of the oligonucleotides; [ii] on the level of assembling the oligonucleotides into larger DNA fragments; and [iii] during storage of the fragments in living cells, such as E. coli.

Fig. 1: Construction of a 32 kbp polyketide synthase cluster from 40meric chemically synthesized oligonucleotides. A) Single steps: 1. bioinformatics-supported design of 1'600 oligonucleotides; 2. production of 64 appr. 500 bp “synthons” from the oligos by PCA; 3. amplification of the synthons by PCR with uracil-containing primers; 4. uracil-DNA glycosylase-supported cloning of the PCR fragments in vectors; 5. ligation-by-selection-supported assembly of 6 fragments of appr. 5 kb; 6. assembly of the 32 kb cluster from the 5 kb fragments with the help of unique restriction sites. B) Ligation-by-selection-supported cloning. BsaI and BbsI are type II restriction enzymes whose recognition sequences lie outside the relevant gene fragment here. Selection for a successful cloning step occurs via unique combinations of antibiotic resistances (in this case kanamycin (Km) and tetracycline (Tet)). PCA: polymerase cycling assembly. Data taken from [53].

In order to understand the sources of errors in oligonucleotide production, it might be helpful to briefly recapitulate the fundamental steps of oligo synthesis. The corresponding chemistry is well established. Currently, the bulk of syntheses is carried out as solid-phase synthesis according to the “classical” phosphoramidite protocol. Briefly, it operates as follows: a first nucleotide, with its 5'-OH function protected by a DMT group, is coupled to polystyrene beads as the solid phase. Next, the DMT group is removed by acid treatment (e.g. TCA), generating a free 5'-OH group. Then, the phosphoramidite of choice (A in Fig. 2) is added, converted to a reactive intermediate (B) under weakly acidic conditions, and coupled to the free 5'-OH (C) to produce a new phosphite linkage. These reactions take place in THF or DMSO. As the 5'-OH of the added nucleotide is still protected, only one nucleotide is added to the growing chain. The 5'-OH groups that have not reacted need to be capped so that they cannot continue to take part in the synthesis process and generate oligonucleotides with deletions; this is achieved by acetylation after treatment with acetic anhydride and 1-methylimidazole (not shown in Fig. 2). Finally, water and iodine are added to oxidize the phosphite linkage to a phosphodiester linkage. In between steps, the system is conditioned by washing with a suitable solvent. After repeating this sequence of
steps for the required number of times, the oligonucleotide is finally cleaved off the column and treated with ammonium hydroxide at high temperature to remove all remaining protecting groups.

In order to make this process amenable to miniaturization and to produce many different nucleotide sequences in a confined space, the deprotection of the 5'-OH group was made sensitive to light. By producing suitable masks that direct the light only to certain parts of a solid-phase synthesis array (photolithography), only these parts of the array are prepared for extension in the next round of adding phosphoramidites. This can be achieved by replacing the acid-labile DMT group by the photo-labile α-methyl-6-nitropiperonyloxycarbonyl (MeNPoc) protective group. This technology allows the concomitant preparation of several thousand oligonucleotides on one solid support (in hybridization arrays, up to 1 million features per cm^2 are possible).

The photolithography approach has been developed further: in order to eliminate the time-consuming and expensive use of photolithographic masks, systems based on “micro-mirrors” have been developed (Nimblegen, Febit). Here, the light pattern is produced by rapidly adjustable, high-contrast video displays. The German company Febit claims that with this procedure it can produce up to 500'000 oligonucleotides per chip per day.

A few properties are common to all three technologies. One crucial parameter in oligo synthesis is the coupling efficiency, which is a function of the completeness with which deblocking and chain extension proceed. If in every step 99% of all started oligonucleotide chains are extended (see Fig. 2, from B to C), only about 60% of all chains contain all nucleotides after 50 steps. This has considerable implications for the scale at which the synthesis needs to be started. Furthermore, capping does not proceed with total efficiency either. Taken together, a significant fraction of the molecules on the solid phase have a length different from that of the intended oligo.
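
The arithmetic behind the 60% figure is simply the stepwise yield raised to the power of the number of coupling steps; the short sketch below evaluates this relationship (the same one that underlies Fig. 3) for a few combinations of coupling efficiency and oligonucleotide length.

```python
# Full-length yield of a chemical oligonucleotide synthesis as a function
# of the average stepwise (coupling) yield: OY = AY ** steps (cf. Fig. 3).
for stepwise_yield in (0.99, 0.995):
    for steps in (20, 50, 100):
        full_length_fraction = stepwise_yield ** steps
        print(f"AY = {stepwise_yield:.3f}, {steps} steps: "
              f"{full_length_fraction:.1%} full-length chains")
# AY = 0.99 over 50 steps gives ~60.5 %, matching the figure quoted above.
```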

Besides deletions, chemical modifications also play a role: for example, the phosphoramidites that are used for chain extension are not completely pure. In addition, oligonucleotide syntheses are prone to depurination (in particular under the acidic conditions of DMT removal), during which an adenine or a guanine can be hydrolysed off the sugar-phosphate backbone, leaving an apurinic (abasic) site.

These two classes of errors lead to a considerable percentage of erroneous oligonucleotides in any given oligonucleotide mixture coming out of a synthesizer. The exact percentage is difficult to estimate and is also a function of the specific supplier used and of the quality control criteria implemented. It certainly is a function of the required oligonucleotide length, as should be obvious from the various error sources mentioned above. The various reports in the literature that describe the assembly of larger DNA fragments from oligonucleotides typically use oligonucleotides of around 50 bp [47, 53] as a compromise between the desire for long oligonucleotides to facilitate assembly and for short oligonucleotides to minimize the number of errors due to chemical synthesis.

Next to ongoing efforts to improve the synthetic procedures and thus reduce the frequency of error introduction, several methods have been developed to identify errors introduced during synthesis and to eliminate the corresponding DNA molecules. They are based on enzymatic and physical principles. The physical methods exploit size differences and the disturbances in hybridization caused by error-containing complementary oligonucleotides. Polyacrylamide gel electrophoresis (PAGE), for example, is easily sensitive enough to separate oligonucleotides that differ in length by no more than one nucleotide. Therefore, subjecting oligonucleotides to a PAGE purification can substantially reduce the number of erroneous oligonucleotides at the start of the experiment [47]. The same can be achieved by preparative HPLC, which is a standard technology whenever high-quality oligonucleotides are required.

Alternatively, hybridization under stringent conditions can be used to identify mismatches. Perfect complementarity between two DNA strands allows the maximum number of hydrogen bonds between them, and thus a higher temperature is required to separate the molecules again (the “melting temperature”). This has been applied to reduce the error rate of light-directed, chip-based oligonucleotide synthesis [54]. A first chip was used to produce the oligonucleotides for DNA-fragment assembly (“construction oligos”). These oligonucleotides were eventually released from the first chip and then hybridized under stringent conditions to sets of complementary oligonucleotides that had been synthesized on a second chip. Ideally, oligonucleotides with errors in their sequence should find no perfect match in the set of correction oligonucleotides and be lost in a washing step because of the resulting decreased melting temperature. Obviously, this procedure requires that all construction oligonucleotides are carefully designed beforehand to have approximately the same melting temperature. The result was encouraging: the authors reported 1 error in a sequence of 1'400 bp. Though this is still far too high for any large-scale synthesis approach, it is better by a factor of 3 than standard solid-phase approaches [53].
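
As a rough illustration of the design check that all construction oligonucleotides should have approximately the same melting temperature, the sketch below estimates Tm with the simple Wallace rule (Tm approx. 2*(A+T) + 4*(G+C) degrees C). The sequences and the tolerance threshold are hypothetical, and a production design would rely on nearest-neighbour thermodynamic models rather than this crude rule.

```python
# Rough Tm screen for construction oligos using the Wallace rule
# (Tm approx. 2*(A+T) + 4*(G+C) degrees C). Crude estimate only; real
# designs would use nearest-neighbour thermodynamic models.

def wallace_tm(oligo: str) -> int:
    at = sum(oligo.count(base) for base in "AT")
    gc = sum(oligo.count(base) for base in "GC")
    return 2 * at + 4 * gc

construction_oligos = [          # hypothetical sequences
    "ATGGCGAAACTGGAAGGCAGC",
    "TTATTTAATATCATTAATTAA",
]
tms = {oligo: wallace_tm(oligo) for oligo in construction_oligos}
for oligo, tm in tms.items():
    print(f"{oligo}: ~{tm} degC")
if max(tms.values()) - min(tms.values()) > 4:   # arbitrary tolerance in degC
    print("Tm spread too large -- redesign the oligo boundaries")
```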

The enzymatic methods rely on enzymes that can detect DNA structures that typically arise when erroneous DNA molecules hybridize. For example, endonuclease VII of phage T4 can identify apurinic sites in DNA and then cleaves both strands of the DNA molecule close to the lesion site [55]. This leads to shorter fragments, which can again be separated from the correct oligonucleotides (see above). Alternatively, E. coli's MutHLS proteins can be used to detect mismatches and insertions/deletions in double-stranded DNA. If such errors are present, the fragments are cleaved at GATC sites [56], and the remaining correct fragments can again be isolated by size selection. In this way, the error rate in DNA fragments produced from chemically synthesized oligonucleotides could be reduced by an order of magnitude [56].

Even when the error rate in the oligonucleotides can be reduced by (a combination of) the methods mentioned above, the assembled larger DNA fragments still require sequencing after assembly
[57] to confirm that the desired sequence has been achieved. Of course, this is a very laborious and time-consuming way of error-correction.

Fig. 2: Phosphoramidite procedure for DNA synthesis in the 3'- to 5'-direction. Commercially available nucleoside 3'-phosphoramidites (A) are added to the growing oligonucleotide chain and exposed to weak acid, which leads immediately to the formation of the reactive intermediate B. Within about 1 min, B has formed a new phosphite linkage with the oligonucleotide chain attached to the solid support (C). C is then oxidized with iodine/water to D. CE: 2-cyanoethyl; DMT: dimethoxytrityl. The amine groups of the bases are also protected (e.g. as benzoyl derivatives).

Fig. 3: Dependence of the overall yield (OY) of an oligonucleotide synthesis on the number of steps and the average yield (AY), which gives the percentage of extended oligonucleotide chains per step.

2.3.1.3. Oligonucleotide assembly

Next, the oligonucleotides have to be assembled into larger DNA fragments, usually of around 500 bp. This is typically achieved by one of a variety of enzyme-assisted methods. The corresponding oligonucleotides are mixed, hybridized, and then converted into larger assemblies by polymerase cycling assembly (PCA, Fig. 4). In a PCA reaction, all oligonucleotides that together represent the targeted double-stranded DNA fragment are present. By repeated melting and re-hybridization, the oligonucleotides are extended step by step into longer sections until a certain population reaches the desired length. Note that this reaction is carried out without an excess of terminal oligonucleotides, so it is not an amplification reaction. Rather, every full-length fragment consists of oligonucleotides and their extensions, which reduces the chance of introducing errors through polymerase action. Indeed, a detailed study found polymerase action to be a negligible contribution to the overall error rate [53]. Remarkably, the error rate observed in the assembled 500 bp fragments was even lower than that expected from the presumed error frequency of the oligonucleotides, suggesting either that the error rate in these oligonucleotides was overestimated or that the PCA reaction contributed to error correction in some unappreciated way. However, extending the number of PCA cycles beyond 25 increased the error rate substantially.
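
To illustrate the oligonucleotide design step that precedes a PCA reaction, the sketch below tiles a hypothetical target sequence with sense oligonucleotides and offset antisense oligonucleotides so that neighbouring molecules overlap and can prime each other's extension. The oligo length and overlap are arbitrary illustrative choices, not the design parameters used in [53].

```python
# Sketch of the oligonucleotide design step for a PCA reaction: tile the
# target with sense oligos and offset antisense oligos so that neighbouring
# molecules overlap and can prime each other's extension. Oligo length and
# overlap are arbitrary illustrative choices.

def design_pca_oligos(target: str, oligo_len: int = 40, overlap: int = 20):
    sense = [target[i:i + oligo_len] for i in range(0, len(target), oligo_len)]
    # antisense oligos are shifted by half an oligo so that each one bridges
    # a junction between two sense oligos; reverse-complement for synthesis
    complement = str.maketrans("ACGT", "TGCA")
    antisense = [target[i:i + oligo_len].translate(complement)[::-1]
                 for i in range(overlap, len(target), oligo_len)]
    return sense, antisense

target = "ATGGCGAAACTGGAAGGCAGC" * 10     # hypothetical 210 bp target sequence
sense, antisense = design_pca_oligos(target)
print(len(sense), "sense and", len(antisense), "antisense oligos designed")
```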

Fig. 4: Polymerase cycling assembly (PCA): extension of oligonucleotides (40-70 bp) into larger fragments (appr. 600 bp) by repeated melting, hybridization, and polymerase-based extension; only the first two extension cycles are shown.

A specific feature of the light-directed synthesis technologies is the rather low amount of oligonucleotide delivered. While a conventional solid-phase synthesis can be scaled to the amount of oligonucleotide required, the chip-based technologies can only produce the amount allowed by the feature size on the chip (typically around 10^5 to 10^8 molecules per feature for any single chip-derived sequence, translating into picomolar concentrations or lower after release into solution [54]). This is typically not enough for the subsequent stages of oligonucleotide assembly, so that a DNA amplification step needs to be introduced. This amplification step needs to act on all oligonucleotides at the same time, which places high demands on the design of the oligonucleotides to be amplified. Essentially, the oligonucleotides receive standard linkers on both ends that serve as hybridization targets for the PCR primers. However, PCR is not error-free itself and can thus contribute to the error rate in the oligonucleotide set. Furthermore, it is unlikely that all oligonucleotides can be faithfully amplified with comparable efficiency, leading to potentially pronounced imbalances between the amounts of the individual oligonucleotides in a sample (or even the absence of specific oligonucleotides).
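
That chip-derived oligonucleotides end up at picomolar concentrations or lower is easy to verify with a short calculation; the elution volume assumed below (10 microlitres) is a hypothetical value chosen only for illustration.

```python
# Back-of-the-envelope check of the oligonucleotide concentration after
# release from a chip feature (elution volume of 10 microlitres assumed).
AVOGADRO = 6.022e23
elution_volume_l = 10e-6          # assumed 10 microlitres

for molecules_per_feature in (1e5, 1e8):
    molar = molecules_per_feature / AVOGADRO / elution_volume_l
    print(f"{molecules_per_feature:.0e} molecules -> {molar * 1e12:.3g} pM")
# ~0.017 pM to ~17 pM, i.e. picomolar or lower, consistent with [54].
```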

After such an amplification step, the assembly of oligonucleotides into larger fragments has been performed with a variant of the PCA reaction mentioned above, the polymerase assembly multiplexing (PAM) reaction. It was applied to a pool of oligonucleotides that represented the genes for 21 ribosomal proteins. However, rather than combining these 21 genes into one large DNA fragment immediately, the authors added terminal primers that allowed only a specific subset of oligonucleotides to be amplified. By repeating this process, they obtained each of the 21 genes, each from a separate reaction. In a second round of PAM reactions with a new set of primers, they could then recombine the 21 genes into one 14.6 kbp DNA fragment containing all of them.

2.3.1.4. Assembly of DNA fragments

Once the oligonucleotides have been assembled into DNA fragments of still relatively modest length (typically around 0.5 kbp), these fragments need to be assembled further into larger fragments or even genomes. The methods used for this part are still very traditional, even if they are applied rather ingeniously. Essentially, fragments are combined by classical cutting and pasting of DNA. This can be considerably facilitated by rigorously applying smart working routines. For example, the assembly of 5 kbp fragments from 500 bp synthons was accelerated by a selection scheme that simplifies the cloning steps (Fig. 1). Briefly, a synthon was excised together with an antibiotic resistance gene and inserted into a vector that had been prepared by eliminating a different resistance gene while retaining a second resistance gene that differs from the other two. Selection for successful ligation can then be performed on the unique combination of resistance genes, effectively reducing the time that has to be invested to identify correct clones. Other methods, such as the PAM variants, have been mentioned above.
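
The logic of this selection scheme can be captured in a small toy model: the synthon travels with one resistance marker, the acceptor vector contributes a second one, and only the intended ligation product survives double selection. The marker names follow the Fig. 1 example (Km, Tet); the candidate "clones" are purely illustrative.

```python
# Toy model of ligation-by-selection (Fig. 1B): the synthon travels with
# one resistance marker, the acceptor vector contributes another, and only
# the intended ligation product grows on double-selective plates.
synthon_marker = "Km"             # marker excised together with the synthon
vector_marker = "Tet"             # marker retained on the acceptor vector

candidate_clones = [              # purely illustrative transformation outcomes
    {synthon_marker},                  # synthon fragment carrying only its own marker
    {vector_marker},                   # empty, re-ligated vector
    {synthon_marker, vector_marker},   # desired ligation product
]
selection = {synthon_marker, vector_marker}
survivors = [clone for clone in candidate_clones if selection <= clone]
print(survivors)                  # only the clone carrying both markers survives
```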


References
