3. Transfer RNA as a codon replicator
3.2.7. Replication fidelity
The observed rate of erroneous product formation can be attributed to the spontaneous background rate (Fig. 3.9b, dashed line). Reaction “+−+−” proceeded like the untemplated control, because it contained no strands that could bind next to each other on the template and form a backbone duplex. For reactions “+++−” and “++−−”, templating worked for partial sequences, producing yields intermediate between those of the full and the untemplated reaction.
The fact that the reaction with a single defect (i.e. a missing strand) had 40 % of the yield of the full reaction (and ca. 16 % for two defects) translates into a per-codon replication fidelity of 1/1.4 ≈ 71 %. To derive a per-base fidelity, the properties of codon duplex 0:0 were compared to duplexes 0:0*, where 0* differs from 0 by a fixed number of point mutations. For 99 % of all duplexes with 0* containing three point mutations, $\Delta G \geq -12.5$ kcal/mol, compared to $\Delta G_{0:0} = -15.4$ kcal/mol at 45 °C. In terms of melting temperatures, this translates into values at least 10 °C lower than that of the original codon duplex 0:0 (Fig. 3.10).
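For orientation, a free-energy gap can be converted into a ratio of association constants via $K = e^{-\Delta G/RT}$. For molar free energies, $RT$ takes the place of $k_B T$. The sketch below applies this to the two values quoted above. It assumes simple two-state binding with standard-state free energies and ignores concentrations, so the result is only an order-of-magnitude indication and not a measured quantity. The helper name `binding_ratio` is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_ratio(dg_ref, dg_mut, temp_c):
    """Ratio K_ref / K_mut of association constants, assuming K = exp(-dG / RT)."""
    rt = R_KCAL * (temp_c + 273.15)
    return math.exp((dg_mut - dg_ref) / rt)


# Values quoted in the text for 45 degC, in kcal/mol. Since dG(0:0*) >= -12.5,
# the triple mutants are at most this stable and the ratio is a lower bound.
ratio = binding_ratio(dg_ref=-15.4, dg_mut=-12.5, temp_c=45.0)
print(f"K(0:0) / K(0:0*) >= {ratio:.0f}")
```

With these numbers, the ratio comes out at roughly two orders of magnitude.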
Assuming that the replication does not differentiate between codon 0 and any codon 0* with up to $K$ point mutations, the per-codon fidelity $q_K(N)$ is given by a cumulative binomial distribution

$$q_K(N) = \sum_{k=0}^{K} \binom{N}{k}\, q^{N-k} (1-q)^k. \tag{3.8}$$

Here, $N$ is the codon length and $q$ the per-base fidelity. Assuming $K = 2$ and using the measured value $q_2(15) = 0.71$, one finds a per-base fidelity of $q = 87.5$ %. This value considers only the 15 bases of the codon. Including the whole length of the proto-tRNAs of about 83 bases, the per-base fidelity would be 97.8 %.
In fact, the above assumption that the reaction cannot distinguish between codons 0 and 0* with up to two mutations (i.e. $K = 2$) holds essentially only for mutations at the terminal bases. Codons 0* with mutations at two internal bases all behave similarly to codons with a total of three mutations, and 99 % of them have melting temperatures more than 10 °C below that of dimer 0:0 (Fig. 3.10). Including this refinement, the per-base fidelity is 92 %.
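Equation (3.8) cannot be inverted for $q$ in closed form. However, the accepted fraction increases monotonically with $q$ for both acceptance rules used here, so it can be inverted numerically. The sketch below uses bisection, which is our choice; the text does not say how $q$ was obtained. It takes the per-codon value $1/1.4 \approx 0.71$ as input. For the refinement, we assume (this is our reading of the paragraph above) that a double mutant is accepted only if at least one of its two mutations sits at one of the two terminal bases. That leaves $\binom{15}{2} - \binom{13}{2} = 27$ accepted double-mutant patterns instead of 105. The function names are hypothetical.

```python
from math import comb


def accepted_fraction(q, n, weights):
    """Probability that a codon of length n is accepted by the replication.

    weights[k] is the number of k-mutation patterns that are still accepted;
    each such pattern occurs with probability q**(n - k) * (1 - q)**k.
    """
    return sum(w * q ** (n - k) * (1 - q) ** k for k, w in enumerate(weights))


def solve_per_base(target, n, weights, tol=1e-12):
    """Find q with accepted_fraction(q) = target by bisection (monotone in q)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if accepted_fraction(mid, n, weights) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


q_codon = 1 / 1.4  # per-codon fidelity, about 0.71

# Eq. (3.8) with K = 2: every pattern with up to two mutations is accepted
q15 = solve_per_base(q_codon, 15, [comb(15, k) for k in range(3)])
q83 = solve_per_base(q_codon, 83, [comb(83, k) for k in range(3)])

# Assumed refinement: a double mutant is accepted only if at least one of its
# mutations is at one of the two terminal bases of the 15-nt codon
double_terminal = comb(15, 2) - comb(13, 2)  # 105 - 78 = 27 patterns
q_ref = solve_per_base(q_codon, 15, [1, 15, double_terminal])

print(f"N = 15: {q15:.1%}, N = 83: {q83:.1%}, refined: {q_ref:.1%}")
```

Under these assumptions, the three values land close to the 87.5 %, 97.8 % and 92 % quoted above.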
3.3. Discussion
I present a replication mechanism that cross-catalytically replicates a succession of short nucleic acid stretches without the need for any ligation chemistry. Instead, the nucleic acids are connected via hybridization of complementary domains. Replication is driven by thermal oscillations, requires no other fuel, and generates no waste products that could interfere with the reaction later on. The reaction is relatively fast and proceeds within a few thermal oscillations of 20 minutes each. This is comparable to other replicators [51], cross-ligating ribozymes [91], or autocatalytic DNA networks [123]. Codon sequences are replicated with a per-codon fidelity of about 70 %. Replication on a codon basis effectively constitutes a proofreading mechanism for a putative upstream polymerization process [17, 36, 43, 60] that would generate the proto-tRNAs: it rejects sequence snippets above a certain error ratio and thereby increases the effective fidelity of that process. Translated to a per-nucleotide basis, the per-codon fidelity corresponds to an estimated 88 to 92 %. Therefore, the underlying polymerization process could feature a relatively low fidelity¹, as that would only affect the concentration of “correct” molecules, and thus the velocity of replication, but not its fidelity.
Overall replication fidelity is limited by a spontaneous formation rate, which originates from the interaction of strands that are not bound to a template but free in solution. At lower concentrations, as one would expect in a prebiotic setting, this rate would decrease, at the expense of an overall slower reaction. To some degree, such a background rate is inherent to hairpin-fuelled DNA or RNA reactions [39, 123].
A similar selection mechanism for nucleic acids is the highly sequence-specific gelation of DNA [74]. Here, DNA at very high local concentrations forms hydrogels of up to 100 µm in size. The required concentrations are generated by hyperexponential accumulation of the DNA in thermo-gravitational pores [64]. The selectivity of the process is caused by the structure of the hydrogel, which consists of a network of short oligomers connected by base pairing of complementary domains. Mutating even a single nucleotide is sufficient to prevent gelation. In fact, such a phenomenon could serve as a pre-selection mechanism for hairpin-driven replication mechanisms, as it promotes self-complementarity in nucleotide sequences and thereby selects for hairpins. A process with similar selection properties is the biased hydrolysis of nucleotide backbones [77], where double-stranded regions are less likely to be cleaved than single-stranded domains.
Thermal oscillations like those discussed here are typical for laminar convection in thermal gradients [11]. Depending on the envisioned environment, the mechanism could also be driven by thermochemical oscillations [6] or oscillations in pH. In the latter case, denaturation of codon duplexes would be due to alkaline pH instead of high temperature.
The presented nucleic acids may appear rather large, as only 18 % of the nucleotides actually encode information. However, modern ribosomes can make up more than 15 % of the mass of a cell [34], so the total required mass need not be a major concern. Moreover, the reversibility of the bonds between the proto-tRNAs makes the strands reusable, which further reduces the cost entailed by the non-coding parts.
Nevertheless, the replication mechanism would also work with shorter strands. For this study, the length of the strands was inspired by the size of modern tRNAs. Individual hairpins as well as codon and backbone duplexes had relatively high melting temperatures and slow kinetics, which eased experimental handling. Smaller strands would be equally feasible, as long as the order of the melting temperatures of codon and backbone duplexes is preserved. For smaller strands, the requirements on the underlying polymerization process are also somewhat lower, as sequence space shrinks exponentially with decreasing molecule size. Moreover, binding of shorter codon duplexes would discriminate even single mismatches, resulting in an increased selectivity of the proofreading mechanism. For strands of length 30, sampling the whole sequence space requires two micromoles of molecules.

¹ The polymerization process could not be completely random, as the sequence space of all 83 nt strands comprises $4^{83} \approx 10^{50}$ sequences.
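The figures in this discussion follow from simple counting. A strand of length $L$ over four bases has $4^L$ possible sequences, and holding one copy of each requires $4^L / N_A$ moles. The sketch below assumes that "sampling the whole sequence space" means one molecule per sequence. It also checks the coding fraction of 15 out of about 83 nucleotides mentioned earlier. The helper names are hypothetical.

```python
import math

AVOGADRO = 6.02214076e23  # 1/mol (exact SI value)


def sequence_space(length, alphabet=4):
    """Number of distinct sequences of the given length."""
    return alphabet ** length


def moles_for_full_coverage(length):
    """Moles needed to hold exactly one copy of every possible sequence."""
    return sequence_space(length) / AVOGADRO


coding_fraction = 15 / 83                          # codon bases per proto-tRNA
log10_space_83 = math.log10(sequence_space(83))    # exponent of 4^83 in base 10
umol_30 = moles_for_full_coverage(30) * 1e6        # micromoles for length 30

print(f"coding fraction: {coding_fraction:.0%}")
print(f"4^83 = 10^{log10_space_83:.1f}")
print(f"length 30: {umol_30:.2f} umol")
```

Each nucleotide removed divides the sequence space by four. That is why the coverage requirement drops so steeply for shorter strands.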
The constraints on the stability of the backbone duplexes would be lifted by combining the mechanism with a proposed non-enzymatic ligation at short overhangs of RNA duplexes [107]. Such overhangs were present at each strand in the sequences used in this study and did not interfere with recognition or binding at the codon domains. In the bound configuration, such a ligation would proceed at the backbone duplexes and join successive strands. Another compatible mechanism would consist of a cleavage reaction at the codon domains [19], which would cut out the backbone duplexes and be followed by a ligation of the codon domains.
Considering the origins of translation, the double-hairpin configuration of the strands could suggest a link to a simple translation system. Codon domains (containing what is the anticodon in modern tRNAs) are close to the 3’ termini of unbound strands. The formation of short peptide-RNA hybrids [44], combined with specific interactions between amino acids and the codons, could have given rise to a primitive genetic code. The spatial arrangement of strands that is replicated by the presented mechanism would then translate into a spatial arrangement of the amino acids or short peptide tails attached to the strands. The next stage would be the detachment and linking of the tails to form longer peptides and eventually proteins.
Outside the context of translation, mechanisms similar to the one described here could also be relevant as a mutable assembly strategy for larger functional RNA molecules. Hairpin loops are a ubiquitous secondary structure motif, commonly separated by stretches of unpaired bases. Unlike RNA systems with actual ribozyme activity [76, 115], the presented system is more symmetric. However, as catalytic functionality in RNA can emerge from something as simple as a one-nucleotide bulge in a short duplex [118], the structural regularity is no major roadblock.
On the question of the nature of the first functional RNAs, the replication mechanism is compatible with the idea that their functionality need not be related to replication itself [111]. Because the mechanism relies on non-enzymatic polymerization of the RNA sequences instead of replicase activity, the assembled RNA complexes could instead have served structural or metabolic purposes. The same holds for potential chemical ligation activity, which is not required for replication but could prove beneficial and evolve at a later stage.
3.4. Materials and methods
3.4.1. Strand design
DNA double-hairpin sequences were designed using the NUPACK software package [124]. The algorithm calculates free energies, melting temperatures and probabilities of different secondary structure configurations using the nearest neighbour model [97]. The model accounts for Watson-Crick and wobble base-pairing energies as well as stacking interactions between neighbouring base pairs. It also includes contributions from mismatched bases, dangling ends, internal and external loops, corrections for salt concentrations (NaCl and MgCl₂) and temperature, and other subtleties.

structure mono_A0 = D13 U7 U15 D13 U7
structure mono_B1 = D13 U7 U15 D13 U7
structure dimer_A0B1 = D13 U7 U15 D33+ U15 D13 U7
structure ortho_ac_01 = U15 + U15
strand A0 = A_hp A_L A_hp* ac0 B_hp B_L* B_hp*
strand B1 = B_hp B_L B_hp* ac1 C_hp C_L* C_hp*
domain ac0 = N15
domain ac1 = N15
domain A_L = CGCCTAT
domain A_hp = CGCTTAATTCCCG
domain B_L = CTTTTCC
domain B_hp = CGATGACCGTTCG
domain C_L = CAAGCAC
domain C_hp = GCGCACACTGTCG
mono_A0.seq = A0
dimer_A0B1.seq = A0 B1
ortho_ac_01.seq = ac0 ac1

[Fig. 3.11b: secondary-structure drawing of mono_A0 with domain labels A_hp, A_L, A_hp*, ac0, B_hp, B_L*, B_hp*]

Figure 3.11. Example NUPACK input. a. Excerpt from the NUPACK input used to generate the sequences of the replicating set of DNA strands, listed in Table 3.1. The example defines three “real” secondary structures (monomers 0A, 0B and dimer 0A0B), eight domains, and two strands. The structure ortho_ac_01 is defined to ensure the orthogonality of the coding sequences. b. Visualization of secondary structure mono_A0, the 0A double-hairpin molecule. Labels of the seven domains are indicated next to the domains.
For calculations, an RNA or DNA sequence is represented as a polymer graph [23], where the strand is laid out on a circle and base pairs are represented by lines connecting nodes (bases) on that circle. When several strands are included in the calculation, they are concatenated in each permutation, and each permutation is analysed separately. In most cases, base pairs must be strictly nested, i.e. the structure must be free of pseudoknots². The free energy $\Delta G_s$ of a secondary structure $s$ is given by the sum of the free energies of its constituents $k$:

$$\Delta G_s = \sum_k \Delta G_k. \tag{3.9}$$

The probability of finding a particular secondary structure is given by

$$p_s = \frac{1}{Z}\, e^{-\Delta G_s / k_B T}, \qquad Z = \sum_{s \in \Omega} e^{-\Delta G_s / k_B T}. \tag{3.10}$$

Here, $Z$ is the partition function and $\Omega$ the state space.
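Equations (3.9) and (3.10) can be illustrated on a toy ensemble that is small enough to enumerate by hand. The sketch below discards pseudoknotted structures, sums constituent energies into $\Delta G_s$, and computes Boltzmann probabilities. For molar free energies it uses $RT$ in place of $k_B T$. The structures, their base pairs and all energies are hypothetical and chosen only for illustration. NUPACK itself evaluates $Z$ by dynamic programming rather than by explicit enumeration.

```python
import math

R_KCAL = 1.987204e-3  # kcal/(mol*K); RT replaces k_B*T for molar free energies


def is_pseudoknot_free(pairs):
    """True if no two base pairs (i, j) and (k, l) cross, i.e. i < k < j < l never occurs."""
    norm = [tuple(sorted(p)) for p in pairs]
    for a, (i, j) in enumerate(norm):
        for k, l in norm[a + 1:]:
            if i < k < j < l or k < i < l < j:
                return False
    return True


def structure_free_energy(contributions):
    """Eq. (3.9): the structure's free energy is the sum of its constituents."""
    return sum(contributions)


def boltzmann_probabilities(energies, temp_c):
    """Eq. (3.10) over an explicitly listed state space; returns (p_s, Z)."""
    rt = R_KCAL * (temp_c + 273.15)
    weights = {s: math.exp(-g / rt) for s, g in energies.items()}
    z = sum(weights.values())
    return {s: w / z for s, w in weights.items()}, z


# Hypothetical toy ensemble: name -> (base pairs, constituent energies in kcal/mol)
ensemble = {
    "unfolded": ([], [0.0]),
    "hairpin": ([(0, 11), (1, 10), (2, 9)], [-3.3, -2.1, 4.0]),
    "pseudoknot": ([(0, 6), (3, 11)], [-1.0]),
}

# Only strictly nested structures enter the state space Omega
allowed = {name: structure_free_energy(loops)
           for name, (pairs, loops) in ensemble.items()
           if is_pseudoknot_free(pairs)}

probs, z = boltzmann_probabilities(allowed, temp_c=45.0)
for name, p in probs.items():
    print(f"{name}: dG = {allowed[name]:+.1f} kcal/mol, p = {p:.3f}")
```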
The NUPACK design algorithm is controlled by an input script containing parameters such as temperature and salt conditions as well as strand and target secondary structure definitions. The latter are given in the so-called DU+ notation, which specifies stretches of paired (Dn) and unpaired (Un) bases and backbone nicks (+). The substructure of a duplex immediately follows its definition. Figure 3.11a shows an excerpt of the input used to generate initial sets of sequences.
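To make the DU+ notation concrete, the hypothetical helper below expands a DU+ string into dot-bracket notation. In that notation, "(" and ")" mark paired bases, "." marks unpaired bases, and "+" marks a strand break. It implements only the simple reading described above: Dn encloses the single element, or parenthesized group, that immediately follows it. It is not part of NUPACK and does not cover the full specification language.

```python
import re

TOKEN = re.compile(r"[DU]\d+|\+|\(|\)")


def du_to_dot_bracket(spec):
    """Expand a DU+ structure string into dot-bracket notation (simplified reading)."""
    tokens = TOKEN.findall(spec)
    pos = 0

    def parse_element():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "+":                    # backbone nick / strand break
            return "+"
        if tok == "(":                    # parenthesized group of elements
            inner = parse_sequence()
            pos += 1                      # skip the matching ")"
            return inner
        n = int(tok[1:])
        if tok[0] == "U":                 # n unpaired bases
            return "." * n
        # "Dn": n nested base pairs enclosing the element that follows
        inner = ""
        if pos < len(tokens) and tokens[pos] != ")":
            inner = parse_element()
        return "(" * n + inner + ")" * n

    def parse_sequence():
        parts = []
        while pos < len(tokens) and tokens[pos] != ")":
            parts.append(parse_element())
        return "".join(parts)

    return parse_sequence()


print(du_to_dot_bracket("D13 U7 U15 D13 U7"))              # mono_A0
print(du_to_dot_bracket("D13 U7 U15 D33+ U15 D13 U7"))     # dimer_A0B1
```

Applied to dimer_A0B1, this reading gives two 81-nt halves joined by a 33-bp duplex across the nick. That matches the backbone duplex formed between the 3’ hairpin of one strand and the 5’ hairpin of the next.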
Target secondary structures were all individual double-hairpin monomers and all dimer complexes formed by pairs of consecutive strands (e.g. 0A0B, 0C0B). The sequence of each strand was decomposed into seven domains, as illustrated in Fig. 3.11b. This made it easy to specify the complementarity of the 3’ hairpin of one strand (e.g. 0A) to the 5’ hairpin of the next (i.e. 0B). In addition, the orthogonality of the two anticodon sequences was specified. The strand length of 82–84 nt was chosen to lie within the range of usual tRNA lengths [105]. From ten generated candidate sequence sets, the most suitable one was chosen with regard to homogeneous binding energies and the ordering of melting temperatures. Minor outliers in the binding energies were eliminated by manually mutating some of the bases. More importantly, the predicted hairpin melting temperatures of the initial sequences were too high. While this would not fundamentally interfere with the replication mechanism, it would unnecessarily slow down templating kinetics. To allow a sufficient degree of fluctuation in the hairpins at temperatures below the melting temperature of the coding bonds, mismatches were introduced into the stem sequences. Finally, I added short 5’ overhangs of four nucleotides to each strand. These overhangs can be used to covalently ligate short adapter hairpins to the backbone duplexes. Once ligated, the combined strands could be analysed by DNA sequencing or subjected to other denaturing downstream processing.
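The domain decomposition can be checked directly on the sequences in the Fig. 3.11a excerpt. This is the initial design, before the manual mutations, stem mismatches and 5’ overhangs described above. In NUPACK, a starred domain denotes the complementary domain, which is the reverse complement read 5’ to 3’. The sketch below assembles strands A0 and B1 from their domains and checks that the 3’ hairpin of A0 is exactly complementary to the 5’ hairpin of B1. The anticodon domains are kept as N placeholders. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA sequence (N stays N)."""
    return seq.translate(COMPLEMENT)[::-1]


# Domain sequences from the Fig. 3.11a excerpt (initial design only)
DOMAINS = {
    "ac0": "N" * 15, "ac1": "N" * 15,
    "A_L": "CGCCTAT", "A_hp": "CGCTTAATTCCCG",
    "B_L": "CTTTTCC", "B_hp": "CGATGACCGTTCG",
    "C_L": "CAAGCAC", "C_hp": "GCGCACACTGTCG",
}


def domain_seq(name):
    """Sequence of a domain; a trailing '*' selects its reverse complement."""
    if name.endswith("*"):
        return revcomp(DOMAINS[name[:-1]])
    return DOMAINS[name]


def assemble(layout):
    """Concatenate the domains of a whitespace-separated layout, 5' to 3'."""
    return "".join(domain_seq(d) for d in layout.split())


A0 = "A_hp A_L A_hp* ac0 B_hp B_L* B_hp*"
B1 = "B_hp B_L B_hp* ac1 C_hp C_L* C_hp*"

a0_tail = assemble("B_hp B_L* B_hp*")   # 3' hairpin of A0
b1_head = assemble("B_hp B_L B_hp*")    # 5' hairpin of B1

print(len(assemble(A0)), len(assemble(B1)))
print("backbone duplex complementary:", a0_tail == revcomp(b1_head))
```

Because each hairpin is built as stem, loop and complementary stem, opening both hairpins exposes two fully complementary 33-nt stretches. This is the backbone duplex specified as D33+ in the dimer structure.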