The sequence-structure-function relationship in RNA circuits

RNA circuits are based on a certain number of basic RNA "parts". With regard to a computational design strategy, these parts can be broadly separated into two groups: those whose function can be predicted from their secondary structure (i.e., which function by base pairing with themselves or their target), and those whose correct function requires tertiary structure folding. The first group is particularly interesting for the automatic design of RNA circuits, as their secondary structure can be predicted from their sequence. If design rules can be devised such that their function can be predicted from their structure, then we should be able to predict their function from their sequence when provided with accurate structural predictions, and therefore design new ones (Kushwaha et al., 2016). This approach has notably been used for the de novo design of riboregulators (Rodrigo et al., 2012; Green et al., 2014). The second group of RNA parts, which depend on specific tertiary structure, are harder to use, as tertiary structure cannot be predicted accurately based on our current knowledge. They nonetheless often contain secondary structure, and some attempts can be made at incorporating them within computational design pipelines, by ensuring that their predicted secondary structure is not disrupted (Shen et al., 2015), or by incorporating additional constraints into the folding calculation (Borujeni et al., 2015).

1.1.1 RNA design through abstraction of secondary structure

The correct function of riboregulators and other antisense RNAs can often be predicted from secondary structure calculations of their target messenger RNA (mRNA), folded either alone or in the presence of a small RNA (sRNA) regulator. This approach is based on the assumption that ribosome binding sites (RBS) are functional if unstructured, but inactive when occluded by secondary structure formation. Other constraints can also be incorporated into riboregulators, such as structural elements that enhance stability, or chaperone binding sites. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) systems are composed of a protein (Cas9) and guide RNAs (gRNAs), and the latter also fall into this category. The gRNAs contain a region that binds to DNA through base pairing, and a stem-loop motif that specifically binds the Cas9 protein. This system is described in more detail in sections 1.2.1 and 4.1.3.
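The RBS accessibility assumption described above can be illustrated with a minimal sketch. It assumes that a secondary structure has already been predicted by a folding program and is available in dot-bracket notation. The function names, the RBS coordinates, the 25% threshold and the two toy structures are all hypothetical and chosen only for illustration.

```python
def paired_positions(structure):
    """Return the set of 0-based indices that are base paired in a dot-bracket string."""
    stack, paired = [], set()
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: unmatched ')'")
            paired.update((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return paired


def rbs_is_accessible(structure, rbs_start, rbs_end, max_paired_fraction=0.25):
    """True if at most max_paired_fraction of positions [rbs_start, rbs_end) are paired.

    The threshold is an illustrative assumption, not a published design rule.
    """
    if not 0 <= rbs_start < rbs_end <= len(structure):
        raise ValueError("RBS coordinates fall outside the structure")
    paired = paired_positions(structure)
    n_paired = sum(1 for i in range(rbs_start, rbs_end) if i in paired)
    return n_paired / (rbs_end - rbs_start) <= max_paired_fraction


# Toy structures (hypothetical): the RBS is assumed to span positions 2 to 7.
mrna_alone = "((((((((....))))))))......"      # RBS buried in a hairpin
mrna_with_srna = "..........((((....))))...."  # RBS released
```

With these toy inputs, `rbs_is_accessible(mrna_alone, 2, 8)` is false and `rbs_is_accessible(mrna_with_srna, 2, 8)` is true, which corresponds to the off and on states of a positive riboregulator under this simplified model.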

Riboregulator and antisense RNA engineering

Riboregulators are a class of RNA-based regulators of gene expression. The term is a catchall used to describe small RNAs that interact with an RNA target through base pairing, whether they are found in nature or engineered. The first bacterial riboregulator to be discovered in Escherichia coli (E. coli) was an antisense RNA (asRNA), MicF, that directly pairs with the ribosome binding site of its target, the OmpF mRNA (Andersen et al., 1987). MicF hinders access to the ribosome binding site and inhibits translation, as well as encouraging degradation of the mRNA (Delihas and Forst, 2001). Positive antisense riboregulation was first discovered in S. aureus by Morfeldt et al. (1995). Expressed alone, the hla gene has a self-repressing 5’ untranslated region (UTR) that blocks access to the ribosome binding site. The authors showed that RNAIII could interact with the 5’ UTR of the hla mRNA, unfolding it and exposing the RBS. RNAIII also codes for the hld gene, so it is an mRNA that also acts as a riboregulator.

Both positive and negative riboregulators can be engineered. The first artificial antisense RNAs were based on the MicF RNA, redirected to pair with the lpp, ompC and ompA mRNAs (Coleman et al., 1984). The synthetic riboregulators were targeted to the RBS of the mRNAs, allowing downregulation of these genes by manually changing the sRNA binding site. This approach has been shown to work in various bacterial species (Desai and Papoutsakis, 1999; Darsonval et al., 2015) and has been used to regulate natural metabolic pathways to optimise the production of chemicals of interest (Yoo et al., 2013; Na et al., 2013; Yang et al., 2015). Positive riboregulators have also been engineered (Isaacs et al., 2004; Rodrigo et al., 2012; Green et al., 2014). In the case of positive regulators, the cis-repressing 5’ UTR of the mRNA also needs to be designed. These circuits were at first derived from natural riboregulator circuits (Isaacs et al., 2004). Following structural design rules from natural regulators, Rodrigo et al. (2012) created completely artificial regulators, showing that computationally encoded design rules could be used to automatically design biological parts. These first design rules focused on a 5’ UTR that folds to occlude the RBS in the off state. Green et al. (2014) devised a new set of design rules, focusing on blocking the start codon and its surrounding region rather than the ribosome binding site. This improved the dynamic range of such circuits.
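Retargeting a negative riboregulator by changing its binding site, as described above, amounts at its simplest to taking the reverse complement of the targeted mRNA region. The sketch below shows only that step. The function names, the toy mRNA and the RBS coordinates are hypothetical, and a real design would also need to account for sRNA stability, chaperone binding sites and off-target pairing.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence (Watson-Crick pairs only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna.upper()))


def antisense_for_region(mrna, start, end):
    """Return an RNA perfectly complementary to mrna[start:end] (0-based, end exclusive)."""
    if not 0 <= start < end <= len(mrna):
        raise ValueError("target region falls outside the mRNA")
    return reverse_complement(mrna[start:end])


# Toy mRNA (hypothetical): an AGGAGG RBS at positions 7 to 12, followed by an AUG.
toy_mrna = "GGGAUUAAGGAGGUAAACAUGGCU"
binding_site = antisense_for_region(toy_mrna, 7, 13)
```

Here `binding_site` is the sequence that would pair with the RBS alone. In practice the targeted window is usually extended to flank the RBS so that pairing is stable enough to block ribosome access.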

CRISPR gRNA engineering

CRISPR is a type of adaptive immune response system in prokaryotes (Barrangou et al., 2007). In type II CRISPR systems, the Cas9 protein acts as an RNA-guided DNA endonuclease (Jinek et al., 2012), where 20 to 25 bases of RNA direct the nuclease to the target site on the DNA. The S. pyogenes CRISPR system has the advantage of using a single Cas9 protein, which uses two small RNAs, the crRNA and the tracrRNA, to target and cleave double-stranded DNA (dsDNA). The crRNA contains 20 to 25 nt of complementarity to the DNA target, and requires the tracrRNA to mature and bind to the Cas9 protein. The complex then cleaves both strands of the dsDNA target. A single synthetic guide RNA can also be used, by fusing the crRNA and the tracrRNA into one short synthetic guide RNA (sgRNA, or just gRNA). The CRISPR gRNA is formed of three domains: the 20 to 25 nt targeting region, the Cas9 handle which binds the Cas9 protein, and the S. pyogenes terminator at the 3’ end of the RNA. In the crystal structure, the targeting region is single stranded, with the "seed" region maintained in A-form conformation by the Cas9 protein, ready for pairing with its target DNA (Jinek et al., 2014). The Cas9 handle is formed of two stems. The first is where the crRNA and the tracrRNA are paired, and a single sgRNA can be made by linking them with a small loop. A second, shorter stem loop is present further along the tracrRNA. The terminator sticks out of the Cas9 protein, with some contact with the protein in the crystal structure. The secondary structure of CRISPR gRNAs has been shown to be important for their proper function, and structured targeting regions can inhibit them (Thyme et al., 2016). CRISPR gRNAs can be incorporated within RNA circuits by inhibiting them with antisense RNAs (Lee et al., 2016). They can also be combined with riboswitches to create small molecule responsive gRNAs (Liu et al., 2016). Additional elements such as MS2 and PP7 binding regions can also be added onto gRNAs. Zalatan et al. (2015) showed that protein effectors fused to MS2 or PP7 could be targeted to specific DNA sequences this way.
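The three-domain organisation of the gRNA described above can be represented as a small data structure, which is convenient when only the targeting region is meant to change between designs. This is a sketch under stated assumptions: the class name, the method names and the example sequences are hypothetical, and the handle and terminator are placeholders written as runs of "N" rather than the real S. pyogenes sequences.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GuideRNA:
    """A gRNA as three concatenated domains, listed 5' to 3'."""
    targeting: str   # 20 to 25 nt, complementary to the DNA target
    handle: str      # Cas9-binding stems (placeholder in this sketch)
    terminator: str  # 3' terminator (placeholder in this sketch)

    def __post_init__(self):
        # Length range taken from the text; other checks are left out.
        if not 20 <= len(self.targeting) <= 25:
            raise ValueError("targeting region should be 20 to 25 nt long")

    @property
    def sequence(self):
        """Full gRNA sequence, 5' to 3'."""
        return self.targeting + self.handle + self.terminator

    def domain_spans(self):
        """Map each domain name to its (start, end) span, 0-based, end exclusive."""
        spans, start = {}, 0
        for name in ("targeting", "handle", "terminator"):
            end = start + len(getattr(self, name))
            spans[name] = (start, end)
            start = end
        return spans


# Placeholder domains (hypothetical lengths, not real sequences).
guide = GuideRNA(targeting="ACGUACGUACGUACGUACGU", handle="N" * 10, terminator="N" * 8)
```

The spans returned by `domain_spans()` could then be used to check, on a predicted secondary structure, that the targeting region stays unpaired while the handle stems remain intact.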

1.1.2 RNA parts which require tertiary structure interaction

RNA aptamers and riboswitches

Aptamers are short nucleic acid sequences that can specifically bind a target molecule, through interactions involving both the nucleotides and the ribose-phosphate backbone. Natural RNA aptamers that bind small molecules have been identified as part of riboswitches (Nahvi et al., 2002; Serganov and Nudler, 2013), where they serve as RNA-based small molecule sensors. Riboswitches are usually part of RNA 5’ UTRs, and control mRNA translation through a conformational change upon ligand binding, which modifies the accessibility of the RBS. Interestingly, the creation of artificial aptamers by Systematic Evolution of Ligands by Exponential Enrichment (SELEX; Ellington and Szostak, 1990; Tuerk and Gold, 1990) predates the discovery of natural ones. SELEX starts from a random library of DNA or RNA sequences, and selects for new aptamers through rounds of ligand binding in a column, followed by error-prone PCR amplification of the best binders. This technique has allowed the development of hundreds of artificial aptamers (Cruz-Toledo et al., 2012) that target small molecules, proteins or whole cells, leading to the idea that these could be used to engineer artificial riboswitches. However, the steps for transforming an aptamer into an artificial regulator have proven challenging to implement (Berens and Suess, 2015), although recent developments have shown promising results (Borujeni et al., 2015; Liu et al., 2016).
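The select-and-amplify loop of SELEX described above can be caricatured in a few lines of code. This is a toy simulation, not a model of a real experiment: the "affinity" of a sequence is replaced by a hypothetical score (the best match to an arbitrary motif), selection simply keeps the top-scoring fraction of the pool, and all names and parameter values are assumptions made for illustration.

```python
import random

ALPHABET = "ACGU"


def binding_score(seq, motif):
    """Hypothetical stand-in for affinity: best match count to motif over all windows."""
    best = 0
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        best = max(best, sum(1 for a, b in zip(window, motif) if a == b))
    return best


def mutate(seq, rate, rng):
    """Copy seq, redrawing each base at random with probability rate (error-prone copy)."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else base for base in seq)


def selex_round(pool, motif, keep_fraction, mutation_rate, rng):
    """Keep the best binders, then refill the pool with mutated copies of them."""
    ranked = sorted(pool, key=lambda s: binding_score(s, motif), reverse=True)
    survivors = ranked[:max(1, int(len(pool) * keep_fraction))]
    new_pool = list(survivors)  # exact copies of the survivors are retained
    while len(new_pool) < len(pool):
        new_pool.append(mutate(rng.choice(survivors), mutation_rate, rng))
    return new_pool


def run_selex(pool_size=200, length=30, motif="GGAUCC", rounds=5, seed=0):
    """Run several rounds from a random library; return the final pool and best scores."""
    rng = random.Random(seed)
    pool = ["".join(rng.choice(ALPHABET) for _ in range(length)) for _ in range(pool_size)]
    best_per_round = [max(binding_score(s, motif) for s in pool)]
    for _ in range(rounds):
        pool = selex_round(pool, motif, keep_fraction=0.1, mutation_rate=0.05, rng=rng)
        best_per_round.append(max(binding_score(s, motif) for s in pool))
    return pool, best_per_round
```

Because exact copies of the survivors are carried over, the best score in this toy model cannot decrease from one round to the next. A real selection is noisier, and the column retains binders with some probability rather than by strict ranking.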

Ribozymes

Catalytically active RNAs were first discovered in the 1980s, revealing their role in complex cellular processes, beyond that of mere carriers of information (Cech et al., 1981; Guerrier-Takada et al., 1983). Many ribozymes cleave RNAs in trans, such as RNase P which processes transfer RNAs (tRNAs), or self-cleave, such as the Hammerhead or Hepatitis D virus (HDV) ribozymes (Serganov and Patel, 2007). Group I introns are ribozymes that autocatalytically remove themselves and ligate exons together (Cech et al., 1981; Banerjee et al., 1993). They all require tertiary interactions between the different elements of the ribozyme to be active. However, secondary structure still plays a major role in their folding, and double mutants that preserve secondary structure can often remain active even when single mutants are not (Kobori and Yokobayashi, 2016). Ribozymes can be combined with aptamers to produce small molecule responsive ribozymes, also called aptazymes (Soukup and Breaker, 1999). They are produced by including an aptamer in one of the ribozyme loops, in such a way that the ribozyme is destabilised and rendered inactive. Upon small molecule binding, the

In document Engineering of RNA sensors and actuators in living cells (Page 33-38)