In functional RNA, it is often the case that important sequence and higher-order structural features contribute to overall transcript function. As Table 1.2 shows, this holds for most major classes of eukaryotic RNA. Specific examples of functional sequence and structure within these classes are provided below.
RNA name | Sequence functionality | Structure functionality
rRNA | Peptidyl transferase activity | Ribosome assembly
tRNA | Anticodon loop compatibility | ’cloverleaf’ structure for interaction with ribosome
mRNA | Conserved coding region | Binding motifs and accessible sites for proteins, small RNAs
snRNA | Localization of Sm protein pentamer around Sm site sequence | RNP assembly
snoRNA | Sense/antisense binding of target RNA next to site to modify | snoRNP assembly
miRNA | Sequence complementarity with target site | pre-miRNA processing, target site binding efficiency
piRNA | Sequence-specific targeting of virally derived RNA | Unknown
lncRNA | Polypurine tracts found in multiple transcripts [145] | Act in multiple cases as scaffold for formation of RNP
Table 1.2: Example uses of functional sequence and structure in known classes of Eukaryotic RNAs.
1.8.1 Examples of sequence content contributing to function across multiple classes of RNA
It has long been understood that RNAs contain particular conserved sequences that contribute heavily to their function. Examples of conserved sequence can be found in all of the RNA classes described in Table 1.1.
Numerous examples of functional sequence content exist across RNAs that are involved in the production of protein. Ribosomal RNAs, which are likely the most important and most conserved transcripts across both eukaryotes and prokaryotes, are highly reliant on conserved sequence for their proper processing, localization, and activity as a scaffold for ribosome assembly and function. For processing, pre-rRNAs contain a UCCCGA sequence element which, through binding of the protein nucleolin, promotes pre-rRNA maturation [146]. Although translation initiation is less dependent on conserved sequence content in eukaryotes than in prokaryotes, a link between sequence content in the 3’ end of the 18S rRNA and efficient translation termination in eukaryotes has been identified [147]. The most obvious role of sequence content in rRNA is to promote binding between rRNA and ribosomal proteins in ribosome assembly. Transfer RNA has a particularly obvious dependence on sequence content in the form of its anticodon loop, whose conservation determines whether a particular amino acid will be matched with the proper codon in mRNA coding regions. Another critical example of functional sequence content in tRNA is the CCA sequence found at the 3’ end of tRNA (either part of the primary transcript or added during processing), which acts as the site of linkage between the tRNA and the corresponding amino acid to be transported to the ribosome [16]. The widest diversity of important sequence motifs among RNAs involved in protein production can be found in mRNA. Here, sequence motifs that facilitate binding with proteins and various noncoding RNAs confer instructions for how an RNA is to be spliced, subcellularly localized, translated, and eventually degraded. Among the most obvious examples are the start and stop codons in the coding region (marking the translation start and stop sites in the mRNA, respectively) and the AAUAAA polyadenylation signal found in the 3’ end of mRNAs (which promotes binding of the polyadenylation machinery to the 3’ end of the RNA, adding a 3’ poly(A) sequence that confers stability) [16].
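As a minimal illustrative sketch (not taken from the cited work), the mRNA signals named above can be located by plain string scanning. The function names, the toy transcript, and the choice to treat the first AUG as the start codon are assumptions made for illustration; real start-site selection and polyadenylation depend on sequence context and protein factors that this sketch ignores.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}
POLYA_SIGNAL = "AAUAAA"


def find_orf(mrna):
    """Return (start, end) of the reading frame opened by the first AUG.

    start is the index of the A in AUG; end is the index just past the
    first in-frame stop codon. Returns None if either signal is missing.
    """
    start = mrna.find("AUG")
    if start == -1:
        return None
    # Step through the transcript one codon at a time, staying in frame.
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return start, i + 3
    return None


def find_polya_signals(mrna, search_from=0):
    """Return the start index of every AAUAAA at or after search_from."""
    hits = []
    pos = mrna.find(POLYA_SIGNAL, search_from)
    while pos != -1:
        hits.append(pos)
        pos = mrna.find(POLYA_SIGNAL, pos + 1)
    return hits


# Hypothetical toy transcript: 5' leader + short coding region + 3' end.
toy_mrna = "GGCACC" + "AUGGCUUGGUAA" + "CCGUUAAUAAAGCUU"
orf = find_orf(toy_mrna)
print(orf)                                       # (6, 18)
print(find_polya_signals(toy_mrna, orf[1]))      # [23]
```

Restricting the polyadenylation search to positions after the stop codon reflects the statement that the signal lies in the 3’ end of the transcript.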
Modifying RNAs such as snoRNA and snRNA use conserved sequence content that allows them to precisely access target regions of RNAs for chemical modification or splicing. Because snoRNAs form snoRNPs that chemically modify specific locations in target RNAs, their ability to bind strongly, in a sequence-dependent manner, to sites near the modification site is very important. snoRNAs are classified into two groups, C/D box and H/ACA box, on the basis of highly conserved sequence elements (RUGAUGA/CUGA and ANANNA/ACA, respectively). These motifs direct the binding of snoRNPs to their targets [65]. In snRNAs, the Sm site, a conserved sequence motif consisting of AUUUGUGG, is responsible for the recruitment of Sm proteins for the formation of the snRNP complex [148]. Additionally, eukaryotic splicing is heavily influenced by basepairing between snRNAs (as members of the spliceosome) and pre-mRNA [149].
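The box motifs above are written with degenerate nucleotide codes (R for a purine, N for any base). A rough sketch of how such motifs could be searched for in a sequence is given below; the helper names and the toy sequence are hypothetical, and only the symbols needed for the motifs quoted in the text are mapped.

```python
import re

# Subset of the IUPAC nucleotide codes, written for RNA (U instead of T).
IUPAC_RNA = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "[AG]",      # purine
    "Y": "[CU]",      # pyrimidine
    "N": "[ACGU]",    # any base
}

# Motifs exactly as quoted in the text above.
MOTIFS = {
    "C box": "RUGAUGA",
    "D box": "CUGA",
    "H box": "ANANNA",
    "ACA box": "ACA",
    "Sm site": "AUUUGUGG",
}


def find_motif(seq, motif):
    """Return start indices of all matches of a degenerate motif in seq.

    A zero-width lookahead is used so that overlapping matches are
    reported as well.
    """
    body = "".join(IUPAC_RNA[symbol] for symbol in motif)
    pattern = re.compile("(?=" + body + ")")
    return [m.start() for m in pattern.finditer(seq)]


# Hypothetical toy sequence containing one C box and one D box.
toy = "GGAUGAUGACCUUCUGAGG"
print(find_motif(toy, MOTIFS["C box"]))   # [2]
print(find_motif(toy, MOTIFS["D box"]))   # [13]
```

Short motifs such as ACA or CUGA occur frequently by chance, so a sequence match alone does not identify a snoRNA; the structural context discussed in the next subsection also matters.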
Regulatory RNAs, both small and large, rely on conserved sequence content primarily to preserve binding platforms with other biomolecules. Small RNAs such as miRNA and piRNA are heavily reliant on strong sequence complementarity with their targets for efficient binding. Each miRNA has a network of target transcripts that it recognizes through (often imperfect) antisense basepairing, such that alteration of miRNA sequence content (particularly in the seed region) has been shown to abolish target preference [77]. In a similar but more divergent manner, piRNAs display strong sequence complementarity to retrotransposons and foreign RNA, indicating that they serve to target transcripts of foreign origin [80]. While lncRNAs do not contain the same conserved sequence content for the purpose of target binding affinity, and our understanding of their full function in the genome is still limited, there are examples of conserved sequence motifs within these longer transcripts that are highly functional. In the human genome, 481 so-called UltraConserved Regions (UCRs) have been identified, each consisting of a span of over 200 nucleotides with perfect sequence identity between humans, mice and rats [150]. Because these regions are double stranded in the genome, there are 962 possible transcribed regions that could overlap the UCRs; of these, 890 were classified as transcriptionally active, and only around 41% of those clearly map to mature mRNA regions [151]. Several lncRNAs are found antisense to particular genes whose activity they regulate, and there are examples in which such regulation has been shown to involve direct binding of target transcripts by the lncRNA [96].
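The dependence of miRNA targeting on the seed region can be illustrated with a small sketch. It assumes the common convention that the seed spans nucleotides 2 to 8 of the miRNA and models only perfect Watson-Crick complementarity, which is a simplification of the (often imperfect) pairing described above. The sequences and function names are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_sites(mirna, target, seed_start=1, seed_end=8):
    """Return start indices in target that pair perfectly with the seed.

    The default slice mirna[1:8] corresponds to nucleotides 2-8
    (an assumed convention, not stated in the text above).
    """
    site = reverse_complement(mirna[seed_start:seed_end])
    hits = []
    pos = target.find(site)
    while pos != -1:
        hits.append(pos)
        pos = target.find(site, pos + 1)
    return hits


# Hypothetical toy miRNA and target fragment.
mirna = "UGAGGUAGUAGGUUGUAUAGUU"
target = "AAGCUACCUCAGG"
print(seed_sites(mirna, target))          # [3]

# A single substitution inside the seed removes the match.
mutant = "UGAGCUAGUAGGUUGUAUAGUU"
print(seed_sites(mutant, target))         # []
```

The second call mirrors the observation that altering the seed sequence can abolish recognition of a target, although real target prediction also weighs pairing outside the seed and site context.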
1.8.2 Examples of higher-order structure contributing to function across multiple classes of RNA
While transcript sequence content (specifically, the presence of conserved primary sequence motifs) is a very important determinant of an RNA’s function, the higher-order structure that arises from that sequence content is also key. As with functional sequence content, numerous examples exist across multiple functional classes of RNA in eukaryotes.
RNAs involved in the production of protein use structure both to bring together components of the translational machinery and to regulate the fate of protein-coding transcripts. Of all RNA structures, the “cloverleaf” structure ubiquitously formed by tRNAs is perhaps the most studied and best known. This structure facilitates both proper interaction with the ribosome and consistent presentation of the triplet anticodon sequence. It is important enough that cells have entire pathways (of which snoRNA is an important member) for inducing chemical modifications that are thought to further stabilize the structure [65]. As previously mentioned, such modifications are also frequent in rRNA. Like tRNAs, rRNAs are highly structured (in fact, they have the most conserved structures of any RNAs across species). This highly ordered structure provides a platform for the full assembly of the ribosome. As for mRNA, while it is typically thought of as unstructured, many structural elements with functional significance are now known to reside in mRNAs. Like other RNAs, mRNA is often targeted for binding by proteins or noncoding RNAs. The binding affinity of these biomolecules for a target mRNA is known to depend on the accessibility of the target site in the RNA’s folded structure [152, 153]. As detailed in Figure 1.7, multiple regulatory events that occur during an mRNA’s lifetime depend on these interactions. Some of the better-known structural motifs targeted by these proteins and regulatory RNAs are Selenocysteine Insertion Sequence elements, Internal Ribosome Entry Sites (which allow alternate translation initiation for certain mRNAs), and Iron Responsive Elements (a structural element that is examined further in chapters 2 and 3) [154–156].
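One simple way to make the notion of target site accessibility concrete is to score how much of a site is unpaired in a given secondary structure. The sketch below assumes the structure is supplied in dot-bracket notation (a dot for an unpaired base, matching parentheses for a base pair), which is a representation chosen here for illustration. The function name and toy structure are hypothetical, and a single fixed structure is a crude stand-in for the ensemble of folds an RNA can adopt.

```python
def site_accessibility(structure, start, end):
    """Return the fraction of unpaired positions in structure[start:end].

    structure is a dot-bracket string; start and end are 0-based,
    with end exclusive.
    """
    window = structure[start:end]
    if not window:
        raise ValueError("target site window is empty")
    return window.count(".") / len(window)


# Hypothetical toy fold: a 4 bp hairpin followed by a single-stranded tail.
toy_structure = "((((....))))......"

print(site_accessibility(toy_structure, 12, 18))  # 1.0, fully single stranded
print(site_accessibility(toy_structure, 0, 4))    # 0.0, buried in the stem
print(site_accessibility(toy_structure, 2, 6))    # 0.5, half stem, half loop
```

Under this toy measure, a site in the single-stranded tail would be considered more available for binding than one sequestered in the stem.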
Modifying RNAs typically utilize structure to form full RNP complexes in order to carry out their function. Both snoRNAs and snRNAs contain conserved structure that primarily serves this purpose. Both C/D box and H/ACA box snoRNAs contain conserved secondary structure motifs that contribute to their function. In C/D box snoRNAs, the C and D box sequence motifs described previously are brought into close proximity via basepairing between the 5’ and 3’ ends of the sequence, which is necessary for proper transcript localization in the nucleolus [157]. While the H/ACA box snoRNA has a more intricate conserved structure, its primary purpose seems to be optimizing the accessibility of the previously mentioned H and ACA sequence motifs in the secondary structure, as both are located in separate loop regions of the otherwise highly basepaired transcript [158]. This allows for efficient basepairing between the snoRNA and its target transcript so that target pseudouridylation can occur. The snRNAs also have several functional structural elements. There are several different snRNAs (with different ones making up the major and minor spliceosomes), and several of them (U1 and U2, for example) are known to have conserved stemloops that act as a scaffold for RNP assembly [159].
Figure 1.8: A range of events that mRNA is subject to that are dependent on binding affinity with protein and other RNAs. The binding affinity is highly influenced by RNA structure. Events labeled in the figure: splicing, nuclear export, subcellular localization, translation, targeting by miRNA (RISC), and RNA stability/degradation.
Regulatory RNAs have evolved structure for different purposes: smaller regulatory RNAs have conserved structure that ensures their correct processing, while larger regulatory RNAs have structure for this as well as other purposes (such as interacting with other biomolecules in the cell). Smaller regulatory RNAs such as miRNAs are often processed from a significantly longer precursor transcript. Before being loaded into RISC, the precursor transcript is processed via a protein known as DGCR8, which targets stemloop structures within the miRNA precursor transcript [160]. These stemloop regions are processed and loaded onto the RNAi machinery. Longer regulatory RNAs have been found to frequently utilize structure for a variety of purposes, including the formation of RNPs and the basepairing of single-stranded regions with other transcripts. In spite of often poor sequence conservation, lncRNAs have shown evidence of being enriched for structure conservation [161]. Several lncRNAs, such as MALAT1, contain regions of highly conserved structure; how much is known about the function of these structures varies from transcript to transcript [162].
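To make the idea of a stemloop within a longer precursor concrete, the following sketch locates simple hairpins in a secondary structure given in dot-bracket notation. The notation, the function name, the minimum stem length, and the toy structure are all assumptions for illustration; the sketch says nothing about how DGCR8 itself recognizes its substrates, and it only measures the uninterrupted run of stacked pairs closing each loop (bulges and internal loops end the stem).

```python
import re


def find_stemloops(structure, min_stem=3):
    """Locate simple hairpins in a dot-bracket string.

    Returns a list of (stem_start, stem_end, stem_length, loop_length)
    tuples, where stem_start and stem_end are the 0-based, inclusive
    positions of the outermost stacked base pair.
    """
    results = []
    # A hairpin loop is an opening bracket, a run of dots, then a closing bracket.
    for match in re.finditer(r"\((\.+)\)", structure):
        i, j = match.start(), match.end() - 1   # the pair closing the loop
        stem = 1
        # Extend outward while the flanking positions continue the stack.
        while (i - stem >= 0 and j + stem < len(structure)
               and structure[i - stem] == "(" and structure[j + stem] == ")"):
            stem += 1
        if stem >= min_stem:
            results.append((i - stem + 1, j + stem - 1, stem, len(match.group(1))))
    return results


# Hypothetical toy structure: one 4 bp hairpin and one isolated base pair.
toy_structure = "..((((....))))..(.)"
print(find_stemloops(toy_structure))   # [(2, 13, 4, 4)]
```

The isolated pair at the end of the toy structure is discarded because its stem is shorter than the assumed minimum of three base pairs.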