Abstract
Long non-coding RNAs (lncRNAs) are a class of gene transcripts that do not code for a
protein product, but instead interact with other molecules in the cell. Recent studies have identified thousands of
predicted lncRNAs in the human genome, and evidence increasingly suggests their importance in
development and disease, yet few have a known function. Of these, many lncRNAs affect
epigenetic states by regulating transcription, such as Xist, which silences genes in X chromosome
inactivation. We developed a novel assay, called TETRIS (Transposable Element to Test RNA’s
effect on transcription in cis), to efficiently test the effect of lncRNAs on transcription of
adjacent genes. The TETRIS assay was designed in tandem with a novel computational method
that predicts lncRNA function based on sequence, and allows for streamlined testing of these
predictions. TETRIS uses a transposable vector system to insert a gene cassette into the genome
of cultured mammalian cells. Transfected cells have drug-inducible lncRNA expression and
constitutive downstream luciferase expression, which quantitatively reports changes in local
transcription level. TETRIS results yielded significant evidence for the predicted function in
about 40-50% of tested lncRNAs. The immediate goal of TETRIS is to improve our
understanding of the relationship between lncRNA sequence and function. Ultimately, such
insights could improve our understanding of how lncRNAs function in disease, and reveal new
avenues for therapeutic intervention.
Introduction
Long non-coding RNAs (lncRNAs) are a class of gene transcripts that do not code for a
protein product, but instead interact with other molecules in the cell. Recent studies have identified over 50,000
genes (Iyer et al. 2015). In just a few decades since their discovery, lncRNAs have been
increasingly identified as key regulators of gene expression at all stages, through altering DNA
methylation, histone modifications, and even protein activity (Rinn and Chang 2012). This
impressive range of functional capabilities seems to be achieved most commonly by association
with proteins to modulate or direct their activity.
LncRNAs are a relatively new class of RNA that do not follow the triplet codon structure
of mRNAs, and thus have increased flexibility of sequence as well as 3D structure (Rinn and
Chang 2012). Consequently, standard methods for analyzing sequence have been ineffective in
classifying lncRNAs by function or even homology (Menzel et al. 2009). Only a handful of
lncRNAs have a known function, but misexpression of lncRNAs is increasingly being linked
with disease, such as cancer (Derrien et al. 2012, Lin and Rana 2013). Many lncRNAs are
becoming useful as prognostic markers for various cancers, although it is usually not known if or
how their misregulation is promoting cancer (Rinn and Chang 2012). Without the ability to
quickly characterize lncRNAs by sequence, novel experiments must be designed for each
lncRNA to try to determine its functional role. This missing link between sequence and
function thus represents a major roadblock to progress in this field, which could have a
significant impact on our understanding of disease.
The Calabrese lab is attempting to find this missing link and streamline the process for
categorizing lncRNAs. Two main projects encompass these goals: the development of a
computational method for categorizing lncRNAs by function based on sequence content, and a
consistent experimental approach for testing these predicted functions. We hypothesize that
lncRNA function is not determined by its strict linear sequence so much as by regions of
Figure 1. Diagram of the TETRIS assay. The TETRIS vector and rtTA are cotransfected into mammalian cells, and
the transposase is transiently transfected. PiggyBac transposase inserts the cassette via a “cut and paste” mechanism at TTAA sequences. Drug-inducible control of the lncRNA is mediated by the TRE element. In the default state, rtTA cannot bind TRE to promote transcription of the insert. When doxycycline is introduced to the system, it binds the rtTA protein, allowing it to bind the TRE element and promote transcription. The luciferase reporter gene is transcribed in the opposite direction from the lncRNA, preventing read-through effects on luciferase expression. LncRNA effect on transcription is measured by comparing luciferase activity with and without doxycycline.
My research aims at understanding how lncRNAs can mediate transcriptional control in
cis by repressing or activating the expression of nearby genes. My work has focused on the
development of an experimental vector dubbed TETRIS: Transposable Element to Test RNA’s
effect on transcription in cis. This vector uses the PiggyBac transposon system to randomly
insert a gene cassette into the genome at TTAA sites (“PiggyBac…”). This cassette includes a
drug-inducible promoter that controls expression of the inserted lncRNA, a downstream
cells, we can compare the average luciferase expression with and without lncRNA expression,
and thus make inferences about how the lncRNA is affecting local transcription (Figure 1).
Figure 2. Dendrogram of mouse lncRNAs grouped according to 6-mer profile. Each row represents a lncRNA and each column represents a 6-mer, with a color scale from blue to yellow to indicate relative abundance. The approximate location of lncRNAs of interest is shown on the left hand side, in colors to represent predicted/known function (red = repressor, green = activator, black = neither). Bolded lncRNAs are known activators/repressors. Data for dendrogram courtesy of Mauro Calabrese.
The Calabrese lab has been developing a novel computational method for grouping
lncRNAs based on sequence. This involves “chopping” the lncRNA sequence in silico into short
4-8 nucleotide fragments, called k-mers, where k is the fragment length (e.g., 6-mers).
yielding its k-mer profile. These can then be grouped according to similarity using a dendrogram
plot (Figure 2). Since we would predict that similarly grouped lncRNAs with similar k-mer
profiles could function in the same way, we can predict the function of unknown lncRNAs by
comparison to the known lncRNAs nearby. Xist is a well-studied lncRNA known for its role in
gene silencing during X chromosome inactivation. Xist silences an entire chromosome in cis, so
we would predict that lncRNAs grouped with Xist would also be transcriptional repressors
(Gendrel and Heard 2014). Using this computational method, TETRIS can be used to test
whether a lncRNA functions as predicted based on its k-mer profile. If this method can
successfully predict lncRNA function based on k-mer profiling, this would support our
hypothesis that lncRNA function is determined more by accumulation of motifs than by
linear sequence.
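The k-mer profiling idea can be illustrated with a minimal Python sketch. It counts overlapping 6-mers in each sequence, converts the counts to frequencies, and compares two profiles with a Pearson correlation. The use of overlapping windows, the length normalization, the Pearson similarity measure, the function names, and the toy sequences are all assumptions made for illustration; they are not taken from the lab's method or data.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_profile(seq, k=6):
    """Return the frequency of every possible DNA k-mer in seq (overlapping windows)."""
    seq = seq.upper().replace("U", "T")
    windows = max(len(seq) - k + 1, 1)
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # A fixed ordering over all 4**k k-mers keeps profiles comparable.
    return [counts["".join(p)] / windows for p in product("ACGT", repeat=k)]


def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


# Toy sequences (hypothetical, not real lncRNAs).
known_repressor = "GCGCGCATATGCGCGCATATGCGCGC" * 4
candidate_a = "ATATGCGCGCATATGCGCGCATATGC" * 4
candidate_b = "ACGTTGCAACGTTGCAACGTTGCAAC" * 4

sim_a = pearson(kmer_profile(known_repressor), kmer_profile(candidate_a))
sim_b = pearson(kmer_profile(known_repressor), kmer_profile(candidate_b))
print(f"candidate_a similarity to known repressor: {sim_a:.2f}")
print(f"candidate_b similarity to known repressor: {sim_b:.2f}")
```

In this toy case, candidate_a shares its repeated motifs with the known repressor and scores higher than candidate_b, even though neither aligns with it end to end. A dendrogram such as Figure 2 would be built by clustering many such profiles at once.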
Methods
Isolating lncRNA sequence
Mouse cDNA was prepared from 2 µg total RNA from various tissues (testes, kidney)
using the High Capacity Reverse Transcription Kit (“High Capacity…”). Mouse gDNA was
extracted from E14 cells using a standard protocol (Strauss 2001). Spliced lncRNAs were
amplified by PCR using cDNA, and unspliced lncRNAs were amplified using gDNA. PCR
primers were designed based on gene annotations on UCSC Genome Browser (Kent et al. 2002)
as well as RNASeq data (Mauro Calabrese, personal communication). Primers for Gibson
Assembly included sequence homology to the vector at site of insertion (Table 1). PCRs were
Table 1. Gibson Assembly Primers with Vector Homology and PCR Target
Name        Forward Primer 5’ → 3’                                        Reverse Primer 5’ → 3’
L19RIK      GTTTGGTCTAGAGCTAGCGAATTCGAATTTGTTCTGAGCCGGAGCGAGAGGCGCTTCAGA  CTAGCGATATCGCGGCCGCGGATCCGATTTACAGTTACTTTTTAATTCATTTTATAAATG
E13RIK      GTTTGGTCTAGAGCTAGCGAATTCGAATTTAGAGAAGAACTGGACCGGCCGCCATGTTGG  CTAGCGATATCGCGGCCGCGGATCCGATTTACGCAGTATTTTAGCACAGTTCATTTATAT
ROSA26      GTTTGGTCTAGAGCTAGCGAATTCGAATTTCTAGGTAGGGGATCGGGACTCTGGCGGGAG  CTAGCGATATCGCGGCCGCGGATCCGATTTGCCAGGCCTTATCTGGAATGGGACATGTGT
HotairM1    GTTTGGTCTAGAGCTAGCGAATTCGAATTTCGGCCGCTCCCGGAGCTGACTTGGAGCACT  CTAGCGATATCGCGGCCGCGGATCCGATTTAACTCTTCCTTTCCCTCCCCCACACGTTCC
HoxB5os     GTTTGGTCTAGAGCTAGCGAATTCGAATTTATGTCATAGCGACTTTTGGGGTAGTTTGCT  CTAGCGATATCGCGGCCGCGGATCCGATTTGCCTTGCTTTTATTTCAATTTATTTTTACT
ELDR        GTTTGGTCTAGAGCTAGCGAATTCGAATTTAGCGCCAAGGCTCTCTCTCCCCAGGGGACT  CTAGCGATATCGCGGCCGCGGATCCGATTTTCATATGGACATCGCAGGCACAATCCTCAG
Tbc1d22-AS  GTTTGGTCTAGAGCTAGCGAATTCGAATTTGTGACTCGGCGATTCCGGAAGTCCCGCCTT  CTAGCGATATCGCGGCCGCGGATCCGATTTCAATATATTCTACAGAACCCAATATATATA
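Each primer in Table 1 appears to consist of a constant vector-homology arm followed by a lncRNA-specific PCR target. The following minimal Python sketch recovers the shared arm as the longest common prefix of two forward primers from the table and splits each primer accordingly. Treating the common prefix as the complete homology arm is an inference from the table rather than something stated in the text, and the function name is hypothetical.

```python
import os

# Two forward primers copied from Table 1 (line-wrap spaces removed).
forward_primers = {
    "L19RIK": "GTTTGGTCTAGAGCTAGCGAATTCGAATTTGTTCTGAGCCGGAGCGAGAGGCGCTTCAGA",
    "E13RIK": "GTTTGGTCTAGAGCTAGCGAATTCGAATTTAGAGAAGAACTGGACCGGCCGCCATGTTGG",
}


def split_primers(primers):
    """Split each primer into (shared homology arm, gene-specific target).

    Assumption: the longest common prefix is the vector-homology arm.
    """
    # os.path.commonprefix compares strings character by character.
    arm = os.path.commonprefix(list(primers.values()))
    return arm, {name: seq[len(arm):] for name, seq in primers.items()}


homology_arm, targets = split_primers(forward_primers)
print("Shared arm:", homology_arm, f"({len(homology_arm)} nt)")
for name, target in targets.items():
    print(f"{name} target: {target} ({len(target)} nt)")
```

The same split can be applied to the reverse primers, which share their own constant prefix.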
Vector assembly and transformation
The lncRNA PCR fragments were inserted into SwaI-digested, dephosphorylated TETRIS
vector via Gibson Assembly (Gibson et al. 2009), which relies on homology arms flanking the
insert fragment that overlap with the vector at the insert site. After assembly, vectors were
transformed into JM109 competent cells (“Pro 5-alpha, JM109…”) and plated overnight on
antibiotic-selective agar (100 µg/mL ampicillin). Colonies were picked and cultured overnight
in Luria Broth for DNA prep (“GenCatch Plasmid…”). Inserts were confirmed via diagnostic
digestion and sequencing.
Cell culture and transfection
Neuro2A cells were incubated on 10 cm plates with Alpha MEM growth medium
containing 10% FBS and 1X penicillin & streptomycin (“Neuro-2a…”). E14 cells were
maintained in feeder free culture on 10 cm plates coated with 0.1% gelatin in DMEM high
glucose plus sodium pyruvate, 0.1 mM non-essential amino acids, 0.1 mM β-mercaptoethanol,
1X penicillin & streptomycin, 2 mM L-glutamine, 1,000 U/mL LIF, and 15% ESQ FBS
TETRIS vectors were transfected into cells using Thermo Fisher Scientific
Lipofectamine 3000 reagent (“Lipofectamine 3000…”). Into 8 × 10⁵ cells, 0.5 µg of the TETRIS
and rtTA vectors and 1 µg of the PiggyBac transposase vector were transfected. After 48 hr, the
cells were switched to antibiotic-selective medium (2 µg/mL puromycin and 200 µg/mL G418).
Luciferase Assay
Stable cell lines were split to 1 × 10⁵ cells per well in a 24-well plate, plated in triplicate
for treatments without doxycycline (0 hr induction) and with doxycycline (24 and 48 hr inductions). After the
allotted incubation time, cells were washed with PBS, lysed (“Passive Lysis Buffer”), and spun
down for 5 min at 13,200 rpm. Then 10 µl of lysate supernatant was added to 25 µl Bright-Glo reagent
(“Bright-Glo Luciferase…”) in an opaque white 96-well plate, and luminescence was measured
using the PHERAstar FS microplate reader. Cell lysates were also measured for total protein
concentration using the Bradford method (Bradford 1976). Raw luminescence signal was divided
by protein concentration to correct for variation in cell number per well.
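The normalization can be sketched in a few lines of Python: each well's raw luminescence is divided by its protein concentration, and the result is then expressed relative to the mean of the uninduced wells on a 100-point scale, as described in the Figure 4 caption. The function names and all readings below are hypothetical and are not data from this study.

```python
from statistics import mean


def normalize(luminescence, protein):
    """Divide each well's raw luminescence by its protein concentration."""
    return [lum / prot for lum, prot in zip(luminescence, protein)]


def relative_to_uninduced(normalized, uninduced_normalized):
    """Scale values so that the mean of the uninduced wells equals 100."""
    baseline = mean(uninduced_normalized)
    return [100.0 * value / baseline for value in normalized]


# Hypothetical triplicate readings (arbitrary units).
uninduced_lum, uninduced_prot = [9000.0, 10000.0, 11000.0], [1.0, 1.0, 1.0]
induced_lum, induced_prot = [5200.0, 4800.0, 5500.0], [1.04, 0.96, 1.10]

uninduced_norm = normalize(uninduced_lum, uninduced_prot)
induced_norm = normalize(induced_lum, induced_prot)

uninduced_rel = relative_to_uninduced(uninduced_norm, uninduced_norm)
induced_rel = relative_to_uninduced(induced_norm, uninduced_norm)
print("Uninduced (relative):", [round(v, 1) for v in uninduced_rel])
print("Induced (relative):  ", [round(v, 1) for v in induced_rel])
```

Dividing by protein concentration before scaling means that wells with more cells do not appear to have higher reporter activity.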
Statistics
P-values for the normalized luminescence signals were determined using a two-tailed
Student’s t-test.
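As a rough illustration of this test, the Python sketch below computes an unpaired, equal-variance Student's t statistic and a two-tailed p-value using only the standard library. The source does not say whether the test was paired or unpaired, so the unpaired pooled-variance form is an assumption, as are the function names and the example readings. The p-value is obtained by numerically integrating the t density with Simpson's rule, a stand-in for a statistics package.

```python
from math import gamma, pi, sqrt
from statistics import mean, variance


def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    coeff = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_t_test(a, b, steps=2000):
    """Unpaired, equal-variance Student's t-test; returns (t, df, p).

    Assumes the pooled variance is non-zero and steps is even.
    """
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    # Simpson's rule for the area under the density between 0 and |t|.
    upper = abs(t)
    h = upper / steps
    area = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    # The density is symmetric, so both tails together hold 1 - 2 * area.
    p = max(0.0, 1 - 2 * area)
    return t, df, p


# Hypothetical normalized luminescence values, not data from this study.
uninduced = [100.0, 104.0, 96.0]
induced_48hr = [60.0, 66.0, 54.0]
t_stat, dof, p_value = two_tailed_t_test(uninduced, induced_48hr)
print(f"t = {t_stat:.2f}, df = {dof}, p = {p_value:.4f}")
```

With triplicate wells in each condition, the test has only four degrees of freedom, so modest effects can easily fail to reach significance.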
Results
Initial testing of the TETRIS system was performed with lncRNAs that have a known
function in order to confirm that the assay could detect transcriptional regulation of luciferase.
The Enhanced Green Fluorescent Protein (EGFP) gene was inserted into TETRIS in order to
visualize the drug-inducible switch and as a negative control, since EGFP does not regulate
to be important for its repressive function (Gendrel and Heard 2014). Two fragments of Xist, 5.5
kb and 1.8 kb long starting at the 5’ end, were tested along with EGFP in a luciferase assay
(Figure 3). Both fragments of Xist repressed as expected, although EGFP induction caused a
slight repression as well.
Figure 3. Preliminary testing of TETRIS assay, data courtesy of Kaoru Inoue. EGFP is a negative control; fragments
of the transcriptional repressor Xist are a positive control. Assay was performed with E14 cells. Asterisks indicate p-value from a Student’s t-test with the 0 hr induction. Error bars represent ± 1 STD.
After preliminary testing, the TETRIS assay was used to assess the function of seven
lncRNAs predicted to activate or repress. Luciferase assays were first performed in mouse
Neuro2A cells, which are easy to transfect and maintain in culture (Figure 4). Three of the seven
lncRNAs matched the predicted function, while the others were either inconclusive or disagreed
Figure 4. Results of the luciferase assay for seven mouse lncRNAs, tested in Neuro2As. Cells were cultured in triplicate at three
conditions: uninduced, induced 24 hrs, and induced 48 hrs. Cells were then lysed and subjected to the luciferase assay. Raw luminescence signal was normalized by protein concentration and set relative to the uninduced cells on a 100-point scale. Predicted functions of the lncRNAs are shown in the table, left. Asterisks indicate a Student’s t-test with the 0 hr control. Error bars represent ± 1 STD.
Prediction testing was then repeated in the mouse embryonic stem cell line E14 (Figure
5). Four of the predicted activators matched, as compared to one in Neuro2As. Neither of the two
predicted repressors yielded strong results in E14s, however. Raw luminescence signal, and
therefore luciferase expression, was on average 10X higher in E14 cells compared to Neuro2As,
Figure 5. Prediction testing repeated in E14 cells, data courtesy of David Lee. Predicted functions are the same as described in Figure 4. Asterisks indicate a Student’s t-test with the 0 hr control. Error bars represent ± 1 STD.
To further show proof of concept of the k-mer profiling method, we designed synthetic
RNAs with a targeted function and then tested the sequences in TETRIS. These synthetic RNAs
were designed to match the Repeat-A Region of Xist, which is located in the 1.8 kb 5’ end of Xist
and known to be sufficient for transcriptional repression (Minks et al. 2013). The sequences were
made by generating many random sequences in silico of the same length and base
composition (% GC), and then grouping them according to the k-mer computational
method. Repeat-A Random 1 (RAR1) and 2 (RAR2) were selected for their strong similarity to the
Repeat-A Region according to k-mer profile, and were therefore predicted repressors. RAR3 was
selected for having a dissimilar k-mer profile to the Repeat-A Region, and was predicted not to
repress transcription. These synthetic sequences were ordered from a DNA oligo synthesis
company and tested in TETRIS in Neuro2As (Figure 6) and E14s (Figure 7). RAR2 showed
significant repression in both cell lines, but RAR1 only repressed in E14s. RAR3 showed a
Figure 6. Synthetic RNA testing in Neuro2A cells. Asterisks indicate a Student’s t-test with the 0 hr control. Error bars represent ± 1 STD.
Figure 7. Synthetic RNA testing in E14 cells, data courtesy of KI. Cells were plated in duplicate instead of triplicate for this assay. Asterisks indicate a Student’s t-test with the 0 hr control. Error bars represent ± 1 STD.
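The design of the synthetic RNAs can be sketched as follows: generate many random sequences with the same length and G+C content as a reference, score each against the reference by k-mer similarity, and keep the most and least similar candidates. Everything specific in this Python sketch is an assumption for illustration: the toy reference (a stand-in, not the Repeat-A sequence), the choice of 3-mers, the cosine similarity score, the number of candidates, and the function names.

```python
import random
from collections import Counter
from math import sqrt


def random_matched_sequence(reference, rng):
    """Random DNA sequence with the same length and G+C count as reference."""
    n_gc = sum(base in "GC" for base in reference)
    bases = [rng.choice("GC") for _ in range(n_gc)]
    bases += [rng.choice("AT") for _ in range(len(reference) - n_gc)]
    rng.shuffle(bases)
    return "".join(bases)


def kmer_counts(seq, k):
    """Count overlapping k-mers in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(c1, c2):
    """Cosine similarity between two k-mer count tables."""
    dot = sum(c1[key] * c2[key] for key in c1)
    norm = sqrt(sum(v * v for v in c1.values())) * sqrt(sum(v * v for v in c2.values()))
    return dot / norm if norm else 0.0


reference = "GCGCGCATATGCGCGCATATGCGCGCATAT"  # hypothetical stand-in sequence
K = 3  # short k chosen only because the toy reference is short
rng = random.Random(0)  # fixed seed so the sketch is repeatable
candidates = [random_matched_sequence(reference, rng) for _ in range(500)]
ref_counts = kmer_counts(reference, K)


def score(seq):
    """Similarity of a candidate's k-mer counts to the reference."""
    return cosine_similarity(ref_counts, kmer_counts(seq, K))


ranked = sorted(candidates, key=score, reverse=True)
most_similar, least_similar = ranked[0], ranked[-1]
print("Most similar: ", most_similar, round(score(most_similar), 2))
print("Least similar:", least_similar, round(score(least_similar), 2))
```

Because every candidate has the same length and G+C content, any difference in score reflects motif content rather than base composition, which is the property the RAR1, RAR2, and RAR3 comparison relies on.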
Discussion
The TETRIS system was successful at distinguishing effects on luciferase expression that
Neuro2A cells, three induced significant changes in luciferase expression that matched the
predicted function: L19RIK, Rosa26, and E13RIK. Another three, HotairM1, Hoxb5os, and
ELDR, yielded inconclusive results, in which the 24 and 48 hr inductions disagreed or the results
were not significant. The final lncRNA, Tbc1d22-AS, induced significant repression when it was
predicted to activate. Testing the same vectors in mouse E14 cells yielded weaker
evidence of repression in L19RIK and E13RIK, but significant activation in the predicted
activators. Again, Tbc1d22-AS did not match its predicted function. Although there appears to be
some discrepancy between cell types, results generally support the predicted functions or are
inconclusive, and only Tbc1d22-AS had strong results that defied the prediction.
The three synthetic RNAs had a better success rate overall. In Neuro2As, RAR2 repressed
as predicted, but RAR1 results were inconclusive. Both RAR1 and RAR2 showed significant
repression in E14s. RAR3 showed some repression in both cell types, though it was not strongly
predicted to repress. These results further support the hypothesis that motif abundance
contributes towards lncRNA function, as we successfully designed synthetic RNAs that repress
in cis based on their similarity to a known transcriptional repressor, Xist. Overall, TETRIS was
successful in providing evidence of function for some lncRNAs, but not all matched the
predictions.
However, it is no surprise that this assay would not work for all lncRNAs. The TETRIS
system can only test for transcriptional effects in cis, but lncRNAs are also known to act in trans
on other chromosomal locations. Even two lncRNAs that repress transcription in cis could act
via different mechanisms—perhaps one requires sequence specificity and is unable to target
luciferase. The TETRIS system can only test for a specific subset of lncRNAs that regulate
Xist meets these criteria, being able to silence an autosomal chromosome when inserted, we
expect that other lncRNAs can function similarly (Herzing et al. 1997).
LncRNAs can demonstrate exquisite tissue specificity, so there is the possibility that cell
type context could have significant impact on lncRNA function (Cabili et al. 2011). If lncRNA
function depends on the presence or absence of cellular components that vary between cell
types, then it could be compromised depending on the cell line in which the assay is performed.
The discrepancies between TETRIS results in Neuro2As and E14s may be evidence of this,
and they suggest that an inconclusive result does not show that a lncRNA is
nonfunctional.
Although still in early stages, the development of this computational method and TETRIS
could profoundly impact our understanding of lncRNAs. Viewing lncRNAs as a collection of
enriched motifs instead of a strict linear sequence could be the missing link between sequence
and function. K-mer analysis can be adapted to address questions of lncRNA evolution, and
better detect homologues to connect findings across species. We are already using this method to
design synthetic RNAs with desired functions, which further down the road could have major
implications in developing lncRNA therapeutics to target misexpressed genes.
Acknowledgments
Many thanks to the Calabrese lab, including Dr. Mauro Calabrese and Dr. Kaoru Inoue for their
mentorship. Thanks to MC, KI, and David Lee for providing data. Funding was provided by the
References
Bradford, M M. 1976. “A Rapid and Sensitive Method for the Quantitation of Microgram Quantities of Protein Utilizing the Principle of Protein-Dye Binding.” Analytical Biochemistry 72 (May): 248–54. http://www.ncbi.nlm.nih.gov/pubmed/942051.
“Bright-Glo™ Luciferase Assay System Technical Manual” Promega. Accessed November 24, 2015. https://www.promega.com/resources/protocols/technical-manuals/0/bright-glo-luciferase-assay-system-protocol/
Cabili, Moran N, Cole Trapnell, Loyal Goff, Magdalena Koziol, Barbara Tazon-Vega, Aviv Regev, and John L Rinn. 2011. “Integrative Annotation of Human Large Intergenic Noncoding RNAs Reveals Global Properties and Specific Subclasses.” Genes & Development 25 (18): 1915–27. doi:10.1101/gad.17446611.
Derrien, Thomas, Rory Johnson, Giovanni Bussotti, Andrea Tanzer, Sarah Djebali, Hagen Tilgner, Gregory Guernec, et al. 2012. “The GENCODE v7 Catalog of Human Long Noncoding RNAs: Analysis of Their Gene Structure, Evolution, and Expression.” Genome Research 22: 1775–89. doi:10.1101/gr.132159.111.
“ES-E14TG2a (ATCC® CRL-1821™).” ATCC. Accessed March 20, 2016.
http://www.atcc.org/products/all/CRL-1821.aspx
“GenCatch™ Gel Extraction Kit.” Epoch Biolabs, Inc. Accessed November 24, 2015.
http://www.epochbiolabs.com/gelextraction.asp?pageName=products
“GenCatch™ Plasmid DNA Mini-prep Kit.” Epoch Biolabs, Inc. Accessed November 24, 2015.
http://www.epochbiolabs.com/dna_mini.asp?pageName=products
Gendrel, Anne-Valerie, and Edith Heard. 2014. “Noncoding RNAs and Epigenetic Mechanisms During X-Chromosome Inactivation.” Annual Review of Cell and Developmental Biology 30 (1). Annual Reviews: 561–80. doi:10.1146/annurev-cellbio-101512-122415.
Gibson, Daniel G, Lei Young, Ray-Yuan Chuang, J Craig Venter, Clyde A Hutchison, and Hamilton O Smith. 2009. “Enzymatic Assembly of DNA Molecules up to Several Hundred Kilobases.” Nature Methods 6 (5): 343–45. doi:10.1038/nmeth.1318.
Herzing, L B, J T Romer, J M Horn, and A Ashworth. 1997. “Xist Has Properties of the X-Chromosome Inactivation Centre.” Nature 386 (6622): 272–75. doi:10.1038/386272a0.
“High-Capacity cDNA Reverse Transcription Kit.” Thermo Fisher Scientific. Accessed March
Iyer, Matthew K, Yashar S Niknafs, Rohit Malik, Udit Singhal, Anirban Sahu, Yasuyuki Hosono, Terrence R Barrette, et al. 2015. “The Landscape of Long Noncoding RNAs in the Human Transcriptome.” Nature Genetics 47 (3): 199–208. doi:10.1038/ng.3192.
Kent, W James, Charles W Sugnet, Terrence S Furey, Krishna M Roskin, Tom H Pringle, Alan M Zahler, and David Haussler. 2002. “The Human Genome Browser at UCSC.” Genome Research 12 (6): 996–1006. doi:10.1101/gr.229102.
Lin, Nianwei, and Tariq M Rana. 2013. “Dysregulation of Long Non-Coding RNAs in Human Disease.” In Molecular Biology of Long Non-Coding RNAs, edited by Ahmad M. Khalil and Jeff Coller, 115–36. New York: Springer New York.
http://link.springer.com.libproxy.lib.unc.edu/chapter/10.1007/978-1-4614-8621-3_5/fulltext.html.
“Lipofectamine 3000 Transfection Reagent.” Thermo Fisher Scientific. Accessed March 20, 2016. https://www.thermofisher.com/order/catalog/product/L3000008
Menzel, Peter, Jan Gorodkin, and Peter F Stadler. 2009. “The Tedious Task of Finding Homologous Noncoding RNA Genes.” RNA (New York, N.Y.) 15 (12): 2075–82. doi:10.1261/rna.1556009.
Minks, Jakub, Sarah El Baldry, Christine Yang, Allison M Cotton, and Carolyn J Brown. 2013. “XIST-Induced Silencing of Flanking Genes Is Achieved by Additive Action of Repeat A Monomers in Human Somatic Cells.” Epigenetics & Chromatin 6 (1): 23.
doi:10.1186/1756-8935-6-23.
“Neuro-2a (ATCC® CCL-131™).” ATCC. Accessed November 24, 2015.
http://www.atcc.org/products/all/CCL-131.aspx#generalinformation
“Passive Lysis Buffer.” Promega. Accessed March 20, 2016.
https://www.promega.com/resources/msds/msdss/e1000/e1941/
“PiggyBac Transposon System.” System Biosciences, Inc. Accessed November 24, 2015.
http://www.systembio.com/piggybac-transposon/overview
“Pro 5-alpha, JM109 and HB101 Competent Cells.” Promega. Accessed November 24, 2015.
https://www.promega.com/products/cloning-and-dna-markers/cloning-tools-and-competent-cells/bacterial-strains-and-competent-cells/pro-5_alpha_-jm109-and-hb101-competent-cells/
Rinn, John L, and Howard Y Chang. 2012. “Genome Regulation by Long Noncoding RNAs.” Annual Review of Biochemistry 81 (January). Annual Reviews: 145–66.
doi:10.1146/annurev-biochem-051410-092902.