3.6.1 RNA-seq experiments Microarray data mirna seed enrichment analysis Analyses of GO terms


Academic year: 2021


Table of Contents

Abstract
Resumen
1 Introduction
1.1 Small non-coding RNA
1.2 Biogenesis and gene expression regulation of miRNAs in animals and plants
1.3 Evolution of animal and plant miRNAs
1.4 Methods to predict potential miRNA targets
1.4.1 Experimental methods to predict miRNA targets
1.4.2 Computational methods to predict miRNA targets
1.5 Evolution of miRNAs and their targets in animals
1.6 Evolution of gene expression in animals
1.7 miRNAs in the nervous system
1.8 miR-124
1.9 What is the function of miRNAs in animals?
2 Hypothesis and objectives
2.1 Hypothesis
2.2 Objectives
General Objective
Specific Objectives
3 Material and methods
3.1 Sequences and related information
3.2 Target prediction for miR-124
3.2.1 Potential miR-124 targets
3.2.2 Shared targets between human and mouse
3.2.3 Homolog genes between human and fly
3.3 Experimental expression data
3.3.1 Human Illumina Body Map (EBI-EMBL RNAseq dataset)
3.3.2 Xenopus tropicalis tissue atlas
3.3.3 Microarray experiments
3.4 Expression comparison according to 3’ UTR length and Context++ Score
3.5 Retrieved GO terms
3.6 Strategy to find common functions
3.6.1 RNA-seq experiments
3.6.2 Microarray data
3.7 miRNA seed enrichment analysis
3.8 Analyses of GO terms
3.8.1 GO analyses of miR-124 predicted targets of human and mouse
3.8.2 GO analyses of human and mouse microarray experiments
3.8.4 GO term analysis of miR-124 species-specific and shared functional targets
3.8.5 Enrichment analysis of up and down regulated set of genes in human and mouse data
3.8.6 Analyses of X. tropicalis tissue atlas
4 Results
4.1 Description of 3’ UTRs
4.1.1 Genes with longer 3’ UTRs have higher expression in brain
4.1.2 Not all miR-124 predicted targets are functional targets
4.2 Differential expression and seed enrichment analyses
4.2.1 Effect of miR-124 on gene expression is detected in tissue atlas experiments
4.2.2 Direct and indirect effect of miR-124 on gene expression
4.2.3 Common downregulated genes are miR-124 predicted targets
4.3 Functions regulated by miR-124
4.3.1 GO analysis of predicted miR-124 targets
4.3.2 Cellular processes are regulated by species specific and shared targets
4.3.3 miR-124 direct effect provokes neuron-like phenotype
4.4 Direct and indirect regulation of cellular processes by miR-124
4.4.1 Direct effect of miR-124
4.4.2 Indirect effect of miR-124
5 Discussion
5.1 The evolutionary relationship of 3’UTRs and miRNAs
5.2 Effect of miR-124 in gene expression
5.2.1 Direct and indirect effect of miR-124 in human and mouse
5.3 Function of miR-124
5.3.2 Direct effects of miR-124 are enriched in tissue atlas experiments
5.4 Common functions regulated by miR-124
6 Conclusions
7 Perspectives
7.1 Find functions regulated by miR-124
7.2 Find if non-conserved targets regulate the same functions as conserved targets
8 Bibliography
Appendices
Appendix 1: Context++ Score parameters
Appendix 2: TargetScan 7 parameters for Xenopus tropicalis miR-124 predicted targets
Appendix 3: GO term enrichment methods
3.1 GO term analyses with Camera and Fry

List of Figures

Fig 1. Biogenesis of sncRNAs
Fig 2. miRNA biogenesis in plants and animals
Fig 3. miRNA-mRNA target complementarity in animals and plants
Fig 4. TargetScan different types of seed match
Fig 5. Metazoan phylogeny
Fig 6. Plot of -log10(p-value) values of hypergeometric tests of seed sequences of example 3’ UTRs, over and underrepresentation of seed sequences
Fig 7. Human Body Map expression comparison according to 3’ UTR length cut off
Fig 8. Human Body Map Log FC comparison between miR-124 predicted targets and non-target genes
Fig 9. Human Body Map Log FC comparison between miR-124 predicted targets and non-target genes according to a Context++ Score cut off value
Fig 10. Sylamer plot of differential expression analysis of Illumina Human Body Map
Fig 11. Sylamer analysis of overexpression experiments in human and mouse
Fig 12. Sylamer plot of human tissue atlas
Fig 13. Sylamer plot of mouse tissue atlas
Fig 14. Venn Diagram of downregulated shared genes between human experiments
Fig 15. Downregulated shared predicted target genes between human experiments
Fig 16. Downregulated shared genes between mouse experiments
Fig 17. Downregulated shared miR-124 predicted targets between mouse experiments
Fig 18. Sylamer plot of X. tropicalis tissue atlas experiment
Fig 19. Sylamer plot of D. melanogaster tissue atlas experiment
Fig 20. Heatmaps of enrichment values of defined up- and down-regulated gene sets of human and mouse experiments
Fig 21. Comparison of Camera enriched GO terms between human and mouse overexpression experiments with FDR <= 0.01 in at least one of the experiments
Fig 22. Comparison of Camera enriched GO terms between human and mouse tissue atlas experiments with FDR <= 0.01 in at least one of the experiments
Fig 23. Comparison of enriched common GO terms of human, mouse and frog overexpression
Fig 24. Comparison of enriched common GO terms of human, mouse and frog experiments
Fig 25. Boxplot of logRPKMs values of functional, non functional miR-124 predicted targets and non targets genes in human brain
Fig S1. Sylamer plot of miR-124 overexpression in glioblastoma cell line
Fig S2. Camera and fry results of control up and downregulated sets of each human microarray experiments
Fig S3. Comparison of significance values between Camera and GOseq analyses

List of Tables

Table 1. Number of predicted miR-124 targets according to TargetScan 7.0 in human, mouse and frog and TargetScan 4.1 for fly
Table 2. Human and mouse microarray platforms used to find experiments related to miR-124 function
Table 3. Selected experimental data for human and mouse overexpression experiments
Table 4. Tissue samples used as a human tissue atlas experiment
Table 5. Data of human, mouse and fly microarray datasets used
Table 6. Number of seed and non-seed words in 3’ UTRs as a Sylamer example
Table 7. P-values of obtaining more than expected x number of seed sequences from Table 6
Table 8. P-values of obtaining less than expected x number of seed sequences from Table 6
Table 9. Differentially expressed genes of human microarray experiments
Table 10. Differentially expressed genes of mouse microarray experiments
Table 11. Shared downregulated miR-124 predicted targets in human experiments
Table 12. Common downregulated predicted targets in mouse experiments
Table 13. Differentially expressed genes of tissue atlas of X. tropicalis and D. melanogaster
Table 14. Common GO terms of miR-124 functional targets of human and mouse
Table 15. Comparison of enrichment values of GO term analysis of miR-124 predicted targets and tissue atlas experiment of human data
Table 16. Comparison of enrichment values of GO term analysis of miR-124 predicted targets
Table S1. Number of significantly enriched GO terms in human and mouse tissue atlas experiments using GOseq and Camera with an FDR cut off <= 0.1

List of Abbreviations and Symbols

B. Sc   Bachelor of Science
RNA   ribonucleic acid
ncRNA   non coding RNA
sncRNA   small non coding RNA
lncRNA   long non coding RNA
siRNA   small interfering RNA
piRNA   piwi-interacting RNA
snRNA   small nuclear RNA
snoRNA   small nucleolar RNA
dsRNA   double stranded RNA
rRNA   ribosomal RNA
AGO   Argonaute
bp   base pair
RISC   RNA-induced silencing complex
nt   nucleotide
CNS   Central Nervous System
ChIP   Chromatin immunoprecipitation
GEO   Gene Expression Omnibus database
GSE   Gene Expression Series
GO   Gene Ontology
FDR   False Discovery Rate

Abstract  

 

MicroRNAs (miRNAs) are small non-coding RNAs (sncRNAs) of approximately 22 nucleotides that act as post-transcriptional regulators. Animal miRNAs recognize their targets by pairing a ~7-nucleotide sequence (the seed sequence) with the 3’ UTR of the target mRNA. Because the recognition sequence is so short, animal miRNAs can potentially regulate thousands of genes.

The evolution of miRNAs in animals is associated with an increase in morphological complexity, as the number of miRNA families increases together with the evolution of cell types and tissues. There are 34 miRNA families that are conserved in all bilaterians. These miRNAs are conserved both in sequence and in their tissue-specific expression patterns, although most of their predicted targets are not. One of these miRNAs is miR-124, which is expressed in nervous tissue and is involved in neuronal differentiation. Targets of miR-124 are involved in maintaining non-neuronal gene profiles. Because predicted targets are poorly conserved even though the sequence and expression patterns are conserved, I propose that the conserved function of a miRNA is to maintain regulation of specific cellular processes, even though individual targets can change. Since miR-124 is expressed exclusively in the nervous system, I expect it to maintain regulation of processes important for this system across different species. To test this hypothesis, I analyzed functional genomics experiments from human, mouse, frog and fly in which miR-124 was enriched (miR-124 overexpression experiments for human and mouse, and tissue atlas experiments for all organisms).

To find common functions regulated by miR-124, I performed GO term analysis of predicted miR-124 targets and of functional genomics experiments. I compared the GO term results between the functional genomics experiments and the target analysis, between functional genomics experiments, and between species. The GO term analysis of predicted functional targets showed regulation of different processes in human and mouse, and these enriched GO terms were not regulated as expected in experiments where miR-124 is expressed. I conclude that the function of miR-124 cannot be studied at the level of predicted targets: the GO terms enriched among predicted targets are shared neither between organisms nor with the GO term analysis of experimental data where miR-124 is expressed. Consequently, the conservation of miR-124 functions cannot be evaluated using predicted targets.

GO term analysis of overexpression experiments showed downregulated GO terms related to cell cycle and to DNA and RNA regulation, which could reflect what happens to a neuronal precursor cell during differentiation. I therefore propose that miR-124 aids the transition toward a neuron-like phenotype, as occurs in a neuronal precursor cell. Unlike in the overexpression experiments, in the tissue atlas experiments miR-124 acts in specialized cells and is one of many gene regulators acting in the brain. Because of this, I conclude that the potential functions found in my analysis of tissues should be examined further. One interesting example is immune system functions, which are downregulated in my tissue panel GO term analysis and which previous reports suggest may be regulated by miR-124.


Resumen  

 

MicroRNAs (miRNAs) are small non-coding RNAs of approximately 22 nucleotides. Animal miRNAs recognize their target genes by pairing a ~7-nucleotide sequence of the miRNA (the seed sequence) with the 3' UTR of the target mRNA. Because the seed sequence is very short, animal miRNAs can regulate thousands of genes.

The evolution of miRNAs in animals is associated with an increase in morphological complexity, since the number of miRNA families increases along with the evolution of new cell types and tissues. There are 34 miRNA families conserved in bilaterians at the level of sequence and of tissue-specific expression pattern. However, most of their predicted targets are not conserved. One of these miRNAs is miR-124, which is expressed in neuronal tissue and is involved in neuronal differentiation. Given the lack of conservation of functional targets, despite the conservation of miRNA sequence and expression pattern, I propose that the conserved function of miRNAs is to maintain the regulation of specific cellular processes, even though individual targets may change. Because miR-124 is expressed exclusively in the nervous system, I expect it to maintain the regulation of processes important for this system across different species. To test this hypothesis, I analyzed functional genomics experiments from human, mouse, frog and fly in which miR-124 is highly expressed (miR-124 overexpression experiments for human and mouse, and tissue atlases for all organisms).

To find common functions regulated by miR-124, I performed GO term enrichment analysis of the predicted functional targets of miR-124 and of the functional genomics experiments. I compared the results of the GO term enrichment analysis between experiments and between species. The analysis of the predicted functional targets showed that different processes were enriched in human and mouse. Moreover, these terms were not enriched in the analyses of the tissue atlases, that is, experiments with a biological context in which miR-124 is expressed. Based on these results, I conclude that the function of miR-124, and its conservation, cannot be studied at the level of targets.


Introduction

1 Introduction

1.1 Small non-coding RNA


Fig 1. Biogenesis of sncRNAs.


1.2 Biogenesis and gene expression regulation of miRNAs in animals and plants

Although animal and plant miRNAs share protein components of their biogenesis pathways, such as RNase III enzymes and Argonaute (AGO) proteins, their biogenesis differs.

In animals, Pol II also transcribes the miRNA genes in the nucleus, forming a pri-miRNA that ranges from hundreds of nucleotides to several kilobases (Denli, Tops, Plasterk, Ketting, & Hannon, 2004; Ha & Kim, 2014). The pri-miRNA has a local stem-loop structure flanked by single-stranded RNA segments at the 5’ and 3’ ends. The nuclear RNase III Drosha then processes the pri-miRNA by cropping the stem-loop, releasing the pre-miRNA, a hairpin-shaped RNA of ~65 nucleotides (Du & Zamore, 2005). The pre-miRNA has a two- or three-nucleotide single-stranded overhang at the 3' end, characteristic of RNase III cleavage of dsRNA. Most of the time the mature miRNA resides in the 5’ arm, but it can also reside in the 3’ arm (Du & Zamore, 2005). The pre-miRNA is exported to the cytoplasm by Exportin-5 and cleaved near the terminal loop by Dicer, producing a small RNA duplex (Bohnsack, Czaplinski, & Görlich, 2004; Hutvagner et al., 2001). One strand of this duplex is loaded onto an AGO protein to form the RNA-induced silencing complex (RISC). Gene expression regulation is mainly achieved by mRNA destabilization (Guo, Ingolia, Weissman, & Bartel, 2010).


Manjunath, 2009). The change of the strand loaded into AGO (when the passenger strand is loaded as the guide strand) is known as arm switching. It may be due to alternative Drosha processing that changes the 5' terminus of the pre-miRNA, which leads to a different miRNA duplex after Dicer processing (H. Wu et al., 2009).

Animal miRNAs recognize their targets by pairing the seed sequence with their target mRNAs, and because the seed sequence is so short, a single miRNA can regulate thousands of mRNAs (Friedman, Farh, Burge, & Bartel, 2009). In addition, many miRNAs can target the same mRNA, producing a combinatorial effect (Bartel, 2009). In animals, most downregulation of target genes occurs through mRNA destabilization, while translational repression accounts for roughly 10-25% of the effect on gene expression (Eichhorn et al., 2014; Guo et al., 2010; Hendrickson et al., 2009).
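Seed-based target recognition can be illustrated with a minimal sketch. It assumes that the first eight nucleotides of mature miR-124 are UAAGGCAC and uses the conventional site classes (6mer, 7mer-m8, 7mer-A1, 8mer); neither is spelled out in this section, and the function name `find_seed_sites` and the example UTR are hypothetical.

```python
# Minimal sketch of seed-site scanning; not the procedure of any specific tool.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def find_seed_sites(mirna, utr):
    """Return (position, site_type) for every seed match in a 3' UTR.

    position is the 0-based start of the 6-nt core match in the UTR.
    DNA input is accepted and converted to the RNA alphabet.
    """
    mirna = mirna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    core = reverse_complement(mirna[1:7])  # pairs with miRNA nucleotides 2-7
    m8 = reverse_complement(mirna[7])      # pairs with miRNA nucleotide 8
    sites = []
    start = utr.find(core)
    while start != -1:
        # The match to nucleotide 8 lies just upstream (5') of the core.
        has_m8 = start > 0 and utr[start - 1] == m8
        # An adenine opposite miRNA nucleotide 1 lies just downstream (3').
        has_a1 = start + 6 < len(utr) and utr[start + 6] == "A"
        if has_m8 and has_a1:
            kind = "8mer"
        elif has_m8:
            kind = "7mer-m8"
        elif has_a1:
            kind = "7mer-A1"
        else:
            kind = "6mer"
        sites.append((start, kind))
        start = utr.find(core, start + 1)
    return sites


# Assumed 5' end of miR-124 and a made-up UTR fragment.
MIR124_5P_END = "UAAGGCAC"
print(find_seed_sites(MIR124_5P_END, "AAGUGCCUUACCCUGCCUUGG"))
```

Because a six-nucleotide core is expected to occur by chance many times in a transcriptome, a scan like this returns far more candidate sites than are likely to be functional, which is why prediction tools add further filters.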


Fig 2. miRNA biogenesis in plants and animals.


1.3 Evolution of animal and plant miRNAs

miRNAs are small non-coding RNAs of ~22 nucleotides, present in animals and plants, that are loaded onto AGO proteins and adjust gene expression by targeting specific genes at their 3’ UTRs (Axtell et al., 2011; Du & Zamore, 2005; Ha & Kim, 2014; Millar & Waterhouse, 2005). The biogenesis of plant and animal miRNAs has common characteristics (for example, RNase III enzymes and AGO proteins), suggesting a common evolutionary history (Axtell et al., 2011). Although the miRNA machinery is very ancient, plants and animals do not share common miRNAs (Millar & Waterhouse, 2005). Considering that the last common ancestor of plants and animals was unicellular and that both clades have independently evolved developmental mechanisms, it is understandable that current organisms of the two clades do not share miRNAs (Meyerowitz, 2002). In the case of animal miRNAs, there are miRNA families conserved in all bilaterians but not in cnidarians or poriferans (Pasquinelli et al., 2000). In plants, there are also ancient miRNA families conserved in non-vascular plants, ferns, gymnosperms and angiosperms (Floyd & Bowman, 2004).

Plant and animal miRNAs differ in how they regulate their targets. Gene regulation by miRNAs is achieved by three mechanisms: translational repression, mRNA cleavage and mRNA destabilization. Plant miRNAs pair all or most of their ~22 nucleotides with their target mRNA (Figure 3), and target regulation then occurs by mRNA cleavage (Du & Zamore, 2005; Llave, 2002; Tang, Reinhart, Bartel, & Zamore, 2003). Animal miRNAs target the 3’ UTRs of genes by pairing a 6-8 nucleotide sequence at their 5’ end, known as the seed sequence (Figure 3) (Lewis, Burge, & Bartel, 2005; Lewis, Shih, Jones-Rhoades, Bartel, &


Plant and animal miRNAs also differ in their genomic arrangement. Plants have mostly independent miRNA-encoding loci, although miRNA hairpins are sometimes found in genomic clusters, so that multiple hairpins can be expressed from a single pri-miRNA. Most animal miRNAs have independent encoding loci, but some are located in introns. The intronic arrangement has the advantage that these miRNAs can be regulated by the cis-regulatory elements that control the expression of the host mRNA (Axtell et al., 2011).

1.4 Methods to predict potential miRNA targets

The aim of studying how miRNA target recognition works is to understand the function of miRNAs (Grimson et al., 2007). Functions of miRNAs are inferred from their phenotypic effects (through mutations in the seed sequence, overexpression and knockdown experiments, and immunoprecipitation experiments of miRNAs and Ago proteins) (Bartel, 2004). The first methods to identify miRNA-mRNA relationships were experimental: potential targets were identified and later confirmed by directed mutagenesis of the candidate target site (reviewed in Mazière & Enright, 2007). Many bioinformatics methods then emerged, based on in silico identification of sequences in the target mRNA that are complementary to the seed sequence, although methodological problems followed, such as the difficulty of ranking short 3’ UTRs as potential targets. Since then, statistical techniques have been implemented for longer sequences (Mazière & Enright, 2007).

Fig 3. miRNA-mRNA target complementarity in animals and plants.

1.4.1 Experimental methods to predict miRNA targets

The first miRNA-target relationships were found using forward genetics on larval defects of Caenorhabditis elegans for lin-4 and its target lin-14 (Lee, Feinbaum, & Ambros, 1993; Wightman, Ha, & Ruvkun, 1993). Reverse genetics, in the form of miRNA overexpression and knockdown, has also aided in understanding their function (Thomas, Lieberman, & Lal, 2010). Experiments in which a miRNA is overexpressed or knocked down, followed by mRNA expression profiling, are useful because they help identify which genes are downregulated or upregulated as an effect of the miRNA. They are also useful because the resulting differentially expressed genes can be subjected to Gene Ontology enrichment analysis to find which functions the miRNA regulates.
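As an illustration of how a list of differentially expressed genes can be tested for GO term enrichment, the sketch below uses a one-sided hypergeometric (over-representation) test. This is a deliberately simple stand-in, not the enrichment procedure used later in this work, and the gene identifiers, GO term labels and function names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_annotated, n_selected, n_overlap):
    """P(X >= n_overlap) when n_selected genes are drawn without replacement
    from a universe in which n_annotated genes carry the GO term."""
    upper = min(n_annotated, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_annotated, k) * comb(n_universe - n_annotated, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def go_enrichment(universe, selected, go_annotation):
    """Return {GO term: p-value} for the selected gene set.

    universe: all genes measured in the experiment.
    selected: e.g. genes downregulated after miRNA overexpression.
    go_annotation: dict mapping a GO term to the genes annotated with it.
    """
    universe = set(universe)
    selected = set(selected) & universe
    results = {}
    for term, genes in go_annotation.items():
        annotated = set(genes) & universe
        overlap = len(annotated & selected)
        results[term] = hypergeom_enrichment_p(
            len(universe), len(annotated), len(selected), overlap
        )
    return results


# Hypothetical toy data: ten measured genes, three of them downregulated.
universe = [f"g{i}" for i in range(10)]
downregulated = ["g0", "g1", "g2"]
annotation = {"term_A": ["g0", "g1", "g2"], "term_B": ["g7", "g8"]}
for term, p in go_enrichment(universe, downregulated, annotation).items():
    print(term, p)
```

In a real analysis the p-values would still need correction for multiple testing across GO terms, which this sketch omits.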


that best ranked computational predictions corresponded to the most downregulated proteins (Baek et al., 2008). In immunoprecipitation experiments, the miRNA-specific Ago protein is tagged with a FLAG or HA epitope tag and later immunopurified with an antibody. In one experiment, miRNAs and target mRNA fragments were enriched in immunopurified Ago1 proteins, and sequences complementary to the miRNA seed region were found both in 3' UTRs and in the CDS (Easow, Teleman, & Cohen, 2007). In another study, the authors compared mRNAs bound to immunopurified AGO1 of Drosophila melanogaster against the transcripts upregulated in an AGO1 depletion experiment, expecting to find common genes. Instead, they found that most genes differed between the experiments and that the two sets differed in structural properties. Apparently, transcripts from the immunopurification experiment had shorter 3' UTRs than transcripts from the AGO1 depletion experiment. The authors argue that the structural and molecular properties of each gene set reflect differences in site context according to UTR length. Because these experiments were done in Drosophila melanogaster, they also argue that longer 3' UTRs (enriched in the AGO1 depletion experiment) are regulated by adenine/uridine-rich element binding proteins (AUBPs) in Drosophila, which may also have contributed to the upregulation of genes in the AGO1 depletion experiment, as nearly 20% of the upregulated genes in this experiment have conserved AU-rich elements (Hong, Hammell, Ambros, & Cohen, 2009).

1.4.2 Computational methods to predict miRNA targets


The miRanda algorithm relies on complementarity and the free energy of binding, and uses site conservation as a filter (Enright et al., 2003). The sites considered are those with perfect or near-perfect complementarity. The free energy of filtered sites is calculated with the Vienna RNA folding package, which determines the suboptimal folds and the free energy of a given sequence (Wuchty, Fontana, Hofacker, & Schuster, 1999). Later versions of miRanda can also predict miRNA targets in protein-coding regions, and they consider a non-uniform distribution of miRNA-mRNA complementarity and the influence of G:U wobbles on binding (John et al., 2004). Another algorithm whose main parameter is thermodynamic stability is DIANA-microT (Kiriakidou et al., 2004). It slides a window of 38 nucleotides along the 3' UTR and calculates the free energy of the potential miRNA-mRNA binding site at each position.
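The window-scanning idea can be sketched as follows. This is only an illustration of sliding a fixed-size window along a 3' UTR: the scoring function here is a toy count of Watson-Crick and G:U pairs in an ungapped alignment, used as a crude stand-in for a real free-energy calculation, and all function names are hypothetical rather than part of any published tool.

```python
# Base pairs counted by the toy score, including G:U wobbles.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def toy_pairing_score(mirna, window):
    """Count paired bases when the miRNA 5' end is aligned, antiparallel and
    without gaps, to the 3' end of the window. Not a free energy."""
    target = window[::-1]  # read the window 3'->5' so index i faces miRNA nt i+1
    return sum((m, t) in PAIRS for m, t in zip(mirna, target))


def scan_utr(mirna, utr, window=38, score=toy_pairing_score):
    """Slide a window along the UTR one nucleotide at a time and return
    (start, score) for every window position."""
    mirna = mirna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    return [
        (i, score(mirna, utr[i:i + window]))
        for i in range(len(utr) - window + 1)
    ]


def best_window(mirna, utr, window=38):
    """Return the (start, score) of the highest-scoring window, or None if
    the UTR is shorter than the window."""
    hits = scan_utr(mirna, utr, window)
    return max(hits, key=lambda hit: hit[1]) if hits else None


# Made-up example with a short window so the result is easy to follow.
example_utr = "CCCCCCCC" + "GUGCCUUA" + "CCCC"
print(best_window("UAAGGCAC", example_utr, window=8))
```

Passing the scoring function as a parameter keeps the scan separate from the energy model, so a thermodynamic calculation could replace the toy score without changing the loop.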

Some of the methods that rely more on seed sequence complementarity are PicTar and TargetScan (Krek et al., 2005; Lewis, Shih, Jones-Rhoades, Bartel, & Burge, 2003). PicTar requires aligned orthologous 3' UTRs and will scan the alignments for seed matches of a given set of miRNAs. These targets are considered conserved due to the alignment, which the algorithm uses to assess their thermodynamic stability. Each predicted target that passes the thermodynamic stability filter is scored by maximum likelihood. The scores of each 3' UTR are combined to get a final score.


al., 2007). The latest TargetScan version implements the Context++ model, which performs better than previous TargetScan methods. The Context++ model was developed taking into consideration the parameters of previous models (context score, context score+, weighted context+). Context++ considers 14 parameters (such as local AU content, target site abundance across all 3’ UTRs and position of the site within the 3’ UTR), of which nine are new (such as probability of conserved targeting, and length of the ORF and the 3' UTR) (Appendix 1) (Agarwal, Bell, Nam, & Bartel, 2015).

Fig 4. TargetScan different types of seed match.

Target conservation in the Context++ model is assessed by the probability of conserved targeting (PCT) (Friedman et al., 2009). Previous studies defined conservation of miRNA targets according to the orthologous location in the genomes under consideration. In such studies, conserved sites are those maintained in the same position, and non-conserved sites are those that are missing or have changed position in one of the genomes. Under this criterion, sites that are not at the orthologous position are missed, even if they might be under selective pressure (Friedman et al., 2009). The PCT is the Bayesian estimate of the probability that a site is conserved because of miRNA targeting rather than by chance or for other reasons. It represents 1 - FDR (where FDR is the estimated false discovery rate). It is calculated from a phylogenetic tree of 3’ UTRs, under the assumption that aligned sites in orthologous genes have a single origin, and taking into account the estimated background conservation. Conservation of sites is defined according to a branch-length score. For mRNAs with multiple sites, the aggregate probability of conserved targeting (aPCT) is calculated as aPCT = 1 - (FDR_site1 × FDR_site2 × …) (Friedman et al., 2009).
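The aggregate score follows directly from the per-site values, since each site's FDR is 1 - PCT. A minimal sketch, with hypothetical PCT values and a hypothetical function name:

```python
from math import prod


def aggregate_pct(site_pcts):
    """Combine per-site PCT values for one mRNA into an aggregate PCT.

    Each site's FDR is taken as 1 - PCT, so
    aPCT = 1 - product of (1 - PCT_i) over all sites.
    """
    for p in site_pcts:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"PCT must lie in [0, 1], got {p}")
    return 1.0 - prod(1.0 - p for p in site_pcts)


# Hypothetical mRNA with two sites whose PCT values are 0.5 and 0.8:
# aPCT = 1 - (0.5 * 0.2), which is approximately 0.9.
print(aggregate_pct([0.5, 0.8]))
```

Each additional site can only raise the aggregate value, which matches the intuition that several weakly conserved sites together can support conserved targeting of the mRNA.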

The 14 features in the Context++ model were chosen from a previous analysis of 26 features using a stepwise Akaike Information Criterion (AIC) function (Venables & Ripley, 2002). This function determined which characteristics were most useful for the model to assess site efficacy. According to this analysis, the 14 features previously mentioned were the most effective (Appendix 1) (Agarwal, Bell, Nam, & Bartel, 2015). These features were used to train multiple linear regression models on 74 microarray datasets of transfected miRNAs or siRNAs. The resulting models, one for each seed site type, were together called the Context++ model. The sign of the coefficient value indicates resistance or sensitivity to repression: a site with a positive value is more resistant to repression, and a site with a negative value is more prone to it (Agarwal et al., 2015).


relationships they considered, as many algorithms filter according to target conservation. In the second case, when a set of predicted targets was chosen for each data set, the top predictions of the Context++ model were significantly more repressed than the top predictions of the other models and were as good as results from AGO-IP experiments (Agarwal et al., 2015).

1.5 Evolution of miRNAs and their targets in animals

The origin of animal miRNAs is unclear. They were probably present in the last common ancestor of Metazoa, although some groups have lost the miRNA pathway (Ctenophora, Placozoa and Choanoflagellida) (Berezikov, 2011; Maxwell, Ryan, Schnitzler, Browne, & Baxevanis, 2012). Porifera and Eumetazoa have no miRNA families in common, and Porifera pre-miRNA hairpins are larger than those of bilaterians. Because of this, it was hypothesized that miRNAs in Porifera originated independently from those of eumetazoans. Nonetheless, Porifera have homologues of the Drosha and Pasha proteins, which are absent in lineages outside the Metazoa, indicating a single origin of miRNA substrates (Grimson et al., 2008).


More recent expansions of microRNA families are found in vertebrates, particularly within rodents and primates. Examples of expanded miRNA families in primates are the miR-139 and miR-301 families, which are expressed in embryonic stem cells (Guerra-Assuncao & Enright, 2012). Expansions of miR-139 and miR-301 may be required due to the increased morphological complexity and longevity of primates. DNA is exposed to oxidative damage that produces mutations, which accumulate during DNA replication. Embryonic stem cells undergo many rounds of division and subsequent cell type specification, and during this process there must be mechanisms that maintain genetic fidelity (Roccanova & Ramphal, 2003). Expansions of miR-139 and miR-301 could be related to the regulation of this process, although there is no direct evidence.

An alternative hypothesis to explain complexity in vertebrates is changes in protein-coding genes due to genome duplication events (GDE) (Heimberg et al., 2008; Hertel & Stadler, 2015). Nonetheless, this hypothesis is not well supported, as there are no major differences in the developmental gene toolkit across Metazoa (Heimberg et al., 2008).

Fig. 5 Metazoa phylogeny.


The expansion of miRNA families could have contributed to regulatory changes that led to organismal complexity, as more regulators may have been needed (Heimberg et al., 2008; Hertel & Stadler, 2015). It is thought that miRNAs regulate gene expression by fine-tuning (slightly lowering the expression level of a gene) or by acting as buffers (reducing the expression variance or noise of a gene). Because of this, miRNAs have been proposed to function in stabilizing phenotypes (canalization), such as cell identity (Berezikov, 2011; C.-I. Wu, Shen, & Tang, 2009).

Acquisition of a new functional miRNA implies that its regulated targets are probably, on average, beneficial or neutral for the organism. Two models propose how a new miRNA is integrated into gene regulatory networks: the decreasing model and the increasing model (Kevin Chen & Rajewsky, 2007; Nozawa et al., 2016). The decreasing model proposes that a new miRNA is initially expressed at levels low enough not to have much effect on its targets. At this stage, most of its targets are potentially deleterious, with just a small fraction being neutral or beneficial. During the course of evolution, natural selection eliminates the deleterious targets and maintains the beneficial ones. Subsequently the expression of the miRNA increases and it becomes functional (Kevin Chen & Rajewsky, 2007). The increasing model also proposes that a new miRNA starts with low expression, but that most of its targets are neutral and just a small fraction are beneficial. As time passes, the number of beneficial targets increases and the number of neutral ones decreases; then the expression of the new miRNA increases (Nozawa et al., 2016). Comparison of the number of targets between young (miR-954) and old (miR-277) miRNAs in Drosophila melanogaster supports the increasing model, as younger miRNAs have fewer targets than older miRNAs. Nevertheless, a comparison of all predicted targets of D. melanogaster miRNAs defined as "old" or "young" showed only a slight trend for old miRNAs to have a greater number of targets than young ones (Nozawa et al., 2016).


families are conserved within certain lineages, some more conserved than others (Christodoulou et al., 2010; Hertel & Stadler, 2015). Besides being conserved in sequence, these miRNA families also maintain their spatial (cell type/tissue) and temporal expression (Christodoulou et al., 2010). Nonetheless, clades that conserve the same miRNA families do not share all of their targets. One approach to understanding the function of miRNAs in gene regulatory networks has been to study the conservation of miRNA targets between distant clades. In particular, Chen & Rajewsky analysed target conservation of miRNA families (K. Chen & Rajewsky, 2006). To achieve this, they constructed a set of common miRNA families shared between fly, vertebrates and nematodes, based on their previous analyses, in which fly had 69 conserved miRNA families (Grün, Wang, Langenberger, Gunsalus, & Rajewsky, 2005) and nematodes had 73 conserved miRNA families (Lall et al., 2006). Then, for the 15 common miRNA families (K. Chen & Rajewsky, 2006), they predicted targets with the PicTar algorithm (Krek et al., 2005), defining as conserved targets for each clade those where the target site was in the same region of the aligned 3’ UTRs. They predicted 9,379 targets in human, 3,082 in D. melanogaster and 2,679 in Caenorhabditis elegans (K. Chen & Rajewsky, 2006).

They found that only 5 of these targets were shared among the three clades. Interestingly, a GO term analysis of conserved targets between human and fly showed enrichment for processes related to development.


depletion of tissue specific miRNA seeds, or by shortening of 3’ UTRs (Stark et al., 2005).

1.6 Evolution of gene expression in animals


have higher expression and sequence divergence than genes expressed in more than one tissue.

Divergence of gene expression is thought to be due to neutral evolution accompanied by selective constraints (Jordan, Mariño-Ramírez, & Koonin, 2005; Khaitovich et al., 2005). The neutral model of evolution proposes that most changes (in expression) have no effect on the fitness of the organism, so they accumulate at a regular rate (Ohta, 1992). The selective constraint on the divergence of gene expression is imposed by purifying selection, which eliminates deleterious variants. This was tested by comparing gene sequence divergence and gene expression divergence between human and mouse (Jordan, Mariño-Ramírez, & Koonin, 2005). The authors found that orthologous genes between human and mouse have more similar tissue-specific expression patterns than random pairs of human and mouse genes. Changes in gene expression of random pairs of genes are close to the rate of neutral expression-level divergence, because similarities are not expected to be detectable given the time since the divergence of human and mouse.

Similar results were found in the previously mentioned experiment that compared tissue transcriptomes of human and chimpanzee (Khaitovich et al., 2005). The authors also argued that expression patterns are consistent with the neutral model and purifying selection, where each tissue has its own evolutionary constraints. They further suggested that positive selection may be acting at the tissue level. For example, genes expressed in testis show large expression differences between human and chimpanzee. Furthermore, X-chromosome genes expressed in testis show greater differences in both gene expression levels and amino acid sequence than genes expressed in the rest of the tissues.


It is interesting that comparison of AS patterns between tissues also shows that the brain has the slowest rate of divergence among vertebrates. Nonetheless, a comparison of AS patterns in the prefrontal cortex of human and chimpanzee showed that, of 120,951 orthologous exons, 6-8% had splicing level differences (Calarco et al., 2007).

Another regulatory element that could be causing changes in gene expression is the miRNA. Deep sequencing of small RNAs from different organs of human, macaque, mouse, opossum, platypus and red jungle fowl (as an outgroup) showed an expansion of miRNA families in placentals and marsupials (Meunier et al., 2013). The authors also found that recently acquired miRNAs have lower expression than ancient miRNA families, as previously proposed (Kevin Chen & Rajewsky, 2007; Nozawa et al., 2016). Young miRNA families (emerged since the mammal-bird split) and old miRNA families (emerged prior to the mammal-bird split) also have different tissue-specific profiles: old miRNAs were more highly expressed in heart and kidney, while young miRNAs were expressed mostly in brain cortex and cerebellum. They also assessed 3' UTR length and miRNA targeting in the different organs and found that cortex and cerebellum had longer 3' UTRs in all the analyzed species. This result, along with the increase of miRNA families in neural tissue, could mean that miRNAs make a major contribution to the evolution of gene expression in the brain.

1.7 miRNAs in the nervous system


The study of miRNAs in the nervous system is particularly interesting because the nervous system is highly complex and has to mediate body-to-brain communication. The brain has billions of nerve cells with morphological and functional differences according to where they are located. Gene expression is highly diverse across nervous and brain tissue because of the formation of cellular network connections; miRNAs aid in regulating these changes in brain gene expression. miRNA regulation in the nervous system is a good example of how certain cellular processes must be regulated in response to external cues, with subtle changes in gene expression patterns provoking different phenotypes (Fiore et al., 2011).

miRNAs are involved in neurogenesis, which is the differentiation of neural stem cells and neural progenitor cells into neurons (Scott, 2013). The role of miRNAs in this process was first studied by mutating proteins involved in the miRNA biogenesis pathway. For example, mutations in Dicer have suggested that miRNAs may act in morphogenetic processes and not in early embryonic developmental stages (De Pietr Tonelli et al., 2009; Giraldez, 2005; Kawase-Koga, Otaegi, & Sun, 2009). Deletion of Dicer specifically in the CNS and cortex in mice, using two Cre lines, produced different phenotypes according to the timing of deletion. If the deletion occurred during early embryonic stages, development was not affected. If it occurred in later embryonic stages, it affected the migration of late-born neurons in the cortex and the expansion and differentiation of oligodendrocyte precursors in the spinal cord (Kawase-Koga et al., 2009).


cortical layering; such phenotypes result from neuronal apoptosis (T. H. Davis et al., 2008; De Pietr Tonelli et al., 2009).

Further analyses have shown that miRNAs regulate neurogenesis at each step of cell specification, indicating that they are involved in regional specificity of the CNS. For example, miR-196, miR-17-3p and miR-9 help maintain the regional establishment of neural progenitor cells during development (Abernathy & Yoo, 2015). miR-9 is an ancient miRNA that is highly expressed in the nervous system, specifically in neural progenitors of certain regions (Bonev, Pisco, & Papalopulu, 2011; Christodoulou et al., 2010; Lagos-Quintana et al., 2002). It is expressed in neural progenitors and in developing neurons of the forebrain but only in the mid- and hindbrain of Xenopus tropicalis. When miR-9 is knocked down there is an increase of neural progenitor cells in the hindbrain and a decrease in the forebrain (Bonev et al., 2011). miR-196 regulates the spatio-temporal expression of Hoxb8. Dysregulation of the Hoxb8 expression pattern affects the progression of motor neurogenesis. This was demonstrated by transcription assays and by in vitro and in vivo sensor-tracer analysis (Asli & Kessel, 2010). miR-17-3p regulates developmental patterning of V2 interneurons (neuronal progenitor domain p2) and progenitors of spinal motor neurons (pMN) by targeting Olig2. Mutant mice lacking the miR-17~92 cluster show changes in the boundary between pMN and p2 cells.


(Urbanska et al., 2008). This process is regulated by miRNAs such as miR-132, miR-125b and miR-134 (Edbauer et al., 2010; Fiore et al., 2009, 2011; Vo et al., 2006; Wayman et al., 2008). miR-132 is a target of cAMP-response element binding protein (CREB), which is induced by neurotrophins. When CREB is induced, it drives miR-132 expression, which downregulates p250GAP and promotes neurite outgrowth (Vo et al., 2006). The effect of miR-132 was later confirmed: expressing it in hippocampal neurons increased dendrite morphogenesis, whereas blocking it inhibited this process. In another study, miR-125b and miR-132 were associated with the fragile X mental retardation protein (FMRP) through their regulation of dendritogenesis in mouse spine (Edbauer et al., 2010). FMRP interacts with Ago and Dicer and functions as a translational repressor. FMRP was implicated in synaptic abnormalities by knockout experiments (Comery et al., 1997). Because of this, the authors tested whether miR-125b and miR-132 were the effectors of the FMRP phenotypes. According to overexpression experiments, the two miRNAs have opposite effects: miR-125b provokes longer and thinner dendrites, while miR-132 produces an increase in protrusions and reduced spine density. Expression of both miRNAs cancels the effects of FMRP knockdowns. On the other hand, miR-134 promotes dendritic outgrowth by targeting Pumilio2 (Pum2) (Fiore et al., 2009). Pum2 was found to control dendrite morphogenesis in Drosophila melanogaster (Ye et al., 2004). When miR-134 is knocked down, dendritogenesis becomes aberrant. By knocking down Pum2 in a miR-134 knockdown context, the authors recovered the lost function of miR-134.


These are some examples of how miRNAs regulate gene expression in the nervous system. They help illustrate how intricate miRNA regulation is and that miRNAs enable localized regulation.

1.8 miR-124

miR-124 is one of the 34 miRNAs conserved across Bilateria, and it is involved in neuronal differentiation (Christodoulou et al., 2010; Fiore et al., 2011; E. Sun & Shi, 2014). miR-124 expression increases as the differentiation process continues. The targets of miR-124 are involved in sustaining non-neuronal gene profiles (Cheng, Pastrana, Tavazoie, & Doetsch, 2009; Makeyev, Zhang, Carrasco, & Maniatis, 2007; Visvanathan, Lee, Lee, Lee, & Lee, 2007; Yoo, Staahl, Chen, & Crabtree, 2009).

miR-124 regulation in neuronal development starts when the RE1-silencing transcription factor, REST (or Neuron-Restrictive Silencer Factor, NRSF; a main transcription factor that represses neuron-specific genes), is downregulated (Conaco, Otto, Han, & Mandel, 2006). When miR-124 starts to be expressed, it degrades non-neuronal transcripts (Conaco et al., 2006). REST also regulates neuron-specific miRNAs, including miR-124; this was shown by finding potential REST binding sites, using serial analysis of chromatin occupancy (SACO), in a family of miRNAs that includes miR-124. Chromatin immunoprecipitation (ChIP) analysis then showed that REST occupied this site. The authors showed that REST regulates the expression of these miRNAs by introducing a dominant negative form of REST (dnREST), which increased the transcript levels of these miRNAs as measured by quantitative PCR. They tested whether miR-124 downregulates non-neuronal genes by depleting miR-124 with antisense 2'-O-methyl oligoribonucleotides in cortical neurons. They measured by quantitative PCR the mRNA levels of 17 previously identified non-neuronal transcripts, and found an increase for 10 of these genes.


along with REST (Yeo et al., 2005). Co-immunoprecipitation showed that SCP1 immunoprecipitates contained REST and vice versa. They also found, by ChIP with PCR primers specific for REST binding sites, that SCP1 was targeted to REST elements. Targeting of SCP1 by miR-124 was tested with a luciferase reporter carrying the 3'UTR of SCP1, which contains three evolutionarily conserved miR-124 target sites (Visvanathan et al., 2007), according to TargetScan (Lewis, Shih, Jones-Rhoades, Bartel, & Burge, 2003). This reporter, miR-124 RNA duplexes and a miR-124 expression vector were expressed in HEK293 cells; both the miR-124 duplexes and the expression vector suppressed luciferase expression (Visvanathan et al., 2007). They also tested whether targeting of SCP1 by miR-124 caused neurogenesis by electroporating SCP1 without its 3’UTR into the developing spinal cord. This resulted in the maintenance of neuronal progenitors instead of their reduction.

When REST is downregulated, miR-124 induces chromatin remodelling and neuronal splicing events (Abernathy & Yoo, 2015; Fiore, Khudayberdiev, Saba, & Schratt, 2011). During the development of the nervous system, a main characteristic is that cells lose multipotency at mitotic exit. This process is followed by a change in the ATP-dependent chromatin-remodelling machinery, the BAF or mSWI/SNF complexes (Staahl & Crabtree, 2013; Yoo, Staahl, Chen, & Crabtree, 2009). During differentiation of ES cells to neurons, the BAF complexes change: ES cells have esBAF, neural progenitors npBAF and neurons nBAF (Staahl & Crabtree, 2013). In particular, within the npBAF complex, BAF53a and BAF45a are exchanged for their paralogs BAF53b and BAF45b to form the nBAF complex in post-mitotic neurons (Yoo et al., 2009). miR-124 targets BAF53a, aiding in the transition from npBAF to nBAF (Yoo et al., 2009). This was shown by ectopically expressing miR-124 under a nestin promoter in progenitor cells, which reduced the expression of BAF53a.


cells (NPC), glia and non-neuronal cells, and PTBP2 is expressed in neuronal cells (Boutz et al., 2007). Using microarrays to detect alternative splicing, they found splicing of nearly 1,300 exons and identified the splicing targets of each protein by knockdowns in N2A mouse neuroblastoma cells. The target gene sets of PTBP1 and PTBP2 were enriched for cytoskeletal rearrangement and vesicular or protein transport according to Gene Ontology analysis (Boutz et al., 2007). miR-124 targets PTBP1, inducing neuron-specific alternative splicing, including that of PTBP2 (Makeyev et al., 2007). The authors transfected the mouse neuroblastoma cell lines CAD and Neuro2a (N2a) with miR-124. CAD cells were analysed using microarrays, revealing 342 downregulated and 61 upregulated genes. Eight of the upregulated genes were alternatively spliced; one of the downregulated genes was PTBP1. They confirmed targeting of PTBP1 by miR-124 with luciferase reporters. They confirmed that downregulation of PTBP1 is needed to induce alternative splicing and found inclusion of alternative exons in upregulated genes. Then, using RNAi and RT-qPCR, they determined whether miR-124 induction decreases PTBP1 enough to provoke alternative splicing. They confirmed depletion of PTBP1 and splicing of genes similar to those observed in the microarray analysis.


It has been proposed that some miRNAs regulate neuronal and immune response in the nervous system (NeurimmiRs) (Soreq & Wolf, 2011). The brain regulates immune responses using neurotransmitters, neuropeptides and hormones, and senses peripheral inflammation by detecting proinflammatory cytokines. NeurimmiRs change the expression of components of hematopoiesis and immune response, as well as many neurological diseases involved in inflammation; they include miRNAs such as miR-124, miR-132, miR-9, miR-146a, miR-212, miR-326, miR-125b and miR-155 (Soreq & Wolf, 2011).


involved in differentiation of myeloid cells (C/EBP-α and PU.1), showed downregulation of these genes by miR-124, establishing regulation of macrophage activation by miR-124. Stress responses are regulated by glucocorticoid receptors (GRs) (Soreq & Wolf, 2011). miR-124 has a potential role in stress disorders because it regulates GRs (Vreugdenhil et al., 2009). This was shown by transfection of miR-124 followed by Western blot analysis of GRs, in which protein levels were reduced in comparison to the control.

1.9 What is the function of miRNAs in animals?

We know that there has been an increase of miRNA families during Metazoan evolution and that this increase correlates with the evolution of cell types and tissues (Christodoulou et al., 2010; Guerra-Assuncao & Enright, 2012; Heimberg, Sempere, Moy, Donoghue, & Peterson, 2008; Hertel & Stadler, 2015). Conserved miRNA families in Bilateria are expressed in the same cell type and at the same developmental time points (Christodoulou et al., 2010). One of the most conserved miRNAs in Bilateria is miR-124, which is expressed in neuronal tissue and helps the cell acquire and maintain a neuronal gene profile (Christodoulou et al., 2010). One would expect that if the expression of a miRNA is conserved locally (in a specific cell type), it would target the same genes, so that to study the precise cellular function a miRNA regulates, one would simply need to know the function of its target genes. Nonetheless, as previously mentioned, most miRNA-target relationships are not conserved (K. Chen & Rajewsky, 2006). In the particular case of miR-124, even the number of predicted targets differs greatly among Bilateria species. For example, according to TargetScan predictions, miR-124 has more than 4,000 potential targets in human, while in fly it has only around 500 predicted targets (http://www.targetscan.org/vert_71/). The difference in the number of predicted targets therefore already indicates that the regulation of gene expression by miRNAs does not rely solely on targeting specific genes.


which are the particular cellular functions that a miRNA regulates in a given cell type, we must consider both conserved and non-conserved targets.

2 Hypothesis and objectives

2.1 Hypothesis

The conserved function of a microRNA is to maintain regulation of specific cellular processes, even though individual targets can change. Since miR-124 is expressed exclusively in the nervous system, we expect it to maintain regulation of processes important for this system, across different species.

2.2 Objectives

General Objective

Find common processes regulated by miR-124 in different species.

Specific Objectives

• Search for cellular processes with predicted miR-124 targets across different lineages.

• Analyse expression datasets of different species where miR-124 is affected.

• Find potential functions regulated by miR-124 from the experimental datasets.

• Compare potential functions regulated by miR-124 across the experimental datasets in the same species.

• Compare potential functions regulated by miR-124 across different species.


3 Material and methods

3.1 Sequences and related information

Information for Homo sapiens, Mus musculus, Xenopus tropicalis and Drosophila melanogaster was downloaded from the Ensembl database using the biomaRt library in R (Bronwen et al., 2016; Durinck et al., 2005). The functions used were useMart and getBM. In the useMart function, the mart parameter was ENSEMBL_MART_ENSEMBL, the host was www.ensembl.org, and the datasets were hsapiens_gene_ensembl, mmusculus_gene_ensembl, xtropicalis_gene_ensembl and dmelanogaster_gene_ensembl. Information was retrieved using the getBM function. The attributes requested were the 3’UTR, transcript length, CDS length and the Ensembl transcript and gene identifiers. The request was filtered using the chromosome name. A total of 198,968 human transcripts were obtained, of which 70,794 had 3’UTR sequences. For mouse, there were 115,088 transcripts, of which 44,229 had 3’UTR sequences. For X. tropicalis, there were 24,197 transcripts, of which 8,145 had 3’UTR sequences. For D. melanogaster, there were 34,642 transcripts, of which 29,849 had 3’UTR sequences. The length of all available 3’UTRs of the four organisms was calculated, and the longest 3’UTR for each gene was selected. This resulted in 19,164 human 3’UTRs, 20,042 mouse 3’UTRs, 6,717 frog 3’UTRs and 13,459 fly 3’UTRs. The 3’UTR sequences were saved in FASTA files using functions from the Biostrings package (Pages, Aboyoun, Gentleman, & DebRoy, n.d.).
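The selection of the longest 3’UTR per gene can be illustrated with a minimal sketch. The thesis performed this step in R with biomaRt and Biostrings; the Python below is only an illustration on toy data, and the identifiers (GENE_A, TX_A1, ...), the function names and the FASTA header layout are assumptions, not the ones used in the original analysis.

```python
from typing import Dict, Iterable, Tuple

# Each record is (gene_id, transcript_id, utr_sequence).
# All identifiers below are made up for illustration.
Record = Tuple[str, str, str]


def longest_utr_per_gene(records: Iterable[Record]) -> Dict[str, Tuple[str, str]]:
    """Return {gene_id: (transcript_id, utr)} keeping the longest 3' UTR per gene."""
    best: Dict[str, Tuple[str, str]] = {}
    for gene_id, transcript_id, utr in records:
        if not utr:
            # Transcripts without an annotated 3' UTR are skipped.
            continue
        current = best.get(gene_id)
        # Strict comparison: on ties, the first transcript seen is kept.
        if current is None or len(utr) > len(current[1]):
            best[gene_id] = (transcript_id, utr)
    return best


def to_fasta(selected: Dict[str, Tuple[str, str]], width: int = 60) -> str:
    """Format the selected UTRs as FASTA text (header layout is an assumption)."""
    lines = []
    for gene_id in sorted(selected):
        transcript_id, utr = selected[gene_id]
        lines.append(f">{gene_id}|{transcript_id}")
        lines.extend(utr[i:i + width] for i in range(0, len(utr), width))
    return "\n".join(lines) + "\n"


records = [
    ("GENE_A", "TX_A1", "ACGTACGT"),
    ("GENE_A", "TX_A2", "ACGTACGTACGT"),
    ("GENE_A", "TX_A3", ""),
    ("GENE_B", "TX_B1", "TTTT"),
]
selected = longest_utr_per_gene(records)
fasta_text = to_fasta(selected)
print(fasta_text)
```

Keeping one 3’UTR per gene means that each gene contributes at most one sequence to the target prediction, which is why the number of 3’UTRs per organism is much smaller than the number of transcripts.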

3.2 Target prediction for miR-124

3.2.1 Potential miR-124 targets


The numbers of predicted targets vary according to the organism (Table 1).

The differences in the number of predicted miR-124 targets between organisms (Table 1) are mainly due to the number of available 3'UTRs; Xenopus has so few predicted miR-124 targets because only ~6,000 3'UTRs were available for the analysis. In the case of fly, the low number of predicted targets is also due to the biology of the organism, which has much shorter 3'UTRs (Table 1). In the case of Xenopus, the short median 3'UTR length is attributed to the smaller number of annotated 3'UTRs (6,717).

3.2.2 Shared targets between human and mouse

Orthologues between human and mouse were retrieved using the biomaRt library in R, with the useMart and getBM functions, to obtain orthologous predicted targets for miR-124 (Durinck et al., 2005). The parameters used for the useMart function were the “ENSEMBL_MART_ENSEMBL” database and the “mmusculus_gene_ensembl” dataset, to retrieve the orthologous genes between human and mouse. For the getBM function, the attributes requested were the Ensembl gene and transcript IDs, the human Ensembl gene and the homology-orthology type (“hsapiens_homolog_orthology_type”); the filter was homology with Homo sapiens (“with_hsapiens_homolog”). Orthologous genes were defined as those with the orthology type "one2one". Shared targets were all orthologous genes that are predicted targets in both human and mouse. Of the total predicted targets in mouse (4,961), 2,251 were predicted orthologous targets shared between human and mouse.

Table 1

Organism | Number of targets | Median 3' UTR length (nt)
Human | 4,450 | 1,288
Mouse | 4,961 | 937.5
Xenopus | 1,351 | 550
Fly | 560 | 227
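The definition of shared targets in Section 3.2.2 (one-to-one orthologues that are predicted targets in both species) can be sketched as follows. This is an illustrative Python version on toy data, not the R code used in the thesis; the gene identifiers and function names are hypothetical, and the check on the suffix "one2one" is an assumption made so that both the value quoted in the text and a prefixed form such as "ortholog_one2one" would be accepted.

```python
from typing import Iterable, List, Set, Tuple

# Each homology row is (mouse_gene, human_gene, orthology_type).
HomologyRow = Tuple[str, str, str]


def one_to_one_pairs(rows: Iterable[HomologyRow]) -> List[Tuple[str, str]]:
    """Keep only mouse-human pairs whose orthology type is one-to-one."""
    return [(mouse, human) for mouse, human, kind in rows if kind.endswith("one2one")]


def shared_targets(
    pairs: Iterable[Tuple[str, str]],
    mouse_targets: Set[str],
    human_targets: Set[str],
) -> List[Tuple[str, str]]:
    """Keep orthologue pairs where both members are predicted targets."""
    return [(m, h) for m, h in pairs if m in mouse_targets and h in human_targets]


# Toy data: identifiers are made up for illustration.
rows = [
    ("MOUSE_G1", "HUMAN_G1", "ortholog_one2one"),
    ("MOUSE_G2", "HUMAN_G2", "ortholog_one2many"),
    ("MOUSE_G3", "HUMAN_G3", "ortholog_one2one"),
    ("MOUSE_G4", "HUMAN_G4", "ortholog_one2one"),
]
mouse_targets = {"MOUSE_G1", "MOUSE_G2", "MOUSE_G3"}
human_targets = {"HUMAN_G1", "HUMAN_G2", "HUMAN_G4"}

pairs = one_to_one_pairs(rows)
shared = shared_targets(pairs, mouse_targets, human_targets)
print(shared)
```

In this toy example only the first pair survives both filters: the second pair is not one-to-one, and the third and fourth are predicted targets in only one of the two species.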

3.2.3 Homolog genes between human and fly

Homologous genes between human and fly were retrieved in a similar way to the orthologous genes between human and mouse. Again, the library used was biomaRt and the functions were useMart and getBM (Durinck et al., 2005). In the useMart function, the parameter that changed was the dataset, which was “dmelanogaster_gene_ensembl”. The attributes for the getBM function were “ensembl_gene_id”, “ensembl_transcript_id”, “hsapiens_homolog_ensembl_gene” and “hsapiens_homolog_orthology_type”. The filter was “with_hsapiens_homolog”. The output was 18,499 fly transcripts with a homolog in human.

3.3 Experimental expression data

3.3.1 Human Illumina Body Map (EBI-EMBL RNAseq dataset)

Raw data of the Human Illumina Body Map (E-MTAB-513) were downloaded from the ArrayExpress database (Kolesnikov et al., 2015). This dataset contains transcription profiling of the following 16 human tissues: adipose, adrenal gland, brain, breast, colon, heart, kidney, leukocyte, liver, lung, lymph node, ovary, prostate gland, skeletal muscle, testis and thyroid gland. There are no biological replicates, only technical replicates consisting of a single-end and a paired-end library for each tissue.

3.3.2 Xenopus tropicalis tissue atlas


3.3.3 Microarray experiments

Microarray experiments were retrieved from the Gene Expression Omnibus database (GEO) (Edgar, Domrachev, & Lash, 2002). Microarray platforms considered for human and mouse are listed in Table 2. Only one type of microarray platform (of each organism) was used to have the same set of genes for further analysis.

These platforms were chosen because they have probes to analyse the great majority of the genes of their respective species. HG-U133_Plus_2 covers over 47,000 transcripts and variants, being the largest of Affymetrix 3’IVT human arrays. Mouse430_2 is also the largest of Affymetrix 3’IVT mouse arrays, covering over 39,000 transcripts and variants (Affymetrix, n.d., 2007).

Table 2

Organism | Microarray platform | GPL ID
Human | Affymetrix Human Genome U133 Plus 2.0 Array (HG-U133_Plus_2) | GPL570
Mouse | Affymetrix Mouse Genome 430 2.0 Array (Mouse430_2) | GPL1261

A query was made to the GEO database to obtain all the experiments available as “Gene Expression Series” (GSEs) where miR-124 is directly or indirectly expressed. The only filter used was the platform, restricted to those mentioned above. The query therefore included a search for titles related to a tissue atlas, a specific tissue, neuronal cell types or an explicit mention of “miR-124”. This was achieved using the R libraries GEOmetadb and GEOquery (S. Davis & Meltzer, 2007; Zhu, Davis, Stephens, Meltzer, & Chen, 2008). The GEO database was accessed with the GEOmetadb library by downloading the “GEOmetadb.sqlite” file with the getSQLiteFile function. Once this database was downloaded, a connection to it was opened with the dbConnect function; the parameters used were SQLite and the downloaded GEO database (GEOmetadb.sqlite). The output was stored in an object required to retrieve the experiment information. The function that retrieves the information is dbGetQuery, which uses the object defined with the dbConnect function and requires SQL syntax to implement the query. The information is stored in tables, and the output is also a table. To get information stored in different tables, the tables were joined using a common identifier. The information retrieved was the GSE identifier and title, the identifiers and titles of all samples (GSM) within each GSE, and the supplementary files corresponding to each GSM. The SQL syntax for this would be:

"SELECT gse.gse, gse.title, gsm.gsm, gsm.title, gsm.supplementary_file"

This selection was made from the GSE, GSM and experiment platform (GPL) tables. The common identifier used to join them was the GSE ID. The SQL syntax is:

"FROM",
"gsm JOIN gse_gsm ON gsm.gsm = gse_gsm.gsm",
"JOIN gse ON gse_gsm.gse = gse.gse",
"JOIN gse_gpl ON gse_gpl.gse = gse.gse",
"JOIN gpl ON gse_gpl.gpl = gpl.gpl"

The same parameters were used for both microarray platforms (GPL570 and GPL1261), except for the one specifying the platform, which is the GPL identifier in the GPL table:

"WHERE",
"gpl.gpl = 'GPL570'"

or

"WHERE",
"gpl.gpl = 'GPL1261'"


A specific query was made to obtain experiments in which miR-124 was altered. Experiments that explicitly had the word “mir-124” in the title and/or summary of the GSE were searched. The SQL syntax was the same as in the other query except for the “WHERE” clause (the gpl.gpl parameter changes according to the organism: GPL570 for human and GPL1261 for mouse):

"WHERE",
"gse.title LIKE '%mir-124%' OR",
"summary LIKE '%mir-124%'",
"AND",
"gpl.gpl = 'GPL570'",
sep = " "

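The query fragments described above can be assembled and tried out on a small in-memory database. The sketch below is an illustration in Python with the standard sqlite3 module, not the R code used in the thesis. The toy schema contains only the tables and columns named in the text, all GSE and GSM identifiers and file names are made up, and the build_query helper is hypothetical. One deliberate difference is labeled in the code: parentheses are placed around the two LIKE conditions so that the platform filter applies to both of them; the fragments above do not show such parentheses.

```python
import sqlite3


def build_query(gpl_id, keyword=None):
    """Assemble the SELECT/FROM/JOIN/WHERE pieces into one parameterized query."""
    parts = [
        "SELECT gse.gse, gse.title, gsm.gsm, gsm.title, gsm.supplementary_file",
        "FROM gsm JOIN gse_gsm ON gsm.gsm = gse_gsm.gsm",
        "JOIN gse ON gse_gsm.gse = gse.gse",
        "JOIN gse_gpl ON gse_gpl.gse = gse.gse",
        "JOIN gpl ON gse_gpl.gpl = gpl.gpl",
        "WHERE gpl.gpl = ?",
    ]
    params = [gpl_id]
    if keyword is not None:
        # Assumption: parentheses added so the platform filter covers both LIKEs.
        parts.append("AND (gse.title LIKE ? OR gse.summary LIKE ?)")
        pattern = "%" + keyword + "%"
        params += [pattern, pattern]
    return " ".join(parts), params


# Toy stand-in for GEOmetadb.sqlite; every value here is invented.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE gse (gse TEXT, title TEXT, summary TEXT);
CREATE TABLE gsm (gsm TEXT, title TEXT, supplementary_file TEXT);
CREATE TABLE gpl (gpl TEXT);
CREATE TABLE gse_gsm (gse TEXT, gsm TEXT);
CREATE TABLE gse_gpl (gse TEXT, gpl TEXT);
INSERT INTO gse VALUES ('GSE_X', 'miR-124 transfection', 'toy'),
                       ('GSE_Y', 'liver atlas', 'toy'),
                       ('GSE_Z', 'mir-124 in mouse cells', 'toy');
INSERT INTO gsm VALUES ('GSM_1', 'transfected', 'f1.CEL'),
                       ('GSM_2', 'control', 'f2.CEL'),
                       ('GSM_3', 'liver', 'f3.CEL'),
                       ('GSM_4', 'mouse sample', 'f4.CEL');
INSERT INTO gpl VALUES ('GPL570'), ('GPL1261');
INSERT INTO gse_gsm VALUES ('GSE_X', 'GSM_1'), ('GSE_X', 'GSM_2'),
                           ('GSE_Y', 'GSM_3'), ('GSE_Z', 'GSM_4');
INSERT INTO gse_gpl VALUES ('GSE_X', 'GPL570'), ('GSE_Y', 'GPL570'),
                           ('GSE_Z', 'GPL1261');
""")


def sample_ids(gpl_id, keyword=None):
    """Run the query and return the sorted GSM identifiers (third column)."""
    sql, params = build_query(gpl_id, keyword)
    return sorted(row[2] for row in con.execute(sql, params).fetchall())


human_all = sample_ids("GPL570")
human_mir124 = sample_ids("GPL570", "mir-124")
mouse_mir124 = sample_ids("GPL1261", "mir-124")
print(human_all, human_mir124, mouse_mir124)
```

SQLite's LIKE is case-insensitive for ASCII characters by default, so the pattern '%mir-124%' also matches titles written as "miR-124". Without the parentheses, AND would bind more tightly than OR, and a series on another platform whose title matches the keyword would also be returned.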

A mouse tissue atlas experiment was also analysed (GSE9954). The tissues were diaphragm, spleen, muscle, liver, brain, lung, kidney, adrenal gland, bone marrow, adipose, pituitary gland, salivary gland, seminal gland, thymus, testes, heart, small intestine, eye, embryonic stem cells, placenta, ovary. Most of the tissues have 3 replicates, except for the muscle and bone marrow that have 4 replicates.

Although there are some human tissue atlases on older platforms, on the selected platform we did not find an experiment similar to the RNA-seq data set and the mouse tissue atlas microarray experiment mentioned above (GSE9954). Hence for human, a dataset was constructed with tissue samples of different experiments (GSEs). To do this, GSEs that had in their title a specific tissue were selected. The tissues were: heart, brain, colon, liver, lymph, lung, breast, blood, prostate, kidney, adipose, muscle, ovary, adrenal, testes and thyroid. GSEs for the adrenal and testes tissues were not found.
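The selection of GSEs by tissue name in the title can be sketched as a simple keyword match. This Python fragment is only an illustration on made-up series titles; the matching rule (case-insensitive match at the start of a word), the function name and the identifiers are assumptions, since the text only states that GSEs with a specific tissue in their title were selected.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Tissue keywords as listed in the text.
TISSUES = [
    "heart", "brain", "colon", "liver", "lymph", "lung", "breast", "blood",
    "prostate", "kidney", "adipose", "muscle", "ovary", "adrenal", "testes",
    "thyroid",
]


def group_by_tissue(
    series: Iterable[Tuple[str, str]], tissues: List[str] = TISSUES
) -> Dict[str, List[str]]:
    """Map each tissue keyword to the GSE IDs whose title mentions it."""
    groups: Dict[str, List[str]] = {tissue: [] for tissue in tissues}
    for gse_id, title in series:
        for tissue in tissues:
            # Assumption: match the keyword at a word start, ignoring case,
            # so that "lymph" also matches "lymph node".
            if re.search(r"\b" + re.escape(tissue), title, flags=re.IGNORECASE):
                groups[tissue].append(gse_id)
    return groups


# Toy series: identifiers and titles are invented.
series = [
    ("GSE_A", "Expression profiling of human liver biopsies"),
    ("GSE_B", "Heart failure and skeletal muscle samples"),
    ("GSE_C", "Cell line time course"),
]
groups = group_by_tissue(series)
not_found = [tissue for tissue, ids in groups.items() if not ids]
print(groups["liver"], groups["heart"], groups["muscle"], not_found)
```

Tissues that end up with an empty list correspond to the situation described above for adrenal and testes, for which no GSEs were found.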

Organism | GSE ID | Number of samples | Title | Cell line
Human | GSE32876 | 6 (3 transfected samples and 3 control samples) | Inferring transcriptional and microRNA-mediated regulatory programs in glioblastoma | Human glioma neurosphere line MSK543
Human | GSE6207 | 14* (7 transfected samples and 7 control samples) | miR-124 transfection time course | HepG2 cell line
Mouse | GSE8498 | 6 (3 transfected samples and 3 control samples) | The MicroRNA miR-124 Promotes Neuronal Differentiation by Triggering Brain-Specific Alternative Pre-mRNA Splicing | CAD cells
