3.3 Experimental expression data
3.3.3 Microarray experiments
Microarray experiments were retrieved from the Gene Expression Omnibus (GEO) database (Edgar, Domrachev, & Lash, 2002). The microarray platforms considered for human and mouse are listed in Table 2. A single platform was used per organism so that all experiments shared the same set of genes for further analysis.
These platforms were chosen because their probes cover the great majority of the genes of their respective species. HG-U133_Plus_2 covers over 47,000 transcripts and variants and is the largest of the Affymetrix 3' IVT human arrays. Mouse430_2 is likewise the largest of the Affymetrix 3' IVT mouse arrays, covering over 39,000 transcripts and variants (Affymetrix, n.d., 2007).
The GEO database was queried to obtain all experiments available as GEO Series (GSEs) in which miR-124 is directly or indirectly expressed. The only filter used in the query was the platform (Table 2).
The search covered series whose titles referred to a tissue atlas, a specific tissue or neuronal cell types, or that explicitly mentioned “miR-124”. This was done with the R libraries GEOmetadb and GEOquery (S. Davis & Meltzer, 2007; Zhu, Davis, Stephens, Meltzer, & Chen, 2008). The GEO database was accessed through the GEOmetadb library by downloading the “GEOmetadb.sqlite” file with the getSQLiteFile function. Once downloaded, the database was opened with the dbConnect function, using the SQLite driver and the downloaded file (GEOmetadb.sqlite) as parameters. The resulting connection object is required to retrieve the experiment information.

Organism | Microarray platform | GPL ID
Human | Affymetrix Human Genome U133 Plus 2.0 Array (HG-U133_Plus_2) | GPL570
Mouse | Affymetrix Mouse Genome 430 2.0 Array (Mouse430_2) | GPL1261
Table 2. Human and mouse microarray platforms used to find experiments related to miR-124 function.

Material and methods

The information is retrieved with the dbGetQuery function. It takes the connection object created by dbConnect and a query written in SQL. The GEOmetadb database stores the information in tables, and the output of a query is also a table. Information stored in different tables therefore had to be joined using common identifiers. The information retrieved was the GSE identifier and title, the identifiers and titles of all samples (GSMs) within each GSE, and the supplementary files corresponding to each GSM. The SQL syntax for this was:
"SELECT gse.gse, gse.title, gsm.gsm, gsm.title, gsm.supplementary_file",
The selection was made from the GSE, GSM and platform (GPL) tables. These are linked through the gse_gsm and gse_gpl tables by the GSM, GSE and GPL identifiers. The SQL syntax is:
"FROM",
"gsm JOIN gse_gsm ON gsm.gsm=gse_gsm.gsm",
"JOIN gse ON gse_gsm.gse=gse.gse",
"JOIN gse_gpl ON gse_gpl.gse=gse.gse",
"JOIN gpl ON gse_gpl.gpl = gpl.gpl",
The same query was used for both microarray platforms (GPL570 and GPL1261). Only the WHERE clause, which specifies the GPL identifier in the gpl table, differed:
"WHERE",
"gpl.gpl = 'GPL570'"
or
"WHERE",
"gpl.gpl = 'GPL1261'"
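The connection and query steps above can be assembled into one R sketch. It assumes that the Bioconductor package GEOmetadb and its dependency RSQLite are installed. The DBI generics dbConnect, dbGetQuery and dbDisconnect are called explicitly. The helper names build_platform_query and fetch_platform_samples are hypothetical. The column aliases gse_title and gsm_title are added here only so that the two title columns get distinct names.

```r
# Minimal sketch, assuming GEOmetadb and RSQLite are installed.
# build_platform_query() and fetch_platform_samples() are hypothetical helpers.

# Assemble the SELECT/FROM/WHERE fragments into one SQL string.
build_platform_query <- function(gpl_id) {
  # Accept only GEO platform accessions such as "GPL570", so that
  # arbitrary text cannot be pasted into the SQL string.
  stopifnot(is.character(gpl_id), length(gpl_id) == 1,
            grepl("^GPL[0-9]+$", gpl_id))
  paste(
    "SELECT gse.gse, gse.title AS gse_title,",
    "gsm.gsm, gsm.title AS gsm_title, gsm.supplementary_file",
    "FROM",
    "gsm JOIN gse_gsm ON gsm.gsm = gse_gsm.gsm",
    "JOIN gse ON gse_gsm.gse = gse.gse",
    "JOIN gse_gpl ON gse_gpl.gse = gse.gse",
    "JOIN gpl ON gse_gpl.gpl = gpl.gpl",
    "WHERE",
    paste0("gpl.gpl = '", gpl_id, "'")
  )
}

# Download the GEOmetadb SQLite file if it is missing, run the query and
# always close the connection afterwards.
fetch_platform_samples <- function(gpl_id, db_file = "GEOmetadb.sqlite") {
  if (!file.exists(db_file)) {
    # getSQLiteFile() downloads and unpacks the database into destdir
    # and returns the local path of the unpacked file.
    db_file <- GEOmetadb::getSQLiteFile(destdir = getwd())
  }
  con <- DBI::dbConnect(RSQLite::SQLite(), db_file)
  on.exit(DBI::dbDisconnect(con))
  DBI::dbGetQuery(con, build_platform_query(gpl_id))
}
```

For example, fetch_platform_samples("GPL570") would return one row per GSM on the human platform, and fetch_platform_samples("GPL1261") would do the same for mouse. The first call downloads the complete GEOmetadb file, which is large.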
A specific query was run to obtain experiments in which miR-124 was altered. It searched for GSEs that explicitly contained “mir-124” in their title and/or summary. Because LIKE is case-insensitive for ASCII characters in SQLite, this pattern also matches “miR-124”. The SQL syntax was the same as in the previous query except for the WHERE clause, where gpl.gpl was set according to the organism (GPL570 for human, GPL1261 for mouse). The two text conditions are enclosed in parentheses so that the platform filter applies to both. Without them, AND would take precedence over OR in SQL:
"WHERE",
"(gse.title LIKE '%mir-124%' OR",
"gse.summary LIKE '%mir-124%')",
"AND gpl.gpl = 'GPL570'"
In total, six GSEs on the human platform GPL570 in which miR-124 was manipulated in some manner were retrieved. For mouse, one miR-124 overexpression experiment and one tissue atlas experiment were retrieved.
The human experiments to analyse were chosen according to their number of biological replicates and on the condition that the cells had been transfected only with the microRNA.
Three overexpression experiments were analysed (Table 3): two for human (GSE6207 and GSE32876) and one for mouse (GSE8498). In addition, tissue atlas experiments were analysed to contrast the brain samples against the average of the other tissues, as in the RNA-Seq experiment from the EMBL-EBI database. Although GSE6207 is a time course without replicates, it was analysed by treating each time point as a replicate.
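For each gene, the brain-versus-other-tissues contrast can be written as the brain mean minus the average of the other tissue means. The base-R sketch below assumes a log2-normalised genes-by-samples matrix. It interprets “the average of the other tissues” as the unweighted mean of the per-tissue means, which is the same form as a limma contrast such as brain - (liver + heart)/2. The helper name brain_vs_rest is hypothetical.

```r
# Hypothetical helper: per-gene log2 contrast of one tissue (default "brain")
# against the unweighted average of all other tissues.
brain_vs_rest <- function(expr, tissue, target = "brain") {
  stopifnot(is.matrix(expr), ncol(expr) == length(tissue),
            target %in% tissue, length(unique(tissue)) > 1)
  # Column indices of each tissue, e.g. list(brain = 1:2, heart = 4:5, ...)
  cols_by_tissue <- split(seq_along(tissue), tissue)
  # Genes-by-tissues matrix of mean log2 expression per tissue.
  tissue_means <- do.call(cbind, lapply(cols_by_tissue, function(cols)
    rowMeans(expr[, cols, drop = FALSE])))
  others <- setdiff(colnames(tissue_means), target)
  # Contrast: target tissue mean minus the mean of the other tissue means.
  tissue_means[, target] - rowMeans(tissue_means[, others, drop = FALSE])
}
```

Averaging per-tissue means, rather than pooling all non-brain samples, gives every tissue the same weight. Tissues with more replicates (such as muscle and bone marrow in GSE9954) therefore do not dominate the reference level.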
A mouse tissue atlas experiment (GSE9954) was also analysed. The tissues were diaphragm, spleen, muscle, liver, brain, lung, kidney, adrenal gland, bone marrow, adipose, pituitary gland, salivary gland, seminal gland, thymus, testes, heart, small intestine, eye, embryonic stem cells, placenta and ovary. Most tissues have three replicates, except muscle and bone marrow, which have four.
Although some human tissue atlases exist on older platforms, no experiment comparable to the RNA-Seq data set or to the mouse tissue atlas described above (GSE9954) was found on the selected platform. Therefore, a human dataset was constructed from tissue samples of different experiments (GSEs). To do this, GSEs with a specific tissue in their title were selected. The tissues were heart, brain, colon, liver, lymph, lung, breast, blood, prostate, kidney, adipose, muscle, ovary, adrenal, testes and thyroid. No GSEs were found for adrenal or testes tissue.
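Selecting GSEs by the tissue named in their title can be sketched as a case-insensitive substring search over GSE titles. The input layout (a data frame with columns gse and title, for instance the distinct GSE rows of the platform query) and the helper name gses_by_tissue are assumptions. A substring match is used rather than a whole-word match, so that “colon” also matches “colonic”, as in GSE11345.

```r
# Hypothetical helper: for each tissue keyword, return the IDs of the GSEs
# whose title contains that keyword (case-insensitive substring match).
gses_by_tissue <- function(gse_df, tissues) {
  stopifnot(all(c("gse", "title") %in% names(gse_df)))
  titles <- tolower(as.character(gse_df$title))
  ids <- as.character(gse_df$gse)
  hits <- lapply(tissues, function(t)
    unique(ids[grepl(tolower(t), titles, fixed = TRUE)]))
  names(hits) <- tissues
  hits
}
```

Candidates found this way still need to be checked by hand. A title may name several tissues; GSE13471, for example, mentions prefrontal cortex, liver and colon.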
Organism GSE ID Number of
time course HepG2 cell line
Mouse GSE8498 6
Table 3. Selected experimental data for human and mouse overexpression experiments.
Experiments with non-cancerous samples were selected. In experiments containing both cancerous and non-cancerous samples, the cancerous samples were removed. The chosen experiments are listed in Table 4.
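The text does not state how cancerous samples were identified. One possible approach, sketched here purely as an assumption, is to flag GSM titles that contain cancer-related keywords. The keyword list and the helper name drop_cancer_samples are hypothetical.

```r
# Hypothetical helper: drop samples whose title matches a cancer-related
# keyword. The default keyword list is an illustrative assumption.
drop_cancer_samples <- function(gsm_df,
                                keywords = c("tumor", "tumour",
                                             "cancer", "carcinoma")) {
  stopifnot("title" %in% names(gsm_df))
  pattern <- paste(keywords, collapse = "|")
  is_cancer <- grepl(pattern, gsm_df$title, ignore.case = TRUE)
  # Keep all columns of the non-cancerous rows.
  gsm_df[!is_cancer, , drop = FALSE]
}
```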
Tissue | GSE ID | Number of samples (GSMs) | GSE title
Heart | GSE71226 | 6 | Expression data from coronary heart disease in Chinese Han people
Colon | GSE11345 | 3 | IL17 and IL22 stimulation of T84 colonic epithelial cell lines
Liver | GSE13471 | 18 | Expression data from human normal pre-frontal cortex, liver and colon tissues and colon tumors
Lymph | GSE19882 | 5 | Expression data from human lymphoma endothelium and reactive lymph node-derive
Lung | GSE18454 | 12 | Analysis of normal lung cells treated with 5-aza-dC to induce DNA demethylation
Breast | GSE77978 | 11 | Analysis of human breast milk cells: gene expression profiles during pregnancy, lactation, involution and mastitic infection
Blood | GSE4488 | 16 | Expression data from whole blood
Prostate | GSE9951 | 19 | Transcriptome analyses in normal prostate epithelial cells following exposure to low-dose cadmium
Kidney | GSE11045 | 6 | Expression data from kidney and liver
Adipose | GSE15773 | 19 | Expression data from human adipose tissue
Muscle | GSE12474 | 10 | Microarray analysis of skeletal muscle hypertrophy induced by heat-stress in healthy humans
Ovary | GSE40400 | 6 | Expression data from human cumulus cells isolated from oocytes at MI and MII stages in polycystic ovary syndrome (PCOS) patients
Thyroid | GSE3678 | 14 | PCT versus paired normal thyroid tissue
Brain | GSE54567 | 28 | Expression data from human brain dorsolateral prefrontal cortex – including control samples and samples with major depression disorders
Table 4. Tissue samples used as a human tissue atlas experiment.
The FlyAtlas microarray experiment (GSE7763) was also retrieved. It contains samples from different adult and larval tissues. Only the adult samples were used, in order to analyse differentiated tissues. The tissue samples used were hindgut, midgut, accessory gland, brain, crop, head, ovary, whole body, salivary gland, carcass, spermatheca (mated and virgin), fat body, eye, heart, ejaculatory duct, trachea, muscle and wings; each tissue has four replicates.
Raw data of the mentioned GSEs was retrieved using the getGEOSuppFiles function specifying the GSE ID