Microarrays to study global gene expression patterns

1.7.1 Microarrays as a means of studying complex disorders

Microarrays, commonly known as DNA chips, are collections of microscopic DNA spots attached to a solid surface. They are used to measure the expression levels of large numbers of genes simultaneously. Each DNA spot contains picomoles of a specific DNA sequence known as a probe. A probe is a short section of a gene, or other DNA element, that hybridizes to a complementary DNA (cDNA) or complementary RNA (cRNA) sample (the target) under high-stringency conditions. Hybridization is usually detected through fluorophore-labeled targets, whose signal is used to determine the relative abundance of nucleic acid sequences in the target. Microarrays have been used extensively in humans and mice in the search for gene expression changes associated with complex disorders such as aging, cancer, obesity, schizophrenia, Alzheimer’s disease, autism, alcoholism and, more recently, FASD124–131. In this thesis, when referring to gene expression, I specifically mean mRNA levels (i.e. the transcriptome). The application of microarrays to mice, particularly the B6 mouse, is well-documented; mouse microarrays are useful for examining the underlying biological mechanisms governing complex phenotypes, which include abnormalities resulting from prenatal alcohol exposure91,132–134. Complementary DNA microarrays allow large numbers of messenger RNA transcripts to be quantified, providing large-scale insight into the cellular processes involved in the regulation of gene expression135.

1.7.2 Overview on functional, pathway, and network analysis of microarray data

Following the identification of differentially expressed genes (DEGs) using microarrays, various software programs can be used to determine the importance of specific pathways and networks in relation to behavioural phenotypes. For functional characterization of DEGs, the following tools are applicable: the Database for Annotation, Visualization and Integrated Discovery (DAVID)136, Gene Ontology Enrichment (GE) (Partek Inc., St. Louis, MO, USA), and Ingenuity Pathway Analysis (IPA) (Ingenuity® Systems, Redwood City, CA). The freely available DAVID program relies on functional annotation clustering, which uses a p-value from a modified Fisher’s exact test to assess the significance of gene-term enrichment136,137. A p < 0.05 suggests that a term is associated with more of the input genes than expected by random chance. GE, a tool in the Partek Genomics Suite software version 6.6 (Partek Inc., St. Louis, MO), returns a list of functions based on the DEGs. The higher the enrichment score, the more over-represented a functional group is in the gene list. This score is calculated using a chi-squared test, which compares the proportion of the gene list in a group to the proportion of the background in the group138. An enrichment score > 1 indicates that the functional category is over-represented, while a score of 3 corresponds to significant over-representation (p < 0.05). IPA, a commercially available tool, is regarded as a high-quality functional analysis and knowledge discovery program. In IPA, a p-value is associated with each function and is calculated using a right-tailed Fisher’s exact test139.

The p-value is calculated by comparing the number of focus genes that participate in a process with the total number of genes that are known to be associated with that process. The p-value identifies significant over-representation of focus genes in a given process. Over-represented functional processes are those that have more focus genes than expected by chance (right-tailed)139.
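The right-tailed test described above can be sketched as the upper tail of a hypergeometric distribution. This is a minimal illustration only, not the implementation used by IPA or DAVID; the function name, argument names, and the example counts are assumptions introduced here.

```python
from math import comb


def right_tailed_fisher(k, n, K, N):
    """Return P(X >= k) for a hypergeometric draw.

    k: focus genes annotated to the process (hypothetical count)
    n: total number of focus genes
    K: background genes annotated to the process
    N: total number of background genes
    """
    upper = min(n, K)  # the overlap cannot exceed either set size
    total = comb(N, n)  # all ways to draw n genes from the background
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical example: 2 of 4 focus genes fall in a process that
# contains 5 of the 20 background genes.
p = right_tailed_fisher(k=2, n=4, K=5, N=20)
print(round(p, 4))  # 0.2487
```

A small p-value here means that the observed overlap, or a larger one, would rarely arise if the focus genes were drawn at random from the background.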

To identify genes in already-established pathways, Pathway Express (Intelligent Systems and Bioinformatics Laboratory, Michigan, USA), Pathway Enrichment (PE), and IPA will be used. The freely available Pathway Express is a high-throughput pathway visualization tool that focuses on pathways alone (no functional characterization). By default, Pathway Express uses the hypergeometric distribution for p-value calculation140. The impact factor associated with a pathway is a probabilistic term that takes into consideration the proportion of DEGs in the pathway. Essentially, Pathway Express quantifies the influence of DEGs on a given pathway. This quantification, in addition to the p-value, is used to distinguish between pathways that have the same proportion of DEGs141. Pathway Enrichment is a new feature incorporated in Partek software. PE generates an enrichment score, in addition to a p-value, for each pathway. This score is the negative natural log of the enrichment p-value (Fisher’s exact test)142. Both PE and Pathway Express rely on information from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database (Kanehisa Laboratories, Tokyo, Japan); therefore, the results should be similar between the programs. Canonical pathway assessment using IPA results in a list of significant pathways that are altered based on the input genes. Canonical pathways are determined based on two parameters: the ratio of the number of genes from the data set that map to the pathway to the total number of genes that map to the canonical pathway, and a p-value calculated using Fisher’s exact test that determines the probability that the association between the genes in the data set and the canonical pathway is due to chance alone143.
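As a rough illustration of the two summary quantities mentioned above, the sketch below computes an enrichment score as the negative natural log of a p-value and a canonical-pathway ratio. The function names and example numbers are assumptions made for illustration; they are not part of the Partek or IPA software.

```python
import math


def enrichment_score(p_value):
    """Negative natural log of an enrichment p-value."""
    return -math.log(p_value)


def pathway_ratio(dataset_genes_in_pathway, total_genes_in_pathway):
    """Fraction of a canonical pathway's genes that appear in the data set."""
    return dataset_genes_in_pathway / total_genes_in_pathway


# A p-value of 0.05 gives a score close to 3, which is why a score of 3
# is treated as the significance threshold.
print(round(enrichment_score(0.05), 3))  # 2.996

# Hypothetical pathway: 5 of its 50 genes are in the data set.
print(pathway_ratio(5, 50))  # 0.1
```

Because the score is a monotone transform of the p-value, ranking pathways by score or by p-value gives the same order; the ratio adds separate information about how much of the pathway is covered.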

Two tools, GeneMANIA (Donnelly Centre for Cellular and Biomolecular Research, Toronto, Ontario) and IPA, can be used for network analysis of input genes that have been shown to interact with other molecules. GeneMANIA incorporates a linear-regression algorithm, which calculates a functional association network linking the input genes with other genes that are known to interact with them; GeneMANIA uses this algorithm to predict gene function144. To complement GeneMANIA results, IPA can be used to further identify interacting networks. IPA has been successfully used by various researchers to examine networks of interacting genes that may play a role in a variety of diseases and disorders, including FASD94,145–148. IPA links genes and molecules by interactions, which are based on current knowledge from records maintained in the Ingenuity Pathways Knowledge Base. Highly interconnected networks are likely to represent significant biological function149,150. In IPA, the algorithm limits each network to ~35 genes. A p-score, derived from a p-value calculated using Fisher’s exact test, is assigned to each network and reflects the number of focus genes (user-specified genes) in the network151. Put another way, the score reflects the likelihood that the focus genes within the network were found together by random chance152. A higher number of focus genes leads to a higher network score. The network score is equal to the negative exponent of the respective p-value153, such that a score of 3 corresponds to a p-value of 10^-3 (0.001). Therefore, a network is considered significant with a p-score > 2 (p < 0.01). The combination of results from the DEG list, along with the functional, pathway, and network analyses, will give insight into FASD-relevant genes.
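The relationship between the network score and its p-value can be sketched as a negative base-10 logarithm. This is a minimal illustration under that assumption; the function names are hypothetical and this is not IPA's own code.

```python
import math


def network_score(p_value):
    """Negative base-10 exponent of a network p-value."""
    return -math.log10(p_value)


def is_significant(score, threshold=2.0):
    """Apply the cut-off used in the text: a score above 2 (p < 0.01)."""
    return score > threshold


for p in (1e-3, 1e-2, 0.05):
    score = network_score(p)
    print(p, round(score, 2), is_significant(score))
```

Note that the cut-off is strict, so a network with a p-value of exactly 0.01 (score of 2) would not pass.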

For a gene to be considered relevant to FASD-related phenotypes, it must be differentially expressed on the microarray. In addition, the gene must be implicated in neurodevelopment or behaviour, implicated in a top pathway or network, or involved in an inverse relationship with a differentially expressed microRNA (miRNA). If the gene is classified as a central “hub” molecule in one of the top three networks, it may also be worthy of investigation. For the purpose of this study, a “hub” is defined as a molecule that appears in a top IPA network with strong intramodule connectivity to all other genes in the same module (analogous to the definition of a “hub” node in protein-protein interaction networks); a hub does not itself need to be altered154. Previous studies have defined “hubs” as nodes connected to more than five molecules155–157. In my thesis, a hub is required to have a minimum of 10 interactions in order to provide a stringent analysis. A hub molecule is linked to dysregulated gene(s), which suggests that there are common mechanisms associated with neurodevelopmental ethanol exposure. The hub genes identified in this thesis are based on the number of interactions in the network158.
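The hub criterion above (at least 10 interactions within a network) can be sketched as a simple count over an edge list. The function name, the edge-list representation, and the molecule names below are hypothetical and serve only to illustrate the counting rule; they do not come from IPA output.

```python
from collections import defaultdict


def find_hubs(edges, min_interactions=10):
    """Return {molecule: number of distinct partners} for molecules that
    meet the hub threshold.

    edges: iterable of (molecule_a, molecule_b) pairs from one network.
    """
    partners = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        partners[a].add(b)
        partners[b].add(a)
    return {m: len(p) for m, p in partners.items() if len(p) >= min_interactions}


# Hypothetical network: "HUB" interacts with ten genes, G0 to G9,
# and G0 also interacts with G1.
edges = [("HUB", f"G{i}") for i in range(10)] + [("G0", "G1")]
print(find_hubs(edges))  # {'HUB': 10}
```

Using sets means that an interaction listed twice is counted once, so the threshold applies to distinct interaction partners rather than to raw edge records.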
