Bioinformatic Analysis - The REST/NRSF pathway as a central mechanism in CNS dysfunction

2.2 Methods

2.2.6 Bioinformatic Analysis

2.2.6.1 ECR (Evolutionary Conserved Regions) Browser

Conservation of transcription factor consensus binding sequences was assessed using the TRANSFAC 4.0 database (Matys, 2003), available through the rVista 2.0 tool on the ECR Browser, with the following parameters: minimum matrix conservation (similarity between the consensus binding site for a transcription factor and a potential binding site in the query sequence), 70%; minimum number of homologous sites (the minimum number of sites from which a matrix is built), 4; factor class level (the classification of transcription factors in the TRANSFAC database is hierarchical and includes 6 levels, from family of transcription factors to splice variants), 4; and similarity of the sequence to the matrix, 1.

2.2.6.2 Genevar (Gene Expression Variation) suite

Analysis and visualisation of eQTL (expression quantitative trait loci) association patterns within the NRSF and BDNF genes were performed using the Java-based application platform Genevar, version 3.3.0, which is available online. The HapMap study based on lymphoblastoid cell lines from CEU individuals was used to address SNP-gene associations (Stranger et al., 2012). Analysis parameters were set to default (Spearman’s rank correlation coefficient, rho), with 10,000 permutations used to construct a distribution of the test statistic under the null hypothesis of no SNP–probe associations. Each permutation randomly re-assigns expression intensities to the individuals’ genotypes and re-computes the correlation coefficient and statistical significance for the shuffled dataset (Yang et al., 2010).
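The permutation procedure described above can be illustrated with a minimal sketch. It is not Genevar's own implementation: the function names, the two-sided comparison and the add-one correction of the empirical p-value are assumptions made for illustration, and genotypes are assumed to be coded as minor-allele counts (0, 1 or 2).

```python
import random


def ranks(values):
    """Return 1-based ranks, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def permutation_p_value(genotypes, expression, n_perm=10_000, seed=0):
    """Empirical p-value for rho under the null of no SNP-probe association."""
    rng = random.Random(seed)
    observed = spearman_rho(genotypes, expression)
    shuffled = list(expression)
    hits = 0
    for _ in range(n_perm):
        # Re-assign expression intensities to genotypes at random.
        rng.shuffle(shuffled)
        if abs(spearman_rho(genotypes, shuffled)) >= abs(observed):
            hits += 1
    # Add-one correction (an assumption) keeps the estimate above zero.
    return observed, (hits + 1) / (n_perm + 1)
```

Calling `permutation_p_value(genotypes, expression)` with one genotype and one expression value per individual returns the observed rho together with the fraction of shuffled datasets that produce a correlation at least as extreme.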

2.2.6.3 HapMap Genome Browser and Haploview

SNP genotype data for genomic regions of interest, corresponding to individuals from the CEPH trios of European descent, were downloaded from the HapMap Genome Browser, release #28 (August 2010, NCBI build 36, dbSNP b126), which can be accessed online. The data were uploaded into Haploview 4.1, freely available software for measuring linkage disequilibrium (LD), defining haplotype blocks and identifying haplotype tagging SNPs (htSNPs) (Barrett et al., 2005). Under the standard Linkage format, htSNPs were identified using the pairwise-tagging function ($r^2$ threshold, 0.8). SNPs were filtered to include only those with a minor allele frequency (MAF) of greater than 5% within a Caucasian population.

LD analysis was performed using the D-prime ($D'$) statistic, which is derived from one of the earliest measures of disequilibrium, termed $D$. $D$ quantifies disequilibrium as the difference between the observed frequency of a two-locus haplotype (a combination of alleles at adjacent loci on a single chromosome) and the frequency it would be expected to show if the alleles were segregating at random. Adopting the standard notation for two adjacent loci, A and B, with two alleles at each locus (A/a and B/b), the observed frequency of the haplotype that consists of alleles A and B is represented by $P_{AB}$. Assuming the independent assortment of alleles at the two loci, the expected haplotype frequency is calculated as the product of the frequencies of the two alleles, $P_A \times P_B$. Therefore, one of the simplest measures of disequilibrium is $D = P_{AB} - P_A \times P_B$. $D'$ is obtained by dividing $D$ by its maximum possible value given the allele frequencies, so that a $D'$ value of 1 represents complete LD. The squared correlation coefficient ($r^2$), used for htSNP analysis and therefore also a measure of LD, states the linear relationship between a given pair of markers and is determined by dividing $D^2$ by the product of the four allele frequencies. An $r^2$ of 1 indicates that the two markers are in complete LD ($D' = 1$) and have equal allele frequencies.
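As a worked illustration of these LD measures, the sketch below computes D, D' and r² from one haplotype frequency and two allele frequencies. The function name and argument names are hypothetical, and the normalisation of D by its maximum attainable value follows the standard definition of D' rather than Haploview's source code, which additionally has to estimate haplotype frequencies from unphased genotypes.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab: observed frequency of the haplotype carrying alleles A and B
    p_a:  frequency of allele A at the first locus
    p_b:  frequency of allele B at the second locus
    """
    d = p_ab - p_a * p_b
    # Largest |D| attainable given the allele frequencies and the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # r^2 divides D squared by the product of the four allele frequencies.
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denominator if denominator > 0 else 0.0
    return d, d_prime, r_squared
```

For example, `ld_measures(0.3, 0.3, 0.6)` gives a D' of 1 but an r² of roughly 0.29, which shows why r² reaches 1 only when the two markers also have equal allele frequencies.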

2.2.6.4 HaploReg

To determine the potential regulatory effects of non-coding SNPs within the genome, we uploaded SNPs of interest into the online package HaploReg, version 2. HaploReg provides information for non-coding SNPs on evolutionary conservation, chromatin states and regulatory elements. The latter is assessed through allele-specific alterations to position weight matrices (PWMs) of known transcription factors using ENCODE (Encyclopaedia of DNA Elements) data, determined by logarithm of odds (LOD) calculations (Ward and Kellis, 2012).
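The idea of an allele-specific change in a PWM log-odds score can be sketched as follows. This is a simplified illustration and not HaploReg's implementation: the uniform background frequency of 0.25, the function names and the toy four-position matrix are all assumptions, and the scanning of every window overlapping the SNP is omitted.

```python
import math

BACKGROUND = 0.25  # assumed uniform background base frequency


def lod_score(sequence, pwm):
    """Sum of log2(probability / background) across the motif positions."""
    if len(sequence) != len(pwm):
        raise ValueError("sequence and PWM must have the same length")
    return sum(
        math.log2(column[base] / BACKGROUND)
        for base, column in zip(sequence, pwm)
    )


def allele_lod_change(ref_seq, alt_seq, pwm):
    """LOD(alt) - LOD(ref); negative values mean the alt allele weakens the match."""
    return lod_score(alt_seq, pwm) - lod_score(ref_seq, pwm)


# Hypothetical PWM: each column holds base probabilities at one motif position.
TOY_PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
```

With this toy matrix, `allele_lod_change("ACGT", "ATGT", TOY_PWM)` is negative, because the alternative allele replaces the preferred base at the second motif position.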

2.2.6.5 NCBI

Sequence alignments between human and rat genomes were performed using the basic local alignment search tool of nucleotide databases (BLASTN) (Altschul et al., 1997), available at NCBI. BLAST finds regions of local similarity by comparing nucleotide or protein sequences to sequence databases and calculating the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences, as well as to help identify members of gene families.

2.2.6.6 Pathway analysis tools

2.2.6.6.1 MetaCore™

Gene expression data generated from GPR analysis (see section 2.2.5.7) were uploaded into the online biological pathway analysis software MetaCore™, version 6.15 build 62452. Functional enrichment of the experimental dataset was performed using: 1) the Pathway Map analysis tool, to identify significantly associated pathways based on p-value and GPR fold-change; and 2) the Build Network for Your Experimental Data feature, using the Transcription Factor Targets Modelling algorithm with default settings under Analyse Networks (Transcription Factors), to generate sub-networks based on the presence of transcription factors and/or receptor targets within the original input file. Genes/proteins uploaded from experimental datasets, and from which pathways were built, were termed ‘seed nodes’.

2.2.6.6.2 DIANA-miRPath

Predicted NRSF target miRNAs were uploaded into the freely available DIANA-miRPath pathway analysis web-server, which utilises experimentally validated miRNA interactions derived from DIANA-TarBase v6.0. Using a meta-analysis algorithm, the software performs enrichment analysis of miRNA gene targets against the Kyoto Encyclopaedia of Genes and Genomes (KEGG) pathway database, a resource of pathway maps based on metabolism, cellular processes, genetic processing, environmental interactions and human diseases. The analysis generates KEGG pathway hits with a p-value of <0.05 (Vlachos et al., 2012). DIANA-miRPath can be accessed online.

2.2.6.7 Sequence Manipulation Suite

The CpG Islands sequence analysis function available at the Sequence Manipulation Suite reports potential CpG islands using the method described by Gardiner-Garden and Frommer (1987). The calculation is performed using a 200 bp window moving across the sequence at 1 bp intervals. CpG islands are defined as sequence ranges where the observed/expected value is greater than 0.6 and the GC content is greater than 50%. The expected number of CpG dimers in a window is calculated as the number of 'C's multiplied by the number of 'G's, divided by the window length.
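The window calculation described above can be sketched in a few lines. This is an illustrative re-implementation under the stated thresholds, not the Sequence Manipulation Suite's own code; the function name, the 0-based half-open coordinates and the choice to report each qualifying window separately are assumptions.

```python
def cpg_island_windows(sequence, window=200, min_ratio=0.6, min_gc=0.5):
    """Return (start, end) for every window that meets both CpG island criteria.

    Coordinates are 0-based and half-open; the window advances 1 bp at a time.
    """
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        c_count = chunk.count("C")
        g_count = chunk.count("G")
        observed = chunk.count("CG")  # CpG dimers in the window
        expected = c_count * g_count / window
        if expected == 0:
            continue  # no C or no G: the observed/expected ratio is undefined
        gc_fraction = (c_count + g_count) / window
        if observed / expected > min_ratio and gc_fraction > min_gc:
            hits.append((start, start + window))
    return hits
```

Recounting the bases for every window keeps the sketch short; for long sequences, running counts updated as the window slides would avoid the repeated work.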

2.2.6.8 UCSC Genome Browser and Galaxy

Bioinformatic analysis of the human and rat genomes was performed using the UCSC Genome Browser, assemblies hg19 and rn5, respectively. To intersect ENCODE transcription factor ChIP-seq data (March 2012 release) with flanking sequences of precursor-microRNAs (pre-miRNAs), data were uploaded through the Table Browser function on UCSC into the web-based platform Galaxy. The ChIP-seq data were then compared against the upstream flanking sequences of pre-miRNAs using the intersect tool under the Operate on Genomic Intervals function, which allows for intersection of the intervals of two datasets. Data were downloaded as a spreadsheet for analysis.
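The interval intersection performed in Galaxy can be approximated by the following sketch. It is not Galaxy's implementation: the function name, the tuple layout (chromosome, start, end), the BED-style half-open coordinates and the example coordinates in the usage note are assumptions for illustration.

```python
def intersect_intervals(first, second, min_overlap=1):
    """Return intervals from `first` that overlap any interval in `second`.

    Each interval is a (chromosome, start, end) tuple with a 0-based,
    half-open range, as in BED files.
    """
    kept = []
    for chrom, start, end in first:
        for other_chrom, other_start, other_end in second:
            if chrom != other_chrom:
                continue
            # Number of bases shared by the two intervals (<= 0 if disjoint).
            overlap = min(end, other_end) - max(start, other_start)
            if overlap >= min_overlap:
                kept.append((chrom, start, end))
                break  # report each interval from `first` only once
    return kept
```

For instance, with hypothetical pre-miRNA flanks `[("chr1", 1000, 2000), ("chr1", 5000, 6000)]` and a ChIP-seq peak `[("chr1", 1900, 2100)]`, only the first flank is returned. The nested loop is quadratic, which is acceptable for a sketch but would be replaced by sorted or indexed intervals for genome-scale data.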

2.2.7 Genotyping
