Chapter 2 Methods
2.1 Experimental methods
2.1.23 GraphPad Prism
GraphPad Prism is commercial 2D scientific graphing and statistics software published by GraphPad Software, Inc. Its interface supports both the plotting of graphs and the statistical analysis of data. It was used for the plotting and statistical analysis of my experimental data.
2.2 Bioinformatics tools
A range of bioinformatic tools has been used throughout this thesis. Publicly available microarray data helped to guide the initial experiments and has provided direction for future work. Sections 2.2.1 to 2.2.5 describe the programs that were used to create a Cytoscape network of OA, as described in 2.2.6.
2.2.1 GEO
The Gene Expression Omnibus (GEO) is a publicly available international repository run by the National Center for Biotechnology Information (Edgar et al. 2002). It archives and freely distributes microarray, next-generation sequencing, and other high-throughput functional genomic data sets. For each study, GEO stores the study details, links to any associated publications, raw data files, processed data, and descriptive metadata, all of which are indexed, cross-linked, and searchable. The database can be searched by any property of the data, for example organism, cell type, treatment or disease.
2.2.2 R
R is a language and environment for statistical computing and graphics (R Core Team 2013). It is similar to the S language, which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R is open source software offering a wide variety of statistical and graphical techniques. It can also be extended with “packages” that tailor it to specific needs. It allows the import, storage and manipulation of large data files, making it well suited to handling microarray data. Some R packages were also useful for combining multiple microarray data files. Packages from the Bioconductor project were used regularly.
2.2.2.1 Bioconductor
Bioconductor is an open source project that provides tools for the analysis and comprehension of high-throughput genomic data (Gentleman et al. 2004). It allows data to be taken directly from GEO or imported from local data files. It provides the “rma” function, which computes the RMA (Robust Multichip Average) expression measure described in Irizarry et al. (2003), providing a quick and uniform way to normalise microarray data. It also contains the function “ComBat”, which allows the user to correct for batch effects when the source of the batch effects is known. The methodology used is described in Johnson et al. (2007).
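RMA combines background correction, quantile normalisation and a median-polish summarisation of probe sets. As an illustration of the quantile normalisation step only, the following minimal Python sketch forces every array to share the same intensity distribution. It is not the Bioconductor implementation: the function name, the toy intensities and the simple tie handling are all assumptions made for the example.

```python
def quantile_normalise(arrays):
    """Quantile-normalise equal-length lists of probe intensities.

    Each inner list is one (hypothetical) microarray, with probes in the
    same order. Ties are broken by probe order, which is a simplification.
    """
    n_probes = len(arrays[0])
    # Sort every array, then average across arrays at each rank.
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [
        sum(s[rank] for s in sorted_arrays) / len(arrays)
        for rank in range(n_probes)
    ]
    normalised = []
    for a in arrays:
        # Probe indices ordered from lowest to highest intensity.
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, probe in enumerate(order):
            # Replace each value with the mean of its rank across arrays.
            out[probe] = rank_means[rank]
        normalised.append(out)
    return normalised


# Toy intensities for three arrays and four probes (made-up values).
arrays = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 4.5, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]
normalised = quantile_normalise(arrays)
```

After normalisation every array contains the same set of values, so differences between arrays reflect only the rank of each probe within its array.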
2.2.3 ARACNE
The Algorithm for the Reconstruction of Accurate Cellular Networks (ARACNE) is an algorithm that uses mutual information (MI) to quantify interactions between genes (Margolin et al. 2006). MI quantifies the "amount of information" obtained about one random variable by observing another. In this case the variables are gene expression levels, and MI measures how much a change in the expression of one gene affects the expression of another. MI is scored between zero and one, with one being the highest. The correspondence between the expression of each pair of genes across the data set is used to determine candidate interactions and their MI values.
The data processing inequality is then used to remove the majority of the indirect candidate interactions that are inferred by co-expression methods. The algorithm provides a way of using microarray data to expose the functional mechanisms that underlie cellular processes.
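The two steps described above can be sketched in Python as follows. This is an illustration under stated assumptions, not ARACNE itself: MI is estimated here with a simple rank-based histogram (ARACNE's own estimator is not reproduced), the values are raw estimates in nats and are not rescaled, and the gene names, expression values and `tolerance` parameter are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def discretise(values, n_bins=3):
    """Assign each value to one of n_bins equal-frequency (rank-based) bins."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(values)
    return bins


def mutual_information(x, y, n_bins=3):
    """Histogram estimate of MI (in nats) between two expression profiles."""
    bx, by = discretise(x, n_bins), discretise(y, n_bins)
    n = len(x)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    return sum(
        (c / n) * math.log(c * n / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )


def apply_dpi(mi, tolerance=0.0):
    """Drop the weakest edge of every fully connected gene triplet."""
    genes = sorted({g for pair in mi for g in pair})
    to_remove = set()
    for triplet in combinations(genes, 3):
        edges = [frozenset(p) for p in combinations(triplet, 2)]
        if not all(e in mi for e in edges):
            continue  # the inequality only applies to closed triangles
        weakest = min(edges, key=lambda e: mi[e])
        others = [mi[e] for e in edges if e != weakest]
        if mi[weakest] < min(others) * (1 - tolerance):
            to_remove.add(weakest)
    return {e: v for e, v in mi.items() if e not in to_remove}


# Hypothetical profiles: B tracks A closely and C tracks B closely,
# so A and C are associated only indirectly.
expression = {
    "A": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "B": [1, 2, 4, 5, 6, 3, 7, 8, 9],
    "C": [1, 2, 4, 5, 7, 3, 6, 8, 9],
}
mi = {
    frozenset((g1, g2)): mutual_information(expression[g1], expression[g2])
    for g1, g2 in combinations(expression, 2)
}
pruned = apply_dpi(mi)
```

In this toy data the A to C edge has the lowest MI of the three, so it is treated as indirect and removed, leaving A to B and B to C.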
2.2.4 Cytoscape
Cytoscape is an open source software platform used primarily for the visualisation of molecular interaction networks and for integrating these networks with gene expression profiles and other state data (Shannon et al. 2003). The Cytoscape core distribution provides a basic set of features for data integration, analysis, and visualisation. These allow the visual presentation of the network to be changed and the data to be sorted by whatever value has been used to quantify interactions. For example, when looking at MI, the data can be sorted so that any interactions with an MI lower than a set threshold are deleted. This helps to identify the strongest interactions in the data. Additional features, known as apps, can be added to Cytoscape. Throughout this thesis I have used two: “clusterMaker” and “PhenomeScape”.
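The thresholding described above is performed interactively in Cytoscape, but the same operation can be sketched in a few lines of Python. The gene names, MI values and threshold below are hypothetical and serve only to illustrate the idea.

```python
def filter_edges(edges, threshold):
    """Keep interactions with MI at or above the threshold, strongest first."""
    kept = [edge for edge in edges if edge[2] >= threshold]
    return sorted(kept, key=lambda edge: edge[2], reverse=True)


# Each edge is (gene 1, gene 2, MI); all values are made up.
edges = [
    ("GENE_A", "GENE_B", 0.82),
    ("GENE_A", "GENE_C", 0.15),
    ("GENE_B", "GENE_D", 0.47),
    ("GENE_C", "GENE_D", 0.05),
]
strong = filter_edges(edges, threshold=0.4)
```

Only the two interactions with an MI of at least 0.4 remain, ordered from strongest to weakest.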
2.2.4.1 clusterMaker
clusterMaker is an open source Cytoscape app that contains multiple clustering algorithms, which make it possible to locate distinct clusters of genes in the network that interact with each other (Morris et al. 2011). These clusters can then be presented as hierarchical groups of nodes and exported to other programs to determine the processes governed by each specific group of genes.
Community clustering (GLay) was used as it is a powerful tool that can quickly identify groups of genes that interact mainly with each other (Su et al. 2010). It then presents these clusters in a uniform order, with the largest cluster at the beginning followed by smaller clusters, leading to single genes at the end.
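The ordering of the output, from largest cluster to single genes, can be illustrated with a short Python sketch. GLay's community detection is not reproduced here: connected components are used as a much simpler stand-in for clusters, and the gene names and edges are hypothetical.

```python
from collections import defaultdict


def groups_by_size(nodes, edges):
    """Group nodes into connected components, largest group first.

    Connected components are a stand-in for community clusters; they only
    separate genes that share no path at all.
    """
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, groups = set(), []
    for start in nodes:
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:  # depth-first walk from the starting gene
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(adjacency[node] - group)
        seen |= group
        groups.append(sorted(group))
    # Largest groups first; ties are ordered alphabetically.
    return sorted(groups, key=lambda g: (-len(g), g))


nodes = ["G1", "G2", "G3", "G4", "G5", "G6"]
edges = [("G1", "G2"), ("G2", "G3"), ("G4", "G5")]
clusters = groups_by_size(nodes, edges)
```

Here the three-gene group is listed first, followed by the pair, with the unconnected gene G6 at the end.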
2.2.5 DAVID
The Database for Annotation, Visualization and Integrated Discovery (DAVID) is an integrated biological knowledgebase that contains analytical tools for extracting biological meaning from gene lists (Huang et al. 2009). It requires an input gene list containing common identifiers, such as official gene symbols.
The DAVID Bioinformatics Resources consist of five integrated, web-based functional annotation tool suites: the DAVID Gene Functional Classification Tool, the Functional Annotation Tool, the Gene ID Conversion Tool, the Gene Name Viewer and the DAVID NIAID Pathogen Genome Browser. I made use of the DAVID Gene Functional Classification Tool, which allows functional annotation clustering of a gene list based on enriched biological themes. Using pre-selected biological themes, such as GO terms or KEGG pathways, the system looks for terms that contain a significantly large number of genes from the input list. This gives an indication of the biological functions in which the genes in the list could be involved.
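The idea of a term containing a "significant number" of genes from a list can be illustrated with a standard hypergeometric over-representation test. The Python sketch below is an assumption-laden illustration, not DAVID's own calculation (DAVID reports a modified Fisher exact statistic, the EASE score, which is not reproduced here). The function name and all gene counts are hypothetical.

```python
from math import comb


def enrichment_p_value(list_hits, list_size, background_hits, background_size):
    """Probability of at least list_hits annotated genes in a random list.

    Genes are drawn without replacement from a background of
    background_size genes, of which background_hits carry the term.
    """
    upper = min(list_size, background_hits)
    total = comb(background_size, list_size)
    tail = sum(
        comb(background_hits, k)
        * comb(background_size - background_hits, list_size - k)
        for k in range(list_hits, upper + 1)
    )
    return tail / total


# Made-up counts: 20 background genes, 5 annotated with a term,
# and a list of 4 genes of which 3 carry the term.
p_value = enrichment_p_value(
    list_hits=3, list_size=4, background_hits=5, background_size=20
)
```

A small p-value indicates that the term occurs in the gene list more often than expected by chance given its frequency in the background.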