CHAPTER II: MICROARRAY ANALYSIS AFTER T7-BASED RNA
3.5. Data retrieval: the database web interface
A set of web pages was set up to provide a user-friendly interface to the LPD (Fig 3.5.1), which can be found at http://w3pain.biochem.ucl.ac.uk/idiboun/develop/search/searchCommonGenes/introduction.php. The web pages allow various types of data to be retrieved from the LPD and were designed around a set of anticipated use cases specified by potential users from the LPC. One important use case was the retrieval of genes showing a similar pattern of expression regulation across a number of microarray pain experiments. Figure 3.5.1 shows the form that allows this search to be conducted. Drop-down menus and free-text fields allow the user to specify the required search parameters. These include the pain model(s) of interest, so that all microarray experiments featuring the model(s) are compared, or alternatively a subset of experiments of particular interest to the user. They also include the desired fold change or significance value, which restricts the results to the most significant subset of the common genes. Importantly, the ability to identify common genes between different experiments relies on the mapping between the heterogeneous gene identifiers of the different array platforms, discussed earlier.
3. A database of gene expression data from animal models of peripheral neuropathy
Figure 3.5.1. LPD meta-analysis web pages, showing (A) the search form used to retrieve genes commonly regulated across a number of selected expression studies or pain/neuropathy models, and (B) the results of this search.
Figure 3.5.1-B shows the results of a search for commonly regulated genes across a number of randomly selected studies. The results for each gene are shown in a separate table. Each row of the table gives the information held on the gene in one of the selected datasets, including the gene identifier, a textual description of the gene's function and the fold change.
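The search described above can be illustrated with a minimal Python sketch. It is not the LPD implementation, which is not shown in this chapter; it only assumes that each experiment supplies log2 fold changes keyed by platform-specific probe identifiers, that a mapping from probe identifiers to a common gene identifier is available, and that "a similar pattern of regulation" means the same direction of change in every selected experiment. The function name, the threshold parameter and all identifiers in the example are hypothetical.

```python
from collections import defaultdict


def common_regulated_genes(experiments, id_map, min_abs_log2_fc=1.0):
    """Return genes that pass the fold-change threshold and change in the
    same direction in every experiment.

    experiments: dict of experiment name -> {probe id: log2 fold change}
    id_map: dict of probe id -> common gene identifier (assumed mapping)
    """
    per_gene = defaultdict(dict)
    for exp_name, probes in experiments.items():
        for probe_id, log2_fc in probes.items():
            gene = id_map.get(probe_id)
            # Skip unmapped probes and changes below the threshold.
            if gene is None or abs(log2_fc) < min_abs_log2_fc:
                continue
            # If several probes map to one gene, keep the strongest change.
            previous = per_gene[gene].get(exp_name)
            if previous is None or abs(log2_fc) > abs(previous):
                per_gene[gene][exp_name] = log2_fc

    result = {}
    for gene, changes in per_gene.items():
        if len(changes) != len(experiments):
            continue  # not regulated in every selected experiment
        values = list(changes.values())
        if all(v > 0 for v in values) or all(v < 0 for v in values):
            result[gene] = changes
    return result


# Hypothetical data: two studies on different array platforms.
experiments = {
    "study_1": {"p1": 2.1, "p2": -1.5, "p3": 0.2},
    "study_2": {"q1": 1.4, "q2": 1.8, "q3": -2.0},
}
id_map = {
    "p1": "geneA", "q1": "geneA",
    "p2": "geneB", "q2": "geneB",
    "p3": "geneC", "q3": "geneC",
}

common = common_regulated_genes(experiments, id_map)
for gene, changes in common.items():
    for exp_name, log2_fc in changes.items():
        print(gene, exp_name, log2_fc)
```

As in the result tables described above, each common gene is reported with one entry per selected dataset. A significance value could be used as the filter instead of the fold change by replacing the threshold test.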
Beyond searching for commonly regulated genes across microarray experiments, an important use case was the ability to browse functional information for lists of genes of interest, such as those obtained from cross-comparing microarray experiments. Figure 3.5.2 shows the LPD web interface that breaks down the functional information for a given gene in a gene list by the degree of homology to the protein serving as the annotation source and by the type of annotation (KEGG or GO).
Figure 3.5.2. LPD functional annotation web pages. For each gene/probeset, GO and KEGG functional information are broken down by sequence identity to BioMap protein homologs serving as the source of annotation.
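The breakdown shown in Figure 3.5.2 can be sketched as a simple grouping step. The Python example below is illustrative only: the record layout, the identity bands and all names are assumptions, since the chapter does not state the cut-offs or data structures used by the LPD.

```python
from collections import defaultdict

# Hypothetical identity bands (lower bound in percent, label), checked in order.
IDENTITY_BANDS = [(90.0, "90-100%"), (70.0, "70-90%"), (0.0, "<70%")]


def identity_band(percent_identity):
    """Return the label of the first band whose lower bound is met."""
    for lower, label in IDENTITY_BANDS:
        if percent_identity >= lower:
            return label
    raise ValueError("percent identity must be non-negative")


def group_annotations(records):
    """Group annotation terms by identity band, then by source (GO or KEGG)."""
    grouped = defaultdict(lambda: defaultdict(list))
    for rec in records:
        band = identity_band(rec["identity"])
        grouped[band][rec["source"]].append(rec["term"])
    # Convert to plain dicts for display and comparison.
    return {band: dict(by_source) for band, by_source in grouped.items()}


# Hypothetical annotations for one probeset, taken from two protein homologs.
records = [
    {"homolog": "homolog_1", "identity": 98.5, "source": "GO", "term": "go_term_1"},
    {"homolog": "homolog_1", "identity": 98.5, "source": "KEGG", "term": "kegg_pathway_1"},
    {"homolog": "homolog_2", "identity": 74.0, "source": "GO", "term": "go_term_2"},
]

grouped = group_annotations(records)
for band, by_source in grouped.items():
    for source, terms in by_source.items():
        print(band, source, terms)
```

Grouping by identity first lets a user judge how far each annotation can be trusted before looking at the annotation type.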
3.6. Conclusion
Microarray screening generates data on a genomic scale. Setting up a microarray database capable of handling such data efficiently is a non-trivial task, further compounded by the need to project functional annotations onto the gene expression data. These annotations are heterogeneous in nature and often use different nomenclature schemes to refer to the same genes, which adds significantly to the complexity of the task. Furthermore, the need to capture information on the microarray experimental procedure introduces an additional layer of data, leading to an even more complex underlying database schema.
The work presented in this chapter has shed light on some of the overheads associated with setting up a microarray database. First, the integration of disparate gene expression and functional datasets proved challenging, and maintaining it requires a considerable amount of time and resources. Second, our choice of a simpler data model than MIAME, although beneficial in reducing the complexity of the data model, proved occasionally inadequate: it failed to capture more complex microarray experimental designs, such as time course experiments, and offered little assistance with constructing MIAME-compliant descriptions of LPC microarray experiments.
In effect, many of these complex tasks, such as data integration and the formalisation of descriptions of microarray experiments based on the MIAME standard, are fairly non-specialised procedures that can be handled with generic software. This is because the MIAME data model was designed to be general enough to accommodate the different microarray experimental designs that might be applied to study any biological phenomenon. Similarly, industry-manufactured genome-wide arrays, such as Affymetrix arrays, are becoming very popular among research communities undertaking microarray work. Because of their popularity, robust functional annotations for these arrays have already been assembled and are constantly revised by many independent sources; examples are the annotations provided by Ensembl and Bioconductor.
Free microarray software platforms are key to exploiting generic software solutions for the routine handling of microarray data. For instance, as outlined in the introduction of this chapter, many provide user-friendly tools for entering experimental data in the MIAME format and use the logic of the MIAME model to support downstream statistical analysis of the data. Functional annotations for array probes are built in, and additional annotations may be easily incorporated, which also provides a mechanism for easy updates. Moreover, many free microarray software platforms provide generic tools for meta-analysis of the data, notably cross-comparison of gene lists across different datasets from similar array platforms.
Open source software systems therefore constitute ideal microarray data management platforms. In addition to offering basic generic functionality for handling microarray data, these tools are often fully extensible, which allows them to host additional tools tailored to the specific needs of specialised research communities. In the future, the LPD will benefit from an open source software solution by adapting the maxd software (highlighted in the introduction section), which offers several benefits. First, maxd accepts customised MIAME data models and assists in their development, an attractive feature that, together with the use of ontologies, will help the LPD evolve into a pain knowledge-base repository. Second, maxd has a range of data browsing and analysis tools that would allow users of the LPD to conduct basic manipulations and searches of the data. Finally, maxd is configured to allow easy incorporation of additional functionality. This feature will be used to incorporate in-house analysis protocols as well as other free analysis software tools such as MatchMiner (Bussey et al., 2003). The latter is a tool for mapping heterogeneous gene identifiers, which is instrumental for cross-comparison of microarray results obtained with different array platforms.
4. A Gene ontology based model of the functional characteristics of peripheral neuropathy
4.1. Introduction