8.5 The Plant Microarray Databases
8.5.1 Project Specific Requirements
One goal of all Medicago-related projects is to deliver integrated databases that store the microarray-based expression data obtained in the course of the project, together with relevant information on the experimental conditions profiled and the protocols used to obtain the transcriptome profiles.
Currently, the sequencing of the Medicago truncatula genome is an ongoing project. Consequently, the reporter sequences of the microarrays were designed against expressed sequence tags (ESTs). This has no direct implications for data representation: the PCR primer pairs and oligonucleotide sequences can be entered directly into EMMA2 sequence objects. The ESTs are clustered into tentative consensus sequences (TCs). As new ESTs are sequenced or existing ESTs are resequenced, the assignment of ESTs to TCs may change, and with it the annotation of the TCs. Because ESTs are re-annotated frequently, the annotations have to be updated dynamically.
The EST annotation should be kept up to date by linking the internal sequence annotations to an external component via the BRIDGE integration component. Initially, no BRIDGE-aware component was available to link the representations against. Because the internal data representation of EMMA2 allows free-text annotations of sequences to be stored, this was used as a fall-back: the ESTs were linked against the TIGR Medicago gene index to map the ESTs onto their TCs, and the sequence descriptions in EMMA2 were updated regularly by a script. At a later stage, the sequence data were linked against the BRIDGE-aware SAMS application (see below).
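The periodic script-based update of the free-text descriptions can be sketched as follows. This is a minimal illustration, not the script actually used: it assumes the gene index release is available as tab-separated lines `EST_ID<TAB>TC_ID<TAB>TC_annotation` (a hypothetical format) and models the EMMA2 descriptions as a plain dictionary; the function names are hypothetical.

```python
from typing import Dict, Iterable


def parse_mapping(lines: Iterable[str]) -> Dict[str, str]:
    """Map each EST ID to a description combining its TC ID and TC annotation.

    Assumes hypothetical tab-separated lines: EST_ID, TC_ID, TC annotation.
    """
    mapping: Dict[str, str] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        est_id, tc_id, annotation = line.split("\t", 2)
        mapping[est_id] = f"{tc_id}: {annotation}"
    return mapping


def update_descriptions(descriptions: Dict[str, str],
                        lines: Iterable[str]) -> Dict[str, str]:
    """Overwrite stale descriptions in place and return only the changed entries.

    ESTs that are not represented on the array (not in `descriptions`) are ignored.
    """
    changed: Dict[str, str] = {}
    for est_id, new_text in parse_mapping(lines).items():
        if est_id in descriptions and descriptions[est_id] != new_text:
            descriptions[est_id] = new_text
            changed[est_id] = new_text
    return changed


if __name__ == "__main__":
    stored = {"EST001": "TC100: unknown protein", "EST002": "TC200: chitinase"}
    release = [
        "EST001\tTC105\tnodulin-like protein",  # EST was re-assigned to another TC
        "EST002\tTC200\tchitinase",              # unchanged
        "EST999\tTC300\tnot on the array",       # ignored
    ]
    print(update_descriptions(stored, release))  # {'EST001': 'TC105: nodulin-like protein'}
```

Returning only the changed entries keeps a re-run cheap and makes it easy to log which annotations moved between releases.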
Another specific requirement that emerged early in the GLIP project was a datamining component for sequence annotations. Users should be able to query expression data from the whole project, not only from individual experiments. Queries should be possible by known unique sequence identifiers or by a boolean full-text search within the annotations, resembling the search mechanisms employed by web search engines.
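A web-search-like boolean query over sequence annotations could work roughly as in the sketch below. The query syntax is an assumption for illustration, not the syntax of the EMMA2 tool: whitespace-separated terms are ANDed, an upper-case `OR` separates alternatives (lowest precedence), and a leading `-` excludes a term. A sequence also matches if its own identifier is given as a term.

```python
import re
from typing import Dict, List, Set

# Words are runs of letters, digits, or underscores, compared in lower case.
WORD = re.compile(r"[a-z0-9_]+")


def words_of(text: str) -> Set[str]:
    """Lower-case word set of an annotation text."""
    return set(WORD.findall(text.lower()))


def parse_query(query: str) -> List[List[str]]:
    """Split a query at 'OR' into alternatives; terms inside one alternative are ANDed."""
    alternatives: List[List[str]] = [[]]
    for term in query.split():
        if term == "OR":
            alternatives.append([])
        else:
            alternatives[-1].append(term.lower())
    return [alt for alt in alternatives if alt]


def alternative_matches(words: Set[str], terms: List[str]) -> bool:
    """True if every plain term occurs and no '-'-prefixed term occurs."""
    for term in terms:
        if term.startswith("-"):
            if term[1:] in words:
                return False
        elif term not in words:
            return False
    return True


def search(annotations: Dict[str, str], query: str) -> List[str]:
    """Sorted IDs whose ID or annotation satisfies the boolean query."""
    alternatives = parse_query(query)
    hits = []
    for seq_id, text in annotations.items():
        words = words_of(text) | {seq_id.lower()}  # identifiers are searchable too
        if any(alternative_matches(words, alt) for alt in alternatives):
            hits.append(seq_id)
    return sorted(hits)


if __name__ == "__main__":
    annotations = {
        "TC100": "nodulin-like protein, root nodule",
        "TC200": "acidic chitinase class III",
        "TC300": "basic chitinase",
        "TC400": "leghemoglobin 1",
    }
    print(search(annotations, "chitinase -acidic"))         # ['TC300']
    print(search(annotations, "nodulin OR leghemoglobin"))  # ['TC100', 'TC400']
    print(search(annotations, "TC200"))                     # ['TC200']
```

Treating the query as a disjunction of AND-groups keeps parsing trivial; supporting parentheses would require a small recursive-descent parser instead.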
8.5.2 Project Setup
The GLIP microarray database was set up using the standardized procedures. Databases for the ArrayLIMS and EMMA2 systems were created using the standard table definitions; no additional optimizations were initially required. The standard role definitions also proved sufficient for the GLIP database. The initial project administrator (termed 'Chief') was the only registered user at first; all other users were registered by the administrator on request.
Data privacy concerns raised by the user community led to the creation of several user groups. These make collaboration within an institution possible while withholding the data from other users for a limited period of time.
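Assuming the restriction is a time-limited embargo per user group, the access rule could be modelled as below. The classes, field names, and the release date are hypothetical and do not reflect the actual ArrayLIMS/EMMA2 role model.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet


@dataclass(frozen=True)
class Experiment:
    name: str
    group: str      # user group that owns the data, e.g. one institution
    release: date   # hypothetical end of the restricted period


@dataclass(frozen=True)
class User:
    login: str
    groups: FrozenSet[str]


def can_read(user: User, experiment: Experiment, today: date) -> bool:
    """Members of the owning group always have access; others only after release."""
    return experiment.group in user.groups or today >= experiment.release


if __name__ == "__main__":
    exp = Experiment("nodule time course", "institute_a", date(2005, 1, 1))
    alice = User("alice", frozenset({"institute_a"}))
    bob = User("bob", frozenset({"institute_b"}))
    print(can_read(alice, exp, date(2004, 6, 1)))  # True
    print(can_read(bob, exp, date(2004, 6, 1)))    # False
    print(can_read(bob, exp, date(2005, 1, 1)))    # True
```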
Integrating the requested novel datamining tool made it necessary to modify the database and the EMMA2 modules. Due to the flexible modular design,
this step required only limited effort. To make specific data fields searchable, a single full-text index was added to the Description object of the GLIP database. A display module and a backend module were added to the standard modules to provide the requested functionality (see Figure 8.10). After this implementation step and an automated database update, the datamining functionality became available in all other projects as well. Data integration with the ESTs stored in the SAMS system was established by adding BRIDGE URIs to the array layouts in EMMA2.
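What a full-text index on the descriptions buys can be illustrated with an in-memory inverted index. This is only a sketch under stated assumptions: in the system described above the index is a database full-text index, whereas here the descriptions, the expression rows `(sequence ID, experiment, condition, log ratio)`, and all function names are hypothetical stand-ins. The optional filters mirror the restriction to experiments and conditions offered by the Datamining Wizard.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple

# One expression value: (sequence ID, experiment, condition, log ratio).
Row = Tuple[str, str, str, float]


def build_index(descriptions: Dict[str, str]) -> Dict[str, Set[str]]:
    """Inverted index: lower-case word -> IDs of sequences whose description contains it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for seq_id, text in descriptions.items():
        for word in re.findall(r"[a-z0-9_]+", text.lower()):
            index[word].add(seq_id)
    return dict(index)


def lookup(index: Dict[str, Set[str]], words: Iterable[str]) -> Set[str]:
    """IDs whose description contains all given words (AND semantics)."""
    result: Optional[Set[str]] = None
    for word in words:
        ids = index.get(word.lower(), set())
        result = set(ids) if result is None else result & ids
    return result if result is not None else set()


def expression_for(hits: Set[str], rows: Iterable[Row],
                   experiments: Optional[Set[str]] = None,
                   conditions: Optional[Set[str]] = None) -> List[Row]:
    """Expression rows of the hit sequences, optionally restricted to experiments/conditions."""
    return [row for row in rows
            if row[0] in hits
            and (experiments is None or row[1] in experiments)
            and (conditions is None or row[2] in conditions)]


if __name__ == "__main__":
    descriptions = {"S1": "nodulin 26", "S2": "nodulin-like protein", "S3": "chitinase"}
    rows = [("S1", "nodulation", "5 dpi", 2.1), ("S2", "nodulation", "5 dpi", 0.3),
            ("S1", "flower", "bud", -0.4), ("S3", "nodulation", "5 dpi", 1.2)]
    index = build_index(descriptions)
    hits = lookup(index, ["nodulin"])
    print(sorted(hits))  # ['S1', 'S2']
    print(expression_for(hits, rows, experiments={"nodulation"}))
```

The index is built once, so each query costs a few set intersections instead of a scan over every description.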
8.5.3 Results
After the set-up phase, the users were able to upload experimental data to the ArrayLIMS and to use the analysis pipelines autonomously. Currently, the plant databases contain a total of over 300 hybridizations.
Several relevant findings were obtained by using the microarray data in conjunction with the EMMA2 analysis pipelines. In a study by Yahyaoui et al. (2004), over 750 genes, including a large proportion of transcription factors, were found to be differentially expressed during root nodulation using the Mt6k-RIT macroarrays and microarrays. The authors applied a pipeline of normalization and t-tests with a combined filtering strategy, followed by hierarchical cluster analysis. By visual inspection of the clustering results, the authors arrived at five independent clusters and concluded that there is a clear switch between a general root-specific and a nodule-specific gene expression program.
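The general shape of such a normalization, t-test, and combined-filter pipeline is sketched below. The concrete choices are assumptions for illustration and not the settings used by Yahyaoui et al.: normalization is a simple per-array median centering of log2 ratios (M values), significance is a one-sample t statistic over replicate arrays, and a gene is kept only if it passes both a fold-change cut-off (|mean M| >= 1, i.e. at least 2-fold) and a t cut-off (|t| >= 4, an arbitrary value). The hierarchical clustering step is not shown.

```python
import math
import statistics
from typing import Dict, List

# arrays[k][gene] is the log2 expression ratio (M value) of `gene` on replicate array k.
Arrays = List[Dict[str, float]]


def median_center(arrays: Arrays) -> Arrays:
    """Simple global normalization: subtract each array's median M value."""
    centered = []
    for array in arrays:
        med = statistics.median(array.values())
        centered.append({gene: m - med for gene, m in array.items()})
    return centered


def t_statistic(values: List[float]) -> float:
    """One-sample t statistic for H0: mean M value == 0 (needs >= 2 replicates)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return math.copysign(math.inf, mean) if mean != 0 else 0.0
    return mean / (sd / math.sqrt(len(values)))


def differentially_expressed(arrays: Arrays, min_abs_m: float = 1.0,
                             min_abs_t: float = 4.0) -> Dict[str, float]:
    """Combined filter: keep genes passing both a fold-change and a t cut-off.

    Returns gene -> mean normalized M value for the genes that pass.
    """
    normalized = median_center(arrays)
    selected: Dict[str, float] = {}
    for gene in normalized[0]:
        values = [array[gene] for array in normalized]
        mean_m = statistics.mean(values)
        if abs(mean_m) >= min_abs_m and abs(t_statistic(values)) >= min_abs_t:
            selected[gene] = mean_m
    return selected


if __name__ == "__main__":
    arrays = [
        {"g1": 2.1, "g2": 0.1, "g3": -1.6, "g4": 0.0, "g5": 1.5},
        {"g1": 2.0, "g2": -0.1, "g3": -1.4, "g4": 0.1, "g5": -0.5},
        {"g1": 2.2, "g2": 0.0, "g3": -1.5, "g4": -0.1, "g5": 1.0},
    ]
    print(sorted(differentially_expressed(arrays)))  # ['g1', 'g3']
```

In this toy data, g5 has a large value on two arrays but is rejected because its mean ratio stays below 2-fold; requiring both criteria is what makes the filter "combined".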
Based on Mt6k-RIT microarray hybridizations, several comparative transcription profiling studies of root nodules and of root tissues during arbuscular mycorrhiza (AM) formation (Küster et al., 2004; Manthey et al., 2004) now allow a more global comparison of expression profiles during nodulation and mycorrhiza formation. It was found that the two endosymbioses, although known to share common mechanisms, have only limited overlap in their genetic programs, with 75 genes co-induced in both interactions.
The article by Firnhaber and colleagues provides insights into the developmental regulation of gene expression in M. truncatula flowers and pods. The authors describe the extension of the Mt6k-RIT towards the Mt8k microarrays and their subsequent application to identify more than 700 genes with developmentally regulated expression (Firnhaber et al., 2005).
In a recent study, the more comprehensive Mt16kOli1 70mer oligonucleotide microarrays were applied to specify the overlapping genetic program activated by two commonly studied microsymbionts, Glomus mosseae and Glomus intraradices. Using normalization functions and statistical analysis pipelines implemented in EMMA2, a total of 201 plant genes were found to be significantly co-induced at least 2-fold in both interactions (Hohnjec et al., 2005). A set of well-known marker genes was found to be co-activated, thus validating the transcriptomics data (Hohnjec et al., 2006). As EMMA2 is used throughout all three plant-related functional genomics projects, a cross-project integration of the obtained expression data is feasible.
Figure 8.10: The Datamining Wizard of EMMA2. It allows users to search for expression data with a boolean full-text search. The search can be restricted to experiments and conditions. Below the search mask, a table lists the search results.