
of a transcription factor induces the expression of a target gene in one tissue, this might not be the case in another tissue. Generalizing experimental results as far as possible is a crucial issue, but the difficulty of this inductive step is one of the reasons why biology is such a complex science. For working with pathway queries, we suggest generalizing and transferring facts liberally, so that as much knowledge as possible can be utilized for the generation of instances. The scoring methods and manual inspection must then determine the relevant instances.

In summary, the pathway query language makes it possible to use background knowledge in the form of networks at different levels of detail. The more detailed the available networks are, the more specific the pathway queries that can be developed and analyzed. Pathway queries are no substitute for detailed dynamical models, but they represent an efficient way of querying available knowledge and putting it into context with experimental data.

7.2 Future challenges

Advances in biology and especially in bioinformatics are often driven by the technology of biological measurements. Novel experimental methods often allow new biological questions to be addressed and produce new types of data, so novel algorithmic techniques are frequently required as well. Thus, if we would like to know what awaits bioinformatics in the coming years, we have to take a look at developments in biotechnology.

Not only biotechnology but also new insights in biology can lead to new challenges. One example of an insight that raised new problems is the changing understanding of what a gene is. Not long ago it was believed that in most cases one gene would correspond to one protein. While alternative splicing is still not well understood today, it is generally believed to play a fundamental role in generating the great functional diversity of the human proteome.

The analysis of biological data like sequences or mRNA expression in order to answer specific research questions is at the core of bioinformatics. But besides this targeted data analysis, the integration of data may allow new questions to be addressed, which we tried to demonstrate in this thesis. This kind of data integration is another field where many challenges lie ahead. Especially for industrial applications of bioinformatics methods, it will be crucial to provide data from many sources together with algorithms working on these data.

In the following, challenges arising from these different aspects will be discussed.

7.2.1 New generation of microarrays

Due to improvements in lithography technology, the main provider of high-density oligonucleotide microarrays, Affymetrix, can print more and more probes on a single array. The next generation of Affymetrix microarrays will include the All-Exon arrays and Tiling arrays. The All-Exon array type has recently become available and the Tiling arrays will be available soon. Both types have several million probes on a single microarray. The All-Exon array has a probe set (usually consisting of four probes) for every exon supported by an Ensembl or RefSeq transcript sequence and, in addition, many putative exons derived by gene prediction software and other methods. This technology makes it possible to discover differential splicing events, i.e. genes that are spliced differently in two experimental conditions. Splicing is believed to have great influence on the functional diversity of proteins, and differential splicing therefore represents a regulatory mechanism that can, for the first time, be investigated on a large scale. The final goal of an analysis with exon arrays is to estimate the expression level not only for genes, but for all possible transcripts. Tiling arrays represent the complete genome sequence, except repetitive elements, at a certain resolution, and thus allow for the identification of previously unknown transcribed elements. Possible applications of tiling arrays are discussed in Bertone et al. (2005).
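To make the idea of detecting differential splicing from exon-level measurements concrete, the following is a minimal sketch of one commonly used heuristic, a splicing index: the exon signal is normalized by the gene-level signal in each condition, and the log2 ratio of the two normalized values is compared against a cutoff. This is not a method described in this thesis. The intensities, the exon names, and the cutoff of 1.0 are hypothetical, and a real analysis would additionally require replicates, background correction, and a statistical test.

```python
import math


def splicing_index(exon_a, gene_a, exon_b, gene_b):
    """Log2 ratio of gene-normalized exon signal between conditions A and B."""
    return math.log2((exon_a / gene_a) / (exon_b / gene_b))


# Hypothetical summarized intensities for one gene with three exons.
# The gene as a whole is twice as highly expressed in condition B.
gene_level = {"A": 1000.0, "B": 2000.0}
exon_level = {
    "exon1": {"A": 500.0, "B": 1000.0},  # follows the gene-level change
    "exon2": {"A": 400.0, "B": 100.0},   # relatively depleted in condition B
    "exon3": {"A": 250.0, "B": 500.0},   # follows the gene-level change
}

indices = {
    exon: splicing_index(v["A"], gene_level["A"], v["B"], gene_level["B"])
    for exon, v in exon_level.items()
}

# Assumed cutoff: an exon whose normalized signal differs at least
# two-fold between the conditions is reported as a candidate.
CUTOFF = 1.0
candidates = [exon for exon, si in indices.items() if abs(si) >= CUTOFF]

for exon, si in indices.items():
    print(f"{exon}: splicing index = {si:+.2f}")
print("candidate differential splicing events:", candidates)
```

Normalizing by the gene-level signal separates a change in overall expression (exon1 and exon3 here) from a change in exon usage (exon2), which is the distinction exon arrays are meant to resolve.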

7.2.2 Proteomics and metabolomics

Mass spectrometry (MS) is currently the key technology for proteomics and metabolomics, and advances in related technologies drive the development of both fields. Glinski and Weckwerth (2005) demonstrate this connection for studies in the area of plant physiology. They also describe what can be achieved with state-of-the-art MS technology. Fiehn (2002) provides an overview of metabolomic analysis methods, their differences and terminology. In addition, some approaches for mining metabolite data and modeling the metabolic behavior of an organism are described.

Today, models of the metabolism of many organisms are available at a quite detailed level. The stoichiometry of most relevant reactions as well as the participating enzymes are known. One important challenge is the integration of transcriptomic data with metabolomic data. Such integrated models will foster the understanding of the interplay between metabolism and regulation of gene expression.
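As an illustration of what a stoichiometric model looks like in its simplest form, the sketch below encodes a toy linear pathway as a stoichiometric matrix S and checks whether a flux vector v satisfies the steady-state condition S v = 0. The pathway, the reaction names, and the flux values are hypothetical and are not taken from any of the models referred to above.

```python
# Toy linear pathway (hypothetical): -> A -> B -> C ->
METABOLITES = ["A", "B", "C"]
REACTIONS = ["uptake_A", "A_to_B", "B_to_C", "export_C"]

# Stoichiometric matrix S: one row per metabolite, one column per reaction.
# An entry is +1 if the reaction produces the metabolite, -1 if it consumes it.
S = [
    [1, -1, 0, 0],   # A
    [0, 1, -1, 0],   # B
    [0, 0, 1, -1],   # C
]


def net_production(S, fluxes):
    """Net production rate of each metabolite, i.e. the product S * v."""
    return [sum(c * v for c, v in zip(row, fluxes)) for row in S]


def is_steady_state(S, fluxes, tol=1e-9):
    """True if no metabolite accumulates or is depleted (S * v = 0)."""
    return all(abs(rate) <= tol for rate in net_production(S, fluxes))


balanced = [2.0, 2.0, 2.0, 2.0]
unbalanced = [2.0, 2.0, 1.0, 1.0]  # B is produced faster than it is consumed

print(is_steady_state(S, balanced))
print(dict(zip(METABOLITES, net_production(S, unbalanced))))
```

In an integrated model, expression data for the enzymes catalyzing each reaction could, for example, be used to constrain the admissible flux values; how best to do this is part of the challenge named above.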

7.2.3 MicroRNA and epigenetics

Regulation through microRNAs and regulation through DNA methylation are mechanisms that receive increasing attention in biological research. While expression regulation through protein-DNA interaction by transcription factors is in principle quite well understood, the importance of these two mechanisms has only recently become obvious. Large-scale measurements for microRNAs will soon become available as they can be performed with microarrays, for instance with the tiling arrays described above. So far, no technology for large-scale measurements of DNA methylation states has been developed.

These two new regulatory mechanisms add a new dimension to the process of gene and protein regulation. Since microRNA targets might be easy to identify by sequence methods and microRNA concentrations can be measured with microarrays, one can hope that these regulators will soon be well understood and that their influence can be accounted for in regulatory models.
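A minimal sketch of what a sequence-based search for microRNA targets can look like is given below. It assumes the widely used simplification that a target site is a perfect match to the reverse complement of the microRNA seed (nucleotides 2 to 8, counted from the 5' end). The microRNA sequence is used only as an example, and the UTR sequence is made up for illustration. Actual target prediction methods add further criteria such as conservation and site context.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse complement of an RNA sequence given in 5'->3' orientation."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_sites(mirna, utr):
    """0-based start positions in `utr` that perfectly pair with the seed."""
    seed = mirna[1:8]                  # nucleotides 2-8 of the microRNA
    site = reverse_complement(seed)    # sequence expected in the target mRNA
    positions = []
    start = utr.find(site)
    while start != -1:
        positions.append(start)
        start = utr.find(site, start + 1)  # also finds overlapping matches
    return positions


mirna = "UGAGGUAGUAGGUUGUAUAGUU"      # example microRNA, 5'->3'
utr = "AAGCUACCUCAUUUGGCUACCUCGA"     # hypothetical 3' UTR fragment, 5'->3'

hits = seed_sites(mirna, utr)
print("seed:", mirna[1:8], "site:", reverse_complement(mirna[1:8]))
print("seed match positions:", hits)
```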

It is also possible to induce gene silencing with artificially introduced RNAs, which is then called RNA interference (RNAi). Using so-called cell microarrays, genome-wide loss-