
1.4 Definitions and notation

1.4.2 Graphs and Petri nets

As most parts of this work deal with biological networks, a computational representation is needed. Graphs are a well-studied data structure in computer science and are well suited to represent biological networks. A graph consists of vertices and edges; in many applications vertices represent some kind of objects, and edges describe relationships among those objects.

In our case, vertices will most of the time correspond to biological entities, mainly genes and proteins, while edges represent interactions between those entities. In some cases, a different representation is advantageous, where we have two different sets of vertices, one of which describes the biological entities and the other one their interactions. Edges in that kind of graph can only occur between the two types of vertices, defining which biological entities participate in which interaction. This can be useful for instance in metabolic networks where interactions correspond to chemical reactions with several participating metabolites and enzymes. A graph with two sets of vertices and edges only between those sets is called a bipartite graph.

The formal definition of a graph is as follows.


Definition 1.4.2. A graph G is a tuple of vertices and edges, G = (V, E). In a directed graph, the edges are ordered pairs of vertices, E ⊂ V × V; in an undirected graph, the edges are unordered pairs, E ⊂ {{u, v} : u, v ∈ V}.

A subgraph G′ of G is a graph G′ = (V′, E′) with V′ ⊆ V and E′ ⊆ E. An induced subgraph is defined by selecting only a subset of vertices, while the edges are the same as in the original graph, but restricted to the selected set of vertices: V′ ⊆ V and ∀ v, w ∈ V′: (v, w) ∈ E′ if and only if (v, w) ∈ E.
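The definitions of graph, subgraph, and induced subgraph can be illustrated with a minimal Python sketch. It assumes directed graphs stored as a set of vertices and a set of ordered pairs; the function names and the example vertices A to D are hypothetical and chosen only for illustration.

```python
def is_directed_graph(vertices, edges):
    """Check that every edge is an ordered pair of vertices from V."""
    return all(v in vertices and w in vertices for (v, w) in edges)


def is_subgraph(sub_vertices, sub_edges, vertices, edges):
    """Check V' ⊆ V and E' ⊆ E, and that (V', E') is itself a graph."""
    return (
        sub_vertices <= vertices
        and sub_edges <= edges
        and is_directed_graph(sub_vertices, sub_edges)
    )


def induced_subgraph(vertices, edges, selected):
    """Keep exactly those edges of E whose endpoints both lie in the selection."""
    selected = set(selected)
    if not selected <= vertices:
        raise ValueError("selected vertices must be a subset of V")
    kept = {(v, w) for (v, w) in edges if v in selected and w in selected}
    return selected, kept


# Hypothetical example graph with four vertices.
V = {"A", "B", "C", "D"}
E = {("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")}

V_ind, E_ind = induced_subgraph(V, E, {"A", "B", "C"})
```

Here `({"A", "B", "C"}, {("A", "B")})` is a subgraph of the example graph but not an induced one, because the induced subgraph on the same vertices must also contain the edges (B, C) and (C, A).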

In chapter 5, an algorithm will be introduced that is related to the subgraph isomorphism (SI) problem, which is defined as the following decision problem: Given two graphs G1 and G2, is there a (not necessarily induced) subgraph G′1 of G1 that is isomorphic to G2? A graph G = (VG, EG) is isomorphic to another graph H = (VH, EH) if and only if there is a one-to-one correspondence (bijection) f between VG and VH such that for all pairs of vertices v, w ∈ VG the following condition is fulfilled:

(v, w) ∈ EG ⇔ (f(v), f(w)) ∈ EH (1.14)

Sometimes a variant of this problem is considered which is called the induced subgraph isomorphism problem. In this variant, the problem is to decide if G2 is isomorphic to an induced subgraph of G1.
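To make the two decision problems concrete, the following is a brute-force sketch in Python. It is not the algorithm of chapter 5; it simply tries every injective assignment of the vertices of G2 to vertices of G1, which is only feasible for very small graphs. Directed graphs as sets of ordered pairs are assumed, and the function name and example graphs are hypothetical.

```python
from itertools import permutations


def has_subgraph_isomorphism(v1, e1, v2, e2, induced=False):
    """Decide whether G2 = (v2, e2) is isomorphic to a subgraph of G1 = (v1, e1).

    With induced=False, every edge of G2 must map to an edge of G1.
    With induced=True, edges and non-edges must both be preserved,
    i.e. condition (1.14) must hold on the image of the mapping.
    """
    v1, v2 = list(v1), list(v2)
    e1, e2 = set(e1), set(e2)
    # Each permutation of len(v2) vertices of G1 is one injective map f: V2 -> V1.
    for image in permutations(v1, len(v2)):
        f = dict(zip(v2, image))
        if induced:
            ok = all(
                ((v, w) in e2) == ((f[v], f[w]) in e1)
                for v in v2
                for w in v2
            )
        else:
            ok = all((f[v], f[w]) in e1 for (v, w) in e2)
        if ok:
            return True
    return False


# Hypothetical example: G1 has three edges, G2 is a directed path with two edges.
V1, E1 = {"a", "b", "c"}, {("a", "b"), ("b", "c"), ("a", "c")}
V2, E2 = {"x", "y", "z"}, {("x", "y"), ("y", "z")}

plain = has_subgraph_isomorphism(V1, E1, V2, E2)
strict = has_subgraph_isomorphism(V1, E1, V2, E2, induced=True)
```

In this example `plain` is true, since the path can be mapped onto a → b → c, while `strict` is false: the only induced subgraph of G1 on three vertices is G1 itself, which has one edge more than G2.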

A Petri net is a bipartite graph with one set of vertices called places and the other set of vertices called transitions. Petri nets are used to study the dynamics of complex systems such as metabolic networks using so-called tokens that are used to describe for instance the concentrations of metabolites. A function giving the number of tokens for each place is then called a marking of the Petri net. Although we do not investigate the dynamics of networks in this work, we will adopt the Petri net terminology of places and transitions when we deal with bipartite graphs.
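The terminology can be illustrated with a small Python sketch of a Petri net as a data structure. It only covers the static parts described above (places, transitions, bipartite edges, and a marking), not any dynamics. The class name, the use of directed arcs, and the token counts are assumptions made for illustration; the example encodes the hexokinase reaction (glucose + ATP → glucose-6-phosphate + ADP) as a single transition.

```python
from dataclasses import dataclass, field


@dataclass
class PetriNet:
    places: set
    transitions: set
    arcs: set  # directed edges (source, target); direction is an assumption here
    marking: dict = field(default_factory=dict)  # place -> number of tokens

    def __post_init__(self):
        # Places and transitions form the two disjoint vertex sets.
        if self.places & self.transitions:
            raise ValueError("places and transitions must be disjoint")
        # Bipartite condition: every arc connects a place with a transition.
        for source, target in self.arcs:
            place_to_transition = source in self.places and target in self.transitions
            transition_to_place = source in self.transitions and target in self.places
            if not (place_to_transition or transition_to_place):
                raise ValueError(f"arc {(source, target)!r} violates bipartiteness")
        # A marking assigns token counts to places only.
        if set(self.marking) - self.places:
            raise ValueError("marking refers to unknown places")

    def tokens(self, place):
        """Return the number of tokens on a place (0 if none are recorded)."""
        if place not in self.places:
            raise KeyError(place)
        return self.marking.get(place, 0)


# Illustrative net: metabolites are places, the reaction is a transition.
net = PetriNet(
    places={"glucose", "ATP", "G6P", "ADP"},
    transitions={"hexokinase"},
    arcs={
        ("glucose", "hexokinase"),
        ("ATP", "hexokinase"),
        ("hexokinase", "G6P"),
        ("hexokinase", "ADP"),
    },
    marking={"glucose": 3, "ATP": 2},  # arbitrary token counts
)
```

Validating the bipartite condition in the constructor means that an edge between two metabolites, or between two reactions, is rejected as soon as the net is built.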

Chapter 2

Expression Data Analysis

The term gene expression data refers to the abundances of messenger RNA transcribed from different genes that can be found within a cell under a certain experimentally controlled condition. Methods for measuring these abundances include serial analysis of gene expression (SAGE) (Velculescu et al., 1995), cDNA library sequencing (Adams et al., 1991), cDNA subtraction (Bautz and Reilly, 1966; Muerhoff et al., 1997), quantitative real-time polymerase chain reaction (RT-PCR) (Bustin, 2000), and northern blotting (Parker and Barnes, 1999). Today, gene expression data are usually collected using DNA microarrays, which are capable of measuring tens of thousands of mRNAs simultaneously. As microarray measurements are often impaired by strong noise, classical small-scale methods are still important for the validation of results from microarray analysis or for a more exact quantification of expression levels of single genes. Statistical and computational methods for the analysis of the raw microarray data are also of high interest for the application of microarrays. This chapter first reviews the microarray technology and then the problems arising from the data analysis task, together with some algorithms that have been developed for these problems.

2.1 Areas of application

Gene expression measurement using DNA microarrays has many applications in biology and medicine. An overview of applications and technology can be found in Stoughton (2005). Petricoin III et al. (2002) and Stoll et al. (2005) put the emphasis on medical applications and drug discovery, respectively.

The most common set-up for microarray experiments is the comparison of two biological conditions, for instance comparing diseased with healthy tissue. The goal is to find out more about the disease on a molecular level or, more concretely, to find potential drug targets among genes that are up- or down-regulated in the disease sample. Even if the regulated genes are not suitable drug targets, they could still be useful as diagnostic markers.


lular processes. In an early study that uses a time series of microarray measurements, Spellman et al. (1998) investigate changes of gene expression during the yeast cell cycle.

Although the dominant application of microarrays today is the measurement of mRNA concentration, microarrays are not limited to expression profiling. When the sequences deposited on the microarray (the probes) correspond to other genetic features, they can also be utilized for genotyping and can help identify novel genes, transcription factor binding sites, alternative splicing, and exon structure.