Network comparison - VANESA - A bioinformatics software application for the modeling, visualiza

Figure 5.9: Biological hub detection measurement in a biological protein-protein interaction network in VANESA. Nodes with the most incident edges are highlighted. Nodes with the same vertex degree are colored in the same way.

properties, which only occur in special kinds of biological networks, such as a high graph density value in protein-protein interaction networks, can be easily identified. With this information, networks and motifs can be classified and characterized.

Additionally, all reconstructed networks can be compared with randomly generated networks. For this purpose, VANESA can generate random, regular, bipartite, connected, and Hamiltonian graphs with a given number of nodes. The graphs can be directed or undirected, as well as weighted. Among other models, the Barabási-Albert (BA) model is implemented, which generates random scale-free networks [AB02]. With these networks, statistically meaningful comparative analyses can be performed, which is especially useful in theoretical biology.
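To illustrate how a BA-style random network of a given node size can be produced, the following is a minimal Python sketch of preferential attachment. It is not VANESA's implementation; the function name `barabasi_albert`, the parameter `m` (edges added per new node), and the seeding scheme (the first new node attaches to `m` isolated start nodes) are assumptions made for this example.

```python
import random


def barabasi_albert(n, m, seed=None):
    """Return an undirected edge list for a BA-style graph with n nodes.

    Hypothetical helper: every new node attaches to m distinct existing
    nodes, chosen with probability proportional to their current degree.
    """
    if not 1 <= m < n:
        raise ValueError("require 1 <= m < n")
    rng = random.Random(seed)
    edges = []
    # The first new node (index m) connects to the m start nodes 0..m-1.
    targets = list(range(m))
    # Each node appears in this list once per incident edge, so a uniform
    # draw from it is a degree-proportional draw.
    degree_pool = []
    for new_node in range(m, n):
        edges.extend((new_node, t) for t in targets)
        degree_pool.extend(targets)
        degree_pool.extend([new_node] * m)
        # Pick m distinct targets for the next node.
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(degree_pool))
        targets = sorted(chosen)
    return edges


if __name__ == "__main__":
    graph = barabasi_albert(50, 2, seed=1)
    print(len(graph), "edges")
```

A graph generated this way has (n − m) · m edges, so its size can be matched to the reconstructed network before the centrality values are compared.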

5.6 Network comparison

To determine differences and similarities between a given set of networks, new graph comparison techniques have been implemented in VANESA. One of the realized techniques is the so-called “heat-graph” approach.

A heat graph is a graphical representation of a set of different networks, where the individual nodes are color-coded in accordance with their frequency of occurrence in the set of networks.

Figure 5.10: Comparison of a protein-protein interaction network with randomly generated networks, based on the following centrality measurements: largest, smallest, and average vertex degree, average neighbor degree, graph density, centralization, global matching index, clustering coefficient, average shortest path, and maximum path length. Results are visualized in a parallel coordinate plot, which shows that the protein-protein interaction network differs from the random networks in centrality values such as average neighbor degree and average shortest path.


Figure 5.11: Heat-graph result for 4 biological networks. It is clearly visible that the backbone of the merged graph is constructed of proteins that appear in all networks (blue circle), whereas specialized proteins (red circles) influence the information and processing flow within the network.

The more often a certain node appears across the different networks, the more important it is in its function; it is therefore color-coded by a large blue circle in the heat-graph approach. The less often it appears within the set of networks, the more specialized it is in its function; it is therefore color-coded by a small red circle (see Figure 5.11). However, colors, shapes, and outward appearance can be adapted by users.
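The mapping from occurrence frequency to appearance can be sketched as follows. This is only an illustration of the idea described above: the function name `heat_style`, the linear interpolation, and the default radii are assumptions, not values taken from VANESA.

```python
def heat_style(count, n_networks, min_radius=4.0, max_radius=16.0):
    """Map an occurrence count to a (radius, (r, g, b)) pair.

    Hypothetical helper: nodes present in every network become large and
    blue, nodes present in a single network become small and red, and
    everything in between is interpolated linearly.
    """
    if not 1 <= count <= n_networks:
        raise ValueError("count must lie between 1 and n_networks")
    # t = 0.0 for the rarest nodes, t = 1.0 for nodes found in all networks.
    t = 1.0 if n_networks == 1 else (count - 1) / (n_networks - 1)
    radius = min_radius + t * (max_radius - min_radius)
    color = (round(255 * (1 - t)), 0, round(255 * t))
    return radius, color


if __name__ == "__main__":
    for occurrences in range(1, 5):
        print(occurrences, heat_style(occurrences, 4))
```

Because users can adapt colors and shapes, a real color mapping would be a configurable parameter rather than the fixed red-to-blue ramp used here.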

The heat-graph is constructed in the following four steps:

1. Consider a set S of graphs G1, ..., Gn containing equal subgraphs or nodes. Based on S, create one merged graph Gm, consisting of all graphs in the set S (see Algorithm 5.6),

2. compute a vector where graph topological similarities over the set S are represented,

3. lay out the merged graph,

4. paint the heat-graph using custom color mapping.

The comparison vector is constructed by identifying equal or similar topological structures in S. If topological similarities, such as similar subgraphs, exist, information about these structures is stored in a matrix/vector product, which is described in Definition 13.

Algorithm 12 Merge a set of graphs

Input: Set S consisting of graphs G₁, ..., Gn

Output: Merged graph Gm

Initialize Gm = ∅
for Gi ∈ S do
    for v ∈ V_Gi do
        if v ∉ V_Gm then
            V_Gm = V_Gm ∪ {v}
    for e ∈ E_Gi do
        if e ∉ E_Gm then
            E_Gm = E_Gm ∪ {e}
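The merge procedure can be written as a short runnable Python sketch. The graph representation (a pair of a node collection and an edge collection, with edges as tuples of node identifiers) and the function name `merge_graphs` are assumptions made for this example; the protein names in the usage part are placeholders.

```python
def merge_graphs(graphs):
    """Merge a set of graphs into one graph Gm.

    Each graph is assumed to be a (nodes, edges) pair. A node or edge is
    added to the merged graph only if it is not already present.
    """
    merged_nodes = set()
    merged_edges = set()
    for nodes, edges in graphs:
        for v in nodes:
            if v not in merged_nodes:
                merged_nodes.add(v)
        for e in edges:
            if e not in merged_edges:
                merged_edges.add(e)
    return merged_nodes, merged_edges


if __name__ == "__main__":
    g1 = ({"A", "B", "C"}, {("A", "B"), ("B", "C")})
    g2 = ({"B", "C", "D"}, {("B", "C"), ("C", "D")})
    nodes, edges = merge_graphs([g1, g2])
    print(sorted(nodes))
    print(sorted(edges))
```

The explicit membership checks mirror the pseudocode; with Python sets they are redundant, since adding an existing element has no effect.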

Definition 13. A comparison vector V is a vector containing elements v_1 to v_i, where:

\[
v_x := \sum_{e \in G_m}
\begin{cases}
1 & \text{if } e = x \\
0 & \text{if } e \neq x
\end{cases}
\]
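As a hedged illustration of the comparison vector, the sketch below counts, for every element of the merged graph, in how many graphs of the set S it occurs. Reading the sum as a count over all graphs in S is an assumption made here so that the result matches the frequency-based coloring of the heat graph; the function name `comparison_vector` and the node-set input format are also hypothetical.

```python
from collections import Counter


def comparison_vector(node_sets):
    """Return (elements, counts) for a list of node collections.

    elements is the sorted list of all nodes of the merged graph and
    counts[i] is the number of input graphs that contain elements[i].
    """
    counts = Counter()
    for nodes in node_sets:
        # set() guards against a node being listed twice in one graph.
        counts.update(set(nodes))
    elements = sorted(counts)
    return elements, [counts[x] for x in elements]


if __name__ == "__main__":
    networks = [{"A", "B", "C"}, {"B", "C", "D"}, {"B", "E"}]
    print(comparison_vector(networks))
```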

Finally, the heat graph is visualized. For each node in G_m, a specific corona is painted. The color is defined by the function c(x, y) = max(c(x, y), DCF(v, x, y)), where DCF is the distance correction function DCF(v, x, y) = hammingwindow(x − v_x + R + y − v_y + R, 2R), with R = circle radius defined by the user. The distance function is used to create smooth transitions between overlapping coronas. Briefly, the Hamming window function calculates a shape similar to that of a cosine wave for two overlapping circles. In the last step, users can choose a color mapping that highlights results in different ways.
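The corona painting step can be sketched in Python as follows. Several points are assumptions and not confirmed by the text: v_x and v_y are taken to be the canvas coordinates of node v, the standard Hamming window 0.54 − 0.46 cos(2πn/N) is used, and the window is applied separately to the x and y offsets and multiplied, so that the intensity peaks at the node center. This separable form is a swapped-in reading chosen for the example and may differ from the exact argument of the printed formula. The names `hamming_window`, `dcf`, and `paint_coronas` are hypothetical.

```python
import math


def hamming_window(n, width):
    """Standard Hamming window over [0, width]; zero outside that range."""
    if width <= 0 or n < 0 or n > width:
        return 0.0
    return 0.54 - 0.46 * math.cos(2.0 * math.pi * n / width)


def dcf(node, x, y, radius):
    """Assumed distance correction: separable window centered on the node."""
    vx, vy = node
    return (hamming_window(x - vx + radius, 2 * radius)
            * hamming_window(y - vy + radius, 2 * radius))


def paint_coronas(nodes, width, height, radius):
    """Apply c(x, y) = max(c(x, y), DCF(v, x, y)) for every node v."""
    canvas = [[0.0] * width for _ in range(height)]
    for node in nodes:
        for y in range(height):
            for x in range(width):
                canvas[y][x] = max(canvas[y][x], dcf(node, x, y, radius))
    return canvas


if __name__ == "__main__":
    image = paint_coronas([(2, 2), (4, 2)], width=7, height=5, radius=3)
    print(" ".join(f"{value:.2f}" for value in image[2]))
```

Taking the maximum instead of the sum keeps the intensity bounded where coronas overlap, which gives the smooth transition between neighboring nodes described above. A color mapping would then translate each intensity value into a pixel color.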

In addition to the heat-graph approach, VANESA offers the possibility to visualize overlapping biological networks in a 2.5D space. In this restricted three-dimensional representation, each network is visualized separately. The common subgraphs are visualized on parallel two-dimensional planes (see Figures 5.12 and 5.13 for two examples). The method is similar to the heat-graph approach, although it uses other drawing aesthetics. In the heat-graph approach, overlapping parts are highlighted with different colors in one merged graph. In the 2.5D approach, the intersections are visualized in the middle plane of all other networks. The advantage of this method is that visual analysis can highlight connections between different networks and simultaneously expose their differences. This approach is especially well-suited for large networks, as it reduces the size of all networks to a subset of relevant overlapping subgraphs. Furthermore, users can visualize each network in 3D space, where they can navigate through the network, rotate it, and center it as they wish. In addition, it is also possible to compare two networks with each other in 2D space, as presented in Figure 5.14.
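The reduction to overlapping parts can be illustrated with a small Python sketch that extracts the nodes and edges shared by all networks, that is, the content that would be drawn on the middle plane. The (nodes, edges) graph representation, the assumption that shared edges are stored with the same orientation in every network, and the function name `common_subgraph` are hypothetical choices for this example.

```python
def common_subgraph(graphs):
    """Return the nodes and edges that occur in every input graph.

    Each graph is assumed to be a (nodes, edges) pair, with edges given
    as tuples of node identifiers in a consistent orientation.
    """
    if not graphs:
        return set(), set()
    node_sets = [set(nodes) for nodes, _ in graphs]
    edge_sets = [set(edges) for _, edges in graphs]
    shared_nodes = set.intersection(*node_sets)
    shared_edges = set.intersection(*edge_sets)
    return shared_nodes, shared_edges


if __name__ == "__main__":
    g1 = ({"A", "B", "C"}, {("A", "B"), ("B", "C")})
    g2 = ({"B", "C", "D"}, {("B", "C"), ("C", "D")})
    g3 = ({"B", "C", "E"}, {("B", "C"), ("B", "E")})
    print(common_subgraph([g1, g2, g3]))
```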


Figure 5.12: 2.5D comparison function in VANESA, where overlapping biological networks are visualized in a restricted three-dimensional space.

Figure 5.13: 2.5D comparison function in VANESA. A zoom-in of the middle plane, where overlapping parts of a set of protein-protein interaction networks are visualized.

Figure 5.14: The figure shows two different regulatory networks being compared in terms of similarities and network structures within VANESA. Biological elements, which occur in both networks, are colored yellow. Regulatory processes taking place in both systems are highlighted with blue edges. Based on the results of the network comparison functions, scientists are able to focus on specific structures and elements, which they could visually examine within the visualization pane of VANESA.

In document VANESA - A bioinformatics software application for the modeling, visualization, analysis, and simulation of biological networks in systems biology applications (Page 122-128)