3.2 Network Inference

3.2.2 Exploring and Comparing Networks

There are several automated community detection methods that decompose complex networks into their building blocks, which constitute patterns (motifs) of potentially interesting relationships between elements [129]. MAVisto [158] (Figure 3.9) is a representative tool, which uses algorithms and visualisation for exploring motifs in biological networks. There are many other similar methods, which mostly focus on the task of motif discovery, such as the one proposed by Song et al. [168], FANMOD [191], POWRS [52] and SeAMotE [5]. For instance, PheNetic [53] supports the interpretation of molecular profiling data using networks. Other tools suggest motif simplification [62] or summarisation [136] techniques to enhance network representation, exploration and analysis. Most of those motif discovery methods use heuristics, and their results are often neither accurate nor stable. In addition, many of those methods only support the analysis of a single large and complex network, while the challenge of Bayesian network inference requires the combination of features from many different networks, which are usually relatively small (up to fifty nodes). A more flexible approach would be to use consensus clustering as a framework for combining results [110]. However, it is not clear how to optimally combine results in an automated way, because there is a lot of variation in the data, many network formats and many possible network structures. Most importantly, it is hard to integrate the tacit knowledge of domain experts in a fully automated method. Thus, there are no sufficiently good algorithmic solutions for detecting the best network model in any data set. Visualisation approaches can provide flexibility in exploring heuristic search results and in constructing consensus networks, which are the two main challenges in Bayesian network inference.
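To make the idea of combining many small candidate networks concrete, the following is a minimal sketch of one simple way to build a consensus network: keep every directed edge that appears in at least a given fraction of the candidates. This is an illustrative assumption, not the consensus clustering framework of [110] and not the procedure used by any tool cited here; the names `edge_frequencies`, `consensus_network` and the `threshold` parameter are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]  # directed edge (parent, child); hypothetical representation


def edge_frequencies(networks: Iterable[Set[Edge]]) -> Dict[Edge, float]:
    """Return the fraction of candidate networks that contain each edge."""
    nets = list(networks)
    if not nets:
        return {}
    # Each network is a set, so an edge is counted at most once per network.
    counts = Counter(edge for net in nets for edge in net)
    return {edge: n / len(nets) for edge, n in counts.items()}


def consensus_network(networks: Iterable[Set[Edge]], threshold: float = 0.5) -> Set[Edge]:
    """Keep the edges whose frequency across candidates is at least `threshold`."""
    freqs = edge_frequencies(networks)
    return {edge for edge, f in freqs.items() if f >= threshold}


# Three small, made-up candidate networks.
candidates = [
    {("A", "B"), ("B", "C")},
    {("A", "B"), ("A", "C")},
    {("A", "B"), ("B", "C"), ("C", "D")},
]
consensus = consensus_network(candidates, threshold=0.6)
print(sorted(consensus))
```

A fixed frequency threshold is exactly the kind of automated rule that is hard to justify across data sets: the result is not guaranteed to be acyclic, and the choice of threshold encodes no domain knowledge, which is where interactive exploration can help.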

Research in network visualisation has yielded a plethora of tools and techniques to improve layout readability, visualise networks with specific attributes, and visualise and explore network series (dynamic networks) [183, 24]. Techniques exist for the comparison of two data sets [75] as well as for the visualisation of series of data sets in temporal data [19]. The challenge of finding differences between two graphs has been studied previously. The most common approach is small multiples, in which the topology of each network is shown clearly, but the comparison itself is left to the viewer as a cognitive task. Archambault et al. present an algorithm which uses the difference map between two graphs to decompose their nodes and edges in order to create a hierarchical structure which can be used to find differences more easily [11, 14]. Semantic Graph Visualiser (Figure 3.10) aims at the comparison and merging of two different networks by superimposing common nodes and by using colour to encode their different properties [10]. ManyNets aims at the comparison of multiple networks by visualising network metrics and metric distributions in a table format [68]. To help with the comparison of multiple networks, Hascoët and Dragicevic [85] propose an interactive select-and-hide method that allows comparing multiple topologies by colouring networks and letting the user enable or disable individual networks. The main problem with these methods is scalability, both with respect to network density (the more links the networks contain and the more networks are “superimposed”, the more line-crossings occur) and with respect to the number of networks.
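As a rough illustration of what comparing two graphs involves at the data level, the sketch below partitions the edges of two networks into those found only in the first, only in the second, and in both. It is a plain set-based difference under assumed names (`difference_map`, `only_a`, `only_b`, `both`); it is not the hierarchical decomposition of Archambault et al. [11, 14], only the kind of edge classification that a superimposed, colour-coded view needs as input.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]  # directed edge (source, target); hypothetical representation


def difference_map(edges_a: Iterable[Edge], edges_b: Iterable[Edge]) -> Dict[str, Set[Edge]]:
    """Classify every edge as unique to network A, unique to network B, or shared."""
    a, b = set(edges_a), set(edges_b)
    return {
        "only_a": a - b,  # edges present in A but not in B
        "only_b": b - a,  # edges present in B but not in A
        "both": a & b,    # edges present in both networks
    }


# Two small, made-up networks.
net_a = {("A", "B"), ("B", "C"), ("C", "D")}
net_b = {("A", "B"), ("B", "D"), ("C", "D")}
diff = difference_map(net_a, net_b)
for label in ("only_a", "only_b", "both"):
    print(label, sorted(diff[label]))
```

Each of the three classes could be mapped to a distinct colour in a superimposed drawing. The scalability problem described above appears as soon as more than two networks are involved, because the number of such classes grows exponentially with the number of networks.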

Figure 3.9: Screenshot of MAVisto analysing a transcriptional regulatory network of Saccharomyces cerevisiae with different perspectives to explore motifs. On the left-hand side, the network is shown with the motif-preserving layout of highlighted matches of the feed-forward loop motif. On the right-hand side, all discovered motifs can be further analysed. Detailed information is presented in the motif table (top), the structure of the currently active motif is displayed in the motif view (middle) and the motif frequency spectrum is shown in the motif fingerprint (bottom) (as found in Schreiber et al. [158]).

Besides automated and general network visualisation approaches, there are visualisation tools which specifically target the analysis of Bayesian networks. VisNet [196] has been explicitly designed to visualise properties of a single Bayesian network using a node-link representation. Elvira [109] uses a similar approach but gives more emphasis to the interpretation of Bayesian networks. Kadaba et al. [97] use animation to show causal relationships in networks. NetEx [49], a Cytoscape plug-in, targets the problem of visualising large Bayesian networks as node-link diagrams. The Visual Causality Analyst [184] provides a GUI that supports causal reasoning, and it also uses node-link diagrams to represent Bayesian networks. CompNet [108] (Figure 3.11) facilitates the comparison of networks visually and via metrics. The tool presents an overlay of a number of networks and statistics on the presence or absence of nodes in given clusters of this union. However, all these tools work only for one or a small number of Bayesian networks, and none of them supports any specific visualisation or interaction capabilities for the exploration of heuristic search results and the creation of consensus networks. Thus, their limitation is visual scalability in terms of the number of networks they can show in a readable manner. This thesis targets the task of exploring the output of heuristic search algorithms, such as the ones included in BANJO, which can generate hundreds of networks in a single run.

Figure 3.10: The Semantic Graph Visualizer (SGV) comparing two process graphs representing workflows involved in buying a computer (as found in Andrews et al. [10]).

Figure 3.11: (a) CompNet canvas displaying the union of eight protein-protein interaction networks. The names of nodes belonging to different communities are marked with different colours. (b) The ‘pie-nodes’ representation makes it possible to identify the presence/absence of individual nodes across the compared networks. (c) The cumulative community distribution plot. (d) Bubble chart representing similarity between networks. (e) Hierarchical tree built using network similarity (as found in Kuntal et al. [108]).

There are many network analysis tools in the literature, which are either generic or tailored to perform specific visualisation tasks. There are also several reviews that compare the different features of those tools, mainly focusing on the tasks that those tools can perform [154, 141, 171]. Although there is a rich literature on graph comparison tools and algorithms for motif discovery, there is a lack of tools that address the challenge of network inference [7]. This is probably because our targeted challenge is only relevant to the post-processing of results produced by network inference methods, which generate relatively large numbers (possibly hundreds) of candidate networks. However, we found that the design of some existing visualisation tools and techniques could be extended to also support the exploration and comparison of heuristic search results used for inferring Bayesian networks. In Chapter 5, we describe how we evaluated some of those approaches with domain experts who wanted to infer networks from their data. We then present the design of BayesPiles, a novel visual analytics tool based on the design of an existing tool for exploring dynamic graphs, called MultiPiles (Figure 3.12) [18]. BayesPiles can be used as a visual analysis component within the experimental process that modellers often follow to infer the structure of biological Bayesian networks.

Figure 3.12: Small MultiPiles (i.e. MultiPiles) creates lists (piles) of similar dense graphs in a time line, each visualised as an adjacency matrix. Larger piles indicate longer occurrences of the graph in the time line [18].
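The piling idea can be sketched in a few lines under strong simplifying assumptions: consecutive networks in a sequence are placed on the same pile while they remain similar to the first network of that pile. The Jaccard distance on edge sets, the greedy rule, and the names `jaccard_distance`, `pile_sequence` and `max_distance` are all hypothetical choices made for illustration; they are not taken from MultiPiles [18].

```python
from typing import FrozenSet, List, Sequence, Tuple

Edge = Tuple[str, str]  # directed edge (source, target); hypothetical representation
Network = FrozenSet[Edge]


def jaccard_distance(a: Network, b: Network) -> float:
    """Return 1 minus the Jaccard similarity of two edge sets (0.0 if both are empty)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def pile_sequence(networks: Sequence[Network], max_distance: float = 0.5) -> List[List[Network]]:
    """Greedily group consecutive networks that stay close to the first network of a pile."""
    piles: List[List[Network]] = []
    for net in networks:
        if piles and jaccard_distance(piles[-1][0], net) <= max_distance:
            piles[-1].append(net)  # similar enough: extend the current pile
        else:
            piles.append([net])    # too different: start a new pile
    return piles


# A made-up sequence of four small networks.
sequence = [
    frozenset({("A", "B"), ("B", "C")}),
    frozenset({("A", "B"), ("B", "C"), ("C", "D")}),
    frozenset({("X", "Y")}),
    frozenset({("X", "Y"), ("Y", "Z")}),
]
piles = pile_sequence(sequence, max_distance=0.5)
pile_sizes = [len(p) for p in piles]
print(pile_sizes)
```

In this sketch a larger pile simply means a longer run of similar networks, which mirrors the reading of pile size described in the caption above.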

Summary of challenge 2: Heuristic search algorithms sample the search space of all possible network models based on parameter settings and a network score that encodes how well a network's structure fits the underlying data. The purpose is to find a final consensus network which is representative and explains the observations collected from the biological system. In this process, the role of the modeller is to guide the heuristic search (choosing the algorithm and setting its parameters) and to decide on a method that determines the structure of the final consensus network. Our modellers were interested in exploring sets of multiple candidate networks generated by a Bayesian network algorithm [166], implemented and distributed freely as part of the software package called BANJO [167]. The process of exploring such data sets was found to be very complicated and time-consuming. Thus, there was a need for a visualisation approach which can strengthen the argument for choosing a particular consensus network as the final model, and which can also speed up the process of finding it, by providing visual feedback to the user. Moreover, a visual analysis approach can facilitate the reproduction of networks by different modellers, supporting the cross-validation of research results. Based on a literature review and modellers’ feedback, we identified MultiPiles [18] as the most promising technique for exploring many networks. In Chapter 5 we present BayesPiles, a novel tool inspired by MultiPiles which can support domain-specific visual analysis tasks for inferring Bayesian networks.