Graph Visualization and Interaction - Functional coherence and annotation agreement metrics for

ables each user the individual deletion of any of their Sets and Collections.

Each Set must belong to a Collection. Besides providing a way to group Sets that share some functional similarity, a Collection consequently also offers a coarser level of granularity. Most importantly, a proper use of the Collection/Set organization is paramount for computing meaningful GO term enrichment p-values. For any given Set, the statistical tests use the remaining Sets in that Collection as the background set to determine the statistical significance of the enrichment of any given annotation term in the Set being explored.
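
The role of the Collection as background can be illustrated with a minimal sketch. The text does not state which statistical test GRYFUN applies, so a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test) is assumed here purely for illustration. The function names, the data layout and the toy Collection are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n proteins from N, of which K carry the term."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def term_enrichment(collection, set_name, term):
    """Enrichment p-value of `term` in one Set against its Collection.

    `collection` maps Set name -> {accession -> set of GO terms}.
    Assumptions: the population is the explored Set plus the remaining
    Sets of the same Collection, and Sets do not share proteins.
    """
    target = collection[set_name]
    population = {}
    for proteins in collection.values():
        population.update(proteins)
    N = len(population)                                      # all proteins
    K = sum(term in terms for terms in population.values())  # carrying term
    n = len(target)                                          # Set size
    k = sum(term in terms for terms in target.values())      # hits in Set
    return hypergeom_sf(k, N, K, n)


# Hypothetical toy Collection with two Sets of three proteins each.
toy = {
    "PL1": {"A1": {"lyase activity"}, "A2": {"lyase activity"},
            "A3": {"lyase activity"}},
    "PL2": {"B1": {"binding"}, "B2": {"binding"},
            "B3": {"lyase activity"}},
}
print(term_enrichment(toy, "PL1", "lyase activity"))  # 0.2
```

The sketch makes the dependence on the Collection explicit: moving a Set into a different Collection changes `N` and `K`, and therefore the p-value, even though the Set itself is unchanged.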

The input proteins in each Set are expected to share a close degree of functional similarity, as is the case for functional protein families or other groups of functionally related proteins. Alternatively, a Set can host dissimilar proteins if the intended purpose is just to navigate the generated annotation graph and manually sort and select subsets of proteins.

5.2 Graph Visualization and Interaction

GRYFUN enables the generation of the annotation graph for any protein Set (within a given user Collection) under the context of each of the three orthogonal GO ontologies (biological process, molecular function and cellular component). This functionality can be accessed through the Explore page of the web application. Figure 5.2 depicts the selection of the molecular_function ontology aspect for generating the GO annotation graph for a Set named PL1 (corresponding to the family with the same name) of the Polysaccharide Lyase Collection. In the depicted query, all evidence codes are considered (the default), but each user can filter the annotations to keep only the evidence codes that are relevant for their own work.
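
The evidence code filter can be pictured as a simple selection over annotation records. The sketch below is only an illustration of that idea, not GRYFUN's implementation: the record layout (accession, term, evidence code), the function name and the accessions are hypothetical, while IDA and IEA are standard GO evidence codes.

```python
def filter_by_evidence(annotations, allowed_codes=None):
    """Keep annotations whose evidence code is in `allowed_codes`.

    `annotations` is an iterable of (accession, go_term, evidence_code).
    Passing None keeps everything, mirroring the default query in which
    all evidence codes are considered.
    """
    if allowed_codes is None:
        return list(annotations)
    allowed = set(allowed_codes)
    return [record for record in annotations if record[2] in allowed]


# Hypothetical annotation records for two proteins.
records = [
    ("ACC1", "lyase activity", "IDA"),   # inferred from direct assay
    ("ACC1", "binding", "IEA"),          # inferred from electronic annotation
    ("ACC2", "lyase activity", "IEA"),
]
experimental_only = filter_by_evidence(records, {"IDA"})
```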

Figure 5.2: GRYFUN's graph generation menu from its Explore page.

The annotation graphs generated by GRYFUN are similar to (and dependent on) GO graphs, but they present a couple of important differences. A GO graph is meant to denote relationships between terms, so while each term is represented by a node, the relations between them are represented by graph edges. Figure 2.7 shows a GO sub-graph depicting nodes of the biological process GO aspect connected by is_a edges. Each of these edges starts at a child node (term) and points towards a parent node (term), thus denoting the existing hierarchical relationships between terms. Additionally, all terms converge into a common root node, which leads to the true path rule: the pathway from a child term all the way up to its top-level parent(s) must always be true (Gene Ontology Consortium, 2000).

In the annotation graphs, on the other hand, like the one shown in Figure 5.3, the edge direction is reversed. Every protein in a Set leading to an annotation graph is necessarily annotated to at least the root term (biological_process in this case). Depending on how well annotated any given protein is, it will flow down the graph towards more specific nodes. That flow is immediately discernible from the annotation graph because edge thickness is drawn in proportion to the number of proteins that flow down from a parent node to its child node. By representing the annotation flow on the graph image, an immediate visual cue is thus provided regarding the annotation terms that are more represented in any given protein Set.
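
The annotation flow described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not GRYFUN's code: direct annotations are propagated to all ancestors (following the true path rule), and the weight of each parent-to-child edge is taken to be the number of proteins annotated to the child. The toy hierarchy is deliberately simplified and is not the real GO structure; all names are hypothetical.

```python
from collections import defaultdict

# Hypothetical, simplified hierarchy: term -> set of is_a parents.
PARENTS = {
    "biological_process": set(),
    "metabolic process": {"biological_process"},
    "catabolic process": {"metabolic process"},
    "transport": {"biological_process"},
}


def ancestors(term, parents):
    """Return `term` plus every term reachable through is_a edges."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def annotation_flow(direct, parents):
    """Compute per-term protein sets and parent -> child edge weights.

    `direct` maps accession -> set of directly annotated terms.
    """
    proteins_at = defaultdict(set)
    for accession, terms in direct.items():
        for term in terms:
            for inherited in ancestors(term, parents):
                proteins_at[inherited].add(accession)
    # Every protein at a child is also at its parents, so the flow along
    # a parent -> child edge equals the number of proteins at the child.
    weights = {
        (parent, child): len(accessions)
        for child, accessions in proteins_at.items()
        for parent in parents[child]
    }
    return dict(proteins_at), weights


direct = {
    "A": {"catabolic process"},
    "B": {"metabolic process"},
    "C": {"transport"},
    "D": {"biological_process"},  # annotated to the root term only
}
proteins_at, weights = annotation_flow(direct, PARENTS)
```

In this toy Set the edge from the root to "metabolic process" carries two proteins and would be drawn thicker than the edge to "transport", which carries one.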

Figure 5.3: Example annotation graph of a sample protein set for the GO biological_process aspect.

Hovering the mouse cursor over any graph node reveals the associated term and its annotation frequency within the current Set as a tooltip. Clicking on any of the nodes (white nodes: inherited annotations; colored nodes: direct annotations) dynamically generates a floating window containing a list of the UniProt accession numbers annotated to that term within the Set, as shown in Figure 5.4. These floating windows also display the respective species names, alphabetically sortable and segregated into Superkingdoms. The lists can be exported as plain TSV files. Any number of floating windows, up to the number of nodes in the currently displayed graph, can be open simultaneously, and the windows can be dragged anywhere on screen and collapsed and expanded as required.

Figure 5.4: Partial display of the Explore page following a graph generation, with the floating information window for the lyase activity term node also displayed on screen.

Most importantly, these floating windows give access to one of the most interesting features in GRYFUN, graph re-rooting. This feature is similar to the focus feature of GOLEM (Sealfon et al., 2006), which reduces the graph to a selected GO annotation term and its vicinity (parents and children). The re-root feature in GRYFUN, in contrast, allows the selection of any non-leaf term node in the annotation graph and the generation of a new sub-graph rooted at the term represented by the chosen node. After this re-rooting operation, and despite the Set remaining whole, only the proteins annotated with the newly chosen temporary root are considered during the generation of a new annotation sub-graph that subsumes all of their annotations that are children of the newly chosen term. Hence, this feature enables a focus on more specific functional branches and terms of interest while abstracting from terms that describe accessory activities which, despite being associated with some proteins in a given Set, can be considered noise.
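
The re-rooting operation can be sketched as a restriction of the Set's propagated annotations. This is a minimal sketch of the behaviour described in the text, not GRYFUN's actual implementation. It assumes each protein already carries its direct and inherited terms, and the hierarchy, accessions and function names are hypothetical.

```python
def descendants(root, parents):
    """Return `root` plus every term that reaches it through is_a edges."""
    children = {}
    for child, term_parents in parents.items():
        for parent in term_parents:
            children.setdefault(parent, set()).add(child)
    seen, stack = set(), [root]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(children.get(current, ()))
    return seen


def reroot(propagated, parents, new_root):
    """Restrict a Set's annotations to the sub-graph under `new_root`.

    `propagated` maps accession -> set of terms (direct plus inherited).
    Only proteins annotated with `new_root` are kept, and only their
    terms at or below `new_root` survive. The input Set is not modified.
    """
    keep = descendants(new_root, parents)
    return {
        accession: terms & keep
        for accession, terms in propagated.items()
        if new_root in terms
    }


# Hypothetical, simplified molecular_function hierarchy.
parents = {
    "molecular_function": set(),
    "catalytic activity": {"molecular_function"},
    "lyase activity": {"catalytic activity"},
    "carbon-oxygen lyase activity": {"lyase activity"},
    "binding": {"molecular_function"},
}
propagated = {
    "A": {"molecular_function", "catalytic activity", "lyase activity",
          "carbon-oxygen lyase activity", "binding"},
    "B": {"molecular_function", "catalytic activity", "lyase activity"},
    "C": {"molecular_function", "binding"},
}
subgraph = reroot(propagated, parents, "lyase activity")
```

Here protein C drops out of the sub-graph because it is not annotated with the new root, and the accessory "binding" annotation of protein A is no longer shown, while the Set itself is left untouched.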

In document Functional coherence and annotation agreement metrics for enzyme families (Page 119-122)