Non-Tottering - Graph kernel extensions and experiments with application to molecule classifica

Mahé et al. (2004) introduced the idea of non-tottering walks. A tottering walk is one that simply turns around and traverses the previous edge in the opposite direction. Tottering walks are less informative: they return to the vertex they have just left, so they explore less of the structure of a graph. A walk that only totters moves back and forth between two vertices without exploring any other area of the graph. Mahé et al. (2004) showed that totters could be removed by modifying the original graphs. In this section, we describe a new method of removing totters that is implemented using product graphs and dynamic programming (DP). This approach allows us to combine non-tottering with the four kernels given in Sections 4.2 and 4.3. Next, we describe how walks may totter in the original graphs but not in the product graph, and we update the equations to ensure that walks do not totter in the original graphs either.
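As a concrete reading of this definition, the sketch below tests whether a walk, given as a sequence of vertices, totters: it does so at position m when the vertex at position m equals the one at position m − 2, i.e. the walk has just retraced an edge. The helper name `totters` is our own and is not part of the thesis.

```python
def totters(walk):
    """Return True if the walk ever turns around along the edge it just used.

    Hypothetical helper: a walk is a sequence of vertices, and it totters at
    position m when walk[m] == walk[m - 2].
    """
    return any(walk[m] == walk[m - 2] for m in range(2, len(walk)))


print(totters(["a", "b", "a", "b"]))  # True: only moves back and forth
print(totters(["a", "b", "c", "a"]))  # False: revisits a without retracing an edge
```

The second walk repeats a vertex without tottering, so removing totters does not remove every walk with repeated vertices; it only removes immediate reversals.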

4.4.1 Non-Tottering using DP

Given a product graph, G× = (V×, E×), we can count the number of matching walks between two graphs using the DP equations given in Section 4.2. Non-tottering can be added to these DP equations as follows. For each round n in the computation and for each vertex of the product graph, one keeps an array recording how much each previous vertex, v_j, contributed to the sum. For n > 1, let:

$$
A^n_j(v_i) =
\begin{cases}
D_{n-2}(v_j) & \text{if } (v_j, v_i) \in E_\times \\
0 & \text{otherwise}
\end{cases}
\qquad (4.13)
$$

then the DP update equation can be written as:

$$
D_n(v_i) = \sum_{v_j : (v_j, v_i) \in E_\times} \; \sum_{t : (v_t, v_j) \in E_\times \,\wedge\, v_t \neq v_i} A^n_t(v_j). \qquad (4.14)
$$

For the finite-length kernels, one uses the original DP equations for n = 0 (eq (4.1)) and n = 1 (eq (4.2)); for n > 1, we use eq (4.14).
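The non-tottering update can be sketched in Python. This is our own reading of the per-predecessor bookkeeping described above, not the thesis's implementation: for every directed edge (v_j, v_i), the dictionary `A` holds the number of non-tottering walks with n edges that end with the step v_j → v_i, and round n is built from round n − 1 while skipping predecessors v_t = v_i, as in the inner sum of eq (4.14). Because the entries of `A` are carried forward from round to round, every step of a walk is checked, not only the last one. We assume D_0(v) = 1 for every vertex (unweighted counting); the function names and the example graph are hypothetical, and a brute-force enumerator is included as a cross-check.

```python
def count_non_tottering(adj, p):
    """Count non-tottering walks with n = 0..p edges, grouped by end vertex.

    adj maps every vertex to the list of vertices it has an edge to.
    Returns D with D[n][v] = number of non-tottering walks with n edges
    ending at v (assuming D[0][v] = 1 for every vertex).
    """
    vertices = list(adj)
    preds = {v: [] for v in vertices}  # preds[i] = all v_j with (v_j, v_i) in E
    for j in vertices:
        for i in adj[j]:
            preds[i].append(j)

    D = [dict.fromkeys(vertices, 1)]
    # A[(j, i)]: walks with n edges whose last step is v_j -> v_i.
    # For n = 1 every edge carries exactly one walk (the unmodified update).
    A = {(j, i): 1 for j in vertices for i in adj[j]}
    for n in range(1, p + 1):
        if n > 1:
            # Extend each walk ending v_t -> v_j by the step v_j -> v_i,
            # skipping v_t == v_i because that step would totter.
            A = {
                (j, i): sum(A[(t, j)] for t in preds[j] if t != i)
                for j in vertices
                for i in adj[j]
            }
        D.append({i: sum(A[(j, i)] for j in preds[i]) for i in vertices})
    return D


def brute_force(adj, p):
    """Reference counts obtained by enumerating every non-tottering walk."""
    walks = [[v] for v in adj]
    D = [dict.fromkeys(adj, 1)]
    for _ in range(p):
        walks = [w + [u] for w in walks for u in adj[w[-1]]
                 if len(w) < 2 or u != w[-2]]
        counts = dict.fromkeys(adj, 0)
        for w in walks:
            counts[w[-1]] += 1
        D.append(counts)
    return D


path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(count_non_tottering(path, 3)[2])  # {'a': 1, 'b': 0, 'c': 1}
```

On the path a-b-c, the only non-tottering walks with two edges are a, b, c and c, b, a, and there are none with three edges. Storing one entry per edge rather than per vertex is what gives the |V×|d memory term discussed below.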

Similarly, for the infinite-length kernels, one uses the original DP equations for n = 0 and n = 1. For the IM graph kernel, the new DP update for n > 1 is multiplied by Π_t(v_j | v_i) (described in Section 4.3.1), giving:

$$
D_n(v_i) = \sum_{v_j : (v_j, v_i) \in E_\times} \; \sum_{t : (v_t, v_j) \in E_\times \,\wedge\, v_t \neq v_i} \Pi_t(v_j \mid v_i)\, A^n_t(v_j). \qquad (4.15)
$$

For the IG graph kernel, the new update for n > 1 is multiplied by γ (described in Section 4.3.2), giving:

$$
D_n(v_i) = \sum_{v_j : (v_j, v_i) \in E_\times} \; \sum_{t : (v_t, v_j) \in E_\times \,\wedge\, v_t \neq v_i} \gamma\, A^n_t(v_j). \qquad (4.16)
$$

This modification to the DP update requires storing A, which incurs a memory penalty of at most 2|V×|d entries, i.e. O(|V×|d), where d is the maximum degree of the vertices in V×: |V×|d entries are needed for each of the current and previous iterations of the DP algorithm. Although d is close to |V×| for a fully connected graph, it is typically much smaller for a molecular graph.

In order to remove totters, the computational cost rises from O(p|V×|d) for the original formulation, which allows totters, to O(p|V×|d^2). The extra factor of d arises because, for a given vertex, the algorithm must loop over each incoming edge and, for each one, over the entries of A that record the contribution from two steps back. Similarly, one could remove walks that return after k iterations (a cycle of length k), although the complexity then rises to O(p|V×|d^k).
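The extension to longer cycles can be sketched in the same spirit. We read "remove walks that return after k iterations" as forbidding v_m = v_{m−k}, so k = 2 recovers non-tottering walks. The DP state is the tuple of the last k vertices of the walk; there are at most |V×|d^(k−1) such states, each extended along at most d edges per round, which matches the O(p|V×|d^k) cost stated above. The function names and example graph are hypothetical, and a brute-force enumerator is included to check the counts.

```python
from collections import defaultdict


def count_walks_no_k_return(adj, p, k):
    """D[n][v]: walks with n edges ending at v in which no vertex recurs exactly
    k steps later (v_m != v_{m-k}); k = 2 forbids totters.

    The DP state is the tuple of the last (up to) k vertices, so there are at
    most |V| d^(k-1) states, each extended along at most d edges per round.
    """
    states = {(v,): 1 for v in adj}  # walks with 0 edges
    D = [dict.fromkeys(adj, 1)]
    for _ in range(p):
        new_states = defaultdict(int)
        for state, count in states.items():
            for u in adj[state[-1]]:
                # Once the state holds k vertices, state[0] is the vertex k steps before u.
                if len(state) == k and u == state[0]:
                    continue
                new_states[(state + (u,))[-k:]] += count
        states = new_states
        ends = dict.fromkeys(adj, 0)
        for state, count in states.items():
            ends[state[-1]] += count
        D.append(ends)
    return D


def brute_force_no_k_return(adj, p, k):
    """Reference counts by explicit enumeration of the allowed walks."""
    walks = [(v,) for v in adj]
    D = [dict.fromkeys(adj, 1)]
    for _ in range(p):
        walks = [w + (u,) for w in walks for u in adj[w[-1]]
                 if len(w) < k or u != w[-k]]
        ends = dict.fromkeys(adj, 0)
        for w in walks:
            ends[w[-1]] += 1
        D.append(ends)
    return D


triangle = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
print(count_walks_no_k_return(triangle, 3, 3)[3])  # {'a': 6, 'b': 6, 'c': 6}
```

On the triangle, 18 of the 24 walks with three edges do not close the triangle, six ending at each vertex. Keeping a tuple of k vertices per state is the simplest correct choice; it is also why memory and time grow geometrically in k.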

We do not include non-tottering features in the DP equations of Section 4.7, as we do not test them in the later experimental chapters. Despite the reported success of non-tottering walks on a small dataset (Mahé et al., 2004), we found that removing totters significantly reduced the accuracy of our models.


4.4.2 Non-Tottering in the Original Graphs

In this section we discuss how a walk in an original graph may totter even though the corresponding walk in the product graph does not. Figure 4.2 shows an example of two graphs, G₁ and G₂, along with their full direct product graph, G×. We have uniquely subscripted the vertices of G₁ and G₂ from 1 to 5, although note that we match only on the labels A and B. A walk in G× with label sequence ABA has product vertices (A₂, A₃), (B₁, B₄), (A₂, A₅). This corresponds to a tottering walk in G₁, given by the first components (shaded), A₂, B₁, A₂, and a non-tottering walk in G₂, given by the second components, A₃, B₄, A₅.


Figure 4.2: On the left is the product graph, G×, for the two graphs on the right, G₁ and G₂. The dashed lines show a walk on each of the graphs. There is no tottering in the walk on G×, although tottering occurs in the original graph G₁.
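The example can be reproduced with a small construction of the direct product graph. Because the full edge sets of Figure 4.2 are not recoverable here, the sketch assumes that G₁ contains only the edge A₂–B₁ and G₂ only the edges A₃–B₄ and B₄–A₅, which are the edges the walk uses; the function names are hypothetical and vertex names such as "A2" stand for A₂.

```python
def direct_product(labels1, edges1, labels2, edges2):
    """Direct product graph: vertices are label-matching pairs (a, b), and
    (a, b) is adjacent to (c, d) iff a-c is an edge of G1 and b-d of G2."""
    def adjacency(edges):
        adj = {}
        for x, y in edges:  # undirected: store both directions
            adj.setdefault(x, set()).add(y)
            adj.setdefault(y, set()).add(x)
        return adj

    adj1, adj2 = adjacency(edges1), adjacency(edges2)
    vertices = [(a, b) for a in labels1 for b in labels2 if labels1[a] == labels2[b]]
    return {
        (a, b): [(c, d) for (c, d) in vertices
                 if c in adj1.get(a, ()) and d in adj2.get(b, ())]
        for (a, b) in vertices
    }


def totters(walk):
    """True if walk[m] == walk[m - 2] for some m (the walk retraces an edge)."""
    return any(walk[m] == walk[m - 2] for m in range(2, len(walk)))


# Hypothetical edge sets: only the edges used by the walk in Figure 4.2.
labels1 = {"A2": "A", "B1": "B"}
edges1 = [("A2", "B1")]
labels2 = {"A3": "A", "B4": "B", "A5": "A"}
edges2 = [("A3", "B4"), ("B4", "A5")]

g_x = direct_product(labels1, edges1, labels2, edges2)
walk = [("A2", "A3"), ("B1", "B4"), ("A2", "A5")]
assert all(w2 in g_x[w1] for w1, w2 in zip(walk, walk[1:]))  # a valid walk in G×
print(totters(walk))                  # False: no totter in G×
print(totters([a for a, _ in walk]))  # True: A2, B1, A2 totters in G1
print(totters([b for _, b in walk]))  # False: A3, B4, A5 does not totter in G2
```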

The dynamic programming equations that remove tottering in Section 4.4.1 can be updated to remove totters in the original graphs as well. Each vertex v_p of G× is a pair (a, b), where a ∈ V₁ and b ∈ V₂. Totters can be removed from the original graphs by modifying the inner summation of eq (4.16): for a given vertex v_i, we check that the vertex two steps back, v_t, does not coincide with v_i in either original graph, giving the constraint

$$
t : (v_t, v_j) \in E_\times \,\wedge\, (a \neq c) \,\wedge\, (b \neq d), \qquad \text{where } v_t = (a, b),\; v_i = (c, d).
$$

Removing totters may be useful in more complex graph matching, where we allow soft matching of labels and gaps, as it removes features that may be undesirable.
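The effect of the component-wise constraint can be checked on the Figure 4.2 example. The sketch below runs a per-edge DP (for every edge (v_j, v_i) of G× it counts the walks ending with that step) and makes the extra inner-sum test a parameter: `product_only` is the test v_t ≠ v_i of eq (4.14), while `original_graphs` is the test (a ≠ c) ∧ (b ≠ d). The product graph is hard-coded assuming G₁ contains only the edge A₂–B₁ and G₂ only the edges A₃–B₄ and B₄–A₅ (the edges used by the walk in the figure); all names are hypothetical, D_0(v) = 1 is assumed, and the per-edge bookkeeping is our own reading rather than the thesis's code.

```python
def count_walks(adj, p, allowed):
    """D[n][v] for n = 0..p: walks with n edges ending at v (D[0][v] = 1 assumed).

    allowed(v_t, v_i) decides whether the two-step pattern v_t -> v_j -> v_i
    is kept, i.e. it is the extra condition in the inner summation.
    """
    vertices = list(adj)
    preds = {v: [] for v in vertices}
    for j in vertices:
        for i in adj[j]:
            preds[i].append(j)
    D = [dict.fromkeys(vertices, 1)]
    A = {(j, i): 1 for j in vertices for i in adj[j]}  # n = 1: one walk per edge
    for n in range(1, p + 1):
        if n > 1:
            A = {(j, i): sum(A[(t, j)] for t in preds[j] if allowed(t, i))
                 for j in vertices for i in adj[j]}
        D.append({i: sum(A[(j, i)] for j in preds[i]) for i in vertices})
    return D


def product_only(v_t, v_i):
    return v_t != v_i  # no totter in the product graph (eq (4.14))


def original_graphs(v_t, v_i):
    (a, b), (c, d) = v_t, v_i
    return a != c and b != d  # no totter in G1 and none in G2


# Hypothetical product graph for Figure 4.2, from the assumed edges A2-B1 (G1), A3-B4 and B4-A5 (G2).
g_x = {
    ("A2", "A3"): [("B1", "B4")],
    ("A2", "A5"): [("B1", "B4")],
    ("B1", "B4"): [("A2", "A3"), ("A2", "A5")],
}
print(sum(count_walks(g_x, 2, product_only)[2].values()))     # 2
print(sum(count_walks(g_x, 2, original_graphs)[2].values()))  # 0
```

Both walks with two edges, (A₂, A₃), (B₁, B₄), (A₂, A₅) and its reverse, are non-tottering in G× but totter in G₁, so only the component-wise test removes them. Since (a ≠ c) ∧ (b ≠ d) implies v_t ≠ v_i, the component-wise test is always at least as strict as the product-level one.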

In document Graph kernel extensions and experiments with application to molecule classification, lead hopping and multiple targets (Page 60-62)