
Chapter 4 Finite-Length Graph Kernels and Extensions

In this section we introduce a new extension for graph kernels. When matching vertex labels between two graphs, we may want to consider soft-matching, which allows inexact matches. Rather than a binary comparison (match or no match), a soft-match allows a more appropriate weighting scheme. Even though two labels may not match exactly, expert knowledge may allow us to define a degree of similarity between mismatched labels that provides a better model of the data.

In our research, we are motivated to use soft-matching because molecules are complex structures with features other than the atom label (e.g. Carbon, Oxygen) that can be matched. Soft-matching makes the graph kernel much more flexible, so that more prior knowledge can be incorporated when designing the kernel. As we are interested in finding walks that correspond to the parts of the molecule that bind to the target, we soft-match mismatched atom labels if the atoms share a common functional property associated with binding to the target. The functional property is given by the topological pharmacophore (TP) label used in the TP graph (see Chapter 2). Each vertex in the molecular graph has a corresponding TP label, which we use for soft-matching.
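As a minimal sketch of this matching rule, the Python function below assigns a weight to a pair of atoms from their atom labels and TP labels. The function name, the TP label strings, and the single soft-match weight `tp_weight` are hypothetical illustrations, not the implementation used in this work; the actual TP label set is the one defined in Chapter 2.

```python
from typing import Optional


def substitution_weight(atom_a: str, tp_a: Optional[str],
                        atom_b: str, tp_b: Optional[str],
                        tp_weight: float = 0.5) -> float:
    """Hypothetical match weight for a pair of atoms (a, b).

    Exact atom-label matches get weight 1; mismatched atoms that share a
    TP label get the soft-match weight `tp_weight`; all other pairs get 0.
    """
    if not 0.0 <= tp_weight <= 1.0:
        raise ValueError("tp_weight must lie in [0, 1]")
    if atom_a == atom_b:
        return 1.0                 # exact match on the atom label
    if tp_a is not None and tp_a == tp_b:
        return tp_weight           # soft match: shared functional property
    return 0.0                     # no match at all


# Illustrative (assumed) TP label names only.
print(substitution_weight("N", "donor", "O", "donor"))        # 0.5
print(substitution_weight("C", "hydrophobic", "N", "donor"))  # 0.0
```

A single `tp_weight` is the simplest choice; a per-label-pair table would allow finer expert weighting.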

Soft-matching was introduced for infinite-length graph kernels in Gärtner et al. (2003), where it was termed a non-contiguous label sequence kernel. Here we prefer the term non-matching: later we will introduce a method for matching walks with gaps, and we wish to make clear the distinction between non-contiguous (with gaps) and non-matching (no gaps, but aligned walk symbols may differ). Assume we have the full direct product graph (i.e. ignoring labels) $G_o = (V_o, E_o)$. Recall from Chapter 3 that the soft-matching infinite-length graph kernel defined in Gärtner et al. (2003) is:

$$\kappa(G_1, G_2) = \sum_{i,j=1}^{|V_\times|} \left[ \sum_{n=0}^{\infty} \lambda^n \big( (1-\alpha) E_\times + \alpha E_o \big)^n \right]_{ij}, \qquad (4.17)$$

if the limit exists. Two points about this kernel should be considered when defining the DP equations for soft-matching. First, the outer sum is over $i, j \in V_\times$, so only walks that are anchored (i.e. that start and end at vertices with matching labels) are counted. Second, each block of non-matching symbols of length $n$ has a penalty factor of $\alpha^n$.
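To make eq (4.17) concrete, the sketch below evaluates a truncated version of the series in pure Python. It assumes that $E_\times$ and $E_o$ are both stored as $|V_o| \times |V_o|$ adjacency matrices over the same vertex ordering (with $E_\times$ zero outside the matching vertices), and it cuts the infinite sum off after `n_terms` terms, so it only approximates the kernel when the series converges. The function names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    """Plain triple-loop matrix product."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def truncated_soft_match_kernel(e_match: Matrix, e_full: Matrix,
                                matching: Sequence[int], lam: float,
                                alpha: float, n_terms: int = 50) -> float:
    """Approximate eq (4.17) by summing the first n_terms terms of the series.

    e_match (E_x) and e_full (E_o) share one indexing over V_o; `matching`
    lists the indices of the product vertices in V_x.
    """
    size = len(e_full)
    # M = (1 - alpha) E_x + alpha E_o
    m = [[(1 - alpha) * e_match[i][j] + alpha * e_full[i][j]
          for j in range(size)] for i in range(size)]
    power = [[1.0 if i == j else 0.0 for j in range(size)]
             for i in range(size)]          # M^0 = I
    total = [row[:] for row in power]
    for n in range(1, n_terms):
        power = mat_mul(power, m)           # M^n
        for i in range(size):
            for j in range(size):
                total[i][j] += lam ** n * power[i][j]
    # Outer sum restricted to anchored walks: i, j in V_x.
    return sum(total[i][j] for i in matching for j in matching)


# Toy directed product graph 0 -> 1 -> 2, where vertex 1 is non-matching.
e_full = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
e_match = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
print(truncated_soft_match_kernel(e_match, e_full, [0, 2], lam=0.5, alpha=0.5))
# 2.0625 = 1 + 1 (length-0 walks) + lam^2 * alpha^2 (walk 0 -> 1 -> 2)
```

For graphs with cycles, a closed form via $(I - \lambda M)^{-1}$ would replace the truncation; the loop form is kept here only for transparency.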

In Figure 4.3, we show an example of a product graph, $G_\times$, that allows soft-matching. If no soft-matching were allowed, this example would not produce any walks of length greater than 0 (i.e. only counts of matching product vertices). The full direct product graph pairs every vertex of $G_1$ with every vertex of $G_2$, so for this example $|V_o| = 16$. We only show the product vertices that have edges, to emphasize where the soft-matching occurs. The 12 remaining product vertices are not counted, as we only start and end walks at matching vertices (anchored walks).


Figure 4.3: Two graphs, $G_1$ and $G_2$, are shown with their product graph $G_\times$. Note that walk ABC in $G_1$ is soft-matched with walk AEC in $G_2$. This is shown by the corresponding walk in $G_\times$: (A, A), (B, E), (C, C). The first component of a product vertex (shaded) is from $G_1$, while the second component is from $G_2$.
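The construction of the full direct product graph can be sketched in a few lines of Python. Only the walks ABC in $G_1$ and AEC in $G_2$ are recoverable from the caption, so the toy graphs below are just these two paths (an assumption, not the full graphs of Figure 4.3), and vertex names double as labels. The helper name `full_product_graph` is hypothetical.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

Graph = Dict[str, List[str]]   # vertex -> list of neighbours
Vertex = Tuple[str, str]       # product vertex (a, b): a from G1, b from G2


def full_product_graph(g1: Graph, g2: Graph,
                       label1: Dict[str, str], label2: Dict[str, str]):
    """Return V_o, E_o, V_x and V_N for the full direct product graph."""
    v_o: List[Vertex] = list(product(g1, g2))  # every pair (a, b)
    # (a, b) -> (a2, b2) is an edge iff a -> a2 is in G1 and b -> b2 is in G2.
    e_o: Set[Tuple[Vertex, Vertex]] = {
        ((a, b), (a2, b2)) for (a, b) in v_o for a2 in g1[a] for b2 in g2[b]}
    v_x = {(a, b) for (a, b) in v_o if label1[a] == label2[b]}  # matching labels
    v_n = set(v_o) - v_x                                          # V_N = V_o \ V_x
    return v_o, e_o, v_x, v_n


# Two undirected paths A-B-C and A-E-C, stored as symmetric adjacency lists.
g1 = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
g2 = {"A": ["E"], "E": ["A", "C"], "C": ["E"]}
v_o, e_o, v_x, v_n = full_product_graph(g1, g2, {v: v for v in g1}, {v: v for v in g2})
print(len(v_o), sorted(v_x))  # 9 [('A', 'A'), ('C', 'C')]
print((("A", "A"), ("B", "E")) in e_o and (("B", "E"), ("C", "C")) in e_o)  # True
```

This toy has $|V_o| = 9$ rather than 16, but it contains the soft-matched walk (A, A), (B, E), (C, C) from the caption.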

4.5.1 Soft-Matching using DP

In this section we describe how soft-matching is added to the dynamic programming (DP) equations. Assume we have a product vertex $v_i = (a, b)$. We first introduce a substitution matrix $S$, where $S_{v_i} \in [0, 1]$ indicates the strength of the match for a product vertex. If a product vertex has matching labels (i.e. $\mathrm{label}(a) = \mathrm{label}(b)$), then $S_{v_i} = 1$. To describe the DP equations, we need notation to differentiate between product vertices with matching and non-matching labels. Let $V_N$ be the set of vertices from $V_o$ that do not have matching labels, i.e. $V_N = V_o \setminus V_\times$. For specific vertices we indicate this by a superscript; that is, $v_i^\times$ and $v_i^N$ denote vertices from $V_\times$ and $V_N$, respectively. The dynamic programming equations are as follows:

$$D_0(v_i^\times) = 1, \qquad D_0(v_i^N) = 0, \qquad D_n(v_i) = S_{v_i} \sum_{v_j : (v_j, v_i) \in E_o} D_{n-1}(v_j),$$

with the kernel computed in the same way as eq (4.3):

$$\kappa(G_1, G_2) = \sum_{v_i \in V_\times} \cdots \qquad (4.18)$$
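The DP recursion above can be sketched in Python as follows. Because the summand of eq (4.18) is not reproduced here, the hypothetical function `soft_match_dp` returns, for each length $n = 0, \dots, p$, the anchored total $\sum_{v_i \in V_\times} D_n(v_i)$; combining these per-length totals into the kernel follows eq (4.3) and is left to the caller. The toy product graph (paths A-B-C and A-E-C) and the choice $S_{v_i} = \alpha$ on $V_N$ are illustrative assumptions.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, Iterable, List, Set, Tuple

Vertex = Tuple[str, str]


def soft_match_dp(vertices: Iterable[Vertex],
                  edges: Iterable[Tuple[Vertex, Vertex]],
                  s: Dict[Vertex, float],
                  matching: Set[Vertex],
                  p: int) -> List[float]:
    """Return [sum of D_n(v) over v in V_x for n = 0..p]."""
    vertices = list(vertices)
    preds: Dict[Vertex, List[Vertex]] = defaultdict(list)
    for u, v in edges:                 # (u, v) in E_o
        preds[v].append(u)
    # D_0: walks may only start at matching vertices.
    d = {v: 1.0 if v in matching else 0.0 for v in vertices}
    totals = [sum(d[v] for v in matching)]
    for _ in range(p):
        # D_n(v) = S_v * sum of D_{n-1}(u) over predecessors u of v.
        d = {v: s[v] * sum(d[u] for u in preds[v]) for v in vertices}
        # Only walks that end in V_x are counted by the kernel.
        totals.append(sum(d[v] for v in matching))
    return totals


# Toy product graph of the paths A-B-C and A-E-C (names double as labels).
g1 = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
g2 = {"A": ["E"], "E": ["A", "C"], "C": ["E"]}
v_o = list(product(g1, g2))
e_o = [((a, b), (a2, b2)) for (a, b) in v_o for a2 in g1[a] for b2 in g2[b]]
v_x = {(a, b) for (a, b) in v_o if a == b}
alpha = 0.5
s = {v: 1.0 if v in v_x else alpha for v in v_o}
print(soft_match_dp(v_o, e_o, s, v_x, p=2))  # [2.0, 0.0, 2.0]
```

Each iteration touches every product vertex and its predecessors, i.e. $O(|V_o| d)$ work, so $p$ iterations cost $O(p|V_o|d)$. Setting `alpha = 0` reproduces the hard-matching case, where this toy has no anchored walks of length greater than 0.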

Setting $S_{v_i} = 1$ for $v_i \in V_\times$ and $S_{v_i} = \alpha$ for all $v_i \in V_N$ gives the kernel defined in eq (4.17) from Gärtner et al. (2003). Our definition, however, gives a much more general soft-matching kernel: one can choose specific weights for different mismatched labels, which allows expert knowledge to be incorporated. The walks are anchored to matching vertices by initializing the matching vertices to 1 and the non-matching vertices to 0 in the DP equations. The last vertex in the walk is anchored to a matching vertex as well, since the kernel in eq (4.18) only counts walks that finish in $V_\times$. The time complexity of this kernel is $O(p|V_o|d)$, where $d$ is the maximum degree of $G_o$. Although we have only shown the soft-matching DP equations for finite-length kernels here, the DP equations for infinite-length kernels are given in Section 4.7.
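As a sanity check of the anchoring argument, the sketch below enumerates every walk of a given length that starts and ends in $V_\times$, weighting each walk by the product of $S_v$ over the vertices it visits after the first, and compares the result with a DP pass on the same toy product graph. Under the setting $S_{v_i} = \alpha$ on $V_N$, a walk through $k$ non-matching vertices therefore receives weight $\alpha^k$. The function names and the toy graph (paths A-B-C and A-E-C) are illustrative assumptions.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

Vertex = Tuple[str, str]
Succ = Dict[Vertex, List[Vertex]]


def anchored_bruteforce(succ: Succ, s: Dict[Vertex, float],
                        matching: Set[Vertex], n: int) -> float:
    """Explicitly enumerate length-n walks from V_x to V_x."""
    total = 0.0
    stack = [(v, 0, 1.0) for v in matching]  # (vertex, steps taken, weight)
    while stack:
        v, steps, w = stack.pop()
        if steps == n:
            total += w if v in matching else 0.0  # anchor the last vertex too
            continue
        for u in succ[v]:
            stack.append((u, steps + 1, w * s[u]))
    return total


def anchored_dp(succ: Succ, s: Dict[Vertex, float],
                matching: Set[Vertex], n: int) -> float:
    """Same quantity via the DP recursion, pushing D_{n-1} along edges."""
    d = {v: 1.0 if v in matching else 0.0 for v in succ}
    for _ in range(n):
        incoming = {v: 0.0 for v in succ}
        for u, targets in succ.items():
            for v in targets:
                incoming[v] += d[u]
        d = {v: s[v] * incoming[v] for v in succ}
    return sum(d[v] for v in matching)


g1 = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
g2 = {"A": ["E"], "E": ["A", "C"], "C": ["E"]}
succ: Succ = {(a, b): [(a2, b2) for a2 in g1[a] for b2 in g2[b]]
              for (a, b) in product(g1, g2)}
v_x = {(a, b) for (a, b) in succ if a == b}
s = {v: 1.0 if v in v_x else 0.5 for v in succ}
for n in range(6):
    assert abs(anchored_bruteforce(succ, s, v_x, n) - anchored_dp(succ, s, v_x, n)) < 1e-9
print(anchored_dp(succ, s, v_x, 2))  # 2.0: four walks, each through (B, E), weight 0.5
```

Brute-force enumeration grows exponentially with the walk length, which is why the DP, with its $O(p|V_o|d)$ cost, is the practical route.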