2.3 Graphlet Kernels for Large Graph Comparison
2.3.2 Graph Kernels based on Graph Reconstruction
In this section, we define graph kernels based on the idea of decomposing a graph of size
n recursively into its subgraphs of size k. We will refer to these subgraphs as k minors, as formalized in the following definition.
Definition 15 (k Minors) Let M be an n × n matrix. The set of all size-k sub-matrices of
M obtained by deleting n − k rows and the corresponding columns of M is called the k minors of M. Analogously, given a graph G of size n, the set of all size-k graphs obtained by deleting n − k nodes from G is called the k minors of G.
Definition 16 (Principal Minors) Let M be an n × n matrix. The set of all (n − 1)
minors of M is called the set of principal minors. Analogously, given a graph G of size n, the set of all (n − 1) minors is called the principal minors of G.
In the sequel we will be concerned with 4 minors. Therefore, we study some of their properties now. For undirected graphs, the entries in the upper triangular submatrix
of the adjacency matrix completely determine the graph (recall that we are considering graphs without multiple edges and without self-loops). In the case of graphs of size 4, this submatrix contains 6 entries, each of which is either 1 or 0 depending on the presence or absence of the corresponding edge. Therefore, there are $2^6 = 64$ different types
of graphs of size 4. We refer to these 64 graphs as graphlets [Przulj, 2007], and denote them as $\mathcal{G}_4 = \{\text{graphlet}(1), \ldots, \text{graphlet}(64)\}$. Corresponding to these 64 graphlets one
can also compute a matrix $P \in \{0,1\}^{64 \times 64}$ whose entries are defined as
$$P_{ij} = \begin{cases} 1 & \text{if } \text{graphlet}(i) \simeq \text{graphlet}(j), \\ 0 & \text{otherwise.} \end{cases} \tag{2.30}$$
P precomputes the isomorphism relationship between graphlets.
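The construction of the 64 graphlets and of P can be sketched in a few lines. This is a minimal illustration and not the implementation used in this work: the names `graphlet`, `canonical`, `GRAPHLETS` and `NUM_CLASSES` are hypothetical, graphlets are indexed from 0 to 63 rather than 1 to 64, and isomorphism is decided by brute force over all 24 node permutations.

```python
from itertools import combinations, permutations

NODES = range(4)
# The 6 upper-triangular positions of a 4 x 4 adjacency matrix.
PAIRS = list(combinations(NODES, 2))


def graphlet(index):
    """Edge set of the labeled 4-node graph whose 6 bits encode `index` (0..63)."""
    return frozenset(pair for bit, pair in enumerate(PAIRS) if (index >> bit) & 1)


def canonical(edges):
    """Smallest sorted edge list over all 24 relabelings of the 4 nodes.

    Two graphlets are isomorphic exactly when their canonical forms agree.
    """
    return min(
        tuple(sorted(tuple(sorted((perm[u], perm[v]))) for u, v in edges))
        for perm in permutations(NODES)
    )


GRAPHLETS = [graphlet(i) for i in range(64)]
CANON = [canonical(g) for g in GRAPHLETS]

# P[i][j] = 1 iff graphlet i and graphlet j are isomorphic, cf. (2.30).
P = [[int(CANON[i] == CANON[j]) for j in range(64)] for i in range(64)]

# Number of isomorphism classes among the 64 labeled graphlets.
NUM_CLASSES = len(set(CANON))
```

In this sketch, row `P[1]` (a single edge) contains six ones, one for each of the six ways to place one edge on four labeled nodes.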
Recursive Graph Comparison
Graph reconstruction tries to establish isomorphism between graphs by checking their principal minors for isomorphism. Along the same lines, we define a graph kernel to measure similarity between graphs by comparing their principal minors. Motivated by the matrix reconstruction theorem, we recursively iterate this procedure down to subgraphs of size 4, resulting in a graph kernel based on graphlets.
Definition 17 (Graphlet Kernel) Given two graphs $G$ and $G'$ of size $n \geq 4$, let $M$ and $M'$
denote the sets of principal minors of $G$ and $G'$, respectively. The recursive graph kernel,
$k_n$, based on principal minors is defined as
$$k_n(G, G') = \begin{cases} \dfrac{1}{n^2} \sum_{S \in M,\, S' \in M'} k_{n-1}(S, S') & \text{if } n > 4, \\ \delta(G \simeq G') & \text{if } n = 4, \end{cases} \tag{2.31}$$
where $\delta(G \simeq G')$ is 1 if $G$ and $G'$ are isomorphic, and 0 otherwise. Now the graphlet kernel is defined as
$$k(G, G') := k_n(G, G'). \tag{2.32}$$
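The recursion in (2.31) can be transcribed directly. The following is a minimal sketch under stated assumptions: the graph representation (a node tuple plus a set of two-element frozensets) and the names `make_graph`, `principal_minors`, `isomorphic` and `k_rec` are hypothetical, and the base-case isomorphism check is a brute-force search over node permutations. Exact rational arithmetic is used so that the $1/n^2$ factors introduce no rounding.

```python
from fractions import Fraction
from itertools import permutations


def make_graph(n, edge_list):
    """Graph on nodes 0..n-1 as (node tuple, frozenset of 2-element frozensets)."""
    return tuple(range(n)), frozenset(frozenset(e) for e in edge_list)


def principal_minors(graph):
    """Yield the induced subgraphs obtained by deleting exactly one node."""
    nodes, edges = graph
    for v in nodes:
        yield (tuple(u for u in nodes if u != v),
               frozenset(e for e in edges if v not in e))


def isomorphic(g, h):
    """Brute-force isomorphism test; only called on graphs of size 4 here."""
    (g_nodes, g_edges), (h_nodes, h_edges) = g, h
    if len(g_edges) != len(h_edges):
        return False
    for image in permutations(h_nodes):
        relabel = dict(zip(g_nodes, image))
        if all(frozenset(relabel[u] for u in e) in h_edges for e in g_edges):
            return True
    return False


def k_rec(g, h):
    """Recursive graph kernel k_n of (2.31) for two graphs of equal size n >= 4."""
    n = len(g[0])
    assert n == len(h[0]) and n >= 4
    if n == 4:
        return Fraction(int(isomorphic(g, h)))
    total = sum(k_rec(s, t)
                for s in principal_minors(g)
                for t in principal_minors(h))
    return Fraction(total) / (n * n)


# Two small illustrative graphs: a 5-cycle and a 5-node path.
C5 = make_graph(5, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)])
P5 = make_graph(5, [(0, 1), (1, 2), (2, 3), (3, 4)])
```

For these two graphs, `k_rec(C5, C5)` evaluates to 1, `k_rec(P5, P5)` to 9/25 and `k_rec(C5, P5)` to 2/5. The recursion visits every ordered sequence of node deletions, so it is only practical for very small n.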
Lemma 18 The graphlet kernel is positive semi-definite.
Proof The proof is by induction. Clearly, $k_4(G, G') := \delta(G \simeq G')$ is a valid positive
semi-definite kernel [Schölkopf and Smola, 2002]. For any $n \geq j > 4$ let $k_{j-1}(S, S')$ be
a valid kernel. Since the class of positive semi-definite kernels is closed under addition and multiplication by a positive constant, it follows that $k_j(G, G')$ is a valid positive semi-definite kernel.
It is easy to see that the above kernel simply compares the 4 minors in both $G$ and $G'$, and hence can be computed non-recursively. This intuition is formalized below.
Lemma 19 Let $M_4$ and $M'_4$ denote the sets of 4 minors of $G$ and $G'$, respectively. The
graphlet kernel can be computed without recursion as
$$k(G, G') = k_n(G, G') = \sum_{S \in M_4} \sum_{S' \in M'_4} \delta(S \simeq S'). \tag{2.33}$$
Equivalently,
$$k(G, G') = k_n(G, G') = \sum_{S, S' \in \mathcal{G}_4} \#(S \sqsubseteq G)\, \#(S' \sqsubseteq G')\, \delta(S \simeq S'), \tag{2.34}$$
where $\#(S \sqsubseteq G)$ is the number of occurrences of $S$ in $G$, and $\#(S' \sqsubseteq G')$ the number of occurrences of $S'$ in $G'$.
Proof Clearly (2.33) is true for graphs of size 4. For $n > 4$ it follows by unrolling the recursion and noting that there are $n$ minors of size $n-1$, each of which has $n-1$ minors of size $n-2$, and so on.
To see (2.34) note that $M_4$ and $M'_4$ are multisets of elements from the graphlet set $\mathcal{G}_4$,
with each graphlet $S$ or $S'$ occurring $\#(S \sqsubseteq G)$ or $\#(S' \sqsubseteq G')$ times, respectively.
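The non-recursive formula can be illustrated by counting 4 minors per isomorphism class and multiplying the counts, which is (2.34) with isomorphic graphlets grouped together. This is a sketch only: `canonical_key`, `graphlet_counts` and `graphlet_kernel` are hypothetical names, graphs are assumed to be given as a node count plus an edge list, and the canonical form is found by brute force. The values returned are the raw sums of (2.33); if the $1/n^2$ factors of (2.31) are kept while unrolling, the result is this sum divided by $\binom{n}{4}^2$, which is constant for fixed n.

```python
from collections import Counter
from itertools import combinations, permutations


def canonical_key(quad, induced_edges):
    """Isomorphism-invariant key of the subgraph induced on the 4 nodes in `quad`."""
    return min(
        tuple(sorted(
            tuple(sorted((relabel[u], relabel[v]))) for u, v in induced_edges
        ))
        for relabel in (dict(zip(quad, perm)) for perm in permutations(range(4)))
    )


def graphlet_counts(n, edges):
    """Count the 4 minors of a graph on nodes 0..n-1, grouped by isomorphism class."""
    edge_set = {frozenset(e) for e in edges}
    counts = Counter()
    for quad in combinations(range(n), 4):
        induced = [(u, v) for u, v in combinations(quad, 2)
                   if frozenset((u, v)) in edge_set]
        counts[canonical_key(quad, induced)] += 1
    return counts


def graphlet_kernel(counts_g, counts_h):
    """Sum over isomorphism classes of the product of occurrence counts."""
    return sum(c * counts_h[key] for key, c in counts_g.items())


# Illustrative inputs: a 5-cycle and a 5-node path.
C5_COUNTS = graphlet_counts(5, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)])
P5_COUNTS = graphlet_counts(5, [(0, 1), (1, 2), (2, 3), (3, 4)])
```

Every 4 minor of the 5-cycle is a 4-node path, while the 5-node path has two 4-node paths among its five 4 minors, so `graphlet_kernel(C5_COUNTS, P5_COUNTS)` is 5 · 2 = 10.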
Since there are $\binom{n}{4}$, i.e., $O(n^4)$, 4 minors in a graph, the following corollary is immediate.
Corollary 20 Let $c$ denote the time required to perform an isomorphism check on two graphs of size 4. While a naive, recursive implementation of the recursive graph kernel requires $O(n^{2n} c)$ runtime, the runtime can be reduced to $O(n^8 c)$ via the non-recursive formula (2.33).
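As a worked instance of the counts behind this corollary, consider two graphs with n = 20 nodes (a value chosen purely for illustration):

$$\binom{20}{4} = \frac{20 \cdot 19 \cdot 18 \cdot 17}{4!} = 4845, \qquad \binom{20}{4}^2 = 4845^2 = 23{,}474{,}025.$$

Evaluating (2.33) for this single pair of graphs therefore already takes roughly $2.3 \times 10^7$ isomorphism checks.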
While Corollary 20 reduces the runtime from exponential to polynomial in the size of the graphs, the $n^8$ factor still represents a major problem in real-world applications. The
expensive step is the pairwise comparison of the 4 minors of both graphs. Note, however, that if one needs to compute the pairwise kernel on a database of $m$ graphs, then the $O(n^4)$ work per graph can be amortized by employing the following scheme: precompute all the 4 minors of each graph, check them for isomorphism to any of the 64 graphlets, and store their frequency of occurrence. Overall, this requires $O(m n^4 c)$ effort. Modulo isomorphism, there are only 11 distinct graphs of size 4. Therefore, computing each individual entry of the kernel matrix requires $O(1)$ effort. The total cost of computing the $m \times m$ kernel matrix therefore reduces from $O(m^2 n^8 c)$ to $O(m n^4 c + m^2)$. Typically, $m \leq n^4$ and therefore the overall time complexity is dominated by the $m n^4 c$ term. In the following, we will first
describe a scheme to perform the isomorphism checks efficiently, and then we will show how to avoid the $n^4$ term by an efficient sampling scheme that drastically speeds up the preprocessing step.
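The amortization scheme described above can be sketched as follows. This is an illustration and not the scheme developed in the following sections: the names `relabel`, `LOOKUP`, `histogram` and `kernel_matrix` are hypothetical, each 4 minor is encoded as a 6-bit integer over its upper-triangular adjacency entries, and a table built once maps each of the 64 codes to one of the 11 isomorphism classes. Every kernel entry is then a dot product of two length-11 histograms.

```python
from itertools import combinations, permutations

PAIRS = list(combinations(range(4), 2))           # 6 upper-triangular positions
PAIR_BIT = {pair: bit for bit, pair in enumerate(PAIRS)}


def relabel(code, perm):
    """6-bit code of the graphlet `code` after renaming node u to perm[u]."""
    out = 0
    for (u, v), bit in PAIR_BIT.items():
        if (code >> bit) & 1:
            a, b = sorted((perm[u], perm[v]))
            out |= 1 << PAIR_BIT[(a, b)]
    return out


# Built once: canonical code per graphlet, then a dense class index 0..10.
CANON = [min(relabel(code, p) for p in permutations(range(4)))
         for code in range(64)]
CLASS_OF = {c: i for i, c in enumerate(sorted(set(CANON)))}
LOOKUP = [CLASS_OF[CANON[code]] for code in range(64)]


def histogram(n, edges):
    """Frequencies of the isomorphism classes among the 4 minors of one graph."""
    adj = {frozenset(e) for e in edges}
    hist = [0] * len(CLASS_OF)
    for quad in combinations(range(n), 4):
        code = 0
        for bit, (i, j) in enumerate(PAIRS):
            if frozenset((quad[i], quad[j])) in adj:
                code |= 1 << bit
        hist[LOOKUP[code]] += 1
    return hist


def kernel_matrix(graphs):
    """m x m kernel matrix; `graphs` is a list of (n, edge_list) pairs."""
    hists = [histogram(n, edges) for n, edges in graphs]   # O(n^4) per graph
    return [[sum(a * b for a, b in zip(h, g)) for g in hists] for h in hists]


# Illustrative database of two graphs: a 5-cycle and a 5-node path.
DATABASE = [(5, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]),
            (5, [(0, 1), (1, 2), (2, 3), (3, 4)])]
K = kernel_matrix(DATABASE)
```

For this two-graph database the sketch yields `K == [[25, 10], [10, 9]]`. The table lookup replaces the explicit isomorphism check per 4 minor, so the preprocessing cost is governed by the enumeration of all $\binom{n}{4}$ node subsets.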