
Pattern analysis using graph structures has proved to be a challenging problem. Because the nodes of a graph have no natural ordering, a graph cannot easily be converted to a pattern vector, and so the classical statistical methods from pattern recognition and machine learning cannot be applied to graphs directly. One way to overcome this problem is to extract from the graph a pattern vector that captures its structure in a permutation-invariant way. Simple examples of such features are the number of nodes, the number of edges, and the node degrees.
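As a minimal illustration of permutation invariance, the sketch below computes the simple features mentioned above from an adjacency matrix and checks that they do not change when the nodes are relabelled. The function names `simple_invariants` and `relabel` are hypothetical and are not taken from any of the cited works. The degrees are sorted, since an unsorted degree list would depend on the node order.

```python
def simple_invariants(adj):
    """Return (number of nodes, number of edges, sorted degree sequence)
    for an undirected simple graph given as a 0/1 adjacency matrix."""
    degrees = sorted(sum(row) for row in adj)
    num_edges = sum(degrees) // 2  # each edge is counted at both endpoints
    return (len(adj), num_edges, tuple(degrees))


def relabel(adj, perm):
    """Return the adjacency matrix after renaming the nodes according to perm."""
    n = len(adj)
    return [[adj[perm[i]][perm[j]] for j in range(n)] for i in range(n)]


# Path graph 0-1-2-3 and a relabelled copy of the same graph.
path = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
shuffled = relabel(path, [2, 0, 3, 1])

print(simple_invariants(path))
print(simple_invariants(shuffled))
```

Both calls print the same tuple even though the two adjacency matrices differ entry by entry.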

Another approach to graph characterization is to extract permutation-invariant characteristics from a matrix representation of the graph. The matrix representation can be the adjacency matrix or the closely related Laplacian matrix. In [83], Wilson et al. have used the spectral decomposition of the Laplacian matrix and basis sets of symmetric polynomials to convert graphs into pattern vectors. These pattern vectors are complete, unique and continuous and, more importantly, they are permutation invariant. They have also explored different methods for embedding the vectors in a pattern space suitable for clustering, including principal component analysis (PCA), multidimensional scaling (MDS) and locality preserving projection (LPP).
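The following sketch shows only the simplest spectral invariant, the sorted eigenvalues of the Laplacian, and is not the symmetric-polynomial construction of Wilson et al. It assumes the usual convention L = D − A, where D is the diagonal degree matrix and A the adjacency matrix. The function name `laplacian_spectrum` is a hypothetical helper.

```python
import numpy as np


def laplacian_spectrum(adj):
    """Eigenvalues (ascending) of the Laplacian L = D - A of an undirected graph."""
    A = np.asarray(adj, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    return np.linalg.eigvalsh(L)  # eigvalsh: symmetric input, sorted output


# Path graph on three nodes and a relabelled copy of it.
path3 = np.array([[0, 1, 0],
                  [1, 0, 1],
                  [0, 1, 0]])
perm = [2, 0, 1]
relabelled = path3[np.ix_(perm, perm)]

print(laplacian_spectrum(path3))
print(laplacian_spectrum(relabelled))
```

Relabelling the nodes permutes the rows and columns of L simultaneously, which leaves the eigenvalues unchanged. The spectrum alone does not identify a graph uniquely, since non-isomorphic graphs can share the same eigenvalues.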

Ren et al. [59] have used the coefficients of the characteristic polynomial of a matrix representation of the graph to embed the graph into a higher-dimensional feature space. The characteristic polynomial is the determinant det(λI − M), where I is the identity matrix, M is the matrix representation of the graph, and λ is the variable of the polynomial. With an appropriate choice of matrix, these coefficients capture the cyclic structure of the graph [62,70]. They have explored a number of different matrix representations, including the adjacency matrix; the Laplacian matrix, which is the diagonal degree matrix minus the adjacency matrix; the Perron-Frobenius operator, which is the adjacency matrix of a transformed version of the graph; and the adjacency matrix of the positive support of the third power of the discrete-time quantum walk matrix. They have shown experimentally that the polynomial coefficients perform better than the graph spectra.
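A small sketch of this idea for the adjacency matrix is given below. It relies on NumPy's `np.poly`, which returns the coefficients of det(λI − M) for a square matrix M, highest power first. The helper name `char_poly_coefficients` is hypothetical, and the rounding step assumes an integer-valued matrix such as an unweighted adjacency matrix.

```python
import numpy as np


def char_poly_coefficients(M):
    """Coefficients of det(lambda*I - M), highest power of lambda first."""
    coeffs = np.real(np.poly(np.asarray(M, dtype=float)))
    return np.round(coeffs, 8) + 0.0  # remove floating-point noise and -0.0


# Adjacency matrix of a triangle (3 nodes, 3 edges, 1 triangle).
triangle = [[0, 1, 1],
            [1, 0, 1],
            [1, 1, 0]]

print(char_poly_coefficients(triangle))
```

For the triangle the polynomial is λ³ − 3λ − 2. For an adjacency matrix, the coefficient of λ^(n−2) equals minus the number of edges and the coefficient of λ^(n−3) equals minus twice the number of triangles, which is one way in which the coefficients reflect cycle structure.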

Another approach to graph embedding, related to the embedding methodology we propose in this thesis, is to use the frequencies of substructures as features for the vectorial representation of the graph. These substructures can be random walks, shortest paths, or cycles in the graph. For example, Kashima et al. [35] use a frequent-path-finding algorithm that finds m frequent paths for constructing the feature space. Recently, Ren et al. [62] have explored the use of the Ihara zeta function as a means of gauging cycle structure in graphs. The Ihara zeta function is computed by first converting a graph into the equivalent oriented line graph, and then computing the characteristic polynomial of the resulting structure. The coefficients of the characteristic polynomial are related to the frequencies of prime cycles of different sizes, and can be computed in polynomial time from the eigenvalues of the adjacency matrix of the oriented line graph. The method can easily be extended from simple graphs to both weighted graphs [61] and hypergraphs [60].
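The construction described for the Ihara zeta function can be sketched as follows, under stated assumptions. Each undirected edge is replaced by two arcs, and arc (u, v) is linked to arc (v, w) whenever w ≠ u, so that backtracking is excluded. The coefficients of det(I − uT), where T is the adjacency matrix of this oriented line graph, are then read from its characteristic polynomial. The function names are hypothetical, and this is an illustrative sketch, not the implementation of Ren et al.

```python
import numpy as np


def oriented_line_graph(edges):
    """Adjacency matrix T of the oriented line graph of an undirected graph.

    Nodes of the oriented line graph are the arcs (both directions of each
    edge); arc (u, v) points to arc (v, w) whenever w != u (no backtracking).
    """
    arcs = [(u, v) for u, v in edges] + [(v, u) for u, v in edges]
    T = np.zeros((len(arcs), len(arcs)))
    for i, (u, v) in enumerate(arcs):
        for j, (x, w) in enumerate(arcs):
            if x == v and w != u:
                T[i, j] = 1.0
    return T


def ihara_coefficients(edges):
    """Coefficients c_0, c_1, ... of det(I - u*T) = sum_k c_k u^k.

    np.poly(T) gives det(lambda*I - T) with the highest power first; the
    k-th entry of that list is also the coefficient of u^k in det(I - u*T).
    """
    T = oriented_line_graph(edges)
    return np.round(np.real(np.poly(T)), 8) + 0.0


triangle_edges = [(0, 1), (1, 2), (2, 0)]
print(ihara_coefficients(triangle_edges))
```

For the triangle, T is a permutation matrix made of two directed 3-cycles, so det(I − uT) = (1 − u³)² = 1 − 2u³ + u⁶. The coefficient −2 of u³ corresponds to the single triangle traversed in its two orientations.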

Riesen et al. [65] have proposed a general approach for transforming graphs into n-dimensional real vector spaces by means of prototype selection and graph edit distance computation. The key idea is to use the distances of an input graph to a number of training graphs as a vectorial description of the graph. Their method is based on the idea of mapping pattern vectors into dissimilarity spaces [50,51]. This idea was also applied to strings by Spillmann et al. in [73]. The main challenge in this method is the selection of the prototypes. Riesen et al. [65] have proposed the following five prototype selectors:

Centres: selects the m prototypes situated in the centre of the graph set Γ.

Random: randomly selects m prototypes from the graph set Γ.

Spanning: iteratively selects m prototypes as follows. The first prototype selected is the set median graph. Each additional prototype selected by the spanning prototype selector is the graph which is furthest away from the already selected prototype graphs.

the dissimilarity information.

Targetsphere: This method first selects the centre graph and the graph which is furthest away from the centre. The distance between these two graphs is divided into equal-width intervals, and the remaining m − 2 prototypes are selected by locating the graphs that are nearest to the interval borders in terms of edit distance.
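The spanning selector and the resulting dissimilarity embedding can be sketched as follows. The sketch assumes that a matrix of pairwise graph edit distances has already been computed. Here it is replaced by a small hypothetical distance matrix, and "furthest away from the already selected prototypes" is interpreted as maximising the minimum distance to any selected prototype. All function names are illustrative.

```python
def set_median(dist):
    """Index of the graph with the smallest sum of distances to all others."""
    return min(range(len(dist)), key=lambda i: sum(dist[i]))


def spanning_prototypes(dist, m):
    """Select m prototype indices: the set median first, then repeatedly the
    graph whose nearest already-selected prototype is furthest away."""
    selected = [set_median(dist)]
    while len(selected) < m:
        candidates = [i for i in range(len(dist)) if i not in selected]
        nxt = max(candidates, key=lambda i: min(dist[i][p] for p in selected))
        selected.append(nxt)
    return selected


def embed(distances_to_training_graphs, prototypes):
    """Vector of distances from one graph to each selected prototype."""
    return [distances_to_training_graphs[p] for p in prototypes]


# Hypothetical stand-in for edit distances: five "graphs" placed on a line.
positions = [0, 1, 2, 3, 10]
dist = [[abs(a - b) for b in positions] for a in positions]

prototypes = spanning_prototypes(dist, 3)
print(prototypes)
print(embed(dist[1], prototypes))
```

With m prototypes every graph is mapped to an m-dimensional real vector, so standard vector-based classifiers and clustering methods can then be applied.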

Many other approaches have been proposed for embedding graphs in a feature space. Jouili et al. [34] have proposed a graph embedding technique based on the constant shift embedding, which transforms a graph into a real vector. The constant shift embedding increases all dissimilarities by an equal amount to produce a set of Euclidean distances. This set of distances can be realized as the pairwise distances among a set of points in a Euclidean space. Xiao et al. [87] have used the solution of the heat equation, called the heat kernel, to embed the nodes of a graph into a higher-dimensional Euclidean space. They have used the geometric properties of the resulting embedding to characterize the graph.
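The constant shift idea can be illustrated with the sketch below. It assumes that the entries of the symmetric, zero-diagonal dissimilarity matrix D play the role of squared distances, centres D, and adds the smallest constant to all off-diagonal entries that makes the centred matrix positive semidefinite. The function name and the example matrix are hypothetical, and the sketch covers the generic technique only, not the specific pipeline of Jouili et al.

```python
import numpy as np


def constant_shift_embedding(D):
    """Return (X, shift): points X (one per row) whose squared Euclidean
    distances equal D + shift for every pair of distinct items."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    Q = np.eye(n) - np.ones((n, n)) / n         # centring matrix
    S = -0.5 * Q @ D @ Q                        # centred similarity matrix
    lam_min = min(np.linalg.eigvalsh(S)[0], 0.0)
    shift = -2.0 * lam_min                      # added to every off-diagonal entry
    S_shifted = S - lam_min * np.eye(n)         # now positive semidefinite
    vals, vecs = np.linalg.eigh(S_shifted)
    vals = np.clip(vals, 0.0, None)             # guard against round-off
    X = vecs * np.sqrt(vals)                    # S_shifted = X @ X.T
    return X, shift


# Hypothetical dissimilarities that cannot be squared Euclidean distances:
# items 0 and 2 are far apart although both are close to item 1.
D = np.array([[0.0, 1.0, 25.0],
              [1.0, 0.0, 1.0],
              [25.0, 1.0, 0.0]])

X, shift = constant_shift_embedding(D)
diff = X[:, None, :] - X[None, :, :]
squared = (diff ** 2).sum(axis=-1)
off_diagonal = ~np.eye(len(D), dtype=bool)
print(shift)
print(np.allclose(squared[off_diagonal], (D + shift)[off_diagonal]))
```

Because the same constant is added to every pair, the ordering of the dissimilarities is preserved while the shifted values become realizable as distances among points in a Euclidean space.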