Some Observations in Design Pattern Detection by Graph Matching

(1)

Rajwant Singh Rao

Depart of Computer Science & Information Technology

Guru Ghasidas Vishwavidyalaya Bilaspur (C.G.)

Abstract

—

In many object oriented software, there are recurring patterns of classes. With the help of these patterns specific design problem can be solved and object oriented design become more flexible and reusable. Design Pattern existence improve the program understanding and software maintenance. Hence a reliable design pattern mining is required. Graph Matching algorithms are useful and very general form of pattern matchingto find the realistic use in several areas. In this paper we are reviewing the different graph matching algorithms to detect design patterns.

Keywords— design pattern, UML, matrix, subgraph isomorphism

I. INTRODUCTION

To reuse expert design experiences, the design patterns [1] have been extensively used by software industry. When patterns are implemented in a system, the pattern-related information is generally no longer available. To understand the systems and to modifications in them it is necessary to recover pattern instances. For graph matching there are many algorithms, we are reviewing some of the algorithms which can be applied for design pattern detection. The details of each of the algorithms are discussed in sections. The Section II describes about the exact graph matching. Section III describes about the inexact graph matching. Section IV describes the Kleinberg Approach for vertices scoring and Section V describes Subgraph Isomorphism detection.

II. EXACT GRAPH MATCHING ALGORITHMS Exact graph matching processes [7], used to discover a one-to-one correspondence (isomorphism) between the vertices of two graphs which abstain the similar figure of nodes so that there is also a one-to one correspondence between the related edges. In the perspective of design pattern mining, the application of such a procedure would need the investigation of entirely possible sub graphs of the system Graph that have the similar number of vertices with the pattern. The most significant drawback, is that a given design pattern may be

implemented in various forms that differ from the basic structure found in the literature, and as a result exact matching is insufficient for design pattern detection.

III. INEXACT GRAPH MATCHING ALGORITHM

The inexact graph matching algorithm is apply when an one to one mapping is not found between two graphs and the aim is to find the best matching between both graphs. Consider an example, there are algorithms which calculate the edit distance between two graphs [1], defined as the number of modifications required to reach from one graph to the other graph. In the context of design pattern detection this gives inaccurate results.

In inexact graph matching algorithm which graph takes less number of modifications to reach to the target graph is said to the closer to the target graph. In context of design pattern detection it gives the inaccurate result. This is given by the following example.

Example:

Consider the system under study contains two segments represented by the corresponding class diagram, which is shown in figure 1.

The design pattern which is to be detected is also graphically depicted in Fig. 1. This pattern is known as the Facade elemental design pattern [2].The class diagram of segment 1 is a modified version of the design pattern, containing an additional inheritance level. On the other hand, the class diagram of segment 2 does not form a pattern since it only consists of a simple hierarchy of classes. Fig. 2 shows the class diagrams as graphs (one for associations and one for generalizations).

An inexact matching algorithm shows that the class diagram of segment 2 is closer to the pattern. This is because, to obtain

(2)

the graphs of the pattern from the corresponding graphs of segment 2, only two operations is required (one node deletion in the generalization graph i.e. node c and one edge ba and for association graph only the deletion of node a.).

While, to obtain the graphs of the pattern from the corresponding graphs of segment 1, four edit operations in

total are required (generalization graph: deletion of edges CD and, deletion of node A, and node D), deletion in association graph: nodes A and D).

A

B

D

C

a

b

c

2

1

[image:2.595.44.517.231.679.2]

System segment1

System Segment2

Element Design Pattern

Fig. 1 UML class diagrams of two system segments and a design pattern.

Generalization Graph

Association Graph

System Segment1 System segment2 Element Design Pattern

Fig. 2. Corresponding graphs for the UML diagrams shown in Fig. 1 (Letters within nodes are not labels but indicate the name of the corresponding node.)

A

B

a

b

c

1

2

A

B

a

b

c

1

2

D

C

D

(3)

Generalization Matrix

1 2

A B C D

a b c

1

0 0

A 0 0 0 0

a 0 0 0

Gen

pattern

=

Gen

Seg1

= B 1 0 0 0 Gen

Seg2=

b 1

0 0

2

0

C 0 0 0 1

c 0 0 0

D 0 0 0 0

Association Matrix

A B C D

a

b c

1

2

a 0 0

0

A 0 0 0 0

b 0 0

0

Gen

pattern

=

1

0

Gen

Seg1

= B

0 0 0 0

Gen

Seg2

= c

1 0

0

2

1

0

[image:3.595.40.564.94.340.2]

C 1 0 0 0

D 0 0 0 0

Fig. 3. Adjacency matrices resulting from the corresponding graphs in Fig. 2.

IV. KLEINBERG APPROACH FOR VERTICES

SCORING

Kleinberg [3] presented a link analysis algorithm for identifying pages on the web (by calculating hub and authority). That is authoritative sources on broad search queries.

The hubs can be defined as pages which points to many good authorities and authorities are the pages which are pointed to by many good hubs.

Kleinberg [3] presented an algorithm for computing hub and authority score as given below.

• LetNbe the set of nodes in the neighbourhood graph.

• For every nodeninN, letH[n] denotes its hub score andA[n] its authority scores.

• InitializeH[n] andA[n] from 1to n for all n Є N • While the vectorsHandAhave not converged: Do

• For allninN,A[n]:= Σ (n, nl)H[nl] • For allninN,H[n] := Σ (n, nl)A[nl] • Normalize theHandAvector.

This is illustrated by following example.

Consider the UML diagram corresponding to system under study (Fig. 4) and the UML diagram corresponding to design pattern (Fig. 5)

The generalization, direct association and dependency graph of system under study (i.e. corresponding to UML diagram shown Fig. 4) is shown Fig. 6

The generalization and dependency graph of Factory Method design pattern (i.e. corresponding to UML diagram shown Fig. 5) is shown in Fig. 7.

By applying Kleinberg’s approach for hub and authority score, we have Fig.8 and Fig.9 by which we can find out how similar the two nodes of taken graphs are. For generalization graph both model graph (or system under study) and design pattern have similar hub and authority score, but for dependency graph they differ. From Fig.8 and Fig.9, we find that the similarity between two nodes of the taken graphs. For generalization graph both model graph (or system under study) and design pattern have similar hub and authority score, but for dependency graph they differ.

(4)

b

Client

AbstractFactory

+CreateProduct()

AbstractProduct

[image:4.595.43.276.75.672.2]

ConcreteFactory ConcreteProduct

Fig. 4 System Design

Product

ConcreteProduct

Creator

+FactoryMethod()

ConcreteCreator

Fig. 5 Factory Method Design Pattern

Generalization Direct Association Dependency

Graph Graph Graph

Fig. 6 Corresponding Graphs for UML diagram shown in Fig. 4

Generalization Graph Dependency Graph

Fig. 7 Corresponding Graphs for UML diagram shown in Fig. 5

Hub Authority Hub Authority

a

0

a 0

0

b

0

0.5

b 0

0

c

0

0.5

c 0

0

d

0.5 0

d 1

0

[image:4.595.304.519.96.207.2]

e

0.5 0

e 0

1

Fig. 8 Hub and Authority score for Generalization graph and Dependency graph of Fig.6

Hub Authority Hub Authority

a

0

a

0

b

0

0.5

b

0

c

0

0.5

c

0

d

0.5

0

d

0

1

e

0.5

0

e

1

0

Fig. 9 Hub and Authority score for Generalization graph and Dependency graph of Fig. 7

V. SUB GRAPHISOMORPHISMDETECTION A. Theoretical Approach

The subgraph isomorphism is a useful generalization of graph isomorphism. The subgraph isomorphism problem [4] is determine whether a graph is isomorphic to a subgraph of any other graph. Let [5] G1 (V1, E1) and G2 (V2, E2) be two graphs, where V1, V2 are the set of vertices and E1, E2 are the set of edges.

Let M1 and M2 be the adjacency matrices corresponding to the graphs G1 and G2 respectively. A permutation matrix is a square (0, 1)-matrix that has exactly one entry 1 in each row and each column and 0's elsewhere. Two graphs G1 (M1, Lv, Le) and G2 (M2, Lv, Le) are said to be isomorphic [5] if there exist a permutation matrix P such that

M2 = P M1 PT ₍₁₎

Given an n x n matrixM =(mij),let Sk,m(M) denote the k x m matrix which obtained from M by deleting rows k + 1 , . . . , n and columns m + 1 , . . . , n, where k, m < n.A subgraphSof a graph G, S G, is a graph S = (Mi_{, L}_v_{, L}_e₎ _where Mi_{= S}_m,m_{(P M P}T_{) is an m x m adjacency matrix for some} permutation matrix P. The concept of subgraph isomorphism can now be described as follows.

Let G1 and G2 be two graph with adjacency matrices M1 and M2 of dimensions m x m and n x n respectively, where

c

d

e

a

c

d

e

a

c

d

e

a

c

d

e

a

c

d

e

[image:4.595.304.511.241.344.2]

(5)

m < =n. There is a subgraph isomorphism from G1 to G2 if there exists an n x n permutation matrix P such that

M1 = Sm,m(P M2 PT) (2)

We take M2 matrix as a system design matrix and we guess nondeterministically M1 as a design pattern matrix. And then try to find out whether M1 is subisomorphic to M2 or not or it can be easily said whether there exist design pattern in the system graph or not.

Hence the problem of finding a subgraph isomorphism from graph G1 to G2 is equivalent to finding a permutation matrix for which equation (2) holds. Thus, we generate permutation adjacency matrix of a model graph (system under study) one by one and check whether equation (2) holds or not, when it holds we stop and declare that that particular design pattern has been detected. It can be also possible that there is no design pattern exists in system graph. In this case we find no permutation matrix for which equation (2) holds.

B. MATRIX REPRESENTATION

Here we will take the system under study as well as design pattern in the form of UML diagrams. There is one or more design features are present in system and patterns, they can be represented in the form of matrices. For example there is generalization relationship, we have to create n x n matrix for n number of classes. If [6] the ith _{and j}th _{classes have} generalization relationship, the corresponding cell of the matrix should be 1, otherwise 0.

To reduce the number of manipulations, we combine different matrices (like generalization matrix, association matrix, dependency matrix, aggregation matrix etc) into a single matrix. So there will be only two overall matrices, one corresponding to system and one for design pattern. To combine different matrices into an overall matrix, certain root value of a prime number is given to a matrix (different feature matrices) and then combine them. The cell value [2] of each matrix is then changed to the value of its root to the power of the old cell value.

Let us consider the UML diagram corresponding to system design (fig 4) and other UML diagrams corresponding to design patterns (i.e. Façade, Factory Method, Mediator, Singleton).

Facade

[image:5.595.320.534.82.674.2]

Subsystem Classes

Fig. 10 Façade Design Pattern

Mediator

ConcreteMediator

Colleague +mediator

Fig. 11 Mediator Design Pattern

Singleton

+Instance(): Singleton -instance

Fig. 12 Singleton Design Pattern

Direct Association matrix of fig 4 (root = 2)

(6)

Generalization matrix of fig 4 (root = 5)

[image:6.595.72.540.64.759.2]

Overall Matrix

Fig. 13 Matrix Representation of System Design

1. Calculation of matrices for design patterns

Let us consider figure 2 i.e. for Facade Design Pattern. There should only one matrix corresponding to direct-association.

Class 0: Façade

Class 1: Subsystem Classes

Direct Association matrix of fig 10(root = 2)

Overall Matrix

Fig. 14 Matrix Representation of Facade Design Pattern

Now consider figure 5 i.e. Factory Method Design Pattern for dependency and generalization.

Class 0 : Product Class 1 : Creater

Class 2 : ConcreteProduct Class 3 : ConcreteCreater

Dependency matrix of fig 5 (root = 3)

Generalization matrix of fig 5 (root = 5) 20₃0₅0 ₂1₃0₅0 ₂1₃0₅0 ₂0₃0₅0 ₂0₃0₅0

20₃0₅0 ₂0₃0₅0 ₂0₃0₅0 ₂0₃0₅0 ₂0₃0₅0

20₃0₅0 ₂0₃0₅1 ₂0₃0₅0 ₂0₃0₅0 ₂0₃1₅0

20₃0₅0 ₂0₃0₅0 ₂0₃0₅1 ₂0₃0₅0 ₂0₃0₅0

20 ₂1

20 ₂0

30₅0 ₃0₅0 ₃0₅0 ₃0₅0

30₅1 ₃0₅0 ₃0₅0 ₃0₅0

[image:6.595.77.249.76.401.2]

(7)

[image:7.595.127.222.83.173.2]

Overall Matrix

Fig. 15 Matrix Representation of Factory Design Pattern

Consider Mediator Design Pattern, now two design features are there, direct-association and generalization.

Class 0 : Mediator Class 1 : Colleague Class 2 : ConcreteMediator

Generalization matrix of fig 11 (root = 5)

Overall Matrix

Fig. 16 Matrix Representation of Mediator Design Pattern

Consider Prototype design Pattern, there are also two design patterns direct-association and generalization (as in the case of mediator design pattern).

Class 0 : Singleton [ 1 ]

[21_]

[image:7.595.347.526.94.210.2]

[2] Overall Matrix

Fig 17: Matrix Representation of Singleton Design Pattern Z

2 Design pattern detection

The main objective of this project is to detect design pattern in given system. We have the UML diagrams of corresponding system design and design patterns and then we can calculate overall matrices corresponding to them (by above given procedure). Thus one matrix for system design and another for design pattern. First we non-deterministically guess any design pattern (or subgraph) of system design pattern and then by applying equation (2) it can be find out whether they really are sub-isomorphic or not or design pattern exists or not in a particular system design.

Let us consider the overall matrix corresponding to system design i.e., ad guess any subgraph of it.

.

a. Design Pattern Detection as Façade Design Pattern

First we guess that Façade Design Pattern (figure 10) exists in system design (or can say that there is subgraph

isomorphism is in between them).

Let there is a permutation matrix P of order of system design i.e. 5x5,

20₅0 ₂0₅0 ₂0₅0

21₅0 ₂0₅0 ₂0₅0

[image:7.595.62.301.181.664.2]

(8)

Calculation of P S PT

After eliminating entries from 3rd_{row and 3}rd_{column we} have reduced matrix as (because Façade Design Pattern is of 2 x 2 orders)

This is same as Façade design pattern’s overall matrix. So subpart of system design has a carbon copy as a facade design pattern in system design or can be said that Façade design pattern is detected in system design.

b. Design Pattern Detection as Factory Method Design Pattern

Now suppose we guess that Factory Method Design Pattern (figure 5) exists in system design

If we take permutation matrix as above then after calculating P S PT_{and after eliminating 5}th_{row and 5}th_{column (since} factory method has order 4 x 4),

Which is not the same as FM (i.e. adjacency matrix corresponding to Factory Method Design Pattern).

Now take permutation matrix as

After eliminating 5th_{row and 5}th_{column (since factory method} has order 4 x 4),

P =

P

T

=

(9)

Which is not the same as FM ( i.e. adjacency matrix corresponding to Factory Method Design Pattern).

Now assume permutation matrix as

After eliminating 5th_{row and 5}th_{column (since factory method} has order 4 x 4),

This is the same as Factory Method Design Pattern Matrix. So there is a design pattern detection held in system design.

c. Pattern Detection as Mediator Design Pattern

Similarly if permutation matrix is assumed as

Then it can be easily verified that the mediator design pattern is identified as design pattern which exists in system design.

d. No Design Pattern Detection

It can be possible that a particular design pattern does not exist in system design. In this there will be no permutation matrix for which we can find out (after row and column elimination) a matrix which is equivalent to design pattern matrix.

For example if we take singleton design pattern (figure 12), there should be no permutation matrix for which we could detect singleton design pattern in system design.

VI. Searching for minimal key structures

In this method a defined key structure is associated to each pattern. The key structure for pattern describes the number of minimum classes and objects that are present in that structure. By define the key structure of pattern the pattern can be securely identified. The properties of key structure are used as the search criteria. There are three software systems for automated searching is known which are based on this approach. These are DP++[8], for C++, KT [9] for Smalltalk and SPOOL [10] realized for C++, applicable for Java and Smalltalk.

The DP++[8] searches the following design patterns: COMPOSITE, DECORATOR, ADAPTER, FACADE, BRIDGE, FLYWEIGHT, TEMPLATE METHOD and CHAIN OF RESPONSIBILITY. DP++ [8] is not applicable for other patterns. This tool has three parts: (i) C++ Code Translation Subsystem used for analyzing source code. (ii) Pattern Detection Subsystem used for the recognition of generation patterns and (iii) Display Subsystem used for the visualization of detected patterns. Information about the achieved values of recall and precision is not available.

(10)

SPOOL [10] is capable of searching BRIDGE, FACTORYMETHOD and TEMPLATEMETHOD patterns.

VII. Searching for class structures

This approach uses the pattern class structures described by Gamma patterns [11]. There are three software systems for automated search which are based on this approach. These are Pat [12] for C++, IDEA [13] for UML diagrams and the multi step search tool [14].

Pat [12] describes the pattern class structure by PROLOG rules. This tool is able to finds ADAPTER, PROXY, BRIDGE, DECORATOR and COMPOSITE patterns. This search tool has been tested with software systems containing 9–343 classes. The achieved recall value in each case is 100% and the average precision value is 36.75%.

IDEA [13] based on UML search approach which uses class and collaboration diagrams. It also used PROLOG rules. This is able to find the following patterns: TEMPLATE METOD,

PROXY, ADAPTER, BRIDGE, COMPOSITE,

DECORATOR, FACTORY METHOD, ABSTRACT

FACTORY, ITERATOR, OBSERVER and PROTOTYPE. The IDEA [13] search algorithm is given below:

Step1. Input UML diagram to perform search algorithms Repeat through step 2 to 5

Step2. In the UML model matches the structural templates with the class diagrams.

Step3. Refine with collaboration templates and diagrams Step4. Consolidate detected patterns and select critics. Step5. Display the patterns and critics

The multi step search tool [14] is able to find the following patterns ADAPTER, BRIDGE, PROXY, COMPOSITE and DECORATOR. This is unable to recognize the other patterns. The multi step search tool was tested with different C++ Libraries [14]. The achieved recall value is 100% and an average precision value is 35%.

VIII. Fuzzy Logic

As described in [15] this approach is based on structural differences between Gamma patterns [11] and real life software systems. To solve this problem patterns are supposed as composition of sub patterns with inheritance relations. This approach uses fuzzy logic to test different pattern implementations.

VIII. Metrics

This idea is based on characterization of each

pattern by metrics. As described in [16] there are

three classifications of metrics. It is given below for

each metrics with example:

Object-oriented Metrics

Weighted methods per class Depth of inheritance tree Number of children (subclasses) Coupling between objects

Structural Metrics

Fan-in, number of modules sending information to the observed module

Fan-out, number of modules receiving information of the observed module

Information flow, structural complexity

Procedural Metrics

Pure lines of code

McCabe’s cyclomatic complexity Lines of comments

For each of the required patterns these metrics are calculated for the system by using a tool.

A pattern repository and a pattern wizard are available for the metric approach. This tool has been tested. The result of these test are a very low average precision value of 43.55%.

As described in [16] this approach is applicable for

all Gamma patterns.

Conclusion:

The exact graph matching, inexact graph matching and Kleinberg Approach for vertices scoring show the similarity between nodes not in the whole graph. To reduce this drawback, another approach is introduced, called sub graph isomorphism detection. The subgraph isomorphism shows whether a graph is isomorphic to a subgraph of any other graph or not .It uses the concept of Overall matrix to reduce the number of manipulations, it combines different matrices (like generalization matrix, association matrix, dependency matrix, aggregation matrix etc) into a single matrix. So there will be only two overall matrices, one corresponding to system and one for design pattern. After it we try to find out whether a particular design pattern exists on the given graph.

References:

[1] D.J. Cook and L.B. Holder, “Substructure Discovery Using Minimum Description Length and Background Knowledge,”J. Artificial Intelligence Research, vol. 1, pp. 231-255, Feb. 1994

(11)

[3] J.M. Kleinberg, “Authoritative Sources in a Hyperlinked Environment,” J. ACM, vol. 46, no. 5, pp. 604-632, Sept. 1999.

[4] Gabriel Valiente, Algorithms On Trees And Graphs, 2002, page 367

[5] B.T. Messmer, H. Bunke, Subgraph isomorphism detection in polynomial time on preprocessed model

graphs, Second Asian Conference on Computer Vision, 1995, pp. 151–155.

[6] Jing Dong, Yongtao Sun, Yajing Zhao, Design Pattern Detection By Template Matching, the Proceedings of the 23rd Annual ACM Symposium on Applied Computing (SAC), pages 765-769,Ceará, Brazil, March 2008 [7] Nikolaos Tsantalis, Alexander Chatzigeorgiou,

George Stephanides, and Spyros T. Halkidis,” Design Pattern Detection Using Similarity Scoring” [8] Bansiya J (1998) Automatic Design-Pattern Identification.

Dr. Dobb’s Journal. Available online at: http://www.ddj.com

[9] Brown K (1996) Design reverse-engineering and automated design pattern detection in Smalltalk. Master’s thesis. Department of Computer Engineering, North Carolina State University. Available online at:http://www.ncsu.edu/ [10]. Keller RK, Schauer R, Robitaille S, Page P (1999) Pattern based reverse engineering of design components. In: Proc. Of the 21st International Conference On Software Engineering. IEEE Computer Society Press, pp 226–235 [11]. Gamma E, Helm R, Johnson R, Vlissides J (1995) Design Patterns – Elements of Reusable Object-Oriented Software, Addison Wesley

[12]. Kraemer C, Prechelt L (1996) Design recovery by auto-mated search for structural design patterns in object-oriented software. In: Proc. of the Working Conference on Reverse Engineering, pp. 208–215

[13]. Bergenti F, Poggi A (2000) Improving UML design using automatic design pattern detection. In: Proc. 12th. International Conference on Software Engineering and Knowledge Engineering (SEKE 2000), pp 336–343

[14] Antoniol G, Fiutem R, Cristoforetti L (1998) Design pattern recovery in object-oriented software. In: 6th

International Workshop on Program Comprehension, June, pp 153–160

[15]. Niere J, Wadsack JP, Wendehals L (2001) Design pattern recovery based on source code analysis with fuzzy logic. Technical Report tr-ri-01-222, University of Paderborn. Available online

http://www.ddj.com

http://www.ncsu.edu/