5.2.3 Structure-level Matching

At the structure level, input schemas are parsed and converted into a graph data structure. Structure matching adjusts incorrect matches generated by element-level matching and finds additional mappings. KSMS uses the results of element-level matching to match schema graph structures with a well-known graph matching algorithm called Similarity Flooding (Melnik et al., 2002). This algorithm is described below:

Similarity Flooding: Melnik et al. (2002) presented a graph matching algorithm called Similarity Flooding (SF) and explored its usability for schema matching. The algorithm works on the following intuition. First, schemas are converted into directed labelled graphs. These graphs are then used in an iterative fixed-point computation to determine the matches between corresponding nodes of the graphs. Each edge in a graph is a triple of the form <s, p, o>, where s and o denote the source and target nodes of the edge and p is the edge label; the nodes represent schema elements (Madhavan et al., 2001). To compute similarities, the algorithm relies on the idea that two nodes are similar when their neighbouring elements are similar. The algorithm accepts several input formats, in particular SQL DDL, XML, and RDF. The matching results it produces are referred to as mappings.
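The propagation idea can be illustrated with a minimal Python sketch of the basic Similarity Flooding iteration. This is an illustration under stated assumptions, not the KSMS implementation: the function name `similarity_flooding`, the parameters `max_iter` and `eps`, the two tiny schema graphs and the initial similarity `sigma0` are all hypothetical. Propagation weights follow the common choice of splitting a weight of 1.0 evenly among equally labelled edges.

```python
from collections import defaultdict
from itertools import product


def similarity_flooding(edges_a, edges_b, sigma0, max_iter=100, eps=1e-4):
    """Basic Similarity Flooding over two lists of (s, p, o) triples."""
    # Pairwise connectivity graph: an edge ((s1, s2), p, (o1, o2)) exists
    # whenever (s1, p, o1) is in graph A and (s2, p, o2) is in graph B.
    pcg = [((s1, s2), p, (o1, o2))
           for (s1, p, o1), (s2, q, o2) in product(edges_a, edges_b)
           if p == q]
    if not pcg:
        return {}

    # Count equally labelled edges leaving / entering each map pair.
    out_count = defaultdict(int)
    in_count = defaultdict(int)
    for src, p, dst in pcg:
        out_count[(src, p)] += 1
        in_count[(dst, p)] += 1

    # incoming[pair] lists (neighbour pair, weight) contributions.
    # Similarity flows along each edge and also in the reverse direction.
    incoming = defaultdict(list)
    for src, p, dst in pcg:
        incoming[dst].append((src, 1.0 / out_count[(src, p)]))
        incoming[src].append((dst, 1.0 / in_count[(dst, p)]))

    pairs = list(incoming)
    sigma = {pair: sigma0.get(pair, 0.0) for pair in pairs}
    for _ in range(max_iter):
        # Each pair keeps its own value and adds what its neighbours send.
        nxt = {pair: sigma[pair] + sum(sigma[n] * w for n, w in incoming[pair])
               for pair in pairs}
        # Normalise by the largest value so similarities stay in [0, 1].
        top = max(nxt.values()) or 1.0
        nxt = {pair: value / top for pair, value in nxt.items()}
        # Stop once the values no longer change noticeably (fixed point).
        delta = sum((nxt[p] - sigma[p]) ** 2 for p in pairs) ** 0.5
        sigma = nxt
        if delta < eps:
            break
    return sigma


# Hypothetical schema graphs: every edge is labelled "child".
schema_a = [("Order", "child", "Item"), ("Item", "child", "Qty")]
schema_b = [("PurchaseOrder", "child", "Line"), ("Line", "child", "Quantity")]
# Assumed element-level result: only Qty and Quantity are known to match.
initial = {("Qty", "Quantity"): 1.0}

result = similarity_flooding(schema_a, schema_b, initial)
for pair, score in sorted(result.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

In this toy run, the single element-level match between Qty and Quantity spreads to the pairs structurally connected to it, while pairs in unconnected parts of the pairwise graph stay at zero.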

According to Ngo et al. (2011b), many methods can be used for structure-level matching. These methods consider two elements of two different ontologies to be similar if all or most of their elements in the same view are already similar. The view can be the direct super-/sub-elements, the sibling elements, or the list of elements on the path from the root to the current element in the ontology hierarchy. However, these methods run into problems when the viewpoints of the two ontologies are similar (Ngo et al., 2011b). For this reason, matching two nodes based on the similarity of their adjacent neighbours is more flexible and applicable.

Different kinds of neighbour elements, such as parents, children and leaves, can be considered to estimate the structural similarity between pairs of schema elements. Considering only one context (parents, children or leaves) does not provide appropriate results. For this reason, KSMS uses the well-known Similarity Flooding algorithm with three neighbouring contexts (parents, children and leaves) for structure-level matching:

• Parents: The similarity between inner nodes of graphs is computed based on the similarity of their parent nodes. That means, two non-leaf elements are similar if they are similar according to element-level matching, and the parents of the two elements are similar.

• Children: The similarity of children nodes is used to determine the similarity between inner nodes of graphs. That means, two non-leaf elements are similar if they are individually similar according to element-level matching, and if the immediate children sets of the elements are similar (Madhavan et al., 2001).

• Leaves: The similarity of leaf nodes is used to determine the similarity between inner nodes of graphs. Two non-leaf elements are structurally similar if their leaf sets are highly similar, even if their immediate children are not (Madhavan et al., 2001). This is because the leaves represent the atomic data that the schema/ontology ultimately describes.

Graph representations of two schemas are shown in Figure 5.8 (Madhavan et al., 2001).

Figure 5.8: CIDX and EXCEL schemas

According to Figure 5.8, an example of structure-level matching is described below:

• Line is mapped to ItemNumber because their parents Item and Item are matched and the other two children of Item match according to element-level matching. For example, Qty and UOM are abbreviated forms of Quantity and UnitOfMeasure respectively, so these elements are matched using the domain thesaurus at the element level.

• The children of POShipTo are City and Country in the CIDX schema dataset. In the EXCEL schema dataset, City and Country are children of Address, not DeliverTo. So POShipTo and DeliverTo cannot be mapped according to the children-context. However, POShipTo and DeliverTo are mapped according to the leaf-context (the leaves City and Country of POShipTo are matched to the leaves City and Country of DeliverTo) and element-level matching (Ship is a synonym of Deliver).

• The root element PO of CIDX is mapped to the root element PurchaseOrder of EXCEL, as these elements are matched according to the parent-context and element-level matching (PO is an abbreviated form of PurchaseOrder, so they are matched using the domain thesaurus at the element level).
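The POShipTo/DeliverTo case above can be reproduced with a short Python sketch that contrasts the children-context with the leaf-context. It rests on several assumptions: the trees contain only the elements named in the text (the full schemas in Figure 5.8 have more), Address is taken to be a direct child of DeliverTo, `name_match` is a stand-in for element-level matching, and `context_similarity` is one simple way to score two neighbour sets, not necessarily the measure KSMS uses.

```python
# Schema fragments as parent -> children maps (only elements named in the text).
cidx = {"POShipTo": ["City", "Country"]}
excel = {"DeliverTo": ["Address"], "Address": ["City", "Country"]}


def children(tree, node):
    """Immediate children of a node."""
    return set(tree.get(node, []))


def leaves(tree, node):
    """All leaf elements in the subtree rooted at node."""
    kids = tree.get(node, [])
    if not kids:
        return {node}
    found = set()
    for kid in kids:
        found |= leaves(tree, kid)
    return found


def name_match(a, b):
    """Stand-in for element-level matching: case-insensitive name equality."""
    return a.lower() == b.lower()


def context_similarity(set_a, set_b, element_match):
    """Fraction of elements in both sets that have a match in the other set."""
    if not set_a or not set_b:
        return 0.0
    matched_a = sum(any(element_match(a, b) for b in set_b) for a in set_a)
    matched_b = sum(any(element_match(a, b) for a in set_a) for b in set_b)
    return (matched_a + matched_b) / (len(set_a) + len(set_b))


children_sim = context_similarity(
    children(cidx, "POShipTo"), children(excel, "DeliverTo"), name_match)
leaf_sim = context_similarity(
    leaves(cidx, "POShipTo"), leaves(excel, "DeliverTo"), name_match)

print("children-context:", children_sim)
print("leaf-context:", leaf_sim)
```

Under these assumptions the children-context finds no overlap, because the only child of DeliverTo is Address, whereas the leaf-context sees the same leaf set on both sides.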

After the similarities of the neighbour elements have been obtained, the role of Similarity Flooding is to recursively propagate the pre-computed neighbour similarities using a fixed-point computation (Hai, 2005). The Similarity Flooding algorithm terminates when the fixed point is reached, that is, when the similarity values of all schema elements stabilise; these values are then taken as the structural similarities of the schema elements (Melnik et al., 2002).
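For reference, the basic fixed-point iteration of Melnik et al. (2002) can be written as follows. The symbol names are chosen here for illustration: σ^i(x, y) is the similarity of the element pair (x, y) after iteration i, N(x, y) is the set of neighbouring pairs, w is the propagation weight between two pairs, z is the largest unnormalised value in the current iteration, and ε is a convergence threshold. The text does not state which variant of the formula KSMS applies, so this should be read as the generic form only.

```latex
\sigma^{i+1}(x, y) \;=\; \frac{1}{z}\Bigl(\sigma^{i}(x, y)
  \;+\; \sum_{(x', y') \in N(x, y)} w\bigl((x', y'), (x, y)\bigr)\,\sigma^{i}(x', y')\Bigr),
\qquad
\text{stop when } \bigl\lVert \sigma^{n} - \sigma^{n-1} \bigr\rVert < \varepsilon .
```

Dividing by z keeps all values in the range [0, 1], and the stopping condition corresponds to the point at which the similarity values stabilise.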