8.3 Complete change pattern discovery algorithms
8.3.2 UCP discovery algorithm
To perform a group of change operations in same order over time is very unlikely. In a real world scenario, users perform ontology changes by opting for different orders of change operations. However, the end result (i.e. impact) of the change operation sequences may be the same (i.e. semantically identical sequences). The main difference between OCP and UCP discovery algorithm is the definition of the search space for a specific graph node search. As the change operations in a sequence can be in an unordered form, the basic idea to discover the unordered complete patterns (Type 3 - c.f. Section 8.1.1) is to modify the node search space in each iteration containing the earlier nodes as well as subsequent nodes based on the permissible node distance value. The pseudo code of UCP algorithm is described in algorithm lists (8.3.6 – 8.3.8) and is explained in rest of the section.
8.3.2.1 Description of algorithm
Step A: Similar to the OCP algorithm, the UCP algorithm iterates over each node of the graph and selects it as a candidate node (cnk), where k refers to the identification key of the node in graph G.
Step B: An iteration process is used to construct a candidate sequence cs by extending candidate node cnk to its subsequent context-matching nodes cnk++.
Step C: The next step is to identify the discovered nodes dn and add them as a first member to the discovered sequence set DS. There are two main differences in the extension of discovered sequences ds in the UCP and OCP algorithms, i.e.
i. The area of change log graph in which the mapping node will be searched (step D).
ii. Introduction of an unidentified nodes list ul, which will keep record of un- identified candidate nodes.
Algorithm 8.3.6 Unordered Complete Change Pattern Discovery Algorithm
Input: Graph (G), Minimum Pattern Support (min supp), Minimum Pattern Length (min len), Maximum n-distance (X)
Output: Set of Domain-Specific Change Patterns (S) 1: for i = 0 to NG.size do
2: k = 0
3: cs ← GenerateCandidateSequence(n(gi)) 4: if (cs.size < min len) then
5: go back to step 1. 6: end if
7: DN ← DiscoverM atchingN odes(cnk)
8: DS ← DN
9: if (DS.size < min sup) then 10: go back to step 1.
11: end if
12: while (DS.size ≥ min supp) do
13: for each discovered sequence ds in DS do
14: t ← getT argetEntity(ds) 15: setSearchSpace(ds) 16: a ← searchInSpace(ds, cnk++, t) 17: if (found) then 18: ds ← a 19: ascendSequence(ds) 20: setSearchSpace(ds) 21: if (!ul.isEmpty()) then 22: nodeFound = true
24: nodeF ound ← searchU nidentif iedN odes(ul, ds) 25: ascendSequence(ds) 26: setSearchSpace(ds) 27: end while 28: end if 29: else 30: ul ← cnk++ 31: end if 32: end for
33: if (ds.size < min len) then
34: discard ds from DS
35: end if
36: end while
37: for each discovered sequence ds in DS do 38: if (ds.size < cs.size) then
39: discard ds from DS 40: end if 41: end for 42: Pdomain specif ic← (ds + cs) 43: S ← Pdomain specif ic 44: end for
Step D: Before the extension process on any discovered node starts, the search space (i.e. the range of graph nodes in which a particular node will be searched) has to be set. The search space is described using two integer variables, i.e. start range (rs) and end range (re) where, rsand rerepresent the node ids of the starting and ending graph nodes of the search space. The range of the search space can be
calculated as;
rs= min(id) − X − 1 (8.1)
re= max(id) + X + 1 (8.2)
Where, min(id) and max(id) are the minimum and maximum id values of the existing graph nodes in the discovered sequence ds at any particular iteration. Step E: New values of rs and re are calculated at the start of each iteration of the discovered node expansion process. For example, given the gap constraint (X) user input value as 1 and a discovered sequence ds contains two graph nodes, ds = {n9, n11} at any particular iteration, then the space in which next candidate node will be searched will be the sequence of graph nodes n7− n13. As the algorithm scans the whole graph only once (i.e. in step 7 to get the discovered node set) and narrows the search space later, the search space defining technique helps us in achieving a good performance of the algorithm.
Step F: The unidentified nodes list ul keeps record of all candidate nodes which were not matched in the ds expansion process. If a new node is added to the discovered sequence ds, the sequence will be converted into ascending form (based on their id values) and the search space is reset. If the match becomes false and ds is not expanded, the respective candidate node cnk++is added to the unidentified nodes list.
Step G: Once the discovered sequence ds is expanded, an iteration process is applied to the ul to search for the unidentified nodes in the updated search space. If an unidentified candidate node is found and matched (to a discovered node) in the updated search space, the node will be added into the discovered sequence and removed from the unidentified node list. Based on the modified discovered sequence, the values of rs and re are re-calculated.
Step H: At the end of the expansion of any particular discovered sequence, if the length of any expanded discovered sequence is less than the minimum threshold value of pattern length, it must be discarded from the set of discovered sequences. Step I: In next step, all discovered sequences whose length is less than the length of the candidate sequence are discarded.
Step J: As a last step, a candidate sequence along with discovered sequences is saved as a domain-specific change pattern in the result list S and the algorithm goes back to step 1 and selects next graph node as a candidate node.
Algorithm 8.3.7 Method:setSearchSpace()
Input: Graph G, Discovered Sequence ds, Sequence Gap Contraint X Output: Updated search space (minId - maxId)
1: n1 ← getF irstN odeOf Sequence(ds) 2: n2 ← getLastN odeOf Sequence(ds) 3: minId = n1.getN odeID() − X − 1 4: if (minId ≤ 0) then
5: minId = 1
6: end if
7: maxId = n2.getN odeID() + X + 1 8: if (maxId > G.size) then
9: maxId = G.size 10: end if
Algorithm 8.3.8 Method:Method:searchInSpace() Input: Graph Node n, Discovered Sequence ds Output: Updated Discovered Sequence ds
2: for each node w in range from minId to maxId do 3: if (ds.contains(w)) then
4: go back to step 2
5: else
6: if (matchOperation(n, w) && matchElement(n, w)) then 7: if (matchContext(n, w, t)) then 8: ds ← w 9: return ds 10: end if 11: end if 12: end if 13: end for