Chapter 2: Background and Fundamentals
3.7 Finding Associated Web-service Operations
To assist users in composing web services, we need to identify potentially relevant web services given a textual description of the desired services. In this section we propose an approach that explores associations between web-service operations based on service operation matching. The approach uses the concept of attribute closure to obtain sets of operations, each composed of associated web-service operations.
3.7.1 Web-service operations matching
Similar to the process of obtaining connectivity, we now describe how to match web-service operations. Given two web-service operations op1: s1, s2, ..., sn → t1, t2, ..., tm and op2: x1, x2, ..., xl → y1, y2, ..., yk, for each schema tree of op1 we find the corresponding schema tree of op2 with the minimum match distance (inputs are matched against inputs, and outputs against outputs). We identify all possible matches between the two lists of schema trees and return the source-target correspondence that minimizes the overall match distance between them, as shown in Figure 3.5. This process is formalized in Algorithm 3.4.
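The matching step can be sketched in Python as follows. This is a minimal sketch under stated assumptions: an operation is modelled as a pair (input schema trees, output schema trees), each schema tree is stood in for by a string label, and the tree distance ED is replaced by a plain string edit distance (Levenshtein). The operation names and attribute labels in the usage lines are hypothetical.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical model: an operation is (input schema trees, output schema trees),
# with each schema tree stood in for by a string label.
Operation = Tuple[Sequence[str], Sequence[str]]


def levenshtein(a: str, b: str) -> int:
    """String edit distance, used here only as a stand-in for ED."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute ca -> cb
        prev = curr
    return prev[-1]


def match_distance(op1: Operation, op2: Operation,
                   ed: Callable[[str, str], int] = levenshtein) -> int:
    """Z = sum of S_i + sum of T_i: each tree of op1 takes its closest tree in op2."""
    inputs1, outputs1 = op1
    inputs2, outputs2 = op2
    s = [min(ed(si, xj) for xj in inputs2) for si in inputs1]    # S_i
    t = [min(ed(ti, yj) for yj in outputs2) for ti in outputs1]  # T_i
    return sum(s) + sum(t)


# Hypothetical operations with one input and one output each.
get_order = (["orderId"], ["order"])
order_builder = (["orderID"], ["orderInfo"])
z = match_distance(get_order, order_builder)
```

Here z is 5 (1 for the input pair, 4 for the output pair). Note that, like the pseudocode, this distance is asymmetric: it iterates only over the schema trees of op1, and it raises an error if op2 has no inputs or no outputs.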
[Figure: schema trees S1, ..., Si, ..., Sn and T1, ..., Ti, ..., Tm of op1 matched against X1, ..., Xi, ..., Xl and Y1, ..., Yi, ..., Yk of op2]
Figure 3.5: Matching web service operations

input: op1: s1, s2, ..., sn → t1, t2, ..., tm; op2: x1, x2, ..., xl → y1, y2, ..., yk
output: The match distance Z between op1 and op2
for i ← 1 to n do
    Si = min{ED(si, xj) | j = 1, 2, ..., l};
end
for i ← 1 to m do
    Ti = min{ED(ti, yj) | j = 1, 2, ..., k};
end
$Z = \sum_{i=1}^{n} S_i + \sum_{i=1}^{m} T_i$
Algorithm 3.4: Algorithm for matching web-service operations

3.7.2 Clustering Web-service Operations
Suppose OP = {op1, op2, ..., opq} is a set of web-service operations, and each pair of operations opi and opj (i, j = 1, 2, ..., q) match with distance zij. We classify OP into a set of clusters {OPC1, OPC2, ...}. The clustering algorithm, described below, begins with each operation forming its own cluster and gradually merges similar clusters.
1. Set up a q × q match matrix M, where Mij = zij is the match distance between operations opi and opj.
2. Find the smallest entry Mij (i ≠ j) in M. If Mij < threshold δ, merge these two clusters and update M by replacing their two rows (and columns) with a single row that describes the association between the merged cluster and the remaining clusters. The distance between two clusters is the distance between their closest members. After the first merge there are q − 1 clusters and q − 1 rows in M; each further merge reduces both counts by one.
3. Repeat the merge step until no more clusters can be merged.
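The three steps above describe threshold-stopped single-linkage agglomerative clustering. The sketch below assumes the pairwise match distances have already been computed into a symmetric q × q matrix with zeros on the diagonal; the function name and matrix layout are illustrative assumptions, not the thesis implementation.

```python
def cluster_operations(dist: list[list[float]], delta: float) -> list[set[int]]:
    """Single-linkage clustering of operations 0..q-1, stopping at threshold delta."""
    clusters: list[set[int]] = [{i} for i in range(len(dist))]
    m = [row[:] for row in dist]  # step 1: match matrix M (copied, not mutated)
    while len(clusters) > 1:
        # Step 2: smallest off-diagonal entry; j > i, so a < b below.
        best, a, b = min((m[i][j], i, j)
                         for i in range(len(clusters))
                         for j in range(i + 1, len(clusters)))
        if best >= delta:
            break  # step 3: no pair of clusters is close enough to merge
        clusters[a] |= clusters[b]
        # Single linkage: the merged cluster keeps its closest-member distances.
        for k in range(len(clusters)):
            m[a][k] = m[k][a] = min(m[a][k], m[b][k])
        m[a][a] = 0.0
        # Drop row and column b, leaving one fewer cluster and one fewer row.
        del clusters[b]
        del m[b]
        for row in m:
            del row[b]
    return clusters


dist = [[0, 1, 9, 9],
        [1, 0, 9, 9],
        [9, 9, 0, 8],
        [9, 9, 8, 0]]
result = cluster_operations(dist, delta=2)
```

With δ = 2, operations 0 and 1 merge and the loop then stops because the smallest remaining distance is 8, giving [{0, 1}, {2}, {3}]. Each merge scans O(q²) pairs, so this naive version runs in O(q³) time.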
Finally, a set of clusters {OPC1, OPC2, ...} is obtained. Given a cluster OPCi and an operation OPCik ∈ OPCi, OPCik is called a pivot of OPCi if it minimizes the sum of match distances to all the other operations in OPCi. We consider all operations in OPCi as instances of OPCik.
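The pivot defined above is the medoid of its cluster. A small sketch, assuming a match-distance function is supplied by the caller (the function name is hypothetical); ties are broken in favour of the member that comes first in iteration order.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def choose_pivot(cluster: Iterable[T], dist: Callable[[T, T], float]) -> T:
    """Return the member minimizing the summed match distance to all other members."""
    members = list(cluster)
    best = min(range(len(members)),
               key=lambda i: sum(dist(members[i], members[j])
                                 for j in range(len(members)) if j != i))
    return members[best]


# Hypothetical distances between three operations 0, 1 and 2.
D = [[0, 1, 5],
     [1, 0, 2],
     [5, 2, 0]]
pivot = choose_pivot([0, 1, 2], lambda a, b: D[a][b])
```

Operation 1 has summed distance 3, against 6 and 7 for the others, so it is chosen as the pivot.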
For example, Figure 3.1 shows a clustering result with three clusters of web-service operations: {WS1, WS4}, {WS2}, and {WS3}. In cluster {WS1, WS4} the pivot is GetOrder, and the instances of GetOrder are GetOrder and OrderBuilder. In cluster {WS2} the pivot is CheckoutOrder, which is also an instance of itself.
3.7.3 Identifying Associations
A set of web-service operations is said to be associated if they potentially contribute to a user's web-service composition. Clearly, given two web-service operations op1 and op2, if the output attributes of op1 are similar to the input attributes of op2, then op1 and op2 may participate together in a user's service composition. The objective of this step is to find all associations between web-service operations. To do this, we first find associations among the clusters {OPC1, OPC2, ...}. Let OPCik, say x1, x2, ..., xk → y1, y2, ..., yj, be the pivot of OPCi, and let X = {x1, x2, ..., xk} and Y = {y1, y2, ..., yj}. We first compute the attribute closure X+ of X, that is, the set of attributes A such that X → A can be inferred by transitivity from the pivots. At the same time, we compute a pivot set PS associated with OPCik. The overall process is shown in Algorithm 3.5.
We now perform a worst-case time analysis of Algorithm 3.5. The repeat loop is executed at most |S| times, where |S| is the total number of pivots over all clusters. In each iteration, finding q takes time proportional to |S| − |T|, where |T| is the number of pivots currently in the pivot set PS. Hence the total execution time is O(|S|²) in the worst case.
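The bound can be spelled out as a short sum. This sketch counts each match-distance test as one step and assumes, as the analysis above does, that every iteration that changes PS adds exactly one new pivot, so the pivot set holds |T_t| = t pivots at the start of iteration t; r ≤ |S| denotes the number of iterations.

$$\sum_{t=1}^{r} \bigl(|S| - |T_t|\bigr) = \sum_{t=1}^{r} \bigl(|S| - t\bigr) \le \sum_{t=1}^{|S|} \bigl(|S| - t\bigr) = \frac{|S|\,(|S|-1)}{2} = O(|S|^2)$$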
We first choose a pivot OPCik for each cluster OPCi. For each pivot, we compute a pivot set. We eliminate duplicate pivot sets. If two pivots are in the same pivot set, then their corresponding instances are associated.
Each pivot set PS = {p1, p2, ..., pk, ...} can generate a set of operation groups of the form {p′1, p′2, ..., p′k, ...}, where p′i is an instance of pi. Operations in the same group are associated. To obtain an operation group, we simply replace each pivot pi in PS with one of its corresponding instances. All possible operation groups are output as search results.
For example, a pivot set for the clusters given in Figure 3.1 is {GetOrder, ShippingOrder, CheckoutOrder}. It can generate two search results: {GetOrder, ShippingOrder, CheckoutOrder} and {OrderBuilder, ShippingOrder, CheckoutOrder}.
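Generating all operation groups amounts to a Cartesian product over the instance lists of the pivots. The sketch below reproduces the example above; the helper name is hypothetical, and the single-instance lists for ShippingOrder and CheckoutOrder are assumed from the fact that the example yields exactly two results.

```python
from itertools import product


def operation_groups(pivot_set: list[str],
                     instances: dict[str, list[str]]) -> list[tuple[str, ...]]:
    """Replace each pivot by one of its instances, in every possible way."""
    return list(product(*(instances[p] for p in pivot_set)))


instances = {
    "GetOrder": ["GetOrder", "OrderBuilder"],
    "ShippingOrder": ["ShippingOrder"],   # assumed: single instance
    "CheckoutOrder": ["CheckoutOrder"],   # instance of itself
}
groups = operation_groups(["GetOrder", "ShippingOrder", "CheckoutOrder"], instances)
# [('GetOrder', 'ShippingOrder', 'CheckoutOrder'),
#  ('OrderBuilder', 'ShippingOrder', 'CheckoutOrder')]
```

The number of groups is the product of the instance-list sizes, so large clusters can make this output grow quickly.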
Recall that in Section 3.6, each web-service operation is assigned a ranking score combining both service relevance and service importance. Thus, each operation group can be given a group score, computed as the sum of the scores of its operations. A higher group score indicates a more desirable search result for web-service composition.
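Ranking the search results by group score can be sketched as follows. The per-operation scores are hypothetical placeholders for the Section 3.6 ranking scores, not values from the text.

```python
from typing import Mapping, Sequence


def rank_groups(groups: Sequence[Sequence[str]],
                score: Mapping[str, float]) -> list[tuple[float, tuple[str, ...]]]:
    """Score each group as the sum of its operation scores; best group first."""
    scored = [(sum(score[op] for op in g), tuple(g)) for g in groups]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical per-operation ranking scores.
scores = {"GetOrder": 0.9, "OrderBuilder": 0.4,
          "ShippingOrder": 0.7, "CheckoutOrder": 0.8}
groups = [("OrderBuilder", "ShippingOrder", "CheckoutOrder"),
          ("GetOrder", "ShippingOrder", "CheckoutOrder")]
ranked = rank_groups(groups, scores)
```

Under these assumed scores, the group containing GetOrder is ranked first.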
input: A pivot p: x1, x2, ..., xk → y1, y2, ..., yj
output: A pivot set PS containing associated pivots
X = {x1, x2, ..., xk}; Y = {y1, y2, ..., yj};
Closure = X;
PS = {X → Y};
repeat
    if there is a pivot q: U → V such that the match distance of U and Closure is less than threshold δ then
        set Closure = Closure ∪ V;
        set PS = PS ∪ {q};
    end
until there is no change;
Algorithm 3.5: Algorithm for computing attribute closure and pivot set
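Algorithm 3.5 can be sketched in the style of the classic attribute-closure computation: on each pass every pivot, including p itself, is tested against the current closure, and the loop stops when a pass changes neither Closure nor PS. Testing p itself is our reading of how Y enters the closure, since Closure starts as X. The Pivot record, the toy distance (the number of attributes of U missing from the closure, so δ = 1 means U ⊆ Closure), and all attribute names are hypothetical. The last lines compute a pivot set for every pivot and drop duplicates.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, Tuple


@dataclass(frozen=True)
class Pivot:
    """Hypothetical pivot record q: U -> V (name, input and output attributes)."""
    name: str
    inputs: FrozenSet[str]
    outputs: FrozenSet[str]


# Assumed signature for "the match distance of U and Closure".
Distance = Callable[[FrozenSet[str], FrozenSet[str]], float]


def missing_attributes(u: FrozenSet[str], closure: FrozenSet[str]) -> float:
    """Toy distance: how many attributes of U are absent from the closure."""
    return float(len(u - closure))


def closure_and_pivot_set(p: Pivot, pivots: Iterable[Pivot], distance: Distance,
                          delta: float) -> Tuple[FrozenSet[str], FrozenSet[Pivot]]:
    candidates = list(pivots)
    if p not in candidates:
        candidates.append(p)
    closure = p.inputs              # Closure = X
    ps = {p}                        # PS = {X -> Y}
    changed = True
    while changed:                  # repeat ... until there is no change
        changed = False
        for q in candidates:
            if distance(q.inputs, closure) < delta:
                grown = closure | q.outputs      # Closure = Closure ∪ V
                if grown != closure or q not in ps:
                    closure = grown
                    ps.add(q)                    # PS = PS ∪ {q}
                    changed = True
    return closure, frozenset(ps)


def fs(*names: str) -> FrozenSet[str]:
    return frozenset(names)


get_order = Pivot("GetOrder", fs("customerId"), fs("orderId"))
shipping = Pivot("ShippingOrder", fs("orderId"), fs("shippingId"))
checkout = Pivot("CheckoutOrder", fs("shippingId"), fs("receipt"))
forecast = Pivot("GetForecast", fs("city"), fs("forecast"))
pivots = [get_order, shipping, checkout, forecast]

closure, ps = closure_and_pivot_set(get_order, pivots, missing_attributes, 1.0)
# One pivot set per pivot, with duplicate pivot sets eliminated.
distinct_sets = {closure_and_pivot_set(q, pivots, missing_attributes, 1.0)[1]
                 for q in pivots}
```

Starting from GetOrder, the sketch chains through ShippingOrder and CheckoutOrder and leaves the unrelated GetForecast out, mirroring the pivot set in the Figure 3.1 example. The loop terminates because every change adds either an attribute or a pivot, both drawn from finite sets.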