
5.4 Semantically Equivalent Execution

5.4.1 The Matchmaking Task

Determining whether a service (or an operation) is semantically comparable to another one is commonly known as matchmaking, a well-known task at the core of service discovery. In fact, it can be understood as a special Information Retrieval problem [KLKR08]. Matchmaking has been widely considered in the literature on semantic services, resulting in a variety of approaches. As pointed out in [Klu08], one dimension along which matchmaking approaches can be classified is the reasoning method employed. Three categories can be distinguished along this dimension: logic-based, non-logic-based, and hybrid. In short, logic-based matchmaking relies on possibly non-monotonic deductive rules of inference as available in (Description) Logics. Non-logic-based matchmaking comprises all suitable methods that are not logic-based. One direction comprises approaches that apply (syntactic) similarity measures from Information Retrieval to quantify semantic relatedness in terms of a distance measure. Another direction uses Machine Learning methods to find predictive patterns of semantic relatedness based on the (meta)data available about services. Finally, hybrid approaches combine logic-based and non-logic-based approaches. For (comparative) overviews of these approaches the reader is referred to [Klu08, MIK+10, BB10].

Formalization

Independent of the actual approach, semantic matchmaking starts by setting up the notion of a match. The rationale behind a match is that an advertised service/operation (i.e., an offer) is of equivalent or similar value to a requested service/operation (i.e., a demand). Given profiles that semantically describe them, a match is in most cases defined exclusively on profiles; we distinguish advertised and requested profiles by denoting them Pra and Prr, respectively. In mathematical terms, a match is formalized either as a binary relation or as a binary predicate (which maps to true or false, interpreted as match or no match). Oftentimes the notion of a match is asymmetric (hence irreflexive). Applied to profiles, it is usually intended to represent that an advertised profile Pra matches a requested profile Prr, while the opposite (Prr matches Pra) does not necessarily hold as well. It is reasonable at least to consider the notion of a match as irreflexive because one might want to exclude the trivial and useless case that every profile matches itself. Finally, matchmaking is inevitably driven by domain (or background) knowledge, on the basis of which one concludes whether there is a match or not; that is, the domain knowledge either entails a match or does not. Altogether, we define matchmaking on profiles as follows.

Definition 5.3 (Matchmaking Domain & Problem). A matchmaking domain is a 4-tuple MD = (K, P, ∼, |=) where K is a collection of domain knowledge⁵, P is a finite set of profiles, ∼ ⊆ P × P is an irreflexive match relation, and |= is a consequence relation.

Given a matchmaking domain MD and a requested profile Prr ∈ P, a matchmaking problem is a tuple MP = (MD, Prr). ms ⊆ P is a set of Prr-matches for MP iff ∀Pra ∈ ms : K |= Pra ∼ Prr.

⁵ K need not be a DL knowledge base here. It can build on other formalisms for representing knowledge/information. In the same vein, |= need not realize deductive rules of inference.
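As a rough illustration of Definition 5.3, a matchmaking problem can be read as a filter over a finite profile set. The following Python sketch is written under stated assumptions: the names `Profile`, `entails_match`, and `find_matches`, the toy knowledge base, and the example shops are hypothetical and do not come from the original text. In particular, `entails_match` merely stands in for K |= Pra ∼ Prr and is not tied to any specific reasoner.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Set


@dataclass(frozen=True)
class Profile:
    """Hypothetical, heavily simplified profile: a name plus output concept names."""
    name: str
    outputs: FrozenSet[str]


# Toy domain knowledge K (assumption): each concept mapped to its known superconcepts.
K: Dict[str, Set[str]] = {"Novel": {"Book"}, "Book": set()}


def entails_match(k: Dict[str, Set[str]], advertised: Profile, requested: Profile) -> bool:
    """Stand-in for K |= Pra ~ Prr (an assumption of this sketch, not a real reasoner).

    Holds if every requested output is covered by an advertised output that is
    the same concept or a known subconcept of it.
    """
    if advertised == requested:  # the match relation is irreflexive
        return False
    return all(
        any(o_a == o_r or o_r in k.get(o_a, set()) for o_a in advertised.outputs)
        for o_r in requested.outputs
    )


def find_matches(k: Dict[str, Set[str]], profiles: Iterable[Profile],
                 requested: Profile) -> Set[Profile]:
    """Return the largest set ms such that K |= Pra ~ Prr for every Pra in ms."""
    return {p for p in profiles if entails_match(k, p, requested)}


request = Profile("request", frozenset({"Book"}))
novel_shop = Profile("novel_shop", frozenset({"Novel"}))
music_shop = Profile("music_shop", frozenset({"CD"}))
matches = find_matches(K, {request, novel_shop, music_shop}, request)
print(sorted(p.name for p in matches))  # ['novel_shop']
```

Passing the knowledge base explicitly mirrors the role of K in the 4-tuple; a different formalism would only require swapping out `entails_match`.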

The software that implements matchmaking in a specific matchmaking domain is usually called a matchmaker. A matchmaker inevitably needs to be closely integrated with a repository that stores profiles, which makes it a natural component of a service directory, integrated accordingly in the retrieval process (query answering).

As there can be a set of profiles that match some requested profile, one might additionally want to order them according to some preferences. Two basic ways of doing this are considered in the literature. The first is to define several match relations that differ from each other in their degree of match (DoM). This induces a discrete rank over the DoM and needs to be combined with simple algorithmic processing: if the best match relation does not hold, try the second best; if this fails, try the next best; and so on. This approach has mostly been considered on the functional dimension of profiles. The second way is to define, in addition to the notion of a match, a possibly strict order that models user preferences. For instance, one might want to order profiles that functionally match a requested profile according to their reliability, response time, throughput, usage costs, or the like. Of course, the order relation can incorporate both the functional and the non-functional dimension. Preference-based matchmaking is consequently defined as a straightforward extension of basic matchmaking.

Definition 5.4 (Preference-based Matchmaking Domain & Problem). A matchmaking domain with preferences is a 5-tuple MD = (K, P, ∼, >, |=) where > ⊆ P × P is a preference order.

Given a matchmaking problem MP = (MD, Prr), ms = {Pra₁, . . . , Praₙ} is a set of Prr-matches for MP, indexed by consecutive integers {1, . . . , n}, iff the following holds:

(1) ms is a set of Prr-matches for MP in MD = (K, P, ∼, |=),
(2) ∀Praᵢ, Praⱼ ∈ ms : i < j implies Praᵢ > Praⱼ.

Pra₁ is called the most preferred match and Praₙ the least preferred match.

The preference order > can be defined in many ways. One possibility is to use quantitative metrics such as distance measures.
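The two ways of ordering matches, trying ranked match relations from best to worst and sorting by a preference order, can be sketched in a few lines of Python. Everything named here is an assumption for illustration: the helpers `best_degree` and `order_by_preference`, the modelling of profiles as sets of output values, and the response-time figures. The sort keeps ties in input order, so it realizes a non-strict preference rather than the strict case of Definition 5.4.

```python
from typing import Any, Callable, Iterable, List, Optional, Sequence, Tuple

Relation = Callable[[Any, Any], bool]


def best_degree(advertised: Any, requested: Any,
                ranked_relations: Sequence[Tuple[str, Relation]]) -> Optional[str]:
    """Try the match relations from best to worst and name the first that holds."""
    for name, holds in ranked_relations:
        if holds(advertised, requested):
            return name
    return None  # no match relation holds


def order_by_preference(matches: Iterable[Any], cost: Callable[[Any], float]) -> List[Any]:
    """Order matches so that a lower cost means a more preferred match."""
    return sorted(matches, key=cost)


# Toy model (assumption): a profile is the set of output values it can produce.
ranked = [
    ("exact", lambda a, r: a == r),
    ("plug-in", lambda a, r: a < r),   # advertised produces only requested values
    ("subsume", lambda a, r: a > r),   # advertised produces additional values
]
requested = frozenset({1, 2, 3})
print(best_degree(frozenset({1, 2}), requested, ranked))  # plug-in

# Hypothetical response times in milliseconds for three functional matches.
response_ms = {"s1": 120.0, "s2": 45.0, "s3": 80.0}
ordered = order_by_preference(response_ms, response_ms.get)
print(ordered)  # ['s2', 's3', 's1']
```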

DL-based Matchmaking

The general principle common to all DL-based approaches known in the literature is to formulate the notion of a match, in one way or another, based on the set-theoretic subsumption relation. Reasoning on whether there is a match is thereby reduced to the standard subsumption inference task and inherits its computational complexity properties (see Section 3.1.4). In the context of DL-based semantic services this was first described in [PKPS02] and [LH03]. The former assumes structured semantic service descriptions such as a profile and defines the notion of a match on elements of the structure, which is why it can be classified as structured matchmaking. In contrast, the latter assumes that a service (or an operation) is semantically described by a single concept only, which is a complex intersection

C₁ ⊓ . . . ⊓ Cₙ

where the Cᵢ are atomic or complex concepts. It is therefore classified as monolithic matchmaking [Klu08]. Part of the idea in [PKPS02] is inspired by earlier work on component theory in software engineering [ZW97], which similarly builds on “more general” and “more specific” abstractions over the signature of components, as well as by parallel work in agent-based environments [SWKL02].

Inputs and Outputs. There are four prominent matching relations defined exclusively in terms of the subsumption relation [PKPS02, LH03]. Under structured matchmaking on the functional dimension of the profile, they are applied by pairwise matching of the elements in the input and output sets. More specifically, a profile Pra matches a profile Prr regarding the outputs if there is a matching output oa ∈ Pra.O for every output or ∈ Prr.O. Note that Pra may specify more outputs than Prr (i.e., |Pra.O| ≥ |Prr.O|). We call such an additional output in Pra, for which there is no matching output in Prr, a spare output. Formally,

Pra ⋈-matches Prr regarding O ⇔ ∀or ∈ Prr.O ∃oa ∈ Pra.O : type(or) ⋈ type(oa)    (5.2)

where ⋈ can be one of

• Exact: type(or) ≡ type(oa),
• Plug-in: type(or) ⊒ type(oa),
• Subsume: type(or) ⊑ type(oa),

and, by a slight abuse of notation (the meaning should be clear),

• Intersection⁶: (type(or) ⊓ type(oa)) ⋢ ⊥,
• Disjoint: (type(or) ⊓ type(oa)) ⊑ ⊥.

These relations constitute a ranked, discrete DoM, the order of which can be written as

Exact > Plug-in > Subsume > Intersection > Disjoint

where, stated informally, > means “stronger than”. The exact match is clearly the most preferable as it is the strongest relation, corresponding to extensional equality: according to the model-theoretic semantics of DLs, the concepts or data ranges of matching outputs have the same extension in every model I. Given execution compatibility (see Definition 4.16), the value range of each output in Prr.O coincides with the value range of the matching output in Pra.O. The plug-in match is the second best and basically states that the outputs of the advertised profile suffice to fulfil the outputs of the requested one (i.e., the advertised does not produce output values that the requested would not produce as well). Conversely, the subsume match states that there might be output values produced by the advertised that would not be produced by the requested. The disjoint relation is at the lowest level. It is actually not a match, since it shows that the advertised is incompatible with the requested as they have an empty intersection, which is why we spoke of four matching relations above. This distinguishes it from the intersection match, which captures the case where the two are not totally incompatible.
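To make Condition (5.2) and the ranked DoM more concrete, the following Python sketch evaluates the relations from strongest to weakest and reports the first one for which every requested output finds a partner among the advertised outputs. It rests on a strong simplifying assumption that is not part of the original: each concept is represented by a finite, non-empty extension in one fixed interpretation, so that subsumption becomes set inclusion. A DL reasoner would instead decide subsumption over all models. The names `RELATIONS` and `output_degree` and the example concepts are hypothetical.

```python
from typing import Callable, FrozenSet, List, Optional, Sequence, Tuple

# Assumption: a concept is given by a finite, non-empty extension.
Concept = FrozenSet[str]

# Relations of Condition (5.2), strongest first; r = type(or), a = type(oa).
RELATIONS: List[Tuple[str, Callable[[Concept, Concept], bool]]] = [
    ("Exact", lambda r, a: r == a),
    ("Plug-in", lambda r, a: a <= r),          # type(or) subsumes type(oa)
    ("Subsume", lambda r, a: r <= a),          # type(oa) subsumes type(or)
    ("Intersection", lambda r, a: bool(r & a)),
    ("Disjoint", lambda r, a: not (r & a)),
]


def output_degree(requested_outputs: Sequence[Concept],
                  advertised_outputs: Sequence[Concept]) -> Optional[str]:
    """Name the strongest relation under which every requested output has a
    matching advertised output, or None if no relation holds for all of them."""
    for name, related in RELATIONS:
        if all(any(related(o_r, o_a) for o_a in advertised_outputs)
               for o_r in requested_outputs):
            return name
    return None


book = frozenset({"b1", "b2", "b3"})
novel = frozenset({"b1", "b2"})
media = frozenset({"b1", "b2", "b3", "c1"})
cd = frozenset({"c1"})

print(output_degree([book], [book]))        # Exact
print(output_degree([book], [novel, cd]))   # Plug-in (cd is a spare output)
print(output_degree([book], [media]))       # Subsume
print(output_degree([book], [cd]))          # Disjoint, i.e. not a real match
```

Checking each relation over the whole output set, rather than taking the weakest of the per-output best degrees, matters because plug-in and subsume do not imply each other.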

Contrary to outputs, a match is defined the other way around when applied to the inputs of a profile. More precisely, a profile Pra matches a profile Prr regarding the inputs if there is a matching input ir ∈ Prr.I for every input ia ∈ Pra.I. Observe that in this case Prr may specify more inputs than Pra (|Prr.I| ≥ |Pra.I|), which means that there can be spare inputs in Prr for which there is no input in Pra. However, it is reasonable to formulate a single condition rather than different degrees of match for inputs. This is justified by the consideration that, in order to be considered a match, the advertised should generally be able to process at least the range of input values that the requested does, but not less. Formally,

Pra matches Prr regarding I ⇔ ∀ia ∈ Pra.I ∃ir ∈ Prr.I : type(ir) ⊑ₙ type(ia)    (5.3)

where n is the maximum distance between type(ir) and type(ia) in the concept/data range hierarchy. This means that the distance is calculated based on a graph-theoretic model in which vertices represent concepts and edges represent direct subsumption relations between them (e.g., given A ⊑ B ⊑ C and no D₁, D₂ with A ⊑ D₁ ⊑ B, B ⊑ D₂ ⊑ C, the corresponding graph contains three vertices A, B, C and two edges (A, B), (B, C), but not (A, C)). The simplest way is to take the edge count distance (e.g., given edges (A, B), (B, C), the distance is 1 for A, B and 2 for A, C). The basic assumption under the edge count distance measure is that subsumption represents uniform distance. As this might not be appropriate in general, another possibility is to assign weights in the interval [0, 1] to edges, thereby allowing for variability in the distance. It is further reasonable to combine this with standardization by requiring that the weights on the edges between a parent concept and its direct subconcepts sum to one. Appropriate weights can be determined based on information-theoretic models that quantify the semantic relatedness of concepts.
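The distance over the subsumption graph can be sketched as a cheapest-path search along direct subsumption edges. The Python example below is one possible reading, not the method of the original: it uses Dijkstra's algorithm so that the same function covers both the edge count distance (all weights set to 1) and weighted edges in [0, 1]. The type alias `Hierarchy`, the function `subsumption_distance`, and the weights 0.25 and 0.5 are assumptions, and the sketch does not check the standardization constraint on weights.

```python
import heapq
from typing import Dict, Optional

# Assumption: child concept -> {direct parent concept: edge weight}.
Hierarchy = Dict[str, Dict[str, float]]


def subsumption_distance(hierarchy: Hierarchy, sub: str, sup: str) -> Optional[float]:
    """Cheapest path from `sub` upwards to `sup` along direct subsumption edges.

    Returns None if `sup` is not reachable, i.e. it does not subsume `sub`
    according to the given edges.
    """
    best = {sub: 0.0}
    queue = [(0.0, sub)]
    while queue:
        dist, concept = heapq.heappop(queue)
        if concept == sup:
            return dist
        if dist > best.get(concept, float("inf")):
            continue  # stale queue entry
        for parent, weight in hierarchy.get(concept, {}).items():
            candidate = dist + weight
            if candidate < best.get(parent, float("inf")):
                best[parent] = candidate
                heapq.heappush(queue, (candidate, parent))
    return None


# Edge count distance: every direct subsumption edge weighs 1.
edge_count: Hierarchy = {"A": {"B": 1.0}, "B": {"C": 1.0}}
print(subsumption_distance(edge_count, "A", "B"))  # 1.0
print(subsumption_distance(edge_count, "A", "C"))  # 2.0

# Weighted variant with hypothetical weights in [0, 1].
weighted: Hierarchy = {"A": {"B": 0.25}, "B": {"C": 0.5}}
print(subsumption_distance(weighted, "A", "C"))  # 0.75
```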

Limiting Condition (5.3) upwards using a distance is motivated by the case of profiles that are too generic and that should therefore be filtered out. Otherwise, one would in the worst case include profiles that match everything: imagine an input ia of a very generic advertised profile with type(ia) = ⊤. Clearly, ia matches any input according to Condition (5.3) because ⊤ is the universal concept. Limiting a match to direct parents (n = 1) effectively avoids such matches, provided that the domain conceptualization is not flat (i.e., one where ⊤ is the direct parent). Such limits have also been applied conversely as lower bounds for matchmaking on outputs (e.g., [KFS09]). Finally, the distance can also be utilized for ordering. An advertised input ia₁ matches a requested input ir more closely than another input ia₂ if its type is closer to that of ir; that is,

(type(ir) ⊑ₘ type(ia₁)) > (type(ir) ⊑ₙ type(ia₂)) if m < n,

which orders ia₁ before ia₂ in this case.
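The effect of the bound n in Condition (5.3) can be illustrated with a small Python sketch. It assumes a tree-shaped, acyclic hierarchy in which every concept has at most one direct parent, and it uses the plain edge count as distance. The concept names, the `PARENT` map (where "Top" plays the role of ⊤), and the helpers `steps_up` and `inputs_match` are hypothetical.

```python
from typing import Dict, Optional, Sequence

# Hypothetical tree-shaped hierarchy: concept -> direct parent; "Top" plays the role of ⊤.
PARENT: Dict[str, str] = {"Novel": "Book", "Book": "Publication", "Publication": "Top"}


def steps_up(parent: Dict[str, str], sub: str, sup: str) -> Optional[int]:
    """Edge count from `sub` up to `sup`, or None if `sup` is not an ancestor.

    Assumes the parent map is acyclic.
    """
    steps, current = 0, sub
    while current != sup:
        if current not in parent:
            return None
        current = parent[current]
        steps += 1
    return steps


def inputs_match(advertised_inputs: Sequence[str], requested_inputs: Sequence[str],
                 parent: Dict[str, str], n: int) -> bool:
    """Condition (5.3): every advertised input type subsumes some requested
    input type within a distance of at most n edges."""
    def within(i_r: str, i_a: str) -> bool:
        distance = steps_up(parent, i_r, i_a)
        return distance is not None and distance <= n

    return all(any(within(i_r, i_a) for i_r in requested_inputs)
               for i_a in advertised_inputs)


print(inputs_match(["Book"], ["Novel"], PARENT, n=1))  # True
print(inputs_match(["Top"], ["Novel"], PARENT, n=1))   # False: too generic
print(inputs_match(["Top"], ["Novel"], PARENT, n=3))   # True once the bound is relaxed

# Ordering advertised input types by closeness to the requested type "Novel".
closest_first = sorted(["Publication", "Book"], key=lambda t: steps_up(PARENT, "Novel", t))
print(closest_first)  # ['Book', 'Publication']
```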

Preconditions and Effects. Structured matching as embodied by Conditions (5.2) and (5.3) can, in principle, be applied respectively to effects and preconditions that are DL-based (e.g., [BOI09]). As mentioned before, this principle is in fact inspired by previous work on precondition and effect matching on software components [ZW97].

Upon closer inspection we have found, however, that doing so is inappropriate for effects whose semantics is defined in terms of a belief update. We argue that Condition (5.2) does not appropriately capture the intuition of the plug-in and subsume match in this case. To explain this, recall that an inclusion C ⊑ D or R ⊑ S can be understood as an implication (see Section 3.1.1). In fact, an inclusion on effects describes a ramification (i.e., an indirect effect). For example, given a role inclusion hasBoughtBook(x, y) ⊑ ownsBook(x, y), an effect atom hasBoughtBook(x, y) implies ownsBook(x, y) as an indirect effect: the indirect effect of buying a book is ownership. Now, what is the intuition of the plug-in and subsume match regarding effects? According to the widely adopted view of [ZW97], the plug-in match⁷ is defined as an implication. Expressed in terms of profiles it reads as follows:

Pra plugs into Prr regarding E iff the effects described by Prr are implied by Pra.    (5.4)

Alas, we see two problems in this definition. First, it would be possible that Pra describes additional effects besides the ones that imply the effects of Prr. Second, as the definition makes use of implication, it is sufficient that Pra specifies effects that indirectly imply Prr’s effects.

We view the plug-in and subsume match as follows. An advertisement subsumes a request regarding effects if the advertisement creates at least all the effects that the request creates. We might, more figuratively, say that the advertisement “does more” than the request. If we further understand the plug-in match as the dual of the subsume match, then an advertisement plugs into a request if it creates some of the request’s effects. It should now be apparent that Condition (5.2) does not represent this. Under Condition (5.2), we would actually accept that an advertisement plugs in if it creates pre-indirect effects and that it subsumes the request if it creates post-indirect effects. The former means that the advertisement creates effects that indirectly imply (cause) the effects of the request (Ca ⊑ Dr), whereas the latter means that none of the effects of the request would be created (Ca ⊒ Dr). Neither case adequately represents our intuition. An alternative definition that does represent it simply builds on containment of the effect sets and (mutual) subsumption between single effects. More precisely,

• Weak plug-in: ϕ ⊑ mₚ(ϕ); weak subsume: ϕ ⊒ mₛ(ϕ);

• Strict plug-in/subsume: ϕ ≡ mₚ,ₛ(ϕ).

We call the former weak and the latter strict because, depending on the actual effect semantics, a weak match might allow for indirect effects whereas a strict match does not. Observe that “⊑” is directed equally for weak plug-in and weak subsume: in both cases the effects of Pra are more specific than those of Prr. Furthermore, “⊑” need not be interpreted strictly in the DL set-theoretic way. In particular, it need not be a transitive relation. Analogously, “≡” is not necessarily understood in the strict mathematical sense (reflexive, symmetric, transitive), as one might want to rule out transitivity.⁸

From Condition (5.5) one can easily derive how weak/strict exact, intersection, and disjoint matches are defined, which we leave as an easy exercise.

To conclude, if we compare Condition (5.4) with our definition, we see that it corresponds to a weak subsume, which perfectly reflects the concerns raised above.

Offline versus Online Matchmaking

Whether matchmaking can be done offline or needs to be done online at runtime (e.g., as part of a CFI cycle) is determined solely by the temporal variability (the dynamics) of the information included in a match relation. For instance, DL-based subsumption matchmaking on IO profile parameters can be done offline since they are statically typed. The same applies to static preconditions and effects. Non-functional matchmaking on N profile parameters is more likely to be done online. The reason is that typical non-functional properties can be subject to possibly frequent dynamic changes (e.g., the response time varies depending on the load). Clearly, offline matchmaking can be utilized for performance optimization by pre-computing matches between profiles and combining this with appropriate indexing or caching techniques for fast retrieval of matches at runtime (e.g., [SHF11]).
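The split between an offline and an online phase can be sketched as follows. This Python example is only an illustration under assumptions: static matches are pre-computed into a dictionary that serves as a simple index, and a dynamic non-functional property is checked at query time. The function names, the profile identifiers, the output sets, and the response-time values are all hypothetical and are not taken from [SHF11].

```python
from typing import Callable, Dict, List, Sequence, Set


def precompute_matches(profile_ids: Sequence[str],
                       static_match: Callable[[str, str], bool]) -> Dict[str, List[str]]:
    """Offline phase: for every profile seen as a request, store the ids of all
    other profiles that match it on static (e.g. statically typed IO) information."""
    return {
        requested: [adv for adv in profile_ids
                    if adv != requested and static_match(adv, requested)]
        for requested in profile_ids
    }


def answer_query(index: Dict[str, List[str]], requested: str,
                 current_response_ms: Callable[[str], float],
                 max_response_ms: float) -> List[str]:
    """Online phase: look up the cached static matches, then filter them by a
    dynamic non-functional property that is only known at runtime."""
    return [adv for adv in index.get(requested, [])
            if current_response_ms(adv) <= max_response_ms]


# Hypothetical output sets per profile; a static match covers all requested outputs.
outputs: Dict[str, Set[str]] = {
    "req": {"Book"}, "s1": {"Book"}, "s2": {"Book", "CD"}, "s3": {"CD"},
}
index = precompute_matches(list(outputs), lambda adv, req: outputs[req] <= outputs[adv])
print(index["req"])  # ['s1', 's2']

# Hypothetical response times measured at runtime, in milliseconds.
load = {"s1": 300.0, "s2": 40.0, "s3": 10.0}
print(answer_query(index, "req", load.__getitem__, max_response_ms=100.0))  # ['s2']
```

Keeping the dynamic check outside the index means the cached entries stay valid as long as the static parts of the profiles do not change.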

In document Flexible semantic service execution (Page 128-133)