Hybrid Query Processing - Efficient Semantic 3D Scene Retrieval

6.3 Efficient Semantic 3D Scene Retrieval

6.3.2 Hybrid Query Processing

Once the semantic indices are created, an online query q can be answered efficiently. The key idea is to process three sub-queries of q in parallel over the scene concept, semantic service and geometric feature indices. These processes yield three ranked lists RSC(q), RSS(q) and RGF(q) of scenes relevant to q in terms of scene concept, semantic service and geometric feature, respectively. The final ranking is computed by applying Fagin's threshold algorithm (TA) [92]. In this section, we detail the sub-query processing for each aspect and the final aggregation.

Scene concept subquery processing: If the logical expression τ(C, Oreq) of the requested scene concept (SC) C of a query q is not empty, the subquery for the SC is processed. For this purpose, the process classifies τ(C, Oreq) as a concept C′ ∈ O in the scene ontology O and then returns the corresponding ranked list RSC(q) of candidate scenes that are (partially) relevant to q in terms of the scene concept q.C.

Semantic service subquery processing: If q.SS of a query q is not empty, the subquery for the semantic service aspect is processed. First, for each ss ∈ q.SS, a ranked list R(ss) of scenes that are relevant to ss is computed. For this purpose, the indices IIO and IPE are searched in parallel. The resulting ranked lists R(ss)[io] and R(ss)[pe] are then merged into the list R(ss) of scenes relevant to ss. Finally, the lists R(ss) of all ss ∈ q.SS are merged, which yields the ranked list RSS(q) of scenes. Each scene in RSS(q) partially matches q in terms of the requested semantic services.

Searching index IIO for ss. For each ss ∈ q.SS, this subprocess first retrieves in parallel a set of ranked lists {R(C′s)[l]} (l ∈ {i, o}). Each list corresponds to a distinct parameter concept C′s[l] in ss[l]. For this purpose, the logical expression of each distinct concept C′s in ss[i] (ss[o]) is classified to a concept Cs ∈ Os, and its corresponding ranked list with suffix [i] ([o]) is retrieved. Subsequently, TA [92] is performed on {R(C′s)[l]} to compose a ranked list R(ss)[io] of scenes relevant to q with respect to the IO parameters of the requested service ss.

Let m be the cardinality of {R(C′s)[l]} for ss. TA performs a sorted scan of all its input lists in {R(C′s)[l]} from top to bottom in parallel. The i-th scan fetches the score values at the i-th positions of all lists in {R(C′s)[l]}. In addition, it employs an m-ary function t for computing the aggregated relevance score and the threshold. The general form of t is given in Fagin's work [92], which leaves room for application-specific customization. In this context, we define t as the weighted average of the vector of scores s⃗ fetched from each ranked list in {R(C′s)[l]} per scan. The weight vj of the j-th list in {R(C′s)[l]} is the number of occurrences of its corresponding concept C′s in either ss[i] or ss[o]:

$$t(\vec{s}) = \frac{\sum_{j=1}^{m} v_j \cdot s_j}{\sum_{j=1}^{m} v_j} \tag{6.4}$$

Each scan performed by TA may find a new scene xn that does not yet exist in the current R(ss)[io]. To insert xn into R(ss)[io], it is necessary to compute the aggregated relevance score s(xn, ss)[io] of xn with respect to the IO of ss ∈ q.SS: from each ranked list in {R(C′s)[l]}, TA collects (possibly by random access) the so far missing ds(xn.id, C′s) of xn, and then applies the function t to all ds(xn.id, C′s) in order to compute s(xn, ss)[io]. TA maintains a threshold value T for determining its termination; after each scan, T is updated with the value of t over the most recently scanned scores. TA terminates if T ≤ s(x, ss)[io] for all ranked objects x in R(ss)[io].
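The scan, random-access and termination steps described above can be sketched as follows. This is a minimal illustration rather than the iRep3D implementation: the function names, the top-k cut-off `k`, and the default score 0.0 for a scene absent from a list (or for an exhausted list) are assumptions made for the example.

```python
import heapq
from typing import Dict, List, Tuple

RankedList = List[Tuple[str, float]]  # (scene id, score), sorted by score descending


def weighted_average(scores: List[float], weights: List[float]) -> float:
    """Aggregation function t of Eq. (6.4)."""
    return sum(v * s for v, s in zip(weights, scores)) / sum(weights)


def threshold_algorithm(lists: List[RankedList], weights: List[float], k: int) -> RankedList:
    """Return the k best scenes by weighted-average score (hypothetical helper)."""
    lookup: List[Dict[str, float]] = [dict(lst) for lst in lists]  # random access
    aggregated: Dict[str, float] = {}
    depth = max(len(lst) for lst in lists)
    for i in range(depth):
        last_seen: List[float] = []
        for lst in lists:
            if i >= len(lst):
                last_seen.append(0.0)  # exhausted list: assumed score 0.0
                continue
            scene_id, score = lst[i]  # sorted access at position i
            last_seen.append(score)
            if scene_id not in aggregated:
                # Random access: collect the scene's score from every list
                # (0.0 if the scene is absent, which is an assumption).
                scores = [lk.get(scene_id, 0.0) for lk in lookup]
                aggregated[scene_id] = weighted_average(scores, weights)
        # Threshold T: t applied to the scores fetched in this scan.
        threshold = weighted_average(last_seen, weights)
        top = heapq.nlargest(k, aggregated.items(), key=lambda e: e[1])
        if len(top) >= k and all(score >= threshold for _, score in top):
            return top
    return heapq.nlargest(k, aggregated.items(), key=lambda e: e[1])


list_a = [("s1", 0.9), ("s2", 0.8), ("s3", 0.1)]
list_b = [("s2", 0.9), ("s1", 0.3), ("s3", 0.2)]
best = threshold_algorithm([list_a, list_b], weights=[2, 1], k=1)
```

In this example the algorithm stops after the second scan, because the threshold has dropped below the score of s2, so s3 is never aggregated.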

Searching IPE for ss. For each ss ∈ q.SS, searching IPE for ss results in two sets of ranked lists {R(α)[l′]} (l′ ∈ {p, e}), with one list for every non-negative predicate α in ss[l′]. In addition, the process merges the ranked lists in each set into a list R(ss)[l′] of scenes that are relevant to ss in terms of ss[l′]. For this purpose, entries for the same object x in different lists, that is, entries that share the same scene id, are merged. The score s(x, ss[l′]) of x in R(ss)[l′] is computed by applying the Gödel minimum t-norm and maximum t-conorm according to the conjunctive and disjunctive relations, respectively, between the predicates in ss[l′]:

$$s(x, \mathit{ss}[l']) = \min_{\mathit{cla} \in \mathit{ss}[l']} s(x, \mathit{cla}[l']), \qquad s(x, \mathit{cla}[l']) = \max_{\alpha \in \mathit{cla}} d_a(x, \alpha)[l'], \tag{6.5}$$

where cla[l′] denotes a clause of disjunctive predicates. Finally, the search process merges R(ss)[p] and R(ss)[e] in order to compute the list R(ss)[pe] of scenes that are relevant to ss in terms of the precondition and effect. The completion of the parallel computations of R(ss)[io] and R(ss)[pe] triggers their merging and yields the ranked list R(ss) of scenes relevant to q in terms of ss ∈ q.SS. The relevance score s(x, ss) of x in R(ss) is the convex combination of the corresponding scores in R(ss)[io] and R(ss)[pe]:

$$s(x, \mathit{ss}) = \varphi\, s(x, \mathit{ss}[io]) + \psi\, s(x, \mathit{ss}[pe]), \tag{6.6}$$

where the positive real values φ and ψ (φ + ψ = 1) are the weights of IO and PE matching, respectively. They can be set differently in specific systems according to their priorities.
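As a small illustration of Eqs. (6.5) and (6.6), the sketch below scores one scene against a precondition (or effect) given in conjunctive normal form and then combines an IO score with a PE score. The data layout (a dictionary of predicate matching degrees, clauses as lists of predicate names), the function names, and the default degree 0.0 for an unmatched predicate are assumptions of this example. How R(ss)[p] and R(ss)[e] are merged into R(ss)[pe] is not specified in the text, so it is not modelled here.

```python
from typing import Dict, List

Clause = List[str]   # disjunction of predicate names
CNF = List[Clause]   # conjunction of clauses


def clause_score(degrees: Dict[str, float], clause: Clause) -> float:
    """Goedel maximum t-conorm over the predicates of one disjunctive clause."""
    # A predicate without a recorded degree counts as 0.0 (assumption).
    return max(degrees.get(alpha, 0.0) for alpha in clause)


def cnf_score(degrees: Dict[str, float], cnf: CNF) -> float:
    """Goedel minimum t-norm over all clauses, as in Eq. (6.5)."""
    return min(clause_score(degrees, clause) for clause in cnf)


def service_score(s_io: float, s_pe: float, phi: float = 0.5) -> float:
    """Convex combination of Eq. (6.6) with psi = 1 - phi."""
    return phi * s_io + (1.0 - phi) * s_pe


# Matching degrees of one scene x for three hypothetical predicates.
degrees = {"a": 0.4, "b": 0.9, "c": 0.6}
precondition = [["a", "b"], ["c"]]        # (a OR b) AND c
s_p = cnf_score(degrees, precondition)    # min(max(0.4, 0.9), 0.6)
s_total = service_score(0.8, s_p, phi=0.5)
```

Because the minimum is taken over clauses, a single poorly matched clause bounds the whole precondition score, which is the usual behaviour of the Gödel t-norm.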

Merging R(ss) for all ss ∈ q.SS. The subprocess on ISS merges the resulting ranked lists R(ss) of all ss ∈ q.SS. Entries in different lists are merged if they share the same id. The relevance score s(x, q.SS) of x with respect to q.SS is the average of the scores s(x, ss) of x in R(ss) over all services ss:

$$s(x, q.SS) = \frac{1}{|q.SS|} \sum_{\mathit{ss} \in q.SS} s(x, \mathit{ss}). \tag{6.7}$$

Finally, the merged list is re-sorted in descending order of s(x, q.SS), yielding the ranked list RSS(q) of scenes partially relevant to q with respect to q.SS.
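The merging step of Eq. (6.7) can be sketched as below. The function name is hypothetical, and the sketch assumes that a scene absent from some R(ss) contributes 0 to the sum for that service, which follows from dividing by |q.SS| but is not stated explicitly in the text.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

RankedList = List[Tuple[str, float]]  # (scene id, score)


def merge_service_lists(lists: List[RankedList]) -> RankedList:
    """Average the per-service scores of each scene over all requested services."""
    totals: Dict[str, float] = defaultdict(float)
    for ranked in lists:
        for scene_id, score in ranked:
            totals[scene_id] += score  # entries with the same id are merged
    n = len(lists)                     # |q.SS|
    merged = [(scene_id, total / n) for scene_id, total in totals.items()]
    merged.sort(key=lambda entry: entry[1], reverse=True)
    return merged


r_ss1 = [("s1", 0.8), ("s2", 0.4)]
r_ss2 = [("s1", 0.6)]
r_ss_q = merge_service_lists([r_ss1, r_ss2])
```

Here s1 appears in both lists and receives the average of 0.8 and 0.6, while s2 matches only one of the two services and is ranked lower with half of its single score.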

Geometric feature subquery processing: The subquery processing for the geometric feature aspect consists of the following steps:

1. For each gf ∈ q.GF, the process first searches the B+ trees bt(gf.f, k) in parallel. Each traversal retrieves a ranked list R(gf.f.k) of scenes relevant to q in terms of gf.f.k. Note that R(gf.f.k) contains attribute values rather than similarity scores.

2. For each entry (x.id, x.v(f.k)) ∈ R(gf.f.k), the process computes a geometric feature attribute similarity score sk(vq(f.k), x.v(f.k)) between the requested volume vq(f.k) and x.v(f.k). This results in a new ranking R(q, gf.f.k) of scenes that are relevant to q in terms of the requested volume on gf.f.k.

3. All lists R(q, gf.f.k) of the attributes belonging to the same feature type gf.f are then merged (by scene id) into a ranking R(q, gf) of scenes relevant to q in terms of gf.

4. TA is executed on the feature-level rankings R(q, gf), which computes the total ranking of scenes relevant to q in terms of q.GF.

The geometric data types specified by the X3D, XML3D and COLLADA specifications include the following primitive data types:

• single number, string or boolean (e.g. SFDouble, SFString);

• 2-, 3- or 4-ary tuple of numbers or strings (e.g. SFVec2d, SFVec3f, float4_type);

• vector of values of the types above (e.g. MFDouble, MFVec3d).

Let tp(k) denote the primitive data type of feature attribute k. The geometric feature attribute similarity score can be computed as follows:

$$
s_k(v_1, v_2) =
\begin{cases}
\mathrm{and}(v_1, v_2), & \text{if } tp(k) \text{ is a single boolean;} \\[4pt]
\mathrm{EDS}(v_1, v_2) = 1 - \dfrac{\mathrm{ED}(v_1, v_2)}{\max(|v_1|, |v_2|)}, & \text{if } tp(k) \text{ is a single string;} \\[8pt]
\min\left(\dfrac{v_1}{v_2}, \dfrac{v_2}{v_1}\right), & \text{if } tp(k) \text{ is a single number;} \\[8pt]
\mathrm{cos\_sim}(v_1, v_2), & \text{if } tp(k) \text{ is a pair, triple or vector of numbers;} \\[4pt]
\mathrm{VEDS}(v_1, v_2) = \dfrac{1}{|v_1|} \sum_{i=1}^{|v_1|} \mathrm{EDS}(v_{1i}, v_{2i}), & \text{if } tp(k) \text{ is a pair, triple or vector of strings;} \\[8pt]
\dfrac{1}{|v_1|} \sum_{i=1}^{|v_1|} \mathrm{cos\_sim}(v_{1i}, v_{2i}), & \text{if } tp(k) \text{ is a vector of pairs or triples of numbers;} \\[8pt]
\dfrac{1}{|v_1|} \sum_{i=1}^{|v_1|} \mathrm{VEDS}(v_{1i}, v_{2i}), & \text{if } tp(k) \text{ is a vector of pairs or triples of strings;}
\end{cases}
$$

where |v1| denotes the length of v1; and(v1, v2) is the conjunction of v1 and v2; ED(v1, v2) is the Levenshtein edit distance of v1 and v2, from which the similarity EDS(v1, v2) is derived; and cos_sim(v1, v2) is the cosine similarity of v1 and v2. In the context of iRep3D, we skip the types SFImage, MFImage, SFTime and MFTime of the X3D specification since they are not geometric data types.
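The string and number cases of sk can be illustrated with the following sketch. The function names are chosen for this example, and the handling of corner cases that the formulas leave undefined (two empty strings, a zero number, a zero-length vector) is an assumption. The two vector-of-tuples cases are omitted; they average cos_sim or VEDS over the elements in the same way as `veds` averages EDS.

```python
import math
from typing import Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein edit distance ED, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def eds(a: str, b: str) -> float:
    """EDS = 1 - ED / max(|a|, |b|); two empty strings give 1.0 (assumption)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def ratio(a: float, b: float) -> float:
    """min(a/b, b/a) for single numbers; the zero cases are an assumption."""
    if a == b:
        return 1.0
    if a == 0 or b == 0:
        return 0.0
    return min(a / b, b / a)


def cos_sim(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two numeric tuples or vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0  # zero vector gives 0.0 (assumption)


def veds(u: Sequence[str], v: Sequence[str]) -> float:
    """VEDS: mean EDS over corresponding string elements."""
    return sum(eds(a, b) for a, b in zip(u, v)) / len(u)


name_similarity = eds("kitten", "sitting")   # ED is 3, longest length is 7
size_similarity = ratio(2.0, 4.0)
label_similarity = veds(["ab", "cd"], ["ab", "cx"])
```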

The first step in geometric feature subquery processing retrieves a ranking R(gf.f.k) of entries containing the identifiers of scenes and their values v(f.k) on the attribute. Instead of directly retrieving the ranked list pointed to by a leaf node, R(gf.f.k) is computed by applying a tolerance window strategy: it retrieves at most N entries on each side of the entry (x.id, x.v(gf.f.k)) whose feature attribute value has the minimum distance to vq(gf.f.k). We call N the half-window width.
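The tolerance window can be sketched over a list of entries sorted by attribute value, which stands in here for the leaf level of a B+ tree. The function name and the use of a plain Python list are assumptions of this example, as is the choice to include the closest entry itself in the result in addition to at most N neighbours on each side.

```python
from bisect import bisect_left
from typing import List, Tuple

Entry = Tuple[str, float]  # (scene id, attribute value)


def tolerance_window(entries: List[Entry], query_value: float, n: int) -> List[Entry]:
    """Return the entry closest to query_value plus at most n entries on each side.

    `entries` must be sorted by attribute value in ascending order.
    """
    if not entries:
        return []
    values = [value for _, value in entries]
    pos = bisect_left(values, query_value)
    # bisect_left gives the first value >= query_value; step back if the
    # left neighbour is at least as close, or if we ran past the end.
    if pos == len(values) or (
        pos > 0 and query_value - values[pos - 1] <= values[pos] - query_value
    ):
        pos -= 1
    return entries[max(0, pos - n): pos + n + 1]


leaf_entries = [("a", 1.0), ("b", 2.0), ("c", 4.0), ("d", 7.0), ("e", 9.0)]
window = tolerance_window(leaf_entries, query_value=5.0, n=1)
```

For the query value 5.0 the closest stored value is 4.0, so with N = 1 the window contains the entries b, c and d.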

Final aggregation: Three sub-rankings of the partially matched 3D scenes with respect to q are constructed by parallel subquery processing. The final ranking is computed immediately after their completion by applying TA to the ranked lists RSC(q), RSS(q) and RGF(q). If the score of a scene x is missing in some ranked list, the lowest score in that list is used. TA terminates if the threshold is not larger than the score of the m-th entry (cf. Definition 4) in the total ranking, or if all three lists have been entirely scanned.
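The rule for missing scores can be illustrated in isolation. The sketch below only shows how each sub-ranking could be completed with its own lowest score before aggregation; the function name is hypothetical, and the value 0.0 used for an entirely empty sub-ranking is an assumption, since the text does not cover that case.

```python
from typing import Dict, List, Tuple

RankedList = List[Tuple[str, float]]  # (scene id, score)


def fill_missing(lists: List[RankedList]) -> List[Dict[str, float]]:
    """Give every scene a score in every list, defaulting to that list's lowest score."""
    all_ids = {scene_id for ranked in lists for scene_id, _ in ranked}
    filled = []
    for ranked in lists:
        scores = dict(ranked)
        floor = min(scores.values(), default=0.0)  # empty list: assumed 0.0
        filled.append({scene_id: scores.get(scene_id, floor) for scene_id in all_ids})
    return filled


r_sc = [("s1", 0.9), ("s2", 0.5)]
r_ss = [("s2", 0.9), ("s3", 0.4)]
completed = fill_missing([r_sc, r_ss])
```

In this example s3 is absent from the first list and inherits its lowest score 0.5, while s1 is absent from the second list and inherits 0.4.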

In document Semantic search and composition in unstructured peer-to-peer networks (Page 197-200)