
3.2.4. Query Relaxation

Even though triple-pattern queries allow users to search RDF knowledge bases in a very precise manner, they often lack flexibility on the result retrieval side. Recall that a query result is a tuple of triples that instantiate the query triple patterns. This instantiation is assumed to be exact. Allowing approximate pattern instantiation can improve the recall of such queries and can alleviate the problem of "too few results". For example, consider the following query consisting of two triple patterns:

?d hasWonPrize Academy Award for Best Director
?d directed ?m

The above query asks for directors who have won the Academy Award for Best Director, and the movies they directed. Directors who were nominated for an Academy Award for Best Director, or directors who have won a Golden Globe Award for Best Director, together with the movies they directed, are all potentially relevant results for the original query. To retrieve such results, relaxed versions of the given query can be issued in addition to the original query, and their results can be combined with the original query results before they are returned to the user. For example, the following relaxed query

?d hasWonPrize ?x
?d directed ?m

asks for directors who have won any award, and the movies they directed. It is a relaxed version of the original example query in which the object of the first triple pattern is replaced with the variable ?x.

We start by explaining how, given a triple-pattern query, a set of relaxed queries can be generated. We then explain how our ranking model is extended to handle query relaxation in order to provide a ranked list of exact and approximate query results.

Generating Relaxed Queries

A relaxed query is generated by relaxing one or more of its triple patterns. In turn, a triple pattern is relaxed by replacing one or more of the constants (i.e., URIs or literals) specified in the triple pattern with variables.


First triple pattern | Second triple pattern
?d hasWonPrize Academy Award for Best Director | ?d directed ?m
?d hasWonPrize ?x | ?d directed ?m
?d ?x Academy Award for Best Director | ?d directed ?m
?d ?x ?y | ?d directed ?m
?d hasWonPrize Academy Award for Best Director | ?d ?x ?m
?d hasWonPrize ?x | ?d ?y ?m
?d ?x Academy Award for Best Director | ?d ?y ?m
?d ?x ?y | ?d ?z ?m

Table 3.10.: Relaxed queries for a two triple-pattern query

Definition 3.17 : Relaxed Query

Given a triple-pattern query $Q = (q_1, q_2, \ldots, q_n)$, where $q_i$ is a triple pattern, let $VAR(Q)$ be the set of variables that appear in $Q$. Let $VAR_i \subset VAR$ be an infinite set of variables corresponding to triple pattern $q_i$, such that $VAR_1, VAR_2, \ldots, VAR_n$ are pairwise disjoint and $VAR_i \cap VAR(Q) = \emptyset$ for all $1 \le i \le n$. Let $CONST(q_i)$ be the set of constants specified in triple pattern $q_i$. Let $r(q_i)$ be the set of relaxed triple patterns obtained by replacing one or more constants $const_i \in CONST(q_i)$ with a variable $var_i \in VAR_i$. The set of all relaxed queries $R(Q)$ is then $R(Q) = (r(q_1) \cup \{q_1\}) \times (r(q_2) \cup \{q_2\}) \times \cdots \times (r(q_n) \cup \{q_n\})$. Table 3.10 shows all possible relaxed queries for an example query.
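The construction in Definition 3.17 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation used in this work: triple patterns are modelled as 3-tuples of strings, a term is treated as a variable if it starts with "?", and the helper names (fresh_variables, relax_pattern, relaxed_queries) as well as the fresh-variable naming scheme ?v<i>_<j> are hypothetical.

```python
from itertools import combinations, count, islice, product


def is_variable(term):
    # Assumption of this sketch: variables are written with a leading "?".
    return term.startswith("?")


def fresh_variables(index, used, how_many=3):
    # Stand-in for VAR_i: names carry the pattern index, so the sets are
    # pairwise disjoint, and names already in VAR(Q) are skipped.
    candidates = (f"?v{index}_{j}" for j in count(1))
    return list(islice((v for v in candidates if v not in used), how_many))


def relax_pattern(pattern, fresh):
    # r(q): replace every non-empty subset of the constants with variables.
    const_positions = [i for i, t in enumerate(pattern) if not is_variable(t)]
    relaxed = []
    for k in range(1, len(const_positions) + 1):
        for subset in combinations(const_positions, k):
            terms = list(pattern)
            for pos, name in zip(subset, fresh):
                terms[pos] = name
            relaxed.append(tuple(terms))
    return relaxed


def relaxed_queries(query):
    # R(Q): Cartesian product of (r(q_i) united with {q_i}) over all patterns.
    used = {t for q in query for t in q if is_variable(t)}
    options = []
    for i, pattern in enumerate(query, start=1):
        fresh = fresh_variables(i, used)
        options.append([pattern] + relax_pattern(pattern, fresh))
    return [tuple(choice) for choice in product(*options)]


query = [
    ("?d", "hasWonPrize", "Academy Award for Best Director"),
    ("?d", "directed", "?m"),
]
all_queries = relaxed_queries(query)
for q in all_queries:
    print(q)
```

For the example query, the first pattern has two constants and therefore three relaxations, and the second has one constant and one relaxation, so the product contains (3 + 1) × (1 + 1) = 8 queries. These correspond to the rows of Table 3.10 up to variable naming. Note that the product, as defined, also contains the original query itself.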

Extending the Ranking Model

Our ranking model is extended to handle query relaxation as follows. Given a query $Q = (q_1, q_2, \ldots, q_n)$, where $q_i$ is a triple pattern, we rank the results of the query $Q$ and all its relaxations using the ranking model described in Subsection 3.2.3. A result R is ranked by computing the KL divergence between the query language model and the result language model. While the estimation of the result language model follows exactly the procedure described in Subsection 3.2.3, the estimation of the query language model is slightly different when query relaxation is allowed. We describe next how to estimate the query language model when relaxations are allowed.

Estimating the Query Language Model with Relaxations. Given a query $Q = (q_1, q_2, \ldots, q_n)$, where $q_i$ is a triple pattern, we estimate the probability of a tuple $T = (t_1, t_2, \ldots, t_n)$ in the language model of query $Q$ as follows (assuming independence between the triples):

$$P(T \mid Q) = \prod_{i=1}^{n} P(t_i \mid q_i) \qquad (3.19)$$

Now, assume that triple pattern $q_i$ has the set of relaxations $r(q_i) = \{q_i^1, q_i^2, \ldots, q_i^{m_i}\}$, where $q_i^j$ is a relaxed triple pattern obtained by replacing one or more constants in $q_i$ with a variable. The probability $P(t_i \mid q_i)$ is then estimated as a weighted sum of the following $m_i + 1$ probabilities:

$$P(t_i \mid q_i) = \lambda_0 P(t_i \mid q_i^0) + \lambda_1 P(t_i \mid q_i^1) + \cdots + \lambda_{m_i} P(t_i \mid q_i^{m_i}) \qquad (3.20)$$

where $q_i^0$ is the original triple pattern $q_i$ (i.e., without any relaxations). The probability $P(t_i \mid q_i^j)$ is the probability of triple $t_i$ in the language model of triple pattern $q_i^j$, which is estimated according to Equation 3.8 in case $q_i^j$ is a simple triple pattern (i.e., not augmented with any keywords) and according to Equation 3.9 in case $q_i^j$ is keyword-augmented. The parameters $\lambda_j$ weigh the contribution of each triple pattern (whether the original or one of its relaxations), and $\sum_{j=0}^{m_i} \lambda_j = 1$. In general, the $\lambda_j$ are set based on the "closeness" of the relaxed pattern to the original one. We thus set the $\lambda_j$ based on the number of relaxations in the pattern (i.e., constants replaced with variables): the larger the number of relaxations, the lower the weight. This implies that the original triple pattern gets the highest weight, and relaxed patterns with the same number of relaxations get equal weight.
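To make the weighting concrete, the following Python sketch computes one possible set of weights and plugs them into Equations 3.19 and 3.20. It rests on several assumptions that are not taken from the text. The raw weight 1/(1 + k) for a pattern with k relaxed constants is only one choice that satisfies the stated properties (weights sum to 1, fewer relaxations means higher weight, equal counts mean equal weight). The function make_uniform_prob is a toy stand-in for Equations 3.8 and 3.9, which are not reproduced here. The knowledge base and all entity names are invented for illustration.

```python
def relaxation_weights(num_relaxed):
    # num_relaxed[j]: number of constants replaced in pattern j
    # (0 for the original). Assumed raw weight: 1 / (1 + k), then normalised.
    raw = [1.0 / (1 + k) for k in num_relaxed]
    total = sum(raw)
    return [w / total for w in raw]


def matches(triple, pattern):
    # Toy matcher: a variable (leading "?") matches anything. It does not
    # check repeated variables or joins across patterns.
    return all(p.startswith("?") or p == t for t, p in zip(triple, pattern))


def make_uniform_prob(kb):
    # Toy stand-in for Eq. 3.8/3.9: uniform over the triples matching a pattern.
    def pattern_prob(triple, pattern):
        hits = [t for t in kb if matches(t, pattern)]
        return 1.0 / len(hits) if triple in hits else 0.0
    return pattern_prob


def mixture_prob(triple, patterns, weights, pattern_prob):
    # Eq. 3.20: weighted sum over the original pattern and its relaxations.
    return sum(w * pattern_prob(triple, p) for w, p in zip(weights, patterns))


def tuple_prob(triples, per_pattern, pattern_prob):
    # Eq. 3.19: product over the triples of the tuple.
    # per_pattern[i] is a pair (patterns, weights) for query pattern i.
    prob = 1.0
    for t, (patterns, weights) in zip(triples, per_pattern):
        prob *= mixture_prob(t, patterns, weights, pattern_prob)
    return prob


# Hypothetical knowledge base for illustration only.
AWARD = "Academy Award for Best Director"
t1 = ("director_A", "hasWonPrize", AWARD)
t2 = ("director_B", "hasWonPrize", "Golden Globe Award for Best Director")
t3 = ("director_C", "nominatedFor", AWARD)
t4 = ("director_A", "directed", "movie_1")
kb = [t1, t2, t3, t4]

# Original first pattern followed by its three relaxations.
patterns_1 = [
    ("?d", "hasWonPrize", AWARD),
    ("?d", "hasWonPrize", "?x"),
    ("?d", "?x", AWARD),
    ("?d", "?x", "?y"),
]
weights_1 = relaxation_weights([0, 1, 1, 2])
pattern_prob = make_uniform_prob(kb)

for t in (t1, t2, t3):
    print(t[0], mixture_prob(t, patterns_1, weights_1, pattern_prob))
```

Under these assumptions the exact match (director_A) receives the highest probability, while the two approximate matches (a different prize, or a nomination instead of a win) receive equal, lower, but non-zero probabilities, since each matches one pattern with a single relaxation in addition to the fully relaxed pattern.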

In document Effective searching of RDF knowledge bases (Page 72-75)