Top-k Triple-Pattern Query Processing
5.1. Query Processing for Triple-Pattern Search
Our query processing framework handles three types of query processing tasks: triple-pattern queries, keyword-augmented triple-pattern queries, and query reformulation. We explain each task separately below.
5.1.1. Triple-Pattern Queries
An RDF knowledge base is searched using triple-pattern queries. For example, to find thriller movies and their directors, the following query consisting of two triple patterns can be issued:
?d directed ?m
?m hasGenre Thriller
Given a query with n triple patterns, the result of the query is the set of all tuples of n triples that instantiate the query triple patterns and satisfy the query join conditions, which are expressed by using the same variable in more than one triple pattern. To find such tuples, we first need to retrieve an instantiation list Li for each triple pattern qi specified in the query; Li contains all the triples from the knowledge base that instantiate triple pattern qi. For instance, the instantiation list of the first triple pattern in our example query would consist of all triples with predicate directed. Once we have an instantiation list Li for each triple pattern qi, we need to join them based on the join conditions specified in the query using some joining strategy. For our example query, a valid result would be a tuple of two triples T = (t1, t2), such that the first triple t1 ∈ L1, the second triple t2 ∈ L2, and the object of t1 is the same as the subject of t2. One example result for our example query above is the 2-tuple (i.e., tuple of two triples):
Quentin Tarantino directed Pulp Fiction
Pulp Fiction hasGenre Thriller
Moreover, we would like to also provide a ranked list of query results rather than a set of unranked matches. This means that after joining all the triples from the different instantiation lists, we must rank the joined tuples using some ranking strategy. For example, the result tuples can be ranked using our ranking model for triple-pattern queries described in Chapter 3.
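The retrieval and join steps described above can be sketched as follows. This is only an illustration under stated assumptions: the toy knowledge base, the names Triple, instantiation_list, bind and join, and the simple nested-loop joining strategy are all hypothetical and are not the framework's actual implementation. Ranking of the joined tuples is omitted.

```python
from collections import namedtuple

Triple = namedtuple("Triple", ["s", "p", "o"])

# Toy knowledge base (hypothetical data, for illustration only).
KB = [
    Triple("Quentin Tarantino", "directed", "Pulp Fiction"),
    Triple("Alfred Hitchcock", "directed", "Psycho"),
    Triple("Woody Allen", "directed", "Annie Hall"),
    Triple("Pulp Fiction", "hasGenre", "Thriller"),
    Triple("Psycho", "hasGenre", "Thriller"),
    Triple("Annie Hall", "hasGenre", "Comedy"),
]


def is_var(term):
    """Variables are written with a leading question mark, e.g. ?m."""
    return term.startswith("?")


def instantiation_list(pattern, kb):
    """Return all triples whose constants agree with the triple pattern."""
    return [t for t in kb
            if all(is_var(q) or q == v for q, v in zip(pattern, t))]


def bind(pattern, triple, bindings):
    """Extend the variable bindings with this triple, or return None on conflict."""
    new = dict(bindings)
    for q, v in zip(pattern, triple):
        if is_var(q):
            if new.setdefault(q, v) != v:
                return None
        elif q != v:
            return None
    return new


def join(patterns, lists):
    """Nested-loop join: keep tuples whose shared variables bind consistently."""
    partial = [((), {})]
    for pattern, lst in zip(patterns, lists):
        partial = [(tup + (t,), b2)
                   for tup, b in partial
                   for t in lst
                   for b2 in [bind(pattern, t, b)]
                   if b2 is not None]
    return [tup for tup, _ in partial]


query = [("?d", "directed", "?m"), ("?m", "hasGenre", "Thriller")]
lists = [instantiation_list(q, KB) for q in query]
results = join(query, lists)
```

Here the shared variable ?m plays the role of the join condition: a tuple survives only if the object of its first triple equals the subject of its second.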
5.1.2. Keyword-Augmented Triple-Pattern Queries
In addition to triple-pattern queries, our framework also processes keyword-augmented triple-pattern queries. A keyword-augmented triple-pattern query is a triple-pattern query where one or more of the triple patterns are augmented with one or more keywords. For example, the following keyword-augmented query can be issued to retrieve thriller movies about serial killers and their directors:
?d directed ?m
?m hasGenre Thriller serial killer
To be able to process keyword-augmented triple-pattern queries, we augment each triple in our knowledge base with a text snippet, which contains any contextual text from the sources the triple was extracted from. This way, we can also process the keywords specified in a keyword-augmented triple-pattern query. The result of a keyword-augmented query consisting of n triple patterns is the set of all tuples of n triples that instantiate the query triple patterns, satisfy the query join conditions, and whose text snippets match the keywords specified in the query. To find such tuples, we need to retrieve an instantiation list Li for each triple pattern qi specified in the query; Li contains all the triples from the knowledge base that instantiate triple pattern qi and whose text snippets match the keywords specified in qi, if any are given. Once we have an instantiation list Li for each triple pattern qi, we need to join them based on the join conditions specified in the query using some joining strategy. For example, one result for our example query above is the 2-tuple:
Alfred Hitchcock directed Psycho
Psycho hasGenre Thriller
Once we have generated all possible joined tuples for a given query, we need to also rank them according to some ranking strategy such as the ranking model from Chapter 3.
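As a rough sketch of how an instantiation list could be filtered by keywords, consider the following. The text above does not specify how snippets are matched, so this example assumes conjunctive matching (every keyword must occur as a word in the snippet). The snippets, the AnnotatedTriple type and all function names are hypothetical.

```python
import re
from collections import namedtuple

AnnotatedTriple = namedtuple("AnnotatedTriple", ["s", "p", "o", "snippet"])

# Toy knowledge base; the snippets are made up for illustration.
KB = [
    AnnotatedTriple("Psycho", "hasGenre", "Thriller",
                    "A motel owner turns out to be a serial killer."),
    AnnotatedTriple("Pulp Fiction", "hasGenre", "Thriller",
                    "Interwoven stories of two hit men and a boxer."),
    AnnotatedTriple("Alfred Hitchcock", "directed", "Psycho",
                    "Hitchcock directed Psycho in 1960."),
]


def matches_pattern(pattern, triple):
    """A triple instantiates a pattern if all constants of the pattern agree."""
    return all(q.startswith("?") or q == v
               for q, v in zip(pattern, (triple.s, triple.p, triple.o)))


def matches_keywords(keywords, snippet):
    """Assumed semantics: every keyword occurs as a word in the snippet."""
    words = set(re.findall(r"\w+", snippet.lower()))
    return all(k.lower() in words for k in keywords)


def instantiation_list(pattern, keywords, kb):
    """Triples that instantiate the pattern and whose snippet matches the keywords."""
    return [t for t in kb
            if matches_pattern(pattern, t)
            and matches_keywords(keywords, t.snippet)]


with_keywords = instantiation_list(("?m", "hasGenre", "Thriller"),
                                   ["serial", "killer"], KB)
without_keywords = instantiation_list(("?m", "hasGenre", "Thriller"), [], KB)
```

A pattern without keywords passes an empty keyword list, so it behaves exactly like a plain triple pattern.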
5.1.3. Query Reformulation
Similar to query expansion in traditional IR, triple-pattern queries can be automatically reformulated to improve their recall. In Chapter 4, we presented a framework for query reformulation that reformulates a given triple-pattern query by reformulating one or more of its triple patterns. A triple pattern is in turn reformulated by replacing one or more resources (whether entities or relations) that appear in the triple pattern with similar resources or variables. In order to do so, we associate with each resource X in the knowledge base a substitution list, which consists of a list of resources ordered by score. More precisely, let the substitution list of resource X be L(X). A resource Y ∈ L(X) is associated with a score δ(Y, X), which measures how close resources X and Y are according to some distance metric. Moreover, L(X) also contains an entry for the variable substitution (i.e., replacing X with a variable), which we denote by var, and this entry is also associated with a score δ(var, X). The way we construct these substitution lists has been discussed in Chapter 4.

directed            hasGenre       Thriller
actedIn:  0.413     var: 0.525     Action: 0.477
created:  0.418                    var:    0.503
produced: 0.438
var:      0.472
Table 5.1.: Substitution lists for three resources and the score of each substitution

?d directed ?m      δ        ?m hasGenre Thriller     δ
?d actedIn ?m       0.413    ?m hasGenre Action       0.477
?d created ?m       0.418    ?m hasGenre ?y           0.503
?d produced ?m      0.438    ?m ?y Thriller           0.525
?d ?x ?m            0.472    ?m ?y Action             1.002
                             ?m ?y ?z                 1.028
Table 5.2.: Reformulated triple patterns and their scores
For example, consider the triple-pattern query:
?d directed ?m
?m hasGenre Thriller
Table 5.1 shows the substitution lists for the three resources that appear in the example query (directed, hasGenre and Thriller) and their scores. Using these substitution lists, a list of reformulated triple patterns for each of the triple patterns in the example query can be constructed. Furthermore, the reformulations in each of these lists are associated with scores that represent how close the reformulations are to the triple pattern the list belongs to. The score of a reformulated triple pattern is computed as the sum of the scores of the substitutions that resulted in the reformulated triple pattern. Table 5.2 shows the set of reformulated triple patterns for each triple pattern in our example query and their scores δ.

?d directed ?m; ?m hasGenre Thriller      δ
?d actedIn ?m; ?m hasGenre Thriller       0.413
?d created ?m; ?m hasGenre Thriller       0.418
?d produced ?m; ?m hasGenre Thriller      0.438
?d ?x ?m; ?m hasGenre Thriller            0.472
?d directed ?m; ?m hasGenre Action        0.477
?d directed ?m; ?m hasGenre ?x            0.503
?d directed ?m; ?m ?y Thriller            0.525
?d actedIn ?m; ?m hasGenre Action         0.879
?d created ?m; ?m hasGenre Action         0.884
?d produced ?m; ?m hasGenre Action        0.915
Table 5.3.: Top-10 reformulated queries for a given example query and their scores
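The construction of reformulated triple patterns from substitution lists can be sketched as below. The scores are those of Table 5.1; the dictionary layout and the function name reformulate_pattern are assumptions made for this illustration, and the variable substitution is kept as the literal token var rather than being turned into a fresh variable name.

```python
from itertools import product

# Substitution lists from Table 5.1: resource -> {substitute: score}.
SUBSTITUTIONS = {
    "directed": {"actedIn": 0.413, "created": 0.418,
                 "produced": 0.438, "var": 0.472},
    "hasGenre": {"var": 0.525},
    "Thriller": {"Action": 0.477, "var": 0.503},
}


def reformulate_pattern(pattern, substitutions):
    """Return (reformulated pattern, score) pairs, closest reformulation first.

    The score of a reformulation is the sum of the scores of the
    substitutions applied to obtain it.
    """
    options = []
    for term in pattern:
        choices = [(term, 0.0)]  # keeping a term unchanged adds nothing
        if not term.startswith("?"):  # query variables are never substituted
            choices += list(substitutions.get(term, {}).items())
        options.append(choices)

    reformulations = []
    for combo in product(*options):
        terms = tuple(t for t, _ in combo)
        if terms == tuple(pattern):
            continue  # skip the original pattern itself
        score = round(sum(s for _, s in combo), 3)
        reformulations.append((terms, score))
    return sorted(reformulations, key=lambda pair: pair[1])


first = reformulate_pattern(("?d", "directed", "?m"), SUBSTITUTIONS)
second = reformulate_pattern(("?m", "hasGenre", "Thriller"), SUBSTITUTIONS)
```

For the second pattern, substituting both hasGenre and Thriller with variables combines two substitutions, so its score is 0.525 + 0.503 = 1.028, as in Table 5.2.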
Using the reformulation lists of individual triple patterns, a list of reformulated queries can be generated. Again, each such reformulated query is associated with a score which measures how close the reformulated query is to the original query, and the score of a reformulated query is computed as the sum of the scores of its triple patterns. Table 5.3 shows the top-10 reformulated queries for our example query.
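A minimal sketch of generating reformulated queries from the per-pattern lists follows. It assumes that the original triple pattern takes part in the combination with score 0, and it enumerates all combinations exhaustively, which is only practical for small lists. The function name top_k_reformulated_queries and the string encoding of patterns are hypothetical.

```python
import heapq
from itertools import product

# Reformulations per triple pattern with their scores (Table 5.2).
# Assumption: the original pattern is included with score 0.
P1 = [("?d directed ?m", 0.0), ("?d actedIn ?m", 0.413),
      ("?d created ?m", 0.418), ("?d produced ?m", 0.438),
      ("?d ?x ?m", 0.472)]
P2 = [("?m hasGenre Thriller", 0.0), ("?m hasGenre Action", 0.477),
      ("?m hasGenre ?y", 0.503), ("?m ?y Thriller", 0.525),
      ("?m ?y Action", 1.002), ("?m ?y ?z", 1.028)]


def top_k_reformulated_queries(pattern_lists, k):
    """Return the k reformulated queries with the smallest summed score."""
    candidates = []
    for combo in product(*pattern_lists):
        score = round(sum(s for _, s in combo), 3)
        if score > 0:  # a score of 0 is the original query, not a reformulation
            candidates.append((score, tuple(p for p, _ in combo)))
    return heapq.nsmallest(k, candidates)


# The seven closest reformulations each change exactly one resource.
closest = top_k_reformulated_queries([P1, P2], k=7)
```

Because query scores are sums of non-negative pattern scores, every reformulation that changes a single resource precedes any reformulation that combines that change with another one.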
Now, to process a query and all its reformulations, we must do the following. Let $Q = (q_1, q_2, \ldots, q_n)$ be the given query, where $q_i$ is a triple pattern. Furthermore, let $\{q_i^1, q_i^2, \ldots, q_i^{m_i}\}$ be all the reformulations of triple pattern $q_i$. For each triple pattern $q_i$, we must:
1. Retrieve the instantiation list $L_i$ for triple pattern $q_i$
2. Retrieve the instantiation list $L_i^j$ for each reformulated triple pattern $q_i^j$, where $1 \le j \le m_i$
3. Merge all instantiation lists $L_i$ and $L_i^1, L_i^2, \ldots, L_i^{m_i}$
Once we have retrieved a merged list of triples for each triple pattern qi, we need to join them based on the join conditions specified in the query to produce joined result tuples which are then ranked using some ranking strategy such as the one explained in Chapter 4.
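The merge step can be illustrated with the sketch below. The text does not fix a merging rule, so this example assumes one: the merged list is the union of all instantiation lists, and a triple that appears in several lists keeps the smallest reformulation score under which it was retrieved. The data and the name merge_instantiation_lists are hypothetical.

```python
def merge_instantiation_lists(scored_lists):
    """Merge instantiation lists of a triple pattern and its reformulations.

    scored_lists is an iterable of (delta, triples) pairs, where delta is the
    score of the pattern that produced the list (0 for the original pattern).
    Assumed rule: each triple keeps its smallest delta.
    """
    best = {}
    for delta, triples in scored_lists:
        for t in triples:
            if t not in best or delta < best[t]:
                best[t] = delta
    # Closest matches first; sorted() is stable, so ties keep insertion order.
    return sorted(best.items(), key=lambda item: item[1])


# Toy instantiation lists for "?d directed ?m" and two of its reformulations.
directed = [("Quentin Tarantino", "directed", "Pulp Fiction")]
acted_in = [("Quentin Tarantino", "actedIn", "Pulp Fiction"),
            ("Uma Thurman", "actedIn", "Pulp Fiction")]
any_relation = directed + acted_in  # instantiations of "?d ?x ?m"

merged = merge_instantiation_lists([
    (0.0, directed),        # original pattern
    (0.413, acted_in),      # ?d actedIn ?m
    (0.472, any_relation),  # ?d ?x ?m
])
```

Keeping the smallest score means a triple that already matches the original pattern is not penalized for also matching a looser reformulation.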