Experimental Results

7.2 System Performance

In this section, we focus on the performance of the different steps of our system. First, we measure the performance of the disambiguation process, taking into account different depths when comparing ontological contexts. Second, we evaluate the impact of the reduction techniques that we apply during semantic query generation. Finally, we measure the performance of the generation step together with the semantic filtering (which includes both the local and global checks).

7.2.1 Keyword Disambiguation Performance

To test the feasibility of our disambiguation techniques, we performed a set of tests to evaluate their performance in a detailed and systematic way.

The tests were executed on a Sunfire X2200 (2 x AMD Dual Core 2600 MHz, with 8GB RAM). We used the same set of ontologies as in the previous section (the test collection OWLS-TC4 plus the ontologies dbpedia 3.6.owl, schema.org, People+Pets, Koala, Animals, and WordNet). Thus, a total of 55 ontologies were consulted by our prototype.

The input keywords were not selected randomly but based on actual queries proposed by students of different degrees with skills in Computer Science. We considered fifty sets of input keywords to perform the tests, ten for each number of keywords. In Figure 7.1, the results for different sizes of the inputs are shown.

Figure 7.1: Keyword disambiguation performance evaluation.

As can be seen, the disambiguation times depend on the depth considered for matching (i.e., how many levels of parent and child terms are included in the ontological context). In our experiments, using a depth greater than two led to wrong results. This is because the closer one gets to the TOP concept in the ontologies, the more false positives appear, as overly general subsumer terms are considered. Therefore, we consider a depth of two levels to be semantically optimal. The cached results correspond to executions in which the extraction procedures had already been performed and their results stored, since extraction was initially the most expensive task.
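The effect of the matching depth and of caching can be illustrated with a minimal sketch. The toy taxonomy, the function names, and the use of `lru_cache` as a stand-in for stored extraction results are all assumptions made for illustration; they are not the prototype's actual code or data.

```python
from functools import lru_cache

# Hypothetical toy taxonomy (child -> parents); not one of the test ontologies.
PARENTS = {
    "Koala": ("Marsupial",),
    "Marsupial": ("Mammal",),
    "Mammal": ("Animal",),
    "Animal": ("Thing",),
    "Dog": ("Mammal",),
}


def children_index(parents):
    """Invert the child -> parents map into a parent -> children map."""
    index = {}
    for child, ps in parents.items():
        for p in ps:
            index.setdefault(p, set()).add(child)
    return index


CHILDREN = children_index(PARENTS)


def _walk(term, depth, neighbours):
    """Collect the terms reachable from `term` in at most `depth` steps."""
    seen, frontier = set(), {term}
    for _ in range(depth):
        frontier = {n for t in frontier for n in neighbours.get(t, ())} - seen
        seen |= frontier
    seen.discard(term)
    return seen


@lru_cache(maxsize=None)
def context(term, depth=2):
    """Ontological context: ancestors and descendants up to `depth` levels.

    The cache mimics reusing previously extracted contexts.
    """
    return frozenset(_walk(term, depth, PARENTS) | _walk(term, depth, CHILDREN))


print(sorted(context("Koala", 2)))  # ['Mammal', 'Marsupial']
print(sorted(context("Koala", 4)))  # ['Animal', 'Mammal', 'Marsupial', 'Thing']
```

In this toy example, a depth of four already pulls in the very general terms `Animal` and `Thing`, which would match the context of almost any other term. This is the kind of false positive that appears when approaching the TOP concept.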

7.2.2 Evaluation of Query Generation

We now turn our focus to the performance of the query generation step. The tests were carried out using Pellet¹⁴ 1.5 as the background DL reasoner.

They were performed with the same settings, that is, on a Sunfire X2200 (2 x AMD Dual Core 2600 MHz, with 8GB RAM). For the sake of experiment repeatability, we selected two well-known ontologies: People+Pets and Koala. Both are popular ontologies of similar size to those used in well-known benchmarks such as the OAEI¹⁵. We only show the experimental results obtained with simplified BACK as the output query language because most search approaches are based only on conjunctive queries. Nevertheless, we also performed the experiments with other non-DL languages, and we obtained similar execution times and conclusions.

For the experiments, we considered different sample sets of input keywords (selected from the terms of the above ontologies) and measured average values grouped by the number of keywords in the set. As in the evaluation of the disambiguation process, these inputs were based on actual queries proposed by students of different degrees with skills in Computer Science. The sets were chosen according to the following distribution: 10 sets with a single keyword (5 selecting a role and 5 selecting a concept), 15 sets with two keywords (5 sets where both keywords are roles, 5 sets where both keywords are concepts, and 5 sets where one keyword is a role and the other one is a concept), 20 sets with three keywords (5 with 2 concepts and 1 role, 5 with 1 concept and 2 roles, 5 with 3 concepts, and 5 with 3 roles) and, following the same idea, 25 sets with four keywords and 30 sets with five keywords. Notice that, even though our approach can effectively deal with instances as well, we do not consider sets with instances because the selected ontologies do not have instances (as frequently happens [WPH06]). We set the maximum number of keywords to 5, as the average number of keywords used in keyword-based search engines “is somewhere between 2 and 3” [MRS08], and thus we can see how our system performs with inputs below and above this average number of keywords.
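The distribution above follows a simple rule: for k keywords there are k + 1 possible concept/role mixes, and 5 sets are taken for each mix. The following sketch reproduces the counts. The function and constant names are ours and purely illustrative.

```python
SETS_PER_MIX = 5  # from the text: 5 sets for each concept/role mix


def mixes(k):
    """All (concepts, roles) mixes for k keywords, ignoring keyword order."""
    return [(concepts, k - concepts) for concepts in range(k + 1)]


def sets_for(k):
    """Number of sample sets used for inputs of k keywords."""
    return SETS_PER_MIX * len(mixes(k))


distribution = {k: sets_for(k) for k in range(1, 6)}
print(distribution)                # {1: 10, 2: 15, 3: 20, 4: 25, 5: 30}
print(sum(distribution.values()))  # 100
```

Under this reading, the experiments use 100 sample sets in total.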

¹⁴ http://clarkparsia.com/pellet, last accessed October 3, 2013.

¹⁵ http://oaei.ontologymatching.org/, last accessed October 3, 2013.

We conducted four experiments: 1) no VTs added, where the system works only with the user keywords; 2) one VT added, to try to find a possible missing keyword; 3) two VTs (1+1), which is the same situation as 2) with an extra refinement step once the user has selected a candidate for the first VT to be rendered; and 4) two VTs added, to find two possible missing keywords at the same time¹⁶. We also assumed that the user inputs at least one keyword. The X-axis in Figures 7.2 and 7.3 represents the total number of keywords considered, i.e., the input keywords plus the VTs added by the system. Thus, for 3 keywords, the results correspond to 3 user keywords (no VTs), 2 user keywords and 1 VT (one VT), and 1 user keyword and 2 VTs.
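The way each X-axis value decomposes into user keywords and VTs can be sketched as follows. This is a minimal illustration under the constraints stated in the text (at least one user keyword, at most two VTs); the names are hypothetical.

```python
MAX_VTS = 2            # the experiments add at most two VTs
MIN_USER_KEYWORDS = 1  # the user is assumed to input at least one keyword


def splits(total):
    """All (user_keywords, vts) pairs that add up to `total` keywords."""
    return [
        (total - vts, vts)
        for vts in range(MAX_VTS + 1)
        if total - vts >= MIN_USER_KEYWORDS
    ]


print(splits(3))  # [(3, 0), (2, 1), (1, 2)]
print(splits(1))  # [(1, 0)]
```

Note that a split with two VTs covers two of the four experiments: the two VTs can be added one after the other (1+1) or both at once.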

Figure 7.2 shows the average number of generated queries and the average number of patterns that are presented to the user (note that the Y-axis is in log scale).

Figure 7.2: Performance evaluation: average number of queries and shown patterns.

¹⁶ We do not consider adding more than 2 VTs because we do not aim at discovering the user’s intended query when too many keywords are missing from the input.

As expected, the number of generated queries increases rapidly with the number of input terms: the more operands there are, the more queries can be built. Moreover, performing the semantic enrichment also leads to a significant increase in the number of queries because many new interpretations appear. However, the use of query patterns reduces the options presented to the user by up to 92% on average. Figure 7.2 also shows that, despite generating a higher number of queries, the system compresses the queries more when it adds two VTs at once than in the other situations. This may be beneficial to the user, but it might require more time navigating through the candidate keywords for the VTs. The number of queries is lower for two VTs (1+1) because, in the refinement step, the user has fixed one VT and there are fewer options. Last but not least, no possible interpretation (according to the query language) is discarded.
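As a reading aid, the reduction figure above can be understood as the fraction of generated queries that the user no longer has to inspect individually once they are grouped into patterns. The sketch below shows this calculation. The function name is ours, and the counts in the example are made up for illustration; they are not measured values from Figure 7.2.

```python
def reduction_rate(generated_queries, shown_patterns):
    """Fraction of options saved by showing patterns instead of queries."""
    if generated_queries <= 0:
        raise ValueError("generated_queries must be positive")
    return 1 - shown_patterns / generated_queries


# Hypothetical counts: 250 generated queries grouped into 20 patterns.
print(f"{reduction_rate(250, 20):.0%}")  # 92%
```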

Finally, the average times taken by the generation process are shown in Figure 7.3. They include the generation and the semantic filtering time. These times are low enough to make the system suitable as a responsive front-end (note that the times of the sense discovery module would have to be added, although this module can also be used in standalone mode). As can be seen in Figure 7.3 (note that the Y-axis is in log scale), the average times for 3 and 4 keywords are similar and really low (recall that the average number of input keywords is between 2 and 3).

Figure 7.3: Performance evaluation: processing time of the query generation step.

7.3 Summary of the Chapter

In this chapter, we have analyzed QueryGen in a qualitative and a quantitative way. Firstly, adopting a standard that is being used to query DBpedia from keywords, we have evaluated the semantic capabilities of our approach in discovering the user’s intended meaning. Secondly, we have evaluated our system working with the DBpedia Adapter, which uses our simplified version of BACK as query language. The results of both evaluations show the potential of our techniques. Moreover, the analysis of the queries that presented problems has raised several issues that, far from being a dead end for the approach, will guide our future work regarding keyword interpretation. In particular, exploiting the knowledge in Linked Data repositories and detecting the operators to be used are especially promising lines of improvement.

Then, we have moved on to performance-related aspects of the system. At first, the disambiguation procedure might seem too expensive to be used in a system with user interaction. Thanks to its inner structure and the reimplementation of several parts of the original code, we have managed to lower the times without compromising the quality of the keyword disambiguation. Regarding query generation, our system performs well. In particular, the filtering of inconsistent queries is fast enough because, once the original ontology has been classified, the reasoners can assess the satisfiability of the expressions without reclassifying the ontology.

Finally, the impact of the reduction techniques has also been evaluated. The results show that the achieved reduction rates dramatically reduce the possibilities presented to the user, which is a remarkable achievement as it is done without losing any possible interpretation.

Conclusions

In this chapter, we present the conclusions of our work. Given the breadth of topics addressed during the thesis period, we first present the main contributions of QueryGen as a whole system, and then the secondary contributions to different fields such as location-dependent queries. After this, we present the publications related to the work presented in this thesis, analyzing their quality according to different quality index rankings. Finally, we are aware that there is plenty of work ahead, so we present some future work.
