4.7 Pure Type XML Retrieval
4.7.5 Completeness
The completeness proof demonstrates that every possible aboutness situation is covered by our reasoning system. We have to show that for any aboutness relationship between two XML documents, their corresponding representations as situations are also in an aboutness relationship. For all valid XML documents A and B ∈ X: if A about B, then map(A) about map(B).
Proof. Let S ≡ map(A) and T ≡ map(B). Also, let C be the subdocument of A that remains if we remove B, and let U ≡ map(C). T about T holds, as Reflexivity is given. With Left Monotonic Union, T ⊗ U about T. With Set Equivalence and the definition of map, S ≡ T ⊗ U and therefore S about T.
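The three proof steps can be replayed on a toy model. The following is only a minimal sketch under stated assumptions: situations are modelled as frozensets of infon labels, ⊗ is taken to be set union, and "S about T" is read as containment of T in S. The names compose, about and completeness_steps and the example labels are hypothetical and are not part of the methodology itself.

```python
from typing import FrozenSet, Tuple

# A situation is modelled as a frozenset of infon labels (assumption of this sketch).
Situation = FrozenSet[str]


def compose(s: Situation, u: Situation) -> Situation:
    """Stand-in for the composition S ⊗ U, assumed here to be set union."""
    return s | u


def about(s: Situation, t: Situation) -> bool:
    """Assumed reading of 'S about T': everything in T is also in S."""
    return t <= s


def completeness_steps(t: Situation, u: Situation) -> Tuple[bool, bool, bool]:
    """Replay the three proof steps for T = map(B) and U = map(C)."""
    reflexivity = about(t, t)                       # T about T
    left_monotonic_union = about(compose(t, u), t)  # T ⊗ U about T
    s = compose(t, u)                               # S ≡ T ⊗ U (Set Equivalence, definition of map)
    conclusion = about(s, t)                        # S about T
    return reflexivity, left_monotonic_union, conclusion


# Hypothetical example: B is a section inside article A, C is the rest of A.
t = frozenset({"sec/title:retrieval", "sec/p:xml"})
u = frozenset({"article/abstract:evaluation"})
print(completeness_steps(t, u))
```

Under these assumptions each step holds for any choice of t and u, which mirrors why the proof needs only Reflexivity, Left Monotonic Union and Set Equivalence.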
Having shown the completeness of our aboutness reasoning system, the reflection of pure type XML retrieval completes our discussion.
4.7.6 Reflection
For XML retrieval, we further specify the top and bottom document components and queries from Section 4.5 by differentiating those that are either exhaustive (D about Q) or specific (Q about D). We then have eight cases to cover, which we present in a simplified notation. For a complete overview of all eight cases, see Table 5.9 in Section 5.2.6. Here, we present only the four cases that have an entry in the table for pure type XML retrieval. Let D be a set of document components and Q be a set of queries:
1. A top exhaustive document component Dj is always exhaustively about any query Q: {Dj | ∀Q, Dj ∈ D, Dj about Q}. The (virtual) root of the document collection is Dj, as exhaustivity is promoted up the document forest.
2. A top exhaustive query Qj is the one to which all document components D are exhaustive answers: {Qj | ∀D, Qj ∈ Q, D about Qj}. The top exhaustive query is {∅}. Let us assume the document component D is itself {∅}. Then the only query Q to which any D is always an exhaustive answer is {∅}. Let us instead assume D ≢ {∅}. Then, with D ≡ map(A), the only subset that map(A) is always guaranteed to have is {∅}. Therefore, {∅} is the top exhaustive query.
3. A top specific document component Dj is always specifically about any query Q: {Dj | ∀Q, Dj ∈ D, Q about Dj}. The top specific document component is {∅}. The proof is analogous to the one for top exhaustive queries.
4. A top specific query Qj is the one to which all document components D are specific answers: {Qj | ∀D, Qj ∈ Q, Qj about D}. The top specific query is again the (virtual) root of the document collection.
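The four cases can be checked on a small example. This is only an illustrative sketch under assumptions that are not taken from the text: situations are frozensets of labels, the empty situation stands in for {∅}, "D exhaustively about Q" is read as Q contained in D, "specific" is read as D contained in Q, and queries are restricted to content that occurs in the collection. All function and variable names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List

Situation = FrozenSet[str]


def exhaustive(d: Situation, q: Situation) -> bool:
    """Assumed reading of exhaustive aboutness: the query is contained in the component."""
    return q <= d


def specific(q: Situation, d: Situation) -> bool:
    """Assumed reading of specific aboutness: the component is contained in the query."""
    return d <= q


def all_queries(vocabulary: Situation) -> List[Situation]:
    """Every query that can be built from the collection's own content."""
    items = sorted(vocabulary)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


# Hypothetical document components; the virtual root collects all of their content.
components = [frozenset({"a"}), frozenset({"b"}), frozenset({"a", "c"})]
root = frozenset().union(*components)
empty = frozenset()

queries = all_queries(root)
docs = components + [root, empty]

case1 = all(exhaustive(root, q) for q in queries)  # root is the top exhaustive component
case2 = all(exhaustive(d, empty) for d in docs)    # empty query is the top exhaustive query
case3 = all(specific(q, empty) for q in queries)   # empty component is the top specific component
case4 = all(specific(root, d) for d in docs)       # root is the top specific query
print(case1, case2, case3, case4)
```

In this toy setting, a non-root component such as {"a"} fails the first check, which illustrates why only the root qualifies as the top exhaustive document component.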
All the other entries in Table 5.9 are missing, as they are complementary statements. For example, if a top exhaustive document component can be found, there cannot be a bottom exhaustive query that never finds any answer in the document component set. We use this complementarity in our reflections of XML retrieval models to reduce the number of reflections we have to carry out.
It might be surprising that the top specific query, like the top exhaustive document component, is the one that contains all the information in the document component set. This is the case because we are not looking for the most specific query, a question that is impossible to answer for all possible situations, but for the one that delivers only specific results. This can only be the complete document tree, as no document component ever contains more information than is present in the document tree.
4.8 Conclusion
This chapter has introduced our theoretical evaluation methodology. We started with existing theoretical evaluation methodologies and adjusted them to the requirements of XML retrieval. Our methodology is based on a well-defined number of steps, through which we iterate each time we analyse a model. The first step is the translation, which defines a model’s symbolic representation of information as the result of its indexing mechanism. It is formally represented by the function map.
The next step in our methodology derives aboutness rules to describe the functional behaviour of XML retrieval systems. We have defined basic rules, combination and containment rules, and also non-aboutness reasoning, though the latter is seldom used in IR reasoning.
Aboutness is defined as a relationship between situations. In a theoretical evaluation framework, rules are used to define the reasoning aspects of this relationship. Rules are the logical representation of how a system decides that a document is about a query. Rules do not hold for all aboutness decisions but only for particular ones. Thus, an aboutness decision can be specified by the reasoning rules it incorporates. The aboutness decision can be further qualified by analysing how it implements these reasoning rules: fully, conditionally or not at all. [Wong et al., 2001] call this functional benchmarking.
By comparing the kind of rules a particular system incorporates and the way it does so, we are able to give an overall comparison of the behaviour of XML retrieval systems. A further investigation of aboutness boundaries for particular retrieval systems is called reflection, the third step of each theoretical evaluation. Translation, reflection and aboutness rules were developed in [Huibers, 1996] as part of the theoretical evaluation of any retrieval model. In XML retrieval, we are also interested in how much a retrieval system uses structure to support the aboutness decision. Theoretically, we measure this by determining the difference in reasoning between the XML retrieval model, its ‘flat’ retrieval model equivalent (if there is one) and what we call pure type XML retrieval. The final step in our theoretical evaluation is the development of the pure type XML retrieval model to qualify the impact of structure on the retrieval performance.
Chapter 5
Theoretical Evaluation of XML Retrieval Models
5.1 Introduction
This chapter goes through five XML retrieval models submitted to INEX and evaluates them theoretically using the methodology developed in Chapter 4. We look only at models that performed well in INEX and are therefore comparable. Furthermore, all of the models participated not just in INEX 2005 but over a longer period of time, so that one can assume that the models are well developed and that potential problems we find are not the result of a premature submission.
For each of the models, we first describe its background, including its retrieval algorithm and indexing mechanism. Secondly, we calculate an example that reflects various standard retrieval situations. This allows us to better understand the overall behaviour of the model. Thirdly, we present the equivalent flat document retrieval model, before, in the fourth step, we iterate through all the theoretical evaluation steps described in Section 4.2: translation, aboutness rules, completeness and reflection. We repeat this procedure for each model, starting with the XML vector space retrieval model (Section 5.2). In Section 5.3, we analyse two language models, and finally, in Section 5.4, two structured models are introduced, which have been specifically designed for INEX.