
PART I Modeling Ambiguous Search Intent

CHAPTER 5 QUERY REFORMULATION WITH SYNTACTIC OPERATORS

5.4 Experiments

In this section, we present the experimental results for evaluating the proposed methods. We first describe the experimental setup and then report the evaluation results. Finally, we present case studies that illustrate how our system works.

5.4.1 Experiment Setup

We use the TREC 2004 Robust track [71] as our dataset. The collection consists of TREC Disks 4 and 5, excluding the Congressional Record, and contains around 500,000 documents. The query set consists of 250 queries. We use the title field and the description field of each topic to represent two different types of queries; their average lengths are 2.7 and 15.6 terms, respectively. We refer to the title field as a (relatively) short query and to the description field as a (relatively) long query. Since not all queries are difficult, we adopt the minimum deletion method proposed by Wang et al. [73] to simulate search difficulty.

We use BM25 as our baseline. In addition, we implement MultiNeg, a state-of-the-art method for using negative feedback [73]. Our method does not conflict with existing methods for using negative feedback; rather, it uses negative feedback in a different manner and could therefore complement them. To test this idea, we also combine syntactic reformulation with the MultiNeg method.
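For reference, the BM25 baseline can be sketched as follows. This is a minimal illustration of the standard Okapi BM25 formula, not the exact implementation used in our experiments; the parameter values k1 = 1.2 and b = 0.75, the non-negative IDF variant, and the function name `bm25_score` are assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_score(query_terms, doc_terms, doc_freq, num_docs, avg_doc_len, k1=1.2, b=0.75):
    """Okapi BM25 score of one tokenized document for a tokenized query.

    doc_freq maps each term to the number of documents that contain it.
    k1=1.2 and b=0.75 are assumed common defaults, not tuned values.
    """
    tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for term in query_terms:  # repeated query terms are counted again
        if tf[term] == 0:
            continue  # an unmatched query term contributes nothing
        n = doc_freq.get(term, 0)
        # IDF with +1 inside the log so it never goes negative
        idf = math.log((num_docs - n + 0.5) / (n + 0.5) + 1.0)
        # term-frequency saturation (k1) and document-length normalization (b)
        norm = tf[term] + k1 * (1.0 - b + b * doc_len / avg_doc_len)
        score += idf * tf[term] * (k1 + 1.0) / norm
    return score


# Toy usage: three tiny hypothetical "documents" and a short query.
docs = [
    ["chunnel", "british", "economy"],
    ["british", "royal", "family"],
    ["economy", "growth"],
]
df = Counter(t for d in docs for t in set(d))
avg_len = sum(len(d) for d in docs) / len(docs)
query = ["chunnel", "british", "economy"]
scores = [bm25_score(query, d, df, len(docs), avg_len) for d in docs]
```

The document matching all three query terms receives the highest score, while a query term absent from a document adds nothing to that document's score.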

Table 5.3: Automatic Syntactic Reformulation to Directly Refine Search Results on Long Queries (Description) with BM25

Method                  NDCG@10   P@1     MAP
BM25                    0.065     0.157   0.089
Necessity†              0.077     0.205   0.104
Phrase                  0.069     0.161   0.093
Operator-Combination    0.065     0.161   0.088
Result-Combination†     0.079     0.209   0.114
MultiNeg†               0.074     0.216   0.093
RC+MultiNeg†‡           0.081     0.221   0.115

MultiNeg. Wang et al. [73] reported that the MultiNeg strategy outperforms all other existing methods for using negative feedback, including the Rocchio-like SingleQuery method and the SingleNeg method. We therefore implement MultiNeg on top of BM25 as a baseline system. MultiNeg re-ranks each document by combining its original score with a penalty score:

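$$S_{\text{combined}}(Q, D) = S(Q, D) - \beta \, S(Q_{\text{neg}}, D) \tag{5.6}$$

where the penalty score is computed by looking at each negative document separately:

$$S(Q_{\text{neg}}, D) = \max_{Q' \in \mathcal{N}} S(Q', D) \tag{5.7}$$

where $\mathcal{N}$ is the set of negative documents, $\beta$ weights the penalty, and $S(Q', D)$ is the similarity between the negative document $Q'$ and document $D$. In our case, this similarity is the BM25 ranking score.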
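The following is a minimal sketch of the MultiNeg re-ranking in Equations 5.6 and 5.7. To keep it self-contained, a simple term-overlap similarity stands in for the BM25 score S(·,·) used in our experiments; the function names, the stand-in similarity, and the default β = 0.5 are illustrative assumptions.

```python
from typing import Callable, Dict, List, Sequence

Doc = Sequence[str]


def overlap_sim(a: Doc, b: Doc) -> float:
    """Stand-in similarity (NOT BM25): number of distinct shared terms."""
    return float(len(set(a) & set(b)))


def multineg_rerank(
    original_scores: Dict[str, float],
    docs: Dict[str, Doc],
    negative_docs: List[Doc],
    beta: float = 0.5,
    sim: Callable[[Doc, Doc], float] = overlap_sim,
) -> List[str]:
    """Re-rank documents by S(Q, D) - beta * max over negatives Q' of S(Q', D)."""
    combined = {}
    for doc_id, score in original_scores.items():
        # Eq. 5.7: the penalty is the highest similarity to any single negative document
        penalty = max((sim(neg, docs[doc_id]) for neg in negative_docs), default=0.0)
        # Eq. 5.6: subtract the weighted penalty from the original score
        combined[doc_id] = score - beta * penalty
    return sorted(combined, key=combined.get, reverse=True)


# Toy usage with hypothetical documents: d1 resembles the judged negative document.
negatives = [["channel", "swimming", "race"]]
docs = {
    "d1": ["channel", "swimming", "record"],
    "d2": ["chunnel", "tunnel", "economy"],
}
ranking = multineg_rerank({"d1": 2.0, "d2": 1.5}, docs, negatives, beta=0.5)
# ranking -> ["d2", "d1"]
```

Taking the maximum over individual negative documents, rather than averaging them into a single negative model, means a candidate is penalized if it closely resembles any one non-relevant document.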

We evaluate automatic syntactic reformulation with the necessity and phrase operators by using the reformulated queries to directly refine the search results, that is, to re-rank the unseen documents. NDCG@10 is the primary metric. All reported results are based on 5-fold cross-validation.
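The 5-fold cross-validation protocol can be sketched as below. The sketch assumes that free parameters (for example β in Equation 5.6) are chosen by grid search on four folds and evaluated on the held-out fold; the grid search, the random fold assignment, and the `evaluate` callback are illustrative assumptions, since the exact tuning procedure is not specified here.

```python
import random


def cross_validate(query_ids, candidate_params, evaluate, k=5, seed=0):
    """k-fold cross-validation over queries.

    evaluate(param, queries) is a hypothetical callback returning the mean
    metric (e.g. NDCG@10) of a run with parameter `param` over `queries`.
    Returns the query-weighted mean metric on the held-out folds.
    """
    ids = list(query_ids)
    random.Random(seed).shuffle(ids)  # random fold assignment (assumption)
    folds = [ids[i::k] for i in range(k)]
    total = 0.0
    for i, test_fold in enumerate(folds):
        train = [q for j, fold in enumerate(folds) if j != i for q in fold]
        # grid search: keep the parameter that does best on the training folds
        best = max(candidate_params, key=lambda p: evaluate(p, train))
        total += evaluate(best, test_fold) * len(test_fold)
    return total / len(ids)


# Toy usage: a fake per-query metric that peaks at beta = 0.5 for every query.
def toy_evaluate(beta, queries):
    return sum(1.0 - abs(beta - 0.5) for _ in queries) / len(queries)


score = cross_validate(range(250), [0.1, 0.5, 0.9], toy_evaluate)
# score -> 1.0
```

Because each fold's parameter is selected without looking at that fold's queries, the reported number estimates performance on unseen queries rather than the best achievable setting.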

5.4.2 Experiment Results

Table 5.3 and Table 5.4 show the performance on long queries (description) and short queries (title), respectively. The base retrieval model is BM25. Runs that show a statistically significant improvement over the BM25 baseline (p < 0.05) are marked by †. We also compare our methods with another baseline for using negative feedback, MultiNeg; runs that show a statistically significant improvement over MultiNeg (p < 0.05) are marked by ‡.

Table 5.4: Automatic Syntactic Reformulation to Directly Refine Search Results on Short Queries (Title) with BM25

Method                  NDCG@10   P@1     MAP
BM25                    0.078     0.217   0.111
Necessity               0.081     0.229   0.115
Phrase                  0.083     0.221   0.119
Operator-Combination    0.076     0.201   0.108
Result-Combination      0.082     0.225   0.119
MultiNeg                0.078     0.216   0.111
RC+MultiNeg†‡           0.084     0.225   0.119
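For concreteness, the three measures reported in Tables 5.3 and 5.4 can be computed per query as sketched below; the table values would then be averages over queries. The sketch assumes the common 2^g − 1 gain and log2 rank discount for NDCG, so the exact variant used by a given evaluation tool may differ; the function names are illustrative.

```python
import math


def precision_at_1(ranking, relevant):
    """1.0 if the top-ranked document is relevant, else 0.0."""
    return 1.0 if ranking and ranking[0] in relevant else 0.0


def average_precision(ranking, relevant):
    """AP: sum of precision at each relevant rank, divided by all relevant docs."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def ndcg_at_k(ranking, gains, k=10):
    """NDCG@k with graded judgments (doc -> grade), 2^g - 1 gain, log2 discount."""
    def dcg(grades):
        return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(grades))

    actual = dcg([gains.get(d, 0) for d in ranking[:k]])
    ideal = dcg(sorted(gains.values(), reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0


# Toy usage: two relevant documents retrieved at ranks 2 and 4.
ranking = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}
gains = {"d1": 1, "d2": 1}
p1 = precision_at_1(ranking, relevant)   # -> 0.0
ap = average_precision(ranking, relevant)  # -> 0.5
ndcg = ndcg_at_k(ranking, gains, k=10)   # -> about 0.651
```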

For long queries, reformulation with the necessity operator performs better than reformulation with the phrase operator, whereas for short queries the phrase operator tends to work better. Long queries are more verbose and noisy, so the central topic is more likely to be missed in the returned documents. The necessity operator is more useful here because it identifies an important but underrepresented topical term and ensures that it is matched. A short query has fewer keywords and therefore less noise; however, the connection between keywords is often lost as the price of being succinct. The phrase operator alleviates this problem by imposing a phrase constraint on strongly connected terms. We also see that the Result-Combination strategy always outperforms Operator-Combination and usually improves further over reformulation with a single operator. More importantly, it provides a more robust solution than reformulation with a single type of operator.
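The two operators can be given a simple operational reading, sketched below under the assumption that they act as hard constraints on an otherwise unchanged ranking: a term marked with + must occur in the document, and a quoted phrase must occur as a contiguous term sequence. Documents that satisfy all constraints are moved ahead of those that do not, preserving the baseline order within each group. This is an illustrative approximation, not necessarily how the operators are scored in our system; the query syntax mirrors Table 5.5, and all function names are hypothetical.

```python
import re


def parse_syntactic_query(query):
    """Split a query into plain terms, necessary terms (+t), and quoted phrases."""
    phrases = [p.split() for p in re.findall(r'[“"]([^”"]+)[”"]', query)]
    rest = re.sub(r'[“"][^”"]+[”"]', " ", query)  # drop phrases from the remainder
    necessary = [t[1:] for t in rest.split() if t.startswith("+") and len(t) > 1]
    plain = [t for t in rest.split() if not t.startswith("+")]
    return plain, necessary, phrases


def contains_phrase(doc_terms, phrase):
    """True if `phrase` (a list of terms) occurs contiguously in `doc_terms`."""
    n = len(phrase)
    return any(doc_terms[i:i + n] == phrase for i in range(len(doc_terms) - n + 1))


def satisfies(doc_terms, necessary, phrases):
    """True if every necessary term occurs and every phrase occurs contiguously."""
    present = set(doc_terms)
    return all(t in present for t in necessary) and all(
        contains_phrase(doc_terms, p) for p in phrases
    )


def rerank(ranking, docs, query):
    """Stable re-rank: constraint-satisfying documents first, baseline order kept."""
    _, necessary, phrases = parse_syntactic_query(query)
    # sorted() is stable, and False (satisfies) sorts before True
    return sorted(ranking, key=lambda d: not satisfies(docs[d], necessary, phrases))


# Toy usage with hypothetical documents: only d2 contains the exact phrase.
docs = {
    "d1": ["magnetic", "fields", "commercial", "levitation"],
    "d2": ["commercial", "uses", "of", "magnetic", "levitation", "trains"],
}
new_order = rerank(["d1", "d2"], docs, "commercial uses “magnetic levitation”")
# new_order -> ["d2", "d1"]
```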

Result-Combination also outperforms MultiNeg on NDCG@10 and MAP for both query types. The MultiNeg method treats the entire content of each negative document as non-relevant, whereas our method tries to find the semantics that are commonly missing from the negative examples. This is a more precise way of using negative feedback, which explains its greater effectiveness. Combining syntactic reformulation with MultiNeg (RC+MultiNeg) further improves NDCG@10. Because our method does not directly use negative information to adjust document scores, it is complementary to existing methods that do.
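To make the idea of commonly missing semantics concrete, the toy heuristic below ranks query terms by the share of negative (non-relevant) documents that fail to contain them; the term missing from the most negatives is a natural candidate for the necessity operator. This only illustrates the intuition under simplifying assumptions (exact term matching, no term weighting); it is not the selection algorithm evaluated in this chapter, and the function name and example documents are hypothetical.

```python
def most_missing_term(query_terms, negative_docs):
    """Return the query term absent from the largest share of negative documents.

    Toy heuristic with exact matching; ties go to the earliest query term.
    """
    if not negative_docs:
        raise ValueError("need at least one negative document")
    neg_sets = [set(d) for d in negative_docs]

    def missing_rate(term):
        return sum(term not in s for s in neg_sets) / len(neg_sets)

    # dict.fromkeys drops duplicate terms but keeps query order;
    # max() returns the first of several equally good terms
    return max(dict.fromkeys(query_terms), key=missing_rate)


# Toy usage: hypothetical negatives that match everything except "plagiarism".
query = "find instances plagiarism literary journalistic worlds".split()
negatives = [
    ["literary", "worlds", "find", "instances"],
    ["journalistic", "worlds", "literary"],
    ["find", "literary", "journalistic", "instances"],
]
candidate = most_missing_term(query, negatives)
# candidate -> "plagiarism"
```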

Improving the search results is generally more difficult for short queries: in Table 5.4, only RC+MultiNeg achieves a statistically significant improvement. Because short queries are already succinct, there is less room for applying syntactic operators.

Table 5.5: Examples of Suggested Queries

ID  Query                                                                                   NDCG
1   Original:  find instances plagiarism literary journalistic worlds                       0.023
    Suggested: find instances +plagiarism literary journalistic worlds                      0.137
2   Original:  fear open public places agoraphobia widespread disorder relatively unknown   0.000
    Suggested: fear open public places +agoraphobia widespread disorder relatively unknown  0.142
3   Original:  impact chunnel british economy life style british                            0.046
    Suggested: impact +chunnel british economy life style british                           0.201
4   Original:  commercial uses magnetic levitation                                          0.081
    Suggested: commercial uses “magnetic levitation”                                        0.333
5   Original:  maternity leave policies various governments                                 0.208
    Suggested: “maternity leave” policies various governments                               0.330

5.4.3 Case Studies

To better understand how syntactic reformulation improves retrieval performance, we show some concrete examples of automatically reformulated syntactic queries in Table 5.5.

These examples show that syntactic operators help convey query intent and resolve ambiguity. For instance, the original form of query 5 does not make clear that “maternity leave” is a phrase with a specific meaning, which causes ambiguity because the two terms are matched separately. Automatic syntactic reformulation eliminates this ambiguity by requiring the two terms to match as a phrase, raising NDCG from 0.208 to 0.330.

Syntactic reformulation also discovers underrepresented concepts in keyword queries. For instance, the term “chunnel” in query 3 is effectively overlooked by the original query because other popular topics in the collection match the remaining keywords well. Our algorithm detects this problem and solves it by applying the necessity operator to the term, which raises NDCG from 0.046 to 0.201.

Our algorithm is also able to discover the representative terms in queries. In query 2, “agoraphobia” essentially represents the entire query intent, while the explanatory terms are noisy and distracting. By emphasizing this term, the reformulated query stays focused on the central topic and dismisses unnecessary distractions (NDCG rises from 0.000 to 0.142).
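Once the target term or term pair has been chosen, the reformulations in Table 5.5 amount to simple string transformations. The sketch below reproduces them; it deliberately omits the part that matters most, namely how our algorithm selects which term to mark as necessary or which pair to bind as a phrase, and the helper names are hypothetical.

```python
def apply_necessity(query, term):
    """Mark every occurrence of `term` with the necessity operator '+'."""
    return " ".join("+" + t if t == term else t for t in query.split())


def apply_phrase(query, first, second):
    """Bind the first adjacent occurrence of `first second` as a quoted phrase."""
    terms = query.split()
    for i in range(len(terms) - 1):
        if terms[i] == first and terms[i + 1] == second:
            phrase = "“" + first + " " + second + "”"
            return " ".join(terms[:i] + [phrase] + terms[i + 2:])
    return query  # the pair is not adjacent in the query: leave it unchanged


q3 = apply_necessity("impact chunnel british economy life style british", "chunnel")
q5 = apply_phrase("maternity leave policies various governments", "maternity", "leave")
# q3 -> "impact +chunnel british economy life style british"
# q5 -> "“maternity leave” policies various governments"
```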

In document Intent modeling and automatic query reformulation for search engine systems (Page 71-75)