7.3.2.2 Intent-aware Model Merging

After demonstrating the effectiveness of selecting a single ranking model for each sub-query with our model selection regime, in this experiment we address research question Q2 from Section 7.3 by investigating whether deploying our model merging regime could bring further improvements. As discussed in Section 7.3.1.3, both regimes are based on the predictions given by an SVM classifier. In particular, the model merging regime is enabled by fitting the SVM predictions to a logistic regression model. For this particular investigation, we focus our attention on the WT sub-queries, as they allow for assessing the effectiveness of our merging regime across the two proposed training labelling alternatives, judg and perf. The results based on BS sub-queries using perf labels lead to identical conclusions and are hence omitted for brevity. In particular, Table 7.5 shows the diversification performance of xQuAD under the model merging regime (Mrg(svm,perf)), in contrast to its performance under the model selection regime (Sel(svm,perf)). Once again, a first significance symbol for both regimes denotes a significant difference (or lack thereof) compared to DPH. A second symbol for Mrg(svm,perf) denotes significance compared to Sel(svm,perf), which serves as a further baseline for this investigation.

Table 7.5: Diversification performance of xQuAD using informational (inf) or navigational (nav) models selectively (Sel) or through merging (Mrg).

                               ERR-IA                  α-nDCG
         Sq  p(d|q,s)          @20      −   =   +      @20      −   =   +
DPH                            0.178                   0.282
+xQuAD   WT  Sel(svm,judg)     0.244▲   32  6   59     0.357▲   30  6   61
+xQuAD   WT  Mrg(svm,judg)     0.255▲◦  29  6   62     0.368▲◦  28  6   63
+xQuAD   WT  Sel(svm,perf)     0.265▲   26  6   65     0.380▲   27  6   64
+xQuAD   WT  Mrg(svm,perf)     0.268▲   26  6   65     0.381▲   27  6   64
+xQuAD   BS  Sel(svm,perf)     0.241▲   25  6   66     0.355▲   29  6   62
+xQuAD   BS  Mrg(svm,perf)     0.237▲   24  6   67     0.352▲   27  6   64

From Table 7.5, we observe that the model merging regime can improve upon the model selection regime in most cases. In particular, when using judg labels, we observe improvements of 4.3% (0.255 vs. 0.244) in terms of ERR-IA@20, and 3.1% (0.368 vs. 0.357) in terms of α-nDCG@20. With perf labels, smaller and inconsistent differences are observed, with the merging regime performing slightly better for the WT sub-queries and the selection regime performing better for the BS sub-queries. Nevertheless, the observed differences between the two regimes are not statistically significant. These results answer research question Q2, by showing that merging multiple intent-aware ranking models can be at least as effective as selecting the single most effective model. Moreover, we believe that the merging regime can offer additional benefits for intent-aware diversification. For one, it can help attenuate the harm of selecting the wrong model for a particular sub-query. Additionally, it provides a natural upper bound for the selection regime. Indeed, model selection is a special instance of model merging, with a mutually exclusive probability distribution of intents p(ι|s).
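The relationship between the two regimes can be illustrated with a minimal Python sketch. It assumes that merging is a linear mixture, p(d|q,s) = Σ_ι p(ι|s) p(d|q,s,ι), which is one reading of "mixing" the scores proportionally to the intent likelihoods. The function names, the sigmoid parameters a and b, and all numeric values are hypothetical and are not taken from the experiments.

```python
import math

def intent_probability(svm_score, a=-1.0, b=0.0):
    """Map a raw SVM decision value to p(nav|s) with a logistic (sigmoid) fit.
    a and b stand in for fitted parameters; the defaults are hypothetical."""
    return 1.0 / (1.0 + math.exp(a * svm_score + b))

def merge_scores(p_intent, scores_by_intent):
    """Model merging (assumed linear mixture): sum_i p(i|s) * p(d|q,s,i)."""
    return sum(p_intent[i] * scores_by_intent[i] for i in p_intent)

def select_score(p_intent, scores_by_intent):
    """Model selection: keep only the score of the most likely intent."""
    best = max(p_intent, key=p_intent.get)
    return scores_by_intent[best]

def one_hot(p_intent):
    """Mutually exclusive distribution: all mass on the most likely intent."""
    best = max(p_intent, key=p_intent.get)
    return {i: (1.0 if i == best else 0.0) for i in p_intent}

# Hypothetical sub-query: the classifier leans towards a navigational intent.
p_nav = intent_probability(0.8)
p_intent = {"nav": p_nav, "inf": 1.0 - p_nav}
# Hypothetical scores of one document under the two intent-aware models.
scores = {"nav": 0.9, "inf": 0.3}

merged = merge_scores(p_intent, scores)
selected = select_score(p_intent, scores)
degenerate = merge_scores(one_hot(p_intent), scores)

print(merged, selected, degenerate)
```

With a one-hot distribution the merged score coincides with the selected score, whereas a softer distribution pulls the estimate towards the other model, which is how merging can attenuate the cost of a misclassified sub-query.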

7.4 Summary

In this chapter, we have addressed the third claim of our thesis statement, by showing that an improved estimate of the relevance of a document with respect to each sub-query leads to an improved coverage of this sub-query. In turn, an improved coverage of multiple sub-queries leads to an improved diversification performance, as demonstrated using our xQuAD framework, introduced in Chapter 4. As a means to improve coverage estimates, we built upon previous research on query intent detection for web search. In particular, we proposed to leverage ranking models that estimate the relevance of a document with respect to each sub-query by taking into account the intent of this sub-query.

In Section 7.1, we provided background on the categorisation of intents in web search, and described ranking approaches from the literature that successfully exploited intent information in order to improve search effectiveness. In Section 7.2, we proposed two classification regimes for leveraging intent-aware ranking models according to the predicted intent of each sub-query: model selection, which applies a single model given the most likely intent of each sub-query, and model merging, which combines relevance estimates produced by multiple models proportionally to the likelihood of each intent for a particular sub-query.

The model selection and model merging regimes were thoroughly evaluated in Section 7.3. In particular, in Section 7.3.2.1, our experiments showed that the model selection regime, choosing between an informational and a navigational ranking model on a per-sub-query basis, significantly outperforms each of these models when applied uniformly for all sub-queries, regardless of their predicted intent. In addition, in Section 7.3.2.2, we showed that the model merging regime, which mixes the scores produced by the informational and the navigational models, performs at least as effectively as the model selection regime.

Arguably, refined relevance estimates with respect to a sub-query could provide not only an improved estimate of the coverage of a document that satisfies this sub-query, but also an improved estimate of the novelty of any further document satisfying this sub-query, given the previously ranked documents. Hence, it is not clear whether the gains in diversification performance observed in this chapter are merely due to an improved estimation of coverage, or whether novelty also plays a role. Investigating this question is the purpose of the next chapter.

Document Novelty

The previous chapter showed that an improved diversification can be achieved by improving the estimation of the relevance of each retrieved document with respect to each identified sub-query. For hybrid diversification approaches, such as xQuAD, this estimation can be leveraged to compute both the coverage and the novelty of a document. Nevertheless, it is not clear how coverage and novelty interplay, or what the role of novelty is when diversifying the search results.

In this chapter, we challenge the common view of novelty as an intuitive diversification strategy, and thoroughly assess the impact of this strategy in contrast to and in combination with coverage. To this end, Section 8.1 briefly recaps our definitions of aspect representation and diversification strategy, as introduced in Section 3.3. Section 8.2 proposes a unifying methodology to enable the direct comparison of existing diversification approaches across these two dimensions. Following the proposed methodology, in Section 8.3, we thoroughly investigate the role of novelty as a diversification strategy, through both an empirical evaluation and simulations. Our results show that existing approaches based solely on novelty cannot consistently improve upon a non-diversified baseline ranking. Moreover, when deployed as an additional component by hybrid approaches, we show that novelty does not bring significant improvements, while adding considerable efficiency overheads. Finally, through a comprehensive analysis with simulated rankings of various quality, we demonstrate that, although inherently limited by the performance of the initial ranking, novelty plays a role in breaking ties between documents with similar coverage scores.

8.1 Diversification Dimensions

The most prominent diversification approaches in the literature can be organised according to two orthogonal dimensions, as proposed in Section 3.3: aspect representation and diversification strategy. The aspect representation determines whether the possible information needs underlying a query are represented explicitly, based upon properties of the query itself (e.g., query reformulations or categories), or implicitly, based upon properties of the retrieved documents (e.g., the terms comprised by each document). In turn, the diversification strategy determines how a particular aspect representation is leveraged to diversify the retrieved documents. In particular, novelty-based approaches achieve this goal by comparing the retrieved documents to one another, in order to promote those that carry new information. In contrast, coverage-based approaches directly estimate how well each document covers the identified query aspects. Finally, hybrid approaches combine the goals of coverage and novelty into a unified strategy.

Unfortunately, the prevalence of different aspect representations has precluded a direct comparison between coverage and novelty. As a result, it remains unclear whether the striking difference in performance commonly observed between coverage-based and novelty-based approaches is due to their underlying aspect representation (explicit vs. implicit) or to their diversification strategy (coverage vs. novelty). It is also unclear how much novelty actually contributes to the effectiveness of hybrid approaches, while penalising their efficiency. Although intuitive, novelty has yet to be shown effective for diversifying web search results. In particular, existing evidence of the effectiveness of novelty as a diversification strategy is based either on qualitative studies (Carbonell & Goldstein, 1998) or on curated corpora, such as Wikipedia (Rafiei et al., 2010) or newswire (Wang & Zhu, 2009).

To allow a thorough investigation of the role of novelty for search result diversification, in the next section, we adapt two existing novelty-based approaches to leverage explicit query aspect representations. Likewise, we produce coverage-only versions of two approaches that deploy a hybrid of coverage and novelty, including our xQuAD framework. By doing so, we bridge the gap between the diversification approaches in the literature and enable their evaluation in terms of the aspect representation and the diversification strategy dimensions.
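To make the distinction between a hybrid and a coverage-only strategy concrete, the following Python sketch implements a greedy re-ranker in the spirit of xQuAD. It assumes the commonly cited form of the xQuAD objective, (1 − λ) p(d|q) + λ Σ_s p(s|q) p(d|q,s) Π_{d_j ∈ S} (1 − p(d_j|q,s)), and it assumes that a coverage-only variant simply drops the product over the already selected documents S. The exact definitions are given elsewhere in the thesis, so both assumptions, the function name xquad_rerank, and the toy scores are illustrative only.

```python
def xquad_rerank(rel, cov, importance, lam=0.5, k=10, use_novelty=True):
    """Greedy re-ranking over explicit sub-queries (illustrative sketch).

    rel:        {doc: p(d|q)}           relevance to the initial query
    cov:        {doc: {s: p(d|q,s)}}    coverage of each sub-query s
    importance: {s: p(s|q)}             sub-query importance
    use_novelty=False drops the novelty product (assumed coverage-only variant).
    """
    selected = []
    candidates = set(rel)
    # For each sub-query, the probability that no selected document satisfies it.
    not_covered = {s: 1.0 for s in importance}

    def score(d):
        diversity = 0.0
        for s, p_s in importance.items():
            novelty = not_covered[s] if use_novelty else 1.0
            diversity += p_s * cov[d].get(s, 0.0) * novelty
        return (1.0 - lam) * rel[d] + lam * diversity

    while candidates and len(selected) < k:
        # Sorting makes tie-breaking deterministic.
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
        for s in importance:
            not_covered[s] *= 1.0 - cov[best].get(s, 0.0)
    return selected

# Toy data (hypothetical): d1 and d2 both cover s1, d3 covers s2.
rel = {"d1": 0.9, "d2": 0.8, "d3": 0.5}
cov = {"d1": {"s1": 0.9}, "d2": {"s1": 0.8}, "d3": {"s2": 0.7}}
importance = {"s1": 0.5, "s2": 0.5}

hybrid = xquad_rerank(rel, cov, importance, use_novelty=True)
coverage_only = xquad_rerank(rel, cov, importance, use_novelty=False)
print(hybrid)         # ['d1', 'd3', 'd2']
print(coverage_only)  # ['d1', 'd2', 'd3']
```

In this toy example the two variants agree on the first document and differ only in the order of the remaining two, which have close coverage-only scores. Passing an implicit aspect representation instead of explicit sub-queries would change cov and importance but not the strategy itself, which is why the two dimensions can be varied independently.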