

5.5 Temporal Dynamics of Multi-faceted Queries

5.5.2 Deriving Query Intent Popularity from Content Dynamics

In the previous section I proposed an approach and methodology to evaluate the effectiveness of deriving multi-faceted query intents from Wikipedia article content structure. In this section, to answer RQ2.4, I examine how the temporal dynamics of query intent popularity are reflected by content dynamics, that is, the frequency of section change activity in Wikipedia.

With that goal in mind, I compare the ground-truth temporal popularity of a query intent (obtained from a query log) to the temporal dynamics of changes made to the intent representation in Wikipedia, i.e., the frequency of changes made to the informational content in the particular article section.

Experimental Procedure

Comparing the temporal dynamics of ground-truth query intents and Wikipedia article sections requires comparable time series of query volume and section change activity.

Past user behaviour recorded in query logs captures the popularity of query intents over time. As I do not have access to a suitably large-scale and long-term query log, I rely on the temporal query volume data provided by Google Trends as the ground-truth temporal popularity dynamics for each query intent. An example illustrating the comparison of these temporal dynamics is shown in Figure 5.11 for two query intents of the 2011-2012 Thailand Floods. The close correspondence between the ground-truth popularity of each query intent and the change activity of its related Wikipedia section is clearly visible.

Temporal changes of Wikipedia article sections can be obtained by comparing contiguous Wikipedia article revisions. The stream of changed sections can be mined from the stream of article revision text. Standard diff and patch operations identify locations of text changes between adjacent revisions. Each change can in turn be resolved to a specific section by seeking its nearest parent section title header.
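The section-change mining step above can be sketched in Python with the standard-library `difflib` module. This is a minimal sketch under stated assumptions, not the exact pipeline used in this work: it assumes each revision is available as raw wikitext, that section headers follow the `== Title ==` convention (levels 2 to 6), and that text before the first header belongs to a hypothetical `(lead)` section. Every deleted, inserted or replaced line is resolved to its nearest preceding section header.

```python
import difflib
import re

# Assumption: wikitext section headers look like "== Title ==" (levels 2 to 6).
HEADER_RE = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$")


def section_of_each_line(lines):
    """Map each line index to the title of its nearest preceding section header."""
    current = "(lead)"  # hypothetical label for text before the first header
    owners = []
    for line in lines:
        match = HEADER_RE.match(line)
        if match:
            current = match.group(2)
        owners.append(current)
    return owners


def changed_sections(old_text, new_text):
    """Return the set of section titles touched by the diff between two revisions."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    old_owner = section_of_each_line(old_lines)
    new_owner = section_of_each_line(new_lines)
    touched = set()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # Deleted or replaced lines are resolved in the old revision...
        touched.update(old_owner[i1:i2])
        # ...and inserted or replacement lines in the new revision.
        touched.update(new_owner[j1:j2])
    return touched


old = "Intro text\n== Casualties ==\n10 dead\n== Economy ==\nLosses unknown"
new = "Intro text\n== Casualties ==\n12 dead\n== Economy ==\nLosses unknown"
print(changed_sections(old, new))  # {'Casualties'}
```

Counting the resulting section sets per day would give a section change activity series. Note that a renamed section header shows up as a change to both its old and new titles, which is one of the re-organisation challenges mentioned in this section.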


Figure 5.11: Temporal dynamics of popularity from 2011-09-18 for two query intents of the 2011-2012 Thailand Floods. Each time series was normalised (maintaining magnitude) and exponentially smoothed for clarity (α = 0.35).
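The smoothing described in the caption of Figure 5.11 can be illustrated with a short Python sketch. Simple exponential smoothing computes s_0 = x_0 and s_t = alpha * x_t + (1 - alpha) * s_(t-1), here with alpha = 0.35 as in the caption. The caption does not specify the normalisation, so dividing each series by its maximum (which keeps relative magnitudes within a series) is an assumption, as are the function names.

```python
def normalise_by_max(series):
    """Assumed normalisation: scale a series so its peak is 1, keeping relative magnitudes."""
    peak = max(series)
    if peak <= 0:
        return [0.0 for _ in series]
    return [x / peak for x in series]


def exponential_smoothing(series, alpha=0.35):
    """Simple exponential smoothing: s_0 = x_0, s_t = alpha * x_t + (1 - alpha) * s_(t-1)."""
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


daily_edits = [0, 10, 10, 2, 0]
print(exponential_smoothing(normalise_by_max(daily_edits)))
```

A smaller alpha gives heavier smoothing; alpha = 0.35 retains short-lived peaks while damping day-to-day jitter.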

Section structure (e.g., section presence and hierarchy) is constantly evolving during collaborative editing. In some cases, this poses challenges for identifying the relevant section across sequential article revisions. In many cases, however, a re-organisation retains the original content and merely restructures it.

Evaluation

To compare the ground-truth query intent popularity and related Wikipedia article section change activity, I aggregate observations to represent their temporal dynamics as time series of differing temporal resolutions. Using down-sampling, I experiment with three temporal resolutions: 1, 7 and 14 days. A coarser resolution (i.e., 14 days) acts to smooth erratic temporal variation and noise, which would be more apparent at finer resolutions, where short-term temporal factors may also affect the comparison (e.g., natural weekday-to-weekend variance). Pearson's correlation coefficient, r, is used to measure the temporal correlation between the temporal dynamics. Time series representing the temporal dynamics were constructed and down-sampled as outlined in Section 3.13 on page 53.
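The down-sampling and correlation steps can be sketched as follows. Section 3.13 is not reproduced here, so this sketch assumes that down-sampling means summing daily counts into consecutive, non-overlapping bins and dropping any trailing partial bin; the function names are hypothetical. Pearson's r is implemented directly to keep the example dependency-free.

```python
import math


def downsample(daily_counts, bin_days):
    """Sum daily counts into consecutive, non-overlapping bins (trailing partial bin dropped)."""
    n_bins = len(daily_counts) // bin_days
    return [sum(daily_counts[i * bin_days:(i + 1) * bin_days]) for i in range(n_bins)]


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least two points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return float("nan")  # undefined for a constant series
    return cov / (sd_x * sd_y)


def correlation_by_resolution(query_volume, section_edits, resolutions=(1, 7, 14)):
    """Correlate two aligned daily series at each temporal resolution (in days)."""
    return {
        days: pearson_r(downsample(query_volume, days), downsample(section_edits, days))
        for days in resolutions
    }
```

Coarser bins leave fewer points per series, so r at 14 days rests on far fewer observations than r at 1 day; this should be kept in mind when comparing values across resolutions.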

5.5.3 Results

In this section I report results from the experiments conducted for RQ2.3 and RQ2.4, using the approaches and methodology outlined in the previous sections.


Table 5.2: Recall_wiki of weak- and strong-matched intents, for topics 1-20 (All), 1-10, and 11-20.

Topics    Weak Recall_wiki    Strong Recall_wiki
All       0.89                0.68
1-10      0.87                0.64
11-20     0.92                0.72

Table 5.3: Average temporal correlation (Pearson r) between Wikipedia section change activity and ground-truth query popularity, by temporal resolution.

Topics    1 day    7 days    14 days
1-10      0.32     0.49      0.58
11-20     0.15     0.25      0.33

Deriving Query Intents from Content Structure

In Table 5.2 I report Recall_wiki to evaluate how effectively Wikipedia article sections represent query intents, measured against the ground truth: Recall_wiki = (# intents that Wikipedia covers) / (# total intents). Recall is first computed per topic; for comparison, I report its average over all topics, over the short-term topics (1-10) and over the long-term topics (11-20).
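The per-topic computation and averaging can be sketched in a few lines of Python. The function names and the set-based representation of intents are assumptions for illustration; the weak or strong matching decision is taken as already made, so `covered_intents` simply lists the ground-truth intents judged to have a matching Wikipedia section.

```python
def recall_wiki(covered_intents, all_intents):
    """Fraction of a topic's ground-truth intents matched by some Wikipedia section."""
    all_set = set(all_intents)
    if not all_set:
        raise ValueError("a topic needs at least one ground-truth intent")
    return len(set(covered_intents) & all_set) / len(all_set)


def mean_recall(per_topic_recall, topic_ids):
    """Average (macro) recall over a group of topics, e.g. topics 1-10 or 11-20."""
    values = [per_topic_recall[t] for t in topic_ids]
    return sum(values) / len(values)


# Hypothetical example: two of four intents are covered by Wikipedia sections.
print(recall_wiki({"floods", "relief"}, ["floods", "relief", "bbc", "videos"]))  # 0.5
```

Averaging per-topic recall gives every topic equal weight regardless of how many intents it has, which matches reporting recall "initially computed per-topic".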

In all cases, recall of query intents by Wikipedia sections is medium to high. Weak matching recalls roughly 20 percentage points more query intents than strong matching (0.20 to 0.23 higher Recall_wiki across the three topic groups). Weak matching yields a recall of 0.89 over all topics, and its highest recall, 0.92, for topics 11-20.

Deriving Query Intent Popularity from Content Dynamics

In Table 5.3, I report the average correlation r between ground-truth query intent popularity and Wikipedia section change activity, for each time series resolution, for topics 1-10 and 11-20. Short-term events (topics 1-10) show the greatest temporal correlation, particularly at coarser temporal resolutions (i.e., longer aggregation periods), reaching r = 0.58 at 14 days.

5.5.4 Discussion

In this section I discuss the results presented in the previous section with respect to the research sub-questions in this chapter related to event-driven multi-faceted queries (i.e., RQ2.3 and RQ2.4).

RQ2.3: Does content structure of related information reflect query intents for event-driven multi-faceted queries?

In Table 5.2, I observe that Wikipedia sections do substantially reflect users' query intents, as there is a relatively high Recall_wiki over all topics for both strong (0.68) and weak (0.89) match assessments, thus answering RQ2.3. On closer examination, it is apparent that the query intents without coverage are generally those related to a specific resource (e.g., “bbc”) or a generic type of information (e.g., jokes or videos), which Wikipedia does not contain. These query intents are less likely to change over time, as they refer to generic facets common to many event-driven topics; their absence from Wikipedia therefore does not harm the general applicability of this approach.

RQ2.4: Do content dynamics of this structured information correlate with query intent popularity for event-driven multi-faceted queries?

In Table 5.3, I observe a positive temporal correlation between query intent popularity and Wikipedia section editing activity. Considering that the raw Wikipedia article revision stream is relatively noisy (e.g., when one editor repeatedly commits tiny changes), short-term topics have a relatively strong correlation at all temporal resolutions. It is evident that as the temporal resolution becomes coarser, correlation increases, because daily noise is aggregated and smoothed. For long-term events, correlation is weaker overall and likewise increases with a coarser temporal resolution (e.g., 14 days). This is likely caused not only by noise smoothing, but also by the fact that longer events may consist of aspects which develop slowly over many weeks, rather than just days.

5.6 Chapter Conclusions

Short and non-specific queries are a common problem for search engines. Such queries lead to uncertainty about the user's information need, and hence about what information is relevant. Addressing this problem, intent-aware search result diversification approaches interleave search results covering the most popular interpretations (or, query intents) to satisfy as many users as possible. Since this uncertainty changes over time, in this chapter I considered time-sensitive search result diversification as a practical means to integrate temporal relevance into time-aware IR.

In this chapter I explored the temporal dynamics of query intents for both ambiguous and multi-faceted queries, with a view to facilitating temporal relevance in time-aware IR. Given that an adequately large-scale and long-term query log is unavailable for time-aware IR research, I first argued the suitability of Wikipedia as a surrogate source of temporal dynamics. This argument is based on the findings of several studies which have examined topical, coverage, real-time and behavioural aspects of Wikipedia from different perspectives. Using temporal dynamics sourced from Wikipedia, I have analysed the temporal dynamics evident in ambiguous queries, and developed techniques to support the temporal dynamics in event-driven multi-faceted queries. The findings made in this work provide a foundation for future work in modelling temporal relevance as time-sensitive search result diversification. In the following two sections I outline the conclusions specific to ambiguous and multi-faceted queries.

Temporal Dynamics of Ambiguous Queries

Temporal dynamics play a central role in the intent behind many ambiguous queries. The optimal intent-aware result ranking is far from stationary for the overwhelming majority of ambiguous queries. Indeed, the ranking changes frequently over hours, days and months because of both random effects and, to a much greater extent, temporal dynamics. Importantly, differing types of ambiguous queries exhibit varying temporal dynamics. Rankings for person, place and film entities change hourly throughout the day, in contrast to those for acronyms, which change daily. Periodic and event-driven ranking changes are not mutually exclusive. As illustrated in Figure 5.4, and indicated by the results I presented in Section 5.4.4, many distinct temporal dynamics interact to produce complex temporal ranking effects. Modelling these compound effects is a considerable challenge. Established time series modelling approaches (Radinsky et al., 2013b) will likely need to be adapted to provide truly time-sensitive ranking that is proactive to the changing influences and needs of web search users, which often cannot be predicted solely from past popularity evidence. Overall, the findings of this work hold several implications for developing temporal relevance approaches to support real-time information seeking behaviour, including: (1) the need to support varying temporal dynamics for intrinsically different types of queries over hours, days and months, (2) past query intent popularity evidence is important, but cannot be considered in isolation, and (3) external sources may offer insight for anticipating upcoming ranking changes caused by unpredictable event-driven influences. Moreover, with sufficiently large-scale ground-truth query intent rankings provided by Wikipedia, “back-testing”1 methodologies can be employed to evaluate future temporal relevance models based on past temporal query intent popularity and social signals.
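A back-testing evaluation of this kind can be sketched as a walk-forward loop: at each time step the model sees only past popularity and must predict the current ranking. In this hypothetical Python sketch, popularity is a list of per-period dictionaries mapping intent to popularity, the metric is top-1 accuracy, and the predictor is a naive persistence baseline; all three choices are assumptions made for illustration, not a method proposed in this work.

```python
def backtest_top_intent(popularity, predictor, warmup=1):
    """Walk forward through time: at step t the predictor sees only popularity[:t]
    and must predict the most popular intent at t. Returns top-1 accuracy."""
    hits = 0
    steps = 0
    for t in range(warmup, len(popularity)):
        history = popularity[:t]  # strictly past observations only
        predicted = predictor(history)
        actual = max(popularity[t], key=popularity[t].get)
        if predicted == actual:
            hits += 1
        steps += 1
    return hits / steps if steps else float("nan")


def persistence_predictor(history):
    """Hypothetical baseline: the intent that led most recently still leads."""
    last = history[-1]
    return max(last, key=last.get)


popularity = [
    {"flood": 0.7, "relief": 0.3},
    {"flood": 0.6, "relief": 0.4},
    {"flood": 0.2, "relief": 0.8},
    {"flood": 0.1, "relief": 0.9},
]
print(backtest_top_intent(popularity, persistence_predictor))
```

The persistence baseline misses exactly the event-driven shifts in intent (here, the switch from "flood" to "relief"), which is the kind of change a time-sensitive model would need to anticipate.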

One clear limitation of these findings is that they are based on Wikipedia data, rather than real long-term past interaction data from query logs, which is not available for open research. While I argued that Wikipedia temporal dynamics reflect those found in query logs in many cases, there may be many instances where that is not so. Consequently, while these findings are suggestive of the temporal dynamics found in ambiguous queries, further work is necessary to validate them with a real large-scale proprietary query log. By analysing a full query log, as well as understanding what percentage of ambiguous queries are affected by temporal dynamics, further analysis will highlight the percentage of overall query volume that is affected, and hence the possible impact on user satisfaction.

1 Back-testing refers to evaluating a model on past time periods, as is commonly employed to validate

Temporal Dynamics of Multi-faceted Queries

Temporal dynamics play a central role in the query intent popularity of event-driven multi-faceted queries. Without knowledge of the emerging query intents and their temporal popularity, applying effective intent-aware query ranking is problematic. To reduce the reliance on past query logs for mining query intents, I have found that related content structure and dynamics can be used to derive multi-faceted query intents for event-driven topics (e.g., long-running news events). In particular, I have shown that the majority of major event-driven query intents can be represented by Wikipedia article sections and subsections. Moreover, the popularity of each of these intents over time is also reflected by the editing activity of informational content in each section, most strongly for short-term events at coarser temporal resolutions. Consequently, Wikipedia article structure offers a means to understand (i) the query intents currently present and emerging, and (ii) the temporal popularity of each intent. Overall, these findings point towards new methods for modelling query intents to support intent-aware search result diversification in scenarios where past query log evidence is insufficient or outdated for ranking query intents in real time.

Future work on this theme will investigate how this approach can be extended to more general content (e.g., all web pages), and to less rapidly evolving multi-faceted queries which may still have temporal elements. Moreover, given the manual data collection and cleaning necessary to evaluate the proposed approach, I employed a relatively small sample of event-driven queries, which may not be fully representative of all types of event-driven queries (Kairam et al., 2013). Accordingly, future work needs to characterise differing types of events (e.g., sports, natural disasters, politics, and so on) over more periods of time, to ensure that underlying temporal event factors such as impact and social interest are better captured, to validate these findings, and to highlight any types of events that may be problematic (e.g., events for which users are less prone to update Wikipedia content in real time, or more likely to falsify it).

Part III

Exploiting Temporal Dynamics in Collections

Chapter 6

Temporal Semantic Query Expansion

In Chapter 3, I explored several temporal dynamics evident in information collections, including patterns and trends in word and phrase use over time. Despite the inherent temporal dimension of many collections, such as the web, news and tweets, the majority of information retrieval research has concentrated on developing models based on a static view of the collection as a whole. In particular, the statistical measures and distributions used by conventional retrieval models to characterise information, relationships and topics are often assumed to be stationary over time. In this chapter, I explore methods to exploit the temporal dynamics of index term use in time-based collections to improve retrieval performance. Based on the epiphenomenon that semantically similar index terms have highly similar temporal dynamics, I propose a novel approach for identifying a topic's chronotype terms, that is, the cluster of consistently temporally related words which comprise the topic. I exploit the terms uncovered by this method in automatic query expansion to improve IR system effectiveness for diverse time-based collections.∗

6.1 Introduction

The majority of past information retrieval (IR) research has typically centred around developing and evaluating approaches using relatively short-term or static snapshots of real-world time-based collections, including web documents, news stories and user-generated content such as blogs and tweets. As a result, such approaches disregard the underlying temporal dynamics captured in many time-based collections as they change over time.

This has led to a prevalence of retrieval models and related approaches developed to perform well empirically for static and stable collections. However, the majority of real-world collections evolve as new items are added incrementally over time. This means many of the statistical measures relied upon by IR approaches, such as term frequency, specificity and semantic relationships, are far from stationary over time, as I showed previously in Chapter 3. Evolution in the composition of the collection may mean many IR approaches are not optimal over time. Moreover, static IR approaches fail to exploit the potentially rich insight into temporal structure and meaning afforded by the underlying temporal dynamics of the collection. In particular, relatively little work has exploited the temporal dynamics of term popularity and semantic similarity in IR. In this chapter I look to exploit the temporal dynamics of term popularity in a collection to improve retrieval effectiveness through query expansion (QE).

QE is a technique commonly employed to augment a user's query with additional highly related terms to improve retrieval performance. It is motivated by the fact that users often struggle to formulate specific and descriptive queries for their information needs. Indeed, Teevan et al. (2011) recently found an average of 3.08 and 1.64 words used for web and Twitter queries, respectively. More specifically, QE aims to improve the document matching capability and topic coverage afforded by a query, and thus ultimately improve relevance ranking. Pseudo-relevance feedback (PRF) is a common approach to query expansion. PRF assumes the top-k ranked documents retrieved by the original user-provided query are relevant. Subsequently, by identifying distinctive terms contained within these documents, the original query can be expanded with further terms descriptive of the query's topic to improve retrieval effectiveness by better matching further relevant documents (i.e., increasing recall and, if possible, maintaining precision). Since the seminal work of Rocchio (1971), PRF has become established as an effective method to improve retrieval system performance on average (Carpineto and Romano, 2012). However, PRF is known to be problematic in many scenarios since its effect on retrieval performance is erratic: it can substantially harm, as well as dramatically help, individual query performance. It would be interesting to study how temporal dynamics can be exploited to improve the effectiveness of PRF techniques.
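For reference, the classic Rocchio (1971) update, in its commonly cited vector-space form, moves the original query vector q_0 towards the centroid of the relevant documents D_r and away from the centroid of the non-relevant documents D_nr, with tuning weights alpha, beta and gamma. Under PRF, D_r is taken to be the top-k retrieved documents and, since no documents are explicitly judged non-relevant, gamma is commonly set to 0.

```latex
\vec{q}\,' = \alpha\,\vec{q}_0
  + \frac{\beta}{|D_r|} \sum_{\vec{d}_j \in D_r} \vec{d}_j
  - \frac{\gamma}{|D_{nr}|} \sum_{\vec{d}_k \in D_{nr}} \vec{d}_k
```

The highest-weighted terms of the updated vector that were absent from the original query become the expansion terms.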

At the heart of any PRF approach is the method used for identifying the most distinctive terms in the PRF (or, local) documents. Distinctive terms will ideally only be found in relevant documents, and thus will discern relevant from non-relevant documents following PRF. The majority of established approaches (Lavrenko and Croft, 2001; Rocchio, 1971; Zhai and Lafferty, 2001) rely on non-temporal local term importance and global (or, collection-based) term discriminability statistical measures. In general, temporal dynamics have seen little consideration in QE and PRF approaches.
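To make the local/global distinction concrete, here is a minimal Python sketch of PRF term selection that multiplies a term's frequency in the top-k documents (local importance) by its inverse document frequency in the whole collection (global discriminability). It is a deliberately simple tf-idf-style stand-in, not Rocchio, relevance models or the model-based feedback of Zhai and Lafferty, and it uses no temporal evidence; the function name and parameters are hypothetical.

```python
import math
from collections import Counter


def prf_expansion_terms(ranked_docs, query_terms, doc_freq, n_docs, k=10, n_terms=5):
    """Pick expansion terms from the top-k pseudo-relevant documents.

    ranked_docs: token lists, in the order retrieved by the original query.
    doc_freq:    term -> number of collection documents containing the term.
    n_docs:      total number of documents in the collection.
    """
    local = Counter()
    for tokens in ranked_docs[:k]:  # PRF assumption: the top-k are relevant
        local.update(tokens)
    scores = {}
    for term, tf in local.items():
        if term in query_terms:
            continue  # only new terms are candidates for expansion
        # A term seen in the top-k occurs in at least one document, so df >= 1.
        idf = math.log(n_docs / doc_freq.get(term, 1))
        scores[term] = tf * idf  # local importance x global discriminability
    ranked = sorted(scores, key=lambda term: (-scores[term], term))
    return ranked[:n_terms]


docs = [["flood", "bangkok", "water", "the"], ["flood", "water", "relief", "the"], ["football", "the"]]
df = {"flood": 50, "bangkok": 20, "water": 100, "the": 1000, "relief": 30, "football": 200}
print(prf_expansion_terms(docs, {"flood"}, df, n_docs=1000, k=2, n_terms=3))
```

Note how the idf factor drives a stop word such as "the" to a score of zero despite its high local frequency.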

In this chapter, I posit that the temporal dynamics of term popularity are valuable for identifying highly related (or, semantically similar) terms suitable for QE in time-based collections. Accordingly, based on the epiphenomenon that semantically similar terms typically have very similar temporal dynamics (Alfonseca et al., 2009; Chien and Immorlica, 2005; Radinsky et al., 2011), in this chapter I hypothesise that the most effective terms for QE in a time-based collection are not only those which are distinctive in PRF documents (i.e., non-temporal evidence), but also those with a consistently high degree of temporal dynamic similarity with one another (i.e., temporal evidence). I consider these terms to comprise the query topic's chronotype. Importantly for QE, chronotype terms appear together in documents persistently over time, and thus have a stable, or at least temporally significant, semantic similarity over the collection time period. These terms are therefore optimal candidates for QE, since they can distinguish relevant from non-relevant documents consistently over time in a time-based collection which contains documents covering evolving topics.
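The temporal evidence in this hypothesis can be illustrated by comparing terms' frequency time series. In this sketch, Pearson's r between two terms' per-period frequencies serves as the temporal similarity measure; that choice, the input format and the function names are assumptions, and other measures (such as cross-correlation or dynamic time warping) are equally possible.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length series."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return float("nan")  # undefined for a constant series
    return cov / (sd_x * sd_y)


def temporal_similarity(term_series):
    """Pairwise temporal similarity between terms.

    term_series: term -> per-period frequencies, aligned to the same periods.
    Returns {(a, b): r} for every unordered pair, with a < b alphabetically.
    """
    return {
        (a, b): pearson_r(term_series[a], term_series[b])
        for a, b in combinations(sorted(term_series), 2)
    }


series = {"flood": [1, 5, 9, 2], "water": [2, 10, 18, 4], "election": [9, 5, 1, 8]}
print(temporal_similarity(series))
```

Terms whose series rise and fall together (here "flood" and "water") receive high similarity, while an unrelated term peaking at other times receives a negative score.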

To examine the aforementioned hypothesis, I propose Temporal Semantic Query Expansion (TSQE). TSQE relies on a Temporal Semantic Network (TSN) to capture the non-temporal (i.e., term frequency in PRF documents) and temporal (i.e., temporal semantic similarity between terms) evidence available for candidate QE term selection. Combining temporal and non-temporal evidence, network analysis of the TSN is employed to score the candidate QE terms and determine those which are most valuable.
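The network analysis used by TSQE is not detailed in this passage. Purely as an illustration of how the two kinds of evidence could be combined in a network, the following hypothetical Python sketch treats candidate terms as nodes, keeps an edge wherever temporal similarity reaches an assumed threshold `min_sim`, and scores each term by its PRF frequency multiplied by its weighted degree. Weighted degree is a stand-in chosen for simplicity, not the scoring TSQE actually uses.

```python
def score_candidates(prf_freq, temporal_sim, min_sim=0.5):
    """Hypothetical TSN-style ranking of candidate expansion terms.

    prf_freq:     term -> frequency in the PRF documents (non-temporal evidence).
    temporal_sim: (a, b) -> temporal similarity of the pair (temporal evidence).
    """
    weighted_degree = {term: 0.0 for term in prf_freq}
    for (a, b), sim in temporal_sim.items():
        # Keep an edge only for sufficiently similar pairs (NaN never passes).
        if sim >= min_sim and a in weighted_degree and b in weighted_degree:
            weighted_degree[a] += sim
            weighted_degree[b] += sim
    # Frequent terms that are temporally coherent with many others rank highest.
    scores = {term: prf_freq[term] * weighted_degree[term] for term in prf_freq}
    return sorted(scores, key=lambda term: (-scores[term], term))


prf_freq = {"water": 4, "bangkok": 2, "the": 10}
sims = {("bangkok", "water"): 0.9, ("the", "water"): 0.1, ("bangkok", "the"): 0.05}
print(score_candidates(prf_freq, sims))  # ['water', 'bangkok', 'the']
```

In this toy example the most frequent PRF term, "the", drops to last place because it has no temporally coherent neighbours, which is the intuition behind combining the two sources of evidence.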

6.1.1 Motivation

Despite the prevailing assumption in much of conventional time-insensitive IR, term semantics are rarely stationary over time. Of course, the period over which change occurs is dependent on the collection. For example, in rapidly changing real-time user-generated content (e.g., Twitter), word and phrase semantic relations could change within minutes as unfolding events take precedence. In contrast, books may take many years, even decades to centuries, to reflect shifts in semantics (Michel et al., 2010). To consistently satisfy users over time, IR should take into account the inherent change in collections such as these.

Radinsky et al. (2011) establish that temporal dynamic similarity is a strong indicator of semantic similarity between terms; however, this finding has never been operationalised to improve IR system effectiveness. In an early preliminary study, I employed a naive independent term model for QE based on temporal dynamic similarity and found small but significant improvements in retrieval effectiveness. The approach and results of this study are summarised in Appendix C. In this chapter, I present the subsequent refined approach for QE based on