5.5 Temporal Dynamics of Multi-faceted Queries

5.5.1 Deriving Query Intents from Content Structure

To answer RQ2.3, in this section I evaluate the effectiveness of deriving query intents from Wikipedia article content structure (i.e., article sections), compared to query intents obtained from a ground-truth query log. As I am studying events following their occurrence, for simplicity at this stage I assume that all ground-truth query intents are present during the event. The experimental procedure is outlined in the following sections. I first select a set of 20 event-driven queries/topics to build a test set for answering RQ2.3 and RQ2.4. Following this, I describe how to obtain the ground truth of possible query intents and the methodology for assessing the matches between ground-truth query intents and Wikipedia article sections. Finally, I present evaluation results of deriving event-driven query intents using Wikipedia article structure.

Queries

I select two categories of event-driven queries for evaluation, related to significant events between January 2010 and December 2012. Four international graduate students were asked to collectively identify major events in the Wikipedia 2011-2012 News pages¹, and provide the query they would have used to find general information on the event. From this pool of events and queries I selected two sets of 10 topics based on the following characteristics.

All topics are themselves an event, with each having a central descriptive Wikipedia article. Topics 1-10 are relatively short events (e.g., severe weather or a shooting), which attract most temporal interest over a period of 1 to 14 days. In contrast, topics 11-20 are prolonged events which happen over many weeks, months or even years (e.g., the Libya Intervention). Often these longer events are composed of many facets, concerning different people, places and interaction over time¹. Two example categorised queries with their multi-faceted intents are presented in Table 5.1. The motivation for choosing these two categories of queries is to reflect events with diverse temporal characteristics, evolution and interest. A full listing of the events/queries used is provided in Appendix B.

Table 5.1: Example short- and long-term event-driven queries, along with their multiple query intents (obtained from Google Related Searches).

| Query (topic) | Query intents (from query-log) |
|---|---|
| Eyjafjallajokull; Short-term (Topics 1-10) | eyjafjallajokull effects, eyjafjallajokull facts, eyjafjallajokull volcano webcam, how to pronounce eyjafjallajokull, eyjafjallajokull bbc, eyjafjallajokull case study |
| Libya Intervention; Long-term (Topics 11-20) | libya intervention responsibility to protect, libya intervention poll, libya intervention debate, libya intervention timeline, libya intervention nato, libya intervention legality, libya intervention oil, libya intervention success |

Query Intents

I first obtain the ground truth of query intents for each event-driven query. For large-scale commercial search engines, the ground truth of intents should be based on a large number of users. Since I do not have a query log covering the event periods, I instead propose an approach to derive intent ground truth using features provided by a commercial search engine. Since Google is the most widely used commercial web search engine, I examined the suggestions provided by Google Query Auto-Completion, Google Related Searches and Google Trends Related Searches. To select the best source, I define the criteria as follows: (i) the source should cover a variety of diverse intents/facets of an event, and (ii) it should cover the most popular query intents so that temporal statistics can be obtained. Based on my observations, I believe that the queries suggested by Google Related Searches meet these criteria, and I therefore selected it as the ground-truth source for this study. Google Auto-Completion and Google Trends Related Searches data either over-reward tail queries or do not cover multiple diverse query intents. An example of ground-truth query intents obtained in this way is shown in Table 5.1.

¹ In this work I do not investigate seasonal event-driven queries such as Christmas. Instead, I am interested

Intent Matching and Assessments

To establish which ground-truth query intents are reflected by Wikipedia article sections, I attempt to match each ground-truth intent to a candidate section and then assess the match strength. This consists of three steps: (i) event article identification: identifying the Wikipedia articles that are most related to each event-driven topic, (ii) section-intent automatic matching: retrieving sections from the identified articles that might match the intents (for further assessment), and (iii) match assessment: assessing the match strength between retrieved sections and query intents. I describe each step in detail below.

1. Article Identification. Before listing all the candidate sections that can potentially be matched to the query intents, the set of Wikipedia articles most related to each event-driven topic, {A_topic}, must be identified. Major events are typically represented by a central article (e.g. “Occupy Movement”), with related articles detailing substantial aspects such as “Reactions to the Occupy Movement”, “Occupy Movement in the United States” and “Occupy Canada”. As this work concentrates on a small number of topics, I manually identified related articles as those linked from the central article via “See also:” and “Main article:” references, although past work has proposed automatic methods (Hu et al., 2009).

2. Section-Intent Auto-Matching. I posit that a query intent is reflected by one or more sections (or subsections) contained in {A_topic}. For example, the ‘Occupy Movement’ article has sections including ‘Background’, ‘We are the 99%’, ‘Goals’, ‘Methods’, and ‘Protests’ (with a subsection for each participating country). Despite the hierarchical nested structure of Wikipedia sections, I employ a flat section structure to avoid complexity. Hierarchy is particularly challenging for Wikipedia articles as it can change dramatically over time, so it is left for later work. Matching between query intents and article sections was performed semi-automatically. To begin, I took each ground-truth query intent, extracted the intent key terms, and automatically retrieved up to three sections from {A_topic} that contained those terms most often in their header title or text. For example, for the query intent ‘libya intervention oil’, I identified article sections containing the term ‘oil’, such as ‘Controversy’ and ‘Oil Supply Disruption’.

3. Match Assessments. With the large pool of potentially matched sections retrieved by the system described above, two separate individuals were asked to annotate the extent to which each section reflected the intent. Assessments were made in three grades: strong match (i.e., the section is entirely about the intent), weak match (i.e., loosely related) or no match. For 92% of intents, the annotators agreed on the match grade. For the remainder, labelling conflicts were resolved as follows: where a ‘no match’ label was present, it was selected by default; similarly, where a ‘strong’ and a ‘weak’ label were assigned to an intent, the conflict was resolved by defaulting to the ‘weak’ label. This annotated dataset provided the final ground-truth intent and Wikipedia section matches for the study.
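For illustration only, steps 2 and 3 could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the system used in the study: the names `Section`, `intent_key_terms`, `candidate_sections`, `resolve` and `agreement_rate` are hypothetical, key terms are assumed to be the intent tokens that do not already appear in the topic query, "most contained" is read as a raw occurrence count over header and text, and the example section texts are invented placeholders rather than real Wikipedia content.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    article: str  # title of the Wikipedia article the section belongs to
    header: str   # section header title
    text: str     # section body text


def intent_key_terms(intent, topic):
    # Assumption: key terms are the intent tokens not already in the topic query.
    topic_tokens = set(topic.lower().split())
    return [tok for tok in intent.lower().split() if tok not in topic_tokens]


def term_count(section, terms):
    # Count key-term occurrences in header and text (flat section structure).
    words = re.findall(r"[a-z0-9]+", f"{section.header} {section.text}".lower())
    counts = Counter(words)
    return sum(counts[t] for t in terms)


def candidate_sections(intent, topic, sections, k=3):
    # Keep up to k sections containing the key terms, most occurrences first.
    terms = intent_key_terms(intent, topic)
    scored = [(term_count(s, terms), s) for s in sections]
    hits = [(n, s) for n, s in scored if n > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in hits[:k]]


def resolve(label_a, label_b):
    # Conflict rule from the text: 'none' wins; 'strong' vs 'weak' gives 'weak'.
    if label_a == label_b:
        return label_a
    if "none" in (label_a, label_b):
        return "none"
    return "weak"


def agreement_rate(label_pairs):
    # Fraction of items on which both annotators chose the same grade.
    return sum(a == b for a, b in label_pairs) / len(label_pairs)


# Placeholder sections (invented text, for demonstration only).
sections = [
    Section("Example article", "Background", "history of the conflict"),
    Section("Example article", "Controversy", "critics said oil was a motive"),
    Section("Example article", "Oil Supply Disruption", "oil exports fell and oil prices rose"),
]
matches = candidate_sections("libya intervention oil", "libya intervention", sections)
print([s.header for s in matches])
print(resolve("strong", "weak"), resolve("none", "strong"))
```

The conflict rule always falls back to the more conservative of the two grades, so a disputed section is never promoted above the lower annotation.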