Contextual Related Terms - Interactive Information Retrieval with Structured Documents

On the one hand, a term suggestion mechanism is a very useful way of assisting searchers during query formulation, as human memory works better in recognising relevant/irrelevant information. It also takes less time to judge the relevance of terms than that of document surrogates.

Table 5.6: In response to the open question "What features of the interface were the most and least useful for this search task?" - Some positive comments about the related terms

* The most useful feature was, surprisingly, the related terms function. Here (after a small degree of trying and failing) I found the right word combination I was looking for.

* I found the only way to get close to the information I was seeking was by using related terms.

* In this task the related terms was the most useful feature.

* Related terms list was very useful for disambiguation of search results in cases when there were more people with the same name, related terms captured their different professions (e. g. film maker, painter, banker)

* the search result with the related terms was not as good as I expected

the meaning of a suggested term is not apparent or the searcher’s knowledge is not sufficient to grasp the meaning. Furthermore, even highly correlated terms may be useless or even distracting for a searcher. For example, suppose a user searches for events in Versailles. One of the suggestions of the related terms tool is Treaty of Versailles; although this suggestion refers to an event in Versailles, a searcher may not recognise it as such due to lack of knowledge. This problem was identified by one of the searchers in iTrack 2006:

The list of related terms is too vast. In situations where I did not know the meaning of a keyword extracted from the task description, the related terms did not help. Some of them might have been synonyms but there was no way for me to know.

Therefore, there is a need for a service that can explain the meaning of a proposed term on demand. Context is very useful for determining the meaning of terms. Keyphrases usually have many different meanings, and those meanings depend heavily on the context in which the keyphrases appear. In state-of-the-art search engines, Keyword In Context (KWIC) is a well-known method of presenting results: the sentence or sentences in which the keyword appears are presented to the searcher for determining the usefulness of a result. Sentences are by definition coherent linguistic entities, which helps to overcome problems with semantics, and they present the query terms in a better way. Furthermore, they are small enough to allow searchers to assess relevance in a short time [White, 2004]. Sentences are preferred over paragraphs (as used in passage retrieval [Salton et al., 1993]) simply because they take less time to assess. This allows searchers to make speedy judgements on the relevance/irrelevance of the information presented to them.
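A minimal sketch of the KWIC idea, assuming the sentences have already been extracted: it keeps only the sentences that contain the keyword and marks each occurrence. The function names `kwic_sentences` and `highlight`, the bracket markers and the sample sentences are illustrative assumptions and are not taken from the system described here.

```python
import re


def _keyword_pattern(keyword):
    # Whole-word, case-insensitive match; re.escape keeps the keyword literal.
    return re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)


def kwic_sentences(sentences, keyword):
    """Return the sentences in which `keyword` occurs (keyword in context)."""
    pattern = _keyword_pattern(keyword)
    return [s for s in sentences if pattern.search(s)]


def highlight(sentence, keyword, left="[", right="]"):
    """Mark every occurrence of `keyword` so it stands out in its context."""
    pattern = _keyword_pattern(keyword)
    return pattern.sub(lambda m: left + m.group(0) + right, sentence)


# Hypothetical sample sentences, used only for illustration.
sentences = [
    "The Treaty of Versailles was signed in 1919.",
    "Paris is the capital of France.",
]
contexts = kwic_sentences(sentences, "versailles")
marked = [highlight(s, "versailles") for s in contexts]
```

A real system would retrieve and rank the sentences with a search engine rather than scan them linearly; the sketch only shows what is presented to the searcher.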

tion were extracted using the LingPipe⁶ tool. It extracts sentences heuristically by identifying tokens in context that end sentences. The Lucene search engine⁷ is used to index and retrieve the top k (with k between 3 and 10) sentences. When applying this method, the following problems were faced:

1. Some sentences were too short. Highly scoring sentences were often headings and thus too short to be indicative.

2. Some sentences were too long. For example, most Wikipedia pages contain a section with external links, which are given as a list of bulleted items. The complete list was regarded as one sentence, and thus it often became too long.

3. Some sentences were redundant. The top-ranking sentences were often too similar when they were retrieved from the same document. Thus, query terms were shown in similar contexts and the value of the generated summary was diminished.

In order to resolve the above-mentioned problems, the following measures were taken. Only sentences exceeding a minimum length are considered for presentation as context (threshold: 15 tokens including punctuation). This is a frequently used threshold for removing captions, titles and headings [Teufel and Moens, 1997]. The maximum length was set to 50. To avoid the presentation of similar contexts, each context should come from a different document.

The DAFFODIL system was enhanced by integrating the contextual related term tool. For this, the suggestions of [Rieh and Xie, 2006] were taken into account. These are:

1. Provide a secondary window in addition to the main window of a search engine in which user and system interact.

2. Facilitate users in manipulating multiple queries in an efficient way.

3. Assist users in reformulating queries by providing context-based term suggestions.

4. Provide the ability to select query terms from the term suggestion list and allow users to modify them.

In addition to these points, the top three contexts of each proposed term are provided as a tooltip, as depicted in figure 5.1. It is also possible to view more than the top three contexts in a separate window. In this case, the top ten contexts are shown (see figure 5.2), and the searcher can view the complete element detail for each of these sentences by clicking on it.
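The context selection described in this section (length thresholds, one context per document, then the top three or top ten contexts) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the DAFFODIL implementation: the `Sentence` record, the regex tokenizer, the inclusive treatment of the 15-token threshold and the reading of the maximum length of 50 as a token count are all assumptions.

```python
import re
from dataclasses import dataclass

MIN_TOKENS = 15  # threshold from the text, counted including punctuation
MAX_TOKENS = 50  # assumption: the maximum length is also measured in tokens


@dataclass
class Sentence:
    doc_id: str   # document the sentence was extracted from
    text: str
    score: float  # retrieval score, e.g. as returned by the search engine


def token_count(text):
    # Assumed tokenizer: each word and each punctuation mark is one token.
    return len(re.findall(r"\w+|[^\w\s]", text))


def select_contexts(ranked, k):
    """Pick up to k contexts: length-filtered, at most one per document."""
    selected = []
    seen_docs = set()
    for sentence in sorted(ranked, key=lambda s: s.score, reverse=True):
        if len(selected) >= k:
            break
        n = token_count(sentence.text)
        if n < MIN_TOKENS or n > MAX_TOKENS:
            continue  # drops headings (too short) and link lists (too long)
        if sentence.doc_id in seen_docs:
            continue  # avoids near-duplicate contexts from one document
        seen_docs.add(sentence.doc_id)
        selected.append(sentence)
    return selected
```

Under these assumptions, `select_contexts(ranked, 3)` would feed the tooltip and `select_contexts(ranked, 10)` the separate window.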

⁶ http://alias-i.com/lingpipe/ (last accessed on January 6, 2009)

Figure 5.1: Contextual related tool showing related terms along with the top 3 KWIC as tooltip for the term “heating House”

5.8 Evaluation

The evaluation of the tool was performed within iTrack 2008, in which 30 searchers participated in the experiments. The infrastructure of the experiment was similar to that of iTrack 2006-2007, with the following exceptions: only the element retrieval system was used, and each searcher worked on two tasks of her own choice. The tasks are given in appendix D.

Several questions in the questionnaire referred to system features. Here we list only those questions that concern the contextual related tool. Searchers were asked to rate the usefulness of different features of the system on a scale of 1 to 5, where 1 stood for ’Not at all’, 3 for ’Somewhat’ and 5 for ’Extremely’. The questions were as follows.

1. How satisfied were you with the information provided in the related term list?

2. How useful was/were

a) the related terms?

b) the related terms context?

c) the way of presenting the terms?

Figure 5.2: Contextual related tool showing related terms along with the top 10 KWIC in a separate window

The results are summarised in table 5.7. They show that searchers found the tool somewhat useful. In comparison to the previous year, the results for the related terms tool are a little better. The usefulness of related terms is also rated higher, and there are no comments on the usefulness of this tool. However, the results are not as good as we expected. This may be due to two major reasons. Firstly, phrases often occur in the wrong order. The reversed order of a phrase is due to the alphabetical sorting of its components, which was done in order to find the phrase in any order. To avoid this, one could keep the original order of phrases, even if some occurrences get lost. The second problem, “no highlighting of terms in tooltip”, can be easily addressed.
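The trade-off behind the phrase-order problem can be illustrated with a small sketch. It is not the tool's actual code: the two key functions and the sample occurrences are hypothetical, loosely following the term shown in figure 5.1. Sorting the components alphabetically groups all word orders under one key, but the key itself may read in the wrong order; keeping the original order gives readable phrases, but occurrences in other orders are counted separately.

```python
from collections import Counter


def order_free_key(phrase):
    # Alphabetical sorting of the components: matches the phrase in any order.
    return " ".join(sorted(phrase.lower().split()))


def ordered_key(phrase):
    # Keeps the original word order; only case and spacing are normalised.
    return " ".join(phrase.lower().split())


def count_phrases(occurrences, key_fn):
    """Count phrase occurrences grouped by the given key function."""
    return Counter(key_fn(p) for p in occurrences)


# Hypothetical occurrences of the same two-word phrase in a collection.
occurrences = ["house heating", "House heating", "heating house"]

order_free = count_phrases(occurrences, order_free_key)
ordered = count_phrases(occurrences, ordered_key)
```

Here the order-free key collects all three occurrences under "heating house", the reversed form, while the ordered key keeps "house heating" readable but counts only two of the three occurrences for it.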

System Features | µ | σ²
How satisfied were you with the information provided in the related term list? | 2.64 | 1.29
How useful were the related terms? | 2.76 | 1.61
How useful were the related terms context? | 2.64 | 1.69
How useful was the way of presenting the terms? | 2.76 | 1.56
How useful was the way of presenting the context of terms? | 2.76 | 1.61

Table 5.7: Searchers rating the usefulness of contextual related tool on the scale of 1 (Not at all) to 5 (Extremely)

Table 5.8: Responses to the open question "What features of the interface were the most and least useful for this search task?" - Some positive comments about the related terms

* Some related terms have several contexts. Some are relevant and some not. Perhaps the system should display the most relevant search result.

* It did present useful related terms related to the topic I was researching, regardless of it actually leading to relevant results.

* I think the useful part of this system is providing related terms and their context.

* it provides useful related terms lists. It helps the users to search his/her topic in other possible ways.

* It was nice to have a list showing related searches next to the list of hits.

Table 5.9: In response to the open question "What features of the interface were the most and least useful for this search task?" - Some negative comments about the related terms

* titles in the side window (related terms) did not relate to the search result they triggered.

* Please show only relevant related terms

* The related terms does not provide me good terms. So I almost never look at it.
