

5.1.3 Constructing Effective Structured Queries

In Sections 5.1.1 and 5.1.2, question tags were proposed to represent both the users’ expertise (the document to be searched) and the main information need of a given question (the query), based on the assumption that they present a generalized category of the expertise. These proposed representations can be further improved by constructing more structured queries.

In the information retrieval community, much of the research has focused on improving the representation of a given query, either by weighting the query terms or by expanding the query with additional terms. Similar approaches are exploited in this dissertation in order to construct more effective queries.

Using question tags as query terms can be effective, but not all tags may be equally important when it comes to assessing the expertise of users. In fact, certain tags may have more power in determining question-specific expertise. Therefore, several tag weighting heuristics are exploited in order to analyze the effects of the provided tags on expertise estimation. We study three types of weighting schemes, which are based on different sources of information: (1) the asker’s information need, (2) the generality of tags over the collection, and (3) the responders’ expertise areas.

Figure 5.2: An example question thread from StackOverflow.

Figure 5.3: An example tag suggestion box as it appears in StackOverflow.

5.1.3.1 Tag Weighting based on Asker’s Information Need

Question tags may look similar to keyword web search queries; however, their formation is somewhat different. When users start typing tags, possible tag matches are displayed (as seen in Figure 5.3) to help them select among existing tags. Users can choose one of the proposed tags or create their own. All tags are considered to be independent of each other.

This gives users the freedom of assigning tags in any order they want without being restricted by any possible syntactic or commonly used ordering of terms.

However, there may exist some other kind of user motivation for the ranking of given tags.

Tags are the last piece of information required to post a question. Users are asked for the title and body of the question before the tags. Therefore, by the time they select tags, most users have already described their information need in detail in the title and body fields. Unlike search engine queries, where users start typing query terms immediately, tags, which also look like keyword queries, are constructed more carefully after detailed consideration of the information need. Additionally, unlike the title and body fields of a question, there may be a limit on the number of tags that can be used. For instance, on StackOverflow each question can have up to 5 tags. For these reasons, users may choose tags in an order that reflects how well each tag represents their information need. In other words, more descriptive tags may have been selected earlier than relatively less significant ones.

In order to analyze whether the order of assigned tags has any effect on the representation of the information need, this thesis proposes weighting tags based on their relative rank among the assigned tags. The following weighting is used for tags.

\[ \mathrm{weight}_{rank} = \log\bigl(((N + 1) - rank) + 1\bigr) \tag{5.1} \]

where N is the number of tags and rank is the relative ranking of the tag. In this approach, tags are initially ranked in reverse order (reverse rank = (N + 1) − rank), and then these ranks are logarithmically scaled. 1 is added in order to prevent giving 0 weight to the last tag. The example question in Figure 5.1 has the tags “python string”. Applying this approach to weight this query³ returns a weight of log(3) for python and log(2) for string, while in the original query both tags are weighted equally.

Figure 5.4: An example question with frequent and rare tags used together.
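The rank-based weighting of Eq. 5.1 can be illustrated with a minimal Python sketch. The function name `rank_weights` is hypothetical, and the natural logarithm is an assumption, since the text does not state the base of the logarithm.

```python
import math


def rank_weights(tags):
    """Weight tags by position, following Eq. 5.1: log(((N + 1) - rank) + 1).

    `tags` is the list of question tags in the order the asker assigned them.
    The natural logarithm is assumed here; the base is not given in the text.
    """
    n = len(tags)
    # rank is 1-based: the first assigned tag has rank 1.
    return {
        tag: math.log(((n + 1) - rank) + 1)
        for rank, tag in enumerate(tags, start=1)
    }


# The two-tag example from the text: N = 2, python is ranked first.
weights = rank_weights(["python", "string"])
for tag, weight in weights.items():
    print(f"{tag}: {weight:.4f}")
```

For the “python string” query this yields log(3) for python and log(2) for string, matching the worked example in the text.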

5.1.3.2 Tag Weighting based on Term Generality over Collection

State-of-the-art retrieval models use term specificity to weight terms according to their frequency within the collection. Giving higher relevance scores to matches of rare terms than to matches of frequent terms improves retrieval performance [41]. Therefore, many retrieval models have been adapted to account for term specificity by using frequency-based measures such as the inverse document frequency (idf).

However, weighting terms according to their frequency within the collection may not always be useful. For instance, in CQA environments, where users can show their expertise only by answering other users’ questions, it can be hard to find experts for very specific tags: such experts may exist but may not have shown their expertise on that particular question-specific tag. Therefore, using term specificity in these environments may fail to return possible expert candidates who have not shown their expertise on rarely occurring terms. However, these users may have shown enough evidence of expertise on more frequently used tags, which probably represent a higher-level information need required to provide an accurate reply to the question.

An example question with both frequent and rare tags used together is given in Figure 5.4.

On the day this question was asked,⁴ the document frequencies (df) of the provided tags were as follows:

\[ df(\text{linux}) = 88{,}465, \quad df(\text{shell}) = 31{,}014, \quad df(\text{grep}) = 6{,}041, \quad df(\text{ls}) = 464 \tag{5.2} \]

The question-specific tag ls is used relatively rarely compared to the more popular linux and shell tags. The grep tag is also not very frequently used. Using term specificity for this query gives a higher relevance score to users who answered the previous 464 questions on ls; however, users who answered the 88K questions on linux or the 31K questions on shell may have enough expertise to reply to this particular question even though they did not answer a question tagged with ls or grep.

³ N = 2, and for this specific question python is ranked first while string is ranked second.

⁴ Snapshot was taken on November 11, 2014.

In order to decrease the effects of this term specificity weighting in different retrieval models, this dissertation proposes using term generality as a way to weight tags. The logarithmically scaled frequency log(df), where df stands for the document frequency (the number of questions tagged with tag t), is proposed as the term weight in order to compensate for possible term specificity weightings within different retrieval models. Applying this weighting scheme to the original query “python string” returns the weights log(df_python) and log(df_string), respectively. This weighting can be thought of as the reverse of the idf, which is log(N/df) = log(N) − log(df) and therefore equals −log(df) up to the constant log(N), where N here is the number of documents in the collection rather than the number of tags in Eq. 5.1. Therefore, for systems that use the standard idf in their ranking, using this term generality weighting counteracts the effects of idf.
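The term generality weights can be computed for the Figure 5.4 example with a short Python sketch. The document frequencies are those quoted in the text; the function names, the natural logarithm, and the collection size `NUM_DOCS` are assumptions made only for illustration. The last step checks the algebraic identity log(df) + log(N/df) = log(N); how the weight actually interacts with idf depends on the retrieval model and is not modeled here.

```python
import math

# Document frequencies quoted in the text for the Figure 5.4 example.
DF = {"linux": 88465, "shell": 31014, "grep": 6041, "ls": 464}

# Hypothetical collection size, chosen only for illustration.
NUM_DOCS = 10_000_000


def generality_weights(tags, df):
    """Term generality weight log(df) for each tag (natural log assumed)."""
    return {tag: math.log(df[tag]) for tag in tags}


def idf(tag, df, num_docs):
    """Standard inverse document frequency, log(N / df)."""
    return math.log(num_docs / df[tag])


weights = generality_weights(["linux", "shell", "grep", "ls"], DF)

# log(df) + log(N / df) = log(N): the sum is the same constant for every tag.
combined = {tag: weights[tag] + idf(tag, DF, NUM_DOCS) for tag in DF}

for tag in DF:
    print(f"{tag}: weight={weights[tag]:.3f} combined={combined[tag]:.3f}")
```

Under this weighting, frequent tags such as linux receive larger weights than rare tags such as ls, which is the opposite of the ordering idf would produce.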

5.1.3.3 Tag Weighting by Candidate Expertise

Pseudo (blind) relevance feedback (PRF), which performs a local analysis on the part of the collection that is probably relevant to the query, is a commonly used approach to improve retrieval performance. In PRF, the original query is run against the collection, the top k ranked documents are assumed to be relevant, and a relevance model is computed over these documents.

This approach is mostly used as a first step for query expansion, where the top t terms with high relevance values that occur in these top k documents are chosen to expand the original query. PRF is also used as a reweighting approach, where the constructed relevance model is used to assign weights to the original query terms.

In terms of expert retrieval, this process can be thought of as using the top k retrieved expert candidate profiles⁵ to find out what the candidates have in common as areas of expertise, and then either reweighting the original query terms or adding more terms (expertise areas) that are highly weighted within the relevance model. In this thesis, reweighting is applied instead of query expansion, and the weights estimated from the PRF relevance model are used as the weights of the original query terms. Analyzing the top k ranked expert candidates with respect to their common level of expertise on the specific query tags may help to identify the expected level of expertise for each tag that we should be looking for within the collection. This weighting scheme is similar to weighting based on term generality; however, instead of the whole collection, only the probably relevant part of the collection is used.

In this dissertation, an adaptation of Lavrenko’s relevance models [50] is used for PRF.⁶ In applying PRF, choosing the right value for k is important for overall performance. Using a high k value may increase the probability of including less relevant or even irrelevant candidate profiles in the relevance feedback model. On the other hand, using a very small k may bias the feedback model towards a few candidates. In this thesis, different values of k are experimented with, and the best value is chosen with 10-fold cross validation as described in Section 4.4.
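To make the feedback step concrete, the following Python sketch estimates a simple relevance model over the top k candidate profiles. It is not the adaptation used in this dissertation, whose details are not given here; it assumes one common unsmoothed form in which each term is scored by the sum over profiles of P(term | profile) · P(profile | query), with P(profile | query) obtained by normalizing the retrieval scores. The function name, the input format, and the example profiles and scores are all hypothetical.

```python
from collections import Counter


def relevance_model(top_profiles, top_t):
    """Estimate term weights from the top-k candidate profiles.

    `top_profiles` is a list of (retrieval_score, tag_counts) pairs, where
    tag_counts is a Counter over the tags in one candidate's profile.
    Scores are normalized to act as P(profile | query); P(term | profile)
    is a maximum-likelihood estimate without smoothing (an assumption).
    Returns the top_t (term, weight) pairs in descending order of weight.
    """
    total_score = sum(score for score, _ in top_profiles)
    if total_score <= 0:
        raise ValueError("retrieval scores must sum to a positive value")

    model = Counter()
    for score, tag_counts in top_profiles:
        profile_len = sum(tag_counts.values())
        if profile_len == 0:
            continue  # an empty profile contributes no evidence
        for tag, count in tag_counts.items():
            model[tag] += (count / profile_len) * (score / total_score)
    return model.most_common(top_t)


# Hypothetical top-2 candidate profiles with made-up retrieval scores.
profiles = [
    (0.6, Counter({"linux": 3, "shell": 1})),
    (0.4, Counter({"linux": 1, "bash": 1})),
]
top_terms = relevance_model(profiles, top_t=2)
print(top_terms)
```

A larger k adds more profiles to the sum, including weaker matches, while a very small k lets one or two candidates dominate the term weights, which is the trade-off described above.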

⁵ User profiles are constructed for each responder from the tags of the questions they answered.

⁶ More details of the applied PRF approach can be found in Don Metzler’s PhD dissertation [63].

After retrieving the top k expert candidate profiles, the top t terms are selected from the relevance model. If a tag from the question exists within these t terms, then the weight of the term estimated by PRF is directly used as the weight of that tag. For tags that are not among the top t retrieved terms, the smallest weight among the retrieved terms, which is the weight of the t-th term, is used as the tag weight, so that those tags do not receive a weight of 0. A summary of the weight assignment protocol is as follows.

\[ T = [\, term_1, term_2, \ldots, term_t \,], \qquad weight(term_i) \geq weight(term_{i+1}) \tag{5.3} \]

\[ \mathrm{weight}_{PRF}(tag) = \begin{cases} weight(term_i), & \text{if } tag = term_i \text{ and } term_i \in T \\ weight(term_t), & \text{otherwise} \end{cases} \tag{5.4} \]

where T is a ranked list of the top t terms retrieved as a result of PRF. These estimated weights, weight_PRF(tag), are directly used as the corresponding tag weights.
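The weight assignment in Eqs. 5.3 and 5.4 can be sketched in a few lines of Python. The function name `prf_tag_weights` and the feedback terms and weights in the example are hypothetical; only the assignment rule itself comes from the text.

```python
def prf_tag_weights(question_tags, ranked_terms):
    """Assign PRF-based weights to question tags, following Eq. 5.4.

    `ranked_terms` is the list T of (term, weight) pairs from the relevance
    model, sorted by descending weight (Eq. 5.3). Tags found in T take the
    PRF weight of the matching term; all other tags take the weight of the
    t-th (last) term, so that no tag ends up with a weight of 0.
    """
    if not ranked_terms:
        raise ValueError("ranked_terms must contain at least one term")
    term_weight = dict(ranked_terms)
    floor_weight = ranked_terms[-1][1]  # weight(term_t)
    return {tag: term_weight.get(tag, floor_weight) for tag in question_tags}


# Hypothetical top-4 feedback terms with made-up weights.
top_terms = [("linux", 0.40), ("bash", 0.25), ("shell", 0.20), ("grep", 0.10)]
weights = prf_tag_weights(["linux", "shell", "grep", "ls"], top_terms)
print(weights)
```

In this example ls does not appear among the feedback terms, so it falls back to the weight of the last term in T.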

5.1.3.4 Summary

Overall, four types of tag weightings are used. In addition to uniformly weighted tag queries, tags are weighted based on the asker’s information need, tag generality over the collection, and candidates’ expertise. Besides these proposed representations of question tags, widely used representations of users’ expertise and questions, such as question and answer bodies, are also analyzed for comparison. Two state-of-the-art expert retrieval algorithms, the document-based and profile-based approaches, are used for retrieval with these proposed and baseline representations.