The TREC expert finding task aimed to identify experts for a given topic by using the available data collections. These tasks did not provide a use case scenario for what to do with the identified experts; therefore, the evaluation of candidates was limited to manual assessment of their topic-relevant documents. Assessing expertise by browsing a candidate’s associated content is acceptable if the user is seeking experts on a particular topic just to catch up with their topic-relevant content (such as their blog posts). For such a task and evaluation strategy, prior expert finding approaches assumed the data collection to be static, and therefore proposed static approaches to identify expertise.
However, if expert finding is performed in order to communicate with the identified candidates or to follow their future topic-specific content, then static approaches may not work as expected, because they cannot model the ongoing change of users and their topics over time. For instance, users’ interest in topics may change over time. At some point in their lives, users can be experts on particular topics and may have created a lot of topic-related content. However, that phase of their lives may be over, and they may have moved on to something else. At this point, static approaches can still identify these users as experts because of topic-related activity that ended long ago. This will cause expertise seekers to follow or even contact candidates who may not be up-to-date on the particular topic. Contacting uninterested or out-of-date experts (or routing questions to them) may not only cause the expertise seeker to lose time or receive unsatisfactory answers, but may also cause unnecessary disruption to the identified expert candidates.
In social media environments, with the availability of timestamped user interactions, such unwanted situations can be prevented to some degree. For instance, as will be discussed in Chapter 7, timestamps from user activities have been used by prior work [21, 51, 94] to estimate the availability of users for tasks like question routing. Estimating the availability of users seems to improve the overall performance of question routing; however, it does not help with estimating the topic-specific expertise or interest of users at a certain time. This dissertation focuses on these problems and tries to answer the following research question by proposing a temporal modeling of expertise:
• RQ3: What techniques can be used to identify more up-to-date topic-specific experts who have shown relatively more topic-specific expertise and interest in general and also recently?
3.4 Summary
This dissertation explores different types of evidence within social media, and proposes a more effective and efficient expert identification system. The proposed expert finding system consists of three parts, and each of these parts focuses on a different type of information. The first part uses the associated content of authors to effectively retrieve a good initial ranking of experts.
The second part estimates the authority of users from the authority networks constructed from user interactions. The final part uses timestamps to construct temporal expertise models, which not only model a user’s expertise but also integrate the user’s recent interest in the particular topic.
Depending on the characteristics of the social media platform, the availability of evidence, and the type of expertise-related task, some or all of these parts can be used for effective expertise estimation.
Before describing the details of these steps and the proposed approaches, the next chapter explains the datasets and experimental methodologies used, in order to familiarize the reader with the social media types and expertise-related tasks used in the experiments.
Chapter 4
Datasets and Experimental Methodology
In this dissertation, two types of social media, blogs and community question answering (CQA), are used to test the proposed approaches. The blog data used is an intra-organizational data collection. This dataset is just like other social media blogs with posts and comments, and it is also similar to previous TREC expert finding collections, mainly due to its professional domain and its use within organizations among coworkers and peers. TREC’s expert finding task, which is to retrieve a ranked list of expert candidates for a given query, is applied to this collection. Furthermore, assessment and evaluation approaches similar to those used in the TREC expert finding task are applied to this collection.
CQA sites provide a communication channel between information seekers and providers.
They are among the social media platforms that benefit greatly from the identification of expert users for a given question. Therefore, a popular CQA site with millions of users from around the world is used in this thesis as another type of social media. Due to the structure and nature of these sites, two types of expertise-related tasks are chosen to test the proposed approaches. The first is routing questions to users who have the necessary expertise on the topic of the question, and the second is ranking replies based on responders’ question-specific expertise. These tasks are evaluated by using the activities and feedback of the actual users of the system.
4.1 The Corporate Blog Collection
Research on how an organization can use its internal social media for locating experts necessarily involves data that is difficult to share widely. Our research used blog and related data provided by a large multinational IT firm. This blog collection has been previously used for research [78, 79, 80]. Although the dataset is not public due to the personal and company-internal information it contains, we believe that it is typical of such datasets. The dataset characteristics are summarized below so that the dataset can be compared to other blog datasets.
The collection consists of blog data (posts and comments) and employee metadata covering a 56-month timespan. An example blog post and comments made to this post are presented in Figure 4.1. A blog post consists of a title and body, while a comment only consists of a body.
The average lengths of these fields are summarized in Table 4.1. All blog posts and comments are timestamped and have author information available, as seen with the unique ids. This dataset also includes access logs (which employees read which blog entries) for 44 of the 56 months. Statistics related to this dataset are summarized in Table 4.2.

Figure 4.1: An example blog post and comments from the corporate blog collection.

Field            Avg. Length
Post Title       3.94
Post Body        291.70
Comment Body     24.85
Table 4.1: Average length of fields in the corporate blog collection.

Statistic        Count
# Posts          165,414
# Comments       783,356
# Employees      >100,000
# Posters        20,354
# Commenters     42,169
# Readers        92,360
Table 4.2: Statistics of the corporate blog collection.
Blog posts and comments are on a wide range of personal, social, and work-related topics, for example, poetry, sports, jokes, photography, self-improvement, technology, corporate functions, and testing. A single blog may contain posts on a wide variety of topics. These blogs may also contain organizational spam such as cut-and-paste from documentation or manuals due to incentives for participation that the company offered to employees when the blogs were first deployed.
Employees must log in to corporate information systems; therefore, users are not anonymous in this environment. All posts and comments have authorship information available.
Only this information is used to associate posts and comments with corresponding candidates.
The access logs contain the employee ID of a blog post visitor, the date and time of the visit, the URL of the blog post visited, and the employee ID of the author of the blog post. Employees also have access to a corporate blog search engine. We were provided with this search engine’s access logs, which contain queries, ids of the employees who performed the search, and timestamps of the search.
4.1.1 The Expert Blogger Finding Task
Because this dataset has characteristics similar to previous TREC expert finding collections, it was used for a task very similar to TREC’s task: identifying expert candidates for a given topic. The evaluation methodologies used in TREC were also used to evaluate the expert finding task on this blog collection.
4.1.1.1 Evaluation Data
An initial manual assessment was performed by company employees (the details of this prior assessment, and the results calculated with values retrieved from it, are presented in Appendix A), but the quality (inter-rater agreement) was low. After removing possibly biased assessors and their assessed topics, the inter-rater agreement (average kappa value) was 0.38, which is interpreted as fair [2, 49] or poor [32] by statisticians. Therefore, a second assessment was performed. Due to a data confidentiality agreement, this second assessment was performed by the author of this dissertation. In the first assessment, the average number of expert candidates assessed for each topic was around 25. In the second assessment, a deeper pool was used and the average number of expert candidates assessed per topic increased to 44.
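To make the agreement figure concrete, the following is a minimal sketch of how a kappa value can be computed for one pair of assessors. The text does not state which kappa variant was averaged, so Cohen’s kappa is assumed here; the function name `cohen_kappa` and the example labels are hypothetical and not taken from the original assessment.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two assessors who labeled the same items.

    Assumption: the dissertation's 'average kappa' is an average of pairwise
    values like this one; the source does not confirm the exact variant.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both assessors must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each assessor's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both assessors used a single identical label; agreement is trivial.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical binary judgments (1 = expert, 0 = not an expert).
kappa = cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0])
```

In this toy example the assessors agree on three of four items while chance agreement is 0.5, so kappa is 0.5.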
A total of 40 work-related topics were created for testing. Some of these were selected from search queries in the access logs of the corporate blog search engine, and the rest were created by company employees. The topics from the access logs were selected to mirror task-specific expert-seeking behavior, such as ‘oracle performance tuning’ and ‘websphere process server’. In contrast, the topics created by the employees were considerably more general, such as ‘mainframe’ and ‘cloud computing’.
A sample-based approach was used to create the pool of candidate experts to be assessed. The top 10 candidates returned by several content-based expert-finding algorithms were combined to create a candidate pool. Deeper pools are desirable, of course, but an explicit goal was to produce pools small enough for an assessor to assess a query in less than an hour.
Expert candidates within the pool were anonymized and displayed in random order. For each candidate, the top 3 most topic-relevant posts or comments were displayed during assessments.
Expertise was measured on a 4-point scale (not an expert, has some expertise, an expert, very expert) based on the candidate’s documents.
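The pooling procedure described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the code used for the assessment: the function names `build_pool` and `display_order`, the candidate ids, and the fixed random seed are all hypothetical.

```python
import random


def build_pool(rankings, depth=10):
    """Union of the top-`depth` candidates from each algorithm's ranking.

    `rankings` is a list of ranked candidate-id lists, one per
    content-based expert-finding algorithm (hypothetical input format).
    """
    pool = []
    seen = set()
    for ranking in rankings:
        for candidate in ranking[:depth]:
            if candidate not in seen:
                seen.add(candidate)
                pool.append(candidate)
    return pool


def display_order(pool, seed=0):
    """Return the pool in random order so rank position does not bias the assessor."""
    shuffled = list(pool)
    random.Random(seed).shuffle(shuffled)
    return shuffled


# Two hypothetical rankings, pooled at depth 2 for brevity.
pool = build_pool([["e1", "e2", "e3"], ["e2", "e4"]], depth=2)
shown = display_order(pool)
```

Because the pool is a union, its size is at most the pool depth times the number of algorithms, which is what keeps the per-query assessment effort bounded.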
4.1.1.2 Evaluation Metrics
In expert retrieval, the top-ranked expert candidates are especially important, because the cost of a false positive in expert search is very high. Consulting a falsely identified expert is not only time-consuming for the expertise seeker, but also an unnecessary disruption for the identified expert candidate. Therefore, the early-precision metric Precision@n (P@n) was used to report performance in TREC’s expert finding task, and it is also used in this dissertation. P@n is calculated for each query as shown below and then averaged over all queries.
\[
P@n = \frac{|\{\text{expert users}\}_n \cap \{\text{retrieved users}\}_n|}{|\{\text{retrieved users}\}_n|} \tag{4.1}
\]
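As an illustration of Equation 4.1, a minimal sketch of P@n and its average over queries is given below. The function names and the example user ids are hypothetical; the expert set is assumed to be the binary judgment set for the query.

```python
def precision_at_n(ranked_users, expert_users, n):
    """P@n: fraction of the top-n retrieved users who are judged experts."""
    top_n = ranked_users[:n]
    if not top_n:
        return 0.0
    hits = sum(1 for user in top_n if user in expert_users)
    # The denominator is the number of users actually retrieved in the top n.
    return hits / len(top_n)


def mean_precision_at_n(runs, n):
    """Average P@n over queries; `runs` holds (ranked_users, expert_users) pairs."""
    if not runs:
        return 0.0
    return sum(precision_at_n(r, e, n) for r, e in runs) / len(runs)


# Hypothetical ranking for one query, with two judged experts.
p_at_2 = precision_at_n(["u1", "u2", "u3", "u4"], {"u1", "u3"}, 2)
```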
Another metric that has also been used in TREC’s expert retrieval task is Mean Average Precision (MAP), which is calculated as follows:
\[
AP = \frac{\sum_{k=1}^{n} \left( P@k \times exp(k) \right)}{|\{\text{expert users}\}|}
\]
\[
MAP = \frac{1}{|Q|} \sum_{q=1}^{|Q|} AP(q)
\]
where AP is the Average Precision score of a single query, n is the number of retrieved users, |Q| is the number of queries, and exp(k) is an indicator function equal to 1 if the user at rank k is an expert and 0 otherwise. Both MAP and P@n are calculated over binary assessment values, so they do not differentiate between graded relevance values. Therefore, they are calculated for different levels of expertise.
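A minimal sketch of AP and MAP follows, using the standard definition in which AP is normalized by the number of judged experts for the query (an assumption about the exact normalization used here). The helper `experts_at_level`, which turns graded judgments into a binary expert set at a chosen threshold, is hypothetical, as is the encoding of the 4-point scale as integers 0 to 3.

```python
def average_precision(ranked_users, expert_users):
    """AP for one query: mean of P@k over the ranks k where an expert appears."""
    if not expert_users:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, user in enumerate(ranked_users, start=1):
        if user in expert_users:  # exp(k) = 1
            hits += 1
            precision_sum += hits / k  # P@k at this rank
    # Assumption: normalize by the number of judged experts for the query.
    return precision_sum / len(expert_users)


def mean_average_precision(runs):
    """MAP: mean of AP over queries; `runs` holds (ranked_users, expert_users) pairs."""
    if not runs:
        return 0.0
    return sum(average_precision(r, e) for r, e in runs) / len(runs)


def experts_at_level(graded, min_level):
    """Binarize graded judgments: users at or above `min_level` count as experts."""
    return {user for user, level in graded.items() if level >= min_level}


# Hypothetical graded judgments (0 = not an expert ... 3 = very expert).
graded = {"a": 3, "b": 1, "c": 2, "d": 0}
ap_strict = average_precision(["a", "b", "c", "d"], experts_at_level(graded, 2))
```

Changing `min_level` yields the separate binary evaluations at different levels of expertise.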
The Normalized Discounted Cumulative Gain (NDCG) metric is also used in order to measure graded relevance over the 4-point assessment scale. NDCG considers not only the position of an expert candidate but also the candidate’s level of expertise:
\[
NDCG@n = \frac{DCG@n}{IDCG@n}
\]
where DCG@n is the Discounted Cumulative Gain accumulated over the graded expertise levels expLevel_k of the candidates at ranks k = 1, ..., n of the ranked list, and IDCG@n is the ideal DCG, that is, the DCG of the best possible ordering of the candidates.
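A minimal sketch of NDCG over graded expertise levels is shown below. The exact DCG form is not recoverable from the text, so this sketch assumes one widely used variant, with gain 2^expLevel - 1 and a log2(k + 1) rank discount; the integer encoding of the 4-point scale as 0 to 3 and the function names are also assumptions.

```python
import math


def dcg_at_n(levels, n):
    """DCG over the first n graded levels (assumed exponential gain, log2 discount)."""
    return sum(
        (2 ** level - 1) / math.log2(k + 1)
        for k, level in enumerate(levels[:n], start=1)
    )


def ndcg_at_n(ranked_levels, judged_levels, n):
    """NDCG@n: DCG of the ranking divided by the DCG of the ideal ordering.

    `ranked_levels` are the expertise levels in retrieved order;
    `judged_levels` are the levels of all judged candidates for the query.
    """
    ideal = dcg_at_n(sorted(judged_levels, reverse=True), n)
    if ideal == 0:
        return 0.0
    return dcg_at_n(ranked_levels, n) / ideal


# Hypothetical ranking that places a 'very expert' candidate second.
score = ndcg_at_n([0, 3], [3, 0], 2)
```

Unlike P@n and MAP, swapping a very expert candidate with a candidate who has only some expertise changes this score, which is why it complements the binary metrics.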