
6. A Case Study: Finding Right Answerers in CQA Sites

6.4. Evaluation Method

6.4.2. Evaluation Metric and Comparative Methods

The last step is to evaluate the answerer finding approaches using the constructed testing data. The evaluation procedure is as follows:

(1) Take a testing sample of a given expertise topic;

(2) Randomly select N users who do not have expertise in the topic from all the 24,285 users in the dataset (based on their specified expertise topics), and merge the testing user with the N users to form an answerer candidates pool;

(3) Build the testing user’s expertise model according to the answerer finding approach to be evaluated (note that the testing question is taken from the testing user’s answering history), and build other users’ expertise models using the UAH based approach;

(4) Rank the candidates based on the calculated similarity between the testing question and user expertise model.

Thus, the higher the ground truth user is ranked for a testing sample, the better the applied answerer finding approach performs. In constructing the candidate pool for a testing sample (step 2), only users who do not have expertise in the current topic are used. This reduces the effect of noise from similar users on the experimental results. If many users with expertise in the same topic were included in the candidate pool, those users could also be ranked highly in the result list. These similar users did not answer the testing question (i.e. they were not selected as ground truth answerers), but they are very likely capable of answering it (qualified answerers). Meanwhile, the selected ground truth user is also expected to be ranked at the top of the list. However, there is no effective way to distinguish the importance of those users from that of the ground truth user, so the ground truth user could repeatedly be ranked behind the similar users. This would obscure our observation of the effectiveness of different answerer finding approaches. Therefore, testing samples are selected by expertise topic, and the other users in the candidate pool are selected against the given topic.

In step 3, among the users in a candidate pool, the ground truth user’s expertise model is built using the applied AF approach (either the UAH based approach or the UET based approach), while those of all other users are built using the UAH based approach. The UAH based approach is first applied to all the users to observe its performance, i.e. the ranking of the ground truth user. Then, the ground truth user is assumed to be a cold start user, i.e. a user with no answering history but with expertise topics provided. In this case, the UET based approach is applied to build the ground truth user’s expertise model. It is expected that this approach can rank the ground truth user at a position as high as the UAH based approach does.
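As an illustration of steps 2 and 4, the following minimal Python sketch builds a candidate pool and reads off the position of the ground truth user. It is not the implementation used in the experiments: the function names, the `user_topics` mapping (user to set of specified expertise topics) and the `score` callable (standing in for the similarity between the testing question and a user's expertise model from step 3) are all hypothetical.

```python
import random

def build_candidate_pool(test_user, user_topics, topic, n, rng):
    """Step 2 (sketch): merge the testing user with n randomly chosen users
    whose specified expertise topics do not include `topic`.
    Requires at least n such users, otherwise rng.sample raises ValueError."""
    non_experts = sorted(
        u for u, topics in user_topics.items()
        if u != test_user and topic not in topics
    )
    return [test_user] + rng.sample(non_experts, n)

def rank_of_ground_truth(test_user, pool, score):
    """Step 4 (sketch): 1-based position of the testing user when the pool is
    sorted by descending similarity. `score` is a hypothetical callable that
    maps a user to a similarity value. Because Python's sort is stable and the
    testing user is placed first in the pool, ties resolve in that user's
    favour; this tie rule is an assumption of the sketch."""
    ranked = sorted(pool, key=score, reverse=True)
    return ranked.index(test_user) + 1
```

Passing a seeded `random.Random` instance as `rng` makes the sampled pools reproducible across runs.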

Specifically, the two types of AF approaches are implemented in the experiments as follows:

(1) User answering history based approach: Namely, the TF-IDF algorithm based approach as described in Section 6.2.1;

(2) User expertise topic based approach: To better understand the performance of the UET based approach, we experiment with its three variations. They are implemented following the steps described in Section 6.2.2 but with different settings:

Random expertise topics: Randomly select K expertise topics from all the 149 topics and then for each topic randomly select N/K questions categorised in that topic from the question pool to simulate the user’s answering history, where K is an integer randomly generated between 1 and the maximum number of expertise topics (obtained from the Quora profiles of all the users in the dataset);

Specified expertise topics: Of the expertise topics specified in the user’s Quora profile, for each topic randomly select N/K’ questions categorised in the topic from the question pool to simulate the user’s answering history, where K’ is the total number of expertise topics specified in the user’s profile (only the 149 topics are considered).

Inferred expertise topics: The same as the previous variation, except that the user’s expertise topics are inferred from the user’s Twitter content, using the inference approach proposed in Chapter 5.
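The three variations differ only in where the list of expertise topics comes from; the simulation of the answering history is the same in each. The Python sketch below illustrates this under stated assumptions: `questions_by_topic` (topic to list of questions), `n_questions` (the total written N in the descriptions above) and both function names are hypothetical, and N/K is rounded down with integer division, which the text does not specify.

```python
import random

def random_topics(all_topics, max_k, rng):
    """Random variation (sketch): draw K uniformly from 1..max_k (inclusive),
    then pick K distinct topics. `all_topics` must be a list with at least
    max_k entries."""
    k = rng.randint(1, max_k)
    return rng.sample(all_topics, k)

def simulate_history(topics, questions_by_topic, n_questions, rng):
    """Sample N/K questions per topic to stand in for an answering history.
    Integer division is an assumption; a topic with fewer questions than
    N/K contributes all of its questions."""
    per_topic = n_questions // len(topics)
    history = []
    for topic in topics:
        pool = questions_by_topic[topic]
        history.extend(rng.sample(pool, min(per_topic, len(pool))))
    return history
```

For the specified and inferred variations, `topics` would be the topics from the user's profile or from the inference step, respectively, passed directly to `simulate_history`.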

Finally, the ranking position of the ground truth user in the result list is used as the evaluation metric in the experiments. For example, if the user is ranked first in the result list, the metric value is 1. The smaller the metric value, the better the answerer finding approach performs. To further alleviate possible bias in the selection of candidate users, steps 2 to 4 are repeated M times for each testing sample, i.e. M different candidate pools are randomly generated, and the metric value for the testing sample is the average over the M candidate pools. Then, the metric values of all the samples for a topic are averaged to give the final metric value for that topic. In this way, different answerer finding approaches can be evaluated by expertise topic.
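The two levels of averaging can be written down in a few lines. The sketch below is only an illustration with hypothetical function names; it assumes the ranking positions have already been obtained for each of the M candidate pools of every testing sample.

```python
from statistics import mean

def sample_metric(ranks_over_pools):
    """Average ranking position of the ground truth user over the M pools
    generated for one testing sample."""
    return mean(ranks_over_pools)

def topic_metric(ranks_per_sample):
    """Final metric for a topic: the mean of the per-sample metrics.
    `ranks_per_sample` holds one list of M ranking positions per sample."""
    return mean(sample_metric(ranks) for ranks in ranks_per_sample)
```

For instance, with M = 5, a sample ranked at positions 1, 2, 3, 2, 2 has a sample metric of 2; combined with a second sample ranked first in every pool, the topic metric is 1.5.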

6.5. Experiments and Results

In the experiments, we set N = 100, i.e. 100 candidate users for each testing question, and M = 5, i.e. we repeat the test 5 times for each testing question. Figures 6-4 to 6-13 show the experimental results for each of the 10 testing expertise topics. It should be noted that, for more than 95% of the testing samples, the variance among the M runs is not significant (smaller than 5%).

Figure 6-4: Ranking accuracy of different answerer finding approaches for the topic “sales”


Figure 6-5: Ranking accuracy of different answerer finding approaches for the topic “machine learning”

Figure 6-6: Ranking accuracy of different answerer finding approaches for the topic “video games”


Figure 6-7: Ranking accuracy of different answerer finding approaches for the topic “music”

Figure 6-8: Ranking accuracy of different answerer finding approaches for the topic “recruiting”


Figure 6-9: Ranking accuracy of different answerer finding approaches for the topic “photography”

Figure 6-10: Ranking accuracy of different answerer finding approaches for the topic “philosophy”


Figure 6-11: Ranking accuracy of different answerer finding approaches for the topic “islam”

Figure 6-12: Ranking accuracy of different answerer finding approaches for the topic “startups”


Figure 6-13: Ranking accuracy of different answerer finding approaches for the topic “dating and relationships”