Semantic Link Analysis for Finding Answer Experts *


YAO LU^{1,2,3}, XIAOJUN QUAN^{2}, JINGSHENG LEI^{4}, XINGLIANG NI^{1,2,3}, WENYIN LIU^{2,3} AND YINLONG XU^{1,3}

^1 School of Computer Science and Technology, University of Science and Technology of China, Hefei, 230026 P.R. China

^2 Department of Computer Science, City University of Hong Kong, HKSAR, P.R. China

^3 Joint Research Lab of Excellence, CityU-USTC Advanced Research Institute, Suzhou, 215123 P.R. China

^4 School of Computer and Information Engineering, Shanghai University of Electronic Power, Shanghai, 200090 P.R. China

Recommending unanswered questions to answer experts is an important mechanism in User-Interactive Question Answering (UIQA) services: it helps reduce the asker's waiting time and obtain high-quality answers. In this paper, we address the task of identifying answer experts in UIQA services with semantic information extracted from user interaction behaviors. We first construct the user question-answer interaction graph through direct semantic links and latent links extracted from the records of question sessions and user profiles. Two expert-finding approaches are then developed by employing the semantic information in the so-called propagation link analysis method and in the language model, respectively. Experimental results on a Yahoo! Answers dataset show that the extracted semantic information improves the performance of both the propagation method and the language model for the task of answer expert finding.

Keywords: user-interactive question answering, answer expert finding, semantics, link analysis, language model

1. INTRODUCTION

User-Interactive Question Answering (UIQA) [1] is now a popular social network application in the age of Web 2.0. It provides a platform for people in online communities to seek information and share knowledge through questions and answers. People can ask for help by posting questions and help others by answering theirs. The question and answer content in UIQA communities lets users acquire the information they need in the form of answers rather than lists of documents from a search engine. However, in existing UIQA services, askers must passively wait for other users to visit the portal website, read the questions and provide answers. It may take several hours or even a few days before an asker receives any answer. Moreover, answerers differ in expertise: some users who provide answers may just want to earn incentives and may not be experts on the questions they answer. Therefore, automatically recommending unsolved questions to appropriate answer experts can be an important mechanism in UIQA services to reduce the asker's waiting time and improve answer quality. With the growing popularity of Q&A communities, finding answer experts has become one of the most important issues in UIQA services.

Existing work offers various measures for identifying experts in forum systems or UIQA services. Liu et al. [2] define an “answer expert” as a person who has previously answered questions similar to the new one in the system. Under this definition, expert finding can be cast as an information retrieval (IR) problem. However, this view ignores the question-answering relationships between users, which help improve the performance of expert finding. Some researchers therefore bring the user ask-answer interaction into the task. Jurczyk and Agichtein [3, 4] apply link structure-based algorithms, such as PageRank [5] and HITS [6], to rank answer experts. Each user obtains a score, and a higher score corresponds to more expertise. However, traditional link analysis treats the relationship between users as a plain question-answering link and gives every link equal weight. In practice, a high-quality answer and a spam answer should lead to different weights in the user question-answer interaction graph. Intuitively, a user who provides high-quality answers has more expertise than one who provides spam answers.

Received February 14, 2011; revised August 9, 2011; accepted August 31, 2011.

In this paper, we propose to find answer experts for each category by mining the semantic information in UIQA. A user question-answer interaction graph is first constructed from user interaction behaviours in UIQA. Different sources of semantic information are extracted from the user interaction behaviours and the answer contents in question sessions. After the graph is constructed, a link analysis approach called propagation [7] is applied to it, which yields the first expert-finding approach. The second approach, the semantic language model, incorporates the extracted semantic information into the traditional language model. Experimental results on a Yahoo! Answers collection demonstrate the effectiveness of the two expert-finding approaches and the usefulness of the extracted semantic information.

The remainder of the paper is organized as follows. Section 2 briefly reviews the related work on expert finding. Section 3 first gives the notations, the construction of user profiles, and the formal definition of the user question-answer interaction graph, and then introduces the details of the expert finding algorithms. Section 4 presents the evaluation of the two methods and discusses the results. Finally, section 5 draws the conclusion and discusses future work.

2. RELATED WORKS

The task of expert finding was first introduced in TREC 2005 [8]. It aims to find the most suitable persons with the appropriate skills and knowledge. Previous work on expert finding mainly relies on language models (LM) and treats the problem as an information retrieval (IR) task. Apart from this, link-based analysis has also been widely used. In this section, we briefly review the related work from three perspectives: expert finding based on language models, expert finding based on link analysis, and hybrids of the two approaches.


2.1 Language Model-based Expert Finding

Cao et al. [9] propose a two-stage language model for expert finding at the enterprise track of TREC 2005. The language model consists of two parts, a co-occurrence model and a relevance model, both of which are based on statistical language modelling. Liu et al. [2] define an “expert” in UIQA services as a person who has previously answered questions similar to a given one. A person's expertise is characterized by a user profile derived from the previously answered questions. Their experiments use three different language models and several ways of building user profiles. They conclude that user profiles built from all previously answered questions produce the best performance, and that the three language models achieve equivalent performance in the expert finding task. Zhang et al. [10] propose a mixture model for expert finding built on Probabilistic Latent Semantic Analysis (PLSA). The mixture model considers the latent topics between terms and documents and therefore contains more semantic information. Their experiments indicate that the mixture model with semantics achieves better performance than the conventional language model.

2.2 Link Analysis-based Expert Finding

Jurczyk and Agichtein [3, 4] adopt HITS for estimating user authority in the UIQA service. In their method, the asking and answering behaviours between users generate a user question-answer relation graph, in which each node represents a user and each edge represents the question-answering relationship between users. For each user, both the hub and authority value are calculated, where a high hub value means a good authority, otherwise a low authority. The experiment on a Yahoo! Answers dataset shows that link analysis is promising for estimating the authority of users in UIQA services. Zhang et al. [7] propose a propagation-based approach for finding experts in a social network. Their method takes both personal information and relationships among persons into consideration. They first use the personal information to calculate an initial expert score for each person. Next, according to the relationships among persons, the propagation approach runs iteratively to obtain a more accurate result.

2.3 Hybrid Approaches

Zhou et al. [11] propose a framework for effectively routing a given question to the top-k potential experts in a forum. They first calculate the expertise of users according to the content information. After that, they re-rank the users by utilizing the structural relations among users in the forum system. Finally, they integrate the results from the two steps in a probabilistic model and obtain a final ranking score for each user. Experiments on real forum data reveal that the hybrid approach is effective in identifying and ranking answer experts. Kao et al. [12] also propose a hybrid approach to find answer experts in a specific category in question answering websites. Their method takes user subject relevance, user reputation and authority into consideration. The subject relevance indicates the relevance of the user's domain knowledge to the target question. User reputation is calculated as the ratio of best answers the user provides, and user authority is derived by applying link analysis to the asker-answerer network. Their experimental results demonstrate that the hybrid approach performs better than other conventional methods.

3. ANSWER EXPERTS FINDING

In this section, we first describe how user profiles are constructed from the historical questions asked and answered by users in the Q&A community. Afterwards, we give a formal definition of the user question-answer interaction graph, and then construct the graph based on the question sessions and user profiles. In the graph construction procedure, two kinds of semantic information are considered. One is the direct relation link, which involves semantic information extracted from question sessions. The other is the latent link discovered from user profiles. After constructing the graph, a link analysis method called propagation [7] is employed to rank the experts in descending order. Users who earn a higher value are regarded as having a higher expertise level, and the top ones are chosen as answer experts. Furthermore, we give details of the semantic language model based on the extracted semantic information. The notations used in this paper are listed in Table 1.

3.1 User Profiles Construction

User profiles, which reflect a user's expertise background, are acquired from the question sessions the user was involved in. Constructing the user profile is a crucial step in expert finding. In a UIQA service, a user can post a question with a question subject and optional question details to start a question session. Other users browse the questions in the categories they are interested in and answer them. The question session is closed when the question is resolved or the lifetime of the question session expires. From this description, we can see that a whole question session contains information on (1) who posted the question; (2) the question subject, the question details and the category it belongs to; (3) the answerers and the answers they provided; (4) the best answer chosen for the question by the asker.

We build the profile of user $u_i$ for each category $c_k$, denoted as $UP_i^k$, from question texts only (the question subject and question details) of the questions the user previously asked and answered. Each user profile $UP_i^k$ contains two aspects:

(1) all the questions posted by user $u_i$ in the category $c_k$, denoted as $\vec{uq}_i^k$;

(2) all the questions answered by user $u_i$ in the category $c_k$, denoted as $\vec{ua}_i^k$.

The two parts of the user profile, $\vec{uq}_i^k$ and $\vec{ua}_i^k$, which have been pre-processed with stop word removal and word stemming, are represented using the Vector Space Model (VSM) as $(w_1, w_2, \ldots, w_i, \ldots, w_m)$, where $w_i$ is the weight for term $t_i \in T$ in the user's asked-questions profile or answered-questions profile, and $T$ is the vocabulary of all the terms in all the question and answer text in the system. The weight $w_i$ is measured by the TFIDF weighting scheme.
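The profile construction above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the text does not say which TFIDF variant is used, so raw term frequency times log(N / document frequency) is an assumption, and the function name `tfidf_profiles` and the token-list input format are hypothetical.

```python
import math
from collections import Counter


def tfidf_profiles(docs):
    """Build sparse TF-IDF vectors for a set of profiles.

    docs maps a profile id (for example a user's answered-questions
    profile in one category) to a list of tokens that have already been
    stop-word filtered and stemmed.
    Assumed variant: weight = raw term count * log(N / document frequency).
    """
    n = len(docs)
    # Document frequency: in how many profiles each term occurs.
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))
    profiles = {}
    for pid, tokens in docs.items():
        tf = Counter(tokens)
        profiles[pid] = {t: c * math.log(n / df[t]) for t, c in tf.items()}
    return profiles
```

Terms that occur in every profile receive weight 0 under this variant, so they do not contribute to later similarity computations.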


Table 1. Notations and their descriptions used in our approach.

Notation         Description
---------------  ------------------------------------------------------------
$UP_i^k$         User profiles of user $u_i$ in category $c_k$.
$\vec{uq}_i^k$   All the questions posted by user $u_i$ in the category $c_k$.
$\vec{ua}_i^k$   All the questions answered by user $u_i$ in the category $c_k$.
$G$              User question-answer interaction digraph.
$U$              All users in UIQA system.
$R$              All relationships between users in UIQA system.
$r_{ij}$         Relationship from user $u_i$ to $u_j$.
$sr_{ij}$        Semantic relation link between asker $u_j$ and answerer $u_i$.
$qid$            Unique identifier for each question $q$.
$qd_{qid}$       The difficulty level of question $qid$.
$ar_i$           Relevance score of the $i$th answer.
$aq_i$           Quality score of the $i$th answer.
$A_{qid}$        Answer set to the question $qid$.
$T_{qid}$        Date time at which question $qid$ was posted.
$T_{a_i}$        Date time at which the $i$th answer was submitted.
$T_{avg}$        Average time of the question being answered (measured in seconds).
$M_{qid}$        Unigram language model for question $qid$.
$M_{a_i}$        Unigram language model for the $i$th answer $a_i$.
$R_{BA}$         Asker rating for the best answer.
$V_{a_i}$        Voting score evaluation for the $i$th answer.
$N_{pos}$        Number of positive votes received.
$N_{neg}$        Number of negative votes received.
$N_{all}$        Number of total votes received.
$\lambda$        Parameter to determine the weight of best answer quality (set as 0.6).
$\theta$         Threshold for distinguishing latent links.
$U_c^Q$          All the questions the user answered in category $c$.
$w_i$            Weight of term $t_i$.
$\delta$         Threshold for distinguishing answer experts.
$S(u_i)^p$       Expert score of user $u_i$ in the $p$th iteration.
$\varepsilon$    Threshold for the terminal condition.
$sw_{u_i}^q$     Semantic weight of the answer provided by user $u_i$ to the question $q$.

3.2 User Question-Answer Interaction Graph

In UIQA services, user behaviours are reflected in different aspects. Generally speaking, a user plays three roles in the UIQA system: asker, answerer and evaluator. Users with different roles in the same question session “interact” with each other. These interaction relations are used to construct the user question-answer interaction graph. The graph is defined as a directed graph $G = (U, R)$, where $u \in U$ represents a user and $r_{ij} \in R$ represents the relationship from user $u_i$ to $u_j$. In traditional link analysis methods, the interaction relationship is only a simple link that carries no semantic information. Taking the different context and interaction information into consideration, we extract several kinds of semantic information for the link analysis algorithm. In the following two sub-sections, we elaborate the construction of the user question-answer interaction graph from two aspects: direct relation links and latent relation links.


3.3 Direct Relation Links

The direct user relationship links can be obtained directly from the UIQA system based on the information of each question session. For example, if user $u_i$ is the direct answerer of a question asked by $u_j$, there will be an edge from $u_i$ to $u_j$. However, this kind of link is a pure question-answer relationship without semantic information. Therefore, we propose a new semantic link to replace the pure question-answer link.

In a question session $qid$, the semantic relation link $sr_{ij}$ between asker $u_j$ and answerer $u_i$ is defined as a four-tuple $sr_{ij} = (qid, qd_{qid}, ar_i, aq_i)$, where $qid$ is the unique identifier of the question, $qd_{qid}$ indicates the difficulty level of question $qid$, and $ar_i$ and $aq_i$ represent the relevance and quality of the $i$th answer, given by user $u_i$, respectively. These kinds of semantic information cannot be obtained directly; instead, they are extracted from the context and other interaction information of the question session.

(1) Question Difficulty Levels

The difficulty of each question varies greatly. An easy question can be answered by many users, while a hard one may get few answers. Specifically, we also consider the answer time interval when measuring question difficulty: the average time taken by all the answers to the question is calculated, and the longer this time, the more difficult the question. The difficulty level $qd_{qid}$ of question $qid$ is calculated according to the following formulas:

$$qd_{qid} = \frac{1}{|A_{qid}| + 1} + \left(1 - e^{-T_{avg}}\right), \tag{1}$$

$$T_{avg} = \tau \cdot \frac{\sum_{a_i \in A_{qid}} \left(T_{a_i} - T_{qid}\right)}{|A_{qid}|}, \tag{2}$$

where $A_{qid}$ is the answer set corresponding to question $qid$; $T_{qid}$ is the date time at which question $qid$ was posted and $T_{a_i}$ is the date time at which the $i$th answer $a_i$ was submitted; $T_{avg}$ is the average time, in seconds, taken to answer the question; $\tau$ is a tuneable parameter, set to 1/3600 so that the exponential term does not drop too fast.
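The difficulty computation can be illustrated with a short sketch. It assumes Eq. (1) is read as qd = 1/(|A| + 1) + (1 - exp(-T_avg)) and that timestamps are given in seconds; the function names and argument layout are hypothetical.

```python
import math

TAU = 1.0 / 3600.0  # value of the tuneable parameter reported in the text


def average_answer_time(question_ts, answer_ts, tau=TAU):
    """Eq. (2): tau times the mean delay between the question and its answers."""
    if not answer_ts:
        raise ValueError("the answer set must not be empty")
    return tau * sum(t - question_ts for t in answer_ts) / len(answer_ts)


def question_difficulty(question_ts, answer_ts, tau=TAU):
    """Eq. (1) under the assumed reading: fewer and slower answers mean harder."""
    t_avg = average_answer_time(question_ts, answer_ts, tau)
    return 1.0 / (len(answer_ts) + 1) + (1.0 - math.exp(-t_avg))
```

With tau = 1/3600 the scaled delay is effectively measured in hours, so a question answered within minutes gets a time term close to 0 and one answered after many hours gets a time term close to 1.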

(2) Answer Relevance

Answer relevance reflects the degree of correlation between a question and its answers. Among the various answers, some might have low relevance or even be spam. Therefore, answer relevance is an important indicator of answer quality and of the answerer's expertise level. Here we use the KL-divergence language model [13] to calculate the relevance score between question $qid$ and its $i$th answer $a_i$:

$$ar_i = Relevance(a_i, qid) = e^{-KL(M_{a_i} \| M_{qid})}, \tag{3}$$

$$KL(M_{a_i} \| M_{qid}) = \sum_{w} p(w \mid M_{a_i}) \cdot \log \frac{p(w \mid M_{a_i})}{p(w \mid M_{qid})}, \tag{4}$$

where $M_{qid}$ and $M_{a_i}$ are the unigram language models for question $qid$ and the $i$th answer $a_i$, respectively; $KL(M_{a_i} \| M_{qid})$ represents the KL-divergence between $M_{qid}$ and $M_{a_i}$; $w$ ranges over the words that occur in both the question and the answers. The higher the KL-divergence, the lower the relevance score, and vice versa.
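A minimal sketch of the relevance score in Eqs. (3) and (4) follows. The text does not describe how the unigram models are estimated, so additive smoothing over the union of question and answer words is an assumption made here to keep every probability non-zero; the function names and the parameter `mu` are hypothetical.

```python
import math
from collections import Counter


def unigram_model(tokens, vocab, mu=1.0):
    """Additively smoothed unigram model over vocab (smoothing is an assumption)."""
    counts = Counter(tokens)
    total = len(tokens) + mu * len(vocab)
    return {w: (counts[w] + mu) / total for w in vocab}


def answer_relevance(answer_tokens, question_tokens):
    """Eq. (3): exp(-KL(M_a || M_q)), with KL computed as in Eq. (4)."""
    vocab = set(answer_tokens) | set(question_tokens)
    m_a = unigram_model(answer_tokens, vocab)
    m_q = unigram_model(question_tokens, vocab)
    kl = sum(m_a[w] * math.log(m_a[w] / m_q[w]) for w in vocab)
    return math.exp(-kl)
```

Because the KL-divergence is non-negative, the score lies in (0, 1] and equals 1 only when the two models coincide.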

(3) Answer Quality

Answer quality is another indicator of an answerer's expertise level. Generally, the best answer chosen by the asker should have a higher quality than the others. In addition, other users usually act as evaluators by voting on answers, so the voting information also reflects answer quality. We adopt Eq. (5) to calculate the answer quality $aq_i$ for the $i$th answer $a_i$ of question $qid$.

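$$aq_i = \begin{cases} \lambda \times \dfrac{R_{BA} + 2}{5} + V_{a_i}, & \text{if } a_i \text{ is the best answer} \\[2ex] \dfrac{1 - \lambda}{|A_{qid}| - 1} + V_{a_i}, & \text{otherwise} \end{cases} \tag{5}$$

$$V_{a_i} = \frac{1}{|A_{qid}|} + \frac{N_{pos} - N_{neg}}{N_{all}}, \tag{6}$$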

where $R_{BA}$ is the asker rating for the best answer; $\lambda$ is a parameter that determines the weight of the best answer, set to 0.6 in this paper to make sure that the best answer gets the maximum value among the answers to the question; $V_{a_i}$ is the voting score for $a_i$ derived by Eq. (6), in which $N_{pos}$, $N_{neg}$ and $N_{all}$ represent the number of positive, negative and total votes that answer $a_i$ received. If there is no vote for the answer, the second term of the sum in Eq. (6) is 0.

After extracting all the useful semantic information from all question sessions in one category, we can build the user interaction semantic links in the user question-answer interaction graph.

3.4 Discovering Latent Relation Links

So far, the relationship link between users has been only the direct question-answer relationship. However, since users answer questions randomly in the Q&A community, considering only the direct question-answer relationship may not reflect the real relationship between users. For example, if a user C has answered questions similar to user B's, or could answer B's question, but this relationship is not explicitly recorded, there will be no link between users B and C. Since latent links cannot be obtained directly from the question sessions as direct links can, we propose to discover them from user profiles, in the manner of association relations between users [14].

Latent link analysis [15] is used to find the latent relationship links among users who have no direct links. The basic idea for finding the latent link between two users $u_i$ and $u_j$ (i.e., whether user $u_i$ is a latent answerer of user $u_j$) is to measure the similarity between $u_i$'s answered-question profile $\vec{ua}_i^k$ and $u_j$'s asked-question profile $\vec{uq}_j^k$ in category $c_k$. The similarity is denoted as $Latent\_Relation(u_i, u_j \mid c_k)$ and can be calculated using the cosine measure as follows,

$$Latent\_Relation(u_i, u_j \mid c_k) = Sim(\vec{ua}_i^k, \vec{uq}_j^k) = \frac{\vec{ua}_i^k \cdot \vec{uq}_j^k}{|\vec{ua}_i^k| \, |\vec{uq}_j^k|}, \tag{7}$$

where $\vec{ua}_i^k$ and $\vec{uq}_j^k$ are the user profile vectors introduced in section 3.1. If the latent relation link score between $u_i$ and $u_j$ is higher than the threshold $\theta$, we consider that there is a latent question-answer relationship link from $u_i$ to $u_j$, and we add an edge from $u_i$ to $u_j$ in the user question-answer interaction graph.
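The latent link test in Eq. (7) can be sketched as below, assuming profiles are stored as sparse term-to-weight dictionaries. The names `cosine` and `latent_links`, the dictionary layout, and the default threshold (taken from the value the experiments report as best) are illustrative assumptions; the sketch also does not filter out pairs that already have a direct link.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors given as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def latent_links(answered, asked, candidates, theta=0.01):
    """Return (u_i, u_j, score) for candidate answerers u_i and askers u_j.

    answered maps a user to the answered-questions vector, asked maps a
    user to the asked-questions vector; a link is kept when the cosine
    score exceeds theta.
    """
    links = []
    for ui in candidates:
        for uj, asked_vec in asked.items():
            if ui == uj:
                continue
            score = cosine(answered.get(ui, {}), asked_vec)
            if score > theta:
                links.append((ui, uj, score))
    return links
```

Restricting the outer loop to candidate experts keeps the number of comparisons at (number of candidates) x (number of users) rather than quadratic in the number of users.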

Since there are a great number of users in each category, it is time consuming to discover the latent relationships between all pairs of users. In this paper, we only identify the latent answer links from candidate experts to all users. The candidate experts are estimated by a co-occurrence language model. Each candidate's expertise score is estimated by the probability of a user $u$ being an expert for a given category $c$, i.e., $P(u \mid c)$. It can be calculated as the sum of the probabilities of user $u$ being an expert for each of the questions $U_c^Q$ he answered in category $c$, as Eq. (8) describes.

$$P(u \mid c) = \sum_{q \in U_c^Q} P(u \mid q) \tag{8}$$

Based on the co-occurrence model proposed in [9], $P(u \mid q)$ can be calculated as Eq. (9),

$$P(u \mid q) = \sum_{t \in T} P(u, t \mid q) = \sum_{t \in T} P(t \mid q) \cdot P(u \mid t, q), \tag{9}$$

where $T$ represents the vocabulary of all the terms. $P(t \mid q)$, which indicates the relevance between $t$ and $q$, can be calculated by Eq. (10). After that, Eq. (9) can be simplified as Eq. (11).

$$P(t \mid q) = \begin{cases} 0, & \text{if } t \notin q \\ 1, & \text{if } t \in q \end{cases} \tag{10}$$

$$P(u \mid q) = \sum_{t \in q} P(u \mid t, q) = \sum_{t \in q} P(u \mid t). \tag{11}$$

From the above, we estimate the user expert score as in Eq. (12).

$$P(u \mid c) = \sum_{q \in U_c^Q} P(u \mid q) = \sum_{q \in U_c^Q} \sum_{t \in q} P(u \mid t) \tag{12}$$

Based on Bayes' rule, we can calculate the conditional probability of user $u$ given term $t$ for each question $q$ in category $c$ as Eq. (13) shows, where $P(u, t)$ represents the co-occurrence probability of user $u$ and term $t$, and $P(t)$ indicates the probability of term $t$ occurring in all user profiles of category $c$.

$$P(u \mid t) = \frac{P(u, t)}{P(t)} \tag{13}$$

Afterwards, on the basis of the initial expert scores obtained, we rank all the users in descending order and choose the top $\delta$ as answer expert candidates, following the conclusion of Bouguessa et al. [16]. The parameter setting is discussed in section 4.2.
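The candidate selection in Eqs. (8) to (13) can be sketched as follows. The text does not state how P(u, t) and P(t) are estimated, so this sketch assumes simple relative term counts over the profiles of one category, in which case P(u, t) / P(t) reduces to the share of all occurrences of t that fall in u's profile. Treating each question as a set of distinct terms, and the names `candidate_scores` and `top_candidates`, are likewise assumptions.

```python
import math
from collections import Counter


def candidate_scores(profiles, answered):
    """Estimate P(u | c) as the sum over answered questions q and terms t in q of P(u | t).

    profiles maps a user to the token list of the user's profile in the
    category; answered maps a user to a list of questions, each a token list.
    """
    term_total = Counter()
    for tokens in profiles.values():
        term_total.update(tokens)
    scores = {}
    for user, questions in answered.items():
        own = Counter(profiles.get(user, []))
        score = 0.0
        for question in questions:
            for term in set(question):
                if term_total[term] > 0:
                    # P(u | t) = P(u, t) / P(t) under count-based estimates.
                    score += own[term] / term_total[term]
        scores[user] = score
    return scores


def top_candidates(scores, delta=0.01):
    """Keep the top delta fraction of users (at least one) as candidates."""
    if not scores:
        return []
    k = max(1, math.ceil(delta * len(scores)))
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

The scores are not normalised probabilities; only their ranking is used to pick the candidate set.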

After identifying the latent links, we obtain a new user question-answer interaction graph with both direct semantic links and latent links. An example of the graph is presented in Fig. 1.

Fig. 1. Example of user question-answer interaction graph with both direct semantic links and latent links.

3.5 Semantic Propagation for Finding Answer Experts

After generating the whole user question-answer interaction graph with direct semantic links and latent links, we employ a propagation-based algorithm [7] to rank answer experts for each category. The basic idea underlying the propagation method is that a user has a higher expertise level if he answers many experts' questions.

In the user question-answer interaction graph generated for each category $c_k$, we use different weights to represent the importance of different kinds of relationships. The weight of each link indicates how well the expert score of a user propagates to its neighbours and back. At the beginning, the expert score of each user in the graph is set to 1. The propagation process runs in iterations. In each iteration, the expert score of each user is calculated from his own expert score and those of his neighbours in the previous iteration. After that, each expert score is normalized by dividing it by the maximal expert score of the current iteration. The expert score $S(u_i)^{p+1}$ of user $u_i$ in iteration $p+1$ is computed from the scores $S(\cdot)^p$ of iteration $p$ as follows,

$$S(u_i)^{p+1} = S(u_i)^p + \sum_{u_j \in U} \sum_{e \in r_{ij}} w((u_i, u_j), e) \cdot S(u_j)^p, \tag{14}$$

where $w((u_i, u_j), e)$ represents the propagation coefficient and $e \in r_{ij}$ is one kind of relationship from user $u_i$ to $u_j$; $U$ stands for all neighbouring nodes whose questions are answered by $u_i$ in the graph; $r_{ij}$ stands for the two relationships from user $u_i$ to $u_j$, i.e., the direct semantic question-answer relation link and the latent question-answer relation link. The propagation coefficient is therefore the weight of the edge in the user question-answer interaction graph, which is calculated as Eq. (15) shows.

$$w((u_i, u_j), e) = \begin{cases} qd_{qid} \times (ar_i^2 + aq_i^2)/2, & \text{if } e \text{ is a direct semantic link} \\ Latent\_Relation(u_i, u_j \mid c_k), & \text{if } e \text{ is a latent link} \end{cases} \tag{15}$$

The propagation stops when the maximal change of the expert scores falls below a threshold $\varepsilon$ (set here to 0.001). Based on the propagation theory introduced in [17], each expert score converges to a constant value. After the propagation, new expert scores for the users in each category are obtained. Sorting the scores in descending order gives the new expert ranking for each category.
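The iteration described in this section can be sketched as follows, assuming the edge weights of Eq. (15) have already been computed. The edge representation, the function name `propagate` and the `max_iter` safeguard are assumptions added for illustration and are not part of the described method.

```python
def propagate(edges, eps=0.001, max_iter=1000):
    """Iterate Eq. (14) with max-normalisation until scores change by less than eps.

    edges maps a pair (u_i, u_j), meaning answerer u_i -> asker u_j, to a
    list of link weights (direct semantic and/or latent) between them.
    Returns a dict mapping each user to the final expert score.
    """
    users = {u for pair in edges for u in pair}
    if not users:
        return {}
    scores = {u: 1.0 for u in users}  # every user starts with score 1
    for _ in range(max_iter):
        new = dict(scores)
        for (ui, uj), weights in edges.items():
            # The answerer gains score from each asker whose question is answered.
            new[ui] += sum(weights) * scores[uj]
        top = max(new.values())
        new = {u: s / top for u, s in new.items()}
        change = max(abs(new[u] - scores[u]) for u in users)
        scores = new
        if change < eps:
            break
    return scores
```

For example, `propagate({("a", "b"): [0.5]})` ranks the answerer `a` above the asker `b`; sorting the returned dictionary by value in descending order gives the expert ranking for the category.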

3.6 Semantic Language Model

The language model (LM) based approach to finding answer experts measures a user's expertise level mainly by the term occurrence between user profiles and the questions they answered. To the best of our knowledge, semantic information extracted from question sessions has not been used in LM-based approaches before. Therefore, we propose the semantic language model (SLM), which incorporates the proposed semantic information into the traditional language model. Specifically, SLM weights the probability $P(u \mid q)$ in Eq. (8) by the semantic weight mentioned in Eq. (15). In SLM, $P(u_i \mid c)$ is calculated according to Eqs. (16) and (17),

$$P(u_i \mid c) = \sum_{q \in U_c^Q} sw_{u_i}^q \cdot P(u_i \mid q), \tag{16}$$

$$sw_{u_i}^q = qd_{qid} \times (ar_i^2 + aq_i^2)/2, \tag{17}$$

where $sw_{u_i}^q$ is the semantic weight of the answer provided by user $u_i$ to question $q$.
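A small sketch of the SLM score follows. It assumes Eq. (17) is read as sw = qd x (ar^2 + aq^2) / 2 and that the per-question probabilities P(u | q) come from the co-occurrence model; the function names and the tuple layout are hypothetical.

```python
def semantic_weight(qd, ar, aq):
    """Eq. (17) under the assumed reading: difficulty times the mean of the squared scores."""
    return qd * (ar ** 2 + aq ** 2) / 2.0


def slm_score(answers):
    """Eq. (16): semantic-weighted sum of P(u | q) over a user's answered questions.

    answers is an iterable of (qd, ar, aq, p_u_given_q) tuples, one per
    question the user answered in the category.
    """
    return sum(semantic_weight(qd, ar, aq) * p for qd, ar, aq, p in answers)
```

Setting every semantic weight to 1 recovers the unweighted sum of Eq. (8), which is the baseline language model score.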

4. EXPERIMENTS AND EVALUATION

To evaluate the performance of the proposed methods, five different evaluation criteria are introduced in section 4.1. After that, we choose the best parameter setting in section 4.2. Finally, we compare our proposed methods with the baseline methods and discuss the results in section 4.3.

4.1 Evaluation Criteria

In our experiment, we evaluate the effectiveness of the proposed answer expert finding methods on the Yahoo! Answers dataset provided by Liu et al. [18]. The experiment is conducted on the whole dataset and aims to find experts for each category. For evaluation, we obtain an expert rank list for each category as the ground truth, based on the scoring rules of the Yahoo! Answers portal. In this expert rank list, the higher the score a user obtains, the more expertise he has. We choose the top $\delta$ users as the experts in the category. The evaluation uses the following five metrics from the expert finding task of the TREC Enterprise Track: Mean Average Precision (MAP), Mean Reciprocal Rank (MRR), Precision@N (P@N), R-Precision and bpref [19].
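For reference, three of these metrics can be computed per category as in the sketch below, using their standard definitions; MAP and MRR are then the means of the per-category average precision and reciprocal rank. The function names and the list/set input format are illustrative, and R-Precision and bpref are omitted.

```python
def precision_at_n(ranked, relevant, n):
    """Fraction of the top-n ranked users that are ground-truth experts."""
    return sum(1 for u in ranked[:n] if u in relevant) / n


def average_precision(ranked, relevant):
    """Mean of the precision values at the ranks where an expert appears."""
    hits, total = 0, 0.0
    for rank, user in enumerate(ranked, start=1):
        if user in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def reciprocal_rank(ranked, relevant):
    """Inverse rank of the first ground-truth expert, or 0 if none is ranked."""
    for rank, user in enumerate(ranked, start=1):
        if user in relevant:
            return 1.0 / rank
    return 0.0
```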


4.2 Parameters Setting

In this section, we discuss how to set the parameters of the proposed method. The cutoff value for identifying answer expert candidates is an important parameter in the first step of discovering latent links. Taking the category “Books & Authors” as an example, the distribution of the number of best answers users received is shown in Fig. 2. As observed in the figure, most users receive only 1 to 5 best answers. According to the conclusion in [16], authoritative answerers account for only about 0.6%-0.7% of the total users in each category. Therefore, both for determining expert candidates and for identifying experts in the final expert ranking, we set the parameter $\delta$ to 1%.

Another important parameter is the threshold $\theta$ for distinguishing latent links. In the process of discovering latent relationship links, links with lower weight have a weaker impact on the link analysis, so choosing an appropriate threshold is critical. For this evaluation, we test values of $\theta$ from 0 to 0.5. As shown in Fig. 3, expert finding obtains the best performance when $\theta$ is set to 0.01. As the value of $\theta$ increases further, fewer latent links are discovered and the performance goes down, which indicates that latent relationship links are effective in the expert finding task. The chosen value of $\theta$ also demonstrates that not all latent links are helpful for the expert finding task: some of them may introduce noise. Therefore, an appropriate threshold for distinguishing latent links is important for answer expert finding.

Fig. 2. Histogram of the statistic of user best answer count in category of “Books & Authors”.

Fig. 3. Performance evaluation of different values of parameter θ.

4.3 Evaluation of Semantic Propagation and Semantic Language Model

We compare our proposed approach, which considers different kinds of interaction information between users, with the traditional link analysis method (baseline) in the task of expert finding. In the traditional link analysis method (TL), we consider only the direct ask-answer relation between users, in which every link has equal weight. In contrast, the direct semantic link method (DSL) incorporates the semantic information extracted from user interaction into the weight of each link. Latent link analysis (LL), as a special semantic link, reflects potential links between users extracted from their answer contents.


considering the semantic information extracted from the user interaction effectively improves the precision of answer expert finding. Compared with the traditional link analysis method, the direct semantic link method improves the precision of the answer expert finding task. After the latent links are incorporated into the direct semantic link method, the precision of answer expert identification improves further. From the results and analysis, we find that the expertise level of a user depends on the difficulty of the questions he/she has answered and on the relevance and quality of the answers he/she has provided. The most desirable expert is therefore a person who answers many difficult questions and provides many high-quality answers. In addition, taking the latent link relation into account can further improve the accuracy.

To evaluate the effectiveness of the semantic language model (SLM), we choose the LM-based approach used to generate answer expert candidates in section 3.4 as the baseline method and compare it with SLM. The experimental result is shown in Fig. 5. From the figure, we can see that the LM-based approach to expert finding is also improved after incorporating the semantic information. Since the semantic information is obtained from question sessions and user profiles in UIQA, it can be regarded as important background knowledge. It is therefore reasonable to see traditional methods for the answer expert finding task improve when such background knowledge is included.

Fig. 4. Performance evaluation of comparing different ranking methods.

Fig. 5. Comparison of semantic language model and the traditional language model approaches for answer expert finding task.

5. CONCLUSION AND FUTURE WORK

In this paper, we introduce two approaches, semantic propagation and the semantic language model, for the answer expert finding task. Both combine different kinds of semantic information extracted from user interaction in UIQA systems. Experimental results on the Yahoo! Answers collection demonstrate the effectiveness of the two expert-finding approaches and provide evidence of the usefulness of the extracted semantic information. Since the semantic information used in this paper is calculated straightforwardly from the question sessions, different aspects of the semantic information can be studied further in the future.


REFERENCES

1. W. Y. Liu, T. Y. Hao, W. Chen, and M. Feng, “A web-based platform for user-interactive question-answering,” World Wide Web: Internet and Web Information Systems, Vol. 12, 2009, pp. 107-124.

2. X. Y. Liu, W. B. Croft, and M. Koll, “Finding experts in community-based question answering services,” in Proceedings of ACM 14th Conference on Information and Knowledge Management, 2005, pp. 315-316.

3. P. Jurczyk and E. Agichtein, “Hits on question answer portals: Exploration of link analysis for author ranking,” in Proceedings of the 30th Annual International ACM SIGIR Conference, 2007, pp. 845-846.

4. P. Jurczyk and E. Agichtein, “Discovering authorities in question answer communities by using link analysis,” in Proceedings of ACM 17th Conference on Information and Knowledge Management, 2007, pp. 919-922.

5. L. Page, S. Brin, R. Motwani, and T. Winograd, “The PageRank citation ranking: bringing order to the web,” Stanford Digital Library, working paper SIDL-WP-1999-0120, 1999.

6. J. M. Kleinberg, “Authoritative sources in a hyperlinked environment,” in Proceedings of the 9th Annual ACM-SIAM Symposium on Discrete Algorithms, 1998, pp. 668-677.

7. J. Zhang, J. Tang, and J. Z. Li, “Expert finding in a social network,” in Proceedings of the 12th International Conference on Database Systems for Advanced Application, 2007, pp. 1066-1069.

8. N. Craswell, A. P. de Vries, and I. Soboroff, “Overview of the TREC-2005 enterprise track,” in Proceedings of the 14th Text REtrieval Conference, NIST Special Publication: SP 500-266, 2005.

9. Y. B. Cao, J. J. Liu, S. H. Bao, and H. Li, “Research on expert search at enterprise track of TREC 2005,” in Proceedings of the 14th Text REtrieval Conference, NIST Special Publication: SP 500-266, 2005.

10. J. Zhang, J. Tang, L. Liu, and J. Z. Li, “A mixture model for expert finding,” in Proceedings of the 12th Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2008, pp. 466-478.

11. Y. H. Zhou, G. Cong, B. Cui, C. S. Jensen, and J. J. Yao, “Routing questions to the right users in online communities,” in Proceedings of the 25th International Conference on Data Engineering, 2009, pp. 700-711.

12. W. C. Kao, D. R. Liu, and S. W. Wang, “Expert finding in question-answering websites: A novel hybrid approach,” in Proceedings of the 25th ACM Symposium on Applied Computing Conference, 2010, pp. 867-871.

13. B. X. Wang, X. L. Wang, C. J. Sun, B. Q. Liu, and L. Sun, “Modelling semantic relevance for question-answer pairs in web social communities,” in Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, 2010, pp. 1230-1238.

14. X. F. Luo, Z. Xu, J. Yu, and X. Chen, “Building association link network for semantic link on web resources,” IEEE Transactions on Automation Science and Engineering, Vol. 8, 2011, pp. 482-494.

15. ... finding in user-interactive question answering services,” in Proceedings of the 5th International Conference on Semantic, Knowledge and Grid, 2009, pp. 54-59.

16. M. Bouguessa, B. Dumoulin, and S. R. Wang, “Identifying authoritative actors in question-answering forums: The case of Yahoo! Answers,” in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2008, pp. 866-874.

17. P. F. Felzenszwalb and D. P. Huttenlocher, “Efficient belief propagation for early vision,” International Journal of Computer Vision, Vol. 70, 2006, pp. 41-54.

18. Y. Liu, J. Bian, and E. Agichtein, “Predicting information seeker satisfaction in community question answering,” in Proceedings of the 31st Annual International ACM SIGIR Conference, 2008, pp. 483-490.

19. C. Buckley and E. M. Voorhees, “Retrieval evaluation with incomplete information,” in Proceedings of the 27th Annual International ACM SIGIR Conference, 2004, pp. 25-32.

Yao Lu (路遙) is currently a Ph.D. student in the School of Computer Science and Technology, University of Science and Technology of China. He joined the collaborative Ph.D. education scheme with the City University of Hong Kong in 2008. He received his B.S. degree from the Anhui Agriculture University in 2007. His research interests include information retrieval, machine learning, data mining and question answering.

Xiaojun Quan (權小軍) is currently a Ph.D. student in the Department of Computer Science, City University of Hong Kong. He received the B.E. degree in Computer Science from the Chang’an University in 2005 and the M.E. degree in Computer Science from University of Science and Technology of China in 2008. His research interests include data mining, information retrieval, question answering and anti-phishing.

Jingsheng Lei (雷景生) is currently a professor with the School of Computer and Information Engineering, Shanghai University of Electronic Power. His research interests include web information retrieval, machine learning, data mining, and cloud computing. He received his B.S. in Mathematics from Shanxi Normal University in 1987, and M.S. and Ph.D. in Computer Science from Xinjiang University in 2000 and 2003, respectively. Currently, he is leading a group of research students doing research on cloud computing.

Xingliang Ni (倪興良) is a Ph.D. student in the School of Computer Science and Technology at University of Science and Technology of China. He has also joined the collaborative Ph.D. education scheme with the City University of Hong Kong. He received his B.S. degree from the Hefei University of Technology in 2006. His research interests include information retrieval, machine learning and natural language processing.

Wenyin Liu (劉文印) is an assistant professor in the computer science department at the City University of Hong Kong. Before that, he was a full-time researcher at Microsoft Research China/Asia. His research interests include question answering, anti-phishing, graphics recognition, and performance evaluation. He has a B.Eng. and M.Eng. in computer science from Tsinghua University, Beijing and a DSc from the Technion, Israel Institute of Technology, Haifa. In 2003, he was awarded the International Conference on Document Analysis and Recognition Outstanding Young Researcher Award by the International Association for Pattern Recognition (IAPR). He was TC10 chair of IAPR from 2006 to 2010 and has been a guest professor of University of Science and Technology of China (USTC) since 2005. He is a Fellow of IAPR and a senior member of IEEE.

Yinlong Xu (許胤龍) received his B.S. in Mathematics from Peking University in 1983, and M.S. and Ph.D. in Computer Science from University of Science and Technology of China (USTC) in 1989 and 2004, respectively. He is currently a professor with the School of Computer Science and Technology at USTC. Prior to that, he served the Department of Computer Science and Technology at USTC as an assistant professor, a lecturer, and an associate professor. Currently, he is leading a group of research students in networking and high-performance computing research. His research interests include network coding, wireless networks, combinatorial optimization, design and analysis of parallel algorithms, parallel programming tools, etc. He received the Excellent Ph.D. Advisor Award of Chinese Academy of Sciences in 2006.