Expertise Inference with Topic Relation Regularization

4. Topical Expertise Inference on Twitter: Integrating Sentiment and Topic

4.2. Sentiment-Weighted and Topic Relation-Regularized Learning for User

4.2.3. Expertise Inference with Topic Relation Regularization

This section details the construction of the complete Sentiment-weighted and Topic Relation-regularized Learning (SeTRL) model. To take the topic relatedness into consideration in the process of expertise inference, it first needs to construct a model that can simultaneously infer a user's expertise on multiple topics. As presented in the previous section, the problem of expertise inference is formulated as a classification problem.

Thus, for each topic t, we can easily learn a separate predictive model based on linear regression, and predict the label of user i on topic t as follows:

𝑦_𝑡𝑖 = 𝑓_𝑡(𝒙𝒊) = 𝒙𝒊𝒘𝒕 (4.1)

where xi is the feature vector of user i; wt is model parameter vector that represents the

linear mapping function ft of topic t. By using linear regression, a base model for jointly

learning the expertise of users on multiple expertise topics can be formulated as learning an optimal solution W = [w1,w2,…,wT] by solving the following minimization problem:

min 𝑊 ∑ ∑ ℓ(𝑦𝑡𝑖, 𝑓𝑡(𝒙𝒊)) 𝑇 𝑡=1 𝑁 𝑖=1 + 𝛽||𝑾||₁ (4.2)

where ||W||1 denotes the l1 norm of matrix W which is used to address the model sparsity

problem and stabilise Eq. (2) [Ti96] and β is the regularization parameter to control the sparsity; ℓ is the least square loss function for regression:

ℓ(𝑦𝑡𝑖, 𝑓𝑡(𝑥𝑖)) =

2(𝑦𝑡𝑖− 𝒙𝒊𝒘𝒕)

2_(4.3)

which is equivalent to a linear discriminant analysis for binary classification [Fr01a] and has been widely applied in various classification algorithms [Ri03].

Thus, a base model has been built that can simultaneously learn the predictive models of all expertise topics. However, this optimization model simply assumes that each inference task selects its own relevant features individually, which might be unrealistic in practice. For example, the inference of expertise topics "data science" and "machine learning" tend to share a common set of relevant features but these features are less likely to be useful for the topic "gardening". As also discussed in Section 4.1, expertise topics of a person could be related to each other, e.g. C language and Java programming language. This relatedness of topics could reflect the fact that they share common discriminant features. This observation propels us to characterise the relatedness of topics by an undirected graph G with E edges and encode this structure information in the base model by using a Tikhonov regularizer [Go99]:

∑ ||𝒘_𝒆(𝟏)− 𝒘_𝒆(𝟐)||₂2 𝐸

𝑒=1

where we(1) is the model parameters corresponding to a topic of edge e,where an edge

connects two related topics. This regularizer can penalize the difference of two inference tasks that have an edge between them, through which we can incorporate the topic relatedness in the model leaning process. Two specific methods of constructing the topic relation graph G are introduced in Section 4.2.5.

Now, the SeTRL model can be constructed based on the above formulation. By combining Eqs. (4.2)-(4.4), we have the following optimization problem:

min 𝑾 ∑ ∑ 1 2(𝑦𝑡𝑖 − 𝒙𝒊𝒘𝒕) 2 𝑇 𝑡=1 𝑁 𝑖=1 + 𝛼 ∑ ||𝒘_𝒆(𝟏)− 𝒘_𝒆(𝟐)||₂2 𝐸 𝑒=1 + 𝛽||𝑾||₁ (4.5)

where α is the regularization parameter to control the contribution of the topic relation information.

4.2.4. Model Learning and Prediction

The next issue to examine is how to learn the SeTRL model, i.e. to estimate a parameter configuration of W that minimizes the objective function Eq. (4.5). Because the l1 norm

term in Eq. (4.5) (i.e. ||W||1) is not differentiable, the gradient descent method is not

suitable for the model learning. In this work, we use the Accelerated Proximal Gradient (APG) method [Ji09] to solve the optimization problem. Different from the traditional gradient method that uses the latest point in every iteration, APG takes a linear combination of the previous two points as the search point to achieve high convergence speed.

Specifically, we first decompose the objective function Eq. (4.5) to a non-smooth part: 𝑔(𝑾) = 𝛽||𝑾||₁ (4.6) and a smooth convex part:

ℎ(𝑾) = ∑ ∑1 2(𝑦𝑡𝑖− 𝒙𝒊𝒘𝒕) 2 𝑇 𝑡=1 𝑁 𝑖=1 + 𝛼 ∑ ||𝒘_𝒆(𝟏)− 𝒘_𝒆(𝟐)||₂2 𝐸 𝑒=1 (4.7).

Then, the generalised gradient updating step can be approximately expressed by: 𝑹_𝒔(𝑾, 𝑾_(𝒎)) = 𝒉(𝑾_(𝒎)) +〈𝑾 − 𝑾_(𝒎), 𝜵𝒉(𝑾_(𝒎))〉 + 𝒔

𝟐||𝑾 − 𝑾(𝒎)||𝑭

𝑟𝑠(𝑾(𝒎)) = 𝑎𝑟𝑔 𝑚𝑖𝑛_𝑾 𝑅𝑠(𝑾, 𝑾(𝒎)) (4.9)

where ||·||F denotes the Frobenius norm; 𝜵𝒉(𝑊_(𝒎))is the gradient of h(W) at the point

𝑊_(𝒎) of the mth_{iteration and}_〈A_,_B〉_{= tr(}_AT_B_{) denotes the inter product of matrixes;}_s

is the iteration step size. After rewriting Eq. (4.9), we can have: rs(𝑾(𝒎)) = 𝑎𝑟𝑔 𝑚𝑖𝑛_𝑾 1 2||𝑾 − 𝑽(𝒎)||2 2₊1 𝑠𝑔(𝑾) (4.10) where V(m) = W(m) - 1

𝑠 𝛁𝒉(𝑊(𝒎))and s can be determined by using line search.

The key aspect of the APG method is how to efficiently update the parameter matrix W

in each iteration. Finally, we adopt the strategy proposed in [Ch09] to further optimize the problem of Eq.(4.10). The gradient update rule of Eq.(4.10) is decomposed to K sub- problems, for which the analytical solutions can be easily obtained. In addition, instead of performing gradient descent based on W(m), the search point is computed as:

𝑼_(𝒎) = 𝑾_(𝒎)+ 𝜎_𝑚(𝑾_(𝒎)− 𝑾_{(𝒎−𝟏)}) (4.11) where σm = (1-λm-1) λm/ λm-1 and λm = 2/(m+3). In Algorithm 4-2, we summarize the

detailed optimization procedure.

Algorithm 4-2 APG Algorithm for Learning SeTRL Initialization: s0 > 0, η >1, W(0) ∈ RK×T, U(0) = W(0) and

λ0 = 1, m = 0

Output: W(m)

1. form = 0, 1, 2,... until convergence of W(m):

2. Set s = sm; 3. while (g(rs(U(m)))+ h(rs(U(m))) > Rs(rs(U(m)),U(m)): 4. Set s = ηs; 5. endwhile 6. Set sm+1 = s; 7. 𝑾(𝒎+𝟏) = arg min 𝑾 𝑹𝑠𝑚+1(𝑾, 𝑼(𝒎)); 8. λm+1 = 2/(m+3);

9. Compute U(m+1) using Eq.(4.11);

In document User Expertise Modelling Using Social Network Data (Page 80-84)