4. Topical Expertise Inference on Twitter: Integrating Sentiment and Topic
4.2. Sentiment-Weighted and Topic Relation-Regularized Learning for User
4.2.3. Expertise Inference with Topic Relation Regularization
This section details the construction of the complete Sentiment-weighted and Topic Relation-regularized Learning (SeTRL) model. To take the topic relatedness into consideration in the process of expertise inference, it first needs to construct a model that can simultaneously infer a user's expertise on multiple topics. As presented in the previous section, the problem of expertise inference is formulated as a classification problem.
67
Thus, for each topic t, we can easily learn a separate predictive model based on linear regression, and predict the label of user i on topic t as follows:
๐ฆ๐ก๐ = ๐๐ก(๐๐) = ๐๐๐๐ (4.1)
where xi is the feature vector of user i; wt is model parameter vector that represents the
linear mapping function ft of topic t. By using linear regression, a base model for jointly
learning the expertise of users on multiple expertise topics can be formulated as learning an optimal solution W = [w1,w2,โฆ,wT] by solving the following minimization problem:
min ๐ โ โ โ(๐ฆ๐ก๐, ๐๐ก(๐๐)) ๐ ๐ก=1 ๐ ๐=1 + ๐ฝ||๐พ||1 (4.2)
where ||W||1 denotes the l1 norm of matrix W which is used to address the model sparsity
problem and stabilise Eq. (2) [Ti96] and ฮฒ is the regularization parameter to control the sparsity; โ is the least square loss function for regression:
โ(๐ฆ๐ก๐, ๐๐ก(๐ฅ๐)) =
1
2(๐ฆ๐ก๐โ ๐๐๐๐)
2 (4.3)
which is equivalent to a linear discriminant analysis for binary classification [Fr01a] and has been widely applied in various classification algorithms [Ri03].
Thus, a base model has been built that can simultaneously learn the predictive models of all expertise topics. However, this optimization model simply assumes that each inference task selects its own relevant features individually, which might be unrealistic in practice. For example, the inference of expertise topics "data science" and "machine learning" tend to share a common set of relevant features but these features are less likely to be useful for the topic "gardening". As also discussed in Section 4.1, expertise topics of a person could be related to each other, e.g. C language and Java programming language. This relatedness of topics could reflect the fact that they share common discriminant features. This observation propels us to characterise the relatedness of topics by an undirected graph G with E edges and encode this structure information in the base model by using a Tikhonov regularizer [Go99]:
โ ||๐๐(๐)โ ๐๐(๐)||22 ๐ธ
๐=1
68
where we(1) is the model parameters corresponding to a topic of edge e,where an edge
connects two related topics. This regularizer can penalize the difference of two inference tasks that have an edge between them, through which we can incorporate the topic relatedness in the model leaning process. Two specific methods of constructing the topic relation graph G are introduced in Section 4.2.5.
Now, the SeTRL model can be constructed based on the above formulation. By combining Eqs. (4.2)-(4.4), we have the following optimization problem:
min ๐พ โ โ 1 2(๐ฆ๐ก๐ โ ๐๐๐๐) 2 ๐ ๐ก=1 ๐ ๐=1 + ๐ผ โ ||๐๐(๐)โ ๐๐(๐)||22 ๐ธ ๐=1 + ๐ฝ||๐พ||1 (4.5)
where ฮฑ is the regularization parameter to control the contribution of the topic relation information.
4.2.4.
Model Learning and Prediction
The next issue to examine is how to learn the SeTRL model, i.e. to estimate a parameter configuration of W that minimizes the objective function Eq. (4.5). Because the l1 norm
term in Eq. (4.5) (i.e. ||W||1) is not differentiable, the gradient descent method is not
suitable for the model learning. In this work, we use the Accelerated Proximal Gradient (APG) method [Ji09] to solve the optimization problem. Different from the traditional gradient method that uses the latest point in every iteration, APG takes a linear combination of the previous two points as the search point to achieve high convergence speed.
Specifically, we first decompose the objective function Eq. (4.5) to a non-smooth part: ๐(๐พ) = ๐ฝ||๐พ||1 (4.6) and a smooth convex part:
โ(๐พ) = โ โ1 2(๐ฆ๐ก๐โ ๐๐๐๐) 2 ๐ ๐ก=1 ๐ ๐=1 + ๐ผ โ ||๐๐(๐)โ ๐๐(๐)||22 ๐ธ ๐=1 (4.7).
Then, the generalised gradient updating step can be approximately expressed by: ๐น๐(๐พ, ๐พ(๐)) = ๐(๐พ(๐)) +ใ๐พ โ ๐พ(๐), ๐ต๐(๐พ(๐))ใ + ๐
๐||๐พ โ ๐พ(๐)||๐ญ
69
๐๐ (๐พ(๐)) = ๐๐๐ ๐๐๐๐พ ๐ ๐ (๐พ, ๐พ(๐)) (4.9)
where ||ยท||F denotes the Frobenius norm; ๐ต๐(๐(๐))is the gradient of h(W) at the point
๐(๐) of the mth iteration andใA, Bใ= tr(ATB) denotes the inter product of matrixes; s
is the iteration step size. After rewriting Eq. (4.9), we can have: rs(๐พ(๐)) = ๐๐๐ ๐๐๐๐พ 1 2||๐พ โ ๐ฝ(๐)||2 2+1 ๐ ๐(๐พ) (4.10) where V(m) = W(m) - 1
๐ ๐๐(๐(๐))and s can be determined by using line search.
The key aspect of the APG method is how to efficiently update the parameter matrix W
in each iteration. Finally, we adopt the strategy proposed in [Ch09] to further optimize the problem of Eq.(4.10). The gradient update rule of Eq.(4.10) is decomposed to K sub- problems, for which the analytical solutions can be easily obtained. In addition, instead of performing gradient descent based on W(m), the search point is computed as:
๐ผ(๐) = ๐พ(๐)+ ๐๐(๐พ(๐)โ ๐พ(๐โ๐)) (4.11) where ฯm = (1-ฮปm-1) ฮปm/ ฮปm-1 and ฮปm = 2/(m+3). In Algorithm 4-2, we summarize the
detailed optimization procedure.
Algorithm 4-2 APG Algorithm for Learning SeTRL Initialization: s0 > 0, ฮท >1, W(0) โ RKรT, U(0) = W(0) and
ฮป0 = 1, m = 0
Output: W(m)
1. form = 0, 1, 2,... until convergence of W(m):
2. Set s = sm; 3. while (g(rs(U(m)))+ h(rs(U(m))) > Rs(rs(U(m)),U(m)): 4. Set s = ฮทs; 5. endwhile 6. Set sm+1 = s; 7. ๐พ(๐+๐) = arg min ๐พ ๐น๐ ๐+1(๐พ, ๐ผ(๐)); 8. ฮปm+1 = 2/(m+3);
9. Compute U(m+1) using Eq.(4.11);
70