Twitter Network Topic model (TNTM) - Nonparametric Bayesian Topic Modelling with Auxiliary Data

Zvi et al., 2004], the tag-topic model [Tsai, 2011], the supervised LDA [Mcauliffe and Blei, 2008], and the Topic-Link LDA [Liu et al., 2009]. These models handle only one kind of additional information and thus do not work well with tweets. Note that the tag-topic model treats hashtags as hard labels and uses them to group tweets, which is inappropriate given how noisy hashtags are.

On the other hand, the Twitter-LDA [Zhao et al., 2011] and the behaviour-topic model [Qiu et al., 2013] were designed explicitly to model tweets. In contrast to LDA, neither model is an admixture model, since both impose a limit of one topic per document. The behaviour-topic model analyses tweets’ posting behaviour for each topic and uses it for user recommendation. Alternatively, the biterm topic model [Yan et al., 2013] uses only biterm co-occurrences to model tweets, discarding document-level information. Neither the biterm topic model nor the Twitter-LDA incorporates any auxiliary information. All the topic models mentioned above share a further limitation: the number of topics needs to be specified in advance, which is difficult since this number is not known.

Some recent work makes use of the links between documents (e.g., citations) in topic modelling, including the CNTM in Chapter 7, the relational topic model [Chang and Blei, 2010], the Poisson mixed-topic link model [Zhu et al., 2013] and the Link-PLSA-LDA [Nallapati et al., 2008]. Other work models the authors’ network information, such as the Topic-Link LDA, which models the author community using a generalised linear model, and the Author Cite Topic Model [Kataria et al., 2011], which models the authors’ citation network. However, these models are parametric in nature and can be restrictive. In contrast, Lloyd et al. [2012] use a very flexible nonparametric model for network data by utilising random function priors, but they do not model text. The TNTM makes use of the random function network model of Lloyd et al. [2012], with modifications that lead to significant model improvement; these are discussed in the next section.

8.3 The Twitter Network Topic model

The TNTM uses the accompanying hashtags, authors, and followers network to model tweets better. It has two main components: a HPYP topic model for the text and hashtags, and a GP-based random function network model for the followers network. The authorship information connects the two. The HPYP topic model is illustrated by region b in Figure 8.1, while the network model is captured by region a.

110 Modelling Text and Author Network on Tweets

Figure 8.1: Graphical model for the Twitter Network Topic model (TNTM). The latent variables are unshaded and the observed variables are shaded. The TNTM is composed of a HPYP topic model (region b) and a GP-based random function network model (region a). The author–topic distributions $\nu$ serve to link the two together. Each tweet is modelled with a hierarchy of document–topic distributions denoted by $\eta$, $\theta'$, and $\theta$, which are attuned to the whole tweet, the hashtags, and the words, in that order. With their own topic assignments $z'$ and $z$, the hashtags $y$ and the words $w$ are modelled separately. They are generated from the topic–hashtag distributions $\psi'$ and the topic–word distributions $\psi$ respectively. The variables $\mu_0$, $\mu_1$ and $\gamma$ are priors for the respective PYPs. The connections between the authors are denoted by $x$, which are modelled by the random function $\mathcal{F}$.

8.3.1 HPYP Topic Model

We design the HPYP topic model as follows. For the word distributions, we first generate a parent word distribution prior $\gamma$ for all topics:

$$\gamma \sim \mathrm{PYP}(\alpha^{\gamma}, \beta^{\gamma}, H^{\gamma}), \tag{8.1}$$

where $H^{\gamma}$ is a discrete uniform distribution over the complete word vocabulary $V$. Then, we sample the hashtag distribution $\psi'_k$ and the word distribution $\psi_k$ for each topic $k$, with $\gamma$ as the base distribution:

$$\psi'_k \mid \gamma \sim \mathrm{PYP}(\alpha^{\psi'_k}, \beta^{\psi'_k}, \gamma), \tag{8.2}$$

$$\psi_k \mid \gamma \sim \mathrm{PYP}(\alpha^{\psi_k}, \beta^{\psi_k}, \gamma), \quad \text{for } k = 1, \dots, K. \tag{8.3}$$

Note that the tokens of the hashtags are shared with the words; that is, the hashtag #happy shares the same token as the word happy, and the two are thus treated as the same word. This treatment is important since some hashtags are used as words instead of labels.⁴¹ Additionally, this allows any word to be a hashtag, which will be useful for hashtag recommendation.
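The hierarchy in Equations 8.1 to 8.3 can be illustrated with a small simulation. The following is a minimal sketch, not the inference procedure used for the TNTM: it assumes the first PYP argument is the discount and the second the concentration, approximates each PYP draw with a truncated stick-breaking construction, and uses toy sizes. The function name `sample_pyp_discrete` and all numeric values are hypothetical.

```python
import numpy as np

def sample_pyp_discrete(discount, concentration, base, truncation=1000, rng=None):
    """Approximate draw from PYP(discount, concentration, base) on a finite vocabulary.

    Assumption: truncated stick-breaking; `base` is a probability vector.
    """
    rng = np.random.default_rng() if rng is None else rng
    base = np.asarray(base, dtype=float)
    k = np.arange(1, truncation + 1)
    # Stick proportions V_k ~ Beta(1 - discount, concentration + k * discount)
    v = rng.beta(1.0 - discount, concentration + k * discount)
    # Weight of stick k is V_k * prod_{j<k} (1 - V_j)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    weights = v * remaining
    # Each stick is attached to an atom drawn from the base distribution
    atoms = rng.choice(len(base), size=truncation, p=base)
    probs = np.bincount(atoms, weights=weights, minlength=len(base))
    return probs / probs.sum()  # renormalise the mass lost to truncation

rng = np.random.default_rng(0)
V, K = 50, 4                      # toy vocabulary size and number of topics
H = np.full(V, 1.0 / V)           # uniform base distribution over the vocabulary
gamma = sample_pyp_discrete(0.5, 1.0, H, rng=rng)                                # Eq. 8.1
psi_hashtag = [sample_pyp_discrete(0.5, 1.0, gamma, rng=rng) for _ in range(K)]  # Eq. 8.2
psi_word = [sample_pyp_discrete(0.5, 1.0, gamma, rng=rng) for _ in range(K)]     # Eq. 8.3
```

Because the topic–hashtag and topic–word distributions share the parent $\gamma$, a token that is rare under $\gamma$ tends to be rare in both, which mirrors the shared-token treatment described above.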

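For the topic distributions, we generate a global topic distribution $\mu_0$ that serves as a prior. We then generate the author–topic distribution $\nu_i$ for each author $i$, and a miscellaneous topic distribution $\mu_1$ to capture topics that deviate from the authors’ usual topics:

$$\mu_0 \sim \mathrm{GEM}(\alpha^{\mu_0}, \beta^{\mu_0}), \tag{8.4}$$

$$\mu_1 \mid \mu_0 \sim \mathrm{PYP}(\alpha^{\mu_1}, \beta^{\mu_1}, \mu_0), \tag{8.5}$$

$$\nu_i \mid \mu_0 \sim \mathrm{PYP}(\alpha^{\nu_i}, \beta^{\nu_i}, \mu_0), \quad \text{for } i = 1, \dots, A. \tag{8.6}$$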
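As a rough illustration of Equations 8.4 to 8.6, the sketch below truncates the GEM stick-breaking process to a finite number of topics and then draws the child distributions around it. It assumes the two-parameter GEM takes a discount and a concentration in that order; the helper names `gem_weights` and `pyp_child`, the truncation levels, and the hyperparameter values are all hypothetical.

```python
import numpy as np

def gem_weights(discount, concentration, truncation, rng):
    """Truncated two-parameter GEM (stick-breaking) weights that sum to one."""
    k = np.arange(1, truncation + 1)
    v = rng.beta(1.0 - discount, concentration + k * discount)
    v[-1] = 1.0  # close the final stick so no mass is left over
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining

def pyp_child(parent, discount, concentration, rng, truncation=500):
    """Approximate draw from PYP(discount, concentration, parent), parent discrete."""
    weights = gem_weights(discount, concentration, truncation, rng)
    atoms = rng.choice(len(parent), size=truncation, p=parent)
    probs = np.bincount(atoms, weights=weights, minlength=len(parent))
    return probs / probs.sum()

rng = np.random.default_rng(1)
T, A = 20, 3                                   # toy topic truncation and author count
mu0 = gem_weights(0.3, 2.0, T, rng)            # global topic distribution, Eq. 8.4
mu1 = pyp_child(mu0, 0.3, 2.0, rng)            # miscellaneous topics, Eq. 8.5
nu = np.stack([pyp_child(mu0, 0.3, 2.0, rng) for _ in range(A)])  # Eq. 8.6
```

Each row of `nu` is one author's topic distribution; all rows are centred on the same `mu0`, so authors share a global notion of which topics are common.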

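For each tweet $d$, given $\nu$ and the observed author $a_d$, we sample the document–topic distribution $\eta_d$ as follows:

$$\eta_d \mid a_d, \nu \sim \mathrm{PYP}(\alpha^{\eta_d}, \beta^{\eta_d}, \nu_{a_d}), \quad \text{for } d = 1, \dots, D. \tag{8.7}$$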

Next, we generate the topic distributions for the observed hashtags ($\theta'_d$) and the observed words ($\theta_d$), following the technique used in the adaptive topic model [Du et al., 2012a]. We explicitly model the influence of hashtags on words by generating the words conditioned on the hashtags. The intuition is that hashtags are the themes of a tweet and drive its content. Specifically, we sample the mixing proportion $\rho^{\theta'_d}$, which controls the contributions of $\eta_d$ and $\mu_1$ to the base distribution of $\theta'_d$, and then generate $\theta'_d$ given $\rho^{\theta'_d}$:

$$\rho^{\theta'_d} \sim \mathrm{Beta}\big(\lambda_0^{\theta'_d}, \lambda_1^{\theta'_d}\big), \tag{8.8}$$

$$\theta'_d \mid \mu_1, \eta_d \sim \mathrm{PYP}\big(\alpha^{\theta'_d}, \beta^{\theta'_d}, \rho^{\theta'_d} \mu_1 + (1 - \rho^{\theta'_d})\, \eta_d\big). \tag{8.9}$$

We set $\theta'_d$ and $\eta_d$ as the parent distributions of $\theta_d$. This flexible configuration allows us to investigate the relationship between $\theta_d$, $\theta'_d$ and $\eta_d$; that is, we can examine whether $\theta_d$ is determined directly by $\eta_d$ or through $\theta'_d$. The mixing proportion $\rho^{\theta_d}$ and the topic distribution $\theta_d$ are generated similarly:

$$\rho^{\theta_d} \sim \mathrm{Beta}\big(\lambda_0^{\theta_d}, \lambda_1^{\theta_d}\big), \tag{8.10}$$

$$\theta_d \mid \eta_d, \theta'_d \sim \mathrm{PYP}\big(\alpha^{\theta_d}, \beta^{\theta_d}, \rho^{\theta_d} \eta_d + (1 - \rho^{\theta_d})\, \theta'_d\big). \tag{8.11}$$

⁴¹ For instance, as illustrated by the following tweet: i want to get into #photography. can someone
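The two mixing steps in Equations 8.8 to 8.11 both build a base distribution as a convex combination of two parents. The sketch below computes only those base distributions on toy three-topic vectors. It deliberately skips the PYP draws around each base and substitutes the base itself for the hashtag topic distribution, which is a simplification for illustration; the function name `mix_base` and the Beta parameters (1, 1) are hypothetical.

```python
import numpy as np

def mix_base(rho, first, second):
    """Return rho * first + (1 - rho) * second, a convex mix of two distributions."""
    return rho * np.asarray(first, dtype=float) + (1.0 - rho) * np.asarray(second, dtype=float)

rng = np.random.default_rng(0)
mu1 = np.array([0.2, 0.3, 0.5])      # toy miscellaneous topic distribution
eta_d = np.array([0.7, 0.2, 0.1])    # toy document-topic distribution of tweet d

rho_hashtag = rng.beta(1.0, 1.0)                      # Eq. 8.8 with assumed lambdas (1, 1)
base_hashtag = mix_base(rho_hashtag, mu1, eta_d)      # base distribution in Eq. 8.9

theta_prime_d = base_hashtag         # stand-in only: the PYP draw of Eq. 8.9 is skipped
rho_word = rng.beta(1.0, 1.0)                         # Eq. 8.10 with assumed lambdas (1, 1)
base_word = mix_base(rho_word, eta_d, theta_prime_d)  # base distribution in Eq. 8.11
```

A mixing proportion near 1 in the second step means the word topics follow $\eta_d$ directly, while a value near 0 means they are routed through the hashtag topic distribution.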

The hashtags and words are then generated in a similar fashion to LDA. For the $m$-th hashtag in tweet $d$, we sample a topic $z'_{dm}$ and the hashtag $y_{dm}$ by

$$z'_{dm} \mid \theta'_d \sim \mathrm{Discrete}(\theta'_d), \tag{8.12}$$

$$y_{dm} \mid z'_{dm}, \psi' \sim \mathrm{Discrete}\big(\psi'_{z'_{dm}}\big), \quad \text{for } m = 1, \dots, M_d, \tag{8.13}$$

where $M_d$ is the number of observed hashtags in tweet $d$. Similarly, for the $n$-th word in tweet $d$, we sample a topic $z_{dn}$ and the word $w_{dn}$ by

$$z_{dn} \mid \theta_d \sim \mathrm{Discrete}(\theta_d), \tag{8.14}$$

$$w_{dn} \mid z_{dn}, \psi \sim \mathrm{Discrete}\big(\psi_{z_{dn}}\big), \quad \text{for } n = 1, \dots, N_d, \tag{8.15}$$

where $N_d$ is the number of observed words in tweet $d$. All the $\alpha$, $\beta$ and $\lambda$ above are hyperparameters of the model. We show the importance of the above modelling with ablation studies in Section 8.8. Although the HPYP topic model may seem complex, it is simply a network of PYP nodes, since all distributions on the probability vectors are modelled by the PYP. The advantage of such modelling was discussed in Chapter 5.
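The token-generation step in Equations 8.12 to 8.15 is the same two-stage draw for hashtags and for words: pick a topic from the tweet's topic distribution, then pick a token from that topic's distribution. A minimal sketch follows, with the topic and token distributions replaced by toy Dirichlet draws purely so the example runs; the function name `generate_tokens` and the sizes are hypothetical.

```python
import numpy as np

def generate_tokens(theta, psi, count, rng):
    """Draw `count` (topic, token) pairs: topic ~ Discrete(theta), token ~ Discrete(psi[topic])."""
    topics = rng.choice(len(theta), size=count, p=theta)
    tokens = np.array([rng.choice(psi.shape[1], p=psi[k]) for k in topics], dtype=int)
    return topics, tokens

rng = np.random.default_rng(2)
K, V = 3, 6                                      # toy number of topics and vocabulary size
theta_prime_d = rng.dirichlet(np.ones(K))        # stand-in hashtag topic distribution
theta_d = rng.dirichlet(np.ones(K))              # stand-in word topic distribution
psi_hashtag = rng.dirichlet(np.ones(V), size=K)  # stand-in topic-hashtag distributions
psi_word = rng.dirichlet(np.ones(V), size=K)     # stand-in topic-word distributions

M_d, N_d = 2, 8                                  # hashtags and words in tweet d
z_prime, y = generate_tokens(theta_prime_d, psi_hashtag, M_d, rng)  # Eqs. 8.12 and 8.13
z, w = generate_tokens(theta_d, psi_word, N_d, rng)                 # Eqs. 8.14 and 8.15
```

Hashtags and words index the same vocabulary of size `V` here, matching the shared-token treatment, but they use separate topic assignments and separate per-topic distributions.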

8.3.2 Random Function Network Model

The network model is connected to the HPYP topic model via the author–topic distributions $\nu$, which we treat as inputs to the GP in the network model. The GP, represented by $\mathcal{F}$, determines the link between two authors ($x_{ij}$), which indicates the existence of a social link between author $i$ and author $j$. For each pair of authors, we sample their connection with the following random function network model:

$$Q_{ij} \mid \nu \sim \mathcal{F}(\nu_i, \nu_j), \tag{8.16}$$

$$x_{ij} \mid Q_{ij} \sim \mathrm{Bernoulli}\big(s(Q_{ij})\big), \quad \text{for } i = 1, \dots, A;\ j = 1, \dots, A, \tag{8.17}$$

where $s(\cdot)$ is the sigmoid function:

$$s(t) = \frac{1}{1 + e^{-t}}. \tag{8.18}$$
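The link-generation step in Equations 8.17 and 8.18 can be sketched as follows. The GP draw of Equation 8.16 is not reproduced: the latent values are replaced by standard normal noise as a stand-in, since the form of the GP is not specified in this section. The function names `sigmoid` and `sample_links` are hypothetical.

```python
import numpy as np

def sigmoid(t):
    """Sigmoid s(t) = 1 / (1 + exp(-t)), computed via tanh to avoid overflow."""
    t = np.asarray(t, dtype=float)
    return 0.5 * (1.0 + np.tanh(0.5 * t))

def sample_links(Q, rng):
    """Draw x_ij ~ Bernoulli(s(Q_ij)) independently for every entry of Q."""
    Q = np.asarray(Q, dtype=float)
    return (rng.random(Q.shape) < sigmoid(Q)).astype(int)

rng = np.random.default_rng(3)
A = 5                               # toy number of authors
Q = rng.normal(size=(A, A))         # stand-in for the GP values F(nu_i, nu_j)
x = sample_links(Q, rng)            # binary adjacency matrix of follower links
```

Large positive values of $Q_{ij}$ make a link almost certain, and large negative values make it almost impossible, so the GP output acts as a log-odds score for each pair of authors.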

In document Nonparametric Bayesian Topic Modelling with Auxiliary Data (Page 133-137)