4.2.1 Attributed SBM model definition
In this section we provide the details for our version of the attributed stochastic block model and the inference procedure used to learn the model parameters.
Figure 4.1: Modeling community membership in terms of attributes and connectivity. Node-to-community assignments, specified by $Z$, are determined by both the adjacency matrix, $A$, and the attribute matrix, $X$. $A$ and $X$ are assumed to be generated from a stochastic block model and a mixture of multivariate Gaussian distributions, parameterized by $\theta$ and $\Psi$, respectively.
4.2.1.1 Objective
We seek to incorporate both connectivity ($A$) and attribute information ($X$) to infer node-to-community assignments, $Z$. Note that for a network with $N$ nodes, $K$ communities, and $p$ measured attributes, $A$, $X$, and $Z$ have dimensions $N \times N$, $N \times p$, and $N \times K$, respectively. In particular, $Z$ is a binary indicator matrix, where entry $z_{ic}$ is 1 if and only if node $i$ belongs to community $c$. We also define $z$ to be the $N$-dimensional vector of node-to-community assignments. We assume that connectivity and attributes are conditionally independent, given the community labels. The graphical model for the relationship between node-to-community labels, connectivity, and attribute information is shown in Figure 4.1.
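To make the relationship between the label vector $z$ and the indicator matrix $Z$ concrete, the sketch below converts between the two with NumPy. It assumes 0-based community labels (the text indexes communities from 1 to $K$), and the function names are illustrative only.

```python
import numpy as np

def labels_to_indicator(z, K):
    """Map an N-vector of labels in {0, ..., K-1} to the N x K binary matrix Z."""
    z = np.asarray(z, dtype=int)
    Z = np.zeros((z.shape[0], K), dtype=int)
    Z[np.arange(z.shape[0]), z] = 1   # z_ic = 1 iff node i belongs to community c
    return Z

def indicator_to_labels(Z):
    """Recover the label vector z from Z (exactly one 1 per row)."""
    return np.argmax(np.asarray(Z), axis=1)

# Toy example: N = 5 nodes, K = 2 communities.
z = np.array([0, 0, 1, 1, 0])
Z = labels_to_indicator(z, K=2)
assert Z.shape == (5, 2)                   # N x K
assert (Z.sum(axis=1) == 1).all()          # each node has exactly one community
assert (indicator_to_labels(Z) == z).all() # round trip recovers z
```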
To infer the $Z$ that best explains the data, we adopt a likelihood maximization approach. That is, we seek the partition of nodes into communities that best describes the observed connectivity and attribute information. Given the conditional independence of $X$ and $A$, we can express the log likelihood of the data, $\mathcal{L}$, as the sum of the connectivity and attribute log likelihoods, $\mathcal{L}_A$ and $\mathcal{L}_X$, respectively:

$$\mathcal{L} = \mathcal{L}_A + \mathcal{L}_X. \tag{4.1}$$
This likelihood reflects the joint distribution of the adjacency matrix, $A$, the attribute matrix, $X$, and the matrix of node-to-community indicators, $Z$; formally, we have
Because $Z$ is a latent variable that we are trying to infer, we can approach the problem with the expectation-maximization (EM) algorithm (Dempster et al., 1977). EM alternates between estimating the posterior probability that node $i$ has community label $c$,

$$p(z_{ic} = 1 \mid X, A), \tag{4.3}$$

and estimating $\theta$ and $\Psi$, the model parameters specifying the adjacency and attribute matrices, respectively.
4.2.1.2 Attribute Likelihood
For a network with $K$ communities, we assume that each community $c$ has an associated $p$-dimensional mean $\mu_c$ and $p \times p$ covariance matrix $\Sigma_c$. Together, these parameters uniquely identify a $p$-dimensional multivariate Gaussian distribution. To specify this model for all $K$ communities, we define the parameter $\Psi = \{\mu_1, \mu_2, \ldots, \mu_K, \Sigma_1, \Sigma_2, \ldots, \Sigma_K\}$.
The log likelihood of the attributes under the mixture of Gaussians is

$$\mathcal{L}_X = \log p(X \mid \Psi) = \sum_{i=1}^{N} \log \left\{ \sum_{c=1}^{K} \pi_c \, \mathcal{N}(x_i \mid \mu_c, \Sigma_c) \right\}. \tag{4.4}$$

Here, $\mathcal{N}(x_i \mid \mu_c, \Sigma_c)$ is the probability density function of the multivariate Gaussian, and $\pi_c$ is the probability that a node is assigned to community $c$.
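The attribute log likelihood of Eq. (4.4) can be evaluated directly. The following is a minimal NumPy sketch, assuming the mixing weights $\pi_c$, means $\mu_c$, and covariances $\Sigma_c$ are already available as arrays. The log-sum-exp shift is a standard numerical safeguard against underflow rather than something the text prescribes, and the function names are hypothetical.

```python
import numpy as np

def gaussian_logpdf(X, mu, Sigma):
    """Row-wise log N(x_i | mu, Sigma) for an N x p data matrix X."""
    p = X.shape[1]
    diff = X - mu                                   # N x p deviations from the mean
    sign, logdet = np.linalg.slogdet(Sigma)         # stable log|Sigma|
    if sign <= 0:
        raise ValueError("Sigma must be positive definite")
    # Mahalanobis term (x_i - mu)^T Sigma^{-1} (x_i - mu), via a solve instead of an inverse
    maha = np.sum(diff * np.linalg.solve(Sigma, diff.T).T, axis=1)
    return -0.5 * (p * np.log(2 * np.pi) + logdet + maha)

def attribute_log_likelihood(X, pi, mus, Sigmas):
    """sum_i log sum_c pi_c N(x_i | mu_c, Sigma_c), computed with log-sum-exp."""
    # log_terms[i, c] = log pi_c + log N(x_i | mu_c, Sigma_c)
    log_terms = np.column_stack([
        np.log(pi[c]) + gaussian_logpdf(X, mus[c], Sigmas[c])
        for c in range(len(pi))
    ])
    m = log_terms.max(axis=1, keepdims=True)        # per-node shift to avoid underflow
    per_node = m.ravel() + np.log(np.exp(log_terms - m).sum(axis=1))
    return float(per_node.sum())
```

With a single standard-normal component, the value for the point $x = 0$ reduces to $-\tfrac{1}{2}\log(2\pi)$, which is a convenient sanity check.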
4.2.1.3 Adjacency Matrix Likelihood
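For the adjacency matrix $A$ and the $K \times K$ matrix of stochastic block model parameters $\theta$, the complete-data log likelihood can be expressed as

$$\log p(A \mid Z, \theta) = \frac{1}{2} \sum_{i \neq j} \sum_{k,l} z_{ik} z_{jl} \left[ a_{ij} \log \theta_{kl} + (1 - a_{ij}) \log(1 - \theta_{kl}) \right]. \tag{4.5}$$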
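Eq. (4.5) vectorizes neatly, because $\sum_{k,l} z_{ik} z_{jl} \log \theta_{kl}$ is the $(i, j)$ entry of $Z \log(\theta) Z^{\top}$. The sketch below relies on this identity, assuming an undirected graph with a symmetric 0/1 adjacency matrix. The clipping constant `eps` is an added assumption that guards against $\log 0$, and the function name is hypothetical.

```python
import numpy as np

def adjacency_log_likelihood(A, Z, theta, eps=1e-12):
    """Complete-data SBM log likelihood in the form of Eq. (4.5).

    A: N x N symmetric 0/1 adjacency matrix (diagonal ignored).
    Z: N x K indicator matrix.
    theta: K x K matrix of block edge probabilities.
    """
    theta = np.clip(theta, eps, 1 - eps)       # assumption: avoid log(0)
    # P_edge[i, j] = sum_{k,l} z_ik z_jl log theta_kl; P_none uses log(1 - theta_kl)
    P_edge = Z @ np.log(theta) @ Z.T
    P_none = Z @ np.log(1 - theta) @ Z.T
    terms = A * P_edge + (1 - A) * P_none
    np.fill_diagonal(terms, 0.0)               # restrict the sum to i != j
    return 0.5 * float(terms.sum())            # 1/2: each undirected pair counted twice
```

For example, two nodes joined by one edge in a single community with $\theta = 0.5$ give $\log 0.5$.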
4.2.1.4 Inference
To maximize the likelihood of the data with EM, we split each iteration into an E-step and an M-step, and repeat this sequence until the estimates converge.
E-step. During the E-step, we use the current estimates of the model parameters, $\theta$ and $\Psi$, to compute the posterior in Eq. (4.3). The posterior probability that node $i$ belongs to community $c$, $\gamma(z_{ic})$, is given by

$$\gamma(z_{ic}) = p(z_{ic} = 1 \mid x_i, a_i) = \frac{p(x_i \mid z_{ic} = 1)\, p(a_i \mid z_{ic} = 1)\, \pi_c}{\sum_{c'=1}^{K} p(x_i \mid z_{ic'} = 1)\, p(a_i \mid z_{ic'} = 1)\, \pi_{c'}}. \tag{4.6}$$

Here, $x_i$ and $a_i$ denote the attribute and connectivity patterns of node $i$, respectively.
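Eq. (4.6) leaves the form of $p(a_i \mid z_{ic} = 1)$ implicit. This term depends on the other nodes' memberships. The NumPy sketch below computes all posteriors at once in log space. As an assumption not stated in the text, it approximates the connectivity term with a mean-field average over the other nodes' current posteriors, which is similar in form to the variational updates of Daudin et al. (2008). The names `e_step` and `gamma_prev` are hypothetical.

```python
import numpy as np

def e_step(X, A, gamma_prev, pi, mus, Sigmas, theta, eps=1e-12):
    """Posterior memberships gamma (N x K) in the form of Eq. (4.6), computed in log space."""
    N, p = X.shape
    K = len(pi)
    theta = np.clip(theta, eps, 1 - eps)
    # log p(x_i | z_ic = 1): Gaussian log density for each community
    log_px = np.empty((N, K))
    for c in range(K):
        diff = X - mus[c]
        _, logdet = np.linalg.slogdet(Sigmas[c])
        maha = np.sum(diff * np.linalg.solve(Sigmas[c], diff.T).T, axis=1)
        log_px[:, c] = -0.5 * (p * np.log(2 * np.pi) + logdet + maha)
    # ASSUMPTION (mean-field): log p(a_i | z_ic = 1) is approximated by
    # sum_{j != i} sum_l gamma_jl [a_ij log theta_cl + (1 - a_ij) log(1 - theta_cl)]
    A_off = A.astype(float).copy()
    np.fill_diagonal(A_off, 0.0)
    not_A = 1.0 - A_off
    np.fill_diagonal(not_A, 0.0)                   # exclude j = i from non-edges too
    log_pa = (A_off @ gamma_prev @ np.log(theta).T
              + not_A @ gamma_prev @ np.log(1 - theta).T)
    log_post = log_px + log_pa + np.log(pi)        # unnormalized log posterior
    log_post -= log_post.max(axis=1, keepdims=True)  # stabilize before exponentiating
    gamma = np.exp(log_post)
    return gamma / gamma.sum(axis=1, keepdims=True)  # normalize over communities
```

In practice, `gamma_prev` would be the posteriors from the previous iteration, or the initial assignments on the first pass.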
M-step. In the M-step, we use these posteriors to compute updates for $\theta$ and $\Psi$. Since the attributes follow a Gaussian mixture model, it can be shown that the update for the mean vector of community $c$ is

$$\mu_c = \frac{\sum_{i=1}^{N} \gamma(z_{ic})\, x_i}{\sum_{i=1}^{N} \gamma(z_{ic})}. \tag{4.7}$$
Similarly, the update for the covariance matrix of community $c$ is

$$\Sigma_c = \frac{\sum_{i=1}^{N} \gamma(z_{ic})\, (x_i - \mu_c)(x_i - \mu_c)^{\top}}{\sum_{i=1}^{N} \gamma(z_{ic})}. \tag{4.8}$$
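Eqs. (4.7) and (4.8) are posterior-weighted averages, and they translate almost line for line into NumPy. In the sketch below, the function name is hypothetical. It also returns $\pi_c = \frac{1}{N}\sum_i \gamma(z_{ic})$, the standard Gaussian-mixture update for the mixing weights. That update is an assumption, because the text does not state how $\pi_c$ is updated.

```python
import numpy as np

def m_step_gaussian(X, gamma):
    """Posterior-weighted updates of Eqs. (4.7)-(4.8), plus assumed pi_c update."""
    N = X.shape[0]
    Nk = gamma.sum(axis=0)                        # effective size of each community
    mus, Sigmas = [], []
    for c in range(gamma.shape[1]):
        w = gamma[:, c]
        mu = (w @ X) / Nk[c]                      # Eq. (4.7): weighted mean
        diff = X - mu
        Sigma = (w[:, None] * diff).T @ diff / Nk[c]  # Eq. (4.8): weighted covariance
        mus.append(mu)
        Sigmas.append(Sigma)
    pi = Nk / N                                   # assumption: standard GMM weight update
    return mus, Sigmas, pi
```

Very small or degenerate communities can yield singular covariances. A common remedy, not described in the text, is to add a small multiple of the identity to each $\Sigma_c$.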
To update $\theta$, we follow Daudin et al. (2008) and update the probability of an edge between communities $q$ and $l$, $\theta_{ql}$, as

$$\theta_{ql} = \frac{\sum_{i \neq j} \gamma(z_{iq})\, \gamma(z_{jl})\, a_{ij}}{\sum_{i \neq j} \gamma(z_{iq})\, \gamma(z_{jl})}. \tag{4.9}$$
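The block-probability update of Eq. (4.9) reduces to two matrix products, $\Gamma^{\top} A \Gamma$ and $\Gamma^{\top} (\mathbf{1}\mathbf{1}^{\top} - I) \Gamma$, where $\Gamma$ is the $N \times K$ matrix of posteriors. The sketch below treats the edge indicator in the numerator as the adjacency entry $a_{ij}$ and excludes $i = j$ from both sums. The function name is hypothetical.

```python
import numpy as np

def m_step_theta(A, gamma):
    """Block edge probabilities theta_ql in the form of Eq. (4.9)."""
    A_off = A.astype(float).copy()
    np.fill_diagonal(A_off, 0.0)            # numerator: sum over i != j only
    pairs = np.ones_like(A_off)
    np.fill_diagonal(pairs, 0.0)            # denominator: all ordered pairs i != j
    num = gamma.T @ A_off @ gamma           # num[q, l] = sum_{i!=j} g_iq g_jl a_ij
    den = gamma.T @ pairs @ gamma           # den[q, l] = sum_{i!=j} g_iq g_jl
    return num / den
```

With hard assignments, a community containing a single node makes `den[q, q]` zero, so an implementation would need to guard that case.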
We iterate between the E-step and the M-step until the change in the data log likelihood, $\mathcal{L}$, falls below a predefined tolerance.
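The stopping rule can be expressed as a small generic driver. In the sketch below, `e_step`, `m_step`, and `log_likelihood` are hypothetical callables that stand in for the computations described above, and `tol` and `max_iter` are illustrative defaults rather than values from the text.

```python
from typing import Any, Callable, Tuple

def run_em(params: Any,
           e_step: Callable[[Any], Any],
           m_step: Callable[[Any], Any],
           log_likelihood: Callable[[Any], float],
           tol: float = 1e-6,
           max_iter: int = 500) -> Tuple[Any, float, int]:
    """Alternate E- and M-steps until |L_new - L_old| < tol or max_iter is reached."""
    prev = log_likelihood(params)
    for it in range(1, max_iter + 1):
        gamma = e_step(params)          # posteriors gamma(z_ic) under current params
        params = m_step(gamma)          # updated theta and Psi from the posteriors
        curr = log_likelihood(params)
        if abs(curr - prev) < tol:      # change in L below the tolerance: stop
            return params, curr, it
        prev = curr
    return params, prev, max_iter       # prev equals the last computed L here
```

The `max_iter` cap is a defensive addition. The text describes only the tolerance-based criterion.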
4.2.1.5 Initialization
Likelihood optimization approaches are often sensitive to initialization because they can easily become stuck in a local optimum. To initialize the node-to-community assignments, we cluster the nodes in the network using the Louvain algorithm (Blondel et al., 2008). We chose this approach because the algorithm is efficient and stable.
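Louvain implementations typically return a partition as a collection of node sets. The sketch below converts such a partition into the label vector $z$ used to seed EM. It assumes nodes are labelled $0, \ldots, N-1$ and that $K$ is taken to be the number of Louvain communities. The text does not state how $K$ is chosen, so this second point is an assumption. The helper name is hypothetical, and the NetworkX call appears only in a comment.

```python
def partition_to_labels(communities, n_nodes):
    """Turn a partition (iterable of node-index sets) into labels z and K.

    The partition could come from, e.g., recent NetworkX releases:
        import networkx as nx
        communities = nx.community.louvain_communities(G, seed=0)
    """
    communities = list(communities)        # accept any iterable of sets
    z = [-1] * n_nodes
    for c, members in enumerate(communities):
        for i in members:
            if z[i] != -1:
                raise ValueError(f"node {i} appears in more than one community")
            z[i] = c                       # node i starts in community c
    if -1 in z:
        raise ValueError("partition does not cover every node")
    return z, len(communities)             # assumption: K = number of communities
```

The resulting labels can be one-hot encoded to give an initial $Z$, which serves as the first set of posteriors for the M-step.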