As described in Section 1.5, different statistical models have been developed (Toivonen et al., 2009) to explain the impact of local network mechanisms on global network structures or to characterize the global network structures in terms of local network structures. Two similar algorithms are used in this study: the RL algorithm and the MCMC algorithm implemented in the "ergm" package (Hunter et al., 2008) for the R programming language. Both assume that the nodes tend to form a configuration of links that yields a desirable distribution of subgraphs of size three or of other network characteristics. The two approaches are described and compared in more depth in the following sections.
3.3.1 Generating networks with the Relocating Links algorithm (RL algorithm)
The RL algorithm (see Algorithm 3.1), which is based on relocating links, requires that all considered local network statistics of an ideal network be represented by the vector $h$. The number of elements $n$ of this vector equals the number of local network statistics considered. The numbers of different types of triads are considered here, but other local network statistics could also be chosen. The distribution of all triad types, or of only a subset of them, can be given (for forbidden triad types, the corresponding values of $h$ equal zero). Besides $h$, the initial random network $N^{i}$ has to
of the initial network must be in line with the desired global network structure16. Before the iterative procedure starts, $N^{i}$ is saved as a new network $N^{new}$.
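The iterative procedure is repeated many times. In each iteration, a pair of linked nodes $i$ and $j$ and a pair of unlinked nodes $k$ and $l$ are randomly chosen. Then, the link between $i$ and $j$ is dissolved and a link between $k$ and $l$ is established. The modified network is saved as the proposed network $N^{p}$. From $N^{p}$, the number of each considered triad type, $h^{p}$, is calculated. The proposed network is saved as $N^{new}$ (the new network) if the CR ratio is greater than 1. The CR ratio is defined as

$$CR(h, h^{p}, h^{new}) = \frac{\sum_{i=1}^{n} (h^{new}_{i} - h_{i})^{2}}{\sum_{i=1}^{n} (h^{p}_{i} - h_{i})^{2}} \qquad (3.1)$$

where $h^{new}$ holds the triad counts of the current $N^{new}$, so that the ratio exceeds 1 exactly when the triad counts of the proposed network are closer to $h$ than those of the current network.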
Then, a new iteration is performed and, after many iterations, the last $N^{new}$ is the final solution. Besides $N^{new}$, the values of $CR$ can be saved and further analysed.
Algorithm 3.1: The Relocating Links algorithm
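The steps of the RL algorithm can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the network is a directed 0/1 adjacency matrix, the local statistics are the counts of all 16 triad isomorphism classes, the initial network has the same number of links as the ideal one, and the function names (`triad_census`, `squared_distance`, `relocate_links`) and the toy two-block ideal network are hypothetical.

```python
import random
from itertools import combinations, permutations


def triad_census(adj):
    """Count triads per isomorphism class in a directed 0/1 adjacency matrix.

    Each class is keyed by a canonical 6-tuple: the smallest tie pattern
    over all six labelings of the three nodes (16 classes are possible).
    """
    census = {}
    for triple in combinations(range(len(adj)), 3):
        code = min(
            tuple(adj[a][b] for a, b in permutations(order, 2))
            for order in permutations(triple)
        )
        census[code] = census.get(code, 0) + 1
    return census


def squared_distance(census, ideal):
    """Sum of squared differences between observed and ideal triad counts."""
    keys = set(census) | set(ideal)
    return sum((census.get(key, 0) - ideal.get(key, 0)) ** 2 for key in keys)


def relocate_links(initial, ideal, iterations, rng):
    """Relocate one link per iteration; keep the move only if CR > 1."""
    n = len(initial)
    dyads = [(a, b) for a in range(n) for b in range(n) if a != b]
    new = [row[:] for row in initial]
    d_new = squared_distance(triad_census(new), ideal)
    for _ in range(iterations):
        linked = [d for d in dyads if new[d[0]][d[1]]]
        unlinked = [d for d in dyads if not new[d[0]][d[1]]]
        if d_new == 0 or not linked or not unlinked:
            break
        (i, j), (k, l) = rng.choice(linked), rng.choice(unlinked)
        proposed = [row[:] for row in new]
        proposed[i][j], proposed[k][l] = 0, 1
        d_prop = squared_distance(triad_census(proposed), ideal)
        # CR = d_new / d_prop > 1 is equivalent to d_prop < d_new; the
        # comparison avoids dividing by zero when the proposal is exact.
        if d_prop < d_new:
            new, d_new = proposed, d_prop
    return new, d_new


# Toy run: the ideal network has two complete blocks of four nodes
# (a hypothetical cohesive structure chosen only for illustration).
rng = random.Random(1)
n = 8
ideal_net = [[int(a != b and a // 4 == b // 4) for b in range(n)] for a in range(n)]
h = triad_census(ideal_net)
n_links = sum(map(sum, ideal_net))

# Random initial network with the same number of links as the ideal one.
dyads = [(a, b) for a in range(n) for b in range(n) if a != b]
initial = [[0] * n for _ in range(n)]
for a, b in rng.sample(dyads, n_links):
    initial[a][b] = 1

d_start = squared_distance(triad_census(initial), h)
final, d_final = relocate_links(initial, h, iterations=300, rng=rng)
print(d_start, d_final)
```

Because a relocation is kept only when it reduces the distance to the ideal triad counts, the distance never increases from one iteration to the next, and the number of links stays constant throughout. Recomputing the full census in every iteration is the simplest choice here and is only practical for small networks.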
Compared to the MCMC algorithm introduced in the next subsection, the acceptance rule of the RL algorithm is deterministic: although the pairs of nodes are chosen at random, a link is relocated only if the distribution of triads in the proposed network is closer to the distribution of triads in an ideal blockmodel. This may result in lower variability in the global network structure of the generated networks since, unlike the MCMC algorithm, the RL algorithm strives to generate networks with the exact number of the selected types of triads. However, there is a risk of ending in a local optimum, which could be avoided by further improving the algorithm. Moreover, the RL algorithm is computationally very intensive: a higher number of iterations is required, especially in the case of denser networks.
3.3.2 Generating networks with the MCMC algorithm
To generate networks, the Metropolis-Hastings algorithm was used, as implemented in the "ergm" package (see Section 2.2 for more details on MCMC algorithms in ERGM). The benefit of this approach is that, by selecting a suitable proposal distribution, one can place appropriate restrictions on the network, e.g. a fixed density.
The probability of accepting the proposed network in the Metropolis-Hastings algorithm is defined similarly to the CR: both compare the proposed network with the current one through the values of the network statistics. The elements of $h$ are the exact values from the network with the ideal global network structure (where the number of nodes plays a significant role), while the values of $\theta$ are regression coefficients and are, therefore, less directly related to the global network structure. In the RL algorithm, a link is relocated (i.e., one link is dissolved and one is established) whenever the CR is greater than 1 and never when it is below 1. The RL algorithm could also be defined in line with the logic of the Metropolis-Hastings algorithm: the link would still be relocated whenever the CR is greater than 1 but, if the CR is below 1, it would be relocated with a given probability. This would add an extra level of randomness to the generating process.
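The two acceptance rules compared above can be placed side by side in a short Python sketch. It rests on assumptions that are not taken from the study: `rl_accept` and its parameter `p_worse` (the "given probability") are hypothetical names, a CR of exactly 1 is treated like a CR below 1, and `mh_accept_probability` uses the standard Metropolis-Hastings ratio for an exponential-family model with a symmetric proposal, which may differ in detail from what the "ergm" package does internally.

```python
import math
import random


def rl_accept(cr, p_worse, rng):
    """Randomised RL rule: accept if CR > 1, otherwise with probability p_worse."""
    if cr > 1:
        return True
    # Assumption: CR == 1 is handled like CR < 1.
    return rng.random() < p_worse


def mh_accept_probability(theta, delta):
    """Metropolis-Hastings acceptance probability for a symmetric proposal.

    theta: model coefficients, one per network statistic.
    delta: change in each statistic (proposed minus current network).
    """
    log_ratio = sum(t * d for t, d in zip(theta, delta))
    if log_ratio >= 0:
        return 1.0
    return math.exp(log_ratio)


rng = random.Random(0)
print(rl_accept(1.5, 0.1, rng))
print(mh_accept_probability([2.0, -2.0], [1, 1]))
```

Setting `p_worse` to 0 recovers the original RL rule, while a positive value lets the procedure occasionally leave a local optimum at the cost of more variable output.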
As described in Section 2.2, the method most often used to estimate the parameters $\theta$ is MCMC-MLE, which can be computationally demanding. In this study, the parameters can be estimated based on networks with a given blockmodel that contain no errors or only very low levels of errors. With this approach, however, the estimation algorithm in many cases does not converge, probably due to
the high level of multicollinearity of the triads. In addition, from a researcher's point of view, estimating the values of all parameters for each blockmodel type would be very difficult.
Instead, the values of the ERGM parameters $\theta$ are arbitrarily set to 2 (allowed triad types) or −2 (forbidden triad types). It has been shown (see subsection 3.2) that some triad types are much more likely to appear in an ideal network than in a random network. By setting all the parameter values to 2 or −2, we essentially assume that all allowed triad types are equally important (and similarly for all forbidden triad types). This assumption is critical when all types of triads are included in the model and results in a relatively unstable model, particularly when the density is not fixed.
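The parameter setting described above amounts to a simple lookup, sketched below in Python. The triad labels follow the standard MAN naming of the 16 directed triad types, which is an assumption about notation; the helper `make_theta` and the example set of allowed types are hypothetical and are not the sets used in the study.

```python
# Standard MAN labels of the 16 directed triad types.
TRIAD_TYPES = [
    "003", "012", "102", "021D", "021U", "021C", "111D", "111U",
    "030T", "030C", "201", "120D", "120U", "120C", "210", "300",
]


def make_theta(allowed, value=2.0):
    """Assign +value to allowed triad types and -value to all others."""
    unknown = set(allowed) - set(TRIAD_TYPES)
    if unknown:
        raise ValueError(f"unknown triad types: {sorted(unknown)}")
    return {t: (value if t in allowed else -value) for t in TRIAD_TYPES}


# Hypothetical allowed set, for illustration only.
theta = make_theta({"003", "102", "300"})
print(theta)
```

Every allowed type receives the same weight regardless of how much more often it occurs in an ideal network than in a random one, which is exactly the simplification the text describes.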
All types of triads are used under two approaches to handling the number of links: (i) the number of links is fixed (to the same value as in the ideal networks) and (ii) the number of links is free (with the density being variable). In the latter case, the edge parameter is set to a value such that the mean density of 30 generated networks lies within ±0.05 of the ideal density.
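One way to find such an edge parameter value is a bisection search, sketched here in Python. The text does not say how the value was actually chosen, so the search strategy is an assumption, as are the function names and the stand-in simulator: `simulate_density` draws from an edges-only model in which each directed tie is present with probability logistic(edge parameter), whereas in the study the networks would come from the full model with triad terms.

```python
import math
import random


def simulate_density(edge_param, n_nodes, rng):
    """Stand-in simulator: density of one network from an edges-only model."""
    p = 1.0 / (1.0 + math.exp(-edge_param))
    n_dyads = n_nodes * (n_nodes - 1)
    n_ties = sum(rng.random() < p for _ in range(n_dyads))
    return n_ties / n_dyads


def calibrate_edge_param(target, n_nodes, rng, simulate=simulate_density,
                         lo=-6.0, hi=6.0, n_networks=30, tol=0.05, max_steps=40):
    """Bisect on the edge parameter until the mean density is within tol of target.

    Assumes the mean density increases with the edge parameter.
    """
    for _ in range(max_steps):
        mid = (lo + hi) / 2.0
        densities = [simulate(mid, n_nodes, rng) for _ in range(n_networks)]
        mean_density = sum(densities) / n_networks
        if abs(mean_density - target) <= tol:
            return mid, mean_density
        if mean_density < target:
            lo = mid
        else:
            hi = mid
    raise RuntimeError("no edge parameter found within the tolerance")


edge_param, mean_density = calibrate_edge_param(0.3, 20, random.Random(7))
print(edge_param, mean_density)
```

To use this with a real generator, `simulate` would be replaced by a function that generates one network with the chosen triad parameters and returns its density; the monotonicity assumption should then be checked, since an unstable model may not satisfy it.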