Exponential Random Graph Models (ERGM) - Local Mechanisms Affecting the Evolution of Blockmodel

Toivonen et al. (2009) classified ERGM as a separate category in their classification of social network models.

ERGMs are “a family of statistical models for social networks that permit inference about prominent patterns in the data, given the presence of other network structures” (Robins, 2011, pp. 484–485). They are used to assess the extent to which the global network structure can be explained by the structure of links and/or the characteristics of the nodes. Compared to NEMs, ERGMs do not consider the (evolution) process, although the MCMC algorithms (see section 1.4.3) can be used to model the evolution of social networks (Snijders, 2001).

The definition of ERGM below is mostly summarized from Hunter et al. (2008). Let us assume a given random network 𝑌 (a concrete realization of the random network is denoted by 𝑦), consisting of 𝑁 nodes. In such a network, the link between the 𝑖-th and 𝑗-th nodes is represented by a random variable 𝑌_𝑖𝑗. The set of all possible networks is denoted by 𝒴. The distribution of 𝑌 can then be written as

$$P_{\theta,\mathcal{Y}}(Y = y) = \frac{\exp\{\theta^{T} g(y)\}}{\kappa(\theta, \mathcal{Y})}, \quad y \in \mathcal{Y} \tag{2.1}$$

where 𝜃 is a vector of coefficients and 𝑔(𝑦) is a vector of statistics computed from the adjacency matrix 𝑦. 𝜅(𝜃, 𝒴) is a normalizing constant which ensures that the probabilities sum to 1. Estimation is computationally very intensive, especially for networks with a large number of nodes. In the context of generating random networks based on the model above, the vector of change statistics (Wasserman & Pattison, 1996) must be mentioned. It is defined as

$$\delta_g(y)_{ij} = g(y_{ij}^{+}) - g(y_{ij}^{-}) \tag{2.2}$$

where 𝑔(𝑦_𝑖𝑗⁺) is the vector of statistics obtained on the network with the link between the 𝑖-th and 𝑗-th node, and 𝑔(𝑦_𝑖𝑗⁻) is the vector of statistics obtained on the network without that link. It can be shown that the probability 𝑃_𝜃,𝒴(𝑌_𝑖𝑗 = 1|𝑌_𝑖𝑗^𝑐 = 𝑦_𝑖𝑗^𝑐) (where 𝑌_𝑖𝑗^𝑐 represents the rest of the network other than the single variable 𝑌_𝑖𝑗) depends on 𝑦_𝑖𝑗^𝑐 only through 𝛿_𝑔(𝑦)_𝑖𝑗. This has practical implications when generating random networks, since it is often easier to compute 𝛿_𝑔(𝑦)_𝑖𝑗 directly than to compute 𝑔(𝑦_𝑖𝑗⁺) and 𝑔(𝑦_𝑖𝑗⁻) separately.
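The relationship between Equation 2.1 and the change statistics can be checked numerically on a very small network. The sketch below is only an illustration under stated assumptions: an undirected network on 4 nodes, the statistics "edges" and "triangles", and arbitrarily chosen coefficients, none of which are prescribed by the text. It enumerates all 64 possible networks to obtain 𝜅(𝜃, 𝒴) by brute force and then confirms that the conditional log-odds of a single link equal 𝜃ᵀ𝛿_𝑔(𝑦)_𝑖𝑗, because 𝜅 cancels in the ratio.

```python
import itertools
import math

import numpy as np


def g(y):
    """Statistics g(y) = (edges, triangles) of an undirected 0/1 adjacency matrix."""
    edges = y.sum() / 2
    triangles = np.trace(y @ y @ y) / 6
    return np.array([edges, triangles], dtype=float)


def all_networks(n):
    """Enumerate every undirected network on n nodes as an adjacency matrix."""
    dyads = list(itertools.combinations(range(n), 2))
    for bits in itertools.product([0, 1], repeat=len(dyads)):
        y = np.zeros((n, n), dtype=int)
        for (i, j), b in zip(dyads, bits):
            y[i, j] = y[j, i] = b
        yield y


def with_link(y, i, j, value):
    """Copy of y in which the dyad (i, j) is set to value (1 = link, 0 = no link)."""
    z = y.copy()
    z[i, j] = z[j, i] = value
    return z


theta = np.array([-1.0, 0.5])        # illustrative coefficients (assumed)
nets = list(all_networks(4))
weights = np.array([math.exp(theta @ g(y)) for y in nets])
kappa = weights.sum()                # normalizing constant of Equation 2.1
probs = weights / kappa              # P(Y = y) for every network in the set

# A network with links 0-2, 1-2 and 2-3; the dyad of interest is (0, 1).
y = np.zeros((4, 4), dtype=int)
for i, j in [(0, 2), (1, 2), (2, 3)]:
    y[i, j] = y[j, i] = 1

y_plus, y_minus = with_link(y, 0, 1, 1), with_link(y, 0, 1, 0)
delta = g(y_plus) - g(y_minus)       # change statistics, Equation 2.2

# Conditional log-odds of the link from the full probabilities (kappa cancels).
p_plus = math.exp(theta @ g(y_plus)) / kappa
p_minus = math.exp(theta @ g(y_minus)) / kappa
log_odds = math.log(p_plus / p_minus)

print(delta, log_odds, theta @ delta)
```

With 𝑁 nodes an undirected network has 𝑁(𝑁−1)/2 dyads, so the enumeration covers 2^(𝑁(𝑁−1)/2) networks. This is why the brute-force approach is feasible only for toy examples and why 𝜅(𝜃, 𝒴) is avoided in practice.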

There are several types of Markov Chain Monte Carlo (MCMC) algorithms for generating random networks. Generally, the starting point is an empty network. One of the links or non-links is then chosen uniformly at random. According to the model, the probability of establishing or dissolving the link is calculated and, based on this probability, the chosen link or non-link is established or dissolved. The process is iterative. In each iteration, the change in the values of the statistics caused by changing the link between 𝑖 and 𝑗 is computed; in other words, 𝛿_𝑔(𝑦)_𝑖𝑗 is calculated in each iteration. The iterative process stops when approximate convergence to the target distribution 𝑃_𝜃₀,𝒴 is reached (Hunter et al., 2008). When the Metropolis algorithm is used, the network with a changed link is accepted with the probability

$$\min\left\{1,\ \frac{P_{\theta_0,\mathcal{Y}}(Y = y_{proposed})}{P_{\theta_0,\mathcal{Y}}(Y = y_{current})}\right\} \tag{2.3}$$
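The Metropolis procedure described above can be sketched in a few lines. This is a minimal illustration, not the implementation used in any package: it assumes an undirected network, the statistics "edges" and "triangles", a fixed number of steps in place of a convergence check, and hypothetical function names. The acceptance ratio is evaluated through the change statistics, so the normalizing constant is never computed.

```python
import math
import random


def change_stats(y, i, j):
    """delta_g(y)_ij for g = (edges, triangles) on an undirected network.

    Adding the link i-j adds one edge and one triangle for every neighbour
    that i and j share, whatever the current state of the dyad (i, j).
    """
    shared = sum(1 for k in range(len(y)) if k != i and k != j and y[i][k] and y[j][k])
    return (1.0, float(shared))


def metropolis_sample(theta, n, steps, seed=0):
    """Toggle one uniformly chosen dyad per step, starting from the empty network."""
    rng = random.Random(seed)
    y = [[0] * n for _ in range(n)]
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)          # a dyad chosen uniformly at random
        delta = change_stats(y, i, j)
        # log of P(proposed) / P(current): +theta'delta when a link is added,
        # -theta'delta when an existing link is removed.
        sign = -1.0 if y[i][j] else 1.0
        log_ratio = sign * sum(t * d for t, d in zip(theta, delta))
        # Accept with probability min{1, ratio}.
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            y[i][j] = y[j][i] = 1 - y[i][j]
    return y


# Illustrative coefficients and chain length (assumed values).
y = metropolis_sample(theta=(-1.0, 0.2), n=10, steps=5000)
print(sum(map(sum, y)) // 2, "links in the sampled network")
```

Because the dyad is chosen uniformly, the proposal is symmetric and the plain Metropolis ratio applies. How many steps are needed for approximate convergence depends on the model and is not addressed by this sketch.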

More general is the Metropolis-Hastings algorithm (Hastings, 1970; Metropolis, Rosenbluth, Rosenbluth, Teller, & Teller, 1953), which is currently implemented in the “ergm” package for R (Hunter, Goodreau, & Handcock, 2013). In the Metropolis-Hastings algorithm, 𝑦_{𝑝𝑟𝑜𝑝𝑜𝑠𝑒𝑑} is chosen from an auxiliary distribution which depends on 𝑦_{𝑐𝑢𝑟𝑟𝑒𝑛𝑡}. Let 𝑃(𝑦₂|𝑦₁) denote the probability that 𝑦₂ becomes the new proposed network given that the current network is 𝑦₁. The proposed network is then accepted with the probability

$$\min\left\{1,\ \frac{P_{\theta_0,\mathcal{Y}}(Y = y_{proposed})}{P_{\theta_0,\mathcal{Y}}(Y = y_{current})} \cdot \frac{P(y_{current} \mid y_{proposed})}{P(y_{proposed} \mid y_{current})}\right\} \tag{2.4}$$

When using the Metropolis or Metropolis-Hastings algorithms (or Gibbs sampling), the normalizing constant 𝜅(𝜃, 𝒴) cancels in the ratio of ERGM probabilities (Equation 2.1), and the ratio simplifies to

$$\frac{P_{\theta_0,\mathcal{Y}}(Y = y_{proposed})}{P_{\theta_0,\mathcal{Y}}(Y = y_{current})} = \exp\{\theta_0^{T}[g(y_{proposed}) - g(y_{current})]\} \tag{2.5}$$

Different methods can be used to estimate the parameters. A well-known one is the maximum pseudo-likelihood estimator (MPLE). Its estimates can be biased in some cases (Corander, Dahmström, & Dahmström, 1998); the bias arises from the assumption of independence between the links, which can also result in higher standard errors (Van Duijn, Gile, & Handcock, 2009). Desmarais & Cranmer (2012) proposed an efficient approach (based on non-parametric bootstrapping) to compute confidence intervals for MPLE estimates.
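In computational terms, the MPLE amounts to a logistic regression in which each dyad is one observation, the response is 𝑦_𝑖𝑗 and the predictors are the change statistics 𝛿_𝑔(𝑦)_𝑖𝑗, with no separate intercept. The sketch below illustrates this under assumptions that are not part of the text: an undirected network, the statistics "edges" and "triangles", a plain Newton-Raphson fit without any safeguards, and a hypothetical function name `mple`.

```python
import itertools

import numpy as np


def change_stats(y, i, j):
    """delta_g(y)_ij for g = (edges, triangles) on an undirected network."""
    shared = sum(y[i, k] * y[j, k] for k in range(len(y)) if k not in (i, j))
    return [1.0, float(shared)]


def mple(y, n_stats=2, iters=50):
    """Maximum pseudo-likelihood estimate via Newton-Raphson logistic regression.

    n_stats=1 fits the edges-only model, n_stats=2 adds the triangle term.
    """
    dyads = list(itertools.combinations(range(len(y)), 2))
    X = np.array([change_stats(y, i, j)[:n_stats] for i, j in dyads])
    z = np.array([y[i, j] for i, j in dyads], dtype=float)
    theta = np.zeros(n_stats)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ theta))          # fitted link probabilities
        grad = X.T @ (z - p)                          # score of the pseudo-likelihood
        hess = X.T @ (X * (p * (1 - p))[:, None])     # observed information
        theta = theta + np.linalg.solve(hess, grad)
    return theta


# Toy network (assumed): a triangle 0-1-2, a pendant link 2-3, and an isolate 4.
y = np.zeros((5, 5), dtype=int)
for i, j in [(0, 1), (0, 2), (1, 2), (2, 3)]:
    y[i, j] = y[j, i] = 1

theta_edges = mple(y, n_stats=1)   # edges only: the log-odds of the density, log(0.4 / 0.6)
theta_full = mple(y, n_stats=2)    # edges + triangles
print(theta_edges, theta_full)
```

The sketch ignores cases in which the fit does not exist (for example, complete separation, where the Newton steps diverge or the Hessian becomes singular) and it reports no standard errors. It is meant only to show why the MPLE is cheap compared with simulation-based estimation.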

The parameter estimates are less biased and less variable when the Markov Chain Monte Carlo maximum likelihood estimator (MCMC-MLE) is used. The estimates are obtained using Monte Carlo simulations, which can be computationally very intensive, especially when the analysed network has a large number of nodes. However, as the number of nodes increases, the MPLE estimates approach the MCMC-MLE estimates (Hyvarinen, 2007; Strauss & Ikeda, 1990). Besides being limited to networks with a smaller number of nodes, the main issue with ERGM is that the distribution of sufficient statistics can be multimodal (Snijders, 2002), particularly when statistics related to transitivity are included in the model (Jonasson, 1999). So-called degeneracy emerges when the empirical data do not fit the estimated model (e.g. due to an inadequate selection of terms or explanatory variables). In such cases, the networks generated from the model do not, on average, fit the empirical network (the distribution of estimated parameters is not appropriate). Usually, such generated networks are either completely empty or completely full (Handcock et al., 2003).

Model degeneracy can be avoided with a well-defined research question and thus an appropriate selection of explanatory variables, or terms, as they are called in the ERGM context. The choice of terms is typically based on theory and context (Goodreau, 2007; Morris et al., 2008). Different terms can be added and removed; in the end, the model that best fits the empirical data is chosen. Another technique for choosing the terms is to consider different forms of dependence between the links (Frank & Strauss, 1986), which can be arranged in a partially ordered dependence hierarchy for ERGM (Block et al., 2016). Block, Stadtfeld & Snijders (2016) summarized the terms most frequently used in practice:

(i) when independence of ties (all links are established with a certain probability, which does not depend on other links) is assumed, only the term "edge" (a term for the density) should be included in the ERGM;

(ii) when dyadic independence (the link 𝑖 → 𝑗 depends only on 𝑗 → 𝑖 and is independent of all other links in the network) is assumed, only the term for the density and the term for mutuality must be included in the ERGM;

(iii) when Markov dependence (proposed by Frank & Strauss (1986): two links are conditionally independent unless they have at least one node in common) is assumed, terms related to transitive triads, in-stars and out-stars must be considered in the ERGM;

(iv) in the case of social circuit dependence (proposed by Pattison & Robins (2002): two possible links, one between 𝑖 and 𝑗 and one between ℎ and 𝑙, are conditionally dependent when links exist between 𝑖 and ℎ and between 𝑗 and 𝑙, so that the two links would close a 4-cycle), terms of the 4-cycle type and terms related to different types of geometrically weighted edgewise shared partners must be included in the ERGM.
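To make the terms listed in (i) to (iii) concrete, the statistics behind them can be counted directly from a directed adjacency matrix. The following is a rough sketch; the function name and the dictionary keys are illustrative labels rather than the identifiers of any particular package, the matrix is assumed to be binary with a zero diagonal, and the geometrically weighted terms from (iv) are left out.

```python
import numpy as np


def ergm_terms(y):
    """Count a few common ERGM statistics on a directed 0/1 adjacency matrix.

    Assumes y[i, j] = 1 for a link i -> j and a zero diagonal (no self-links).
    """
    y = np.asarray(y, dtype=int)
    out_deg = y.sum(axis=1)
    in_deg = y.sum(axis=0)
    return {
        # (i) independence of ties: density only
        "edges": int(y.sum()),
        # (ii) dyadic independence: pairs with links in both directions
        "mutual": int((y * y.T).sum() // 2),
        # (iii) Markov dependence: 2-stars and transitive triples
        "out_2_stars": int((out_deg * (out_deg - 1) // 2).sum()),
        "in_2_stars": int((in_deg * (in_deg - 1) // 2).sum()),
        # ordered triples (i, j, k) with i -> j, j -> k and i -> k
        "transitive_triples": int(((y @ y) * y).sum()),
    }


# Toy directed network (assumed): 0 -> 1, 1 -> 0, 1 -> 2, 0 -> 2.
y = np.zeros((3, 3), dtype=int)
for i, j in [(0, 1), (1, 0), (1, 2), (0, 2)]:
    y[i, j] = 1

terms = ergm_terms(y)
print(terms)
```

Each entry corresponds to one component of the vector 𝑔(𝑦) in Equation 2.1; moving down the dependence hierarchy simply means adding components to that vector.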

Nevertheless, when studying empirical networks, the term edge is almost always included in the model since it controls the density (when only the term edge is included in the model, the networks are generated by the 𝐺(𝑛, 𝑝) model). The density is generally treated as a random variable in ERGMs, which is more realistic for modelling social networks than an assumed fixed density (although it is also possible to construct models with a fixed density), since the density is seen as a product of social processes and therefore cannot be known in advance (Hunter et al., 2008).
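The correspondence between the edge-only model and 𝐺(𝑛, 𝑝) follows directly from Equation 2.1. As a short worked derivation, assume that 𝑔(𝑦) is the number of links, that 𝒴 contains all networks on the given node set, and let 𝑀 (a symbol introduced here only for illustration) denote the number of dyads:

```latex
\[
\begin{aligned}
P_{\theta,\mathcal{Y}}(Y = y)
  &= \frac{\exp\{\theta \sum_{(i,j)} y_{ij}\}}{\kappa(\theta, \mathcal{Y})}
   = \prod_{(i,j)} \frac{e^{\theta y_{ij}}}{1 + e^{\theta}},
  \qquad \kappa(\theta, \mathcal{Y}) = (1 + e^{\theta})^{M}, \\
p &= P(Y_{ij} = 1) = \frac{e^{\theta}}{1 + e^{\theta}} .
\end{aligned}
\]
```

The probability factorizes over dyads, so every link is present independently with the same probability 𝑝, which is the 𝐺(𝑛, 𝑝) model with 𝜃 equal to the log-odds of 𝑝. The number of links, and hence the density, is then binomially distributed rather than fixed.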

In document Local Mechanisms Affecting the Evolution of Blockmodels (Page 48-52)