Toivonen et al. (2009) classified ERGMs as a separate category in their classification of social network models.
ERGMs are “a family of statistical models for social networks that permit inference about prominent patterns in the data, given the presence of other network structures” (Robins, 2011, pp. 484–485). They are used to check to what extent the global network structure can be explained by the structure of links and/or the characteristics of the nodes. Compared to NEMs, ERGMs do not consider the (evolution) process, although the MCMC algorithms (see section 1.4.3) can be used to model the evolution of social networks (Snijders, 2001).
The formal definition of ERGM presented here is mostly summarized from Hunter et al. (2008). Let us assume a given random network $Y$ (a concrete realization of the random network is denoted by $y$), consisting of $n$ nodes. In such a network, the link between the $i$-th and $j$-th nodes is represented by a random variable $Y_{ij}$. The set of all possible networks is denoted by $\mathcal{Y}$. The distribution of $Y$ can therefore be written as
$$P_{\theta,\mathcal{Y}}(Y = y) = \frac{\exp\{\theta^{T} g(y)\}}{\kappa(\theta, \mathcal{Y})}, \quad y \in \mathcal{Y} \qquad (2.1)$$
where $\theta$ is a vector of coefficients and $g(y)$ is a vector of statistics computed from the matrix $y$. $\kappa(\theta, \mathcal{Y})$ is a normalizing constant which ensures that the probabilities sum to 1. The estimation is computationally very intensive, especially for networks with a larger number of nodes. In the context of generating random networks based on the model above, the change statistics (Wasserman & Pattison, 1996) must be mentioned. They are defined as
$$\delta_g(y)_{ij} = g(y_{ij}^{+}) - g(y_{ij}^{-}) \qquad (2.2)$$
where $g(y_{ij}^{+})$ is the vector of statistics obtained on the network with the link between the $i$-th and $j$-th nodes, and $g(y_{ij}^{-})$ is the vector of statistics obtained on the network without a link between the $i$-th and $j$-th nodes. It can be shown that the probability $P_{\theta,\mathcal{Y}}(Y_{ij} = 1 \mid Y_{ij}^{c} = y_{ij}^{c})$ (where $Y_{ij}^{c}$ represents the rest of the network other than the single variable $Y_{ij}$) depends on $y_{ij}^{c}$ only through $\delta_g(y)_{ij}$. This has practical implications when generating random networks, since it is often easier to compute $\delta_g(y)_{ij}$ directly than to compute $g(y_{ij}^{+})$ and $g(y_{ij}^{-})$ separately.
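As an illustration of Equation 2.2, the following minimal Python sketch computes the change statistics for an undirected network in two ways: literally, as the difference of two full statistic vectors, and directly from the neighbourhood of the two nodes. It then turns the result into the conditional link probability, whose log-odds equal $\theta^{T}\delta_g(y)_{ij}$. The choice of statistics (edges and triangles), the function names, the small example network and the coefficient values are assumptions made purely for illustration.

```python
import math

def g(y):
    """Vector of statistics: (number of edges, number of triangles)."""
    n = len(y)
    edges = sum(y[i][j] for i in range(n) for j in range(i + 1, n))
    triangles = sum(
        y[i][j] * y[j][k] * y[i][k]
        for i in range(n) for j in range(i + 1, n) for k in range(j + 1, n)
    )
    return (edges, triangles)

def with_link(y, i, j, value):
    """Copy of y with the undirected link i-j set to value (0 or 1)."""
    z = [row[:] for row in y]
    z[i][j] = z[j][i] = value
    return z

def change_stats_naive(y, i, j):
    """Equation 2.2 computed literally: g(y_ij^+) - g(y_ij^-)."""
    plus = g(with_link(y, i, j, 1))
    minus = g(with_link(y, i, j, 0))
    return tuple(a - b for a, b in zip(plus, minus))

def change_stats_direct(y, i, j):
    """Same quantity computed locally: the link i-j adds one edge and one
    triangle for every partner shared by i and j."""
    shared = sum(y[i][k] * y[j][k] for k in range(len(y)) if k not in (i, j))
    return (1, shared)

def conditional_link_probability(theta, delta):
    """P(Y_ij = 1 | rest of the network) = logistic(theta^T delta)."""
    eta = sum(t * d for t, d in zip(theta, delta))
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical 4-node network with links 0-1, 0-2, 1-2 and 2-3
y = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]
theta = (-1.0, 0.5)  # assumed coefficients for (edges, triangles)
delta = change_stats_direct(y, 0, 3)
prob = conditional_link_probability(theta, delta)
```

The direct version touches only the rows of nodes $i$ and $j$, whereas the literal version recounts every triangle in the network twice, which is why the change statistics are preferred in simulation.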
There are several types of Markov Chain Monte Carlo (MCMC) algorithms for generating random networks. Generally, the chain starts from an empty network. Then, one of the links or non-links is chosen uniformly at random. According to the model, the probability of establishing or dissolving a link is calculated and then, based on this probability, the chosen link or non-link is established or dissolved. The process is iterative. In each iteration, the change in the values of the statistics caused by changing the link between $i$ and $j$ is computed; in other words, $\delta_g(y)_{ij}$ is calculated in each iteration. The iterative process stops when approximate convergence to the target distribution $P_{\theta_0,\mathcal{Y}}$ is reached (Hunter et al., 2008). When the Metropolis algorithm is used, the network with a changed link is accepted with the probability
$$\min\left\{1, \frac{P_{\theta_0,\mathcal{Y}}(Y = y_{\text{proposed}})}{P_{\theta_0,\mathcal{Y}}(Y = y_{\text{current}})}\right\} \qquad (2.3)$$
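The Metropolis procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in any particular package: the network is undirected, the statistics are edges and triangles, the proposal toggles one uniformly chosen dyad, and the function names and parameter values are hypothetical. The sketch relies on the fact that the normalizing constant cancels from the ratio of probabilities, so that the ratio equals $\exp\{\pm\theta^{T}\delta_g(y)_{ij}\}$, with a plus sign when a link is added and a minus sign when it is removed.

```python
import math
import random

def shared_partners(y, i, j):
    """Number of nodes linked to both i and j."""
    return sum(y[i][k] * y[j][k] for k in range(len(y)) if k != i and k != j)

def change_stats(y, i, j):
    """delta_g(y)_ij for g = (edges, triangles) in an undirected network."""
    return (1, shared_partners(y, i, j))

def acceptance_probability(theta, delta, link_present):
    """Metropolis acceptance probability for toggling one link.
    Adding a link gives the ratio exp(theta^T delta);
    removing an existing link gives exp(-theta^T delta)."""
    eta = sum(t * d for t, d in zip(theta, delta))
    log_ratio = -eta if link_present else eta
    return 1.0 if log_ratio >= 0 else math.exp(log_ratio)

def metropolis_sample(n, theta, steps, seed=0):
    """Run a fixed number of Metropolis steps and return the final network."""
    rng = random.Random(seed)
    y = [[0] * n for _ in range(n)]          # start from the empty network
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)       # choose a dyad uniformly at random
        delta = change_stats(y, i, j)
        if rng.random() < acceptance_probability(theta, delta, y[i][j] == 1):
            y[i][j] = y[j][i] = 1 - y[i][j]  # accept: toggle the link
    return y

# Assumed coefficients for (edges, triangles); the values are illustrative only
network = metropolis_sample(n=10, theta=(-1.0, 0.2), steps=5000, seed=42)
n_edges = sum(map(sum, network)) // 2
```

A fixed number of steps stands in here for a proper convergence check, which a real application would need.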
More general is the Metropolis-Hastings algorithm (Hastings, 1970; Metropolis, Rosenbluth, Rosenbluth, Teller, & Teller, 1953), which is currently implemented in the “ergm” package for the R language (Hunter, Goodreau, & Handcock, 2013). In the Metropolis-Hastings algorithm, $y_{\text{proposed}}$ is chosen based on an auxiliary distribution which depends on $y_{\text{current}}$. Let $q(y_2 \mid y_1)$ denote the probability that $y_2$ becomes the new proposed network given that the current network is $y_1$. The proposed network is then accepted with the probability
$$\min\left\{1, \frac{P_{\theta_0,\mathcal{Y}}(Y = y_{\text{proposed}})}{P_{\theta_0,\mathcal{Y}}(Y = y_{\text{current}})} \cdot \frac{q(y_{\text{current}} \mid y_{\text{proposed}})}{q(y_{\text{proposed}} \mid y_{\text{current}})}\right\} \qquad (2.4)$$
When using the Metropolis or Metropolis-Hastings algorithms (or Gibbs sampling), the normalizing constant $\kappa(\theta, \mathcal{Y})$ cancels from the ratio of ERGM probabilities (Equation 2.1) and the ratio simplifies to
$$\frac{P_{\theta_0,\mathcal{Y}}(Y = y_{\text{proposed}})}{P_{\theta_0,\mathcal{Y}}(Y = y_{\text{current}})} = \exp\{\theta_0^{T}[g(y_{\text{proposed}}) - g(y_{\text{current}})]\} \qquad (2.5)$$
Different methods can be used to estimate the parameters. A well-known one is the maximum pseudo-likelihood estimator (MPLE), whose estimates can be biased in some cases (Corander, Dahmström, & Dahmström, 1998); the bias arises from the assumption of independence between the nodes. This assumption can also result in higher values of standard errors (Van Duijn, Gile, & Handcock, 2009). Desmarais & Cranmer (2012) proposed an efficient approach (based on non-parametric bootstrapping) to compute the confidence intervals for MPLE estimates.
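To make the MPLE more concrete: the pseudo-likelihood treats each link variable as an independent Bernoulli outcome whose log-odds are $\theta^{T}\delta_g(y)_{ij}$, so maximizing it amounts to a logistic regression of the observed links on their change statistics. The sketch below is a minimal illustration under stated assumptions: an undirected network, edge and triangle statistics, and plain gradient ascent with a fixed step size that is adequate only for this toy example. The function names and the example network are hypothetical.

```python
import math

def shared_partners(y, i, j):
    """Number of nodes linked to both i and j."""
    return sum(y[i][k] * y[j][k] for k in range(len(y)) if k != i and k != j)

def mple(y, iterations=5000, step=0.1):
    """Maximum pseudo-likelihood estimate for g = (edges, triangles).
    Each dyad contributes one logistic-regression row: the response is the
    observed link and the predictors are the change statistics."""
    n = len(y)
    rows = [((1, shared_partners(y, i, j)), y[i][j])
            for i in range(n) for j in range(i + 1, n)]
    theta = [0.0, 0.0]
    for _ in range(iterations):
        grad = [0.0, 0.0]
        for delta, link in rows:
            eta = theta[0] * delta[0] + theta[1] * delta[1]
            p = 1.0 / (1.0 + math.exp(-eta))
            for k in range(2):
                grad[k] += (link - p) * delta[k]  # gradient of the log pseudo-likelihood
        theta = [t + step * gk for t, gk in zip(theta, grad)]
    return theta

# Hypothetical 5-node network: links 0-1, 0-2, 1-2, 2-3; node 4 is isolated
y = [[0, 1, 1, 0, 0],
     [1, 0, 1, 0, 0],
     [1, 1, 0, 1, 0],
     [0, 0, 1, 0, 0],
     [0, 0, 0, 0, 0]]
theta_hat = mple(y)
```

In this example every dyad has either zero or one shared partner, so the estimate can be checked by hand: one of the five dyads without a shared partner is linked and three of the five dyads with one shared partner are linked, which gives $\hat\theta_{\text{edges}} = \ln(0.2/0.8)$ and $\hat\theta_{\text{triangles}} = \ln(0.6/0.4) - \ln(0.2/0.8) = \ln 6$.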
The parameters’ estimates are less biased and less variable when the Markov Chain Monte Carlo maximum likelihood estimator (MCMC-MLE) is used. The estimates are obtained using Monte Carlo simulations, which can be computationally very intensive, especially when the analysed network consists of a larger number of nodes. However, as the number of nodes increases, the MPLE estimates approach the MCMC-MLE estimates (Hyvarinen, 2007; Strauss & Ikeda, 1990). Besides the limitation to networks with a smaller number of nodes, the main issue with ERGM is that the distribution of sufficient statistics can be multimodal (Snijders, 2002), particularly when statistics related to transitivity are included in the model (Jonasson, 1999). So-called degeneracy emerges
when the empirical data do not fit the estimated model (e.g. due to inadequate selection of the terms or explanatory variables). In such cases, the generated networks based on the model do not fit the empirical network on average (the distribution of estimated parameters is not appropriate).
Usually, such generated networks are either completely empty or completely full (Handcock et al., 2003).
Model degeneracy can be avoided with a well-defined research question and thus an appropriate selection of explanatory variables, or terms as they are called in the ERGM context.
The choice of terms is typically based on the theory and the context (Goodreau, 2007; Morris et al., 2008). Different terms can be added and removed; in the end, the model that best fits the empirical data is chosen. Another technique for choosing the terms is to consider different forms of dependencies between the nodes (Frank & Strauss, 1986), which can be ordered in a partially ordered dependence hierarchy for ERGM (Block et al., 2016). Block, Stadtfeld & Snijders (2016) summarized the following terms as the most frequently used in practice: (i) when the independence of ties (all links are established with a certain probability, which does not depend on other links) is assumed, only the term “edge” (a term for the density) should be included in the ERGM;
(ii) when dyadic independence (the link $i \to j$ depends only on $j \to i$ and is independent of all other links in the network) is assumed, only the term for the density and the term for mutuality must be included in the ERGM; (iii) when Markov dependence, proposed by Frank and Strauss (1986), is assumed (two links are conditionally independent, given all other links in the network, unless they have at least one node in common), terms related to transitive triads, in-stars and out-stars must be considered in the ERGM; and (iv) in the case of social circuit dependence, proposed by Pattison & Robins (2002) (the links between $i$ and $j$ and between $h$ and $k$ are conditionally dependent if links exist between nodes $i$ and $k$ and also between $h$ and $j$), terms of the 4-cycle type and terms related to different types of geometrically weighted edgewise shared partners must be included in the ERGM.
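The statistics behind several of the terms listed above are simple counts on the adjacency matrix. The following minimal Python sketch computes them for a small directed network; the function name, the dictionary keys and the example network are assumptions for illustration and are not the term names of any particular software package.

```python
def ergm_statistics(y):
    """Counts underlying common ERGM terms for a directed adjacency matrix y."""
    n = len(y)
    nodes = range(n)
    # Density term: number of directed links
    edges = sum(y[i][j] for i in nodes for j in nodes if i != j)
    # Mutuality term: number of dyads with links in both directions
    mutual = sum(y[i][j] * y[j][i] for i in nodes for j in range(i + 1, n))
    out_degree = [sum(y[i][j] for j in nodes if j != i) for i in nodes]
    in_degree = [sum(y[i][j] for i in nodes if i != j) for j in nodes]
    # Star terms: pairs of links sent by (out) or received by (in) the same node
    out_2_stars = sum(d * (d - 1) // 2 for d in out_degree)
    in_2_stars = sum(d * (d - 1) // 2 for d in in_degree)
    # Transitivity term: ordered triples with i -> j, j -> k and i -> k
    transitive_triples = sum(
        y[i][j] * y[j][k] * y[i][k]
        for i in nodes for j in nodes for k in nodes
        if len({i, j, k}) == 3
    )
    return {
        "edges": edges,
        "mutual": mutual,
        "out_2_stars": out_2_stars,
        "in_2_stars": in_2_stars,
        "transitive_triples": transitive_triples,
    }

# Hypothetical directed network: 0->1, 1->0, 0->2, 1->2, 3->2
y = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [0, 0, 0, 0],
     [0, 0, 1, 0]]
stats = ergm_statistics(y)
```

Under the independence of ties only the first count would enter the model, dyadic independence adds the second, and Markov dependence adds the star and transitivity counts.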
Nevertheless, when studying empirical networks, the term edge is almost always included in the model since it controls the density (when only the term edge is included in the model, the networks are generated by the $G(n, p)$ model). The density is generally treated as a random variable in ERGMs, which is more realistic when modelling social networks than assuming a fixed density (although it is also possible to construct models with a fixed density), since
the density is seen as a product of social processes and it therefore cannot be known in advance (Hunter et al., 2008).
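The correspondence between the edge-only ERGM and the $G(n, p)$ model can be made explicit: with a single edge term, every link is present independently with probability $p = \exp(\theta)/(1 + \exp(\theta))$, and the maximum likelihood estimate of $\theta$ is the log-odds of the observed density. The short Python sketch below illustrates this for an undirected network; the function names and the example network are assumptions for illustration.

```python
import math

def edge_probability(theta_edges):
    """Link probability p of the G(n, p) model implied by an edge-only ERGM."""
    return 1.0 / (1.0 + math.exp(-theta_edges))

def edges_only_mle(y):
    """MLE of the edge coefficient: the log-odds of the observed density.
    Undefined for completely empty or completely full networks."""
    n = len(y)
    n_dyads = n * (n - 1) // 2
    n_edges = sum(y[i][j] for i in range(n) for j in range(i + 1, n))
    density = n_edges / n_dyads
    return math.log(density / (1.0 - density))

# Hypothetical 5-node undirected network with 4 of the 10 possible links
y = [[0, 1, 1, 0, 0],
     [1, 0, 1, 0, 0],
     [1, 1, 0, 1, 0],
     [0, 0, 1, 0, 0],
     [0, 0, 0, 0, 0]]
theta_edges = edges_only_mle(y)
p = edge_probability(theta_edges)  # recovers the observed density
```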