2.5 Maximum likelihood estimation and network simulation

2.5.4 Model degeneracy

Parameter estimation for the ERGM class has been known to be difficult since Frank and Strauss (1986), but it was not until the advent of efficient network simulation methods that problems with particular model specifications became obvious. A common problem is that, for particular choices of sufficient network statistics, algorithms like (1) are not able to find a maximum of the log likelihood ratio (2.39). Snijders (2002) speculates that the convergence issues are an algorithmic shortcoming, but Snijders et al. (2006) realize that the problems are caused by some specifications which are intended to model transitivity. If triangles are used as sufficient network statistics in the ERGM specification, the networks simulated from $p(y \mid \hat\theta)$ may not look like the observed network at all, even though $\hat\theta$ appears to be a reasonable parameter value. Rather, the simulated networks are either completely empty, with no tie existing, or full graphs with all possible ties present. This is the result of what Handcock (2003b) defines as degeneracy of an ERGM model specification. The expected value $E_{\hat\theta}[s(y)]$ is close to the relative boundary of the convex hull of the set

$$\{ s(\tilde y) : \tilde y \in \mathcal{Y} \},$$

see Rinaldo et al. (2009). As a result, most of the probability of $p(y \mid \hat\theta)$ is concentrated on the full graph and the empty graph, but little or no probability is placed on realistic configurations similar to the observed network $y$. MCMC-ML estimation might fail or the estimates might have huge variance. This is the reason why finding a suitable starting value for algorithm (1) is so difficult, as a value $\theta_0$ which falls into the degenerate region will cause non-convergence of the approximate Fisher scoring algorithm. Also, the TNT sampler used for network simulation will not converge.
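The concentration of probability mass on the empty and the full graph can be checked exactly on a toy example. The following is a minimal sketch, assuming an ERGM of the form $p(y \mid \theta) \propto \exp(\theta_L L(y) + \theta_T T(y))$ with edge count $L(y)$ and triangle count $T(y)$ on $n = 5$ nodes, so that all $2^{10} = 1024$ graphs can be enumerated. The function name `edge_triangle_pmf` and the parameter values $(-3, 3)$ are illustrative assumptions, not taken from the text.

```python
from itertools import combinations, product
from math import exp


def edge_triangle_pmf(n, theta_edges, theta_triangles):
    """Exact distribution of the edge count under an edge-triangle ERGM.

    Enumerates every undirected graph on n nodes, so only tiny n is feasible.
    """
    dyads = list(combinations(range(n), 2))
    triples = list(combinations(range(n), 3))
    weights = [0.0] * (len(dyads) + 1)
    for ties in product((0, 1), repeat=len(dyads)):
        present = {d for d, t in zip(dyads, ties) if t}
        n_edges = len(present)
        # A triple (a < b < c) is a triangle if all three of its dyads are tied.
        n_triangles = sum(
            (a, b) in present and (a, c) in present and (b, c) in present
            for a, b, c in triples
        )
        weights[n_edges] += exp(theta_edges * n_edges
                                + theta_triangles * n_triangles)
    total = sum(weights)
    return [w / total for w in weights]


# Assumed, purely illustrative parameter values.
pmf = edge_triangle_pmf(5, -3.0, 3.0)
mean_edges = sum(k * p for k, p in enumerate(pmf))
print("P(empty) + P(full):", pmf[0] + pmf[10])
print("mean edge count:", mean_edges)
print("P(5 edges):", pmf[5])
```

Under these assumed parameters the empty and the full graph have the same unnormalized weight and together carry most of the probability, while graphs with about five ties are nearly impossible. This is the pattern described above: the mean of the statistic lies in a region that the model almost never visits.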

Model degeneracy is related to phase transitions in the Ising model as discussed by Besag (1974) and Frank and Strauss (1986) and becomes apparent in network simulation as a sudden jump in the expected value $E_\theta[s(y)]$ as a function of $\theta$.

This function may be continuous but shows a sudden increase in its gradient, resulting in a jump right over the value $\hat\theta$ which satisfies $E_{\hat\theta}[s(y)] = s(y)$, rendering $E_\theta[s(y)]$ a nearly discontinuous function. This phase transition occurs at the value $\hat\theta$ which otherwise would be accepted as the ML estimate. Unfortunately, this estimate produces only nonsensical simulated networks $y_{\hat\theta}$ which are a mixture of full and empty graphs. The result is a bimodal probability distribution of the network statistic $s(y \mid \theta)$ for parameters near $\hat\theta$. A parameter estimate from that degenerate region on average reproduces the observed value $s(y)$, so that indeed

$$E_{\hat\theta}[s(y)] = s(y),$$

but both modes of $s(y \mid \hat\theta)$ are far from the mean $E_{\hat\theta}[s(y)]$. The fitted model is not able to reproduce meaningful network data. The TNT sampler used for network simulation will not converge but rather jump between the two modes of $s(y \mid \hat\theta)$, while almost no probability mass falls in between. Schweinberger (2011) gives details on ERGM degeneracy and on the model specifications under which this problem is likely to occur.
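One way to see why the steep gradient and the bimodality go together is a standard exponential family identity. The sketch assumes the ERGM is written as $p(y \mid \theta) = \exp(\theta^\top s(y)) / \kappa(\theta)$ with normalizing constant $\kappa(\theta)$; the symbol $\kappa$ is an assumption here, since the model definition is not reproduced in this section. Differentiating $\log \kappa(\theta)$ gives

```latex
\begin{align*}
E_\theta[s(y)] &= \frac{\partial}{\partial\theta} \log \kappa(\theta), \\
\frac{\partial}{\partial\theta^\top} E_\theta[s(y)]
  &= \frac{\partial^2}{\partial\theta\,\partial\theta^\top} \log \kappa(\theta)
   = \operatorname{Cov}_\theta[s(y)].
\end{align*}
```

The slope of the mean value function therefore equals the variance of the statistic. A distribution of $s(y)$ that is split between two distant modes has a very large variance, which shows up as the near-vertical jump of $E_\theta[s(y)]$.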

Snijders et al. (2006) hint that the reason for model degeneracy is not an algorithmic problem but is inherent to the ERGM likelihood under the assumption of Markov dependence. They describe an avalanche effect that can occur under model degeneracy if a graph has moderate to low density but shows substantial transitivity, which is typical for social networks. If the ERGM includes a parameter for transitive triangles or $k$-stars, there is a tendency towards huge parameter values for these statistics. The alternating signs of the shared partner statistics discussed in section 2.4.4 are meant to prevent this avalanche effect. Hunter (2007) introduces a model specification similar to that of Snijders et al. (2006), based on the $GWESP(y)$ and $GWDSP(y)$ statistics discussed in section 2.4.4, which can prevent model degeneracy. These sufficient network statistics require the partial conditional dependence assumption of Pattison and Robins (2002). Further, the assumption of nodal homogeneity may contribute to ERGM instability, see Thiemichen et al. (2016). Handcock (2003b) proposes to ameliorate the algorithmic difficulties caused by near degeneracy with a Bayesian approach using suitable prior distributions on the sample space of $\theta$. Indeed, Bayesian ERGM estimation is more robust to model degeneracy than MCMC-ML and will be discussed in chapter 4.
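To see how geometric weighting dampens the avalanche effect, it may help to look at the contribution of a single tie with $k$ shared partners. The sketch below assumes the Hunter (2007) form $GWESP(y) = e^{\alpha_E} \sum_{k=1}^{n-2} \{1 - (1 - e^{-\alpha_E})^k\}\, EP_k(y)$, where $EP_k(y)$ counts ties with exactly $k$ shared partners. Section 2.4.4 is not reproduced here, so this notation, including the weights $w_k$, is an assumption.

```latex
\begin{align*}
w_k &= e^{\alpha_E}\left\{1 - \left(1 - e^{-\alpha_E}\right)^k\right\},
  \qquad w_1 = 1, \qquad \lim_{k \to \infty} w_k = e^{\alpha_E}, \\
w_{k+1} - w_k &= \left(1 - e^{-\alpha_E}\right)^k .
\end{align*}
```

Each additional shared partner adds a geometrically shrinking amount, whereas in a triangle count every shared partner adds the same amount. For $\alpha_E = 0.2$ the factor $1 - e^{-\alpha_E}$ is roughly 0.18, so the increments die out after one or two shared partners. For $\alpha_E = 1.2$ it is roughly 0.70, so the statistic behaves much more like a triangle count.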

The phenomenon of model degeneracy is illustrated using simulations of small networks on $n = 10$ nodes. Two sets of networks are simulated from different ERGM specifications $m_1$ and $m_2$, both containing $L(y)$ and $GWESP(y)$ as sufficient network statistics. In $m_1$, $\theta_L = -2.4$ is fixed and $\theta_{GWESP}$ ranges from 0.5 to 1.5 in steps of 0.01. $GWESP(y)$ is specified with $\alpha_E = 1.2$, which is far too high a value for such a small network. In $m_2$, $\alpha_E = 0.2$, which is a much better value as only few shared partners contribute to the change statistics of $GWESP(y)$. For each step of $\theta_{GWESP}$, $R = 1{,}000$ networks are simulated and the mean of $GWESP(y)$ is calculated. For $\alpha_E = 1.2$ in $m_1$ there is strong evidence of model degeneracy, as there is a massive jump in the expected value $E_\theta[GWESP(y)]$ around $\theta = 0.68$, see figure 2.9. The resulting probability distribution of $GWESP(y)$ is bimodal: substantial probability mass is placed on the empty graph at $GWESP(y) = 0$ and on a high density graph, whereas the density at the mean of $GWESP(y)$ is very low. Obviously $\alpha_E = 1.2$ is too large for this particular network, and a social circuit model should be specified that puts more weight on a lower degree of shared partners. If this weight is reduced to $\alpha_E = 0.2$ as in $m_2$, see figure 2.10, the phase transition disappears and only at the steepest point of $E[GWESP(y) \mid \theta_{GWESP}]$ does the density of $GWESP(y)$ show some weak bimodality. If the weight is further reduced to $\alpha_E = 0.1$ (not shown), the bimodality completely disappears.

Figure 2.9: Model degeneracy resulting from $m_1$.
Left panel: The phase transition is obvious around the value of $\theta = 0.68$ indicated by the red line.
Right panel: The resulting distribution of $GWESP(y)$ given that parameter value is bimodal, with both modes far from the mean of the network statistic (blue line). $\alpha_E = 1.2$ causes model degeneracy.

Figure 2.10: No model degeneracy with $m_2$.
Left panel: No phase transition observable.
Right panel: The resulting distribution of $GWESP(y)$ at $\theta = 1.14$ is weakly bimodal with the major mode being identical to the mean. The low value of $\alpha_E = 0.2$ greatly
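The simulation study described above can be sketched in a few lines. This is a hedged illustration, not the procedure used for figures 2.9 and 2.10. It swaps the TNT sampler for a plain Metropolis dyad-toggle chain, and it assumes the Hunter (2007) form of the statistic, in which a tie with $k$ shared partners contributes $e^{\alpha_E}\{1 - (1 - e^{-\alpha_E})^k\}$. It also assumes $\theta_L = -2.4$ for both values of $\alpha_E$, and it uses far fewer draws and a much coarser grid than $R = 1{,}000$ and steps of 0.01. The names `gwesp`, `toggle` and `mean_gwesp` are hypothetical.

```python
import math
import random
from itertools import combinations


def gwesp(adj, alpha):
    """GWESP(y) in the assumed Hunter (2007) form.

    A tie with k shared partners contributes e^alpha * (1 - (1 - e^-alpha)^k).
    """
    decay = 1.0 - math.exp(-alpha)
    total = 0.0
    for i, j in combinations(range(len(adj)), 2):
        if j in adj[i]:
            shared = len(adj[i] & adj[j])
            total += 1.0 - decay ** shared
    return math.exp(alpha) * total


def toggle(adj, i, j):
    """Flip the tie between nodes i and j in place."""
    if j in adj[i]:
        adj[i].discard(j)
        adj[j].discard(i)
    else:
        adj[i].add(j)
        adj[j].add(i)


def mean_gwesp(theta_edges, theta_gwesp, alpha, n=10,
               burn_in=1000, draws=200, thin=10, seed=0):
    """Estimate E_theta[GWESP(y)] with a plain Metropolis dyad-toggle chain."""
    rng = random.Random(seed)
    adj = [set() for _ in range(n)]  # start from the empty graph
    stat = 0.0
    kept = []
    for step in range(burn_in + draws * thin):
        i, j = rng.sample(range(n), 2)
        edge_change = -1 if j in adj[i] else 1
        toggle(adj, i, j)
        proposed = gwesp(adj, alpha)
        log_ratio = (theta_edges * edge_change
                     + theta_gwesp * (proposed - stat))
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            stat = proposed      # accept the toggle
        else:
            toggle(adj, i, j)    # reject: restore the previous graph
        if step >= burn_in and (step - burn_in) % thin == 0:
            kept.append(stat)
    return sum(kept) / len(kept)


# Coarse sweep over theta_GWESP for both decay values.
for alpha in (1.2, 0.2):
    for theta in (0.5, 0.75, 1.0, 1.25, 1.5):
        print(alpha, theta, round(mean_gwesp(-2.4, theta, alpha), 2))
```

With so few draws the estimates are noisy, and a single short chain started from the empty graph may stay in one mode near a phase transition. The sketch is meant to show the mechanics of the sweep, not to reproduce the figures.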

Hunter and Handcock (2006) show how to estimate the tuning parameter $\alpha_E$ in geometrically weighted network statistics. This approach has the drawback of drastically decreasing the numerical stability and speed of MCMC parameter estimation. Throughout this work we specify $\alpha_E$ a priori and keep the value relatively low, see chapters 3 and 4.