Application to U.S. Migration Network - Flexible Model Specification


6.5.2 Application to U.S. Migration Network

We next apply the GERGM to the U.S. migration network analyzed in Desmarais and Cranmer (2012). Historically, interstate migration has played an important role in the understanding of local financial markets, public infrastructure, and the political climate within each state (Clark and Ballard, 1981; Levine and Zimmerman, 1999; Gimpel and Schuknecht, 2001). The network that we model contains 51 nodes that represent the 50 U.S. states as well as Washington, D.C. Directed edges are placed between states in which there was a change in interstate migration flow from 2006 to 2007. The weight, $y_{i,j}$, associated with the directed edge from node $i$ to node $j$ is the total change in interstate migration from state $i$ to state $j$ between 2006 and 2007. This data set also contains demographic descriptive information for each state that we use to parameterize the marginal transformation of $y$ onto a restricted network $x \in [0,1]^m$.

To reflect the skewed nature of the edge weight distribution for this network, we specified $t_{i,j}$ as a Cauchy pdf for each $i, j \in N$ whose location parameter, $\mu_{i,j}$, is a linear function of the demographic covariates:

$$t_{i,j}(y, z, \Lambda) = \left[\pi\left(1 + (y_{i,j} - \mu_{i,j})^2\right)\right]^{-1}, \qquad \mu_{i,j} = \Lambda_0 + \sum_{k=1}^{10} \Lambda_k z_{i,j}(k).$$
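The transformation above can be sketched numerically. The following is a minimal illustration, assuming a unit-scale Cauchy density as written in the equation and assuming that the restricted weight is obtained by applying the corresponding cdf to the raw weight. The function names (`cauchy_location`, `cauchy_pdf`, `cauchy_cdf`, `cauchy_quantile`) are hypothetical and chosen only for this example.

```python
import numpy as np

def cauchy_location(z, lam):
    """Location mu_{i,j} = Lambda_0 + sum_k Lambda_k z_{i,j}(k).

    z   : array of shape (..., 10), covariates for each pair of states
    lam : array of shape (11,), intercept followed by ten slopes
    """
    z = np.asarray(z, dtype=float)
    lam = np.asarray(lam, dtype=float)
    return lam[0] + z @ lam[1:]

def cauchy_pdf(y, mu):
    """Unit-scale Cauchy density t(y) = [pi (1 + (y - mu)^2)]^{-1}."""
    return 1.0 / (np.pi * (1.0 + (y - mu) ** 2))

def cauchy_cdf(y, mu):
    """Cdf of the density above; maps a raw weight into (0, 1)."""
    return 0.5 + np.arctan(y - mu) / np.pi

def cauchy_quantile(x, mu):
    """Inverse cdf; maps a weight in (0, 1) back to the raw scale."""
    return mu + np.tan(np.pi * (x - 0.5))
```

Because the Cauchy cdf and its inverse have closed forms, moving between the raw weights and the unit interval costs one `arctan` or `tan` evaluation per edge.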

The demographic information of each pair of states $i$ and $j$ is represented by the vector $z_{i,j}$. The vector $\Lambda \in \mathbb{R}^{11}$ parameterizes the transformation of each pair of states through a linear regression on $z_{i,j}$. The chosen covariates describe dispersion, geographic distance, high January temperature, income, unemployment, and population of the sending and receiving states $i$ and $j$. To model the network structure of the restricted edge collection $x$, we used network statistics that represent reciprocity, cyclic triads, in-two-stars, out-two-stars, and transitive triads. Let $y_{i,j}$ be the weight between nodes $i$ and $j$ and let $x_{i,j}$ be the corresponding weight on the unit interval so that $y_{i,j} = T_{i,j}^{-1}(x, \Lambda)$. Our model specification is described by (6.1) and (6.2) where $\boldsymbol{\theta} = (\theta_R, \theta_{TT}, \theta_{CT}, \theta_{OTS}, \theta_{ITS})'$ and $\alpha = (1,1,1,1,1)$. We note that the model used here is the same non-degenerate specification described in Desmarais and Cranmer (2012) and that Gibbs sampling is applicable.
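The five network statistics can be illustrated on a weighted adjacency matrix. The sketch below assumes the usual product forms for weighted directed networks with all exponents equal to one, summed without normalization; the exact definitions are those in (6.1) and (6.2), which are not reproduced here, so the formulas and the function name `gergm_statistics` should be read as assumptions made for illustration.

```python
import numpy as np

def gergm_statistics(x):
    """Product-form statistics of a weighted directed network.

    x : (n, n) array with entries in [0, 1]; the diagonal is ignored.
    Assumed forms (all exponents alpha = 1):
      reciprocity      sum over pairs i<j of x_ij * x_ji
      cyclic triads    sum over directed 3-cycles of the edge product
      in-two-stars     sum over i and pairs j<k of x_ji * x_ki
      out-two-stars    sum over i and pairs j<k of x_ij * x_ik
      transitive       sum over ordered triples of x_ij * x_jk * x_ik
    """
    x = np.array(x, dtype=float)
    np.fill_diagonal(x, 0.0)          # no self-loops
    x2 = x @ x
    in_sum, out_sum = x.sum(axis=0), x.sum(axis=1)
    sq_in, sq_out = (x ** 2).sum(axis=0), (x ** 2).sum(axis=1)
    return {
        "reciprocity": np.trace(x2) / 2.0,
        "cyclic_triads": np.trace(x2 @ x) / 3.0,
        "in_two_stars": ((in_sum ** 2 - sq_in) / 2.0).sum(),
        "out_two_stars": ((out_sum ** 2 - sq_out) / 2.0).sum(),
        "transitive_triads": (x2 * x).sum(),
    }
```

Writing the sums as matrix products avoids explicit loops over triples, which matters when the statistics are recomputed for every simulated network.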

We fit the above model with Metropolis-Hastings, Gibbs, and maximum pseudo-likelihood estimation (MPLE). For Gibbs, we simulated 1000 networks with a burn-in of 1000 networks in each iteration. For Metropolis-Hastings, we set $\sigma^2$ to 0.01 and simulated 1000 networks with a burn-in of 10000 on each run. To obtain an uncorrelated sample of 1000 networks, we thinned a sample of ten million networks by sampling every one-thousandth network. The resulting average acceptance rate for Metropolis-Hastings was approximately 0.25. The parameter estimates and associated confidence intervals are shown in Figure 6.3.
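The roles of the proposal variance, burn-in, and thinning can be sketched with a generic sampler on the unit hypercube. This is a simplified illustration, not the procedure used above: it assumes a symmetric Gaussian random-walk proposal with variance `sigma2` and rejects any proposal that leaves $[0,1]^m$, and the function name `metropolis_hastings` and its arguments are hypothetical.

```python
import numpy as np

def metropolis_hastings(log_target, x0, sigma2, n_samples, burn_in, thin, seed=0):
    """Random-walk Metropolis-Hastings on [0, 1]^m.

    log_target : function returning the unnormalized log density of x
    Returns the thinned samples and the overall acceptance rate.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    log_p = log_target(x)
    sigma = np.sqrt(sigma2)
    samples, accepted = [], 0
    total = burn_in + n_samples * thin
    for step in range(total):
        proposal = x + rng.normal(0.0, sigma, size=x.shape)
        # The target has no mass outside the unit hypercube, so
        # out-of-range proposals are rejected outright.
        if np.all((proposal >= 0.0) & (proposal <= 1.0)):
            log_p_new = log_target(proposal)
            if np.log(rng.uniform()) < log_p_new - log_p:
                x, log_p = proposal, log_p_new
                accepted += 1
        # Discard the burn-in, then keep every `thin`-th state.
        if step >= burn_in and (step - burn_in + 1) % thin == 0:
            samples.append(x.copy())
    return np.array(samples), accepted / total
```

A larger `sigma2` proposes bigger moves and lowers the acceptance rate, while a larger `thin` reduces the autocorrelation of the retained draws at the cost of more simulation.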

Figure 6.3: Estimates of the network parameters of the GERGM when fit to the U.S. migration network. Shown are the results for the Gibbs, Metropolis-Hastings, and Maximum Pseudo-Likelihood procedures. Lines represent 90% and 95% confidence intervals for each estimate.

Figure 6.3 reveals several important aspects of the U.S. migration network. All three methods provide comparable parameter estimates. This suggests that each method simulates from the same distribution. All three estimation methods identified eight statistically significant covariates at the 95% confidence level. Four of these significant covariates were topological: transitive triads, out-two-stars, in-two-stars, and cyclic triads. Three of the remaining significant covariates were demographic covariates representing unemployment, population, and the average January temperature. These results suggest three noticeable and interesting trends in the data: 1) there was increased migration away from states with high unemployment, 2) there was decreased migration to states with a large population, and 3) there was decreased migration away from states with high average January temperatures.

To evaluate the goodness of fit of the estimated models, we subsequently simulated 10000 networks using our Metropolis-Hastings procedure with a burn-in of 10000 and $\sigma^2 = 0.01$. Figure 6.4 shows the goodness of fit for six network statistics for each of the fitted models. In each plot, we compare the distribution of the network statistics generated from the 10000 simulated networks with the observed network statistics of the migration network. Figure 6.4 reveals that the distribution of network statistics from the simulated networks of each fitted model closely matches the observed statistic from the migration network, suggesting a good fit for each method.
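The visual comparison described above can be complemented by a simple numerical summary. The sketch below is an assumption-laden illustration rather than part of the analysis: for one statistic, it reports where the observed value falls in the simulated distribution. The function name `gof_summary` and the two-sided tail proportion are choices made for this example.

```python
import numpy as np

def gof_summary(simulated, observed):
    """Locate an observed statistic within its simulated distribution.

    simulated : 1-D array of the statistic over the simulated networks
    observed  : the statistic computed on the observed network
    """
    simulated = np.asarray(simulated, dtype=float)
    lower = np.mean(simulated <= observed)   # share of draws at or below
    upper = np.mean(simulated >= observed)   # share of draws at or above
    return {
        "sim_mean": simulated.mean(),
        "sim_sd": simulated.std(ddof=1),
        # Two-sided tail proportion; values near 0 indicate poor fit.
        "tail_prob": min(1.0, 2.0 * min(lower, upper)),
    }
```

A tail proportion close to one means the observed statistic sits near the center of the simulated distribution, which corresponds to the vertical line falling in the bulk of the density in a goodness-of-fit plot.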

Figure 6.4: U.S. migration network goodness of fit plots. For each model, 10000 networks were simulated using the Metropolis-Hastings sampling procedure. Each plot compares the empirical densities of each statistic from the simulated networks for each model. The vertical line marks the observed network statistic in the migration network.

To assess the convergence of the Metropolis-Hastings simulation procedure, we use the Geweke diagnostic test for stationarity (Geweke, 1991). We test the convergence of the edge weight, reciprocity, and transitive triads statistics for the 10000 simulated networks from the M-H procedure. For each statistic, we fail to reject convergence according to the two-sided Geweke test, which compares the means of the first and last thirds of the samples (p-values = 0.86, 0.91, 0.84, respectively).
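A comparison of the first and last thirds of a chain can be sketched as a two-sample z-test. This is a simplified version of the diagnostic: Geweke's test estimates the variances from the spectral density at frequency zero, whereas the sketch below uses plain sample variances and therefore assumes the draws are approximately uncorrelated. The function name `geweke_thirds` is hypothetical.

```python
import math
import numpy as np

def geweke_thirds(chain):
    """Two-sided z-test for equal means in the first and last thirds.

    chain : 1-D array of one network statistic across simulated networks
    Returns the z-score and its two-sided p-value under a standard normal.
    """
    chain = np.asarray(chain, dtype=float)
    n = chain.size
    k = n // 3
    first, last = chain[:k], chain[n - k:]
    # Simplification: sample variances in place of spectral estimates.
    se = math.sqrt(first.var(ddof=1) / k + last.var(ddof=1) / k)
    z = (first.mean() - last.mean()) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value
```

A large p-value means the two segments have statistically indistinguishable means, so stationarity is not rejected; a chain that is still drifting produces a large absolute z-score and a small p-value.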
