Simulation Results - Generalised stochastic blockmodels and their applications in the analysis

In Figure 5.2 (a) and (b), we show the box plots of ARI scores for the two scenarios of repeated measurements (T ∈ {1, 3}) and variance σ² = 1 over S = 1000 network realisations. We note that the Het-Mixed-SBM struggles to correctly estimate the cluster structure for PI3, and that this is particularly pronounced for the Mildly Unbalanced and Unbalanced designs, networks with 50 nodes and samples with 10 and 20 subjects (see Figure 5.2 (a)). However, as shown in Figure 5.2 (b), the accuracy of the cluster estimates improves with a larger number of visits (T = 3). Similarly, for PI6 and PI7 under the Unbalanced proportion design, with networks of 50 nodes and 10 subjects, the Het-Mixed-SBM struggles to correctly estimate the cluster structure. However, the accuracy of the cluster estimates improves as the number of subjects increases, and it tends to improve further in the samples with more visits (Figure 5.2 (b)).

(a) One visit and σ² = 1

(b) Three visits and σ² = 1

Figure 5.2: ARI scores over S = 1000 network realisations with increasing number of subjects along the x-axis. The Het-Mixed-SBM fits are evaluated with respect to (i) varying proportion designs (Balanced, Mildly Unbalanced and Unbalanced, on each column), (ii) varying network sizes (n ∈ {50, 100}) with no age effect (on each of the first three columns), and (iii) varying connectivity structures (PI3 & PI6-8, on each row and plotting colour).

In Figure 5.3, we show the box plots of ARI scores for different variance settings (σ² ∈ {0.5, 1, 2}), Unbalanced proportion designs, one visit (T = 1) and varying numbers of subjects (K ∈ {10, 20, 40}). In the cases with a small number of nodes and subjects, there is some evidence that the estimates of the cluster structure are less accurate when the variance is larger (e.g., for PI7 & PI8 and their respective samples with 10 and 20 subjects). However, in other examples (e.g., PI3, n = 50 and 20 subjects), this influence is less apparent.
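The ARI scores discussed above compare an estimated cluster assignment with the true one. As a minimal sketch, assuming that ARI here denotes the standard adjusted Rand index computed from the contingency table of the two labelings, the score for a single fit could be obtained as follows. The function name `adjusted_rand_index` and the toy labels are hypothetical and not taken from the thesis.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, est_labels):
    """Adjusted Rand index between two labelings of the same nodes.

    Assumption: the standard pair-counting form of the index; the thesis
    does not spell out its implementation in this section.
    """
    if len(true_labels) != len(est_labels):
        raise ValueError("labelings must have the same length")
    n = len(true_labels)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two nodes: the partitions trivially agree

    # Contingency table counts and its row/column margins.
    cells = Counter(zip(true_labels, est_labels))
    rows = Counter(true_labels)
    cols = Counter(est_labels)

    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / total_pairs
    maximum = 0.5 * (sum_rows + sum_cols)
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings use one cluster
    return (sum_cells - expected) / (maximum - expected)


# Toy illustration: the index is invariant to relabelling of the clusters.
truth = [0, 0, 1, 1]
relabelled = [1, 1, 0, 0]
crossed = [0, 1, 0, 1]
ari_same = adjusted_rand_index(truth, relabelled)
ari_crossed = adjusted_rand_index(truth, crossed)
```

A value of 1 indicates perfect recovery of the cluster structure, values near 0 indicate agreement no better than chance, and negative values can occur when the agreement is worse than chance.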

Figure 5.3: ARI scores over S = 1000 network realisations with σ² along the x-axis. The Het-Mixed-SBM fits are evaluated with respect to the Unbalanced proportion design, (i) varying network sizes (n ∈ {50, 100}, on each of the first three columns) with no age effect, (ii) varying numbers of subjects (K ∈ {10, 20, 40}, on each of the first three columns), and (iii) varying connectivity structures (PI3 & PI6-8, on each row and plotting colour).

Next, in Figure 5.4 (a) and (b), we show the RMSE scores for the estimates α̂ for the cases of one and three visits and variance σ² = 1. For both visit counts, the RMSE scores seem to decrease with an increasing number of subjects and an increasing number of nodes. This behaviour is consistent with that reported in Section 4.4.1 (see Figure 4.4) and also seems to be linked with the overall accuracy of the estimated cluster structure. For example, the cluster structure estimates exhibit some degree of variability over the 1000 realisations for PI3, and this is captured in the RMSE scores, which tend to be the largest. In Figure 5.5 (a) and (b), we show the RMSE of the intercept estimates (β̂) in the cases of one and three visits, and σ² = 1. The RMSE is generally smaller for PI8 than for the other connectivity structures. This can be explained by the presence of a bias, which occurs when the Het-Mixed-SBM struggles to correctly estimate the cluster structure. As PI8 was the least affected by this (see Figure 5.3), it is not surprising that its RMSEs are the lowest. It is also interesting to note that the RMSE typically decreases with an increasing number of subjects. This can be simply explained by the decrease in the variance of β̂ with an increasing number of subjects. Note, however, that the variance does not markedly change with either the number of visits or the number of nodes.

In Figure 5.6, we show the bias of σ̂² in the cases of one and three visits, and σ² = 1. We clearly see that the estimates tend to have an appreciable negative bias in small samples, which decreases with an increasing number of subjects.
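The RMSE and bias summaries above are taken over simulation replicates of a single parameter. The sketch below shows one common way to compute them, assuming the usual definitions (bias as mean estimate minus truth, RMSE as the root of the mean squared deviation from the truth). The helper names and the toy replicate values are hypothetical and are not results from the thesis. It also checks the identity RMSE² = bias² + variance, which is why a bias from misestimated clusters and a variance that shrinks with more subjects both show up in the RMSE.

```python
from math import sqrt


def bias(estimates, truth):
    """Mean of the estimates over replicates minus the true value."""
    return sum(estimates) / len(estimates) - truth


def variance(estimates):
    """Population variance of the estimates over replicates."""
    mean = sum(estimates) / len(estimates)
    return sum((e - mean) ** 2 for e in estimates) / len(estimates)


def rmse(estimates, truth):
    """Root mean squared error of the estimates around the true value."""
    return sqrt(sum((e - truth) ** 2 for e in estimates) / len(estimates))


# Hypothetical replicate estimates of a variance parameter whose true value is 1.
true_sigma2 = 1.0
sigma2_hats = [0.8, 1.0, 0.9]

b = bias(sigma2_hats, true_sigma2)   # negative here: estimates sit below the truth
r = rmse(sigma2_hats, true_sigma2)
decomposition_gap = abs(r ** 2 - (b ** 2 + variance(sigma2_hats)))
```

With these definitions, a decreasing RMSE can come from either a shrinking bias or a shrinking variance, so reporting the bias separately (as in Figure 5.6) helps to tell the two apart.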

(a) One visit and σ² = 1

(b) Three visits and σ² = 1

Figure 5.4: RMSE of α̂, whose individual block elements are given along the x-axis. The RMSE scores are evaluated with respect to (i) varying proportion designs (Mildly Unbalanced and Unbalanced, on each of the first four columns), (ii) varying numbers of subjects (K ∈ {10, 20, 40, 80}, on each of the first four columns), (iii) varying network sizes (n ∈ {50, 100}) with no age effect (on each row), and (iv) varying connectivity structures (PI3 & PI6-8, plotting symbols and colour).

In Figure 5.7, we show the FPR obtained using a Wald test on both the intercepts and slopes, in the cases of one and three visits, and σ² = 1. In small samples, the Wald test is liberal, but it becomes more accurate as the number of subjects increases. Our simulations seem to suggest that a sample with 80 subjects is sufficient to allow a relatively accurate control of the FPR.
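The FPR discussed above is the proportion of null simulations in which the Wald test rejects at the 5% level. The following is a minimal sketch of that calculation, assuming a scalar Wald statistic (estimate divided by its standard error) referred to a standard normal distribution. The exact form of the test used in the thesis is not given in this section, and the function names and toy inputs are hypothetical.

```python
from math import erfc, sqrt


def wald_p_value(estimate, std_error):
    """Two-sided p-value of a scalar Wald test of H0: parameter = 0.

    Assumption: the statistic estimate / std_error is compared with a
    standard normal reference distribution.
    """
    if std_error <= 0:
        raise ValueError("std_error must be positive")
    z = abs(estimate / std_error)
    return erfc(z / sqrt(2.0))


def false_positive_rate(estimates, std_errors, level=0.05):
    """Share of null replicates in which the Wald test rejects at `level`."""
    if len(estimates) != len(std_errors):
        raise ValueError("estimates and std_errors must have the same length")
    rejections = sum(
        wald_p_value(est, se) < level
        for est, se in zip(estimates, std_errors)
    )
    return rejections / len(estimates)


# Hypothetical replicates generated under the null (true effect equal to zero).
null_estimates = [0.0, 2.5, -0.5, 3.0]
null_std_errors = [1.0, 1.0, 1.0, 1.0]
fpr = false_positive_rate(null_estimates, null_std_errors)
```

A test is called liberal when this proportion exceeds the nominal level, which is the pattern described for small samples; an FPR close to 0.05 indicates that the test controls false positives as intended.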

(a) One visit and σ² = 1

(b) Three visits and σ² = 1

Figure 5.5: RMSE of the β̂ intercepts, whose individual block elements are given along the x-axis. The RMSE scores are evaluated with respect to (i) varying proportion designs (Mildly Unbalanced and Unbalanced, on each column), (ii) varying numbers of subjects (K ∈ {10, 20, 40, 80}, on each of the first four columns), (iii) varying network sizes (n ∈ {50, 100}) with no age effect (on each row), and (iv) varying connectivity structures (PI3 & PI6-8, plotting symbols and colour). Note that, for clarity, the RMSE of β̂₃₃ in the first column and row of (i) is not shown and its value is 2.74.

(a) One visit and σ² = 1

(b) Three visits and σ² = 1

Figure 5.6: Bias of σ̂², whose individual block elements are given along the x-axis. The bias scores are evaluated with respect to (i) varying proportion designs (Mildly Unbalanced and Unbalanced, on each column), (ii) varying numbers of subjects (K ∈ {10, 20, 40, 80}, on each of the first four columns), (iii) varying network sizes (n ∈ {50, 100}) with no age effect (on each row), and (iv) varying connectivity structures (PI3 & PI6-8, plotting symbols and colour).

(a) Intercept for three visits and σ² = 1

(b) Slope for three visits and σ² = 1

Figure 5.7: False Positive Rates (FPR) at the 5% significance level for the Wald test on each element of β̂, whose individual block elements are given along the x-axis. The FPR scores are evaluated with respect to (i) varying proportion designs (Mildly Unbalanced and Unbalanced, on each column), (ii) varying numbers of subjects (K ∈ {10, 20, 40, 80}, on each of the first four columns), (iii) varying network sizes (n ∈ {50, 100}) with no age effect (on each row), and (iv) varying connectivity structures (PI3 & PI6-8, plotting symbols and colour).

5.4 Het-Mixed-SBM Fit to Multi-subject Functional Con-

In document Generalised stochastic blockmodels and their applications in the analysis of brain networks (Page 121-128)