2.4.1 Assigning p-values to hubs
Under the null hypothesis of a block-sparse covariance matrix $\Sigma_x$, the result of Theorem II.1, together with the approximations (2.32) and (2.33), can be used to assign p-values to the observed degrees $\{d_i\}_{i=1}^p$ of the nodes $v_1, \dots, v_p$. The procedure for assigning p-values is as follows:
1. Choose an initial threshold ρ∗.
2. Select a value $\delta \in \{1, \dots, \max_{1 \le i \le p} d_i\}$, where the $d_i$ are the vertex degrees in $G_{\rho^*}(\Phi)$.
3. For each $1 \le i \le p$, let $\rho_\delta(i)$ be the $\delta$-th largest element of $\{|\Phi_{ij}| : j \ne i,\ 1 \le j \le p\}$.
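4. Approximate the p-value corresponding to vertex $v_i$ as

$$pv_\delta(i) \approx 1 - F_{\Lambda_{i,p,n,\rho_\delta(i)}}(\delta - 1), \qquad (2.34)$$

in which $F_\Lambda(\delta - 1)$ is the cumulative distribution function of a Poisson random variable with rate $\Lambda$ evaluated at $\delta - 1$, i.e., $F_\Lambda(\delta - 1) = e^{-\Lambda} \sum_{l=0}^{\delta-1} \Lambda^l / l!$.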
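The four steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this chapter: the local Poisson rate $\Lambda_{i,p,n,\rho}$ of Theorem II.1 is not reproduced in this section, so it enters as a user-supplied callable `local_rate(i, rho)` (a hypothetical name, as are the other function names). We also assume that an edge joins $v_i$ and $v_j$ when $|\Phi_{ij}| \ge \rho$. The upper Poisson tail $1 - F_\Lambda(\delta - 1)$ is summed directly when $\delta > \Lambda$, so that very small p-values are not lost to floating-point cancellation.

```python
import math
from typing import Callable, List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def poisson_cdf(k: int, lam: float) -> float:
    """F_lam(k) = e^{-lam} * sum_{l=0}^{k} lam^l / l!, with terms formed in log-space."""
    if k < 0:
        return 0.0
    if lam <= 0.0:
        return 1.0
    return sum(math.exp(-lam + l * math.log(lam) - math.lgamma(l + 1)) for l in range(k + 1))


def poisson_sf(delta: int, lam: float) -> float:
    """P(X >= delta) = 1 - F_lam(delta - 1) for X ~ Poisson(lam)."""
    if delta <= 0:
        return 1.0
    if lam <= 0.0:
        return 0.0
    if delta <= lam:
        # The tail is not small here, so 1 - CDF loses little precision.
        return max(0.0, 1.0 - poisson_cdf(delta - 1, lam))
    # For delta > lam the pmf decreases from l = delta onward; summing the tail
    # directly keeps tiny p-values that 1 - CDF would round to zero.
    term = math.exp(-lam + delta * math.log(lam) - math.lgamma(delta + 1))
    total, l = 0.0, delta
    while term > 1e-17 * total:
        total += term
        l += 1
        term *= lam / l
    return min(total, 1.0)


def vertex_degrees(phi: Matrix, rho: float) -> List[int]:
    """Degrees in G_rho(Phi), assuming an edge when |Phi_ij| >= rho (used in step 2)."""
    p = len(phi)
    return [sum(1 for j in range(p) if j != i and abs(phi[i][j]) >= rho) for i in range(p)]


def rho_delta(phi: Matrix, i: int, delta: int) -> float:
    """Step 3: the delta-th largest element of {|Phi_ij| : j != i}."""
    values = sorted((abs(phi[i][j]) for j in range(len(phi)) if j != i), reverse=True)
    return values[delta - 1]


def hub_pvalues(
    phi: Matrix, delta: int, local_rate: Callable[[int, float], float]
) -> List[Tuple[float, float]]:
    """Step 4: pairs (rho_delta(i), pv_delta(i)) with pv ~= 1 - F_Lambda(delta - 1).

    `local_rate(i, rho)` is a hypothetical stand-in for Lambda_{i,p,n,rho}.
    """
    result = []
    for i in range(len(phi)):
        r = rho_delta(phi, i, delta)
        result.append((r, poisson_sf(delta, local_rate(i, r))))
    return result
```

For a correlation matrix `phi`, step 2 amounts to choosing `delta` between 1 and `max(vertex_degrees(phi, rho_star))`, after which `hub_pvalues(phi, delta, rate)` returns one pair $(\rho_\delta(i), pv_\delta(i))$ per vertex.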
The above procedure is similar to the procedures introduced in correlation and predictive correlation screening (Hero and Rajaratnam, 2011, 2012; Firouzi et al., 2013), with the difference that here the local Poisson rates $\Lambda_{i,p,n,\rho}$ are used to approximate the p-values, whereas correlation screening methods use the global rates for the average number of hubs in the (partial) correlation graphs. The advantage of the above procedure is that the error bounds for the convergence of the local Poisson rates decrease at least p times faster than the error bounds for the convergence of the global rates (Hero and Rajaratnam, 2011, 2012; Firouzi et al., 2013). This leads to a larger convergence region in terms of p, n and ρ for the rates introduced in Theorem II.1. Therefore, local hub screening applies to a wider range of operating conditions.
2.4.2 Phase transition threshold
The average degree of vertex $v_i$ in $G_\rho(\Phi)$ exhibits a phase transition as a function of the correlation threshold ρ (see Fig. 2.3). For a given n there is a critical threshold $\rho_c$ such that, as ρ decreases toward $\rho_c$ from above, the average degree of $v_i$ in $G_\rho(\Phi)$ remains small and increases very slowly. As ρ continues to decrease below $\rho_c$, the average degree of $v_i$ increases rapidly. The sharpness of the phase transition depends on n: for large n the transition is more pronounced. We define the critical threshold $\rho_c$ to be the point where $dE[d_i]/d\rho = -(p - 1)$. An approximate value for the critical threshold can be obtained using the approximation (2.11):
$$\rho_c \approx \sqrt{1 - c_n}, \qquad (2.35)$$

where $c_n = \left(2 J(f_{U_i, U^*_{-i}})\right)^{-2/(n-4)}$. The value of $\rho_c$ depends on p only through the quantity $J(f_{U_i, U^*_{-i}})$. Therefore, in cases where the approximation (2.32) is valid, the critical threshold depends neither on p nor on the distribution of the data.
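Formula (2.35) is straightforward to evaluate numerically. The sketch below treats $J(f_{U_i, U^*_{-i}})$ as a plain input `J`; setting `J = 1` is our assumption for the i.i.d. setting of Fig. 2.3, and with that value the formula reproduces the entries of Table 2.1 to four decimal places. The function name `critical_threshold` is hypothetical.

```python
import math


def critical_threshold(n: int, J: float = 1.0) -> float:
    """Approximate critical threshold rho_c ~= sqrt(1 - c_n), c_n = (2J)^(-2/(n-4))."""
    if n <= 4:
        raise ValueError("the approximation requires n > 4")
    if 2.0 * J <= 1.0:
        raise ValueError("the approximation requires 2J > 1, so that c_n < 1")
    c_n = (2.0 * J) ** (-2.0 / (n - 4))
    return math.sqrt(1.0 - c_n)


if __name__ == "__main__":
    # J = 1 is an assumption standing in for the i.i.d. setting of Fig. 2.3.
    for n in (2000, 1000, 500, 200, 100, 50):
        print(f"n = {n:4d}: rho_c = {critical_threshold(n):.4f}")
```

Because $c_n \to 1$ as n grows, $\rho_c$ shrinks roughly like $\sqrt{2\log(2J)/n}$ for large n, which matches the leftward shift of the transition for larger n in Fig. 2.3.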
Generally the value of the initial threshold $\rho^*$ will be application dependent, reflecting the minimal correlation that is scientifically significant. In cases where a minimal threshold is not specified by the experimenter, the critical phase transition threshold $\rho_c$ can be used as $\rho^*$. This ensures that the full range of statistically significant hub correlations is covered in the local hub screening process.
Figure 2.3: Average vertex degree as a function of the correlation threshold ρ. The average is obtained for a specific vertex by performing $10^4$ experiments. The plots correspond to n = 2000, 1000, 500, 200, 100, 50 from left to right, respectively. The samples are draws of p = 1000 i.i.d. standard normal random variables. There is a phase transition in the mean vertex degree as a function of ρ, and it becomes sharper as n grows. The critical phase transition threshold $\rho_c$ obtained from (2.35) is shown on the plots as black stars. The values of the critical threshold are listed in Table 2.1.
| n        | 2000   | 1000   | 500    | 200    | 100    | 50     |
| $\rho_c$ | 0.0263 | 0.0373 | 0.0528 | 0.0840 | 0.1197 | 0.1723 |

Table 2.1: The value of the critical threshold $\rho_c$ obtained from formula (2.35) for different values of n. The predicted $\rho_c$ approximates the phase transition thresholds in Fig. 2.3.
2.4.3 Application to Connectomics
We illustrate the proposed procedure on an fMRI dataset, assigning to each seed in the human brain connectome a p-value for being a hub. Studies show that hub detection plays a key role in connectomics and can provide insight into the structure of the human brain (Bullmore and Sporns, 2009; He and Evans, 2010).
In this experiment, the dataset consists of 30 human subjects, 17 of whom are diagnosed with attention deficit hyperactivity disorder (ADHD). For each subject, n samples (n varies from 78 to 340 across subjects) are used to construct the sample correlation matrix between the resting-state blood-oxygen-level dependent (BOLD) signals of p = 1166 seeds in the brain.
We applied the procedure described in Sec. 2.4.1 to assign p-values to the vertices of the correlation graphs constructed by thresholding the correlation matrix of each subject. Figure 2.4 shows the waterfall plots of p-values for the 30 subjects. For a fixed δ, the waterfall plot of each subject is obtained by linearly interpolating the pairs $\{(\rho_\delta(i), \log\log(1 - pv_\delta(i))^{-1})\}_{i=1}^p$, ordered by the absolute values of their first components (i.e., the quantities $|\rho_\delta(i)|$). The initial threshold is chosen to be $\rho^* = 0.86$, which is well beyond the critical thresholds of the different subjects. Note that since the number of samples n differs across subjects, the statistical significance obtained from (2.34) for a given value of $\rho_\delta(i)$ is different for each subject. For this reason the waterfall plots for different subjects do not intersect. The results are shown for δ = 1, 2, 3, 4. As δ grows there are fewer discoveries, since more seeds fail to pass the degree threshold. Also, despite the fact that there are fewer healthy subjects (13 out of 30), the healthy subjects tend to appear more persistently in the waterfall plots for larger values of δ.
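Given per-vertex values of $\rho_\delta(i)$, the p-values $pv_\delta(i)$, and the degrees in $G_{\rho^*}(\Phi)$, the points of one waterfall curve can be assembled as in the sketch below. The function names are hypothetical; following the caption of Fig. 2.4, only vertices with degree at least δ are kept, and the points are sorted by $|\rho_\delta(i)|$ before linear interpolation. Since $\log\log(1 - pv)^{-1} = \log(-\log(1 - pv))$, the score is computed with `math.log1p`, which keeps it finite and close to $\log pv$ even when $pv$ is far below machine precision (the vertical axes in Fig. 2.4 reach values near −180).

```python
import math
from typing import List, Sequence, Tuple


def loglog_score(pv: float) -> float:
    """Return log log (1 - pv)^{-1} = log(-log(1 - pv)); close to log(pv) for tiny pv."""
    if pv <= 0.0:
        return float("-inf")
    if pv >= 1.0:
        return float("inf")
    # log1p(-pv) avoids rounding 1 - pv to exactly 1 when pv is tiny
    return math.log(-math.log1p(-pv))


def waterfall_points(
    rho_delta: Sequence[float],
    pvals: Sequence[float],
    degrees: Sequence[int],
    delta: int,
) -> List[Tuple[float, float]]:
    """Points (rho_delta(i), score) for vertices with degree >= delta, sorted by |rho_delta(i)|."""
    points = [
        (rho_delta[i], loglog_score(pvals[i]))
        for i in range(len(pvals))
        if degrees[i] >= delta
    ]
    points.sort(key=lambda pt: abs(pt[0]))
    return points
```

Connecting consecutive points with straight line segments then gives one subject's curve for the chosen δ.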
[Figure 2.4 panels: four waterfall plots (δ = 1, 2, 3, 4) of log log(1 − pv(i))^{-1} (vertical axis) versus ρ (horizontal axis, 0.85 to 1), with separate curves for Healthy and ADHD subjects.]
Figure 2.4: Waterfall plots of p-values for an fMRI dataset, plotted in terms of $\log\log(1 - pv_\delta(i))^{-1}$. The seeds plotted correspond to vertices with degree at least δ in the correlation graph with initial threshold $\rho^* = 0.86$. The upper left, upper right, lower left, and lower right plots correspond to δ = 1, 2, 3, and 4, respectively.