Data Analysis: The Human Connectome Project

The Human Connectome Project is a multi-institutional venture aimed at mapping functional connections between parts of the human brain. The project has collected vast amounts of brain scan data, all of which is publicly available to researchers online at www.humanconnectome.org.1 In this analysis, we made use of a dataset from the “500 Subjects MR” data release, which consists of functional magnetic resonance imaging (fMRI) brain scans for 542 healthy adult subjects. Participants performed a variety of tasks during the MR scan, designed to isolate certain types of brain functionality. Measurements of brain activation were taken at frequent time steps over the course of the tasks (316 steps for language tasks; 284 for motor tasks) at locations corresponding to ∼30,000 voxels (the brain’s white matter interior) and ∼60,000 greyordinates (the grey matter brain surface). We applied Differential Correlation Mining to data from a single subject.2 Our analysis compared two task categories:

Language-based tasks: During the scan, subjects were told brief stories and asked to answer questions after each one about what they were told.

Motor-based tasks: Subjects were attached to motion sensors at the hands, feet, and tongue. They were then asked to move one appendage at a time, in blocks of repetitions.

1 Data was available in pre-processed form; see http://www.humanconnectome.org/about/project/MR-preprocessing.html for further detail.

Differential Correlation Mining was applied to the data for 91,282 brain locations to find DC cliques of voxels and greyordinates that exhibit more correlation over time during language tasks than during motor tasks, as measured by sample correlation across measurements at time steps. Running in Matlab on a home computer, the algorithm found the first DC clique in under a minute and ran to completion in approximately an hour. No additional methods were applied, as the dataset was too large for any of the approaches suggested in Section 3.6 to be computationally feasible. The DCM algorithm discovered a total of 10 empirical DC cliques, summarized in Table 3.4.

Table 3.4: Summary of DC cliques found in Human Connectome Data

Label   Size   Mean Corr, Lang Tasks   Mean Corr, Motor Tasks
1       1688   0.2000                  0.1000
2       137    0.2044                  0.0506
3       407    0.1856                  0.0143
4       111    0.2497                  0.0359
5       377    0.1658                  0.0097
6       82     0.3253                  0.0639
7       266    0.1649                  0.0121
8       259    0.1482                  0.0098
9       198    0.1732                  0.0116
10      20     0.2981                  0.1019
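The two "Mean Corr" columns can be read as the average sample correlation over all distinct pairs of nodes in a clique, computed separately for each task condition. The following is a minimal sketch of that summary, assuming the data for each condition is stored as a time-steps by nodes array. The function name, the synthetic data, and the choice to average plain off-diagonal correlations are illustrative assumptions, not the code used in the analysis.

```python
import numpy as np

def mean_pairwise_corr(data, nodes):
    """Mean off-diagonal sample correlation among the selected columns.

    data  : array of shape (time_steps, n_nodes)
    nodes : indices of the candidate node set (at least two nodes)
    """
    corr = np.corrcoef(data[:, nodes], rowvar=False)
    k = len(nodes)
    # Subtract the k diagonal entries (each equal to 1) so that only
    # correlations between distinct nodes are averaged.
    return (corr.sum() - k) / (k * (k - 1))

# Synthetic stand-in for the two conditions (not HCP data).
rng = np.random.default_rng(0)
n_nodes = 50
clique = np.arange(10)  # hypothetical node set
language = rng.standard_normal((316, n_nodes))
motor = rng.standard_normal((284, n_nodes))
# Give the hypothetical clique a shared signal in the "language" condition only.
language[:, clique] += rng.standard_normal((316, 1))

lang_corr = mean_pairwise_corr(language, clique)
motor_corr = mean_pairwise_corr(motor, clique)
print(f"language: {lang_corr:.4f}, motor: {motor_corr:.4f}")
```

A node set is differentially correlated in the sense of the table when the first value is clearly larger than the second, as in this synthetic example.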

The first empirical DC clique selected by Differential Correlation Mining is very large, containing 1688 nodes located on the cortical surface. These nodes, or “greyordinates”, are visualized as points on the smoothed exterior of the brain in Figure 3.9. The clear locational pattern in the nodes - despite the fact that the analysis did not take location into account - is striking. Additionally, the empirical DC clique in Figure 3.9 includes a concentrated group in the rear of the left cortex. This general brain region is known to be specifically associated with language processing and auditory input (Wernicke’s Area, see Wang et al. (2015)).

Figure 3.9: Brain locations of DC clique for language tasks versus motor tasks.

(a) High differential mean activation. (Right cortex, exterior view.)

(b) High correlation during language tasks. (Left cortex, interior view.)

Figure 3.10: Brain locations showing high first-order differences and high non-differential correlation.

We also studied two other artifacts of the data for comparison, displayed in Figure 3.10. First, we identified the 1000 nodes exhibiting the strongest differential first-order behavior. These show higher mean activation during the language tasks than during the motor tasks, as measured by standard two-sample t-tests. We saw a clear grouping of nodes in the right frontal lobe. This pattern is unsurprising and appears in many studies of brain functionality that examine differential activation for language processing (Voets et al., 2006). This basic first-order analysis suggests that differential correlation is not redundant. None of the empirical DC cliques selected by Differential Correlation Mining show high frontal lobe concentration; instead, they exhibit “trail-like” patterns such as the ones shown in Figure 3.9.
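The first-order comparison described above can be sketched as a per-node two-sample t statistic followed by a ranking. The text does not say which variant of the t-test was used, so the unequal-variance (Welch) form below is an assumption, as are the function names and the synthetic data. The sketch also treats time steps as independent samples, which is a simplification.

```python
import numpy as np

def welch_t(a, b):
    """Welch two-sample t statistic for each column of a versus b.

    a, b : arrays of shape (time_steps, n_nodes), one per condition.
    """
    na, nb = a.shape[0], b.shape[0]
    mean_diff = a.mean(axis=0) - b.mean(axis=0)
    std_err = np.sqrt(a.var(axis=0, ddof=1) / na + b.var(axis=0, ddof=1) / nb)
    return mean_diff / std_err

def top_k_by_t(a, b, k):
    """Indices of the k nodes with the largest t statistic, largest first."""
    t = welch_t(a, b)
    return np.argsort(t)[::-1][:k]

# Synthetic stand-in for the two conditions (not HCP data).
rng = np.random.default_rng(1)
language = rng.standard_normal((316, 200))
motor = rng.standard_normal((284, 200))
language[:, :20] += 1.0  # hypothetical nodes with higher mean activation

top = top_k_by_t(language, motor, k=20)
print(sorted(top.tolist()))
```

In the analysis above, the analogous call would use k=1000 on the full set of 91,282 nodes. Note that this ranking uses only means and variances, so it cannot detect a node set whose correlation structure changes while its mean activation does not.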

Second, we identified 1000 nodes found to be highly correlated over time for the language task data, irrespective of their behavior in the motor task data. These nodes were observed to be very tightly grouped in the interior left hemisphere. This is likely due to the nature of data measurement: fMRI brain scans measure oxygen flow in the brain, so measurements for adjacent regions tend to “blur” and show high artificial correlation (Derado et al., 2010). In this case, the same node set is also highly correlated during motor tasks, suggesting that it is likely a byproduct of data collection. Even if this node set does represent a meaningful result - regions, perhaps, that are universally correlated regardless of task - it is not differential.
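The text does not specify how the 1000 highly correlated nodes were chosen. One plausible criterion, shown here purely as an assumption, is to score each node by its mean sample correlation with all other nodes and keep the top scorers. The sketch below computes that score without forming the full nodes by nodes correlation matrix, which would be impractical at 91,282 nodes, and then checks whether the selected set also scores highly in the second condition. All names and data are hypothetical.

```python
import numpy as np

def mean_corr_with_others(data):
    """For each node, the mean sample correlation with every other node.

    data : array of shape (time_steps, n_nodes).
    Uses the identity corr(i, j) = z_i . z_j / (T - 1) for standardized
    columns z, so the sum over j needs only one matrix-vector product.
    """
    t, p = data.shape
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    total = z.sum(axis=1)            # sum of all standardized series, shape (t,)
    sums = z.T @ total / (t - 1)     # sum over j of corr(i, j), including j == i
    return (sums - 1.0) / (p - 1)    # remove the self-correlation of 1

# Synthetic stand-in (not HCP data): one block of nodes shares a signal in
# BOTH conditions, mimicking correlation that is not task specific.
rng = np.random.default_rng(2)
p = 300
block = np.arange(30)
language = rng.standard_normal((316, p))
motor = rng.standard_normal((284, p))
language[:, block] += rng.standard_normal((316, 1))
motor[:, block] += rng.standard_normal((284, 1))

lang_score = mean_corr_with_others(language)
motor_score = mean_corr_with_others(motor)
top = np.argsort(lang_score)[::-1][:30]

# The set selected from the language data is also highly correlated in the
# motor data, so it would not count as differential.
print(motor_score[top].mean() > motor_score.mean())
```

A set that is strongly correlated in both conditions, as in this synthetic example, is exactly the kind of result a purely non-differential search returns and a differential search is designed to pass over.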

This example illustrates the advantage of taking a differential approach like Differential Correlation Mining. Effects due to fMRI-driven spatial correlation or strong universal correlation can drown out signal that is truly specific to a particular sample condition. By comparing language tasks to the similar but distinct condition of motor tasks, we are able to isolate signals that are unique to language processing. The fact that the identified DC cliques show emergent locational patterns suggests that Differential Correlation Mining is capturing a true facet of the data rather than arbitrary correlation. Since this output is unique in form, while maintaining some consistency with known brain functionality, we believe it merits further scientific investigation.
