Data mining in brain imaging


1 Introduction

Data mining in brain imaging is an emerging field of high importance for providing prognosis, treatment, and a deeper understanding of how the brain functions. The field of data mining addresses the question of how best to use this data to discover new knowledge and improve the process of decision making. The discovery of associations between human brain structures and functions (i.e. human brain mapping) has been recognized as the main goal of the Human Brain Project,1 which is a high-priority project funded by several government initiatives. Mining problems can be grouped in three categories:2 identifying classifications, finding sequential patterns, and discovering associations. Although data mining is a powerful knowledge discovery technique, there are constraints on the way it can be applied: it is application-dependent, different applications usually require different mining techniques, and data must be of a certain size and format.3 In this paper we survey current mining methods, give a critical review of the main computational obstacles that limit our ability to perform automatic data mining on brain imaging, and propose some solutions.

There are various problems in mining of brain images that need to be addressed. The first problem is that most fundamental mining algorithms (rule-based learning systems, neural networks, decision trees, Bayesian networks, logistic regressions, and so on), which have been used with great success in medicine, assume that data sets contain only simple numeric and symbolic entries. It is important, therefore, to

Address for correspondence: V Megalooikonomou, Department of Computer and Information Sciences, Temple University, 314 Wachman Hall, Philadelphia, PA 19122, USA. E-mail: [email protected]

© Arnold 2000 0962-2802(00)SM221RA


Vasileios Megalooikonomou, James Ford, Li Shen, Fillia Makedon, Department of Computer Science, Dartmouth Experimental Visualization Laboratory, Dartmouth College, Hanover, New Hampshire, USA, and Andrew Saykin, Brain Imaging Laboratory, Departments of Psychiatry and Radiology, Dartmouth Medical School, Dartmouth Hitchcock Medical Center, Lebanon, New Hampshire, USA

Data mining in brain imaging is proving to be an effective methodology for disease prognosis and prevention. This, together with the rapid accumulation of massive heterogeneous data sets, motivates the need for efficient methods that filter, clarify, assess, correlate and cluster brain-related information. Here, we present data mining methods that have been or could be employed in the analysis of brain images. These methods address two types of brain imaging data: structural and functional. We introduce statistical methods that aid the discovery of interesting associations and patterns between brain images and other clinical data. We consider several applications of these methods, such as the analysis of task-activation, lesion-deficit, and structure morphological variability; the development of probabilistic atlases; and tumour analysis. We include examples of applications to real brain data. Several data mining issues, such as that of method validation or verification, are also discussed.


consider how to preprocess brain images (multidimensional arrays of data) so that we can transform them to data representations which are amenable to data mining techniques. A second problem is that, although there are algorithms for classifying images, there is a lack of effective algorithms for learning from images directly.4 Again, this implies the use of methods that transform images to a format suitable for learning algorithms. Most early medical analysis ignored the image or raw sensor portions of the medical record or summarized them in a very simplified form (e.g. ‘normal’ or ‘abnormal’). A third problem in mining brain images is the heterogeneity of the brain imaging data: different modalities, formats and resolutions prevent a common analysis and require integration. Integrating data from different studies often means integrating different formats, which, in turn, implies imposing several assumptions on the data representation. Many studies today, especially those using functional imaging, focus only on a specific clinical question and deal only with a small set of subjects, mainly due to the high cost of acquiring image data.

Other difficulties inherent in mining of associations in brain images are that: (1) due to inter-subject variation and noise, a large number of subjects have to be studied; (2) functions may correspond to more than one location and can relocate in the presence of structure abnormalities; (3) brain lesions and other abnormalities have a complex spatial distribution, typically covering multiple brain structures; and (4) normal brain function can be affected in varying ways due to the complexity of the functional organization of the human brain.

These obstacles aside, there have been recent technological advances that make available enormous amounts of data. Imaging studies of the human brain at active medical institutions today routinely accumulate more than 5 terabytes of clinical data per year. Data in this domain usually consist of three-dimensional (3-D) images from different medical imaging modalities that capture structural (e.g. MRI,† CT,‡ histology§) and functional/physiological (e.g. PET,|| fMRI,¶ SPECT††) information about the human brain. In addition, there is now a wide availability of non-invasive methods for assessing macroscopic brain structure, particularly magnetic resonance (MR) techniques that complement clinical functional assessment.5 Also, there is continued development of improved functional imaging techniques and normalization methods. Greater computer capabilities are leading to the creation of large databases of structure/function information, the efficiency of which depends on interoperable multimedia data representation that is easy to search. This trend is reflected in the work of Fox et al.,6–9 Evans et al.,10,11 the QBISM database by Arya et al.,12 the BRAID database by Letovsky et al.,13,14 the BrainMap database of neuro-imaging data,15 and other neuroimaging databases.16,17

The problem of multidimensional data (e.g. brain images) can be addressed with newer mining methods that are applied directly to the images in order to capture most of their information content. As was mentioned, mining is heavily dependent on statistical methods for discovering associations and classifications among disparate

† Magnetic resonance imaging: shows soft-tissue structural information.
‡ Computed tomography: shows hard-tissue structural information.
§ Histology images are acquired by physically slicing and photographing tissue.
|| Positron emission tomography: shows physiological activity.
¶ Functional magnetic resonance imaging: shows physiological activity.
†† Single photon emission computed tomography: shows physiological activity.


types of data. We exploit this fact and consider methods that combine the information from image and behavioural data about the brain and present methods for developing probabilistic brain atlases. The results of these methods are new representations of the information content of brain images and statistical maps.

The rest of this paper is organized as follows: in Section 2 we present the preprocessing phase of data mining in brain imaging, including the segmentation and spatial normalization of images. Although smoothing and reduction of noise can be considered to be part of preprocessing, here we include them as part of the mining methods in Section 3. In Section 3 we present mining methods that have been used in structural and functional imaging. These methods are useful for: (a) the efficient discovery of associations between structures and functions; (b) the classification of structural information, including both normal structures and abnormalities such as tumours; and (c) recently, the discovery of associations between gene expressions, morphology, and function. In Section 4 we present important issues in mining of brain images that are common to structural and functional brain data. We also consider the problem of verification of mining methods. This review concludes with a discussion in Section 5.

2 Data preprocessing

After image data is collected for each subject and before mining is performed, the data has to pass through a preprocessing phase. This phase identifies and normalizes the brain objects to be stored in a database and mined later. In anatomically related studies, after an anatomical (i.e. structural) image is collected for each subject, each lesion, area, or structure of interest is delineated (segmented) as a region of interest (ROI) on each slice automatically, semi-automatically, or manually. Extensive work has been done on automating image segmentation.18–21 The first segmentation methods were solely intensity based.22 However, since many structures were not distinguishable on the basis of signal intensity alone, prior spatial information was incorporated either in the form of intensity gradients,23 prior spatial probability distribution over signal intensity,18,23,24 or by registering the image to a segmented atlas25–27 using a spatial probability map for each voxel.28

Using the slice data, each ROI is reconstructed in three dimensions. Then, in both functional and anatomical images, normalization or image registration has to be performed to make image data comparable across subjects without morphological and acquisition variability. This process maps homologous anatomical regions to the same location in a stereotaxic space, such as the Talairach anatomical atlas.29 Several linear and nonlinear spatial transformations have been developed to bring the 3-D atlas and the subject’s 3-D image into register, i.e. spatial coincidence.27,30–32 As an example, the effect of registration of an MR image to the Talairach atlas using a nonlinear method based on a 3-D elastically deformable model32 is presented in Figure 1.

In addition to 3-D image data, modern brain image databases usually contain a set of generic anatomic atlases of the human brain that model the exact shapes and positions of anatomical structures. A raw MR or fMR image does not identify the structure to which each voxel (3-D volume element) belongs, but an anatomical atlas


can supply this information (with the accuracy of the registration methods) when overlaid on the image (see Figure 1). A variety of brain structure maps have been derived, at several spatial scales, from 3-D tomographic images,33 anatomic specimens,29,34,35 and a variety of histologic preparations that reveal regional cytoarchitecture36 and molecular content. Other brain maps have concentrated on function,37,38 or neuronal connectivity and circuitry.39,40 Here, for clarity of presentation, we concentrate on the widely used Talairach atlas.

3 Data mining

We first present methods for discovering associations between brain functions and structures. Traditionally, two approaches have been employed for functional brain mapping. The first approach seeks associations between lesioned structures and concomitant neurological or neuropsychological deficits – for example, between trauma lesions and deficits like left visual field deficit. The second approach measures brain activation in subjects as they are asked to perform certain tasks. We present methods for both approaches. The problem of efficiently finding similar ROIs in brain image databases is also addressed. We present methods for extracting knowledge about the morphological variability of brain structures and about abnormalities such as tumours. Methods that can be potentially applied to both structural and functional imaging are also presented.

3.1 Functional imaging of the brain

Functional brain imaging uses technologies such as PET, SPECT or fMRI with specifically designed experiments to identify activated regions of the brain under different conditions. The cumulative nature of PET greatly limits its spatial and temporal resolution. fMRI has much better temporal resolution, and a spatial resolution on the order of 2–4 mm that is limited by the characteristics of the underlying vascular structure (Ogawa et al.41 present an in-depth review of fMRI). Recently, diffusion tensor imaging (DTI) has emerged as a new application of MR technology to the problem of human brain mapping. DTI calculates 3-D diffusion

Figure 1 A slice of (a) original MR image, (b) atlas, and (c) atlas image overlaid on the deformed MR image. Picture from Megalooikonomou et al.97


tensor maps by measuring water proton mobility in diffusion weighted MR images.42 DTI can be used to indirectly image nerve fibre bundles, especially those in white matter areas that connect various (grey matter) brain areas and that have so far been ‘invisible’ to other imaging techniques.

The data from functional imaging scans are typically in the form of measurements at thousands of voxels. In PET, a voxel measurement reflects the amount of activity in a brain region. In fMRI, a voxel’s time series of measurements is meaningless by itself, but becomes useful when compared to another series of measurements at the same place under different conditions. This is possible after registration and motion correction.43–46 Since different subjects may have different strategies for accomplishing the same task, it can be useful to average all subjects in a multi-subject study to see the common areas of activation.45 Averaging multiple subjects also increases the statistical power of the analysis, and is usually necessary for fMRI with its low signal-to-noise ratio. Smoothing is usually then performed by applying a spatial smoothing filter to reduce the effect of motion that was not completely removed, and other unwanted noise.47 The typical choice is a low-pass Gaussian filter in the spatial domain, which smooths high frequency variation in the data. Several researchers have suggested that the use of Gaussian filters for denoising in the spatial domain introduces unwanted biases,48–53 and that the optimal filter width is a function of the size of activation foci,50,54 which cannot generally be determined a priori except possibly by reference to underlying neuroanatomy. The low-pass filter approach will also blur and displace activated areas and remove the areas of least activation, reducing spatial resolution.
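As an illustration of the spatial smoothing step, the following minimal sketch applies an isotropic Gaussian low-pass filter to a synthetic 3-D volume. It is not taken from any of the cited packages: the function name `smooth_volume`, the full-width-at-half-maximum (FWHM) parametrization, the voxel sizes, and the synthetic data are all assumptions made for the example.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Conversion between the FWHM of a Gaussian and its standard deviation.
FWHM_TO_SIGMA = 1.0 / (2.0 * np.sqrt(2.0 * np.log(2.0)))

def smooth_volume(volume, fwhm_mm, voxel_size_mm):
    """Low-pass filter a 3-D volume with a Gaussian kernel (hypothetical helper).

    The kernel width is given in millimetres and converted to voxel units
    separately along each axis, so anisotropic voxels are handled.
    """
    sigma_vox = [fwhm_mm * FWHM_TO_SIGMA / size for size in voxel_size_mm]
    return gaussian_filter(volume, sigma=sigma_vox, mode="nearest")

# Synthetic data (assumption): unit-variance noise plus a small "activation".
rng = np.random.default_rng(0)
volume = rng.normal(0.0, 1.0, size=(32, 32, 16))
volume[14:18, 14:18, 6:10] += 3.0

smoothed = smooth_volume(volume, fwhm_mm=8.0, voxel_size_mm=(3.0, 3.0, 4.0))
```

The smoothed volume has lower overall variance, but the small activation focus is also attenuated and spread over neighbouring voxels, which is the loss of spatial resolution discussed above.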

There have been suggestions for addressing these limitations of Gaussian filtering by focusing on signal restoration instead of noise removal, looking at multiple scales to overcome the problem of scale-specificity, or using approaches that do not impose a priori models on the data. The scale-space approach50,55,56 builds a multi-resolution representation of functional data. Voxels are organized in clusters (called ‘blobs’50,55) that may be of different shapes and sizes at different scales. Blobs themselves are hierarchically organized, with ‘blob trees’ dividing scales higher in the tree (lower resolutions) into smaller blobs at lower scales, down to individual voxels at scale 0. Each scale s > 0 corresponds to the convolution of the original n-D signal with a Gaussian kernel of width s.50 The multifiltering approach57 is similar in that analysis is done on smoothed and unsmoothed data (although only one level of smoothing is used) for the purpose of picking up large regions of relatively weak activations. Other analysis techniques operate directly on the data, for example in the wavelet domain,58–60 and do not explicitly smooth the data. In this case, the fact that background noise is spatially distributed whereas signals are spatially localized can aid in the extraction of the signals. The complex-denoising noise reduction process can also increase signal-to-noise by thresholding discrete wavelet transform coefficients.61 Temporal smoothing is also often done in fMRI, and the signal series at each voxel is usually convolved with some approximation of the haemodynamic response function (HRF) – e.g. a Poisson function, the Gamma function, a linear combination of the Gamma function and its temporal derivatives, or the Gaussian function – that is chosen heuristically.62 This approach identifies activated voxels, although particular analysis methods can take many forms. There has been recent interest in improving HRFs.62,63
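The temporal smoothing step can be sketched as follows, assuming a Gamma-shaped approximation of the HRF. The shape and scale values, the repetition time, and the block design are illustrative assumptions, not parameters recommended by the cited work, and `gamma_hrf` is a hypothetical helper.

```python
import numpy as np
from scipy.stats import gamma

def gamma_hrf(tr, duration=30.0, shape=6.0, scale=1.0):
    """Gamma-density approximation of the HRF sampled every `tr` seconds.

    The shape and scale defaults are illustrative assumptions.
    The kernel is normalized to sum to one.
    """
    t = np.arange(0.0, duration, tr)
    h = gamma.pdf(t, a=shape, scale=scale)
    return h / h.sum()

# Synthetic voxel time series (assumption): off/on blocks plus noise.
rng = np.random.default_rng(0)
tr = 2.0
stimulus = np.tile(np.r_[np.zeros(10), np.ones(10)], 4)
series = stimulus + rng.normal(0.0, 1.0, size=stimulus.size)

# Convolve the series with the HRF and keep the original length.
hrf = gamma_hrf(tr)
smoothed = np.convolve(series, hrf)[: series.size]
```

The first few samples of `smoothed` are affected by the start of the convolution and would normally be treated with care or discarded.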


3.1.1 Statistical parametric mapping (SPM)

One of the most common analysis approaches currently in use, called statistical parametric mapping (SPM),64,65 analyses each voxel’s changes independently of the others and builds a map of statistic values for each voxel (see Figure 2). The significance of each voxel can be ascertained statistically with a Student’s t-test, an F-test, a correlation coefficient, or any other univariate statistical parametric test43 (more details about the use of the t-test in SPM are in appendix A1). The result of the t-test is a t-value that, when indexed with the number of degrees of freedom, gives a probability value indicating how likely it is that this difference could occur by chance. The significance threshold (alpha value) is typically chosen to be 0.05 or less, indicating a 5% or smaller chance of a ‘false positive’ decision of significance. The t-value can also be expressed as a Z-score, indicating how many standard deviations the t-value represents on a Gaussian distribution. Z-scores and alpha values are the most popular means for reporting the significance of this difference. The t-test is mathematically equivalent to a one-way analysis of variance (ANOVA) with two groups, and both can be expressed in the general linear model for regression analysis64

$$Y(t) = X(t)\,\beta + \varepsilon(t)$$

at each voxel. The general linear model finds (and tests significance of) linear regressions between two data sets X and Y for each member of X (including dummy members added to represent experimental conditions). Methods based on general linear models have more variables (voxels) than samples (e.g. the number of examinations) and thus are severely underconstrained.
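A voxel-wise fit of the general linear model can be sketched as below. This is a minimal illustration under stated assumptions, not the SPM software's implementation: the function `voxelwise_glm`, the two-column design matrix (task regressor plus constant), and the synthetic data are hypothetical, and the p-values are one-sided and uncorrected for multiple comparisons.

```python
import numpy as np
from scipy import stats

def voxelwise_glm(Y, X, contrast):
    """Fit Y = X beta + error independently at every voxel (hypothetical helper).

    Y: (n_scans, n_voxels) data, X: (n_scans, n_regressors) design matrix,
    contrast: weights selecting the effect to test.
    Returns the t-value, one-sided p-value and equivalent Z-score per voxel.
    """
    beta, _, rank, _ = np.linalg.lstsq(X, Y, rcond=None)
    residuals = Y - X @ beta
    dof = X.shape[0] - rank                      # degrees of freedom
    sigma2 = (residuals ** 2).sum(axis=0) / dof  # residual variance per voxel
    c = np.asarray(contrast, dtype=float)
    t = (c @ beta) / np.sqrt(sigma2 * (c @ np.linalg.pinv(X.T @ X) @ c))
    p = stats.t.sf(t, dof)                       # chance probability of t
    z = stats.norm.isf(p)                        # same probability as a Z-score
    return t, p, z

# Synthetic study (assumption): 40 scans, 500 voxels, first 10 voxels active.
rng = np.random.default_rng(0)
n_scans, n_voxels = 40, 500
task = np.tile(np.r_[np.zeros(5), np.ones(5)], 4)
X = np.column_stack([task, np.ones(n_scans)])
Y = rng.normal(size=(n_scans, n_voxels))
Y[:, :10] += 3.0 * task[:, None]

t, p, z = voxelwise_glm(Y, X, contrast=[1.0, 0.0])
significant = p < 0.05   # uniform, uncorrected threshold
```

With a threshold of 0.05 and several hundred inactive voxels, some false positives are expected in `significant`, which is the multiple comparison problem treated in Section 4.1.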

By applying a uniform threshold of statistical significance it is possible to determine which voxels are likely to have had significant changes between conditions, and with what likelihood, assuming a null hypothesis of no changes between conditions. Data from individuals can be combined into groups, in which case statistical inferences can be drawn either about the significance of differences between subjects, or the likelihood of consistent activation in the population from which the subjects were drawn.66 Some early studies have examined the possibility of doing post-analysis on statistical maps, allowing comparisons of activations between independently analysed

Figure 2 Statistical parametric map. These displays of a statistical parametric map (SPM), produced using the SPM99 software package, show two views (one overlaid on a surface map, the other an axial slice) of 3D brain activations for a simple motor task (courtesy Brain Imaging Lab, Dartmouth-Hitchcock Medical Center).


subjects.55,67 One problem here is that when using voxel data, comparison of different scans may not align activations that are slightly misregistered. Recently, the spatial extent, i.e. the area of activation, has been used to detect significant regions of activation. If a spatial extent of activation is reported, however, it is sometimes only given as a voxel count above a threshold, which has been found to be a very unstable indicator of activation across trials.68

Another issue of concern in functional imaging is the consistency of observations across subjects. It is common for some activations during a study to be similar across subjects (and thus significantly correlated with the task under study), and for others to be idiosyncratic. There have been some reports in the literature of the difficulty of reproducing voxel level significance maps in fMRI.68,69 In the case of Tegeler,69 even the 2% most significant voxels were found to vary considerably across runs, subjects, and analysis techniques. Variability in activation maps is a concern; however, reproducibility of fMRI activations at a regional level has been found to be good in general across sites, subjects, and techniques,70 and comparisons of signal changes rather than significance values may address problems in voxel-level comparisons.68

A third issue in functional imaging is the choice of analysis methods for generating activation data. Several reports have substantiated the difference in activations observed using different analysis techniques.69,71,72 These may be a concern in a large database system. However, if results are tagged according to the analysis technique and parameters used to generate them, multiple analysis methods may actually be beneficial in supporting a ‘pluralistic’ strategy for analysis.72

A fourth issue in functional imaging is the necessity of limiting assumptions in analysis. For example, the SPM model assumes a uniform distribution of noise with covariance between voxels estimated by a Gaussian distribution that decays with increasing distance, and unfortunately not all data conforms well to this noise model (noise from unmodelled biological variability such as venous activity may not, for example). It is difficult to assess the extent to which conditions deviate from the model in practice,52 but tests with synthetic data have demonstrated large biases in the false positive rate73 as a result of low frequency physiological fluctuations and a variation of signal-to-noise ratio with imaging rate. Techniques for correcting long-term changes in mean signal intensity can improve the sensitivity of statistical analysis in this particular case.74

3.1.2 Other methods

An analysis method similar in concept to statistical parametric mapping is correlation of observed signal changes with experimental blocks.75–79 This method can be applied at the voxel level, with a resulting activation map much like an SPM (but not necessarily statistical in nature). Other variations analyse groups of voxels with an overall mean change in signal,80 incorporate prior information81 or wavelet analysis82 into an SPM-type framework, use expectation maximization to estimate labelling parameters in a Markov random field model,83 or combine tests for significance with detection of high intensity signals.84

Another approach used in the analysis of brain activations is to analyse relations between voxels, by using, for example, a six-dimensional correlation map correlating every voxel with each other voxel in order to detect correlated changes85 or by using


structural equation modelling to relate functional neuroimaging signals to underlying neurobiological activities.86 Similarly, one can use an empirical assessment of the relation between data in random pairs of conditions in place of statistical tests of significance87 although this is computationally intensive. Other approaches are based on the generation of cross-correlation image maps,88 Fourier-analysis-based time-series regression models,89 principal component analysis,90 partial least squares,76 independent component analysis,91 or structural equations to model functional connectivity among ROIs.92 Spatial-lattice models are often applied to image analysis; these techniques are often based on Markov random fields, with inference techniques based on various modifications of likelihood-maximization procedures.93

Another alternative is to divide brain images into meshes, treat functions as classes and meshes as attributes, and find rules like ‘A and B ⇒ positive’, which means that if mesh A and mesh B are active, then some function is positive/on:94 thus the problem can be reduced to supervised inductive learning. The usual inductive learning algorithms such as C4.595 do not work for this problem, however, because (1) there are strong correlations between attributes and (2) there are usually too many attributes (say 100 × 100 = 10 000) and too few samples (say 100). However, nonparametric regression can be applied to solve these problems. One algorithm for the discovery of rules from brain images consists of the following two steps: (1) a nonparametric regression96 is applied on the training data set. The results are linear formulas of the form $y = p_1 x_1 + \dots + p_n x_n + p_{n+1}$, where $y$ is a dependent variable (i.e. function), and $x_1, \dots, x_n$ are Boolean independent variables (i.e. grids). (2) Rules are extracted from the linear formula $y$ ($y$ is normalized to [0,1]) by converting it to a Boolean function $y'$ using an approximation: if $y \ge 0.5$, then $y' = 1$; otherwise, $y' = 0$. Since a naive approach to the second step always takes exponential time and becomes unrealistic in practice, a better algorithm is to generate terms from low order to high order while applying a pruning strategy. Experiments on artificial data showed:94 (1) when there are no correlations between adjacent attributes, the accuracies are almost the same as the accuracies of C4.5, and (2) when there are strong correlations between adjacent attributes, the algorithm works better than C4.5 in terms of the accuracy of the result.
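The second step can be illustrated with the sketch below, which generates conjunctions of attributes from low order to high order and prunes any term already covered by a shorter rule. The acceptance test used here (a term is kept if the linear formula reaches the threshold even when all remaining attributes take their least favourable values) is one possible pruning strategy assumed for this example; it is not claimed to be the algorithm of ref. 94. The weights and all function names are hypothetical.

```python
from itertools import combinations, product

def extract_rules(weights, bias, max_order=3, threshold=0.5):
    """Return conjunctions of attributes that force the formula above threshold."""
    n = len(weights)
    neg = [min(w, 0.0) for w in weights]   # least favourable contribution of each attribute
    total_neg = sum(neg)
    rules = []
    for order in range(1, max_order + 1):  # low order first
        for term in combinations(range(n), order):
            if any(set(r) <= set(term) for r in rules):
                continue                   # pruned: a shorter rule already covers this term
            worst = (bias + sum(weights[i] for i in term)
                     + total_neg - sum(neg[i] for i in term))
            if worst >= threshold:
                rules.append(term)
    return rules

def fires(rules, x):
    """True if any extracted rule is satisfied by the Boolean vector x."""
    return any(all(x[i] for i in r) for r in rules)

def naive_truth_table(weights, bias, threshold=0.5):
    """Exhaustive (exponential) evaluation of the thresholded formula."""
    return {x: bias + sum(w * xi for w, xi in zip(weights, x)) >= threshold
            for x in product((0, 1), repeat=len(weights))}

# Illustrative coefficients (assumption): meshes 0 and 1 jointly drive the function.
weights, bias = [0.3, 0.3, 0.05, -0.05], 0.0
rules = extract_rules(weights, bias)   # [(0, 1)], i.e. 'mesh 0 and mesh 1 => positive'
```

For this small example the extracted rule agrees with the exhaustive truth table on all 16 inputs, while the term generation examines only the low-order conjunctions.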

3.2 Structural imaging of the brain

3.2.1 Lesion-deficit analysis

In principle, lesion-deficit data could be analysed using methods similar to those for activation studies (e.g. SPM). We now present several additional statistical methods to determine structure–function relations through the study of lesions and associated deficits. After the preprocessing, i.e. the segmentation of lesions and registration of the binary images to a common standard, the binary images consist of ‘normal’ and ‘abnormal’ (lesioned) voxels. This type of structural image data, combined with the behavioural variables, forms the data for each subject. Mining methods for the discovery of the structure–function associations from this data can operate on a resolution range from the spatially distinct structures of an anatomical atlas (atlas-based analysis) to the voxel level (voxel-based analysis).97


3.2.1.1 Atlas-based analysis. In the case where anatomical structures represent functional units, the atlas-based analysis is more sensitive than voxel-based analysis since the atlas provides significant prior knowledge. The first step in the atlas-based analysis is to calculate for each structure $s_i$ and subject $p_j$, the fraction of lesioned volume, $f_{s_i,p_j}$, which is defined as the volume of the lesioned part of $s_i$ divided by the volume of $s_i$. These fractions form the continuous structural variables. Here, we present methods for both categorical and continuous structural variables.13,14,97 In the case of categorical variables, the lesioned fraction determines if a structure is lesioned (abnormal) or not. For example, a patient might be treated as having a lesion in a structure if the intersection of all his/her lesions with the structure is at least one voxel. To eliminate thresholding effects, the atlas structures can also be analysed as continuous variables, considering for each one the fraction that is lesioned.
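The computation of the lesioned fractions for one subject can be sketched as follows, assuming the atlas is available as an integer label volume registered to the subject's binary lesion mask (label 0 denoting background). The function name and the tiny example volumes are hypothetical.

```python
import numpy as np

def lesioned_fractions(atlas_labels, lesion_mask, n_structures):
    """Fraction of each atlas structure that is lesioned in one subject.

    atlas_labels: integer volume, 0 = background, 1..n_structures = structures.
    lesion_mask:  boolean volume of the same shape (True = lesioned voxel).
    """
    labels = atlas_labels.ravel()
    lesioned = lesion_mask.ravel().astype(bool)
    volume = np.bincount(labels, minlength=n_structures + 1)
    lesion_volume = np.bincount(labels[lesioned], minlength=n_structures + 1)
    with np.errstate(divide="ignore", invalid="ignore"):
        fraction = np.where(volume > 0, lesion_volume / volume, 0.0)
    return fraction[1:]   # drop the background label

# Toy example (assumption): two structures of 8 voxels each.
atlas = np.zeros((4, 4, 1), dtype=int)
atlas[:, :2, 0] = 1
atlas[:, 2:, 0] = 2
lesion = np.zeros_like(atlas, dtype=bool)
lesion[0:2, 0, 0] = True          # two lesioned voxels inside structure 1

fractions = lesioned_fractions(atlas, lesion, n_structures=2)   # continuous variables
categorical = fractions > 0       # lesioned if at least one voxel intersects
```

Here `fractions` is `[0.25, 0.0]`, and the categorical variables mark only the first structure as lesioned.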

When the search for the model that explains the data can be directed through specific hypotheses or prior knowledge, the situation is easier. Hypotheses can be formed after using explorative visualization or other methods, and can be tested using statistical analysis. An example where visualization helps to reduce the search space for a model is shown in Figure 3. If there is little preconception about the relationships between the variables, all the possibilities may have to be explored. This exploratory, or data mining, analysis is presented below. There are two analysis approaches one can follow: bivariate (pairwise) and multivariate.

Bivariate analysis. Let F be the number of functional and S be the number of structural variables respectively. In the case of categorical structural variables, F × S

Figure 3 Visualization helps reduce the search space for the model that explains the data. Sum of lesions for the ADHD+ and the ADHD− group of patients (four slices of the Talairach atlas, Tal-113, Tal-116, Tal-119 and Tal-124, are shown for each group). The right putamen and left thalamus are highlighted. Picture from Megalooikonomou et al.181


two-way contingency tables are constructed and for each the Fisher exact test98 is computed. The associations between structures and deficits are sorted in order of the p-values returned from the exact tests, and the ones with the lowest p-values are reported. For the same type of analysis with continuous structural variables one can use the Mann–Whitney test and logistic regression analysis (the Mann–Whitney statistic is appropriate because the distributions of the fractions of lesioned volumes are not Gaussian). In exploratory analysis of either categorical or continuous structural variables, computing a statistic for many pairwise tests leads to the multiple comparison problem, i.e. the situation where a certain undesirably high number of the tests are expected to be positive by chance (see Section 4.1 for a more complete treatment of the multiple comparison problem).
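The exploratory bivariate scan can be sketched as below using the Fisher exact test and the Mann–Whitney test from SciPy. The function names, the data layout (one row per subject), and the synthetic data, in which one deficit is made to coincide with lesions in one structure, are assumptions of this example; no correction for multiple comparisons is applied.

```python
import numpy as np
from scipy.stats import fisher_exact, mannwhitneyu

def bivariate_scan(lesioned, deficits):
    """Fisher exact test for every structure/deficit pair, sorted by p-value.

    lesioned: (n_subjects, S) boolean, deficits: (n_subjects, F) boolean.
    Returns a list of (p_value, structure_index, deficit_index).
    """
    results = []
    for s in range(lesioned.shape[1]):
        for f in range(deficits.shape[1]):
            les, dfc = lesioned[:, s], deficits[:, f]
            table = [[np.sum(les & dfc), np.sum(les & ~dfc)],
                     [np.sum(~les & dfc), np.sum(~les & ~dfc)]]
            _, p = fisher_exact(table)
            results.append((p, s, f))
    return sorted(results)

def continuous_test(fraction, deficit):
    """Mann-Whitney test of lesioned fractions between deficit groups."""
    return mannwhitneyu(fraction[deficit], fraction[~deficit],
                        alternative="two-sided").pvalue

# Synthetic data (assumption): 60 subjects, 5 structures, 3 deficits.
rng = np.random.default_rng(0)
lesioned = rng.random((60, 5)) < 0.3
lesioned[:, 2] = np.arange(60) % 3 == 0        # structure 2 lesioned in 20 subjects
deficits = rng.random((60, 3)) < 0.2
deficits[:, 0] = lesioned[:, 2]                # deficit 0 coincides with structure 2
fractions = lesioned * rng.random((60, 5))     # continuous lesioned fractions

ranked = bivariate_scan(lesioned, deficits)    # lowest p-values first
p_continuous = continuous_test(fractions[:, 2], deficits[:, 0])
```

The planted association between structure 2 and deficit 0 appears at the head of `ranked`; the remaining entries illustrate how small p-values can also arise by chance when many pairs are tested.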

Multivariate analysis. Multivariate analysis may find complex multivariate associations not found by multiple uses of bivariate statistics. For example, consider a deficit that is associated with two structures and appears only when both of them are lesioned. Multivariate analysis is free of the multiple comparison problem, since it evaluates an entire model with one statistic. One multivariate extension of the chi-square test for categorical variables is log-linear analysis; logistic regression is another multivariate method that can be used to relate the log-odds of having a particular deficit to the fraction of lesioned structures. Stepwise logistic regression has also been used,97 where the algorithm for discovering the model that explains the interactions starts with no associations and a greedy approach is applied to add (or delete) associations based on their relative strength.

3.2.1.2 Voxel-based analysis. Atlas-based analysis results are only as good as the atlas that is being used. Instead of imposing any high level structure on the image data, one can analyse them on a voxel-by-voxel basis. Voxels are typically labelled as either normal or abnormal, and thus structural variables are in this case categorical. Given that the number of voxels that are considered is typically on the order of $10^7$, a like number (i.e. $10^7$) of Fisher exact tests have to be performed for each of the functional variables that are examined. This procedure can be seen as clustering the voxels by functional association.97 The calculation of the contingency table and the Fisher exact test is computationally intensive, and the multiple-comparison problem is also severe due to the large number of tests that must be performed. However, in this case it can be attacked with clustering analysis since false positives will not tend to cluster.

Voxel-based regression analysis can also be used to determine whether voxels in a certain region are associated with a functional variable. One can construct a regression equation that relates lesions in a sphere of a given radius and centre to a deficit, and the ‘causal brain region’ in which lesions are most strongly associated with that deficit can then be identified.13,97 Let $l$ be a lesion, $o$ a sphere, $v(r)$ the volume of a region $r$, and $i(r_1, r_2)$ the intersection of two regions $r_1$ and $r_2$. Then the identification of the causal region is done by calculating the optimal centre and radius for the logistic regression equation:

$$\log(\mathrm{odds}_d) = a\,f_s + b$$

where $\mathrm{odds}_d = p(d)/(1 - p(d))$, $p(d)$ is the probability of having a certain deficit $d$, $f_s = v(i(l, o))/v(o)$ is the fraction of the sphere that is lesioned, $a$ = (log odds of $d$)/(lesioned fraction of sphere volume), and $b$ is the prior log odds of deficit $d$.

Given the centre (x, y, z) and the radius r of the sphere, one can find values for the parameters a and b such that the sum of squares of residuals is minimized. The goal is to optimize the sphere parameters (x, y, z, r) to obtain the best fit of the data to a regression line. The solution is the sphere that best discriminates between lesions that are and are not associated with deficit d. This nonlinear optimization procedure is computationally intensive, and cannot describe multifocal functional associations.

3.2.1.3 Results from mining lesion-deficit associations. In this section we present results from the mining process in BRAID.13,14 The Brain Image Database includes images and clinical information from over 700 subjects from two different studies: the Cardiovascular Health Study (CHS)99 and the Frontal Lobe Injury in Children (FLIC) study.100 Visualization applied prior to the analysis procedure can help direct the analysis by choosing certain structures of the anatomical atlas to examine further using the statistical tests. Figure 3 shows the sum of lesions over all subjects that did and did not develop ADHD (Attention-Deficit Hyperactivity Disorder), i.e. ADHD+ and ADHD−, respectively. Based on these images and on previous research implicating a frontal lobe-basal ganglia-thalamic pathway, the right putamen and the left thalamus (highlighted in Figure 3)† were chosen for further analysis using the Fisher exact test for categorical and the Mann–Whitney test for continuous structural variables. The p-values in Table 1 confirm a strong association between lesions in the two structures and development of ADHD.

Running an exploratory analysis on the CHS data set (300 subjects) using the chi-square test to evaluate two-way contingency tables for all pairwise combinations of atlas structures (90) and functional variables (14) returns a list (sorted by p-value) of structure–function associations.13,97 The five most significant associations are presented in Table 2. Highly significant lesion-deficit associations detected by BRAID, such as visual field deficit and lesions in contralateral orbital or cuneate gyrus, are also consistent with current clinical knowledge.101 The incorrect association between the left hippocampus and a right visual field deficit is due to registration error, since the hippocampus is next to the optic radiations that are very well known to be correlated with a visual field deficit.

Preliminary stepwise logistic regression analysis using continuous structural variables from the FLIC data set shows similar results for the development of ADHD. This method identifies the left SupCerebellarA (which is lateral to the left putamen area) as a strong predictor. Results from a preliminary voxel-based analysis for the ADHD variable of the FLIC data set are presented in Figure 4(a). The value at each voxel is the p-value for the association between the voxel being lesioned and the development of ADHD. A 3-D reconstruction is shown in Figure 4(b). These results are consistent with those of the atlas-based analysis. Figure 5 shows one representative slice (119) of the Talairach atlas for the voxel-based regression analysis for ADHD. These results are consistent with all the previous ones for ADHD.

† Due to the compromised connections between the frontal lobe and these two structures, it is believed that the frontal lobe is not able to exert its normal oversight function to suppress impulsive urges and behaviours. A common behavioural pattern in patients with ADHD is impulsivity and lack of self-control.

Figure 4 Voxel-based analysis for development of ADHD. (a) Six slices of the Talairach atlas and a colour bar that shows the correspondence between p-values and colour values. (b) Visualization of the voxel-based analysis p-value volume for development of ADHD. The higher the intensity the lower the p-value. Picture from Megalooikonomou et al.97

3.2.2 Structure morphology analysis

Several methods have been applied in extracting knowledge about the morphological variability of brain structures. Study of the location, size, surface area, volume, and shape of specific brain regions is critical for discovering normal brain organization, for defining anatomically-driven search areas for brain activity in functional imaging (PET, fMRI) scans, and for investigating pathological changes in the case of diseases affecting these structures. Some of the same voxel-based analysis techniques described in relation to functional studies have been applied to anatomy as well; in general, voxel-based morphometry identifies changes in gray matter on a voxel-by-voxel basis.102–105,186 This method is used to study the different composition of brain tissue after macroscopic shape differences are discounted using spatial normalization.

Another common approach is to warp (deform) an individual’s brain to an anatomical template (e.g. the Talairach atlas29) and gather details about the warping that are used in the analysis.106,107 A deformation function d(u, v), defined at each point (u, v) of the atlas structure S of interest, measures the enlargement or shrinkage associated with the transformation from an infinitesimal region around a point in the atlas space to its corresponding infinitesimal region in the subject space. In this method a comparison of two different brains or, more generally, two populations is achieved by comparing the corresponding deformation fields: regions with statistically significant differences are regions of morphological differences between the two populations. Results from applying this methodology106 to a study of the corpus callosum for a small group of elderly subjects are shown in Figure 6. More details on the use of a deformation function in the analysis of morphological variability can be found in Appendix A2.

Table 1 Visualization-directed mining. Statistical analysis of selected Talairach atlas structures for association with ADHD (FLIC data set)97

| Structure | Fisher’s exact p-value | Mann–Whitney p-value |
|---|---|---|
| R putamen | 0.065 | 0.033 |
| L thalamus | 0.095 | 0.093 |

Table 2 Explorative analysis. The five most significant structure–function associations given by the chi-square analysis on the CHS data set97

| Structure | Function | Chi-square p-value | S-Bonf. corrected p-value |
|---|---|---|---|
| R globus pallid. | R hemiparesis | 0.00001 | 0.0039 |
| L hippocampus | R visual defect | 0.00001 | 0.0095 |
| R gyri angular | L pronat. drift | 0.00002 | 0.0195 |
| R gyri orbital | L visual defect | 0.00003 | 0.0225 |
| R gyri cuneus | L visual defect | 0.00003 | 0.0224 |

Figure 5 The optimal regression sphere (c) that best discriminates the two groups, i.e. between lesions that are (a) and are not (b) associated with the development of ADHD. Picture from Megalooikonomou et al.97

Figure 6 Morphological variability of the corpus callosum between women and men for a group of elderly subjects. The posterior part (in white) was found to be significantly larger in women than in men (a). The average shape of the corpus callosum for (b) men and (c) women in a study by Davatzikos et al.106

Surface-based mesh modelling is a similar approach.108,109 After minimal registration a parametric mesh is stretched over the surface contour of a structure or ROI (see Figure 7). It is then compared to an average parametric mesh that is formed by calculating the mean and variation between corresponding points on the mesh. Finally, displacement vectors are generated for each individual structure. A local profile of change in structures in certain conditions can be provided through colour-coded topographic maps (see Figure 8). This method first aligns each brain volume using distance scaling to control for head size differences, allowing for inter-individual and group comparisons. A strategy for creating a population-based brain atlas using volumetric warps is shown in Figure 9. The application of these methods in several studies has already revealed differences in the shape and size of certain structures related to gender (e.g. corpus callosum106,110), in disorders such as schizophrenia,107,111,112 in normal aging, and in Alzheimer’s disease.113 Probabilistic atlas approaches have been used for studying both normal and abnormal brains.114

Another approach attempts to identify and register landmark configurations (defined as point sets that correspond biologically across images).115 Image deformation algorithms designed to accomplish these goals are useful for identifying and measuring variations in structure, although they are not designed for tasks like finding tumours or activations. The Procrustes distance is one of the core tools of image deformation algorithms, and is calculated for two landmark configurations with the same landmarks by minimizing the sum of squared distances between corresponding landmark points while rotating each configuration around its centroid, after normalizing for position and size. Finally, point-wise t-tests, ANOVAs, and partial correlations116 as well as eigenvector and related analysis115,117–119 have been used in computational neuroanatomy to study group differences in morphology and its associations with cognitive variables.
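To make the Procrustes computation concrete, the following minimal sketch handles the 2-D case, where the optimal rotation has a closed form when landmarks are written as complex numbers. It assumes the so-called partial Procrustes distance (each configuration is centred and scaled to unit centroid size, reflections are not allowed); the function name is an illustrative choice, and 3-D landmark sets would need a singular value decomposition instead.

```python
import math


def procrustes_distance_2d(config_a, config_b):
    """Partial Procrustes distance between two 2-D landmark configurations.

    Each configuration is a list of (x, y) landmarks in corresponding order.
    """
    if len(config_a) != len(config_b):
        raise ValueError("configurations must have the same landmarks")

    def normalise(points):
        z = [complex(x, y) for x, y in points]
        centroid = sum(z) / len(z)
        z = [p - centroid for p in z]              # remove position
        size = math.sqrt(sum(abs(p) ** 2 for p in z))
        return [p / size for p in z]               # remove scale

    za, zb = normalise(config_a), normalise(config_b)
    # For unit-size shapes, the minimum over rotations of the summed squared
    # distances equals 2 - 2 * |sum(za_i * conj(zb_i))|.
    inner = sum(p * q.conjugate() for p, q in zip(za, zb))
    return math.sqrt(max(0.0, 2.0 - 2.0 * abs(inner)))


triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
# The same triangle rotated by 0.5 rad, doubled in size and shifted.
moved = [(2 * (x * math.cos(0.5) - y * math.sin(0.5)) + 3.0,
          2 * (x * math.sin(0.5) + y * math.cos(0.5)) - 1.0)
         for x, y in triangle]
stretched = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0)]
```

The distance between `triangle` and `moved` is zero up to rounding error, whereas `stretched` differs in shape and therefore has a clearly positive distance from `triangle`.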

Other related work is the study of human anatomy,120,121 which presents the most difficult challenges to the understanding of typicality and variability. While biological shapes are highly structured, they are not rigid. Miller’s group has been using Grenander’s deformable anatomical templates for the representation of typicality and variability. For this, complex anatomical templates (human and macaque brains) are annotated with coordinate systems defined within them. High-dimensional vector fields applied to these coordinate systems carry the templates with all of their geometry into the target. This allows for understanding modulo individual variation.

Figure 7 Extracting meshes (a) to create a cortical surface database, to search for differences where the deformation is regarded as an observation from a random vector field. Variability is calculated based on 3D displacement maps, which locally encode the amount of deformation required (b) to drive each subject’s gyral pattern into exact correspondence with the average cortex for the group. Pictures from Thompson et al.184

Figure 8 Three-dimensional visualizations of structural variability, asymmetry and group-specific differences. (a) Anatomical variability of the cerebral cortex in male schizophrenia patients and controls. Variability is shown on an average surface representation of the cortex derived from schizophrenia (left) and normal control (right) populations. Individual variations in brain structure in frontal association areas are greater in schizophrenia. Variability is calculated based on 3D displacement maps, which locally encode the amount of deformation required to drive each subject’s gyral pattern into exact correspondence with the average cortex for the group. Picture from Narr et al.182 (b) Ventricle variability maps for Alzheimer’s disease. Pictures from Thompson et al.183

Figure 9 Creating a population-based brain atlas to quantify local structural variations. A family of high-dimensional volumetric warps relating a new 3D MRI scan to each normal scan in a brain image database is calculated (I–II, above). The resulting warps encode the distribution in stereotaxic space of anatomic points that correspond across a normal population (III), and their dispersion is used to determine the likelihood (IV) of local regions of the new subject’s anatomy being in their actual configuration. Colour-coded topographic maps highlight regional patterns of deformity in the anatomy of the new subject. Abnormal structural patterns are quantified locally, and mapped in three dimensions. Pictures from Thompson et al.185

3.2.2.1 Morphological analysis of tree-like structures. Another tool for the analysis of brain structure and function is the morphological characterization of neurological brain structures. Tree-like structures, such as nerve-fibre tracings in DTI, MR angiography, or confocal microscopy, are registered (after segmentation and skeletonization) with standard structural and functional volumes. In addition, morphological analysis of these structures using various path analysis tools is performed. Morphological descriptors such as Sholl analysis,122 moment analysis,123–125 and fractal dimension analysis are used to support content-based retrieval operations in 3D cell-centred neuronal databases.126,127 Recently, visual data mining techniques combined with computational neural modelling have provided a very effective means to detect morphological influences on neuronal function.128
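As an example of one such descriptor, a Sholl profile counts how many branches of a skeletonized tree cross concentric spheres of increasing radius around a centre such as the soma. The sketch below is a simplified illustration: the representation of the tree as a list of straight segments, the function name, and the rule that a segment is counted once when its endpoints lie on opposite sides of a sphere are all assumptions made for this example.

```python
import math


def sholl_profile(segments, centre, radii):
    """Count segment crossings of concentric spheres around `centre`.

    segments: list of (start, end) pairs of 3-D points from a skeletonized tree.
    radii:    sphere radii at which crossings are counted.
    """
    counts = []
    for r in radii:
        crossings = 0
        for start, end in segments:
            inside_start = math.dist(start, centre) < r
            inside_end = math.dist(end, centre) < r
            # A segment crosses the sphere when exactly one endpoint is inside.
            if inside_start != inside_end:
                crossings += 1
        counts.append(crossings)
    return counts


# A toy tree: one trunk leaving the centre that splits into two branches.
toy_tree = [
    ((0, 0, 0), (2, 0, 0)),
    ((2, 0, 0), (4, 0, 0)),
    ((2, 0, 0), (2, 3, 0)),
]
profile = sholl_profile(toy_tree, (0, 0, 0), [1, 3, 5])  # [1, 2, 0]
```

The resulting vector of counts can serve as a fixed-length feature for content-based retrieval, in the spirit of the descriptors listed above.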

3.2.2.2 Brain tumour analysis and classification. Classification is an important problem in data mining. Classifiers are useful for building taxonomies of images and subsequently performing image context based searches.129 Methods for finding similar tumour shapes in structural images130 can also be used for brain tumours. Korn et al. use concepts from mathematical morphology, namely the ‘pattern spectrum’ of a shape, to map each shape to a point in n-dimensional space. Starting from a natural similarity function (the ‘maximum morphological distance’), they first prove a lower bound for it and then demonstrate how to search efficiently for nearest neighbours in large collections of tumour-like shapes using R-trees131 and the ‘Feature index’ (F-index) approach.132 The technique was applied to realistic tumour shapes generated using an established tumour-growth model133 and the results were very encouraging (see Figure 10). Fractal features and texture analysis have also been used for the quantitative description and recognition of brain tumours in 3-D MR images.129

3.3 Combined structural and functional imaging of the brain

Structural and functional imaging are often combined. It is common to restrict activation studies to a certain area of interest that corresponds to an anatomical structure. Here, we present a new area of research where both structural and functional images have to be mined together, and methods that can be potentially applied to both.

3.3.1 Gene expression, morphology and function

Discovering patterns of gene expression and their complex interaction with brain morphology and function is a fundamental goal in recent molecular biology and neurobiology studies. In situ hybridization and MRI have provided very high resolution images of gene expressions in animal models. In addition, gene expression brain atlases for the mouse and the rat have started to appear.134–136 After the registration of anatomic and gene expression images across modalities and subjects through more involved methods137–141 than those presented in Section 2, spatial statistics methods have to be applied to find associations between anatomic, genetic, and nonimage variables such as behavioural measures, response to drugs, or onset of disease. The main challenge in finding associations among patterns of gene expressions and phenotype is the synthesis of temporal information, spatial information, and static data. Similar work has been done in the analysis of functional images (as described earlier) where changes in signal intensity occur in response to processing different kinds of stimuli. However, considering that multiple genes can be expressed in the same brain location, and that the time sequence of gene expression may also be important, makes the problem even more challenging.

3.3.2 Bayesian networks

Multivariate analysis methods like log-linear regression and logistic regression provide relatively simple methods for generating candidate models, usually relying on modifications of greedy search and making assumptions about cell frequencies or total number of samples that may not hold for rare cases. A more promising approach generates models called Bayesian networks142 that consist of graphical structures along with statistical independence models. This method scores each model M, and returns the most probable model that could have generated the data D at hand (i.e. the multivariate multinomial distribution that generated D).143,144

Briefly, a Bayesian network is a directed acyclic graph in which nodes represent variables of interest, such as structures or functions, and edges represent associations among these variables. Each node has a conditional-probability table that quantifies the strength of the associations between that node and its parents. Given the prior probabilities for the root nodes and conditional probabilities for other nodes, we can derive all joint probabilities145 over these variables. An approach for generating a Bayesian network from data is described in Appendix A3.
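The derivation of joint probabilities from priors and conditional-probability tables can be sketched as a product over nodes, each node contributing the probability of its state given the states of its parents. In the minimal example below the network (two structures and one function), the variable names, and every probability are hypothetical values chosen for illustration and are not taken from any of the studies discussed here.

```python
from itertools import product


def joint_probability(assignment, parents, cpts):
    """Joint probability of a full assignment of states ("A" or "N").

    parents: node -> tuple of parent names (empty tuple for root nodes).
    cpts:    node -> {parent states: P(node = "A" | parent states)}.
    """
    p = 1.0
    for node, state in assignment.items():
        parent_states = tuple(assignment[q] for q in parents[node])
        p_abnormal = cpts[node][parent_states]
        p *= p_abnormal if state == "A" else 1.0 - p_abnormal
    return p


# Hypothetical lesion-deficit network: two structures are parents of a deficit.
parents = {"putamen": (), "thalamus": (), "deficit": ("putamen", "thalamus")}
cpts = {
    "putamen": {(): 0.2},                      # prior P(abnormal), made up
    "thalamus": {(): 0.1},                     # prior P(abnormal), made up
    "deficit": {("N", "N"): 0.05, ("N", "A"): 0.6,
                ("A", "N"): 0.7, ("A", "A"): 0.9},
}

all_joint = {
    states: joint_probability(dict(zip(parents, states)), parents, cpts)
    for states in product("AN", repeat=len(parents))
}
```

The eight entries of `all_joint` sum to one, and any marginal or conditional probability of interest can be obtained by summing the appropriate entries.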

Recently the Minimum Description Length (MDL) principle has been applied to Bayesian network learning.146,147 The principle states that the best model of a collection of data is the one that minimizes the sum of the encoding lengths of the data and the model itself.148 The MDL metric is defined to measure the total description length DL of a network structure G, which is the sum of description lengths of each node.147,149 The description length of each node is defined from two components, the network description length and the data description length. The first is the description length for encoding the network structure, which measures the simplicity of the network. The second is the description length for encoding the data, which measures the accuracy of the network.
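The two-part score can be illustrated with a small sketch. The exact encoding used in the cited work is not reproduced here; the example assumes binary variables, charges 0.5 log2(N) bits for each free parameter as the network part, and uses the negative log2-likelihood under maximum-likelihood estimates as the data part. These choices, like the function and variable names, are assumptions for illustration only.

```python
import math
from collections import Counter


def mdl_score(data, parents):
    """Total description length (in bits) of a network structure.

    data:    list of dicts mapping each node to 0 or 1.
    parents: node -> tuple of parent names.
    """
    n = len(data)
    total = 0.0
    for node, pa in parents.items():
        # Network part (assumed form): one free parameter per parent
        # configuration of a binary node, 0.5 * log2(n) bits each.
        network_bits = 0.5 * math.log2(n) * (2 ** len(pa))
        # Data part: bits needed to encode the node given its parents.
        pair_counts = Counter((tuple(row[q] for q in pa), row[node]) for row in data)
        config_counts = Counter(tuple(row[q] for q in pa) for row in data)
        data_bits = -sum(c * math.log2(c / config_counts[cfg])
                         for (cfg, _), c in pair_counts.items())
        total += network_bits + data_bits
    return total


# Toy data in which the function F always equals the structure S.
rows = [{"S": 0, "F": 0}] * 8 + [{"S": 1, "F": 1}] * 8
with_edge = mdl_score(rows, {"S": (), "F": ("S",)})
without_edge = mdl_score(rows, {"S": (), "F": ()})
```

For this toy data set the structure with the edge from S to F has the shorter total description, because the extra parameters cost fewer bits than they save in encoding the data.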

Figure 10 Query tumour images (left column) and their nearest neighbours, with respect to morphological distance. Picture from Korn et al.130

3.3.3 Behavioural imaging

Another approach for modelling structure–function relationships is to transform neuropsychological test scores that assess cognitive functions to a 3-D spatial representation of the predicted sites of regional dysfunction. Gur et al.150 presented such an algorithm for display and analysis of neuropsychological test scores that produces regional values from standardized (z-transformed) neuropsychological test scores using the formula:

$$B_j = \frac{\sum_i W(i,j)\, S_i}{\sum_i W(i,j)}$$

where $B_j$ is the index of behavioural functioning for a given region, $W(i,j)$ is the weight assigned to the jth brain region for the ith behavioural score, and $S_i$ is the test score. The method was demonstrated on a sample of hemi-Parkinson patients151 and later used to examine the sensitivity of cognitive test scores to lesions in specific ROIs, inter-expert agreement, and intra-expert reliability.152 The method can be used to relate cognitive test scores to the results of structural and functional imaging, and has great potential for integrative data mining. Turkheimer et al.187 also quantitatively examined the relationship between neuropsychological test scores and lesion locations on structural neuroimaging.
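The regional index is a weighted average of the test scores, which the following short sketch computes for all regions at once. The function name, the nested-list layout of the weights, and the numbers are illustrative assumptions; the weights in the cited work were assigned by experts and are not reproduced here.

```python
def behavioural_index(scores, weights):
    """Weighted average of z-transformed test scores for each brain region.

    scores:  list of test scores, scores[i] for the i-th test.
    weights: weights[i][j] is the weight of region j for test i.
    """
    n_regions = len(weights[0])
    indices = []
    for j in range(n_regions):
        weight_sum = sum(weights[i][j] for i in range(len(scores)))
        weighted_sum = sum(weights[i][j] * scores[i] for i in range(len(scores)))
        # A region with zero total weight has no defined index.
        indices.append(weighted_sum / weight_sum if weight_sum else None)
    return indices


# Two hypothetical tests and two hypothetical regions.
z_scores = [-1.0, 0.5]
region_weights = [[1, 3],
                  [1, 1]]
regional_values = behavioural_index(z_scores, region_weights)  # [-0.25, -0.625]
```

Returning None for a region with no weight is a design choice for this sketch; a region that no test informs simply has no behavioural estimate.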

4 Important issues in mining of brain images

4.1 The multiple comparisons problem

Using an exploratory analysis and computing a statistic for many tests (as in the case of pairwise tests) leads to the multiple comparisons problem, i.e. the situation where an undesirably high number of the tests are expected to be positive by chance. A standard Bonferroni correction98,153 suggests that one divide the significance threshold by the number of independent tests performed. Using the total number of tests for this purpose typically overestimates the number of independent tests performed, since test results are often correlated for neighbouring structures (activations or lesions often extend over neighbouring structures), and leads to loss of sensitivity. A heuristic modification of the Bonferroni correction, the sequential Bonferroni correction,98 can be used to get less pessimistic results. To do this, one sequentially increases the value of the significance threshold as hypotheses are evaluated. In task-activation studies, increasing the threshold for statistical significance (i.e. the largest p-value that is accepted as significant) increases the number of false positive activations that are detected.
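The difference between the two corrections can be seen in a small sketch. It assumes that the sequential Bonferroni correction follows the usual step-down scheme (often attributed to Holm), in which the sorted p-values are compared with alpha/m, alpha/(m−1), and so on, stopping at the first failure; the function names and the example p-values are illustrative.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject each hypothesis whose p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def sequential_bonferroni(p_values, alpha=0.05):
    """Step-down variant: the threshold grows as hypotheses are rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = [False] * m
    for rank, idx in enumerate(order):
        # Thresholds are alpha/m, alpha/(m-1), ..., alpha/1 in sorted order.
        if p_values[idx] <= alpha / (m - rank):
            rejected[idx] = True
        else:
            break  # all remaining (larger) p-values are retained
    return rejected


example_p = [0.01, 0.015, 0.02, 0.5]
standard = bonferroni(example_p)                # [True, False, False, False]
sequential = sequential_bonferroni(example_p)   # [True, True, True, False]
```

With these four p-values the standard correction keeps only the first association, while the sequential version also keeps the second and third, illustrating why it is less pessimistic.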

The Bonferroni correction only applies in the case where a null hypothesis is to be rejected (i.e. any lesions or activations at all are treated as unexpected). The correction is not necessary in the case where a hypothesized region of lesion/activation or a structure is chosen,154 since in this case the null hypothesis of no changes is necessarily relaxed. If the cross-correlation between adjoining voxels is considered, a higher threshold can be used more safely. Single voxels that have signal changes of low significance may be the result of noise, but several voxels together that have a correlated change have a higher likelihood of representing a true lesion or activation.155 One heuristic alternative to the Bonferroni correction is cluster filtering; in this method, clusters smaller than a certain size (number of voxels) are simply discounted. Taking advantage of this kind of approach increases sensitivity, but it also adds risk of error at the cluster level (rather than only at the voxel level).84 The overall error rate can still be controlled, so the net effect is to reduce errors.
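Cluster filtering itself amounts to finding connected groups of significant voxels and discarding the small ones. The sketch below assumes that significant voxels are given as a set of integer (x, y, z) coordinates and that two voxels are adjacent when they share a face (6-connectivity); both choices, and the function name, are assumptions made for this illustration.

```python
from collections import deque

# Offsets to the six face-sharing neighbours of a voxel.
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def cluster_filter(significant, min_size):
    """Keep only significant voxels that belong to clusters of at least min_size."""
    unvisited = set(significant)
    kept = set()
    while unvisited:
        seed = unvisited.pop()
        cluster = {seed}
        queue = deque([seed])
        # Breadth-first search collects one connected cluster.
        while queue:
            x, y, z = queue.popleft()
            for dx, dy, dz in NEIGHBOURS:
                nb = (x + dx, y + dy, z + dz)
                if nb in unvisited:
                    unvisited.remove(nb)
                    cluster.add(nb)
                    queue.append(nb)
        if len(cluster) >= min_size:
            kept |= cluster
    return kept


voxels = {(0, 0, 0), (1, 0, 0), (1, 1, 0), (5, 5, 5)}
surviving = cluster_filter(voxels, min_size=2)
```

Here the isolated voxel at (5, 5, 5) is discarded, while the three mutually connected voxels survive.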

4.2 Clustering voxels

Clustering is the process of finding, in a contiguous spatial region, voxels with similar significance in a voxel-based activation or lesion-deficit study. Clustering can be done after independent significance tests are performed by grouping adjacent significant voxels, or clusters can be calculated directly from the data by detecting correlated changes.85,156–158 In the latter case, clustering can be viewed as a method for generating hypotheses that statistical testing can evaluate.159

Clustering has been used to differentiate functional activations from other activity in the brain using statistical methods160–165 and neural networks.166,167 Clustering has been claimed to be a higher quality analysis tool than correlation analysis because of its ability to detect unanticipated differences in response, such as differing levels of activation168 or similarities in the time-course of fMRI signal changes and stimuli.163 The cluster filtering approach mentioned earlier depends on having an estimate of the likelihood of each cluster size, which depends on the noise distribution. Images of the noise distribution can be obtained by, for example, subtracting the results of one condition from a repetition of that condition. Using simulated images derived with the same spatial correlation as these images, one can estimate the probability of observing clusters above a given size and thus the probability of each cluster in the original data.164 Then, clusters below a desired probability threshold can be discarded as too uncertain.

4.3 Verification of mining and power considerations

In previous sections we discussed methods for finding associations between tasks and activations, or between lesions and deficits. However, the evaluation of the discovered knowledge for the structure–function analysis methods is not usually addressed. Several researchers have studied the correspondence of sample size to power for statistical tests such as the chi-square and Fisher exact tests of independence,169 and compared the relative power of different statistical tests of independence.170–173 In addition, simulations have studied the power of chi-square analysis in sample spaces of much higher dimensionality, as one would expect to find in many epidemiological studies.173–176 However, no closed-form power analyses exist that can account for the simultaneous effects of image noise and registration error, in addition to the characteristics of the statistical methods being employed.

One can use a simulator177 to not only test the scalability of mining methods, but also evaluate different methods as a function of the number of samples needed, the strength and complexity of associations, the spatial distribution of ROIs, and the registration method used. A simulator can generate a large number of artificial subjects and construct a probabilistic model of lesion-deficit or task-activation associations. One can then model the error of a given registration method, apply it to the image data, perform mining, and compare the generated associations with those detected by the mining methods. The number of subjects required to recover the known associations reflects the statistical power of the particular combination of image-processing and statistical methods being evaluated.


4.3.1 Using a simulator

As a case study, we show results from the evaluation of the Fisher exact test for the detection of lesion-deficit associations.177 The results quantify the sensitivity and accuracy of the mining method as a function of the number of subjects in the sample, the strength and complexity of the associations, and the errors that arise due to imperfect registration.

Comparing the results of simulated analysis to known associations allows one to quantify the performance of a mining method. For this study, the simulation parameters for the distributions were obtained from data collected as part of the Frontal Lobe Injury in Childhood (FLIC)100 lesion-deficit study. Simulated lesions were generated using distributions for the number, size, and location of lesions. Because misregistration introduces noise in the form of false-negative and false-positive associations, this source of error was modelled by assuming that it follows a 3-D nonstationary Gaussian distribution. Registration error was estimated by measuring the error on distinct anatomical landmarks on a number of subjects and then interpolating the error in the rest of the brain. The lesion-deficit-association model, with its conditional-probability tables and prior probabilities, describes the relationships between structures and functions. In the case where structure and function variables are categorical (normal vs abnormal), these associations can be modelled using Bayesian networks (BNs)145 as covered in Section 3.3.2. To examine the effect of the strength of the lesion-deficit associations on the ability of the mining methods to detect them, Table 3 presents three cases corresponding to strong, moderate, and weak associations. Thus, a strong association between a structure s_i and a function f_j is denoted by conditional probabilities p(f_j = A | s_i = N) = 0, p(f_j = A | s_i = A) = 1, p(f_j = N | s_i = N) = 1 and p(f_j = N | s_i = A) = 0, where A means abnormal and N normal. Moderate and weak associations were defined similarly. Nondeterministic disjunctive interactions between more than one structure and a function were modelled using a noisy-OR model.142
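For readers unfamiliar with the noisy-OR model, the sketch below shows its standard form: each abnormal parent structure independently fails to cause the deficit with probability one minus its link probability, and the deficit is absent only if every abnormal parent fails. The link probabilities, the optional leak term, and the function name are illustrative assumptions and are not the parameters used in the simulator.

```python
def noisy_or(parent_states, link_probs, leak=0.0):
    """P(function abnormal | parent states) under a noisy-OR model.

    parent_states: "A" (abnormal) or "N" (normal) for each parent structure.
    link_probs:    probability that each abnormal parent alone causes the deficit.
    leak:          probability of the deficit when no parent is abnormal.
    """
    p_no_deficit = 1.0 - leak
    for state, p in zip(parent_states, link_probs):
        if state == "A":
            # Each abnormal parent independently fails to cause the deficit.
            p_no_deficit *= 1.0 - p
    return 1.0 - p_no_deficit


# Three hypothetical parent structures, two of them abnormal.
p_deficit = noisy_or(["A", "N", "A"], [0.75, 0.75, 0.5])  # 1 - 0.25 * 0.5 = 0.875
```

The attraction of this form for simulation is that a function with k parent structures needs only k link probabilities rather than a full table with 2^k entries.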

The prior probability of structure abnormality for each structure s_i, in each subject p_j, was calculated from f_{s_i,p_j}: the fraction of the volume of s_i that overlapped with lesions for p_j. The conditional probability p(s_i | f_{s_i,p_j}) is expected to be a sigmoid function, although a step function with an appropriate threshold is used for simplicity. Each structure with at least 1% of its volume overlapping with lesions was labelled as abnormal for that subject. For each pair of simulated subject and structure, the prior-probability distribution was sampled and a binary vector for the structures was generated. By instantiating the states of all structure variables of the BN, the conditional probability for each function variable was determined by table lookup. This probability was then used to generate the binary vector for the function variables, and Fisher’s exact test of independence was applied to each structure-function pair.

Table 3 Three cases of BNs considered in a simulator177

| Case | Association | Conditional probabilities for functions |
|---|---|---|
| 1 | Strong | 0/1 |
| 2 | Moderate | 0.25/0.75 |

4.3.2 Results from the evaluation of a mining system

In this section, we describe how a lesion-deficit simulator can be used to determine the number of subjects needed to discover the simulated lesion-deficit associations represented by a Bayesian network, as a function of the strengths of associations, the number of associations, the degree of the network (i.e. the number of structures related to a particular function), and the prior probabilities for structural abnormalities. A Bayesian network with sufficient complexity was used177 to demonstrate the use of a simulator in reaching meaningful results regarding the performance of the Fisher exact test and the effects of misregistration. Since the performance of any method for detecting associations depends on the characteristics of the conditional-probability tables, three cases (see Table 3) were examined to study this effect. The prior probability of abnormality for each structure was set to 0.5 to allow testing the behaviour of the Fisher exact test for the optimal value of the prior probability. To generate the conditional-probability table for those function variables that were related to more than one structure, a noisy-OR model was used. The threshold 0.001 was used for the p-value, since this gives a good trade-off between the number of simulated associations and the number of false positives detected. Figure 11 demonstrates the dramatic effects of the different conditional-probability distributions on the power of lesion-deficit analysis. As expected, more samples are required to detect weaker associations.
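The kind of power estimate described above can be sketched for a single structure-function pair. The following toy simulation is not the simulator of the cited study: it ignores lesion geometry and registration error, models one edge only, and the function names, number of trials and random seed are assumptions. It does use the uniform prior of 0.5 and the p-value threshold of 0.001 mentioned in the text.

```python
import random
from math import comb


def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return min(1.0, sum(prob(x) for x in range(lo, hi + 1)
                        if prob(x) <= p_obs * (1 + 1e-9)))


def detection_rate(n_subjects, p_if_abnormal, p_if_normal,
                   prior=0.5, threshold=0.001, n_trials=50, seed=0):
    """Fraction of simulated studies in which the association is detected."""
    rng = random.Random(seed)
    detected = 0
    for _ in range(n_trials):
        table = [[0, 0], [0, 0]]
        for _ in range(n_subjects):
            abnormal = rng.random() < prior              # sample structure state
            p_deficit = p_if_abnormal if abnormal else p_if_normal
            deficit = rng.random() < p_deficit           # sample function state
            table[0 if abnormal else 1][0 if deficit else 1] += 1
        (a, b), (c, d) = table
        if fisher_exact_p(a, b, c, d) <= threshold:
            detected += 1
    return detected / n_trials


strong_rate = detection_rate(40, 1.0, 0.0)    # case 1 of Table 3 (0/1)
null_rate = detection_rate(40, 0.5, 0.5)      # no association at all
```

Repeating such runs for increasing numbers of subjects and for weaker conditional probabilities gives curves of the same kind as those in Figure 11, although absolute numbers from this single-edge toy are not comparable with the published results.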

The degree of the associations of the Bayesian network was found to have a much greater effect on the performance of the Fisher exact test than the total number of associations. This result implies that, for functions that are associated with many structures, identification of structure–function associations is difficult and requires a larger sample size. Figure 12(a) shows the performance of the Fisher exact test for three networks of 20, 40, and 80 edges and of the same degree (4) for the moderate case (i.e. case 2) of the conditional-probability tables. Figure 12(b) shows the effect of increasing the degree of the network (the number of structures affecting a particular function) while fixing the total number of edges using the moderate case of the conditional-probability tables.

Figure 11(b) demonstrates the performance of the Fisher exact test for the three cases of Bayesian network conditional probabilities (see Table 3) when the prior probability of a given structure being abnormal is obtained from the simulated data set. The number of edges that could actually be discovered is 55 (80%), since there were 14 edges from structures that did not intersect any lesions. Compared with Figure 11(a), in which uniform prior probabilities were used, more subjects are required to recover all associations when data-derived prior probabilities are used, as expected. Also as expected, the number of subjects needed is inversely proportional to the smallest prior probability. False-positive associations are detected because lesions that intersect more than one structure induce associations among neighbouring structures. Additional false positives can be observed in cases where associations occur between behavioural variables. On average the specific registration method used reduces the number of associations discovered by 13% for the same number of subjects when compared with perfect registration.

Figure 11 Evaluating a mining method. Performance of the Fisher exact test (p < 0.001) for (a) uniform (0.5) prior probabilities and (b) data-derived prior probabilities of structure abnormality, for the three strengths of lesion-deficit associations from Table 3 that correspond to strong (case 1), moderate (case 2) and weak (case 3) associations. The difference between the total number of associations detected and the number of true associations detected is the number of false-positive associations detected for each case. The horizontal line in (a) represents the total number of simulated edges (69) and in (b) represents the total number of simulated edges that can be detected (55). Graphs from Megalooikonomou et al.177

Figure 12 Evaluating a mining method. (a) Performance of the Fisher exact test (p < 0.001) for BNs with degree 4 and with 20, 40 and 80 edges. (b) Performance of the Fisher exact test (p < 0.001) for BNs with 48 edges and with degree 4, 6 and 8. Graphs from Megalooikonomou et al.177
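As a minimal sketch of the test evaluated above, the two-sided Fisher exact test on a 2×2 table of structure abnormality against deficit can be computed directly from the hypergeometric distribution. The function name, the table layout and the example counts below are illustrative assumptions, not values taken from the simulated data set.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Hypothetical layout for a lesion-deficit study:
        a = structure abnormal, deficit present
        b = structure abnormal, deficit absent
        c = structure normal,   deficit present
        d = structure normal,   deficit absent
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)  # hypergeometric denominator

    def weight(x: int) -> int:
        # Unnormalised probability of the table whose top-left cell is x,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    observed = weight(a)
    # Sum every table that is no more probable than the observed one.
    extreme = sum(w for w in map(weight, range(lo, hi + 1)) if w <= observed)
    return extreme / total_tables


if __name__ == "__main__":
    # Illustrative counts only: 40 subjects, 20 with the structure abnormal.
    p = fisher_exact_two_sided(16, 4, 3, 17)
    print(f"p = {p:.2e}; flagged at the 0.001 threshold: {p < 0.001}")
```

Exact integer arithmetic is used for the table weights so that the comparison with the observed table is not affected by floating-point rounding.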

5 Concluding remarks

In this review we have presented data mining methods that have been or could be used for knowledge discovery from brain images of different modalities along with other clinical data. We have focused on the problems of: (1) finding associations between structures and functions through task-activation and lesion-deficit studies, (2) studying the morphological variability of brain structures and finding associations with certain conditions, (3) classifying shapes of brain structures, including tree-like structures such as nerve fibres and abnormalities such as tumours, and searching for similarity, and (4) finding associations between gene expressions, morphology, and function. We have presented results of applying mining methods to epidemiological data that demonstrate detection of several clinically meaningful associations in different studies. These methods can lead to interesting conclusions about the functional mapping of the human brain, the effect of lesions or other abnormalities on the development of neurological and neuropsychological deficits, and the effect of certain diseases and gene expressions on structural morphology and function.

Visualization can help reduce the inherently enormous search space in statistical analysis. Exploratory analysis that applies a single statistic across many tests produces reasonable results, although one has to deal with the multiple-comparison problem. Voxel-based approaches show encouraging results, but are computationally intensive and even more severely affected by the multiple-comparison problem (although the latter can be addressed with clustering analysis). Statistical simulations show that more advanced mining methods and large sample sizes are required to determine lesion-deficit associations accurately, with a reduced number of false-positive associations.
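To make the multiple-comparison problem concrete, the sketch below computes the chance of at least one false positive across m independent tests, together with a Bonferroni-corrected per-test threshold. Bonferroni correction is used here only as a familiar illustration and is not claimed to be the method adopted in the studies reviewed; the function names and the example numbers of tests are assumptions.

```python
def familywise_error_rate(alpha: float, m: int) -> float:
    """Probability of at least one false positive in m independent tests,
    each run at significance level alpha, when every null hypothesis holds."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_threshold(alpha: float, m: int) -> float:
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / m


if __name__ == "__main__":
    # Hypothetical numbers of tests: a structure-level and a voxel-level analysis.
    for m in (100, 100_000):
        print(m, familywise_error_rate(0.001, m), bonferroni_threshold(0.05, m))
```

The independence assumption is a simplification: neighbouring voxels are strongly correlated, which is one reason cluster-level analysis is attractive for voxel-based approaches.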

Simulators can be used for verifying and comparing mining methods in brain imaging. Their use is especially important in determining the number of subjects needed to detect all associations while reducing false positives. In particular, in lesion-deficit analysis, simulators have shown that the number of subjects required to detect all and only those associations in the underlying model (i.e. the ground truth) may be in the thousands, even for strong associations, particularly if the spatial distribution of lesions does not extend to all structures. The further the prior probabilities fall below the 0.5 level, the more difficult it becomes to discover associations. These results underline the necessity of developing large image databases for the purpose of meta-analysis of data pooled from multiple studies, so that more meaningful results can be obtained. The testing procedure framework is very important, since it can be used to characterize the power of methods for detecting multivariate associations while taking into account the effects of registration and noise. Simulators can also be used in the evaluation of new analysis methods, as well as in the study of the effect of different registration and segmentation algorithms.
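The kind of simulation described above can be sketched in a few lines: draw synthetic subjects for one structure-deficit pair, apply the Fisher exact test at p < 0.001, and estimate how often the association is recovered for a given number of subjects and prior probability of abnormality. This is a single-edge toy model, not the Bayesian-network simulator used in the cited study; the conditional probabilities (0.75 and 0.25), the trial count and all function names are assumptions chosen for illustration.

```python
import random
from math import comb


def fisher_p(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    weights = [comb(row1, x) * comb(row2, col1 - x)
               for x in range(max(0, col1 - row2), min(row1, col1) + 1)]
    observed = comb(row1, a) * comb(row2, c)
    return sum(w for w in weights if w <= observed) / comb(row1 + row2, col1)


def detection_rate(n_subjects: int, prior: float,
                   p_deficit_abnormal: float = 0.75,
                   p_deficit_normal: float = 0.25,
                   trials: int = 200, alpha: float = 0.001,
                   seed: int = 0) -> float:
    """Fraction of simulated studies in which the association is detected."""
    rng = random.Random(seed)
    detected = 0
    for _ in range(trials):
        # Rows: structure abnormal / normal. Columns: deficit / no deficit.
        table = [[0, 0], [0, 0]]
        for _ in range(n_subjects):
            abnormal = rng.random() < prior
            p_deficit = p_deficit_abnormal if abnormal else p_deficit_normal
            deficit = rng.random() < p_deficit
            table[0 if abnormal else 1][0 if deficit else 1] += 1
        (a, b), (c, d) = table
        if fisher_p(a, b, c, d) < alpha:
            detected += 1
    return detected / trials


if __name__ == "__main__":
    for prior in (0.5, 0.1):
        for n in (25, 100, 400):
            rate = detection_rate(n, prior)
            print(f"prior={prior} n={n} detection rate={rate:.2f}")
```

Under these assumptions the detection rate should drop as the prior moves away from 0.5 for a fixed number of subjects, which is the trend reported above; extending the sketch to many structures would also expose the false positives induced by lesions that span neighbouring structures.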

Existing mining algorithms are limited in that they typically assume data will consist of individual numeric and symbolic features. We still lack effective algorithms for learning from data that are represented as a combination of various types (i.e. multimedia data). Predictions based on the full medical record could potentially achieve much greater accuracy than those that are limited to one data type. In addition, prediction accuracy can be improved by inventing more appropriate features to describe the brain data. We need new methods that actively generate optimal experiments to collect the most informative data. Another obstacle is integrating data from different investigators and analysing them jointly. Brain imaging data are usually collected in a single database for a specific study and with a specific data mining task in mind, so an additional important issue is interoperability and the ability to learn from multiple databases.178 Also, the mining algorithms developed so far tend to be fully automated and therefore do not allow active experimentation, i.e. guidance from experts at key stages in the search for brain data regularities. Ideally, human experts should be able to collaborate closely with a mining algorithm to form hypotheses and test them against the data. In addition, mining methods need to be able to scale to extremely large data sets. Research during the past few years has already produced more efficient algorithms for such problems as learning association rules2 and efficient visualization of large data sets.179 A closer integration of machine learning algorithms into database management systems is also needed.

Acknowledgements

The authors wish to thank Christos Davatzikos, Eddie Herskovits, Christos Faloutsos, Paul Thompson, David Isecke, Ling Cheng, and Tilmann Steinberg for providing pictures, comments, and other helpful information. This work was supported in part by the Ira DeCamp Foundation, NARSAD and New Hampshire Hospital. Support was also provided by the Dartmouth Experimental Visualization Laboratory (DEVLAB).

References

1 Koslow SH, Huerta MF eds. Neuroinformatics: an overview of the Human Brain Project. Mahwah, NJ: Lawrence Erlbaum, 1997.

2 Agrawal R, Imielinski T, Swami A. Database mining: a performance perspective. IEEE Transactions on Knowledge and Data Engineering 1993;5: 914–25.

3 Huerta MF, Koslow SH, Leshner AI. The Human Brain Project: an international resource. Trends in Neuroscience 1993;16: 436–38.

4 Mitchell TM. Machine learning and data mining. Communications of the ACM 1999;42: 31–36.

5 Anderson S, Damasio H. Neuropsychological impairments associated with lesions caused by tumor or stroke. Archives of Neurology 1990;47: 397–405.

6 Fox P, Mintum M, Reiman E, Raichle M. Enhanced detection of brain responses using intersubject averaging and change-distribution analysis of subtracted PET images. Journal of Cerebral Blood Flow Metabolism 1988;8: 642–53.

7 Fox P. Functional brain mapping with positron emission tomography. Seminars in Neurology 1989;9: 323–9.

8 Fox P, Mintum M. Noninvasive functional brain mapping by change-distribution analysis of averaged PET images of H₂¹⁵O tissue activity. Journal of Nuclear Medicine 1989;30: 141–49.

9 Fox P. Physiological ROI definition by image subtraction. Journal of Cerebral Blood Flow Metabolism 1991;11: A79–82.

10 Evans A, Beil C, Marrett S, Thompson C, Hakim A. Anatomical–function correlation using an adjustable MRI-based region of interest atlas with positron emission tomography. Journal of Cerebral Blood Flow Metabolism 1988;8: 513–30.

11 Evans A, Marrett S, Torrescorzo J, Ku S, Collins L. MRI–PET correlation in three dimensions using a volume-of-interest (VOI) atlas. Journal of Cerebral Blood Flow Metabolism 1991;11: A69–78.

12 Arya M, Cody W, Faloutsos C, Richardson J, Toga A. A 3D medical image database management system. International Journal of Computerized Medical Imaging and Graphics 1996;20: 269–84.

13 Letovsky SI, Whitehead SH, Paik CH et al. A brain image database for structure–function analysis. American Journal of Neuroradiology 1998;19: 1869–77.

14 Herskovits EH, Megalooikonomou V, Davatzikos C, Chen A, Bryan RN, Gerring JP. Is the spatial distribution of brain lesions associated with closed-head injury predictive of subsequent development of attention-deficit hyperactivity disorder?: Analysis with brain-image database. Radiology 1999;213: 389–94.

15 Nielsen FA, Hansen LK. Modeling of brainmap data. In: NIPS ’99. Denver, Colorado; 1999.

16 Nowinski WL, Fang A, Nguyen BT et al. Multiple brain atlas database and atlas-based neuroimaging system. Computer Aided Surgery 1997;2: 42–66.

17 Levrier O, Poline J, Tzourio N, Mazoyer B, Salamon G. Individual functional neuroanatomy using PET–MRI integration. In: 31st Annual Meeting of the American Society of Neuroradiology. Vancouver, BC, 1993.

18 Rajapakse J, Giedd J, Rapoport J. Statistical approach to segmentation of single-channel cerebral MR images. IEEE Transactions on Medical Imaging 1997;16: 176–86.

19 Pal N, Pal S. A review on image segmentation techniques. Pattern Recognition 1993;26: 1277–94.

20 Zhang Y. A survey on evaluation methods for image segmentation. Pattern Recognition 1996;29: 1335–46.

21 Worth A, Makris N, Caviness V, Kennedy D. Neuroanatomical segmentation in MRI: Technological objectives. International Journal of Pattern Recognition and Artificial Intelligence 1997;11: 116–87.

22 Vannier M, Butterfield R, Rickman D, Jordan D, Murphy W, Biondetti P. Multispectral magnetic resonance image analysis. CRC Critical Reviews in Biomedical Engineering 1987;15: 117–44.

23 Held K, Korps E, Krause B, Wells W, Kikinis R, Muller-Gartner H. Markov random field segmentation of brain MR images. IEEE Transactions on Medical Imaging 1997;16: 878–86.

24 Chang M, Sezan M, Tekalp A, Berg M. Bayesian segmentation of multislice brain magnetic resonance imaging using three-dimensional Gibbsian priors. Optical Engineering 1996;35: 3206–21.

25 Collins D, Evans A. ANIMAL: validation and applications of nonlinear registration-based segmentation. International Journal of Pattern Recognition and Artificial Intelligence 1997;11: 1271–94.

26 Gee J, Reivich M, Bajcsy R. Elastically deforming 3D atlas to match anatomical brain images. Journal of Computer Assisted Tomography 1993;17: 225–36.

27 Miller M, Christensen G, Amit Y, Grenander U. Mathematical textbook of deformable neuroanatomies. Proceedings of the National Academy of Sciences 1993;90: 11944–48.

28 Kamber M, Shingal R, Collins D, Francis G, Evans A. Model-based 3-D segmentation of multiple sclerosis lesions in magnetic resonance brain images. IEEE Transactions on Medical Imaging 1995;14: 442–53.

29 Talairach J, Tournoux P. Co-planar stereotaxic atlas of the human brain. Stuttgart: Thieme, 1988.

30 Bookstein F. Principal warps: thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 1989;11: 567–85.

31 Collins D, Neelin P, Peters T, Evans A. Automatic 3D intersubject registration of MR volumetric data in standardized Talairach space. Journal of Computer Assisted Tomography 1994;18: 192–205.

32 Davatzikos C. Spatial transformation and registration of brain images using elastically deformable models. Computer Vision and Image Understanding 1997;66: 207–22.

33 Damasio H. Human brain anatomy in computerized images. Oxford: Oxford University Press, 1995.

34 Talairach J, Szikla G. Atlas d’anatomie stereotaxique du telencephale: etudes anatomo-radiologiques. Paris: Masson, 1967.

35 Ono M, Kubik S, Abernathey C. Atlas of the cerebral sulci. Stuttgart: Thieme, 1990.

36 Brodmann K. Vergleichende Lokalisationslehre der Grosshirnrinde in ihren Principien dargestellt auf Grund des Zellenbaues, Barth, Leipzig. In: Some Papers on the Cerebral Cortex. Springfield, IL: Thomas, 1960: 201–30.