(1)

LI, ZHOU. Statistical Methods for Neuroimaging Data Analysis. (Under the direction of Lexin Li and Wenbin Lu.)

Large-scale neuroimaging studies have been collecting neuroimaging, clinical and genetic data of study individuals. Depending on the specific objective, the use of neuroimaging data falls into two categories: it can be treated as either a predictor variable or a response variable within a regression framework to investigate the structure and function of the brain and their relationship to neurodegenerative disorders and individual genetic background. The first category requires new statistical models that take multidimensional neuroimaging data as predictors, which further involves variable selection and dimension reduction. The second category usually uses high-level measurements derived from the neuroimaging data. Existing methodologies may be applied to facilitate the study; however, the main goal under this category is to address a scientific question from a new perspective. In this dissertation, we study statistical methods for neuroimaging data analysis under both categories.

In the first part (Chapter 2), we propose a sparse multi-response tensor regression method that models multiple outcomes jointly as well as multiple voxels of an image jointly. The proposed method is particularly useful both for inferring clinical scores, and thus disease diagnosis, and for identifying brain subregions that are highly relevant to the disease outcomes. We conducted experiments on the Alzheimer's Disease Neuroimaging Initiative dataset and showed that the proposed method enhances performance and clearly outperforms competing solutions.


neuroimaging modality by extending the univariate tensor regression model. Based upon this metric, we address the interpretative question of what proportion of the total variation of the clinical outcome is accounted for by each individual modality, and we attain a systematic comparison of the contributions of different brain regions in terms of the contributions of different modalities. An analysis of a dataset from the Alzheimer's Disease Neuroimaging Initiative using the proposed metric has yielded findings consistent with the Alzheimer's disease literature.


Statistical Methods for Neuroimaging Data Analysis

by Zhou Li

A dissertation submitted to the Graduate Faculty of North Carolina State University

in partial fulfillment of the requirements for the Degree of

Doctor of Philosophy

Statistics

Raleigh, North Carolina

2016

APPROVED BY:

Arnab Maity Ana-Maria Staicu

Lexin Li

Co-chair of Advisory Committee

Wenbin Lu


BIOGRAPHY


ACKNOWLEDGMENTS

I would like to extend my most sincere gratitude to my supervisor, Dr. Lexin Li, for his immense help during my graduate study at NCSU. I really appreciate his insightful guidance, generous support and unending encouragement throughout every single step of my research.

I would also like to thank all my committee members, Dr. Wenbin Lu, Dr. Arnab Maity and Dr. Ana-Maria Staicu for sharing their valuable suggestions for this dissertation.

I am grateful to the Department of Statistics, NC State University for offering great academic support for my graduate study. I would like to thank all the faculty members for providing a comprehensive collection of lectures. I would also like to thank all the staff members for their excellent service.

I also owe my gratitude to my mentor, David Wei from Community Care of North Carolina, where I completed a graduate industrial traineeship; and to Anukul Goel from Amazon and Chaitanya Chemudugunta from Blizzard Entertainment, where I worked as a summer intern.


TABLE OF CONTENTS

LIST OF TABLES . . . vii

LIST OF FIGURES. . . viii

Chapter 1 INTRODUCTION. . . 1

1.1 Neuroimaging as predictors . . . 2

1.2 Neuroimaging as response variables . . . 3

1.3 Thesis organization . . . 4

Chapter 2 Sparse multi-response tensor regression. . . 8

2.1 Introduction . . . 9

2.1.1 Motivation . . . 9

2.1.2 Literature review . . . 10

2.1.3 Outline . . . 11

2.2 Materials . . . 12

2.2.1 Subjects . . . 12

2.2.2 Preprocessing . . . 14

2.3 Method . . . 15

2.3.1 Model . . . 15

2.3.2 Penalized likelihood . . . 19

2.3.3 Estimation . . . 21

2.3.4 Alternative algorithm . . . 25

2.4 Results . . . 27

2.4.1 Signal recovery and prediction . . . 27

2.4.2 Stability, convergence and computation time . . . 29

2.4.3 ADNI data analysis . . . 34

2.5 Discussion . . . 41

Chapter 3 Interpretable multi-modality analysis. . . 44

3.1 Introduction . . . 45

3.1.1 Motivation . . . 45

3.1.2 Literature review . . . 46

3.1.3 Outline . . . 47

3.2 Materials . . . 48

3.2.1 Subjects . . . 48

3.2.2 MRI/PET scanning and image processing . . . 48

3.3 Method . . . 49

3.3.1 Tensor Regression . . . 49


3.4.1 Region of Interest Analysis . . . 55

3.4.2 Within Region Analysis . . . 56

3.4.3 Between Region Analysis . . . 58

3.5 Discussion . . . 58

Chapter 4 Nonlinear Neuroanatomical Heritability . . . 63

4.1 Introduction . . . 64

4.1.1 Motivation . . . 64

4.1.2 Literature review . . . 65

4.1.3 Outline . . . 66

4.2 Materials . . . 67

4.2.1 Neuroimaging . . . 67

4.2.2 Genotyping . . . 67

4.3 Method . . . 68

4.3.1 Model . . . 68

4.3.2 Kernel Selection . . . 70

4.3.3 Estimation . . . 73

4.3.4 Computation Tricks . . . 74

4.4 Results . . . 78

4.4.1 Simulation . . . 78

4.4.2 Real data analysis . . . 81

4.5 Discussion . . . 82


LIST OF TABLES

Table 2.1  Demographic and clinical information of the subjects. SD = Standard Deviation. . . . 14

Table 2.2  Root mean squared error of prediction under varying noise level (σ = 10, 15, 20), number of response variables (q = 3, 9), and response pairwise correlation (ρ = 0, 0.9). Reported are the mean RMSE and standard error (in parentheses) based on 50 data replications. . . . 30

Table 2.3  Root mean squared error of prediction under varying noise level (σ = 10, 15, 20), number of response variables (q = 3, 9), and response pairwise correlation (ρ = 0, 0.9). Reported are the mean RMSE and standard error (in parentheses) based on 50 data replications. . . . 32

Table 2.4  Estimation of the two clinical scores by various methods. Reported are the average and standard error (in parentheses) of the root mean square error and the Pearson correlation coefficient between the predicted and the observed scores based on 10-fold cross-validation. . . . 37

Table 2.5  AAL regions (colored cells) selected by the sparse multivariate tensor regression model with varying tuning parameters, along with their support in the literature. 13 additional regions which are only selected when λ = 50 are not shown here in the interest of space. Abbreviations: AMYGD = Amygdala; HIPPO = Hippocampus; NL = Lenticular nucleus, putamen; IN = Insula; PARA_HIPPO = Parahippocampal gyrus; COB = Olfactory cortex; T1A = Temporal pole: superior temporal gyrus; T2 = Middle temporal gyrus. . . . 40

Table 3.1  Contributions in R²_a of MRI, PET and combined in the AAL ROIs. . . . 60

Table 4.1  Simulation results of estimated heritability ĥ² = σ̂²_g/(σ̂²_g + σ̂²_e) in model Y = Xβ + g + e based on 500 replications. True g follows MVN_n(0, σ²_g K_Linear). . . . 79

Table 4.2  Simulation results of estimated heritability ĥ² = σ̂²_g/(σ̂²_g + σ̂²_e) in model Y = Xβ + g + e based on 500 replications. True g follows MVN_n(0, σ²_g K_IBS). . . . 80

Table 4.3  Simulation results of estimated heritability ĥ² = σ̂²_g/(σ̂²_g + σ̂²_e) in model Y = Xβ + g + e based on 500 replications. True g follows MVN_n(0, σ²_g K_Gaussian). . . . 81

Table 4.4  Heritability estimated by the proposed method (Mixed) and the GCTA

Figure 2.1  A schematic overview of the proposed sparse multi-response tensor regression with multivariate cognitive assessments. . . . 13

Figure 2.2  An illustration of the low-rank and sparse estimation of the coefficient signal when q = 2, D = 2, R = 1 and p_1 = p_2 = 4. ◦ denotes the outer product. Dotted lines connect the elements corresponding to the same region but across different responses, which are encouraged to enter or drop from the model simultaneously. Different colors denote different strengths of association. . . . 21

Figure 2.3  Estimated coefficient images under varying noise level (σ = 10, 15, 20) and number of response variables (q = 3, 9). The coefficient patterns are the same for all responses. . . . 31

Figure 2.4  Estimated coefficient images under varying noise level (σ = 10, 15, 20) and number of response variables (q = 3, 9). The coefficient patterns are different for different responses. One half adopts "cross", while the other half "triangle". . . . 33

Figure 2.5  Algorithm stability with respect to the varying regularization parameter λ = 0, 10, 100 and varying rank R = 1, 2, 3. . . . 35

Figure 2.6  Convergence behavior with 100 randomly generated starting values and the corresponding run time. . . . 36

Figure 2.7  Scatter plots of the predicted MMSE and ADAS-Cog versus the observed scores. . . . 38

Figure 2.8  Regions (red part) selected by the sparse multivariate tensor regression model that are relevant to the Alzheimer's disease. The optimal tuning parameter based on cross-validation is λ = 100. . . . 39

Figure 3.1  The variance of the response Y can be divided into four parts, I, II, III, and IV, with I + II + III + IV = 1. Region II corresponds to the semi-partial R²_sp, i.e., the unique contribution, for X_1; IV is the semi-partial R²_sp, i.e., the unique contribution, for X_2; III is the common contribution for both X_1 and X_2; and I is the part that cannot be explained by either X_1 or X_2. In addition, II + III, III + IV and II + III + IV are the corresponding R² for a model containing only X_1, only X_2, and both

Figure 4.1  Histogram of the off-diagonal elements of linear kernel, IBS kernel and Gaussian kernel. Upper-left is the linear kernel with [K(Z)]_{ii′} = z^T

CHAPTER 1

INTRODUCTION


neuroimaging procedure using MRI technology that measures brain activity by detecting changes associated with blood flow. PET is a functional imaging technique that is used to observe metabolic processes in the body. DTI is a promising method for characterizing microstructural changes or differences with neuropathology and treatment. It has been demonstrated that different neuroimaging technologies (modalities) provide complementary information of the brain [Fan07; Fan08b; Wal10; Dav11; Hin11; Zha11; Dai12; Wee12; Zha12; Liu14a].

1.1 Neuroimaging as predictors

One primary goal of neuroimaging analysis is to better understand associations between the brain and clinical outcomes. Under such scenarios, the neuroimaging modalities are used as predictors, or independent variables, to predict outcomes of neurodegenerative disorders, for example, Alzheimer's disease (AD). AD, characterized by progressive impairment of cognitive and memory functions, is an irreversible neurodegenerative disorder and the leading form of dementia in elderly subjects. The number of affected subjects increases significantly every year, and the prevalence is projected to be 1 in 85 by the year 2050 [Bro07]. Amnestic mild cognitive impairment (MCI) is often a prodromal stage of AD, and individuals with MCI may convert to AD at an annual rate as high as 15% [Pet99]. There is a vast body of literature studying AD and MCI using one or more neuroimaging modalities.


for some excellent reviews. Moreover, in addition to classifying a binary or categorical disease outcome given brain image scans, there were studies establishing associations between image activity patterns and a continuous clinical outcome. A variety of cognitive and memory scores have been used as the response, including the Mini-Mental State Examination (MMSE)[Duc05; Duc09; Sto10; Wan10], Boston Naming Testing[Wan10], Dementia Rating Scale, Alzheimer’s Disease Assessment Scale-Cognitive Subscale (ADAS-Cog), and Auditory Verbal Learning Test[Sto10].

1.2 Neuroimaging as response variables

Neuroimaging can also be used to obtain volumetric measurements of the brain and brain sub-regions. Such measurements are useful for studying associations between brain anatomy and genetic background. Under such scenarios, the measurements derived from neuroimaging are used as response variables, or dependent variables, and the primary goal is to investigate whether and how much they are related to the genetic background of humans. As a measure of this relationship, heritability is defined as the proportion of total phenotypic variance explained by genetic factors [Vis08b]. Heritability is a key parameter for understanding the genetic architecture of complex traits, and it allows investigators to prioritize resources for future genetic studies [Sta12].



information, the genetic information can then be used as a protective or risk factor for timely therapy and possible delay of the disease.

1.3 Thesis organization

The rest of this dissertation is divided into two parts. The focus of the first part (Chapter 2) is to propose new statistical models that address certain constraints and improve performance. The focus of the second part (Chapters 3 and 4) is to address scientific questions from a new perspective with the help of existing methodologies.


relevant features suggested directly by the data. Moreover, instead of conducting feature extraction and association modeling in two separate steps, our method simultaneously derives relevant features and builds their association with the outcomes. Last but not least, we develop a highly scalable computational algorithm that makes our method applicable to a range of massive imaging data. The proposed method is particularly useful both for inferring clinical scores, and thus disease diagnosis, and for identifying brain subregions that are highly relevant to the disease outcomes. Our numerical analyses and experiments on the Alzheimer's Disease Neuroimaging Initiative dataset demonstrate that the proposed method enhances performance and clearly outperforms competing solutions.

In Chapter 3, we target a different aspect of multi-modality analysis: interpretation. To measure the individual contributions of imaging modalities, we adopt a well established metric in regression analysis, the coefficient of determination, or R². However, the use of



structure of the image covariate, but introduces a low rank approximation to the coefficient tensor. This approximation exploits the special structure of the tensor and substantially reduces the dimensionality of the model, which in turn leads to efficient estimation and prediction. Our study builds upon the tensor regression model proposed by [Zho13a], extending it from a single-modality analysis to multiple imaging modalities and developing appropriate individual-modality contribution metrics. An analysis of a dataset from the Alzheimer's Disease Neuroimaging Initiative using the proposed metric has yielded findings consistent with the Alzheimer's disease literature.


CHAPTER 2

SPARSE MULTI-RESPONSE TENSOR REGRESSION


2.1 Introduction

2.1.1 Motivation

More recently, studies have emerged that associate image scans with multiple clinical outcomes [Zha12; Zhu14]. Our motivating data example consists of 194 subjects from the Alzheimer's Disease Neuroimaging Initiative (ADNI), among which 93 are AD patients and 101 are healthy controls. For each subject, the collected data include a 32×32×32 MRI scan, after proper preprocessing and downsizing, and two clinical scores. One is the MMSE, which examines orientation to time and place, immediate and delayed recall of three words, attention and calculation, language and visuo-constructional functions [Fol75]. The other is the ADAS-Cog, which is a global measure encompassing the core symptoms of AD [Ros84]. The ADAS-Cog is usually more sensitive, but requires more than 30 minutes for



2.1.2 Literature review

Whereas there is an enormous body of statistics literature on modeling multivariate predictors, there have been far fewer works on modeling multivariate responses. Assume each subject has a vector of q responses and p predictors, and denote Y ∈ IR^{n×q} the response matrix and X ∈ IR^{n×p} the predictor matrix. The multi-response problem can be formulated in a regression framework, Y = XB + E, where B ∈ IR^{p×q} and E ∈ IR^{n×q} are the coefficient matrix and error matrix, respectively. Without further assumptions, the model is equivalent to regressing each of the q response variables on the predictors separately, and does not take advantage of the correlation among the response variables.

Some popular multi-response solutions have been proposed to overcome this problem. One branch of methods is based on finding a small number of linearly transformed predictors that drive the variation in the multiple responses. Partial least squares (PLS) [Hel90; Hel92] finds the directions in the predictor space that explain the maximum variance in the response space. Canonical correlation analysis [Hot36; ZH08] finds linear combinations of the predictors and of the response variables that maximize the correlation between the two. Reduced-rank regression (RRR) [Ize75; VR13; Yua07] restricts the coefficient matrix B to be low-rank, so that it can be decomposed as a product of two lower-rank matrices. Although these dimension reduction methods lead to more efficient estimates, none of them is applicable for variable selection.
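The reduced-rank idea can be illustrated with a short sketch: fit ordinary least squares, then project the fitted values onto their leading singular directions. This is a standard textbook construction rather than code from this dissertation, and all names below are illustrative.

```python
import numpy as np

def reduced_rank_regression(X, Y, rank):
    """Reduced-rank regression: compute the OLS coefficients, then project
    the fitted values onto their top-`rank` principal directions."""
    # Full-rank least squares estimate B_ols in Y ~ X B
    B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
    # SVD of the fitted values; keep the leading `rank` right singular vectors
    _, _, Vt = np.linalg.svd(X @ B_ols, full_matrices=False)
    V = Vt[:rank].T                      # q x rank
    return B_ols @ V @ V.T               # rank-constrained p x q coefficient

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 6))
B_true = np.outer(rng.standard_normal(6), rng.standard_normal(4))  # rank 1
Y = X @ B_true + 0.01 * rng.standard_normal((100, 4))
B_hat = reduced_rank_regression(X, Y, rank=1)
print(np.linalg.matrix_rank(B_hat))  # 1
```

By construction B_hat = B_ols V V^T has rank at most `rank`, which is exactly the restriction RRR places on the coefficient matrix.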


criterion for PLS. The sparse multi-response regressions [Tur05; ST07; Pen10] assume that the multivariate response variables are associated with the same subset of predictors. They use L2-norm or L∞-norm penalties to encourage row-wise sparsity of the coefficient matrix B. The sparse reduced-rank regressions [CH12] combine the ideas of reduced-rank regression and sparse multi-response regression to achieve dimension reduction and variable selection simultaneously. That is, the coefficient matrix B is assumed to be both low-rank and row-wise sparse.

Overall, existing multi-response modeling methods universally treat the predictors as a vector and estimate a corresponding vector of coefficients. In neuroimaging analysis, however, the predictors take the more complex form of a tensor. Naively turning an array into a vector results in extremely high dimensionality; for instance, a 32×32×32 MRI image would imply 32³ = 32,768 parameters. Moreover, vectorizing an array destroys the inherent spatial structural information of the image.

2.1.3 Outline



Directly modeling a tensor image predictor takes into account spatial correlations among the voxels, and is intuitively superior to the one-voxel-at-a-time modeling solution that ignores such correlations. This extension, however, is far from trivial, since [Zho13a] only considered a univariate response, and our multi-response proposal requires a new form of penalty and a new optimization algorithm. Third, our method offers a competitive alternative to the common modeling strategy in the neuroimaging literature that first groups individual voxels by predefined regions of interest (ROIs) and then extracts a vector of useful features from the ROIs. By contrast, our solution does not rely on any prior knowledge of ROIs but derives highly relevant features suggested directly by the data. Moreover, instead of conducting feature extraction and association modeling in two separate steps, our method simultaneously derives relevant features and builds their association with the outcomes. Last but not least, we develop a highly scalable computational algorithm that makes our method applicable to a range of massive imaging data.

2.2 Materials

2.2.1 Subjects


Figure 2.1  A schematic overview of the proposed sparse multi-response tensor regression with multivariate cognitive assessments.

Table 2.1  Demographic and clinical information of the subjects. SD = Standard Deviation.

Group                 AD (n = 93)    NC (n = 101)
Female/Male           36/57          39/62
Age (Mean ± SD)       75.5 ± 7.4     75.9 ± 4.8
MMSE (Mean ± SD)      23.5 ± 2.1     28.9 ± 1.1
ADAS-Cog (Mean ± SD)  27.7 ± 10.7    10.4 ± 4.2

non-demented; (b) mild AD subjects: an MMSE score between 18 and 27 (inclusive), a CDR of 0.5 or 1.0, and meeting the National Institute of Neurological and Communicative Disorders and Stroke and the Alzheimer's Disease and Related Disorders Association (NINCDS/ADRDA) criteria for probable AD. The AD and NC groups were matched in age (two-sample t-test p-value = 0.68) and gender (two-sample proportion test p-value = 1.00). Table 2.1 presents the demographic characteristics of the subjects.

2.2.2 Preprocessing


then performed, followed by cerebellum removal. We visually checked the skull-stripped images to ensure clean skull and dura removal. We next employed FAST of the FSL package (http://fsl.fmrib.ox.ac.uk/fsl/fslwiki/) to segment the MR images into three tissue types: gray matter (GM), white matter (WM), and cerebrospinal fluid (CSF). We used HAMMER [SD02] to spatially normalize all three tissues onto a standard space, based on a brain atlas aligned with the MNI coordinate space. Next, we generated the regional volumetric maps, called RAVENS maps, using a tissue preserving image warping method [Dav01]. In this study, we considered only the spatially normalized GM densities (GMD), due to their relatively high relevance to AD compared to WM and CSF [Liu12]. Finally, we downsized the GMD maps to 32×32×32 voxels. Downsizing is for estimation and computational convenience, as it considerably reduces the number of unknown parameters and saves computation time and cost. It is a tradeoff and admittedly loses some image information; however, our results and previous studies [Liu14b] suggest that the sacrifice in prediction is relatively limited.

2.3 Method

2.3.1 Model

Recall that Y = (Y_1, …, Y_q)^T ∈ IR^q denotes a vector of q responses. For our AD data, q = 2 and Y = (Y_1, Y_2)^T, where Y_1 = MMSE and Y_2 = ADAS-Cog. Let X ∈ IR^{p_1×…×p_D} denote a D-way tensor predictor. For our AD data, D = 3, p_1 = p_2 = p_3 = 32, and X denotes the MRI scan.



Y = ( ⟨B_1, X⟩, …, ⟨B_q, X⟩ )^T + e,    (2.1)

where B_j ∈ IR^{p_1×…×p_D}, j = 1, …, q, denotes the tensor coefficient that captures the association between the tensor predictor X and the jth response Y_j. The inner product ⟨B_j, X⟩ between the two tensors B_j and X is defined as

⟨B_j, X⟩ = vec(B_j)^T vec(X) = Σ_{i_1,…,i_D} β_{j,i_1,…,i_D} x_{i_1,…,i_D},

where vec(X) denotes a tensor operator that stacks the entries of X into a column vector, and β_{j,i_1,…,i_D} and x_{i_1,…,i_D} denote the (i_1, …, i_D)th elements of B_j and X, respectively. e = (e_1, …, e_q)^T ∈ IR^q denotes a vector of q errors, each of which follows a normal distribution with zero mean and constant variance. Without loss of generality, we omit the intercept term in Eq. 2.1.
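The tensor inner product ⟨B_j, X⟩ is simply the sum of elementwise products, which a quick NumPy check confirms (illustrative toy sizes):

```python
import numpy as np

# Tensor inner product <B_j, X> = vec(B_j)^T vec(X), i.e. an elementwise
# sum of products over all voxels, as in Eq. 2.1.
rng = np.random.default_rng(1)
X = rng.standard_normal((4, 4, 4))   # toy 3-way image, D = 3
B = rng.standard_normal((4, 4, 4))   # coefficient tensor of the same shape

ip_vec = B.ravel() @ X.ravel()       # vec(B)^T vec(X)
ip_sum = np.sum(B * X)               # elementwise form
assert np.isclose(ip_vec, ip_sum)
```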

The tensor coefficients {B_j}_{j=1}^q in Eq. 2.1 are the parameters of interest and require estimation given the observed data. Imposing no additional constraint, the total number of unknown parameters in each B_j, which equals ∏_{d=1}^D p_d, is prohibitive. For instance, for our AD data, there are 32³ = 32,768 parameters to estimate for each B_j.


B_j = Σ_{r=1}^R β_{j1}^{(r)} ◦ ⋯ ◦ β_{jD}^{(r)},    (2.2)

where the β_{jd}^{(r)} ∈ IR^{p_d} are all column vectors, d = 1, …, D, r = 1, …, R, and ◦ denotes the outer product among vectors. For convenience, the CP decomposition is often represented by a shorthand, B_j = ⟦B_{j1}, …, B_{jD}⟧, where B_{jd} = (β_{jd}^{(1)}, …, β_{jd}^{(R)}) ∈ IR^{p_d×R} for d = 1, …, D. With this low-rank decomposition, the number of unknown parameters in B_j decreases substantially, from the order of ∏_{d=1}^D p_d to that of R × Σ_{d=1}^D p_d. For the 32×32×32 MRI image in the AD example, the dimensionality reduces from 32,768 to the order of 96 for a rank-1 model, and 288 for a rank-3 model.
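A small sketch of assembling a coefficient tensor from its CP factors and counting free parameters; this is illustrative code, not the dissertation's implementation, and the helper name is hypothetical.

```python
import numpy as np

def cp_to_tensor(factors):
    """Assemble B = sum_r beta_1^(r) o ... o beta_D^(r) from CP factor
    matrices B_d of shape (p_d, R)."""
    R = factors[0].shape[1]
    B = 0.0
    for r in range(R):
        comp = factors[0][:, r]
        for Bd in factors[1:]:
            comp = np.multiply.outer(comp, Bd[:, r])  # chain of outer products
        B = B + comp
    return B

p, D, R = 32, 3, 3
rng = np.random.default_rng(2)
factors = [rng.standard_normal((p, R)) for _ in range(D)]
B = cp_to_tensor(factors)
print(B.shape)                          # (32, 32, 32)
# Free parameters: R * sum(p_d) = 3 * 96 = 288, versus 32^3 = 32768
print(sum(f.size for f in factors))     # 288
```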

Introducing this CP decomposition to {B_j}_{j=1}^q, Eq. 2.1 becomes

Y = ( ⟨Σ_{r=1}^R β_{11}^{(r)} ◦ ⋯ ◦ β_{1D}^{(r)}, X⟩, …, ⟨Σ_{r=1}^R β_{q1}^{(r)} ◦ ⋯ ◦ β_{qD}^{(r)}, X⟩ )^T + e.    (2.3)

This is our base model for multivariate responses and tensor predictors, upon which the subsequent regularization and estimation are built.


(Y_1, Y_2)^T = ( (β_{12} ⊗ β_{11})^T vec(X), (β_{22} ⊗ β_{21})^T vec(X) )^T + (e_1, e_2)^T = ( β_{11}^T X β_{12}, β_{21}^T X β_{22} )^T + (e_1, e_2)^T,    (2.4)

where ⊗ denotes the Kronecker product. The first equality in Eq. 2.4 comes from the fact that

⟨B_j, X⟩ = ⟨(β_{j2} ⊗ β_{j1}), vec(X)⟩,

when the matrix B_j admits a rank-1 CP decomposition (Eq. 2.2). Furthermore, the Kronecker product equals the outer product when β_{j1} and β_{j2} are both column vectors. The second equality in Eq. 2.4 holds because

β_{j1}^T X β_{j2} = vec(β_{j1}^T X β_{j2}) = (β_{j2}^T ⊗ β_{j1}^T) vec(X),  j = 1, 2.

Examining Eq. 2.4, we see that in this special case our proposed multivariate response tensor regression model essentially postulates that the relation between the jth response variable Y_j and the matrix predictor X takes the form of left multiplying the matrix image X by a coefficient vector β_{j1}^T and right multiplying it by another coefficient vector β_{j2}. This relation is a natural extension of the classical linear model, where X is a vector and the response-predictor relation is governed by β^T X.

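The vec–Kronecker identity underlying Eq. 2.4 can be verified numerically; note that it assumes the column-stacking (column-major) vec convention.

```python
import numpy as np

# Check the identity used in Eq. 2.4 for a rank-1 matrix coefficient:
# beta1^T X beta2 = (beta2 kron beta1)^T vec(X), with vec() stacking
# columns (Fortran order, the usual vec operator).
rng = np.random.default_rng(3)
X = rng.standard_normal((4, 4))
b1, b2 = rng.standard_normal(4), rng.standard_normal(4)

lhs = b1 @ X @ b2
rhs = np.kron(b2, b1) @ X.ravel(order="F")
assert np.isclose(lhs, rhs)
```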

2.3.2 Penalized likelihood

Imposing the CP low-rank structure on the coefficient tensor B_j substantially reduces the ultrahigh dimensionality of Eq. 2.1 to a manageable level, leading to feasible estimation and prediction. However, the resulting number of unknown parameters can still be much larger than the available sample size. For instance, for our AD example, imposing a rank-3 CP structure yields 576 = 2×3×(32+32+32) parameters, while the sample size is merely n = 194. Moreover, Eq. 2.3 itself treats the components of the multivariate response Y_1, …, Y_q separately, while it is commonly conceived that the multivariate outcomes, i.e., MMSE and ADAS-Cog in this work, are correlated and often provide complementary information. Regularization through penalized estimation is particularly useful both to handle the small-n-large-p challenge and to incorporate potential correlations among the response variables. Therefore, we further introduce regularization into our multivariate response tensor regression model Eq. 2.3, and propose penalized likelihood estimation.

Given n independent and identically distributed sample observations {(X_1, Y_1), …, (X_n, Y_n)}, we propose to minimize the following objective function over {B_j}_{j=1}^q,

ℓ(B_1, …, B_q) = L(B_1, …, B_q) + λ J(B_1, …, B_q).    (2.5)



L(B_1, …, B_q) = Σ_{i=1}^n Σ_{j=1}^q ( Y_{ij} − ⟨B_j, X_i⟩ )² = Σ_{i=1}^n Σ_{j=1}^q ( Y_{ij} − ⟨Σ_{r=1}^R β_{j1}^{(r)} ◦ ⋯ ◦ β_{jD}^{(r)}, X_i⟩ )²,

where Y_{ij} is the jth response variable of subject i. The second component J in the objective Eq. 2.5 is a penalty function, which could have multiple choices. Here, we choose the group lasso penalty [YL06], which, in our context, takes the form

J(B_1, …, B_q) = Σ_{d=1}^D Σ_{r=1}^R Σ_{k=1}^{p_d} ( Σ_{j=1}^q (β_{jdk}^{(r)})² )^{1/2},    (2.6)

where β_{jdk}^{(r)} is the kth element of β_{jd}^{(r)} in the CP decomposition Eq. 2.2. By imposing such a penalty, the individual coefficients β_{jdk}^{(r)} that correspond to the same subregion in an image but across different response variables are penalized as a group. As such, Eq. 2.6 encourages a subregion to drop out as a group if it is not associated with any of the multivariate response variables, which in effect takes into account potentially correlated responses. To further illustrate how the low-rank structure works and how the sparsity is imposed by this group penalty function, we consider a special case when q = 2, D = 2, R = 1, and p_1 = p_2 = 4.
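The group structure of Eq. 2.6 can be sketched as follows: for each mode d, rank r and element k, the q coefficients across responses form one group whose Euclidean norm is penalized. The helper below is an illustrative, hypothetical implementation of that bookkeeping.

```python
import numpy as np

def group_penalty(factors_per_response):
    """Group lasso penalty of Eq. 2.6: for each mode d, rank r and element k,
    the q coefficients beta_{jdk}^{(r)} across responses form one group,
    penalized through its Euclidean norm.

    `factors_per_response[j][d]` is the (p_d, R) factor matrix B_{jd}.
    """
    q = len(factors_per_response)
    D = len(factors_per_response[0])
    total = 0.0
    for d in range(D):
        # Stack the q factor matrices for mode d: shape (q, p_d, R)
        stacked = np.stack([factors_per_response[j][d] for j in range(q)])
        # Norm over the response axis, then sum over elements k and ranks r
        total += np.sqrt((stacked ** 2).sum(axis=0)).sum()
    return total

rng = np.random.default_rng(4)
q, D, R, p = 2, 3, 2, 5
factors = [[rng.standard_normal((p, R)) for _ in range(D)] for _ in range(q)]
print(group_penalty(factors) > 0)   # True
```

With q = 1 and a single scalar factor, the penalty reduces to an ordinary L1 norm, which is a quick sanity check on the grouping.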


A similar group penalty was also used in the classical multi-response model [ST07]. It encourages identification of the same subregions of the coefficient images {B_j}_{j=1}^q across different responses. Meanwhile, it permits different magnitudes of {B_j}_{j=1}^q, i.e., different strengths of association, for different responses.

Figure 2.2  An illustration of the low-rank and sparse estimation of the coefficient signal when q = 2, D = 2, R = 1 and p_1 = p_2 = 4. ◦ denotes the outer product. Dotted lines connect the elements corresponding to the same region but across different responses, which are encouraged to enter or drop from the model simultaneously. Different colors denote different strengths of association.

2.3.3

Estimation



min_{c>0} (1/2)(c x² + 1/c) = |x|,

attained when c = |x|^{−1}, x ≠ 0. Consequently, minimizing Eq. 2.5 over B_1, …, B_q is equivalent to minimizing the following objective function
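A quick numerical check of this variational identity on a grid of c values:

```python
import numpy as np

# Variational identity: min_{c>0} (c x^2 + 1/c) / 2 = |x|, attained at
# c = 1/|x|.  Verified here by brute force over a fine grid.
x = 1.7
c = np.linspace(1e-3, 10, 200001)
vals = 0.5 * (c * x ** 2 + 1.0 / c)
print(np.isclose(vals.min(), abs(x), atol=1e-4))              # True
print(np.isclose(c[vals.argmin()], 1.0 / abs(x), atol=1e-3))  # True
```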

Σ_{i=1}^n Σ_{j=1}^q ( Y_{ij} − ⟨B_j, X_i⟩ )² + (λ/2) Σ_{d=1}^D Σ_{r=1}^R Σ_{k=1}^{p_d} ( μ_{dk}^{(r)} ‖b_{dk}^{(r)}‖² + 1/μ_{dk}^{(r)} ),    (2.7)

over both B_j and μ_{dk}^{(r)}, where b_{dk}^{(r)} = [β_{1dk}^{(r)}, …, β_{qdk}^{(r)}] ∈ IR^q. Optimization of Eq. 2.7 can then be achieved in an alternating fashion, iteratively updating one set of parameters while the others are fixed.

Algorithm 1  Algorithm for minimizing ℓ(B_1, …, B_q).

Initialize B_{jd} ∈ IR^{p_d×R} as a random matrix, j = 1, …, q, d = 1, …, D.
repeat
    for d = 1, …, D do
        Update μ_d^{(t+1)}, given B_1^{(t)}, …, B_q^{(t)}.
    end for
    for j = 1, …, q do
        Update B_j^{(t+1)} = argmin_{B_j} ℓ(B_1^{(t+1)}, …, B_{j−1}^{(t+1)}, B_j, B_{j+1}^{(t)}, …, B_q^{(t)}), given μ_1^{(t+1)}, …, μ_D^{(t+1)}.
    end for
until ℓ(B_1^{(t+1)}, …, B_q^{(t+1)}) converges.
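The alternating scheme can be illustrated in a deliberately simplified setting, a vector predictor with D = 1 and R = 1, where both updates take the closed forms described in the text (μ-groups get μ = 1/‖b‖, then each response's coefficients solve a ridge system). This is a hedged sketch under those simplifying assumptions, not the dissertation's implementation.

```python
import numpy as np

def multiresponse_group_lasso(X, Y, lam, n_iter=200, eps=1e-8):
    """Alternating variational updates for
    sum_ij (Y_ij - x_i^T b_j)^2 + lam * sum_k ||(b_1k, ..., b_qk)||_2,
    the D = 1, R = 1 special case of Eq. 2.7.  For each coefficient index k,
    mu_k = 1 / ||b_k|| (variational update), then each b_j solves a ridge
    system with penalty (lam/2) * diag(mu) (closed-form update)."""
    n, p = X.shape
    q = Y.shape[1]
    B = np.full((p, q), 0.1)                 # beta_jk stored as B[k, j]
    XtX, XtY = X.T @ X, X.T @ Y
    for _ in range(n_iter):
        mu = 1.0 / (np.linalg.norm(B, axis=1) + eps)   # one mu per group k
        for j in range(q):
            B[:, j] = np.linalg.solve(XtX + 0.5 * lam * np.diag(mu), XtY[:, j])
    return B

rng = np.random.default_rng(5)
n, p, q = 200, 10, 2
B_true = np.zeros((p, q)); B_true[:3] = rng.standard_normal((3, q))
X = rng.standard_normal((n, p))
Y = X @ B_true + 0.1 * rng.standard_normal((n, q))
B_hat = multiresponse_group_lasso(X, Y, lam=5.0)
print(np.linalg.norm(B_hat[:3]) > np.linalg.norm(B_hat[3:]))  # True
```

The rows corresponding to inactive groups are driven toward zero jointly across both responses, which is exactly the behavior the group penalty of Eq. 2.6 is designed to produce.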

Specifically, with B_j fixed, the update of μ_{dk}^{(r)} is simply μ_{dk}^{(r)} = ‖b_{dk}^{(r)}‖^{−1}.


With μ_{dk}^{(r)} fixed, the update of B_j is achieved through a block relaxation algorithm [Zho13a]. That is, by imposing the CP structure on B_j = ⟦B_{j1}, …, B_{jD}⟧, the first part of Eq. 2.7 can be written as

Σ_{i=1}^n Σ_{j=1}^q ( Y_{ij} − ⟨B_{jd}, X_{i(d)} (B_{jD} ⊙ ⋯ ⊙ B_{j,d+1} ⊙ B_{j,d−1} ⊙ ⋯ ⊙ B_{j1})⟩ )²,

where X_{i(d)} denotes the mode-d matricization that maps the tensor X_i into a p_d × ∏_{d′≠d} p_{d′} matrix such that the (i_1, …, i_D)th element of X_i maps to the (i_d, 1 + Σ_{d′≠d} (i_{d′} − 1) ∏_{d″<d′, d″≠d} p_{d″})th element of X_{i(d)}, and ⊙ denotes the Khatri-Rao product [RM71]. This reformulation allows one to focus the estimation on B_{jd} while keeping all the other parameters fixed. Meanwhile, the second part of Eq. 2.7 can be written as

(λ/2) [ Σ_{d=1}^D Σ_{r=1}^R Σ_{k=1}^{p_d} 1/μ_{dk}^{(r)} + Σ_{j=1}^q Σ_{d=1}^D vec(B_{jd})^T diag(μ_{d1}^{(1)}, …, μ_{dp_d}^{(1)}, …, μ_{d1}^{(R)}, …, μ_{dp_d}^{(R)}) vec(B_{jd}) ].
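The mode-d matricization and the Khatri-Rao product used above can be sketched in NumPy; these are illustrative helpers following the index map in the text, not library functions.

```python
import numpy as np

def mode_d_matricize(T, d):
    """Mode-d matricization: move axis d to the front, then flatten the
    remaining axes in column-major order, giving p_d x prod(p_other)."""
    return np.moveaxis(T, d, 0).reshape(T.shape[d], -1, order="F")

def khatri_rao(A, B):
    """Column-wise Kronecker (Khatri-Rao) product of (m, R) and (n, R)."""
    return np.einsum("ir,jr->ijr", A, B).reshape(A.shape[0] * B.shape[0], -1)

rng = np.random.default_rng(6)
T = rng.standard_normal((3, 4, 5))
print(mode_d_matricize(T, 1).shape)   # (4, 15)
A, B = rng.standard_normal((3, 2)), rng.standard_normal((4, 2))
print(khatri_rao(A, B).shape)         # (12, 2)
```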

Therefore, the objective in Eq. 2.7 is essentially a quadratic function of the individual B_{jd} when all the other parameters are fixed. There is a closed form solution for B_{jd}, such that vec(B_{jd}) equals

( Σ_{i=1}^n X̃_{ijd} X̃_{ijd}^T + (λ/2) diag(μ_{d1}^{(1)}, …, μ_{dp_d}^{(1)}, …, μ_{d1}^{(R)}, …, μ_{dp_d}^{(R)}) )^{−1} Σ_{i=1}^n Y_{ij} X̃_{ijd},

where X̃_{ijd}
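The update above is a ridge-type solve. The following sketch, with generic names rather than the actual X̃ construction, checks that solving the regularized normal equations zeroes the gradient of the corresponding quadratic objective:

```python
import numpy as np

# Ridge-type update: with design rows x_i, responses y_i and weights mu,
# the minimizer of  sum_i (y_i - x_i^T b)^2 + (lam/2) * b^T diag(mu) b
# is  b = (sum_i x_i x_i^T + (lam/2) diag(mu))^{-1} sum_i y_i x_i.
rng = np.random.default_rng(7)
n, m, lam = 50, 6, 2.0
Xt = rng.standard_normal((n, m))
y = rng.standard_normal(n)
mu = rng.uniform(0.5, 2.0, size=m)

b = np.linalg.solve(Xt.T @ Xt + 0.5 * lam * np.diag(mu), Xt.T @ y)

# The gradient of the penalized quadratic vanishes at the closed form
grad = -2 * Xt.T @ (y - Xt @ b) + lam * mu * b
print(np.allclose(grad, 0, atol=1e-9))  # True
```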



We make a few remarks about the above optimization procedure. First, although the objective value decreases monotonically through the iterations, convergence to a global optimum is not guaranteed, since Eq. 2.5 involves a nonconvex optimization and there potentially exist multiple local minima. We adopt the common practice of using multiple starting values. In our setup, the stability of the algorithm with respect to initial values depends on several factors. A larger sample size, a stronger signal strength, and a low-rank true image signal would all foster fast convergence and increase the chance of locating the global optimum from different initializations. In Section 2.4.2, we report the numerical convergence behavior of our algorithm with multiple starting values. Second, the computation of our method is fast, since both steps of the iterations have closed form solutions. The computational complexity of each iteration is O(np²q + p³q) for D = 2, and O(np^D q) for D > 2, when p


has to construct relatively large matrices involving $\{B_{1d},\ldots,B_{qd}\}$ jointly. For that reason, we choose the variational method as our optimization solution.

2.3.4

Alternative algorithm

We can also solve the objective function in Eq. 2.5 directly. Similar to the last section, if we focus on $\{B_{1d},\ldots,B_{qd}\}$ and keep all the other parameters fixed, we can rewrite the objective function as

$$(\tilde{Y} - \tilde{X}_d\tilde{\beta}_d)^{T}(\tilde{Y} - \tilde{X}_d\tilde{\beta}_d) + \lambda \sum_{r=1}^{R}\sum_{k=1}^{p_d} \left( \sum_{j=1}^{q} \big(\beta^{(r)}_{jdk}\big)^2 \right)^{1/2}, \qquad (2.8)$$

where $\tilde{Y} = [Y_{11},\ldots,Y_{1q},\ldots,Y_{n1},\ldots,Y_{nq}]^{T} \in \mathbb{R}^{nq}$,

$$\tilde{X}_d = \begin{pmatrix}
\tilde{X}_{11d}^{T} & 0 & \cdots & 0 \\
0 & \tilde{X}_{12d}^{T} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & \cdots & 0 & \tilde{X}_{1qd}^{T} \\
\vdots & \vdots & \ddots & \vdots \\
\tilde{X}_{n1d}^{T} & 0 & \cdots & 0 \\
0 & \tilde{X}_{n2d}^{T} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & \cdots & 0 & \tilde{X}_{nqd}^{T}
\end{pmatrix}$$



and

$$\tilde{\beta}_d = \begin{pmatrix} \mathrm{vec}(B_{1d}) \\ \vdots \\ \mathrm{vec}(B_{qd}) \end{pmatrix} = \begin{pmatrix} \beta^{(1)}_{1d} \\ \vdots \\ \beta^{(R)}_{1d} \\ \vdots \\ \beta^{(1)}_{qd} \\ \vdots \\ \beta^{(R)}_{qd} \end{pmatrix} \in \mathbb{R}^{Rp_dq}.$$

Note that the objective in Eq. 2.8 is a standard form of least squares plus a group lasso penalty, and many algorithms have been developed to solve such a problem. For example, the alternating direction method of multipliers (ADMM) algorithm is a generic method for solving many regularization problems. It solves the optimization problem

$$\text{minimize } f(x) + g(y) \quad \text{subject to } Ax + By = c,$$
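Whatever solver is used, the key ingredient for the least squares plus group lasso objective in Eq. 2.8 is the blockwise (group) soft-thresholding operator. A minimal proximal-gradient sketch, with illustrative names (`group_soft_threshold`, `group_lasso_prox_grad` are not the dissertation's code):

```python
import numpy as np

def group_soft_threshold(v, t):
    """Prox of t * ||v||_2: shrinks the whole group, possibly to exactly zero."""
    nrm = np.linalg.norm(v)
    return np.zeros_like(v) if nrm <= t else (1.0 - t / nrm) * v

def group_lasso_prox_grad(X, y, groups, lam, n_iter=5000):
    """Proximal gradient descent for 0.5*||y - X b||^2 + lam * sum_g ||b_g||_2.

    groups: list of index arrays, assumed to partition the coefficients.
    """
    b = np.zeros(X.shape[1])
    step = 1.0 / np.linalg.norm(X, 2) ** 2   # 1 / Lipschitz constant of the smooth part
    for _ in range(n_iter):
        z = b - step * (X.T @ (X @ b - y))   # gradient step on the least squares term
        b = z.copy()
        for g in groups:                     # blockwise soft-thresholding step
            b[g] = group_soft_threshold(z[g], step * lam)
    return b
```

With `lam = 0` this reduces to plain gradient descent on least squares; for large `lam` entire groups are zeroed out, which is exactly the row-wise sparsity the penalty is designed to induce.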


2.4

Results

In this section, we first carry out Monte Carlo simulations to investigate the empirical performance of our proposed method. We then investigate its stability, convergence and computation time. Finally, we analyze the ADNI dataset to illustrate the efficacy of the new method.

2.4.1

Signal recovery and prediction

We evaluate the empirical performance by two criteria: the prediction accuracy of the responses, measured by root mean squared error, and the estimation accuracy of the tensor coefficient, shown by a plot. We compare our method with two alternative solutions. The first is to fit a tensor regression model for one response at a time; a comparison with this method clearly shows the gain from jointly modeling the multivariate responses. The second is to vectorize the image predictor, ignoring all potential correlations among the image voxels, and then fit a multi-response linear regression model with a group lasso penalty [ST07]; this comparison shows the gain from respecting the image tensor structure. For brevity, we refer to our method and the two alternatives as "Multi-Resp", "Uni-Resp", and "Vectorized", respectively.



Among the three patterns, "cross" is of an exact rank 2, while "triangle" and "butterfly" are of a high rank; we use a fixed rank 3 to approximate all three patterns. We then generate $q$ response variables following Eq. 2.3, with the errors $e \in \mathbb{R}^q$ following a normal distribution with mean zero and covariance $\sigma^2\Sigma$, where $\Sigma \in \mathbb{R}^{q\times q}$ has diagonal elements equal to 1 and off-diagonal elements equal to $\rho$. Consequently, the pairwise correlation among the $q$ responses is governed by $\rho$, while $\sigma^2$ controls the relative noise level. We examine a series of values of $q$, $\sigma^2$ and $\rho$ to investigate the empirical performance of the different methods under varying response correlation strength and noise level. We marginally standardize the response variables, by subtracting the mean and dividing by the standard deviation. The models are fitted on a training set of size $n = 750$, tuned on an independent validation set, and evaluated on another independent testing set of the same size.
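The error-generation step described above can be sketched as follows (a hypothetical NumPy helper, `simulate_responses`, not the authors' code; `X` holds one matrix image per subject):

```python
import numpy as np

def simulate_responses(X, B_list, sigma, rho, rng):
    """Y_{ij} = <B_j, X_i> + e_{ij}, with e_i ~ N(0, sigma^2 * Sigma) and Sigma
    compound-symmetric: unit diagonal, all off-diagonal entries equal to rho."""
    n, q = X.shape[0], len(B_list)
    Sigma = sigma ** 2 * ((1.0 - rho) * np.eye(q) + rho * np.ones((q, q)))
    # signal[i, j] = <B_j, X_i>, the inner product of image i with coefficient j
    signal = np.stack([np.tensordot(X, B, axes=([1, 2], [0, 1])) for B in B_list], axis=1)
    return signal + rng.multivariate_normal(np.zeros(q), Sigma, size=n)
```

The compound-symmetry construction makes the pairwise response correlation exactly $\rho$ while $\sigma^2$ sets the overall noise scale, matching the simulation design.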

Table 2.2 shows the root mean squared error (RMSE) of prediction of "future" responses in the testing data under 50 Monte Carlo replications, whereas Fig. 2.3 gives a graphical summary of the estimated coefficient signal based on one data replication under scenario (i). Since the underlying true patterns are the same for all responses here, we only report the estimator for the first one. We present in the table the results for $q = 3, 9$, $\sigma = 10, 15, 20$, and $\rho = 0, 0.9$, respectively, but in the figure omit the results when $\rho = 0.9$,


information. In addition, both the multivariate and univariate tensor regression solutions have produced a much better estimation than the vectorized solution, which fails to identify any meaningful patterns.

Similarly, Table 2.3 and Fig. 2.4 provide a summary of the results for scenario (ii). Again, the proposed method clearly outperforms the two competitors. If one assumes the resulting criteria from multiple replications are normally distributed, then the two-sample $t$-test yields significant differences between "Multi-Resp" and "Uni-Resp" (with $p$-values less than 0.001) for all combinations of $(q,\rho,\sigma)$, except for $(q,\rho,\sigma) = (2, 0, 10)$ ($p$-value $= 0.61$) and $(q,\rho,\sigma) = (2, 0.9, 10)$ ($p$-value $= 0.08$). Moreover, the differences between "Multi-Resp" and "Vectorized" are significant in all situations ($p$-values $= 0$). It is also noteworthy that, in most of the multi-response literature, the models are of a similar form as in scenario (i). This does not necessarily imply that all the multiple response variables must admit exactly the same association with the predictors; the magnitude of those coefficients could vary. In our simulation, for simplicity, we let the signal $B_j$ take only the values 0 or 1. The proposed method works best under scenario (i), but outperforms the competing solutions under scenario (ii) as well.

2.4.2

Stability, convergence and computation time

We next investigate the stability of the algorithm with respect to the regularization parameter $\lambda$ and the rank $R$ of the CP decomposition. We adopt the setting of scenario (i), the "triangle"



Table 2.2 Root mean squared error of prediction under varying noise level ($\sigma = 10, 15, 20$), number of response variables ($q = 3, 9$), and response pairwise correlation ($\rho = 0, 0.9$). Reported are the mean RMSE and standard error (in parentheses) based on 50 data replications.

Signal   q   ρ    σ    Multi-Resp      Uni-Resp        Vectorized
Cross    3   0    10   0.599 (0.003)   0.614 (0.003)   0.993 (0.003)



Table 2.3 Root mean squared error of prediction under varying noise level ($\sigma = 10, 15, 20$), number of response variables ($q = 3, 9$), and response pairwise correlation ($\rho = 0, 0.9$). Reported are the mean RMSE and standard error (in parentheses) based on 50 data replications.

Signals Parameter Root mean squared error



recovered signal. The rank $R$ of the CP decomposition essentially offers a bias-variance tradeoff. A larger rank implies a more flexible model and a smaller bias, but also more unknown parameters and thus a larger variance, whereas a smaller rank implies a more parsimonious model and a smaller estimation variability, but possibly a larger bias. In reality, the true signal tensor is hardly of an exactly low-rank structure. However, given the usually limited sample size in imaging studies, a low-rank estimate often provides a reasonable approximation to the true tensor regression parameter, even when the true signal is of a high rank [Zho13a]. In our analysis, we usually fix the rank at $R = 3$, which offers a good balance between model complexity and estimation accuracy.

Fig. 2.6 shows the convergence behavior of the algorithm, as reflected by the objective value, under 100 randomly generated starting values, and the corresponding computation time. We see that, although there exist multiple local minima, the algorithm often converges to the same or a similar point. The run time is recorded on a standard laptop computer with a 2.9 GHz Intel i7 CPU. For instance, the median run time of fitting a rank-3 model in this example is about 21 seconds.

2.4.3

ADNI data analysis

We analyze the ADNI dataset reviewed in Section 2.2. The analysis consists of two parts: estimation of clinical scores and identification of brain subregions that are highly relevant to the clinical outcomes.



Figure 2.6 Convergence behavior with 100 randomly generated starting values and the corresponding run time.

standard deviation) to avoid different response scales. A 10-fold cross-validation is performed. The rank of the coefficient tensor is set as $R = 3$, and the regularization parameter $\lambda$ is optimized based solely on the training set with another nested 10-fold cross-validation. We then employ the resulting model to estimate the clinical scores on the testing data. The root mean squared error (the smaller the better) and the Pearson correlation coefficient (the larger the better) between the predicted and the observed clinical scores on the testing data are reported in Table 2.4. Moreover, we show the scatter plots of the predicted versus observed scores for MMSE and ADAS-Cog in Fig. 2.7.


selection via a group lasso penalty and estimation via support vector regression [Zha12], support vector regression with feature selection via lasso ("SVR+lasso"), and without any feature selection ("SVR"). It is important to note that this family of methods does not directly handle a tensor image, but rather a vector of features extracted from an image. As such, we employ the Automated Anatomical Labeling (AAL) atlas [TM02] to partition the image into 90 regions of interest (ROIs) and then use the average intensity of each ROI as the extracted features. The results are again reported in Table 2.4. We see that our solution clearly outperforms the one that models one response at a time, demonstrating the advantage of jointly modeling multivariate responses that are correlated and complementary. We also see that, after taking the standard error into account, our new method performs essentially as well as the best solutions in the literature, such as "M3T" and "SVR+lasso". On the other hand, our method works without requiring a specific image atlas, and thus avoids dependence on, and the choice among, different atlases.

Table 2.4 Estimation of the two clinical scores by various methods. Reported are the average and standard error (in parentheses) of the root mean squared error and the Pearson correlation coefficient between the predicted and the observed scores based on 10-fold cross-validation.



Figure 2.7 Scatter plots of the predicted MMSE and ADAS-Cog versus the observed scores.


summary of the associated literature. Moreover, to show the path of region selection, we repeat the same procedure with a gradually increasing sparsity tuning parameter $\lambda$. We summarize the results in Fig. 2.8, where we map the estimate back to the original high-resolution 256×256×256 MRI image, and in Table 2.5. It is worth noting that a smaller tuning parameter results in a larger number of selected regions and the potential problem of overfitting, whereas a larger tuning parameter results in a smaller number of selected regions and potentially underfitting.

Figure 2.8Regions (red part) selected by the sparse multivariate tensor regression model that are relevant to the Alzheimer’s disease. The optimal tuning parameter based on cross-validation is



Table 2.5 AAL regions (colored cells) selected by the sparse multivariate tensor regression model with varying tuning parameters, along with their support in the literature. 13 additional regions which are only selected when λ=50 are not shown here in the interest of space. Abbreviations: AMYGD=Amygdala; HIPPO=Hippocampus; NL=Lenticular nucleus, putamen; IN=Insula; PARA_HIPPO=Parahippocampal gyrus; COB=Olfactory cortex; T1A=Temporal pole: superior temporal gyrus; T2=Middle temporal gyrus.

AAL Region λ=50 λ=100 λ=150 λ=200 Literature Support

AMYGD_L [Mis09; Kar04]

AMYGD_R

HIPPO_L [Mis09; Con00; Che02]

HIPPO_R

NL_L [DJ08; Dai12]

NL_R

IN_L [Mis09; Kar04]

IN_R

PARA_HIPPO_L [Con00; CB03]

PARA_HIPPO_R

COB_L [Dev00]

T1A_L [Con00]

T1A_R

T2_L [Con00]


2.5

Discussion

We have proposed a sparse multivariate response tensor regression model in this article. The new method models multiple response variables jointly, so as to exploit the correlated and complementary information possessed by the multivariate responses. It also models multiple voxels in an image tensor jointly, so as to account for the inherent spatial correlation in image covariates. The method is designed to simultaneously infer multiple responses and to identify brain subregions highly relevant to the outcomes. As such, it is useful both for AD/MCI diagnosis and for locating brain regions contributing to the disease. Our numerical analyses have demonstrated that the proposed method outperforms its competitors.

There are some alternative choices within our proposed model formulation. One is to consider a different penalty function than the group lasso penalty in Eq. 2.6, and the other is an alternative tensor regression model formulation to Eq. 2.3. First, an alternative to the group lasso penalty (Eq. 2.6) for multi-response regression is the $L_\infty$-type penalty [Tur05]. In our context, the penalty function takes the form

$$J(B_1,\ldots,B_q) = \sum_{d=1}^{D}\sum_{r=1}^{R}\sum_{k=1}^{p_d} \Big( \max_{j} \big|\beta^{(r)}_{jdk}\big| \Big). \qquad (2.9)$$

This penalty, similar to the group lasso penalty, also induces row-wise sparsity in the regression parameters. However, it differs from the group lasso penalty in that the $L_\infty$ penalty



constrained quadratic optimization problem, which can be solved by an interior-point algorithm [Fri07].

Second, an alternative to the multi-response tensor regression model (Eq. 2.3) is the model

$$Y = B_{(D+1)}\,\mathrm{vec}(X) + e, \qquad (2.10)$$

where $B$ is a $(D+1)$-dimensional tensor that is assumed to admit a rank-$R$ CP decomposition, and $B_{(D+1)}$ denotes its mode-$(D+1)$ matricization. That is, $B = [\![B_1,\ldots,B_{D+1}]\!]$, where $B_d = [\beta^{(1)}_d,\ldots,\beta^{(R)}_d] \in \mathbb{R}^{p_d\times R}$ for $d = 1,\ldots,D$, and $B_{D+1} \in \mathbb{R}^{q\times R}$. To better understand this model, we again consider its special case when $q = 2$, $D = 2$, $R = 1$, i.e., two response variables with a matrix image predictor. In this case, $B$ is a $p_1\times p_2\times 2$ tensor that admits a rank-1 decomposition, $B = [\![\beta_1,\beta_2,\beta_3]\!] = \beta_1\circ\beta_2\circ\beta_3$, with $\beta_3 = [\beta_{31},\beta_{32}]^{T}$. Then model

(2.10) becomes

$$\begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix} = \begin{pmatrix} \beta_{31}\,(\beta_2\otimes\beta_1)^{T}\mathrm{vec}(X) \\ \beta_{32}\,(\beta_2\otimes\beta_1)^{T}\mathrm{vec}(X) \end{pmatrix} + \begin{pmatrix} e_1 \\ e_2 \end{pmatrix}. \qquad (2.11)$$

A few remarks are in order for the comparison of model (Eq. 2.11) with model (Eq. 2.4), which is a special case of model (Eq. 2.3) under the same setup of $q = 2$, $D = 2$, $R = 1$. While model (Eq. 2.4) permits different coefficient vectors $\beta_{11}\otimes\beta_{12}$ and $\beta_{21}\otimes\beta_{22}$ for the different response variables $Y_1$ and $Y_2$, model (Eq. 2.11) imposes the same coefficient vector $\beta_1\otimes\beta_2$


MMSE and ADAS-Cog carry different levels of information with respect to the cognitive capability, it is more reasonable to assign different coefficients in predicting the two clinical scores. This is the reason we have primarily focused on model Eq. 2.3.


CHAPTER

3

INTERPRETABLE MULTI-MODALITY ANALYSIS


3.1

Introduction

3.1.1

Motivation

Numerous imaging modalities have been employed for AD and MCI diagnosis, including structural MRI [Fan08a; McE09], fMRI [Mac09; PS08], PET [Gra03; SM10], and DTI [Hal10; Wee11]. While past studies primarily focused on a single imaging modality, a number of more recent studies have begun to use multiple imaging modalities simultaneously for AD and MCI classification. It has been demonstrated that different modalities provide complementary information, and that multi-modal imaging using integrative information can significantly improve the classification accuracy of AD diagnosis compared to single-modal analysis [Fan07; Fan08b; Wal10; Dav11; Hin11; Zha11; Dai12; Wee12; Zha12; Liu14a].



3.1.2

Literature review


3.1.3

Outline

In our study, we utilize two modalities, MRI and PET, as there is accumulating evidence that individuals with AD have both structural and functional changes in the brain. MRI can assess the progressive structural damage caused by AD by measuring cerebral atrophy or ventricular expansion. Prior studies have found that temporal lobe atrophy is closely associated with AD, and that the hippocampus, amygdala and entorhinal cortex are particularly vulnerable to AD pathology [Bra98; Gra13]. Meanwhile, PET can assess brain function in terms of the rate of cerebral glucose metabolism. Studies have found that both AD and MCI are associated with significantly reduced glucose metabolism in affected regions such as the temporal and parietal lobes and the posterior cingulate cortex [Her02; Lan09]. Moreover, PET can detect changes in metabolism before corresponding structural changes are visible [Ais10].

To measure individual contributions of imaging modalities, we adopt a well established metric in regression analysis, the coefficient of determination, or $R^2$. However, the use of



dimensionality that compromises estimation and inference, and destroys all inherent spatial structural information. By contrast, tensor regression preserves the multidimensional array structure of the image covariate, while introducing a low rank approximation to the coefficient tensor. This approximation exploits the special structure of the tensor and substantially reduces the dimensionality of the model, which in turn leads to efficient estimation and prediction. Our study builds upon the tensor regression model proposed in [Zho13a], extending it from a single-modality analysis to multiple imaging modalities and developing appropriate metrics of individual modality contributions.

3.2

Materials

3.2.1

Subjects

The subjects in this study are the same as those used in Chapter 2. See Section 2.2.1 for detailed information.

3.2.2

MRI/PET scanning and image processing


were downsized to 64×64×64 voxels.

3.3

Method

3.3.1

Tensor Regression

Let $Y \in \mathbb{R}$ denote a univariate response variable, and $X \in \mathbb{R}^{p_1\times\cdots\times p_D}$ denote a $D$-dimensional tensor predictor. In the AD example in Section 3.4, $Y$ is the Mini-Mental State Examination (MMSE) score, and $X$ can be either a 3D MRI image scan or a 3D PET image scan. Analogous to the commonly used classical linear model for vector-valued predictors, we assume the following model with a tensor predictor,

Y =α+〈B,X+", (3.1)

where $\alpha$ is the intercept, $B$ is the coefficient array that captures the effects of the tensor covariate $X$ and is of the same dimension as $X$, and $\varepsilon$ is a normal random error with zero mean and constant variance. The inner product between two arrays is defined as $\langle B, X\rangle = \langle \mathrm{vec}\,B, \mathrm{vec}\,X\rangle = \sum_{i_1,\ldots,i_D} \beta_{i_1\ldots i_D}\, x_{i_1\ldots i_D}$, where $\mathrm{vec}(B)$ is the operator that stacks the entries of $B$ into a column vector. This model, however, is prohibitive without further simplification, because of its gigantic dimensionality: $\prod_{d=1}^{D} p_d$. For instance, for a $64\times64\times64$ MRI predictor, the dimension of model (Eq. 3.1) is $64^3 = 262{,}144$.



$$B = \sum_{r=1}^{R} \beta^{(r)}_1 \circ \cdots \circ \beta^{(r)}_D, \qquad (3.2)$$

where $\beta^{(r)}_d \in \mathbb{R}^{p_d}$ are all column vectors, $d = 1,\ldots,D$, $r = 1,\ldots,R$, and $\circ$ denotes an outer product among vectors, such that the outer product $b_1\circ b_2\circ\cdots\circ b_D$ of $D$ vectors $b_d \in \mathbb{R}^{p_d}$, $d = 1,\ldots,D$, yields the $p_1\times\cdots\times p_D$ array with entries $(b_1\circ b_2\circ\cdots\circ b_D)_{i_1\cdots i_D} = \prod_{d=1}^{D} b_{d i_d}$.

Combining Eq. 3.1 and Eq. 3.2 results in the CP tensor regression model of [Zho13a], in which the dimensionality decreases to the scale of $p_0 + R\times\sum_{d=1}^{D} p_d$. For the $64\times64\times64$ MRI predictor example, the dimensionality reduces from 262,144 to the order of 192 for a rank-1 model, and 576 for a rank-3 model. Reducing the ultrahigh dimensionality of (3.1) to a manageable level in turn leads to efficient model estimation and prediction. Geometrically, the CP tensor regression model in effect approximates the true signal array $B$ by a combination of $R$ rectangular shapes. Practically, it provides a low rank approximation to the true signal, and it has been demonstrated to offer a sound recovery of both low rank and high rank signals [Zho13a].
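The parameter count can be illustrated with a brief NumPy sketch (illustrative names, assuming a rank-3 CP coefficient for a 64×64×64 image):

```python
import numpy as np

def cp_to_tensor(factors):
    """Assemble B = sum_r beta_1^(r) o ... o beta_D^(r) from factor matrices (p_d x R)."""
    R = factors[0].shape[1]
    B = np.zeros(tuple(f.shape[0] for f in factors))
    for r in range(R):
        comp = factors[0][:, r]
        for f in factors[1:]:
            comp = np.multiply.outer(comp, f[:, r])  # build the rank-1 component
        B += comp
    return B

# A rank-3 CP coefficient for a 64x64x64 image uses 3 * (3 * 64) = 576 free
# parameters, instead of 64^3 = 262,144 for an unstructured coefficient array.
rng = np.random.default_rng(1)
factors = [rng.normal(size=(64, 3)) for _ in range(3)]
B = cp_to_tensor(factors)
X = rng.normal(size=(64, 64, 64))
fit = np.sum(B * X)          # the inner product <B, X> in model (3.1)
n_params = sum(f.size for f in factors)
```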

Next we extend model (Eq. 3.1) to accommodate multiple imaging modalities. Here we take the two-modality case as an example, while extension to more than two modalities is straightforward. In the numerical study in Section 3.4, each subject had both MRI and PET images. Accordingly, we consider the following CP tensor regression model with two tensor predictors,

Y =α+〈B1,X1+B2,X2+", (3.3)


is the other tensor predictor, which is $D_2$-dimensional. Note that $D_1$ and $D_2$ are not necessarily equal. For instance, $X_1$ could represent an MRI image with $D_1 = 3$, and $X_2$ could be a resting-state fMRI image with $D_2 = 4$ [Dai12]. $B_1$ and $B_2$ are the corresponding coefficient arrays that capture the effects of $X_1$ and $X_2$, respectively. Model (Eq. 3.3) can be viewed as the tensor counterpart of the multiple linear model with two scalar predictors. In principle, one may include additional terms such as voxel-wise interactions between $X_1$ and $X_2$. However, for ease of interpretation, as well as to confine the total number of free parameters, we focus on the model form (Eq. 3.3) in this article.

3.3.2

Measure of Contribution

To measure and compare individual contributions of different imaging modalities, we adopt the well studied and commonly used goodness-of-fit metric, the coefficient of determination, or $R^2$, under the model setup (Eq. 3.3). Specifically, it is defined as

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},$$

where $\bar{y}$ is the average of the observed responses, and $\hat{y}_i$ is the predicted value of $y_i$ for the $i$-th subject under a given model. Note that the numerator of the second term essentially measures the variation of the residuals after model fitting, while the denominator measures the variation of the response itself. Therefore, $R^2$ quantifies the percentage of the variation of the response that has been accounted for by a given model. $R^2$ is always bounded between 0 and 1.



To quantify the unique contribution of an individual modality, we employ the semi-partial $R^2$ measure [Coh13], which is defined as the reduction in $R^2$ between the models with and without that particular modality. That is, the semi-partial $R^2$ for the modality $X_1$ is

$$R^2_{sp}(X_1) = R^2(X_1, X_2) - R^2(X_2),$$

where $R^2(X_1, X_2)$ is the coefficient of determination for a model containing both tensor predictors $X_1$ and $X_2$, and $R^2(X_2)$ is that for a model containing only $X_2$.

Fig. 3.1 illustrates the contributions of different modalities under model (Eq. 3.3) through $R^2$ and the semi-partial $R^2$. In particular, the total variation of the response $Y$ can be divided into four parts, I, II, III, and IV, standardized such that I+II+III+IV = 1. The metric $R^2(X_1, X_2)$ measures the contribution of both $X_1$ and $X_2$ and corresponds to the region II+III+IV in the plot. The metric $R^2(X_1)$ measures the contribution of $X_1$ and corresponds to the region II+III, while the metric $R^2_{sp}(X_1)$ measures the contribution attributed only to $X_1$ and corresponds to the region II. Similarly, $R^2(X_2)$ measures the contribution of $X_2$ and corresponds to the region III+IV, while the metric $R^2_{sp}(X_2)$ measures the contribution attributed only to $X_2$ and corresponds to the region IV. Finally, the variation that cannot be explained by either $X_1$ or $X_2$ corresponds to the region I in the plot.
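Both metrics are simple functions of observed and fitted values; a minimal sketch (illustrative function names, assuming fitted values from the full model and from the reduced model without one modality are available):

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination: 1 - residual variation / total variation."""
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2)

def semi_partial_r2(y, yhat_full, yhat_reduced):
    """Reduction in R^2 when one modality is dropped from the model:
    R^2(X1, X2) - R^2(X2) is the unique contribution of X1."""
    return r_squared(y, yhat_full) - r_squared(y, yhat_reduced)
```

The same arithmetic applies regardless of how the fitted values are produced, so the metric carries over directly from scalar regression to the tensor regression models above.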

We also note that $R^2$ monotonically increases with an increasing number of parameters. Therefore, to compare the contributions of two modalities with different numbers of parameters, we adopt the adjusted $R^2$ [The58], which is defined as

$$R^2_{\mathrm{adj}} = 1 - (1 - R^2)\,\frac{n-1}{n-p-1},$$

where $p$ is the number of parameters in the model.


Figure 3.1 The variance of the response $Y$ can be divided into four parts, I, II, III, and IV, with I+II+III+IV = 1. Region II corresponds to the semi-partial $R^2_{sp}$, i.e., the unique contribution, for $X_1$; IV is the semi-partial $R^2_{sp}$, i.e., the unique contribution, for $X_2$; III is the common contribution of both $X_1$ and $X_2$; and I is the part that cannot be explained by either $X_1$ or $X_2$. In addition, II+III, III+IV and II+III+IV are the corresponding $R^2$ for a model containing only $X_1$, only $X_2$, and both $X_1$ and $X_2$, respectively.

3.3.3

Extensions

There are several directions in which to extend model (Eq. 3.3) and the associated contribution metrics. First, it is straightforward to incorporate covariates in addition to the image predictors, by considering the model

Y =α+〈B1,X1+B2,X2+γT

Z+",

where $Z \in \mathbb{R}^{p_0}$ denotes a vector of covariates, which could contain variables such as age and gender that are often deemed relevant to the clinical outcome, and $\gamma$ is the corresponding coefficient vector capturing the effects of $Z$ on $Y$. Under this model, one can continue to use $R^2$ or the semi-partial $R^2$ to evaluate the individual contributions of different imaging modalities after taking into account the contribution of $Z$.

