Maximum Likelihood ICA - Temporal-spatial modeling for fMRI data

Early ICA algorithms were developed to minimize the mutual information between the components of the estimate $Wx$, an approach known as infomax (Bell and Sejnowski, 1995). However, mutual information is difficult to approximate and optimize from a finite sample. Another common family of ICA algorithms uses the maximum likelihood (ML) method to estimate the optimal unmixing matrix (Stone, 2004; Hastie and Tibshirani, 2002; Bach and Jordan, 2002; Chen, 2005). The infomax method has been shown to be essentially equivalent to the ML approach (Stone, 2004). In the following, we study maximum likelihood ICA in more detail.

In the ICA model (4.1), the independent source components (ICs) $s_1, \ldots, s_K$ are taken as latent variables. To make problem (4.1) solvable, it is necessary to assume that $N \ge K$. Without loss of generality, we assume that $N = K$. Hence, the mixing matrix $A$ is of dimension $K \times K$.

The ML approach requires specifying the probability density function (pdf) of the unknown source signals $s$. The goal of ML ICA is then to find an unmixing matrix $W$ such that the joint pdf of $Wx$ is as similar as possible to the joint pdf of the unknown source signals.

Suppose that the density function of $x$ is $f(\cdot)$ and each $s_k$ has a density function $g_k(\cdot)$ for $k = 1, 2, \ldots, K$. Let $W = A^{-1}$ be the unmixing matrix. If $s_1, s_2, \ldots, s_K$ are independent with marginal density functions $g_1, g_2, \ldots, g_K$, then

\[
f(x) = |W| \prod_{k=1}^{K} g_k(e_k' W x),
\]

where $|W|$ denotes the absolute value of the determinant of $W$ and $e_k$ is the $k$th column of the $K \times K$ identity matrix, so that $s_k = e_k' W x$. Thus the log-likelihood function of $W$ based on the data is

\[
\log f(x) = \sum_{k=1}^{K} \log g_k(e_k' W x) + \log |W|.
\]
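As a brief sketch of the reasoning step behind this density, the standard change-of-variables rule for the invertible linear map $s = Wx$ can be written out as follows. Here $g$ stands for the joint density of $s$, and the second equality uses the independence assumption.

```latex
\[
f(x) = g(Wx)\,\lvert \det W \rvert
     = \lvert \det W \rvert \prod_{k=1}^{K} g_k(e_k' W x)
\quad\Longrightarrow\quad
\log f(x) = \sum_{k=1}^{K} \log g_k(e_k' W x) + \log \lvert \det W \rvert .
\]
```

The determinant enters once, as the Jacobian of the map, so it contributes a single additive term to the log-likelihood.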

For fMRI data, let $x = x_i = (x_{i1}, x_{i2}, \ldots, x_{iK})'$ denote the voxel time series at voxel $i$, a mixture random vector with density function $f(\cdot)$, and let $s = s_i = (s_{i1}, s_{i2}, \ldots, s_{iK})'$ denote the source vector at voxel $i$, with density function $g(\cdot)$ whose marginal density functions are $g_1, g_2, \ldots, g_K$. To account for all the data, the log-likelihood function of $W$ is obtained by averaging the above log-likelihood over voxels, so that

\[
l(W) = \frac{1}{M} \sum_{i=1}^{M} \sum_{k=1}^{K} \log g_k(e_k' W x_i) + \log |W|, \tag{4.2}
\]

where $M$ is the number of voxels in the fMRI data set.
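The log-likelihood (4.2) can be evaluated directly once the source log densities are fixed. The following is a minimal NumPy sketch, not the implementation used in any of the cited papers. The function names `log_likelihood` and `log_logistic`, the row-per-voxel layout of `X`, and the choice of a standard logistic density for every source are assumptions made only for illustration.

```python
import numpy as np


def log_likelihood(W, X, log_densities):
    """Average log-likelihood l(W) of equation (4.2).

    W: (K, K) unmixing matrix.
    X: (M, K) array whose i-th row is the mixture vector x_i (assumed layout).
    log_densities: list of K callables; log_densities[k](s) returns log g_k(s).
    """
    S = X @ W.T  # S[i, k] = e_k' W x_i, the k-th recovered source at voxel i
    _, logabsdet = np.linalg.slogdet(W)  # log of |det W|
    # Mean over voxels i for each source k, then sum over k.
    per_source = [log_densities[k](S[:, k]).mean() for k in range(W.shape[0])]
    return sum(per_source) + logabsdet


def log_logistic(s):
    """Log density of the standard logistic law (illustrative choice of g_k)."""
    return -s - 2.0 * np.logaddexp(0.0, -s)


# Usage: evaluate l(W) at the identity for simulated logistic data.
rng = np.random.default_rng(0)
X = rng.logistic(size=(500, 2))
print(log_likelihood(np.eye(2), X, [log_logistic, log_logistic]))
```

Using `np.linalg.slogdet` rather than `np.log(np.linalg.det(W))` avoids overflow or underflow of the determinant and returns the log of its absolute value directly.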

The traditional ICA algorithm FastICA assumes that the sources are identically distributed with a common density function $g_1$ whose functional form is known (Hyvärinen and Oja, 2000); it is thus limited to the parametric form of $g_1$. Recently, several nonparametric methods have been proposed to estimate the unknown distributions of the hidden sources. For example, Hastie and Tibshirani (2002) use penalized splines to estimate $g_1, g_2, \ldots, g_K$, while Bach and Jordan (2002) and Chen (2005) consider kernel estimates (KDICA). More recently, Kawaguchi and Truong (2007) propose a new ML ICA algorithm that models the distributions of the independent source components using polynomial splines with data-dependent knot locations; this is referred to as SICA. We describe these methods in more detail in the next two sections.

4.2.1 KDICA

Chen (2005) proposes a fast KDICA algorithm, which uses kernel density estimates of $g_1, g_2, \ldots, g_K$. The goal of the algorithm is to estimate $W$ and $g_1, g_2, \ldots, g_K$ by maximizing the log-likelihood function (4.2). Since both $W$ and $g_k$, $k = 1, \ldots, K$, are unknown, the algorithm starts with an initial $W$, which can be obtained from FastICA or another ICA algorithm. When $W$ is known, $g_k$ is identical to the density function of $e_k' W x$. Hence $g_k$ can be estimated by the kernel density estimator

\[
\hat{g}_k(s) = \frac{1}{Mh} \sum_{i=1}^{M} \mathcal{K}\!\left( \frac{e_k' W x_i - s}{h} \right),
\]

where the Laplacian kernel is used for the kernel function $\mathcal{K}(\cdot)$ and the bandwidth $h$ is selected as $0.6\,\hat{\sigma} M^{-1/5}$, with $\hat{\sigma}$ being the sample standard deviation of $e_k' W x_i$, $i = 1, \ldots, M$. The estimates $\hat{g}_k$ are then plugged back into (4.2), and $W$ is updated by maximizing the log-likelihood function. The algorithm iterates in this way until convergence, which is assessed using the Amari metric (Bach and Jordan, 2002), a measure of the closeness of two matrices defined as

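\[
d(A_0, A) = \frac{1}{2m} \sum_{i=1}^{m} \left( \frac{\sum_{j=1}^{m} |r_{ij}|}{\max_j |r_{ij}|} - 1 \right) + \frac{1}{2m} \sum_{j=1}^{m} \left( \frac{\sum_{i=1}^{m} |r_{ij}|}{\max_i |r_{ij}|} - 1 \right), \tag{4.3}
\]

where $r_{ij} = (A_0 A^{-1})_{ij}$ and $A_0$ and $A$ are both of dimension $m \times m$.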
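The Amari metric (4.3) translates into a few array operations. The sketch below is illustrative only; the function name `amari_distance` is an assumption, and both inputs are assumed to be square with `A` invertible.

```python
import numpy as np


def amari_distance(A0, A):
    """Amari metric d(A0, A) of equation (4.3) for two m x m matrices."""
    r = np.abs(A0 @ np.linalg.inv(A))  # |r_ij| with r = A0 A^{-1}
    m = r.shape[0]
    # First term: for each row i, sum over j divided by the row maximum.
    row_term = (r.sum(axis=1) / r.max(axis=1) - 1.0).sum()
    # Second term: for each column j, sum over i divided by the column maximum.
    col_term = (r.sum(axis=0) / r.max(axis=0) - 1.0).sum()
    return (row_term + col_term) / (2.0 * m)
```

The value is zero exactly when $A_0 A^{-1}$ has a single nonzero entry in every row and column, so the metric is unaffected by permuting or rescaling the recovered components.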

To compute the kernel density estimates of the $g_k$, Chen (2005) proposes a FastKDE method, which substantially reduces the computational cost.
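For reference, the kernel density estimator of this section can be written as a direct computation. This is a plain evaluation whose cost grows with the number of evaluation points times the sample size; it is not Chen's FastKDE method. The function name `laplacian_kde`, the kernel normalization $\frac{1}{2} e^{-|u|}$, and the use of the sample size `M` (taken here to be the number of voxels) in the bandwidth rule are assumptions of this sketch.

```python
import numpy as np


def laplacian_kde(samples, s, h=None):
    """Laplacian-kernel density estimate of g_k at the points s.

    samples: (M,) array of recovered sources e_k' W x_i.
    s: evaluation points (scalar or 1-D array).
    h: bandwidth; defaults to 0.6 * sigma_hat * M**(-1/5).
    """
    samples = np.asarray(samples, dtype=float)
    s = np.atleast_1d(np.asarray(s, dtype=float))
    M = samples.size
    if h is None:
        # Rule-of-thumb bandwidth from the sample standard deviation.
        h = 0.6 * samples.std(ddof=1) * M ** (-0.2)
    u = (samples[None, :] - s[:, None]) / h  # (n_points, M) scaled differences
    # Laplacian kernel 0.5 * exp(-|u|), averaged over samples and scaled by 1/h.
    return (0.5 * np.exp(-np.abs(u))).sum(axis=1) / (M * h)
```

Because every evaluation point is compared with every sample, this version is mainly useful for checking a faster implementation on small data.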

4.2.2 SICA

More recently, Kawaguchi and Truong (2007) developed a new approach (SICA) that uses polynomial splines to model the logarithms of $g_1, g_2, \ldots, g_K$. In their study, each log density is modeled as

\[
\log g_k(s) = \beta_{k00} + \beta_{k01} s + \sum_{i=1}^{m_k} \beta_{k1i} (s - r_{ki})_+^3,
\]

where $(u)_+ = \max(u, 0)$, $\beta_k = (\beta_{k00}, \beta_{k01}, \beta_{k11}, \ldots, \beta_{k1m_k})$ is a vector of coefficients, $r_{ki}$ are the knots, and $m_k$ is the number of knots for the $k$th source density function. Knot selection starts with an initial knot placement at the minimum, median and maximum values of the data. It then proceeds by stepwise knot addition, stepwise knot deletion and final model selection based on the Bayesian information criterion (BIC), which is defined by

\[
\mathrm{BIC}_k = -2\, l(\hat{\beta}_k) + m_k \log M.
\]
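The spline form of the log density and the BIC score can be sketched in a few lines. This is an illustration under stated assumptions, not the SICA implementation: the names `spline_log_density` and `bic` are hypothetical, the penalty uses the number of observations `n_obs` (the usual BIC convention, assumed here), and the function does not enforce that the resulting density integrates to one, which is assumed to be handled by the fitting procedure.

```python
import numpy as np


def spline_log_density(s, beta, knots):
    """Evaluate beta_00 + beta_01 * s + sum_i beta_1i * (s - r_i)_+^3.

    beta: (2 + m,) coefficients (beta_00, beta_01, beta_11, ..., beta_1m).
    knots: (m,) knot locations r_1, ..., r_m.
    """
    s = np.atleast_1d(np.asarray(s, dtype=float))
    beta = np.asarray(beta, dtype=float)
    knots = np.asarray(knots, dtype=float)
    # Truncated cubic basis: (s - r_i)^3 where s > r_i, and 0 elsewhere.
    truncated = np.clip(s[:, None] - knots[None, :], 0.0, None) ** 3
    return beta[0] + beta[1] * s + truncated @ beta[2:]


def bic(loglik, n_knots, n_obs):
    """BIC score: -2 * loglik + n_knots * log(n_obs); smaller is better."""
    return -2.0 * loglik + n_knots * np.log(n_obs)
```

In a stepwise search, candidate knot sets would be compared by calling `bic` on each fitted model and keeping the one with the smallest score.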

As in KDICA, the algorithm starts with an initial $W$ and alternately optimizes the density functions $g_1, g_2, \ldots, g_K$ and $W$ until convergence, as measured by the Amari metric.
