
7.3 Case Study: Comparison of Pixel-Mapping Techniques for DCE-MRI

8.1.2 Kernel Principal Component Analysis

Kernel principal component analysis [Schölkopf et al., 1998, Schölkopf et al., 1999b, Schölkopf and Smola, 2002] is a nonlinear extension of the PCA algorithm. As in other kernel-based methods, the data are embedded in a feature space $\mathcal{F}$ that is related to the data space $\mathcal{X}$ by a nonlinear function

$$\Phi : \mathcal{X} \to \mathcal{F}, \quad x \mapsto \Phi(x).$$

For the time being, it is assumed that the mapped data are centred in $\mathcal{F}$, i.e. $\sum_{i=1}^{N} \Phi(x_i) = 0$. After embedding the data in the feature space, the linear PCA algorithm is executed in $\mathcal{F}$. To this end, the eigenvectors $\xi$ of the covariance matrix

$$C = \frac{1}{N} \sum_{i=1}^{N} \Phi(x_i)\Phi(x_i)^{T} \qquad (8.2)$$

have to be computed by solving the eigenvalue equation

$$\lambda \xi = C \xi \qquad (8.3)$$

with $\xi \in \mathcal{F} \setminus \{0\}$ and $\lambda \geq 0$. Substituting (8.2) into (8.3) yields

$$\lambda \xi = \frac{1}{N} \sum_{i=1}^{N} \Phi(x_i)\Phi(x_i)^{T}\xi = \frac{1}{N} \sum_{i=1}^{N} \langle \Phi(x_i), \xi \rangle \, \Phi(x_i),$$

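which shows that all solutions $\xi$ with $\lambda \neq 0$ lie within the span of $\Phi(x_1), \ldots, \Phi(x_N)$: dividing by $\lambda$ expresses $\xi$ as a linear combination of the mapped examples. Thus, the dual form of a solution $\xi$ is defined as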

$$\xi = \sum_{i=1}^{N} \alpha_i \Phi(x_i) \qquad (8.4)$$

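and we may alternatively consider the set of equations

$$\lambda \langle \Phi(x_j), \xi \rangle = \langle \Phi(x_j), C\xi \rangle, \quad \forall j = 1, \ldots, N. \qquad (8.5)$$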

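Combining (8.4) and (8.5) yields

$$\lambda \sum_{i=1}^{N} \alpha_i \langle \Phi(x_j), \Phi(x_i) \rangle = \frac{1}{N} \sum_{i=1}^{N} \alpha_i \Big\langle \Phi(x_j), \sum_{k=1}^{N} \Phi(x_k) \langle \Phi(x_k), \Phi(x_i) \rangle \Big\rangle, \quad \forall j = 1, \ldots, N,$$

so that, in matrix notation, the eigenvalue equation becomes

$$N \lambda K \alpha = K^{2} \alpha$$

with $\alpha = (\alpha_1, \ldots, \alpha_N)^{T}$ and the $N \times N$ matrix $K$ with entries

$$K_{ij} = \langle \Phi(x_i), \Phi(x_j) \rangle, \quad i, j = 1, \ldots, N. \qquad (8.6)$$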

This form of the eigenvalue equation allows the mapping to the feature space to be computed implicitly: the inner products $\langle \Phi(x_i), \Phi(x_j) \rangle$ in (8.6) are replaced by a suitable kernel function $K(x_i, x_j)$ that satisfies Mercer's condition [Schölkopf et al., 1999a] (cf. chapter 4.2.2). In this case, $K$ is also referred to as the kernel matrix.
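To illustrate this substitution, the following minimal Python/NumPy sketch uses the degree-2 homogeneous polynomial kernel $K(x, y) = (x^{T} y)^{2}$ on two-dimensional inputs, for which an explicit feature map $\Phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$ is known. This kernel, the helper names `phi` and `poly2_kernel`, and the random data are assumptions chosen for illustration, not the setup used in this work. The kernel matrix built from explicit inner products in $\mathcal{F}$ coincides with the one computed directly from the kernel function.

```python
import numpy as np


def phi(x):
    """Hypothetical explicit feature map for 2-D inputs; phi(x) . phi(y) == (x . y)**2."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2.0) * x1 * x2, x2 ** 2])


def poly2_kernel(x, y):
    """Degree-2 homogeneous polynomial kernel, evaluated directly in the data space."""
    return float(np.dot(x, y)) ** 2


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))  # N = 5 examples in a 2-D data space

# Kernel matrix from explicit inner products in the feature space, as in (8.6)
K_explicit = np.array([[phi(a) @ phi(b) for b in X] for a in X])
# Kernel matrix from the kernel function, without ever forming Phi(x)
K_kernel = np.array([[poly2_kernel(a, b) for b in X] for a in X])

print(np.allclose(K_explicit, K_kernel))  # True
```

For the Gaussian kernel used later in this section, no finite-dimensional $\Phi$ of this kind exists, which is why the implicit computation matters in practice.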

In practice, the N eigenvectors of K are determined by solving

$$N \lambda \alpha = K \alpha \qquad (8.7)$$

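for $\lambda > 0$, where $\alpha$ denotes the coefficient vector of the dual form (8.4) of an eigenvector. Every solution of (8.7) also solves $N\lambda K\alpha = K^{2}\alpha$, and the remaining solutions of the latter differ only by components in the null space of $K$, which do not change $\xi$. The coefficient vectors $\alpha_1, \ldots, \alpha_N$ of the eigenvectors $\xi_1, \ldots, \xi_N$ can be sorted in descending order of their eigenvalues $\lambda_1, \ldots, \lambda_N$; the corresponding eigenvectors form an orthogonal basis, and the projections of the data onto them are uncorrelated.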
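The equivalence between the primal eigenproblem (8.3) and the dual problem (8.7) can be checked numerically when an explicit feature map is available. The sketch below (NumPy; the degree-2 feature map `phi` and the random data are hypothetical choices) centres the mapped data explicitly, compares the nonzero eigenvalues $\lambda$ of $C$ with the eigenvalues of $K$ divided by $N$, and rebuilds $\xi_1$ from its dual coefficients via (8.4).

```python
import numpy as np


def phi(X):
    """Hypothetical explicit degree-2 feature map, applied row-wise to 2-D inputs."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1 ** 2, np.sqrt(2.0) * x1 * x2, x2 ** 2])


rng = np.random.default_rng(1)
X = rng.normal(size=(20, 2))
N = X.shape[0]

F = phi(X)
F = F - F.mean(axis=0)  # centre the mapped data explicitly in F

# Primal: eigenvalues of the covariance matrix C from (8.2), in descending order
C = F.T @ F / N
lam_primal = np.sort(np.linalg.eigvalsh(C))[::-1]

# Dual: the eigenvalues of K are N * lambda, cf. (8.7)
K = F @ F.T
mu, A = np.linalg.eigh(K)
order = np.argsort(mu)[::-1]
mu, A = mu[order], A[:, order]
lam_dual = mu[:3] / N  # K has rank 3 here, so only three eigenvalues are nonzero

print(np.allclose(lam_primal, lam_dual))  # True

# xi_1 = sum_i alpha_1i Phi(x_i), see (8.4); compare with the top eigenvector of C
xi1 = F.T @ A[:, 0]
xi1 /= np.linalg.norm(xi1)
w, V = np.linalg.eigh(C)
v1 = V[:, np.argmax(w)]
print(np.isclose(abs(xi1 @ v1), 1.0))  # True (equal up to sign)
```

The dual route only ever handles the $N \times N$ matrix $K$, which is what makes it usable when $\mathcal{F}$ is high- or infinite-dimensional.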

Centring Data in the Feature Space

So far, the data were assumed to have zero mean in $\mathcal{F}$. Centring the data in $\mathcal{F}$ explicitly is difficult, since $\Phi$ is generally not available in explicit form; however, an equivalent transformation of the data set $\Gamma$ can be achieved by computing the centred kernel matrix [Schölkopf and Smola, 2002]. The centred kernel matrix is obtained by modifying the entries of the original kernel matrix according to

$$\tilde{K}_{ij} = \left(K - 1_N K - K 1_N + 1_N K 1_N\right)_{ij}, \quad i, j = 1, \ldots, N \qquad (8.8)$$

with the matrix $1_N$ whose entries are $(1_N)_{ij} := 1/N$. Subsequently, the centred kernel matrix $\tilde{K}$ is substituted for the kernel matrix $K$ in equation (8.7).

Computing Principal Components

As in linear PCA, the $j$-th principal component $\tilde{x}_j$ of an example $x$ is the inner product between $x$ and the $j$-th eigenvector. In contrast to linear PCA, however, this inner product is computed in the feature space rather than in the data space:

$$\tilde{x}_j = \langle \Phi(x), \xi_j \rangle. \qquad (8.9)$$

By substituting the dual form of the eigenvectors given by equation (8.4), the principal components can be expressed as linear combinations of kernel function values

$$\tilde{x}_j = \frac{1}{\sqrt{\lambda_j}} \sum_{i=1}^{N} \alpha_{ji} K(x_i, x). \qquad (8.10)$$
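The centring step (8.8) and the projection (8.10) can likewise be compared against an explicit feature map. In this sketch (NumPy; `phi`, `kernel` and the random data are illustrative assumptions), the normalising factor is the square root of the eigenvalue $\mu_1$ of $\tilde{K}$ itself, i.e. $N\lambda_1$ in the notation of (8.7); this is the value that gives $\xi_1$ unit length when $\alpha_1$ has unit length.

```python
import numpy as np


def phi(X):
    """Hypothetical explicit degree-2 feature map; phi(x) . phi(y) == (x . y)**2."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1 ** 2, np.sqrt(2.0) * x1 * x2, x2 ** 2])


def kernel(A, B):
    """Degree-2 homogeneous polynomial kernel matrix between the rows of A and B."""
    return (A @ B.T) ** 2


rng = np.random.default_rng(2)
X = rng.normal(size=(15, 2))
N = X.shape[0]

# Centred kernel matrix (8.8) with (1_N)_ij = 1/N
K = kernel(X, X)
one_n = np.full((N, N), 1.0 / N)
K_c = K - one_n @ K - K @ one_n + one_n @ K @ one_n

# The same matrix from explicitly centred features
F = phi(X)
F_c = F - F.mean(axis=0)
print(np.allclose(K_c, F_c @ F_c.T))  # True

# Solve (8.7) with the centred matrix; eigh returns eigenvalues in ascending order
mu, A = np.linalg.eigh(K_c)
mu1, alpha1 = mu[-1], A[:, -1]  # largest eigenvalue and its unit-length alpha

# Projections of the training examples onto xi_1 via the kernel expansion (8.10) ...
proj_kernel = K_c @ alpha1 / np.sqrt(mu1)
# ... and via the explicit eigenvector, using (8.4) and (8.9)
xi1 = F_c.T @ alpha1 / np.sqrt(mu1)
proj_explicit = F_c @ xi1
print(np.isclose(np.linalg.norm(xi1), 1.0))     # True: unit length in F
print(np.allclose(proj_kernel, proj_explicit))  # True
```

For a new example $x$, the vector of kernel values $K(x_i, x)$ would also have to be centred consistently with (8.8); the sketch avoids this by projecting only the training examples.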

[Figure 8.1. Panels: PCA (data space, eigenvector ξ1); KPCA (data space, mapping Φ, feature space, eigenvector ξ1)]

Figure 8.1: Illustration of PCA and KPCA applied to data in two dimensions. The eigenvector $\xi_1$ determined by PCA lies in the direction of the highest data variance in $\mathcal{X}$. For PCA, projections onto $\xi_1$ (as depicted by the dashed iso-contour lines and the colour of the data points) vary linearly in the direction of the eigenvector. If $\xi_1$ is computed by KPCA with a nonlinear kernel function, the contour lines of constant projection onto $\xi_1$ are linear in the feature space $\mathcal{F}$ but nonlinear in $\mathcal{X}$ due to the nonlinear relation between $\mathcal{F}$ and $\mathcal{X}$.

The coefficient $1/\sqrt{\lambda_j}$ normalises $\alpha_j$ such that the eigenvector $\xi_j$ has unit length in $\mathcal{F}$:

$$\langle \xi_j, \xi_j \rangle = 1, \quad j = 1, \ldots, N.$$

Choosing a Kernel Function and Its Parameters

An essential component of KPCA and other kernel-based algorithms is the transformation $\Phi$, which maps examples from the data space $\mathcal{X}$ to the feature space $\mathcal{F}$. This transformation is typically defined implicitly by the choice and parametrisation of the kernel function $K(x_i, x_j)$. In fact, $\Phi$ need not even be known explicitly, as long as the kernel function satisfies Mercer's condition and thus computes the inner product in some suitable feature space. If the Gaussian kernel

$$K(x_i, x_j) = \exp\left(-\frac{\lVert x_i - x_j \rVert^{2}}{2\sigma^{2}}\right)$$

is employed (as in this work), the user must set the kernel bandwidth $\sigma$ to a reasonable value. The most suitable value typically differs between applications. Since there is no general guideline for adjusting the kernel bandwidth in KPCA, different values are commonly tested, e.g. by manually evaluating the outcome of KPCA.
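A minimal end-to-end sketch of KPCA with the Gaussian kernel, combining (8.8), (8.7) and (8.10), is given below together with a loop over a few candidate bandwidths. Everything here is an assumption for illustration: the function names, the toy data (two concentric rings), the candidate values of $\sigma$, and the printed diagnostic (the share of feature-space variance captured by the leading components), which is not the evaluation procedure used in this work.

```python
import numpy as np


def gaussian_kernel(A, B, sigma):
    """Gaussian kernel matrix exp(-||a - b||^2 / (2 sigma^2)) between the rows of A and B."""
    sq_dists = (
        np.sum(A ** 2, axis=1)[:, None]
        + np.sum(B ** 2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point cancellation
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma ** 2))


def kpca_fit_transform(X, sigma, n_components=2):
    """Hypothetical helper: centre K (8.8), solve (8.7), project the training data (8.10).

    Returns the projections (one column per component) and the share of the total
    feature-space variance captured by each returned component.
    """
    N = X.shape[0]
    K = gaussian_kernel(X, X, sigma)
    one_n = np.full((N, N), 1.0 / N)
    K_c = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    mu, A = np.linalg.eigh(K_c)                # eigenvalues in ascending order
    idx = np.argsort(mu)[::-1][:n_components]  # indices of the largest eigenvalues
    mu, A = mu[idx], A[:, idx]
    keep = mu > 1e-12                          # discard components without positive eigenvalue
    mu, A = mu[keep], A[:, keep]
    Z = K_c @ A / np.sqrt(mu)                  # (8.10) for every training example
    explained = mu / np.trace(K_c)             # trace = sum of all eigenvalues of K_c
    return Z, explained


# Hypothetical toy data: two concentric rings in 2-D
rng = np.random.default_rng(0)
angles = rng.uniform(0.0, 2.0 * np.pi, size=100)
radii = np.repeat([1.0, 3.0], 50)
X = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])

for sigma in (0.3, 1.0, 3.0):
    Z, explained = kpca_fit_transform(X, sigma)
    print(f"sigma = {sigma}: variance share of {Z.shape[1]} components = {explained.sum():.3f}")
```

A single scalar like this does not settle the choice of $\sigma$; the projections `Z` for each candidate would still be inspected with the application in mind.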