Dictionary Learning - Technical Background

2.2 Technical Background

2.2.3 Dictionary Learning

In the sparse coding setting, we assumed that the overcomplete dictionary $D$ was known beforehand. However, the dictionary can also be learned directly from the data [101]. The classical way to obtain a dictionary is to use a predefined one, such as a set of training signals or an overcomplete basis built from wavelets, curvelets, Fourier transforms, etc. A more recent approach is to learn the dictionary from training signals instead of relying on a predefined one. Given a set of training signals $Y = [y_1, y_2, \dots, y_i, \dots, y_N] \in \mathbb{R}^{m \times N}$, a dictionary $D$ can be learned so that each signal in $Y$ is represented sparsely:

$$\langle \hat{D}, \hat{X} \rangle = \arg\min_{D,X} \|Y - DX\|_F^2 \quad \text{subject to} \quad \|x_i\|_0 \le L, \; i = 1, \dots, N, \tag{2.11}$$

where $L$ is the sparsity parameter and the columns $x_i$ of $X \in \mathbb{R}^{K \times N}$ are the sparse coding coefficients. Since the optimization in Equation (2.11) is defined over both $D$ and $X$, the problem can be solved by alternating between them: one variable is kept fixed while the objective is optimized over the other. This strategy starts by initializing the dictionary with randomly selected training signals. Then, the sparse solutions $X$ of the training signals $Y$ are computed while the dictionary $D$ is kept fixed. After that, the objective function in Equation (2.11) is optimized over $D$ while the sparse solutions $X$ are kept fixed. This alternating process is repeated until a convergence criterion is met, such as a maximum number of iterations or a desired approximation error. Note that this iterative strategy does not guarantee that the globally optimal solution is found. The method of optimal directions (MOD) [30] and K-SVD [2] are two efficient dictionary learning algorithms that use variants of this iterative strategy. In practice, K-SVD has been observed to converge in fewer iterations than MOD. In the next section, we give a detailed introduction to the K-SVD algorithm.
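As an illustration of the dictionary-update half of this alternating scheme, MOD [30] has a closed-form update: with $X$ fixed, minimizing $\|Y - DX\|_F^2$ over $D$ gives the least-squares solution $D = YX^T(XX^T)^{-1}$. The NumPy sketch below computes it through the pseudoinverse and then rescales the atoms to unit norm; the function name `mod_dictionary_update` and the unit-norm rescaling are our own illustrative choices, not part of the cited method description.

```python
import numpy as np


def mod_dictionary_update(Y, X, eps=1e-12):
    """Hypothetical MOD dictionary step: minimize ||Y - D X||_F^2 over D with X fixed.

    Y @ pinv(X) equals Y X^T (X X^T)^{-1} whenever X has full row rank;
    the pseudoinverse also avoids failing when X X^T is singular.
    """
    D = Y @ np.linalg.pinv(X)
    # Rescale atoms to unit l2-norm (assumed convention). X is not rescaled here,
    # because the next sparse coding step recomputes it anyway.
    norms = np.linalg.norm(D, axis=0)
    return D / np.maximum(norms, eps)  # guard against atoms that are all zeros
```

If the signals were generated exactly as $Y = D_0 X$ with unit-norm atoms and a full-row-rank $X$, this update recovers $D_0$; in the full algorithm it alternates with a sparse coding step that recomputes $X$.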

K-SVD

The K-SVD algorithm is inspired by the k-means clustering algorithm; like dictionary learning, k-means clustering is an NP-hard problem [2]. The aim of k-means clustering is to partition all the signals into $K$ clusters such that each training signal belongs to the cluster with the nearest mean. It uses an iterative approach with two steps per iteration: first, each training signal is assigned to its nearest cluster; second, the $K$ clusters are updated as the centroids of their assigned training signals. K-SVD follows a similar two-step iterative process to learn the dictionary and find the sparse solutions. After the dictionary $D$ is initialized, the sparse coefficients are computed with $D$ fixed, followed by a second stage that searches for a better dictionary. In this dictionary update stage, the $K$ atoms are updated separately, which is a direct generalization of k-means, where the $K$ clusters are also updated separately. The iterative process of K-SVD is summarized in Algorithm 2. Each of the $K$ atoms of $D$ is updated using a singular value decomposition (SVD), hence the name K-SVD. In Algorithm 2, OMP is used for sparse coding; however, K-SVD is flexible and can work with other sparse coding methods [2].
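Why a single SVD suffices for each atom can be seen by isolating the contribution of the $j$-th atom in the objective of Equation (2.11). The short derivation below uses the notation of Algorithm 2, where $x_T^j$ is the $j$-th row of $X$, $\omega_j$ is the set of signals that use $d_j$, and $E_j^R$, $x_R^j$ are $E_j$ and $x_T^j$ restricted to the indices in $\omega_j$:

$$\|Y - DX\|_F^2 = \Big\| \Big( Y - \sum_{i \neq j} d_i x_T^i \Big) - d_j x_T^j \Big\|_F^2 = \|E_j - d_j x_T^j\|_F^2 .$$

Keeping the support of $x_T^j$ fixed, the columns outside $\omega_j$ do not depend on $d_j$ or $x_T^j$, so the update reduces to

$$\min_{d_j,\, x_R^j} \|E_j^R - d_j x_R^j\|_F^2 ,$$

which is a best rank-one approximation problem. By the Eckart-Young theorem, with $E_j^R = U \Sigma V^T$, the minimizer is $d_j = u_1$ and $x_R^j = \Sigma(1,1)\, v_1^T$, where $u_1$ and $v_1$ are the first columns of $U$ and $V$. Since $\|u_1\|_2 = 1$, the updated atom remains unit-norm.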

The major difference between K-SVD and other dictionary learning (DL) methods is that the sparse coding coefficients $X$ are not kept fixed in the dictionary update step. In K-SVD, each atom of $D$ and its corresponding row of $X$ are updated simultaneously, as shown in Algorithm 2. This accelerates the convergence of the learning process, making K-SVD more appealing. Although K-SVD converges quickly, each iteration is still computationally expensive, since an SVD must be computed $K$ times and all $N$ training signals are sparse-coded at every iteration.

Algorithm 2 K-SVD Algorithm
Input: Training signals $y_i$, $i = 1, \dots, N$
Output: An overcomplete dictionary $D \in \mathbb{R}^{m \times K}$ and sparse coding coefficients $X \in \mathbb{R}^{K \times N}$
Initialize the dictionary $D$ with $K$ randomly selected training signals
while not converged do
    Sparse Coding:
    for each training signal $y_i$, use OMP to compute the corresponding coding coefficients $x_i$ do
        $\min_{x_i} \|y_i - D x_i\|_2^2$ subject to $\|x_i\|_0 \le L$
    end for
    Dictionary Update:
    for $j = 1, \dots, K$, update the $j$-th atom $d_j$ and the $j$-th row $x_T^j$ of $X$ do
        Find the signals that use $d_j$: $\omega_j = \{ i \in \{1, \dots, N\} : x_T^j(i) \neq 0 \}$, and obtain $x_R^j$ by removing the zero entries of $x_T^j$
        Compute the representation error matrix $E_j = Y - \sum_{i \neq j} d_i x_T^i$
        Obtain $E_j^R$ by selecting the columns of $E_j$ corresponding to $\omega_j$
        Apply the SVD $E_j^R = U \Sigma V^T$; update the atom $d_j$ with the first column of $U$, and update $x_R^j$ with the first column of $V$ multiplied by $\Sigma(1,1)$
    end for
end while
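Algorithm 2 can be sketched in NumPy as follows. This is a minimal illustration under our own assumptions: a fixed number of iterations (`n_iter`) stands in for the convergence test, atoms are kept unit-norm, the OMP routine is a plain textbook version, and atoms used by no signal are simply left unchanged (practical implementations often replace such atoms, which is omitted here). The function names `omp` and `ksvd` are hypothetical and do not refer to any particular library.

```python
import numpy as np


def omp(D, y, L, tol=1e-10):
    """Orthogonal matching pursuit: approximately solve
    min ||y - D x||_2 subject to ||x||_0 <= L (columns of D assumed unit-norm)."""
    x = np.zeros(D.shape[1])
    support = []
    residual = y.astype(float)
    for _ in range(L):
        if np.linalg.norm(residual) <= tol:
            break  # y is already represented (numerically) exactly
        k = int(np.argmax(np.abs(D.T @ residual)))  # atom most correlated with residual
        if k in support:
            break  # residual is (numerically) orthogonal to every atom
        support.append(k)
        # "Orthogonal" step: refit all selected coefficients by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    if support:
        x[support] = coef
    return x


def ksvd(Y, K, L, n_iter=10, seed=0):
    """Minimal K-SVD sketch: OMP sparse coding followed by per-atom SVD updates."""
    rng = np.random.default_rng(seed)
    m, N = Y.shape
    # Initialize D with K randomly selected training signals, scaled to unit norm.
    D = Y[:, rng.choice(N, size=K, replace=False)].astype(float)
    D /= np.maximum(np.linalg.norm(D, axis=0), 1e-12)
    X = np.zeros((K, N))
    for _ in range(n_iter):
        # Sparse coding stage: D is kept fixed.
        X = np.column_stack([omp(D, Y[:, i], L) for i in range(N)])
        # Dictionary update stage: one atom and its row of X at a time.
        for j in range(K):
            omega = np.nonzero(X[j, :])[0]  # signals that use atom d_j
            if omega.size == 0:
                continue  # unused atom: left unchanged in this sketch
            # E_j restricted to omega: Y_R - sum_{i != j} d_i x_R^i.
            E_R = Y[:, omega] - D @ X[:, omega] + np.outer(D[:, j], X[j, omega])
            U, S, Vt = np.linalg.svd(E_R, full_matrices=False)
            D[:, j] = U[:, 0]  # new atom: first left singular vector
            X[j, omega] = S[0] * Vt[0, :]  # new coefficients on the same support
    return D, X
```

For example, `D, X = ksvd(Y, K=30, L=3)` learns a 30-atom dictionary in which every column of `X` has at most 3 nonzero entries. Restricting $E_j$ to the columns in $\omega_j$ keeps each SVD small, but the sketch still performs $K$ SVDs and $N$ OMP calls per iteration, which is the per-iteration cost discussed above.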

In document Joint registration and segmentation of CP-BOLD MRI (Page 43-46)