Linear Discriminant Analysis - Feature Selection Techniques

2.4 Role of a Feature Extractor in the Generalized Machine Learning

2.4.5 Types of Feature Extractors

2.4.5.1 Feature Selection Techniques

2.4.5.2.1 Linear Discriminant Analysis

The aim of Linear Discriminant Analysis (LDA) is to find a single subspace shared by two or more classes of the data. The high-dimensional data is projected onto this subspace so that the spread of the data points within each class is reduced, whereas the separation between the data points of different classes is maximized. In other words, LDA finds a series of projections that maximize the ratio of between-class variance to within-class variance. The projections of two outcomes and the decision region of LDA are shown in Figure 2.7.

Figure 2.7: Visualization of two outcomes

Suppose $K_1 = \{k^1_1, \dots, k^1_{l_1}\}$ and $K_2 = \{k^2_1, \dots, k^2_{l_2}\}$ are the samples from two different classes of the same data set $K$. Linear Discriminant Analysis is given by the vector $w$ that maximizes

$$J(w) = \frac{w^T S_B w}{w^T S_W w} \tag{2.16}$$

where

$$S_B = (m_1 - m_2)(m_1 - m_2)^T \tag{2.17}$$

and

$$S_W = \sum_{i=1,2} \sum_{k \in K_i} (k - m_i)(k - m_i)^T \tag{2.18}$$

are the between-class and within-class scatter matrices, respectively, and $m_i$ is the mean of class $i$, defined as $m_i = \frac{1}{l_i} \sum_{j=1}^{l_i} k^i_j$. The rationale behind maximizing $J(w)$ is to find a direction that maximizes the separation between the projected class means (the numerator) while simultaneously minimizing the variance of the classes along the same direction (the denominator). Figure 2.8 shows the distribution of the Iris data projected onto the first discriminant dimension using the standard Fisher Linear Discriminant Analysis technique.
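The two-class criterion in Eqs. (2.16) to (2.18) can be made concrete with a short sketch. It uses the classical closed-form result that, because $S_B$ in Eq. (2.17) has rank one, $J(w)$ is maximized (up to scaling) by $w \propto S_W^{-1}(m_1 - m_2)$, provided $S_W$ is invertible. The code is plain Python with no external libraries; the helper names and the toy data are illustrative assumptions, not taken from the text.

```python
# Minimal two-class Fisher LDA sketch in pure Python.
# Helper names and toy data are hypothetical, chosen only for illustration.

def mean(samples):
    """Class mean m_i = (1/l_i) * sum_j k_ij."""
    d = len(samples[0])
    return [sum(s[t] for s in samples) / len(samples) for t in range(d)]

def outer(u, v):
    """Outer product u v^T as a list of rows."""
    return [[a * b for b in v] for a in u]

def within_scatter(classes):
    """S_W = sum_i sum_{k in K_i} (k - m_i)(k - m_i)^T, Eq. (2.18)."""
    d = len(classes[0][0])
    S = [[0.0] * d for _ in range(d)]
    for K in classes:
        m = mean(K)
        for k in K:
            diff = [a - b for a, b in zip(k, m)]
            S = [[s + o for s, o in zip(rs, ro)] for rs, ro in zip(S, outer(diff, diff))]
    return S

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for t in range(c, n + 1):
                M[r][t] -= f * M[c][t]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][t] * x[t] for t in range(r + 1, n))) / M[r][r]
    return x

def quad(w, S):
    """Quadratic form w^T S w."""
    n = len(w)
    return sum(w[i] * S[i][j] * w[j] for i in range(n) for j in range(n))

def fisher_J(w, K1, K2):
    """Fisher criterion J(w) = (w^T S_B w) / (w^T S_W w), Eqs. (2.16)-(2.17)."""
    diff = [a - b for a, b in zip(mean(K1), mean(K2))]
    return quad(w, outer(diff, diff)) / quad(w, within_scatter([K1, K2]))

def fisher_direction(K1, K2):
    """Maximizer of J(w) up to scale: w = S_W^{-1} (m_1 - m_2)."""
    diff = [a - b for a, b in zip(mean(K1), mean(K2))]
    return solve(within_scatter([K1, K2]), diff)

K1 = [[1.0, 2.0], [2.0, 3.0], [3.0, 3.0], [2.0, 1.5]]
K2 = [[4.0, 5.0], [5.0, 5.5], [6.0, 7.0], [5.0, 6.5]]
w = fisher_direction(K1, K2)
print("w =", w, "J(w) =", fisher_J(w, K1, K2))
```

Because $J(w)$ is a ratio of two quadratic forms, rescaling $w$ leaves it unchanged, so only the direction of $w$ matters; any other direction gives a value of $J$ no larger than the Fisher direction.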

Figure 2.8: Single Dimensional Projection of IRIS Dataset using LDA (axis: LDA1; legend: class1, class2, class3)
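Figure 2.8 involves three classes, while Eq. (2.17) defines $S_B$ for only two. A common multi-class generalization, assumed here because the text does not state which one was used for the figure, is $S_B = \sum_c n_c (m_c - m)(m_c - m)^T$ with $m$ the global mean; the first discriminant axis (LDA1) is then the leading eigenvector of $S_W^{-1} S_B$. The sketch below computes that axis by power iteration on made-up three-class data rather than the Iris data, so that it needs no external library. All function names are hypothetical.

```python
# Sketch: first multi-class Fisher discriminant direction (the "LDA1" axis).
# Assumption: S_B = sum_c n_c (m_c - m)(m_c - m)^T, a multi-class form of Eq. (2.17).

def mean(samples):
    d = len(samples[0])
    return [sum(s[t] for s in samples) / len(samples) for t in range(d)]

def scatter_matrices(classes):
    """Return (S_W, S_B) for a list of classes, each a list of samples."""
    d = len(classes[0][0])
    m = mean([k for K in classes for k in K])  # global mean
    S_W = [[0.0] * d for _ in range(d)]
    S_B = [[0.0] * d for _ in range(d)]
    for K in classes:
        mc = mean(K)
        for k in K:
            u = [a - b for a, b in zip(k, mc)]
            for i in range(d):
                for j in range(d):
                    S_W[i][j] += u[i] * u[j]
        v = [a - b for a, b in zip(mc, m)]
        for i in range(d):
            for j in range(d):
                S_B[i][j] += len(K) * v[i] * v[j]
    return S_W, S_B

def inverse(A):
    """Gauss-Jordan inversion with partial pivoting (A assumed non-singular)."""
    n = len(A)
    M = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]

def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def first_discriminant(classes, iters=500):
    """Leading eigenvector of S_W^{-1} S_B, found by power iteration."""
    S_W, S_B = scatter_matrices(classes)
    M = matmul(inverse(S_W), S_B)
    w = [1.0] * len(M)
    for _ in range(iters):
        w = [sum(M[i][j] * w[j] for j in range(len(w))) for i in range(len(w))]
        norm = sum(x * x for x in w) ** 0.5
        w = [x / norm for x in w]
    return w

def project(samples, w):
    """One-dimensional projection of each sample onto w."""
    return [sum(a * b for a, b in zip(k, w)) for k in samples]

class1 = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.2], [1.2, 1.8]]
class2 = [[4.0, 4.0], [4.5, 5.0], [5.0, 4.2], [4.2, 4.8]]
class3 = [[8.0, 8.0], [8.5, 9.0], [9.0, 8.2], [8.2, 8.8]]
classes = [class1, class2, class3]
w = first_discriminant(classes)
for name, K in zip(["class1", "class2", "class3"], classes):
    print(name, [round(z, 2) for z in project(K, w)])
```

Power iteration is used only to keep the example dependency-free; it relies on the eigenvalues of $S_W^{-1} S_B$ being real and non-negative, which holds when $S_W$ is positive definite.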

2.4.5.2.1.1 Related State of The Art

In [86], the authors presented an incremental least-squares solution to linear discriminant analysis (LDA) by proposing an online incremental version of it. This approach dynamically updates the least-squares solution to LDA (the solution that minimizes the sum of squared errors in fitting the data) by calculating the pseudo-inverse of the centered data matrix and the indicator matrix, without eigen-analysis. This strategy keeps the incremental update mechanism simple. The only drawback of this method is its high computational complexity, since every new incoming instance requires updating the least-squares solution matrix and other intermediate matrices, including the centered data matrix, the mean matrix, the indicator matrix and the total scatter matrix.

In [87], the authors proposed a novel CCA-based incremental linear discriminant analysis method for action recognition. This procedure iteratively learns the multi-linear discriminant subspace using canonical correlation analysis (CCA). It incrementally updates the discriminant transformation matrix, maximizing the canonical correlations of the intra-class data samples while simultaneously minimizing the canonical correlations of the inter-class data samples.

In [88], the authors used the concept of spanning set approximation for each new incoming data point to approximate the between-class, total and within-class scatter matrices. The method is computationally very expensive, as it requires updating three matrices for each new point.

Another incremental approach to LDA [89] derives the discriminant eigenspace in a streaming environment without updating the eigen-decomposition. When a new data point is included, the means and the scatter matrices must be recalculated, so this method is also computationally expensive; however, the eigen-decomposition itself does not need to be updated. In fact, updates are required only for the means and the scatter matrices. The update rules for the mean, within-class and between-class scatter matrices are presented both for sequential computation (one data point at a time) and for data arriving in more than one chunk.
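The kind of mean and scatter update described for [89] can be illustrated with the standard exact formulas below. This is a generic textbook formulation with hypothetical helper names, not the specific update rules of [89]. For one new point $x$ added to a class with $n$ samples, mean $m$ and scatter $S$: $m' = m + (x - m)/(n+1)$ and $S' = S + \frac{n}{n+1}(x - m)(x - m)^T$. For two chunks $a$ and $b$ of the same class: $S = S_a + S_b + \frac{n_a n_b}{n_a + n_b}(m_a - m_b)(m_a - m_b)^T$. Since $S_W$ in Eq. (2.18) is the sum of the per-class scatters, keeping one $(n, m, S)$ summary per class is enough to maintain it, and the per-class counts and means are also all that the between-class scatter needs.

```python
# Sketch: exact incremental updates of one class's mean and scatter matrix.
# Generic formulation with hypothetical helper names; not the exact rules of [89].

def outer(u, v):
    """Outer product u v^T as a list of rows."""
    return [[a * b for b in v] for a in u]

def add_point(n, m, S, x):
    """Absorb one sample x into a class summarized by (n, m, S)."""
    if n == 0:  # first sample: mean is x, scatter is the zero matrix
        return 1, list(x), [[0.0] * len(x) for _ in x]
    diff = [a - b for a, b in zip(x, m)]
    n_new = n + 1
    m_new = [a + d / n_new for a, d in zip(m, diff)]  # m' = m + (x - m)/(n + 1)
    # S' = S + n/(n + 1) (x - m)(x - m)^T
    S_new = [[s + (n / n_new) * o for s, o in zip(rs, ro)]
             for rs, ro in zip(S, outer(diff, diff))]
    return n_new, m_new, S_new

def merge_chunks(a, b):
    """Combine the (n, m, S) summaries of two disjoint chunks of one class."""
    (na, ma, Sa), (nb, mb, Sb) = a, b
    n = na + nb
    m = [(na * x + nb * y) / n for x, y in zip(ma, mb)]
    diff = [x - y for x, y in zip(ma, mb)]
    # S = S_a + S_b + (n_a n_b / n) (m_a - m_b)(m_a - m_b)^T
    S = [[sa + sb + (na * nb / n) * o for sa, sb, o in zip(ra, rb, ro)]
         for ra, rb, ro in zip(Sa, Sb, outer(diff, diff))]
    return n, m, S

def batch_stats(X):
    """Reference (n, mean, scatter) computed from all samples at once."""
    n, d = len(X), len(X[0])
    m = [sum(x[t] for x in X) / n for t in range(d)]
    S = [[sum((x[i] - m[i]) * (x[j] - m[j]) for x in X) for j in range(d)]
         for i in range(d)]
    return n, m, S

X = [[1.0, 2.0], [2.0, 0.5], [3.0, 4.0], [0.5, 1.5], [2.5, 3.5]]
state = (0, None, None)
for x in X:  # sequential: one data point at a time
    state = add_point(*state, x)
chunked = merge_chunks(batch_stats(X[:2]), batch_stats(X[2:]))  # two chunks
print(state)
print(chunked)
print(batch_stats(X))  # batch reference; all three should agree
```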

In [90], the authors proposed an incremental supervised learning method called Generalized Singular Value Decomposition-Incremental Linear Discriminant Analysis (GSVD-ILDA) for adaptively learning face images. GSVD-ILDA incrementally learns an adaptive subspace instead of recomputing LDA/GSVD from scratch, which reduces the computational cost.

The advantages of the proposed algorithm are, first, that it can process samples either in chunks or sequentially, which is desirable for large image datasets, and second, that by adding samples dynamically it lowers the computational cost. The only drawback of this method is that several updates are required for each adaptive input, namely of the global mean, the rank approximation of the left singular vectors, the corresponding singular values and the projection matrix.

Similarly, in [91] the authors resolved the scalability problem of complete linear discriminant analysis (CLDA) [92], a PCA plus LDA algorithm. They first presented a new implementation of CLDA in which two steps of QR decomposition, rather than singular value decomposition, are used to obtain orthonormal bases of the range and null spaces of the within-class scatter matrix, and then presented its incremental version, which efficiently performs the QR decomposition adaptively on each new incoming chunk without recomputing CLDA from scratch. In [93], the authors proposed a fast incremental version of linear discriminant analysis (ILDA) that computes and updates the QR factorization of the data matrix as data arrives either chunk by chunk or point by point. The only problem with this fast ILDA is that it does not incorporate a regularization approach to avoid over-sampled problems.
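The QR-based methods of [91] and [93] avoid refactorizing the data matrix from scratch when new samples arrive. As a generic illustration only (not the specific algorithms of [91] or [93]), the sketch below appends one new sample as a column to an existing thin factorization $A = QR$ using Gram-Schmidt with one re-orthogonalization pass, which corresponds to the point-by-point case; a chunk can be absorbed by appending its columns in turn. It assumes each new column is linearly independent of the previous ones, and the function name is hypothetical.

```python
# Sketch: update a thin QR factorization when one new column (sample) arrives.
# Hypothetical helper; Q is stored as a list of orthonormal column vectors,
# R as a list of rows of an upper-triangular matrix.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def qr_append_column(Q, R, x):
    """Given A = Q R, return (Q', R') for [A | x]; assumes x is not in span(Q)."""
    v = list(x)
    r = [0.0] * len(Q)
    for _ in range(2):  # Gram-Schmidt plus one re-orthogonalization pass
        for i, q in enumerate(Q):
            c = dot(q, v)
            r[i] += c
            v = [a - c * b for a, b in zip(v, q)]
    rho = dot(v, v) ** 0.5  # norm of the component orthogonal to span(Q)
    if rho < 1e-12:
        raise ValueError("new column is (numerically) in the span of Q")
    k = len(Q)
    Q_new = Q + [[a / rho for a in v]]
    R_new = [row + [r[i]] for i, row in enumerate(R)] + [[0.0] * k + [rho]]
    return Q_new, R_new

# Samples arriving one at a time, each a column of the data matrix A.
columns = [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 1.0, 3.0], [2.0, 1.0, 0.0, 1.0]]
Q, R = [], []
for col in columns:
    Q, R = qr_append_column(Q, R, col)
for row in R:
    print([round(z, 4) for z in row])
```

Only the new column of $R$ and one new vector of $Q$ are computed per sample, instead of refactorizing the whole data matrix.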

Most of the incremental versions of linear discriminant analysis proposed in the past are domain dependent.

In document Towards On-line Domain-Independent Big Data Learning: Novel Theories and Applications (Page 62-67)