
2.3 Classical Learning Methods

2.3.3 Graph-based Embedding

Graph-based embedding methods generally transform data from an original high-dimensional space to a low-dimensional space while preserving as much of the significant structure as possible, such as linear structure (e.g., principal direction variance [63], Euclidean distance [64, 65]) and nonlinear geometric characteristics (e.g., local tangent [66], local linearities [67], local heat kernel [68], geodesic distance [69], diffusion distance [70]). In Chapter 5 of this thesis, two sets of projections that preserve both the intra-modality variance and the inter-modality covariance are learned to embed the images captured by different cameras into one common space. That idea is applied only to person re-identification; building on it, Chapter 6 explores a more advanced framework of hetero-manifold regularisation that projects samples from multiple modalities into a common Hamming space while preserving both high-order intra-modality and inter-modality structures. Broadly, these methods can be categorised into unsupervised, semi-supervised and supervised learning.

Unsupervised Algorithms: Principal Component Analysis (PCA) [63] and metric multidimensional scaling (MDS) [64] are the two representative unsupervised approaches for linear dimensionality reduction. Representative nonlinear dimensionality reduction algorithms include local tangent space alignment (LTSA) [66], locally linear embedding (LLE) [67], Laplacian eigenmaps (LE) [68], isometric feature mapping (Isomap) [69], and diffusion maps (DM) [70]. These algorithms are generally referred to as manifold learning, an emerging and promising approach to nonlinear dimensionality reduction. A manifold is a topological space that is locally Euclidean. LTSA [66] obtains the low-dimensional intrinsic manifold by globally minimizing the reconstruction error of the set of all local tangent spaces in the data set. LLE [67] and LE [68] focus on preserving the local neighborhood structure. Isomap [69] seeks the subspace that best preserves the geodesic distances between any two data points. The DM method relates the spectral properties of Markov processes on a weighted graph (G, W) and preserves the diffusion distance introduced in [70]. These linear and nonlinear unsupervised methods are mainly designed to embed high-dimensional data into a low-dimensional space while preserving geometric information. They use only the geometric relationships between samples, such as linear structure and nonlinear geometric characteristics. In most cases, such geometric information is not sufficient to discriminate between samples, especially when they lie very close together in the transformed space. Consequently, introducing label information can play an important role and provide useful information for accurate and robust classification.
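To make the two linear methods concrete, the following is a minimal NumPy sketch, not taken from [63] or [64], of PCA computed by singular value decomposition and of classical (metric) MDS computed by double-centering the squared Euclidean distance matrix. The function names `pca_embed` and `classical_mds_embed` and the random test data are assumptions made for illustration. When the input distances are Euclidean, the two embeddings coincide up to the sign of each axis, which is why both are described above as preserving linear structure.

```python
import numpy as np

def pca_embed(X, k):
    """Project centered data onto its top-k principal directions."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * s[:k]          # principal component scores

def classical_mds_embed(D, k):
    """Embed points from a matrix D of pairwise Euclidean distances."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # Gram matrix of centered data
    evals, evecs = np.linalg.eigh(B)
    order = np.argsort(evals)[::-1][:k]      # k largest eigenvalues
    lam = np.clip(evals[order], 0.0, None)   # guard against tiny negatives
    return evecs[:, order] * np.sqrt(lam)

# Illustrative data (assumption): 30 random points in 5 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

Y_pca = pca_embed(X, 2)
Y_mds = classical_mds_embed(D, 2)
```

Comparing `Y_pca @ Y_pca.T` with `Y_mds @ Y_mds.T` avoids the sign ambiguity of the individual axes. The nonlinear methods listed above differ mainly in which distance or neighborhood graph replaces the Euclidean distance matrix `D`.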

Supervised Algorithms: Linear discriminant analysis (LDA) [65] is a well-known linear supervised algorithm. LDA maximizes the ratio of the inter-class variance to the intra-class variance to achieve maximal separability. It projects data into a low-dimensional space while preserving Euclidean distance, with the label information used as constraints. In recent years, many dimensionality reduction algorithms that preserve different kinds of geometric information under label constraints have been proposed.
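The ratio criterion described above can be sketched as follows. This is an illustrative NumPy implementation under stated assumptions, not the formulation of [65]: the function name `lda_directions`, the small ridge term added to the within-class scatter, and the synthetic two-class data are all introduced here for the example. The projection directions are taken as the leading eigenvectors of the matrix obtained by solving the within-class scatter against the between-class scatter.

```python
import numpy as np

def lda_directions(X, y, k):
    """Return k directions maximizing between-class over within-class scatter."""
    d = X.shape[1]
    mean_all = X.mean(axis=0)
    Sw = np.zeros((d, d))   # within-class (intra-class) scatter
    Sb = np.zeros((d, d))   # between-class (inter-class) scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all)[:, None]
        Sb += Xc.shape[0] * (diff @ diff.T)
    Sw += 1e-8 * np.eye(d)  # assumption: tiny ridge keeps Sw invertible
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(evals.real)[::-1][:k]
    return evecs[:, order].real

# Synthetic data (assumption): two Gaussian classes in three dimensions.
rng = np.random.default_rng(0)
X0 = rng.normal(loc=[0.0, 0.0, 0.0], scale=1.0, size=(50, 3))
X1 = rng.normal(loc=[3.0, 3.0, 0.0], scale=1.0, size=(50, 3))
X = np.vstack([X0, X1])
y = np.array([0] * 50 + [1] * 50)

W = lda_directions(X, y, 1)
z = X @ W[:, 0]                                  # 1-D projection
gap = abs(z[y == 0].mean() - z[y == 1].mean())   # separation of class means
spread = max(z[y == 0].std(), z[y == 1].std())   # largest within-class spread
```

For C classes the between-class scatter has rank at most C - 1, so at most C - 1 useful directions exist; with two classes, as here, a single direction is returned.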

Supervised algorithms exploit only the geometric and label information of the labeled samples. Fukumizu et al. [71] presented a novel kernel method for dimensionality reduction with Reproducing Kernel Hilbert Spaces in the supervised learning setting. In [72], a general framework for supervised dimensionality reduction was proposed, which viewed both features and class labels as exponential-family random variables and allowed data- and label-appropriate generalized linear models to be mixed and matched for classification and regression. In [73], an improved version of Isomap, namely S-Isomap, was proposed. S-Isomap uses class information to guide the nonlinear dimensionality reduction procedure and is not sensitive to noise. Kouropteva et al. [74] and Li et al. [75] built supervised extensions of LLE and LTSA, respectively.

In [76], Sajama and Orlitsky presented a method based on maximum conditional likelihood estimation of mixture models, which ensured that the selected subspace retained the maximum possible mutual information between feature vectors and class labels. Liang and Li [77] developed a general regularization framework for dimensionality reduction that allows different functions to be used in the cost function. The framework can be used for supervised learning when prior knowledge of the label information is available. In [78], most popular subspace learning algorithms, whether unsupervised or supervised, were explained in a unified way as instances of a ubiquitously supervised prototype.

Semi-supervised Algorithms: The supervised algorithms above are very effective for learning low-dimensional representations of labeled samples. From an engineering point of view, however, collecting labeled data is generally more difficult than collecting unlabeled data [79]. As a result, some data sets contain a small number of labeled samples and a large number of unlabeled samples. To use the geometric and label information contained in such data sets more effectively, a few semi-supervised frameworks have been proposed for dimensionality reduction. Some methods use label information within the framework of LDA to define different similarity metrics or neighborhoods [80, 81, 82, 83]. In [80], Zhang et al. defined cannot-link and must-link constraints as prior information corresponding to the between-class and within-class matrices of LDA, respectively. Zhang et al. [81] and Sugiyama et al. [82] presented similar frameworks that define within-class and between-class similarities based on LDA for global structure preservation, and local similarity based on LPP [84] for local structure preservation. In [83], Song et al. proposed a method that defines within-manifold, between-manifold and total-manifold scatter matrices similar to those in LDA. Xu and Yan [85] presented a semi-supervised subspace learning algorithm that integrates the tensor representation with the complementary information conveyed by unlabeled data.
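As a rough illustration of how pairwise constraints can stand in for the between-class and within-class matrices of LDA, the sketch below builds one scatter matrix from cannot-link pairs and one from must-link pairs, then takes the leading eigenvectors of their weighted difference. This is a deliberately simplified example in the spirit of constraint-based methods such as [80], not a reproduction of any cited algorithm: the function names, the trade-off parameter `beta`, and the four-point toy data set are assumptions, and the term that such methods typically include for unlabeled data is omitted.

```python
import numpy as np

def pair_scatter(X, pairs):
    """Average outer product of differences over the given index pairs."""
    d = X.shape[1]
    S = np.zeros((d, d))
    for i, j in pairs:
        diff = (X[i] - X[j])[:, None]
        S += diff @ diff.T
    return S / max(len(pairs), 1)

def constrained_projection(X, must_link, cannot_link, k, beta=1.0):
    """Directions that spread cannot-link pairs and contract must-link pairs."""
    Sc = pair_scatter(X, cannot_link)   # plays the between-class role
    Sm = pair_scatter(X, must_link)     # plays the within-class role
    evals, evecs = np.linalg.eigh(Sc - beta * Sm)
    order = np.argsort(evals)[::-1][:k]
    return evecs[:, order]

# Toy data (assumption): classes differ along axis 0, noise lies along axis 1.
X = np.array([[0.0, -5.0], [0.0, 5.0],    # class A
              [2.0, -5.0], [2.0, 5.0]])   # class B
must_link = [(0, 1), (2, 3)]              # same-class pairs
cannot_link = [(0, 2), (1, 3)]            # different-class pairs

W = constrained_projection(X, must_link, cannot_link, k=1)
```

In this toy data set the largest variance lies along axis 1, so PCA alone would pick the uninformative direction, whereas the constraints steer the projection toward axis 0, which separates the two classes.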

Other methods treat dimensionality reduction as a regression problem from a high-dimensional space to a low-dimensional space [86, 87]. They assume that the low-dimensional intrinsic coordinates of some of the training samples are known. Yang et al. [86] showed that classical unsupervised algorithms can be modified to take into account prior information on the exact mapping of certain data points. They reformulate the minimization problem of the classical methods using the label information, so that the global low-dimensional coordinates can be computed by solving a linear system of equations. In [87], Gong et al. converted the classical minimization problem with a special kernel into an optimization problem with equality constraints, whose final solution can be obtained by diffusion from the labeled data points.
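The reduction to a linear system can be sketched as follows, under assumptions that go beyond the text: the classical method is taken to minimize a quadratic cost tr(Y^T M Y) for some symmetric cost matrix M, and a graph Laplacian of a simple chain graph is used here as a stand-in for M. Fixing the coordinates of the known points and setting the gradient with respect to the unknown ones to zero gives M_uu Y_u = -M_uk Y_k. The function name `solve_unknown_coords` and the six-node example are hypothetical.

```python
import numpy as np

def solve_unknown_coords(M, known_idx, Y_known):
    """Minimize tr(Y^T M Y) with the rows of Y at known_idx held fixed."""
    n = M.shape[0]
    known_idx = np.asarray(known_idx)
    unknown_idx = np.setdiff1d(np.arange(n), known_idx)
    M_uu = M[np.ix_(unknown_idx, unknown_idx)]
    M_uk = M[np.ix_(unknown_idx, known_idx)]
    # Stationarity condition: M_uu Y_u + M_uk Y_k = 0
    Y_unknown = np.linalg.solve(M_uu, -M_uk @ Y_known)
    Y = np.zeros((n, Y_known.shape[1]))
    Y[known_idx] = Y_known
    Y[unknown_idx] = Y_unknown
    return Y

# Stand-in cost matrix (assumption): Laplacian of a chain graph on 6 nodes.
n = 6
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
L = np.diag(W.sum(axis=1)) - W

# The two end points have known 1-D coordinates 0 and 1.
Y = solve_unknown_coords(L, [0, n - 1], np.array([[0.0], [1.0]]))
```

For the chain graph, each interior coordinate is the average of its two neighbors, so the solution interpolates evenly between the two fixed end points. This also gives some intuition for the diffusion view mentioned above, in which known coordinates spread outward from the labeled points.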