4.2 A survey on unsupervised multi-view learning
4.2.2 Manifold alignment
Manifold alignment algorithms work on the assumption that the views to be aligned lie on the same low-dimensional manifold S. Mathematically, they seek a transformation of the data for each view which projects the data points to a new low-dimensional space Z, such that if two subjects are close on the manifold S, they are close in terms of Euclidean distance in the new coordinate system Z. An illustration of this principle can be found in Figure 4.2. The projection in the new coordinate system provides a unified representation of the subjects based on features from multiple views.
Figure 4.2: An illustration of manifold alignment, figure taken from [Wang et al., 2011]. Each dot represents a subject and similar subjects are represented by dots colored with similar spectra. The two datasets X and Y may contain different subjects and different features; however, it is assumed that the features lie on the same low-dimensional manifold S. Manifold alignment algorithms embed the data points in a new low-dimensional space Z, by applying mappings f and g respectively, such that local similarities in S are preserved in Z.
Manifold alignment algorithms generally fall into two categories, referred to as “two-step” alignment and “joint” alignment. Two-step alignment algorithms first project the data points in each individual view to low-dimensional spaces, where standard manifold learning algorithms can be exploited, such as Laplacian eigenmaps [Belkin and Niyogi, 2003], ISOMAP [Tenenbaum et al., 2000], and locally linear embedding [Saul and Roweis, 2000]. In the second step, the projected spaces are aligned by applying rotation or re-scaling transformations, which include multi-dimensional scaling (MDS), Procrustes matching, and canonical correlation analysis based matching, all of which were reviewed in [Priebe et al., 2013]. Algorithms belonging to this category include diffusion map based alignment [Lafon et al., 2006], Procrustes alignment [Wang and Mahadevan, 2008], and the method of Shen and Priebe [2015], which combines ISOMAP and MDS. Notably, even when subjects are fully matched across datasets, neither the initial embedding for each view nor the rotation and re-scaling transformations in the alignment step account for the correspondences of subjects from different views; hence there is no guarantee that corresponding subjects are close to each other in the final projection space Z.
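As a rough illustration of the first step, the following Python sketch embeds a single view with Laplacian eigenmaps. The function names, the unweighted K-nearest-neighbour graph and the default parameters are assumptions made for this sketch rather than details taken from the cited papers; a two-step method would apply such an embedding to each view separately before aligning the results.

```python
import numpy as np

def knn_adjacency(X, k):
    """Symmetric 0/1 K-nearest-neighbour adjacency matrix for the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # squared Euclidean distances
    np.fill_diagonal(d2, np.inf)                      # a point is not its own neighbour
    idx = np.argsort(d2, axis=1)[:, :k]               # k closest points per row
    W = np.zeros(d2.shape)
    rows = np.repeat(np.arange(X.shape[0]), k)
    W[rows, idx.ravel()] = 1.0
    return np.maximum(W, W.T)                         # symmetrise the graph

def laplacian_eigenmaps(X, k=5, dim=2):
    """Embed the rows of X in `dim` dimensions by solving L z = lambda D z."""
    W = knn_adjacency(X, k)
    deg = W.sum(axis=1)
    L = np.diag(deg) - W                              # unnormalised graph Laplacian
    s = 1.0 / np.sqrt(deg)
    # Equivalent symmetric problem: D^{-1/2} L D^{-1/2} u = lambda u, z = D^{-1/2} u
    vals, vecs = np.linalg.eigh(s[:, None] * L * s[None, :])
    # Drop the first eigenvector (eigenvalue 0), keep the next `dim`
    return s[:, None] * vecs[:, 1:dim + 1]
```

The sketch assumes a connected neighbourhood graph; if the graph has several components, more than one eigenvalue is zero and the discarded eigenvector is no longer the only trivial one.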
A general framework of joint alignment algorithms can be found in Wang et al. [2011], chapter 5, which consists of three steps. Assume that the datasets from views 1, 2, ..., v are arranged in data matrices $X^{(1)}, \dots, X^{(v)}$, where $X^{(i)}$ contains $n_i$ samples; the $n_i$ samples in each dataset may or may not correspond to the same subjects. In the first step, a $(\sum_{i=1}^{v} n_i) \times (\sum_{i=1}^{v} n_i)$ joint adjacency matrix W is constructed, either from prior knowledge or inferred from the data matrices, where the (j, k)th entry gives a similarity score between the corresponding samples, which may come from the same dataset or from datasets of different views. In the second step, the $(\sum_{i=1}^{v} n_i) \times (\sum_{i=1}^{v} n_i)$ joint Laplacian matrix L is constructed from W in the same way as defined in (2.22). In the third step, embeddings of the data points in the low-dimensional space Z are obtained by minimising an objective function which usually involves a weighted sum of the Euclidean distances between all pairs of data points in Z, where the weights are defined by the Laplacian L. Note in particular that, since manifold alignment algorithms assume that the datasets from disparate views have the same manifold structure, the Laplacian associated with each view can be regarded as an individual sample of the true underlying manifold; hence concatenating the individual Laplacians as described would be expected to improve the estimate of the true manifold by exploiting a larger number of coherent samples. Examples of joint alignment algorithms include Ham et al. [2005], which computes the eigen-decomposition of the joint Laplacian L, and Xiong et al. [2007], which uses semi-definite programming. Some notable variants include the joint diagonalisation of Laplacians, which seeks a common eigenbasis of the Laplacians individually computed from each view [Eynard et al., 2012], and the sparse manifold alignment algorithm [Wang et al., 2012a]. In particular, the latter extends the algorithm in Wang et al. [2011] by reformulating the eigenvector problem as a regularised least squares optimisation with sparsity-inducing penalties, and thus prunes view-dependent features which are not shared across the datasets [Wang et al., 2012a]. The multi-view spectral embedding [Xia et al., 2010], in which the subjects in all views are assumed to be the same and only the local topology of the K nearest neighbours is considered, can be formulated within the above framework, where the (i, j)th entry of the Laplacian L is non-zero if and only if i = j or the corresponding data points are in the same dataset and subject j is among the K nearest neighbours of subject i.
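The three steps of the joint framework can be sketched in Python for the case of two views. This is a minimal illustration under stated assumptions, not the implementation of any cited method: the Gaussian within-view similarity, the list `pairs` of known cross-view correspondences, the weight `mu`, the bandwidth `sigma`, the unnormalised Laplacian L = D − W and the scaling constraint are all choices made for the sketch.

```python
import numpy as np

def gaussian_similarity(X, sigma=1.0):
    """Within-view similarity exp(-||x_j - x_k||^2 / (2 sigma^2)), zero diagonal."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W

def joint_alignment(X1, X2, pairs, dim=2, mu=1.0, sigma=1.0):
    """Embed two views jointly; `pairs` lists (j, k) with row j of X1 matched to row k of X2."""
    n1, n2 = X1.shape[0], X2.shape[0]
    # Step 1: (n1 + n2) x (n1 + n2) joint adjacency matrix W
    W12 = np.zeros((n1, n2))
    for j, k in pairs:
        W12[j, k] = 1.0                               # prior knowledge: matched samples
    W = np.block([[gaussian_similarity(X1, sigma), mu * W12],
                  [mu * W12.T, gaussian_similarity(X2, sigma)]])
    # Step 2: joint Laplacian (assumed here to be D - W)
    deg = W.sum(axis=1)
    L = np.diag(deg) - W
    # Step 3: minimise tr(Z^T L Z) subject to Z^T D Z = I via an eigen-decomposition
    s = 1.0 / np.sqrt(deg)
    vals, vecs = np.linalg.eigh(s[:, None] * L * s[None, :])
    Z = s[:, None] * vecs[:, 1:dim + 1]               # skip the trivial eigenvector
    return Z[:n1], Z[n1:]
```

Because tr(Z^T L Z) equals half the sum of W_jk ||z_j − z_k||^2 over all pairs, samples joined by a large weight, including matched samples from different views, are pulled together in Z.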
A special class of manifold alignment algorithms that has attracted substantial research interest is linear embedding. In nonlinear manifold alignment, the solutions obtained from the optimisation problem are the new coordinates of the embedded data points; however, it is generally infeasible to derive the explicit transformation from each manifold. Linear embedding algorithms seek a set of linear mappings from the data points, one for each view, to the new and aligned space, such that if two data points are close in terms of the geodesic distance on the underlying manifold, which is assumed to be shared by all views, they are close in terms of the Euclidean distance in the new coordinate system. The primary merit of being able to identify the transformation functions is that it allows us to embed new data into the projected space without having to employ an interpolation method. In addition, a data point from one manifold can be transformed to a manifold associated with another view, from which associations between features from different views can be learned. Assuming the samples are matched across views, Feng et al. [2013] incorporated linear embeddings in manifold alignment and proposed AUMFS. By penalising the $\ell_1/\ell_2$ norm (defined in (3.37)) of the linear coefficients, AUMFS can select features which best summarise the similarities between the datasets. A similar method was proposed in Quadrianto and Lampert [2011], in which the loss function not only pulls similar samples across different views close to each other in the aligned Euclidean space but also pushes dissimilar samples apart. Where sample correspondence between datasets is unknown, Yan et al. [2013] proposed to jointly estimate the correspondence information and the linear embeddings, with an entropy regularisation employed to control the uncertainty of the correspondence. An alternative algorithm was proposed in Cui et al. [2014], where the sample correspondence between datasets was formulated as a 0-1 integer matrix to be jointly estimated with the linear embeddings.
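To illustrate why explicit linear mappings make it straightforward to embed new data, the following Python sketch restricts the embedding of each view to a linear function of its features. It is a generic linear variant of the joint Laplacian objective and is not AUMFS or any other method cited above; the function names, the mean-centring, the small `ridge` term and the use of a generalised eigenproblem are assumptions of the sketch, and the joint adjacency matrix `W` is taken as given.

```python
import numpy as np
from scipy.linalg import block_diag, eigh

def linear_alignment(X1, X2, W, dim=2, ridge=1e-8):
    """Learn one linear map per view from a joint adjacency matrix W.

    W is (n1 + n2) x (n1 + n2), with the samples of X1 ordered before those of X2.
    Returns (mean, F) for each view, so that a sample x is embedded as (x - mean) @ F.
    """
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    X = block_diag(X1 - m1, X2 - m2)                  # (n1 + n2) x (p1 + p2)
    deg = W.sum(axis=1)
    L = np.diag(deg) - W                              # joint graph Laplacian
    A = X.T @ L @ X
    # The ridge term keeps B positive definite, as scipy's solver requires
    B = X.T @ (deg[:, None] * X) + ridge * np.eye(X.shape[1])
    vals, vecs = eigh(A, B)                           # generalised problem A f = lambda B f
    F = vecs[:, :dim]                                 # eigenvalues are returned in ascending order
    p1 = X1.shape[1]
    return (m1, F[:p1]), (m2, F[p1:])

def project(x_new, mean, F):
    """Embed new samples from one view using the learned linear map."""
    return (x_new - mean) @ F
```

Once `F` has been learned, `project` embeds unseen samples with a single matrix product, which is the out-of-sample property discussed above. The sketch assumes each centred data matrix has full column rank; otherwise the smallest eigenvalues correspond to degenerate directions and the features would need to be reduced first.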
For our problem, we are mostly interested in identifying features (gene expressions) which best characterise the shared and tissue-specific patterns, rather than similarities and clustering patterns of the subjects. Moreover, in some cases the subjects recruited in multiple datasets can be different, in which case manifold alignment provides only limited information of interest to us. Most importantly, like co-training algorithms, manifold alignment algorithms assume a unified low-dimensional manifold for data points in different views, which ignores the differences between views.