Setup - Domain adaptation using manifold alignment

Therefore, Eq (4.13) can be substituted in Eq (4.10) to obtain the SVM classification decision function :

ˆy = f(x) =

i=1

↵iyiK(x, xi) + b (4.14)

Many kernels are known. For a classification problem, one has to find the appropriate one which will define the right type of feature space, with the right parameters. In this work, we consider two kernel functions, the linear one and the Gaussian one, also known as the Radial Basis Function kernel (RBF).

Using a linear kernel reduces to the case discussed for a linear SVM, the kernel function being the dot product :

K(x, xi) =hx, xii (4.15)

The RBF kernel has the form :

K(x, xi) = exp ✓ -kx - xik 2 2 ◆ (4.16) where parameter is the standard deviation of the Gaussian bell. This kernel is known to give good results and allows the results to be interpreted as similarity between points. For a classification problem, one has to evaluate the best to avoid over or under fitting.

For this work, the LibSVM library written by Chih-Jen Li, and modified by Jordi Muñioz to run on Matlab1_.

4.3 setup

The data were computed according to Chap.3. From now on we will refer to the nadir and off-nadir as X1 _{and X}2 _{respectively. In}

order to have comparable values of pixels and to allow LibSVM to run well, all pixels from both images were rescaled by dividing by the maximum value of the two images maxvalue = max(X1, X2). In

order to work within a controlled framework to run the algorithms, all the data used were taken from the GTs.

Both sets X1 and X2 were split in two subsets, a training set

Xntrain, and a test set Xntest, n = [1, 2]. The training sets rep-

resent the pool of data for the algorithm to compute the manifold alignment and to train the models. The test sets were used to assess the algorithm performance.

Figure 13: Workflow process of data preparation.

A problem arose on the consistency of the labeled data. The difference in the amount of pixels per class was of very high with varying orders of magnitude of 102 _{between the minimum and the maxi-}

mum number of elements per class. If one were to use the data in this state, the algorithms would most certainly suffer from over- representation of the water class,leading to erroneous results. The proposed method was thus modified to account this fact. Consider- ing the number of labeled samples per class, we consider the median number of elements medN and impose the algorithm to randomly

choose at most medNper class to create our training set.

4.3.1 Hypothesis

As explained in Chap. 3.2.2, the MA technique used is a semisu- pervised method and thus needs both labeled and unlabeled elements from both domains. Remembering Chap.2, we know that the availability of labeled samples for multiple images set is a costly step in domain adaptation. The proposed approach of [3] does not pay any particular attention to the amount of labeled elements per domain to be taken to achieve a reasonable classification accuracy. In this work, we dispose of a fully computed GT for each image. To assess this dependency, we need to set the number of labeled element

4.3 setup 33

per class in X2 as a parameter of the global performance. A total

of 150 points sample per class was taken for X1 and the algorithm

iterates over the number of labeled element per class in X2. We call

mthis parameter to test.

m2 [10; 150] (4.17)

In the same way, no information on the ratio between labeled and unlabeled element for each image was proposed. The relation is indirectly held in the cost function (eq. 3.15). The way the graph Wsand Wd were computed stated to assign a 0 value if one or both

of instances’ labels were unknown. Note that unlabeled samples only play a role in the geometrical similarity graph W. In order to evaluate the impact of the amount of unlabeled elements taken, we define it as an other parameter of performance analysis and call it U. As we only use data from the GT, the actual labels are known. The labeles corresponding to the U elements were set to 0.

U2 [500, 3000] (4.18)

As said in Chap. 3.2.2, the µ parameter in the cost function acts as a weight parameter between topology preservation and matching instances. [3] holds that it should be equal to 1 in the case were both properties are equally important. Therefore, an attention is given to the variation of the parameter. During the computation of the cost function elements, we considered the magnitude of the different graph Laplacian computed2_.

We observed that the the magnitude of both Ls and Ld was way

greater than the one of L. Remembering the derivation of the different Laplacians in Chap. 3.2.2, the Laplacian representing the geometrical similarity with a k-nn graph will have small values on its diagonal. On the other hand, as we choose 150 samples per class in X1 and varying numbers for X2, Ls will have values of the range of

102. Ld considering the unlabeled samples will have a magnitude

of 103_{. Therefore, two cases are considered : µ = 1 and µ = 100 to}

evaluate the impact of this gaps of magnitude. As said, µ is a weight parameter and will not modify L in itself, but rescale its magnitude regarding Ldand Ls.

4.3.2 Scenarios

Four different scenarios were considered to evaluate the performance of the domain adaptation. We first consider ordinary supervised classification for both domain. Then we use the case of a single

ground truth computed over X1 samples without adaptation to clas-

sify X2. A supervised approach is then made by creating a model

with labeled element from both domains. Finally we assess the performance of the manifold alignment method with regards to m and U.

For notation purpose and understandability, let’s call model1 the

model built on X1elements, model2 the model built on X2 elements,

model1,2 the model built over elements of the two domains, and

modelp_1,2, the model computed after the manifold alignment process. The different scenarios are :

A : Classification of X1 with model1

B : Classification of X2 with model2

C : Classification of X2 with model1,2

D : Classification of the projected data of X2 with modelp1,2

The four scenarios are tested on the different classifiers presented in Chap. 4.2: the Naive Bayes classifier and the SVM using an RBF kernel. To Evaluate the performance, two measures are done; the overall accuracy (OA) of the classification and the Cohen’s Kappa presented in [39] which measures the agreement of the estimation and the observation.

5

R E S U LT S

5.1 matrices involved

The geometrical similarity graphs were presented in Chap. 3. In Fig. 14, we present the similarity and dissimilarity graphs built on the labeled data. The graphs illustrates the symmetrical property of the similarity. The upper left part corresponds to the similarity within X1, and the lower right the one for X2. The two other blocs

are the similarity between the two domains. Note that the size corresponding to X2 grows with the increase of parameter m and U.

(a) (b)

Figure 14: Instance matching similarity graphs

In this figure, the white part means a value of 1 and the black a value of 0.

5.2 numerical performances

The performance for the two classifiers is shown as a function of parameter m in Fig. 15. The non-continuous lines are the results for the projected results for three different U : 1000, 2000 and 3000. The red line corresponds to scenario C, where no unlabeled elements are used. This scenario is the combination of the two GT without projection and is therefore the goal to overtake. The green line is the results for the classification of X2 with its own ground truth. It

is a best case scenario that we also want to overtake. In both cases, the results between OA and Kappa are very similar. As scenario A corresponds to the the case of scenario C with m = 150 for X1, the

result could not represented as function of m. The results coincided

with the the specific case of the scenario C, being the upper right point on the full red lines. We can see that the NB gives poor results for the use of two GT without manifold alignment. Its sensibility to the variation in PDF between two images does not allow it to ap- preciate the input of both images. The added labeled element from X2 allow to increase its performance but at a low rate. Compared

to the SMV method, the amount of sample from the second image needs to be greater in order to show an increase of accuracy. When using the manifold alignment, NB is able to reach the performance of a classification preformed on the single GT case with both a µ of 1 and 100. It also interesting to note that there is no increasing trend for scenario B with this classifier.

5.2 numerical performances 37

The RBF SVM shows better results for every scenarios showing its ability to adapt itself to the data’s complexity, which is a clear advantage in comparison to NB. The gap between B and the different set for D is also smaller than what is seen for NB at small m. The main advantage of this classifier, when using MA, is that it is able to overpass scenario B with both µ parameters.

The DPC performance is shown in Fig. 16. The plots show the results for both corrected and uncorrected projection functions for the two µ analysed. In this case, the dimension of the latent space is set to dmax. Surprisingly, the assumption of increasing the perfor-

mance by correcting the geometrical error in the projections is not validated for any classifier. The residual differences is of the order of 10-6 _{which clearly refutes the hypothesis.}

Figure 16: Sensibility of the different classifiers to the DPC

Remembering Fig. 9, the BDM was showing mismatched projections for the 8th_{dimension forth in the uncorrected case. [}₃_{] did not}

give much information on the optimal dimension of H. In the case of using dmaxas the dimension of the latent space, we assumed to

bestow the maximum variance of data for the classification. How- ever, if one makes the parallel between the MA technique and a principal component analysis (PCA), we can assume that the first dimension of H should inherit the maximum of the variance ex- plaining common information among the domains, both in geometrical and correspondences. This way, each new dimension should add knowledge but with a decreasing amount as the dimension of H increases, the last dimension should therefore contain the noise, e.g. the difference among domains. To assess this, we computed the performance of the classifiers for each single dimension of the latent space. This is shown in Fig. 17. This analysis gives very interesting results as the aforementioned hypthesis of the last dimensions hold noise is refuted. Indeed, the figures shows that the different dimension are not ordered in decreasingly manner regarding their single

performance. Further analysis should allow to better understand this property.

Figure 17: performance of classifiers for each dimension of H separaly

Beyond assessing the performance of each dimension one its own, we also considered the case of increasing the dimension of H. This is represented Fig.18. The accuracy grows very fast until the 4th

dimension where it reaches a near maximum state for all three classifiers. Interestingly enough, the dimension where the performance reaches this asymptotic behaviour is smaller than the initial dimension of each domain. The source of this might be the initial property of manifold alignment being a dimensionality reduction problem. Anyhow, the BDM was still showing a possibility to improve the projection and might be more useful as a measure of the residual noise in the data found in the highest dimensions of the latent space. 5.3 parameters sensibility

5.3.1 To parameter µ

Fig. 15also showed the role of µ in MA. For both classifiers at low m, the case when µ is scaled to the Laplacians magnitudes shows better results than the unscaled case. It is particularly interesting for the SVM as the results in this case are always overpassing the scenario B. In the case of very few labeled data available for X2, this

parameter could therefore play an valuable role.

In Fig. 19 µ shows once more some interesting results. For NB, increasing the value of µ seems to introduce noise in the results, but also shows better performance at low m. The graph is more chaotic and does not show a homogeneous evolution of the performance.

5.3 parameters sensibility 39

Figure 18: performance as function of the latent space’s dimension

However, RBF does not suffer this heterogeneity. The rescaling step must therefore be made with precaution, with good consideration on the classifier used.

5.3.2 To m and U

The study of the performance as a function of m and U is presented in Fig.19. For each classifier, the dependency over the number of labeled sample from X2 is clear. The increase is relatively fast

for the first addition of samples but rapidly reaches a flatter state around m = 50 elements per class taken for X2. From there on the

progression of performance is relatively stable for NB. In the case of RBF SVM, it keeps getting a little higher. The ability of RBF to fit more complex data might be the explanation. Indeed, in the point of view of PDF matching, the more m increases, the more the system is constraint in order to fit the two images together. As the amount of samples increases, so does the size of both Ws and Wdand thus

the complexity of the algorithm, which might explain why the RBF shows this light progression for bigger m.

The number of unlabeled samples show a different results. It seems only relevant for the NB at very low m. The SVMs seem to be relatively unaffected by this parameter. The spectral information needed for the preservation of topology has to be sufficiently represented by the labeled data used.

Figure 19: performance for the three classifiers Kappas on the left and Overall Accuracy (OA) on the right.

5.4 classification maps

The results of classification are shown in Fig. 20. The previous observations are confirmed with respect to the classifiers’ accuracy. The RBF offers more precise classified maps than the NB for each scenario. Indeed, NB seems to assign to much pixels to the class "Shadows" which is also sometimes mixed with water. The use of feature recognition algorithm might increase the results, as we can see that some of the missclassified pixels lie inside bigger structures.

5.5 discussion 41

NB RBF

Figure 20: Classified maps of X2. In a descending way we have scenarios

B, C and D, all runned with m = 50.

5.5 discussion

The topological preservation with respect to matching instances seems to be well respected. However, further analysis should be made to be sure of this. We could think of taking less labeled points in X1, or having varying amount of labeled data for both domains.

One should also remember that the two images were taken from the same satellite at a very short time shift, making the spectral variance

between the two images somehow the smallest possible. Further analysis such as a multi temporal problem or matching images from different sensors are ought to show different results by using the MA technique. In such analysis the dimension of the latent space at which the performance becomes asymptotic might also change.

The link between the pk input dimensionalities and the optimal

d cannot be shown on a two images dataset, and opens the door to an important quantity of further analysis on the proposed MA technique.

6

C O N C L U S I O N

The approach considered in this work proposed to use manifold alignment for domain adaptation between two images from a multiangular dataset one considered as the source domain, and the second as the target domain. The algorithm allowed to project both domain in a common high dimensional latent space were two classifiers were trained, a Naive Bayes and a support vector machine. The results were compared to corresponding analysis in the initial domains. Different parameters were tested to determine the key elements of the method : the amount of labeled data available for the target domain, the amount of unlabeled data taken in both images and a parameter acting as a weight between the preservation of topology and the matching of instances.

For both classifiers the method allowed to improve classification accuracy with regards to conventional approaches. The use of few labeled samples from the target domain was sufficient to achieve this improvement. The amount of unlabeled data used was mostly interesting for few labeled data. However, unlabeled data only holds spectral information whereas labeled data contains both spectral and correspondence information. As the two images were acquired by the same sensor in a very small time shift, the sensitivity of the approach with respect to these two kind of information could not be properly evaluated. This opens the field to further analysis, such as multi-temporal approach or the use of multi-sensor datasets. The approach also showed an appreciation of the latent space’s dimension. However, results could not define an optimal case and this element still needs to be studied for a better understanding of manifold alignment performance in domain adaptation.

A

A N N E X E S

a.1 dot product correction graphs

Figure 21: Projection of the two domains on the dimension 6-8 without correction

Figure 22: Projection of the two domains on the dimension 6-8 with correction

A.2 classification of the two images 47

Figure 23: Classification of whole X2with NB, scenario D

B I B L I O G R A P H Y

[1] Joshua B. Tenenbaum, Vin de Silva, and John C. Langford3. A global geometric framework for nonlinear dimensionality reduction. Science, 290:2319–2323, 2000.

[2] Chang Wang, Peter Krafft, and Sridhar Mahadevan. Manifold Learning Theory and Applications, chapter Manifold Alignment. CRC Press Inc, 2011.

[3] Chang Wang and Sidhar Mahadevan. Heterogenous domain adaptation using manifold alignment. Proceedings of the Twenty-Second international joint conference on Artificial Intelli- gence, 2:1541–1546, 2011.

[4] Fabio Pacifici, Jocelyn Chanussot, and Qian Du. 2011 grss data fusion contest : Exploiting worldview-2 multi-angular acquisi- tions. In IGARSS 2011, 2011.

[5] Clement Chatelain. Les support vector machine (svm). Techni- cal report, 2003.

[6] Gong Jianya, Sui Haigang, Ma Guorui, and Zhou Qiming. A review of multi-temporal remote sensing data change detection algorithms. ISPRS Congress Beijing, 27:757–762, 2008.

[7] Nathan Longbotham, Chuck Chaapel, Laurence Bleiler, Chris Padwick, William J. Emery, and Fabio Pacifici. Very high resolu- tion multiangle urban classification analysis. IEEE Transactions on Geoscience and Remote Sensing, 50:1155–1170, 2012.

[8] David J. Diner, Gregory P. Asner, Roger Davies, Yuri Knyazikhin, Jan-Peter Muller, Anne W. Nolin, Bernard Pinty, Crystal B. Schaaf, and Julienne Stroeve. New direction in earth observing scientific application of multiangle remote sensing. Bulletin of the American Meteorological Society, 1999.

[9] Sicong Liu and Peijun Du. Multi-angular satellite remote sensing and forest inventory data for carbon stock and sink capacity in the eastern united stats forest ecosystems. ISPRS Journal of Photogrammetry & Remote Sensing, 2004.

[10] Stavros Stagakis, Nikos Markos, Olga Sykioti, and Aris Kyparis- sis. Monitoring canopy biophysical and biochemical parameters in ecosystem scale using satellite hyperspectral imageery : An application on a phlomis fruiticosa mediterranean ecosystem using multiangular CHRIS/PROBA obersvations. Remote Sensing of the Environment, 114:977–994, 2010.

[11] Riccardo Duca and Fabio Del Frate. Hyperspectral and multi-

In document Domain adaptation using manifold alignment (Page 41-62)