Procedia Computer Science 46 ( 2015 ) 1817 – 1826
1877-0509 © 2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Peer-review under responsibility of organizing committee of the International Conference on Information and Communication Technologies (ICICT 2014) doi: 10.1016/j.procs.2015.02.140
ScienceDirect
International Conference on Information and Communication Technologies (ICICT 2014)
Spatial Preprocessing Based Multinomial Logistic Regression For
Hyperspectral Image Classification
Nidhin Prabhakar T V
a*, Gintu Xavier
a, Geetha P
a, Soman K P
a aCentre for Excellence in Computational Engineering and Networking, Amrita Vishwa VidyapeethamCoimbatore, India
Abstract
The paper presents a fast, reliable and efficient method for improving hyperspectral image classification aided by segmentation. The Multinomial Logistic Regression(MLR) algorithm can be extended to a semi-supervised learning of the posterior class distribution using unlabeled samples actively selected from the dataset. Classification results obtained from regression model is improved by performing a maximum a posteriori segmentation as it considers the spatial information of the hyperspectral image. The addition of the spatial processing step prior to the above mentioned classification scheme improves the overall accuracy of the process. The accuracies obtained before and after applying the preprocessing are compared.
© 2014 The Authors. Published by Elsevier B.V. Peer-review under responsibility of organizing committee of the International Conference on Information and Communication Technologies (ICICT 2014).
Keywords: diffusion; hyperspectral image classification; multinomial logistic regresion; semisupervised learning; hyperspectral image
segmentation.
1. Introduction
Advances in hyperspectral sensing technologies and computational mathematics has opened up new opportunities in earth remote sensing and related applications. Land cover mapping, target detection, material identification, precision agriculture, abundance estimation etc are few applications that utilize the large wealth of information
obtained through earth remote sensing1,2,3. The huge chunk of data obtained from a hyperspectral sensor need to be
* Corresponding author. Tel.: +91-9446471110.
E-mail address:[email protected]
© 2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
converted to a tractable form of information for easy storage, data handling and ease of interpretation and drawing conclusions.
Classification is one such way of information extraction in data mining, where each constituent voxels in the image are identified and a corresponding thematic map is generated. The quality of further analysis and decision making for an application depends on the accuracy of the classification performed. The hyperspectral images are inherent to noises which may hinder proper classification and a preprocessing step becomes necessary to reduce the noise corruption. A band-by-band nonlinear diffusion applied prior to classification ensures increased class separability and noise reduction. This step ensures that the noise in the data is not propagated along a cascaded processing scheme.
The selection of a classification strategy should ensure high performance and speed, which in hyperspectral
scenario is a challenging problem2. The numerous contiguous spectral bands and relatively fewer labeled samples
available for training leads to Huges phenomenon3,6. Feature extraction by transformation to subspaces, band
selection etc are ways of reducing the curse of dimensionality. But these steps are time consuming and may result in
loss of information, affecting the classification accuracy2.
A deterministic classification approach is a satisfactory solution for the aforementioned bottlenecks. Multinomial logistic regression (MLR) is such a deterministic approach which directly learns and models the posterior class probabilities in Bayesian framework without learning the class-conditional densities, facilitating classification with
smaller training set and less complexity5,6,7,8. A semi-supervised scheme with active learning incorporates unlabeled
samples along with the labeled samples for training, solving the problem of limited labeled samples for training the
model6.
The classification accuracy of the above model can be further improved by incorporating the spatial information
of the hyperspectral image along with its spectral information6, 9,10. Hyperspectral segmentation using the posterior
class distribution modeled by the MLR, and the spatial information modeled by multilevel logistic (MLL) prior, offers better accuracy. The final segmentation is obtained through maximum a posteriori (MAP) segmentation computed using α–expansion min-cut based optimization technique.
2. Proposed Method
Flow diagram of the proposed method is as shown in Fig.1. Descriptions of the steps involved are explained in the following sections.
3. Denoising
The hyperspectral images are inherent to noises as hyperspectral sensors makes noises due to photon effect, sensor distortions, calibration etc. The aim is to enhance the quality of the image by selecting a denoising technique, so as to smoothen the homogenous regions of the imagery without affecting the significant edge information for better classification accuracy.
For a 2-D image X, PDE based general diffusion equation is represented as
( ( , , ))X x y t div c x y t X x y t( ( , , ) ( , , )) (1)
t
where c x y t the diffusion coefficient, diffusivity, t denotes the number of iterations and represents the ( , , )
gradient operator12,13,14. This diffusion is linear and space invariant when c x y t is constant, but has the ( , , )
disadvantage of blurring the edges resulting in the loss of significant information during denoising.
In order to reduce the smoothing effect along the edges, Perona and Malik proposed a diffusion coefficient,
based on the gradient of the image at time t13. This edge seeking function is given as
2 || || (|| ||) (2) X k c X e
where || X and k makes the diffusion equation non-linear and space variant. The value of k controls the || sensitivity to the edges. A larger value of k implies larger c and the edges will be heavily blurred on convolution, whereas a smaller k value leads to slower diffusion. The value of k is usually chosen experimentally.
Fig. 1. Flow graph of the proposed method
The norm of gradient || X is larger at the edges and smaller at the homogenous regions, whereby ensuring ||
less blurring at edges and more smoothing effect at homogenous regions. Hence the Perona-Malik non-linear diffusion(when applied to each band of the hyperspectral imagery) improves the image quality by modifying the spatial content (spectral information is not considered).
For a hyperspectral image, X s( ) (X s1( ),...,X sd( ))T ; s ( , )x y , with n pixels in each of the d bands, the
evolution of the PDE in (1) for the multi-scale representation of the image is given as12
i ( ) (3)
i
X
div c X t
where i =1,2,…., d. The sequential denoising of individual band is followed by concatenation to form the hyperspectral image cube and the feature vector formation by reshaping the hypercube. Classification is performed
MLR Modeling and Regression Coefficient
Estimation Hypercube & Feature Vector
Formation
Learning Posterior Probability Distribution
Graph-Cut based Segmentation
Land Cover Map Training Phase Testing Phase Sp atia l P re pro ce ss in g Su pe rv ise d Cl assific atio n Se gm en ta tio n
Load Hyperspectral Image Data
Band-by-Band Perona-Malik Diffusion
on this pre-processed image.
4. Classification: A Deterministic Approach 4.1. Problem Statement
Let the image be represented asx ( ,...,x1 xn , where the ) n-pixels are represented as d-dimensional feature vectors.
{1,..., }
I n , be the set of integers indexing the n-pixels of the hyperspectral image (or n feature vectors) and
{1,..., }
L K be the set of Kclass labels. Fori I, let the image of class labels be represented asy ( ,...,y1 yn), where
yi L denotes class label for the ith pixel. The aim of hyperspectral image classification is to infer yi Lfor alli,
from the feature vector xi and to generate a 2-D image of class labels representing the class information of the
original d-dimensional hyperspectral image. 4.2. MLR Modelling and Regressor Estimation
The aim of MLR based supervised learning algorithm is to design a classifier, based on L labeled training samples ( L I ), that is capable of distinguishing Kclasses when feature vector(s) (or a basis function computed from the observed feature) is given as the input for classification. The algorithm used in this paper includes a training phase
for estimating the regressors w and a testing phase for finding posterior probability of each feature vectors from
which the class labels are inferred6,9,15.
4.2.1. Training-Supervised
Let the L training samples with known class labels be represented as a training setDL {( , ),...,( ,x y1 1 x yL L)}. The
posterior class distributions under this MLR model are calculated for the MAP estimation of multinomial logistic
regressors w(initial w is an assumption)16. The general MLR model representation9,15 is:
( ) ( ) 1 exp( ) ( | , ) (4) exp( ) T T k i i i K k i k w x P y k x w w x
where the right hand side represents the probability that x belongs to class k for k L .i w( )k is the set of logistic
regressors for class k where w (w(1),...,w(K 1)) and w( )K is usually set to ’0’ as the conditional probabilities for
Kth class can be directly found by subtracting the sum of (K 1) conditional class16.
1
( ,... )t
x x x , represents the feature vectors selected to train the model. Non-linear functions like symmetric kernels improve the data separability
in the transformed space17 and hence are ideal for hyperspectral image classification. This paper uses a Gaussian
Radial Basis Function represented as K x x( ,i j) exp( ||xi xj|| /(22 2)) for representing the training vectors17.
The expressions for the MAP estimation of wusing expectation maximization (EM) and block Gauss-Seidel
iterative algorithm is briefed below (detailed explanation in9,11,18).
In Bayesian approach, the posterior density of w with Y and L X as the set of labels and set of feature vectors L
in the given labeled training samples respectively, is given as
( |p w Y XL, L) p Y X w p w X( L| L, ) ( | L) (5)
The regressorswcan be estimated as the value of wthat maximizes the conditional log data likelihood. This
MAP estimate is given by
ˆ arg max{ ( ) log ( | L)} (6)
w
w l w p w X
1 ( ) ( ) 1 1 ( ) log ( | , ) log ( | , ) ( i log ( exp( )) (7) L L L i i i L K y T T j i i i j l w p Y X w p y x w x w x w ( | L)
p w X is the prior on wand acts as the regularizers for promoting better solutions during the testing phase11.
The Gaussian prior used is
( | ) exp{ 1 } (8)
2
T
p w w w
The MAP estimation is performed using an expectation-maximization algorithm9,19. EM is an iterative procedure
where the E-step (for mean) and M-step (for maximizing) are performed during iterations. Specifically the tth
iteration can be briefed as E-step ( | t) [log ( , | L) | t] (9) Q w w E p w D w M-step 1 arg max ( | ) (10) t t w w Q w w
Detailed explanations and algorithms are presented in9,11,18,19.
4.2.2. Training-Semisupervised
For cases where the number of labeled samples available is less, a larger set of unlabeled samples, 1,....,
{ }
U L L U
X x x and a smaller set of labeled samples, DL {( , ),...,( ,x y1 1 x yL L)}form the training set. Methods
like random selection and maximum entropy are used6 to actively select the unlabeled samples from the dataset. The
MAP estimation of the regressors follows the same formulation as in supervised except for the modified training set
representation during the iterations6.
The dataset used in this paper being AVIRIS Indian Pines with all the label information, this semi-supervised learning approach is not given importance in the performance evaluation of adding the preprocessing step.
4.2.3. Testing
The estimated regression coefficients along with the testing samples are fed to the model for determining the posterior class densities of each feature vectors, in all the K classes. The testing data can be the entire feature vectors including the training samples or the feature vectors excluding the training samples. The index corresponding to the maximum posterior class probability for a feature vector represents the class label for that particular vector. Likewise the complete classification map can be generated.
5. Segmentation 5.1. Problem Statement
The hyperspectral earth images are very likely to have similarity among the neighbouring pixels, and exploiting this peculiarity by segmenting the similar regions improves the classification performance. For the image
represented asx ( ,...,x1 xn with ) I {1,..., }n as the set of integer indexes and L {1,..., }K as the set class labels,
such that pixels in each regions are similar in some sense. This process incorporating the contextual information improves the overall classification accuracy.
The MAP segmentation under the Bayesian framework is given as9
1
ˆ arg max (log ( | ) log ( )) log ( ) (11)
n i i i y L i y p y x p y p y Or equivalently 1
ˆ arg min (log ( | ) log ( )) log ( )ˆ (12)
n
i i
y L i
y p y w p y p y
where p y w is outcome of classification step and ( )( | )i ˆ p y is the MLL prior.
5.2. MLL Prior Estimation
MLL prior promotes a piecewise smooth segmentation and the segmentation assumes the adjacent pixels to be in
same class.This Markov Random Field (MRF) prior is widely used in segmentation21. The prior for segmentation
has a Gibbs’s distribution21,23 and is represented as
( ) 1 c C c( ) (13)
V y
p y e
Z
where Z is normalizing constant, V y is the prior potentials corresponding to the set of cliques over the imagec( ) 6,
21, 28.
Equation (13) can be rewritten as
( , ) ( ) 1 ( ) (14) yi i j i I i j C V y y p y e Z
where (yi y is the unit impulse function representing the pairwise interaction and controls the level of j)
smoothness for segmentation6.
5.3. Graph-cut Segmentation
From (12) and (14), the MAP estimation can be finally given as6
1
( , )
1 ( , )
ˆ arg min{ (log ( | ) log ( ))ˆ
( log ( ) ( ))}
ˆ
arg min log ( | ) ( )
n i i y L i i i j i I i j C n i i j y L i i j C y p y w p y p y y y p y w y y (15)
Equation (15) is an integer optimization problem. Satisfactory approximation to this MAP segmentation problem can be obtained by mapping it on to a graph and performing -expansion algorithm through min-cut computations
24-27.
6. Experimental Results
This paper claims a fast, reliable and efficient method for improving the supervised classification aided by
segmentation6 of the Indian Pines scene. The justification is provided in the following sections by furnishing the
6.1. Dataset description
The data collected by AVIRIS system, operated by NASA Jet Propulsion Laboratory, over the Northwestern Indiana in 1992 is used for validating the result (Fig. 2.).
a) b)
Fig. 2. (a) AVIRIS data scene (b) Ground truth image with class descriptions
The Indian Pines scene comprises of 224 spectral channels (bands) in the wavelength range of 400-2500nm, with nominal spectral resolution of 10nm and spatial resolution of 20m per pixel. The image has 145x145 pixels per band. Fig. 2(a) shows the AVIRIS Indian Pines data scene and Fig. 2(b) shows the labeled ground truth information of the 16 mutually exclusive classes of which ten corresponds to different crops, five represents vegetation types and one represent building. The background is represented as white and is not considered during classification.
6.2. Accuracy Assessment Measures
The best way of representing the classification outcome is through confusion or error matrix. But, as a common measure of accuracy for classification and segmentation stages, statistical parameters are chosen. They are: Overall Accuracy(OA), Class Accuracy, Average Accuracy, Kappa index etc. In this paper we consider only the Overall Accuracy.
Overall Accuracy Total number of correctly classified pixels
Total number of pixels
Since the method used in this paper is probabilistic, Monte Carlo (MC) runs are performed to obtain a stable result. 5 MC runs are used in this paper. Hence, mean of Overall Accuracies after all the MC runs are computed for evaluation.
6.3. Results and Discussions
First, the Perona-Malik diffusion algorithm is applied band-by-band, to the Indian Pines image. Number of iterations, step size and diffusion control parameter k are chosen experimentally to give best result in the classification stage. Three iterations with step size less than 0.2 and k=0.1 is chosen by trial and error method for this experiment. The outcome of this step for a single band is as shown in Fig. 3.
Each diffused image (145x145) is concatenated to form the hypercube (145x145x220) and is reshaped into a
matrix (220x21025) in which, each column represents feature vectors. The 20 noisy and water absorption bands29
are removed to form 200x21025 matrix. This matrix is saved as a ‘.mat’ file which is loaded for the classification in the proposed method.Fig. 4 shows the classification and segmentation results obtained before and after applying preprocessing step. The parameters chosen for the supervised MLR based classification and segmentation are same for both methods (with and without preprocessing) except for the dataset used. The MLR probabilistic model being
predictive7 will generate results with slight variation on each run. Hence, Monte Carlo runs are performed for
getting more accurate and stable values as output. Linear and radial basis function (RBF) kernel learning methods are used for comparison and for deriving a fast and reliable method. For the training phase, 500 training samples (20
from each class) are selected. As the number of training samples increases, the accuracy also increases. Due to space
limit, only the most significant outcomes are furnished. The smoothness parameter μ chosen for segmentation is 46.
For testing, all the feature vectors are used along with the training samples, since method6 is fast and predictive.
a) b)
Fig. 3. Effect of Perona-Malik diffusion (a)Original Band 3 (b) Denoised and smoothened Band 3 after diffusion
Table 1 shows the comparison of the Mean Overall Accuracy (MOA) for different learning methods, before and after preprocessing. From Table 1 it can be inferred that, the proposed method offers better accuracies than the method without preprocessing. Also, the linear method gains accuracies equal to or greater than that of RBF method upon preprocessing. Table 1 also shows that, the computational time taken by linear method (using feature vectors directly) is much lesser than that of RBF method because RBF computes the function values among all feature vectors. In this case, the matrix dimension for w is 501x15, whereas for linear, w is 201x15.
Table 1. Comparison of the performance of proposed and existing method (MC runs=5) Method Linear Kernel RBF Kernel
Computation
Time (mins) Classification (MOA %) Segmentation (MOA %) Computation Time (mins) Classification (MOA %) Segmentation (MOA %) Without
spatial
processing 1.30 50.30 73.50 7.04 74.87 83.47 With spatial
processing 2.05 86.37 90.62 7.45 86.32 90.31
Hence, the preprocessing makes the linear learning method for MLR based classification aided by segmentation, the fast, reliable (since MC runs can be increased to get more consistent result as it fast) and accurate method, that this paper claimed.
7. Conclusion
A spatial preprocessing step prior to MLR based hyperspectral image classification, for improving the overall performance was proposed in this paper. The experimental results and discussions prove the efficiency of the proposed method. Through comparisons it is clear that, when the proposed preprocessing step is applied, linear kernel learning method becomes the fast, reliable and efficient method that improves the classification and segmentation results when applied to AVIRIS Indian Pines dataset.
Acknowledgements
The authors would like to thank Ms. Jun Li, Dr. Jose M. Bioucas Dias and Antonio Plaza for sharing the demo codes
of MLR based hyperspectral classification and segmentation along with the dataset files16. We also thank Mrs.
a) b)
c) d)
Fig. 4. (a)-(b) Classification map (OA 53%) and Segmentation map (OA 70%) before preprocessing resp. (c)-(d)Classification map (OA 86%) and Segmentation map (OA 90%) after preprocessing resp.
References
1. Muhammad Ahmad, Sungyoung Lee, Ihsan Ul Haq and Qaisar Mushtaq. Hyperspectral remote sensing: Dimensional reduction and end member extraction. International Journal of Soft Computing and Engineering; May 2012. 2-2.
2. Heesung Kwon, Xiaofei Hu, James Theiler, Alina Zare and Prudhvi Gurram. Algorithms for multispectral and hyperspectral image analysis.
Journal of Electrical and Computer Engineering; 2013.
3. G.Hughes. On the mean accuracy of statistical pattern recognizers. IEEE Trans. on Information Theory; Jan 1968. 14.
4. Gustavo Camps - Valls, Luis Gomez-Chova, Jordi Mu˜noz-Mari, Joan Vila-Frances and Javier Calpe-Maravilla. Composite Kernels for Hyperspectral Image Classification. IEEE Geoscience and Remote Sensing Letters.; Jan 2006. 93 – 97. 3-1.
5. Y. Dan Rubinstein. Discriminative vs. informative learning. Dissertation submitted to Stanford University; Jan 1998.
6. Jun Li, José M. Bioucas-Dias and Antonio Plaza. Semisupervised hyperspectral image segmentation using multinomial logistic regression with active learning.IEEE Trans on Geoscience and Remote Sensing; Nov 2010. 48.
7. Michael I Jordan. On discriminative vs. generative classifiers: A comparison of logistic regression and naïve Bayes. Advances in neural
information processing systems; 2002. 841.
8. V.N Vapnik. Statistical learning theory. John Wiley and Sons; 1998.
9. J. Borges, J. Bioucas-Dias and A. Marçal. Bayesian hyperspectral image segmentation with a discriminative class learning. Proceedings
of.IEEE Int.Geosci.Remote Sens. Symp.; 2007. 3810-3813.
10. Mathieu Fauvel , Jon Atli Benediktsson , Jocelyn Chanussot , Johannes R. Sveinsson. Spectral and spatial classification of hyperspectral data using SVMs and morphological profiles. IEEE trans on Geoscience and Remote Sensing; 2008. 3804-3814:46-11.
11. Balaji Krishnapuram, DavidWilliams, Ya Xue, Alex Hartemink and Lawrence Carin. On Semi-Supervised Classification. Advances in
neural information processing systems; 2004. 721-728.
12. Kavitha Balakrishnan, Sowmya V and Dr. K.P. Soman. Spatial Preprocessing for Improved Sparsity Based Hyperspectral Image Classification. International Journal of Engineering Research & Technology; July 2012. 1-5.
13. P. Perona and J. Malik. Scale-space and edge detection using anisotropic diffusion. IEEE Trans. Pattern Anal. Mach. Intell.; Jul. 1990.
629-639:12- 7.
14. S. K. Weeratunga and C. Kamath. A Comparison of PDE-based Nonlinear Anisotropic Diffusion Techniques for Image Denoising. Image
Processing: Algorithms and Systems II, SPIE Electronic Imaging.Santa Clara; Jan 2003.
15. D.Böhning. Multinomial logistic regression algorithm. Ann. Inst. Stat. Math; Mar. 1992. 197-200:44-1. 16. http://www.lx.it.pt/~jun/#code.
17. G. Camps-Valls and L. Bruzzone. Kernel-based methods for hyperspectral image classificaiton. IEEE Trans. Geosci. Remote Sens.; Jun. 2005. 1351-1362:43-6.
18. J.Bernardo and A. Smith. Bayesian Theory. Wiley; 1994.
19. A.Dempster, N.Laird and D. Rubin. Maximum likelihood from incomplete data via the EM algorithm. J.R. Stat.Soc.; 1977. 1-38:39-1. 20. D.MacKay. Information-based objective functions for active data selection. Neural Comput; Jul. 1992. 590-604:4-4.
21. S. Gemen, D. Gemen. Stochastic relaxation Gibbs distribution and Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intell.; Nov.1984. 721-741:6.
22. S.Z.Li. Markov random field modeling in computer vision. Springer-Verlag;1995.
23. J.Besag. Spatial interaction and the statistical analysis of lattice systems. J. R. Stat. Soc B; 1974. 192-236:36-2.
24. Yuri Boykov, Olga Veksler and Ramin Zabih. Fast approximate energy minimization via graph cut. IEEE trans. on PAMI; Nov 2001.
1222-1239:20-12.
25. Vladimir Kolmogorov, Ramin Zabih. What energy functions can be minimized via graph cuts?. IEEE Trans on Pattern Analysis and
Machine Intelligence; Feb 2004. 147-159:26- 2.
26. Yuri Boykov, Vladimir Kolmogorov. An experimental comparison of Min-Cut/Max-Flow algorithms for energy minimization in vision.
IEEE Trans on Pattern Analysis and Machine Intelligence; Sep 2004. 1124-1137:26-9.
27. Shai Bagon. Matlab wrapper for Graph Cut. Available at www.wisdom.weizmann.ac.il/~bagon. December 2006. 28. S.Z Li. Markov Random Field modeling in image analysis. 2nd ed. Springer-Verlag; 2001.
29. Landgrebe, http://dynamo.ecn.purdue.edu/˜biehl/MultiSpec; 1992.
30. Santiago Velasco-Forero, Dr. Vidya Manian. Improving hyperspectral image classification using spatial preprocessing. IEEE Trans.