International Journal of Emerging Technology and Advanced Engineering
Website: www.ijetae.com (ISSN 2250-2459,ISO 9001:2008 Certified Journal, Volume 3, Issue 10, October 2013)
New Fully Automatic Multispectral Image Classification Based on Scatterplot Method
S.M. Ali
Remote Sensing Research Unit, College of Science, University of Baghdad, Iraq, Baghdad, Al-Jaderyia
Abstract— A new multispectral image classification method is presented. The method divides the Near-Infrared ("NIR") versus Visible-Red ("VR") scatterplot diagram into regions according to their reflectance values. The line that best discriminates the soil components from the vegetated area (the soil line) is identified using the least-squares fitting criterion. The vegetation line, which separates the fully vegetated area from the partially vegetated (wet and dry) regions, is then identified as a line parallel to the soil line. Water, wet, and dry areas are separated by lines perpendicular to the soil line. Very encouraging classification results are obtained when compared with those of traditional supervised and unsupervised classification techniques.
Keywords — Image Segmentation, Image Classification, Scatterplot Classification, Scatterplot Line, Multispectral Image Classification
I. INTRODUCTION
Satellite images offer an excellent means of monitoring ongoing and potential ecological threats on the Earth's surface. Fast monitoring requires the implementation of suitable image pattern-recognition techniques. Among these, the automatic classification of remotely sensed data has received the greatest attention in recent years. Land-use classification and detection applications (e.g. crop inventory, crop-disease detection, forestry, and monitoring of water and air pollution) have been the most prominent among them. Several methods have been proposed to classify remotely sensed images; the developed techniques are generally categorized as supervised or unsupervised. Supervised methods use samples of known identity to classify pixels of unknown identity, while unsupervised methods automatically segment the image into spectral classes based on natural groupings found in the data (see, for instance, [1], [2], [3]). Multispectral remote sensing has proven to be a powerful tool for assessing the identity, characteristics, and growth potential of most land-cover materials. For example, water (oceans, seas, lakes, and rivers) has rather low reflectance in both the "NIR" and "VR" spectral bands (at least away from shores) and thus yields very low reflectance values; soils generally exhibit an "NIR" reflectance somewhat larger than their red reflectance; and, because live green plants absorb solar radiation in the photosynthesis process, they appear relatively dark in the photosynthetically active radiation (PAR) spectral region and relatively bright in the "NIR".
By contrast, clouds and snow tend to be rather bright in the "VR" (as well as other visible wavelengths) and quite dark in the "NIR" ([4]). In this research, the "NIR" versus "VR" scatterplot is used to partition remotely sensed images into regions based on their reflectance and absorption behavior. As illustrated in Fig.1, the soil line is defined first; the other land-cover regions (i.e. water, wet and dry vegetation and soils, and the fully vegetated areas) are then distinguished by delineating their separating lines, either parallel or perpendicular to the soil line.
Fig.1 Classification Scheme Based on Scatterplot.
II. SAMPLES OF THE TESTED SCENES
Two available scenes (TM sensor, 30 m/pixel resolution, acquired in 2000) covering different Iraqi regions (shown in Fig.2) were chosen to test the introduced segmentation (or classification) algorithm:
Fig.2(b): Covers about 1802 km², bounded by the following geographic coordinates (UTM/WGS84 system): latitudes 34.354° to 35.050°N and longitudes 43.185° to 43.764°E. The area (as illustrated in Fig.2a) is located west of KARKUK City, in AL-TAMIM, one of the Iraqi provinces. As is evident, the area includes the junction of the Tigris and the Lower Al-Zab rivers, which contains different types of land-cover classes (e.g. water, vegetation, and dry and wet soil areas).
Fig.2(c): The area (as illustrated in Fig.2a) is situated on both banks of the Tigris River, with BALAD City on its north-east bank, belonging to SALAH AD DIN province, and AL-KHLIS City on its south-west bank, belonging to BA'QUBA province. Almost the same land-cover classes as in the previous scene are present in this image.
Fig.2: (a) The Iraqi administrative map showing the locations and territories of the provinces and the positions of the chosen study areas; (b) & (c) the false-color versions of the studied scenes (made by mixing, respectively, the Near-Infrared (band 4), the Visible-Red (band 3), and the Green (band 2) as RGB).
III. SUPERVISED AND UNSUPERVISED CLASSIFICATION METHODS
Generally, classification techniques are categorized into unsupervised and supervised classification methods. Unsupervised classification provides a simple way to segment multispectral data using the data statistics, while in supervised methods the input pattern is identified as a member of a predefined class; supervised classification is therefore much more accurate for mapping classes, but depends heavily on the cognition and skills of the image specialist.
A. Unsupervised Classification or Clustering Methods:
In these methods, the patterns are assigned to unknown classes; e.g. multiband spectral response patterns are grouped into clusters that are statistically separable. Thus, a small range of digital numbers (e.g. in 3 bands) can establish one cluster that is set apart from a specified range combination for another cluster. Separation then depends on the parameters chosen for differentiating the clusters; for details see [5].
The number of clusters can be varied arbitrarily when more bands are involved. Each pixel in an image is assigned to the cluster whose digital-number combination is most similar to its own. Generally, multiple pixels of the same cluster within an area of an image correspond to some ground feature or class, so that the resulting pattern of gray levels forms a new image depicting the spatial distribution of the clusters. These levels can then be assigned colors to produce a cluster map.
The challenge then becomes relating the different clusters to meaningful ground categories. This is done either by being adequately familiar with the major classes expected in the scene or, where feasible, by visiting the scene (ground truth) and visually correlating map patterns with their ground counterparts. Since the classes are not selected beforehand, this approach is called unsupervised classification. Among the many published unsupervised classification techniques, Isodata and K-Means are the most commonly used.
A1. Isodata and K-Means Unsupervised Classification Methods
Fig.3 shows the classification results of Isodata and K-Means performed with different numbers of classes and iterations, using the multispectral images shown in Fig.2 (b & c).
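As an illustration of the clustering idea described above, the following is a minimal K-Means sketch in Python with NumPy. It is not the software used to produce Fig.3: the function names, the random-pixel initialization, and the fixed iteration count are assumptions, and the split/merge rules that distinguish Isodata from K-Means are omitted.

```python
import numpy as np


def _nearest(pixels: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """Index of the closest cluster mean (Euclidean distance) for every pixel."""
    d2 = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(pixels: np.ndarray, k: int, iterations: int = 4, seed: int = 0):
    """Cluster an (N, bands) array of pixel vectors into k spectral classes."""
    rng = np.random.default_rng(seed)
    # Hypothetical initialization: k distinct pixels chosen at random.
    centers = pixels[rng.choice(len(pixels), size=k, replace=False)].astype(float)
    for _ in range(iterations):
        labels = _nearest(pixels, centers)
        for j in range(k):
            members = pixels[labels == j]
            if len(members):  # an empty cluster keeps its previous mean
                centers[j] = members.mean(axis=0)
    return _nearest(pixels, centers), centers
```

For an image stored as a (rows, cols, bands) array `img`, `kmeans(img.reshape(-1, img.shape[-1]), 6)` returns labels that can be reshaped back to (rows, cols) to form the cluster map.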
B. Supervised Classification Methods:
In these classification methods, the input pattern is identified as a member of a predefined class; they are much more accurate for mapping classes, but depend heavily on the cognition and skills of the image specialist. The strategy is simple: the specialist must recognize conventional classes in a scene from prior knowledge (such as personal experience of what is present in the scene). This familiarity allows the individual(s) performing the classification to choose and set up discrete classes and then assign them category names. The resulting training sites are areas, representing each known land-cover category, that appear fairly homogeneous in the image. On the computer display, these sites are located and outlined with polygonal boundaries drawn with the mouse. For each class thus outlined, the mean values and variances of the digital numbers in each band used for classification are calculated from all the pixels enclosed in its sites. More than one polygon is usually drawn for any class; the classification program then clusters the data representing each class. When the digital numbers for a class are plotted as a function of the band sequence, the result is a spectral signature, or spectral response curve, for that class. The spectral signatures so obtained represent all of the materials within the site that interact with the incoming radiation. Supervised classification can then proceed by statistical processing, in which every pixel is compared with the various signatures and assigned to the class whose signature comes closest. A few pixels in the scene may not match any signature and remain unclassified, because they may belong to a class that was not recognized or defined. Fig.4 shows the NIR band of the processed scene of Fig.2 (b) with six discrete classes representing the regions designated by our scatterplot classification method; the number of points enclosed by each selected polygon is shown in the region of interest (ROI) control dialog. The following classical supervised multispectral classification methods are described in most remote sensing textbooks and are commonly available in today's image processing software systems.
B1. Supervised Parallelepiped Classification Method:
This classification method uses a simple decision rule to classify multispectral data. The decision boundaries form an n-dimensional parallelepiped in the image data space. The dimensions of the parallelepiped are defined by a standard-deviation threshold around the mean of each selected class. If a pixel value lies above the low threshold and below the high threshold for all n bands being classified, it is assigned to that class.
If the pixel value falls in multiple classes, the pixel is assigned to the last class matched. Areas that do not fall within any of the parallelepipeds are designated as unclassified. In other words, every signature has high and low limits in every band, and a pixel whose data file values lie between the limits in every band of a signature is assigned to that signature's class. For the technical details of the two-dimensional parallelepiped classification technique see [7]. The performance of the parallelepiped classifier on the multispectral images (shown in Fig.2 b & c) is shown in Fig.5.
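The box test and the "last class matched" rule can be sketched in Python as follows. This is a hedged illustration, not the software used for Fig.5: the threshold of `k` standard deviations (2.0 by default here) and the function names are assumptions, and `-1` marks unclassified pixels.

```python
import numpy as np


def train_boxes(samples, k: float = 2.0):
    """Low/high limits per class: mean -/+ k standard deviations in every band.

    samples: list of (n_c, bands) arrays of training (ROI) pixels, one per class.
    """
    boxes = []
    for s in samples:
        mu, sd = s.mean(axis=0), s.std(axis=0, ddof=1)
        boxes.append((mu - k * sd, mu + k * sd))
    return boxes


def parallelepiped_classify(pixels: np.ndarray, boxes) -> np.ndarray:
    """Assign each (bands,) pixel vector to the last box that contains it."""
    labels = np.full(len(pixels), -1, dtype=int)  # -1 = unclassified
    for c, (low, high) in enumerate(boxes):
        inside = np.all((pixels >= low) & (pixels <= high), axis=1)
        labels[inside] = c  # overlapping boxes: the last class matched wins
    return labels
```

Because later classes overwrite earlier ones, the order of the training classes matters wherever two parallelepipeds overlap.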
B2. Supervised Minimum Distance Classification Method:
The minimum distance decision rule (also called spectral distance) calculates the spectral distance between the measurement vector for the candidate pixel and the mean vector for each signature, for the details see [8]. The equation for classifying by spectral distance is based on the equation for Euclidean distance ([8], and [9]):
$$SD_{xyc} = \sqrt{\sum_{i=1}^{n} \left(\mu_{ci} - X_{xyi}\right)^{2}} \qquad (1)$$
Where: $n$ is the number of bands, $i$ a particular band, $c$ a particular class, $X_{xyi}$ the data file value of pixel $x, y$ in band $i$, $\mu_{ci}$ the mean of the data file values in band $i$ for the sample of class $c$, and $SD_{xyc}$ the spectral distance from pixel $x, y$ to the mean of class $c$.
The performance of the Minimum Distance classifier on the multispectral images (shown in Fig.2 b & c) is shown in Fig.6.
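Eq. (1) vectorizes directly. The sketch below, with the hypothetical function name `minimum_distance`, computes the spectral distance from every pixel to every class mean and picks the smallest; in practice the class means would come from the training sites.

```python
import numpy as np


def minimum_distance(pixels: np.ndarray, means: np.ndarray):
    """Eq. (1): SD[p, c] = sqrt(sum_i (mu_ci - X_pi)^2); assign the class with the lowest SD.

    pixels: (N, bands) measurement vectors; means: (classes, bands) class mean vectors.
    """
    sd = np.sqrt(((means[None, :, :] - pixels[:, None, :]) ** 2).sum(axis=2))
    return sd.argmin(axis=1), sd


labels, sd = minimum_distance(np.array([[12.0, 9.0], [45.0, 60.0]]),
                              np.array([[10.0, 10.0], [50.0, 50.0]]))
```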
Fig.3 panels (Isodata): 6 classes, 1 iteration; 6 classes, 4 iterations; 10 classes, 4 iterations.
Fig.3 panels (K-Means): 6 classes, 4 iterations; 10 classes, 4 iterations.
Fig.3: The classification results obtained by implementing the Isodata and K-Means classification methods on the images shown in Fig.2 (b & c).
Fig.4: illustrates the selected ROI and the controls dialog representing the selected points within each polygon.
Fig.5: illustrating the performance of the supervised parallelepiped classification method.
B3. Supervised Mahalanobis Distance Classification Method:
The Mahalanobis distance algorithm assumes that the histograms of the bands have normal distributions. If this is not the case, better results may be obtained with the parallelepiped or minimum distance decision rule, or by performing a first-pass parallelepiped classification. This classifier is similar to minimum distance, except that the covariance matrix is used in the equation; i.e. variance and covariance are taken into account, so that clusters that are highly varied lead to similarly varied classes, and vice versa. For example, when classifying urban areas (typically a class whose pixels vary widely), correctly classified pixels may be farther from the mean than those of a water class, which is usually not highly varied. The equation for the Mahalanobis distance classifier is as follows ([8], [9]):
$$D = \left(X - M_c\right)^{T}\,\mathrm{Cov}_c^{-1}\,\left(X - M_c\right) \qquad (2)$$

Where: $D$ is the Mahalanobis distance, $c$ a particular class, $X$ the measurement vector of the candidate pixel, $M_c$ the mean vector of the signature of class $c$, $\mathrm{Cov}_c$ the covariance matrix of the pixels in the signature of class $c$, $\mathrm{Cov}_c^{-1}$ the inverse of $\mathrm{Cov}_c$, and $T$ the transposition function. The pixel is assigned to the class $c$ for which $D$ is the lowest.
The performance of the Mahalanobis classifier on the multispectral images (shown in Fig.2 b & c) is shown in Fig.7.
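A minimal NumPy sketch of the Mahalanobis rule follows, assuming each class has enough training pixels for an invertible covariance matrix; the function name `mahalanobis_classify` and the toy training sets are hypothetical. The quadratic form is computed without a square root, which does not change which class is lowest.

```python
import numpy as np


def mahalanobis_classify(pixels: np.ndarray, samples):
    """D = (X - M_c)^T Cov_c^-1 (X - M_c) per class; assign the class with the lowest D.

    samples: list of (n_c, bands) training-pixel arrays, one per class.
    """
    columns = []
    for s in samples:
        mean = s.mean(axis=0)                              # M_c
        inv_cov = np.linalg.inv(np.cov(s, rowvar=False))   # Cov_c^-1
        diff = pixels - mean                               # X - M_c for every pixel
        columns.append(np.einsum("ij,jk,ik->i", diff, inv_cov, diff))
    d = np.stack(columns, axis=1)
    return d.argmin(axis=1), d


class_a = np.array([[0, 0], [2, 0], [0, 2], [2, 2]], dtype=float)          # tight class
class_b = np.array([[10, 10], [14, 10], [10, 14], [14, 14]], dtype=float)  # more varied class
labels, d = mahalanobis_classify(np.array([[1, 1], [12, 12], [5.5, 5.5]], dtype=float),
                                 [class_a, class_b])
```

Here `labels` is `[0, 1, 1]`: the pixel at (5.5, 5.5) is nearer to the mean of `class_a` in Euclidean terms, but it is assigned to the more varied `class_b`, which is the behavior described above.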
Fig.6 illustrating the performance of the supervised Minimum Distance classification method
Fig.7 illustrating the performance of the supervised Mahalanobis Distance classification method
B4. Supervised Maximum Likelihood Classification Method:
The maximum likelihood decision rule is based on the probability that a pixel belongs to a particular class. This method assumes that the statistics of each class in each band are normally distributed and calculates the probability that a given pixel belongs to a specific class. If this is not the case, better results may be obtained with the parallelepiped or minimum distance decision rule. Unless a probability threshold is selected, all pixels are classified; i.e. each pixel is assigned to the class with the highest probability (for details see [7]). However, if there is a priori knowledge that the probabilities are not equal for all classes, weight factors can be specified for particular classes. This variation of the maximum likelihood decision rule is known as the Bayesian decision rule ([10]). The equation for the maximum likelihood/Bayesian classifier is as follows:
$$D = -\ln\left(a_c\right) + 0.5\,\ln\left(\left|\mathrm{Cov}_c\right|\right) + 0.5\,\left(X - M_c\right)^{T}\,\mathrm{Cov}_c^{-1}\,\left(X - M_c\right) \qquad (3)$$
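Where: $D$ is the weighted distance (likelihood), $a_c$ the a priori probability (weight factor) of class $c$, equal for all classes when no prior knowledge is available, $\ln$ the natural logarithm function, and all other symbols (i.e. $X$, $M_c$, $\mathrm{Cov}_c$, $\mathrm{Cov}_c^{-1}$, and $T$) as in Eq. (2).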
Again, the pixel is assigned to the class $c$ for which $D$ is the lowest. The performance of the maximum likelihood classifier on the multispectral images (shown in Fig.2 b & c) is shown in Fig.8.
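The following sketch evaluates the weighted distance D = −ln(a_c) + 0.5 ln|Cov_c| + 0.5 (X − M_c)^T Cov_c^{-1} (X − M_c), i.e. the negative log of the prior-weighted Gaussian likelihood up to a constant, so that the class with the lowest D wins. The function name and the default of equal priors are assumptions; passing unequal `priors` gives the Bayesian variant.

```python
import numpy as np


def max_likelihood_classify(pixels: np.ndarray, samples, priors=None):
    """Weighted distance D per class; the pixel goes to the class with the lowest D.

    priors: a_c weight factors (Bayesian rule); equal weights if None.
    """
    k = len(samples)
    a = np.full(k, 1.0 / k) if priors is None else np.asarray(priors, dtype=float)
    columns = []
    for s, a_c in zip(samples, a):
        mean = s.mean(axis=0)
        cov = np.cov(s, rowvar=False)
        _, log_det = np.linalg.slogdet(cov)                # ln|Cov_c|
        diff = pixels - mean
        quad = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
        columns.append(-np.log(a_c) + 0.5 * log_det + 0.5 * quad)
    d = np.stack(columns, axis=1)
    return d.argmin(axis=1), d


class_a = np.array([[0, 0], [2, 0], [0, 2], [2, 2]], dtype=float)
class_b = np.array([[10, 10], [14, 10], [10, 14], [14, 14]], dtype=float)
pixels = np.array([[1, 1], [12, 12], [5.5, 5.5]], dtype=float)
equal, _ = max_likelihood_classify(pixels, [class_a, class_b])
weighted, _ = max_likelihood_classify(pixels, [class_a, class_b], priors=[0.999999, 0.000001])
```

With equal priors `equal` is `[0, 1, 1]`; the heavy prior on the first class pulls the ambiguous third pixel back, so `weighted` is `[0, 1, 0]`.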
Fig.8 illustrating the performance of the supervised Maximum Likelihood classification method
IV. THE AUTOMATIC CLASSIFICATION METHOD
It is well known that soil exhibits a linear relationship between its "NIR" and "VR" reflectances. The extent of this linear relationship between the "NIR" and "VR" responses is affected by the soil's dryness or wetness; i.e. it is short for homogeneous soils and extends as the soil contents vary ([1]). Therefore, once the soil line is defined, the corresponding reflectance regions of the other spectral classes can be determined by the following operational steps:
Step1: The "NIR" and "VR" spectral bands are first normalized to the reflectance ranges [0, 109] and [0, 126], respectively.
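The paper does not state how this normalization is performed; the sketch below assumes a linear min-max stretch of each band onto its target range. The function name `normalize_band` and the random stand-in bands are hypothetical.

```python
import numpy as np


def normalize_band(band: np.ndarray, upper: float) -> np.ndarray:
    """Linearly stretch a band so that its minimum maps to 0 and its maximum to `upper`."""
    band = band.astype(float)
    lo, hi = band.min(), band.max()
    if hi == lo:  # constant band: nothing to stretch
        return np.zeros_like(band)
    return (band - lo) / (hi - lo) * upper


rng = np.random.default_rng(0)
raw_nir = rng.integers(0, 256, size=(4, 4))  # stand-in for TM band 4
raw_vr = rng.integers(0, 256, size=(4, 4))   # stand-in for TM band 3
nir = normalize_band(raw_nir, 109.0)  # "NIR" -> [0, 109]
vr = normalize_band(raw_vr, 126.0)    # "VR"  -> [0, 126]
```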
Step2: The extreme scatterplot points (i.e. those at the shortest and farthest distances from the origin) are determined and denoted Min{Dis} and Max{Dis}, respectively, where:

$$Dis = \sqrt{\left(VR(x)\right)^{2} + \left(NIR(y)\right)^{2}} \qquad (4)$$

Step3: The soil line points can be determined using the straight-line equation:

$$y = mx + b \qquad (5)$$

Where: $b$ is the intercept of the straight line with the "NIR"-axis, and $m$ is the slope of the straight line, given by:
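$$m = \frac{NIR(Max) - NIR(Min)}{VR(Max) - VR(Min)} \qquad (6)$$

where $(VR(Min), NIR(Min))$ and $(VR(Max), NIR(Max))$ are the scatterplot points at Min{Dis} and Max{Dis}.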
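Steps 2 and 3 can be sketched as follows, taking x = VR and y = NIR for every scatterplot point. The intercept b is not given explicitly in the text; here it is assumed to be obtained by forcing the line through the Min{Dis} point. The function name is hypothetical, and the sketch assumes the two extreme points have different VR values (otherwise the slope of Eq. (6) is undefined).

```python
import numpy as np


def initial_soil_line(vr: np.ndarray, nir: np.ndarray):
    """Steps 2-3: straight line through the points nearest to and farthest from the origin."""
    x, y = vr.ravel().astype(float), nir.ravel().astype(float)
    dis = np.hypot(x, y)                                # Eq. (4): distance from the origin
    i_min, i_max = dis.argmin(), dis.argmax()
    m = (y[i_max] - y[i_min]) / (x[i_max] - x[i_min])   # Eq. (6)
    b = y[i_min] - m * x[i_min]                         # assumed: line passes through Min{Dis}
    return m, b


m, b = initial_soil_line(np.array([1.0, 5.0, 10.0]), np.array([2.0, 7.0, 12.0]))
```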
Step4: The upper and lower boundaries of the soil points can be determined using:

$$y_{upper} = mx + b + T_{1}, \qquad y_{lower} = mx + b - T_{1} \qquad (7)$$

Our experimental results showed that the best value is $T_{1} = 10$.
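Step5: The soil line can now be defined from the scatterplot points enclosed by the boundaries of Eq. (7), using the linear least-squares method:

$$NIR(y) = m\,VR(x) + b \qquad (8)$$

Where: $m$ and $b$ are now the least-squares slope and intercept of the $N$ enclosed points $(x_j, y_j)$, with $x = VR$ and $y = NIR$:

$$m = \frac{N\sum x_j y_j - \sum x_j \sum y_j}{N\sum x_j^{2} - \left(\sum x_j\right)^{2}}, \qquad b = \frac{\sum y_j - m\sum x_j}{N}$$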
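Steps 4 and 5 can be sketched as below: keep the points within ±T1 of the initial line, then refit by ordinary least squares (NumPy's `polyfit` with degree 1 returns the slope and intercept of Eq. (8)). The function name is hypothetical; returning the means of the retained VR and NIR values is an assumption made here so that they can serve as the point (x1, y1) of Step 6.

```python
import numpy as np


def refine_soil_line(vr, nir, m, b, t1=10.0):
    """Steps 4-5: least-squares soil line from the points within +/- t1 of y = m*x + b."""
    x, y = np.ravel(vr).astype(float), np.ravel(nir).astype(float)
    line = m * x + b
    inside = (y >= line - t1) & (y <= line + t1)          # Eq. (7) band
    m_fit, b_fit = np.polyfit(x[inside], y[inside], 1)    # Eq. (8)
    return m_fit, b_fit, x[inside].mean(), y[inside].mean()


vr = np.arange(50.0)
nir = 2.0 * vr + 3.0
nir[10] = 200.0                   # an outlier far above the soil band
m_fit, b_fit, x1, y1 = refine_soil_line(vr, nir, m=2.0, b=3.0)
```

The outlier lies well outside the ±10 band and is excluded, so the refit recovers the underlying line exactly.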
Step6: Once the soil line is defined, the perpendicular "wet-dry line" separating the wet and dry features can easily be defined from its slope ($m_1 = -1/m$) and a point derived from the soil line. The best $(x, y)$ coordinates of this point were found to be $[x_1 = \mathrm{Mean}\{VR(x)\},\ y_1 = \mathrm{Mean}\{NIR(y)\}]$ of the points used in Eq. (8). Substituting $x_1$, $y_1$, and $m_1$ into a line of the form of Eq. (8), the intercept of the wet-dry line becomes:

$$b_1 = y_1 - m_1 x_1 \qquad (9)$$

The wet-dry line equation thus becomes:
$$NIR(y) = m_1\,VR(x) + b_1 \qquad (10)$$

Step7: The parallel water-wet line can now be defined by shifting the wet-dry line [Eq. (10)] down, as given by:
$$NIR(y) = m_1\,VR(x) + b_1 - 65 \qquad (11)$$

Step8: Finally, the lines parallel to the soil line (i.e. the vegetation-soil line and the full-vegetation line) can be defined by shifting the soil line up, as follows:
$$NIR(y) = m\,VR(x) + b + 5 \qquad (12)$$

$$NIR(y) = m\,VR(x) + b + 30 \qquad (13)$$

It remains to state that the downward and upward shift constants, as well as the mean values of the VR(x) and NIR(y) axes proposed as the intersection point of the wet-dry and soil lines, were chosen from the authors' experience gained over a large number of implementations on various multispectral image regions. Moreover, the areas of the classified regions were computed (as the number of pixels in each region multiplied by the area of one image pixel, i.e. 30 m × 30 m = 900 m² per pixel) and compared with the available ground-truth survey data. Fig.9 sketches the land-cover feature boundaries defined by the operational steps above.
Fig.9: Illustrates the landcover feature’s boundaries mentioned by the operational steps of the automatic classification method
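Given the fitted soil line (m, b) and the point (x1, y1), the remaining boundary lines of Steps 6 to 8 reduce to a few lines of arithmetic. The sketch below returns each line as a (slope, intercept) pair in the form NIR = slope · VR + intercept; the function name and dictionary keys are hypothetical, and the default shifts are the constants of Eqs. (11) to (13), with the water-wet line taken 65 units below the wet-dry line.

```python
def boundary_lines(m, b, x1, y1, water_shift=65.0, veg_shift=5.0, full_shift=30.0):
    """Steps 6-8: every boundary as (slope, intercept) of NIR = slope * VR + intercept."""
    m1 = -1.0 / m          # Step 6: slope perpendicular to the soil line
    b1 = y1 - m1 * x1      # Eq. (9)
    return {
        "soil": (m, b),                          # Eq. (8)
        "wet_dry": (m1, b1),                     # Eq. (10)
        "water_wet": (m1, b1 - water_shift),     # Eq. (11): shifted down
        "veg_soil": (m, b + veg_shift),          # Eq. (12): shifted up
        "full_veg": (m, b + full_shift),         # Eq. (13): shifted up
    }


lines = boundary_lines(m=1.0, b=0.0, x1=50.0, y1=50.0)
```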
To compare the classification results of our scatterplot classification method with those of the supervised and unsupervised methods, we found it convenient to run all of them with the same number of land-cover classes (i.e. 6 classes). The method was implemented on the sample multispectral images shown in Fig.2 (b & c). The results are illustrated in Fig.10, together with their scatterplots and pie charts showing the percentage area of the classified regions.
Fig.10: Classification results obtained by implementing our automatic scatterplot classification on the scenes shown in Fig.2 (b & c).
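The text does not spell out which side of each line belongs to which class, so the decision rules below are an assumption read from the description of Fig.1: water lies below the water-wet line, full vegetation above the full-vegetation line, partial vegetation between the vegetation-soil and full-vegetation lines, soil below the vegetation-soil line, and wet or dry is decided by the side of the wet-dry line (the darker side, nearer the origin, taken as wet). The sketch also computes class areas at 900 m² per 30 m pixel and the percentages of the kind shown in the pie charts. Class names, function names, and the example line parameters are hypothetical.

```python
import numpy as np

CLASSES = ["water", "wet soil", "dry soil", "wet vegetation", "dry vegetation", "full vegetation"]


def classify_scatterplot(vr, nir, lines):
    """Assign each pixel one of the six CLASSES from its (VR, NIR) position (assumed rules)."""
    x, y = np.asarray(vr, dtype=float), np.asarray(nir, dtype=float)

    def above(name):
        slope, intercept = lines[name]
        return y > slope * x + intercept

    labels = np.where(above("wet_dry"), 2, 1)                 # soil: 1 = wet, 2 = dry
    labels = np.where(above("veg_soil"), labels + 2, labels)  # partial vegetation: 3 / 4
    labels = np.where(above("full_veg"), 5, labels)
    labels = np.where(above("water_wet"), labels, 0)          # below water-wet line: water
    return labels


def class_areas(labels, pixel_size_m=30.0):
    """Area (km^2) and percentage of each class; one pixel covers pixel_size_m ** 2 m^2."""
    counts = np.bincount(np.ravel(labels), minlength=len(CLASSES))
    return counts * pixel_size_m ** 2 / 1e6, 100.0 * counts / counts.sum()


lines = {"wet_dry": (-1.0, 100.0), "water_wet": (-1.0, 35.0),
         "veg_soil": (1.0, 5.0), "full_veg": (1.0, 30.0)}
vr = np.array([10, 30, 60, 30, 60, 20])
nir = np.array([10, 30, 60, 40, 70, 60])
labels = classify_scatterplot(vr, nir, lines)
areas_km2, percent = class_areas(labels)
```

With these made-up lines, the six example pixels fall one into each class, in the order of `CLASSES`.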
V. CONCLUSIONS
In this research, a new classification technique for multispectral satellite images has been introduced. The method rests on a physical phenomenon, i.e. the amount of light reflected and absorbed by the components of the Earth's surface. To demonstrate the superiority of our technique (named the Scatterplot classification method), several well-known unsupervised and supervised classification techniques were applied to the same satellite scenes.
For the unsupervised methods (Isodata and K-Means), the classification results clearly depend first on the proposed number of classes and second on the number of iterations. However, other threshold parameters not discussed in this research also have large effects on the results (e.g. maximum standard deviation from the mean, maximum distance error, maximum number of merge pairs, etc.).
For the supervised methods (Parallelepiped, Minimum Distance, Mahalanobis Distance, and Maximum Likelihood), the results showed that, although they used the same region of interest (ROI) features, different classification percentages appeared for each of the involved areas. For the introduced technique, the classification results were more convincing, because the method rests on physical phenomena, and its results matched the evidence obtained from the ground survey more closely. Despite the encouraging results achieved by the introduced method, we believe it may be improved further with a few additional refinements.
REFERENCES
[1] J.R. Jensen, 1986, "Introductory Digital Image Processing," Prentice-Hall, a division of Simon & Schuster, Inc., p. 166.
[2] T.M. Lillesand and R.W. Kiefer, 1994, "Remote Sensing and Image Interpretation," New York: Wiley.
[3] R.A. Schowengerdt, 2007, "Remote Sensing: Models and Methods for Image Processing," Academic Press, 515 pages, ISBN 0123694078.
[4] N. Roshani, M.J. Valadan Zouj, Y. Rezaei, and M. Nikfar, 2008, "Snow Mapping of Alamchal Glacier using Remote Sensing Data," The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol. XXXVII, Part B8, Beijing.
[5] F.F. Sabins, Jr., 1987, "Remote Sensing: Principles and Interpretation," 2nd Ed., W.H. Freeman & Co., New York City.
[6] J.T. Tou and R.C. Gonzalez, 1974, "Pattern Recognition Principles," Addison-Wesley Publishing Company, Reading, Massachusetts.
[7] J.A. Richards, 1994, "Remote Sensing Digital Image Analysis," Springer-Verlag, Berlin, p. 340.
[8] ERDAS Field Guide™, 5th Edition, Revised and Expanded, 1999, printed in the USA, p. 270.
[9] P.H. Swain and S.M. Davis, 1978, "Remote Sensing: The Quantitative Approach," New York: McGraw-Hill Book Company.
[10] R. Michael Hord, 1982, Digital Image Processing of Remotely