New Fully Automatic Multispectral Image Classification based on Scatterplot Method


International Journal of Emerging Technology and Advanced Engineering

Website: www.ijetae.com (ISSN 2250-2459, ISO 9001:2008 Certified Journal, Volume 3, Issue 10, October 2013)


S.M. Ali

Remote Sensing Research Unit, College of Science, University of Baghdad, Iraq, Baghdad, Al-Jaderyia

Abstract— A new multispectral image classification method is presented. The method divides the Near Infrared “NIR” versus Visible Red “VR” scatterplot into regions according to their reflectance values. The soil line, which best separates the soil components from the vegetated areas, is identified by least-squares fitting. The vegetation line, which separates the fully vegetated areas from the partially vegetated (wet and dry) regions, is then identified as a line parallel to the soil line. Water, wet and dry vegetated areas are separated by lines perpendicular to the soil line. Highly encouraging classification results are obtained when compared with the traditional supervised and unsupervised classification techniques.

Keywords — Image Segmentation, Image Classification, Scatterplot Classification, Scatterplot Line, Multispectral Image Classification

I. INTRODUCTION

Satellite images offer an excellent means of monitoring ongoing and potential ecological threats on the Earth’s surface. Fast monitoring requires the use of image pattern recognition techniques, and among these, the automatic classification of remotely sensed data has received considerable attention in recent years. Land-use classification and detection applications (e.g. crop inventory, crop-disease detection, forestry, and monitoring of water and air pollution) have attracted the most interest. Several methods have been proposed to classify remotely sensed images; they are generally categorized as supervised or unsupervised. Supervised methods use samples of known identity to classify pixels of unknown identity, while unsupervised methods automatically segment the image into spectral classes based on natural groupings found in the data; see, for instance, [1], [2], [3]. Multispectral remote sensing has proven to be a powerful tool for assessing the identity, characteristics and growth potential of most land-cover materials. For example, water bodies (oceans, seas, lakes and rivers) have rather low reflectance in both the NIR and VR bands (at least away from shores); soils generally exhibit NIR reflectance somewhat larger than their red reflectance; and live green plants, because they absorb solar radiation in photosynthesis, appear relatively dark in the photosynthetically active radiation (PAR) region but relatively bright in the NIR.

By contrast, clouds and snow tend to be rather bright in the VR (as well as at other visible wavelengths) and quite dark in the NIR ([4]). In this research, the NIR versus VR scatterplot is used to partition remotely sensed images into regions based on their reflectance and absorption behavior. As illustrated in Fig.1, the soil line is defined first; the other land-cover regions (i.e. water, wet and dry vegetation and soils, and the fully vegetated areas) are then distinguished by delineating their separating lines, either parallel or perpendicular to the soil line.
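The scatterplot itself is simply a count of how many pixels share each (VR, NIR) value pair. The Python sketch below builds it from two co-registered bands; the function name `scatterplot` and the representation of a band as a list of pixel rows are illustrative assumptions, not part of the original method.

```python
from collections import Counter

def scatterplot(vr_band, nir_band):
    """Count pixels per (VR, NIR) value pair; VR is the x-axis, NIR the y-axis.

    vr_band, nir_band: equally sized 2-D lists (rows of pixel values).
    """
    counts = Counter()
    for vr_row, nir_row in zip(vr_band, nir_band):
        for vr, nir in zip(vr_row, nir_row):
            counts[(vr, nir)] += 1
    return counts

# Hypothetical 2 x 2 image: two pixels share the pair (VR=10, NIR=20).
counts = scatterplot([[10, 10], [40, 5]], [[20, 20], [90, 3]])
print(counts[(10, 20)])  # → 2
```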

Fig.1 Classification Scheme Based on Scatterplot.

II. SAMPLES OF THE TESTED SCENES

Two available scenes (TM sensor, 30 m/pixel resolution, acquired in 2000) covering different regions of Iraq (shown in Fig.2) were chosen to test the introduced segmentation (or classification) algorithm:

Fig.2(b): Covers about 1802 km², bounded by latitudes 34.354°–35.050° N and longitudes 43.185°–43.764° E (UTM/WGS84 system). The area (Fig.2a) is located west of KARKUK City, in AL-TAMIM, one of the Iraqi provinces. It includes the confluence of the Tigris and Lower Al-Zab rivers and contains different land-cover classes (e.g. water, vegetation, and dry and wet soil areas).


Fig.2(c): The area (Fig.2a) lies on both banks of the Tigris river; BALAD City on its north-east side belongs to SALAH AD DIN province, while AL-KHLIS City on its south-west bank belongs to BA’QUBA province. Almost the same land-cover classes as in the previous scene are present in this image.

Fig.2: (a) Administrative map of Iraq showing the provinces and the positions of the studied areas; (b) and (c) false-color composites of the studied scenes, produced by displaying the Near-Infrared (band 4), Visible-Red (band 3) and Green (band 2) bands as R, G and B, respectively.

III. SUPERVISED AND UNSUPERVISED CLASSIFICATION METHODS

Classification techniques are generally divided into unsupervised and supervised methods. Unsupervised classification offers a simple way to segment multispectral data using only the data statistics, whereas supervised methods identify each input pattern as a member of a predefined class; supervised methods are therefore more accurate for mapping classes, but depend heavily on the knowledge and skill of the image specialist.

A. Unsupervised Classification or Clustering Methods:

In these methods, patterns are assigned to classes that are not known in advance: multiband spectral response patterns are grouped into statistically separable clusters. Thus, a small range of digital numbers (e.g. in 3 bands) can establish one cluster that is set apart from the range combination of another cluster. Separation then depends on the parameters chosen for differentiating the clusters; for details see [5].

The number of clusters can be modified and can vary arbitrarily when more bands are involved. Each pixel in the image is assigned to the cluster whose digital-number combination is most similar to its own. Generally, multiple pixels of the same cluster within an area of the image correspond to some ground feature or class, so the resulting pattern of gray levels forms a new image depicting the spatial distribution of the clusters. These levels can then be assigned colors to produce a cluster map.

The task then becomes relating the different clusters to meaningful ground categories, either through adequate familiarity with the major classes expected in the scene or, where feasible, by visiting the scene (ground truth) and visually correlating map patterns with their ground counterparts. Because the classes are not selected beforehand, this approach is called unsupervised classification. Among the many published unsupervised classification techniques, Isodata and K-Means are the most commonly used.

A1. Isodata and K-Means Unsupervised Classification Methods


Fig.3 shows the classification results of the Isodata and K-Means methods performed with different numbers of classes and iterations on the multispectral images shown in Fig.2 (b & c).
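The Isodata runs in Fig.3 add cluster splitting and merging on top of the basic K-Means loop. As a rough sketch of that basic loop only, the Python code below alternates between assigning each pixel vector to its nearest centre and moving each centre to the mean of its pixels. The function name, the random initialisation and the fixed iteration count are illustrative assumptions; they are not the settings of the software used to produce Fig.3.

```python
import random

def kmeans(pixels, k, iterations, seed=0):
    """Plain K-Means clustering of multiband pixel vectors.

    pixels: list of tuples, one value per band. Returns (labels, centres).
    """
    rng = random.Random(seed)
    centres = [list(p) for p in rng.sample(pixels, k)]  # k random pixels as initial centres

    def sq_dist(p, c):
        return sum((a - b) ** 2 for a, b in zip(p, c))

    labels = [0] * len(pixels)
    for _ in range(iterations):
        # Assignment step: every pixel joins its nearest centre.
        labels = [min(range(k), key=lambda j: sq_dist(p, centres[j])) for p in pixels]
        # Update step: every centre moves to the mean of its member pixels.
        for j in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centre
                centres[j] = [sum(band) / len(members) for band in zip(*members)]
    return labels, centres

# Hypothetical two-band pixels forming two well-separated groups.
pixels = [(0, 0), (1, 0), (0, 1), (50, 50), (51, 50), (50, 51)]
labels, centres = kmeans(pixels, k=2, iterations=5)
print(labels, centres)
```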

B. Supervised Classification Methods:

In these classification methods, each input pattern is identified as a member of a predefined class; they are therefore more accurate for mapping classes, but depend heavily on the knowledge and skill of the image specialist. The strategy is simple: the specialist must recognize conventional classes in a scene from prior knowledge (such as personal experience with what is present in the scene). This familiarity allows the individual(s) performing the classification to choose and set up discrete classes and then assign them category names. The resulting training sites are areas, representing each known land-cover category, that appear fairly homogeneous on the image. On the computer display, these sites are located and outlined with polygonal boundaries drawn with the mouse. For each class thus outlined, the mean and variance of the digital numbers in each band used for classification are calculated from all the pixels enclosed in the site. More than one polygon is usually drawn for each class, and the classification program then clusters the data representing each class. When the digital numbers of a class are plotted as a function of the band sequence, the result is a spectral signature, or spectral response curve, for that class. The spectral signatures so obtained represent all the materials within each site that interact with the incoming radiation. Supervised classification then proceeds by statistical processing in which every pixel is compared with the various signatures and assigned to the class whose signature is closest. A few pixels in the scene may match no signature and remain unclassified, because they may belong to a class that was not recognized or defined. Fig.4 shows the NIR band of the processed scene of Fig.2 (b) with six user-selected classes corresponding to the regions designated by our scatterplot classification method; the number of points enclosed by each selected polygon is shown in the region of interest (ROI) control dialog.

The following classical supervised multispectral classification methods are described in most remote sensing textbooks and are commonly available in current image processing software.

B1. Supervised Parallelepiped Classification Method:

This classification method uses a simple decision rule to classify multispectral data. The decision boundaries form an n-dimensional parallelepiped in the image data space. The dimensions of the parallelepiped are defined based upon a standard deviation threshold from the mean of each selected class. If a pixel value lies above the low threshold and below the high threshold for all n bands being classified, it is assigned to that class.

If the pixel value falls in multiple classes, the pixel is assigned to the last class matched. Pixels that do not fall within any of the parallelepipeds are designated as unclassified. In other words, every signature has high and low limits in every band, and a pixel whose data file values lie between the limits for every band of a signature is assigned to that signature’s class. For the technical details of the two-dimensional parallelepiped classification technique see [7]. The performance of the parallelepiped classifier on the multispectral images (shown in Fig.2 b & c) is shown in Fig.5.
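A minimal Python sketch of the parallelepiped rule follows. It assumes each box is the class mean ± k standard deviations per band (k = 2 is an arbitrary illustrative choice), uses strict inequalities as in the text, lets the last matching class win, and returns None for unclassified pixels; the function names and training values are hypothetical.

```python
import statistics

def train_boxes(samples, k=2.0):
    """One box per class: mean ± k standard deviations in every band.

    samples: dict mapping class name -> list of training pixel vectors
             (at least two pixels per class, so the standard deviation is defined).
    """
    boxes = {}
    for name, pixels in samples.items():
        limits = []
        for band in zip(*pixels):  # iterate over bands
            mu, sd = statistics.mean(band), statistics.stdev(band)
            limits.append((mu - k * sd, mu + k * sd))
        boxes[name] = limits
    return boxes

def classify_parallelepiped(pixel, boxes):
    """Return the last class whose box contains the pixel, or None (unclassified)."""
    match = None
    for name, limits in boxes.items():  # dicts preserve insertion order
        if all(lo < v < hi for v, (lo, hi) in zip(pixel, limits)):
            match = name  # a later match overrides an earlier one
    return match

boxes = train_boxes({"water": [(10, 5), (12, 6), (11, 7)],
                     "soil": [(40, 30), (44, 34), (42, 32)]})
print(classify_parallelepiped((11, 6), boxes))      # → water
print(classify_parallelepiped((100, 100), boxes))   # → None
```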

B2. Supervised Minimum Distance Classification Method:

The minimum distance decision rule (also called spectral distance) calculates the spectral distance between the measurement vector of the candidate pixel and the mean vector of each signature; for details see [8]. The equation for classifying by spectral distance is based on the Euclidean distance ([8], [9]):



  

$$SD_{xyc} = \sqrt{\sum_{i=1}^{n}\left(X_{xyi} - \mu_{ci}\right)^{2}} \tag{1}$$

where n is the number of bands, i a particular band, c a particular class, $X_{xyi}$ the data file value of pixel (x, y) in band i, $\mu_{ci}$ the mean of the data file values in band i for the sample of class c, and $SD_{xyc}$ the spectral distance from pixel (x, y) to the mean of class c.
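The sketch below applies Eq.(1): class means are computed from hypothetical training pixels, and each pixel goes to the class with the smallest spectral distance. Function names and sample values are assumptions for illustration only.

```python
import math

def class_means(samples):
    """Mean vector of each class from its training pixels: {name: [mean per band]}."""
    return {name: [sum(band) / len(band) for band in zip(*pixels)]
            for name, pixels in samples.items()}

def spectral_distance(pixel, mean):
    """Eq.(1): Euclidean distance SD between a pixel vector and a class mean."""
    return math.sqrt(sum((x - mu) ** 2 for x, mu in zip(pixel, mean)))

def classify_min_distance(pixel, means):
    """Assign the pixel to the class whose mean is spectrally closest."""
    return min(means, key=lambda name: spectral_distance(pixel, means[name]))

# Hypothetical two-band training pixels.
means = class_means({"water": [(10, 5), (12, 7)], "vegetation": [(80, 20), (90, 30)]})
print(means)                                   # → {'water': [11.0, 6.0], 'vegetation': [85.0, 25.0]}
print(classify_min_distance((20, 10), means))  # → water
```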

The performance of the Minimum Distance classifier on the multispectral images (shown in Fig.2 b & c) is shown in Fig.6.

Fig.3: The classification results obtained with the Isodata and K-Means classification methods on the images shown in Fig.2 (b & c). Panels: Isodata, 6 classes, 1 iteration; Isodata, 6 classes, 4 iterations; Isodata, 10 classes, 4 iterations; K-Means, 6 classes, 4 iterations; K-Means, 10 classes, 4 iterations.

Fig.4: The selected ROIs and the ROI control dialog listing the points selected within each polygon.

Fig.5: Performance of the supervised parallelepiped classification method.

B3. Supervised Mahalanobis Distance Classification Method:

Like the maximum likelihood rule (Section B4), the Mahalanobis distance algorithm assumes that the histograms of the bands are normally distributed. If this is not the case, better results may be obtained with the parallelepiped or minimum distance decision rule. This classifier is similar to minimum distance, except that the covariance matrix is used in the equation; that is, variance and covariance are taken into account so that clusters that are highly varied lead to similarly varied classes, and vice versa. For example, when classifying urban areas (typically a class whose pixels vary widely), correctly classified pixels may be farther from the mean than those of a water class, which is usually not highly varied. The equation for the Mahalanobis distance classifier is as follows ([8], [9]):

$$D = \left(X - M_c\right)^{T}\left(Cov_c^{-1}\right)\left(X - M_c\right) \tag{2}$$

where D is the Mahalanobis distance, c a particular class, X the measurement vector of the candidate pixel, $M_c$ the mean vector of the signature of class c, $Cov_c$ the covariance matrix of the pixels in the signature of class c, $Cov_c^{-1}$ the inverse of $Cov_c$, and T the transposition. The pixel is assigned to the class c for which D is the lowest.

The performance of the Mahalanobis classifier on the multispectral images (shown in Fig.2 b & c) is shown in Fig.7.


Fig.6: Performance of the supervised Minimum Distance classification method.

Fig.7: Performance of the supervised Mahalanobis Distance classification method.

B4. Supervised Maximum Likelihood Classification Method:

The maximum likelihood decision rule is based on the probability that a pixel belongs to a particular class. This classification method assumes that the statistics for each class in each band are normally distributed and calculates the probability that a given pixel belongs to a specific class. If this is not the case, better results may be obtained with the parallelepiped or minimum distance decision rule. Unless a probability threshold is selected, all pixels are classified, each pixel being assigned to the class with the highest probability; for details see [7]. However, if there is a priori knowledge that the probabilities are not equal for all classes, weight factors can be specified for particular classes. This variation of the maximum likelihood decision rule is known as the Bayesian decision rule ([10]). The equation for the maximum likelihood/Bayesian classifier is as follows:

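$$D = \ln\left(a_c\right) - \left[0.5\,\ln\left(\left|Cov_c\right|\right)\right] - \left[0.5\,\left(X - M_c\right)^{T}\left(Cov_c^{-1}\right)\left(X - M_c\right)\right] \tag{3}$$

where D is the weighted likelihood (a log-likelihood discriminant), ln the natural logarithm, $a_c$ the a priori weight of class c, $|Cov_c|$ the determinant of $Cov_c$, and all other symbols (X, $M_c$, $Cov_c$, $Cov_c^{-1}$ and T) are as in Eq.(2).

Because D in Eq.(3) increases as the pixel becomes more probable for class c, the pixel is assigned to the class c for which D is the highest (unlike Eq.(2), where the lowest distance wins). The performance of the Maximum Likelihood classifier on the multispectral images (shown in Fig.2 b & c) is shown in Fig.8.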
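A two-band sketch of Eq.(3) follows. The class parameters and prior weights are hypothetical, and the weight $a_c$ is taken as a fraction (1.0 meaning no weighting) rather than a percentage. The class with the largest D wins, and the second call shows how lowering one class's prior weight can change the decision.

```python
import math

def ml_discriminant(x, mean, cov, prior=1.0):
    """Eq.(3): D = ln(a_c) - 0.5 ln|Cov_c| - 0.5 (X - M_c)^T Cov_c^-1 (X - M_c).

    Two bands only; `prior` is the weight a_c as a fraction (1.0 = no weighting).
    """
    (a, b), (c, d) = cov
    det = a * d - b * c  # |Cov_c|, assumed positive
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx = (x[0] - mean[0], x[1] - mean[1])
    quad = sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))
    return math.log(prior) - 0.5 * math.log(det) - 0.5 * quad

def classify_ml(x, classes):
    """classes: dict of name -> (mean, cov, prior); the class with the highest D wins."""
    return max(classes, key=lambda name: ml_discriminant(x, *classes[name]))

# Hypothetical statistics for a tight water class and a widely varied urban class.
water = ((11.0, 6.0), ((4 / 3, 0.0), (0.0, 4 / 3)))
urban = ((50.0, 50.0), ((400 / 3, 0.0), (0.0, 400 / 3)))
equal = {"water": (*water, 1.0), "urban": (*urban, 1.0)}
biased = {"water": (*water, 1.0), "urban": (*urban, 1e-200)}
print(classify_ml((30, 20), equal))   # → urban
print(classify_ml((30, 20), biased))  # → water
```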

Fig.8: Performance of the supervised Maximum Likelihood classification method.

IV. THE AUTOMATIC CLASSIFICATION METHOD

It is well known that soil exhibits a linear relationship between its NIR and VR reflectances. The extent of this linear relationship is affected by the soil’s dryness or wetness: it is short for homogeneous soils and extends as the soil’s moisture content varies ([1]). Therefore, once the soil line is defined, the reflectance regions of the other spectral classes can be decided accordingly through the following operational steps:

Step1: The NIR and VR spectral bands are first normalized to the reflectance ranges [0, 109] and [0, 126], respectively.
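The paper does not state how the bands are normalized. The sketch below assumes a linear min-max stretch of each band onto its target range; the function name and sample values are hypothetical.

```python
def normalize_band(values, new_max):
    """Linearly stretch band values onto [0, new_max] (assumed min-max stretch)."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant band: nothing to stretch
        return [0.0] * len(values)
    return [(v - lo) * new_max / (hi - lo) for v in values]

# Hypothetical flattened band values.
nir_norm = normalize_band([12, 40, 95, 140], 109)  # NIR -> [0, 109]
vr_norm = normalize_band([10, 20, 30], 126)        # VR  -> [0, 126]
print(vr_norm)  # → [0.0, 63.0, 126.0]
```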

Step2: The extreme points of the scatterplot (i.e. the points at the shortest and farthest distances from the origin), denoted Min{Dis} and Max{Dis} respectively, are determined, where:

$$Dis = \sqrt{VR(x)^{2} + NIR(y)^{2}} \tag{4}$$
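A direct sketch of Step2, treating the scatterplot as a list of (VR, NIR) pairs (an assumption about the data layout) and using Eq.(4) as the distance; the function name is illustrative.

```python
import math

def extreme_points(points):
    """Return the scatterplot points closest to and farthest from the origin.

    points: list of (vr, nir) pairs; the distance follows Eq.(4): sqrt(VR^2 + NIR^2).
    """
    def dis(p):
        return math.hypot(p[0], p[1])

    return min(points, key=dis), max(points, key=dis)

p_min, p_max = extreme_points([(3, 4), (1, 1), (30, 40), (10, 2)])
print(p_min, p_max)  # → (1, 1) (30, 40)
```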

Step3: An initial soil line through these two extreme points is determined using the straight-line equation:

$$y = mx + b \tag{5}$$

where b is the intercept of the line with the NIR axis and m is its slope, computed from the two extreme points (Max and Min denote the points with Max{Dis} and Min{Dis}):

$$m = \frac{NIR(Max) - NIR(Min)}{VR(Max) - VR(Min)} \tag{6}$$
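Eq.(6) gives the slope but not the intercept b. The sketch below assumes the initial line passes through the Min{Dis} point, so b = NIR(Min) - m·VR(Min); this intercept choice and the function name are assumptions.

```python
def line_through(p_min, p_max):
    """Slope m (Eq. 6) and NIR intercept b of the line through the two extreme points.

    Assumes VR differs between the two points (otherwise the slope is undefined).
    """
    vr_min, nir_min = p_min
    vr_max, nir_max = p_max
    m = (nir_max - nir_min) / (vr_max - vr_min)
    b = nir_min - m * vr_min  # assumption: the line passes through the Min{Dis} point
    return m, b

m, b = line_through((2, 3), (12, 23))
print(m, b)  # → 2.0 -1.0
```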


Step4: The upper and lower boundaries of the soil points are determined using:

$$y_{upper} = mx + b + T_1, \qquad y_{lower} = mx + b - T_1 \tag{7}$$

Our experiments showed that the best value is $T_1 = 10$.
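A sketch of the band test of Eq.(7) with the suggested $T_1 = 10$; whether points lying exactly on the boundaries are kept is not stated, so the inclusive comparison below is an assumption, as is the function name.

```python
def points_in_band(points, m, b, t1=10):
    """Keep the (vr, nir) points lying between the boundaries of Eq.(7).

    A point is kept when m*VR + b - t1 <= NIR <= m*VR + b + t1 (inclusive, an assumption).
    """
    kept = []
    for vr, nir in points:
        centre = m * vr + b
        if centre - t1 <= nir <= centre + t1:
            kept.append((vr, nir))
    return kept

band = points_in_band([(10, 12), (10, 25), (50, 41), (50, 60)], m=1.0, b=0.0)
print(band)  # → [(10, 12), (50, 41), (50, 60)]
```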

Step5: The soil line can now be defined by fitting a straight line, using linear least squares, to the scatterplot points enclosed by the boundaries of Eq.(7):

$$NIR(y) = m\,VR(x) + b \tag{8}$$

where m and b are now the least-squares slope and intercept obtained from the enclosed points.
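The fit of Eq.(8) can be sketched as an ordinary least-squares regression of NIR on VR over the points kept by Eq.(7). The function also returns the centroid of those points, which Step6 uses; the name and sample points are illustrative.

```python
def fit_soil_line(points):
    """Ordinary least-squares fit of NIR = m*VR + b (Eq. 8) to (vr, nir) points.

    Also returns the centroid (mean VR, mean NIR), which lies on the fitted line.
    """
    n = len(points)
    mean_vr = sum(vr for vr, _ in points) / n
    mean_nir = sum(nir for _, nir in points) / n
    sxx = sum((vr - mean_vr) ** 2 for vr, _ in points)
    sxy = sum((vr - mean_vr) * (nir - mean_nir) for vr, nir in points)
    m = sxy / sxx  # requires at least two distinct VR values
    b = mean_nir - m * mean_vr
    return m, b, (mean_vr, mean_nir)

m, b, centroid = fit_soil_line([(0, 1), (1, 3), (2, 5), (3, 7)])
print(m, b, centroid)  # → 2.0 1.0 (1.5, 4.0)
```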

Step-6: Once the soil line is defined, the perpendicular “wet-dry line” separating the wet and dry features can easily be defined from its slope, $m_1 = -1/m$, and a point on the soil line. The best choice for this point was found to be $(x_1, y_1) = (\mathrm{Mean}\{VR(x)\}, \mathrm{Mean}\{NIR(y)\})$, the mean of the points used in Eq.(8); this point lies on the fitted soil line, because a least-squares line always passes through the centroid of the fitted points.

Substituting $x_1$, $y_1$ and $m_1$ into the straight-line form of Eq.(8), the intercept of the wet-dry line becomes:

$$b_1 = y_1 - m_1 x_1 \tag{9}$$

The wet-dry line equation thus becomes:

$$NIR(y) = m_1\,VR(x) + b_1 \tag{10}$$
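Eqs.(9) and (10) reduce to two lines of arithmetic. The sketch below assumes the centroid from the Eq.(8) fit is passed in as $(x_1, y_1)$; the function name is illustrative.

```python
def wet_dry_line(m, centroid):
    """Slope m1 = -1/m and intercept b1 = y1 - m1*x1 (Eq. 9) of the wet-dry line.

    centroid: (x1, y1) = (mean VR, mean NIR) of the points used in Eq.(8).
    """
    x1, y1 = centroid
    m1 = -1.0 / m  # perpendicular to the soil line (m must be non-zero)
    b1 = y1 - m1 * x1
    return m1, b1

m1, b1 = wet_dry_line(2.0, (1.5, 4.0))
print(m1, b1)  # → -0.5 4.75
```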

Step-7: The water-wet line, parallel to the wet-dry line, is then defined by shifting the wet-dry line of Eq.(10) down:

$$NIR(y) = m_1\,VR(x) + b_1 - 65 \tag{11}$$
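The sketch below applies the 65-unit shift of Eq.(11) and adds a hypothetical helper that reports on which side of a line a scatterplot point falls. Reading points below the water-wet line (close to the origin, with low reflectance in both bands) as water is our interpretation of Fig.9, not a rule stated in the text.

```python
def water_wet_line(m1, b1, shift=65.0):
    """Eq.(11): the wet-dry line moved down by `shift` NIR units (same slope)."""
    return m1, b1 - shift

def offset_from_line(point, m, b):
    """Signed NIR offset of a (vr, nir) point from NIR = m*VR + b (> 0 above, < 0 below)."""
    vr, nir = point
    return nir - (m * vr + b)

m2, b2 = water_wet_line(-0.5, 100.0)
print(m2, b2)                             # → -0.5 35.0
print(offset_from_line((10, 5), m2, b2))  # → -25.0 (below the water-wet line)
```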

Step-8: Finally, the vegetation-soil and full-vegetation lines, which are parallel to the soil line, are defined by shifting the soil line of Eq.(8) up:

$$NIR(y) = m\,VR(x) + b + 5 \tag{12}$$

$$NIR(y) = m\,VR(x) + b + 30 \tag{13}$$
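A sketch of Eqs.(12) and (13), plus a hypothetical helper that places a point relative to the two soil-parallel lines. The zone names are illustrative labels only, and splitting the partially vegetated zone into wet and dry parts with the perpendicular lines is left out.

```python
def soil_parallel_lines(m, b, veg_shift=5.0, full_shift=30.0):
    """Eqs.(12)-(13): the soil line moved up to the vegetation-soil and full-vegetation lines."""
    return (m, b + veg_shift), (m, b + full_shift)

def vegetation_zone(point, m, b):
    """Locate a (vr, nir) point relative to the two soil-parallel lines (hypothetical labels)."""
    vr, nir = point
    (mv, bv), (mf, bf) = soil_parallel_lines(m, b)
    if nir <= mv * vr + bv:
        return "below vegetation-soil line"
    if nir <= mf * vr + bf:
        return "partial vegetation"  # between the two parallel lines
    return "full vegetation"         # above the full-vegetation line

for p in [(20, 22), (20, 40), (20, 60)]:
    print(p, vegetation_zone(p, m=1.0, b=0.0))
```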

It remains to state that the shift constants (65 down, and 5 and 30 up), as well as the use of the mean VR(x) and NIR(y) values as the intersection point of the wet-dry and soil lines, were chosen empirically, based on the author’s experience from a large number of runs on various multispectral image regions. Moreover, the area of each classified region was computed as the number of pixels in the region multiplied by the ground area of one pixel (30 m × 30 m = 900 m² per pixel) and compared with the available ground-truth survey data. Fig.9 sketches the land-cover boundaries defined by the operational steps above.
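The area computation can be sketched as follows, assuming the classified image has been flattened into a list of class labels; the function name and the sample labels are hypothetical. The percentage share is what a pie chart of the classified regions would display.

```python
from collections import Counter

def class_areas(labels, pixel_size_m=30.0):
    """Area in km² and share in % of each class, from a flat list of pixel labels.

    Each pixel covers pixel_size_m x pixel_size_m (900 m² for a 30 m TM pixel).
    """
    counts = Counter(labels)
    total = len(labels)
    pixel_area_m2 = pixel_size_m ** 2
    return {name: (n * pixel_area_m2 / 1e6, 100.0 * n / total)
            for name, n in counts.items()}

areas = class_areas(["water"] * 1000 + ["soil"] * 3000)
print(areas)  # → {'water': (0.9, 25.0), 'soil': (2.7, 75.0)}
```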

Fig.9: The land-cover boundaries defined by the operational steps of the automatic classification method.

To compare the results of our scatterplot classification method with those of the supervised and unsupervised methods, we found it convenient to run all of them with the same number of land-cover classes (six). The method was applied to the samples of multispectral images shown in Fig.2 (b & c). The results are illustrated in Fig.10, together with their scatterplots and a pie chart showing the percentage area of each classified region.

Fig.10: Classification results obtained with our automatic scatterplot classification method for the scenes shown in Fig.2 (b & c).

V. CONCLUSIONS

In this research, a new classification technique for multispectral satellite images has been introduced. The method rests on physical phenomena, namely the amounts of light reflected and absorbed by the components of the Earth's surface. To demonstrate the advantages of our technique (named the scatterplot classification method), several well-known unsupervised and supervised classification techniques were applied to the same satellite scenes.


For the unsupervised methods (Isodata and K-Means), the classification results clearly depend first on the specified number of classes and second on the number of iterations. Other threshold parameters not discussed in this research also strongly affect the results (e.g. maximum standard deviation from the mean, maximum distance error and maximum number of merge pairs).

For the supervised methods (Parallelepiped, Minimum Distance, Mahalanobis Distance and Maximum Likelihood), the results showed that, although all of them used the same regions of interest (ROIs), they produced different classification percentages for each class. The classification results of the introduced technique were more reliable, because the method rests on physical phenomena, and they agreed better with the evidence obtained from the ground survey. Despite these encouraging results, we believe the method can be improved further with a few additional refinements.

REFERENCES

[1] Jensen, J.R., 1986, "Introductory Digital Image Processing," Prentice-Hall, a Division of Simon & Schuster, Inc., p. 166.

[2] Lillesand, T.M. and R.W. Kiefer, 1994, "Remote Sensing and Image Interpretation," Wiley, New York.

[3] Schowengerdt, R.A., 2007, "Remote Sensing: Models and Methods for Image Processing," Academic Press, 515 pages, ISBN 0123694078.

[4] Roshani, N., M.J. Valadan Zouj, Y. Rezaei and M. Nikfar, 2008, "Snow Mapping of Alamchal Glacier using Remote Sensing Data," The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol. XXXVII, Part B8, Beijing.

[5] Sabins, F.F., Jr., 1987, "Remote Sensing: Principles and Interpretation," 2nd Ed., W.H. Freeman & Co., New York.

[6] Tou, J.T. and R.C. Gonzalez, 1974, "Pattern Recognition Principles," Addison-Wesley Publishing Company, Reading, Massachusetts.

[7] Richards, J.A., 1994, "Remote Sensing Digital Image Analysis," Springer-Verlag, Berlin, p. 340.

[8] ERDAS Field Guide™, 5th Edition, Revised and Expanded, 1999, printed in the USA, p. 270.

[9] Swain, P.H. and S.M. Davis, 1978, "Remote Sensing: The Quantitative Approach," McGraw-Hill Book Company, New York.

[10] Hord, R. Michael, 1982, Digital Image Processing of Remotely