Classification - Data Mining A Heuristic Approach Abbass HA (2002) pdf

Neil Dunstan

University of New England, Australia

Michael de Raadt

University of Southern Queensland, Australia

Copyright © 2002, Idea Group Publishing.

Sensing devices are commonly used for the detection and classification of subsurface objects, particularly for the purpose of eradicating Unexploded Ordnance (UXO) from military sites. UXO detection and classification is inherently different from pattern recognition in image processing in that signal responses for the same object differ greatly when the object is at different depths and orientations. That is, subsurface objects span a multidimensional space with dimensions including depth, azimuth and declination, so the search space for identifying an instance of an object is extremely large. Our approach is to use templates of actual responses from scans of known objects to model object categories. We intend to justify a method whereby Genetic Algorithms are used to improve the template libraries with respect to their classification characteristics. This chapter describes the application, key features of the Genetic Algorithms tested and the results achieved.

There has been increased interest in the use of sensing devices in the detection and classification of subsurface objects, particularly for the purpose of eradicating Unexploded Ordnance (UXO) from military sites (Putnam, 2001). A variety of sensor technologies have been used, including magnetic, electromagnetic, thermal and ground penetrating radar devices. Depending on the technology and terrain, devices may be handheld or vehicle-borne. Scanning a section of ground produces a two-dimensional data set representing the impulse response at each spatial location. Classifying subsurface objects involves matching a representation or model of each known object against that of an unknown object. Previous classification techniques have attempted to model objects in ways that are independent of depth and orientation. At the recent Jefferson Proving Ground Trials, it was found that current techniques do not provide adequate discrimination between UXO and non-UXO objects for cost-effective remediation of military sites (US Army Environment Center, 1999).

Our approach (Dunstan and Clark, 1999) is to use templates of actual responses from known object scans to model objects. Template matching was used by Damarla and Ressler (2000) for airborne detection of UXO from Synthetic Aperture Radar data sets. Their results showed that a single template could correlate well against a range of large ordnance categories for the purpose of identifying sites requiring remediation. Hill et al. (1992) used Genetic Algorithms to match medical ultrasound images against derived templates of the human heart. The Genetic Algorithm was used to find the best match of an unknown ultrasound and a derived template. Our goal is primarily to achieve the capability of discriminating between UXO and non-UXO objects and, if possible, between the various categories of UXO.

Our approach to classification of scans of unknown objects is to match the scan data against a model of each known object. Each model consists of a set of templates of scans of objects known to be of that category. A match is based on correlations of each template against the scan data. Two measures are calculated:

• Normalized Cross Correlation Value (NCV) - the Normalized Cross Correlation as a percentage of the optimum score
• Fitness Error Factor (FEF) - the absolute difference between the area of the object signal response and the area of the template, as a percentage

FEF helps to invalidate correlations that have a good NCV but whose templates are significantly larger or smaller than the object’s response area. We define a Positive Correlation between a template and a scanned object to exist when NCV > MinNCV and FEF < MaxFEF, that is, when the NCV is sufficiently large and the FEF is sufficiently small. A classification function then uses the correlation results from all templates of all categories to return a category type for the unknown object. We would therefore want the template set for each category to be truly representative of that category and able to distinguish between objects of its own and other categories. Unfortunately, our template library is small and not systematic in its coverage of the depth/orientation spectrum. Nevertheless, our existing templates show some promise and we seek to maximize their effect.
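The Positive Correlation test described above can be illustrated with a minimal Python sketch. Several details here are assumptions rather than the chapter's implementation: the NCV is computed as a zero-mean normalized cross correlation over two equal-sized grids (so a perfect match scores 100), the FEF is expressed relative to the template area, and the threshold values `min_ncv` and `max_fef` are hypothetical placeholders, since the text does not give MinNCV or MaxFEF.

```python
import math


def ncv(template, patch):
    """Normalized cross correlation of two equal-sized 2-D grids, in percent.

    Assumption: zero-mean NCC, scaled so that a perfect match gives 100.
    """
    t = [v for row in template for v in row]
    p = [v for row in patch for v in row]
    if len(t) != len(p):
        raise ValueError("template and patch must have the same size")
    mean_t = sum(t) / len(t)
    mean_p = sum(p) / len(p)
    num = sum((a - mean_t) * (b - mean_p) for a, b in zip(t, p))
    den = math.sqrt(
        sum((a - mean_t) ** 2 for a in t) * sum((b - mean_p) ** 2 for b in p)
    )
    # A flat (constant) grid has no variance; treat it as no correlation.
    return 100.0 * num / den if den else 0.0


def fef(template_area, response_area):
    """Absolute area difference in percent.

    Assumption: the percentage is taken relative to the template area.
    """
    return 100.0 * abs(response_area - template_area) / template_area


def positive_correlation(ncv_value, fef_value, min_ncv=80.0, max_fef=25.0):
    """True when NCV > MinNCV and FEF < MaxFEF (thresholds are hypothetical)."""
    return ncv_value > min_ncv and fef_value < max_fef


template = [[1, 2], [3, 4]]
patch = [[2, 4], [6, 8]]  # same shape of response, twice the amplitude
score = ncv(template, patch)
area_error = fef(template_area=20, response_area=22)
is_match = positive_correlation(score, area_error)
```

Because the correlation is normalized, a response with the same shape but a different amplitude still scores highly, which is why a separate area check such as the FEF is useful for rejecting templates of the wrong size.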

The background to this research is the Jefferson IV Field Trials, conducted by U.S. military agencies in 1998 to assess the abilities of current detection technologies to discriminate between buried Unexploded Ordnance (UXO) and non-UXO. Ten companies using a variety of sensors were allowed trial scans of the UXO and non-UXO objects to be used. In all, there were 10 categories of UXO and about 40 categories of non-UXO. Participants had to classify each of the subsurface objects located in the field as either UXO or non-UXO. They were assessed according to these measures:

• TruePositive (TP) - number of UXO objects declared UXO
• FalsePositive (FP) - number of non-UXO objects declared UXO
• TrueNegative (TN) - number of non-UXO objects declared non-UXO
• FalseNegative (FN) - number of UXO objects declared non-UXO

An accuracy figure was calculated as (%TP + %TN) / 2, and 50% accuracy was deemed the “line of no discrimination”, that is, an inability to discriminate between UXO and non-UXO. Only one company performed marginally better than 50% accuracy. In site remediation, the FalseNegatives (FNs) are the bombs missed because they are identified as non-UXO, and the FalsePositives (FPs) are the junk items dug up because they are incorrectly identified as bombs. FNs and FPs can also be referred to as the “risk” and the “cost” respectively. Nominal figures of 5% risk and 25% cost were suggested as benchmarks. No technology presented at the Jefferson IV Field Trials was deemed “cost-effective”.
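As a worked illustration of the accuracy figure, the following Python sketch computes the three quantities from raw counts. It assumes that %TP is the percentage of UXO objects declared UXO, that %TN is the percentage of non-UXO objects declared non-UXO, and that risk and cost are the FN and FP percentages of their respective groups; the function name `trial_scores` and the sample counts are hypothetical.

```python
def trial_scores(tp, fp, tn, fn):
    """Return accuracy, risk and cost (all in percent) from raw counts.

    Assumptions: %TP = TP / (TP + FN), %TN = TN / (TN + FP),
    risk = FN / (TP + FN), cost = FP / (TN + FP).
    """
    uxo = tp + fn        # all objects that really are UXO
    non_uxo = tn + fp    # all objects that really are non-UXO
    if uxo == 0 or non_uxo == 0:
        raise ValueError("need at least one UXO and one non-UXO object")
    pct_tp = 100.0 * tp / uxo
    pct_tn = 100.0 * tn / non_uxo
    return {
        "accuracy": (pct_tp + pct_tn) / 2,
        "risk": 100.0 * fn / uxo,
        "cost": 100.0 * fp / non_uxo,
    }


# Made-up counts: 50 UXO objects and 40 non-UXO objects.
good = trial_scores(tp=45, fp=10, tn=30, fn=5)

# Declaring every object UXO misses no bombs but digs up all the junk.
dig_everything = trial_scores(tp=50, fp=40, tn=0, fn=0)
```

Under these assumptions, declaring every object UXO gives 100% TP and 0% TN, which averages to exactly 50%. This is one way to see why 50% marks the line of no discrimination.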

Our data set consists of the trial scans from the Jefferson IV trial, which were generated using an electromagnetic sensor. There are 10 scans of each of the UXO categories and one to four scans of each of the non-UXO categories. Each scan is a file representing the signal response from the object over a spatial grid. Typical feature selection and category modeling consist of attempting to parameterize the response of a typical object independently of depth and orientation, that is, to look for common features. Our approach is to use a library of 2-D templates of the object scanned over the depth/orientation spectrum as a model of each UXO category. A template is the largest rectangular chunk of the data that we know to be part of the object’s response area. A close match of the data of an unknown object against any template is a positive indication that the object belongs to the template’s category.

The problem is that no such library of templates exists, though future research may develop one using empirical or algorithmic methods. Nevertheless, we can construct a pilot system to investigate the feasibility of our approach using the data available. In order to develop a classifier function, the data set is divided into Training and Test sets, and the UXO categories are limited to just the largest five. The sixth category is “unclassified,” meaning not any of the UXO categories. Models for each of the UXO categories are sets of templates taken from the training set of those categories. Classification is based on the match results of an unknown object against all templates from all models.

Genetic Algorithms have proven their worth in optimisation and search problems of a non-linear nature. Since their inception by Holland (1992), Genetic Algorithms have become widely used and their effectiveness has improved (Baker, 1985; Fogarty, 1989; Goldberg, Deb & Clark, 1992). They are now being applied to a variety of problem domains, including Data Mining. An example is Hill et al. (1992).

A simple Genetic Algorithm is described as follows. A set of possible solutions is generated to form a population of ‘individuals’. The individuals are assessed for their ‘fitness’. According to their fitness values, individuals are selected to form a successive population. After crossing over data between selected individuals (to combine the strengths that made them more ‘fit’) and adding mutations (to introduce variety into the population), this new population is then subjected to a fitness test, and the cycle continues.
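The cycle just described can be sketched in a few lines of Python. This is a generic illustration, not the algorithm used in this chapter: individuals are plain bit strings rather than template sets, selection is fitness-proportionate, crossover is single-point, and the function name `run_ga`, its parameters and the default rates are all assumptions chosen for the example. The fitness function must return non-negative values, and `genome_length` must be at least 2.

```python
import random


def run_ga(fitness, genome_length, pop_size=20, generations=50,
           crossover_rate=0.7, mutation_rate=0.02, seed=0):
    """Evolve bit-string individuals and return the fittest one seen."""
    rng = random.Random(seed)  # seeded so that runs are repeatable

    # 1. Generate an initial population of random individuals.
    population = [[rng.randint(0, 1) for _ in range(genome_length)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)[:]

    for _ in range(generations):
        # 2. Assess fitness and select parents in proportion to it.
        #    The small constant keeps the weights valid if all scores are 0.
        weights = [fitness(ind) + 1e-9 for ind in population]
        parents = rng.choices(population, weights=weights, k=pop_size)

        # 3. Cross over pairs of parents at a random cut point.
        next_pop = []
        for i in range(0, pop_size, 2):
            a = parents[i][:]
            b = parents[(i + 1) % pop_size][:]
            if rng.random() < crossover_rate:
                cut = rng.randint(1, genome_length - 1)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            next_pop.extend([a, b])
        next_pop = next_pop[:pop_size]

        # 4. Mutate: flip each bit with a small probability.
        for ind in next_pop:
            for j in range(genome_length):
                if rng.random() < mutation_rate:
                    ind[j] = 1 - ind[j]

        # 5. The new population replaces the old one; remember the best.
        population = next_pop
        champion = max(population, key=fitness)
        if fitness(champion) > fitness(best):
            best = champion[:]

    return best


# Toy problem: maximise the number of 1s in a 16-bit string.
best_individual = run_ga(fitness=sum, genome_length=16)
```

In an application such as the one described here, the bit string would be replaced by whatever encodes a candidate solution, and `fitness` by a function that scores it on the training data.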

We split our data into a training set and a test set (used for independent assessment of accuracy), and attempted to optimise a template set for each category using a Genetic Algorithm and the training set. Improved performance simplifies the final classification function by reducing conflicts arising when templates from more than one category register positive correlations for the same unknown object. In the context of UXO detection, it is sufficient to distinguish between UXO and non-UXO rather than between categories of objects. The Genetic Algorithm involved a population of 20 individual template sets. Each individual consisted of five templates from each category. The fitness of each individual was assessed on the basis of how well its set of templates identified membership within categories. The Genetic Algorithms succeeded in significantly increasing the accuracy of all template categories by around 10%.

